Simulation is among the most powerful tools available in robotics. It provides thousands of parallel worlds, stepped faster than wall-clock time, with ground-truth labels at no annotation cost and no risk to physical hardware. A policy can fail thousands of times across a cluster overnight and walk with a stable gait by morning. In locomotion this approach has proven decisive: several of the most capable walking and running controllers were trained almost entirely in simulation and transferred to physical robots with little or no additional real-world training.
Real data remains essential because the same properties that make simulation inexpensive also make it an approximation of the physical world. The mismatch is small where the simulator models the underlying physics accurately, such as rigid-body dynamics over open terrain, and large where its fidelity degrades, such as a thumb pressing a soft fruit against a cluttered surface. This discrepancy is known as the reality gap, and in contact-rich manipulation it is often the dominant source of policy failure.
This article examines where simulation is the appropriate tool, where its fidelity breaks down, why domain randomization addresses only part of the problem, and how a layer of diverse real human data converts a fragile simulation-trained policy into one that performs reliably in a real home.
Where simulation performs well
The strengths of simulation are concrete and consequential, and they merit a precise accounting. Simulators such as MuJoCo and NVIDIA Isaac Sim provide four capabilities that the physical world cannot match at comparable cost.
- Volume. Thousands of environments step in parallel, faster than real time, yielding large quantities of robot experience within a short collection window.
- Exact labels. Joint angles, contact forces, object poses, and segmentation masks are known precisely, free of annotation latency and free of sensor noise unless it is deliberately introduced.
- Safety. The robot can attempt dangerous or destructive behaviour without damaging hardware or its surroundings, so exploration is cheap.
- Free resets. The environment returns to a clean initial state at the start of every episode at no cost, allowing reinforcement learning to run continuously without a human re-staging the scene.
These are substantive advantages. They account for much of the recent progress in legged locomotion, a regime governed by rigid-body physics and gravity, both of which modern engines model with high fidelity. For tasks that fall within this regime, training predominantly in simulation is a sound starting point.
Where the model and the world diverge
Difficulty arises when a task depends on quantities the simulator approximates rather than computes exactly. Manipulation is precisely such a task. Grasping a mug, folding a shirt, wiping a spill, and threading a cable are each governed by contact, friction, and material behaviour that no engine reproduces faithfully. The resulting error is not distributed uniformly. It concentrates in the moments that determine success, the instant of contact and the instant of release, where a small mismatch in friction or compliance reverses the outcome.
Contact dynamics are stiff, discontinuous, and highly sensitive to small parameter errors. Whether a grasp holds or slips can depend on an estimated friction coefficient, an idealised fingertip compliance, or a millimetre of smoothed geometry. Deformable objects present a greater challenge still. A simulator can render cloth, but the manner in which a real towel bunches, adheres to itself, and drapes over a wet plate reflects a coupled interaction of friction, fabric memory, and moisture that no real-time model currently captures.
The distribution problem is larger than any single physics term. A real kitchen is not a clean scene containing three known objects. It contains glare, partially filled bottles, a child's drawing taped to a cabinet, a left-handed operator, a drawer that binds on one runner. The variability of real human environments is not noise around a tidy mean. It is the distribution into which the policy will be deployed, and it carries far heavier tails than any distribution constructed by hand.

Domain randomization and its boundary
The standard response to the reality gap is domain randomization. Rather than committing to a single set of physics parameters, the training process randomizes them, varying friction, mass, lighting, textures, latency, and sensor noise widely enough that the real world appears as one more sample the policy has already encountered. Applied with care, the technique is effective and accounts for a substantial share of successful sim-to-real transfer in locomotion and rigid-object grasping.
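As a minimal sketch, assuming a simulator whose parameters can be set per episode, domain randomization reduces to drawing a fresh parameter set before each reset. The parameter names and ranges below are hypothetical placeholders, not tuned values for MuJoCo, Isaac Sim, or any other engine, and the step that writes them into the simulator is omitted because it is engine-specific.

```python
import random

# Hypothetical randomization ranges: placeholders, not tuned values.
# Each key is a parameter the simulator is assumed to expose; each value
# is a (low, high) range sampled uniformly once per episode.
RANDOMIZATION_RANGES = {
    "friction_scale": (0.5, 1.5),     # multiplier on nominal friction coefficient
    "mass_scale": (0.8, 1.2),         # multiplier on nominal object mass
    "light_intensity": (0.3, 1.0),    # normalized scene brightness
    "latency_s": (0.0, 0.04),         # actuation delay in seconds
    "sensor_noise_std": (0.0, 0.02),  # std. dev. of additive observation noise
}


def sample_episode_params(rng, ranges=RANDOMIZATION_RANGES):
    """Draw one parameter set, uniformly within each declared range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    for episode in range(3):
        params = sample_episode_params(rng)
        # A training loop would apply `params` to the simulator here,
        # before resetting the episode (engine-specific, omitted).
        print(episode, {k: round(v, 3) for k, v in params.items()})
```

Only the keys in the dictionary are ever varied. A phenomenon the engine does not model at all has no key to add, however wide the ranges are set.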
The technique has a clear boundary. Randomization is confined to what the simulator can represent. Friction, mass, and lighting each correspond to an exposed parameter and can therefore be swept. The difficulty lies with every phenomenon that has no parameter because it was never modelled: the way a particular fabric pills, the suction of a damp surface, the compliance of an overripe tomato, the accumulation of clutter on a counter over a week. A phenomenon absent from the model cannot be randomized. Randomization widens the distribution along axes already selected and remains silent on the axes that were omitted.
Domain randomization varies only what the simulator already represents. A large share of the reality gap resides in the phenomena that were never modelled.
Motionstack
This accounts for the common case in which a policy passes every randomized test in the cluster yet fails in a home. The evaluation distribution is frequently derived from the same model that produced the training distribution, so it inherits the same blind spots. Such an evaluation measures whether the policy handles the model of reality, which is a distinct and considerably more forgiving question than whether it handles reality itself.
Real human data as an anchor
The objective is not to abandon simulation but to stop treating it as the complete world and to treat it instead as a strong prior that must be tied to ground truth. The anchor is diverse, first-person human demonstration: ordinary people performing ordinary tasks in their own homes, captured with synced motion on a consistent rig. Real data provides three capabilities that simulation cannot supply by construction.
First, it grounds the policy. The true distribution of contact events, materials, lighting, and clutter need not be modelled, because it is recorded directly from the source, including the long-tail behaviour that would be impractical to script. Second, it co-trains. Mixing real demonstrations into a simulation-pretrained policy steers the model toward behaviour that succeeds on physical objects, in the same manner that pooling diverse data across robots and labs improved transfer in the Open X-Embodiment effort. Third, and most consequentially, it evaluates honestly.
Held-out real data is one of the few evaluations that cannot quietly conform to the simulator's own assumptions. Reserving a slice of genuinely real episodes (a city the policy never trained on, an operator type it has not seen, a task variation deliberately withheld) gives a passing score real meaning. It indicates that the policy generalised to the world rather than to a model of the world. Organisations building generalist manipulation models, such as Physical Intelligence, rely heavily on broad real-world demonstration data, and this is a central reason why: the physical world is among the few references that do not share the simulator's blind spots.
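One way to build such a slice, sketched here under the assumption that each real episode carries metadata fields such as `city`, `operator`, and `task` (hypothetical names), is to hold out whole groups rather than random episodes, so that nothing from a held-out city or task variation leaks into training. The function name `group_holdout_split` is likewise illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_holdout_split(
    episodes: Iterable[Dict], held_out: Dict[str, Set[str]]
) -> Tuple[List[Dict], List[Dict]]:
    """Send every episode that touches a held-out group to evaluation.

    `held_out` maps a metadata field (e.g. "city", "task") to the values
    reserved for evaluation. An episode missing a field never matches it.
    """
    train, evaluation = [], []
    for ep in episodes:
        if any(ep.get(field) in values for field, values in held_out.items()):
            evaluation.append(ep)
        else:
            train.append(ep)
    return train, evaluation


# Hypothetical episodes and held-out groups, for illustration only.
episodes = [
    {"id": 1, "city": "Lisbon", "operator": "right", "task": "wipe"},
    {"id": 2, "city": "Osaka", "operator": "left", "task": "wipe"},
    {"id": 3, "city": "Lisbon", "operator": "right", "task": "fold_towel"},
    {"id": 4, "city": "Denver", "operator": "right", "task": "wipe"},
]
train, evaluation = group_holdout_split(
    episodes, {"city": {"Osaka"}, "task": {"fold_towel"}}
)
print([e["id"] for e in train], [e["id"] for e in evaluation])
```

Splitting by group is stricter than a random episode split: a random split would let the policy see other episodes from the same city or task, and the held-out score would then overstate generalisation.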
A pragmatic procedure
None of this argues against simulation. It argues for a deliberate division of labour. In practice, an effective sequence proceeds as follows, with each stage exploiting a distinct strength.
- Pretrain in simulation. Exploit volume, exact labels, and safety to learn the bulk of the policy: the broad motor skills, the predictable dynamics, and the behaviours that can be specified and reset cheaply.
- Augment with randomization. Sweep friction, mass, lighting, latency, and noise to harden the policy against the variation that can be modelled. This yields meaningful but incomplete robustness.
- Anchor on diverse real data. Mix in first-person human demonstrations across many people, places, and task variations to ground the policy in the contact, materials, and unstructured conditions an engine does not reproduce.
- Fine-tune for the target. Close the remaining gap with real episodes drawn from the specific embodiment, objects, and settings into which the policy will be deployed.
- Evaluate on held-out real data. Reserve a real slice the policy never observed, selected to test generalisation, and weight that result above any in-simulation metric.
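The last step can be made mechanical. The sketch below assumes each checkpoint has been scored on both the held-out real slice and the randomized simulation suite, and applies one possible rule for weighting the real result above the simulated one: rank lexicographically, so simulation success only breaks ties. `CheckpointResult` and `select_checkpoint` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointResult:
    name: str
    real_heldout_success: float  # success rate on the held-out real slice
    sim_success: float           # success rate on the randomized sim suite


def select_checkpoint(results: List[CheckpointResult]) -> CheckpointResult:
    """Return the checkpoint with the best held-out real success rate.

    Tuples compare element by element, so simulation success only matters
    when two checkpoints tie exactly on the real slice.
    """
    if not results:
        raise ValueError("no checkpoints to compare")
    return max(results, key=lambda r: (r.real_heldout_success, r.sim_success))
```

Under this rule, a checkpoint with a near-perfect simulation score still loses to any checkpoint that does better on the real slice.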
The ratio of simulation to real data shifts with the task. A walking controller may be predominantly simulation with a thin real layer for final transfer. A bimanual folding policy depends on real data far more heavily, because much of what makes folding difficult lies in the part of the world the simulator approximates least well. The principle is not a fixed mixture but a consistent tendency: for contact-rich work the real layer is rarely zero, and a held-out real evaluation is worth retaining.
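In a co-training setup the ratio can be expressed as the share of each training batch drawn from real episodes. The sketch below assumes both datasets are already in a shared format and uses a fixed per-batch fraction, which is one common mixing rule among several (re-weighting the loss is another). `mixed_batch` and `real_fraction` are hypothetical names.

```python
import random
from typing import Any, List, Sequence, Tuple


def mixed_batch(
    sim_data: Sequence[Any],
    real_data: Sequence[Any],
    batch_size: int,
    real_fraction: float,
    rng: random.Random,
) -> List[Tuple[str, Any]]:
    """Build one co-training batch with a fixed share of real episodes.

    Sampling is with replacement, so a small real set can be mixed into a
    much larger simulated set at any ratio. Each item is tagged with its
    source so per-source losses or metrics can be tracked.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    batch = [("real", rng.choice(real_data)) for _ in range(n_real)]
    batch += [("sim", rng.choice(sim_data)) for _ in range(n_sim)]
    rng.shuffle(batch)  # interleave sources so order carries no signal
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    sim = [f"sim_ep_{i}" for i in range(10_000)]
    real = [f"real_ep_{i}" for i in range(200)]
    batch = mixed_batch(sim, real, batch_size=64, real_fraction=0.25, rng=rng)
    print(sum(src == "real" for src, _ in batch), "real of", len(batch))
```

A walking controller would use a small `real_fraction` and a folding policy a much larger one; the right values depend on the task and have to be found empirically.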
Sequencing also matters. Real data contributes at two points in the pipeline, not one. During training it anchors a simulation-pretrained policy toward behaviour that holds up under contact. At the end it serves as a check, one of the few references that do not share the simulator's assumptions. Omitting the first role leaves contact behaviour ungrounded; omitting the second amounts to grading the work against an answer key written by the same process under evaluation.
Diagnosing which side is deficient
A simple diagnostic helps here. If a policy is brittle along axes the simulator models, such as payload, terrain slope, or controller latency, the deficiency usually lies in randomization, and additional or better-tuned simulation will generally help. If the policy passes every test in the cluster and then fails the moment it contacts a real towel or a real countertop, additional simulation is unlikely to resolve it. That pattern is the signature of a missing real layer, and the remedy is diverse, consented, real demonstration data together with an honest held-out slice that surfaces the next gap before a customer does.
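The diagnostic can be written as a simple rule, sketched below under the assumption that failures have already been attributed to named factors and that the set of parameters the simulator exposes is known. The function and factor names are hypothetical, and attributing a failure to a factor is itself a judgment the code does not make.

```python
from typing import Dict, Iterable, List, Set


def diagnose(
    failure_factors: Iterable[str], modeled_params: Set[str]
) -> Dict[str, List[str]]:
    """Split observed failure factors by which remedy they point to.

    Factors the simulator exposes as parameters suggest better randomization;
    factors with no corresponding parameter suggest a missing real-data layer.
    """
    factors = set(failure_factors)
    return {
        "improve_randomization": sorted(factors & modeled_params),
        "add_real_data": sorted(factors - modeled_params),
    }


# Hypothetical example: parameter names and failure factors are illustrative.
modeled = {"payload", "terrain_slope", "latency", "friction_scale"}
report = diagnose(["latency", "towel_drape", "damp_surface_suction"], modeled)
print(report)
```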

The gap is as much a sourcing problem as a physics one
It is tempting to treat the reality gap as a problem a better simulator will eventually solve. Better engines do help, and they continue to improve. But the part of the gap that matters most for manipulation is not awaiting a numerical method so much as data that was never collected: unedited, first-person captures of real people performing real chores in real homes, with motion attached and rights cleared for commercial training. Such footage is largely absent from the public web, so it cannot simply be scraped. It must be sourced.
This is a favourable property, because sourcing can be engineered: the right person, in the right place, performing the right task variation on a standardised rig, with rights cleared for use. It is fundamentally a logistics problem, and logistics scales. In many cases the reality gap closes from the real side faster and more reliably than from the simulator side, because the real side contains precisely what the simulator omitted.
If a policy performs well in simulation but turns fragile the moment it leaves the cluster, the missing component is typically a real anchor. Tell us the spec and we can field the people, places, and task variations required, captured on a consistent rig with synced motion and rights cleared, to ground and evaluate the policy you already have.