World Models
World models are learned predictive models of physical dynamics. They take observations and actions and produce future observations, latent states, or full rollouts — enabling planning in imagination, sample-efficient RL, synthetic data generation, and pretraining for downstream policies.
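The observation/action interface described above can be sketched as a tiny latent world model. This is a hypothetical toy (random linear maps stand in for learned networks; all names like `WorldModel`, `encode`, `step` are illustrative, not any specific paper's API), just to make the encode → latent-dynamics → rollout shape concrete:

```python
from dataclasses import dataclass
import numpy as np

rng = np.random.default_rng(0)

@dataclass
class WorldModel:
    obs_dim: int
    act_dim: int
    latent_dim: int

    def __post_init__(self):
        # Random linear maps stand in for learned encoder/dynamics/decoder nets.
        self.enc = rng.normal(size=(self.latent_dim, self.obs_dim)) * 0.1
        self.dyn = rng.normal(size=(self.latent_dim, self.latent_dim + self.act_dim)) * 0.1
        self.dec = rng.normal(size=(self.obs_dim, self.latent_dim)) * 0.1

    def encode(self, obs):
        # Map a raw observation into the latent state space.
        return np.tanh(self.enc @ obs)

    def step(self, z, action):
        # Predict the next latent state from the current latent and an action.
        return np.tanh(self.dyn @ np.concatenate([z, action]))

    def decode(self, z):
        # Reconstruct an observation from a latent (for pixel-space rollouts).
        return self.dec @ z

    def rollout(self, obs, actions):
        # Imagined rollout: encode once, then iterate latent dynamics only.
        z = self.encode(obs)
        states = []
        for a in actions:
            z = self.step(z, a)
            states.append(z)
        return states

wm = WorldModel(obs_dim=16, act_dim=4, latent_dim=8)
traj = wm.rollout(rng.normal(size=16), [rng.normal(size=4) for _ in range(5)])
print(len(traj), traj[0].shape)  # 5 imagined latent states, each of dim 8
```

Note that `rollout` never touches observations after the initial encode; that is the property that makes imagination cheap relative to stepping real hardware, and also why model errors compound over the horizon.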
From an engineering standpoint, world models are how the field is attacking the data-efficiency problem: every imagined rollout step is one fewer step on hardware, and a good world model can compress months of teleoperation into a reusable simulator. That same compression makes them a load-bearing dependency for any system built on them: model errors compound over rollout horizons and silently shape what downstream policies can and cannot learn.
When choosing between approaches, separate generative pixel-space models (useful for synthetic data and visualisation) from latent-space predictive models (useful for planning and control), and weigh rollout horizon fidelity, action-conditioning quality, and compute cost per imagined step. For robot learning specifically, models with demonstrated downstream task transfer matter more than visually impressive video samples.
DreamerV3 is the most pragmatic entry point: a single fixed set of hyperparameters that produces strong results across 150+ tasks, and a clean reference implementation of latent-space world-model RL.
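"Planning in imagination" can be sketched with random-shooting MPC over imagined latent rollouts. This is a deliberately simplified illustration under assumed toy stand-ins (`dynamics` and `reward` here are hand-written placeholders for learned heads); DreamerV3 itself trains an actor-critic in imagination rather than shooting action sequences:

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics(z, a):
    # Toy stand-in for a learned latent dynamics model.
    return 0.9 * z + 0.1 * a

def reward(z):
    # Toy stand-in for a learned reward head: prefer latents near the origin.
    return -float(np.sum(z ** 2))

def plan(z0, horizon=10, candidates=256, act_dim=2):
    """Random-shooting MPC: sample candidate action sequences, roll each out
    inside the learned model, and return the first action of the best one."""
    best_ret, best_action = -np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1, 1, size=(horizon, act_dim))
        z, ret = z0, 0.0
        for a in seq:
            z = dynamics(z, a)   # imagined step, no environment interaction
            ret += reward(z)
        if ret > best_ret:
            best_ret, best_action = ret, seq[0]
    return best_action

a0 = plan(np.array([1.0, -1.0]))
print(a0.shape)  # first action of the best imagined sequence
```

In a real deployment, `plan` runs at every control step (replanning from the freshly encoded observation), and methods like TD-MPC2 replace pure random shooting with learned value estimates plus iterative refinement.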
- V-JEPA 2 (Meta FAIR) — Self-supervised video world model trained on 1M+ hours of video, enabling zero-shot robot planning.
- NVIDIA Cosmos — World foundation models for physically-grounded synthetic data generation.
- Genie 2 (DeepMind) — Foundation world model that generates interactive, controllable 3D environments.
- DreamerV3 — General world-model algorithm achieving strong results across 150+ tasks with fixed hyperparameters.
- DayDreamer — World models applied to physical robot learning for sample-efficient skill acquisition.
- UniSim — Universal simulator learning real-world interactions from diverse video data.
- I-JEPA — Image joint-embedding predictive architecture; foundational to the JEPA world-model line.
- World Models (Ha & Schmidhuber) — Foundational latent-dynamics framework for planning and control in learned simulators.
- PlaNet — Latent-space planning with learned dynamics, a core precursor to Dreamer-style methods.
- Dream to Control — Demonstrates control directly in latent imagination without pixel-space rollouts.
- SimPLe — Model-based RL baseline showing strong sample efficiency from learned video prediction.
- MuZero — Learned model-based planning architecture with strong performance across control domains.
- DreamerV2 — Robust latent world-model RL variant with improved discrete latent representations.
- GAIA-1 (Wayve) — Driving-oriented generative world model for physically plausible scenario synthesis.
- TD-MPC2 — Modern latent model-predictive control method with broad robot-control transfer.
- Robotic World Model (ETH RSL) — Learned world model for legged robots from ETH Zurich's Robotic Systems Lab; companion lite variant for lighter-weight experimentation.