Datasets

Datasets in Physical AI are the recorded streams of robot behaviour — teleoperated demonstrations, scripted rollouts, and large-scale egocentric video — that feed imitation learning, behaviour cloning, and pretraining for vision-language-action models. They typically include synchronised proprioception, multi-view RGB, depth, and language annotations across one or many embodiments.

From an engineering standpoint, the dataset is the single biggest determinant of what a learned policy can do and where it will silently fail. Coverage gaps, embodiment skew, demonstration quality, and licensing terms propagate directly into deployed behaviour, and dataset hygiene (synchronisation, calibration, action-space conventions) is far more decisive than model architecture for most real-world results.
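As a concrete illustration of one such hygiene check, the sketch below flags per-step timestamp skew between a camera stream and proprioception. The field names (`camera_ts`, `proprio_ts`) and the 10 ms tolerance are illustrative assumptions, not a convention from any particular dataset:

```python
import numpy as np

def check_sync(camera_ts: np.ndarray, proprio_ts: np.ndarray,
               tol_s: float = 0.010) -> np.ndarray:
    """Flag steps where camera and proprioception timestamps drift apart.

    camera_ts, proprio_ts: per-step capture times in seconds (same length).
    tol_s: maximum tolerated skew; 10 ms is an illustrative default.
    Returns a boolean mask marking out-of-sync steps.
    """
    skew = np.abs(camera_ts - proprio_ts)
    return skew > tol_s

# Example: a 10 Hz episode where one frame arrived 30 ms late.
cam = np.arange(0.0, 1.0, 0.1)
prop = cam.copy()
prop[4] += 0.030  # simulated lag on step 4
bad = check_sync(cam, prop)
print(f"{bad.sum()} of {len(bad)} steps exceed the skew tolerance")
```

Analogous checks apply to calibration (reprojection error against stored intrinsics/extrinsics) and action-space conventions (units, frame, and delta-vs-absolute encoding), all of which are cheap to audit and expensive to debug after training.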

When choosing between datasets, weigh embodiment alignment with your target robot, task and scene diversity, action-space and control-frequency conventions, and licence terms for commercial use. Cross-embodiment corpora are powerful for pretraining but rarely sufficient on their own; most production stacks combine a broad pretraining set with a smaller, tightly scoped fine-tuning set collected on the deployment platform.
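A minimal sketch of that pretrain-plus-fine-tune mixture, assuming both corpora are already exposed as Python iterators of example dicts; the 10% target weight and the generator interface are illustrative choices, not a recommendation from any specific paper:

```python
import random
from itertools import islice
from typing import Any, Dict, Iterator

def mix_streams(broad: Iterator[Dict[str, Any]],
                target: Iterator[Dict[str, Any]],
                target_weight: float = 0.1,
                seed: int = 0) -> Iterator[Dict[str, Any]]:
    """Sample training examples from two streams at a fixed ratio.

    broad: large cross-embodiment pretraining stream.
    target: small fine-tuning stream from the deployment platform.
    target_weight: fraction of examples drawn from the target stream.
    """
    rng = random.Random(seed)
    while True:
        source = target if rng.random() < target_weight else broad
        yield next(source)

# Toy usage with stand-in infinite streams; a real pipeline would wrap
# actual dataset readers here.
broad = iter(lambda: {"source": "open_x"}, None)
target = iter(lambda: {"source": "our_robot"}, None)
batch = list(islice(mix_streams(broad, target), 8))
print([ex["source"] for ex in batch])
```

In practice the mixture weight is a tuning knob: too low and the policy never adapts to the deployment embodiment, too high and it overfits the small fine-tuning set and loses the pretrained generality.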

Start here

Open X-Embodiment is the default starting point: 1M+ trajectories across 22 embodiments, and the dataset most modern generalist policies are benchmarked against.

  • Open X-Embodiment — 1M+ trajectories across 22 embodiments; the de facto cross-embodiment training corpus.
  • DROID — Large-scale, in-the-wild manipulation dataset collected across 13 institutions.
  • BridgeData V2 — Diverse manipulation behaviours designed to support broad generalisation.
  • RH20T — Robot manipulation dataset with paired human demonstrations for one-shot learning research.
  • RoboMIND — Multimodal bimanual mobile manipulation dataset with 310K+ trajectories.
  • AgiBot World — Large-scale dataset designed to train and evaluate robot foundation models.
  • Ego4D — Massive-scale egocentric video dataset useful for pretraining perception and world models.
  • Something-Something V2 — Action recognition dataset frequently used to pretrain manipulation perception.
  • RLDS — Standardised format and tooling for logged trajectories used across robot-learning datasets (see the loading sketch after this list).
  • BEHAVIOR-1K — Large-scale household activity dataset and environment targeting realistic embodied task diversity.
  • CALVIN ABC-D — Language-conditioned manipulation dataset for long-horizon policy training and evaluation.
  • EPIC-KITCHENS-100 — Egocentric video corpus useful for action understanding and embodied perception pretraining.
  • nuScenes — Multisensor autonomous-driving dataset with rich annotations for perception and planning research.
  • Waymo Open Dataset — Large-scale real-world driving dataset used for perception, motion forecasting, and closed-loop autonomy studies.
  • Argoverse 2 — High-quality motion-forecasting and 3D tracking datasets for real-world embodied prediction tasks.
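Because Open X-Embodiment and several of the datasets above are published in RLDS format, a single loading pattern covers them. The sketch below uses `tensorflow_datasets`; the GCS bucket path and version are illustrative and should be checked against the dataset's own documentation:

```python
import tensorflow_datasets as tfds

# Open X-Embodiment subsets are published as RLDS builders on GCS; this
# path follows the pattern used in the project's examples but should be
# verified before use.
builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0")
ds = builder.as_dataset(split="train[:10]")

# RLDS stores one episode per record; each episode holds a nested
# "steps" dataset with observation, action, and is_first/is_last flags.
for episode in ds.take(1):
    for step in episode["steps"].take(3):
        obs, action = step["observation"], step["action"]
        print(sorted(obs.keys()), action)
```

The payoff of the shared format is that swapping datasets is mostly a matter of changing the builder path, with the remaining work being the per-dataset differences in observation keys, action conventions, and control frequency noted above.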