Datasets
Datasets in Physical AI are the recorded streams of robot behaviour — teleoperated demonstrations, scripted rollouts, and large-scale egocentric video — that feed imitation learning, behaviour cloning, and pretraining for vision-language-action models. They typically include synchronised proprioception, multi-view RGB, depth, and language annotations across one or many embodiments.
From an engineering standpoint, the dataset is the single biggest determinant of what a learned policy can do and where it will silently fail. Coverage gaps, embodiment skew, demonstration quality, and licensing terms propagate directly into deployed behaviour, and dataset hygiene (synchronisation, calibration, action-space conventions) is far more decisive than model architecture for most real-world results.
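The synchronisation point above is easy to check mechanically. As a minimal sketch, the function below flags camera frames whose nearest proprioception sample drifts beyond a skew budget; the 20 ms default and the field layout are illustrative assumptions, not a standard from any particular dataset.

```python
import numpy as np

def check_stream_sync(cam_ts, proprio_ts, max_skew_s=0.02):
    """Flag camera frames whose nearest proprioception sample is too far away.

    cam_ts, proprio_ts: 1-D arrays of timestamps in seconds, assumed sorted.
    max_skew_s: tolerated skew; 20 ms is a hypothetical budget, tune per rig.
    Returns indices of camera frames exceeding the budget.
    """
    cam_ts = np.asarray(cam_ts, dtype=float)
    proprio_ts = np.asarray(proprio_ts, dtype=float)
    # For each camera timestamp, locate the neighbouring proprioception samples.
    idx = np.searchsorted(proprio_ts, cam_ts)
    idx_left = np.clip(idx - 1, 0, len(proprio_ts) - 1)
    idx_right = np.clip(idx, 0, len(proprio_ts) - 1)
    # Skew is the distance to whichever neighbour is closer.
    skew = np.minimum(np.abs(cam_ts - proprio_ts[idx_left]),
                      np.abs(cam_ts - proprio_ts[idx_right]))
    return np.flatnonzero(skew > max_skew_s)
```

Running this over every episode before training is a cheap way to catch the silent failures the paragraph warns about, since a policy trained on skewed streams learns a systematically wrong observation-to-action mapping.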
When choosing between datasets, weigh embodiment alignment with your target robot, task and scene diversity, action-space and control-frequency conventions, and licence terms for commercial use. Cross-embodiment corpora are powerful for pretraining but rarely sufficient on their own — most production stacks combine a broad pretraining set with a smaller, tightly-scoped fine-tuning set collected on the deployment platform.
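The pretraining-plus-fine-tuning combination described above is usually implemented as a weighted mixture over data sources. A minimal sketch, sampling with replacement so the small on-robot set is revisited rather than exhausted; the 50/50 default weight is a placeholder, not a recommendation from any published recipe.

```python
import random

def make_mixture_sampler(pretrain, finetune, finetune_frac=0.5, seed=0):
    """Yield episodes from a broad pretraining set and a small, tightly-scoped
    fine-tuning set collected on the deployment platform.

    finetune_frac deliberately over-weights the small in-domain set relative
    to its raw size; both inputs are plain lists of episodes here.
    """
    rng = random.Random(seed)
    while True:
        source = finetune if rng.random() < finetune_frac else pretrain
        yield rng.choice(source)
```

In practice the mixture weight is a tuning knob: too low and the policy never adapts to the deployment embodiment, too high and it overfits the narrow fine-tuning distribution and loses the generalisation the pretraining corpus paid for.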
Open X-Embodiment is the default starting point: it is the corpus most modern generalist policies are pretrained on and benchmarked against.
- Open X-Embodiment — 1M+ trajectories across 22 embodiments; the de facto cross-embodiment training corpus.
- DROID — Large-scale, in-the-wild manipulation dataset collected across 13 institutions.
- BridgeData V2 — Diverse manipulation behaviours designed to support broad generalisation.
- RH20T — Robot manipulation dataset with paired human demonstrations for one-shot learning research.
- RoboMIND — Multimodal bimanual mobile manipulation dataset with 310K+ trajectories.
- AgiBot World — Large-scale dataset designed to train and evaluate robot foundation models.
- Ego4D — Massive-scale egocentric video dataset useful for pretraining perception and world models.
- Something-Something V2 — Action recognition dataset frequently used to pretrain manipulation perception.
- RLDS — Standardized format and tooling for logged trajectories used across robot-learning datasets.
- BEHAVIOR-1K — Large-scale household activity dataset and environment targeting realistic embodied task diversity.
- CALVIN ABC-D — Language-conditioned manipulation dataset for long-horizon policy training and evaluation.
- EPIC-KITCHENS-100 — Egocentric video corpus useful for action understanding and embodied perception pretraining.
- nuScenes — Multisensor autonomous-driving dataset with rich annotations for perception and planning research.
- Waymo Open Dataset — Large-scale real-world driving dataset used for perception, motion forecasting, and closed-loop autonomy studies.
- Argoverse 2 — High-quality motion-forecasting and 3D tracking datasets for real-world embodied prediction tasks.
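Several of the corpora above ship in the RLDS layout, which stores each trajectory as an episode containing a sequence of per-timestep `steps`. The sketch below mimics that layout in plain Python dicts so the structure is visible without the TFDS tooling; the step fields (`observation`, `action`, `is_first`, `is_last`, `is_terminal`) follow RLDS naming conventions, but this is a schematic, not the actual loader.

```python
def make_episode(observations, actions):
    """Package aligned observation/action sequences as an RLDS-style episode:
    a dict whose `steps` entry holds one dict per timestep, each carrying the
    observation, the action taken, and boundary flags for the episode."""
    assert len(observations) == len(actions)
    steps = []
    last = len(observations) - 1
    for t, (obs, act) in enumerate(zip(observations, actions)):
        steps.append({
            "observation": obs,
            "action": act,
            "is_first": t == 0,        # first step of the episode
            "is_last": t == last,      # last step in the recording
            "is_terminal": t == last,  # here: recording end == task end
        })
    return {"steps": steps}
```

Keeping logged data in one episode/step schema is what lets a single training pipeline mix trajectories from many of the datasets listed above without per-corpus glue code.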