Datasets
- Seaborn data - Data repository for seaborn examples.
- Plotly data - Plotly sample datasets.
- Matplotlib data - Sample data needed for some of Matplotlib’s examples.
- Free public data sets for analysis - Free public data sets for analysis.
- Open Datasets - Explore, analyse and share quality data.
- Natural Language Toolkit Data - NLTK Corpora datasets.
- FiveThirtyEight Data - Free datasets to advance public knowledge.
- BuzzFeed Data - Free datasets for data analytics.
- NASA Earth Data - Full and open access to NASA’s collection of Earth science data for understanding and protecting our home planet.
- NASA Space Data - Planetary Data System (PDS) archives, with advanced, focused search tools available from several PDS discipline nodes.
- Our World in Data - Free data to make progress against the world’s largest problems.
- Bokeh Sample Data - The sample data module can be used to download datasets used in Bokeh examples.
- TensorFlow Data - TensorFlow datasets.
- PyDataset Data - Instant access to many popular datasets for Python (in data frame structure).
- Scikit-learn Data - The sklearn.datasets package embeds some toy datasets.
- Statsmodels Data - Statsmodels provides datasets (i.e. data and meta-data) for use in examples, tutorials, model testing, etc.
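
Several of the libraries listed above bundle their sample data with the package itself, so no separate download is needed. As a minimal sketch, assuming scikit-learn is installed, the iris toy dataset embedded in sklearn.datasets can be loaded like this (the variable names are illustrative only):

```python
from sklearn.datasets import load_iris

# load_iris() reads a small dataset shipped inside scikit-learn (no network access)
iris = load_iris()

X = iris.data      # feature matrix: one row per flower, one column per measurement
y = iris.target    # integer class label for each row

print(X.shape)                   # number of samples and features
print(iris.feature_names)        # names of the measurement columns
print(list(iris.target_names))   # names of the classes
```

Other entries in the list expose similar loader functions, but their names and return types differ, so check each library's documentation before relying on this pattern.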
Modern Repositories (LLM & Computer Vision)
- Hugging Face Datasets - The largest hub for NLP, Audio, and Vision datasets.
- Common Crawl - Open repository of web crawl data (often used for LLM training).
- LAION-5B - Large-scale image-text dataset.