AI Training Dataset Suppliers

Check out the world’s suppliers of AI training datasets. Subscribe to receive more updates from Lexsense.

Data Repositories


Appen Open Source Datasets: A selection of 270 audio, image, text and video datasets.
Anacode Chinese Web Datastore: A collection of crawled Chinese news and blogs.
AssetMacro: Historical data of macroeconomic indicators and market data.
Awesome Public Datasets: A topic-centric list of high-quality open datasets
AWS Public Data Sets: A centralized repository of public datasets
USA.gov: APIs and data feeds to help people find useful government information
DataPortals.org: A Comprehensive List of Open Data Portals from Around the World
Data Planet: The largest repository of standardized and structured statistical data
DataSF.org: Search hundreds of datasets from the City and County of San Francisco
Europeana Data: Open metadata on 20 million texts, images, videos and sounds.
GEO Gene Expression Omnibus: Online resource for gene expression data browsing, query and retrieval
HitCompanies Datasets: Comprehensive data on randomly sampled companies, updated automatically using AI/machine learning
Data Challenge: 44 million blog posts made between August 1st and October 1st, 2008
JMP Public Datasets: Assorted public datasets from JMP
Kaggle Datasets: Explore, analyze, and share quality data
Linking Open Data: Making data freely available to everyone
Lyst Fashion Data Trends: The industry’s trusted source for tracking fashion data trends
Million Song Dataset: Collection of audio data for contemporary popular music tracks
NASDAQ Data Link: A premier source for financial, economic and alternative datasets
NASA Data Archive: NASA’s archive for space science mission data
Robert Shiller Data: Housing data, financial market data and more.
Sports Statistics: Data for soccer, NBA, NFL, NHL, and more
StatLib Archive: Datasets from Carnegie Mellon University
UCI Machine Learning Repository: A collection of databases for empirical analysis
UCR Time Series Archive: Datasets, papers, links, and code
UK Open Postcode Geo: Structured UK open data by location
United States Census Bureau: An assortment of US Census data
Web Data Commons: The largest structured web corpus available to the public.
Yahoo Webscope Program: Datasets for non-commercial use by academics and other scientists
Yelp Open Dataset: An all-purpose dataset for personal, educational, and academic purposes.
