- Stanford SNAP datasets: networks
- UCI Machine Learning Repository
- Internet Traffic Archive
- Academic Torrents
- NYC MTA transit data
- SFMTA GPS data on vehicles
- Uber Anonymized GPS
- Citi Bike NYC: json
- Capital Bikeshare DC
- Bay Area Bike Share: data and gem
- Weather
- NYC Taxis
- swissnexSF: Urban Data Challenge
- San Francisco
- Education
- SF neighborhoods
- New York
- United States
- Census
- United Kingdom
- OpenPrism
- SF Building Permits
- SFPD Incidents
- Pew Research Datasets
- US Bill Cosponsorship
APIs -- public and hidden
- Wikipedia
- Foursquare
- Tumblr
- Rdio
- Delicious
- NYT
- Disqus
- Yelp
- Last.fm
- bitly
- Yahoo Finance (hidden)
- Hunch
- Trulia
- Evernote
- Songkick
- Freebase
- Programmable Web
- 100 interesting datasets for Statistics
- CrowdFlower Open Data
- 538 Datasets
- DataMob
- /r/datasets
- Introduction to Data Science (Berkeley): by Jeff (the Hammer) Hammerbacher
- Peter Skomoroch
- Hilary Mason Research Data sets
- Quora post
- StatsSci
- Open Science Data Cloud
- Visual.ly open data
- visualizing.org open data
- music + data
- Datasets released by Google
- marketplace: Infochimps
- time series: Quandl
- public datasets: enigma
- location contextualization: factual
- financial modeling: Quantopian
- email contextualization: Rapleaf
- social media: Gnip
- knoema
- Find the Data