wadefagen's Useful Datasets

This repository contains a collection of datasets I've found useful. Many are cleaned versions of public datasets, provided in a consistent format for use in data science projects.

Available Datasets

General Format

Unless otherwise noted, all datasets are CSV files where the first row contains column headers.

Common column names across multiple datasets include:

  • Year, a four-digit year (e.g., 2018 or 2017)
  • Term, one of Spring, Summer, Fall, or Winter
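  • YearTerm, a four-digit year followed by -sp, -su, -fa, or -wi (for example, 2018-sp). Because these values compare as strings, the filter YearTerm >= "2016-fa" includes all data available from Fall 2016 to the present. The suffixes sort alphabetically (-fa, -sp, -su, -wi), so the same filter also matches any rows labeled 2016-sp, 2016-su, or 2016-wi.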
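
The YearTerm comparison can be tried out with a small sketch using only the standard library. The sample rows and the `rows_since` helper are made up for illustration and are not part of this repository; real dataset files have more columns.

```python
import csv
import io

# Hypothetical sample rows (assumption: not taken from any real dataset file).
SAMPLE_CSV = """Year,Term,YearTerm
2015,Fall,2015-fa
2016,Spring,2016-sp
2016,Fall,2016-fa
2017,Spring,2017-sp
2018,Spring,2018-sp
"""

def rows_since(csv_text, cutoff):
    """Return rows whose YearTerm compares >= cutoff as a plain string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["YearTerm"] >= cutoff]

recent = rows_since(SAMPLE_CSV, "2016-fa")
print([row["YearTerm"] for row in recent])
# prints ['2016-sp', '2016-fa', '2017-sp', '2018-sp']
```

Note that 2016-sp passes the filter because "sp" sorts after "fa" as a string. If a strict cutoff matters, it may be safer to also check the Year and Term columns.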

Useful Scripts

If you're working with these datasets, the following snippets may help you load the data. Each example assumes you have cloned this repo inside your project's working directory (as datasets, the default name).

Python (pandas)

import pandas as pd

df = pd.read_csv('datasets/gpa/uiuc-gpa-dataset.csv')
# `df` is a DataFrame of the CSV file

Python (dictionary)

import csv

with open("datasets/gpa/uiuc-gpa-dataset.csv", "r", newline="") as f:
  reader = csv.DictReader(f)
  for row in reader:
    # Each `row` is a row from the CSV as a Python dict keyed by column header.

    # Example usage:
    term = row["Term"]
    year = int(row["Year"])    # The csv module reads every value as a string; convert the year to an `int` if needed
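
Since every value arrives as a string, converting Year before counting, sorting, or doing arithmetic can matter. The following is a minimal sketch of that conversion. It uses hypothetical in-memory rows instead of a real dataset file, and the `count_rows_per_year` helper is an illustrative name, not something this repository provides.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real dataset file.
SAMPLE_CSV = """Year,Term
2018,Spring
2017,Fall
2018,Fall
2017,Fall
"""

def count_rows_per_year(csv_file):
    """Count rows per year, converting the Year string to an int."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[int(row["Year"])] += 1
    return counts

counts = count_rows_per_year(io.StringIO(SAMPLE_CSV))
for year in sorted(counts):
    print(year, counts[year])
```

To run this against a real file, pass an open file object (for example, one opened with newline="") in place of the io.StringIO wrapper.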

JavaScript (node.js)

With the csv-parse package (npm install --save csv-parse):

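const fs = require('fs');
const parse = require('csv-parse/lib/sync');  // csv-parse v5 and later: const { parse } = require('csv-parse/sync');

var rows = parse( fs.readFileSync("datasets/gpa/uiuc-gpa-dataset.csv"), {columns: true} );
rows.forEach(function (row) {
  // Each `row` is a row from the CSV as an object keyed by column header.

  // Example usage:
  var term = row["Term"];
  var year = parseInt(row["Year"], 10);  // Values are parsed as strings; convert the year to a number if needed
});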
