
reduced_reused_recycled's Introduction

Reduced, Reused, and Recycled: The Life of a Machine Learning Dataset

NeurIPS 2021

Best Paper NeurIPS Datasets and Benchmarks Track

This repository contains the datasheet, data, and code to reproduce all the analyses in the paper. I'm still organizing things a bit, so if you need something immediately or find anything confusing, please open a GitHub issue or email me. I recommend reading the paper, appendix, and datasheet (in that order) thoroughly before sifting through the code.

TEMPORARY EDIT: The raw data files are too big for GitHub, so I need to find another host such as Zenodo. I am out of town through the end of January 2022, but if you need them ASAP, let me know.

The repository is organized as follows:

"Analysis": This folder contains three R scripts that reproduce the analyses and figures in the paper. These R scripts should save the figures to the "Figures" or "Appendix" folders as appropriate.

Most of the things you are looking for are in the "Data" folder.

Dataset_Curation: You'll find the main notebook to clean and curate the data for the whole project, "MainDataset.ipynb."

Dataset_Curation/Data: You'll find four JSON files that correspond to the raw data from MAG and PWC. The PWC files are 06/16/21 downloads from here. I will try to add some code to show how I got the MAG IDs for the papers in "datasets.json", but given that MAG is offline, I'm not sure it's that helpful.

Dataset_Curation/Data/Derivative Datasets: Derivative and intermediary datasets that are created throughout the cleaning process of MainDataset.ipynb are saved here. Some of these datasets take a long time to generate so it's helpful to save them as we go.

Dataset_Curation/Data/Analysis_Datasets: The final datasets created by MainDataset.ipynb and used by the R scripts in the "Analysis" folder are saved here.

Dataset_Curation/Data/Sensitivity_Datasets: The sensitivity analysis datasets created by MainDataset.ipynb and used by the R scripts in the "Analysis" folder are saved here.
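The "save as we go" pattern described for the derivative datasets can be sketched roughly as below. This is a minimal illustration, not the code in MainDataset.ipynb: the helper names `load_json` and `load_or_build` are hypothetical, and storing intermediates as JSON is an assumption (the notebook may use another format). Only the folder names come from the layout above.

```python
import json
from pathlib import Path
from typing import Any, Callable

# Folder names follow the layout described above; the helpers are hypothetical.
DATA_DIR = Path("Dataset_Curation") / "Data"
DERIVATIVE_DIR = DATA_DIR / "Derivative Datasets"


def load_json(path: Path) -> Any:
    """Read one JSON file and return the parsed object."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def load_or_build(cache_path: Path, build: Callable[[], Any]) -> Any:
    """Return the cached result if the file exists; otherwise build and save it.

    `build` stands in for any slow cleaning step. Its result must be
    JSON-serializable (an assumption made for this sketch).
    """
    if cache_path.exists():
        return load_json(cache_path)
    result = build()
    # Create the target folder on first use so the save cannot fail on a fresh clone.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as handle:
        json.dump(result, handle)
    return result
```

For example, `load_or_build(DERIVATIVE_DIR / "example_step.json", slow_step)` would run the hypothetical `slow_step` once and read the saved file on later runs. Delete the cached file to force a rebuild.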

reduced_reused_recycled's People

Contributors

kochbj
