Code Monkey home page Code Monkey logo

cookiecutter-data-science's Introduction

Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Standard Naming Convention

Files should be named according to the person who is creating them, version numbers, and the purpose. For example:

  • notebooks/ak-1.2-DataPreprocessing.py
  • data/interim/ak-1.2-DataPreprocessing.csv
  • logs/ak-1.2-DataPreprocessing/GENERATED_FILES

Your initials (preferably first two), the notebook number (in chronological order of creation) and version number (incremented any time you make changes that would effect the output, and the type of file.

Common file names

  • DataPreprocessing - manipulating the raw data files into standard Pandas Dataframe form
  • DataExploration - for looking at the contents of the data, usually raw but could be other
  • ModelX - running a particular model
  • ParameterTuning - sweeping across hyperparameters
  • Scratch - personal playground, not expected to be production ready at any point

Requirements to use the cookiecutter template:


  • Python 2.7 or 3
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip by or conda depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter https://github.com/AmiiThinks/cookiecutter-data-science

asciicast

The resulting directory structure


The directory structure of your new project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│       └── README.md  <- Link or instructions on acquiring the data
│   ├── interim        <- Intermediate data that has been transformed, use standard naming convention.
│   ├── processed      <- The final, canonical data sets for modeling, snc refers to relevant notebooks.
│   └── raw            <- The original, immutable data dump.
│       └── README.md  <- Link or instructions on acquiring the data
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── logs               <- Stores details of experiments, learning curves from custom tools
│   └── ID-#-nb        <- Directory referencing the notebook or commit used to create logs.
|
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is the creator's initials,
│                         a number (for ordering), and a CamelCase description, e.g.
│                         `ak-1.0-DataPreprocessing.ipynb`.
│
├── references         <- Relevant papers, Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project, the ultimate deliverable
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org

Contributing

We welcome contributions! See the docs for guidelines.

Installing development requirements


pip install -r requirements.txt

Running the tests


py.test tests

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.