
DVC

Website | Docs | Twitter | Chat (Community & Support) | Tutorial | Mailing List


Data Science Version Control, or DVC, is an open-source tool for data science and machine learning projects. With a simple and flexible Git-like architecture and interface, it helps data scientists:

  1. manage machine learning models: version them together with the datasets and transformations (scripts) used to generate them;
  2. make projects reproducible;
  3. make projects shareable;
  4. manage experiments with branching and metrics tracking.

It aims to replace tools like Excel and Docs that are commonly used as a team's knowledge repository and ledger, ad-hoc scripts to track, move, and deploy different model versions, and ad-hoc data file suffixes and prefixes.


How DVC works

DVC works alongside Git: Git stores the code and the dependency graph (DAG), but not the data file cache. To store and share the data file cache, DVC supports remotes: any cloud storage (S3, Azure, Google Cloud, etc.) or any on-premises network storage (via SSH, for example).

[Figure: How DVC works]
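The split between Git and the data cache can be illustrated with a few lines of shell. This is a minimal sketch under stated assumptions, not DVC's implementation: it uses a hypothetical `cache/` directory keyed by MD5 checksum (two hex characters as a subdirectory, the rest as the file name), replaces the workspace file with a hardlink to the cached copy, and writes a hypothetical `data.csv.meta` record. Only that small record would need to go into Git; the cache directory is what a remote would store and share. GNU coreutils (`md5sum`) is assumed.

```shell
#!/bin/sh
# Simplified sketch of a content-addressed cache. The cache/ layout and the
# data.csv.meta record are hypothetical; this is not DVC's implementation.
set -eu

# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in for a large data file.
printf 'pretend this is a large dataset\n' > data.csv

# Key the content by its MD5 checksum (md5sum comes from GNU coreutils).
hash=$(md5sum data.csv | cut -d' ' -f1)
prefix=$(printf '%s' "$hash" | cut -c1-2)
rest=$(printf '%s' "$hash" | cut -c3-)

# Move the bytes into the cache under cache/<2 hex chars>/<remaining 30 chars>.
mkdir -p "cache/$prefix"
mv data.csv "cache/$prefix/$rest"

# Restore the workspace file as a hardlink to the cached copy (no second copy of the data).
ln "cache/$prefix/$rest" data.csv

# Only this small text record would need to be committed to Git.
printf 'path: data.csv\nmd5: %s\n' "$hash" > data.csv.meta
cat data.csv.meta
```

Because the cache is keyed by content, two identical files map to the same cache entry and are stored once.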

Quick start

Please read the Get Started guide for the full version. Common workflow commands include:

Track code and data together:

$ git add train.py
$ dvc add images.zip

Connect code and data with commands:

$ dvc run -d images.zip -o images/ unzip -q images.zip
$ dvc run -d images/ -d train.py -o model.p python train.py

Make changes and reproduce:

$ vi train.py
$ dvc repro model.p.dvc

Share code:

$ git add .
$ git commit -m 'The baseline model'
$ git push

Share data and ML models:

$ dvc remote add myremote s3://mybucket/image_cnn
$ dvc config core.remote myremote
$ dvc push
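The `dvc repro` step in the workflow above reruns a command only when something it depends on has changed. The sketch below illustrates that idea with checksums. It is an assumption-laden illustration, not DVC's bookkeeping: `run_stage` and the `stage.md5` record are hypothetical names, `cp` stands in for `python train.py`, and GNU coreutils `md5sum` is assumed.

```shell
#!/bin/sh
# Sketch of checksum-based re-execution in the spirit of `dvc repro`.
# run_stage and stage.md5 are hypothetical; this is not DVC's implementation.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'print("training")\n' > train.py
printf 'raw data\n' > data.txt

runs=0

run_stage() {
    # Checksums of the stage's dependencies, one "hash  file" line each.
    current=$(md5sum train.py data.txt)
    if [ -f stage.md5 ] && [ "$(cat stage.md5)" = "$current" ]; then
        echo "skip: dependencies unchanged"
        return 0
    fi
    echo "run: dependencies changed"
    cp data.txt model.p                    # stand-in for `python train.py`
    runs=$((runs + 1))
    printf '%s\n' "$current" > stage.md5   # record the checksums of this run
}

run_stage                                  # first run: executes the command
run_stage                                  # nothing changed: skipped
printf 'print("training v2")\n' > train.py
run_stage                                  # train.py changed: executes again
echo "command executed $runs times"
```

Running it prints `run`, `skip`, `run` and reports that the command executed 2 times. A fuller version would also check that outputs such as `model.p` exist and are unchanged before skipping a stage.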

Installation

There are three options to install DVC: pip, Homebrew, or an OS-specific package:

pip (PyPI)

Stable

pip install dvc

Development

pip install git+https://github.com/iterative/dvc

Homebrew

brew install iterative/homebrew-dvc/dvc

or:

brew install --cask iterative/homebrew-dvc/dvc

Package

Self-contained packages for Windows, Linux, and macOS are available. The latest versions can be found on the GitHub releases page.

Ubuntu / Debian (apt)

sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
sudo apt-get update
sudo apt-get install dvc

Fedora / CentOS (rpm)

sudo wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo
sudo yum update
sudo yum install dvc

Arch Linux (AUR)

This is an unofficial package; please refer any inquiries regarding the AUR package to its maintainer.

yay -S dvc

Related technologies

  1. Git-annex: DVC uses the idea of storing the content of large files (which you don't want in your Git repository) in a local key-value store, and uses file hardlinks/symlinks instead of copying the actual files.
  2. Git-LFS: DVC is compatible with any remote storage (S3, Google Cloud, Azure, SSH, etc.). DVC uses reflinks or hardlinks to avoid copy operations on checkout, which makes it much more efficient for large data files.
  3. Makefile (and its analogues): DVC tracks dependencies as a DAG.
  4. Workflow management systems: DVC is a workflow management system designed specifically for managing machine learning experiments, and it is built on top of Git.
  5. DAGsHub: a GitHub equivalent for DVC. Pushing your Git+DVC based repo to DAGsHub gives you a high-level dashboard of your project, including DVC pipeline and metrics visualizations, as well as links to DVC-managed files if they are in cloud storage.
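The link-based checkout described in the list above can be observed with standard tools. This sketch uses only POSIX `ln`, `cp`, and `ls -i` (no DVC involved) and compares inode numbers: a hardlink is a second name for the same inode, while `cp` creates a new inode holding a duplicate of the data. Reflinks (copy-on-write clones, e.g. GNU `cp --reflink` on filesystems such as Btrfs or XFS) are left out because they depend on filesystem support.

```shell
#!/bin/sh
# Compare a hardlink with a full copy. Plain POSIX tools; no DVC involved.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'model weights\n' > cached.bin

ln cached.bin linked.bin   # hardlink: a second name for the same inode, no bytes copied
cp cached.bin copied.bin   # copy: a new inode holding a duplicate of the bytes

# Print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

echo "cached.bin inode: $(inode cached.bin)"
echo "linked.bin inode: $(inode linked.bin)"
echo "copied.bin inode: $(inode copied.bin)"
```

The first two inode numbers match and the third differs. One design consequence: because a hardlink shares its data with the cached file, an in-place edit through the workspace name also changes the cache, so a link-based checkout has to protect cached files against such edits.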

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.


Mailing List

Want to stay up to date? Want to help improve DVC by participating in our occasional polls? Subscribe to our mailing list. No spam, really low traffic.

Copyright

This project is distributed under the Apache license version 2.0 (see the LICENSE file in the project root).

By submitting a pull request for this project, you agree to license your contribution under the Apache license version 2.0 to this project.
