Data Archive Py

Data Archiving and Versioning

One issue I've often faced in analytics is how to connect output (data, visualizations, and reports) with the code used to generate it. The team delivers output to the client/partner and then continues to develop the code base. If the client/partner later has a specific question, the team has to backtrack to figure out which code created that particular output. Many teams archive their output using timestamps, e.g. data_20171019_124590. While this works well enough for some teams, it doesn't easily let an analyst backtrack to see which code created a specific output. It can also create issues when data is used as an input farther downstream, because the filename changes whenever the data is re-processed.

The archivePy package combines several steps that are important for data archiving into one package.

First, we commit our code and extract the branch, the unique 6-character hash, and the commit message, which are then used to name the archive.
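The commit-and-extract step could be sketched roughly as follows. This is a minimal illustration, not archivePy's actual code: the function names (`git_output`, `commit_and_describe`, `archive_stem`), the underscore separator, and the filename sanitizing are all assumptions, and it calls the standard git command line through `subprocess`.

```python
import re
import subprocess


def git_output(*args, cwd="."):
    """Run a git command and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_and_describe(message, cwd="."):
    """Commit tracked changes, then return (branch, short_hash, subject).

    Raises subprocess.CalledProcessError if git fails, for example when
    there is nothing to commit.
    """
    git_output("commit", "-a", "-m", message, cwd=cwd)
    branch = git_output("rev-parse", "--abbrev-ref", "HEAD", cwd=cwd)
    # --short=6 asks for at least 6 characters; git may return more
    # if 6 are not enough to identify the commit uniquely.
    short_hash = git_output("rev-parse", "--short=6", "HEAD", cwd=cwd)
    subject = git_output("log", "-1", "--pretty=%s", cwd=cwd)
    return branch, short_hash, subject


def _safe(text):
    """Replace runs of non-alphanumeric characters so the text is filename-safe."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def archive_stem(branch, short_hash, message, add_branch=False, add_message=False):
    """Build an archive name (without .zip): [branch_]hash[_message]."""
    parts = []
    if add_branch:
        parts.append(_safe(branch))
    parts.append(short_hash)
    if add_message:
        parts.append(_safe(message))
    return "_".join(parts)
```

With the branch included, `archive_stem("master", "90r68d", "fix plots", add_branch=True)` produces a stem in the same shape as the `master_90r68d.zip` example in this README.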

Next, we run the archiving functions, which create a directory structure that serves two purposes: a) easy access to the current data and b) an archive from which data can be extracted using the unique 6-character hash.

Example File Structure

Project/
|___Data/
|   |___Processed/
|       |___Current/
|       |       data.csv
|       |       summary_data.csv
|       |___Archive/
|               master_90r68d.zip
|___Deliverables/
    |___Current/
    |       fig1.png
    |       fig2.png
    |___Archive/
            master_90r68d.zip
Commit and Archive Sample Code

Sample code is located at ./archivePy/archivePy/sample_code.py

  1. Navigate to the repository's root directory, e.g. cd archivepy
  2. To write out data to the Current folder, type python archivePy/sample_code.py
  3. To write out data to the Current folder and archive the data, type python archivePy/sample_code.py -c "commit message"
  • The --commit (-c) option triggers the script to add the data/output to the Archive folder.
  • The --add_branch (-b) option is optional. Including it adds the branch name to the beginning of the archive zip filename.
  • The --add_message (-m) option is optional. Including it adds the message to the end of the archive zip filename.
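For readers who want to wire similar options into their own script, the flags described above could be declared with `argparse` along these lines. This is a sketch under assumptions: `build_parser` is a hypothetical name, and treating -b and -m as on/off switches while -c takes the commit message is inferred from the usage shown here, not taken from sample_code.py.

```python
import argparse


def build_parser():
    """Declare the three options described in the README."""
    parser = argparse.ArgumentParser(
        description="Write output to Current and optionally archive it."
    )
    # -c takes the commit message; when omitted, nothing is archived.
    parser.add_argument(
        "-c", "--commit", metavar="MESSAGE", default=None,
        help="commit with MESSAGE and add the output to the Archive folder",
    )
    # Assumed to be simple switches that change the archive filename.
    parser.add_argument(
        "-b", "--add_branch", action="store_true",
        help="prefix the archive zip filename with the branch name",
    )
    parser.add_argument(
        "-m", "--add_message", action="store_true",
        help="append the commit message to the archive zip filename",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.commit is None:
        print("Writing to Current only")
    else:
        print(f"Writing to Current and archiving with message: {args.commit}")
```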

Contributors

holmesjoli
