Canvas Data CLI

A small CLI tool for syncing data from the Canvas Data API.

NOTE: This tool is currently in beta. Please report any bugs or issues you find!

Installing

Prerequisites

This tool should work on Linux, OS X, and Windows. It runs on the Node.js runtime, which you need to install before you can use it.

  1. Install Node.js. Any version newer than 0.12.0 should work; your best bet is to follow the official Node.js installation instructions.

Install via npm

npm install -g canvas-data-cli

OR Install from github

git clone https://github.com/instructure/canvas-data-cli.git && cd canvas-data-cli && make installLocal

Configuring

The Canvas Data CLI requires a configuration file with certain fields set. It uses a small JavaScript file as its configuration file. To generate a stub of this configuration, run canvasDataCli sampleConfig, which will create a config.js.sample file. Rename this file to something like config.js.

Edit the file to point to where you want to save the downloaded files, as well as the file used to track which data exports you have already downloaded. By default, the sample config file tries to pull your API key and secret from the environment variables CD_API_KEY and CD_API_SECRET, which is more secure. However, you can also hard code the credentials in the config file.
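
As a rough illustration, an edited config.js might look like the sketch below. Only the CD_API_KEY and CD_API_SECRET environment variables come from the text above. The field names saveLocation, stateFile, key, and secret are assumptions made for this example, so treat the generated config.js.sample as the authoritative list of fields.

```javascript
// config.js (illustrative sketch only; the field names below are assumptions,
// compare them with the config.js.sample created by `canvasDataCli sampleConfig`)
const config = {
  // Directory where downloaded data files are saved (assumed field name)
  saveLocation: './dataFiles',
  // File that tracks which data exports were already downloaded (assumed field name)
  stateFile: './state.json',
  // Credentials are read from the environment instead of being hard coded
  key: process.env.CD_API_KEY,
  secret: process.env.CD_API_SECRET
};

module.exports = config;
```

Reading the credentials from the environment keeps them out of version control, which is why the sample config prefers it over hard-coded values.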

Configuring an HTTP Proxy

canvas-data-cli supports HTTP proxies, both with and without basic authentication. To use one, add up to three extra options to your config file: httpsProxy, proxyUsername, and proxyPassword.

Config Option    Value
httpsProxy       the host:port of the https proxy. Ideally it'd look like: https_proxy_stuff.com:433
proxyUsername    the basic auth username for the https proxy.
proxyPassword    the basic auth password for the https proxy.
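
A minimal sketch of how the three proxy options could be added to an existing config object. The option names httpsProxy, proxyUsername, and proxyPassword come from the table above; the helper withProxy and the environment variable names PROXY_USER and PROXY_PASS are hypothetical and exist only for this example.

```javascript
// Returns a copy of baseConfig with the proxy options added.
// withProxy, PROXY_USER and PROXY_PASS are hypothetical names for this sketch.
function withProxy(baseConfig, env) {
  return Object.assign({}, baseConfig, {
    httpsProxy: 'https_proxy_stuff.com:433', // host:port of the HTTPS proxy
    proxyUsername: env.PROXY_USER, // basic auth username
    proxyPassword: env.PROXY_PASS  // basic auth password
  });
}

// In config.js this could be used as:
// module.exports = withProxy({ /* your existing options */ }, process.env);
```

For a proxy without basic authentication, the proxyUsername and proxyPassword entries would simply be left out.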

Usage

Syncing

If you simply want to download all the data from Canvas Data, the sync command can be used to keep an up-to-date copy locally.

canvasDataCli sync -c path/to/config.js

This will start the sync process. The sync process uses the sync API endpoint to get a list of all the files. If the file does not exist, it will download it. Otherwise, it will skip the file. After downloading all files, it will delete any unexpected files in the directory to remove old data.

On subsequent executions, it will only download the files it doesn't have.
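
The download, skip, and delete decisions described above can be sketched as a small pure function. This illustrates the idea and is not the tool's actual implementation; planSync and the file names are hypothetical.

```javascript
// Decide which files to download and which to delete, given the file names
// reported by the API and the file names already present locally.
function planSync(remoteFiles, localFiles) {
  const remote = new Set(remoteFiles);
  const local = new Set(localFiles);
  return {
    // Listed by the API but missing locally: download these
    toDownload: remoteFiles.filter((name) => !local.has(name)),
    // Present locally but no longer listed: unexpected, delete these
    toDelete: localFiles.filter((name) => !remote.has(name))
  };
}

const plan = planSync(
  ['user_dim-0.gz', 'account_dim-0.gz'],
  ['user_dim-0.gz', 'old_table-0.gz']
);
console.log(plan.toDownload); // [ 'account_dim-0.gz' ]
console.log(plan.toDelete);   // [ 'old_table-0.gz' ]
```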

This process is also resumable. If you run into issues for whatever reason, it should restart and download only the files that previously failed. To make this safer, it downloads each file under a temporary name and renames it once the download has finished. This may leave gz.tmp files around, but they should be deleted automatically once you have a successful run.
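
The temporary-name technique can be sketched with Node's fs module as follows. This is a generic illustration of write-then-rename, not the tool's own code; saveViaTempFile and removeStaleTmpFiles are hypothetical helper names.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write data under a temporary name, then rename it into place, so a
// partially written file never appears under its final name.
function saveViaTempFile(finalPath, data) {
  const tmpPath = finalPath + '.tmp'; // e.g. user_dim.gz -> user_dim.gz.tmp
  fs.writeFileSync(tmpPath, data);
  fs.renameSync(tmpPath, finalPath);
  return finalPath;
}

// Remove leftover .gz.tmp files from an interrupted run and report them.
function removeStaleTmpFiles(dir) {
  const leftovers = fs.readdirSync(dir).filter((f) => f.endsWith('.gz.tmp'));
  leftovers.forEach((f) => fs.unlinkSync(path.join(dir, f)));
  return leftovers;
}

// Demo in a throwaway directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sync-demo-'));
saveViaTempFile(path.join(demoDir, 'user_dim.gz'), 'demo contents');
console.log(fs.readdirSync(demoDir)); // [ 'user_dim.gz' ]
```

If the process dies between the write and the rename, only the .tmp file is left behind, so a later run can tell that the download never completed.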

If you run this daily, you should keep all of your data from Canvas Data up to date.

Fetch

Fetches the most up-to-date data for a single table from the API. This ignores any previously downloaded files and redownloads all the files associated with that table.

canvasDataCli fetch -c path/to/config.js -t user_dim

This will start the fetch process and download what is needed to get the most recent data for that table (in this case, user_dim).

On subsequent executions, this will redownload all the data for that table, ignoring any previous days' data.

Unpack

NOTE: This only works after properly running a sync command.

This command will unpack the gzipped files, concatenate any partitioned files, and add a header to the output file:

canvasDataCli unpack -c path/to/config.js -f user_dim,account_dim

This command will unpack the user_dim and account_dim tables to a directory. Currently, you have to explicitly list the files you want to unpack, as this has the potential to create very large files.
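
Conceptually, the three steps (gunzip, concatenate, add a header) look like the sketch below. It works on in-memory buffers for brevity and is not the tool's implementation; the unpackTable helper, the column names, and the tab-separated layout are assumptions for this example.

```javascript
const zlib = require('zlib');

// Gunzip each partition, concatenate them in order, and prepend a header row.
// Tab-separated columns are an assumption made for this sketch.
function unpackTable(gzippedParts, columnNames) {
  const header = columnNames.join('\t') + '\n';
  const body = gzippedParts
    .map((part) => zlib.gunzipSync(part).toString('utf8'))
    .join('');
  return header + body;
}

// Two hypothetical partitions of the same table
const parts = [
  zlib.gzipSync('1\talice\n'),
  zlib.gzipSync('2\tbob\n')
];
console.log(unpackTable(parts, ['id', 'name']));
```

Because every partition is expanded into one output, the result can be far larger than the compressed downloads, which is why the command asks you to name the tables explicitly.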

API

This subcommand is designed to allow users to make API calls directly. Its main use case is debugging and development.

canvasDataCli api -c config.js -r /account/self/dump

Historical Requests

Periodically, requests data is regrouped into collections that span more than a single day. In this case, the date on which the files were generated differs from the time at which the included requests were made. To make it easier to identify which files contain the requests made during a particular time range, use the historical-requests subcommand.

canvasDataCli historical-requests -c config.js

Its output takes the form:

{
  "dumpId": "...",
  "ranges": {
    "20180315_20180330": [
      {
        "url": "...",
        "filename": "..."
      },
      {
        "url": "...",
        "filename": "..."
      }
    ],
    "20180331_20180414": [
      {
        "url": "...",
        "filename": "..."
      }
    ]
  }
}
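
Output in this shape can be searched for the ranges that cover a particular day, for instance with the sketch below. It assumes each range key is an inclusive START_END pair of YYYYMMDD dates, which matches the sample above but is not confirmed here; rangesContaining is a hypothetical helper.

```javascript
// Return the range keys (YYYYMMDD_YYYYMMDD) whose span includes the given day.
// Assumes both ends of each range are inclusive.
function rangesContaining(output, day) {
  return Object.keys(output.ranges).filter((key) => {
    const [start, end] = key.split('_');
    // Fixed-width YYYYMMDD strings compare correctly as plain strings.
    return start <= day && day <= end;
  });
}

const sample = {
  dumpId: '...',
  ranges: {
    '20180315_20180330': [{ url: '...', filename: '...' }],
    '20180331_20180414': [{ url: '...', filename: '...' }]
  }
};
console.log(rangesContaining(sample, '20180401')); // [ '20180331_20180414' ]
```

The files for a matching range are then available as the url and filename entries under that key.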

Developing

Process:

  1. Write some code
  2. Write tests
  3. Run make installLocal to test changes
  4. Run bin/canvasDataCli .... to test your changes locally
  5. Open a pull request

Running tests

In Docker

If you use Docker, you can run the tests inside a Docker container:

./build.sh

Native

npm install .
npm test

Contributors

dlecocq, kblibr, howderek, buckett
