CRC Data Visualization Website Template

Overview

This template was created for researchers and students at Rice University. It is used to build a visualization website, powered by Plotly Dash, for CRC time-series data. The template currently supports only line and scatter plots. It automatically generates these plots from the given data, with "group by" and filter support, regardless of the data's schema, as long as the data meets the following requirements.

Dataset Requirements

The template can load the data and build a website on it as long as the dataset:

  • is structured (tabular).
  • is in .csv format.
  • is already preprocessed. This template does not include a data-cleaning module.
  • is time-series data with at least one column that indicates the time.
  • has the time column in "yyyy-mm-dd" format.

Note: the expected schema is a time series with one column indicating the time and multiple other numerical columns holding the values. In other words, the data should be well represented by a line plot or scatter plot with time on the horizontal axis.
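The requirements above can be checked before deploying. The following is a minimal sketch, not part of the template: the function name check_time_series_csv and the column names "date" and "deaths" are hypothetical, and it only verifies that the time column exists and that every value is a valid "yyyy-mm-dd" date.

```python
import csv
import io
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_time_series_csv(csv_text, date_col):
    """Return a list of problems found; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or date_col not in reader.fieldnames:
        return [f"missing time column: {date_col!r}"]
    problems = []
    for row in reader:
        value = (row.get(date_col) or "").strip()
        try:
            if not DATE_PATTERN.fullmatch(value):
                raise ValueError("wrong shape")
            # Rejects impossible calendar dates such as 2021-02-30.
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append(
                f"line {reader.line_num}: {value!r} is not a valid yyyy-mm-dd date"
            )
    return problems


# Hypothetical sample data, used only to illustrate the check.
sample = "date,deaths\n2021-01-05,3\n01/06/2021,4\n"
print(check_time_series_csv(sample, "date"))
```

To check a real file, read it as text first (for example with open(path, newline="").read()) and pass the name of your own time column.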

Get Started

Set up an Orion VM

First, you need an Orion account. Once you have one, log in to your Orion account.

Add a public SSH key.

  • click your name in the top right corner
  • click "settings" and click the "Update SSH Key" panel
  • put in the contents of your public ssh key in the box and click "Update SSH Key"

Create a virtual machine.

  • go to VMs in the left-hand navigation
  • click on the green button with a plus sign icon
  • select "rocky 8.9 cloud vm - small" in the list
  • click the "create" button
  • give the instance a name (for example, "netid-project")
  • leave the rest of the default settings
  • find your vm instance in the dashboard and click on it
  • remember the instance's ip address

When you first instantiate a VM, some post-deploy scripts run to perform extra configuration (including creating a user account for your netid with sudo access and inserting your public key). You might want to wait five minutes or so before trying to connect for the first time.

You should be able to access the VM through VNC in the Orion web UI by clicking the button with the monitor icon and selecting "vnc". This takes you to a VNC session with the instance. You can also SSH to the instance using ssh <user>@<ip>. Depending on your SSH configuration, you may have to use the -i option to specify the location of your private key. For example: ssh -i ~/.ssh/<key> <user>@<ip>

Connect to the RDF share in the VM

The VM template in Orion is already configured to mount the RDF share. To mount it, enter mount /rdf on the command line, which mounts the top level of the share. You can then navigate down to your folder at /rdf/crc/<netid>.

Deleting the VM

  • log back in to https://orion.crc.rice.edu
  • find your vm instance and click on it
  • click the red button with the trash can icon and click "terminate"

Configure Docker in the VM

First, SSH to your VM. Then, inside the VM, run:

### Install Docker engine

sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf config-manager --setopt="docker-ce-stable.baseurl=https://download.docker.com/linux/centos/8/x86_64/stable" --save
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

sudo systemctl enable docker --now
sudo systemctl enable containerd --now

### Add the user to the Docker group

sudo usermod -aG docker $USER
# Log out and back in (or run "newgrp docker") so the new group membership takes effect

Clone this Template Repository to your VM

On this repository's page, click the "Fork" button at the top right. On the page that opens, click the green "Create fork" button. Once you have confirmed that a copy of the repo exists under your GitHub account:

  • ssh to your VM
  • run the following to clone the repo to your home directory:
git clone https://github.com/<github-username>/crc-datasite-template $HOME/crc-datasite-template

Get the Data Source to Visualize with the Template

Download the Data Source to the RDF Share Folder Using curl

curl -o /rdf/crc/<netid>/<dataset-name>.csv 'https://data.cdc.gov/api/views/xkkf-xrst/rows.csv?accessType=DOWNLOAD&bom=true&format=true%20target='

Mount the data source into the container by modifying the Docker Compose file

In ./docker-compose.yml, make sure to include the location of the data source you just downloaded under services.rdf-usage-stats.volumes:

services:
  rdf-usage-stats:
    volumes:
      - "/rdf/crc/<netid>/<dataset-name>.csv:/app/<dataset-name>.csv"

.env configuration file

Once you have the dataset, in most cases the only thing you need to modify is the .env file, provided the template's logic works for your data. Five fields in this file relate to a given dataset and strongly affect how the data is presented:

  • DATE_COL -- specifies the time column of the data
  • RATIO_COLS -- specifies the columns whose values split the data into groups; these columns can be used for "group by" in the visualization
  • DROP_DOWN_COL -- specifies the column whose values categorize the data; this column is used as the data filter
  • SITE_NAME -- the name of the website
  • AUTHOR -- the author

Note: RATIO_COLS and DROP_DOWN_COL can be left without a value if you do not need them; this does not affect the data visualization. An example of what to put in this file is given below.
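As a rough illustration of the five fields, here is a sketch of what a .env file for the covid mortality example might contain, with a small parser to show how the values could be read. This is an assumption-laden sketch, not the template's actual loader: the KEY=VALUE layout, the comma-separated syntax for RATIO_COLS, the column name "Date", the site name, and the helpers parse_env and split_columns are all hypothetical. Only "State" as the filter and group-by column comes from the example described in this README.

```python
# Hypothetical .env contents; the values and the list syntax are assumptions.
EXAMPLE_ENV = """\
DATE_COL=Date
RATIO_COLS=State
DROP_DOWN_COL=State
SITE_NAME=Covid Mortality
AUTHOR=<netid>
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def split_columns(value):
    """Split a comma-separated column list; an empty value gives an empty list."""
    return [name.strip() for name in value.split(",") if name.strip()]


settings = parse_env(EXAMPLE_ENV)
group_by_cols = split_columns(settings.get("RATIO_COLS", ""))
print(settings["DATE_COL"], group_by_cols, settings["DROP_DOWN_COL"])
```

Leaving RATIO_COLS empty yields an empty list in this sketch, which corresponds to the case where no "group by" columns are needed.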

Build the Docker image and run the container

cd ~/crc-datasite-template
docker compose up --build -d

The -d option runs your containers in detached mode, in the background, without blocking the current shell. If you do not need this, remove the -d option.

If no errors are reported, you can now access the app at <ip>:8000, where <ip> is the IP address of your VM instance.

Stop the container when done

cd ~/crc-datasite-template
docker compose down

Example of the application running

In this example, we use the covid mortality dataset. The application showing the visualization of the dataset is running at http://10.134.196.74:8000. On this web page, the "Target" dropdown selects a value, and the data shown is limited to the entries whose value in a specific column (the one you specified as DROP_DOWN_COL in the .env file) equals the chosen value.

In this example, if I choose "Ohio", only the entries whose "State" column equals "Ohio" are shown.

image

The "Date Range" picker selects the date range that the displayed data must fall within. The date values come from the column you specified as DATE_COL in the .env file. Here I picked the following range:

image

The "Data Options" picker selects the fields of the data to show in the plot. The result of the selections above is: image

If I choose all five fields in "Data Options": image

The "Group by" selector chooses a field, and the data is grouped by the values in that field. Here is an example with "Group by" set to "State": image

If we choose "Disable", the group-by function is disabled: image
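The behavior of these controls can be sketched in plain Python. This is only an illustration of the filtering and grouping described above, not the template's actual implementation: the function names select_rows and group_rows, the "Date" and "Deaths" column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def select_rows(rows, date_col, start, end, drop_down_col=None, target=None):
    """Keep rows dated within [start, end] and, if a target is given, matching it."""
    kept = []
    for row in rows:
        day = date.fromisoformat(row[date_col])
        if not (start <= day <= end):
            continue
        if drop_down_col is not None and target is not None:
            if row[drop_down_col] != target:
                continue
        kept.append(row)
    return kept


def group_rows(rows, group_col=None):
    """Return one series per value of group_col; None mimics "Disable"."""
    if group_col is None:
        return {"all": list(rows)}
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    return dict(groups)


# Made-up rows, used only to exercise the functions.
rows = [
    {"Date": "2021-01-01", "State": "Ohio", "Deaths": 5},
    {"Date": "2021-01-02", "State": "Ohio", "Deaths": 7},
    {"Date": "2021-01-01", "State": "Texas", "Deaths": 9},
    {"Date": "2021-03-01", "State": "Ohio", "Deaths": 2},
]
january = (date(2021, 1, 1), date(2021, 1, 31))

ohio = select_rows(rows, "Date", *january, drop_down_col="State", target="Ohio")
in_range = select_rows(rows, "Date", *january)
by_state = group_rows(in_range, "State")
print(len(ohio), sorted(by_state))
```

Each key of the grouped result would correspond to one line or one set of scatter points in the plot.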

Feel free to access the website and play with it!

Recap: How to Customize This App

  • make sure you have everything set up, including the VM, the repository, and Docker.
  • put the cleaned dataset at /rdf/crc/<netid>/<dataset-name>.csv
  • make sure you understand the schema of the dataset and create a .env file according to your needs
  • put the .env file under /src
  • in the repository, run docker compose up --build -d
  • access the website at <ip>:8000, using your VM's IP address

Contributors

cs185