CRC Data Visualization Website Template

Overview

This template is intended for researchers and students at Rice University. It builds a visualization website, powered by Plotly Dash, for CRC time-series data. The template currently supports only line and scatter plots. It generates these plots automatically from the given data, with group-by and filter support, regardless of the data's schema, as long as the data meet the following requirements.

Dataset Requirements

The template can embed the data and build a website on it as long as the dataset:

is structured.
is in .csv format.
is already preprocessed. The template does not include a data-cleaning module.
is a time series with at least one column indicating the time.
has the time column in "yyyy-mm-dd" format.

Note: the expected schema is one column indicating the time plus one or more numerical columns holding the values. In other words, the data should be well represented by a line or scatter plot with time on the horizontal axis.
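
The requirements above can be checked before deploying. The following is a minimal sketch using only the Python standard library; it is not part of the template, and the function name check_dataset and the column name "Date" are hypothetical. It verifies that the time column exists and that every value is a zero-padded yyyy-mm-dd date.

```python
import csv
import io
from datetime import datetime


def check_dataset(csv_text, date_col):
    """Return a list of problems found; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or date_col not in reader.fieldnames:
        return [f"missing time column: {date_col!r}"]
    problems = []
    for row in reader:
        value = row[date_col] or ""  # short rows yield None for missing cells
        try:
            # strptime alone also accepts "2021-1-5", so round-trip the
            # parsed date to enforce the zero-padded yyyy-mm-dd form.
            parsed = datetime.strptime(value, "%Y-%m-%d")
            if parsed.strftime("%Y-%m-%d") != value:
                raise ValueError("not zero-padded")
        except ValueError:
            problems.append(f"line {reader.line_num}: bad date {value!r}")
    return problems
```

To check a file on disk, read it first, for example check_dataset(open(path, newline="").read(), "Date"), and fix any reported lines before mounting the file into the container.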

Get Started

Set up an Orion VM

First, users of this template need an Orion account. Once you have one, log in to your Orion account.

Add public ssh key.

click your name in the top right corner
click "settings" and click the "Update SSH Key" panel
put in the contents of your public ssh key in the box and click "Update SSH Key"

Create a virtual machine.

go to VMs in the left-hand navigation
click on the green button with a plus sign icon
select "rocky 8.9 cloud vm - small" in the list
click the "create" button
give the instance a name ("netid-project" or something)
leave the rest of the default settings
find your vm instance in the dashboard and click on it
remember the instance's ip address

When you first instantiate a VM, some post-deploy scripts run to perform extra configuration (including creating a user account named after your NetID with sudo access and inserting your public key). You might want to wait five minutes or so before trying to connect for the first time.

You should be able to access the VM through VNC in the Orion web UI by clicking the button with the monitor icon and selecting "vnc", which takes you to a VNC session with the instance. You can also SSH to the instance using ssh <user>@<ip>, where <user> is your NetID and <ip> is the instance's IP address. Depending on your SSH configuration, you may have to use the -i option to specify the location of your private key. For example: ssh -i ~/.ssh/<key> <user>@<ip>

Connect to the RDF share in the VM

The VM template in Orion is already configured to mount the RDF share. To mount it, enter mount /rdf on the command line, which mounts the top level of the share. You can then navigate down to your folder at /rdf/crc/<netid>.

Deleting the VM

log back in to https://orion.crc.rice.edu
find your vm instance and click on it
click the red button with the trash can icon and click "terminate"

Configure Docker in the VM

First, SSH into your VM. Then run the following inside the VM:

### Install Docker engine

sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf config-manager --setopt="docker-ce-stable.baseurl=https://download.docker.com/linux/centos/8/x86_64/stable" --save
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

sudo systemctl enable docker --now
sudo systemctl enable containerd --now

### Add the user to the Docker group

sudo usermod -aG docker $USER

Log out and log back in (or run newgrp docker) so that the new group membership takes effect.

Clone this Template Repository to your VM

On this repository's GitHub page, click the "Fork" button in the top-right corner. On the page that opens, click the green "Create fork" button. Once you have confirmed that a copy of the repository exists under your GitHub account:

SSH to your VM
run the following to clone the repo into your home directory:

git clone https://github.com/<github-username>/crc-datasite-template $HOME/crc-datasite-template

Get the Data Source for Visualizing with the Template

Download the Data Source to the RDF share folder Using curl

curl -o /rdf/crc/<netid>/<dataset-name>.csv 'https://data.cdc.gov/api/views/xkkf-xrst/rows.csv?accessType=DOWNLOAD&bom=true&format=true%20target='

Mount the data source into the container by modifying the Docker Compose file

In ./docker-compose.yml, make sure to include the location of the data source you just downloaded under services.rdf-usage-stats.volumes:

services:
  rdf-usage-stats:
    volumes:
      - "/rdf/crc/<netid>/<dataset-name>.csv:/app/<dataset-name>.csv"

.env configuration file

Once you have the dataset, in most cases the only thing you need to modify is the .env file, provided the template's logic works for your data. The file has five fields that relate to the given data and strongly affect how it is presented:

DATE_COL -- specifies the time column of the data
RATIO_COLS -- specifies the columns whose values divide the data into groups and can be used for "group by" in the visualization
DROP_DOWN_COL -- specifies the column whose values are categories of the data and can be used as data filters
SITE_NAME -- the name of the website
AUTHOR -- the author

Note: RATIO_COLS and DROP_DOWN_COL can be left without a value if you do not need them; this does not affect the data visualization. An example of what to put in this file is given below.
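
To make the five fields concrete, here is a minimal sketch of a possible .env file and of how such a file could be read with plain Python. Every value in it is a hypothetical placeholder, the parse_env helper is not the template's own loader, and the comma separator for multiple RATIO_COLS entries is an assumption; check the repository for the actual conventions.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


# All values below are hypothetical placeholders, not taken from the template.
EXAMPLE_ENV = """\
DATE_COL=Date
RATIO_COLS=State
DROP_DOWN_COL=State
SITE_NAME=My Data Site
AUTHOR=Jane Doe
"""

settings = parse_env(EXAMPLE_ENV)

# Assumption: several group-by columns would be separated by commas.
ratio_cols = [c.strip() for c in settings.get("RATIO_COLS", "").split(",") if c.strip()]
```

Leaving RATIO_COLS empty yields an empty list here, which matches the note that the optional fields may be left without a value.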

Build the Docker image and run the container

cd ~/crc-datasite-template
docker compose up --build -d

The -d option runs your containers in detached mode, in the background, without blocking the current shell. If you do not need that, remove the -d option.

If no errors are reported, you can now access the app at http://<ip>:8000, where <ip> is your VM's IP address.
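
If the page does not load, it can help to check whether anything is listening on port 8000 at all. The sketch below is one way to do that from Python; the helper name is_port_open is hypothetical, and a True result only shows that the port accepts TCP connections, not that the app itself is healthy.

```python
import socket


def is_port_open(host, port=8000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it with your VM's IP address as host, from a machine that can reach the VM's network.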

Stop the container when done

cd ~/crc-datasite-template
docker compose down

Example of the application running

In this example, we use the COVID mortality dataset. The application showing the visualization of the dataset is running at http://10.134.196.74:8000. On this web page, the "Target" dropdown selects a value, and the data shown are the entries whose value in a specific column (the one you specified as DROP_DOWN_COL in the .env file) equals the chosen value.

In this example, if I choose "Ohio", the data shown are only the entries whose "State" column equals "Ohio".

The "Date Range" picker selects the date range that the shown data must fall within. The date values come from the field you specified as DATE_COL in the .env file. Here I picked this value.
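
The combined effect of the dropdown and the date range picker can be sketched as a simple row filter. This is an illustration of the behavior described above, not the template's actual code: the function name filter_rows and the column names "Date" and "Deaths" are hypothetical, and treating both ends of the date range as inclusive is an assumption.

```python
from datetime import date


def filter_rows(rows, drop_down_col, chosen, date_col, start, end):
    """Keep rows whose drop-down column equals `chosen` and whose date
    falls within [start, end] (both ends assumed inclusive)."""
    selected = []
    for row in rows:
        day = date.fromisoformat(row[date_col])  # expects yyyy-mm-dd
        if row[drop_down_col] == chosen and start <= day <= end:
            selected.append(row)
    return selected


rows = [
    {"Date": "2021-01-05", "State": "Ohio", "Deaths": "10"},
    {"Date": "2021-02-10", "State": "Ohio", "Deaths": "12"},
    {"Date": "2021-01-05", "State": "Texas", "Deaths": "7"},
]
january_ohio = filter_rows(
    rows, "State", "Ohio", "Date", date(2021, 1, 1), date(2021, 1, 31)
)
```

Here january_ohio contains only the first row: the second is outside the date range and the third belongs to a different state.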

The "Data Options" picker selects which fields of the data to show in the plot. The result of the above options would be:

If I choose all five fields in "Data Options":

The "Group by" selector chooses a field, and the data are grouped by the values in that field. Here is an example with "Group by" set to "State":

If we choose "Disable", the group by function will be disabled:
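
Conceptually, "Group by" splits the rows into one plotted series per distinct value of the chosen field, while "Disable" leaves a single series. The following sketch illustrates that idea under stated assumptions; it is not the template's implementation, and the function name group_series, the "all" label, and the column names "Date" and "Deaths" are hypothetical.

```python
from collections import defaultdict


def group_series(rows, date_col, value_col, group_col=None):
    """Return {group label: [(date, value), ...]} sorted by date.

    With group_col=None (the "Disable" case) all rows go into one
    series labelled "all".
    """
    series = defaultdict(list)
    for row in rows:
        key = row[group_col] if group_col else "all"
        series[key].append((row[date_col], float(row[value_col])))
    for points in series.values():
        points.sort()  # yyyy-mm-dd strings sort chronologically
    return dict(series)
```

Each entry in the returned dictionary would correspond to one line or one set of scatter points in the plot.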

Feel free to access the website and play with it!

Recap: Guidance on How to Make this App a Custom One

make sure you have everything set up, including the VM, the repository, and Docker.
put the cleaned dataset under /rdf/crc/<netid>/<dataset-name>.csv
make sure you understand the schema of the dataset and create a .env file according to your needs
put the .env file under /src
in the repository, run docker compose up --build -d
access the website at http://<ip>:8000

derekjkeller / crc-datasite-template