dgidb-v5's Introduction

DGIdb v5

A from-scratch rewrite of the Drug-Gene Interaction Database.

For developers

Initial dependencies

First, make sure you have all of the following installed:

Clone and enter the repository:

git clone https://github.com/dgidb/dgidb-v5
cd dgidb-v5

Server setup

First, you may need to switch your Ruby version with RVM to match the version declared in the first few lines of the Gemfile. For example, to switch to version 3.0.4:

rvm install 3.0.4
rvm 3.0.4

From the repo root, enter the server subdirectory:

cd server

If RVM is properly installed, you should expect to encounter a warning message here:

RVM used your Gemfile for selecting Ruby, it is all fine - Heroku does that too,
you can ignore these warnings with 'rvm rvmrc warning ignore ./Gemfile'.
To ignore the warning for all files run 'rvm rvmrc warning ignore allGemfiles'.

Next, install Rails and other required gems with bundle:

bundle install

The server will need a running Postgres instance. Postgres start commands may vary based on your OS and processor type. The following should work on M1 Macs:

pg_ctl -D /opt/homebrew/var/postgres start
# on older Macs you may need to use a different path instead, e.g. "pg_ctl -D /usr/local/var/postgres start"

Database initialization utilities are still in progress, so for now the easiest way to get a working database is to create it manually with the psql command. First, enter the psql console:

psql -d postgres  # if you are opening psql for the first time, you'll need to connect to the database 'postgres'
# should produce a prompt like the following:
# psql (14.2)
# Type "help" for help.
#
# jss009=#

Within the psql console, create the DGIdb database, then quit:

CREATE DATABASE dgidb;
\q

Next, back in the main shell, import a database dump file (ask on Slack if you need the latest file):

psql -d dgidb -f dgidb_dump_20220526.psql  # provide path to data dump

That should take a few minutes. Finally, start the Rails server:

rails s

Navigate to localhost:3000/api/graphiql in your browser. If the example query provided runs successfully, then you're all set.

Client setup

Navigate to the /client directory:

# from dgidb-v5 root
cd client

Install dependencies with yarn:

yarn install

Start the client:

yarn start

dgidb-v5's People

Contributors

acoffman, cjosu, jsstevenson, katiestahl, kcotto, korikuzma, mcannon068nw, nairod2000, rbasu101

Forkers

layth17

dgidb-v5's Issues

Add interaction type and gene claim category if not already in DB

Currently, the base Importer class raises an error if it encounters an interaction type or gene claim category that isn't already in the corresponding tables (see, e.g., the method create_interaction_claim_type(interaction_claim, type)).

We should (in separate issues) ensure that the normalization of the values going into those fields has satisfactory results -- but I don't think a normalized value should have to be manually added to any tables, so the constraints above should be removed, and if the value isn't already in the table, the importer should add it.
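
The proposed behavior can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: LookupTable and find_or_create are hypothetical names, the in-memory array stands in for the real table, and the strip/downcase step is a placeholder for whatever normalization is eventually agreed on. In the actual Rails models, ActiveRecord's find_or_create_by would likely play this role.

```ruby
# Hypothetical in-memory stand-in for a lookup table such as the
# interaction claim type table; not the project's actual model.
class LookupTable
  attr_reader :rows

  def initialize(initial = [])
    @rows = initial.dup
  end

  # Return the stored value if it already exists; otherwise add it first.
  # This replaces "raise if missing" with "insert if missing".
  def find_or_create(value)
    normalized = value.strip.downcase # placeholder normalization (assumption)
    @rows << normalized unless @rows.include?(normalized)
    normalized
  end
end

types = LookupTable.new(%w[inhibitor agonist])
types.find_or_create('Immunotherapy') # new value: added to the table
types.find_or_create('inhibitor')     # existing value: table unchanged
puts types.rows.inspect
```

With this shape, an importer never needs a manual table edit before a new normalized value can be loaded.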

Design a Front-end UI mock-up

We've talked about designing a new front-end UI. It would be useful for us to come up with a mock-up for what that should look like and what components it should have.

Add importers

Big picture

Specific sources

  • BaderLab
  • CancerCommons
  • Caris
  • CGI
  • Clearity Foundation Biomarkers
  • Clearity Foundation Clinical Trial: Non-existent interaction claim type issue (immunotherapy) #49
  • CiVIC
  • COSMIC
  • dGene
  • DrugBank: Figure out data source, #52
  • Docm
  • DTC
  • Ensembl: #55
  • Entrez: #56
  • FDA
  • Foundation One Genes
  • GO
  • #57
  • Hingorani Casas
  • Hopkins Groom: Add gene claim category "DNA DIRECTED DNA POLYMERASE" #49
  • Human Protein Atlas
  • IDG: Could probably refactor as API importer. See #48
  • #54
  • MSK impact
  • My Cancer Genome: Non-existent interaction claim type issue (immunotherapy) #49
  • My Cancer Genome clinical trial
  • NCI: #51
  • OncoKB: #47
  • Oncomine
  • PharmGKB
  • Pharos
  • Russ-Lampel
  • TALC: #46
  • Tdg
  • Tempus
  • Tend
  • TTD

Spin up staging box on AWS

Will remain on WUSTL AWS resources

  • Add environment-specific hostnames to request URLs in client (#93)
  • Add GitHub -> S3 deployment pipeline (of some kind) for client
  • Write CloudFormation templates for Beanstalk and RDS
  • Add some kind of deployment pipeline for server -> Beanstalk
  • Add CloudFront to templates

Drug Approval

Evaluate DGIdb current filtering strategy/language against planned approval enum expansion for all sources:

CHEMBL_1
CHEMBL_2
CHEMBL_3
CHEMBL_4
CHEMBL_WITHDRAWN
FDA_DISCONTINUED
FDA_PRESCRIPTION
FDA_OTC
FDA_TENTATIVE
GTOPDB_APPROVED
GTOPDB_WITHDRAWN
HEMONC_APPROVED
RXNORM_PRESCRIBABLE
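
As a starting point for that evaluation, here is a minimal Ruby sketch of how the enum values above could be grouped by source and used as a filter. The helper names (ratings_by_source, filter_drugs), the :approval_ratings key, and the sample drug records are all hypothetical; nothing here reflects the existing DGIdb filtering code.

```ruby
# Enum values copied from the list above.
APPROVAL_RATINGS = %w[
  CHEMBL_1 CHEMBL_2 CHEMBL_3 CHEMBL_4 CHEMBL_WITHDRAWN
  FDA_DISCONTINUED FDA_PRESCRIPTION FDA_OTC FDA_TENTATIVE
  GTOPDB_APPROVED GTOPDB_WITHDRAWN
  HEMONC_APPROVED
  RXNORM_PRESCRIBABLE
].freeze

# Group ratings by the source prefix that precedes the first underscore.
def ratings_by_source(ratings)
  ratings.group_by { |rating| rating.split('_', 2).first }
end

# Hypothetical filter: keep drugs that carry at least one accepted rating.
def filter_drugs(drugs, accepted)
  drugs.select { |drug| (drug[:approval_ratings] & accepted).any? }
end

# Made-up records, for illustration only.
drugs = [
  { name: 'drug_a', approval_ratings: %w[FDA_PRESCRIPTION CHEMBL_4] },
  { name: 'drug_b', approval_ratings: %w[CHEMBL_1] }
]

puts ratings_by_source(APPROVAL_RATINGS).keys.inspect
puts filter_drugs(drugs, %w[FDA_PRESCRIPTION FDA_OTC]).map { |d| d[:name] }.inspect
```

Which ratings should count as "approved" for a given filter is exactly the open question in this issue, so the accepted list is passed in rather than hard-coded.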

Add updaters

Additionally, it would be nice to use this issue to implement better per-source deleters, so that you don't have to delete every grouping in order to delete and re-add a single source (unless this work is already done and I didn't copy them over correctly), as well as more optimized interaction grouping.

Process remaining new interaction claim types and gene categories

InteractionClaimType -- normalization defined in interaction claim type model
Clearity Biomarkers: "Biomarker"
Clearity Clinical Trials: "immunostimulator", "natigen", "radioimmunotherapy"
My Cancer Genome: "immunotherapy"

GeneCategory
Hopkins/Groom: "DNA DIRECTED DNA POLYMERASE"

Ensembl importer

Figure it out. Potentially replaced by the VICC gene normalizer, so lower priority.

Check and update source citation data

For TALC, the citation differs in three places, but the 'most correct' citation appears to be the one from the website:

Morgensztern D, Campo MJ, Dahlberg SE, Doebele RC, Garon E, Gerber DE, Goldberg SB, Hammerman PS, Heist RS, Hensing T, et al. Molecularly targeted therapies in non-small-cell lung cancer annual update 2014. J Thorac Oncol 2015; 10: S1-63. PMID: 25535693

Some other sources also appear to have dead or incorrect links, or old/weird source citation data.

Drug attributes

  • Examine overlap/conflicts with therapy normalizer
  • Double-check current DB structure
  • Write any needed migrations

Prototype drug & gene record pages

From the results page, clicking an individual drug or gene should route to the corresponding drug or gene record. Design these layouts following the design goals laid out in the user story exercise.

Identify way to filter drug claims by type/stage of development

From conversations with scientists in clinical research pipelines, it would be extremely useful for them to have a way to filter or sort drug claims by type/stage of development (e.g. FDA approved drugs, drugs in clinical trials, research compounds, natural products).

Bring over remaining data models for drugs and genes

Similar to what we did for GeneClaims, bring over the other data models from the old version of DGIdb.

For now, as a learning exercise and for general progress, we can just use the old data. We can refactor these as needed if changes to the data structure occur.

Clean up gene and drug aliases

In particular, try to identify cases where non-namespaced ID numbers are getting grouped into genes and drugs and fix the importer code accordingly
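
One way to surface such cases is sketched below in plain Ruby. It assumes that a "non-namespaced ID number" means an alias consisting only of digits; the helper name bare_numeric_aliases, the "prefix:id" namespacing style, and the sample aliases are illustrative assumptions rather than values taken from the importers.

```ruby
# An alias made up only of digits carries no namespace prefix, so the same
# number from two different sources could be grouped together by mistake.
BARE_ID = /\A\d+\z/.freeze

# Return the aliases that are bare numeric IDs (ignoring surrounding spaces).
def bare_numeric_aliases(aliases)
  aliases.select { |name| name.strip.match?(BARE_ID) }
end

# Illustrative sample; not real importer output.
sample = ['BRAF', '673', 'ncbigene:673', 'HGNC:1097', ' 1097 ']
puts bare_numeric_aliases(sample).inspect
```

Running a check like this over each source's aliases would show which importers emit bare numbers and therefore need a namespace added.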

Prototype results page

Design the layout for the results page, following the design goals laid out in the user story exercise.

Render sample data on a front-end page

Can be anything (such as a single GeneClaim or Source citation) and doesn't have to be pretty for now.

This is essentially to learn how to link everything together and to show that we can render something on the top-layer front end that's stored in the bottom-layer database.
