griffithlab / docm
Rails frontend to The Genome Institute's database of curated mutations (DoCM)
Home Page: http://docm.genome.wustl.edu
License: MIT License
Your example includes a drug_interactions field in the response, but if you try this example, no drug interaction data is returned.
/api/v1/variants/ENST00000263967:c.1633G>C.json
If I go to http://docm.genome.wustl.edu/about, it redirects me to http://www.docm.infoabout/ (which is not a thing)
Missing slash in the redirect rules somewhere, I'm guessing
A user found the following discrepancy:
There is a set of discrepancies between the VCF and TSV files in DoCM.
In the TSV, there are four 4bp insertions in the NPM1 gene (chr5, 170837546).
In the VCF, they are listed as 3bp insertions.
Based on the cited paper (PMID 19657110), I believe that the TSV version is correct.
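A quick way to spot this kind of mismatch is to compare inserted lengths across the two files. This is a minimal sketch: it assumes the TSV writes insertions with `-` as the reference (as in the import list further down) and that the VCF uses the usual anchor-base convention, so the inserted length is `len(ALT) - len(REF)`. The allele strings are hypothetical placeholders, not the actual NPM1 sequences.

```python
def tsv_inserted_length(reference, variant):
    """Inserted length for a TSV row where an insertion has reference '-'."""
    if reference != "-":
        raise ValueError("not an insertion row")
    return len(variant)


def vcf_inserted_length(ref, alt):
    """Inserted length for a VCF record that carries an anchor base."""
    if not alt.startswith(ref):
        raise ValueError("ALT does not extend REF; not a simple insertion")
    return len(alt) - len(ref)


# Hypothetical alleles for illustration only.
tsv_len = tsv_inserted_length("-", "ACGT")   # 4 bp in the TSV
vcf_len = vcf_inserted_length("G", "GACG")   # 3 bp in the VCF
mismatch = tsv_len != vcf_len
```

Running this over every insertion in both files would list the rows where the two representations disagree.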
This column is not actually showing PubMed IDs, so we should name it more descriptively. "Citations" is my vote, but I'm open to suggestions.
We should license DoCM the same way that we license CIViC. Creative Commons Attribution 4.0 International License.
If a variant isn't found, the API returns 404. It would be useful to return something that indicates your request was good but the variant wasn't found.
The HGVSc. annotations in the TSV file are all given with respect to the plus strand, but they should be given with respect to the sense strand. For example, the variant in MTOR at chr1, 11174420 is listed as c.7255C>T, but it should be c.7255G>A (since the coding strand of MTOR is the minus strand).
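The fix amounts to complementing the alleles for genes on the minus strand. A minimal sketch follows, assuming the strand of each transcript is known from the annotation. The function names are illustrative and not part of DoCM, and only the alleles are handled here, not the coding position.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def coding_strand_alleles(ref, alt, strand):
    """Convert plus-strand alleles to the transcript's coding strand."""
    if strand == "+":
        return ref.upper(), alt.upper()
    if strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError("strand must be '+' or '-'")


# MTOR is on the minus strand, so a plus-strand C>T reads as G>A.
mtor_ref, mtor_alt = coding_strand_alleles("C", "T", "-")
```

For multi-base alleles the reverse complement also flips the order of the bases, which is why the helper reverses as well as complements.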
It would be great if there was a link to the CIViC variant summary in the "Variant Data" box of the variant page.
Additionally, a link to the accompanying evidence item could go into the external links column of the disease data box on the variant page.
It would be especially cool if this was an icon or button with the CIViC logo. Maybe "see in CIViC" or something.
How is a list of PubMed IDs currently sorted in DoCM? For example, PIK3CA p.E545K lists these PubMed IDs:
18725974, 22162589, 19513541, 19029981, 16906227, 21430269, 15647370, 22271473, 22162582, 15805248, 20453058, 18676830, 15254419, 15016963, 16930767, 20619739, 19903786, 19366826
Maybe these should be ordered from oldest to newest?
Also, this list of IDs is very important to have, but not very informative to view. Would it be better to list the publications in the format "Jones et al. 2015", so that at least some people would find them more familiar?
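One cheap way to get an approximately oldest-to-newest order: PubMed IDs are assigned roughly in the order records enter PubMed, so a numeric sort is a reasonable proxy. This sketch assumes that approximation is good enough. Sorting by true publication date would need the dates themselves, which are not in the list above.

```python
def sort_pmids_oldest_first(pmids):
    """Sort PubMed IDs numerically, dropping duplicates.

    Numeric order only approximates publication order.
    """
    return sorted({int(p) for p in pmids})


pik3ca_e545k = [
    18725974, 22162589, 19513541, 19029981, 16906227, 21430269,
    15647370, 22271473, 22162582, 15805248, 20453058, 18676830,
    15254419, 15016963, 16930767, 20619739, 19903786, 19366826,
]
ordered = sort_pmids_oldest_first(pik3ca_e545k)
```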
e.g. (filtered from 1,015 total entries)
Display the number of variants from the current version.
TravisCI fails when trying to install better_errors referencing a requirement of Ruby version >= 2.0.0. @acoffman could that be fixed by changing the following line to 2.0.0?
The following variants are not importing
chromosome start stop gene reference variant
4 55593594 55593594 KIT GAAGTACAGTGGAAG -
4 55593594 55593594 KIT GAAGTACAGTGGAAGGTT -
4 55593600 55593601 KIT - CTA
4 55593600 55593601 KIT - CTC
4 55593600 55593601 KIT - CTG
4 55593600 55593601 KIT - CTT
4 55593600 55593601 KIT - TTA
4 55593600 55593601 KIT - TTG
4 55593609 55593611 KIT GTT -
4 55593628 55593628 KIT G T
4 55593630 55593647 KIT AACAATTATGTTACAGAC -
4 55593630 55593656 KIT AACAATTATGTTACAGACCCAACA -
4 55593657 55593658 KIT - CCAGAA
4 55593657 55593658 KIT - CCAGAG
4 55593657 55593658 KIT - CCCGAA
4 55593657 55593658 KIT - CCCGAG
4 55593657 55593658 KIT - CCGGAA
4 55593657 55593658 KIT - CCGGAG
4 55593657 55593658 KIT - CCTGAA
4 55593657 55593658 KIT - CCTGAG
4 55593669 55593671 KIT GAT -
4 55594197 55594197 KIT C T
4 55599338 55599338 KIT A T
4 55602700 55602700 KIT A G
4 55602770 55602770 KIT C T
7 6426892 6426892 RAC1 C T
7 128829195 128829196 SMO CC TT
7 128845101 128845101 SMO C T
7 128846116 128846117 SMO CC TT
7 128849224 128849225 SMO CC TT
7 128850279 128850280 SMO CC TT
7 128850341 128850341 SMO G T
7 128850838 128850838 SMO G A
7 128851883 128851883 SMO C T
7 128852191 128852192 SMO CC TT
7 140453155 140453155 BRAF C A
9 133738307 133738307 ABL1 A T
19 17948006 17948006 JAK3 G A
19 17948009 17948009 JAK3 G A
19 18271909 18271909 PIK3R2 C T
Please and thank you!
Do any search (I pulled down 'acute myeloid leukemia'), then export a TSV of the list. The first line of the text file contains the following:
ENST00000373103:c.1853G>A
Instead of the expected
ENST00000373103:c.1853G>A
I find the lack of spacing between news items a bit jarring:
http://docm.genome.wustl.edu/news
Would be nice if the style looked a bit more like this:
http://dgidb.genome.wustl.edu/news
Right now CIViC and DGIdb have one, but DoCM does not.
Add VEP annotation and account for tags and rationale
The current about page http://docm.genome.wustl.edu/about is very light on content.
This should be a much richer description of the goals of DOCM. Perhaps include a visual as we do in CIViC: https://civic.genome.wustl.edu/#/collaborate
Inspiration for the content of the about page could come from the manuscript.
In the filter options at the right side of the page it would be awesome if we could ask for all variants associated with a particular publication (e.g. pubmed ID).
Currently, all transcripts in DoCM are labeled as build 74 of ensembl genes. Is this reflective of the VEP annotations we're pulling? My understanding is that the API only hits the most recent build... e74.rest.ensembl.org isn't a thing, unfortunately. I suggest that we check that the coordinates of our existing variants in DoCM are correct in 84, and then update our tags accordingly.
The current sources page (http://docm.genome.wustl.edu/sources) is worded in such a way that it appears that the only/main sources for DoCM are MyCancerGenome and the Knowledge Database. This is an underrepresentation of the sources that inspired content for DoCM. We should better explain how DoCM uses such sources, as well as key papers that published extensive lists, and more focused papers that describe specific groups of variants.
The news page has not been kept very up to date with developments. We should summarize when new features are added, when new variants are added, and perhaps even when important bugs are fixed. We can also report cases where DoCM will be presented publicly here.
@jmcmichael, When you have a minute could you change the TGI logo to MGI?
Thanks!
Variants like http://docm.genome.wustl.edu/variants/ENST00000261609:c.2277delAACGGTCCTGACCTG?version=2
break the page formatting.
@acoffman could this be incorporated in the next release?
It would be really nice to have a list of all the versions of the database, perhaps on the about page?
It could be a table with the version number, the number of mutations in that version, the number of publications, and the number of cancer subtypes (DOIDs).
We should move the 'Chromosomes' filter down so that it is directly above the start/stop 'Position' filter. Then only allow filtering on position if one or more chromosomes has been selected?
Some additional variant lists have been curated but have not yet been imported into the live version of DoCM. Prior to publication we should do these imports and update the News feed as we add them.
A variant can have many tags.
Tags must be 5 words or fewer.
Tags can describe the variant and allow for custom organization and filtering.
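The word limit could be enforced with a small check at submission time. This is only a sketch of the rule as stated above. The function name is hypothetical, and "word" is assumed to mean a whitespace-separated token.

```python
def is_valid_tag(tag, max_words=5):
    """Return True if the tag has between 1 and max_words words."""
    words = tag.split()
    return 0 < len(words) <= max_words


ok = is_valid_tag("kinase domain hotspot")
too_long = is_valid_tag("one two three four five six")
```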
We don't want our users to leave DoCM and never come back! It would be nice if a new tab opened when you click on a link such as the PubMed ID, DGIdb, or MyCancerGenome.
Users should submit variants in a TSV format
Add a field that allows the submitter to describe their rationale for including the batch in DoCM. This should be required and viewable on the interface following acceptance.
We should create a DoCM tutorial. This could live on a new page of the DoCM site (e.g. "help" or "getting started").
This tutorial could include things like:
This could exist under a "pending" version, just like "current" version.
There should be some indication if a variant disease or publication is pending review.
This would allow for more routine updates of CIViC data without re-importing a version from scratch.
Hello, the VCF download has "/" and " " in the INFO field, and this breaks at least PyVCF. Does this fit the VCF spec?
Thanks
Matt
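For context, my reading of the VCF 4.1 and 4.2 specifications is that INFO values may not contain whitespace, semicolons, or equals signs, while "/" is not on the forbidden list. VCF 4.3 appears to relax the whitespace rule. Under that assumption, a small check like the following could flag the offending entries before release. The example record is made up, not taken from the DoCM download.

```python
import re


def info_entries_with_whitespace(vcf_line):
    """Return INFO entries (KEY=VALUE) that contain whitespace.

    Header lines and lines with fewer than 8 columns yield an empty list.
    """
    if vcf_line.startswith("#"):
        return []
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return []
    info = fields[7]
    return [entry for entry in info.split(";") if re.search(r"\s", entry)]


# Hypothetical record for illustration.
record = "1\t100\t.\tA\tT\t.\t.\tDISEASE=acute myeloid leukemia;SRC=a/b\n"
bad_entries = info_entries_with_whitespace(record)
```

Replacing spaces with underscores in the flagged values would be one way to keep older parsers happy.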