
oncokb-annotator's Introduction

UPDATE: We now include Diagnostic Implications and Prognostic Implications during the annotation process

UPDATE: API token required, please see OncoKB API section for more information

oncokb-annotator


Install dependencies

For python 3

pip install -r requirements/common.txt -r requirements/pip3.txt

For python 2.7

pip install -r requirements/common.txt -r requirements/pip2.7.txt

Usage

Example input files are under data. An example script is here: example.sh

MAF

Annotates variants in MAF (https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/) with OncoKB annotation. Supports both Python 2 and Python 3.
Get more details on the command line using python MafAnnotator.py -h.

We recommend processing VCF files by vcf2maf with OncoKB isoforms before using the MafAnnotator here.

Atypical Alteration

You can still use MAF format to annotate atypical alterations, such as MSI-H, TMB-H, EGFR vIII. Please see more examples HERE.

Copy Number Alteration

We use GISTIC 2.0 format. For more information, please see https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#discrete-copy-number-data.

Get more details on the command line using python CnaAnnotator.py -h.

Fusion

OncoKB offers to annotate structural variants, but the annotator only annotates functional fusions. The fusion format for an intragenic deletion is GENE-intragenic or GENE-GENE. For other fusions, please use GENEA-GENEB or GENEA-GENEB Fusion.

Get more details on the command line using python FusionAnnotator.py -h.
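The fusion naming conventions above can be sketched as a small helper. This is only an illustration: the function name fusion_name is hypothetical and is not part of the annotator.

```python
def fusion_name(gene_a, gene_b=None):
    """Format a fusion alteration name following the conventions described above.

    Hypothetical helper for illustration; not part of oncokb-annotator.
    """
    # An intragenic deletion involves a single gene: GENE-intragenic.
    if gene_b is None or gene_b == gene_a:
        return "{}-intragenic".format(gene_a)
    # Any other fusion: GENEA-GENEB Fusion.
    return "{}-{} Fusion".format(gene_a, gene_b)


print(fusion_name("EGFR"))
print(fusion_name("BCR", "ABL1"))
```

The helper always emits the GENE-intragenic and GENEA-GENEB Fusion spellings; the GENE-GENE and plain GENEA-GENEB forms mentioned above are equally acceptable.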

Clinical Data (Combine MAF+CNA+Fusion)

You can combine all annotations at the sample/patient level using the clinical data annotator.

Get more details on the command line using python ClinicalDataAnnotator.py -h.

Annotate with HGVSp_Short, HGVSp, HGVSg or Genomic Change

OncoKB MafAnnotator supports annotating the alteration in HGVSp, HGVSp_Short, HGVSg or Genomic Change format. Specify the query type with the -q parameter. The acceptable values are HGVSp_Short, HGVSp, HGVSg and Genomic_Change (case-insensitive). Please see data/example.sh for examples.
If you do not specify a query type, the MafAnnotator will try to infer it from the column headers.

For HGVSp_Short, the annotator takes alteration from the column HGVSp_Short or Alteration
For HGVSp, the annotator takes alteration from the column HGVSp or Alteration
For HGVSg, the annotator takes alteration from the column HGVSg or Alteration
For Genomic_Change, the annotator takes genomic change from columns Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele1 and Tumor_Seq_Allele2.
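As a rough sketch of how a query type could be inferred from the headers, the function below checks for the columns listed above. The function name, the precedence order and the case-insensitive header matching are all assumptions made for illustration; the MafAnnotator's actual logic may differ.

```python
# Columns required for a Genomic_Change query, as listed in this README.
GENOMIC_CHANGE_COLUMNS = (
    "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
)


def guess_query_type(headers):
    """Return a query type inferred from the column headers, or None.

    Hypothetical helper; the precedence order below is an assumption.
    """
    present = {h.strip().lower() for h in headers}
    # Prefer the dedicated HGVS columns (assumed order of preference).
    for query_type in ("HGVSp_Short", "HGVSp", "HGVSg"):
        if query_type.lower() in present:
            return query_type
    # Genomic_Change needs every coordinate and allele column.
    if all(col.lower() in present for col in GENOMIC_CHANGE_COLUMNS):
        return "Genomic_Change"
    # An Alteration column alone is ambiguous between the HGVS types.
    return None
```

Because the Alteration column is shared by three query types, a file that only has Alteration cannot be classified this way, which is one reason to pass -q explicitly.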

Annotation with Genomic_Change is relatively slow: the variant must first be annotated with GenomeNexus (https://www.genomenexus.org/), and the annotations are then retrieved one by one. There is a plan to improve this method. If you are annotating a lot of data, please prefer another query type where applicable.

Annotate with different reference genomes (GRCh37, GRCh38)

OncoKB MafAnnotator supports annotating the alteration with reference genome GRCh37 and GRCh38.

The annotator reads the reference genome from the MAF file column NCBI_Build or Reference_Genome.
If the file does not specify a reference genome, the annotator uses the default reference genome given by the -r parameter.

You can specify the default reference genome using the -r parameter (this is only applicable to MafAnnotator.py).
The acceptable values are GRCh37 and GRCh38 (case-insensitive).

If neither is specified, the annotator uses the OncoKB default reference genome, GRCh37.
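The fallback order described above (file column, then -r, then GRCh37) can be sketched as follows. The function name and the handling of unrecognized values are assumptions for illustration, not the annotator's actual code.

```python
SUPPORTED_GENOMES = {"grch37": "GRCh37", "grch38": "GRCh38"}
ONCOKB_DEFAULT_GENOME = "GRCh37"


def resolve_reference_genome(row, cli_default=None):
    """Pick the reference genome for one MAF row (given as a dict).

    Hypothetical helper. Assumption: an empty or unrecognized value in the
    file falls through to the next source instead of raising an error.
    """
    # 1. A value in the file takes priority.
    for column in ("NCBI_Build", "Reference_Genome"):
        value = (row.get(column) or "").strip().lower()
        if value in SUPPORTED_GENOMES:
            return SUPPORTED_GENOMES[value]
    # 2. Otherwise use the default passed with -r, if any.
    if cli_default:
        value = cli_default.strip().lower()
        if value in SUPPORTED_GENOMES:
            return SUPPORTED_GENOMES[value]
        raise ValueError("Unsupported reference genome: " + cli_default)
    # 3. Otherwise use the OncoKB default.
    return ONCOKB_DEFAULT_GENOME
```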

Levels of Evidence

Introducing Simplified OncoKB Levels of Evidence:

  • New Level 2, defined as “Standard care biomarker recommended by the NCCN or other expert panels predictive of response to an FDA-approved drug in this indication” (formerly Level 2A).
  • Unified Level 3B, defined as “Standard care or investigational biomarker predictive of response to an FDA-approved or investigational drug in another indication” (combination of previous Levels 2B and 3B).

We have implemented these changes for 2 reasons:

OncoKB API

When you run MafAnnotator.py, FusionAnnotator.py or CnaAnnotator.py, you need a token to access the OncoKB data via its web API. Please visit the OncoKB Data Access Page for more information about how to register an account and get an OncoKB API token.
Once you have the token listed on the OncoKB Account Settings Page, pass it in the following format.

python ${FILE_NAME.py} -i ${INPUT_FILE} -o ${OUTPUT_FILE} -b ${ONCOKB_API_TOKEN}
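If you launch the annotators from Python, the command above can be assembled as an argument list. This is a minimal sketch: the helper name build_annotator_command is hypothetical, and only the -i, -o, -b and -q flags mentioned in this README are used.

```python
import sys


def build_annotator_command(script, input_file, output_file, token, query_type=None):
    """Build the argument list for one annotator run.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    command = [sys.executable, script, "-i", input_file, "-o", output_file, "-b", token]
    # -q is optional and is described in this README for MafAnnotator.py.
    if query_type is not None:
        command += ["-q", query_type]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True). Reading the token from an environment variable rather than hard-coding it keeps it out of scripts and version control.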

Columns added in the annotation files

| Column | Possible Values | Description |
|---|---|---|
| GENE_IN_ONCOKB | TRUE, FALSE | Whether the gene has been curated by the OncoKB Team |
| VARIANT_IN_ONCOKB | TRUE, FALSE | Whether the variant has been curated by the OncoKB Team. Note: when a variant does not exist, it may still have annotations. |
| MUTATION_EFFECT | Gain-of-function, Likely Gain-of-function, Loss-of-function, Likely Loss-of-function, Switch-of-function, Likely Switch-of-function, Neutral, Likely Neutral, Inconclusive, Unknown | The biological effect of a mutation/alteration on the protein function that gives rise to changes in the biological properties of cells expressing the mutant/altered protein compared to cells expressing the wildtype protein. |
| ONCOGENIC | Oncogenic, Likely Oncogenic, Likely Neutral, Inconclusive, Unknown, Resistance | In OncoKB, “oncogenic” is defined as “referring to the ability to induce or cause cancer” as described in the second edition of The Biology of Cancer by Robert Weinberg (2014). |
| LEVEL_* | Therapeutic implications | The leveled therapeutic implications |
| HIGHEST_LEVEL | LEVEL_1, LEVEL_2, LEVEL_3A, LEVEL_3B, LEVEL_4, LEVEL_R1, LEVEL_R2 | The highest level of evidence for therapeutic implications |
| CITATIONS | PMID, Abstract, Website Link | All citations related to a mutation/alteration |
| LEVEL_Dx* | Tumor type the level of evidence is assigned to | The leveled diagnostic implications |
| HIGHEST_DX_LEVEL | LEVEL_Dx1, LEVEL_Dx2, LEVEL_Dx3 | The highest level of evidence for diagnostic implications |
| LEVEL_Px* | Tumor type the level of evidence is assigned to | The leveled prognostic implications |
| HIGHEST_PX_LEVEL | LEVEL_Px1, LEVEL_Px2, LEVEL_Px3 | The highest level of evidence for prognostic implications |
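Once a file has been annotated, the added columns can be summarized with the standard library. The sketch below counts the values in HIGHEST_LEVEL. It assumes the output is tab-delimited like the input MAF, and the sample rows are made up purely for illustration; they are not real annotation results.

```python
import csv
import io
from collections import Counter


def count_highest_levels(handle):
    """Count non-empty HIGHEST_LEVEL values in a tab-delimited annotated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        level = (row.get("HIGHEST_LEVEL") or "").strip()
        if level:  # rows without a therapeutic level are skipped
            counts[level] += 1
    return counts


# Illustrative, made-up rows standing in for an annotated file.
sample = io.StringIO(
    "Hugo_Symbol\tONCOGENIC\tHIGHEST_LEVEL\n"
    "GENE1\tOncogenic\tLEVEL_1\n"
    "GENE2\tLikely Oncogenic\t\n"
    "GENE3\tOncogenic\tLEVEL_1\n"
)
print(count_highest_levels(sample))
```

For a real file, open it with open(path, newline="") and pass the handle to the same function.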

Questions?

The best way is to email [email protected] so all our team members can help.

Contributors

jjgao, zhx828, darasanchez, victoria34, sheridancbio, leowisd, dependabot[bot], ygindinrevmed
