balsamic's People

Contributors

ashwini06, dnil, fevac, hassanfa, imsarath, ivadym, khurrammaqbool, mathiasbio, mropat, northwestwitch, pbiology, rannick

balsamic's Issues

Tumor mutational burden

  1. TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0424-2
    1.1 Non-coding alterations were not counted.
    1.2 Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted.
    1.3 Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted.
    1.4 Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted.
    1.5 Known germline alterations in dbSNP were not counted.
    1.6 To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory.
    1.7 Use Table 1 from the paper above as a reference for comparison.

  2. Tumor mutation burden (TMB), fraction of copy number–altered genome, and gene alterations were compared among patients with durable clinical benefit (DCB) and no durable benefit (NDB). http://ascopubs.org/doi/full/10.1200/JCO.2017.75.3384
    2.1 In addition to the above, copy number alterations were also counted.
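The counting rules in item 1 can be sketched in Python as follows. This is a minimal illustration only: the dictionary keys (`coding`, `known_somatic_cosmic`, `tsg_truncation`, `predicted_germline`, `in_dbsnp`) and the function name are hypothetical and assume each variant has already been annotated upstream.

```python
# Hypothetical annotation flags; a variant is excluded if any of these is True.
EXCLUSION_FLAGS = (
    "known_somatic_cosmic",  # rule 1.2: known somatic alteration in COSMIC
    "tsg_truncation",        # rule 1.2: truncation in a tumor suppressor gene
    "predicted_germline",    # rules 1.3 and 1.4: predicted germline
    "in_dbsnp",              # rule 1.5: known germline alteration in dbSNP
)


def tumor_mutational_burden(variants, coding_region_bp):
    """Return counted mutations per megabase of targeted coding region."""
    if coding_region_bp <= 0:
        raise ValueError("coding_region_bp must be positive")
    counted = [
        v for v in variants
        if v["coding"]  # rule 1.1: non-coding alterations are not counted
        and not any(v[flag] for flag in EXCLUSION_FLAGS)
    ]
    # Rule 1.6: divide the count by the coding region size in megabases.
    return len(counted) / (coding_region_bp / 1_000_000)
```

Copy number alterations (item 2.1) are not handled here; they would need a separate count.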

CollectMultipleMetrics

CollectMultipleMetrics can gather multiple metrics, but it might fail on some of them for various reasons (e.g. a Java memory issue). Prepare a list of the metrics that we are interested in.

Manta single sample mode

Manta in single sample mode should only have the following files in its output:

"tumorSV.vcf.gz", "candidateSV.vcf.gz", "candidateSmallIndels.vcf.gz"

while right now the output is:

"diploidSV.vcf.gz", "somaticSV.vcf.gz", "candidateSV.vcf.gz", "candidateSmallIndels.vcf.gz"
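A small check like the following could flag the mismatch between the two lists. It is a sketch only; the function name is hypothetical and the expected file set is taken directly from the list above.

```python
# Expected outputs for single sample mode, as listed in this issue.
EXPECTED_SINGLE_SAMPLE = {
    "tumorSV.vcf.gz",
    "candidateSV.vcf.gz",
    "candidateSmallIndels.vcf.gz",
}


def compare_manta_outputs(found):
    """Return (missing, unexpected) file names, each as a sorted list."""
    found = set(found)
    missing = sorted(EXPECTED_SINGLE_SAMPLE - found)
    unexpected = sorted(found - EXPECTED_SINGLE_SAMPLE)
    return missing, unexpected
```

Applied to the current output list, this reports `tumorSV.vcf.gz` as missing and `diploidSV.vcf.gz` and `somaticSV.vcf.gz` as unexpected.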

Add logger

  • balsamic run
  • balsamic config sample
  • balsamic config report
  • balsamic report
  • balsamic install
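One possible approach, sketched with Python's standard `logging` module, is to give each subcommand its own named logger. The helper name `get_command_logger` and the `balsamic.<command>` naming scheme are assumptions for illustration, not existing balsamic code.

```python
import logging

# Subcommands listed in this issue.
COMMANDS = ["run", "config sample", "config report", "report", "install"]


def get_command_logger(command, level=logging.INFO):
    """Return a logger named e.g. 'balsamic.config.sample' for a subcommand."""
    name = "balsamic." + command.replace(" ", ".")
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.propagate = False  # keep messages from being emitted twice
    return logger
```

For example, `get_command_logger("run").info("starting analysis")` writes one timestamped line to stderr.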

Result report filter config

The result report filter config is a series of filters, configs, and parameters used to summarize analysis results, in an effort to present a list of actionable targets and newly discovered targets.

This aggregated list will generate four lists of SNV and INDEL variants:

  • High confidence set from an exact replica of the MSK-IMPACT pipeline; see FDA approved figure 1.
  • Low confidence set from an exact replica of the MSK-IMPACT pipeline; see FDA approved figure 1.
  • High confidence set of variants: identified by at least 1 variant caller, with AD>=1 and VD>=1, and present in the MVL.
  • Low confidence set of variants: these variants are not in the MVL, hence the low confidence. Instead, a set of variant-caller-specific filters will be applied according to recent research and clinical findings.
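The last two lists could be separated with a rule like the one below. This is a rough sketch: the function and argument names are hypothetical, and returning `None` for variants that are in the MVL but fail the thresholds is an assumption, since the issue does not say how those are handled.

```python
def confidence_set(n_callers, ad, vd, in_mvl):
    """Classify a variant as 'high', 'low', or None (unspecified in the issue)."""
    if not in_mvl:
        # Not in the MVL: low confidence, to be refined later with
        # variant-caller-specific filters.
        return "low"
    if n_callers >= 1 and ad >= 1 and vd >= 1:
        return "high"
    # In the MVL but below the thresholds: not covered by the issue text.
    return None
```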

Prepare bed file for analysis

BED files need interval preparation and run time estimation. A separate mini-analysis that prepares and splits BED files can speed up the analysis and reduce runtime.
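As an illustration of the splitting step, the sketch below distributes intervals over a fixed number of chunks so that each chunk covers a similar number of bases. The greedy strategy and the function name are assumptions, not a description of how balsamic does it.

```python
def split_bed_intervals(intervals, n_chunks):
    """Greedily split (chrom, start, end) tuples into chunks of similar total length."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    # Place the longest intervals first, always into the currently smallest chunk.
    for iv in sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True):
        i = sizes.index(min(sizes))
        chunks[i].append(iv)
        sizes[i] += iv[2] - iv[1]
    # Restore coordinate order inside each chunk.
    return [sorted(chunk) for chunk in chunks]
```

Each chunk can then be written to its own BED file and processed in parallel.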

Reporting actual DP and AD in Mutect2

It seems that the AF reported by Mutect2 in gatk3.8 does not match the actual AD values. This is because the AD shown is the "unfiltered" AD, and the read depth values are only estimates made "before" the internal filters are applied (broadinstitute/gatk#3808), and thus DP is not reported by Mutect2 in gatk3.8 (it is, however, reported in the gatk4 version, where the AD values also do not match those of gatk3.8). This could be due to the internal filtering process that happens behind the scenes.

A solution for getting the values used by Mutect2, so that the actual AD is added to the VCF, is to invoke the Coverage annotation through the --annotation option. Note that the DP value reported here will still be the unfiltered reads of tumor + normal. To get the actual DP value, the AD values need to be summed.
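Summing the AD values can be sketched as below. The function name is hypothetical, and the input is assumed to be a sample's AD field as it appears in a VCF (comma-separated, reference count first).

```python
def depth_and_af_from_ad(ad_field):
    """Return (DP, [AF per alt allele]) computed from an AD string such as '90,10'."""
    counts = [int(x) for x in ad_field.split(",")]
    dp = sum(counts)  # depth as the sum of reference and alternate allele depths
    if dp == 0:
        return 0, [0.0 for _ in counts[1:]]
    # The first entry is the reference allele; the rest are alternate alleles.
    return dp, [alt / dp for alt in counts[1:]]
```

For an AD of `90,10` this yields a DP of 100 and a single allele fraction of 0.1.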

Add more PCT_TARGET_BASES steps

Add steps up to 1000X instead of the 200X maximum used today.

After 200X, the steps should increase by 50X (e.g. 250, 300, etc.) to make it possible to plot the data.
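The additional coverage steps described here can be generated as follows. This is a small sketch; the function name and defaults are illustrative and only encode the numbers given in this issue.

```python
def extra_coverage_thresholds(current_max=200, new_max=1000, step=50):
    """Return the new coverage steps above current_max, up to and including new_max."""
    return list(range(current_max + step, new_max + 1, step))
```

With the defaults this gives 16 new steps, starting at 250 and ending at 1000.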
