
wrangle's People

Contributors

jingchunzhu, nathandunn

wrangle's Issues

generate segmented CNV segment mean distribution

TCGA CNV data comes as CNV segments. Each segment has a continuous value called the segment mean. However, biologists often refer to copy number as discrete states, gain and loss. I am interested in the distribution of segment means for the TCGA pan-cancer cohort, as well as for individual TCGA cohorts. The distribution will help me determine the cutoffs for calling copy number gain or loss.

In a diploid genome, a single-copy gain in a perfectly pure, homogeneous sample has a copy ratio of 3/2. On the log2 scale, this is log2(3/2) = 0.585, and a single-copy loss is log2(1/2) = -1.0. However, most tumors are heterogeneous (clonal tumor populations) and contain some normal stroma. Therefore, the sample's purity needs to be considered so that alterations are not missed.
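To make the effect of purity concrete, here is a minimal sketch of the expected log2 ratio for a tumor/normal mixture. It assumes the normal cells are diploid and that the ratio is taken against a diploid reference; the function name `expected_log2_ratio` is hypothetical and not part of any TCGA or Xena tooling.

```python
import math


def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 copy ratio for a segment in a tumor/normal mixture.

    Assumption: normal cells carry `normal_copies` copies and the ratio is
    measured against that same reference copy number.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average copy number across tumor and normal cells in the sample.
    mixed_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed_copies <= 0:
        # Homozygous deletion in a fully pure sample.
        return float("-inf")
    return math.log2(mixed_copies / normal_copies)


# Single-copy gain and loss at decreasing purity: both move toward 0.
for purity in (1.0, 0.7, 0.4):
    gain = expected_log2_ratio(3, purity)
    loss = expected_log2_ratio(1, purity)
    print(f"purity={purity:.1f} gain={gain:.3f} loss={loss:.3f}")
```

As purity drops, the expected values for gain and loss shrink toward zero, which is why a fixed cutoff on unadjusted segment means can miss real alterations.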

  1. generate the copy number segment mean distribution without adjusting for purity
  2. generate the copy number adjusted segment mean (adjusted for purity) distribution
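The two steps above could be sketched roughly as follows. This is only an illustration: the adjustment assumes a simple two-component mixture of tumor cells and diploid normal cells, the helper names, the `floor` value, the bin width, and the example segment means and purities are all made up, and real purity estimates would have to come from a separate source.

```python
import math
from collections import Counter


def purity_adjusted_segment_mean(segment_mean, purity, floor=-5.0):
    """Estimate the tumor-only log2 ratio from an observed segment mean.

    Assumption: observed ratio = purity * tumor_ratio + (1 - purity) * 1.
    Values that would go non-positive are clamped to `floor`.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    observed_ratio = 2.0 ** segment_mean
    tumor_ratio = (observed_ratio - (1.0 - purity)) / purity
    if tumor_ratio <= 0:
        return floor
    return max(math.log2(tumor_ratio), floor)


def histogram(values, bin_width=0.1):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}


# Hypothetical input: (sample, segment_mean) pairs and per-sample purity.
segments = [("S1", 0.32), ("S1", -0.45), ("S2", 0.02), ("S2", 0.55)]
purity = {"S1": 0.5, "S2": 0.9}

# Step 1: distribution without purity adjustment.
raw = histogram(mean for _, mean in segments)
# Step 2: distribution after purity adjustment.
adjusted = histogram(
    purity_adjusted_segment_mean(mean, purity[sample])
    for sample, mean in segments
)
```

The same two histograms can be built per cohort or for the pan-cancer set by changing which segments are passed in.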

Derived datasets for each sample: 1. fraction of genome altered by copy number change 2. number of mutations

Build derived datasets:
for each sample: fraction of genome altered by copy number change
for each sample: number of mutations
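One possible way to compute the fraction of genome altered per sample from segment data is sketched below. The 0.2 cutoff on the absolute segment mean is only an illustrative assumption (choosing it is what the distribution issue above is about), the tuple layout is hypothetical, and segment coordinates are assumed to be 1-based and inclusive.

```python
from collections import defaultdict


def fraction_genome_altered(segments, threshold=0.2):
    """Per-sample fraction of profiled bases in segments called altered.

    `segments` is an iterable of (sample, start, end, segment_mean).
    A segment counts as altered when |segment_mean| > threshold.
    """
    altered = defaultdict(int)
    total = defaultdict(int)
    for sample, start, end, segment_mean in segments:
        length = end - start + 1  # assumes inclusive coordinates
        total[sample] += length
        if abs(segment_mean) > threshold:
            altered[sample] += length
    return {s: altered[s] / total[s] for s in total if total[s] > 0}


# Hypothetical example segments.
segments = [
    ("S1", 1, 100, 0.0),
    ("S1", 101, 200, 0.5),
    ("S2", 1, 100, 0.05),
]
fga = fraction_genome_altered(segments)
```

The denominator here is the total length covered by segments for that sample, not the full genome length; which one to use is a design choice that would need to be settled for the derived dataset.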

Hi,

Would you guys be able to create tracks for 1: fraction of genome altered by copy number change and 2: number of mutations, for each TCGA cohort or for the pan-cancer cohort? These used to be available in cBioPortal. The number of mutations per sample is still available, but the fraction of the genome altered by copy number is no longer available. Someone from MSKCC is working on getting that live again. Or is there a way to generate this data by downloading it from Xena and calculating it myself?

Thanks,

derived dataset: mutation load

Build derived datasets for TCGA
For each sample, the number of mutations. This measures tumor mutation load; all types of mutations are counted.
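A minimal sketch of the per-sample count, assuming the mutation data is available as one row per mutation with a `sample` field (the field name and the `all_samples` parameter are hypothetical). Passing the full sample list keeps samples with no called mutations in the output with a count of 0.

```python
from collections import Counter


def mutation_load(mutation_rows, all_samples=()):
    """Count mutations of any type per sample.

    `mutation_rows` is an iterable of dicts with a "sample" key.
    Samples listed in `all_samples` but absent from the rows get 0.
    """
    counts = Counter(row["sample"] for row in mutation_rows)
    for sample in all_samples:
        counts.setdefault(sample, 0)
    return dict(counts)


# Hypothetical example rows.
rows = [
    {"sample": "S1", "gene": "TP53", "effect": "missense"},
    {"sample": "S1", "gene": "KRAS", "effect": "missense"},
    {"sample": "S2", "gene": "TP53", "effect": "nonsense"},
]
load = mutation_load(rows, all_samples=["S1", "S2", "S3"])
```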
