ucscxena / wrangle


9.0 4.0 3.0 21.38 MB

data_wrangling

Python 94.44% Shell 0.89% Jupyter Notebook 2.99% JavaScript 1.68%

wrangle's People


maleicacid jianguozhou3 snijesh

wrangle's Issues

generate segmented CNV segment mean distribution

TCGA CNV data comes as CNV segments. Each segment has a value, called the segment mean, which is continuous. However, biologists often refer to copy number as gain or loss, which are discrete states. I am interested in the distribution of segment means for the TCGA pan-cancer cohort, as well as for individual TCGA cohorts. The distribution will help me determine the cutoffs for calling copy number gain or loss.

In a diploid genome, a single-copy gain in a perfectly pure, homogeneous sample has a copy ratio of 3/2. On the log2 scale, this is log2(3/2) = 0.585, and a single-copy loss is log2(1/2) = -1.0. However, most tumors are heterogeneous (clonal tumor populations) and contain some normal stroma. Therefore, the sample's purity needs to be considered so that alterations are not missed.
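The effect of purity on these values can be sketched with a simple two-component mixture. This is a minimal illustration, assuming the normal fraction is diploid and the segment mean is a log2 ratio against two copies; the function name `expected_log2_ratio` is hypothetical and not part of this repository.

```python
import math


def expected_log2_ratio(tumor_copies: int, purity: float, normal_copies: int = 2) -> float:
    """Expected log2 copy ratio of a segment in a tumor/normal mixture.

    Assumption: the sample is a mix of tumor cells (fraction `purity`)
    carrying `tumor_copies` and normal cells carrying `normal_copies`.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average copy number seen in the bulk sample.
    mixed_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed_copies <= 0:
        # Homozygous deletion in a perfectly pure sample.
        return float("-inf")
    return math.log2(mixed_copies / normal_copies)


if __name__ == "__main__":
    for purity in (1.0, 0.7, 0.5):
        gain = expected_log2_ratio(3, purity)
        loss = expected_log2_ratio(1, purity)
        print(f"purity={purity:.1f} gain={gain:.3f} loss={loss:.3f}")
```

Under this model, a single-copy gain at 50% purity gives log2(2.5/2), about 0.32, rather than 0.585, which is why a fixed cutoff chosen for pure samples can miss alterations.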

generate the copy number segment mean distribution without adjusting for purity
generate the copy number segment mean distribution adjusted for purity
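The two tasks above could be sketched roughly as follows. This is only an illustration under stated assumptions: segment means are log2 ratios against a diploid baseline, a per-sample purity estimate is available from some external source, and the names `purity_adjusted_segment_mean` and `segment_mean_histogram`, the `floor` value, and the bin width are all hypothetical choices rather than anything defined in this repository.

```python
import math
from collections import Counter


def purity_adjusted_segment_mean(segment_mean: float, purity: float, floor: float = -5.0) -> float:
    """Estimate the tumor-only log2 ratio from an observed segment mean.

    Inverts the mixture: observed_ratio = purity * tumor_ratio + (1 - purity) * 1.
    `floor` (an arbitrary choice) caps very negative or undefined values.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    observed_ratio = 2.0 ** segment_mean
    tumor_ratio = (observed_ratio - (1.0 - purity)) / purity
    if tumor_ratio <= 0:
        return floor
    return max(math.log2(tumor_ratio), floor)


def segment_mean_histogram(segment_means, bin_width: float = 0.1) -> dict:
    """Count segment means per bin; keys are the lower edge of each bin."""
    counts = Counter()
    for value in segment_means:
        bin_index = math.floor(value / bin_width)
        counts[round(bin_index * bin_width, 10)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    observed = [0.05, 0.07, -0.42, 0.58]
    print(segment_mean_histogram(observed))
    adjusted = [purity_adjusted_segment_mean(v, purity=0.6) for v in observed]
    print(segment_mean_histogram(adjusted))
```

Counting segments treats short and long segments equally; weighting each segment by its length would be a reasonable alternative if the goal is a genome-wide view.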

Derived datasets for each sample: 1. fraction of genome altered by copy number change 2. number of mutations

Build derived datasets:
for each sample: fraction of genome altered by copy number change
for each sample: number of mutations
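As a rough sketch of the first derived dataset, the fraction of genome altered can be computed per sample from segment records. Everything here is an assumption for illustration: the tuple layout `(sample, chrom, start, end, segment_mean)`, inclusive coordinates, the `threshold` of 0.2, and the choice to use the total length of profiled segments as the denominator instead of a fixed genome size.

```python
from collections import defaultdict


def fraction_genome_altered(segments, threshold: float = 0.2) -> dict:
    """Per-sample fraction of profiled bases whose |segment mean| >= threshold.

    `segments` is an iterable of (sample, chrom, start, end, segment_mean)
    tuples with inclusive start/end coordinates (an assumed layout).
    """
    altered = defaultdict(int)
    total = defaultdict(int)
    for sample, _chrom, start, end, segment_mean in segments:
        length = end - start + 1
        total[sample] += length
        if abs(segment_mean) >= threshold:
            altered[sample] += length
    return {s: altered[s] / total[s] for s in total if total[s] > 0}


if __name__ == "__main__":
    example = [
        ("sample_A", "chr1", 1, 100, 0.5),     # gained segment, 100 bases
        ("sample_A", "chr1", 101, 400, 0.0),   # neutral segment, 300 bases
        ("sample_B", "chr1", 1, 50, -0.1),     # below threshold
    ]
    print(fraction_genome_altered(example))
```

The threshold is exactly the cutoff that the segment mean distribution issue aims to inform, so the value used here is a placeholder.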

Hi,

Would you guys be able to create a track for 1: fraction of genome altered by copy number change and 2: number of mutations, for each TCGA cohort or for the pan-cancer cohort? It used to be available in cBioPortal. The number of mutations per sample is still available, but the fraction of the genome altered by copy number is no longer available. Someone from MSKCC is working on getting that live again. Or is there a way to generate this data by downloading it from Xena and calculating it myself?

Thanks,

derived dataset: mutation load

Build derived datasets for TCGA
For each sample, the number of mutations. This measures tumor mutation load; all types of mutations are counted.
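A minimal sketch of this count, assuming the mutation data is available as one record per mutation call with a `sample` field (a hypothetical field name), and that an optional list of all profiled samples is supplied so that samples without any mutation calls get a count of zero:

```python
from collections import Counter


def mutation_load(mutation_rows, all_samples=None) -> dict:
    """Count mutation records per sample, regardless of mutation type.

    `mutation_rows` is an iterable of dicts with a "sample" key (assumed layout).
    If `all_samples` is given, samples with no records are reported as 0.
    """
    counts = Counter(row["sample"] for row in mutation_rows)
    if all_samples is not None:
        return {sample: counts.get(sample, 0) for sample in all_samples}
    return dict(counts)


if __name__ == "__main__":
    rows = [
        {"sample": "sample_A", "gene": "TP53", "effect": "missense"},
        {"sample": "sample_A", "gene": "KRAS", "effect": "missense"},
        {"sample": "sample_B", "gene": "TP53", "effect": "frameshift"},
    ]
    print(mutation_load(rows, all_samples=["sample_A", "sample_B", "sample_C"]))
```

Passing the full sample list matters because a sample missing from the mutation file is otherwise indistinguishable from a sample that was never profiled.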

