lrgr / tcsm

Tumor Covariate Signature Model

License: MIT License

Python 81.63% R 18.37%

tcsm's Introduction

Tumor Covariate Signature Model (TCSM)

This repository implements the Tumor Covariate Signature Model (TCSM) and contains the code for reproducing the real-data results from the paper Modeling Clinical and Molecular Covariates of Mutational Process Activity in Cancer.

References

Welles Robinson, Roded Sharan, Max Leiserson. (2019). Modeling clinical and molecular covariates of mutational process activity in cancer. Bioinformatics.

Basic Usage

TCSM requires Python 3. We recommend using Anaconda to install dependencies.

Once you have Anaconda installed, you can create an environment with the dependencies and activate it with the following commands:

conda env create -f envs/environment.yml
conda activate tcsm

TCSM requires two submodules, mutation-signatures-viz and signature-estimation-py, which can be installed with the following commands:

git submodule init
git submodule update

Key Commands

To run TCSM without covariates, use the following command:

./src/run_tcsm.R <mutation-count-input-file> <num-signatures>

To run TCSM with covariates, use the following command:

./src/run_tcsm.R <mutation-count-input-file> <num-signatures> -c=<covariate-file-input> --covariates <covariate-names(separated by +)>
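When TCSM is launched from a Python pipeline, the two command forms above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_tcsm_command` and the file names in the usage note are hypothetical, and only the arguments shown in the commands above are used.

```python
from typing import List, Optional, Sequence


def build_tcsm_command(
    mutation_counts: str,
    num_signatures: int,
    covariate_file: Optional[str] = None,
    covariates: Sequence[str] = (),
) -> List[str]:
    """Assemble the run_tcsm.R argument list (hypothetical helper)."""
    if num_signatures < 1:
        raise ValueError("num_signatures must be a positive integer")

    # Positional arguments: mutation count file, then number of signatures.
    cmd = ["./src/run_tcsm.R", mutation_counts, str(num_signatures)]

    # The covariate file and the covariate names only make sense together.
    if covariate_file is not None or covariates:
        if covariate_file is None or not covariates:
            raise ValueError("covariate_file and covariates must be given together")
        # Covariate names are joined with '+', as in the command above.
        cmd += [f"-c={covariate_file}", "--covariates", "+".join(covariates)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, for example `build_tcsm_command("counts.tsv", 5, "covariates.tsv", ["BRCA", "age"])`, where the file and covariate names are placeholders.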

Demo

We provide a demo to help users run TCSM on their own datasets. The demo shows how to use TCSM on real data with and without covariates. First, the demo estimates the number of signatures in the dataset by plotting the held-out log-likelihood across a range of K. Next, we show how to estimate signatures and exposures using TCSM with and without covariates. Finally, for an advanced use case, we show how to estimate exposures with TCSM when the covariate value is unknown or hidden for a subset of the samples.
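The model-selection step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the demo's code: `heldout_loglik` is a hypothetical callback that fits the model with K signatures on the training samples and returns the log-likelihood of the test samples, and the 20% test fraction and the number of repeats are adjustable assumptions. The demo selects K by inspecting a plot; taking the K with the highest mean score is a simplification of that.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_samples(
    samples: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly split sample IDs into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def select_k(
    samples: Sequence[str],
    candidate_ks: Sequence[int],
    heldout_loglik: Callable[[List[str], List[str], int], float],
    n_repeats: int = 1,
    seed: int = 0,
) -> Tuple[int, Dict[int, float]]:
    """Return the K with the highest mean held-out log-likelihood."""
    mean_ll: Dict[int, float] = {}
    for k in candidate_ks:
        scores = []
        for r in range(n_repeats):
            # A different seed per repeat gives a different random split.
            train, test = split_samples(samples, seed=seed + r)
            scores.append(heldout_loglik(train, test, k))
        mean_ll[k] = sum(scores) / len(scores)
    best_k = max(mean_ll, key=lambda k: mean_ll[k])
    return best_k, mean_ll
```

Plotting the returned per-K means against K gives the kind of curve the demo uses; averaging over several random splits (`n_repeats > 1`) reduces the dependence on any single split.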

Reproducing Key Results

We use Travis CI to regenerate the key figures of the paper (Figures 3 and 4) when the master branch is updated.

Homologous recombination repair (HR) deficiency in breast cancer

Figure 3: (A) Comparison of the log-likelihood of held-out samples across K = 2–10 between TCSM with the biallelic HR covariate (inactivations of BRCA1, BRCA2 or RAD51C) and TCSM without covariates. (B) The log-likelihood ratio (LLR) of samples with the biallelic HR covariate hidden, where LLR > 0 indicates that the mutations of a sample are more likely under the biallelic HR inactivation model. (C) A tumor’s LLR plotted against its LST count, after excluding tumors with known biallelic inactivations in BRCA1, BRCA2 or RAD51C.
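The sign convention of the LLR in panel (B) can be illustrated with a small Python sketch. This is a simplified illustration and not the computation used in the paper: it assumes each model is summarized by a fixed vector of strictly positive mutation-category probabilities, whereas in TCSM those probabilities come from the learned signatures and exposures. The function names and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def multinomial_loglik(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log-likelihood of category counts under the given probabilities.

    The multinomial coefficient is omitted because it cancels in the ratio.
    Probabilities are assumed to be strictly positive.
    """
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def log_likelihood_ratio(
    counts: Sequence[int],
    probs_inactivated: Sequence[float],
    probs_intact: Sequence[float],
) -> float:
    """LLR > 0 means the counts are more likely under the inactivation model."""
    return multinomial_loglik(counts, probs_inactivated) - multinomial_loglik(
        counts, probs_intact
    )


# Toy example with three mutation categories (made-up numbers).
counts = [8, 1, 1]
llr = log_likelihood_ratio(counts, [0.8, 0.1, 0.1], [0.2, 0.4, 0.4])
print(llr > 0)  # the counts favor the first (inactivation) model
```

Swapping the two probability vectors flips the sign of the LLR, which is how the same quantity separates the two cancer types in Figure 4.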

Simultaneously learning signatures in melanomas and lung cancers

Figure 4: (A) The held-out log-likelihood plot used for model selection to obtain K = 4. (B) The log-likelihood ratio (LLR) of the cancer type covariate for tumors, where LLR < 0 means the mutations of the tumor are more likely under LUSC and LLR > 0 means the mutations of the tumor are more likely under SKCM.


tcsm's Issues

Question on heldout method to estimate K

Dear @wir963,

I understand how you use the held-out method to estimate the best K. For each K, you split the tumors into two parts, a training set (80%) and a test set (20%), and estimate the likelihood for that value of K.

My question is: do I need to repeat the held-out procedure 5 times for each K to obtain 5 different likelihood values, or do I only need to do it once?

How can the number of signatures (K) be determined automatically?

Dear @wir963 ,

run_stm.R requires me to provide the number of signatures (K) in a mutation count input file.

But in many cases, it would not be possible to know K in advance. So is there a function to determine K automatically? If not, is there a plotting script to give me a PDF plot for aiding the selection of K? Thanks!

Lack an example for train.feature.file and test.feature.file

Dear @wir963 ,

I now understand how to use tcsm. If the covariate is NULL, it is much easier. If a covariate is provided, I also need to provide the feature file and covariate file for the train and test tumors.

run.tcsm <- function(mutation.count.file, feature.file, covariates, K, seed, exposure.output.file, signature.output.file, effect.output.file, sigma.output.file, gamma.output.file)

run.stm <- function(train.mutation.count.file, test.mutation.count.file, train.feature.file, test.feature.file, covariates, K, seed, heldout.performance.file)

My question is: I cannot find the train.feature.file and test.feature.file in your demo folder. Can you show me what the feature file looks like? Thanks!

Cannot run `git submodule update` using a non-root user in Linux

Dear @wir963 ,

I created a conda environment as directed and named it "tcsm". But I cannot run the last step of the submodule installation:
git submodule update

The error is printed as follows:

(base) [wuyang@monster tcsm]$ conda activate tcsm
(tcsm) [wuyang@monster tcsm]$ git submodule init
(tcsm) [wuyang@monster tcsm]$ git submodule update
Cloning into 'mutation-signatures-viz'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:lrgr/mutation-signatures-viz.git' into submodule path 'mutation-signatures-viz' failed

Can you help me to fix this issue? Thanks!

How to extract signatures and reconstruct mutation spectra?

Hi @wir963 ,

I have a question about using tcsm: how do I use it to extract signatures and quantify exposures? I cannot find any user manual or help message to guide me through these two tasks.

Thanks!
