tcsm's Introduction

Tumor Covariate Signature Model (TCSM)

This repository implements the Tumor Covariate Signature Model (TCSM) and contains the code for reproducing the real-data results from the paper Modeling Clinical and Molecular Covariates of Mutational Process Activity in Cancer.

References

Welles Robinson, Roded Sharan, Max Leiserson. (2019). Modeling clinical and molecular covariates of mutational process activity in cancer. Bioinformatics.

Basic Usage

TCSM requires Python 3. We recommend using Anaconda to install dependencies.

Once you have Anaconda installed, you can create an environment with the dependencies and activate it with the following commands:

conda env create -f envs/environment.yml
conda activate tcsm

TCSM requires two submodules, mutation-signatures-viz and signature-estimation-py, which can be installed with the following commands:

git submodule init
git submodule update

Key Commands

To run TCSM without covariates, use the following command:

./src/run_tcsm.R <mutation-count-input-file> <num-signatures>

To run TCSM with covariates, use the following command:

./src/run_tcsm.R <mutation-count-input-file> <num-signatures> -c=<covariate-file-input> --covariates <covariate-names(separated by +)>
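
The two invocations above differ only in the trailing covariate arguments. As a minimal sketch, a small Python helper could assemble either form before passing it to a process launcher. The helper name build_tcsm_command, the file names, and the covariate names are hypothetical; only the argument layout is taken from the commands above.

```python
def build_tcsm_command(mutation_counts, num_signatures,
                       covariate_file=None, covariates=None):
    """Assemble the argument list for ./src/run_tcsm.R.

    mutation_counts, covariate_file: paths (hypothetical names in the demo call).
    covariates: list of covariate names, joined with '+' as the README describes.
    """
    cmd = ["./src/run_tcsm.R", str(mutation_counts), str(num_signatures)]
    if covariates:
        if covariate_file is None:
            raise ValueError("covariates were given without a covariate file")
        cmd.append(f"-c={covariate_file}")
        cmd.extend(["--covariates", "+".join(covariates)])
    return cmd


# Hypothetical inputs, for illustration only.
without_cov = build_tcsm_command("counts.tsv", 5)
with_cov = build_tcsm_command("counts.tsv", 5, "features.tsv", ["covA", "covB"])
print(" ".join(without_cov))
print(" ".join(with_cov))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues when file paths contain spaces.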

Demo

We have provided a demo to help users run TCSM on their own datasets. The demo shows how to use TCSM on real data with and without covariates. First, the demo estimates the number of signatures in the dataset by plotting the held-out log-likelihood across a range of K. Next, we show how to estimate signatures and exposures using TCSM with and without covariates. Finally, for an advanced use case, we show how to estimate exposures with TCSM when the covariate value is unknown or hidden for a subset of the samples.
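
One simple way to turn a held-out log-likelihood curve into a choice of K is to take the K with the highest held-out value. This rule is an assumption made for illustration (the demo is described only as plotting the curve), the function name select_k is hypothetical, and the numbers below are made up.

```python
def select_k(heldout_log_likelihood):
    """Return the K with the largest held-out log-likelihood.

    heldout_log_likelihood: dict mapping K (int) to a log-likelihood (float).
    Ties are broken in favor of the smaller K (the simpler model).
    """
    if not heldout_log_likelihood:
        raise ValueError("no held-out log-likelihoods were provided")
    best = max(heldout_log_likelihood.values())
    return min(k for k, ll in heldout_log_likelihood.items() if ll == best)


# Made-up values for K = 2..6; not results from the paper.
example = {2: -1200.0, 3: -1100.0, 4: -1050.0, 5: -1075.0, 6: -1090.0}
print(select_k(example))
```

In practice the curve is often flat near its maximum, so inspecting the plot remains worthwhile even when a rule like this is applied.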

Reproducing Key Results

We use Travis CI to regenerate the key figures of the paper (Figures 3 and 4) when the master branch is updated.

Homologous recombination repair (HR) deficiency in breast cancer

Figure 3: (A) Comparison of the log-likelihood of held-out samples across K = 2–10 between TCSM with the biallelic HR covariate (inactivations of BRCA1, BRCA2 or RAD51C) and TCSM without covariates. (B) The log-likelihood ratio (LLR) of samples with the biallelic HR covariate hidden, where LLR > 0 indicates the mutations of a sample are more likely under the biallelic HR covariate inactivation model. (C) After excluding tumors with known biallelic inactivations in BRCA1, BRCA2 or RAD51C, the plot of a tumor's LLR against its LST count.

Simultaneously learning signatures in melanomas and lung cancers

Figure 4: (A) The held-out log-likelihood plot used for model selection to obtain K = 4. (B) The log-likelihood ratio (LLR) of the cancer type covariate for tumors, where LLR < 0 means the mutations of the tumor are more likely under LUSC and LLR > 0 means the mutations of the tumor are more likely under SKCM.
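
Both figure captions read a tumor's likely covariate value off the sign of its LLR. A minimal sketch of that reading, assuming the LLR is the difference of two per-sample log-likelihoods, one under each covariate value. The function names and the numbers are illustrative and are not taken from the TCSM code.

```python
def log_likelihood_ratio(ll_positive, ll_negative):
    """LLR of a sample: log-likelihood under the 'positive' covariate value
    minus log-likelihood under the 'negative' covariate value."""
    return ll_positive - ll_negative


def favored_label(llr, positive_label, negative_label):
    """Return the label favored by the sign of the LLR, or None when LLR == 0."""
    if llr > 0:
        return positive_label
    if llr < 0:
        return negative_label
    return None


# Made-up log-likelihoods for one tumor under the two cancer-type models.
llr = log_likelihood_ratio(ll_positive=-100.0, ll_negative=-110.0)
print(llr, favored_label(llr, "SKCM", "LUSC"))
```

The same two functions cover the HR case in Figure 3 by using the biallelic HR inactivation model as the positive label.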

tcsm's People

Contributors

wir963

tcsm's Issues

Question on heldout method to estimate K

Dear @wir963,

I understand how you use the held-out method to estimate the best K. For each K, you split the tumors into two parts, a train dataset (80%) and a test dataset (20%), and estimate the likelihood for that specific value of K.

Now the question is: do I need to repeat the held-out procedure 5 times for each K to calculate 5 different likelihood values, or do I only need to do it once?
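
No answer is recorded in this thread, so whether TCSM's authors repeat the split is not confirmed here. For readers who want to try the repeated variant, the sketch below shows one way to generate five non-overlapping 80/20 train/test partitions of the tumors. The function name and sample IDs are hypothetical, and this is not code from the repository.

```python
import random


def five_fold_splits(sample_ids, n_folds=5, seed=0):
    """Yield (train, test) lists so that each sample is held out exactly once.

    With n_folds=5, each test set holds roughly 20% of the samples and
    each train set the remaining 80%.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


# Hypothetical tumor IDs.
tumors = [f"tumor_{i}" for i in range(10)]
for train, test in five_fold_splits(tumors):
    print(len(train), len(test), sorted(test))
```

Each (train, test) pair would then be fit and scored for every candidate K, and the five held-out likelihoods averaged per K.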

How to determine the number of signatures (K) automatically?

Dear @wir963 ,

run_stm.R requires me to provide the number of signatures (K) for a mutation count input file.

But in many cases, it would not be possible to know K in advance. So is there a function to determine K automatically? If not, is there a plotting script to give me a PDF plot for aiding the selection of K? Thanks!

Missing example for train.feature.file and test.feature.file

Dear @wir963 ,

Now I understand how to use tcsm. If no covariate is given (NULL), it's much easier. If a covariate is provided, I also need to provide the feature file and covariate file for the train and test tumors.

run.tcsm <- function(mutation.count.file, feature.file, covariates, K, seed, exposure.output.file, signature.output.file, effect.output.file, sigma.output.file, gamma.output.file)

run.stm <- function(train.mutation.count.file, test.mutation.count.file, train.feature.file, test.feature.file, covariates, K, seed, heldout.performance.file)

Now the question is: in your demo folder, I cannot find the train.feature.file and test.feature.file. Can you show me what the feature file looks like? Thanks!

Cannot run `git submodule update` using a non-root user in Linux

Dear @wir963 ,

I've created a conda environment named "tcsm" as directed, but I cannot run the last step of the submodule installation:
git submodule update

The error is printed as follows:

(base) [wuyang@monster tcsm]$ conda activate tcsm
(tcsm) [wuyang@monster tcsm]$ git submodule init
(tcsm) [wuyang@monster tcsm]$ git submodule update
Cloning into 'mutation-signatures-viz'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:lrgr/mutation-signatures-viz.git' into submodule path 'mutation-signatures-viz' failed

Can you help me to fix this issue? Thanks!
