mobster's Introduction

mobster

check-master check-development Lifecycle: stable

mobster is a package that implements a model-based approach for subclonal deconvolution of cancer genome sequencing data (Caravagna et al; PMID: 32879509).

The package integrates evolutionary theory (i.e., population genetics) and machine learning to analyze bulk sequencing data (e.g., whole-genome) from cancer samples. This analysis relates to clustering; we approach it via a maximum-likelihood formulation of Dirichlet mixture models and use bootstrap routines to assess the confidence of the parameters. The package implements S3 objects to visualize the data and the fits.

Citation

If you use mobster, please cite:

  • G. Caravagna, T. Heide, M.J. Williams, L. Zapata, D. Nichol, K. Chkhaidze, W. Cross, G.D. Cresswell, B. Werner, A. Acar, L. Chesler, C.P. Barnes, G. Sanguinetti, T.A. Graham, A. Sottoriva. Subclonal reconstruction of tumors by using machine learning and population genetics. Nature Genetics 52, 898–907 (2020).

Help and support

Installation

You can install the released version of mobster from GitHub with:

# install.packages("devtools")
devtools::install_github("caravagnalab/mobster")

Copyright and contacts

mobster's People

Contributors

caravagn, luca-dex, militeee, t-heide

mobster's Issues

No Function: squareplot

Hi, I carefully checked the functions in mobster but could not find any information about "squareplot". Looking forward to your update.

Best wishes,

Sunny.

bug in call to easypar

Running the following code from the example gives me an error

library(mobster)

dataset = random_dataset(
  seed = 123, 
  Beta_variance_scaling = 100    # variance ~ U[0, 1]/Beta_variance_scaling
  )

fit = mobster_fit(
  dataset$data,    
  auto_setup = "FAST",
  parallel = F
  )
Loaded input data, n = 5000.
❯ n = 5000. Mixture with k = 1,2 Beta(s). Pareto tail: TRUE and FALSE. Output clusters with
π > 0.02 and n > 10.
! mobster automatic setup FAST for the analysis.
❯ Scoring (without parallel) 2 x 2 x 2 = 8 models by reICL.

[easypar] run 1 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 2 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 3 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 4 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 5 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 6 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 7 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] run 8 - Error: Can't merge the outer name `init.value` with a vector of length > 1.
Please supply a `.name_spec` specification.

[easypar] 8/8 computations returned errors and will be removed.
Error in mobster_fit(dataset$data, auto_setup = "FAST", parallel = F) : 
  All task returned errors, no fit available, raising this error to interrupt the computation....

Input specifications

I can't seem to find what "d" and "popsize" indicate in the input examples.
Could someone clarify?

Typo?

Hi,

This line in selection2clonenested() throws an error saying that the variable time is missing. Is it a typo, and should it be time1?

x3 <- log(2) * (time_end - time)

Best regards,
Paweł

fit error

Error in -tests$likelihood : invalid argument to unary operator

How about WES data?

Hi, I was wondering whether it is feasible to deal with exome-sequencing data using mobster?

Improve speed of bootstrap

Write the bootstrap routines to work on offline data, so that an array job can be submitted to the cluster using easypar. That would be much faster than the current implementation.

Wrapping support for neutralitytestr

@marcjwilliams1 Check out commit 7396fea3308aee9921220b05e4b6f7e6267cd93e. I wrapped a call to the neutralitytestr package.

The call is wrapped from a general MOBSTER object (k regions), which contains MOBSTER fits in the $fit.MOBSTER field. VAF is already adjusted by MOBSTER, and mutations are assumed to be diploid. The integration range is selected via custom upper and lower quantiles. The test is run on tail mutations.

The function neutralitytest is run on each sample that has a fitted tail. The final mutation rate M is a linear combination of the per-sample mutation rates, weighted by the normalized tail size. The idea is that if one sample has a large tail (900 mutations) and another a very small one (100 mutations), we want to give more weight (90%) to the mutation rate estimated from the larger tail.
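
As a rough illustration of the weighting described above, here is a minimal sketch in base R. The data frame, its column names and the per-sample rates are made up for the example; only the tail sizes (900 and 100) come from the description, and this is not the code in the commit.

```r
# Hypothetical per-sample estimates: mutation rate and number of tail mutations.
per_sample <- data.frame(
  sample        = c("S1", "S2"),
  mutation_rate = c(20, 10),   # made-up values, for illustration only
  tail_size     = c(900, 100)  # tail sizes from the example in the text
)

# Normalize the tail sizes so that the weights sum to 1.
weights <- per_sample$tail_size / sum(per_sample$tail_size)

# Final rate M: linear combination of the per-sample rates.
M <- sum(weights * per_sample$mutation_rate)
```

With these numbers the larger tail gets weight 0.9 and the smaller one 0.1, so M sits much closer to the estimate from the sample with 900 tail mutations.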

Model selection failure when sparse low frequency mutations present

Hi Giulio,

So I've been running lots of examples with mobster (very cool stuff) but I did notice a consistent pattern of model selection failure (weird beta distributions fits) when sparse and dispersed low frequency mutations are present in the VAF. See example plot below (left):

mobster_failure_example

Both fits were run with the default settings (not auto_setup = FAST).

I think a straightforward solution is to just trim the neutral tail a bit by removing mutations below ~2-5% VAF (see right plot: trimming mutations below 5% VAF fixes this issue), but I am wondering if there is any automated solution within the package for this? Maybe I overlooked something in the documentation.
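
For reference, the manual trimming mentioned above could look like the sketch below. It assumes the input is a data frame with a VAF column; the toy values and the name min_vaf are hypothetical, and the cutoff is a judgment call rather than a package default.

```r
# Toy input; assumes the mutations are stored in a data frame with a VAF column.
mutations <- data.frame(VAF = c(0.01, 0.03, 0.049, 0.05, 0.12, 0.48))

# Hypothetical cutoff; the suggestion above is somewhere between 2% and 5% VAF.
min_vaf <- 0.05

# Keep only the mutations at or above the cutoff.
trimmed <- mutations[mutations$VAF >= min_vaf, , drop = FALSE]
```

The trimmed data frame would then be used as the input for the fit in place of the full one.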

Error in vignette during binomial_noise branch build

Error when building the binomial_noise branch from the repo in RStudio:

Quitting from lines 31-35 [unnamed-chunk-3] (a4_popgen.Rmd)
Error: processing vignette 'a4_popgen.Rmd' failed with diagnostics:
Bad MOBSTER input (list of fits).
--- failed re-building ‘a4_popgen.Rmd’

Can be worked around by setting 'vignettes = F' in build()

Is it possible to deal with ctDNA data?

Hi,
I find mobster very useful for subclonal reconstruction analysis and would like to use it for my ctDNA research.
Could you please let me know if mobster can deal with WES ctDNA data?
Thank you!

Strelka2 and Mutect2 inputs

Hi,

I would like to ask some questions about the inputs.
I have Mutect2 (both unfiltered and filtered with GATK FilterMutectCalls) and Strelka2 calls.
My questions are:

  • What are the correct arguments for the DP_column and NV_column parameters of the load_vcf() function, for Mutect2 and Strelka2 VCF files?
  • Is it possible to use Mutect2 and Strelka2 VCF files directly as input for load_vcf(), or is it necessary to manipulate the files first? In that case, how should I do it? Are there any "helper" functions in the package manual that I missed?
  • Is it possible, or recommended, to use only the "PASS" calls?

Sorry for the silly questions, I just want to be 100% sure of what I'm doing.
Thank you!

Missing line in DESCRIPTION file

Error in (function (command = NULL, args = character(), error_on_status = TRUE, …:
! System command 'R' failed

Exit status: 1
stdout & stderr:

Type .Last.error to see the more details.
Warning messages:
1: In readLines(f, n) :
incomplete final line found on '/Users/madeleine.dale/Library/R/arm64/4.3/library/mobster/DESCRIPTION'
2: In readLines(file) :
incomplete final line found on '/Users/madeleine.dale/Library/R/arm64/4.3/library/mobster/DESCRIPTION'
3: In readLines(f, n) :
incomplete final line found on '/Users/madeleine.dale/Library/R/arm64/4.3/library/mobster/DESCRIPTION'
4: In readLines(file) :
incomplete final line found on '/Users/madeleine.dale/Library/R/arm64/4.3/library/mobster/DESCRIPTION'
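
The readLines() warnings above mean that the DESCRIPTION file does not end with a newline. A minimal sketch of one way to add it is shown below; it works on a temporary copy (the real path is the one in the warning), the helper ends_with_newline is hypothetical, and whether this also resolves the failing system command is not confirmed here.

```r
# Work on a temporary file that reproduces the condition: no trailing newline.
path <- tempfile("DESCRIPTION")
cat("Package: demo\nVersion: 0.0.1", file = path)

# Hypothetical helper: does the file end with a newline byte?
ends_with_newline <- function(p) {
  bytes <- readBin(p, what = "raw", n = file.size(p))
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(0x0a)
}

before <- ends_with_newline(path)

# readLines() still returns every line (with a warning);
# writeLines() terminates each line, including the last one.
lines <- suppressWarnings(readLines(path))
writeLines(lines, path)

after <- ends_with_newline(path)
```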

Error: Error in mobster:::check_input(x, K, samples, init, tail, epsilon, maxIter, :

Hi,
I came across the following error. Do you think it is because I recently updated some packages? Could you please help me fix it?

Thanks,

library(mobster)
library(tidyr)
library(dplyr)
example_data = Clusters(mobster::fit_example$best)
drivers_rows = c(2239, 3246, 3800)
example_data$is_driver = FALSE
example_data$driver_label = NA
example_data$is_driver[drivers_rows] = TRUE
example_data$driver_label[drivers_rows] = c("DR1", "DR2", "DR3")
# Fit and print the data
fit = mobster_fit(example_data, auto_setup = 'FAST')
 [ MOBSTER fit ] 

Error in mobster:::check_input(x, K, samples, init, tail, epsilon, maxIter, : There are some reserved names in the input data that cannot be used, please remove or rename columns: cluster, Tail, C1, C2
Traceback:

1. mobster_fit(example_data, auto_setup = "FAST")
2. mobster:::check_input(x, K, samples, init, tail, epsilon, maxIter, 
 .     fit.type, seed, model.selection, trace)
3. stop("There are some reserved names in the input data that cannot be used, please remove or rename columns: ", 
 .     paste0(fixed_names, collapse = ", "))
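
The message says that the columns cluster, Tail, C1 and C2 are reserved, which is consistent with the input coming from Clusters() on a previous fit. A minimal sketch of dropping them before refitting is shown below; the toy data frame stands in for example_data, and whether the fit then runs cleanly is an assumption, not something verified here.

```r
# Column names listed as reserved in the error message.
reserved <- c("cluster", "Tail", "C1", "C2")

# Toy stand-in for example_data, with the same kind of columns.
example_data <- data.frame(
  VAF          = c(0.45, 0.16, 0.06),
  cluster      = c("C1", "C2", "Tail"),
  Tail         = c(0.01, 0.22, 1.00),
  C1           = c(0.99, 0.00, 0.00),
  C2           = c(0.00, 0.78, 0.00),
  is_driver    = c(TRUE, TRUE, TRUE),
  driver_label = c("DR1", "DR2", "DR3")
)

# Keep every column that is not reserved.
clean_data <- example_data[, setdiff(names(example_data), reserved), drop = FALSE]
```

clean_data keeps VAF and the driver annotations, and would be passed to mobster_fit() in place of example_data.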

Missing required dependencies

The package currently seems to be missing the following CRAN packages

wesanderson
reshape2

plus my github package

ctree

Can we add these to the package so they get downloaded automatically? Should we do that in development and mirror the change to master as well?

Walk-through Example?

Hi,

I would like to try MOBSTER, but can't see any examples of how to input data, formatting, or what output will look like.

Is there a 'walk-through' with some example data that I can follow?

Thanks,

Bruce.

Can't subset columns that don't exist

In plot_latent_variables, a MOBSTER fit is passed as the main argument, and it contains a tibble like

# A tibble: 3 x 7
     VAF cluster   Tail       C1       C2 is_driver driver_label
   <dbl> <chr>    <dbl>    <dbl>    <dbl> <lgl>     <chr>       
1 0.448  C1      0.0125 9.88e- 1 8.08e-21 TRUE      DR1         
2 0.159  C2      0.225  2.35e-34 7.75e- 1 TRUE      DR2         
3 0.0629 Tail    1.00   1.91e-82 4.02e- 5 TRUE      DR3 

After passing the object to the function Clusters, we get

# A tibble: 5,000 x 10
     VAF cluster Tail...3 C1...4   C2...5 is_driver driver_label Tail...8 C1...9
   <dbl> <chr>      <dbl>  <dbl>    <dbl> <lgl>     <chr>           <dbl>  <dbl>
 1 0.497 C1       0.00736  0.993 5.22e-27 FALSE     NA            0.00736  0.993
 2 0.490 C1       0.00669  0.993 4.42e-26 FALSE     NA            0.00669  0.993
 3 0.470 C1       0.00705  0.993 1.31e-23 FALSE     NA            0.00705  0.993
 4 0.517 C1       0.0130   0.987 1.83e-29 FALSE     NA            0.0130   0.987
 5 0.506 C1       0.00903  0.991 3.86e-28 FALSE     NA            0.00903  0.991
 6 0.440 C1       0.0179   0.982 9.68e-20 FALSE     NA            0.0179   0.982
 7 0.428 C1       0.0347   0.965 3.88e-18 FALSE     NA            0.0347   0.965
 8 0.523 C1       0.0164   0.984 3.97e-30 FALSE     NA            0.0164   0.984
 9 0.482 C1       0.00648  0.994 3.87e-25 FALSE     NA            0.00648  0.994
10 0.499 C1       0.00759  0.992 3.20e-27 FALSE     NA            0.00759  0.992
# … with 4,990 more rows, and 1 more variable: C2...10 <dbl>

This breaks the execution of

clusters_names = names(x$pi)
assignments %>% select(clusters_names)

where assignments is the result of the Clusters function.
The two tibbles have different column names, so the select fails with an error.

The problem can be reproduced by executing the second stage of the vignette "2. Plotting fits".

remove mutations with VAF = 0?

I sometimes get the following error; it seems to go away if I remove mutations with VAF = 0.0. I think at the moment they get set to 1e-9; maybe we should just remove them? Or do you think it's something else?

Error in if (.stoppingCriterion(i, prevNLL, fit$NLL, prevpi, fit$pi, fit.type,  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In .dbpmm.EM(x, K = tests[r, "K"], init = init, tail = tests[r,  :
  Possible singularity in one Beta component a/b --> Inf.

Error when running examples

At the moment R CMD check is failing when running examples. Here is the error:

  Running examples in ‘mobster-Ex.R’ failed
  The error most likely occurred in:
  
  > ### Name: get_clone_trees
  > ### Title: Return clone trees from the fit.
  > ### Aliases: get_clone_trees

...

  > trees = get_clone_trees(x)
  Error in get_clone_trees(x) : 
    Your data should have driver events annotated, cannot use 'ctree' otherwise.
  Execution halted
