caravagnalab / rcongas Goto Github PK
View Code? Open in Web Editor NEWrcongas
License: GNU General Public License v3.0
rcongas
License: GNU General Public License v3.0
The actual filtering function allows just a basic filter on the quantile of the mean maximum expression, it would definitely be better to filter based also on the variance
See the new vignettes [9f169d5] 1_Models
We should rename this repo to Rcongas
to make it consistent with the package, update link everywhere (yaml files, Rmd, etc), as well as in the forks at the Lab Git - and eventually change ownership and let it be in just one place.
Problem:
# Load DE results - forward params
DE_table = get_DE_table(x, chromosomes = chromosomes, ...)
# Get gene locations
x$data$gene_locations %>%
filter(gene %in% DE_table$gene)
DE_table
contains many genes, when I subset given gene locations I loose multiple genes. There is some problem with the way genes are stored across the various data structions
Missing unique after computation of get_segment_ids
.
I was able to run segments_selector_congas()
but fit_congas()
failed with the following errors:
[easypar] 4/4 computations returned errors and will be removed.
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
I get a similar error if I try to rerun with a different model selection criterion.
I was able to run successfully on the demo data using the same code and environment. I am using the most recent GitHub version of RCONGAS.
here's the full trace:
<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
---
Backtrace:
▆
1. ├─Rcongas:::fit_congas(...)
2. │ ├─base::order(model_selection_df %>% pull(!!model_selection))
3. │ └─model_selection_df %>% pull(!!model_selection)
4. ├─dplyr::pull(., !!model_selection)
5. ├─dplyr:::pull.data.frame(., !!model_selection)
6. │ └─tidyselect::vars_pull(names(.data), !!enquo(var))
7. │ └─tidyselect:::pull_as_location2(...)
8. │ ├─tidyselect:::with_subscript_errors(...)
9. │ │ └─rlang::try_fetch(...)
10. │ │ └─base::withCallingHandlers(...)
11. │ └─vctrs::vec_as_location2(...)
12. │ ├─vctrs:::result_get(...)
13. │ └─vctrs:::vec_as_location2_result(...)
14. │ ├─base::tryCatch(...)
15. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
16. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
17. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
18. │ └─vctrs::vec_as_location(i, n, names = names, arg = arg, call = call)
19. └─vctrs (local) `<fn>`()
20. └─vctrs:::stop_subscript_oob(...)
21. └─vctrs:::stop_subscript(...)
22. └─rlang::abort(...)
Really appreciate your contributions. I met the error when executing the function Rcongas:::fit_congas
as follows:
── Fit with k = 4 and lambda = 0.5.
── Fit with k = 2 and lambda = 0.5.
── Fit with k = 1 and lambda = 0.5.
── Fit with k = 3 and lambda = 0.5.
No protocol specified
No protocol specified
No protocol specified
No protocol specified
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:178.)
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
[easypar] 4/4 computations returned errors and will be removed.
── (R)CONGAS+ fits completed in 24.4s. ──
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
---
Backtrace:
▆
1. ├─Rcongas:::fit_congas(...)
2. │ ├─base::order(model_selection_df %>% pull(!!model_selection))
3. │ └─model_selection_df %>% pull(!!model_selection)
4. ├─dplyr::pull(., !!model_selection)
5. └─dplyr:::pull.data.frame(., !!model_selection)
Run rlang::last_trace(drop = FALSE) to see 17 hidden frames.
I don't know how to deal with this. Look forward to your reply.
I encountered an issue when installing the package and received the following message:
E checking DESCRIPTION meta-information ...
Authors@R field gives more than one person with maintainer role:
Salvatore Milite <[email protected]> [aut, cre]
Lucrezia Patruno <[email protected]> [aut, cre]
See section 'The DESCRIPTION file' in the 'Writing R Extensions'
manual.
Error: Failed to install 'Rcongas' from GitHub:
! System command 'Rcmd.exe' failed
Is there any method to handle this problem? ATM I could only install the package by modifying DESCRIPTION file to be only 1 maintainer.
We should ensure that when we filter clusters we do not just remove labels, but we actually re-compute the model score, latent variables and clustering assignments
It contains old cluster labels 1, 2, etc instead of c1, c2 etc.
Should be get_clusters_ploidy
for consistency with all the other get_clusters*
functions.
Hi,
I meet the error as below when I run the following code:
filt = segments_selector_congas(x,score="BIC")
Warning message:
replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
Error in f()
:
! Argument 1 must be a data frame or a named atomic vector.
Run rlang::last_trace()
to see where the error occurred.
Looks like this is the underlying error location:
runs <- easypar::run(FUN = one k, PARAMS = apply(k lambda pairs,1,list),parallel = parallel, progress bar = FALSE)
Fit with k = 2 and lambda = 0.5.
[easypar] 1/1 computations returned errors and will be removed.
runs
NULL
This is the data I used:
x
── [ (R)CONGAS+ ] Bimodal P4 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── CNA segments (reference: GRCh38)
→ Input 22 CNA segments, mean ploidy 2.
| | | | | | | | | | | |
Ploidy: 0 1 2 3 4 5 *
── Modalities
→ RNA: 7727 cells with 11101 mapped genes, 8889659 non-zero values. Likelihood: Negative Binomial.
→ ATAC: not available
! Clusters: not available.
── LOG ──
Thank you very much!
If we refit an object we should cancel all analyses based on clusters (DE)
Hi,
Thanks for the awesome package! I run the exmple_data, but meet the error as below:
inference <- best_cluster(input_rcongas, clusters = 1:5, model = "MixtureGaussian", param_list = list(theta_shape = theta_vals[1], theta_rate = 1),
steps = 500, lr = 0.01, MAP = TRUE, seed = 3, normalize_by_segs = FALSE, method = "ICL")
[1] "Model not present, please run list_models() to see what is available"
Error in eval(expr, envir, enclos): TypeError: 'str' object is not callable
Traceback:
I have multimodal 10x data (both RNA-seq and ATAC-seq measured from the same cells) and would like to use the "multomics" model (Fig. 2C in the PLoS Comp Bio paper) in which each cell is assigned to the same cluster in both modalities. However, the vignette for fitting a CONGAS+ model returns different cluster sizes between the two modalities, and I can't find a parameter to e.g. init() or fit_congas() to specify the multiomics model or that clusters should be consistent between modalities. Is there an interface available to the multiomics model?
Thanks for the tool. I am stopped by the error Could not find function "fit_congas_single_run"
when excuting one_k
function, which finally leads to the error x/x computations returned errors and will be removed
when running the function Rcongas:::fit_congas
.
After running CONGAS on 1 of my samples I was not able to obtain the clusters I think due to the following warning:
> input_rcongas <- init(data = count_matrix , cnv_data = cnv_table, description = s,
+ reference_genome = "hg19", online = FALSE, correct_bins = T)
ℹ Extracting gene data from reference.
✔ Validated input(s): 904 cells, 12449 genes and 126 CNA segments.
ℹ Processing input counts.
ℹ Assembly Rcongas object.
! Mapping inconsistent for 687 genes out 10610, removing those from the raw data table.
ℹ Retaining 98.6 Mb long-format tibble data with 4274786 points, matrix was 86.7 Mb.
ℹ Object size in memory: 99.9 Mb.
── [ Rcongas ] X ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
→ Data: 904 cells with 9878 genes, aggregated in 31 segments.
! Clusters: not available.
Warning message:
Each row in `x` should match at most 1 row in `y`.
ℹ Row 109 of `x` matches multiple rows.
ℹ If multiple matches are expected, specify `multiple = "all"` in the join call to silence this warning.
After digging in the code apparently some genes appear in multiple chromosomes which makes this step crash:
gene_locations = gene_locations %>% left_join(locations,
by = "gene") %>% dplyr::select(gene, chr, from, to, segment_id)
For example in my case gene "U1" according to gene_locations object appears in :
1 U1 chr1 16860381 16862144 chr1:14743501:22118500
2 U1 chr1 17197440 17200587 chr1:14743501:22118500
3 U1 chr3 125879126 125879288 chr1:14743501:22118500
4 U1 chr7 119646030 119646155 chr1:14743501:22118500
So maybe it is necessary applying an extra filtering step.
We need to systematically produce the Rmd files with roxygen to document the basic functions for the analysis. We need also to expose some tutorial on scRNAseq analysis with CONGAS.
Good day!
I have read the documentation in https://caravagnalab.github.io/rcongas/index.html, and it provides valuable information on how to run Rcongas on multimodal data, but it is not clear to me whether it can work only using the scATAC modality. If so, what adjustments would I need to make to run it correctly?
Also, after running the pipeline, do the results also include the classification of a cell to be either diploid or aneuploid?
Thanks in advance!
Can we add a function to compute the CCF of a CNA segment of the raw data? Can we plot that as a histogram, is that a power law?
I liked the idea behind CONGAS, and I was about to try it on my dataset. But after installing I could not run the example datasets. The following error popped out:
> data(congas_example)
> print(congas_example)
── [ Rcongas ] ──────────────────────────────────────────────────────────────────────────────────
→ Data: 503 cells with 8564 genes, aggregated in 70 segments.
ℹ Clusters: k = 2, model with AIC = 577427.35.
● Cluster c1, n = 379 [75.35% of total cells].
● Cluster c2, n = 124 [24.65% of total cells].
── CNA highlights (alpha = 0.05)
Error in `mutate()`:
! Problem while computing `cluster = ifelse(...)`.
Caused by error in `str_starts()`:
! could not find function "str_starts"
Run `rlang::last_error()` to see where the error occurred.
> plot(congas_example)
Error in `mutate()`:
! Problem while computing `cluster = ifelse(...)`.
Caused by error in `str_starts()`:
! could not find function "str_starts"
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr:::mutate_error>
Error in `mutate()`:
! Problem while computing `cluster = ifelse(...)`.
Caused by error in `str_starts()`:
! could not find function "str_starts"
---
Backtrace:
1. base::plot(congas_example)
22. base::ifelse(...)
Run `rlang::last_trace()` to see the full context.
It seems like a package version issue but I updated all the dependencies, so I am not sure if I should try some old versions for dplyr
or stringr
packages. Which versions did you use?
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcongas_0.1.0
loaded via a namespace (and not attached):
[1] gtools_3.9.3 clisymbols_1.2.0 tidyselect_1.1.2.9000 akima_0.6-3.4 remotes_2.4.2 purrr_0.3.4 reshape2_1.4.4
[8] lattice_0.20-45 carData_3.0-5 testthat_3.1.4 colorspace_2.0-3 vctrs_0.4.1.9000 generics_0.1.3 usethis_2.1.5
[15] utf8_1.2.2 rlang_1.0.5 pkgbuild_1.3.1 ggpubr_0.4.0 pillar_1.8.1 withr_2.5.0 glue_1.6.2
[22] sp_1.5-0 sessioninfo_1.2.2 lifecycle_1.0.2.9000 plyr_1.8.7 stringr_1.4.1 munsell_0.5.0 ggsignif_0.6.3
[29] gtable_0.3.1 devtools_2.4.3 memoise_2.0.1 callr_3.7.2 fastmap_1.1.0 ps_1.7.1 curl_4.3.2
[36] fansi_1.0.3 broom_1.0.1 Rcpp_1.0.9 backports_1.4.1 scales_1.2.1 cachem_1.0.6 desc_1.4.2
[43] pkgload_1.3.0 abind_1.4-5 fs_1.5.2 brio_1.1.3 ggplot2_3.3.6 stringi_1.7.8 rstatix_0.7.0
[50] processx_3.7.0 dplyr_1.0.99.9000 rprojroot_2.0.3 grid_4.2.1 cli_3.4.0 tools_4.2.1 magrittr_2.0.3
[57] patchwork_1.1.2 tibble_3.1.8 crayon_1.5.1 car_3.1-0 tidyr_1.2.1 pkgconfig_2.0.3 ellipsis_0.3.2
[64] prettyunits_1.1.1 extraDistr_1.9.1 rstudioapi_0.14 R6_2.5.1 CNAqc_1.0.0 compiler_4.2.1
hi, i have problem with this, is there any files not included or it's just private. Ans if possible, may i have this to be able to run your R file to check the result?
rcongas/nobuild/OV_analysis.Rmd
Line 29 in 9aae602
Can we plot segments and then find some way to overlay the input data? Both the aggregated ccounts, and the raw per-gene,
As the choice of the library size distribution prior is somewhat hard if you are not familiar with Gamma distributions. A function that adapts the parameter to the data in an approximate way can be useful to standardize the inference
Good day!
I want to apply your tool to a set of scATAC data and read in your latest pre-print that CONGAS+ can be used on this moldaity of data. However, I have not been able to find a vignette/tutorial/documentation on how to.
Could you please address me where I can find that information?
Also, does CONGAS+ give a final classification for each cell to be aneuploid or diploid?
I really appreciate any help you can provide.
Add the stat as a parameter, and remake the plot
In https://github.com/Militeee/rcongas/blob/master/R/filter_clusters.R (function remove_small_clusters
) get_best_model
cannot be called as
bm <- Rcongas::get_best_model(x)
since it is not exported in https://github.com/Militeee/rcongas/blob/master/NAMESPACE.
Maybe this should be exported no?
I have run CONGAS on several cancer samples and overall I am quite happy the sensitivity of the regions identified as subclonal and the subclones it finds. However, there are some samples where CONGAS finds clusters composed by only 1 cell (sorry but I can't share the data). Moreover, the identity of the cell clusters sometimes change (I thought that the seed
parameter would solve this), and of course it also does when you change the number of clusters you test. But this seems to be influenced by the small subclusters found, hence the important of filtering small clusters.
For example this is the clustering result for 2 consecutive runs of the same dataset with the same parameters (inference <- best_cluster(input_rcongas, clusters = 1:5, model = "MixtureGaussian", param_list = list(theta_shape = theta_vals[1], theta_rate = 1),steps = 500, lr = 0.01, MAP = TRUE, seed = 3, normalize_by_segs = F, method = "ICL")
)
run2
run1 c1 c2 c3 c4
c1 314 13 0 2
c2 294 7 0 0
c3 1 221 0 0
c4 0 1 48 3
this is the comparison after running again changing clusters = 1:10
:
run3
run1 c1 c2 c3 c4 c5 c6 c7
c1 326 3 0 0 0 0 0
c2 0 299 1 0 0 1 0
c3 1 1 197 0 23 0 0
c4 0 0 0 51 0 0 1
run3
run2 c1 c2 c3 c4 c5 c6 c7
c1 311 296 1 0 0 1 0
c2 14 7 197 1 23 0 0
c3 0 0 0 47 0 0 1
c4 2 0 0 3 0 0 0
As you can see the emergence of a small subcluster on the second run has made the clones c1 and c2 from run1 to collapse into 1 clone in run2. The validity of clones detected in run1 seems to be verified by the third run (finds sames clones) now that it has room to find this smalls clusters. As you can see in run3 we have c6 and 7 composed by 1 cell.
Is there anything you would suggest to increase reproducibility?
Would it make sense to add a parameter to limit the minimum size of the clusters?
Or maybe it can also be interesting to be able to filter "undesired" clusters a posteriori from the results and being able to use plot_gw_cna_profiles
without having to run again the analysis.
As the number of false positive can be quite high in settings where we have different cell types (or developmental states) it is good practice to compare the gene expression distributions across different clusters for a given segment, to spot possible outliers
I think calculate_DE does not work anymore.. is there a problem with the transpose input etc?
With the default object the above command files.
> get_clusters(congas_example)
Error in `$<-.data.frame`(`*tmp*`, cell, value = c("AAACCTGTCGACAGCC-1", :
replacement has 503 rows, data has 504
The error comes from
z_nk$cell = names(best_model$parameters$assignement)
because there is 1 extra row in z_nk
compared to assignments.
binsize
line 48 of plot_single_segment
is not defined.
All functions that require to receive in input an object with clusters inside, should check that the input is correct.
For instance, all those in getters_clustering_results.R
should do that. Same for the others.
Intelligible errors should be raised via stop
.
Returns a matrix with 0 rows and 32737 columns.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.