vitkl / ParetoTI
R toolbox for Archetypal Analysis and Pareto Task Inference on single cell data (based on ParTI)
Home Page: https://vitkl.github.io/ParetoTI/
License: Apache License 2.0
Great software! I ran the example files on MATLAB 2018 and found two things that need updating:
When running the "Synthetic" example, the Buffersize parameter used to load the data is no longer required.
The function princomp has been replaced by pca, but its inputs and outputs remain the same.
Hi, I am new to R and am not able to install the ParetoTI package. Below is the error message:
Downloading GitHub repo vitkl/ParetoTI@HEAD
√ checking for file 'C:\Users\desaipa\AppData\Local\Temp\Rtmp42I98k\remotes70ac3fbf6754\vitkl-ParetoTI-5109906/DESCRIPTION' ...
Hi,
I get the error below when I follow your example and run:
go_annot = map_go_annot(taxonomy_id = 10090, keys = rownames(hepatocytes),
columns = c("GOALL"), keytype = "ALIAS",
ontology_type = c("CC"))
How can I fix it?
snapshotDate(): 2019-05-02
downloading 0 resources
loading from cache
‘AH70573 : 77319’
Error: failed to load resource
name: AH70573
title: org.Mm.eg.db.sqlite
reason: database disk image is malformed
In addition: Warning messages:
1: Couldn't set cache size: database disk image is malformed
Use `cache_size` = NULL to turn off this warning.
2: Couldn't set synchronous mode: database disk image is malformed
Use `synchronous` = NULL to turn off this warning.
Hi,
I've been working with ParetoTI for a long time (thank you for that), but after the latest update of R I can't load the package anymore. I get this error:
library(ParetoTI)
Error: package or namespace load failed for ‘ParetoTI’:
I tried reinstalling all the other required packages, but nothing worked.
This is especially important because, without a working ParetoTI, I also can't load the environment with all my previous ParetoTI results.
Is there going to be an update for the new R version?
Thank you very much!
Hi,
I ran the script from your tutorial as is, with v0.1.13 of your code on R 3.5.1, and was not able to reproduce some of your results, e.g.:
> pch_rand = randomise_fit_pch(PCs4arch, arc_data = arc_1,
n_rand = 1000,
replace = FALSE, bootstrap_N = NA,
volume_ratio = "t_ratio",
maxiter = 500, delta = 0, conv_crit = 1e-4,
type = "m", clust_options = list(cores = 3))
> pch_rand
Background distribution of k representative archetypes
in data with no relationships between variables (S3 class r_pch_fit)
N randomisation trials: 998
Summary of best-fit polytope to observed data (including p-value):
k var_name var_obs p_value
1: 4 varexpl 0.20280642 0.9789579
2: 4 t_ratio 0.03944111 0.7094188
3: 4 total_var NA NaN
I am trying out the package and running into some trouble. After setting up as described in the README (and making sure the right conda environment is loaded), I am trying to run the fit_pch_bootstrap function and getting the error in the title.
I am running the code as follows:
library(ParetoTI)
reticulate::use_condaenv("reticulate_PCHA", conda = "auto", required = TRUE)
arc = fit_pch_bootstrap(t(obj@[email protected]), n = 200, sample_prop = 0.75, seed = 235,
noc = 4, delta = 0, conv_crit = 1e-04, type = "s")
I saw that this also happened in a previous issue (#4), but there is no solution for the problem there (I am already running with type="s").
Hello,
I tried re-running an archetype analysis on a dataset for which I got good results in the past, and this time couldn't get the fitting functions to work. I replicated the error using the simulated example given in the vignette, given below. Looks like the single fit_pch run works, but once bootstrapping is included, it fails.
I would appreciate any help figuring out what it might be.
Thanks for all the hard work you've done on this package, it's been very helpful.
data = generate_data(archetypes$XC, N_examples = 1e4, jiiter = 0.04, size = 0.9)
arc_data = fit_pch(data, noc = as.integer(3), delta = 0)
arc_ks = k_fit_pch(data, ks = 2:6, check_installed = T,
bootstrap = T, bootstrap_N = 200, maxiter = 1000,
bootstrap_type = "m", seed = 2543,
volume_ratio = "none", # set to "none" if too slow
delta=0, conv_crit = 1e-04, order_type = "align",
sample_prop = 0.75)
Error in pch_fit_list[[1]] : subscript out of bounds
arcfit <- fit_pch_bootstrap(data, n = 200, sample_prop = 0.75, seed = 235,
noc = 3, delta = 0, conv_crit = 1e-04, type = "m")
Error in pch_fit_list[[1]] : subscript out of bounds
I am doing an analysis with ParetoTI and have a question about the enriched_sets result.
In the step 'Find genes and gene sets enriched near vertices', I generate enriched_sets and enriched_genes for each archetype. The results contain several columns describing properties of the genes, but I cannot understand what some of them mean, in particular median_diff, mean_diff, top_bin_mean and especially sample_count. The sample count is neither the total number of cells nor the number of enriched cells generated by bin_cells_by_arch, so I am not sure what it means.
The code I wrote is roughly as follows (just like the example):
activ = measure_activity(data,
which = "BP", return_as_matrix = F,
taxonomy_id = 9606,
keys=rownames(data),
keytype = "SYMBOL",
lower = 50, upper = 2000,
aucell_options = list(aucMaxRank = nrow(data) * 0.1,
binary = F, nCores = 2,
plotStats = TRUE))
data_attr = merge_arch_dist(arc_data = arc_data, data = data,
feature_data = as.matrix(data),
colData = activ,
dist_metric = c("euclidean", "arch_weights")[1],
colData_id = "cells" , rank = F)
enriched_genes = find_decreasing_wilcox(data_attr$data, data_attr$arc_col,
features = data_attr$features_col,
bin_prop = 0.1, method = "BioQC",
)
enriched_sets = find_decreasing_wilcox(data_attr$data, data_attr$arc_col,
features = data_attr$colData_col,
bin_prop = 0.1,method = "BioQC")
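The thread does not define these columns, so the following is only a sketch of the kind of comparison their names suggest: cells in the bin closest to an archetype versus all other cells. The toy data, the variable names and the exact definitions of top_bin_mean, mean_diff and median_diff are assumptions for illustration, not the ParetoTI implementation; sample_count is not covered because its meaning cannot be inferred from the code above.

```r
# Toy data: 200 cells, distance to one archetype, and one feature that
# decreases with distance (hypothetical values, not ParetoTI output).
set.seed(1)
n_cells <- 200
dist_to_arch <- runif(n_cells)
feature <- 2 * (1 - dist_to_arch) + rnorm(n_cells, sd = 0.2)

bin_prop <- 0.1  # same proportion as in the call above

# Cells in the bin closest to the archetype (smallest 10% of distances).
cutoff <- quantile(dist_to_arch, probs = bin_prop)
in_top_bin <- dist_to_arch <= cutoff

# Assumed definitions: closest bin compared with all remaining cells.
top_bin_mean <- mean(feature[in_top_bin])
mean_diff <- top_bin_mean - mean(feature[!in_top_bin])
median_diff <- median(feature[in_top_bin]) - median(feature[!in_top_bin])

# One-sided Wilcoxon rank-sum test: is the feature higher near the archetype?
p_value <- wilcox.test(feature[in_top_bin], feature[!in_top_bin],
                       alternative = "greater")$p.value

bin_summary <- data.frame(top_bin_mean, mean_diff, median_diff, p_value)
print(bin_summary)
```

Under these assumed definitions, a feature enriched near the archetype shows positive mean_diff and median_diff together with a small p-value.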
I have run measure_activity, following one of the tutorials, with the following code on human cells:
activ_alias = measure_activity(as.matrix(x), activity_method = 'pseudoinverse',
which = 'BP', return_as_matrix = F,
taxonomy_id = 9606, keytype = "ALIAS",
lower = 10, upper = 1000)
I then use merge_arch_dist() to make a single matrix:
data_attr = merge_arch_dist(arc_data = arc_ave, data = x_pca,
feature_data = as.matrix(x),
colData = activ_alias,
dist_metric = c("euclidean", "arch_weights")[1],
colData_id = "cells", rank = F)
However, when I try to use this to find features (GO sets) that are decreasing functions of distance from archetype, I get an error, shown below. It seems like these errors are coming from the GO terms themselves, but I'm not sure how to fix that to find the most important GO terms for each archetype. Any suggestions?
enriched_sets = find_decreasing(data_attr$data, data_attr$arc_col,
features = data_attr$colData_col, return_only_summary = TRUE)
output:
Error in parse(text = x, keep.source = FALSE): <text>:1:2: unexpected input
1: 2_
^
Traceback:
1. find_decreasing(data_attr$data, data_attr$arc_col, features = data_attr$colData_col,
. return_only_summary = TRUE)
2. lapply(seq_len(length(features)), .find_decreasing_1, features = features,
. arc_col = arc_col, N_smooths = N_smooths, data_attr = data_attr,
. min.sp = min.sp, ..., d = d, n_points = n_points, weights = weights,
. return_only_summary = return_only_summary, one_arc_per_model = one_arc_per_model)
3. FUN(X[[i]], ...)
4. lapply(arc_col, function(col) {
. ParetoTI::fit_arc_gam_1(feature = feature, col = col, N_smooths = N_smooths,
. data_attr = data_attr, min.sp = min.sp, ..., d = d, n_points = n_points,
. weights = weights)
. })
5. FUN(X[[i]], ...)
6. ParetoTI::fit_arc_gam_1(feature = feature, col = col, N_smooths = N_smooths,
. data_attr = data_attr, min.sp = min.sp, ..., d = d, n_points = n_points,
. weights = weights)
7. mgcv::gam(as.formula(form), data = data_attr, min.sp = min.sp,
. ...)
8. interpret.gam(formula)
9. as.formula(form)
10. formula(object, env = baseenv())
11. formula.character(object, env = baseenv())
12. formula(eval(parse(text = x, keep.source = FALSE)[[1L]]))
13. eval(parse(text = x, keep.source = FALSE)[[1L]])
14. parse(text = x, keep.source = FALSE)
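The traceback ends in parse() failing on text that starts with "2_", which is consistent with a gene set label beginning with a digit being pasted into a model formula. Whether that is the root cause here is an assumption. The sketch below uses two hypothetical labels to reproduce that kind of parse failure in base R and shows how make.names() produces syntactically valid names.

```r
# Hypothetical gene-set column names; the first mimics the "2_" prefix
# seen in the parse error above.
set_names <- c("2_oxoglutarate_metabolic_process", "ATP_metabolic_process")

# A formula built from a name that starts with a digit cannot be parsed.
bad <- tryCatch(as.formula(paste(set_names[1], "~ s(archetype_1)")),
                error = function(e) conditionMessage(e))
print(bad)

# make.names() turns each label into a syntactically valid R name.
safe_names <- make.names(set_names, unique = TRUE)
print(safe_names)

# The sanitised name now parses as part of a formula.
ok <- as.formula(paste(safe_names[1], "~ s(archetype_1)"))
print(ok)
```

If this is indeed the cause, the same renaming would have to be applied consistently to the gene set columns of data_attr$data and to data_attr$colData_col before calling find_decreasing().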
Hey! Fantastic package--really enjoying playing around with it.
I couldn't see anything for it in the source code, but I was just wondering if you would have a strategy to calculate the angle of an effect vector to the Pareto front.
Talking through my thoughts here: I could imagine determining the front for some control data, embedding new data (e.g. cells following some perturbation) in the same PC space, and then perhaps calculating an angle based on distances to each archetype. Do you have any thoughts on how this could be implemented, or am I missing something?
I appreciate it!
David
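One possible way to make this concrete, offered as a sketch rather than a ParetoTI feature: treat the front as the flat subspace spanned by the archetypes, project the effect vector onto it, and take the angle between the vector and its projection. The archetype matrix XC and the effect vector below are hypothetical toy values. In practice XC might come from a fitted polytope and the effect from the difference of group means in the same PC space, both of which are assumptions.

```r
# Hypothetical archetype positions in a 3-PC space: one archetype per column.
XC <- cbind(c(0, 0, 0), c(1, 0, 0), c(0, 1, 0))

# Hypothetical effect vector: mean shift of perturbed cells vs control cells.
effect <- c(1, 0, 1)

# Directions spanning the front: edges from the first archetype to the others.
edges <- XC[, -1, drop = FALSE] - XC[, 1]

# Orthonormal basis of the front's subspace, then project the effect onto it.
Q <- qr.Q(qr(edges))
in_front <- as.vector(Q %*% crossprod(Q, effect))

# Angle between the effect and the front: 0 = within the front, 90 = orthogonal.
cos_angle <- sqrt(sum(in_front^2)) / sqrt(sum(effect^2))
angle_deg <- acos(min(1, cos_angle)) * 180 / pi
print(angle_deg)
```

This projection-based angle uses the archetype coordinates directly and avoids going through per-cell distances. It only measures direction relative to the flat spanned by the archetypes and says nothing about whether the perturbed cells stay inside the polytope.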
Hi.
I tried reproducing the example in comparison_to_kmeans.Rmd and for the most part it works well if I use bootstrap_type = "s" instead of "m" in k_fit_pch function (chunk #5). Otherwise I get the same error as with the other datasets: Error in pch_fit_list[[1]] : subscript out of bounds
The only thing that still doesn't work is (also in chunk #5)
lou_cluster_ks = k_fit_pch(data, ks = 2:5,
bootstrap = T, bootstrap_N = 200, maxiter = 500,
bootstrap_type = "s", clust_options = list(cores = 3),
seed = 2543, replace = FALSE,
volume_ratio = "none", # set to "none" if too slow
sample_prop = 0.95, method = "louvain",
method_options = list(resolution = 0.1,
noc_optim_iter = 500)) # try resolutions for more iterations
Error in align_arc(ref_XC, res$pch_fits$XC[[i]]) :
align_arc() trying to match different number of archetypes
Hi
I'm trying to install ParetoTI within a conda environment with its own R installation; I don't want to use the R on the base path.
Could I ask which R version is compatible with ParetoTI?
Thanks,
Francesco
I am not a native English speaker, so please point out any problems with my description and I will try to describe it in more detail. First of all, thank you for your work, which has helped me with archetype analysis of my scRNA-seq data!
I am now trying to use the fit_pch() function to construct the model. My input data is the result of a principal component (PC) analysis: 5 PCs for thousands of cells, arranged with PCs in rows and cells in columns. I ran the following code:
fit_pch(data, noc = as.integer(4), delta = 0)
The output contains XS, S, C (0 for all C) ... t-ratio, while var_vert and total_var are NA.
I need total_var to demonstrate the robustness of the model. Is there any other setting I need to use to obtain it?
I look forward to your reply and appreciate any help!
I tried the hepatocyte notebook and got an error in the first chunk, related to SingleCellExperiment:
> hepatocytes = SingleCellExperiment(assays = list(counts = data),
+ colData = design)
Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_assays(assays)) : invalid rownames length
Would it be possible for you to simplify this example and start with the pcs4arch data?
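An "invalid rownames length" error from this constructor is typically what appears when the number of rows in colData differs from the number of columns in the assay, although the report does not show the objects, so that diagnosis is an assumption. The sketch below uses small hypothetical stand-ins for data and design to show a base R check and one way to align the two before building the object.

```r
# Hypothetical stand-ins for the notebook's `data` (genes x cells) and
# `design` (one row per cell); the real objects are not shown in the report.
data <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
design <- data.frame(batch = c("a", "a", "b", "b", "b"),
                     row.names = paste0("cell", 1:5))

# The constructor expects one colData row per assay column.
dims_match <- nrow(design) == ncol(data)
print(dims_match)

# Keep only cells present in both objects, in the same order.
shared_cells <- intersect(colnames(data), rownames(design))
data_ok <- data[, shared_cells, drop = FALSE]
design_ok <- design[shared_cells, , drop = FALSE]
print(dim(data_ok))
print(dim(design_ok))
```

After alignment, the number of columns in data_ok equals the number of rows in design_ok, which is the shape the constructor call above requires.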
I followed your installation instructions but get the error below. Do you know how to resolve it?
BiocManager::install("vitkl/ParetoTI", dependencies = c("Depends", "Imports", "LinkingTo"))
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
Installing github package(s) 'vitkl/ParetoTI'
Downloading GitHub repo vitkl/ParetoTI@HEAD
Installing 2 packages: edgeR, BioQC
Warning: unable to access index for repository https://bioconductor.org/packages/3.14/bioc/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
Warning: unable to access index for repository https://bioconductor.org/packages/3.14/data/annotation/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/data/annotation/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
Warning: unable to access index for repository https://bioconductor.org/packages/3.14/data/experiment/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/data/experiment/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
Warning: unable to access index for repository https://bioconductor.org/packages/3.14/workflows/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/workflows/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
Warning: unable to access index for repository https://bioconductor.org/packages/3.14/books/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/books/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
Packages which are only available in source form, and may need compilation of C/C++/Fortran: ‘edgeR’ ‘BioQC’
Do you want to attempt to install these from sources? (Yes/no/cancel) Yes
installing the source packages ‘edgeR’, ‘BioQC’
downloaded 1.7 MB
downloaded 4.3 MB
The downloaded source packages are in
‘/private/var/folders/m3/_dck4dnd2rq0nggxjmg7glb80000gn/T/Rtmp6y3o3Y/downloaded_packages’
Running R CMD build
...
library(ParetoTI)
Error in library(ParetoTI) : there is no package called ‘ParetoTI’
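Because the R CMD build output is elided in the log above, the step that failed is not visible. As a first diagnostic, and only as a sketch, the following base R snippet reports which of the packages named in the log are actually installed, which helps tell a failed dependency build apart from a failed ParetoTI build.

```r
# Packages named in the install log above.
needed <- c("edgeR", "BioQC", "ParetoTI")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

If edgeR or BioQC turn up as missing, their source compilation is the likelier culprit; if only ParetoTI is missing, the elided build output is where to look next.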