
fastspar's Introduction

FastSpar


Rapid and scalable correlation estimation for compositional data

Table of contents

  • Introduction
  • Citation
  • Requirements
  • Installing
  • Usage
  • Contributors
  • License

Introduction

FastSpar is a C++ implementation of the SparCC algorithm that is up to several thousand times faster than the original Python 2 release and uses much less memory. FastSpar provides threading support and a p-value estimator that accounts for the possibility of repeated data permutations (see this paper for further details).

Citation

If you use this tool, please cite both the FastSpar paper and the original SparCC paper.

Requirements

The pre-compiled static binaries have no requirements on 64-bit Linux distributions. Otherwise, several libraries are required for building and running dynamically linked binaries; for further information, see Compiling from source.

Installing

FastSpar can be installed using conda or from source.

Conda

To install through conda, use:

conda install -c bioconda -c conda-forge fastspar
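
If you prefer to keep FastSpar isolated, a dedicated environment can be created instead; this is only a sketch, and the environment name fastspar-env is arbitrary:

conda create -n fastspar-env -c bioconda -c conda-forge fastspar
conda activate fastspar-env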

Compiling from source

Compiling from source requires these libraries and software:

C++11 (gcc-4.9.0+, clang-4.9.0+, etc)
OpenMP 4.0+
Gfortran
Armadillo 6.7+
LAPACK
OpenBLAS
GNU Scientific Library 2.1+
GNU getopt
GNU make
GNU autoconf
GNU autoconf-archive

These dependencies can be installed with the following packages on Ubuntu 20.04 (an example apt command is given after this list):

build-essential
gfortran
dh-autoreconf
libarmadillo-dev
libopenblas-openmp-dev
libgsl-dev
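
For example, on a stock Ubuntu 20.04 system these can typically be installed in one step with apt (using the package names listed above):

sudo apt install build-essential gfortran dh-autoreconf libarmadillo-dev libopenblas-openmp-dev libgsl-dev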

After meeting the above requirements, FastSpar can be compiled and installed from source as follows (a non-root alternative is sketched after these commands):

git clone https://github.com/scwatts/fastspar.git
cd fastspar
./autogen.sh
./configure --prefix=/usr/
make
make install
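
If you cannot write to /usr, a user-local prefix works as well; this is only a sketch and assumes $HOME/.local/bin is on your PATH:

./configure --prefix="$HOME/.local"
make
make install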

Once completed, the FastSpar executables can be run from the command line.

Usage

Correlation inference

To run FastSpar, you must provide absolute OTU counts in a BIOM TSV format file (with no metadata). The file fake_data.tsv (from the original SparCC implementation) is used as an example:

fastspar --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --covariance median_covariance.tsv

The number of iterations (rounds of SparCC correlation estimation) and exclusion iterations (the number of times highly correlated OTU pairs are discovered and excluded) can also be tweaked:

fastspar --iterations 50 --exclude_iterations 20 --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --covariance median_covariance.tsv

Further, the minimum threshold to exclude correlated OTU pairs can be increased:

fastspar --threshold 0.2 --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --covariance median_covariance.tsv

Calculation of exact p-values

There are several methods to calculate p-values for the inferred correlations. Here we have elected to use a robust permutation-based approach. This process involves inferring correlations from random permutations of the original OTU count data. The magnitude of each p-value is related to how often a more extreme correlation is observed for the randomly permuted data. In the example below, we calculate p-values from 1000 bootstrap correlations.
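
As a rough illustration only (this is not FastSpar's exact estimator, which additionally accounts for repeated permutations via the statmod::permp approach noted under Contributors), a simple empirical p-value from n permutations takes the form

p ≈ (b + 1) / (n + 1)

where b is the number of permuted datasets whose correlation is at least as extreme as the observed one.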

First we generate the 1000 bootstrap counts:

mkdir bootstrap_counts
fastspar_bootstrap --otu_table tests/data/fake_data.tsv --number 1000 --prefix bootstrap_counts/fake_data

And then infer correlations for each bootstrap count table (run in parallel here using all available processes; a serial alternative is sketched after the command below):

mkdir bootstrap_correlation
parallel fastspar --otu_table {} --correlation bootstrap_correlation/cor_{/} --covariance bootstrap_correlation/cov_{/} -i 5 ::: bootstrap_counts/*
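
If GNU parallel is not available, the same correlations can be inferred with a plain (serial) shell loop; this is only a sketch equivalent to the command above, where $(basename ...) plays the role of parallel's {/} placeholder:

for counts in bootstrap_counts/*; do
    fastspar --otu_table "$counts" --correlation bootstrap_correlation/cor_$(basename "$counts") --covariance bootstrap_correlation/cov_$(basename "$counts") -i 5
done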

From these correlations, the p-values are then calculated:

fastspar_pvalues --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --prefix bootstrap_correlation/cor_fake_data_ --permutations 1000 --outfile pvalues.tsv

Threading

If FastSpar is compiled with OpenMP support, threading can be enabled by passing --threads <thread_number> on the command line to several of the tools:

fastspar --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --covariance median_covariance.tsv --iterations 50 --threads 10
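
Note that --threads applies to each fastspar process individually. When many bootstrap correlations are run through GNU parallel (as above), overall CPU usage is more easily capped with parallel's -j option; a sketch, assuming 8 concurrent jobs suits your machine:

parallel -j 8 fastspar --otu_table {} --correlation bootstrap_correlation/cor_{/} --covariance bootstrap_correlation/cov_{/} -i 5 ::: bootstrap_counts/*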

Contributors

  • sritchie73
    • Advised on use of permutation based statistical testing
    • Provided an example use of statmod::permp
  • epruesse
    • Created bioconda recipe

License

GNU General Public License v3.0


fastspar's Issues

FDR correction

Hi,

Thanks for a very useful implementation of SparCC. I would like to request an option to include FDR (Benjamini-Hochberg) corrected p-values in the outputs.

Thanks,

Theo

Input data error

Hi, I'm trying to use fastspar to run some correlations on my OTU data; however, I'm getting an error and I'm not sure how to address it. Any suggestions would be very helpful.

Warning Messages
Warning: the following OTUs have only one unique permutation and it is recommended to remove them from this analysis:
(null) (row 270)

Running SparCC iterations
Running iteration: 1
Input triggered condition to perform clr correlation, this is not yet implemented

Many thanks

otu_tab.txt

Threads option ignored

Dear authors,

We are trying to run the fastspar command with the threads option but it is still running in a single thread. Do you know a way to circumvent this problem?

Best regards,

Loïc

Conda recipe installs armadillo 9 instead of 8

The conda recipe for the package installs Armadillo 9 by default and causes a shared library error. It should be fixed if you restrict this line in the recipe. Let me know if you would like me to open this issue on the bioconda-recipes repository instead.

Thank you so much for your work!

FDR correction

Hi scwatts,

Thanks for developing FastSpar! Do you recommend applying FDR correction to fastspar p-values, or is this not required?

Thanks
Erica

Running with multiple threads leads to the process getting killed

Dear Stephen,
When running FastSpar with multiple threads on our Ubuntu server the process gets killed (doesn't matter how many threads). The only error message is "Killed". When running FastSpar without threading there is no error. Any idea what could be the problem?
FastSpar is installed through conda.

Best regards, Kristian

bootstrap_correlation

Hi scwatts,
When I run the following command, it reports "fastspar: invalid argument: bootstrap_counts/data_99.tsv". Can you help me solve this issue?
Thanks.
parallel fastspar --otu_table data --correlation bootstrap_correlation/cor_data --covariance bootstrap_correlation/cov_data -i 5 ::: bootstrap_counts/*

Error during installation

Below is the log information from the compilation.

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for main in -larmadillo... yes
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking for g77... no
checking for xlf... no
checking for f77... no
checking for frt... no
checking for pgf77... no
checking for cf77... no
checking for fort77... no
checking for fl32... no
checking for af77... no
checking for xlf90... no
checking for f90... no
checking for pgf90... no
checking for pghpf... no
checking for epcf90... no
checking for gfortran... gfortran
checking whether we are using the GNU Fortran 77 compiler... yes
checking whether gfortran accepts -g... yes
checking for ranlib... ranlib
checking for ar... ar
checking the archiver (ar) interface... ar
checking for main in -lgomp... no
configure: error: could not find the library OpenMP
make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.

I have no idea how to fix this. Is there any file I should modify to deal with this issue?
I was running the script on macOS Catalina.

Details of the gcc version are shown below:

Apple clang version 11.0.0 (clang-1100.0.33.16)
Target: x86_64-apple-darwin19.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

--threshold doesn't work

No matter what threshold I tried, I got the same correlation matrix. It seems that the --threshold doesn't work.

error when bootstrap

Hi, I'm stuck at the bootstrap step. I ran the command below:
fastspar_bootstrap --otu_table rarefied_tidy_asvtable_bh_bac.txt --number 1000 --prefix bootstrap_counts/fake_data
but I got:
fastspar_bootstrap: error while loading shared libraries: libmkl_rt.so: cannot open shared object file: No such file or directory
I don't know what happened, can you please help me?

Error while running fastspar

Hi,

Thanks for working on the development of this package. I was able to run it using your fake dataset and a colleague's dataset. Now I am ready to run it on my own dataset, but I am getting a weird error.
I am running the following code:

fastspar --iterations 500 --exclude_iterations 20 --otu_table test.txt --correlation median_correlation.tsv --covariance median_covariance.tsv

And got the following error:

libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stof: no conversion
Abort trap: 6

My ASV table looks something like this with a total of 64 samples and 154 taxa. It has tab-delimited formatting.

#OTU ID Sample1 Sample2 Sample3
Acidobacteria Candidatus Koribacter 579 399 814
Acidobacteria Candidatus Koribacter versatilis 189 0 1
Acidobacteria Candidatus Solibacter 139 239 367
Acidobacteria Edaphobacter 695 17 11

Do you have any idea of what could be happening here? Can it be the spaces causing the problem?

Thanks,
Yakshi

Bootstrap correlation command issue

Hello scwatts, thank you in advance. I am following your guide with the practice data and I am stuck on the following steps:

First we generate the 1000 bootstrap counts:

mkdir bootstrap_counts
fastspar_bootstrap --otu_table tests/data/fake_data.tsv --number 1000 --prefix bootstrap_counts/fake_data
This step works fine, and the bootstrap counts are created in the bootstrap_counts folder.

And then infer correlations for each bootstrap count (running in parallel with all processes available):

mkdir bootstrap_correlation

parallel fastspar --otu_table {} --correlation bootstrap_correlation/cor_{/} --covariance bootstrap_correlation/cov_{/} -i 5 ::: bootstrap_counts/*

This last command doesn't do anything when I hit enter, and the bootstrap_correlation folder stays empty. There isn't an error message until I try to run the next command, which confirms that the folder is empty:

ERROR: number of permutations, 1000, isn't equal to the number of bootstrap correlations found, 0

Any help would be greatly appreciated.

error when fastspar_bootstrap

Hi, I hit a problem when running the bootstrap step: fastspar_bootstrap --otu_table rarefied_tidy_asvtable_bh_bac.txt --number 1000 --prefix bootstrap_counts/fake_data, but I got: fastspar_bootstrap: error while loading shared libraries: libmkl_rt.so: cannot open shared object file: No such file or directory
I installed fastspar through conda, and I can check the fastspar version, but it seems I cannot use the libmkl_rt.so file when I run fastspar_bootstrap. I checked for libmkl_rt.so and can find it under Python and also in other packages. Is that because fastspar_bootstrap uses an absolute path to find libmkl_rt.so? Could you please help me out? Any reply will be highly appreciated.

Fastspar p-values output greater than 0.05

Hello Stephen. I am a beginner in constructing microbial networks. I want to create microbial networks using these thresholds: r > 0.8, p < 0.05. I used FastSpar to generate the correlations and exact p-values. However, I find the p-values output from FastSpar strange. Most of the values are 0.1 and higher (I was expecting them to be below some threshold like 0.5 or 0.01). These are also similar to the p-values in your fake_data_exact_pvalues.tsv. How do I deal with this? Any suggestions on what to do with the p-values before I construct my network?

Returned p-values include 0?

Hello,

I was wondering under what circumstances a p-value would be returned as zero by fastspar.

I have an adjacency matrix (607 x 607), and the corresponding p-value matrix has 5461 zero values.

This has occurred in a small dataset (11 samples) but not a larger one (40 samples). Could the size of my dataset be causing problems while generating exact p-values?

Thanks,
Ben

Cannot compile on MacOS

Dear Author of fastspar,

I have been trying to compile it on a macOS system using gcc (by changing gcc in the makefile to the path of the brew-installed gcc). I always get an error about not finding the Armadillo library.

Can you please also provide a binary for macOS?

Thank you

Jianshu

installed fastspar with Anaconda but env was not created

@scwatts!

I installed fastspar with Anaconda3 on a remote computer, accessed through a server.
$ conda install -c bioconda -c conda-forge fastspar

I got this message for the environment location:
$ /home/user/anaconda3

I ran the first example:
$ fastspar --otu_table tests/data/fake_data.tsv --correlation median_correlation.tsv --covariance median_covariance.tsv
And got the following error message:

$ Starting FastSpar
Program: FastSpar (c++ implementation of SparCC)
Version 0.0.6 #and all the information of usage
fastspar: error: OTU table tests/data/fake_data.tsv does not exist

It seems the files that are expected in anaconda/envs were not loaded; a fastspar directory was not created there, and the package is only in:
~/anaconda3/pkgs

In the list of pkgs I see:
fastcache-1.1.0-py38h7b6447c_0
fastcache-1.1.0-py38h7b6447c_0.conda
fastcache-1.1.0-py38h7b6447c_0tvvjy01e
fastcache-1.1.0-py38h7b6447c_0ylnry61h
fastspar-0.0.6-0
fastspar-0.0.6-0.tar.bz2

On my local computer I used the same installation command and everything went into place; it works properly.

How can I manage to use it on the remote computer through the server?
It is Ubuntu 18.04.3 LTS (GNU/Linux 5.4.0-81-generic x86_64).

Any help is appreciated.

Differing results for different taxonomic subsets

Hi,

Apologies for coming with a question about SparCC; however, I noticed that support for SparCC has been discontinued, and as my question relates to the underlying functionality/results, hopefully it is still applicable/of interest.

I'm analysing a faecal metatranscriptomics dataset, and am interested in investigating correlations between the abundance of microbial inhabitants and 3 parasite species present in the gut across 12 samples.

I have used kraken2 to classify the reads taxonomically to derive the taxonomic abundance data.

I first attempted to use SparCC to correlate all taxa present (excluding host), around 20,000 taxa (all data converted to genus-level taxa, with those not classified to this level set as "unclassified"). This proved prohibitively slow when calculating pseudo p-values.

Q1: would this be a realistic size of dataset to investigate with fastspar?

To reduce the number of taxa investigated, I attempted 2 methods:

i) include only bacterial taxa in analysis + the parasites of interest

ii) exclude only animal taxa from the analysis

From these tables, I took only the most abundant taxa (roughly 300-500 taxa in total) by excluding taxa with low abundance across all samples (a taxon had to make up at least 0.005% of total reads in at least one sample to be considered).

Note that for all my analyses, I included a taxon in my input to SparCC called "unclassified/other", which was the difference between the total library size and the total of all the "classified" taxa.

Q2: is including the "unclassified/other" category appropriate for use with sparCC/fastspar?

The dataset excluding animals contains 462 taxa, whereas the bacteria-only set has 358.

I have noticed substantial differences in the correlations calculated with the 2 datasets. The datasets are very similar, i.e. same number of total reads for each sample, except a few additional taxa are included rather than merged with the "unclassified/other" group.

for example, here are the correlations for 10 taxa vs my parasite of interest in the bacteria-only table:

Absiella        -0.3280080
Acaryochloris   -0.7163737
Acetobacter     -0.4053745
Acholeplasma     0.5991498
Achromobacter    0.6846492
Acidaminococcus -0.3106545
Acidovorax       0.2301384
Acinetobacter    0.3902558
Actinobacillus  -0.6962667
Actinomyces      0.6891886

and the matching correlations in the no animals table:

Absiella        -0.41840805
Acaryochloris   -0.76625668
Acetobacter     -0.61624764
Acholeplasma     0.45872736
Achromobacter    0.12130576
Acidaminococcus -0.48370664
Acidovorax      -0.32701231
Acinetobacter    0.01257128
Actinobacillus  -0.70974461
Actinomyces      0.47870636

Often the correlations approximately correspond, but there are a few cases in which the result is completely different. I'm visualising my results as a correlation network with threshold cutoffs for correlation strength and p-value, and the differences in correlation dramatically affect the results. One parasite species forms significant correlations with a number of bacterial genera in the bacteria-only table, but with only one bacterial genus in the no-animals table (a correlation which isn't significant in the bacteria-only table).

Q3: Could you aid my understanding of the difference? The additional taxa won't affect the read counts for the existing taxa; they're only taken from the "unclassified/other" bin, so I naively assume any detected correlations should be present in both. Which, if either, method would you consider the more accurate?

Apologies for the feedback, but any greater understanding of the sparCC/fastspar method you could provide would be much appreciated

Add option for different input formats

Please add the ability to process .csv and .biom files, as they are quite commonly used in the field for data representation.

Thank you for such an elegant piece of software!

Diagonal of p-values matrix is 1

Hi, thank you very much for the tool. I have a question about the p-value results. I computed the correlations for a table of transformed absolute counts. However, the diagonal of the p-value matrix is 1 for all self-comparisons, so ASV1-ASV1, ASV2-ASV2 and so on have correlation coefficients of 1, but the p-value for those correlations is also 1. Shouldn't the p-value in these cases be very close to 0? Am I missing something?
Thanks!

The --threads argument seems to be ignored

I'm seeing no connection between the --threads value and the CPU usage from fastspar.

Here 20 threads should correspond to 2000% CPU, but it's getting 4500%

USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                       
jmeppley  20   0 5408020   1.6g   7084 R  4524   0.1   1:34.09 fastspar --threshold 0.2 --otu_table family.FullData_totalsfastpar.tsv --threads 20 --correlation  fa

I'm running version 0.0.10 (from conda) on Fedora 30.

This is a problem for our shared system. I'd like to be able to run all the bootstrap calculations using something like 2/3 of the total CPUs, leaving the rest for other users. However, no matter what I put for --threads, it uses the whole system.

Inferred correlations not being created for all bootstrap count tables?

I'm trying to calculate p-values following the usage instructions. The 1000 bootstrap count tables are created without issue.

I do get this warning while inferring correlations: "Warning: the following OTUs have only one unique permutation and it is recommended to remove them from this analysis:". However, it still seems to proceed. I've filtered the data as suggested, but still receive some of these warnings. Is there something else I should do to address this?

When calculating p-values for 1000 bootstrapped tables I run into this error: "ERROR: number of permutations, 1000, isn't equal to the number of bootstrap correlations found, 812". 812 is equal to the number of "only one unique permutation and it is recommended..." messages received. If I change --permutations to 812 for fastspar_pvalues it does seem to work, but I'm not sure what is happening with the other 188 tables?

Cannot install fastspar with anaconda

Hi,

I am trying to install fastspar using the provided code:

conda install -c bioconda -c conda-forge fastspar

but get this output:

(qiime2-2020.2) [sdegregori@surfingmantis scnic]$ conda install -c bioconda -c conda-forge fastspar
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: - failed with initial frozen solve. Retrying with flexible solve.

And it will stay in this loop until I end it. Any help on this issue would be much appreciated.

Error generating 1000 bootstrap counts

Hello,

I encountered an error while creating the bootstrap counts:

fastspar_bootstrap: error while loading shared libraries: libmkl_rt.so.2: cannot open shared object file: No such file or directory

I installed the package using conda, so all dependencies should be installed. Maybe I am missing something obvious here.

Best

Samuel

libmkl_rt.so Error

Hello,

I am receiving this error when I try to run fastspar_bootstrap and fastspar_pvalues:
error while loading shared libraries: libmkl_rt.so: cannot open shared object file: No such file or directory

It seems I'm not the only one to run into this problem.
https://githubmemory.com/repo/scwatts/fastspar/issues/28
https://githubmemory.com/repo/scwatts/fastspar/issues/23

I installed fastspar using
conda create -n fastspar2 -c bioconda -c conda-forge -y fastspar

My conda environment looks like this...

# packages in environment at /data/user/gtwa/.conda/envs/fastspar2:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
armadillo                 9.900.5              h7c03176_0    conda-forge
arpack                    3.7.0                hdefa2d7_2    conda-forge
fastspar                  1.0.0                h6d2aef6_1    bioconda
gsl                       2.6                  he838d99_2    conda-forge
libblas                   3.9.0           11_linux64_openblas    conda-forge
libcblas                  3.9.0           11_linux64_openblas    conda-forge
libgcc-ng                 11.2.0               h1d223b6_9    conda-forge
libgfortran-ng            11.2.0               h69a702a_9    conda-forge
libgfortran5              11.2.0               h5c6108e_9    conda-forge
liblapack                 3.9.0           11_linux64_openblas    conda-forge
libopenblas               0.3.17          openmp_h3d5035f_1    conda-forge
libstdcxx-ng              11.2.0               he4da1e4_9    conda-forge
llvm-openmp               12.0.1               h4bd325d_1    conda-forge
openblas                  0.3.17          openmp_h35c1ac2_1    conda-forge
parallel                  20210822             ha770c72_0    conda-forge
perl                      5.32.1          0_h7f98852_perl5    conda-forge
superlu                   5.2.2                h16cfea0_0    conda-forge

I'm not sure what libmkl_rt.so is, why it seems to be missing, or why I can't seem to resolve it. Please help.

Strange correlations of -1 and 1

Hi. I am trying to reconstruct a network from an OTU table, using fastspar to obtain the correlations. However, I get strange results. For instance, many correlations are 1 or -1. A look into these OTUs shows there is variability in their corresponding data (I filtered out any OTUs with more than one 0). It is not very likely that the abundances of these bacteria are perfectly correlated, or perfectly uncorrelated; even if they were, there must be some measurement error.
Furthermore, these results seem to be inconsistent. For example, this is observed:
OTU A vs OTU B: fastspar correlation is -1
OTU A vs OTU C: fastspar correlation is -1
OTU B vs OTU C: fastspar correlation is not 1
If pairs A vs B and A vs C are perfectly negatively correlated, shouldn't B and C be perfectly positively correlated?

Also, I would expect the p values of perfect correlations (values of 1 or -1) to be 0, or the minimum attainable with the permutation number I used, i.e. 0.001, but they are variable.

I post an example, which I think is reproducible:

In the terminal:

fastspar --otu_table filtered_sanos.txt --correlation median_correlation_sanos.tsv --covariance median_covariance_sanos.tsv # Fastspar run to get the correlation matrix

filtered_sanos.txt

In R:

c_healthy <- read.table("median_correlation_sanos.tsv", sep ="\t") # read correlations into R
c_healthy_matrix <- as.matrix(c_healthy[,2:245]) # Take the numeric part
diag(c_healthy_matrix) <- 0 # Make the diagonal 0
negative_one <- which(c_healthy_matrix == -1, arr.ind = T) # From here I chose the following

Example:

c_healthy_matrix[240,11]
c_healthy_matrix[240,12]
c_healthy_matrix[11,12]

I get the following:

c_healthy_matrix[240,11]
V12
-1
c_healthy_matrix[240,12]
V13
-1
c_healthy_matrix[11,12]
V13
0.661

When I compute the Pearson correlations for the same OTUs (normalised by dividing by sample totals), I get positive values for the three cases used as examples.

So, my questions are:
Why are these correlations of -1 and 1 occurring in fastspar? Is there an issue with these data, or an error in the way fastspar was used?
How do I interpret these results? Are these OTUs really negatively correlated?

Thanks for your time.

fastspar enhancement or workaround

Hi and thank you for the tool!

fastspar doesn't allow the upload of any metadata, but I'd like to test the correlation of a variable.

Could I add a continuous variable to the taxonomic compositional data so that the species correlations with it will also be calculated?

In other words: How bad would it be to treat a continuous variable (such as weight) as compositional even if it's not?

Best,
Dany

Problem installing with brew

I'm trying to do brew install https://raw.githubusercontent.com/scwatts/fastspar/master/scripts/fastspar.rb but am getting a SHA error...

######################################################################## 100.0%
==> Downloading https://github.com/scwatts/fastspar/archive/v0.0.5.tar.gz
==> Downloading from https://codeload.github.com/scwatts/fastspar/tar.gz/v0.0.5
######################################################################## 100.0%
Error: SHA256 mismatch
Expected: c6fafdff151fa3dcf430fb4c593d392ac27f721807ac4f49d3412c96d25a8389
Actual: c4cc7682720f566da7587e555b58a688671a97235d00c33d042a7f2cd6cef20a
Archive: /Users/pschloss/Library/Caches/Homebrew/fastspar-0.0.5.tar.gz
To retry an incomplete download, remove the file above.

Would it be possible to get a quick fix?

Does SPARCC require sample independence?

Hi, thank you for developing such a useful piece of software.
I am a bit confused about whether SparCC can be used to calculate species correlations between non-independent samples, e.g. time-series samples from the same mice?
Thanks for your reply.

Multithreading causes memory allocation fault

It works sometimes, and the fewer threads I specify, the more likely it is to run to completion. A single-threaded command runs every time. Please help.

pwilburn$ /usr/local/miniconda3/bin/fastspar -t 12 --otu_table data.txt --correlation data.cor.txt --covariance data.cov.txt
Starting FastSpar
Running SparCC iterations
	Running iteration: 2
	Running iteration: 1
	Running iteration: 4
	Running iteration: 8
	Running iteration: 7
	Running iteration: 10
	Running iteration: 5
	Running iteration: 9
	Running iteration: 6
	Running iteration: 11
	Running iteration: 12
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault: 11

Alternate table dimension generated during cor/cov calculation of bootstrap tables?

Hi,

I've encountered this error message at the last stage of the pipeline, during p-value calculation:

error: relational operator: incompatible matrix dimensions: 1575x1575 and 6110x6110
terminate called without an active exception
Aborted (core dumped)

I then checked the dimensions of the cor_otu_tab##.tsv files and indeed there are two different dimensions, 1575 and 6110.

The OTU table and the bs_counts files are all 6110x8 (6110 OTUs and 8 samples).

Is there a reason why alternative dimensions are being generated?
And are there ways to solve this issue?

Many thanks

Below, I attach the original OTU table used in fastspar pipeline
GI_otu_tab.txt
