daliangning / icamp1

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Version 1)

License: GNU General Public License v2.0

R 100.00%

icamp1's Introduction

iCAMP

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Latest version 1.6.5, 2023-12-23)

Daliang Ning

Downloaded 22,467 times from 2020.9.9 to 2024.1.9.
Key publications:
- Ning, D., Yuan, M., Wu, L. et al. A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications 11, 4717 (2020). https://doi.org/10.1038/s41467-020-18560-z iCAMP was developed in this study (Data&Code)
- Ning D, Wang Y, Fan Y, et al. Environmental stress mediates groundwater microbial community assembly. Nature Microbiology (2024). https://doi.org/10.1038/s41564-023-01573-x A general framework for the stress-assembly relationship was proposed in this study (Data&Code)
Recommendation: NST (stochasticity assessment tool)

News

2024.1.11 Our paper about stress-assembly relationship is published on Nature Microbiology today. https://doi.org/10.1038/s41564-023-01573-x
2023.9.4 iCAMP v1.6.4 is uploaded. More functions for big data; added options for main function to save intermediate results for resuming after unexpected break.
2022.6.1 iCAMP v1.5.12 is published on CRAN.
2022.4.10 iCAMP v1.5.7 is uploaded to CRAN.
2021.10.28 Studies using iCAMP are published on Ecology Letters (https://doi.org/10.1111/ele.13904) and Water Research (https://doi.org/10.1016/j.watres.2021.117744; https://doi.org/10.1016/j.watres.2021.117295)
2021.4.18 iCAMP v1.4.3 updated on github, to allow relative abundances in community matrix and community data transformation.
2021.4.15 iCAMP is highlighted on DOE website.
2021.4.1 Frontiers in Microbiology opens a research topic: Community Assembly Mechanisms Shaping Microbiome Spatial or Temporal Dynamics.
2021.1.9 iCAMP v1.3.4 updated on CRAN, improved function qpen and added function icamp.cate to summarize results for different categories (e.g. core vs rare taxa). A new example is added to the subfolder Examples.
2020.9.22 Media reports: OU-VPR news, EurekAlert!, Phy.Org
2020.9.18 iCAMP paper is published on Nature Communications. https://doi.org/10.1038/s41467-020-18560-z
2020.9.9 iCAMP v1.2.8 has been published on CRAN. https://cran.r-project.org/web/packages/iCAMP/
2020.8.25 iCAMP v1.2.5 fixed some typos and a memory.limit issue, and was resubmitted to CRAN.
2020.8.24 iCAMP v1.2.4 has been submitted to CRAN.
2020.8.23 upload iCAMP package (v1.2.4) and the code/data for the first iCAMP manuscript to GitHub.

Key functions in iCAMP package

iCAMP: Quantify relative importance of basic community assembly processes at both community and phylogenetic group ('bin') levels.
- Based on phylogenetic marker gene sequencing results, e.g. OTU or ASV table and phylogenetic tree from 16S sequencing data.
- The processes include homogeneous and heterogeneous selection, homogenizing and limited dispersal, and 'drift' (drift and other processes).
- Quantitative for each turnover (between two samples) at community level, and for each phylogenetic bin in a group of samples.
- Each phylogenetic bin is usually a group of taxa (a few dozen to a few hundred OTUs or ASVs) from a family or order.
- key function: icamp.big (Ning et al 2020 Nat Commun)
To implement some other published methods
- NP: Neutral taxa percentage, i.e. number or relative abundance of taxa following neutral theory model.
  - developed by Burns et al (2016 ISME J), based on a neutral theory model (Sloan et al 2006 EM).
  - I add options to perform a bootstrapping test and re-define taxa abundance profile in one or multiple metacommunities (regional pools).
  - function: snm.comm
- QPEN: quantifying community assembly processes based on entire-community null model analysis.
  - developed by Stegen et al (2013 ISME J, 2015 Front Microbiol).
  - I add options to handle big datasets and re-define taxa abundance profile in the metacommunity.
  - function: qpen
- tNST and pNST: taxonomic and phylogenetic normalized stochasticity ratio.
  - Not in iCAMP, but in another of our R packages, NST.
  - We developed NST (Ning et al 2019 PNAS) from previous stochasticity ratio (Zhou et al 2014 PNAS).
Some handy functions for big datasets
- phylogenetic and taxonomic null model analysis at both community and bin levels
  - functions: bNTIn.p, bNTI.bin.big, bNRIn.p, bNRI.bin.big, RC.pc, RC.bin.bigc
- between-taxa niche difference and phylogenetic distance of big communities
  - functions: dniche, pdist.big
- phylogenetic signal test within phylogenetic groups
  - function: ps.bin
- midpoint root of big trees
  - function: midpoint.root.big
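
To make the process categories listed above concrete, here is a minimal base-R sketch of the decision rule described in the cited publications (Stegen et al. 2013; Ning et al. 2020): a phylogenetic null-model index beyond a significance cutoff indicates selection, and otherwise the taxonomic index RC separates dispersal processes from drift. The helper names `ses` and `classify_turnover`, the process labels, and the toy values are hypothetical. The cutoffs 1.96 and 0.95 are commonly used defaults. This is not the package's implementation, which applies the rule within each bin and then weights bins by abundance.

```r
# Standardized effect size of an observed metric against its null distribution.
ses <- function(obs, null) (obs - mean(null)) / sd(null)

# Assign one process to a single pairwise comparison (hypothetical helper).
classify_turnover <- function(phylo.index, rc, ses.cut = 1.96, rc.cut = 0.95) {
  if (phylo.index < -ses.cut) return("Homogeneous.Selection")
  if (phylo.index >  ses.cut) return("Heterogeneous.Selection")
  if (rc >  rc.cut) return("Dispersal.Limitation")
  if (rc < -rc.cut) return("Homogenizing.Dispersal")
  "Drift.and.Others"
}

# Toy index values for five pairwise comparisons (made up for illustration).
toy <- data.frame(bNRI = c(-2.5, 2.3, 0.4, -0.8, 1.1),
                  RC   = c(0.10, 0.99, 0.97, -0.99, 0.20))
toy$process <- mapply(classify_turnover, toy$bNRI, toy$RC)
print(toy)
```

Note that the phylogenetic index is checked first, so a comparison with a significant phylogenetic signal is assigned to selection regardless of its RC value.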

How to use

System requirements

Operating systems: Windows, or Mac, or Linux, any versions which can run R (version >= 3.2).
Dependencies: R (version >=3.5; https://www.r-project.org/), R packages: vegan, parallel, permute, ape, bigmemory, nortest, minpack.lm, Hmisc, stats4, DirichletReg, data.table.
- R package NST is necessary to run the functions tNST and pNST in the example, but not required for running package iCAMP.
iCAMP current version 1.5.12 has been tested on the current development version of R (4.3.0 pre-release) and R 4.2.1.
Any required non-standard hardware: No. However, if you are dealing with a large dataset (e.g. >20,000 taxa), a server with enough CPU threads (e.g. >=20) is preferred to finish the calculation in reasonable time.
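
Before a long run it can help to confirm the R version and see how many threads are available. The following is a small sketch using only base R and the bundled `parallel` package. The 3.5.0 threshold is taken from the dependency line above, and the variable names and the "leave one thread free" convention are assumptions for illustration.

```r
# Compare the running R version with the minimum stated for the dependencies.
r.ok <- getRversion() >= "3.5.0"

# detectCores() can return NA on some platforms, so fall back to 1.
n.cores <- parallel::detectCores()
if (is.na(n.cores)) n.cores <- 1L

# Leave one thread free for the operating system (an arbitrary convention).
nworker <- max(1L, n.cores - 1L)

cat("R version OK:", r.ok, "\n")
cat("Detected threads:", n.cores, "; suggested nworker:", nworker, "\n")
```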

Installation guide

Download and install R (https://www.r-project.org/).
Install iCAMP.
- Install published iCAMP (version<=1.5.12): Open R, use function "install.packages" as below.
```
install.packages("iCAMP")
```
- Install from source file (version>=1.4.1):
  - Download an iCAMP version from this repository iCAMP1/RPackage/AllVersions.
  - Open R, install or update following packages: vegan, parallel, permute, ape, bigmemory, nortest, minpack.lm, Hmisc, stats4, DirichletReg, data.table.
```
install.packages(c("vegan", "permute", "ape", "bigmemory", "nortest", "minpack.lm", "Hmisc", "stats4", "DirichletReg", "data.table"))
```
  - In R, click Packages/install package from local file, then select the file. For Windows, select the .zip file. For Mac/Linux, select the .gz file. Alternatively, on a Linux system, if you open R in a terminal, use the following command to install from the .gz file (revise "/Path/to/the/folder" to the real path of the .gz file on your computer, revise "xxx" to the version number of iCAMP):
```
install.packages(pkgs="/Path/to/the/folder/iCAMP_xxx.tar.gz", repos = NULL, type="source")
```
The whole installation typically takes several minutes. Usually, <5 min for R installation, <1 min for the iCAMP package, <5 min for installation of other packages.
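
To avoid reinstalling packages that are already present, the dependency step can be narrowed to what is missing. This is a minimal sketch in base R. The helper name `missing_pkgs` is hypothetical, and `parallel` and `stats4` are left out of the list because they ship with R itself.

```r
deps <- c("vegan", "permute", "ape", "bigmemory", "nortest", "minpack.lm",
          "Hmisc", "DirichletReg", "data.table")

# Return the subset of `pkgs` that is not installed in any library path.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to.install <- missing_pkgs(deps)
if (length(to.install) > 0) {
  message("Missing packages: ", paste(to.install, collapse = ", "))
  # Only install in an interactive session, so sourcing this has no side effects.
  if (interactive()) install.packages(to.install)
} else {
  message("All listed dependencies are already installed.")
}
```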

Instructions for use

Before analyzing your own data with iCAMP, you may go through a simple example dataset in the folder /Examples/SimpleOTU.
When analyzing your own data, check the format of the example data files (otu.txt, tree.nwk, treat2col.txt, and environment.txt) in the folder "SimpleOTU". Revise your data files to the same format. It is fine if you do not have environment factor information, just pay attention to the notes specific to no-env.file situation in the file "icamp.test.r".
Change the folder paths and file names in the "icamp.test.r" to your data as indicated.
Change the thread number for parallel computing, memory limitation, and other parameter setting according to your need. You may check the help document of each function for detailed explanation.
Run the codes and check the output files in the output folder you've specified. You may check the ReadMe.md in /Examples/SimpleOTU for the meaning of each output file, as well as the help documents in the R package for details.
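
Mismatched IDs between input files can cause errors later in a run, so a quick consistency check after reformatting your data may save time. The sketch below uses toy in-memory objects rather than the example files, and assumes a community matrix with samples in rows and taxa in columns. The helper `check_ids` and all object names are hypothetical; with a real tree, the tip labels would come from `tree$tip.label`.

```r
# Toy stand-ins for the inputs; real data would be read from your files.
comm <- matrix(c(5, 0, 3,
                 2, 7, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3")))
tip.labels <- c("OTU1", "OTU2", "OTU4")
treat <- data.frame(treatment = c("A", "B"), row.names = c("S1", "S3"))

# List IDs that appear in one input but not in the other.
check_ids <- function(comm, tip.labels, treat) {
  list(taxa.not.in.tree     = setdiff(colnames(comm), tip.labels),
       tips.not.in.comm     = setdiff(tip.labels, colnames(comm)),
       samples.not.in.treat = setdiff(rownames(comm), rownames(treat)),
       treat.not.in.comm    = setdiff(rownames(treat), rownames(comm)))
}

res <- check_ids(comm, tip.labels, treat)
str(res)
```

An empty result in every element means the IDs line up across the three inputs.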

Publications

Our studies

Ning D, Yuan M, Wu L, Zhang Y, Guo X, Zhou X, Yang Y, Arkin AP, Firestone MK, and Zhou J. 2020. A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications 11, 4717. https://doi.org/10.1038/s41467-020-18560-z.
Aslani F, Geisen S, Ning D, Tedersoo L, and Bahram M. 2021. Towards revealing the global diversity and community assembly of soil eukaryotes. Ecology Letters https://doi.org/10.1111/ele.13904.
Wang A, Shi K, Ning D, Cheng H, Wang H, Liu W, Gao S, Li Z, Han J, Liang B, and Zhou J. 2021. Electrical selection for planktonic sludge microbial community function and assembly. Water Research 206, 117744. https://doi.org/10.1016/j.watres.2021.117744.
Ceja-Navarro JA, Wang Y, Ning D, Arellano A, Ramanculova L, Yuan MM, Byer A, Craven KD, Saha MC, Brodie EL, Pett-Ridge J, and Firestone MK. 2021. Protist diversity and community complexity in the rhizosphere of switchgrass are dynamic as plants develop. Microbiome 9, 96. https://doi.org/10.1186/s40168-021-01042-9.
Sun C, Zhang B, Ning D, Zhang Y, Dai T, Wu L, Li T, Liu W, Zhou J, and Wen X. 2021. Seasonal dynamics of the microbial community in two full-scale wastewater treatment plants: Diversity, composition, phylogenetic group based assembly and co-occurrence pattern. Water Research 200, 117295. https://doi.org/10.1016/j.watres.2021.117295.

Other examples

2022

Wang Y, Li S, Lang X, Huang X, and Su J. 2022. Effects of microtopography on soil fungal community diversity, composition, and assembly in a subtropical monsoon evergreen broadleaf forest of Southwest China. CATENA 211, 106025. https://doi.org/10.1016/j.catena.2022.106025.
Song T, Liang Q, Du Z, Wang X, Chen G, Du Z, and Mu D. 2022. Salinity Gradient Controls Microbial Community Structure and Assembly in Coastal Solar Salterns. Genes 13, 385. https://doi.org/10.3390/genes13020385.
Lv B, Shi J, Li T, Ren L, Tian W, Lu X, Han Y, Cui Y, and Jiang T. 2022. Deciphering the characterization, ecological function and assembly processes of bacterial communities in ship ballast water and sediments. Science of the Total Environment 816, 152721. https://doi.org/10.1016/j.scitotenv.2021.152721.
Ju Z, Du X, Feng K, Li S, Gu S, Jin D, and Deng Y. 2021. The Succession of Bacterial Community Attached on Biodegradable Plastic Mulches During the Degradation in Soil. Frontiers in Microbiology 12 https://doi.org/10.3389/fmicb.2021.785737.
Chen S, Tao J, Chen Y, Wang W, Fan L, and Zhang C. 2022. Interactions Between Marine Group II Archaea and Phytoplankton Revealed by Population Correlations in the Northern Coast of South China Sea. Frontiers in Microbiology 12 https://doi.org/10.3389/fmicb.2021.785532.
Zhang S, Li K, Hu J, Wang F, Chen D, Zhang Z, Li T, Li L, Tao J, Liu D, and Che R. 2022. Distinct assembly mechanisms of microbial sub-communities with different rarity along the Nu River. Journal of Soils and Sediments https://doi.org/10.1007/s11368-022-03149-4.

2021

Stopnisek N and Shade A. 2021. Persistent microbiome members in the common bean rhizosphere: an integrated analysis of space, time, and plant genotype. The Isme Journal 15, 2708-2722. https://doi.org/10.1038/s41396-021-00955-5.
Dong Y, Sanford RA, Connor L, Chee-Sanford J, Wimmer BT, Iranmanesh A, Shi L, Krapac IG, Locke RA, and Shao H. 2021. Differential structure and functional gene response to geochemistry associated with the suspended and attached shallow aquifer microbiomes from the Illinois Basin, IL. Water Research 202, 117431. https://doi.org/10.1016/j.watres.2021.117431.
Zhu D, Delgado-Baquerizo M, Ding J, Gillings MR, and Zhu Y-G. 2021. Trophic level drives the host microbiome of soil invertebrates at a continental scale. Microbiome 9, 189. https://doi.org/10.1186/s40168-021-01144-4.
Sun Y, Zhang M, Duan C, Cao N, Jia W, Zhao Z, Ding C, Huang Y, and Wang J. 2021. Contribution of stochastic processes to the microbial community assembly on field-collected microplastics. Environmental Microbiology 23, 6707-6720. https://doi.org/10.1111/1462-2920.15713.
Macia-Vicente JG and Popa F. 2022. Local endemism and ecological generalism in the assembly of root-colonizing fungi. Ecological Monographs 92, e1489. https://doi.org/10.1002/ecm.1489.
Yi M, Fang Y, Hu G, Liu S, Ni J, and Liu T. 2021. Distinct community assembly processes underlie significant spatiotemporal dynamics of abundant and rare bacterioplankton in the Yangtze River. Frontiers of Environmental Science & Engineering 16, 79. https://doi.org/10.1007/s11783-021-1513-4.
Zheng L, Wang X, Ding A, Yuan D, Tan Q, Xing Y, and Xie E. 2021. Ecological Insights Into Community Interactions, Assembly Processes and Function in the Denitrifying Phosphorus Removal Activated Sludge Driven by Phosphorus Sources. Frontiers in Microbiology 12 https://doi.org/10.3389/fmicb.2021.779369.
Matar GK, Ali M, Bagchi S, Nunes S, Liu W-T, and Saikaly PE. 2021. Relative Importance of Stochastic Assembly Process of Membrane Biofilm Increased as Biofilm Aged. Frontiers in Microbiology 12 https://doi.org/10.3389/fmicb.2021.708531.
Yuan H, Li T, Li H, Wang C, Li L, Lin X, and Lin S. 2021. Diversity Distribution, Driving Factors and Assembly Mechanisms of Free-Living and Particle-Associated Bacterial Communities at a Subtropical Marginal Sea. Microorganisms 9, 2445. https://doi.org/10.3390/microorganisms9122445.
Wang Y, Lu G, Yu H, Du X, He Q, Yao S, Zhao L, Huang C, Wen X, and Deng Y. 2021. Meadow degradation increases spatial turnover rates of the fungal community through both niche selection and dispersal limitation. Science of the Total Environment 798, 149362. https://doi.org/10.1016/j.scitotenv.2021.149362.
Xie J, Wang X, Xu J, Xie H, Cai Y, Liu Y, and Ding X. 2021. Strategies and Structure Feature of the Aboveground and Belowground Microbial Community Respond to Drought in Wild Rice (Oryza longistaminata). Rice 14, 79. https://doi.org/10.1186/s12284-021-00522-8.
Zhou R, Wang H, Wei D, Zeng S, Hou D, Weng S, He J, and Huang Z. 2021. Bacterial and eukaryotic community interactions might contribute to shrimp culture pond soil ecosystem at different culture stages. Soil Ecology Letters https://doi.org/10.1007/s42832-021-0082-6.

End

icamp1's Issues

an issue in bNTI.big.r

Hi
Great package!
I found that there may be an unnecessary variable "c1" in the lapply function on Line 95 of the bNTI.big.r file. This may lead to an error when nworker=1.

Chi

Error in serialize(data, node$con) : ignoring SIGPIPE signal

Hello, I get the same error. I checked the node number in the tree; it isn't abnormally low compared to the tip number.

The output looks like this:

----bMPD bin i=15 in 435 ---- Tue Aug 1 21:42:18 2023
Now calculating observed betaMPD. Begin at Tue Aug 1 21:42:18 2023. Please wait...
Now randomizing by parallel computing. Begin at Tue Aug 1 21:43:31 2023. Please wait...
Now fixing special cases. Begin at Tue Aug 1 21:44:54 2023. Please wait...
All match very well.
Now calculating observed MPD. Begin at Tue Aug 1 21:44:55 2023. Please wait...
Error in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 15 of 15 workers failed to connect.
Error in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 14 of 15 workers failed to connect.
Now randomizing by parallel computing. Begin at Tue Aug 1 21:47:54 2023. Please wait...
Error in serialize(data, node$con) : ignoring SIGPIPE signal

Can you give me some advice?

Issue with Cluster on R4.03 MacOS Big Sur

This package ran excellently on a previous version of R, but after updating to 4.03 and reinstalling packages, I am now getting the following error:

Error in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 4 of 4 workers failed to connect.

I saw on a different page that this issue can be resolved by specifying the cluster setup strategy:
parallel::makeCluster(4, setup_strategy = "sequential")

But this did not seem to work for me.

Thanks!
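
One way to separate a problem with R's parallel back end from a problem in iCAMP is to test cluster creation on its own. The sketch below uses only the bundled `parallel` package. The helper names `square` and `can_start_cluster` are hypothetical, and the 20-second `setup_timeout` is an arbitrary choice so that a failure is reported quickly.

```r
library(parallel)

# Trivial task, defined at top level so only the function itself is sent to workers.
square <- function(i) i^2

# Try to start `n` PSOCK workers and run the task on them.
# Extra arguments in `...` are passed on to makeCluster().
can_start_cluster <- function(n = 2, ...) {
  cl <- tryCatch(makeCluster(n, ...), error = function(e) {
    message("Cluster setup failed: ", conditionMessage(e))
    NULL
  })
  if (is.null(cl)) return(FALSE)
  on.exit(stopCluster(cl))
  res <- parLapply(cl, seq_len(n), square)
  identical(unlist(res), seq_len(n)^2)
}

ok <- can_start_cluster(2, setup_timeout = 20)
cat("Cluster usable:", ok, "\n")
```

If this fails outside iCAMP as well, the cause is likely in the R installation or system configuration rather than in the package. The same helper can be called with `setup_strategy = "sequential"` to try the option quoted in the post.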

Can metagenomic data be used for iCAMP?

Hi,
Can metagenomic data, or OTU/ASV data from markers other than 16S, be used for iCAMP? Thank you!

Error in com.bin[[i]]: attempt to select less than one element in integerOneIndex

Dear Ning

I am trying to use this package to estimate microbial community assembly, but when I run taxa.binphy.big, the value of bin.id.new for each OTU is "NA". I also tried icamp.big, but an error message "Error in com.bin[[i]]: attempt to select less than one element in integerOneIndex." occurred. Hope you could help me to fix the error. Thank you for your time!

Here is the code I used:
ds = 0.2
bin.size.limit = 12
sig.index="Confidence"
phylobin=taxa.binphy.big(tree = tree, pd.desc = pd.big$pd.file,pd.spname = pd.big$tip.label,
pd.wd = pd.big$pd.wd, ds = ds, bin.size.limit = bin.size.limit)
sp.bin=phylobin$sp.bin[,3,drop=FALSE]
sp.bin
icres=iCAMP::icamp.big(comm=comm, pd.desc = pd.big$pd.file, pd.spname=pd.big$tip.label,
pd.wd = pd.big$pd.wd, rand = 999, tree=tree,
ds = 0.2, pd.cut = NA, sp.check = TRUE,
phylo.rand.scale = "within.bin", taxa.rand.scale = "across.all",
phylo.metric = "bMPD", sig.index=sig.index, bin.size.limit = bin.size.limit,
rtree.save = FALSE, detail.save = TRUE,
qp.save = FALSE, detail.null = FALSE, ignore.zero = TRUE, output.wd = save.wd,
correct.special = TRUE, unit.sum = rowSums(comm), special.method = "depend",
ses.cut = 1.96, rc.cut = 0.95, conf.cut=0.975, omit.option = "no",meta.ab = NULL)

Besides, for QPEN calculation, there is no qpen.test() function for significance testing.

Results with bin.size.limit set to 12 and 72 were the same

Hi,
When I set the bin.size.limit parameter to 12 and to 72, the results were the same, and very different from pNST. However, the pNST result is more consistent with what my other results suggest.

Therefore, I would like to know whether the result can be brought closer to the pNST result by changing bin.size.limit to 36 or 48.

In addition, if I want to use the pNST results, how can I get the contributions of the various processes for rare or abundant species, as in iCAMP pipeline step 15 (summarize core, rare, and other taxa)?

Error while computing bNRI for strict bins

The function icamp.big fails when restricted to strict bins only, with the R error: Invalid 'type' (list) of argument.
This seems to be because of how the community table is subsetted based on the strict bin IDs.

bin.lev = levels(as.factor(sp.bin[, 1]))
bin.num = length(bin.lev)
com.bin = lapply(1:bin.num, function(i) {
comm[, match(rownames(sp.bin)[which(sp.bin == i)], colnames(comm))]
  })

should instead be (if I am not mistaken)

bin.lev = levels(as.factor(sp.bin[, 1]))
bin.num = length(bin.lev)
com.bin = lapply(1:bin.num, function(i) {
comm[, match(rownames(sp.bin)[which(sp.bin == bin.lev[i])], colnames(comm))]
  })
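
The difference between the two versions can be reproduced on toy data. The sketch below is base R only and uses made-up bin IDs (3 and 7) to mimic a case where strict bin IDs do not run from 1 to the number of bins. `drop = FALSE` is added here so that the result stays a matrix even when no column matches. Whether this is the actual cause of the reported error in the package is the poster's hypothesis, not something verified here.

```r
# Toy bin assignment: four taxa in two strict bins whose IDs are 3 and 7.
sp.bin <- matrix(c(3, 3, 7, 7), ncol = 1,
                 dimnames = list(paste0("OTU", 1:4), "bin.id"))
comm <- matrix(1:8, nrow = 2,
               dimnames = list(c("S1", "S2"), paste0("OTU", 1:4)))

bin.lev <- levels(as.factor(sp.bin[, 1]))
bin.num <- length(bin.lev)

# Indexing by position i looks for bins named 1 and 2, which do not exist here.
by.position <- lapply(1:bin.num, function(i) {
  comm[, match(rownames(sp.bin)[which(sp.bin == i)], colnames(comm)), drop = FALSE]
})

# Indexing by the actual bin label recovers the taxa of each bin.
by.label <- lapply(1:bin.num, function(i) {
  comm[, match(rownames(sp.bin)[which(sp.bin == bin.lev[i])], colnames(comm)), drop = FALSE]
})

sapply(by.position, ncol)  # number of taxa selected per bin, by position
sapply(by.label, ncol)     # number of taxa selected per bin, by label
```

With these toy IDs, positional indexing selects no taxa at all, while label-based indexing returns two taxa for each bin.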

Potentially incorrect column names in icamp.cate

Dear Ning,

The icamp.cate() function uses Wtuvk$name1 and Wtuvk$name2 to refer to the sample IDs. However the results from the corresponding icamp.bins() seems to name those two columns as Wtuvk$samp1 and Wtuvk$samp2 instead. The same issue also occurred in the local variable Ptuvct. I'm not 100% sure if I understand the code correctly but I've tried replacing all the $name* stuff with samp* in the code by fix() and that worked.

Values for ds and bin.size.limit

Hi Ning,

Thanks for the great package. I went through your articles & the example code, however, I couldn't find how one can find optimal values for ds and bin.size.limit. Appreciate any advice/thoughts

On a related note, can categorical variables be used for identifying niche preferences i.e. used in the environment.txt file? We sampled across depth and as such depth integrates multiple niche dimensions, many of which we did not measure explicitly

Cheers, Adi

Mismatch in values between vegdist Bray distances and RC.pc BC.obs

Dear Dr. Ning,

I am curious as to why I am getting different observed Bray-Curtis dissimilarity values when using vegdist(method = "bray") and RC.pc(taxo.metric = "bray").

If you know what might be going on, I would appreciate hearing your thoughts.

Thank you,
Magda

Code using the example data from the iCAMP package:

data("example.data")
comm=example.data$comm
rand.time=20
nworker=2

RC=RC.pc(comm=comm, rand = rand.time,
nworker = nworker, weighted = TRUE,
output.bray = TRUE,
taxo.metric = "bray",
sig.index="RC")

RC$BC.obs

vegdist(comm, method = "bray")
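
A hand calculation can help narrow down which of the two outputs matches the textbook definition. The base-R sketch below computes Bray-Curtis dissimilarity as the sum of absolute differences divided by the total abundance of both samples, once on raw counts and once on relative abundances. The helper `bray_pair` and the toy matrix are hypothetical, and the post does not establish why the two functions disagree; a data transformation applied before the calculation is only one possibility worth ruling out.

```r
# Bray-Curtis dissimilarity between two abundance vectors.
bray_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Toy community: two samples with unequal totals (15 and 10 reads).
comm <- matrix(c(10, 0, 5,
                  2, 8, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3")))

# On raw counts.
bc.counts <- bray_pair(comm["S1", ], comm["S2", ])

# On relative abundances (each sample rescaled to sum to 1).
comra <- comm / rowSums(comm)
bc.relative <- bray_pair(comra["S1", ], comra["S2", ])

print(c(counts = bc.counts, relative = bc.relative))
```

For this toy matrix the two values are 0.84 and 0.8, which shows that the same pair of samples can yield different Bray-Curtis values depending on whether abundances are rescaled first.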

How to get the category information?

Dear Dr. Ning
I would like to use the package iCAMP to analyze my own data. However, I don't know how to get the category information. Are the core, other, and rare OTU definitions based on their relative abundance in the OTU table? What are the threshold values for core, other, and rare OTUs?

Best wishes

Long

Error in checkForRemoteErrors(val)

Hi Daliang,

I try to quantify the community assembly processes by using the following code:

setwd(save.wd)
if(!file.exists("pd.desc"))
{
  pd.big=iCAMP::pdist.big(tree = tree, wd=save.wd, nworker = nworker, memory.G = memory.G)
}else{
  pd.big=list()
  pd.big$tip.label=read.csv(paste0(save.wd,"/pd.taxon.name.csv"),row.names = 1,stringsAsFactors = FALSE)[,1]
  pd.big$pd.wd=save.wd
  pd.big$pd.file="pd.desc"
  pd.big$pd.name.file="pd.taxon.name.csv"
}

iCAMP::qpen.cm(comm=comm,pd=pd.big$pd.file,pd.big.wd=pd.big$pd.wd,
               pd.big.spname=pd.big$tip.label,ab.weight=TRUE,
               rand.time=rand.time, nworker=nworker,project=prefix,
               wd=save.wd, save.bNTIRC=TRUE,
               meta.group = metadata[,"habitat2",drop=F])

However, I encountered the following error:

Setting parallel cluster for path computing cost 1.400004 secs. Tue Dec 19 13:42:17 2023
Parallel for 600 tips cost 1.988125 mins. Tue Dec 19 13:44:16 2023
Path computing by parallel may take 2.19445240951909 hours. Tue Dec 19 13:44:18 2023
Now computing path for the rest 39666 tips. begin at Tue Dec 19 13:44:19 2023. Please wait...
Computing path for the rest 39666 tips actually took 2.771099 mins. Tue Dec 19 13:47:04 2023
Now setting big matrix file on the disk. Tue Dec 19 13:47:05 2023
Setting parallel cluster for pdist computing cost 1.330648 secs. Tue Dec 19 13:47:20 2023
Parallel computing Pdist for the first 600 runs cost 2.446895 mins. Tue Dec 19 13:49:47 2023
The rest Pdist computing by parallel may take 1.33220872485118 hours. Tue Dec 19 13:49:47 2023
Computing pdist for the rest 39666 tips actually took 16.60026 mins. Tue Dec 19 14:06:23 2023
All match very well.
The names are re-ranked.
All match very well.
All match very well.
Now calculating observed betaMNTD. Begin at Tue Dec 19 14:06:33 2023. Please wait...
Now randomizing by parallel computing. Begin at Tue Dec 19 14:18:16 2023. Please wait...
Error in checkForRemoteErrors(val) :
120 nodes produced errors; first error: incorrect number of subscripts on matrix
Calls: ... clusterApply -> staticClusterApply -> checkForRemoteErrors
In addition: Warning message:
'memory.size()' is Windows-specific
Execution halted

Some confusions in using iCAMP

Hi Daliang,

I'm sorry to bother you, but I would like to ask you about two questions that have been troubling me for a long time

The first question is that can I perform separate iCAMP analysis for each group's data by setting the 'treat' parameter in 'iCAMP:bins' to NULL? Also, I want to ask if it's valid to describe the results as follows: for example, if I perform iCAMP analysis separately for Region A and Region B and find that diffusion limitation accounts for 60% in Region A and 50% in Region B, can I conclude that diffusion limitation plays a greater role in the construction of the community in Region A compared to Region B?

The reason I ask this question is because I currently have microbial data from multiple regions with control and experimental groups. I am considering directly performing iCAMP analysis on all the data, with the first column of the treatment file representing different regions and the second column representing different treatments. However, this calculation may not be appropriate. For example, when we divide the data by region, we cannot consider the impact of different treatments on microbial community construction. Similarly, dividing the data by treatment may overlook the influence of different regions. Of course, I believe I can solve this problem by pairwise comparison, but this may result in a cumbersome presentation of results. Therefore, it would be very convenient to perform iCAMP analysis separately for each group and then compare the results if feasible.

Maybe what I'm most concerned about is whether the results of iCAMP analysis on a microbiome will differ significantly depending on the presence or absence of a treatment file or different comparison objects (for example, comparing with sample A under condition A for the first time and with sample B under condition B for the second time). Because I think the assembly of microbiome in a certain group will not be changed by how they compare with other group. And do you have any better suggestions for this issue?

Another question I have is that after I conducted iCAMP analysis, my treatment file was divided into two groups: RS and RP. In the file named "ProcessImportance_EachGroup," besides the relative proportion of each ecological process for RS and RP, there is also a row named "RS_vs_RP." Could you please explain what this row represents?

And by the way could you give me some general rule for choosing the most appropriate bin.size.limit?

Best regards,

ITS data?

Hi,

Can I use this tool to analyze ITS data? I know that building phylogenetic tree using ITS data is problematic because the length of ITS region is highly variable.

Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: subscript out of bounds

Hi Professor Ning,
When I run "# 6 # calculate pairwise phylogenetic distance matrix.":

# since microbial community data usually has a large number of species (OTUs or ASVs), we use "big.matrix" in R package "bigmemory" to handle the large phylogenetic distance matrix.
setwd(save.wd)
if(!file.exists("pd.desc"))
{
  pd.big=iCAMP::pdist.big(tree = tree, wd=save.wd, nworker = nworker, memory.G = memory.G)
  # output files:
  # path.rda: an R object listing all the nodes and edge lengths from root to every tip. saved in R data format. an intermediate output when calculating phylogenetic distance matrix.
  # pd.bin: BIN file (backingfile) generated by function big.matrix in R package bigmemory. This is the big matrix storing pairwise phylogenetic distance values. By using this bigmemory format file, we will not need memory but hard disk when calling big matrix for calculation.
  # pd.desc: the DESC file (descriptorfile) to hold the backingfile (pd.bin) description.
  # pd.taxon.name.csv: comma delimited csv file storing the IDs of tree tips (OTUs), serving as the row/column names of the big phylogenetic distance matrix.
}else{
  # if you already calculated the phylogenetic distance matrix in a previous run
  pd.big=list()
  pd.big$tip.label=read.csv(paste0(save.wd,"/pd.taxon.name.csv"),row.names = 1,stringsAsFactors = FALSE)[,1]
  pd.big$pd.wd=save.wd
  pd.big$pd.file="pd.desc"
  pd.big$pd.name.file="pd.taxon.name.csv"
}
Now computing path. begin at Wed Mar 8 13:12:29 2023. Please wait...
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: subscript out of bounds
In addition: Warning message:
'memory.limit()' is no longer supported

I don't know what's wrong with the parameters. How can I solve it? Thank you very much!

Best regards,

Xiaojie

category.txt
classification.txt
otus.txt
treat2col.txt

Error in oldname[j, 1:2] : subscript out of bounds

Hello, when I run icamp.bins (# 10 # iCAMP bin level statistics), I get the following error: Error in oldname[j, 1:2] : subscript out of bounds
Calls: icamp.bins -> <Anonymous> -> t -> sapply -> lapply -> FUN
Execution halted

The nohup.out:
nohup: ignoring input
Now summarizing method=CbMPDiCbraya i=1 j=1. Sun Feb 20 21:25:44 2022
bootstrapping rt=1. Sun Feb 20 21:25:45 2022
bootstrapping rt=201. Sun Feb 20 21:26:04 2022
bootstrapping rt=401. Sun Feb 20 21:26:22 2022
bootstrapping rt=601. Sun Feb 20 21:26:42 2022
bootstrapping rt=801. Sun Feb 20 21:27:00 2022
Now summarizing method=CbMPDiCbraya i=1 j=2. Sun Feb 20 21:27:19 2022
bootstrapping rt=1. Sun Feb 20 21:27:20 2022
bootstrapping rt=201. Sun Feb 20 21:27:38 2022
bootstrapping rt=401. Sun Feb 20 21:27:57 2022
bootstrapping rt=601. Sun Feb 20 21:28:16 2022
bootstrapping rt=801. Sun Feb 20 21:28:35 2022
Now summarizing method=CbMPDiCbraya i=1 j=3. Sun Feb 20 21:28:54 2022
bootstrapping rt=1. Sun Feb 20 21:28:54 2022
bootstrapping rt=201. Sun Feb 20 21:29:13 2022
bootstrapping rt=401. Sun Feb 20 21:29:32 2022
bootstrapping rt=601. Sun Feb 20 21:29:51 2022
bootstrapping rt=801. Sun Feb 20 21:30:10 2022
Now summarizing method=CbMPDiCbraya i=1 j=4. Sun Feb 20 21:30:29 2022
bootstrapping rt=1. Sun Feb 20 21:30:29 2022
bootstrapping rt=201. Sun Feb 20 21:30:48 2022
bootstrapping rt=401. Sun Feb 20 21:31:07 2022
bootstrapping rt=601. Sun Feb 20 21:31:25 2022
bootstrapping rt=801. Sun Feb 20 21:31:44 2022
Now summarizing method=CbMPDiCbraya i=1 j=5. Sun Feb 20 21:32:03 2022
bootstrapping rt=1. Sun Feb 20 21:32:04 2022
bootstrapping rt=201. Sun Feb 20 21:32:22 2022
bootstrapping rt=401. Sun Feb 20 21:32:42 2022
bootstrapping rt=601. Sun Feb 20 21:33:01 2022
bootstrapping rt=801. Sun Feb 20 21:33:19 2022
Now summarizing method=CbMPDiCbraya i=1 j=6. Sun Feb 20 21:33:38 2022
bootstrapping rt=1. Sun Feb 20 21:33:39 2022
bootstrapping rt=201. Sun Feb 20 21:33:58 2022
bootstrapping rt=401. Sun Feb 20 21:34:16 2022
bootstrapping rt=601. Sun Feb 20 21:34:35 2022
bootstrapping rt=801. Sun Feb 20 21:34:54 2022
Now summarizing method=CbMPDiCbraya i=1 j=7. Sun Feb 20 21:35:12 2022
bootstrapping rt=1. Sun Feb 20 21:35:13 2022
bootstrapping rt=201. Sun Feb 20 21:35:32 2022
bootstrapping rt=401. Sun Feb 20 21:35:51 2022
bootstrapping rt=601. Sun Feb 20 21:36:10 2022
bootstrapping rt=801. Sun Feb 20 21:36:29 2022
Now summarizing method=CbMPDiCbraya i=1 j=8. Sun Feb 20 21:36:48 2022
Error in oldname[j, 1:2] : subscript out of bounds
Calls: icamp.bins -> <Anonymous> -> t -> sapply -> lapply -> FUN
Execution halted

I am wondering where the problem lies. Looking forward to your reply. Big thanks!

The issue with step "binps=iCAMP::ps.bin"

Hello Daliang, I followed the script, and when I ran this part:
binps=iCAMP::ps.bin(sp.bin = sp.bin,sp.ra = sp.ra,
spname.use = spname.use,
pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
nd.list = niche.dif$nd,nd.spname = nd.spname,ndbig.wd = niche.dif$nd.wd,
cor.method = "kendall",r.cut = 0.1, p.cut = 0.05, min.spn = 10)

The error message is "Error in cor(as.vector(xdis), ydis, method = method, use = use) :
missing observations in cov/cor"

How could i fix this?

Some questions about OTUs in each group

Dear Ning:
In my analysis, my treat file has just one OTU in each group. When I use the iCAMP package to run icamp.big, R returns this error:

Error in [.data.frame(weight, (sig.phy > sig.phy.cut | sig.phy2 > sig.phy2.cut)) :
dims [product 2] do not match the length of object [11]

But when I add OTUs to each group and make sure the number of OTUs in each group is greater than or equal to 2, the program runs smoothly.

Why is this happening?

Error in GetElements.bm(x, i, j): Illegal row index usage in extraction

Hi,
I got the following error when doing phylogenetic binning.

The data run normally for the other three seasons, but this error occurred during the summer. How to resolve it?

Hi, Daliang

It seems like something wrong happened in ''icres=iCAMP::icamp.big''.

icres=iCAMP::icamp.big(comm=comm, pd.desc = pd.big$pd.file, pd.spname=pd.big$tip.label,
pd.wd = pd.big$pd.wd, rand = rand.time, tree=tree,
prefix = prefix, ds = 0.2, pd.cut = NA, sp.check = TRUE,
phylo.rand.scale = "within.bin", taxa.rand.scale = "across.all",
phylo.metric = "bMPD", sig.index=sig.index, bin.size.limit = bin.size.limit,
nworker = nworker, rtree.save = FALSE, detail.save = TRUE,
qp.save = FALSE, detail.null = FALSE, ignore.zero = TRUE, output.wd = save.wd,
correct.special = TRUE, unit.sum = rowSums(comm), special.method = "depend",
ses.cut = 1.96, rc.cut = 0.95, conf.cut=0.975, omit.option = "no",meta.ab = NULL)

----bMPD bin i=27 in 67 ---- Wed Nov 1 18:25:33 2023
Now calculating observed betaMPD. Begin at Wed Nov 1 18:25:33 2023. Please wait...
Now randomizing by parallel computing. Begin at Wed Nov 1 18:25:35 2023. Please wait...
Now fixing special cases. Begin at Wed Nov 1 18:27:02 2023. Please wait...
All match very well.
Now calculating observed MPD. Begin at Wed Nov 1 18:27:02 2023. Please wait...
Now randomizing by parallel computing. Begin at Wed Nov 1 18:27:04 2023. Please wait...
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: invalid 'size' argument

I have modified the format of the ASV table (absolute abundance, integers). However, the same error is still present.
I would greatly appreciate it if you could help me solve the error. Thanks for your time.
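
In base R, "invalid 'size' argument" is raised by sample() when its size is, for example, NA or negative. Whether that is the cause inside icamp.big is an assumption, but a quick sanity check of the community matrix is cheap. A sketch (the helper check.comm is hypothetical, not part of iCAMP; it assumes samples in rows and taxa in columns):

```r
# Hypothetical helper: basic sanity checks on a community matrix.
check.comm <- function(comm) {
  comm <- as.matrix(comm)
  stopifnot(is.numeric(comm))
  c(no.NA            = !anyNA(comm),
    no.negative      = all(comm >= 0, na.rm = TRUE),
    whole.numbers    = all(comm == round(comm), na.rm = TRUE),
    no.empty.samples = all(rowSums(comm, na.rm = TRUE) > 0),
    no.empty.taxa    = all(colSums(comm, na.rm = TRUE) > 0))
}

# Toy matrix: OTU2 is absent from every sample.
comm.toy <- matrix(c(5, 0, 3,
                     0, 0, 2), nrow = 2, byrow = TRUE,
                   dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3")))
res <- check.comm(comm.toy)
print(res)
```

Any FALSE entry points at something to inspect before rerunning the analysis.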

How to get the results of each ecological process of each bin between each two samples

Hi,

I tried to find an output giving each ecological process for each bin between each pair of samples, but failed. The results in icbin$Wtuvk are close to what I am looking for, but they only show the name of the dominant process, not the detailed values. Is there an output that contains what I want? If not, how can I calculate it?

Sincerely

missing values are not allowed with argument 'na.rm = FALSE'

Hi Daliang,
When I run

9.5 # input community matrix as relative abundances (values < 1) rather than counts

comra=comm/rowSums(comm)
prefixra=paste0(prefix,"RA")
bin.size.limit = 24 # For real data, usually use a proper number according to phylogenetic signal test or try some settings then choose the reasonable stochasticity level. our experience is 12, or 24, or 48. but for this example dataset which is too small, have to use 5.
icres6=iCAMP::icamp.big(comm=comra,tree=tree,pd.desc=pd.big$pd.file, pd.spname=pd.big$tip.label, pd.wd=pd.big$pd.wd,
rand=rand.time,prefix=prefixra,ds=0.2,pd.cut=NA,sp.check=TRUE,
phylo.rand.scale="within.bin",taxa.rand.scale="across.all",
phylo.metric="bMPD",sig.index="Confidence",
bin.size.limit=bin.size.limit,nworker=nworker,memory.G=memory.G,
rtree.save=FALSE,detail.save=TRUE,qp.save=FALSE,detail.null=FALSE,
ignore.zero=TRUE,output.wd=save.wd,correct.special=TRUE,unit.sum=rowSums(comra),
special.method="depend",ses.cut = 1.96,rc.cut = 0.95,conf.cut=0.975,
omit.option="no",meta.ab=NULL, taxo.metric="bray", transform.method=NULL,
logbase=2, dirichlet=TRUE)

The error shows "4 nodes produced errors; first error: missing values are not allowed with argument 'na.rm = FALSE'".
How can I solve it? Thank you very much!

Hi, Professor Ning, I would like to ask you about running the iCAMP code

When I run the line of code: ieggr::save.file(t(comm),filename = "16s-otu.resamp")
there is an error: Error in library(ieggr) : there is no package called ‘ieggr’

I have tried changing the version of R and installing the package in a variety of ways, but all failed. It seems there is no file for the package ieggr on GitHub or any other website. I would very much appreciate it if you could provide me with the local files of the package "ieggr".
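
If the ieggr package cannot be installed, base R may be able to stand in for this step, since the call appears only to save the transposed community table to a file. A sketch, assuming a comma-separated file is acceptable for the next step (the exact format written by ieggr::save.file is not confirmed here, and the toy matrix is hypothetical):

```r
# Toy community matrix: samples in rows, OTUs in columns.
comm <- matrix(c(10, 0, 4, 7), nrow = 2,
               dimnames = list(c("S1", "S2"), c("OTU1", "OTU2")))

# Save the transposed table (OTUs in rows) as CSV with base R.
out.file <- file.path(tempdir(), "16s-otu.resamp.csv")
write.csv(t(comm), file = out.file)

# Read it back to confirm that names and values survived.
back <- as.matrix(read.csv(out.file, row.names = 1, check.names = FALSE))
print(back)
```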

X nodes produced errors; first error: invalid 'size' argument

Dear author, when I use the iCAMP code, the following error occurred:
icamp.out=icamp.big(comm=comm,tree=tree,#pd.wd=pd.wd,
                rand=999, nworker=4,bin.size.limit=12,ds=0.2)

The names are re-ranked.
----------Now binning-----------------Sun Feb 12 16:12:40 2023
Now computing path. begin at Sun Feb 12 16:12:40 2023. Please wait...
Now computing path. begin at Sun Feb 12 16:12:44 2023. Please wait...
Now computing dist to root. begin at Sun Feb 12 16:12:45 2023. Please wait...
----------Now binning com and sp without omitting small bins-----------------Sun Feb 12 16:12:47 2023
----bMPD bin i=1 in 37 ---- Sun Feb 12 16:12:48 2023
Now calculating observed betaMPD. Begin at Sun Feb 12 16:12:48 2023. Please wait...
Now randomizing by parallel computing. Begin at Sun Feb 12 16:12:48 2023. Please wait...
Now fixing special cases. Begin at Sun Feb 12 16:13:07 2023. Please wait...
All match very well.
Now calculating observed MPD. Begin at Sun Feb 12 16:13:12 2023. Please wait...
Now randomizing by parallel computing. Begin at Sun Feb 12 16:13:12 2023. Please wait...
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: invalid 'size' argument
In addition: There were 11 warnings (use warnings() to see them)

I don't know what is wrong with the parameters. My data have the same format as the example data, and I have already adjusted the parameters, but the issue is not resolved. Do you have any suggestions?

Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: subscript out of bounds

An error occurred with the earlier version.
After I updated to the latest version, I still had the problem.

otu <- otu_table(ps_rs_filtered) %>% data.frame(.) %>% t() #row is sampleID
tax <- tax_table(ps_rs_filtered) %>% data.frame(.)
rand.time=1000
nworker=30
memory.G=200
sampid.check=match.name(rn.list=list(otu=otu,treat=treat)) #Mismatch warning: treat.rowname has 24 mismatched names.
treat=sampid.check$treat
otu=sampid.check$otu
otu=otu[,colSums(otu)>0,drop=FALSE]
ncol(otu)# 32778
nrow(otu)# 72

spid.check=match.name(cn.list=list(otu=otu),rn.list=list(tax=tax),tree.list=list(tree=tree)) #Mismatch warning: tax.rowname has 45320 mismatched names.
#The names are re-ranked.
Warning message:
In ape::drop.tip(tr, rm.tip) : drop all tips of the tree: returning NULL

otu=spid.check$otu
tax=spid.check$tax
tree=spid.check$tree

setwd(save.wd)
set.seed(12345)
if(!file.exists("pd.desc"))
{
pd.big=iCAMP::pdist.big(tree = tree, wd=save.wd, nworker = nworker, memory.G = memory.G)
}else{
pd.big=list()
pd.big$tip.label=read.csv(paste0(save.wd,"/pd.taxon.name.csv"),row.names = 1,stringsAsFactors = FALSE)[,1]
pd.big$pd.wd=save.wd
pd.big$pd.file="pd.desc"
pd.big$pd.name.file="pd.taxon.name.csv"
}

save(pd.big,file = "pd.big.rda")
save.image(file = "icamp.Rdata")

#Now computing path. begin at Sun Jul 17 20:48:19 2022. Please wait...
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: subscript out of bounds
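
Both mismatch warnings and "drop all tips of the tree" point to names that differ between objects. One possible cause (an assumption, not confirmed for this dataset) is that data.frame() rewrites column names with make.names() when check.names = TRUE, which is its default. A sketch with hypothetical IDs:

```r
# Hypothetical IDs that are not syntactically valid R names.
ids <- c("4a1f", "OTU-2", "asv 3")
m <- matrix(1:6, nrow = 2, dimnames = list(c("S1", "S2"), ids))

mangled <- colnames(data.frame(m))                       # default check.names = TRUE
kept    <- colnames(data.frame(m, check.names = FALSE))  # names preserved

print(mangled)
print(kept)
```

If the names change in this way, passing check.names = FALSE, or converting with as.matrix() instead, keeps them identical to the tree tip labels and the treatment table.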

choice of phylogenetic metric

Dear Ning,

I tried reproducing results from your Nat. Comm. paper. I get 658 bins (ds = 0.2, bin.size.limit = 12) as you report in the paper. However, when I had a look at the maximum and mean phylogenetic distances, it looks like a good number of bins have a fair number of OTUs which lie outside the signal threshold as a result of merging bins. In this case, will bNRI still provide reliable results?

Regards, Adi

Error in UseMethod("is.rooted")

Dear Ning,
I tried to use the following R code to quantify the assembly process of microbial communities:

wd0 = getwd()
save.wd = paste0(tempdir(),"/pdbig")
nworker=6
rand.time = 1000
bin.size.limit = 24
otu <- read.table("otutab.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE, comment.char = "")
tree <- ape::read.tree("otus.nwk")
library(iCAMP)
icamp.out <- icamp.big(comm = otu,tree = tree,pd.wd = save.wd,rand = rand.time,nworker = nworker,bin.size.limit = bin.size.limit,ds = 0.2 )

But it failed, and the error message was as follows:

unexpected name: DMH40 DMH41 DMH42 DMH44
The names are re-ranked.
----------Now binning-----------------Sat Dec 9 12:54:23 2023
Error in UseMethod("is.rooted") :
no applicable method for 'is.rooted' applied to an object of class "NULL"
Calls: icamp.big -> -> is.rooted

I tried to find the answer on various websites, but no one seems to have encountered a similar situation. If you have time, could you help me understand this error?
Thank you very much,
yang
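
The "unexpected name" line lists what look like sample IDs, which suggests (as an assumption) that the table was read with taxa in rows, whereas the code quoted elsewhere on this page treats comm as samples in rows and taxa in columns. If no taxon name matches a tree tip, the pruned tree can end up as NULL. A sketch of an orientation check, with a hypothetical helper and toy names:

```r
# Hypothetical helper: return comm with taxa in columns, judged by
# which dimension's names match the tree tip labels better.
orient.comm <- function(comm, tips) {
  comm <- as.matrix(comm)
  hit.col <- mean(colnames(comm) %in% tips)
  hit.row <- mean(rownames(comm) %in% tips)
  if (hit.row > hit.col) t(comm) else comm
}

tips <- c("OTU1", "OTU2", "OTU3")            # stands in for tree$tip.label
otu <- matrix(1:6, nrow = 3,
              dimnames = list(tips, c("DMH40", "DMH41")))  # taxa in rows
comm <- orient.comm(otu, tips)
print(dim(comm))
```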

Error occurred while running bNRI.bin.big

Dealing with dataset with four metacommunities

Hi Ning,

How would you suggest using iCAMP to deal with a dataset comprising groups of samples from four different metacommunities? The group comparisons within each metacommunity are the same, but I would like to treat the turnovers within each one separately and then compare them at the end. Thanks for your time.

Regards, Adi

Other data transformations prior to beta diversity calculations

Hi Ning,

It would be great if other normalisation measures (e.g. Hellinger or log) could be made available across the iCAMP R functions which calculate beta diversity. I am looking to apply the Hellinger transformation and thereby down-weight highly abundant species in my samples.

Regards, Adi
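
For reference, the Hellinger transformation is the square root of each sample's relative abundances, and it can be computed in base R. A sketch on a toy matrix with samples in rows; whether a pre-transformed matrix suits the null model of a given iCAMP function is not addressed here (the icamp.big call quoted elsewhere on this page has a transform.method argument, which may be the intended route):

```r
# Toy community matrix: samples in rows, taxa in columns.
comm <- matrix(c(90, 9, 1,
                 10, 10, 0), nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3")))

# Hellinger: square root of row-wise relative abundance.
comm.hel <- sqrt(comm / rowSums(comm))
print(round(comm.hel, 3))

# The squared values of each row sum to 1 after the transformation.
print(rowSums(comm.hel^2))
```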

Error in CreateFileBackedBigMatrix(as.character(backingfile), as.character(backingpath), : Problem creating filebacked matrix.

Dear Ning,
I get this error when I try to run the example code.

the error:

Error in CreateFileBackedBigMatrix(as.character(backingfile), as.character(backingpath), :
Problem creating filebacked matrix.

the example code:

data("example.data")
comm=example.data$comm
tree=example.data$tree

save.wd=tempdir()
pd.wd=paste0(save.wd,"/pdbig")
nworker=4
rand.time=20

bin.size.limit=5

setwd(save.wd)
icamp.out=icamp.big(comm=comm,tree=tree,pd.wd=pd.wd,
rand=rand.time, nworker=nworker,
bin.size.limit=bin.size.limit)
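
One thing to rule out (an assumption, since the cause is not shown in the report) is that the folder given as pd.wd does not exist yet, or that it still holds backing files from an earlier run. A sketch that prepares the folder with base R before calling icamp.big; the file names pd.bin and pd.desc are the ones listed in the iCAMP output description:

```r
save.wd <- tempdir()
pd.wd <- file.path(save.wd, "pdbig")

# Create the folder for the backing files if it is missing.
if (!dir.exists(pd.wd)) dir.create(pd.wd, recursive = TRUE)

# Report backing files left over from a previous run.
old.files <- file.path(pd.wd, c("pd.bin", "pd.desc"))
stale <- old.files[file.exists(old.files)]
if (length(stale) > 0) {
  message("Existing backing files: ", paste(basename(stale), collapse = ", "))
}

# Check that the folder is writable (file.access returns 0 on success).
writable <- file.access(pd.wd, mode = 2) == 0
print(writable)
```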

Out of memory?

Thanks for the great package! I've run this on a dataset before without issue. However, I am having difficulty running the icamp.big() command with my current dataset. The output says:

----------Now binning-----------------Fri Feb 11 13:02:56 2022
Now computing path. begin at Fri Feb 11 13:03:04 2022. Please wait...
Now computing path. begin at Fri Feb 11 13:06:18 2022. Please wait...
Now computing dist to root. begin at Fri Feb 11 13:06:32 2022. Please wait...

----------Now binning com and sp without omitting small bins-----------------Fri Feb 11 13:08:22 2022
----bMPD bin i=1 in 13 ---- Fri Feb 11 13:08:23 2022
Now calculating observed betaMPD. Begin at Fri Feb 11 13:08:48 2022. Please wait...
Now randomizing by parallel computing. Begin at Fri Feb 11 13:11:32 2022. Please wait...
Error in serialize(data, node$con) : ignoring SIGPIPE signal
Calls: ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted

It appears the task has run out of memory. Does this seem to be the case? I am running this on 32 cores with 350 G of memory. The dataset is large, but not exceptionally so (87 samples and 39675 ASVs). Is it just a matter of giving the job more memory, or is there perhaps a different underlying issue?

A portion of my code is presented below. Thanks!

nworker = 32
memory.G = 350
rand.time = 1000
bin.size.limit = 24
sig.index="Confidence"

icres=iCAMP::icamp.big(comm=comm, pd.desc = pd.big$pd.file, pd.spname=pd.big$tip.label,
     pd.wd = pd.big$pd.wd, rand = rand.time, tree=tree,
     prefix = prefix, ds = 0.2, pd.cut = NA, sp.check = TRUE,
     phylo.rand.scale = "within.bin", taxa.rand.scale = "across.all",
     phylo.metric = "bMPD", sig.index=sig.index, bin.size.limit = bin.size.limit,
     nworker = nworker, memory.G = memory.G, rtree.save = FALSE, detail.save = TRUE,
     qp.save = FALSE, detail.null = FALSE, ignore.zero = TRUE, output.wd = save.wd,
     correct.special = TRUE, unit.sum = rowSums(comm), special.method = "depend",
     ses.cut = 1.96, rc.cut = 0.95, conf.cut=0.975, omit.option = "no",meta.ab = NULL)

Can't find the help documents

Hi,

Really sorry as I'm sure I am missing something very obvious, but I can't find the help documents. For example, in 8.2 it says '# see help document of the function "ps.bin" for the meaning of output' and in step 9 it says 'sig.index="Confidence" # see other options in help document of icamp.big'.

Would you be able to point me in the direction of the help documents?

Thanks,
Josh
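
In R, "help document" usually means the help page installed with a package. A sketch of opening the two pages mentioned above with the standard help() function, assuming iCAMP is installed (in an interactive session, ?iCAMP::icamp.big does the same):

```r
# Check that the package is installed before asking for its help pages.
has.icamp <- requireNamespace("iCAMP", quietly = TRUE)

if (has.icamp) {
  # print() is needed when the code is run as a script rather than typed.
  print(help("icamp.big", package = "iCAMP"))
  print(help("ps.bin", package = "iCAMP"))
} else {
  message("iCAMP is not installed, so its help pages are not available.")
}
```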

"Invalide 'size' argument" using the RC.pc function

Hi @DaliangNing ,

I'm using the RC.pc function, which works well on my raw data. But when I rarefy and normalize my data with log and Hellinger transformations, I get the error:

Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: invalid 'size' argument

I've tried with the phyloseq data set GlobalPatterns and I'm facing the same issue.
Do you know how to solve this problem?

library(vegan)
library(phyloseq)
library(iCAMP)
data(GlobalPatterns)
sort(sample_sums(GlobalPatterns))
GlobalPatterns <- rarefy_even_depth(GlobalPatterns,sample.size = 50000)
count <- t(as(otu_table(GlobalPatterns), "matrix"))
count <- log1p(count)
count <- as.data.frame(decostand (count, 'hellinger'))
RCbray <- RC.pc(count, rand = 1000, weighted = TRUE)

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

I get this error while I try to find optimal values for ds and bin.size.limit.

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

traceback()
2: colnames<-(*tmp*, value = *vtmp*)
1: iCAMP::ps.bin(sp.bin = sp.bin, sp.ra = sp.ra, spname.use = spname.use,
pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
nd.list = niche.dif$nd, nd.spname = niche.dif$names, ndbig.wd = niche.dif$nd.wd,
cor.method = "pearson", r.cut = 0.1, p.cut = 0.05, min.spn = 5)

Here is the code for this test.

ds = 0.2 # setting can be changed to explore the best choice
bin.size.limit = 5# setting can be changed to explore the best choice. # here set as 5 just for the small example dataset. For real data, usually try 12 to 48.
phylobin=taxa.binphy.big(tree = tree, pd.desc = pd.big$pd.file,pd.spname = pd.big$tip.label,
                         pd.wd = pd.big$pd.wd, ds = ds, bin.size.limit = bin.size.limit,
                         nworker = nworker)

# 8.2 # test within-bin phylogenetic signal.
sp.bin=phylobin$sp.bin[,3,drop=FALSE]
sp.ra=colMeans(comm/rowSums(comm))
abcut=3 # you may remove some species, if they are too rare to perform reliable correlation test.
commc=comm[,colSums(comm)>=abcut,drop=FALSE]
dim(commc)
spname.use=colnames(commc)
binps=iCAMP::ps.bin(sp.bin = sp.bin,sp.ra = sp.ra,spname.use = spname.use,
                    pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
                    nd.list = niche.dif$nd,nd.spname = niche.dif$names,ndbig.wd = niche.dif$nd.wd,
                    cor.method = "pearson",r.cut = 0.1, p.cut = 0.05, min.spn = 5)

traceback()

However, it runs smoothly if I set bin.size.limit to more than 11. For this test, I have a phylogenetic tree with 2718 tips, and I want to combine the tips into phylogenetic bins (deeper levels) in order to make a reduced phylogeny for the analysis of ecological processes. Many thanks for your help in advance.
Test_size_11.PhyloSignalDetail.csv
Test_size_11.PhyloSignalSummary.csv

abundance weighted bMNTD & bNTI

Hi Ning,

When comm is the raw counts matrix (i.e. no rarefaction or total sum scaling), how is abundance weighting done when bMNTD & bNTI are calculated? From the bMNTD function, I see that when abundance.weighted = TRUE, the counts are transformed into relative proportions before being multiplied with MNTD. Is my reading of how the function works correct?

Regards, Adi
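
The reading described above can be illustrated with one common definition of abundance-weighted betaMNTD, in which each taxon's nearest-taxon distance to the other sample is weighted by its relative abundance. This is a sketch under that assumption, not a copy of the iCAMP source; the helper bmntd.pair and the toy data are hypothetical:

```r
# Hypothetical helper: abundance-weighted betaMNTD between two samples.
# a, b: named abundance vectors over the same taxa; d: distance matrix.
bmntd.pair <- function(a, b, d) {
  pa <- a / sum(a)                     # counts become relative abundances
  pb <- b / sum(b)
  ia <- names(pa)[pa > 0]              # taxa present in sample a
  ib <- names(pb)[pb > 0]              # taxa present in sample b
  dab <- d[ia, ib, drop = FALSE]
  min.a <- apply(dab, 1, min)          # nearest taxon in b for each taxon in a
  min.b <- apply(dab, 2, min)          # nearest taxon in a for each taxon in b
  0.5 * (sum(pa[ia] * min.a) + sum(pb[ib] * min.b))
}

taxa <- c("t1", "t2", "t3")
d <- matrix(c(0, 1, 4,
              1, 0, 3,
              4, 3, 0), nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))
a <- c(t1 = 2, t2 = 2, t3 = 0)
b <- c(t1 = 0, t2 = 1, t3 = 3)
val <- bmntd.pair(a, b, d)
print(val)
```

Because the counts are divided by the sample total first, multiplying a sample's counts by a constant leaves the value unchanged, which is why unequal sequencing depth does not enter the weights directly under this definition.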

core, rare and other taxa

Hi,
I am working through the SimpleOTU example and I cannot find the category.txt file. Is there any problem with it? How do you define the three categories? I searched in the icamp.test file and did not find anything. I also looked for it in the published article and its R scripts but I cannot find anything.

Thank you very much for your help and this nice library

Manuel

How to evaluate the different results resulting from different methods

Dear Ning,
Because I am new to null models, I tried every method (tNST, pNST, iCAMP, qpen, etc.), but I got completely different results.
I used data from two groups named T and W.
From tNST, I got

From pNST, I got

From iCAMP with bmin set to 72 (determined by the maximum RAsig), I got

From qpen (by the way, I was confused by the 'metagroup' parameter of the function 'qpen.cm'. I gave it 'treat' as in the other steps, but qpen.cm was interrupted with an error. The reason I used 'qpen.cm' is that I could not get individual results for each group from 'qpen', which I think I could perhaps get from 'qpen.test'. Am I right?), I got

Sincerely

Failing to match OTU IDs

Hi Daliang,
While running iCAMP pipeline (icamp.test.r) on my own dataset it is failing to match OTU IDs (Step #5). However, I am not sure why this issue comes up since all tree tip labels match the OTU names in the count and taxonomy tables.

> summary(tree$tip.label %in% colnames(comm))
  Mode    TRUE 
logical   22003

> summary(tree$tip.label %in% (clas$SpeciesID))
   Mode    TRUE 
logical   22003

Any suggestions on what to change?

I generated the nwk tree in QIIME2, imported it into R and did some filtering using phyloseq. I exported the final tree using the phy_tree() function from the phyloseq package and saved it as .nwk with the write.tree() function from the ape package.
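
The two checks above only confirm that every tree tip is found in the tables, not the reverse. A sketch of a two-way comparison with a hypothetical helper; it also trims whitespace and surrounding quotes, which sometimes (an assumption for this case) survive a Newick export:

```r
# Hypothetical helper: compare two ID vectors in both directions.
compare.ids <- function(a, b) {
  clean <- function(x) gsub("^['\"]|['\"]$", "", trimws(x))
  a2 <- clean(a)
  b2 <- clean(b)
  list(only.in.a = setdiff(a2, b2),
       only.in.b = setdiff(b2, a2),
       changed.by.cleaning = sum(a2 != a) + sum(b2 != b))
}

tips <- c("'OTU1'", "OTU2", "OTU3")        # stands in for tree$tip.label
taxa <- c("OTU1", "OTU2", "OTU3", "OTU4")  # stands in for colnames(comm)
res <- compare.ids(tips, taxa)
print(res)
```

A non-empty only.in.b means the table holds taxa that are absent from the tree, and a non-zero changed.by.cleaning means some IDs differ only by quotes or spaces.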

Error in 1:ncol(bMNTD.randm) : argument of length 0

When I set detail.null = TRUE in bNTIn.p, I encountered this error: Error in 1:ncol(bMNTD.randm) : argument of length 0. But if I set detail.null = FALSE (the default), it works smoothly and I can get the betaNTI value. I am not sure whether I may use this result. qpen also gives the same error. Here is the dataset I used.
comm_and_phydist.zip

bNTI_placement.location=bNTIn.p(comm=OTU.location, dis=phydist, nworker = 4, weighted = TRUE, rand = 1000,output.bMNTD=TRUE,detail.null=TRUE)
Now calculating observed betaMNTD. Begin at Thu Mar 25 13:01:14 2021. Please wait...
Now randomizing by parallel computing. Begin at Thu Mar 25 13:01:16 2021. Please wait...
Error in 1:ncol(bMNTD.randm) : argument of length 0
bNTI_placement.location=bNTIn.p(comm=OTU.location, dis=phydist, nworker = 4, weighted = TRUE, rand = 1000,output.bMNTD=TRUE)
Now calculating observed betaMNTD. Begin at Thu Mar 25 13:02:41 2021. Please wait...
Now randomizing by parallel computing. Begin at Thu Mar 25 13:02:43 2021. Please wait...
qpen(comm=OTU.location, pd=phydist,rand.time = 1000,ab.weight = TRUE)
The names are re-ranked.
Now calculating observed betaMNTD. Begin at Thu Mar 25 13:10:56 2021. Please wait...
Now randomizing by parallel computing. Begin at Thu Mar 25 13:10:57 2021. Please wait...
Error in 1:ncol(bMNTD.randm) : argument of length 0

Process in "D" state on NFS filesystem in a HPC environment

Dear developer, I am an administrator of an HPC cluster. One of our users is running iCAMP on a cluster system running SLURM and Ubuntu 22.04, with an installation of R 4.3.0. We notice that whenever iCAMP is called, it spawns many threads, some of which show a status of "D" (uninterruptible sleep). This has caused the NFS filesystem to become unresponsive, which affects other users. Do iCAMP or any of its dependent packages place file locks? If so, on which file(s), and what happens if we disable file locking on the NFS partition? Thank you!

Correlate bin-level processes with environmental variates

Dear Ning,

It would be great to have a function which can correlate bin-level ecological processes (as you do in your paper) with environmental variables using Mantel tests/MRM

Cheers, Adi
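
Until such a function exists, a simple Mantel test can be written in a few lines of base R (the vegan package also provides a mantel function). A sketch, assuming two square dissimilarity matrices over the same samples, for example pairwise values of one process in one bin and pairwise environmental differences; the helper and the toy data are hypothetical:

```r
# Hypothetical helper: one-sided Mantel test with Pearson correlation.
mantel.simple <- function(x, y, nperm = 999) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(all(dim(x) == dim(y)), nrow(x) == ncol(x))
  lt <- lower.tri(x)
  r.obs <- cor(x[lt], y[lt])
  n <- nrow(x)
  # Permute rows and columns of x together, keep y fixed.
  r.perm <- replicate(nperm, {
    idx <- sample(n)
    cor(x[idx, idx][lt], y[lt])
  })
  list(r = r.obs, p = (sum(r.perm >= r.obs) + 1) / (nperm + 1))
}

set.seed(1)
env <- matrix(rnorm(20), nrow = 10)     # toy environmental data, 10 samples
env.dis <- as.matrix(dist(env))
proc.dis <- env.dis                     # toy "process" matrix, identical to env.dis
res <- mantel.simple(proc.dis, env.dis, nperm = 99)
print(res)
```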

Question about results of QPEN

Hi,
I am confused about my results in "ALL.QPEN.Bootstrapping.Summary.csv".

In this figure, for the management group "S", the relative importance of every assembly process except HoS is zero, which seems implausible.
If needed, I can upload extra materials for checking.

Best regards,

Error in h(simpleError(msg, call)) : when running icamp.big

----bMPD bin i=1 in 1 ---- Wed Nov 23 22:31:43 2022
Now calculating observed betaMPD. Begin at Wed Nov 23 22:31:43 2022. Please wait...
Now randomizing by parallel computing. Begin at Wed Nov 23 22:31:45 2022. Please wait...
Now fixing special cases. Begin at Wed Nov 23 22:31:53 2022. Please wait...
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': subscript out of bounds

when using
icamp.out <- icamp.big(comm = comm, tree = tree,
pd.wd = getwd(),
ses.cut = 1.96,
rc.cut = 0.95,
bin.size.limit = 5,
rand = 99, nworker = 4)

Solution to large dataset?

Hi Daliang,

My community composition matrix includes 1,000 samples x 550,000 OTUs. I ran the iCAMP analyses on a HPC with 94 CPUs and 470 Gb memory. However, an error message of out-of-memory occurs during the phylogenetic signal test and iCAMP::icamp.big.

Do you have any tips on such a large dataset?

Best,

Chen
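
A rough size estimate helps judge whether a dataset of this scale is feasible. The sketch below assumes the pairwise phylogenetic distance matrix is a full square matrix of double-precision values (8 bytes each); the helper name is hypothetical:

```r
# Hypothetical helper: size of a full n x n matrix in gigabytes (1e9 bytes).
pd.size.GB <- function(n.taxa, bytes.per.value = 8) {
  n.taxa^2 * bytes.per.value / 1e9
}

size.small <- pd.size.GB(39675)    # ASV number from another report on this page
size.large <- pd.size.GB(550000)   # OTU number in this report
print(c(small = size.small, large = size.large))
```

Under this assumption, 550,000 OTUs correspond to about 2,420 GB for the distance matrix alone, so reducing the number of taxa (for example by removing very rare OTUs) may be needed before anything else.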

issue with icamp.big

Good afternoon,
I installed iCAMP in R (v. 4.2.1 (2022-06-23) in LINUX) and tested the install with the example data. I am using Rstudio and connecting with a remote server. Everything works until step 9 (the iCAMP analysis) where I get the message

Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: there is no package called 'iCAMP'

The entire output to the console is:

All match very well.
----------Now binning-----------------Mon Jan 9 15:47:47 2023
Now computing path. begin at Mon Jan 9 15:47:48 2023. Please wait...
Now computing path. begin at Mon Jan 9 15:47:49 2023. Please wait...
Now computing dist to root. begin at Mon Jan 9 15:47:50 2023. Please wait...
----------Now binning com and sp without omitting small bins-----------------Mon Jan 9 15:47:50 2023
----bMPD bin i=1 in 3 ---- Mon Jan 9 15:47:51 2023
Set of permutations < 'minperm'. Generating entire set.
Now calculating observed betaMPD. Begin at Mon Jan 9 15:47:51 2023. Please wait...
Now randomizing by parallel computing. Begin at Mon Jan 9 15:47:51 2023. Please wait...
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: there is no package called 'iCAMP'

I am using the example data and the example code as provided. Is there a known issue with the version of R that fails to find iCAMP? I have tried a reinstall (that gave the same error). I also tried using one core (still failed).

All help/insights gratefully received.
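
One possibility (an assumption, since the server setup is not shown) is that the worker processes use a different library path from the RStudio session, so they cannot find a package installed in a personal library. A sketch using the parallel package to compare the two; the helper names are hypothetical, and "stats" is used only so the example runs anywhere:

```r
library(parallel)

# Defined at top level so that they are evaluated on the workers.
has.pkg <- function(p) requireNamespace(p, quietly = TRUE)
get.lib <- function() .libPaths()

# Hypothetical helper: can the workers find a package, and where do they look?
check.pkg.on.workers <- function(pkg, nworker = 2) {
  cl <- makeCluster(nworker)
  on.exit(stopCluster(cl))
  list(found      = unlist(clusterCall(cl, has.pkg, pkg)),
       worker.lib = clusterCall(cl, get.lib)[[1]],
       master.lib = .libPaths())
}

res <- check.pkg.on.workers("stats")   # replace "stats" with "iCAMP" when testing
print(res)
```

If worker.lib lacks the folder where iCAMP is installed, setting R_LIBS_USER in the environment that starts R is one thing to try.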

Error in oldname[j, 1:2] : subscript out of bounds

Dear professor Ning,

Hope everything is going well with you.

When I ran the following, an error occurred:
> icbin=iCAMP::icamp.bins(icamp.detail = icres$detail,treat = treat,
             clas=clas,silent=FALSE, boot = TRUE,
             rand.time = rand.time,between.group = TRUE)

Now summarizing method=CbMPDiCbraya i=1 j=1. Mon Feb 27 21:43:00 2023
bootstrapping rt=1. Mon Feb 27 21:43:00 2023
bootstrapping rt=201. Mon Feb 27 21:43:01 2023
bootstrapping rt=401. Mon Feb 27 21:43:02 2023
bootstrapping rt=601. Mon Feb 27 21:43:03 2023
bootstrapping rt=801. Mon Feb 27 21:43:04 2023
Now summarizing method=CbMPDiCbraya i=1 for between.group j=1 v=2. Mon Feb 27 21:43:05 2023
Error in oldname[j, 1:2] : subscript out of bounds

I don't know what is wrong with it. Do you have any suggestions? Thank you very much!

Best regards,

xiaojie

OpenBLAS blas_thread_init: pthread_create failed for thread in betaMPD calculation

Hi Daliang,
When I used the iCAMP package to calculate betaMPD, some problems happened. I have set the thread numbers before I used the R code. Typed the code like this: export OPENBLAS_NUM_THREADS=10
However, the problems are still here. I have no idea how to deal with these problems.

the error information is following:
----bMPD bin i=14 in 200 ---- Sun Sep 26 01:49:55 2021
Now calculating observed betaMPD. Begin at Sun Sep 26 01:49:55 2021. Please wait...
Now randomizing by parallel computing. Begin at Sun Sep 26 01:49:58 2021. Please wait...
Now fixing special cases. Begin at Sun Sep 26 01:52:01 2021. Please wait...
All match very well.
Now calculating observed MPD. Begin at Sun Sep 26 01:52:01 2021. Please wait...
Now randomizing by parallel computing. Begin at Sun Sep 26 01:52:04 2021. Please wait...
OpenBLAS blas_thread_init: pthread_create failed for thread 76 of 80: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 6191435 max
OpenBLAS blas_thread_init: pthread_create failed for thread 77 of 80: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 6191435 max
OpenBLAS blas_thread_init: pthread_create failed for thread 78 of 80: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 6191435 max
OpenBLAS blas_thread_init: pthread_create failed for thread 79 of 80: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 6191435 max
(END)
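
The log shows each process trying to start 80 BLAS threads ("thread 76 of 80"), so the exported value of 10 apparently did not reach the R processes. A sketch of checking this from inside R and setting the variable before the workers are started; that locally spawned workers inherit the environment of the main session is an assumption, and nworker here is a hypothetical value:

```r
# What the current R session sees (NA means the variable is not set).
before <- Sys.getenv("OPENBLAS_NUM_THREADS", unset = NA)
print(before)

# Set it before any worker is created. This does not change a BLAS that is
# already initialized in the main session, only processes started afterwards.
Sys.setenv(OPENBLAS_NUM_THREADS = "1")

nworker <- 30   # hypothetical number of parallel workers
blas.threads <- as.integer(Sys.getenv("OPENBLAS_NUM_THREADS"))
total.threads <- nworker * blas.threads
print(total.threads)
```

The product of workers and BLAS threads per worker should stay well below the RLIMIT_NPROC value reported in the log (4096).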

Is it necessary to rarefy the count data before starting iCAMP analysis?

Hi there,
Is it necessary to rarefy the count data, based on the sample containing the lowest number of sequences, before starting iCAMP analysis?

Regards,
Dinesh

unsolved problem

bin.size.limit = 12
sig.index="Confidence" # see other options in help document of icamp.big.
icres=iCAMP::icamp.big(comm=comm, pd.desc = pd.big$pd.file, pd.spname=pd.big$tip.label,
                   pd.wd = pd.big$pd.wd, rand = rand.time, tree=tree,
                   prefix = prefix, ds = 0.2, pd.cut = NA, sp.check = TRUE,
                   phylo.rand.scale = "within.bin", taxa.rand.scale = "across.all",
                   phylo.metric = "bMPD", sig.index=sig.index, bin.size.limit = bin.size.limit,
                   nworker = nworker, memory.G = memory.G, rtree.save = FALSE, detail.save = TRUE,
                   qp.save = FALSE, detail.null = FALSE, ignore.zero = TRUE, output.wd = save.wd,
                   correct.special = TRUE, unit.sum = rowSums(comm), special.method = "depend",
                   ses.cut = 1.96, rc.cut = 0.95, conf.cut=0.975, omit.option = "no",meta.ab = NULL)

The names are re-ranked.
Now calculating max phylogenetic distance.
----------Now binning-----------------Sun Apr 17 13:30:44 2022
Now computing path. begin at Sun Apr 17 13:30:45 2022. Please wait...
Now computing path. begin at Sun Apr 17 17:37:25 2022. Please wait...
Now computing dist to root. begin at Sun Apr 17 21:43:29 2022. Please wait...
----------Now binning com and sp without omitting small bins-----------------Sun Apr 17 21:45:58 2022
----bMPD bin i=1 in 471 ---- Sun Apr 17 21:45:59 2022
Now calculating observed betaMPD. Begin at Sun Apr 17 21:46:00 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 21:46:03 2022. Please wait...
Now fixing special cases. Begin at Sun Apr 17 22:13:18 2022. Please wait...
----bMPD bin i=2 in 471 ---- Sun Apr 17 22:13:18 2022
Now calculating observed betaMPD. Begin at Sun Apr 17 22:13:19 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 22:13:19 2022. Please wait...
Now fixing special cases. Begin at Sun Apr 17 22:14:25 2022. Please wait...
----bMPD bin i=3 in 471 ---- Sun Apr 17 22:14:26 2022
Now calculating observed betaMPD. Begin at Sun Apr 17 22:14:26 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 22:14:27 2022. Please wait...
Now fixing special cases. Begin at Sun Apr 17 22:15:34 2022. Please wait...
----bMPD bin i=4 in 471 ---- Sun Apr 17 22:15:35 2022
Now calculating observed betaMPD. Begin at Sun Apr 17 22:15:35 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 22:15:35 2022. Please wait...
Now fixing special cases. Begin at Sun Apr 17 22:16:40 2022. Please wait...
----bMPD bin i=5 in 471 ---- Sun Apr 17 22:16:41 2022
Now calculating observed betaMPD. Begin at Sun Apr 17 22:16:41 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 22:16:42 2022. Please wait...
Now fixing special cases. Begin at Sun Apr 17 22:17:45 2022. Please wait...
All match very well.
Now calculating observed MPD. Begin at Sun Apr 17 22:17:46 2022. Please wait...
Now randomizing by parallel computing. Begin at Sun Apr 17 22:17:47 2022. Please wait...
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: invalid 'size' argument

Dear professor,
how can we solve this problem?


Since microbial community data usually have a large number of species (OTUs or ASVs), we use "big.matrix" in the R package "bigmemory" to handle the large phylogenetic distance matrix.

Output files:

path.rda: an R object listing all the nodes and edge lengths from the root to every tip, saved in R data format. It is an intermediate output produced when calculating the phylogenetic distance matrix.

pd.bin: BIN file (backingfile) generated by the function big.matrix in the R package bigmemory. This is the big matrix storing pairwise phylogenetic distance values. With this bigmemory format file, calculations on the big matrix use hard disk space rather than memory.

pd.desc: the DESC file (descriptorfile) holding the description of the backingfile (pd.bin).

pd.taxon.name.csv: comma-delimited CSV file storing the IDs of tree tips (OTUs), which serve as the row/column names of the big phylogenetic distance matrix.

If you already calculated the phylogenetic distance matrix in a previous run
