vascoelbrecht / jamp

JAMP - Just Another Metabarcoding Pipeline

License: Other

R 100.00%

metabarcoding bioinformatics

jamp's Introduction

JAMP introduction

Just Another Metabarcoding Pipeline - Twitter: @VascoElbrecht

JAMP is a modular metabarcoding pipeline that integrates functions from VSEARCH, CUTADAPT and other programs. The pipeline runs as an R package and automatically generates the needed folders and summary statistics for each processing step, allowing you to troubleshoot and adjust settings as needed. Checking the data and statistics after each processing step is a key element of JAMP and is encouraged, as it also gives you a better understanding of your data and the bioinformatic process.

End of 2021 update: Currently updating the documentation and older functions for a more streamlined experience 😄 .

For a short tutorial on extracting haplotypes / ESVs from metabarcoding datasets, take a look at the denoising quick guide.

Installing JAMP

Please keep in mind that JAMP needs VSEARCH and Cutadapt installed to work properly. Mac or Linux-based systems are therefore recommended. Windows is not officially supported, but you can install e.g. an Ubuntu shell on your Windows system!
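Before installing the R package, it can help to confirm that both external tools are reachable from the shell. The following is a minimal sketch, not part of JAMP: `check_tools` is a hypothetical helper name, and it assumes the executables are called `vsearch` and `cutadapt` on your system.

```shell
# Hypothetical helper: report which required command-line tools are missing.
# Prints each missing tool to stderr and returns non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools vsearch cutadapt && echo "all tools found"` prints the confirmation only when both programs are on the PATH.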

To install JAMP locally

# Installing dependencies needed for JAMP
install.packages(c("bold", "XML", "seqinr", "devtools", "fastqcr"), dependencies=TRUE)
# Load devtools and install package directly from GitHub
library("devtools")
install_github("VascoElbrecht/PrimerMiner", subdir="PrimerMiner")
install_github("tobiasgf/lulu")
install_github("VascoElbrecht/JAMP", subdir="JAMP")

You can also download the latest release of JAMP, extract it, and install it within R using install.packages("JAMP", repos = NULL, type="source")

Example of a system-wide installation on an Ubuntu/Debian server:

wget https://github.com/VascoElbrecht/JAMP/archive/v0.53.tar.gz
tar -xzf v0.53.tar.gz
cd JAMP-0.53
sudo R CMD INSTALL JAMP

Licence

JAMP is for non-profit and academic use only. If you wish to use any aspects of JAMP commercially, please kindly request permission from Vasco Elbrecht first. Thank you!

jamp's People

Forkers

tristanlefebure karl-cottenie pyspider ondrov jaredfreedman sbu211

jamp's Issues

U_cluster_otus() crashes if only one file is present

Error in swarm

20% of reads get discarded in PE merging with Vsearch

alignment score too low, or score drop too high, vsearch v2.19.0_macos_aarch64

Need to check setting changes

clustering does not work on 1000+ files due to cat issue

The cat command can't handle more than 1000 files at once. Temporarily fixed for up to 2000 files.
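One way to remove the file-count ceiling altogether is to let `find` and `xargs` batch the file names, so that no single `cat` call receives more arguments than the system allows. This is a sketch under the assumption that the failure comes from passing too many file names in one command; `concat_fasta` is a hypothetical helper, not a JAMP function, and it relies on `-print0`, `sort -z` and `xargs -0` as found in GNU and BSD tools.

```shell
# Hypothetical helper: concatenate every *.fasta file in a directory into one file.
# find + xargs splits the file list into batches, so the number of input
# files is not limited by the maximum command-line length.
# Write the output outside the input directory so it is not picked up as input.
concat_fasta() {
    in_dir=$1
    out_file=$2
    find "$in_dir" -maxdepth 1 -type f -name '*.fasta' -print0 |
        sort -z |
        xargs -0 cat > "$out_file"
}
```

For example, `concat_fasta derep_files A_all_files_united.fasta` would merge all FASTA files found in a (hypothetical) `derep_files` directory in sorted name order.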

Subsampling only works if the file contains enough reads

usearch -fastx_subsample "../D_Minmax/_data/GMP-04606)CCDB-S5-0053)CBGMB-00003_cut_minmax.fasta" -fastaout "_data/GMP-04606)CCDB-S5-0053)CBGMB-00003_cut_minmax_N50000.fasta" -sample_size 50000 -sizein -sizeout

Solution: count the reads first, then copy the file over unchanged if it has fewer reads than the requested subsample size.
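The proposed check can be sketched in shell as follows. Both helper names are hypothetical, and the sketch assumes that read abundances are stored as `;size=N` annotations in the FASTA headers (as implied by the `-sizein` flag above), with unannotated headers counting as one read.

```shell
# Hypothetical helper: count reads in a FASTA file, honoring ";size=N"
# abundance annotations (a header without one counts as a single read).
count_reads() {
    awk '/^>/ {
            if (match($0, /;size=[0-9]+/)) {
                n += substr($0, RSTART + 6, RLENGTH - 6)
            } else {
                n += 1
            }
         }
         END { print n + 0 }' "$1"
}

# Hypothetical helper: copy the file unchanged when it holds fewer reads than
# requested; otherwise return 1 so the caller can run the subsampling command.
copy_if_too_small() {
    in_file=$1
    out_file=$2
    sample_size=$3
    if [ "$(count_reads "$in_file")" -lt "$sample_size" ]; then
        cp "$in_file" "$out_file"
        return 0
    fi
    return 1
}
```

A caller could then write `copy_if_too_small in.fasta out.fasta 50000 || usearch -fastx_subsample in.fasta -fastaout out.fasta -sample_size 50000 -sizein -sizeout`, so that usearch only runs on files large enough to subsample.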

Bold_web_hack: Apply percentage filtering below order level

In rare cases, only ~50-70% matches are returned from BOLD. In these cases, NA should be reported as the hit rather than a taxonomy.

JAMP won't work with up to date cutadapt

Hi Vasco,
JAMP does not seem to work with the latest version of cutadapt.
After downgrading to 2.5 everything works as usual.
Maybe fix this for performance?
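Until the incompatibility is resolved, a quick guard can flag cutadapt installations newer than the version reported to work here. This is only a sketch: both helper names are hypothetical, 2.5 is taken from the report above rather than from any tested compatibility list, and `sort -V` (version sort) is assumed to be available, as it is in GNU coreutils and recent BSD/macOS `sort`.

```shell
# Hypothetical helper: succeed if version $1 is strictly newer than version $2.
# Relies on `sort -V` to order version strings such as 2.5 and 2.10 correctly.
version_newer_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: warn when the installed cutadapt is newer than the
# version reported to work in this issue (2.5).
warn_if_cutadapt_untested() {
    installed=$(cutadapt --version 2>/dev/null) || {
        echo "cutadapt not found" >&2
        return 1
    }
    if version_newer_than "$installed" "2.5"; then
        echo "cutadapt $installed is newer than 2.5; JAMP may not work" >&2
    fi
}
```

Calling `warn_if_cutadapt_untested` before a run prints a warning to stderr without stopping the pipeline.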

Plotting in Cutadapt function breaks the code even if LDist is set to False

Hi Vasco,
in some rare cases the Cutadapt function runs into an infinite xlim error, even though the data handling completed successfully. I worked around it by commenting out the plotting part in the source code and writing a placeholder to the log.
Suggested solution: if LDist is set to False, skip all plots, not just the length distribution plots. Or, even better, add another argument to separate both plotting options.
best Dominik

add option to consider wobbles in map2ref data

need to do a few tests here

installation

I am trying to install JAMP in a virtual linux machine and I get this error:

installing to library ‘/usr/local/lib/R/site-library’
ERROR: dependency ‘seqinr’ is not available for package ‘JAMP’
removing ‘/usr/local/lib/R/site-library/JAMP’

How can I fix it?

Filtering 5031 reads with min max 217 bp: keep 4270 (84.87%)
Filtering 5635 reads with min max 217 bp: keep 3197 (56.73%)

152 dereplicated files where merged into file:
"_data/3_OTU_clustering/A_all_files_united.fasta"
Total number of sequences (not dereplicated): 534201

United sequences are dereplicated + size filtered into a total of 95693 unique sequences.
File prepared for OTU clustering: B_all_derep.fasta

Clustering reads from
"B_all_derep.fasta" 
otu_radius_pct = 3
strand = plus
Chimeras discarded: 206
OTUs written: 1149 -> file "C_OTUs.fasta"

read renamed! ame as in "B_all_derep.fasta"
Reads remapped!
Subsetting OTUs with 0.1 % anundance; Keeping 13 OTUs
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning messages:
1: In dir.create(temp_foldername) :
  cannot create dir '_data/6_haplotypes/_data/5_mapp/K01_116_BR2B_BF24_merged_cut_trunc_minmax_RC_maxee.txt_9.txt', reason 'No such file or directory'
2: In file(file, ifelse(append, "a", "w")) :
  cannot open file '_data/6_haplotypes/_data/5_mapp/K01_116_BR2B_BF24_merged_cut_trunc_minmax_RC_maxee.txt_9.txt/OTU_13/OTU_13_tab.csv': No such file or directory
>