dadasnake

dadasnake is a Snakemake workflow to process amplicon sequencing data, from raw fastq-files to taxonomically assigned "OTU" tables, based on the DADA2 method. Running dadasnake could not be easier: it is called by a single command from the command line. With a human-readable configuration file and a simple sample table, its steps are adjustable to a wide array of input data and requirements. It is designed to run on a computing cluster using a single conda environment in multiple jobs triggered by Snakemake. dadasnake reports on intermediary steps and statistics in intuitive figures and tables. Final data output formats include biom format, phyloseq objects, and flexible text files or R data sets for easy integration in microbial ecology analysis scripts.

Installing dadasnake

For dadasnake to work, you need conda.

  1. Clone this repository to your disk:
git clone https://github.com/a-h-b/dadasnake.git

Change into the dadasnake directory:

cd dadasnake

At this point, you have all the scripts you need to run the workflow using snakemake, and you'd just need to get some data and databases (see point 8). If you want to use the comfortable dadasnake wrapper, follow points 2-6.

  2. Decide how you want to run dadasnake, if you let it submit jobs to the cluster. Do only one of the two:
  • if you want to submit the process running snakemake to the cluster:
cp auxiliary_files/dadasnake_allSubmit dadasnake
chmod 755 dadasnake
  • if you want to keep the process running snakemake on the frontend using tmux:
cp auxiliary_files/dadasnake_tmux dadasnake
chmod 755 dadasnake

If you don't submit jobs to the cluster, but want to run the whole workflow interactively, e.g. on a laptop, it doesn't matter which wrapper you use. Just copy one of them, as described above.

  3. Adjust the file VARIABLE_CONFIG to your requirements (have a tab between the variable name and your setting):
  • SNAKEMAKE_VIA_CONDA - set this to true, if you don't have snakemake in your path and want to install it via conda. Leave empty, if you don't need an additional snakemake.
  • SNAKEMAKE_EXTRA_ARGUMENTS - if you want to pass additional arguments to snakemake, put them here (e.g. --latency-wait=320 for slower file systems). Leave empty usually.
  • LOADING_MODULES - insert a bash command to load modules, if you need them to run conda. Leave empty, if you don't need to load a module.
  • SUBMIT_COMMAND - insert the bash command you'll usually use to submit a job to your cluster to run on a single cpu for a few days. You only need this, if you want to have the snakemake top instance running in a submitted job. Alternatively, you can run it on the frontend via tmux; leave empty, if you want to use this frontend version and have tmux installed. You don't need to set this, if you want to run the workflow interactively / on a laptop.
  • BIND_JOBS_TO_MAIN - if you use the option to run the snakemake top instance in a submitted job and need to bind the other jobs to the same node, you can set this option to true. See FAQ below for more details. You don't need to set this, if you want to run the workflow interactively / on a laptop.
  • NODENAME_VAR - if you use the BIND_JOBS_TO_MAIN option, you need to let dadasnake know how to access the node name (e.g. SLURMD_NODENAME on slurm). You don't need to set this, if you want to run the workflow interactively / on a laptop.
  • SCHEDULER - insert the name of the scheduler you want to use (currently slurm or uge). This determines the cluster config given to snakemake, e.g. the cluster config file for slurm is config/slurm.config.yaml. Also check that the settings in this file are correct. If you have a different system, contact us ( https://github.com/a-h-b/dadasnake/issues ). You don't need to set this, if you want to run the workflow interactively / on a laptop.
  • MAX_THREADS - set this to the maximum number of cores you want to be using in a run. If you don't set this, the default will be 50. Users can override this setting at runtime.
  • NORMAL_MEM_EACH - set the size of the RAM of one core of your normal compute nodes (e.g. 8G). If you're not planning to use dadasnake to submit to a cluster, you don't need to set this.
  • BIGMEM_MEM_EACH - set the size of the RAM of one core of your bigmem (or highmem) compute nodes. If you're not planning to use dadasnake to submit to a cluster or don't have separate bigmem nodes, you don't need to set this.
  • BIGMEM_CORES - set this to the maximum number of bigmem cores you want to require for a task. Set to 0, if you don't have separate bigmem nodes. You don't need to set this, if you're not planning to use dadasnake to submit to a cluster.
  • LOCK_SETTINGS - set this to true, if you don't want users to choose numbers and sizes of compute nodes at run time. If you're not planning to use dadasnake to submit to a cluster, you don't need to set this. Setting LOCK_SETTINGS makes the workflow slightly less flexible, as all large data sets will be run with the maximum number of bigmem nodes you set up here (see big_data settings below). On the other hand, it can be helpful, if you're setting up dadasnake for inexperienced users or have only one possible setting anyhow. If you're not locking, it's advised to set useful settings in the config/config.default.yaml file for normalMem, bigMem, and bigCores.
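Putting these settings together, a VARIABLE_CONFIG for a slurm cluster could look like the following sketch. All values are site-specific examples, not defaults; remember the tab between variable name and value:

```
SNAKEMAKE_VIA_CONDA	true
LOADING_MODULES	module load anaconda3
SUBMIT_COMMAND	sbatch --time=5-00:00:00 --cpus-per-task=1
SCHEDULER	slurm
MAX_THREADS	50
NORMAL_MEM_EACH	8G
BIGMEM_MEM_EACH	16G
BIGMEM_CORES	4
```

Variables you don't need (e.g. BIND_JOBS_TO_MAIN on most systems) can simply be left empty.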
  4. Optional, but highly recommended: install snakemake via conda. If you want to use snakemake via conda (and you've set SNAKEMAKE_VIA_CONDA to true), install the environment, as recommended by Snakemake:
conda install -c conda-forge mamba
mkdir -p conda
mamba create --prefix $PWD/conda/snakemake_env
conda activate $PWD/conda/snakemake_env
mamba install -c conda-forge -c bioconda snakemake=6.9.1 mamba tabulate=0.8
conda deactivate

Alternatively, if the above does not work, you can install a fixed snakemake version without mamba like so:

conda env create -f workflow/envs/snakemake_env.yml --prefix $PWD/conda/snakemake_env

Dadasnake will run with Snakemake version >= 5.9.1 and hasn't been tested with earlier versions.

  5. Set permissions / PATH: Dadasnake is meant to be used by multiple users. Set the permissions accordingly. I'd suggest:
  • to have read access for all files for the users plus
  • execution rights for the dadasnake file and the .sh scripts in the subfolder submit_scripts
  • read, write and execution rights for the conda subfolder
  • Add the dadasnake directory to your path.
  • It can also be useful to make the VARIABLE_CONFIG file not-writable, because you will always need it. The same goes for config.default.yaml once you've set the paths to the databases you want to use (see below).
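The suggested permission scheme could be set with something like the following sketch. To keep it safe to copy-paste, it builds a throwaway directory tree first (the file names inside it are made up); in your actual clone you would run only the chmod and export lines:

```shell
set -e
# sketch only: demonstrate the suggested permission scheme on a throwaway
# directory tree; in a real setup, run the chmod/export lines in your clone
demo="$(mktemp -d)/dadasnake"
mkdir -p "$demo/submit_scripts" "$demo/conda" "$demo/config"
touch "$demo/dadasnake" "$demo/submit_scripts/example.sh" \
      "$demo/VARIABLE_CONFIG" "$demo/config/config.default.yaml"
cd "$demo"
chmod -R a+rX .                                       # read access for all users
chmod 755 dadasnake submit_scripts/*.sh               # execution rights
chmod -R a+rwX conda                                  # users write conda envs here
chmod a-w VARIABLE_CONFIG config/config.default.yaml  # protect shared settings
export PATH="$demo:$PATH"                             # put the wrapper on PATH
```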
  6. Initialize conda environments: This run sets up the conda environments that will be usable by all users:
./dadasnake -i config/config.init.yaml 

This step will take several minutes. It will also create a folder named "dadasnake_initialized"; you can safely remove or keep it. I strongly suggest removing one line from the activation script after the installation, namely the one reading R CMD javareconf > /dev/null 2>&1 || true: you don't need this line later, and if two users run it at the same time it can cause trouble. You can do this by running:

sed -i "s/R CMD javareconf/#R CMD javareconf/" conda/*/etc/conda/activate.d/activate-r-base.sh
  7. Optional test run: The test run does not need any databases. You should be able to start it by running:
./dadasnake -l -n "TESTRUN" -r config/config.test.yaml

If all goes well, dadasnake will run in the current session, load the conda environment, and make and fill a directory called testoutput. A completed run contains a file "workflow.done". If you don't want to see dadasnake's guts at this point, you can also run this with the -c or -f settings to submit to your cluster or start a tmux session (see How to run dadasnake below).

  8. Databases: dadasnake does not supply databases. I'd suggest using the SILVA database for 16S data and UNITE for ITS.
  • dadasnake can use mothur to do the classification, as it's faster and likely more accurate than the legacy DADA2 option. You need to format the database like for mothur (see here).
  • dadasnake can alternatively use the DADA2 implementation of the same classifier. You can find some databases maintained by Michael R. McLaren here. More information on the format is in the DADA2 tutorial.
  • In addition to the bayesian classifier, dadasnake implements DECIPHER. You can find decipher databases on the decipher website or build them yourself.
  • dadasnake can use fungal traits to assign traits to fungal genera. Download the latest table from here - dadasnake has been tested with v1.2.
  • You can also use dadasnake to blast and summarize results using basta. Have a look at the NCBI's ftp.
  • dadasnake can also annotate fungal taxonomy with guilds via funguild, if you have suitable databases.
  • If you still have a tax4fun2 installation, you can also use it within dadasnake. The package and database were taken off github, so it's not part of default dadasnake anymore.
  • You can now use picrust2 within dadasnake. You need to set the path to the databases of your choice in the config file. By default, dadasnake looks for databases in the directory above where it was called. It makes sense to change this for your system in the config.default.yaml file upon installation, if all users access databases in the same place.
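Once downloaded, database locations go into the config file, e.g. for the DADA2 classifier (the path is a placeholder for your system; the keys follow config/config.default.yaml as documented in the configuration section below):

```yaml
taxonomy:
    dada:
        do: true
        db_path: /path/to/DBs/DADA2
        refFasta: silva_nr99_v138_train_set.fa.gz
```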
  9. Fasttree: dadasnake comes with fasttree for treeing, but if you have a decent number of sequences, it is likely to be relatively slow. If you have fasttreeMP, you can give the path to it in the config file.

How to cite dadasnake

Christina Weißbecker, Beatrix Schnabel, Anna Heintz-Buschart, Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology, GigaScience, Volume 9, Issue 12, December 2020, giaa135. Please also cite DADA2: Callahan, B., McMurdie, P., Rosen, M. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583 (2016), and any other tools you use within dadasnake, e.g. mothur, DECIPHER, ITSx, Fasttree, VSEARCH, FUNGuild, PICRUSt2, BASTA, tax4fun2.


How to run dadasnake

To run the dadasnake, you need a config file and a sample table, plus data:

  • The config file (in yaml format) is read by Snakemake to determine the inputs, steps, arguments and outputs.
  • The sample table (tab-separated text) always gives sample names and file names, with column headers named library and r1_file (and r2_file for paired-end data sets). The path to the sample table has to be given in the config file. You can add columns labeled run and sample to indicate different sequencing runs and libraries that should be combined into one final column (see the section about the sample table below).
  • All raw data (usually fastq files) need to be in one directory (which has to be given in the config file).
  • It is possible (and recommended) to have one config file per run, defining only the settings that differ from the default config file.
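A minimal per-run config could look like this sketch (the paths are placeholders, and the primer sequences shown are the 515F/806R defaults; all keys are documented in the configuration section below):

```yaml
raw_directory: /path/to/raw/fastqs
sample_table: /path/to/samples.tsv
outputdir: /path/to/dadasnake_output
paired: true
primers:
    fwd:
        sequence: GTGYCAGCMGCCGCGGTAA
    rvs:
        sequence: GGACTACNVGGGTWTCTAAT
```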

Using the dadasnake wrapper

As shown in the installation description above, dadasnake can be run in a single step, by calling dadasnake. Since most of the configuration is done via the config file, the options are very limited. You can either:

  • -c run (submit to a cluster) dadasnake and make a report (-r), or
  • -l run (in the current terminal) dadasnake and make a report (-r), or
  • -f run (in a tmux session on the frontend; only available in the tmux installation) dadasnake and make a report (-r), or
  • just make a report (-r), or
  • run a dryrun (-d), or
  • unlock a working directory, if a run was killed (-u), or
  • initialize the conda environments only (-i) - you should only need this during the installation.

It is strongly recommended to first run a dryrun on a new configuration, which will tell you within a few seconds and without submission to a cluster whether your chosen steps work together, the input files are where you want them, and your sample file is formatted correctly. In all cases you need the config file as the last argument:
dadasnake -d -r config.yaml

You can also set the maximum number of cpus to run at the same time with -t. The defaults (1 for local/frontend runs and 50 for clusters) are reasonable for many settings; if you don't know what this means, you probably don't have to worry. But you may want to increase the numbers for larger datasets or bigger infrastructure, or decrease them to match your environment's constraints. You can add a name for your main job (-n NAME), e.g.:

dadasnake -c -n RUNNAME -r config.yaml

Note that spaces in RUNNAME are not allowed and dots will be replaced by underscores.

If you use the tmux version, you can see the tmux process running by typing tmux ls. You can also follow the progress by checking the standard error file: tail RUNNAME_XXXXXXXXXX.stderr.

Depending on your dataset and settings and your cluster's scheduler, the workflow will take a few minutes to days to finish.

Running snakemake manually

Once raw data, config file and sample file are present, the workflow can be started from the dadasnake directory by the snakemake command:

snakemake -s Snakefile --configfile /PATH/TO/YOUR/CONFIGFILE --use-conda

If you're using a computing cluster, add your cluster's submission command and the number of jobs you want to maximally run at the same time, e.g.:

snakemake -j 50 -s Snakefile --cluster "qsub -l h_rt={resources.runtime},h_vmem=8G -pe smp {threads} -cwd" --configfile /PATH/TO/YOUR/CONFIGFILE --use-conda 

This will submit most steps as their own job to your cluster's queue. The same can be achieved with a cluster configuration:

snakemake -j 50 -s Snakefile --cluster-config PATH/TO/SCHEDULER.config.yaml --cluster "{cluster.call} {cluster.runtime}{resources.runtime} {cluster.mem_per_cpu}{resources.mem} {cluster.threads}{threads} {cluster.partition}" --configfile /PATH/TO/YOUR/CONFIGFILE --use-conda
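For this call, SCHEDULER.config.yaml is expected to map scheduler-specific flags to the {cluster.*} placeholders. The keys and values below are illustrative only (a slurm-flavoured guess; see the shipped config/slurm.config.yaml for the real settings on your system):

```yaml
__default__:
    call: sbatch
    runtime: "--time="
    mem_per_cpu: "--mem-per-cpu="
    threads: "--cpus-per-task="
    partition: "--partition=normal"
```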

If you want to share the conda installation with colleagues, use the --conda-prefix argument of Snakemake:

snakemake -j 50 -s Snakefile --cluster-config PATH/TO/SCHEDULER.config.yaml --cluster "{cluster.call} {cluster.runtime}{resources.runtime} {cluster.mem_per_cpu}{resources.mem} {cluster.threads}{threads} {cluster.partition}" --configfile /PATH/TO/YOUR/CONFIGFILE --use-conda --conda-prefix /PATH/TO/YOUR/COMMON/CONDA/DIRECTORY

Depending on your dataset and settings, and your cluster's queue, the workflow will take a few minutes to days to finish.

What does the dadasnake do?

  • primer removal and removal of poly-G-tails - using cutadapt
  • quality filtering and trimming - using DADA2
  • optional downsampling of reads per sample - using seqtk
  • error estimation & denoising - using DADA2, including Novaseq-enabled models
  • paired-ends assembly - using DADA2
  • "OTU" table generation (it contains ASVs, of course) - using DADA2
  • chimera removal - using DADA2
  • clustering of ASVs at a user-set similarity (these are called OTU now)
  • taxonomic classification - using mothur and/or DECIPHER (& ITS detection - using ITSx & blastn + BASTA)
  • functional annotation - using funguild, fungalTraits, picrust2 (or tax4fun2)
  • length check - in R
  • treeing - using clustal omega and fasttree
  • hand-off in biom-format, as R object, as R phyloseq object, and as fasta and tab-separated tables
  • keeping tabs on number of reads in each step, and read quality control - using fastqc & multiQC

You can control the settings for each step in a config file.


The samples table

Every samples table needs sample names (under header library) and file names (under headers r1_file and potentially r2_file; give just the file names, the path belongs in the config file). Since DADA2 estimates run-specific errors, it can be helpful to give run IDs (under header run). If you have many samples (>500), it is also useful to split them into runs for the analysis, as some of the most memory-intensive steps are done per run.
If several fastq files should end up in the same column of the ASV/OTU table, you can indicate this by giving these libraries the same sample name (under header sample). Libraries from different runs are combined in the final ASV/OTU table (example 1). Libraries from the same run are combined after primer-processing (example 2).
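A samples table covering both cases could look like this (library IDs and file names are made up; columns are tab-separated):

```
library	r1_file	r2_file	run	sample
A_1	A_1_R1.fastq.gz	A_1_R2.fastq.gz	run1	sampleA
A_2	A_2_R1.fastq.gz	A_2_R2.fastq.gz	run2	sampleA
B_1	B_1_R1.fastq.gz	B_1_R2.fastq.gz	run1	sampleB
B_2	B_2_R1.fastq.gz	B_2_R2.fastq.gz	run1	sampleB
```

Here, A_1 and A_2 come from different runs and are combined in the final table (example 1), while B_1 and B_2 come from the same run and are combined after primer-processing (example 2).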

The configuration

The config file must be in .yaml format. The order within the yaml file does not matter, but the hierarchy has to be kept. Explanations can be found in config/config.default.yaml.

top-level parameters sub-parameters subsub-parameters default value possible values used in stage explanation comments / recommendations
raw_directory "testdata" any one path where you might have your raw data all directory with all raw data you will usually have this somewhere in a project folder
sample_table "testdata/samples.small.tsv" any one location of your samples table all path to the samples table the dadasnake will copy it to your output directory
outputdir "dadasnake_output" any path that you have permissions for writing to all directory where all the output will go change this; a scratch-type place works best; each output directory can hold the results of one completed pipeline run only
paired true true or false primers and dada do you want to use paired-end sequencing data? if true, you have to give r1_file and r2_file in the samples table, if false only r1_file is read (if you want to use only R2 files from a paired-end sequencing run, put their name in the r1_file column)
tmp_dir "tmp" any path that you have permissions for writing to all directory for temporary, intermediate files that shouldn't be kept keep this in a temporary place so you don't need to worry about removing its contents
big_data false a boolean dada, taxonomy, post whether to use big data settings set to true, if you have extra high memory nodes and more than 1000 samples
email "" "" or a valid email address all email address for mail notification keep empty if you don't want emails. Check spelling, it's not tested.
do_primers true true or false all should primers be cut?
do_dada true true or false all should DADA2 be run?
do_taxonomy true true or false all should taxonomic classification be done?
do_postprocessing true true or false all should some more steps be done (e.g. functional annotation)
hand_off dada, taxonomy, postprocessing settings deciding if additional formats should be given
  biom false true or false dada, taxonomy whether a biom format output should be written the biom file contains the ASV table, or ASV table and taxonomy (if taxonomy was run); the biom table is never filtered
  phyloseq true true or false taxonomy, postprocessing whether a phyloseq object (or two - for ASVs and OTUs) should be returned contains ASV or OTU table and taxonomy and tree (if each was run; if tree is run on pruned OTU or ASV table, phyloseq object contains filtered dataset)
primers primers information on primers
  fwd primers information on forward primer
  sequence GTGYCAGCMGCCGCGGTAA any sequence of IUPAC DNA code primers sequence of forward primer
  name 515F anything primers name of forward primer for your reference only
  rvs primers information on reverse primer
  sequence GGACTACNVGGGTWTCTAAT any sequence of IUPAC DNA code primers sequence of reverse primer
  name 806R anything primers name of reverse primer
primer_cutting primers arguments for primer cutting by cutadapt
  overlap 10 1-length of primer primers minimum length of detected primer
  count 2 a positive integer primers maximum number of primers removed from each end
  filter_if_not_match any any or both primers reads are discarded if primer is not found on both or any end any is the more strict setting; not used in single-end mode
  perc_mismatch 0.2 0-1 primers % mismatch between read and each primer don't set this to 1
  indels "--no-indels" "--no-indels" or "" primers whether indels in the primer sequence are allowed
  both_primers_in_read false false or true primers whether both primers are expected to be in the read only used in single-end mode
sequencing_direction "unknown" fwd_1, rvs_1 or unknown primers fwd_1: fwd primer in read 1; rvs_1: rvs primer in read 1; unknown: you don't know the sequencing direction or the direction is mixed if you don't know the direction, dadasnake will try to re-orient using the primers
nextseq_novaseq false true or false primers whether poly-G tails should be removed set for Nextseq or Novaseq data
filtering dada settings for quality / length filtering; note on terminology: for paired sequencing fwd read refers to reads that had fwd primer or were declared as such (if no primer cutting was done); for single-end workflow, only the fwd setting is used, no matter the sequencing direction
  trunc_length dada length to truncate to (shorter reads are discarded)
  fwd 0 a positive integer dada length after which fwd read is cut - shorter reads are discarded 0: no truncation by length; if you've cut the primers, this number refers to the length left after primer cutting
  rvs 0 a positive integer dada length after which rvs read is cut - shorter reads are discarded 0: no truncation by length; ignored in single-end mode; if you've cut the primers, this number refers to the length left after primer cutting
  trunc_qual dada quality threshold for truncation (reads are cut before the first position with this quality)
  fwd 2 0-40 dada fwd reads are cut before the first position with this quality
  rvs 2 0-40 dada rvs reads are cut before the first position with this quality ignored in single-end mode
  max_EE dada filtering by maximum expected error after truncation: Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10))
  fwd 2 a positive number dada After truncation, read pairs with higher than maxEE "expected errors" in fwd read will be discarded use with trunc_length and/or truncQ; note that low truncQ or high trunc_length make it difficult to reach low maxEE values
  rvs 2 a positive number dada After truncation, read pairs with higher than maxEE "expected errors" in rvs read will be discarded ignored in single-end mode; use with trunc_length and/or truncQ; note that low truncQ or high trunc_length make it difficult to reach low maxEE values
  minLen dada filtering by minimum length
  fwd 20 a positive integer dada Remove reads with length less than minLen on fwd read. minLen is enforced after trimming and truncation. use with truncQ
  rvs 20 a positive integer dada Remove reads with length less than minLen on rvs read. minLen is enforced after trimming and truncation. ignored in single-end mode; use with truncQ
  maxLen dada filtering by maximum length
  fwd Inf a positive integer or Inf dada Remove reads with length of fwd read greater than maxLen. maxLen is enforced before trimming and truncation.
  rvs Inf a positive integer or Inf dada Remove reads with length of rvs read greater than maxLen. maxLen is enforced before trimming and truncation. ignored in single-end mode
  minQ dada filtering by minimum quality after truncation
  fwd 0 0 or a positive number dada read pairs that contain a quality score lower than this in the fwd read after truncation will be discarded use with trunc_length
  rvs 0 0 or a positive number dada read pairs that contain a quality score lower than this in the rvs read after truncation will be discarded ignored in single-end mode; use with trunc_length
  trim_left dada
  fwd 0 0 or a positive number dada this many bases will be cut from the 5' end of fwd reads filtered reads will have length truncLen-trimLeft
  rvs 0 0 or a positive number dada this many bases will be cut from the 5' end of rvs reads filtered reads will have length truncLen-trimLeft
  rm_phix true true or false dada remove phiX useful with Illumina sequencing
downsampling dada
  do false true or false dada set to true if you want to downsample before DADA2 ASV construction
  number 50000 positive integer dada number of reads to keep per sample
  min true true or false dada true to keep only samples with that many reads samples with less reads are discarded
  use_total false true or false dada downsample to the fraction of a total number useful for testing settings
  total 100000000 positive integer dada total number of reads to keep over all samples used only with use_total
  seed 123 any positive integer dada seed for downsampling keep constant in re-runs
error_seed 100 any positive integer dada seed for error models keep constant in re-runs
dada dada special DADA2 settings - default is good for Illumina
  band_size 16 a positive integer dada Banding restricts the net cumulative number of insertion of one sequence relative to the other. default is good for Illumina; set to 32 for 454 or PacBio
  homopolymer_gap_penalty NULL NULL or a negative integer dada The cost of gaps in homopolymer regions (>=3 repeated bases). Default is NULL, which causes homopolymer gaps to be treated as normal gaps. default is good for Illumina; set to -1 for 454
  pool false true, false, "pseudo", or "within_run" dada Should DADA2 be run per sample (default) or in a pool, or should pseudo-pooling be done? default is good for Illumina and much more efficient for large data sets; set to true for 454, pacbio and nanopore; set to pseudo for non-huge datasets, if you're interested in rare ASVs. You can also have within-run pools, but this setting is rarely useful.
  omega_A 1e-40 number between 0 and 1 dada Threshold to start new partition based on abundance in ASV finding. default is good for Illumina; set lower for 454; according to the DADA2 authors, it's an underused feature - it can also kill your analysis
  priors "" "" or the absolute path to a fasta file with prior sequence data dada You can give DADA2 sequences to look out for in your dataset. Don't change unless you know what you're doing.
  omega_P 1e-4 number between 0 and 1 dada Like omega_A, but for sequences matched by priors. Only does anything, if you gave priors.
  omega_C 1e-40 number between 0 and 1 dada Threshold to start new partition based on quality in ASV finding. Don't change unless you know what you're doing.
  selfConsist false true or false dada Should DADA2 do multiple rounds of ASV inference based on the normal error estimation? Don't change unless you know what you're doing.
  no_error_assumptions false true or false dada If you've set selfConsist to true, you can make DADA2 not start from the normal error estimation. Don't change unless you know what you're doing.
  errorEstimationFunction loessErrfun loessErrfun, PacBioErrfun or noqualErrfun, or loessErrfun_mod1 to 4 dada The error estimation method within the DADA2 inference step. default is good for Illumina; set to PacBioErrfun for pacbio and possibly to noqualErrfun if you're hacking data without real quality values; ErnakovichLab models for Novaseq data are also available as e.g. loessErrfun_mod4
  use_quals true true or false dada DADA2 can be run without caring about quality. Don't change unless you know what you're doing.
  gapless true true or false dada In the pre-screening, Kmers are employed to find gaps. Don't change unless you know what you're doing - might help with 454 data and the like.
  kdist_cutoff 0.42 a number between 0 and 1 dada After the pre-screening, sequences of Kmers with this similarity are checked for actual matches. Don't change unless you know what you're doing.
  match 4 a number dada Score for match in Needleman-Wunsch-Alignment (the check for matching sequences). Don't change unless you know what you're doing.
  mismatch -5 a number dada Penalty for mismatch in Needleman-Wunsch-Alignment (the check for matching sequences). Don't change unless you know what you're doing.
  gap_penalty -8 a number dada Penalty for gaps in Needleman-Wunsch-Alignment (the check for matching sequences), unless the gaps are part of homopolymers - these are handled separately, see above. Don't change unless you know what you're doing.
pair_merging dada settings for merging of read pairs
  min_overlap 12 a positive integer dada The minimum length of the overlap required for merging the forward and reverse reads. ignored in single-end mode
  max_mismatch 0 0 or a positive integer dada The maximum mismatches allowed in the overlap region. ignored in single-end mode
  just_concatenate false true or false dada whether reads should be concatenated rather than overlapped ignored in single-end mode; If TRUE, the forward and reverse-complemented reverse read are concatenated rather than merged, with a NNNNNNNNNN (10 Ns) spacer inserted between them.
  trim_overhang true true or false dada whether overhangs should be trimmed off after merging ignored in single-end mode; usually, overhangs should have been removed with the primer cutting step
chimeras dada settings for chimera removal
  remove true true or false dada whether chimeras should be removed
  method consensus consensus, pooled or per-sample dada how chimeras are detected consensus: samples are checked individually and sequences are removed by consensus; pooled: the samples are pooled and chimeras are inferred from the pool; per-sample: samples are checked individually and sequence counts of chimeras are set to 0 in individual samples
  minFoldParentOverAbundance 2 a number > 1 dada how overabundant do parents have to be to consider a read chimeric? should be higher for long amplicons (e.g. 3.5 for pacbio)
  minParentAbundance 8 a number > 1 dada how abundant do parents have to be to consider a read chimeric? Don't change unless you know what you're doing.
  allowOneOff false true or false dada should sequences with a mismatch be flagged as potential chimera? Don't change unless you know what you're doing.
  minOneOffParentDistance 4 a number > 1 dada if flagging sequences with one mismatch as potential one-off parents, how many mismatches are needed Don't change unless you know what you're doing.
  maxShift 16 a number dada maximum shift when aligning to potential parents Don't change unless you know what you're doing.
taxonomy taxonomy settings for taxonomic annotation
  dada taxonomy settings for DADA2 implementation of bayesian classifier
  do false true or false taxonomy whether DADA2 should be used for taxonomic annotation the DADA2 implementation may work less well than the mothur classifier, and it may be slower
  post_ITSx false true or false taxonomy whether the classifier should be run before or after ITSx if you set this to true, you also have to set ITSx[do] to true; the DB isn't cut to a specific ITS region
  run_on list with ASV and cluster list containing ASV and/or cluster taxonomy whether the classifier should be run on ASVs and or OTUs clustered from ASVs
  db_path "../DBs/DADA2" taxonomy directory where the database sits change when setting up dadasnake on a new system
  refFasta "silva_nr99_v138_train_set.fa.gz" taxonomy training database name
  db_short_names "silva_v138_nr99" taxonomy short name(s) to label database(s) in the output, separated by a whitespace; should be as many items as in ref_dbs_full if you give fewer database names than databases, not all databases will be used
  ref_dbs_full "" taxonomy full path and database file name(s) (without suffix), separated by a whitespace if you give fewer database names than databases, not all databases will be used
  minBoot 50 1-100 taxonomy bootstrap value for classification see DADA2 documentation for details
  tryRC false false or true taxonomy if your reads are in the direction of the database (false), or reverse complement or you don't know (true) true takes longer than false
  seed 101 a positive integer taxonomy seed for DADA2 taxonomy classifier keep constant in re-runs
  look_for_species false true or false taxonomy whether you want to run a species-level annotation species-level annotation is overkill for 16S data; if you set this, you need a specialised database (currently available for 16S silva 132)
  spec_db "../DBs/DADA2/silva_species_assignment_v138.fa.gz" taxonomy a DADA2-formatted species assignment database with path change when setting up dadasnake on a new system
  decipher taxonomy settings for DECIPHER
  do false true or false taxonomy whether DECIPHER should be used for taxonomic annotation DECIPHER can work better than the mothur classifier, but it is slower and we don't have many databases for this software; you can run both DECIPHER and mothur (in parallel)
  post_ITSx false true or false taxonomy whether DECIPHER should be run before or after ITSx if you set this to true, you also have to set ITSx[do] to true; the DB isn't cut to a specific ITS region
  run_on list with ASV and cluster list containing ASV and/or cluster taxonomy whether the classifier should be run on ASVs and or OTUs clustered from ASVs
  db_path "../DBs/decipher" taxonomy directory where the database sits change when setting up dadasnake on a new system
  tax_db "SILVA_SSU_r138_2019.RData" taxonomy decipher database name
  db_short_names "SILVA_138_SSU" taxonomy short name(s) to label database(s) in the output, separated by a whitespace; should be as many items as in ref_dbs_full if you give fewer database names than databases, not all databases will be used
  ref_dbs_full "" taxonomy full path and database file name(s) (without suffix), separated by a whitespace if you give fewer database names than databases, not all databases will be used
  threshold 60 1-100 taxonomy threshold for classification see DECIPHER documentation for details
  strand bottom bottom, top or both taxonomy if your reads are in the direction of the database (top), reverse complement (bottom) or you don't know (both) both takes roughly twice as long as the others
  bootstraps 100 a positive integer taxonomy number of bootstraps
  seed 100 a positive integer taxonomy seed for DECIPHER run keep constant in re-runs
  look_for_species false true or false taxonomy whether you want to run a species-level annotation after DECIPHER species is an overkill for 16S data; if you set this, you need to have a specialised database (currently available for 16S silva 132)
  spec_db "../DBs/DADA2/silva_species_assignment_v138.fa.gz" taxonomy a DADA2-formatted species assignment database with path change when setting up dadasnake on a new system
  mothur taxonomy settings for Bayesian classifier (mothur implementation)
  do true true or false taxonomy whether mothur's classify.seqs should be used for taxonomic annotation we have more and more specific databases for mothur (and can make new ones), it's faster than DECIPHER, but potentially less correct; you can run both mothur and DECIPHER (in parallel)
  post_ITSx false true or false taxonomy whether mothur's classify.seqs should be run before or after ITSx if you set this to true, you also have to set ITSx[do] to true; use an ITSx-cut database if run afterwards
  run_on list with ASV and cluster list containing ASV and/or cluster taxonomy whether the classifier should be run on ASVs and or OTUs clustered from ASVs
  db_path "../DBs/amplicon" taxonomy directory where the database sits change when setting up dadasnake on a new system
  tax_db "SILVA_138_SSURef_NR99_prok.515F.806R" taxonomy the beginning of the filename of a mothur-formatted database don't add .taxonomy or .fasta
  db_short_names "SILVA_138_SSU_NR99" taxonomy short name(s) to label database(s) in the output, separated by a whitespace; should be as many items as in ref_dbs_full if you give fewer database names than databases, not all databases will be used
  ref_dbs_full "" taxonomy full path and database file name(s) (without suffix), separated by a whitespace if you give fewer database names than databases, not all databases will be used
  cutoff 60 1-100 taxonomy cut-off for classification
blast taxonomy settings for BLAST
  do true true or false taxonomy whether blast should be run
  run_on list with ASV and cluster list containing ASV and/or cluster taxonomy whether blast should be run on ASVs and or OTUs clustered from ASVs
  db_path "../DBs/ncbi_16S_ribosomal_RNA" taxonomy path to blast database
  tax_db 16S_ribosomal_RNA taxonomy name (without suffix) of blast database
  e_val 0.01 taxonomy e-value for blast
  tax2id "" a tax2id table or "none" taxonomy whether taxonomic data is available in a tax2id table this also assumes there is a taxdb file in the db_path; you don't need it, if you have a BLAST v5 database
  all true taxonomy whether blastn should also be run on sequences that have been classified already
  run_basta true true or false taxonomy whether BASTA should be run on the BLASTn output
  basta_db "../DBs/ncbi_taxonomy" taxonomy path to the NCBI-taxonomy database that is prepared when basta is installed
  basta_e_val 0.00001 taxonomy e-value for hit selection
  basta_alen 100 taxonomy minimum alignment length of hits
  basta_number 0 0 or a positive integer taxonomy maximum number of hits to use for classification if set to 0 all hits will be considered
  basta_min 3 a positive number taxonomy minimum number of hits a sequence must have to be assigned an LCA needs to be smaller or equal to max_targets
  basta_id 80 1-100 taxonomy minimum identity of hit to be considered good
  basta_besthit true true or false taxonomy if set the final taxonomy will contain an additional column containing the taxonomy of the best (first) hit with defined taxonomy
  basta_perchits 99 an odd number greater than 50 taxonomy percentage of hits that are used for LCA estimation
ITSx taxonomy settings for ITSx
  do false true or false taxonomy whether ITSx should be run only makes sense for analyses targeting an ITS region
  min_regions 1 1-4 taxonomy minimum number of detected regions counting includes SSU, LSU and 5.8S next to the ITS regions
  region ITS2 ITS1 or ITS2 taxonomy which region to extract
  e_val 1.00E-05 0-1 taxonomy e-value for ITS detection
  query_taxa . a letter taxonomy Profile set to use for the search ITSx's -t option, see manual for list
  target_taxon F a letter taxonomy taxon output from ITSx to filter for default is F for fungi
postclustering dada settings for clustering ASVs into OTUs (since 0.11)
  do true true or false dada whether to do clustering, if no taxonomy is done; this is ignored if any of the taxonomy steps ask for clustered input
  cutoff 0.97 a value between 0.5 and 1 dada similarity cut-off
  method vsearch vsearch or decipher dada clustering algorithm
  strand plus plus or both dada which strand to use for vsearch clustering only used by vsearch, plus is faster and should be appropriate unless sequencing direction is unknown and can't be determined
final_table_filtering postprocessing settings for filtering the final ASV and/or OTU tables (before postprocessing, if postprocessing is done)
  do true true or false postprocessing whether a filtered version of the ASV/OTU table and sequences should be made and used for the post-processing steps
  keep_target_taxa "." "." or a regular expression for taxa to keep, e.g. "Bacteria" postprocessing pattern to look for in the taxstrings done based on mothur and dada/DECIPHER result; "." means all are kept; all taxstrings are searched, if multiple classifiers were used - for clustered OTU tables, only the annotation of the OTUs is used, not the summary of ASV taxonomies
  target_min_length 0 postprocessing minimum length of sequence
  target_max_length Inf postprocessing maximum length of sequence
postprocessing postprocessing settings for postprocessing
  fungalTraits postprocessing settings for fungalTraits
  do false true or false postprocessing whether fungalTraits should be assigned
  db "../DBs/functions/FungalTraits_1.2_ver_16Dec_2020_V.1.2.tsv" postprocessing path to fungalTraits DB change when setting up dadasnake on a new system
  classifier_db mothur.SILVA_138_SSURef_NR99_cut postprocessing which classifier to use; only one can be given
  funguild postprocessing settings for funguild
  do false true or false postprocessing whether funguild should be run
  funguild_db "../DBs/functions/funguild_db.json" postprocessing path to funguild DB change when setting up dadasnake on a new system
  classifier_db mothur.SILVA_138_SSURef_NR99_cut postprocessing which classifier to use; only one can be given
  picrust2 postprocessing settings for PICRUSt2
  do true true or false postprocessing whether PICRUSt2 should be run
  stratified true true or false postprocessing whether PICRUSt2 should return stratified output takes longer
  per_sequence_contrib true true or false postprocessing whether PICRUSt2 should run per_sequence_contrib routine takes longer
  skip_norm false true or false postprocessing whether PICRUSt2 should skip normalization of marker genes
  max_nsti 2 integer postprocessing PICRUSt2 max_nsti setting see PICRUSt2 documentation for details
  do_nsti true true or false postprocessing whether PICRUSt2 should do NSTI see PICRUSt2 documentation for details
  do_minpath true true or false postprocessing PICRUSt2 minpath setting see PICRUSt2 documentation for details
  do_gapfill true true or false postprocessing PICRUSt2 gapfill setting see PICRUSt2 documentation for details
  do_coverage false true or false postprocessing PICRUSt2 coverage setting see PICRUSt2 documentation for details
  pathways true true or false postprocessing PICRUSt2 pathway setting see PICRUSt2 documentation for details
  min_reads 1 integer postprocessing minimum number of reads per ASV to filter before PICRUSt2 setting this higher will remove rare ASVs from calculation (quicker and potentially less noisy)
  min_samples 1 integer postprocessing minimum number of samples an ASV needs to be in before PICRUSt2 setting this higher will remove rare ASVs from calculation (quicker and potentially less noisy)
  placement_tool epa-ng epa-ng or sepp postprocessing PICRUSt2 placement_tool setting see PICRUSt2 documentation for details
  in_traits EC,KO comma-separated combination of COG, EC, KO, PFAM, TIGRFAM postprocessing PICRUSt2 in_traits setting see PICRUSt2 documentation for details
  hsp_method mp mp, emp_prob, pic, scp, or subtree_average postprocessing PICRUSt2 hsp_method setting see PICRUSt2 documentation for details
  edge_exponent 0.5 number postprocessing PICRUSt2 edge_exponent setting see PICRUSt2 documentation for details
  min_align 0 number postprocessing PICRUSt2 min_align setting see PICRUSt2 documentation for details
  custom_trait_tables '' string postprocessing PICRUSt2 custom_trait_tables setting see PICRUSt2 documentation for details - not tested in dadasnake context yet
  marker_gene_table '' string postprocessing PICRUSt2 marker_gene_table setting see PICRUSt2 documentation for details - not tested in dadasnake context yet
  pathway_map '' string postprocessing PICRUSt2 pathway_map setting see PICRUSt2 documentation for details - not tested in dadasnake context yet
  reaction_func '' string postprocessing PICRUSt2 reaction_func setting see PICRUSt2 documentation for details - not tested in dadasnake context yet
  regroup_map '' string postprocessing PICRUSt2 regroup_map setting see PICRUSt2 documentation for details - not tested in dadasnake context yet
  tax4fun2 postprocessing settings for tax4fun2 - deprecated!
  do false true or false postprocessing whether tax4fun2 should be used
  db "../DBs/functions/Tax4Fun2_ReferenceData_v2" postprocessing path to tax4fun2 DB change when setting up dadasnake on a new system
  database_mod Ref99NR Ref99NR or Ref100NR postprocessing which database to use
  normalize_by_copy_number true true or false postprocessing whether to normalize tax4fun2 results by copy number normalization of pathway results is not possible
  min_identity_to_reference 0.97 90 to 100 or 0.9 to 1.0 postprocessing minimum similarity between ASV sequence and tax4fun DB
  user_data false true or false postprocessing whether user database should be used
  user_dir "../DBs/Functions/GTDB_202_tax4fun2" postprocessing path to user database
  user_db GTDB_fun postprocessing name of the user database
  treeing postprocessing settings for treeing
  do true true or false postprocessing whether a phylogenetic tree should be made
  fasttreeMP "" postprocessing path to fasttreeMP executable change when setting up dadasnake on a new system
  rarefaction_curve true true or false postprocessing whether a rarefaction curve should be made
sessionName "" "" or a single word all session name only read, if you're not using the dadasnake wrapper
bigMem "" "" or a number and letter all size of the RAM of one core of your high memory compute nodes (e.g. 30G) may be fixed during installation, only necessary for cluster submission
bigCores "" "" or a number all maximum number of high memory compute nodes to use (e.g. 4) 0 means all nodes have the same (normal) size may be fixed during installation, only necessary for cluster submission
sessionKind "" a string all automatically set by dadasnake wrapper keep ""
settingsLocked false a boolean or string all automatically set by dadasnake wrapper it doesn't matter what you do
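
For orientation, the mothur-based classification rows above combine into a config snippet like the following. This is a sketch built from the defaults listed in the table; the exact nesting should be checked against the config template shipped with your copy of dadasnake (the template file name is not given here, as it may vary between versions):

```yaml
taxonomy:
  mothur:
    do: true
    post_ITSx: false
    run_on:
      - ASV
      - cluster
    db_path: "../DBs/amplicon"
    tax_db: "SILVA_138_SSURef_NR99_prok.515F.806R"
    cutoff: 60
```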

What if something goes wrong?

If you gave dadasnake your email address and your system supports mailing (to that address), you will receive an email upon start and if the workflow encountered a problem or after the successful run. If there was a problem, you have to check the output and logs.

  • Use the -d option of dadasnake or the --dryrun option of Snakemake before the run to check that your input files are where you want them and that you have permissions to write to your target directory. This will also do some checks on the configuration and samples table, so it discovers the majority of errors on a suitable combination of dataset and configuration.
  • You cannot make two runs of dadasnake write to the same output directory. If you start the second run while the first is still running, you will get an error indicating either that the directory can't be locked or that the metadata is incomplete. If the first run has already finished, dadasnake will tell you that there's nothing to be done. Change the output directory in the config file to be unique for each run.
  • A common reason for errors is misformatted input, e.g. the databases for the classification or the read files.
  • dadasnake should catch most errors related to empty outputs. For example: the filtering is too stringent and no sequences are left; the primers you expected to find are not present; the sequences were truncated too short to be merged. Please report issues where this didn't happen.
  • The best way to pinpoint those errors is to first check the .stderr file made by dadasnake (or the Snakemake output, if you run the workflow outside dadasnake). This will tell you which rule encountered the error, and, if you use the cluster submission, the job ID. You may have to search for the error a bit, because dadasnake will try to finish as much as possible of your run before dying. Hint: you can find errors by colour or by searching for "Error in rule".
  • If you use the cluster submission, log files for every rule are written into the output directory and you can check the one with the job ID for additional information, otherwise the same information is written to the Snakemake output.
  • The logs directory in the output directory contains log files for all steps that can produce comments. They are named with the step and then the name of the rule, so you can check the log file of the step that sent the error. Depending on the tool that sent the error, this will be easy to understand or cryptic. Don't hesitate to raise an issue in this repository if you get stuck.
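
The log-hunting advice above can be scripted. A minimal sketch, assuming the `.stderr` file of a run sits in the current directory (the file name below is an example; use the one dadasnake wrote for your run):

```shell
# Minimal sketch: find which rule failed in a dadasnake/Snakemake stderr log.
# The file name is an example placeholder, not a fixed dadasnake path.
log="dadasnake_run.stderr"
if [ -f "$log" ]; then
  # -n prints line numbers, so you can jump straight to the failing rule
  grep -n "Error in rule" "$log"
else
  echo "no log file found at $log"
fi
```

The matching rule name then points you to the per-rule log in the output directory's logs folder.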

How to ...?

I don't have primers on my reads, what do I do? Set do_primers: false in the configuration file, but make sure that the orientation of the reads is the same.
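
In config terms, that setting is a one-liner (a sketch; check your config template for where the key sits):

```yaml
# skip primer removal; reads must all have the same orientation
do_primers: false
```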

I did paired-end sequencing, but my reads are too short to overlap You have two options:

  1. use only one read (usually the first) by setting paired: false in the config file and providing only the read you want to use in the samples table. This will run a single-end workflow. The makers of DADA2 would probably recommend this option in most cases.
  2. use both reads, set a truncation length for filtering to make sure the sequences have the same lengths and use DADA2's option to "merge" reads without overlap e.g.
filtering:
  trunc_length:
    fwd: 250
    rvs: 200
pair_merging:
  min_overlap: 0
  just_concatenate: true

I need to set further parameters for job submission You can change the cluster configs and add the parameter, for example directly as part of the call field.

I need to bind the jobs to the same node as the main job Yes, you can. If you use the submission-based wrapper, provide the flag for choosing a node as part of the SUBMIT_COMMAND variable in the VARIABLE_CONFIG file, set BIND_JOBS_TO_MAIN to true, and set NODENAME_VAR to the variable that holds the node's name in your submission system. All jobs will then be submitted to the same node as the one that runs the main Snakemake process, provided you include the flag for choosing a node as part of the call field in the cluster config. You can also specify the node yourself, using -b. Example: VARIABLE_CONFIG file:

...
SUBMIT_COMMAND	slurm --nodelist=
BIND_JOBS_TO_MAIN	true
NODENAME_VAR	SLURMD_NODENAME
SCHEDULER	slurm_simple
...

slurm_simple.config:

__default__:
  call: "sbatch --nodelist="
  mem_per_cpu: "--mem-per-cpu "
  partition: ""
  runtime: "-t"
  threads: "-c"
  stdout: "-o dadasnake.{rule}.{wildcards}.stdout"

call:

./dadasnake -c -b favorite_node -n TESTRUN config/config.test.yaml

I have a very large dataset Great! If you have the computing power to match it, dadasnake will help you. It has successfully processed >27,000 samples in a single run. If you run out of memory, set big_data to true in the config file and allow the use of multiple bigmem cores (we needed 360 GB RAM for the 27,000-sample dataset). Disable highly memory-intensive steps, such as treeing, chimera removal and plotting of rarefaction curves. If you didn't use the grouping by runs in your sample table, invent some runs of approximately 100 samples each; these will be treated separately for some of the heavier DADA2 steps (error estimation).
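
As a sketch, part of that advice translates into config settings like these. big_data is named in the answer above, and the treeing and rarefaction_curve keys appear in the settings table; the exact nesting should be verified against your config template, and the switch for chimera removal (not shown) lives elsewhere in the config:

```yaml
big_data: true
bigCores: 4                  # allow several high-memory nodes (see bigCores above)
postprocessing:
  treeing:
    do: false                # skip memory-hungry tree building
  rarefaction_curve: false   # skip rarefaction plots
```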

How do I restart a failed run? Depends on why it failed...

  • If you ran into a time limit or similar, you can just run dadasnake on the same config with the -u option and then again with the -c option. This will make Snakemake pick up where it left off.
  • For most other situations, it's probably best to fix what caused the error in your config file and delete the output directory to start from scratch. If you're going to be losing a lot of run time that way, and you're quite certain the problem is only in the last attempted step, you can try to restart. Ask us, if in doubt.

Can I restart from a certain step? If you're familiar with Snakemake, you can use it to force re-running the steps you need. Doing this more comfortably is not (yet) part of dadasnake.

dadasnake's People

Contributors

a-h-b, vmikk


dadasnake's Issues

NCBI BLAST nt database configuration with dadasnake: Example config.yaml files for use with BLAST

Hi Anna and coauthors, thanks in advance for any advice. I really like the pipeline and could use some help getting it to work with using BLAST and NCBI's nt database. I am having issues getting the correct config settings for using NCBI nt database and taxdb as reference databases for COI.

What are the appropriate config parameters to use NCBI's nt database and taxonomy (taxdb) as reference for a marker like COI?
Could you provide an example config.yaml file that uses Blast nt database as the reference db?

I am able to run the pipeline, but am getting errors at the blastn_cluster step. Specifically, the name of the blast database is 'nt', but because the NCBI nt database is so big there is not a single file named 'nt' but many files with nt.XXX. I am getting the error in logs/blastn_cluster.log. It appears the issues are with the makeblastdb step in blastn_cluster. The database is already made and in a local directory. I have the NCBI nt and taxdump database installed locally and following installation instructions from BASTA as linked in the dadasnake installation instructions.

Here are the errors I'm getting:

BLAST options error: File /home/jwhitney/dadasnake/DBs/blastdbs/nt does not exist.

log: logs/blastn_cluster.log (check log file(s) for error message)

conda-env: /home/jwhitney/programs/dadasnake/conda/66132e6a149ec730ec4c2d24861f8d4c

shell:

if [ -s clusteredTables/consensus.fasta ]; then
  if [ ! -f "/home/jwhitney/dadasnake/DBs/blastdbs/nt.nin" ]; then
    makeblastdb -dbtype nucl -in /home/jwhitney/dadasnake/DBs/blastdbs/nt -out /home/jwhitney/dadasnake/DBs/blastdbs/nt &> logs/blastn_cluster.log
  fi
  blastn -db /home/jwhitney/dadasnake/DBs/blastdbs/nt -query clusteredTables/consensus.fasta -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle" -out clusteredTables/blast_results.tsv -max_target_seqs 10 &>> logs/blastn_cluster.log
else
  touch clusteredTables/blast_results.tsv
fi

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)


And here are the relevant parts of the config.yaml:

# SETTINGS FOR TAXONOMIC ANNOTATION
# classification is only done, if do_taxonomy is true
taxonomy:
  dada:
    do: TRUE
  mothur:
    do: FALSE
    db_path: "/home/jwhitney/.basta/taxonomy"
    tax_db: ""
  # blast is only done, if do_taxonomy is true
  blast:
    do: true
    run_on:
      - ASV
      - cluster
    db_path: "/home/jwhitney/dadasnake/DBs/blastdbs"
    tax_db: "nt"
    e_val: 0.01
    tax2id: ""
    all: true
    max_targets: 10
    run_basta: true
    basta_db: "/home/jwhitney/.basta/taxonomy"
    basta_e_val: 0.00001
    basta_alen: 100
    basta_number: 0
    basta_min: 3
    basta_id: 80
    basta_besthit: true
    basta_perchits: 99


Thanks in advance for any advice.

Failed "optional test run" - Step 7 of Install

Thanks for creating this pipeline. Following installation, I cannot get dadasnake's test run to complete. I believe I have successfully installed everything and worked through the install checklist up to step 7 (optional test run). When running the test run workflow, I receive an attribute error (full output pasted below):

./dadasnake -l -n "TESTRUN" -r config/config.test.yaml

AttributeError: 'str' object has no attribute 'name', and the workflow output is not generated.

I have tried this outside of the conda environment and also inside the snakemake_env, with the same result. I also get the same error when running the dadasnake wrapper or when running snakemake manually. The test run outputs the folder 'test_output' which contains the tmp directory (empty) and the full.config.yaml file.

Here is the command and output:
$ ./dadasnake -l -n "TESTRUN" -r config/config.test.yaml

Running workflow in current session - don't use this setting except with small datasets.
laptop
Final resource settings:
maxCores: 1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Traceback (most recent call last):
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/__init__.py", line 699, in snakemake
    success = workflow.execute(
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/workflow.py", line 1052, in execute
    logger.run_info("\n".join(dag.stats()))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/dag.py", line 2187, in stats
    yield tabulate(rows, headers="keys")
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 2048, in tabulate
    list_of_lists, headers = _normalize_tabular_data(
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 1471, in _normalize_tabular_data
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 1471, in <lambda>
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 107, in _is_separating_line
    (len(row) >= 1 and row[0] == SEPARATING_LINE)
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/rules.py", line 1127, in __eq__
    return self.name == other.name and self.output == other.output
AttributeError: 'str' object has no attribute 'name'

Final resource settings:
maxCores: 1
Building DAG of jobs...
Creating report...
Missing metadata for file workflow.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file primers.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file preprocessing/2/sample_2.fwd.fastq.gz. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
WorkflowError:
File reporting/primerNumbers_perLibrary.tsv marked for report but does not exist.

$ tree ../test_output/
../test_output/
├── full.config.yaml
└── tmp
1 directory, 1 file


Tried running Test Run using Snakemake manually:
conda activate conda/snakemake_env/
snakemake -s Snakefile --configfile config/config.test.yaml --use-conda --cores 30

Final resource settings:
maxCores: 30
Building DAG of jobs...
Creating conda environment ../workflow/envs/fastqc.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/fastqc.yml created (location: .snakemake/conda/40d0a99ce3531274640417bbe23d90e9)
Creating conda environment ../workflow/envs/picrust2_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/picrust2_env.yml created (location: .snakemake/conda/4117cf158b05e8fb263fbf1aafa50706)
Creating conda environment ../workflow/envs/dadasnake_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/dadasnake_env.yml created (location: .snakemake/conda/bb7758cbd57f9c6ad9bd1f4b617f928c)
Creating conda environment ../workflow/envs/dada2_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/dada2_env.yml created (location: .snakemake/conda/15d5591b4d9c375abf290c8e8bc6eaaf)
Creating conda environment ../workflow/envs/vsearch_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/vsearch_env.yml created (location: .snakemake/conda/23d658bcbdffe93a87791ca15057d8cf)
Creating conda environment ../workflow/envs/add_R_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/add_R_env.yml created (location: .snakemake/conda/d0efdf5fa5224374229d84a1a76862f2)
Using shell: /usr/bin/bash
Provided cores: 30
Rules claiming more threads will be scaled down.
Traceback (most recent call last):
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/__init__.py", line 699, in snakemake
    success = workflow.execute(
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/workflow.py", line 1052, in execute
    logger.run_info("\n".join(dag.stats()))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/dag.py", line 2187, in stats
    yield tabulate(rows, headers="keys")
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 2048, in tabulate
    list_of_lists, headers = _normalize_tabular_data(
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 1471, in _normalize_tabular_data
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 1471, in <lambda>
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/tabulate/__init__.py", line 107, in _is_separating_line
    (len(row) >= 1 and row[0] == SEPARATING_LINE)
  File "/home/jwhitney/programs/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/snakemake/rules.py", line 1127, in __eq__
    return self.name == other.name and self.output == other.output
AttributeError: 'str' object has no attribute 'name'

$ tree ./test_output/
./test_output/
├── full.config.yaml
└── tmp

1 directory, 1 file


Thanks in advance for any suggestions.

Visualize DAG

How can I visualize the workflow of dadasnake? This would allow me to get an insight in the separate processes of the pipeline.

I do not succeed in creating a DAG file with the snakemake --dag option with the dadasnake wrapper.

Within-run pooling

Hello Anna!

This feature request is somehow related to #6.

Currently, there are three DADA2 modes in Dadasnake: run per sample, pool, pseudo-pooling.
Unfortunately, 120GB RAM is not enough to perform pooled inference on our data.
So we are using sample-wise removal of sequencing errors now (dada_dadaReads.single.R to be exact.)
However, it is possible to perform within-run-pooling.

For this purpose it is possible to use errors/models.{run}.RDS generated for each run and dada_dadaReads.pool.R with FASTQs for the same run as input.

To my surprise, it was much faster (but of course more RAM-demanding) than sample-wise inference (due to the issue mentioned in #6). So this mode will avoid spawning multiple tasks for the creation of merged/{run}/{sample}.RDS and will directly produce merged/dada_merged.{run}.RDS. And, in theory, this mode should have more power in resolving ASVs in comparison with sample-wise inference.

With kind regards,
Vladimir

Node selection Slurm with dadasnake wrapper

Hi I just wanted to know if it is possible with the ./dadasnake command to choose which worker node to run the pipeline on. Perhaps in VARIABLE_CONFIG?

Thanks in advance,
Fred.

WorkflowError: File reporting/primerNumbers_perLibrary.tsv marked for report but does not exist.

I have installed the program and have no issue in running the test run. However, when I tried to run the test dataset using config.16S.yaml, I was prompted with the error message as above. I have modified the config file in the section of raw and output directory, as well as the database for Silva. Complete error message is below:

Building DAG of jobs...
Creating report...
Missing metadata for file workflow.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file primers.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file preprocessing/2/sample_2.fwd.fastq.gz. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
WorkflowError:
File reporting/primerNumbers_perLibrary.tsv marked for report but does not exist.

Please help.

Initializing dada2_env.yml failed

Hello!
I came across your algorithm and wanted to check the possibilities it offers. The idea seems wonderful!

I'm a beginner at bioinformatics, so I'm not able to resolve such complex issues as I've encountered here. I'm also not an experienced Linux user, but I finally managed to follow your manual.
However, I'm stuck at "Initialize conda environments": it seems dada2_env.yml might be broken.
I've reinstalled conda and started from scratch several times, but I keep getting exactly the same issue at dada2_env.yml.

I attach the error code below.

(env) pawel@HP:~/dadasnake$ ./dadasnake -i config/config.init.yaml
Initializing conda environments.

Final resource settings:
maxCores: 1
Building DAG of jobs...
Creating conda environment ../workflow/envs/add_R_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/add_R_env.yml created (location: ../conda/082bc6d2876235cd5a86d65fec4a6b48)
Creating conda environment ../workflow/envs/blast_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/blast_env.yml created (location: ../conda/534f9f6712e1af6533276dc176024136)
Creating conda environment ../workflow/envs/dadasnake_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/dadasnake_env.yml created (location: ../conda/dea9e5a1f589b5df38f3d8647e300592)
Creating conda environment ../workflow/envs/fastqc.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/fastqc.yml created (location: ../conda/83107ed4a9dc38962bf38b6777f27edc)
Creating conda environment ../workflow/envs/tax4fun2_env.yml...
Downloading and installing remote packages.
Environment for ../workflow/envs/tax4fun2_env.yml created (location: ../conda/6b9d58a1b41bedc2390b748ad3db6406)
Creating conda environment ../workflow/envs/dada2_env.yml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/pawel/dadasnake/workflow/envs/dada2_env.yml:
  Package                                   Version  Build                Channel                    Size
───────────────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────────────────────────────

  + _libgcc_mutex                               0.1  conda_forge          conda-forge/linux-64     Cached
  + _openmp_mutex                               4.5  1_gnu                conda-forge/linux-64     Cached
  + _r-mutex                                  1.0.1  anacondar_1          conda-forge/noarch       Cached
  + binutils_impl_linux-64                     2.35  h18a2f87_9           conda-forge/linux-64     Cached
  + binutils_linux-64                          2.35  heab0d09_28          conda-forge/linux-64     Cached
  + bioconductor-biobase                     2.48.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-biocgenerics                0.34.0  r40_0                bioconda/noarch           707kB
  + bioconductor-biocparallel                1.22.0  r40h5f743cb_0        bioconda/linux-64           1MB
  + bioconductor-biostrings                  2.56.0  r40h037d062_0        bioconda/linux-64          15MB
  + bioconductor-dada2                       1.16.0  r40h5f743cb_0        bioconda/linux-64           3MB
  + bioconductor-decipher                    2.16.0  r40h037d062_0        bioconda/linux-64          12MB
  + bioconductor-delayedarray                0.14.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-genomeinfodb                1.24.0  r40_0                bioconda/noarch             4MB
  + bioconductor-genomeinfodbdata             1.2.3  r40_0                bioconda/noarch             7kB
  + bioconductor-genomicalignments           1.24.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-genomicranges               1.40.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-iranges                     2.22.1  r40h037d062_0        bioconda/linux-64           3MB
  + bioconductor-rhtslib                     1.20.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-rsamtools                    2.4.0  r40h5f743cb_0        bioconda/linux-64           4MB
  + bioconductor-s4vectors                   0.26.0  r40h037d062_0        bioconda/linux-64           2MB
  + bioconductor-shortread                   1.46.0  r40h5f743cb_0        bioconda/linux-64           5MB
  + bioconductor-summarizedexperiment        1.18.1  r40_0                bioconda/noarch             3MB
  + bioconductor-xvector                     0.28.0  r40h037d062_0        bioconda/linux-64         755kB
  + bioconductor-zlibbioc                    1.34.0  r40h037d062_0        bioconda/linux-64         116kB
  + bwidget                                  1.9.14  0                    conda-forge/linux-64     Cached
  + bzip2                                     1.0.8  h516909a_3           conda-forge/linux-64     Cached
  + c-ares                                   1.11.0  h470a237_1           bioconda/linux-64          89kB
  + ca-certificates                       2020.6.20  hecda079_0           conda-forge/linux-64     Cached
  + cairo                                    1.16.0  h3fc0475_1005        conda-forge/linux-64        2MB
  + certifi                               2020.6.20  py38h32f6830_0       conda-forge/linux-64      155kB
  + curl                                     7.71.1  he644dc0_6           conda-forge/linux-64      142kB
  + fontconfig                               2.13.1  h1056068_1002        conda-forge/linux-64      374kB
  + freetype                                 2.10.2  he06d7ca_0           conda-forge/linux-64     Cached
  + fribidi                                  1.0.10  h516909a_0           conda-forge/linux-64      115kB
  + gcc_impl_linux-64                         7.5.0  hdb87b24_16          conda-forge/linux-64       40MB
  + gcc_linux-64                              7.5.0  hf34d7eb_28          conda-forge/linux-64     Cached
  + gettext                                0.19.8.1  hc5be6a0_1002        conda-forge/linux-64     Cached
  + gfortran_impl_linux-64                    7.5.0  h1104b78_16          conda-forge/linux-64        9MB
  + gfortran_linux-64                         7.5.0  ha781d05_28          conda-forge/linux-64     Cached
  + glib                                     2.66.0  h0dae87d_0           conda-forge/linux-64        4MB
  + graphite2                                1.3.13  he1b5a44_1001        conda-forge/linux-64     Cached
  + gsl                                         2.6  h294904e_0           conda-forge/linux-64        3MB
  + gxx_impl_linux-64                         7.5.0  h1104b78_16          conda-forge/linux-64       10MB
  + gxx_linux-64                              7.5.0  ha781d05_28          conda-forge/linux-64     Cached
  + harfbuzz                                  2.7.2  hee91db6_0           conda-forge/linux-64        2MB
  + icu                                        67.1  he1b5a44_0           conda-forge/linux-64       13MB
  + jpeg                                         9d  h516909a_0           conda-forge/linux-64     Cached
  + kernel-headers_linux-64                  2.6.32  h77966d4_13          conda-forge/noarch       Cached
  + krb5                                     1.17.1  hfafb76e_3           conda-forge/linux-64        2MB
  + ld_impl_linux-64                           2.35  h769bd43_9           conda-forge/linux-64     Cached
  + libblas                                   3.8.0  17_openblas          conda-forge/linux-64     Cached
  + libcblas                                  3.8.0  17_openblas          conda-forge/linux-64       11kB
  + libcurl                                  7.71.1  hcdd3856_6           conda-forge/linux-64      320kB
  + libedit                            3.1.20191231  he28a2e2_2           conda-forge/linux-64     Cached
  + libev                                      4.33  h516909a_1           conda-forge/linux-64     Cached
  + libffi                                    3.2.1  he1b5a44_1007        conda-forge/linux-64     Cached
  + libgcc-devel_linux-64                     7.5.0  h42c25f5_16          conda-forge/linux-64        4MB
  + libgcc-ng                                 9.3.0  h24d8f2e_16          conda-forge/linux-64        8MB
  + libgfortran-ng                            7.5.0  hdf63c60_16          conda-forge/linux-64        1MB
  + libgomp                                   9.3.0  h24d8f2e_16          conda-forge/linux-64      387kB
  + libiconv                                   1.16  h516909a_0           conda-forge/linux-64     Cached
  + liblapack                                 3.8.0  17_openblas          conda-forge/linux-64     Cached
  + libnghttp2                               1.41.0  hab1572f_1           conda-forge/linux-64      726kB
  + libopenblas                              0.3.10  pthreads_hb3c22a3_4  conda-forge/linux-64     Cached
  + libpng                                   1.6.37  hed695b0_2           conda-forge/linux-64     Cached
  + libssh2                                   1.9.0  hab1572f_5           conda-forge/linux-64      230kB
  + libstdcxx-devel_linux-64                  7.5.0  h4084dd6_16          conda-forge/linux-64       10MB
  + libstdcxx-ng                              9.3.0  hdf63c60_16          conda-forge/linux-64        4MB
  + libtiff                                   4.1.0  hc7e4089_6           conda-forge/linux-64     Cached
  + libuuid                                  2.32.1  h14c3975_1000        conda-forge/linux-64     Cached
  + libwebp-base                              1.1.0  h516909a_3           conda-forge/linux-64     Cached
  + libxcb                                     1.13  h14c3975_1002        conda-forge/linux-64     Cached
  + libxml2                                  2.9.10  h68273f3_2           conda-forge/linux-64        1MB
  + lz4-c                                     1.9.2  he1b5a44_3           conda-forge/linux-64     Cached
  + make                                        4.3  h516909a_0           conda-forge/linux-64     Cached
  + ncurses                                     6.2  he1b5a44_1           conda-forge/linux-64     Cached
  + openssl                                  1.1.1h  h516909a_0           conda-forge/linux-64        2MB
  + pango                                    1.42.4  h7062337_4           conda-forge/linux-64      533kB
  + pcre                                       8.44  he1b5a44_0           conda-forge/linux-64     Cached
  + pcre2                                     10.35  h2f06484_0           conda-forge/linux-64      701kB
  + pip                                      20.2.3  py_0                 conda-forge/noarch          1MB
  + pixman                                   0.38.0  h516909a_1003        conda-forge/linux-64      608kB
  + pthread-stubs                               0.4  h14c3975_1001        conda-forge/linux-64     Cached
  + python                                    3.8.5  h1103e12_8_cpython   conda-forge/linux-64       23MB
  + python_abi                                  3.8  1_cp38               conda-forge/linux-64        4kB
  + r-assertthat                              0.2.1  r40h6115d3f_2        conda-forge/noarch       Cached
  + r-backports                              1.1.10  r40hcdcec82_0        conda-forge/linux-64       93kB
  + r-base                                    4.0.2  he766273_1           conda-forge/linux-64       25MB
  + r-bh                                   1.72.0_3  r40h6115d3f_1        conda-forge/noarch         11MB
  + r-biocmanager                           1.30.10  r40h6115d3f_1        conda-forge/noarch        106kB
  + r-bit                                     4.0.4  r40hcdcec82_0        conda-forge/linux-64      633kB
  + r-bit64                                   4.0.5  r40hcdcec82_0        conda-forge/linux-64      522kB
  + r-bitops                                  1.0_6  r40hcdcec82_1004     conda-forge/linux-64       40kB
  + r-blob                                    1.2.1  r40h6115d3f_1        conda-forge/noarch         64kB
  + r-callr                                   3.4.4  r40h6115d3f_0        conda-forge/noarch        389kB
  + r-cli                                     2.0.2  r40h6115d3f_1        conda-forge/noarch        405kB
  + r-cluster                                 2.1.0  r40h9bbef5b_3        conda-forge/linux-64      565kB
  + r-colorspace                              1.4_1  r40hcdcec82_2        conda-forge/linux-64        3MB
  + r-crayon                                  1.3.4  r40h6115d3f_1003     conda-forge/noarch        765kB
  + r-dbi                                     1.1.0  r40h6115d3f_1        conda-forge/noarch        685kB
  + r-desc                                    1.2.0  r40h6115d3f_1003     conda-forge/noarch        298kB
  + r-digest                                 0.6.25  r40h0357c0b_2        conda-forge/linux-64      203kB
  + r-ellipsis                                0.3.1  r40hcdcec82_0        conda-forge/linux-64     Cached
  + r-evaluate                                 0.14  r40h6115d3f_2        conda-forge/noarch         83kB
  + r-fansi                                   0.4.1  r40hcdcec82_1        conda-forge/linux-64      200kB
  + r-farver                                  2.0.3  r40h0357c0b_1        conda-forge/linux-64        1MB
  + r-formatr                                   1.7  r40h6115d3f_2        conda-forge/noarch        170kB
  + r-futile.logger                           1.4.3  r40h6115d3f_1003     conda-forge/noarch        110kB
  + r-futile.options                          1.0.1  r40h6115d3f_1002     conda-forge/noarch         27kB
  + r-ggplot2                                 3.3.2  r40h6115d3f_0        conda-forge/noarch          4MB
  + r-glue                                    1.4.2  r40hcdcec82_0        conda-forge/linux-64      145kB
  + r-gtable                                  0.3.0  r40h6115d3f_3        conda-forge/noarch        433kB
  + r-hwriter                                 1.3.2  r40h6115d3f_1003     conda-forge/noarch        177kB
  + r-isoband                                 0.2.2  r40h0357c0b_0        conda-forge/linux-64        3MB
  + r-jpeg                                  0.1_8.1  r40hcdcec82_1        conda-forge/linux-64       52kB
  + r-labeling                                  0.3  r40h6115d3f_1003     conda-forge/noarch         68kB
  + r-lambda.r                                1.2.4  r40h6115d3f_1        conda-forge/noarch        122kB
  + r-lattice                               0.20_41  r40hcdcec82_2        conda-forge/linux-64        1MB
  + r-latticeextra                           0.6_29  r40h6115d3f_1        conda-forge/noarch          2MB
  + r-lifecycle                               0.2.0  r40h6115d3f_1        conda-forge/noarch        114kB
  + r-magrittr                                  1.5  r40h6115d3f_1003     conda-forge/noarch        171kB
  + r-mass                                   7.3_53  r40hcdcec82_0        conda-forge/linux-64        1MB
  + r-matrix                                 1.2_18  r40h7fa42b6_3        conda-forge/linux-64        4MB
  + r-matrixstats                            0.56.0  r40hcdcec82_1        conda-forge/linux-64      925kB
  + r-memoise                                 1.1.0  r40h6115d3f_1004     conda-forge/noarch         43kB
  + r-mgcv                                   1.8_33  r40h7fa42b6_0        conda-forge/linux-64        3MB
  + r-munsell                                 0.5.0  r40h6115d3f_1003     conda-forge/noarch        252kB
  + r-nlme                                  3.1_149  r40h9bbef5b_0        conda-forge/linux-64        2MB
  + r-permute                                 0.9_5  r40h6115d3f_3        conda-forge/noarch        519kB
  + r-pillar                                  1.4.6  r40h6115d3f_0        conda-forge/noarch        199kB
  + r-pkgbuild                                1.1.0  r40h6115d3f_0        conda-forge/noarch        160kB
  + r-pkgconfig                               2.0.3  r40h6115d3f_1        conda-forge/noarch       Cached
  + r-pkgload                                 1.1.0  r40h0357c0b_0        conda-forge/linux-64      171kB
  + r-plogr                                   0.2.0  r40h6115d3f_1003     conda-forge/noarch         20kB
  + r-plyr                                    1.8.6  r40h0357c0b_1        conda-forge/linux-64      850kB
  + r-png                                     0.1_7  r40hcdcec82_1004     conda-forge/linux-64       59kB
  + r-praise                                  1.0.0  r40h6115d3f_1004     conda-forge/noarch         24kB
  + r-prettyunits                             1.1.1  r40h6115d3f_1        conda-forge/noarch       Cached
  + r-processx                                3.4.4  r40hcdcec82_0        conda-forge/linux-64      302kB
  + r-ps                                      1.3.4  r40hcdcec82_0        conda-forge/linux-64      239kB
  + r-r6                                      2.4.1  r40h6115d3f_1        conda-forge/noarch         65kB
  + r-rcolorbrewer                            1.1_2  r40h6115d3f_1003     conda-forge/noarch         60kB
  + r-rcpp                                  1.0.4.6  r40h0357c0b_1        conda-forge/linux-64        2MB
  + r-rcppparallel                            5.0.2  r40h0357c0b_0        conda-forge/linux-64        2MB
  + r-rcurl                                1.98_1.2  r40hcdcec82_1        conda-forge/linux-64      984kB
  + r-reshape2                                1.4.4  r40h0357c0b_1        conda-forge/linux-64      138kB
  + r-rlang                                   0.4.7  r40hcdcec82_0        conda-forge/linux-64        1MB
  + r-rprojroot                               1.3_2  r40h6115d3f_1003     conda-forge/noarch         96kB
  + r-rsqlite                                 2.2.0  r40h0357c0b_2        conda-forge/linux-64        1MB
  + r-rstudioapi                               0.11  r40h6115d3f_1        conda-forge/noarch        272kB
  + r-scales                                  1.1.1  r40h6115d3f_0        conda-forge/noarch        569kB
  + r-snow                                    0.4_3  r40h6115d3f_1002     conda-forge/noarch        126kB
  + r-stringi                                 1.5.3  r40h604b29c_0        conda-forge/linux-64      820kB
  + r-stringr                                 1.4.0  r40h6115d3f_2        conda-forge/noarch        214kB
  + r-testthat                                2.3.2  r40h0357c0b_1        conda-forge/linux-64        1MB
  + r-tibble                                  3.0.3  r40hcdcec82_0        conda-forge/linux-64      395kB
  + r-utf8                                    1.1.4  r40hcdcec82_1003     conda-forge/linux-64      164kB
  + r-vctrs                                   0.3.4  r40hcdcec82_0        conda-forge/linux-64        1MB
  + r-vegan                                   2.5_6  r40hbf399a0_2        conda-forge/linux-64        4MB
  + r-viridislite                             0.3.0  r40h6115d3f_1003     conda-forge/noarch         65kB
  + r-withr                                   2.2.0  r40h6115d3f_1        conda-forge/noarch        232kB
  + r-zeallot                                 0.1.0  r40h6115d3f_1002     conda-forge/noarch       Cached
  + readline                                    8.0  he28a2e2_2           conda-forge/linux-64     Cached
  + sed                                         4.8  hbfbb72e_0           conda-forge/linux-64      269kB
  + seqtk                                       1.3  hed695b0_2           bioconda/linux-64          40kB
  + setuptools                               49.6.0  py38h32f6830_1       conda-forge/linux-64      963kB
  + sqlite                                   3.33.0  h4cf870e_0           conda-forge/linux-64        1MB
  + sysroot_linux-64                           2.12  h77966d4_13          conda-forge/noarch       Cached
  + tk                                       8.6.10  hed695b0_0           conda-forge/linux-64     Cached
  + tktable                                    2.10  h555a92e_3           conda-forge/linux-64     Cached
  + wheel                                    0.35.1  pyh9f0ad1d_0         conda-forge/noarch         30kB
  + xorg-kbproto                              1.0.7  h14c3975_1002        conda-forge/linux-64     Cached
  + xorg-libice                              1.0.10  h516909a_0           conda-forge/linux-64     Cached
  + xorg-libsm                                1.2.3  h84519dc_1000        conda-forge/linux-64     Cached
  + xorg-libx11                              1.6.12  h516909a_0           conda-forge/linux-64     Cached
  + xorg-libxau                               1.0.9  h14c3975_0           conda-forge/linux-64     Cached
  + xorg-libxdmcp                             1.1.3  h516909a_0           conda-forge/linux-64     Cached
  + xorg-libxext                              1.3.4  h516909a_0           conda-forge/linux-64     Cached
  + xorg-libxrender                          0.9.10  h516909a_1002        conda-forge/linux-64     Cached
  + xorg-renderproto                         0.11.1  h14c3975_1002        conda-forge/linux-64     Cached
  + xorg-xextproto                            7.3.0  h14c3975_1002        conda-forge/linux-64     Cached
  + xorg-xproto                              7.0.31  h14c3975_1007        conda-forge/linux-64     Cached
  + xz                                        5.2.5  h516909a_1           conda-forge/linux-64     Cached
  + zlib                                     1.2.11  h516909a_1009        conda-forge/linux-64     Cached
  + zstd                                      1.4.5  h6597ccf_2           conda-forge/linux-64     Cached

  Summary:

  Install: 185 packages

  Total download: 299MB

───────────────────────────────────────────────────────────────────────────────────────────────────────────

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
ERROR conda.core.link:_execute(733): An error occurred while installing package 'bioconda::bioconductor-genomeinfodbdata-1.2.3-r40_0'.
Rolling back transaction: ...working... done
class: LinkError
message:
post-link script failed for package bioconda::bioconductor-genomeinfodbdata-1.2.3-r40_0
location of failed script: /home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/bin/.bioconductor-genomeinfodbdata-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout: /home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
/home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
/home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
ERROR: post-link.sh was unable to download any of the following URLs with the md5sum 720784da6bddbd4e18ab0bccef6b0a95:
https://bioconductor.org/packages/3.11/data/annotation/src/contrib/GenomeInfoDbData_1.2.3.tar.gz
https://bioarchive.galaxyproject.org/GenomeInfoDbData_1.2.3.tar.gz
https://depot.galaxyproject.org/software/bioconductor-genomeinfodbdata/bioconductor-genomeinfodbdata_1.2.3_src_all.tar.gz

stderr:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   416  100   416    0     0   1320      0 --:--:-- --:--:-- --:--:--  1324
md5sum: WARNING: 1 computed checksum did NOT match
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   153  100   153    0     0    193      0 --:--:-- --:--:-- --:--:--   192
md5sum: WARNING: 1 computed checksum did NOT match
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   153  100   153    0     0    194      0 --:--:-- --:--:-- --:--:--   194
md5sum: WARNING: 1 computed checksum did NOT match

return code: 1

kwargs:
{}

Traceback (most recent call last):
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1129, in __call__
    return func(*args, **kwargs)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda_env/cli/main.py", line 80, in do_call
    exit_code = getattr(module, func_name)(args, parser)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/notices/core.py", line 72, in wrapper
    return_value = func(*args, **kwargs)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda_env/cli/main_create.py", line 156, in execute
    result[installer_type] = installer.install(prefix, pkg_specs, args, env)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/mamba/mamba_env.py", line 173, in mamba_install
    handle_txn(conda_transaction, prefix, args, True)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/mamba/linking.py", line 44, in handle_txn
    unlink_link_transaction.execute()
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/core/link.py", line 284, in execute
    self._execute(tuple(concat(interleave(self.prefix_action_groups.values()))))
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/core/link.py", line 747, in _execute
    raise CondaMultiError(tuple(concatv(
conda.CondaMultiErrorclass: LinkError
message:
post-link script failed for package bioconda::bioconductor-genomeinfodbdata-1.2.3-r40_0
location of failed script: /home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/bin/.bioconductor-genomeinfodbdata-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout: /home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
/home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
/home/pawel/dadasnake/conda/e113b36d0031c615c2a05b811c538959/share/bioconductor-genomeinfodbdata-1.2.3-0/GenomeInfoDbData_1.2.3.tar.gz: FAILED
ERROR: post-link.sh was unable to download any of the following URLs with the md5sum 720784da6bddbd4e18ab0bccef6b0a95:
https://bioconductor.org/packages/3.11/data/annotation/src/contrib/GenomeInfoDbData_1.2.3.tar.gz
https://bioarchive.galaxyproject.org/GenomeInfoDbData_1.2.3.tar.gz
https://depot.galaxyproject.org/software/bioconductor-genomeinfodbdata/bioconductor-genomeinfodbdata_1.2.3_src_all.tar.gz

stderr:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   416  100   416    0     0   1320      0 --:--:-- --:--:-- --:--:--  1324
md5sum: WARNING: 1 computed checksum did NOT match
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   153  100   153    0     0    193      0 --:--:-- --:--:-- --:--:--   192
md5sum: WARNING: 1 computed checksum did NOT match
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   153  100   153    0     0    194      0 --:--:-- --:--:-- --:--:--   194
md5sum: WARNING: 1 computed checksum did NOT match

return code: 1

kwargs:
{}

: <exception str() failed>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pawel/dadasnake/conda/snakemake_env/bin/mamba", line 11, in <module>
    sys.exit(main())
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/mamba/mamba.py", line 923, in main
    return mamba_env.main()
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/mamba/mamba_env.py", line 196, in main
    return conda_env_main()
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda_env/cli/main.py", line 91, in main
    return conda_exception_handler(do_call, args, parser)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1429, in conda_exception_handler
    return_value = exception_handler(func, *args, **kwargs)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1132, in __call__
    return self.handle_exception(exc_val, exc_tb)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1161, in handle_exception
    return self.handle_application_exception(exc_val, exc_tb)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1175, in handle_application_exception
    self._print_conda_exception(exc_val, exc_tb)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1179, in _print_conda_exception
    print_conda_exception(exc_val, exc_tb)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/exceptions.py", line 1106, in print_conda_exception
    stderrlog.error("\n%r\n", exc_val)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/logging/__init__.py", line 1633, in handle
    if (not self.disabled) and self.filter(record):
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/logging/__init__.py", line 821, in filter
    result = f.filter(record)
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/gateways/logging.py", line 50, in filter
    record.msg = record.msg % new_args
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/__init__.py", line 107, in __repr__
    errs.append(e.__repr__())
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/__init__.py", line 64, in __repr__
    return '%s: %s' % (self.__class__.__name__, str(self))
  File "/home/pawel/dadasnake/conda/snakemake_env/lib/python3.10/site-packages/conda/__init__.py", line 68, in __str__
    return str(self.message % self._kwargs)
ValueError: unsupported format character 'T' (0x54) at index 1132


use snakemake wrappers and/or contribute to Snakemake workflows

@jafors just pointed me to this workflow and your paper, @a-h-b. Very impressive workflow (and very cool logo!;).

He also remembered that we recently had a lot of dada2 wrappers added to the snakemake wrappers (in case you haven't seen this project yet, it's simply reusable and reproducible building blocks for snakemake workflows, including environment definitions and sometimes dedicated input/output handling). @cpauvert contributed those wrappers and even included dada2 meta-wrappers that chain together many of the individual wrappers into something of a subworkflow. From the discussions on these, I remembered he was also planning to create a workflow at some point. So maybe this could be a good fit to team up and use these wrappers in dadasnake?

And if you feel like even more standardisation and sharing with the workflow, have a look at the snakemake workflows project. That's intended as a growing library of standardized, easy-to-use snakemake workflows, so dadasnake looks like a perfect fit.

Setup and cutadapt cyclic dependency issue

Hi - sorry for multiple questions but I really want to use this pipeline!

  1. I set up (installed) the pipeline as per the readme, but when trying to run it (on a slurm system using -c) I get the message:
    "The person who set up dadasnake disabled changing resource settings." and the pipeline then only claims one core. How can I re-enable changing the resource settings?

  2. When trying a dryrun I get an error from the cutadapt.smk rule combine_or_rename
    "CyclicGraphException in line 22 of /dadasnake/workflow/rules/cutadapt.smk:
    Cyclic dependency on rule combine_or_rename."

It is not clear why this error occurs - could it be related to the samples.tsv file?

Thanks again for your time!

submitting to cluster not working

Hi there,

I've installed dadasnake on our computing server (slurm) and am trying out the trial run using this command:

./dadasnake -c -n "TESTRUN" -r config/config.test.yaml

The error I get is that there's no "--time" variable set for sbatch:

Any thoughts on how to fix it or how to include the time parameter into the submission?

Your help is greatly appreciated!!

Thanks,

Varada

ITSxpress support

Hello,

I like dadasnake and I am using it for processing 16S rRNA amplicons. I have some data from ITS and saw there is support for ITSx. However, there is a newer alternative to ITSx that uses fastq (instead of fasta) files and, according to its description, should be better suited for ASVs. Is there any plan to support ITSxpress?

Thank you

VARIABLE_CONFIG?

SCHEDULER - insert the name of the scheduler you want to use (currently slurm or uge). This determines the cluster config given to snakemake, e.g. the cluster config file for slurm is config/slurm.config.yaml . Also check that the settings in this file are correct. If you have a different system, contact us ( https://github.com/a-h-b/dadasnake/issues ).

Re the above quote: you kindly mentioned to contact you in case I am not using slurm. I am on ubuntu (server name is calling), so what should I write down in the config file under SCHEDULER?


SUBMIT_COMMAND - insert the bash command you'll usually use to submit a job to your cluster to run on a single cpu for a few days. You only need this, if you want to have the snakemake top instance running in a submitted job. You alternatively have the option to run it on the frontend via tmux. Leave empty, if you want to use this frontend version and have tmux installed.

I won't use tmux - should I use the command (nice -n 19 .....) to invoke snakemake under SUBMIT_COMMAND, in case it's needed? Or is it fine to add 19 under MAX_THREADS otherwise?
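For illustration, a VARIABLE_CONFIG for a machine without a scheduler could look like the sketch below (tab-separated, as the readme requires; leaving SCHEDULER and SUBMIT_COMMAND empty for a purely local run is an assumption, and the MAX_THREADS value is just the one discussed above):

```
SNAKEMAKE_VIA_CONDA	true
SCHEDULER	
SUBMIT_COMMAND	
MAX_THREADS	19
```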

Error in rule picrust2

Hi @a-h-b,

I have tried ./dadasnake -c -n "TESTRUN" -r config/config.test.yaml and get the following error:

Error in rule picrust2:
    jobid: 70
    input: post/filtered.seqTab.biom, post/filtered.seqs.fasta
    output: post/picrust2_output
    log: logs/picrust2.log (check log file(s) for error message)
    conda-env: /usr/users/bheimbu/bin/Huizhen/dadasnake/conda/9ea92eb48e6067ce75da7c42ebece00a_
    shell:
        
        if grep --quiet OTU_ post/filtered.seqs.fasta; then
           echo "replacing OTU with ASV in seqs" > logs/picrust2.log
           TMPD=$(mktemp -d -t --tmpdir=/home/uni08/bheimbu/bin/Huizhen/dadasnake/test_output/tmp "XXXXXX")
           SEQS=$TMPD/seqs.fa
           sed 's#OTU_#ASV_#g' post/filtered.seqs.fasta > $SEQS
        else
           SEQS=post/filtered.seqs.fasta
        fi
        picrust2_pipeline.py -s $SEQS -i post/filtered.seqTab.biom -o post/picrust2_output -p 12          --stratified --per_sequence_contrib  --max_nsti 2              --min_reads 1          --min_samples 1  -t epa-ng          --in_traits EC,KO               -e 0.5 -m mp          --min_align 0 &>> logs/picrust2.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 16597700

My version of snakemake is 7.18.2, installed locally.

Cheers, Bastian

multiple primer sets

Hi all,

what about incorporating multiple primer sets from a multiplex approach (i.e. different genes/regions)?

Activating conda environment: /home/minion/git/dadasnake/conda/cb1403239da0c2b661e2126fd0da49e1
[Tue Dec 14 16:08:35 2021]
Error in rule primer_numbers:
jobid: 16
output: reporting/primerNumbers_perLibrary.tsv, reporting/primerNumbers_perSample.tsv
log: logs/countPrimerReads.log (check log file(s) for error message)
conda-env: /home/minion/git/dadasnake/conda/cb1403239da0c2b661e2126fd0da49e1

Originally posted by @thierryjanssens in #11 (comment)

snakemake: error: argument --configfile: expected one argument

Hi,
I downloaded dadasnake from github with the command git clone https://github.com/a-h-b/dadasnake.git,
then I installed snakemake in a conda env as described in step 4 of the installation guide.
After that, I tried step 6 of the guide to initialize conda environments using the command /home/anil/dadasnake/submit_scripts/dadasnake_withReport.sh -i /home/anil/dadasnake/config/config.init.yaml and got the following error:
usage: snakemake [-h] [--profile PROFILE] [--snakefile FILE] [--gui [PORT]]
[--cores [N]] [--local-cores N]
[--resources [NAME=INT [NAME=INT ...]]]
[--config [KEY=VALUE [KEY=VALUE ...]]] [--configfile FILE]
[--list] [--list-target-rules] [--directory DIR] [--dryrun]
[--printshellcmds] [--debug-dag] [--dag]
[--force-use-threads] [--rulegraph] [--d3dag] [--summary]
[--detailed-summary] [--archive FILE] [--touch]
[--keep-going] [--force] [--forceall]
[--forcerun [TARGET [TARGET ...]]]
[--prioritize TARGET [TARGET ...]]
[--until TARGET [TARGET ...]]
[--omit-from TARGET [TARGET ...]] [--allow-ambiguity]
[--cluster CMD | --cluster-sync CMD | --drmaa [ARGS]]
[--drmaa-log-dir DIR] [--cluster-config FILE]
[--immediate-submit] [--jobscript SCRIPT] [--jobname NAME]
[--cluster-status CLUSTER_STATUS] [--kubernetes [NAMESPACE]]
[--kubernetes-env ENVVAR [ENVVAR ...]]
[--container-image IMAGE] [--reason] [--stats FILE]
[--nocolor] [--quiet] [--nolock] [--unlock]
[--cleanup-metadata FILE [FILE ...]] [--rerun-incomplete]
[--ignore-incomplete] [--list-version-changes]
[--list-code-changes] [--list-input-changes]
[--list-params-changes] [--latency-wait SECONDS]
[--wait-for-files [FILE [FILE ...]]] [--benchmark-repeats N]
[--notemp] [--keep-remote] [--keep-target-files]
[--keep-shadow]
[--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]
[--max-jobs-per-second MAX_JOBS_PER_SECOND]
[--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]
[--restart-times RESTART_TIMES] [--attempt ATTEMPT]
[--timestamp] [--greediness GREEDINESS] [--no-hooks]
[--print-compilation]
[--overwrite-shellcmd OVERWRITE_SHELLCMD] [--verbose]
[--debug] [--runtime-profile FILE] [--mode {0,1,2}]
[--bash-completion] [--use-conda] [--conda-prefix DIR]
[--create-envs-only] [--list-conda-envs] [--use-singularity]
[--singularity-prefix DIR] [--singularity-args ARGS]
[--wrapper-prefix WRAPPER_PREFIX]
[--default-remote-provider {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp}]
[--default-remote-prefix DEFAULT_REMOTE_PREFIX]
[--no-shared-fs] [--version]
[target [target ...]]
snakemake: error: argument --configfile: expected one argument
Please help me with this issue - did I run a wrong command, or did I forget something during installation?
Thank you

Workflow error

Hi all,
I've encountered a recurring issue when using the dadasnake wrapper. The job stops after a few seconds and I end up with the following error in the slurm report (.out file) after running my personal config file: ./dadasnake -c -r config/config.16S.seb.yaml

bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Building DAG of jobs...
Creating report...
Missing metadata for file workflow.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file primers.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file dada.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file taxonomy.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file postprocessing.done. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file sequenceTables/all.seqTab.biom. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
Missing metadata for file preprocessing/1/NS10_Pminus_1_71610-A01_GAACTGAGCGCGCTCCACGA_L001_R1_001_AHC3HTDRXY.filt.fastq.gz.fwd.fastq. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.
[...]
WorkflowError:
File reporting/readNumbers.tsv marked for report but does not exist.
File "/home/sjaupit/personal/dadasnake/conda/snakemake_env/lib/python3.7/site-packages/snakemake/report/__init__.py", line 629, in auto_report

Dadasnake is set up to submit the process to a cluster.
No particular error seems to show up when doing a dry run.
The config file parameters and the sample_table don't seem to be the cause of the error.
I've tried to remove, reinstall and rerun dadasnake in different directories and still get the same error.
All output files and directory were deleted after each run.

I would greatly appreciate some insights or ideas on what could be the source of the issue here.
Thanks in advance !

pygments.util.ClassNotFound: no lexer for alias None found - bug or documentation?

Hi,
I'm trying to do the test run and have successfully generated a workflow.done file, but also get the errors/warnings below. The docs mention "Don't worry if you see a few warnings from mv, such as mv: cannot stat ‘slurm*’: No such file or directory", but these errors seem more extensive than just that? (Below I've copied the output from a re-run of snakemake, hence the "Nothing to be done."; the error/warning outputs were identical in the first run.) Also, the report.html file is empty.

Cheers.

OS: Ubuntu 16.04.6
dadasnake installed via github clone.

(base) olin@gru:/scratch/olin/dadasnake$ ./dadasnake -l -n "TESTRUN" -r config/config.test.yaml
Running workflow in current session - don't use this setting except with small datasets.
Building DAG of jobs...
Nothing to be done.
Complete log: /scratch/olin/dadasnake/.snakemake/log/2020-06-17T104604.069252.snakemake.log
mv: cannot stat 'slurm*': No such file or directory
mv: cannot stat 'snakejob.*': No such file or directory
mv: cannot stat '*log': No such file or directory
mv: cannot stat '*logfile': No such file or directory
Building DAG of jobs...
Creating report...
Adding readNumbers.tsv (0.00037 MB).
Adding primerNumbers_perLibrary.tsv (0.00044 MB).
Adding primerNumbers_perSample.tsv (0.00039 MB).
Adding finalNumbers_perSample.tsv (0.00053 MB).
Adding QC_1.1.fwd.pdf (0.1 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_1.1.fwd.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_1.1.rvs.pdf (0.12 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_1.1.rvs.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_1.2.fwd.pdf (0.029 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_1.2.fwd.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_1.2.rvs.pdf (0.033 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_1.2.rvs.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_filtered.1.fwd.pdf (0.02 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_filtered.1.fwd.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_filtered.1.rvs.pdf (0.017 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_filtered.1.rvs.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_filtered.2.fwd.pdf (0.012 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_filtered.2.fwd.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding QC_filtered.2.rvs.pdf (0.011 MB).
Failed to convert image to png with imagemagick convert: b"convert: not authorized `stats/QC_filtered.2.rvs.pdf' @ error/constitute.c/ReadImage/412.\nconvert: no images defined `png:-' @ error/convert.c/ConvertImageCommand/3210.\n"
Adding mergedNumbers_perLibrary.tsv (0.00055 MB).
Adding mergedNumbers_perSample.tsv (0.00047 MB).
Adding filteredNumbers_perLibrary.tsv (0.00052 MB).
Adding filteredNumbers_perSample.tsv (0.00045 MB).
Traceback (most recent call last):
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/report/__init__.py", line 224, in code
    lexer = get_lexer_by_name(language)
  File "/home/olin/miniconda3/lib/python3.7/site-packages/pygments/lexers/__init__.py", line 107, in get_lexer_by_name
    raise ClassNotFound('no lexer for alias %r found' % _alias)
pygments.util.ClassNotFound: no lexer for alias None found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/__init__.py", line 547, in snakemake
    export_cwl=export_cwl)
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/workflow.py", line 514, in execute
    auto_report(dag, report)
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/report/__init__.py", line 577, in auto_report
    pygments_css=HtmlFormatter(style="trac").get_style_defs('.source')))
  File "/home/olin/miniconda3/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/home/olin/miniconda3/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/home/olin/miniconda3/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/report/report.html", line 474, in top-level template code
    {{ rule.code()|safe }}
  File "/home/olin/miniconda3/lib/python3.7/site-packages/snakemake/report/__init__.py", line 226, in code
    except pygments.utils.ClassNotFound:
NameError: name 'pygments' is not defined

complaining about GB locale

I finally got the snake to work, but a number of jobs are failing, reporting issues with locale variables.

One example is below.
My locales are defined as US:

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Can I do something to fix this or ignore it?

Activating conda environment: /opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49
/usr/bin/bash: line 1: warning: setlocale: LC_ALL: cannot change locale (en_GB.utf8): No such file or directory
[Tue Feb 22 14:10:58 2022]
Error in rule multiqc:
    jobid: 87
    output: stats/multiqc_filtered_report_data, stats/multiqc_filtered_report.html
    log: logs/multiqc_filtered.log (check log file(s) for error message)
    conda-env: /opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49
    shell:
        
        export LC_ALL=en_GB.utf8
        export LANG=en_GB.utf8
        multiqc -n stats/multiqc_filtered_report.html stats/fastqc_filtered >> logs/multiqc_filtered.log 2>&1
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.

Digging in the log for multiqc, I read this:

cat logs/multiqc_filtered.log
Traceback (most recent call last):
  File "/opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49/bin/multiqc", line 6, in <module>
    from multiqc.__main__ import multiqc
  File "/opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49/lib/python3.6/site-packages/multiqc/__main__.py", line 53, in <module>
    multiqc.run_cli(prog_name="multiqc")
  File "/opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49/lib/python3.6/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49/lib/python3.6/site-packages/click/core.py", line 1043, in main
    _verify_python_env()
  File "/opt/biotools/dadasnake/conda/b7228a92baaacc5aaa092b362d244b49/lib/python3.6/site-packages/click/_unicodefun.py", line 100, in _verify_python_env
    raise RuntimeError("\n\n".join(extra))
RuntimeError: Click will abort further execution because Python was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/unicode-support/ for mitigation steps.

This system supports the C.UTF-8 locale which is recommended. You might be able to resolve your issue by exporting the following environment variables:

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

Click discovered that you exported a UTF-8 locale but the locale system could not pick up from it because it does not exist. The exported locale is 'en_GB.utf8' but it is not supported.

error: "Could not create conda environment from ... "

Hey there,
(Mac OS) I'm having an error after step 6 of the dadasnake tutorial, "Initialize conda environments: This run sets up the conda environments that will be usable by all users:"

when running:
./dadasnake -i config/config.init.yaml

I get the following error message:
Initializing conda environments.
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

Then it runs and crashes with:
CreateCondaEnvironmentException:
Could not create conda environment from /Users/.....

I have tried reinstalling everything and removing the environments, and also changing the shell (conda init bash or conda init zsh), but the error persists. Moreover, a new set of "Encountered problems while solving:" messages appeared each time.

Thanks!


Feature request / general question re: parameter sweeps for DADA2

Hi Anna and co-authors,

Thanks for the wonderful work on this pipeline - it's really a great resource for the whole community.

Q/request: I am interested in using dadasnake for doing parameter sweeps to test the effect of denoising parameters on the quality of DADA2's ASVs.

Background: We have noticed that DADA2 can create spurious ASVs, based on results from sequencing mock communities. In our experience this is rare (and only affects specific sequencing runs), but it is potentially problematic for us, as these spurious ASVs can comprise up to 7-8% of the mock community reads and therefore presumably also cause similar problems in the environmental samples. Most of these artifacts are 1-mismatches to the true mock sequence, so we believe them to be an artifact of DADA2's processing, not contamination or bleedthrough. So, we want to try a variety of different parameters recommended by Ben Callahan et al. in a combinatorial manner, to see if we can eliminate these artifacts. I have experience doing similar parameter sweeps with snakemake before, and your pipeline looks to be an excellent place to at least begin this kind of analysis. Parameters of interest would be those contained in config/config.default.yaml:

dada:
  band_size: 16
  homopolymer_gap_penalty: NULL
  pool: false
  omega_A: 1e-40
  priors: ""
  omega_P: 1e-4
  omega_C: 1e-40
  gapless: true
  selfConsist: false
  no_error_assumptions: false
  kdist_cutoff: 0.42
  match: 4
  mismatch: -5
  gap_penalty: -8
  errorEstimationFunction: loessErrfun
  use_quals: true

For the above parameters, I would try to figure out what a good range of values would be and then run DADA2 for each combination of relevant parameters to try and empirically "tune" DADA2 to see if I can make the artifactual ASVs disappear.

My understanding of dadasnake's mode of operation: each type of operation (e.g. dada2-paired) reads its input from a YAML config, which is then passed to the R scripts via snakemake. As such, it does not seem completely straightforward to run parameter sweeps using snakemake's built-in ability to expand parameters.

Potential solutions: Without altering your pipeline, it seems one solution would be to define a large number of config files corresponding to the desired parameter sweeps. But this seems a bit unwieldy, so I was thinking the best way would be to use your config files and R scripts to run DADA2, but write my own Snakefile defining the parameter sweeps I want. If so, I will probably fork your repo and try to define an additional rule for this kind of scenario.
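As a sketch of what the many-config-files route could look like when scripted (the dada: excerpt stands in for config/config.default.yaml, and the value grids and file names are placeholders, not recommendations):

```python
import itertools
import re

# Minimal excerpt standing in for config/config.default.yaml; in practice
# the full file would be read from disk. Parameter names are taken from the
# dada: section quoted above; the value grids below are placeholders.
base = """dada:
  band_size: 16
  omega_A: 1e-40
  omega_C: 1e-40
"""

sweep = {
    "omega_A": ["1e-20", "1e-40", "1e-60"],
    "band_size": ["16", "32"],
}

configs = {}
for combo in itertools.product(*sweep.values()):
    cfg = base
    for key, val in zip(sweep.keys(), combo):
        # patch the "key: value" line in place, keeping its indentation
        cfg = re.sub(rf"(?m)^(\s*{key}:).*$", rf"\g<1> {val}", cfg)
    tag = "_".join(f"{k}-{v}" for k, v in zip(sweep.keys(), combo))
    configs[f"config.sweep_{tag}.yaml"] = cfg

# each config would then be written out and launched separately, e.g.
#   ./dadasnake -c -n "SWEEP" -r config/config.sweep_<tag>.yaml
```

This keeps the existing R scripts untouched and only multiplies the configs.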

Do you have any advice on this? Am I missing an easy way to implement my desired behaviour with the pipeline as currently written?

Thanks a lot for your advice and again for making these really great scripts available for the benefit of the whole community.

Cheers,
Jesse

Multiple primers per sample

Hi, I have an ITS dataset with multiple forward and reverse primers per sample. Is it possible to add multiple (slightly different) primers to the config file and if so how should that be formatted?

Question on representative sequences

Hi ahb,
many thanks for including the post-clustering step. I have two questions related to the fasta-files (representative sequences) after post-processing:

  1. Is the filtered.seqs.fasta file based on the most abundant ASV within a cluster? And are these sequences represented as Row.names in the final table, as we already know from ASV tables without clustering? Or is it maybe based on centroid sequences?
  2. The filtered.consensus.fasta file is based on the definition from vsearch (= taking the majority symbol (nucleotide or gap) from each column of the alignment), right?

Thanks, ju(is)mo

Nanopore data

Hi all,

I am running an analysis on Nanopore amplicon data (without primer processing, since it is a multiplex study, which I want to trim and treat further downstream), but I get the following error in the job dad_poolTabs:

[1] "Removing chimeras"
Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
attempt to set too many names (2) on GroupedIRanges object of length 0
Calls: names<- -> names<- -> names<- -> names<- ->
Execution halted

I have no clue where to start debugging. Any suggestions?

$ nano ./config/config.nanoporetest.yaml

raw_directory: /home/test
sample_table: /home/test/sample_table.tsv
outputdir: /home/test/output
do_dada: true
do_primers: no
do_taxonomy: no
paired: false
primer_cutting:
  overlap: 12
  perc_mismatch: 0.25
  indels: ''
  count: 1
  both_primers_in_read: true
primers:
  fwd:
    sequence: AGRGTTTGATCMTGGCTCAG
    name: 8F
  rvs:
    sequence: GGGCGGWGTGTACAAG
    name: 1387R
sequencing_direction: fwd_1
filtering:
  trunc_length:
    fwd: 0
  trunc_qual:
    fwd: 0
  max_EE:
    fwd: Inf
  minLen:
    fwd: 500
  maxLen:
    fwd: Inf
  minQ:
    fwd: 0
dada:
  pool: true
  band_size: 32
  homopolymer_gap_penalty: -1
  use_quals: true
  omega_C: 1
  omega_A: 1e-30
  gapless: false
  no_error_assumptions: false
  errorEstimationFunction: noqualErrfun
  selfConsist: false
chimeras:
  remove: true
  method: pooled
  minFoldParentOverAbundance: 3.5
final_table_filtering:
  do: false
postprocessing:
  funguild:
    do: false
  rarefaction_curve: true
  treeing:
    do: false
ITSx:
  run: false
taxonomy:
  decipher:
    do: false
  mothur:
    cutoff: 60
    db_path: "../DBs/amplicon"
    tax_db: "SILVA_138_SSURef_NR99_prok"
    do: true
    post_ITSx: false
tmp_dir: $USER/tmp
email: ''

mothur database issue

Hi Anna,

I tried to download files from the mothur and decipher sites as suggested in the readme but when I link them in the config, mothur does not seem to work.

For mothur, I downloaded the "Full-length sequences and taxonomy references" from their wiki and untarred them to /data/biodata/mothur_taxonomy:

  • silva.nr_v138_1.align
  • silva.nr_v138_1.fasta
  • silva.seed_v138_1.align
  • silva.seed_v138_1.tax

These 4 files do not match the command reported in the error log.

For decipher, I downloaded and unzipped to /data/biodata/decipher_taxonomy:

  • SILVA_SSU_r138_2019.RData

Then I edited my yaml as:

taxonomy:
  mothur:
    do: true
    db_path: "/data/biodata/mothur_taxonomy"
    tax_db: "silva.nr_v138_1"
  decipher:
    do: false
    db_path: "/data/biodata/decipher_taxonomy"
    tax_db: "SILVA_SSU_r138_2019.RData"
    db_short_names: "SILVA_SSU_r138"

If this is not correct, would you have a primer on how to install the SSU references for mothur?
Or at least a listing of your database folder, to see which files are expected there?

Thanks in advance

mothur error log

Script Mode


mothur > set.dir(tempdefault=/data/biodata/mothur_taxonomy)
Mothur's directories:
tempDefault=/data/biodata/mothur_taxonomy/

mothur > 
            classify.seqs(fasta=sequenceTables/all.seqs.for_SILVA_138_SSURef_NR99_cut.fasta, template=silva.nr_v138_1.fasta, taxonomy=silva.nr_v138_1.taxonomy, cutoff=60, method=wang, processors=1)

Using 1 processors.
Unable to open silva.nr_v138_1.fasta. Trying default /data/biodata/mothur_taxonomy/silva.nr_v138_1.fasta.
Unable to open /data/biodata/mothur_taxonomy/silva.nr_v138_1.fasta. Trying mothur's executable location /opt/biotools/dadasnake/conda/ec4abc68013d874fa157380f1e65649f/bin/silva.nr_v138_1.fasta.
Unable to open /opt/biotools/dadasnake/conda/ec4abc68013d874fa157380f1e65649f/bin/silva.nr_v138_1.fasta.
Unable to open silva.nr_v138_1.fasta
Unable to open silva.nr_v138_1.taxonomy. Trying default /data/biodata/mothur_taxonomy/silva.nr_v138_1.taxonomy.
Unable to open /data/biodata/mothur_taxonomy/silva.nr_v138_1.taxonomy. Trying mothur's executable location /opt/biotools/dadasnake/conda/ec4abc68013d874fa157380f1e65649f/bin/silva.nr_v138_1.taxonomy.
Unable to open /opt/biotools/dadasnake/conda/ec4abc68013d874fa157380f1e65649f/bin/silva.nr_v138_1.taxonomy.
Unable to open silva.nr_v138_1.taxonomy
[ERROR]: did not complete classify.seqs.

mothur > quit()

Rule `dada_dadaSingle` runs only single-threaded

Hello!

We have 10 sequencing runs, each with ~100 samples.
We run dadasnake on a desktop (in -l mode).
dadasnake works well in the first stages (filtering and error estimation); however, when it reaches the dada_dadaSingle rule, it switches to processing the samples sequentially instead of in parallel.
If we terminate and resume the workflow, Snakemake starts 8 processes at first, but after they finish it proceeds single-threaded (one sample at a time), even though there should be enough resources to use all 8 cores.

The command we are using:

dadasnake -t 8 -l  config.pacbioCCS_vm.yaml

with

big_data: false
dada:
  pool: false

so the main sub-workflow is dada.single.smk.

I've tried removing the resources section from the rules and decreasing NORMAL_MEM_EACH to 3G in VARIABLE_CONFIG, but neither helps.
Could you please tell us where the problem could be?
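
For context, a generic illustration of Snakemake's scheduling (not taken from dada.single.smk, which I have not checked): if a rule declares a `threads` value equal to the total cores given on the command line, each job of that rule reserves all cores, so jobs of that rule can only run one at a time.

```
# Generic Snakemake sketch, not dadasnake's actual rule:
rule dada_example:
    threads: 8    # one job reserves all 8 cores -> jobs run sequentially
```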

With kind regards,
Vladimir

install instructions using cluster snakemake module

Do you have any particular instructions for installing and running dadasnake with an already installed snakemake module (6.10.0) on our cluster? I get the following while attempting the test run, after completing steps 1-3.

user@cpu018:/dadasnake$ ./dadasnake -l -n "TESTRUN" -r config/config.test.yaml
Running workflow in current session - don't use this setting except with small datasets (e.g. the test data set is okay).
Removing uri version main
Loading uri version main
Error: mamba package manager is not available. The mamba package manager (https://github.com/mamba-org/mamba) is an extremely fast and robust conda replacement. It is the recommended way of using Snakemake's conda integration. It can be installed with `conda install -n base -c conda-forge mamba.If you still prefer to use conda, you can enforce that by setting `--conda-frontend conda`.

Slurm Latency Issues - how to add --latency-wait into config

Hi, final issue/question (hopefully) before the pipeline is running nicely on our server.

I am running on a slurm cluster and sometimes run into problems where Snakemake suggests adding a --latency-wait time.
I am unsure where or how to add this (presumably in the slurm_simple.config.yaml file); however, adding the line

  latency_wait: "--latency-wait=320"

seems to have no effect. If you could suggest where this slurm parameter should be added, that would be great.
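
For what it's worth, in plain Snakemake the --latency-wait option can usually be set in a profile's config.yaml with a hyphenated key and a bare integer value. Whether dadasnake parses slurm_simple.config.yaml as such a profile is an assumption on my part:

```yaml
# Generic Snakemake profile config.yaml (not verified against dadasnake):
latency-wait: 320
```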

I would also like some tips on the VARIABLE_CONFIG file, in particular the SUBMIT_COMMAND setting, which is unclear to me since the submission settings are already declared in the config files. When I put sbatch as the argument here, it seems to try to submit the job twice.

Thanks in advance for having a look!

Runtime of 120 hours in some rules

Hi, just thought I'd drop a note about something that seemed a bit odd to me: I was trying to run dadasnake in batch mode on our HPC cluster with a single-end setup and kept getting the error "sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)". Looking into this, I found that workflow/rules/dada.single.smk includes runtime="120:00:00" for several rules. After changing the requested time to 12:00:00, everything worked fine. At a quick check, a few other files/rules also request this 120-hour runtime. Is this on purpose?
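
In case it helps others hitting the same limit, the edit described above can be applied in bulk (GNU/BSD sed; the -i.bak suffix keeps backups of the originals):

```shell
# Lower the 120 h runtime requests across all rule files, keeping .bak backups.
sed -i.bak 's/runtime="120:00:00"/runtime="12:00:00"/g' workflow/rules/*.smk
```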

Test run fail

Hi,

I am using a Mac and followed the instructions until Step 7 (the test run).

Here are the first several lines of the error output:

"Activating conda environment: /path/to/dadasnake/conda/707773e24eac688b76abbf0176048d28
Not a conda environment: /path/to/dadasnake/conda/707773e24eac688b76abbf0176048d28
mktemp: mkdtemp failed on /var/folders/zg/9bd_wdgn3qg8kl1z973r48mjxx5dnb/T/--tmpdir=/path/to/dadasnake/testoutput/tmp.WAYe0zsC: No such file or directory"

Any clues why it cannot successfully activate the conda environment?

What is also weird is that, when I run the initialization step with "./dadasnake -i config/config.init.yaml", it takes 20 seconds or less, definitely not several minutes. Not sure what is going wrong here.
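
One hedged guess about the mktemp error above: GNU mktemp accepts a --tmpdir=DIR option, while the BSD mktemp shipped with macOS treats "--tmpdir=..." as a template string, which would explain the "mkdtemp failed on .../--tmpdir=..." message. A form that behaves the same on both would pass an explicit template instead:

```shell
# Portable across GNU and BSD mktemp: give an explicit template path
# instead of --tmpdir= (path taken from the error message above).
tmp=$(mktemp -d "/path/to/dadasnake/testoutput/tmp.XXXXXX")
```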
