bensutherland / edna_metabarcoding
Pipeline to analyze eDNA metabarcoding samples (PE and SE, demultiplexed, multiplexed)
We do not do the internal removal for the denoising with single-end data due to the large number of unique amplicons. Because of this, we do not have an *_HS.fa output file, and therefore the standard obitab script will not work on SE data.
Can we rename the 'unidentified' files so that a different script is not required after '02_ngsfilter_SE_exp_unident.sh'? That is, can we somehow have only a single obiannotate.sh script? Currently there are two: obiannotate_unident.sh and obiannotate_ident.sh.
Hello!
I am so happy to have found this page - I have been trying to separate demultiplexed sequences using ngsfilter for so long, and this code has been so useful!
I have unfortunately hit a problem when trying to grep sequences using the tags attached via ngsfilter. I finally got my code to work (and grep!), but it grepped 0 entries.
Here is my code so far -- I've modified it to work within my environment.
# activate virtual env
source obi3-env/bin/activate
cd {folder where all the files are}
# import a few sequences to test code functionality
obi import raw_sequences/E-AFR090512_S75_L001_R1_001.fastq.gz test_072022/E-AFR090512_S75_L001_R1
obi import raw_sequences/E-AFR090512_S75_L001_R2_001.fastq.gz test_072022/E-AFR090512_S75_L001_R2
obi import raw_sequences/E-ALM180712_S34_L001_R1_001.fastq.gz test_072022/E-ALM180712_S34_L001_R1
obi import raw_sequences/E-ALM180712_S34_L001_R2_001.fastq.gz test_072022/E-ALM180712_S34_L001_R2
obi import raw_sequences/E-ANE040812_S36_L001_R1_001.fastq.gz test_072022/E-ANE040812_S36_L001_R1
obi import raw_sequences/E-ANE040812_S36_L001_R2_001.fastq.gz test_072022/E-ANE040812_S36_L001_R2
# for some reason, obi import does NOT want to work within a for-loop, so I just do it manually
# import the ngsfilter file w/ info on sequences
obi import --ngsfilter baboon_diet_ngsfilter.txt test_072022/ngsfile
# check if import worked
obi ls test_072022
# create file with all the sample names to use for for-loops throughout pipeline
ls *_R1_001.fastq.gz | cut -c -23 > ../samples_R1
ls *_R2_001.fastq.gz | cut -c -23 > ../samples_R2
# add primer tags using ngsfilter
for sample in $(cat samples_R1)
do
    echo "On sample: $sample"
    obi ngsfilter -t ngsfile -u test_072022/unidentified_${sample} test_072022/${sample} test_072022/identified_${sample}
done
# separate samples using ngsfilter and grep
# first test it using one sample before putting it in a for-loop
obi grep -E -A3 'sample=trnl' test_072022/identified_E-AFR090512_S75_L001_R1 | obi grep -vE '^--$' - > trnl_E-AFR090512_S75_L001_R1
# error codes: "error: unrecognized arguments: -s", "error: unrecognized arguments: -E", "error: argument -v/--invert-selection: ignored explicit argument 'E'"
obi grep -S 'trnl' test_072022/identified_E-AFR090512_S75_L001_R1 | obi grep '^--$' - > trnl_E-AFR090512_S75_L001_R1
# error codes: "error: the following arguments are required: OUTPUT", "ValueError: unknown url type: '^--$'", "FileNotFoundError: [Errno 2] No such file or directory: '^--$'"
obi grep -S 'trnl' test_072022/identified_E-AFR090512_S75_L001_R1 trnl_E-AFR090512_S75_L001_R1
# results: "2022-06-30 19:36:26,770 [grep : INFO ] Grepped 0 entries"
So I've struggled to figure out how to grep sequences individually within my file and filter them into a new file. Is the original script formatted for obitools, and not obitools3? The grep is different (i.e., including 'obi').
Any help would be massively appreciated, thank you!!
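One possible explanation, offered as a sketch and not a confirmed fix: the scripts in this repository appear to target the first-generation OBITools, where records are plain text and Unix `grep -E -A3 'sample=...'` can match the headers. In OBITools3, `obi grep` works on views inside the DMS, and as far as I know `-S` matches a regular expression against the nucleotide sequence itself, which would explain "Grepped 0 entries". Selecting on an annotation is done with `-a KEY:REGEX`. The helper below assumes ngsfilter stored the tag under the key `sample` (it may be `experiment`, depending on the ngsfilter file). `grep_by_attribute` and the `OBI` override are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: select records from an OBITools3 view by annotation value.
# Assumptions: the installed OBITools3 supports
#   obi grep -a KEY:REGEX INPUT_VIEW OUTPUT_VIEW
# and ngsfilter wrote the tag under the key "sample".

# grep_by_attribute DMS VIEW VALUE [KEY]
# Hypothetical helper. Set OBI=echo to print the arguments instead of
# running OBITools3 (useful as a dry run).
grep_by_attribute() {
    local dms="$1" view="$2" value="$3" key="${4:-sample}"
    "${OBI:-obi}" grep -a "${key}:${value}" \
        "${dms}/${view}" "${dms}/${value}_${view}"
}

# Example call (not executed here):
#   grep_by_attribute test_072022 identified_E-AFR090512_S75_L001_R1 trnl
```

The output is another view in the same DMS, not a flat file. If a FASTA or FASTQ file is needed afterwards, `obi export` should be able to write the view out.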
The scripts for the single-end data analysis currently use the folder 04b_annotated_samples, but this should not be necessary and complicates the convergence of the PE and SE pipelines.
Can 01a_read_merging.sh and 01a_read_merging_no_prime.sh be merged into a single script? (These are for multiplexed and de-multiplexed data, respectively.)
cp -l 02_raw_data/your_file_R1_001.fastq 03_merged/your_file_ali.fq
The input file is 'sample_S1_L001_R1_001.fastq' and the expected file in the merged folder is 03_merged/sample_S1_L001_ali.fq
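The naming rule above can be sketched as a small helper, so the link does not have to be typed per file. This is only an illustration under the assumption that every single-end input ends in `_R1_001.fastq`; the function names `merged_name` and `link_all_as_merged` are hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: map a raw single-end file name to the name the merged
# folder expects, e.g.
#   02_raw_data/sample_S1_L001_R1_001.fastq -> 03_merged/sample_S1_L001_ali.fq

# merged_name RAW_FASTQ
# Prints the expected path in 03_merged for one raw file.
merged_name() {
    local base
    base="$(basename "$1")"
    base="${base%_R1_001.fastq}"      # strip the read/lane suffix
    printf '03_merged/%s_ali.fq\n' "$base"
}

# link_all_as_merged
# Hard-links every matching raw file into 03_merged (same cp -l as above).
link_all_as_merged() {
    local f
    for f in 02_raw_data/*_R1_001.fastq; do
        [ -e "$f" ] || continue       # skip if the glob matched nothing
        cp -l "$f" "$(merged_name "$f")"
    done
}
```

Stripping the suffix with `${base%...}` avoids depending on a fixed character count, so sample names of different lengths are handled the same way.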
ngsfilter and illuminapairedend require .fastq input, but the script currently uses .fastq.gz as the input to primer removal (cutadapt).
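A minimal sketch of one way to bridge the two formats, assuming it is acceptable to unpack each compressed file next to the original before the OBITools step. The function name `decompress_fastq` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write an uncompressed copy of a .fastq.gz file so that tools
# expecting plain .fastq can read it. The .gz file is left in place.

# decompress_fastq FILE.fastq.gz
# Writes FILE.fastq beside the input and prints the new path.
decompress_fastq() {
    local gz="$1"
    local out="${gz%.gz}"             # drop only the trailing .gz
    gzip -dc "$gz" > "$out" && printf '%s\n' "$out"
}

# Example call (not executed here):
#   decompress_fastq 02_raw_data/your_file_R1_001.fastq.gz
```

Keeping the compressed original means cutadapt can continue to read the .fastq.gz while the later steps read the plain file.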
Need to make this parallel.
Currently 01_scripts/03_retain_unique.sh is operating on the 'cut to 230 bp' fastq file, but this needs to be specifically pointed to, as the standard works on *assi.fq instead of *assi_230.fq.
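One way this could be handled, sketched here without knowledge of what 03_retain_unique.sh actually contains: make the input suffix a variable with the standard value as the default. `INPUT_SUFFIX` and `list_inputs` are hypothetical names, and the folder is passed in because the real input folder is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: choose input files by a configurable suffix instead of a
# hard-coded glob. Default is the standard "assi.fq"; set
# INPUT_SUFFIX=assi_230.fq to use the files cut to 230 bp.

# list_inputs DIR
# Prints every file in DIR whose name ends with the chosen suffix.
list_inputs() {
    local dir="$1" suffix="${INPUT_SUFFIX:-assi.fq}" f
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Because `*assi.fq` does not match names ending in `assi_230.fq`, the two sets of files stay separate and the trimmed set only has to be requested through the variable.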
Need to find a solution so that, when cutadapt works, the following script is not needed to fix it.