hsgweon / pipits
Automated pipeline for analyses of fungal ITS from the Illumina
License: GNU General Public License v3.0
Hi!
First, thanks for developing this tool; I think it will be brilliant if I can get it to work.
I am processing more than 2000 PE FASTQ files, and everything worked until the merging step that produces the final out_seqprep/prepped.fasta file when running pispino_seqprep.
This is the error I get:
2022-06-29 17:36:22 ... done
2022-06-29 17:36:22 Joining paired-end reads [VSEARCH]
2022-06-29 17:36:22 Joining with VSEARCH.
vsearch v2.21.1_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch
Merging reads
Fatal error: Invalid line 3 in FASTQ file: '+' line must be empty or identical to header
2022-06-29 17:36:22 Error: None zero returncode: vsearch --fastq_mergepairs ../out-seqprep/tmp/reindex_fastq_F/ERR3280518.fastq --reverse ../out-seqprep/tmp/reindex_fastq_R/ERR3280518.fastq --fastqout ../out-seqprep/tmp/joined/ERR3280518.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100
I've checked the tmp files and realised that in the process, the program has changed the headers of the FASTQ files, so the "@" and "+" lines no longer match, and VSEARCH cannot merge the files in that case. Do you know what has gone wrong or how to fix it? The original files didn't have this problem.
Hope you can help, thanks very much!
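A quick way to locate the offending records is to scan the reindexed FASTQ for any '+' line that is neither bare nor identical to its '@' header, which is exactly the condition VSEARCH rejects. This is a minimal sketch assuming standard 4-line records; `find_bad_plus_lines` is a hypothetical helper, not part of PIPITS or PISPINO:

```python
def find_bad_plus_lines(lines):
    """Return 0-based record indices whose '+' separator line is neither
    bare ('+') nor identical to the '@' header (minus the leading '@')."""
    bad = []
    for rec, i in enumerate(range(0, len(lines) - 3, 4)):
        header = lines[i].rstrip("\n")
        plus = lines[i + 2].rstrip("\n")
        if plus != "+" and plus != "+" + header[1:]:
            bad.append(rec)
    return bad

# Example: the second record's '+' line repeats a stale header,
# which is the mismatch described in the issue above.
records = [
    "@sample1_0001", "ACGT", "+", "IIII",
    "@sample1_0002", "ACGT", "+old_header", "IIII",
]
```

Running this over the files in out-seqprep/tmp/reindex_fastq_F and _R would show whether the reindexing step rewrote the '@' lines without also rewriting the '+' lines.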
Do the adapters need to be removed prior to running PIPITS, or does that not matter since it extracts the ITS region?
Hello,
PEAR cannot be downloaded with wget at the moment. They also have a new homepage where you are supposed to register and then receive your academic-use link to the PEAR tarball by e-mail. I tried this twice; it worked the second time. The path for creating a link in pipits/bin is also slightly different now.
New PEAR page: https://www.h-its.org/downloads/pear-academic/
Cheers,
Fabian
Hi PIPITS team,
I am on the last step of the PIPITS pipeline:
pipits_process -i out_funits/ITS.fasta -o out_process -v -r
While downloading the UNITE trained database (version 27.10.2022), it fails with the error: "Downloaded data is corrupt. Get in touch with PIPITS team!. Exiting..."
Could you advise how to solve this issue?
Thank you,
Javaria
Hi,
I'm using the current version of PIPITS on Ubuntu 18.04. The program has been working fine with ITS2 datasets, but I keep encountering an error (see subject field) from ITSx when processing data generated with the BITS (ACCTGCGGARGGATCA) and B58S3 (GAGATCCRTTGYTRAAAGTT) primers.
In addition to my own data, I tried to process a published BITS data set that had previously been analysed successfully with PIPITS, and encountered the same error: ERROR: You have 0 sequences identified as ITS1. Are you sure your sequences are ITS1?
The data set used to verify my original issue was PRJEB32659 (A meta-barcoding analysis of soil mycobiota of the upper Andean Colombian agro-environment. Scientific Reports volume 9, Article number: 10085 (2019)).
Cheers, Greg
Hi
I recently used PIPITS to analyze my sequencing data in December and it was a success,
but last week I tried to analyze another set of data and received this error during the pipits_process step.
I looked through the previous issues and tried updating scipy, but still got the same error.
Do you know if there is a recent software update that may be interfering with a smooth analysis?
Thanks
Sandra
Hi,
This is my first time using PIPITS. I had no issue with the tutorial on mock samples, but during pipits_process for my actual samples I run into this issue: Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta.
Here are the results from the output.log.
2024-01-11 13:42:54 pipits_process started
2024-01-11 13:42:54 Downloading UNITE trained database, version: 27.10.2022
2024-01-11 13:48:38 ... Unpacking
2024-01-11 13:48:51 ... done
2024-01-11 13:48:51 Downloading database for SINTAX
2024-01-11 13:49:24 ... Unpacking
2024-01-11 13:49:25 ... done
2024-01-11 13:49:25 Downloading WARCUP trained database:
2024-01-11 13:49:46 ... Unpacking
2024-01-11 13:49:47 ... done
2024-01-11 13:49:47 Downloading UCHIME database for chimera filtering:
2024-01-11 13:49:55 ... Unpacking
2024-01-11 13:49:55 ... done
2024-01-11 13:49:55 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch
Dereplicating file funits_out/ITS.fasta 100%
882802030 nt in 3525785 seqs, min 100, max 482, avg 250
Sorting 100%
307813 unique sequences, avg cluster 11.5, median 1, max 146595
Writing FASTA output file 100%
2024-01-11 13:49:59 Picking OTUs [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch
Reading file process_out/intermediate/input_nr.fasta 100%
84061458 nt in 307813 seqs, min 100, max 482, avg 273
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 18750 Size min 1, max 11914, avg 16.4
Singletons: 13416, 4.4% of seqs, 71.6% of clusters
2024-01-11 13:50:33 Removing chimeras [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch
Reading file pipits_db/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta 100%
16786547 nt in 30555 seqs, min 146, max 2570, avg 549
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Detecting chimeras 100%
Found 96 (0.5%) chimeras, 18638 (99.4%) non-chimeras,
and 16 (0.1%) borderline sequences in 18750 unique sequences.
Taking abundance information into account, this corresponds to
7322 (1.8%) chimeras, 402368 (98.2%) non-chimeras,
and 156 (0.0%) borderline sequences in 409846 total sequences.
2024-01-11 13:50:39 Renaming OTUs
2024-01-11 13:50:39 Mapping reads onto centroids [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch
Reading file process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta 100%
4421074 nt in 18638 seqs, min 100, max 482, avg 237
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 3514479 of 3525785 (99.68%)
2024-01-11 13:54:55 Making OTU table
2024-01-11 13:55:04 Converting classic tabular OTU into a BIOM format [BIOM]
2024-01-11 13:55:10 Assigning taxonomy with VSEARCH-SINTAX [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch
Reading file pipits_db/UNITE_retrained_27.10.2022.sintax.fa/UNITE_retrained_27.10.2022.sintax.fa 100%
189392914 nt in 326300 seqs, min 140, max 1501, avg 580
Counting k-mers 100%
Creating k-mer index 100%
Classifying sequences 100%
Classified 14934 of 18638 sequences (80.13%)
2024-01-11 13:56:57 Adding SINTAX assignment to OTU table [BIOM]
2024-01-11 13:56:58 Converting OTU table with taxa assignment into a BIOM format [BIOM]
2024-01-11 13:57:00 Phylotyping OTU table
2024-01-11 13:57:04 Assigning taxonomy with UNITE [RDP Classifier]
2024-01-11 16:16:51 Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta
I am unsure what the issue is with the classifier; the SINTAX taxonomic classification did not appear to have issues. Thank you for your time.
Hi
I have a problem with running pipits_funits with test dataset
2018-09-03 18:59:17 pipits_funits started
2018-09-03 18:59:17 Checking input FASTA for illegal characters
2018-09-03 18:59:17 ... done
2018-09-03 18:59:17 Counting input sequences
2018-09-03 18:59:17 ... number of input sequences: 53
2018-09-03 18:59:17 Dereplicating sequences for efficiency
vsearch v2.8.2_linux_x86_64, 15.6GB RAM, 8 cores
https://github.com/torognes/vsearch
Dereplicating file out_seqprep/prepped.fasta 100%
13859 nt in 53 seqs, min 239, max 370, avg 261
Sorting 100%
20 unique sequences, avg cluster 2.6, median 2, max 11
Writing output file 100%
Writing uc file, first partSegmentation fault (core dumped)
2018-09-03 18:59:17 Error: None zero returncode: vsearch --derep_fulllength out_seqprep/prepped.fasta --output out_funits/intermediate/derep.fasta --uc out_funits/intermediate/derep.uc --fasta_width 0 --sizeout
Do you have a solution for this?
Thank you for your help.
Hi,
I have an issue with running pipits_funits. I get the error below; however, if I copy-paste the ITSx command and run it on its own, it works perfectly fine. Any ideas what the problem could be?
[monodon Fungal_pipits_HPC]$ pipits_funits -i pipits_prep/prepped.fasta -o pipits_funits -t 40 -x ITS1
2018-01-16 15:53:18 INFO: PIPITS_FUNITS started
2018-01-16 15:53:18 INFO: Checking input FASTA for illegal characters
2018-01-16 15:53:25 INFO: Counting input sequences
2018-01-16 15:53:38 INFO: Number of input sequences: 6809988
2018-01-16 15:53:38 INFO: Dereplicating sequences for efficiency
2018-01-16 15:54:52 INFO: Extracting ITS1 from sequences [ITSx]
2018-01-16 15:54:52 ERROR: None zero returncode: ITSx -i pipits_funits/intermediate/derep.fasta -o pipits_funits/intermediate/derep --preserve T -t F --cpu 40 --save_regions ITS1
[monodon Fungal_pipits_HPC]$ ITSx -i pipits_funits/intermediate/derep.fasta -o pipits_funits/intermediate/derep --preserve T -t F --cpu 40 --save_regions ITS1
ITSx -- Identifies ITS sequences and extracts the ITS region
by Johan Bengtsson-Palme et al., University of Gothenburg
Version: 1.0.11
-----------------------------------------------------------------
Tue Jan 16 16:07:48 2018 : Preparing HMM database (should be quick)...
Tue Jan 16 16:07:48 2018 : Checking and handling input sequence data (should not take long)...
Tue Jan 16 16:08:26 2018 : Doing paralellised comparison to HMM database (this may take a long while)...
Thanks for your help!
Kind regards,
Roger
Hi,
I am working my way through the instructions in your README with your test dataset. I've made it to the final step, but I am getting stuck in pipits_process with the following output:
pipits_process -i out_funits/ITS.fasta -o out_process
pipits_process 2.2, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------
2018-10-11 11:25:31 pipits_process started
2018-10-11 11:25:31 Generating a sample list from the input sequences
2018-10-11 11:25:32 Dereplicating and removing unique sequences prior to picking OTUs
2018-10-11 11:25:32 Picking OTUs [VSEARCH]
2018-10-11 11:25:32 Removing chimeras [VSEARCH]
2018-10-11 11:25:37 Renaming OTUs
2018-10-11 11:25:37 Mapping reads onto centroids [VSEARCH]
2018-10-11 11:25:37 Making OTU table
2018-10-11 11:25:37 Converting classic tabular OTU into a BIOM format [BIOM]
2018-10-11 11:25:37 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
When I run the problem command directly from the command line I get this:
biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Traceback (most recent call last):
File "/Fingerlin/home/russellp/.conda/envs/pipits_env/bin/biom", line 7, in <module>
from biom.cli import cli
File "/Fingerlin/home/russellp/.conda/envs/pipits_env/lib/python3.6/site-packages/biom/__init__.py", line 51, in <module>
from .table import Table
File "/Fingerlin/home/russellp/.conda/envs/pipits_env/lib/python3.6/site-packages/biom/table.py", line 176, in <module>
import numpy as np
File "/usr/lib/python3.5/site-packages/numpy/__init__.py", line 142, in <module>
from . import add_newdocs
File "/usr/lib/python3.5/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/usr/lib/python3.5/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/usr/lib/python3.5/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/usr/lib/python3.5/site-packages/numpy/core/__init__.py", line 14, in <module>
from . import multiarray
ImportError: cannot import name 'multiarray'
Some Googling suggests that this issue can be solved by deleting and reinstalling NumPy. However, I have not been able to change the version or reinstall NumPy because of PIPITS's dependencies. Can you suggest a solution for this issue?
Thanks!
Pam
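The traceback above shows a tell-tale path mix: `biom` loads from the conda environment's python3.6 tree, but NumPy resolves from the system /usr/lib/python3.5 tree, which usually points to a leaked search path (e.g. `PYTHONPATH`) rather than a broken NumPy. A small diagnostic like this (an illustrative sketch, not part of PIPITS) can confirm whether a module resolves outside the active environment; if it does, unsetting `PYTHONPATH` before re-running is a plausible first fix:

```python
import sys
import importlib.util

def import_origin(name):
    """Path of the file a module would be imported from (None if absent)."""
    spec = importlib.util.find_spec(name)
    return getattr(spec, "origin", None)

def looks_leaked(origin, prefix=sys.prefix):
    """True when a module file resolves outside the active interpreter's
    prefix -- the mixed-environment symptom seen in the traceback above."""
    return bool(origin) and not origin.startswith(prefix)

# e.g. looks_leaked(import_origin("numpy")) run inside pipits_env should be
# False; the traceback's /usr/lib/python3.5/... origin would report True.
```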
Hi,
I am running pipits in our cluster server with the command:
"pipits_process -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6 -v -r"
2022-09-01 13:29:16 pipits_process started
2022-09-01 13:29:16 Generating a sample list from the input sequences
2022-09-01 13:29:35 Downloading UNITE trained database, version: 10.05.2021
Traceback (most recent call last):
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 345, in _make_request
self._validate_conn(conn)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 844, in _validate_conn
conn.connect()
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 326, in connect
ssl_context=context)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/util/ssl.py", line 325, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 817, in __init__
self.do_handshake()
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 1077, in do_handshake
self._sslobj.do_handshake()
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/adapters.py", line 438, in send
timeout=timeout
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 630, in urlopen
raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_process", line 344, in
verbose = options.verbose)
File "/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_process", line 272, in downloadDB
request = requests.get(url, stream=True)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 661, in send
history = [resp for resp in gen] if allow_redirects else []
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 661, in
history = [resp for resp in gen] if allow_redirects else []
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 214, in resolve_redirects
**adapter_kwargs
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/adapters.py", line 512, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)
Could you please suggest a solution?
Thank you,
Kind regards,
Le Phuong Nguyen.
Hi,
I'm cleaning up the UNITE files to include only a subset and would like to retrain the RDP classifier on them. I've seen that you provide already-retrained files, but I would like to retrain on my subset.
Is there a script available to do that?
I have installed Miniconda3 and installed PIPITS, but when I run conda update pipits or conda update -c bioconda pipits, it will only update to v2.1. Is there another way to obtain v2.2?
Dear hsgweon,
I recently ran PIPITS 2.4 on a job of 53 million reads, using an SGE cluster (SUSE Linux).
pipits_process produced a broken phylotype table: abundances with decimals and, as a minor issue, a first column named "OTU ID" but filled with taxonomy.
I reproduced the error by running the pipits_process step (PIPITS 2.4) on Ubuntu 18.04.1 (WSL version).
Running just pipits_process in a PIPITS 2.3 environment solved the problem for me.
Running the test data under 2.4 also produced the strange-looking phylotype table, but with whole numbers (perhaps because there is no difference between OTUs and phylotypes in the test data).
Therefore, I guess there might be an error hidden in the phylotyping step of PIPITS 2.4?
Cheers and many thanks for your regular updates,
Fabian
Hi,
I had several 'water blanks' (negative controls) in my dataset. These were given names "water_blank_1", "water_blank_2", "water_blank_3", etc. After the initial 'pipits_prep' step, there was only a single 'water' sample remaining. If I had to guess, your script is overwriting the samples after splitting on the delimiter "_". If true, your pipeline is broken. It is not advisable to parse file names based on "_", as this is a very common delimiter used in naming. Please fix this; it will only cause further headaches.
Thanks!
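For context, the collapse described above is consistent with deriving the sample name from everything before the first underscore; splitting from the right instead keeps underscore-containing names distinct. A minimal sketch of the difference, using hypothetical file names in the style the reporter describes:

```python
files = ["water_blank_1_R1.fastq", "water_blank_2_R1.fastq", "soil_A_R1.fastq"]

# Naive: take the text before the first "_" -- both blanks collapse to "water".
naive = {f.split("_")[0] for f in files}

# Safer: strip only the trailing read-direction suffix, keeping the rest intact.
safe = {f.rsplit("_", 1)[0] for f in files}
```

Whether PIPITS actually splits this way is the reporter's guess; an explicit read-pair list (pispino_createreadpairslist) sidesteps filename parsing altogether.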
Dear Pipits,
Thank you for the tool. I wish to try it with my data, which is not fungal, so I would have to retrain the RDP classifier.
When I type:
$ pipits_retrain_rdp -h
usage: Retrains RDP Classifier [-h] -j -f -t -o DIR
optional arguments:
-h, --help show this help message and exit
-j [REQUIRED] RDP Classifier .jar file
-f [REQUIRED] UNITE training data - FASTA sequences downloaded
from http://sourceforge.net/projects/rdp-
classifier/files/RDP_Classifier_TrainingData
-t [REQUIRED] UNITE training data - taxonomy file downloaded
from http://sourceforge.net/projects/rdp-
classifier/files/RDP_Classifier_TrainingData
-o DIR Output directory where files and settings for retrained
parameters are stored.
All of the above help text relates to the pre-trained data. Would you be able to add a help page on GitHub about retraining and how to prepare the .xml file that holds the taxonomy tree? Would this work if I gave it a FASTA file of sequences and an XML-formatted tree, or is it more complicated than that?
Also, would that mean I would have to change the environment variable export PIPITS_UNITE_RETRAINED_DIR=$HOME/pipits/refdb/UNITE_retrained
to point at my new trained data? Or can this be passed as an argument when it is required?
regards,
Peter Thorpe
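Assuming PIPITS reads this variable at runtime, as the export line above suggests (an inference from that line, not from the PIPITS source), pointing it at a custom retrained directory would look like the following, with an illustrative placeholder path:

```shell
# Point PIPITS at a custom retrained directory instead of the stock UNITE
# one. The path is a placeholder for wherever pipits_retrain_rdp -o wrote
# its output.
export PIPITS_UNITE_RETRAINED_DIR="$HOME/pipits/refdb/my_retrained"
echo "$PIPITS_UNITE_RETRAINED_DIR"
```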
Dear Hyun,
First, I apologize if this is not the right place to ask questions. I have used your test dataset and all steps worked fine for me.
When using my own dataset, the pipits_funits step (hmmsearch) seems to be taking longer than expected. The progress output has been stuck at "2019-01-07 19:06:58 Extracting ITS1 from sequences [ITSx]" for about 4 days. According to the macOS Activity Monitor, Terminal is still processing the command. Since the process seems to be still running, I checked the size of the output files. Under the "intermediate" folder, "derep.fasta" is at 945.3 MB, "derep.summary.txt" is at 162 bytes, and "derep.uc" is at 398.7 MB; none has changed in size for a while.
Is this simply a computing-speed issue? This computer has 8 GB of memory, and the pispino_seqprep output says I have 6441992 sequences prepped.
Thank you and please let me know if I should provide additional information.
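One low-effort way to tell a stall from slow progress is to count the sequences ITSx has written so far and check again later: if the count grows, the run is merely slow. Demonstrated here on a stand-in FASTA; the real target would be an ITSx output under the intermediate folder (the exact filename depends on the run options):

```shell
# Create a stand-in FASTA just to demonstrate the check.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > demo.fasta

# Count sequences written so far; re-run later and compare the two counts.
grep -c '^>' demo.fasta   # prints 2
```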
Hi Mr Gweon,
I'm currently trying to use PIPITS 1.5 on a cluster where PIPITS has been installed by a system administrator for all users.
Unfortunately, I'm facing an error with the last tool of the workflow: pipits_process.
The error is as follows:
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /path/to/pipits/install/PIPITS-1.5.0/refdb/UNITE_retrained_28.06.2017/rRNAClassifier.properties (Permission denied)
After encountering this error, I downloaded the UNITE_retrained_28.06.2017 archive myself (from the SourceForge repository) and checked the default permissions of the files.
All the other files have -rw-r--r-- as permissions, but rRNAClassifier.properties has only -rw-------.
Maybe you should reupload a new version of the archive with a fixed default permission for this rRNAClassifier.properties file so that other people don't fall into the same trap as I did? :)
Thanks in advance
A.B
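Until the archive is repackaged, a shared install can fix the mode in place by granting read access on the unpacked file. Demonstrated on a stand-in file with the same starting permissions as the reporter describes:

```shell
# Stand-in for the unpacked rRNAClassifier.properties that arrives -rw-------.
touch rRNAClassifier.properties
chmod 600 rRNAClassifier.properties

# Grant read access to group and others, matching the archive's other files.
chmod a+r rRNAClassifier.properties
stat -c '%a' rRNAClassifier.properties   # prints 644
```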
Hi there, I have just installed PIPITS on my computer. I had all dependencies previously installed and re-tested them as per the instructions on the PIPITS page. Now I am running the test data, and I can't move forward from step 2.
test_data jcnavarro$ pipits_getreadpairslist -i rawdata -o readpairslist.txt
Generating a read-pair list file from the input directory...
Done. "readpairslist.txt" created.
test_data jcnavarro$ pipits_prep -i rawdata -o pipits_prep -l readpairslist.txt
2017-04-07 12:56:08 PIPITS_PREP_SINGLE started
2017-04-07 12:56:08 Processing the listfile
2017-04-07 12:56:08 Counting sequences in rawdata
2017-04-07 12:56:08 Number of reads: 75
2017-04-07 12:56:08 Reindexing forward reads
2017-04-07 12:56:08 Reindexing reverse reads
2017-04-07 12:56:08 Joining paired-end reads [PEAR]
Traceback (most recent call last):
File "/Users/jcnavarro/pipits/bin/pipits_prep", line 208, in
verbose = options.verbose)
File "/Users/jcnavarro/pipits/bin/pipits_prep_lib.py", line 268, in join
numberofsequences += int(p.communicate()[0]) / 4
ValueError: invalid literal for int() with base 10: ''
Any thoughts?
Thanks!
Javier
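The ValueError means the read-count subprocess handed back an empty string (p.communicate()[0] was ''), most likely because PEAR produced no output file or failed upstream, so int('') blew up. A defensive parse along these lines (a hypothetical helper, not the PIPITS code) would surface the real problem instead of crashing on the conversion:

```python
def parse_read_count(raw):
    """Parse `wc -l`-style output into a read count (lines / 4 for FASTQ).
    Empty output means the upstream command produced nothing, which deserves
    a clear error rather than a bare ValueError."""
    text = raw.decode() if isinstance(raw, bytes) else raw
    text = text.strip()
    if not text:
        raise RuntimeError("read-count command produced no output; "
                           "check that the joined FASTQ file exists")
    return int(text.split()[0]) // 4
```

For a 75-read FASTQ, `wc -l` reports 300 lines, so `parse_read_count("300\n")` gives 75, matching the "Number of reads: 75" line in the log above.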
Hi
I installed PIPITS on macOS Mojave as well as on a Linux cluster, but I run into a similar problem when testing on the included test files, as reported here: #20. Changing the locale to en_US.UTF-8 did not solve the problem...
2019-05-26 18:17:44 Converting classic tabular OTU into a BIOM format [BIOM]
Traceback (most recent call last):
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/bin/biom", line 11, in
sys.exit(cli())
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/table_converter.py", line 129, in convert
table_type, process_obs_metadata, tsv_metadata_formatter)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/table_converter.py", line 207, in _convert
write_biom_table(result, fmt, output_filepath)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/util.py", line 26, in write_biom_table
f.write(table.to_json(biom.parse.generatedby()))
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/table.py", line 4278, in to_json
raise TableException("Unsupported matrix data type.")
biom.exception.TableException: Unsupported matrix data type.
2019-05-26 18:18:01 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Any help is appreciated
Thanks
Error is as follows:
2019-05-14 09:31:04 Downloading UCHIME database for chimera filtering:
2019-05-14 09:31:04 ... DB directory and files exits, but seems to be different/corrupt -> re-downloading...
[###################################################################################################]100% | 8.6 MiB/s | 4578703 of 4578703 | Time: 0:00:00
File size: 4.37 MB
Downloaded data is corrupt. Get in touch with PIPITS team!. Exiting...
This only appears in the latest version, and I think it is due to the md5 checksum not matching for the UCHIME database it downloads. I have already deleted earlier versions of the database from the bin, and that did not solve the problem.
Any ideas how to solve this?
Thanks!
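The check can also be reproduced by hand: hash the downloaded archive and compare it against the digest PIPITS expects, which separates a genuinely truncated download from a stale recorded checksum. A small sketch (the file written below is a stand-in, not the real database):

```python
import hashlib

def md5_of(path, chunk=1 << 20):
    """md5 hex digest of a file, read in chunks to handle large archives."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Demo on a stand-in file; in practice you would run md5_of() on the
# downloaded uchime archive and compare against the expected digest.
with open("demo.bin", "wb") as fh:
    fh.write(b"example archive bytes")
```

If the hand-computed digest is stable across repeated downloads but still rejected, the recorded checksum is the likely culprit, which matches the reporter's theory.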
Hi,
I have a problem with running pipits_funits; other data sets have been successful. Attached are an "output.log" and a "versions.log".
Do you have a solution for this?
Thank you for your help.
output.log
versions.log
Dear all,
I tested PIPITS (latest version) on the 'data_test' and everything went well (running on the frontend).
I then submitted a job via qsub using my sequencing data, but the job was killed after the pipits_funits step.
In detail, the .sh script is the following:
#!/bin/bash
conda init bash
source ~/.bashrc
conda activate pipits_env
pipits_funits -i prepped.fasta -o out_funits_ITS2 -x ITS2 -r -t 30
I submitted it via qsub and the following is the content of 'output.log':
2021-04-19 15:53:33 pipits_funits started
2021-04-19 15:53:33 Checking input FASTA for illegal characters
2021-04-19 15:53:45 ... done
2021-04-19 15:53:45 Counting input sequences
2021-04-19 15:53:54 ... number of input sequences: 4190427
2021-04-19 15:53:54 Dereplicating sequences for efficiency
vsearch v2.17.0_linux_x86_64, 251.7GB RAM, 40 cores
https://github.com/torognes/vsearch
Dereplicating file /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_seqprep_plate3/prepped.fasta 100%
1462933517 nt in 4190427 seqs, min 100, max 582, avg 349
Sorting 100%
2118105 unique sequences, avg cluster 2.0, median 1, max 38177
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
2021-04-19 15:54:30 ... done
2021-04-19 15:54:30 Counting dereplicated sequences
2021-04-19 15:54:35 ... number of dereplicated sequences: 2118105
2021-04-19 15:54:35 Splitting sequences to multiple parts
[INFO] split into 30 parts
[INFO] read sequences ...
[INFO] read 2118105 sequences
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_001.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_002.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_003.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_004.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_005.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_006.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_007.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_008.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_009.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_010.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_011.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_012.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_013.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_014.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_015.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_016.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_017.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_018.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_019.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_020.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_021.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_022.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_023.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_024.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_025.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_026.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_027.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_028.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_029.fasta
[INFO] write 70589 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_030.fasta
2021-04-19 15:54:57 ... done
2021-04-20 01:48:07 Counting ITS sequences (dereplicated)
2021-04-20 01:48:07 ... number of ITS sequences (dereplicated): 124443
2021-04-20 01:48:07 Sorting by ID
[INFO] read sequences ...
[INFO] 124443 sequences loaded
[INFO] sorting ...
[INFO] output ...
2021-04-20 01:48:10 ... done
2021-04-20 01:48:10 Removing short sequences below < 100bp
vsearch v2.17.0_linux_x86_64, 251.7GB RAM, 40 cores
https://github.com/torognes/vsearch
Reading input file 100%
124309 sequences kept (of which 0 truncated), 134 sequences discarded.
2021-04-20 01:48:10 ... done
2021-04-20 01:48:10 Counting length-filtered sequences (dereplicated)
2021-04-20 01:48:10 ... number of length-filtered sequences (dereplicated): 124309
2021-04-20 01:48:10 Re-inflating sequences
Traceback (most recent call last):
File "/lustrehome/alabbate/.conda/envs/pipits_env/bin/pipits_rereplicate", line 31, in
from pipits import pipits_SeqIO as SeqIO
ImportError: No module named pipits
2021-04-20 01:48:11 Error: None zero returncode: python /lustrehome/alabbate/.conda/envs/pipits_env/bin/pipits_rereplicate -i /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/derep.ITS2.sizefiltered.fasta -o /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/ITS.fasta --uc /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/derep.uc
Do you have any suggestions to fix it?
Thank you in advance for your precious help.
Best,
AL
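The ImportError above usually means the pipits_rereplicate wrapper is being executed by a Python interpreter that cannot see the conda environment's site-packages. A small diagnostic sketch (my own helper, not part of PIPITS) to run inside the activated environment:

```python
import importlib.util

def importable(name: str) -> bool:
    # find_spec returns None when the module cannot be located
    # by the interpreter currently running this code.
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    # False here would point at a PATH/interpreter mismatch rather
    # than a broken PIPITS installation.
    print("pipits importable:", importable("pipits"))
```

If this prints False inside pipits_env, check `which python` against the environment's bin directory.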
Hi @hsgweon
I am running in the macOS Terminal. The installation of pipits_env went without issues, and the test with your mock data worked out fine. I am now running the pipits_process command, and it seems to have stopped at the UCHIME database download step. Please see below.
2020-01-17 12:52:53 pipits_process started
2020-01-17 12:52:53 Generating a sample list from the input sequences
2020-01-17 12:52:54 Downloading UNITE trained database, version: 02.02.2019
[#################################################################################]100% | 6.1 MiB/s | 111590718 of 111590718 | Time: 0:00:17
File size: 106.42 MB
2020-01-17 12:53:19 ... Unpacking
2020-01-17 12:53:25 ... done
2020-01-17 12:53:25 Downloading WARCUP trained database:
[###################################################################################]100% | 13.9 MiB/s | 17880783 of 17880783 | Time: 0:00:01
File size: 17.05 MB
2020-01-17 12:53:27 ... Unpacking
2020-01-17 12:53:28 ... done
2020-01-17 12:53:28 Downloading UCHIME database for chimera filtering:
Does this step usually take this long? It has been around 30 minutes without any change.
Thank you!
When running the pipits_process command with the reference dataset, I keep getting the following error:
2018-09-10 19:25:06 pipits_process started
2018-09-10 19:25:06 Generating a sample list from the input sequences
2018-09-10 19:25:06 Dereplicating and removing unique sequences prior to picking OTUs
2018-09-10 19:25:06 Picking OTUs [VSEARCH]
2018-09-10 19:25:06 Removing chimeras [VSEARCH]
2018-09-10 19:25:06 Error: None zero returncode: vsearch --uchime_ref out_process/intermediate/input_nr_otus.fasta --db $PIPITS_UNITE_REFERENCE_DATA_CHIMERA --nonchimeras out_process/intermediate/input_nr_otus_nonchimeras.fasta --threads 1
I've put the following sets of environmental variables in my .bashrc file, and continue to get the same error:
export PIPITS_UNITE_RETRAINED_DIR=$HOME/pipits/refdb/UNITE_retrained
export PIPITS_UNITE_REFERENCE_DATA_CHIMERA=$HOME/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta
export PIPITS_WARCUP_RETRAINED_DIR=$HOME/pipits/refdb/warcup_retrained_V2
(From an earlier question in the thread about the same issue):
export PIPITS_UNITE_RETRAINED_DIR=/home/lh001/pipits/refdb/UNITE_retrained
export PIPITS_UNITE_REFERENCE_DATA_CHIMERA=/home/lh001/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_referenc$
export PIPITS_WARCUP_RETRAINED_DIR=/home/lh001/pipits/refdb/warcup_retrained_V2
What can I do to resolve this issue?
Best,
Bryce Alex
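When vsearch reports it cannot open a path that still contains a literal `$PIPITS_...`, the variable was never expanded in the shell that launched the pipeline. A quick pre-flight sketch (my own helper, not part of PIPITS) to verify the three variables before running pipits_process:

```python
import os

REQUIRED = [
    "PIPITS_UNITE_RETRAINED_DIR",
    "PIPITS_UNITE_REFERENCE_DATA_CHIMERA",
    "PIPITS_WARCUP_RETRAINED_DIR",
]

def check_env(env=os.environ):
    """Report required PIPITS variables that are missing or point to absent paths."""
    problems = []
    for name in REQUIRED:
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not os.path.exists(os.path.expandvars(value)):
            problems.append(f"{name} points to a missing path: {value}")
    return problems

if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```

Remember that variables added to `.bashrc` only take effect in new shells (or after `source ~/.bashrc`).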
Hi,
Thank you very much for the pipeline. It is very useful and convenient to work with a pipeline developed specifically for fungi, and I have already tested it with satisfactory results.
I was wondering if you plan to incorporate some options for clustering with VSEARCH, especially an option to cluster at > 97%. This would be awesome for the pipeline.
Thanks a lot again.
Greetings,
Jaime
I just ran pispino_createreadpairslist -i rawdata/ -o paired_readlist.txt, and the resulting paired_readlist.txt is empty. My FASTQ files are in the rawdata folder; I don't know what could be happening.
Hi there, I have been trying out the PIPITS pipeline and had a few questions regarding processing speeds. As stated in the PIPITS paper, the pipits_funits step is the computational bottleneck in the pipeline. If I'm not mistaken, ITSx utilizes hmmscan. I did a little bit of research, and it seems that hmmscan is substantially slower than hmmsearch (see URL below). What are your thoughts on this? Do you think it would be possible or worthwhile to use hmmsearch to speed up this step? Apologies if I have overlooked an obvious reason not to do so.
https://cryptogenomicon.org/2011/05/27/hmmscan-vs-hmmsearch-speed-the-numerology/
I am running pipits_funits
and after a while when Re-inflating sequences I get this error:
Error: None zero returncode: python /Users/miniconda3/envs/pipits_envs/bin/pipits_rereplicate -i out_funits/intermediate/derep.ITS1.sizefiltered.fasta -o out_funits/ITS.fasta --uc out_funits/intermediate/derep.uc
When I check my output folder, I only see the intermediate folder. What could this be? I have also done export LC_ALL=en_us.UTF-8, since I saw that helped with a similar problem.
ERROR: environment variables (PIPITS_UNITE_REFERENCE_DATA_CHIMERA, PIPITS_UNITE_RETRAINED_DIR, PIPITS_WARCUP_RETRAINED_DIR) are not set. Please see PIPITS installation for an instruction on this.
(pipits_env)
I am getting the above issue and also unable to update to PIPITS 2.3
Hi there, I have installed PIPITS 2 on a Mac following all the instructions on the site, and tested it using the raw data provided, but I can't get past the pispino_seqprep step. See below.
Any suggestions?
Bests
JCNavarro-M47:test_pipits jcnavarro$ source activate pipits_env
discarding //anaconda/bin from PATH
prepending //anaconda/envs/pipits_env/bin to PATH
(pipits_env)JCNavarro-M47:test_pipits jcnavarro$ pispino_createreadpairslist -i rawdata -o readpairslist.txt
Generating a read-pair list file from the input directory...
Done - "readpairslist.txt" created
(pipits_env)JCNavarro-M47:test_pipits jcnavarro$ pispino_seqprep -i rawdata -o out_seqprep -l readpairslist.txt
2018-03-28 15:42:35 pispino_seqprep started
2018-03-28 15:42:35 Checking listfile
2018-03-28 15:42:35 ... done
2018-03-28 15:42:35 Counting sequences in rawdata
Traceback (most recent call last):
File "//anaconda/envs/pipits_env/bin/pispino_seqprep", line 203, in
verbose = options.verbose)
File "//anaconda/envs/pipits_env/lib/python2.7/site-packages/pispino/seqprep.py", line 43, in count_sequences
numberofsequences += int(getFileLineCount(input_dir + "/" + filename, extensionType) / 4)
File "//anaconda/envs/pipits_env/lib/python2.7/site-packages/pispino/seqtools.py", line 21, in getFileLineCount
f = bz2.open(filename, "r")
AttributeError: 'module' object has no attribute 'open'
The primary pipeline is for Illumina paired-end
But my dataset is from an Ion PGM.
How do I handle this? Thanks for your attention.
Hi,
I have a problem when using pipits_funits on my own data. It works perfectly fine on the test data provided. But when I try to run it on my data its output looks like this:
pipits_funits 2.1, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------
2018-06-15 13:56:32 pipits_funits started
2018-06-15 13:56:32 Checking input FASTA for illegal characters
2018-06-15 13:56:37 ... done
2018-06-15 13:56:37 Counting input sequences
2018-06-15 13:56:39 ... number of input sequences: 2402738
2018-06-15 13:56:39 Dereplicating sequences for efficiency
2018-06-15 13:57:09 Counting dereplicated sequences
2018-06-15 13:57:10 ... number of dereplicated sequences: 557710
2018-06-15 13:57:10 Extracting ITS2 from sequences [ITSx]
ITSx -- Identifies ITS sequences and extracts the ITS region
by Johan Bengtsson-Palme et al., University of Gothenburg
Version: 1.1b1
-----------------------------------------------------------------
Fri Jun 15 13:57:10 2018 : Preparing HMM database (should be quick)...
Fri Jun 15 13:57:10 2018 : Checking and handling input sequence data (should not take long)...
Fri Jun 15 13:57:20 2018 : Doing paralellised comparison to HMM database (this may take a long while)...
Fri Jun 15 18:47:46 2018 : Fungi analysis of main strand finished.
There is no error; it just stops, and I am left with the intermediate files. I've tried it a couple of times and the output always looks like that.
Do you know where the problem could be?
Hi,
I have an issue importing the phylotype table using the phyloseq package. I tried phylotype table from different pipits runs. All show the same behavior (see below).
ps <- import_biom(BIOMfilename = "data/20056_ITS/phylotype_table.biom")
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The normal BIOM file (otu_table.biom) works just fine.
Any suggestions to get rid of this error?
Thanks a lot!
Best,
Axel
Dear all,
I also encountered the non-zero returncode error, but in a different part of the pipeline: during the re-inflating sequences step, after the pipits_funits command. I have tried the suggestions made in earlier threads, but it still gives me the error. Any suggestions would be appreciated.
2019-11-28 10:16:41 pipits_funits started
2019-11-28 10:16:41 Checking input FASTA for illegal characters
2019-11-28 10:16:41 ... done
2019-11-28 10:16:41 Counting input sequences
2019-11-28 10:16:41 ... number of input sequences: 53
2019-11-28 10:16:41 Dereplicating sequences for efficiency
2019-11-28 10:16:41 ... done
2019-11-28 10:16:41 Counting dereplicated sequences
2019-11-28 10:16:41 ... number of dereplicated sequences: 20
2019-11-28 10:16:41 Extracting ITS2 from sequences [ITSx]
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Counting ITS sequences (dereplicated)
2019-11-28 10:16:44 ... number of ITS sequences (dereplicated): 19
2019-11-28 10:16:44 Sorting by ID
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Removing short sequences below < 100bp
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Counting length-filtered sequences (dereplicated)
2019-11-28 10:16:44 ... number of length-filtered sequences (dereplicated): 19
2019-11-28 10:16:44 Re-inflating sequences
2019-11-28 10:16:44 Error: None zero returncode: python /Users/henrik/miniconda2/envs/pipits_env/bin/pipits_rereplicate -i out_funits/intermediate/derep.ITS2.sizefiltered.fasta -o out_funits/ITS.fasta --uc out_funits/intermediate/derep.uc
(pipits_env) henriks-mini:pipits_test henrik$
Hi, Hyun,
I followed your instructions, but I can not move forward from the stage 'PIPITS_PROCESS'.
I am running a test with your test file, and my problem is indicated below.
2018-09-10 17:27:16 pipits_process started
2018-09-10 17:27:16 Generating a sample list from the input sequences
2018-09-10 17:27:16 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch
Reading file out_funits/ITS.fasta 100%
8584 nt in 52 seqs, min 144, max 275, avg 165
Dereplicating 100%
Sorting 100%
15 unique sequences, avg cluster 3.5, median 2, max 12
Writing output file 100%
9 uniques written, 6 clusters discarded (40.0%)
2018-09-10 17:27:16 Picking OTUs [VSEARCH]
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch
Reading file out_process/intermediate/input_nr.fasta 100%
1447 nt in 9 seqs, min 147, max 187, avg 161
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 6 Size min 1, max 3, avg 1.5
Singletons: 4, 44.4% of seqs, 66.7% of clusters
2018-09-10 17:27:16 Removing chimeras [VSEARCH]
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch
Unable to open file for reading ($HOME/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta)
2018-09-10 17:27:16 Error: None zero returncode: vsearch --uchime_ref out_process/intermediate/input_nr_otus.fasta --db $PIPITS_UNITE_REFERENCE_DATA_CHIMERA --nonchimeras out_process/intermediate/input_nr_otus_nonchimeras.fasta --threads 1
I am a beginner, so I would appreciate detailed instructions I can follow.
Many thanks in advance.
DongHyeon
Thanks for a great pipeline. Very nice.
I hope I'm not missing something obvious about my environment here, but this ModuleNotFoundError seems odd. Any help is greatly appreciated. 2.2 was just installed and everything ran fine until pipits_funits; pipits can't find its own pipits module? I'm running on a cluster with Anaconda3. Here are the last few lines from the log (my paths changed to $HOME):
Traceback (most recent call last):
File "$HOME/.conda/envs/pipits_env/bin/pipits_rereplicate", line 31, in
from pipits import pipits_SeqIO as SeqIO
ModuleNotFoundError: No module named 'pipits'
2018-08-02 17:08:34 Error: None zero returncode: python $HOME/.conda/envs/pipits_env/bin/pipits_rereplicate -i ITS2ExtractOut/intermediate/derep.ITS2.sizefiltered.fasta -o ITS2ExtractOut/ITS.fasta --uc ITS2ExtractOut/intermediate/derep.uc
Thanks!
I think I installed PIPITS correctly. I downloaded the test data and the first few commands seemed to work fine, but then I got an error message when running pipits_process:
$ pipits_process -i out_funits/ITS.fasta -o out_process --Xmx 12G
pipits_process 2.2, the PIPITS Project
…
2018-09-11 12:18:37 Converting classic tabular OTU into a BIOM format [BIOM]
2018-09-11 12:18:37 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Then I tried running that command directly and it looks like the dependency "Click" does not work when the locale for Python 3 is set to "ASCII":
biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Traceback (most recent call last):
…
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult http://click.pocoo.org/python3/ for mitigation steps.
Any ideas of how to work around this issue? I've installed everything via conda into the default conda environment.
I think my locale is okay:
env | grep UTF
LANG=en_US.UTF-8
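Click aborts when the interpreter's preferred encoding resolves to ASCII, which can happen even when LANG looks right, because LC_ALL or LC_CTYPE can override it. A small check mirroring (my assumption of) what Click objects to:

```python
import locale

def is_ascii_locale(enc: str) -> bool:
    # Normalise and compare against the encoding names Python reports
    # for a C/POSIX locale.
    e = enc.lower().replace("-", "").replace("_", "")
    return e in ("ascii", "ansix3.41968", "usascii")

if __name__ == "__main__":
    # If this prints an ASCII alias, export LC_ALL=en_US.UTF-8
    # (or C.UTF-8) before running biom convert.
    print(locale.getpreferredencoding(False))
```

Checking `locale.getpreferredencoding(False)` is more reliable than grepping the environment, since it reflects what Python actually resolved.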
I am running the FUNITS step, and when I specify -t 30 I get an error saying I only have 2 CPUs. Is -t for specifying CPUs or threads? I have a dual-CPU workstation with 20 cores and 40 threads. It gave the same error for 20 and any larger integer, but it ran when I used 2.
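Whether -t counts CPUs or threads matters less than what the scheduler actually grants the process: inside a container, VM, or cluster job, the visible CPU set can be far smaller than the hardware's 40 threads. On Linux this can be checked directly (a diagnostic sketch, not a PIPITS feature):

```python
import os

# cpu_count() reports the machine's logical CPUs; sched_getaffinity()
# (Linux-only) reports how many CPUs this process is allowed to use,
# which is what a "-t 30 but only 2 CPUs" error usually reflects.
print("logical CPUs:", os.cpu_count())
print("usable CPUs:", len(os.sched_getaffinity(0)))
```

If "usable CPUs" prints 2, the limit comes from the environment (e.g. a VM or job allocation), not from PIPITS.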
I am looking to run PIPITS, but my sample filenames have multiple fields, and I know that PIPITS takes the first field with the delimiter "_".
111_M_0_Rhizo_ITS_R1_001.fastq.gz
111_M_0_Rhizo_ITS_R2_001.fastq.gz
Any recommendations?
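One workaround (my own sketch, not a PIPITS option) is to rename the files so that everything before the read-direction token becomes a single first field, preserving the full description as the sample ID:

```python
import re

def sample_id(filename: str) -> str:
    """What PIPITS would take as the sample ID: the first '_' field,
    so '111_M_0_Rhizo_ITS_R1_001.fastq.gz' collapses to just '111'."""
    return filename.split("_")[0]

def flatten_fields(filename: str) -> str:
    """Join all fields before the R1/R2 token with '-' so the whole
    description survives as one sample-ID field.  The filename layout
    (R1/R2 followed by a numeric chunk) is an assumption."""
    m = re.match(r"(.+)_(R[12])_(\d+\.fastq(?:\.gz)?)$", filename)
    if not m:
        return filename
    head, read, tail = m.groups()
    return f"{head.replace('_', '-')}_{read}_{tail}"
```

Renaming copies of the raw files this way keeps the originals untouched while giving PIPITS unambiguous sample IDs.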
Hi,
I am trying out your pipeline. I installed all dependencies, including ITSx. After installation, as mentioned in the synopsis, I tried to run the test data with the command pipits_prep -i rawdata -o pipits_prep -l readpairslist.txt, but I got the output below and am unable to move on to pipits_funits -i pipits_prep/prepped.fasta -o pipits_funits -x ITS2.
2017-10-31 03:26:13 PIPITS_PREP started
2017-10-31 03:26:13 Processing the listfile
2017-10-31 03:26:13 Counting sequences in rawdata
2017-10-31 03:26:13 Number of reads: 75
2017-10-31 03:26:13 Reindexing forward reads
2017-10-31 03:26:13 Reindexing reverse reads
2017-10-31 03:26:13 Joining paired-end reads [VSEARCH]
2017-10-31 03:26:13 ERROR: None zero returncode: vsearch --fastq_mergepairs pipits_prep/tmp/reindex_fastq_F/A01B.fastq --reverse pipits_prep/tmp/reindex_fastq_R/A01B.fastq --fastqout pipits_prep/tmp/joined/A01B.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100
I look forward to your response.
Abid
Hi @hsgweon, I hope all is well!
I'm planning on using the PIPITS pipeline for some MiSeq data analysis. I'm currently in the process of testing PIPITS installation (as advised in the guide) but I'm having issues accessing the rdp classifier to assign taxonomy for my final OTU table:
Classifier.jar is present in the relevant directory, and file permissions seem okay... any ideas?
Sorry if I'm missing something drastic! I'm new to Linux (Ubuntu) and this pipeline; all my previous work focused on 16S rRNA pipelines in R.
Best wishes
Ryan
Hello,
I am trying to run fungal sequences with PIPITS but I'm running into issues with the pipits_funits step. I saw that a previous user also had this issue but they did not share their solution. Here is my script and the resulting error message:
(pipits_env) qiime2@qiime2core2018-4:/media/sf_Shared_Folder/Arctic_Bioremediation_tFINAL/ITS_30Aug2018_undetermined$ pipits_funits -i out_seqprep/prepped.0.fasta -o outfunits_0 -x ITS2 -t 8
2018-12-17 13:56:15 pipits_funits started
2018-12-17 13:56:15 Checking input FASTA for illegal characters
2018-12-17 13:56:20 ... done
2018-12-17 13:56:20 Counting input sequences
2018-12-17 13:56:22 ... number of input sequences: 1312800
2018-12-17 13:56:22 Dereplicating sequences for efficiency
2018-12-17 13:56:35 ... done
2018-12-17 13:56:35 Counting dereplicated sequences
2018-12-17 13:56:35 ... number of dereplicated sequences: 154140
2018-12-17 13:56:35 Extracting ITS2 from sequences [ITSx]
2018-12-17 16:43:56 ... done
2018-12-17 16:43:56 Counting ITS sequences (dereplicated)
2018-12-17 16:43:56 ERROR: You have 0 sequences!
I have also included the summary and output files:
output.log
summary.log
To try to fix this I have tried installing vsearch 2.8.0, splitting the data into smaller groups, and using different numbers of threads. Do you have any suggestions?
Thank you so much!
Shelby
I am running the command
pipits_funits -i combined_seqs.fasta -o pipits_funits -x ITS2
Here are my headers:
>ITS.Rev.B.1.S138_0 M02849:171:000000000-ANHND:1:1101:20355:1874_1:N:0:138
GGTACTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTTCGGAAGGATCATTAAATAATTTTTAATTTTTATTCTTCGCGTTATATTCTTAATATATTTTACTGTGAACTGTATTATTTCATTACGCTTGATTAATCCTTCTGCTTTACCATAATGGACAGTTCATCGAAGATGTTAACCGAGTCGTGGTCAAGCTTATCCTTGGTGTCCTTAATTATTATTCTCCAAAAGAATTCATTTTAAAAATATTTTAATATGGGCTTAAAAAACTCATTAAAACAACTTTTAAC
>ITS.Rev.B.1.S138_1 M02849:171:000000000-ANHND:1:1101:22962:1969_1:N:0:138
GGTACTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGTAAGGATCATTACCGAGTGCGGGCCCTCTGGGTCCAACCTCCCATCCGTGTCTATCTGTACCCTGTTGCTTCGGCGTTTCCTCGGCCCGCCGCAGACTAACATTTTAACACTGTCTGAAGTTTGCAGTCTGAGTTTTTAGTTAAACAATAATTAAAACTTTCAACAACTTATCTCTTGGTTCCGTCATCGATGAAGAACGCAGCGAAATGCGATAATTAATTTGAATTTCAGAATTCAGTGAATCTTCG
Hi
I am using pipits_process with the command:
pipits_process -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6
However, I received the errors as below:
2022-09-01 18:45:46 pipits_process started
2022-09-01 18:45:46 Generating a sample list from the input sequences
2022-09-01 18:46:04 Downloading UNITE trained database, version: 10.05.2021
2022-09-01 18:46:05 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:05 ... Unpacking
2022-09-01 18:46:10 ... done
2022-09-01 18:46:10 Downloading WARCUP trained database:
2022-09-01 18:46:10 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:10 ... Unpacking
2022-09-01 18:46:11 ... done
2022-09-01 18:46:11 Downloading UCHIME database for chimera filtering:
2022-09-01 18:46:11 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:11 ... Unpacking
2022-09-01 18:46:11 ... done
2022-09-01 18:46:11 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch
Dereplicating file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta 100%
1618981758 nt in 8399800 seqs, min 100, max 462, avg 193
Sorting 100%
429233 unique sequences, avg cluster 19.6, median 1, max 732858
Writing output file 100%
68158 uniques written, 361075 clusters discarded (84.1%)
2022-09-01 18:46:21 Picking OTUs [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch
Reading file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/input_nr.fasta 100%
13560288 nt in 68158 seqs, min 102, max 372, avg 199
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1319 Size min 1, max 6769, avg 51.7
Singletons: 458, 0.7% of seqs, 34.7% of clusters
2022-09-01 18:46:33 Removing chimeras [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch
Reading file pipits_db/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta 100%
16786547 nt in 30555 seqs, min 146, max 2570, avg 549
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Detecting chimeras 100%
Found 519 (39.3%) chimeras, 792 (60.0%) non-chimeras,
and 8 (0.6%) borderline sequences in 1319 unique sequences.
Taking abundance information into account, this corresponds to
12389 (2.1%) chimeras, 584070 (97.8%) non-chimeras,
and 453 (0.1%) borderline sequences in 596912 total sequences.
2022-09-01 18:46:37 Renaming OTUs
2022-09-01 18:46:37 Mapping reads onto centroids [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch
Reading file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/input_nr_otus_nonchimeras_relabelled.fasta 100%
147858 nt in 792 seqs, min 102, max 372, avg 187
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 8307353 of 8399800 (98.90%)
2022-09-01 19:04:23 Making OTU table
Traceback (most recent call last):
File "/hpc/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_uc2otutable", line 35, in
infile = open(options.infile, "r")
FileNotFoundError: [Errno 2] No such file or directory: '/hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otus.uc'
2022-09-01 19:04:23 Error: None zero returncode: python /hpc/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_uc2otutable -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otus.uc -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otu_table_prelim.txt -l /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/sampleIDs.txt
Could you give some advice?
Thank you,
Kind regards,
Le Phuong Nguyen
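The traceback shows the mapping step itself finished but `otus.uc` was never written, so the first thing to verify is whether the vsearch mapping call actually produced its `--uc` output (running out of disk space mid-run is one plausible cause, though that is speculation here). For reference, pipits_uc2otutable consumes a tab-separated vsearch .uc file; a hypothetical parser sketch (the sample-label convention is my assumption, not PIPITS code):

```python
from collections import defaultdict

def uc_to_counts(lines):
    """Turn vsearch .uc mapping records into {sample: {otu: count}}.
    Hit records start with 'H'; column 9 is the query label and
    column 10 the target centroid.  Sample IDs are assumed to be the
    query label up to its last '_' (PIPITS-style read labels)."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields or fields[0] != "H":
            continue  # 'N' = no hit; 'S'/'C' = cluster records
        query, target = fields[8], fields[9]
        sample = query.rsplit("_", 1)[0]
        table[sample][target] += 1
    return {s: dict(c) for s, c in table.items()}
```

Running something like this by hand on a partial .uc file can confirm whether the mapping output is usable at all.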
Dear all,
@hsgweon
I would like to know how I could use the PIPITS pipeline to analyse my ITS1 data generated by an Ion Torrent PGM. How can I perform the 'prepare the sequence list' step?
Marwa
I've been receiving the following error while running pipits_funits on the test data set:
2018-09-18 09:13:01 pipits_funits started
2018-09-18 09:13:01 Checking input FASTA for illegal characters
2018-09-18 09:13:01 ... done
2018-09-18 09:13:01 Counting input sequences
2018-09-18 09:13:01 ... number of input sequences: 7394
2018-09-18 09:13:01 Dereplicating sequences for efficiency
2018-09-18 09:13:01 ... done
2018-09-18 09:13:01 Counting dereplicated sequences
2018-09-18 09:13:01 ... number of dereplicated sequences: 3488
2018-09-18 09:13:01 Extracting ITS2 from sequences [ITSx]
2018-09-18 09:40:19 ... done
2018-09-18 09:40:19 Counting ITS sequences (dereplicated)
2018-09-18 09:40:19 ERROR: You have 0 sequences!
Could you help me with this?
My colleagues faced the following issue with a fresh Pipits installation from conda and asked me to investigate.
(pipits) $ pipits_funits -i test/out_seqprep/prepped.fasta -o test/out_funits -x ITS2
pipits_funits 2.2, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------
2018-12-30 23:54:23 Error: None zero returncode: conda list
I figured this might be a version-incompatibility issue, so I downgraded both pipits and conda:
(pipits) $ conda list
# packages in environment at /home/anaconda/conda/envs/pipits:
#
# Name Version Build Channel
biom-format 2.1.6 py36_1 bioconda
blas 1.0 mkl
bzip2 1.0.6 h470a237_2 conda-forge
ca-certificates 2018.11.29 ha4d7672_0 conda-forge
certifi 2018.11.29 py36_1000 conda-forge
click 7.0 py_0 conda-forge
cython 0.29.2 py36hfc679d8_0 conda-forge
fastx_toolkit 0.0.14 0 bioconda
future 0.17.1 py36_1000 conda-forge
h5py 2.9.0 py36he5c79e1_0 conda-forge
hdf5 1.10.4 nompi_h5598ddc_1105 conda-forge
hmmer 3.2.1 hfc679d8_0 bioconda
intel-openmp 2019.1 144
itsx 1.1b 1 bioconda
libffi 3.2.1 hfc679d8_5 conda-forge
libgcc-ng 7.2.0 hdf63c60_3 conda-forge
libgfortran 3.0.0 1 conda-forge
libgfortran-ng 7.2.0 hdf63c60_3 conda-forge
libgtextutils 0.7 h470a237_4 bioconda
libstdcxx-ng 7.2.0 hdf63c60_3 conda-forge
mkl 2018.0.3 1
mkl_fft 1.0.10 py36_0 conda-forge
mkl_random 1.0.2 py36_0 conda-forge
ncurses 6.1 hfc679d8_2 conda-forge
nose 1.3.7 py36_1002 conda-forge
numpy 1.15.0 py36h1b885b7_0
numpy-base 1.15.0 py36h3dfced4_0
openjdk 11.0.1 h470a237_14 conda-forge
openssl 1.0.2p h470a237_1 conda-forge
pandas 0.23.4 py36hf8a1672_0 conda-forge
perl 5.26.2 h470a237_0 conda-forge
pip 18.1 py36_1000 conda-forge
pipits 2.1 py_5 bioconda
pispino 1.1 py_1 bioconda
python 3.6.7 h5001a0f_1 conda-forge
python-dateutil 2.7.5 py_0 conda-forge
pytz 2018.7 py_0 conda-forge
rdptools 2.0.2 1 bioconda
readline 7.0 haf1bffa_1 conda-forge
scipy 1.1.0 py36hc49cb51_0
setuptools 40.6.3 py36_0 conda-forge
six 1.12.0 py36_1000 conda-forge
sqlite 3.26.0 hb1c47c0_0 conda-forge
tk 8.6.9 ha92aebf_0 conda-forge
vsearch 2.10.3 h96824bc_0 bioconda
wheel 0.32.3 py36_0 conda-forge
xz 5.2.4 h470a237_1 conda-forge
zlib 1.2.11 h470a237_3 conda-forge
Alas, the issue persisted. I debugged the code and introduced the following modifications to make it run:
def run_cmd(command, log_file, verbose):
    FNULL = open(os.devnull, 'w')
    p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for l in p.stdout:
        if verbose:
            logger(str(l, 'utf-8').rstrip(), log_file, display=True, timestamp=False)
        else:
            logger(str(l, 'utf-8').rstrip(), log_file, display=False, timestamp=False)
    p.wait()
    FNULL.close()
    # MODIFICATION: raise an error instead of calling `exit(1)`
    if p.returncode != 0:
        raise RuntimeError("Error: None zero returncode: " + command)

# Log versions
# MODIFICATION: add exception handling; add `import logging` to the header
try:
    cmd = " ".join(["conda list"])
    run_cmd(cmd, version_file, False)
except RuntimeError:
    logging.exception('Hotfixed conda list call failure')
This is clearly a hotfix, yet I think it would be more convenient in the long run to introduce exception handling into your sources instead of exiting abruptly (as is the case in the original run_cmd function).
Hi @hsgweon,
I recently started to use PIPITS to analyse ITS1 data and I came across an issue. Maybe it isn't really an issue and I just lack some understanding of the algorithm behind PIPITS.
I have a couple of samples where the BIOM file (and the OTU table as well) shows more counts than there are raw read pairs.
For instance, I have a sample with 31,371 raw read pairs. If I just sum over all mapped contigs in the biom file (R command: apply(phyloseq::otu_table(x), 2, sum )) I get 124,475 features/contigs mapping for this particular sample.
Do you have any explanation for this behaviour?
I'm running PIPITS2 following the installation instructions from the github page. I tested PIPITS on a Linux platform and on macOS 10.13.5 with the same results.
Best,
Axel
First of all I would like to thank you for the pipeline.
Lately, when running pispino_seqprep with defaults, only 5 of about 1,000,000 sequences came out of the process, and in the next step no sequences survived. I've changed the default joiner to PEAR (it comes with Microbiome Helper), but I'm not sure I'm getting good results. The final taxa seem incorrect, and there is huge variance between samples (maybe that last part is my fault somehow).
Hope this comment helps