pipits's People

Contributors

fabwei, grayfall, hsgweon

pipits's Issues

Files not merged after pispino_seqprep

Hi!

First, thanks for developing this tool; I think it will be brilliant if I get it to work.

I am processing more than 2000 PE FASTQ files and everything was working until the step of merging and producing the final out_seqprep/prepped.fasta file after running pispino_seqprep.
This is the error I get:

**2022-06-29 17:36:22 ... done
2022-06-29 17:36:22 Joining paired-end reads [VSEARCH]
2022-06-29 17:36:22 Joining with VSEARCH.
vsearch v2.21.1_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch

Merging reads

Fatal error: Invalid line 3 in FASTQ file: '+' line must be empty or identical to header
2022-06-29 17:36:22 Error: None zero returncode: vsearch --fastq_mergepairs ../out-seqprep/tmp/reindex_fastq_F/ERR3280518.fastq --reverse ../out-seqprep/tmp/reindex_fastq_R/ERR3280518.fastq --fastqout ../out-seqprep/tmp/joined/ERR3280518.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100**

I've checked the tmp files and realised that, in the process, the program has changed the headers of the FASTQ files, so the "@" and "+" lines no longer match, and vsearch cannot merge the files in that case. Do you know what has gone wrong or how to fix it? The original files didn't have this problem.

Hope you can help, thanks very much!
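
The vsearch message above says the "+" line must be empty or identical to the header, so one possible workaround is to reduce every separator line to a bare "+". A minimal Python sketch, assuming strict four-line FASTQ records; `normalize_plus_lines` is a hypothetical helper and not part of PIPITS or vsearch:

```python
def normalize_plus_lines(lines):
    """Return FASTQ lines with every separator line reduced to a bare '+'.

    Assumes strict four-line records (header, sequence, '+', quality).
    """
    fixed = []
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        if i % 4 == 2:  # third line of each record is the separator
            if not text.startswith("+"):
                raise ValueError(f"line {i + 1}: expected '+' separator, got {text!r}")
            text = "+"
        fixed.append(text)
    return fixed


# Illustrative record whose '+' line no longer matches its header.
record = ["@ERR3280518.1 1/1", "ACGT", "+ERR3280518.1 other", "IIII"]
print(normalize_plus_lines(record))
```

Applied to the files under the tmp directory, this only removes the mismatch; it does not explain why the headers were rewritten in the first place.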

Is removing adapters needed?

Do the adapters need to be removed prior to running PIPITS, or does it not matter since it extracts the ITS region?

PEAR download restriction

Hello,

PEAR cannot be downloaded with wget at the moment. They also have a new homepage where you are supposed to register and get your academic-usage link to PEAR tar via mail later. I tried this two times, it worked the second time. The path for creating a link in pipits/bin is also slightly different now.
New PEAR page: https://www.h-its.org/downloads/pear-academic/

Cheers,
Fabian

unable to download UNITE trained database, version: 27.10.2022

Hi PIPITS team,
I am on the last step of the PIPITS pipeline:
pipits_process -i out_funits/ITS.fasta -o out_process -v -r
While downloading the UNITE trained database (version: 27.10.2022), it fails with the error: Downloaded data is corrupt. Get in touch with PIPITS team!. Exiting...
Could you advise how to solve this issue?

Thank you,
Javaria

ERROR: You have 0 sequences identified as ITS1. Are you sure your sequences are ITS1?

Hi,

I'm using the current version of Pipits on Ubuntu 18.04. The program has been working fine with ITS2 datasets but I keep encountering an error (see subject field) from ITSx when processing data generated with the BITS (ACCTGCGGARGGATCA) and B58S3 (GAGATCCRTTGYTRAAAGTT) primers.

In addition to my own data, I tried to process a published BITS data set that has successfully used pipits, and encountered the same error: ERROR: You have 0 sequences identified as ITS1. Are you sure your sequences are ITS1?

The data set used to verify my original issue was PRJEB32659 (A meta-barcoding analysis of soil mycobiota of the upper Andean Colombian agro-environment. Scientific Reports volume 9, Article number: 10085 (2019)).

Cheers, Greg

PIPITS PROCESS Error: none zero returncode: biom convert

Hi
I recently used PIPITS to analyze my sequencing data in December and it was a success,
but last week I tried to analyze another set of data and received this error during the PIPITS Process step.
I looked through the previous issues and tried updating scipy, but still got the same error.
Do you know if a recent software update may be interfering with the analysis?
Thanks

Sandra

pipits process error

Error when running RDP classification: Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta

Hi,

This is my first time using PIPITS. I had no issue with the tutorial on mock samples, but during pipits_process for my actual samples I ran into this issue: Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta.

Here are the results from the output.log.

2024-01-11 13:42:54 pipits_process started
2024-01-11 13:42:54 Downloading UNITE trained database, version: 27.10.2022
2024-01-11 13:48:38 ... Unpacking
2024-01-11 13:48:51 ... done
2024-01-11 13:48:51 Downloading database for SINTAX
2024-01-11 13:49:24 ... Unpacking
2024-01-11 13:49:25 ... done
2024-01-11 13:49:25 Downloading WARCUP trained database: 
2024-01-11 13:49:46 ... Unpacking
2024-01-11 13:49:47 ... done
2024-01-11 13:49:47 Downloading UCHIME database for chimera filtering: 
2024-01-11 13:49:55 ... Unpacking
2024-01-11 13:49:55 ... done
2024-01-11 13:49:55 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Dereplicating file funits_out/ITS.fasta 100%
882802030 nt in 3525785 seqs, min 100, max 482, avg 250
Sorting 100%
307813 unique sequences, avg cluster 11.5, median 1, max 146595
Writing FASTA output file 100%
2024-01-11 13:49:59 Picking OTUs [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file process_out/intermediate/input_nr.fasta 100%
84061458 nt in 307813 seqs, min 100, max 482, avg 273
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 18750 Size min 1, max 11914, avg 16.4
Singletons: 13416, 4.4% of seqs, 71.6% of clusters
2024-01-11 13:50:33 Removing chimeras [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file pipits_db/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta 100%
16786547 nt in 30555 seqs, min 146, max 2570, avg 549
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Detecting chimeras 100%
Found 96 (0.5%) chimeras, 18638 (99.4%) non-chimeras,
and 16 (0.1%) borderline sequences in 18750 unique sequences.
Taking abundance information into account, this corresponds to
7322 (1.8%) chimeras, 402368 (98.2%) non-chimeras,
and 156 (0.0%) borderline sequences in 409846 total sequences.
2024-01-11 13:50:39 Renaming OTUs
2024-01-11 13:50:39 Mapping reads onto centroids [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta 100%
4421074 nt in 18638 seqs, min 100, max 482, avg 237
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 3514479 of 3525785 (99.68%)
2024-01-11 13:54:55 Making OTU table
2024-01-11 13:55:04 Converting classic tabular OTU into a BIOM format [BIOM]
2024-01-11 13:55:10 Assigning taxonomy with VSEARCH-SINTAX [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file pipits_db/UNITE_retrained_27.10.2022.sintax.fa/UNITE_retrained_27.10.2022.sintax.fa 100%
189392914 nt in 326300 seqs, min 140, max 1501, avg 580
Counting k-mers 100%
Creating k-mer index 100%
Classifying sequences 100%
Classified 14934 of 18638 sequences (80.13%)
2024-01-11 13:56:57 Adding SINTAX assignment to OTU table [BIOM]
2024-01-11 13:56:58 Converting OTU table with taxa assignment into a BIOM format [BIOM]
2024-01-11 13:57:00 Phylotyping OTU table
2024-01-11 13:57:04 Assigning taxonomy with UNITE [RDP Classifier]
2024-01-11 16:16:51 Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta

I am unsure what the issue with the classifier is. The SINTAX taxonomic classification did not appear to have issues. Thank you for your time.

pipits_funits

Hi

I have a problem running pipits_funits with the test dataset:

(pipits_env) bio@biolinux-All-Series:~/pipits$ pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x ITS2 -v
pipits_funits 2.2, the PIPITS Project
https://github.com/hsgweon/pipits

2018-09-03 18:59:17 pipits_funits started
2018-09-03 18:59:17 Checking input FASTA for illegal characters
2018-09-03 18:59:17 ... done
2018-09-03 18:59:17 Counting input sequences
2018-09-03 18:59:17 ... number of input sequences: 53
2018-09-03 18:59:17 Dereplicating sequences for efficiency
vsearch v2.8.2_linux_x86_64, 15.6GB RAM, 8 cores
https://github.com/torognes/vsearch

Dereplicating file out_seqprep/prepped.fasta 100%
13859 nt in 53 seqs, min 239, max 370, avg 261
Sorting 100%
20 unique sequences, avg cluster 2.6, median 2, max 11
Writing output file 100%
Writing uc file, first partSegmentation fault (core dumped)
2018-09-03 18:59:17 Error: None zero returncode: vsearch --derep_fulllength out_seqprep/prepped.fasta --output out_funits/intermediate/derep.fasta --uc out_funits/intermediate/derep.uc --fasta_width 0 —sizeout

Do you have a solution for this?
Thank you for your help.

pipits_funits command issue

Hi,

I have an issue with running pipits_funits. I get the error below; however, if I copy and paste the ITSx command and run it on its own, it works perfectly fine. Any ideas what the problem could be?


[monodon Fungal_pipits_HPC]$ pipits_funits -i pipits_prep/prepped.fasta -o pipits_funits -t 40 -x ITS1
2018-01-16 15:53:18 INFO: PIPITS_FUNITS started
2018-01-16 15:53:18 INFO: Checking input FASTA for illegal characters
2018-01-16 15:53:25 INFO: Counting input sequences
2018-01-16 15:53:38 INFO:       Number of input sequences: 6809988
2018-01-16 15:53:38 INFO: Dereplicating sequences for efficiency
2018-01-16 15:54:52 INFO: Extracting ITS1 from sequences [ITSx]
2018-01-16 15:54:52 ERROR: None zero returncode: ITSx -i pipits_funits/intermediate/derep.fasta -o pipits_funits/intermediate/derep --preserve T -t F --cpu 40 --save_regions ITS1


[monodon Fungal_pipits_HPC]$ ITSx -i pipits_funits/intermediate/derep.fasta -o pipits_funits/intermediate/derep --preserve T -t F --cpu 40 --save_regions ITS1
ITSx -- Identifies ITS sequences and extracts the ITS region
by Johan Bengtsson-Palme et al., University of Gothenburg
Version: 1.0.11
-----------------------------------------------------------------
Tue Jan 16 16:07:48 2018 : Preparing HMM database (should be quick)...
Tue Jan 16 16:07:48 2018 : Checking and handling input sequence data (should not take long)...
Tue Jan 16 16:08:26 2018 : Doing paralellised comparison to HMM database (this may take a long while)...

Thanks for your help!

Kind regards,
Roger

Probable NumPy issue with pipits_process

Hi,

I am working my way through the instructions in your README with your test dataset. I've made it to the final step. I am getting stuck in pipits_process with the following output:

pipits_process -i out_funits/ITS.fasta -o out_process
pipits_process 2.2, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------

2018-10-11 11:25:31 pipits_process started
2018-10-11 11:25:31 Generating a sample list from the input sequences
2018-10-11 11:25:32 Dereplicating and removing unique sequences prior to picking OTUs
2018-10-11 11:25:32 Picking OTUs [VSEARCH]
2018-10-11 11:25:32 Removing chimeras [VSEARCH]
2018-10-11 11:25:37 Renaming OTUs
2018-10-11 11:25:37 Mapping reads onto centroids [VSEARCH]
2018-10-11 11:25:37 Making OTU table
2018-10-11 11:25:37 Converting classic tabular OTU into a BIOM format [BIOM]
2018-10-11 11:25:37 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json

When I run the problem command directly from the command line I get this:

biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Traceback (most recent call last):
  File "/Fingerlin/home/russellp/.conda/envs/pipits_env/bin/biom", line 7, in <module>
    from biom.cli import cli
  File "/Fingerlin/home/russellp/.conda/envs/pipits_env/lib/python3.6/site-packages/biom/__init__.py", line 51, in <module>
    from .table import Table
  File "/Fingerlin/home/russellp/.conda/envs/pipits_env/lib/python3.6/site-packages/biom/table.py", line 176, in <module>
    import numpy as np
  File "/usr/lib/python3.5/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/usr/lib/python3.5/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/usr/lib/python3.5/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/usr/lib/python3.5/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/usr/lib/python3.5/site-packages/numpy/core/__init__.py", line 14, in <module>
    from . import multiarray
ImportError: cannot import name 'multiarray'

Some Googling suggests that this issue can be solved by deleting and reinstalling NumPy. However, I have not been able to change the version or reinstall NumPy due to the dependencies of PIPITS. Can you suggest a solution for this issue?

Thanks!
Pam
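
In the traceback above, `biom` is loaded from the conda environment (python3.6) while `numpy` is imported from `/usr/lib/python3.5/site-packages`. That suggests, though it does not prove, that a path outside the environment (for example one set through `PYTHONPATH`) is shadowing the environment's own NumPy. A small Python sketch to list such paths when run inside the activated environment; `outside_prefix` is a hypothetical helper, not part of PIPITS:

```python
import os
import sys


def outside_prefix(paths, prefix):
    """Return the entries of `paths` that do not live under `prefix`."""
    prefix = os.path.abspath(prefix)
    foreign = []
    for p in paths:
        if not p:  # an empty entry means the current directory
            continue
        full = os.path.abspath(p)
        if os.path.commonpath([full, prefix]) != prefix:
            foreign.append(p)
    return foreign


# Anything printed here can take precedence over packages installed
# in the environment itself.
print("PYTHONPATH =", os.environ.get("PYTHONPATH"))
for entry in outside_prefix(sys.path, sys.prefix):
    print("outside environment:", entry)
```

If a system-wide site-packages directory shows up, unsetting `PYTHONPATH` before activating the environment is one thing to try.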

[SSL: CERTIFICATE_VERIFY_FAILED]

Hi,

I am running pipits on our cluster server with the command:
"pipits_process -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6 -v -r"

However, I received the following errors:
pipits_process -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6 -v -r
pipits_process 2.8, the PIPITS Project
https://github.com/hsgweon/pipits

2022-09-01 13:29:16 pipits_process started
2022-09-01 13:29:16 Generating a sample list from the input sequences
2022-09-01 13:29:35 Downloading UNITE trained database, version: 10.05.2021
Traceback (most recent call last):
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 345, in _make_request
self._validate_conn(conn)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 844, in _validate_conn
conn.connect()
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 326, in connect
ssl_context=context)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 817, in __init__
self.do_handshake()
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 1077, in do_handshake
self._sslobj.do_handshake()
File "/home/nguyenl15/.conda/envs/pipits_env/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/adapters.py", line 438, in send
timeout=timeout
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 630, in urlopen
raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_process", line 344, in <module>
verbose = options.verbose)
File "/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_process", line 272, in downloadDB
request = requests.get(url, stream=True)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 661, in send
history = [resp for resp in gen] if allow_redirects else []
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 661, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 214, in resolve_redirects
**adapter_kwargs
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/hpc/apps/anaconda/anaconda3/lib/python3.4/site-packages/requests/adapters.py", line 512, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

Could you please suggest a solution?

Thank you,
Kind regards,
Le Phuong Nguyen.

Retrain UNITE fungal data.

Hi,
I'm cleaning up the UNITE files to include only a subset and would like to retrain the RDP Classifier on it. I've seen that you already provide retrained files, but I would like to retrain on my subset.

Is there a script available to do that?

update pipits via conda doesn't work properly

I have installed miniconda3 and installed pipits, but when I run conda update pipits or conda update -c bioconda pipits, it only updates to v2.1. Is there another way to obtain v2.2?

error in phylotyping with pipits 2.4

Dear hsgweon,

I recently ran pipits 2.4 on a job of 53 million reads on an SGE cluster (SUSE Linux).
pipits_process produced a broken phylotype table: abundances with decimals and, as a minor issue, the first column was named "OTU ID" and filled with taxonomy.
I reproduced the error by running the pipits_process step (pipits 2.4) on Ubuntu 18.04.1 (WSL version).
Running just pipits_process in a pipits 2.3 environment solved the problem for me.
Running the test data under 2.4 also produced the strange-looking phylotype table, but with whole numbers (perhaps because there is no difference between OTUs and phylotypes in the test data).
Therefore, I guess there might be an error hidden in the phylotyping step of pipits 2.4?

Cheers and many thanks for your regular updates,
Fabian

pipits_prep parses file names badly

Hi,
I had several 'water blanks' (negative controls) in my dataset. These were given the names "water_blank_1", "water_blank_2", "water_blank_3", etc. After the initial 'pipits prep' step, only a single 'water' sample remained. If I had to guess, your script is overwriting the samples after splitting on the delimiter "_". If so, the pipeline is broken for such names. It is not advisable to parse file names based on "_", as this is a very common character in file names. Please fix this; otherwise it will only cause further headaches.

Thanks!
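
The collapse described above is what would happen if the sample ID were taken as the text before the first underscore (assuming the underscore is the delimiter in question). A minimal Python sketch of that effect and of one alternative, stripping only a trailing read suffix. The file names and the `_R1`/`_R2` suffix pattern are assumptions for illustration, and this is not the PIPITS implementation:

```python
import re

# Hypothetical file names for two negative controls.
files = [
    "water_blank_1_R1.fastq", "water_blank_1_R2.fastq",
    "water_blank_2_R1.fastq", "water_blank_2_R2.fastq",
]

# Splitting on the first underscore maps every file to the same ID.
naive_ids = {name.split("_")[0] for name in files}

# Removing only the read suffix keeps the full sample name.
suffix = re.compile(r"_R[12]\.fastq$")
safe_ids = {suffix.sub("", name) for name in files}

print(sorted(naive_ids))
print(sorted(safe_ids))
```

Until the parsing changes, renaming such samples so that the part before the first underscore is unique (for example "waterblank1") would avoid the collision.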

training with a new dataset

Dear Pipits,

Thank you for the tool, I wish to try this with my data, which is not fungal. Therefore I would have to retrain rdp.

When I type:
$ pipits_retrain_rdp -h
usage: Retrains RDP Classifier [-h] -j -f -t -o DIR

optional arguments:
-h, --help show this help message and exit
-j [REQUIRED] RDP Classifier .jar file
-f [REQUIRED] UNITE training data - FASTA sequences downloaded
from http://sourceforge.net/projects/rdp-
classifier/files/RDP_Classifier_TrainingData
-t [REQUIRED] UNITE training data - taxonomy file downloaded
from http://sourceforge.net/projects/rdp-
classifier/files/RDP_Classifier_TrainingData
-o DIR Output directory where files and settings for retrained
parameters are stored.

All the above help-info is related to the trained project. Would you be able to add a help page on github regarding retraining and how to prepare the .xml file which has the taxonomy tree? Would this work if I gave it a fasta file of sequences and an xml formatted tree ... or is it more complicated than that?

Also, would that mean I would have to change the environment variable: export PIPITS_UNITE_RETRAINED_DIR=$HOME/pipits/refdb/UNITE_retrained
to point to my newly trained data? Or can this be passed as an argument when it is required?

regards,

Peter Thorpe

Question on runtime of pipits_funits step

Dear Hyun,

First, I apologize if this is not the right place to ask questions. I have used your test dataset and all steps worked fine for me.

When using my dataset, the pipits_funits step (hmmsearch) seems to be taking longer than expected. The progress output has been stopped at "2019-01-07 19:06:58 Extracting ITS1 from sequences [ITSx]" for about 4 days. According to the macOS Activity Monitor, the terminal is still processing the command. As the process seems to be still running, I checked the size of the output files. Under the "intermediate" folder, "derep.fasta" is at 945.3 MB, "derep.summary.txt" is at 162 bytes, and "derep.uc" is at 398.7 MB, and none of them has changed in size for a while.

Is this a simple computing speed issue? This computer has 8 GB of memory and pispino_seqprep output says I have 6441992 sequences prepped.

Thank you and please let me know if I should provide additional information.

PIPITS_process error linked to the UNITE_retrained_28.06.2017 reference

Hi Mr Gweon,

I'm currently trying to use PIPITS 1.5 on a cluster where PIPITS has been installed by a system administrator for all users.

Unfortunately, I'm facing an error with the last tool of the workflow: PIPITS_process.

The error is as follows:

Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /path/to/pipits/install/PIPITS-1.5.0/refdb/UNITE_retrained_28.06.2017/rRNAClassifier.properties (Permission denied)

After encountering this error I downloaded the UNITE_retrained_28.06.2017 archive myself (from the sourceforge repository) and I checked the default permissions of the files.

All the other files have -rw-r--r-- as permissions but the rRNAClassifier.properties has only -rw-------.

Maybe you could upload a new version of the archive with fixed default permissions for this rRNAClassifier.properties file, so that other people don't fall into the same trap as I did? :)

Thanks in advance

A.B
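
Until the archive is fixed, whoever owns the installation can make the unpacked database files readable for all users. A minimal Python sketch, assuming a POSIX file system; `make_world_readable` is a hypothetical helper, and the temporary directory below only stands in for the real `UNITE_retrained_28.06.2017` folder:

```python
import os
import stat
import tempfile


def make_world_readable(root):
    """Add group/other read permission to every regular file under `root`.

    Returns the paths whose mode was changed.
    """
    wanted = stat.S_IRGRP | stat.S_IROTH
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if mode & wanted != wanted:
                os.chmod(path, mode | wanted)
                changed.append(path)
    return changed


# Demo: reproduce the -rw------- file from the report, then fix it.
with tempfile.TemporaryDirectory() as demo:
    target = os.path.join(demo, "rRNAClassifier.properties")
    with open(target, "w") as handle:
        handle.write("demo\n")
    os.chmod(target, 0o600)
    make_world_readable(demo)
    print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

Only read bits are added; write and execute bits are left as they were.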

problems at level of test_data

Hi there, I have just installed PIPITS on my computer. I had all dependencies previously installed and re-tested as per the instructions on the PIPITS page. Now I am running the test data and I can't move forward from step 2.

test_data jcnavarro$ pipits_getreadpairslist -i rawdata -o readpairslist.txt
Generating a read-pair list file from the input directory...
Done. "readpairslist.txt" created.

test_data jcnavarro$ pipits_prep -i rawdata -o pipits_prep -l readpairslist.txt
2017-04-07 12:56:08 PIPITS_PREP_SINGLE started
2017-04-07 12:56:08 Processing the listfile
2017-04-07 12:56:08 Counting sequences in rawdata
2017-04-07 12:56:08 Number of reads: 75
2017-04-07 12:56:08 Reindexing forward reads
2017-04-07 12:56:08 Reindexing reverse reads
2017-04-07 12:56:08 Joining paired-end reads [PEAR]
Traceback (most recent call last):
File "/Users/jcnavarro/pipits/bin/pipits_prep", line 208, in <module>
verbose = options.verbose)
File "/Users/jcnavarro/pipits/bin/pipits_prep_lib.py", line 268, in join
numberofsequences += int(p.communicate()[0]) / 4
ValueError: invalid literal for int() with base 10: ''

Any thoughts?

Thanks!

Javier
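
The traceback above ends in `int(p.communicate()[0])` receiving an empty string, meaning the helper command printed nothing to standard output; which command that was cannot be seen from the traceback. A defensive Python sketch of the same idea, turning a line count into a read count and failing with a clearer message when the output is empty. `reads_from_line_count` and the sample input are hypothetical, not PIPITS code:

```python
def reads_from_line_count(raw_output):
    """Convert the text output of a line-counting command into a read count.

    FASTQ stores one read per four lines.
    """
    text = raw_output.decode() if isinstance(raw_output, bytes) else raw_output
    fields = text.split()
    if not fields:
        raise RuntimeError(
            "line-count command produced no output; "
            "check that the file it was asked to count exists"
        )
    return int(fields[0]) // 4


# Illustrative input in the style of `wc -l` output.
print(reads_from_line_count(b"300 joined.fastq\n"))
```

An empty result at this point usually means an earlier step did not write the file being counted, so checking the output of the joining step is a reasonable next move.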

Error: None zero returncode - biom.exception.TableException: Unsupported matrix data type.

Hi

I installed pipits on macOS Mojave as well as on a Linux cluster, but when testing on the included test files I run into a problem similar to the one reported in #20. Changing the locale to en_US.UTF-8 did not solve the problem.

2019-05-26 18:17:44 Converting classic tabular OTU into a BIOM format [BIOM]
Traceback (most recent call last):
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/bin/biom", line 11, in <module>
sys.exit(cli())
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/table_converter.py", line 129, in convert
table_type, process_obs_metadata, tsv_metadata_formatter)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/table_converter.py", line 207, in _convert
write_biom_table(result, fmt, output_filepath)
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/cli/util.py", line 26, in write_biom_table
f.write(table.to_json(biom.parse.generatedby()))
File "/home/ampere/ckeuschn/anaconda3/envs/pipits_env/lib/python3.6/site-packages/biom/table.py", line 4278, in to_json
raise TableException("Unsupported matrix data type.")
biom.exception.TableException: Unsupported matrix data type.
2019-05-26 18:18:01 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json

Any help is appreciated
Thanks

Pipits process: UCHIME database MD5 checksum

Error is as follows:

2019-05-14 09:31:04 Downloading UCHIME database for chimera filtering:
2019-05-14 09:31:04 ... DB directory and files exits, but seems to be different/corrupt -> re-downloading...
[###################################################################################################]100% | 8.6 MiB/s | 4578703 of 4578703 | Time: 0:00:00
File size: 4.37 MB
Downloaded data is corrupt. Get in touch with PIPITS team!. Exiting...

This only appears in the latest version, and I think it is due to the MD5 checksum not being correct for the UCHIME database it downloads. I have already deleted earlier versions of the database from the bin, and that did not solve the problem.

Any ideas how to solve this?

Thanks!
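
To check whether the downloaded archive itself is intact, its MD5 digest can be computed locally and compared with the value PIPITS expects. A minimal Python sketch using only the standard library; `md5_of_file` is a hypothetical helper, and the expected digest has to be looked up in the PIPITS source, since it is not shown in this report:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of a fresh download is stable across attempts but differs from the expected one, the file on the server has probably changed rather than the download being corrupted.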

problem with pipits_funits

Hi,

I have a problem running pipits_funits, although other data sets have been processed successfully. I have attached two files: "output.log" and "versions.log".
Do you have a solution for this?

Thank you for your help.
output.log
versions.log

ImportError: No module named pipits - Error: None zero returncode

Dear all,

I tested PIPITS (latest version) on the 'data_test' and everything has gone well (running on the frontend).

I submitted a job via qsub using my sequencing data, but the job was killed after the 'pipits_funits' step.

In detail, the .sh script is the following:

#!/bin/bash
conda init bash
source ~/.bashrc
conda activate pipits_env
pipits_funits -i prepped.fasta -o out_funits_ITS2 -x ITS2 -r -t 30   

I submitted it via qsub and the following is the content of 'output.log':

(base) [alabbate@ui03 out_funits_ITS2]$ cat output.log
pipits_funits 2.7, the PIPITS Project
https://github.com/hsgweon/pipits

2021-04-19 15:53:33 pipits_funits started
2021-04-19 15:53:33 Checking input FASTA for illegal characters
2021-04-19 15:53:45 ... done
2021-04-19 15:53:45 Counting input sequences
2021-04-19 15:53:54 ... number of input sequences: 4190427
2021-04-19 15:53:54 Dereplicating sequences for efficiency
vsearch v2.17.0_linux_x86_64, 251.7GB RAM, 40 cores
https://github.com/torognes/vsearch

Dereplicating file /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_seqprep_plate3/prepped.fasta 100%
1462933517 nt in 4190427 seqs, min 100, max 582, avg 349
Sorting 100%
2118105 unique sequences, avg cluster 2.0, median 1, max 38177
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
2021-04-19 15:54:30 ... done
2021-04-19 15:54:30 Counting dereplicated sequences
2021-04-19 15:54:35 ... number of dereplicated sequences: 2118105
2021-04-19 15:54:35 Splitting sequences to multiple parts
[INFO] split into 30 parts
[INFO] read sequences ...
[INFO] read 2118105 sequences
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_001.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_002.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_003.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_004.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_005.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_006.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_007.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_008.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_009.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_010.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_011.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_012.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_013.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_014.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_015.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_016.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_017.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_018.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_019.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_020.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_021.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_022.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_023.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_024.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_025.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_026.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_027.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_028.fasta
[INFO] write 70604 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_029.fasta
[INFO] write 70589 sequences to file: /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/split/derep.part_030.fasta
2021-04-19 15:54:57 ... done
2021-04-20 01:48:07 Counting ITS sequences (dereplicated)
2021-04-20 01:48:07 ... number of ITS sequences (dereplicated): 124443
2021-04-20 01:48:07 Sorting by ID
[INFO] read sequences ...
[INFO] 124443 sequences loaded
[INFO] sorting ...
[INFO] output ...
2021-04-20 01:48:10 ... done
2021-04-20 01:48:10 Removing short sequences below < 100bp
vsearch v2.17.0_linux_x86_64, 251.7GB RAM, 40 cores
https://github.com/torognes/vsearch

Reading input file 100%
124309 sequences kept (of which 0 truncated), 134 sequences discarded.
2021-04-20 01:48:10 ... done
2021-04-20 01:48:10 Counting length-filtered sequences (dereplicated)
2021-04-20 01:48:10 ... number of length-filtered sequences (dereplicated): 124309
2021-04-20 01:48:10 Re-inflating sequences
Traceback (most recent call last):
File "/lustrehome/alabbate/.conda/envs/pipits_env/bin/pipits_rereplicate", line 31, in
from pipits import pipits_SeqIO as SeqIO
ImportError: No module named pipits
2021-04-20 01:48:11 Error: None zero returncode: python /lustrehome/alabbate/.conda/envs/pipits_env/bin/pipits_rereplicate -i /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/derep.ITS2.sizefiltered.fasta -o /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/ITS.fasta --uc /lustre/home/alabbate/2018_hany_in_bari/analyses/pipits/plate3/out_funits_ITS2/intermediate/derep.uc

Do you have any suggestions to fix it?
Thank you in advance for your precious help.

Best,

AL

Delay in downloading UCHIME database during the pipits_process run

Hi @hsgweon

I am running PIPITS in the macOS terminal. The installation of pipits_env completed without issues and the test with your mock data worked fine. I am now running the pipits_process command and it seems to have stopped at the UCHIME database download step. Please see below.

$ pipits_process -i ./tagged-its2/ITS2.assembled_tag.fasta -o out_process --unite 02.02.2019
pipits_process 2.4, the PIPITS Project
https://github.com/hsgweon/pipits

2020-01-17 12:52:53 pipits_process started
2020-01-17 12:52:53 Generating a sample list from the input sequences
2020-01-17 12:52:54 Downloading UNITE trained database, version: 02.02.2019
[#################################################################################]100% | 6.1 MiB/s | 111590718 of 111590718 | Time: 0:00:17
File size: 106.42 MB
2020-01-17 12:53:19 ... Unpacking
2020-01-17 12:53:25 ... done
2020-01-17 12:53:25 Downloading WARCUP trained database:
[###################################################################################]100% | 13.9 MiB/s | 17880783 of 17880783 | Time: 0:00:01
File size: 17.05 MB
2020-01-17 12:53:27 ... Unpacking
2020-01-17 12:53:28 ... done
2020-01-17 12:53:28 Downloading UCHIME database for chimera filtering:

Does this step usually take a long time? It has been around 30 min without any change.
Thank you!

Locating the uchime Reference Dataset

When running the pipits_process command with the reference dataset, I keep getting the following error:

2018-09-10 19:25:06 pipits_process started
2018-09-10 19:25:06 Generating a sample list from the input sequences
2018-09-10 19:25:06 Dereplicating and removing unique sequences prior to picking OTUs
2018-09-10 19:25:06 Picking OTUs [VSEARCH]
2018-09-10 19:25:06 Removing chimeras [VSEARCH]
2018-09-10 19:25:06 Error: None zero returncode: vsearch --uchime_ref out_process/intermediate/input_nr_otus.fasta --db $PIPITS_UNITE_REFERENCE_DATA_CHIMERA --nonchimeras out_process/intermediate/input_nr_otus_nonchimeras.fasta --threads 1

I've put the following sets of environmental variables in my .bashrc file, and continue to get the same error:
export PIPITS_UNITE_RETRAINED_DIR=$HOME/pipits/refdb/UNITE_retrained
export PIPITS_UNITE_REFERENCE_DATA_CHIMERA=$HOME/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta
export PIPITS_WARCUP_RETRAINED_DIR=$HOME/pipits/refdb/warcup_retrained_V2

(From earlier question in thread about same issue):
export PIPITS_UNITE_RETRAINED_DIR=/home/lh001/pipits/refdb/UNITE_retrained
export PIPITS_UNITE_REFERENCE_DATA_CHIMERA=/home/lh001/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_referenc$
export PIPITS_WARCUP_RETRAINED_DIR=/home/lh001/pipits/refdb/warcup_retrained_V2

What can I do to resolve this issue?

Best,
Bryce Alex
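
One thing that may be worth checking here, judging by the "Unable to open file for reading ($HOME/...)" message reported in a related thread, is whether the variables are visible to the process at all and whether any of them holds a literal, unexpanded `$HOME`. A minimal Python sketch, assuming the three variable names from the post above; the helper name `check_reference_paths` is made up for illustration and is not part of PIPITS:

```python
import os

# Variable names taken from the post above.
PIPITS_VARS = (
    "PIPITS_UNITE_RETRAINED_DIR",
    "PIPITS_UNITE_REFERENCE_DATA_CHIMERA",
    "PIPITS_WARCUP_RETRAINED_DIR",
)


def check_reference_paths(env=None):
    """Return {variable: problem} for each variable that looks wrong.

    Hypothetical diagnostic helper; not part of PIPITS.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in PIPITS_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set in this shell"
        elif "$" in value:
            # A literal "$HOME" means the shell never expanded it.
            problems[name] = "contains an unexpanded variable: " + value
        elif not os.path.exists(value):
            problems[name] = "path does not exist: " + value
    return problems


if __name__ == "__main__":
    for name, problem in check_reference_paths().items():
        print(name + ": " + problem)
```

Running this from the same shell (with pipits_env activated) that launches pipits_process should show whether the exports in .bashrc were actually picked up. An empty result only means the paths exist, not that the files inside them are the ones PIPITS expects.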

Clustering at > 97%

Hi,
Thank you very much for the pipeline. It is very useful and convenient to work with a pipeline developed specifically for fungi. I have already tested it with satisfactory results.
I was wondering if you plan to incorporate some options for clustering with VSEARCH, especially an option to cluster at > 97%. This would be awesome for the pipeline.
Thanks a lot again.
Greetings,
Jaime

Empty read pair list

I just ran pispino_createreadpairslist -i rawdata/ -o paired_readlist.txt and the resulting paired_readlist.txt is empty. My FASTQ files are in the rawdata folder, so I don't know what could be happening.

ITSx and hmmscan speeds

Hi there, I have been trying out the PIPITS pipeline and had a few questions regarding processing speeds. As stated in the PIPITS paper, the pipits_funits step is the computational bottleneck in the pipeline. If I'm not mistaken, ITSx utilizes hmmscan. I did a little bit of research, and it seems that hmmscan is substantially slower than hmmsearch (see URL below). What are your thoughts on this? Do you think it would be possible or worthwhile to use hmmsearch to speed up this step? Apologies if I have overlooked an obvious reason not to do so.

https://cryptogenomicon.org/2011/05/27/hmmscan-vs-hmmsearch-speed-the-numerology/

Error: None zero returncode after Re-inflating sequences

I am running pipits_funits and after a while when Re-inflating sequences I get this error:

Error: None zero returncode: python /Users/miniconda3/envs/pipits_envs/bin/pipits_rereplicate -i out_funits/intermediate/derep.ITS1.sizefiltered.fasta -o out_funits/ITS.fasta --uc out_funits/intermediate/derep.uc

When I check my output folder, all I have is the intermediate folder. What could this be? I have also run export LC_ALL=en_us.UTF-8, since I saw this helped with a similar problem.

Issue with Pipits_ process

ERRO: environment variables (PIPITS_UNITE_REFERENCE_DATA_CHIMERA, PIPITS_UNITE_RETRAINED_DIR, PIPITS_WARCUP_RETRAINED_DIR) are not set. Please see PIPITS installation for an instruction on this.
(pipits_env)

I am getting the above error and am also unable to update to PIPITS 2.3.

pipits 2 on mac

Hi there, I have installed PIPITS 2 on a Mac following all the instructions on the site and tested it using the raw data provided, but I can't get past the pispino_seqprep step. See below.
Any suggestions?
Bests

JCNavarro-M47:test_pipits jcnavarro$ source activate pipits_env
discarding //anaconda/bin from PATH
prepending //anaconda/envs/pipits_env/bin to PATH
(pipits_env)JCNavarro-M47:test_pipits jcnavarro$ pispino_createreadpairslist -i rawdata -o readpairslist.txt
Generating a read-pair list file from the input directory...
Done - "readpairslist.txt" created
(pipits_env)JCNavarro-M47:test_pipits jcnavarro$ pispino_seqprep -i rawdata -o out_seqprep -l readpairslist.txt
2018-03-28 15:42:35 pispino_seqprep started
2018-03-28 15:42:35 Checking listfile
2018-03-28 15:42:35 ... done
2018-03-28 15:42:35 Counting sequences in rawdata
Traceback (most recent call last):
File "//anaconda/envs/pipits_env/bin/pispino_seqprep", line 203, in
verbose = options.verbose)
File "//anaconda/envs/pipits_env/lib/python2.7/site-packages/pispino/seqprep.py", line 43, in count_sequences
numberofsequences += int(getFileLineCount(input_dir + "/" + filename, extensionType) / 4)
File "//anaconda/envs/pipits_env/lib/python2.7/site-packages/pispino/seqtools.py", line 21, in getFileLineCount
f = bz2.open(filename, "r")
AttributeError: 'module' object has no attribute 'open'
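
For context on the traceback: `bz2.open` only exists in Python 3.3 and later, and this environment is running Python 2.7 (see the `python2.7/site-packages` path above). `bz2.BZ2File` is available on both. A minimal sketch of a line counter that would work either way; `count_lines_bz2` and `count_fastq_records_bz2` are hypothetical names, not the actual pispino functions:

```python
import bz2


def count_lines_bz2(filename):
    """Count lines in a bzip2-compressed file.

    Uses bz2.BZ2File, which exists on Python 2.7 and 3.x, rather than
    bz2.open, which was only added in Python 3.3.
    """
    count = 0
    handle = bz2.BZ2File(filename, "r")
    try:
        for _ in handle:
            count += 1
    finally:
        handle.close()
    return count


def count_fastq_records_bz2(filename):
    """Number of FASTQ records, assuming four lines per record."""
    return count_lines_bz2(filename) // 4
```

In practice, creating the conda environment with Python 3 is probably the simpler route, since the traceback suggests the code was written with Python 3 in mind.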

problem with pipits_funits (ITSx)

Hi,

I have a problem when using pipits_funits on my own data. It works perfectly fine on the test data provided. But when I try to run it on my data its output looks like this:

pipits_funits 2.1, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------

2018-06-15 13:56:32 pipits_funits started
2018-06-15 13:56:32 Checking input FASTA for illegal characters
2018-06-15 13:56:37 ... done
2018-06-15 13:56:37 Counting input sequences
2018-06-15 13:56:39 ... number of input sequences: 2402738
2018-06-15 13:56:39 Dereplicating sequences for efficiency
2018-06-15 13:57:09 Counting dereplicated sequences
2018-06-15 13:57:10 ... number of dereplicated sequences: 557710
2018-06-15 13:57:10 Extracting ITS2 from sequences [ITSx]
ITSx -- Identifies ITS sequences and extracts the ITS region
by Johan Bengtsson-Palme et al., University of Gothenburg
Version: 1.1b1
-----------------------------------------------------------------
Fri Jun 15 13:57:10 2018 : Preparing HMM database (should be quick)...
Fri Jun 15 13:57:10 2018 : Checking and handling input sequence data (should not take long)...
Fri Jun 15 13:57:20 2018 : Doing paralellised comparison to HMM database (this may take a long while)...
    Fri Jun 15 18:47:46 2018 : Fungi analysis of main strand finished.

There is no error; it just stops and I am left with the intermediate files. I've tried it a couple of times and the output always looks like that.

Do you know where the problem could be?

Not able to import phylotype table using phyloseq

Hi,

I have an issue importing the phylotype table using the phyloseq package. I tried phylotype table from different pipits runs. All show the same behavior (see below).

ps <- import_biom(BIOMfilename = "data/20056_ITS/phylotype_table.biom")
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The normal biom file (otu_table.biom) works fine.

Any suggestions to get rid of this error?

Thanks a lot!

Best,
Axel

error non zero returncode re-inflating sequences.

Dear all,

I also encountered the non-zero returncode error, but in a different part of the pipeline: while re-inflating sequences, after running the pipits_funits command. I have tried the suggestions made in earlier threads but still get the error. Any suggestions would be appreciated.

(pipits_env) henriks-mini:pipits_test henrik$ pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x ITS2
pipits_funits 2.4, the PIPITS Project
https://github.com/hsgweon/pipits

2019-11-28 10:16:41 pipits_funits started
2019-11-28 10:16:41 Checking input FASTA for illegal characters
2019-11-28 10:16:41 ... done
2019-11-28 10:16:41 Counting input sequences
2019-11-28 10:16:41 ... number of input sequences: 53
2019-11-28 10:16:41 Dereplicating sequences for efficiency
2019-11-28 10:16:41 ... done
2019-11-28 10:16:41 Counting dereplicated sequences
2019-11-28 10:16:41 ... number of dereplicated sequences: 20
2019-11-28 10:16:41 Extracting ITS2 from sequences [ITSx]
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Counting ITS sequences (dereplicated)
2019-11-28 10:16:44 ... number of ITS sequences (dereplicated): 19
2019-11-28 10:16:44 Sorting by ID
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Removing short sequences below < 100bp
2019-11-28 10:16:44 ... done
2019-11-28 10:16:44 Counting length-filtered sequences (dereplicated)
2019-11-28 10:16:44 ... number of length-filtered sequences (dereplicated): 19
2019-11-28 10:16:44 Re-inflating sequences
2019-11-28 10:16:44 Error: None zero returncode: python /Users/henrik/miniconda2/envs/pipits_env/bin/pipits_rereplicate -i out_funits/intermediate/derep.ITS2.sizefiltered.fasta -o out_funits/ITS.fasta --uc out_funits/intermediate/derep.uc
(pipits_env) henriks-mini:pipits_test henrik$

Stuck at the stage 'PIPITS_PROCESS'

Hi, Hyun,

I followed your instructions, but I can not move forward from the stage 'PIPITS_PROCESS'.

I am running a test with your test file, and my problem is indicated below.

  • From the output log -

pipits_process 2.2, the PIPITS Project
https://github.com/hsgweon/pipits

2018-09-10 17:27:16 pipits_process started
2018-09-10 17:27:16 Generating a sample list from the input sequences
2018-09-10 17:27:16 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch

Reading file out_funits/ITS.fasta 100%
8584 nt in 52 seqs, min 144, max 275, avg 165
Dereplicating 100%
Sorting 100%
15 unique sequences, avg cluster 3.5, median 2, max 12
Writing output file 100%
9 uniques written, 6 clusters discarded (40.0%)
2018-09-10 17:27:16 Picking OTUs [VSEARCH]
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch

Reading file out_process/intermediate/input_nr.fasta 100%
1447 nt in 9 seqs, min 147, max 187, avg 161
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 6 Size min 1, max 3, avg 1.5
Singletons: 4, 44.4% of seqs, 66.7% of clusters
2018-09-10 17:27:16 Removing chimeras [VSEARCH]
vsearch v2.8.0_linux_x86_64, 7.6GB RAM, 4 cores
https://github.com/torognes/vsearch

Unable to open file for reading ($HOME/pipits/refdb/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta)
2018-09-10 17:27:16 Error: None zero returncode: vsearch --uchime_ref out_process/intermediate/input_nr_otus.fasta --db $PIPITS_UNITE_REFERENCE_DATA_CHIMERA --nonchimeras out_process/intermediate/input_nr_otus_nonchimeras.fasta --threads 1

I am a beginner, so could you please explain the fix in detail so that I can follow it?

Many thanks in advance.

DongHyeon

Issue rereplicating

Thanks for a great pipeline. Very nice.
I hope I'm not missing something obvious about my environment here, but this ModuleNotFoundError seems odd. Any help is greatly appreciated. Version 2.2 was just installed and everything ran fine until pipits_funits. It looks like pipits can't find the pipits module? I'm running on a cluster with Anaconda3. Here are the last few lines from the log (my paths changed to $HOME):

Traceback (most recent call last):
File "$HOME/.conda/envs/pipits_env/bin/pipits_rereplicate", line 31, in
from pipits import pipits_SeqIO as SeqIO
ModuleNotFoundError: No module named 'pipits'
2018-08-02 17:08:34 Error: None zero returncode: python $HOME/.conda/envs/pipits_env/bin/pipits_rereplicate -i ITS2ExtractOut/intermediate/derep.ITS2.sizefiltered.fasta -o ITS2ExtractOut/ITS.fasta --uc ITS2ExtractOut/intermediate/derep.uc

Thanks!
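
The error line shows the script being launched as `python .../pipits_rereplicate`, so one possibility is that the `python` found first on `PATH` is not the interpreter that has the `pipits` package installed. That is an assumption, not a confirmed diagnosis. A small Python 3 sketch to see which interpreter is picked up and whether it can find the module (the function names are made up for this example):

```python
import importlib.util
import shutil
import sys


def module_available(name):
    """True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def describe_interpreter():
    """Collect the facts needed to compare interpreters."""
    return {
        # Interpreter executing this script.
        "running": sys.executable,
        # What a bare `python` resolves to on PATH (None if not found).
        "python_on_path": shutil.which("python"),
        "pipits_importable": module_available("pipits"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
```

Saving this to a file and running it with a bare `python` inside the activated environment mimics how the pipeline launches pipits_rereplicate. If `running` points outside the pipits_env directory, or `pipits_importable` is False, the environment setup is the likely place to look.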

Error while running pipits_process: "Error: None zero returncode: biom convert ..."

I think I installed pipits correctly. I downloaded the test data and the first few commands seemed to work fine, but then I got an error message when running pipits_process:

$ pipits_process -i out_funits/ITS.fasta -o out_process --Xmx 12G

pipits_process 2.2, the PIPITS Project
…
2018-09-11 12:18:37 Converting classic tabular OTU into a BIOM format [BIOM]
2018-09-11 12:18:37 Error: None zero returncode: biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json

Then I tried running that command directly and it looks like the dependency "Click" does not work when the locale for Python 3 is set to "ASCII":

biom convert -i out_process/intermediate/otu_table_prelim.txt -o out_process/intermediate/otu_table_prelim.biom --table-type="OTU table" --to-json
Traceback (most recent call last):
…
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment.  Consult http://click.pocoo.org/python3/ for mitigation steps.

Any ideas of how to work around this issue? I've installed everything via conda into the default conda environment.

I think my locale is okay:

env | grep UTF
LANG=en_US.UTF-8
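
`LANG` alone may not settle it, since `LC_ALL` and `LC_CTYPE` take precedence over it when they are set. A minimal sketch to see which encoding Python itself resolves; as far as I know Click's check compares the preferred encoding against ASCII, but treat that as an assumption. `is_ascii_encoding` is a name made up here:

```python
import codecs
import locale
import os


def is_ascii_encoding(encoding_name):
    """True if `encoding_name` is ASCII or one of its aliases."""
    try:
        return codecs.lookup(encoding_name).name == "ascii"
    except LookupError:
        # Unknown encoding names are treated as "not ASCII" here.
        return False


if __name__ == "__main__":
    preferred = locale.getpreferredencoding()
    print("preferred encoding:", preferred)
    print("looks like ASCII:", is_ascii_encoding(preferred))
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        print(var, "=", os.environ.get(var))
```

If the preferred encoding comes back as ASCII even though `LANG` is UTF-8, one of the other two variables is probably overriding it, or the locale named in `LANG` is not actually installed on the system.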

FUNITS -t option: does it specify CPUs or threads?

I am running the FUNITS step, and when I specify -t 30 I get an error saying that I have only 2 CPUs. My question: does -t specify the number of CPUs or the number of threads? I have a dual-CPU workstation with 20 cores and 40 threads. I got the same error with values of 20 and above, but it ran when I used 2.
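
On the CPU question: Python reports logical CPUs (hardware threads) through `os.cpu_count()`, so a machine with 40 hardware threads would normally show 40. If only 2 are reported, the session itself may be restricted, for example by a virtual machine, a container, or a scheduler allocation. Whether pipits_funits uses this exact call is an assumption on my part. A sketch for checking what the current session can see (both function names are hypothetical):

```python
import os


def available_cpus():
    """Logical CPUs this process may use (falls back to the machine total)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity limits, e.g. those set with taskset.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def clamp_threads(requested, available=None):
    """Limit a requested thread count to what is actually available."""
    if available is None:
        available = available_cpus()
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("logical CPUs visible to this session:", available_cpus())
```

If this prints 2 on the workstation, the limit comes from the environment the command runs in rather than from the `-t` option.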

pipits_prep command issue

Hi,
I am trying out your pipeline and have installed all dependencies, including ITSx. After installation, as described in the synopsis, I tried to run the test data with the command pipits_prep -i rawdata -o pipits_prep -l readpairslist.txt, but I got the output below and am unable to move on to the command pipits_funits -i pipits_prep/prepped.fasta -o pipits_funits -x ITS2

2017-10-31 03:26:13 PIPITS_PREP started
2017-10-31 03:26:13 Processing the listfile
2017-10-31 03:26:13 Counting sequences in rawdata
2017-10-31 03:26:13 Number of reads: 75
2017-10-31 03:26:13 Reindexing forward reads
2017-10-31 03:26:13 Reindexing reverse reads
2017-10-31 03:26:13 Joining paired-end reads [VSEARCH]
2017-10-31 03:26:13 ERROR: None zero returncode: vsearch --fastq_mergepairs pipits_prep/tmp/reindex_fastq_F/A01B.fastq --reverse pipits_prep/tmp/reindex_fastq_R/A01B.fastq --fastqout pipits_prep/tmp/joined/A01B.fastq --threads 1 --fastq_allowmergestagger --fastq_maxdiffs 500 --fastq_minovlen 20 --fastq_minmergelen 100

I am anxiously waiting for your quick and favorable response.
Abid

Issues accessing classifier.jar within PIPITS PROCESS

Hi @hsgweon, I hope all is well!

I'm planning on using the PIPITS pipeline for some MiSeq data analysis. I'm currently in the process of testing PIPITS installation (as advised in the guide) but I'm having issues accessing the rdp classifier to assign taxonomy for my final OTU table:

image

Classifier.jar is present in the relevant directory, and file permissions seem okay... any ideas?

image

Sorry if I'm missing something drastic! I'm new to Linux (Ubuntu) and this pipeline, with all my previous work focused on 16S rRNA pipelines in R.

Best wishes
Ryan

pipits_funits "you have 0 sequences" error

Hello,

I am trying to run fungal sequences with PIPITS but I'm running into issues with the pipits_funits step. I saw that a previous user also had this issue but they did not share their solution. Here is my script and the resulting error message:

(pipits_env) qiime2@qiime2core2018-4:/media/sf_Shared_Folder/Arctic_Bioremediation_tFINAL/ITS_30Aug2018_undetermined$ pipits_funits -i out_seqprep/prepped.0.fasta -o outfunits_0 -x ITS2 -t 8

pipits_funits 2.2, the PIPITS Project
https://github.com/hsgweon/pipits

2018-12-17 13:56:15 pipits_funits started
2018-12-17 13:56:15 Checking input FASTA for illegal characters
2018-12-17 13:56:20 ... done
2018-12-17 13:56:20 Counting input sequences
2018-12-17 13:56:22 ... number of input sequences: 1312800
2018-12-17 13:56:22 Dereplicating sequences for efficiency
2018-12-17 13:56:35 ... done
2018-12-17 13:56:35 Counting dereplicated sequences
2018-12-17 13:56:35 ... number of dereplicated sequences: 154140
2018-12-17 13:56:35 Extracting ITS2 from sequences [ITSx]
2018-12-17 16:43:56 ... done
2018-12-17 16:43:56 Counting ITS sequences (dereplicated)
2018-12-17 16:43:56 ERROR: You have 0 sequences!

I have also included the summary and output files:
output.log
summary.log

To try to fix this I have tried installing vsearch 2.8.0, splitting the data into smaller groups, and using different numbers of threads. Do you have any suggestions?

Thank you so much!
Shelby

Unable to run pipits

I am running the command

pipits_funits -i combined_seqs.fasta -o pipits_funits -x ITS2

Here are my headers:

>ITS.Rev.B.1.S138_0 M02849:171:000000000-ANHND:1:1101:20355:1874_1:N:0:138
GGTACTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTTCGGAAGGATCATTAAATAATTTTTAATTTTTATTCTTCGCGTTATATTCTTAATATATTTTACTGTGAACTGTATTATTTCATTACGCTTGATTAATCCTTCTGCTTTACCATAATGGACAGTTCATCGAAGATGTTAACCGAGTCGTGGTCAAGCTTATCCTTGGTGTCCTTAATTATTATTCTCCAAAAGAATTCATTTTAAAAATATTTTAATATGGGCTTAAAAAACTCATTAAAACAACTTTTAAC

>ITS.Rev.B.1.S138_1 M02849:171:000000000-ANHND:1:1101:22962:1969_1:N:0:138
GGTACTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGTAAGGATCATTACCGAGTGCGGGCCCTCTGGGTCCAACCTCCCATCCGTGTCTATCTGTACCCTGTTGCTTCGGCGTTTCCTCGGCCCGCCGCAGACTAACATTTTAACACTGTCTGAAGTTTGCAGTCTGAGTTTTTAGTTAAACAATAATTAAAACTTTCAACAACTTATCTCTTGGTTCCGTCATCGATGAAGAACGCAGCGAAATGCGATAATTAATTTGAATTTCAGAATTCAGTGAATCTTCG

Error can not find pipits_uc2otutable in line 35, in <module>

Hi
I am using pipits_process with the command:

pipits_process -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6

However, I received the error below:

pipits_process 2.8, the PIPITS Project
https://github.com/hsgweon/pipits

2022-09-01 18:45:46 pipits_process started
2022-09-01 18:45:46 Generating a sample list from the input sequences
2022-09-01 18:46:04 Downloading UNITE trained database, version: 10.05.2021
2022-09-01 18:46:05 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:05 ... Unpacking
2022-09-01 18:46:10 ... done
2022-09-01 18:46:10 Downloading WARCUP trained database:
2022-09-01 18:46:10 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:10 ... Unpacking
2022-09-01 18:46:11 ... done
2022-09-01 18:46:11 Downloading UCHIME database for chimera filtering:
2022-09-01 18:46:11 ... DB directory and files exits, and all looking good. No need to download.
2022-09-01 18:46:11 ... Unpacking
2022-09-01 18:46:11 ... done
2022-09-01 18:46:11 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch

Dereplicating file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_funits6/ITS.fasta 100%
1618981758 nt in 8399800 seqs, min 100, max 462, avg 193
Sorting 100%
429233 unique sequences, avg cluster 19.6, median 1, max 732858
Writing output file 100%
68158 uniques written, 361075 clusters discarded (84.1%)
2022-09-01 18:46:21 Picking OTUs [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch

Reading file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/input_nr.fasta 100%
13560288 nt in 68158 seqs, min 102, max 372, avg 199
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 1319 Size min 1, max 6769, avg 51.7
Singletons: 458, 0.7% of seqs, 34.7% of clusters
2022-09-01 18:46:33 Removing chimeras [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch

Reading file pipits_db/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta 100%
16786547 nt in 30555 seqs, min 146, max 2570, avg 549
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Detecting chimeras 100%
Found 519 (39.3%) chimeras, 792 (60.0%) non-chimeras,
and 8 (0.6%) borderline sequences in 1319 unique sequences.
Taking abundance information into account, this corresponds to
12389 (2.1%) chimeras, 584070 (97.8%) non-chimeras,
and 453 (0.1%) borderline sequences in 596912 total sequences.
2022-09-01 18:46:37 Renaming OTUs
2022-09-01 18:46:37 Mapping reads onto centroids [VSEARCH]
vsearch v2.18.0_linux_x86_64, 7.6GB RAM, 2 cores
https://github.com/torognes/vsearch

Reading file /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/input_nr_otus_nonchimeras_relabelled.fasta 100%
147858 nt in 792 seqs, min 102, max 372, avg 187
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 8307353 of 8399800 (98.90%)
2022-09-01 19:04:23 Making OTU table
Traceback (most recent call last):
File "/hpc/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_uc2otutable", line 35, in
infile = open(options.infile, "r")
FileNotFoundError: [Errno 2] No such file or directory: '/hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otus.uc'
2022-09-01 19:04:23 Error: None zero returncode: python /hpc/home/nguyenl15/.conda/envs/pipits_env/bin/pipits_uc2otutable -i /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otus.uc -o /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/intermediate/otu_table_prelim.txt -l /hpc/home/nguyenl15/Phuong_ITS/ITS_FCG_out_process6/sampleIDs.txt

Could you give some advice?

Thank you,
Kind regards,
Le Phuong Nguyen

"You Have 0 Sequences" Error When Running pipits_funits

I've been receiving the following error while running pipits_funits on the test data set:

2018-09-18 09:13:01 pipits_funits started
2018-09-18 09:13:01 Checking input FASTA for illegal characters
2018-09-18 09:13:01 ... done
2018-09-18 09:13:01 Counting input sequences
2018-09-18 09:13:01 ... number of input sequences: 7394
2018-09-18 09:13:01 Dereplicating sequences for efficiency
2018-09-18 09:13:01 ... done
2018-09-18 09:13:01 Counting dereplicated sequences
2018-09-18 09:13:01 ... number of dereplicated sequences: 3488
2018-09-18 09:13:01 Extracting ITS2 from sequences [ITSx]
2018-09-18 09:40:19 ... done
2018-09-18 09:40:19 Counting ITS sequences (dereplicated)
2018-09-18 09:40:19 ERROR: You have 0 sequences!

Could you help me with this?

Error: None zero returncode: conda list

My colleagues faced the following issue with a fresh Pipits installation from conda and asked me to investigate.

(pipits) $ pipits_funits -i test/out_seqprep/prepped.fasta -o test/out_funits -x ITS2                             
pipits_funits 2.2, the PIPITS Project
https://github.com/hsgweon/pipits
---------------------------------

2018-12-30 23:54:23 Error: None zero returncode: conda list

I figured this might be a version incompatibility issue, so I downgraded both pipits and conda:

(pipits) $ conda list
# packages in environment at /home/anaconda/conda/envs/pipits:
#
# Name                    Version                   Build  Channel
biom-format               2.1.6                    py36_1    bioconda
blas                      1.0                         mkl  
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.11.29           ha4d7672_0    conda-forge
certifi                   2018.11.29            py36_1000    conda-forge
click                     7.0                        py_0    conda-forge
cython                    0.29.2           py36hfc679d8_0    conda-forge
fastx_toolkit             0.0.14                        0    bioconda
future                    0.17.1                py36_1000    conda-forge
h5py                      2.9.0            py36he5c79e1_0    conda-forge
hdf5                      1.10.4          nompi_h5598ddc_1105    conda-forge
hmmer                     3.2.1                hfc679d8_0    bioconda
intel-openmp              2019.1                      144  
itsx                      1.1b                          1    bioconda
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libgfortran               3.0.0                         1    conda-forge
libgfortran-ng            7.2.0                hdf63c60_3    conda-forge
libgtextutils             0.7                  h470a237_4    bioconda
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
mkl                       2018.0.3                      1  
mkl_fft                   1.0.10                   py36_0    conda-forge
mkl_random                1.0.2                    py36_0    conda-forge
ncurses                   6.1                  hfc679d8_2    conda-forge
nose                      1.3.7                 py36_1002    conda-forge
numpy                     1.15.0           py36h1b885b7_0  
numpy-base                1.15.0           py36h3dfced4_0  
openjdk                   11.0.1              h470a237_14    conda-forge
openssl                   1.0.2p               h470a237_1    conda-forge
pandas                    0.23.4           py36hf8a1672_0    conda-forge
perl                      5.26.2               h470a237_0    conda-forge
pip                       18.1                  py36_1000    conda-forge
pipits                    2.1                        py_5    bioconda
pispino                   1.1                        py_1    bioconda
python                    3.6.7                h5001a0f_1    conda-forge
python-dateutil           2.7.5                      py_0    conda-forge
pytz                      2018.7                     py_0    conda-forge
rdptools                  2.0.2                         1    bioconda
readline                  7.0                  haf1bffa_1    conda-forge
scipy                     1.1.0            py36hc49cb51_0  
setuptools                40.6.3                   py36_0    conda-forge
six                       1.12.0                py36_1000    conda-forge
sqlite                    3.26.0               hb1c47c0_0    conda-forge
tk                        8.6.9                ha92aebf_0    conda-forge
vsearch                   2.10.3               h96824bc_0    bioconda
wheel                     0.32.3                   py36_0    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

Alas, the issue persisted. I debugged the code and introduced the following modifications to make it run:

  1. In /home/anaconda/conda/envs/pipits/lib/python3.6/site-packages/pispino/runcmd.py
def run_cmd(command, log_file, verbose):
    
    FNULL = open(os.devnull, 'w')

    p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

    for l in p.stdout:
        if verbose:
            logger(str(l, 'utf-8').rstrip(), log_file, display = True, timestamp = False)
        else:
            logger(str(l, 'utf-8').rstrip(), log_file, display = False, timestamp = False)

    p.wait()
    FNULL.close()

    # MODIFICATION: raise an error instead of calling `exit(1)`
    if p.returncode != 0:
        raise RuntimeError("Error: None zero returncode: " + command)
  2. In /home/anaconda/conda/envs/pipits/bin/pipits_funits
    # Log versions
    # MODIFICATION: add exception handling; add `import logging` to the header
    try:
        cmd = " ".join(["conda list"])
        run_cmd(cmd, version_file, False)
    except RuntimeError:
        logging.exception('Hotfixed conda list call failure')

This is clearly a hotfix, yet I think it would be more convenient in the long run to introduce exception handling into your sources instead of exiting abruptly (as is the case in the original run_cmd function).
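
A minimal standalone sketch of the exception-based approach suggested above; it is not a patch to pispino. `CommandError` and `log_versions` are hypothetical names introduced for illustration. The simplified `run_cmd` returns the output lines instead of writing them to a log file.

```python
import logging
import subprocess


class CommandError(RuntimeError):
    """Hypothetical exception type; not part of pispino or PIPITS."""

    def __init__(self, command, returncode):
        super().__init__(f"Non-zero return code {returncode}: {command}")
        self.command = command
        self.returncode = returncode


def run_cmd(command, verbose=False):
    """Run a shell command and return its output lines.

    Raises CommandError instead of exiting, so the caller decides
    whether a failure is fatal.
    """
    p = subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    lines = []
    for raw in p.stdout:
        line = raw.decode("utf-8", errors="replace").rstrip()
        lines.append(line)
        if verbose:
            print(line)
    p.wait()
    if p.returncode != 0:
        raise CommandError(command, p.returncode)
    return lines


def log_versions():
    """Optional step: a failure is logged, but the pipeline continues."""
    try:
        return run_cmd("conda list")
    except CommandError:
        logging.exception("Could not record package versions; continuing")
        return []
```

With this shape, essential steps (for example the VSEARCH call) can let `CommandError` propagate and stop the run, while optional steps such as version logging catch it and carry on.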

More counts than read pairs

Hi @hsgweon,

I recently started to use PIPITS to analyse ITS1 data and I came across an issue. Maybe it isn't really an issue and I just lack some understanding of the algorithm behind PIPITS.

I have a couple of samples where the BIOM file (and the OTU table as well) actually shows more counts than there are raw read pairs.
For instance, I have a sample with 31,371 raw read pairs. If I just sum over all mapped contigs in the BIOM file (R command: apply(phyloseq::otu_table(x), 2, sum)), I get 124,475 features/contigs mapping to this particular sample.

Do you have any explanation for this behaviour?

I'm running PIPITS2, installed following the instructions on the GitHub page. I tested PIPITS on a Linux platform and on macOS 10.13.5, with the same results.
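
The check described above (per-sample column sums compared with raw read-pair counts) can also be sketched in Python. The sketch assumes a tab-separated OTU table with OTUs as rows and samples as columns. That layout, the function names, and the toy numbers are all illustrative assumptions and are not taken from PIPITS output.

```python
import csv
import io


def column_sums(otu_table_text):
    """Sum counts per sample from a tab-separated OTU table (OTUs x samples)."""
    reader = csv.reader(io.StringIO(otu_table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the OTU identifier
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            totals[sample] += int(float(value))
    return totals


def oversubscribed(totals, raw_pairs):
    """Return samples whose summed counts exceed their raw read-pair count."""
    return {
        sample: (count, raw_pairs[sample])
        for sample, count in totals.items()
        if sample in raw_pairs and count > raw_pairs[sample]
    }


# Toy data, made up for illustration only.
TOY_TABLE = "OTU_ID\tsampleA\tsampleB\nOTU1\t10\t3\nOTU2\t5\t4\n"
TOY_RAW_PAIRS = {"sampleA": 12, "sampleB": 20}

print(oversubscribed(column_sums(TOY_TABLE), TOY_RAW_PAIRS))
# prints {'sampleA': (15, 12)}
```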

Best,
Axel

major problem with pispino_seqprep and VSEARCH

First of all I would like to thank you for the pipeline.

Lately, when running pispino_seqprep with default settings, only 5 of about 1,000,000 sequences came out of the process, and no sequences survived the next step. I replaced the default joining method with PEAR (it comes with Microbiome Helper), but I'm not sure I'm getting good results. The final taxa do not seem to be correct and there is huge variance between samples (maybe this last part is my fault in some way).
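
One way to see where the reads disappear is to count FASTQ records in the intermediate files after each step. The sketch below assumes plain four-line FASTQ records (optionally gzipped). The function names and the step labels are hypothetical; the two counts in the usage line are the approximate figures quoted above.

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def survival_report(counts):
    """Given ordered (step_name, n_records) pairs, add the fraction of the
    first step's records that remain at each step."""
    first = counts[0][1]
    report = []
    for name, n in counts:
        fraction = n / first if first else 0.0
        report.append((name, n, fraction))
    return report


# Step names are illustrative; counts are the approximate figures from the post.
for name, n, fraction in survival_report([("raw", 1_000_000), ("joined", 5)]):
    print(f"{name}: {n} records ({fraction:.4%} of input)")
```

Running `count_fastq_records` on each intermediate file and feeding the results to `survival_report` shows at which step the sharp drop occurs.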

Hope this comment helps
