
TransPi's Introduction

TransPi - TRanscriptome ANalysiS PIpeline

 _______                                 _____   _
|__   __|                               |  __ \ (_)
   | |     _ __    __ _   _ __    ___   | |__) | _
   | |    |  __|  / _  | |  _ \  / __|  |  ___/ | |
   | |    | |    | (_| | | | | | \__ \  | |     | |
   |_|    |_|     \__,_| |_| |_| |___/  |_|     |_|

Badges: Preprint · Chat on Gitter · run with conda · run with docker · run with singularity · release

General info

TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

TransPi is based on the scientific workflow manager Nextflow. It is designed to help researchers obtain the best reference transcriptome assembly for their organisms of interest. It performs multiple assemblies with different parameters and then derives a non-redundant consensus assembly. It also performs other valuable analyses, such as assembly quality assessment, BUSCO scoring, ORF prediction (TransDecoder), and gene ontology annotation (Trinotate). All of this requires minimal input from the user without losing the potential of a comprehensive analysis.
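For orientation, a typical invocation looks like this (a minimal sketch adapted from runs shown in the issues below; the reads glob, k-mer list, and read length are placeholders for your data):

nextflow run TransPi.nf --all --reads './reads/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 150 -profile conda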

Pipeline processes

TransPi flowchart

Figure 1. TransPi v1.0.0 flowchart showing the various steps and analyses it can perform. For simplicity, this diagram does not show all the connections between the processes. It also omits additional options like the BUSCO distribution and transcriptome filtering with psytrans (see Section 2.6). ORFs = open reading frames; HTML = Hypertext Markup Language.

Manual

TransPi documentation and examples can be found here

Publication

The TransPi preprint, including tests of k-mer sizes, read lengths, and read quantities, can be found here. We also tested the pipeline with over 45 samples from different phyla.

TransPi has been peer-reviewed and recommended by Peer Community In Genomics (https://doi.org/10.24072/pci.genomics.100009)

Citation

If you use TransPi please cite the peer-reviewed publication:

Rivera-Vicéns, R.E., García-Escudero, C.A., Conci, N., Eitel, M., and Wörheide, G. (2021). TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly. bioRxiv 2021.02.18.431773; doi: https://doi.org/10.1101/2021.02.18.431773

Funding

  • European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 764840 (ITN IGNITE).

  • Advanced Human Capital Program of the National Commission for Scientific and Technological Research (CONICYT)

  • Lehre@LMU (project number: W19 F1; Studi forscht@GEO)

  • LMU Munich’s Institutional Strategy LMUexcellent within the framework of the German Excellence Initiative

Future work

  • Cloud deployment of the tool

Issues

We tested TransPi using Conda, Singularity, and Docker. However, if you find a problem or get an error, please let us know by opening an issue.

Chat

If you have further questions and need help with TransPi, you can chat with us in the TransPi Gitter chat.

TransPi's People

Contributors

cagescudero · michieitel · rivera10


TransPi's Issues

Unexpected run time, Trinity/salmon

Hi,

I've noticed that for some datasets TransPi (conda-controlled) seems to get stuck in the Trinity assembly phase; basically the pipeline stays in the salmon index phase. When I run the Trinity step independently of the larger TransPi Nextflow recipe (pulling the specific Trinity installation from conda that matches the one created as part of the pipeline), it completes successfully and relatively quickly compared to how long I've been waiting for the TransPi pipeline to finish. Do you have any ideas about what might be going wrong?

precheck not installing software

Hi, thanks for developing this nice pipeline; it looks really promising. I am trying to follow the manual, but the precheck_TransPi.sh script does not seem to install any software. It downloads all of the DBs, but when I proceed to run the full TransPi analysis, it stops almost immediately because it cannot find fastp. In fact, I don't seem to find any of the dependencies installed on my system. It is my understanding that the precheck_TransPi script should have handled that. Am I missing something obvious? Sorry if this has been addressed elsewhere.

Mulled container libbz2.so.1 shared libraries

Problem:
Container for evigene produces the following error

/usr/local/bin/blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory

Possible solution:
Add bzip2 to the container recipe
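A hedged sketch of that fix, assuming a conda-based container environment (bzip2 is the standard conda-forge package that ships libbz2.so.1):

# inside the environment/recipe used by the evigene process
conda install -y -c conda-forge bzip2
# verify that blastn can now resolve the library
ldd "$(which blastn)" | grep libbz2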

Signal P Compatibility Error?

Hello,

I am trying to install and use signalP. Which version of signalP works with TransPi?

I tried installing v6 and v5. I get similar errors as below, so we changed the nextflow.config file from "-f short -n 4.signalp.out 4.combined.okay.fa.transdecoder.pep" to "-format short -stdout 2.signalp.out -fasta 2.combined.okay.fa.transdecoder.pep" and are still getting error messages. Is this because we are using an incompatible version, or is there a different error you could help us fix? Thank you for your help.

Error executing process > 'signalP_trinotate (4)'

Caused by:
Process signalP_trinotate (4) terminated with an error exit status (2)

Command executed:

#signalP to predict signal peptides

echo -e "\n-- Starting with SignalP --\n"

/usr/local/bin/signalp -f short -n 4.signalp.out 4.combined.okay.fa.transdecoder.pep

echo -e "\n-- Done with SignalP --\n"

Command exit status:
2

Command output:

-- Starting with SignalP --

Command error:
flag provided but not defined: -f
Usage of /usr/local/bin/signalp:
-batch int
Number of sequences that the tool will run simultaneously. Decrease or increase size depending on your system memory. (default 10000)
-fasta string
Input file in fasta format.
-format string
Output format. 'long' for generating the predictions with plots, 'short' for the predictions without plots. (default "short")
-gff3
Make gff3 file of processed sequences.
-mature
Make fasta file with mature sequence.
-org string
Organism. Archaea: 'arch', Gram-positive: 'gram+', Gram-negative: 'gram-' or Eukarya: 'euk' (default "euk")
-plot string
Plots output format. When long output selected, choose between 'png', 'eps' or 'none' to get just a tabular file. (default "png")
-prefix string
Output files prefix. (default "Input file prefix")
-stdout
Write the prediction summary to the STDOUT.
-tmp string
Specify temporary file directory. (default "System default tmpdir")
-verbose
Verbose output. Specify '-verbose=false' to avoid printing. (default true)
-version
Prints version.

Work dir:
/data/gn2311/transpi/TransPi/work/a2/ee16086eae55047a892f95dc3fb6cd

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
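For reference, a minimal invocation matching the usage text above (a sketch for SignalP 5.x; the -prefix value is illustrative):

signalp -fasta 4.combined.okay.fa.transdecoder.pep -format short -org euk -prefix 4.signalp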

TransPi Single End Reads?

Hi, I am currently working on a project that involves de novo transcriptome assembly and analysis. However I only have single end reads. Is TransPi able to process single end read fastq files? Thank you for your help.

Java issue

Hi all,

Excited to get this pipeline started. The installation via Conda and the precheck worked great. I'm currently trying to run some data through the pipeline, but there seems to be an issue when pointing to my java installation.

By running:

./nextflow run TransPi.nf --all --reads '~Desktop/RNAseq/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 80 -profile conda

I made sure I have java set in my path:
export JAVA_HOME=$(/usr/libexec/java_home)

Running the transpi command, I get the following error:

ERROR: Cannot find Java or it's a wrong version -- please make sure that Java 8 or later is installed
NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
JAVA_CMD: /Library/Java/JavaVirtualMachines/jdk-16.0.2.jdk/Contents/Home/bin/java
JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk-16.0.2.jdk/Contents/Home

I tried a few troubleshooting items online but no success. Any thoughts? Thanks for your help.
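One hedged workaround sketch (the JDK path is an assumption; Java 16 may simply be too new for the Nextflow release in use, which typically required Java 8 to 11 in that era): point the variables Nextflow reports at a compatible JDK before launching.

export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home   # hypothetical JDK 11 install path
export JAVA_CMD="$JAVA_HOME/bin/java"
./nextflow run TransPi.nf --all --reads './RNAseq/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 80 -profile conda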

Unknown error

It is a great tool and I would love to use it for my RNA seq.
However, when I download the databases and run it, it quits with the following error immediately after starting.

$ nextflow run TransPi.nf --onlyAsm --reads '/media/kazu/8TB2/TransPi/quality_trimmed_R[1,2].fastq.gz' --k 41,53 --maxReadLen 84 -profile conda
N E X T F L O W ~ version 20.07.1
Launching TransPi.nf [sharp_lorenz] - revision: 6acd9a7ecd
Script compilation error

  • file : /media/kazu/8TB2/TransPi/TransPi.nf
  • cause: expecting '}', found ',' @ line 2876, column 35.
    file(assembly), file(annotation) from

nextflow version is 20.07.1.5412

rnaquast conda error

Hi @rivera10 ,

when running the test sample, almost the whole pipeline worked for me! However, the very last evaluation step using rnaquast failed with the error message:

error executing process > 'rna_quast (Sponge_sample)'

Caused by:
Failed to create Conda environment
command: conda create --mkdir --yes --quiet --prefix /home/kathrin/postdoc_vienna/de_novo/test_transpi/work/conda/env-fa8b0a403402a49160ee9bce39b0a3c5 -c conda-forge bioconda::rnaquast=2.0.1=0
status : 143
message:

Might it be a compatibility/dependency issue? When I run the same conda create command by hand, I get this output:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working...

which runs forever (or at least until I stopped it after several hours).

I installed conda (anaconda3) using your precheck script.

All the best
Kathrin
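A commonly suggested workaround when conda's solver hangs is to swap in mamba, a drop-in replacement solver (this is not an official TransPi fix; mamba mirrors conda's command line):

conda install -n base -c conda-forge mamba
mamba create --yes --prefix /home/kathrin/postdoc_vienna/de_novo/test_transpi/work/conda/env-fa8b0a403402a49160ee9bce39b0a3c5 -c conda-forge bioconda::rnaquast=2.0.1=0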

Issue finding busco when running Transpi with slurm / sbatch

Hello!

I was able to successfully run TransPi when using slurm's srun (interactively), but I'm having trouble getting busco to run when submitting the job with sbatch.

The error:
.command.sh: line 4: busco: command not found

The command it is trying to run is:

#!/bin/bash -ue
echo -e "\n-- Starting BUSCO --\n"

busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi//DBs/busco_db/metazoa_odb10 -m tran -c 8 --offline

echo -e "\n-- DONE with BUSCO --\n"

cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv

Do you have any suggestions on variables to set in the slurm script so that it is able to find busco properly?

Thank you so much!
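One hedged possibility (an assumption, not a confirmed diagnosis): sbatch jobs do not source your login shell's init files, so the conda setup that made busco visible under srun may be missing. A sketch of a batch script that re-initialises the environment (paths and options are placeholders):

#!/bin/bash
#SBATCH --job-name=transpi
# hypothetical conda install path; adjust to your system
source ~/miniconda3/etc/profile.d/conda.sh
conda activate base
./nextflow run TransPi.nf --all --reads './reads/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 150 -profile conda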

Running onlyEvi

Hi,

I am trying to run onlyEvi on a single fasta file of a de novo transcriptome. I try the code ./nextflow run TransPi.nf --onlyEvi -profile conda

but received the following error:

N E X T F L O W  ~  version 20.10.0
Launching `TransPi.nf` [backstabbing_torvalds] - revision: 6acd9a7ecd
====================================================
  TransPi - Transcriptome Analysis Pipeline v1.1.0-rc
====================================================
TransPi.nf Directory:   /burg/home/gn2311/TransPi/TransPi.nf
Launch Directory:       /burg/home/gn2311/TransPi
Results Directory:      results
Work Directory:         /burg/home/gn2311/TransPi/work
TransPi DBs:            /burg/home/gn2311/TransPi
Busco DB:               /burg/home/gn2311/TransPi/DBs/busco_db/diptera_odb10

        Running TransPi with your dataset


        Running Evidential Gene analysis only

Local avail `memory` attribute cannot zero. Expression: (availMemory > 0). Values: availMemory = 0

 -- Check script 'TransPi.nf' at line: 1111 or see '.nextflow.log' file for more details

I checked line 1111 in TransPi.nf and I think one of the parameters is not set correctly. Please let me know if this issue can be resolved. Thank you for your help.

rnaspades step dies with "AttributeError: module 'collections' has no attribute 'Hashable'"

The rnaspades step is dying with AttributeError: module 'collections' has no attribute 'Hashable' with the transpi repo version pulled on the 27th December 2021 using the conda profile.

It appears to be the following issue: ablab/spades#863

It could potentially be solved until rnaspades is updated by using python < 3.10.

I was using Nextflow/21.04.3 and Miniconda3/4.9.2 (Python 3.8.5); however, there doesn't seem to be a way to make rnaspades use python < 3.10 through transpi (i.e. under system information in the error message below, Python version: 3.10.1 was the version being used with rnaspades).

Command:

nextflow run ../TransPi/TransPi.nf --all --reads '/nesi/nobackup/uoo00105/orthoskim_test/*_[1,2].fastq.gz' \
     --k 25,41,53 --maxReadLen 150 -profile conda -resume

Full error message:

Something went wrong. Check error message below and/or log files.
Error executing process > 'rna_spades_assembly (SRR6472974)'

Caused by:
  Process `rna_spades_assembly (SRR6472974)` terminated with an error exit status (1)

Command executed:

  echo -e "\n-- Starting rnaSPADES assemblies --\n"
  
  mem=$( echo 100 GB | cut -f 1 -d " " )
  
  for x in `echo 25,41,53 | tr "," " "`;do
      echo -e "\n-- rnaSPADES k${x} --\n"
      rnaspades.py -1 left-SRR6472974.norm.fq -2 right-SRR6472974.norm.fq -o SRR6472974_spades_${x} -t 36 -k ${x} -m ${mem}
  done
  
  echo -e "\n-- Finished with the assemblies --\n"
  
  for x in `echo 25,41,53 | tr "," " "`;do
      sed -i "s/>/>SPADES.k${x}./g" SRR6472974_spades_${x}/transcripts.fasta
  done
  
  cat SRR6472974_spades_*/transcripts.fasta >SRR6472974.SPADES.fa
  
  for x in `echo 25,41,53 | tr "," " "`;do
      cp SRR6472974_spades_${x}/transcripts.fasta SRR6472974.SPADES.k${x}.fa
  done
  
  rm -rf SRR6472974_spades_*
  
  v=$( rnaspades.py -v 2>&1 | awk '{print $4}' | tr -d "v" )
  echo "rna-SPADES: $v" >rna_spades.version.txt

Command exit status:
  1

Command output:
  
  -- Starting rnaSPADES assemblies --
  
  
  -- rnaSPADES k25 --
  
  Command line: /scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/bin/rnaspades.py -1 left-SRR6472974.norm.fq -2 right-SRR6472974.norm.fq -o SRR6472974_spades_25 -t 36 -k 25 -m 100
  
  System information:
    SPAdes version: 3.14.0
    Python version: 3.10.1
    OS: Linux-3.10.0-693.2.2.el7.x86_64-x86_64-with-glibc2.17
  
  Output dir: SRR6472974_spades_25
  Mode: ONLY assembling (without read error correction)
  Debug mode is turned OFF
  
  Dataset parameters:
    RNA-seq mode
    Reads:

Command error:
  Traceback (most recent call last):
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/bin/rnaspades.py", line 639, in <module>
      main(sys.argv)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/bin/rnaspades.py", line 579, in main
      print_params(log, log_filename, command_line, args, cfg)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/bin/rnaspades.py", line 322, in print_params
      print_used_values(cfg, log)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/bin/rnaspades.py", line 112, in print_used_values
      dataset_data = pyyaml.load(open(cfg["dataset"].yaml_filename))
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/__init__.py", line 72, in load
      return loader.get_single_data()
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/constructor.py", line 37, in get_single_data
      return self.construct_document(node)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/constructor.py", line 46, in construct_document
      for dummy in generator:
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/constructor.py", line 398, in construct_yaml_map
      value = self.construct_mapping(node)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/constructor.py", line 204, in construct_mapping
      return super().construct_mapping(node, deep=deep)
    File "/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad/share/spades/pyyaml3/constructor.py", line 126, in construct_mapping
      if not isinstance(key, collections.Hashable):
  AttributeError: module 'collections' has no attribute 'Hashable'

Work dir:
  /scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/work/cf/9b79cada5568a0d4a90a83cab19c56

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
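Two hedged manual workarounds consistent with the traceback (both are assumptions; the env path comes from the error message): pin Python below 3.10 in the cached environment, or patch the pyyaml copy vendored with SPAdes to use collections.abc.Hashable, the rename that Python 3.10 completed.

ENV=/scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/condaEnv/env-8715806f9fdbb3029a17af22acc9fcad
# option 1: downgrade python inside the cached env
conda install --prefix "$ENV" --yes 'python<3.10'
# option 2: patch the vendored pyyaml shipped with SPAdes
sed -i 's/collections\.Hashable/collections.abc.Hashable/' "$ENV/share/spades/pyyaml3/constructor.py"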

Getting done without running processes

Hi, Thank you for this amazing pipeline!

I tried to run the entire pipeline with the test dataset (-profile conda,test) and everything was OK.

However, when I try to run with my data, it gets done after running only the first three processes:

[screenshot of the pipeline output]

I tried to run with --onlyAsm option as well, but it gets done without running any process:

[screenshot of the pipeline output]

I also tested running with all different profiles and got the same error.
Do you know what might be happening here?

Thanks in advance!

Process requirement exceed available memory

Hi,

I've been getting this error when trying to run a TransPi full analysis:

"Error executing process > 'normalize_reads (AC466_S687_S3_L001)'

Caused by:
Process requirement exceed available memory -- req: 150 GB; avail: 128 GB"

How can I limit the required memory so it stays below the available memory?
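One hedged approach using core Nextflow mechanisms (the withName selector and the -c override are standard Nextflow; the value and command are illustrative): cap the memory of the failing process in a small custom config and pass it at launch.

cat > lowmem.config <<'EOF'
process {
    withName: normalize_reads { memory = '120 GB' }
}
EOF
./nextflow run TransPi.nf --all --reads './reads/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 150 -profile conda -c lowmem.config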

KEGG warn message

Thanks for providing this nice pipeline!
I run an analysis with the following command:
./nextflow run TransPi.nf --all --maxReadLen 143 --k 25,35,55,75,85 --reads '/media/server2/Data_2Gb/Melongena/Data/Trimmed/Trimmed/provaR673_HP/R673HP831val_R[1,2].fastq.gz' --outdir /media/server2/Data_2Gb/Melongena/TransPi_R673_HP8 -profile docker --skipQC

Everything was OK, except that the pipeline ended with a KEGG warning message:
"WARN: Access to undefined parameter skipKegg -- Initialise it to a default value eg. params.skipKegg = some_value"
Indeed, in the trinotate folder the KEGG report file is empty, as is the KEGG section in the HTML.

Any suggestion for that issue?

Thank you in advance
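Following the warning's own suggestion, a hedged mitigation (whether the KEGG report then gets populated depends on the pipeline version) is to give the parameter a default in nextflow.config:

params {
    skipKegg = false   // assumption: false keeps the KEGG step enabled
}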

Installation Issue (file yml?)

Dear Author,
we are trying, with little success, to install TransPi (conda option), but it seems not to create any environment, not even the referenced TransPi environment.
First question: is your recent update complete, or are you still working on it? If the latter, it would suggest waiting a while before trying this new version.

A couple of possible issues while trying to go through the initial lines of your precheck script:

  • At the beginning, the script checks whether the conda version is higher than 4.8 (variable $vern). We have 4.10, and the condition is not satisfied; in other words, 4.8 is seen as higher than 4.10 (and numerically it is). See the sketch after this list.
  • The script seems to look for a transpi_env.yml in the GitHub-cloned folder, but there is none. Should the file be acquired from an alternative source?
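On the first point, a version-aware comparison sketch for the precheck (uses GNU sort -V; the $vern variable name follows the issue's description):

vern=$(conda --version | awk '{print $2}')
if [ "$(printf '%s\n' 4.8 "$vern" | sort -V | head -n1)" = "4.8" ]; then
    echo "conda $vern satisfies the >= 4.8 requirement"
fi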

Thanks a lot for this useful piece of software, which we are eager to see running on our computers.
Regards.

Save bam files

Add option to save bam files from mapping step in --all

Error in process Trinotate.

Hi Ramon,
I ran the full pipeline with the following command.

nextflow run TransPi.nf --all --reads /path/to/Read_R[1,2].fastq.gz --k 25,41,53,75 --maxReadLen 100 --outdir transpi-out -w transpi-wd --tracedir transpi-info -profile docker --allBuscos --skipQC --skipFilter --rescueBusco --buscoDist

And I got an error in process trinotate.
My .command.out from process trinotate is this:


-- Running Trinotate --

memory
memory

-- Ending run of Trinotate --


-- Loading hits and predictions to sqlite database... --

memory
memory
memory
memory
memory
No transmembrane domains (tmhmm)
No Signal-P
No rnammer results

-- Loading finished --


-- Generating report... --


-- Report generated --


-- Creating GO file from XLS... --


-- Done with the GO --


-- Creating KEGG file from XLS... --


-- Done with the KEGG --


-- Creating eggNOG file from XLS... --

Since there is no output after this point (e.g. the message "Done with the eggNOG"), I thought there was a problem with this command. However, when I ran this command in a different terminal, no error occurred.

cat Read_R.trinotate_annotation_report.xls | cut -f 1,13 | grep "OG" | tr "\`" ";" | sed 's/^/#/g' | sed 's/;/\n;/g' | cut -f 1 -d "^" | tr -d "\n" | tr "#" "\n" | grep "OG" >Read_R.eggNOG_COG.terms.txt

Is there any reason for this error?
My TransPi version is v1.3.0-rc.
Thank you.
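One hedged hypothesis consistent with this behaviour (an assumption, not a confirmed diagnosis): Nextflow task scripts run under bash -ue (visible in the .command.sh excerpts elsewhere on this page), and grep exits with status 1 when it matches nothing, which aborts the task even though the same command "works" interactively. A guard on the final grep illustrates the idea:

# hypothetical guard: tolerate an empty result so bash -ue does not kill the task
cat Read_R.trinotate_annotation_report.xls | cut -f 1,13 | grep "OG" | tr "\`" ";" | sed 's/^/#/g' | sed 's/;/\n;/g' | cut -f 1 -d "^" | tr -d "\n" | tr "#" "\n" | { grep "OG" || true; } >Read_R.eggNOG_COG.terms.txt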

Error: No module named 'busco'

Hello Ramón!

Thanks for the great tool that is TransPi.

Currently, I am having some issues related to BUSCO. After exhaustive troubleshooting, it seems the condaEnv that contains BUSCO is not being activated during the execution of the pipeline.

I am using the versions:
Nextflow version 21.04.1
TransPi - Transcriptome Analysis Pipeline v1.3.0-rc

The Environment is created and I can navigate to the folder and find BUSCO:

Creating Conda env: -c conda-forge bioconda::bowtie2=2.4.2=py36hff7a194_2 bioconda::samtools=1.11=h6270b1f_0 [cache /home/porifera/TransPi/condaEnv/env-ef941e5448d01c1e0c7325e5aa0c3700]
Creating Conda env: -c conda-forge bioconda::busco=4.1.4=py_2 [cache /home/porifera/TransPi/condaEnv/env-e8d22bfae5e32eac973114001910dada]
Creating Conda env: -c conda-forge bioconda::cd-hit=4.8.1 bioconda::exonerate=2.4 bioconda::blast=2.11.0 [cache /home/porifera/TransPi/condaEnv/env-0761ea60e49e247680763dddcc8ef240]

I am having the following error:

Command output:
  -- Starting BUSCO --

No module named 'busco'
There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.

  -- DONE with BUSCO --

Command error:
  cp: cannot stat ‘SRR4423080.Trinity.bus4/short_summary.*.SRR4423080.Trinity.bus4.txt’: No such file or directory

Work dir:
 
/home/porifera/TransPi/work/1d/383229a8419abad57e7485ee281de2

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

It seems that TransPi is searching the PATH of my user and executing the BUSCO installed on my server (which is currently broken… The no module named ‘busco’ error message appears when I execute from terminal).

Any thoughts? My log can be found at:
transpi_out.txt

(Sorry for my English, if something is not clear I can try to explain better)

Best Wishes,
Gabriel

Uniprot Metazoan database not downloaded due to changes in UniProt REST API

The precheck_TransPi.sh script fails to download the UniProt Metazoa database due to changes to the API.
This is what comes back from the download command:

$ curl -o uniprot_metazoa_33208.fasta.gz "https://www.uniprot.org/uniprot/?query=taxonomy:33208&format=fasta&compress=yes&include=no"
$ cat uniprot_metazoa_33208.fasta.gz
Search result downloads have now moved to https://rest.uniprot.org. Please consult https://www.uniprot.org/help/api for guidance on changes.

I tried implementing the changes suggested by the new API guide with the command below, but it doesn't download anything.

curl -o uniprot_metazoa_33208.fasta.gz  "https://rest.uniprot.org/uniprotkb/search?query=organism_id%3A33208&format=fasta&compressed=true"

Thanks, Ido
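A hedged alternative (an assumption based on the new REST API's documented endpoints, untested here): the search endpoint paginates its results, while the stream endpoint returns the full set in one response; taxonomy_id restricts the query to the taxon subtree.

curl -o uniprot_metazoa_33208.fasta.gz "https://rest.uniprot.org/uniprotkb/stream?query=taxonomy_id%3A33208&format=fasta&compressed=true"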

Error in process busco4_dist

Hi, I apologize for my frequent contacts.

When SOS_busco.py runs in process busco4_dist, I get the following error:

Command error:
  Traceback (most recent call last):
    File "/mnt/data/software/TransPi/bin/SOS_busco.py", line 38, in <module>
      busco_df = pd.read_csv(input_busco_file, sep=',',header=0,names=['Busco_id','Status','Sequence','Score','Length'])
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
      return _read(filepath_or_buffer, kwds)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
      data = parser.read(nrows)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1186, in read
      ret = self._engine.read(nrows)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2145, in read
      data = self._reader.read(nrows)
    File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
    File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
    File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
    File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
    File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
  pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 51, saw 8

I think this is a problem with the SOS_busco.py input file (in my case, Read_R_all_busco4.tsv).
Most lines of my Read_R_all_busco4.tsv have 6 commas (7 columns), like this:
0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,https://www.orthodb.org/v10?query=0at38820,sacsin

However, some lines of my file have 7 or 8 commas (8 or 9 columns), like this:
121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type
I think this difference in the number of commas (columns) is the cause of the pandas error.

SOS_busco.py doesn't seem to use columns 6 onwards of the input file.
If so, we could remove columns 6 onwards before running SOS_busco.py.

TransPi/TransPi.nf

Lines 1591 to 1592 in 899d160

cat $transpi_tsv | grep -v "#" | tr "\\t" "," >>$all_busco
SOS_busco.py -input_file_busco $all_busco -input_file_fasta $assembly -min ${params.minPerc} -kmers ${params.k}

Here is an example of my suggested revision:

cat $transpi_tsv | grep -v "#" | tr "\\t" "," >>$all_busco
awk -F',' -v OFS=',' '{print $1,$2,$3,$4,$5}' $all_busco > some.csv
SOS_busco.py -input_file_busco some.csv -input_file_fasta $assembly -min ${params.minPerc} -kmers ${params.k}
rm -rf some.csv

I hope this helps you.
Thank you.

Using several databases

We are using TransPi to annotate plant transcriptomes, but we also want to retrieve mitochondrial and plastome (chloroplast) genes/sequences from them. Those sequences are present in the assemblies made by all 5 programs but not in the TransDecoder results ".combined.okay.fa.transdecoder.cds". Is this because the DB (UniProt) does not include genes encoded in the organelles, or because TransDecoder does not recognize genes (ORFs) encoded in the mitochondrion and chloroplast?
How could I include the genes encoded in the chloroplast and mitochondrion so they are annotated in the TransPi pipeline?
Thanks,
Joan
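One hedged, pipeline-external avenue (a sketch of a generic approach, not a TransPi feature; file names are placeholders): since the sequences are present in the assemblies, putative organellar transcripts can be pulled out by homology search against organellar references and examined separately.

# hypothetical: recover putative organellar transcripts from the combined assembly
makeblastdb -in organelle_refs.fasta -dbtype nucl
blastn -query assembly.combined.fa -db organelle_refs.fasta -outfmt 6 -evalue 1e-20 >organelle_hits.tsv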

Process fastp (AC422_S687_S14_L001_R) terminated with an error exit status (1)

Hi,

I tried running full analysis on TransPi with the following command:
"./nextflow -bg run TransPi.nf --all --reads '/home/geninfo/cassianoriyu/RNA_Millepora/AC_422/AC422_S687_S14_L001_R[1,2].fastq.gz' \ --k 25,35,55,75,85 --maxReadLen 100 --outdir Results_AC422 -profile conda --myConda"

But I get an error message at the very beginning:
"Error executing process > 'fastp (AC422_S687_S14_L001_R)'

Caused by:
Process fastp (AC422_S687_S14_L001_R) terminated with an error exit status (1)

Command executed:
fastp -i AC422_S687_S14_L001_R1.fastq.gz -I AC422_S687_S14_L001_R2.fastq.gz -o left-AC422_S687_S14_L001_R.filter.fq -O right-AC422_S687_S14_L001_R.filter.fq --detect_adapter_for_pe --average_qual null --overrepresentation_analysis --html AC422_S687_S14_L001_R.fastp.html --json AC422_S687_S14_L001_R.fastp.json --thread 1 --report_title AC422_S687_S14_L001_R

v=$( fastp --version 2>&1 | awk '{print $2}' )
echo "fastp: $v" >fastp.version.txt

Command exit status:
1

Command output:
(empty)

Command error:
-G, --disable_trim_poly_g disable polyG tail trimming, by default trimming is automatically enabled for Illumina NextSeq/NovaSeq data
-x, --trim_poly_x enable polyX trimming in 3' ends.
--poly_x_min_len the minimum length to detect polyX in the read tail. 10 by default. (int [=10])
-5, --cut_front move a sliding window from front (5') to tail, drop the bases in the window if its mean quality < threshold, stop otherwise.
-3, --cut_tail move a sliding window from tail (3') to front, drop the bases in the window if its mean quality < threshold, stop otherwise.
-r, --cut_right move a sliding window from front to tail, if meet one window with mean quality < threshold, drop the bases in the window and the right part, and then stop.
-W, --cut_window_size the window size option shared by cut_front, cut_tail or cut_sliding. Range: 1~1000, default: 4 (int [=4])
-M, --cut_mean_quality the mean quality requirement option shared by cut_front, cut_tail or cut_sliding. Range: 1~36, default: 20 (Q20) (int [=20])
--cut_front_window_size the window size option of cut_front, default to cut_window_size if not specified (int [=4])
--cut_front_mean_quality the mean quality requirement option for cut_front, default to cut_mean_quality if not specified (int [=20])
--cut_tail_window_size the window size option of cut_tail, default to cut_window_size if not specified (int [=4])
--cut_tail_mean_quality the mean quality requirement option for cut_tail, default to cut_mean_quality if not specified (int [=20])
--cut_right_window_size the window size option of cut_right, default to cut_window_size if not specified (int [=4])
--cut_right_mean_quality the mean quality requirement option for cut_right, default to cut_mean_quality if not specified (int [=20])
-Q, --disable_quality_filtering quality filtering is enabled by default. If this option is specified, quality filtering is disabled
-q, --qualified_quality_phred the quality value that a base is qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
-u, --unqualified_percent_limit how many percents of bases are allowed to be unqualified (0~100). Default 40 means 40% (int [=40])
-n, --n_base_limit if one read's number of N base is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])
-e, --average_qual if one read's average quality score <avg_qual, then this read/pair is discarded. Default 0 means no requirement (int [=0])
-L, --disable_length_filtering length filtering is enabled by default. If this option is specified, length filtering is disabled
-l, --length_required reads shorter than length_required will be discarded, default is 15. (int [=15])
--length_limit reads longer than length_limit will be discarded, default 0 means no limitation. (int [=0])
-y, --low_complexity_filter enable low complexity filter. The complexity is defined as the percentage of base that is different from its next base (base[i] != base[i+1]).
-Y, --complexity_threshold the threshold for low complexity filter (0~100). Default is 30, which means 30% complexity is required. (int [=30])
--filter_by_index1 specify a file contains a list of barcodes of index1 to be filtered out, one barcode per line (string [=])
--filter_by_index2 specify a file contains a list of barcodes of index2 to be filtered out, one barcode per line (string [=])
--filter_by_index_threshold the allowed difference of index barcode for index filtering, default 0 means completely identical. (int [=0])
-c, --correction enable base correction in overlapped regions (only for PE data), default is disabled
--overlap_len_require the minimum length to detect overlapped region of PE reads. This will affect overlap analysis based PE merge, adapter trimming and correction. 30 by default. (int [=30])
--overlap_diff_limit the maximum number of mismatched bases to detect overlapped region of PE reads. This will affect overlap analysis based PE merge, adapter trimming and correction. 5 by default. (int [=5])
--overlap_diff_percent_limit the maximum percentage of mismatched bases to detect overlapped region of PE reads. This will affect overlap analysis based PE merge, adapter trimming and correction. Default 20 means 20%. (int [=20])
-U, --umi enable unique molecular identifier (UMI) preprocessing
--umi_loc specify the location of UMI, can be (index1/index2/read1/read2/per_index/per_read, default is none (string [=])
--umi_len if the UMI is in read1/read2, its length should be provided (int [=0])
--umi_prefix if specified, an underline will be used to connect prefix and UMI (i.e. prefix=UMI, UMI=AATTCG, final=UMI_AATTCG). No prefix by default (string [=])
--umi_skip if the UMI is in read1/read2, fastp can skip several bases following UMI, default is 0 (int [=0])
-p, --overrepresentation_analysis enable overrepresented sequence analysis.
-P, --overrepresentation_sampling one in (--overrepresentation_sampling) reads will be computed for overrepresentation analysis (1~10000), smaller is slower, default is 20. (int [=20])
-j, --json the json format report file name (string [=fastp.json])
-h, --html the html format report file name (string [=fastp.html])
-R, --report_title should be quoted with ' or ", default is "fastp report" (string [=fastp report])
-w, --thread worker thread number, default is 3 (int [=3])
-s, --split split output by limiting total split file number with this option (2~999), a sequential number prefix will be added to output name ( 0001.out.fq, 0002.out.fq...), disabled by default (int [=0])
-S, --split_by_lines split output by limiting lines of each file with this option(>=1000), a sequential number prefix will be added to output name ( 0001.out.fq, 0002.out.fq...), disabled by default (long [=0])
-d, --split_prefix_digits the digits for the sequential number padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to disable padding (int [=4])
--cut_by_quality5 DEPRECATED, use --cut_front instead.
--cut_by_quality3 DEPRECATED, use --cut_tail instead.
--cut_by_quality_aggressive DEPRECATED, use --cut_right instead.
--discard_unmerged DEPRECATED, no effect now, see the introduction for merging.
-?, --help print this message"

I have no idea what's causing this error. I tried manually installing fastp, and different fastp versions. I also tried reinstalling TransPi, but nothing changed.
Does anyone have any clue on how to solve this?

Thanks.
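One hedged observation (an assumption, not a confirmed diagnosis): fastp prints its usage text when it rejects an argument, and the executed command above contains the literal token --average_qual null, i.e. a parameter that apparently was never set. Per the usage above, -e/--average_qual expects an integer, so a hand-run sanity check might look like:

# hypothetical hand run with an integer value in place of 'null'
fastp -i AC422_S687_S14_L001_R1.fastq.gz -I AC422_S687_S14_L001_R2.fastq.gz -o left.filter.fq -O right.filter.fq --detect_adapter_for_pe --average_qual 0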

Trans-ABySS assembly error

Hi Ramon
Super cool pipeline!
I wonder if you had any issues with the Trans-ABySS assemblies. I am getting errors for that step when running it. Basically the pipeline kills/finish the step. There is no major information in the log file, just that the single attempt and the end of the run for that step.
Thoughts about this?
Thanks for the feedback!
Best
Juan

Log

echo -e "\n-- Starting Trans-ABySS assemblies --\n"

for x in `echo 25,35,55,75,85 | tr "," " "`;do
echo -e "\n-- Trans-ABySS k${x} --\n"
transabyss -k ${x} --pe left-S69L.norm.fq right-S69L.norm.fq --outdir S69L_transabyss_${x} --name k${x}.transabyss.fa --threads 15 -c 12 --length 200
done

echo -e "\n-- Finished with the assemblies --\n"

for x in `echo 25,35,55,75,85 | tr "," " "`;do
sed -i "s/>/>TransABySS.k${x}./g" S69L_transabyss_${x}/k${x}.transabyss.fa-final.fa
done

cat S69L_transabyss_*/k*.transabyss.fa-final.fa >S69L.TransABySS.fa

for x in `echo 25,35,55,75,85 | tr "," " "`;do
cp S69L_transabyss_${x}/k${x}.transabyss.fa-final.fa S69L.TransABySS.k${x}.fa
done

rm -rf S69L_transabyss_*
Status

Exit: 1 (FAILED) Attempts: 1 (action: FINISH)

Error Running Full Analysis: Process `fastp (1_R1)` terminated with an error exit status (255)

Hi,

I have an issue with trying to run Full Analysis on TransPi. I have 8 files containing paired-end data named 1_R1.fastq.gz, 1_R2.fastq.gz, etc., from which I would like to produce a de novo transcriptome.

I cloned the repository and ran the configuration using the bash precheck_TransPi.sh for the conda installation (option 1). Everything was downloaded and installed without any error.

I tried to use the full analysis by running
"nextflow run TransPi.nf -- all --reads .<path_to_reads>/*.fastq.gz --k 25,35,55,75,85 --maxReadLen 150 -profile conda"

I receive the following error message:

Something went wrong. Check error message below and/or log files.
Error executing process > 'fastp (1_R1)'

Caused by:
Process fastp (1_R1) terminated with an error exit status (255)

Command executed:

fastp -i 1_R1.fastq.gz -I null -o left-1_R1.filter.fq -O right-1_R1.filter.fq --detect_adapter_for_pe --average_qual 5 --overrepresentation_analysis --html 1_R1.fastp.html --json 1_R1.fastp.json --thread 8 --report_title 1_R1

v=$( fastp --version 2>&1 | awk '{print $2}' )
echo "fastp: $v" >fastp.version.txt

Command exit status:
255

Command output:
(empty)

Command error:
ERROR: Failed to open file: null

Work dir:
/data/gn2311/transpi/TransPi/work/ae/ec5145900bb3af7866a1cb7e83f07f

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

I also checked the command.sh file, which contains the following:

#!/bin/bash -ue
fastp -i 1_R1.fastq.gz -I null -o left-1_R1.filter.fq -O right-1_R1.filter.fq --detect_adapter_for_pe --average_qual 5 --overrepresentation_analysis --html 1_R1.fastp.html --json 1_R1.fastp.json --thread 8 --report_title 1_R1

v=$( fastp --version 2>&1 | awk '{print $2}' )
echo "fastp: $v" >fastp.version.txt

I am not sure what is causing the error, could you please let me know how to fix this issue?

Best,
Andy
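A hedged note on the command above (an assumption drawn from the runs shown in other issues on this page): the fastp command received -I null, which suggests the reads glob did not pair the R1/R2 files. Runs elsewhere use a pair-capturing pattern, e.g.:

nextflow run TransPi.nf --all --reads './<path_to_reads>/*_R[1,2].fastq.gz' --k 25,35,55,75,85 --maxReadLen 150 -profile conda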

no perl in container

It appears that the mulled container
https://depot.galaxyproject.org/singularity/mulled-v2-962eae98c9ff8d5b31e1df7e41a355a99e1152c4:5aac6d1d2253d47aee81f01cc070a17664c86f07-0
lacks the perl interpreter required by the evigene process:

Command output:
  
  -- Starting EviGene --

Command error:
  .command.sh: /path/redacted/TransPi/scripts/evigene/scripts/prot/tr2aacds.pl: /usr/bin/perl: bad interpreter: No such file or directory

Error executing process > 'normalize_reads', RAM required 150GB

Greetings! I tried running a Docker version of TransPi on my transcriptomic data, and after a seemingly successful start it failed during the "normalize_reads" process. Apparently it wants 150 GB of RAM (I only have 125, which I always thought would be more than enough). Is there a workaround, or am I doing something wrong?

The code I used to run it:

sudo ./nextflow run TransPi.nf --all --maxReadLen 150 --k 25,35,55,75,85 --reads '/media/jcoludar/Daten/Ivan/02_Transcri/Transcriptomes/T5_09_Po_do_VG2/*_R[1,2].fastq.gz' --outdir Results_Polistes -profile docker,TransPiContainer

The Error message

Error executing process > 'normalize_reads (09-Po-du-VG2)'

Caused by:
Process requirement exceed available memory -- req: 150 GB; avail: 125.8 GB

Command executed:

echo 09-Po-du-VG2

echo -e "\n-- Starting Normalization --\n"

mem=$( echo 150 GB | cut -f 1 -d " " )

insilico_read_normalization.pl --seqType fq -JM ${mem}G --max_cov 100 --min_cov 1 --left left-09-Po-du-VG2.filter.fq --right right-09-Po-du-VG2.filter.fq --pairs_together --PARALLEL_STATS --CPU 15

echo -e "\n-- DONE with Normalization --\n"

cat .command.out | grep "stats_file" -A 3 | tail -n 3 >09-Po-du-VG2_normStats.txt

cp left.norm.fq left-"09-Po-du-VG2".norm.fq
cp right.norm.fq right-"09-Po-du-VG2".norm.fq

mv left.norm.fq 09-Po-du-VG2_norm.R1.fq
mv right.norm.fq 09-Po-du-VG2_norm.R2.fq

pigz --best --force -p 15 -r 09-Po-du-VG2_norm.R1.fq
pigz --best --force -p 15 -r 09-Po-du-VG2_norm.R2.fq

Command exit status:

Command output:
(empty)

Work dir:
/media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/work/b3/b323002917226824e319cadb1441f5

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Single-end reads

Hi Ramon,

I have a bunch of SE reads for creating a de-novo transcriptome. Is this possible with TransPi? So far I tried the full analysis but failed already at the fastp step, as the program is already missing the file with the reverse reads (-I null):

`Error executing process > 'fastp (Pool_348_L13_S13_L002_R1)'

Caused by:
Process fastp (Pool_348_L13_S13_L002_R1) terminated with an error exit status (255)

Command executed:

fastp -i Pool_348_L13_S13_L002_R1.fastq.gz -I null -o left-Pool_348_L13_S13_L002_R1.filter.fq -O right-Pool_348_L13_S13_L002_R1.filter.fq --detect_adapter_for_pe --average_qual 5 --overrepresentation_analysis --html Pool_348_L13_S13_L002_R1.fastp.html --json Pool_348_L13_S13_L002_R1.fastp.json --thread 6 --report_title Pool_348_L13_S13_L002_R1

v=$( fastp --version 2>&1 | awk '{print $2}' )
echo "fastp: $v" >fastp.version.txt

Command exit status:
255

Command output:
(empty)

Command error:
ERROR: Failed to open file: null
`
Is there anything I can do? My installation works with your test data and other PE data.
In principle, all the programs should work with SE reads as well, right?

Thank you so much!
Kathrin

error in Trinity

Here is Joan Pons. I ran the full pipeline in a conda environment on Linux Manjaro:
./nextflow run TransPi.nf --all --reads "/dades/jpp/rna_plantes/*_R[1,2].uniq.fastq.gz" --k 25,41,53 --maxReadLen 150 -resume

The config file was set to skipNormalization = true.
Command output:
ERROR, don't recognize parameter: --no_normalize_reads
Please review usage info for accepted parameters.
I checked Trinity and the subcommand --no_normalize_reads does not exist. I guess this flag was set by the pipeline itself.
Any tip?
Thanks

Problem with --onlyAnn ?

Hi,

I am trying to use the pipeline on an already assembled transcriptome (a single FASTA file with ~300k sequences), which I am putting in a directory called onlyAnn as per the instructions. I believe I have correctly followed all the pipeline installation instructions and tests. However, when I try to run:

./nextflow run ~/TransPi/TransPi.nf --onlyAnn -profile conda

I get

N E X T F L O W  ~  version 20.10.0
Launching `/home/moulos/TransPi/TransPi.nf` [happy_koch] - revision: ad8250cb5a
====================================================
  TransPi - Transcriptome Analysis Pipeline v1.0.0-dev
====================================================
TransPi.nf Directory:   /home/moulos/TransPi/TransPi.nf
Launch Directory:       /media/raid/tmp/tmp/genohub/transpi_db
Results Directory:      /media/raid/tmp/tmp/genohub/transpi_db/results
Work Directory:         /media/raid/tmp/tmp/genohub/transpi_db/work
TransPi DBs:            /media/raid/tmp/tmp/genohub/transpi_db
Uniprot DB:             /media/raid/tmp/tmp/genohub/transpi_db/DBs/uniprot_db/uniprot_metazoa_33208.fasta

        Running TransPi with your dataset


        Running only annotation analysis

[-        ] process > transdecoder_longorf     -
[-        ] process > transdecoder_diamond     -
[-        ] process > transdecoder_hmmer       -
[-        ] process > transdecoder_predict     -
[-        ] process > swiss_diamond_trinotate  -
[-        ] process > custom_diamond_trinotate -
[-        ] process > hmmer_trinotate          -
[-        ] process > skip_signalP             -
No such variable: evigene_ch_rnammer

 -- Check script '/home/moulos/TransPi/TransPi.nf' at line: 2299 or see '.nextflow.log' file for more details

Any clues? Please note that I am using a custom path to install TransPi databases instead of the default directory (cloned from GitHub).

Thank you in advance!

Problem in installing

Hi,

Thank you for providing such a beneficial tool.

However, I faced an error while installing the program; I attach the screenshot.

[screenshot of the error]

I selected EUKARYOTA and then Superkingdom, and suddenly the program stopped with the error message above.

    Please select database: 1
cat: ./conf/busV4list.txt: No such file or directory
cat: ./conf/busV4list.txt: No such file or directory
cat: ./conf/busV4list.txt: No such file or directory
cat: ./conf/busV3list.txt: No such file or directory

         -- No BUSCO V3 available for  --

Here are the commands I used so far

git clone https://github.com/palmuc/TransPi.git
cd TransPi
bash precheck_TransPi.sh .

Error related to BUSCO

Hi there,
I am having problems installing TransPi with "bash precheck_TransPi.sh". The installation ended with:

 -- Selecting BUSCO V4 database -- 


	 -- ERROR: Please make sure that file "busV4list.txt" is available. Please check requirements and rerun the pre-check --

Thanks in advance!

Conda direct download?

Great pipeline! Looking forward to use it. Any chance we can download the program directly from Conda? Thanks!

github update 4 days ago of TransPi.nf

Hi Ramon,
A few days ago you updated TransPi.nf on GitHub, so now all programs (and specific versions) needed for the pipeline are installed by the script into a new environment, even if some or all of the programs are already installed in the existing conda environment. However, this happens for every analysis, so the computer gets populated with as many environments as analyses started. Could you change the behaviour of the script embedded in TransPi.nf so that the programs are installed just once, in the original environment?
Thanks a lot for making the pipeline more user friendly at the installation level. BUSCO v4 is not well integrated in the older version, since the busco config file is not found by the script and/or the paths to the programs are wrongly set by default.
Cheers,
Joan

Transdecoder cannot be found

Hi @rivera10

Thank you for creating the TransPi pipeline. I have tried running it and encountered this error:

===============================================================================================
"Something went wrong. Check error message below and/or log files.
Error executing process > 'transdecoder_longorf (GER921R_trimmed_corr)'

Caused by:
Process transdecoder_longorf (GER921R_trimmed_corr) terminated with an error exit status (127)

Command executed:

cp GER921R_trimmed_corr.combined.okay.fa GER921R_trimmed_corr_asssembly.fasta

echo -e "\n-- TransDecoder.LongOrfs... --\n"

TransDecoder.LongOrfs -t GER921R_trimmed_corr.combined.okay.fa --output_dir GER921R_trimmed_corr.transdecoder_dir -G Universal

cp GER921R_trimmed_corr.transdecoder_dir/longest_orfs.pep GER921R_trimmed_corr.longest_orfs.pep

echo -e "\n-- Done with TransDecoder.LongOrfs --\n"

v=$( TransDecoder.LongOrfs --version | cut -f 2 -d " " )
echo "Transdecoder: $v" >transdecoder.version.txt

Command exit status:
127

Command output:

-- TransDecoder.LongOrfs... --

Command error:
.command.sh: line 6: TransDecoder.LongOrfs: command not found

Work dir:
~/Applications/TransPi/working_RNA_combined/58/d62e3e4699b1267a0b39d17184c8bf

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line"

===============================================================================================

It looks to me that TransDecoder cannot be found, based on the error message highlighted in bold, which is strange because I understand that a new conda environment will be created to install the necessary programs. I have confirmed that the input files are indeed in $PWD. I have tried searching for "TransDecoder.LongOrfs" in all condaEnv directories in TransPi, and true enough, I cannot find the script in any of the condaEnv bins. Could it be that the conda create command was not executed? I have also tried installing TransDecoder and added it to $PATH, but even then, the program still cannot be found by TransPi. Is there a way to resolve this? Or can I perhaps install the condaEnv on my own before resuming the run? I need advice on how the condaEnvs are named as they seem pretty random. How would I need to name it so that nextflow will know to activate that particular environment for TransDecoder-related programs?

Kindly advise, thank you!

Regards
Marc
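On the naming question, a hedged explanation: Nextflow caches conda environments in directories named env-<hash>, where the hash is derived from the environment recipe (the "Creating Conda env" log lines quoted in another issue on this page show the pattern). Rather than guessing the name, one option is to create the environment manually with the exact recipe and prefix that Nextflow prints, e.g. (prefix and version pin are illustrative only; copy the real ones from your own log):

conda create --yes --prefix /path/to/TransPi/condaEnv/env-<hash> -c conda-forge bioconda::transdecoder=5.5.0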

Operator `into` has been deprecated -- it's not available in DSL2 syntax

Hi,
I am trying to run TransPi with the '--all' option, but the process halted abruptly stating the following error:

DEBUG nextflow.Session - Session aborted -- Cause: Operator `into` has been deprecated -- it's not available in DSL2 syntax
Jul-22 14:24:15.801 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.DeprecationException: Operator `into` has been deprecated -- it's not available in DSL2 syntax

I assume this is an issue with the new nextflow update. Hope you would make changes accordingly soon.
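A hedged workaround until the pipeline is ported to DSL2 (NXF_VER is a standard Nextflow launcher mechanism; the exact version to pin is an assumption): run with an older, DSL1-capable release.

NXF_VER=21.10.6 nextflow run TransPi.nf --all --reads './reads/*_R[1,2].fastq.gz' --k 25,41,53 --maxReadLen 150 -profile conda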

When using singularity, busco step fails when it cannot find database

When running transpi with the singularity profile using the transpi repo version pulled on the 27th December 2021:

nextflow run ../TransPi/TransPi.nf --all --reads '/nesi/nobackup/uoo00105/orthoskim_test/*_[1,2].fastq.gz' \
     --k 25,41,53 --maxReadLen 150 -profile singularity -resume

Program dies at the busco step with the full error message below. I have confirmed that the busco files etc. are at the path it claims it cannot find. This step also works fine if I load the busco module on our HPC (using the same busco db pointed to in the command below), so it seems like an issue with how transpi is running busco rather than with busco itself.

Full error message:

Something went wrong. Check error message below and/or log files.
Error executing process > 'busco4_tri (SRR6472974)'

Caused by:
  Process `busco4_tri (SRR6472974)` terminated with an error exit status (1)

Command executed:

  echo -e "\n-- Starting BUSCO --\n"
  
  busco -i SRR6472974.Trinity.fa -o SRR6472974.Trinity.bus4 -l /nesi/nobackup/uoo00105/transpi_conda/DBs/busco_db/viridiplantae_odb10 -m tran -c 36 --offline
  
  echo -e "\n-- DONE with BUSCO --\n"
  
  cp SRR6472974.Trinity.bus4/short_summary.*.SRR6472974.Trinity.bus4.txt .
  cp SRR6472974.Trinity.bus4/run_*/full_table.tsv full_table_SRR6472974.Trinity.bus4.tsv

Command exit status:
  1

Command output:
  
  -- Starting BUSCO --
  
  INFO:	***** Start a BUSCO v4.1.4 analysis, current time: 01/04/2022 23:49:52 *****
  INFO:	Configuring BUSCO with /usr/local/share/busco/config.ini
  INFO:	Mode is transcriptome
  INFO:	Input file is SRR6472974.Trinity.fa

Command error:
  ERROR:	/nesi/nobackup/uoo00105/transpi_conda/DBs/busco_db/viridiplantae_odb10 does not exist
  ERROR:	BUSCO analysis failed !
  ERROR:	Check the logs, read the user guide, and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues

Work dir:
  /scale_wlg_nobackup/filesets/nobackup/uoo00105/transpi_conda/work/ea/db81677c5bb959b84f677197473562

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
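A hedged guess at the cause (an assumption, not a confirmed diagnosis): the database path exists on the host but is not bind-mounted inside the Singularity container, so BUSCO inside the container cannot see it. Core Nextflow can add mounts via config, e.g.:

cat > binds.config <<'EOF'
singularity {
    autoMounts = true
    // or explicitly: runOptions = '-B /nesi/nobackup'
}
EOF
nextflow run ../TransPi/TransPi.nf --all --reads '/nesi/nobackup/uoo00105/orthoskim_test/*_[1,2].fastq.gz' --k 25,41,53 --maxReadLen 150 -profile singularity -c binds.config -resume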

Trinotate preparation step in Precheck script silently fails with missing Perl module

When running the Precheck script bash precheck_TransPi.sh ~/scratch/sandbox/TransPi_test, the Trinotate preparation process fails with the following message:

Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/../../PerlLib /export/home/s2978925/conda3/lib/perl5/5.32/site_perl /export/home/s2978925/conda3/lib/perl5/site_perl /export/home/s2978925/conda3/lib/perl5/5.32/vendor_perl /export/home/s2978925/conda3/lib/perl5/vendor_perl /export/home/s2978925/conda3/lib/perl5/5.32/core_perl /export/home/s2978925/conda3/lib/perl5/core_perl .) at /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
BEGIN failed--compilation aborted at /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/../../PerlLib/Sqlite_connect.pm line 15.
Compilation failed in require at /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
BEGIN failed--compilation aborted at /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl line 8.
Error, cmd: /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/util/EMBL_dat_to_Trinotate_sqlite_resourceDB.pl --sqlite Trinotate.sqlite --create died with ret 512 at /scratch/s2978925/sandbox/TransPi_test/DBs/sqlite_db/Trinotate_build_scripts/admin/../PerlLib/Pipeliner.pm line 102.
        Pipeliner::run(Pipeliner=HASH(0x560f9ee83548)) called at ./Trinotate_build_scripts/admin/Build_Trinotate_Boilerplate_SQLite_db.pl line 120
rm: cannot remove ‘Pfam-A.hmm.gz’: No such file or directory

This is caused by the missing Perl module (DBI), which is an easy fix, but why doesn't this step happen within the Nextflow pipeline, where the required tools and modules are already set up (within a conda environment or a container)?
I would also suggest that the script report this error in its final statement rather than sending the user to look for error messages in the output.

Thanks, Ido
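For completeness, a hedged version of that easy fix (package names are the usual conda/bioconda and CPAN ones; adjust to your Perl setup):

# supply Perl's DBI module to the conda perl the precheck is using
conda install -y -c bioconda perl-dbi
# or, with cpanminus:
cpanm DBI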

Missing transpi_env.yml

The Dockerfile tries to copy transpi_env.yml and build a conda environment from it, but no such file seems to exist in this repo. Where can I access this file?

problems running full analysis (--all option) and evigene's script

Hi Ramón,

I'm running the --all module and I get an error right when an evigene script should be used. For some reason I don't know, the file combined.okay.fa is not created. All the others are created, but not that one.
This is the output I got.

nextflow run TransPi.nf --all --reads '/run/media/DataProcessing/tiburon/teste/R[1,2].fq.gz' --k 25,41,53,61 --maxReadLen 100 --outdir /run/media/DataProcessing/tiburon/teste/ -profile conda
N E X T F L O W ~ version 21.04.1
Launching TransPi.nf [admiring_woese] - revision: 5a26a6e85d

TransPi - Transcriptome Analysis Pipeline v1.3.0-rc

TransPi.nf Directory: /home/sbravo/Descargas/TransPi/TransPi.nf
Launch Directory: /home/sbravo/Descargas/TransPi
Results Directory: /run/media/DataProcessing/tiburon/teste/
Work Directory: /home/sbravo/Descargas/TransPi/work
TransPi DBs: /home/sbravo/Descargas/TransPi
Uniprot DB: /run/media/DataProcessing/DBs/uniprot_db/uniprot_metazoa_33208.fasta
Busco DB: /run/media/DataProcessing/DBs/busco_db/metazoa_odb10/
Reads Directory: /run/media/DataProcessing/tiburon/teste/R[1,2].fq.gz
Read Length: 100
Kmers: 25,41,53,61

Running TransPi with your dataset


Running the full TransPi analysis

executor > local (14)
[a1/b551cf] process > fasqc (R) [100%] 1 of 1 ✔
[a1/db5052] process > fastp (R) [100%] 1 of 1 ✔
[08/2cd9ee] process > fastp_stats (R) [100%] 1 of 1 ✔
[44/4a8458] process > skip_rrna_removal (R) [100%] 1 of 1 ✔
[d3/7e1697] process > normalize_reads (R) [100%] 1 of 1 ✔
[f6/ec9860] process > trinity_assembly (R) [100%] 1 of 1 ✔
[26/0bdb02] process > soap_assembly (R) [100%] 1 of 1 ✔
[d6/b4833e] process > velvet_oases_assembly (R) [100%] 1 of 1 ✔
[6a/86e1f1] process > rna_spades_assembly (R) [100%] 1 of 1 ✔
[09/779b72] process > transabyss_assembly (R) [100%] 1 of 1 ✔
[d2/04067d] process > evigene (R) [100%] 1 of 1, failed: 1 ✘
[- ] process > rna_quast -
[- ] process > mapping_evigene -
[- ] process > busco4 -
[d7/40bafd] process > mapping_trinity (R) [100%] 1 of 1 ✔
[- ] process > summary_evigene_individual -
[- ] process > busco4_tri -
[b3/bf26e0] process > skip_busco_dist (R) [100%] 1 of 1 ✔
[- ] process > summary_busco4_individual -
[- ] process > get_busco4_comparison -
[- ] process > transdecoder_longorf -
[- ] process > transdecoder_diamond -
[- ] process > transdecoder_hmmer -
[- ] process > transdecoder_predict -
[- ] process > swiss_diamond_trinotate -
[- ] process > custom_diamond_trinotate -
[- ] process > hmmer_trinotate -
[- ] process > skip_signalP -
[- ] process > skip_tmhmm -
[- ] process > skip_rnammer -
[- ] process > trinotate -
[- ] process > get_GO_comparison -
[- ] process > summary_custom_uniprot -
[- ] process > skip_kegg -
[- ] process > get_transcript_dist -
[- ] process > summary_transdecoder_individual -
[- ] process > summary_trinotate_individual -
[- ] process > get_report -
[c3/fa967a] process > get_run_info [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit

Something went wrong. Check error message below and/or log files.
Error executing process > 'evigene (R)'

Caused by:
Process evigene (R) terminated with an error exit status (1)

Command executed:

echo -e "\n-- Starting EviGene --\n"

cat R.Velvet.fa R.SOAP.fa R.SPADES.fa R.Trinity.fa R.TransABySS.fa >R.combined.fa

/home/sbravo/Descargas/TransPi/scripts/evigene/scripts/prot/tr2aacds.pl -tidy -NCPU 15 -MAXMEM 153600 -log -cdna R.combined.fa

echo -e "\n-- DONE with EviGene --\n"

cp okayset/combined.okay.fa R.combined.okay.fa
cp okayset/combined.okay.cds R.combined.okay.cds

if [ -d tmpfiles/ ];then
rm -rf tmpfiles/
fi

v=$( echo "2019.05.14" )
echo "EvidentialGene: $v" >evigene.version.txt
v=$( blastn -version | head -n1 | awk '{print $2}' )
echo "Blast: $v" >>evigene.version.txt
v=$( cd-hit -h | head -n1 | cut -f 1 -d "(" | cut -f 2 -d "n" )
echo "CD-HIT: $v" >>evigene.version.txt
v=$( exonerate -v | head -n1 | cut -f 5 -d " " )
echo "Exonerate: $v" >>evigene.version.txt

Command exit status:
1

Command output:

-- Starting EviGene --

-- DONE with EviGene --

Command error:
Use of uninitialized value $okaa in -f at /home/sbravo/Descargas/evigene/scripts/genes/../cdna_evigenesub.pm line 814.
Use of uninitialized value $altaa in -f at /home/sbravo/Descargas/evigene/scripts/genes/../cdna_evigenesub.pm line 815.
#readPubidTab(publicset/R.combined.pubids)= 8394

nin=63492, nok=16561, nfrag=168, nskipnotloc=45359, nskipdupfrag=1121, nskipdiffloc=283

#insertUniqExons= 167
#collectExonChains= 8166 of 8237 ids
#assignChainLoci
#n_class: ichain=5796 icalt=1144 icsub=1226 icdup=71
#n_alts : t1=5796 t2=550 t3=202 t4=103 t5=65 t6=46 t7=31 t8=21 t9=17 t10=15 t11=12 t12=8 t13=8 t14=6 t15=5 t16=5 t17=5 t18=4 t19=4 t20=3
Use of uninitialized value $okaa in -f at /home/sbravo/Descargas/evigene/scripts/genes/../cdna_evigenesub.pm line 814.
Use of uninitialized value $altaa in -f at /home/sbravo/Descargas/evigene/scripts/genes/../cdna_evigenesub.pm line 815.
#egr: FATAL Missing mrna
cp: cannot stat 'okayset/combined.okay.fa': No such file or directory

Work dir:
/home/sbravo/Descargas/TransPi/work/d2/04067d1da59bc7532be8e0e9217e1a

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

I don't understand the error. I do not know how to fix it.
Help me, please.

Thank you so much!
Scarleth
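A starting point for debugging this, sketched under the assumption that EviGene's output file naming simply changed between releases (if okayset/ turns out to be empty, the real failure is the earlier "#egr: FATAL Missing mrna" inside tr2aacds.pl itself):

# From the failing work dir, see what tr2aacds.pl actually produced
cd /home/sbravo/Descargas/TransPi/work/d2/04067d1da59bc7532be8e0e9217e1a
ls okayset/

# Copy whichever okay FASTA exists; the basename differs across EviGene releases
okay=$(ls okayset/*.okay*.fa 2>/dev/null | head -n1)
[ -n "$okay" ] && cp "$okay" R.combined.okay.fa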

TransPi on HPC failing to pull singularity image

Hi there,

I've been trying to use TransPi (love the concept and the benchmarking in the paper looks great!) to re-assemble some older public datasets. Running on an interactive slurm session of an HPC which has singularity installed:


(TransPi) chril1@hpc-gpu-002:~/TransPiDBs$ ./nextflow run ../TransPi/TransPi.nf --all --maxReadLen 150 --k 25,35,55,75,85 --reads '/home/chril1/NERC_PhyloCourse/transcriptome_data/Flagellophora_sp/SRR8641368_R[1,2].fastq.gz' --outdir Results_Flagellophora -profile singularity,TransPiContainer

N E X T F L O W ~ version 21.04.1
Launching ../TransPi/TransPi.nf [furious_einstein] - revision: 5a26a6e85d

TransPi - Transcriptome Analysis Pipeline v1.3.0-rc

TransPi.nf Directory: /gpfs/nhmfsa/bulk/share/data/mbl/share/workspaces/users/chril1/TransPi/TransPi.nf
Launch Directory: /gpfs/nhmfsa/bulk/share/data/mbl/share/workspaces/users/chril1/TransPiDBs
Results Directory: Results_Flagellophora
Work Directory: /gpfs/nhmfsa/bulk/share/data/mbl/share/workspaces/users/chril1/TransPiDBs/work
TransPi DBs: /home/chril1/TransPiDBs
Uniprot DB: /home/chril1/TransPiDBs/DBs/uniprot_db/uniprot_metazoa_33208.fasta
Busco DB: /home/chril1/TransPiDBs/DBs/busco_db/metazoa_odb10
Reads Directory: /home/chril1/NERC_PhyloCourse/transcriptome_data/Flagellophora_sp/SRR8641368_R[1,2].fastq.gz
Read Length: 150
Kmers: 25,35,55,75,85

Running TransPi with your dataset


Running the full TransPi analysis

executor > local (13)
[6c/876909] process > fasqc (SRR8641368_R) [100%] 1 of 1 ✔
[38/2daf75] process > fastp (SRR8641368_R) [100%] 1 of 1 ✔
[09/b36296] process > fastp_stats (SRR8641368_R) [100%] 1 of 1 ✔
[a5/77f26f] process > skip_rrna_removal (SRR8641368_R) [100%] 1 of 1 ✔
[94/71880f] process > normalize_reads (SRR8641368_R) [100%] 1 of 1 ✔
[de/71f78c] process > trinity_assembly (SRR8641368_R) [100%] 1 of 1 ✔
[de/bc6216] process > soap_assembly (SRR8641368_R) [100%] 1 of 1 ✔
[25/457622] process > velvet_oases_assembly (SRR8641368_R) [100%] 1 of 1 ✔
[84/17fe6c] process > rna_spades_assembly (SRR8641368_R) [100%] 1 of 1 ✔
[2e/672733] process > transabyss_assembly (SRR8641368_R) [100%] 1 of 1 ✔
[80/f0f8e6] process > evigene (SRR8641368_R) [ 0%] 0 of 1
[- ] process > rna_quast -
[- ] process > mapping_evigene -
[- ] process > busco4 -
[ca/050e8b] process > mapping_trinity (SRR8641368_R) [ 0%] 0 of 1
[- ] process > summary_evigene_individual -
[- ] process > busco4_tri -
[47/a485e7] process > skip_busco_dist (SRR8641368_R) [100%] 1 of 1, cached: 1 ✔
[- ] process > summary_busco4_individual -
[- ] process > get_busco4_comparison -
[- ] process > transdecoder_longorf -
[- ] process > transdecoder_diamond -
[- ] process > transdecoder_hmmer -
[- ] process > transdecoder_predict -
[- ] process > swiss_diamond_trinotate -
[- ] process > custom_diamond_trinotate -
[- ] process > hmmer_trinotate -
[- ] process > skip_signalP -
[- ] process > skip_tmhmm -
[- ] process > skip_rnammer -
[- ] process > trinotate -
[- ] process > get_GO_comparison -
[- ] process > summary_custom_uniprot -
[- ] process > skip_kegg -
[- ] process > get_transcript_dist -
[- ] process > summary_transdecoder_individual -
[- ] process > summary_trinotate_individual -
[- ] process > get_report -
[98/934751] process > get_run_info [100%] 1 of 1 ✔
Pulling Singularity image docker://null [cache /gpfs/nhmfsa/bulk/share/data/mbl/share/workspaces/users/chril1/TransPiDBs/singularityCache/null.img]
Error executing process > 'busco4_tri (SRR8641368_R)'

Caused by:
Failed to pull singularity image
command: singularity pull --name null.img.pulling.1664887974429 docker://null > /dev/null
status : 255
message:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://null: Error reading manifest latest in docker.io/library/null: errors:
denied: requested access to the resource is denied
unauthorized: authentication required


It seems that it gets all the way through building the assemblies, then crashes when trying to pull a singularity image for the BUSCO evaluation. I confess I'm not quite sure how to debug this one; do you have any ideas? I've also tried -profile conda and got environment-creation conflict errors out of that, and so seem to have reached an impasse.

Thanks very much for your attention,

Chris L
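One way to narrow this down: docker://null means the container variable for the BUSCO process resolved to nothing, so inspecting the merged configuration should show which profile left it unset. A sketch only; the grep is just a starting point:

# Print the configuration Nextflow actually resolves for this profile combination
nextflow config ../TransPi -profile singularity,TransPiContainer | grep -i container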

precheck issues

The pipeline looks really promising but I am having difficulty with the setup.

bash precheck_TransPi.sh $PWD

With either yes or no to singularity/docker, followed by

2) EUKARYOTA
4) Plants_(Kingdom) 
7) Poales_(Order)

The script successfully downloads the BUSCO v4 DB:

         -- Preparing files ... --


         -- DONE with BUSCO V4 database --


         -- No BUSCO V3 available for poales --

No BUSCO v3 database is available for Poales, so nextflow.config is never generated from the template.

What is the reasoning for requiring both versions of BUSCO databases? Would v4 not be sufficient?

Error in rRNAfilter and sortmerna

Hi @rivera10 ,

I have problems removing rRNA using the --rRNAfilter (sortmerna) and --rRNAdb (a 12.2 GB database) options. I hope you can point me in the right direction to resolve this issue.
I was able to run TransPi, and all steps completed fine when the rRNA removal step was skipped.

Here is my script

./nextflow run TransPi.nf --all --reads '/All_reads/LP_ALL_R[1,2].fastq.gz' \
    --k 25,41,53 --maxReadLen 75 -profile conda \
    --allBuscos --rescueBusco --buscoDist \
    --rRNAfilter --rRNAdb "/LSU_SSU/LSU_SSU.fasta" \
    --outdir Results_LP2 -w LP2_work

and here is what I got
[c7/479987] process > remove_rrna (LP_ALL_R) [ 0%] 0 of 1
. . .
[48/1fe9d5] process > get_run_info [100%] 1 of 1 ✔
Error executing process > 'remove_rrna (LP_ALL_R)'

Caused by:
Process remove_rrna (LP_ALL_R) terminated with an error exit status (135)

Command executed:

sortmerna --ref /LSU_SSU/LSU_SSU.fasta --reads left-LP_ALL_R.filter.fq --reads right-LP_ALL_R.filter.fq --threads 15 --aligned rRNAreads --other nonrRNAreads --paired_in --out2 --fastx --workdir .
mv rRNAreads.log LP_ALL_R_remove_rRNA.log
mv rRNAreads_fwd* LP_ALL_R_rRNA_reads.R1.fq
mv rRNAreads_rev* LP_ALL_R_rRNA_reads.R2.fq
mv nonrRNAreads_fwd* LP_ALL_R_no_rRNA.R1.fq
mv nonrRNAreads_rev* LP_ALL_R_no_rRNA.R2.fq
v=$( sortmerna -version | grep "SortMeRNA version" | awk '{print $3}' )
echo "SortMeRNA: $v" >sortmerna.version.txt

Command exit status:
135

Command output:
[pop:108] read_queue Popped read number: 92990059
[pop:108] write_queue Popped read number: 92990062
. . .
[pop:108] write_queue Popped read number: 94140063
[pop:108] read_queue Popped read number: 94190059
[pop:108] write_queue Po

Command error:
.command.sh: line 2: 21691 Bus error sortmerna --ref /LSU_SSU/LSU_SSU.fasta --reads left-LP_ALL_R.filter.fq --reads right-LP_ALL_R.filter.fq --threads 15 --aligned rRNAreads --other nonrRNAreads --paired_in --out2 --fastx --workdir .

Work dir:
/LP2_work/c7/4799873eae6b2df5b8c5ffeffc3aba

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

Regards!
David Paz
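One workaround worth trying here: exit status 135 is a bus error, which with a 12.2 GB reference often points at memory or mmap pressure. SortMeRNA accepts --ref multiple times, so splitting the database and lowering the thread count may get it through. A sketch, with LSU.fasta and SSU.fasta as hypothetical split files:

# Pass the reference in smaller pieces; fewer threads also lowers peak memory
sortmerna --ref /LSU_SSU/LSU.fasta --ref /LSU_SSU/SSU.fasta \
    --reads left-LP_ALL_R.filter.fq --reads right-LP_ALL_R.filter.fq \
    --threads 4 --aligned rRNAreads --other nonrRNAreads \
    --paired_in --out2 --fastx --workdir .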

Multiple taxid for database download

Add an option to accept multiple taxids for the custom database download in the precheck script. Currently, only one taxid can be used with the script.
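One possible shape for this, as a sketch only; download_db stands in for the precheck's existing per-taxid download logic:

#!/bin/bash
# Sketch: accept a comma-separated taxid list, e.g. "33208,4751"
taxids="$1"
for taxid in ${taxids//,/ }; do
    echo "Fetching custom database for taxid ${taxid}"
    download_db "${taxid}"   # hypothetical stand-in for the current download step
done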

error at TransPi_Report_Ind.Rmd

I got this error in the final report step. Could you give me a hint on how to solve it?

Cheers,
Joan

Something went wrong. Check error message below and/or log files.
Error executing process > 'get_report (1)'

Caused by:
Process get_report (1) terminated with an error exit status (1)

Command executed:

sample_id=$( cat input.1 )
cp /dades/jpp/bin/TransPi_Report_Ind.Rmd .
Rscript -e "rmarkdown::render('TransPi_Report_Ind.Rmd',output_file='TransPi_Report_${sample_id}.html')" ${sample_id} false true false false false false

Command exit status:
1

Command output:

[knitr progress output trimmed; chunks rendered before the failure: load_libraries, readstats_table, qual_plot, qual_plot2]

Command error:

processing file: TransPi_Report_Ind.Rmd

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

  last_plot

The following object is masked from 'package:stats':

  filter

The following object is masked from 'package:graphics':

  layout

Quitting from lines 87-100 (TransPi_Report_Ind.Rmd)
Error in 1:rqual[5, 1] : NA/NaN argument
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> plot_ly
Execution halted

Work dir:
/dades/jpp/TransPi/work/0b/979b0f5ca8ac4399d912ac57764e60

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
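The NA/NaN argument at 1:rqual[5, 1] suggests the quality table the report builds has a missing value in row 5, e.g. from an empty field in one of the stats files it reads. A debugging sketch; where rqual is populated is my assumption:

# Replicate the failure in place, then look for where rqual is filled in
cd /dades/jpp/TransPi/work/0b/979b0f5ca8ac4399d912ac57764e60
bash .command.run
grep -n "rqual" TransPi_Report_Ind.Rmd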

Combine processes that are shared between the various analysis modes

Currently, each analysis mode is run separately (e.g. --onlyAsm, --all), and common processes are not shared. Combining the common processes of the various analysis modes would remove duplicated code. It would also make it possible to use -resume when going from --onlyAsm to --all if needed.

Process `fastp (Ccelos_S1_L001)` terminated with an error exit status (127)

Dear all

Many thanks for putting together this great tool. I hit the error shown in this issue's title. Here is what I have done:

1- Clone the repository
git clone https://github.com/palmuc/TransPi.git

2- Move to the TransPi directory
cd TransPi

3- Install
bash precheck_TransPi.sh /media/david/90aba6e6-45be-4827-9400-ae7db2b6022c/

I used the singularity option.

4- Run the pipeline

nextflow run TransPi.nf --all --maxReadLen 150 --k 25,35,55,75,85 \
    --reads '/media/david/90aba6e6-45be-4827-9400-ae7db2b6022c/fastqc_files/*_R[1,2].fastq.gz' \
    --outdir out_Ccelo -profile singularity,TransPiContainer

Examples of the fastq file names: Ccelos_S1_L001_R1.fastq.gz ; Ccelos_S1_L001_R2.fastq.gz

ERROR:

Error executing process > 'fastp (Ccelos_S1_L001)'

Caused by:
Process fastp (Ccelos_S1_L001) terminated with an error exit status (127)

Command executed:

fastp -i Ccelos_S1_L001_R1.fastq.gz -I Ccelos_S1_L001_R2.fastq.gz -o left-Ccelos_S1_L001.filter.fq -O right-Ccelos_S1_L001.filter.fq --detect_adapter_for_pe --average_qual null --overrepresentation_analysis --html Ccelos_S1_L001.fastp.html --json Ccelos_S1_L001.fastp.json --thread 1 --report_title Ccelos_S1_L001

v=$( fastp --version 2>&1 | awk '{print $2}' )
echo "fastp: $v" >fastp.version.txt

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: fastp: command not found

Work dir:
/media/david/90aba6e6-45be-4827-9400-ae7db2b6022c/TransPi/work/51/626cb6ae865c314206ae95e7e67f70

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Here is the equipment and software info:
Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz ; Linux
Linux version 5.4.0-73-generic (buildd@lcy01-amd64-019) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04))

Many thanks

Ricardo
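A quick sanity check for this kind of "command not found" inside a containerized run, since it usually means the process never actually entered the image. The cache locations below are assumptions based on TransPi's defaults:

# List whatever images the run cached, then check that one provides fastp
ls singularityCache/ work/singularity/ 2>/dev/null
img=$(ls singularityCache/*.img 2>/dev/null | head -n1)
[ -n "$img" ] && singularity exec "$img" fastp --version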

User DB option

If the user already has the DBs downloaded in another directory (e.g. for use with other tools), provide an option to supply these paths when running the precheck and skip the generation of the DBs.
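One possible shape for this, sketched against the precheck's prompt style; the paths and directory layout are assumptions:

# Sketch: let the precheck accept an existing UniProt FASTA instead of downloading
read -rp "Path to an existing UniProt FASTA (leave empty to download): " uniprot_path
if [ -s "$uniprot_path" ]; then
    mkdir -p DBs/uniprot_db
    ln -sf "$(realpath "$uniprot_path")" DBs/uniprot_db/
    echo "Linked existing database; skipping download"
else
    echo "No existing database given; running the normal download"
fi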

Pfam-A.hmm.h3i -- not downloaded

Hello!

I am having issues running the "transdecoder_hmmer" step.

Do you know how to regenerate the HMMER DB files?

It appears they exist, but the .h3i file is empty:
-rw-r--rw- 1 root root 1459135873 Mar 8 17:44 Pfam-A.hmm
-rw-rw-r-- 1 root root 57253888 Mar 9 12:11 Pfam-A.hmm.h3f
-rw-rw-r-- 1 root root 0 Mar 9 12:11 Pfam-A.hmm.h3i
-rw-rw-r-- 1 root root 105185280 Mar 9 12:11 Pfam-A.hmm.h3m
-rw-rw-r-- 1 root root 123682816 Mar 9 12:11 Pfam-A.hmm.h3p

Thank you in advance!

Something went wrong. Check error message below and/or log files.
Error executing process > 'transdecoder_hmmer (Cprol)'

Caused by:
  Process `transdecoder_hmmer (Cprol)` terminated with an error exit status (1)

Command executed:

  echo -e "\n-- Starting HMMER --\n"

  hmmscan --cpu 8 --domtblout Cprol.pfam.domtblout /cm/shared/apps/TransPi/DBs/hmmerdb/Pfam-A.hmm Cprol.longest_orfs.pep

  echo -e "\n-- Done with HMMER --\n"

Command exit status:
  1

Command output:

  -- Starting HMMER --

Command error:

  Error: File format problem, trying to open HMM file /cm/shared/apps/TransPi/DBs/hmmerdb/Pfam-A.hmm.
  Opened /cm/shared/apps/TransPi/DBs/hmmerdb/Pfam-A.hmm.h3m, a pressed HMM file; but format of its .h3i file unrecognized

Work dir:
  /TransPi_files/work/11/5e810cef8a3a0929cdd99789408a73

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
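A likely fix, assuming the .h3i index was truncated when it was first generated: the pressed files are derived from Pfam-A.hmm, so they can be deleted and rebuilt with HMMER's hmmpress:

# Remove the truncated index files and rebuild them from the .hmm flatfile
cd /cm/shared/apps/TransPi/DBs/hmmerdb
rm -f Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p
hmmpress Pfam-A.hmm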
