anantharamanlab / metabolic

A scalable high-throughput metabolic and biogeochemical functional trait profiler

Perl 78.85% Shell 0.46% R 20.69%

metabolic's Introduction

METABOLIC

METabolic And BiogeOchemistry anaLyses In miCrobes
Current Version: 4.0
Tested on: Ubuntu 18.04.5 LTS (Linux 5.4.0-81-generic x86_64) (Sep 2021)

This software predicts metabolic and biogeochemical functional trait profiles for any given genome dataset. The genomes can be metagenome-assembled genomes (MAGs), single-cell amplified genomes (SAGs), or genomes sequenced from isolated strains. METABOLIC has two main implementations, METABOLIC-G and METABOLIC-C. METABOLIC-G.pl generates metabolic profiles and biogeochemical cycling diagrams for the input genomes and does not require sequencing reads. METABOLIC-C.pl generates the same output as METABOLIC-G.pl but also accepts metagenomic read data, which it uses to calculate genome coverage and to produce information on community metabolism. The results are parsed, and diagrams are produced for elemental/biogeochemical cycling pathways (currently nitrogen, carbon, sulfur, and "other").

Program Name	Program Description
METABOLIC-G.pl	Allows for classification of the metabolic capabilities of input genomes.
METABOLIC-C.pl	Allows for classification of the metabolic capabilities of input genomes, calculation of genome coverage, creation of biogeochemical cycling diagrams, and visualization of community metabolic interactions and contribution to biogeochemical processes by each microbial group.

Slides introducing METABOLIC (from a C-DEBI series meeting presentation) are available here: https://github.com/AnantharamanLab/METABOLIC/blob/master/METABOLIC_C-DEBI_slides.pdf

(The automated annotation of carbon fixation pathways has been updated; see the Appendix.)

If you use this program, please consider citing our paper, published in Microbiome:

Zhou, Z., Tran, P.Q., Breister, A.M. et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10, 33 (2022). https://doi.org/10.1186/s40168-021-01213-8

Installing and using METABOLIC

Please see the project home page for usage details and installation instructions:
https://github.com/AnantharamanLab/METABOLIC/wiki

metabolic's Issues

Possible to concatenate HMM files?

Hi,
The software creates ~10,000 hmm files per genome. I ran the analysis on ~100 genomes, which created ~100,000 files, and the cluster I'm using struggles to process that many. Would it be possible to concatenate them so that there is one file per genome?
Best
Gregoire
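
Until something like this exists in the tool itself, one workaround is to merge the small files after a run finishes. Below is a minimal Python sketch of that idea. It assumes (hypothetically) that each genome's files sit in their own subdirectory under one root folder and share a common suffix; the function name, the suffix, and the merged file name are all made up for illustration and do not come from METABOLIC.

```python
from pathlib import Path


def concatenate_per_genome(root, suffix=".txt", merged_name="merged_hits.txt",
                           delete_parts=False):
    """Merge all files ending in `suffix` inside each subdirectory of `root`.

    The one-subdirectory-per-genome layout is an assumption for this sketch.
    Returns the list of merged files that were written.
    """
    merged = []
    for genome_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        parts = sorted(
            f for f in genome_dir.iterdir()
            if f.is_file() and f.name.endswith(suffix) and f.name != merged_name
        )
        if not parts:
            continue
        target = genome_dir / merged_name
        with target.open("w") as out:
            for part in parts:
                # Record where each chunk came from so it can be traced back.
                out.write(f"# source: {part.name}\n")
                text = part.read_text()
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")
        if delete_parts:
            # Only remove the small files once the merged file is complete.
            for part in parts:
                part.unlink()
        merged.append(target)
    return merged
```

Calling it with `delete_parts=True` is what actually reduces the file count; leaving the default keeps the originals so the merge can be checked first.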

Error(s) with test run

Hi @patriciatran and @ChaoLab,

I've recently re-installed METABOLIC using conda [https://github.com//issues/27] and git clone https://github.com/AnantharamanLab/METABOLIC.git in a new environment. When I tried running the command with the test dataset, I got some errors that I assume are perl-related.

This is the command I used:
perl METABOLIC-G.pl -test true

And this was the output + errors/warnings:

[2021-03-22 20:46:38] The Prodigal annotation is running...
[2021-03-22 20:47:23] The Prodigal annotation is finished
[2021-03-22 20:47:23] The hmmsearch is running with 5 cpu threads...
[2021-03-22 21:34:30] The hmmsearch is finished
readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.amoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrE.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrF.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.dsrH.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoA.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoC.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1361.
Use of uninitialized value in concatenation (.) or string at METABOLIC-G.pl line 1373.
Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

Parse failed (sequence file METABOLIC_out/tmp.pmoB.check.faa):
Premature EOF in parsing FASTA name/description line

readline() on closed filehandle _IN at METABOLIC-G.pl line 1337.
Use of uninitialized value $seq in pattern match (m//) at METABOLIC-G.pl line 356.
[2021-03-22 21:34:35] The hmm hit result is calculating...
[2021-03-22 21:34:35] Generating each hmm faa collection...
[2021-03-22 21:34:35] Each hmm faa collection has been made
[2021-03-22 21:34:35] The KEGG module result is calculating...
[2021-03-22 21:38:26] The KEGG identifier (KO id) result is calculating...
[2021-03-22 21:38:26] The KEGG identifier (KO id) seaching result is finished
[2021-03-22 21:38:26] Searching CAZymes by dbCAN2...
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2021-03-22 21:41:10] dbCAN2 searching is done
[2021-03-22 21:41:10] Searching MEROPS peptidase...
[2021-03-22 21:42:37] MEROPS peptidase searching is done
Warning message:
package ‘openxlsx’ was built under R version 4.0.3 
[2021-03-22 21:42:39] METABOLIC table has been generated
[2021-03-22 21:42:39] Drawing element cycling diagrams...
Loading required package: shape
[2021-03-22 21:42:41] Drawing element cycling diagrams finished

Please let me know if you need any additional information - thanks in advance!
Looking forward to re-running this smoothly and applying it to my datasets.
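
For anyone triaging a similar log: the "Premature EOF in parsing FASTA name/description line" messages are consistent with the temporary `tmp.*.check.faa` files being empty or cut off, though that is a guess from the log alone. A small Python sketch like the one below can list such files in the output folder. The function name and the three reason strings are made up for illustration; only the file name pattern is taken from the log above.

```python
from pathlib import Path


def find_unparseable_fasta(directory, pattern="tmp.*.check.faa"):
    """Return (file name, reason) for FASTA files that look empty or truncated."""
    problems = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text().strip()
        if not text:
            problems.append((path.name, "empty file"))
        elif not text.startswith(">"):
            problems.append((path.name, "no FASTA header"))
        elif all(line.startswith(">") for line in text.splitlines()):
            # Only header lines are present, so there is no sequence to parse.
            problems.append((path.name, "header without sequence"))
    return problems
```

Running `find_unparseable_fasta("METABOLIC_out")` while the run is still in progress would be needed if the temporary files are deleted afterwards, which is not known from the log.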

Reads not mapping to metagenomes

Describe the bug
I have 40 metagenomes from geothermal systems, which I assembled and binned separately. From these, I produced 1,540 MAGs. The output of METABOLIC-C shows the predicted pathways for each MAG, but no read percentage is given.

To Reproduce
I believe it is more user error than a bug. My input commands are as follows:
perl METABOLIC-C.pl -t 40 -m-cutoff 0.60 -in-gn /home/lloydlab/TJ_BMS/Panama_2018_BMS/ALL_METAW_REFINED_BINS -kofam-db full -r /home/lloydlab/TJ_BMS/Panama_2018_BMS/Panama_Reads.txt -o Panama_Metabolic_Run -tax family

My .txt doc is as follows:

MAG directory looks like this (not all FASTA files would fit on screen):

Read directory looks like this:

Some of the output files:

When I run only the MAGs and reads from two different metagenomes, everything seems to work fine. But once all are included, I get the above output. Have I missed something? Any help will be greatly appreciated.

Desktop (please complete the following information):

OS: Linux
Terminal

What happens with several read pairs?

Hi,
Something I don't find very clear in the documentation is what happens to the "figures" if there are several read pairs.
I wanted to compare metagenomic samples to check for differences in metabolism and MAG abundance after a coassembly.
If I have 5 samples, should I run the script 5 times, with one sample each time, for example to obtain different MW-scores for each sample?
Thanks!
Greg

Metabolic network diagrams

Hello,

Is it possible to construct the metabolic network diagrams at taxonomic levels other than phylum? Such as order, family etc?

Thank you,
Sarha

Question: Can I run my own data using "test=true" option?

Hi,
I have two questions.

1. Can I run my own data using the "test=true" option?
When I run the test data, it seems to work smoothly. But when I run my own data with "perl METABOLIC-C.pl -t 40 -m-cutoff 0.75 -in-gn Genome_files -kofam-db full -r omic_reads_parameters.txt -o METABOLIC_out", I get many errors. For example:
Traceback (most recent call last):
File "/project/qcx/test/METABOLIC_running_folder/METABOLIC/Accessory_scripts/hmmscan-parser-dbCANmeta.py", line 36, in
with open('temp') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'temp'.
I don't know why this happens. So, can I put my data into the test_files directory and run the program with the "test=true" option? Will the results differ from those of a normal run?

2. Can I directly input data from several metagenomes?
In your test data, there is only one sample in the METABOLIC_test_reads directory (SRR3577362_sub_1.fastq and SRR3577362_sub_2.fastq). Can I input more samples (sample1_1.fastq; sample1_2.fastq; sample2_1.fastq; sample2_2.fastq and so on)? Will they be read automatically?
Thank you for your help in advance!
Bests,
Qicheng

Show the help if no arguments to main command METABOLIC-G.pl

Hi,
When you run the command perl METABOLIC-G.pl without any arguments, the software still attempts to run and gives a log such as:

mkdir: cannot create directory ‘/home/michoug/Softwares/METABOLIC’: File exists
readline() on closed filehandle IN at METABOLIC-G.pl line 158.
readline() on closed filehandle IN at METABOLIC-G.pl line 175.
mkdir: cannot create directory ‘/home/michoug/Softwares/METABOLIC’: File exists
sh: line 1: /intermediate_files: No such file or directory
Use of uninitialized value $input_protein_folder in concatenation (.) or string at METABOLIC-G.pl line 220.
ls: cannot access '/*.faa': No such file or directory

and then continues in a seemingly infinite loop:

Use of uninitialized value $hmm in hash element at METABOLIC-G.pl line 291, <IN> line 1.
Use of uninitialized value $hmm in hash element at METABOLIC-G.pl line 356.
Use of uninitialized value $hmm in hash element at METABOLIC-G.pl line 291, <IN> line 2.

May I suggest showing the help message in this case?
Best
Greg
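
To make the request concrete, here is a sketch of the "no arguments means show help" pattern, written in Python with `argparse` rather than in Perl. It is not METABOLIC's code. The flag names (`-in-gn`, `-in`, `-o`, `-t`) are the ones that appear in the reports on this page; the help strings, the function names, and the rule that one of the two input flags is required are assumptions made for this illustration.

```python
import argparse


def build_parser():
    # Flag names mirror those seen in the reports on this page; the help
    # texts and the parser itself are illustrative, not METABOLIC's own code.
    parser = argparse.ArgumentParser(prog="METABOLIC-G.pl")
    parser.add_argument("-in-gn", dest="genome_dir", help="folder of genome files")
    parser.add_argument("-in", dest="protein_dir", help="folder of .faa protein files")
    parser.add_argument("-o", dest="out_dir", help="output folder")
    parser.add_argument("-t", dest="threads", type=int, help="number of CPU threads")
    return parser


def parse_or_help(argv):
    """Return (args, "") for a usable command line, else (None, help_text)."""
    parser = build_parser()
    if not argv:
        # Nothing was passed at all: show the help instead of starting a run.
        return None, parser.format_help()
    args = parser.parse_args(argv)
    if args.genome_dir is None and args.protein_dir is None:
        # Assumed rule for this sketch: some input folder must be given.
        return None, parser.format_help()
    return args, ""
```

A caller would print the returned help text and exit whenever the first element is `None`, so the pipeline never starts with undefined input paths.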

Empty `network` and `energy_flow` input files

Hi,

I have a full completed run of METABOLIC, and the following two files are empty:

1. Metabolic_energy_flow_input.txt
2. Metabolic_network_input.txt

Therefore, the Community Plot is also empty. I have 662 bins and provided the reads for each of the bins, in a file that looks the following:

# Read pairs:
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R10_GL11_UP_3/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r2.preprocessed.fq
/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r1.preprocessed.fq,/work/projects/nomis/metaG_JULY_2020/IMP3/GL_R11_GL15_UP_1/run1/Preprocessing/mg.r2.preprocessed.fq

Could you please let me know how best to proceed?
Thank you!
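
One thing that stands out in the listing above is that each read pair appears four times. Whether that explains the empty files is not established here, but it is easy to check for before a run. The Python sketch below parses a reads file laid out like the one shown (a `#` comment line, then one `forward,reverse` pair per line) and reports repeated pairs and missing files. The function names and message wording are made up for illustration, and the layout is inferred only from this report.

```python
from collections import Counter
from pathlib import Path


def parse_read_pairs(text):
    """Parse 'forward,reverse' lines, skipping blank lines and '#' comments."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"expected 'forward,reverse' but got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def check_read_pairs(pairs):
    """Return human-readable problems: repeated pairs and files that are missing."""
    problems = []
    for pair, count in Counter(pairs).items():
        if count > 1:
            problems.append(f"pair listed {count} times: {pair[0]}")
    for path in sorted({p for pair in pairs for p in pair}):
        if not Path(path).is_file():
            problems.append(f"file not found: {path}")
    return problems
```

An empty list from `check_read_pairs` only means the file is internally consistent; it says nothing about whether the reads actually map to the bins.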

Error when ggplot was used to plot

When I ran METABOLIC-C.pl, I got this error: "Error: Must request at least one colour from a hue palette." The CommunityPlot.PDF was also empty.
The log file is attached below.
nohup.txt

I don't know how to deal with this. Could you please help me with this?

METABOLIC-C error with -r

To run METABOLIC-C I am using the flags -in-gn and -r. According to the instructions, I need to provide the path to the paired-end reads. I tested two approaches: 1) providing a path to the files, and 2) providing a text file that contains the paths to the files. When providing the paths, I tested three different options for the syntax. Here are the commands:

1) Providing a path to the files:

bsub -n 1 -R "rusage[mem=8000]" METABOLIC-G.pl -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/Gamma/ -r /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/ -o Guaymas_3

2) Providing a text file with the paths to the files (content of the text file below):

bsub -n 1 -R "rusage[mem=8000]" METABOLIC-C.pl -t 1 -in-gn /cluster/work/magna/databases_metabolic/METABOLIC_test_files/Guaymas_Basin_genome_files/Gamma/ -r omic_reads_parameters.txt -o Guaymas_9

2.1) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/*.fastq
2.2) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/
2.3) /cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_1.fastq
/cluster/work/magna/databases_metabolic/METABOLIC_test_files/METABOLIC_test_reads/SRR3577362_sub_2.fastq

When trying all of the options cited above, I get the following error:
Use of uninitialized value in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1788, <__IN> line 1.
Use of uninitialized value in concatenation (.) or string at /cluster/apps/nss/metabolic/16082021/x86_64/METABOLIC-C.pl line 1804.
stat: Bad file descriptor
Warning: Could not open read file "-S" for reading; skipping...
stat: Bad file descriptor
Warning: Could not open read file "Guaymas_11/All_gene_collections_mapped.1.sam" for reading; skipping...
Error: No input read files were valid
(ERR): bowtie2-align exited with value 1
[E::hts_open_format] Failed to open file "Guaymas_11/All_gene_collections_mapped.1.sorted.bam" : No such file or directory
samtools index: failed to open "Guaymas_11/All_gene_collections_mapped.1.sorted.bam": No such file or directory
rm: cannot remove 'Guaymas_11/All_gene_collections_mapped.1.sam': No such file or directory
rm: cannot remove 'Guaymas_11/.bam': No such file or directory
rm: cannot remove 'Guaymas_11/.bai': No such file or directory

Could you tell me how to fix this issue and what the correct syntax for the command is?

Thank you very much!

Paula.
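
The only reads file shown in full on this page is the one in the "Empty `network` and `energy_flow` input files" report: a `# Read pairs:` line followed by one line per sample, with the forward and reverse files separated by a comma. Assuming that is the layout the `-r` file should follow (please confirm against the wiki), a file in that shape can be generated from a reads folder with a short Python sketch. The function name and the `_1.fastq` / `_2.fastq` suffix defaults are assumptions based on the file names quoted above.

```python
from pathlib import Path


def build_reads_file(reads_dir, forward_suffix="_1.fastq", reverse_suffix="_2.fastq"):
    """Return reads-file text with one 'forward,reverse' line per sample.

    The header line and the comma-separated layout copy the example in
    another report on this page; they are not checked against METABOLIC.
    """
    lines = ["# Read pairs:"]
    for forward in sorted(Path(reads_dir).glob(f"*{forward_suffix}")):
        sample = forward.name[: -len(forward_suffix)]
        reverse = forward.with_name(sample + reverse_suffix)
        if not reverse.is_file():
            raise FileNotFoundError(f"no mate found for {forward}")
        # Absolute paths avoid surprises when the job runs from another folder.
        lines.append(f"{forward.resolve()},{reverse.resolve()}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to `omic_reads_parameters.txt` gives one line per sample, in contrast to option 2.3 above, where the two mates sit on separate lines.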

Can't find the BAM file when running test or real data

Hi Chao,

I ran into a problem: the BAM file can't be found when I run either the test files or my real data. I have checked the read paths and made sure they are correct. Could you provide some suggestions? Thanks.

Please find the log information below.

[2021-11-27 22:32:49] The Prodigal annotation is running...
[2021-11-27 22:33:46] The Prodigal annotation is finished
[2021-11-27 22:33:46] The hmmsearch is running with 60 cpu threads...
[2021-11-27 22:37:50] The hmmsearch is finished
[2021-11-27 22:37:57] Generating each hmm faa collection...
[2021-11-27 22:37:57] Each hmm faa collection has been made
[2021-11-27 22:37:57] The KEGG module result is calculating...
[2021-11-27 22:43:29] The KEGG identifier (KO id) result is calculating...
[2021-11-27 22:43:29] The KEGG identifier (KO id) seaching result is finished
[2021-11-27 22:43:29] Searching CAZymes by dbCAN2...
[2021-11-27 22:50:10] dbCAN2 searching is done
[2021-11-27 22:50:10] Searching MEROPS peptidase...
[2021-11-27 22:51:46] MEROPS peptidase searching is done
[2021-11-27 22:51:48] METABOLIC table has been generated
[2021-11-27 22:51:48] Drawing element cycling diagrams...
[E::hts_open_format] Failed to open file "METABOLIC_out/All_gene_collections_mapped.1.sorted.bam" : No such file or directory
samtools index: failed to open "METABOLIC_out/All_gene_collections_mapped.1.sorted.bam": No such file or directory
rm: cannot remove 'METABOLIC_out/.bam': No such file or directory
rm: cannot remove 'METABOLIC_out/.bai': No such file or directory
Loading required package: shape
[2021-11-27 22:53:14] Drawing element cycling diagrams finished
[2021-11-27 22:53:14] Drawing metabolic handoff diagrams...
[2021-11-27 22:53:18] Drawing metabolic handoff diagrams finished
[2021-11-27 22:53:18] Drawing energy flow chart...
==> Processed 112/120 markers (93%) |██████████████ | [384.93marker/s, ETA 00:00
==> Processed 22/45560 sequences (0%) | | [214.84sequence/s, ETA 0
==> Processed 44/45560 sequences (0%) | | [209.74sequence/s, ETA 0
==> Processed 95/45560 sequences (0%) | | [316.98sequence/s, ETA 0
==> Processed 146/45560 sequences (0%) | | [370.88sequence/s, ETA
==> Processed 197/45560 sequences (0%) | | [403.23sequence/s, ETA
==> Processed 245/45560 sequences (1%) | | [418.63sequence/s, ETA
==> Processed 296/45560 sequences (1%) | | [435.30sequence/s, ETA

Is there a Dockerfile or Docker image for METABOLIC? It is too difficult to install because of its many dependencies. Thanks a lot.

Problem loading .faa files

Hello,

I tried to run METABOLIC with the following settings:

$ perl ~/user_data/Programs/METABOLIC/METABOLIC-G.pl -in GUT_GENOME000001.fna.genes.faa -o metabolic_gamma_v1 -t 6

And immediately get the following error:

ls: cannot access 'GUT_GENOME000001.fna.genes.faa/*.faa': Not a directory
sh: 1: cannot create GUT_GENOME000001.fna.genes.faa/faa.total: Directory nonexistent
mv: failed to access 'GUT_GENOME000001.fna.genes.faa/total.faa': Not a directory
[2020-09-02 17:42:47] The hmmsearch is running with 6 cpu threads...

Error: Failed to open sequence file GUT_GENOME000001.fna.genes.faa/total.faa for reading



With the last line repeating forever. It appears that the program is assuming that I'm passing it a folder containing a bunch of .faa files, rather than the .faa files themselves. I can get around this by putting my .faa files into a folder and passing the folder as the argument, so no rush on fixing, but I just wanted to point this out.

Best,
Matt
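
The workaround described above (put the .faa files into a folder and pass the folder) can be sketched as follows. This is a minimal sketch, not part of METABOLIC: the helper name `stage_faa_files` and the folder name `faa_input` are made up for illustration, and the printed command only reuses the flags shown in the report.

```python
import shutil
import tempfile
from pathlib import Path


def stage_faa_files(faa_files, staging_dir):
    """Copy individual .faa files into one folder and return that folder."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for faa in faa_files:
        src = Path(faa)
        if src.suffix != ".faa":
            raise ValueError(f"expected a .faa file, got {src.name}")
        shutil.copy2(src, staging / src.name)
    return staging


if __name__ == "__main__":
    # Demo with a throwaway file; real use would pass actual protein FASTA paths.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "GUT_GENOME000001.fna.genes.faa"
        demo.write_text(">gene_1\nMKT\n")
        folder = stage_faa_files([demo], Path(tmp) / "faa_input")
        # The folder, not the single file, is what gets passed to -in.
        print(f"perl METABOLIC-G.pl -in {folder} -o metabolic_gamma_v1 -t 6")
```

Copying rather than moving leaves the original files untouched, at the cost of some extra disk space.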

How do I explain the percentage of coverage

I thought the coverage in files such as draw_carbon_cycle_total.pdf was the share of the whole community made up of MAGs harboring the specific function, i.e., (sequences mapped to MAGs)/(total sequences in the metagenomic datasets). However, I got a coverage value of 100%, which would mean that the MAGs I provided to METABOLIC account for 100% of the whole community. This is impossible.
So, could you explain what the coverage refers to?
Thank you!
Best,
Qicheng

make_pepunit_db.pl missing in Accessory scripts folder

Error in METABOLIC_result.xlsx

Hi,
I ran the 5_genomes test. When I opened METABOLIC_result.xlsx, it was first "repaired" by Excel, and the log said: "Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded."
Secondly, 4 of the 5 genomes don't have any genes, while the 5th genome has some genes present but its hits come from several genomes, e.g. CP002100.1_595,CP002100.1_595,CP031156.1_1669,CP031156.1_1669 for K00817.hmm

METABOLIC_result.xlsx

Question: What does 'Coverage' in the total C/N/S cycles mean?

Hi all,

Just a quick question regarding the coverage value in the draw_carbon_cycle_total.pdf plots.

For example, I have the attached output which shows that 583 genomes are involved in Organic Carbon Oxidation with a coverage of ~45.1%.

What does coverage here represent and how is it calculated, especially in the context of the 583 genomes?

Thank you very much for your help!
draw_carbon_cycle_total.pdf

RuBisCo hmm output parsing

Hello,
I'd like to report strange behavior in the parsing of the RuBisCo hmm outputs. There are hits in rubisco_form_III.hmm.total.hmmsearch_result.txt (score threshold = 450). However, in the HMMHitNum spreadsheet RuBisCo Form III is absent in all MAGs. There are also hits in K01601.hmm.total.hmmsearch_result.txt (threshold = 409.67 score), and the KEGGModuleStepHit sheet shows irregular results: depending on the module, the step is shown as absent or present, although the module step should be the same protein/KO. See picture below:

I am not sure whether it is really a bug, as I don't precisely understand how the noise and trusted cutoffs work as described in the manuscript. Is there any filtering step after the hmm search using defined score thresholds? In any case, BLASTP shows the closest proteins are RuBisCo form III, with ca. 55% identity against RefSeq and 90% against NR.

METABOLIC v4.0

Protein in question:

>MAG03_10_4
MVHEKYEDFVELSYKPSKKELLCSFYVEPAAGESIKRAAGAVASESSVGTWTSVPGLHLKHVMKIAATCYEINGNWIKIAYPVENFEPGSLPQIFSSIAGNVFGMKAVRNLRLHDVEWPLSLKRSFPGPQFGIDGIRKILKVQGRPITASVPKPKIGMTAEEHAQIGYKIWTGGFDLLKDDENLTSQKFDRFEDRVKHSMRMREKAEKETGERKACLLNITAPFQEMVKRAKLVSDYGNEYVMVDMLTIGWSALQGIREVCEELKLALYAHRAFHAAFTRNRRHGMSMLVVAESARLAGVDNVHIGTVVGKLESPKEEVLAIHERMQKQKIESDSRQHLFGEDWGSMKKVMSTSSGGLHPGLIPKIIKMLGRDIAIQVGGGVHGHPDGSKAGATAVMQAIDAALKEIPLREYAKTHKELKAALVQWGYLMPR

KEGG module steps are not reflecting reality.

As mentioned in #42, I experienced some issues with the KEGG module step results table. Whenever a KEGG module is deemed present/absent by the completeness threshold, all of its module steps are also marked present/absent, regardless of whether the individual steps are actually present or absent.

If you run the METABOLIC test data, you can clearly see that the individual steps of a module are either all absent or all present. E.g. the results for Pentose phosphate pathway, non-oxidative phase, fructose 6P => ribose 5P show:

Whereas the reality is this (as seen in hmm hits/extracted AA):

The bug might have been introduced when correcting #12 as the attached old .xlsx shows the module steps correctly.

METABOLIC v.4, rest of packages versions are in #27 (comment)

Thanks for looking into it.

Michal

METABOLIC_result.xlsx cannot be opened

Hi, thanks for the nice tool! When dealing with many genomes, e.g. 300, the produced METABOLIC_result.xlsx cannot be opened, with a message saying the file is corrupt and cannot be opened. With 2 or 3 genomes it works well. What could be the issue? Thanks!
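
One way to narrow down a "corrupt" workbook before opening it in Excel is to check the file as a ZIP archive, since an .xlsx file is a ZIP package that must contain a `[Content_Types].xml` part. The sketch below uses only Python's standard `zipfile` module; the function name `xlsx_problem` is made up for illustration. It checks the container only and will not catch invalid XML inside the worksheets, which Excel may also report as corruption.

```python
import zipfile


def xlsx_problem(path):
    """Return None if the .xlsx container looks intact, else a short description."""
    if not zipfile.is_zipfile(path):
        return "not a ZIP container (file may be truncated or not an .xlsx)"
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first member failing its CRC check, or None.
        bad_member = zf.testzip()
        if bad_member is not None:
            return f"corrupt archive member: {bad_member}"
        if "[Content_Types].xml" not in zf.namelist():
            return "missing [Content_Types].xml (not a valid workbook package)"
    return None
```

A result of None means the archive itself is readable, which would point towards the sheet contents rather than an interrupted write.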

Wrong results in METABOLIC_result.xlsx

Thank you so much for developing this extremely useful tool.
I updated this tool to the latest version one week ago. After I ran it with my genomes, the contents of METABOLIC_result.xlsx were as follows:

Category	Function	Gene abbreviation	Gene name	Hmm file	Corresponding KO	Reaction	Substrate	Product	Gn0001 Hmm presence	Gn0001 Hit numbers	Gn0001 Hits	Gn0002 Hmm presence	Gn0002 Hit numbers	Gn0002 Hits
Thermophilic specific	Thermophilic specific	rgy	reverse gyrase	TIGR01054.hmm	K03170	Thermophilic specific	N/A	N/A	Absent	0	None	Absent	0	None
Amino acid utilization	4-aminobutyrate aminotransferase and related aminotransferases	4-aminobutyrate aminotransferase and related aminotransferases	4-aminobutyrate aminotransferase and related aminotransferases	K00823.hmm, K07250.hmm, K13524.hmm, K14268.hmm, K03918.hmm	K00823, K07250, K13524, K14268, K03918	4-aminobutanoate + 2-oxoglutarate = succinate semialdehyde + L-glutamate [RN:R01648]	4-aminobutanoate; 2-oxoglutarate	succinate semialdehyde; L-glutamate	Absent	0	None	Absent	0	None
Amino acid utilization	Aminotransferase class I and II	aminotransferase class I and II	aminotransferase class I and II	K05825.hmm	K05825	L-2-aminoadipate + 2-oxoglutarate = 2-oxoadipate + L-glutamate [RN:R01939]	L-2-aminoadipate; 2-oxoglutarate	2-oxoadipate; L-glutamate	Present	1	None	Present	2	None
Amino acid utilization	Phosphoserine aminotransferase	phosphoserine aminotransferase	phosphoserine aminotransferase	K00831.hmm	K00831	O-phospho-L-serine + 2-oxoglutarate = 3-phosphooxypyruvate + L-glutamate [RN:R04173]; 4-phosphooxy-L-threonine + 2-oxoglutarate = (3R)-3-hydroxy-2-oxo-4-phosphooxybutanoate + L-glutamate [RN:R05085]	O-phospho-L-serine; 2-oxoglutarate; 4-phosphooxy-L-threonine	3-phosphooxypyruvate; L-glutamate; (3R)-3-hydroxy-2-oxo-4-phosphooxybutanoate	Present	1	None	Present	1	None
Amino acid utilization	Ornithine/acetylornithine aminotransferase	ornithine/acetylornithine aminotransferase	ornithine/acetylornithine aminotransferase	K00819.hmm, K00821.hmm, K05830.hmm, K00840.hmm	K00819, K00821, K05830, K00840	L-ornithine + a 2-oxo carboxylate = L-glutamate 5-semialdehyde + an L-amino acid [RN:R01343]	L-ornithine; 2-oxo carboxylate	L-glutamate 5-semialdehyde; L-amino acid	Absent	0	None	Absent	0	None
Amino acid utilization	Branched-chain amino acid aminotransferase/4-amino-4-deoxychorismate lyase	branched-chain amino acid aminotransferase/4-amino-4-deoxychorismate lyase	branched-chain amino acid aminotransferase/4-amino-4-deoxychorismate lyase	K00826.hmm, K02619.hmm, K03342.hmm	K00826, K02619, K03342	L-leucine + 2-oxoglutarate = 4-methyl-2-oxopentanoate + L-glutamate [RN:R01090]	L-leucine; 2-oxoglutarate	4-methyl-2-oxopentanoate; L-glutamate	Absent	0	None	Absent	0	None
Amino acid utilization	Aspartate/tyrosine/aromatic aminotransferase	aspartate/tyrosine/aromatic aminotransferase	aspartate/tyrosine/aromatic aminotransferase	K00812.hmm, K00813.hmm, K11358.hmm, K00832.hmm	K00812, K00813, K11358, K00832	L-aspartate + 2-oxoglutarate = oxaloacetate + L-glutamate [RN:R00355]	L-aspartate; 2-oxoglutarate	oxaloacetate; L-glutamate	Absent	0	None	Absent	0	None
Amino acid utilization	Histidinol-phosphate/aromatic aminotransferase	histidinol-phosphate/aromatic aminotransferase	histidinol-phosphate/aromatic aminotransferase	K00817.hmm	K00817	L-histidinol phosphate + 2-oxoglutarate = 3-(imidazol-4-yl)-2-oxopropyl phosphate + L-glutamate [RN:R03243]	L-histidinol phosphate; 2-oxoglutarate	3-(imidazol-4-yl)-2-oxopropyl phosphate; L-glutamate	Present	1	None	Present	1	None

The Gn0001 Hits and Gn0001 Hmm presence columns don't match each other: rows marked Present with a hit number of 1 or 2 still show None under Hits.
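
A mismatch like this can be listed automatically. The sketch below assumes the worksheet has been exported as tab-separated text with the column names shown in the table above; the function name `find_inconsistent_rows` is made up for illustration and is not part of METABOLIC.

```python
import csv
import io


def find_inconsistent_rows(tsv_text, genome):
    """Return gene abbreviations whose presence, hit count and hits columns disagree."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    inconsistent = []
    for row in reader:
        present = row[f"{genome} Hmm presence"] == "Present"
        n_hits = int(row[f"{genome} Hit numbers"])
        has_hits = row[f"{genome} Hits"] not in ("", "None")
        # All three columns should tell the same story for a given gene.
        if present != (n_hits > 0) or present != has_hits:
            inconsistent.append(row["Gene abbreviation"])
    return inconsistent


if __name__ == "__main__":
    sample = (
        "Gene abbreviation\tGn0001 Hmm presence\tGn0001 Hit numbers\tGn0001 Hits\n"
        "rgy\tAbsent\t0\tNone\n"
        "aminotransferase class I and II\tPresent\t1\tNone\n"
    )
    print(find_inconsistent_rows(sample, "Gn0001"))
```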

Excel file cannot be opened

Hello,
When I ran the script with a specified m-cutoff, the Excel file turned out to be corrupt and could not be opened.
Emine

Coverages and genomes values in nutrient cycling diagrams when using metaT reads

Hi there,
I have a quick question -- when looking at the results from a METABOLIC-C run that used metatranscriptomic reads, how am I to interpret the "Genomes" and "Coverage" in the summary nutrient cycling diagrams? For example, would a result of "Genomes: 2" and "Coverage = 0%" indicate that two genomes contain that step of the process, but 0% of the reads map to it? I am confused about the percentage part.
Additionally, what makes something show up as red (present) in this diagram? Is there a percentage cutoff?
Thank you!!
Best,
Joy

metagenomics reads

Describe the bug
Do metagenomic reads have to be unzipped? Can I run this using zipped reads?
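
Whether METABOLIC-C reads gzip-compressed files directly is not answered here. If it turns out that it does not, the reads can be decompressed first. The sketch below uses only Python's standard library; the function name `gunzip_reads` is made up for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_reads(gz_path, out_dir):
    """Decompress one .gz reads file into out_dir and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"{gz_path.name} does not end in .gz")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the trailing .gz, so sample_1.fastq.gz -> sample_1.fastq
    out_path = out_dir / gz_path.stem
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return out_path
```

The original compressed file is left in place, so the uncompressed copy can be deleted after the run.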

Error in the instructions: All_Module_KO_ids.txt

Hi,
There is an error in the installation instructions:
the command mv All_Module_KO_ids.txt profiles fails because you are in the kofam_database folder while the file is in the root folder.
It should be mv ../All_Module_KO_ids.txt profiles
Best
Greg

Pod::Perldoc::ToTerm module issue?

Thresholds?

Can someone tell me where to find the thresholds used for the hmms? I've been trying to figure out where the cutoff for something like amoA is set and I don't easily see it. I know it's a TIGRFAM you use, but I don't know if you are using TCs or an E-value.

Discrepancy between v1.3 and v3.0

Hello,

I just updated METABOLIC to its latest version. However, I noted several discrepancies between the old output table and the new one (see files below). Specifically, all K profiles are absent. It does seem to me that the newest version of METABOLIC is not searching the K profiles. None of them are found in the intermediate files folder.

Is there a way to download previous versions of METABOLIC? I have unfortunately removed v1.3.

Please let me know how to fix this.

Thank you,

--
Sergio A. Muñoz-Gómez, Ph.D.
Postdoctoral Research Scholar
Center for Mechanisms of Evolution
Biodesign Institute
Arizona State University

METABOLIC_result_old.xlsx
METABOLIC_result.xlsx

Question about `sequential flow results`

Hi..

I have a question about the attached figure. On the X-axis, there are a few variables such as M+N, Z+AB, etc.

Where can I find the legend for these?

Thank you!

METABOLIC-C for Nanopore reads (Unpaired)

Hi!
For context, I have 15 sets of marine samples, each sequenced and basecalled using an ONT MinION.
I intend to use METABOLIC-C.pl since I have community data. However, it seems that I have to use .fasta files and list the paired reads for the script to work. The problem is that the output from the MinION is in .fastq, and I don't think the reads are paired.
Any idea how I can run the script to analyse my dataset?
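
If the .fasta requirement mentioned above does apply, the format part of the problem can be handled by converting FASTQ records to FASTA. This is a minimal sketch assuming standard 4-line FASTQ records (no wrapped sequence lines); the function name `fastq_to_fasta` is made up for illustration. It does not address whether METABOLIC-C accepts unpaired reads, which remains an open question here.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (header, sequence) from 4-line FASTQ records."""
    # Sketch only: all lines are held in memory, so split very large files first.
    lines = [line.rstrip("\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a whole number of 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield ">" + header[1:]
        yield seq


if __name__ == "__main__":
    demo = ["@read1 ch=12\n", "ACGTTGCA\n", "+\n", "IIIIHHHH\n"]
    print("\n".join(fastq_to_fasta(demo)))
```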

Perl error

Thanks for the nice tool. I keep getting messages like this: Use of uninitialized value $head_old in concatenation (.) or string at METABOLIC-G.pl line 1101, <_IN> line 515785. I used the command exactly as shown in the help message. What could be the reason for this? How can I solve it? Thanks!

Error in draw_biogeochemical_cycles.R

Hi,
there is a mix-up in the graph between anammox and nitrite ammonification; the data are swapped:

The input

Total.R_input.txt

N-S-07:Nitrous oxide reduction	15	0.0404908627697313
N-S-08:Nitrite ammonification	28	0.196098546607314
N-S-09:Anammox	0	0
O-S-01:Metal reduction	1	0

and in the script
draw_biogeochemical_cycles.R

textplain(mid = c(0.55, 0.65), 
          lab = c("Step9: Anammox",
                  paste("Genomes:",input.total$Nb.Genome[17]),
                  paste("Coverage:",input.total$Genome.Coverage.Percentages.Round[17],"%"))) 
textplain(mid = c(0.47, 0.35), 
          lab = c("Step8: Nitrite ammonification",
                  paste("Genomes:",input.total$Nb.Genome[18]),
                  paste("Coverage:",input.total$Genome.Coverage.Percentages.Round[18],"%")))

See the attached figure

draw_nitrogen_cycle_total.pdf

Best
Greg
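
The swap reported above comes from picking rows by position ([17], [18]). One way to make such a mix-up harder is to look rows up by their step label instead. The sketch below illustrates the idea in Python rather than R, assuming the tab-separated layout of Total.R_input.txt shown above; the function names are made up for illustration and this is not the code used by draw_biogeochemical_cycles.R.

```python
def parse_r_input(text):
    """Map each step label (e.g. 'N-S-09:Anammox') to (genome count, coverage value)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label, n_genomes, coverage = line.split("\t")
        table[label] = (int(n_genomes), float(coverage))
    return table


def step_caption(table, label, title):
    """Build the caption lines for one step, looked up by label, not row number."""
    n_genomes, coverage = table[label]
    return [title, f"Genomes: {n_genomes}", f"Coverage value: {coverage}"]


if __name__ == "__main__":
    text = (
        "N-S-08:Nitrite ammonification\t28\t0.196098546607314\n"
        "N-S-09:Anammox\t0\t0\n"
    )
    table = parse_r_input(text)
    print(step_caption(table, "N-S-09:Anammox", "Step9: Anammox"))
```

With a label-keyed lookup, adding or reordering rows in the input file cannot silently shift which numbers end up under which step.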

perl script shebang line

The installation docs state:

Note that one additional step is required: the shebang line of the two main scripts (METABOLIC-C.pl and METABOLIC-G.pl) should be edited to match the perl installation in your conda environment (ie: #! /path/to/conda/env/bin/perl)

Why not just use #!/usr/bin/env perl, which should pick up whichever perl executable comes first in the user's PATH?
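
Until the scripts ship with such a line, the manual edit can be automated. The sketch below rewrites the first line of a script's text if it is a shebang; the function name `use_env_perl` is made up for illustration. Note that any flags on the original shebang line (such as -w) are dropped, since passing extra arguments through /usr/bin/env in a shebang is not portable.

```python
def use_env_perl(script_text):
    """Return script_text with its shebang line replaced by '#!/usr/bin/env perl'."""
    lines = script_text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        newline = "\n" if lines[0].endswith("\n") else ""
        lines[0] = "#!/usr/bin/env perl" + newline
    return "".join(lines)


if __name__ == "__main__":
    print(use_env_perl("#!/usr/bin/perl -w\nuse strict;\n"), end="")
```

To apply it, read METABOLIC-G.pl and METABOLIC-C.pl as text, pass the contents through the function, and write the result back.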

Wrong file name in run_to_setup.sh

Hi,
the run_to_setup.sh file is trying to uncompress the file METABOLIC_temp_and_db.tgz, which is apparently now named METABOLIC_template_and_database.tgz
Best
Greg

Sulfur oxidation - dsrAB definition

Hello,
First, thanks for METABOLIC; the tool looks really promising! I have a question about the MN-score function definition S-S-03:Sulfur oxidation - dsrAB. Does it cover only oxidative dsrAB (reverse dsr), or does it look for all dsrAB? If it is only oxidative, shouldn't there also be a reductive version under sulfite reduction? And if it looks for all, shouldn't it be labeled differently? Am I missing something?

Thanks for the clarification!
Michal

Question on MN-score_results.txt

Hello,

In the output file MN-score_results.txt I have one more column than phyla, i.e., one unlabelled column. Is this the percentage that is from MAGs that could not be assigned taxonomically?

Thanks for your help.

Discrepancy between hmm hits and KEGG modules in Excel file

Hello, thanks for the great tool. For some of the pathways, although none of the involved genes were found to be present by hmmsearch, the pathway is marked present in the KEGG module hit table. What may be the reason for this, and which one should I consider accurate?

Emine.

KEGG Identifier Result files description

Is your feature request related to a problem? Please describe.
I've been having a problem with the .xlsx output, as per issue #8 and issue #10. I think a good workaround would be to use the .genes.hits.txt or hits.txt files, but I can't find any documentation on how they were generated. I can see that these files are not identical (via, for example, md5sums as well as manual inspection). How are they different?

Describe the solution you'd like
I would like to know the differences between the genes.hits and hits text files.

Additional context
Thanks!!

perl

Describe the bug
Can't exec "hmmpress": Not a directory at batch_hmmpress.pl line 9,

To Reproduce
Steps to reproduce the behavior:

Go to './kofam_database/profiles/'
run: perl batch_hmmpress.pl

I got the error "Can't exec "hmmpress": Not a directory at batch_hmmpress.pl line 9". The files in the profiles folder are files, not directories.

How can I solve this problem?
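
A "Can't exec" message from Perl usually concerns the program being launched rather than the files in the current folder, so a first check is whether hmmpress can be found on PATH at all. This is a minimal sketch using Python's standard `shutil.which`; the function name `missing_tools` and the list of tools in the demo are chosen for illustration, and a missing or malformed PATH entry is only one possible cause of this error.

```python
import shutil


def missing_tools(tools):
    """Return the subset of tool names that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    not_found = missing_tools(["hmmpress", "hmmsearch", "Rscript"])
    if not_found:
        print("Not found on PATH:", ", ".join(not_found))
    else:
        print("All tools found.")
```

If hmmpress is reported as missing, activating the environment where HMMER is installed before running batch_hmmpress.pl would be the next thing to try.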

Use the results of METABOLIC-G.pl to run METABOLIC-C.pl

Hello, I have successfully run METABOLIC-G.pl. How should I set up the command so that METABOLIC-C.pl reuses the results of METABOLIC-G.pl? A large part of the two runs seems to be repeated, so reusing them would save time. Is this possible? Thank you very much!

Can't handle many genomes at a time?

Hi, I have 39 genomes that I have been trying to run through METABOLIC, but when I run all 39 at the same time, only a few of the cycling diagrams that come out have highlighted pathways in red. I have tried running 5 at a time, and even 2 at a time, and the outputs are different each time (the same genome run in a group of 5 genomes vs. 2 genomes gives different results). Is there any way I can fix this without having to run each genome 1 at a time? Thanks!

Laptop info

MacOS
Version 10.13.6

Inquiry about dependencies

This looks like great software and I'm excited to try it. I just have a quick question: do I need to manually download all the dependencies listed, or does cloning the GitHub repository suffice? Also, is it installable using conda? And is the paper already out, or still only available on bioRxiv? Thanks and looking forward to using METABOLIC!

problem with dbCAN2

Hello,

I've had a problem with the program. I'm trying to use it with 4 different genomes, but at a certain point the following errors appeared:

[2021-02-15 11:12:17] Searching CAZymes by dbCAN2...
Error:
Error:
Error:
Error: Failed to open binary auxfiles for /mnt/c/Users/AsusLap/Documents/Otros/Apps/METABOLIC-master/dbCAN2/dbCAN-fam-HMMs.txt: use hmmpress first
Failed to open binary auxfiles for /mnt/c/Users/AsusLap/Documents/Otros/Apps/METABOLIC-master/dbCAN2/dbCAN-fam-HMMs.txt: use hmmpress first
Failed to open binary auxfiles for /mnt/c/Users/AsusLap/Documents/Otros/Apps/METABOLIC-master/dbCAN2/dbCAN-fam-HMMs.txt: use hmmpress first
Failed to open binary auxfiles for /mnt/c/Users/AsusLap/Documents/Otros/Apps/METABOLIC-master/dbCAN2/dbCAN-fam-HMMs.txt: use hmmpress first

cat: /mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/intermediate_files/dbCAN2_Files/369_Pseudomonas_A5.feature_protein.dbCAN2.out.dm: No such file or directorycat:
cat: /mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/intermediate_files/dbCAN2_Files/183_Paenarthrobacter_A5.feature_protein.dbCAN2.out.dmcat: /mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/intermediate_files/dbCAN2_Files/A1_Pseudomonas_A5.feature_protein.dbCAN2.out.dm: No such file or directory/mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/intermediate_files/dbCAN2_Files/181_Alcanivorax_A5.feature_protein.dbCAN2.out.dm: No such file or directory
: No such file or directory
[2021-02-15 11:12:19] dbCAN2 searching is done
[2021-02-15 11:12:19] Searching MEROPS peptidase...
[2021-02-15 11:17:11] MEROPS peptidase searching is done
sh: 1: Rscript: Permission denied
mv: cannot stat 'METABOLIC_result.xlsx': No such file or directory
[2021-02-15 11:17:11] METABOLIC table has been generated
[2021-02-15 11:17:11] Drawing element cycling diagrams...
sh: 1: Rscript: Permission denied
mv: cannot stat '/mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/Output/draw_biogeochem_cycles': No such file or directory
rm: cannot remove '/mnt/c/Users/AsusLap/Documents/UNAM/Semestre_5/MetabolicProject/Metabolic_Results/Output': No such file or directory
[2021-02-15 11:17:12] Drawing element cycling diagrams finished

So there were no diagrams or other results generated. What should I do?
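
The "use hmmpress first" message in the log above means the binary index files that hmmpress writes next to an HMM database are missing. hmmpress creates four of them, with the suffixes .h3f, .h3i, .h3m and .h3p. The sketch below checks for them; the function name `missing_auxfiles` is made up for illustration. It does not cover the separate "Rscript: Permission denied" errors in the same log.

```python
from pathlib import Path

# Suffixes of the binary index files that hmmpress writes next to the database.
HMMPRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_auxfiles(hmm_db):
    """Return the names of hmmpress index files that are missing for hmm_db."""
    hmm_db = Path(hmm_db)
    return [
        hmm_db.name + suffix
        for suffix in HMMPRESS_SUFFIXES
        if not Path(str(hmm_db) + suffix).exists()
    ]


if __name__ == "__main__":
    db = "dbCAN2/dbCAN-fam-HMMs.txt"  # relative path taken from the log above
    missing = missing_auxfiles(db)
    if missing:
        print(f"Run 'hmmpress {db}' first; missing: {', '.join(missing)}")
```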

Discrepancy between HMMhits and KEGGModuleStepHit (metabolic version 4.0)

Thanks for releasing this so useful software!
After I updated METABOLIC to the latest version (4.0), I found some problems with the results.

HMMhits results didn't match the results from KEGGModuleStepHit
METABOLIC: version 4.0
Command: perl /mdata/xxx/software/METABOLIC-v4.0/METABOLIC-G.pl -t 20 -m-cutoff 0.75 -in pro/ -o metabolic/

Below are the results:
HMMhit:

ModuleStepHit results:

After I tidied the results, you can see the discrepancy more clearly.

Question: How are the R_input files calculated?

Hi
I wonder how the presence/absence of a metabolism in the R_input files is calculated when there are several genes per metabolism.
For example: N-S-02:Ammonia oxidation with amoA.hmm amoB.hmm amoC.hmm.
Am I wrong in saying that if one of these genes is present, then the metabolism will be considered present?
Greg
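
The two possible readings of the question can be written down explicitly. The sketch below is only an illustration of the "any gene" rule the question proposes versus a stricter "all genes" rule; it is not taken from the METABOLIC source, and which rule METABOLIC actually applies is exactly what is being asked. The function name `function_present` and the hit counts are made up.

```python
def function_present(hmm_hits, required_hmms, rule=any):
    """Decide presence of a function from per-HMM hit counts.

    rule=any: one gene with a hit is enough (the reading proposed in the question).
    rule=all: every listed gene needs at least one hit.
    """
    return rule(hmm_hits.get(hmm, 0) > 0 for hmm in required_hmms)


if __name__ == "__main__":
    hits = {"amoA.hmm": 1, "amoB.hmm": 0, "amoC.hmm": 0}  # made-up counts
    genes = ["amoA.hmm", "amoB.hmm", "amoC.hmm"]
    print("any-gene rule:", function_present(hits, genes))
    print("all-genes rule:", function_present(hits, genes, rule=all))
```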

Question about GTDBtk

Hi,

Firstly, thanks for a great tool. I have a question regarding setting the path to the GTDBtk database. I already have the latest version, and was wondering if there's a flag in the METABOLIC-G.pl or the -C scripts for specifying the path to the directory?

Thank you,
Susheel

successful conda install order & missing dependency in documentation

EDIT:
The comment below highlights the steps that worked to get an Anaconda environment set up with all the dependencies required to run METABOLIC in both C and G mode. Further down this thread I've posted a YAML file with the specifications for that environment. Note that this does not install METABOLIC itself; it still needs to be cloned from this repository, and the resulting METABOLIC directory needs to be put in $PATH for the scripts to run.

One additional step is required: the shebang line of the two main scripts (METABOLIC-C.pl and METABOLIC-G.pl) should be edited to match the perl installation in your conda environment (ie: #! /path/to/conda/env/bin/perl)

##############
Original comment:

I spent a bunch of time today getting METABOLIC installed in a conda environment, and I found that the order in which I installed the dependencies mattered for the success of the install. Finally got it working with the list of commands below.

The only actual issue is a dependency missing from the documentation:
Parallel::ForkManager
Edit: ah, and ggraph is listed as a dependency twice

Otherwise, I hope this is helpful for people trying to get the software installed

conda channels I have added (r channel is not needed for the install):

https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch

create env

conda create -n metabolic
conda activate metabolic

conda install the required tools

conda install sambamba
conda install bamtools
conda install coverm # installs perl 5.32
conda install gtdbtk
conda install diamond
conda install bowtie2
conda install R=3.6.0

conda install R dependencies

conda install r-tidyverse=1.3.0
conda install r-diagram
conda install r-ggthemes
conda install r-ggalluvial
conda install r-ggraph
conda install r-openxlsx
conda install r-pdftools

conda install perl dependencies

conda install perl-data-dumper # downgrades perl to 5.26.2
conda install perl-excel-writer-xlsx
conda install perl-posix
conda install perl-getopt-long
conda install perl-statistics-descriptive
conda install perl-bioperl

get the one pesky perl dependency not available through conda

conda install perl-app-cpanminus
env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split

conda install gdown

conda install the perl package to solve the first (and so far only) error

conda install perl-parallel-forkmanager

Bowtie2 error in METABOLIC-C.pl

Dear all

I have just recently installed METABOLIC and wanted to run the METABOLIC-C.pl test run with the recommended command

perl ./METABOLIC-C.pl -test true

Everything seemed alright until the bowtie2 step. It started with an error related to not specifying an output file, and I suppose all steps requiring bowtie2 seemed to have failed:

[2022-03-01 20:34:30] The Prodigal annotation is running...
[2022-03-01 20:35:00] The Prodigal annotation is finished
[2022-03-01 20:35:01] The hmmsearch is running with 5 cpu threads...
[2022-03-01 21:14:06] The hmmsearch is finished
[2022-03-01 21:14:09] Generating each hmm faa collection...
[2022-03-01 21:14:09] Each hmm faa collection has been made
[2022-03-01 21:14:09] The KEGG module result is calculating...
[2022-03-01 21:16:38] The KEGG identifier (KO id) result is calculating...
[2022-03-01 21:16:38] The KEGG identifier (KO id) seaching result is finished
[2022-03-01 21:16:38] Searching CAZymes by dbCAN2...
[2022-03-01 21:18:14] dbCAN2 searching is done
[2022-03-01 21:18:14] Searching MEROPS peptidase...
[2022-03-01 21:18:39] MEROPS peptidase searching is done
[2022-03-01 21:18:40] METABOLIC table has been generated
[2022-03-01 21:18:40] Drawing element cycling diagrams...
No output file specified!
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes will work with Bowtie v1.2.3 and later. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    --debug                 use the debug binary; slower, assertions enabled
    --sanitized             use sanitized binary; slower, uses ASan and/or UBSan
    --verbose               log the issued command
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --threads <int>         # of threads
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    --h/--help              print this message and quit
(ERR): "METABOLIC_out/All_gene_collections.gene.scaffold" does not exist or is not a Bowtie 2 index
Exiting now ...
(ERR): "METABOLIC_out/All_gene_collections.gene.scaffold" does not exist or is not a Bowtie 2 index
Exiting now ...
rm: cannot remove ‘METABOLIC_out/All_gene_collections_mapped.1.sam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/All_gene_collections_mapped.1.sam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bt2’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bt2’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bam’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bai’: No such file or directory
rm: cannot remove ‘METABOLIC_out/*.bai’: No such file or directory
[2022-03-01 21:18:43] Drawing element cycling diagrams finished
[2022-03-01 21:18:43] Drawing metabolic handoff diagrams...
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory
mv: cannot stat ‘METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf’: No such file or directory

I notice in the METABOLIC-C.pl script line 113 the $output should just be the working directory

`my $output = `pwd`; # The output folder`

Is there something regarding the output directory I will need to specify? Thanks

Marcus

A question about community analyses

Hi,
If I have 100 genomes of isolates, how can I run community analyses on them as I would for genomes from a metagenome? Thanks!
Best wishes!

Avoiding taxonomic inference - or providing your own

Hi everybody,
I was trying to use the tool on my 38 MAGs, for which I have already carried out taxonomic inference independently. Is there a way to prevent METABOLIC-C from re-computing taxonomy for my genomes, or to feed it my own data? I don't need a taxonomic breakdown of the metabolic inferences; I'd rather check how my genomes interact within their community.
Best regards
