matteopaluh / kemet

KEGG Module Evaluation Tool

License: Other

Python 78.58% Jupyter Notebook 21.42%

kegg kegg-modules metabolic-models metabolic-reconstruction gap-filling genome-scale-metabolic-model gem

kemet's Issues

add_taxonomy_from_gtdb-tk.py - help!

I am trying to run this script, but it keeps returning this message:
"The genomes.instruction file has been updated with 0 genome(s) taxonomy indications, using '.fasta' extension"
Could you please tell me what I can do to fix it?
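
One possible explanation, not confirmed anywhere in this thread, is that the genome files carry a suffix other than '.fasta' (for example '.fa' or '.fna'), so nothing matches. The sketch below tallies the suffixes found in a folder so they can be compared with the extension named in the message. The helper name `count_extensions` and the `genomes/` location are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def count_extensions(filenames):
    """Tally file suffixes (e.g. '.fa', '.fasta') for an iterable of file names."""
    return Counter(Path(name).suffix for name in filenames)


if __name__ == "__main__":
    genomes_dir = Path("genomes")  # assumed location of the genome FASTA files
    if genomes_dir.is_dir():
        names = [p.name for p in genomes_dir.iterdir() if p.is_file()]
        for suffix, n in count_extensions(names).most_common():
            print(f"{suffix or '(no suffix)'}\t{n}")
```

If the tally shows no '.fasta' files, the mismatch between the expected and actual extension would be the first thing to check.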

Merge multiple KEMET results

Dear,
I ran KEMET on several samples. Can you give me some tips on how to merge the resulting tables into one?
Best regards,
Leandro.
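
A minimal sketch of one way to stack per-genome reports into a single table, using only the standard library. It assumes each report is a tab-separated file with a header row and that files follow the reportKMC_<genome>.tsv naming mentioned in another issue on this page; the function name `merge_reports` and the added `genome` column are hypothetical. Python 3.9 or later is needed for `str.removeprefix`.

```python
import csv
from pathlib import Path


def merge_reports(report_paths, out_path):
    """Stack several TSV reports into one file, adding a 'genome' column."""
    header = None
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for path in map(Path, report_paths):
            # 'reportKMC_<genome>.tsv' -> '<genome>' (assumed naming scheme)
            genome = path.stem.removeprefix("reportKMC_")
            with open(path, newline="") as handle:
                rows = csv.reader(handle, delimiter="\t")
                file_header = next(rows, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(["genome", *header])
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in rows:
                    writer.writerow([genome, *row])
```

Calling `merge_reports(sorted(Path("reports_tsv").glob("reportKMC_*.tsv")), "merged_reports.tsv")` would then produce one long-format table, which can be pivoted afterwards if a genome-by-module matrix is preferred.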

Custom Modules?

Hi,
I saw that the README indicated that custom modules could be added.
Should they just be included in the kk_files folder, or do they need to be included elsewhere?
Cheers
Greg

'ktest' error

While running this program, I get the following error:
python kemet.py genomes/test.fna -a eggnog --skip_hmm --skip_gsmm
Traceback (most recent call last):
File "./kemet.py", line 2450, in <module>
if ktest in sorted(os.listdir()):
NameError: name 'ktest' is not defined

[error] problems with KoFamKOALA

Hi,

Thanks for the kemet package. The package and the article look awesome.
I installed kemet following the instructions, and when I run it, I get the following error:

`python kemet.py genomes/mcs.fasta -a kofamkoala --hmm_mode kos

Traceback (most recent call last):
File "kemet.py", line 2514, in <module>
if LOGflag:
NameError: name 'LOGflag' is not defined`

Kofamscan format problem

Hi,

Thank you for developing KEMET. I'm very interested in making use of the three main modules included in this package. Nonetheless, I'm facing a couple of issues and I kindly request some assistance.

For testing purposes, I'm currently working with a high-quality MAG with filename KEMET/genomes/SB_biofilm_MAG_1_.fa. KEGG annotations were performed with KofamKOALA and included as a TSV file (KEMET/KEGG_annotations/SB_biofilm_MAG_2_.tsv). I'm running the --hmm_mode modules option with "M00001" as the only entry in the "module_file.instruction" file. The "genomes.instruction" file contains the following (tab-separated):

id      taxonomy        universe
SB_biofilm_MAG_1_.fa    Bacteroidetes   gramneg

Currently I'm running the following command in the KEMET directory: ./kemet.py genomes/SB_biofilm_MAG_1_.fa -a kofamkoala --log --hmm_mode modules --skip_gsmm

Issues:

For the KEGG module completeness evaluation, I'm getting unexpected results compared to the output from KEGG Mapper. While KEGG Mapper reports multiple complete modules (e.g., M00001), both outputs from KEMET (.tsv and .txt) show every module as incomplete (0% completeness). Here is an example of the output .txt file:

M00001.kk       M00001_Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
%       0.0     0__10   INCOMPLETE
1.      K00844, K12407, K00845, K00886, K08074, K00918
2.      K01810, K06859, K13810, K15916
3.      K00850, K16370, K00918
4.      K01623, K01624, K11645, K16305, K16306
5.      K01803
6.      K00134, K00150, K11389
7.      K00927, K11389
8.      K01834, K15633, K15634, K15635
9.      K01689
10.     K00873, K12406

M00002.kk       M00002_Glycolysis, core module involving three-carbon compounds
%       0.0     0__6    INCOMPLETE
1.      K01803
2.      K00134, K00150, K11389
3.      K00927, K11389
4.      K01834, K15633, K15634, K15635
...

Despite the previous issue, I tried to run the --hmm_mode modules option with "M00001". While running the HMM search function, the following is printed:

Alignment input open failed.
   couldn't guess alphabet (maybe try --dna/--rna/--amino if available)
   while reading file K12406.msa
   while parsing for aligned FASTA format
Alignment input open failed.
   couldn't guess alphabet (maybe try --dna/--rna/--amino if available)
   while reading file K01624.msa
   while parsing for aligned FASTA format
...

I looked for the *.msa files at their respective directories, and it seems that the files are blank. Consequently, the following is printed on screen:

Error: File existence/permissions problem in trying to open query file K12406.h$
HMM file K12406.hmm not found (nor an .h3m binary of it)


Error: File existence/permissions problem in trying to open query file K01624.h$
HMM file K01624.hmm not found (nor an .h3m binary of it)
...

After the nhmmer search for significant hits completes, the following traceback is printed on screen:

Traceback (most recent call last):
  File "./kemet.py", line 2536, in <module>
    HMM_hits_longestTRANSLATED_dict = HMM_hits_longest_translated_sequences(HMM$
  File "./kemet.py", line 1290, in HMM_hits_longest_translated_sequences
    max_len_dict[fasta_nf].append(seq_max) # add the longest to list
KeyError: '>SB_biofilm_MAG_1'

If necessary, I would gladly share via e-mail the original nucleotide fasta and KEGG annotations files.

Thank you so much in advance.

Best,

David
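
The step listing in the report above can be read as a list of steps, each with a set of alternative KOs. The sketch below is a simplified reading of that layout, not KEMET's actual algorithm: a step counts as satisfied when at least one of its alternative KOs is annotated. The function name `step_completeness` is hypothetical, and the step sets are copied from the M00001 listing in this issue.

```python
# Steps of M00001 as listed in the report above; each step is a set of alternative KOs.
M00001_STEPS = [
    {"K00844", "K12407", "K00845", "K00886", "K08074", "K00918"},
    {"K01810", "K06859", "K13810", "K15916"},
    {"K00850", "K16370", "K00918"},
    {"K01623", "K01624", "K11645", "K16305", "K16306"},
    {"K01803"},
    {"K00134", "K00150", "K11389"},
    {"K00927", "K11389"},
    {"K01834", "K15633", "K15634", "K15635"},
    {"K01689"},
    {"K00873", "K12406"},
]


def step_completeness(steps, annotated_kos):
    """Return (satisfied_steps, total_steps, percent) for one module."""
    annotated = set(annotated_kos)
    # A step is satisfied if it shares at least one KO with the annotation.
    satisfied = sum(1 for alternatives in steps if alternatives & annotated)
    total = len(steps)
    percent = 100.0 * satisfied / total if total else 0.0
    return satisfied, total, percent
```

Under this simplified model, a result of 0 out of 10 steps arises when the set of annotated KOs is empty or shares nothing with the module, so confirming that the annotation file is actually being read for this genome may be a useful first check.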

Incorrect recognition of MAG filename

Hello, thanks for developing this useful tool.

I put co_metabat2.1.fa, co_metabat2.12.fa, co_metabat2.100.fa, co_metabat2.199.fa in the genomes folder at the same time, and their annotation files are also all placed in the KEGG_annotations folder.
If I run kemet.py -a eggnog --skip_hmm genomes/co_metabat2.1.fa, only reportKMC_co_metabat2.199.tsv appears in the reports_tsv folder, whereas co_metabat2.1.ktest, co_metabat2.12.ktest, co_metabat2.100.ktest, and co_metabat2.199.ktest all appear in the ktests folder.

Can you help me with this problem?
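
The four file names share the prefix co_metabat2.1, so one plausible cause (not verified against kemet.py) is that files are matched by prefix or substring rather than by exact name. The sketch below contrasts the two matching strategies; both helper names are hypothetical.

```python
from pathlib import Path

FILENAMES = [
    "co_metabat2.1.fa",
    "co_metabat2.12.fa",
    "co_metabat2.100.fa",
    "co_metabat2.199.fa",
]


def match_by_prefix(target, filenames):
    """Prefix matching: 'co_metabat2.1' also matches 'co_metabat2.12.fa' and others."""
    return [name for name in filenames if name.startswith(target)]


def match_by_stem(target, filenames):
    """Exact matching on the name with only the final extension removed."""
    return [name for name in filenames if Path(name).stem == target]


if __name__ == "__main__":
    print(match_by_prefix("co_metabat2.1", FILENAMES))
    print(match_by_stem("co_metabat2.1", FILENAMES))
```

Prefix matching returns all four files for the target co_metabat2.1, while the stem comparison returns only co_metabat2.1.fa. Until the cause is confirmed, renaming the MAGs so that no name is a prefix of another (for example, zero-padding the bin numbers) could serve as a workaround.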

How to use the script to convert the KEGG module file to <module_id>.kk?

Hi,
Thank you for the excellent tool! It's very helpful to me!

I have a question for you. Do you have a script or program to convert the KEGG module file to <module_id>.kk?

The KEGG module file I mean here is:
M00001_Glycolysis_(Embden-Meyerhof_pathway).txt

I'm asking because I want to assess the completeness of some KEGG pathways in a bacterial genome, and I can't process the KEGG htext format files in bulk (unless I convert them manually).

I hope you have a solution for me...

I really appreciate any help you can provide.
Hao Jin
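
The .kk format is not documented in this thread, so the following is only a rough sketch of the general idea, not the authors' conversion script. In KEGG module definitions, spaces separate consecutive steps and commas separate alternatives; the code splits a definition string at top-level spaces and lists the K numbers of each step. Nested logic (complexes, optional subunits, sub-steps inside parentheses) is flattened here, which the real converter presumably handles differently. The function names and the abbreviated example definition are illustrative assumptions.

```python
import re


def split_top_level(definition):
    """Split a KEGG-style module definition at spaces outside parentheses."""
    steps, depth, current = [], 0, []
    for char in definition:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == " " and depth == 0:
            if current:
                steps.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        steps.append("".join(current))
    return steps


def definition_to_steps(definition):
    """Return one list of K numbers per top-level step (nested logic is flattened)."""
    return [re.findall(r"K\d{5}", step) for step in split_top_level(definition)]


if __name__ == "__main__":
    # Abbreviated, illustrative definition; not the full M00001 entry.
    example = "(K00844,K12407) (K01810,K06859) K01803"
    for number, kos in enumerate(definition_to_steps(example), start=1):
        print(f"{number}.\t{', '.join(kos)}")
```

The printed layout imitates the numbered step listing seen in KEMET's report output elsewhere on this page; whether that matches the .kk file layout itself is not confirmed.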

"ktest" file error - due to file naming convention

I ran into an error.
$./kemet.py genomes/x23.fna -a eggnog --skip_hmm

Traceback (most recent call last):
File "/root/KEMET/./kemet.py", line 2440, in <module>
if ktest in sorted(os.listdir()):
NameError: name 'ktest' is not defined

What is the problem?
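
Python raises NameError when a name is read before it has ever been assigned, which typically happens when the assignment sits inside a branch or loop that never ran. The issue title attributes this case to file naming, which would be consistent with no file matching the expected pattern. The sketch below is not kemet.py's code; it only illustrates the pattern and a defensive variant, using a hypothetical helper `find_ktest` and the <genome>.ktest naming seen in another issue on this page.

```python
def find_ktest(genome_stem, filenames):
    """Return the '<stem>.ktest' file name, or None when nothing matches."""
    ktest = None  # defined up front, so a failed search cannot raise NameError
    for name in filenames:
        if name == f"{genome_stem}.ktest":
            ktest = name
            break
    return ktest


if __name__ == "__main__":
    found = find_ktest("x23", ["x23.ktest", "other.ktest"])
    if found is None:
        print("No matching .ktest file; check the genome and annotation file names.")
    else:
        print(f"Using {found}")
```

With the name initialised before the loop, a naming mismatch produces an explicit message instead of a traceback.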

Equivocal README & file-naming problems

Hello and thanks for creating this software,

I have gene-to-ko annotations for all my MAGs. I would like to use KEMET to calculate the completeness of KEGG modules for these MAGs.

Unfortunately, I have not yet managed to do so. I think the instructions in README.md are not up to date. The file setup.py is mentioned in multiple places but seems to be missing from the repository. It is unclear to me why I cannot run the tool without providing a FASTA file when I'm using --skip_hmm and --skip_gsmm. The help text references the genomes.instruction file in this context, but that file is also not part of the repository.

I'm also not sure if I am providing KO annotations in the right format. For each MAG, I created a tab-separated file with gene identifiers in the first column and KOs (e.g. K24042) in the second column. They are named bin1_ko.txt, bin2_ko.txt, etc.
If one gene has multiple KO annotations, the file contains one row for each of those annotations.
Is this approach correct? What would I put for --annotation_format? If my approach is incorrect, can you give me an example of how I should format my input to match one of the valid annotation formats?

Thank you very much for any help.

Kind Regards,
Tom
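
Which --annotation_format value fits this layout is not answered in this thread. As a sanity check of the layout itself, the sketch below parses the two-column table described above (gene identifier, tab, KO; one row per annotation) into a mapping from gene to KOs. The function name `read_gene_ko_table` is hypothetical.

```python
import csv
from collections import defaultdict


def read_gene_ko_table(lines):
    """Parse tab-separated 'gene<TAB>KO' lines into {gene: set of KOs}."""
    gene_to_kos = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank rows and genes without a KO in the second column.
        if len(row) < 2 or not row[1].strip():
            continue
        gene_to_kos[row[0].strip()].add(row[1].strip())
    return dict(gene_to_kos)
```

An open file handle can be passed directly, for example `with open("bin1_ko.txt", newline="") as handle: table = read_gene_ko_table(handle)`, and the union of all values gives the set of KOs present in that MAG.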

Dealing with different MAG completeness?

Hi,
Very nice tool that I'm excited to try.
I wanted to know how (if at all) the software deals with MAGs of differing completeness.
Best
Greg

Module completeness as stand-alone package

First of all, thank you for putting together this really great package.
I find the module completeness assessment really unique, with only a few other lesser options out there (e.g., KeggDecoder). I also like the way you break down the module definition in .kk files for improved completeness assessment. Therefore, I look forward to seeing continued support and development for this function.

In my case, I use KO annotations made within a different pipeline to assess module completeness with KEMET. In theory I would only need the annotation .txt file, but I also have to provide the genome assembly .fasta file to run the script (which is not really needed when running with the --skip_hmm and --skip_gsmm arguments).

If I could make a feature request/suggestion, it would be to separate out the module completeness functionality so that it accepts just KO annotation files (either a path to a file or a path to a folder for batch operation).

It would also be great to have a stand-alone tool to create module definition .kk files from the official KEGG module .txt files, for situations where KEMET is no longer supported and the current .kk files become obsolete.

Thank you for giving these some consideration.

Create .kk files

Hi,

Would it be possible to share the script that you are using to create .kk files?
I need to pin the completeness analysis to a specific KEGG version that I am also using for different other analyses.

Error regarding output directory

Thank you for your code, but I encountered an issue when running it.

This is how I used it. Under the 'eggnog' directory, there is a file named 'emapper.annotations', and under the 'genome' directory, there is a file named 'genome.fna'. The code I used is
python ./kemet.py -I ./eggnog -a eggnog --skip_hmm --skip_gsmm ./genome -q --log --path_output ./output

However, an error occurred: FileNotFoundError: [Errno 2] No such file or directory: './output/ktests/'

Strangely, when I manually created the entire folder, the code seemed to run smoothly, but no output file was generated.

I got the same error even when using your test files.
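
The FileNotFoundError suggests the output subfolders are expected to exist before results are written. As a possible workaround until this is resolved, the folders can be created up front. The sketch below uses a hypothetical helper `ensure_output_dirs`; the names ktests and reports_tsv are taken from the issues on this page, and whether KEMET needs further subfolders is not known here.

```python
from pathlib import Path


def ensure_output_dirs(base, subfolders=("ktests", "reports_tsv")):
    """Create the output subfolders if missing and return their paths."""
    created = []
    for name in subfolders:
        folder = Path(base) / name
        # parents=True builds missing parent folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Running `ensure_output_dirs("./output")` before the kemet.py command would avoid the missing-folder error; it does not explain why no output file is generated afterwards.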

conversion to python package

Any plans on making KEMET a legit Python package that can be installed via pip (from PyPI)? I see that the setup.py is non-standard. Converting the current code in setup.py into a separate script that is referenced via scripts: in a standard setup.py would likely be all that is needed.