matteopaluh / kemet
KEGG Module Evaluation Tool
License: Other
I am trying to run this script, but it keeps returning this message:
"The genomes.instruction file has been updated with 0 genome(s) taxonomy indications, using '.fasta' extension"
Could you please tell me if there is anything I can do to fix it?
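The message mentions a '.fasta' extension and reports 0 genomes. One quick thing to rule out is which extensions the files in the genomes folder actually use. The sketch below is a hypothetical diagnostic helper (`count_extensions` is not part of KEMET). It tallies file extensions in a folder.

```python
from collections import Counter
from pathlib import Path


def count_extensions(genome_dir: str) -> Counter:
    """Count file extensions in genome_dir (hypothetical diagnostic helper)."""
    counts: Counter = Counter()
    for path in Path(genome_dir).iterdir():
        if path.is_file():
            # Path.suffix is the text after the last dot, '' if there is none
            counts[path.suffix or "<none>"] += 1
    return counts
```

For example, `print(count_extensions("genomes"))` run from the KEMET directory might show only `.fa` or `.fna` files and no `.fasta` files. In that case, a mismatch between file extensions and the expected '.fasta' is a possible, though unconfirmed, explanation for the 0 count.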
Dear,
I ran KEMET on several samples. Can you give me some tips on how to merge the resulting tables into one?
Best regards,
Leandro.
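Another report on this page shows per-genome tables written as reportKMC_<genome>.tsv in a reports_tsv folder. Assuming that naming, and assuming every report starts with the same header row, the sketch below concatenates the reports into one long table. It adds a `sample` column derived from the file name. The function name `merge_reports` is hypothetical. If your reports have no header row, drop the header handling.

```python
import csv
from pathlib import Path


def merge_reports(paths, out_path):
    """Concatenate tab-separated report files into one table.

    Assumes every report starts with the same header row. A 'sample' column
    is prepended, taken from the file name minus any 'reportKMC_' prefix.
    Returns the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in sorted(Path(p) for p in paths):
            sample = path.stem.removeprefix("reportKMC_")  # Python 3.9+
            with open(path, newline="") as fh:
                reader = csv.reader(fh, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to merge
                if header is None:
                    header = file_header
                    writer.writerow(["sample"] + header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    if not row:
                        continue  # skip blank lines
                    writer.writerow([sample] + row)
                    n_rows += 1
    return n_rows
```

Example call: `merge_reports(Path("reports_tsv").glob("reportKMC_*.tsv"), "merged.tsv")`. Write the merged file outside reports_tsv so it is not picked up by the glob.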
Hi,
I saw that the README indicated that custom modules could be added.
Should they simply be placed in the kk_files folder, or do they need to be included elsewhere?
Cheers
Greg
While running this program, I get the following error:
python kemet.py genomes/test.fna -a eggnog --skip_hmm --skip_gsmm
Traceback (most recent call last):
File "./kemet.py", line 2450, in <module>
if ktest in sorted(os.listdir()):
NameError: name 'ktest' is not defined
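The traceback does not show the surrounding KEMET code, so the cause cannot be confirmed from here. A `NameError` on a name used in a test like this often means the name was only assigned inside a loop or branch that never ran. The minimal illustration below uses hypothetical function names and is not taken from kemet.py.

```python
def last_ktest_unsafe(filenames):
    for ktest in filenames:
        pass  # 'ktest' is only bound once the loop body runs
    # With an empty list this raises UnboundLocalError (a NameError subclass);
    # the same code at module level raises a plain NameError.
    return ktest


def last_ktest_safe(filenames):
    ktest = None  # bind the name before the loop so it always exists
    for ktest in filenames:
        pass
    return ktest
```

If this pattern applies, the loop probably found no matching input. It is worth checking that the annotation file for the genome is where KEMET looks for it.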
Hi,
Thanks for the kemet package. The package and the article look awesome.
I installed kemet following the instructions, and when I run it, I get the following error:
python kemet.py genomes/mcs.fasta -a kofamkoala --hmm_mode kos
Traceback (most recent call last):
File "kemet.py", line 2514, in <module>
if LOGflag:
NameError: name 'LOGflag' is not defined
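This command omits --log, which another report on this page passes. One plausible but unverified explanation is that the flag variable is only defined when --log is given. With Python's argparse, a `store_true` option is always present on the parsed namespace and defaults to False, which avoids this kind of error. The hypothetical sketch below shows this; it is not KEMET's actual argument parser.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="illustrative CLI sketch")
    parser.add_argument("genome")
    # store_true: args.log always exists and is False unless --log is passed
    parser.add_argument("--log", action="store_true")
    return parser.parse_args(argv)
```

Until the script itself is fixed, adding --log to the command may be a workaround worth trying.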
Hi,
Thank you for developing KEMET. I'm very interested in making use of the three main modules included in this package. However, I'm facing a couple of issues, and I kindly request some assistance.
For testing purposes, I'm currently working with a high-quality MAG with filename KEMET/genomes/SB_biofilm_MAG_1_.fa. KEGG annotations were performed with KofamKOALA and were included as a tsv file (KEMET/KEGG_annotations/SB_biofilm_MAG_2_.tsv). I'm running the --hmm_mode modules option with "M00001" as the only input for the "module_file.instruction" file. The "genomes.instruction" file contains the following (tab separated):
id taxonomy universe
SB_biofilm_MAG_1_.fa Bacteroidetes gramneg
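The genomes.instruction layout shown just above is tab-separated, with an id/taxonomy/universe header. Several reports on this page hinge on this file, so a small pre-flight check can help. The sketch below assumes that layout and, as in the example, that the id column holds the genome file name. The helper `check_genomes_instruction` is hypothetical and not part of KEMET.

```python
import csv
from pathlib import Path

EXPECTED_HEADER = ["id", "taxonomy", "universe"]


def check_genomes_instruction(instruction_path, genome_dir):
    """Return a list of problems found in a genomes.instruction-style file."""
    problems = []
    with open(instruction_path, newline="") as fh:
        # drop blank lines; each remaining row is a list of tab-separated fields
        rows = [row for row in csv.reader(fh, delimiter="\t") if row]
    found = rows[0] if rows else None
    if found != EXPECTED_HEADER:
        problems.append(f"header is {found!r}, expected tab-separated {EXPECTED_HEADER!r}")
        return problems
    for n, row in enumerate(rows[1:], start=1):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"entry {n}: expected 3 tab-separated fields, got {len(row)}")
            continue
        genome_id = row[0]
        if not (Path(genome_dir) / genome_id).is_file():
            problems.append(f"entry {n}: {genome_id} not found in {genome_dir}")
    return problems
```

A header typed with spaces instead of tabs is reported as a single header problem. Each listed genome that is missing from the folder is reported separately.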
Currently I'm running the following command in the KEMET directory: ./kemet.py genomes/SB_biofilm_MAG_1_.fa -a kofamkoala --log --hmm_mode modules --skip_gsmm
Issues:
M00001.kk M00001_Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
% 0.0 0__10 INCOMPLETE
1. K00844, K12407, K00845, K00886, K08074, K00918
2. K01810, K06859, K13810, K15916
3. K00850, K16370, K00918
4. K01623, K01624, K11645, K16305, K16306
5. K01803
6. K00134, K00150, K11389
7. K00927, K11389
8. K01834, K15633, K15634, K15635
9. K01689
10. K00873, K12406
M00002.kk M00002_Glycolysis, core module involving three-carbon compounds
% 0.0 0__6 INCOMPLETE
1. K01803
2. K00134, K00150, K11389
3. K00927, K11389
4. K01834, K15633, K15634, K15635
...
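Each module in the .ktest excerpt above is listed as numbered steps of alternative KOs. The line "% 0.0 0__10 INCOMPLETE" reads as 0 of 10 steps satisfied, although that interpretation is an assumption. The sketch below shows a simplified block-completeness calculation in which a step counts as present if any of its KOs is annotated. KEGG definitions also contain complexes and optional components, so KEMET's own scoring may well differ. The helper names are hypothetical.

```python
def parse_block_line(line):
    """'3. K00850, K16370, K00918' -> ['K00850', 'K16370', 'K00918']"""
    _, _, kos = line.partition(".")  # drop the leading step number
    return [ko.strip() for ko in kos.split(",") if ko.strip()]


def block_completeness(blocks, annotated_kos):
    """Return (percent, satisfied, total) for lists of alternative KOs."""
    present = set(annotated_kos)
    satisfied = sum(1 for block in blocks if present.intersection(block))
    total = len(blocks)
    percent = 100.0 * satisfied / total if total else 0.0
    return percent, satisfied, total


# The first four steps of M00002 as listed above (the rest is elided there)
m00002_first_blocks = [
    parse_block_line("1. K01803"),
    parse_block_line("2. K00134, K00150, K11389"),
    parse_block_line("3. K00927, K11389"),
    parse_block_line("4. K01834, K15633, K15634, K15635"),
]
```

With only K01803 and K00927 annotated, `block_completeness(m00002_first_blocks, {"K01803", "K00927"})` returns `(50.0, 2, 4)`.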
--hmm_mode modules option with "M00001". While running the HMM search function, the following is printed:
Alignment input open failed.
couldn't guess alphabet (maybe try --dna/--rna/--amino if available)
while reading file K12406.msa
while parsing for aligned FASTA format
Alignment input open failed.
couldn't guess alphabet (maybe try --dna/--rna/--amino if available)
while reading file K01624.msa
while parsing for aligned FASTA format
...
I looked for the *.msa files in their respective directories, and the files appear to be blank. Consequently, the following is printed on screen:
Error: File existence/permissions problem in trying to open query file K12406.h$
HMM file K12406.hmm not found (nor an .h3m binary of it)
Error: File existence/permissions problem in trying to open query file K01624.h$
HMM file K01624.hmm not found (nor an .h3m binary of it)
...
Traceback (most recent call last):
File "./kemet.py", line 2536, in <module>
HMM_hits_longestTRANSLATED_dict = HMM_hits_longest_translated_sequences(HMM$
File "./kemet.py", line 1290, in HMM_hits_longest_translated_sequences
max_len_dict[fasta_nf].append(seq_max) # add the longest to list
KeyError: '>SB_biofilm_MAG_1'
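The HMMER messages above say the .msa inputs could not be parsed and that the matching .hmm files do not exist, and the .msa files were found to be blank. Before re-running, it can help to list exactly which alignments are empty and which have no model. The sketch below is a hypothetical helper; point it at whichever directory holds the .msa files for your run, since that layout is not shown here.

```python
from pathlib import Path


def find_hmm_input_problems(directory):
    """Return (blank .msa files, .msa files without a matching .hmm file)."""
    blank, missing_hmm = [], []
    for msa in sorted(Path(directory).glob("*.msa")):
        if not msa.read_text().strip():  # empty or whitespace-only alignment
            blank.append(msa.name)
        if not msa.with_suffix(".hmm").exists():
            missing_hmm.append(msa.name)
    return blank, missing_hmm
```

If every failing KO shows up in both lists, the empty alignments are the likely upstream cause of the missing .hmm files.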
If necessary, I would gladly share the original nucleotide FASTA and KEGG annotation files via e-mail.
Thank you so much in advance.
Best,
David
Hello, thanks for developing this useful tool.
I put co_metabat2.1.fa, co_metabat2.12.fa, co_metabat2.100.fa and co_metabat2.199.fa in the genomes folder at the same time, and their annotation files are all placed in the KEGG_annotations folder.
If I run kemet.py -a eggnog --skip_hmm genomes/co_metabat2.1.fa, only reportKMC_co_metabat2.199.tsv appears in the reports_tsv folder, whereas co_metabat2.1.ktest, co_metabat2.12.ktest, co_metabat2.100.ktest and co_metabat2.199.ktest all appear in the ktests folder.
Can you help me with this problem?
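Each command on this page takes a single genome path. One workaround to try, not a confirmed fix, is to call kemet.py once per genome and check reports_tsv after each run. The sketch below builds one command per genome file using only the options shown in this report (-a eggnog, --skip_hmm). The helper names and the '.fa' default are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(genome_dir, extension=".fa", annotation="eggnog"):
    """One kemet.py invocation per genome file (hypothetical batch helper)."""
    genomes = sorted(Path(genome_dir).glob(f"*{extension}"))
    return [
        [sys.executable, "kemet.py", str(genome), "-a", annotation, "--skip_hmm"]
        for genome in genomes
    ]


def run_all(commands):
    for cmd in commands:
        # check=True raises CalledProcessError at the first failing run
        subprocess.run(cmd, check=True)
```

Run it from the KEMET directory, for example `run_all(build_commands("genomes"))`.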
Hi,
Thank you for the excellent tool! It's very helpful to me!
I have a question for you. Do you have a script or program to convert the KEGG module file to <module_id>.kk?
The KEGG module file I mean here is:
M00001_Glycolysis_(Embden-Meyerhof_pathway).txt
I'm asking because I want to assess the completeness of some KEGG pathways in a bacterial genome, and I can't process the KEGG htext-format files in bulk (unless I convert them manually).
I hope you have a solution for me...
I really appreciate any help you can provide.
Hao Jin
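KEGG module flat files, such as those returned by the KEGG REST API's `get` operation, carry the module logic in a DEFINITION field. In that field, spaces separate steps, commas separate alternatives, and parentheses group them. As a starting point for bulk processing, the sketch below extracts the DEFINITION, splits it into top-level steps, and lists the KOs in each step. It does not produce KEMET's .kk format, whose layout is not documented here. It also ignores the '+' (complex) and '-' (optional) operators, and it keeps nested groups as one step. The .ktest output elsewhere on this page suggests KEMET splits such groups further. Joining continuation lines with a space is an assumption.

```python
import re

KO_PATTERN = re.compile(r"K\d{5}")


def definition_from_flat_file(text):
    """Return the DEFINITION field of a KEGG module flat file, or None."""
    parts, inside = [], False
    for line in text.splitlines():
        if line.startswith("DEFINITION"):
            inside = True
            parts.append(line[len("DEFINITION"):].strip())
        elif inside and line.startswith(" "):
            parts.append(line.strip())  # indented continuation line
        elif inside:
            break  # next field reached
    return " ".join(parts) if parts else None


def top_level_steps(definition):
    """Split a DEFINITION string on spaces that are outside parentheses."""
    steps, current, depth = [], [], 0
    for char in definition.strip():
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == " " and depth == 0:
            if current:
                steps.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        steps.append("".join(current))
    return steps


def kos_per_step(definition):
    return [KO_PATTERN.findall(step) for step in top_level_steps(definition)]
```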
I ran into an error.
$ ./kemet.py genomes/x23.fna -a eggnog --skip_hmm
Traceback (most recent call last):
File "/root/KEMET/./kemet.py", line 2440, in <module>
if ktest in sorted(os.listdir()):
NameError: name 'ktest' is not defined
What is the problem?
Hello and thanks for creating this software,
I have gene-to-ko annotations for all my MAGs. I would like to use KEMET to calculate the completeness of KEGG modules for these MAGs.
Unfortunately, I have not yet managed to do so. I think the instructions in README.md are not up to date. The file setup.py is mentioned in multiple places but seems to be missing from the repository. It is unclear to me why I cannot run the tool without providing a FASTA file when I'm using --skip_hmm and --skip_gsmm. The help text references the genomes.instruction file in this context, but that file is also not part of the repository.
I'm also not sure if I am providing KO annotations in the right format. For each MAG, I created a tab-separated file with gene identifiers in the first column and KOs (e.g. K24042) in the second column. They are named bin1_ko.txt, bin2_ko.txt, etc.
If one gene has multiple KO annotations, the file contains one row for each of those annotations.
Is this approach correct? What should I put for --annotation_format? If my approach is incorrect, can you give me an example of how I should format my input to match one of the valid annotation formats?
Thank you very much for any help.
Kind Regards,
Tom
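This report leaves open whether KEMET accepts the two-column, one-row-per-annotation layout described above under any --annotation_format value. Independently of that, the layout is easy to aggregate. The sketch below reads a file such as bin1_ko.txt into a gene-to-KOs mapping and then collects the set of KOs for the MAG. Header lines and genes without a KO are skipped. The function names are hypothetical.

```python
import csv
import re
from collections import defaultdict

KO_RE = re.compile(r"^K\d{5}$")


def load_gene_kos(path):
    """Map gene id -> set of KOs from a tab-separated 'gene<TAB>KO' file."""
    gene_kos = defaultdict(set)
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) < 2:
                continue  # blank line or gene without a KO
            ko = row[1].strip()
            if KO_RE.match(ko):  # also skips a header row such as 'ko'
                gene_kos[row[0]].add(ko)
    return dict(gene_kos)


def mag_kos(gene_kos):
    """Union of all KOs annotated in one MAG."""
    return set().union(*gene_kos.values())
```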
Hi,
Very nice tool that I'm excited to try.
I wanted to know how (if at all) the software deals with differences in MAG completeness.
Best
Greg
First of all, thank you for putting together this really great package.
I find the module completeness assessment really unique, with only a few other lesser options out there (e.g., KeggDecoder). I also liked the way you break down the module definition in .kk files for improved completeness assessment. Therefore, I look forward to seeing continued support and development for this function.
In my case, I use KO annotations made within a different pipeline to assess module completeness with KEMET. In theory, I would only need the annotation .txt file, but I also have to provide the genome assembly .fasta file to run the script (which is not really needed when running with the --skip_hmm and --skip_gsmm arguments).
If I could make a feature request/suggestion, it would be to separate out the module completeness functionality so that it accepts just KO annotation files (either a path to a file or a path to a folder for batch operation).
It would also be great to have a stand-alone tool to create module definition .kk files from the official kegg module .txt files, for situations where KEMET is not continuously supported and current .kk files become obsolete.
Thank you for giving these some consideration.
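The input handling requested above, accepting either a single annotation file or a folder of them, is small to express. The sketch below shows a hypothetical input-resolution helper for such a mode. The function name and the '.txt' default suffix are assumptions, not existing KEMET options.

```python
from pathlib import Path


def resolve_annotation_inputs(path, suffix=".txt"):
    """Return annotation files: the path itself, or matching files in a folder."""
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        # sorted for a stable processing order across runs
        return sorted(f for f in p.iterdir() if f.is_file() and f.suffix == suffix)
    raise FileNotFoundError(f"no such file or directory: {path}")
```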
Hi,
Would it be possible to share the script that you are using to create .kk files?
I need to pin the completeness analysis to a specific KEGG version that I am also using for several other analyses.
Thank you for your code, but I encountered an issue when running it.
This is how I used it. In the 'eggnog' directory there is a file named 'emapper.annotations', and in the 'genome' directory there is a file named 'genome.fna'. The command I used is:
python ./kemet.py -I ./eggnog -a eggnog --skip_hmm --skip_gsmm ./genome -q --log --path_output ./output
However, an error occurred: FileNotFoundError: [Errno 2] No such file or directory: './output/ktests/'
Strangely, when I manually created the whole folder path, the code seemed to run smoothly, but no output file was generated.
I got the same error even when using your test files.
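The FileNotFoundError suggests that the output subfolders are expected to exist before the run. A common defensive pattern is to create them up front with `os.makedirs(..., exist_ok=True)`. The subfolder names below (ktests, reports_tsv) are the ones mentioned in this thread; the full set KEMET writes to may differ. The helper is hypothetical. It does not address the separate problem of no output being generated.

```python
import os


def ensure_output_dirs(path_output, subdirs=("ktests", "reports_tsv")):
    """Create path_output and its subfolders if missing; return their paths."""
    created = []
    for sub in subdirs:
        full = os.path.join(path_output, sub)
        os.makedirs(full, exist_ok=True)  # also creates path_output; no error if present
        created.append(full)
    return created
```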
Are there any plans to make KEMET a proper Python package that can be installed via pip (from PyPI)? I see that the setup.py is non-standard. Moving the code currently in setup.py into a separate script, referenced via scripts: in a standard setup.py, would likely be all that is needed.
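A minimal sketch of the standard layout suggested above could use setuptools' `scripts` argument. The version number is a placeholder, kemet_setup.py is a hypothetical name for the script that would receive the current setup.py code, and the real dependency list would need to be filled in from KEMET's requirements.

```python
# setup.py: sketch only; metadata below is placeholder, not KEMET's actual values
from setuptools import setup

setup(
    name="kemet",
    version="0.0.0",  # placeholder
    description="KEGG Module Evaluation Tool",
    # scripts listed here are installed onto the user's PATH by pip
    scripts=[
        "kemet.py",
        "kemet_setup.py",  # hypothetical new home for the current setup.py code
    ],
    # install_requires=[...],  # fill in from KEMET's real requirements
)
```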