metachip's Introduction

metachip's People

Contributors

songweizhi

metachip's Issues

Biopython need to be >= 1.78, MetaCHIP exited!

Hi Weizhi,

I installed MetaCHIP in my own directory with conda, but an error occurs when I run it.

Conda installation
module load tools #pre for miniconda3/4.14.0
module load miniconda3/4.14.0
module load intel/perflibs/64/2020_update2
module load gcc/7.4.0
module load R/4.2.0
conda create -n MetaCHIP
conda activate MetaCHIP
conda install -c conda-forge -c bioconda MetaCHIP
conda deactivate

Running command lines
module load tools #pre for miniconda3/4.12.0
module load miniconda3/4.12.0
module load intel/perflibs/64/2020_update2
module load gcc/7.4.0
module load R/4.2.0
conda activate MetaCHIP
MetaCHIP PI -p HGT_sib -r s -t 20 -i ${in_dir} -x fasta -taxon ${MAG_GTDB}
MetaCHIP BP -p HGT_sib -r s -t 20
conda deactivate

Error
[Screenshot of the error, 2023-04-05 at 19:58:04]

Best,

Bing
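For anyone hitting the same message, here is a minimal sketch for checking which Biopython version the active interpreter actually sees. The 1.78 threshold is taken from the error text in the title. The helper names (`parse_version`, `meets_minimum`, `installed_version`) are made up for this example and are not part of MetaCHIP. Run it with the Python of the activated conda environment, since a copy installed elsewhere (for example under ~/.local) may be picked up instead.

```python
import re
from importlib import metadata  # standard library, Python 3.8+


def parse_version(text):
    """Return the leading major/minor numbers of a version string as a tuple."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:2])


def meets_minimum(installed, minimum="1.78"):
    """True if `installed` is at least `minimum` (1.78 comes from the error message)."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="biopython"):
    """Version of `package` visible to the running interpreter, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("Biopython is not installed in this environment")
    else:
        print(f"Biopython {found}:", "OK" if meets_minimum(found) else "too old")
```

If the reported version is older than expected, the interpreter is probably not the one from the MetaCHIP environment.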

rename_cotig

When should the rename_cotig parameter be added, please?

what is blastdb/*.fasta ?

Hello,

What is the file called *_combined_ffn.fasta in the directory *_cofg_blastdb/ after running MetaCHIP?
The file can be found as follows:

cd test_MetaCHIP_wd/
ls -1
test_cofg_blastdb/
test_cofg_blastn_commands.txt
test_cofg_blastn_results/
test_cofg_blastn_results_filtered_al200bp_cov75/
test_cofg_combined_faa.fasta
test_cofg_get_SCG_tree_wd/
test_cofg_log_files/
test_cofg_prodigal_output/
test_cofg_SCG_tree_cov50_css25.aln
test_cofg_SCG_tree.newick
test_combined_cofg_HGTs_ip90_al200bp_c75_ei80_f10kbp/
test_grouping_c9.txt
test_grouping_f26.txt
test_grouping_g43.txt
test_grouping_o15.txt
test_HGT_ip90_al200bp_c75_ei80_f10kbp_c9/
test_HGT_ip90_al200bp_c75_ei80_f10kbp_f26/
test_HGT_ip90_al200bp_c75_ei80_f10kbp_g43/
test_HGT_ip90_al200bp_c75_ei80_f10kbp_o15/

cd test_cofg_blastdb/
ls -1
test_cofg_combined_ffn.fasta
test_cofg_combined_ffn.fasta.nhr
test_cofg_combined_ffn.fasta.nin
test_cofg_combined_ffn.fasta.nog
test_cofg_combined_ffn.fasta.nsd
test_cofg_combined_ffn.fasta.nsi
test_cofg_combined_ffn.fasta.nsq

Thank you very much!
Marie

MetaCHIP working directory detected, program exited!

Hi Weizhi,

I'm using MetaCHIP (v 1.10.12) with your example files, for the PI step:

MetaCHIP PI -r pcofg -t 30 -p gut -x fasta -o metachip_output -i /scratch/p290555/E1_meta/final_contig_all/metachip_test/ -human_gut_bins_GTDB.tsv

But the following errors were reported:

[2023-03-01 14:54:08] Input genomes grouped into 5 phyla.
[2023-03-01 14:54:09] Input genomes grouped into 5 classes.
[2023-03-01 14:54:09] Input genomes grouped into 6 orders.
[2023-03-01 14:54:10] Input genomes grouped into 9 families.
[2023-03-01 14:54:10] Input genomes grouped into 9 genera.
[2023-03-01 14:54:11] Total number of qualified genomes for HGT detection: 10.
[2023-03-01 14:54:11] Grouping file exported to: gut_grouping_p5.txt.
[2023-03-01 14:54:12] Grouping file exported to: gut_grouping_c5.txt.
[2023-03-01 14:54:12] Grouping file exported to: gut_grouping_o6.txt.
[2023-03-01 14:54:13] Grouping file exported to: gut_grouping_f9.txt.
[2023-03-01 14:54:14] Grouping file exported to: gut_grouping_g9.txt.
[2023-03-01 14:54:14] Running Prodigal for 10 genomes with 30 cores (1-3 minutes per genome per core).

Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.
Warning: saw non-sequence line longer than 10000 chars, sequence might not be read correctly.

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/p290555/metachip_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/data/p290555/metachip_env/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/data/p290555/metachip_env/lib/python3.8/site-packages/MetaCHIP/PI.py", line 247, in prodigal_worker
prodigal_parser(pwd_input_genome, pwd_output_sco, input_genome_basename, pwd_prodigal_output_folder)
File "/data/p290555/metachip_env/lib/python3.8/site-packages/MetaCHIP/PI.py", line 98, in prodigal_parser
for each_seq in SeqIO.parse(seq_file, 'fasta'):
File "/data/p290555/metachip_env/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
return next(self.records)
File "/data/p290555/metachip_env/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 246, in iterate
Seq(sequence), id=first_word, name=first_word, description=title
File "/data/p290555/metachip_env/lib/python3.8/site-packages/Bio/Seq.py", line 2028, in __init__
self._data = bytes(data, encoding="ASCII")
UnicodeEncodeError: 'ascii' codec can't encode character '\u21b5' in position 4375: ordinal not in range(128)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/p290555/metachip_env/bin/MetaCHIP", line 167, in <module>
PI(args, MetaCHIP_config.config_dict)
File "/data/p290555/metachip_env/lib/python3.8/site-packages/MetaCHIP/PI.py", line 891, in PI
pool.map(prodigal_worker, list_for_multiple_arguments_Prodigal)
File "/data/p290555/metachip_env/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/data/p290555/metachip_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
UnicodeEncodeError: 'ascii' codec can't encode character '\u21b5' in position 4375: ordinal not in range(128)

Could you please help me with that? Thank you in advance.
Xipeng
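The traceback above stops on a non-ASCII character ('\u21b5', which is "↵") inside a sequence. A minimal sketch for locating such characters in an input FASTA file follows. The function name `find_non_ascii` is made up for this example, and the sketch only reports positions without changing the file.

```python
def find_non_ascii(path):
    """Yield (line_number, column, character) for every non-ASCII character.

    Line and column numbers start at 1. Bytes that are not valid UTF-8 are
    decoded as U+FFFD and therefore reported as well.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            for column, char in enumerate(line, start=1):
                if ord(char) > 127:
                    yield line_number, column, char


def report(path, limit=20):
    """Print at most `limit` offending positions found in `path`."""
    for count, (line_number, column, char) in enumerate(find_non_ascii(path)):
        if count >= limit:
            print("...")
            break
        print(f"{path}: line {line_number}, column {column}: U+{ord(char):04X}")
```

Calling `report("genome.fasta")` on each input genome (the file name is a placeholder) should point at the lines that need cleaning before the PI step is rerun.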

Question about average group identity

Hi.
Thank you for developing great tools.
I have a question about how HGT candidate genes are identified using the average group identity.

I understand that a gene is considered an HGT candidate when its highest similarity is to a gene from another group.
However, some genes are most similar to their own group while also being very similar to other groups. Are such genes not considered HGT candidates?

Thank you!

All files in strain_pcofg_blastn_results are empty, program exited

Hi Song!
I'm sorry to bother you, but I got an error when running MetaCHIP BP: "All files in strain_pcofg_blastn_results are empty, program exited."
The PI step has finished and the "strain_pcofg_blastn_results" directory is not empty; it contains many files, such as GCF_900117335.1_PRJEB17656_genomic_blastn.tab, ...

What should I do?
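A directory can contain many files that are all zero bytes long, which would be consistent with the message. Here is a minimal sketch for telling the two cases apart. The `*_blastn.tab` pattern is taken from the file name quoted above, and `split_by_size` is a name invented for this example.

```python
from pathlib import Path


def split_by_size(folder, pattern="*_blastn.tab"):
    """Return (empty, non_empty) lists of file names in `folder` matching `pattern`."""
    empty, non_empty = [], []
    for path in sorted(Path(folder).glob(pattern)):
        if path.stat().st_size == 0:
            empty.append(path.name)
        else:
            non_empty.append(path.name)
    return empty, non_empty


if __name__ == "__main__":
    # Folder name copied from the error message; adjust it to the actual path.
    empty, non_empty = split_by_size("strain_pcofg_blastn_results")
    print(f"{len(empty)} empty, {len(non_empty)} non-empty result files")
```

If every file is reported as empty, the blastn step itself produced no hits or failed, and its log output is the next thing to look at.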

Plots still produced in v1.10.12

Hey @songweizhi,

I'm currently running v1.10.12 (see screenshot), and for some reason, the plots are still made in the BP module despite me not providing the -pfr flag. Is this normal behaviour?

image

See the timestamp on the folder for the Flanking plots below (last line). I can confirm that it has so far produced 324575 files and is still going strong.

image

Thanks,
Susheel

Is there a way to suppress generation of flanking region plots?

Hello,

I am running MetaCHIP on 1k archaeal genomes, and I am running out of file quota because of the flanking region plots (in genomes_s6_Flanking_region_plots). Is there a way to suppress the generation of these plots? In the end, the only plot I am interested in is the circos plot.

Thank you!
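This does not suppress the plots, but a quick count of files per output folder can confirm which folder is consuming the quota. It is a minimal sketch using only the standard library; `count_files_per_top_level` is a name made up for this example.

```python
import os
from collections import Counter


def count_files_per_top_level(root):
    """Count files under each first-level subfolder of `root`.

    Files located directly in `root` are counted under the key ".".
    """
    counts = Counter()
    for current, _dirs, files in os.walk(root):
        relative = os.path.relpath(current, root)
        top_level = relative.split(os.sep)[0]
        counts[top_level] += len(files)
    return counts


def print_largest(root, n=10):
    """Print the `n` subfolders of `root` that hold the most files."""
    for name, number in count_files_per_top_level(root).most_common(n):
        print(f"{number:>10}  {name}")
```

Running `print_largest` on the MetaCHIP working directory lists the subfolders with the most files first.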

Number of input file for metagenomic studies. ERROR: MetaCHIP working directory detected, program exited!

Hello, I have encountered an error.
This is the script

MetaCHIP PI -p ~/ky/kw1/output/hgt/freshwater_ky -r pcofgs -t 8 -o ~/ky/kw1/output/hgt/freshwater_ky -i ~/ky/kw1/output/orf -x fasta -taxon ~/ky/kw1/output/hgt/karyern_fw_hgt_taxon.tsv

and the output was simply:

MetaCHIP working directory detected, program exited!

I am trying to find HGT in a metagenomic FASTA sample. All the identified sequences with taxon classifications are in a single FASTA file. The manual does not say whether more than one fasta/fa file is needed to run MetaCHIP (which is why I'm trying my luck here). So, do you perhaps know whether the error is caused by my using only one FASTA file, or by something else?

I think an installation problem can be ruled out, because the provided test sequences work for me.

Thank you so much!
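The message suggests that a working directory from an earlier run already exists. Other threads on this page show folders named `<prefix>_MetaCHIP_wd` (for example test_MetaCHIP_wd and TARA_MetaCHIP_wd). Here is a minimal sketch for checking whether such a folder is present. The naming pattern is inferred from those examples, it is an assumption that this folder triggers the message, and `existing_working_dir` is a name invented here.

```python
from pathlib import Path


def existing_working_dir(prefix, output_dir="."):
    """Return the path of `<prefix>_MetaCHIP_wd` inside `output_dir` if it exists.

    Assumption: the working directory is named after the value given to -p.
    Returns None when no such directory is found.
    """
    candidate = Path(output_dir).expanduser() / f"{prefix}_MetaCHIP_wd"
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    # "freshwater_ky" stands in for the -p value; "." for the output location.
    found = existing_working_dir("freshwater_ky", ".")
    print(found if found else "no previous working directory found")
```

If a folder is found, renaming it or choosing a different prefix is one thing to try before rerunning.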

Error in PI step with Prodigal

Hi,

We're running into an issue during the PI step with Prodigal. We get the following message back:

[2021-11-19 11:31:33] Total number of qualified genomes for HGT detection: 98.
[2021-11-19 11:31:33] Running Prodigal for 98 qualified genomes with 1 cores (1-3 minutes per genome per core).
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages/MetaCHIP/PI.py", line 238, in prodigal_worker
prodigal_parser(pwd_input_genome, pwd_output_sco, input_genome_basename, pwd_prodigal_output_folder)
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages/MetaCHIP/PI.py", line 138, in prodigal_parser
transl_table = seq_to_transl_table_dict[seq_id]
KeyError: 'CVPL010W_10000976'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/ddf37/.conda/envs/metachip/bin/MetaCHIP", line 166, in <module>
PI(args, MetaCHIP_config.config_dict)
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages/MetaCHIP/PI.py", line 879, in PI
pool.map(prodigal_worker, list_for_multiple_arguments_Prodigal)
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/ddf37/.conda/envs/metachip/lib/python3.10/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: 'CVPL010W_10000976'

The code that was used is as follows:

MetaCHIP PI -p Demo1_1Node -g /ifs/groups/russellGrp/ddf37/metachippkg/Genometax.txt -t 1 -i /ifs/groups/russellGrp/ddf37/metachippkg/genomes -x fasta

We have checked to make sure the program and dependencies are up to date:

Requirement already satisfied: MetaCHIP in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (1.10.8)
Requirement already satisfied: scipy in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (1.7.2)
Requirement already satisfied: biopython in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (1.79)
Requirement already satisfied: ete3 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (3.1.2)
Requirement already satisfied: matplotlib in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (3.5.0)
Requirement already satisfied: numpy in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (1.21.4)
Requirement already satisfied: reportlab in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from MetaCHIP) (3.6.2)
Requirement already satisfied: cycler>=0.10 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (0.11.0)
Requirement already satisfied: packaging>=20.0 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (21.2)
Requirement already satisfied: fonttools>=4.22.0 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (4.28.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (1.3.2)
Requirement already satisfied: setuptools-scm>=4 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (6.3.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (8.4.0)
Requirement already satisfied: pyparsing>=2.2.1 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from matplotlib->MetaCHIP) (2.4.7)
Requirement already satisfied: six>=1.5 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib->MetaCHIP) (1.16.0)
Requirement already satisfied: setuptools in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from setuptools-scm>=4->matplotlib->MetaCHIP) (59.1.1)
Requirement already satisfied: tomli>=1.0.0 in /home/ddf37/.conda/envs/metachip/lib/python3.10/site-packages (from setuptools-scm>=4->matplotlib->MetaCHIP) (1.2.2)

This was all done on a computing cluster. Any help would be appreciated!

blastn issue

Hi,

First of all thanks for the pipeline!
I ran into an issue at the blastn step. The db is generated fine, but all blastn result tab files are empty.

However, when I ran a command line taken from blastn_commands.txt, it worked perfectly and I got an output.
I have already renamed the files so that their names are shorter than 50 characters.

Have you ever seen this issue?

MetaCHIP PI -r g -t 6 -i MAGs_filtered/ -x fasta -taxon gtdb_filtered_genomes_tax_new_name_fin.tsv -p TARA
[2021-09-22 13:29:36] Input genomes grouped into 338 genera.
[2021-09-22 13:29:37] Ignored 17 genome(s) for genus level HGT detection (unknown genus assignment).
[2021-09-22 13:29:37] Total number of qualified genomes for HGT detection: 553.
[2021-09-22 13:29:38] Grouping file exported to: TARA_grouping_g338.txt.
[2021-09-22 13:29:38] Running Prodigal for 553 qualified genomes with 6 cores (1-3 minutes pre genome per core).
[2021-09-22 14:34:02] Get SCG tree: running hmmsearch with 6 cores.
[2021-09-22 14:35:23] Get SCG tree: running hmmalign with 6 cores.
[2021-09-22 14:35:57] Get SCG tree: concatenating alignments.
[2021-09-22 14:35:58] Get SCG tree: removing columns from concatenated alignment represented by <50% of genomes and with amino acid consensus <25%.
[2021-09-22 14:36:19] Get SCG tree: running FastTree.
[2021-09-22 14:44:40] SCG tree exported to: TARA_g_SCG_tree.newick.
[2021-09-22 14:44:44] Making blast database.

Building a new DB, current time: 09/22/2021 14:46:11
New DB name: /media/nico/MyBook/MAGs_TARA/TARA_MetaCHIP_wd/TARA_g_blastdb/TARA_g_combined_ffn.fasta
New DB title: TARA_MetaCHIP_wd/TARA_g_blastdb/TARA_g_combined_ffn.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B

BLAST Database creation error: Near line 1862829, the local id is too long. Its length is 51 but the maximum allowed local id length is 50. Please find and correct all local ids that are too long.
[2021-09-22 14:47:28] Blastn commands exported to: TARA_g_blastn_commands.txt.
[2021-09-22 14:47:28] Running blastn for 553 qualified genomes with 6 cores.
BLAST Database error: No alias or index file found for nucleotide database [TARA_MetaCHIP_wd/TARA_g_blastdb/TARA_g_combined_ffn.fasta] in search path [/media/nico/MyBook/MAGs_TARA::]
...

Thanks for your help!

Cheers
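The log above reports a sequence id of length 51 in the combined file, against a maximum of 50. Gene ids elsewhere on this page look like a genome name plus a numeric suffix (for example B12.metabat.65.contigs_02841), so a file name under 50 characters can possibly still give an id that is too long. A minimal sketch for listing the offending ids follows. `long_ids` is a name invented for this example, and the limit of 50 is taken from the error message.

```python
def long_ids(fasta_path, max_length=50):
    """Yield (line_number, sequence_id, length) for ids longer than `max_length`.

    The id is taken to be the first whitespace-delimited word after '>'.
    """
    with open(fasta_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.startswith(">"):
                continue
            words = line[1:].split()
            sequence_id = words[0] if words else ""
            if len(sequence_id) > max_length:
                yield line_number, sequence_id, len(sequence_id)


if __name__ == "__main__":
    # Path copied from the log above; adjust as needed.
    path = "TARA_MetaCHIP_wd/TARA_g_blastdb/TARA_g_combined_ffn.fasta"
    for line_number, sequence_id, length in long_ids(path):
        print(f"line {line_number}: {sequence_id} ({length} characters)")
```

The reported lines show which genome names still need shortening.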

Issue when parsing the Ranger-DTL2 output, no circos created

Hi,

I am using MetaCHIP 1.10.9 and there's an issue at the BP step:

[2022-07-04 10:53:41] Filtered blastn results detected, filtration step skipped.
[2022-07-04 10:53:41] Detect HGT among species: input genomes were clustered into 2 species.
[2022-07-04 10:53:42] Detect HGT among species: Best-match approach.
[2022-07-04 10:53:42] Detect HGT among species: get group-to-group identities with 128 cores.
[2022-07-04 10:53:45] Detect HGT among species: analyzing Blast hits with 128 cores.
[2022-07-04 10:53:46] Detect HGT among species: plotting flanking regions with 128 cores.

[2022-07-04 10:55:38] Detect HGT among species: get sequences of BM predicted HGT candidates.
[2022-07-04 10:55:41] Detect HGT among species: done for BM approach!
[2022-07-04 10:55:41] Detect HGT among species: get gene/genome members in gene/species tree for BM predicted HGT candidates.
[2022-07-04 10:55:41] Detect HGT among species: prepare genomes_s_combined_faa.fasta subset.
[2022-07-04 10:55:42] Detect HGT among species: get species and gene tree for 21 BM approach identified HGTs.
[2022-07-04 10:55:43] Detect HGT among species: running Ranger-DTL2.
[2022-07-04 10:55:44] Detect HGT among species: parsing Ranger-DTL2 outputs.
[2022-07-04 10:55:44] Detect HGT among species: add Ranger-DTL predicted direction to genomes_s2_HGTs_PG.txt
Error in contrib.url(repos, type) : 
  trying to use CRAN without setting a mirror
Calls: suppressMessages ... check.packages -> install.packages -> startsWith -> contrib.url
Execution halted

Do you have any suggestions, please? Thank you :)

No input genome detected, program exited! using customized grouping

I tried MetaCHIP with five samples. I have the bins in a single folder, BRbins, and a customized grouping file like this:
RC_V,RC_V.bin.7
RC_V,RC_V.bin.8
RC_V,RC_V.bin.9
RC_VI,RC_VI.bin.10
RC_VI,RC_VI.bin.11 ...
...

The bins are in FASTA format, but when I try the first step with this command:
MetaCHIP PI -p Bioreactor -g customized_grouping2.txt -t 12 -i BRbins -x fasta
it prints this message:
No input genome detected, program exited!
and NOTHING MORE!!

When I tried the example it ran well and produced the plots, but not with my bins.

Could you help me, please?

Bio.Alphabet has been removed from Biopython

Dear MetaCHIP team,

I've just installed MetaCHIP via pip3 install, however it is not functioning. I'm seeing an error associated with the Bio.Alphabet module from Biopython as follows:

$ MetaCHIP
Traceback (most recent call last):
  File "/opt/software/uoa/2020/apps/python/pypi/3.7/bin/MetaCHIP", line 24, in <module>
    from MetaCHIP.PI import PI
  File "/opt/software/uoa/2020/apps/python/pypi/3.7/lib/python3.7/site-packages/MetaCHIP/PI.py", line 31, in <module>
    from Bio.Alphabet import IUPAC, generic_dna
  File "/opt/software/uoa/2020/apps/python/pypi/3.7/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

Any suggestions on how to fix this? We have the most recent version of biopython installed.

Thanks,

Sophie

Unexplainable behaviour of MetaCHIP PI

Hello!

I am trying to run MetaCHIP for the first time and getting some issues. I don't yet know how naive my question is but I do need help :D

So I am starting with:
MetaCHIP PI -i genomes_msmithii_gut/ -x fasta -g taxonomy_msmithii_gut_groupping.txt -p gut
The output I get:

[2022-06-22 19:31:22] Total number of qualified genomes for HGT detection: 5.
[2022-06-22 19:31:22] Genome ids provided in taxonomy_msmithii_gut_groupping.txt do not match genome files in genomes_msmithii_gut, program exited!
[2022-06-22 19:31:22] Please note that file extension (e.g. fa, fasta) of the input genomes should NOT be included in the grouping file.

The output is weird as I have around 1000 genomes in my genomes_msmithii_gut/ directory, and it said there are only 5 qualified genomes.
Then, genome ids in taxonomy_msmithii_gut_groupping.txt are the same as in the folder with the only difference that they have an extension .fasta.

When I run MetaCHIP BP -p gut afterwards, I get:

Traceback (most recent call last):
  File "/home/users/pnovikova/miniconda3/envs/metachip/bin/MetaCHIP", line 169, in <module>
    BP(args, MetaCHIP_config.config_dict)
  File "/home/users/pnovikova/miniconda3/envs/metachip/lib/python3.10/site-packages/MetaCHIP/BP.py", line 1741, in BP
    pwd_prodigal_output_folder_detected    = [os.path.basename(file_name) for file_name in glob.glob(pwd_prodigal_output_folder_re)][0]
IndexError: list index out of range

Below I put heads of my files.

taxonomy_msmithii_gut_groupping.txt

GUT_GENOME001950,s__Methanobrevibacter_A_smithii
GUT_GENOME001966,s__Methanobrevibacter_A_smithii
GUT_GENOME002944,s__Methanobrevibacter_A_smithii
GUT_GENOME003140,s__Methanobrevibacter_A_smithii
GUT_GENOME004154,s__Methanobrevibacter_A_smithii
GUT_GENOME004870,s__Methanobrevibacter_A_smithii
GUT_GENOME005651,s__Methanobrevibacter_A_smithii_A
GUT_GENOME005889,s__Methanobrevibacter_A_smithii
GUT_GENOME006755,s__Methanobrevibacter_A_smithii
GUT_GENOME008460,s__Methanobrevibacter_A_smithii

files in genomes_msmithii_gut/:

-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME001950.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME001966.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.2M Jun 22 14:41 GUT_GENOME002944.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.6M Jun 22 14:41 GUT_GENOME003140.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.2M Jun 22 14:41 GUT_GENOME004154.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.4M Jun 22 14:41 GUT_GENOME004870.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  2.1M Jun 22 14:41 GUT_GENOME005651.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME005889.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.5M Jun 22 14:41 GUT_GENOME006755.fasta

Is there any explanation for this, and what could be done to fix it? It feels like something is wrong with the file format, but I cannot see it; in any case, I am trying to follow the examples.

The version of MetaCHIP is v1.10.9.
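One thing that may be worth checking: the grouping example in another thread on this page lists the group first and the genome second (RC_V,RC_V.bin.7), whereas the file above lists the genome id first. The sketch below does not settle which order MetaCHIP expects. It only counts how many ids in each column match the genome file names with the extension removed. `column_matches` is a name invented for this example.

```python
import csv
from pathlib import Path


def column_matches(grouping_file, genome_dir, extension="fasta"):
    """Count grouping-file ids that match genome file names without extension."""
    suffix = "." + extension
    genome_ids = {
        path.name[: -len(suffix)] for path in Path(genome_dir).glob("*" + suffix)
    }
    with open(grouping_file, newline="") as handle:
        rows = [row for row in csv.reader(handle) if len(row) >= 2]
    first_column = {row[0].strip() for row in rows}
    second_column = {row[1].strip() for row in rows}
    return {
        "genome_files": len(genome_ids),
        "first_column_matches": len(first_column & genome_ids),
        "second_column_matches": len(second_column & genome_ids),
    }


if __name__ == "__main__":
    # File and folder names copied from the command above.
    print(column_matches("taxonomy_msmithii_gut_groupping.txt", "genomes_msmithii_gut"))
```

If only the first column matches the genome files, swapping the two columns is a cheap experiment to try.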

plots issues

Hi Mr Song,
I ran your pipeline on my assemblies with a customized grouping file. Although everything went well at the beginning, an error reported that no files were detected when creating the flanking region plots. The circle plot was not produced either, only the matrix file. Would you mind helping me with this issue? My command lines and error messages are listed below:
1. MetaCHIP PI -p APEC -g APECgrouping.csv -t 48 -i APECgenome -x fasta
2. MetaCHIP BP -p APEC -g APECgrouping.csv -t 48
[Image attachments did not finish uploading]

Error in BP module

I got an error when using the BP module, as follows:
image
I tried commenting out line 1991 of BP.py, but it still does not work.
Could you please help me with this?

Problem Running BP

Hello Dr. Song, I run the following command under the folder MetaCHIP/input_file_examples/human_gut_bins/, and the error below showed up. Could you help me with this? Thank you. Zirui

MetaCHIP BP -p GUT_BP -g /work/cascades/zirui/MetaChIP/customized_grouping.txt -t 6

Traceback (most recent call last):
File "/home/zirui/miniconda3/bin/MetaCHIP", line 168, in <module>
BP(args, MetaCHIP_config.config_dict)
File "/home/zirui/miniconda3/lib/python3.8/site-packages/MetaCHIP/BP.py", line 1732, in BP
pwd_prodigal_output_folder_detected = [os.path.basename(file_name) for file_name in glob.glob(pwd_prodigal_output_folder_re)][0]
IndexError: list index out of range

Error in PI step

Hi-
I installed MetaCHIP through an ete3 conda environment. I get the general command list when I type MetaCHIP. I ran the summarize command on a bunch of gbf files and that worked. However, when I went to do the PI command with the test sequences (NorthSea bins), it gives me this error:

(ete3) [bcampbell@localhost Metachip]$ MetaCHIP PI -p Metachip_test -g Test_metachip.txt -t 20 -i /home/shared/MetaCHIP_test/Metachip -x fasta
[2021-04-19 16:03:13] Total number of qualified genomes for HGT detection: 4.
[2021-04-19 16:03:13] Running Prodigal for 4 qualified genomes with 20 cores (1-3 minutes pre genome per core).
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/site-packages/MetaCHIP/PI.py", line 233, in prodigal_worker
    prodigal_parser(pwd_input_genome, pwd_output_sco, input_genome_basename, pwd_prodigal_output_folder)
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/site-packages/MetaCHIP/PI.py", line 202, in prodigal_parser
    SeqIO.write(current_SeqRecord, bin_gbk_file_handle, 'genbank')
  File "/home/bcampbell/.local/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 533, in write
    count = writer_class(fp).write_file(sequences)
  File "/home/bcampbell/.local/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py", line 237, in write_file
    count = self.write_records(records)
  File "/home/bcampbell/.local/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py", line 222, in write_records
    self.write_record(record)
  File "/home/bcampbell/.local/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py", line 830, in write_record
    self._write_the_first_line(record)
  File "/home/bcampbell/.local/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py", line 631, in _write_the_first_line
    raise ValueError("Need a Nucleotide or Protein alphabet")
ValueError: Need a Nucleotide or Protein alphabet
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bcampbell/anaconda3/envs/ete3/bin/MetaCHIP", line 165, in <module>
    PI(args, MetaCHIP_config.config_dict)
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/site-packages/MetaCHIP/PI.py", line 870, in PI
    pool.map(prodigal_worker, list_for_multiple_arguments_Prodigal)
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/bcampbell/anaconda3/envs/ete3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: Need a Nucleotide or Protein alphabet

It's not putting data into the genbank files, but other files were generated before that step. I checked biopython (installed in the conda environment), but it said the requirement was already satisfied.

I am the administrator of the workstation. Another user installed biopython outside the conda environment (I think) and MetaCHIP is working.

Any suggestions? We had problems with FastTree as well, but put that in the path for everyone to use.

Thanks!
Barb
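
The traceback above loads Biopython from /home/bcampbell/.local/lib/python3.6/site-packages rather than from the ete3 conda environment, so the copy that pip reports as "already satisfied" may not be the one Python actually imports. A minimal sketch for checking this, using only the standard library; the helper names `module_origin` and `inside_active_env` are made up for illustration and are not part of MetaCHIP:

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def inside_active_env(path):
    """True if the path lives under the running interpreter's prefix (e.g. the conda env)."""
    return path is not None and path.startswith(sys.prefix)


if __name__ == "__main__":
    # Run this with the same python that runs MetaCHIP.
    for name in ("Bio", "ete3"):
        origin = module_origin(name)
        where = "inside env" if inside_active_env(origin) else "outside env or missing"
        print(name, origin, where)
```

If a module turns out to be imported from the user site directory, setting the standard `PYTHONNOUSERSITE=1` environment variable (or running `python -s`) makes Python skip that directory, which may be enough to let the conda copy take over.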

Unknown Classification

Hi,

Thanks for a nice tool. I tried it on some of my bins, and found a classification as follows:
B12.metabat.65.contigs_02841 B12.metabat.77.contigs_02542 G A 81.394 no no B12.metabat.65.contigs-->B12.metabat.77.contigs

However, I am not sure what the "G" stands for in this. Could you please let me know?

I also found in another sample "H" and "G".

B12.metabat.65.contigs_01463    B12.metabat.40.contigs_00582    H       G       85.023  no      no

Thanks for your help!
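
A note for readers with the same question: the HGT table header quoted in the "NA direction" report further down labels the third and fourth columns Gene_1_group and Gene_2_group, so letters such as G, A and H are most likely group IDs rather than a classification code. The sketch below assumes the grouping file uses the `group,genome` layout shown in the customised grouping example on this page; whether the auto-generated grouping files follow exactly the same layout is an assumption, and both function names are hypothetical.

```python
def load_grouping(lines):
    """Map genome name -> group ID from 'group,genome' lines."""
    genome_to_group = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group, genome = line.split(",", 1)
        genome_to_group[genome] = group
    return genome_to_group


def genomes_in_group(genome_to_group, group):
    """List all genomes assigned to one group ID, sorted by name."""
    return sorted(g for g, grp in genome_to_group.items() if grp == group)


if __name__ == "__main__":
    example = ["A,PNA_99_2", "A,PNA_99_3", "B,PANS_2_1"]
    mapping = load_grouping(example)
    print(genomes_in_group(mapping, "A"))
```

To use it on a real run, pass the lines of the grouping file (for example `open(path)`) to `load_grouping` and look up the genomes behind each letter.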

Error in rgb

Hi,

I tried re-running the analyses with a different taxonomic classification and now have a newer error:

Error in rgb(rgb_mat, maxColorValue = 255, alpha = (1 - transparency) *  :
  alpha level NA, not in 0:255
Calls: chordDiagram ... chordDiagramFromMatrix -> chordDiagramFromDataFrame -> rgb
Execution halted

I have the required R packages loaded in my environment though. 

Thanks for your help!

Too many files produced

Hey @songweizhi,

I'm running the analysis on 1000 genomes. The PI step finished, but the subsequent steps fail with disk quota errors, even though my file-count limit is 4,000,000.

I figured that the reason this fails is that the BLAST_all steps create many files for each of the combinations of genomes. I noticed that you have the -noblast option in the PI step.

I don't see an example of how to use it, though; the help text only says "skip running all-vs-all blastn, provide if you have other ways (e.g. with job scripts) to speed up the blastn step".

Could you please provide an example and how one can maybe reduce the number of output files?

Thanks!
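
Before changing any options, it may help to see which folder of the working directory actually holds the files. The sketch below is a generic standard-library helper, not a MetaCHIP feature; `files_per_subfolder` and `ordered_pairs` are hypothetical names. The pair count is only an upper-bound illustration: it assumes one file per ordered genome pair, which may not match how MetaCHIP writes its BLAST output.

```python
import os
from collections import Counter


def files_per_subfolder(root):
    """Count regular files under each immediate subfolder of root (recursively)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Attribute every file to the first path component below root.
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


def ordered_pairs(n_genomes):
    """Number of ordered genome pairs (query != subject) in an all-vs-all comparison."""
    return n_genomes * (n_genomes - 1)


if __name__ == "__main__":
    for folder, n in files_per_subfolder(".").most_common(10):
        print(n, folder)
    print("ordered pairs for 1000 genomes:", ordered_pairs(1000))
```

Running it from inside the `*_MetaCHIP_wd` folder lists the ten subfolders with the most files, which shows whether the BLAST results or some other step is exhausting the quota.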

No alias or index file found for nucleotide database

As reported by others previously, I am getting the following error during the PI step:

BLAST Database error: No alias or index file found for nucleotide database [/metachip/gut/output/gut_pcofg_blastdb/gut_pcofg_combined_ffn.fasta] in search path [/databases/gut/renamed::]

Here's my code:

MetaCHIP PI -i /databases/gut/renamed -o /metachip/gut/output -p gut -r pcofg -x fna -taxon taxonomy.txt -force -t 32

Unlike those with this issue previously, it doesn't seem to be due to the length of the filenames. All filenames and fasta headers are less than 20 characters. It seems that the gut_pcofg_combined_ffn.fasta file for making the BLAST database is not being generated as its size is zero.

Your help is appreciated, thanks.
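
For anyone hitting the same message, a quick check of the database inputs can narrow things down. The sketch below assumes the database name equals the FASTA path, as in the error message above, and relies on my understanding that a nucleotide BLAST database is located through a `.nin` index file or a `.nal` alias file. The function name `check_blast_db` is hypothetical.

```python
import os

# A nucleotide BLAST database is found via its index (.nin) or alias (.nal) file.
INDEX_SUFFIXES = (".nin", ".nal")


def check_blast_db(fasta_path):
    """Return a list of human-readable problems found for a nucleotide BLAST database."""
    problems = []
    if not os.path.isfile(fasta_path):
        problems.append("missing FASTA: " + fasta_path)
    elif os.path.getsize(fasta_path) == 0:
        problems.append("empty FASTA: " + fasta_path)
    if not any(os.path.isfile(fasta_path + suffix) for suffix in INDEX_SUFFIXES):
        problems.append("no .nin/.nal file next to: " + fasta_path)
    return problems


if __name__ == "__main__":
    for problem in check_blast_db("/metachip/gut/output/gut_pcofg_blastdb/gut_pcofg_combined_ffn.fasta"):
        print(problem)
```

An "empty FASTA" result would point at the gene-calling step that feeds the combined file rather than at BLAST itself.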

Installation issues

Hi,

I recently downloaded MetaCHIP and have installed the dependencies on the same directory, and when I list them, they show, but I cannot run the code without error saying some third party dependencies are missing. Has anyone gone through similar issues or is there a guide to see how to diagnose this?

Terminal Saved Output.txt

Rooting the phylogenetic tree

Hi:
MetaCHIP combines the best-match approach and the phylogenetic approach to study HGTs at the community level. I want to use MetaCHIP to identify HGTs, but I have a question. I have read the paper "MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches" and understand that MetaCHIP employs the phylogenetic approach to corroborate the results of the best-match approach and to provide information on the direction of gene flow. As far as I know, the direction of gene flow can only be inferred if the gene tree is rooted. However, I am not sure whether MetaCHIP roots the gene tree when conducting the phylogenetic analysis. Could you clarify this? I would really appreciate your help.

Best regards,
YangYuan

Error thrown while running BP

Error in the terminal:

Command line argument error: Argument "subject". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Myxococcales_bacterium_UW$ Command line argument error: Argument "subject". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Myxococcales_bacterium_UW$
Command line argument error: Argument "query". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Candidatus_Marinimicrobia_b$ Command line argument error: Argument "query". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Candidatus_Marinimicrobia_b$
Command line argument error: Argument "subject". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Planctomycetes_bacterium_$ Command line argument error: Argument "subject". File is not accessible: northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Planctomycetes_bacterium_$
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/site-packages/MetaCHIP/BP.py", line 861, in get_gbk_blast_act2
for blast_hit in open(output_c_full_len):
FileNotFoundError: [Errno 2] No such file or directory: 'northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Alphaproteobacteria_bacterium_UWMA-0321_017$
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/shreyansh/anaconda3/envs/MetaCHIP/bin/MetaCHIP", line 168, in
BP(args, config_dict)
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/site-packages/MetaCHIP/BP.py", line 2164, in BP
pool_flanking_regions.map(get_gbk_blast_act2, list_for_multiple_arguments_flanking_regions)
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/shreyansh/anaconda3/envs/MetaCHIP/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'northsea_MetaCHIP_wd/northsea_HGT_ip90_al200bp_c75_ei80_f10kbp_p18/northsea_p18_Flanking_region_plots/Alphaproteobacteria_bacterium_UWMA-0321_017$

Please let me know where I am going wrong!

Error running Ranger-DTL2 in BP module

Hello
When I try to test-run a fresh installation of MetaCHIP, I get the following error:

$ MetaCHIP BP -p Test -r c -t 20
[2020-10-19 13:21:39] Filtering blastn results with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%.
[2020-10-19 13:21:39] HGT Detection among classes: input genomes were clustered into 5 classes.
[2020-10-19 13:21:39] HGT Detection among classes: Best-match approach.
[2020-10-19 13:21:39] HGT Detection among classes: get group-to-group identities with 20 cores.
[2020-10-19 13:21:39] HGT Detection among classes: analyzing Blast hits with 20 cores.
[2020-10-19 13:21:39] HGT Detection among classes: plotting flanking regions with 20 cores.
[2020-10-19 13:21:40] HGT Detection among classes: get sequences of BM predicted HGT candidates.
[2020-10-19 13:21:40] HGT Detection among classes: done for BM approach!
[2020-10-19 13:21:40] HGT Detection among classes: get gene/genome members in gene/species tree for BM predicted HGT candidates.
[2020-10-19 13:21:40] HGT Detection among classes: prepare Test_c_combined_faa.fasta subset.
[2020-10-19 13:21:40] HGT Detection among classes: get species and gene tree for 2 BM approach identified HGTs.
[2020-10-19 13:21:41] HGT Detection among classes: running Ranger-DTL2.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/casper/.conda/envs/py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/casper/.conda/envs/py36/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/casper/.conda/envs/py36/lib/python3.6/site-packages/MetaCHIP/BP.py", line 1232, in Ranger_worker
species_tree = Tree(pwd_species_tree_newick_no_hyphen_in_branch_length, format=0)
File "/home/casper/.local/lib/python3.6/site-packages/ete3/coretype/tree.py", line 213, in init
quoted_names=quoted_node_names)
File "/home/casper/.local/lib/python3.6/site-packages/ete3/parser/newick.py", line 266, in read_newick
return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
File "/home/casper/.local/lib/python3.6/site-packages/ete3/parser/newick.py", line 341, in _read_newick_from_string
_read_node_data(closing_internal, current_parent, "internal", matcher, formatcode)
File "/home/casper/.local/lib/python3.6/site-packages/ete3/parser/newick.py", line 445, in _read_node_data
raise NewickError("Unexpected newick format '%s' " %subnw[0:50])
ete3.parser.newick.NewickError: Unexpected newick format ':1.00:0.14535'
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/casper/.conda/envs/py36/bin/MetaCHIP", line 168, in
BP(args, config_dict)
File "/home/casper/.conda/envs/py36/lib/python3.6/site-packages/MetaCHIP/BP.py", line 2370, in BP
pool.map(Ranger_worker, list_for_multiple_arguments_Ranger)
File "/home/casper/.conda/envs/py36/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/casper/.conda/envs/py36/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ete3.parser.newick.NewickError: Unexpected newick format ':1.00:0.14535'
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

From my understanding, it seems the newick file produced in one step is in format 5 and the next step wants it to be in format 0.

Any help would be highly appreciated!
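
To confirm whether the species tree file really contains internal nodes with two colon-prefixed numbers, like the ':1.00:0.14535' fragment in the error, a small text scan is enough. This is a standard-library sketch with a hypothetical function name; it only reports such fields and makes no claim about which ete3 format code MetaCHIP expects. Note also that the traceback loads ete3 from /home/casper/.local rather than from the py36 conda environment, so a version mismatch is worth ruling out.

```python
import re

# Matches a closing parenthesis followed by two colon-prefixed numbers,
# e.g. "):1.00:0.14535".
DOUBLE_FIELD = re.compile(r"\)(:[0-9.eE+-]+:[0-9.eE+-]+)")


def find_double_colon_fields(newick):
    """Return every ':number:number' field that directly follows a ')' in a Newick string."""
    return DOUBLE_FIELD.findall(newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):1.00:0.14535,C:0.3);"
    print(find_double_colon_fields(tree))
```

An empty result on the real tree file would suggest the problem lies elsewhere, for instance in how the installed ete3 version parses support values.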

conda install -c bioconda metachip

Dear MetaCHIP developers,

Thanks for your efforts in this amazing tool.
Is the conda environment maintained? Would you recommend installing MetaCHIP through it?

The installation has been running since yesterday:

Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: -

Thanks

BLAST and identity distribution error

@songweizhi

I ran MetaCHIP successfully in the past, but when I tried it again today, I ran into the following issues:

BLAST Database error: No alias or index file found for nucleotide database [cont-1_MetaCHIP_wd/cont-1_all_blastdb/cont-1_all_combined_ffn.fasta] in search path [/mnt/lscratch/users/ldenies/mice_AMR/hgt::]
[2020-06-04 07:59:51] Deleting temporary files
[2020-06-04 07:59:51] PrepIn done!
[2020-06-04 07:59:52] Found grouping file cont-1_g33_grouping.txt, input genomes were clustered into 33 groups
[2020-06-04 07:59:52] Filtering blast matches with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%
[2020-06-04 07:59:53] Combining filtered blastn results
[2020-06-04 07:59:53] Get group-to-group identities with 18 cores
[2020-06-04 07:59:53] Plotting identity distribution between each pair of groups
Traceback (most recent call last):
  File "/home/users/sbusi/.local/bin/MetaCHIP", line 227, in <module>
    BM(args, config_dict)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 2220, in BM
    do(plot_identity)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 1934, in do
    current_group_pair_identity_cut_off = np.percentile(current_group_pair_identities_array, identity_percentile)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4291, in percentile
    interpolation=interpolation)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4033, in _ureduce
    r = func(a, **kwargs)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4405, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/core/fromnumeric.py", line 159, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/core/fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

Is the BLAST index error related to the plotting error? Is there a workaround?

Thank you!

gene transfers of the same class

Hi, thank you for developing a handy and practical tool. I have a question:
Gene transfers likely occurred between members of the same class, but my result plot shows no evidence of them. Is this because my parameters were wrong?
Thanks!

Error in PI step: Invalid index

Hi
A few days ago, I downloaded MetaCHIP (v1.10.9) and successfully tested it on the example data. But when I run MetaCHIP on my own data, the following error occurs:

command: MetaCHIP PI -p sample_3 -r c -t 6 -i sample_3_bins -x fasta -taxon sample_3_bins_GTDB.tsv

[2022-01-13 22:57:18] Input genomes grouped into 3 classes.
[2022-01-13 22:57:18] Total number of qualified genomes for HGT detection: 10.
[2022-01-13 22:57:19] Grouping file exported to: sample_3_grouping_c3.txt.
[2022-01-13 22:57:19] Running Prodigal for 10 qualified genomes with 6 cores (1-3 minutes per genome per core).
[2022-01-13 22:57:28] Get SCG tree: running hmmsearch with 6 cores.
[2022-01-13 22:57:30] Get SCG tree: running hmmalign with 6 cores.

Error: File existence/permissions problem in trying to open HMM file sample_3_MetaCHIP_wd/sample_3_c_get_SCG_tree_wd/sample_3_c_hmm_profile_fetched/.hmm.
HMM file sample_3_MetaCHIP_wd/sample_3_c_get_SCG_tree_wd/sample_3_c_hmm_profile_fetched/.hmm not found (nor an .h3m binary of i

[2022-01-13 22:57:35] Get SCG tree: concatenating alignments.
[2022-01-13 22:57:35] Get SCG tree: removing columns from concatenated alignment represented by <50% of genomes and with amino acid consensus <25%.
Traceback (most recent call last):
File "/miniconda3/bin/MetaCHIP", line 166, in
PI(args, MetaCHIP_config.config_dict)
File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/PI.py", line 974, in PI
remove_low_cov_and_consensus_columns(pwd_combined_alignment_file_tmp, minimal_cov_in_msa, min_consensus_in_msa, pwd_combined_alignment_file)
File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/PI.py", line 533, in remove_low_cov_and_consensus_columns
alignment_cov = remove_low_cov_columns(alignment, minimal_cov)
File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/PI.py", line 492, in remove_low_cov_columns
alignment_new = remove_columns_from_msa(alignment_in, low_cov_columns)
File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/PI.py", line 460, in remove_columns_from_msa
segment_value = alignment_in[:, segment[0]]
File "/miniconda3/lib/python3.7/site-packages/Bio/Align/init.py", line 760, in getitem
rec[col_index] for rec in self._records[row_index]
File "/miniconda3/lib/python3.7/site-packages/Bio/Align/init.py", line 160, in init
self.extend(records)
File "/miniconda3/lib/python3.7/site-packages/Bio/Align/init.py", line 450, in extend
rec = next(records)
File "/miniconda3/lib/python3.7/site-packages/Bio/Align/init.py", line 760, in
rec[col_index] for rec in self._records[row_index]
File "/miniconda3/lib/python3.7/site-packages/Bio/SeqRecord.py", line 519, in getitem
raise ValueError("Invalid index")
ValueError: Invalid index

By the way, the version of biopython is V1.79.
What's wrong with it?

identity percentile cutoff

Hi,

This is probably a very naive question but I wonder how you define the option "MetaCHIP BP -ip". By default it is 90% so I guess it is not the % of identity of blastn ?

Thanks,

Nico
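
One hint is in the traceback quoted in the "BLAST and identity distribution error" report above, where the cut-off for a pair of groups is computed as `np.percentile(current_group_pair_identities_array, identity_percentile)`. That suggests `-ip` is a percentile of the identities observed between two groups, not a fixed blastn identity threshold. The sketch below reproduces that kind of calculation in plain Python with linear interpolation (NumPy's default); it is an illustration under that assumption, not MetaCHIP's own code, and the function name is hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) of values, with linear interpolation between closest ranks."""
    if not values:
        raise ValueError("no identities to summarise")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


if __name__ == "__main__":
    # Hypothetical identities (%) of blastn hits between two groups.
    identities = [71.2, 74.5, 75.9, 78.8, 81.7, 83.9, 88.1, 93.5, 95.6, 99.6]
    print(percentile(identities, 90))
```

Under this reading, a hit between two groups would only be considered if its identity exceeds the value that 90% of the hits between those groups fall below.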

Error: BP step, file not found

Hey,
thanks for developing and maintaining metaCHIP. It seems to be of high significance and I really would like to use it to look into my metagenomics data. However, I get an error message when executing the MetaCHIP BP command. I know the question has been asked before but there is no final answer yet. The PI step works fine with the following output:

[2021-01-01 00:44:06] Input genomes grouped into 3 phyla.
[2021-01-01 00:44:06] Input genomes grouped into 3 classes.
[2021-01-01 00:44:07] Input genomes grouped into 5 orders.
[2021-01-01 00:44:07] Input genomes grouped into 5 families.
[2021-01-01 00:44:08] Input genomes grouped into 6 genera.
[2021-01-01 00:44:08] Total number of qualified genomes for HGT detection: 33.
[2021-01-01 00:44:09] Grouping file exported to: 0_core_grouping_p3.txt.
[2021-01-01 00:44:09] Grouping file exported to: 0_core_grouping_c3.txt.
[2021-01-01 00:44:10] Grouping file exported to: 0_core_grouping_o5.txt.
[2021-01-01 00:44:10] Grouping file exported to: 0_core_grouping_f5.txt.
[2021-01-01 00:44:11] Grouping file exported to: 0_core_grouping_g6.txt.
[2021-01-01 00:44:11] Running Prodigal for 33 qualified genomes with 1 cores (1-3 minutes pre genome per core).
[2021-01-01 00:53:44] Get SCG tree: running hmmsearch with 1 cores.
[2021-01-01 00:54:25] Get SCG tree: running hmmalign with 1 cores.
[2021-01-01 00:54:32] Get SCG tree: concatenating alignments.
[2021-01-01 00:54:32] Get SCG tree: removing columns from concatenated alignment represented by <50% of genomes and with amino acid consensus <25%.
[2021-01-01 00:54:32] Get SCG tree: running FastTree.
[2021-01-01 00:54:54] SCG tree exported to: 0_core_pcofg_SCG_tree.newick.
[2021-01-01 00:54:55] Making blast database.
[2021-01-01 00:54:57] Blastn commands exported to: 0_core_pcofg_blastn_commands.txt.
[2021-01-01 00:54:57] Running blastn for 33 qualified genomes with 1 cores.
[2021-01-01 01:22:50] Blast results exported to: 0_core_MetaCHIP_wd/0_core_pcofg_blastn_results.
[2021-01-01 01:22:50] PI step done!

However, when running the BP step, I get the following error message:

[2021-01-01 13:33:50] Filtering blastn results with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%.
[2021-01-01 13:33:53] HGT Detection among phyla: input genomes were clustered into 3 phyla.
[2021-01-01 13:33:53] HGT Detection among phyla: Best-match approach.
[2021-01-01 13:33:53] HGT Detection among phyla: get group-to-group identities with 1 cores.
[2021-01-01 13:33:56] HGT Detection among phyla: analyzing Blast hits with 1 cores.
[2021-01-01 13:34:02] HGT Detection among phyla: plotting flanking regions with 1 cores.
Command line argument error: Argument "subject". File is not accessible:  `0_core_MetaCHIP_wd/0_core_HGT_ip90_al200bp_c75_ei80_f10kbp_p3/0_core_p3_Flanking_region_plots/Neis>
Command line argument error: Argument "subject". File is not accessible:  `0_core_MetaCHIP_wd/0_core_HGT_ip90_al200bp_c75_ei80_f10kbp_p3/0_core_p3_Flanking_region_plots/Neis>
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/miniconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 861, in get_gbk_blast_act2
    for blast_hit in open(output_c_full_len):
FileNotFoundError: [Errno 2] No such file or directory: '0_core_MetaCHIP_wd/0_core_HGT_ip90_al200bp_c75_ei80_f10kbp_p3/0_core_p3_Flanking_region_plots/Neis>
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/miniconda3/bin/MetaCHIP", line 168, in <module>
    BP(args, config_dict)
  File "/miniconda3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 2164, in BP
    pool_flanking_regions.map(get_gbk_blast_act2, list_for_multiple_arguments_flanking_regions)
  File "/miniconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '0_core_MetaCHIP_wd/0_core_HGT_ip90_al200bp_c75_ei80_f10kbp_p3/0_core_p3_Flanking_region_plots/Neis>

The commands I use are:

MetaCHIP PI  \
-p 0_core \
-r pcofg \
-i hgt-metachip/prep_gtdbtk/0_core_renamed \
-x fa \
-taxon hgt-metachip/prep_gtdbtk/0_core_renamed/classifying_genomes/gtdbtk.bac120.summary.tsv
MetaCHIP BP  \
-p 0_core \
-r pcofg

Can you give me any advice on what to do now?
Cheers,
Marie

Error while performing MetaCHIP PG step

Hi,
Thank you for developing a great tool!
When I run this command: MetaCHIP PG -p S1_Spr -r o -t 18, something goes wrong, and I have no idea how to debug it.
Here is my error report. I'll be waiting for your reply :). Thank you!

Traceback (most recent call last):
File "/home/chenziwu/anaconda3/envs/metachip_py27/bin/MetaCHIP", line 177, in
PG(args, config_dict)
File "/home/chenziwu/anaconda3/envs/metachip_py27/lib/python2.7/site-packages/MetaCHIP/PG.py", line 825, in PG
pwd_AggregateRanger_exe = config_dict['AggregateRanger_linux']
KeyError: 'AggregateRanger_linux'

I have successfully completed the first two steps: MetaCHIP PI and MetaCHIP BM.
Here is the report of the first two steps:
MetaCHIP PI -i /home/chenziwu/data/metachip_test/metawrap_bins -x fa -taxon s1_spr_total.txt -r o -p S1_Spr -t 18
[2019-04-16 16:38:05] Input genomes clustered into 20 groups
[2019-04-16 16:38:05] Ignored 3 genome(s) with unknown classification at specified level
[2019-04-16 16:38:06] Grouping file exported to: S1_Spr_o20_grouping.txt
[2019-04-16 16:38:07] Grouping stats exported to: S1_Spr_o20_grouping.png
[2019-04-16 16:38:07] Calculating the size of input genomes
[2019-04-16 16:38:09] The size of input genomes exported to: S1_Spr_all_genome_size.txt
[2019-04-16 16:38:09] Running Prodigal with 18 cores for all input genomes
[2019-04-16 16:40:53] Copying annotation files of qualified genomes to corresponding folders
[2019-04-16 16:40:53] Running Hmmsearch with 18 cores
[2019-04-16 16:40:59] Running Hmmalign with 18 cores
[2019-04-16 16:41:05] Concatenating alignments
[2019-04-16 16:41:05] Removing columns from concatenated alignment represented by <50% of genomes and with an amino acid consensus <25%
[2019-04-16 16:41:10] Running FastTree
[2019-04-16 16:41:43] Species tree exported to: S1_Spr_o20_species_tree.newick
[2019-04-16 16:41:53] Commands for running blastn exported to: S1_Spr_all_blastn_commands.txt
[2019-04-16 16:41:53] Running blastn for all input genomes with 18 cores, blast results exported to: S1_Spr_MetaCHIP_wd/S1_Spr_all_blastn_results
[2019-04-16 17:36:40] PrepIn done!
MetaCHIP BM -p S1_Spr -r o -t 18
[2019-04-16 18:47:35] Found grouping file S1_Spr_o20_grouping.txt, input genomes were clustered into 20 groups
[2019-04-16 18:47:35] Filtering blast matches with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%
[2019-04-16 18:47:36] Combining filtered blastn results
[2019-04-16 18:47:36] Get group-to-group identities with 18 cores
[2019-04-16 18:47:38] Plotting identity distribution between each pair of groups
[2019-04-16 18:47:39] Analyzing Blast hits to get HGT candidates with 18 cores
[2019-04-16 18:47:39] Plotting flanking regions with 18 cores
[2019-04-16 18:49:38] Extracting nc sequences for BM predicted HGTs
[2019-04-16 18:49:42] Deleting temporary files
[2019-04-16 18:49:42] Done for Best-match approach!

Find HGT between genes of different genomes

I made multi-FASTA files of genes of interest from several genomes and used them as input files in a directory called genomes, as follows:
singularity exec /apps/singularity-images/metachip_1.10.2.sif MetaCHIP PI -p pantoea -g customised_grouping.txt -t 30 -i genomes -x fasta

The customised_grouping.txt file looks as follows:
A,PNA_99_2
A,PNA_99_3
A,PNA_99_6
A,PNA_99_7
A,PNA_99_8
A,PNA_99_9
B,PANS_2_1
B,PANS_4_2
B,PANS_99_32
B,PNA_07_13
B,PNA_07_14
B,PNA_98_11
B,PNA_98_3
B,PNA_98_7

Getting an error:

WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container

Error: File existence/permissions problem in trying to open HMM file pantoea_MetaCHIP_wd/pantoea_x_get_SCG_tree_wd/pantoea_x_hmm_profile_fetched/.hmm.
HMM file pantoea_MetaCHIP_wd/pantoea_x_get_SCG_tree_wd/pantoea_x_hmm_profile_fetched/.hmm not found (nor an .h3m binary of it)
Traceback (most recent call last):
File "/usr/local/bin/MetaCHIP", line 165, in
PI(args, MetaCHIP_config.config_dict)
File "/usr/local/lib/python3.8/site-packages/MetaCHIP/PI.py", line 974, in PI
remove_low_cov_and_consensus_columns(pwd_combined_alignment_file_tmp, minimal_cov_in_msa, min_consensus_in_msa, pwd_combined_alignment_file)
File "/usr/local/lib/python3.8/site-packages/MetaCHIP/PI.py", line 537, in remove_low_cov_and_consensus_columns
alignment_cov = remove_low_cov_columns(alignment, minimal_cov)
File "/usr/local/lib/python3.8/site-packages/MetaCHIP/PI.py", line 496, in remove_low_cov_columns
alignment_new = remove_columns_from_msa(alignment_in, low_cov_columns)
File "/usr/local/lib/python3.8/site-packages/MetaCHIP/PI.py", line 464, in remove_columns_from_msa
segment_value = alignment_in[:, segment[0]]
File "/usr/local/lib/python3.8/site-packages/Bio/Align/init.py", line 848, in getitem
new = MultipleSeqAlignment(
File "/usr/local/lib/python3.8/site-packages/Bio/Align/init.py", line 170, in init
self.extend(records)
File "/usr/local/lib/python3.8/site-packages/Bio/Align/init.py", line 533, in extend
rec = next(records)
File "/usr/local/lib/python3.8/site-packages/Bio/Align/init.py", line 849, in
(rec[col_index] for rec in self._records[row_index]), self._alphabet
File "/usr/local/lib/python3.8/site-packages/Bio/SeqRecord.py", line 524, in getitem
raise ValueError("Invalid index")
ValueError: Invalid index

The program, however, generated an incomplete output directory pantoea_MetaCHIP_wd with these three subdirectories:
pantoea_x_get_SCG_tree_wd
pantoea_x_log_files
pantoea_x_prodigal_output

As a result, the BP command does not generate HGT output files, because no blast result files were produced by the PI command.

I can also share my input genome files if needed.

Index out of range

Hi @songweizhi

I'm trying to run MetaCHIP and getting the following error:

Activating conda environment: /scratch/users/sbusi/tools/miniconda3/envs/snakemake/pipeline/c9e7ba4a44b9d0d2baa8ed458e82adc8
Thu Nov 11 01:08:12 CET 2021
MetaCHIP working directory detected, program exited!
Traceback (most recent call last):
  File "/scratch/users/sbusi/tools/miniconda3/envs/snakemake/pipeline/c9e7ba4a44b9d0d2baa8ed458e82adc8/bin/MetaCHIP", line 169, in <module>
    BP(args, MetaCHIP_config.config_dict)
  File "/scratch/users/sbusi/tools/miniconda3/envs/snakemake/pipeline/c9e7ba4a44b9d0d2baa8ed458e82adc8/lib/python3.8/site-packages/MetaCHIP/BP.py", line 1737, in BP
    pwd_prodigal_output_folder_detected    = [os.path.basename(file_name) for file_name in glob.glob(pwd_prodigal_output_folder_re)][0]
IndexError: list index out of range

I'm running it inside a snakemake workflow and the rule looks like so:

rule metachip:
    input: /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11, /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11/gtdbtk_output/M11_gtdbtk.tsv
    output: /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11_MetaCHIP_wd/M11_all_blastn_commands.txt
    jobid: 0
    wildcards: datadir=/work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip, sample=M11
    threads: 18
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp

        date &&
        export PATH=$PATH:/home/users/sbusi/apps/metachip/bin && export PATH=$PATH:/scratch/users/sbusi/tools/miniconda3/envs/anvio-7/bin &&
        MetaCHIP PI -p M11 -r g -t 18 -i /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11 -x fa -taxon /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11/gtdbtk_output/M11_gtdbtk.tsv -o $(dirname /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11_MetaCHIP_wd/M11_all_blastn_commands.txt) &&
        MetaCHIP BP -p M11 -r g -t 18 -o $(dirname /work/projects/amrwd/mice_amr_2021/fastq/IMP3/metachip/M11_MetaCHIP_wd/M11_all_blastn_commands.txt) -force &&
        date

Thank you for your help!
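
The failing line takes element `[0]` of a `glob.glob(...)` result, so the IndexError means the pattern matched nothing: BP did not find a Prodigal output folder where it looked. The log line "MetaCHIP working directory detected, program exited!" suggests the PI step stopped early, possibly because Snakemake creates the parent directory of a declared output before the rule runs. The sketch below shows the same lookup with an explicit check; it is a generic illustration with a hypothetical function name and pattern, not a patch to BP.py.

```python
import glob


def first_match(pattern):
    """Return the first path matching pattern, or raise a descriptive error if none match."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("nothing matches " + pattern)
    return matches[0]


if __name__ == "__main__":
    try:
        # Hypothetical pattern; adjust to the real working directory layout.
        print(first_match("M11_MetaCHIP_wd/*_prodigal_output"))
    except FileNotFoundError as err:
        print("PI output incomplete:", err)
```

Checking for the PI outputs this way before calling BP makes the real cause visible instead of a bare index error.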

IndexError: list index out of range when MetaCHIP=1.10.9 & biopython=1.79

Dear Dr. Song (@songweizhi),
I am running MetaCHIP with this command:
MetaCHIP BP -p AN20.MetaCHIP -r g -t 100
My MetaCHIP version is 1.10.9 and my biopython version is 1.79.
But I still get this error:
MetaCHIP BP -p AN20.MetaCHIP -r g -t 100
[2022-04-15 21:26:08] Filtered blastn results detected, filtration step skipped.
[2022-04-15 21:26:08] Detect HGT among genera: input genomes were clustered into 28 genera.
[2022-04-15 21:26:09] Detect HGT among genera: Best-match approach.
[2022-04-15 21:26:09] Detect HGT among genera: get group-to-group identities with 100 cores.
[2022-04-15 21:26:11] Detect HGT among genera: analyzing Blast hits with 100 cores.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 409, in get_HGT_worker
group_pair_iden_cutoff_dict)
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 226, in get_candidates
query_gene_name = query_split[1]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/bin/MetaCHIP", line 169, in
BP(args, MetaCHIP_config.config_dict)
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 2115, in BP
pool.map(get_HGT_worker, list_for_multiple_arguments_get_HGT)
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/ifs1/User/wangc/anaconda3/envs/MetaCHIP3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: list index out of range

Surprisingly, my other treatments run this command successfully; only this one fails.
Please tell me how to solve this problem. I am looking forward to your reply.
Thank you very much!

NA direction

I'm getting some nice data from MetaCHIP, many thanks. I have a question about the direction of gene flow. The majority of my HGT events have NA in the Direction column. Assuming I'm understanding the output correctly, is there something I can do to ensure this column gets populated with a direction for each event?

Gene_1	Gene_2	Gene_1_group	Gene_2_group	Identity	end_match	full_length_match	Direction
genome0001_00345	genome5872_00701	A	I	93.5	no	no	NA
genome0001_00347	genome5872_00699	A	I	91.3	no	no	NA
genome0001_00348	genome5872_00698	A	I	83.9	no	no	NA
genome0001_00349	genome5872_00695	A	I	87.7	no	no	NA
genome0001_00350	genome5872_00693	A	I	90.5	no	no	NA
genome0001_00353	genome5872_00690	A	I	91.7	no	no	NA
genome0001_00354	genome5872_00689	A	I	91.6	no	no	NA
genome0001_00355	genome5872_00688	A	I	93.6	no	no	NA
genome0001_00500	genome5697_03009	A	P	99.6	no	yes	NA
genome0001_00662	genome1082_01212	A	P	77.8	no	no	NA
genome0001_00737	genome1055_00881	A	I	93.0	no	no	NA
genome0001_00739	genome1055_00879	A	I	89.9	no	no	NA
genome0001_00740	genome1055_00878	A	I	88.2	no	no	NA
genome0001_00741	genome1055_00877	A	I	89.0	no	no	NA
genome0001_01112	genome3275_01776	A	D	95.3	no	no	NA
genome0001_01510	genome5872_00789	A	I	92.8	no	no	NA
genome0001_01511	genome5872_00790	A	I	95.6	no	no	NA
genome0002_00074	genome5930_03185	B	E	88.1	no	yes	NA
genome0002_00097	genome6042_00739	B	E	83.9	no	no	NA
genome0002_00242	genome5930_05734	B	E	79.6	no	yes	NA
genome0002_00305	genome0698_00292	B	A	78.8	no	no	NA
genome0002_00357	genome1851_00462	B	C	81.7	no	no	NA
genome0002_00611	genome3472_01563	B	I	75.9	no	no	genome0002-->genome3472
genome0002_00665	genome5207_01979	B	E	82.4	no	no	NA
genome0002_00847	genome5930_00581	B	E	89.5	no	yes	NA
genome0002_00865	genome0191_02082	B	I	74.5	no	no	NA

Thanks for your help
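
To put a number on "the majority", the table above can be summarised with a few lines of Python. This is only a sketch: it assumes the output is tab-separated with the header shown above, and the function name summarize_directions is made up for illustration (it is not part of MetaCHIP).

```python
import csv
import io
from collections import Counter


def summarize_directions(tsv_text):
    """Count HGT rows with a predicted direction versus rows marked NA."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Treat a missing or empty Direction field the same as NA.
        direction = (row.get("Direction") or "NA").strip()
        counts["NA" if direction == "NA" else "directed"] += 1
    return counts


# Two rows copied from the table above, used here as sample input.
sample = (
    "Gene_1\tGene_2\tGene_1_group\tGene_2_group\tIdentity\t"
    "end_match\tfull_length_match\tDirection\n"
    "genome0001_00345\tgenome5872_00701\tA\tI\t93.5\tno\tno\tNA\n"
    "genome0002_00611\tgenome3472_01563\tB\tI\t75.9\tno\tno\t"
    "genome0002-->genome3472\n"
)
print(summarize_directions(sample))
```

For a real run, the text of the detected HGT table would be read from disk and passed in instead of the sample string.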

FileNotFoundError

Command:

MetaCHIP PI -r g -t 8   -x fna -p tmp/metachip     -i genomes  -taxon gtdbtk_summary.tsv

Error:

Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llg/.snakemake/conda/393a800324f63ba3ae6b73503f7745cc/bin/MetaCHIP", line 165, in <module>
    PI(args, MetaCHIP_config.config_dict)
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llg/.snakemake/conda/393a800324f63ba3ae6b73503f7745cc/lib/python3.8/site-packages/MetaCHIP/PI.py", line 651, in PI
    force_create_folder(pwd_log_folder)
  File "/ebio/abt3_projects/software/dev/ll_pipelines/llg/.snakemake/conda/393a800324f63ba3ae6b73503f7745cc/lib/python3.8/site-packages/MetaCHIP/PI.py", line 59, in force_create_folder
    os.mkdir(folder_to_create)
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/metachip_MetaCHIP_wd/tmp/metachip_g_log_files'

Conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
binutils_linux-64         2.36                 hf3e587d_1    conda-forge
biopython                 1.77             py38h1e0a361_1    conda-forge
blast                     2.12.0          pl5262h3289130_0    bioconda
bwidget                   1.9.14                        0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.2               h7f98852_0    conda-forge
ca-certificates           2021.7.5             h06a4308_1
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2021.5.30        py38h578d9bd_0    conda-forge
curl                      7.79.0               hea6ffbf_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dbus                      1.13.18              hb2f20db_0
entrez-direct             15.6                 he881be0_0    bioconda
ete3                      3.1.2              pyh9f0ad1d_0    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
fasttree                  2.1.10               h779adbc_5    bioconda
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
fribidi                   1.0.10               h516909a_0    conda-forge
gcc_impl_linux-64         9.4.0                h03d3576_8    conda-forge
gcc_linux-64              9.4.0                h391b98a_1    conda-forge
gettext                   0.21.0               hf68c758_0
gfortran_impl_linux-64    9.4.0                h0003116_8    conda-forge
gfortran_linux-64         9.4.0                hf0ab688_1    conda-forge
glib                      2.68.4               h9c3ff4c_1    conda-forge
glib-tools                2.68.4               h9c3ff4c_1    conda-forge
graphite2                 1.3.14               h23475e2_0
gsl                       2.7                  he838d99_0    conda-forge
gst-plugins-base          1.18.5               hf529b03_0    conda-forge
gstreamer                 1.18.5               h76c114f_0    conda-forge
gxx_impl_linux-64         9.4.0                h03d3576_8    conda-forge
gxx_linux-64              9.4.0                h0316aca_1    conda-forge
harfbuzz                  2.9.1                h83ec7ef_0    conda-forge
hmmer                     3.3.2                h1b792b2_1    bioconda
icu                       68.1                 h58526e2_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_14    conda-forge
kiwisolver                1.3.2            py38h1fd1430_0    conda-forge
krb5                      1.19.2               hcc1bbae_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libblas                   3.9.0           11_linux64_openblas    conda-forge
libcblas                  3.9.0           11_linux64_openblas    conda-forge
libclang                  11.1.0          default_ha53f305_1    conda-forge
libcurl                   7.79.0               h2574ce0_0    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libedit                   3.1.20210714         h7f8727e_0
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libffi                    3.4.2                h9c3ff4c_1    conda-forge
libgcc-devel_linux-64     9.4.0                hd854feb_8    conda-forge
libgcc-ng                 11.2.0               h1d223b6_8    conda-forge
libgfortran-ng            11.2.0               h69a702a_8    conda-forge
libgfortran5              11.2.0               h5c6108e_8    conda-forge
libglib                   2.68.4               h174f98d_1    conda-forge
libgomp                   11.2.0               h1d223b6_8    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           11_linux64_openblas    conda-forge
libllvm11                 11.1.0               hf817b99_2    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libogg                    1.3.5                h27cfd23_1
libopenblas               0.3.17          pthreads_h8fe5266_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libpq                     13.3                 hd57d9b9_0    conda-forge
libsanitizer              9.4.0                h79bfe98_8    conda-forge
libssh2                   1.10.0               ha56f1ee_0    conda-forge
libstdcxx-devel_linux-64  9.4.0                hd854feb_8    conda-forge
libstdcxx-ng              11.2.0               he4da1e4_8    conda-forge
libtiff                   4.3.0                hf544144_1    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libvorbis                 1.3.7                he1b5a44_0    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.14                 h7b6447c_0
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h72842e0_0    conda-forge
libxslt                   1.1.33               h15afd5d_2    conda-forge
lxml                      4.6.3            py38hf1fe3a4_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
mafft                     7.487                h779adbc_0    bioconda
make                      4.3                  hd18ef5c_1    conda-forge
matplotlib-base           3.4.3            py38hf4fb855_0    conda-forge
metachip                  1.10.5             pyh5e36f6f_0    bioconda
mysql-common              8.0.25               ha770c72_2    conda-forge
mysql-libs                8.0.25               hfa10184_2    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nspr                      4.30                 h9c3ff4c_0    conda-forge
nss                       3.69                 hb5efdd6_0    conda-forge
numpy                     1.21.2           py38he2449b9_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
pango                     1.48.10              hb8ff022_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pcre2                     10.37                h032f7d1_0    conda-forge
perl                      5.26.2            h36c2ea0_1008    conda-forge
perl-archive-tar          2.32                    pl526_0    bioconda
perl-carp                 1.38                    pl526_3    bioconda
perl-common-sense         3.74                    pl526_2    bioconda
perl-compress-raw-bzip2   2.087           pl526he1b5a44_0    bioconda
perl-compress-raw-zlib    2.087           pl526hc9558a2_0    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-exporter-tiny        1.002001                pl526_0    bioconda
perl-extutils-makemaker   7.36                    pl526_1    bioconda
perl-io-compress          2.087           pl526he1b5a44_0    bioconda
perl-io-zlib              1.10                    pl526_2    bioconda
perl-json                 4.02                    pl526_0    bioconda
perl-json-xs              2.34            pl526h6bb024c_3    bioconda
perl-list-moreutils       0.428                   pl526_1    bioconda
perl-list-moreutils-xs    0.428                   pl526_0    bioconda
perl-pathtools            3.75            pl526h14c3975_1    bioconda
perl-scalar-list-utils    1.52            pl526h516909a_0    bioconda
perl-types-serialiser     1.0                     pl526_2    bioconda
perl-xsloader             0.24                    pl526_0    bioconda
pigz                      2.6                  h27826a3_0    conda-forge
pillow                    8.3.2            py38h8e6f84c_0    conda-forge
pip                       21.2.4             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
prodigal                  2.6.3                h779adbc_3    bioconda
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py38h578d9bd_7    conda-forge
pyqt-impl                 5.12.3           py38h7400c14_7    conda-forge
pyqt5-sip                 4.19.18          py38h709712a_7    conda-forge
pyqtchart                 5.12             py38h7400c14_7    conda-forge
pyqtwebengine             5.12.1           py38h7400c14_7    conda-forge
python                    3.8.12          hb7a2778_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
qt                        5.12.9               hda022c4_4    conda-forge
r-ape                     5.5               r41h306847c_0    conda-forge
r-base                    4.1.1                hb93adac_1    conda-forge
r-circlize                0.4.13            r41hc72bb7e_0    conda-forge
r-colorspace              2.0_2             r41hcfec24a_0    conda-forge
r-getopt                  1.20.3            r41ha770c72_2    conda-forge
r-globaloptions           0.1.2             r41ha770c72_0    conda-forge
r-lattice                 0.20_44           r41hcfec24a_0    conda-forge
r-nlme                    3.1_153           r41h859d828_0    conda-forge
r-optparse                1.6.6             r41hc72bb7e_1    conda-forge
r-rcpp                    1.0.7             r41h03ef668_0    conda-forge
r-shape                   1.4.6             r41ha770c72_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
reportlab                 3.5.68           py38hadf75a6_0    conda-forge
scipy                     1.7.1            py38h56a6a73_0    conda-forge
sed                       4.8                  he412f7d_0    conda-forge
setuptools                58.0.4           py38h578d9bd_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.36.0               h9cd32fc_1    conda-forge
sysroot_linux-64          2.12                he073ed8_14    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
tktable                   2.10                 hb7b940f_3    conda-forge
tornado                   6.1              py38h497a2fe_1    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge

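Changing os.mkdir() to os.makedirs() in force_create_folder should fix the issue, since os.mkdir() fails when a parent directory is missing. The current code is: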

def force_create_folder(folder_to_create):
    if os.path.isdir(folder_to_create):
        shutil.rmtree(folder_to_create, ignore_errors=True)
        if os.path.isdir(folder_to_create):
            shutil.rmtree(folder_to_create, ignore_errors=True)
            if os.path.isdir(folder_to_create):
                shutil.rmtree(folder_to_create, ignore_errors=True)
                if os.path.isdir(folder_to_create):
                    shutil.rmtree(folder_to_create, ignore_errors=True)
    os.mkdir(folder_to_create)

By the way, what's up with the repeated rmtree calls? Are you running into a latency issue? If so, it might be better to use a while loop that sleeps and then rechecks os.path.isdir.
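
Something along these lines is what I have in mind. It is only a sketch of the suggestion, not tested inside MetaCHIP; the retries and delay parameters are my own additions and their defaults are arbitrary.

```python
import os
import shutil
import time


def force_create_folder(folder_to_create, retries=5, delay=0.5):
    """Remove the folder if it exists, then recreate it with missing parents."""
    # Retry the removal a bounded number of times instead of nesting if-blocks.
    for _ in range(retries):
        if not os.path.isdir(folder_to_create):
            break
        shutil.rmtree(folder_to_create, ignore_errors=True)
        if os.path.isdir(folder_to_create):
            # The folder is still visible (e.g. slow network file system): wait.
            time.sleep(delay)
    # makedirs also creates intermediate directories, unlike os.mkdir.
    os.makedirs(folder_to_create, exist_ok=True)
```

With os.makedirs, a nested prefix such as -p tmp/metachip no longer fails when the intermediate tmp directory does not exist yet. exist_ok=True keeps the call from raising if the removal never fully succeeded.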

Missing ')' in input tree expression

Thanks for the nice tool! While running MetaCHIP BP, I came across quite a few error messages reading: ERROR: missing ')' in input tree expression line x column y. The MetaCHIP PI module ran well. Is this normal? Should I trust the final output? Thanks!

About the interpretation of the results

Hello! I have a question about the biological interpretation of horizontal gene transfer. I used MetaCHIP to identify genes that were horizontally transferred between MAGs, then annotated the resulting genes against a database with DIAMOND blastp. What puzzles me is that gene A (carbon fixation, Mcr) showed a high rate of horizontal gene transfer between members of the same class, while gene A was not detected in the MAGs. I don't know whether this is a bug in MetaCHIP or something else. Is my result reliable?
Thanks!

Index out of range error in MetaCHIP BP

I am trying to run MetaCHIP on 42 genomes. I am able to run the PI portion of the pipeline using either GTDB-Tk or a custom grouping. However, the second step fails with an index out of range error. Here is the full output:

$ MetaCHIP BP -p taxonomy -r g -t 48
[2020-06-22 20:32:21] Found grouping file taxonomy_g9_grouping.txt, input genomes were clustered into 9 groups
[2020-06-22 20:32:21] Filtering blast matches with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%
[2020-06-22 20:32:22] Combining filtered blastn results
[2020-06-22 20:32:22] Get group-to-group identities with 48 cores
[2020-06-22 20:32:30] Plotting identity distribution between each pair of groups
[2020-06-22 20:32:33] Analyzing Blast hits to get HGT candidates with 48 cores
[2020-06-22 20:32:35] Plotting flanking regions with 48 cores
[2020-06-22 20:49:52] Extracting nc sequences for BM predicted HGTs
[2020-06-22 20:49:54] Deleting temporary files
[2020-06-22 20:49:55] Done for Best-match approach!
[2020-06-22 20:49:55] Found grouping file taxonomy_g9_grouping.txt, input genomes were clustered into 9 groups
[2020-06-22 20:49:55] Get gene/genome member in gene/species tree for each BM predicted HGT
[2020-06-22 20:49:55] Prepare subset of taxonomy_all_combined_faa.fasta for building gene tree
[2020-06-22 20:49:57] Get species/gene tree for 3873 BM approach identified HGTs with 48 cores
[2020-06-22 20:51:51] Running Ranger-DTL2 with dated mode
[2020-06-22 20:51:54] Parsing Ranger prediction results
[2020-06-22 20:51:54] Add Ranger-DTL predicted direction to HGT_candidates.txt
[2020-06-22 20:51:54] Deleting temporary files
[2020-06-22 20:51:55] Done for Phylogenetic approach!
Traceback (most recent call last):
  File "/opt/modules/pkgs/anaconda/4.8/envs/metachip/bin/MetaCHIP", line 259, in <module>
    combine_multiple_level_predictions(args, config_dict)
  File "/opt/modules/pkgs/anaconda/4.8/envs/metachip/lib/python3.7/site-packages/MetaCHIP/BP.py", line 3502, in combine_multiple_level_predictions
    Get_circlize_plot(multi_level_detection, output_prefix, pwd_detected_HGT_txt, genome_to_taxon_dict, circos_HGT_R, pwd_plot_circos, detection_rank_list, taxon_rank_num, pwd_MetaCHIP_op_folder)
  File "/opt/modules/pkgs/anaconda/4.8/envs/metachip/lib/python3.7/site-packages/MetaCHIP/BP.py", line 3199, in Get_circlize_plot
    value = each_3_split[2]
IndexError: list index out of range

There are some plots in the output, but the *_detected_HGTs.txt file only has the headers.

Singularity use

Hello,

I installed MetaCHIP in a Singularity container. It works well, but I am not able to write output:

Detect HGT among phyla: Best-match approach.
[2021-07-28 13:48:28] Detect HGT among phyla: get group-to-group identities with 20 cores.
sort: cannot create temporary file in '/tmpscratch/lcornet/1278701': No such file or directory

This is because Singularity on the HPC cluster doesn't allow writing in the current directory. It must be launched with a command like:
singularity exec --bind /scratch/ulg/bioec/lcornet/Metachip:/mnt MetaCHIP.img MetaCHIP

So everything is written to /mnt and not to /tmpscratch/lcornet/. Is it possible to add an option for the output directory?

Best regards,
Luc
