bhattlab / mgefinder
A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
License: MIT License
Hello, I keep getting "MissingInputException" and "Missing input files for rule pair" when I run mgefinder workflow denovo; however, all files are in the correct directories. I have deleted and recreated the files with no luck. I have run MGEfinder previously without this error.
The script file is: mgefinderworkflow.sh
The time limit is 150:00:00 HH:MM:SS.
The target directory is: /scratch/aubksw/MGEfinder/MGEfinder/cluster4
The working directory is: /scratch-local/aubksw.mgefinderworkflows.601211
The memory limit is: 8gb
The job will start running after: 2021-06-12T13:43:22
Job Name: mgefinderworkflows
Virtual queue: medium
QOS: --qos=medium
Constraints: --constraint=dmc
Using 6 cores on master node dmc19
Node list: dmc19
Nodes: dmc19 dmc19 dmc19 dmc19 dmc19 dmc19
Command typed:
/apps/scripts/run_script mgefinderworkflow.sh
Queue submit command:
sbatch --qos=medium -J mgefinderworkflows --begin=2021-06-12T13:43:22 --requeue --mail-user=[email protected] -o mgefinderworkflows.o601211 -t 150:00:00$
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Get help documentation with --help.
Get version with --version.
command: workflow
workdir: /scratch/aubksw/MGEfinder/MGEfinder/cluster4
cores: 1
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
sensitive: False
####################
MissingInputException in line 93 of /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing input files for rule pair:
/scratch/aubksw/MGEfinder/MGEfinder/cluster4/00.bam/Xsp60.XretroflexusSp953.bam
COMMAND: snakemake -s /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile --config wd=/scratch$
Traceback (most recent call last):
File "/home/aubksw/anaconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 78, in denovo
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 25, in _workflow
shell(cmd)
File "/home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original$
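One way to narrow this down is to compare the BAM names the workflow expects against what is actually on disk. The sketch below assumes the layout visible in the error message and in other reports on this page: `00.assembly/<sample>.fna`, `00.genome/<genome>.fna`, and `00.bam/<sample>.<genome>.bam`. The function name `missing_bams` is hypothetical and is not part of mgefinder, and pairing every sample with every genome is an assumption.

```python
from pathlib import Path


def missing_bams(workdir):
    """Return expected 00.bam/<sample>.<genome>.bam paths that do not exist.

    Assumption (not taken from the mgefinder documentation): every sample in
    00.assembly is paired with every genome in 00.genome.
    """
    workdir = Path(workdir)
    # "*.fna" matches only files ending in .fna, so bowtie2 index files
    # such as sample.fna.1.bt2 are ignored.
    samples = sorted(p.stem for p in (workdir / "00.assembly").glob("*.fna"))
    genomes = sorted(p.stem for p in (workdir / "00.genome").glob("*.fna"))

    missing = []
    for sample in samples:
        for genome in genomes:
            bam = workdir / "00.bam" / f"{sample}.{genome}.bam"
            if not bam.is_file():
                missing.append(bam)
    return missing
```

Calling `missing_bams("/scratch/aubksw/MGEfinder/MGEfinder/cluster4")` would list any BAM whose name does not match the sample and genome file names exactly. A mismatch in capitalization or an extra dot in a name would show up here.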
Hi!
I get the following errors when I run mgefinder workflow on my data sets. How can I resolve this issue?
Finished job 16.
7 of 16 steps (44%) done
rule make_database:
input: ./01.mgefinder/NC_000962_3/NC_000962_3.all_inferseq.txt
output: ./02.database/NC_000962_3/NC_000962_3.database.fna, ./02.database/NC_000962_3/NC_000962_3.database.fna.1.bt2
jobid: 13
benchmark: ./02.database/NC_000962_3/NC_000962_3.database.benchmark.txt
wildcards: genome=NC_000962_3
#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
#### PARAMETERS ####
command: makedatabase
inferseqfiles: ('./01.mgefinder/NC_000962_3/NC_000962_3.all_inferseq.txt',)
minimum_size: 30
maximum_size: 200000
threads: 1
memory: 16000
force: True
output_dir: ./02.database/NC_000962_3
prefix: NC_000962_3.database
####################
Parsing inferseq files
Combining the inferseq files...
Loading file 1/3: ./01.mgefinder/NC_000962_3/JPN-B2019-Rv-1224_S1_L001/03.inferseq_assembly.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
Loading file 2/3: ./01.mgefinder/NC_000962_3/JPN-B2019-Rv-1224_S1_L001/03.inferseq_reference.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
Loading file 3/3: ./01.mgefinder/NC_000962_3/JPN-B2019-Rv-1224_S1_L001/03.inferseq_overlap.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
Deleting old database directory...
No termini found in the input file...
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files ./02.database/NC_000962_3/NC_000962_3.database.fna, ./02.database/NC_000962_3/NC_000962_3.database.fna.1.bt2.
MissingOutputException in line 192 of /home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile:
Missing files after 5 seconds:
./02.database/NC_000962_3/NC_000962_3.database.fna
./02.database/NC_000962_3/NC_000962_3.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/home/user/anaconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 51, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 26, in _workflow
shell(cmd)
File "/home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile --config wd=. memory=16000 --cores 1 --configfile /home/user/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/config.yml ' returned non-zero exit status 1.
.
├── 00.assembly
│ ├── JPN-B2019-Rv-1224_S1_L001.fna
│ ├── JPN-B2019-Rv-1224_S1_L001.fna.1.bt2
│ ├── JPN-B2019-Rv-1224_S1_L001.fna.2.bt2
│ ├── JPN-B2019-Rv-1224_S1_L001.fna.3.bt2
│ ├── JPN-B2019-Rv-1224_S1_L001.fna.4.bt2
│ ├── JPN-B2019-Rv-1224_S1_L001.fna.rev.1.bt2
│ └── JPN-B2019-Rv-1224_S1_L001.fna.rev.2.bt2
├── 00.bam
│ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.bam
│ └── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.bam.bai
├── 00.genome
│ ├── NC_000962_3.fna
│ ├── NC_000962_3.fna.1.bt2
│ ├── NC_000962_3.fna.2.bt2
│ ├── NC_000962_3.fna.3.bt2
│ ├── NC_000962_3.fna.4.bt2
│ ├── NC_000962_3.fna.amb
│ ├── NC_000962_3.fna.ann
│ ├── NC_000962_3.fna.bwt
│ ├── NC_000962_3.fna.pac
│ ├── NC_000962_3.fna.rev.1.bt2
│ ├── NC_000962_3.fna.rev.2.bt2
│ ├── NC_000962_3.fna.sa
│ └── log
│ ├── NC_000962_3.index_bowtie2.benchmark.txt
│ ├── NC_000962_3.index_bowtie2.log
│ └── NC_000962_3.index_bowtie2.log.err
├── 01.mgefinder
│ └── NC_000962_3
│ ├── JPN-B2019-Rv-1224_S1_L001
│ │ ├── 01.find.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
│ │ ├── 02.pair.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
│ │ ├── 03.inferseq_assembly.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
│ │ ├── 03.inferseq_overlap.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
│ │ ├── 03.inferseq_reference.JPN-B2019-Rv-1224_S1_L001.NC_000962_3.tsv
│ │ └── log
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.benchmark.txt
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.find.benchmark.txt
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.find.log
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.find.log.err
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_assembly.benchmark.txt
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_assembly.log
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_assembly.log.err
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_overlap.log
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_overlap.log.err
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_reference.benchmark.txt
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_reference.log
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.inferseq_reference.log.err
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.pair.benchmark.txt
│ │ ├── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.pair.log
│ │ └── JPN-B2019-Rv-1224_S1_L001.NC_000962_3.pair.log.err
│ └── NC_000962_3.all_inferseq.txt
└── 02.database
└── NC_000962_3
└── NC_000962_3.database.benchmark.txt
I confirmed that both generating the test dataset and analyzing it with mgefinder workflow work fine.
Thank you in advance.
Yosm
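The log line `No termini found in the input file...` suggests that make_database may have received inferseq tables without any usable rows, in which case no database files would be written and Snakemake would then report them as missing. A quick check is to count the data rows in each `03.inferseq_*.tsv` file for the sample. The sketch below assumes each file is tab-separated with a single header line; `count_data_rows` and `summarize` are hypothetical helpers, not mgefinder functions.

```python
import csv
from pathlib import Path


def count_data_rows(tsv_path):
    """Count non-blank rows after the first (assumed header) line of a TSV."""
    with open(tsv_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    # rows[1:] skips the assumed header; blank lines are not counted.
    return sum(1 for row in rows[1:] if any(cell.strip() for cell in row))


def summarize(sample_dir):
    """Map each 03.inferseq_*.tsv file name in sample_dir to its row count."""
    files = sorted(Path(sample_dir).glob("03.inferseq_*.tsv"))
    return {path.name: count_data_rows(path) for path in files}
```

If `summarize("./01.mgefinder/NC_000962_3/JPN-B2019-Rv-1224_S1_L001")` reports zero for all three files, the empty database would be a consequence of nothing being inferred upstream rather than of filesystem latency.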
In our data set there seems to be some filtering between the 01.clusterseq.*.tsv and the *.all_seqs.fna files. Even after the 01.clusterseq.*.tsv file was filtered to remove duplicates based on the sequence inference method, it contains far more MGE sequences than the *.all_seqs.fna file. Could you point me to where the filtering used to create the *.all_seqs.fna file is described? My main concern is that the 01.clusterseq.*.tsv file contains a high number of potentially false-positive MGE sequences.
Thanks in advance for your help!
Hello! I have an error while running workflow denovo. I had the same problem with my samples and with your test directory.
Finished job 3.
75 of 79 steps (95%) done
rule genotype:
input: test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv, test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt
output: test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv
log: test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log
jobid: 4
benchmark: test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.benchmark.txt
wildcards: genome=efae_GCF_900639545
zsh:2: = not found
Error in job genotype while creating output file test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv.
RuleException:
CalledProcessError in line 286 of /home/mk/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command '
if [ "True" == "True" ]; then
mgefinder genotype --filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
else
mgefinder genotype --no-filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
fi
' returned non-zero exit status 1.
File "/home/mk/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile", line 286, in __rule_genotype
File "/home/mk/miniconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
The test_workdir/03.results/efae_GCF_900639545/log directory is empty.
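The line `zsh:2: = not found` hints that the rule's shell block was interpreted by zsh rather than bash. In zsh, an unquoted `==` inside `[ ... ]` is subject to `=` command expansion, which produces this kind of message, whereas a single `=` is the POSIX comparison operator. The sketch below is only a diagnostic under that assumption. `bracket_test_ok` is a hypothetical helper, and it does not change which shell mgefinder or Snakemake use.

```python
import shutil
import subprocess


def bracket_test_ok(shell, operator):
    """Run `[ "True" <operator> "True" ]` under `shell`.

    Returns True if the test exits with status 0, False if it fails,
    and None if the shell is not installed.
    """
    exe = shutil.which(shell)
    if exe is None:
        return None
    script = f'[ "True" {operator} "True" ]'
    result = subprocess.run(
        [exe, "-c", script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Comparing `bracket_test_ok("zsh", "==")` with `bracket_test_ok("bash", "==")` on the affected machine would show whether the shell is the difference. If it is, making sure the workflow's commands run under bash is one direction to explore.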
Hi,
I want to detect MGEs (mobile genetic elements) in contigs, but I don't know whether MGEfinder can do this.
Hi, durrantmm.
Could you tell us what IAwoC and ArSC mean?
I found four labels (IAwFC, IAwoC, IDB, ArSC) used to specify the confidence level of the identified insertion sequence in the file 02.genotype.*.tsv.
The user manual explains IAwFC and IDB, but I could not find the other two.
My purpose is strain genotyping based on polymorphism in the insertion positions of MGEs, using resequencing data.
Additionally, if possible, could you recommend any tools for strain genotyping based on 02.genotype.*.tsv? I especially want to know which strains belong to which cluster, where a cluster consists of strains harboring an identical MGE profile.
Many thanks for your kind support.
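For grouping strains by identical MGE profile, one simple approach is to build the set of insertions seen in each strain and then group strains whose sets match exactly. The sketch below is illustrative only: the records are made-up `(strain, insertion)` pairs, and how to derive them from the columns of the genotype table is left as an assumption, since that file's layout is not shown here. `group_by_profile` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_profile(records):
    """Group strains that carry exactly the same set of insertions.

    `records` is an iterable of (strain, insertion) pairs, where `insertion`
    is any hashable label, for example an element name plus its position.
    Strains with no records at all do not appear in the result.
    """
    profiles = defaultdict(set)
    for strain, insertion in records:
        profiles[strain].add(insertion)

    groups = defaultdict(list)
    for strain, insertions in profiles.items():
        # frozenset makes the profile hashable and order-independent.
        groups[frozenset(insertions)].append(strain)

    return sorted(sorted(members) for members in groups.values())


example = [
    ("A", "IS1@100"), ("A", "IS2@500"),
    ("B", "IS2@500"), ("B", "IS1@100"),
    ("C", "IS1@100"),
]
print(group_by_profile(example))  # [['A', 'B'], ['C']]
```

Exact matching is strict: a single missed call splits a group, so a distance-based clustering on the presence/absence matrix may be more forgiving for noisy data.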
Hello, this is a very meaningful tool. I have two questions to ask:
2. I encountered the following problem when using Trim Galore to process my data. How should I handle it?
(trim-galore) [KXY@zju 673]$ trim_galore --fastqc --paired 673.nodup_R1.fastq.gz 673.nodup_R2.fastq.gz --cores 8
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 1.18
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
pigz 2.6
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores
Proceeding with 'pigz -p 4' for decompression
To decrease CPU usage of decompression, please install 'igzip' and run again
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> 673.nodup_R1.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 905 AGATCGGAAGAGC 1000000 0.09
Nextera 1 CTGTCTCTTATA 1000000 0.00
smallRNA 0 TGGAATTCTCGG 1000000 0.00
Using Illumina adapter for trimming (count: 905). Second best hit was Nextera (count: 1)
Writing report to '673.nodup_R1.fastq.gz_trimming_report.txt'
Input filename: 673.nodup_R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 1.18
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed
Cutadapt seems to be reasonably up-to-date. Setting -j 8
Writing final adapter and quality trimmed output to 673.nodup_R1_trimmed.fq.gz
Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file 673.nodup_R1.fastq.gz <<<
This is cutadapt 1.18 with Python 3.7.12
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC 673.nodup_R1.fastq.gz
Processing reads on 8 cores in single-end mode ...
ERROR: Traceback (most recent call last):
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/cutadapt/pipeline.py", line 412, in reader_process
pipe.send_bytes(chunk)
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/xopen/__init__.py", line 88, in __exit__
self.close()
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/xopen/__init__.py", line 215, in close
self._raise_if_error()
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/xopen/__init__.py", line 231, in _raise_if_error
raise IOError(message)
OSError
ERROR: Traceback (most recent call last):
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/cutadapt/pipeline.py", line 486, in run
(n, bp1, bp2) = self._pipeline.process_reads()
File "/data/users/KXY/miniconda3/envs/trim-galore/lib/python3.7/site-packages/cutadapt/pipeline.py", line 230, in process_reads
for read in self._reader:
File "src/cutadapt/_seqio.pyx", line 176, in iter
cutadapt.seqio.FormatError: FASTQ file ended prematurely
cutadapt: error: FASTQ file ended prematurely
Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
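The final cutadapt message, `FASTQ file ended prematurely`, points to input that stops in the middle of a record, which is what a truncated or corrupted `.fastq.gz` would produce. A minimal check, assuming a gzip-compressed FASTQ with four lines per record, is to read the file to the end and confirm that the line count is a multiple of four. `check_fastq_gz` is a hypothetical helper and is not part of Trim Galore or cutadapt.

```python
import gzip
import zlib


def check_fastq_gz(path):
    """Return (ok, message) for a gzip-compressed, 4-line-per-record FASTQ."""
    lines = 0
    try:
        # Binary mode avoids text-decoding errors; only line counts matter.
        with gzip.open(path, "rb") as handle:
            for _ in handle:
                lines += 1
    except (EOFError, OSError, zlib.error) as exc:
        # A truncated gzip stream typically raises EOFError here.
        return False, f"could not read to end of file after {lines} lines: {exc}"

    if lines % 4 != 0:
        return False, f"{lines} lines is not a multiple of 4; last record is incomplete"
    return True, f"{lines // 4} complete records"
```

Running this on both `673.nodup_R1.fastq.gz` and `673.nodup_R2.fastq.gz` would show whether either file is damaged. If one is, re-downloading it or regenerating it from the upstream step would be the place to start.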
Hi all
I am trying to run the MGEfinder test files, but I get this error:
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
:
:
Error in job genotype while creating output file
test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv.
RuleException:
CalledProcessError in line 286 of /Users/mo/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command '
if [ "True" == "True" ]; then
mgefinder genotype --filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
else
mgefinder genotype --no-filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
fi
' returned non-zero exit status 1.
File "/Users/mo/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile", line 286, in __rule_genotype
File "/Users/mo/miniconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
:
:
subprocess.CalledProcessError: Command 'snakemake -s /Users/mo/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile --config wd=test_workdir/ memory=16000 --cores 1 --configfile /Users/mo/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.config.yml ' returned non-zero exit status 1.
Hi,
On many HPC clusters, conda is not supported for several reasons, and singularity is the last resort if nothing else works.
Would you mind at least providing a list of the required dependencies for MGEfinder on the wiki of this GitHub repository?
Then people could decide whether to use conda, singularity, or to provide the dependencies in a different way. I am not asking you to actively support this way of installing the software, but please at least provide the list of dependencies.
Best regards
Sam
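The dependency check that mgefinder prints at startup, quoted in several reports on this page, already names five external tools and the versions it expects. The sketch below collects them and checks whether each executable is on `PATH`. The executable names are assumptions based on the tool names in that log, and this is not a complete dependency list: Python packages such as click, and any other tools the workflow calls, are not covered.

```python
import shutil

# Versions copied from the "Expected version of ..." lines in the logs above.
EXPECTED_VERSIONS = {
    "snakemake": "3.13.3",
    "einverted": "EMBOSS:6.6.0.0",
    "bowtie2": "2.3.5",
    "samtools": "1.9",
    "cd-hit": "4.8.1",
}


def locate_tools(which=shutil.which):
    """Map each expected tool to its path on PATH, or None if not found.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return {name: which(name) for name in EXPECTED_VERSIONS}


def report(found):
    """Format one line per tool with its expected version and location."""
    lines = []
    for name, version in EXPECTED_VERSIONS.items():
        location = found.get(name) or "NOT FOUND"
        lines.append(f"{name} (expected {version}): {location}")
    return "\n".join(lines)
```

Printing `report(locate_tools())` on a cluster login node gives a quick view of which of these tools would still need to be supplied by modules or a container.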
I'm having a locale-related problem with Click (I think). I'm not sure how to remedy this; setting LANG to en_US.UTF-8 didn't seem to work.
I installed mgefinder using the conda instructions (install.sh).
I'm guessing there's something weird about my environment, but I'm not sure where to start.
$ mgefinder --help
#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Traceback (most recent call last):
File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/home/aprasad/mydata/PD-3134/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/home/aprasad/mydata/PD-3134/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/home/aprasad/mydata/PD-3134/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 696, in main
_verify_python3_env()
File "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/home/aprasad/mydata/PD-3134/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/_unicodefun.py", line 124, in _verify_python3_env
' mitigation steps.' + extra
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/en/7.x/python3/ for mitigation steps.
This system lists a couple of UTF-8 supporting locales that
you can pick from. The following suitable locales were
discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, af_ZA.utf8, am_ET.utf8, an_ES.utf8, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN.utf8, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SY.utf8, ar_TN.utf8, ar_YE.utf8, as_IN.utf8, ast_ES.utf8, ayc_PE.utf8, az_AZ.utf8, be_BY.utf8, bem_ZM.utf8, ber_DZ.utf8, ber_MA.utf8, bg_BG.utf8, bho_IN.utf8, bn_BD.utf8, bn_IN.utf8, bo_CN.utf8, bo_IN.utf8, br_FR.utf8, brx_IN.utf8, bs_BA.utf8, byn_ER.utf8, ca_AD.utf8, ca_ES.utf8, ca_FR.utf8, ca_IT.utf8, crh_UA.utf8, cs_CZ.utf8, csb_PL.utf8, cv_RU.utf8, cy_GB.utf8, da_DK.utf8, de_AT.utf8, de_BE.utf8, de_CH.utf8, de_DE.utf8, de_LU.utf8, doi_IN.utf8, dv_MV.utf8, dz_BT.utf8, el_CY.utf8, el_GR.utf8, en_AG.utf8, en_AU.utf8, en_BW.utf8, en_CA.utf8, en_DK.utf8, en_GB.utf8, en_HK.utf8, en_IE.utf8, en_IN.utf8, en_NG.utf8, en_NZ.utf8, en_PH.utf8, en_SG.utf8, en_US.utf8, en_ZA.utf8, en_ZM.utf8, en_ZW.utf8, es_AR.utf8, es_BO.utf8, es_CL.utf8, es_CO.utf8, es_CR.utf8, es_CU.utf8, es_DO.utf8, es_EC.utf8, es_ES.utf8, es_GT.utf8, es_HN.utf8, es_MX.utf8, es_NI.utf8, es_PA.utf8, es_PE.utf8, es_PR.utf8, es_PY.utf8, es_SV.utf8, es_US.utf8, es_UY.utf8, es_VE.utf8, et_EE.utf8, eu_ES.utf8, fa_IR.utf8, ff_SN.utf8, fi_FI.utf8, fil_PH.utf8, fo_FO.utf8, fr_BE.utf8, fr_CA.utf8, fr_CH.utf8, fr_FR.utf8, fr_LU.utf8, fur_IT.utf8, fy_DE.utf8, fy_NL.utf8, ga_IE.utf8, gd_GB.utf8, gez_ER.utf8, gez_ET.utf8, gl_ES.utf8, gu_IN.utf8, gv_GB.utf8, ha_NG.utf8, he_IL.utf8, hi_IN.utf8, hne_IN.utf8, hr_HR.utf8, hsb_DE.utf8, ht_HT.utf8, hu_HU.utf8, hy_AM.utf8, ia_FR.utf8, id_ID.utf8, ig_NG.utf8, ik_CA.utf8, is_IS.utf8, it_CH.utf8, it_IT.utf8, iu_CA.utf8, iw_IL.utf8, ja_JP.utf8, ka_GE.utf8, kk_KZ.utf8, kl_GL.utf8, km_KH.utf8, kn_IN.utf8, ko_KR.utf8, kok_IN.utf8, ks_IN.utf8, ku_TR.utf8, kw_GB.utf8, ky_KG.utf8, lb_LU.utf8, lg_UG.utf8, li_BE.utf8, li_NL.utf8, lij_IT.utf8, lo_LA.utf8, lt_LT.utf8, lv_LV.utf8, mag_IN.utf8, mai_IN.utf8, mg_MG.utf8, 
mhr_RU.utf8, mi_NZ.utf8, mk_MK.utf8, ml_IN.utf8, mn_MN.utf8, mni_IN.utf8, mr_IN.utf8, ms_MY.utf8, mt_MT.utf8, my_MM.utf8, nb_NO.utf8, nds_DE.utf8, nds_NL.utf8, ne_NP.utf8, nhn_MX.utf8, niu_NU.utf8, niu_NZ.utf8, nl_AW.utf8, nl_BE.utf8, nl_NL.utf8, nn_NO.utf8, nr_ZA.utf8, nso_ZA.utf8, oc_FR.utf8, om_ET.utf8, om_KE.utf8, or_IN.utf8, os_RU.utf8, pa_IN.utf8, pa_PK.utf8, pap_AN.utf8, pl_PL.utf8, ps_AF.utf8, pt_BR.utf8, pt_PT.utf8, ro_RO.utf8, ru_RU.utf8, ru_UA.utf8, rw_RW.utf8, sa_IN.utf8, sat_IN.utf8, sc_IT.utf8, sd_IN.utf8, se_NO.utf8, shs_CA.utf8, si_LK.utf8, sid_ET.utf8, sk_SK.utf8, sl_SI.utf8, so_DJ.utf8, so_ET.utf8, so_KE.utf8, so_SO.utf8, sq_AL.utf8, sq_MK.utf8, sr_ME.utf8, sr_RS.utf8, ss_ZA.utf8, st_ZA.utf8, sv_FI.utf8, sv_SE.utf8, sw_KE.utf8, sw_TZ.utf8, szl_PL.utf8, ta_IN.utf8, ta_LK.utf8, te_IN.utf8, tg_TJ.utf8, th_TH.utf8, ti_ER.utf8, ti_ET.utf8, tig_ER.utf8, tk_TM.utf8, tl_PH.utf8, tn_ZA.utf8, tr_CY.utf8, tr_TR.utf8, ts_ZA.utf8, tt_RU.utf8, ug_CN.utf8, uk_UA.utf8, unm_US.utf8, ur_IN.utf8, ur_PK.utf8, ve_ZA.utf8, vi_VN.utf8, wa_BE.utf8, wae_CH.utf8, wal_ET.utf8, wo_SN.utf8, xh_ZA.utf8, yi_US.utf8, yo_NG.utf8, yue_HK.utf8, zh_CN.utf8, zh_HK.utf8, zh_SG.utf8, zh_TW.utf8, zu_ZA.utf8
$ echo $LANG
en_US.UTF-8
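One thing worth checking here is whether another locale variable overrides `LANG`. Under POSIX rules, `LC_ALL` takes precedence over `LC_CTYPE`, which takes precedence over `LANG`, so an `LC_ALL=C` inherited from the login environment would leave Python on ASCII even with `LANG=en_US.UTF-8`. The sketch below shows which variable wins. `effective_ctype` is a hypothetical helper, and whether this is the cause on this particular system is an assumption.

```python
import os


def effective_ctype(env=None):
    """Return (variable_name, value) of the locale setting that takes effect.

    Follows the POSIX precedence LC_ALL > LC_CTYPE > LANG. Empty values are
    treated as unset. Falls back to (None, "C") when nothing is set.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None, "C"
```

If `effective_ctype()` reports `LC_ALL` or `LC_CTYPE` with a non-UTF-8 value, exporting both `LC_ALL` and `LANG` to a UTF-8 locale, as the Click page linked in the error message suggests, is the usual next step.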
With output like this, does it mean that the contents of my .fna file do not fit the workflow? I was able to finish running the 2.4G files used in the tutorial.
My workdir looks like this:
----BJ22012
----00.assembly
----00.bam
----00.genome
The bam and bam.bai files were created as the tutorial showed, but the assembly and genome files came from an assembly made with an unknown approach, and I am not sure whether they fit the workflow. If the workflow needs a specific format for the file contents, please give a short sample. Thanks to anyone who can help.
Hello, I have tried running mgefinder on multiple data sets. Some have worked and some have not. My latest run produced this:
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Get help documentation with --help.
Get version with --version.
command: workflow
workdir: /scratch/aubksw/MGEfinder/MGEfinder/cluster5
cores: 1
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
sensitive: True
####################
COMMAND: snakemake -s /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.sensitive.Snakefile --config wd=/scratch/aubksw/MGEfinder/MGEfinder/cluster5 memory=16000 --cores 1 --configfile /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.sensitive.config.yml
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1
rule all:
jobid: 0
Finished job 0.
1 of 1 steps (100%) done
There were no output files created and no error messages so I am not sure what went wrong. Please let me know if there is a solution. Thank you
I am trying to download and activate mgefinder on a Mac running macOS Ventura 13.2.1 with an Apple M2 Max chip.
I receive this error every time I run the install script. Even after adding conda-forge and other channels to my conda configuration, the same error appears. I am new to coding, so this may be an easy fix, but I have not been able to find a workaround. Thanks!
"(base) mafuller@JV25QX0JKV MGEfinder % bash install.sh
Removing mgefinder environment if already installed...
Installing mgefinder environment...
Channels:
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
Installation Complete.
Before running mgefinder, activate the mgefinder environment with
conda activate mgefinder
You can then run mgefinder by typing
mgefinder [command]
Send any questions to [email protected]"
mgefinder is giving me an error code I don't understand. As far as I can tell, the input files follow the specifications and everything should be up to date, having been loaded or updated in the last week. The entire screen dump is in the attached file.
mge_screen-out.txt
Pertinent part would seem to be:
click.echo('Loading file {num1}/{num2}: {f}'.format(num1=1, num2=len(inferseq_files), f=inferseq_files[0]))
IndexError: list index out of range
Error in job clusterseq while creating output file /home/rick/Documents/Campy-mge-test/03.results/AL111168/01.clusterseq.AL111168.tsv.
RuleException:
CalledProcessError in line 259 of /home/rick/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command '
mgefinder clusterseq -minsize 70 -maxsize 200000 --threads 1 --memory 50000 /home/rick/Documents/Campy-mge-test/01.mgefinder/AL111168/AL111168.all_inferseq_database.txt -o /home/rick/Documents/Campy-mge-test/03.results/AL111168/01.clusterseq.AL111168.tsv
' returned non-zero exit status 1.
File "/home/rick/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile", line 259, in __rule_clusterseq
File "/home/rick/anaconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Thank you for any help,
Rick
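The `IndexError` is raised at `inferseq_files[0]`, which means the list of inferseq files was empty when clusterseq tried to load it. A minimal sketch for inspecting the list file follows. It assumes that `AL111168.all_inferseq_database.txt` holds one file path per line, which is an assumption about the format rather than something confirmed here, and both function names are hypothetical.

```python
from pathlib import Path


def read_file_list(list_path):
    """Return the non-blank lines of a text file, assumed to hold one path per line."""
    text = Path(list_path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def describe_file_list(list_path):
    """Summarize how many listed paths exist; an empty list explains an IndexError."""
    paths = read_file_list(list_path)
    if not paths:
        return "list file is empty, so indexing its first entry would fail"
    missing = [p for p in paths if not Path(p).exists()]
    return f"{len(paths)} path(s) listed, {len(missing)} missing on disk"
```

If the list is empty, the place to look is the earlier steps that were supposed to populate it, not clusterseq itself.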
Hi,
Thank you for making this great tool.
I finally got the 03.results folder. But when I checked the number of unique clusters in "01.clusterseq.GCA_000210735.tsv", I found that it is not the same as the number of clusters in 03.summarize.GCA_000210735.clusters.tsv (for example, 1331 vs 1234). The same is true for the number of groups. In addition, the number of unique inferred_seq values in "01.clusterseq.GCA_000210735.tsv" is not the same as the number of contigs in "04.makefasta.GCA_000210735.all_seqs.fna". Do you have any explanation for this? Thanks a lot!
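For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the counting itself. It assumes the TSV files have a header row; `inferred_seq` is the column named in the question, while any other column name you pass (for example `cluster`) is an assumption to check against your own files. It only counts; it does not explain why the numbers differ.

```python
import csv


def count_unique(tsv_path, column):
    """Count distinct values in one column of a tab-separated file with a header row."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return len({row[column] for row in reader})


def count_fasta_records(fasta_path):
    """Count records in a FASTA file by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling `count_unique` on the clusterseq and summarize files with the same column, and `count_fasta_records` on the makefasta output, gives the figures being compared.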
This is a question more than an issue. I am interested in looking for MGEs shared across multiple species of staph. In doing so, should I map all the reads with BWA to a single reference species (one which shares the most genes with all the others), or map to an individual reference for each species? I guess my question is: are MGEs identified across different species using species-specific references going to be comparable?
Thanks in advance for your input!
Hello, do you know what is causing this error and how to fix it?
KeyError: 'emb|FRDD01000003.1|'
Error in job pair while creating output file /scratch/aubksw/MGEfinder/MGEfinder/cluster4/01.mgefinder/XretroflexusSp953/Xsp60/02.pair.Xsp60.XretroflexusSp953.tsv.
RuleException:
CalledProcessError in line 109 of /home/aubksw/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command '
mgefinder pair -maxdr 20 -minq 20 -minial 21 -maxjsp 0.15 -lins 30 /scratch/aubksw/MGEfinder/MGEfinder/cluster4/01.mgefinder/XretroflexusSp953/Xsp60/01.find.Xsp60.XretroflexusSp953.tsv $
' returned non-zero exit status 1.
Hello!
I installed MGEfinder using Method 1 as per the installation instructions and everything went fine.
I ran (after activating the conda env):
$ mgefinder find <bam alignment>.bam
and was able to obtain the *.tsv with correct looking output.
Following this I ran:
$ mgefinder pair <tsv from mgefinder find>.tsv <bam alignment>.bam <ref genome>.fasta
This command was unsuccessful and the output was:
#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
#### PARAMETERS ####
command: pair
findfile: mgefinder.find.tsv
bamfile: EC_IC_20_1X_MinION_Hybrid.align.sorted.bam
genome: EC_IC_20_1X_MinION_Hybrid.assembly.fasta
max_direct_repeat_length: 20
min_alignment_quality: 20
min_alignment_inner_length: 21
max_junction_spanning_prop: 0.15
large_insertion_cutoff: 30
output_file: mgefinder.pairs.tsv
####################
Finding all flank pairs within 20 bases of each other ...
Finding all inverted repeats at termini in 8 candidate pairs...
Assigning pairs according to existence of inverted repeats, read count difference, and flank length difference...
Traceback (most recent call last):
File "/home/reedrich/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 123, in pair
min_alignment_inner_length, max_junction_spanning_prop, large_insertion_cutoff, output_file)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 40, in _pair
flank_pairs = flank_pairer.run_pair_flanks()
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 99, in run_pair_flanks
assigned_pairs = self.assign_pairs(pairs)
File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 245, in assign_pairs
self.get_header_list()].sort_values(['contig', 'pos_5p', 'pos_3p'])
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1761, in __getitem__
return self._getitem_tuple(key)
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1288, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1953, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1594, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
"Passing list-likes to .loc or [] with any missing labels "
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
I am wondering what the source of the error is. Is this due to an out-of-date dependency, or perhaps to using a different version of Python than the one required?
Thank you for your advice and time.
All the best,
-BioRRW
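One detail in the traceback above: pandas is loaded from `/home/reedrich/.local/lib/python3.6/site-packages`, not from the `mgefinder` conda environment. That may mean a user-level pandas install is shadowing the version the environment was built with. The sketch below is a small, hypothetical helper (`outside_env` is not an MGEfinder function) that flags a module path lying outside a given environment prefix.

```python
from pathlib import PurePosixPath


def outside_env(module_file, env_prefix):
    """Return True if module_file is not located under env_prefix."""
    try:
        PurePosixPath(module_file).relative_to(env_prefix)
        return False
    except ValueError:
        # relative_to raises ValueError when the path is not under the prefix
        return True


env = "/home/reedrich/miniconda3/envs/mgefinder"
pandas_file = "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py"
click_file = env + "/lib/python3.6/site-packages/click/core.py"
print(outside_env(pandas_file, env), outside_env(click_file, env))
```

If a package does come from the user site directory, setting the standard Python environment variable `PYTHONNOUSERSITE=1` before running the command makes Python skip that directory. Whether that resolves this particular error is not confirmed here.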
Hi! I'm running into an error on the pair step while using MGEfinder v1.0.6. I'm using the workflow denovo command, and the working directory only produces the 01.mgefinder directory. Within the ~/01.mgefinder///log/..pair.log.err file, I get this error:
Traceback (most recent call last):
File "/home/erin.newcomer/.conda/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 194, in pair
min_alignment_inner_length, max_junction_spanning_prop, large_insertion_cutoff, output_file)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 40, in _pair
flank_pairs = flank_pairer.run_pair_flanks()
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 113, in run_pair_flanks
final_pairs = self.get_direct_repeats(filtered_pairs)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 278, in get_direct_repeats
positions = self.get_reference_direct_repeats(flank_pairs, genome_dict)
File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 292, in get_reference_direct_repeats
direct_repeat = genome_dict[contig][(start+1):end]
KeyError: '1'
Do you have any advice?
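The failing line is `genome_dict[contig]`, so the contig name `'1'` taken from the alignment results is not a key in the dictionary built from the reference FASTA. One way to check for such a mismatch is sketched below. It assumes the dictionary keys correspond to FASTA record IDs (the first whitespace-delimited token of each header), which is an assumption about MGEfinder's internals, and both function names are hypothetical.

```python
def fasta_ids(fasta_path):
    """Return the set of sequence IDs (first token of each '>' header line)."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def missing_contigs(contig_names, fasta_path):
    """Return the contig names that have no matching ID in the FASTA file, sorted."""
    return sorted(set(contig_names) - fasta_ids(fasta_path))
```

The contig names to test can be taken from the `contig` column of the find output or from the BAM header. Any name reported as missing suggests the BAM was aligned against a FASTA whose headers differ from the file in 00.genome.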
Hello, thanks for the wonderful tool you have developed for exploring MGEs! As you mentioned in your manuscript, the choice of reference genome is important (isolates should share at least 98.5% nucleotide identity with the reference genome), so before starting my analyses I would like to ask how I should choose my reference genome. I have downloaded ~200 genomes of one bacterial species, and the phylogenetic tree shows that they fall into five clades, so I want to analyze the mobile genetic elements of the five clades and compare them.
I intend to perform the MGE analysis of the five clades separately using MGEfinder. However, I am a little confused about which genome can be used as the reference genome for each clade. Can you help me? Thanks in advance.
Best,
jk yin
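One possible heuristic, offered as a sketch and not as the authors' recommendation: within each clade, pick the genome whose lowest pairwise identity to the other members is highest, then list the members that still fall below the 98.5% figure quoted above. The identity values would have to come from your own pairwise comparison; the function name and the toy numbers below are hypothetical.

```python
def pick_reference(identity, threshold=0.985):
    """Pick the genome whose lowest identity to the other genomes is highest.

    identity: dict mapping genome -> {other genome -> identity in the range 0..1}.
    Returns (best_genome, sorted list of genomes below threshold relative to it).
    """
    def worst(genome):
        return min(v for other, v in identity[genome].items() if other != genome)

    # sorted() makes tie-breaking deterministic (alphabetical order wins ties)
    best = max(sorted(identity), key=worst)
    below = sorted(
        other for other, v in identity[best].items()
        if other != best and v < threshold
    )
    return best, below


# Toy, made-up identities for three genomes in one clade
toy = {
    "A": {"B": 0.990, "C": 0.987},
    "B": {"A": 0.990, "C": 0.980},
    "C": {"A": 0.987, "B": 0.980},
}
print(pick_reference(toy))
```

Genomes listed in the second return value may need a different reference, or the clade may need to be split further.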
mgefinder workflow denovo --cores 4 ../fastqtrim/
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Get help documentation with --help.
Get version with --version.
command: workflow
workdir: ../fastqtrim/
cores: 4
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
sensitive: False
####################
COMMAND: snakemake -s /home/biobootcamp/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile --config wd=../fastqtrim/ memory=16000 --cores 4 --configfile /home/biobootcamp/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.config.yml
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1
rule all:
jobid: 0
Finished job 0.
1 of 1 steps (100%) done
Hello,
I tried installing mgefinder using both conda and pip but for either method, I encountered the following error:
mgefinder workflow denovo workdir/
Usage: mgefinder workflow [OPTIONS] WORKDIR
Try 'mgefinder workflow --help' for help.
Error: Invalid value for 'WORKDIR': Path 'denovo' does not exist.
And when I removed "denovo", I got another error:
mgefinder workflow workdir/
command: workflow
workdir: workdir/
snakefile: /home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow/Snakefile
configfile: /home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow/config.yml
cores: 1
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
###################
Traceback (most recent call last):
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/bin/mgefinder", line 11, in <module>
sys.exit(cli())
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/mgefinder/main.py", line 46, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/styphi/sf_D/Test_dir/miniconda2/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow.py", line 7, in _workflow
force_incomplete=rerun_incomplete, keepgoing=keep_going)
TypeError: snakemake() got an unexpected keyword argument 'configfile'
Could you please advise?
Many thanks.
Hi,
Our group (Dr. Pamer's UChicago Lab) is very excited to find a de novo approach to detecting MGEs. I am running your tutorial just to see how it works. However, I ran into some issues which I have no idea how to fix. It seems there is a usage error involving the "click" package. Please see the following error message.
I ran the following command as you listed in the tutorial.
$ mgefinder workflow --cores 16 test_workdir/
The following is the error message.
#### PARAMETERS ###
command: workflow
workdir: test_workdir/
snakefile: /home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow/Snakefile
configfile: /home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow/config.yml
cores: 16
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
###################
Traceback (most recent call last):
File "/home/dfi_user/miniconda3/envs/mgefinder/bin/mgefinder", line 11, in <module>
sys.exit(cli())
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/main.py", line 46, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/workflow.py", line 7, in _workflow
force_incomplete=rerun_incomplete, keepgoing=keep_going)
TypeError: snakemake() got an unexpected keyword argument 'configfile'
I am running the tutorial inside the mgefinder environment created by miniconda3. What other specs should I provide for you to identify the problem?
Also, what is the purpose of de-duplicating the raw fastq files? I couldn't get hts_SuperDeduper to work yet. Is it OK if I skip that step?
Thanks,
Eddi
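The `TypeError` comes from the call into `snakemake()`, not from click: the installed Snakemake function does not accept a `configfile` keyword. That suggests the Snakemake in this environment is not the version MGEfinder was written against (other reports in this thread show 3.13.3 as the expected version, under Python 3.6). The sketch below shows a generic way to check whether a callable accepts a given keyword. `old_style` and `new_style` are hypothetical stand-ins, not the real Snakemake API.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with a keyword argument called `name`."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


# Hypothetical stand-ins for two versions of an API entry point
def old_style(snakefile, configfile=None):
    pass


def new_style(snakefile, configfiles=None):
    pass


print(accepts_keyword(old_style, "configfile"), accepts_keyword(new_style, "configfile"))
```

In a real environment the same check could be pointed at `snakemake.snakemake`, and `snakemake --version` shows which release is installed.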
Hello,
I'm writing about some trouble that I have had running MGEfinder with my own data. I was able to complete the step-by-step tutorial without any issues-- The "mgefinder workflow denovo" command ran just fine and the correct output files were generated.
For my own data, I created a directory called "workdir", which included three directories:
To run mgefinder, I used the following command: mgefinder workflow denovo -t 10 workdir/
The program terminated with only 53% of the analysis completed. I've included the error file as an attachment. The issue seems to be related to the following portion of the error file:
rule make_database:
input: workdir/01.mgefinder/flye_final_polished/flye_final_polished.all_inferseq.txt
output: workdir/02.database/flye_final_polished/flye_final_polished.database.fna, workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2
jobid: 12
benchmark: workdir/02.database/flye_final_polished/flye_final_polished.database.benchmark.txt
wildcards: genome=flye_final_polished
threads: 10
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files workdir/02.database/flye_final_polished/flye_final_polished.database.fna, workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2.
MissingOutputException in line 192 of /scratch2/software/anaconda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
workdir/02.database/flye_final_polished/flye_final_polished.database.fna
workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Any insight and assistance would be appreciated! Thank you!
I have a FASTA file assembled already, is it possible to use this file in MGEfinder to detect any MGE that is present in my sequence?
If so, what is the code I should use to run the system? Since the tutorial mostly focuses on non-assembled files..
Thanks
Hi Matthew,
I’m very interested in the Analysis of Insertion-Enriched Sites in your paper.
I am trying to analyze Insertion-Enriched Sites caused by Insertion sequences (the upstream, within and downstream of the nearest CDS) among 200 complete sequenced bacteria.
I wonder if I can use the MGEfinder tool to do this?
If yes, Could you please give me some idea (or the workflow)?
Many thanks in advance.
Sincerely,
Dai Kuang
I created a snakemake workflow to make the necessary files and organize them for mgefinder, but I am having difficulty merging mgefinder, which has its own environment and config file, into my workflow. I tried subworkflow, but then mgefinder runs first when I need it to run last. I tried include, but it prevents me from using a unique environment or config file.
rule mgefinder:
    input:
        lambda wildcards:
            expand("mgefinder/{group}/00.bam/{sample2}.{sample1}.bam.bai",
                   sample1=GROUPS.get(int(wildcards.group)),
                   sample2=GROUPS.get(int(wildcards.group)),
                   allow_missing=True),
        lambda wildcards:
            expand("mgefinder/{group}/00.{dirname}/{sample}.fna",
                   sample=GROUPS.get(int(wildcards.group)),
                   dirname=["assembly","genome"],allow_missing=True),
    output:
        "mgefinder/{group}/dummy.txt"
    params:
        prefix="mgefinder/{group}/"
    conda:
        "database/mgefinder.yaml"
    shell:
        "mgefinder workflow denovo {params.prefix}; touch {params.prefix}/dummy.txt"
I was wondering if you have any suggestions as to how to do it?
When I attempt to convert .sam to .bam I get the following error and no .bam file is created:
(mgefinder) -bash-4.2$ mgefinder formatbam 1027D_19.NC_014925.1.sam 1027D_19.NC_014925.1.bam
Traceback (most recent call last):
File "/network/rit/lab/andamlab/bin/miniconda3/envs/mgefinder/bin/mgefinder", line 7, in <module>
from mgefinder.main import cli
File "/network/rit/lab/andamlab/bin/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/main.py", line 6, in <module>
from mgefinder.pair import _pair
File "/network/rit/lab/andamlab/bin/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/pair.py", line 8, in <module>
from mgefinder import fastatools, embosstools, pysamtools, sctools
File "/network/rit/lab/andamlab/bin/miniconda3/envs/mgefinder/lib/python3.7/site-packages/mgefinder/fastatools.py", line 6, in <module>
from Bio.Alphabet import IUPAC
File "/network/rit/lab/andamlab/bin/miniconda3/envs/mgefinder/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
"Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type
as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type
as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
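The error message itself states the cause: `mgefinder/fastatools.py` imports `Bio.Alphabet`, which was removed in Biopython 1.78. The sketch below is a hypothetical helper, not part of MGEfinder or Biopython, that tells from a version string whether that module should still be present.

```python
def bio_alphabet_available(biopython_version):
    """Return True if this Biopython version predates 1.78, when Bio.Alphabet was removed."""
    major, minor = (int(part) for part in biopython_version.split(".")[:2])
    return (major, minor) < (1, 78)


print(bio_alphabet_available("1.77"), bio_alphabet_available("1.78"))
```

The installed version can be read from `Bio.__version__`. If it is 1.78 or newer, one option is to install an older Biopython inside the mgefinder environment (for example the constraint `biopython<1.78`). Whether the rest of the environment then works is not verified here.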
Hi,
I am very excited about using mgefinder, but so far I cannot make it work. I successfully ran the script with a test dataset. However, multiple trials with different isolates and different reference genomes gave me the same error.
Error in job clusterseq while creating output file workdir/03.results/R27/01.clusterseq.R27.tsv.
RuleException:
CalledProcessError in line 259 of /home/rozwandm/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command 'mgefinder clusterseq -minsize 70 -maxsize 200000 --threads 1 --memory 16000 workdir/01.mgefinder/R27/R27.all_inferseq_database.txt -o workdir/03.results/R27/01.clusterseq.R27.tsv' returned non-zero exit status 1.
File "/home/rozwandm/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile", line 259, in __rule_clusterseq
File "/home/rozwandm/.conda/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Thank you in advance for your help,
Marta
Greetings! Firstly, I would like to express my sincere gratitude for your dedicated efforts in developing the MGEfinder software. Your work has greatly facilitated our research endeavors.
I am currently engaged in the analysis of short-read sequencing data related to the evolutionary drug resistance in Klebsiella pneumoniae. The goal is to uncover whether there have been alterations in MGEs within the strains during their evolutionary process. Presently, I have completed the analysis for 18 strains.
However, I have encountered some queries while interpreting the analysis results. In the "04.makefasta.ref.all_seqs.fna" file generated by MGEfinder, I observed only 4 sequences, whereas in the "01.clusterseq.ref.tsv" file, I noticed the presence of 455 sequences labeled as "inferred_seq." After annotating these sequences using ISfinder, all were identified as Insertion Sequence (IS) elements.
I would like to take this opportunity to seek your guidance on whether the interpretation and handling of these results are correct. Particularly, given the occurrence of only 4 sequences in the "04.makefasta.ref.all_seqs.fna" file, is there a possibility of oversight or misunderstanding in the analysis process? Your professional guidance will play a crucial role in resolving this matter, and I sincerely appreciate your assistance once again.
Best regards
Kindly be informed that, for the ease of data upload, the file type has been modified to .txt.
04.makefasta.ref.all_seqs.txt
01.clusterseq.ref.txt
Hi all,
I would like to use MGEfinder to detect insertion sequences using the whole workflow. I prepared the directories (00.genome, 00.bam, 00.assembly) and files (sample.refer.bam etc.) following the tutorial.
However, when I run mgefinder workflow /workdir
I get the error below. It would be highly appreciated if anyone could help with it. Thank you very much in advance.
Best,
Jason
Hi Sir,
I want to use the MGEfinder tool to find mobile genetic elements in an assembled genome instead of raw reads. How can I run the program this way?
Hi,
As the title says, can MGEfinder be used for metagenomic data from the infant gut microbiome? What problems would you expect if I use it? Thanks a lot!
Hi all
please, if anyone can help me
I am trying to run this code
(mgefinder) lololly-MBP MGEfinder % mgefinder formatbam s5.sam s5.bam
but it gave me this error
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Traceback (most recent call last):
File "/Users/lololly/miniconda3/envs/mgefinder/bin/mgefinder", line 5, in <module>
from mgefinder.main import cli
File "/Users/lololly/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 38, in <module>
check_dependencies()
File "/Users/lololly/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/dependencies.py", line 39, in check_dependencies
bowtie2_checker.check(extract_version=lambda x: x.split()[2])
File "/Users/lololly/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/dependencies.py", line 12, in check
output = shell(cmd.format(tool=self.tool), read=True).decode('utf-8').strip()
File "/Users/lololly/miniconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'bowtie2 --version 2>&1' returned non-zero exit status 255.
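The dependency check stops at `bowtie2 --version`, so formatbam itself never runs. A first step is to see which `bowtie2` executable is found on `PATH` and what it prints when called directly. The sketch below does that with the standard library only; `probe_tool` is a hypothetical helper, and it does not explain the exit status 255 by itself.

```python
import shutil
import subprocess


def probe_tool(name, args=("--version",)):
    """Return (path, returncode, output) for a tool; path is None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None, None, ""
    result = subprocess.run(
        [path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error text is not lost
        universal_newlines=True,
    )
    return path, result.returncode, result.stdout.strip()
```

Running `probe_tool("bowtie2")` inside the activated mgefinder environment shows whether the executable comes from that environment and what error text accompanies the non-zero exit.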
I'm running MGEfinder on my own data and have used the pipeline described in the readme document. This uses a complete assembly called "Ancestor.fna", a sample assembly produced with Unicycler called "48con5.fna", and the bam and bam.bai files made using bwa mem followed by mgefinder formatbam. When I run it I get the error message:
Parsing inferseq files
Combining the inferseq files...
Loading file 1/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_assembly.48con5.Ancestor.tsv
Loading file 2/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_reference.48con5.Ancestor.tsv
Loading file 3/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_overlap.48con5.Ancestor.tsv
Deleting old database directory...
No termini found in the input file...
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files workdir/02.database/Ancestor/Ancestor.database.fna, workdir/02.database/Ancestor/Ancestor.database.fna.1.bt2.
MissingOutputException in line 192 of /users/steg500/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
workdir/02.database/Ancestor/Ancestor.database.fna
workdir/02.database/Ancestor/Ancestor.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
I've tried increasing the latency wait time but it doesn't recognise the --latency-wait command when I run it with mgefinder workflow denovo. Do you have any ideas how I could fix this? Thank you!
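The log prints "No termini found in the input file..." just before the missing-output error, so it is possible that the database files are never written because the inferseq tables contain no rows, in which case a longer latency wait would not help. A minimal sketch for checking this follows. It assumes each inferseq TSV has exactly one header line, and the function names are hypothetical.

```python
def count_data_rows(tsv_path):
    """Count non-blank lines after the header line of a tab-separated file."""
    with open(tsv_path) as handle:
        lines = [line for line in handle if line.strip()]
    return max(len(lines) - 1, 0)


def summarize(paths):
    """Map each path to its number of data rows."""
    return {path: count_data_rows(path) for path in paths}
```

Passing the three `03.inferseq_*.tsv` paths from the log to `summarize` shows whether any of them holds data. If all counts are zero, the earlier find and pair outputs are the next place to look.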
Hi,
Upon recommendation from another group (Dr. Pamer's group at UChicago) I was interested in using MGEfinder to detect MGEs for a family of microbes. However, I ran into a couple of issues that I don't really know how to fix/where to start and I was hoping that you might be able to advise.
I started by running the tutorial to get a feel for the software and make sure it was running smoothly, and I was able to run the tutorial without any major problems. However, when I tried to run the program with my own files, I ran into some trouble.
I think that the main problem might be that I am trying to analyze pre-assembled reads that are already in .fna format.
Below are the steps I followed, and where I ran into problems:
The assembly was MSK_4_13_contigs.fna and the reference genome was GCF_00373885.fna (Blautia producta downloaded from NCBI). I indexed the reference and aligned the contigs to it:
% bwa index GCF_00373885.fna
% bwa mem GCF_000373885.fna MSK_4_13_contigs.fna > MSK_4_13_contigs.GCF_000373885.sam
I then activated the mgefinder environment and ran
% mgefinder formatbam MSK_4_13_contigs.GCF_000373885.sam MSK_4_13_contigs.GCF_000373885.bam --single-end
I used the --single-end option since I wasn't using paired reads, and I received the output:
Removing secondary alignments...
Successfully removed secondary alignments...
Sorting the BAM file by chromosomal location...
BAM file successfully sorted...
Index the sorted BAM file...
BAM file successfully indexed...
test_workdir0/
├── 00.assembly/
│ ├── MSK_4_13_contigs.fna
├── 00.bam/
│ ├── MSK_4_13_contigs.GCF_000373885.bam
│ ├── MSK_4_13_contigs.GCF_000373885.bam.bai
└── 00.genome/
└── GCF_000373885.fna
I then ran the workflow command in the mgefinder environment as follows:
% mgefinder workflow --cores 4 --memory 50000 test_workdir0/
At the makedatabase step, the following message appears:
#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
#### PARAMETERS ####
command: makedatabase
inferseqfiles: ('test_workdir2/01.mgefinder/GCF_000373885/GCF_000373885.all_inferseq.txt',)
minimum_size: 30
maximum_size: 200000
threads: 1
memory: 16000
force: True
output_dir: test_workdir2/02.database/GCF_000373885
prefix: GCF_000373885.database
####################
Parsing inferseq files
Combining the inferseq files...
Loading file 1/3: test_workdir2/01.mgefinder/GCF_000373885/MSK_4_13_contigs/03.inferseq_assembly.MSK_4_13_contigs.GCF_000373885.tsv
Loading file 2/3: test_workdir2/01.mgefinder/GCF_000373885/MSK_4_13_contigs/03.inferseq_reference.MSK_4_13_contigs.GCF_000373885.tsv
Loading file 3/3: test_workdir2/01.mgefinder/GCF_000373885/MSK_4_13_contigs/03.inferseq_overlap.MSK_4_13_contigs.GCF_000373885.tsv
Deleting old database directory...
No termini found in the input file...
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files test_workdir2/02.database/GCF_000373885/GCF_000373885.database.fna, test_workdir2/02.database/GCF_000373885/GCF_000373885.database.fna.1.bt2.
MissingOutputException in line 186 of /Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile:
Missing files after 5 seconds:
test_workdir2/02.database/GCF_000373885/GCF_000373885.database.fna
test_workdir2/02.database/GCF_000373885/GCF_000373885.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 50, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 26, in _workflow
shell(cmd)
File "/Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile --config wd=test_workdir2 memory=16000 --cores 1 --configfile /Users/arnoldj/opt/anaconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/config.yml' returned non-zero exit status 1.
When I try to open the files in 01.mgefinder/<genome>/, all the .tsv files are empty.
Is there something that I am doing incorrectly, or is there another process that I should follow since my contigs are already pre-assembled?
I would love to be able to use MGEfinder, so any help would be very much appreciated! Thank you in advance!
Best regards,
Jack
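The "No termini found in the input file..." message, together with the empty .tsv files, suggests that the earlier steps produced no candidate insertions for makedatabase to work with. A minimal sketch for listing which intermediate files are empty or header-only follows. The helper names `classify_tsv` and `summarize_tsvs` are illustrative, and treating the first non-blank line as a header is an assumption about the file layout.

```python
from pathlib import Path


def classify_tsv(path):
    """Classify a TSV as 'empty', 'header_only', or 'has_rows'.

    Assumption: the first non-blank line, if any, is a header row.
    """
    with open(path) as handle:
        rows = [line for line in handle if line.strip()]
    if not rows:
        return "empty"
    return "header_only" if len(rows) == 1 else "has_rows"


def summarize_tsvs(directory):
    """Map each *.tsv under `directory` (searched recursively) to its status."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): classify_tsv(path)
        for path in sorted(root.rglob("*.tsv"))
    }
```

Calling `summarize_tsvs("test_workdir2/01.mgefinder")` on the directory from the log above would show which step first produced no rows, which is usually the place to start looking.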
In the response to question #30 (also referenced in the response to question #33) you mention that '--filter-clusters-inferred-assembly' ... "removes clusters that were never identified from an assembly, meaning they were only found in the reference."
Does the term "reference" in the question #30 response refer only to the genome defined as the reference when the pipeline was originally run? Or can it refer to any single-genome-only cluster (i.e. any cluster originally identified in only a single genome)? I want to be sure I'm understanding this correctly. In my analysis, I'm using only assembled genomes (albeit mostly draft assemblies), and I'm seeking clarification on whether self-only clusters (i.e. clusters originally identified in only a single genome) would be removed under the same rules as clusters found only in the explicitly defined reference used when the pipeline was run. Under these conditions, every genome assembly in turn might be construed as a "reference" for the purposes of filtering as explained in the response to question #30.
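To make the question concrete, here is a sketch of the rule exactly as the quoted sentence states it. This is not mgefinder's implementation; the function body, the cluster ids, and the labels "assembly" and "reference" are all hypothetical.

```python
def filter_clusters_inferred_assembly(cluster_methods):
    """Keep clusters that were inferred from an assembly at least once.

    `cluster_methods` maps a cluster id to the set of inference sources
    seen for that cluster. The labels used here are hypothetical.
    """
    return {
        cluster: methods
        for cluster, methods in cluster_methods.items()
        if "assembly" in methods
    }


clusters = {
    "cluster1": {"assembly", "reference"},
    "cluster2": {"reference"},  # only ever inferred from the reference
    "cluster3": {"assembly"},   # inferred from a single assembly only
}
kept = filter_clusters_inferred_assembly(clusters)
print(sorted(kept))  # ['cluster1', 'cluster3']
```

Under this literal reading, the filter depends on how a cluster was inferred rather than on how many genomes it was seen in, so a cluster found in only one assembly would be kept. Whether mgefinder actually behaves this way is the point that needs confirmation.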
Hello, thanks for the wonderful tool you have developed for exploring MGEs!
I failed to install MGEfinder using Method 1; it shows:
(base) lkj666@Cool:~/software/MGEfinder$ bash install.sh
Removing mgefinder environment if already installed...
Installing mgefinder environment...
Collecting package metadata (repodata.json): \ install.sh: line 13: 727048 killed conda env create -f env/conda_linux64.yaml
Installation Complete.
Before running mgefinder, activate the mgefinder environment with
> conda activate mgefinder
You can then run mgefinder by typing
> mgefinder [command]
Send any questions to [email protected]
Can you give me some advice? Thank you.
Dear durrantmm
I am using MGEfinder to detect IS6110 in Mycobacterium tuberculosis, which has long been used for DNA fingerprinting of M.tb isolates. I have finished the MGEfinder analysis for hundreds of M.tb isolates, and now I am having some difficulty interpreting the results.
1. Though nearly all M.tb strains should harbor IS6110, MGEfinder detects no IS6110 insertion in 5% of my tested strains. All of them belong to lineages 1 and 4.
2. I noticed that MGEfinder often detects fewer IS6110 copies than another reliable IS-finding tool, implying that MGEfinder lacks sensitivity under my conditions.
3. Related to 1 and 2, let me confirm one point. M.tb has hot-spot regions where IS6110 is frequently inserted at identical positions. If IS6110 insertion points are shared between the reference genome sequence and the query strains, can MGEfinder detect those shared IS6110 insertions?
I know you analyzed M.tb data in your published paper, and I would be grateful for any suggestions to overcome these difficulties. For example, are there any recommended parameters to improve sensitivity? IS6110 is about 1300 bp long, and I want to focus on IS6110 insertions this time.
I always appreciate your kind support.
Many thanks.
Hi Matthew,
This is me again~ Hope everything is going well with you! Finally, we have some meaningful data to run through your software, but it is giving me an error in the pair job.
I set up my folders as follows:
├── 00.assembly
│ ├── ST1_19.fna
│ ├── ST1_20.fna
│ └── ST1_6.fna
├── 00.bam
│ ├── ST1_19.ST1_12.bam
│ ├── ST1_19.ST1_12.bam.bai
│ ├── ST1_20.ST1_12.bam
│ ├── ST1_20.ST1_12.bam.bai
│ ├── ST1_6.ST1_12.bam
│ └── ST1_6.ST1_12.bam.bai
└── 00.genome
└── ST1_12.fna
The error messages are as follows. I hope it is not about how I named the isolates.
Error in job pair while creating output file workdir/01.mgefinder/ST1_12/ST1_6/02.pair.ST1_6.ST1_12.tsv.
RuleException:
CalledProcessError in line 106 of /home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile:
Command '
mgefinder pair -maxdr 20 -minq 20 -minial 21 -maxjsp 0.15 -lins 30 workdir/01.mgefinder/ST1_12/ST1_6/01.find.ST1_6.ST1_12.tsv workdir/00.bam/ST1_6.ST1_12.bam workdir/00.genome/ST1_12.fna -o workdir/01.mgefinder/ST1_12/ST1_6/02.pair.ST1_6.ST1_12.tsv &> workdir/01.mgefinder/ST1_12/ST1_6/log/ST1_6.ST1_12.pair.log
' returned non-zero exit status 1.
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile", line 106, in __rule_pair
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Finished job 28.
6 of 31 steps (19%) done
Will exit after finishing currently running jobs.
Finished job 29.
7 of 31 steps (23%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/home/dfi_user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 47, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 19, in _workflow
shell(cmd)
File "/home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile --config wd=workdir/ memory=16000 --cores 4 --configfile /home/dfi_user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/config.yml' returned non-zero exit status 1.
I hope those are helpful!
Thanks in advance. Let me know if you want a copy of the files to reproduce the error message.
Eddi
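Before digging into the pair log itself, it can help to rule out naming problems. The sketch below checks a working directory against the convention visible in the tree above: `00.bam/<sample>.<genome>.bam` with a `.bai` index, `00.assembly/<sample>.fna`, and `00.genome/<genome>.fna`. That convention is inferred from the listing, not taken from the mgefinder documentation, and `check_workdir` is an illustrative name.

```python
from pathlib import Path


def check_workdir(workdir):
    """Return a list of layout problems found under `workdir`.

    Assumed convention (inferred from the directory tree above):
    00.bam/<sample>.<genome>.bam plus a .bai index,
    00.assembly/<sample>.fna and 00.genome/<genome>.fna.
    """
    root = Path(workdir)
    genomes = sorted(p.stem for p in (root / "00.genome").glob("*.fna"))
    assemblies = {p.stem for p in (root / "00.assembly").glob("*.fna")}
    problems = []
    for bam in sorted((root / "00.bam").glob("*.bam")):
        stem = bam.name[: -len(".bam")]
        # The genome name is the part after the last matching ".<genome>".
        genome = next((g for g in genomes if stem.endswith("." + g)), None)
        if genome is None:
            problems.append(f"{bam.name}: no matching genome in 00.genome")
            continue
        sample = stem[: -(len(genome) + 1)]
        if sample not in assemblies:
            problems.append(f"{bam.name}: missing 00.assembly/{sample}.fna")
        if not (bam.parent / (bam.name + ".bai")).exists():
            problems.append(f"{bam.name}: missing index {bam.name}.bai")
    return problems
```

For the layout shown above, `check_workdir("workdir")` should return an empty list. In that case the names are consistent and the actual cause is more likely recorded in `workdir/01.mgefinder/ST1_12/ST1_6/log/ST1_6.ST1_12.pair.log`, where the failing command redirects its output.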
Hi Matthew,
I tried installing MGEfinder on Stanford's Sherlock through conda and it seemed to finish successfully. However, the help command mgefinder --help results in the following error:
ModuleNotFoundError: No module named 'pandas.compat'
I also tried installing a more up-to-date version of pandas, but it still produces the same error. I figured you might be able to help me, considering that you probably used Sherlock to develop the tool. Do you know what might be going on?
Thanks!
Hello,
I'm wondering if it's possible to use an assembled genome as the entry point for analysis.
Thanks,
Theo Dreher
Hi,
I am trying to use MGEfinder but I keep getting this error:
Error in job make_database while creating output files sample5dir/02.database/v583_ncbi_genome/v583_ncbi_genome.database.fna, sample5dir/02.database/v583_ncbi_genome/v583_ncbi_genome.database.fna.1.bt2.
MissingOutputException in line 192 of /Users/duerkoplab/opt/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
sample5dir/02.database/v583_ncbi_genome/v583_ncbi_genome.database.fna
sample5dir/02.database/v583_ncbi_genome/v583_ncbi_genome.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
I can't find anything in the manual about --latency-wait. Can you please advise on what I should do?
Thanks!
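`--latency-wait` is an option of snakemake itself, and the workflow parameters echoed in the logs on this page do not appear to include it. One possible workaround, sketched below, is to rebuild the snakemake command that the workflow runs (its shape is copied from the tracebacks on this page) and append the flag. The helper name `snakemake_command` and the placeholder paths are illustrative only.

```python
import shlex


def snakemake_command(snakefile, configfile, workdir,
                      cores=1, memory=16000, latency_wait=None):
    """Build the snakemake argv seen in the tracebacks, optionally adding
    --latency-wait (seconds to wait for output files to appear)."""
    cmd = [
        "snakemake", "-s", snakefile,
        "--config", f"wd={workdir}", f"memory={memory}",
        "--cores", str(cores),
        "--configfile", configfile,
    ]
    if latency_wait is not None:
        cmd += ["--latency-wait", str(latency_wait)]
    return cmd


# Placeholder paths: substitute the ones printed in your own error message.
cmd = snakemake_command(
    "path/to/mgefinder/workflow/denovo.original.Snakefile",
    "path/to/mgefinder/workflow/config.yml",
    "sample5dir",
    latency_wait=60,
)
print(" ".join(shlex.quote(part) for part in cmd))
```

A longer wait only helps if the files are written late. If the make_database step never writes them at all (another report on this page shows "No termini found in the input file..." just before the same exception), increasing the wait time will not change the outcome.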
Hi all,
I was wondering whether MGEfinder can be used on samples that are not fully clonal. For example, have you tested whether MGEfinder can detect MGEs at different levels of sample "purity"?
Specifically, my use case would be data from experimental evolution carried out in a natural environment, where populations of a given strain are sequenced after a given time of evolution. To be more specific, the design is similar to this, where mice (that contain a microbiota) are colonized with a bacterial clone, which is allowed to evolve within different mice for a specific period. After this period, faecal samples are plated on selective media for the clone of interest and all colonies growing on a plate are scraped and sequenced, which is where it differs from the standard MGEfinder use case.
Multiple experiments have shown that bacteria readily adapt, and we can detect SNPs, ISs, etc. However, as this environment contains multiple species, HGT is a possibility. So the data contain substantial microvariation, but the genomes constituting these populations are not as different as if we were to sample different clones from different people.
One thing I worry about is whether the SPAdes assembly step will basically produce a consensus sequence that ignores the microvariation present in the population. Therefore, I was wondering if it is possible to use metaSPAdes for the assembly process (assuming this would allow us to keep more of that microvariation)?
PS. sorry for the rambling post
Hello!
I'm interested in using mgefinder on our datasets and followed instructions to install through conda per the guide. I downloaded and extracted the test_workdir files as instructed. I set the environment appropriately for mgefinder in conda and invoked the following command:
$ mgefinder workflow --cores 20 --memory 100000 test_workdir/
However, it appears to have crashed with the following error:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 251, in genotype
_genotype(clusterseq, pairfiles, filter_clusters_inferred_assembly, output_file)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 37, in _genotype
genotypes = genotyper.genotype()
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 106, in genotype
genotypes = self.resolve_ambiguous_genotypes(genotypes)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 224, in resolve_ambiguous_genotypes
unresolved, cluster_counts_per_site
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 322, in resolve_all_sample_comparison
resolved = (pd.merge(unresolved, cluster_counts, how='inner', on=['contig', 'pos_5p', 'pos_3p', 'cluster']).
File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
validate=validate)
File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
self._maybe_coerce_merge_keys()
File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Error in job genotype while creating output file test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv.
RuleException:
CalledProcessError in line 286 of /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile:
Command '
if [ "True" == "True" ]; then
mgefinder genotype --filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
else
mgefinder genotype --no-filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
fi
' returned non-zero exit status 1.
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile", line 286, in __rule_genotype
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
File "/home/user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
sys.exit(cli())
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 51, in workflow
_workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 26, in _workflow
shell(cmd)
File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile --config wd=test_workdir/ memory=16000 --cores 20 --configfile /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/config.yml ' returned non-zero exit status 1.
Obviously, I would like to get the test dataset to run properly before trying our own data. In my experience this is most likely something simple, but my relative inexperience leaves me baffled at this time.
Any suggestions are most welcome.
Tony
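One detail in the traceback above: the pandas frames come from /home/user/.local/lib/python3.6/site-packages, while every other frame comes from the miniconda3/envs/mgefinder environment. That may mean a user-level pandas install is being picked up instead of the environment's own copy, though whether this causes the merge error is an assumption. A minimal sketch for checking where a module would be imported from follows; `module_origin` and `is_inside_prefix` are illustrative helper names.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None
    if the module cannot be found. The module itself is not imported."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_inside_prefix(path, prefix=sys.prefix):
    """True if `path` lives under `prefix` (by default the active
    environment root reported by sys.prefix)."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix
```

Run inside the activated mgefinder environment, `module_origin("pandas")` shows which copy is used, and `is_inside_prefix(module_origin("pandas"))` returning False would indicate that it comes from outside the environment.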