mob-suite's Introduction

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies

Introduction

Plasmids are mobile genetic elements (MGEs) which allow for rapid evolution and adaptation of bacteria to new niches through horizontal transmission of novel traits to different genetic backgrounds. The MOB-suite is designed to be a modular set of tools for the typing and reconstruction of plasmid sequences from whole-genome sequencing (WGS) assemblies.

The MOB-suite depends on a set of databases which are too large to be hosted on GitHub. They can be downloaded or updated by running mob_init. Alternatively, the first time you run any of the tools, the databases will be downloaded and initialized automatically unless you specify an alternate database location. Because the databases are quite large, the first run can take a long time, depending on your connection and the speed of your computer. Databases can be manually downloaded from here.
Our new automatic chromosome depletion feature in MOB-recon can be based on any collection of closed chromosome sequences.

Citations

Below are the manuscripts describing the algorithmic approaches used in the MOB-suite.

  1. Robertson, James, and John H E Nash. “MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies.” Microbial genomics vol. 4,8 (2018): e000206. doi:10.1099/mgen.0.000206

  2. Robertson, James et al. “Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance.” Microbial genomics vol. 6,10 (2020): mgen000435. doi:10.1099/mgen.0.000435

MOB-init

On the first run of MOB-typer or MOB-recon, MOB-init (invoked by the mob_init command) should run automatically to download the databases from figshare, sketch them and set up the blast databases. It can also be run manually if the databases need to be re-initialized or if you want to initialize them in an alternative directory.

MOB-cluster

This tool creates plasmid similarity groups using fast genomic distance estimation with Mash. Plasmids are grouped into clusters using complete-linkage clustering, and the cluster code accessions provided by the tool approximate operational taxonomic units (OTUs). The plasmid nomenclature is designed to group together highly similar plasmids which are unlikely to have multiple representatives within a single cell. It has a strong concordance with replicon and relaxase typing but is universally applicable, since it uses the complete sequence of the plasmid itself rather than specific biomarkers.
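To make the clustering step concrete, here is a minimal pure-Python sketch of complete-linkage clustering over a table of pairwise distances. It is not MOB-cluster's implementation: the plasmid names, the distances and the 0.06 threshold are made up for illustration, and in practice the distances would come from Mash.

```python
from itertools import combinations


def complete_linkage(names, dist, threshold):
    """Greedily merge clusters while the farthest pair between them is <= threshold."""
    clusters = [[n] for n in names]

    def cluster_distance(c1, c2):
        # Complete linkage: two clusters are as far apart as their most distant members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the complete-linkage distance.
        (i, j), best = min(
            (((i, j), cluster_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical pairwise distances between four plasmids.
names = ["pA", "pB", "pC", "pD"]
dist = {
    frozenset(("pA", "pB")): 0.01,
    frozenset(("pA", "pC")): 0.05,
    frozenset(("pB", "pC")): 0.08,
    frozenset(("pA", "pD")): 0.30,
    frozenset(("pB", "pD")): 0.30,
    frozenset(("pC", "pD")): 0.30,
}
clusters = complete_linkage(names, dist, threshold=0.06)
print(clusters)
```

In this toy data pC is within the threshold of pA but not of pB, so complete linkage keeps it out of the pA/pB cluster. That behaviour is what keeps every member of a cluster within the cutoff of every other member.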

MOB-recon

This tool reconstructs individual plasmid sequences from draft genome assemblies using the clustered plasmid reference databases provided by MOB-cluster. It will also automatically provide the full typing information provided by MOB-typer. It can optionally use a chromosome depletion strategy based on closed genomes or a user-supplied filter of sequences to ignore.

MOB-typer

Provides in silico predictions of the replicon family, relaxase type, mate-pair formation type and predicted transferability of the plasmid. Using a combination of biomarkers and MOB-cluster codes, it also provides an observed host range for your plasmid based on its replicon, relaxase and cluster assignment. This is combined with information mined from the literature to predict the taxonomic rank at which the plasmid is likely to be stably maintained. It does not provide source attribution predictions.
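As a rough illustration of how marker presence could map to the three mobility labels used in the reports, the sketch below encodes one plausible rule set: relaxase plus mate-pair formation markers gives "Conjugative", a relaxase or oriT without mate-pair formation gives "Mobilizable", and anything else gives "Non-mobilizable". This rule is an assumption based on the general description in the MOB-suite publication, not code taken from MOB-typer, and the helper names are hypothetical.

```python
def parse_field(value):
    """Split a comma-separated report field; '-' or an empty string means no hits."""
    value = value.strip()
    if value in ("", "-"):
        return []
    return [item for item in value.split(",") if item]


def predict_mobility(relaxase_types, mpf_types, orit_types):
    """Assumed rule set, for illustration only."""
    has_relaxase = bool(relaxase_types)
    has_mpf = bool(mpf_types)
    has_orit = bool(orit_types)
    if has_relaxase and has_mpf:
        return "Conjugative"
    if has_relaxase or has_orit:
        return "Mobilizable"
    return "Non-mobilizable"


print(predict_mobility(parse_field("MOBP,MOBV"), parse_field("MPF_T"), parse_field("-")))
```

The "-" placeholder handling mirrors how empty fields appear in MOB-typer output.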

Installation

Requires

  • Python >= 3.7
  • ete3 >= 3.1.3 (due to updated taxonomy database init)
  • pandas >= 0.22.0,<=1.05
  • biopython >= 1.80,<2
  • pytables >= 3.3
  • pycurl >= 7.43
  • numpy >= 1.11.1
  • scipy >= 1.1.0
  • six >= 1.10

Dependencies

  • blast+ v. 2.3.0
  • mash v. 2.0

Installation

We recommend installing MOB-Suite as a conda package due to the large number of dependencies. The package is available through the bioconda channel.

% conda config --add channels defaults
% conda config --add channels conda-forge
% conda config --add channels bioconda
% conda install -c bioconda mob_suite

Pip

We recommend installing MOB-Suite via bioconda, but you can also install it via pip using the command below.

% pip3 install mob_suite

Source

To build directly from source on Ubuntu Linux, run the following commands, which install the Python libraries and other dependencies.

apt update && apt install python3-pip #installs gcc compiler for pycurl
apt install libcurl4-openssl-dev libssl-dev #for pycurl
pip3 install Cython
apt install mash ncbi-blast+
python3 setup.py install && mob_init #to install and init databases

Docker image

Docker images are also available at https://hub.docker.com/r/kbessonov/mob_suite and at https://quay.io/repository/biocontainers/mob_suite

% latest_tag=$(curl -H "Authorization: Bearer X" -X GET "https://quay.io/api/v1/repository/biocontainers/mob_suite/tag/" | jq .tags[].name | head -1 | sed -e 's|\"||g')
% docker pull quay.io/biocontainers/mob_suite:${latest_tag}
% docker run --rm -v $(pwd):/mnt/ "quay.io/biocontainers/mob_suite:${latest_tag}" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output

Singularity image

A Singularity image can be built locally using the Singularity recipe donated by Eric Deveaud, or pulled from one of the repositories.

The recipe (recipe.singularity) is located in the singularity folder of this repository and installs MOB-Suite via conda.

% singularity build mobsuite.simg recipe.singularity

As the simplest alternative, a Singularity image can be pulled from the BioContainers repository, where <version> is the desired version (e.g. 3.0.3--py_0), or from the Quay.io or Docker Hub repositories.

The MOB-Suite image (mob_suite.sif) will be generated using one of the following 3 methods. MOB-Suite tools can then be run on a mounted directory via --bind, like so: singularity run --bind $PWD:/mnt mob_suite.sif mob_recon -i /mnt/<input_fasta> -o /mnt/<output_directory>

# Method 1
% singularity pull mob_suite.sif https://depot.galaxyproject.org/singularity/mob_suite:<version>
# Method 2
% singularity pull mob_suite.sif docker://kbessonov/mob_suite:3.0.3 #or for the latest version
# Method 3 - recommended
% latest_version=$(curl -H "Authorization: Bearer X" -X GET "https://quay.io/api/v1/repository/biocontainers/mob_suite/tag/" | jq .tags[].name | head -1 | sed -e 's|\"||g')
% singularity pull mob_suite.sif docker://quay.io/biocontainers/mob_suite:${latest_version}

Setuptools

Clone this repository and install via setuptools.

% git clone https://github.com/phac-nml/mob-suite.git
% cd mob-suite
% python setup.py install

MGE detection

As of v. 3.1.0, MOB-recon and MOB-typer can report the blast HSPs of the repetitive mask file for IS/TN and other MGE elements in a new report file called mge.report.txt, which reports blast hits from both chromosome and plasmid contigs. The MGE report is generated by default in MOB-recon and can be enabled in MOB-typer by using the '--mge_report_file' parameter and specifying an output file. This is a very naive implementation of MGE detection, and further work will improve its utility based on user feedback.

Using MOB-typer to perform replicon and relaxase typing of complete plasmids and predict mobility

You can perform plasmid typing using a fasta-formatted file containing a single plasmid represented by one or more contigs, or MOB-typer can treat all of the sequences in the fasta file as independent. By default, all sequences in a file are treated as belonging to one plasmid, so do not include multiple unrelated plasmids in the file without specifying --multi.

# Single plasmid
% mob_typer --infile assembly.fasta --out_file sample_mobtyper_results.txt

# Multiple independent plasmids
% mob_typer --multi --infile assembly.fasta --out_file sample_mobtyper_results.txt

Using MOB-recon to reconstruct plasmids from draft assemblies

This procedure works with draft or complete genomes and is agnostic of assembler choice. If Unicycler is used, the circularity information can be parsed directly from the headers of the unmodified assembly using -u. MOB-typing information is automatically generated for all plasmids reconstructed by MOB-recon.

### Basic Mode
% mob_recon --infile assembly.fasta --outdir my_out_dir

As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowledge of your organism. Filtering of sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.

### User sequence mask
% mob_recon --infile assembly.fasta --outdir my_out_dir --filter_db filter.fasta

As of v. 3.0.0, MOB-recon can use a collection of closed genomes, which is quickly screened using Mash for genomes that are genetically close to the sample so that blast searches are limited to those chromosomes. This more nuanced and automatic approach is recommended where there are sequences which should be filtered in one genomic context but not another. As an optional download, we provide a set of closed Enterobacteriaceae genomes from NCBI which can be used to improve accuracy for some organisms, such as E. coli and Klebsiella, where there are sequences which switch between chromosome and plasmids.

If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will be assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for the plasmids but any change will result in a corresponding change in the md5 hash. It is therefore not advised to use these assigned names for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representative of that plasmid.
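The instability of these names follows directly from how a content hash behaves. The sketch below shows the idea with Python's hashlib. The way the sequences are normalized and combined here (upper-cased, sorted, concatenated) is an assumption made for illustration and may differ from what MOB-recon actually does.

```python
import hashlib


def novel_plasmid_name(contig_sequences):
    """Build a novel_{md5} style name from the sequences assigned to one plasmid."""
    # Assumption: sort the sequences so the name does not depend on contig order.
    joined = "".join(sorted(seq.upper() for seq in contig_sequences))
    return "novel_" + hashlib.md5(joined.encode("ascii")).hexdigest()


original = novel_plasmid_name(["ACGTACGT", "TTGACA"])
reordered = novel_plasmid_name(["TTGACA", "ACGTACGT"])
one_base_changed = novel_plasmid_name(["ACGTACGA", "TTGACA"])

print(original)
print(original == reordered)         # same content, same name
print(original == one_base_changed)  # a single base change gives a new name
```

Because a single changed base, or one contig more or less, yields a completely different hash, two runs on slightly different assemblies of the same plasmid will not share a name.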

### Autodetected close genome filter
% mob_recon --infile assembly.fasta --outdir my_out_dir -g 2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta

Using MOB-cluster

Use this tool only to update the plasmid databases or to build a new one. MOB-cluster should only be run with closed, high-quality plasmids; adding poor-quality data can severely impact MOB-recon. As of v3.0.0, MOB-cluster has been rewritten to use the output from MOB-typer, which greatly speeds up updating and building plasmid databases by relying on pre-computed results. Clusters generated by earlier versions of MOB-suite are not compatible with the new clusters. We have provided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.

### Build a new database
% mob_cluster --mode build -f new_plasmids.fasta -p new_plasmids_mobtyper_report.txt -t new_plasmids_host_taxonomy.txt --outdir output_directory
### Add a sequence to an existing database
% mob_cluster --mode update -f new_plasmids.fasta -p new_plasmids_mobtyper_report.txt -t new_plasmids_host_taxonomy.txt --outdir output_directory -c existing_clusters.txt -r existing_sequences.fasta
### Update MOB-suite plasmid databases
% cp output_directory/clusters.txt
% mv output_directory/updated.fasta mob_db_path/ncbi_plasmid_full_seqs.fas
% makeblastdb -in mob_db_path/ncbi_plasmid_full_seqs.fas -dbtype nucl
% mash sketch -i mob_db_path/ncbi_plasmid_full_seqs.fas 

Output files

| File | Description |
| --- | --- |
| contig_report.txt | Describes the assignment of each contig to the chromosome or to a particular plasmid grouping |
| mge.report.txt | Blast HSPs of detected MGEs/repetitive elements with contextual information |
| chromosome.fasta | Fasta file of all contigs found to belong to the chromosome |
| plasmid_(X).fasta | Each plasmid group is written to an individual fasta file which contains the assigned contigs |
| mobtyper_results | Aggregate MOB-typer report files for all identified plasmids |

MOB-recon contig report format

| Field | Description |
| --- | --- |
| sample_id | Sample ID specified by user or default to filename |
| molecule_type | Plasmid or Chromosome |
| primary_cluster_id | Primary MOB-cluster id of neighbor |
| secondary_cluster_id | Secondary MOB-cluster id of neighbor |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
| circularity_status | Molecule is either circular, incomplete or not tested based on parameters used |
| rep_type(s) | Replicon type(s) |
| rep_type_accession(s) | Replicon sequence accession(s) |
| relaxase_type(s) | Relaxase type(s) |
| relaxase_type_accession(s) | Relaxase sequence accession(s) |
| mpf_type | Mate-Pair formation type |
| mpf_type_accession(s) | Mate-Pair formation sequence accession(s) |
| orit_type(s) | Origin of transfer type |
| orit_accession(s) | Origin of transfer sequence accession(s) |
| predicted_mobility | Mobility prediction for the plasmid (Conjugative, Mobilizable, Non-mobilizable) |
| mash_nearest_neighbor | Accession of closest plasmid database match |
| mash_neighbor_distance | Mash distance from query to match |
| mash_neighbor_identification | Host taxonomy of the plasmid database match |
| repetitive_dna_id | Repetitive DNA match id |
| repetitive_dna_type | Repetitive element class |
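A report with these fields can be summarized with a few lines of Python. The sketch below totals the contig lengths assigned to each plasmid cluster. It assumes the report is tab-delimited with a header row, and the sample rows (including the cluster ids) are invented for illustration. Only a subset of the columns is shown.

```python
import csv
import io
from collections import defaultdict

# Made-up excerpt of a contig report; real reports carry all of the columns listed above.
SAMPLE_REPORT = (
    "sample_id\tmolecule_type\tprimary_cluster_id\tsize\n"
    "s1\tchromosome\t-\t4800000\n"
    "s1\tplasmid\tAA001\t52000\n"
    "s1\tplasmid\tAA001\t4500\n"
    "s1\tplasmid\tAB123\t3800\n"
)


def plasmid_sizes(handle):
    """Sum contig lengths per primary cluster id, ignoring chromosomal contigs."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["molecule_type"].lower() != "plasmid":
            continue
        totals[row["primary_cluster_id"]] += int(row["size"])
    return dict(totals)


totals = plasmid_sizes(io.StringIO(SAMPLE_REPORT))
print(totals)
```

To run it on a real file, replace the StringIO object with an open file handle, for example `open("my_out_dir/contig_report.txt", newline="")`.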

MOB-typer report file format

| Field | Description |
| --- | --- |
| sample_id | Sample ID specified by user or default to filename |
| num_contigs | Number of sequences belonging to plasmid |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
| rep_type(s) | Replicon type(s) |
| rep_type_accession(s) | Replicon sequence accession(s) |
| relaxase_type(s) | Relaxase type(s) |
| relaxase_type_accession(s) | Relaxase sequence accession(s) |
| mpf_type | Mate-Pair formation type |
| mpf_type_accession(s) | Mate-Pair formation sequence accession(s) |
| orit_type(s) | Origin of transfer type |
| orit_accession(s) | Origin of transfer sequence accession(s) |
| predicted_mobility | Mobility prediction for the plasmid (Conjugative, Mobilizable, Non-mobilizable) |
| mash_nearest_neighbor | Accession of closest plasmid database match |
| mash_neighbor_distance | Mash distance from query to match |
| mash_neighbor_identification | Host taxonomy of the plasmid database match |
| primary_cluster_id | Primary MOB-cluster id of neighbor |
| secondary_cluster_id | Secondary MOB-cluster id of neighbor |
| predicted_host_range_overall_rank | Taxon rank of convergence between observed and reported host ranges |
| predicted_host_range_overall_name | Taxon name of convergence between observed and reported host ranges |
| observed_host_range_ncbi_rank | Taxon rank of convergence of plasmids in MOB-suite plasmid DB |
| observed_host_range_ncbi_name | Taxon name of convergence of plasmids in MOB-suite plasmid DB |
| reported_host_range_lit_rank | Taxon rank of convergence of literature reported host ranges |
| reported_host_range_lit_name | Taxon name of convergence of literature reported host ranges |
| associated_pmid(s) | PubMed ID(s) associated with records |

MOB-cluster sequence cluster information file

| Field | Description |
| --- | --- |
| sample_id | Sample ID specified by user or default to filename |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
| organism | Host taxon name |
| taxid | Host NCBI taxon id |
| rep_type(s) | Replicon type(s) |
| rep_type_accession(s) | Replicon sequence accession(s) |
| relaxase_type(s) | Relaxase type(s) |
| relaxase_type_accession(s) | Relaxase sequence accession(s) |
| mpf_type | Mate-Pair formation type |
| mpf_type_accession(s) | Mate-Pair formation sequence accession(s) |
| orit_type(s) | Origin of transfer type |
| orit_accession(s) | Origin of transfer sequence accession(s) |
| predicted_mobility | Mobility prediction for the plasmid (Conjugative, Mobilizable, Non-mobilizable) |
| primary_cluster_id | Primary MOB-cluster id of plasmid |
| primary_dist | Primary MOB-cluster distance cutoff to generate cluster |
| secondary_cluster_id | Secondary MOB-cluster id of plasmid |
| secondary_dist | Secondary MOB-cluster distance cutoff to generate cluster |

MGE Report

| Field | Description |
| --- | --- |
| sample_id | Sample ID specified by user or default to filename |
| molecule_type | Plasmid or Chromosome |
| primary_cluster_id | Primary MOB-cluster id of neighbor |
| secondary_cluster_id | Secondary MOB-cluster id of neighbor |
| contig_id | Sequence identifier |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
| mge_id | Unique numeric id |
| mge_acs | GenBank accession of MGE |
| mge_type | Primary type of the MGE |
| mge_subtype | Subtype of the MGE |
| mge_length | Length of the MGE query |
| mge_start | HSP start of MGE query |
| mge_end | HSP end of MGE query |
| contig_start | HSP start on contig |
| contig_end | HSP end on contig |
| length | Length of HSP |
| sstrand | Strand of HSP |
| qcovhsp | Query coverage of HSP |
| pident | Sequence identity of HSP |
| evalue | Sequence evalue of HSP |
| bitscore | Sequence bitscore of HSP |

blast report file format

| Field name | Description |
| --- | --- |
| qseqid | Query sequence id |
| sseqid | Subject sequence id |
| qlen | Query length |
| slen | Subject length |
| qstart | Match start in query |
| qend | Match end in query |
| sstart | Match start in subject |
| send | Match end in subject |
| length | Length of alignment |
| mismatch | Number of mismatches |
| pident | Identity |
| qcovhsp | Query coverage by HSP |
| qcovs | Query coverage by subject |
| sstrand | Strand of hit in subject |
| evalue | Evalue of match |
| bitscore | Bitscore of match |

Contact

James Robertson - [email protected]
Kyrylo Bessonov - [email protected]

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

mob-suite's People

Contributors

cbrueffer, dfornika, dorbarker, jamespocalypse, jchorl, jrober84, kbessonov1984, lowandrew, rpetit3

mob-suite's Issues

Multiple mob_recon jobs cannot run concurrently

When running hundreds of mob_recon concurrently in our Galaxy environment, we are getting the following error:

NCBI database format is outdated. Upgrading
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Traceback (most recent call last):
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 520, in <module>
    main()
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 366, in main
    matchtype="loose_match",hr_obs_data = loadHostRangeDB())
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 527, in getRefSeqHostRange
    convergance_rank, converged_taxonomy_name = getHostRangeRankCovergence(ref_taxids)
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 369, in getHostRangeRankCovergence
    ncbi = NCBITaxa(dbfile=ETE3DBTAXAFILE)
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 120, in __init__
    self.update_taxonomy_database(taxdump_file)
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
    update_db(self.dbfile)
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
    upload_data(dbfile)
  File "/Galaxy/deps/_conda/envs/[email protected]/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 791, in upload_data
    db.execute(cmd)
sqlite3.OperationalError: database is locked

Current solution is to limit the number of concurrent mob_recon jobs to ONE for all users.

galaxy - irida

Is this going into galaxy and irida (as a custom pipeline plugin)?

show-coords instructions

I am unable to use conda, so I wasn't sure what the show-coords step was trying to do.

% which show-coords 
using the path above as "conda-show-coords-path"
% ln -s conda-show-coords-path /usr/local/bin/show-coords

could be replaced by this in bash

% ln -s $(which show-coords) /usr/local/bin/show-coords

But it still assumes /usr/local/bin is in people's PATH?

It is not in my PATH, and isn't in most HPC centres.

So I'm not sure what to do.

I have show-coords in my PATH as I have mummer 3.x installed. Is that ok?

default cut-offs for mob_typer

Dear devs, thanks a lot for a very useful piece of software.

I've been browsing the documentation and the paper, but I can't find the default threshold values for the mob_typer similarity cut-offs (min. seq. identity, min. coverage and min. evalue) for each database element (i.e. rep, mob, mpf, orit, etc.).

Is this listed somewhere?

Best regards,
Joseph.

mob_suite/databases/ is missing FASTA files inc. rep.dna.fas

pip3 install mob_typer

mob_typer -o mob1 -i NC_003197.fasta -n 18

ERROR:root:Error needed database missing "/home/linuxbrew/.linuxbrew/lib/python3.7/site-packages/mob_suite/databases/rep.dna.fas"

ls /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages/mob_suite/databases/

__init__.py  __pycache__

The github folder is also empty.

Host range overall call

There should be an overall host range call which uses the NCBI taxonomy and the literature evidence to produce an overall call. I think that the overall call should be whichever of the two is less restrictive. Biologically it makes sense to set the maximum resolution to the family level since it is unlikely to have plasmids that can only replicate in a single species.
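A minimal sketch of this proposal: take the broader (less restrictive) of the observed and literature-reported ranks, and never report anything more specific than family. The rank ordering and function name are illustrative assumptions, not the MOB-suite implementation.

```python
# Ranks ordered from most specific to broadest (an illustrative subset of NCBI ranks).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]


def overall_host_range_rank(observed_rank, reported_rank, floor="family"):
    """Return the less restrictive of the two ranks, but no more specific than `floor`."""
    candidates = [rank for rank in (observed_rank, reported_rank) if rank in RANKS]
    if not candidates:
        return None  # neither source produced a usable rank
    broadest = max(candidates, key=RANKS.index)
    return max(broadest, floor, key=RANKS.index)


print(overall_host_range_rank("genus", "order"))    # the broader of the two
print(overall_host_range_rank("species", "genus"))  # raised to the family floor
```

Ranks missing from one source (for example "-") are simply ignored, so the call falls back to whichever source has data.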

mob_init does not work

Hi,

I have been able to successfully download the database from the provided link. However, when I run the command %mob_init, I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/mob_init", line 7, in <module>
    from mob_suite.mob_init import main
  File "/usr/local/lib/python3.7/site-packages/mob_suite/mob_init.py", line 3, in <module>
    import os, pycurl, tarfile, zipfile, gzip, multiprocessing, sys
ImportError: dlopen(/usr/local/lib/python3.7/site-packages/pycurl.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libcrypto.1.1.dylib
  Referenced from: /usr/local/lib/python3.7/site-packages/pycurl.cpython-37m-darwin.so
  Reason: image not found

I am not sure what to do with this issue.

Thank you in advance for your help.

Regards,
Pablo

Help with interpreting results

I ran on E.faecium with 3 plasmids, that are in Refseq.

LOCUS       NC_017022            2955294 bp    DNA     circular CON 23-MAR-2017
LOCUS       NC_017032              56520 bp    DNA     circular CON 23-MAR-2017
LOCUS       NC_017023               3847 bp    DNA     circular CON 23-MAR-2017
LOCUS       NC_017024               4119 bp    DNA     circular CON 23-MAR-2017

1   file_id                     0085.gbk.fasta
2   num_contigs                 4
3   total_length                3019780
4   gc                          38.30749259879859
5   rep_type(s)                 rep_cluster_1742,rep_cluster_1077,rep_cluster_935,rep_cluster_889
6   rep_type_accession(s)       001162__CP011831_00015,000325__NC_017023_00001,002420__CP003354_00001,002369__NC_021995_00001
7   relaxase_type(s)            MOBP,MOBV,MOBV,MOBP
8   relaxase_type_accession(s)  NC_019240_00008,NC_017023_00004,NC_017024_00005,NC_021995_00038
9   mpf_type                    MPF_T
10  mpf_type_accession(s)       NC_011642_00014,NC_020208_00096
11  orit_type(s)                -
12  orit_accession(s)           -
13  PredictedMobility           Conjugative
14  mash_nearest_neighbor       CP003352
15  mash_neighbor_distance      0.151946
16  mash_neighbor_cluster       1642

I am not sure how to interpret the results.

How do I know which contigs it found things on?
How do I connect the comma-separated values across rows?
Are there only ever 2 lines of output? If so, can you transpose the results?

ValueError: 1 taxid not found

Thank you very much for your kind instruction
on the installation of mob_suite.

I've installed mob_suite in a server, but the following
test command caused an error of "ValueError: 1 taxid not found".
Could you let me know how to fix it?

(mob_suite) [k-yahara@gc016 plasmid-jp_Ohsumi]$ mob_typer
usage: mob_typer [-h] -i INFILE -o OUT_FILE [-a ANALYSIS_DIR] [-n NUM_THREADS] [-s SAMPLE_ID] [-f] [-x] [--min_rep_evalue MIN_REP_EVALUE]
[--min_mob_evalue MIN_MOB_EVALUE] [--min_con_evalue MIN_CON_EVALUE] [--min_length MIN_LENGTH] [--min_rep_ident MIN_REP_IDENT]
[--min_mob_ident MIN_MOB_IDENT] [--min_con_ident MIN_CON_IDENT] [--min_rep_cov MIN_REP_COV] [--min_mob_cov MIN_MOB_COV]
[--min_con_cov MIN_CON_COV] [--min_overlap MIN_OVERLAP] [-k] [--debug] [--plasmid_mash_db PLASMID_MASH_DB] [-m PLASMID_META]
[--plasmid_db_type PLASMID_DB_TYPE] [--plasmid_replicons PLASMID_REPLICONS] [--repetitive_mask REPETITIVE_MASK]
[--plasmid_mob PLASMID_MOB] [--plasmid_mpf PLASMID_MPF] [--plasmid_orit PLASMID_ORIT] [-d DATABASE_DIRECTORY]
[--primary_cluster_dist PRIMARY_CLUSTER_DIST] [--secondary_cluster_dist SECONDARY_CLUSTER_DIST] [-V]

$ mob_typer --infile test_rep.fas --out_file test_rep.mob_typer.out
2020-08-30 06:25:41,501 mob_suite.mob_typer INFO: Running Mob-typer version 3.0.0 [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py:163]
2020-08-30 06:25:41,501 mob_suite.mob_typer INFO: Processing fasta file test_plasmid.fas [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py:165]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program blastn at /home/k-yahara/miniconda3/bin/blastn [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program makeblastdb at /home/k-yahara/miniconda3/bin/makeblastdb [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program tblastn at /home/k-yahara/miniconda3/bin/tblastn [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 root INFO: Creating Lock file /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/ETE3_DB.lock [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:438]
2020-08-30 06:25:41,503 root INFO: Testing ETE3 taxonomy db /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/taxa.sqlite [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:441]
Traceback (most recent call last):
  File "/home/k-yahara/miniconda3/bin/mob_typer", line 10, in <module>
    sys.exit(main())
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py", line 316, in main
    dbstatus = ETE3_db_status_check(1, ETE3_LOCK_FILE, ETE3DBTAXAFILE, logging)
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py", line 444, in ETE3_db_status_check
    lineage = ncbi.get_lineage(taxid)
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 238, in get_lineage
    raise ValueError("%s taxid not found" %taxid)
ValueError: 1 taxid not found

The input file test_rep.fas is as follows:
>test_rep
TTGAAAAAAATATGTGTACTTATGAAGAAGGAACTTGTTGTCAAAGACAATGCACTAATAAATGCCAGTTATAATTTAGACCTTTCAGAACAACGTCTAATATTGTTAGCAATCCTTGAAGCTAGACAATCAAACACACCCAATGATAAAGATTTAACAATTCATGCTGAAAGCTATATCAACCATTTTAACGTTCATAGAAATACAGCCTATAAAGTCCTTAAAGATGCATGTAAGAGTCTATTTGATCGTAGATTCAGCTATCAAAAACTAACTCAGAAGGGCAACATTGAAAATGTAATAAGCCGATGGGTACAACGCATATCTTATGTTGAGAATGAAGCTCTTGTTCGTATTAAGTTTTCTGATGATGTTGTACCGTTGATTACAAACTTAGAAAAACACTTCACCAGTTATGAATTAGAACAAGTCAGTAGTTTAACCAGTGTTTACGCTATACGCTTATATGAATTGCTTATTGCATGGCGTAGTACTGGTAAAGTCATTTTGGTAGAGCTAGAAGAACTTAGATTAAAACTAGGTATAGAATCCCATGAATATAAGAGAATGGGGCAATTTAAAGAAAAAGTTTTACACCTTGCTATTGATCAAATAAACAAATACACCGATATAAAAGCAGAGTATGAACAACACAAACGTGGCCGTTCGATTATTGGCTTTTCATTTAAGTTTAAACAGAAACAACAACCCCAAAAAGCAGATTCCAAGCGAGCCCCTAACACCCCAGACTTCTTTGTCAAAATGACCGATGCACAACGCCATCTATTCGCCAATAAAATGTCTGAGATGCCTGAAATGAGCAAATATTCACAAGGCACAGAAAGCTATCAACAGTTTGCTATCCGTATCGCTGACATGCTTTTAGAGCCTGAAAAGTTTAGAGAGCTTTATCCAATCTTAGAAAAAGCAGGGTTTAAAGGTTAA

Many thanks again.

Koji Yahara

MOB-host range module error in mean transfer rate

When a plasmid matches a set of literature records in which some have transfer rates and others do not, any numerical operation on the list causes the program to fail. I have added numerical checking to mean, as well as a filter on literature_knowledge["TransferRate"] so that it only contains numerical values. This is addressed in commits 69b833d..85fdf75.

Traceback (most recent call last):
  File "/home/brovervv/miniconda3/envs/mob_suite/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 514, in <module>
    main()
  File "/home/brovervv/miniconda3/envs/mob_suite/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 395, in main
    literature_hr_report=host_range_literature_report_df)
  File "/home/brovervv/miniconda3/envs/mob_suite/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 612, in writeOutHostRangeReports
    literature_hr_report = collapseLiteratureReport(literature_hr_report) #collapse report
  File "/home/brovervv/miniconda3/envs/mob_suite/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 309, in collapseLiteratureReport
    collapsedlitdf.loc[0,field] = mean(df[df[field].isna() == False][field].values)
  File "/home/brovervv/miniconda3/envs/mob_suite/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 342, in mean
    return float(sum(numbers)) / max(len(numbers), 1)
TypeError: unsupported operand type(s) for +: 'float' and 'str'
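The failure comes from summing a list that mixes floats with placeholder strings. One way to guard against it is sketched below: keep only values that are numeric, or that parse as numbers, before averaging. This is an independent illustration of the idea, not the code from the commits mentioned above, and the helper names are hypothetical.

```python
import math
from numbers import Real


def numeric_only(values):
    """Keep values that are real numbers or strings that parse as finite-or-infinite floats."""
    kept = []
    for value in values:
        if isinstance(value, bool):
            continue  # booleans are technically ints, but not transfer rates
        if isinstance(value, Real):
            number = float(value)
        elif isinstance(value, str):
            try:
                number = float(value)
            except ValueError:
                continue  # placeholders such as "-" or "unknown"
        else:
            continue
        if not math.isnan(number):
            kept.append(number)
    return kept


def mean(values):
    """Mean of the numeric entries; NaN when there are none."""
    numbers = numeric_only(values)
    return sum(numbers) / len(numbers) if numbers else float("nan")


print(mean([1e-3, "-", 3e-3]))
```

Returning NaN for an empty list, instead of dividing by a forced minimum of 1, keeps "no data" distinguishable from a true rate of zero.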

run-typer not working

Thank you for your extremely useful software. I have run it on several genomes using mob_recon and --run_typer and successfully generated all the reports. However now the run_typer part seems to have stopped working. The individual reports are not generated, and the aggregated report is generated but only has a header line. It may be coincidence but this started after I ran mob-recon with a for loop and used multiple threads. I have uninstalled and re-installed.
I am calling;
mob_recon --infile file.fasta --outdir mob_plasmids_file --run_typer
Thanks

The plasmid replicon database of MOB-typer

We would like to refer to the nucleotide sequences contained in the plasmid replicon library of MOB-typer. Is there a good way to do that? Also, under what conditions (identity and coverage) does MOB-typer detect a target sequence?

taxadump location ?

Hello,

on out cluster, compute nodes does not have network acces to outside
after generating a singularity container with https://github.com/phac-nml/mob-suite/blob/master/mob_suite/singularity/recipe.singularity

we have the following problem.

2019-11-05 14:31:52,788 DEBUG: Found 79 records (with duplicates) in the reference database.(37 unique and 42 duplicated) [in /opt/miniconda/envs/py36/lib/python3.6/site-packages/mob_suite/mob_host_range.py:456]

NCBI database not present yet (first time used?)

Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...

Traceback (most recent call last):

[SNIP long traceback]

  File "/opt/miniconda/envs/py36/lib/python3.6/socket.py", line 713, in create_connection

    sock.connect(sa)

OSError: [Errno 101] Network is unreachable

I tried to dig into that problem, and I noticed that mob-init does not install taxdump.tar.gz.
Here is the content of the database directory:

module load singularity
singularity shell mob-suite-2.0.1.simg:~/eric> ls /opt/miniconda/envs/py36/lib/python3.6/site-packages/mob_suite/databases/
__init__.py				   mpf.proteins.faa		   rep.dna.fas
__pycache__				   ncbi_plasmid_full_seqs.fas	   repetitive.dna.fas
host_range_literature_plasmidDB.csv	   ncbi_plasmid_full_seqs.fas.msh  repetitive.dna.fas.nhr
host_range_ncbirefseq_plasmidDB.csv	   ncbi_plasmid_full_seqs.fas.nhr  repetitive.dna.fas.nin
literature_mined_plasmid_seq_db.fasta	   ncbi_plasmid_full_seqs.fas.nin  repetitive.dna.fas.nsq
literature_mined_plasmid_seq_db.fasta.msh  ncbi_plasmid_full_seqs.fas.nsq  status.txt
mob.proteins.faa			   orit.fas

While building the image, we see that taxdump.tar.gz is downloaded to /.
See:

Singularity mob-suite-2.0.1.simg:/> ls /taxdump.tar.gz 
/taxdump.tar.gz

What is the correct location for taxdump.tar.gz?
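
One possible workaround on nodes without network access is to build the taxonomy database from the local taxdump.tar.gz. The sketch below assumes that MOB-suite relies on ete3's NCBITaxa and that ete3 looks for its database at ~/.etetoolkit/taxa.sqlite by default. Both function names are hypothetical, and the path should be checked against the installed ete3 version.

```python
import os


def default_ete3_taxonomy_db():
    """Default ete3 taxonomy database path (assumed: ~/.etetoolkit/taxa.sqlite)."""
    return os.path.join(os.path.expanduser("~"), ".etetoolkit", "taxa.sqlite")


def build_taxonomy_db(taxdump_path, dbfile=None):
    """Build the ete3 taxonomy database from a local taxdump.tar.gz."""
    # Imported lazily so that this module loads even where ete3 is missing.
    from ete3 import NCBITaxa

    dbfile = dbfile or default_ete3_taxonomy_db()
    os.makedirs(os.path.dirname(dbfile), exist_ok=True)
    # Passing taxdump_file makes ete3 parse the local archive instead of
    # downloading it from NCBI.
    return NCBITaxa(dbfile=dbfile, taxdump_file=taxdump_path)
```

For the container above this would be called as build_taxonomy_db("/taxdump.tar.gz"), run once on a node where the target directory is writable.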

I have an issue regarding setup.py

"""
setup.py: Driving code for MLE function
Authors : mns
"""
from distutils.core import setup
from Cython.Build import cythonize
import numpy

setup(
name = "error_function",
ext_modules = cythonize("/content/drive/My Drive/try/LLSC-CNN/lsc-cnn-master/utils/mle_function/error_function.pyx/", include_path = [numpy.get_include()]),
)

error

ValueError Traceback (most recent call last)
in ()
9 setup(
10 name = "error_function",
---> 11 ext_modules = cythonize("/content/drive/My Drive/try/LLSC-CNN/lsc-cnn-master/utils/mle_function/error_function.pyx/", include_path = [numpy.get_include()]),
12 )
13

2 frames
/usr/local/lib/python3.6/dist-packages/Cython/Build/Dependencies.py in nonempty(it, error_msg)
112 yield value
113 if empty:
--> 114 raise ValueError(error_msg)
115
116

ValueError: '/content/drive/My Drive/try/LLSC-CNN/lsc-cnn-master/utils/mle_function/error_function.pyx/' doesn't match any files

Error in installing mob_suite-3.0.0 (either by conda or pip3) - pandas EmptyDataError issue

Dear developers,

Thank you for developing mob-suite and making it public.
I'm eager to use it.

However, I'm facing the following errors when installing
mob_suite-3.0.0 (either by conda or pip3).

Could you advise me on how to fix this?
Thank you very much in advance.

#######################################################################
Error by conda
resulting in "ImportError: cannot import name 'EmptyDataError' from 'pandas.io.common'",
although manual execution of "from pandas.io.common import EmptyDataError"
in the python 3 console doesn't cause any error.
#######################################################################

$ which conda
~/miniconda3/bin/conda

$ python --version
Python 3.7.3

$ conda create -c bioconda -n mob_suite mob_suite
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(700): An error occurred while installing package 'bioconda::mob_suite-3.0.0-py_1'.
Rolling back transaction: done

LinkError: post-link script failed for package bioconda::mob_suite-3.0.0-py_1
location of failed script: /home/k-yahara/miniconda3/envs/mob_suite/bin/.mob_suite-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout:
stderr: Traceback (most recent call last):
  File "/home/k-yahara/miniconda3/envs/mob_suite/bin/mob_init", line 7, in <module>
    from mob_suite.mob_init import main
  File "/home/k-yahara/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/mob_init.py", line 8, in <module>
    from mob_suite.blast import BlastRunner
  File "/home/k-yahara/miniconda3/envs/mob_suite/lib/python3.8/site-packages/mob_suite/blast/__init__.py", line 9, in <module>
    from pandas.io.common import EmptyDataError
ImportError: cannot import name 'EmptyDataError' from 'pandas.io.common' (/home/k-yahara/miniconda3/envs/mob_suite/lib/python3.8/site-packages/pandas/io/common.py)

return code: 1

#######################################################################
Error by pip
#######################################################################

$ which pip
~/miniconda3/bin/pip

$ pip install --user mob_suite

g++ -c -pipe -O2 -fno-exceptions -Wall -W -D_REENTRANT -fPIC -DPy_LIMITED_API=0x03040000 -DSIP_PROTECTED_IS_PUBLIC -Dprotected=public -DQT_NO_EXCEPTIONS -DQT_NO_DEBUG -DQT_PLUGIN -DQT_MULTIMEDIA_LIB -DQT_GUI_LIB -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -I. -I.. -I/home/k-yahara/miniconda3/include/python3.7m -I/home/k-yahara/miniconda2/include/qt -I/home/k-yahara/miniconda2/include/qt/QtMultimedia -I/home/k-yahara/miniconda2/include/qt/QtGui -I/home/k-yahara/miniconda2/include/qt/QtNetwork -I/home/k-yahara/miniconda2/include/qt/QtCore -I. -I/home/k-yahara/miniconda2/mkspecs/linux-g++ -o sipQtMultimediaQCustomAudioRoleControl.o sipQtMultimediaQCustomAudioRoleControl.cpp
/tmp/pip-install-1px41fwf/pyqt5/sip/QtMultimedia/qcustomaudiorolecontrol.sip:26:10: fatal error: qcustomaudiorolecontrol.h: No such file or directory
 #include <qcustomaudiorolecontrol.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:9092: sipQtMultimediaQCustomAudioRoleControl.o] Error 1
make[1]: Leaving directory '/tmp/tmpo14sga4w/QtMultimedia'
make: *** [Makefile:504: sub-QtMultimedia-make_first-ordered] Error 2
----------------------------------------

ERROR: Command "/home/k-yahara/miniconda3/bin/python /home/k-yahara/miniconda3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp5gks5ebc" failed with error code 1 in /tmp/pip-install-1px41fwf/pyqt5

#####################################
Koji Yahara
Group Leader
Antimicrobial Resistance Research Center
National Institute of Infectious Diseases
Aoba-cho 4-2-1, Higashimurayama-shi, Tokyo
189-0002 Japan

ERROR: Mob_typer return code 1

Hello!

I'm running mob_recon on a Salmonella incomplete assembly:

mob_recon --infile ~/suganda_data/assemblies/contigs/SRR1556083_contigs_renamed.fasta --outdir ~/suganda_data/mob_suite/test -d /home/johnsont/public/mob_suite_dbs --run_typer

Without the --run_typer flag, everything seems to run perfectly. With the --run_typer flag, I receive the following error:

Traceback (most recent call last):
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4410, in get_value
    return libindex.get_value_at(s, key)
  File "pandas/_libs/index.pyx", line 44, in pandas._libs.index.get_value_at
  File "pandas/_libs/index.pyx", line 45, in pandas._libs.index.get_value_at
  File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/panfs/roc/groups/1/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_typer.py", line 513, in <module>
    main()
  File "/panfs/roc/groups/1/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_typer.py", line 373, in main
    input_seq = args.infile )
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_host_range.py", line 89, in getLiteratureBasedHostRange
    repliconsearchdict = findHitsInLiteratureDBbyReplicon(replicon_names,plasmid_lit_db)
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_host_range.py", line 59, in findHitsInLiteratureDBbyReplicon
    db_hit_indices=[i for i in range(0, plasmid_lit_db.shape[0]) if plasmid_lit_db.iloc[i, :]["Replicon"] == replicon_name]
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_host_range.py", line 59, in <listcomp>
    db_hit_indices=[i for i in range(0, plasmid_lit_db.shape[0]) if plasmid_lit_db.iloc[i, :]["Replicon"] == replicon_name]
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/pandas/core/series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4418, in get_value
    raise e1
  File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4404, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Replicon'

2020-05-28 13:37:13,910 ERROR: Mob_typer return code 1 [in /home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_recon.py:193]
Traceback (most recent call last):
File "/home/johnsont/millere/.conda/envs/mob_suit/bin/mob_recon", line 10, in <module>
sys.exit(main())
File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_recon.py", line 835, in main
database_dir=database_dir))
File "/home/johnsont/millere/.conda/envs/mob_suit/lib/python3.7/site-packages/mob_suite/mob_recon.py", line 194, in run_mob_typer
raise Exception("MOB_typer could not type {}".format(plasmid_file_abs_path))
Exception: MOB_typer could not type /home/johnsont/millere/suganda_data/mob_suite/test/plasmid_653.fasta

I'm not sure what the issue is, but I'm wondering if it has something to do with my version of python? The anaconda version that I originally tried to use could not solve the MOB-suite conda environment (python3/3.7.1_anaconda) so I installed python version 3.7.6 in my MOB-suite env first and then the conda package:

# load python3/3.7.1_anaconda
module load python3 

# create MOB-suite env
conda create --name mob_suit
source activate mob_suit
conda install python==3.7.6
conda install -c bioconda mob_suite

Any suggestions would be very much appreciated!

Many thanks!
Liz

Decouple identification from reconstruction databases

MOB-typer and recon should be able to optionally take a mash sketch for determining closest neighbor for ID and cluster assignment purposes only. This would allow users to have custom databases for their own private collections without needing to validate the MOB-recon reconstruction workflow every time they want to add a new plasmid for ID purposes.

Updating replicon database

Hello,

I am trying to update the replicon database (rep.dna.fas) with a new file containing the merged replicon databases of PlasmidFinder and mob_suite. I recently read that there are some discrepancies between the two databases (https://is.gd/p95DMQ). However, when using mob_typer I get the following error for all novel entries in the database: "root ERROR: Specified column for biomarker type: 1 does not exist in biomarker id field:...".

Could you please help me with this issue?

Kind regards,
Ilias

Add sample label to outputs from MOB-typer and MOB-recon

Right now the file ID is all that is reported in either the contig report or the MOB-typer reports. This makes aggregating results irritating, since you need to add your sample label by some other means. Adding a new command line parameter --sample_name and having it added as a column to each of the reports would solve this hassle. In the same vein, we should offer a file name prefix flag for the FASTA files written by MOB-recon, so that users have control over the output naming. This is not a high priority improvement but should be combined with issue #30.
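
A rough sketch of how the proposed option could look. It assumes reports are handled as lists of dictionaries and that the file ID is used when no label is given. The column name sample_id and both function names are hypothetical, and --sample_name is only the flag proposed in this issue, not an existing option.

```python
import argparse


def parse_args(argv=None):
    """Parse a reduced set of options, including the proposed --sample_name."""
    parser = argparse.ArgumentParser(description="Sketch of a sample label option")
    parser.add_argument("--infile", required=True, help="Input FASTA file")
    # Proposed in this issue; hypothetical until implemented.
    parser.add_argument("--sample_name", default=None, help="Label added to every report row")
    return parser.parse_args(argv)


def add_sample_column(rows, sample_name, file_id):
    """Return copies of the report rows with a sample_id column appended."""
    # Fall back to the file ID so that existing behaviour is preserved.
    label = sample_name if sample_name else file_id
    return [dict(row, sample_id=label) for row in rows]
```

For example, parse_args(["--infile", "kleb.fa", "--sample_name", "S1"]) followed by add_sample_column(rows, args.sample_name, "kleb.fa") labels every row with S1.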

Fix issues with character encoding in non-UTF-8 environments

If LANG on the system is not UTF-8 and the output contains special characters, then mob_recon fails to read the mob_typer files with the following error: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 824: ordinal not in range(128)". This does not occur when LC_ALL=UTF-8 is explicitly set for the environment. All MOB-suite functions which read data from disk should check character encoding and handle these cases.
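
One common way to make file reads independent of the locale is to pass the encoding explicitly to open(). The sketch below assumes the report files are written as UTF-8; read_report is a hypothetical helper, not a MOB-suite function.

```python
import os
import tempfile


def read_report(path):
    """Read a text report as UTF-8 regardless of LANG or LC_ALL."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Demonstration: "é" is encoded as the bytes 0xc3 0xa9, so an ASCII decoder
# would fail on it in the same way as in the error message above.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("plasmid pCafé\n".encode("utf-8"))
    demo_path = tmp.name

demo_text = read_report(demo_path)
os.remove(demo_path)
```

Writers would need the same treatment (open(path, "w", encoding="utf-8")) so that both sides agree on the encoding.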

using own mob_cluster db error

Hello!
I generated a mob_cluster database of closed plasmids from two clinical E. coli strains using the mob_cluster function. During this process I did not encounter any errors.
When I wanted to check whether this database works with some of the assemblies that I used to generate it, I encountered the following error:

Traceback (most recent call last):
  File "/home/lneffe/miniconda3/envs/mob-suite/bin/mob_recon", line 8, in <module>
    sys.exit(main())
  File "/home/lneffe/miniconda3/envs/mob-suite/lib/python3.6/site-packages/mob_suite/mob_recon.py", line 580, in main
    (pcl_clusters,reference_sequence_hits,contig_hit_scores) = contig_blast_group(filtered_blast, min_overlapp)
ValueError: not enough values to unpack (expected 3, got 0)

What could be the reason for this?

Best wishes,
Lisa

Output mash tree or equivalent?

Hi,

It's nice that mob-suite tells me the plasmid in the reference database which is closest to each plasmid it identifies in my sample, but it would be good if it output a tree of mash distances so I can see how it relates to multiple plasmids in the database. Possibly within some mash distance threshold so that I don't end up with a tree with 12000 tips.

Or even just output the mash distance matrix of my plasmid vs everything in the database, so I can easily see how far it is from another plasmid of interest.

I can roll my own using mashtree, but others might find useful?

Just a thought, thanks for the nice tool.

Best,

Phil

only one instance will run at the same time

When submitting more than one instance of mob-recon, the following error is given:
ERROR:root:Error needed database missing "_conda/envs/[email protected]/lib/python3.6/site-packages/mob_suite/databases/repetitive.dna.fas.nin"

I think it's because when you run repetitive_blast it rebuilds the blast database every time, even when using a built-in database.
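
A sketch of the kind of guard that would avoid rebuilding an existing database. It assumes a nucleotide BLAST database is complete when the .nhr, .nin and .nsq files sit next to the FASTA file, as in the database directory listings elsewhere on this page. blast_db_is_built is a hypothetical helper, and large multi-volume databases would need a different check.

```python
import os
import tempfile

# Index files that makeblastn-style nucleotide databases are assumed to have.
BLAST_NUCL_EXTENSIONS = (".nhr", ".nin", ".nsq")


def blast_db_is_built(fasta_path):
    """Return True when every expected index file exists next to the FASTA."""
    return all(os.path.isfile(fasta_path + ext) for ext in BLAST_NUCL_EXTENSIONS)


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    fasta = os.path.join(workdir, "repetitive.dna.fas")
    open(fasta, "w").close()
    built_before = blast_db_is_built(fasta)  # no index files yet
    for ext in BLAST_NUCL_EXTENSIONS:
        open(fasta + ext, "w").close()
    built_after = blast_db_is_built(fasta)  # all three index files present
```

A caller would rebuild only when blast_db_is_built() returns False. Concurrent first runs could still race, so a lock file may also be needed.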

Installation in Galaxy failed

Hi, I was trying to install the mob-suite via Galaxy toolshed into my own Galaxy instance but I kept getting this error while conda is running.

ERROR conda.core.link:_execute(568): An error occurred while installing package 'bioconda::mob_suite-3.0.0-py_1'.
LinkError: post-link script failed for package bioconda::mob_suite-3.0.0-py_1
running your command again with -v will provide additional information
location of failed script: /home/phemarajata/galaxy/database/dependencies/_conda/envs/[email protected]/bin/.mob_suite-post-link.sh

I tried to resolve the dependencies using conda a few times, but was never successful.
Is there anything I should do differently? Thank you and stay safe.

plasmid sequence categorized as chromosome

Thank you for creating a great tool!
I am currently running some closed E. coli genomes through mob_recon. In the results, I noticed that one isolate had a closed plasmid sequence categorized as "chromosome" instead of "plasmid_x". When I look in the contigs in the fasta file, I see that the actual chromosome is closed, as well as the plasmid in question. The plasmid contig is 44425 bp long, while the chromosome is over 4 Mbp. I tried to understand what makes the program categorize the plasmid as part of the chromosome but couldn't really identify how mobsuite actually labels something as "chromosome"? A short description of why this is happening would be much appreciated!

No matching distribution found for setup.py

Hi there,

I'm running into some trouble with installing the mob-suite.

pip install setup.py 
Collecting setup.py
  Could not find a version that satisfies the requirement setup.py (from versions: )
No matching distribution found for setup.py

What am I doing wrong here?

Thanks!

mob_typer import error

I have issues running mob_typer. When I enter:
mob_typer --infile myassembly.fasta --outdir my_out_dir

The error message comes up as:

Traceback (most recent call last):
  File "/home/ana/anaconda3/bin/mob_typer", line 11, in <module>
    load_entry_point('mob-suite==2.0.0', 'console_scripts', 'mob_typer')()
  File "/home/ana/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/ana/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2843, in load_entry_point
    return ep.load()
  File "/home/ana/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2434, in load
    return self.resolve()
  File "/home/ana/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2440, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/ana/anaconda3/lib/python3.7/site-packages/mob_suite-2.0.0-py3.7.egg/mob_suite/mob_typer.py", line 28, in <module>
    from mob_suite.mob_host_range import getTaxonomyTree, getLiteratureBasedHostRange, loadliteratureplasmidDB, \
  File "/home/ana/anaconda3/lib/python3.7/site-packages/mob_suite-2.0.0-py3.7.egg/mob_suite/mob_host_range.py", line 9, in <module>
    from ete3 import NCBITaxa, TreeStyle
ImportError: cannot import name 'TreeStyle' from 'ete3' (/home/ana/anaconda3/lib/python3.7/site-packages/ete3-3.1.1-py3.7.egg/ete3/__init__.py)

I'm not sure what's particularly the issue and how to go about this.

Thanks,
Ana
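
A likely cause, stated as an assumption, is that ete3 only exposes TreeStyle when its Qt-based drawing dependencies can be imported. A minimal sketch of an import that tolerates their absence is shown below. can_render_trees is a hypothetical helper, and code that draws trees would need to check it first.

```python
try:
    # Available only when ete3 and its drawing dependencies import cleanly.
    from ete3 import TreeStyle
except ImportError:
    # Fall back to a sentinel so that the rest of the module still loads.
    TreeStyle = None


def can_render_trees():
    """Report whether tree rendering via TreeStyle is available."""
    return TreeStyle is not None
```

With this pattern, typing and host range prediction could still run on headless systems, with only the tree images skipped.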

mob_init: ImportError: No module named ete3

Just noticed this error in the middle/end of the mob_init process?

2019-11-05 12:30:36,543 INFO: Sketching complete plasmid database [in /home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/mob_suite/mob_init.py:215]
2019-11-05 12:30:52,922 INFO: Init ete3 library ... [in /home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/mob_suite/mob_init.py:227]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named ete3
2019-11-05 12:30:52,960 INFO: MOB init completed successfully [in /home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/mob_suite/mob_init.py:241]

I have ete3:

pip3up ete3
Requirement already up-to-date: ete3 in ./.linuxbrew/lib/python3.7/site-packages (3.1.1)

Bioconda Package - Allow Python <3.6

Would it be possible to modify the bioconda package to allow for python versions <3.6? I've got a few pipelines that are stuck on 3.5, so I have to manually install mob_suite instead of just grabbing everything from conda the way I'd like to.

I've attached a version of the meta.yaml (as meta.yaml.txt, since apparently github won't allow .yaml file to be attached to issues) that would allow for python >= 3.4 to work, instead of only python >=3.6.

Thanks!

meta.yaml.txt

Rerunning mob_init re-downloads the data

This is probably the desired behaviour, but perhaps the install instructions could tell people to run mob_init once, or mob_typer could check whether the data is present and, if not, tell me to download it by running mob_init?

mob_cluster own db matrix problem

Hi!
I am trying to construct a database out of my own plasmids using the command
mob_cluster --mode build --infile --outdir
Even when trying to run this on the original ncbi_plasmid_full_seqs.fas file I encountered an error.

2020-05-13 13:24:45,779 INFO: Running Mob-Suite Clustering toolkit v. 2.0.0 [in /home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/mob_cluster.py:225]
2020-05-13 13:24:45,779 INFO: Processing fasta file /vol/projects/MOBA/eco_bs_clin_iso/plasmid_typing_annotation/db_test/test_old_db_original/ncbi_plasmid_full_seqs.fas [in /home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/mob_cluster.py:226]
2020-05-13 13:24:45,779 INFO: Analysis directory /vol/projects/MOBA/eco_bs_clin_iso/plasmid_typing_annotation/db_test/test_old_db_original/ [in /home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/mob_cluster.py:227]
2020-05-13 13:55:28,055 INFO: b'' [in /home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/wrappers/__init__.py:51]
Traceback (most recent call last):
  File "/home/lneffe/miniconda3/bin/mob_cluster", line 10, in <module>
    sys.exit(main())
  File "/home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/mob_cluster.py", line 308, in main
    clust_assignments = build_cluster_db(distance_matrix_file, (0.05, 0.0001))
  File "/home/lneffe/miniconda3/lib/python3.7/site-packages/mob_suite/mob_cluster.py", line 81, in build_cluster_db
    distance_matrix = data.as_matrix()
  File "/home/lneffe/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'

I installed mob-suite with bioconda in a python3.7 environment.
Maybe there is something wrong with my installation?

Many thanks in advance!
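
DataFrame.as_matrix() was removed in pandas 1.0, which matches the AttributeError above. A minimal sketch of a version-tolerant conversion follows, assuming the caller only needs the underlying array. frame_to_matrix is a hypothetical helper, not part of MOB-suite or pandas.

```python
def frame_to_matrix(data):
    """Return the array behind a DataFrame-like object across pandas versions."""
    if hasattr(data, "to_numpy"):
        # Preferred accessor in recent pandas releases.
        return data.to_numpy()
    # Older pandas releases expose the same array through .values.
    return data.values
```

In build_cluster_db this would stand in for the data.as_matrix() call shown in the traceback.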

unrecognized arguments

I have an issue running mob_recon. Below is the error message:

usage: mob_recon [-h] [-d DATABASE_DIRECTORY]
mob_recon: error: unrecognized arguments: -o /hpc-home/zamudio/mob_cluster --run_typer -n 32 -i /hpc-home/zamudio/sample.fasta

Am I using the wrong flags for the input file and the output directory?
Many thanks
Roxana

How does mob-recon handle integrated plasmids?

Hi, I am trying to detect integrated plasmids in a single sequence (working with a test genome). I get a single hit in the contigs_report.txt file, but the fields for "contig_match_start" and "contig_match_end" are empty. Is there a way to get the start/end coords from mob-suite? The actual assemblies I will be working with will be more fragmented, but it would still be nice to have the coords...

Thanks in advance!

bash: mob_typer.py: command not found...

The docs say mob_typer.py, but it is installed as mob_typer.

See:

 entry_points={
        'console_scripts': [
            'mob_init=mob_suite.mob_init:main',
            'mob_recon=mob_suite.mob_recon:main',
            'mob_cluster=mob_suite.mob_cluster:main',
            'mob_typer=mob_suite.mob_typer:main',
            'best_blast_hits=mob_suite.blast_best_hits:main',
        ],
    },

from pandas.io.common import EmptyDataError

Aim: Update the code in /blast/__init__.py to make it more agnostic to the pandas version.
Issue: pandas versions > 1.0.5 do not support the statement "from pandas.io.common import EmptyDataError", as described in this reference.
Action: Update the code in /blast/__init__.py to the more general "from pandas.errors import EmptyDataError" for the next MOB-Suite release.
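
If both old and new pandas releases must be supported at once, one option is to try the import locations in order. The sketch below is an assumption about how that could be written. import_first and load_empty_data_error are hypothetical helpers, and a plain try/except around the two import statements achieves the same thing.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError):
            # Module or attribute missing in this version; try the next one.
            continue
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))


def load_empty_data_error():
    """Locate EmptyDataError in either the new or the old pandas location."""
    return import_first([
        ("pandas.errors", "EmptyDataError"),     # current location
        ("pandas.io.common", "EmptyDataError"),  # location used by older code
    ])
```

Calling load_empty_data_error() once at import time would give /blast/__init__.py a single name to catch, whichever pandas version is installed.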

append() got an unexpected keyword argument 'sort'

Hi,
First, thanks for a practical tool.

I've just reinstalled mobsuite in a fresh conda environment to try the new host range feature, but mob_typer now throws an error that seems to be linked to the pandas library:

Traceback (most recent call last):
  File "/mibi/users/jnesme/miniconda2/envs/mobsuite_env/bin/mob_typer", line 12, in <module>
    sys.exit(main())
  File "/mibi/users/jnesme/miniconda2/envs/mobsuite_env/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 497, in main
    main_report_mobtyper_df = main_report_mobtyper_df.append(pandas.DataFrame([main_report_data_dict]),sort=False)
TypeError: append() got an unexpected keyword argument 'sort'

Why does contig_id have file_id| before it?

file_id cluster_id      contig_id       contig_length   circularity_status      rep_type        rep>
kleb.fa 769     kleb.fa|2018-11623.74   1036    Incomplete                                      LT0>
kleb.fa 769     kleb.fa|2018-11623.46   3671    Incomplete                                      LT0>

The contig ID has the kleb.fa| before it. Is this necessary?
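
Until the report format changes, the prefix can be removed after the fact. A small sketch follows, assuming the prefix is always the file_id followed by a pipe character, as in the rows above. strip_file_prefix is a hypothetical helper.

```python
def strip_file_prefix(contig_id, file_id):
    """Remove a leading "<file_id>|" from a contig ID if it is present."""
    prefix = file_id + "|"
    if contig_id.startswith(prefix):
        return contig_id[len(prefix):]
    # Leave IDs without the prefix untouched, even if they contain "|".
    return contig_id
```

Matching on the full file_id rather than splitting on the first pipe keeps contig names that contain a pipe of their own intact.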

python call :: use sys.executable instead of python

There are some Python calls in the mob-suite code written as

p = Popen(['python', ...] )
os.system('python'.... )

I suggest you replace the literal python with sys.executable.
In some situations, users may have a python in their PATH that differs from the one the scripts were installed with.
NB: this is how we perform our installation on our cluster.

This may lead to a call to a different Python version, a Python with a different set of modules, and so on.

sys.executable ensures that the same interpreter is used.

Eric
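
The suggestion above can be sketched as follows. run_with_same_interpreter is a hypothetical helper, and the child command is only a demonstration.

```python
import subprocess
import sys


def run_with_same_interpreter(code):
    """Run a Python snippet in a child process using the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # instead of the literal 'python'
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# The child reports the same major.minor version as the parent process.
child_version = run_with_same_interpreter(
    "import sys; print(sys.version_info[:2])"
).strip()
```

Passing the command as a list also avoids the shell quoting issues that come with os.system.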

On some systems pycurl library might not install correctly via pip

We observed a strange pycurl installation issue on old Linux systems when installing via pip, as shown below.

% pip install "pycurl>=7.43.0"
Collecting pycurl>=7.43.0
  Using cached https://files.pythonhosted.org/packages/ac/b3/0f3979633b7890bab6098d84c84467030b807a1e2b31f5d30103af5a71ca/pycurl-7.43.0.3.tar.gz
    ERROR: Command errored out with exit status 1:
File "/opt/conda/envs/mobsuitepip/lib/python3.8/subprocess.py", line 1702, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'curl-config'

To resolve this issue on Ubuntu, install the following curl and OpenSSL development libraries and a gcc compiler. gcc version 6.3.0 worked well for me.

% apt install libcurl4-openssl-dev libssl-dev gcc
% pip install "pycurl>=7.43.0"
Successfully built pycurl
Installing collected packages: pycurl
Successfully installed pycurl-7.43.0.3

If you are on an HPC cluster, please install all dependencies in a separate, clean environment.

mob-suite :: ete3 import TreeStyle error

Hello, after installing mob-suite using conda as described in the README,
I get the following error:

(py36) mob-suite:~/tt> mob_typer --infile AB040415.fasta --outdir xx
Traceback (most recent call last):
  File "/opt/miniconda/envs/py36/bin/mob_typer", line 8, in <module>
    from mob_suite.mob_typer import main
  File "/opt/miniconda/envs/py36/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 28, in <module>
    from mob_suite.mob_host_range import getTaxonomyTree, getLiteratureBasedHostRange, loadliteratureplasmidDB, \
  File "/opt/miniconda/envs/py36/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 9, in <module>
    from ete3 import NCBITaxa, TreeStyle
ImportError: cannot import name 'TreeStyle'

regards

Eric

2.0.0 FileNotFound: lib/python3.7/site-packages/mob_suite/config.json

% pip3 install mob-suite

% pip3 show mob-suite

Name: mob-suite
Version: 2.0.0
Summary: mob_suite is a set of tools for finding, typing and reconstruction of plasmids from draft and complete genome assemblies.
Home-page: https://github.com/phac-nml/mob-suite
Author: James Robertson, Kyrylo Bessonov
Author-email: [email protected]
License: GPLv3
Location: /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages
Requires: pandas, pyqt5, pandas, biopython, pycurl, numpy, tables, scipy, ete3
Required-by:

% mob_typer --version

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/mob_typer", line 5, in <module>
    from mob_suite.mob_typer import main
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/mob_suite/mob_typer.py", line 9, in <module>
    import mob_suite.mob_init
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/mob_suite/mob_init.py", line 18, in <module>
    with open(config_path, 'r') as configfile:
FileNotFoundError: [Errno 2] No such file or directory: '/home/linuxbrew/.linuxbrew/lib/python3.7/site-packages/mob_suite/config.json'
