
magpy's Introduction

MAGpy

MAGpy (pronounced mag-pie) is a Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs).

Citation

Robert Stewart, Marc Auffret, Tim Snelling, Rainer Roehe, Mick Watson (2018) MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs). Bioinformatics, bty905.

Clean your MAGs

There are a few things you will need to do before you run MAGpy, and these are due to limitations imposed by the software MAGpy runs, rather than by MAGpy itself.

These are:

  • The names of contigs in your MAGs must be globally unique. Some assemblers, e.g. MEGAHIT, output very generic contig names such as "scaffold_22", which may be duplicated across your MAGs if you have assembled multiple samples. This is not allowed. Biopython and/or BioPerl can help you rename your contigs.
  • The MAG FASTA file names must start with a letter.
  • The MAG FASTA file names should not contain any "." characters other than the final "." before the file extension, e.g. mag1.faa is fine, mag.1.faa is not.
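If you would rather not pull in Biopython or BioPerl, the rules above can also be checked with a few lines of plain Python. The sketch below is a hypothetical helper, not part of MAGpy: it assumes your unprocessed MAGs sit in a separate directory (e.g. raw_mags), skips any file whose name breaks the naming rules, and prefixes every contig header with the MAG's file name minus its extension, so that "scaffold_22" from two different samples can no longer clash.

```python
import re
from pathlib import Path

# Hypothetical naming check based on the rules above: the name starts with a
# letter and contains exactly one "." (the one before the extension).
VALID_MAG_NAME = re.compile(r"^[A-Za-z][^.]*\.[^.]+$")


def is_valid_mag_filename(name: str) -> bool:
    """Return True if a MAG FASTA file name follows the rules listed above."""
    return VALID_MAG_NAME.match(name) is not None


def prefix_contig_names(fasta_text: str, mag_id: str) -> str:
    """Prefix every FASTA header with the MAG ID so contig names become globally unique."""
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # ">scaffold_22 flag=1" becomes ">mag1_scaffold_22 flag=1"
            lines.append(">" + mag_id + "_" + line[1:])
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


def clean_mag_dir(in_dir: str, out_dir: str) -> None:
    """Write a renamed copy of every correctly named MAG in in_dir to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).iterdir()):
        if not path.is_file():
            continue
        if not is_valid_mag_filename(path.name):
            print("skipping " + path.name + ": rename the file first")
            continue
        mag_id = path.name.split(".")[0]  # "mag1.fa" -> "mag1"
        (out_path / path.name).write_text(prefix_contig_names(path.read_text(), mag_id))
```

For example, clean_mag_dir("raw_mags", "mags") writes the renamed copies into mags. The prefixes are only unique if the MAG file names are, so two files such as bin1.fa and bin1.fna would still collide.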

NEW RELEASE - June 2021

  • updated to Sourmash 4.1.1
  • updated to PhyloPhlAn 3.0.2
  • updated to DIAMOND 2.0.9

Install conda

Skip if you already have it. Instructions are here

Clone the repo

git clone https://github.com/WatsonLab/MAGpy.git
cd MAGpy

Install Snakemake and mamba

Skip if you already have them

conda env create -f envs/install.yaml 
conda activate magpy_install

Run tests and install conda envs:

snakemake -rp -s MAGpy --cores 1 --use-conda test

Build the databases

This will build a DIAMOND database of the whole of UniProt TrEMBL, so you will need to give it a lot of resources (RAM); try 256 GB.

rm -rf magpy_dbs
snakemake -rp -s MAGpy --cores 16 --use-conda setup

Run MAGpy

snakemake -rp -s MAGpy --use-conda --cores 8

For large workflows, I recommend you use cluster or cloud execution.

Also, for any large number of MAGs, PhyloPhlAn will take a long time - e.g. a few weeks for a couple of thousand MAGs.

magpy's People

Contributors

cezar77, fmaguire, halexand, mw55309, wdecoster


magpy's Issues

SyntaxError in line 66

Hello,

I've been trying to install MAGpy using the 10-minute install instructions. I installed it in /data/metagenomics/MAGPy-attempt3/ and then created a new directory, /data/metagenomics/MAGpy_test/, into which I cloned the repo to test it. I changed the config.json file accordingly, to point to the MAGPy-attempt3 directory where all the installed files are located. When I tested using snakemake --use-conda -s MAGpy test I got the following error:

SyntaxError in line 66 of /data/metagenomics/MAGpy_test/MAGpy/MAGpy: Command must be given as string after the shell keyword. (MAGpy, line 66).

I tried the following:

  1. Editing the shell script to be a one-liner:
    echo -e "name\tnprots\tnhits\tnfull\tgenus\tngenus\tspecies\tnspecies\tavgpid" >> {output} && find diamond_report/ -name "bin*.tsv" | xargs -I {{}} cat {{}} >> {output}

However, then I got a syntax error:

"SyntaxError in line 65 of /data/metagenomics/MAGPy-attempt3/MAGpy/MAGpy: 
invalid syntax"
  2. Opening the file in Sublime Text, changing tabs to spaces and making sure the formatting lined up. This resulted in the following error:
"KeyError in line 19 of /data/metagenomics/MAGpy_test/MAGpy/MAGpy: 
'checkm_dataroot'
 File "/data/metagenomics/MAGpy_test/MAGpy/MAGpy", line 19, in <module>"

I verified that I had modified the config.json file to point to checkm_data. My config.json looks like:

{
    "phylophlan_dir": "/data/metagenomics/MAGPy-attempt3/nsegata-phylophlan-1d174e34b2ae",
    "uniprot_sprot": "/data/metagenomics/MAGPy-attempt3/uniprot_trembl",
    "sourmash_gbk": "/data/metagenomics/MAGPy-attempt3/genbank-d2-k31.sbt.json",
    "pfam_dir": "/data/metagenomics/MAGPy-attempt3/",
    "checkm_dataroot": "/data/metagenomics/MAGPy-attempt3/checkm_data"
}

I am using a Linux HPC cluster with Python 3.6.7.

Do you have any suggestions about what could be going wrong?

Many thanks
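A KeyError such as 'checkm_dataroot' means the Snakefile asked its config for a key that was not there. One way to narrow this down is to check the exact file you edited outside Snakemake. The sketch below is hypothetical: REQUIRED_KEYS is copied from the config.json in this issue and may not match every MAGpy version.

```python
import json
from typing import List

# Keys taken from the config.json shown in this issue. Whether a given MAGpy
# version reads exactly these keys is an assumption; check the Snakefile.
REQUIRED_KEYS = [
    "phylophlan_dir",
    "uniprot_sprot",
    "sourmash_gbk",
    "pfam_dir",
    "checkm_dataroot",
]


def missing_config_keys(config: dict) -> List[str]:
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def check_config_file(path: str = "config.json") -> List[str]:
    """Parse a JSON config file and report any missing keys."""
    with open(path) as handle:
        return missing_config_keys(json.load(handle))
```

Calling check_config_file with the path of the config.json you edited returns an empty list when every key is present and non-empty; a malformed file (for example, one with a trailing comma) raises json.JSONDecodeError instead.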

installation issue for sourmash

Hello,

I want to use MAGpy for my metagenomic data analysis but get an error at a very early stage. My commands are:
git clone https://github.com/WatsonLab/MAGpy.git
cd MAGpy
conda env create -f envs/install.yaml
This successfully creates a conda environment.

I activated the environment:
conda activate magpy_install

Then I ran the following command:
snakemake -rp -s MAGpy --cores 24 --use-conda test
But I get the following error:
Building DAG of jobs...
Creating conda environment envs/prodigal.yaml...
Downloading and installing remote packages.
Environment for envs/prodigal.yaml created (location: .snakemake/conda/55c7ff22b092c19217e5f2ec3f9e4209)
Creating conda environment envs/bioperl.yaml...
Downloading and installing remote packages.
Environment for envs/bioperl.yaml created (location: .snakemake/conda/a38998cd23e9ca04e9db80dcbfc6e82c)
Creating conda environment envs/pfam_scan.yaml...
Downloading and installing remote packages.
Environment for envs/pfam_scan.yaml created (location: .snakemake/conda/31f109d7845f53d8b16f172a493949bd)
Creating conda environment envs/phylophlan.yaml...
Downloading and installing remote packages.
Environment for envs/phylophlan.yaml created (location: .snakemake/conda/d1d3c5ec64b4c2330ba4a92faf2e0eb3)
Creating conda environment envs/diamond.yaml...
Downloading and installing remote packages.
Environment for envs/diamond.yaml created (location: .snakemake/conda/8435cfd2ff926a9883cafbdc8d09837d)
Creating conda environment envs/sourmash.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /scratch/gencore/ma5877/MAGs/MAGpy/envs/sourmash.yaml:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Encountered problems while solving:

  - package sourmash-4.1.1-hdfd78af_0 requires matplotlib-base, but none of the providers can be installed

Please help in solving this issue. Many thanks in advance!

install issues

Just trying to install and run this on our cluster. When I run the command: snakemake -rp -s MAGpy --cores 1 --use-conda test

I get the following error, but everything else seems to run fine:

scripts/add_tax.py ran with some errors: File "/network/rit/lab/andamlab/bin/MAGpy/scripts/add_tax.py", line 95
k = names[l]
TabError: inconsistent use of tabs and spaces in indentation

Furthermore, if I move to the next step and run: snakemake -rp -s MAGpy --cores 16 --use-conda setup

It finishes very quickly, and the log file says:

The flag 'directory' used in rule all is only valid for outputs, not inputs.
Building DAG of jobs...
Nothing to be done.

Am I missing something here with the install?
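The TabError reported for scripts/add_tax.py means the script mixes tab and space indentation in a way Python 3 refuses to interpret. The standard library can already flag such files with python -m tabnanny scripts/add_tax.py; as a minimal alternative, the hypothetical helper below groups the indented lines of a source file by the kind of whitespace they use.

```python
from typing import Dict, List


def indent_report(source: str) -> Dict[str, List[int]]:
    """Group 1-based line numbers by the kind of leading whitespace they use."""
    report: Dict[str, List[int]] = {"tabs": [], "spaces": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank or whitespace-only lines do not affect indentation
        indent = line[: len(line) - len(stripped)]
        if not indent:
            continue  # unindented line
        if "\t" in indent and " " in indent:
            report["mixed"].append(lineno)
        elif "\t" in indent:
            report["tabs"].append(lineno)
        else:
            report["spaces"].append(lineno)
    return report
```

A file with entries under both "tabs" and "spaces", or any entry under "mixed", is a candidate for this error; re-indenting those lines consistently should resolve it.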

Snakemake test not working!

Hi,

I've installed the conda version, and when running the test I am getting the following:

WorkflowError:
Failed to open /mnt/irisgpfs/projects/ecosystem_biology/local_tools/MAGpy/MAGpy.

I've checked folder permissions and everything looks alright.

Could you let me know how to fix this?

Thank you,
Susheel

ResolvePackageNotFound issue with pplacer

Hello,

I'm having some trouble with the install. Running the command below

conda env create -f MAGpy/envs/MAGpy-3.5.yaml

produces the following error:

Solving environment: failed

ResolvePackageNotFound: 
  - pplacer==1.1.alpha17=0

Is this a conda issue, or something up with the .yaml file?

error in magpy3 test

snakemake --use-conda -s MAGpy test
SyntaxError in line 66 of /MAGpy/MAGpy:
Command must be given as string after the shell keyword. (MAGpy, line 66)

Conda takes a long time to resolve environments

This is a new problem we have found with conda - the MAGpy-3.5 environment took 7 hours to solve on my PC yesterday (but only took ~ten minutes a few weeks ago).

Conda is really struggling. They are working on it.

I am working on resolving this issue, please be patient!

Argument list too long for rule `diamond_bin_summary`

Hello,

I am trying to run MAGpy with ~3000 MAGs. With this larger number of MAGs I have come upon this error in the rule diamond_bin_summary:

RuleException:
OSError in line 67 of /vortexfs1/omics/alexander/Alexander-MAGpy/MAGpy:
[Errno 7] Argument list too long: '/bin/bash'
  File "/vortexfs1/omics/alexander/Alexander-MAGpy/MAGpy", line 67, in __rule_diamond_bin_summary
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/subprocess.py", line 709, in __init__
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/subprocess.py", line 1344, in _execute_child
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

It would appear that the command concatenating the output from all the DIAMOND results is too long for my particular system. I wonder if breaking it into a for loop would make the pipeline more scalable.
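The for-loop idea can be sketched in Python: instead of passing every report path to one shell command, read the reports one by one and append them to the summary. This is a hypothetical replacement, not MAGpy's actual rule; the header is the one from the diamond_bin_summary command quoted in the first issue above, and the diamond_report directory and *.tsv pattern are assumptions.

```python
from pathlib import Path

# Column header copied from the diamond_bin_summary command quoted earlier.
HEADER = "name\tnprots\tnhits\tnfull\tgenus\tngenus\tspecies\tnspecies\tavgpid\n"


def summarise_reports(report_dir: str, output: str, pattern: str = "*.tsv") -> int:
    """Concatenate per-bin DIAMOND reports into one table; return how many were read.

    Files are opened one at a time inside Python, so no command line ever has to
    hold every file name, which is what triggers "Argument list too long".
    """
    out_path = Path(output).resolve()
    # Skip the summary itself in case it lives in report_dir and matches pattern.
    reports = [p for p in sorted(Path(report_dir).glob(pattern)) if p.resolve() != out_path]
    with open(out_path, "w") as out:
        out.write(HEADER)
        for report in reports:
            text = report.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep each bin's rows on their own lines
            out.write(text)
    return len(reports)
```

Something like summarise_reports("diamond_report", "diamond_bin_report.tsv") (the output name is hypothetical) could be called from a Snakemake run: block in place of the shell: command.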

Tests are written for Python 2.7

Hello!

I installed MAGpy and am excited to try it out! I am trying to run it with the MAGpy-3.5 env defined in envs. When I do this, I notice that snakemake -s MAGpy test doesn't work and throws errors, e.g.:

  File "test/scripts/test_phylophlan.py", line 17
    print phydir + " does not exist or isn't a directory"
               ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(phydir + " does not exist or isn't a directory")?

Generally speaking, is MAGpy better run with 2.7 than 3.5? If not, I have made a PR (#8) with a fix to the tests.

Thanks again for the great platform-- I am excited to work with it.
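The error itself comes from the Python 2 print statement. The hypothetical helper below mirrors the check quoted above (the real test script may differ) and writes to sys.stderr directly, which works unchanged under both Python 2.7 and 3.x; adding from __future__ import print_function at the top of a module is the other common way to get the same compatibility.

```python
import os
import sys


def check_phylophlan_dir(phydir):
    """Return True if phydir is an existing directory, otherwise report it on stderr."""
    if not os.path.isdir(phydir):
        # sys.stderr.write avoids the print statement, so this runs on 2.7 and 3.x
        sys.stderr.write(phydir + " does not exist or isn't a directory\n")
        return False
    return True
```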

checkm cannot overwrite the folder

In the rule checkm, the folder ./checkm is problematic: if you want to rerun the pipeline, CheckM fails because ./checkm is not empty. I added 'rm -r ./checkm' to the end of the rule to remove this folder after execution, allowing a rerun.
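An alternative to deleting ./checkm after the rule finishes is to clear it just before CheckM runs, which keeps the last run's intermediate files available for inspection until the next rerun. A minimal Python sketch, with a hypothetical function name:

```python
import shutil
from pathlib import Path


def reset_checkm_dir(path: str = "./checkm") -> bool:
    """Remove a leftover CheckM output directory; return True if one was removed."""
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)  # deletes the directory and everything inside it
        return True
    return False
```

In a Snakefile this could run at the top of a run: block; equivalently, the shell command could simply start with rm -rf ./checkm.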

Error in rule test_prodigal

Building DAG of jobs...
Creating conda environment envs/prodigal.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/prodigal.yaml created (location: .snakemake/conda/4edbab1558aaecb0ac56dcc0022a7d06)
Creating conda environment envs/ete3.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/ete3.yaml created (location: .snakemake/conda/504677b662d9b309a94e279cba8bc259)
Creating conda environment envs/pfam_scan.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/pfam_scan.yaml created (location: .snakemake/conda/f4814e8155a03c91d2d9cccb5c9f04af)
Creating conda environment envs/checkm.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/checkm.yaml created (location: .snakemake/conda/3711ed2a33f26ece83799275a394e167)
Creating conda environment envs/bioperl.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/bioperl.yaml created (location: .snakemake/conda/6c0f736163adb181cd1ef7fbe053b7bf)
Creating conda environment envs/phylophlan.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/phylophlan.yaml created (location: .snakemake/conda/e7a563cd3f7c0a978d8fb74f76cefc24)
Creating conda environment envs/diamond.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/diamond.yaml created (location: .snakemake/conda/75a6929f6d67262af24a40a431808705)
Creating conda environment envs/sourmash.yaml...
Downloading and installing remote packages.
Environment for /home/microviable/Programas/MAGpy/envs/sourmash.yaml created (location: .snakemake/conda/9f5d5c80ad78d231c30510ef6afd7269)
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                            count  min threads  max threads
-----------------------------  -----  -----------  -----------
test                               1            1            1
test_checkm                        1            1            1
test_checkm_plus                   1            1            1
test_diamond                       1            1            1
test_diamond_bin_summary_plus      1            1            1
test_diamond_report                1            1            1
test_pfam_scan                     1            1            1
test_phylophlan                    1            1            1
test_prodigal                      1            1            1
test_sourmash                      1            1            1
test_update_ete3                   1            1            1
total                             11            1            1

Select jobs to execute...

[Fri Apr 22 21:03:24 2022]
rule test_phylophlan:
output: test/outputs/phylophlan.txt
jobid: 10
reason: Missing output files: test/outputs/phylophlan.txt
resources: tmpdir=/tmp

	test/scripts/test_phylophlan.py test/outputs/phylophlan.txt

Activating conda environment: .snakemake/conda/e7a563cd3f7c0a978d8fb74f76cefc24
[Fri Apr 22 21:03:25 2022]
Finished job 10.
1 of 11 steps (9%) done
Select jobs to execute...

[Fri Apr 22 21:03:25 2022]
rule test_prodigal:
output: test/outputs/prodigal.txt
jobid: 8
reason: Missing output files: test/outputs/prodigal.txt
resources: tmpdir=/tmp

	test/scripts/test_prodigal.py test/outputs/prodigal.txt

Activating conda environment: .snakemake/conda/4edbab1558aaecb0ac56dcc0022a7d06
[Fri Apr 22 21:03:25 2022]
Error in rule test_prodigal:
jobid: 8
output: test/outputs/prodigal.txt
conda-env: /home/microviable/Programas/MAGpy/.snakemake/conda/4edbab1558aaecb0ac56dcc0022a7d06
shell:

	test/scripts/test_prodigal.py test/outputs/prodigal.txt
	
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-04-22T210037.540400.snakemake.log

WARNING: the sourmash compute command is DEPRECATED as of 4.0 and will be removed in 5.0.

Hi, Dear Sir!
I had some questions when I used this pipeline. The sourmash options seem to be wrong!
First question: sourmash runs, but "test_results.txt" prints
"Sourmash ran with some errors:
== This is sourmash version 4.1.1. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. =="
Why?
Next question: when I ran "snakemake -rp -s MAGpy --use-conda all --cores 16", I found this in the log file:
"** WARNING: the sourmash compute command is DEPRECATED as of 4.0 and
** will be removed in 5.0. Please see the 'sourmash sketch' command instead."
I don't know whether my results are correct or not.
Thanks!
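The deprecation warning is about the command name: sourmash 4.x still accepts sourmash compute but recommends sourmash sketch. To the best of my knowledge, a call such as sourmash compute --scaled 1000 -k 31 -o out.sig in.fa corresponds to sourmash sketch dna -p k=31,scaled=1000 -o out.sig in.fa; comparing the two signatures with sourmash sig describe is a cheap way to confirm this on your own data. The helper names below are hypothetical.

```python
import subprocess
from typing import List


def sketch_command(fasta: str, sig_out: str, ksize: int = 31, scaled: int = 1000) -> List[str]:
    """Build a `sourmash sketch dna` call meant to match `sourmash compute --scaled 1000 -k 31`."""
    params = "k={},scaled={}".format(ksize, scaled)
    return ["sourmash", "sketch", "dna", "-p", params, "-o", sig_out, fasta]


def run_sketch(fasta: str, sig_out: str) -> None:
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(sketch_command(fasta, sig_out), check=True)
```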

Error while running the test

Hi,

I installed MAGpy by following the described steps. However, when I tried running the test with the following command:
snakemake --use-conda -s MAGpy

I get the following error:

Activating conda environment: /scratch/gencore/ma5877/MAG/MAGpy/.snakemake/conda/f2a7e790
Activating conda environment: /scratch/gencore/ma5877/MAG/MAGpy/.snakemake/conda/0a6540a5
Activating conda environment: /scratch/gencore/ma5877/MAG/MAGpy/.snakemake/conda/3c5b4375
[Tue Nov 17 17:15:54 2020]
Error in rule sourmash_sig:
jobid: 3
output: sourmash/ecoli.sig
conda-env: /scratch/gencore/ma5877/MAG/MAGpy/.snakemake/conda/0a6540a5
shell:
sourmash compute --scaled 1000 -k 31 -o sourmash/ecoli.sig mags/ecoli.fa
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Tue Nov 17 17:15:54 2020]
Error in rule checkm:
jobid: 11
output: checkm.txt
conda-env: /scratch/gencore/ma5877/MAG/MAGpy/.snakemake/conda/f2a7e790
shell:

            checkm_db=/scratch/gencore/ma5877/MAG/checkm_data
            echo ${checkm_db} | checkm data setRoot ${checkm_db}
            checkm lineage_wf -f checkm.txt --reduced_tree -t 16 -x fa mags ./checkm

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Tue Nov 17 17:17:25 2020]
Finished job 8.
1 of 13 steps (8%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Please help me to fix this issue. Thanks

miniconda permission/version error

Hi there,

I managed to follow the 10-minute install and complete it, but when I test the install with snakemake --use-conda -s MAGpy test I get the following:

$ snakemake --use-conda -s MAGpy test
Building DAG of jobs...
Creating conda environment envs/ete3.yaml...
Downloading remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /media/andre/B2F8C9A0F8C962E9/SGG_metagenomes/data/MAGpy/envs/ete3.yaml:
Fetching package metadata ..........An unexpected error has occurred.
Please consider posting the following information to the
conda GitHub issue tracker at:

    https://github.com/conda/conda/issues



Current conda install:

               platform : linux-64
          conda version : 4.3.30
       conda is private : False
      conda-env version : 4.3.30
    conda-build version : not installed
         python version : 2.7.13.final.0
       requests version : 2.12.4
       root environment : /home/andre/miniconda2  (read only)
    default environment : /home/andre/miniconda3/envs/magpy_install
       envs directories : /home/andre/.conda/envs
                          /home/andre/miniconda2/envs
          package cache : /home/andre/miniconda2/pkgs
                          /home/andre/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/main/linux-64
                          https://repo.continuum.io/pkgs/main/noarch
                          https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : None
             netrc file : None
           offline mode : False
             user-agent : conda/4.3.30 requests/2.12.4 CPython/2.7.13 Linux/4.4.6-040406-generic debian/jessie/sid glibc/2.19    
                UID:GID : 1000:1000

`$ /home/andre/miniconda2/bin/conda-env create --file /media/andre/B2F8C9A0F8C962E9/SGG_metagenomes/data/MAGpy/.snakemake/conda/d1cdcc3a.yaml --prefix /media/andre/B2F8C9A0F8C962E9/SGG_metagenomes/data/MAGpy/.snakemake/conda/d1cdcc3a`




    Traceback (most recent call last):
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/exceptions.py", line 640, in conda_exception_handler
        return_value = func(*args, **kwargs)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda_env/cli/main_create.py", line 108, in execute
        installer.install(prefix, pkg_specs, args, env)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda_env/installers/conda.py", line 29, in install
        prefix=prefix)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/index.py", line 101, in get_index
        index = fetch_index(channel_priority_map, use_cache=use_cache)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/index.py", line 120, in fetch_index
        repodatas = collect_all_repodata(use_cache, tasks)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/repodata.py", line 75, in collect_all_repodata
        repodatas = _collect_repodatas_serial(use_cache, tasks)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/repodata.py", line 485, in _collect_repodatas_serial
        for url, schan, pri in tasks]
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/repodata.py", line 115, in func
        res = f(*args, **kwargs)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/core/repodata.py", line 467, in fetch_repodata
        touch(cache_path)
      File "/home/andre/miniconda2/lib/python2.7/site-packages/conda/gateways/disk/update.py", line 64, in touch
        utime(path, None)
    OSError: [Errno 13] Permission denied: '/home/andre/miniconda2/pkgs/cache/9ca791dd.json'

So not only is it unable to get permission to certain folders, it's also weirdly calling my old miniconda2 install during this. Any help?

error while running pfam

Hi,
I got an error while running MAGpy on my MAGs.

[Mon Nov 18 12:52:21 2019]
Finished job 109.
173 of 373 steps (46%) done
Activating conda environment: /mibi/users/jsb562/MAGpy/.snakemake/conda/b0d3265a
FATAL: can't find "active_site.dat" in "/mibi/users/jsb562/MAGpy/pfam" at /mibi/users/jsb562/MAGpy/.snakemake/conda/b0d3265a/share/pfam_scan-1.6-3/pfam_scan.pl line 92.
[Mon Nov 18 12:52:25 2019]
Error in rule pfam_scan:
jobid: 36
output: pfam/18097D-02-03_bin.6.pfam
conda-env: /mibi/users/jsb562/MAGpy/.snakemake/conda/b0d3265a
shell:
pfam_scan.pl -outfile pfam/18097D-02-03_bin.6.pfam -as -cpu 8 -fasta proteins/18097D-02-03_bin.6.faa -dir /mibi/users/jsb562/MAGpy/pfam
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

found less than 4.0 kbp in common. => exiting
[Mon Nov 18 12:52:30 2019]
Finished job 116.
174 of 373 steps (47%) done
found less than 4.0 kbp in common. => exiting
[Mon Nov 18 12:52:32 2019]
Finished job 100.
[Mon Nov 18 12:53:07 2019]
Finished job 118.
183 of 373 steps (49%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mibi/users/jsb562/MAGpy/.snakemake/log/2019-11-18T123925.474123.snakemake.log

Any suggestions?
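The FATAL line says pfam_scan.pl looked for active_site.dat in the directory given with -dir and did not find it. A quick way to see what that directory is missing is sketched below; active_site.dat comes from the error itself, while Pfam-A.hmm and Pfam-A.hmm.dat are assumed to be the other files pfam_scan.pl expects there (the hmmpress index files are not checked).

```python
from pathlib import Path
from typing import List

# active_site.dat is named in the error above; the other two entries are an
# assumption about what pfam_scan.pl also reads from its -dir directory.
PFAM_FILES = ["Pfam-A.hmm", "Pfam-A.hmm.dat", "active_site.dat"]


def missing_pfam_files(pfam_dir: str) -> List[str]:
    """Return the expected Pfam data files that are absent from pfam_dir."""
    base = Path(pfam_dir)
    return [name for name in PFAM_FILES if not (base / name).is_file()]
```

For the log above, missing_pfam_files("/mibi/users/jsb562/MAGpy/pfam") would list which of these files still need to be placed in that directory.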

Error in Run tests and install conda envs

Hi, I tried to run the new version of MAGpy, and I got an error while running the tests and installing the conda envs (the previous step, "conda env create -f envs/install.yaml", worked correctly).

❯ snakemake -rp -s MAGpy --cores 1 --use-conda test
Building DAG of jobs...
Creating conda environment envs/checkm.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/t640/user/xiaochen/software/MAGpy/envs/checkm.yaml:

#>>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

Traceback (most recent call last):
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/conda/exceptions.py", line 1080, in __call__
    return func(*args, **kwargs)
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/conda_env/cli/main.py", line 80, in do_call
    exit_code = getattr(module, func_name)(args, parser)
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/conda_env/cli/main_create.py", line 141, in execute
    result[installer_type] = installer.install(prefix, pkg_specs, args, env)
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/mamba/mamba_env.py", line 45, in mamba_install
    index = load_channels(pool, channel_urls, repos, prepend=False)
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/mamba/utils.py", line 122, in load_channels
    index = get_index(
  File "/home/t640/user/xiaochen/miniconda3/envs/magpy_install/lib/python3.10/site-packages/mamba/utils.py", line 103, in get_index
    is_downloaded = dlist.download(True)
RuntimeError: Multi-download failed.

$ /home/t640/user/xiaochen/miniconda3/envs/magpy_install/bin/mamba create --quiet --file /home/t640/user/xiaochen/software/MAGpy/.snakemake/conda/80bd1d816c0e92c259e9c740d841735f.yaml --prefix /home/t640/user/xiaochen/software/MAGpy/.snakemake/conda/80bd1d816c0e92c259e9c740d841735f

How can I solve this problem? Thank you!

building Database

Build the databases

This will build a DIAMOND database of the whole of UniProt TREMBL, so you will need to give it a lot of resources (RAM) - try 256Gb.

rm -rf magpy_dbs
snakemake -rp -s MAGpy --cores 16 --use-conda setup

Is it possible to download a prebuilt DIAMOND database of UniProt, given the high resources needed to build it locally?

Recommended hardware requirements

Hi there,

MAGpy seems like an interesting tool that I would like to use, but I was wondering if you could add any information regarding system requirements. I understand resource requirements for this pipeline will vary greatly with the amount of data being processed, but could you recommend a minimum and an ideal number of cores, amount of RAM and storage space to successfully run MAGpy on (for example) 100 MAGs?

Best regards,

Sam

phylophlan.py: error: unrecognized arguments: -u {params.unn}

phylophlan.log
phylophlan_help.txt

I am testing MAGpy with conda env and getting "phylophlan.py: error: unrecognized arguments: -u {params.unn}".

The given Bitbucket link was broken (https://bitbucket.org/nsegata/phylophlan), so I tried the GitHub repo and the Bioconda package for PhyloPhlAn instead.
Both are the same version and give the same error. I have attached the command help and the error output log.
Please check and suggest the correct versions to fix the issue.

phylophlan {PhyloPhlAn version 3.0.60 (27 November 2020)}

Git Repo : (https://github.com/biobakery/phylophlan)
(conda install -c bioconda phylophlan).

Starting up MAGpy

I'm sure this is an issue with me having never used snakemake before, but here goes:

I followed the steps on the github readme (test and building diamond database all worked perfectly), and am now trying to start the actual MAGpy process using the command:

snakemake -rp -s MAGpy --use-conda MAGpy --cores 8

My MAGs are in the mags directory in the cloned github repo.

The process fails with below message:

Building DAG of jobs... MissingRuleException: No rule to produce MAGpy (if you use input functions make sure that they don't raise unexpected exceptions).

Could you point me in the right direction?
Thank you!

phylophlan no longer available

Hello,

following the "10 minute install" - hg clone https://bitbucket.org/nsegata/phylophlan does not work as Bitbucket no longer supports Mercurial repositories.

I tried installing phylophlan with conda install phylophlan, pip install phylophlan, and using the GitHub repository but I still receive the following error:

Activating conda environment: /home/mojarro/MAGpy/.snakemake/conda/3ff88c87
mkdir: cannot create directory ‘./phylophlan/input/205367’: No such file or directory
cp: target './phylophlan/input/205367' is not a directory
/bin/bash: line 3: cd: ./phylophlan: No such file or directory
/bin/bash: line 4: ./phylophlan.py: No such file or directory
mv: cannot stat './output/205367/*.nwk': No such file or directory
mv: cannot stat './output/205367/*.xml': No such file or directory
/bin/bash: line 8: ./phylophlan.py: No such file or directory
mv: cannot stat './output/205367/*.nwk': No such file or directory
mv: cannot stat './output/205367/*.txt': No such file or directory
Waiting at most 5 seconds for missing files.
MissingOutputException in line 109 of /home/mojarro/MAGpy/MAGpy:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
tree/MAGpy/MAGpy.tree.nwk
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
  File "/home/mojarro/Documents/MiniConda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/home/mojarro/Documents/MiniConda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 231, in handle_job_success
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/mojarro/MAGpy/.snakemake/log/2020-08-20T114738.178383.snakemake.log

Any suggestions?

installing magpy

Hi,

MAGpy seems to be the perfect tool for my metagenomic data analysis, but I am unable to even install it. The manual says that after installing conda I should do the following:
conda env create -f envs/install.yaml

However, the install.yaml file doesn't contain all the tools needed to run the analysis, hence I get the error. My install.yaml file looks like this:
name: magpy_install
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - snakemake
  - mamba

Another thing: the MAGpy/envs folder contains several files: basic2.yaml, basic3.yaml, bioperl.yaml, checkm.yaml, diamond.yaml, ete3.yaml, install.yaml, MAGpy-2.7.yaml, MAGpy-3.5.yaml, pfam_scan.yaml, phylophlan.yaml, prodigal.yaml, setup.yaml, sourmash.yaml

Should I install all of them? Is it possible to have them together in one file so I could install everything in one run?

Please let me know if I am missing something. Many thanks!

Seems promising but??

The pipeline seems promising since it uses several tools. Is it possible to get examples or a tutorial showing its most relevant functionality?

Thanks

SLURM submission?

Hi there,

Has anyone ever tried running MAGpy in a SLURM environment? I'm getting a lot of "Error: Snakefile "Snakefile" not present." messages and wondering if it's just me not knowing how to submit this, or something else.

Last command I tried: $ snakemake --use-conda --cluster-config MAGpy.json --cluster "sbatch -n {core} -t {time} --mem={vmem} -P {proj} -D /scratch/a.ans74/MAGpy/" --jobs 100.

All tests for the installation run well btw.

Cheerio!

error in test_checkm.py

Hi, I tried to run the new version of MAGpy, and I got an error while running the tests.
At the test_checkm.py step, the terminal stalled and nothing happened.
All the other test steps ran fine.

❯ snakemake -rp -s MAGpy --cores 120 --use-conda test
The flag 'directory' used in rule all is only valid for outputs, not inputs.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 120
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 test
1 test_checkm
2
Select jobs to execute...

[Thu Jun 3 20:08:44 2021]
rule test_checkm:
output: test/outputs/checkm.txt
jobid: 2
reason: Missing output files: test/outputs/checkm.txt

	test/scripts/test_checkm.py test/outputs/checkm.txt	

Activating conda environment: /home/yb/MAGpy/.snakemake/conda/193fb18dbd5d570d4fcebaa03389205a

How can I solve this problem? Thank you!

usearch error

I am at the first step of the MAGpy installation. I downloaded the usearch file and then tried to change its mode to make it executable, but I get the following error:
chmod: changing permissions of ‘usearch’: Operation not permitted

How can we fix that? Also, can I complete the analysis without usearch installed?

Thanks,

The incompatible packages with each other

Hello,

I installed MAGpy following the 10-minute installation, and at the last step it showed:

Collecting package metadata (repodata.json): done
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package libgfortran-ng conflicts for:
scipy -> libgfortran-ng[version='>=7,<8.0a0|>=7.2.0,<8.0a0']
Package python conflicts for:
mercurial=4.5 -> python[version='2.7.*|>=2.7,<2.8.0a0']
scipy -> python[version='2.7.*|3.4.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']
graphlan=1.0.0 -> python=2.7
Package libgcc conflicts for:
fasttree=2.1.10 -> libgcc
muscle=3.8.1551 -> libgcc
scipy -> libgcc
Package libgcc-ng conflicts for:
muscle=3.8.1551 -> libgcc-ng[version='>=7.3.0']
fasttree=2.1.10 -> libgcc-ng[version='>=4.9|>=7.3.0']
scipy -> libgcc-ng[version='>=4.9|>=7.2.0|>=7.3.0']
python=2.7 -> libgcc-ng[version='>=4.9|>=7.2.0|>=7.3.0']
mercurial=4.5 -> libgcc-ng[version='>=7.2.0']
Package liblapack conflicts for:
scipy -> liblapack[version='>=3.8.0,<3.9.0a0']
Package biopython conflicts for:
graphlan=1.0.0 -> biopython=1.66
Package libstdcxx-ng conflicts for:
muscle=3.8.1551 -> libstdcxx-ng[version='>=4.9|>=7.3.0']
scipy -> libstdcxx-ng[version='>=4.9|>=7.2.0|>=7.3.0']
python=2.7 -> libstdcxx-ng[version='>=4.9|>=7.2.0|>=7.3.0']
Package * conflicts for:
scipy -> [track_features=blas_openblas]
Package pip conflicts for:
python=2.7 -> pip
Package mkl conflicts for:
scipy -> mkl[version='>=2018.0.0,<2019.0a0|>=2018.0.2,<2019.0a0|>=2018.0.3,<2019.0a0|>=2019.1,<2020.0a0|>=2019.4,<2020.0a0']
Package libffi conflicts for:
python=2.7 -> libffi[version='3.2.*|>=3.2.1,<3.3.0a0|>=3.2.1,<4.0a0']
Package sqlite conflicts for:
python=2.7 -> sqlite[version='3.13.*|3.20.*|>=3.20.1,<4.0a0|>=3.22.0,<4.0a0|>=3.23.1,<4.0a0|>=3.24.0,<4.0a0|>=3.25.2,<4.0a0|>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0']
Package readline conflicts for:
python=2.7 -> readline[version='6.2.*|7.*|7.0.*|7.0|>=7.0,<8.0a0|>=8.0,<9.0a0']
Package mkl-service conflicts for:
scipy -> mkl-service[version='>=2,<3.0a0']
Package openssl conflicts for:
python=2.7 -> openssl[version='1.0.*|1.0.*,>=1.0.2l,<1.0.3a|>=1.0.2m,<1.0.3a|>=1.0.2n,<1.0.3a|>=1.0.2o,<1.0.3a|>=1.0.2p,<1.0.3a|>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a']
Package numpy conflicts for:
scipy -> numpy[version='1.10.*|1.11.*|1.12.*|1.13.*|>=1.11|>=1.11.3,<2.0a0|>=1.14.6,<2.0a0|>=1.15.1,<2.0a0|>=1.9|>=1.9.3,<2.0a0']
Package ncurses conflicts for:
python=2.7 -> ncurses[version='5.9.*|6.0.*|>=6.0,<7.0a0|>=6.1,<6.2.0a0|>=6.1,<7.0a0']
Package tk conflicts for:
python=2.7 -> tk[version='8.5.*|8.6.*|>=8.6.7,<8.7.0a0|>=8.6.8,<8.7.0a0|>=8.6.9,<8.7.0a0']
Package matplotlib conflicts for:
graphlan=1.0.0 -> matplotlib[version='1.4.*|1.5.*']
Package zlib conflicts for:
python=2.7 -> zlib[version='1.2.*|1.2.11.*|1.2.11|1.2.8|>=1.2.11,<1.3.0a0']
Package ca-certificates conflicts for:
python=2.7 -> ca-certificates
Package bzip2 conflicts for:
python=2.7 -> bzip2[version='>=1.0.6,<2.0a0']
Package libgfortran conflicts for:
scipy -> libgfortran[version='>=3.0']
Package libcblas conflicts for:
scipy -> libcblas[version='>=3.8.0,<4.0a0']
Package libblas conflicts for:
scipy -> libblas[version='>=3.8.0,<4.0a0']
Package libopenblas conflicts for:
scipy -> libopenblas[version='>=0.2.20,<0.2.21.0a0|>=0.3.2,<0.3.3.0a0|>=0.3.3,<1.0a0']
Package blas conflicts for:
scipy -> blas[version='||1.0|1.1',build='openblas|openblas|openblas|mkl|mkl']
Package openblas conflicts for:
scipy -> openblas[version='0.2.18.*|0.2.18|0.2.18.*|0.2.19|0.2.19.*|0.2.20|0.2.20.*|>=0.2.20,<0.2.21.0a0|>=0.3.3,<0.3.4.0a0']
Package pygments conflicts for:
mercurial=4.5 -> pygments

Could you please tell me how to solve the problems?

Best regards,

Sarah

Error during "Looking for PhyloPhlAn proteins in input faa files"

Hello,
I was running the pipeline, using the command

(magpy_install) aparada@kuat:~/MAGpy3$ snakemake --use-conda -s MAGpy --jobs 20

but got the following error message.

Activating conda environment: /media/acclomator/aparada/MAGpy3/.snakemake/conda/30119d31
Looking for PhyloPhlAn proteins in input faa files
Starting data/73535/IMG_2524023116.b6o...
Starting data/73535/IMG_2634166733.b6o...
Starting data/73535/Bin_14_10-contigsRN.b6o...
Starting data/73535/IMG_2634166754.b6o...
Starting data/73535/IMG_2731639251.b6o...
Starting data/73535/CP003842.b6o...
Starting data/73535/Bin_14_19-contigsRN.b6o...
Starting data/73535/IMG_2663763570.b6o...
Starting data/73535/IMG_2731639231.b6o...
Starting data/73535/IMG_2524023110.b6o...
Starting data/73535/CP011070.b6o...
Starting data/73535/IMG_2731639250.b6o...
Starting data/73535/CP007026.b6o...
Starting data/73535/IMG_2524023097.b6o...
Starting data/73535/IMG_2524023101.b6o...
Starting data/73535/IMG_2740891960.b6o...


Invalid command line
Unknown option wdb



Invalid command line
Unknown option wdb

Starting data/73535/IMG_2657245302.b6o...


Invalid command line
Unknown option wdb

This pattern of "Starting data/..." followed by "Invalid command line / Unknown option wdb" repeats until the pipeline exits with the following message:


All usearch runs performed!
Traceback (most recent call last):
  File "./phylophlan.py", line 792, in <module>
    faa2ppafaa( inps, pars['nproc'], projn )
  File "./phylophlan.py", line 291, in faa2ppafaa
    for l in open(dat_fol+i+".b6o").readlines()):
IOError: [Errno 2] No such file or directory: 'data/73535/IMG_2524023116.b6o'
mv: cannot stat './output/73535/*.nwk': No such file or directory
mv: cannot stat './output/73535/*.txt': No such file or directory
Waiting at most 5 seconds for missing files.
MissingOutputException in line 109 of /media/acclomator/aparada/MAGpy3/MAGpy:
Missing files after 5 seconds:
tree/MAGpy/MAGpy.tree.nwk
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/acclomator/aparada/MAGpy3/.snakemake/log/2020-08-24T150512.835624.snakemake.log

Any suggestions on how I can correct this problem?
I've also attached the log file.
2020-08-24T150512.835624.snakemake.log

Thanks in advance!
Alma

Question about databases

Hello,

I have been testing your tool on my bacterial MAGs, but most of the DIAMOND hits are assigned to archaea. I set up the databases following the 10-minute install guide. Do you have any idea why this could be happening? I am new to your work and don't know where to start troubleshooting.

Thank you and happy new year!

problems updating ete3 database

Hello,
I am running into the problem below when trying to update the ete3 database. Any suggestions on how to fix this problem?
Best,
Alma

$ python MAGpy/scripts/update_ete3.py

Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Loading node names...
2268733 names loaded.
225755 synonyms loaded.
Loading nodes...
2268733 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/users/aparada/.etetoolkit/taxa.sqlite ...
2268000 generating entries...
Uploading to /home/users/aparada/.etetoolkit/taxa.sqlite

Inserting synonyms: 100000
Traceback (most recent call last):
  File "MAGpy/scripts/update_ete3.py", line 5, in <module>
    ncbi.update_taxonomy_database()
  File "/home/users/aparada/miniconda3/envs/magpy_install/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
    update_db(self.dbfile)
  File "/home/users/aparada/miniconda3/envs/magpy_install/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
    upload_data(dbfile)
  File "/home/users/aparada/miniconda3/envs/magpy_install/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid
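The last line means SQLite rejected a row whose (spname, taxid) pair was already present in the synonym table. The minimal sketch below reproduces that error with Python's built-in sqlite3 module on an in-memory table. The schema, in particular COLLATE NOCASE on spname (which makes names that differ only in letter case collide), is an assumption about how ete3 declares the table; the sketch only illustrates what the error means and is not a fix for ete3.

```python
import sqlite3

# In-memory table shaped like the one named in the error message. The schema,
# and especially COLLATE NOCASE on spname, is an assumption about ete3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)


def insert_synonym(taxid, spname):
    """Insert one row; return SQLite's error message, or None on success."""
    try:
        conn.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
    except sqlite3.IntegrityError as err:
        return str(err)
    return None


# Made-up example rows, not real NCBI data.
print(insert_synonym(1, "Example synonym"))  # accepted
print(insert_synonym(1, "example SYNONYM"))  # same pair once case is ignored: rejected
```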

workflow reruns again and again

Hi, here is a toy workflow I wrote (test.smk):

rule all:
    input:
        "/home/wangjw/data/work/flask/b.txt"

rule copy1:
    input:
        file = directory("/home/wangjw/data/work/flask")
    output:
        file = "data1/a.txt"
    shell:
        "cp {input.file}/a.txt {output.file}"

rule copy2:
    input:
        file = "data1/a.txt"
    output:
        file = "/home/wangjw/data/work/flask/b.txt"
    shell:
        "cp {input.file} {output.file}"

Then I run snakemake -j 1 -s test.smk. Every time I run that command again, the whole workflow reruns, even though all the files have already been created.
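One plausible explanation, offered as an assumption rather than a confirmed diagnosis: copy2 writes b.txt into /home/wangjw/data/work/flask, the same directory that copy1 declares as its input, and on POSIX filesystems creating a file inside a directory updates that directory's modification time. Since Snakemake decides whether to rerun a job partly by comparing modification times, the directory would then look newer than data1/a.txt after every run. The standard-library sketch below demonstrates only that filesystem behavior; it does not call Snakemake.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as flask_dir:
    # Backdate the directory so any later change is easy to see (epoch 0).
    os.utime(flask_dir, (0, 0))
    before = os.stat(flask_dir).st_mtime

    # Stand-in for rule copy2 writing b.txt into the directory copy1 reads from.
    with open(os.path.join(flask_dir, "b.txt"), "w") as fh:
        fh.write("copied\n")
    after = os.stat(flask_dir).st_mtime

print(f"directory mtime before: {before}, after: {after}")
print("directory looks newer:", after > before)
```

If this is the cause, writing copy2's output somewhere other than the directory that copy1 reads from should break the loop.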

phylophlan 3

Hi,

I'm trying to run MAGpy, but I suspect the PhyloPhlAn setup is based on version 2, which is no longer available, rather than the newest version, 3.
Any idea how to modify the MAGpy file to run the new version?

Thanks

python vs python3

Thank you for the development of MAGpy, it is a great tool.

Using the pipeline with the rule-specific conda environments (--use-conda, following your instructions), I had to adapt the Python call in a few rules. For some reason the 'python' symlink was missing, so 'python3' had to be called directly. Rather than changing the scripts' shebang lines, I solved the issue by invoking the scripts explicitly with python3 in the rules, e.g.:
$ python3 script.py
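The workaround of naming the interpreter explicitly could be made a little more robust by looking up whichever interpreter name actually exists on PATH. The sketch below is a hypothetical helper, not part of MAGpy; it prefers python3, falls back to python, and uses shutil.which from the standard library.

```python
import shutil
import sys


def find_python(candidates=("python3", "python")):
    """Return the full path of the first interpreter name found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    # Fall back to the interpreter running this sketch if neither name is on PATH.
    interpreter = find_python() or sys.executable
    print(f"{interpreter} script.py")
```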

There is also a small error in the README. To run the pipeline, you have to call
$ snakemake -rp -s MAGpy --use-conda
without the 'MAGpy' target at the end. I assume this rule has been renamed to 'all'.

sourmash error

Hi
I'm getting an error in the sourmash step while testing the whole MAGpy pipeline.

Activating conda environment: /home/wikim-hpc/MAGpy/.snakemake/conda/8abd07f5f0e14ba82c88ec1b75516b52
loaded query: mags/ecoli.fa... (k=31, DNA)
[Errno 2] No such file or directory: 'ecolidb.sbt.json'

Cannot open file 'ecolidb.sbt.json'
[Tue May 25 16:56:05 2021]
Error in rule sourmash_gather:
jobid: 2
output: sourmash/ecoli.csv, sourmash/ecoli.sm
conda-env: /home/wikim-hpc/MAGpy/.snakemake/conda/8abd07f5f0e14ba82c88ec1b75516b52
shell:
sourmash gather -k 31 sourmash/ecoli.sig ecolidb.sbt.json -o sourmash/ecoli.csv > sourmash/ecoli.sm
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job sourmash_gather since they might be corrupted:
sourmash/ecoli.csv, sourmash/ecoli.sm
[Tue May 25 16:56:41 2021]
Finished job 8.
4 of 13 steps (31%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/wikim-hpc/MAGpy/.snakemake/log/2021-05-25T165311.085862.snakemake.log

How can I solve this problem?
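The log shows sourmash could not open ecolidb.sbt.json, and because the command passes a relative path, that file is looked up relative to the directory the job runs in. A pre-flight check like the hypothetical one below (not part of MAGpy) reports missing inputs before the rule runs. It only checks that the paths exist; the two file names come from the sourmash gather command in the log.

```python
import os


def missing_inputs(paths, workdir="."):
    """Return absolute paths for entries in `paths` that do not exist under workdir."""
    missing = []
    for p in paths:
        full = p if os.path.isabs(p) else os.path.join(workdir, p)
        if not os.path.exists(full):
            missing.append(os.path.abspath(full))
    return missing


if __name__ == "__main__":
    # Files the sourmash_gather command in the log expects to read.
    needed = ["sourmash/ecoli.sig", "ecolidb.sbt.json"]
    for path in missing_inputs(needed):
        print(f"missing: {path}")
```

If ecolidb.sbt.json is reported missing, it is worth checking whether the test database was ever built in the directory the pipeline was launched from.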
