
agnostos-wf's Issues

Location of Pfam-31_names_mod and DPD data

Hi,

First, thank you for developing such a comprehensive workflow.

I was setting up the workflow and was unable to find the Pfam-31_names_mod_01122019.tsv and DPD datasets required.
According to the config/config.yaml file, Pfam-31_names_mod_01122019.tsv should be on Figshare, but I am unable to find it using the links available through the pre-print.
It also looks like the DPD website (http://darkproteome.ws/) is no longer available (unless it has moved to a new site). Is there an alternative location to download the required DPD data?

Hopefully I haven't completely overlooked these files somewhere obvious.

Thanks,
Tim.

uniprotKB database setup error

Dear all

I have created multiple GitHub issues here, mainly while trying to set up SLURM capabilities on AWS (recent thread here). We have since gone back to setting up the HPC workflow on our local server, which meant AGNOSTOS had to be set up again.

Having said that, while trying to download and set up the UniProtKB database with MMseqs2, I encountered the following messages:

"${MMSEQS}" databases "UniProtKB" uniprotKB tmp --remove-tmp-files 1 -v 0
databases UniProtKB uniprotKB tmp --remove-tmp-files 1 -v 0

MMseqs Version:                 2f1db01c5109b07db23dc06df9d232e82b1b4b99-MPI
Force restart with latest tmp   false
Remove temporary files          true
Compressed                      0
Threads                         96
Verbosity                       0

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   151  100   151    0     0   1129      0 --:--:-- --:--:-- --:--:--  1135
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 86.6M  100 86.6M    0     0  7105k      0  0:00:12  0:00:12 --:--:--  9.7M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52.7G  100 52.7G    0     0  5136k      0  2:59:28  2:59:28 --:--:-- 5332k
createdb tmp/9145239893181210742/uniprot_sprot.fasta.gz tmp/9145239893181210742/uniprot_trembl.fasta.gz uniprotKB --compressed 0 -v 0

Converting sequences
[230895596] 23m 20s 481ms
Time for merging to uniprotKB_h: 0h 2m 34s 996ms
Time for merging to uniprotKB: 0h 3m 30s 766ms
Database type: Aminoacid
Time for processing: 0h 33m 17s 737ms
prefixid uniprotKB_h tmp/9145239893181210742/header_pref.tsv --tsv --threads 96 -v 0

[=================================================================] 100.00% 230.90M 19s 822ms
Time for merging to header_pref.tsv: 0h 2m 45s 511ms
Time for processing: 0h 3m 33s 500ms
Tmp tmp/9145239893181210742/taxonomy folder does not exist or is not a directory.
Create dir tmp/9145239893181210742/taxonomy
createtaxdb uniprotKB tmp/9145239893181210742/taxonomy --threads 96 -v 0

Download taxdump.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55.2M  100 55.2M    0     0  2563k      0  0:00:22  0:00:22 --:--:-- 1738k
Database created
tmp/9145239893181210742/taxonomy/createindex.sh: 90: [: Illegal number:

I am not sure what the "Illegal number" error means, and whether it is related to mmseqs not recognizing my tmp/9145239893181210742/taxonomy directory... any help would be greatly appreciated.
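
One way to get more context on the failing check is to keep the temporary files and re-run the download step with full verbosity; this is a hedged suggestion, using the same flags already shown in the log above, just with different values:

"${MMSEQS}" databases "UniProtKB" uniprotKB tmp --remove-tmp-files 0 -v 3

Keeping the tmp directory and raising the verbosity may show which value the createindex.sh test at line 90 is choking on.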

Thank you very much

Marcus

profile_search.sh output question

Dear all

Just want to take this opportunity to say thank you for creating this promising tool, and for the help I have been receiving here on GitHub :)

I managed to run profile_search.sh on a fasta file of 34,234 coding sequences. However, the resulting file [my_sample]_vs_mg_gtdb_hmm_search_res_best-hits.tsv only contains a header and 14,925 lines of results. My understanding is that all coding sequences should be accounted for? Maybe I am missing something?
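
As a quick check, it can help to compare the number of query sequences with the number of distinct queries that actually received a best hit; this is a sketch, assuming the first column of the best-hits TSV is the query gene ID and that there is a single header line (the fasta file name is a placeholder):

grep -c '^>' my_sample.fasta
tail -n +2 my_sample_vs_mg_gtdb_hmm_search_res_best-hits.tsv | cut -f1 | sort -u | wc -l

If the second number is close to 14,925, then each query has at most one reported best hit and queries without any profile match are simply absent from the file, which would explain the difference.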

Thank you very much

Marcus

Singularity/Docker

Hello,

Do you have any plans to release a Singularity/Docker version of AGNOSTOS? It would be great, as both can run with MPI and portability would be better.

error in step cluster_compositional_validation

Hello,

I have been trying to get the db_creation workflow to run and I am stuck at the cluster_compositional_validation step.
In the log file (logs/cval_stderr.err) it reads:

Invalid database read for database data file=/project/scratch/p200005/agnostos_test/db_creation2/mmseqs_clustering/clu_seqDB, database index=/project/scratch/p200005/agnostos_test/db_creation2/mmseqs_clustering/clu_seqDB.index
Size of data: 2340609
Requested offset: 2584994
Can not open index file /project/scratch/p200005/agnostos_test/db_creation2/compositional_validation/comp_valDB_tmp_0_tmp_1.index!
srun: error: mel0075: task 0: Exited with exit code 1

The "Requested offset" value varied between the different test runs that I made.
I can do mmseqs view on the database clu_seqDB without error.

In the output folder compositional_validation there are several temporary index files, most ending in .0 (except comp_valDB_tmp_0_tmp_0.index), but the one referenced in the log file is missing:

$ ls
SSN
alignments
comp_valDB_tmp_0
comp_valDB_tmp_0_tmp_0.index
comp_valDB_tmp_0_tmp_0.index.0
comp_valDB_tmp_0_tmp_1.index.0
comp_valDB_tmp_0_tmp_10.index.0
....

I tried to look into the script compositional_validation.sh, but I cannot find what could be going wrong at this step.
Any help would be appreciated.
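
One rough consistency check is whether any index entry points past the end of the data file, which is what the "Invalid database read" message describes; this is a sketch, assuming clu_seqDB is a single data file and that the .index format is key<TAB>offset<TAB>length (with MPI splits the data may live in clu_seqDB.N files, in which case the check is only approximate):

DB=/project/scratch/p200005/agnostos_test/db_creation2/mmseqs_clustering/clu_seqDB
awk -v size="$(stat -c%s "$DB")" '($2 + $3) > size { bad++ } END { print bad+0, "index entries point past the end of the data file" }' "$DB.index"

A non-zero count would suggest the clustering database itself is truncated or out of sync with its index, rather than the compositional validation step being at fault.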

Best regards.

nr.gz database error message during zcat step in download_DB.sh

Dear all

We have been encountering issues processing the nr database with the following outputs:

aria2c --file-allocation=none -c -x 10 -s 10 ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz

"${MMSEQS}" createdb nr.gz nr.db --write-lookup 0 -v 0
nr.db exists and will be overwritten.
createdb nr.gz nr.db --write-lookup 0 -v 0

MMseqs Version:         2f1db01c5109b07db23dc06df9d232e82b1b4b99-MPI
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       0
Offset of numeric ids   0
Compressed              0
Verbosity               0

Converting sequences
[158056214] 16m 8s 55ms
Time for merging to nr.db_h: 0h 6m 32s 806ms
Time for merging to nr.db: 0h 15m 7s 228ms
Database type: Aminoacid
Time for processing: 0h 41m 1s 161ms

zcat nr.gz | grep '^>' | sed 's/^>//' | sed 's/ /\t/' | sed 's/ /_/g' | gzip > nr.proteins.tsv.gz
gzip: nr.gz: invalid compressed data--format violated

It seems that the nr.gz file is not compressed properly. However, if that were the case, would the mmseqs createdb step not have failed as well? That step completed without any error messages. Thanks
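
A truncated or corrupted download is one possible cause here, not a confirmed diagnosis. Two quick integrity checks, assuming NCBI still publishes an .md5 checksum alongside nr.gz:

gzip -t nr.gz
aria2c --file-allocation=none -c ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz.md5
md5sum -c nr.gz.md5

If either check fails, re-running the original aria2c command with -c should resume and repair the download before retrying the zcat step.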

Marcus

rule pfam_annotation error: customising hmmsearch log directory location

Dear all

We encountered the following error when running db_update on the head node of an AWS EC2 instance with SLURM capability, with the worker nodes also being EC2 instances.

Error in rule pfam_annotation:
    jobid: 4
    output: /shared-efs/marcus2/database/pfam_annotation/pfam_annotations.tsv
    log: logs/pfannot_stdout.log, logs/pfannot_stderr.err (check log file(s) for error message)
    conda-env: /shared-efs/marcus2/agnostos-wf/db_update/.snakemake/conda/ae0ca3f79801625c00fe17fbeced98d9
    shell:
 
        set -x
        set -e

        export OMPI_MCA_btl=^openib
        export OMP_NUM_THREADS=28
        export OMP_PROC_BIND=FALSE

        NPFAM=$(grep -c '^NAME' /shared-efs/marcus2/agnostos-wf/databases/Pfam-A.hmm)
        NSEQS=$(grep -c '^>' /shared-efs/marcus2/database/gene_prediction/orf_seqs.fasta)
        N=$(($NSEQS * $NPFAM))

        # Run hmmsearch (MPI-mode)
        srun --mpi=pmi2 /shared-efs/marcus2/agnostos-wf/bin/hmmsearch --mpi --cut_ga -Z "${N}" --domtblout /shared-efs/marcus2/database/pfam_annotation/hmmsearch_pfam_annot.out -o /shared-efs/marcus2/database/pfam_annotation/hmmsearch_pfam_annot.log /shared-efs/marcus2/agnostos-wf/databases/Pfam-A.hmm /shared-efs/marcus2/database/gene_prediction/orf_seqs.fasta 2>logs/pfannot_stderr.err 1>logs/pfannot_stdout.log

        # Collect the results
        grep -v '^#' /shared-efs/marcus2/database/pfam_annotation/hmmsearch_pfam_annot.out > /shared-efs/marcus2/database/pfam_annotation/pfam_annotations.tsv

 
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 6

Error executing rule pfam_annotation on cluster (jobid: 4, external: Submitted batch job 6, jobscript: /shared-efs/marcus2/agnostos-wf/db_update/.snakemake/tmp.3xj2r3io/snakejob.pfam_annotation.4.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-03-25T144023.337224.snakemake.log

We do not see /shared-efs/marcus2/database/pfam_annotation/ being generated, suggesting that something went wrong with hmmsearch. However, when we ran the hmmsearch command interactively (without srun) and changed the path of the log directory, it ran without any errors.

This suggests that AGNOSTOS is having trouble accessing the log directory location with the original script. If this is the core of the problem, how do we go about changing the log directory? Is there a config file where we can change its location?
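
One detail worth noting is that the rule's log paths (logs/pfannot_stdout.log, logs/pfannot_stderr.err) are relative, so the redirections inside the srun'd command resolve against the directory the job runs from, not against the output directory. A minimal workaround sketch, assuming a missing or non-shared logs/ directory is indeed what makes the redirection (and hence the rule) fail:

cd /shared-efs/marcus2/agnostos-wf/db_update    # the workflow directory the jobs are launched from
mkdir -p logs                                   # pre-create the relative logs/ directory on shared storage

The log file names themselves are defined in the rule definitions under the workflow's rules/ directory rather than in config.yaml, so relocating them would likely mean editing the corresponding .smk file.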

Thank you very much

Marcus

AgnostosDB "seedDB + TARA giant viruses + TARA eukaryotes" availability

Hi,
First of all thank you so much for sharing this workflow!
I would like to apply this approach to a set of metagenomes in which I'm mostly interested in the eukaryotic content. I saw on your wiki (https://github.com/functional-dark-side/agnostos-wf/wiki) that there is a version of the AgnostosDB (version: "seedDB + TARA giant viruses + TARA eukaryotes") which includes the eukaryotes from Delmont and coworkers (https://doi.org/10.1016/j.xgen.2022.100123), which I think could be perfect for my aims.
However, I see that a "[fixing bug...]" warning is placed next to that version and the link points back to the wiki page.

Is there an estimate on when this database will be available?
I don't know if it helps, but I could help by reporting bugs and so on if that would be useful.

In any case, thanks again!

mmseqs_clustering_update.smk error: Invalid database read for database data

Dear all

We are currently testing AGNOSTOS on our system, breaking the run up into the individual Snakemake sub-workflows in agnostos-wf/db_update/rules/.

In the very last task of mmseqs_clustering_update.smk, we are currently getting the following error:

/agnostos-wf/bin/mmseqs createtsv /agnostos_test/db_update/mmseqs_clustering/seqDB /agnostos_test/db_update/mmseqs_clustering/seqDB /agnostos_test/db_update/mmseqs_clustering/cluDB /agnostos_test/db_update/mmseqs_clustering/cluDB.tsv --threads 28
createtsv /agnostos_test/db_update/mmseqs_clustering/seqDB /agnostos_test/db_update/mmseqs_clustering/seqDB /agnostos_test/db_update/mmseqs_clustering/cluDB /agnostos_test/db_update/mmseqs_clustering/cluDB.tsv --threads 28

MMseqs Version:                         2f1db01c5109b07db23dc06df9d232e82b1b4b99-MPI
First sequence as representative        false
Target column                           1
Add full header                         false
Sequence source                         0
Database output                         false
Threads                                 28
Compressed                              0
Verbosity                               3

Invalid database read for database data file=/agnostos_test/db_update/mmseqs_clustering/seqDB_h, database index=/agnostos_test/db_update/mmseqs_clustering/seqDB_h.index
getData: local id (4294967295) >= db size (427306981)

We are testing AGNOSTOS against the database that also contains the expanded sequences from the TARA Oceans work (FigShare link), i.e. the expanded eukaryotic plankton predicted gene sequences. Could that be related? Thanks
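
For what it's worth, getData failing with local id 4294967295 (UINT32_MAX) on the seqDB_h files reads like a lookup into the header database returning a "not found" sentinel; that is an interpretation, not a confirmed diagnosis. A quick sanity check, assuming the usual MMseqs2 layout where seqDB.index and seqDB_h.index should hold the same set of entry keys:

wc -l /agnostos_test/db_update/mmseqs_clustering/seqDB.index /agnostos_test/db_update/mmseqs_clustering/seqDB_h.index

A large mismatch between the two counts would suggest the sequence and header databases got out of sync, for example after a partially repeated update step.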

Marcus

Installation HHblits on HPC

Hi!
I am trying to install the dependencies of AGNOSTOS following your instructions here: https://github.com/functional-dark-side/agnostos-wf/blob/master/AGNOSTOS_usage.md

And everything seems to be working! Yay! (thanks for that). Except HHblits.

When running make -j 8 and make install I get the following error:
[error screenshot attached]

I tried a couple of times to make sure I didn't make a mistake, but I got the same result. Do you think it has to do with my permissions on the HPC?

Best,

Mery

setting directories

Hi,

I am trying to run the DB creation part of the workflow and I am struggling to work out all the locations I need to point to in order to run it locally on my system.
https://github.com/functional-dark-side/agnostos-wf/wiki#db-creation

Running this command:
snakemake --conda-frontend conda --use-conda -j 100 --config module="creation" --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t {cluster.time} -c {threads} --ntasks-per-node {cluster.ntasks_per_node} --nodes {cluster.nodes} --cpus-per-task {cluster.cpus_per_task} --job-name {rulename}.{jobid} --partition {cluster.partition}" -R --until creation_workflow_report

My yaml files are attached (renamed to .txt so I could upload them).

Results in this error:

rule gene_prediction:
    input: /data/etrembat/agnostos_test/db_creation_data/TARA_039_041_SRF_0.1-0.22_5K_contigs.fasta
    output: /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta, /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv
    log: logs/gene_stdout.log, logs/gene_stderr.err
    jobid: 11
    benchmark: benchmarks/gene_prediction.tsv
    reason: Missing output files: /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv, /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>

I am unsure where these files should be since the download for the creation data only has the contigs.fasta files.
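
The /vol/cloud/... output locations do not appear anywhere in the provided input, so they most likely come from example paths left in the configuration files rather than from missing downloads; that is an assumption based on the paths shown in the rule above. A quick way to list any remaining occurrences before running:

grep -n "/vol/cloud" config/*.yaml

Those entries would need to point at writable locations on the local system; the orf_seqs.fasta and orf_partial_info.tsv files are outputs that the gene_prediction rule creates itself, not files to be downloaded.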

Thanks!

config_yaml.txt
config_communities_yaml.txt
cluster_yaml.txt

db_creation snakemake command: conda download/install stage stuck

Dear all

I am new to snakemake, but it seems like my db_creation snakemake command is stuck in the conda stage:

$ snakemake -s Snakefile --use-conda -j 100 --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t 48 -c 1 --ntasks-per-node 1 --nodes 1 --cpus-per-task 1 --job-name hostname.0001" -R --until workflow_report

Migrating .snakemake folder to new format...
Migration complete
Building DAG of jobs...
Removing incomplete Conda environment [path]/agnostos-wf/envs/workflow.yml...
Creating conda environment [path]/agnostos-wf/envs/workflow.yml...
Downloading and installing remote packages.

This has been here for over 15 hours now.

I am not sure whether it is because of the number of tasks/nodes/CPUs I have set, or whether it is simply supposed to take this long. Thanks
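
Environment creation runs on the submit node, so the cluster settings should not matter at this stage. One way to see where the solver or the downloads hang is to build the environment by hand with verbose output; a sketch, with the file path taken from the log above and the prefix being an arbitrary example:

conda env create -vv -f envs/workflow.yml -p ./envs/workflow-test

If that also stalls, switching the frontend to mamba (snakemake --conda-frontend mamba ...) is often much faster at solving large environments.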

Marcus

Split dependencies into different environment files

Hi,

Thank you for providing this tool. I am very excited to use it.

I am hoping to install AGNOSTOS, but I am concerned by the fact that all dependencies seem to be lumped together in the workflow.yaml file. In my experience, this can very easily lead to problems with environment setup, especially in a distributed cluster context.

Would you consider providing separate environment files for each rule/dependency? This could facilitate the installation and portability of the workflow's dependencies. I would be glad to submit a PR attempting to implement this for your consideration, since it's something I need to do anyway to use AGNOSTOS myself. I also believe separating the dependencies into distinct files could be a step toward including more of the tools listed in installation_script.sh (e.g. I believe Parasail, Kaiju and EggNOG-mapper all have Bioconda distributions?).

Thank you for any assistance you can provide.

Best,
V

Running AGNOSTOS db_update on cluster without SLURM capability

Dear all

I seem to have managed to set up AGNOSTOS properly, but the cluster I am using does not support SLURM. I was advised to run the command without the SLURM part:

cd db_update/
snakemake -s Snakefile --use-conda -j 100 --cluster-config config/cluster.yaml -R --until workflow_report

So it appeared to be running until I hit the following error in the log file:

Error in rule spurious_shadow:
    jobid: 14
    output: /my_path/agnostos-wf/db_update/spurious_shadow/spurious_shadow_info.tsv
    log: logs/spsh_stdout.log, logs/spsh_stderr.err (check log file(s) for error message)
    conda-env: /my_path/agnostos-wf/db_update/.snakemake/conda/e79cd1b5
    shell:
        
        set -x
        set -e

        export OMPI_MCA_btl=^openib
        export OMP_NUM_THREADS=16
        export OMP_PROC_BIND=FALSE

        # 1. Detection of spurious ORFs

        NANTI=$(grep -c '^NAME' /my_path/agnostos-wf/databases/AntiFam.hmm)
        NSEQS=$(grep -c '^>' /my_path/agnostos-wf/db_update/gene_prediction/orf_seqs.fasta)
        N=$(($NSEQS * $NANTI))

        # Run hmmsearch (MPI mode)
        srun --mpi=pmi2 /my_path/agnostos-wf/agnostos-wf/bin/hmmsearch --mpi --cut_ga -Z "${N}" --domtblout /my_path/agnostos-wf/db_update/spurious_shadow/hmmsearch_antifam_sp.out -o /my_path/agnostos-wf/db_update/spurious_shadow/hmmsearch_antifam_sp.log /my_path/agnostos-wf/databases/AntiFam.hmm /my_path/agnostos-wf/db_update/gene_prediction/orf_seqs.fasta 2>logs/spsh_stderr.err 1>logs/spsh_stdout.log

The corresponding logs/spsh_stdout.log file indicates that srun is not found (as expected). I figured the way to let this run without SLURM would be to remove the srun wrapper from all AGNOSTOS commands, which might be a daunting task. However, I can't seem to find the file in which all these commands for the Snakemake workflow are laid out.

I am not overly familiar with snakemake so if anyone can lend me a hand that would be great.

Better yet, if there is a currently functioning way of running AGNOSTOS without SLURM, that would also be awesome!
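
Rather than editing every rule, a quick way to find where the srun wrapper comes from is to search the workflow and config files for it (the exact option name holding the wrapper is an assumption; it may be called something like mpi_runner):

grep -rn "srun" config/ rules/ Snakefile

If it turns out to be a single config entry, pointing it at an MPI launcher available on the cluster (e.g. mpirun) may be enough, although the MPI-enabled hmmsearch and mmseqs steps still need some MPI runtime to be present.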

Thanks

Marcus

igraph Installation troubleshooting

Dear all

I have been trying to install the packages outside of conda, specifically igraph, and I get the following error when I run:

wget https://igraph.org/nightly/get/c/igraph-0.7.1.tar.gz
tar xvfz igraph-0.7.1.tar.gz
cd igraph-0.7.1
./configure --prefix="${WD}"/bin/igraph
make -j 8
make check
Making check in src
make[1]: Entering directory '/media/software/conda/envs/agnostos-wf/extra_packages/programs/igraph-0.7.1/src'
make  check-am
make[2]: Entering directory '/media/software/conda/envs/agnostos-wf/extra_packages/programs/igraph-0.7.1/src'
/bin/bash ../libtool  --tag=CXX   --mode=compile /media/software/conda/envs/agnostos-wf/bin/x86_64-conda-linux-gnu-c++ -DHAVE_CONFIG_H -I. -I..   -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /media/software/conda/envs/agnostos-wf/include -I/media/software/conda/envs/agnostos-wf/include/libxml2 -I/media/software/conda/envs/agnostos-wf/include -I../include -I../include -Wall -I../optional/glpk -I../src/prpack -DPRPACK_IGRAPH_SUPPORT -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /media/software/conda/envs/agnostos-wf/include -MT libigraph_la-clustertool.lo -MD -MP -MF .deps/libigraph_la-clustertool.Tpo -c -o libigraph_la-clustertool.lo `test -f 'clustertool.cpp' || echo './'`clustertool.cpp
libtool: compile:  /media/software/conda/envs/agnostos-wf/bin/x86_64-conda-linux-gnu-c++ -DHAVE_CONFIG_H -I. -I.. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /media/software/conda/envs/agnostos-wf/include -I/media/software/conda/envs/agnostos-wf/include/libxml2 -I/media/software/conda/envs/agnostos-wf/include -I../include -I../include -Wall -I../optional/glpk -I../src/prpack -DPRPACK_IGRAPH_SUPPORT -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /media/software/conda/envs/agnostos-wf/include -MT libigraph_la-clustertool.lo -MD -MP -MF .deps/libigraph_la-clustertool.Tpo -c clustertool.cpp  -fPIC -DPIC -o .libs/libigraph_la-clustertool.o
In file included from clustertool.cpp:51:
/media/software/conda/envs/agnostos-wf/x86_64-conda-linux-gnu/include/c++/9.3.0/ctime:80:11: error: '::timespec_get' has not been declared
   80 |   using ::timespec_get;
      |           ^~~~~~~~~~~~
make[2]: *** [Makefile:7350: libigraph_la-clustertool.lo] Error 1
make[2]: Leaving directory '/media/software/conda/envs/agnostos-wf/extra_packages/programs/igraph-0.7.1/src'
make[1]: *** [Makefile:7806: check] Error 2
make[1]: Leaving directory '/media/software/conda/envs/agnostos-wf/extra_packages/programs/igraph-0.7.1/src'
make: *** [Makefile:480: check-recursive] Error 1

How could one go about fixing this error? Thanks
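
The "'::timespec_get' has not been declared" error typically comes from mixing the conda toolchain's C++ headers with an older system glibc rather than from igraph itself; that is an assumption based on the conda compiler paths in the log above. A sketch of one possible workaround is to configure the build with the system compiler instead of the conda cross-compiler (adjust the compiler paths to wherever `which gcc g++` points outside the conda environment):

./configure CC=/usr/bin/gcc CXX=/usr/bin/g++ --prefix="${WD}"/bin/igraph
make clean && make -j 8 && make check

Deactivating the conda environment before configuring would have a similar effect, since it keeps the conda include paths out of the compile flags.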

Marc

Question/Suggestion

Hi!

I am going through the installation process following AGNOSTOS_usage.md. I was changing the configuration files and I got to the database directory path changes. I assumed that in order to know that path I first need to do the following step:

"4. Check that you have the required external DBs listed in the config.yaml file (under "Databases"). In case you miss some of them, you can find the instructions for the download in the script download_DBs.sh. If you want to download all needed databases simply run sh download_DBs.sh (be patient this may take a while...)."

So I went ahead and did the downloads before changing the configuration files. If this is the correct order, I wanted to suggest mentioning it in the readme before the configuration step, for clarity.

Thanks!

SLURM HPC architecture for running AGNOSTOS

Dear all

Our server has the following architecture: the head node and the compute node are the same machine. Submitting SLURM jobs to a single node seems to work, but we are struggling to set up multiple nodes. We suspect that SLURM requires some networking with open ports between the compute and head nodes. We have just 96 CPUs, and they appear to be split across 2 NUMA nodes.

We were wondering whether AGNOSTOS can work via SLURM on a single node, or whether it needs to run on a multi-node system?

Thank you very much

Regards

Marcus

Conda env

Hi,

I'm wondering why some dependencies are not included in the yaml file and need to be installed manually. I've taken a look on Anaconda and they all seem to be available:

  - igraph
  - parasail-python
  - hmmer
  - ffindex
  - hhsuite
  - famsa
  - bioconductor-odseq
  - mmseqs2

Is there any reason not to put them in the yaml?
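
For reference, a minimal sketch of what such an environment file might look like, built only from the packages listed above; the channel order and the lack of version pins are assumptions, not the workflow's actual environment definition:

channels:
  - conda-forge
  - bioconda
dependencies:
  - igraph
  - parasail-python
  - hmmer
  - ffindex
  - hhsuite
  - famsa
  - bioconductor-odseq
  - mmseqs2

One practical caveat is that the workflow calls MPI-enabled builds of hmmsearch and mmseqs (both logs above show "--mpi" and an "-MPI" version string), which the default Bioconda packages may not provide, and that may be part of why those tools are installed manually.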

Thanks for your awesome workflow :)

Spurious Shadow Step Question

Hi,

I'm trying to test the AGNOSTOS update module (db_update) using a small test set of sequences (50 or so). I've been running the Snakemake pipeline manually, going through each .smk file from gene_prediction.smk to mmseqs_clustering_results.smk. I am now at spurious_shadows.

When I run this it errors out, and I'm not sure whether it's because there are no results or because of something in the code. So I decided to run each command of the shell portion from spurious_shadows.smk as output in the Snakemake log files.

When I run the hmmer search from line 61 of spurious_shadows.smk,

       {params.mpi_runner} {params.hmmer_bin} --mpi --cut_ga -Z "${{N}}" --domtblout {params.hmmout} -o {params.hmmlog} {params.antifamdb} {input.fasta} 2>{log.err} 1>{log.out}

followed by line 69

        grep -v '^#' {params.hmmout} > {params.spur}.tmp || true > {params.spur}.tmp 2>>{log.err}

The resulting tmp file is empty, and the hmmout file also has no hits. With the tmp file empty, I cannot proceed to the next step, "# 2. Detection of shadow ORFs" (line 80).

So my question is: does it just die there, meaning I cannot proceed past spurious_shadow to the cluster_pfam_annotation step?

thanks!

gcc error during installation

Dear all,

I keep getting this error after running sh installation_script.sh. Do you have any ideas how I can work around this? I checked the path and the file does exist. The operating system is CentOS and the batch system is SLURM.

Many thanks in advance!

Magda
error.txt

majority_vote_categ.R fails; Error in dirname(opt$contig)

Hello.

I am following the tutorial on Integrating AGNOSTOS gene categories into anvi'o projects, using the example data from the tutorial. I believe I have followed the recipe correctly, yet at the step where the workflow calls majority_vote_categ.R I get the following error:

Error in dirname(opt$contig) : a character vector argument expected

I looked at the script but I am afraid it was not obvious why it was failing. I tried several times to reinstall everything but no luck. I also tried different versions of R (3.6, 4.0). I am stuck :)

Any help would be appreciated. I also attached the log file job_AGNOSTOS.log.

Here is how I am running the command:

MMSEQ="/home/scottjj/miniconda3/envs/mmseqs2/bin"

agnostos-wf/Profile_search/profile_search.sh \
                       --query infant_gut_genes.fasta \
                       --clu_hmm IGD_agnostos/GC_profiles/clu_hmm_db \
                       --clu_cat IGD_agnostos/cluster_ids_categ.tsv \
                       --threads 10 --mmseqs $MMSEQ/mmseqs
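
The error reads like optparse-style option handling: dirname(opt$contig) fails with exactly that message when opt$contig is NULL, i.e. when the corresponding command-line option was never set; this is an inference from the error text, not a confirmed diagnosis. A quick way to see which options the script expects, assuming it defines them with make_option (the path below mirrors the Profile_search location used above and may need adjusting to your checkout):

grep -n 'make_option\|opt\$contig' agnostos-wf/Profile_search/majority_vote_categ.R

Comparing that list against the flags passed to profile_search.sh should show whether a contig or gene-to-contig mapping argument is being dropped somewhere along the way.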

mmseqs2 core dump

Hi,

We have installed agnostos-wf, and in our first attempt to analyze our metagenomic data we got the error below after running the 'db_creation' workflow.

Used command line:
snakemake --use-conda -j 100 --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t {cluster.time} -c {threads} --ntasks-per-node {cluster.ntasks_per_node} --nodes {cluster.nodes} --cpus-per-task {cluster.cpus_per_task} --job-name {rulename}.{jobid} --partition {cluster.partition}" -R --until workflow_report

#################################BEGIN slurm log file##############################################
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 mmseqs_clustering
1
Select jobs to execute...

[Fri Jun 18 15:05:26 2021]
rule mmseqs_clustering:
output: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/cluDB.tsv
log: logs/mmseqs_clustering_stdout.log, logs/mmseqs_clustering_stderr.err
jobid: 0
benchmark: benchmarks/mmseqs_clustering/clu.tsv
threads: 5

+ set -e
+ export 'OMPI_MCA_btl=^openib'
+ OMPI_MCA_btl='^openib'
+ export OMP_NUM_THREADS=5
+ OMP_NUM_THREADS=5
+ export OMP_PROC_BIND=FALSE
+ OMP_PROC_BIND=FALSE
+ /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createdb /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB
/usr/bin/bash: line 8: 1032725 Illegal instruction (core dumped) /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createdb /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB 2> logs/mm>
[Fri Jun 18 15:05:27 2021]
    Error in rule mmseqs_clustering:
    jobid: 0
    output: /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/cluDB.tsv
    log: logs/mmseqs_clustering_stdout.log, logs/mmseqs_clustering_stderr.err (check log file(s) for error message)
    shell:

      set -x
      set -e
    
      export OMPI_MCA_btl=^openib
      export OMP_NUM_THREADS=5
      export OMP_PROC_BIND=FALSE
    
      /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createdb  /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB 2>logs/mmseqs_clustering_stderr.err 1>logs/mmseqs_clustering_stdout.log
        /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs cluster           /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB           /home/joaquim.junior/work/projects/bagasse/analysi>
      /home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs createtsv /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creation/mmseqs_clustering/seqDB /home/joaquim.junior/work/projects/bagasse/analysis/agnostos/db_creati>
      (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
#################################END slurm log file##############################################

I was not able to figure out what might be happening.
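
An "Illegal instruction" crash from a prebuilt binary usually means the executable was compiled for a newer CPU instruction set (e.g. AVX2) than the node it ran on supports; that is a common cause, not a confirmed diagnosis for this case. Two quick checks on the node where the job ran:

grep -m1 -o 'avx2' /proc/cpuinfo || echo "no AVX2 support on this CPU"
/home/bioinf/progs/agnostos/agnostos-wf/bin/mmseqs version

If the CPU lacks AVX2, swapping in an SSE4.1 or source-compiled MMseqs2 build of the same version should avoid the crash.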

Best regards,
Joaquim
