athena_meta's Issues

Error during assemble_olc : AssertionError: merge FASTA not valid

I received the following error when running athena. Please, can you help me determine what caused the error and provide suggestions to fix the problem?

============================== assemble_olc ==============================
1 chunks to run. Starting...
2019-12-27 21:55:37 - ERROR - ========== Exception ==========
2019-12-27 21:55:37 - ERROR - Traceback (most recent call last):
2019-12-27 21:55:37 - ERROR - File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-12-27 21:55:37 - ERROR - chunk.run()
2019-12-27 21:55:37 - ERROR - File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 135, in run
2019-12-27 21:55:37 - ERROR - assert is_valid_fasta(mergedfiltfa_path), "merge FASTA not valid"
2019-12-27 21:55:37 - ERROR - AssertionError: merge FASTA not valid
2019-12-27 21:55:37 - ERROR -
2019-12-27 21:55:37 - ERROR - merge FASTA not valid
Traceback (most recent call last):
File "/home/s953272/ATHENA_META_ENV/bin/athena-meta", line 11, in
sys.exit(main())
File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/main.py", line 211, in main
run(options)
File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/main.py", line 42, in run
runner.run_stage(stage, stage_name)
File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
cluster.map(_run_chunk, to_run)
File "/home/s953272/ATHENA_META_ENV/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
return pool.map_async(fn, args).get(9999999)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value
AssertionError: merge FASTA not valid
Command exited with non-zero status 1
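A minimal first diagnostic, assuming the default results/ output directory and that the merged FASTA the assertion checks lives under results/olc/ (the exact file name is an assumption and may differ by version): confirm the file exists and holds at least one sequence record.

# hedged check: list the OLC stage outputs and count FASTA records in the merged file
ls -l results/olc/
grep -c "^>" results/olc/*.fa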

Athena Assembly olc error

Hello,

I have a metagenomic dataset that I would like to assemble using athena. However, I am running into the issue attached below. Can you please help me identify a solution?

Thanks in Advance,
Kevin Nguyen
athena_olc_error.txt

pip install error

OS = CentOS release 6.2 (Final)
[root@server1 athena_meta-main]# echo $PATH
/usr/local/packages/bwa-0.7.15/bin:/usr/local/packages/idba-1.1.3a1/bin:/usr/local/packages/samtools-1.3.1/bin:/usr/local/packages/htslib-1.3.2/bin:/usr/local/packages/flye-2.3.4/bin:/usr/local/packages/python-2.7.15/bin:$PATH

All python prerequisites have been installed

[root@server1 athena_meta-main]# pip --no-cache-dir install -r requirements.txt
Requirement already satisfied: bx-python==0.7.3 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from -r requirements.txt (line 1)) (0.7.3)
Requirement already satisfied: ipython-cluster-helper==0.5.2 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from -r requirements.txt (line 2)) (0.5.2)
Requirement already satisfied: numpy==1.11.0 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from -r requirements.txt (line 3)) (1.11.0)
Requirement already satisfied: pysam==0.9.1 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from -r requirements.txt (line 4)) (0.9.1)
Requirement already satisfied: setuptools>=18.5 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (39.0.1)
Requirement already satisfied: pyzmq>=2.1.11 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (17.0.0)
Requirement already satisfied: ipython<5.0.0,>=4.0.0 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.2.1)
Requirement already satisfied: ipyparallel>=4.0.0 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (6.1.1)
Requirement already satisfied: netifaces>=0.10.3 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.10.7)
Requirement already satisfied: decorator in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.3.0)
Requirement already satisfied: pickleshare in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.7.4)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (1.0.0)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.8.1)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.5.0)
Requirement already satisfied: traitlets in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.3.2)
Requirement already satisfied: tornado>=4 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (5.0.2)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (2.7.3)
Requirement already satisfied: jupyter-client in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (5.2.3)
Requirement already satisfied: ipykernel in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.8.2)
Requirement already satisfied: ipython-genutils in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.2.0)
Requirement already satisfied: futures; python_version == "2.7" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (3.2.0)
Requirement already satisfied: pathlib2; python_version in "2.6 2.7 3.2 3.3" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from pickleshare->ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (2.3.2)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.5.2)
Requirement already satisfied: six in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from traitlets->ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (1.11.0)
Requirement already satisfied: enum34; python_version == "2.7" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from traitlets->ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (1.1.6)
Requirement already satisfied: singledispatch in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from tornado>=4->ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (3.4.0.3)
Requirement already satisfied: backports_abc>=0.4 in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from tornado>=4->ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (0.5)
Requirement already satisfied: jupyter-core in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from jupyter-client->ipyparallel>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (4.4.0)
Requirement already satisfied: scandir; python_version < "3.5" in /usr/local/packages/python-2.7.15/lib/python2.7/site-packages (from pathlib2; python_version in "2.6 2.7 3.2 3.3"->pickleshare->ipython<5.0.0,>=4.0.0->ipython-cluster-helper==0.5.2->-r requirements.txt (line 2)) (1.7)

When attempting to install athena_meta via pip:

pip -r install -vvv .

Traceback (most recent call last):
  File "/usr/local/packages/python-2.7.15/lib/python2.7/site-packages/pip/_internal/utils/outdated.py", line 140, in pip_version_check
    state.save(pypi_version, current_time)
  File "/usr/local/packages/python-2.7.15/lib/python2.7/site-packages/pip/_internal/utils/outdated.py", line 73, in save
    state = json.load(statefile)
  File "/usr/local/packages/python-2.7.15/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/local/packages/python-2.7.15/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/local/packages/python-2.7.15/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/packages/python-2.7.15/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
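The traceback above comes from pip's own version self-check (pip_version_check in outdated.py) rather than from building athena itself, so one hedged workaround is simply to skip that check when retrying the install:

# hedged workaround: bypass pip's version self-check, whose cached state file failed to parse as JSON
pip install --disable-pip-version-check -vvv .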

Cannot open file "./results/olc/pre-flye-input-contigs.fa"

Hi,

I am experiencing some issues when trying to run athena-meta, more specifically in the OLC assembly step. This is the error I get in the log file:

2018-02-01 12:59:19 - --starting logging AssembleOLCStep --
2018-02-01 12:59:19 - jointly assemble bins with OLC
2018-02-01 12:59:19 - merge input contigs
2018-02-01 12:59:19 - ========== Exception ==========
2018-02-01 12:59:19 - Traceback (most recent call last):
2018-02-01 12:59:19 -   File "/shared/external/miniconda3/envs/athena_assembly/lib/python2.7/site-packages/athena/pipeline.py", line 50, in _run_chunk
2018-02-01 12:59:19 -     chunk.run()
2018-02-01 12:59:19 -   File "/shared/external/miniconda3/envs/athena_assembly/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 96, in run
2018-02-01 12:59:19 -     premergedfiltfa_path,
2018-02-01 12:59:19 -   File "/shared/external/miniconda3/envs/athena_assembly/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 143, in filter_inputs
2018-02-01 12:59:19 -     ctg_size_map = util.get_fasta_sizes(mergedfa_path)
2018-02-01 12:59:19 -   File "/shared/external/miniconda3/envs/athena_assembly/lib/python2.7/site-packages/athena/mlib/util.py", line 151, in get_fasta_sizes
2018-02-01 12:59:19 -     fasta = pysam.FastaFile(fa_path)
2018-02-01 12:59:19 -   File "pysam/libcfaidx.pyx", line 119, in pysam.libcfaidx.FastaFile.__cinit__
2018-02-01 12:59:19 -   File "pysam/libcfaidx.pyx", line 160, in pysam.libcfaidx.FastaFile._open
2018-02-01 12:59:19 - IOError: could not open file `./results/olc/pre-flye-input-contigs.fa`
2018-02-01 12:59:19 - 
2018-02-01 12:59:19 - could not open file `./results/olc/pre-flye-input-contigs.fa`

Hope you can help me,
Jennifer

athena_meta install error

When I ran pip install -r requirements.txt, the error was:

pip install -r requirements.txt
Collecting bx-python==0.7.3 (from -r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/55/db/fa76af59a03c88ad80494fc0df2948740bbd58cd3b3ed5c31319624687cc/bx-python-0.7.3.tar.gz
Collecting ipython-cluster-helper==0.5.2 (from -r requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/97/df/8a9b0ef7657344bdad15ba9bc187c8d5182f27d5afda51c493a961d0f9a1/ipython-cluster-helper-0.5.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-cq2sb9t7/ipython-cluster-helper/setup.py", line 3, in
from pip.req import parse_requirements
ModuleNotFoundError: No module named 'pip.req'

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-cq2sb9t7/ipython-cluster-helper/

How can I solve this problem?
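The failing import is pip.req inside ipython-cluster-helper's setup.py, and pip.req was removed in pip 10, so one hedged workaround is to pin an older pip in the environment before retrying:

# hedged workaround: pip.req exists only in pip < 10
python -m pip install "pip<10"
pip install -r requirements.txt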

Input bam needs to be indexed

The manual mentions that the input BAM should be sorted, but it seems that it needs to be indexed as well. Should this be added to the documentation?
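For reference, a minimal sketch of preparing such a BAM with samtools (file names here are placeholders, not from the manual):

# sort by coordinate, then index; the sorted, indexed BAM is what would be passed to athena
samtools sort -o reads.sorted.bam reads.bam
samtools index reads.sorted.bam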

cryptic AssertionError

I'm getting a cryptic assertion error lacking an informative traceback:

============================== check_reads ==============================
1 chunks to run. Starting...
2020-07-24 20:12:18 - INFO - index fastq /ebio/abt3_scratch/haptag-mg_49025743982/athena/0.5/3/2000000/sim_reads/1/R1-2.fq
2020-07-24 20:13:04 - INFO - get seed contigs from input assembly
2020-07-24 20:13:04 - INFO - computing seed coverages (required pass thru *bam)
2020-07-24 20:13:36 - INFO -   79298 total inputs seeds covering 94388578 bases
2020-07-24 20:13:36 - INFO -   491 input seed contigs >= 400bp and >= 10.0x coverage covering 10925225 bases
2020-07-24 20:13:36 - INFO - created 492 bins from seeds
2020-07-24 20:13:36 - INFO - done
2020-07-24 20:13:36 - ERROR - ========== Exception ==========
2020-07-24 20:13:36 - ERROR - Traceback (most recent call last):
2020-07-24 20:13:36 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/pipeline.py", line 52, in _run_chunk
2020-07-24 20:13:36 - ERROR -     chunk.finalize()
2020-07-24 20:13:36 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/stages/step.py", line 109, in finalize
2020-07-24 20:13:36 - ERROR -     assert not self.needs_to_run()
2020-07-24 20:13:36 - ERROR - AssertionError
2020-07-24 20:13:36 - ERROR -
2020-07-24 20:13:36 - ERROR -
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/bin/athena-meta", line 10, in <module>
    sys.exit(main())
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/main.py", line 211, in main
    run(options)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/main.py", line 42, in run
    runner.run_stage(stage, stage_name)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
    cluster.map(_run_chunk, to_run)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
    return pool.map_async(fn, args).get(9999999)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
AssertionError

It appears that assert not self.needs_to_run() may be the cause, but that isn't very informative in itself.

My conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
athena_meta               1.3                        py_1    bioconda
backports                 1.0                        py_2    conda-forge
backports.shutil_get_terminal_size 1.0.0                      py_3    conda-forge
backports_abc             0.5                        py_1    conda-forge
bwa                       0.7.17               hed695b0_7    bioconda
bx-python                 0.8.9            py27h213ead4_1    bioconda
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2019.11.28       py27h8c360ce_1    conda-forge
configparser              3.7.3            py27h8c360ce_2    conda-forge
curl                      7.71.1               he644dc0_3    conda-forge
decorator                 4.4.2                      py_0    conda-forge
entrypoints               0.3             py27h8c360ce_1001    conda-forge
enum34                    1.1.10           py27h8c360ce_1    conda-forge
flye                      2.3.1                    py27_0    bioconda
futures                   3.3.0            py27h8c360ce_1    conda-forge
htslib                    1.9                  ha228f0b_7    bioconda
idba_subasm               1.1.3a1         py27pl526hf484d3e_0    bioconda
ipykernel                 4.10.0                   py27_1    conda-forge
ipyparallel               6.2.4            py27h8c360ce_0    conda-forge
ipython                   5.8.0                    py27_1    conda-forge
ipython-cluster-helper    0.6.4                      py_0    bioconda
ipython_genutils          0.2.0                      py_1    conda-forge
jupyter_client            5.3.4                    py27_1    conda-forge
jupyter_core              4.6.3            py27h8c360ce_1    conda-forge
krb5                      1.17.1               hfafb76e_1    conda-forge
ld_impl_linux-64          2.34                 h53a641e_7    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_3    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20191231         h46ee950_1    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_3    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libssh2                   1.9.0                hab1572f_4    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
netifaces                 0.10.4                   py27_1    bioconda
numpy                     1.11.3          py27he5ce36f_1207    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
pathlib2                  2.3.5            py27h8c360ce_1    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
pexpect                   4.8.0            py27h8c360ce_1    conda-forge
pickleshare               0.7.5           py27h8c360ce_1001    conda-forge
pigz                      2.3.4                hed695b0_1    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
prompt_toolkit            1.0.15                     py_1    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pygments                  2.5.2                      py_0    conda-forge
pysam                     0.15.3           py27hda2845c_1    bioconda
python                    2.7.15          h5a48372_1011_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-lzo                1.12            py27h42e1302_1001    conda-forge
python_abi                2.7                    1_cp27mu    conda-forge
pyzmq                     19.0.0           py27h76efe43_1    conda-forge
readline                  8.0                  h46ee950_1    conda-forge
samtools                  1.9                 h10a08f8_12    bioconda
scandir                   1.10.0           py27hdf8410d_1    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
simplegeneric             0.8.1                      py_1    conda-forge
singledispatch            3.4.0.3               py27_1000    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.32.3               hcee41ef_1    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
tornado                   5.1.1           py27h14c3975_1000    conda-forge
traitlets                 4.3.3            py27h8c360ce_1    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zeromq                    4.3.2                he1b5a44_2    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge

idba_subasm processes using 2 threads per process, overloading server

I have noticed when running Athena with multiple threads that the spawned processes for idba_subasm are running with two threads apiece.

top-snapshot-athena

The Python processes, which I am guessing process the output, each actively use roughly one core.

top-snapshot-athena2

If we try to automate this a bit and run it on a cluster where the number of threads/cores can be set dynamically per job, for example with SLURM:

athena-meta -t $SLURM_NPROCS --config myconfig.json

this leads to load problems. For example, on a small 12-core node this launches Athena with 12 threads. However, when I monitor this via top there are 12 processes running (a mix of idba_asm and python), with the idba_subasm processes requesting more threads than are available, roughly 30-40% more, which causes issues with load.

For the time being we're simply setting the thread count in the athena-meta call to about 2/3 of the available cores (8 threads for a 12-core node). But is there a way to account for this within Athena?
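A minimal sketch of the interim workaround described above, assuming a SLURM batch script where $SLURM_NPROCS is set:

# hedged workaround: hand athena roughly 2/3 of the allocated cores so idba_subasm's extra threads still fit
THREADS=$(( SLURM_NPROCS * 2 / 3 ))
athena-meta -t "$THREADS" --config myconfig.json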

[Errno 12] Cannot allocate memory

Hello,

Athena runs out of the 2 TB of memory on our server. Each bin needs over 160 GB of memory, but I don't know why.

SubassembleReadsStep.bin.918.txt

$ tail -n 30 SubassembleReadsStep.bin.918

2020-01-17 12:14:21 - DEBUG - determing local assemblies
2020-01-17 12:15:31 - DEBUG - 4 initial link candidates to check
2020-01-17 12:15:31 - DEBUG - - 0 pass reciprocal filtering
2020-01-17 12:15:42 - DEBUG - root-ctg:NODE_21065_length_904_cov_10.610130;numreads:88;checks:4;trunc-checks:False;asms:0;trunc-asms:False
2020-01-17 12:15:42 - DEBUG - - found 1 candidates
2020-01-17 12:16:09 - DEBUG - performing local assemblies
2020-01-17 12:16:09 - DEBUG - assembling with neighbor None
2020-01-17 12:16:09 - DEBUG - - 44 orig barcodes
2020-01-17 12:16:09 - DEBUG - - 44 downsampled barcodes
2020-01-17 12:16:09 - DEBUG - - 9.73451327434x estimated local coverage
2020-01-17 12:16:09 - DEBUG - - 2 min_support required
2020-01-17 12:20:55 - ERROR - ========== Exception ==========
2020-01-17 12:20:55 - ERROR - Traceback (most recent call last):
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2020-01-17 12:20:55 - ERROR - chunk.run()
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 82, in run
2020-01-17 12:20:55 - ERROR - self.do_local_assembly(ctg, asmdir)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 149, in do_local_assembly
2020-01-17 12:20:55 - ERROR - local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2020-01-17 12:20:55 - ERROR - contig_path = self._do_idba_assembly(local_asm)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 160, in _do_idba_assembly
2020-01-17 12:20:55 - ERROR - stderr=subprocess.PIPE,
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 394, in __init__
2020-01-17 12:20:55 - ERROR - errread, errwrite)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 938, in _execute_child
2020-01-17 12:20:55 - ERROR - self.pid = os.fork()
2020-01-17 12:20:55 - ERROR - OSError: [Errno 12] Cannot allocate memory
2020-01-17 12:20:55 - ERROR -
2020-01-17 12:20:55 - ERROR - [Errno 12] Cannot allocate memory
(python2) [zhangzhm@gpu01 logs]$ tail -n 37 SubassembleReadsStep.bin.918
2020-01-17 12:01:25 - DEBUG - merge long output contigs from local assemblies
2020-01-17 12:01:25 - DEBUG - contig path for local asm NODE_272216_length_407_cov_16.352273_NODE_257018_length_417_cov_19.287293.0 not generated
2020-01-17 12:01:25 - DEBUG - - 0 contigs covering 0 bases
2020-01-17 12:01:42 - DEBUG - assembling barcoded reads for seed NODE_145872_length_520_cov_8.929032
2020-01-17 12:08:01 - DEBUG - seed NODE_145872_length_520_cov_8.929032 contig does not have high enough coverage
2020-01-17 12:08:01 - DEBUG - - 28 bcodes, 26.4903846154x
2020-01-17 12:08:10 - DEBUG - assembling barcoded reads for seed NODE_21065_length_904_cov_10.610130
2020-01-17 12:14:21 - DEBUG - determing local assemblies
2020-01-17 12:15:31 - DEBUG - 4 initial link candidates to check
2020-01-17 12:15:31 - DEBUG - - 0 pass reciprocal filtering
2020-01-17 12:15:42 - DEBUG - root-ctg:NODE_21065_length_904_cov_10.610130;numreads:88;checks:4;trunc-checks:False;asms:0;trunc-asms:False
2020-01-17 12:15:42 - DEBUG - - found 1 candidates
2020-01-17 12:16:09 - DEBUG - performing local assemblies
2020-01-17 12:16:09 - DEBUG - assembling with neighbor None
2020-01-17 12:16:09 - DEBUG - - 44 orig barcodes
2020-01-17 12:16:09 - DEBUG - - 44 downsampled barcodes
2020-01-17 12:16:09 - DEBUG - - 9.73451327434x estimated local coverage
2020-01-17 12:16:09 - DEBUG - - 2 min_support required
2020-01-17 12:20:55 - ERROR - ========== Exception ==========
2020-01-17 12:20:55 - ERROR - Traceback (most recent call last):
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2020-01-17 12:20:55 - ERROR - chunk.run()
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 82, in run
2020-01-17 12:20:55 - ERROR - self.do_local_assembly(ctg, asmdir)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 149, in do_local_assembly
2020-01-17 12:20:55 - ERROR - local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2020-01-17 12:20:55 - ERROR - contig_path = self._do_idba_assembly(local_asm)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 160, in _do_idba_assembly
2020-01-17 12:20:55 - ERROR - stderr=subprocess.PIPE,
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 394, in __init__
2020-01-17 12:20:55 - ERROR - errread, errwrite)
2020-01-17 12:20:55 - ERROR - File "/home/zhangzhm/software/anaconda3/envs/python2/lib/python2.7/subprocess.py", line 938, in _execute_child
2020-01-17 12:20:55 - ERROR - self.pid = os.fork()
2020-01-17 12:20:55 - ERROR - OSError: [Errno 12] Cannot allocate memory
2020-01-17 12:20:55 - ERROR -
2020-01-17 12:20:55 - ERROR - [Errno 12] Cannot allocate memory
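Since the failure is in os.fork() when spawning idba, a hedged first check is the node's free memory and overcommit policy while the workers are running; if headroom is the issue, lowering the number of concurrent processes in cluster_settings is the obvious knob.

# hedged diagnostics: available memory and the kernel overcommit setting on the node
free -g
cat /proc/sys/vm/overcommit_memory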

Installation Error

I am using Python 3.6 and cannot get through the installation of athena.

When trying to run "pip install ." I always receive the error posted below. It appears to have something to do with setuptools, but I have upgraded everything to the newest version and the error persists. Please advise.

Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/athena.egg-info
writing pip-egg-info/athena.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/athena.egg-info/dependency_links.txt
writing entry points to pip-egg-info/athena.egg-info/entry_points.txt
writing requirements to pip-egg-info/athena.egg-info/requires.txt
writing top-level names to pip-egg-info/athena.egg-info/top_level.txt
writing manifest file 'pip-egg-info/athena.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-req-build-o8ggv44g/setup.py", line 20, in
description='athena assembler',
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/init.py", line 140, in setup
return distutils.core.setup(**attrs)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 295, in run
self.find_sources()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 302, in find_sources
mm.run()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 533, in run
self.add_defaults()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 569, in add_defaults
sdist.add_defaults(self)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/py36compat.py", line 34, in add_defaults
self._add_defaults_python()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/setuptools/command/sdist.py", line 126, in _add_defaults_python
if self.distribution.has_pure_modules():
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/distutils/dist.py", line 980, in has_pure_modules
return len(self.packages or self.py_modules or []) > 0
TypeError: object of type 'filter' has no len()

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-o8ggv44g/

Install with conda, not pip

Hello,

I highly recommend that you update your install instructions to create and install athena_meta with a conda environment instead of a venv, based on the fact that you have numpy as a requirement. When numpy is installed with conda instead of pip, the mkl (Math Kernel Library) comes as a dependency, and numpy leverages the power of that library, drastically increasing its performance, as described here.

Furthermore, conda has become more or less the preferred platform for distributing scientific packages, so I'd also recommend creating a conda recipe for installing athena_meta itself. If there is interest on your part, I can contribute by doing that.

Sincerely
V
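For what it's worth, the conda environment listing in the "cryptic AssertionError" issue above shows an athena_meta package on bioconda, so a conda-based install might be as simple as the sketch below (channels and package name taken from that listing; leaving the version unpinned is an assumption):

# hedged sketch: install the bioconda athena_meta package into its own environment
conda create -n athena -c conda-forge -c bioconda athena_meta
conda activate athena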

Distinct Cluster Settings for OLC Flye Step

Using the cluster settings below, Athena distributes the sub-assemblies across the nodes of our grid engine very nicely.

"cluster_settings": { "cluster_type": "IPCluster", "processes": 318, "cluster_options": { "scheduler": "sge", "queue": "all.q", "extra_params": {"mem":16}
Where "mem":16 is 16GB of RAM.

However, Athena uses these same settings for the OLC/Flye step. This is problematic for two reasons. First, Athena passes the processes value (here set to 318) to Flye's thread argument, and Flye will reject values greater than 128. Second, the RAM demands of the single Flye step far exceed those of the individual sub-assemblies. Can there be distinct cluster settings for the sub-assemblies and the OLC step? I see how to work around setting the thread value, but not where the OLC stage submits its jobs to the cluster so that the setting could be changed. Happy to work on this with a little steering.

Error: assembly failed to produce contig.fa

Hi,
I get the following error while running Athena-meta:

============================== check_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== subassemble_reads ==============================
4157 chunks to run. Starting...
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.0 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_70878_length_2122_cov_16.687470
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.260 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.520 --
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.780 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_9167_length_13103_cov_180.557863
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_97383_length_1373_cov_22.972686
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_68287_length_2218_cov_13.134535
2019-02-20 20:03:31 - seed NODE_68287_length_2218_cov_13.134535 contig does not have high enough coverage
2019-02-20 20:03:31 - - 64 bcodes, 6.46753832281x
2019-02-20 20:03:32 - assembling barcoded reads for seed NODE_76498_length_1932_cov_162.898775
2019-02-20 20:03:34 - seed NODE_70878_length_2122_cov_16.687470 contig does not have high enough coverage
2019-02-20 20:03:34 - - 95 bcodes, 9.93873704053x
2019-02-20 20:03:34 - assembling barcoded reads for seed NODE_108331_length_1156_cov_16.250681
2019-02-20 20:03:41 - determing local assemblies
2019-02-20 20:03:42 - determing local assemblies
2019-02-20 20:03:45 - 2 initial link candidates to check
2019-02-20 20:03:45 - 4 initial link candidates to check
2019-02-20 20:03:45 - - 0 pass reciprocal filtering
2019-02-20 20:03:46 - - 2 pass reciprocal filtering
2019-02-20 20:03:47 - root-ctg:NODE_97383_length_1373_cov_22.972686;numreads:184;checks:4;trunc-checks:False;asms:0;trunc-asms:False
2019-02-20 20:03:47 - - found 1 candidates
2019-02-20 20:03:48 - performing local assemblies
2019-02-20 20:03:48 - assembling with neighbor None
2019-02-20 20:03:48 - - 85 orig barcodes
2019-02-20 20:03:48 - - 85 downsampled barcodes
2019-02-20 20:03:48 - - 33.6372906045x estimated local coverage
2019-02-20 20:03:48 - - 2 min_support required
2019-02-20 20:03:48 - root-ctg:NODE_9167_length_13103_cov_180.557863;numreads:3241;checks:2;trunc-checks:False;asms:2;trunc-asms:False
2019-02-20 20:03:48 - - found 3 candidates
2019-02-20 20:03:49 - performing local assemblies
2019-02-20 20:03:49 - assembling with neighbor NODE_32750_length_4694_cov_218.604656
2019-02-20 20:03:49 - - 339 orig barcodes
2019-02-20 20:03:49 - - 339 downsampled barcodes
2019-02-20 20:03:49 - - 42.4174783734x estimated local coverage
2019-02-20 20:03:49 - - 2 min_support required
2019-02-20 20:04:02 - determing local assemblies
2019-02-20 20:04:06 - 3 initial link candidates to check
2019-02-20 20:04:06 - - 3 pass reciprocal filtering
2019-02-20 20:04:08 - root-ctg:NODE_76498_length_1932_cov_162.898775;numreads:1633;checks:3;trunc-checks:False;asms:3;trunc-asms:False
2019-02-20 20:04:08 - - found 4 candidates
2019-02-20 20:04:10 - performing local assemblies
2019-02-20 20:04:10 - assembling with neighbor NODE_36659_length_4228_cov_133.458184
2019-02-20 20:04:10 - - 229 orig barcodes
2019-02-20 20:04:10 - - 229 downsampled barcodes
2019-02-20 20:04:10 - - 51.5380911436x estimated local coverage
2019-02-20 20:04:10 - - 2 min_support required
2019-02-20 20:04:10 - determing local assemblies
2019-02-20 20:04:14 - 3 initial link candidates to check
2019-02-20 20:04:15 - - 0 pass reciprocal filtering
2019-02-20 20:04:16 - root-ctg:NODE_108331_length_1156_cov_16.250681;numreads:117;checks:3;trunc-checks:False;asms:0;trunc-asms:False
2019-02-20 20:04:16 - - found 1 candidates
2019-02-20 20:04:18 - performing local assemblies
2019-02-20 20:04:18 - assembling with neighbor None
2019-02-20 20:04:18 - - 62 orig barcodes
2019-02-20 20:04:18 - - 62 downsampled barcodes
2019-02-20 20:04:18 - - 25.4039792388x estimated local coverage
2019-02-20 20:04:18 - - 2 min_support required
number of threads 2
2019-02-20 20:04:19 - assembly failed to produce contig.fa
2019-02-20 20:04:19 - ========== Exception ==========
2019-02-20 20:04:19 - Traceback (most recent call last):
2019-02-20 20:04:19 - File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/pipeline.py", line 50, in _run_chunk
2019-02-20 20:04:19 - chunk.run()
2019-02-20 20:04:19 - File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-02-20 20:04:19 - self.do_local_assembly(ctg, asmdir)
2019-02-20 20:04:19 - File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 145, in do_local_assembly
2019-02-20 20:04:19 - local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2019-02-20 20:04:19 - File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2019-02-20 20:04:19 - contig_path = self._do_idba_assembly(local_asm)
2019-02-20 20:04:19 - File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 173, in _do_idba_assembly
2019-02-20 20:04:19 - raise Exception()
2019-02-20 20:04:19 - Exception
2019-02-20 20:04:19 -
2019-02-20 20:04:19 -
2019-02-20 20:04:19 - --starting logging SubassembleReadsStep.bin.1040 --
2019-02-20 20:04:19 - performing local assembly for 17 seeds
2019-02-20 20:04:19 - targeting 100x short-read subassembly coverage
2019-02-20 20:04:19 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
Traceback (most recent call last):
File "/gpfs0/home/apps/athena/1.1/venv/bin/athena-meta", line 11, in
load_entry_point('athena==1.2', 'console_scripts', 'athena-meta')()
File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/main.py", line 203, in main
2019-02-20 20:04:19 - assembling barcoded reads for seed NODE_60560_length_2545_cov_171.220482
run(options)
File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/main.py", line 42, in run
runner.run_stage(stage, stage_name)
File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
cluster.map(_run_chunk, to_run)
File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
return pool.map_async(fn, args).get(9999999)
File "/apps/miniconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
Exception

can not OLC assemble

Hi Bishara,

I'm running athena on a test soil metagenomic dataset of about 1.7 GB, but got an error at the last step. I used conda to install athena. The test went well, even though I cannot find the resulting athena-testT2_xee folder under /tem. Can you help with this? Thanks.

Cheers,

Juntao

============================== assemble_olc ==============================
1 chunks to run. Starting...
2024-05-30 07:41:02 - INFO - merge input contigs
2024-05-30 07:41:02 - ERROR - ========== Exception ==========
2024-05-30 07:41:02 - ERROR - Traceback (most recent call last):
2024-05-30 07:41:02 - ERROR - File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2024-05-30 07:41:02 - ERROR - chunk.run()
2024-05-30 07:41:02 - ERROR - File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 82, in run
2024-05-30 07:41:02 - ERROR - 'FAILED to produce any subassembled contigs, cannot OLC assemble'
2024-05-30 07:41:02 - ERROR - SubassemblyException: FAILED to produce any subassembled contigs, cannot OLC assemble
2024-05-30 07:41:02 - ERROR -
2024-05-30 07:41:02 - ERROR - FAILED to produce any subassembled contigs, cannot OLC assemble
Traceback (most recent call last):
File "/home/ubuntu/miniforge3/envs/athena/bin/athena-meta", line 10, in
sys.exit(main())
File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/main.py", line 211, in main
run(options)
File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/main.py", line 42, in run
runner.run_stage(stage, stage_name)
File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
cluster.map(_run_chunk, to_run)
File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
return pool.map_async(fn, args).get(9999999)
File "/home/ubuntu/miniforge3/envs/athena/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
athena.stages.assemble_olc.SubassemblyException: FAILED to produce any subassembled contigs, cannot OLC assemble

lower than expected barcodes

Hi,
I have the following message:

2019-08-13 15:06:38 - INFO - index fastq files/barcoded/10X-testMock/barcoded_clean_trimmed_sorted_barcodes.fastq
2019-08-13 15:06:41 - INFO - get seed contigs from input assembly
2019-08-13 15:06:41 - INFO - computing seed coverages (required pass thru *bam)
lower than expected amount (~0.00%) of barcoded reads from fastq files/barcoded/10X-testMock/barcoded_clean_trimmed_sorted_barcodes.fastq detected in bam files/seedcontigs/10X-testMock/align-reads.metaspades-contigs.bam

  specify --force_reads to bypass QC checks.  Barcoded subassembly likely to fail.

It seems there are not enough barcodes, yet a quick count returns a lot:

> grep -c "BX:" files/barcoded/10X-testMock/barcoded_clean_trimmed_sorted_barcodes.fastq
49922056

Here is the head of my fastq file:

 >head files/barcoded/10X-testMock/barcoded_clean_trimmed_sorted_barcodes.fastq
@ST-J00115:190:H5GHMBBXY:2:2102:19319:26670 BX:Z:AAACACCAGAAACCAT-1
ACGCCAGCGTGCGTCTCGAAACCATCGATGCAATGGTTTCTGGTGTTTAGATCGGAAGAGCGTCGTGTAGGGA
+
AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJF<A
@ST-J00115:190:H5GHMBBXY:2:2102:19319:26670 BX:Z:AAACACCAGAAACCAT-1
ATGGTTTCGAGACGCACGCTGGCGTAG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJ
@ST-J00115:190:H5GHMBBXY:2:2216:31040:19056 BX:Z:AAACACCAGAAACCAT-1
AAGAAGAGGCGGACGTAAACGTCGGAAAGTCGACTACATCGCAGCAAACCACATCGAATACATCGATTATAAAGATACTGAATTGTTAAAACGATTCATTTCAGAACGTGGCAAAATTT

Any idea what's going on?

Thanks
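One hedged check is whether the BX barcodes actually made it into the alignments, since the warning compares the FASTQ against the BAM; if the BAM was produced with bwa mem, the -C option is what carries the FASTQ comment (including BX:Z:...) into the SAM records.

# hedged check: count alignments carrying a BX tag; zero here would explain the warning
samtools view files/seedcontigs/10X-testMock/align-reads.metaspades-contigs.bam | grep -c "BX:Z:"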

assembly failed to produce contig.fa

It appears that athena_meta v1.1 will die if any of the subassemblies fail to generate contigs:

$ athena-meta config.json
============================== bin_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== index_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== assemble_bins ==============================
671 chunks to run. Starting...
2018-11-03 10:10:09 - --starting logging AssembleMetaBinnedStep.bin.3 --
2018-11-03 10:10:09 - performing local assembly for 1 seeds
2018-11-03 10:10:09 - targeting 100x short-read subassembly coverage
2018-11-03 10:10:09 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2018-11-03 10:10:09 - assembling barcoded reads for seed NODE_331_length_7434_cov_24.498577
2018-11-03 10:10:09 - determing local assemblies
2018-11-03 10:10:09 - 4 initial link candidates to check
2018-11-03 10:10:09 -   - 2 pass reciprocal filtering
2018-11-03 10:10:09 - root-ctg:NODE_331_length_7434_cov_24.498577;numreads:777;checks:4;trunc-checks:False;asms:2;trunc-asms:False
2018-11-03 10:10:09 -   - found 3 candidates
2018-11-03 10:10:09 - performing local assemblies
2018-11-03 10:10:09 -   - skipping filtered contig NODE_66_length_45138_cov_17.962691
2018-11-03 10:10:09 - assembling with neighbor NODE_13841_length_362_cov_30.302932
2018-11-03 10:10:09 -   - 31 orig barcodes
2018-11-03 10:10:09 -   - 31 downsampled barcodes
2018-11-03 10:10:09 -   - 13.381294964x estimated local coverage
2018-11-03 10:10:09 -   - 2 min_support required
:::: 127
2018-11-03 10:10:16 - assembly failed to produce contig.fa
2018-11-03 10:10:16 - ========== Exception ==========
2018-11-03 10:10:16 - Traceback (most recent call last):
2018-11-03 10:10:16 -   File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 50, in _run_chunk
2018-11-03 10:10:16 -     chunk.run()
2018-11-03 10:10:16 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_meta_bins.py", line 82, in run
2018-11-03 10:10:16 -     self.do_local_assembly(ctg, asmdir)
2018-11-03 10:10:16 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_meta_bins.py", line 147, in do_local_assembly
2018-11-03 10:10:16 -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2018-11-03 10:10:16 -   File "build/bdist.linux-x86_64/egg/athena/assembler_tools/barcode_assembler/local.py", line 83, in assemble
2018-11-03 10:10:16 -     contig_path = self._do_idba_assembly(local_asm)
2018-11-03 10:10:16 -   File "build/bdist.linux-x86_64/egg/athena/assembler_tools/barcode_assembler/local.py", line 173, in _do_idba_assembly
2018-11-03 10:10:16 -     raise Exception()
2018-11-03 10:10:16 - Exception
2018-11-03 10:10:16 -
2018-11-03 10:10:16 -
2018-11-03 10:10:16 - --starting logging AssembleMetaBinnedStep.bin.171 --
2018-11-03 10:10:16 - performing local assembly for 1 seeds
2018-11-03 10:10:16 - targeting 100x short-read subassembly coverage
2018-11-03 10:10:16 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2018-11-03 10:10:16 - assembling barcoded reads for seed NODE_2239_length_1159_cov_10.361413
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/bin/athena-meta", line 11, in <module>
    load_entry_point('athena==1.1', 'console_scripts', 'athena-meta')()
  File "build/bdist.linux-x86_64/egg/main.py", line 104, in main
  File "build/bdist.linux-x86_64/egg/main.py", line 36, in run
  File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 33, in run_stage
  File "build/bdist.linux-x86_64/egg/athena/cluster.py", line 43, in map
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception

Why raise the exception instead of just skipping that subassembly? Currently, it appears that all bins have to result in contigs or else athena_meta will die.

I'm also getting errors such as the following if I use >1 process for cluster_settings with multiprocessing:

2018-11-03 10:15:39 - performing local assemblies
2018-11-03 10:15:39 -   - skipping filtered contig NODE_66_length_45138_cov_17.962691
2018-11-03 10:15:39 - assembling with neighbor NODE_13841_length_362_cov_30.302932
2018-11-03 10:15:39 -   - 31 orig barcodes
2018-11-03 10:15:39 -   - 31 downsampled barcodes
2018-11-03 10:15:39 -   - 13.381294964x estimated local coverage
2018-11-03 10:15:39 -   - 2 min_support required
2018-11-03 10:15:39 - performing local assemblies
2018-11-03 10:15:39 -   - skipping filtered contig NODE_1669_length_1417_cov_63.982379
2018-11-03 10:15:39 - assembling with neighbor None
2018-11-03 10:15:39 -   - 1289 orig barcodes
2018-11-03 10:15:39 -   - 1289 downsampled barcodes
2018-11-03 10:15:39 -   - 38.04x estimated local coverage
2018-11-03 10:15:39 -   - 2 min_support required
2018-11-03 10:15:39 -   - 0 pass reciprocal filtering
2018-11-03 10:15:39 -   - 1 pass reciprocal filtering
2018-11-03 10:15:39 - root-ctg:NODE_368_length_6660_cov_9.626798;numreads:733;checks:2;trunc-checks:False;asms:0;trunc-asms:False
2018-11-03 10:15:39 -   - found 1 candidates
2018-11-03 10:15:39 - performing local assemblies
2018-11-03 10:15:39 - assembling with neighbor None
2018-11-03 10:15:39 -   - 375 orig barcodes
2018-11-03 10:15:39 -   - 375 downsampled barcodes
2018-11-03 10:15:39 -   - 16.509009009x estimated local coverage
2018-11-03 10:15:39 -   - 2 min_support required
2018-11-03 10:15:39 - root-ctg:NODE_5174_length_686_cov_28.564184;numreads:210;checks:4;trunc-checks:False;asms:1;trunc-asms:False
2018-11-03 10:15:39 -   - found 2 candidates
2018-11-03 10:15:39 - performing local assemblies
2018-11-03 10:15:39 - assembling with neighbor NODE_29_length_78920_cov_25.474697
2018-11-03 10:15:39 -   - 15 orig barcodes
2018-11-03 10:15:39 -   - 15 downsampled barcodes
2018-11-03 10:15:39 -   - 0.466409120889x estimated local coverage
2018-11-03 10:15:39 -   - 2 min_support required
2018-11-03 10:15:45 - ========== Exception ==========
2018-11-03 10:15:45 - Traceback (most recent call last):
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 50, in _run_chunk
2018-11-03 10:15:45 -     chunk.run()
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_meta_bins.py", line 82, in run
2018-11-03 10:15:45 -     self.do_local_assembly(ctg, asmdir)
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_meta_bins.py", line 147, in do_local_assembly
2018-11-03 10:15:45 -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/assembler_tools/barcode_assembler/local.py", line 83, in assemble
2018-11-03 10:15:45 -     contig_path = self._do_idba_assembly(local_asm)
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/assembler_tools/barcode_assembler/local.py", line 127, in _do_idba_assembly
2018-11-03 10:15:45 -     local_asm.ds_bcode_set,
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/assembler_tools/barcode_assembler/local.py", line 546, in get_bcode_reads
2018-11-03 10:15:45 -     for _, qname, lines in idx.get_reads(bcode):
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/mlib/fq_idx.py", line 110, in get_reads
2018-11-03 10:15:45 -     for e in util.fastq_iter(f):
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/mlib/util.py", line 138, in fastq_iter
2018-11-03 10:15:45 -     qname, info_map = _get_qname_info(lines[0])
2018-11-03 10:15:45 -   File "build/bdist.linux-x86_64/egg/athena/mlib/util.py", line 129, in _get_qname_info
2018-11-03 10:15:45 -     qname = elms[0]
2018-11-03 10:15:45 - IndexError: list index out of range
2018-11-03 10:15:45 -
2018-11-03 10:15:45 - list index out of range
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/bin/athena-meta", line 11, in <module>
    load_entry_point('athena==1.1', 'console_scripts', 'athena-meta')()
  File "build/bdist.linux-x86_64/egg/main.py", line 104, in main
  File "build/bdist.linux-x86_64/egg/main.py", line 36, in run
  File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 33, in run_stage
  File "build/bdist.linux-x86_64/egg/athena/cluster.py", line 43, in map
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
IndexError: list index out of range

Support gzip'ed read input?

I was just checking fq_idx.py, and I noticed that you have:

    assert not self.fq_path.endswith('.gz'), \
      "gzipped fq not supported"
    with open(self.fq_path) as f:

With just a couple more lines of code, you could support gzip'ed input fastq files:

import gzip

# open transparently, whether the fastq is plain text or gzip-compressed
if self.fq_path.endswith('.gz'):
    _open = gzip.open
else:
    _open = open
with _open(self.fq_path) as f:

Missing scaffolds.fasta

Hello,
I get a missing scaffolds.fasta error, as follows:

My data was assembled with megahit (rather than metaspades).

[2019-08-13 18:45:16] INFO: Assembly statistics:

        Total length:   31516514
        Fragments:      364
        Fragments N50:  425866
        Largest frg:    1399380
        Scaffolds:      1
        Mean coverage:  6

[2019-08-13 18:45:16] INFO: Final assembly: /home/david/files/athena/10X-GivMock/results/olc/flye-asm-1/assembly.fasta
2019-08-13 18:45:16 - ERROR - ========== Exception ==========
2019-08-13 18:45:16 - ERROR - Traceback (most recent call last):
2019-08-13 18:45:16 - ERROR -   File "/home/david/work/miniconda3/envs/python2.7/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-08-13 18:45:16 - ERROR -     chunk.run()
2019-08-13 18:45:16 - ERROR -   File "/home/david/work/miniconda3/envs/python2.7/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 162, in run
2019-08-13 18:45:16 - ERROR -     shutil.copy(flye_contigs_path, final_fa_path)
2019-08-13 18:45:16 - ERROR -   File "/home/david/work/miniconda3/envs/python2.7/lib/python2.7/shutil.py", line 133, in copy
2019-08-13 18:45:16 - ERROR -     copyfile(src, dst)
2019-08-13 18:45:16 - ERROR -   File "/home/david/work/miniconda3/envs/python2.7/lib/python2.7/shutil.py", line 96, in copyfile
2019-08-13 18:45:16 - ERROR -     with open(src, 'rb') as fsrc:
2019-08-13 18:45:16 - ERROR - IOError: [Errno 2] No such file or directory: 'files/athena/10X-testMock/results/olc/flye-asm-1/scaffolds.fasta'
2019-08-13 18:45:16 - ERROR -
2019-08-13 18:45:16 - ERROR - [Errno 2] No such file or directory: 'files/athena/10X-testMock/results/olc/flye-asm-1/scaffolds.fasta'

Sample json

Dear Athena,
I am not familiar with the JSON format. Do you have an example config for the required input files?
Thanks,
Juan
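A hypothetical sketch of what such a config could look like: the top-level key names are assumptions based on the athena_meta README and should be verified against it, while the cluster_settings block mirrors the fragment shown in the "Distinct Cluster Settings" issue above.

# NOTE: the key names below are assumptions; verify them against the athena_meta README before use
cat > config.json <<'EOF'
{
  "ctgfasta_path": "seed-contigs.fa",
  "reads_ctg_bam_path": "align-reads.seed-contigs.bam",
  "input_fqs": "barcoded-reads.fq",
  "cluster_settings": {
    "cluster_type": "multiprocessing",
    "processes": 4
  }
}
EOF
athena-meta --config config.json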

AttributeError: 'NoneType' object has no attribute 'error'

I'm getting the following error:

Creating config
Running athena-meta
============================== check_reads ==============================
1 chunks to run. Starting...
2020-08-01 22:26:48 - INFO - index fastq /ebio/abt3_scratch/haptag-mg_49025743982/athena/0.5/2/2000000/sim_reads/1/R1-2.fq
2020-08-01 22:27:26 - INFO - get seed contigs from input assembly
2020-08-01 22:27:26 - INFO - computing seed coverages (required pass thru *bam)
2020-08-01 22:28:00 - INFO -   83160 total inputs seeds covering 100951176 bases
2020-08-01 22:28:00 - INFO -   430 input seed contigs >= 400bp and >= 10.0x coverage covering 9983886 bases
2020-08-01 22:28:00 - INFO - created 431 bins from seeds
2020-08-01 22:28:00 - INFO - done
--> check_reads completed.

============================== subassemble_reads ==============================
431 chunks to run. Starting...
2020-08-01 22:28:01 - INFO - finished subassembly bin.28
2020-08-01 22:28:01 - INFO - finished subassembly bin.0
2020-08-01 22:28:01 - INFO - finished subassembly bin.98
2020-08-01 22:28:01 - INFO - finished subassembly bin.14
2020-08-01 22:28:01 - INFO - finished subassembly bin.70
2020-08-01 22:28:01 - INFO - finished subassembly bin.42
2020-08-01 22:28:01 - INFO - finished subassembly bin.84
2020-08-01 22:28:01 - INFO - finished subassembly bin.15
2020-08-01 22:28:01 - INFO - finished subassembly bin.85
2020-08-01 22:28:01 - INFO - finished subassembly bin.1
2020-08-01 22:28:01 - INFO - finished subassembly bin.43
2020-08-01 22:28:02 - INFO - finished subassembly bin.16
2020-08-01 22:28:02 - INFO - finished subassembly bin.86
2020-08-01 22:28:02 - INFO - finished subassembly bin.2
2020-08-01 22:28:02 - INFO - finished subassembly bin.17
2020-08-01 22:28:02 - INFO - finished subassembly bin.44
2020-08-01 22:28:02 - ERROR - ========== Exception ==========
2020-08-01 22:28:02 - ERROR - Traceback (most recent call last):
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2020-08-01 22:28:02 - ERROR -     chunk.run()
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2020-08-01 22:28:02 - ERROR -     self.do_local_assembly(ctg, asmdir)
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 147, in do_local_assembly
2020-08-01 22:28:02 - ERROR -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2020-08-01 22:28:02 - ERROR -     contig_path = self._do_idba_assembly(local_asm)
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 127, in _do_idba_assembly
2020-08-01 22:28:02 - ERROR -     local_asm.ds_bcode_set,
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 546, in get_bcode_reads
2020-08-01 22:28:02 - ERROR -     with FastqIndex(fq_path) as idx:
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/mlib/fq_idx.py", line 46, in __init__
2020-08-01 22:28:02 - ERROR -     self.__load_index__()
2020-08-01 22:28:02 - ERROR -   File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/mlib/fq_idx.py", line 120, in __load_index__
2020-08-01 22:28:02 - ERROR -     self.logger.error('error loading barcode FASTQ index {}, remove and retry'.format(
2020-08-01 22:28:02 - ERROR - AttributeError: 'NoneType' object has no attribute 'error'
2020-08-01 22:28:02 - ERROR -
2020-08-01 22:28:02 - ERROR - 'NoneType' object has no attribute 'error'
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/bin/athena-meta", line 10, in <module>
    sys.exit(main())
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/main.py", line 211, in main
    run(options)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/main.py", line 42, in run
    runner.run_stage(stage, stage_name)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
    cluster.map(_run_chunk, to_run)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
    return pool.map_async(fn, args).get(9999999)
  File "/ebio/abt3_projects/software/dev/haptag-mg/data_sim/.snakemake/conda/a7933a08/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'error'
2020-08-01 22:28:02 - INFO - finished subassembly bin.3

This may be a low memory error, but I'm not sure.

conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
athena_meta               1.3                        py_1    bioconda
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
backports.shutil_get_terminal_size 1.0.0                      py_3    conda-forge
backports_abc             0.5                        py_1    conda-forge
bwa                       0.7.17               hed695b0_7    bioconda
bx-python                 0.8.9            py27h213ead4_1    bioconda
bzip2                     1.0.8                h516909a_2    conda-forge
c-ares                    1.11.0               h470a237_1    bioconda
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2019.11.28       py27h8c360ce_1    conda-forge
configparser              3.7.3            py27h8c360ce_2    conda-forge
curl                      7.71.1               he644dc0_4    conda-forge
decorator                 4.4.2                      py_0    conda-forge
entrypoints               0.3             py27h8c360ce_1001    conda-forge
enum34                    1.1.10           py27h8c360ce_1    conda-forge
flye                      2.3.1                    py27_0    bioconda
futures                   3.3.0            py27h8c360ce_1    conda-forge
htslib                    1.9                  ha228f0b_7    bioconda
idba_subasm               1.1.3a1         py27pl526hf484d3e_0    bioconda
ipykernel                 4.10.0                   py27_1    conda-forge
ipyparallel               6.2.4            py27h8c360ce_0    conda-forge
ipython                   5.8.0                    py27_1    conda-forge
ipython-cluster-helper    0.6.4              pyh864c0ab_1    bioconda
ipython_genutils          0.2.0                      py_1    conda-forge
jupyter_client            5.3.4                    py27_1    conda-forge
jupyter_core              4.6.3            py27h8c360ce_1    conda-forge
krb5                      1.17.1               hfafb76e_2    conda-forge
ld_impl_linux-64          2.34                 hc38a660_9    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_4    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20191231         h46ee950_1    conda-forge
libev                     4.33                 h516909a_0    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 9.3.0               h24d8f2e_11    conda-forge
libgfortran-ng            7.5.0               hdf63c60_11    conda-forge
libgomp                   9.3.0               h24d8f2e_11    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libnghttp2                1.41.0               hab1572f_1    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_4    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.3.0               hdf63c60_11    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
netifaces                 0.10.4                   py27_1    bioconda
numpy                     1.11.3          py27he5ce36f_1207    conda-forge
openssl                   1.1.1g               h516909a_1    conda-forge
pathlib2                  2.3.5            py27h8c360ce_1    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
pexpect                   4.8.0            py27h8c360ce_1    conda-forge
pickleshare               0.7.5           py27h8c360ce_1001    conda-forge
pigz                      2.3.4                hed695b0_1    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
prompt_toolkit            1.0.15                     py_1    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pygments                  2.5.2                      py_0    conda-forge
pysam                     0.15.3           py27hda2845c_1    bioconda
python                    2.7.15          h5a48372_1011_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-lzo                1.12            py27h42e1302_1001    conda-forge
python_abi                2.7                    1_cp27mu    conda-forge
pyzmq                     19.0.0           py27h76efe43_1    conda-forge
readline                  8.0                  h46ee950_1    conda-forge
samtools                  1.9                 h10a08f8_12    bioconda
scandir                   1.10.0           py27hdf8410d_1    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
simplegeneric             0.8.1                      py_1    conda-forge
singledispatch            3.4.0.3               py27_1000    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.32.3               hcee41ef_1    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
tornado                   5.1.1           py27h14c3975_1000    conda-forge
traitlets                 4.3.3            py27h8c360ce_1    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_1    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zeromq                    4.3.2                he1b5a44_2    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
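
A note on the traceback above: the crash happens while athena is trying to log the message "error loading barcode FASTQ index ..., remove and retry" (self.logger happens to be None at that point), so the underlying problem looks more like a stale or truncated barcode FASTQ index than a memory issue. A possible workaround, under the assumption that the index is stored as sidecar file(s) next to the FASTQ (the exact suffix depends on athena's FastqIndex implementation, so inspect the directory first), is to delete the index and re-run:

import glob
import os

fq_path = '/path/to/R1-2.fq'  # the FASTQ named in the log above
for idx_path in glob.glob(fq_path + '.*'):  # assumed sidecar naming; check before deleting
    print('removing {}'.format(idx_path))
    os.remove(idx_path)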

MEGAHIT instead of metaspades?

I noticed in the Athena publication that the marine sediment read cloud sample was assembled with MEGAHIT prior to using Athena, but I couldn't find any documentation on how MEGAHIT was run or whether any additional post-processing was needed.

As MEGAHIT is quite a bit faster than metaSPAdes and uses less memory, is there any reason to prefer metaSPAdes (e.g. better assemblies)? And if we try MEGAHIT, are there any issues we should expect to run into? Thanks!

NOT in barcode sorted order

I'm running athena_meta 1.1, and I got the following error:

AssertionError: fastq R1-2.fq NOT in barcode sorted order. Ensure reads that share barcodes are in a block together

The README.rst states that the reads have to be interleaved, but I don't see anything in the documentation about having to sort reads by barcodes. Given that this is a requirement for athena_meta, do you have a helper script for sorting the reads by barcode?
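
There doesn't appear to be a bundled helper script for this, but barcode-sorting an interleaved FASTQ can be done with a short script. A minimal sketch, assuming the barcode is carried as a BX:Z: tag in the read header (as in longranger basic output) and that the file fits in memory (for large metagenomic datasets a disk-based sort would be needed):

from itertools import islice

def read_pairs(path):
    """Yield (barcode, 8 lines) for each read pair of an interleaved FASTQ."""
    with open(path) as f:
        while True:
            rec = list(islice(f, 8))  # one interleaved read pair = 8 lines
            if not rec:
                break
            bcode = ''
            for field in rec[0].split():
                if field.startswith('BX:Z:'):  # assumed barcode tag in the header
                    bcode = field
                    break
            yield bcode, rec

pairs = sorted(read_pairs('R1-2.fq'), key=lambda p: p[0])  # group records by barcode
with open('R1-2.bcsorted.fq', 'w') as out:
    for _, rec in pairs:
        out.writelines(rec)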

flye-polish error

[2019-09-03 09:21:36] DEBUG: Contig: 12: 12
[2019-09-03 09:21:36] DEBUG: Writing Dot
[2019-09-03 09:21:36] DEBUG: Writing FASTA
[2019-09-03 09:21:36] DEBUG: Writing Gfa
-----------End assembly log------------
[2019-09-03 09:21:36] root: INFO: Polishing genome (1/1)
[2019-09-03 09:21:36] root: INFO: Running Minimap2
[2019-09-03 09:21:36] root: DEBUG: Running: flye-minimap2 /in/results/olc/flye-asm-1/2-repeat/graph_paths.fasta /in/results/olc/flye-input-contigs.fa -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 32 -Hk19
[2019-09-03 09:21:38] root: DEBUG: Sorting alignment file
[2019-09-03 09:21:38] root: INFO: Separating alignment into bubbles
[2019-09-03 09:22:13] root: DEBUG: Alignment error rate: 0.00114378338344
[2019-09-03 09:22:13] root: DEBUG: Generated 326301 bubbles
[2019-09-03 09:22:13] root: DEBUG: Split 0 long bubbles
[2019-09-03 09:22:13] root: DEBUG: Skipped 0 empty bubbles
[2019-09-03 09:22:13] root: DEBUG: Skipped 0 bubbles with long branches
[2019-09-03 09:22:13] root: INFO: Correcting bubbles
[2019-09-03 09:22:23] root: ERROR: Command '['flye-polish', '-t', '32', '/in/results/olc/flye-asm-1/3-polishing/bubbles_1.fasta', '/Flye-2.3.1/flye/resource/pacbio_substitutions.mat', '/Flye-2.3.1/flye/resource/pacbio_homopolymers.mat', '/in/results/olc/flye-asm-1/3-polishing/consensus_1.fasta']' returned non-zero exit status -11

Hi Athena dev team,

I ran the Athena pipeline using the Docker image and got the error above. What kind of problem could cause an error like this? Any help or suggestions would be appreciated.

Thank you,
Tan.

Error: Cannot allocate memory

Hi,
I'm trying to run Athena on a dataset with 16 CPU cores and "--threads 16". I've previously run Athena with the same specifications successfully.
On checking the logs directory, it's the "SubassembleReadsStep.bin.1" that's giving the memory error.

$ cat SubassembleReadsStep.bin.1
2019-07-18 13:20:54 - DEBUG - --starting logging SubassembleReadsStep.bin.1 --
2019-07-18 13:20:54 - DEBUG - performing local assembly for 26 seeds
2019-07-18 13:20:54 - DEBUG - targeting 100x short-read subassembly coverage
2019-07-18 13:20:54 - DEBUG - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-07-18 13:20:54 - DEBUG - assembling barcoded reads for seed NODE_3594604_length_400_cov_20.831884
2019-07-18 13:22:03 - DEBUG - determing local assemblies
2019-07-18 13:22:14 - DEBUG - 2 initial link candidates to check
2019-07-18 13:22:14 - DEBUG - - 2 pass reciprocal filtering
2019-07-18 13:22:19 - DEBUG - root-ctg:NODE_3594604_length_400_cov_20.831884;numreads:52;checks:2;trunc-checks:False;asms:2;trunc-asms:False
2019-07-18 13:22:19 - DEBUG - - found 3 candidates
2019-07-18 13:22:24 - DEBUG - performing local assemblies
2019-07-18 13:22:24 - DEBUG - assembling with neighbor NODE_266230_length_1759_cov_19.257629
2019-07-18 13:22:24 - DEBUG - - 19 orig barcodes
2019-07-18 13:22:24 - DEBUG - - 19 downsampled barcodes
2019-07-18 13:22:24 - DEBUG - - 8.18351187704x estimated local coverage
2019-07-18 13:22:24 - DEBUG - - 2 min_support required
2019-07-18 13:22:52 - ERROR - ========== Exception ==========
2019-07-18 13:22:52 - ERROR - Traceback (most recent call last):
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-07-18 13:22:52 - ERROR - chunk.run()
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-07-18 13:22:52 - ERROR - self.do_local_assembly(ctg, asmdir)
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 147, in do_local_assembly
2019-07-18 13:22:52 - ERROR - local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2019-07-18 13:22:52 - ERROR - contig_path = self._do_idba_assembly(local_asm)
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 160, in _do_idba_assembly
2019-07-18 13:22:52 - ERROR - stderr=subprocess.PIPE,
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 394, in init
2019-07-18 13:22:52 - ERROR - errread, errwrite)
2019-07-18 13:22:52 - ERROR - File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 938, in _execute_child
2019-07-18 13:22:52 - ERROR - self.pid = os.fork()
2019-07-18 13:22:52 - ERROR - OSError: [Errno 12] Cannot allocate memory
2019-07-18 13:22:52 - ERROR -
2019-07-18 13:22:52 - ERROR - [Errno 12] Cannot allocate memory

force re-run of entire workflow

If the output files/directories exist, athena-meta will just skip those steps. For example:

============================== check_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== subassemble_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== assemble_olc ==============================
--> 0 chunks need to be run. Skipping...

...but there is no option to force a re-run from the start of the workflow. This could be dangerous for anyone who thinks they have re-run Athena when in fact it just found the old files/directories and skipped everything. It is also a bit annoying to have to manually delete the output files/directories for each re-run.
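
Until a --force option exists, one workaround is to remove the stage outputs before re-running. A small sketch, assuming the default ./results, ./working and ./logs directories that show up in the logs in this tracker (adjust the paths if your run is configured differently):

import os
import shutil

for d in ('results', 'working', 'logs'):  # assumed default output directories
    if os.path.isdir(d):
        print('removing {}'.format(d))
        shutil.rmtree(d)
# then re-run: athena-meta config.json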

How does Athena call other programs?

Without installing a virtualenv, I installed all the required software and added it to my environment PATH (adding /path/to/software/bin to ~/.bashrc). Athena itself was installed at ~/miniconda2/bin/athena-meta. When I run Athena, the output shows that the bwa command cannot be found:
$ athena-meta config.json
============================== bin_reads ==============================
1 chunks to run. Starting...
2018-11-19 21:11:04 - --starting logging BinMetaReadsStep --
2018-11-19 21:11:04 - get seed contigs from input assembly
2018-11-19 21:11:04 - computing seed coverages (required pass thru *bam)
2018-11-19 21:11:04 - 3 total inputs seeds covering 809 bases
2018-11-19 21:11:04 - 1 input seed contigs >= 400bp and >= 10.0x coverage covering 400 bases
2018-11-19 21:11:04 - created 2 bins from seeds
2018-11-19 21:11:04 - -> finished running step; time elapsed: 0:00:00.030149
2018-11-19 21:11:04 - --stopping logging--
--> bin_reads completed.

============================== index_reads ==============================
1 chunks to run. Starting...
2018-11-19 21:11:04 - --starting logging IndexReadsStep_tianchen --
2018-11-19 21:11:04 - index fastq /home/tianchen/10x/barcoded.fastq
2018-11-19 21:11:04 - done
2018-11-19 21:11:04 - -> finished running step; time elapsed: 0:00:00.004956
2018-11-19 21:11:04 - --stopping logging--
--> index_reads completed.

============================== assemble_bins ==============================
2 chunks to run. Starting...
2018-11-19 21:11:05 - --starting logging AssembleMetaBinnedStep.bin.0 --
2018-11-19 21:11:05 - performing local assembly for 1 seeds
2018-11-19 21:11:05 - --starting logging AssembleMetaBinnedStep.bin.1 --
2018-11-19 21:11:05 - targeting 100x short-read subassembly coverage
2018-11-19 21:11:05 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2018-11-19 21:11:05 - performing local assembly for 0 seeds
2018-11-19 21:11:05 - targeting 100x short-read subassembly coverage
2018-11-19 21:11:05 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2018-11-19 21:11:05 - assembling barcoded reads for seed k99_25
2018-11-19 21:11:05 - merging all outputs
2018-11-19 21:11:05 - done
2018-11-19 21:11:05 - removing bin directory ./working/bin.1
2018-11-19 21:11:05 - -> finished running step; time elapsed: 0:00:00.000714
2018-11-19 21:11:05 - --stopping logging--
2018-11-19 21:11:05 - determing local assemblies
2018-11-19 21:11:05 - 0 initial link candidates to check
2018-11-19 21:11:05 - - 0 pass reciprocal filtering
2018-11-19 21:11:05 - root-ctg:k99_25;numreads:136;checks:0;trunc-checks:False;asms:0;trunc-asms:False
2018-11-19 21:11:05 - - found 1 candidates
2018-11-19 21:11:05 - performing local assemblies
2018-11-19 21:11:05 - assembling with neighbor None
2018-11-19 21:11:05 - - 92 orig barcodes
2018-11-19 21:11:05 - - 92 downsampled barcodes
2018-11-19 21:11:05 - - 43.18x estimated local coverage
2018-11-19 21:11:05 - - 2 min_support required
number of threads 2
reads 184
long reads 10
seed contigs 1
extra reads 0
read_length 150
kmer 20
kmers 4233 4243
merge bubble 0
contigs: 3 n50: 293 max: 293 mean: 125 total length: 375 n80: 62
aligned 8 reads
confirmed bases: 79 correct reads: 0 bases: 0
distance mean 86 sd 0
seed contigs 1 local contigs 6
kmer 40
kmers 931 926
merge bubble 0
contigs: 1 n50: 293 max: 293 mean: 293 total length: 293 n80: 293
aligned 8 reads
confirmed bases: 79 correct reads: 0 bases: 0
distance mean 86 sd 0
seed contigs 1 local contigs 2
kmer 60
kmers 354 354
merge bubble 0
contigs: 1 n50: 293 max: 293 mean: 293 total length: 293 n80: 293
aligned 8 reads
confirmed bases: 79 correct reads: 0 bases: 0
distance mean 86 sd 0
seed contigs 1 local contigs 2
kmer 80
kmers 330 328
merge bubble 0
contigs: 1 n50: 293 max: 293 mean: 293 total length: 293 n80: 293
aligned 8 reads
confirmed bases: 79 correct reads: 0 bases: 0
distance mean 86 sd 0
seed contigs 1 local contigs 2
kmer 100
kmers 194 193
merge bubble 0
contigs: 1 n50: 0 max: 0 mean: 0 total length: 0 n80: 0
aligning seed contigs

0 aligned
hit 0 connected components
retained 0 of 0 contigs
reads 184
aligned 0 reads
distance mean -nan sd -nan
invalid insert distance
:::: -11
2018-11-19 21:11:05 - something funky happened while running idba_ud (got error code -11) but there's not much we can do so continuing
2018-11-19 21:11:05 - failed to produce correctly formatted contigs in ./working/bin.0/asm/k99_25/local-asm.0/contig.fa
2018-11-19 21:11:05 - - finished 1
2018-11-19 21:11:05 - merge long output contigs from local assemblies
2018-11-19 21:11:05 - contig path for local asm k99_25_None.0 not generated
2018-11-19 21:11:05 - - 0 contigs covering 0 bases
2018-11-19 21:11:05 - merging all outputs
2018-11-19 21:11:05 - done
2018-11-19 21:11:05 - removing bin directory ./working/bin.0
2018-11-19 21:11:05 - -> finished running step; time elapsed: 0:00:00.428354
2018-11-19 21:11:05 - --stopping logging--
--> assemble_bins completed.

============================== assemble_olc ==============================
1 chunks to run. Starting...
2018-11-19 21:11:05 - --starting logging AssembleOLCStep --
2018-11-19 21:11:05 - jointly overlap-assemble
2018-11-19 21:11:05 - merge input contigs
cmd bwa mem -t 4 /home/tianchen/10x/final.contigs.fa ./results/olc/pre-flye-input-contigs.fa | samtools view -bS - | samtools sort -o ./results/olc/align-inputs.bam -
/bin/sh: bwa: command not found
cmd samtools index ./results/olc/align-inputs.bam
2018-11-19 21:11:05 - filter short subassembled contigs and merge with seeds
2018-11-19 21:11:05 - ========== Exception ==========
2018-11-19 21:11:05 - Traceback (most recent call last):
2018-11-19 21:11:05 - File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/pipeline.py", line 50, in _run_chunk
2018-11-19 21:11:05 - chunk.run()
2018-11-19 21:11:05 - File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 96, in run
2018-11-19 21:11:05 - premergedfiltfa_path,
2018-11-19 21:11:05 - File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 160, in filter_inputs
2018-11-19 21:11:05 - ctg_size_map = util.get_fasta_sizes(mergedfa_path)
2018-11-19 21:11:05 - File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/mlib/util.py", line 151, in get_fasta_sizes
2018-11-19 21:11:05 - fasta = pysam.FastaFile(fa_path)
2018-11-19 21:11:05 - File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.cinit
2018-11-19 21:11:05 - File "pysam/libcfaidx.pyx", line 183, in pysam.libcfaidx.FastaFile._open
2018-11-19 21:11:05 - IOError: error when opening file ./results/olc/pre-flye-input-contigs.fa
2018-11-19 21:11:05 -
2018-11-19 21:11:05 - error when opening file ./results/olc/pre-flye-input-contigs.fa
Traceback (most recent call last):
File "/home/tianchen/miniconda2/bin/athena-meta", line 11, in
sys.exit(main())
File "/home/tianchen/miniconda2/lib/python2.7/site-packages/main.py", line 104, in main
run(options)
File "/home/tianchen/miniconda2/lib/python2.7/site-packages/main.py", line 36, in run
runner.run_stage(stage, stage_name)
File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
cluster.map(_run_chunk, to_run)
File "/home/tianchen/miniconda2/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
return pool.map_async(fn, args).get(9999999)
File "/home/tianchen/miniconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
IOError: error when opening file ./results/olc/pre-flye-input-contigs.fa

The error also shows a failure to open ./results/olc/pre-flye-input-contigs.fa.
I think I need help, thanks!

Log files are not closed properly

Hello, Alex!
I found that log files are not closed when they should be. I have a limit of 1000 open file handles, and athena_meta inevitably crashes with the error "Too many open files".
This sloppy quickfix worked for me (but I'm sure there is a better solution):
In log.py:

  def open_log(self):
    if self.log_file is None:
      self.log_file = open(self.log_path, "w")

to

  def open_log(self):
    pass

and

in def log(self, message):

    self.log_file.write(message_str)
    self.log_file.flush()

to

   with open(self.log_path, "r+") as log.file:
       log_file.write(message_str)
       log_file.flush()

SLURM settings not used during flye olc

In the JSON I set additional parameters for SLURM:
"cluster_settings": { "cluster_type": "IPCluster", "processes": 36, "cluster_options": { "scheduler": "slurm", "queue": "PRI_Std", "extra_params": {"mem":200, "cpus-per-task":36, "time":"3-24:00:00"} }
These settings are, however, ignored during the flye step of the OLC stage, even though flye itself is run with -t 36:
flye-assemble -l /.../results/olc/flye-asm-1/flye.log -t 36

The SLURM sbatch is single core:
Priority=5000 Nice=0 Account=pri QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=10:21:12 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2019-03-24T23:39:34 EligibleTime=2019-03-24T23:39:35
StartTime=2019-03-24T23:39:35 EndTime=2019-03-25T23:39:36 Deadline=N/A
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=200G,node=1

This means that, because 1 CPU is not enough to finish the job within 1 day, SLURM cancels the job.
Could you indicate where this part of the pipeline is located in the scripts? I might be able to fix it myself then.

Flye error

Dear Athena,
I am running Athena-meta on linked-read data through Docker. I started the run and everything seemed to be running OK, but then it crashed with the following error:

2019-02-09 12:34:10 - Command 'flye --subassemblies ./results/olc/flye-input-contigs.fa --out-dir ./results/olc/flye-asm-1 --genome-size 41031 --threads 64 --min-overlap 1000' returned non-zero exit status 1
Exception in thread Thread-9:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 389, in _handle_results
task = get()
TypeError: ('__init__() takes at least 3 arguments (1 given)', <class 'subprocess.CalledProcessError'>, ())

Please help,
Juan

Barcodes format

It seems that Athena does not accept barcodes that do not end with "-1". Is this suffix important? Our data does not follow this convention, so I had to manually modify the input FASTQ files.
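
For anyone hitting the same thing, a minimal sketch of the kind of header rewrite involved, assuming the barcode is carried as a BX:Z: tag in the FASTQ header comment (a hypothetical layout; adapt it to however your barcodes are actually stored):

def fix_header(line):
    """Append '-1' to a BX:Z: barcode that lacks the 10x-style suffix."""
    fields = line.rstrip('\n').split()
    for i, field in enumerate(fields):
        if field.startswith('BX:Z:') and not field.endswith('-1'):
            fields[i] = field + '-1'
    return ' '.join(fields) + '\n'

with open('reads.fq') as fin, open('reads.fixed.fq', 'w') as fout:
    for lineno, line in enumerate(fin):
        fout.write(fix_header(line) if lineno % 4 == 0 else line)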

Installation fails due to old pysam version

Hi,

The installation failed due to htslib / pysam version 0.9.0. I get the following build error:
C -Ipysam -I. -Ihtslib -I/lb/project/mugqic/analyste_dev/software/python/Python-2.7.13/include/python2.7 -c htslib/hfile_libcurl.c -o build/temp.linux-x86_64-2.7/htslib/hfile_libcurl.o -Wno-unused -Wno-strict-prototypes -Wno-sign-compare -Wno-error=declaration-after-statement
htslib/hfile_libcurl.c: In function ‘easy_errno’:
htslib/hfile_libcurl.c:93:10: error: ‘CURLE_NOT_BUILT_IN’ undeclared (first use in this function)
case CURLE_NOT_BUILT_IN:
^
htslib/hfile_libcurl.c:93:10: note: each undeclared identifier is reported only once for each function it appears in

I resolved it by changing the version to 0.9.1 in the requirements.txt file.

Example config.json file missing

Hello Alex

I was wondering if you could provide an example of the config.json file you used to run the assembler, since it seems to be missing.

Thanks,
Andy

error when opening file `./results/olc/pre-flye-input-contigs.fa`

I got the following error when running athena-meta v1.1:

2018-11-01 22:02:32 - ========== Exception ==========
2018-11-01 22:02:32 - Traceback (most recent call last):
2018-11-01 22:02:32 -   File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 50, in _run_chunk
2018-11-01 22:02:32 -     chunk.run()
2018-11-01 22:02:32 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_olc.py", line 96, in run
2018-11-01 22:02:32 -     premergedfiltfa_path,
2018-11-01 22:02:32 -   File "build/bdist.linux-x86_64/egg/athena/stages/assemble_olc.py", line 160, in filter_inputs
2018-11-01 22:02:32 -     ctg_size_map = util.get_fasta_sizes(mergedfa_path)
2018-11-01 22:02:32 -   File "build/bdist.linux-x86_64/egg/athena/mlib/util.py", line 151, in get_fasta_sizes
2018-11-01 22:02:32 -     fasta = pysam.FastaFile(fa_path)
2018-11-01 22:02:32 -   File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
2018-11-01 22:02:32 -   File "pysam/libcfaidx.pyx", line 183, in pysam.libcfaidx.FastaFile._open
2018-11-01 22:02:32 - IOError: error when opening file `./results/olc/pre-flye-input-contigs.fa`
2018-11-01 22:02:32 -
2018-11-01 22:02:32 - error when opening file `./results/olc/pre-flye-input-contigs.fa`
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/bin/athena-meta", line 11, in <module>
    load_entry_point('athena==1.1', 'console_scripts', 'athena-meta')()
  File "build/bdist.linux-x86_64/egg/main.py", line 104, in main
  File "build/bdist.linux-x86_64/egg/main.py", line 36, in run
  File "build/bdist.linux-x86_64/egg/athena/pipeline.py", line 33, in run_stage
  File "build/bdist.linux-x86_64/egg/athena/cluster.py", line 43, in map
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/athena/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
IOError: error when opening file `./results/olc/pre-flye-input-contigs.fa`

./results/olc/pre-flye-input-contigs.fa is an empty file, so it appears that athena_meta dies if this occurs, and the traceback doesn't show that the problem is an empty file.

input generation

Hi,
This is not an issue per se, but I couldn't find any tool that generates the FASTQ file in the format required for Athena. Do you have any suggestions?

I was thinking of aligning the FASTQ with the standard tools (longranger/lariat) and extracting the barcode information from the BAM file.

Thank you in advance

Mat
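
In case it helps others, a minimal sketch of that BAM-based approach, assuming a barcode-aware aligner (longranger/lariat) has written the barcode into the BX tag of each alignment; pairing, read orientation and sorting are all glossed over here:

import pysam

bam = pysam.AlignmentFile('aligned_barcoded.bam', 'rb')  # hypothetical input name
with open('barcoded.fq', 'w') as out:
    for read in bam.fetch(until_eof=True):
        if read.is_secondary or read.is_supplementary or not read.has_tag('BX'):
            continue
        if read.query_qualities is None:
            continue
        quals = pysam.qualities_to_qualitystring(read.query_qualities)
        out.write('@{} BX:Z:{}\n{}\n+\n{}\n'.format(
            read.query_name, read.get_tag('BX'),
            read.query_sequence, quals))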

seed_ctgs when continuing from aborted run

I have a run that aborted midway through the pipeline; when resuming, I got this error:

============================== bin_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== index_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== assemble_bins ==============================
--> 0 chunks need to be run. Skipping...

============================== assemble_olc ==============================
1 chunks to run. Starting...
2018-09-30 13:41:25 - --starting logging AssembleOLCStep --
2018-09-30 13:41:25 - jointly assemble bins with OLC
2018-09-30 13:41:25 - merge input contigs
orig ctgs 13476
filtered ctgs 9755
2018-09-30 13:41:40 - ========== Exception ==========
2018-09-30 13:41:40 - Traceback (most recent call last):
2018-09-30 13:41:40 -   File "/ifs/scratch/bsiranos/miniconda3/envs/py27/lib/python2.7/site-packages/athena/pipeline.py", line 50, in _run_chunk
2018-09-30 13:41:40 -     chunk.run()
2018-09-30 13:41:40 -   File "/ifs/scratch/bsiranos/miniconda3/envs/py27/lib/python2.7/site-packages/athena/stages/assemble_olc.py", line 102, in run
2018-09-30 13:41:40 -     for ctg in seed_ctgs:
2018-09-30 13:41:40 - UnboundLocalError: local variable 'seed_ctgs' referenced before assignment

Adding this snippet of code (taken from the block above) to assemble_olc.py solved the issue:

if not os.path.isfile(mergedfiltfa_path):
  filter_inputs(
    mergedbam_path,
    premergedfa_path,
    premergedfiltfa_path,
  )

  # append input seed contigs
  with open(seedsfa_path, 'w') as fout:
    fasta = pysam.FastaFile(self.options.ctgfasta_path)
    bins = util.load_pickle(self.options.bins_pickle_path)
    seed_ctgs = set()
    for (binid, seed_group) in bins:
      for ctg in seed_group:
        seed_ctgs.add(ctg)
    for ctg in seed_ctgs:  
      seq = str(fasta.fetch(ctg).upper())
      for i in xrange(5):
      #for i in xrange(2):
        fout.write('>{}.{}\n'.format(ctg, i))
        fout.write(str(seq) + '\n')

  util.concat_files(
    [premergedfiltfa_path, seedsfa_path], 
    mergedfiltfa_path,
  )
assert is_valid_fasta(mergedfiltfa_path), "merge FASTA not valid"

Ns in input fasta

Athena does not accept Ns in input seed contigs - should this be allowed?

Possible fix: update the first line of the is_valid_fasta function in stages/assemble_olc.py so that N is also accepted, as sketched below.
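
A minimal sketch of that idea (a hypothetical version of is_valid_fasta that simply adds N to the allowed characters; the real function in athena/stages/assemble_olc.py may look different):

import pysam

ALLOWED_BASES = set('ACGTN')  # include N instead of rejecting it

def is_valid_fasta(fa_path):
    """Hypothetical validity check that tolerates Ns in the input sequences."""
    try:
        fasta = pysam.FastaFile(fa_path)
    except IOError:
        return False
    for name in fasta.references:
        if not set(fasta.fetch(name).upper()) <= ALLOWED_BASES:
            return False
    return True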
