MiGA: Microbial Genomes Atlas
License: Artistic License 2.0
Some result files are left unregistered and remain after miga unlink_dataset -r.
Method has too many lines. [35/30]
This would allow restarting daemons more safely and would avoid clutter in the background. It's also the least surprising behavior, so I'm tagging this as a bug report.
The following results are not implemented and should be removed: :ess_phylogeny, :core_phylogeny, and :clade_metadata. These could be implemented as plugins in the future, but should be removed from miga-base.
Hi,
I tried to download draft genomes using this command:
miga get -P . --file test.tsv --universe web --db assembly_gz --ignore-dup --verbose -t genome
The content of test.tsv is:
dataset comments ids
Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175 Assembly: GCA_000220175.2 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/220/175/GCA_000220175.2_ASM22017v1/GCA_000220175.2_ASM22017v1_genomic.fna.gz
Thaumarchaeota_archaeon_SCGC_AAA007_O23_GCA_000402075 Assembly: GCA_000402075.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/402/075/GCA_000402075.1_ASM40207v1/GCA_000402075.1_ASM40207v1_genomic.fna.gz
Here is the error I got:
Dataset: Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175
Loading project.
Locating remote dataset.
Creating dataset.
Exception: undefined local variable or method 'map_to' for MiGA::RemoteDataset:Class
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:91:in 'download_net'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:63:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:188:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:153:in 'save_to'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:80:in 'block in <top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in 'each'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in '<top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in 'load'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in '<top (required)>'
/usr/local/bin/miga:23:in 'load'
/usr/local/bin/miga:23:in '<main>'
When downloading FastA files directly from NCBI and EBI, the :assembly result should be clean: true to avoid unnecessary reformatting.
Network interruptions are common, so downloading datasets in batch shouldn't be stopped by a single failure. The download code should be protected, and multiple (3?) trials should be attempted before aborting. Here's an excerpt of a failure:
Exception: Net::ReadTimeout
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1108:in `gets'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1113:in `readline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:290:in `getline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:301:in `getmultiline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:319:in `getresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:339:in `voidresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:249:in `block in connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:247:in `connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:762:in `buffer_open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:210:in `block in open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `catch'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:704:in `open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:34:in `open'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:75:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:161:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:128:in `save_to'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:80:in `block in <top (required)>'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `each'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `<top (required)>'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `load'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `<main>'
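The retry protection proposed above could be sketched as a generic wrapper. This is a hypothetical helper, not miga-base code; the rescued exception classes and default trial count are assumptions:

```ruby
require 'net/http'  # defines Net::ReadTimeout and loads socket (SocketError)
require 'open-uri'  # for URI.open in the intended use

MAX_TRIALS = 3 # assumed default; the issue suggests ~3 attempts

# Retry the given block up to max_trials times on transient network
# errors, then let the last exception propagate.
def with_retries(max_trials = MAX_TRIALS)
  trials = 0
  begin
    trials += 1
    yield
  rescue Net::ReadTimeout, Errno::ECONNRESET, SocketError
    retry if trials < max_trials
    raise
  end
end

# Intended use inside the download code:
#   with_retries { URI.open(url, &:read) }
```

Wrapping only the transfer itself keeps a single flaky connection from aborting the whole batch, while still surfacing the error once all trials are exhausted.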
MiGA has been deployed on Amazon Web Services and on Google Cloud; these experiences should be documented in the manual.
It should be possible to add AAI/ANI estimation against a given reference dataset upon request.
Method has too many lines. [49/30]
https://codeclimate.com/github/bio-miga/miga/test/project_test.rb#issue_594185b6ad7118000100007c
Metagenomic datasets are treated differently by :essential_genes, but this is not reflected in the :stats implementation, which simply results in 0% completeness and 0% contamination.
The #{base}.html file remains orphaned.
Datasets initiated from assemblies must have a cleanup step. This could happen either at registration or in the cds step (as is done for reads). Cleanup should remove the characters [^A-Za-z0-9_-] from the defline before the first space (replacing them with _?).
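The proposed cleanup could look something like the sketch below. The function name is an assumption, and the choice of _ as the replacement character is the open question from the issue:

```ruby
# Replace disallowed characters in a FastA defline ID (the part before
# the first space) with '_', leaving any description text untouched.
# Hypothetical sketch, not MiGA's implementation.
def clean_defline(defline)
  body = defline.sub(/\A>/, '')
  id, desc = body.split(' ', 2)
  clean_id = id.gsub(/[^A-Za-z0-9_\-]/, '_')
  desc ? ">#{clean_id} #{desc}" : ">#{clean_id}"
end
```

Running this at registration would guarantee clean IDs for every downstream step, while running it in cds would mirror how reads are already handled.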
Identical code found in 1 other location (mass = 24)
https://codeclimate.com/github/bio-miga/miga/test/project_test.rb#issue_594185b9ad71180001000080
Manual documentation still has blank pages.
First, hAAI is virtually useless for clades, since the assumption is that all genomes are too close to be resolved like this. Second, in the unlikely event that hAAI can resolve a pair, this would break :ogs.
Also:
Some steps involve (un)zipping files from previous steps, and registering the result should trigger a re-registration. However, the presence of the previous .json file stops this from happening.
Also, this re-registration should be moved from the miga-base Ruby code to the execution Bash scripts. It may need a new --force option for miga add_result.
For Ruby libs and R packages
The daemon/status.json file includes ds keys in the jobs_running array with useless values.
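For illustration, the structure in question might look like the fragment below; apart from ds and jobs_running, which come from the description above, the field names and values are assumptions:

```json
{
  "jobs_running": [
    { "ds": null, "job": "cds", "pid": 12345 }
  ]
}
```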
The steps cds, essential_genes, and ssu usually take longer in queue than running, so it would make sense to bundle them together. This could be done by restructuring the preprocessing functions, but probably an easier way around would be to virtualize the bundle. Two options are: … running essential_genes and ssu at the end of cds. Option 1 appears to be preferable as less intrusive and more future-proof, but option 2 is easier to implement and probably easier to maintain. I'm currently leaning towards option 1.
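The bundling idea can be reduced to running the three steps as one queued job, so the dataset waits in the queue once instead of three times. This is a hypothetical sketch, not MiGA's scheduler code; only the step names come from the issue:

```ruby
# Steps to run back-to-back inside a single scheduled job.
BUNDLED_STEPS = [:cds, :essential_genes, :ssu].freeze

# Run each bundled step in order via the caller-supplied runner block,
# returning each step's result.
def run_bundle(dataset)
  BUNDLED_STEPS.map { |step| yield(dataset, step) }
end
```

Usage, with a runner that dispatches to whatever executes a single step:
with_each_queued_job replaced by one call such as run_bundle(ds) { |d, step| launch(d, step) }.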
Add options to create_dataset to input assembly file and possibly genes/proteins.
Class has too many lines. [269/250]
https://codeclimate.com/github/bio-miga/miga/lib/miga/gui.rb#issue_57047d24a4db61000101aeb8
Cyclomatic complexity for opt_object is too high. [13/6]
https://codeclimate.com/github/bio-miga/miga/bin/miga#issue_594185b6ad71180001000078
It would be easy to implement a local (bash) run of single steps in the CLI.
Daemon should understand that an empty PID means task not scheduled or launched. It seems this issue is not even solved after housekeeping!
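The check the daemon needs might look like the sketch below; the helper name is an assumption:

```ruby
# Return true only when pid refers to a live process. A nil or empty
# PID means the task was never scheduled or launched, so it must not
# be treated as running. Hypothetical helper, not MiGA's daemon code.
def job_running?(pid)
  return false if pid.nil? || pid.to_s.strip.empty?
  Process.kill(0, pid.to_i) # signal 0: existence check, sends nothing
  true
rescue Errno::ESRCH # no such process
  false
rescue Errno::EPERM # process exists but belongs to another user
  true
end
```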
In addition to file keys, this section should document the stats keys supported.
This has been long discussed internally, and it appears to be the easiest solution to the licensing hassle with MetaGeneMark.hmm.
The daemon can die (after just a warning) if a dataset is removed. This is probably caused by the dataset being listed in the memory-loaded list of datasets.