MiGA: Microbial Genomes Atlas
License: Artistic License 2.0
Some result files are left unregistered and remain after miga unlink_dataset -r.
Method has too many lines. [35/30]
This would allow restarting daemons more safely and would avoid clutter in the background. It's also the least surprising behavior, so I'm tagging this as a bug report.
The following results are not implemented and should be removed: :ess_phylogeny, :core_phylogeny, and :clade_metadata. These could be implemented as plugins in the future, but should be removed from miga-base.
Hi,
I tried to download draft genomes using this command:
miga get -P . --file test.tsv --universe web --db assembly_gz --ignore-dup --verbose -t genome
The content of test.tsv is:
dataset comments ids
Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175 Assembly: GCA_000220175.2 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/220/175/GCA_000220175.2_ASM22017v1/GCA_000220175.2_ASM22017v1_genomic.fna.gz
Thaumarchaeota_archaeon_SCGC_AAA007_O23_GCA_000402075 Assembly: GCA_000402075.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/402/075/GCA_000402075.1_ASM40207v1/GCA_000402075.1_ASM40207v1_genomic.fna.gz
Here is the error I got:
Dataset: Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175
Loading project.
Locating remote dataset.
Creating dataset.
Exception: undefined local variable or method 'map_to' for MiGA::RemoteDataset:Class
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:91:in 'download_net'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:63:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:188:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:153:in 'save_to'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:80:in 'block in <top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in 'each'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in '<top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in 'load'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in '<top (required)>'
/usr/local/bin/miga:23:in 'load'
/usr/local/bin/miga:23:in '<main>'
When downloading FastA files directly from NCBI and EBI, the :assembly result should be clean: true to avoid unnecessary reformatting.
Network interruptions are common, so downloading datasets in batch shouldn't be stopped by a single failure. The download code should be protected, and multiple (3?) trials should be attempted before aborting. Here's an excerpt of a failure:
Exception: Net::ReadTimeout
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1108:in `gets'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1113:in `readline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:290:in `getline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:301:in `getmultiline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:319:in `getresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:339:in `voidresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:249:in `block in connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:247:in `connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:762:in `buffer_open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:210:in `block in open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `catch'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:704:in `open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:34:in `open'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:75:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:161:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:128:in `save_to'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:80:in `block in <top (required)>'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `each'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `<top (required)>'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `load'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `<main>'
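The retry protection proposed above could be sketched as a generic wrapper. This is a hypothetical helper, not miga-base code; the rescued exception classes and default trial count are assumptions:

```ruby
require 'net/http'  # defines Net::ReadTimeout and loads socket (SocketError)
require 'open-uri'  # for URI.open in the intended use

MAX_TRIALS = 3 # assumed default; the issue suggests ~3 attempts

# Retry the given block up to max_trials times on transient network
# errors, then let the last exception propagate.
def with_retries(max_trials = MAX_TRIALS)
  trials = 0
  begin
    trials += 1
    yield
  rescue Net::ReadTimeout, Errno::ECONNRESET, SocketError
    retry if trials < max_trials
    raise
  end
end

# Intended use inside the download code:
#   with_retries { URI.open(url, &:read) }
```

Wrapping only the transfer itself keeps a single flaky connection from aborting the whole batch, while still surfacing the error once all trials are exhausted.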
MiGA has been deployed on Amazon Web Services and on Google Cloud; these experiences should be documented in the manual.
It should be possible to add AAI/ANI estimation against a given reference dataset upon request.
Method has too many lines. [49/30]
https://codeclimate.com/github/bio-miga/miga/test/project_test.rb#issue_594185b6ad7118000100007c
Metagenomic datasets are treated differently by :essential_genes, but this is not reflected in the :stats implementation, which simply results in 0% completeness and 0% contamination.
The #{base}.html file remains orphaned.
Datasets initiated from assemblies must have a cleanup step. This could happen either at registration or in the cds step (as is done for reads). Cleanup should remove the characters [^A-Za-z0-9_-] from the defline before the first space (replacing them with _?).
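The proposed cleanup could look something like the sketch below. The function name is an assumption, and the choice of _ as the replacement character is the open question from the issue:

```ruby
# Replace disallowed characters in a FastA defline ID (the part before
# the first space) with '_', leaving any description text untouched.
# Hypothetical sketch, not MiGA's implementation.
def clean_defline(defline)
  body = defline.sub(/\A>/, '')
  id, desc = body.split(' ', 2)
  clean_id = id.gsub(/[^A-Za-z0-9_\-]/, '_')
  desc ? ">#{clean_id} #{desc}" : ">#{clean_id}"
end
```

Running this at registration would guarantee clean IDs for every downstream step, while running it in cds would mirror how reads are already handled.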
Identical code found in 1 other location (mass = 24)
https://codeclimate.com/github/bio-miga/miga/test/project_test.rb#issue_594185b9ad71180001000080
Manual documentation still has blank pages.
First, hAAI is virtually useless for clades, since the assumption is that all genomes are too close to be resolved like this. Second, in the unlikely event that hAAI can resolve a pair, this would break :ogs.
Also:
Some steps involve (un)zipping files from previous steps, and registering the result should trigger a re-registration. However, the presence of the previous .json file stops this from happening.
Also, this re-registration should be moved from the miga-base Ruby code to the execution Bash scripts. It may need a new --force option for miga add_result.
For Ruby libs and R packages
The daemon/status.json file includes ds keys in the jobs_running array with useless values.
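For illustration, the structure in question might look like the fragment below; apart from ds and jobs_running, which come from the description above, the field names and values are assumptions:

```json
{
  "jobs_running": [
    { "ds": null, "job": "cds", "pid": 12345 }
  ]
}
```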
The steps cds, essential_genes, and ssu usually take longer in queue than running, so it would make sense to bundle them together. This could be done by restructuring the preprocessing functions, but probably an easier way around would be to virtualize the bundle. Two options are: … running essential_genes and ssu at the end of cds. Option 1 appears to be preferable as less intrusive and more future-proof, but option 2 is easier to implement and probably easier to maintain. I'm currently leaning towards option 1.
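The bundling idea can be reduced to running the three steps as one queued job, so the dataset waits in the queue once instead of three times. This is a hypothetical sketch, not MiGA's scheduler code; only the step names come from the issue:

```ruby
# Steps to run back-to-back inside a single scheduled job.
BUNDLED_STEPS = [:cds, :essential_genes, :ssu].freeze

# Run each bundled step in order via the caller-supplied runner block,
# returning each step's result.
def run_bundle(dataset)
  BUNDLED_STEPS.map { |step| yield(dataset, step) }
end
```

Usage, with a runner that dispatches to whatever executes a single step:
with_each_queued_job replaced by one call such as run_bundle(ds) { |d, step| launch(d, step) }.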
Add options to create_dataset to input assembly file and possibly genes/proteins.
Class has too many lines. [269/250]
https://codeclimate.com/github/bio-miga/miga/lib/miga/gui.rb#issue_57047d24a4db61000101aeb8
Cyclomatic complexity for opt_object is too high. [13/6]
https://codeclimate.com/github/bio-miga/miga/bin/miga#issue_594185b6ad71180001000078
It would be easy to implement a local (bash) run of single steps in the CLI.
Daemon should understand that an empty PID means task not scheduled or launched. It seems this issue is not even solved after housekeeping!
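The check the daemon needs might look like the sketch below; the helper name is an assumption:

```ruby
# Return true only when pid refers to a live process. A nil or empty
# PID means the task was never scheduled or launched, so it must not
# be treated as running. Hypothetical helper, not MiGA's daemon code.
def job_running?(pid)
  return false if pid.nil? || pid.to_s.strip.empty?
  Process.kill(0, pid.to_i) # signal 0: existence check, sends nothing
  true
rescue Errno::ESRCH # no such process
  false
rescue Errno::EPERM # process exists but belongs to another user
  true
end
```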
In addition to file keys, this section should document the stats keys supported.
This has been long discussed internally, and it appears to be the easiest solution to the licensing hassle with MetaGeneMark.hmm.
The daemon can die (after just a warning) if a dataset is removed. This is probably caused by the dataset being listed in the memory-loaded list of datasets.