miga's People

Contributors

fangyuan0059, gunturus, lmrodriguezr, tkiryuti

miga's Issues

Re-registration of previous results not working

Some steps involve (un)zipping files from previous steps, and registering the result should trigger a re-registration. However, the presence of the previous .json file stops this from happening.

Also, this re-registration should be moved from the miga-base Ruby code to the execution Bash scripts. It may need a new --force option for miga add_result.

Clean assembly files

Datasets initiated from assemblies must have a cleanup step. This could happen either at registration or in the cds step (as is done for reads). Cleanup should remove characters matching [^A-Za-z0-9_-] from the defline before the first space (replace them with _?).
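
The cleanup described above could look roughly like the following. This is only a sketch: clean_defline is a hypothetical helper, not an existing miga-base method, and it assumes that all ASCII digits are allowed and that offending characters are replaced by _ rather than dropped.

```ruby
# Hypothetical helper (not part of miga-base): sanitize the sequence ID of a
# FastA defline, i.e. the text between '>' and the first whitespace.
# Assumption: characters outside A-Z, a-z, 0-9, '_' and '-' become '_'.
def clean_defline(line)
  line = line.chomp
  m = line.match(/\A>(\S*)(.*)\z/)
  # Sequence lines (anything not starting with '>') pass through unchanged
  return line if m.nil?
  # Only the ID is rewritten; the description after the first space is kept
  ">#{m[1].gsub(/[^A-Za-z0-9_-]/, '_')}#{m[2]}"
end
```

For example, a defline such as ">contig.1|x desc (v1.0)" would come out as ">contig_1_x desc (v1.0)", with the description left untouched.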

Daemon assumes empty PIDs are valid

The daemon should treat an empty PID as meaning that the task was not scheduled or launched. This does not appear to be resolved even after housekeeping!
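
A minimal sketch of the intended check, assuming each job is represented as a Hash with a :pid entry. The launched? helper and the job structure are illustrative only and do not reflect the actual daemon code.

```ruby
# Hypothetical check: a job only counts as launched if its PID is non-empty.
# nil, "" and whitespace-only values are all treated as "not launched".
def launched?(job)
  !job[:pid].to_s.strip.empty?
end

# Illustrative job list; dataset names and PIDs are made up.
jobs = [
  { ds: 'DS1', pid: '4021' },
  { ds: 'DS2', pid: '' },
  { ds: 'DS3', pid: nil }
]

# Jobs without a usable PID would go back to the queue instead of being
# counted as running.
running, to_relaunch = jobs.partition { |j| launched?(j) }
```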

Eliminate unimplemented expected results

The following results are not implemented and should be removed: :ess_phylogeny, :core_phylogeny, :clade_metadata. These could be implemented as plugins in the future, but should be removed from miga-base.

Downloading datasets should have multiple attempts

Network interruptions are common, so downloading datasets in batch shouldn't be stopped by a single failure. The download code should be protected, and multiple (3?) attempts should be made before aborting. Here's an excerpt of a failure:

Exception: Net::ReadTimeout

/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1108:in `gets'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:1113:in `readline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:290:in `getline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:301:in `getmultiline'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:319:in `getresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:339:in `voidresp'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:249:in `block in connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/net/ftp.rb:247:in `connect'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:762:in `buffer_open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:210:in `block in open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `catch'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:208:in `open_loop'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:704:in `open'
/usr/local/pacerepov1/ruby/2.1.5/gcc-4.9.0/lib/ruby/2.1.0/open-uri.rb:34:in `open'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:75:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:161:in `download'
/nv/hmicro1/lmr3/shared3/miga/lib/miga/remote_dataset.rb:128:in `save_to'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:80:in `block in <top (required)>'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `each'
/nv/hmicro1/lmr3/shared3/miga/actions/download_dataset.rb:62:in `<top (required)>'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `load'
/nv/hmicro1/lmr3/shared3/miga/bin/miga:136:in `<main>'
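
One way to protect the download is a small retry wrapper. The sketch below is an assumption about how this could be done, not the actual fix: with_retries is a hypothetical helper, and both the list of rescued exceptions and the default of 3 attempts are guesses based on the trace above.

```ruby
require 'net/protocol' # defines Net::ReadTimeout and Net::OpenTimeout

# Hypothetical helper (not part of miga-base): run the block up to
# `attempts` times, retrying only on the listed network errors.
def with_retries(attempts: 3, wait: 0,
                 errors: [Net::ReadTimeout, Net::OpenTimeout,
                          SocketError, Errno::ECONNRESET])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    # Give up and re-raise the last error once all attempts are used
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end
```

The download call would then be wrapped as with_retries { ... }, so a single Net::ReadTimeout triggers another attempt and the exception only propagates after the last attempt fails.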

Bundle all steps per dataset

The steps cds, essential_genes, and ssu usually spend longer in the queue than running, so it would make sense to bundle them together. This could be done by restructuring the preprocessing functions, but virtualizing the bundle is probably an easier workaround. Two options are:

  1. Create a "virtual" step only accessible from the daemon that runs them all
  2. Include sub-calls to essential_genes and ssu at the end of cds.

Option 1 appears preferable because it is less intrusive and more future-proof, but option 2 is easier to implement and probably to maintain. I'm currently leaning towards option 1.

Issue downloading draft Genomes

Hi,
I tried to download draft genomes using this command

miga get -P . --file test.tsv --universe web --db assembly_gz --ignore-dup --verbose -t genome

The content of test.tsv is

dataset comments ids
Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175 Assembly: GCA_000220175.2 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/220/175/GCA_000220175.2_ASM22017v1/GCA_000220175.2_ASM22017v1_genomic.fna.gz
Thaumarchaeota_archaeon_SCGC_AAA007_O23_GCA_000402075 Assembly: GCA_000402075.1 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/402/075/GCA_000402075.1_ASM40207v1/GCA_000402075.1_ASM40207v1_genomic.fna.gz

Here is the error I got

Dataset: Candidatus_Nitrosoarchaeum_koreensis_MY1_GCA_000220175
Loading project.
Locating remote dataset.
Creating dataset.
Exception: undefined local variable or method 'map_to' for MiGA::RemoteDataset:Class

/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:91:in 'download_net'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:63:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:188:in 'download'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/lib/miga/remote_dataset.rb:153:in 'save_to'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:80:in 'block in <top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in 'each'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/actions/get.rb:62:in '<top (required)>'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in 'load'
/var/lib/gems/2.3.0/gems/miga-base-0.3.1.2/bin/miga:205:in '<top (required)>'
/usr/local/bin/miga:23:in 'load'
/usr/local/bin/miga:23:in '<main>

Trust NCBI/EBI FastA files

When downloading FastA files directly from NCBI and EBI, the :assembly result should be marked clean: true to avoid unnecessary reformatting.

clades projects shouldn't have hAAI

First, hAAI is virtually useless for clades, since the assumption is that all genomes are too close for them to be resolved like this. Second, in the unlikely event that hAAI can resolve a pair, this would break :ogs.

Unregistered files

Some result files are left unregistered and remain after miga unlink_dataset -r:

  • data/02.trimmed_reads/DS.1.fastq.trimmed.summary.txt
  • data/04.trimmed_fasta/DS.1.fasta.gz
  • data/04.trimmed_fasta/DS.2.fasta.gz
  • data/05.assembly/DS

Daemon can die if a dataset is unlinked

The daemon can die (after just a warning) if a dataset is removed. This is probably caused by the dataset being listed in the memory-loaded list of datasets.

Add cloud computing to the manual

MiGA has been deployed in Amazon Web Services and in Google Cloud Computing. These experiences should be documented in the manual.
