
tibhannover / bacdiver


Unofficial R client for the DSMZ's Bacterial Diversity Metadatabase (former contact: @katrinleinweber). https://api.bacdive.dsmz.de/client_examples seems to be the official alternative.

Home Page: https://TIBHannover.GitHub.io/BacDiveR/

License: MIT License

Languages: R 98.05%, Makefile 1.95%

Topics: r, microorganism, bacterial-database, bacteriology, webservice-client, microbiology, biobank, r-package, rstats, bacterial-samples

bacdiver's People

Contributors

axel-klinger, katrinleinweber


bacdiver's Issues

Write management plans

https://figshare.com/articles/Managing_Research_Software_Development_better_software_better_research/5930662 p24f & http://www.software.ac.uk/software-management-plans

  • What software will you write?
  • What will your software do?
  • Will your software have a name?
  • Who are the intended users of your software?
  • Is it for one type of user or for many?
  • What expertise is required?
  • How will you make your software available?
  • How will your software contribute to research, and how will you measure its contribution?

Compare temp data to https://zenodo.org/record/1175609

  • check whether that dataset has a different source
    • partially from BacDive => reproduce the results
  • write a vignette about extracting growth temperatures from that dataset and through BacDiveR, and then mention @mengqvist
    • parse his dataset and try to retrieve the same species from BacDive

split retrieve_IDs off from retrieve_data()

Extract this into a separate function? If yes, by scraping IDs from the paged URL returns (as in the official examples), or by storing the URLs as an intermediate result and providing helper functions to narrow that result down to the IDs?

Or, implement it as an internal loop-back in retrieve_data(…, searchType = "taxon"), based on a new parameter taxon_data = TRUE? (A rough sketch of the separate-function option follows after the list below.)

  • ask whether only the taxon search can return multiple IDs
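A minimal sketch of the separate-function option, assuming the paged response exposes next and results fields with per-dataset URLs (the helper and those field names are assumptions, not the package's current API; authentication is omitted for brevity):

```r
library(jsonlite)

# Sketch only: page through a search URL and collect the dataset IDs.
# The response fields ("next", "results", "url") are assumptions about
# the paged BacDive API, not verified against it.
retrieve_IDs <- function(search_url) {
  ids <- character(0)
  url <- search_url
  while (!is.null(url)) {
    page <- fromJSON(url, simplifyVector = FALSE)
    ids  <- c(ids, vapply(page$results,
                          function(r) sub(".*/(\\d+)/?$", "\\1", r$url),
                          character(1)))
    url  <- page[["next"]]  # NULL on the last page ends the loop
  }
  ids
}
```

retrieve_data() could then either call this internally or leave the looping to the user, e.g. lapply(retrieve_IDs(url), retrieve_data).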

Remove invalid \n in JSON

While implementing #31 and switching from rjson to jsonlite, I noticed that some fields contain insufficiently escaped \n characters. This results in a lexical error: invalid character inside string.

@ceb15: Please consider ensuring that those are escaped as \\n already in BacDive or (I presume) during JSON serialisation.

(screenshot of the resulting parse error, 2018-03-20)

I'll parse them away for now.
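Until that is fixed upstream, a possible client-side workaround is to strip the stray newlines before parsing; a minimal sketch, assuming the raw JSON is already available as a string:

```r
library(jsonlite)

# Workaround sketch: replace literal newlines in the raw JSON with spaces
# before parsing, so jsonlite no longer hits "lexical error: invalid
# character inside string". Whitespace between JSON tokens is harmless,
# and literal control characters inside strings are invalid anyway.
parse_bacdive_json <- function(raw_json) {
  cleaned <- gsub("\n", " ", raw_json, fixed = TRUE)
  fromJSON(cleaned)
}
```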

Make taxon search more prominent?

Assuming the vast majority ("90%") of BacDive users look up data about a strain, bacdive_id as the default search type may not be that useful.

Maybe rather a retrieve_taxon_data("…", filter_by = c("property_A", "prop_B", "C")) function?
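A rough sketch of how such a wrapper could look, assuming it simply delegates to the existing retrieve_data(…, searchType = "taxon") and that filter_by names top-level sections of each dataset (both are assumptions, not current behaviour):

```r
# Illustrative wrapper only: delegate to the existing taxon search and
# optionally keep just the requested top-level sections of each dataset.
retrieve_taxon_data <- function(taxon, filter_by = NULL) {
  datasets <- retrieve_data(taxon, searchType = "taxon")
  if (is.null(filter_by)) return(datasets)
  lapply(datasets, function(d) d[intersect(filter_by, names(d))])
}
```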

randomise test searches

https://bacdive.dsmz.de/api/bacdive/example uses some specific search terms. If the automated tests use these as well, the DSMZ's internal statistics about popular datasets might be skewed. Maybe they already are, and this is accounted for by the DSMZ.

  • ask whether any such statistics are collected

The test search terms could be randomised to avoid this problem: sample(seq(100000, 999999), size = 1) for IDs, acc <- paste(sample(LETTERS, size = 2), collapse = "") for accession prefixes, paste("DSM", round(int / 1000)) for DSM numbers, or similar (see the sketch after this list).

  • ask for max ranges

This would spread out the "popularity" inflation, but might require fine-tuning the seq ranges. Plus, it would assume continuous numbering on their end.

  • ask whether this is the case
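A sketch of such randomised test inputs, building on the snippets above (the ranges and formats are placeholders until the questions above are answered):

```r
# Randomised test search terms; the numeric ranges and formats are
# placeholders until the real ID / accession ranges are confirmed.
random_bacdive_id <- function() sample(seq(100000, 999999), size = 1)

random_accession <- function() {
  acc <- paste(sample(LETTERS, size = 2), collapse = "")
  paste0(acc, " ", sample(seq(1, 999999), size = 1))
}

random_DSM_number <- function() {
  int <- sample(seq(100000, 999999), size = 1)
  paste("DSM", round(int / 1000))
}
```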

aggregate datasets into useful structure before returning

noticed while working on #16

retrieve_data() currently appends multiple downloads into one continuous list in which the individual datasets can't be addressed anymore. We need a data structure that lets the user $-address the datasets and their fields. Ideally, each dataset is referred to by its bacdive_id as the index. Something like a sparse list-of-lists?!?

ideas:

  • aggregate the JSON strings in a character vector, then rjson::fromJSON() them "in place", or in some way that creates the nested lists below / as lower hierarchies of that vector
  • write out each dataset to a file (as a kind of local cache), then maybe concatenate the files and re-import them as a useful data structure
  • use jsonlite to create one data frame per bacdive_id, then add those to a list (see the sketch below)
  • keep c()ombining downloads, but aggregate them into a higher-level list and use an apply variant to extract a field/element from the resulting "megastructure"
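For the jsonlite idea, a minimal sketch of aggregating parsed downloads into a named list keyed by ID, so that datasets stay $-addressable (the bacdive_id field in the parsed structure is an assumption):

```r
library(jsonlite)

# Sketch: parse each downloaded JSON string and store the results in a
# named list keyed by its ID. The "bacdive_id" field name is assumed.
aggregate_datasets <- function(json_strings) {
  parsed <- lapply(json_strings, fromJSON)
  ids <- vapply(parsed, function(d) as.character(d$bacdive_id), character(1))
  setNames(parsed, ids)
}

# Usage sketch (ID and section name illustrative):
# results <- aggregate_datasets(downloads)
# results[["717"]]$taxonomy
```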
