zoon's Introduction

zoon is a package for the reproducible and shareable analysis of species distribution models, with a focus on comparing models and their diagnostic output.

An overview of the project can be found here. There is also a blog to keep collaborators up to date with progress, which can be found here.

zoon is still being developed. Feel free to clone and use the code, open issues, let us know what you want etc. But don't expect much functionality from the package yet. If you would like to add functionality, please start writing modules!

Basic usage

library(zoon)

# Run a workflow, specifying one module of each type.
work1 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)

# Get a list of modules
GetModuleList()

# Get help on a module
ModuleHelp(LogisticRegression)

Installation

zoon is now on CRAN, so you can install the stable(ish) version directly from R with:

install.packages('zoon')

You can also install the most recent development version of the package straight from GitHub using the devtools package:

devtools::install_github("zoonproject/zoon")

Contributing modules

zoon has a modular structure, and we are hoping for user-submitted modules. This allows zoon to keep up to date with the fast-moving SDM field in a way that a package maintained by a small team of developers can't. Modules are simple R scripts containing a single function and some metadata. They are currently kept here. The inputs and outputs of each module type are controlled; a brief description can be found at the end of the Build a module vignette. The function BuildModule is used to turn a function in an R session into a module.

Please note, zoon is still being developed. We would love you to contribute modules, but can't yet guarantee that there won't be major changes that might break modules. We will try to fix user-submitted modules if we break them.
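As a rough illustration of the "single function" part of a module, the sketch below fits a logistic regression to a data frame. The function name LogisticRegressionSketch, the 'value' response column and the simulated data are assumptions made for this example; it is not the module shipped in the zoon repository, and it leaves out the metadata that BuildModule attaches.

```r
# Hypothetical model-module function: takes a data frame with a 0/1
# response column called 'value' (an assumed name) plus covariate columns.
LogisticRegressionSketch <- function(df) {
  covariates <- setdiff(names(df), "value")
  # Build value ~ cov1 + cov2 + ... from the column names.
  modelFormula <- stats::reformulate(covariates, response = "value")
  stats::glm(modelFormula, data = df, family = stats::binomial)
}

# Simulated presence/background data, for illustration only.
set.seed(1)
df <- data.frame(value = rbinom(50, 1, 0.5),
                 temp  = rnorm(50),
                 rain  = rnorm(50))
fit <- LogisticRegressionSketch(df)
class(fit)
```

Keeping the function free of side effects (data frame in, fitted model out) is what makes it straightforward to swap one model module for another in a workflow.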

Notes for collaborators

We welcome collaboration and input from anyone who'd like to get involved! If you have any comments or suggestions, or you spot any bugs or errors, please let us know via the issue tracker. Pull requests are always welcome, though please let us know what you're developing first so we can plan how to integrate it into the main package.

We are committed to making zoon an inclusive project that the whole research community can contribute to and benefit from, and we ask all contributors (including the zoon development team) to stick to a code of conduct.

We are using the Google style guide with the exception that function description goes before the function name, not inside the function definition. We are using roxygen2 to document the package. Try to keep function names as verbs.

zoon's People

Contributors

augustt, doi90, goldingn, hadley, norival, rtbecard, timcdlucas

zoon's Issues

Output module input

Need to think about what information the output module might potentially need. I think process.output and model.output may contain everything (covariate and occurrence data are all in process). So need to rewrite sametimeplacemap module to match that.

Bundle enviro data

Calling NCEP or some other online environmental dataset is quite slow and therefore not ideal for examples (and annoying when rebuilding package/vignettes.)

Easiest is probably to download a dataset and bundle with package if that's allowed.

add examples

Firstly add the code from the bottom of the gist as an example for the workflow function.

It may be worth adding minimal examples for the other modules too, though we still haven't decided on the right type of repository for those, so that's not urgent at all.

Chain is not a real function

In fixing #40 we have prevented execution of functions given to arguments.

The easiest way to tell between list(mod1, mod2), Chain(mod1, mod2) and mod1 is to simply see if the first element of the substituted argument is 'list', 'Chain' or something else.

I think there's probably some reason why this is bad coding style.

Furthermore it means Chain is not a function that is ever actually called. It needs documentation though. So perhaps I need to make a phantom function or something.
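The check described above can be sketched in base R with substitute(), which captures the argument as written without evaluating it. ClassifyModuleArg is a hypothetical helper name, and the stub Chain() is only one possible way to give the name something to document; neither is taken from the package source.

```r
# Hypothetical helper: classify how a module argument was written,
# without evaluating it.
ClassifyModuleArg <- function(arg) {
  expr <- substitute(arg)
  if (is.call(expr)) {
    # First element of a call is the function being called.
    fnName <- as.character(expr[[1]])[1]
    if (fnName %in% c("list", "Chain")) return(fnName)
  }
  "single"
}

# A stub that exists only so that Chain has a help page; it is never
# meant to be called outside of a workflow.
Chain <- function(...) {
  stop("Chain() is only meaningful as an argument to workflow()")
}

kinds <- c(ClassifyModuleArg(list(mod1, mod2)),
           ClassifyModuleArg(Chain(mod1, mod2)),
           ClassifyModuleArg(mod1),
           ClassifyModuleArg(mod1(k = 2)))
kinds
```

Because nothing is evaluated, mod1 and mod2 do not need to exist in the session, which is the behaviour the fix for #40 was after.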

Chain vs list in output

It makes more sense to list the output modules rather than Chain them to do what they currently do. Chains could be used to pass one output to another output.

Licence

We need to choose a licence (two 'c's is a noun, with an s is a verb apparently.)

GetModule and GetModules not visible

Both of these appear in the manual, but neither function is visible when I install & load the latest version on my linux machine. GetModuleList works fine.

Vignette

As well as having examples in the documentation, having a constantly updated vignette seems a good way to keep everyone up to date on how the package works.

It works well as the vignette is recompiled with each new package build, and is automatically (I think) downloaded when the package is downloaded.

Record code in output

Adding the full code to the output of the workflow function for reproducibility, recording the package versions and session info.
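A minimal sketch of what recording the call and session details could look like, using only base R. RecordRun and the element names of the returned list are hypothetical; the real workflow function may store this differently.

```r
# Hypothetical wrapper: capture the call as written plus session details,
# without evaluating the module arguments.
RecordRun <- function(...) {
  call <- match.call()
  list(call.text = paste(deparse(call), collapse = "\n"),
       r.version = R.version.string,
       session   = utils::sessionInfo())
}

run <- RecordRun(occurrence = UKAnophelesPlumbeus,
                 model      = LogisticRegression)
cat(run$call.text, "\n")
```

The sessionInfo() object includes the versions of attached packages, so it covers the package-version part of the request.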

Style guide

Pick a style guide and state it somewhere (on the blog/readme/wiki?), reformat the current code to it, then follow it, ruthlessly.

Google R style guide or a variant might be a good option.

Stochastic models

Need to call set.seed() within each module so that parallel modules are comparing the same thing, e.g. if we compare two data sets, we want the same pseudorandom background points. Need to check I'm doing it sensibly and reproducibly all the way through though.
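As a sketch of the idea, a module that seeds the generator itself returns the same background points on every call. SampleBackground, its bounding box and the default seed are assumptions for illustration only.

```r
# Hypothetical process-module helper: draw n background points inside a
# bounding box. Seeding inside the function makes repeated calls identical.
SampleBackground <- function(n, seed = 1) {
  set.seed(seed)
  data.frame(lon = runif(n, min = -10, max = 2),
             lat = runif(n, min = 50, max = 60))
}

a <- SampleBackground(100)
b <- SampleBackground(100)
identical(a, b)  # TRUE
```

One trade-off to keep in mind: set.seed() resets the global random number state, so anything the user runs after the module is affected as well.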

ModuleHelp for `UKBiomod` etc.

I still can't get this to work for my new modules, I get:

> ModuleHelp('UKBiomod')

Warning: /tmp/RtmpkaxY78/UKBiomod.Rd:1: All text must be in a section
Error: /tmp/RtmpkaxY78/UKBiomod.Rd: Sections \title, and \name must exist and be unique in Rd files

for UKBiomod, OneThousandBackground, OptGRaF and QuickGRaF

works for yours though, e.g.:

> ModuleHelp('LogisticRegression')

Model module: LogisticRegression

Description:

     Model module to fit a simple logistic regression model

Usage:

     LogisticRegression(df)

See Also:

     ‘glm’

Save progress

The progress so far should be accessible in case of a crash.

Tests

Once the package is in a reasonable state, we need to spend some time writing a test suite.

Naming columns gives warning.

Someone at the zoon workshop was experiencing this error.
Looks like it should a) throw an error and b) be fixed.

Warning messages:
1: In names(df)[6:ncol(df)] <- names(ras) :
number of items to replace is not a multiple of replacement length
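One way to turn the warning into an error is to compare lengths before assigning. This is a sketch only: SetCovariateNames is a hypothetical helper, and the assumption that covariates start at column 6 is taken from the 6:ncol(df) in the warning above.

```r
# Hypothetical helper: rename covariate columns, failing loudly when the
# number of names does not match the number of covariate columns.
SetCovariateNames <- function(df, covNames, first = 6) {
  nCov <- ncol(df) - first + 1
  if (nCov < 1 || nCov != length(covNames)) {
    stop("Expected ", length(covNames), " covariate column(s) from column ",
         first, " onwards, but found ", max(nCov, 0), ".", call. = FALSE)
  }
  names(df)[first:ncol(df)] <- covNames
  df
}

df <- as.data.frame(matrix(0, nrow = 2, ncol = 7))
renamed <- names(SetCovariateNames(df, c("temp", "rain")))[6:7]
failure <- tryCatch(SetCovariateNames(df, c("temp", "rain", "wind")),
                    error = conditionMessage)
```

The nCov < 1 check also guards against data frames with fewer than six columns, where 6:ncol(df) would silently count backwards.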

IsReproducible tag to workflow

Add a tag (probably an attribute) that says whether a workflow was run entirely from the online repo and is therefore reproducible (given that the data is bundled with it.)
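A small sketch of the attribute approach. TagReproducible and the "repo"/"local" labels are hypothetical names, not part of the package.

```r
# Hypothetical helper: mark a workflow result as reproducible only if
# every module came from the online repository.
TagReproducible <- function(result, moduleSources) {
  attr(result, "IsReproducible") <- all(moduleSources == "repo")
  result
}

w <- TagReproducible(list(report = "placeholder"),
                     c(occurrence = "repo", model = "local"))
attr(w, "IsReproducible")  # FALSE
```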

Memory

The way the workflow is currently written, you get multiple copies of the whole environmental rasters. I think this is probably a bad idea but would need to think about how best to fix it.

handle installation of gdal, maxent and other awkward 3rd party software

We're going to have to work out a user friendly way of installing this sort of awkward software.

Chris Clements tweeted a helpful suggestion for rgdal and rgeos:

Want to install the "rgdal" and "rgeos" #R packages? try this:

install.packages(c('rgdal','rgeos'),repos="http://www.stats.ox.ac.uk/pub/RWin ")
(https://twitter.com/CClements88/status/509958777303736320)

though I haven't explored that

Perhaps for maxent we could link to the page to accept the licence (http://www.cs.princeton.edu/~schapire/maxent) and get the jar then try and handle putting it in the right place for the user?

Error in BiomodModel

Hi,

Looks like you have done terrific work! Looking forward to trying it.

Now I just wanted to comment that I'm also getting an error when running the Biomod workflow in https://github.com/zoonproject/zoon/blob/master/zoonQuickStart.R.

After a quick look I suspect it might be related to your chosen 'modeling.id' argument in 'BIOMOD_Modeling' function (https://github.com/zoonproject/modules/blob/master/R/BiomodModel.R).

With the current specification:
modeling.id = paste('zoon',Sys.time(),sep=" ")
the resulting file path ('Species/.BIOMOD_DATA/zoon 2014-09-17 00:12:33/formated.input.data') is probably not valid, at least in Windows, due to the colons. That causes an error in gzfile:
In gzfile(file, "wb") :
cannot open compressed file 'Species/.BIOMOD_DATA/zoon 2014-09-17 00:12:33/formated.input.data', probable reason 'Invalid argument'.

I'd suggest a quick check using the standard 'modeling.id' in Biomod, i.e. modeling.id = as.character(format(Sys.time(), '%s')), to see if that sorts out the problem.

Thanks and see you soon,

Paco
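For reference, an identifier that keeps the timestamp readable but avoids spaces and colons could be built as follows. MakeModelingId is a hypothetical helper and has not been tested against BIOMOD_Modeling; it only shows the string formatting.

```r
# Build an identifier with no spaces or colons so it is safe in file paths.
MakeModelingId <- function(time = Sys.time()) {
  paste0("zoon_", format(time, "%Y%m%d_%H%M%S"))
}

# Fixed timestamp (taken from the error message above) so the result is
# predictable.
id <- MakeModelingId(as.POSIXct("2014-09-17 00:12:33", tz = "UTC"))
id                 # "zoon_20140917_001233"
grepl("[: ]", id)  # FALSE
```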

install failed - Error in find_vignette_product

Hi,

When I try to install the package I get this error. Any guesses at what it could be? It gets messed up when it tries to deal with the vignette.... do I need to have latex installed? Or perhaps you have omitted a file needed to build the vignette?

Hope we can get this sorted tomorrow!

Tom

install_github("zoonproject/zoon")
Installing github repo zoon/master from zoonproject
Downloading master.zip from https://github.com/zoonproject/zoon/archive/master.zip
Installing package from C:\Users\Tom\AppData\Local\Temp\RtmpQlt7PR/master.zip
Installing zoon
"C:/PROGRA1/R/R-301.2/bin/x64/R" --vanilla CMD build
"C:\Users\Tom\AppData\Local\Temp\RtmpQlt7PR\devtools193828231c4d\zoon-master"
--no-manual --no-resave-data

  • checking for file 'C:\Users\Tom\AppData\Local\Temp\RtmpQlt7PR\devtools193828231c4d\zoon-master/DESCRIPTION' ... OK
  • preparing 'zoon':
  • checking DESCRIPTION meta-information ... OK
  • installing the package to build vignettes
  • creating vignettes ...Error in find_vignette_product(name, by = "weave", dir = dirname(file), :
    Failed to locate the 'weave' output file (by engine 'utils::Sweave') for vignette with name 'vignette'. The following files exists in directory 'C:/Users/Tom/AppData/Local/Temp/RtmpkNqD9L/Rbuildac443ac7f1f/zoon/inst/doc': 'basic-zoon-usage.R', 'basic-zoon-usage.Rmd', 'basic-zoon-usage.html', 'vignette.Rnw'
    Execution halted
    Error: Command failed (1)

Modules in packages and tests

Currently some tests require packages. The tests pass if the packages are installed and fail if not. Long term, this means they will fail. Not sure how best to fix it and how other people deal with this.
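A common way to handle this is to skip, rather than fail, tests whose packages are missing. testthat provides skip_if_not_installed() for this; the base R sketch below shows the same idea with a hypothetical RunIfInstalled helper.

```r
# Hypothetical helper: run a test function only when the package it
# depends on is available; otherwise report a skip and return NA.
RunIfInstalled <- function(pkg, test) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Skipping: package '", pkg, "' is not installed")
    return(invisible(NA))
  }
  test()
}

ran     <- RunIfInstalled("stats", function() is.function(stats::glm))
skipped <- RunIfInstalled("aPackageThatShouldNotExist123", function() TRUE)
```

Skipped tests stay visible in the test report, so missing coverage is not hidden the way it would be if the tests were simply deleted.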

Deciding syntax (for paths urls etc.)

Overview

Nick and I have been discussing some of the problems with syntax here. #42

I have made the package so that it can accept modules like this

workflow(UKAnopheles, UKAir, Crossvalidate(k=2), LogReg, PrintMap)

as discussed at the workshop.

A tweak that I will make but haven't yet is to make GetModules search for module names in the global namespace first and in the github repo second.

URL/Paths syntax

The bit we're stuck on is what the syntax should be for a module that is defined by a path or URL but also takes extra arguments. I've put the options we've thought of below. They are all possible. If you could give opinions or simply votes, that would be great.

a) String as function

This is already downvoted by Nick, but adding for completeness.

workflow('/pathto/myModule.R'(arg = 1, para = 'something'),
         'www.github.com/name/repo/moduleName.R'(options = 2),
         ...)

b) Get function with params after

workflow(get('/pathto/myModule.R')(arg = 1, para = 'something'),
         get('www.github.com/name/repo/moduleName.R')(options = 2),
         ...)

c) A function like ModuleOptions

workflow(ModuleOptions('/pathto/myModule.R', arg = 1, para = 'something'),
         ModuleOptions('www.github.com/name/repo/moduleName.R', options = 2),
         ...)

d) Separately define repo location

workflow(myModule(arg = 1, para = 'something'),
         moduleName(options = 2),
         occurrence_repo = '/pathto', 
         covariate_repo = 'www.github.com/name/repo/',
         ...)

e) Separately define repo location as arg

workflow(myModule(arg = 1, para = 'something', repo = '/pathto'),
         moduleName(options = 2, repo = 'www.github.com/name/repo/'),
         ...)

f) Disallow directly calling from a custom path

Instead make users load the function into the global environment for development

source('/pathto/myModule.R')
source('www.github.com/name/repo/moduleName.R')

workflow(myModule(arg = 1, para = 'something'),
         moduleName(options = 2),
         ...)

ModuleHelp convert to string

ModuleHelp should convert its argument to a string

ModuleHelp(NCEP) and ModuleHelp('NCEP') should be the same. This matches help() as well.
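A sketch of how both forms can be mapped to the same string with substitute() and deparse(). ModuleName is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: accept a module either as a bare name or as a
# string and return the name as a string in both cases.
ModuleName <- function(module) {
  expr <- substitute(module)
  if (is.character(expr)) expr else deparse(expr)
}

bare   <- ModuleName(NCEP)
quoted <- ModuleName('NCEP')
identical(bare, quoted)  # TRUE
```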

Giving path as object broken

The substitute stuff to make the syntax nicer has broken the ability to give a module path as an object e.g.
path <- '~/MyMod.R'
w <- workflow(path, ....)

now doesn't work because substitute reads this as 'path' not '~/MyMod.R'.
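One possible fix, sketched here with a hypothetical ResolveModule helper, is to check whether the captured symbol names a character string in the caller's environment and, if so, use that value instead of the symbol's name.

```r
# Hypothetical helper: return a module identifier from a bare name, a
# string literal, or a variable that holds a path.
ResolveModule <- function(module, env = parent.frame()) {
  expr <- substitute(module)
  if (is.character(expr)) return(expr)
  if (is.name(expr)) {
    name <- as.character(expr)
    if (exists(name, envir = env, inherits = TRUE)) {
      value <- get(name, envir = env, inherits = TRUE)
      # Only treat the variable as a path if it holds a single string.
      if (is.character(value) && length(value) == 1) return(value)
    }
    return(name)
  }
  deparse(expr)
}

path <- '~/MyMod.R'
fromObject <- ResolveModule(path)   # "~/MyMod.R"
fromName   <- ResolveModule(UKAir)  # "UKAir"
```

The ambiguity does not go away entirely: a user variable that happens to share a name with a module and holds a string would win over the module, so the lookup order would need documenting.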

Workflow syntax

Just opening this so that I can close it.

As discussed at the workshop we want to move from
workflow(occurMod = ModuleOptions('SpOcc', species= 'Milvus milvus', ...

to

workflow(occurrence = SpOcc(species = 'Milvus milvus'), ...

Presence-absence dataset

We want to be able to run examples of presence-absence as well as presence only models. It would be good to find either an online PA dataset we can easily link to (figshare etc.) or else bundle one with the package.

Sort modules by type in GetModuleList

I don't know how to do this but I think we need to add metadata to the roxygen comments and sort by that.

First thoughts are that we don't want to read the text of every module just to print a list of modules. So I'm not sure how to fix this.

Once continuous integration is sorted, it might be easiest to do a nightly run of reading each file, reading the meta data and making a separate file with the module list. GetModuleList() then simply needs to read that file. This also fixes the requirement for a github password to use GetModuleList.
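The nightly indexing step could look roughly like the sketch below. The module headers are made-up stand-ins for the real files, and the '@family' tag used to hold the module type is an assumption, not a convention confirmed anywhere in this thread.

```r
# Hypothetical roxygen-style headers, one character vector per module file.
modules <- list(
  LogisticRegression = c("#' @name LogisticRegression", "#' @family model"),
  UKAir              = c("#' @name UKAir", "#' @family covariate"),
  PrintMap           = c("#' @name PrintMap", "#' @family output")
)

# Extract the value of a roxygen tag from a set of comment lines.
GetTag <- function(lines, tag) {
  pattern <- paste0("^#'\\s*@", tag, "\\s+")
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) NA_character_ else sub(pattern, "", hit[1])
}

# Build the index once, sorted by type and then by module name.
index <- data.frame(
  module = names(modules),
  type   = vapply(modules, GetTag, character(1), tag = "family"),
  stringsAsFactors = FALSE
)
index <- index[order(index$type, index$module), ]
index
```

The resulting table could be saved with write.csv() so that GetModuleList() only has to read one small file.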

Move RunModels outside of workflow

This was a quick hack to get stuff working for the workshop.

Now I want to move RunModels outside of the workflow function. But there were environment issues last time.

Output script now broken. What is aim?

In moving the modules to another repo I have made it so that the outputted script no longer works (i.e. the workflow = workflow element of the output list from function workflow. Quite confusing there. Line 327 in the zoon_sketch.). Before fixing it I thought I would check what the aims for this part of the package are.

I'm still a little confused about the difference between this outputted script, which is reproducible, and the original handwritten script, which is also in theory reproducible.

I suppose the idea of the output script is that no matter what happens to the rest of the code, if the workflow function is runnable, there will be a reproducible script.

How much detail should go in there? Now that the modules have been moved online, they are similarly as stable as all the code in the packages that are loaded during the analysis.

Argument names

In the workflow function all the arguments are suffixed with Mod, this seems redundant.

workflow to return a more helpful error message when module doesn't exist

Currently this:

# run a workflow, using the logistic regression model
ans1 <- workflow(occurMod = 'UKAnophelesPlumbeus',
                covarMod = 'ThisShitIsBananasB-A-N-A-N-A-S',
                procMod = 'OneHundredBackground',
                modelMod = 'RandomForest',
                outMod = 'SameTimePlaceMap')

only returns this:

Error in GetModule(x$module, type) : 
  Cannot find the module. Check the URL or check that the module is at github.com/zoonproject

It would be useful to know which module doesn't exist, and at some point in the future perhaps add some sort of fuzzy matching with suggestions?
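A sketch of an error that names the missing module and its type, with a simple suggestion based on edit distance from utils::adist(). CheckModule and the short list of available modules are assumptions for illustration.

```r
# Hypothetical check: fail with the module name, its type, and the
# closest available name by edit distance.
CheckModule <- function(name, type, available) {
  if (name %in% available) return(invisible(name))
  distances  <- utils::adist(name, available)[1, ]
  suggestion <- available[which.min(distances)]
  stop("Cannot find ", type, " module '", name, "'. Did you mean '",
       suggestion, "'?", call. = FALSE)
}

available <- c("UKAir", "NCEP")
msg <- tryCatch(CheckModule("UKAirr", "covariate", available),
                error = conditionMessage)
msg
```

In practice a distance threshold would be sensible, so that a name with no close match does not produce a misleading suggestion.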

Workflow code cleanup

Just for future reference.
I tried to clean up the code for workflow
4434ccf
However it is harder than it looks. Moving code out of workflow means that any call to module functions (eg LogisticRegression()) fails because these functions are only in the workflow environment. I guess you'd have to pass down the environment name like I did with RunModels. But that's a pain.
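The environment-passing approach mentioned above can be sketched like this. RunModelSketch, WorkflowSketch and the toy module are hypothetical names; the point is only that a helper outside the workflow can look a module up if it is handed the workflow's environment.

```r
# Helper defined outside the workflow: it looks the module up in the
# environment it is given rather than in its own scope.
RunModelSketch <- function(moduleName, data, env) {
  module <- get(moduleName, envir = env, mode = "function")
  module(data)
}

WorkflowSketch <- function() {
  # This function exists only inside the workflow's environment.
  ToyModule <- function(df) nrow(df)
  RunModelSketch("ToyModule", data.frame(x = 1:3), env = environment())
}

result <- WorkflowSketch()
result  # 3
```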

zoon github account

I'm thinking we should probably set up a separate zoon organisation github account and have the main zoon repo in there, that way we can keep going on it when Tim finishes his internship.

Are you happy to do that Tim?

install_github in getpackages

Having install_github in GetPackages gives a warning on R CMD check which will annoy CRAN.

However, I'm not sure it's great to have devtools in imports. I don't know if suggests devtools gets round this?

Force getting modules from repo

Given #45, we might need to make an argument that forces workflow to get modules from the repo, even if they exist locally in order to confirm that the workflow is reproducible.
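A sketch of how such an argument could interact with a local-first lookup. ModuleSource, the forceRepo argument name and the "local"/"repo" labels are hypothetical.

```r
# Hypothetical lookup: prefer a locally defined module unless the caller
# forces the repository version.
ModuleSource <- function(name, forceRepo = FALSE, env = globalenv()) {
  isLocal <- exists(name, envir = env, mode = "function", inherits = FALSE)
  if (isLocal && !forceRepo) "local" else "repo"
}

# A stand-in for the user's session with one locally defined module.
session <- new.env()
assign("MyModule", function(df) df, envir = session)

sources <- c(ModuleSource("MyModule", env = session),
             ModuleSource("MyModule", forceRepo = TRUE, env = session),
             ModuleSource("UKAir", env = session))
sources
```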
