
bwgtools's People

Contributors

aammd, dezeraldolivier, ethanwhite, nacmarino

bwgtools's Issues

Argentina: incorrect values in Oxygen columns

As of this writing, the offending values are in cells AC15 and AD15 of the bromeliad.physical tab.
The values of the cells are "20,2/10,4" and "7,2/3,9" respectively.
Each should be a single number (50, 20.3, etc.).
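A check along these lines could flag such cells before they cause trouble. This is only a sketch: `find_non_numeric()` is a hypothetical helper, not part of bwgtools, and it assumes the column arrives in R as text and that a decimal comma is acceptable.

```r
# Hypothetical helper: return the positions of cells that have content
# but cannot be read as a single number.
find_non_numeric <- function(x) {
  x <- as.character(x)
  # Treat the first comma as a decimal separator ("7,2" becomes "7.2")
  parsed <- suppressWarnings(as.numeric(sub(",", ".", x, fixed = TRUE)))
  # Empty or missing cells are not reported, only unparseable ones
  which(!is.na(x) & x != "" & is.na(parsed))
}

# Made-up values for illustration, including one like the cell in question
oxygen <- c("50", "20.3", "20,2/10,4", NA, "7,2")
bad <- find_non_numeric(oxygen)
```

Here `bad` holds the position of the "20,2/10,4" entry, which can then be reported back to the site owner.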

fix the names from bwg_names

The names of the data were changed by Srivastavalab/bwg_names@e73ac3396540b8a7f2c4de6ca841d11457ac9ef2

We need to update the package to deal with these new names!

Duplicate columns in Cardoso

There are two columns in the Cardoso bromeliad.inverts.final sheet that are duplicated:

"Anopheles sp. - early instars" & "Leptagrion sp. - early instars"

These are exactly the same as others with the same name immediately to the left, except they lack biomass measurements
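Repeated column names like these can be found with base R. The data frame below is a made-up stand-in for the sheet, and the sketch assumes the duplicate names survive the import unchanged (some readers rename duplicates on the way in).

```r
# Toy stand-in for bromeliad.inverts.final; check.names = FALSE keeps
# duplicate column names intact instead of making them unique.
inverts <- data.frame(a = c(3, 0), b = c(0.2, 0), c = c(3, 0),
                      check.names = FALSE)
names(inverts) <- c("Anopheles sp. - early instars",
                    "Leptagrion sp. - early instars",
                    "Anopheles sp. - early instars")

# Names that occur more than once
dup_names <- unique(names(inverts)[duplicated(names(inverts))])
```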

correct filtering of long invert data

long_out %>% dplyr::filter(abundance != 0 | biomass != 0) is what it should be.

I had thought that abundance could be recorded without biomass, but that the opposite was impossible. Such hubris! That's not true at all 😳
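To see what the corrected condition keeps, here is a base R equivalent of that filter on made-up data (the species and values are invented for illustration):

```r
# Toy long-format invertebrate data
long_out <- data.frame(
  species   = c("sp1", "sp2", "sp3", "sp4"),
  abundance = c(2, 0, 0, 5),
  biomass   = c(0.4, 0.1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep a row when either measurement is non-zero;
# only rows where both are zero are dropped.
kept <- long_out[long_out$abundance != 0 | long_out$biomass != 0, ]
```

The row with biomass but no abundance (sp2) is kept, and only the all-zero row (sp3) is removed.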

read_site_sheet() vs. combine_tab()

Arg <- read_site_sheet("Argentina","leaf.waterdepths")
Arg1 <- combine_tab("Argentina","leaf.waterdepths")

firstday(Arg) => it works fine!
firstday(Arg1) => it doesn't work! When the data come from combine_tab(), the "bromeliad.id" column is not found in sheetname = leaf.waterdepths.

The question is why the two functions return different columns even though the sheet name is the same.
The same issue occurs with the function from_start()!

Incorporating a few of the Paraty fixes

We made a lot of progress at Paraty. It would be nice to have these fixes incorporated into the BWGtools pkg at some point, as we now have snippets of code in some papers' R scripts and not in others. Only when you have time @aammd !

(1) make site a factor within BWGtools (I need to do it manually outside)
(2) make the summarize code below part of the hydro function. Not having it in the function threw me for a loop, because it created duplicated rownames (there are two leaves per bromeliad) in the final fulldata:
mean_hydro <- hydro %>%
  ungroup %>%
  select(c(2, 7:17, long_dry, long_wet, n_driedout, n_overflow)) %>%
  group_by(site_brom.id) %>%
  summarise_each(funs(median(., na.rm = TRUE))) %>%
  replace_na(list(long_dry = 0, long_wet = 0, n_driedout = 0, n_overflow = 0))

when_last <- hydro %>%
  ungroup %>%
  select(site_brom.id, last_dry, last_wet) %>%
  group_by(site_brom.id) %>%
  summarise_each(funs(min(., na.rm = TRUE))) %>%
  replace_na(list(last_dry = 65, last_wet = 65))

brom_hydro <- mean_hydro %>% left_join(when_last)

(3) add this code to summarize the ibutton data, which I think is not yet part of BWGtools:
ibuttons <- combine_tab(sheetname = "bromeliad.ibuttons")
ibutton_data <- ibuttons %>%
  group_by(site, site_brom.id) %>%
  summarise(mean_max = mean(max.temp, na.rm = TRUE), mean_min = mean(min.temp, na.rm = TRUE),
            mean_mean = mean(mean.temp, na.rm = TRUE), sd_max = sd(max.temp, na.rm = TRUE),
            sd_min = sd(min.temp, na.rm = TRUE), sd_mean = sd(mean.temp, na.rm = TRUE),
            cv_max = 100*(sd_max/mean_max), cv_min = 100*(sd_min/mean_min),
            cv_mean = 100*(sd_mean/mean_mean)) %>%
  ungroup %>%
  gather(variable, observed, 3:11) %>%
  replace_na(list(observed = "NA")) %>%
  select(-site) %>%
  spread(variable, observed, fill = 0) %>%
  rename(max_temp = mean_max, min_temp = mean_min, mean_temp = mean_mean,
         sd_max_temp = sd_max, sd_min_temp = sd_min, sd_mean_temp = sd_mean,
         cv_max_temp = cv_max, cv_min_temp = cv_min, cv_mean_temp = cv_mean)

Improve the documentation

Right now, bwgtools is a tool that only Andrew can use (in its entirety), when actually it is a tool for everyone! The package should be accompanied by clear documentation. The first priority is clear communication with all members of the BWG research group. The second is communication with reviewers, editors and readers of the manuscripts we produce.

I would like this issue to be a place to hold all the discussion we have on this topic. We can open separate ones for specific tasks once they arise.

I really encourage everyone who is using this package to consider contributing to documentation! I will help settle any doubts you might have about exactly how to do it (e.g. if you want to edit the documentation of a function).

here are some things I think could be better:

  • Function names -- many functions are named different things, even though they are related. For example, functions which download data are called read_sheet, read_site_sheet, get_all_sites and combine_tab. Should they all be renamed to download_x, where x is something specific?
  • Function documentation -- there is minimal documentation, but lots could be improved there. Examples would be great, plus an expanded "details" section
  • README -- is OK, but could have more developed sections.
    • Accessing data
    • Combining data
    • Calculations
    • Analyses (when we have those)
    • Plots
  • Translations -- Should we have a French/Spanish/Portuguese translation of the README? Of any part of this? Would anyone like that?

Ideas for better offline use

read_site_sheet needs better error catching if the path is wrong. Right now, if you give it the wrong path, it tries to go to Dropbox anyway. This gives an unhelpful error and wastes time.

So just let it check, not for the existing file, but for the input being a certain length.

If it is long, it must be a path, so look for the file.

If the file is absent, exit with a warning.
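The check described above could be sketched like this. `check_local_path()` and its `min_path_length` threshold of 20 characters are assumptions made for illustration, not existing bwgtools code; the real cut-off would need to be longer than any site name.

```r
# Hypothetical helper: decide how read_site_sheet() should treat its input.
# Short strings are taken to be site names; long strings are taken to be
# file paths and must point at an existing file.
check_local_path <- function(where, min_path_length = 20) {
  if (nchar(where) < min_path_length) {
    return("site_name")
  }
  if (!file.exists(where)) {
    # Looks like a path but nothing is there: warn rather than try Dropbox
    warning("No file found at '", where, "'; not falling back to Dropbox.")
    return("missing_file")
  }
  "local_file"
}
```

For example, `check_local_path("Argentina")` returns "site_name", while a long path to a file that does not exist produces the warning and returns "missing_file".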

function to run hydro calculations for all sites

This function should have an argument which permits users to specify whether

  • the centre leaf is included (default = FALSE)
  • the calculations are performed on leaves first (default) or on bromeliad.averages

data validation for each tab

  • check column names
  • are data values out of range -- too large or too many very small
  • are any columns filled with NAs
  • is the site variable missing any values
  • does the site column contain the same word in every row
  • are any observational rows filled with NAs
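Most of these checks fit in one small function. The sketch below is hypothetical: `validate_tab()`, its arguments and the example column names and ranges are invented for illustration, and it covers only a simple lower/upper range test, not the "too many very small values" case.

```r
# Hypothetical validator: returns a character vector of problems found
# (empty if the tab passes every check).
validate_tab <- function(tab, expected_names, ranges = list()) {
  problems <- character(0)

  # Check column names
  missing_cols <- setdiff(expected_names, names(tab))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # Values outside the allowed c(lower, upper) range for a column
  for (col in intersect(names(ranges), names(tab))) {
    out <- tab[[col]] < ranges[[col]][1] | tab[[col]] > ranges[[col]][2]
    if (any(out, na.rm = TRUE)) {
      problems <- c(problems, paste("out-of-range values in", col))
    }
  }

  # Columns filled entirely with NA
  all_na_cols <- names(tab)[vapply(tab, function(x) all(is.na(x)), logical(1))]
  if (length(all_na_cols) > 0) {
    problems <- c(problems,
                  paste("all-NA columns:", paste(all_na_cols, collapse = ", ")))
  }

  # Site variable: no missing values, and the same word throughout
  if ("site" %in% names(tab)) {
    if (anyNA(tab$site)) problems <- c(problems, "site has missing values")
    if (length(unique(tab$site[!is.na(tab$site)])) > 1) {
      problems <- c(problems, "site contains more than one value")
    }
  }

  # Rows where every column other than site is NA
  obs <- tab[, setdiff(names(tab), "site"), drop = FALSE]
  if (any(rowSums(!is.na(obs)) == 0)) {
    problems <- c(problems, "rows with no observations")
  }

  problems
}

# Made-up tab with several deliberate problems
tab <- data.frame(site = c("argentina", "argentina", NA),
                  depth = c(10, 5000, NA),
                  notes = NA,
                  stringsAsFactors = FALSE)
problems <- validate_tab(tab,
                         expected_names = c("site", "depth", "bromeliad.id"),
                         ranges = list(depth = c(0, 500)))
```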

using `match.fun()` to find functions with the same name as the sheet being read

Is it dangerous or risky to use match.fun() to find a package's own functions? There is a small example of the code in question here.

Background: the package has a specialized purpose: reading data from Excel files which are stored in Dropbox. The backstory is that we have results from many international replicates of an experiment. The package uses Karthik Ram's rdrop2 and Hadley Wickham's readxl packages. This way scientists on the project can get their data directly into R from the Excel files in their Dropbox, without ever actually opening Dropbox.

All the Excel files are made from the same template, so they should be in a standardized form. Therefore a function that works for one will work for all.

Within each Excel file there are multiple tabs. Each tab is different, and needs to be read in with specific arguments.

The problem: how can I allow users to simply say which tab (aka "sheet") they want, without having to set default arguments manually every time?

Andrew's solution: create a function with the exact same name as the tab in question (here leaf.waterdepths()). The function reads in the tab of the same name with all the correct arguments. When users ask for a sheet, my function read_sheet() uses match.fun() to find the correct reading function.
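One alternative to a name-based search, offered here only as a sketch for comparison and not as what bwgtools does, is an explicit lookup table of reader functions. Every name below (`read_leaf_waterdepths`, `sheet_readers`, `read_sheet_sketch`) is hypothetical, and the readers are stubs standing in for functions that would call readxl with tab-specific arguments.

```r
# Hypothetical stub readers, one per tab
read_leaf_waterdepths   <- function(path) list(sheet = "leaf.waterdepths", path = path)
read_bromeliad_physical <- function(path) list(sheet = "bromeliad.physical", path = path)

# Explicit registry: sheet name -> reader function
sheet_readers <- list(
  "leaf.waterdepths"   = read_leaf_waterdepths,
  "bromeliad.physical" = read_bromeliad_physical
)

read_sheet_sketch <- function(path, sheetname) {
  reader <- sheet_readers[[sheetname]]
  if (is.null(reader)) {
    # Unknown names fail with a clear message instead of a search error
    stop("Unknown sheet: ", sheetname, call. = FALSE)
  }
  reader(path)
}

res <- read_sheet_sketch("Drought_data_Argentina.xlsx", "leaf.waterdepths")
```

The design trade-off is that the registry must be updated by hand when a tab is added, but a lookup can only ever return a function that was deliberately registered, whatever else the user has defined under the same name.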

from_start()

This function does not work properly for French Guiana, as the experiment lasted from 2012 to 2013. So the nday column gives weird values in 2013!
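How from_start() computes nday is not shown here, so the following is only a guess at the cause: if the day count were based on day-of-year, it would reset on 1 January. Counting from the earliest date avoids that. The dates below are made up.

```r
# Made-up sampling dates spanning the 2012/2013 boundary
dates <- as.Date(c("2012-12-30", "2012-12-31", "2013-01-01", "2013-01-02"))

# Day-of-year resets at the new year
day_of_year <- as.integer(format(dates, "%j"))

# Days elapsed since the first date keep increasing across years
nday <- as.numeric(dates - min(dates)) + 1
```

With these dates `day_of_year` drops from 366 back to 1, while `nday` runs 1, 2, 3, 4.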

Script for importing bromeliad.final.inverts in tidy format, merging with species info (Distribution organisms)

This script is currently written for accessing the invert and distribution organism files from offline locations, but can be modified to access these from BWG Dropbox locations as soon as we have bwgtools set up to read these files directly. Diane

invert.final <- read.table("Drought_data_PuertoRico_bromeliad.final.inverts.csv", sep = ",", header = TRUE)
head(invert.final)
library(tidyr)
library(dplyr)
library(magrittr)
invert.long <- invert.final %>%
  gather(species, quantity, Diptera.292:Ostracoda.7) %>%
  spread(abundance.or.biomass, quantity) %>%
  separate(trt.name, c("mu", "k"), "k") %>%
  mutate(mu = extract_numeric(mu))
head(invert.long)

There is an input error to fix with names in the Distributions_organisms_full file, perhaps the apostrophe in line 93?

dist.org <- read.table("Distributions_organisms_full_nonames.csv", sep = ",", na.strings = "", header = TRUE)
head(dist.org); tail(dist.org)
invert.full <- merge(invert.long, dist.org, by.x = "species", by.y = "nickname")

Line endings

May I push a .gitattributes file to keep linux line endings?

add the latest trait data

The forupload file contains our most recent bwgdb insect trait data. These should be combined with this package. Add these data to the package, and get get_bwg_tools

Create RLQ matrices for Regis

Regis requires three matrices for his analysis:

  • a species x traits matrix (fuzzy coding) = matrix Q
  • a species x bromeliad matrix (abundance data) = matrix L
  • a bromeliad x environmental variables matrix (plant specific data, including physical, hydrological, ..) = matrix R

The rows of Q and L must be identical, and the columns of L and the rows of R must be identical.

Perhaps the easiest thing is to write a function that does this for only one site, then give it all sites.

Start with trait data merged on abundance data, and the physical data too.
Filter and spread the trait data.
Separate into matrices.
Filter and arrange the physical data.
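For a single site, the three matrices and their alignment constraints could be sketched in base R as below. All data, column names (`trait_a`, `max.water`, ...) and object names are invented for illustration; real trait and environmental tables would replace them.

```r
# Made-up long-format abundance data, trait table and environmental table
abund <- data.frame(species = c("sp1", "sp2", "sp1", "sp3"),
                    bromeliad.id = c("B1", "B1", "B2", "B2"),
                    abundance = c(4, 1, 2, 7),
                    stringsAsFactors = FALSE)
traits <- data.frame(species = c("sp3", "sp1", "sp2"),
                     trait_a = c(3, 0, 1),
                     trait_b = c(0, 3, 2),
                     stringsAsFactors = FALSE)
env_data <- data.frame(bromeliad.id = c("B2", "B1"),
                       max.water = c(120, 300),
                       n_driedout = c(2, 0),
                       stringsAsFactors = FALSE)

# L: species x bromeliad abundances; absent combinations become 0
L <- tapply(abund$abundance, list(abund$species, abund$bromeliad.id), sum)
L[is.na(L)] <- 0

# Q: species x traits, rows reordered to match the rows of L
Q <- as.matrix(traits[match(rownames(L), traits$species), c("trait_a", "trait_b")])
rownames(Q) <- rownames(L)

# R: bromeliad x environment, rows reordered to match the columns of L
R <- as.matrix(env_data[match(colnames(L), env_data$bromeliad.id),
                        c("max.water", "n_driedout")])
rownames(R) <- colnames(L)

# The alignment constraints stated above
stopifnot(identical(rownames(Q), rownames(L)),
          identical(rownames(R), colnames(L)))
```

Wrapping these steps in a function of one site's data would then allow it to be applied to every site in turn.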

check for duplicate bromeliads

We need a test for duplicate bromeliads. What if somebody has inadvertently typed in the same code? We need to find this before it causes problems.
Which sheets need to be checked?
Are bromeliad.id values always unique within each of those sheets, or are there some that have duplicate values by design (i.e. repeated rows)?
Should we actually be checking for consistency with treatment?
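Both kinds of test are short to write. The two helpers below are hypothetical, as are the example ids and treatment names; the sketch assumes a sheet with one row per bromeliad, a `bromeliad.id` column and a `trt.name` column.

```r
# Hypothetical helper: ids that appear on more than one row
find_duplicate_ids <- function(tab, id_col = "bromeliad.id") {
  ids <- tab[[id_col]]
  unique(ids[!is.na(ids) & duplicated(ids)])
}

# Hypothetical helper: ids recorded under more than one treatment
find_inconsistent_treatment <- function(tab, id_col = "bromeliad.id",
                                        trt_col = "trt.name") {
  n_trt <- tapply(tab[[trt_col]], tab[[id_col]], function(x) length(unique(x)))
  names(n_trt)[n_trt > 1]
}

# Made-up sheet in which "B2" was typed twice with different treatments
physical <- data.frame(
  bromeliad.id = c("B1", "B2", "B2", "B3"),
  trt.name = c("mu0.1k0.5", "mu1k1", "mu3k2", "mu1k1"),
  stringsAsFactors = FALSE
)
dup_ids <- find_duplicate_ids(physical)
bad_trt <- find_inconsistent_treatment(physical)
```

On sheets where ids repeat by design, the first check would always fire, so the treatment-consistency check may be the more useful one there.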

NAs in water measurements

As reported by @nacmarino , @dsrivast :

Both the cv.depth and the wetness for argentina_15 are missing. This shouldn't be so, as there is a mean and a maximum value on the spreadsheet.

And time.since.minimum is entirely NA, for all bromeliads at all sites.

The value for argentina_15 is not NA but NaN, suggesting that there is an illegal math operation.
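Two common ways for R to produce NaN rather than NA are shown below. Whether either is what happens inside bwgtools for argentina_15 is an assumption; the numbers are made up.

```r
# A coefficient of variation on all-zero depths is 0 / 0
depths <- c(0, 0, 0)
cv_zero <- 100 * sd(depths) / mean(depths)

# A mean with na.rm = TRUE over values that are all missing
# is a mean of nothing
all_missing <- c(NA_real_, NA_real_)
mean_empty <- mean(all_missing, na.rm = TRUE)

# Both results are NaN
c(is.nan(cv_zero), is.nan(mean_empty))
```

Checking the raw argentina_15 depths for all-zero or all-missing runs would show whether either case applies.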
