srivastavalab / bwgtools
This is an R package that validates data and reproduces all analyses for the bromeliad working group rainfall experiment.
License: Other
As of this writing, the offending values are in cells AC15 and AD15 of the bromeliad.physical tab.
The values of the cells are "20,2/10,4" and "7,2/3,9" respectively.
Each should actually be a single number (50, 20.3, etc.).
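A check like the following could flag such cells before they cause trouble. This is only a sketch in base R: `find_non_numeric` and the example column are hypothetical and not part of bwgtools.

```r
# Hypothetical helper: report entries that cannot be read as a single number.
find_non_numeric <- function(x) {
  x <- as.character(x)
  # as.numeric() returns NA (with a warning) for text such as "20,2/10,4"
  parsed <- suppressWarnings(as.numeric(x))
  # Only flag entries that were present but failed to parse
  bad <- !is.na(x) & is.na(parsed)
  data.frame(position = which(bad), value = x[bad], stringsAsFactors = FALSE)
}

# Made-up column mixing valid numbers with the two reported cell values
example_column <- c("50", "20.3", "20,2/10,4", "7,2/3,9", NA)
bad_cells <- find_non_numeric(example_column)
print(bad_cells)
```

Genuinely empty cells (NA) are left alone, so only malformed entries are reported.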
The names of the data were changed by Srivastavalab/bwg_names@e73ac3396540b8a7f2c4de6ca841d11457ac9ef2
We need to update the package to deal with these new names, especially driedout and overflow, i.e. new variables from Paraty.
We would like to take the 60 day schedule and condense it into a few numbers:
There are two columns in the Cardoso bromeliad.inverts.final sheet that are duplicated:
"Anopheles sp. - early instars" & "Leptagrion sp. - early instars"
These are exactly the same as the columns with the same names immediately to the left, except that they lack biomass measurements.
long_out %>% dplyr::filter(abundance != 0 | biomass != 0)
is what it should be.
I had thought that abundance could be recorded without biomass, but that the opposite was impossible. Such hubris! That's not true at all.
Arg <- read_site_sheet("Argentina","leaf.waterdepths")
Arg1 <- combine_tab("Argentina","leaf.waterdepths")
firstday(Arg) => it works fine!
firstday(Arg1) => it doesn't work! When the data come from combine_tab(), the "bromeliad.id" column is not found in sheetname = leaf.waterdepths.
The question is: why do they call for different columns even though it's the same sheetname?
This is the same issue with the function from_start()!
We made a lot of progress at Paraty. It would be nice sometime to have these changes incorporated into the BWGtools pkg, as we now have snippets of code in some papers' R scripts and not others. Only when you have time @aammd !
(1) make site a factor within BWGtools (I need to do it manually outside)
(2) make the summarize code below part of the hydro function. Not having it in the function threw me for a loop by creating duplication of rownames (as there are two leaves per bromeliad) in the final fulldata:
mean_hydro <- hydro %>%
  ungroup %>%
  select(c(2,7:17, long_dry, long_wet, n_driedout, n_overflow)) %>%
  group_by(site_brom.id) %>%
  summarise_each(funs(median(., na.rm = TRUE))) %>%
  replace_na(list(long_dry = 0, long_wet = 0, n_driedout = 0, n_overflow = 0))
when_last <- hydro %>%
  ungroup %>%
  select(site_brom.id, last_dry, last_wet) %>%
  group_by(site_brom.id) %>%
  summarise_each(funs(min(., na.rm = TRUE))) %>%
  replace_na(list(last_dry = 65, last_wet = 65))
brom_hydro <- mean_hydro %>% left_join(when_last)
(3) adding this code to summarize the ibutton data, which I think is not yet part of BWGtools:
ibuttons <- combine_tab(sheetname = "bromeliad.ibuttons")
ibutton_data <- ibuttons %>%
  group_by(site, site_brom.id) %>%
  summarise(mean_max = mean(max.temp, na.rm = TRUE), mean_min = mean(min.temp, na.rm = TRUE),
            mean_mean = mean(mean.temp, na.rm = TRUE), sd_max = sd(max.temp, na.rm = TRUE),
            sd_min = sd(min.temp, na.rm = TRUE), sd_mean = sd(mean.temp, na.rm = TRUE),
            cv_max = 100*(sd_max/mean_max), cv_min = 100*(sd_min/mean_min),
            cv_mean = 100*(sd_mean/mean_mean)) %>%
  ungroup %>%
  gather(variable, observed, 3:11) %>%
  replace_na(list(observed = "NA")) %>%
  select(-site) %>%
  spread(variable, observed, fill = 0) %>%
  rename(max_temp = mean_max, min_temp = mean_min, mean_temp = mean_mean,
         sd_max_temp = sd_max, sd_min_temp = sd_min, sd_mean_temp = sd_mean,
         cv_max_temp = cv_max, cv_min_temp = cv_min, cv_mean_temp = cv_mean)
Right now, bwgtools is a tool that only Andrew can use (in its entirety), when actually it is a tool for everyone! The package should be accompanied by clear documentation. The first priority is clear communication with all members of the BWG research group. The second is communication with reviewers, editors and readers of the manuscripts we produce.
I would like this issue to be a place to hold all the discussion we have on this topic. We can open separate ones for specific tasks once they arise.
I really encourage everyone who is using this package to consider contributing to documentation! I will help settle any doubts you might have about exactly how to do it (e.g. if you want to edit the documentation of a function).
Here are some things I think could be better:
read_sheet, read_site_sheet, get_all_sites and combine_tab. Should they all be renamed to download_x, where x is something specific?
read_site_sheet needs better error catching if the path is wrong. Right now, if you give it the wrong path it tries to go to Dropbox anyway. This gives an unhelpful error and wastes time.
So just let it check the input, not for an existing file, but for being a certain length.
If it is long, it must be a path, so look for the file.
If the file is absent, exit with a warning.
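The length check described above might look something like this. It is a rough sketch only: `resolve_source`, the `min_path_length` threshold of 20 characters, and the "remote"/"local" return values are all made up for illustration and are not how read_site_sheet currently behaves.

```r
# Hypothetical sketch of the proposed check; names and threshold are assumptions.
resolve_source <- function(x, min_path_length = 20) {
  if (nchar(x) < min_path_length) {
    # Short input: treat it as a site name to be fetched remotely
    return("remote")
  }
  # Long input: assume the user meant a local file path
  if (!file.exists(x)) {
    warning("File not found at path: ", x, call. = FALSE)
    return(invisible(NULL))
  }
  "local"
}

resolve_source("Argentina")
```

The point of the design is that a mistyped path stops early with a clear message instead of falling through to a remote lookup.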
This function should have an argument which permits users to specify whether
FALSE)
Is it dangerous or risky to use match.fun() to find a package's own functions? small example of code in question here
Background: the package has a specialized purpose: reading data in from Excel files which are stored in Dropbox. The backstory is that we have results from many international replicates of an experiment. The package uses Karthik Ram's rdrop2 and Hadley Wickham's readxl packages. This way scientists on the project can get their data directly into R from the Excel files in their Dropbox, without ever actually opening Dropbox.
All the Excel files are made from the same template, so they should be in a standardized form. Therefore a function that works for one will work for all.
Within each Excel file there are multiple tabs. Each tab is different, and needs to be read in with specific arguments.
The problem: how can I allow users to simply say which tab (aka "sheet") they want, without having to set default arguments manually every time?
Andrew's solution: create a function with the exact same name as the tab in question (here leaf.waterdepths()). The function reads in the tab of the same name with all the correct arguments. When users ask for a sheet, my function read_sheet() uses match.fun() to find the correct reading function.
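A stripped-down illustration of that dispatch pattern, with no Dropbox or Excel involved. The reader body and `read_sheet_sketch` are stand-ins invented for this example; the real leaf.waterdepths() and read_sheet() differ.

```r
# Stand-in reader named after the tab. A real one would call readxl with
# the tab-specific arguments; this one just returns a description.
leaf.waterdepths <- function(path) {
  list(sheet = "leaf.waterdepths", path = path)
}

# Hypothetical dispatcher: look up the function that shares the tab's name.
read_sheet_sketch <- function(path, sheetname) {
  # match.fun() errors if no function of that name can be found
  reader <- match.fun(sheetname)
  reader(path)
}

res <- read_sheet_sketch("some_site.xlsx", "leaf.waterdepths")
print(res$sheet)
```

One thing to keep in mind is that match.fun() resolves a character name starting from the calling environment, so a function with the same name defined by the user could be picked up instead. Storing the readers in a named list and indexing it by sheet name would avoid depending on where the lookup happens.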
unclear how these should be interpreted. Should they have been averaged into a single value?
This function does not work properly for French Guiana, as the experiment lasted from 2012 to 2013. So the nday column gives weird values in 2013!
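If the weird values come from a day-of-year calculation (a guess, the function's internals are not shown here), the effect and one possible fix can be reproduced with made-up dates:

```r
# Hypothetical sampling dates that cross the year boundary
dates <- as.Date(c("2012-12-30", "2012-12-31", "2013-01-01", "2013-01-02"))

# Day-of-year resets on 1 January, so the count goes negative in 2013
day_of_year <- as.integer(format(dates, "%j"))
nday_by_doy <- day_of_year - day_of_year[1]

# Subtracting Date objects counts elapsed days regardless of the year
nday_by_date <- as.integer(dates - min(dates))

print(nday_by_doy)
print(nday_by_date)
```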
save and store the existing analysis for hydrological stability
adapt script for use with this package?
This script is currently written for accessing the invert and distribution organism files from offline locations, but can be modified to access these from BWG Dropbox locations as soon as we have bwgtools set up to read these files directly. Diane
# read the final invertebrate data from a local csv
invert.final <- read.table("Drought_data_PuertoRico_bromeliad.final.inverts.csv", sep = ",", header = TRUE)
head(invert.final)
library(tidyr)
library(dplyr)
library(magrittr)
# reshape to long format, then split the treatment name into mu and k
invert.long <- invert.final %>%
  gather(species, quantity, Diptera.292:Ostracoda.7) %>%
  spread(abundance.or.biomass, quantity) %>%
  separate(trt.name, c("mu", "k"), "k") %>%
  mutate(mu = extract_numeric(mu))
head(invert.long)
# read the organism distribution data and merge it on by species nickname
dist.org <- read.table("Distributions_organisms_full_nonames.csv", sep = ",", na.strings = "", header = TRUE)
head(dist.org); tail(dist.org)
invert.full <- merge(invert.long, dist.org, by.x = "species", by.y = "nickname")
May I push a .gitattributes file to keep linux line endings?
we need at least some blank cells; having no biomass rows changes the shape of the data later
The forupload file contains our most recent bwgdb insect trait data. These should be combined with this package. Add these data to the package, and get get_bwg_tools
sheet name is bromeliad.terrestrial
data to be used by @DEZERALDOlivier
Diptera.192 is mentioned twice in bromeliad.inverts.final
Regis requires three matrices for his analysis:
The rows of Q and L must be identical, and the columns of L and the rows of R must be identical.
Perhaps the easiest thing is to write a function that does this for only one site, then give it all sites.
Start with trait data merged on abundance data, and the physical data too.
Filter and spread trait data.
Separate into matrices.
Filter and arrange the physical data.
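Once the three matrices exist, the matching requirement stated above can be verified mechanically. A small sketch follows: `check_rlq_names` is a hypothetical helper, and the species, bromeliad and variable names are made up.

```r
# Hypothetical helper: check the row/column agreement described above.
check_rlq_names <- function(R, L, Q) {
  stopifnot(is.matrix(R), is.matrix(L), is.matrix(Q))
  c(Q_rows_match_L_rows = identical(rownames(Q), rownames(L)),
    L_cols_match_R_rows = identical(colnames(L), rownames(R)))
}

# Made-up example: 3 species, 2 bromeliads
species <- c("sp_a", "sp_b", "sp_c")
broms <- c("b1", "b2")
L <- matrix(c(3, 0, 1, 2, 5, 0), nrow = 3, dimnames = list(species, broms))
Q <- matrix(c(1.2, 0.4, 2.5), nrow = 3, dimnames = list(species, "body_size"))
R <- matrix(c(10, 25), nrow = 2, dimnames = list(broms, "max_water"))

checks <- check_rlq_names(R, L, Q)
print(checks)
```

Because identical() also compares order, a matrix whose rows are the right set but sorted differently would fail the check, which is the desired behaviour here.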
this involves calculations performed with observed measurements of bromeliad water depth
We need a test for duplicate bromeliads. What if somebody has inadvertently typed in the same code? We need to find this before it causes problems.
Which sheets need to be checked?
Are bromeliad.id s always unique within each of those sheets, or are there some that have duplicate values by design (ie repeated rows)?
Should we actually be checking for consistency with treatment?
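As a starting point for discussion, a duplicate check on a single sheet could be as simple as the following. `find_duplicate_ids` and the example data frame are hypothetical; which sheets should pass this check is exactly the open question above.

```r
# Hypothetical helper: return every row whose id appears more than once.
find_duplicate_ids <- function(df, id_col = "bromeliad.id") {
  ids <- df[[id_col]]
  # Flag both the first and the later occurrences of a repeated id
  dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Made-up sheet in which "B2" was typed twice
physical <- data.frame(
  bromeliad.id = c("B1", "B2", "B2", "B3"),
  max.water = c(100, 250, 240, 90),
  stringsAsFactors = FALSE
)
dups <- find_duplicate_ids(physical)
print(dups)
```

For sheets with repeated rows by design, the same idea could be applied to a combination of columns rather than to bromeliad.id alone.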
As reported by @nacmarino , @dsrivast :
Both the cv.depth and the wetness for argentina_15 are missing. This shouldn't be so, as there is a mean and a maximum value on the spreadsheet.
And time.since.minimum is entirely NA... all bromeliads, all sites.
argentina_15 is not NA but NaN, suggesting that there is an illegal math issue.
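For tracking this down, it may help that R distinguishes the two kinds of missing value. The vector below is made up and does not reproduce the argentina_15 calculation:

```r
# Made-up values: one ordinary missing value and one result of undefined math
cv_depth <- c(0.42, NA, NaN, 0.10)

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
missing_any <- is.na(cv_depth)
missing_nan <- is.nan(cv_depth)

# Undefined arithmetic such as 0/0 is one way a NaN can arise
zero_over_zero <- 0 / 0

print(which(missing_nan))
```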
in bromeliad.inverts.final