forestgeo / fgeo.data

Open Datasets of ForestGEO

Home Page: https://forestgeo.github.io/fgeo.data/

License: Other

R 100.00%

fgeo forestgeo ecology dynamics tree datasets data examples open-datasets

fgeo.data's Introduction

Open datasets of ForestGEO

Installation

Install the pre-release version of fgeo.data:

# install.packages("devtools")
devtools::install_github("forestgeo/fgeo.data@pre-release")

Or install the development version of fgeo.data:

# install.packages("devtools")
devtools::install_github("forestgeo/fgeo.data")

Or install all fgeo packages in one step.

For details on how to install packages from GitHub, see this article.

Example

library(fgeo.data)

str(data_dictionary)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    242 obs. of  3 variables:
#>  $ table      : chr  "Census" "Census" "Census" "Census" ...
#>  $ column     : chr  "CensusID" "PlotID" "PlotCensusNumber" "StartDate" ...
#>  $ description: chr  "Primary key, an integer  automatically generated to uniquely identify a census." "Foreign Key to Site table." "Integer census number for an individual plot, 1=first census, 2=second census, etc. If there are more than one "| __truncated__ "Date on which the first measurement of the census was taken." ...

str(luquillo_tree5_random)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    1000 obs. of  19 variables:
#>  $ treeID   : int  104 119 180 602 631 647 1086 1144 1168 1380 ...
#>  $ stemID   : int  143 158 225 736 775 793 1339 1410 1438 114352 ...
#>  $ tag      : chr  "10009" "100104" "100171" "100649" ...
#>  $ StemTag  : chr  "10009" "100104" "100174" "100649" ...
#>  $ sp       : chr  "DACEXC" "MYRSPL" "CASARB" "GUAGUI" ...
#>  $ quadrat  : chr  "113" "1021" "921" "821" ...
#>  $ gx       : num  10.3 182.9 164.6 149 38.3 ...
#>  $ gy       : num  245 410 410 414 245 ...
#>  $ MeasureID: int  439947 466597 466623 466727 439989 466743 440021 466889 440031 466957 ...
#>  $ CensusID : int  5 5 5 5 5 5 5 5 5 5 ...
#>  $ dbh      : num  193 40.4 45 33 140 246 176 74 604 NA ...
#>  $ pom      : chr  "1.4" "1.25" "1.3" "1.3" ...
#>  $ hom      : num  1.4 1.26 1.32 1.3 1.24 1.25 1.45 1.3 1.31 NA ...
#>  $ ExactDate: Date, format: "2011-11-02" "2012-02-06" ...
#>  $ DFstatus : chr  "alive" "alive" "alive" "alive" ...
#>  $ codes    : chr  "MAIN;A" "MAIN;A" "SPROUT;A" "MAIN;A" ...
#>  $ nostems  : num  1 1 2 1 1 1 1 1 1 2 ...
#>  $ status   : chr  "A" "A" "A" "A" ...
#>  $ date     : num  18933 19029 19026 19024 18933 ...

str(luquillo_elevation)
#> List of 4
#>  $ col :Classes 'tbl_df', 'tbl' and 'data.frame':    6565 obs. of  3 variables:
#>   ..$ x   : int [1:6565] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ y   : int [1:6565] 0 5 10 15 20 25 30 35 40 45 ...
#>   ..$ elev: num [1:6565] 364 364 363 363 363 ...
#>  $ mat : num [1:101, 1:65] 364 364 363 363 363 ...
#>  $ xdim: int 320
#>  $ ydim: int 500

Get started with fgeo

Information

fgeo.data's Issues

Remove dbhID from both stem and tree tables?

https://mail.google.com/mail/u/0/?zx=yxdmg1oh1mf4#search/in%3Asent+laoz%40si.edu/162bc4dc2db9f8c1

Request

From: Mauro Lepore [[email protected]]
Sent: Monday, March 19, 2018 9:43 AM
To: Davies, Stuart J.; Lao, Suzanne
Subject: Can you please send me the current BCI dataset?

Can you please send me the current BCI dataset?
...
I would specifically use these two pieces:

A randomly selected subset of approx. 1000 tree tags from one whole plot.
A subset of all trees from at least four quadrats -- 1 hectare would be great.
I would use the exact same subset of stems in all kinds of tables: tree, stem, ViewFullTable.

Also need all other up-to-date datasets of the same site: ViewTaxonomy, elevation, and whatever else you consider standard for a site.

Create bit.ly links to common datasets

This will be helpful to train partners in how to deal with raw data.

Useful links:

to data-raw/database-output/, so the entire collection of files can be downloaded with usethis::use_course(). This includes a .Rproj file and a data-raw/ directory, to show learners how to organize a project.

For example:

A zip file that lives here:

https://github.com/forestgeo/fgeo.opendata/blob/dev/inst/database-output.zip

Can be downloaded with (note that blob is replaced by raw):

use_course("https://github.com/forestgeo/fgeo.opendata/raw/dev/inst/database-output.zip")

to each individual file in data-raw/, for cases when a single file is what matters for a particular demonstration.

Develop tools to summarize ForestGEO data

This issue starts to collect ideas for summaries of ForestGEO data which could be included in different tools, for example:

A bookdown book.
A shiny app.
A function that creates a report in different formats.

Feel free to add more ideas as comments below.

Background

Today I discussed with Patrick Jansen the idea of developing tools to summarize ForestGEO data that could be used to gain some insight about plot data shortly after it has been collected.

This idea relates to building the new generation of ForestGEO plot-books (online, interactive, built with pkgdown), which @teixeirak is working on. She started a repository to work on such a book, and that seems to be the best place to discuss this issue. But I discuss it here because that repository is now private.

Some specific ideas

Patrick would like to visualize data from a site relative to other sites. These are some specific visualizations he proposes:

Cumulative area versus cumulative number of species. This gives a curve for each plot.
Same but cumulative number of stems versus cumulative number of species.
Scatterplot of total number of stems versus total number of species. This gives a point per plot.
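The second curve proposed above (cumulative number of stems versus cumulative number of species) could be computed roughly as follows. This is only a minimal sketch in base R: the vector sp and its ordering are invented for illustration, and the species codes are borrowed from the example output earlier on this page.

```r
# Hypothetical stems from one plot, in the order they were censused.
sp <- c("DACEXC", "MYRSPL", "DACEXC", "CASARB", "GUAGUI", "MYRSPL")

# A stem adds a species only the first time that species appears.
cum_stems <- seq_along(sp)
cum_species <- cumsum(!duplicated(sp))

# One row per stem; plotting cum_species against cum_stems gives the curve.
data.frame(cum_stems, cum_species)
```

Repeating this per plot would give one curve per plot, and the last row of each curve would give the point for the proposed scatterplot.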

Interactivity: Data from the focal plot could be highlighted by clicking on the legend of a plotly graph.

Keep DFstatus in stem but not tree tables

via

---------- Forwarded message ----------
From: Lao, Suzanne <[email protected]>
Date: Mon, Apr 16, 2018 at 2:20 PM
Subject: RE: Can you please send me the current BCI dataset?
To: Mauro Lepore <[email protected]>

...you should keep it in the Stem files. This is just the Status variable from ViewFullTable. In other words, DFstatus in the R stem file is exactly Status in the ViewFullTable.

You don’t need it in the Tree R files because DFstatus refers to the stem, not the tree.

Automate creating toy datasets

Make it easy to create toy datasets for examples and tests.

Alternatives:

Pre made:

Pre-make toy datasets and keep them in fgeo.data.
Paste them to script with an addin, maybe internally using datapasta.

On the fly:

Create them on the fly with expand.grid or a similar alternative in tidyr and friends.
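As a rough illustration of the on-the-fly option, a toy census table might be built with expand.grid() as sketched below. The column names mirror those in the tree tables shown earlier, but every value here is invented and the object name toy is hypothetical.

```r
# Hypothetical toy census table: 2 censuses x 3 trees (all values invented).
toy <- expand.grid(
  treeID = 1:3,
  CensusID = 1:2,
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = FALSE
)

# Add a few columns resembling those of a tree table.
toy$tag <- sprintf("%04d", toy$treeID)
toy$dbh <- c(10, 20, 30, 12, 22, NA)
toy$status <- ifelse(is.na(toy$dbh), "D", "A")

toy
```

expand.grid() varies the first argument fastest, so each treeID appears once per CensusID.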

Merge #13 and #14 following Suzanne's feedback?

#13 #14.

Acknowledgment

tyluRp:

Suggested adding NEWS.md.
Fixed typos and broken links in vignettes.

Remove DFstatus from both stem and tree tables?

https://mail.google.com/mail/u/0/?zx=yxdmg1oh1mf4#search/in%3Asent+laoz%40si.edu/162bc4dc2db9f8c1

Simplify documentation of census_description to point to data_dictionary

For easier maintenance, point to data_dictionary instead of repeating the column definitions.

fix links to raw data

DONE

bit.ly
url <- "http://bit.ly/fgeo-opendata-taxa-bci"
url <- "http://bit.ly/fgeo-data-bci-taxa"

#' Raw .csv file available at \url{http://bit.ly/fgeo-opendata-vft-random-bci}).
#' Raw .csv file available at \url{http://bit.ly/fgeo-data-bci-vft-random}).

#' \url{http://bit.ly/fgeo-opendata-vft-1ha-bci}.
#' \url{http://bit.ly/fgeo-data-bci-vft-1ha}.

#' \url{http://bit.ly/fgeo-opendata-elevation-bci}.
#' \url{http://bit.ly/fgeo-data-bci-elevation}.

Review documentation

The documentation of the fgeo.data package is available at https://forestgeo.github.io/fgeo.data. To suggest edits please add your comments below or create a pull request.

If you have any questions, please ask by mentioning me as @maurolepore.

Search related issues on https://github.com with: label:hacktoberfest maurolepore

Fix `date`: All values are missing.

all(is.na(fgeo.data::luquillo_tree5_random$date))
#> [1] TRUE
all(is.na(fgeo.data::luquillo_stem5_random$date))
#> [1] TRUE
# Is the problem coming from rtbl?
all(is.na(fgeo.data::luquillo_vft_4quad$Date))
#> [1] FALSE

Created on 2018-09-28 by the reprex package (v0.2.1)

Why may there be more than one `tag` per `treeID`?

library(tidyverse)
stem <- fgeo.data::luquillo_stem6_1ha
dups <- c("100223", "116681", "118059", "119589", "17112", "18174")
stem %>% 
  select(treeID, tag) %>% 
  unique() %>% 
  filter(tag %in% dups) %>% 
  arrange(tag)
#> # A tibble: 12 x 2
#>    treeID tag   
#>     <int> <chr> 
#>  1    224 100223
#>  2 125584 100223
#>  3  14122 116681
#>  4 125589 116681
#>  5  15349 118059
#>  6 125590 118059
#>  7  16687 119589
#>  8 125591 119589
#>  9  26248 17112 
#> 10 125552 17112 
#> 11  27280 18174 
#> 12 125564 18174

Created on 2018-06-22 by the reprex package (v0.2.0).
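A check like the one above could also be written without hard-coding the duplicated tags. The following is a minimal base R sketch on a small stand-in table: the first four rows copy treeID and tag pairs from the output above, while the last row is invented so that one tag is not duplicated.

```r
# Stand-in for a stem table; the last row is hypothetical.
stem <- data.frame(
  treeID = c(224, 125584, 14122, 125589, 300),
  tag = c("100223", "100223", "116681", "116681", "999999"),
  stringsAsFactors = FALSE
)

# Count distinct treeID values per tag, then keep tags with more than one.
ids_per_tag <- tapply(stem$treeID, stem$tag, function(x) length(unique(x)))
dup_tags <- names(ids_per_tag)[ids_per_tag > 1]

dup_tags
```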

replace bciex with fgeo.opendata in fgeo packages.

Update luquillo_habitat to be of class fgeo_habitat.

Add documentation from bciex

In bciex I had documented datasets generally.

Ensure luquillo_stem_random_tiny has same number of rows per CensusID

library(tidyverse)
#> -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.2.4
#> v tibble  1.4.2     v dplyr   0.7.4
#> v tidyr   0.8.0     v stringr 1.3.0
#> v readr   1.1.1     v forcats 0.3.0
#> -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

# Good
list(
  census1 = fgeo.data::luquillo_stem_random %>% filter(CensusID == 5),
  census2 = fgeo.data::luquillo_stem_random %>% filter(CensusID == 6)
) %>%
  map(nrow)
#> $census1
#> [1] 1324
#> 
#> $census2
#> [1] 1324

# Bad
list(
  census1 = fgeo.data::luquillo_stem_random_tiny %>% filter(CensusID == 5),
  census2 = fgeo.data::luquillo_stem_random_tiny %>% filter(CensusID == 6)
) %>%
  map(nrow)
#> $census1
#> [1] 13
#> 
#> $census2
#> [1] 9
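The comparison above can be reduced to a single balance check. The sketch below uses an invented stand-in table rather than the package data, and the fix it shows (keeping only stems present in every census) is just one possible approach, not necessarily how the dataset was or should be rebuilt.

```r
# Hypothetical stem table with unbalanced censuses (all values invented).
stem <- data.frame(
  stemID   = c(1, 2, 3, 1, 2),
  CensusID = c(5, 5, 5, 6, 6)
)

# Count rows per census; a balanced table has a single distinct count.
rows_per_census <- table(stem$CensusID)
is_balanced <- length(unique(as.vector(rows_per_census))) == 1

# One possible fix: keep only stems recorded in every census.
common <- Reduce(intersect, split(stem$stemID, stem$CensusID))
stem_balanced <- stem[stem$stemID %in% common, ]
table(stem_balanced$CensusID)
```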

Define undefined status

library(tidyverse)
library(fgeo.data)

tables <- c("bci_stem6_random", "bci_tree6_random", "bci_vft_random")
tables %>% 
  map(get) %>% 
  set_names(tables) %>% 
  map(select, matches("status")) %>% 
  map(unique) %>% 
  map(pull) %>% 
  map(paste, collapse = ", ")
#> $bci_stem6_random
#> [1] "A, V, D, P, G, AR"
#> 
#> $bci_tree6_random
#> [1] "A, D, P"
#> 
#> $bci_vft_random
#> [1] "alive, broken below, dead, missing"

Created on 2018-04-16 by the reprex package (v0.2.0).

#   Subject: RE: Why there may be more than one tag per TreeID?
# > In the case of Luquillo, which is a relatively clean dataset, there
#   shouldn’t be duplicate tree tags.
# > I have fixed the errors. Please download the ViewFullTable
Add CensusID to allow merging and grouping multiple census tables.

Fix the FIXME in the documentation of census tables.

SubspeciesID FIXME: Foreign Key to FIXME table.
Date         FIXME: Integer representing the number of days since 1 Jan 1960. 
HighHOM      FIXME: Define HighHOM     
LargeStem    FIXME: Define LargeStem
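If the Date description above is right (an integer count of days since 1 Jan 1960), such values could be converted to calendar dates as sketched below. The two numbers are the first values of the date column in the str(luquillo_tree5_random) output shown earlier. The origin is an assumption taken from the FIXME note, although the result does agree with the first two ExactDate values in that same output.

```r
# Values copied from the `date` column shown in str(luquillo_tree5_random).
date_num <- c(18933, 19029)

# Assumption: the origin is 1960-01-01, as the FIXME note suggests.
converted <- as.Date(date_num, origin = "1960-01-01")

format(converted)
#> [1] "2011-11-02" "2012-02-06"
```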

Update data with changes in rtbl()

https://github.com/forestgeo/rtbl/issues/42

Update census tables.

The tree and stem census tables are out of date: they don't match the output of rtbl(). Try recreating them with rtbl().