
fgeo.data's Introduction

Open datasets of ForestGEO

Installation

Install the pre-release version of fgeo.data:

# install.packages("devtools")
devtools::install_github("forestgeo/fgeo.data@pre-release")

Or install the development version of fgeo.data:

# install.packages("devtools")
devtools::install_github("forestgeo/fgeo.data")

Or install all fgeo packages in one step.

For details on how to install packages from GitHub, see this article.

Example

library(fgeo.data)

str(data_dictionary)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    242 obs. of  3 variables:
#>  $ table      : chr  "Census" "Census" "Census" "Census" ...
#>  $ column     : chr  "CensusID" "PlotID" "PlotCensusNumber" "StartDate" ...
#>  $ description: chr  "Primary key, an integer  automatically generated to uniquely identify a census." "Foreign Key to Site table." "Integer census number for an individual plot, 1=first census, 2=second census, etc. If there are more than one "| __truncated__ "Date on which the first measurement of the census was taken." ...

str(luquillo_tree5_random)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    1000 obs. of  19 variables:
#>  $ treeID   : int  104 119 180 602 631 647 1086 1144 1168 1380 ...
#>  $ stemID   : int  143 158 225 736 775 793 1339 1410 1438 114352 ...
#>  $ tag      : chr  "10009" "100104" "100171" "100649" ...
#>  $ StemTag  : chr  "10009" "100104" "100174" "100649" ...
#>  $ sp       : chr  "DACEXC" "MYRSPL" "CASARB" "GUAGUI" ...
#>  $ quadrat  : chr  "113" "1021" "921" "821" ...
#>  $ gx       : num  10.3 182.9 164.6 149 38.3 ...
#>  $ gy       : num  245 410 410 414 245 ...
#>  $ MeasureID: int  439947 466597 466623 466727 439989 466743 440021 466889 440031 466957 ...
#>  $ CensusID : int  5 5 5 5 5 5 5 5 5 5 ...
#>  $ dbh      : num  193 40.4 45 33 140 246 176 74 604 NA ...
#>  $ pom      : chr  "1.4" "1.25" "1.3" "1.3" ...
#>  $ hom      : num  1.4 1.26 1.32 1.3 1.24 1.25 1.45 1.3 1.31 NA ...
#>  $ ExactDate: Date, format: "2011-11-02" "2012-02-06" ...
#>  $ DFstatus : chr  "alive" "alive" "alive" "alive" ...
#>  $ codes    : chr  "MAIN;A" "MAIN;A" "SPROUT;A" "MAIN;A" ...
#>  $ nostems  : num  1 1 2 1 1 1 1 1 1 2 ...
#>  $ status   : chr  "A" "A" "A" "A" ...
#>  $ date     : num  18933 19029 19026 19024 18933 ...

str(luquillo_elevation)
#> List of 4
#>  $ col :Classes 'tbl_df', 'tbl' and 'data.frame':    6565 obs. of  3 variables:
#>   ..$ x   : int [1:6565] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ y   : int [1:6565] 0 5 10 15 20 25 30 35 40 45 ...
#>   ..$ elev: num [1:6565] 364 364 363 363 363 ...
#>  $ mat : num [1:101, 1:65] 364 364 363 363 363 ...
#>  $ xdim: int 320
#>  $ ydim: int 500
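Judging only from the output printed above, `col` and `mat` seem to hold the same elevations in two layouts: `col` lists `y` varying fastest within each `x`, and `mat` (101 x 65) starts with the same values. The sketch below rebuilds a matrix from a long table under that assumption, using made-up toy values; the row/column orientation is an inference, not something documented here.

```r
# Toy long-format elevation table (values are made up).
# y varies fastest within each x, as in the printed `col` element.
col <- expand.grid(y = c(0, 5, 10), x = c(0, 5))[c("x", "y")]
col$elev <- c(364, 364, 363, 362, 361, 360)

# Assumption: rows of the matrix index y and columns index x.
# matrix() fills column by column, so each column holds one x value.
mat <- matrix(col$elev, nrow = length(unique(col$y)))

dim(mat)
mat[2, 2]  # elevation at x = 5, y = 5 in the toy table
```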

Get started with fgeo

Information

fgeo.data's People

Contributors

maurolepore, overstreeth, tylerlittlefield

fgeo.data's Issues

Add data from Suzanne

Request

From: Mauro Lepore [[email protected]]
Sent: Monday, March 19, 2018 9:43 AM
To: Davies, Stuart J.; Lao, Suzanne
Subject: Can you please send me the current BCI dataset?

Can you please send me the current BCI dataset?
...
I would specifically use these two pieces:

  1. A randomly selected subset of approx. 1000 tree-tags from one whole-plot.
  2. A subset of all trees from at least four quadrats -- 1 hectare would be great.
    I would use the exact same subset of stems in all kinds of tables: tree, stem, ViewFullTable.

Also need all other up-to-date datasets of the same site: ViewTaxonomy, elevation, and whatever else you consider standard for a site.

Create bit.ly links to common datasets

This will be helpful to train partners in how to deal with raw data.

Useful links:

  • to data-raw/database-output/, so the entire collection of files can be downloaded with usethis::use_course(). This includes a .Rproj file and a data-raw/ directory, to show learners how to organize a project.

For example:

A zip file that lives here:

https://github.com/forestgeo/fgeo.opendata/blob/dev/inst/database-output.zip

Can be downloaded with (note that blob is replaced by raw):

use_course("https://github.com/forestgeo/fgeo.opendata/raw/dev/inst/database-output.zip")
  • to each individual file in data-raw/, for cases when a single file is what matters for a particular demonstration.
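The blob-to-raw substitution described above can be sketched in base R. `blob_to_raw()` is a hypothetical helper written for this illustration; it is not part of usethis or fgeo.data.

```r
# Hypothetical helper: turn a GitHub "blob" URL into the matching "raw" URL
# by replacing the first "/blob/" path segment.
blob_to_raw <- function(url) {
  sub("/blob/", "/raw/", url, fixed = TRUE)
}

blob_url <- "https://github.com/forestgeo/fgeo.opendata/blob/dev/inst/database-output.zip"
raw_url <- blob_to_raw(blob_url)
raw_url

# The result could then be passed to usethis (not run here, it downloads files):
# usethis::use_course(raw_url)
```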

Develop tools to summarize ForestGEO data

This issue starts to collect ideas for summaries of ForestGEO data which could be included in different tools, for example:

  • A bookdown book.
  • A shiny app.
  • A function that creates a report in different formats.

Feel free to add more ideas as comments below.


Background

Today I discussed with Patrick Jansen the idea of developing tools to summarize ForestGEO data that could be used to gain some insight into plot data shortly after it has been collected.

This idea relates to building the new generation of ForestGEO plot-books (online, interactive, built with pkgdown), which @teixeirak is working on. She started a repository to work on such a book, and that seems to be the best place to discuss this issue. But I discuss it here because that repository is now private.

Some specific ideas

Patrick would like to visualize data from a site relative to other sites. These are some specific visualizations he proposes:

  • Cumulative area versus cumulative number of species. This gives a curve for each plot.
  • Same but cumulative number of stems versus cumulative number of species.
  • Scatterplot of total number of stems versus total number of species. This gives a point per plot.

Interactivity: Data from the focal plot could be highlighted by clicking on the legend of a plotly graph.
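As a rough illustration of the second and third ideas, the sketch below computes cumulative stems versus cumulative species per plot, and the per-plot totals, in base R. The data, the column names `plot` and `sp`, and the helper `accumulate()` are all made up for this example; it assumes one row per stem, in the order stems were recorded.

```r
# Toy data (made up): one row per stem.
stems <- data.frame(
  plot = c("a", "a", "a", "a", "b", "b", "b"),
  sp   = c("sp1", "sp2", "sp1", "sp3", "sp1", "sp1", "sp2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a species count that grows each time a new species
# appears, paired with the running number of stems.
accumulate <- function(sp) {
  data.frame(
    n_stems   = seq_along(sp),
    n_species = cumsum(!duplicated(sp))
  )
}

# One curve per plot.
curves <- lapply(split(stems$sp, stems$plot), accumulate)

# One point per plot: the last row of each curve.
totals <- do.call(rbind, lapply(curves, function(x) x[nrow(x), ]))
totals
```

Each element of `curves` could be drawn as a line, and `totals` as a scatterplot, with whatever plotting tool the final product uses.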

Keep DFstatus in stem but not tree tables

via

---------- Forwarded message ----------
From: Lao, Suzanne <[email protected]>
Date: Mon, Apr 16, 2018 at 2:20 PM
Subject: RE: Can you please send me the current BCI dataset?
To: Mauro Lepore <[email protected]>

...you should keep it in the Stem files. This is just the Status variable from ViewFullTable. In other words, DFstatus in the R stem file is exactly Status in the ViewFullTable.

You don’t need it in the Tree R files because DFstatus refers to the stem, not the tree.

Automate creating toy datasets

Make it easy to create toy datasets for examples and tests.

Alternatives:

Pre made:

  • Pre-make toy datasets and keep them in fgeo.data.
  • Paste them to script with an addin, maybe internally using datapasta.

On the fly:

  • Create them on the fly with expand.grid or a similar alternative in tidyr and friends.
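A minimal sketch of the on-the-fly option with `expand.grid()`. The column names mimic those of the census tables shown in the README, but the values and the `dbh` formula are arbitrary and only serve to make rows distinguishable.

```r
# Every combination of three trees and two censuses: 3 x 2 = 6 rows.
toy <- expand.grid(
  treeID = 1:3,
  CensusID = 5:6,
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = FALSE
)

# Arbitrary filler values (not real measurements).
toy$status <- "A"
toy$dbh <- 10 * toy$treeID + toy$CensusID

toy
```

Because every tree appears in every census, a dataset built this way has the same number of rows per `CensusID` by construction.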

Acknowledgment

tyluRp:

  • Suggested adding NEWS.md.
  • Fixed typos and broken links in vignettes.

Fix `date`: All values are missing.

all(is.na(fgeo.data::luquillo_tree5_random$date))
#> [1] TRUE
all(is.na(fgeo.data::luquillo_stem5_random$date))
#> [1] TRUE
# Is the problem coming from rtbl?
all(is.na(fgeo.data::luquillo_vft_4quad$Date))
#> [1] FALSE

Created on 2018-09-28 by the reprex package (v0.2.1)

Why may there be more than one `tag` per `treeID`?

library(tidyverse)
stem <- fgeo.data::luquillo_stem6_1ha
dups <- c("100223", "116681", "118059", "119589", "17112", "18174")
stem %>% 
  select(treeID, tag) %>% 
  unique() %>% 
  filter(tag %in% dups) %>% 
  arrange(tag)
#> # A tibble: 12 x 2
#>    treeID tag   
#>     <int> <chr> 
#>  1    224 100223
#>  2 125584 100223
#>  3  14122 116681
#>  4 125589 116681
#>  5  15349 118059
#>  6 125590 118059
#>  7  16687 119589
#>  8 125591 119589
#>  9  26248 17112 
#> 10 125552 17112 
#> 11  27280 18174 
#> 12 125564 18174

Created on 2018-06-22 by the reprex package (v0.2.0).
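The reprex above starts from a known vector of duplicated tags. One way such a vector could be found is sketched below in base R, on a made-up table with the same two columns; this is an illustration, not necessarily how `dups` was obtained.

```r
# Toy stem table (made up): tag "t2" is shared by two different trees.
stem <- data.frame(
  treeID = c(1L, 1L, 2L, 3L, 4L),
  tag    = c("t1", "t1", "t2", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Keep each treeID-tag pair once, then count trees per tag.
pairs <- unique(stem[c("treeID", "tag")])
n_trees <- table(pairs$tag)

# Tags that map to more than one treeID.
dups <- names(n_trees)[n_trees > 1]
dups
```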

Ensure luquillo_stem_random_tiny has same number of rows per CensusID

library(tidyverse)
#> -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.2.4
#> v tibble  1.4.2     v dplyr   0.7.4
#> v tidyr   0.8.0     v stringr 1.3.0
#> v readr   1.1.1     v forcats 0.3.0
#> -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

# Good
list(
  census1 = fgeo.data::luquillo_stem_random %>% filter(CensusID == 5),
  census2 = fgeo.data::luquillo_stem_random %>% filter(CensusID == 6)
) %>%
  map(nrow)
#> $census1
#> [1] 1324
#> 
#> $census2
#> [1] 1324

# Bad
list(
  census1 = fgeo.data::luquillo_stem_random_tiny %>% filter(CensusID == 5),
  census2 = fgeo.data::luquillo_stem_random_tiny %>% filter(CensusID == 6)
) %>%
  map(nrow)
#> $census1
#> [1] 13
#> 
#> $census2
#> [1] 9
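The good/bad comparison above could be turned into a reusable check. The function name `has_equal_rows_per_census()` is hypothetical, and the toy data frames only reproduce the unequal row counts (13 and 9) reported for the tiny dataset.

```r
# Hypothetical check: TRUE if every CensusID has the same number of rows.
has_equal_rows_per_census <- function(x) {
  counts <- as.vector(table(x$CensusID))
  length(unique(counts)) == 1
}

# Toy stand-ins: equal counts (3 and 3) versus unequal counts (13 and 9).
good <- data.frame(CensusID = rep(5:6, each = 3))
bad  <- data.frame(CensusID = rep(5:6, times = c(13, 9)))

has_equal_rows_per_census(good)
has_equal_rows_per_census(bad)
```

A check like this could run in the package's tests, so that a regenerated dataset with unbalanced censuses is caught early.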

Define undefined status

library(tidyverse)
library(fgeo.data)

tables <- c("bci_stem6_random", "bci_tree6_random", "bci_vft_random")
tables %>% 
  map(get) %>% 
  set_names(tables) %>% 
  map(select, matches("status")) %>% 
  map(unique) %>% 
  map(pull) %>% 
  map(paste, collapse = ", ")
#> $bci_stem6_random
#> [1] "A, V, D, P, G, AR"
#> 
#> $bci_tree6_random
#> [1] "A, D, P"
#> 
#> $bci_vft_random
#> [1] "alive, broken below, dead, missing"

Created on 2018-04-16 by the reprex package (v0.2.0).

Scrape column definitions from the online dictionary.

Column definitions may change and fall out of sync with the definitions hard-wired in the package.

Scrape definitions from the web, and provide a single dataset of variables and definitions where users can look up the definitions they need.

Update data

Use fixed vft in data-raw\private\ViewFullTable_luquillo to remake all affected data.

#   Subject: RE: Why there may be more than one tag per TreeID?
# > In the case of Luquillo, which is a relatively clean dataset, there
#   shouldn’t be duplicate tree tags.
# > I have fixed the errors. Please download the ViewFullTable
Fix the FIXME in the documentation of census tables.

Define missing column-definitions in vft

Labelled with FIXME.

SubspeciesID FIXME: Foreign Key to FIXME table.
Date         FIXME: Integer representing the number of days since 1 Jan 1960. 
HighHOM      FIXME: Define HighHOM     
LargeStem    FIXME: Define LargeStem  
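The tentative `Date` definition (days since 1 Jan 1960) can be checked against values printed in the README: the first two `ExactDate` values of `luquillo_tree5_random` are "2011-11-02" and "2012-02-06", and the first two `date` values are 18933 and 19029. The sketch below does that arithmetic in base R; the 1960 origin is the assumption being tested, and whether it holds for every table is not confirmed here.

```r
# Assumed origin, taken from the FIXME definition above.
origin <- as.Date("1960-01-01")

# First two ExactDate values printed for luquillo_tree5_random.
exact_date <- as.Date(c("2011-11-02", "2012-02-06"))

# Days elapsed since the assumed origin.
days <- as.numeric(exact_date - origin)
days

# And back from day counts to calendar dates.
as.Date(days, origin = origin)
```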

Update census tables.

Census tables tree and stem are out of date. They don't match the output of rtbl(). Try to recreate them with rtbl().
