SMurphy 2023-12-04
- Task 1
- 1.1 Import Data
- 1.2 Audit Data
- 1.3 Tidy Data
- 2.1 Compute Biomass at Plot Level (Eq1)
- 2.2 Compute Biomass at Strata Level (Eq2)
- 2.3 Compute Carbon in Harvested Volumes (Eq3)
- 2.4 Compute Carbon in Extracted Volumes (Eq4)
- 3.1 Additional Metrics
- 3.2 Regression Diagnostics
- Appendix I: Codebook Derived
- Task 2
Prototype Tool for Quantifying GHG in Standing Timber Using VSC-Methodology-VM0010
The following workflow was used to test a prototype tool for quantifying greenhouse gas (GHG) emissions reduction. The objective is to illustrate the process outlined in the VCS Methodology, specifically VM0010, with an emphasis on simplicity of tool design over burdensome complexity. The chosen subsection for digitalization is “8.1.1 Calculation of carbon stocks in commercial timber volumes”.
A number of empricial assumptions were made including sourcing of
pre-existing inventory dataset with pre-defined format and values
published in the exercise’s document (‘Program Officer Technology
Solutions Seamus Murphy’). Using this table of observations, an excel
spreadsheet was copied into and imported into an R environment as the
dataframe ’dataset_raw'
and commited to the project’s github
repository here.
In this original, unchanged dataset, assumptions of values can be viewed
regarding sample strata, plot, species, tree, volume, species wood
characteristics, and plot area.
The following table presents a data dictionary of these values, along
with descriptions of their units, variable labels, types and file
structure. Documentation of the dataset as it was received in original
format ‘dataset_raw.xlsx
’ and descriptions of output dataset
’dataset_tidy.xlsx'
are intended to enable reproducability, encourage
collaboration and inform future procedures for data submissions from
clients.
“Volume.V.l.j.I.sp”=> “volume” | ‘dataset_raw’, ‘dataset_tidy’ | Whole stem volume of living tree | m3 | numeric |
“Species..j.” => “species_j” | ‘dataset_raw’, ‘dataset_tidy’ | Tree species used at the species summary level. | Sp1–Sp5 | character |
“Stratum...i.” => ’stratum_i” | ‘dataset_raw’, ‘dataset_tidy’ | Free form text field identifying stratum of each plot. | 1, 2 | integer > Factor |
“Plot..sp.” => “plot_sp” | ‘dataset_raw’, ‘dataset_tidy’ | Numerical identifier unique only within its stratum. | 1, 2, 3 | integer |
“Tree..l.” => “tree_l” | ‘dataset_raw’, ‘dataset_tidy’ | Text field identifying tree observation unique only specific plot family of associated stratum. | t1 - t5 | character>numeric |
“bcef_r” | ‘dataset_raw’, ‘dataset_tidy’ | Biomass conversion and expansion factor used to derive carbon stocks from timber volume | % | numeric |
“cf” | ‘dataset_raw’, ‘dataset_tidy’ | Carbon factor used to derive carbon stocks from volumes of extracted timber | % | numeric |
“d” | ‘dataset_tidy’, | Basic wood density represented as fraction of dry in tons to green volume | % | numeric |
“a_sp” | dataset_tidy | Area in hectares of individual sample plots | ha | numeric |
“vji_sp_m3” | dataset_tidy | Sum volume of merchantable timber of a species from specific plots within specific stratum | m3 | numeric |
“vji_ha_m3” | dataset_tidy | Mean volume per hectare of merchantable timber of a species in a specific stratum. | m3.ha-1 | numeric |
chb_ha_tC | dataset_tidy | Mean volume of carbon in extracted timber of species in stratum using ‘bcef_r’ and ‘cf’ factors. | tC.ha-1 | numeric |
cex_ha_tC | dataset_tidy | Mean volume of carbon in extracted biomass of a species in a stratum using ‘d’ and ‘cf’ factors. | tC.ha-1 | numeric |
Import dataset_raw
and write copy to dataset_tidy
. Seed is set to
77777
.
set.seed(77777)
dataset_raw <- read_excel("dataset_raw.xlsx")
write.csv(dataset_raw, "dataset_tidy.csv", row.names = FALSE)
dataset_tidy <- read.csv("dataset_tidy.csv")
dataset_tidy
# A tibble: 10 × 5
Stratum...i. Plot..sp. Species..j. Tree..l. Volume..V_.l.j.I.sp..
<int> <int> <chr> <chr> <dbl>
1 1 1 Sp1 t1 3.3
2 1 1 Sp1 t2 4.8
3 1 1 Sp1 t3 4.08
4 1 2 Sp4 t1 1.5
5 1 2 Sp4 t2 1.68
6 2 1 Sp1 t1 1.38
7 2 1 Sp2 t2 3.24
8 2 1 Sp3 t3 3.72
9 2 1 sp4 t4 2.94
10 2 1 Sp5 t5 3.36
Examine structure, scan for errors, and save audit report for input to codebook (Appendix I).
str(dataset_tidy)
dplyr::count(dataset_tidy, Species..j.)
saveHTML(dataMaid::makeDataReport(
dataset_tidy,
output = "html",
codebook = TRUE,
onlyProblematic = TRUE,
visuals = setVisuals(all = "basicVisual"),
replace = TRUE
))
'data.frame': 10 obs. of 5 variables:
$ Stratum...i. : int 1 1 1 1 1 2 2 2 2 2
$ Plot..sp. : int 1 1 1 2 2 1 1 1 1 1
$ Species..j. : chr "Sp1" "Sp1" "Sp1" "Sp4" ...
$ Tree..l. : chr "t1" "t2" "t3" "t1" ...
$ Volume..V_.l.j.I.sp..: num 3.3 4.8 4.08 1.5 1.68 1.38 3.24 3.72 2.94 3.36
# A tibble: 6 × 2
Species..j. n
<chr> <int>
1 Sp1 4
2 Sp2 1
3 Sp3 1
4 Sp4 2
5 Sp5 1
6 sp4 1
Tests identified one problematic entry in values of Species..j.
variable and multiple naming issues inherited during data import.
Variable relabelling was carried out according to naming convention
stated in exercise document, the symbology used in equations 1-4 of
subsection 8.1.1 of the VM0010 Verra
Methodology,
and syntax standards provided in the Tidyverse style
guide. This involved the following
changes.
- Correct the case-sensitive error from
sp4
toSp4
. - Convert or recode variables to reflec their class
- Remove spaces
- Use underscores between words
- Change to lowercase
- Warnings, messages turned to silent
- Space placed after commas
- Max 80 characters to chunks
- Comments limited to single line each
::
operator used to signpost installs & avoid package conflicts
data.table::setnames(dataset_tidy, old = "Stratum...i.", new = "stratum_i", skip_absent = TRUE)
data.table::setnames(dataset_tidy, old = "Species..j.", new = "species_j", skip_absent = TRUE)
data.table::setnames(dataset_tidy, old = "Plot..sp.", new = "plot_sp", skip_absent = TRUE)
data.table::setnames(dataset_tidy, old = "Tree..l.", new = "tree_l", skip_absent = TRUE)
data.table::setnames(dataset_tidy, old = "Volume..V_.l.j.I.sp..", new = "volume", skip_absent = TRUE)
dataset_tidy$species_j[dataset_tidy$species_j == "sp4"] <- "Sp4"
dataset_tidy$species_j <- as.factor(dataset_tidy$species_j)
dataset_tidy$stratum_i <- as.factor(dataset_tidy$stratum_i)
Derive new variable columns using values provided in the exercise document. Compute additional variables of area estimates at different units and scales:
bcef_r
: Biomass expansion factor applicable to wood removals (t.d.m m-3).cf
: Carbon fraction of biomass of species (tCt d.m.-1)d
: Basic wood density of species (t d.m. m-3)a_sp
: Total area of plot in hectares (ha)a_sp_m2
: Total area of plot in metres squared (m2)a_I_m2
: Total area of stratum in metres squared (m2)a_I_ha
: Total area of stratum in hectares (ha)
Tabulate, confirm and write to directory.
dataset_tidy$bcef_r <- 0.7
dataset_tidy$cf <- 0.5
dataset_tidy$d <- 0.5
dataset_tidy$a_sp <- 0.1
dataset_tidy$a_sp_m2 <- dataset_tidy$a_sp * 10000
dataset_tidy <- dataset_tidy %>%
group_by(stratum_i) %>%
mutate(a_I_m2 = sum(a_sp_m2), a_I_ha = sum(a_sp))
dataset_tidy
# A tibble: 10 × 12
stratum_i plot_sp species_j tree_l volume bcef_r cf d a_sp a_sp_m2
<fct> <int> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 Sp1 t1 3.3 0.7 0.5 0.5 0.1 1000
2 1 1 Sp1 t2 4.8 0.7 0.5 0.5 0.1 1000
3 1 1 Sp1 t3 4.08 0.7 0.5 0.5 0.1 1000
4 1 2 Sp4 t1 1.5 0.7 0.5 0.5 0.1 1000
5 1 2 Sp4 t2 1.68 0.7 0.5 0.5 0.1 1000
6 2 1 Sp1 t1 1.38 0.7 0.5 0.5 0.1 1000
7 2 1 Sp2 t2 3.24 0.7 0.5 0.5 0.1 1000
8 2 1 Sp3 t3 3.72 0.7 0.5 0.5 0.1 1000
9 2 1 Sp4 t4 2.94 0.7 0.5 0.5 0.1 1000
10 2 1 Sp5 t5 3.36 0.7 0.5 0.5 0.1 1000
# ℹ 2 more variables: a_I_m2 <dbl>, a_I_ha <dbl>
write.csv(dataset_tidy, "dataset_tidy.csv", row.names = FALSE)
Note all four equations that follow derive estimates according to specific species, whether measured at the plot level, stratum or globally. Hence, caution is advised when computing initial, absolute estimates and when deriving or applying hectare expansion factors.
On page 19 of the VM0010 Methodology, ‘equation 1’ provides calculation of merchantable timber volume of a species at the plot level. This is estimated in m3 as follows:
where
dataset_tidy <- dataset_tidy %>%
group_by(species_j, stratum_i, plot_sp) %>%
mutate(vji_sp_m3 = sum(volume))
data.table::setDT(dataset_tidy)[, .(
vji_sp_m3 = sum(volume)
),
by = .(stratum_i, plot_sp, species_j)
]
# A tibble: 7 × 4
stratum_i plot_sp species_j vji_sp_m3
<fct> <int> <fct> <dbl>
1 1 1 Sp1 12.2
2 1 2 Sp4 3.18
3 2 1 Sp1 1.38
4 2 1 Sp2 3.24
5 2 1 Sp3 3.72
6 2 1 Sp4 2.94
7 2 1 Sp5 3.36
On page 19 of the VM0010 Methodology, ‘equation 2’ provides calculation of merchantable timber volume of a species at the stratum level:
where
dataset_tidy <- dataset_tidy %>%
group_by(stratum_i, species_j) %>%
mutate(vji_ha_m3 = mean(vji_sp_m3) * 10)
data.table::setDT(dataset_tidy)[, .(
vji_sp_m3,
vji_ha_m3 = mean(vji_sp_m3) * 10
),
by = .(stratum_i, species_j)
]
# A tibble: 10 × 4
stratum_i species_j vji_sp_m3 vji_ha_m3
<fct> <fct> <dbl> <dbl>
1 1 Sp1 12.2 122.
2 1 Sp1 12.2 122.
3 1 Sp1 12.2 122.
4 1 Sp4 3.18 31.8
5 1 Sp4 3.18 31.8
6 2 Sp1 1.38 13.8
7 2 Sp2 3.24 32.4
8 2 Sp3 3.72 37.2
9 2 Sp4 2.94 29.4
10 2 Sp5 3.36 33.6
On page 20 of the VM0010 Methodology, ‘equation 3’ provides calculation of carbon stock of harvested timber volumes for species i in stratum i.
In the absence of harvest plan data, carbon stocks of commercial timber
will be derived from the full inventory dataset_tidy
.
This differs to ‘equation 3’ below, which measures mean carbon stock
from volumes of ‘harvested biomass’
Carbon estimates are calculated in units of metric tons per hectare
(tC/ha-1) by multiplying mean volume per hectare of extracted
timber of species i from stratum j by the carbon fraction
dataset_tidy <- dataset_tidy %>%
mutate(chb_ha_tC = vji_ha_m3 * bcef_r * cf)
data.table::setDT(dataset_tidy)[, .(
vji_sp_m3,
vji_ha_m3,
chb_ha_tC = vji_ha_m3 * bcef_r * cf
),
by = .(stratum_i, species_j)
]
# A tibble: 10 × 5
stratum_i species_j vji_sp_m3 vji_ha_m3 chb_ha_tC
<fct> <fct> <dbl> <dbl> <dbl>
1 1 Sp1 12.2 122. 42.6
2 1 Sp1 12.2 122. 42.6
3 1 Sp1 12.2 122. 42.6
4 1 Sp4 3.18 31.8 11.1
5 1 Sp4 3.18 31.8 11.1
6 2 Sp1 1.38 13.8 4.83
7 2 Sp2 3.24 32.4 11.3
8 2 Sp3 3.72 37.2 13.0
9 2 Sp4 2.94 29.4 10.3
10 2 Sp5 3.36 33.6 11.8
On page 21 of the VM0010 Methodology, ‘equation 4’ provides calculation
of mean estimates of carbon stocks of species i in stratum j, by
multiplying the species’ value of basic wood density
dataset_tidy <- dataset_tidy %>%
mutate(cex_ha_tC = vji_ha_m3 * d * cf)
data.table::setDT(dataset_tidy)[, .(
vji_sp_m3,
vji_ha_m3,
chb_ha_tC,
cex_ha_tC = vji_ha_m3 * d * cf
),
by = .(stratum_i, species_j)
]
# A tibble: 10 × 6
stratum_i species_j vji_sp_m3 vji_ha_m3 chb_ha_tC cex_ha_tC
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 1 Sp1 12.2 122. 42.6 30.4
2 1 Sp1 12.2 122. 42.6 30.4
3 1 Sp1 12.2 122. 42.6 30.4
4 1 Sp4 3.18 31.8 11.1 7.95
5 1 Sp4 3.18 31.8 11.1 7.95
6 2 Sp1 1.38 13.8 4.83 3.45
7 2 Sp2 3.24 32.4 11.3 8.1
8 2 Sp3 3.72 37.2 13.0 9.3
9 2 Sp4 2.94 29.4 10.3 7.35
10 2 Sp5 3.36 33.6 11.8 8.4
# vJ_ha_m3: Mean volume by species across strata (m^3.ha-1)
dataset_tidy <- dataset_tidy %>%
group_by(species_j) %>%
mutate(vJ_ha_m3 = mean(vji_sp_m3) * 10)
dataset_tidy[1:10, ] %>%
select(
stratum_i, species_j,
plot_sp, vji_ha_m3,
chb_ha_tC, cex_ha_tC,
vJ_ha_m3
) %>%
kbl(caption = "Table with right-hand showing mean biomass volume per species across all strata (m3.ha-1)", escape = T) %>%
kable_minimal(full_width = T)
stratum_i | species_j | plot_sp | vji_ha_m3 | chb_ha_tC | cex_ha_tC | vJ_ha_m3 |
---|---|---|---|---|---|---|
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 |
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 |
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 |
1 | Sp4 | 2 | 31.8 | 11.13 | 7.95 | 31.0 |
1 | Sp4 | 2 | 31.8 | 11.13 | 7.95 | 31.0 |
2 | Sp1 | 1 | 13.8 | 4.83 | 3.45 | 94.8 |
2 | Sp2 | 1 | 32.4 | 11.34 | 8.10 | 32.4 |
2 | Sp3 | 1 | 37.2 | 13.02 | 9.30 | 37.2 |
2 | Sp4 | 1 | 29.4 | 10.29 | 7.35 | 31.0 |
2 | Sp5 | 1 | 33.6 | 11.76 | 8.40 | 33.6 |
# VI_ha_m3: Mean volume by strata across species and plots (m^3.ha-1)
dataset_tidy <- dataset_tidy %>%
group_by(stratum_i) %>%
mutate(vI_ha_m3 = mean(vji_sp_m3) * 10)
dataset_tidy[1:10, ] %>%
select(
stratum_i, species_j,
plot_sp, vji_ha_m3,
chb_ha_tC, cex_ha_tC,
vJ_ha_m3, vI_ha_m3
) %>%
kbl(caption = "Table with right-hand column showing mean biomass volume per strata across all species and plots (m3.ha-1)", escape = T) %>%
kable_minimal(full_width = T)
stratum_i | species_j | plot_sp | vji_ha_m3 | chb_ha_tC | cex_ha_tC | vJ_ha_m3 | vI_ha_m3 |
---|---|---|---|---|---|---|---|
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 |
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 |
1 | Sp1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 |
1 | Sp4 | 2 | 31.8 | 11.13 | 7.95 | 31.0 | 85.80 |
1 | Sp4 | 2 | 31.8 | 11.13 | 7.95 | 31.0 | 85.80 |
2 | Sp1 | 1 | 13.8 | 4.83 | 3.45 | 94.8 | 29.28 |
2 | Sp2 | 1 | 32.4 | 11.34 | 8.10 | 32.4 | 29.28 |
2 | Sp3 | 1 | 37.2 | 13.02 | 9.30 | 37.2 | 29.28 |
2 | Sp4 | 1 | 29.4 | 10.29 | 7.35 | 31.0 | 29.28 |
2 | Sp5 | 1 | 33.6 | 11.76 | 8.40 | 33.6 | 29.28 |
# Vsp_ha_n3: Total volume per plot (m^3.ha-1)
dataset_tidy <- dataset_tidy %>%
group_by(plot_sp, stratum_i) %>%
mutate(Vsp_ha_m3 = sum(volume) * 10)
dataset_tidy[1:10, ] %>%
select(
stratum_i,
plot_sp, vji_ha_m3,
chb_ha_tC, cex_ha_tC,
vJ_ha_m3, vI_ha_m3,
vI_ha_m3, Vsp_ha_m3
) %>%
kbl(caption = "Table with right-hand column showing total biomass per plot across species (m3.ha-1)", escape = T) %>%
kable_minimal(full_width = T)
stratum_i | plot_sp | vji_ha_m3 | chb_ha_tC | cex_ha_tC | vJ_ha_m3 | vI_ha_m3 | Vsp_ha_m3 |
---|---|---|---|---|---|---|---|
1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 | 121.8 |
1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 | 121.8 |
1 | 1 | 121.8 | 42.63 | 30.45 | 94.8 | 85.80 | 121.8 |
1 | 2 | 31.8 | 11.13 | 7.95 | 31.0 | 85.80 | 31.8 |
1 | 2 | 31.8 | 11.13 | 7.95 | 31.0 | 85.80 | 31.8 |
2 | 1 | 13.8 | 4.83 | 3.45 | 94.8 | 29.28 | 146.4 |
2 | 1 | 32.4 | 11.34 | 8.10 | 32.4 | 29.28 | 146.4 |
2 | 1 | 37.2 | 13.02 | 9.30 | 37.2 | 29.28 | 146.4 |
2 | 1 | 29.4 | 10.29 | 7.35 | 31.0 | 29.28 | 146.4 |
2 | 1 | 33.6 | 11.76 | 8.40 | 33.6 | 29.28 | 146.4 |
# Save output dataset
write.csv(dataset_tidy, "dataset_tidy.csv", row.names = FALSE)
Section unfinished!: For demonstration purposes only, an ordinary-least-squared (OLS) model was fitted with unscreened covariates from the above dataset and deployed to generate predictions of total standing timber over a site area of 100 hectares. From these predictions, derived residuals were acquired to showcase a number of diagnostic tools and accuracy metrics relevant to forest inventory analyses.
Candidate models were then trained, tuned, and analysed using the 10k-fold cross-validation technique (10-repeat). This method was used to present proposed estimates of model bias between training sample and test sample for potential use in actual biomass estimation. It was also used to explore levels of precision in terms of the variation of results (RMSE). Using DescTools package in R, Theil’s U estimate of error also examined level of unexplained variance in each model.
This stage was specifically focused around a diagnostic exercise, reporting on techniques used in cross validation, tuning functions, and predictor evaluation. For quicker review, sample results from the following code chunks were gathered and presented in the three tables directly below.
Variable | Mean | Std. Deviation | Std. Error | Skewness | Shapiro-Wilk |
stratum_i | 1.5 | 0.53 | 0.17 | 0.00 | *** |
species_j | 2.60 | 1.58 | 0.50 | 0.14 | * |
volume | 3.00 | 1.15 | 0.36 | 0.36 | |
vji_ha_m3 | 57.54 | 44.77 | 14.16 | 0.69 | *** |
Breusch Pagan | Bonferroni | stratum_VIF | species_VIF | volume_VIF | |
M1OLS | 1.078 | 1.212 | 1.078 | 1.212 | 1.131 |
Model | Theil U | MAEfull.model | RMSEfull.model | MAEcv.model | RMSEcv.model |
---|---|---|---|---|---|
SVMlinear | 0.083 | 9.324 | 11.575 | 53.476 | 65.683 |
SVMradial | |||||
RFlinear |
An additional set of packages used in this stage were loaded as listed
in the requirements
variable below.
requirements <- install.packages(
c(
"oslrr", "ggplot2", "autoplotly", "BiocManager", "ggbio",
"car", "psych", "useful", "caret",
"BIOMASS", "rvest", "lmfor", "DescTools",
"MLmetrics", "knitr", "phyloseq", "carData"
),
dependencies = TRUE
)
requirements
# str(dataset_tidy)
dataset_tidy$stratum_i <- as.integer(dataset_tidy$stratum_i)
dataset_tidy$species_j <- as.integer(dataset_tidy$species_j)
model_lm1 <- lm(vji_ha_m3 ~ volume + species_j + stratum_i, data = dataset_tidy)
olsrr::ols_regress(vji_ha_m3 ~ volume + species_j + stratum_i, data = dataset_tidy)
Model Summary
---------------------------------------------------------------
R 0.968 RMSE 13.706
R-Squared 0.938 Coef. Var 23.820
Adj. R-Squared 0.906 MSE 187.853
Pred R-Squared 0.848 MAE 9.281
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 16909.928 3 5636.643 30.006 5e-04
Residual 1127.116 6 187.853
Total 18037.044 9
--------------------------------------------------------------------
Parameter Estimates
-------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-------------------------------------------------------------------------------------------
(Intercept) 84.003 20.840 4.031 0.007 33.008 134.997
volume 21.765 4.239 0.557 5.135 0.002 11.393 32.138
species_j -8.343 3.189 -0.294 -2.616 0.040 -16.145 -0.540
stratum_i -46.712 8.999 -0.550 -5.191 0.002 -68.731 -24.692
-------------------------------------------------------------------------------------------
# psych::describe(dataset_tidy$vji_ha_m3)
# psych::describe(dataset_tidy$volume)
# psych::describe(dataset_tidy$species_j)
# psych::describe(dataset_tidy$stratum_i)
shapiro.test(dataset_tidy$stratum_i)
Shapiro-Wilk normality test
data: dataset_tidy$stratum_i
W = 0.65527, p-value = 0.000254
shapiro.test(dataset_tidy$species_j)
Shapiro-Wilk normality test
data: dataset_tidy$species_j
W = 0.83827, p-value = 0.04207
shapiro.test(dataset_tidy$volume)
Shapiro-Wilk normality test
data: dataset_tidy$volume
W = 0.9223, p-value = 0.3766
shapiro.test(dataset_tidy$vji_ha_m3)
Shapiro-Wilk normality test
data: dataset_tidy$vji_ha_m3
W = 0.69994, p-value = 0.0008795
Variable | Mean | SD | SE | Skewness | Shapiro-Wilk |
stratum_i | 1.5 | 0.53 | 0.17 | 0.00 | *** |
species_j | 2.60 | 1.58 | 0.50 | 0.14 | * |
volume | 3.00 | 1.15 | 0.36 | 0.36 | |
vji_ha_m3 | 57.54 | 44.77 | 14.16 | 0.69 | *** |
-----------------------------------------------
Test Statistic pvalue
-----------------------------------------------
Shapiro-Wilk 0.918 0.3407
Kolmogorov-Smirnov 0.1812 0.8419
Cramer-von Mises 0.8333 0.0048
Anderson-Darling 0.3383 0.4225
-----------------------------------------------
Breusch Pagan Test for Heteroskedasticity
-----------------------------------------
Ho: the variance is constant
Ha: the variance is not constant
Data
-------------------------------------
Response : vji_ha_m3
Variables: fitted values of vji_ha_m3
Test Summary
----------------------------
DF = 1
Chi2 = 0.7880034
Prob > Chi2 = 0.3747045
Tolerance and Variance Inflation Factor
---------------------------------------
Variables Tolerance VIF
1 volume 0.8842968 1.130842
2 species_j 0.8247492 1.212490
3 stratum_i 0.9278966 1.077706
Eigenvalue and Condition Index
------------------------------
Eigenvalue Condition Index intercept volume species_j stratum_i
1 3.63391993 1.000000 3.184201e-03 0.00687952 0.01296957 6.623171e-03
2 0.25093610 3.805451 4.067109e-03 0.14908146 0.50064841 8.506720e-06
3 0.08426133 6.567098 2.275283e-05 0.26599152 0.32764056 7.126742e-01
4 0.03088264 10.847521 9.927259e-01 0.57804751 0.15874147 2.806942e-01
# A tibble: 1 × 3
studentized_residual unadjusted_p_val bonferroni_p_val
<dbl> <dbl> <dbl>
1 2.41 0.0609 0.609
# A tibble: 3 × 3
Variables Tolerance VIF
<chr> <dbl> <dbl>
1 volume 0.884 1.13
2 species_j 0.825 1.21
3 stratum_i 0.928 1.08
Breusch Pagan | Bonferroni | stratum_VIF | species_VIF | volume_VIF | |
M1OLS | 1.078 | 1.212 | 1.078 | 1.212 | 1.131 |
Comparing model performances using cross validation.
# Split training and test data
dataset_tidy.samples <- createDataPartition(dataset_tidy$vji_ha_m3, p=0.70, list = FALSE)
dataset_tidy.train <- dataset_tidy[dataset_tidy.samples, ]
dataset_tidy.test <- dataset_tidy[-dataset_tidy.samples, ]
# Set training regime
model_training_10kfold <- trainControl(method = "repeatedcv",
number = 10, repeats = 10)
# animation of 10-kfold method:
knitr::include_graphics(path = "animation.gif")
# Train full and test models: support vector machine*linear
svm_linear_train <- train(vji_ha_m3 ~ volume + species_j + stratum_i,
data = dataset_tidy.train,
method = "svmLinear",
trControl = model_training_10kfold,
preProcess = c("center", "scale"),
tuneLength = 10
)
svm_linear_full <- train(vji_ha_m3 ~ volume + species_j + stratum_i,
data = dataset_tidy,
method = "svmLinear",
trControl = model_training_10kfold,
preProcess = c("center", "scale"),
tuneLength = 10
)
# Train full and test models: randomForest regression tree
rf_train <- train(vji_ha_m3 ~ volume + species_j + stratum_i,
data = dataset_tidy.train,
method = "rf", ntree = 1000,
metric = "RMSE",
trControl = model_training_10kfold,
importance = TRUE
)
note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
svm_linear_full_pred <- predict(svm_linear_full, data = dataset_tidy)
svm_linear_full_pred_mae <- mae(svm_linear_full_pred, dataset_tidy$vji_ha_m3)
svm_linear_full_pred_mae
[1] 9.323846
svm_linear_full_pred_rmse <- rmse(svm_linear_full_pred, dataset_tidy$vji_ha_m3)
svm_linear_full_pred_rmse
[1] 11.5753
#TheilU(dataset_tidy$vji_ha_m3, svm_linear_full_pred, type = 2)
TheilU(dataset_tidy$vji_ha_m3, svm_linear_full_pred, type = 1)
[1] 0.08325566
svm_linear_test_pred <- predict(svm_linear_train, data = dataset_tidy.test)
svm_linear_test_pred_mae <- mae(svm_linear_test_pred, dataset_tidy.test$vji_ha_m3)
svm_linear_test_pred_mae
[1] 56.6754
svm_linear_test_pred_rmse <- rmse(svm_linear_test_pred, dataset_tidy.test$vji_ha_m3)
svm_linear_test_pred_rmse
[1] 66.84399
svm_linear_test_pred_rmse / svm_linear_full_pred_rmse
[1] 5.774708
Model | Theil U | MAEfull.model | RMSEfull.model | MAEcv.model | RMSEcv.model |
---|---|---|---|---|---|
SVMlinear | 0.083 | 9.324 | 11.575 | 53.476 | 65.683 |
SVMradial | |||||
RFlinear |
Output of auto-generated audit also stored in working directory as
dataMaid_dataset_tidy.pdf
knitr::include_graphics("dataMaid_dataset_tidy.png")
Prototype Tool for Mapping Wildfires Using the NASA-FIRMS and Earth-Engine API’s
The digital tool for Task 2 was derived on the Google Colab platform using python programming, a jupyter notebook and the Google Earth Engine library. Using the following link, the tool can be viewed there in its hosted environment with its interactive maps pre-loaded.
The Colab platform should also ensure simpler installation when sharing externally, downloading locally, or running copies in different a Colab project. Note that full editor settings were enabled with this link allowing for code edits to be made by user.
Task 2 Output File Name & Webpage: verra_task2.ipynb
In addition, copies of the Colab notebook were committed to the project’s github repository for easy cloning and revisions: