R package with functions that support the Carob project for standardizing agricultural research data.
You can install the package with:
remotes::install_github("reagro/carobiner")
R package to support the Carob project
License: GNU General Public License v3.0
I got an INVALID YIELD error while processing a dataset containing cassava. All cassava yields in the dataset were outside the range defined for crop yield. Since yield varies very widely between crops, we propose performing the range check by crop for the crop-dependent variables, such as yield, residue yield, grain weight, biomass, and plant density. What do you think about that?
I also sent you an email with the proposed R code for this. Please have a look and share your comments, if any. Thank you.
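A per-crop range check along the lines proposed above could be sketched as follows. This is only an illustration: the function name `check_range_by_crop`, the `ranges` table, and the bounds in it are hypothetical placeholders, not functions or reference values from carobiner.

```r
# Hypothetical per-crop bounds (placeholders, not agronomic reference values).
ranges <- data.frame(
	crop = c("maize", "cassava"),
	variable = c("yield", "yield"),
	min = c(0, 0),
	max = c(20000, 100000),
	stringsAsFactors = FALSE
)

# Return the rows of `d` where a listed variable falls outside the bounds
# defined for that row's crop. Crops or variables without bounds are skipped,
# and NA values are never flagged.
check_range_by_crop <- function(d, ranges) {
	bad <- rep(FALSE, nrow(d))
	for (k in seq_len(nrow(ranges))) {
		v <- ranges$variable[k]
		if (!(v %in% names(d))) next
		x <- d[[v]]
		# which() drops NA comparisons, so missing crops or values are ignored
		out <- which(d$crop == ranges$crop[k] &
			(x < ranges$min[k] | x > ranges$max[k]))
		bad[out] <- TRUE
	}
	d[bad, , drop = FALSE]
}

d <- data.frame(
	crop = c("maize", "cassava", "cassava"),
	yield = c(5000, 45000, 250000),
	stringsAsFactors = FALSE
)
check_range_by_crop(d, ranges)
```

Keeping the bounds in a table rather than in the function body means a new crop or variable can be covered by adding a row.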
Hello Carobiners,
We have a function in carobiner that updates the 'to do' list, to give an idea of which data sets have not yet been processed in Carob. However, the function considers a data set 'processed' only if it is in the compiled list, not if it is in _pending or _removed. Is this the correct way? Or should data sets in _pending or _removed also be regarded as processed?
quantitative_check is a function that checks the quantitative variables biomass_total and yield: that they are numeric, that yield is above 0, and that records with NA yield are removed (on the assumption that fewer than 10% of values are NA). It then shows the distribution of these variables through summary statistics (mean, standard deviation, min, max). The aim is to detect outliers and to prompt the question of where they come from: poor data entry, or the wrong units?
quantitative_check <- function(d) {
	d <- as.data.frame(d)[, c("biomass_total", "yield")]
	# percentage of NAs per variable, computed before any rows are dropped
	perc_NA <- 100 * colMeans(is.na(d))
	# keep records with a positive yield; this also drops records with NA yield
	d <- d[which(d$yield > 0), ]
	for (i in names(d)) {
		if (!is.numeric(d[, i])) {
			print(paste(i, "is not numeric"))
			next
		}
		# summary statistics: min, quartiles, mean, max, number of NAs
		print(summary(d[, i]))
		mn <- mean(d[, i], na.rm = TRUE)
		std_dv <- sd(d[, i], na.rm = TRUE)
		# stats <- boxplot(d,main = "Data Distribution",xlab = "Variables of Interest",
		# ylab = "Values",col = "pink",border = "black")
		# threshold values for outliers: mean +/- 3 standard deviations
		Tmin <- mn - 3 * std_dv
		Tmax <- mn + 3 * std_dv
		# find outliers
		outlier <- d[, i][which(d[, i] < Tmin | d[, i] > Tmax)]
		if (length(outlier) > 0) {
			print(paste("Total outliers in", i, ":", length(outlier)))
		}
		if (perc_NA[i] > 10) {
			print(paste("NAs above 10% threshold in", i))
		}
	}
}
Kindly assist in correcting and refining.
QC_quantitative.zip
Empty cells are read as whitespace. Can they be replaced with NA by default?
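Until there is a default, one way to do this after reading the data is sketched below in base R. The helper name `blank_to_na` is hypothetical and not part of carobiner.

```r
# Replace empty or whitespace-only strings with NA in all character columns.
blank_to_na <- function(d) {
	for (i in names(d)) {
		if (is.character(d[[i]])) {
			# which() ignores values that are already NA
			d[[i]][which(trimws(d[[i]]) == "")] <- NA
		}
	}
	d
}

d <- data.frame(
	crop = c("maize", " ", ""),
	yield = c(1200, 950, NA),
	stringsAsFactors = FALSE
)
d <- blank_to_na(d)
```

When the data come from a CSV file, passing `na.strings = c("", "NA")` together with `strip.white = TRUE` to `read.csv` may achieve the same result at import time.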