reagro / carobiner


R package to support the Carob project

License: GNU General Public License v3.0

R 100.00%

carobiner's Introduction

carobiner

R package with functions that support the Carob project for standardizing agricultural research data.

You can install the package with:

remotes::install_github("reagro/carobiner")

carobiner's People

Contributors


egbendito cedricngakou efyrouwa njogum ivanaalexml

carobiner's Issues

Range check for crop-dependent variables

I got an INVALID YIELD error while processing a dataset containing cassava. I noticed that the cassava yields in the whole dataset were outside the range defined for crop yield. Since yield can vary widely between crops, we propose performing the range check per crop for crop-dependent variables such as yield, residue yield, grain weight, biomass, and plant density. What do you think about that?

I also sent you an email with the proposed R code for this. Please have a look and comment if applicable. Thank you.
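A minimal sketch of what a per-crop range check could look like, assuming the bounds are kept in a lookup table with one row per crop and variable. The function name `check_range_by_crop` and the columns `crop`, `variable`, `min` and `max` are hypothetical, and this is not the code sent by email; any bounds you pass in would have to come from agreed Carob ranges.

```r
# Hypothetical helper: flag values outside crop-specific bounds.
# d:      data.frame with a 'crop' column and numeric variables (e.g. 'yield')
# ranges: data.frame with columns crop, variable, min, max (assumed layout)
# Returns one row per out-of-range value (row index, crop, variable, value).
check_range_by_crop <- function(d, ranges) {
  bad <- list()
  for (i in seq_len(nrow(ranges))) {
    cr <- as.character(ranges$crop[i])
    v <- as.character(ranges$variable[i])
    if (!v %in% names(d)) next            # variable not in this dataset
    rows <- which(d$crop == cr)           # which() drops NA crops
    x <- d[[v]][rows]
    out <- !is.na(x) & (x < ranges$min[i] | x > ranges$max[i])
    if (any(out)) {
      bad[[length(bad) + 1]] <- data.frame(row = rows[out], crop = cr,
                                           variable = v, value = x[out],
                                           stringsAsFactors = FALSE)
    }
  }
  if (length(bad) == 0) {
    return(data.frame(row = integer(), crop = character(),
                      variable = character(), value = numeric(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, bad)
}
```

For example, with a toy dataset `d <- data.frame(crop = c("maize", "cassava", "cassava"), yield = c(5000, 30000, -1))` and illustrative (not agreed) bounds `ranges <- data.frame(crop = c("maize", "cassava"), variable = "yield", min = 0, max = c(15000, 60000))`, `check_range_by_crop(d, ranges)` flags only row 3. Keeping the bounds in a table means new crops or variables can be added without changing the function.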

Updating the 'to do' list

Hello Carobiners,

We have a function in carobiner for updating the 'to do' list, which shows which datasets have not yet been processed in Carob. However, the function only considers a dataset 'processed' if it is in the compiled list, not if it is in _pending or _removed. Is this correct, or should datasets in _pending or _removed also be regarded as processed?
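The two options in the question differ only in which sets are subtracted from the full list of datasets. A minimal sketch, assuming the dataset IDs from the compiled list, _pending and _removed are already available as character vectors; the function `todo_ids` and its arguments are hypothetical and not the existing carobiner function.

```r
# Hypothetical helper: dataset IDs still "to do".
# all_ids:  every dataset ID known to the project
# compiled, pending, removed: IDs found in the compiled list, _pending and _removed
# count_pending / count_removed: treat those sets as processed as well
todo_ids <- function(all_ids, compiled, pending = character(), removed = character(),
                     count_pending = FALSE, count_removed = FALSE) {
  done <- compiled
  if (count_pending) done <- c(done, pending)
  if (count_removed) done <- c(done, removed)
  setdiff(all_ids, done)
}
```

With the defaults this mirrors the current behaviour described above (only the compiled list counts); setting both flags to `TRUE` would also hide datasets that are pending or removed.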

Quantitative Variable quality check

quantitative_check is a function that checks the quantitative variables total biomass (biomass_total) and yield. It verifies that both are numeric, keeps only records with a yield above 0, removes records with a missing (NA) yield (assuming the omitted NAs should be less than 10% of the records), and finally summarizes the distribution of both variables (mean, standard deviation, minimum and maximum). The aim is to detect outliers and then ask where they came from: poor data entry or wrong units?

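quantitative_check <- function(d){
  vars <- c("biomass_total", "yield")
  # stop early if a variable is absent or not numeric
  for(i in vars){
    if(!i %in% names(d)) stop("variable not found: ", i)
    if(!is.numeric(d[[i]])) stop("variable is not numeric: ", i)
  }
  # percentage of NAs per variable, computed before any rows are dropped
  perc_NA <- sapply(vars, function(i) mean(is.na(d[[i]])) * 100)
  # keep only records with a non-missing yield above 0
  d <- d[!is.na(d$yield) & d$yield > 0, vars, drop = FALSE]
  # boxplot(d, main = "Data Distribution", xlab = "Variables of Interest",
  #         ylab = "Values", col = "pink", border = "black")
  for(i in vars){
    x <- d[[i]]
    cat("Summary of", i, "\n")
    print(summary(x))
    mn <- mean(x, na.rm = TRUE)
    std_dv <- sd(x, na.rm = TRUE)
    # thresholds for outliers: mean +/- 3 standard deviations
    Tmin <- mn - 3 * std_dv
    Tmax <- mn + 3 * std_dv
    # find outliers (which() ignores NA comparisons)
    outlier <- x[which(x < Tmin | x > Tmax)]
    if(length(outlier) > 0){
      print(paste("Total outliers in", i, ":", length(outlier)))
    }
    if(perc_NA[[i]] > 10){
      print(paste("NAs above 10% threshold in", i))
    }
  }
  # return the filtered data invisibly so it can be reused
  invisible(d)
}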

Quantitative Variable quality check

Kindly assist in correcting and refining.
QC_quantitative.zip

Empty cells read as whitespace

Empty cells are read as whitespace. Could they be converted to NA by default?
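One way to do this after reading the data is to turn every empty or whitespace-only string into NA. A minimal sketch using base R only; the name `blank_to_na` is hypothetical and not an existing carobiner function.

```r
# Hypothetical helper: replace empty or whitespace-only strings with NA
# in every character column of a data.frame; other columns are left as-is.
blank_to_na <- function(d) {
  for (v in names(d)) {
    if (is.character(d[[v]])) {
      x <- d[[v]]
      x[!is.na(x) & trimws(x) == ""] <- NA
      d[[v]] <- x
    }
  }
  d
}
```

Factor columns are not touched by this sketch; if data are read with `read.csv`, its `na.strings` argument is another place where empty strings can be mapped to NA at read time.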
