
carobiner's Introduction

carobiner

R package with functions that support the Carob project for standardizing agricultural research data.

You can install the package with:

install.packages("remotes")  # only needed if remotes is not installed yet
remotes::install_github("reagro/carobiner")

carobiner's People

Contributors

rhijmans, egbendito, efyrouwa, cedricngakou

carobiner's Issues

Check range by crop for crop-dependent variables

I got an INVALID YIELD error when I was processing a dataset containing cassava. I noticed that the cassava yields in the whole dataset were outside the range defined for crop yield. Since yield can vary widely between crops, we propose performing the range check per crop for crop-dependent variables such as yield, residue yield, grain weight, biomass, and plant density. What do you think about that?

I also sent you an email with the proposed R code for this; please have a look and comment if applicable. Thank you.
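As a rough sketch of the proposal (not the code sent by email, which is not shown here), a per-crop range check could look up a minimum and maximum for each crop and variable and report the rows that fall outside. The function name check_ranges_by_crop, the column name crop, and the layout of the ranges table (columns crop, variable, min, max) are assumptions. The limits in the example are placeholders for illustration only, not agronomic reference values.

```r
# Hypothetical helper (not part of carobiner): flag values outside
# crop-specific ranges. Assumes d has a "crop" column and that `ranges`
# has columns crop, variable, min, max.
check_ranges_by_crop <- function(d, ranges) {
  out <- list()
  for (k in seq_len(nrow(ranges))) {
    r <- ranges[k, ]
    # skip variables that are not present in this dataset
    if (!(r$variable %in% names(d))) next
    rows <- which(d$crop == r$crop)
    x <- d[[r$variable]][rows]
    # NA values are not flagged here; they can be checked separately
    bad <- rows[!is.na(x) & (x < r$min | x > r$max)]
    if (length(bad) > 0) {
      out[[length(out) + 1]] <- data.frame(crop = r$crop, variable = r$variable,
                                           row = bad, value = d[[r$variable]][bad])
    }
  }
  if (length(out) == 0) {
    return(data.frame(crop = character(0), variable = character(0),
                      row = integer(0), value = numeric(0)))
  }
  do.call(rbind, out)
}

# Example with made-up data and placeholder limits
d <- data.frame(crop = c("cassava", "cassava", "maize"),
                yield = c(25000, 90000, 3000))
ranges <- data.frame(crop = c("cassava", "maize"), variable = "yield",
                     min = c(0, 0), max = c(60000, 20000))
res <- check_ranges_by_crop(d, ranges)
print(res)  # one flagged row: cassava yield in row 2 (90000)
```

Keeping the limits in a table rather than in code means a new crop or variable only needs a new row in ranges.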

Updating the 'to do' list

Hello Carobiners,

We have a function in carobiner for updating the 'to do' list, which shows which data sets have not yet been processed in Carob. However, the function considers a data set 'processed' only if it is in the compiled list, not if it is in _pending or _removed. Is this correct, or should data sets in _pending or _removed also be regarded as processed?
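The two options can be sketched as a set difference. The helper todo_datasets and its arguments are hypothetical (this is not carobiner's actual function): it takes character vectors of dataset IDs and, when count_pending_removed = TRUE, also treats IDs in _pending and _removed as processed.

```r
# Hypothetical sketch: which datasets are still 'to do'?
todo_datasets <- function(all_ids, compiled, pending = character(0),
                          removed = character(0), count_pending_removed = FALSE) {
  done <- compiled
  # optionally count _pending and _removed datasets as processed too
  if (count_pending_removed) done <- union(done, c(pending, removed))
  setdiff(all_ids, done)
}

all_ids <- c("ds1", "ds2", "ds3", "ds4")
# current behaviour: only the compiled list counts as processed
todo_a <- todo_datasets(all_ids, compiled = "ds1", pending = "ds2", removed = "ds3")
# alternative: _pending and _removed also count as processed
todo_b <- todo_datasets(all_ids, compiled = "ds1", pending = "ds2", removed = "ds3",
                        count_pending_removed = TRUE)
print(todo_a)  # "ds2" "ds3" "ds4"
print(todo_b)  # "ds4"
```

A flag like this would let the function serve both purposes: listing everything not yet compiled, or listing only datasets nobody has looked at.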

Quantitative Variable quality check

quantitative_check is a function intended to check the quantitative variables biomass_total and yield. It should ensure that both are numeric and that yield is above 0, drop records with missing (NA) yield (assuming that no more than 10% of the records are dropped this way), and summarize the distribution of each variable (mean, standard deviation, minimum, maximum). The aim is to detect outliers and then ask where they came from: poor data entry, or wrong units?

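quantitative_check <- function(d, vars = c("biomass_total", "yield"), max_na = 10){
  # only check the variables of interest that are present in d
  vars <- intersect(vars, names(d))
  # percentage of NAs per variable, computed before any records are dropped
  perc_NA <- sapply(vars, function(v) 100 * mean(is.na(d[[v]])))
  # drop records with missing or non-positive yield
  if ("yield" %in% vars) {
    d <- d[!is.na(d$yield) & d$yield > 0, , drop = FALSE]
  }
  for(i in vars){
    x <- d[[i]]  # [[ ]] returns a vector for both data.frames and tibbles
    if(!is.numeric(x)){
      print(paste(i, "is not numeric"))
      next
    }
    print(i)
    print(summary(x))  # summary() must be printed explicitly inside a loop
    mn <- mean(x, na.rm = TRUE)
    std_dv <- sd(x, na.rm = TRUE)
    # boxplot(d[vars], main = "Data Distribution", xlab = "Variables of Interest",
    #         ylab = "Values", col = "pink", border = "black")
    # threshold values for outliers: mean +/- 3 standard deviations
    Tmin <- mn - 3 * std_dv
    Tmax <- mn + 3 * std_dv
    # find outliers
    outlier <- x[which(x < Tmin | x > Tmax)]
    if(length(outlier) > 0){
      print(paste("Total outliers in", i, ":", length(outlier)))
    }
    if(perc_NA[[i]] > max_na){
      print(paste0("NAs above ", max_na, "% threshold in ", i))
    }
  }
  invisible(d)
}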

Kindly assist in correcting and refining the function.
QC_quantitative.zip
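One limitation of the mean ± 3 standard deviations rule used for outliers here: in small samples a single extreme value inflates the standard deviation enough to hide itself. With n values, no point can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean, so with 10 or fewer values the rule can never flag anything. The sketch below compares it with the boxplot rule (1.5 × IQR beyond the hinges) from base R's grDevices::boxplot.stats. The helper name flag_outliers is hypothetical and the yield values are made up.

```r
# Hypothetical helper: compare two outlier rules on one numeric vector
flag_outliers <- function(x, k = 3) {
  x <- x[!is.na(x)]
  mn <- mean(x)
  s <- sd(x)
  list(
    # values more than k standard deviations from the mean
    sd_rule = x[x < mn - k * s | x > mn + k * s],
    # values beyond 1.5 * IQR from the hinges (the boxplot whisker rule)
    iqr_rule = grDevices::boxplot.stats(x)$out
  )
}

# nine made-up yields with one obvious error (e.g. wrong units)
yield <- c(2100, 2500, 2300, 2800, 2600, 2400, 2200, 2700, 25000)
res <- flag_outliers(yield)
print(res$sd_rule)   # numeric(0): masked by the inflated standard deviation
print(res$iqr_rule)  # 25000
```

Because the hinges are robust to a single extreme value, the IQR rule is less prone to this masking effect.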