
nipnTK's Introduction

nipnTK: National Information Platforms for Nutrition (NiPN) Data Quality Toolkit

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

National Information Platforms for Nutrition (NiPN) is an initiative of the European Commission to support countries in strengthening their information systems for nutrition and improving the analysis of data, so as to better inform the strategic decisions they face in preventing malnutrition and its consequences.

As part of this mandate, NiPN has commissioned work on the development of a toolkit to assess the quality of various nutrition-specific and nutrition-related data. nipnTK is the companion R package to that toolkit: a collection of practical analytical methods that can be applied to variables in datasets to assess their quality.

The focus of the toolkit is on the data required to assess anthropometric status, such as measurements of weight, height or length, MUAC, sex and age. Although the focus is on anthropometric status, many of the presented methods can be applied to other types of data. NiPN may commission additional toolkits to examine other variables or other types of variables.

Installation

You can install nipnTK from CRAN:

install.packages("nipnTK")

You can install the development version of nipnTK from GitHub with:

if(!require(remotes)) install.packages("remotes")
remotes::install_github("nutriverse/nipnTK")

Usage

Data quality is assessed by:

  1. Range checks and value checks to identify univariate outliers - guide (see the sketch below)

  2. Scatterplots and statistical methods to identify bivariate outliers - guide

  3. Use of flags to identify outliers in anthropometric indices - guide

  4. Examining the distribution and the statistics of the distribution of measurements and anthropometric indices - guide

  5. Assessing the extent of digit preference in recorded measurements - guide

  6. Assessing the extent of age heaping in recorded ages - guide

  7. Examining the sex ratio - guide

  8. Examining age distributions and age by sex distributions - guide

These activities, and a proposed order in which they should be performed, are illustrated in the workflow diagram on the package website.
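As a quick illustration of the first of these checks, the sketch below uses the rl.ex01 example dataset and the outliersUV() function that ship with the package; the exact records flagged depend on the data and the installed version:

library(nipnTK)

## load the built-in example dataset used in the range and value checks guide
svy <- rl.ex01
head(svy)

## list the records whose MUAC values fall outside the lower and upper fences
svy[outliersUV(svy$muac), ]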

Citation

If you find the nipnTK package useful, please cite it using the suggested citation provided by a call to the citation() function, as follows:

citation("nipnTK")
#> To cite nipnTK in publications use:
#> 
#>   Mark Myatt, Ernest Guevarra (2024). _nipnTK: National Information
#>   Platforms for Nutrition (NiPN) Data Quality Toolkit_.
#>   doi:10.5281/zenodo.4297897 <https://doi.org/10.5281/zenodo.4297897>,
#>   R package version 0.2.0, <https://nutriverse.io/nipnTK/>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {nipnTK: National Information Platforms for Nutrition (NiPN) Data Quality Toolkit},
#>     author = {{Mark Myatt} and {Ernest Guevarra}},
#>     year = {2024},
#>     note = {R package version 0.2.0},
#>     url = {https://nutriverse.io/nipnTK/},
#>     doi = {10.5281/zenodo.4297897},
#>   }

Community guidelines

Feedback, bug reports and feature requests are welcome; file issues or seek support at the package's GitHub repository. If you would like to contribute to the package, please see our contributing guidelines.

This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


nipnTK's Issues

table value typo

There is a typo in the SMART criteria cut-off point for skewness/kurtosis in the article called "Distributions of variables and indices".
The cut-off point for "acceptable" is given as "≥ 0.6 and < 0.6"; it should be "≥ 0.4 and < 0.6". You can see this in the table named "The range of absolute values of skewness and kurtosis statistics that are applied by SMART (2015)" under the sub-topic "Skew and kurtosis".

Error in NiPN data quality toolkit

I found a bug in the outliersUV() function in the NiPN data quality toolkit that can prevent calls such as:

svy[outliersUV(svy$muac), ]

from returning the correct set of records when there are NA values in the variable being tested.

I found this when using the toolkit with a dataset from MSF that had NA for all anthropometry when oedema was present. This may be a common case.

Here is an example of the problem using the example dataset in the NiPN toolkit:

svy <- read.table("rl.ex01.csv", header = TRUE, sep = ",")
head(svy)
## Test function
svy[outliersUV(svy$muac), ]

This gives:

Univariate outliers : Lower fence = 98, Upper fence = 178

  age sex weight height muac oedema
33 24 1 9.8 74.5 180.0 2
93 12 2 6.7 67.0 96.0 1
126 16 2 9.0 74.6 999.0 2
135 18 2 8.5 74.5 999.0 2
194 24 M 7.0 75.0 95.0 2
227 8 M 6.2 66.0 11.1 2
253 35 2 7.6 75.6 97.0 2
381 24 1 10.8 82.8 12.4 2
501 36 2 15.5 93.4 185.0 2
594 21 2 9.8 76.5 13.2 2
714 59 2 18.9 98.5 180.0 2
752 48 2 15.6 102.2 999.0 2
756 59 1 19.4 101.1 180.0 2
873 59 1 20.6 109.4 179.0 2

Note that the values in the muac column are all outside of the fences. This is correct.

Adding a missing value:

## Add a missing value
svy$muac[1] <- NA
head(svy)
## Test function
svy[outliersUV(svy$muac), ]

gives us:

Univariate outliers : Lower fence = 98, Upper fence = 178

  age sex weight height muac oedema
32 24 2 8.2 68.7 139 2
92 14 F 7.2 70.6 109 2
125 12 1 8.8 70.6 124 2
134 18 M 7.9 73.1 114 2
193 24 1 8.3 77.1 116 2
226 12 2 9.6 79.2 133 2
252 15 1 11.5 79.2 154 2
380 32 1 11.8 82.6 142 2
500 36 2 11.8 90.0 136 2
593 59 2 13.3 97.5 146 2
713 59 2 13.1 99.3 126 2
751 59 1 16.5 101.1 154 2
755 59 2 14.9 104.4 143 2
872 59 1 15.9 109.5 139 2

The listed records do not contain outliers. This is wrong.
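A possible workaround, sketched here on the assumption that the misalignment comes from how NA values are handled inside outliersUV() relative to the data frame being subset, is to drop the incomplete records first so that the tested vector and the data frame stay aligned. Note that this silently discards the NA rows, which may themselves be of interest when oedema is present.

## workaround sketch: keep only records with a non-missing MUAC,
## then apply the outlier test to the already-aligned data frame
svyComplete <- svy[!is.na(svy$muac), ]
svyComplete[outliersUV(svyComplete$muac), ]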

Built-in datasets cannot be loaded

I installed the package from CRAN and followed the sample code provided in the articles section. But the built-in datasets (for example, rl.ex01) could not be loaded. Below is the error message I got after running the command svy <- rl.ex01:
Error: object 'rl.ex01' not found
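This error usually means that the package has not been attached in the current session; the built-in datasets only become visible once the package is loaded or when they are accessed with the :: operator. A minimal sketch, assuming the datasets are lazy-loaded as is usual for this package:

library(nipnTK)   ## attach the package so its example datasets are visible
svy <- rl.ex01
head(svy)

## alternatively, without attaching the package:
svy <- nipnTK::rl.ex01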

`nipnTK::ageRatioTest()` returns wrong p-values

I am writing to report an issue I detected with ageRatioTest() in the nipnTK R package. The issue is that the function returns wrong p-values. I tested this with different datasets and compared the results with those returned by the ENA for SMART software. Let me show the issue here and how I tried to identify the source of the error in your code. Let's say I take a sample dataset and do:

> ageRatioTest(as.numeric(!is.na(input_data$age)), ratio = 0.85)$p

I get this (after transforming scientific notation into conventional):

[1] 0

When I aggregate at the district level, for instance, I get the following: as you can see, exactly the same result is returned in all locations.

## Check Age ratio ----

data |> select(district, `Age ratio (p)`) |> head()

# A tibble: 6 × 2
  district            `Age ratio (p)`
  <chr>               <chr>         
1 Addun Pastoral LZ   <0.001        
2 Bay Agropastoral LZ <0.001        
3 Beletweyne district <0.001        
4 Coastal Deah        <0.001        
5 East Golis          <0.001        
6 Elbarde             <0.001 

When I return the observed ratio, I get this:

> ageRatioTest(as.numeric(!is.na(input_data$age)), ratio = 0.85)$observedR
[1] 12.8738
and when I return the observed proportion, I get this:
> ageRatioTest(as.numeric(!is.na(input_data$age)), ratio = 0.85)$observedP
[1] 0.9279217

I inspected the code by printing ageRatioTest in the console:

function (x, ratio = 0.85) 
{
    g <- recode(x, "6:29=1; 30:59=2")
    expectedP <- ratio/(ratio + 1)
    observedP <- sum(g == 1)/sum(table(g))
    observedR <- observedP/(1 - observedP)
    X2 <- prop.test(sum(g == 1), sum(table(g)), p = expectedP)
    result <- list(expectedR = ratio, expectedP = expectedP, 
        observedR = observedR, observedP = observedP, X2 = X2$statistic, 
        df = X2$parameter, p = X2$p.value)
    class(result) <- "ageRatioTest"
    return(result)
}

I tried to address this by writing a different function using the guidance on the nipnTK website:

# Function to run age ratio test -------------------------------------------------------------------
 
age_ratio_test <- function(age, .expectedR = 0.85) {
  
  x <- ifelse(age < 30, 1, 2)
  eprop <- .expectedR / (.expectedR + 1)
  ratio <- sum(x[x == 1], na.rm = TRUE) / sum(x[x == 2], na.rm = TRUE)
  prop <- sum(x[x == 1], na.rm = TRUE) / sum(table(x))
  test <- prop.test(sum(x[x == 1], na.rm = TRUE), sum(table(x)), p = eprop)
  
  return(
    list(
      p = test$p.value,
      observedR = ratio,
      observedP = prop
    )
  )
}

When applying this function to the same data, it returns different statistics from those of nipnTK::ageRatioTest(), and the statistics are nearly equal to those from the ENA plausibility check (differing only in later decimal places). If I run the same test as above I get this:

> age_ratio_test(input_data$age, .expectedR = 0.85)$p
[1] 0.06905227
 
## Check Age ratio ----
 
# A tibble: 6 × 2
  district            `Age ratio (p)`
  <chr>               <chr>          
1 Addun Pastoral LZ   0.011          
2 Bay Agropastoral LZ 0.035          
3 Beletweyne district 0.369          
4 Coastal Deah        0.265          
5 East Golis          0.713          
6 Elbarde             0.300 

When I looked into the differences I spotted two things:

1. No use of na.rm = TRUE

In my function (age_ratio_test) I use na.rm = TRUE when computing the objects "ratio", "prop" and "test", while your function does not remove NAs. Because of this, even when trying to control for NAs in the script with an approach such as ageRatioTest(as.numeric(!is.na(input_data$age)), ratio = 0.85)$p, it still doesn't work. I tested my function with that same approach and it returned exactly what your function returns, i.e. incorrect numbers:

> age_ratio_test(as.numeric(!is.na(input_data$age)), .expectedR = 0.85)$p
[1] 0

2. Use of recode(): the package function uses recode(). With that, even after adding na.rm = TRUE to the sum() calls, it does not work; it throws this error:

## Function ----
ART <- function (x, ratio = 0.85) 
{
  g <- recode(x, "6:29=1; 30:59=2")
  expectedP <- ratio/(ratio + 1)
  observedP <- sum(g == 1, na.rm = TRUE)/sum(table(g))
  observedR <- observedP/(1 - observedP)
  X2 <- prop.test(sum(g == 1, na.rm = TRUE), sum(table(g)), p = expectedP)
  result <- list(expectedR = ratio, expectedP = expectedP, 
                 observedR = observedR, observedP = observedP, X2 = X2$statistic, 
                 df = X2$parameter, p = X2$p.value)
  class(result) <- "ageRatioTest"
  return(result)
}
 
## Implementation ----
> ART(input_data$age, ratio = 0.85)$p

Error in prop.test(sum(g == 1, na.rm = TRUE), sum(table(g)), p = expectedP) : 
  elements of 'n' must be positive
In addition: Warning message:
Unreplaced values treated as NA as `.x` is not compatible.
Please specify replacements exhaustively or supply `.default`. 

But changing that to ifelse() makes it work:

## Function ----
ART <- function (x, ratio = 0.85) 
{
  g <- ifelse(x < 30, 1, 2)
  expectedP <- ratio/(ratio + 1)
  observedP <- sum(g == 1, na.rm = TRUE)/sum(table(g))
  observedR <- observedP/(1 - observedP)
  X2 <- prop.test(sum(g == 1, na.rm = TRUE), sum(table(g)), p = expectedP)
  result <- list(expectedR = ratio, expectedP = expectedP, 
                 observedR = observedR, observedP = observedP, X2 = X2$statistic, 
                 df = X2$parameter, p = X2$p.value)
  class(result) <- "ageRatioTest"
  return(result)
}
 
## Implementation ----
> ART(input_data$age, ratio = 0.85)$p
[1] 0.06905227

I hope you will find this useful.

CRAN feedback

Thanks,

\dontrun{} should only be used if the example really cannot be executed
(e.g. because of missing additional software, missing API keys, ...) by
the user. That's why wrapping examples in \dontrun{} adds the comment
("# Not run:") as a warning for the user.
Does not seem necessary.
Please unwrap the example if it is executable in < 5 sec, or replace
\dontrun{} with \donttest{}.

Please make sure that you do not change the user's options, par or
working directory. If you really have to do so within functions, please
ensure with an immediate call of on.exit() that the settings are reset
when the function is exited. e.g.:
...
oldpar <- par(no.readonly = TRUE) # code line i
on.exit(par(oldpar)) # code line i + 1
...
par(mfrow=c(2,2)) # somewhere after
...
e.g.: ageChildren.R
If you're not familiar with the function, please check ?on.exit. This
function makes it possible to restore options before exiting a function
even if the function breaks. Therefore it needs to be called immediately
after the option change within a function.

Please do not set a seed to a specific number within a function. e.g.:
greensIndex.R
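A minimal sketch of the patterns the reviewer asks for, using a hypothetical plotting helper (the function name and arguments are illustrative, not taken from the package): graphical parameters are saved and restored with an immediate call to on.exit(), and no seed is hard-coded; the caller supplies one only when reproducibility is required.

## hypothetical helper illustrating both recommendations
plotPanels <- function(x, seed = NULL) {
  ## save the user's graphical parameters and restore them on exit,
  ## even if the function fails part-way through
  oldpar <- par(no.readonly = TRUE)
  on.exit(par(oldpar))
  par(mfrow = c(2, 2))

  ## no set.seed(12345) hard-coded here; only set a seed when the
  ## caller explicitly asks for one
  if (!is.null(seed)) set.seed(seed)

  for (i in 1:4) {
    plot(x + rnorm(length(x), sd = 0.01), main = paste("Panel", i))
  }

  invisible(NULL)
}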
