jmpwashdata

The goal of this package is to facilitate the use and analysis of data from the WHO/UNICEF Joint Monitoring Programme for Water and Sanitation (JMP). It provides a tidy snapshot of the JMP WASH household, WASH in schools and WASH in health care facilities data that are normally available as Excel sheets on https://washdata.org. The Excel sheet filenames and download dates are stored in the jmpwashdata::jmp_files data frame as a reference. The last download for jmpwashdata version 0.1.4 took place on 2023-01-15.

The goal is to keep the package up to date with changes on the JMP website and eventually to automate this process. If the data are out of date with respect to the main JMP website, please feel free to post an issue so we can rebuild the package: https://github.com/WASHNote/jmpwashdata/issues

Please support the development and maintenance of this package. The simplest way to do this is to provide us with attribution.

citation(package = "jmpwashdata")
#> 
#> To cite package 'jmpwashdata' in publications use:
#> 
#>   Nicolas Dickinson (2021). jmpwashdata: WHO/UNICEF Joint Monitoring
#>   Programme Water and Sanitation Data. R package version 0.1.4.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {jmpwashdata: WHO/UNICEF Joint Monitoring Programme Water and Sanitation Data},
#>     author = {Nicolas Dickinson},
#>     year = {2021},
#>     note = {R package version 0.1.4},
#>   }
#> 
#> ATTENTION: This citation information has been auto-generated from the
#> package DESCRIPTION file and may need manual editing, see
#> 'help("citation")'.

Installation

The easiest way to install this is by using devtools. You may install devtools as follows:

install.packages("devtools")

Install with devtools

Simply run the following code.

devtools::install_github("WASHNote/jmpwashdata")

You cannot yet install from CRAN; the package will be submitted as soon as the documentation has been completed. Until then, you must build it from source, and the easiest way to do this is with devtools.
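After installation, a quick way to confirm that the data are available is to inspect the jmp_files reference data frame mentioned in the introduction. The snippet below is a minimal sketch: it only assumes that jmpwashdata::jmp_files exists as described above, and it makes no assumption about the column names, which should be checked with str() before use.

```r
# Load the package only if it is already installed, so this snippet
# is safe to run on a machine without jmpwashdata.
has_pkg <- requireNamespace("jmpwashdata", quietly = TRUE)

if (has_pkg) {
  # jmp_files is the reference data frame named in the introduction.
  # Its columns are not documented here, so inspect them first.
  files <- jmpwashdata::jmp_files
  str(files)
  print(head(files))
} else {
  message("jmpwashdata is not installed; see the installation steps above.")
}
```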

Build and develop the package (advanced)

For those interested in contributing to the development of the package, you may also clone the repository and open it in RStudio.

Changes

  • v.0.1.4 February 2023 Update of data and cleanup of some of the extraction messages.
  • v.0.1.3 November 2021 Addition of extraction of regional and world school and healthcare facility datasets.
  • v.0.1.2 October 2021 Update of data files to include the new world and region files and changes in other files, plus additional error handling. It now includes the data summary sheets found in the inequality files, parsed into a cleaner long format.
  • v.0.1.1 July 2021 New published data files extracted with the 2019 and 2020 data sets from JMP Excel sheets.
  • v.0.1.0 June 2021 Extraction of 2017 JMP files.

Wish list / roadmap

  • Complete codebook of all jmp datasets and of the package
  • Complete labeling of all of the datasets
  • Complete how-to documentation and several case studies to demonstrate use
  • Add WASH in Schools and WASH in Health Care Facilities country files.
  • Add use cases on combining with other data sets (national monitoring data, country TrackFin studies, etc.)
  • Add tests for data extraction and validation to cross validate country files against world files and different sheets against one another (as an extraction test and internal validation of the data sets).
  • Add helper functions to transform world and regional data between the original wide format and a long format.
  • Standardize the (long) data format used by datasets in the package.
  • Automate rebuilds using file hashes and sampling and a periodic poll of the JMP website
  • Post article on “Enhancing the use and quality of official statistics using open source”
  • Python wrapper library for easy inclusion in Python projects
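The roadmap item on automating rebuilds with file hashes could be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function name changed_files and the manifest layout (columns file and md5) are hypothetical, and temporary text files stand in for the downloaded Excel sheets. Only base R and tools::md5sum are used.

```r
# Compare current MD5 hashes of downloaded files against a stored
# manifest and return the files whose content changed or is new.
# The manifest layout (columns `file`, `md5`) is an assumption.
changed_files <- function(paths, manifest) {
  current <- unname(tools::md5sum(paths))
  previous <- manifest$md5[match(basename(paths), manifest$file)]
  paths[is.na(previous) | current != previous]
}

# Demo with two temporary files standing in for downloaded Excel sheets.
demo_dir <- tempfile("jmp_demo_")
dir.create(demo_dir)
a <- file.path(demo_dir, "a.txt")
b <- file.path(demo_dir, "b.txt")
writeLines("first version", a)
writeLines("unchanged", b)

# Record the hashes at "download" time.
manifest <- data.frame(
  file = basename(c(a, b)),
  md5 = unname(tools::md5sum(c(a, b))),
  stringsAsFactors = FALSE
)

# Simulate an upstream update to one file, then detect it.
writeLines("second version", a)
to_rebuild <- changed_files(c(a, b), manifest)
print(basename(to_rebuild))
```

A periodic job could run such a check after each poll and trigger a rebuild only when the returned vector is non-empty.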

jmpwashdata's Issues

Add labels to columns and values

Add data labels to each dataset using a standardized codebook (#2).

Optional: add value labels (for discussion). We can add as a separate issue if needed.
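One possible shape for codebook-driven labeling is sketched below using only base R. The codebook columns (variable, label), the dataset, and the indicator name wat_bas are all hypothetical and chosen for illustration. Labels are stored in a "label" attribute, which is the same attribute that haven::labelled uses for variable labels, so the result should remain compatible if haven is adopted later.

```r
# Hypothetical codebook: one row per column, with a human-readable label.
codebook <- data.frame(
  variable = c("iso3", "year", "wat_bas"),
  label = c("ISO 3166-1 alpha-3 country code",
            "Reference year",
            "Population using at least basic drinking water (%)"),
  stringsAsFactors = FALSE
)

# Toy dataset; column names and values are invented for illustration.
dat <- data.frame(iso3 = "AAA", year = 2020L, wat_bas = 61.5,
                  stringsAsFactors = FALSE)

# Attach each label as a "label" attribute and warn about columns
# that the codebook does not cover.
apply_labels <- function(df, codebook) {
  missing <- setdiff(names(df), codebook$variable)
  if (length(missing) > 0) {
    warning("No label for: ", paste(missing, collapse = ", "))
  }
  for (v in intersect(names(df), codebook$variable)) {
    attr(df[[v]], "label") <- codebook$label[match(v, codebook$variable)]
  }
  df
}

labelled_dat <- apply_labels(dat, codebook)
print(attr(labelled_dat$wat_bas, "label"))
```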

Add standard tests for PRs and local testing

The main tests would:

  • Check that any API for making long / wide data frames still works
  • Check that the codebook generates

Additionally, if there are changes to the scraping files (note, this should NOT be automated to avoid unnecessarily scraping all the files and getting potentially blacklisted):

  • Check that the scraping functions still work (just a few files)
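A round-trip check is one way to test long / wide helpers without touching the scraping code. In the sketch below, to_long and to_wide are hypothetical stand-ins for whatever helpers the package ends up exposing, and the toy table is invented; the point is only the shape of the test, which converts wide to long and back and requires the original table to be recovered.

```r
# Hypothetical stand-ins for the package's wide/long helpers.
to_long <- function(wide, id_col) {
  value_cols <- setdiff(names(wide), id_col)
  data.frame(
    id = rep(wide[[id_col]], times = length(value_cols)),
    year = rep(as.integer(value_cols), each = nrow(wide)),
    value = unlist(wide[value_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

to_wide <- function(long, id_col) {
  ids <- unique(long$id)
  out <- data.frame(ids, stringsAsFactors = FALSE)
  names(out) <- id_col
  for (y in sort(unique(long$year))) {
    rows <- long[long$year == y, ]
    out[[as.character(y)]] <- rows$value[match(ids, rows$id)]
  }
  out
}

# Toy wide table: one row per country, one column per year.
wide <- data.frame(
  iso3 = c("AAA", "BBB"),
  `2019` = c(50, 70),
  `2020` = c(55, 72),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The test: long has one row per country-year, and the round trip
# reproduces every original column.
long <- to_long(wide, "iso3")
back <- to_wide(long, "iso3")
stopifnot(nrow(long) == nrow(wide) * 2)
for (col in names(wide)) {
  stopifnot(identical(back[[col]], wide[[col]]))
}
```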

Wish list / roadmap

This wish list was copied from the README and added here as a to-do list:

  • Complete codebook of all jmp datasets and of the package (#2)
  • Add label attributes and possibly value attributes to all datasets using haven::labelled (https://haven.tidyverse.org/reference/labelled.html) (#8)
  • Write how-to documentation and several case studies as article vignettes in the pkgdown website (#4) to demonstrate use
  • Add WASH in Schools and WASH in Health Care Facilities country files.
  • Add use cases on combining with other data sets (national monitoring data, country TrackFin studies, etc.) in the pkgdown website (#4)
  • Add tests for data extraction and validation to cross validate country files against world files and different sheets against one another (as an extraction test and internal validation of the data sets). (#1 and #9)
  • Add helper functions to transform world and regional data between the original wide format and a long format.
  • Standardize the (long) data format used by datasets in the package.
  • Automate rebuilds using file hashes and sampling and a periodic poll of the JMP website
  • Prepare package for CRAN submission (https://r-pkgs.org/release.html)
  • Prepare an article on “Enhancing the use and quality of official statistics using open source” for JOSS once CRAN publication is finalized
  • Python wrapper library for easy inclusion in Python projects
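The cross-validation item in the list above could look roughly like the following sketch. It assumes that the country files and the world files expose the same indicator for the same country and year; the table layout, the indicator name wat_bas, the function name cross_validate, and the tolerance are all hypothetical, and the values are invented.

```r
# Toy country-level and world-level tables; all values are invented.
country <- data.frame(
  iso3 = c("AAA", "BBB", "AAA", "BBB"),
  year = c(2019L, 2019L, 2020L, 2020L),
  wat_bas = c(50, 70, 55, 72),
  stringsAsFactors = FALSE
)
world <- data.frame(
  iso3 = c("AAA", "BBB", "AAA", "BBB"),
  year = c(2019L, 2019L, 2020L, 2020L),
  wat_bas = c(50, 70, 55, 75),  # last value deliberately disagrees
  stringsAsFactors = FALSE
)

# Join on the key columns and return rows where the two sources
# differ by more than `tol` (an assumed tolerance for rounding).
cross_validate <- function(x, y, keys, value, tol = 0.5) {
  both <- merge(x, y, by = keys, suffixes = c("_x", "_y"))
  diff <- abs(both[[paste0(value, "_x")]] - both[[paste0(value, "_y")]])
  both[!is.na(diff) & diff > tol, ]
}

mismatches <- cross_validate(country, world, c("iso3", "year"), "wat_bas")
print(mismatches)
```

An empty result would mean the two extractions agree within the tolerance; any returned rows point at either an extraction error or a genuine inconsistency in the source files.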

Language selection in country files

@donmezayca
There was an issue introduced with scraping data from country files because of the way that the language selection is implemented. At the moment the column names are translated automatically when a new language is selected. It is difficult in the R script to switch the language. Making it easier to scrape these files would be good for a number of reasons:

  • ETL scripts used by jmpwashdata, the JMP team, and others should have predictable column names/codes
  • There shouldn't be a dependency on running a heavy piece of software like Excel

At the moment, solutions are:

  • hardcoding column names and ranges in the R script, but with the disadvantage that we cannot check when there are mistakes (because a column was shifted for example). This requires coordination with JMP to make sure the assumed fixed columns are indeed correct.
  • adding a raw data sheet that only has the data + column code names that is used as the data source both in the Excel sheet and by ETL scripts (this requires an action by JMP).
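If a raw data sheet with stable column codes were available (the second option), an ETL script could verify the layout before extracting anything, which addresses the main weakness of blind hardcoding. The sketch below is illustrative only: the column codes, the function name extract_fixed, and the toy data frame standing in for a sheet read as plain text are all assumptions.

```r
# Hypothetical layout: expected column codes at fixed positions.
expected <- c("iso3", "year", "wat_bas")

# Toy stand-in for a raw sheet read as text, header in the first row.
raw <- data.frame(
  X1 = c("iso3", "AAA", "AAA"),
  X2 = c("year", "2019", "2020"),
  X3 = c("wat_bas", "50", "55"),
  stringsAsFactors = FALSE
)

# Stop early if a column has shifted, instead of silently
# reading the wrong data.
extract_fixed <- function(raw, expected, header_row = 1) {
  found <- unlist(raw[header_row, seq_along(expected)], use.names = FALSE)
  bad <- which(is.na(found) | found != expected)
  if (length(bad) > 0) {
    stop("Unexpected header in column(s) ", paste(bad, collapse = ", "),
         ": found ", paste(found[bad], collapse = ", "))
  }
  out <- raw[-header_row, seq_along(expected), drop = FALSE]
  names(out) <- expected
  rownames(out) <- NULL
  out
}

clean <- extract_fixed(raw, expected)
print(clean)
```

Because the check compares language-independent codes rather than translated display names, it would not depend on which language is selected in the workbook.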

Finalization of the codebook and documentation

Currently working on completing the CSV codebook and the helper functions that generate human-readable documentation for the R package, so that users know what the package contains and the package can be published to CRAN.
