
jmpwashdata's Issues

Language selection in country files

@donmezayca
The way language selection is implemented in the country files introduced an issue with scraping data from them. At the moment, the column names are translated automatically when a new language is selected, and it is difficult to switch the language from the R script. Making these files easier to scrape would be good for a number of reasons:

  • ETL scripts used by jmpwashdata, the JMP team, and others should have predictable column names/codes
  • There shouldn't be a dependency on running a heavy piece of software like Excel

At the moment, the possible solutions are:

  • Hardcoding column names and ranges in the R script. The disadvantage is that we cannot detect mistakes (for example, when a column has been shifted). This requires coordination with JMP to make sure the assumed fixed columns are indeed correct.
  • Adding a raw data sheet that contains only the data and column code names, and that is used as the data source by both the Excel sheet and the ETL scripts (this requires an action by JMP).
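
A rough sketch of how the second option would let ETL scripts detect shifted or renamed columns. It assumes a raw data sheet whose header row carries stable column codes. The codes (`iso3`, `year`, `wat_bas`), the values, and the helper `check_columns()` are made up for illustration and are not the package's actual names.

```r
# Hypothetical column codes; the real codes would be agreed with JMP.
expected_codes <- c("iso3", "year", "wat_bas")

# Stop with an informative error if the sheet's header does not match the
# expected codes exactly (same names, same order), so that a shifted or
# renamed column is caught instead of being silently misread.
check_columns <- function(sheet, expected) {
  found <- names(sheet)
  if (!identical(found, expected)) {
    stop(
      "Unexpected columns. Expected: ", paste(expected, collapse = ", "),
      "; found: ", paste(found, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(sheet)
}

# Stand-in for a raw data sheet already read into R (made-up values).
raw <- data.frame(
  iso3 = c("AAA", "AAA"),
  year = c(2019L, 2020L),
  wat_bas = c(50, 51),
  stringsAsFactors = FALSE
)
check_columns(raw, expected_codes)

# A sheet whose columns were shifted fails the check.
shifted <- raw[, c("iso3", "wat_bas", "year")]
result <- tryCatch(
  check_columns(shifted, expected_codes),
  error = function(e) "error"
)
```

Because the check compares codes rather than translated display names, it would not depend on which language is selected in the workbook.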

Add standard tests for PRs and local testing

The main tests would:

  • Check that any API for making long / wide dataframes still works
  • Check that the codebook generates

Additionally, if there are changes to the scraping files (note: this should NOT be automated, to avoid unnecessarily scraping all the files and potentially getting blacklisted):

  • Check that the scraping functions still work (just a few files)
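
A minimal sketch of what the long / wide check could look like, written as a round-trip test in base R. `to_long()` and `to_wide()` are hypothetical stand-ins, not the package's actual API, and the column codes and values are made up. In the package itself the same assertion would more likely live in a test framework such as testthat (an assumption).

```r
# Hypothetical stand-ins for the package's long/wide helpers (base R only).
to_long <- function(wide, id_cols) {
  value_cols <- setdiff(names(wide), id_cols)
  pieces <- lapply(value_cols, function(col) {
    data.frame(
      wide[id_cols],
      indicator = col,
      value = wide[[col]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

to_wide <- function(long, id_cols) {
  indicators <- unique(long$indicator)
  # Take the id columns from the first indicator block; this assumes every
  # indicator block lists the same ids in the same order.
  wide <- long[long$indicator == indicators[1], id_cols, drop = FALSE]
  rownames(wide) <- NULL
  for (ind in indicators) {
    wide[[ind]] <- long$value[long$indicator == ind]
  }
  wide
}

# Toy input with made-up codes and values.
wide <- data.frame(
  iso3 = c("AAA", "BBB"),
  year = c(2020L, 2020L),
  ind_a = c(1, 2),
  ind_b = c(3, 4),
  stringsAsFactors = FALSE
)
ids <- c("iso3", "year")

long <- to_long(wide, ids)
round_trip <- to_wide(long, ids)

# The round trip should reproduce the original wide data frame.
stopifnot(nrow(long) == 4, isTRUE(all.equal(round_trip, wide)))
```

A round trip is a cheap check: it needs no network access and fails as soon as either direction drops or reorders a column.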

Wish list / roadmap

This wish list was copied from the README and added here as a to-do list:

  • Complete codebook of all jmp datasets and of the package (#2)
  • Add label attributes and possibly value attributes to all datasets using haven::labelled (https://haven.tidyverse.org/reference/labelled.html) (#8)
  • Write how-to documentation and several case studies as article vignettes on the pkgdown website (#4) to demonstrate use
  • Add WASH in Schools and WASH in Health Care Facilities country files.
  • Add use cases on combining with other data sets (national monitoring data, country TrackFin studies, etc.) on the pkgdown website (#4)
  • Add tests for data extraction and validation to cross-validate country files against world files, and different sheets against one another (as an extraction test and internal validation of the data sets). (#1 and #9)
  • Add helper functions to transform world and regional data between the original wide format and a long format.
  • Standardize the (long) data format used by datasets in the package.
  • Automate rebuilds using file hashes and sampling and a periodic poll of the JMP website
  • Prepare package for CRAN submission (https://r-pkgs.org/release.html)
  • Prepare an article on “Enhancing the use and quality of official statistics using open source” for JOSS once CRAN publication is finalized
  • Python wrapper library for easy inclusion in Python projects
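
For the rebuild item in the list above, a minimal sketch of hash-based change detection using `tools::md5sum()` from base R. The helper `changed_files()` and the idea of keeping previous hashes in a named character vector are assumptions for illustration. How the hashes are stored and how the JMP website is polled are left open.

```r
# Return the paths of files whose MD5 hash differs from the stored hash,
# or that have no stored hash yet. `stored_hashes` is a named character
# vector (names are file paths), as returned by tools::md5sum().
changed_files <- function(paths, stored_hashes) {
  current <- tools::md5sum(paths)
  previous <- stored_hashes[names(current)]
  names(current)[is.na(previous) | previous != current]
}

# Demonstration with a temporary file standing in for a downloaded file.
f <- tempfile(fileext = ".csv")
writeLines("a,b", f)
stored <- tools::md5sum(f)

unchanged <- changed_files(f, stored)  # nothing to rebuild yet

writeLines("a,b,c", f)                 # simulate an updated file
changed <- changed_files(f, stored)    # now reports the file
```

Only the files reported as changed would need to be re-extracted, which keeps rebuilds small.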

Add labels to columns and values

Add data labels to each dataset using a standardized codebook (#2).
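
A minimal sketch of attaching variable labels from a codebook, using only base R. It sets the `label` attribute, which as far as I know is the attribute haven uses for variable labels. The codebook layout (`variable`, `label` columns), the label texts, and the helper `add_labels()` are assumptions for illustration.

```r
# Hypothetical codebook: one row per variable with a human-readable label.
codebook <- data.frame(
  variable = c("iso3", "wat_bas"),
  label = c("Country code (example label)", "Example label for wat_bas"),
  stringsAsFactors = FALSE
)

# Attach each codebook label to the matching column as a "label" attribute.
# Columns without a codebook entry are left untouched.
add_labels <- function(data, codebook) {
  for (i in seq_len(nrow(codebook))) {
    var <- codebook$variable[i]
    if (var %in% names(data)) {
      attr(data[[var]], "label") <- codebook$label[i]
    }
  }
  data
}

# Toy dataset with made-up values.
dataset <- data.frame(
  iso3 = c("AAA", "BBB"),
  year = c(2020L, 2020L),
  wat_bas = c(50, 60),
  stringsAsFactors = FALSE
)
labelled_data <- add_labels(dataset, codebook)
attr(labelled_data$wat_bas, "label")
```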

Optional: add value labels (for discussion). We can add as a separate issue if needed.

Finalization of the codebook and documentation

Currently working on completing the CSV codebook and the helper functions that will generate human-readable documentation for the R package, so that users know what the package contains and so that the package can be published to CRAN.
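
A rough sketch of one way such a helper could turn codebook rows into documentation lines. It assumes the package documents its datasets with roxygen2-style comments, which is not confirmed here. The helper `codebook_to_roxygen()` and the codebook columns are hypothetical.

```r
# Build roxygen2-style "@format" lines from a codebook data frame with
# `variable` and `label` columns (hypothetical layout).
codebook_to_roxygen <- function(codebook) {
  items <- sprintf("#'   \\item{%s}{%s}", codebook$variable, codebook$label)
  c(
    "#' @format A data frame with the following variables:",
    "#' \\describe{",
    items,
    "#' }"
  )
}

# Toy codebook with made-up entries.
codebook <- data.frame(
  variable = c("iso3", "year"),
  label = c("Country code", "Reference year"),
  stringsAsFactors = FALSE
)
doc_lines <- codebook_to_roxygen(codebook)
writeLines(doc_lines)
```

Generating these lines from the CSV would keep the codebook as the single source for both the labels and the package documentation.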
