DQAstats

The R package 'DQAstats' provides core functionality for data quality assessment (DQA) of electronic health record (EHR) data.

Currently implemented features are:

  • descriptive (univariate) analysis of categorical and continuous variables of a source database and a target database
  • checks of the extract-transform-load (ETL) pipeline (by comparing distinct values and valid values between the source database and the target database)
  • value conformance checks by comparing the resulting statistics to value constraints given in a metadata repository (MDR)
  • 'atemporal plausibility' checks (multivariate)
  • 'uniqueness plausibility' checks (multivariate)

The tool provides one main function, dqa(), to create a comprehensive PDF document, which presents all statistics and results of the data quality assessment.

Currently supported input data formats / databases: CSV files and SQL-based databases (see "Configuration of the tool").

Installation

CRAN Version

DQAstats can be installed directly from CRAN with:

install.packages("DQAstats")

Development Version

You can install the latest development version of DQAstats with:

install.packages("remotes")
remotes::install_github("miracum/dqa-dqastats")

Note: A working LaTeX installation is a prerequisite for using this software (e.g. using the R package tinytex)!
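
One way to satisfy the LaTeX prerequisite from within R is the tinytex package mentioned above. The following is a minimal sketch, assuming that a TinyTeX distribution is acceptable on your system and that the machine has internet access; it is not necessarily the setup the package authors use.

```r
# Install the tinytex R package from CRAN (one-time step).
install.packages("tinytex")

# Download and install the TinyTeX LaTeX distribution (one-time step).
tinytex::install_tinytex()

# Check whether the LaTeX distribution in use is TinyTeX.
tinytex::is_tinytex()
```

If a LaTeX distribution is already installed system-wide, this step can probably be skipped.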

💡 If you want to run this in a dockerized environment, you can use the rocker/verse image, which has TeX already installed.

Configuration of the tool

The configuration of databases, be it CSV files or SQL-based databases, is done with environment variables, which can be set using the base R command Sys.setenv().

A detailed description of which environment variables need to be set for each database can be found here.
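
As a rough illustration of this mechanism, the sketch below sets two variables with Sys.setenv() and checks that they are set before a run. The variable names are taken from the example in this README; the directory (tempdir()) is only a placeholder, and the check itself is a hypothetical convenience, not part of DQAstats.

```r
# Placeholder directory; replace with the folder that holds your CSV files.
csv_dir <- tempdir()

# Variable names as used in the example of this README.
Sys.setenv(
  "EXAMPLECSV_SOURCE_PATH" = csv_dir,
  "EXAMPLECSV_TARGET_PATH" = csv_dir
)

# Hypothetical pre-flight check: stop early if a variable is unset or empty.
required_vars <- c("EXAMPLECSV_SOURCE_PATH", "EXAMPLECSV_TARGET_PATH")
values <- Sys.getenv(required_vars)
missing_vars <- required_vars[!nzchar(values)]

if (length(missing_vars) > 0) {
  stop("Unset environment variables: ",
       paste(missing_vars, collapse = ", "))
}
```

Other databases need different variables; the names above apply only to the bundled CSV demo.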

Example

The following minimal working example shows how to apply the DQA tool to data. Example data and a corresponding MDR are provided with the R package DQAstats. A working LaTeX installation is a prerequisite (e.g. via the R package tinytex); please refer to the DQAstats wiki for further installation instructions.

# Load library DQAstats:
library(DQAstats)

# Set environment vars to demo files paths:
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file("demo_data",
                                                  package = "DQAstats"))
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file("demo_data",
                                                  package = "DQAstats"))
# Set path to utilities folder where to find the mdr and template files:
utils_path <- system.file("demo_data/utilities",
                          package = "DQAstats")

# Execute the DQA and generate a PDF report:
results <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = utils_path,
  mdr_filename = "mdr_example_data.csv",
  output_dir = "output/",
  parallel = FALSE
)

# The PDF report is stored at "./output/"
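
To confirm that a report was written, the output directory can be inspected afterwards. This is a small sketch that assumes the working directory is unchanged since the call to dqa(); since the report's file name is not stated here, it simply lists any PDF files found.

```r
# Directory passed as `output_dir` in the example above.
output_dir <- "output"

# List all PDF files in the output directory (empty if none exist yet).
pdf_files <- list.files(output_dir, pattern = "\\.pdf$", full.names = TRUE)
print(pdf_files)
```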

Demo Usage / Deployment Examples

You can test the package without installing anything except Docker. 💡 For further details, see the Wiki: https://github.com/miracum/dqa-dqastats/wiki/Deployment.

Citation

L.A. Kapsner, J.M. Mang, S. Mate, S.A. Seuchter, A. Vengadeswaran, F. Bathelt, N. Deppenwiese, D. Kadioglu, D. Kraska, and H.-U. Prokosch, Linking a Consortium-Wide Data Quality Assessment Tool with the MIRACUM Metadata Repository, Appl Clin Inform. 12 (2021) 826–835. doi:10.1055/s-0041-1733847.

@article{kapsner2021,
  title = {Linking a {{Consortium}}-{{Wide Data Quality Assessment Tool}} with the {{MIRACUM Metadata Repository}}},
  author = {Kapsner, Lorenz A. and Mang, Jonathan M. and Mate, Sebastian and Seuchter, Susanne A. and Vengadeswaran, Abishaa and Bathelt, Franziska and Deppenwiese, Noemi and Kadioglu, Dennis and Kraska, Detlef and Prokosch, Hans-Ulrich},
  year = {2021},
  month = aug,
  journal = {Applied Clinical Informatics},
  volume = {12},
  number = {04},
  pages = {826--835},
  issn = {1869-0327},
  doi = {10.1055/s-0041-1733847},
  language = {en}
}
