chop-cgtinformatics / redcaptidier

Makes it easy to read REDCap Projects into R

Home Page: https://chop-cgtinformatics.github.io/REDCapTidieR/

License: Other

R 17.74% TeX 0.52% HTML 81.74%

redcap tidy-data r redcap-api

redcaptidier's Issues

[FEATURE] Improve Test Suite

Feature Request Description

In part documenting a recent discussion with @ezraporter

As a recent bug involving a silent failure (#75) showed, we can miss test failures because some tests rely on manually updated .RDS files in /inst/testdata (an approach other users have found useful).

.RDS files were a solution used so that non-read_redcap_tidy tests didn't have to rely on API calls.

Proposed Solutions

There are a few ways we could tackle this:

Use mocks to get read_redcap_tidy outputs and supply the data to the necessary tests
- Not entirely desirable since this essentially also tests read_redcap_tidy
Write integration tests that test functions with read_redcap_tidy for the purpose of alerting to failures that may subsequently indicate silent failures
Rewrite tests as much as possible to use manually constructed tibble/tribbles
- Also potentially undesirable since manual changes need to be kept up with if the examples they reference also change

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] read_redcap Checkbox Example Not Working

Expected Behavior

The following resulted from a meeting with Will on 2022-12-08.

Checkboxes should be exported from REDCap appropriately.

Current Behavior

When testing with Will's example checkbox database, checkbox exports did not come out as expected:

https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/example.credentials#L36

Screenshots
In the image below, only two checkbox variables exist (checkbox_one___1, checkbox_two___a), though there are multiple options for each of these in the REDCap client side UI.

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Look Into redcap_metadata_coltypes for coltype ID

Feature Request Description

A subfunction of REDCapR, redcap_metadata_coltypes, allows for extraction, identification, and assignment of REDCap column data types with readr syntax.

This may be a nice solution to simplifying how we currently process and assign coltypes in our internal function logic.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

read_redcap_tidy should support using labels instead of raw values

Feature Request Description

Currently, the tibbles that read_redcap_tidy generates contain the raw values (usually but not always serial integer numbers) for the following field types:

dropdown-single
radio-single
checkbox-multiple
yesno
truefalse

There should be a switch to read_redcap_tidy (set to TRUE by default) to make the values contain the labels instead of raw values.

Proposed Solution

For the following field types, the labels are trivial and can be easily implemented programmatically:

yesno: 0 -> no / 1 -> yes (factor)
truefalse: 0 -> FALSE / 1 -> TRUE (logical)
checkbox-multiple: 0 -> FALSE / 1 -> TRUE (logical)

For the following types, the mapping of raw to label can be determined using the parse_labels function.

dropdown-single
radio-single

An important implementation detail here is that the order of the choices may have meaning. For this reason, these categorical field types should be coded as factors in the resulting tibble, with the levels corresponding to the order of choices.
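A rough base R sketch of the raw-to-label mapping described above. The choice string format (`raw, label` pairs separated by `|`) and the helper name `parse_choices_sketch()` are assumptions for illustration; this is not the package's `parse_labels()`.

```r
# Hypothetical REDCap choice string: "raw, label" pairs separated by "|".
choices <- "1, Mild | 2, Moderate | 3, Severe"

# Split the choice string into a raw -> label lookup, preserving choice order.
parse_choices_sketch <- function(x) {
  pairs <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  raw <- trimws(sub(",.*$", "", pairs))        # text before the first comma
  label <- trimws(sub("^[^,]*,", "", pairs))   # text after the first comma
  data.frame(raw = raw, label = label, stringsAsFactors = FALSE)
}

lookup <- parse_choices_sketch(choices)

# Raw values as they might arrive from an export.
raw_values <- c("3", "1", "2", "1")

# Levels follow the order of the choices, not the order seen in the data.
severity <- factor(raw_values, levels = lookup$raw, labels = lookup$label)

# truefalse and checkbox fields: 0/1 to logical.
checked <- as.logical(as.integer(c("0", "1")))
```

Keeping the lookup as a small data frame makes it easy to inspect the mapping before it is applied.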

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] tidyselect .data Deprecation

Expected Behavior

The REDCapTidieR test suite should run free of any FAILs or WARNs.

Current Behavior

There are currently 255 WARNs returned when running devtools::test() locally since the newest release of tidyselect v1.2.0. All warnings look similar to below:

Warning (test-utils.R:32): multi_choice_to_labels works
Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
i Please use `"label"` instead of `.data$label`

This appears to be a very new issue and one that is still being documented and discussed in the community.

And one of Hadley's most recent commits.

I'm hesitant to do a blanket change since R CMD check in tandem with rlang was what prompted the introduction of .data$ into our code in the first place.
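For reference, a small sketch of the distinction that matters here, using a made-up tibble. The deprecation applies to tidyselect contexts such as `select()`; `.data` is still supported in data-masking verbs such as `filter()` and `mutate()`, so a blanket removal should not be needed.

```r
library(dplyr)

# Made-up data for illustration only.
df <- tibble(label = c("Yes", "No"), raw = c(1, 0))

# Tidyselect context: use a string or all_of() instead of .data$label.
# Both forms avoid the "no visible binding" note from R CMD check.
by_string <- select(df, "label")
by_all_of <- select(df, all_of("label"))

# Data-masking context: .data$ is still fine and not deprecated.
kept <- filter(df, .data$raw == 1)
```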

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Update names of exported functions to optimize for teachability

Feature Request Description

Naming things is hard! A good guideline for naming functions is that the name of the function gels well with the context in which you would first explain or teach the function's purpose. For example, when you explain that importing data is the first step of data analysis, and the question is, how do you import data from REDCap, a great answer could be: with import_redcap()!

When we then explain that the supertibble has multiple data tibbles embedded into it, and the question is how do you extract those tibbles, the answer could be: with extract_tibbles()! Or extract_tibble() if you just want one data tibble. Or bind_tibbles() if you want to bind the data tibbles to your environment.

We will want to make sure the old functions still work but also let users know that they are now deprecated and will go away soon. Here are two resources for deprecating things smoothly.

This is also a good opportunity to clean up the API a bit. See details below.

Note that roxygen documentation, README, vignettes, as well as screenshots/GIFs will need updating to reflect these changes.

We will also want to add a paragraph to the blog post for 0.2 to explain why we're changing the API.

Proposed Solution

Rename read_redcap_tidy() to import_redcap(). Have read_redcap_tidy() throw a deprecation warning (will be removed in 0.3) and call import_redcap()
Rename bind_tables() to bind_tibbles(). Deprecate the old function as above.
Rename extract_table() to extract_tibble(). Deprecate the old function as above.
Rename extract_tables() to extract_tibbles(). Deprecate the old function as above.
For bind_tibbles(), rename .data to supertbl. Remove structure (not a useful option IMO)
For extract_tibble() and extract_tibbles(), rename .data to supertbl
Update tests to reflect these changes
Update roxygen documentation for exported functions
Update README and vignettes with new function names, including screenshots/GIFs
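A minimal sketch of the deprecation pattern from the list above, using base R's `.Deprecated()`. The body of `import_redcap()` here is a stand-in, and the choice of `.Deprecated()` over a dedicated deprecation package is an assumption for illustration.

```r
# Stand-in for the renamed function; the real one would call the REDCap API.
import_redcap <- function(redcap_uri, token, ...) {
  list(redcap_uri = redcap_uri, token = token)
}

# The old name keeps working but warns and forwards to the new function.
read_redcap_tidy <- function(redcap_uri, token, ...) {
  .Deprecated(
    new = "import_redcap",
    msg = "read_redcap_tidy() is deprecated and will be removed in 0.3. Use import_redcap() instead."
  )
  import_redcap(redcap_uri, token, ...)
}

# Existing code continues to run, with a deprecation warning.
result <- suppressWarnings(read_redcap_tidy("https://example.org/api/", "TOKEN"))
```

The same wrapper shape would apply to bind_tables(), extract_table(), and extract_tables().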

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Throw a warning when two or more choices have the same label

Feature Request Description

When REDCapTidieR detects that a multiple choice field (radio, dropdown, or checkbox) has two or more choices with the same label, it should throw a warning.

REDCap allows multiple choices to have the same label (not the same raw value) but this is probably always an error.
REDCapTidieR converts the choice values to the levels of a factor, ignoring the raw value. When a factor is generated in which multiple levels have the same label, those levels are automatically combined into one level.
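To illustrate the level merging described above, and one way a check could look. The helper name `warn_duplicate_labels_sketch()` and the warning wording are hypothetical.

```r
# Three distinct raw codes, but two of them share the label "Yes".
raw_codes <- c("1", "2", "3")
labels <- c("Yes", "No", "Yes")

# factor() silently merges levels that share a label (R >= 3.4.0).
merged <- factor(c("1", "2", "3", "3"), levels = raw_codes, labels = labels)

# Hypothetical check: warn when any choice label appears more than once.
warn_duplicate_labels_sketch <- function(field_name, labels) {
  dups <- unique(labels[duplicated(labels)])
  if (length(dups) > 0) {
    warning(
      sprintf(
        "Field `%s` has multiple choices with the same label: %s",
        field_name, paste(dups, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(dups)
}
```

Calling `warn_duplicate_labels_sketch("consent", labels)` would warn about "Yes", while a field with unique labels passes silently.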

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Implement a set of extract_* functions to provide an alternative way to extracting tables

Feature Request Description

The bind_tables() function provides an easy way to make all or a subset of the tables derived from a REDCap project available in the analyst's environment. However, some may feel that this function has spooky side effects and would like more control over the objects in the environment. I propose a series of helper functions, inspired by tidymodels, that allow extracting one or multiple tables.

Proposed Solution

Implement two functions, extract_table() and extract_tables() that allow extracting specific tables from a REDCapTidieR supertibble (good branding?), where:

extract_table(data, tbl) takes a supertibble and returns the extracted tibble, and
extract_tables(data, tbls) takes a supertibble and returns a named list of the extracted tibbles.

A really cool feature would be if both functions could support tidy-select semantics for the tbl/tbls argument, i.e. allow helpers such as starts_with() or everything(). This may be a bit tricky to implement because tidy-select operates on columns and the supertibble doesn't organize its content in columns, but this is a solvable problem with a pivot_wider transformation.

extract_table() should throw an error if more than one table is matched by the tbl argument. I think a great way to implement extract_table() would be to call extract_tables(), count the number of elements in the list, throw an error if it's not exactly one, and then unlist that element.

demographics <- supertibble %>% extract_table("demographics")

redcap_data_all <- supertibble %>% extract_tables(everything())
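One possible shape for this, sketched with `tidyselect::eval_select()` running over a named list of tables rather than over columns (an alternative to the pivot_wider idea). The supertibble column names `redcap_form_name` and `redcap_data` and the `_sketch` function names are assumptions for illustration.

```r
# Toy supertibble: one row per instrument, data tibbles in a list column.
supertbl <- tibble::tibble(
  redcap_form_name = c("demographics", "medications"),
  redcap_data = list(
    tibble::tibble(record_id = 1:2, age = c(34, 51)),
    tibble::tibble(record_id = c(1, 1, 2), drug = c("a", "b", "c"))
  )
)

# Return a named list of the tables matched by a tidyselect expression.
extract_tables_sketch <- function(supertbl, tbls = tidyselect::everything()) {
  out <- rlang::set_names(supertbl$redcap_data, supertbl$redcap_form_name)
  # eval_select() works on any named vector, so table names act like columns.
  pos <- tidyselect::eval_select(rlang::enquo(tbls), data = out)
  out[pos]
}

# Return exactly one table, erroring if the selection matches zero or several.
extract_table_sketch <- function(supertbl, tbl) {
  out <- extract_tables_sketch(supertbl, {{ tbl }})
  if (length(out) != 1) {
    stop("`tbl` must match exactly one table, not ", length(out), ".", call. = FALSE)
  }
  out[[1]]
}

demographics <- extract_table_sketch(supertbl, "demographics")
meds_only <- extract_tables_sketch(supertbl, tidyselect::starts_with("med"))
```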

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Combine distill_*_table functions

Feature Request Description

We currently have four functions that do the bulk of redcap processing work. clean_redcap() has distill_repeat_table() and distill_nonrepeat_table(). clean_redcap_long() has distill_repeat_table_long() and distill_nonrepeat_table_long(). There's a good amount of duplicated code between these helper functions.

Proposed Solution

Combine distill_repeat_table() and distill_nonrepeat_table() into a single distill_table(). Combine distill_repeat_table_long() and distill_nonrepeat_table_long() into a single distill_table_long(). This will reduce duplication without going so far as to mix our longitudinal and non-longitudinal processing together.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Create a demo database to make the vignette easier to understand

Feature Request Description

The current test database is being used to showcase the functionality of REDCapTidieR in the README.Rmd. Let's create a more fun demo database, I'm thinking of this one here → https://www.kaggle.com/claudiodavi/superhero-set/home

Proposed Solution

Let's build a classic database "superheroes" with one record per hero and two instruments: heroes_information (nonrepeating) and superpowers (repeating) and use those to showcase the idea of having different granularities in different instruments.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Consider removing the "experimental" lifecycle badge

The API cleanup, documentation revisions, and performance improvements for 0.2 leave the package in a state that I feel is ready to be shared publicly. Take a look here and let me know if you agree.

[FEATURE] Add test cases for attempting to access PHI-restricted data

Feature Request Description

REDCap has settings to restrict export of fields identified as PHI. These settings impact the content of fields returned by REDCapR.

Our testing suite should include tests that check for expected behavior when a user attempts to access a form for which they don't have full permissions to export.

Proposed Solution

This is an initial pass at expected behavior across a range of cases. The parameters I'm varying are:

Whether the form was requested specifically vs. requested by default
Whether the user has access to all, some, or no fields in the form

| Request | Permissions within form | Proposed Behavior | `REDCapR` Behavior |
|---|---|---|---|
| `forms = NULL` (default) | All fields | success | success, all accessible fields returned |
| `forms = NULL` (default) | Subset of fields | warning? | success, all accessible fields returned |
| `forms = NULL` (default) | No fields | warning? | success, all accessible fields returned |
| `forms = "name_of_form"` | All fields | success | success, all accessible fields returned |
| `forms = "name_of_form"` | Subset of fields | warning? | success, all accessible fields returned |
| `forms = "name_of_form"` | No fields | warning | success, all fields in the project returned including ones the user doesn't have access to |

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCapTidieR doesn't return survey timestamp fields

Amazing, elegant work 💪 I have been battling with REDCap's 'block matrix' data format and have written my own, much less elegant, conversion scripts. Thanks so much for sharing with the community.

Enhancement Idea

Unless I have overlooked something, the current CRAN version of REDCapTidieR doesn't expose the survey timestamp field.

Proposed Solution

Expose the 'survey_timestamp' field for all instruments of type survey similar to 'form_status_complete'.

[FEATURE] Error Message for Simultaneous Repeat/Nonrepeat Instruments

Feature Request Description

REDCapTidieR will need to handle the edge case where a given instrument may have repeating structure under some events and nonrepeating structure under others.

Proposed Solution

The currently advised solution is to throw an error that points to the database build, since this likely isn't a proper build.

Additional Context

Referenced in conversation with REDCapR dev and @skadauke on 2022-08-12.

See below example where technically you could make the "Physical Exam" instrument non-repeating under the "Screening & Enrollment" event but repeating under "Pre-Infusion":

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Implement automated lintr and styler checks

Feature Request Description

We should have automated checks to ensure the package is lint-free and follows an appropriate style.

Proposed Solution

For lintr we can add an expect_lint_free() test to our testing suite. This will give us a full lint check without changing GHA workflows.

For styler the best solution is probably precommit. The hook can be configured to style files automatically or just alert for failures without modifying files.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCapTidieR Needs to Handle Databases with No Repeating Instruments

Expected Behavior

REDCapTidieR should be able to handle databases that do not have repeating instruments.

Current Behavior

Our extract_* and clean_* functions currently always check for both repeating and non-repeating instruments, which results in errors when a database with no repeating instruments is supplied.

This should return a similarly structured series of tidy tibbles and needs to also handle the case where a longitudinal database contains no repeating instruments.

How to Reproduce the Bug:

I developed a copy of the REDCap Classic test database but removed the repeating instrument to replicate the behavior.

Failure Logs

> read_redcap_tidy(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_CLASSIC_NOREPEAT_API"))
Error in `filter()`:                                                                                          
! Problem while computing `..1 =
  !is.na(.data$redcap_repeat_instrument)`.
Caused by error in `.data$redcap_repeat_instrument`:
! Column `redcap_repeat_instrument` not found in `.data`.
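A sketch of the kind of guard that would avoid this error: check whether the `redcap_repeat_instrument` column exists before filtering on it. The function names and the returned list structure are hypothetical, and base R subsetting is used in place of `filter()`.

```r
# Hypothetical helper: does the export contain any repeating structure at all?
has_repeats_sketch <- function(db_data) {
  "redcap_repeat_instrument" %in% names(db_data)
}

# Split an export into nonrepeating and repeating rows without assuming
# that the repeat column is present.
split_by_structure_sketch <- function(db_data) {
  if (!has_repeats_sketch(db_data)) {
    # No repeating instruments: everything is nonrepeating.
    return(list(nonrepeat = db_data, repeating = db_data[0, , drop = FALSE]))
  }
  is_repeat <- !is.na(db_data$redcap_repeat_instrument)
  list(
    nonrepeat = db_data[!is_repeat, , drop = FALSE],
    repeating = db_data[is_repeat, , drop = FALSE]
  )
}

# Toy export from a project without repeating instruments.
no_repeat_export <- data.frame(record_id = 1:3, age = c(30, 41, 56))
parts <- split_by_structure_sketch(no_repeat_export)
```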

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Create utility function to apply {labelled} labels to supertibble

Feature Request Description

A new exported function (make_labelled()) that uses the labelled package to apply labels to fields in the supertibble and fields within the supertibble's list columns.

Proposed Solution

A user would run:

supertibble <- make_labelled(supertibble)

which would:

Apply labels to the variables in the supertibble. These will be defined by REDCapTidieR.
Apply labels to the variables in supertibble$redcap_metadata and supertibble$redcap_events. These will be defined by REDCapTidieR.
Apply labels to the variables in supertibble$redcap_data. These would be derived from the contents of supertibble$redcap_metadata

Additionally we should add functionality for editing the labels in redcap_metadata before applying them to redcap_data. Two potential options:

Provide additional parameters to make_labelled() that control custom formatting, ex. remove_terminal_colons
Add a parameter that lets the user pass a formatting function to preprocess labels and give a sensible default, ex:

formatter <- function(x) stringr::str_remove(x, ":$")

supertibble <- make_labelled(supertibble, label_format = formatter)

Finally, rather than importing labelled we should check that it's installed in make_labelled() and issue an error message if it isn't that asks the user to install it.
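A sketch of the install check and a default formatter in base R. `check_installed_sketch()` and `strip_terminal_colon()` are hypothetical names; inside `make_labelled()` the check would be called with `"labelled"`.

```r
# Hypothetical helper: fail with an actionable message if a suggested
# package is not installed, instead of importing it unconditionally.
check_installed_sketch <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf(
        "Package \"%s\" is required for this function. Install it with install.packages(\"%s\").",
        pkg, pkg
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical default formatter: drop a trailing colon from field labels.
strip_terminal_colon <- function(x) sub(":$", "", x)

field_labels <- c("Date of birth:", "Height (cm)")
clean_labels <- strip_terminal_colon(field_labels)
```

Keeping labelled out of Imports means it would be listed under Suggests in DESCRIPTION, which is what makes the runtime check necessary.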

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Add `make_labelled()` usage to vignette

Feature Request Description

Update the main REDCapTidieR vignette with a section showing off usage of make_labelled() and exported formatters

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] `data_na_pct` returned by `read_redcap_tidy()` is between 0 and 1 as opposed to 0 and 100

Expected Behavior

data_na_pct should return a true percentage between 0 and 100

Current Behavior

data_na_pct is between 0 and 1

Suggested fix

Multiply data_na_pct by 100 in calc_metadata_stats()

An alternative solution is to use scales::percent() to format the percentage. The downside is that scales functions return a string so that the user wouldn't be able to easily manipulate data_na_pct. I think that makes multiplying by 100 the most elegant solution.
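A tiny illustration of the suggested fix, assuming data_na_pct is the share of missing cells in a data tibble. `na_pct_sketch()` is a hypothetical stand-in, not the code in calc_metadata_stats().

```r
# Share of missing cells, expressed as a numeric percentage (0 to 100).
na_pct_sketch <- function(data) {
  100 * mean(is.na(data))
}

# Toy data: 3 of 8 cells are missing.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z")
)

pct <- na_pct_sketch(toy)
```

The result stays numeric, so users can still filter or sort on it, which a formatted string would not allow.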

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCap Form Names can be "contained" in other names

Expected Behavior

Varying REDCap variables should exist solely in their expected instrument tables when output by read_redcap_tidy.

Current Behavior

There is a minor bug that appeared when evaluating the outputs of the "repeat/nonrepeat" REDCap test database. This results in nonrepeat* columns appearing in the repeated table output because repeat as a word is "contained" in nonrepeat.

How to Reproduce the Bug:

Using the current REDCapTidieR functions, use the repeat/nonrepeat REDCap test database and observe the existence of nonrepeat* columns in the repeated table output.

Screenshots

> read_redcap_tidy(redcap_uri = Sys.getenv("REDCAP_URI"), 
+                  token = Sys.getenv("REDCAPTIDIER_LONGITUDINAL_API")) %>% bind_tables()
> repeated
# A tibble: 9 × 10
  record_id redcap_repeat_instance redcap_event redcap_arm nonrepeat_1 repeat_1 nonrepeat_2 repeat_2 nonrepeat…¹ form_…²
      <dbl>                  <dbl> <chr>             <int> <chr>       <chr>    <chr>       <chr>          <dbl>   <dbl>
1         1                      1 event_1               1 NA          1        NA          2                 NA       0
2         1                      2 event_1               1 NA          3        NA          4                 NA       0
3         1                      3 event_1               1 NA          5        NA          6                 NA       0
4         1                      1 event_2               1 NA          A        NA          B                 NA       0
5         1                      2 event_2               1 NA          C        NA          D                 NA       0
6         3                      1 event_1               1 NA          C        NA          D                 NA       0
7         3                      1 event_2               1 NA          E        NA          F                 NA       0
8         3                      2 event_2               1 NA          G        NA          H                 NA       0
9         4                      1 event_3               2 NA          R1       NA          R2                NA       0
# … with abbreviated variable names ¹nonrepeated_complete, ²form_status_complete

Proposed Solution

Replacing instances of contains with starts_with in the extract_* functions resolves this problem.
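The difference is easy to reproduce on a made-up tibble with the same column names as in the output above:

```r
library(dplyr)

# Made-up columns mirroring the repeat/nonrepeat test database.
export <- tibble(
  record_id = 1,
  repeat_1 = "1",
  repeat_2 = "2",
  nonrepeat_1 = NA_character_,
  nonrepeat_2 = NA_character_
)

# contains() matches anywhere in the name, so nonrepeat_* is picked up too.
with_contains <- names(select(export, contains("repeat")))

# starts_with() only matches the prefix, which excludes nonrepeat_*.
with_starts_with <- names(select(export, starts_with("repeat")))
```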

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] "forms" argument fails when the record_id form is repeating

Expected Behavior

When read_redcap_tidy() is called, all expected forms should be returned regardless of forms specification and structure.

Current Behavior

When specifying forms from a REDCap build where the record_id field exists in a repeating form, execution fails with an error.

How to Reproduce the Bug:

Have a REDCap project where the initial form (the one containing the record_id field) is a repeating instrument. Call read_redcap_tidy and specify a form that is not that instrument.

Screenshots

Failure Logs

Error message:

Error in if (my_fields[1] != my_record_id) { : 
missing value where TRUE/FALSE needed

Coming from distill_repeat_table:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/3944c913990bcd8f9f314a591357c20e01d2adc4/R/clean_redcap.R#L198-L200

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCapTidieR enumerates arms incorrectly

Expected Behavior

REDCapTidieR should report the correct number of arms, even if no data has been entered in some of them.

Current Behavior

The link_arms() function currently enumerates arms like so:

arms <- db_data_long %>% pull(redcap_arm) %>% unique() # Define arms

Instead, we should be using either REDCapR::redcap_arm_export() or REDCapR::redcap_event_instruments() to enumerate arms.
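A toy illustration of why enumerating arms from the data undercounts. The `arm_metadata` data frame stands in for whatever the REDCapR call returns; its column names are assumptions.

```r
# Stand-in for arm metadata from the project definition (hypothetical columns).
arm_metadata <- data.frame(
  arm_num = 1:3,
  arm_name = c("Control", "Treatment A", "Treatment B")
)

# Exported data: nobody has been enrolled in arm 3 yet.
db_data_long <- data.frame(
  record_id = c(1, 2, 3),
  redcap_arm = c(1L, 1L, 2L)
)

# Current approach: only arms that already contain data are seen.
arms_from_data <- unique(db_data_long$redcap_arm)

# Metadata-based approach: every defined arm is seen.
arms_from_metadata <- arm_metadata$arm_num

missed_arms <- setdiff(arms_from_metadata, arms_from_data)
```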

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] read_redcap_tidy Output Order Doesn't Mirror Order in REDCap

Expected Behavior

The output order of the tables in the supertibble should reflect the same order in REDCap/the output of REDCapR::redcap_metadata_read().

Current Behavior

Currently the tables are grouped by structure (repeat/nonrepeat) and otherwise follow a semi-recognizable order, though we don't specify it explicitly anywhere.

Solution

This should be doable using the dplyr ordering functions, identifying the order right after the initial metadata call and then enforcing them at the very end of the read_redcap_tidy function.
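Roughly what that could look like with `dplyr::arrange()`, using toy stand-ins for the metadata and the supertibble. The column names `form_name` and `redcap_form_name` are assumptions here.

```r
library(dplyr)

# Toy metadata in REDCap order: one row per field.
metadata <- tibble(
  field_name = c("record_id", "height", "drug", "visit_date"),
  form_name = c("demographics", "demographics", "medications", "visits")
)

# Toy supertibble whose rows came out grouped by structure instead.
supertbl <- tibble(
  redcap_form_name = c("medications", "visits", "demographics"),
  structure = c("repeating", "repeating", "nonrepeating")
)

# Capture the instrument order right after the metadata call...
form_order <- unique(metadata$form_name)

# ...and enforce it at the very end.
ordered <- arrange(supertbl, match(redcap_form_name, form_order))
```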

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Convert Text Box data to the appropriate data type based on Validation type

Feature Request Description

Text Box data entry fields are special because they can contain many different types of data in addition to text. These data types should be converted to the appropriate data type in R (maybe that's already happening? But I wanted to jot this down while I was thinking about it). See screenshot for the Validation types available in CHOP's REDCap:

Proposed Solution

I would hope that the validation type is captured in the metadata tibble. I think, here's how things should map--

| Validation type | R data type |
|---|---|
| Date (all variants) | `Date` |
| Datetime (all variants) | `POSIXct` |
| Integer | `numeric` |
| Number (all variants) | `numeric` |

lubridate functions will be helpful in converting the various date and datetime formats.

I don't think there is a good representation of Time (without date) so would leave that as a character.
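A base R sketch of the mapping in the table above. The validation type strings (`date_*`, `datetime_*`, `integer`, `number*`) and the assumption that exported values arrive in year-month-day order regardless of the display format are not confirmed here, and `convert_text_field_sketch()` is a hypothetical name.

```r
# Convert a character vector from a text box field based on its validation type.
# Unknown or missing validation types (including time-only) stay character.
convert_text_field_sketch <- function(x, validation_type) {
  if (is.na(validation_type)) {
    return(x)
  }
  if (startsWith(validation_type, "datetime_")) {
    # Try with seconds first, then without.
    return(as.POSIXct(
      x,
      tz = "UTC",
      tryFormats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M")
    ))
  }
  if (startsWith(validation_type, "date_")) {
    return(as.Date(x, format = "%Y-%m-%d"))
  }
  if (validation_type == "integer" || startsWith(validation_type, "number")) {
    return(as.numeric(x))
  }
  x
}

dob <- convert_text_field_sketch(c("1990-04-12", NA), "date_mdy")
seen <- convert_text_field_sketch("2023-01-05 14:30", "datetime_ymd")
weight <- convert_text_field_sketch(c("70.5", "82"), "number_1dp")
free_text <- convert_text_field_sketch("hello", NA)
```

The lubridate parsers mentioned above could replace the `as.Date()` and `as.POSIXct()` calls if the exported formats turn out to vary.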

Tests

I think we'll need another instrument in the nonrepeat instrument with all of these and write a test to make sure we get the expected results.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Include event labels in `redcap_events` tibbles

Feature Request Description

Events in longitudinal REDCap projects can have custom labels in addition to identifiers:

The redcap_events columns of the supertibble should include these labels.

Proposed Solution

The event labels are accessible with the Export Events API method but REDCapR doesn't currently have a wrapper for this. We can make an issue and supply the code. Once that's running in REDCapR it will be easy to add this field to redcap_events by modifying link_arms().

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] suppress_redcapr_messages Is Not Working

Expected Behavior

Supplying suppress_redcapr_messages = FALSE should allow for REDCapR messages to come through (instead of the default of TRUE).

Current Behavior

Messages are always suppressed regardless of argument spec.

Proposed Solution

In the REDCapR::redcap_read_oneshot call, we currently have verbose = FALSE, where it should instead be verbose = !suppress_redcapr_messages (negated, because suppressing messages means verbose must be FALSE).
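A self-contained sketch of passing the flag through. Since suppressing messages corresponds to `verbose = FALSE`, the argument presumably needs to be negated on the way in. `fetch_records_sketch()` is a stand-in for the REDCapR call, not its real signature.

```r
# Stand-in for the API call: emits a message only when verbose is TRUE.
fetch_records_sketch <- function(verbose = TRUE) {
  if (verbose) message("Fetching records from the REDCap API...")
  invisible(data.frame(record_id = 1:2))
}

# Wrapper mirroring the suppress_redcapr_messages argument.
read_sketch <- function(suppress_redcapr_messages = TRUE) {
  # Suppressing messages means verbose must be FALSE, hence the negation.
  fetch_records_sketch(verbose = !suppress_redcapr_messages)
}

quiet <- read_sketch()                                   # no message
loud <- suppressMessages(read_sketch(suppress_redcapr_messages = FALSE))
```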

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] CRAN Preparation and Documentation Checklist

Feature Request Description

In preparation of CRAN submission, we must address the following:

All functions need to have roxygen2 documentation and appropriate tags
- Including @return and @example tags
Vignette documentation updates (see #18)
- README updates
Update the DESCRIPTION file
Create a NEWS article
Check package against REDCapR sample databases
Check submission in RHub
Code of conduct and contributing documentation

To properly address this we will need to determine how best to include/exclude API token calls for package examples and vignette rendering.

Other things to consider:

~~[ ] Convert pipes from %>% to |>~~
- ~~[ ] Remove any unnecessary magrittr @import calls~~
Rename internal extract_repeat/nonrepeat_table functions under clean_* to be less confusing with exported, outward-facing extract_table/s functions
- Thesaurus options: distill_*, derive_*, pull_*
Revisit utils.R to clean up, potentially move some out to separate files

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Implement cli for Error Message Consistency

Feature Request Description

As mentioned in #49, we would like to standardize error messages across REDCapTidieR using the cli package.

I believe all messages should currently be captured under the checks.R script.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] read_redcap_tidy returns incorrect results when forms are specified that don't include first instrument in the project

Expected Behavior

Regardless of which instruments are requested, read_redcap_tidy() should return an object with each instrument-tibble containing:

identifiers (record_id, redcap_repeat_instance, redcap_event, redcap_arm)
one row per unique combination of identifiers in the project data

Current Behavior

When the forms parameter is used and the vector of instruments doesn't include the instrument containing record_id, identifiers aren't included in the output.

When this occurs for longitudinal projects the output additionally contains extra rows with NA values.

How to Reproduce the Bug:

Non-longitudinal:

token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
redcap_uri <- Sys.getenv("REDCAP_URI")

read_redcap_tidy(redcap_uri, token, forms = "repeated")

Longitudinal:

token <- Sys.getenv("REDCAPTIDIER_LONGITUDINAL_API")
redcap_uri <- Sys.getenv("REDCAP_URI")

read_redcap_tidy(redcap_uri, token, forms = "repeated")

Suggested Fix

This seems to be occurring because the REDCap API only returns identifiers if the instrument containing record_id is included in the forms argument. Augmenting the API call so that the form with record_id is always requested under the hood should resolve the issue.
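A sketch of the suggested augmentation: always request the form that holds record_id, then drop it again from the output if the user didn't ask for it. Function and argument names are hypothetical.

```r
# Build the forms vector that is actually sent to the API.
augment_forms_sketch <- function(forms, record_id_form) {
  if (is.null(forms)) {
    return(NULL)  # NULL already means "all forms"
  }
  union(record_id_form, forms)
}

# After processing, keep only the tables the user originally requested.
restrict_to_requested_sketch <- function(tables, forms) {
  if (is.null(forms)) {
    return(tables)
  }
  tables[names(tables) %in% forms]
}

requested <- "repeated"
api_forms <- augment_forms_sketch(requested, record_id_form = "nonrepeated")

# Toy stand-in for the processed output.
tables <- list(nonrepeated = data.frame(), repeated = data.frame())
returned <- restrict_to_requested_sketch(tables, requested)
```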

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Make checkbox-multi column names less ugly

Feature Request Description

It would be nice if the columns from checkbox-multi fields didn't have names with three consecutive underscores in them.

Proposed Solution

As part of the "aesthetic cleanup" let's rename any checkbox-multi columns so that they have a single underscore. I'm not sure what the best way is here. The logic probably shouldn't simply rely on looking for three underscores and instead depend on positive identification of checkbox-multi kinds of columns. I think it might be best to implement this inside of update_field_names()?
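One way the "positive identification" could work: take the checkbox field names from the metadata and only rewrite columns that begin with `<field>___`. The function name and the toy column names, including a non-checkbox column with three underscores, are made up for illustration.

```r
# Rename checkbox columns from field___code to field_code, but only for
# fields that the metadata identifies as checkboxes.
rename_checkbox_cols_sketch <- function(col_names, checkbox_fields) {
  for (field in checkbox_fields) {
    prefix <- paste0(field, "___")
    hit <- startsWith(col_names, prefix)
    col_names[hit] <- paste0(
      field, "_", substring(col_names[hit], nchar(prefix) + 1)
    )
  }
  col_names
}

# Toy column names; "notes___raw" is a made-up non-checkbox column.
cols <- c("record_id", "race___1", "race___2", "notes___raw")
renamed <- rename_checkbox_cols_sketch(cols, checkbox_fields = "race")
```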

Additional Context

You'll have to change tests/testthat/test-clean_redcap.R.

Let's target v0.1 because this will be a breaking change.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Improve `field_label` metadata for checkbox fields

Feature Request Description

Currently when checkbox fields are expanded in the metadata to have one row per checkbox option the field label from the checkbox field is carried over to all new rows. Field labels for each new row should be updated to include the option label in addition to the field label.

This addresses the conversation in #65 (comment)

Proposed Solution

Update update_field_names() to append field labels with option labels. See #65 (comment) for example output.
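
A rough sketch of the label expansion, assuming choices arrive in REDCap's "code, label | code, label" format. The helper name and the ": " separator are assumptions; the exact output format agreed on in #65 is not reproduced here.

```r
# Hypothetical helper: build one label per checkbox option by combining
# the field label with each option label.
expand_checkbox_labels <- function(field_label, choices) {
  options <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  # Drop everything up to the first comma to keep only the option label
  option_labels <- trimws(sub("^[^,]*,", "", options))
  paste0(field_label, ": ", option_labels)
}

labels <- expand_checkbox_labels(
  "Race",
  "1, White | 2, Black or African American | -99, Unknown"
)
```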

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] read_redcap_tidy fails on checkbox field with negative value

Expected Behavior

read_redcap_tidy should be able to handle all types of fields.

Current Behavior

I have several checkbox fields in my project where the value 'Unknown' is coded as -99. In the data export REDCap converts the minus sign - into an underscore _. For example, if the patient identifies as white then race___4 will be 1, but if they refuse to supply their race then race____99 will be 1. Notice the four underscores instead of three for the 'Unknown' option.

It seems like this breaks assumptions that read_redcap_tidy makes, since I get this error message:

> read_redcap_tidy(uri, token)

Error in `mutate()`:
! Problem while computing `..1 = across(.cols =
  all_of(logical_cols), as.logical)`.
Caused by error in `across()`:
! Can't subset columns that don't exist.
x Column `race___-99` doesn't exist.

After searching through the source code I suspect the underlying issue is here:

https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/1fd9b7763fbd85e59be7a0a5659092c8ab1d7dfb/R/utils.R#L133-L142
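
For illustration, the conversion described above can be mirrored when building the expected column names. This is only a sketch with a hypothetical helper name, not the code at the link.

```r
# Hypothetical helper: derive exported checkbox column names from a field
# name and its option codes, replacing the minus sign with an underscore
# as described above.
checkbox_col_names <- function(field_name, codes) {
  paste0(field_name, "___", gsub("-", "_", codes, fixed = TRUE))
}

race_cols <- checkbox_col_names("race", c("4", "-99"))
```

With this, the -99 option maps to race____99 rather than race___-99, which is the column the error message above complains about.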

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] `forms_status_complete` Raw or Label Spec

Expected Behavior

The form_status_complete variable needs to be included in the raw_or_label logic.

Reference REDCapR constants for values associated. Should be a simple hard-coded fix.
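
A hard-coded version might look like the sketch below. The codes 0, 1 and 2 are REDCap's standard form status values; the helper name and its raw_or_label argument are hypothetical, and the values are typed in directly rather than pulled from REDCapR.

```r
# Labels REDCap uses for the raw `*_complete` codes
form_status_labels <- c("0" = "Incomplete", "1" = "Unverified", "2" = "Complete")

# Hypothetical helper: return raw codes or labelled factors depending on
# a `raw_or_label` argument.
format_form_status <- function(x, raw_or_label = "label") {
  if (raw_or_label == "raw") {
    return(x)
  }
  factor(
    x,
    levels = names(form_status_labels),
    labels = unname(form_status_labels)
  )
}

status <- format_form_status(c(2, 0, 1, NA))
```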

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] Improve REDCapTidieR Execution Times and Benchmarking

Expected Behavior

REDCapTidieR should run relatively quickly (<10s) on most manageable databases.

Current Behavior

Currently REDCapTidieR can take long stretches of time that are not related to API calls. Using profvis it was found that the main culprit here is the check_repeat_and_nonrepeat function that gets looped through inside of the clean_* functions. This is a nested for loop that looks across rows and columns of a database to check whether a project is repeating or nonrepeating.

The complicating factor here is the allowed REDCap behavior to design an instrument that can be both repeating and nonrepeating. This is a behavior we do not want to support, but wish to loudly fail on. If we can fix the below chunk for this check, it should greatly improve execution time:

https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/1c70017dff24b1600bfe18d79695af3ff148ae5e/R/checks.R#L67-L89

There may be some other areas that can marginally improve execution time, but this seems to be the biggest driving factor.
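
As a starting point, the check could be vectorized along these lines. This is a sketch under assumptions, not a drop-in replacement: it assumes redcap_repeat_instance is NA for non-repeating rows and that the form's data columns are known; is_mixed_structure() is a hypothetical name.

```r
# Hypothetical vectorized check: does a form have data in both repeating
# and non-repeating rows?
is_mixed_structure <- function(db_data, form_fields) {
  # TRUE for rows where at least one of the form's fields is filled in
  has_data <- rowSums(!is.na(db_data[, form_fields, drop = FALSE])) > 0
  is_repeat_row <- !is.na(db_data$redcap_repeat_instance)
  any(has_data & is_repeat_row) && any(has_data & !is_repeat_row)
}

# Made-up export for illustration
db_data <- data.frame(
  record_id = c(1, 1, 2),
  redcap_repeat_instance = c(NA, 1, NA),
  weight = c(NA, 70, 82),
  height = c(180, NA, NA)
)

weight_mixed <- is_mixed_structure(db_data, "weight")
height_mixed <- is_mixed_structure(db_data, "height")
```

To fail loudly, the result could be wrapped in a stop() call that names the offending instrument.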

Screenshots
profvis outputs shown below:

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Address deprecation warnings in GitHub Actions workflows

Description

Our Actions workflows began issuing warnings related to this update from GitHub.

There's an open issue on r-lib/actions: r-lib/actions#627

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Add longitudinal case to vignette

This is a placeholder for thoughts on improving the longitudinal vignette.

Demo how to use dplyr::left_join to augment tables for creating analytic objects

[FEATURE] Optimize read_redcap for Batch processing

Feature Request Description

The following is resultant of meeting with Will on 2022-12-08

For extremely large databases, the preferred method of data extraction from REDCap switches from oneshot to "batch" methods (i.e. those supplied by REDCapR::redcap_read).

For smaller databases, reading in REDCaps using batches can be slower, so it is not always a great solution. We should determine a way to assess REDCap database size at onset and then send the extraction call to the appropriate function.

To note, the real draw to this is an increase in reliability that batch reading can provide over oneshot methods, even though oneshot is simpler and faster for most databases.
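
The dispatch itself could be as simple as the sketch below. Everything here is an assumption: the helper name, the use of records times fields as a size estimate, and the threshold, which is an arbitrary placeholder rather than a benchmarked value. How to obtain the record count cheaply is left open.

```r
# Hypothetical dispatcher: pick a read strategy from an estimate of the
# export size.
choose_read_method <- function(n_records, n_fields, max_cells = 1e6) {
  if (n_records * n_fields > max_cells) "batch" else "oneshot"
}

small_db <- choose_read_method(n_records = 500, n_fields = 200)
large_db <- choose_read_method(n_records = 50000, n_fields = 400)
```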

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Augment the supertibble with metadata

Feature Request Description

The supertibble should make it easy for the analyst to discover table-specific metadata.

Proposed Solution

Add the following columns to the supertibble returned by read_redcap_tidy():

redcap_form_label: this should be the instrument label returned by redcap_instruments
redcap_metadata: This should be a list column containing data from db_metadata specific to the form (i.e. filtered on form_name). This will include non-data fields, which I think is OK. I suggest the following modifications from the raw REDCapR output: (1) remove form_name as it's redundant; (2) Turn select_choices_or_calculations into a list column with tibbles generated by parse_labels; (3) Add a list column events that contains a tibble of events with which the instrument is associated - skip if not longitudinal. If there are arms in the project, this tibble should have a column "arm" to indicate arms. (4) Reorder columns so the most useful ones are on the left: field_name, field_label, field_type then everything() else
rows showing the number of rows of the tibble
columns showing the number of cols of the tibble
NA percentage
object size

We might want to add more informational columns. For inspiration, see here: https://cghlewis.github.io/codebook-pkg-comparison/
Note, see: https://github.com/skadauke/dataMeta
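
The simpler summary columns (rows, columns, NA percentage, object size) could be computed per table roughly as follows. This is a base R sketch with hypothetical names; the column names in the real supertibble may differ.

```r
# Hypothetical helper: summarise one instrument table for the supertibble
summarise_table <- function(tbl) {
  data.frame(
    rows = nrow(tbl),
    columns = ncol(tbl),
    # Share of NA cells across the whole table, as a percentage
    na_pct = 100 * mean(is.na(tbl)),
    # Object size in bytes
    size_bytes = as.numeric(utils::object.size(tbl))
  )
}

tbl <- data.frame(record_id = 1:4, value = c(1.5, NA, 3, NA))
tbl_summary <- summarise_table(tbl)
```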

Potential Related Enhancement

Use the labelled package to add labels from the metadata to fields in the supertibble, as suggested on the CHOP R User Group Slack

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCapTidieR Duplicating Similarly Named Vars Across Tables

Current Behavior

When testing the classic REDCap database, it was found that variables were being duplicated between the data_field_types and text_input_validation_types instruments due to a starts_with selector in one of the cleaning sub functions. This caused the text variable in one instrument to pick up multiple text_* variables in another instrument.

The offending code lines can be found below:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/5d25913e3ad21733322eebef68eb8600d4bf488e/R/clean_redcap.R#L145-L147

Tests will also need to be updated.

Proposed Solution:

In the longitudinal functions, this was handled by changing starts_with() to all_of(), but seems to not have carried over to the classic functions.
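
The difference between the two selectors can be shown with a base R stand-in, where startsWith() plays the role of starts_with() and %in% plays the role of all_of(). The column names are simplified examples.

```r
col_names <- c("record_id", "text", "text_email", "text_date")
form_fields <- "text" # fields that belong to the first instrument

# Prefix matching also grabs the `text_*` columns of the other instrument
prefix_match <- col_names[startsWith(col_names, form_fields)]

# Exact matching keeps only the columns named in the metadata
exact_match <- col_names[col_names %in% form_fields]
```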

How to Reproduce the Bug:

library(REDCapTidieR)

out <- read_redcap_tidy(
  redcap_uri = Sys.getenv("REDCAP_URI"),
  token = Sys.getenv("REDCAPTIDIER_CLASSIC_API")
) %>%
  extract_tables(c("data_field_types", "text_input_validation_types"))

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] `read_redcap_tidy()` fails with `forms` argument when project has more than 1 repeating instrument

Expected Behavior

read_redcap_tidy() handles projects with any number of repeating instruments

Current Behavior

read_redcap_tidy() fails with the error below when specific forms are requested on a project with more than one repeating instrument:

Error in if (my_fields[1] != my_record_id) { : 
missing value where TRUE/FALSE needed

This bug was actually introduced in 99461c6 but we didn't discover it because our test REDCaps never have more than 1 repeat instrument. @rsh52 added an additional repeat instrument to test exporting of survey fields which revealed the underlying bug.

How to Reproduce the Bug:

uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")

read_redcap_tidy(uri, token, forms = "repeated")

Underlying Problem

Starting in 99461c6 we disassociate the Record ID field in db_metadata from the first form so that it can be easily included in the data/metadata for all forms:

https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/aac3373fef6e1ee3fdb82eb3b8476f74414ecb26/R/read_redcap_tidy.R#L87-L89

However we subsequently rely on the Record ID being associated with a form in db_metadata to figure out if we need to insert any additional forms into our call to read_redcap_oneshot():

https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/aac3373fef6e1ee3fdb82eb3b8476f74414ecb26/R/read_redcap_tidy.R#L100-L104

This inconsistency results in extra forms being returned by read_redcap_oneshot() which generates the downstream error.

Proposed Solution

Currently we edit db_metadata throughout read_redcap_tidy() which causes some tricky order dependencies on the operations we perform. Some of this can be resolved if we stored the results of redcap_metadata_read before editing them so downstream steps can refer to the unedited metadata when needed.

Pointing this step at the unedited metadata should resolve this bug:

https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/aac3373fef6e1ee3fdb82eb3b8476f74414ecb26/R/read_redcap_tidy.R#L104
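
To make the order dependency concrete, here is a small stand-alone illustration with made-up metadata. It is not the code at the links above, only a sketch of why a lookup against the edited copy returns NA while the unedited copy still works.

```r
# Stand-in for the unedited result of redcap_metadata_read()$data
db_metadata_raw <- data.frame(
  field_name = c("record_id", "name", "repeat_date"),
  form_name = c("nonrepeated", "nonrepeated", "repeated"),
  stringsAsFactors = FALSE
)

# Edited copy: the record ID field is disassociated from its form
db_metadata <- db_metadata_raw
db_metadata$form_name[1] <- NA

# Looking up the record ID form in the edited copy yields NA ...
edited_lookup <- db_metadata$form_name[db_metadata$field_name == "record_id"]

# ... while the unedited copy still has the answer
raw_lookup <- db_metadata_raw$form_name[db_metadata_raw$field_name == "record_id"]
```

Comparing an NA like edited_lookup inside if () is what produces a "missing value where TRUE/FALSE needed" error.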

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Mock API calls with httptest so tests and vignettes can run on CRAN

Feature Request Description

Tests of read_redcap_tidy() and the bulk of our vignette currently don't run on CRAN to avoid making live API calls during testing and exposing credentials. httptest has the ability to create mock API responses that can be fed back to read_redcap_tidy() during testing/vignette rendering to simulate real API responses.

This needs to be done in such a way that REDCap tokens (and ideally the CHOP REDCap uri) are not stored publicly or needed to run tests/vignettes.

Proposed Solution

Add a script in /tests to create mocks for everything in test-read_redcap_tidy.R and store as test fixtures
- Create custom redactor to replace tokens and uri with fakes that can be referenced in tests and vignettes
Follow these instructions to set up vignettes to use mocks

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] REDCapTidieR Option to Only Load Specific Forms

Feature Request Description

For large databases, it is not desirable to extract everything into a REDCapTidieR output (the current default behavior) from REDCapR::redcap_read_oneshot. Users should have the option to specify forms for extraction to minimize load times and memory use.

Proposed Solution

There should be an option in read_redcap_tidy where users specify forms. This should trickle to redcap_read_oneshot for the extraction.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Implement a function to return project-wide metadata

Feature Request Description

REDCapTidieR should have a function to return project-wide metadata, arranged in a tidy tibble.

Proposed Solution

Similar to read_redcap_tidy() output, we could look into making a read_redcap_metadata_tidy() function that returns a tibble. This tibble will probably have one row and include the following columns:

project_type: longitudinal or classic
arms: a tibble of arms and arm names, generated by REDCapR::redcap_arms. Skip if project doesn't have arms
events: a tibble of event/instrument mappings, generated by REDCapR::redcap_event_instruments
users: a tibble of users and project-wide permissions, generated by REDCapR::redcap_users_export (from $data object)
user_form_permissions: a tibble of users and instrument-specific permissions, generated by REDCapR::redcap_users_export (from $data_user_form object)
dags: a tibble of DAGs, generated by REDCapR::redcap_dag_read
version: the version of the REDCap instance, generated by REDCapR::redcap_version

Note: dags, users and user_form_permissions may contain sensitive information. I would suggest adding a toggle pull_user_info as a function argument, setting the default to FALSE.

More to be added as determined.

Alternatives considered

Metadata could be returned by the read_redcap_tidy function. However, this may be awkward for metadata that is not instrument specific.

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] REDCapTidieR Does not capture checkboxes

Expected Behavior

Checkbox data types should be accurately displayed in the tibble outputs. For an example variable with 3 possible selection options, the following output can be expected:

variable___1
variable___2
variable___3

Current Behavior

REDCapTidieR doesn't accurately capture checkbox data types.

This is likely because the field_name in the metadata is assigned the REDCap variable name (ex: infseq_confirmation) while the named output has a number attached to each possible sub-variable (ex: infseq_confirmation___1).

How to Reproduce the Bug:

Check the output for a missing infseq_confirmation in the infusion_sequence datatable for Prodigy, or in any other checkbox variable like Cohort selection.

Screenshots:
Metadata Output:

Data Output:

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Improve error message for invalid API token

Feature Request Description

Currently REDCapTidieR fails with an uninformative message when a bad API token is given:

Error in `filter()`: 
! Problem while computing `..1 = .data$field_type != "descriptive"`. 
Caused by error in `.data$field_type`: 
! Column `field_type` not found in `.data`. 
Run `rlang::last_error()` to see where the error occurred.

Instead we should return an informative message.

Proposed Solution

We currently pipe the response from redcap_metadata_read() into filter():

REDCapTidieR/R/read_redcap.R, lines 76 to 81 in 51ced1e:

db_metadata <- redcap_metadata_read(
  redcap_uri = redcap_uri,
  token = token,
  verbose = FALSE
)$data %>%
  filter(.data$field_type != "descriptive")

Instead we can use the information returned by redcap_metadata_read() to issue an error message before passing the data on. See success, status_codes, and outcome_messages returned by redcap_metadata_read().
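
A minimal sketch of such a check, assuming the response is a list carrying the success, status_codes, outcome_messages and data elements named above. The helper name and the wording of the message are hypothetical, and the failed response below is made up for illustration.

```r
# Hypothetical helper: stop early with a readable message when the
# metadata request did not succeed, otherwise pass the data on.
check_metadata_response <- function(res) {
  if (!isTRUE(res$success)) {
    stop(
      "The REDCap API request failed (status code ", res$status_codes, "): ",
      res$outcome_messages,
      "\nCheck that the API token and URI are correct.",
      call. = FALSE
    )
  }
  res$data
}

# Stand-in for a failed response
bad_response <- list(
  success = FALSE,
  status_codes = "403",
  outcome_messages = "The request was unsuccessful.",
  data = data.frame()
)

result <- tryCatch(
  check_metadata_response(bad_response),
  error = function(e) conditionMessage(e)
)
```

The filter() step would then run on the return value of the check instead of directly on $data.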

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] extract_tables no longer works with new field additions

Expected Behavior

The extract_table/s functions should return the expected tables from a read_redcap_tidy call.

Current Behavior

NULL values are returned instead of tibbles when using extract functions with table specs.

This was not being picked up by our test suite because non-API tests require manual updates of the .RDS files they reference. We may want to think of a better way to update internal/non-API tests or ensure updating the files is a part of our checklist (the current file for doing so is under /inst/create_test_data.R).

How to Reproduce the Bug:

Take a current read_redcap_tidy output and supply a specification to extract_tables.

Proposed Solution:

The culprit here is how extract_tables shaves off supertibble columns that aren't necessary:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/3944c913990bcd8f9f314a591357c20e01d2adc4/R/extract_table.R#L98-L106

In brief testing I believe all we have to do here is change select(-"structure") to select("redcap_form_name", "redcap_data"). When I originally wrote this I didn't anticipate that we'd be adding columns to the supertibble, but now it makes sense to explicitly select only the columns we need so that new additions don't break the function.

Would welcome thoughts on how to avoid silent test failures like this in the future.

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] REDCapTidieR Should Support Data Access Groups

Feature Request Description

REDCapTidieR should support the inclusion of Data Access Groups (DAGs). This is available in REDCapR's export_data_access_groups argument.

Additional Context

Data Access Groups are used for multiple collaborators and make it possible to change access rights. The redcap_data_access_group variable comes out in the database export as its own column, but does not come out in the metadata (similar to redcap_repeat_instrument/instance).

This could be a good use of a R/Medicine hackathon topic

Potential Problems

It will be necessary to determine what happens to a user with limited DAG access rights.

Update 2023-03-28:

This should be solved using a redcap_dags argument (by default set to FALSE). Likely just needs to be incorporated with the REDCapR::redcap_read_oneshot argument for export_data_access_groups

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[FEATURE] Missing Instrument Detection

Feature Request Description

When a user does not have the appropriate privileges for specific instruments, it can cause an error. This happened recently when new instruments were created in a database but the user's API privileges were not updated to grant access, causing a disparity between the instruments expected from the metadata and the data exported from the database.

Proposed Solution

This should throw a warning when expected instruments are not found in the database export.
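
One possible shape for that check, sketched in base R. It assumes the metadata has field_name and form_name columns and treats a form as missing when none of its fields appear in the export; warn_missing_forms() is a hypothetical name, and checkbox fields would need their expanded column names handled separately.

```r
# Hypothetical helper: warn about forms that appear in the metadata but
# have no fields in the exported data.
warn_missing_forms <- function(db_metadata, data_cols) {
  fields_by_form <- split(db_metadata$field_name, db_metadata$form_name)
  is_missing <- vapply(
    fields_by_form,
    function(fields) !any(fields %in% data_cols),
    logical(1)
  )
  missing_forms <- names(fields_by_form)[is_missing]
  if (length(missing_forms) > 0) {
    warning(
      "Forms found in the metadata but not in the data export: ",
      paste(missing_forms, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(missing_forms)
}

# Made-up metadata for illustration
db_metadata <- data.frame(
  field_name = c("record_id", "name", "lab_value"),
  form_name = c("demographics", "demographics", "labs"),
  stringsAsFactors = FALSE
)

missing_forms <- suppressWarnings(
  warn_missing_forms(db_metadata, data_cols = c("record_id", "name"))
)
```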

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

Field test REDCapTidieR with current complex databases

Feature Request Description

Before we publicize this package we should make sure that things look as expected when we use it on one or two actual complex REDCap databases. The two I am thinking of are the draft Prodigy huCART19 database and the BMT Outcomes database.

Proposed Solution

Let's make sure that the import of those two databases works as expected:

No errors pop up when the database is imported
Each instrument is represented as a tibble
Each instrument has the expected partial keys
Each instrument has the expected fields
Partial keys are never NA
There are no rows that are all NA in data fields (not sure if this is what we expect)?

Since most of these steps are really a QA activity, if you like, you can save the data as CSV and give to Amir to review and report back.

It may also be worth writing automated tests for some/most of these, just to nail down expected behavior.
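
Two of the checks above (partial keys are never NA, no rows that are all NA in data fields) could be automated roughly like this. It is a sketch with a hypothetical helper name; it assumes the partial key columns are passed in and every other column counts as a data field.

```r
# Hypothetical QA helper for one instrument table
check_instrument <- function(tbl, key_cols) {
  data_cols <- setdiff(names(tbl), key_cols)
  list(
    # TRUE when no partial key value is NA
    keys_never_na = !any(is.na(tbl[, key_cols, drop = FALSE])),
    # TRUE when every row has at least one non-NA data field
    no_all_na_rows = !any(
      rowSums(!is.na(tbl[, data_cols, drop = FALSE])) == 0
    )
  )
}

# Made-up instrument table; the second row has no data
tbl <- data.frame(
  record_id = c(1, 2, 3),
  redcap_event = c("baseline", "baseline", "followup"),
  weight = c(70, NA, 82),
  height = c(180, NA, 175)
)

checks <- check_instrument(tbl, key_cols = c("record_id", "redcap_event"))
```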

Check Against Sample REDCap Databases for `REDCapR`

REDCapTidieR should function as expected when tested against the same databases that REDCapR uses:

https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/example.credentials

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue

[BUG] read_redcap_tidy removes fields with checkbox type

Expected Behavior

read_redcap_tidy() should return all requested fields

Current Behavior

read_redcap_tidy() drops checkbox-type fields and warns that there are fields in the metadata that aren't in the result

How to Reproduce the Bug:

devtools::load_all()
redcap_uri <- Sys.getenv("REDCAP_URI")
classic_token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")

res <- read_redcap_tidy(redcap_uri, classic_token) |>
  extract_table("data_field_types")

#> Warning messages:
#> 1: Form name {data_field_types} detected in metadata, but not found in the database export.
#> This can happen when the user privileges are not set to allow exporting that form via the API.
#> The following variables are affected: checkbox_multiple___1, checkbox_multiple___2, checkbox_multiple___3,
#> ...

"checkbox_multiple___1" %in% names(res)
#> [1] FALSE

Details

This is a result of my bug fix in #51. That fix limits the results of redcap_read_oneshot() based on what's in the metadata. I missed that, at the point where we check, the metadata contains only one record per checkbox field (ex. checkbox_multiple) but the data contains one field per checkbox option (ex. checkbox_multiple___1, checkbox_multiple___2, ...).

Our tests missed this because:

We're not explicitly checking for these fields in the output
We're silencing warnings in the tests that would catch this due to some expected warnings we check for explicitly

Suggested fix

Update get_output_fields() to ensure that checkbox fields are kept.
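
A sketch of the matching logic, not the actual get_output_fields() implementation: a data column is kept if its name is in the metadata, or if stripping the checkbox suffix yields a field the metadata marks as a checkbox. The helper name and the sample data are hypothetical.

```r
# Hypothetical filter: keep data columns whose name, or whose name with a
# checkbox suffix stripped, appears in the metadata.
keep_known_fields <- function(data_cols, db_metadata) {
  checkbox_fields <- db_metadata$field_name[db_metadata$field_type == "checkbox"]
  # "checkbox_multiple___1" -> "checkbox_multiple"
  base_names <- sub("___.*$", "", data_cols)
  is_plain <- data_cols %in% db_metadata$field_name
  is_checkbox_option <- base_names %in% checkbox_fields & base_names != data_cols
  data_cols[is_plain | is_checkbox_option]
}

# Made-up metadata for illustration
db_metadata <- data.frame(
  field_name = c("record_id", "checkbox_multiple"),
  field_type = c("text", "checkbox"),
  stringsAsFactors = FALSE
)

kept <- keep_known_fields(
  c("record_id", "checkbox_multiple___1", "checkbox_multiple___2", "other_form_field"),
  db_metadata
)
```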

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue
