chop-cgtinformatics / redcaptidier
Makes it easy to read REDCap Projects into R
Home Page: https://chop-cgtinformatics.github.io/REDCapTidieR/
License: Other
In part documenting a recent discussion with @ezraporter.
As prompted by a recent bug that involved a silent failure (#75), it is possible for us to miss failures in tests that rely on manual updates to the .RDS files in /inst/testdata (a method that other users have found useful).
The .RDS files were adopted so that non-read_redcap_tidy tests didn't have to rely on API calls.
There are a few ways we could tackle this:
- read_redcap_tidy outputs and supply the data to the necessary tests
- read_redcap_tidy
- read_redcap_tidy for the purpose of alerting to failures that may subsequently indicate silent failures
- tibble/tribbles
Add any other context or screenshots about the feature request here.
The following resulted from a meeting with Will on 2022-12-08.
Checkboxes should be exported from REDCap appropriately.
When testing with Will's example checkbox database, checkbox exports did not come out as expected:
https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/example.credentials#L36
Screenshots
In the image below, only two checkbox variables exist (checkbox_one___1, checkbox_two___a), though there are multiple options for each of these in the REDCap client-side UI.
Before submitting this issue, please check and verify below that the submission meets the below criteria:
A subfunction of REDCapR, redcap_metadata_coltypes, allows for extraction, identification, and assignment of REDCap column data types with readr syntax.
This may be a nice way to simplify how we currently process and assign coltypes in our internal function logic.
Currently, the tibbles that read_redcap_tidy generates contain the raw values (usually but not always serial integer numbers) for the following field types:
There should be a switch in read_redcap_tidy (set to TRUE by default) to make the values contain the labels instead of raw values.
For the following field types, the labels are trivial and can be easily implemented programmatically:
- 0 -> no / 1 -> yes (factor)
- 0 -> FALSE / 1 -> TRUE (logical)
- 0 -> FALSE / 1 -> TRUE (logical)
For the following types, the mapping of raw to label can be determined using the parse_labels function.
An important implementation detail here is that the order of the choices may have meaning. For this reason, these categorical field types should be coded as factor in the resulting tibble, with the levels corresponding to the order of choices.
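As a rough illustration of the factor coding described above, here is a minimal base R sketch. The helper `parse_choices_sketch()` is hypothetical and is not the package's `parse_labels`; it assumes choices arrive as a single `raw, label | raw, label` string, which is how REDCap metadata usually stores them.

```r
# Hypothetical helper (NOT REDCapTidieR's parse_labels): split a choices
# string such as "1, Low | 2, Medium | 3, High" into raw values and labels.
parse_choices_sketch <- function(choices) {
  parts <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  # Split on the first comma only, since labels may themselves contain commas
  raw <- trimws(sub(",.*$", "", parts))
  label <- trimws(sub("^[^,]*,", "", parts))
  data.frame(raw = raw, label = label, stringsAsFactors = FALSE)
}

choices <- parse_choices_sketch("1, Low | 2, Medium | 3, High")
raw_values <- c("3", "1", "2", "1")

# Levels follow the order of the choices, not the order seen in the data
labelled_values <- factor(raw_values, levels = choices$raw, labels = choices$label)
levels(labelled_values)
```

Because the levels are taken from the choice order, an ordinal scale keeps its meaning even when some choices never occur in the data.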
The REDCapTidieR test suite should run free of any FAILs or WARNs.
There are currently 255 WARNs returned when running devtools::test() locally since the newest release of tidyselect, v1.2.0. All warnings look similar to the one below:
Warning (test-utils.R:32): multi_choice_to_labels works
Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
i Please use `"label"` instead of `.data$label`
This appears to be a very new issue and one that is still being documented and discussed in the community.
And one of Hadley's most recent commits.
I'm hesitant to do a blanket change since R CMD check in tandem with rlang was what prompted the introduction of .data$ into our code in the first place.
If applicable, submit any failure logs or error messages here.
Naming things is hard! A good guideline for naming functions is that the name of the function gels well with the context in which you would first explain or teach the function's purpose. For example, when you explain that importing data is the first step of data analysis, and the question is, how do you import data from REDCap, a great answer could be: with import_redcap()!
When we then explain that the supertibble has multiple data tibbles embedded into it, and the question is how do you extract those tibbles, the answer could be: with extract_tibbles()! Or extract_tibble() if you just want one data tibble. Or bind_tibbles() if you want to bind the data tibbles to your environment.
We will want to make sure the old functions still work but also let users know that they are now deprecated and will go away soon. Here are two resources for deprecating things smoothly.
This is also a good opportunity to clean up the API a bit. See details below.
Note that roxygen documentation, README, vignettes, as well as screenshots/GIFs will need updating to reflect these changes.
We will also want to add a paragraph to the blog post for 0.2 to explain why we're changing the API.
- Rename read_redcap_tidy() to import_redcap(). Have read_redcap_tidy() throw a deprecation warning (will be removed in 0.3) and call import_redcap().
- Rename bind_tables() to bind_tibbles(). Deprecate the old function as above.
- Rename extract_table() to extract_tibble(). Deprecate the old function as above.
- Rename extract_tables() to extract_tibbles(). Deprecate the old function as above.
- In bind_tibbles(), rename .data to supertbl. Remove structure (not a useful option IMO).
- In extract_tibble() and extract_tibbles(), rename .data to supertbl.
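One way the "warn, then forward" deprecation pattern from the list above could look is sketched below with base R's `.Deprecated()`. This is only an illustration: the function bodies are stand-ins that make no API call, and the package may well use a different mechanism for its deprecation messages.

```r
# Stand-in for the new function; a real version would call the REDCap API.
import_redcap <- function(redcap_uri, token, ...) {
  list(redcap_uri = redcap_uri, token = token)
}

# Old name kept as a thin wrapper that warns and forwards its arguments.
read_redcap_tidy <- function(redcap_uri, token, ...) {
  .Deprecated(new = "import_redcap", old = "read_redcap_tidy")
  import_redcap(redcap_uri, token, ...)
}

# The wrapper still returns a result, but signals a deprecation warning first.
result <- withCallingHandlers(
  read_redcap_tidy("https://example.org/api/", "fake-token"),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Keeping the old function as a wrapper means existing scripts keep running for one release cycle while users are told which name to switch to.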
When REDCapTidieR detects that a multiple choice field (radio, dropdown, or checkbox) has two or more choices with the same choice label, it should throw a warning.
REDCap allows multiple choices to have the same label (not the same raw value) but this is probably always an error.
REDCapTidieR converts the choice values to the levels of a factor, ignoring the raw value. When a factor is generated in which multiple levels have the same label, those levels are automatically combined into one level.
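The level-merging behavior, and a possible check for it, can be reproduced in base R. The helper name `warn_on_duplicate_labels()` is hypothetical and only sketches where such a warning could be raised.

```r
raw <- c("1", "2", "3")
labels <- c("Yes", "No", "Yes")  # choices 1 and 3 share a label

# Hypothetical check: warn when any choice label occurs more than once.
warn_on_duplicate_labels <- function(labels) {
  dups <- unique(labels[duplicated(labels)])
  if (length(dups) > 0) {
    warning("Duplicate choice labels: ", paste(dups, collapse = ", "))
  }
  invisible(dups)
}

# Show the warning text without leaving a pending warning behind
tryCatch(
  warn_on_duplicate_labels(labels),
  warning = function(w) message(conditionMessage(w))
)

# factor() silently merges raw values 1 and 3 into a single "Yes" level
f <- factor(c("1", "2", "3", "3"), levels = raw, labels = labels)
nlevels(f)
```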
The bind_tables() function provides an easy way to make all or a subset of tables derived from a REDCap project available in the analyst's environment. However, some may feel like this function has spooky side effects and would like more control over the objects in the environment. I propose a series of helper functions inspired by tidymodels that allow extracting one or multiple tables.
Implement two functions, extract_table() and extract_tables(), that allow extracting specific tables from a REDCapTidieR supertibble (good branding?), where:
- extract_table(data, tbl) takes a supertibble and returns the extracted tibble, and
- extract_tables(data, tbls) takes a supertibble and returns a named list of the extracted tibbles.
A really cool feature would be if both functions could support tidy-select semantics for the tbl/tbls argument, i.e. allow helpers such as starts_with() or everything(). This may be a bit tricky to implement because tidy-select operates on columns and the supertibble doesn't organize its content in columns, but this is a solvable problem with a pivot_wider transformation.
extract_table() should throw an error if more than one table is matched by the tbl argument. I think a great way to implement extract_table() would be by calling extract_tables(), counting the number of elements in the list, throwing an error if it's not exactly one, and then unlisting that element.
demographics <- supertibble %>% extract_table("demographics")
redcap_data_all <- supertibble %>% extract_tables(everything())
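A minimal base R sketch of the "call the plural function, then require exactly one match" idea follows. It uses a plain data frame with a list column as a stand-in for the supertibble and matches tables by name only; the `_sketch` function names are hypothetical and tidy-select support is left out.

```r
# Mock supertibble: one row per instrument, with the data in a list column.
supertbl <- data.frame(
  redcap_form_name = c("demographics", "labs"),
  stringsAsFactors = FALSE
)
supertbl$redcap_data <- list(
  data.frame(record_id = 1:2, age = c(34, 51)),
  data.frame(record_id = c(1, 1, 2), result = c(5.1, 4.8, 6.0))
)

# Return a named list of the requested tables.
extract_tables_sketch <- function(supertbl, tbls) {
  keep <- supertbl$redcap_form_name %in% tbls
  out <- supertbl$redcap_data[keep]
  names(out) <- supertbl$redcap_form_name[keep]
  out
}

# Reuse the plural version and insist on exactly one match.
extract_table_sketch <- function(supertbl, tbl) {
  out <- extract_tables_sketch(supertbl, tbl)
  if (length(out) != 1) {
    stop("Expected exactly one table, but matched ", length(out), ".")
  }
  out[[1]]
}

demographics <- extract_table_sketch(supertbl, "demographics")
```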
We currently have four functions that do the bulk of REDCap processing work. clean_redcap() has distill_repeat_table() and distill_nonrepeat_table(). clean_redcap_long() has distill_repeat_table_long() and distill_nonrepeat_table_long(). There's a good amount of duplicated code between these helper functions.
Combine distill_repeat_table() and distill_nonrepeat_table() into a single distill_table(). Combine distill_repeat_table_long() and distill_nonrepeat_table_long() into a single distill_table_long(). This will reduce duplication without going so far as to mix our longitudinal and non-longitudinal processing together.
The current test database is being used to showcase the functionality of REDCapTidieR in the README.Rmd. Let's create a more fun demo database. I'm thinking of this one: https://www.kaggle.com/claudiodavi/superhero-set/home
Let's build a classic database "superheroes" with one record per hero and two instruments: heroes_information (nonrepeating) and superpowers (repeating) and use those to showcase the idea of having different granularities in different instruments.
The API cleanup, documentation revisions, and performance improvements for 0.2 leave the package in a state that I feel is ready to be shared publicly. Take a look here and let me know if you agree.
REDCap has settings to restrict export of fields identified as PHI. These settings impact the content of fields returned by REDCapR.
Our testing suite should include tests that check for expected behavior when a user attempts to access a form for which they don't have full permissions to export.
This is an initial pass at expected behavior across a range of cases. The parameters I'm varying are:
Request | Permissions within form | Proposed Behavior | REDCapR Behavior |
---|---|---|---|
forms = NULL (default) | All fields | success | success, all accessible fields returned |
forms = NULL (default) | Subset of fields | warning? | success, all accessible fields returned |
forms = NULL (default) | No fields | warning? | success, all accessible fields returned |
forms = "name_of_form" | All fields | success | success, all accessible fields returned |
forms = "name_of_form" | Subset of fields | warning? | success, all accessible fields returned |
forms = "name_of_form" | No fields | warning | success, all fields in the project returned including ones the user doesn't have access to |
Amazing, elegant work 💪 I have been battling with REDCap's 'block matrix' data format and have written my own, much less elegant, conversion scripts. Thanks so much for sharing with the community.
Unless I have overlooked something, currently the CRAN version of REDCapTidieR doesn't expose the survey timestamp field.
Expose the 'survey_timestamp' field for all instruments of type survey similar to 'form_status_complete'.
REDCapTidieR will need to handle the edge case where a given instrument may have repeating structure under some events and nonrepeating structure under others.
The currently advised solution is to throw an error pointing at the database build, since this likely isn't a proper build.
Referenced in conversation with the REDCapR dev and @skadauke on 2022-08-12.
See below example where technically you could make the "Physical Exam" instrument non-repeating under the "Screening & Enrollment" event but repeating under "Pre-Infusion":
We should have automated checks to ensure the package is lint free and follows appropriate style.
For lintr we can add an expect_lint_free() test to our testing suite. This will give us a full lint check without changing GHA workflows.
For styler the best solution is probably precommit. The hook can be configured to style files automatically or just alert for failures without modifying files.
REDCapTidieR should be able to handle databases that do not have repeating instruments.
Currently, the way our extract_* and clean_* functions work, we always check for repeating and non-repeating instruments, resulting in errors when a database with no repeating instruments is supplied.
This should return a similarly structured series of tidy tibbles and needs to also handle the case where a longitudinal database contains no repeating instruments.
I developed a copy of the REDCap Classic test database but removed the repeating instrument to replicate the behavior.
> read_redcap_tidy(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_CLASSIC_NOREPEAT_API"))
Error in `filter()`:
! Problem while computing `..1 =
!is.na(.data$redcap_repeat_instrument)`.
Caused by error in `.data$redcap_repeat_instrument`:
! Column `redcap_repeat_instrument` not found in `.data`.
A new exported function (make_labelled()) that uses the labelled package to apply labels to fields in the supertibble and fields within the supertibble's list columns.
A user would run:
supertibble <- make_labelled(supertibble)
which would:
- REDCapTidieR.
- supertibble$redcap_metadata and supertibble$redcap_events. These will be defined by REDCapTidieR.
- supertibble$redcap_data. These would be derived from the contents of supertibble$redcap_metadata
Additionally we should add functionality for editing the labels in redcap_metadata before applying them to redcap_data. Two potential options:
- make_labelled() that control custom formatting, ex. remove_terminal_colons
formatter <- function(x) stringr::str_remove(x, ":$")
supertibble <- make_labelled(supertibble, label_format = formatter)
Finally, rather than importing labelled, we should check inside make_labelled() that it's installed and, if it isn't, issue an error message that asks the user to install it.
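A check of this kind is commonly written with base R's `requireNamespace()`. The sketch below is one possible shape; the helper name `check_installed_sketch()` and the wording of the message are illustrative only.

```r
# Hypothetical helper: stop with an actionable message when an optional
# package is not installed.
check_installed_sketch <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this function. ",
      "Install it with install.packages(\"", pkg, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this passes silently
check_installed_sketch("stats")
```

Calling `check_installed_sketch("labelled")` at the top of make_labelled() would let labelled stay out of the package's hard dependencies.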
Update the main REDCapTidieR vignette with a section showing off usage of make_labelled() and exported formatters.
data_na_pct should return a true percentage between 0 and 100.
Currently data_na_pct is between 0 and 1.
Multiply data_na_pct by 100 in calc_metadata_stats().
An alternative solution is to use scales::percent() to format the percentage. The downside is that scales functions return a string, so the user wouldn't be able to easily manipulate data_na_pct. I think that makes multiplying by 100 the most elegant solution.
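The trade-off can be seen in a few lines of base R. Here `sprintf()` stands in for a string formatter such as scales::percent(), and the vector `x` is made-up example data.

```r
x <- c(1, NA, 3, NA)

# Fraction of missing values, between 0 and 1
na_fraction <- mean(is.na(x))

# Proposed: a numeric percentage between 0 and 100 that can still be
# filtered, sorted, or rounded
data_na_pct <- 100 * na_fraction

# A formatted string looks nice but is no longer numeric
formatted <- sprintf("%.0f%%", data_na_pct)
```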
Varying REDCap variables should exist solely in their expected instrument tables when output by read_redcap_tidy.
There is a minor bug that appeared when evaluating the outputs of the "repeat/nonrepeat" REDCap test database. It results in nonrepeat* columns appearing in the repeated table output because repeat as a word is "contained" in nonrepeat.
REDCapTidieR functions, use the repeat/nonrepeat REDCap test database and observe the existence of nonrepeat* columns in the repeated table output.
Screenshots
> read_redcap_tidy(redcap_uri = Sys.getenv("REDCAP_URI"),
+ token = Sys.getenv("REDCAPTIDIER_LONGITUDINAL_API")) %>% bind_tables()
> repeated
# A tibble: 9 × 10
record_id redcap_repeat_instance redcap_event redcap_arm nonrepeat_1 repeat_1 nonrepeat_2 repeat_2 nonrepeat…¹ form_…²
<dbl> <dbl> <chr> <int> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1 1 event_1 1 NA 1 NA 2 NA 0
2 1 2 event_1 1 NA 3 NA 4 NA 0
3 1 3 event_1 1 NA 5 NA 6 NA 0
4 1 1 event_2 1 NA A NA B NA 0
5 1 2 event_2 1 NA C NA D NA 0
6 3 1 event_1 1 NA C NA D NA 0
7 3 1 event_2 1 NA E NA F NA 0
8 3 2 event_2 1 NA G NA H NA 0
9 4 1 event_3 2 NA R1 NA R2 NA 0
# … with abbreviated variable names ¹nonrepeated_complete, ²form_status_complete
Replacing instances of contains with starts_with in the extract_* functions resolves this problem.
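The difference between the two matching strategies can be shown with base R equivalents: `grepl(fixed = TRUE)` behaves like a "contains" match and `startsWith()` like a prefix match. The column names are taken from the output above.

```r
cols <- c("nonrepeat_1", "repeat_1", "nonrepeat_2", "repeat_2")

# "contains"-style matching also picks up the nonrepeat_* columns
contains_match <- cols[grepl("repeat", cols, fixed = TRUE)]

# Prefix matching keeps only the columns that belong to the repeat form
prefix_match <- cols[startsWith(cols, "repeat")]
```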
When a read_redcap_tidy() function is called, all expected forms should be returned regardless of forms specification and structure.
When specifying forms from a REDCap build where the record_id exists in a repeating form, the execution errors.
Have a REDCap project where the initial form (the one containing the record_id field) is a repeating instrument. Call read_redcap_tidy and specify a form that is not that instrument.
Error message:
Error in if (my_fields[1] != my_record_id) { :
missing value where TRUE/FALSE needed
Coming from distill_repeat_table:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/3944c913990bcd8f9f314a591357c20e01d2adc4/R/clean_redcap.R#L198-L200
REDCapTidieR should report the correct number of arms, even if no data has been entered in some of them.
The link_arms() function currently enumerates arms like so:
arms <- db_data_long %>% pull(redcap_arm) %>% unique() # Define arms
Instead, we should be using either REDCapR::redcap_arm_export() or REDCapR::redcap_event_instruments() to enumerate arms.
The output order of the tables in the supertibble should reflect the order in REDCap, i.e. the order in the output of REDCapR::redcap_metadata_read().
Currently the order is grouped by structure (repeat/nonrepeat) and follows a semi-recognizable ordering, though we don't specify it explicitly anywhere.
This should be doable using the dplyr ordering functions, identifying the order right after the initial metadata call and then enforcing it at the very end of the read_redcap_tidy function.
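As a sketch of the idea, the order can be captured from the metadata once and re-applied at the end with `match()`. This uses base R rather than the dplyr functions mentioned above, and the form names and the two mock objects are made up for illustration.

```r
# form_name column as it might appear in the metadata, in REDCap order
metadata_forms <- c("demographics", "demographics", "labs", "medications")
form_order <- unique(metadata_forms)

# Mock supertibble whose rows are grouped by structure instead
supertbl <- data.frame(
  redcap_form_name = c("labs", "medications", "demographics"),
  structure = c("repeating", "repeating", "nonrepeating"),
  stringsAsFactors = FALSE
)

# Re-order rows so they follow the metadata order
supertbl <- supertbl[order(match(supertbl$redcap_form_name, form_order)), ]
supertbl$redcap_form_name
```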
Text Box data entry fields are special because they can contain many different types of data in addition to text. These data types should be converted to the appropriate data type in R (maybe that's already happening? But I wanted to jot this down while I was thinking about it). See screenshot for the Validation types available in CHOP's REDCap:
I would hope that the validation type is captured in the metadata tibble. I think this is how things should map:
Validation type | R data type |
---|---|
Date (all variants) | Date |
Datetime (all variants) | POSIXct |
Integer | numeric |
Number (all variants) | numeric |
lubridate functions will be helpful in converting the various date and datetime formats.
I don't think there is a good representation of Time (without date), so I would leave that as a character.
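A rough sketch of the mapping in the table above, written with base R parsers instead of lubridate. The validation type strings (`date_ymd`, `datetime_ymd`, `integer`, `number`) and the assumption that dates are exported in year-month-day order should be checked against the actual metadata; the function name is hypothetical.

```r
# Hypothetical converter: pick an R type based on the validation type string.
convert_validated_text <- function(x, validation_type) {
  if (is.na(validation_type)) {
    return(x)  # no validation: keep as character
  }
  if (startsWith(validation_type, "datetime")) {
    # Assumes "YYYY-MM-DD HH:MM" input
    as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
  } else if (startsWith(validation_type, "date")) {
    as.Date(x, format = "%Y-%m-%d")
  } else if (validation_type %in% c("integer", "number")) {
    as.numeric(x)
  } else {
    x  # e.g. time without a date stays character
  }
}

visit_date <- convert_validated_text("2022-12-08", "date_ymd")
visit_time <- convert_validated_text("2022-12-08 14:30", "datetime_ymd")
weight <- convert_validated_text("72.5", "number")
clock <- convert_validated_text("14:30", "time")
```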
I think we'll need another instrument in the nonrepeat instrument with all of these and write a test to make sure we get the expected results.
Events in longitudinal REDCap projects can have custom labels in addition to identifiers:
The redcap_events columns of the supertibble should include these labels.
The event labels are accessible with the Export Events API method but REDCapR doesn't currently have a wrapper for this. We can make an issue and supply the code. Once that's running in REDCapR it will be easy to add this field to redcap_events by modifying link_arms().
Supplying suppress_redcapr_messages = FALSE should allow REDCapR messages to come through (instead of the default of TRUE).
Messages are always suppressed regardless of argument spec.
In the REDCapR::redcap_read_oneshot call, we currently have verbose = FALSE where instead it should be verbose = !suppress_redcapr_messages (verbose output is the inverse of suppression).
In preparation of CRAN submission, we must address the following:
- roxygen2 documentation and appropriate tags
- @return and @example tags
- DESCRIPTION file
- REDCapR sample databases
To properly address this we will need to determine how best to include/exclude API token calls for package examples and vignette rendering.
Other things to consider:
- %>% to |>
- magrittr
- @import calls
- extract_repeat/nonrepeat_table functions under clean_* to be less confusing with exported, outward-facing extract_table/s functions
  - distill_*, derive_*, pull_*
- utils.R to clean up, potentially move some out to separate files
As mentioned in #49, we would like to standardize error messages across REDCapTidieR using the cli package.
I believe all messages should currently be captured under the checks.R script.
Regardless of which instruments are requested, read_redcap_tidy() should return an object with each instrument-tibble containing:
- identifiers (record_id, redcap_repeat_instance, redcap_event, redcap_arm)
When the forms parameter is used and the vector of instruments doesn't include the instrument containing record_id, identifiers aren't included in the output.
When this occurs for longitudinal projects, the output additionally contains extra rows with NA values.
Non-longitudinal:
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
redcap_uri <- Sys.getenv("REDCAP_URI")
read_redcap_tidy(redcap_uri, token, forms = "repeated")
Longitudinal:
token <- Sys.getenv("REDCAPTIDIER_LONGITUDINAL_API")
redcap_uri <- Sys.getenv("REDCAP_URI")
read_redcap_tidy(redcap_uri, token, forms = "repeated")
This seems to be occurring because the REDCap API only returns identifiers if the instrument containing record_id is included in the forms argument. Augmenting the API call so that the form with record_id is always requested under the hood should resolve the issue.
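The "always request the record_id form" idea can be sketched as a small helper that augments the user's forms vector before the API call. The helper name and the form names are hypothetical; how the record_id form is identified, and how the extra form is dropped again afterwards, are left open.

```r
# Hypothetical helper: make sure the form holding record_id is requested.
add_record_id_form <- function(forms, record_id_form) {
  if (is.null(forms)) {
    return(NULL)  # NULL already means "all forms"
  }
  union(record_id_form, forms)
}

requested <- add_record_id_form("repeated", record_id_form = "nonrepeated")

# Forms that were added only to obtain identifiers and could be dropped
# from the final output
extra_forms <- setdiff(requested, "repeated")
```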
It would be nice if the columns from checkbox-multi fields didn't have names with three consecutive underscores in them.
As part of the "aesthetic cleanup" let's rename any checkbox-multi columns so that they have a single underscore. I'm not sure what the best way is here. The logic probably shouldn't simply rely on looking for three underscores and should instead depend on positive identification of checkbox-multi kinds of columns. I think it might be best to implement this inside of update_field_names()?
You'll have to change tests/testthat/test-clean_redcap.R.
Let's target v0.1 because this will be a breaking change.
Currently when checkbox fields are expanded in the metadata to have one row per checkbox option the field label from the checkbox field is carried over to all new rows. Field labels for each new row should be updated to include the option label in addition to the field label.
This addresses the conversation in #65 (comment)
Update update_field_names() to append field labels with option labels. See #65 (comment) for example output.
read_redcap_tidy should be able to handle all types of fields.
I have several checkbox fields in my project where the value 'Unknown' is coded as -99. In the data export REDCap converts the minus sign - into an underscore _. For example, if the patient identifies as white then race___4 will be 1, but if they refuse to supply their race then race____99 will be 1 (notice the four underscores instead of three for the 'Unknown' option).
It seems like this breaks assumptions that read_redcap_tidy makes, since I get this error message:
> read_redcap_tidy(uri, token)
Error in `mutate()`:
! Problem while computing `..1 = across(.cols =
all_of(logical_cols), as.logical)`.
Caused by error in `across()`:
! Can't subset columns that don't exist.
x Column `race___-99` doesn't exist.
After searching through the source code I suspect the underlying issue is here:
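To make the naming mismatch concrete, here is a small sketch of how expected export column names could be derived so that negative codes line up with what REDCap returns. It only handles the minus-sign case described above; the helper name is hypothetical and other character substitutions REDCap may perform are not covered.

```r
# Hypothetical helper: build checkbox column names as they appear in the
# export, replacing "-" in the raw code with "_".
checkbox_export_names <- function(field_name, codes) {
  safe_codes <- gsub("-", "_", codes, fixed = TRUE)
  paste0(field_name, "___", safe_codes)
}

# Naive construction yields "race___-99", which is not in the export
naive_names <- paste0("race", "___", c("4", "-99"))

# Adjusted construction matches the exported column names
export_names <- checkbox_export_names("race", c("4", "-99"))
```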
The form_status_complete variable needs to be included in the raw_or_label logic.
Reference REDCapR constants for the associated values. Should be a simple hard-coded fix.
REDCapTidieR should run relatively quickly (<10s) on most manageable databases.
Currently REDCapTidieR can take long stretches of time that are not related to API calls. Using profvis it was found that the main culprit here is the check_repeat_and_nonrepeat function that gets looped through inside of the clean_* functions. This is a nested for loop that looks across rows and columns of a database to check whether a project is repeating or nonrepeating.
The complicating factor here is the allowed REDCap behavior to design an instrument that can be both repeating and nonrepeating. This is a behavior we do not want to support, but wish to loudly fail on. If we can fix the below chunk for this check, it should greatly improve execution time:
There may be some other areas that can marginally improve execution time, but this seems to be the biggest driving factor.
Screenshots
profvis outputs shown below:
Our Actions workflows began issuing warnings related to this update from GitHub.
There's an open issue on r-lib/actions: r-lib/actions#627
This is a placeholder for thoughts on improving the longitudinal vignette.
- dplyr::left_join to augment tables for creating analytic objects
The following resulted from a meeting with Will on 2022-12-08.
For extremely large databases, the preferred method of data extraction from REDCap switches between oneshot and "batch" methods (i.e. those supplied by REDCapR::redcap_read).
For smaller databases, reading in REDCaps using batches can be slower, so it is not always a great solution. We should determine a way to assess REDCap database size at onset and then send the extraction call to the appropriate function.
To note, the real draw to this is the increase in reliability that batch reading can provide over oneshot methods, even though oneshot is simpler and faster for most databases.
The supertibble should make it easy for the analyst to discover table-specific metadata.
Add the following columns to the supertibble returned by read_redcap_tidy():
- redcap_form_label: this should be the instrument label returned by redcap_instruments
- redcap_metadata: This should be a list column containing data from db_metadata specific to the form (i.e. filtered on form_name). This will include non-data fields, which I think is OK. I suggest the following modifications from the raw REDCapR output: (1) remove form_name as it's redundant; (2) turn select_choices_or_calculations into a list column with tibbles generated by parse_labels; (3) add a list column events that contains a tibble of events with which the instrument is associated - skip if not longitudinal. If there are arms in the project, this tibble should have a column "arm" to indicate arms. (4) Reorder columns so the most useful ones are on the left: field_name, field_label, field_type then everything() else
- rows showing the number of rows of the tibble
- columns showing the number of cols of the tibble
- NA percentage
We might want to add more informational columns. For inspiration, see here: https://cghlewis.github.io/codebook-pkg-comparison/
Note, see: https://github.com/skadauke/dataMeta
Use the labelled package to add labels from the metadata to fields in the supertibble, as suggested on the CHOP R User Group Slack.
When testing the classic REDCap database, it was found that variables were being duplicated between the data_field_types and text_input_validation_types instruments due to a starts_with selector in one of the cleaning subfunctions. This caused the text variable in one instrument to pick up multiple text_* variables in another instrument.
The offending code lines can be found below:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/5d25913e3ad21733322eebef68eb8600d4bf488e/R/clean_redcap.R#L145-L147
Tests will also need to be updated.
In the longitudinal functions, this was handled by changing starts_with() to all_of(), but that change seems not to have carried over to the classic functions.
library(REDCapTidieR)
out <- read_redcap_tidy(redcap_uri = Sys.getenv("REDCAP_URI"),
token = Sys.getenv("REDCAPTIDIER_CLASSIC_API")) %>%
extract_tables(c("data_field_types", "text_input_validation_types"))
read_redcap_tidy() handles projects with any number of repeating instruments.
read_redcap_tidy() fails with the error below when specific forms are requested on a project with more than one repeating instrument:
Error in if (my_fields[1] != my_record_id) { :
missing value where TRUE/FALSE needed
This bug was actually introduced in 99461c6 but we didn't discover it because our test REDCaps never have more than 1 repeat instrument. @rsh52 added an additional repeat instrument to test exporting of survey fields which revealed the underlying bug.
uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
read_redcap_tidy(uri, token, forms = "repeated")
Starting in 99461c6 we disassociate the Record ID field in db_metadata from the first form so that it can be easily included in the data/metadata for all forms:
However, we subsequently rely on the Record ID being associated with a form in db_metadata to figure out if we need to insert any additional forms into our call to redcap_read_oneshot():
This inconsistency results in extra forms being returned by redcap_read_oneshot(), which generates the downstream error.
Currently we edit db_metadata throughout read_redcap_tidy(), which causes some tricky order dependencies on the operations we perform. Some of this can be resolved if we store the results of redcap_metadata_read before editing them so downstream steps can refer to the unedited metadata when needed.
Pointing this step at the unedited metadata should resolve this bug:
Tests of read_redcap_tidy() and the bulk of our vignette currently don't run on CRAN to avoid making live API calls during testing and exposing credentials. httptest has the ability to create mock API responses that can be fed back to read_redcap_tidy() during testing/vignette rendering to simulate real API responses.
This needs to be done in such a way that REDCap tokens (and ideally the CHOP REDCap uri) are not stored publicly or needed to run tests/vignettes.
- /tests to create mocks for everything in test-read_redcap_tidy.R and store as test fixtures
For large databases, it is not desirable to extract everything into a REDCapTidieR output (the current default behavior) from REDCapR::redcap_read_oneshot. Users should have the option to specify forms for extraction to minimize load times and memory use.
There should be an option in read_redcap_tidy where users specify forms. This should trickle down to redcap_read_oneshot for the extraction.
REDCapTidieR should have a function to return project-wide metadata, arranged in a tidy tibble.
Similar to read_redcap_tidy() output, we could look into making a read_redcap_metadata_tidy() function that returns a tibble. This tibble will probably have one row and include the following columns:
- project_type: longitudinal or classic
- arms: a tibble of arms and arm names, generated by REDCapR::redcap_arms. Skip if project doesn't have arms
- events: a tibble of event/instrument mappings, generated by REDCapR::redcap_event_instruments
- users: a tibble of users and project-wide permissions, generated by REDCapR::redcap_users_export (from $data object)
- user_form_permissions: a tibble of users and instrument-specific permissions, generated by REDCapR::redcap_users_export (from $data_user_form object)
- dags: a tibble of DAGs, generated by REDCapR::redcap_dag_read
- version: the version of the REDCap instance, generated by REDCapR::redcap_version
Note: dags, users and user_form_permissions may contain sensitive information. I would suggest adding a toggle pull_user_info as a function argument, setting the default to FALSE.
More to be added as determined.
Metadata could be returned by the read_redcap_tidy function. However, this may be awkward for metadata that is not instrument specific.
Checkbox data types should be accurately displayed in the tibble outputs. For an example variable with 3 possible selection options, the following output can be expected:
variable___1
variable___2
variable___3
REDCapTidieR doesn't accurately capture checkbox data types.
This is likely because the field_name in the metadata is assigned the REDCap variable name (ex: infseq_confirmation) while the named output has a number attached to each possible sub-variable (ex: infseq_confirmation___1).
Check the output for a missing infseq_confirmation in the infusion_sequence datatable for Prodigy, or in any other checkbox variable like Cohort selection.
Currently REDCapTidieR fails with an uninformative message when a bad API token is given:
Error in `filter()`:
! Problem while computing `..1 = .data$field_type != "descriptive"`.
Caused by error in `.data$field_type`:
! Column `field_type` not found in `.data`.
Run `rlang::last_error()` to see where the error occurred.
Instead we should return an informative message.
We currently pipe the response from redcap_metadata_read() into filter():
Lines 76 to 81 in 51ced1e
Instead we can use the information returned by redcap_metadata_read() to issue an error message before passing the data on. See success, status_codes, and outcome_messages returned by redcap_metadata_read().
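A minimal sketch of that check, in base R. The helper name check_metadata_response() is hypothetical, and the mock responses only imitate the shape of a REDCapR result: the element names used here (success, status_code, outcome_message, data) are assumptions that should be checked against the REDCapR documentation before use.

```r
# Hypothetical helper: stop with a readable message when the API call failed,
# otherwise hand back the metadata for further processing.
check_metadata_response <- function(resp) {
  if (!isTRUE(resp$success)) {
    stop(
      "The REDCap API request failed (status code ", resp$status_code, "): ",
      resp$outcome_message,
      call. = FALSE
    )
  }
  resp$data
}

# Mock responses standing in for the output of redcap_metadata_read()
# (element names are assumed, not taken from the package source)
good_resp <- list(
  success = TRUE,
  status_code = 200L,
  outcome_message = "The data dictionary was read.",
  data = data.frame(
    field_name = "record_id",
    field_type = "text",
    stringsAsFactors = FALSE
  )
)
bad_resp <- list(
  success = FALSE,
  status_code = 403L,
  outcome_message = "The API token was rejected.",
  data = data.frame()
)

metadata <- check_metadata_response(good_resp)

# Capture the error message instead of halting the script
err <- tryCatch(
  check_metadata_response(bad_resp),
  error = function(e) conditionMessage(e)
)
```

With a check like this in place, the filter() step only ever sees a data frame that actually contains field_type.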
The extract_table/s functions should return the expected tables from a read_redcap_tidy call.
NULL values are returned instead of tibbles when using extract functions with table specs.
This was not picked up by our test suite because non-API tests require manual updates of the .RDS files they reference. We may want to think of a better way to update internal/non-API tests, or ensure that updating the files is part of our checklist (the current file for doing so is under /inst/create_test_data.R).
Take a current read_redcap_tidy output and supply a specification to extract_tables.
The culprit here is how extract_tables shaves off supertibble columns that aren't necessary:
https://github.com/CHOP-CGTDataOps/REDCapTidieR/blob/3944c913990bcd8f9f314a591357c20e01d2adc4/R/extract_table.R#L98-L106
In brief testing I believe all we have to do here is change select(-"structure") to select("redcap_form_name", "redcap_data"). I think when I originally wrote this it was without the idea that we'd be adding onto the supertibble, but now it makes sense to specifically call out only the columns we need, so that new additions don't break the function.
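The difference between dropping one column and naming the columns to keep can be sketched as follows. This uses base R indexing rather than dplyr::select() so that it runs without extra packages, and the mock supertibble and its data_rows column are hypothetical stand-ins for whatever gets added to the real one.

```r
# Mock supertibble: one row per instrument, with a list-column of data
supertbl <- data.frame(
  redcap_form_name = c("demographics", "labs"),
  structure = c("nonrepeating", "repeating"),
  stringsAsFactors = FALSE
)
supertbl$redcap_data <- list(
  data.frame(record_id = 1:2),
  data.frame(record_id = 1:3)
)

# A column added to the supertibble later on (hypothetical)
supertbl$data_rows <- c(2L, 3L)

# Dropping by name lets every later addition leak through
dropped <- supertbl[, setdiff(names(supertbl), "structure")]

# Keeping by name is unaffected by new columns
kept <- supertbl[, c("redcap_form_name", "redcap_data")]

names(dropped)
names(kept)
```

Code downstream that assumes exactly two columns keeps working with the second form, however many columns the supertibble gains.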
Would welcome thoughts on how to avoid silent test failures like this in the future.
REDCapTidieR should support the inclusion of Data Access Groups (DAGs). This is available in REDCapR's export_data_access_groups argument.
Data Access Groups are used when multiple collaborators share a project, and make it possible to change access rights. The redcap_data_access_group variable comes out in the database export as its own column, but does not come out in the metadata (similar to redcap_repeat_instrument/instance).
This could be a good topic for an R/Medicine hackathon.
It will be necessary to determine what happens to a user with limited DAG access rights.
Update 2023-03-28:
redcap_dags argument (by default set to FALSE). Likely just needs to be incorporated with the REDCapR::redcap_read_oneshot argument for export_data_access_groups.
When a user does not have the appropriate privileges for specific instruments, it can cause an error. This happened recently when new instruments were created in a database but the user's API rights were not updated to grant access, causing a disparity between the instruments expected from the metadata and the data exported from the database.
This should throw a warning when expected instruments are not found in the database export.
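One way such a check could look, as a base R sketch. The helper warn_missing_forms() and the mock metadata and data are hypothetical, and the sketch assumes the metadata carries field_name and form_name columns; the warning text mirrors the wording used elsewhere in these issues.

```r
# Hypothetical helper: warn about every instrument listed in the metadata
# that has none of its fields present in the exported data.
warn_missing_forms <- function(metadata, data) {
  fields_by_form <- split(metadata$field_name, metadata$form_name)
  missing_forms <- character(0)
  for (form in names(fields_by_form)) {
    if (!any(fields_by_form[[form]] %in% names(data))) {
      warning(
        "Form name {", form, "} detected in metadata, ",
        "but not found in the database export.",
        call. = FALSE
      )
      missing_forms <- c(missing_forms, form)
    }
  }
  invisible(missing_forms)
}

# Mock metadata with two instruments; the export only contains one of them
metadata <- data.frame(
  field_name = c("record_id", "age", "lab_value"),
  form_name = c("demographics", "demographics", "labs"),
  stringsAsFactors = FALSE
)
data <- data.frame(record_id = 1:2, age = c(30, 41))

# Warnings are suppressed here only so the return value can be inspected
missing_forms <- suppressWarnings(warn_missing_forms(metadata, data))
```

A check this simple compares names literally, so field types whose exported column names differ from the metadata name need to be handled before it runs.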
Before we publicize this package we should make sure that things look as expected when we use it on one or two actual complex REDCap databases. The two I am thinking of are the draft Prodigy huCART19 database and the BMT Outcomes database.
Let's make sure that the import of those two databases works as expected:
NA
NA
in data fields (not sure if this is what we expect)?
Since most of these steps are really a QA activity, if you like, you can save the data as CSV and give it to Amir to review and report back.
It may also be worth writing automated tests for some/most of these, just to nail down expected behavior.
REDCapR
REDCapTidieR should function as expected when tested against the same databases that REDCapR uses:
https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/example.credentials
read_redcap_tidy() should return all requested fields.
read_redcap_tidy() drops checkbox-type fields and warns that there are fields in the metadata that aren't in the result.
devtools::load_all()
redcap_uri <- Sys.getenv("REDCAP_URI")
classic_token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
res <- read_redcap_tidy(redcap_uri, classic_token) |>
extract_table("data_field_types")
#> Warning messages:
#> 1: Form name {data_field_types} detected in metadata, but not found in the database export.
#> This can happen when the user privileges are not set to allow exporting that form via the API.
#> The following variables are affected: checkbox_multiple___1, checkbox_multiple___2, checkbox_multiple___3,
#> ...
"checkbox_multiple___1" %in% names(res)
#> [1] FALSE
This is a result of my bug fix in #51. That fix limits the results of redcap_read_oneshot() based on what's in the metadata. I missed that, at the point where we do the check, the metadata contains only one record for a checkbox field (ex. checkbox_multiple) but the data contains one field per checkbox option (ex. checkbox_multiple___1, checkbox_multiple___2, ...).
Our tests missed this because:
Update get_output_fields() to ensure that checkbox fields are kept.
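A rough sketch of the expansion this implies, in base R. The helper expand_checkbox_fields() is hypothetical and is not the package's get_output_fields(); it assumes the metadata has field_name, field_type, and select_choices_or_calculations columns, with choices written as "code, label" pairs separated by "|", and it does not attempt to cover unusual option codes.

```r
# Hypothetical helper: turn metadata field names into the column names
# expected in the data export, expanding each checkbox field into one
# name per option.
expand_checkbox_fields <- function(metadata) {
  out <- lapply(seq_len(nrow(metadata)), function(i) {
    name <- metadata$field_name[i]
    if (metadata$field_type[i] != "checkbox") {
      return(name)
    }
    # Choices look like "1, Red | 2, Green | 3, Blue"
    choices <- strsplit(
      metadata$select_choices_or_calculations[i], "|", fixed = TRUE
    )[[1]]
    # Keep only the code that precedes the first comma
    codes <- trimws(sub(",.*$", "", choices))
    paste0(name, "___", tolower(codes))
  })
  unlist(out)
}

# Mock metadata with one text field and one checkbox field
metadata <- data.frame(
  field_name = c("record_id", "checkbox_multiple"),
  field_type = c("text", "checkbox"),
  select_choices_or_calculations = c(NA, "1, Red | 2, Green | 3, Blue"),
  stringsAsFactors = FALSE
)

expected_cols <- expand_checkbox_fields(metadata)
```

Comparing the export against expected_cols rather than against the raw field_name values keeps columns such as checkbox_multiple___1 from being dropped.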