Code Monkey home page Code Monkey logo

metacore's People

Contributors

davisvaughan avatar elimillera avatar esimms999-gsk avatar kaz462 avatar mayagans avatar mstackhouse avatar seniort avatar statasaurus avatar tamarasenior avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

metacore's Issues

yn_to_tf: additional fix needed

Left is "main" branch, right is "dev" branch. The change was made to address Issue #47 .

Screenshot 2022-09-16 104705

There needs to be an additional, small change made to the regex:

if(all(is.na(x) | str_detect(x, regex("^y$|^n$|yes$|no$", ignore_case = T)))){
case_when(str_detect(x, regex("^y$|yes$", ignore_case = T)) ~ TRUE,
str_detect(x, regex("^n$|no$", ignore_case = T)) ~ FALSE,

"ABCy" would be set to NA. But "ABCyes" would be set to TRUE. Same with "ABCn" vs "ABCno" being set to NA vs FALSE.

Separate Warnings?

At the moment multiple issues are put into a single warning message. For example

Warning messages:
1: order from the ds_vars table only contain missing values.

keep from the ds_vars table only contain missing values.

supp_flag from the ds_vars table only contain missing values.

sig_dig from the value_spec table only contain missing values.

dataset from the supp table only contain missing values.

variable from the supp table only contain missing values.

idvar from the supp table only contain missing values.

qeval from the supp table only contain missing values.

This makes it hard to filter out warnings which are not of interest. For example, we have implemented a function to create a metacore object from the Roche standard specifications. As the Roche specs do not include significant digits, we would like to suppress the corresponding warning.

Could metacore create separate warnings for each detected issue?

Then it would be easy to suppress warnings which do not apply by calling admiraldev::suppress_warning(). For example:

  suppress_warning(
    metacore(
      ds_spec = ds_spec,
      ds_vars = ds_vars,
      var_spec = var_spec,
      value_spec = value_spec,
      derivations = derivations,
      codelist = codelist
    ),
    regexpr = "sig_dig from the value_spec table only contain missing values."
  )

Issue with our Roche DAP reader on latest dev version

@statasaurus we had success using 0.0.4 version of metacore on CRAN, but when we tried our xls spec reader on latest 0.0.6 dev version we got:

Caused by error in .data$predecessor:
! Column predecessor not found in .data.

I thought the latest updates were only focused on define.xml compatibility - did something else change around a "predecessor" field?? We'd need to account for this in our spec reader function in admiralroche.

Importing from Excel specs (P21 formats) and define-xml

Hi, I have been testing/started considering the metacore/metatools packages for ADaM work and downloaded the latest versions from github this week, following the readme instructions. I would like to report potential issues related to building metacore from Excel specs (following the P21 Excel file formats) and related to importing from define-xml. I have regrouped in one post as these are all potential issues related to importing functionality.

  1. spec_to_metacore (using the old P21 Excel file format - with a WhereClauses sheet): issue with importing composite where clauses (i.e. conjunctive clauses) like: PARAMCD EQ "CUMPLDOS" and AVISIT EQ "POST TREATMENT FOLLOW-UP 2". Such a composite Where clause is split into 2 records in the WhereClauses sheet, bearing the same ID and holding each a base clause. After importing, 2 records are created in VALUE_SPEC, one for each base clause. I would expect a single record instead in VALUE_SPEC with the composite clause rebuilt. Likely the issue is in spec_type_to_value_spec(... where_sep_sheet = TRUE... ).

  2. define_to_metacore : similar issues with importing composite where clauses. In define-xml, such where clauses are also split into basic clauses. While importing from define, one record is created in VALUE_DESC but only the first part of the composite where clause is imported, the rest is ignored. So, the issue is related but slightly different than 1).

  3. spec_to_metacore (using the new P21 Excel file format - without the WhereClauses sheet and the where clauses directly in ValueLevel): fails to import and throws errors. However, import works following the stepwise approach with the individual spec_type_to_ functions. I assume spec_to_metacore is tweaked towards the old P21 Excel file format. However, considering that the new P21 Excel file format is now used in recent releases of the P21 Enterprise and Community tools, it would be nice that the function can import directly the newer format as well. Note: here the composite where clauses import well in VALUE_SPEC - which may be expected as they are already on a single line in the input file.

  4. spec_to_metacore: In the P21 Excel file format, for variables/VLM with Origin = "Predecessor", the details are stored in the Predecessor column, for variables with Origin = "Assigned", the details are stored in the Comment column. The import function only handles Origin = "Derived" and reads the details from the Method column into DERIVATIONS. The predecessor and assignment details are lost during import. Looking at the metacore_example("pilot_ADaM.rda"), all the predecessor and assignment details are stored in the DERIVATIONS table. Technically as a VALUE_SPEC record can have only a single Origin, I understand that the origin details may always be stored in the DERIVITIONS table whatever the Origin - as shown in pilot_ADaM.rda. But then it would make sense for spec_to_metacore (or spec_type_to_value_spec) to be able to map the Predecessor column or Comment column into the DERIVATIONS table as well, according to the value of Origin.

  5. define_to_metacore: faces a similar issue to 4) when Origin="Predecessor" or "Assigned", the detailed origin description is also lost and not stored in the DERIVATIONS table.

dplyr 1.0.0 required?

It seems that metacore requires dplyr 1.0.0. I tried to use it with dplyr 0.8.5 and it failed. After updating dplyr to 1.0.9 it worked.

spec_type_to_value_spec: crash without meaningful message and graceful exit

When I call spec_type_to_value_spec in this manner:

  value_spec <- spec_type_to_value_spec(
    doc,
    cols = c(
      dataset = "DATASET",
      variable = "VARIABLE",
      origin = "ORIGIN",
      type = "VLM_TYPE",
      code_id = "CONTROL_TERM_NAME",
      where = "VALUE",
      derivation_id = "COMP_METHOD_NAME",
      sig_dig = "SIG_DIGITS"
    ),
    where_sep_sheet = FALSE,
    sheet = "VLM"
  )

... I receive the following:

Error in `mutate()`:                                                                                                   
! Problem while computing `derivation_id = case_when(...)`.
Caused by error in `.data$predecessor`:
! Column `predecessor` not found in `.data`.
Run `rlang::last_error()` to see where the error occurred.

I was told that this is due to my not having the predecessor parameter in my call. If so, the spec_type_to_value_spec function should be checking that this parameter is present and, if not, inform the user with a meaningful message and exit gracefully.

Error without explicit message: spec_type_to_derivations ()

Error without explanation. I was not able to create derivation.
derivation <- spec_type_to_derivations(
doc,
cols = c(derivation_id = "ID", derivation = "[D|d]efinition|[D|d]escription"),
sheet = "Method|Derivations?",
var_cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "[N|n]ame|[V|v]ariables?",
origin = "[O|o]rigin", predecessor = "[P|p]redecessor", comment = "[C|c]omment")
)

image

Mandatory

image

In metacore documentation, “keep: Logical value about if the variable needs to be kept “. In Define-XML Version 2.0 Completion Guidelines page 12, mandatory corresponds to:

image

Keep and mandatory does not seem compatible, what should be done?

Demo data return functions

@statasaurus can we put in a function that returns that Pilot AdaM metacore object? We have metacore_example() which gives file paths. But what about like a load_metacore_adam_example() or something much less verbose that returns the demo object?

I was looking for this when I was unit testing on metatools. It would even be a nice touch to have the individual tables as dataframes available so you can pull them into memory quick and create a metacore object.

yn_to_tf returning NA instead of T/F for Y/N

image

Test at "A" in screenshot is checking for 'Yes' or 'yes' or 'No' or 'no' or 'Y' or 'y' or 'N' or 'n', so a 'Y' or 'N' would pass the test. But then at "B" in screenshot, the values of only 'Yes' or 'yes' are assigned TRUE and the values of only 'No' or 'no' are assigned FALSE. 'Y', 'y', 'N', 'n' are assigned to NA.

I discovered this because spec_type_to_ds_vars() calls yn_to_tf() for the keep variable.

Also, a value of 'YES' or 'NO' is unacceptable and triggers the Warning message in the "else" block of the yn_to_tf function.

`spec_type_to_var_spec`: consider adding `format` when remove duplicates?

Current spec_type_to_var_spec combines multiple rows into one when variable, length, label, type are the same, but formats are different, e.g., below two rows from Variables sheet are considered as one row, so one of the formats is dropped:

Order Dataset Variable Label Data Type Length Significant Digits Format
28 ADPFT PARAMCD Parameter Code Text 8   $PFT
22 ADAETTE PARAMCD Parameter Code Text 8   $AETTE

var_spec:

variable length label type format common
PARAMCD 8 Parameter Code Text $PFT FALSE

Should format be considered when removing duplicates in spec_type_to_var_spec ?

Error without explicit message: spec_type_to_value_spec()

Error without explanation. I was not able to create value_spec:

value_spec <- spec_type_to_value_spec (
doc,
cols = c(dataset = "[D|d]ataset|[D|d]omain", variable = "Variable",
origin = "[O|o]rigin", type = "[T|t]ype", code_id = "[C|c]odelist|Controlled Term",
sig_dig = "[S|s]ignificant", where = "[W|w]here", derivation_id = "[M|m]ethod",
predecessor = "[P|p]redecessor"),
sheet = NULL,
where_sep_sheet = TRUE,
where_cols = c(id = "ID", where = c("Variable", "Comparator", "Value")),
var_sheet = "Variables"
)

image

Failure with dev tidyselect

With dev tidyselect there are a bunch of failures like:

Error (test-metacore.R:68:4): metacore wrapper function works
<dplyr:::mutate_error/rlang_error/error/condition>
Error in `mutate(., key_seq = str_split(key_seq, ",\\s"), key_seq = map(key_seq, 
    function(x) {
        tibble(variable = x) %>% mutate(key_seq = row_number())
    }))`: Problem while computing `key_seq = str_split(key_seq, ",\\s")`.
Caused by error in `vctrs::vec_size_common()`:
! object 'key_seq' not found

This comes down to a fairly subtle problem create_tbl() used a named argument to tidyselect::matches() which used to work by conicidence, but not longer does.

esimms_README

Going through the README.md, I noticed some things which may need to be fixed:

General: the diagrams have the df and field names in upper case; all should be lower case?

ds_spec:

  • "Structure: Value structure ..." -> "structure: Value structure ..."
  • "Label: Dataset label" -> "label: Dataset label"

ds_vars: Only 5 out of 7 fields are shown in the diagram; order and supp_flag are not shown.

var_spec:
" ... contains the information the purely variable level information" -> "... contains the purely variable-level information"
Only 5 out of 6 fields are shown in the diagram; format is not shown.

derivation:

  • The name of the df should be "derivations"
  • The name of the field in the derivation df is "derivation". The text in the diagram for the field name needs to change to "derivation".

codelist:

  • In the diagram, "NAMES" should be "name" and "CODES" should be "code".
  • "To see a metacore object in about please see our vignettes" -> "To see more information about the metacore object, please see the vignettes." ?

supp: "To see a metacore object in about please see our vignettes" -> "To see more information about the metacore object, please see the vignettes." ?

Remove {gt} from Vignette usage

The first release of {gt} was 2020-03-31. Our github workflow is set up right now to test on a R 3.6 snapshot from 2020-02-29 and a R 3.5 snapshot from earlier. This causes a build error because {gt} doesn't exist on the MRAN snapshot from that time period.

This is a simple fix as we just need to remove the use of {gt} and replace with another HTML table package.

Documentation: core in Readme

image

In ADaM IG, core can be Required, Conditionally required or Permissible (ADAM IG 1.3 pg 14)
image

SDTM IG 3.4 page 22
image

It is not clear what core values are allowed. Furthermore, in CDISC library, only Req, Cond Perm is used for ADaM and SDTM Req, Exp and Perm. Are these values also acceptable for Core?

esimms_build_specification_readers

/vignettes/Building_Specification_readers.Rmd:

  • "By default metacore can read in specification that ..." -> "By default metacore can read in specifications that ..."
  • "Therefore we will have to build bespoke ..." -> "Therefore we will have to build a bespoke ..."
  • "ds_sep: Contains dataset ..." -> "ds_spec: Contains dataset ..."
  • "The ds_vars table has 5 columns:" -> "The ds_vars table has 7 columns:"
  • "Now we know what we need we can ..." -> "Now that we know what we need, we can ..."
  • "... the where information on different ..." -> "... the where information on a different ..."
  • code block shows warning message with typo; this is due to line 129 of /R/validators.R (see below)

Line 129 of /R/validators.R:

  • "Warning: The following variables hace code ids not found in the ..." -> "Warning: The following variables have code ids not found in the ..."

Any thoughts on how to expand the metacore data model to other dimensions

The idea of centralizing metadata into a relational data structure is a great one, although right now the current model only accounts for a "slice" of possible metadata storage within an R session. Think of one metacore object per dataset type (e.g. SDTM x ADAM), Task, Study, etc. One could of course just loop over the different dimensions and store individual metacore objects in a list, but some of this information could also be natively incorporated into the model. Are you currently discussing on how to expand the structure to account for these other dimensions or is this not part of the scope of the project?

Make package compatible with {dplyr} < 1.0.0

I just tried to install {metacore} into our validated R environment at Roche which still uses {dplyr} 0.8.5. Unfortunately, the installation failed.

Error: object ‘c_across’ is not exported by 'namespace:dplyr'
Execution halted
ERROR: lazy loading failed for package ‘metacore’
* removing ‘/home/bceuser/neitmant/admiral.roche/renv/staging/1/metacore’

Could you adjust your code such that it works with older dplyr versions?

I'd be happy to edit the matrix config of your R CMD check workflow such that it runs also on earlier R versions with older packages. I did that for {admiral} and it proved to be very valuable for ensuring that the package works on our kind of old production environment.

metacore pulled from CRAN?

Just noticed that:

http://cran.r-project.org/package=metacore

Currently yields:

Package ‘metacore’ was removed from the CRAN repository.

Formerly available versions can be obtained from the [archive](https://cran.r-project.org/src/contrib/Archive/metacore).

Archived on 2023-03-01 as issues were not corrected despite reminders.

A summary of the most recent check results can be obtained from the [check results archive](https://cran-archive.r-project.org/web/checks/2023/2023-03-01_check_results_metacore.html).

Please use the canonical form https://cran.r-project.org/package=metacore to link to this page.

esimms_minor_typos

I am unsure where this is coming from (title statement in the function files?) but the xml_to_codelist and xml_to_value_spec descriptions should have "XML" in upper case:

xml_caps

Export `create_tbl()`

@statasaurus Thanks again for helping with building the Roche spec reader. After making some minor adjustments the reader now works properly and I incorporated it within an internal package. However, as the reader function makes use of the un-exported create_tbl() function I get a Note when running R CMD check which is not acceptable according to our package validation guidelines. Thus, please export this function. Thank you!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.