
r-cometsanalytics's People

Contributors

arsbiostatistica, ellatemprosa, ewymathe, kailingchen, mathelab, park-brian, wheelerb, wobenshain


r-cometsanalytics's Issues

COMETS 1.3. Minor tweaks to warning messages

For batch mode, model age.1, let's replace the existing warning message with "We removed one or more dummy variables that were redundant (i.e. perfectly correlated with another variable)."

Also, is it possible to specify which variable was removed?
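A hedged sketch (not the package's actual code) of how the warning could also name the redundant column(s): take the QR decomposition of the design matrix, keep only the pivoted columns up to its rank, and report the rest. The function name and the design argument are placeholders.

flag_redundant_dummies <- function(design) {
  # design: numeric model matrix of dummy/continuous covariates
  qr_fit  <- qr(design)
  keep    <- qr_fit$pivot[seq_len(qr_fit$rank)]
  dropped <- setdiff(colnames(design), colnames(design)[keep])
  if (length(dropped) > 0) {
    warning("We removed one or more dummy variables that were redundant ",
            "(i.e. perfectly correlated with another variable): ",
            paste(dropped, collapse = ", "))
  }
  design[, keep, drop = FALSE]
}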

COMETS 1.4. heatmap (showHClust) issue

When running a model that has several XXX, the heatmap function fails because there are duplicate rownames:

> excorrdata  <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")
NULL
NULL
[1] "running unadjusted"
> COMETS::showHClust(excorrdata)
Error: Duplicate identifiers for rows (532, 611), (1143, 1222)

This is a new error, and I'm not sure why it's cropping up now, since the vignette hasn't been changed in a while.
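A hedged workaround sketch rather than the package fix: the error arises when the long results are reshaped to wide with the same outcome/exposure pair appearing more than once, so dropping duplicate pairs before plotting lets the heatmap render. The column names assumed here (outcome, exposures) follow the getcorr output described elsewhere in this tracker.

excorrdata_dedup <- dplyr::distinct(excorrdata, outcome, exposures, .keep_all = TRUE)
COMETS::showHClust(excorrdata_dedup)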

COMETS 1.3. Handle metabolites where variance=0

It will sometimes happen that a metabolite has no variance, i.e. has the same value for every single participant. When this occurs, there should be no analysis/results for this metabolite, but analysis/results for other metabolites should carry forward as normal. Currently, however, the analysis crashes when it runs into any metabolite with variance=0.

We need a better method for handling metabolites where variance=0.
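A minimal sketch of one possible approach, assuming the metabolite abundances sit in a data frame named metab with one column per metabolite (not the package's internal layout): flag zero-variance metabolites up front, warn, and drop them so the remaining analyses run as normal.

zero_var <- vapply(metab, function(x) isTRUE(var(x, na.rm = TRUE) == 0), logical(1))
if (any(zero_var)) {
  warning("Skipping metabolites with zero variance: ",
          paste(names(metab)[zero_var], collapse = ", "))
  metab <- metab[, !zero_var, drop = FALSE]
}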

Create S3 class for getcorr output

We will define the object to have the following slots:

  1. outcome
  2. exposures
  3. adjusted
  4. corr
  5. corrmethod (spearman/pearson/mixed)
  6. pvalue
  7. n
  8. super_pathway
  9. biochemical
  10. harmflag
  11. hmdb
  12. mz
  13. rt
  14. uid_01
  15. multrow
  16. uidsource
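A hedged sketch of what an S3 constructor could look like; the class name "getcorr" and the choice to store corrmethod as an attribute (with the remaining slots as columns of the results data frame) are assumptions, not the implemented design.

new_getcorr <- function(corrdf, corrmethod = c("spearman", "pearson", "mixed")) {
  corrmethod <- match.arg(corrmethod)
  structure(corrdf,
            corrmethod = corrmethod,
            class = c("getcorr", class(corrdf)))
}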

COMETS 1.3. "Harmonization" file

For our rollout, we have proposed a two-step process for the cohorts:

  1. Prepare data file, test integrity, download "harmonization" file, and run one simple analysis (age.2). Send harmonization file and results file to IMS so that they can begin harmonization.

  2. IMS sends back a "Metabolites" tab that is identical to the original, except with an additional UID_01 column. With this new tab, the cohort then goes back to COMETS-Analytics and runs "All models". These models are now "pre-harmonized".

To accommodate this process change, I have two minor edits to the harmonization file, per discussion with Nathan Appel and David Ruggieri of IMS.

  1. The variable that is currently called "UID_01" should be renamed to make room for the IMS UID_01 variable. My suggested rename is "UID_01.comets_analytics", which reflects the fact that this UID_01 is based on the COMETS-Analytics algorithm. Making room for both columns will also give us data to track our algorithm's performance over time (% match between the algorithm and the IMS final UID).

  2. The harmonization file changes the case (lower case vs. upper case) of the metabid variable as compared with the original input. To ensure that IMS can fully replicate the original harmonization file, we should provide the metabid values in their original case. Remember that if the cohort is not initially proceeding with the "All models" analysis, then IMS only has the "Harmonization" file to work with and not the "Input file".
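An illustrative sketch of both edits, assuming the harmonization table is a data frame named harm and the original "Metabolites" tab is input_metab (both names are placeholders): rename our UID column, and restore metabid to the case supplied in the original input.

names(harm)[names(harm) == "UID_01"] <- "UID_01.comets_analytics"
orig_ids <- input_metab$metabid
harm$metabid <- orig_ids[match(tolower(harm$metabid), tolower(orig_ids))]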

COMETS 1.5. Handling of missing data

Several investigators have been trying to input missing data, which causes errors. All metabolites and subjectdata variables should be tested for this.

Possible fixes include:
a) An improved tutorial
b) Tests in the Data Input function
c) Tests in the "Read datafile", or "Integrity check", or "getModel" functions.

Because this check could add some processing time, we could make it optional, i.e. add a checkbox to test the variables, with the default being no test.
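A minimal sketch of such a check, assuming the subject data and metabolite abundances are data frames named subjdata and metab (placeholders): report every variable that contains missing values so the user can correct the input before modeling.

check_missing <- function(df, label) {
  n_missing <- colSums(is.na(df))
  bad <- n_missing[n_missing > 0]
  if (length(bad) > 0) {
    warning(label, " variables with missing values: ",
            paste0(names(bad), " (", bad, ")", collapse = ", "))
  }
  invisible(bad)
}
check_missing(subjdata, "Subject data")
check_missing(metab, "Metabolite")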

COMETS 1.3. Super-batch and Addition of metabolite meta-data table to zip file

The results files are outputting correctly from COMETS-Analytics. However, to link the results from one cohort to another's, we need to pull in their metabolite meta-data table in its entirety, most likely as a separate table in the zip file. In addition, if the results were auto-harmonized, we should also pull in the UID column and possibly one or two other columns.

Without meta-data or the UID, we cannot harmonize the metabolites on the back end. This is a high priority fix.
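A rough sketch (not the app's export code) of how the meta-data table could travel with the results: write it as its own CSV and include it when the zip archive is built. The object and file names are placeholders.

write.csv(metab_meta, "metabolite_metadata.csv", row.names = FALSE)
utils::zip("comets_results.zip",
           files = c("correlation_results.csv", "metabolite_metadata.csv"))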

Comets 1.3. Display in interactive mode needs modification

In the latest update, the display of results onscreen is using the wrong column of metabolite names in interactive mode. The display is correct, however, in batch mode. Screenshots of each are below.

Interactive mode:

[screenshot]

Batch mode (model age.1):

[screenshot]

COMETS 1.5 Rapid UID upload

After harmonization of study data, in our updated process, we will need to update COMETS with a new UID file to account for newly added metabolites. Is there a way to automate this so that, when the file changes, we can upload it to a specific location and have COMETS use it without any manual input processing?

Does the file we send need to be modified to make something like this work?

COMETS 1.3. Infinite loop

When running the following model, the app enters an infinite loop:

Exposure: Age
Outcome: All metabolites
Adjusted covariates: race_grp

I have noted that this model can work when running individual metabolites, or when running all models that do not adjust for race. This suggests that the problem arises when metabolites with only one or a few distinct values are combined with an adjustment variable in which some categories have only one or a few observations.

Thus, this could be a model singularity issue, like that described in issue #32.
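An illustrative guard, not the implemented fix: before fitting a model for a given metabolite, check whether the adjustment covariate still has at least two observed levels among the complete cases, and skip with a warning rather than loop. The objects dat and metab_name are placeholders.

keep <- complete.cases(dat[, c("age", metab_name, "race_grp")])
if (length(unique(dat$race_grp[keep])) < 2) {
  warning("race_grp has only one observed level for ", metab_name,
          "; skipping the adjusted model for this metabolite.")
}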

COMETS 1.3. New test dataset jams Amazon queue

I have been working with a new test dataset from our collaborators at the American Cancer Society. At least one of their models has been jamming the queue, for reasons unknown. I can confirm that the first two models are fine, and that the problem is not solely due to the "all metabolites*all metabolites" analysis.

More testing to be done once the queue is unjammed.

Scrambled women CPSII data.xlsx

COMETS 1.3

The filter function is not working correctly with the CPS scrambled file. The expected counts are:

table(exmetabdata$subjdata$prev_heart_dx)
0 1 2
454 84 18

but when you filter using the where statement:
[screenshot]

This should be 454,

and here it should be 18:
[screenshot]
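An illustrative check of the expected counts, applying the filter directly to the subject data with dplyr rather than through the app's where-clause handling:

nrow(dplyr::filter(exmetabdata$subjdata, prev_heart_dx == 0))  # expect 454
nrow(dplyr::filter(exmetabdata$subjdata, prev_heart_dx == 2))  # expect 18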

COMETS 1.5. Accommodate new field for data harmonization

For COMETS 1.4, I would like to focus on three things: 1) Harmonization; 2) Error handling; and 3) Queue management/troubleshooting. This issue applies to the first of these.

Currently, we are doing all the harmonization on the back end at IMS. For each cohort, they start with our attempt to auto-harmonize but then revise and edit substantially, until all entries are logically consistent. Nathan pointed out that, once this has been done for each study, the most sensible approach is to send our UID back to the cohort as a column to add to their data file, so that the file is permanently harmonized from then on. Ella, Ewy and I should meet with Nathan to discuss, but on a preliminary basis, I agree.

If we go this route, we will need to accommodate a new column for each data file in our harmonization algorithm. It may also change the (non-software) workflow for each cohort: for example, we have each study run the Integrity Check and one or two tables, which they send to IMS for pre-harmonization. Then, we feed back the harmonized metabolite UID, the cohort analyst adds it to their file and runs one or two tables again. Then, if IMS is able to harmonize these easily, the cohort runs the whole analysis.

Let's discuss once 1.3 is complete.

COMETS 1.3 Permit adjustment for categorical variables

Currently, categorical variables are not properly adjusted for: they are entered into the model as continuous variables. Models should distinguish between categorical and continuous variables using a new column that will be added to the Varmap tab.

This change will also require a change to the Sample file (to be logged separately) and to the "Create Input" utility (it needs to add this column to the input file that it creates).
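A hedged sketch of the modeling side of this change, assuming the Varmap tab has been read into a data frame named varmap and the new column is called VARTYPE with values "categorical"/"continuous" (the column name is an assumption): convert categorical covariates to factors before fitting, so that lm()/glm() expand them into dummy variables instead of treating them as continuous.

cat_vars <- varmap$VARREFERENCE[varmap$VARTYPE == "categorical"]
for (v in intersect(cat_vars, names(modeldata$gdta))) {
  modeldata$gdta[[v]] <- factor(modeldata$gdta[[v]])
}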

Warning: bad alloc

I believe this issue occurs when more processes are initiated than the server is currently configured to handle. That limit is currently 3, but it could be increased if the cost is justifiable on the basis of our usage-tracking data.

bad alloc

Comets 1.3 - Models adjusted for race (when analyses are stratified by BMI) return errors

In the test file that I had prepared, the N for non-white/European persons was quite small. In fact, only 1 individual had a race_grp=2. This seems to be causing all kinds of problems in the adjusted/stratified analyses.

To test, run in interactive mode:

Exposure: Age
Outcome: Any individual metabolite
Adjusted covariates: race_grp
Strata by: BMI_grp

Two of the three values returned will be NA. Possibly this reflects a degrees-of-freedom issue?

Input file is below.

cometsInput_March_2018.xlsx

Issues to deal with under R-COMETS 0.8004

[ ] rename all _ in variable names to .
[ ] add lm and lmer code
[ ] summary statistics for covariates in modeldata$gdta
[ ] concatenate really long names, or take only the first if there is more than one, in the display
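For the first item, an illustrative one-liner that replaces underscores with dots in the variable names of a data frame named df:

names(df) <- gsub("_", ".", names(df), fixed = TRUE)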

COMETS 1.4. Warning: CSV input file does not exist

This appears to occur primarily when the extension of the data input file is capitalized (".XLSX"), which results in the software not being able to find the CSV file at the analysis stage.

This is a bug that should be fixed in COMETS 1.4.

[screenshot]
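A hedged sketch of one way to make the extension check case-insensitive (not necessarily how the app reads files); input_path is a placeholder.

ext <- tolower(tools::file_ext(input_path))  # "XLSX" and "xlsx" both become "xlsx"
if (ext == "xlsx") {
  dat <- readxl::read_excel(input_path)
} else if (ext == "csv") {
  dat <- read.csv(input_path)
} else {
  stop("Unsupported file extension: ", ext)
}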

COMETS 1.3. Problems with some adjustment-strata combos

Certain combinations of adjustments and stratification can still cause problems with models. One of the simplest scenarios uses the data below, with the model as follows:

Exposure: age
Outcome: glycine (can also use All Metabolites)
Adjusted: bmi_grp, alc_grp
Strata: smk_grp

Initially, I thought this could be due to a code reversion, but that was a false lead.

I then thought it could reflect metabolites with high numbers of values below the limit of detection (i.e. little meaningful variance), but I tested against glycine (for which this issue does not apply) and still had the same problem.

I am thus forced to conclude that the issue reflects something about the joint distribution of the adjusted and strata variables that we are not quite fully handling.
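An illustrative diagnostic for that hypothesis: cross-tabulate the strata variable against each adjustment covariate and look for empty or near-empty cells, which would make the adjusted model singular within a stratum.

with(modeldata$gdta, table(smk_grp, bmi_grp, useNA = "ifany"))
with(modeldata$gdta, table(smk_grp, alc_grp, useNA = "ifany"))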

Ella and Ewy, the data are attached. Let me know if you have any insights. I hope to test again toward the end of today.

Scrambled CPSII data.xlsx

[screenshot]

COMETS 1.4. UID file integration

Some weird UIDs to be checked:

  • PC-source_fragment
  • the +NH4 suffix

C14_0_CE_+NH4
C16_0_CE_+NH4
C16_1_CE_+NH4
C18_0_CE_+NH4
C18_0_MAG_+NH4
C18_1_CE_+NH4
C18_2_CE_+NH4
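An illustrative way to pull these out for manual review, assuming the UIDs are in a character vector named uids:

suspect <- uids[grepl("\\+NH4$", uids) | grepl("source_fragment", uids, fixed = TRUE)]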

COMETS 1.3 Harmonization not consistently working

Sometimes the metabolites do not harmonize, even when HMDB IDs are present. To my understanding, Ella was looking into this issue. This also occurred with the R. Kelly VDAART file, which I can supply, if needed.

COMETS 1.3: "Where" functionality not working

The "where" functionality no longer appears to be working. This issue needs to be fixed before I can complete testing on categorical adjustment, since the model we have been testing includes a "where" statement.

COMETS 1.3. Update sample and template file

A number of changes need to be made to the test/sample file, including:

  1. Adding a column for continuous/categorical variables
  2. Adding a new value to the models tab for the age_grp variable (age<20 years), so that analyses can be stratified for the youngest participants
  3. Adding BMI as a continuous variable
  4. Adding models for the BMI analysis to the models tab (per R. Kelly), and cleanly distinguishing these models so that cohorts can delete them if they elect not to participate.

COMETS 1.4: Data input and "COHORTVARIABLE" column

In tests done to date, all investigators have elected to use the variable names that we use. They are not using the variable matching in any meaningful way.

Thus, I think we could perhaps encourage users to simply code "COHORTVARIABLE" the same as "VARREFERENCE" and, if using the "Create input" utility, we could assume the VARREFERENCE names as the default. This could help streamline the data input process and our writing of the tutorial. We should discuss as a group.

COMETS 1.3. Minor warning issue

I like the addition of the warnings--it will make testing easier.

Now that they are visible, there may be some tweaks needed. One such tweak is that, when running an analysis stratified by BMI, I received the following warning: "Warning: one of your models specifies bmi_grp as a stratification but that variable only has one possible value. Model will run without bmi_grp stratified"

There is a stratum of BMI that had very few observations, but bmi_grp itself definitely has more than one possible value, as evidenced in the screenshot below. Any suggestions for how to modify the wording?

[screenshot]

COMETS 1.2

  1. Should we create a varlabel (a prettier string to display on the heatmap), or just strip everything after "("?
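If we go the stripping route, an illustrative one-liner that removes everything from the first "(" onward and trims whitespace (varname is a placeholder):

varlabel <- trimws(sub("\\(.*$", "", varname))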

examples and vignettes

Currently, the code in the vignettes and the examples does not match. To minimize confusion, it may be best to sync them up at some point...
Do you agree?

COMETS 1.3. Add "model" table to zip file

My IMS collaborators have requested that we add the model tab to our zip file so that they can double-check that the correct models were run and have it documented. I think this is a great idea. We may want to add the varmap as well.

COMETS 1.3. SuperBatch::Table 1 functionality

COMETS manuscripts will need descriptive data from each of the participating studies that we can show in our Table 1. The descriptive data should be output as a zip file table. For categorical variables, the percent in each category will likely suffice. For continuous variables, I suggest outputting the mean, the standard deviation, and the values at the 0th (minimum), 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 100th (maximum) percentiles.
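A minimal sketch of the continuous-variable summary, assuming x is a numeric covariate vector:

desc_continuous <- function(x) {
  qs <- quantile(x, probs = c(0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1),
                 na.rm = TRUE)
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE), qs)
}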

getModelData() error with modbatch

I've quick-fixed the vignette code because the getModelData() function was not working. I figured out it was because the original call was as follows:
exmodeldata<- getModelData(exmetabdata,colvars="age",modbatch="1.1 Unadjusted")

However, if modbatch is not specified, it errors because of this line (#62):
mods<-dplyr::filter(as.data.frame(readData[["mods"]]),model==modbatch)

I've fixed the call to the following, which now works:
exmodeldata <- getModelData(exmetabdata,colvars="age",modbatch="1.1 Unadjusted")

However, does this make sense? If it's in batch mode, then all models should be read in, right?
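One possible guard inside getModelData(), offered as a sketch rather than the adopted fix: only filter on model when a modbatch value is actually supplied, so the unspecified case reads all models.

if (!missing(modbatch) && !is.null(modbatch)) {
  mods <- dplyr::filter(as.data.frame(readData[["mods"]]), model == modbatch)
} else {
  mods <- as.data.frame(readData[["mods"]])
}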

COMETS 1.3. N does not update when "where" statement used

If the "where" statement is used in either "interactive" mode or in "batch" mode, the N listed does not update. This will create downstream problems when calculating standard errors for meta-analysis and so is an important problem.
