
deepphewas's Introduction

DeepPheWAS

Overview

DeepPheWAS is an R package for running phenome-wide association studies (PheWAS). It gives the user control over every stage of a PheWAS, from data wrangling through phenotype generation to association testing.

Installation

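# Install devtools if it is not already available:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install the development version from GitHub:
devtools::install_github("Richard-Packer/DeepPheWAS")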

Use

For a detailed tutorial, see our GitHub Pages site:

https://richard-packer.github.io/DeepPheWAS_site/

deepphewas's People

Contributors

mikkmart, richard-packer


deepphewas's Issues

Can this package be run on other datasets?

Hello,

I have a question: can this package be used to run the analysis on other datasets? From what I have observed, phenotype generation is based solely on UK Biobank.

Regards
Akhil

"Can't convert <double> to <date>" during concept creation

If, for any reason, the earliest_date of a clinical code occurrence is missing for a participant during concept creation, the program errors out with the following message:

Error:
! Assigned data `value` must be compatible with existing data.
ℹ Error occurred for column `earliest_date`.
x Can't convert <double> to <date>.

The culprit seems to be this line, which also tries to assign 0 to the missing date values:

https://github.com/Richard-Packer/DeepPheWAS/blob/d569e975213e4138d026fffe28ea1f42d6aa15ee/R/concept_creation.R#L203C21-L203C21
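A minimal sketch of one possible fix, assuming the linked line fills missing values with 0 across the whole table: restrict the replacement to numeric columns, so Date columns such as earliest_date keep their class and simply stay NA. The helper name replace_na_numeric and the example columns are hypothetical, not part of DeepPheWAS; dplyr::across() with tidyselect::where(is.numeric) would express the same rule in dplyr style.

```r
# Hypothetical helper (not a DeepPheWAS function): fill NAs with `value`
# in numeric columns only. Date columns are skipped because is.numeric()
# returns FALSE for objects of class "Date", so they keep their class.
replace_na_numeric <- function(df, value = 0) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && anyNA(x)) {
      x[is.na(x)] <- value
      df[[col]] <- x
    }
  }
  df
}

# Usage: the missing count becomes 0, the missing date stays NA.
concepts <- data.frame(
  eid = 1:3,
  n_codes = c(2, NA, 1),
  earliest_date = as.Date(c("2010-05-01", NA, "2015-09-12"))
)
replace_na_numeric(concepts)
```

Leaving missing dates as NA, rather than forcing a placeholder, also keeps downstream date arithmetic honest.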

Handling duplicated columns in tab data files

We have our tab data spread across multiple files. However, the files have some overlap in the columns that they include. Currently, minimum_data_R() combines data from multiple files with a dplyr::full_join(), which causes duplicate columns to be suffixed with .x and .y respectively. As a result, the columns are essentially missing downstream, where such suffixes aren't expected.

Could minimum_data_R() be made a bit smarter about this, for example by using the column from the file specified last in the list whenever duplicate columns are present?
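A minimal sketch of the behaviour requested above, written in base R rather than with the package's dplyr::full_join(); the helper name merge_tab_files and the key column name eid are assumptions. Walking the files from last to first, each column is kept only from the last file that supplies it, so the final full outer join has no duplicate names and therefore no .x/.y suffixes.

```r
# Hypothetical helper (not a DeepPheWAS function): full-join a list of
# tab-data frames on `key`; when a column appears in several frames,
# only the copy from the frame listed last is kept.
merge_tab_files <- function(tables, key = "eid") {
  seen <- character(0)
  for (i in rev(seq_along(tables))) {
    cols <- setdiff(names(tables[[i]]), key)
    keep <- setdiff(cols, seen)  # drop columns a later frame already provides
    tables[[i]] <- tables[[i]][, c(key, keep), drop = FALSE]
    seen <- union(seen, cols)
  }
  # Full outer join (all = TRUE); no duplicate names remain, so no suffixes.
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), tables)
}

# Usage: column x is taken from the second frame only.
a <- data.frame(eid = 1:3, x = c(10, 20, 30), y = c("a", "b", "c"))
b <- data.frame(eid = 2:4, x = c(200, 300, 400))
merge_tab_files(list(a, b))
```

One design consequence: participants absent from the later file get NA for a shared column instead of falling back to the earlier file's value; a coalescing merge would be needed if that matters.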

Problem with test data

Hi!

I'm trying to run the test data to better understand the software, but I have the following problem. It seems that the "hesin.csv.gz" file doesn't have the correct colnames; in fact, it only has 4 columns when it is supposed to have more. I also found an error with the "GPC.csv.gz" file caused by an incorrect colname, but I have changed it manually. Can you please help me with this issue?

./02_data_preparation.R \
--save_location $phewas_folder/data/ \
--min_data $phewas_folder/data/min_tab_test.gz \
--hesin_diag $package_folder/extdata/worked_example/HES_Diag.csv.gz \
--HESIN $package_folder/extdata/worked_example/hesin.csv.gz \
--death_cause $package_folder/extdata/worked_example/death.csv.gz \
--death $package_folder/extdata/worked_example/death_date.csv.gz \
--king_coef $package_folder/extdata/worked_example/KING_coef.csv.gz \
--GPC $package_folder/extdata/worked_example/GPC_new.csv.gz
Joining with by = join_by(eid)
Joining with by = join_by(eid)
Error in dplyr::na_if():
! Can't convert y to match type of x <data.table>.
Backtrace:

  1. ├─DeepPheWAS::data_preparation_R(...)
  2. │ └─... %>% tidyr::drop_na()
  3. ├─tidyr::drop_na(.)
  4. ├─dplyr::select(., .data$eid, .data$ins_index, .data$dates)
  5. ├─dplyr::mutate(., dates = lubridate::dmy(.data$dated))
  6. ├─dplyr::mutate(...)
  7. ├─dplyr::na_if(., "")
  8. │ └─vctrs::vec_cast(x = y, to = x, x_arg = "y", to_arg = "x")
  9. └─vctrs (local) <fn>()
  10. └─vctrs::vec_default_cast(...)
  11. ├─base::withRestarts(...)
  12. │ └─base (local) withOneRestart(expr, restarts[[1L]])
  13. │   └─base (local) doWithOneRestart(return(expr), restart)
  14. └─vctrs::stop_incompatible_cast(...)
  15.   └─vctrs::stop_incompatible_type(...)
  16.     └─vctrs:::stop_incompatible(...)
  17.       └─vctrs:::stop_vctrs(...)
  18.         └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

Warning messages:
1: There was 1 warning in dplyr::mutate().
ℹ In argument: date_of_dx = lubridate::ymd(...).
Caused by warning:
! 295 failed to parse.
2: In data_preparation_R(min_data = arguments$min_data, GPC = arguments$GPC, :
'HESIN' does not have the correct colnames and may not produce the correct output, expected colnames are:
'eid,ins_index,dsource,source,epistart,epiend,epidur,bedyear,epistat,epitype,epiorder,spell_index,spell_seq,spelbgin,spelend,speldur,pctcode,gpprpct,category,elecdate,elecdur,admidate,admimeth_uni,admimeth,admisorc_uni,admisorc,firstreg,classpat_uni,classpat,intmanag_uni,intmanag,mainspef_uni,mainspef,tretspef_uni,tretspef,operstat,disdate,dismeth_uni,dismeth,disdest_uni,disdest,carersi'
not:
eid,ins_index,epistart,admidate
differences between inputed file and expected are:
dsource,source,epiend,epidur,bedyear,epistat,epitype,epiorder,spell_index,spell_seq,spelbgin,spelend,speldur,pctcode,gpprpct,category,elecdate,elecdur,admimeth_uni,admimeth,admisorc_uni,admisorc,firstreg,classpat_uni,classpat,intmanag_uni,intmanag,ma [... truncated]

Best regards,

Judit
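To see exactly which columns an input file lacks before running 02_data_preparation.R, a minimal base-R sketch, assuming a comma-separated file whose first line is the header; the helper name missing_columns is hypothetical. It returns the expected names absent from the header, mirroring the "differences between inputed file and expected" list in the warning above.

```r
# Hypothetical helper (not a DeepPheWAS function): list expected column
# names that are missing from the header of a (possibly gzipped) CSV.
missing_columns <- function(path, expected) {
  con <- gzfile(path, open = "rt")  # gzfile() can also read uncompressed files
  on.exit(close(con))
  header <- readLines(con, n = 1)
  actual <- strsplit(header, ",", fixed = TRUE)[[1]]
  actual <- gsub('^"|"$', "", trimws(actual))  # strip optional quotes
  setdiff(expected, actual)
}

# Usage (package_folder is a placeholder for the installed package path):
# missing_columns(file.path(package_folder, "extdata/worked_example/hesin.csv.gz"),
#                 c("eid", "ins_index", "dsource", "epistart", "epiend"))
```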

How to create input fields to run 02_data_preparation.R

Dear developer,
I have successfully run the example pipeline, but I have run into problems with 02_data_preparation.R on my real UKB data. I have UKB fields 20002 and 20004 in my min_tab_test.gz.
Could you tell me how to create these inputs?

--hesin_diag $package_folder/extdata/worked_example/HES_Diag.csv.gz \
--HESIN $package_folder/extdata/worked_example/hesin.csv.gz \
--death_cause $package_folder/extdata/worked_example/death.csv.gz \
--death $package_folder/extdata/worked_example/death_date.csv.gz \
--king_coef $package_folder/extdata/worked_example/KING_coef.csv.gz \
--GPC $package_folder/extdata/worked_example/GPC.csv.gz

Or can I use the worked-example files, such as $package_folder/extdata/worked_example/HES_Diag.csv.gz, with the full UKB data?

I hope to hear from you soon.

Best wishes

dplyr version must be less than v1.1.0

I kept getting an error when running 02_data_preparation.R: Can't convert 'y' <character> to match type of 'x' <integer>. After some searching online, I realized that this is because, according to the changelog for dplyr 1.1.0, na_if() now uses the vctrs package, which is stricter about type stability. After downgrading dplyr to v1.0.10, the error went away. Your Imports currently only require dplyr (>= 1.0.0); I recommend changing this to dplyr (>= 1.0.0 & <=1.0.10).
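As an alternative to pinning dplyr, a minimal sketch, assuming the failing step calls na_if(., "") on a whole data frame (as the na_if(., "") frame in the earlier traceback suggests): applying na_if() only to character columns means "" is never cast to a non-character type, which dplyr >= 1.1.0 accepts. The helper name blank_to_na is hypothetical, not a DeepPheWAS function.

```r
# Hypothetical helper (not a DeepPheWAS function): turn empty strings into NA
# without casting "" to non-character column types.
blank_to_na <- function(df) {
  # dplyr::na_if(df, "") on a whole data frame fails under dplyr >= 1.1.0,
  # because vctrs refuses to cast "" to the data frame's type.
  # Applying it column-wise to character columns only avoids that cast.
  dplyr::mutate(
    df,
    dplyr::across(tidyselect::where(is.character), ~ dplyr::na_if(.x, ""))
  )
}

# Usage: the integer column is left untouched, blank strings become NA.
example <- data.frame(eid = 1:3, dated = c("01/02/2003", "", "05/06/2007"))
blank_to_na(example)
```

This works on dplyr 1.0.x as well, so it would avoid the need for an upper version bound.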

Problem running phenotype generation

Hi Richard,
I've just tried running the phenotype generation script using the examples you provided in the manual and got the following error.

Could you point out where or how I can get zlib installed? I tried several approaches (e.g. install.packages("Rcompression", repos = "http://www.omegahat.net/R ") and the link http://www.gzip.org/zlib/), but none of them worked. Installing the package with install.packages('zlib') gave me the following error:

Your help would be highly appreciated.
Many thanks
Hang
