Comments (2)

ramiromagno commented on May 28, 2024

Hi Tota,

Thank you for your question.

Your code seems fine overall.

Let me just present a tidier approach that:

  1. Relies on dplyr::left_join() to join tables. This is safer than assuming that rows match by position across tables: in general they will not, although in your case it seems to work.
  2. Gets the genes from the genes table in the associations object, not from the genomic_contexts table in the variants object. Results should be similar, but the genes table reflects the genes reported for that variant by the authors of the study, whereas the genes listed in genomic_contexts are generated by the GWAS Catalog team through automatic workflows. So why am I getting the genes from the genes table? Just to show you that you could do it differently. Stick with your approach if you prefer the annotation by the GWAS Catalog team over the original authors'.
  3. Keeps all gene names associated with a locus, instead of only the first one.
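
To make points 1 and 3 concrete, here is a minimal sketch on toy tibbles. The table contents and gene names are made up for illustration and are not GWAS Catalog data; only the column names association_id, locus_id and gene_name mirror the real tables. It assumes dplyr and tibble are installed.

```r
library(dplyr)

# Toy stand-in for an associations table (hypothetical values)
associations_tbl <- tibble::tibble(
  association_id = c("a1", "a2", "a3"),
  pvalue = c(1e-6, 6e-9, 4e-8)
)

# Toy stand-in for a genes table: one row per gene,
# so association "a2" appears twice
genes_tbl <- tibble::tibble(
  association_id = c("a1", "a2", "a2", "a3"),
  locus_id = c(1L, 1L, 1L, 1L),
  gene_name = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

# Binding by position fails here because the row counts differ (3 vs 4)
positional <- tryCatch(
  bind_cols(associations_tbl, genes_tbl),
  error = function(e) e
)

# Collapse to one row per association/locus, keeping every gene name
genes_collapsed <- genes_tbl %>%
  group_by(association_id, locus_id) %>%
  summarise(gene_name = paste(gene_name, collapse = " "), .groups = "drop")

# Key-based join: the row order and row counts of the inputs do not matter
joined <- left_join(associations_tbl, genes_collapsed, by = "association_id")
```

In this sketch, joined has one row per association and the "a2" row carries both gene names in a single string, while positional holds the error raised by the row-count mismatch.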

With regard to efficiency, both methods are equivalent. The bottleneck is always the retrieval of data from the GWAS Catalog; in your case, that means getting the associations and the variants. The rest is just wrangling.

Feel free to ask more questions! Happy coding.

library(gwasrapidd)

study_id <- "GCST004132"
# Both calls query the GWAS Catalog REST API (this is the slow part)
associations <- get_associations(study_id = study_id)
variants <- get_variants(study_id = study_id)

# Because there can be more than one gene associated with a locus,
# collapse the gene names into a single space-separated string
genes <- associations@genes %>%
  dplyr::group_by(association_id, locus_id) %>%
  dplyr::summarise(gene_name = paste(gene_name, collapse = ' '), .groups = 'drop')

association_results <-
  associations@associations %>%
  dplyr::select(association_id, pvalue, beta_number, or_per_copy_number) %>%
  dplyr::left_join(associations@risk_alleles, by = 'association_id') %>%
  dplyr::left_join(genes, by = c('association_id', 'locus_id')) %>%
  dplyr::left_join(variants@variants, by = c('variant_id')) %>%
  dplyr::transmute(
    study_id = study_id,
    association_id = association_id,
    ID = variant_id,
    CHROM = chromosome_name,
    POS = chromosome_position,
    risk_allele = risk_allele,
    gene_name = gene_name,
    P = pvalue,
    beta = beta_number,
    OR = or_per_copy_number
  )

association_results
#> # A tibble: 119 × 10
#>    study_id  association_id ID    CHROM    POS risk_allele gene_name     P  beta
#>    <chr>     <chr>          <chr> <chr>  <int> <chr>       <chr>     <dbl> <dbl>
#>  1 GCST0041… 19144286       rs34… 1     1.60e8 G           SLAMF8    1e- 6    NA
#>  2 GCST0041… 19144332       rs25… 3     5.31e7 C           intergen… 6e- 9    NA
#>  3 GCST0041… 19144360       rs56… 3     1.89e8 C           LPP       6e-10    NA
#>  4 GCST0041… 19144385       rs80… 13    4.23e7 C           AKAP11    4e- 8    NA
#>  5 GCST0041… 19144411       rs48… 22    3.69e7 C           NCF4      2e- 8    NA
#>  6 GCST0041… 19144456       rs10… 16    8.28e7 A           CDH13     1e- 9    NA
#>  7 GCST0041… 19713859       rs14… 7     5.03e7 <NA>        C7orf72 … 9e-12    NA
#>  8 GCST0041… 19713864       rs12… 17    4.24e7 <NA>        NAGLU ST… 2e-11    NA
#>  9 GCST0041… 19713869       rs51… 19    4.87e7 <NA>        IZUMO1 N… 4e-11    NA
#> 10 GCST0041… 19713874       rs10… 2     4.36e7 <NA>        THADA ZF… 4e-11    NA
#> # … with 109 more rows, and 1 more variable: OR <dbl>

from gwasrapidd.

totajuliusd commented on May 28, 2024

Thank you very much! When I said "more efficient" I meant tidier, so this is perfect and exactly what I wanted.
