Code Monkey home page Code Monkey logo

Comments (4)

ramiromagno avatar ramiromagno commented on May 24, 2024 1

Hi Pradeep,

Thank you for your question! Let me look into that to see if that piece of information is provided by the GWAS REST API.

from gwasrapidd.

ramiromagno avatar ramiromagno commented on May 24, 2024 1

Hi again Pradeep,

It seems that that information about the genes is only provided at the GWAS Catalog website, not via REST API at the moment.

I looked a bit though how the front-end website retrieves that information, and I found the URL being requested. So I wrote a bit of a hack to get that info into R: the get_gene() function. Then get_genes() is just a vectorised version of get_gene().

Finally I exemplify how you could use these functions to augment the genes table in the associations object. I hope this solution is helpful for the time being.

library(dplyr, warn.conflicts = FALSE)
library(glue, warn.conflicts = FALSE)
library(httr, warn.conflicts = FALSE)
library(tidyjson, warn.conflicts = FALSE)
library(purrr)
library(gwasrapidd, warn.conflicts = FALSE)

# For retrieving info about one gene only.
get_gene <- function(gene_name) {
  
  gene_name <- URLencode(gene_name)
  url <- glue::glue("https://www.ebi.ac.uk/gwas/api/search?q=title%3A+%22{gene_name}%22+AND+resourcename%3Agene")
  response <- httr::GET(url)
  json <- httr::content(response, 'text', encoding = 'UTF-8')
  
  tbl <-
    json %>%
    tidyjson::enter_object('response', 'docs') %>%
    tidyjson::gather_array() %>%
    tidyjson::spread_values(
      gene_name = tidyjson::jstring('title'),
      description = tidyjson::jstring('description'),
      chromosome_name = tidyjson::jstring('chromosomeName'),
      chromosome_start = tidyjson::jnumber('chromosomeStart'),
      chromosome_end = tidyjson::jnumber('chromosomeEnd'),
      cytogenetic_band = tidyjson::jstring('cytobands'),
      biotype = tidyjson::jstring('biotype')
    ) %>%
    tidyjson::as_tibble() %>%
    dplyr::select(-c('document.id', 'array.index'))
  
  return(tbl)
}

# Maps over genes
get_genes <- function(gene_names) {
  purrr::map_dfr(gene_names, get_gene)
}

# As an example, let us get some studies about rheumatoid arthritis.
my_studies <- get_studies(efo_trait = 'rheumatoid arthritis')

# Going after those associations related to the first study found only (for the
# sake of simplicity)
one_study <- my_studies[1]
my_associations <- get_associations(study_id = one_study@studies$study_id)
gene_info <- get_genes(my_associations@genes$gene_name)

my_associations@genes <- dplyr::left_join(my_associations@genes, gene_info, by = 'gene_name')

# The slot `genes` of the `associations` object is now augmented to include
# those extra details about each gene.
my_associations@genes
#> # A tibble: 5 × 9
#>   association_id locus_id gene_name description chromosome_name chromosome_start
#>   <chr>             <int> <chr>     <chr>       <chr>                      <dbl>
#> 1 17729                 1 BLK       BLK proto-… 8                       11486894
#> 2 17731                 1 ARHGEF3   Rho guanin… 3                       56727418
#> 3 17883                 1 HLA-DRB1  major hist… 6                       32578769
#> 4 17730                 1 TRHDE     thyrotropi… 12                      72087266
#> 5 40454                 1 PADI4     peptidyl a… 1                       17308195
#> # … with 3 more variables: chromosome_end <dbl>, cytogenetic_band <chr>,
#> #   biotype <chr>

from gwasrapidd.

ramiromagno avatar ramiromagno commented on May 24, 2024 1

Indeed! It seems that the gene being queried is used as a keyword that will match any gene name containing that keyword. But do not worry, when you do the dplyr::left_join() by gene_name, it has to match exactly. So the final table obtained in my_associations@genes should be as expected.

from gwasrapidd.

peranti avatar peranti commented on May 24, 2024

Thanks a lot, @ramiromagno, for this quick hack.

I have tested these functions briefly and made the following observation. If the input contains HLA to the get_genes() function, it fetches the information for even HLA-E, HLA-F etc. Similarly, if the input contains C10orf95, then it will fetch the C10orf95-AS1.

from gwasrapidd.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.