ramiromagno / gwasrapidd


83.0 4.0 13.0 20.38 MB

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data

Home Page: https://rmagno.eu/gwasrapidd/

License: Other

R 98.68% HTML 0.96% TeX 0.37%

gwas-catalog r rest-client snp trait trait-ontology human association-studies

gwasrapidd's Introduction

gwasrapidd

The goal of {gwasrapidd} is to provide programmatic access to the NHGRI-EBI Catalog of published genome-wide association studies.

Get started by reading the documentation.

Installation

Install {gwasrapidd} from CRAN:

install.packages("gwasrapidd")

Cheatsheet

Example

Get studies related to triple-negative breast cancer:

library(gwasrapidd)
studies <- get_studies(efo_trait = 'triple-negative breast cancer')
studies@studies[1:4]
## # A tibble: 3 × 4
##   study_id     reported_trait         initial_sample_size replication_sample_s…¹
##   <chr>        <chr>                  <chr>               <chr>                 
## 1 GCST002305   Breast cancer (estrog… 1,529 European anc… 2,148 European ancest…
## 2 GCST010100   Breast cancer (estrog… 8,602 European anc… <NA>                  
## 3 GCST90029052 15-year breast cancer… 5,631 European anc… <NA>                  
## # ℹ abbreviated name: ¹replication_sample_size

Find variants associated with study GCST002305:

variants <- get_variants(study_id = 'GCST002305')
variants@variants[c('variant_id', 'functional_class')]
## # A tibble: 5 × 2
##   variant_id functional_class   
##   <chr>      <chr>              
## 1 rs4245739  3_prime_UTR_variant
## 2 rs2363956  missense_variant   
## 3 rs10069690 intron_variant     
## 4 rs3757318  intron_variant     
## 5 rs10771399 intergenic_variant

Citing this work

{gwasrapidd} was published in Bioinformatics in 2019: https://doi.org/10.1093/bioinformatics/btz605.

To generate a citation for this publication from within R:

citation('gwasrapidd')
## To cite gwasrapidd in publications use:
## 
##   Ramiro Magno, Ana-Teresa Maia, gwasrapidd: an R package to query,
##   download and wrangle GWAS Catalog data, Bioinformatics, btz605, 2
##   August 2019, Pages 1-2, https://doi.org/10.1093/bioinformatics/btz605
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {gwasrapidd: an R package to query, download and wrangle GWAS Catalog data},
##     author = {Ramiro Magno and Ana-Teresa Maia},
##     journal = {Bioinformatics},
##     year = {2019},
##     pages = {1--2},
##     url = {https://doi.org/10.1093/bioinformatics/btz605},
##   }

Code of Conduct

Please note that the {gwasrapidd} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Similar projects

Bioconductor R package gwascat by Vincent J Carey: https://www.bioconductor.org/packages/release/bioc/html/gwascat.html
Web application PhenoScanner V2 by Mihir A. Kamat, James R. Staley, and others: http://www.phenoscanner.medschl.cam.ac.uk/
Web application GWEHS: Genome-Wide Effect sizes and Heritability Screener by Eugenio López-Cortegano and Armando Caballero: http://gwehs.uvigo.es/

Acknowledgements

This work would not have been possible without the invaluable help of the GWAS Catalog team, particularly Daniel Suveges.

gwasrapidd's People

Contributors

Stargazers

Watchers

Forkers

xtmgah simexin wxyz tzup dpcscience dlhuang aarbduarte nunonog peranti yuan-zheng-rong fmadani jtnedoctor hadley

gwasrapidd's Issues

Error: parse error: premature EOF in study responses

As encountered by Theresia Handayani Mina,

Some SNPs trigger study JSON responses that are truncated. This makes jsonlite::fromJSON() fail, which in turn makes gwasrapidd:::gc_request() fail.

The upstream issue: EBISPOT/goci-rest#39.
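Until the upstream responses are fixed, one defensive option is to catch the parse failure so that a single truncated response does not abort a whole query. The following is a minimal sketch, assuming jsonlite is installed; safe_from_json() is a hypothetical helper, not part of gwasrapidd, and the truncated string is made up for illustration.

```r
library(jsonlite)

# Hypothetical helper (not part of gwasrapidd): parse a JSON string, but
# return NULL with a warning instead of stopping when the text is truncated.
safe_from_json <- function(txt) {
  tryCatch(
    jsonlite::fromJSON(txt),
    error = function(e) {
      warning("Could not parse JSON response: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# A complete response parses normally...
ok <- safe_from_json('{"rsId": "rs35252396"}')
ok$rsId

# ...while a truncated one yields NULL (plus a warning, silenced here).
bad <- suppressWarnings(safe_from_json('{ "_embedded" : { "single'))
is.null(bad)
```

The caller still has to decide what to do with a NULL result, for example retrying later or recording the SNP that triggered it.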

Functions for identifier mapping needed

Functions mapping between study, association, variant and trait identifiers are needed.

Something like:

study_id_to_association_id()
association_id_to_study_id()
study_id_to_variant_id()

etc.

Make functions that search by either `efo_trait` or `reported_trait` case sensitive again

Remove the tolower() wrapping in these cases:

get_studies_by_efo_trait

gwasrapidd/R/get_studies.R

Line 441 in 700c86f

urltools::url_encode(tolower(efo_trait)))

get_studies_by_reported_trait

gwasrapidd/R/get_studies.R

Line 491 in 700c86f

urltools::url_encode(tolower(reported_trait)))

get_variants_by_efo_trait

gwasrapidd/R/get_variants.R

Line 459 in 700c86f

urltools::url_encode(tolower(efo_trait)))

get_variants_by_reported_trait

gwasrapidd/R/get_variants.R

Line 513 in 700c86f

urltools::url_encode(tolower(reported_trait)))

get_associations_by_efo_trait

gwasrapidd/R/get_associations.R

Line 295 in 700c86f

urltools::url_encode(tolower(efo_trait)))

get_traits_by_efo_trait

gwasrapidd/R/get_traits.R

Line 278 in 700c86f

urltools::url_encode(tolower(efo_trait)))
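A minimal sketch of what changes once the tolower() wrapping is dropped: the encoded search value keeps the caller's capitalisation. It uses base R's utils::URLencode() as a stand-in for urltools::url_encode(), and encode_trait() is a hypothetical helper; how the GWAS Catalog endpoint treats the two forms is not shown here.

```r
# Hypothetical helper: percent-encode a trait string for use in a query URL,
# optionally lowercasing it first (the current behaviour in the lines above).
# utils::URLencode() stands in for urltools::url_encode() in this sketch.
encode_trait <- function(trait, lowercase = FALSE) {
  if (lowercase) trait <- tolower(trait)
  utils::URLencode(trait, reserved = TRUE)
}

encode_trait("Triple-Negative Breast Cancer", lowercase = TRUE)
# "triple-negative%20breast%20cancer"   (case discarded)
encode_trait("Triple-Negative Breast Cancer")
# "Triple-Negative%20Breast%20Cancer"   (case preserved)
```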

revisit FAQ 5

Check if it still applies:
https://rmagno.eu/gwasrapidd/articles/faq.html#5-genomic-coordinates-of-genomic-contexts-seem-to-be-wrong-

Why are some gene names present in `genomic_contexts` but not in `ensembl_ids`?

Hi,
First thank you for such a great package.

I have been retrieving gene data for certain variants with the gwasrapidd package, and I noticed that a variant can have inconsistent gene data in its ensembl_ids and genomic_contexts tables.
For example, suppose I retrieve the data of a variant with the get_variants function. Some gene names of that variant might differ between the ensembl_ids table and the genomic_contexts table.

What could be the reason for this difference?

What is the difference between genomic_contexts and ensembl_ids of a variant in terms of genes?

Unfortunately, today I cannot reach the GWAS Catalog through the gwasrapidd package: when I run the functions, I retrieve no data. Thus, I cannot add any example files.
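One way to see exactly which gene names differ is to anti-join the two tables on variant and gene name. This sketch assumes both tables have variant_id and gene_name columns; genes_missing_from_ensembl() is a hypothetical helper, and it only lists the differences, it does not explain them.

```r
library(dplyr)

# Hypothetical helper: return the (variant_id, gene_name) pairs that appear in
# the genomic contexts table but not in the Ensembl IDs table.
genes_missing_from_ensembl <- function(genomic_contexts, ensembl_ids) {
  dplyr::anti_join(
    dplyr::distinct(genomic_contexts, variant_id, gene_name),
    dplyr::distinct(ensembl_ids, variant_id, gene_name),
    by = c("variant_id", "gene_name")
  )
}

# Usage with a real object (needs network access):
# v <- gwasrapidd::get_variants(variant_id = "rs4245739")
# genes_missing_from_ensembl(v@genomic_contexts, v@ensembl_ids)
```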

get_associations using reported trait

Is it possible to search for associations using the reported trait? I checked, and it does not seem possible: get_associations() only allows searching by efo_trait and efo_id, not by reported_trait.

Problem with obtaining the RAF for individual variants contained within associations with a haplotype

I have come across a potential issue when using gwasrapidd to get associations with traits for which multiple SNPs are reported (that is, the association is with a haplotype).

For example, when you run:
get_associations(study_id = "GCST000480"), three associations are retrieved, as reported in the GWAS Catalog, but the risk_alleles slot does not report the risk allele frequency (RAF; risk_frequency in gwasrapidd) for the individual SNPs.

This means that the RAF displayed on the GWAS Catalog web interface (which I assume corresponds to the haplotype RAF) is not available at all in gwasrapidd.

Despite this small issue, I am loving your tool! Thank you so much!

`cols` is now required. Please use `cols

get_variants() is throwing all these warnings:

`cols` is now required. Please use `cols

See: https://community.rstudio.com/t/cols-is-now-required-please-use-cols/40350

Check tidyr::unnest() usage.
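For reference, a minimal sketch of the call form that avoids the warning quoted above: name the list-columns to unnest explicitly through cols. The tibble is made up; the gene names are placeholders.

```r
library(tibble)
library(tidyr)

# A nested tibble: one row per variant, with a list-column of genes.
nested <- tibble::tibble(
  variant_id = c("rs4245739", "rs2363956"),
  genes = list(c("GENE_A", "GENE_B"), "GENE_C")   # placeholder gene names
)

# Passing the list-column via `cols` is what tidyr >= 1.0.0 expects;
# omitting it is what triggers the "`cols` is now required" warning.
unnested <- tidyr::unnest(nested, cols = c(genes))
unnested
```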

Response code was 500.

Hi, thank you for the useful package. When I run get_associations() for some SNPs (one SNP at a time, but many in a row, in a for loop), I get a 500 error. It seems to happen when the request is too large. Do you have any suggestions? Is it possible to return an error file giving me the variant_id that produced the 500 response? Thank you

library('gwasrapidd')
library('tidyverse')
library('glue')

# Manually selected list of EFO terms
efo_traits=list('aortic valve disease', 'heart valve disease','coronary artery disease')

# Import reference, in order to extract EFO_ids
efo_codes<-read_csv('notebooks/suspension/scanpy_clustering/EBI_codes.csv')

# Make a table of SNPs for each trait
list_of_tibbles<-list()
for (i in seq_along(efo_traits)) {
  efo_code<-efo_codes$EFO_ids[efo_codes$`Disease trait`==as.character(efo_traits[i])]
  variants<-get_variants(efo_trait=as.character(efo_traits[i])) # gets variants for the trait
  variants_table<-variants@variants
  variants_table$efo_term<-as.character(efo_traits[i])
  # For each variant, get the association info (p-value, beta, etc.) and add it to variants_table
  for (j in seq_along(variants_table$variant_id)) {
    associations<-get_associations(variant_id=variants_table$variant_id[j])
    associations_table<-associations@associations
    variants_table$pvalue[j]<-associations_table$pvalue[1]
    variants_table$pvalue_description[j]<-associations_table$pvalue_description[1]
    variants_table$beta_number[j]<-associations_table$beta_number[1]
    variants_table$beta_unit[j]<-associations_table$beta_unit[1]
    variants_table$beta_direction[j]<-associations_table$beta_direction[1]
    if (length(associations@associations$association_id) > 1) { # marks if a variant is associated with multiple traits
      variants_table$multiple_associations[j]<-"yes"
      studies<-get_studies(variant_id=variants_table$variant_id[j])
      variants_table$publications_number[j]<-length(studies@publications$study_id)
    } else {
      variants_table$multiple_associations[j]<-"no"
      variants_table$publications_number[j]<-1
    }
  }
  # Store and write the table once per trait, after all of its variants are processed
  list_of_tibbles[[i]]<-variants_table
  write_csv(list_of_tibbles[[i]],glue('/nfs/team205/heart/EBI_SNP_enrichment/traits/{unique(efo_codes$EFO_ids[efo_codes$`EFO term`==as.character(efo_traits[i])])}_{as.character(efo_traits[i])}_EBI_GWAS_SNPs_with_positions.csv'))
}
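Regarding which variant_id produced the 500 response: a minimal sketch, assuming it is acceptable to skip failing variants and report them afterwards. fetch_each(), its fetch argument and the optional pause between requests are hypothetical additions, not gwasrapidd features; the pause is only a guess at easing load on the server. Since gwasrapidd reports failed requests ("response code was 500") as warnings, warnings are treated as failures here, which also means any other warning counts as a failure.

```r
# Hypothetical helper (not part of gwasrapidd): query one variant at a time,
# keep going when a request fails, and record which variant IDs failed.
fetch_each <- function(variant_ids,
                       fetch = function(id) gwasrapidd::get_associations(variant_id = id),
                       pause = 0) {
  results <- list()
  failed <- character(0)
  for (id in variant_ids) {
    # Any warning or error during the request marks this ID as failed.
    res <- tryCatch(fetch(id),
                    warning = function(w) NULL,
                    error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)
    } else {
      results[[id]] <- res
    }
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
  }
  list(results = results, failed = failed)
}

# Usage (needs network access):
# out <- fetch_each(variants_table$variant_id, pause = 1)
# write.csv(data.frame(variant_id = out$failed), "failed_variants.csv", row.names = FALSE)
```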

Error in as.vector(x) : no method for coercing this S4 class to a vector

Note to self:

library(tidyverse)
library(gwasrapidd)
library(GenomicRanges)
gwasrapidd::get_associations(variant_id = "rs1800629", efo_trait = "cancer")

seems to work fine.

On the other hand,

library(tidyverse)
library(GenomicRanges)
library(gwasrapidd)
gwasrapidd::get_associations(variant_id = "rs1800629", efo_trait = "cancer")

results in this error:

> gwasrapidd::get_associations(variant_id = "rs1800629", efo_trait = "cancer")
Error in as.vector(x) : no method for coercing this S4 class to a vector

Which is, probably, a symptom of S4 dispatch using the wrong generic.

See this discussion here: https://r.789695.n4.nabble.com/Conflicting-definitions-for-function-redefined-as-S4-generics-td4687570.html.

How to get non-unioned results when using a list as a parameter?

Hello!

When using a get function, is it possible to get non-unique results when passing a list as a parameter? Here is what I am trying to do:

studyID = "GCST001718" # a study containing association ids belonging to 3 separate traits
associationsTibble <- get_associations(study_id = studyID)@associations # getting the associations from the study
association_ids <- associationsTibble[["association_id"]] # there are 7 associations in the study
#trying to add the result of get_traits using the association_id list as a new column gives an error because get functions only return unique values, even with the set_operation="intersection" parameter
combinedTable <- add_column(associationsTibble, trait = get_traits(association_id = association_ids, set_operation = "intersection")@traits)

Error: New columns must be compatible with `.data`.
x New column has 3 rows.
i `.data` has 7 rows.

The set operation appears to work only when multiple parameters are passed to a get function (i.e., study_id and association_id). Is there any way to keep all results, instead of just the unique values, when passing a list as a parameter?

Thanks!
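A sketch of one way to keep one result per association: call get_traits() once per association_id, label each result with the ID it came from, and then join back. traits_per_association() and its fetch_traits argument are hypothetical; if an association maps to several traits, the join yields more rows than there are associations.

```r
library(dplyr)

# Hypothetical helper: fetch traits one association at a time so that each
# trait row stays tied to its association_id (no union across IDs).
traits_per_association <- function(association_ids,
                                   fetch_traits = function(id) gwasrapidd::get_traits(association_id = id)@traits) {
  dplyr::bind_rows(lapply(association_ids, function(id) {
    traits <- fetch_traits(id)
    dplyr::mutate(traits, association_id = id, .before = 1)
  }))
}

# Usage (needs network access):
# assoc <- gwasrapidd::get_associations(study_id = "GCST001718")@associations
# combined <- dplyr::left_join(assoc, traits_per_association(assoc$association_id),
#                              by = "association_id")
```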

The most efficient way to retrieve association results by study_id?

Hi, can you please tell me what would be the most efficient way to extract the following information for a particular study by study_id?

What I would like to end up with, would be a table with the following columns:

CHROM, POS, P, beta_or_OR, risk_allele (or REF and ALT), gene_name

I can do this with the following code, but I haven't used S4 objects before, and I am sure there must be a simpler way to build this table by using them correctly.

study_id <- "GCST004132"
associations <- get_associations(study_id=study_id)
variants <- get_variants(study_id = study_id)

assoc_df <- data.frame(P = as.numeric(associations@associations$pvalue), ID = associations@risk_alleles$variant_id, beta=associations@associations$beta_number,OR=associations@associations$or_per_copy_number, risk_allele=associations@risk_alleles$risk_allele)
variants_df <- data.frame(ID=variants@variants$variant_id, CHROM=variants@variants$chromosome_name, POS=variants@variants$chromosome_position)

variants_assoc_df <- merge(assoc_df, variants_df, by="ID")
gene_name_df <- variants@genomic_contexts %>% distinct(variant_id, .keep_all=T) %>% select(variant_id,gene_name) %>%rename(ID=variant_id) %>% as.data.frame()

variants_with_gene_name <- inner_join(variants_assoc_df, gene_name_df, by="ID")

I appreciate any help with this,
Tota
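A sketch of the same table built with joins on shared keys, assuming associations and risk_alleles share an association_id column and that the other column names match those used in the question. Joining on keys keeps rows aligned even when one association has several risk alleles, which placing columns side by side in data.frame() does not guarantee. build_study_table() is a hypothetical helper; like the code above, it keeps the first gene_name listed per variant.

```r
library(dplyr)

# Hypothetical helper: one row per (association, risk allele), with position
# and a gene name attached by joining on association_id and variant_id.
build_study_table <- function(associations, risk_alleles, variants, genomic_contexts) {
  genes <- genomic_contexts %>%
    dplyr::distinct(variant_id, .keep_all = TRUE) %>%   # first gene per variant
    dplyr::select(ID = variant_id, gene_name)

  associations %>%
    dplyr::select(association_id, P = pvalue, beta = beta_number, OR = or_per_copy_number) %>%
    dplyr::inner_join(dplyr::select(risk_alleles, association_id, ID = variant_id, risk_allele),
                      by = "association_id") %>%
    dplyr::inner_join(dplyr::select(variants, ID = variant_id, CHROM = chromosome_name,
                                    POS = chromosome_position),
                      by = "ID") %>%
    dplyr::left_join(genes, by = "ID")
}

# Usage (needs network access):
# a <- gwasrapidd::get_associations(study_id = "GCST004132")
# v <- gwasrapidd::get_variants(study_id = "GCST004132")
# build_study_table(a@associations, a@risk_alleles, v@variants, v@genomic_contexts)
```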

consider transitioning to the (not so) new pkgdown website template

Unexpected error in get_variants_by_efo_id: Error: length(efo_trait) not greater than 0

gwasrapidd:::get_variants_by_efo_id(efo_id = 'EFO_0000000')

Warning: The request for https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0000000 failed: response code was 404.
Error: length(efo_trait) not greater than 0
In addition: Warning message:
In gc_request_all(resource_url = resource_url, base_url = base_url,  :
 
 Error: length(efo_trait) not greater than 0

That call should not err, but only trigger a warning.

The problem comes from not checking for a NULL value in:

gwasrapidd/R/get_variants.R

Line 181 in d51d07a

trait_descriptions <- traits@traits$trait

and then calling get_variants_by_efo_trait empty-handed.

A manifestation of issue #3.
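A minimal sketch of the missing check, assuming the desired behaviour is a warning followed by an empty result. The function and argument names are hypothetical, not gwasrapidd internals, and NULL stands in for whatever empty object the real function should return. Because length(NULL) is 0, a single test covers both NULL and a zero-length vector.

```r
# Hypothetical sketch: bail out with a warning when the EFO ID lookup produced
# no trait descriptions, instead of querying by trait with an empty value.
variants_from_trait_descriptions <- function(trait_descriptions, query_by_trait) {
  if (length(trait_descriptions) == 0L) {   # TRUE for NULL and character(0)
    warning("No EFO trait found for this EFO ID; returning NULL.", call. = FALSE)
    return(NULL)
  }
  query_by_trait(trait_descriptions)
}
```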

parsing issue while using get_variants()

Hi @ramiromagno

I called get_variants() with a list of gene names and got the following error:

variant_data <- get_variants(gene_name = gene_list)
#  downloading [===================>-----------------------------------------------------]  28% eta:  6m
# Error: parse error: premature EOF
#                                       {   "_embedded" : {     "single
#                     (right here) ------^

Are you aware of this issue, and is there a workaround?

Meanwhile, I will try to find a workaround. Thanks for this fantastic package.

Link association_id, study_id and ancestry_id

Hi there

Thanks for creating a lovely package!

Is there a way to retrieve associations by searching on a reported trait and then link those associations to their study_id and ancestry?
This is what I do at the moment:

get_studies(reported_trait = "colorectal cancer")
Then I loop the get_associations() function over the study_ids retrieved from the first step.
I'd now like to link the associations to their ancestry. I thought I'd be able to do that using study_id but this doesn't work because ancestry_id varies within study_id.

many thanks
Philip

get_snp_by_id and get_snp_by_location not working for rs35252396 and rs6470588

For rs35252396 and rs6470588, locations::_links::snps contains more than one hit, whereas only one was (erroneously) presumed:

{
  "rsId" : "rs35252396",
  "merged" : 0,
  "functionalClass" : "intron_variant",
  "lastUpdateDate" : "2018-07-21T04:19:30.186+0000",
  "locations" : [ {
    "chromosomeName" : "8",
    "chromosomePosition" : 127877125,
    "region" : {
      "name" : "8q24.21"
    },
    "_links" : {
      "snps" : [ {
        "href" : "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs35252396"
      }, {
        "href" : "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs6470588"
      } ]
    }
  } ],

and

{
  "rsId" : "rs6470588",
  "merged" : 0,
  "functionalClass" : "intron_variant",
  "lastUpdateDate" : "2018-07-21T04:19:29.693+0000",
  "locations" : [ {
    "chromosomeName" : "8",
    "chromosomePosition" : 127877125,
    "region" : {
      "name" : "8q24.21"
    },
    "_links" : {
      "snps" : [ {
        "href" : "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs35252396"
      }, {
        "href" : "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs6470588"
      } ]
    }
  } ],

I need to write a patch here:

gwasrapidd/R/snp.R

Lines 200 to 258 in b062b9c

    
filter_genomic_location_by_chr_name <- function(df, chr_names_to_keep = chromosomes, warnings = TRUE) {

  genomic_locations_variables <- c("chromosomeName",
                                   "chromosomePosition",
                                   "region.name",
                                   "_links.snps.href")

  if(!is.data.frame(df))
    stop("df needs to be a dataframe.")

  if(!(all(genomic_locations_variables %in% colnames(df))))
    stop("df must contain all of the following variables:\n",
         concatenate::cc_and(genomic_locations_variables),".")

  if(identical(nrow(df), 0L)) {
    if(warnings)
      warning("The dataframe df is empty. Filling in NAs...")

    # A one-row tibble filled with NA values.
    df2 <- tibble::tibble("chromosomeName" = NA_character_,
                          "chromosomePosition" = NA_integer_,
                          "region.name" = NA_character_,
                          "_links.snps.href" = NA_character_)
    return(df2)
  }

  # To appease R CMD check (not happy with this.)
  chromosomeName <- NULL
  # Filter genomic locations by the variable chromosomeName
  df2 <- dplyr::filter(df, chromosomeName %in% chr_names_to_keep)

  # If filtering resulted in an empty dataframe, just return it.
  if(identical(nrow(df2), 0L)) {
    if(warnings)
      warning("Filtering of genomic locations resulted in an empty dataframe!")

    # A one-row tibble filled with NA values.
    df2 <- tibble::tibble("chromosomeName" = NA_character_,
                          "chromosomePosition" = NA_integer_,
                          "region.name" = NA_character_,
                          "_links.snps.href" = NA_character_)
    return(df2)
  }

  # If only one genomic location is found, nice!,
  # that's how we expected it to be.
  if(identical(nrow(df2), 1L))
    return(tibble::as_tibble(df2))

  # If more than one location is found, err.. it's a bit strange as one SNP
  # should map only to one genomic location in one bona fide chromosome, so we
  # enforce it to be one location only and return the first row (ad hoc choice).
  if(nrow(df2) > 1L) {
    if(warnings)
      warning("Filtering of genomic locations did not result in one unique location!\n",
              "Picking the first, ad hoc.")
    return(tibble::as_tibble(df2[1, ]))
  }
}
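One possible direction for the patch, sketched under the assumption that the duplicated rows differ only in `_links.snps.href`: keep the row whose link ends in the queried rsId. keep_own_snp_link() is hypothetical and is not wired into filter_genomic_location_by_chr_name(); where it would be called is left open.

```r
# Hypothetical patch sketch: among location rows, keep those whose
# `_links.snps.href` points at the queried rsId; if none match, return df as-is.
keep_own_snp_link <- function(df, rs_id) {
  hrefs <- df[["_links.snps.href"]]
  own <- !is.na(hrefs) & endsWith(hrefs, paste0("/", rs_id))
  if (any(own)) df[own, , drop = FALSE] else df
}

# Usage, with rows shaped like the rs35252396 / rs6470588 responses above:
# keep_own_snp_link(locations_df, "rs6470588")
```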

The number of associations obtained by "gwasrapidd" differs greatly from the number obtained in the GWAS Catalog

Hi,
Using the following code, only 182 associations are obtained:
my_associations <- get_associations(efo_id = "EFO_0005140")
but the GWAS Catalog website shows more than 7,000 associations.

institutional logo not rendered in footer

e.g. here https://rmagno.eu/gwasrapidd/articles/faq.html#5-genomic-coordinates-of-genomic-contexts-seem-to-be-wrong-

check r-lib/pkgdown#2020 for the solution.

Refactor parsing functions to use `purrr::pluck()`

Refactor all those parsing functions in files parse-*.R to use purrr::pluck() so that we deal with missing elements gracefully: https://adv-r.hadley.nz/subsetting.html#subsetting-oob.
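A minimal sketch of the idea, assuming purrr is installed: pluck() returns .default when any level along the path is missing or NULL, or when an intermediate value is an unnamed atomic vector (as happens when a JSON null is parsed to NA), where `$` would raise "$ operator is invalid for atomic vectors". study_trait() is a hypothetical helper working on a single study parsed as a list.

```r
library(purrr)

# Hypothetical helper: read study$diseaseTrait$trait safely, falling back to
# NA_character_ instead of erroring when any level is absent.
study_trait <- function(study) {
  purrr::pluck(study, "diseaseTrait", "trait", .default = NA_character_)
}

study_trait(list(diseaseTrait = list(trait = "Breast cancer")))  # "Breast cancer"
study_trait(list(diseaseTrait = NA))                              # NA
study_trait(list())                                               # NA
```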

How do I export `my_associations` to a table file, separated by tabs

library(gwasrapidd)
child_traits <- gwasrapidd::get_child_efo("EFO_0004515")$EFO_0004515
child_traits_in_gwas_cat <- all_traits[child_traits]
my_associations <- get_associations(efo_id = c("EFO_0004515", child_traits_in_gwas_cat@traits$efo_id))
my_associations1 <- my_associations 
write.table(my_associations,"muscle measurement.txt",quote = FALSE)

error 
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class ‘structure("associations", package = "gwasrapidd")’ to a data.frame
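The error arises because my_associations is an S4 object holding several tables (one per slot), not a data frame, so write.table() cannot coerce it. A sketch, assuming each table slot holds a data frame: write every such slot to its own tab-separated file. write_slots_tsv() is a hypothetical helper.

```r
# Hypothetical helper: write each data-frame slot of an S4 object to
# "<prefix>_<slot>.tsv" and return the file paths invisibly.
write_slots_tsv <- function(object, prefix) {
  paths <- character(0)
  for (s in methods::slotNames(object)) {
    tbl <- methods::slot(object, s)
    if (!is.data.frame(tbl)) next        # skip non-table slots
    path <- paste0(prefix, "_", s, ".tsv")
    utils::write.table(tbl, path, sep = "\t", quote = FALSE, row.names = FALSE)
    paths <- c(paths, path)
  }
  invisible(paths)
}

# Usage: write_slots_tsv(my_associations, "muscle_measurement")
```

If only the main table is needed, writing `my_associations@associations` with `write.table(..., sep = "\t", quote = FALSE, row.names = FALSE)` should be enough.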

I would like to obtain information on the location of SNP loci

Failing to download all studies with get_studies()

Professor:
I have a question. When I run get_studies(),

Allstudies <- get_studies()

Do you still want to proceed (y/n)? y

RStudio returns:
"OK! Getting all studies then. This is going to take a while...
downloading [==>--------------------------------] 8% eta: 4hWarning: The request for https://www.ebi.ac.uk/gwas/rest/api/studies?page=332&size=20 failed: response code was 500.
downloading [=========================>---------] 74% eta: 1hWarning: The request for https://www.ebi.ac.uk/gwas/rest/api/studies?page=2957&size=20 failed: response code was 500.
downloading [==============================>----] 89% eta: 23m"

Could you tell me how I can deal with this problem? Looking forward to your reply. Thank you very much.

Response code 500 when using get_studies()

When I run:

gwasrapidd::get_studies(reported_trait = "Glioma")

I get this error message:
Warning message:
In gc_request_all(resource_url = resource_url, base_url = base_url, :
The request for https://www.ebi.ac.uk/gwas/rest/api/studies/search/findByDiseaseTrait?diseaseTrait=Glioma failed: response code was 500.

Do you know why I am getting this error?

discrepancies between get_variant and get_association

I was expecting these two different search strategies to identify the same variant IDs but they don't. Do you know why they perform differently?

trait="Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)"
efo="arachidonic acid measurement"
efo_id=NULL

Strategy 1. Identify variant IDs using a combination of get_studies and get_associations

gwas_studies<-gwasrapidd::get_studies(efo_trait = efo,efo_id=efo_id,reported_trait=trait)			
gwas_associations<-gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id)

> gwas_associations@risk_alleles$variant_id
 [1] "rs2581624"   "rs174545"    "rs4246215"   "rs174601"    "rs8523"     
 [6] "rs174549"    "rs174541"    "rs174549"    "rs174550"    "rs12580543" 
[11] "rs3811444"   "rs174535"    "rs2110073"   "kgp12662626" "rs7258177"  
[16] "rs174528"    "rs174577"    "kgp12035941" "rs1795851"   "rs1404384"  
[21] "rs16979306"  "rs6133127"   "rs1688589"   "rs1539053"   "kgp9433132" 
[26] "rs1882496"   "kgp6577813"  "kgp2178364"  "rs12714668"  "rs1080261"  
[31] "rs9942436"   "rs2637523"   "rs10839732"  "rs12471016"  "rs16829840" 
[36] "rs274557"    "rs9394931"   "rs12209128"  "rs174547"    "rs1741"     
[41] "rs102275"

Strategy 2. Identify variant IDs using get_variants

gwas_variants<-gwasrapidd::get_variants(efo_trait = efo,efo_id=efo_id,reported_trait=trait)		
> gwas_variants@variants$variant_id
 [1] "rs1080261"   "rs7258177"   "rs12714668"  "rs16979306"  "rs174528"   
 [6] "rs9942436"   "kgp2178364"  "rs174577"    "kgp12035941" "kgp6577813" 
[11] "rs1795851"   "rs2637523"   "rs1882496"   "rs1539053"   "rs174545"   
[16] "rs1404384"   "rs6133127"   "kgp12662626" "kgp9433132"  "rs1688589"  
[21] "rs10839732"  "rs2581624"   "rs274557"    "rs1741"      "rs12471016" 
[26] "rs174547"    "rs12209128"  "rs9394931"   "rs102275"    "rs16829840"

Why are the two sets of variants different?
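Comparing the two printed vectors directly narrows the question down. Using the IDs exactly as printed above, base R set operations show that all 30 variants from Strategy 2 also appear in Strategy 1, and that Strategy 1 contains 10 additional variant IDs (rs174549 appears twice among its risk alleles). This only pinpoints which variants to inspect; it does not explain why the strategies differ.

```r
# Variant IDs copied from the two outputs above.
s1 <- c("rs2581624", "rs174545", "rs4246215", "rs174601", "rs8523",
        "rs174549", "rs174541", "rs174549", "rs174550", "rs12580543",
        "rs3811444", "rs174535", "rs2110073", "kgp12662626", "rs7258177",
        "rs174528", "rs174577", "kgp12035941", "rs1795851", "rs1404384",
        "rs16979306", "rs6133127", "rs1688589", "rs1539053", "kgp9433132",
        "rs1882496", "kgp6577813", "kgp2178364", "rs12714668", "rs1080261",
        "rs9942436", "rs2637523", "rs10839732", "rs12471016", "rs16829840",
        "rs274557", "rs9394931", "rs12209128", "rs174547", "rs1741",
        "rs102275")
s2 <- c("rs1080261", "rs7258177", "rs12714668", "rs16979306", "rs174528",
        "rs9942436", "kgp2178364", "rs174577", "kgp12035941", "kgp6577813",
        "rs1795851", "rs2637523", "rs1882496", "rs1539053", "rs174545",
        "rs1404384", "rs6133127", "kgp12662626", "kgp9433132", "rs1688589",
        "rs10839732", "rs2581624", "rs274557", "rs1741", "rs12471016",
        "rs174547", "rs12209128", "rs9394931", "rs102275", "rs16829840")

# setdiff() also drops duplicates, so rs174549 is reported once.
only_in_associations <- setdiff(s1, s2)
only_in_variants <- setdiff(s2, s1)

only_in_associations
# "rs4246215" "rs174601" "rs8523" "rs174549" "rs174541"
# "rs174550" "rs12580543" "rs3811444" "rs174535" "rs2110073"
only_in_variants
# character(0)
```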

FR: Add export functionality for gwasrapidd objects: `write_xlsx()`

Error when running get_associations()

When running
gwasrapidd::get_associations(efo_id ="EFO:0001663",verbose = verbose,warnings = warnings)

I get the following error message:
Error: Elements 1 of is_efo_id2(efo_id) are not true

What is causing this error?
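The other EFO IDs on this page are written with an underscore (for example "EFO_0005140" and "EFO_0004515"), whereas this call uses a colon, which may be what the is_efo_id2() check rejects. A minimal sketch, assuming that is the cause; normalize_efo_id() is a hypothetical helper.

```r
# Hypothetical helper: convert the colon form (EFO:0001663) to the underscore
# form (EFO_0001663) before passing it to gwasrapidd.
normalize_efo_id <- function(x) sub(":", "_", x, fixed = TRUE)

normalize_efo_id("EFO:0001663")   # "EFO_0001663"
# gwasrapidd::get_associations(efo_id = normalize_efo_id("EFO:0001663"))
```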

Gwascatcollect<-function(gene, chr=xx, start=xx, end=xx)

Hi Ramiro,

I am wondering whether there is a function to collect GWAS records for a given gene.

like: Gwascatcollect<-function(gene, chr=xx, start=xx, end=xx)

Thanks.

Shicheng

NCBI build conversion

There is no information about the genome build of the GWAS Catalog data, which is hg38.
Does your package support liftover conversion to, for example, hg19?

Is it possible to add additional gene information?

Hi Ramiro, Thanks for this amazing package.

Would it be possible to add the gene biotype to the associations object?

For example, GATA3 is of protein_coding type. Source: https://www.ebi.ac.uk/gwas/genes/GATA3

Thanks.

Retrieve all genetic associations searching on reported trait and efo simultaneously

I'd like to retrieve genetic associations by searching on reported trait and EFO at the same time (i.e., all genetic associations that match either the reported trait or the EFO). However, get_associations() does not allow searching on reported trait. A workaround is to first search with get_studies() using the reported trait (which is invariant within a GWAS Catalog study ID) and then retrieve the genetic associations using the study ID. However, this strategy cannot be used to identify associations for EFOs, because the EFO is not necessarily invariant within a GWAS Catalog study ID. Therefore, it seems that to identify genetic associations for both reported trait and EFO you have to search in the following three steps:

e.g.
Step 1. Identify genetic associations for reported trait
trait="Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)"
gwas_studies<-gwasrapidd::get_studies(reported_trait=trait)
gwas_associations1<-gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id)

Step 2. Identify genetic associations for EFO
efo="arachidonic acid measurement"
gwas_associations2<-gwasrapidd::get_associations(efo_trait=efo)

Step 3. Combine genetic associations from steps 1 and 2
association_ids<-unique(c(gwas_associations1@associations$association_id,gwas_associations2@associations$association_id))
gwas_associations3<-gwasrapidd::get_associations(association_id=association_ids)

Or alternatively
gwas_associations3<-unique(rbind(gwas_associations2@associations,gwas_associations1@associations))

Do you agree this is the best way to retrieve genetic associations for reported trait and efo?

Problems with ensembl_id

Hi,

This seems like a great package, but I haven't been able to use it. Installing the package directly didn't work (because the LEGACY variable in the unnest function was being set to TRUE), so I manually sourced all the files and corrected this part in utils.R. However, the functions still don't run. I keep getting an error of the form:
"Column ensembl_id must be length x or y, not z"
where x, y, z are numbers that change based on the argument (e.g. different gene_name in get_variants function).

What am I doing wrong?
Help needed!!

Fix documentation of `cytogenetic_bands`

gwasrapidd/R/data.R

Line 25 in d51d07a

#' @format A data frame with 862 rows and 7 variables:

should be 8 variables, not 7.

`get_studies()` not returning a scores object with efo id `"MONDO_0004648"`

> get_studies(efo_id = "MONDO_0004648")
# A tibble: 0 × 7
# ℹ 7 variables: study_id <chr>, pubmed_id <int>, publication_date <date>, publication <chr>, title <chr>, author_fullname <chr>, author_orcid <chr>

Bug in example

There seems to be a bug in the example in the GitHub README.md.
It all works until the following: variants <- get_variants(study_id = 'GCST002305')
Then I get the error:

Error: Tibble columns must have consistent lengths, only values of length one are recycled:
* Length 0: Columns `chromosome_name`, `chromosome_position`
* Length 15: Columns `gene_name`, `distance`, `is_mapped_gene`, `is_closest_gene`, `is_intergenic`, … (and 4 more)
Run `rlang::last_error()` to see where the error occurred.

How do I download a TSV file like the one on the GWAS Catalog website? Or is there a function that can do that?

How can I get these?
‘
variant_id p_value chromosome base_pair_location effect_allele other_allele effect_allele_frequency odds_ratio ci_lower ci_upper
rs888953847 0.9626 1 594445 T C 6e-04 0.985 0.522 1.857 -0.0152 0.3236 20301 21839 0 1 chr1:594445:C:T
rs1040232850 0.267 1 595762 CTG C 0.9986 0.779 0.502 1.210 -0.2494 0.2247 26798 28624 0 0.528 chr1:595762:CTG:C
rs1390538076 0.8214 1 630947 A G 3e-04 1.102 0.473 2.571 0.0975 0.432 20301 21839 0 1 chr1:630947:G:A
...
’
Thanks.

Regular expression in function is_study_id no longer works for newest studies

Newer studies have ID numbers with more than 6 digits. As of today, October 29, 2020, the highest number of digits in a study ID is 8. However, the regular expression checking whether a string is a study ID or not has 6 as the max number of digits in a study ID. This should be an easy fix, but a more robust workaround may be necessary in the future.

gwasrapidd/R/parse-utils.R

Line 241 in ca5957d

is_accession <- stringr::str_detect(str2, "^GCST\\d{6}$")
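A minimal sketch of a relaxed pattern: accept "GCST" followed by six or more digits, which covers both the older accessions and the newer eight-digit ones. It uses base R's grepl(); the same pattern string should also work with stringr::str_detect() as in the line above. The valid IDs are ones that appear in this README; is_study_accession() is a hypothetical name.

```r
# Hypothetical helper: TRUE for strings like GCST002305 or GCST90029052.
is_study_accession <- function(x) grepl("^GCST\\d{6,}$", x)

is_study_accession(c("GCST002305", "GCST90029052", "GCST123"))
# TRUE TRUE FALSE
```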

List of Variants to GWAS associations

Hi -- Is there a function where I can specify a list of variants and get back a list of traits those variants have been associated with in GWAS studies?
Thanks.
Liz

Extract other information using gwasrapidd

Hi,

Thanks a lot for developing the package. I am enjoying using the gwasrapidd to extract information from GWAS catalog API.

Can I define other columns to be extracted, or only the columns documented in class?studies (for example, besides the reported traits in the table, I also want to extract the curated traits and background traits shown in the web search results)?

Thank you for your help
Yue

"get_variants" query error

library(gwasrapidd)
ae_variants = get_variants(association_id = c("20171" ,   "20172"))

results in error:

Error: 'unnest_legacy' is not an exported object from 'namespace:tidyr'

Very Small P Values are Rounded to 0

A few of the studies in the GWAS catalog have p values that are smaller than 1e-300 and can even be as small as 1e-500 or 1e-600. When I use the "get_associations" function, these p values are rounded to 0.

Ex: for the study "GCST001884", rs10737680 and rs10490924 have p values of 1 x 10^-434 and 4 x 10^-89, respectively.
(Source: https://www.ebi.ac.uk/gwas/studies/GCST001884)

However, when I run the following command:
get_associations(study_id = "GCST001884")@associations[["pvalue"]]
These two SNPs have p values of 0e+00.

Is there anything that can be done to fix this?
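Doubles in R cannot represent values much below 1e-308 (subnormals reach about 4.9e-324), so 1 x 10^-434 necessarily becomes 0 once stored as a plain number. A sketch of one workaround, assuming the associations table also carries the p-value's mantissa and exponent as separate pvalue_mantissa and pvalue_exponent columns: work on the log10 scale instead. log10_pvalue() is a hypothetical helper.

```r
# Hypothetical helper: log10(p) from a p-value written as mantissa x 10^exponent,
# computed without ever forming p itself (which would underflow to 0).
log10_pvalue <- function(mantissa, exponent) log10(mantissa) + exponent

log10_pvalue(c(1, 4), c(-434, -89))   # about -434 and -88.4

# Usage (needs network access; column names are an assumption):
# a <- gwasrapidd::get_associations(study_id = "GCST001884")@associations
# log10_pvalue(a$pvalue_mantissa, a$pvalue_exponent)
```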

Error in get_studies

Hi there,

I found an issue happening during the parsing of the JSON object returned by the GWAS Catalog API. When a study has "diseaseTrait": null , the gwasrapidd::get_studies function raises the following error:

> Error in obj$content$studies$diseaseTrait$trait : 
> $ operator is invalid for atomic vectors

This function call reproduces the error:

gwasrapidd::get_studies(association_id = '62871909', verbose = T)
#> Base URL: https://www.ebi.ac.uk/gwas/rest/api.
#> Requesting resource: https://www.ebi.ac.uk/gwas/rest/api/associations/62871909/study.
#> Using the user agent: gwasrapidd: GWAS R API Data Download.
#> Response code: 200.
#> Response content type: application/json.
#> Error in obj$content$studies$diseaseTrait$trait : 
#>   $ operator is invalid for atomic vectors

I am running gwasrapidd 0.99.9 on R 4.0.3.

I guess a simple check on the JSON names (e.g. !is.null(obj$content$studies$diseaseTrait)) before accessing the data could work as a fix.

Update documentation of `exists_variant()`

Documentation of exists_variant is not up to date: https://rmagno.eu/gwasrapidd/reference/exists_variant.html. The second example in section Examples is no longer correct:

exists_variant('rs123456') # FALSE
#> rs123456 
#>     TRUE

It returns TRUE now. Just need to pick another made-up rsId to have an example that returns FALSE.

GRASP: Genome-Wide Repository of Associations Between SNPs and Phenotypes

Hi Ramiro,

Any plan to add GRASP to the package?

https://grasp.nhlbi.nih.gov/Overview.aspx

Thanks.

Shicheng
