Comments (7)
Hi @mzzclb
Thank you for reaching out.
Because I am also having trouble retrieving data from the GWAS Catalog, I can't check the issue you are reporting.
For the moment, check whether your problem might be related to this question: https://rmagno.eu/gwasrapidd/articles/faq.html#genomic-coordinates-of-genomic-contexts-seem-to-be-wrong.
Meanwhile I will check with the GWAS Catalog team why the server is not responding.
from gwasrapidd.
Thank you for replying.
What I mentioned is not really related to the topic at the link above.
I mean that a variant can have different gene clusters in genomic_context and ensembl_ids segments.
Could you examine the pdf file I added as an example? I created it from rmarkdown.
ensembl_ids-and-genomic_context-of-a-variant.pdf
Hi @mzzclb
The GWAS Catalog is running well again, so perhaps you could provide a specific example illustrating your question. I will try to answer nevertheless based on what you wrote.
The genomic_contexts table provides all Ensembl and RefSeq genes mapping within 50kb upstream and downstream of each GWAS Catalog variant.
A specific gene is typically associated with only one Ensembl identifier, but there are cases where it is associated with more than one, e.g. when a gene is located in the haplotypic MHC region (see discussion here). The ensembl_ids table provides that information.
Here is an example:
library(gwasrapidd)
my_variants <- get_variants(variant_id = "rs2269423")
print(my_variants@genomic_contexts, n = 20)
#> # A tibble: 200 × 12
#> variant_id gene_name chromosome_name chromosome_position distance
#> <chr> <chr> <chr> <int> <int>
#> 1 rs2269423 FKBPL 6 32177930 47642
#> 2 rs2269423 PPT2 6 32177930 14252
#> 3 rs2269423 TNXB 6 32177930 68592
#> 4 rs2269423 NOTCH4 6 32177930 16913
#> 5 rs2269423 RNA5SP206 6 32177930 99302
#> 6 rs2269423 RNA5SP206 6 32177930 99302
#> 7 rs2269423 TSBP1-AS1 6 32177930 76710
#> 8 rs2269423 PPT2-EGFL8 6 32177930 5952
#> 9 rs2269423 FKBPL 6 32177930 47642
#> 10 rs2269423 GPSM3 6 32177930 12836
#> 11 rs2269423 PBX2 6 32177930 6803
#> 12 rs2269423 MIR6721 6 32177930 7814
#> 13 rs2269423 ATF6B 6 32177930 49677
#> 14 rs2269423 EGFL8 6 32177930 9649
#> 15 rs2269423 NOTCH4 6 32177930 16913
#> 16 rs2269423 LOC100507547 6 32177930 23565
#> 17 rs2269423 TNXB 6 32177930 62596
#> 18 rs2269423 AGPAT1 6 32177930 0
#> 19 rs2269423 MIR6833 6 32177930 1886
#> 20 rs2269423 PPT2 6 32177930 14255
#> # ℹ 180 more rows
#> # ℹ 7 more variables: is_mapped_gene <lgl>, is_closest_gene <lgl>,
#> # is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> # mapping_method <chr>
print(my_variants@ensembl_ids, n = 20)
#> # A tibble: 77 × 3
#> variant_id gene_name ensembl_id
#> <chr> <chr> <chr>
#> 1 rs2269423 FKBPL ENSG00000224200
#> 2 rs2269423 FKBPL ENSG00000204315
#> 3 rs2269423 FKBPL ENSG00000223666
#> 4 rs2269423 FKBPL ENSG00000230907
#> 5 rs2269423 PPT2 ENSG00000228116
#> 6 rs2269423 PPT2 ENSG00000206329
#> 7 rs2269423 PPT2 ENSG00000168452
#> 8 rs2269423 PPT2 ENSG00000206256
#> 9 rs2269423 PPT2 ENSG00000236649
#> 10 rs2269423 PPT2 ENSG00000221988
#> 11 rs2269423 PPT2 ENSG00000231618
#> 12 rs2269423 TNXB ENSG00000168477
#> 13 rs2269423 TNXB ENSG00000236236
#> 14 rs2269423 TNXB ENSG00000206258
#> 15 rs2269423 TNXB ENSG00000229353
#> 16 rs2269423 TNXB ENSG00000233323
#> 17 rs2269423 TNXB ENSG00000231608
#> 18 rs2269423 NOTCH4 ENSG00000235396
#> 19 rs2269423 NOTCH4 ENSG00000223355
#> 20 rs2269423 NOTCH4 ENSG00000204301
#> # ℹ 57 more rows
Created on 2023-07-04 with reprex v2.0.2
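To make the "more than one Ensembl identifier per gene" point concrete, here is a minimal sketch in base R that counts distinct identifiers per gene name. The data frame is only an excerpt of the ensembl_ids rows printed above (FKBPL and NOTCH4), typed in by hand so the snippet runs without querying the GWAS Catalog. The object names `ensembl_ids_excerpt` and `ids_per_gene` are illustrative, not part of gwasrapidd.

```r
# Hand-copied excerpt of the ensembl_ids rows shown above (FKBPL and NOTCH4 only).
ensembl_ids_excerpt <- data.frame(
  variant_id = "rs2269423",
  gene_name  = c(rep("FKBPL", 4), rep("NOTCH4", 3)),
  ensembl_id = c(
    "ENSG00000224200", "ENSG00000204315", "ENSG00000223666", "ENSG00000230907",
    "ENSG00000235396", "ENSG00000223355", "ENSG00000204301"
  ),
  stringsAsFactors = FALSE
)

# Count the distinct Ensembl identifiers associated with each gene name.
ids_per_gene <- tapply(
  ensembl_ids_excerpt$ensembl_id,
  ensembl_ids_excerpt$gene_name,
  function(ids) length(unique(ids))
)

print(ids_per_gene)
```

With a live result, the same `tapply()` call should work on `my_variants@ensembl_ids` in place of the hand-made excerpt.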
Hi @ramiromagno,
Thank you for your time.
What I mentioned is not related to different Ensembl IDs being assigned to the same gene.
A variant can have different gene clusters in genomic_context and ensembl_ids segments.
Could you examine the code pasted below?
The genes of HCG23 and LOC105379657 are available in the ensembl_ids segment of the given variant although none of them is in the genomic_context segment.
library(gwasrapidd)
rs137931178 <- gwasrapidd::get_variants(variant_id = "rs137931178") # I have checked rs137931178 as an example
unique_genes_of_rs137931178_in_genomic_context <- unique(rs137931178@genomic_contexts$gene_name)
unique_genes_of_rs137931178_in_ensembl_ids <- unique(rs137931178@ensembl_ids$gene_name)
genes_of_genomic_context_of_rs137931178_not_in_ensembl_ids_rs137931178 <- setdiff(unique_genes_of_rs137931178_in_genomic_context, unique_genes_of_rs137931178_in_ensembl_ids)
print(genes_of_genomic_context_of_rs137931178_not_in_ensembl_ids_rs137931178)
# HCG23 and LOC105379657 are available in the ensembl_ids segment although none of them is in the genomic_context segment.
Why are some genes not included in the gene group in ensembl_ids segment of the variant?
Hi @mzzclb,
I think I understand your question now, although I also think you wrote the opposite of what you meant at a certain point. Please correct me if I'm wrong.
So, in principle, you can have more gene names in the genomic_contexts table than in the ensembl_ids table, but not the other way around. That is the case in your example: you have HCG23 and LOC105379657 in genomic_contexts but not in ensembl_ids. The reverse does not happen, i.e. you don't have a gene name showing up in ensembl_ids that is missing from genomic_contexts.
When you wrote:
The genes of HCG23 and LOC105379657 are available in the ensembl_ids segment of the given variant although none of them is in the genomic_context segment.
I think you meant the other way around, because HCG23 and LOC105379657 are available in the genomic_contexts table but not in ensembl_ids.
So why is it normal to have some gene names in the genomic_contexts table but not in the ensembl_ids table? Well, like I said earlier, the genomic_contexts table provides all Ensembl and RefSeq genes mapping within 50kb upstream and downstream of each GWAS Catalog variant. However, only Ensembl genes have associated Ensembl identifiers. So there are RefSeq genes that either go by another name in Ensembl or do not exist in Ensembl at all, and therefore do not have an associated Ensembl identifier. The two genes you report are examples of each of these cases:
- The RefSeq gene HCG23 is known as TSBP1-AS1 in Ensembl. Note that TSBP1-AS1 is present both in genomic_contexts and in ensembl_ids.
- The RefSeq gene LOC105379657 carries the kind of name the NCBI uses when a published symbol is not available, i.e. orthologs have not yet been determined, so the gene is given a symbol constructed as 'LOC' + the GeneID. Again, this gene name only makes sense in the context of the NCBI system, not Ensembl's, so it has no associated Ensembl identifier.
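The asymmetry described above can be checked by running `setdiff()` in both directions. The sketch below uses a small helper, `compare_gene_names()`, which is a hypothetical name and not a gwasrapidd function. The two toy tables are made up for illustration from the gene names discussed in this thread. They are not real query output.

```r
# Hypothetical helper: compare the gene names found in the two tables.
compare_gene_names <- function(genomic_contexts, ensembl_ids) {
  gc_genes  <- unique(genomic_contexts$gene_name)
  ens_genes <- unique(ensembl_ids$gene_name)
  list(
    only_in_genomic_contexts = setdiff(gc_genes, ens_genes),
    only_in_ensembl_ids      = setdiff(ens_genes, gc_genes)
  )
}

# Toy tables (illustrative only, NOT real GWAS Catalog output).
toy_genomic_contexts <- data.frame(
  gene_name = c("TSBP1-AS1", "HCG23", "LOC105379657", "TSBP1-AS1"),
  stringsAsFactors = FALSE
)
toy_ensembl_ids <- data.frame(
  gene_name = c("TSBP1-AS1"),
  stringsAsFactors = FALSE
)

res <- compare_gene_names(toy_genomic_contexts, toy_ensembl_ids)
print(res)
```

On a real object, passing `rs137931178@genomic_contexts` and `rs137931178@ensembl_ids` to the helper should list the RefSeq-only names in the first element, and the second element is expected to be empty.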
I hope this helps.
Thank you very much @ramiromagno
You're welcome!