
Comments (7)

ramiromagno commented on June 8, 2024

Hi @mzzclb

Thank you for reaching out.

Because I am also having trouble retrieving data from the GWAS Catalog I can't check the issue you are reporting.

For the moment, check whether your problem might be related to this question: https://rmagno.eu/gwasrapidd/articles/faq.html#genomic-coordinates-of-genomic-contexts-seem-to-be-wrong.

Meanwhile I will check with the GWAS Catalog team why the server is not responding.

from gwasrapidd.

mzzclb commented on June 8, 2024

Thank you for replying.

What I mentioned is not really related to the topic at the link above.

I mean that a variant can have different gene clusters in genomic_context and ensembl_ids segments.
Could you examine the pdf file I added as an example? I created it from rmarkdown.
ensembl_ids-and-genomic_context-of-a-variant.pdf


ramiromagno commented on June 8, 2024

Hi @mzzclb

The GWAS Catalog is running well again, so perhaps you could provide a specific example illustrating your question. I will try to answer nevertheless based on what you wrote.

The genomic_contexts table provides all Ensembl and RefSeq genes mapping within 50kb upstream and downstream of each GWAS Catalog variant.

A gene is typically associated with a single Ensembl identifier, but in some cases it is associated with more than one, e.g. when the gene is located in the haplotypic MHC region (see discussion here). The ensembl_ids table provides that information.

Here is an example:

library(gwasrapidd)

my_variants <- get_variants(variant_id = "rs2269423")

print(my_variants@genomic_contexts, n = 20)
#> # A tibble: 200 × 12
#>    variant_id gene_name    chromosome_name chromosome_position distance
#>    <chr>      <chr>        <chr>                         <int>    <int>
#>  1 rs2269423  FKBPL        6                          32177930    47642
#>  2 rs2269423  PPT2         6                          32177930    14252
#>  3 rs2269423  TNXB         6                          32177930    68592
#>  4 rs2269423  NOTCH4       6                          32177930    16913
#>  5 rs2269423  RNA5SP206    6                          32177930    99302
#>  6 rs2269423  RNA5SP206    6                          32177930    99302
#>  7 rs2269423  TSBP1-AS1    6                          32177930    76710
#>  8 rs2269423  PPT2-EGFL8   6                          32177930     5952
#>  9 rs2269423  FKBPL        6                          32177930    47642
#> 10 rs2269423  GPSM3        6                          32177930    12836
#> 11 rs2269423  PBX2         6                          32177930     6803
#> 12 rs2269423  MIR6721      6                          32177930     7814
#> 13 rs2269423  ATF6B        6                          32177930    49677
#> 14 rs2269423  EGFL8        6                          32177930     9649
#> 15 rs2269423  NOTCH4       6                          32177930    16913
#> 16 rs2269423  LOC100507547 6                          32177930    23565
#> 17 rs2269423  TNXB         6                          32177930    62596
#> 18 rs2269423  AGPAT1       6                          32177930        0
#> 19 rs2269423  MIR6833      6                          32177930     1886
#> 20 rs2269423  PPT2         6                          32177930    14255
#> # ℹ 180 more rows
#> # ℹ 7 more variables: is_mapped_gene <lgl>, is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
print(my_variants@ensembl_ids, n = 20)
#> # A tibble: 77 × 3
#>    variant_id gene_name ensembl_id     
#>    <chr>      <chr>     <chr>          
#>  1 rs2269423  FKBPL     ENSG00000224200
#>  2 rs2269423  FKBPL     ENSG00000204315
#>  3 rs2269423  FKBPL     ENSG00000223666
#>  4 rs2269423  FKBPL     ENSG00000230907
#>  5 rs2269423  PPT2      ENSG00000228116
#>  6 rs2269423  PPT2      ENSG00000206329
#>  7 rs2269423  PPT2      ENSG00000168452
#>  8 rs2269423  PPT2      ENSG00000206256
#>  9 rs2269423  PPT2      ENSG00000236649
#> 10 rs2269423  PPT2      ENSG00000221988
#> 11 rs2269423  PPT2      ENSG00000231618
#> 12 rs2269423  TNXB      ENSG00000168477
#> 13 rs2269423  TNXB      ENSG00000236236
#> 14 rs2269423  TNXB      ENSG00000206258
#> 15 rs2269423  TNXB      ENSG00000229353
#> 16 rs2269423  TNXB      ENSG00000233323
#> 17 rs2269423  TNXB      ENSG00000231608
#> 18 rs2269423  NOTCH4    ENSG00000235396
#> 19 rs2269423  NOTCH4    ENSG00000223355
#> 20 rs2269423  NOTCH4    ENSG00000204301
#> # ℹ 57 more rows

Created on 2023-07-04 with reprex v2.0.2
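
To see at a glance which genes carry more than one Ensembl identifier, the rows of ensembl_ids can be tallied per gene name. The following is only a minimal sketch in base R: it hard-codes the first 20 rows printed above into a plain data frame (an assumption made so that it runs without querying the GWAS Catalog), so the counts cover those 20 rows only, not the full 77-row table.

```r
# First 20 rows of the ensembl_ids output shown above, hard-coded for
# illustration. With live data you would use my_variants@ensembl_ids instead.
ensembl_ids <- data.frame(
  gene_name = rep(c("FKBPL", "PPT2", "TNXB", "NOTCH4"), times = c(4, 7, 6, 3)),
  ensembl_id = c(
    "ENSG00000224200", "ENSG00000204315", "ENSG00000223666", "ENSG00000230907",
    "ENSG00000228116", "ENSG00000206329", "ENSG00000168452", "ENSG00000206256",
    "ENSG00000236649", "ENSG00000221988", "ENSG00000231618",
    "ENSG00000168477", "ENSG00000236236", "ENSG00000206258", "ENSG00000229353",
    "ENSG00000233323", "ENSG00000231608",
    "ENSG00000235396", "ENSG00000223355", "ENSG00000204301"
  ),
  stringsAsFactors = FALSE
)

# Count the distinct Ensembl identifiers attached to each gene name.
ids_per_gene <- tapply(
  ensembl_ids$ensembl_id,
  ensembl_ids$gene_name,
  function(ids) length(unique(ids))
)

print(ids_per_gene)
```

In this subset every gene name maps to several Ensembl identifiers, which is what one would expect for genes in the MHC region.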


mzzclb commented on June 8, 2024

Hi @ramiromagno,

Thank you for your time.

What I mentioned is not related to different Ensembl IDs being assigned to the same gene.

A variant can have different gene clusters in genomic_context and ensembl_ids segments.
Could you examine the code pasted below?

The genes of HCG23 and LOC105379657 are available in the ensembl_ids segment of the given variant although none of them is in the genomic_context segment.

library(gwasrapidd)

rs137931178 <- gwasrapidd::get_variants(variant_id = "rs137931178")
# I have checked rs13793117 as an example

unique_genes_of_rs137931178_in_genomic_context <- unique(rs137931178@genomic_contexts$gene_name)
unique_genes_of_rs137931178_in_ensembl_ids <- unique(rs137931178@ensembl_ids$gene_name)

genes_of_genomic_context_of_rs137931178_not_in_ensembl_ids_rs137931178 <- setdiff(unique_genes_of_rs137931178_in_genomic_context, unique_genes_of_rs137931178_in_ensembl_ids)

print(genes_of_genomic_context_of_rs137931178_not_in_ensembl_ids_rs137931178)
# HCG23 and LOC105379657 are available in the ensembl_ids segment although none of them is in the genomic_context segment.

Why are some genes not included in the gene group in ensembl_ids segment of the variant?


ramiromagno commented on June 8, 2024

Hi @mzzclb,

I think I understand your question now, although I also think you've written the opposite of what you meant at a certain point. Please correct me if I'm wrong.

So, in principle, you can have more gene names in the genomic_contexts table than in the ensembl_ids table, but not the other way around. That is the case in your example: you have HCG23 and LOC105379657 in genomic_contexts but not in ensembl_ids. The reverse does not happen, i.e. you don't have a gene name showing up in ensembl_ids that is missing from genomic_contexts.

When you wrote:

The genes of HCG23 and LOC105379657 are available in the ensembl_ids segment of the given variant although none of them is in the genomic_context segment.

I think you meant the other way around because HCG23 and LOC105379657 are available in the genomic_contexts table but not in ensembl_ids.

So why is it normal to have some gene names in the genomic_contexts table but not in the ensembl_ids table? Well, like I said earlier, the genomic_contexts table provides all Ensembl and RefSeq genes mapping within 50kb upstream and downstream of each GWAS Catalog variant. However, only Ensembl genes have associated Ensembl identifiers. So there are RefSeq genes that either go by another name in Ensembl or do not exist in Ensembl at all, and therefore have no associated Ensembl identifier. The two genes you report are examples of each of these cases:

  1. The RefSeq gene HCG23 is known as TSBP1-AS1 in Ensembl. Note that TSBP1-AS1 is present both in genomic_contexts and in ensembl_ids.
  2. The RefSeq gene LOC105379657 carries the kind of name the NCBI uses when a published symbol is not available and orthologs have not yet been determined: in that case NCBI Gene provides a symbol constructed as 'LOC' + the GeneID. Again, this gene name only makes sense in the context of the NCBI system, not Ensembl's, so it has no associated Ensembl identifier.
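
The direction of the comparison matters here, because setdiff(x, y) returns the elements of x that are absent from y. The sketch below uses a small hand-made subset of gene names taken from this thread (an assumption for illustration, not the actual output for rs137931178) to show both directions in base R:

```r
# Hand-made subset of gene names discussed in this thread (illustrative only).
# With live data these would come from unique(<variants>@genomic_contexts$gene_name)
# and unique(<variants>@ensembl_ids$gene_name).
genomic_context_genes <- c("TSBP1-AS1", "HCG23", "LOC105379657")
ensembl_id_genes <- c("TSBP1-AS1")

# Gene names present in genomic_contexts but missing from ensembl_ids.
only_in_genomic_contexts <- setdiff(genomic_context_genes, ensembl_id_genes)

# Gene names present in ensembl_ids but missing from genomic_contexts.
only_in_ensembl_ids <- setdiff(ensembl_id_genes, genomic_context_genes)

print(only_in_genomic_contexts)
#> [1] "HCG23"        "LOC105379657"
print(only_in_ensembl_ids)
#> character(0)
```

The first difference lists the RefSeq-only names, while the second is empty, matching the expectation that no gene name appears in ensembl_ids without also appearing in genomic_contexts.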

I hope this helps.


mzzclb commented on June 8, 2024

Thank you very much @ramiromagno


ramiromagno commented on June 8, 2024

You're welcome!
