bioconductor / annotationhub
Client for the Bioconductor AnnotationHub web resource
I'd like to suggest adding:
BugReports: https://github.com/Bioconductor/AnnotationHub/issues
to the DESCRIPTION file, to make it easier to find this page. The only contact information on http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html is the maintainer's email address.
Installing and loading the package seems to be OK, but I cannot use it at all. I even opened a new R session and the error is the same; please help. Below are the error and the session information.
AnnotationHub()
Error in overscope_eval_next(overscope, x) :
could not find function "overscope_eval_next"
Error in overscope_clean(overscope) :
could not find function "overscope_clean"
BiocManager::valid(pkgs = "AnnotationHub")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
[1] TRUE
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationHub_2.18.0 BiocFileCache_1.10.2 dbplyr_1.4.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 later_0.8.0 pillar_1.3.1
[4] compiler_3.6.0 BiocManager_1.30.18 tools_3.6.0
[7] digest_0.6.18 bit_4.0.4 RSQLite_2.2.14
[10] memoise_2.0.1 tibble_2.1.1 pkgconfig_2.0.2
[13] rlang_1.0.2 shiny_1.3.2 DBI_1.1.2
[16] cli_3.3.0 rstudioapi_0.10 curl_3.3
[19] yaml_2.2.0 fastmap_1.1.0 dplyr_0.8.0.1
[22] httr_1.4.0 IRanges_2.20.2 vctrs_0.4.1
[25] S4Vectors_0.24.4 rappdirs_0.3.1 stats4_3.6.0
[28] bit64_4.0.5 tidyselect_0.2.5 Biobase_2.46.0
[31] glue_1.6.2 R6_2.4.0 AnnotationDbi_1.48.0
[34] purrr_0.3.2 blob_1.2.3 magrittr_1.5
[37] promises_1.0.1 htmltools_0.3.6 assertthat_0.2.1
[40] xtable_1.8-4 mime_0.6 interactiveDisplayBase_1.24.0
[43] httpuv_1.5.1 cachem_1.0.6 crayon_1.3.4
[46] BiocVersion_3.10.1
As a follow-up to this question: while I was able to replicate the analysis using the same versions of R and Bioconductor used at the time, I would like to understand the source of the discrepancies between using the AH107424 annotation file in 2022 and AH114250 in 2023.
The issue is that I do not get the same GO terms through enrichGO with the two annotation files. Since the AH114250 annotation file is quite recent, is it a question of the GO term annotation 'catching up'?
Plotted in 2022 using the older Atlantic salmon annotation file. Downregulated (A) vs. upregulated (B).
Plotted in 2024 using the more recent Atlantic salmon annotation file.
The help page for setAnnotationHubOption() mentions setHubOption(), which does not exist.
Would there be any way to move interactiveDisplayBase to Suggests? Installing shiny and all of its dependencies seems like overkill just to access an annotation web service.
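The usual R pattern for an optional dependency is to list it under Suggests in DESCRIPTION and check for it at run time with requireNamespace(), so that only the function that needs it (here, display()) fails when it is missing. A minimal sketch of that pattern follows; check_suggested() and display_sketch() are hypothetical names, not part of AnnotationHub.

```r
# Hypothetical helper: stop with a clear message when an optional
# (Suggests) package is missing, instead of requiring it at install time.
check_suggested <- function(pkg, feature) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("%s requires the '%s' package; install it with BiocManager::install(\"%s\")",
                 feature, pkg, pkg), call. = FALSE)
  }
  invisible(TRUE)
}

# Sketch of a display()-like wrapper that only needs the package when called
display_sketch <- function(x) {
  check_suggested("interactiveDisplayBase", "display()")
  interactiveDisplayBase::display(x)
}
```

With this arrangement, shiny and its dependencies would only be needed by users who actually call display().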
Hi there,
I thought I had fixed this issue when I updated AnnotationHub and Bioconductor, but it came back. When I try to use select(), I get the following error:
Error: 'select' is not an exported object from 'namespace:AnnotationHub'
We use a proxy for internet access and set the environment variables http_proxy and https_proxy. That works for curl and httr to fetch data without any errors (e.g. httr::GET('https://www.google.com') returns a 200 OK). However, curl::has_internet(), called in AnnotationHub/R/AnnotationHub-class.R (line 20 in c5c464e), still returns FALSE, because it uses nslookup to resolve a random address, which won't work behind the proxy. If I understand this correctly, this means that at the moment it's not possible to use AnnotationHub behind a proxy.
If I manually step around the call to curl::has_internet() with
ah <- AnnotationHub::.Hub("AnnotationHub", AnnotationHub::getAnnotationHubOption("URL"),
    AnnotationHub::getAnnotationHubOption("CACHE"), httr::use_proxy(Sys.getenv("http_proxy")),
    FALSE)
the resulting AnnotationHub object works as expected. Maybe AnnotationHub could skip the curl::has_internet() check if a proxy is specified and instead functionally test the ability to fetch data?
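The suggestion above (skip the DNS-based check when a proxy is configured) could be sketched roughly as follows. This assumes the proxy is taken from the conventional http_proxy/https_proxy environment variables; proxy_from_env() and connectivity_hint() are hypothetical helpers, not AnnotationHub functions, and the dns_check argument exists only so the logic can be tested offline.

```r
# Hypothetical helper: return the first proxy URL found in the
# conventional environment variables, or NULL if none is set.
proxy_from_env <- function(vars = c("https_proxy", "HTTPS_PROXY",
                                    "http_proxy", "HTTP_PROXY")) {
  for (v in vars) {
    val <- Sys.getenv(v, unset = "")
    if (nzchar(val)) return(val)
  }
  NULL
}

# Hypothetical decision logic: behind a proxy, name resolution may fail
# even though HTTP works, so the DNS check is skipped and the real fetch
# is allowed to try; otherwise the DNS-based check is used as today.
connectivity_hint <- function(proxy = proxy_from_env(),
                              dns_check = function() curl::has_internet()) {
  if (!is.null(proxy)) return("try-online")
  if (isTRUE(dns_check())) "online" else "offline"
}
```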
Regarding AnnotationHub/vignettes/AnnotationHub.Rmd (line 235 in 09fb82f): I can't get that to work, although the vignette exists on the landing page.
We have HPC hosts that have working HTTP/HTTPS proxies, but no support for nslookup. This causes AnnotationHub() to incorrectly conclude that it has no access to the server. If the internal nslookup test could be skipped, it would have worked.
$ R --vanilla
> hub <- AnnotationHub::AnnotationHub()
Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
/wynton/home/cbi/hb/.cache/AnnotationHub
does not exist, create directory? (yes/no): no
This is because AnnotationHub::AnnotationHub() uses:
> curl::nslookup("annotationhub.bioconductor.org")
Error in curl::nslookup("annotationhub.bioconductor.org") :
Unable to resolve host: annotationhub.bioconductor.org
to test whether it can connect to that server. However, a non-nslookup connection test shows that it works:
> readLines(curl::curl("https://annotationhub.bioconductor.org"), n = 5L)
[1] "<html>"
[2] "<head>"
[3] " <title>BiocHub Server API</title>"
[4] "</head>"
[5] "<body>"
Further proof that curl::nslookup() is too conservative a test is to override its result, e.g.:
trace(AnnotationHub::AnnotationHub, at = 3L, tracer = quote(connect <- TRUE))
# Tracing function "AnnotationHub" in package "AnnotationHub"
# [1] "AnnotationHub"
hub <- AnnotationHub::AnnotationHub()
# Tracing AnnotationHub::AnnotationHub() step 3
# Testing for internet connectivity via https_proxy... success!
# snapshotDate(): 2020-04-27
Calling the following first works around the current AnnotationHub() limitation:
AnnotationHub::setAnnotationHubOption("PROXY", Sys.getenv("https_proxy"))
Since it's not unheard of for access to nslookup to be restricted on some compute environments, I'd like to suggest using another approach, e.g. the curl::curl() approach above, or something that works like curl --head ... and checks the return status. The latter could even be a fallback to the current curl::nslookup() test.
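A connectivity test along these lines (DNS first, then an HTTP HEAD request whose status code is checked) might be sketched as below. It uses curl::nslookup() with error = FALSE, as the current code does, and curl::curl_fetch_memory() with the libcurl nobody option, which issues a HEAD request like curl --head; libcurl typically honours the http_proxy/https_proxy variables. The name hub_reachable() and the injectable dns/http_status arguments are hypothetical, added so the logic can be exercised offline.

```r
# Hypothetical connectivity check: DNS lookup first, then an HTTP HEAD
# request as a fallback for hosts where nslookup is unavailable but a
# proxy still allows HTTP(S) traffic.
hub_reachable <- function(host = "annotationhub.bioconductor.org",
                          dns = function(h) !is.null(curl::nslookup(h, error = FALSE)),
                          http_status = function(url) {
                            h <- curl::new_handle(nobody = TRUE)  # HEAD request
                            curl::curl_fetch_memory(url, handle = h)$status_code
                          }) {
  # Any DNS failure (including errors) just moves on to the HTTP probe
  if (isTRUE(tryCatch(dns(host), error = function(e) FALSE)))
    return(TRUE)
  status <- tryCatch(http_status(paste0("https://", host)),
                     error = function(e) NA_integer_)
  # Treat 2xx and 3xx responses as "reachable"
  !is.na(status) && status >= 200 && status < 400
}
```

Keeping nslookup as the first step preserves the current behaviour where DNS works, and only hosts like the HPC nodes described here would pay for the extra HTTP round trip.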
Even without changing the current approach, it would be neat if one could skip the test and just let it try. One natural approach would be to support:
hub <- AnnotationHub::AnnotationHub(proxy=TRUE)
by updating the code to:
if (is.null(proxy)) {
    connect <- !is.null(curl::nslookup("annotationhub.bioconductor.org",
                                       error = FALSE))
} else if (isTRUE(proxy)) {
    # proxy = TRUE: skip the nslookup test and just try to connect
    connect <- TRUE
    proxy <- NULL
} else {
    connect <- TRUE
    message("Assuming valid proxy connection through '",
            ifelse(is(proxy, "request"), paste(unlist(proxy), collapse = ":"), proxy),
            "'", "\n If you experience connection issues consider ",
            "using 'localHub=TRUE'")
}
> packageVersion("AnnotationHub")
[1] ‘2.20.2’
I just wanted to point out that using clusterProfiler with OrgDb objects is not ideal for less well-annotated species. This applies to OrgDb objects that come from AnnotationHub.
This includes rice, for example. The issue is that the OrgDb lacks mappings from Entrez IDs to GO terms: ~75% of the input Entrez IDs do not map to GO terms through this method.
Since the OrgDb object does not have an Ensembl keytype, I was forced to translate from Ensembl to Entrez using biomaRt. This also loses some IDs.
A direct translation from Ensembl to GO terms leads to only ~39% non-mapping genes.
I am unaware of a method to update OrgDb objects with, for example, new keytypes, but I need to look into it, as this clusterProfiler method for GSEA is unusable for less well-annotated species.
I have not tried creating an OrgDb from NCBI, but I would not recommend using AnnotationHub for anything other than Arabidopsis/human.
Well, troubleshooting is not supposed to be fun anyway. However, I think in this case it could be made a little easier. First I have to say that once I figured out exactly which documentation I had to look at, the documentation was very helpful and allowed me to quickly fix my cache corruption problem. So thanks for the great documentation!
The issue I was troubleshooting is the following cache corruption problem:
library(ExperimentHub)
eh <- ExperimentHub()
# snapshotDate(): 2020-02-26
bis_1072 <- eh[["EH1072"]] # warning: this is big! (2.9G)
# see ?tissueTreg and browseVignettes('tissueTreg') for documentation
# Error: failed to load resource
# name: EH1072
# title: Bisulfite sequencing data from tissue Tregs (per sample)
# reason: Corrupt Cache: resource path
# See vignette section on corrupt cache
# cache: /home/hpages/.cache/ExperimentHub
# potential duplicate files:
# 36a232eda19_1072
# 36a4e0f9ec3_1072
While I was troubleshooting it, I ran into a few minor rough edges that I thought it might be worth sharing here.
1. The error message is too vague
The recommendation to "See vignette section on corrupt cache" is too vague and it took me a while to actually find the right vignette. Naively, since I'm using ExperimentHub, I would assume that I need to look at the ExperimentHub vignette. But the "Access the ExperimentHub Web Service" vignette in ExperimentHub doesn't say anything about "corrupt cache".
My next guess was that the error message actually comes from BiocFileCache, so I should probably look at the BiocFileCache vignette. (Note that I'm only able to make that guess because I know that ExperimentHub uses BiocFileCache behind the scenes, but most users don't know that.) No such luck: the BiocFileCache vignette doesn't say anything about "corrupt cache" either.
So I cheated (because I can, but most users won't be able to do this): I grepped the full Bioconductor source code for the "See vignette section on corrupt cache" pattern (I have local git clones of all software packages on my laptop) and found occurrences of the pattern in AnnotationHub code. Ah, I remember now that ExperimentHub also relies on AnnotationHub so it can reuse some of the functionality implemented there. Let's try browseVignettes("AnnotationHub"). Oops, it's showing a list of 4 vignettes! Not sure which one I'm supposed to look at. After opening the "Access the AnnotationHub Web Service" vignette and realizing that it says nothing about "corrupt cache" either, I finally figured out that the vignette I am supposed to look at is the "Troubleshoot The Hubs" vignette in AnnotationHub.
Could the error message say that?
2. Use bfccache(bfc) instead of hubCache(ah) at the end of section 2.2.3
I would suggest using bfccache(bfc) instead of hubCache(ah) at the end of the "2.2.3 resource path" subsection. Someone who jumps directly into that section (which is what I did) doesn't have an ah object around, but they do have the bfc object and it's pointing to the right cache location (there is no such guarantee with the ah object, and in my case it would actually be the wrong place to remove files from).
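For a cache in the state shown above (two files for the same resource), locating the duplicates amounts to finding entries that share an rname. The sketch below does this on a toy table shaped like the output of BiocFileCache's bfcinfo(), which includes rid, rname and rpath columns; all values are invented for illustration, and find_duplicate_resources() is a hypothetical helper. On a real cache the table would come from bfcinfo(bfc), and the unwanted rows would be removed with bfcremove(bfc, rids) after deciding which copy to keep.

```r
# Toy stand-in for bfcinfo(bfc); values are illustrative only.
info <- data.frame(
  rid   = c("BFC89", "BFC90", "BFC91"),
  rname = c("EH1071", "EH1072", "EH1072"),
  rpath = c("/cache/36a1_1071", "/cache/36a232eda19_1072", "/cache/36a4e0f9ec3_1072"),
  stringsAsFactors = FALSE
)

# Flag every row whose rname occurs more than once (both copies, not
# just the second one), so the user can choose which to keep.
find_duplicate_resources <- function(info) {
  dup <- duplicated(info$rname) | duplicated(info$rname, fromLast = TRUE)
  info[dup, c("rid", "rname", "rpath")]
}

dups <- find_duplicate_resources(info)
dups$rpath
```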
3. Why enforce the dplyr dialect on the reader?
Especially when you can just do
bfcinfo(bfc, rid="BFC90")$rpath # universally understood AND less typing
instead of
bfcinfo(bfc, rid="BFC90") %>% dplyr::select(rpath) # hard-to-read dialect (except
# for dplyr fans) AND more typing
Furthermore, the first time I tried it, the dplyr dialect gave me an error:
> bfcinfo(bfc, rid="BFC90") %>% dplyr::select(rpath)
Error in bfcinfo(bfc, rid = "BFC90") %>% dplyr::select(rpath) :
could not find function "%>%"
but that's my fault for jumping directly into section 2.2.3.
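For readers who jump straight into that subsection, the base R forms need no extra packages, whereas %>% only exists after attaching dplyr or magrittr (hence the "could not find function" error). R 4.1 and later also ship a native |> pipe. In this sketch a toy data frame stands in for the bfcinfo() result; its values are invented.

```r
# Toy stand-in for bfcinfo(bfc, rid = "BFC90"); values are illustrative.
info <- data.frame(rid = "BFC90", rpath = "/cache/36a232eda19_1072",
                   stringsAsFactors = FALSE)

# Base R: no packages needed, returns a plain character vector.
p1 <- info$rpath
p2 <- info[["rpath"]]

# dplyr::select() returns a one-column table rather than a vector;
# dplyr::pull() is the closer equivalent of `$`, e.g.
#   bfcinfo(bfc, rid = "BFC90") |> dplyr::pull(rpath)
```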
Thanks!
sessionInfo():
R Under development (unstable) (2019-10-30 r77336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /home/hpages/R/R-4.0.r77336/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r77336/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dplyr_0.8.4 tissueTreg_1.7.0 ExperimentHub_1.13.5
[4] AnnotationHub_2.19.7 BiocFileCache_1.11.4 dbplyr_1.4.2
[7] BiocGenerics_0.33.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 pillar_1.4.3
[3] compiler_4.0.0 BiocManager_1.30.10
[5] later_1.0.0 tools_4.0.0
[7] digest_0.6.25 bit_1.1-15.2
[9] RSQLite_2.2.0 memoise_1.1.0
[11] tibble_2.1.3 pkgconfig_2.0.3
[13] rlang_0.4.4.9001 cli_2.0.1
[15] shiny_1.4.0 DBI_1.1.0
[17] curl_4.3 yaml_2.2.1
[19] fastmap_1.0.1 httr_1.4.1
[21] IRanges_2.21.3 vctrs_0.2.3
[23] S4Vectors_0.25.12 rappdirs_0.3.1
[25] stats4_4.0.0 bit64_0.9-7
[27] tidyselect_1.0.0 Biobase_2.47.2
[29] glue_1.3.1 R6_2.4.1
[31] fansi_0.4.1 AnnotationDbi_1.49.1
[33] tcltk_4.0.0 purrr_0.3.3
[35] blob_1.2.1 magrittr_1.5
[37] promises_1.1.0 htmltools_0.4.0
[39] assertthat_0.2.1 mime_0.9
[41] interactiveDisplayBase_1.25.0 xtable_1.8-4
[43] httpuv_1.5.2 utf8_1.1.4
[45] crayon_1.3.4 BiocVersion_3.11.1
myhub = AnnotationHub()
snapshotDate(): 2021-05-18
getInfoOnIds(myhub, "AH72154")
myhub_id fetch_id title rdataclass status biocversion rdatadateadded rdatadateremoved
288111 AH72154 78900 org.Salmo_salar.eg.sqlite OrgDb Public 3.9 2019-05-02 NA
file_size
288111 161341440
myhub[["AH72154"]]
Error: Public
Hiya, the database is present, as can be seen above, but I'm not sure what this error message means.
Hi, I'm trying to cache my data before working with sesame in R.
I'm using the sesameDataCache() command,
but I get this error:
Error in evaluating the argument 'x' in selecting a method for function 'query': Failed to collect lazy table.
Caused by error in `db_collect()`:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?
I have tried to update all packages related, but it does not seem to work.
Any ideas as to how I can get this to work?
Thanks
Hi AnnotationHub Team,
thank you for providing such easy access to all of this annotation data! I can't seem to find some fairly basic annotations:
(CpG islands)
While browsing AnnotationHub, I noticed that the GRanges annotation for CpG islands is listed for hg19 but not for hg38.
See:
ah <- AnnotationHub()
q <- query(ah, c("GRanges", "Homo sapiens", "CpG"))
mcols(q)
(Promoters)
Also the Ensembl regulatory build seems to be missing.
(https://www.ensembl.org/info/genome/funcgen/regulatory_build.html)
(TFBS)
Now, while we are at it, there is a very good TFBS annotation from JASPAR which I could not find:
http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/
Am I looking in the wrong place? If not, who should I contact to get these tracks added? (I'm also offering my help if needed.)
Best,
Sven
Running on up-to-date installation:
library(AnnotationHub)
ah <- AnnotationHub()
d <- display(ah)
This results in a pop-up with the error:
DataTables warning: table id=DataTables_Table_0 - Requested unknown parameter '6' for row 0. For more information about this error, please see http://datatables.net/tn/4
and the table displayed is completely garbled:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] shiny_1.4.0 AnnotationHub_2.17.10 BiocFileCache_1.9.1
[4] dbplyr_1.4.2 BiocGenerics_0.31.6
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 pillar_1.4.2
[3] compiler_3.6.1 BiocManager_1.30.7
[5] later_1.0.0 tools_3.6.1
[7] zeallot_0.1.0 digest_0.6.21
[9] bit_1.1-14 jsonlite_1.6
[11] RSQLite_2.1.2 memoise_1.1.0
[13] tibble_2.1.3 pkgconfig_2.0.3
[15] rlang_0.4.0 DBI_1.0.0
[17] curl_4.2 yaml_2.2.0
[19] fastmap_1.0.1 dplyr_0.8.3
[21] httr_1.4.1 IRanges_2.19.17
[23] vctrs_0.2.0 S4Vectors_0.23.25
[25] rappdirs_0.3.1 stats4_3.6.1
[27] bit64_0.9-7 tidyselect_0.2.5
[29] Biobase_2.45.1 glue_1.3.1
[31] R6_2.4.0 AnnotationDbi_1.47.1
[33] purrr_0.3.2 blob_1.2.0
[35] magrittr_1.5 backports_1.1.5
[37] promises_1.1.0 htmltools_0.4.0
[39] assertthat_0.2.1 xtable_1.8-4
[41] mime_0.7 interactiveDisplayBase_1.23.0
[43] httpuv_1.5.2 crayon_1.3.4
Running
ah <- AnnotationHub()
gives the error:
filter_() is deprecated as of dplyr 0.7.0
Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
In the current 'devel' version of AnnotationHub, record "AH75194" exists:
> hub = AnnotationHub()
snapshotDate(): 2019-10-25
> hub[["AH75194"]]
downloading 0 resources
loading from cache
AH75194 : 81940
"/Users/ma38727/Library/Caches/AnnotationHub/d90d285daf8e_81940"
but trying to access it from a snapshotDate() before it was available gives a cryptic message:
> snapshotDate(hub) <- "2019-05-02"
> hub[["AH75194"]]
Error: Public
Also, I'm not sure why we're told "downloading 0 resources", or shown the internal information "AH75194 : 81940", which makes it hard to know from the returned object what the original, user-facing AH id was (i.e., the name should just be AH75194).
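A clearer message here could come from comparing the record's rdatadateadded with the hub's snapshot date before attempting the fetch. The sketch below shows only that comparison, on plain values; check_in_snapshot() and its message text are hypothetical, and in practice the date would be taken from the record's metadata.

```r
# Hypothetical pre-fetch check: explain why a record is unavailable
# instead of failing with an opaque "Error: Public".
check_in_snapshot <- function(id, date_added, snapshot) {
  date_added <- as.Date(date_added)
  snapshot <- as.Date(snapshot)
  if (date_added > snapshot) {
    stop(sprintf(paste0("'%s' was added on %s, after the current snapshotDate() ",
                        "(%s); use a later snapshot to access it"),
                 id, format(date_added), format(snapshot)),
         call. = FALSE)
  }
  invisible(id)
}
```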
Error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Mutex creation failed
Aborted (core dumped)
Session info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 19.10
Matrix products: default
BLAS/LAPACK: /home/sangram/miniconda3/envs/sig/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocManager_1.30.10
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1
Hello! I am getting the error below with AnnotationHub for the Ensembl 112 EnsDb; the Ensembl 111 entry works fine. Can you help me?
AnnotationHub with 1 record
# snapshotDate(): 2024-04-30
# names(): AH116860
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2024-04-30
# $title: Ensembl 112 EnsDb for Homo sapiens
# $description: Gene and protein annotations for Homo sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("112", "Annotation", "AnnotationHubSoftware", "Coverage",
# "DataImport", "EnsDb", "Ensembl", "Gene", "Protein", "Sequencing",
# "Transcript")
# retrieve record with 'object[["AH116860"]]'
loading from cache
require("ensembldb")
Error: failed to load resource
name: AH116860
title: Ensembl 112 EnsDb for Homo sapiens
reason: file is not a database
In addition: Warning message:
Couldn't set synchronous mode: file is not a database
Use `synchronous` = NULL to turn off this warning.
Execution halted
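"file is not a database" (like the "Invalid Cache: sqlite file" error in the next report) suggests that the cached file is not a valid SQLite database, for example because a download was truncated or replaced by an error page. Every SQLite 3 database file begins with the 16-byte header "SQLite format 3" followed by a NUL byte, so a quick diagnostic is to check that header. The sketch below is a hypothetical base R helper; on a real cache, the path to check would come from the cache records (e.g. the rpath column of bfcinfo()).

```r
# Hypothetical diagnostic: does `path` start with the SQLite 3 magic header?
is_sqlite_file <- function(path) {
  if (!file.exists(path) || file.size(path) < 16) return(FALSE)
  magic <- c(charToRaw("SQLite format 3"), as.raw(0))  # 15 characters + NUL = 16 bytes
  con <- file(path, "rb")
  on.exit(close(con))
  identical(readBin(con, "raw", n = 16L), magic)
}
```

A FALSE result for a cached EnsDb or hub metadata file would point to a damaged cache entry rather than a problem in ensembldb itself.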
First command:
ah=AnnotationHub()
Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
Error in .updateHubDB(hub_bfc, .class, url, proxy, localHub) :
Invalid Cache: sqlite file
Hub has not been added to cache
Run again with 'localHub=FALSE'
Then:
ah=AnnotationHub(localHub=F)
Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
Error in .updateHubDB(hub_bfc, .class, url, proxy, localHub) :
Invalid Cache: sqlite file
Hub has not been added to cache
Run again with 'localHub=FALSE'
Session info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /wynton/home/cbi/shared/software/CBI/R-4.0.2/lib64/R/lib/libRblas.so
LAPACK: /wynton/home/cbi/shared/software/CBI/R-4.0.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationHub_2.20.2 BiocFileCache_1.12.1
[3] dbplyr_1.4.4 ensembldb_2.12.1
[5] AnnotationFilter_1.12.0 GenomicFeatures_1.40.1
[7] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[9] biomaRt_2.44.1 huex10stprobeset.db_8.7.0
[11] org.Hs.eg.db_3.11.4 AnnotationDbi_1.50.3
[13] IRanges_2.22.2 S4Vectors_0.26.1
[15] Biobase_2.48.0 BiocGenerics_0.34.0
[17] BiocManager_1.30.10 data.table_1.12.8
[19] gprofiler2_0.1.9
loaded via a namespace (and not attached):
[1] httr_1.4.2 tidyr_1.1.1
[3] bit64_4.0.2 jsonlite_1.6.1
[5] viridisLite_0.3.0 shiny_1.5.0
[7] assertthat_0.2.1 interactiveDisplayBase_1.26.3
[9] askpass_1.1 blob_1.2.1
[11] GenomeInfoDbData_1.2.3 Rsamtools_2.4.0
[13] yaml_2.2.1 progress_1.2.2
[15] BiocVersion_3.11.1 pillar_1.4.4
[17] RSQLite_2.2.0 lattice_0.20-41
[19] glue_1.4.1 digest_0.6.25
[21] promises_1.1.1 XVector_0.28.0
[23] colorspace_1.4-1 httpuv_1.5.4
[25] htmltools_0.5.0 Matrix_1.2-18
[27] XML_3.99-0.5 pkgconfig_2.0.3
[29] zlibbioc_1.34.0 xtable_1.8-4
[31] purrr_0.3.4 scales_1.1.1
[33] later_1.1.0.1 BiocParallel_1.22.0
[35] tibble_3.0.1 openssl_1.4.2
[37] generics_0.0.2 ggplot2_3.3.1
[39] ellipsis_0.3.1 SummarizedExperiment_1.18.2
[41] lazyeval_0.2.2 mime_0.9
[43] magrittr_1.5 crayon_1.3.4
[45] memoise_1.1.0 tools_4.0.2
[47] prettyunits_1.1.1 hms_0.5.3
[49] lifecycle_0.2.0 matrixStats_0.56.0
[51] stringr_1.4.0 plotly_4.9.2.1
[53] munsell_0.5.0 DelayedArray_0.14.1
[55] Biostrings_2.56.0 compiler_4.0.2
[57] rlang_0.4.7 grid_4.0.2
[59] RCurl_1.98-1.2 rappdirs_0.3.1
[61] htmlwidgets_1.5.1 bitops_1.0-6
[63] gtable_0.3.0 DBI_1.1.0
[65] curl_4.3 R6_2.4.1
[67] GenomicAlignments_1.24.0 dplyr_1.0.2
[69] rtracklayer_1.48.0 fastmap_1.0.1
[71] bit_4.0.4 ProtGenerics_1.20.0
[73] stringi_1.4.6 Rcpp_1.0.5
[75] vctrs_0.3.2 tidyselect_1.1.0
Hi,
I need to retrieve database AH10587 (Streptomyces coelicolor) from the hub, but it shows "reason: this db is of type Inparanoid8Db but this is not a defined class", although the data provider function recognizes Inparanoid as a class. Any suggestions?
Thanks
Hello!
I have aligned and annotated my data with Ensembl release 108 for Mus musculus. I'm currently working on a project that requires using AnnotationHub to query GTF files. When can I expect AnnotationHub to include release 108?
Thanks!
Emma
Hi. I want to use EnsDb.Hsapiens.v99 for annotation, but I found it was not available:
> query(ah,"EnsDb.Hsapiens.v99")
AnnotationHub with 0 records
snapshotDate(): 2019-05-02
Best,
Ci
> file.size("25a6546ba3c2_annotationhub.sqlite3")
[1] 121782272
> tar("25a6546ba3c2_annotationhub.sqlite3.tar.xz", "25a6546ba3c2_annotationhub.sqlite3", compression = "xz")
> file.size("25a6546ba3c2_annotationhub.sqlite3.tar.xz")
[1] 6672412
> untar("25a6546ba3c2_annotationhub.sqlite3.tar.xz", exdir = tempfile()) |> system.time()
user system elapsed
0.722 0.042 0.782
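From the numbers above, the xz-compressed archive is about 18 times smaller than the SQLite file (121782272 / 6672412 ≈ 18.3, i.e. roughly 5.5% of the original size), and extraction takes under a second. A self-contained round trip with base R's tar()/untar() can be sketched as below; xz_round_trip() and the toy input file are assumptions, and the ratio for a toy text file will differ from that of the real hub metadata.

```r
# Hypothetical helper: xz-compress one file into a tarball, extract it
# again, and report the size ratio plus whether the round trip is exact.
xz_round_trip <- function(path) {
  path <- normalizePath(path, mustWork = TRUE)
  tarball <- tempfile(fileext = ".tar.xz")
  old <- setwd(dirname(path))           # store a relative name in the archive
  on.exit(setwd(old), add = TRUE)
  utils::tar(tarball, files = basename(path), compression = "xz", tar = "internal")
  exdir <- tempfile()
  utils::untar(tarball, exdir = exdir, tar = "internal")
  extracted <- file.path(exdir, basename(path))
  list(
    ratio = file.size(path) / file.size(tarball),
    identical = unname(tools::md5sum(path) == tools::md5sum(extracted))
  )
}
```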
Hi,
When I try hub <- AnnotationHub(), I get this error:
Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
Could you help me?
Details
> packageVersion("AnnotationHub")
[1] ‘2.18.0’
> BiocManager::version()
[1] ‘3.10’
>BiocManager::valid(pkgs = "AnnotationHub")
[1] TRUE
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] RUnit_0.4.32 AnnotationHub_2.18.0 BiocFileCache_1.10.2
[4] dbplyr_2.0.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.57.0
[3] bit64_4.0.5 RColorBrewer_1.1-2
[5] httr_1.4.2 GenomeInfoDb_1.22.1
[7] tools_3.6.1 backports_1.2.0
[9] R6_2.5.0 rpart_4.1-15
[11] Hmisc_4.4-1 DBI_1.1.0
[13] colorspace_2.0-0 nnet_7.3-14
[15] tidyselect_1.1.0 gridExtra_2.3
[17] DESeq2_1.26.0 bit_4.0.4
[19] curl_4.3 compiler_3.6.1
[21] cli_2.1.0 Biobase_2.46.0
[23] htmlTable_2.1.0 DelayedArray_0.12.3
[25] scales_1.1.1 checkmate_2.0.0
[27] genefilter_1.68.0 rappdirs_0.3.1
[29] stringr_1.4.0 digest_0.6.27
[31] foreign_0.8-71 XVector_0.26.0
[33] base64enc_0.1-3 jpeg_0.1-8.1
[35] pkgconfig_2.0.3 htmltools_0.5.0
[37] fastmap_1.0.1 htmlwidgets_1.5.2
[39] rlang_0.4.8 rstudioapi_0.13
[41] RSQLite_2.2.1 shiny_1.5.0
[43] generics_0.1.0 BiocParallel_1.20.1
[45] dplyr_1.0.2 RCurl_1.98-1.2
[47] magrittr_1.5 GenomeInfoDbData_1.2.2
[49] Formula_1.2-4 Matrix_1.2-18
[51] Rcpp_1.0.5 munsell_0.5.0
[53] S4Vectors_0.24.4 fansi_0.4.1
[55] lifecycle_0.2.0 stringi_1.4.6
[57] yaml_2.2.1 SummarizedExperiment_1.16.1
[59] zlibbioc_1.32.0 grid_3.6.1
[61] blob_1.2.1 promises_1.1.1
[63] crayon_1.3.4 lattice_0.20-41
[65] splines_3.6.1 annotate_1.64.0
[67] locfit_1.5-9.4 knitr_1.30
[69] pillar_1.4.7 GenomicRanges_1.38.0
[71] geneplotter_1.64.0 stats4_3.6.1
[73] XML_3.99-0.3 glue_1.4.2
[75] BiocVersion_3.10.1 latticeExtra_0.6-29
[77] data.table_1.13.2 BiocManager_1.30.10
[79] png_0.1-7 vctrs_0.3.4
[81] httpuv_1.5.4 gtable_0.3.0
[83] purrr_0.3.4 assertthat_0.2.1
[85] ggplot2_3.3.2 xfun_0.19
[87] mime_0.9 xtable_1.8-4
[89] later_1.1.0.1 survival_3.2-7
[91] tibble_3.0.4 AnnotationDbi_1.48.0
[93] memoise_1.1.0 IRanges_2.20.2
[95] cluster_2.1.0 ellipsis_0.3.1
[97] interactiveDisplayBase_1.24.0
In AnnotationHub:::.tidyGRanges, the genome of a GTF is set either by using data from GenomeInfoDb or by inferring it from the GRanges itself. Previously, for Ensembl GTF files, the latter is what happened, because GenomeInfoDb didn't support GRCh38:
> Seqinfo(genome = "GRCh38")
Error in fetchSequenceInfo(genome) : genome "GRCh38" is not supported
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
But in the current version, GRCh38 is supported, and it has different unplaced scaffolds from those in an Ensembl GTF, so when assigning the Seqinfo to the GRanges we get a bunch of NA values for the mismatched unplaced scaffolds.
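The NA values presumably arise because the reference is matched by sequence name, so any name present in the GTF but absent from the reference has nothing to match. The mechanism can be seen with base R's match() on toy data: the scaffold names and the SCAFFOLD_C length are invented, while the lengths for 1, 2 and MT are the GRCh38 values. Listing the unmatched names first shows which sequences would end up with NA.

```r
# Toy illustration of why mismatched seqlevels become NA.
gtf_seqlevels <- c("1", "2", "MT", "SCAFFOLD_A", "SCAFFOLD_B")  # invented scaffold names
ref <- data.frame(seqnames   = c("1", "2", "MT", "SCAFFOLD_C"),
                  seqlengths = c(248956422L, 242193529L, 16569L, 1000L),
                  stringsAsFactors = FALSE)

# Lookup by name: names without a reference entry get NA
lens <- ref$seqlengths[match(gtf_seqlevels, ref$seqnames)]
names(lens) <- gtf_seqlevels

# Names to investigate before trusting the reference sequence info
missing_in_ref <- setdiff(gtf_seqlevels, ref$seqnames)
```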
When using AnnotationHub in parallel with a job scheduler (OpenGridEngine and Slurm), whether a job succeeded was random, and some jobs failed with the error "attempt to write a readonly database".
I don't know why, but since some jobs do succeed, by resubmitting the job queue again and again I could finally get the output.
As an alternative, without the job scheduler I could run the calculation without any error, but it takes an enormous amount of time, so I want to know the reason for the error.
I wrote an Snakefile as follows:
rule all:
    input:
        expand('{id}.RData', id=list(range(24)))

rule ah:
    output:
        '{id}.RData'
    log:
        'logs/{id}.log'
    shell:
        # pass the output path to the R script, which saves to it
        'Rscript ah.R {output} > {log} 2>&1'
I wrote the R script (ah.R) as follows:
outfile = commandArgs(trailingOnly=TRUE)[1]
library("AnnotationHub")
ah = AnnotationHub()
id = names(ah)[sample(length(ah), 1)]
out = ah[[id]]
save(out, file=outfile)
and finally ran the Snakemake workflow as follows:
# OpenGridEngine
snakemake -j 24 --cluster "qsub -S /bin/bash"
or
# Slurm
snakemake -j 24 --cluster sbatch
The error message was as follows:
Loading required package: BiocFileCache
Loading required package: dbplyr
snapshotDate(): 2019-10-29
downloading 1 resources
retrieving 1 resource
Error: failed to load resource
name: AH39322
title: E031-H3K23me2.imputed.pval.signal.bigwig
reason: 1 resources failed to download
In addition: Warning messages:
1: Couldn't set cache size: attempt to write a readonly database
Use `cache_size` = NULL to turn off this warning.
2: Couldn't set synchronous mode: attempt to write a readonly database
Use `synchronous` = NULL to turn off this warning.
3: download failed
hub path: ‘https://annotationhub.bioconductor.org/fetch/44762’
cache resource: ‘AH39322 : 44762’
reason: attempt to write a readonly database
Execution halted
I am hoping to integrate your great tool into my repetitive element analysis pipeline, but it seems that some RepeatMasker files (hosted at UCSC) are missing from AnnotationHub, namely the hg38 RepeatMasker file. Would it be possible to add hgdownload.cse.ucsc.edu/goldenpath/hg38/database/rmsk.txt.gz to the hub?
For instance, AlphaMissense data is licensed under CC BY-NC-SA 4.0, so users should be made aware of this when using the data (or data derived from it).
One solution would be to add a table to the schema, with fields 'hub id' and 'license', and introduce code into AnnotationHub (and ExperimentHub?) that prompts the user to accept the license before downloading to the local cache. This would be backward compatible. Only licensed data would require an entry. It would not be too invasive for users, with the prompt only on initial access.
It might be useful to provide some way to accept the license non-interactively, e.g., an environment variable ANNOTATION_HUB_ACCEPT_<license>, where <license> might be CC_BY_NC_SA_4.0, and/or an argument to [[.
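The proposed acceptance flow (environment variable first, then an interactive prompt, otherwise refusal) could be sketched as below. Everything here is hypothetical: license_env_var(), license_accepted(), the prompt text, and the choice to replace every character other than letters, digits and underscores with "_" when building the variable name, so that a license string like "CC BY-NC-SA 4.0" yields a name that ordinary shells can export. Remembering the user's answer across sessions (so the prompt appears only on initial access) is not shown.

```r
# Hypothetical: build the environment variable name for a license.
license_env_var <- function(license) {
  paste0("ANNOTATION_HUB_ACCEPT_", gsub("[^A-Za-z0-9_]", "_", license))
}

# Hypothetical: TRUE if the license is accepted via the environment
# variable or an interactive prompt; otherwise stop with instructions.
license_accepted <- function(hub_id, license, ask = interactive()) {
  var <- license_env_var(license)
  if (toupper(Sys.getenv(var, unset = "")) %in% c("TRUE", "YES", "1"))
    return(TRUE)
  if (ask) {
    answer <- readline(sprintf("%s is licensed under %s. Accept? [y/n]: ",
                               hub_id, license))
    if (tolower(trimws(answer)) %in% c("y", "yes")) return(TRUE)
  }
  stop(sprintf("%s requires accepting %s; set %s=TRUE to accept non-interactively",
               hub_id, license, var), call. = FALSE)
}
```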
Hi,
I can't retrieve the CpG file for hg38. Can someone please point me in the right direction?
Thanks
#### get CpG locations for hg38
library(AnnotationHub)
hub_hg38 <- AnnotationHub()
query(hub_hg38, c("cpg","hg38"))
AnnotationHub with 0 records
# snapshotDate(): 2021-10-20
Hello,
The exception that has been hardcoded for QFeatures objects should no longer be needed since QFeatures >= 1.5.2 (rformassspectrometry/QFeatures@1558b05); cf. add0887.
Sorry for the inconvenience.
We typically set the AnnotationHub cache to be local to a project. This lets any user with access to the directory run the code and use the same cache. This was definitely working in 2.22.0.
In 3.2.0 (but possibly earlier versions, I haven't looked closely), this is no longer possible. I think it's due to a lock file being left behind.
The quick test to reproduce the issue is this:
mkdir -p cache
# as user1
Rscript -e "AnnotationHub::AnnotationHub(cache='cache')"
# switch to user2, who is in the same group as user 1, and run the same thing
Rscript -e "AnnotationHub::AnnotationHub(cache='cache')"
The latter command gives the error:
Error in lock(.sql_lock_path(dbfile), exclusive = FALSE) :
Cannot open lock file: Permission denied
Calls: <Anonymous> ... tryCatch -> tryCatchList -> .sql_connect_RO -> lock
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'conn' in selecting a method for function 'dbDisconnect': object 'info' not found
Calls: <Anonymous> ... .sql_disconnect -> dbDisconnect -> .handleSimpleError -> h
Execution halted
Inspecting the cache directory, I see:
-rw------- 1 user1 staff 0 Jan 16 16:32 BiocFileCache.sqlite.LOCK
-rw-rw---- 1 user1 staff 105M Jan 16 16:32 10491703f35c_annotationhub.sqlite3
-rw-r----- 1 user1 staff 20K Jan 16 16:32 BiocFileCache.sqlite
-rw-rw---- 1 user1 staff 1.7M Jan 16 16:32 104953737007_104953737007_hub_index.rds
So the lock file is still around, and it's not group-readable or group-writable, which is likely causing the error.
I confirmed that in 2.22.0 the lock file does not remain and that the above test works without errors.
Any ideas on how to resolve this? Perhaps a temporary fix is to wrap AnnotationHub::AnnotationHub() in a new function that cleans up the lock file before returning the ah.
I was unable to pin down where exactly this was happening, and pinging @lshep since this might be in BiocFileCache instead.
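A minimal base-R sketch of the wrapper idea above. It assumes the lock file is named BiocFileCache.sqlite.LOCK (as in the directory listing) and, rather than deleting it, only relaxes its mode to 0664 so group members can open it; both helper names are hypothetical and not part of AnnotationHub or BiocFileCache. Sys.chmod() only succeeds for the file's owner, and use_umask = FALSE matters because a typical 022 umask would otherwise strip the group write bit again.

```r
# Hypothetical helper (not an AnnotationHub/BiocFileCache API):
# make a leftover BiocFileCache lock file group-readable and group-writable.
make_cache_lock_group_writable <- function(cache) {
  lock <- file.path(cache, "BiocFileCache.sqlite.LOCK")
  if (!file.exists(lock)) {
    return(invisible(FALSE))  # nothing to fix
  }
  # use_umask = FALSE so the requested 0664 is applied as-is
  ok <- Sys.chmod(lock, mode = "0664", use_umask = FALSE)
  invisible(ok)
}

# Hypothetical wrapper: create the hub, then relax the lock file's mode.
shared_annotation_hub <- function(cache, ...) {
  ah <- AnnotationHub::AnnotationHub(cache = cache, ...)
  make_cache_lock_group_writable(cache)
  ah
}
```

Changing the mode instead of unlinking the file avoids deleting a lock another process may still hold. Other files in the listing, such as BiocFileCache.sqlite (-rw-r-----), may also need group write access; this sketch does not touch them.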
Hi Bioconductor team, I'm now seeing this error with the new dbplyr 2.4.0 update:
> packageVersion("AnnotationHub")
[1] ‘3.8.0’
> packageVersion("dbplyr")
[1] ‘2.4.0’
> AnnotationHub::AnnotationHub()
Error in `collect()`:
! Failed to collect lazy table.
Caused by error in `db_collect()`:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?
Backtrace:
▆
1. ├─AnnotationHub::AnnotationHub()
2. │ └─AnnotationHub::.Hub(...)
3. │ └─AnnotationHub:::.create_cache(...)
4. │ └─BiocFileCache::BiocFileCache(cache = cache, ask = ask)
5. │ └─BiocFileCache:::.sql_create_db(bfc)
6. │ └─BiocFileCache:::.sql_validate_version(bfc)
7. │ └─BiocFileCache:::.sql_schema_version(bfc)
8. │ ├─base::tryCatch(...)
9. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
10. │ └─tbl(src, "metadata") %>% collect(Inf)
11. ├─dplyr::collect(., Inf)
12. └─dbplyr:::collect.tbl_sql(., Inf)
13. ├─base::tryCatch(...)
14. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
15. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
16. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
17. └─dbplyr::db_collect(x$src$con, sql, n = n, warn_incomplete = warn_incomplete, ...)
18. └─rlang (local) `<fn>`()
19. └─rlang:::check_dots(env, error, action, call)
20. └─rlang:::action_dots(...)
21. ├─base (local) try_dots(...)
22. └─rlang (local) action(...)
Best,
Mike
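The backtrace points at tbl(src, "metadata") %>% collect(Inf) inside BiocFileCache: because Inf is passed positionally, it is captured by the dots of dbplyr:::collect.tbl_sql() rather than matched to n, and dbplyr 2.4.0 forwards those dots to db_collect(), which rejects unused arguments. The base-R sketch below models that argument matching with a hypothetical collect_like() stand-in; it is not dbplyr code, and its signature is an assumption chosen to mirror the backtrace.

```r
# Hypothetical stand-in for a method with signature (x, ..., n = Inf).
# It only models the argument-matching behaviour; it is NOT dbplyr code.
collect_like <- function(x, ..., n = Inf) {
  # Emulate the unused-dots check that dbplyr >= 2.4.0 performs
  if (...length() > 0L) {
    stop("Arguments in `...` must be used.", call. = FALSE)
  }
  if (is.finite(n)) x[seq_len(min(n, length(x)))] else x
}

rows <- 1:5

# Named argument: Inf is matched to `n`, the dots stay empty
collect_like(rows, n = Inf)

# Positional argument after `...`: Inf is captured by the dots and rejected
tryCatch(collect_like(rows, Inf), error = conditionMessage)
```

On the calling side the fix is to name the argument, i.e. collect(n = Inf). Until an updated BiocFileCache is installed, going back to a dbplyr release before 2.4.0 (for example 2.3.3) may work as a stopgap; that is an assumption, not a fix confirmed by the maintainers.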
Hello there.
Could you please check whether an Ensembl database (EnsDb) for the GRCh38 build of the human genome is available? I could only reach the GRCh37 genome build.
> AnnotationHub::query(ah, pattern = c("EnsDb","Homo sapiens","GRCh38"))
AnnotationHub with 26 records
# snapshotDate(): 2024-04-30
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass,
# tags, rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'
title
AH53211 | Ensembl 87 EnsDb for Homo Sapiens
AH53715 | Ensembl 88 EnsDb for Homo Sapiens
AH56681 | Ensembl 89 EnsDb for Homo Sapiens
AH57757 | Ensembl 90 EnsDb for Homo Sapiens
AH60773 | Ensembl 91 EnsDb for Homo Sapiens
... ...
AH104864 | Ensembl 107 EnsDb for Homo sapiens
AH109336 | Ensembl 108 EnsDb for Homo sapiens
AH109606 | Ensembl 109 EnsDb for Homo sapiens
AH113665 | Ensembl 110 EnsDb for Homo sapiens
AH116291 | Ensembl 111 EnsDb for Homo sapiens
> AnnotationHub::query(ah, pattern = c("EnsDb","Homo sapiens"))
AnnotationHub with 27 records
# snapshotDate(): 2024-04-30
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass,
# tags, rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'
title
AH53211 | Ensembl 87 EnsDb for Homo Sapiens
AH53715 | Ensembl 88 EnsDb for Homo Sapiens
AH56681 | Ensembl 89 EnsDb for Homo Sapiens
AH57757 | Ensembl 90 EnsDb for Homo Sapiens
AH60773 | Ensembl 91 EnsDb for Homo Sapiens
... ...
AH109336 | Ensembl 108 EnsDb for Homo sapiens
AH109606 | Ensembl 109 EnsDb for Homo sapiens
AH113665 | Ensembl 110 EnsDb for Homo sapiens
AH116291 | Ensembl 111 EnsDb for Homo sapiens
AH116860 | Ensembl 112 EnsDb for Homo sapiens
See also this issue.
Hello,
I wanted to extract annotations of interest, but loading the resource failed. Has anyone encountered a similar issue? Any suggestions for solving it?
mouse_ens[["AH113713"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
Error: failed to load resource
name: AH113713
title: Ensembl 110 EnsDb for Mus musculus
reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
web resource path: 'https://annotationhub.bioconductor.org/fetch/120459'
local file path: '/Users/huhlab/Library/Caches/org.R-project.R/R/AnnotationHub/161d7316a8937_120459'
reason: Conflict (HTTP 409).
2: bfcadd() failed; resource removed
rid: BFC8
fpath: 'https://annotationhub.bioconductor.org/fetch/120459'
reason: download failed
3: download failed
hub path: 'https://annotationhub.bioconductor.org/fetch/120459'
cache resource: 'AH113713 : 120459'
reason: bfcadd() failed; see warnings()
Hello,
R CMD check fails on Linux ARM64 with the following output:
R CMD check AnnotationHub_3.7.3.tar.gz
* using log directory ‘/home/biocbuild/git/AnnotationHub.Rcheck’
* using R Under development (unstable) (2023-03-12 r83975)
* using platform: aarch64-unknown-linux-gnu (64-bit)
* R was compiled by
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
* running under: Ubuntu 22.04.2 LTS
* using session charset: UTF-8
* checking for file ‘AnnotationHub/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘AnnotationHub’ version ‘3.7.3’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘AnnotationHub’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking startup messages can be suppressed ... OK
* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
‘CompoundDb’ ‘ensembldb’ ‘keras’
Unexported objects imported by ':::' calls:
‘BiocFileCache:::.get_tbl_rid’ ‘S4Vectors:::selectSome’
See the note in ?`:::` about the use of this operator.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... WARNING
checkRd: (5) AnnotationHub-class.Rd:131-139: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:148-151: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:152-155: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:156-159: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:160-163: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:164-168: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:169-175: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:176-210: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:212-214: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:216-218: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:220-222: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:231-235: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:236-240: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:241-247: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:248-269: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:270-278: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:279-284: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:291-295: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:296-300: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:301-312: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:313-316: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:317-320: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:321-328: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:329-336: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:337-342: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:349-353: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:354-358: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-deprecated.Rd:28-34: \item in \describe must have non-empty label
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... WARNING
Functions or methods with usage in documentation object 'AnnotationHub-deprecated' but not in code:
‘display’
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... ERROR
Running examples in ‘AnnotationHub-Ex.R’ failed
The error most likely occurred in:
> ### Name: AnnotationHub-objects
> ### Title: AnnotationHub objects and their related methods and functions
> ### Aliases: class:AnnotationHub AnnotationHub-class class:Hub Hub-class
> ### .Hub AnnotationHub refreshHub mcols,Hub-method cache cache,Hub-method
> ### cache,AnnotationHub-method cache<- cache<-,Hub-method hubUrl
> ### hubUrl,Hub-method hubCache hubCache,Hub-method hubDate
> ### hubDate,Hub-method package package,Hub-method removeCache isLocalHub
> ### isLocalHub,Hub-method isLocalHub<- isLocalHub<-,Hub-method
> ### possibleDates snapshotDate snapshotDate,Hub-method snapshotDate<-
> ### snapshotDate<-,Hub-method removeResources
> ### removeResources,missing-method removeResources,character-method
> ### dbconn,Hub-method dbfile,Hub-method .db_close recordStatus
> ### recordStatus,Hub-method length,Hub-method names,Hub-method
> ### fileName,Hub-method $,Hub-method [[,Hub,character,missing-method
> ### [[,Hub,numeric,missing-method [,Hub,character,missing-method
> ### [,Hub,logical,missing-method [,Hub,numeric,missing-method
> ### [<-,Hub,character,missing,Hub-method
> ### [<-,Hub,logical,missing,Hub-method [<-,Hub,numeric,missing,Hub-method
> ### subset,Hub-method query query,Hub-method as.list.Hub
> ### as.list,Hub-method c,Hub-method show,Hub-method
> ### show,AnnotationHubResource-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> ## create an AnnotationHub object
> library(AnnotationHub)
> ah = AnnotationHub()
snapshotDate(): 2023-03-21
>
> ## Summary of available records
> ah
AnnotationHub with 69798 records
# snapshotDate(): 2023-03-21
# $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
# $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus,...
# $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH5012"]]'
title
AH5012 | Chromosome Band
AH5013 | STS Markers
AH5014 | FISH Clones
AH5015 | Recomb Rate
AH5016 | ENCODE Pilot
... ...
AH111330 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.109.gtf
AH111331 | Zosterops_lateralis_melanops.ASM128173v1.109.abinitio.gtf
AH111332 | Zosterops_lateralis_melanops.ASM128173v1.109.gtf
AH111333 | UCSC RepeatMasker annotations (Oct2022) for Human (hg38)
AH111334 | MassBank CompDb for release 2022.12.1
>
> ## Detail for a single record
> ah[1]
AnnotationHub with 1 record
# snapshotDate(): 2023-03-21
# names(): AH5012
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $rdatadateadded: 2013-03-26
# $title: Chromosome Band
# $description: GRanges object from UCSC track 'Chromosome Band'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database...
# $sourcesize: NA
# $tags: c("cytoBand", "UCSC", "track", "Gene", "Transcript",
# "Annotation")
# retrieve record with 'object[["AH5012"]]'
>
> ## and what is the date we are using?
> snapshotDate(ah)
[1] "2023-03-21"
>
> ## how many resources?
> length(ah)
[1] 69798
>
> ## from which resources, is data available?
> head(sort(table(ah$dataprovider), decreasing=TRUE))
Ensembl
34906
BroadInstitute
18248
UCSC
11193
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
1871
Haemcode
945
FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CELLPHONEDB,BADERLAB,SINGLECELLSIGNALR,HOMOLOGENE
501
>
> ## from which species, is data available ?
> head(sort(table(ah$species),decreasing=TRUE))
Homo sapiens Mus musculus Drosophila melanogaster
26554 1809 459
Bos taurus Rattus norvegicus Pan troglodytes
332 326 318
>
> ## what web service and local cache does this AnnotationHub point to?
> hubUrl(ah)
[1] "https://annotationhub.bioconductor.org"
> hubCache(ah)
[1] "/home/biocbuild/.cache/R/AnnotationHub"
>
> ### Examples ###
>
> ## One can search the hub for multiple strings
> ahs2 <- query(ah, c("GTF", "77","Ensembl", "Homo sapiens"))
>
> ## information about the file can be retrieved using
> ahs2[1]
AnnotationHub with 1 record
# snapshotDate(): 2023-03-21
# names(): AH28812
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: GRanges
# $rdatadateadded: 2015-03-25
# $title: Homo_sapiens.GRCh38.77.gtf
# $description: Gene Annotation for Homo sapiens
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sap...
# $sourcesize: 44454526
# $tags: c("GTF", "ensembl", "Gene", "Transcript", "Annotation")
# retrieve record with 'object[["AH28812"]]'
>
> ## one can further extract information from this show method
> ## like the sourceurl using:
> ahs2$sourceurl
[1] "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"
> ahs2$description
[1] "Gene Annotation for Homo sapiens"
> ahs2$title
[1] "Homo_sapiens.GRCh38.77.gtf"
>
> ## We can download a file by name like this (using a list semantic):
> gr <- ahs2[[1]]
loading from cache
require(“GenomicRanges”)
Error: failed to load resource
name: AH28812
title: Homo_sapiens.GRCh38.77.gtf
reason: error in evaluating the argument 'x' in selecting a method for function 'get': error reading from connection
Execution halted
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
Running ‘runTests.R’
OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
‘AnnotationHub-HOWTO.Rmd’... OK
‘AnnotationHub.Rmd’ using ‘UTF-8’... OK
‘TroubleshootingTheCache.Rmd’ using ‘UTF-8’... OK
NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE
Status: 1 ERROR, 3 WARNINGs
See
‘/home/biocbuild/git/AnnotationHub.Rcheck/00check.log’
for details.
Any idea what could be the problem?
Hi,
Nice package! Thanks to it, I could remove my dependency on biomaRt, which is slow and unstable.
By the way, I found that when the species is not a vertebrate, Ensembl IDs cannot be retrieved from AnnotationHub. For example, for the Homo sapiens OrgDb, the columns() function returns "ENSEMBL", "ENSEMBLPROT", and "ENSEMBLTRANS".
library("AnnotationHub")
ah <- AnnotationHub()
# Vertebrate (Homo sapiens)
hs <- query(ah, c("OrgDb", "Homo sapiens"))[[1]]
columns(hs)
However, when the species is not a vertebrate, "ENSEMBL", "ENSEMBLPROT", and "ENSEMBLTRANS" are not available.
# EnsemblPlants: http://plants.ensembl.org/index.html
at <- query(ah, c("OrgDb", "Arabidopsis thaliana"))[[1]]
columns(at)
# EnsemblFungi : https://fungi.ensembl.org/index.html
sc <- query(ah, c("OrgDb", "Saccharomyces cerevisiae"))[[1]]
columns(sc)
# EnsemblMetazoa : https://metazoa.ensembl.org/index.html
ce <- query(ah, c("OrgDb", "Caenorhabditis elegans"))[[1]]
columns(ce)
# EnsemblProtists : https://protists.ensembl.org
lm <- query(ah, c("OrgDb", "Leishmania major"))[[1]]
columns(lm)
# EnsemblBacteria: https://bacteria.ensembl.org/index.html
pa <- query(ah, c("OrgDb", "Pseudomonas aeruginosa PAO1"))[[1]]
columns(pa)
Is this because these species are served from Ensembl databases (Ensembl Plants, Fungi, Metazoa, Protists, Bacteria) that are separate from the main Ensembl database?
I've run the following code:
library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()
human_ens <- query(ah, c("Homo sapiens", "EnsDb"))
human_ens <- human_ens[["AH75011"]]
annotations_ahb <- genes(human_ens, return.type = "data.frame") # where the error occurs
Full error message:
Error: bad_weak_ptr
Traceback:
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ensembldb_2.24.1 AnnotationFilter_1.24.0 GenomicFeatures_1.52.2
[4] AnnotationDbi_1.64.1 Biobase_2.60.0 GenomicRanges_1.52.0
[7] GenomeInfoDb_1.36.1 IRanges_2.34.1 S4Vectors_0.38.1
[10] msigdbr_7.5.1 tidyr_1.3.0 dplyr_1.1.3
[13] AnnotationHub_3.8.0 BiocFileCache_2.8.0 dbplyr_2.3.3
[16] BiocGenerics_0.46.0 fgsea_1.26.0
loaded via a namespace (and not attached):
[1] jsonlite_1.8.7 magrittr_2.0.3
[3] BiocIO_1.10.0 zlibbioc_1.46.0
[5] vctrs_0.6.3 memoise_2.0.1
[7] Rsamtools_2.16.0 RCurl_1.98-1.12
[9] base64enc_0.1-3 htmltools_0.5.6
[11] S4Arrays_1.2.0 progress_1.2.2
[13] curl_5.0.2 parallelly_1.36.0
[15] lubridate_1.9.2 cachem_1.0.8
[17] uuid_1.1-1 GenomicAlignments_1.36.0
[19] mime_0.12 lifecycle_1.0.3
[21] pkgconfig_2.0.3 Matrix_1.6-1
[23] R6_2.5.1 fastmap_1.1.1
[25] GenomeInfoDbData_1.2.10 MatrixGenerics_1.12.2
[27] future_1.33.0 shiny_1.7.5
[29] digest_0.6.33 colorspace_2.1-0
[31] RSQLite_2.3.3 filelock_1.0.2
[33] progressr_0.14.0 fansi_1.0.4
[35] timechange_0.2.0 httr_1.4.7
[37] abind_1.4-5 compiler_4.3.1
[39] bit64_4.0.5 BiocParallel_1.34.2
[41] DBI_1.1.3 biomaRt_2.56.1
[43] rappdirs_0.3.3 DelayedArray_0.26.6
[45] rjson_0.2.21 tools_4.3.1
[47] interactiveDisplayBase_1.38.0 httpuv_1.6.11
[49] future.apply_1.11.0 glue_1.6.2
[51] restfulr_0.0.15 promises_1.2.1
[53] grid_4.3.1 pbdZMQ_0.3-10
[55] generics_0.1.3 gtable_0.3.4
[57] data.table_1.14.8 hms_1.1.3
[59] sp_2.0-0 xml2_1.3.5
[61] utf8_1.2.3 XVector_0.40.0
[63] BiocVersion_3.17.1 pillar_1.9.0
[65] stringr_1.5.0 babelgene_22.9
[67] IRdisplay_1.1 later_1.3.1
[69] lattice_0.21-8 rtracklayer_1.60.1
[71] bit_4.0.5 tidyselect_1.2.0
[73] Biostrings_2.70.1 ProtGenerics_1.32.0
[75] SummarizedExperiment_1.30.2 timeDate_4022.108
[77] matrixStats_1.0.0 stringi_1.7.12
[79] lazyeval_0.2.2 yaml_2.3.7
[81] evaluate_0.21 codetools_0.2-19
[83] tibble_3.2.1 BiocManager_1.30.22
[85] cli_3.6.1 IRkernel_1.3.2
[87] xtable_1.8-4 repr_1.1.6
[89] munsell_0.5.0 Rcpp_1.0.11
[91] globals_0.16.2 png_0.1-8
[93] XML_3.99-0.14 parallel_4.3.1
[95] ellipsis_0.3.2 ggplot2_3.4.3
[97] blob_1.2.4 prettyunits_1.1.1
[99] bitops_1.0-7 listenv_0.9.0
[101] scales_1.2.1 SeuratObject_4.1.3
[103] purrr_1.0.2 crayon_1.5.2
[105] rlang_1.1.1 cowplot_1.1.1
[107] fastmatch_1.1-4 KEGGREST_1.42.0
Hello, I have an issue setting up AnnotationHub for use in a VM. I get this error whenever my R script tries to download a resource from the hub:
snapshotDate(): 2022-10-31
Error in value[3L] : failed to create index
hubCache(): /usr/share/httpd/.cache/R/AnnotationHub
reason: cannot open the connection
Calls: ... tryCatch -> tryCatchList -> tryCatchOne ->
In addition: Warning message:
In gzfile(file, mode) :
cannot open compressed file '/usr/share/httpd/.cache/R/AnnotationHub/b797a59351bdd_b797a59351bdd_hub_index.rds', probable reason 'Permission denied'
Execution halted
I have enabled full access to the .cache, R, and AnnotationHub directories, and I'm still getting permission-denied errors. Also, this .rds file does not exist in the AnnotationHub directory when I list the directory's contents. However, I have downloaded the hub object manually in R, outside the script, without any problem. This directory is where the local cache lives, so I'm not sure why the .rds file is not showing up. Any suggestions or help would be greatly appreciated.
Best wishes
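One way to narrow this down is to check, from inside the script (that is, as the account that actually runs it, which the /usr/share/httpd path suggests is the web server's user), whether the cache directory and every file already in it are accessible. A hypothetical base-R helper, not part of AnnotationHub:

```r
# Hypothetical helper: report permission problems in a hub cache directory.
# Run it under the same account as the failing script.
check_cache_dir <- function(cache) {
  if (!dir.exists(cache)) {
    return(paste("missing directory:", cache))
  }
  problems <- character()
  # file.access() returns 0 on success; mode 1 = execute/search, 2 = write, 4 = read
  if (file.access(cache, 2L) != 0L || file.access(cache, 1L) != 0L) {
    problems <- c(problems, paste("directory not writable/searchable:", cache))
  }
  files <- list.files(cache, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  bad <- files[file.access(files, 4L) != 0L | file.access(files, 2L) != 0L]
  c(problems, if (length(bad) > 0L) paste("file not readable/writable:", bad))
}

# Example: inspect the cache reported by hubCache() in the error message
check_cache_dir("/usr/share/httpd/.cache/R/AnnotationHub")
```

An empty result means no permission problem was detected. If problems are reported, pointing the hub at a directory that the script's user owns, via AnnotationHub(cache = ...) or the ANNOTATION_HUB_CACHE environment variable, is one option.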
Could the error message:
Error: package or namespace load failed for ‘coMethDMR’:
.onLoad failed in loadNamespace() for 'coMethDMR', details:
call: NULL
error: Corrupt Cache: index file
See AnnotationHub's TroubleshootingTheCache vignette section on corrupt cache
cache: /home/biocbuild/.cache/R/ExperimentHub
filename: experimenthub.index.rds
Error: loading failed
Execution halted
ERROR: loading failed
use the title of the vignette ("Troubleshooting The Hubs") instead of the name of the Rmd or HTML file? This would make the vignette easier to find on the package landing page, where the title is displayed. Note that the title is also what browseVignettes(package="AnnotationHub") displays.
Ideally the title and the Rmd file name should be the same or very similar (e.g. Troubleshooting_The_Hubs.Rmd) to avoid confusion.
Thanks,
H.
P.S.: This looks like a leftover from #12.
There is a mismatched AH ID. According to its description, "AH116340" is an EnsDb for Mus musculus based on Ensembl version 111, but when I loaded it and inspected it, I found that "AH116340" is actually an EnsDb for Homo sapiens based on Ensembl version 105.
log:
select
AnnotationHub with 13 records
snapshotDate(): 2023-10-23
$dataprovider: Ensembl
$species: Mus musculus
$rdataclass: EnsDb
additional mcols(): taxonomyid, genome, description, coordinate_1_based,
maintainer, rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype
retrieve records with, e.g., 'object[["AH116325"]]'
title
AH116325 | Ensembl 111 EnsDb for Mus musculus
AH116326 | Ensembl 111 EnsDb for Mus musculus
AH116327 | Ensembl 111 EnsDb for Mus musculus
AH116328 | Ensembl 111 EnsDb for Mus musculus
AH116329 | Ensembl 111 EnsDb for Mus musculus
... ...
AH116334 | Ensembl 111 EnsDb for Mus musculus
AH116335 | Ensembl 111 EnsDb for Mus musculus
AH116336 | Ensembl 111 EnsDb for Mus musculus
AH116337 | Ensembl 111 EnsDb for Mus musculus
AH116340 | Ensembl 111 EnsDb for Mus musculus
select$description
[1] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[2] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[3] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[4] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[5] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[6] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[7] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[8] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[9] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[10] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[11] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[12] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
[13] "Gene and protein annotations for Mus musculus based on Ensembl version 111."
select$species
[1] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
[7] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
[13] "Mus musculus"
select$genome
[1] "129S1_SvImJ_v1" "A_J_v1" "AKR_J_v1" "BALB_cJ_v1" "C3H_HeJ_v1"
[6] "C57BL_6NJ_v1" "CBA_J_v1" "DBA_2J_v1" "FVB_NJ_v1" "LP_J_v1"
[11] "NOD_ShiLtJ_v1" "NZO_HlLtJ_v1" "GRCm39"
edb<- ah[["AH116340"]]
loading from cache
edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.7
|Creation time: Sat Dec 18 14:48:15 2021
|ensembl_version: 105
|ensembl_host: localhost
|Organism: Homo sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 2.2
| No. of genes: 69329.
| No. of transcripts: 268255.
|Protein data available.
Occasionally when using AnnotationHub I get this warning following a database query:
Warning message:
call dbDisconnect() when finished working with a connection
I noticed that this warning also appears in some of the Bioconductor documentation pages (a Google search for the message turns up several). Is there a way to either suppress or fix it? It seems to be related to database connections not being closed correctly.
BiocManager::version()
[1] ‘3.17’
packageVersion("AnnotationHub")
[1] ‘3.8.0’
hub <- AnnotationHub::AnnotationHub()
Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
Using 'localHub=TRUE'
If offline, please also see BiocManager vignette section on offline use
Error in .updateHubDB(hub_bfc, .class, url, proxy, localHub) :
Invalid Cache: sqlite file
Hub has not been added to cache
Run again with 'localHub=FALSE'
Can anyone help? Thank you!
Hi,
When I run hub <- AnnotationHub(), I get this error:
" Error: failed to connect to local data base
database: ‘/Users/anil/.AnnotationHub/annotationhub.sqlite3’
reason: object 'isDevel' not found"
Could you help me?