y1zhou / brendadb
Load and query the BRENDA database in R.
Home Page: https://bioconductor.org/packages/release/bioc/html/brendaDb.html
License: Other
BiocManager::install("brendaDb", dependencies=TRUE)
Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.0 (2019-04-26)
Installing package(s) 'brendaDb'
Warning message:
"package ‘brendaDb’ is not available (for R version 3.6.0) "
Can this be resolved? Your package appears to do exactly what I need.
Some gene symbols correspond to multiple Ensembl IDs, and tibble would complain in this scenario.
Steps to reproduce the behavior:
brendaDb::BiocycPathwayGenes(pathway = "TRYPTOPHAN-DEGRADATION-1")
Found 15 genes in HUMAN pathway TRYPTOPHAN-DEGRADATION-1.
Error: Tibble columns must have consistent lengths, only values of length one are recycled:
* Length 15: Columns `BiocycGene`, `BiocycProtein`, `Symbol`
* Length 17: Column `Ensembl`
Either duplicate values in the other columns, or concatenate Ensembl IDs corresponding to the same gene into one entry.
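The second option (concatenating the Ensembl IDs that belong to the same gene) could be sketched in base R as below. The data frame, its values, and the ";" separator are illustrative assumptions, not the actual output or implementation of BiocycPathwayGenes().

```r
# Toy mapping with one row per (symbol, Ensembl ID) pair.
# All identifiers here are made up for illustration.
mapping <- data.frame(
  Symbol  = c("GENE1", "GENE2", "GENE2", "GENE3"),
  Ensembl = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  stringsAsFactors = FALSE
)

# Collapse all Ensembl IDs of a symbol into a single ";"-separated string,
# so that every symbol ends up with exactly one entry.
collapsed <- aggregate(Ensembl ~ Symbol, data = mapping,
                       FUN = paste, collapse = ";")
print(collapsed)
```

After collapsing, the Ensembl column has one value per symbol, so it can be combined with the other per-gene columns without a length mismatch.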
Is your feature request related to a problem? Please describe.
Only RefIDs are given in the ExtractField() function.
Describe the solution you'd like
Get reference titles and PubMed IDs as well in the returned table.
Describe alternatives you've considered
A separate function (e.g. FetchReference()) that operates on the returned table of ExtractField() and gets the information. This should be doable since all we need is the EC number and the reference ID.
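A rough sketch of that lookup, assuming the extracted table has already been flattened to one reference ID per row and that a reference table keyed by EC number and reference ID is available. All column names and values below are placeholders and do not reflect the real structure of the package's tables.

```r
# Placeholder for a flattened ExtractField() result: one refID per row.
extracted <- data.frame(
  ec    = c("1.1.1.1", "1.1.1.1", "1.1.1.1"),
  value = c("7.0", "8.5", "9.0"),
  refID = c("1", "2", "2"),
  stringsAsFactors = FALSE
)

# Placeholder reference table; titles and PubMed IDs are made up.
references <- data.frame(
  ec     = c("1.1.1.1", "1.1.1.1"),
  refID  = c("1", "2"),
  title  = c("Placeholder title A", "Placeholder title B"),
  pubmed = c("PMID-A", "PMID-B"),
  stringsAsFactors = FALSE
)

# A left join on (ec, refID) attaches the title and PubMed ID to each row.
annotated <- merge(extracted, references, by = c("ec", "refID"), all.x = TRUE)
print(annotated)
```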
Additional context
NA.
I am having problems when passing a list of ECs to QueryBrenda().
It looks like all the numbers other than 1.1.1.1 are being dropped as "invalid EC number(s)".
I manually checked on BRENDA whether those numbers are actually invalid, but that is not the case (e.g. 1.1.1.262).
Any input on how to solve this issue?
The description column in the protein table has duplicated organisms because the Uniprot IDs weren't removed from the description column. For example:
df <- ReadBrenda(system.file("extdata", "brenda_download_test.txt",
package = "brendaDb"))
x <- QueryBrenda(df, EC = "1.1.1.1")
x$nomenclature$protein[order(x$nomenclature$protein$description), ]
# A tibble: 164 x 5
proteinID description uniprot commentary refID
<list> <chr> <chr> <chr> <list>
1 <chr [1]> Acetobacter pasteurianus NA NA <chr [1]>
2 <chr [1]> Acinetobacter calcoaceticus NA NA <chr [1]>
3 <chr [1]> Aeropyrum pernix NA NA <chr [2]>
4 <chr [1]> Aeropyrum pernix Q9Y9P9 UniProt Q9Y9P9 NA <chr [3]>
5 <chr [1]> Alligator mississippiensis NA NA <chr [1]>
6 <chr [1]> Anastrepha fraterculus NA NA <chr [1]>
7 <chr [1]> Anastrepha obliqua NA NA <chr [1]>
8 <chr [1]> Arabidopsis thaliana NA NA <chr [1]>
9 <chr [1]> Aspergillus nidulans NA NA <chr [1]>
10 <chr [1]> Avena sativa NA NA <chr [1]>
# … with 154 more rows
Rows 3 and 4 are proteins from the same organism.
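One way to clean this up is to strip the trailing accession (and the database tag) from the description whenever the uniprot column is filled in. The helper name StripAccession and the assumption that the accession always sits at the end of the description are mine, not the package's.

```r
# Hypothetical helper: remove a trailing "<accession> UniProt|SwissProt"
# suffix from each description that has a known accession.
StripAccession <- function(description, accession) {
  for (i in which(!is.na(accession))) {
    pattern <- paste0("\\s+", accession[i],
                      "(\\s+(UniProt|SwissProt))?\\s*$")
    description[i] <- sub(pattern, "", description[i], perl = TRUE)
  }
  description
}

# Rows 3 and 4 from the table above.
description <- c("Aeropyrum pernix", "Aeropyrum pernix Q9Y9P9 UniProt")
uniprot     <- c(NA, "Q9Y9P9")

cleaned <- StripAccession(description, uniprot)
print(cleaned)
```

With the suffix removed, both rows carry the same organism name, and the accession is still available in the uniprot column.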
Most of the fieldInfo columns are NAs in the ParseGeneric() function. Removing the column would reduce the size of the brenda.query object, and won't impact the parsing speed significantly (it's already pretty slow).
Two possible implementations:
brenda.query objects, but ignore the NA tables when printing
simplify.res = T
Currently the QueryBrenda function returns all possible fields in the table; often we only want information from a certain subset of the fields, e.g. the optimal pH of the enzyme(s).
A lot of the text in the readme file could be reused in the package vignette.
Some edge cases exist for EC numbers in the text file:
1.1.1.286 ()
1.1.1.109 (transferred to EC 1.3.1.28)
1.1.1.5 (transferred to EC 1.1.1.303 and EC 1.1.1.304)
1.1.1.89 (deleted, included in EC 1.1.1.86)
1.1.1.293 (deleted. This enzyme was already in the Enzyme List as EC 1.1.1.206, tropine dehydrogenase so EC 1.1.1.293 has been withdrawn at the public-review stage.)
6.1.1.8 (deleted)
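The cases listed above can be told apart with a single regular expression that separates the EC number from the parenthesized comment. This is only a sketch: the helper name SplitEcEntry and the status labels ("none", "transferred", "deleted", "other") are assumptions, not what the package does.

```r
# Hypothetical helper: split "EC (comment)" strings into the EC number,
# a coarse status label, and the raw comment text.
SplitEcEntry <- function(x) {
  pattern <- "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+) *(\\((.*)\\))?$"
  m <- regmatches(x, regexec(pattern, x))
  ec      <- vapply(m, function(p) if (length(p)) p[2] else NA_character_,
                    character(1))
  comment <- vapply(m, function(p) if (length(p)) p[4] else NA_character_,
                    character(1))
  status <- ifelse(is.na(comment) | comment == "", "none",
            ifelse(grepl("^transferred", comment), "transferred",
            ifelse(grepl("^deleted", comment), "deleted", "other")))
  data.frame(ec = ec, status = status, comment = comment,
             stringsAsFactors = FALSE)
}

entries <- c(
  "1.1.1.286 ()",
  "1.1.1.109 (transferred to EC 1.3.1.28)",
  "1.1.1.5 (transferred to EC 1.1.1.303 and EC 1.1.1.304)",
  "1.1.1.89 (deleted, included in EC 1.1.1.86)",
  "6.1.1.8 (deleted)"
)
res <- SplitEcEntry(entries)
print(res[, c("ec", "status")])
```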
Sorry, this is my first time writing an issue.
The compound name in the original brenda_download file is:
2-Butyl-4-[(2,2-dimethyl-1-methylcarbamoyl-propylamino)-hydroxy-methyl]-6-{4'-[(N-methyl-aminooxy)-methyl]-biphenyl-4-yl}-hexanoic acid
but in the query output it appears as:
2-Butyl-4-[(2 2-dimethyl-1-methylcarbamoyl-propylamino)-hydroxy-methyl]-6--hexanoic acid
Steps to reproduce the behavior:
library(brendaDb)
brenda.filepath <- DownloadBrenda()
df <- ReadBrenda(brenda.filepath)
res <- QueryBrenda(df, EC = "3.4.24.17", organisms = "Mus musculus")
View(res$`3.4.24.17`$interactions$inhibitors)
One of the compound names in the description column should be:
2-Butyl-4-[(2,2-dimethyl-1-methylcarbamoyl-propylamino)-hydroxy-methyl]-6-{4'-[(N-methyl-aminooxy)-methyl]-biphenyl-4-yl}-hexanoic acid
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux
Matrix products: default
BLAS: /usr/lib/libblas.so.3.10.0
LAPACK: /opt/miniconda3/lib/libmkl_intel_lp64.so.1
locale:
[1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=zh_CN.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=zh_CN.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] brendaDb_1.7.0 stringr_1.4.0 KEGGREST_1.33.0 reticulate_1.20 biomaRt_2.49.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 lattice_0.20-44 tidyr_1.1.3 prettyunits_1.1.1 png_0.1-7 Biostrings_2.61.1
[7] assertthat_0.2.1 digest_0.6.27 utf8_1.2.1 BiocFileCache_2.1.1 R6_2.5.0 GenomeInfoDb_1.29.3
[13] stats4_4.1.0 evaluate_0.14 RSQLite_2.2.7 httr_1.4.2 pillar_1.6.1 zlibbioc_1.39.0
[19] rlang_0.4.11 progress_1.2.2 curl_4.3.2 rstudioapi_0.13 blob_1.2.1 S4Vectors_0.31.0
[25] Matrix_1.3-4 rmarkdown_2.9 BiocParallel_1.27.2 RCurl_1.98-1.3 bit_4.0.4 xfun_0.24
[31] compiler_4.1.0 pkgconfig_2.0.3 BiocGenerics_0.39.1 htmltools_0.5.1.1 tidyselect_1.1.1 tibble_3.1.2
[37] GenomeInfoDbData_1.2.6 IRanges_2.27.0 XML_3.99-0.6 fansi_0.5.0 crayon_1.4.1 dplyr_1.0.7
[43] dbplyr_2.1.1 bitops_1.0-7 rappdirs_0.3.3 grid_4.1.0 jsonlite_1.7.2 lifecycle_1.0.0
[49] DBI_1.1.1 magrittr_2.0.1 cli_3.0.1 stringi_1.7.3 cachem_1.0.5 XVector_0.33.0
[55] xml2_1.3.2 ellipsis_0.3.2 filelock_1.0.2 generics_0.1.0 vctrs_0.3.8 tools_4.1.0
[61] bit64_4.0.5 Biobase_2.53.0 glue_1.4.2 purrr_0.3.4 hms_1.1.0 yaml_2.2.1
[67] parallel_4.1.0 fastmap_1.1.0 AnnotationDbi_1.55.1 memoise_2.0.0 knitr_1.33
Some UniProt IDs in the text file don't follow the standard regex [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}, and are not detected as of v0.2.3. Some example cases are:
o87873 UniProt <6>
AB600997 UniProt <125>
Q9wtl4 SwissProt <407>
A0A14OJW76 UniProt <80>
Join other tables with the nomenclature.protein table to get the organism, and also the bibliography.reference table to get the references.
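Returning to the UniProt accession regex quoted earlier: the sketch below checks the four example IDs against that pattern, first as written and then with case-insensitive matching. The anchors (^ and $) and the use of perl = TRUE are my additions so that the whole string must match and the character ranges behave predictably; this is not how the package detects IDs.

```r
# The accession pattern from the issue, anchored to the full string.
uniprot_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

ids <- c("o87873", "AB600997", "Q9wtl4", "A0A14OJW76")

# Case-sensitive match versus a relaxed, case-insensitive match.
strict  <- grepl(uniprot_pattern, ids, perl = TRUE)
relaxed <- grepl(uniprot_pattern, ids, perl = TRUE, ignore.case = TRUE)

print(data.frame(id = ids, strict = strict, relaxed = relaxed))
```

Ignoring case only rescues o87873 and Q9wtl4. AB600997 and A0A14OJW76 still fail because they do not follow the accession layout at all, so they would need separate handling.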
The goal is to input a Biocyc pathway ID (e.g. SERSYN-PWY for serine biosynthesis (phosphorylated route)), and return all the enzymes in that pathway, as well as the brenda.query results.
Is your feature request related to a problem? Please describe.
Current queries take a long time because all fields are constructed in the return object even if they are not part of the desired query.
Describe the solution you'd like
Skip fields that shouldn't be queried.
Describe alternatives you've considered
At least provide an option to remove these fields to reduce the memory taken.
Additional context
None.
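One way to skip unwanted fields is to subset the long-format table before any parsing happens. The sketch below assumes a table with ID, field, and description columns; the helper name SubsetFields, the field names, and the toy rows are illustrative assumptions rather than the package's actual interface.

```r
# Toy long-format table: one row per (EC number, field) entry.
brenda_long <- data.frame(
  ID          = c("1.1.1.1", "1.1.1.1", "1.1.1.1", "6.3.5.8"),
  field       = c("PROTEIN", "PH_OPTIMUM", "INHIBITORS", "PH_OPTIMUM"),
  description = c("placeholder 1", "placeholder 2",
                  "placeholder 3", "placeholder 4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the requested EC numbers and fields,
# so later parsing steps never see the rest.
SubsetFields <- function(df, EC, fields) {
  df[df$ID %in% EC & df$field %in% fields, , drop = FALSE]
}

wanted <- SubsetFields(brenda_long, EC = "1.1.1.1", fields = "PH_OPTIMUM")
print(wanted)
```

Filtering first means both the parsing time and the size of the returned object scale with the requested fields instead of with everything stored for the enzyme.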