wikipathways / rwikipathways



R package for WikiPathways API

License: MIT License


bioinformatics data-access pathways rpackage

rwikipathways's Introduction

R Client Package for WikiPathways


R Client library for the WikiPathways API (https://webservice.wikipathways.org/) (license: MIT).

WikiPathways is described in the following papers:

2016 NAR paper by Kutmon et al.
2018 NAR paper by Slenter et al.
2021 NAR paper by Martens et al.

If you like this package, or want to make it easier to work with Xrefs, then you may also like these R packages:

Getting Started

How to install

Official Bioconductor releases (recommended)

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rWikiPathways")

Note: Be sure to use the latest Bioconductor release and the recommended R version.

Unstable development code from this repo (at your own risk)

install.packages("devtools")
library(devtools)
install_github('wikipathways/rWikiPathways', build_vignettes=TRUE)
library(rWikiPathways)

Troubleshooting

If you see the error "make: gfortran-4.8: No such file or directory" on a Mac, try reinstalling R via Homebrew: brew update && brew reinstall r
- warning: this may take ~30 minutes

How to contribute

This is a public, open source project. Come on in! You can contribute at multiple levels:

Report an issue or feature request
Fork and make pull requests
Contact current WikiPathways developers and inquire about joining the team

Development

install.packages(c("devtools", "roxygen2", "BiocManager"))
BiocManager::install("BiocCheck")
library(devtools)
library(roxygen2)
devtools::install_github("AlexanderPico/docthis")
library(docthis)
setwd("/git/wikipathways/rWikiPathways") # customize to your setup
devtools::document()
devtools::check(vignettes = FALSE)
BiocCheck::BiocCheck('./')

Testing

Unit tests are a crucial tool in software development. Be sure to add tests for any new methods you implement. They will be run as part of devtools::check().

Updating site

We use pkgdown to generate the main site for rWikiPathways based on this README, metadata, man pages and vignettes. If you make changes to any of these, please take a moment to regenerate the site:

library(pkgdown)
pkgdown::build_site()

Bioconductor

While this is the primary development repository for the rWikiPathways project, we also make regular pushes to the official Bioconductor repository (devel and release), from which the official releases are generated. This GitHub repository is the place for all coding and bug reports. The tagged releases here correspond to the Bioconductor releases via a manual syncing process. The devel branch here holds the latest code in development that has not yet been released.

git commit -m "informative commit message"
git push origin devel
git push upstream devel

http://bioconductor.org/developers/how-to/git/push-to-github-bioc/

Following each Bioconductor release, a RELEASE_#_# branch is created. The new branch is fetched and devel is updated:

git fetch upstream
git checkout -b RELEASE_3_15 upstream/RELEASE_3_15
git push origin RELEASE_3_15
git checkout devel
git pull upstream devel
git push origin devel

Only bug fixes and documentation updates can be pushed to the official Bioconductor release branch. After committing and pushing fixes to devel, run:

git checkout RELEASE_3_15
git cherry-pick devel # for latest commit
# or git cherry-pick 1abc234 #for specific commit
# or git cherry-pick 1abc234^..5def678 #for an inclusive range
# bump release version in DESCRIPTION
git commit -am 'version bump'
git push origin RELEASE_3_15
# double check changes, and then...
git push upstream RELEASE_3_15
git checkout devel
# bump dev version in DESCRIPTION
git commit -am 'version bump'
git push origin devel
git push upstream devel

And then finally, bump version and commit DESCRIPTION to devel and push to origin and upstream.

https://bioconductor.org/developers/how-to/git/bug-fix-in-release-and-devel/

Vignettes

When adding or updating vignettes, consider the following tips for consistency:

Copy/paste the header from an existing rWikiPathways vignette, including the global knitr options
Number the VignetteIndexEntry names w.r.t. other vignettes (this determines their presentation order)
Avoid spaces in Rmd filenames; they cause CHECK errors
When ready, run Knit to html_vignette and review the generated HTML
Note: you don't need to save the html version; it will be generated anew at Bioconductor.
In the end, you should just have an Rmd version of each vignette in the repo.

rwikipathways's People

Contributors


mkutmon jlaw9 egonw nm-warrier zerack5

rwikipathways's Issues

Procedure to remove getColoredPathway function

https://bioconductor.org/developers/how-to/deprecation/

Deprecate in 1.8 (Apr 2020, r3.11)
Defunct 1.10 (Oct 2020, r3.12)
Remove in 1.12 (Apr 2021, r3.13)

Add R function to highlight genes by name/ID rather than graphID

downloadPathwaysArchive doesn't work with redirects

Our plan to migrate data.wikipathways.org content to Toolforge required a web redirect (called a Page Rule at Cloudflare) so that we (and everyone else) can keep using the same data.wikipathways.org URL.

The current downloadPathwaysArchive function, however, breaks when scraping the file listing of a directory, because it does not follow the redirect.

The solution is to first RCurl::getURL with followlocation=TRUE and then pass that to readHTMLTable.
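A minimal sketch of that fix, assuming the RCurl and XML packages are installed. The helper name fetch_archive_tables is hypothetical and not part of rWikiPathways; it only shows a redirect-following fetch being handed to readHTMLTable.

```r
# Hypothetical helper (not part of rWikiPathways): fetch a directory listing
# while following HTTP redirects, then parse its HTML tables.
fetch_archive_tables <- function(url) {
  # followlocation = TRUE is passed through to libcurl so redirects are followed
  html <- RCurl::getURL(url, followlocation = TRUE)
  # Parse the downloaded HTML string (not a file path) and extract all tables
  doc <- XML::htmlParse(html, asText = TRUE)
  XML::readHTMLTable(doc)
}

# Usage (requires network access; the URL below is an assumption):
# tables <- fetch_archive_tables("https://data.wikipathways.org/current/gmt/")
```

Parsing the string with asText = TRUE keeps the fetch and the parse separate, so the redirect handling lives entirely in the getURL call.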

proper acknowledgement

Some of your https://github.com/wikipathways/rWikiPathways/blob/master/vignettes/Pathway-Analysis.Rmd vignette was adapted from clusterProfiler. You need to acknowledge it.

GMT parsing is missing

We shouldn't rely on clusterProfiler for read.gmt. It's a basic function and our version could include the % parsing that is unique to our files.

Suggestion: Add readPathwayGMT()
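A minimal sketch of what such a parser could look like, assuming the WikiPathways GMT layout in which the first tab-separated column packs name%version%WPID%species, the second column is a pathway URL, and the remaining columns are gene IDs. The name parse_wp_gmt is hypothetical, not the package's API, and the example line (version, URL, genes) is made up for illustration.

```r
# Hypothetical parser (not the package's API): turn WikiPathways GMT lines into
# a long data.frame with one row per pathway-gene pair.
# Assumed first-column layout: name%version%WPID%species
parse_wp_gmt <- function(lines) {
  lines <- lines[nzchar(lines)]                    # drop blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  rows <- lapply(fields, function(f) {
    meta <- strsplit(f[1], "%", fixed = TRUE)[[1]]
    genes <- f[-(1:2)]                             # skip name and URL columns
    if (length(genes) == 0) return(NULL)           # pathway without genes
    data.frame(name = meta[1], version = meta[2], wpid = meta[3],
               species = meta[4], gene = genes, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Illustrative input line (made-up version, revision URL and gene list)
example <- paste("Apoptosis%WikiPathways_20210110%WP254%Homo sapiens",
                 "http://www.wikipathways.org/instance/WP254_r0",
                 "329", "7132", sep = "\t")
gmt_df <- parse_wp_gmt(example)
```

For a real file, pass readLines(path); split(gmt_df$gene, gmt_df$wpid) then gives a named list of gene sets.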

How to extract just metabolic subset of genes?

I am interested in downloading metabolic enzymes from pathways. For example, in the omega-3 senescence pathway (https://www.wikipathways.org/pathways/WP5424.html) there are various genes that are not directly linked to metabolism, including p21. I think it should be possible to identify metabolic genes as all genes involved in MIM conversion interactions. Is there a way to extract just these genes, as opposed to all genes in the pathway, using the R package?

Thanks
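A minimal sketch of one way to do this from GPML with the xml2 package, assuming the common convention that an enzyme is drawn with a catalysis arrow (ArrowHead="mim-catalysis") that ends on an Anchor placed on a mim-conversion interaction, and that the first Point of an interaction is its source. The toy GPML and the conversion_enzymes function are hypothetical; real pathways may model enzymes differently, so results should be checked against the diagram.

```r
library(xml2)

# Hypothetical toy GPML (not WP5424): enzyme E1 catalyses the conversion
# M1 -> M2, while gene G1 takes part in no conversion.
gpml <- '<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Toy">
  <DataNode TextLabel="E1" GraphId="e1" Type="GeneProduct"><Xref Database="Entrez Gene" ID="1111"/></DataNode>
  <DataNode TextLabel="G1" GraphId="g1" Type="GeneProduct"><Xref Database="Entrez Gene" ID="2222"/></DataNode>
  <DataNode TextLabel="M1" GraphId="m1" Type="Metabolite"><Xref Database="ChEBI" ID="CHEBI:1"/></DataNode>
  <DataNode TextLabel="M2" GraphId="m2" Type="Metabolite"><Xref Database="ChEBI" ID="CHEBI:2"/></DataNode>
  <Interaction GraphId="conv"><Graphics>
    <Point GraphRef="m1"/><Point GraphRef="m2" ArrowHead="mim-conversion"/>
    <Anchor Position="0.5" GraphId="anc1"/>
  </Graphics></Interaction>
  <Interaction GraphId="cat"><Graphics>
    <Point GraphRef="e1"/><Point GraphRef="anc1" ArrowHead="mim-catalysis"/>
  </Graphics></Interaction>
</Pathway>'

# Return Xref IDs of DataNodes whose catalysis arrow ends on an Anchor that
# sits on a mim-conversion interaction (an assumed modelling convention).
conversion_enzymes <- function(gpml_text) {
  doc <- xml_ns_strip(read_xml(gpml_text))
  conv_anchors <- xml_attr(xml_find_all(doc,
    "//Interaction[Graphics/Point[@ArrowHead='mim-conversion']]/Graphics/Anchor"),
    "GraphId")
  cats <- xml_find_all(doc,
    "//Interaction[Graphics/Point[@ArrowHead='mim-catalysis']]")
  # First Point = source (enzyme), last Point = target (anchor)
  src <- vapply(cats, function(ix)
    xml_attr(xml_find_first(ix, "./Graphics/Point[1]"), "GraphRef"), character(1))
  tgt <- vapply(cats, function(ix)
    xml_attr(xml_find_first(ix, "./Graphics/Point[last()]"), "GraphRef"), character(1))
  enzyme_graph_ids <- unique(src[tgt %in% conv_anchors])
  # Map DataNode GraphIds to their Xref IDs
  nodes <- xml_find_all(doc, "//DataNode")
  xref_ids <- setNames(xml_attr(xml_find_first(nodes, "./Xref"), "ID"),
                       xml_attr(nodes, "GraphId"))
  unname(xref_ids[enzyme_graph_ids])
}

conversion_enzymes(gpml)
```

Since read_xml() also accepts a file path, the same function should work on a locally saved GPML file.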

How to do metabolite enrichment analysis?

The rWikiPathways vignettes explain how to do a gene enrichment analysis with the GMT file, which gives the gene lists per pathway.

Now I want to do the enrichment analysis based on the metabolome data. Is there a metabolites list file like gmt?
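One way to assemble metabolite sets analogous to the gene GMT is to collect metabolite Xrefs per pathway with getXrefList(), the rWikiPathways function used elsewhere on this page. A minimal sketch, assuming 'Ce' is the BridgeDb system code for ChEBI (as in the WP3620 output further down) and that a vector of WPIDs is already at hand. The helper build_metabolite_sets is hypothetical, and its results are subject to the getXrefList() issues reported below.

```r
# Hypothetical helper (not part of rWikiPathways): build metabolite sets by
# collecting ChEBI IDs ("Ce" system code) per pathway with getXrefList().
build_metabolite_sets <- function(wpids, system_code = "Ce") {
  sets <- lapply(wpids, function(id) rWikiPathways::getXrefList(id, system_code))
  names(sets) <- wpids
  sets <- lapply(sets, function(x) x[nzchar(x)])  # getXrefList can return ""
  sets[lengths(sets) > 0]                         # drop pathways with no hits
}

# Usage (network access required):
# msets <- build_metabolite_sets(c("WP3620", "WP554"))
```

As the WP3620 output shows, the same ChEBI entity can appear both with and without the "CHEBI:" prefix, so normalizing IDs (for example with sub("^CHEBI:", "", x)) may be needed before running an enrichment test.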

downloadPathwayArchive() not working

From Tina:

rWikiPathways::downloadPathwayArchive() is currently not working. I get an error:

Error in function (type, msg, asError = TRUE) : SSL certificate problem: certificate has expired

update pathway analysis vignette

with new clusterProfiler functions: https://yulab-smu.top/biomedical-knowledge-mining-book/wikipathways-analysis.html

get push rights in the BioC repo

https for downloadPathwayArchive

downloadPathwayArchive requires XML

From: Gabriel, https://github.com/egonw/rwikipathways/issues/7

Hi @AlexanderPico,

Error

I was unable to execute this code in a fresh R session:

library(rWikiPathways)
downloadPathwayArchive(
  organism = "Homo sapiens", format = "gmt"
)

This code returned the following error:

Error in readHTMLTable(paste0("http://data.wikipathways.org/current/",  : 
  could not find function "readHTMLTable"

Current Workaround

I called library(XML) before the downloadPathwayArchive() and it worked fine.

Loading of BridgeDbR does not happen automatically when using getXrefList()

In this vignette, Section 4, when not explicitly loading BridgeDbR first, we get an error with getXrefList():

> library(rWikiPathways)
> getXrefList('WP554', getSystemCode('Ensembl'))
Error in getSystemCode("Ensembl") : 
  could not find function "getSystemCode"
> library(BridgeDbR)
Loading required package: rJava

Attaching package: ‘BridgeDbR’

The following object is masked from ‘package:methods’:

    getProperties

> getXrefList('WP554', getSystemCode('Ensembl'))
 [1] "ENSG00000092009" "ENSG00000100448" "ENSG00000100739" "ENSG00000105329"
 [5] "ENSG00000113889" "ENSG00000130234" "ENSG00000130368" "ENSG00000135744"
 [9] "ENSG00000143839" "ENSG00000144891" "ENSG00000151623" "ENSG00000159640"
[13] "ENSG00000164867" "ENSG00000168398" "ENSG00000179142" "ENSG00000180772"
[17] "ENSG00000182220"

getXrefList does not work correctly for WP3620

getXrefList does not work for https://www.wikipathways.org/index.php/Pathway:WP3620
WP3620 has 148 metabolites, but getXrefList does not return those metabolite IDs.
In particular, this pathway contains many KNApSAcK IDs, but I got none of them.
If this issue is resolved, an exhaustive MSEA (Metabolite Set Enrichment Analysis) can easily be implemented.

I knew about this issue, so I implemented a BioPAX parsing script for WikiPathways (the BioPAX of WP3620 had the KNApSAcK IDs).
https://github.com/afukushima/MSEAp/blob/master/R/read.wikipathways.R
But the BioPAX does not have non-primary IDs, so I hope this issue is resolved.

> getXrefList('WP3620', 'Ca')
[1] "13306-05-3" "480-20-6"   "5071-40-9"  "520-18-3"   "58124-18-8" "73692-50-9"
> getXrefList('WP3620', 'Ce')
 [1] "15401"       "15413"       "27843"       "28499"       "52047"       "CHEBI:15401" "CHEBI:15413" "CHEBI:1813" 
 [9] "CHEBI:27843" "CHEBI:28499" "CHEBI:4567"  "CHEBI:52047"
> getXrefList('WP3620', 'Ch')
[1] "HMDB0005801" "HMDB0029631" "HMDB0040314" "HMDB05801"   "HMDB29631"   "HMDB40314"  
> getXrefList('WP3620', 'Ck')
[1] "C00223" "C00974" "C05903" "C05905" "C06561" "C15567"
> getXrefList('WP3620', 'Cpc')
[1] "11954208" "122850"   "128861"   "21932272" "5280863"  "5280960"  "90657396"
> getXrefList('WP3620', 'Cks')
[1] ""
> getXrefList('WP3620', 'Wd')
[1] "Q27089427" "Q27123131" "Q393336"   "Q417606"   "Q956217"  
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rWikiPathways_1.0.0  BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] httr_1.3.1     compiler_3.5.0 R6_2.2.2       tools_3.5.0    curl_3.2       RJSONIO_1.3-0  caTools_1.17.1 bitops_1.0-6  
>

getXrefList(...) method is broken

For a few months, I've struggled to get this method to work (first with retrieving geneset names, which later resolved itself, and now with retrieving the actual genes in the genesets). This is strange, because the issues started after the last commit.

The code I use is:

library(rWikiPathways)

# Get metadata.
pw <- listPathways("Mus musculus")

# Compile pathways ("L" is the BridgeDb system code for Entrez Gene).
gs <- list()
for (i in pw$id)
{
  print(i)
  gs[[length(gs)+1]] <- getXrefList(i, "L")
  print(gs[[length(gs)]])
}

This code used to work for me, but now it produces incomplete returns like this (I only copied the first few outputs):

[1] "WP1"
[1] ""
[1] "WP10"
 [1] "11651"  "12702"  "14784"  "16186"  "16198"  "16199"  "16367"  "16451"  "16453"  "18708"  "19247"  "20416"  "20846" 
[14] "20848"  "20850"  "20851"  "26395"  "26396"  "26413"  "26417"  "269523" "384783" "54721"  "81601" 
[1] "WP103"
[1] ""
[1] "WP108"
[1] ""
[1] "WP113"
[1] ""
[1] "WP116"
[1] ""
[1] "WP1241"
[1] ""
[1] "WP1242"
[1] ""
[1] "WP1243"
[1] ""
[1] "WP1244"
[1] ""

I was wondering if it had to do with calling BridgeDb's datasources repo, as their datasources.tsv file has been moved from the location given in the getXrefList(...) help page.

Please tell me if you need more information!

migrate away from RJSONIO to rjson

The reason is that RJSONIO uses a header file from json.org that carries an MIT-no-evil license, so it is only in Debian non-free. Once we have moved to rjson, our R package can go into Debian. cc @tillea

Error in function (type, msg, asError = TRUE) : SSL certificate problem: certificate has expired

How to extract edges from GPML file ?

I’m trying to figure out how to extract edges from a network using the GPML file. I’m using WP1589.gpml as an example. I think I need to extract the information from the <Interaction> tag. These tags have <Point> child tags, which I’m guessing give me information about the edges.

Question 1 - See screenshot. [[9]] and [[10]] in the list each have two <Point> tags that could represent an edge from the “From node” to the “To node”. But [[11]] in the list has three <Point> tags and one <Anchor> tag. Which is the “From node” and which is the “To node”? Can you clarify this?
Link to screenshot here: https://drive.google.com/file/d/1nl3GnqLSbnQR5asWIkPqgcJKbjCh7tsH/view?usp=sharing

Question 2 - Attached is a CSV file of the nodes I extracted with the help of Martin Morgan’s code in Slack. I found some GraphIDs (for example “bf654”) that appear in the <Interaction> section but are not defined among the <DataNode> elements, so I don’t know whether each is a gene, a metabolite, or something else. How do I get around this?
Link to csv file here: https://drive.google.com/file/d/1maGS4Im6jzkO2Tz1_4W9LkSTzpkp_N9T/view?usp=sharing
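A minimal sketch addressing both questions with the xml2 package, assuming the GPML 2013a layout in which the first <Point> of an <Interaction> is the source, the last is the target, and any Points in between are routing waypoints. An <Anchor> is a point on the interaction line itself that other interactions can attach to, so a GraphRef such as “bf654” that matches no DataNode may refer to an Anchor (other elements such as Groups or Labels can also carry GraphIds). The toy GPML and extract_edges are hypothetical and not taken from WP1589.

```r
library(xml2)

# Hypothetical toy GPML (not WP1589): A -> B with one waypoint and an Anchor,
# plus a second interaction from C that ends on that Anchor.
gpml <- '<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Toy">
  <DataNode TextLabel="A" GraphId="n1" Type="GeneProduct"/>
  <DataNode TextLabel="B" GraphId="n2" Type="GeneProduct"/>
  <DataNode TextLabel="C" GraphId="n3" Type="GeneProduct"/>
  <Interaction GraphId="i1"><Graphics>
    <Point X="10" Y="10" GraphRef="n1"/>
    <Point X="50" Y="40"/>
    <Point X="90" Y="10" GraphRef="n2" ArrowHead="Arrow"/>
    <Anchor Position="0.5" GraphId="bf654"/>
  </Graphics></Interaction>
  <Interaction GraphId="i2"><Graphics>
    <Point X="50" Y="90" GraphRef="n3"/>
    <Point X="50" Y="40" GraphRef="bf654" ArrowHead="mim-catalysis"/>
  </Graphics></Interaction>
</Pathway>'

# One row per Interaction: first Point = source, last Point = target;
# intermediate Points are treated as routing waypoints and ignored.
extract_edges <- function(gpml_text) {
  doc <- xml_ns_strip(read_xml(gpml_text))
  rows <- lapply(xml_find_all(doc, "//Interaction"), function(ix) {
    pts <- xml_find_all(ix, "./Graphics/Point")
    last <- pts[[length(pts)]]
    data.frame(interaction = xml_attr(ix, "GraphId"),
               from  = xml_attr(pts[[1]], "GraphRef"),
               to    = xml_attr(last, "GraphRef"),
               arrow = xml_attr(last, "ArrowHead"),
               stringsAsFactors = FALSE)
  })
  edges <- do.call(rbind, rows)
  # Flag targets that are Anchors on other interactions rather than DataNodes
  anchors <- xml_attr(xml_find_all(doc, "//Anchor"), "GraphId")
  edges$to_is_anchor <- edges$to %in% anchors
  edges
}

edges <- extract_edges(gpml)
```

Rows whose target is an Anchor describe an interaction acting on another interaction (here, C acting on the A -> B edge), so they may need to be resolved or kept separate rather than treated as plain node-to-node edges.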

`getColoredPathway` should recognize if the remote API method is offline

@AlexanderPico, the alternative is, of course, to remove the method altogether.

Add emap section to PathwayAnalysis vignette

igraph compatible?

Hi, thank you for your work!
I was wondering if the data could be extracted in a format compatible with igraph.
I hope my question is not too naive.
Thank you!
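A minimal sketch of getting from pathway data to igraph, assuming you already have a from/to edge table (for instance built from the first and last <Point> GraphRefs of each GPML <Interaction>). igraph's graph_from_data_frame() treats the first two columns as the edge list and any further columns as edge attributes. The node IDs and arrow types below are illustrative toy values, not taken from a real pathway.

```r
library(igraph)

# Toy edge table in the from/to layout (values are illustrative)
edges <- data.frame(from  = c("n1", "n3"),
                    to    = c("n2", "n1"),
                    arrow = c("Arrow", "mim-catalysis"),
                    stringsAsFactors = FALSE)

# The first two columns become the edge list; "arrow" becomes an edge attribute
g <- graph_from_data_frame(edges, directed = TRUE)
```

Endpoints that are Anchors rather than DataNodes would need resolving before such a graph is biologically meaningful.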

findPathwaysByText should return data.frame rather than list?

(This is not a bug report; it is a feature request.)

I created an automation Rmd for rWikiPathways.
https://nrnb.org/gsod2019_kozo_nishida/html_documents/Rmd/wikipathways-app.html

I think findPathwaysByText should return a data.frame rather than a list.
What do you think?
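A list of flat records can also be turned into a data.frame on the caller's side. A minimal sketch, assuming each hit is a flat named list with the same fields; the field names and values below are placeholders, not the documented return value of findPathwaysByText().

```r
# Illustrative stand-in for a list of search hits; the field names here are
# assumptions, not the documented return value of findPathwaysByText().
hits <- list(
  list(id = "WP_A", name = "Pathway A", species = "Homo sapiens"),
  list(id = "WP_B", name = "Pathway B", species = "Mus musculus")
)

# Convert each hit to a one-row data.frame, then stack the rows
hits_df <- do.call(rbind, lapply(hits, as.data.frame, stringsAsFactors = FALSE))
```

This breaks if some hits lack a field or contain NULL values, which is one argument for having the function return a data.frame directly.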

Multiple colored nodes are not supported

Hi all,

How do we color multiple xref IDs in a single pathway? I am using the R package "rWikiPathways".
Any suggestions would be appreciated.