R Client Package for WikiPathways

R Client library for the WikiPathways API (https://webservice.wikipathways.org/) (license: MIT).

WikiPathways is described in the following papers:

If you like this package, or want to make it easier to work with Xrefs, then you may also like these R packages:

Getting Started

How to install

Official bioconductor releases (recommended)

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rWikiPathways")

Note: be sure to use the latest Bioconductor release and the R version it recommends.

Unstable development code from this repo (at your own risk)

install.packages("devtools")
library(devtools)
install_github('wikipathways/rWikiPathways', build_vignettes=TRUE)
library(rWikiPathways)

Troubleshooting

  1. If you see this error on a Mac: make: gfortran-4.8: No such file or directory, then try reinstalling R via homebrew: brew update && brew reinstall r
    • Warning: this may take ~30 minutes

How to contribute

This is a public, open source project. Come on in! You can contribute at multiple levels:

  • Report an issue or feature request
  • Fork and make pull requests
  • Contact current WikiPathways developers and inquire about joining the team

Development

install.packages("devtools")
install.packages("roxygen2")
library(devtools)
library(roxygen2)
devtools::install_github("AlexanderPico/docthis")
library(docthis)
setwd("/git/wikipathways/rWikiPathways") # customize to your setup
devtools::document()
devtools::check(vignettes = FALSE)
BiocCheck::BiocCheck('./')

Testing

Unit tests are a crucial tool in software development. Be sure to add tests for any new methods you implement. They run as part of devtools::check().
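
As an illustration, a test for a new function might look like the sketch below. It assumes the tests are written with testthat; is_wpid() is a hypothetical helper invented for this example and is not a function in rWikiPathways.

```r
library(testthat)

# Hypothetical helper standing in for a newly added package function:
# TRUE when a string looks like a WikiPathways identifier such as "WP554".
is_wpid <- function(x) grepl("^WP[0-9]+$", x)

test_that("is_wpid recognises WikiPathways identifiers", {
  expect_true(is_wpid("WP554"))
  expect_false(is_wpid("554"))
  # The check is case-sensitive and vectorised.
  expect_identical(is_wpid(c("WP1", "wp1")), c(TRUE, FALSE))
})
```

In a package checkout, a file like this would typically live under tests/testthat/ so that devtools::check() picks it up.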

Updating site

We use pkgdown to generate the main site for rWikiPathways based on this README, metadata, man pages and vignettes. If you make changes to any of these, please take a moment to regenerate the site:

library(pkgdown)
pkgdown::build_site()

Bioconductor

This is the primary development repository for the rWikiPathways project, and the correct place for all coding and bug reports. We also make regular pushes to the official Bioconductor repository (devel and release), from which the official releases are generated. The tagged releases here correspond to the Bioconductor releases via a manual syncing process. The devel branch here holds the latest code in development that has not yet been released.

git commit -m "informative commit message"
git push origin devel
git push upstream devel

http://bioconductor.org/developers/how-to/git/push-to-github-bioc/

Following each Bioconductor release, a RELEASE_#_# branch is created. Fetch the new branch and update devel:

git fetch upstream
git checkout -b RELEASE_3_15 upstream/RELEASE_3_15
git push origin RELEASE_3_15
git checkout devel
git pull upstream devel
git push origin devel

Only bug fixes and documentation updates can be pushed to the official Bioconductor release branch. After committing and pushing a fix to devel, cherry-pick it onto the release branch:

git checkout RELEASE_3_15
git cherry-pick devel # for the latest commit
# or git cherry-pick 1abc234 # for a specific commit
# or git cherry-pick 1abc234^..5def678 # for an inclusive range
# bump release version in DESCRIPTION
git commit -am 'version bump'
git push origin RELEASE_3_15
# double check changes, and then...
git push upstream RELEASE_3_15
git checkout devel
# bump dev version in DESCRIPTION
git commit -am 'version bump'
git push origin devel
git push upstream devel

The final commands above bump the version in DESCRIPTION on devel, commit it, and push to both origin and upstream.
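
The "bump version in DESCRIPTION" steps can be scripted. The sketch below assumes the Version field sits on a single line and is a dot-separated run of integers, and it increments only the last component. bump_description_version() is a hypothetical helper, not part of rWikiPathways or devtools.

```r
# Hypothetical helper: increment the last component of the Version field
# in a DESCRIPTION file, e.g. x.y.z becomes x.y.(z+1).
bump_description_version <- function(path) {
  lines <- readLines(path)
  idx <- grep("^Version:", lines)
  stopifnot(length(idx) == 1)

  old <- trimws(sub("^Version:", "", lines[idx]))
  parts <- as.integer(strsplit(old, ".", fixed = TRUE)[[1]])
  parts[length(parts)] <- parts[length(parts)] + 1L
  new <- paste(parts, collapse = ".")

  # Rewrite only the Version line; every other line is left untouched.
  lines[idx] <- paste("Version:", new)
  writeLines(lines, path)
  invisible(new)
}

# Demonstration on a throwaway file rather than a real DESCRIPTION.
desc_file <- tempfile()
writeLines(c("Package: demo", "Version: 1.20.3", "Title: Demo"), desc_file)
new_version <- bump_description_version(desc_file)
print(readLines(desc_file))
```

Editing the single line in place, rather than round-tripping the file through read.dcf() and write.dcf(), keeps the diff to the one Version line.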

https://bioconductor.org/developers/how-to/git/bug-fix-in-release-and-devel/

Vignettes

When adding or updating vignettes, consider the following tips for consistency:

  • Copy/paste the header from an existing rWikiPathways vignette, including the global knitr options
  • Number the VignetteIndexEntry names w.r.t. other vignettes (this determines their presentation order)
  • Avoid spaces in Rmd filenames; they cause CHECK errors
  • When ready, run Knit to html_vignette and review the generated html
  • Note: you don't need to save the html version; it will be generated anew at Bioconductor.
  • In the end, you should just have an Rmd version of each vignette in the repo.

rwikipathways's People

Contributors

alexanderpico, egonw, hpages, jwokaty, khanspers, link-ny, mkutmon, nturaga, vobencha

rwikipathways's Issues

downloadPathwayArchive doesn't work with redirects

Our plan to migrate data.wikipathways.org content to Toolforge required a Web Redirect (called a Page Rule at Cloudflare) so that we (and everyone else) can keep using the same data.wikipathways.org URL.

The current downloadPathwayArchive function, however, breaks when scraping the files in a directory.

The solution is to first fetch the page with RCurl::getURL using followlocation=TRUE and then pass the result to readHTMLTable.
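
A minimal sketch of that approach follows. It is not the package's implementation: parse_listing() and fetch_listing() are hypothetical names, and the demonstration parses an inline HTML string so that it runs without network access.

```r
library(RCurl)
library(XML)

# Parse every HTML table in a directory listing that is already in memory.
parse_listing <- function(html) {
  doc <- htmlParse(html, asText = TRUE)
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Fetch the page while following HTTP redirects, then parse it.
# followlocation is passed through to libcurl as a curl option.
fetch_listing <- function(url) {
  html <- getURL(url, followlocation = TRUE)
  parse_listing(html)
}

# Offline demonstration with a made-up listing.
demo_html <- paste0(
  "<html><body><table>",
  "<tr><th>File</th><th>Size</th></tr>",
  "<tr><td>example.gmt</td><td>1 KB</td></tr>",
  "</table></body></html>"
)
tables <- parse_listing(demo_html)
print(tables[[1]])

# Against the live site (requires network access):
# tables <- fetch_listing("http://data.wikipathways.org/current/")
```

Separating the fetch from the parse means the redirect handling can change without touching the table-scraping logic.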

GMT parsing is missing

We shouldn't rely on clusterProfiler for read.gmt. It's a basic function and our version could include the % parsing that is unique to our files.

Suggestion: Add readPathwayGMT()
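
A base-R sketch of what such a reader could do is shown below. It assumes each GMT line is tab-separated (term, description, then gene IDs) and that the term field has the layout name%version%wpid%organism. read_gmt_sketch() is a hypothetical name, and the demo line uses made-up values, including the placeholder ID WP0000.

```r
# Hypothetical helper: read a GMT file into a long data frame with one row
# per pathway-gene pair, splitting the %-delimited term into its parts.
read_gmt_sketch <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]
  rows <- lapply(strsplit(lines, "\t", fixed = TRUE), function(fields) {
    # Assumed term layout: name%version%wpid%organism
    term <- strsplit(fields[1], "%", fixed = TRUE)[[1]]
    genes <- fields[-(1:2)]  # drop the term and description columns
    if (length(genes) == 0) return(NULL)
    data.frame(
      name = term[1], version = term[2], wpid = term[3], org = term[4],
      gene = genes, stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Demonstration with a one-line, made-up GMT file.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(
  paste(
    "Example pathway%WikiPathways_20240101%WP0000%Homo sapiens",
    "http://example.org/WP0000", "1234", "5678",
    sep = "\t"
  ),
  gmt_file
)
gmt <- read_gmt_sketch(gmt_file)
print(gmt)
```

A pathway name that itself contains a % character would break this split; a fuller version would need to handle that case.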

How to extract just metabolic subset of genes?

I am interested in downloading metabolic enzymes from pathways. For example, in the omega3 senescence pathway (https://www.wikipathways.org/pathways/WP5424.html) there are various genes that are not directly linked to metabolism, including p21. I think it should be possible to identify metabolism genes by taking all genes involved in conversion MIM interactions. Is there a way to extract just these genes, as opposed to all genes in the pathway, using the R package?

Thanks

How to do metabolite enrichment analysis

The rWikiPathways vignettes show how to do a gene enrichment analysis using the GMT file, which gives the gene list for each pathway.

Now I want to do an enrichment analysis based on metabolome data. Is there a metabolite list file similar to the GMT?

downloadPathwayArchive() not working

From Tina:

rWikiPathways::downloadPathwayArchive() is currently not working. I get an error:

Error in function (type, msg, asError = TRUE) : SSL certificate problem: certificate has expired

downloadPathwayArchive requires XML

From: Gabriel, https://github.com/egonw/rwikipathways/issues/7

Hi @AlexanderPico,

Error

I was unable to execute this code in a fresh R session:

library(rWikiPathways)
downloadPathwayArchive(
  organism = "Homo sapiens", format = "gmt"
)

This code returned the following error:

Error in readHTMLTable(paste0("http://data.wikipathways.org/current/",  : 
  could not find function "readHTMLTable"

Current Workaround

I called library(XML) before the downloadPathwayArchive() and it worked fine.

Loading of BridgeDbR does not happen automatically when using getXrefList()

In this vignette, Section 4, when not explicitly loading BridgeDbR first, we get an error with getXrefList():

> library(rWikiPathways)
> getXrefList('WP554', getSystemCode('Ensembl'))
Error in getSystemCode("Ensembl") : 
  could not find function "getSystemCode"
> library(BridgeDbR)
Loading required package: rJava

Attaching package: 'BridgeDbR'

The following object is masked from 'package:methods':

    getProperties

> getXrefList('WP554', getSystemCode('Ensembl'))
 [1] "ENSG00000092009" "ENSG00000100448" "ENSG00000100739" "ENSG00000105329"
 [5] "ENSG00000113889" "ENSG00000130234" "ENSG00000130368" "ENSG00000135744"
 [9] "ENSG00000143839" "ENSG00000144891" "ENSG00000151623" "ENSG00000159640"
[13] "ENSG00000164867" "ENSG00000168398" "ENSG00000179142" "ENSG00000180772"
[17] "ENSG00000182220"

getXrefList does not work correctly for WP3620

getXrefList does not work for https://www.wikipathways.org/index.php/Pathway:WP3620
WP3620 has 148 metabolites, but getXrefList does not return those metabolite IDs.
In particular, this pathway contains many KNApSAcK IDs, but I got none of them.
If this issue is resolved, an exhaustive MSEA (Metabolite Set Enrichment Analysis) can be easily implemented.

I was aware of this issue, so I implemented a BioPAX parsing script for WikiPathways (the BioPAX of WP3620 has the KNApSAcK IDs).
https://github.com/afukushima/MSEAp/blob/master/R/read.wikipathways.R
But the BioPAX does not have non-primary IDs, so I hope this issue is resolved.

> getXrefList('WP3620', 'Ca')
[1] "13306-05-3" "480-20-6"   "5071-40-9"  "520-18-3"   "58124-18-8" "73692-50-9"
> getXrefList('WP3620', 'Ce')
 [1] "15401"       "15413"       "27843"       "28499"       "52047"       "CHEBI:15401" "CHEBI:15413" "CHEBI:1813" 
 [9] "CHEBI:27843" "CHEBI:28499" "CHEBI:4567"  "CHEBI:52047"
> getXrefList('WP3620', 'Ch')
[1] "HMDB0005801" "HMDB0029631" "HMDB0040314" "HMDB05801"   "HMDB29631"   "HMDB40314"  
> getXrefList('WP3620', 'Ck')
[1] "C00223" "C00974" "C05903" "C05905" "C06561" "C15567"
> getXrefList('WP3620', 'Cpc')
[1] "11954208" "122850"   "128861"   "21932272" "5280863"  "5280960"  "90657396"
> getXrefList('WP3620', 'Cks')
[1] ""
> getXrefList('WP3620', 'Wd')
[1] "Q27089427" "Q27123131" "Q393336"   "Q417606"   "Q956217"  
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rWikiPathways_1.0.0  BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] httr_1.3.1     compiler_3.5.0 R6_2.2.2       tools_3.5.0    curl_3.2       RJSONIO_1.3-0  caTools_1.17.1 bitops_1.0-6  
> 

getXrefList(...) method is broken

For a few months, I've struggled to get this method to work. First there were issues with getting gene set names, which later resolved themselves; now there are issues with getting the actual genes in the gene sets. This is strange because the issues started after the last commit.

The code I use is:

# Get metadata.
pw <- listPathways("Mus musculus")

# Compile pathways.
gs <- list()
for (i in pw$id)
{
  print(i)
  gs[[length(gs)+1]] <- getXrefList(i, "L")
  print(gs[[length(gs)]])
}

This code used to work for me, but now it produces incomplete returns like this (I only copied the first few outputs):

[1] "WP1"
[1] ""
[1] "WP10"
 [1] "11651"  "12702"  "14784"  "16186"  "16198"  "16199"  "16367"  "16451"  "16453"  "18708"  "19247"  "20416"  "20846" 
[14] "20848"  "20850"  "20851"  "26395"  "26396"  "26413"  "26417"  "269523" "384783" "54721"  "81601" 
[1] "WP103"
[1] ""
[1] "WP108"
[1] ""
[1] "WP113"
[1] ""
[1] "WP116"
[1] ""
[1] "WP1241"
[1] ""
[1] "WP1242"
[1] ""
[1] "WP1243"
[1] ""
[1] "WP1244"
[1] ""

I was wondering whether it has to do with the call to BridgeDb's datasources repo, as their datasources.tsv file has been moved from the path given in getXrefList(...)'s help page.

Please tell me if you need more information!
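
To make a report like this easier to diagnose, the loop can be restructured so that results are named by pathway ID and the empty ones are listed explicitly. The sketch below is an illustration, not the reporter's code: collect_xrefs() and is_empty_result() are hypothetical helpers, and the fetch function is passed in so the example runs offline with a stub. In practice it could be function(id) getXrefList(id, "L").

```r
# Hypothetical helper: call fetch() for each ID and keep results by name.
# A failed call is recorded as NA instead of aborting the whole loop.
collect_xrefs <- function(ids, fetch) {
  result <- lapply(ids, function(id) {
    tryCatch(fetch(id), error = function(e) NA_character_)
  })
  names(result) <- ids
  result
}

# TRUE when a result holds nothing useful: zero length, NA, or only "".
is_empty_result <- function(x) length(x) == 0 || all(is.na(x) | x == "")

# Stub standing in for the web service call, mimicking the output above.
stub_fetch <- function(id) if (id == "WP10") c("11651", "12702") else ""

gs <- collect_xrefs(c("WP1", "WP10"), stub_fetch)
empty_ids <- names(gs)[vapply(gs, is_empty_result, logical(1))]
print(empty_ids)
```

Listing the empty IDs separately makes it straightforward to check a few of them by hand on the WikiPathways site.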

migrate away from RJSONIO to rjson

The reason is that RJSONIO uses a header file from json.org that carries an MIT-no-evil license, so it is only available in Debian non-free. Once we have migrated, our R package can go into Debian. cc @tillea

How to extract edges from a GPML file?

I’m trying to figure out how to extract edges from a network using the GPML file. I’m using WP1589.gpml as an example. I think I need to extract the information from the <Interaction> tag. These tags have <Point> child tags, which I'm guessing will give me information about the edges.

Question 1 - See the screenshot. If you look at [[9]] and [[10]] in the list, they each have two <Point> tags that could represent an edge from the “From node” to the “To node”. But [[11]] in the list has 3 <Point> tags and one <Anchor> tag. Which would be the “From node” and which the “To node”? Can you clarify this?
Link to screenshot here: https://drive.google.com/file/d/1nl3GnqLSbnQR5asWIkPqgcJKbjCh7tsH/view?usp=sharing

Question 2 - Attached is a csv file of the nodes I extracted with the help of Martin Morgan’s code in Slack. I found some GraphIDs (for example “bf654”) which appear in the <Interaction> section but are not defined previously in the <DataNodes> section, so I don’t know whether each is a gene, a metabolite or something else. How do I get around this?
Link to csv file here: https://drive.google.com/file/d/1maGS4Im6jzkO2Tz1_4W9LkSTzpkp_N9T/view?usp=sharing
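
One possible reading, offered here as an assumption about GPML 2013a rather than a confirmed answer: the first and last <Point> of an <Interaction> carry the GraphRef of the source and target, intermediate <Point> elements are bend points, and an <Anchor> is an attachment point with its own GraphId that other interactions can reference. Under that reading, a GraphId missing from the DataNodes might belong to an Anchor or another non-DataNode element. The sketch below builds an edge table on that basis; extract_edges() is a hypothetical helper and the GPML snippet is made up.

```r
library(XML)

# Hypothetical helper: one row per <Interaction>, taking the first and last
# <Point> as the edge endpoints (an assumption, see the text above).
extract_edges <- function(gpml_text) {
  doc <- xmlParse(gpml_text, asText = TRUE)
  # local-name() sidesteps the default GPML namespace.
  interactions <- getNodeSet(doc, "//*[local-name()='Interaction']")
  rows <- lapply(interactions, function(node) {
    points <- getNodeSet(node, ".//*[local-name()='Point']")
    refs <- vapply(
      points,
      function(p) xmlGetAttr(p, "GraphRef", NA_character_),
      character(1)
    )
    if (length(refs) == 0) return(NULL)
    anchors <- getNodeSet(node, ".//*[local-name()='Anchor']")
    data.frame(
      source = refs[1], target = refs[length(refs)],
      n_points = length(refs), n_anchors = length(anchors),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up GPML with one three-point interaction and one anchor.
demo_gpml <- '<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="demo">
  <DataNode TextLabel="A" GraphId="n1" Type="GeneProduct"/>
  <DataNode TextLabel="B" GraphId="n2" Type="Metabolite"/>
  <Interaction GraphId="e1">
    <Graphics>
      <Point X="0" Y="0" GraphRef="n1"/>
      <Point X="5" Y="5"/>
      <Point X="10" Y="0" GraphRef="n2" ArrowHead="Arrow"/>
      <Anchor Position="0.5" GraphId="a1"/>
    </Graphics>
  </Interaction>
</Pathway>'
edges <- extract_edges(demo_gpml)
print(edges)
```

Endpoints that come back as NA are points without a GraphRef, that is, line ends not attached to any element.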

igraph compatible?

Hi, thank you for your work!
I was wondering if the data could be extracted in a format compatible with igraph.
I hope my question is not too naive.
Thank you!
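
As a rough illustration of what igraph needs: once nodes and edges are available as data frames (for example, extracted from a pathway's GPML), igraph::graph_from_data_frame() can build a graph from them. The tables below are made-up placeholders, not data from a real pathway, and this is a sketch rather than a feature of rWikiPathways.

```r
library(igraph)

# Made-up node and edge tables. The first column of `nodes` is used as the
# vertex name and must cover every ID that appears in `edges`.
nodes <- data.frame(
  id = c("n1", "n2", "n3"),
  label = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  source = c("n1", "n2"),
  target = c("n2", "n3"),
  stringsAsFactors = FALSE
)

# The first two columns of `edges` are read as the from/to vertex names.
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
print(g)
```

Any extra columns in either table are carried over as vertex or edge attributes, so labels and interaction types can travel with the graph.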
