Disclaimer: this package is still at an experimental level and under active development. You should only use it for testing, reporting bugs (which are to be expected), proposing changes to the code, requesting features, sending ideas… As long as this package is in “Experimental” mode, there might be bugs, and changes to the API are to be expected. Read the NEWS.md to be informed of the last changes.

neo4r

The goal of {neo4r} is to provide a modern and flexible Neo4J driver for R.

It’s modern in the sense that the results are returned as tibbles whenever possible, it relies on modern tools, and it is designed to work with pipes. Our goal is to provide a driver that can be easily integrated in a data analysis workflow, especially by providing an API working smoothly with other data analysis ({dplyr} or {purrr}) and graph packages ({igraph}, {ggraph}, {visNetwork}…).

It’s flexible in the sense that it is rather unopinionated regarding the way it returns the results, by trying to stay as close as possible to the way Neo4J returns data. That way, you have the control over the way you will compute the results. At the same time, the result is not too complex, so that the “heavy lifting” of data wrangling is not left to the user.

The connexion object is also an easy-to-control R6 object, allowing you to update and query information from the API.

Installation

You can install {neo4r} from GitHub with:

# install.packages("devtools")
devtools::install_github("neo4j-rstats/neo4r")

Create a connexion object

Start by creating a new connexion object with neo4j_api$new:

library(neo4r)
con <- neo4j_api$new(url = "http://localhost:7474", 
                     user = "plop", password = "pouetpouet")

This connexion object is designed to interact with the Neo4J API.

It comes with some methods to retrieve information from it :

# Test the endpoint: this fails (401) because the credentials are wrong
con$ping()
#> [1] 401

Being an R6 object, con is flexible in the sense that you can change url, user and password at any time:

con$reset_user("neo4j")
con$ping()
#> [1] 200
# Or with 
con$password <- "pouetpouet"

That means you can connect to another url at any time without having to create a new connexion object (with con$reset_url()).
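To illustrate why an R6 object can be updated in place, here is a minimal sketch using the {R6} package. ToyConnexion is a hypothetical toy class written for this example; it is not the neo4r implementation and never talks to a server.

```r
library(R6)

# Hypothetical toy class, for illustration only (not neo4r's neo4j_api)
ToyConnexion <- R6Class(
  "ToyConnexion",
  public = list(
    url = NULL,
    user = NULL,
    password = NULL,
    initialize = function(url, user, password) {
      self$url <- url
      self$user <- user
      self$password <- password
    },
    # Change the user on the existing object and return it invisibly
    reset_user = function(user) {
      self$user <- user
      invisible(self)
    }
  )
)

con_toy <- ToyConnexion$new("http://localhost:7474", "plop", "pouetpouet")
alias <- con_toy              # no copy is made: both names point to one object
con_toy$reset_user("neo4j")   # modifies the object in place
alias$user
#> [1] "neo4j"
```

Because R6 objects have reference semantics, every function holding the object sees the updated fields, which is what makes changing credentials or the url on an existing connexion possible.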

# Get Neo4J Version
con$get_version()
#> [1] "3.3.3"
# List constraints (if any)
con$get_constraints()
#> # A tibble: 3 x 3
#>   label      type       property_keys
#>   <chr>      <chr>      <chr>        
#> 1 Maintainer UNIQUENESS name         
#> 2 Author     UNIQUENESS name         
#> 3 Package    UNIQUENESS name
# Get a tibble of labels (if any)
con$get_labels()
#> # A tibble: 5 x 1
#>   labels    
#>   <chr>     
#> 1 album     
#> 2 Author    
#> 3 Maintainer
#> 4 artist    
#> 5 Package
# Get a tibble of relationships (if any)
con$get_relationships()
#> # A tibble: 2 x 1
#>   relationships
#>   <chr>        
#> 1 MAINTAINS    
#> 2 has_recorded
# Get schema 
con$get_schema()
#> # A tibble: 3 x 2
#>   label      property_keys
#>   <chr>      <chr>        
#> 1 Package    name         
#> 2 Author     name         
#> 3 Maintainer name

Call the API

You can either create a separate query or insert it inside the call_api function.

The call_api() function takes several arguments :

  • query : the cypher query
  • con : the connexion object
  • type : “rows” or “graph”: whether to return the results as a list of tibbles, or as a graph object (with $nodes and $relationships)
  • output : the output format (R or json)
  • include_stats : whether or not to include the stats about the call
  • meta : whether or not to include the meta arguments of the nodes when calling with “rows”

It will be possible to write queries and pipe them with {cyphersugar} once all the planned packages have been developed. {cyphersugar} will offer syntactic sugar on top of Cypher.

“rows” format

The user chooses whether or not to return a list of tibbles when calling the API. You get as many objects as specified in the RETURN cypher statement.

library(magrittr)

'MATCH (p:Package) RETURN p.name AS nom LIMIT 5' %>%
  call_api(con)
#> $nom
#> # A tibble: 5 x 1
#>   value      
#>   <chr>      
#> 1 A3         
#> 2 abbyyR     
#> 3 abc        
#> 4 ABCanalysis
#> 5 abc.data

By default, results are returned as an R list of tibbles. We think this is the most “truthful” way to implement the outputs regarding Neo4J calls.

For example, when you return two elements, you’ll get two results, in the form of two tibbles (nom and maintainer here):

'MATCH (p:Package) <-[:MAINTAINS]-(main:Maintainer) RETURN p.name AS nom, main.name AS maintainer LIMIT 5' %>%
  call_api(con)
#> $nom
#> # A tibble: 5 x 1
#>   value   
#>   <chr>   
#> 1 A3      
#> 2 A3      
#> 3 abbyyR  
#> 4 abbyyR  
#> 5 abc.data
#> 
#> $maintainer
#> # A tibble: 5 x 1
#>   value             
#>   <chr>             
#> 1 scott fortmann-roe
#> 2 scott fortmann-roe
#> 3 gaurav sood       
#> 4 gaurav sood       
#> 5 blum michael

The result is a two-element list, with each element named the way it has been specified in the Cypher query.
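If you prefer a single table, the list of one-column tibbles can be bound column-wise. The sketch below uses base R only and a hand-built list (res_rows, a stand-in with plain data frames) that mimics the shape printed above; it assumes every element has a single value column of the same length.

```r
# Hand-built stand-in for the "rows" result shown above (assumption:
# each element holds one `value` column, all of the same length)
res_rows <- list(
  nom = data.frame(value = c("A3", "abbyyR", "abc.data"),
                   stringsAsFactors = FALSE),
  maintainer = data.frame(value = c("scott fortmann-roe", "gaurav sood",
                                    "blum michael"),
                          stringsAsFactors = FALSE)
)

# Pull the `value` column out of each element and use the list names
# as column names
combined <- as.data.frame(
  lapply(res_rows, function(x) x$value),
  stringsAsFactors = FALSE
)
names(combined)
#> [1] "nom"        "maintainer"
```

Keeping the results separate by default leaves this choice to you: the binding only makes sense when the returned elements line up row by row.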

Results can also be returned in JSON:

'MATCH (p:Package) <-[:MAINTAINS]-(main:Maintainer) RETURN p.name AS nom, main.name AS maintainer LIMIT 5' %>%
  call_api(con, output = "json")
#> [
#>   [
#>     {
#>       "row": [
#>         ["A3"],
#>         ["scott fortmann-roe"]
#>       ],
#>       "meta": [
#>         {},
#>         {}
#>       ]
#>     },
#>     {
#>       "row": [
#>         ["A3"],
#>         ["scott fortmann-roe"]
#>       ],
#>       "meta": [
#>         {},
#>         {}
#>       ]
#>     },
#>     {
#>       "row": [
#>         ["abbyyR"],
#>         ["gaurav sood"]
#>       ],
#>       "meta": [
#>         {},
#>         {}
#>       ]
#>     },
#>     {
#>       "row": [
#>         ["abbyyR"],
#>         ["gaurav sood"]
#>       ],
#>       "meta": [
#>         {},
#>         {}
#>       ]
#>     },
#>     {
#>       "row": [
#>         ["abc.data"],
#>         ["blum michael"]
#>       ],
#>       "meta": [
#>         {},
#>         {}
#>       ]
#>     }
#>   ]
#> ]
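The JSON output can be handed to any JSON parser. As a minimal sketch, assuming the JSON is available as a character string with the shape printed above, {jsonlite} can turn it back into a data frame. The json_txt string is a shortened, hand-written copy of that output, and the column names nom and maintainer are taken from the query.

```r
library(jsonlite)

# Shortened, hand-written copy of the JSON printed above
json_txt <- '[[
  {"row": [["A3"], ["scott fortmann-roe"]], "meta": [{}, {}]},
  {"row": [["abbyyR"], ["gaurav sood"]], "meta": [{}, {}]}
]]'

# Keep the nested list structure instead of letting jsonlite simplify it
parsed <- fromJSON(json_txt, simplifyVector = FALSE)
records <- parsed[[1]]

# Each record holds one "row" entry per element of the RETURN statement
rows <- lapply(records, function(rec) {
  data.frame(
    nom = rec$row[[1]][[1]],
    maintainer = rec$row[[2]][[1]],
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, rows)
result$nom
#> [1] "A3"     "abbyyR"
```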

If you turn the type argument to "graph", you’ll get a graph result:

'MATCH p=()-[r:MAINTAINS]->() RETURN p LIMIT 5' %>%
  call_api(con, type = "graph")
#> $nodes
#> # A tibble: 6 x 3
#>   id    label     properties
#>   <chr> <list>    <list>    
#> 1 0     <chr [1]> <list [5]>
#> 2 1     <chr [1]> <list [1]>
#> 3 2     <chr [1]> <list [5]>
#> 4 3     <chr [1]> <list [1]>
#> 5 4     <chr [1]> <list [5]>
#> 6 5     <chr [1]> <list [1]>
#> 
#> $relationships
#> # A tibble: 5 x 5
#>   id    type      startNode endNode properties
#>   <chr> <chr>     <chr>     <chr>   <list>    
#> 1 1320  MAINTAINS 1         0       <list [0]>
#> 2 0     MAINTAINS 1         0       <list [0]>
#> 3 1321  MAINTAINS 3         2       <list [0]>
#> 4 1     MAINTAINS 3         2       <list [0]>
#> 5 1322  MAINTAINS 5         4       <list [0]>
#> 
#> attr(,"class")
#> [1] "neo"  "list"

The result is returned with one node or relationship per row.

Due to the specific data format of Neo4J, there can be more than one label and property per node and relationship. That’s why the result is returned, by design, as a list-dataframe.

We have designed several functions to unnest the output :

  • unnest_nodes, that can unnest a node dataframe :

res <- 'MATCH p=()-[r:MAINTAINS]->() RETURN p LIMIT 5' %>%
  call_api(con, type = "graph")
unnest_nodes(res$nodes)
#> # A tibble: 6 x 7
#>   id    label      date       license            name   title      version
#>   <chr> <chr>      <chr>      <chr>              <chr>  <chr>      <chr>  
#> 1 0     Package    2015-08-15 GPL (>= 2)         A3     "Accurate… 1.0.0  
#> 2 1     Maintainer <NA>       <NA>               scott… <NA>       <NA>   
#> 3 2     Package    NA         MIT + file LICENSE abbyyR Access to… 0.5.1  
#> 4 3     Maintainer <NA>       <NA>               gaura… <NA>       <NA>   
#> 5 4     Package    2015-05-04 GPL (>= 3)         abc    Tools for… 2.1    
#> 6 5     Maintainer <NA>       <NA>               blum … <NA>       <NA>

Please note that this function returns NA for the properties that are missing from a node. For example, we have no “license” information for the Maintainer nodes here (which makes sense).
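To see where these NA values come from, here is a minimal base R sketch of flattening a list-column of properties. It is an illustration of the idea, not the code of unnest_nodes, and the nodes data frame is hand-built to mimic two rows of the output above.

```r
# Hand-built nodes table with a list-column of properties
nodes <- data.frame(id = c("0", "1"), stringsAsFactors = FALSE)
nodes$properties <- list(
  list(name = "A3", license = "GPL (>= 2)", version = "1.0.0"),
  list(name = "scott fortmann-roe")
)

# Every property key seen on at least one node becomes a column
all_keys <- unique(unlist(lapply(nodes$properties, names)))

# For each key, take the value when the node has it, NA otherwise
flat <- lapply(all_keys, function(key) {
  vapply(nodes$properties, function(props) {
    if (is.null(props[[key]])) NA_character_ else as.character(props[[key]])
  }, character(1))
})
names(flat) <- all_keys

unnested <- cbind(nodes["id"], as.data.frame(flat, stringsAsFactors = FALSE))
unnested$license
#> [1] "GPL (>= 2)" NA
```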

In the long run, and this is not {neo4r} specific but Neo4J related, a good practice is to have a “name” property on each node, so that this column is always filled.

Also, it is possible to unnest either the properties or the labels :

res$nodes %>%
  unnest_nodes(what = "properties")
#> # A tibble: 6 x 7
#>   id    label     date       license            name    title      version
#>   <chr> <list>    <chr>      <chr>              <chr>   <chr>      <chr>  
#> 1 0     <chr [1]> 2015-08-15 GPL (>= 2)         A3      "Accurate… 1.0.0  
#> 2 1     <chr [1]> <NA>       <NA>               scott … <NA>       <NA>   
#> 3 2     <chr [1]> NA         MIT + file LICENSE abbyyR  Access to… 0.5.1  
#> 4 3     <chr [1]> <NA>       <NA>               gaurav… <NA>       <NA>   
#> 5 4     <chr [1]> 2015-05-04 GPL (>= 3)         abc     Tools for… 2.1    
#> 6 5     <chr [1]> <NA>       <NA>               blum m… <NA>       <NA>
res$nodes %>%
  unnest_nodes(what = "label")
#> # A tibble: 6 x 3
#>   id    properties label     
#>   <chr> <list>     <chr>     
#> 1 0     <list [5]> Package   
#> 2 1     <list [1]> Maintainer
#> 3 2     <list [5]> Package   
#> 4 3     <list [1]> Maintainer
#> 5 4     <list [5]> Package   
#> 6 5     <list [1]> Maintainer

  • unnest_relationships

There is only one nested column in the relationship table, thus the function is quite straightforward :

unnest_relationships(res$relationships)
#> # A tibble: 5 x 5
#>   id    type      startNode endNode properties
#>   <chr> <chr>     <chr>     <chr>   <chr>     
#> 1 1320  MAINTAINS 1         0       <NA>      
#> 2 0     MAINTAINS 1         0       <NA>      
#> 3 1321  MAINTAINS 3         2       <NA>      
#> 4 1     MAINTAINS 3         2       <NA>      
#> 5 1322  MAINTAINS 5         4       <NA>

  • unnest_graph

This function takes a graph result, and applies both unnest_nodes and unnest_relationships to it.

unnest_graph(res)
#> $nodes
#> # A tibble: 6 x 7
#>   id    label      date       license            name   title      version
#>   <chr> <chr>      <chr>      <chr>              <chr>  <chr>      <chr>  
#> 1 0     Package    2015-08-15 GPL (>= 2)         A3     "Accurate… 1.0.0  
#> 2 1     Maintainer <NA>       <NA>               scott… <NA>       <NA>   
#> 3 2     Package    NA         MIT + file LICENSE abbyyR Access to… 0.5.1  
#> 4 3     Maintainer <NA>       <NA>               gaura… <NA>       <NA>   
#> 5 4     Package    2015-05-04 GPL (>= 3)         abc    Tools for… 2.1    
#> 6 5     Maintainer <NA>       <NA>               blum … <NA>       <NA>   
#> 
#> $relationships
#> # A tibble: 5 x 5
#>   id    type      startNode endNode properties
#>   <chr> <chr>     <chr>     <chr>   <chr>     
#> 1 1320  MAINTAINS 1         0       <NA>      
#> 2 0     MAINTAINS 1         0       <NA>      
#> 3 1321  MAINTAINS 3         2       <NA>      
#> 4 1     MAINTAINS 3         2       <NA>      
#> 5 1322  MAINTAINS 5         4       <NA>      
#> 
#> attr(,"class")
#> [1] "neo"  "list"

Convert for common graph packages

Unless otherwise specified, the function carries out an unnest_graph on the result before transforming it into a graph object.

{igraph}

In order to be converted into a graph object,

  • nodes need an id, and a name. By default, the node name is assumed to be found in the “name” property returned by the graph, but specifying any other column is allowed. The “label” column from Neo4J is renamed “group”.

  • relationships need a start and an end, i.e. startNode and endNode in the Neo4J results.
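To make these two requirements concrete, the sketch below builds an {igraph} object by hand from a small nodes table and a relationships table. It is an illustration under stated assumptions, not what convert_to does internally: the two data frames are hand-built to mimic an unnested graph result, and node ids are translated to names before calling igraph::graph_from_data_frame().

```r
library(igraph)

# Hand-built tables mimicking an unnested graph result
nodes <- data.frame(
  id = c("0", "1", "2"),
  label = c("Package", "Maintainer", "Package"),
  name = c("A3", "scott fortmann-roe", "abbyyR"),
  stringsAsFactors = FALSE
)
relationships <- data.frame(
  startNode = c("1", "1"),
  endNode = c("0", "2"),
  type = c("MAINTAINS", "MAINTAINS"),
  stringsAsFactors = FALSE
)

# Translate Neo4J ids into node names for the edge list
id_to_name <- setNames(nodes$name, nodes$id)
edges <- data.frame(
  from = unname(id_to_name[relationships$startNode]),
  to = unname(id_to_name[relationships$endNode]),
  type = relationships$type,
  stringsAsFactors = FALSE
)

# The first column of `vertices` is used as the vertex name;
# the Neo4J label is kept as a "group" attribute
vertices <- data.frame(name = nodes$name, group = nodes$label,
                       stringsAsFactors = FALSE)

g <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
vcount(g)
#> [1] 3
```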

res %>%
  convert_to("igraph")
#> IGRAPH adc1501 DN-- 6 5 -- 
#> + attr: name (v/c), group (v/c), date (v/c), license (v/c), title
#> | (v/c), version (v/c), type (e/c), id (e/c), properties (e/x)
#> + edges from adc1501 (vertex names):
#> [1] scott fortmann-roe->A3     scott fortmann-roe->A3    
#> [3] gaurav sood       ->abbyyR gaurav sood       ->abbyyR
#> [5] blum michael      ->abc

Which means that you can plot the result directly :

'MATCH p=()-[r:MAINTAINS]->() RETURN p LIMIT 5' %>%
  call_api(con, type = "graph") %>% 
  convert_to("igraph") %>%
  plot()

This can also be used with {ggraph} :

library(ggraph)
#> Loading required package: ggplot2
'MATCH p=()-[r:MAINTAINS]->() RETURN p LIMIT 10' %>%
  call_api(con, type = "graph") %>% 
  convert_to("igraph") %>%
  ggraph() + 
  geom_node_label(aes(label = name, color = group)) +
  geom_edge_link() + 
  theme_graph()
#> Using `nicely` as default layout

{visNetwork}

network <- 'MATCH p=()-[r:MAINTAINS]->() RETURN p LIMIT 10' %>%
  call_api(con, type = "graph") %>% 
  convert_to("visNetwork")
visNetwork::visNetwork(network$nodes, network$relationships)

Sending data to the API

You can simply send queries as we have just seen, by writing the cypher query and calling the API.

Sending an R data.frame

  • as_nodes turns a dataframe into a series of nodes :

// Coming soon

  • as_relationships

// Coming soon

Reading and sending a cypher file :

  • read_cypher reads a cypher file and returns a tibble of all the calls :

read_cypher("data-raw/create.cypher")
#> # A tibble: 53 x 1
#>    cypher                                                                 
#>    <chr>                                                                  
#>  1 CREATE CONSTRAINT ON (p:Band) ASSERT p.name IS UNIQUE;                 
#>  2 CREATE CONSTRAINT ON (p:City) ASSERT p.name IS UNIQUE;                 
#>  3 CREATE CONSTRAINT ON (p:record) ASSERT p.name IS UNIQUE;               
#>  4 CREATE CONSTRAINT ON (p:artist) ASSERT p.name IS UNIQUE;               
#>  5 CREATE (ancient:Band {name: 'Ancient' ,formed: 1992}), (acturus:Band {…
#>  6 CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;               
#>  7 MATCH (band:Band) WHERE band.formed < 1995 RETURN *;                   
#>  8 MATCH (b:Band) WHERE b.formed = 1990 RETURN *;                         
#>  9 MATCH (b:Band {formed: 1990}) RETURN *;                                
#> 10 MATCH (b:Band) WHERE b.formed < 1995 RETURN *;                         
#> # ... with 43 more rows
  • send_cypher reads a cypher file, and sends it to the API. By default, the stats are returned.

send_cypher("data-raw/constraints.cypher", con)
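The idea of turning a cypher file into one call per row can be sketched in base R. The split_cypher helper below is hypothetical and written for this example only; it is not the code of read_cypher. It naively splits on semicolons, so it would break on a semicolon inside a string literal.

```r
# Write a small cypher file to a temporary location
cypher_file <- tempfile(fileext = ".cypher")
writeLines(c(
  "CREATE CONSTRAINT ON (p:Band) ASSERT p.name IS UNIQUE;",
  "MATCH (b:Band)",
  "WHERE b.formed < 1995 RETURN *;"
), cypher_file)

# Hypothetical helper: one row per semicolon-terminated statement
split_cypher <- function(path) {
  # Join the lines so that multi-line statements stay together
  text <- paste(readLines(path), collapse = " ")
  statements <- trimws(strsplit(text, ";", fixed = TRUE)[[1]])
  statements <- statements[nzchar(statements)]
  data.frame(cypher = paste0(statements, ";"), stringsAsFactors = FALSE)
}

calls <- split_cypher(cypher_file)
nrow(calls)
#> [1] 2
```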

Sending csv dataframe to Neo4J

The load_csv function sends a csv from a url to Neo4J.

The args are :

  • on_load : the code to execute on load
  • con : the connexion object
  • url : the url of the csv to send
  • header : whether or not the csv has a header
  • periodic_commit : the volume for PERIODIC COMMIT
  • as : the AS argument for LOAD CSV
  • format : the format of the result
  • include_stats : whether or not to include the stats
  • meta : whether or not to return the meta information

# Create the constraints
call_api("CREATE CONSTRAINT ON (a:artist) ASSERT a.name IS UNIQUE;", con)
call_api("CREATE CONSTRAINT ON (al:album) ASSERT al.name IS UNIQUE;", con)
# List constraints (if any)
con$get_constraints()
#> # A tibble: 3 x 3
#>   label      type       property_keys
#>   <chr>      <chr>      <chr>        
#> 1 Maintainer UNIQUENESS name         
#> 2 Author     UNIQUENESS name         
#> 3 Package    UNIQUENESS name
# Create the query that will create the nodes and relationships
on_load_query <- 'MERGE (a:artist { name: csvLine.artist})
MERGE (al:album {name: csvLine.album_name})
MERGE (a) -[:has_recorded] -> (al)  
RETURN a AS artists, al AS albums;'
# Send the csv 
load_csv(url = "https://raw.githubusercontent.com/ThinkR-open/datasets/master/tracks.csv", 
         con = con, header = TRUE, periodic_commit = 50, 
         as = "csvLine", on_load = on_load_query)
#> $artists
#> # A tibble: 2,367 x 1
#>    name           
#>    <chr>          
#>  1 Eminem         
#>  2 Eurythmics     
#>  3 Queen          
#>  4 The Police     
#>  5 A$AP Rocky     
#>  6 Tears For Fears
#>  7 Foals          
#>  8 Bag Raiders    
#>  9 Bright Eyes    
#> 10 Bob Dylan      
#> # ... with 2,357 more rows
#> 
#> $albums
#> # A tibble: 2,367 x 1
#>    name                           
#>    <chr>                          
#>  1 Curtain Call (Deluxe)          
#>  2 Sweet Dreams (Are Made Of This)
#>  3 The Game (2011 Remaster)       
#>  4 Synchronicity (Remastered)     
#>  5 LONG.LIVE.A$AP (Deluxe Version)
#>  6 Songs From The Big Chair       
#>  7 Holy Fire                      
#>  8 Bag Raiders (Deluxe)           
#>  9 I'm Wide Awake, It's Morning   
#> 10 Highway 61 Revisited           
#> # ... with 2,357 more rows
#> 
#> $stats
#> # A tibble: 12 x 2
#>    type                  value
#>    <chr>                 <dbl>
#>  1 contains_updates         0.
#>  2 nodes_created            0.
#>  3 nodes_deleted            0.
#>  4 properties_set           0.
#>  5 relationships_created    0.
#>  6 relationship_deleted     0.
#>  7 labels_added             0.
#>  8 labels_removed           0.
#>  9 indexes_added            0.
#> 10 indexes_removed          0.
#> 11 constraints_added        0.
#> 12 constraints_removed      0.
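How the header, periodic_commit, as and on_load arguments relate to a Cypher LOAD CSV statement can be sketched as follows. The build_load_csv_query helper is hypothetical and only assembles a string in the Neo4J 3.x syntax; it is an assumption about the kind of query involved, not the code of load_csv, and the url is a placeholder.

```r
# Hypothetical helper: assemble a LOAD CSV statement from its parts
build_load_csv_query <- function(url, on_load, as = "csvLine",
                                 header = TRUE, periodic_commit = 1000) {
  paste0(
    "USING PERIODIC COMMIT ", periodic_commit, " ",
    "LOAD CSV ", if (header) "WITH HEADERS " else "",
    "FROM '", url, "' AS ", as, " ",
    on_load
  )
}

query <- build_load_csv_query(
  url = "https://example.com/tracks.csv",   # placeholder url
  on_load = "MERGE (a:artist {name: csvLine.artist}) RETURN a;",
  as = "csvLine", header = TRUE, periodic_commit = 50
)
cat(query)
```

The name given in as is the variable that the on_load code uses to refer to each csv line (csvLine.artist in the example above).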

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
