Graphs without Ontologies

Using Neo4j to Explore Nascent Research Networks

A presentation by Clifford Anderson, Suellen Stringer-Hye and Ed Warga to the 2015 VIVO Conference in Cambridge, MA.

Introduction

This repository contains the code and data we used for our presentation at the 2015 VIVO Conference. This readme document describes how to use the code to extract a dataset from an OAI-PMH compliant institutional repository, ingest the data into Neo4j, and analyze patterns using the Cypher query language.

Abstract

Graphs are the fundamental concept behind both linked data and the semantic web. Research networks are, like all social networks, essentially graph structures. We argue that exploring research networks simply as graphs, without the complexities of the semantic web stack, provides an easy way to start visualizing and analyzing them.

Neo4j is an open source graph database that makes it straightforward to represent data as vertices and edges. Neo4j provides a simple, declarative query language called Cypher, allowing users to readily explore and ask questions of large graph datasets.

Our talk presents graph theory as distinct from both linked data and the semantic web. We provide a quick introduction to Neo4j and the Cypher query language. Using those tools, we then visualize and analyze the data in a prominent institutional repository as a scholarly network. We show how to use graph databases as a kind of scratch pad to assist with representing emergent research networks.

Using the Code

The following instructions illustrate how to extract data from an OAI-PMH-compliant institutional repository and load that data into Neo4j. For general information about OAI-PMH, see the Open Archives Initiative website.

Prerequisites

  • BaseX: BaseX is an open source XML database. We use XQuery and BaseX to harvest metadata and to output CSVs in the required format. You may download BaseX from the BaseX website.
  • Neo4j: We also presume you have already installed Neo4j, an open source graph database, on your system. If not, download a copy of Neo4j before proceeding.

Fork and clone the graphs-without-ontologies repository. The XQuery directory contains the code to harvest metadata from an OAI-PMH-compliant repository and transform it into CSV documents, which can then be loaded into Neo4j using Cypher. The get-OAI-data.xquery file harvests records and adds them to your BaseX database.

First, create a new, empty BaseX database called "OAI". Run the get-OAI-data script, adjusting the base URL, metadata prefix, and set-spec variables as necessary.
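For readers unfamiliar with OAI-PMH, the following Python sketch shows how a base URL, metadata prefix, and set spec typically combine into a ListRecords request, and how a resumption token signals that more pages remain. It is not the repository's XQuery code; the function names and the example.org URL are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces
        # metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token in a response, or None if there is none."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # example.org is a placeholder; substitute your repository's base URL.
    print(list_records_url("https://example.org/oai/request", "oai_dc", "com_123"))
```

A harvester would request the first URL, store the response, and keep requesting with each returned token until next_token yields None.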

NB: The db:add function works on Mac OS X, but does not work on some Windows systems. If you experience problems on Windows, replace the db:add function with file:append-text, save your results to the file system, and then import the XML documents into your database from the file system.

Next, run each of the other scripts in the XQuery directory. These will output CSV documents you can use to load the data into Neo4j. Replace GitHub with the path to the forked repository on your machine. Use the Cypher code in the GraphData/readme.md to load the data into Neo4j.
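To make the transformation step concrete, here is a minimal Python sketch of flattening harvested Dublin Core records into CSV rows, one row per record and creator pair. It is an illustration only, not the repository's XQuery scripts: the column names (identifier, title, creator) and function names are assumptions, and the CSV layout the scripts actually produce may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespaces for OAI-PMH 2.0 responses and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def creator_rows(response_xml):
    """Return (identifier, title, creator) tuples, one per creator."""
    rows = []
    root = ET.fromstring(response_xml)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        for creator in record.iterfind(".//dc:creator", NS):
            rows.append((identifier, title, (creator.text or "").strip()))
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line (hypothetical columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["identifier", "title", "creator"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Emitting one row per creator keeps the file easy to load as a set of edges: each row links a record to one of its authors.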

Once you have the data in Neo4j, you can run the Cypher queries in the Cypher/readme.md to explore relations between the data.
