DBLP-graph-database-Neo4j

Neo4j project based on DBLP graph database. Based on an UPC assignment.

TOOLS REQUIRED

DATA There are two ways to obtain the data:

  • generate it by following the "DOWNLOAD AND CONVERT DBLP DATABASE" procedure, or
  • use the files in data, which are equivalent to the output of the first option (scale factor=6000)

DOWNLOAD AND CONVERT DBLP DATABASE The DBLP dump has to be downloaded from https://dblp.uni-trier.de/xml/ in .xml format. Both dblp.xml.gz and dblp.dtd are needed, and both are to be saved in ~/.../data. The XML file can then be converted to CSV following the instructions at https://github.com/ThomHurks/dblp-to-csv, generating Neo4j-compatible CSV output:

  1. obtain a local copy of XMLToCSV.py and save it in ~/.../data
  2. from the command line, launch the following command:
     python XMLToCSV.py --annotate --neo4j data/dblp.xml data/dblp.dtd dblp.csv --relations author:authored_by journal:published_in
     The resulting data should be saved in the Raw directory.
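
The command above reads data/dblp.xml, while the download is the compressed dblp.xml.gz, so a decompression step is implied. A minimal sketch of that step using only the Python standard library follows; the function name `decompress_gz` is hypothetical and not part of this repository or of dblp-to-csv.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src: str, dst: str) -> Path:
    """Decompress the gzip file at src into dst and return the output path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out = Path(dst)
    # Create the target directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    # Copy as a stream so a large dump is never held fully in memory.
    with gzip.open(src, "rb") as f_in, open(out, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return out
```

Assuming the paths used in the command above, it could be called as `decompress_gz("data/dblp.xml.gz", "data/dblp.xml")` before running XMLToCSV.py.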

GENERATE MISSING DATA The CSV files do not contain all the necessary data. The preprocessing can be done through the functions provided in Cleaning and Preparing.

Task A Create the connection to the database and load the generated nodes and edges into Neo4j. The schema is represented in graph schemas/modelA1.

Task A.3 Update the data to the new schema. The schema is represented in graph schemas/modelA3.

Task B Cypher queries:

  • Find the top 3 most cited papers of each conference.
  • For each conference, find its community, i.e., those authors that have published papers in that conference in at least 4 different editions.
  • Find the impact factors of the journals in your graph (see https://en.wikipedia.org/wiki/Impact_factor for the definition of the impact factor).
  • Find the h-indexes of the authors in your graph (see https://en.wikipedia.org/wiki/H-index for a definition of the h-index metric).
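
The two metrics in the last two queries can be checked against small hand-made inputs before writing the Cypher. The sketch below follows the definitions on the linked Wikipedia pages; it is plain Python rather than the project's Cypher, and the function names and input shapes are assumptions made for illustration.

```python
from typing import Iterable


def h_index(citation_counts: Iterable[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            # Counts are sorted in descending order, so no later rank can qualify.
            break
    return h


def impact_factor(citations_in_year: int, papers_prev_two_years: int) -> float:
    """Impact factor of a journal for year y.

    citations_in_year: citations received in year y by items the journal
        published in years y-1 and y-2.
    papers_prev_two_years: number of items the journal published in
        years y-1 and y-2.
    """
    if papers_prev_two_years == 0:
        # Assumption: report 0.0 when there is nothing to cite.
        return 0.0
    return citations_in_year / papers_prev_two_years
```

For example, an author whose papers have 10, 8, 5, 4 and 3 citations has `h_index([10, 8, 5, 4, 3]) == 4`, and a journal with 30 such citations over 20 papers has `impact_factor(30, 20) == 1.5`.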

Task C Requires the Neo4j Graph Data Science library (https://neo4j.com/product/graph-data-science/). Application of:

  • Louvain community detection between authors, based on their participation in the same conference, and
  • similarity to a given article (identified by its title), based on the topics it covers
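
To illustrate the second point, topic-based similarity can be sketched with the Jaccard index over sets of topics. This is a plain-Python illustration, not a call into the Graph Data Science library; whether the project uses Jaccard is an assumption, and the names `jaccard`, `most_similar` and the `topics_by_title` mapping are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        # Assumption: two articles without topics are treated as dissimilar.
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(
    title: str, topics_by_title: Dict[str, Set[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank all other articles by topic overlap with the article named title."""
    target = topics_by_title[title]
    scores = [
        (other, jaccard(target, topics))
        for other, topics in topics_by_title.items()
        if other != title
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]
```

With `{"A": {"x", "y"}, "B": {"x", "y", "z"}, "C": {"q"}}`, the article most similar to "A" is "B", since they share two of three distinct topics.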

Task D

  • The research database community is defined through the following keywords: data management, indexing, data modeling, big data, data processing, data storage and data querying.
  • Find the conferences and journals related to the database community. If 90% of the papers published in a conference/journal contain one of the keywords of the database community, we consider that conference/journal as related to that community.
  • Identify the top 50 papers of these conferences/journals, ranked by the PageRank computed over the papers of the same community (papers in the conferences/journals of the database community).
  • Identify "gurus", i.e., authors who have written at least two papers among the top-100 identified.
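
The 90% rule and the guru rule above can be sketched in plain Python as follows. The PageRank step is left out, the input shapes (dicts with `venue` and `text` keys, and lists of author names) are assumptions, and the function names are hypothetical rather than taken from the repository.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set

KEYWORDS = (
    "data management", "indexing", "data modeling", "big data",
    "data processing", "data storage", "data querying",
)


def related_venues(
    papers: Iterable[Dict[str, str]],
    keywords: Sequence[str] = KEYWORDS,
    threshold: float = 0.9,
) -> Set[str]:
    """Venues where at least `threshold` of the papers mention a keyword."""
    total: Counter = Counter()
    matching: Counter = Counter()
    for paper in papers:
        total[paper["venue"]] += 1
        text = paper["text"].lower()
        # Simple substring match; the project may match keywords differently.
        if any(keyword in text for keyword in keywords):
            matching[paper["venue"]] += 1
    return {venue for venue, n in total.items() if matching[venue] / n >= threshold}


def gurus(top_papers_authors: Iterable[List[str]], min_papers: int = 2) -> Set[str]:
    """Authors appearing on at least `min_papers` of the given top papers."""
    counts: Counter = Counter()
    for authors in top_papers_authors:
        # Count each author at most once per paper.
        counts.update(set(authors))
    return {author for author, n in counts.items() if n >= min_papers}
```

Here `related_venues` expects every paper of every venue, and `gurus` expects one author list per paper in the top-ranked set.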
