cite-network's Introduction

Overview

This repository contains several tools for citation network construction and analysis, originally developed for the project described in bibliometrics.pdf. The folders contain a series of Python 3 scripts, intended to be used sequentially as follows:

[Figure: Flow diagram for data sources, files, and scripts]

  • get_dois: Starting with an XML file exported from EndNote, extract a list of DOIs and a search string that can be copied and pasted directly into Scopus's advanced search box.

  • In Scopus, manually retrieve "generation 1." (See the project description for an explanation of this term.)

  • scrape: Starting with the CSV file for generation 1, retrieve the desired metadata.

    • scrape.get_meta_by_doi, which builds the query for the Scopus API, assumes that an API key has been defined as MY_API_KEY in api_key.py. A new Scopus API key can be generated by registering for free on the Scopus API page.
    • My project required retrieving metadata for roughly 30,000-50,000 articles.
    • Because the Scopus API used here has a cap of roughly 2,000 articles per week, I contacted Scopus to arrange a higher cap. Negotiating the increase took a few weeks.
    • Even the raised cap was too low to retrieve all of the required metadata in one run. The module batch.py was written to break the retrieval list into manageable chunks.
    • run_scrape.py works through the metadata retrieval process.
  • build_net: Using the metadata retrieved from Scopus, build citation and coauthor networks. Each of the resulting graphml files contains a single connected network.

    • Installing graph_tool is nontrivial. However, especially if compiled with the --enable-openmp flag, it is significantly faster than any of the other major Python network analysis packages.
  • analyze_net: Using the graphml files and two "comparison networks," conduct the actual network analysis.

    • The "comparison networks" are citation networks taken from arXiv, covering papers from January 1993 to April 2003. They can be found here and here.
  • ida.R: In my opinion, Python is better for manipulating complex data structures, but R has better tools for generating publication-quality tables and plots, and a nicer interactive IDE. This R script generates those tables and plots from the graphml files produced by analyze_net.

  • During review, an ad hoc text analysis of the core paper abstracts was added. This analysis and the supplement text are found in text analysis.
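
To illustrate the get_dois step, here is a minimal sketch of pulling DOIs out of an EndNote XML export and joining them into a search string. It is not the repository's code. It assumes the DOIs sit in `electronic-resource-num` elements, which is where EndNote exports usually put them, and that Scopus advanced search accepts `DOI(...)` terms joined with `OR`. The function names `extract_dois` and `scopus_search_string` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Loose DOI pattern: "10.", a registrant code, a slash, then a suffix.
# It may pick up trailing punctuation, so real exports can need extra cleanup.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/\S+')


def extract_dois(xml_text):
    """Return the unique DOIs found in an EndNote XML export, in document order.

    Assumption: each DOI is stored in an <electronic-resource-num> element,
    possibly wrapped in nested <style> tags.
    """
    root = ET.fromstring(xml_text)
    dois = []
    for node in root.iter('electronic-resource-num'):
        # itertext() gathers the text of any nested <style> elements too.
        text = ''.join(node.itertext()).strip()
        match = DOI_PATTERN.search(text)
        if match and match.group(0) not in dois:
            dois.append(match.group(0))
    return dois


def scopus_search_string(dois):
    """Join DOIs into one string of the form DOI(a) OR DOI(b) OR ..."""
    return ' OR '.join('DOI({})'.format(doi) for doi in dois)
```

For a long DOI list, the resulting string may have to be split into several searches if it exceeds whatever length the search box accepts.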

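The chunking idea behind batch.py can be sketched as follows. This is a hypothetical illustration, not the module's actual code: the function name `chunk_list` and the chunk size are assumptions. A retrieval list is split into fixed-size pieces so that each run stays under the weekly API cap.

```python
def chunk_list(items, size):
    """Yield successive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError('size must be a positive integer')
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == '__main__':
    # Hypothetical retrieval list of DOIs; the real list would come from
    # the generation 1 CSV file.
    retrieval_list = ['10.1000/{}'.format(i) for i in range(5)]
    for number, chunk in enumerate(chunk_list(retrieval_list, 2)):
        print(number, chunk)
```

Saving each chunk's results to disk before starting the next one means a run interrupted by the cap can resume from the last finished chunk.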