
citegraph's Introduction

Citegraph

Citegraph generates easily readable citation graphs. It uses the contents of a BibTeX file to determine which papers you're interested in.

Citegraph uses the Semantic Scholar API to fetch the references of articles. The API is rate-limited and requests are very slow, so request results are cached locally, and the exploration algorithm is designed to make every request count. To do this, it computes a degree of interest (DOI) for each known paper and fetches only the papers expected to improve the graph the most. See here for an explanation of the DOI calculation.
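The general idea of spending requests on the most interesting papers first can be illustrated as a best-first search over a priority queue. This is a minimal sketch, not citegraph's implementation. The toy REFERENCES dictionary stands in for the Semantic Scholar API, and lru_cache stands in for the local request cache. The fixed doi function is a hypothetical placeholder: it does not reproduce citegraph's actual DOI calculation.

```python
import heapq
from functools import lru_cache
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy data standing in for reference lists fetched from the API.
REFERENCES: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}


@lru_cache(maxsize=None)
def fetch_references(paper_id: str) -> Tuple[str, ...]:
    """Pretend API call; lru_cache plays the role of the local result cache."""
    return tuple(REFERENCES.get(paper_id, ()))


def explore(root: str, doi: Callable[[str], float], max_size: int) -> List[str]:
    """Best-first exploration: always expand the known paper with the highest DOI."""
    visited: List[str] = []
    seen: Set[str] = {root}
    # heapq is a min-heap, so store the negated DOI to pop the most interesting paper first.
    frontier = [(-doi(root), root)]
    while frontier and len(visited) < max_size:
        _, paper = heapq.heappop(frontier)
        visited.append(paper)
        for ref in fetch_references(paper):
            if ref not in seen:
                seen.add(ref)
                heapq.heappush(frontier, (-doi(ref), ref))
    return visited
```

For example, with scores = {"root": 0.0, "a": 1.0, "b": 2.0, "c": 0.5, "d": 3.0}, the call explore("root", scores.__getitem__, 3) visits root, then b, then d. It stops at the size limit without ever fetching a or c.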

Installation

  • Make sure you have Python 3.6+
  • Also make sure Graphviz is installed and on your PATH
  • Download or clone this repo
$ git clone https://github.com/oowekyala/citegraph.git && cd citegraph
  • Make sure you have all the required Python packages:
$ python3 -m pip install -r DEPENDENCIES
  • Add the bin directory to your PATH, or just use bin/citegraph
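The first two prerequisites can be checked with a few lines of Python. This is a hypothetical helper, not part of citegraph. It assumes that Graphviz's dot executable is the program that needs to be on the PATH.

```python
import shutil
import sys
from typing import List


def check_prerequisites() -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required, found %d.%d" % sys.version_info[:2])
    # Graphviz installs its layout programs (dot, neato, ...) as executables.
    if shutil.which("dot") is None:
        problems.append("Graphviz 'dot' executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("error:", problem)
```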

Usage

Find the ID of a paper you're interested in on semanticscholar.org. For an example, see the highlighted section of this picture:

Paper ID example

Remove the spaces, then pass the ID directly to citegraph:

$ bin/citegraph CorpusID:125964925
[1 / 80 / 145] (DOI 0.0) Fractal calculus and its geometrical explanation 
[2 / 80 / 145] (DOI 1.25) Fractal approach to heat transfer in silkworm cocoon hierarchy 
...
[80 / 80 / 5602] (DOI 1.428) Bubble Electrospinning for Mass Production of Nanofibers 
Hit max size threshold
Rendering...
Rendered to graph.pdf

Here's what the graph would look like (using a max --size of 20 for readability):

Graph example

Use citegraph --help to find out about all the options.
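If you collect paper IDs programmatically, the space removal described above is a one-liner. This is a hypothetical helper. It assumes the ID is copied from the site with spaces inside it, as in "Corpus ID: 125964925".

```python
def to_citegraph_id(displayed_id: str) -> str:
    """Strip all whitespace from an ID copied from semanticscholar.org."""
    return "".join(displayed_id.split())


# Builds the same command line as the example above (hypothetical usage).
argv = ["bin/citegraph", to_citegraph_id("Corpus ID: 125964925")]
```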

Exploration parameters

By default, the exploration algorithm is biased towards exploring downward links, that is, the papers your root papers cite. To also explore the papers that cite your root papers, use the option --also-up.

Compare, for example, the following two graphs (root paper in pink):

  • Default:

Laarman default

  • With --also-up:

Laarman up and down
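The effect of --also-up can be pictured with a simplified model. In this model, the default follows only references, and --also-up additionally follows the papers that cite a root paper. This is an approximation: the real algorithm is described as biased rather than strictly downward. The neighbours function and its dictionaries are hypothetical.

```python
from typing import Dict, List, Set


def neighbours(paper: str,
               roots: Set[str],
               references: Dict[str, List[str]],
               citations: Dict[str, List[str]],
               also_up: bool = False) -> List[str]:
    """Candidate papers to explore next from `paper` (simplified model)."""
    down = list(references.get(paper, []))
    # Upward links (citing papers) are only added for root papers, and only with also_up.
    if also_up and paper in roots:
        up = [c for c in citations.get(paper, []) if c not in down]
        return down + up
    return down
```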

Layout and export formats

Export formats are selected using the --format (-f) option.

The following formats are suitable for importing the graph into an external graph visualisation tool:

Citegraph can also call Graphviz directly to lay out the graph and render it to another format, for example PDF, PNG, or SVG. The default export format is PDF; run citegraph --help to see all available formats.
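Rendering a DOT graph with Graphviz typically means piping the DOT source into a layout program such as dot, with -T selecting the output format and -o the output file. The sketch below shows that mechanism. It is not citegraph's own rendering code, and the graphviz_command and render helpers are hypothetical.

```python
import subprocess
from typing import List


def graphviz_command(fmt: str, out_path: str, layout: str = "dot") -> List[str]:
    """Build a Graphviz command line that reads DOT from stdin, e.g. dot -Tpdf -o graph.pdf."""
    return [layout, "-T" + fmt, "-o", out_path]


def render(dot_source: str, fmt: str = "pdf", out_path: str = "graph.pdf") -> None:
    """Pipe DOT source into Graphviz; raises CalledProcessError if Graphviz fails."""
    subprocess.run(graphviz_command(fmt, out_path),
                   input=dot_source.encode("utf-8"), check=True)


EXAMPLE = 'digraph G { "Paper A" -> "Paper B"; }'
# render(EXAMPLE, "svg", "graph.svg")  # requires Graphviz on PATH
```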

Customizing graph appearance

You can specify how individual nodes are styled with a YAML file. For example:

tags:
    read: # an identifier for the tag
        attrs: # DOT attributes:     https://graphviz.gitlab.io/doc/info/attrs.html
            style: bold
        members: # enumerate explicit members using keys of the bibtex file
            - someBibKey
            - another
    
    knuth_articles: # another tag
        attrs: 
            style: filled
            fillcolor: lightyellow
        
        # Select using an arbitrary python expression
        # The bibtex entry is in scope as 'paper'
        selector: 'any("Knuth" in author.last_names for author in paper.authors)'
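Here is one way the tag rules above could be applied to a node. This is a hypothetical sketch, not citegraph's code, and it rests on several assumptions. The TAGS dictionary is what yaml.safe_load would return for the file above. Each paper is a SimpleNamespace with authors carrying last_names, mirroring pybtex's Person. Later tags override earlier ones when attributes clash.

```python
from types import SimpleNamespace
from typing import Any, Dict

# The YAML above, as yaml.safe_load would return it.
TAGS: Dict[str, Any] = {
    "read": {"attrs": {"style": "bold"}, "members": ["someBibKey", "another"]},
    "knuth_articles": {
        "attrs": {"style": "filled", "fillcolor": "lightyellow"},
        "selector": 'any("Knuth" in author.last_names for author in paper.authors)',
    },
}


def node_attrs(bib_key: str, paper: Any, tags: Dict[str, Any]) -> Dict[str, str]:
    """Merge the DOT attributes of every tag that matches this paper."""
    attrs: Dict[str, str] = {}
    for tag in tags.values():
        matched = bib_key in tag.get("members", [])
        selector = tag.get("selector")
        if not matched and selector:
            # eval is NOT a sandbox: only load style files you trust.
            matched = bool(eval(selector, {"paper": paper}))
        if matched:
            attrs.update(tag.get("attrs", {}))
    return attrs


# Hypothetical stand-ins for BibTeX entries.
knuth = SimpleNamespace(authors=[SimpleNamespace(last_names=["Knuth"])])
other = SimpleNamespace(authors=[SimpleNamespace(last_names=["Lamport"])])
```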

citegraph's People

Contributors

oowekyala

citegraph's Issues

StopIteration exception raised with Python 3.8

This issue can be reproduced by running this command:

bin/citegraph CorpusID:3959685

Python version used: Python 3.8.2 (default, Jul 16 2020, 14:00:26)

Traceback (most recent call last):
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 20, in _tupled_sort
    elt = next(it)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/__main__.py", line 142, in <module>
    main(args, parser.error)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/__main__.py", line 120, in main
    graph = create_graph(seeds=seeds, biblio=bibdata, params=params, db=db)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 233, in __exit__
    raise exc_val
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/__main__.py", line 120, in main
    graph = create_graph(seeds=seeds, biblio=bibdata, params=params, db=db)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/explore.py", line 249, in smart_fetch
    roots = [resp for id in seeds for resp in [db.fetch_from_id(id)] if resp or not handle_api_failure(id, None)]
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/explore.py", line 249, in <listcomp>
    roots = [resp for id in seeds for resp in [db.fetch_from_id(id)] if resp or not handle_api_failure(id, None)]
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 175, in fetch_from_id
    result = self.__update_db(response=paper_dict)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 149, in __update_db
    return self.__paper_from_db(internal_id, True)
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 70, in __paper_from_db
    authors=self.__authors_from_db(internal_id))
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 101, in __authors_from_db
    return [Person(tup[0]) for tup in _tupled_sort(self.dbconn.execute("SELECT Authors.name, AuthorLinks.rank FROM Authors INNER JOIN AuthorLinks ON AuthorLinks.authorId = Authors.id WHERE AuthorLinks.paperId=?", (internal_id,)))]
  File "/home/dylan/Documents/tryzone/citegraph/src/citegraph/semapi.py", line 101, in <listcomp>
    return [Person(tup[0]) for tup in _tupled_sort(self.dbconn.execute("SELECT Authors.name, AuthorLinks.rank FROM Authors INNER JOIN AuthorLinks ON AuthorLinks.authorId = Authors.id WHERE AuthorLinks.paperId=?", (internal_id,)))]
RuntimeError: generator raised StopIteration
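This traceback matches the behaviour introduced by PEP 479. Since Python 3.7, a StopIteration that escapes a generator body is converted into RuntimeError("generator raised StopIteration"). Line 20 of semapi.py calls next(it) inside _tupled_sort, which would fail this way if the author query returns no rows. The real body of _tupled_sort is not shown here. The two generators below are hypothetical and only reproduce the pattern and a possible fix.

```python
from typing import Iterable, Iterator


def first_then_rest_buggy(rows: Iterable) -> Iterator:
    it = iter(rows)
    first = next(it)  # empty input: StopIteration escapes -> RuntimeError on 3.7+
    yield first
    yield from it


def first_then_rest_fixed(rows: Iterable) -> Iterator:
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        return  # no rows (e.g. a paper with no authors): yield nothing
    yield first
    yield from it
```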

API break with the latest semanticscholar package v0.3.2

Hey oowekyala, hope you are doing well!

I wanted to use your tool again recently, but I ran into an exception.
I had installed all the dependencies at their latest released versions.
It looks like the semanticscholar package changed its API... here is the traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/__main__.py", line 142, in <module>
    main(args, parser.error)
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/__main__.py", line 118, in main
    with PaperDb(bibdata=bibdata, dbfile=db_loc) as db:
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/semapi.py", line 230, in __exit__
    raise exc_val
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/__main__.py", line 120, in main
    graph = create_graph(seeds=seeds, biblio=bibdata, params=params, db=db)
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/explore.py", line 249, in smart_fetch
    roots = [resp for id in seeds for resp in [db.fetch_from_id(id)] if resp or not handle_api_failure(id, None)]
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/explore.py", line 249, in <listcomp>
    roots = [resp for id in seeds for resp in [db.fetch_from_id(id)] if resp or not handle_api_failure(id, None)]
  File "/home/gageotd/Documents/workspace/citegraph/src/citegraph/semapi.py", line 163, in fetch_from_id
    paper_dict: Dict = semanticscholar.paper(paper_id)
AttributeError: module 'semanticscholar' has no attribute 'paper'. Did you mean: 'Paper'?

I "fixed" it by pinning semanticscholar to version 0.2.1 in DEPENDENCIES, which might be a good workaround for the moment.

Dylan
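A startup guard could turn this AttributeError into an actionable message that points at the pin (the line semanticscholar==0.2.1 in DEPENDENCIES). This is a hypothetical sketch. It assumes citegraph keeps calling the module-level semanticscholar.paper(...) function seen in the traceback, and it does not attempt to port citegraph to the newer package API.

```python
from types import SimpleNamespace
from typing import Any

PIN = "semanticscholar==0.2.1"


def has_legacy_paper_api(module: Any) -> bool:
    """True if the module exposes the function-style paper(...) that citegraph calls."""
    return callable(getattr(module, "paper", None))


def require_legacy_api(module: Any) -> None:
    """Fail early with a hint instead of an AttributeError deep inside fetch_from_id."""
    if not has_legacy_paper_api(module):
        raise ImportError(
            "installed semanticscholar has no module-level paper(); "
            "pin '%s' in DEPENDENCIES" % PIN)


# Hypothetical stand-ins; in practice: import semanticscholar; require_legacy_api(semanticscholar)
legacy_module = SimpleNamespace(paper=lambda paper_id: {"paperId": paper_id})
new_module = SimpleNamespace(Paper=object)
```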
