BibFIX

Description

BibFIX aims to automatically generate complete, consistent and well-formatted BibTeX records for papers published in Computer/Information Science venues given a .bib file.

Use Case

Your BibTeX records were collected or generated from different sources and methods (e.g., Google Scholar, dblp, or auto-generated by a reference manager), resulting in a reference list with inconsistent and incomplete records. You need a quick and easy way to fill in missing fields and regenerate the records so that they all follow one style, using dblp or ACM DL.

Example

Given this sample input file at bibtex/bib_sample.bib, the following command generates this output file at bibtex/output_bib.bib using BibTeX records sourced from dblp:

python bibfix.py -i bibtex/bib_sample.bib -o bibtex/output_bib.bib -src dblp

Records that fail to be retrieved (three in this case) are written to a separate file.

A summary of the process is given in the following run:

2022-08-19 20:25:49,727 - Reading the bibtex file..
2022-08-19 20:25:49,872 - Getting things ready... 🔥
2022-08-20 00:15:23,070 - Starting to retrieve bibtex entries from dblp...
Ref [001]: answer interaction in non-fact...  ---> Done   ✅
Ref [002]: multi-method evaluation: lever...  ---> Done   ✅
Ref [003]: an intent taxonomy for questio...  ---> Done   ✅
::
Ref [026]: generating relevant and inform...  ---> Failed 🙈
Ref [027]: ontological user profiling in ...  ---> Done   ✅
Ref [028]: technological frames: making s...  ---> Done   ✅
2022-08-20 00:15:58,199 - ... Task Completed 🥳 ...
.... .... .... .... .... .... .... .... .... .... ....
Summary: 
total ...........  28
Succeeded 🎉 ....  25/28
Failed 🫣 .......  3/28
.... .... .... .... .... .... .... .... .... .... ....
 📃 Bibtex consistent records are stored @ bibtex/output_bib.bib...
 📃 Bibtex failed records are stored @ bibtex/output_bib_failed.bib...
......................................................
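
In the run above, the failed records land in bibtex/output_bib_failed.bib, next to the requested output file. The snippet below is a minimal sketch of how that file name could be derived from the -o path. It is not taken from the project's code: the helper name failed_path is hypothetical, and it assumes "_failed" is always appended to the output file's stem.

```python
from pathlib import Path


def failed_path(outfile: str) -> Path:
    """Return the path of the failed-records file for a given output file.

    Hypothetical helper: assumes the naming pattern seen in the sample run,
    i.e. "<stem>_failed<suffix>" in the same directory as the output file.
    """
    out = Path(outfile)
    return out.with_name(f"{out.stem}_failed{out.suffix}")


if __name__ == "__main__":
    # Path taken from the example command in this README.
    print(failed_path("bibtex/output_bib.bib").as_posix())
```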

How to use?

  1. Download and install Python (ensure it is included in your environment path).
  2. Download ChromeDriver (choose the version compatible with the Chrome browser installed on your machine).
  3. Clone the project.
  4. Place the downloaded ChromeDriver executable in the tools directory.
  5. Install selenium==3.141.0 and bibtexparser.
  6. Run bibfix.py using the options described below:

| Option | Description |
| --- | --- |
| -i (--infile) | Sets the path to the input .bib file. |
| -o (--outfile) | Sets the path of the generated output .bib file. |
| -src (--source) | Sets the source of the BibTeX records: dblp or acm. Defaults to dblp. |
| -k (--keepkeys) | Keeps the BibTeX keys (labels) unchanged so that compiling your document is not affected. Defaults to False. |
| -hd (--hide) | Hides the browser instance used to scrape BibTeX records. Defaults to False. |
| -s (--short) | Uses the short version of conference/journal names (under development). Defaults to False. |

  • Enjoy! 🚀😉
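
To make the options above concrete, here is a minimal argparse sketch of a command-line interface with the same flags and defaults. It is an illustration only, not the actual parser in bibfix.py: treating -i and -o as required, and -k, -hd and -s as on/off switches, are assumptions, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="bibfix.py")
    # Assumption: both paths are mandatory; the README does not say so explicitly.
    parser.add_argument("-i", "--infile", required=True,
                        help="path to the input .bib file")
    parser.add_argument("-o", "--outfile", required=True,
                        help="path of the generated output .bib file")
    parser.add_argument("-src", "--source", choices=["dblp", "acm"], default="dblp",
                        help="source of the BibTeX records (default: dblp)")
    # Assumption: the boolean options are plain switches that default to False.
    parser.add_argument("-k", "--keepkeys", action="store_true",
                        help="keep the original BibTeX keys")
    parser.add_argument("-hd", "--hide", action="store_true",
                        help="hide the browser instance used for scraping")
    parser.add_argument("-s", "--short", action="store_true",
                        help="use short conference/journal names")
    return parser


if __name__ == "__main__":
    # Same arguments as the example command earlier in this README.
    args = build_parser().parse_args(
        ["-i", "bibtex/bib_sample.bib", "-o", "bibtex/output_bib.bib", "-src", "dblp"]
    )
    print(args.infile, args.outfile, args.source, args.keepkeys, args.hide, args.short)
```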

Limitations and further details

  • The script relies on several assumptions. For example, it uses paper titles to search for bib records and assumes that the first matching title is the correct one.
  • Unlike dblp, ACM DL only returns papers published by ACM. When ACM DL is the source, all other papers will fail; you may want to rerun the failed list using dblp, or generate those records manually following the same style.
  • Because the script scrapes content from web pages, its behavior is not fully predictable: changes to the structure of those pages may break it. Its speed also depends heavily on page load time; ACM DL pages take about five times longer to load than dblp pages.
  • The script can be configured to suit your BibTeX formatting preferences. For example, you can change the BibTeX keys to follow a certain naming convention or change the order of the fields.
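
Because the first matching title is assumed to be correct, it can be worth checking a retrieved record against the title you searched for. The sketch below is one possible way to do that with the standard library; it is not part of BibFIX. The function names, the normalization rules and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title and reduce it to space-separated alphanumeric words."""
    title = re.sub(r"[{}]", "", title)  # drop BibTeX case-protecting braces
    title = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return title.strip()


def titles_match(query: str, candidate: str, threshold: float = 0.9) -> bool:
    """Return True if the two titles are near-identical after normalization.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    a, b = normalize_title(query), normalize_title(candidate)
    return SequenceMatcher(None, a, b).ratio() >= threshold


if __name__ == "__main__":
    # Hypothetical titles, used only to exercise the helper.
    print(titles_match("A {Sample} Title: With Punctuation",
                       "a sample title with punctuation."))
    print(titles_match("A Sample Title", "Completely Different Work"))
```

Records whose retrieved title fails such a check could be reviewed manually or added to the failed list instead of being accepted silently.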

Contributors

marwahalaofi
