PyBibTextTools

  • SpringerCsv2Bib
  • GetAbstract
  • BibFilesMerge

Dependencies

Run this command to install the dependencies:

  • pip install -r requirements.txt

SpringerCsv2Bib

Converts a Springer CSV export file into a BibTeX file.

Run

foo@bar:~$ python SpringerCsv2Bib.py -h
usage: SpringerCsv2Bib.py [-h] -c CSVFILENAME -b BIBFILENAME

optional arguments:
  -h, --help            show this help message and exit
  -c CSVFILENAME, --csvFileName CSVFILENAME
                        CSV file name
  -b BIBFILENAME, --bibFileName BIBFILENAME
                        BibText file name

Example

foo@bar:~$ python SpringerCsv2Bib.py -c "Springer.csv" -b "Springer.bib"

File founded: Springer.csv
Processed: 590
Removid without author: 5
Total Final: 585
Saved file: Springer.bib
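For readers curious about what the conversion involves, here is a minimal Python sketch of the same idea. It is not the script's actual code. It assumes the SpringerLink export uses columns named "Item Title", "Publication Title", "Item DOI", "Authors", "Publication Year", "URL" and "Content Type" (check the header of your own file). The entry keys (springer1, springer2, ...) and the @article/@inproceedings mapping are hypothetical. Like the run above, it drops rows without an author, so the final count is Processed minus removed (590 - 5 = 585 in the example).

```python
import csv
import io

# Assumed SpringerLink CSV column names; verify them against your export's header.
TITLE, VENUE, DOI, AUTHORS, YEAR, URL, TYPE = (
    "Item Title", "Publication Title", "Item DOI", "Authors",
    "Publication Year", "URL", "Content Type",
)


def row_to_bibtex(row, index):
    """Format one CSV row as a BibTeX entry string with a hypothetical key."""
    # Hypothetical mapping: journal articles become @article, everything else @inproceedings.
    is_article = row.get(TYPE, "") == "Article"
    entry_type = "article" if is_article else "inproceedings"
    venue_field = "journal" if is_article else "booktitle"
    fields = {
        "title": row.get(TITLE, ""),
        "author": row.get(AUTHORS, ""),
        "year": row.get(YEAR, ""),
        "doi": row.get(DOI, ""),
        "url": row.get(URL, ""),
        venue_field: row.get(VENUE, ""),
    }
    # Skip empty values so the entry only contains fields that were filled in.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@{entry_type}{{springer{index},\n{body}\n}}\n"


def convert(csv_text):
    """Return (bibtex_text, processed_count, removed_without_author_count)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [row for row in rows if row.get(AUTHORS, "").strip()]
    entries = [row_to_bibtex(row, i) for i, row in enumerate(kept, start=1)]
    return "\n".join(entries), len(rows), len(rows) - len(kept)
```

The sketch does not escape BibTeX special characters such as % or &, and the Authors column may need to be split into the "A and B" form that BibTeX expects.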

GetAbstract

This tool fetches abstracts from digital libraries. It is a MacGyver-style workaround, not an official API.

Note 1: for ACM, use the limit parameter (-l), because ACM blocks clients that fetch many abstracts at the same time.

Note 2: in my case, I use a proxy (-p), because I access the digital libraries from home through my university's proxy.
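Both notes come down to two request settings: an optional proxy and a cap on how many pages are fetched per run. Below is a minimal sketch with Python's standard urllib, assuming plain HTTP(S) downloads. The function names and the 2-second delay are hypothetical, and GetAbstract.py is not necessarily implemented this way.

```python
import time
import urllib.request


def build_opener(proxy=None):
    """Return a urllib opener that sends HTTP and HTTPS traffic through `proxy`, if given."""
    handlers = []
    if proxy:
        # One proxy URL such as https://user:password@host:port serves both schemes.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def fetch_pages(urls, opener, limit=None, delay=2.0):
    """Download at most `limit` URLs, sleeping `delay` seconds between requests."""
    pages = {}
    for count, url in enumerate(urls):
        if limit is not None and count >= limit:
            break  # stop early so the library does not block the client
        if count:
            time.sleep(delay)  # spread requests out instead of firing them all at once
        with opener.open(url, timeout=30) as response:
            pages[url] = response.read().decode("utf-8", errors="replace")
    return pages
```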

Run

foo@bar:~$ python GetAbstract.py -h
usage: GetAbstract.py [-h] -d {springer,acm,ieee} -f BIBFILENAME [-p PROXY]
                      [-l LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  -d {springer,acm,ieee}, --database {springer,acm,ieee}
                        select database
  -f BIBFILENAME, --bibFileName BIBFILENAME
                        Springer bibFile name
  -p PROXY, --proxy PROXY
                        internet proxy, ex:
                        https://john:[email protected]:4001
  -l LIMIT, --limit LIMIT
                        abstract load limit

Example

foo@bar:~$ python GetAbstract.py -d acm -f "ACM.bib" -l 10 -p https://peter:[email protected]:4001

Had Abstract: 85
Url errors: 0
Loaded Abstract: 10
Total Entries: 585
Limit to process: 10
Processed: 95
Left: 490

or

foo@bar:~$ python GetAbstract.py -d springer -f "Springer.bib"

Had Abstract: 85
Url errors: 0
Loaded Abstract: 10
Total Entries: 585
Limit to process: 10
Processed: 95
Left: 490
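In both outputs the counters appear to fit together as Processed = Had Abstract + Loaded Abstract (85 + 10 = 95) and Left = Total Entries - Processed (585 - 95 = 490). That is consistent with the tool skipping entries that already have an abstract, so repeated runs with -l would work through the file in batches. Here is a small sketch of that bookkeeping, assuming every fetch succeeds (Url errors: 0) and using a hypothetical plan_batch helper:

```python
def plan_batch(entries, limit=None):
    """Select entries without an abstract (at most `limit`) and compute the counters."""
    had = [e for e in entries if e.get("abstract")]
    missing = [e for e in entries if not e.get("abstract")]
    batch = missing if limit is None else missing[:limit]
    # Assumes every selected entry gets its abstract loaded (no URL errors).
    processed = len(had) + len(batch)
    counters = {
        "Had Abstract": len(had),
        "Loaded Abstract": len(batch),
        "Total Entries": len(entries),
        "Processed": processed,
        "Left": len(entries) - processed,
    }
    return batch, counters
```

With 85 entries that already have an abstract, 500 without, and limit=10, this reproduces Processed 95 and Left 490. A second run after filling in that batch would report Had Abstract 95 and Left 480.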

BibFilesMerge

Merges BibTeX files and:

  • removes duplicate entries
  • in some cases, merges information from the duplicates before removing them
  • removes entries that lack any of the following:
    • author
    • title
    • year
    • journal name or conference name
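The rules above can be sketched in plain Python over entries represented as dicts of field names to values. This is an illustration only. The duplicate key (the DOI when present, otherwise the title reduced to lower-case ASCII letters and digits) and the "fill missing fields from the duplicate" merge rule are assumptions, not necessarily what BibFilesMerge.py does.

```python
import re


def normalize_title(title):
    """Lower-case a title and keep only ASCII letters and digits, so punctuation does not matter."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def has_required_fields(entry):
    """True when author, title, year and a journal or conference name are all present."""
    venue = entry.get("journal") or entry.get("booktitle")
    return all(entry.get(field) for field in ("author", "title", "year")) and bool(venue)


def merge_entries(entries):
    """Drop incomplete entries, then collapse duplicates, keeping the first copy."""
    kept, removed, first_by_key = [], [], {}
    for entry in entries:
        if not has_required_fields(entry):
            removed.append(entry)
            continue
        # Hypothetical duplicate key: the DOI when present, else the normalized title.
        key = entry.get("doi", "").lower() or normalize_title(entry["title"])
        if key in first_by_key:
            first = first_by_key[key]
            for field, value in entry.items():
                first.setdefault(field, value)  # copy only fields the first copy lacks
            removed.append(entry)
        else:
            first_by_key[key] = dict(entry)  # copy so the input dicts stay untouched
            kept.append(first_by_key[key])
    return kept, removed
```

Merging before discarding matters because different libraries export different fields. For example, a Scopus copy may carry an abstract that the IEEE copy of the same paper lacks.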

This tool has been tested with files exported from these digital libraries:

  • ACM Digital Library
  • IEEE Xplore
  • Scopus
  • SpringerLink
  • ScienceDirect - Elsevier
  • Web of Science (thanks @dineiar for this)

Run

foo@bar:~$ python BibFilesMerge.py -h
usage: BibFilesMerge.py [-h] -p FOLDERPATH [-f [FILELIST [FILELIST ...]]]
                        [-o FILENAMEOUT] [-e [EXCLUDE [EXCLUDE ...]]] [-l]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDERPATH, --folderPath FOLDERPATH
                        Bib files folder path
  -f [FILELIST [FILELIST ...]], --fileList [FILELIST [FILELIST ...]]
                        bib file name list, e.g. -f IEEE.bib ACM.bib
                        science.bib Springer.bib
  -o FILENAMEOUT, --fileNameOut FILENAMEOUT
                        File name of merged file
  -e [EXCLUDE [EXCLUDE ...]], --exclude [EXCLUDE [EXCLUDE ...]]
                        bib with entries to be removed from others, e.g. -e
                        FirstExecution.bib SecondExecution.bib
  -l, --logProcess      Log processing to CSV files
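The usage line shows that -f and -e take a variable number of file names, while -l is a simple on/off switch. The argparse sketch below reproduces these options from the help text. It is a reconstruction, not the script's source.

```python
import argparse


def build_parser():
    """Recreate the BibFilesMerge.py options listed in the help text."""
    parser = argparse.ArgumentParser(prog="BibFilesMerge.py")
    parser.add_argument("-p", "--folderPath", required=True, help="Bib files folder path")
    # nargs="*" collects the following words into a list, hence [FILELIST [FILELIST ...]].
    parser.add_argument("-f", "--fileList", nargs="*", help="bib file name list")
    parser.add_argument("-o", "--fileNameOut", help="File name of merged file")
    parser.add_argument("-e", "--exclude", nargs="*",
                        help="bib with entries to be removed from others")
    parser.add_argument("-l", "--logProcess", action="store_true",
                        help="Log processing to CSV files")
    return parser
```

Because nargs="*" consumes every following argument that does not start with "-", each file list ends at the next option flag. When -e is omitted, the exclude value is None.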

Example

foo@bar:~$ python BibFilesMerge.py -p output/ -o 2019-2.bib -f 2019-2/ScienceDirect1.bib 2019-2/ScienceDirect2.bib 2019-2/Scopus.bib -e 2018/ACM.bib 2018/IEEE.bib 2018/ScienceDirect.bib 2018/SCOPUS.bib 2019/ACM.bib 2019/IEEE.bib 2019/ScienceDirect1.bib 2019/ScienceDirect2.bib 2019/SCOPUS.bib -l

--folderPath     output/
--fileNameOut    2019-2.bib
--fileList       ['2019-2/ScienceDirect1.bib', '2019-2/ScienceDirect2.bib', '2019-2/Scopus.bib']
--exclude        ['2018/ACM.bib', '2018/IEEE.bib', '2018/ScienceDirect.bib', '2018/SCOPUS.bib', '2019/ACM.bib', '2019/IEEE.bib', '2019/ScienceDirect1.bib', '2019/ScienceDirect2.bib', '2019/SCOPUS.bib']
--logProcess     True

2019-2/ScienceDirect1.bib
2019-2/ScienceDirect2.bib
2019-2/Scopus.bib

Total:                   798
No Author:               0
No Year:                 0
No Publisher:            0
Duplicates:              31
Merged:                  25
Excluded from bib:       537
Final:                   230
without Abstract:        0 {'2019-2/ScienceDirect1.bib': 0, '2019-2/ScienceDirect2.bib': 0, '2019-2/Scopus.bib': 0}
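The summary counts add up: 798 total - 31 duplicates - 537 excluded = 230 final entries (No Author, No Year and No Publisher are all 0 here). Merged (25) appears to count duplicates whose information was combined before removal, not a separate deduction. The -e step can be sketched as follows. The matching rule (an entry counts as already seen when its DOI or normalized title matches one in the excluded files) and the helper names are hypothetical.

```python
def entry_keys(entry):
    """Identifiers used for matching: lower-cased DOI and title reduced to letters and digits."""
    doi = entry.get("doi", "").lower()
    title = "".join(ch for ch in entry.get("title", "").lower() if ch.isalnum())
    return {key for key in (doi, title) if key}


def exclude_known(entries, excluded_entries):
    """Drop entries that share a DOI or normalized title with any excluded entry."""
    seen = set()
    for entry in excluded_entries:
        seen |= entry_keys(entry)
    kept = [entry for entry in entries if not (entry_keys(entry) & seen)]
    return kept, len(entries) - len(kept)
```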

The -l switch creates two CSV files in the output folder:

  • BibFilesMerge_removed.csv, with the columns cause, source, key, doi, author, year, title and publish
    • cause is one of: no author, no year, no journal, duplicate of next or duplicate of prev
  • BibFilesMerge_final.csv, with the columns key, doi, author, year, title, publish and abstract
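Here is a sketch of writing logs with these columns using Python's csv module. The column lists come from the description above, while write_log is a hypothetical helper. Fields missing from a row become empty cells, and extra keys are ignored.

```python
import csv
import io

REMOVED_COLUMNS = ["cause", "source", "key", "doi", "author", "year", "title", "publish"]
FINAL_COLUMNS = ["key", "doi", "author", "year", "title", "publish", "abstract"]


def write_log(stream, columns, rows):
    """Write a header plus one line per row; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

To write a real file, open it with newline="" as the csv module documentation recommends, e.g. open("output/BibFilesMerge_removed.csv", "w", newline="").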

Load Bib File Error

Sometimes errors occur while reading a bib file. In that case, the last line of the error message gives the line number in the bib file where parsing failed. Edit the bib file and fix the problem at that line. For example:

foo@bar:~$ python BibFilesMerge.py -p results -f IEEE.bib ACM.bib science.bib Springer.bib -o MyFile.bib

--folderPath     results
--fileNameOut    MyFile.bib
--fileList       ['IEEE.bib', 'ACM.bib', 'science.bib', 'Springer.bib']
--exclude        None
--logProcess     False

IEEE.bib
ACM.bib
science.bib
Traceback (most recent call last):
  File "BibFilesMerge.py", line 146, in <module>
    run(args["folderPath"], args["fileList"], args["fileNameOut"])
  File "BibFilesMerge.py", line 63, in run
    bibData = parse_file(os.path.join(folderPath,bibFileName))
  File "./pybtex\pybtex\database\__init__.py", line 865, in parse_file
    return parser.parse_file(file)
  File "./pybtex\pybtex\database\input\__init__.py", line 54, in parse_file
    self.parse_stream(f)
  File "./pybtex\pybtex\database\input\bibtex.py", line 410, in parse_stream
    return self.parse_string(text)
  File "./pybtex\pybtex\database\input\bibtex.py", line 397, in parse_string
    for entry in entry_iterator:
  File "./pybtex\pybtex\database\input\bibtex.py", line 191, in parse_bibliography
    self.handle_error(error)
  File "./pybtex\pybtex\database\input\bibtex.py", line 383, in handle_error
    report_error(error)
  File "./pybtex\pybtex\errors.py", line 78, in report_error
    raise exception
  File "./pybtex\pybtex\database\input\bibtex.py", line 189, in parse_bibliography
    yield tuple(self.parse_command())
  File "./pybtex\pybtex\database\input\bibtex.py", line 222, in parse_command
    self.handle_error(error)
  File "./pybtex\pybtex\database\input\bibtex.py", line 383, in handle_error
    report_error(error)
  File "./pybtex\pybtex\errors.py", line 78, in report_error
    raise exception
  File "./pybtex\pybtex\database\input\bibtex.py", line 220, in parse_command
    self.required([body_end])
  File "./pybtex\pybtex\scanner.py", line 120, in required
    raise TokenRequired(description, self)
pybtex.scanner.TokenRequired: syntax error in line 2264: '}' expected

Bib file content at line 2264:

2264 note = "Special issue on Assistive Computer Vision and Robotics - "Assistive Solutions for Mobility, Communication and HMI" ",

The inner double quotes end the quoted value early, which breaks the parser. Just remove them:

2264 note = "Special issue on Assistive Computer Vision and Robotics - Assistive Solutions for Mobility, Communication and HMI",
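This diagnosis can be partly automated. The minimal sketch below assumes the error message contains pybtex's "syntax error in line N" text, as in the traceback above. It extracts the line number, returns that line, and flags a quoted field value containing more than two double quotes. The quote check is a heuristic (braced quotes such as {\"o} also count), and both helper names are hypothetical.

```python
import re


def offending_line(error_message, bib_text):
    """Return (line_number, line_text) from a 'syntax error in line N' message, or None."""
    match = re.search(r"syntax error in line (\d+)", error_message)
    if match is None:
        return None
    number = int(match.group(1))
    lines = bib_text.splitlines()
    text = lines[number - 1] if 0 < number <= len(lines) else ""
    return number, text


def has_nested_quotes(line):
    """Heuristic: a quote-delimited field value should contain exactly two double quotes."""
    match = re.match(r'\s*\w+\s*=\s*(".*")\s*,?\s*$', line)
    return bool(match) and match.group(1).count('"') > 2
```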

Contributors

kabrau, claudioscheer, dineiar, dalvangriebler
