humans-of-julia / bibparser.jl


Parser for bibliographic formats, including BibTeX, in pure Julia

License: MIT License

Julia 100.00%

bibparser.jl's Issues

Compiling FSM hangs

Hi there!

I am working on a package to display citation information in package documentation, and as part of this, I need to parse .bib files. Rather than reinvent the wheel, I'm trying to use this package, but the call to Automa.compile(bibfile) is hanging. Was this a problem that you experienced when you were writing this package?

Difference between `BibParser.parse_entry` and `BibTeX.parse_bibtex`

Consider the following example:

bib = """
@misc{rstan2019,
    author        = {{Stan Development Team}},
    note          = {R package version 2.19.2},
    year          = {2019},
    url           = {http://mc-stan.org/},
    title         = {{RStan}: the {R} interface to {Stan}}
   }
"""

e1 = BibTeX.parse_bibtex(bib)
e1[2]["rstan2019"]["author"]
# "{ Stan Development Team }" # correct

e2 = BibParser.parse_entry(bib)
e2["rstan2019"].authors
# BibInternal.Name("", "Team}", "", "", "") # incorrect

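The first result appears correct, but the second does not: the braced organization name is split into separate name parts, leaving only "Team}" as the last name. Note that this also propagates to Bibliography.jl.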
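To make the expected behavior concrete, here is a minimal sketch of brace-aware author splitting. It is not BibParser's implementation; `split_authors` and `is_literal_name` are hypothetical helper names, and the sketch assumes the outer field delimiters have already been stripped, so the value for the entry above would be `{Stan Development Team}`.

```julia
# Hypothetical helpers, not part of BibParser or BibInternal.

# Split an author field on the keyword " and ", but only at brace depth 0,
# so that "and" inside a braced group never separates two names.
function split_authors(field::AbstractString)
    names = String[]
    buf = IOBuffer()
    depth = 0
    i = firstindex(field)
    while i <= lastindex(field)
        c = field[i]
        if c == '{'
            depth += 1
        elseif c == '}'
            depth -= 1
        end
        if depth == 0 && startswith(SubString(field, i), " and ")
            push!(names, String(strip(String(take!(buf)))))
            i = nextind(field, i, 5)   # skip the five characters of " and "
            continue
        end
        print(buf, c)
        i = nextind(field, i)
    end
    push!(names, String(strip(String(take!(buf)))))
    return names
end

# A name wrapped in one pair of braces that spans the whole string is treated
# as a literal (e.g. an organization) and should not be split into parts.
function is_literal_name(name::AbstractString)
    (startswith(name, '{') && endswith(name, '}')) || return false
    depth = 0
    for (k, c) in enumerate(name)
        c == '{' && (depth += 1)
        c == '}' && (depth -= 1)
        # The opening brace must not be closed before the final character.
        depth == 0 && k < length(name) && return false
    end
    return depth == 0
end
```

With helpers like these, `{Stan Development Team}` would stay one literal name instead of being broken into first and last name parts.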

BibParser copies missing fields from previous entries

I think there is strange behavior when some entries have missing fields. Suppose that entries 1 and 4 have an abstract field, but entries 2 and 3 do not. In that case, an abstract field is added to entries 2 and 3, and its value is copied from the first entry.
I've noticed this behavior for the abstract, comment and x-color fields, but I guess it may hold for other fields too.

Here is a minimal example:

@article{ashfahani_2019_continual_DL,
	abstract = { The feasibility of deep neural networks (DNNs) to address
                  data stream problems still requires intensive study because of
                  the static and offline nature of conventional deep learning
                  approaches. A deep continual learning algorithm, namely
                  autonomous deep learning (ADL), is proposed in this paper.
                  Unlike traditional deep learning methods, ADL features a
                  flexible structure where its network structure can be
                  constructed from scratch with the absence of an initial
                  network structure via the self-constructing network structure.
                  ADL specifically addresses catastrophic forgetting by having a
                  different-depth structure which is capable of achieving a
                  trade-off between plasticity and stability. Network
                  significance (NS) formula is proposed to drive the hidden
                  nodes growing and pruning mechanism. Drift detection scenario
                  (DDS) is put forward to signal distributional changes in data
                  streams which induce the creation of a new hidden layer. The
                  maximum information compression index (MICI) method plays an
                  important role as a complexity reduction module eliminating
                  redundant layers. The efficacy of ADL is numerically validated
                  under the prequential test-then-train procedure in lifelong
                  environments using nine popular data stream problems. The
                  numerical results demonstrate that ADL consistently
                  outperforms recent continual learning methods while
                  characterizing the automatic construction of network
                  structures. },
	archiveprefix = {arXiv},
	author = {Andri Ashfahani and Mahardhika Pratama},
	comment = {published = 2018-10-17T01:40:45Z, updated = 2020-01-09T12:19:19Z},
	doi = {10.1137/1.9781611975673.75},
	eprint = {1810.07348v4},
	month = jan,
	primaryclass = {cs.LG},
	title = {Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments},
	url = {http://arxiv.org/abs/1810.07348v4; http://arxiv.org/pdf/1810.07348v4},
	x-color = {#cc3300},
	x-fetchedfrom = {arXiv.org},
	year = 2019
}

@article{ashfahani_2020_DEVDAN,
	added-at = {2020-05-08T00:00:00.000+0200},
	author = {Andri Ashfahani and Mahardhika Pratama and Edwin Lughofer and Yew-Soon Ong},
	biburl = {https://www.bibsonomy.org/bibtex/2f01e837afa1ecc4df48befc53e43f458/dblp},
	ee = {https://doi.org/10.1016/j.neucom.2019.07.106},
	interhash = {d8ce7807e54d80e379324b2c3b4cd6df},
	intrahash = {f01e837afa1ecc4df48befc53e43f458},
	journal = {Neurocomputing},
	pages = {297--314},
	timestamp = {2020-05-09T11:39:11.000+0200},
	title = {DEVDAN: Deep evolving denoising autoencoder.},
	url = {http://dblp.uni-trier.de/db/journals/ijon/ijon390.html#AshfahaniPLO20},
	volume = 390,
	x-fetchedfrom = {Bibsonomy},
	year = 2020
}
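A quick way to check whether a parsed entry picked up fields it should not have is to compare its field names against those written in the entry's source text. The following is only a diagnostic sketch: `source_field_names` and `leaked_fields` are hypothetical names, and the regex assumes one `name = value` pair starts per line, as in the example above.

```julia
# Hypothetical diagnostic helpers, not part of BibParser.

# Collect the field names that start a line in the raw entry text.
# Assumption: each field begins on its own line as `name = ...`.
function source_field_names(raw_entry::AbstractString)
    pattern = r"^\s*([A-Za-z][\w-]*)\s*="m
    return Set(lowercase(m.captures[1]) for m in eachmatch(pattern, raw_entry))
end

# Return the parsed field names that never occur in the raw entry text.
function leaked_fields(parsed_names, raw_entry::AbstractString)
    parsed = Set(lowercase.(parsed_names))
    return sort!(collect(setdiff(parsed, source_field_names(raw_entry))))
end
```

For the second entry above, passing the parsed field names together with its source text should return `abstract`, `comment` and `x-color` if those fields were carried over from the first entry. A multi-line value whose continuation line happens to start with `word =` would be miscounted as a field, so treat the result as a hint only.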

Spurious spaces inserted with curly braces

julia> using BibParser

julia> bibtex = """@article{Heun1900,
          title = {Neue Methoden zur approximativen Integration der
          Differentialgleichungen einer unabh\\\"{a}ngigen Ver\\\"{a}nderlichen},
          author = {Heun, Karl},
          journal = {Z. Math. Phys},
          volume = {45},
          pages = {23--38},
          year = {1900},
       }"""
"@article{Heun1900,\n   title = {Neue Methoden zur approximativen Integration der\n   Differentialgleichungen einer unabh\\\"{a}ngigen Ver\\\"{a}nderlichen},\n   author = {Heun, Karl},\n   journal = {Z. Math. Phys},\n   volume = {45},\n   pages = {23--38},\n   year = {1900},\n}"

julia> _,bib = BibParser.parse_entry(bibtex)
("", OrderedCollections.OrderedDict("Heun1900" => Dict("volume" => "45","author" => "Heun , Karl","pages" => "23--38","journal" => "Z. Math. Phys","year" => "1900","title" => "Neue Methoden zur approximativen Integration der Differentialgleichungen einer unabh\\ \" { a } ngigen Ver\\ \" { a } nderlichen","type" => "article")))

julia> bib["Heun1900"]["title"]
"Neue Methoden zur approximativen Integration der Differentialgleichungen einer unabh\\ \" { a } ngigen Ver\\ \" { a } nderlichen"

Add acceptance of reference name containing a dash

Officially, BibTeX does not support a dash within reference names (nor many other characters that the community still uses).

Add "-" to the grammar
Check and add other usual characters for reference name
Issue warning when those characters appear
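The three tasks above could be sketched roughly as follows. This is an illustration only: `check_key` is a hypothetical function, and both the "strict" character set (ASCII letters, digits, `_`, `:`, `.`) and the default set of tolerated extras (`-`) are assumptions, not taken from the BibTeX documentation or from BibParser's grammar.

```julia
# Hypothetical validator for reference names (citation keys).
# Assumed strict set: ASCII letters, digits, '_', ':' and '.'.
# Characters listed in `extra` are accepted but reported with a warning.
function check_key(key::AbstractString; extra::AbstractString = "-")
    flagged = Char[]
    for c in key
        if (isascii(c) && (isletter(c) || isdigit(c))) || c in "_:."
            continue
        elseif c in extra
            c in flagged || push!(flagged, c)
        else
            # Anything else is rejected outright.
            return (ok = false, flagged = flagged)
        end
    end
    if !isempty(flagged)
        @warn "Reference name uses characters outside the strict set" key flagged
    end
    return (ok = !isempty(key), flagged = flagged)
end
```

Calling `check_key("Yew-Soon2020")` accepts the key and emits a warning that mentions the dash, while a key containing a space is rejected. Further commonly used characters can be tolerated by passing them through `extra`.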

BibTeX string parser changes string content

Parsing BibTeX author fields sometimes changes the parsed text.

E.g.:

@Article{2015Nguyen,
  author    = {Vinh Phu Nguyen and Cosmin Anitescu and St{\'{e}}phane P.A. Bordas and Timon Rabczuk},
  journal   = {Mathematics and Computers in Simulation},
  title     = {Isogeometric analysis: An overview and computer implementation aspects},
  year      = {2015},
  month     = {nov},
  pages     = {89--116},
  volume    = {117},
  doi       = {10.1016/j.matcom.2015.05.008},
  publisher = {Elsevier {BV}},
}

results in the author strings being mutated to:

julia> test["2015Nguyen"].authors
4-element Vector{BibInternal.Name}:
 BibInternal.Name("", "Nguyen", "", "Vinh", " Phu")
 BibInternal.Name("", "Anitescu", "", "Cosmin", "")
 BibInternal.Name("", "Bordas", "", "St\\'{e}}}phane", " P.A.")
 BibInternal.Name("", "Rabczuk", "", "Timon", "")

I have not yet had the opportunity to determine where the mutation happens.
The unbalanced curly braces make cleaning the string quite difficult.
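Until the source of the mutation is found, a small check can at least flag names whose braces no longer balance. This is a sketch with a hypothetical helper name (`brace_balance`); it treats a backslash as escaping only the single character that follows it.

```julia
# Hypothetical helper: net brace depth of a string.
# Returns 0 when braces balance, a positive number when opening braces are
# left unclosed, and a negative number as soon as a closing brace appears
# without a matching opening brace.
function brace_balance(s::AbstractString)
    depth = 0
    escaped = false
    for c in s
        if escaped
            escaped = false        # the character after a backslash is skipped
        elseif c == '\\'
            escaped = true
        elseif c == '{'
            depth += 1
        elseif c == '}'
            depth -= 1
            depth < 0 && return depth
        end
    end
    return depth
end
```

The input `St{\'{e}}phane` balances, whereas the mutated first name `St\'{e}}}phane` from the output above yields a negative value, so names like it can be detected before any further string cleaning.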

Unrelated to the issue: How do you plan to bring in CSL support?
I had a look at it, and there are already some non-Julia projects.
Do you aim for a Julia-only implementation?
Will you create a separate repository for the CSL module (as it will be quite a big task), or do you plan to implement only a subset here in BibParser.jl?

I would kind of like to implement CSL in its own repository, as I am job hunting and it would be a good portfolio project.

`@string` definitions bleed into following entry

In a bib file that has @string definitions at the top, the last string definition seems to bleed into the first bib entry. Consider broken.bib:

@string{zp = "Z. Phys."}

@mastersthesis{GoerzDiploma2010,
    Author = {Goerz, Michael},
    Title = {Optimization of a Controlled Phasegate for Ultracold Calcium Atoms in an Optical Lattice},
    School = {Freie Universität Berlin},
    type = {{Diplomarbeit}},
    url = {http://michaelgoerz.net/research/diploma_thesis.pdf},
    Year = {2010},
}

Here, parsing the file gets Z. Phys. inserted into the author last name:

julia> parse_file("./broken.bib")["GoerzDiploma2010"].authors[1]
BibInternal.Name("", "Z. Phys.Goerz", "", "Michael", "")

I'm guessing some parsing string buffer isn't being cleared.

Bump DataStructures compat

CompatHelper built PR #6 to bump the compat version of DataStructures a couple of months ago. Is there any reason not to accept this? It's preventing some other packages in my docs Project from updating and leading to precompilation issues and other warnings.

Improve BibTeX parser to cover the whole BibTeX grammar

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Make a real Doc

BibLaTeX / Biber support

I am not sure whether this is a goal, but do you plan to also support the somewhat more modern BibLaTeX/Biber format? It has a few small differences (for example, it prefers date fields over year, and it has an online entry type).
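As an illustration of the date-versus-year difference, a reader that accepts both conventions could look roughly like this. `entry_year` is a hypothetical helper, and the sketch assumes the date value starts with a four-digit year (as in `2020-05-08`); it does not attempt to cover the full BibLaTeX date syntax.

```julia
# Hypothetical helper: prefer a BibLaTeX-style `date` field and fall back
# to the classic BibTeX `year` field. Returns `nothing` if neither is usable.
function entry_year(fields::AbstractDict{<:AbstractString,<:AbstractString})
    if haskey(fields, "date")
        # Assumption: the date starts with a four-digit year.
        m = match(r"^\s*(\d{4})", fields["date"])
        m === nothing || return parse(Int, m.captures[1])
    end
    haskey(fields, "year") || return nothing
    return tryparse(Int, strip(fields["year"]))
end
```

With this, an entry carrying only `date = {2020-05-08}` and one carrying only `year = 2020` would both report the year 2020.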

Questions about the current state of parsing, and an error message

I am looking forward to using DocumenterCitations.jl in my repositories at some point, and I started playing around with BibParser to understand how I might help move toward CSL capabilities.

When I take a BibTeX example file, e.g. https://raw.githubusercontent.com/brechtm/citeproc-py/master/examples/xampl.bib

I get

julia> bib = BibParser.parse_file("xampl.bib")
ERROR: Expected } on line 345
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] expect at .julia/packages/BibParser/jVXKY/src/bibtex.jl:52 [inlined]
 [3] field!(::BibParser.BibTeX.Parser{Array{SubString{String},1}}, ::Dict{String,String}) at .julia/packages/BibParser/jVXKY/src/bibtex.jl:102
 [4] parse_bibtex(::String) at .julia/packages/BibParser/jVXKY/src/bibtex.jl:158
 [5] parse_file(::String) at .julia/packages/BibParser/jVXKY/src/bibtex.jl:169
 [6] #parse_file#1 at .julia/packages/BibParser/jVXKY/src/BibParser.jl:19 [inlined]
 [7] parse_file(::String) at .julia/packages/BibParser/jVXKY/src/BibParser.jl:18
 [8] top-level scope at REPL[7]:1

It seems to be a problem with double curly braces, since line 345 is indeed

   author = "Ulrich {\"{U}}nderwood and Ned {\~N}et and Paul {\={P}}ot",

What's missing for this to be parsed?

Furthermore, if I shorten the file above to just the first entry

@ARTICLE{article-minimal,
   author = {L[eslie] A. Aamport},
   title = {The Gnats and Gnus Document Preparation System},
   journal = {\mbox{G-Animal's} Journal},
   year = 1986,
}

and parse a file containing only that entry, I get an empty dictionary.

Bugs in `@string` parsing

The parsing of @string entries seems to have some bugs:

import BibParser

bibtex = """
@string{foo = {Mrs. Foo}}

@string {bar = "Mr. Bar"}
"""

BibParser.BibTeX.parse_string(bibtex; check=:error)

results in

Warning: The entry is incomplete and end from (line 1, character 13) to (line 3, character 1): '= {Mrs. Foo}}'
Warning: The entry kind is invalid from (line 3, character 2) to (line 3, character 9): '@string'

The first @string using braces instead of quotes may not actually be in line with the description at http://www.bibtex.org/Format/, but it is definitely accepted by bibtex, and, more importantly, it's what BibDesk generates. The second @string with a space before the opening brace is directly from http://www.bibtex.org/Format/. It's not a format I personally use, but it should definitely be valid.
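For reference, both forms can be recognized with a fairly small pattern. The sketch below is not the BibParser grammar; `STRING_DEF` and `string_definitions` are hypothetical names, and the pattern deliberately ignores nested braces and `#` concatenation.

```julia
# Hypothetical pattern for simple @string definitions. It accepts
#   @string{foo = {Mrs. Foo}}    (brace-delimited value)
#   @string {bar = "Mr. Bar"}    (whitespace before the brace, quoted value)
# Limitations: no nested braces in the value and no `#` concatenation.
const STRING_DEF =
    r"@string\s*\{\s*([A-Za-z][\w-]*)\s*=\s*(?:\{([^{}]*)\}|\"([^\"]*)\")\s*\}"i

# Collect all simple @string definitions as name => value pairs.
function string_definitions(text::AbstractString)
    defs = Dict{String,String}()
    for m in eachmatch(STRING_DEF, text)
        # Exactly one of the two value groups participates in a match.
        defs[lowercase(m[1])] = String(something(m[2], m[3]))
    end
    return defs
end
```

Applied to the two-line example above, this yields `foo => Mrs. Foo` and `bar => Mr. Bar`, which is the result one would expect a parser to produce without any warnings.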

Field values in foreign languages, such as Japanese input, fail

Preambles and comments

Preambles (@preamble) are not handled yet and raise an incomplete-entry error. If the preamble includes LaTeX commands, this might affect the exported entries (some valid entries that rely on @preamble would be considered invalid).

Entry comments (@comment) are not handled either; however, this has no impact on the entries.