anystyle's Issues

Delete parsed references from list

User request:

It would be very handy to have an option to delete an entry entirely from the edit section. Sometimes references are simply broken to begin with but you notice it only after the parsing. For example I just had this bibliographical entry in one of the lists I fed into AnyStyle:

Graddol, David, Dick Leith, et al., eds. Changing English. Milton Park, New York: Routledge > The spread of English within the British Isles, p. 125 ff.

This entry is irreparably broken, but I cannot take it out of the list without having to parse it again, thus losing all the edits I already made. Now I had to remember, find and delete that entry after the import into Zotero.

Incorrect volume/issue type

Consider the citation below:
Judge, T., Erez, A., & Bono, J. (1998). The power of being positive: The relation between positive self-concept and job performance, Human Performance, 11(2/3), 167–187.

The issue is identified as 2, not 2/3. Likewise, the issue is only handled when a number is present; otherwise, training throws an exception, for example with text such as Suppl.

Does this have anything to do with training?

usage is not working and

require 'anystyle/parser'

Anystyle.parse """
Turing, Alan, Computing Machinery and Intelligence, Mind 59, pp 433-460 (1950)
"""
=> [{
:type => :article,
:author => "Turing, Alan",
:title => "Computing Machinery and Intelligence",
:journal => "Mind",
:volume => 59,
:pages => "433–460",
:year => 1950
}]

We tried this, but it does not run and throws many errors.

This is rather a question. Not an issue. How do I export the data in ruby hash to citeprocJSON or BIBJSON??

I am able to render the data in simple JSON format (by using the function .to_json). However, it does not seem to separate the names (with initials). I also read an issue in which you mentioned that data rendered in citeproc-JSON/BIBJSON format will have the names separated. How should that be done? Is there a function to render or convert the data into such formats? Can you provide sample code?
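A minimal sketch of the kind of conversion asked about here, using only Ruby's standard library. It assumes the parsed hash stores authors as a single string in "Family, Given and Family, Given" form, as in the README output quoted above. The helper name `to_csl_names` is hypothetical and not part of AnyStyle.

```ruby
require 'json'

# Hypothetical helper (not part of AnyStyle): split an author string of the
# form "Family, Given and Family, Given" into CSL-JSON name objects.
def to_csl_names(authors)
  authors.split(/\s+and\s+/).map do |name|
    family, given = name.split(/\s*,\s*/, 2)
    given ? { 'family' => family, 'given' => given } : { 'literal' => family }
  end
end

# A parsed reference shaped like the hash in the README example.
reference = {
  :type => :article,
  :author => 'Turing, Alan',
  :title => 'Computing Machinery and Intelligence',
  :year => 1950
}

# Build a CSL-JSON item; 'article-journal' is the CSL type for journal articles.
csl_item = {
  'type' => 'article-journal',
  'author' => to_csl_names(reference[:author]),
  'title' => reference[:title],
  'issued' => { 'date-parts' => [[reference[:year]]] }
}

puts JSON.pretty_generate([csl_item])
```

Names without a comma are kept as a single `literal` value rather than guessed at, so nothing is silently reordered.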

Improve edition normalizing

Strings such as "2nd edition", "3rd ed.", "2ème édition", "4. Aufl." could be normalized to contain the number only. Literal strings such as "Third, revised and expanded edition" should of course be left as is. Works for both CSL and biblatex.

(In bibtex, if you still wish to implement this, the edition field should contain just the ordinal, with the first letter capitalized, for example, “Second”. One more incompatibility between bibtex and biblatex …)
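A rough sketch of the normalization described above, assuming the edition is available as a plain string. The helper `normalize_edition` and its pattern are illustrative and not taken from AnyStyle; the list of recognized suffixes and edition words is deliberately small.

```ruby
# Hypothetical helper (not part of AnyStyle): reduce an edition string to its
# number when it consists only of a number, an optional ordinal suffix, and an
# edition word. Anything else is returned unchanged.
def normalize_edition(edition)
  pattern = /\A(\d+)\s*(?:st|nd|rd|th|e|ème|\.)?\s*(?:edition|édition|ed\.?|Aufl\.?|Auflage)\z/i
  match = pattern.match(edition.strip)
  match ? match[1] : edition
end

['2nd edition', '3rd ed.', '2ème édition', '4. Aufl.',
 'Third, revised and expanded edition'].each do |edition|
  puts "#{edition.inspect} -> #{normalize_edition(edition).inspect}"
end
```

Literal strings such as "Third, revised and expanded edition" contain no leading number, so they fall through untouched.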

Working with php

Hello sir. The library works extremely well in the terminal. The problem is that when I run the Ruby script via PHP (using system()), so that the script parses the given string (passed from PHP to Ruby) and stores the result in a file, nothing happens. The data is not stored in the file.

PHP code :

RUBY CODE :

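require 'anystyle/parser'

# Note: ref is assigned but never used; the string parsed below is a different reference.
ref = 'Poe, Edgar A. Essays and Reviews. New York: Library of America, 1984.'
s = Anystyle.parse('Auster, Paul. The Art of Hunger. Expanded. New York: Penguin, 1997.', :bibtex).to_s

# Write a blank line followed by the BibTeX output to 1.txt.
File.open('1.txt', 'w') do |f|
  f.puts("\r\n")
  f.puts s
end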

The Ruby code above works perfectly in the terminal, but when I try to run it through PHP, nothing happens. A little help would be appreciated. Thank you.
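One possible cause, which is only an assumption because the PHP code is not shown: a script started through PHP's system() may run as a different user and in a different working directory, so the relative path '1.txt' may point somewhere unexpected or unwritable. The sketch below writes to an absolute path and reports failures on stderr instead of failing silently. The helper `write_result` and the sample text are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: write text to an absolute path and report any failure
# on stderr. Returns the absolute path on success and nil on failure.
def write_result(text, path)
  absolute = File.expand_path(path)
  File.open(absolute, 'w') { |f| f.puts text }
  absolute
rescue SystemCallError => e
  warn "could not write #{absolute}: #{e.message}"
  nil
end

# Placeholder text standing in for the parser output.
written = write_result('@book{auster1997a}', File.join(Dir.tmpdir, 'anystyle-output.txt'))
puts "wrote #{written}" if written
```

Capturing stderr on the PHP side would then show whether the write failed and why.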

Correctly parse Vancouver-style names

Vancouver-style author names are unpunctuated, which causes the Namae parser to swap the given and family names.

Example: The following authors

Sekijima Y, Dendle MA, Kelly JW

will be parsed as

Y, Sekijima and MA, Dendle and JW, Kelly

I've created a PR with unit tests to fix this: #44
3 out of the 5 Travis builds pass, not sure why 84.1 and 84.4 are failing though.
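To illustrate the expected result, here is a small standalone sketch. It is not the fix from the PR and does not use Namae; it assumes each name is a family name followed by one to three capital initials, and `parse_vancouver_names` is a hypothetical helper.

```ruby
# Hypothetical helper (not the fix from the PR): split a Vancouver-style author
# list, where each name is a family name followed by unpunctuated initials.
def parse_vancouver_names(authors)
  authors.split(/\s*,\s*/).map do |name|
    match = /\A(.+?)\s+([A-Z]{1,3})\z/.match(name.strip)
    match ? { :family => match[1], :given => match[2] } : { :literal => name.strip }
  end
end

parse_vancouver_names('Sekijima Y, Dendle MA, Kelly JW').each do |name|
  puts "#{name[:family]}, #{name[:given]}"
end
```

Names that do not fit the pattern are returned as a `:literal` value instead of being reordered.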

Add "crossref" tag

It would be nice if a "crossref" tag could be added which would allow at least some automation in the following cases:

In some bibliographies, individual contributions to a collection include cross-references to an entry for the volume as a whole. See, e.g., CMoS, 16e, 15.37, "Cross-references to other titles in reference lists":

Draper, Joan E. 1987. “Paris by the Lake: Sources of Burnham’s Plan of
Chicago.” In Zukowsky 1987, 107–19.

Harrington, Elaine. 1987. “International Influences on Henry Hobson
Richardson’s Glessner House.” In Zukowsky 1987, 189–207.

Zukowsky, John, ed. 1987. Chicago Architecture, 1872–1922: Birth of a
Metropolis. Munich: Prestel-Verlag in association with the Art
Institute of Chicago.

I imagine a user would tag "In Zukowsky 1987" as "crossref", and anystyle would export "Zukowsky 1987" (or some variant without spaces, e.g., "zukowsky1987") to the bibtex/biblatex field "crossref".

I am not sure how this could be automated further, so I imagine users would then be responsible themselves to ensure a proper match between the content of the crossref fields and the cite key of the containing entry.

As to CSL, the content of the "crossref" tag could be exported to the "booktitle" variable, e.g. in square brackets, like "[Zukowsky 1987]". Again, users would have to fix such entries manually.
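The export step imagined above could look roughly like this. Both helpers are hypothetical, and the key format simply follows the "zukowsky1987" variant mentioned in the request. Non-ASCII characters are dropped from the key, which is a simplification.

```ruby
# Hypothetical helper: turn text tagged as a cross-reference, such as
# "In Zukowsky 1987", into a cite-key candidate such as "zukowsky1987".
def crossref_key(text)
  text.sub(/\AIn\s+/i, '').downcase.gsub(/[^a-z0-9]/, '')
end

# Hypothetical helper: bracketed form suggested for the CSL export.
def crossref_label(text)
  "[#{text.sub(/\AIn\s+/i, '').strip}]"
end

puts crossref_key('In Zukowsky 1987')
puts crossref_label('In Zukowsky 1987')
```

As the request notes, matching the generated key to the cite key of the containing entry would remain the user's responsibility.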

Parsing from .txt groups things together that don't belong (e.g. "Music 42")

Thanks for the awesome parser! It's largely been working very well for me; it's a very friendly interface.

When parsing from a plain text file that I copy-pasted into the parser, a few times it has grouped words and numbers together that don't belong.

For example, "Music 42" (journal title and volume number) or "Portugal: 2012" (location and publication date).

It's not consistent enough to be obvious why it's doing it on these cases. It needs to err on the side of separating things more, not less, so that corrections can be made and I can use those corrections to teach the parser :)

Segmentation fault on training of model with personal dataset

We have installed anystyle on 3 platforms - Mac OS X, Ubuntu, and Debian - and receive a segmentation fault when training the parser with our own data. Our training set is 167 lines. When we leave the default model out of the training, it doesn’t complain:

1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', true
=> #<Wapiti::Model:0x000000012cb6a8>

However, when we do include the default model …


1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', false
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54: [BUG] Segmentation fault
ruby 1.9.3p551 (2014-11-13 revision 48407) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0026 p:---- s:0096 b:0096 l:000095 d:000095 CFUNC :train
c:0025 p:0079 s:0092 b:0092 l:000091 d:000091 METHOD /home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54
c:0024 p:0102 s:0087 b:0087 l:000086 d:000086 METHOD /home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/anystyle-parser-0.7.4/lib/anystyle/parser/parser.rb:161
c:0023 p:0025 s:0082 b:0082 l:000278 d:000081 EVAL (irb):2
c:0022 p:---- s:0080 b:0080 l:000079 d:000079 FINISH
c:0021 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC :eval
c:0020 p:0028 s:0071 b:0071 l:000070 d:000070 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80
c:0019 p:0033 s:0064 b:0063 l:000062 d:000062 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/context.rb:254
c:0018 p:0031 s:0058 b:0058 l:0001e8 d:000057 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:159
c:0017 p:0042 s:0050 b:0050 l:000049 d:000049 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:273
c:0016 p:0011 s:0045 b:0045 l:0001e8 d:000044 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:156
c:0015 p:0144 s:0041 b:0041 l:000024 d:000040 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:243
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228
c:0008 p:0046 s:0022 b:0022 l:0001e8 d:0001e8 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:155
c:0007 p:0011 s:0019 b:0019 l:001368 d:000018 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:70
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0183 s:0011 b:0011 l:001368 d:001368 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69
c:0003 p:0039 s:0006 b:0006 l:000388 d:0024e8 EVAL /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/bin/irb:12
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000388 d:000388 TOP

-- Ruby level backtrace information ----------------------------------------
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/bin/irb:12:in `<main>'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69:in `start'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69:in `catch'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:70:in `block in start'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
(irb):2:in `irb_binding'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/anystyle-parser-0.7.4/lib/anystyle/parser/parser.rb:161:in `train'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54:in `train'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54:in `train'

-- C level backtrace information -------------------------------------------
SEGV received in SEGV handler
Aborted


train.txt

Above is the training dataset, in case you are interested to see.

We have plenty of memory on the Debian machine, so I don’t think that’s the issue.
Thanks!

BibTex export doesn't work

This is my first time using it, but the first time I put in (one) reference it worked perfectly. However, after this, when I try to save to BibTex, it opens a new blank tab but doesn't appear to do anything.

Dies on certain input texts

I inadvertently fed anystyle-parse some weird input text, and it choked. Granted, it's not supposed to be able to parse this properly, but it probably shouldn't choke either?

Input text "@misc{70213094902020,\n doi = {DOI:10.1503/jpn.100140}\n}\n"

result:
/Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label': missing tokens, cannot apply pattern (Wapiti::NativeError)
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:59:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:50:in `parse'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/utility.rb:4:in `parse'
from anystyle-import.rb:34:in `<main>'

biblatex: "type", not "genre"

The "genre" label should be exported to the bibtex/biblatex field "type". (There is no "genre" field in bibtex or biblatex.)

Split/group authors

For

  1. Abe, S., A. Furuya, T. Saito, and K. Takayama. November 1962. Method of producing L-malic acid by fermentation. U.S. patent 3,063,910.

all the individual name parts end up in their own bubble, whereas the JSON output just contains a single string:

"author":"1. Abe, S., A. Furuya, T. Saito, and K. Takayama"

Wouldn't it make more sense to identify entire names in both cases? I.e. "Abe, S." instead of "Abe," and "S.," after parsing, and "author":["Abe, S.", "A. Furuya", etc.] in the JSON?

biblatex support & other remarks

Brilliant, thank you so much. For the most part, AnyStyle already works extremely well compared with any other parser I tried so far.

Just a few observations and remarks:

  • "45(3):23-45" is not separated, and parsed as "volume={3}" and "issue={23-45}"
  • Sometimes, "ed." is chopped off of names, e.g. "Alfred" -> "Alfr"
  • Frequently, surrounding quotes, and final commas are not removed from titles.
  • Frequently, periods are chopped off of initials, e.g. "author = {Doe, John R}"
  • Names containing initials without periods are inverted, e.g. "author = {JR, Doe}"
  • "Transl." is not removed from a translator's name.
  • "Accessed" is not removed from an "Accessed" field (also, this field should be named "Urldate" instead, at least for biblatex, see below)
  • The (field) labels being used seem to be based, mostly, on CSL variables, resp., Zotero field names. I'd suggest introducing a few more, e.g.,
    • "Series Title" = CSL "collection-title",
    • "Series Number" = CSL "collection-number",
    • "Book Author" = CSL "container-author",
    • "No of Volumes" = CSL "number-of-volumes",
    • "Report number" = CSL "number", and
    • CSL "original-date";
    • maybe also the (hopefully) soon-to-be-introduced "Volume Title" = CSL "volume-title",
    • maybe an option to label authors as corporate authors.

I'd also suggest adding biblatex as a separate output format. While some might want to continue using classical bibtex, I see huge advantages in using biblatex and its much more comprehensive data model, and it'd be nice if AnyStyle could output the biblatex format directly. The most important differences between bibtex and biblatex include:

  • Improved handling of dates:
    • "date = {YYYY-MM-DD}" instead of separate year, month, day fields
    • date ranges in the format "date = {YYYY-MM-DD/YYYY-MM-DD}"
    • "urldate" (instead of "accessed"; also I'm not sure whether there's any bibtex variant that would accept "accessed")
    • "origdate"
  • An "online" entry type
  • An "institution" field for report and thesis entries, and an "organization" field for manual and online entry types ("authority", again, is not recognized by biblatex, and probably by no other bibtex variant either)
  • An "incollection" entry type (AnyStyle uses this, though output for classical bibtex should probably return to "inbook" here)
  • A "maintitle" field, which will have to be used if AnyStyle starts using CSL "volume-title" (mapping is a bit complicated here, but I'd be happy to help here, as with all biblatex questions).
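As a small illustration of two of the differences listed above, the sketch below renames "accessed" to "urldate" and merges separate year, month and day values into a single ISO-style "date" field. It assumes the entry is a plain Ruby hash with symbol keys and integer date parts; `to_biblatex_fields` is a hypothetical helper, not an AnyStyle method.

```ruby
# Hypothetical helper: adapt a bibtex-style field hash to biblatex conventions.
def to_biblatex_fields(entry)
  fields = entry.dup

  # "accessed" is not a biblatex field; the issue suggests "urldate" instead.
  fields[:urldate] = fields.delete(:accessed) if fields.key?(:accessed)

  # Merge year, month and day into one "date" field (YYYY, YYYY-MM or YYYY-MM-DD).
  year = fields.delete(:year)
  month = fields.delete(:month)
  day = fields.delete(:day)
  if year
    parts = [format('%04d', year)]
    parts << format('%02d', month) if month
    parts << format('%02d', day) if month && day
    fields[:date] = parts.join('-')
  end

  fields
end

p to_biblatex_fields(:title => 'Example', :year => 2008, :month => 10, :day => 2, :accessed => '2015-01-05')
```

Date ranges and the type-dependent "institution" and "organization" fields are left out here, since they need knowledge of the entry type.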

usage is not working

Hi

We are trying to build the application from Git, but we get the error below:

We are unable to run the application. Could you please give more details on where the parser is located after installation and what it is called (/usr/bin etc.)?

kernel_require.rb:55:in `require': cannot load such file -- multimap (LoadError) #317

Segmentation fault

We’re getting a segmentation fault when training the parser with our own data. Our training set is 167 lines. When we leave the default model out of the training, Anystyle doesn’t complain:

1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', true
=> #<Wapiti::Model:0x000000012cb6a8>

However, when we do include the default model …


1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', false
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54: [BUG] Segmentation fault
ruby 1.9.3p551 (2014-11-13 revision 48407) [x86_64-linux]

We’re getting this on three different platforms - OS X, Debian, and Ubuntu. This is without Kyoto-cabinet or Redis installed.

Crowdsourcing the training with a Zotero add-on

I have made a proof-of-concept Zotero add-on that works as a front-end for AnyStyle.io. It allows the user to import a formatted bibliography from the clipboard into Zotero's library.

It truly feels magical. Try it for yourself. You just grab the bibliography from the end of an article, you select the menu item in Zotero... And boom... You have all the items in a nicely structured manner in your library!

I see many people using it if available as a public add-on. And if there is an easy-to-use interface in the plugin, similar to the website, that allows the users to make corrections to the results, I am sure the engine will become the best-trained one in the domain pretty soon. One can even see this implemented as a last resort in Zotero's default Import from Clipboard functionality (when the translators fail to detect any structured data in the text being pasted).

But here are my questions:

  1. Are there any plans for providing such widespread services on the current servers? How many concurrent sessions can the server(s) take right now (I don't know how fast the current code is)? Are we even supposed to ramp up the adoption like that?
  2. Given that API keys are really for the programmers and not the end users, how can we give the end users access to AnyStyle.io? Apart from the 100-lines limitation, do the current API keys bear any other limitation?

redis support

Currently, the ML model relies on a feature key-value store that is implemented using kyoto-cabinet or (as a fallback) a regular Ruby hash. Because the feature dictionary is fairly large (probably in excess of 5MB), a Ruby hash is not ideal, as it consumes a lot of the Ruby process's memory. kyoto-cabinet uses a file-based approach which performs very well; however, some users have reported difficulties installing it. Redis seems like a good alternative and it should be fairly easy to add as a third KV store option.

Errors related to missing library

After installing anystyle-parser, loading it in an irb session as follows gives:

irb(main):002:0> require 'anystyle/parser'
LoadError: cannot load such file -- language_detector
    from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:27:in `'
    from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:4:in `'
    from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:2:in `'
    from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:1:in `'
    from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser.rb:14:in `'

Although the library is loaded all right, the parse method cannot be found

irb(main):007:0> Anystyle.parse 
NoMethodError: undefined method `parse' for Anystyle:Module

saji

issn

Can we mark ISSN with anystyle?

Differences in output from different instances of anystyle

I was working through the irb session illustration from the readme after installing anystyle. I copied and pasted the commands with example references, but received different parsing results than the illustration. For example,

puts Anystyle.parse('Auster, Paul. The Art of Hunger. Expanded. New York: Penguin, 1997.', :bibtex).to_s

@book{auster1997a,
author = {Auster, Paul},
title = {The Art of Hunger},
publisher = {Penguin},
date = {1997},
edition = {Expanded},
language = {},
address = {New York}
}
=> nil

This result is different from the result in the irb session in the readme (address vs. location, date vs. year). In addition, I ran this same code twice. The above result is after reinstalling the gem (there were encoding issues, and I was trying to resolve them), and the first result was even worse. The parser was unable to recognize New York as a location or address. (I'm working on ruby 1.9.3 through ssh to a dreamhost account).

Are there reasons why anystyle would give different results between sessions? I assume the training data that comes with it is the same each time I install. Any comment on this subject would be appreciated.

Thanks,
Jocelyn

Improve tokenizer

User request:

When a colon does not have a blank space following it (sometimes happens erroneously, e.g. “London:Routledge”), the preceding and the following word are considered a unit and cannot be assigned different labels (so it also cannot be trained as incorrect). I think generally that behaviour is not desirable, maybe the parser could be altered to break up X:Y pairs by default?
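A minimal sketch of the requested behaviour, assuming a simple whitespace tokenizer. It is not AnyStyle's actual tokenizer, and `tokenize` is a hypothetical helper. A colon is only split when it sits directly between two letters, so URLs and times such as "10:30" stay intact.

```ruby
# Hypothetical tokenizer step: insert a space after a colon that sits between
# two letters, then split on whitespace.
def tokenize(reference)
  reference.gsub(/(?<=\p{L}):(?=\p{L})/, ': ').split
end

p tokenize('London:Routledge, 1997.')
p tokenize('Retrieved from http://example.org at 10:30')
```

Identifiers such as "urn:isbn:..." would also be split by this rule, so a real implementation would probably need a list of exceptions.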

Strange disappearance of authors when the last author has a middle name

I have these two references:

Yang, Q., Mudambi, R., & Meyer, K. E. 2008. Conventional and reverse knowledge flows in multinational corporations. Journal of Management, 34: 882-902.
Yli-Renko, H., Autio, E., & Sapienza, H. J. 2001. Social capital, knowledge acquisitions, and knowledge exploitation in young technology-based firms. Strategic Management Journal, 22: 587-613.

Anystyle correctly parses them in the following way:

image

But then fails to reproduce the author names in the output:

@article{yang2008a,
  author = {},
  date = {2008},
  title = {Conventional and reverse knowledge flows in multinational corporations},
  journal = {Journal of Management},
  volume = {34},
  pages = {882–902},
  language = {en}
}
@article{yli-renko2001a,
  author = {},
  date = {2001},
  title = {Social capital, knowledge acquisitions, and knowledge exploitation in young technology-based firms},
  journal = {Strategic Management Journal},
  volume = {22},
  pages = {587–613},
  language = {en}
}

trying to modify anystyle parser to work with specimen labels

Hi Sylvester, there is an iDigBio hackathon going on, and as part of it we are trying to adjust your code to work with specimen labels. Just letting you know, and we would like to talk about it with you if we get anywhere. Maybe it would be helpful to refactor anystyle into general and specific gems, don't know yet :)

Great great work on this gem, very much appreciate your input into citations parsing and showing the way with using wapiti!

Regex on line 94 of normalizer.rb seems to fire incorrectly

I have created a model from journal articles without titles and this is working well. However, I noticed that author labels are getting switched for editor labels in certain circumstances, and I think this regex may be the cause

if authors =~ /[^[:alnum:]][Ee]d(s|itors)?[^[:alnum:]]$/ && !hash.has_key?(:editor)

If I parse the following line

P. Eisenberger and W. A. Reed, Phys. Rev. A 5, 2085 (1972)

I get

[{:journal=>"Phys. Rev. A", :volume=>5, :pages=>"2085", :date=>"1972", :editor=>"Eisenberger, P. and Reed, W.A.", :type=>:article}]

If I change 'Reed' to 'Read' then I get the expected

[{:author=>"Eisenberger, P. and Read, W.A.", :journal=>"Phys. Rev. A", :volume=>5, :pages=>"2085", :date=>"1972", :type=>:article}]

suggesting that the token ending with 'ed,' is causing the regex on line 94 of normalizer.rb to fire

I think it would be better to include a word boundary before '[Ee]d' to ensure that only words starting and ending with this expression are matched. E.g.

if authors =~ /[^[:alnum:]]\b[Ee]d(s|itors)?\b[^[:alnum:]]$/ && !hash.has_key?(:editor)
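The effect of a word boundary can be checked in isolation. The two patterns below are illustrative only: neither is copied from normalizer.rb, and `editor_marker?` is a hypothetical helper. The loose pattern accepts any string that ends in "ed", "eds" or "editors" plus trailing punctuation, while the bounded one requires the marker to be a word of its own.

```ruby
# Hypothetical helper with illustrative patterns (not the ones in normalizer.rb).
def editor_marker?(authors, bounded: true)
  pattern = if bounded
              /\b[Ee]d(s|itors)?\b[^[:alnum:]]*$/
            else
              /[Ee]d(s|itors)?[^[:alnum:]]*$/
            end
  authors.match?(pattern)
end

['P. Eisenberger and W. A. Reed,', 'J. Smith, ed.', 'Smith and Jones (eds.)'].each do |authors|
  puts format('%-32s loose=%-5s bounded=%s', authors,
              editor_marker?(authors, bounded: false),
              editor_marker?(authors, bounded: true))
end
```

A surname or given name that is literally "Ed" at the end of the string would still match the bounded pattern, so the boundary narrows the problem without removing it.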

Combine multiple date parts

In references like: Pear, Robert. 2008. “Crisis Puts Tax Moves Into Play.” The New York Times, October 2, sec. Business.
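For a reference like this one, combining the parts could look roughly as follows. The sketch assumes the year and the "Month day" fragment have already been labelled separately and that the month is an English month name; `combine_date_parts` is a hypothetical helper.

```ruby
require 'date'

# Hypothetical helper: combine a year with a separate "Month day" fragment
# into an ISO 8601 date string. Falls back to the year alone when the month
# name is not recognized.
def combine_date_parts(year, month_day)
  month_name, day = month_day.to_s.split
  month = month_name && Date::MONTHNAMES.index(month_name)
  return year.to_s if month.nil?
  return format('%04d-%02d', year.to_i, month) if day.nil?

  Date.new(year.to_i, month, day.to_i).iso8601
end

puts combine_date_parts('2008', 'October 2')
```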

Add citation-number tag

For the references below, I couldn't tag the citation numbers as such. (they're being grouped along with the authors)

1. Abe, S., A. Furuya, T. Saito, and K. Takayama. November 1962. Method of producing L-malic acid by fermentation. U.S. patent 3,063,910.
2. Aliverdieva, D. A., D. V. Mamaev, D. I. Bondarenko, and K. F. Sholtz. 2006. Properties of yeast Saccharomyces cerevisiae plasma membrane dicarboxylate transporter. Biochemistry (Moscow) 71:1161–1169.
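Until such a tag exists, the number could be separated in a pre-processing step. The sketch below is a workaround idea rather than AnyStyle functionality: `split_citation_number` is a hypothetical helper that only recognizes a leading integer followed by "." or ")" and whitespace.

```ruby
# Hypothetical pre-processing step: separate a leading citation number such as
# "1." or "2)" from the rest of the reference.
def split_citation_number(reference)
  match = /\A\s*(\d+)[.)]\s+(.*)\z/m.match(reference)
  match ? [match[1], match[2]] : [nil, reference]
end

number, rest = split_citation_number('1. Abe, S., A. Furuya, T. Saito, and K. Takayama. November 1962.')
puts "citation-number: #{number}"
puts "reference: #{rest}"
```

A reference that begins with a year and a period would be misread by this rule, which is one reason a trainable label is the better long-term solution.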

Automated generation of training data

Hi there - great tool! Have you considered automating the creation of training data from PubMed .nxml files, which you can get from ftp.ncbi.nlm.nih.gov/pub/pmc/?

The data is normalised, but could be extracted/transformed into the format required for training the parser according to different journal/citation styles.

Citation number is truncated in BibTeX and in CiteProc/JSON

If the citation number includes a punctuation mark, the value that is exported is truncated to an integer leaving any other characters behind.

All of the following input citation numbers result in citation_number = 1:
1.1.4.1
1,123
1b

Note that parsing itself goes well. This happens both in BibTeX and in CiteProc/JSON (but not in XML).
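The reported values match what Ruby's String#to_i does: it reads leading digits and ignores everything after the first non-digit character. Whether the BibTeX and CiteProc/JSON exporters actually convert the value this way is an assumption, since their code is not shown here. The helper `export_citation_number` is a hypothetical alternative that only converts purely numeric values.

```ruby
# String#to_i stops at the first character that is not part of an integer.
['1.1.4.1', '1,123', '1b'].each do |citation_number|
  puts "#{citation_number.inspect}.to_i => #{citation_number.to_i}"
end

# Hypothetical alternative: convert only when the whole value is digits and
# keep everything else as a string.
def export_citation_number(value)
  value.match?(/\A\d+\z/) ? value.to_i : value
end

p export_citation_number('12')
p export_citation_number('1.1.4.1')
```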

I was using the anystyle.io web interface.

Thanks!

wapiti compilation on Windoze?

Hi there,
This tool looks absolutely great for some use-cases I have. But I can't get Wapiti compiled on a couple of Windows boxes I have tried it on. Same error in both cases, using Ruby1.9.3 and the appropriate DevKit.

Can I just double-check that this gem is compatible with Windows platform?

Cheers,
Chondrops.


Errlog:

generating native-i386-mingw32.def
compiling bcd.c
compiling decoder.c
In file included from decoder.c:43:0:
thread.h:32:21: fatal error: pthread.h: No such file or directory

Show more references per page

User request:

In a similar vein, the number of entries per ‘page’ in the edit section is quite small, which makes for many pages, which again are difficult to navigate. Granted, that is not often necessary, but if it is it’s a real bother. It would already help to have seven to ten entries on one page (graphically they could be somehow set off against each other by numbering them or giving every other entry another background colour). Best would be an option to choose how many entries to display per page. If that’s not possible, maybe you could increase the number of pages displayed at the bottom? :)

Dies on certain input texts

I inadvertently fed anystyle-parse some weird input text, and it choked. Granted, it's not supposed to be able to parse this properly, but it probably shouldn't choke either?

Input text "@misc{70213094902020,\n doi = {DOI:10.1503/jpn.100140}\n}\n"

result:
/Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label': missing tokens, cannot apply pattern (Wapiti::NativeError)
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:59:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:50:in `parse'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/utility.rb:4:in `parse'
from anystyle-import.rb:34:in `<main>'

BibTeX error when running program

[root@localhost anystyle]# ruby parser.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- bibtex (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from parser.rb:4:in `<main>'
[root@localhost anystyle]#

Show more recent labels

User request:

The number of labels that appear below the ‘Assign label’ float box in the edit section is too small for my taste. I used about seven different labels regularly to correct entries and — as it goes ;) — always the one I needed had just dropped out of the list. Would it be difficult to just increase the list to seven or eight items? Even five (one more) would be a benefit, IMO.

Version error when running program

Hi, I am an individual user trying to use this library for my research work. I built it from source and tried to run it, but I get the error below and the parser does not work. Please provide detailed usage instructions:

[root@localhost lib]# ls
anystyle sample.rb
[root@localhost lib]# ruby sample.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- anystyle/parser/version (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /root/workspace/anystyle-parser/lib/anystyle/parser.rb:9:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from sample.rb:2:in `<main>'
[root@localhost lib]# pwd
/root/workspace/anystyle-parser/lib
[root@localhost lib]#

My sample program for invoking the parser is shown below. I am running it as anystyle-parser/lib/sample.rb. Please advise if I am missing anything.

[root@localhost lib]# cat sample.rb

#!/usr/local/bin/ruby

require './anystyle/parser'

Anystyle.parse 'Poe, Edgar A. Essays and Reviews. New York: Library of America, 1984.'
puts $b
puts "Bye"

I tried the approaches suggested on the website, but none of them managed to invoke the parser.

[root@localhost lib]#

Thanks in advance

Citation key for bibtex generated incorrectly when there is an accent in the author's name

Input: Lévesque, Andrée. Red Travellers: Jeanne Corbin & Her Comrades (Montreal: McGill-Queen’s University Press, 2006).

Output in BibTeX (I may have corrected the parsing):
@book{l2006a,
author = {Lévesque, Andrée},
title = {Red Travellers: Jeanne Corbin & Her Comrades},
publisher = {McGill-Queen’s University Press},
date = {2006},
language = {en},
address = {Montreal}
}

The citation key is l2006a; the expected key is lévesque2006a, or levesque2006a if BibTeX can't handle accents there.
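A key of `l2006a` is what an ASCII-only pattern such as `[a-z]+` yields when it is applied to "lévesque", because the match stops at the accented character. That this is the actual cause is an assumption. The sketch below shows one way to fold accents before building the key; `ascii_fold` and `citation_key` are hypothetical names, not AnyStyle methods.

```ruby
# Hypothetical key builder; not AnyStyle's actual implementation.

# NFD splits "é" into "e" plus a combining accent, and \p{Mn}
# (non-spacing marks) then removes the accent.
def ascii_fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Build a key from the family name (the part before the first comma),
# the year, and a disambiguating suffix.
def citation_key(author, year, suffix = 'a')
  family = author.split(',').first.to_s
  stem = ascii_fold(family).downcase.gsub(/[^a-z]/, '')
  "#{stem}#{year}#{suffix}"
end

# Without folding, an ASCII-only pattern stops at the accent:
naive_stem = 'Lévesque'.downcase[/[a-z]+/]

puts naive_stem
puts citation_key('Lévesque, Andrée', 2006)
```

Folding to ASCII is the conservative choice here, since classic BibTeX tooling does not reliably accept non-ASCII characters in keys.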

Cheers,
Daniel

Improve accessed normalizer

The word "Accessed" is not stripped from the value of the "Accessed" field. Also, this field should be named "Urldate" in BibTeX.
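A minimal sketch of the requested behaviour, assuming a parsed reference is a Hash with an `:accessed` key. The method name `normalize_accessed` and the Hash layout are hypothetical, chosen only for illustration.

```ruby
# Hypothetical normalizer; the Hash layout is an assumption.
def normalize_accessed(entry)
  value = entry.delete(:accessed)
  return entry if value.nil?

  cleaned = value
            .sub(/\A\s*accessed:?\s*/i, '') # drop the leading "Accessed" label
            .sub(/[.\s]+\z/, '')            # drop trailing periods and spaces
  entry[:urldate] = cleaned
  entry
end

p normalize_accessed({ title: 'Example', accessed: 'Accessed 12 May 2014.' })
```

The remaining value is still free text; converting it to an ISO 8601 date would be a separate normalization step.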
