inukshuk / anystyle
Fast citation reference parsing
Home Page: https://anystyle.io
License: Other
User request:
It would be very handy to have an option to delete an entry entirely from the edit section. Sometimes references are simply broken to begin with but you notice it only after the parsing. For example I just had this bibliographical entry in one of the lists I fed into AnyStyle:
Graddol, David, Dick Leith, et al., eds. Changing English. Milton Park, New York: Routledge > The spread of English within the British Isles, p. 125 ff.
This entry is irreparably broken, but I cannot take it out of the list without having to parse it again, thus losing all the edits I already made. Now I had to remember, find and delete that entry after the import into Zotero.
Consider the citation below:
Judge, T., Erez, A., & Bono, J. (1998). The power of being positive: The relation between positive self-concept and job performance, Human Performance, 11(2/3), 167–187.
The issue is identified as 2, not 2/3. Likewise, parsing works only when the issue is a number; otherwise an exception is thrown during training, for example with text like Suppl.
Does this have anything to do with training?
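To illustrate the expected result, here is a minimal sketch of how a volume/issue token could be split so that a compound issue such as 2/3 or a textual one such as Suppl. is kept as a string. The method name split_volume_issue and the pattern are hypothetical and not part of AnyStyle.

```ruby
# Hypothetical helper, not AnyStyle's own code: splits "11(2/3)" into
# volume and issue while keeping the issue as a string.
VOLUME_ISSUE = /\A(?<volume>\d+)\s*\((?<issue>[^)]+)\)\z/

def split_volume_issue(token)
  token = token.strip
  match = VOLUME_ISSUE.match(token)
  if match
    { volume: match[:volume], issue: match[:issue].strip }
  else
    # No parenthesised issue: treat the whole token as the volume.
    { volume: token }
  end
end

p split_volume_issue('11(2/3)')       # => {:volume=>"11", :issue=>"2/3"}
p split_volume_issue('45(Suppl. 1)')  # => {:volume=>"45", :issue=>"Suppl. 1"}
```

Keeping the issue as a string avoids any integer conversion, which is where a value like 2/3 or Suppl. would otherwise be lost or rejected.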
require 'anystyle/parser'
Anystyle.parse """
Turing, Alan, Computing Machinery and Intelligence, Mind 59, pp 433-460 (1950)
"""
=> [{
:type => :article,
:author => "Turing, Alan",
:title => "Computing Machinery and Intelligence",
:journal => "Mind",
:volume => 59,
:pages => "433–460",
:year => 1950
}]
We tried this, but it does not run and throws many errors.
I am able to render the data in simple JSON format (by using the function .to_json). However, it does not seem to separate the names (with initials) individually. I also read an issue in which you mentioned that the data, if rendered in CiteProc JSON or BibJSON format, will have the names separated. How should that be done? Is there a function to render or convert the data into such formats? Can you provide sample code?
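As a rough sketch of what "separated names" could look like, the following splits a BibTeX-style author string ("Family, Given and Family, Given") into CSL-JSON-style name objects using only the standard library. The helper split_names is hypothetical; it is not the library's conversion function, and it assumes names are joined by "and".

```ruby
require 'json'

# Hypothetical helper: turns "Family, Given and Family, Given" into an
# array of CSL-JSON-style name hashes. Names without a comma are kept
# whole under the "literal" key.
def split_names(names)
  names.split(/\s+and\s+/).map do |name|
    family, given = name.split(/\s*,\s*/, 2)
    given ? { 'family' => family, 'given' => given } : { 'literal' => family }
  end
end

entry = { 'author' => split_names('Judge, T. and Erez, A. and Bono, J.') }
puts JSON.pretty_generate(entry)
```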
Strings such as "2nd edition", "3rd ed.", "2ème édition", "4. Aufl." could be normalized to contain the number only. Literal strings such as "Third, revised and expanded edition" should of course be left as is. Works for both CSL and biblatex.
(In bibtex, if you still wish to implement this, the edition field should contain just the ordinal, with the first letter capitalized, for example, “Second”. One more incompatibility between bibtex and biblatex …)
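A minimal sketch of the proposed normalization, assuming a small hand-written list of ordinal suffixes and edition words (the list and the method name normalize_edition are illustrative, not taken from AnyStyle). Strings that do not start with a number are returned unchanged.

```ruby
# Illustrative pattern: a number, an optional ordinal suffix, and an
# edition word in English, French or German.
EDITION = /\A(\d+)\s*(?:st|nd|rd|th|ème|e|\.)?\s*(?:edition|édition|ed\.?|aufl\.?|auflage)\z/i

def normalize_edition(value)
  match = EDITION.match(value.strip)
  match ? match[1] : value
end

['2nd edition', '3rd ed.', '2ème édition', '4. Aufl.',
 'Third, revised and expanded edition'].each do |edition|
  puts "#{edition.inspect} -> #{normalize_edition(edition).inspect}"
end
```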
Hello sir. The library works extremely well in the terminal, but when I run the Ruby script from PHP (using system()), so that the script parses the given string (passed from PHP to Ruby) and stores the result in a file, nothing happens. The data is not stored in the file.
PHP code :
RUBY CODE :
require 'anystyle/parser'
ref = 'Poe, Edgar A. Essays and Reviews. New York: Library of America, 1984.'
# Parse a reference and render it as a BibTeX string
s = Anystyle.parse('Auster, Paul. The Art of Hunger. Expanded. New York: Penguin, 1997.', :bibtex).to_s
# Write the result to 1.txt in the current working directory
File.open('1.txt', 'w') do |f|
  f.puts("\r\n")
  f.puts s
end
The Ruby code (above) works perfectly in the terminal. But when I try to run it through PHP, nothing happens. A little help would be appreciated. Thank you.
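One possible cause (an assumption, since the PHP code is not shown) is that the script runs with a different working directory or user under the web server, so the relative path 1.txt ends up somewhere unexpected or is not writable. A sketch that writes to an absolute path and reports where the file went; write_result is a hypothetical helper and the BibTeX string is a placeholder:

```ruby
require 'tmpdir'

# Hypothetical helper: writes text to 1.txt inside dir and returns the
# absolute path, so the caller can log exactly where the file went.
def write_result(text, dir)
  path = File.expand_path('1.txt', dir)
  File.open(path, 'w') { |f| f.puts(text) }
  path
end

# Demo with a temporary directory; in a real script, dir could be __dir__
# (the directory of the script itself) instead of the working directory.
Dir.mktmpdir do |dir|
  path = write_result('@book{auster1997a, ...}', dir)
  puts "wrote #{File.size(path)} bytes to #{path}"
end
```

Printing the path (and any exception) to standard output makes it visible on the PHP side as well when the command's output is captured.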
Vancouver-style author names are unpunctuated, resulting in the Namae parser putting the given and family names in the wrong order.
Example: The following authors
Sekijima Y, Dendle MA, Kelly JW
will be parsed as
Y, Sekijima and MA, Dendle and JW, Kelly
I've created a PR with unit tests to fix this: #44
3 out of the 5 Travis builds pass, not sure why 84.1 and 84.4 are failing though.
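For illustration only (this is not the code from the PR), a sketch of the intended behaviour: the last token of each name is treated as initials when it consists of one to three uppercase letters, and everything before it as the family name. The method name vancouver_to_bibtex is hypothetical.

```ruby
# Hypothetical helper: converts "Sekijima Y, Dendle MA" into
# "Sekijima, Y. and Dendle, M. A.". Names that do not end in one to
# three uppercase letters are passed through unchanged.
def vancouver_to_bibtex(authors)
  authors.split(/\s*,\s*/).map do |name|
    family, _, initials = name.strip.rpartition(' ')
    if family.empty? || initials !~ /\A\p{Lu}{1,3}\z/
      name.strip
    else
      "#{family}, #{initials.chars.join('. ')}."
    end
  end.join(' and ')
end

puts vancouver_to_bibtex('Sekijima Y, Dendle MA, Kelly JW')
# prints: Sekijima, Y. and Dendle, M. A. and Kelly, J. W.
```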
"Transl." is not removed from a translator's name.
It would be nice if a "crossref" tag could be added which would allow at least some automation in the following cases:
In some bibliographies, individual contributions to a collection include cross-references to an entry for the volume as a whole. See, e.g., CMoS, 16e, 15.37, "Cross-references to other titles in reference lists":
Draper, Joan E. 1987. “Paris by the Lake: Sources of Burnham’s Plan of Chicago.” In Zukowsky 1987, 107–19.
Harrington, Elaine. 1987. “International Influences on Henry Hobson Richardson’s Glessner House.” In Zukowsky 1987, 189–207.
Zukowsky, John, ed. 1987. Chicago Architecture, 1872–1922: Birth of a Metropolis. Munich: Prestel-Verlag in association with the Art Institute of Chicago.
I imagine a user would tag "In Zukowsky 1987" as "crossref", and anystyle would export "Zukowsky 1987" (or some variant without spaces, e.g., "zukowsky1987") to the bibtex/biblatex field "crossref".
I am not sure how this could be automated further, so I imagine users would then be responsible themselves to ensure a proper match between the content of the crossref fields and the cite key of the containing entry.
As to CSL, the content of the "crossref" tag could be exported to the "booktitle" variable, e.g. in square brackets, like "[Zukowsky 1987]". Again, users would have to fix such entries manually.
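A small sketch of the export step described above, assuming the cross-reference always has the shape "In Name Year". The pattern and the method name crossref_key are hypothetical; matching the result against the cite key of the containing entry would still be up to the user.

```ruby
# Hypothetical helper: derives a key such as "zukowsky1987" from a
# cross-reference of the form "In Zukowsky 1987". Returns nil when the
# text contains no such cross-reference.
CROSSREF = /\bIn\s+(?<name>\p{L}[\p{L}'’-]*)\s+(?<year>\d{4})\b/

def crossref_key(text)
  match = CROSSREF.match(text)
  match && "#{match[:name].downcase}#{match[:year]}"
end

puts crossref_key('In Zukowsky 1987, 107–19.')  # prints: zukowsky1987
```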
Thanks for the awesome parser! It's largely been working very well for me; it's a very friendly interface.
When parsing from a plain text file that I copy-pasted into the parser, a few times it has grouped words and numbers together that don't belong.
For example, "Music 42" (journal title and volume number) or "Portugal: 2012" (location and publication date).
It's not consistent enough to be obvious why it's doing it on these cases. It needs to err on the side of separating things more, not less, so that corrections can be made and I can use those corrections to teach the parser :)
We have installed anystyle on 3 platforms - Mac OS X, Ubuntu, and Debian - and receive a segmentation fault when training the parser with our own data. Our training set is 167 lines. When we leave the default model out of the training, it doesn’t complain:
1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', true
=> #<Wapiti::Model:0x000000012cb6a8>
However, when we do include the default model …
1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', false
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54: [BUG] Segmentation fault
ruby 1.9.3p551 (2014-11-13 revision 48407) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0026 p:---- s:0096 b:0096 l:000095 d:000095 CFUNC :train
c:0025 p:0079 s:0092 b:0092 l:000091 d:000091 METHOD /home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54
c:0024 p:0102 s:0087 b:0087 l:000086 d:000086 METHOD /home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/anystyle-parser-0.7.4/lib/anystyle/parser/parser.rb:161
c:0023 p:0025 s:0082 b:0082 l:000278 d:000081 EVAL (irb):2
c:0022 p:---- s:0080 b:0080 l:000079 d:000079 FINISH
c:0021 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC :eval
c:0020 p:0028 s:0071 b:0071 l:000070 d:000070 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80
c:0019 p:0033 s:0064 b:0063 l:000062 d:000062 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/context.rb:254
c:0018 p:0031 s:0058 b:0058 l:0001e8 d:000057 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:159
c:0017 p:0042 s:0050 b:0050 l:000049 d:000049 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:273
c:0016 p:0011 s:0045 b:0045 l:0001e8 d:000044 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:156
c:0015 p:0144 s:0041 b:0041 l:000024 d:000040 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:243
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228
c:0008 p:0046 s:0022 b:0022 l:0001e8 d:0001e8 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:155
c:0007 p:0011 s:0019 b:0019 l:001368 d:000018 BLOCK /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:70
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0183 s:0011 b:0011 l:001368 d:001368 METHOD /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69
c:0003 p:0039 s:0006 b:0006 l:000388 d:0024e8 EVAL /home/jsachs/.rvm/rubies/ruby-1.9.3-p551/bin/irb:12
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000388 d:000388 TOP
-- Ruby level backtrace information ----------------------------------------
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/bin/irb:12:in `<main>'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69:in `start'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:69:in `catch'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:70:in `block in start'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
/home/jsachs/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
(irb):2:in `irb_binding'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/anystyle-parser-0.7.4/lib/anystyle/parser/parser.rb:161:in `train'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54:in `train'
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54:in `train'
-- C level backtrace information -------------------------------------------
SEGV received in SEGV handler
Aborted
Above is the training dataset, in case you are interested to see.
We have plenty of memory on the Debian machine, so I don’t think that’s the issue.
Thanks!
This is my first time using it, and the first time I put in (one) reference it worked perfectly. However, after this, when I try to save to BibTeX, it opens a new blank tab but doesn't appear to do anything.
The web application and web service at http://anystyle.io seem down - results in a "502 Bad Gateway" error.
I inadvertently fed anystyle-parse some weird input text, and it choked. Granted, it's not supposed to be able to parse this properly, but it probably shouldn't choke either?
Input text "@misc{70213094902020,\n doi = {DOI:10.1503/jpn.100140}\n}\n"
result:
/Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label': missing tokens, cannot apply pattern (Wapiti::NativeError)
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/wapiti-0.0.5/lib/wapiti/model.rb:47:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:59:in `label'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/parser.rb:50:in `parse'
from /Users/Stian/.rbenv/versions/1.9.3-rc1/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.0.9/lib/anystyle/parser/utility.rb:4:in `parse'
from anystyle-import.rb:34:in
The "genre" label should be exported to the bibtex/biblatex field "type". (There is no "genre" field in bibtex or biblatex.)
For
- Abe, S., A. Furuya, T. Saito, and K. Takayama. November 1962. Method of producing L-malic acid by fermentation. U.S. patent 3,063,910.
all the individual name parts end up in their own bubble, whereas the JSON output just contains a single string:
"author":"1. Abe, S., A. Furuya, T. Saito, and K. Takayama"
Wouldn't it make more sense to identify entire names in both cases? I.e. "Abe, S." instead of "Abe," and "S.," after parsing, and "author":{"Abe, S.", "A. Furuya", etc.}
in the JSON?
Brilliant, thank you so much. For the most part, AnyStyle already works extremely well compared with any other parser I tried so far.
Just a few observations and remarks:
I'd also suggest adding biblatex as a separate output format. While some might want to continue using classical bibtex, I see huge advantages in using biblatex and its much more comprehensive data model, and it'd be nice if AnyStyle could output the biblatex format directly. The most important differences between bibtex and biblatex include:
Hi
We are trying to build the application from Git, but are getting the error below:
We are unable to run the application. Could you please give more details on where the parser is located after installation (/usr/bin etc.) and what it is named?
kernel_require.rb:55:in `require': cannot load such file -- multimap (LoadError) #317
We’re getting a segmentation fault when training the parser with our own data. Our training set is 167 lines. When we leave the default model out of the training, Anystyle doesn’t complain:
1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', true
=> #<Wapiti::Model:0x000000012cb6a8>
However, when we do include the default model …
1.9.3-p551 :002 > Anystyle.parser.train 'train.txt', false
/home/jsachs/.rvm/gems/ruby-1.9.3-p551/gems/wapiti-0.1.1/lib/wapiti/model.rb:54: [BUG] Segmentation fault
ruby 1.9.3p551 (2014-11-13 revision 48407) [x86_64-linux]
We’re getting this on three different platforms - OS X, Debian, and Ubuntu. This is without Kyoto-cabinet or Redis installed.
Thank you for the great work!
In your online-demo the tool automatically splits a single line of multiple citations correctly into multiple lines and parses them afterwards.
Can you help me do that in irb?
I have made a proof-of-concept Zotero add-on that works as a front-end for AnyStyle.io. It allows the user to import a formatted bibliography from the clipboard into Zotero's library.
It truly feels magical. Try it for yourself. You just grab the bibliography from the end of an article, you select the menu item in Zotero... And boom... You have all the items in a nicely structured manner in your library!
I see many people using it if available as a public add-on. And if there is an easy-to-use interface in the plugin, similar to the website, that allows the users to make corrections to the results, I am sure the engine will become the best-trained one in the domain pretty soon. One can even see this implemented as a last resort in Zotero's default Import from Clipboard functionality (when the translators fail to detect any structured data in the text being pasted).
But here are my questions:
Names containing initials without periods are inverted, e.g. "author = {JR, Doe}"
Currently, the ML model relies on a feature key-value store that is implemented using kyoto-cabinet or (as a fallback) a regular Ruby hash. Because the feature dictionary is fairly large (probably in excess of 5MB), a Ruby hash is not ideal, as it consumes a lot of the Ruby process's memory. The kyoto-cabinet store uses a file-based approach which performs very well; however, some users have reported difficulties installing kyoto-cabinet. Redis seems like a good alternative and it should be fairly easy to add as a third KV store option.
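A rough sketch of what such a pluggable store could look like. All class and method names here (HashStore, RedisStore, open_store) are hypothetical and do not reflect AnyStyle's actual dictionary code; the only external calls assumed are Redis.new, get and set from the redis gem, and the Hash-backed store is used when that gem cannot be loaded.

```ruby
# Hypothetical in-memory fallback store.
class HashStore
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end
end

# Hypothetical Redis-backed store; client only needs #get and #set.
# Note that Redis returns stored values as strings.
class RedisStore
  def initialize(client, namespace = 'anystyle')
    @client = client
    @namespace = namespace
  end

  def [](key)
    @client.get("#{@namespace}:#{key}")
  end

  def []=(key, value)
    @client.set("#{@namespace}:#{key}", value)
  end
end

# Prefer Redis when the gem is installed, otherwise fall back to a Hash.
def open_store
  require 'redis'
  RedisStore.new(Redis.new)
rescue LoadError
  HashStore.new
end
```

Because both stores expose the same [] and []= methods, the code that looks up features would not need to know which backend is in use.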
After installing anystyle-parser, loading it in an irb session gives the following error:
irb(main):002:0> require 'anystyle/parser'
LoadError: cannot load such file -- language_detector
from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:27:in `'
from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:4:in `'
from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:2:in `'
from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser/parser.rb:1:in `'
from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /usr/local/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /usr/local/lib/ruby/gems/1.9.1/gems/anystyle-parser-0.7.1/lib/anystyle/parser.rb:14:in `'
Although the library is loaded all right, the parse method cannot be found
irb(main):007:0> Anystyle.parse
NoMethodError: undefined method `parse' for Anystyle:Module
saji
Can we mark issn with anystyle?
I was working through the irb session illustration from the readme after installing anystyle. I copied and pasted the commands with example references, but received different parsing results than the illustration. For example,
puts Anystyle.parse('Auster, Paul. The Art of Hunger. Expanded. New York: Penguin, 1997.', :bibtex).to_s
@book{auster1997a,
author = {Auster, Paul},
title = {The Art of Hunger},
publisher = {Penguin},
date = {1997},
edition = {Expanded},
language = {},
address = {New York}
}
=> nil
This result is different from the result in the irb session in the readme (address vs. location, date vs. year). In addition, I ran this same code twice. The above result is after reinstalling the gem (there were encoding issues, and I was trying to resolve them), and the first result was even worse. The parser was unable to recognize New York as a location or address. (I'm working on ruby 1.9.3 through ssh to a dreamhost account).
Are there reasons why anystyle would give different results between sessions? I assume the training data that comes with it is the same each time I install. Any comment on this subject would be appreciated.
Thanks,
Jocelyn
User request:
When a colon is not followed by a blank space (this sometimes happens erroneously, e.g. “London:Routledge”), the words before and after it are treated as a unit and cannot be assigned different labels (so the unit also cannot be trained as incorrect). I think that behaviour is generally not desirable; maybe the parser could be altered to break up X:Y pairs by default?
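A minimal sketch of such a pre-processing step, assuming it is enough to insert a space after any colon that sits directly between two letters. The method name split_glued_colons is hypothetical. URLs and DOI-like strings are left alone because their colon is followed by a slash or a digit.

```ruby
# Hypothetical pre-processing step: inserts a space after a colon that
# is glued between two letters, so the two words become separate tokens.
def split_glued_colons(text)
  text.gsub(/(?<=\p{L}):(?=\p{L})/, ': ')
end

puts split_glued_colons('London:Routledge, 1999')  # prints: London: Routledge, 1999
puts split_glued_colons('https://anystyle.io')     # prints: https://anystyle.io
```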
I have these two references:
Yang, Q., Mudambi, R., & Meyer, K. E. 2008. Conventional and reverse knowledge flows in multinational corporations. Journal of Management, 34: 882-902.
Yli-Renko, H., Autio, E., & Sapienza, H. J. 2001. Social capital, knowledge acquisitions, and knowledge exploitation in young technology-based firms. Strategic Management Journal, 22: 587-613.
That anystyle correctly parses in the following way:
But then fails to reproduce the author names in the output:
@article{yang2008a,
author = {},
date = {2008},
title = {Conventional and reverse knowledge flows in multinational corporations},
journal = {Journal of Management},
volume = {34},
pages = {882–902},
language = {en}
}
@article{yli-renko2001a,
author = {},
date = {2001},
title = {Social capital, knowledge acquisitions, and knowledge exploitation in young technology-based firms},
journal = {Strategic Management Journal},
volume = {22},
pages = {587–613},
language = {en}
}
Hi Sylvester, there is an iDigBio hackathon going on, and as part of it we are trying to adjust your code to work with specimen labels. Just letting you know, and we would like to talk about it with you if we get anywhere. Maybe it would be helpful to refactor anystyle into general and specific gems, don't know yet :)
Great great work on this gem, very much appreciate your input into citations parsing and showing the way with using wapiti!
I have created a model from journal articles without titles and this is working well. However, I noticed that author labels are getting switched for editor labels in certain circumstances, and I think this regex may be the cause
if authors =~ /[^[:alnum:]][Ee]d(s|itors)?[^[:alnum:]]$/ && !hash.has_key?(:editor)
If I parse the following line
P. Eisenberger and W. A. Reed, Phys. Rev. A 5, 2085 (1972)
I get
[{:journal=>"Phys. Rev. A", :volume=>5, :pages=>"2085", :date=>"1972", :editor=>"Eisenberger, P. and Reed, W.A.", :type=>:article}]
If I change 'Reed' to 'Read' then I get the expected
[{:author=>"Eisenberger, P. and Read, W.A.", :journal=>"Phys. Rev. A", :volume=>5, :pages=>"2085", :date=>"1972", :type=>:article}]
suggesting that the token ending with 'ed,' is causing the regex on line 94 of normalizer.rb to fire
I think it would be better to include a word boundary before '[Ee]d' to ensure that only words starting and ending with this expression are matched. E.g.
if authors =~ /[^[:alnum:]]\b[Ee]d(s|itors)?\b[^[:alnum:]]$/ && !hash.has_key?(:editor)
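To help narrow this down, the two patterns quoted above can be checked in isolation against a few sample strings. The sample strings are made up for illustration, and the sketch only tests the regular expressions themselves; what the authors string actually contains inside normalizer.rb at that point is not verified here.

```ruby
ORIGINAL = /[^[:alnum:]][Ee]d(s|itors)?[^[:alnum:]]$/
PROPOSED = /[^[:alnum:]]\b[Ee]d(s|itors)?\b[^[:alnum:]]$/

samples = [
  'P. Eisenberger and W. A. Reed,',   # surname ending in "ed,"
  'Smith, J., ed.',                   # genuine editor marker
  'Smith, J. and Jones, A., eds.'     # genuine plural editor marker
]

samples.each do |text|
  puts format('%-34s original=%-5s proposed=%s',
              text, ORIGINAL.match?(text), PROPOSED.match?(text))
end
```

On these samples both patterns agree: neither matches the string ending in "Reed," (the character before "ed" is a letter, which the leading [^[:alnum:]] rejects), and both match the two editor markers. If the same holds for the real input, the label switch may originate somewhere other than this condition, for example in the trained model.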
In references like: Pear, Robert. 2008. “Crisis Puts Tax Moves Into Play.” The New York Times, October 2, sec. Business.
For the references below, I couldn't tag the citation numbers as such. (they're being grouped along with the authors)
1. Abe, S., A. Furuya, T. Saito, and K. Takayama. November 1962. Method of producing L-malic acid by fermentation. U.S. patent 3,063,910.
2. Aliverdieva, D. A., D. V. Mamaev, D. I. Bondarenko, and K. F. Sholtz. 2006. Properties of yeast Saccharomyces cerevisiae plasma membrane dicarboxylate transporter. Biochemistry (Moscow) 71:1161–1169.
For example Alfred
Hi there - great tool! Have you considered automating the creation of training data from PubMed .nxml files, which you can get from ftp.ncbi.nlm.nih.gov/pub/pmc/?
The data is normalised, but could be extracted/transformed into the format required for training the parser according to different journal/citation styles.
If the citation number includes a punctuation mark, the value that is exported is truncated to an integer leaving any other characters behind.
All of the following input citation numbers result in citation_number = 1:
1.1.4.1
1,123
1b
Note that parsing itself goes well. This happens both in BibTeX and in CiteProc/JSON (but not in XML).
I was using the anystyle.io web interface.
Thanks!
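The reported values are what Ruby's String#to_i returns for these inputs, since it stops at the first non-digit character. Whether the exporters actually call to_i is an assumption. The sketch below shows the truncation and one possible alternative, a hypothetical helper citation_number that converts only purely numeric values and keeps everything else as a string.

```ruby
inputs = ['1.1.4.1', '1,123', '1b']

# String#to_i reads leading digits only, so every input becomes 1.
p inputs.map(&:to_i)  # => [1, 1, 1]

# Hypothetical alternative: drop one trailing period ("1." is a common
# list marker), convert purely numeric values, keep the rest as strings.
def citation_number(raw)
  value = raw.strip.sub(/\.\z/, '')
  value.match?(/\A\d+\z/) ? value.to_i : value
end

p inputs.map { |raw| citation_number(raw) }  # => ["1.1.4.1", "1,123", "1b"]
p citation_number('12.')                     # => 12
```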
Hi there,
This tool looks absolutely great for some use-cases I have. But I can't get Wapiti compiled on a couple of Windows boxes I have tried it on. Same error in both cases, using Ruby 1.9.3 and the appropriate DevKit.
Can I just double-check that this gem is compatible with Windows platform?
Cheers,
Chondrops.
generating native-i386-mingw32.def
compiling bcd.c
compiling decoder.c
In file included from decoder.c:43:0:
thread.h:32:21: fatal error: pthread.h: No such file or directory
User request:
In a similar vein, the number of entries per ‘page’ in the edit section is quite small, which makes for many pages, which again are difficult to navigate. Granted, that is not often necessary, but if it is it’s a real bother. It would already help to have seven to ten entries on one page (graphically they could be somehow set off against each other by numbering them or giving every other entry another background colour). Best would be an option to choose how many entries to display per page. If that’s not possible, maybe you could increase the number of pages displayed at the bottom? :)
See #8
For example: 45(3):23–7
[root@localhost anystyle]# ruby parser.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- bibtex (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from parser.rb:4:in `
Rename the label on anystyle.io since this already caused some confusion.
User request:
The number of labels that appear below the ‘Assign label’ float box in the edit section is too small for my taste. I used about seven different labels regularly to correct entries and — as it goes ;) — always the one I needed had just dropped out of the list. Would it be difficult to just increase the list to seven or eight items? Even five (one more) would be a benefit, IMO.
Hi, I am an individual trying to use this library for my research work. I tried to build it from source and run it, but I am getting the error below and the parser is not working. Please provide detailed usage instructions:
[root@localhost lib]# ls
anystyle sample.rb
[root@localhost lib]# ruby sample.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- anystyle/parser/version (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /root/workspace/anystyle-parser/lib/anystyle/parser.rb:9:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from sample.rb:2:in
My sample program to invoke the parser is below. Please advise if I am missing anything.
I am running this program as anystyle-parser/lib/sample.rb.
[root@localhost lib]# cat sample.rb
require './anystyle/parser'
Anystyle.parse 'Poe, Edgar A. Essays and Reviews. New York: Library of America, 1984.'
puts $b
puts "Bye"
I tried the approaches suggested on the web site, but none of them work to invoke the parser.
[root@localhost lib]#
Thanks in Advance
Input: Lévesque, Andrée. Red Travellers: Jeanne Corbin & Her Comrades (Montreal: McGill-Queen’s University Press, 2006).
Output in BibTeX (I may have corrected the parsing):
@book{l2006a,
author = {Lévesque, Andrée},
title = {Red Travellers: Jeanne Corbin & Her Comrades},
publisher = {McGill-Queen’s University Press},
date = {2006},
language = {en},
address = {Montreal}
}
Citation key is l2006a, expected is lévesque2006a OR levesque2006a, if BibTeX can't handle accents there.
Cheers,
Daniel
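A sketch of how the ASCII variant of the expected key (levesque2006a) could be produced, assuming the key is built from the lowercased family name, the year, and a suffix letter. The method name citation_key is hypothetical; the sketch relies only on String#unicode_normalize from Ruby's standard library to strip accents.

```ruby
# Hypothetical helper: builds a key like "levesque2006a". Accented
# letters are decomposed (NFD) and their combining marks removed, so
# "é" becomes "e" instead of cutting the name short.
def citation_key(family_name, year, suffix = 'a')
  ascii = family_name.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  "#{ascii.downcase.gsub(/[^a-z0-9-]/, '')}#{year}#{suffix}"
end

puts citation_key('Lévesque', 2006)   # prints: levesque2006a
puts citation_key('Yli-Renko', 2001)  # prints: yli-renko2001a
```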
"Accessed" is not removed from an "Accessed" field (also, this field should be named "Urldate" in BibTeX)