emilmont / pystatparser

Simple Python Statistical Parser

License: Apache License 2.0

Python 100.00%

pystatparser's People

brijrajsingh sibghatullahsheikh mbc1990 evankos safehammad pkarmstr innerfirexy sjtu-yys node31 bendavis78 scottpledger robmcdan dovedanhan balajikvijayan joshstclair aakashdp crazypython nithinv13 fareklazhar krunaldatascience pombredanne shivan1912 loretoparisi lauragwilliams patwaria mtdunno mukeshjaiswal44 anujonthemove hkazuakey chubbymaggie jessamynsmith adigituser giritheja mravunni omarmeriwani hxlszxy vantral mingtan888 khatrishamsundar tre-flip global-localhost global19 global19-atlassian-net arevalirio jippolitie olgavanderburgh linuxnote sabayossi

pystatparser's Issues

Definitive list of part-of-speech tags?

This is a great parser, thanks for sharing it!

Just wondering if you could include or link to a list of all the part-of-speech tags used and their descriptions (similar to this). I've searched online but haven't yet found a list that covers every tag that shows up in pyStatParser, so I have to look up each one individually. Thank you!
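Until an official list exists, one way to see which tags pyStatParser actually produces is to collect them from its own output. Most labels in the trees quoted later in this issue list (VBG, NNP, POS, SBAR, UCP, etc.) look like Penn Treebank tags. Joined labels such as S+VP appear to come from collapsed unary rules. The repo confirms neither point. The sketch below works on the printed bracketed tree text; for an nltk Tree, `str(tree)` gives that form. `collect_labels` and the two regexes are hypothetical helpers, not part of pyStatParser.

```python
import re

# Preterminals look like "(TAG word)": a label followed by exactly one token.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")
# Phrase labels are followed (after whitespace) by a child's opening parenthesis.
PHRASE = re.compile(r"\(([^\s()]+)\s+(?=\()")


def collect_labels(bracketed):
    """Split the labels of a bracketed parse into POS tags and phrase labels.

    Hypothetical helper: it reads the printed tree text, not pyStatParser internals.
    """
    pos_tags = {m.group(1) for m in PRETERMINAL.finditer(bracketed)}
    phrase_labels = {m.group(1) for m in PHRASE.finditer(bracketed)}
    return pos_tags, phrase_labels


if __name__ == "__main__":
    sample = "(S (NP (PRP we)) (VP (MD should) (ADVP (RB never)) (VB forget)))"
    tags, phrases = collect_labels(sample)
    print(sorted(tags))     # ['MD', 'PRP', 'RB', 'VB']
    print(sorted(phrases))  # ['ADVP', 'NP', 'S', 'VP']
```

Run it over the trees for a batch of your own sentences and take the union of the sets. That gives the tags you actually need to look up.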

THANK YOU

OH MY GOSH THANK YOU!!! I have been looking for a tool like this for the past few days and I am so relieved that someone made this. Thank you again. I am also hoping to enter the field of NLP, and this is certainly a good step toward learning more. Thank you for your good work :)

Learning source material for model building implementation?

Hi, I am trying to port this into another language (GDScript). GDScript handles file opening, writing, etc. differently, and it also implements data types a bit differently (lists and tuples are both Arrays), so some adaptations were required during porting.

The parser works fine as long as the models are already built. If I build them with the Python version (this repo) and copy the files from the TEMP folder, everything works. But if the GDScript code tries to build the models itself, it fails.

I don't have the theoretical background to troubleshoot the model building. Some files mention a Coursera course that no longer exists.
Is there any material I could read to understand how to build the PCFG model and populate the TEMP dir?
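The standard way to turn a treebank into a PCFG is maximum-likelihood estimation. You count how often each rule A → β is used and divide by how often its left-hand side A appears: q(A → β) = count(A → β) / count(A). The sketch below assumes trees already in Chomsky normal form, stored as nested lists such as ["S", ["NP", "we"], ["VP", "ran"]]. That format and the names `count_rules` and `estimate_pcfg` are assumptions; the files pyStatParser writes to TEMP may be laid out differently.

```python
from collections import defaultdict


def count_rules(tree, binary, lexical, lhs_counts):
    """Recursively count the productions used in one CNF tree.

    Assumed (hypothetical) format: [label, word] for a preterminal and
    [label, left_subtree, right_subtree] for a binary node.
    """
    label = tree[0]
    lhs_counts[label] += 1
    if len(tree) == 2 and isinstance(tree[1], str):
        lexical[(label, tree[1])] += 1  # lexical rule A -> word
    elif len(tree) == 3 and not isinstance(tree[1], str) and not isinstance(tree[2], str):
        left, right = tree[1], tree[2]
        binary[(label, left[0], right[0])] += 1  # binary rule A -> B C
        count_rules(left, binary, lexical, lhs_counts)
        count_rules(right, binary, lexical, lhs_counts)
    else:
        raise ValueError("tree is not in Chomsky normal form: %r" % (tree,))


def estimate_pcfg(trees):
    """Maximum-likelihood rule probabilities q(A -> beta) = count(A -> beta) / count(A)."""
    binary, lexical, lhs_counts = defaultdict(int), defaultdict(int), defaultdict(int)
    for tree in trees:
        count_rules(tree, binary, lexical, lhs_counts)
    q = {}
    for rule, n in list(binary.items()) + list(lexical.items()):
        q[rule] = n / lhs_counts[rule[0]]  # rule[0] is the left-hand side A
    return q
```

One way to find where your GDScript port diverges is to compare its counts against these on a tiny treebank. Try two or three hand-written trees.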

temp file handling

I noticed this when my installation tried to write to /usr/local/lib/python2.7/dist-packages/stat_parser/temp:


Building the Grammar Model
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-29f6a952f468> in <module>()
      4 os.environ['DISPLAY'] = 'localhost:10.0'
      5 sent = "Each of us is full of shit in our own special way"
----> 6 parser = Parser()
      7 parser.parse(sent)
      8 tree = parser.parse(sent) # returns nltk Tree instance

/usr/local/lib/python2.7/dist-packages/stat_parser/parser.pyc in __init__(self, pcfg)
     78     def __init__(self, pcfg=None):
     79         if pcfg is None:
---> 80             pcfg = build_model()
     81 
     82         self.pcfg = pcfg

/usr/local/lib/python2.7/dist-packages/stat_parser/learn.pyc in build_model()
     26 
     27         if not exists(TEMP_DIR):
---> 28             makedirs(TEMP_DIR)
     29 
     30         # Normalise the treebanks

/usr/lib/python2.7/os.pyc in makedirs(name, mode)
    155         if tail == curdir:           # xxx/newdir/. exists if xxx/newdir exists
    156             return
--> 157     mkdir(name, mode)
    158 
    159 def removedirs(name):

OSError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/stat_parser/temp'

Wouldn't it be better to use the tempfile module (https://docs.python.org/2/library/tempfile.html) or something similar?

As a workaround, I manually created that directory and chmod'ed it.
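Following the tempfile suggestion above, the program could pick a writable location at run time instead of always writing inside the installed package. This is a sketch assuming Python 3, since `exist_ok` needs 3.2+. `choose_model_dir` and the `stat_parser-models` folder name are hypothetical, not part of pyStatParser.

```python
import os
import tempfile


def choose_model_dir(preferred):
    """Return a writable directory for the cached grammar model.

    Hypothetical helper: try the preferred location (e.g. the package's own
    temp folder) first, then fall back to a folder under the system temp dir.
    """
    try:
        os.makedirs(preferred, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        pass  # e.g. Errno 13 when the package lives under /usr/local/lib
    fallback = os.path.join(tempfile.gettempdir(), "stat_parser-models")
    os.makedirs(fallback, exist_ok=True)
    return fallback
```

The sketch uses a fixed folder under `tempfile.gettempdir()` rather than `tempfile.mkdtemp()` so the built model survives between runs. `mkdtemp()` would create a fresh, empty directory each time and force a rebuild. On shared machines, a per-user cache directory may be a safer choice.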

max() arg is an empty sequence

Could anyone help me understand why I get this error when I run the example code?

Would REALLY appreciate a response

Thank you!

Is nltk.grammar.CFG supported?

Suppose I have:

from nltk.grammar import CFG
grammar = CFG.fromstring("""
    # Grammatical productions.
        S -> NP VP
        NP -> Det N PP | Det N
        VP -> V NP PP | V NP | V
        PP -> P NP
    # Lexical productions.
        NP -> 'I'
        Det -> 'the' | 'a'
        N -> 'man' | 'park' | 'dog' | 'telescope'
        V -> 'ate' | 'saw'
        P

Will pyStatParser output a CFG string for the grammar?
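Nothing in this thread confirms that pyStatParser exports its grammar. If you have the learned rules and their probabilities, though, you can render them as text that `nltk.grammar.PCFG.fromstring` can read, or `CFG.fromstring` if you drop the weights. The sketch assumes a hypothetical dict mapping (lhs, rhs_tuple) to a probability. Any RHS symbol that never appears on a left-hand side is treated as a terminal and quoted. `to_grammar_string` is an illustrative name, not an existing API.

```python
from collections import defaultdict


def _quote(word):
    # Terminals must be quoted; switch to double quotes for words like "'s".
    return '"%s"' % word if "'" in word else "'%s'" % word


def to_grammar_string(rules, start="S", with_probs=True):
    """Render {(lhs, rhs_tuple): prob} as NLTK-style grammar text (hypothetical format)."""
    nonterminals = {lhs for lhs, _ in rules}
    alternatives = defaultdict(list)
    for (lhs, rhs), prob in sorted(rules.items()):
        body = " ".join(s if s in nonterminals else _quote(s) for s in rhs)
        alternatives[lhs].append("%s [%s]" % (body, prob) if with_probs else body)
    # NLTK treats the left-hand side of the first production as the start symbol.
    order = [start] + sorted(lhs for lhs in alternatives if lhs != start)
    return "\n".join(
        "%s -> %s" % (lhs, " | ".join(alternatives[lhs]))
        for lhs in order
        if lhs in alternatives
    )
```

Labels such as S+VP, or punctuation tags like `,`, may still need renaming first. NLTK's grammar reader accepts only a restricted set of characters in nonterminal names.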

Error while parsing a sentence with brackets

The program threw an error while parsing a sentence containing brackets. If the bracketed part is removed, the sentence parses successfully.

print parser.parse ("(CCC 2313) Defending one's country against aggression is permitted, but we should never forget that every human life, from the moment of conception, is sacred because it is made in God's image and likeness.")

Traceback (most recent call last):
  File "<pyshell#224>", line 1, in <module>
    print parser.parse ("(CCC 2313) Defending one's country against aggression is permitted, but we should never forget that every human life, from the moment of conception, is sacred because it is made in God's image and likeness.")
  File "stat_parser\parser.py", line 111, in nltk_parse
    return nltk_tree(self.raw_parse(sentence))
  File "stat_parser\parser.py", line 106, in raw_parse
    tree = self.norm_parse(sentence)
  File "stat_parser\parser.py", line 92, in norm_parse
    if is_cap_word(words[0]):
  File "stat_parser\word_classes.py", line 6, in is_cap_word
    return CAP.match(word) is not None
TypeError: expected string or buffer
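The traceback shows `CAP.match` in `is_cap_word` receiving something other than a string as the first token (`words[0]`). The bracketed segment apparently isn't tokenized into a plain word. Until that is fixed, a workaround that matches the observation above is to drop non-nested parenthetical asides before parsing. `strip_parenthetical` is a hypothetical helper, and it discards the bracketed text rather than parsing it.

```python
import re

# A non-nested parenthetical aside such as "(CCC 2313)" plus any spaces after it.
_PAREN_ASIDE = re.compile(r"\([^()]*\)\s*")


def strip_parenthetical(sentence):
    """Remove (...) asides and collapse the leftover whitespace."""
    cleaned = _PAREN_ASIDE.sub("", sentence)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

For example: `parser.parse(strip_parenthetical(sentence))`.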

print parser.parse ("Defending one's country against aggression is permitted, but we should never forget that every human life, from the moment of conception, is sacred because it is made in God's image and likeness.")
(S+VP
  (VBG defending)
  (NP
    (NP (PRP one) (POS 's))
    (NN country)
    (SBAR
      (IN against)
      (S
        (VP
          (VB aggression)
          (VBZ is)
          (UCP
            (VP (JJ permitted))
            (, ,)
            (CC but)
            (S
              (NP (PRP we))
              (VP
                (MD should)
                (ADVP (RB never))
                (VB forget)
                (PP (IN that) (NP (DT every) (JJ human) (NN life)))))
            (, ,)
            (PP
              (IN from)
              (NP
                (NP (DT the) (NN moment))
                (PP (IN of) (NP (NN conception)))))
            (, ,)
            (VP
              (VBZ is)
              (VBD sacred)
              (SBAR
                (IN because)
                (S
                  (NP (PRP it))
                  (VP
                    (VBZ is)
                    (VBN made)
                    (PP (IN in) (NP (NNP God) (POS 's)))))))
            (NN image)
            (CC and)
            (JJ likeness)))
        (. .)))))

Error running the example in README

Hi, I cloned the master branch and tried the example in the README, but got the following error. Does anyone know how to fix it? Thanks!

Python 2.7.12 (default, Jul 18 2016, 15:02:52)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from stat_parser import Parser
>>> parser = Parser()
>>> print parser.parse("How can the net amount of entropy of the universe be massively decreased?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "stat_parser/parser.py", line 112, in nltk_parse
    return nltk_tree(self.raw_parse(sentence))
  File "stat_parser/parser.py", line 107, in raw_parse
    tree = self.norm_parse(sentence)
  File "stat_parser/parser.py", line 104, in norm_parse
    return CKY(self.pcfg, norm_words)
  File "stat_parser/parser.py", line 74, in CKY
    _, top = max([(pi[1, n, X], bp[1, n, X]) for X in pcfg.N])
ValueError: max() arg is an empty sequence
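In the failing CKY line, the list passed to `max()` has one entry per symbol in `pcfg.N`. It can only be empty when the grammar has no nonterminals at all. That points at the loaded model rather than the sentence. One plausible cause is a model built from missing or empty treebank data, though this thread doesn't confirm it. The sketch below mirrors that line with a guard that reports the real problem. `best_root` is a hypothetical name, and `pi` and `bp` are assumed to be dicts keyed by `(i, j, X)`, as the indexing in the traceback suggests.

```python
def best_root(pi, bp, nonterminals, n):
    """Pick the highest-scoring root over the whole span (1, n).

    Same selection as the failing line in CKY, but it checks for an empty
    nonterminal set first so the error says what is actually wrong.
    """
    candidates = [(pi[1, n, X], bp[1, n, X]) for X in nonterminals]
    if not candidates:
        raise RuntimeError(
            "grammar has no nonterminals; check that the model files "
            "were built correctly before parsing"
        )
    _, top = max(candidates)  # highest score wins, as in the original code
    return top
```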

Can't install the library from PyPI

Hi,

I can't seem to install the library from PyPI using pip3. I think I've tried every possible variation of the pyStatParser name.

What's the library name when installing with pip3? Cloning and installing with python setup.py install --user (python3) works fine.