sina-al / pynlp
A pythonic wrapper for Stanford CoreNLP.
License: MIT License
I'm trying to use pynlp to process a bunch of text files, but I'm having trouble with one of them, crashpynlp.txt. I'm using the following script:
from pynlp import StanfordCoreNLP
with open("crashpynlp.txt", 'r') as file:
    text = file.read()
nlp = StanfordCoreNLP(annotators="tokenize, ssplit, pos, lemma, ner")
doc = nlp(text)
I'm getting the following traceback:
File "testPynlp.py", line 6, in <module>
doc = nlp(text)
File "/home/fernio/.local/lib/python3.6/site-packages/pynlp/client.py", line 132, in __call__
return self.annotate_one(texts)
File "/home/fernio/.local/lib/python3.6/site-packages/pynlp/client.py", line 138, in annotate_one
return Document(self._annotate(text))
File "/home/fernio/.local/lib/python3.6/site-packages/pynlp/client.py", line 135, in _annotate
return self._client.post(url=self._address, data=text, params=(('properties', str(self._properties)),))
File "/home/fernio/.local/lib/python3.6/site-packages/requests/sessions.py", line 559, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/fernio/.local/lib/python3.6/site-packages/pynlp/client.py", line 81, in request
response = super(CoreNLPClient, self).request(*args, **kwargs)
File "/home/fernio/.local/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/home/fernio/.local/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/home/fernio/.local/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/home/fernio/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/fernio/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1284, in _send_request
body = _encode(body, 'body')
File "/usr/lib/python3.6/http/client.py", line 161, in _encode
(name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 39: Body ('“') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
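The error message itself suggests encoding the body as UTF-8 before it reaches the HTTP layer, which otherwise tries Latin-1 for a str body. The following is a minimal sketch of that idea, not pynlp's own fix: `encode_body` is a hypothetical helper, and whether the pynlp client accepts bytes in place of str is an assumption that would need checking.

```python
def encode_body(text, encoding="utf-8"):
    """Return the request body as bytes so it is not re-encoded as Latin-1."""
    return text.encode(encoding)


# A curly quote (U+201C) is not representable in Latin-1.
sample = "He said \u201chello\u201d"

try:
    sample.encode("latin-1")
except UnicodeEncodeError as err:
    print("latin-1 fails:", err.reason)

# Encoding explicitly as UTF-8 succeeds and round-trips.
body = encode_body(sample)
print(type(body).__name__, body.decode("utf-8") == sample)
```

If the client does accept bytes, passing `encode_body(text)` instead of `text` would be the corresponding change in the script above.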
Is it possible to use regexner with pynlp?
Thank you!
Is it possible to return the (integer) sentiment score, rather than the label, in Sentence.sentiment?
Hi
Can you check on some other piece of text? After updating the module, I get far fewer entities, and they are less precise.
Thanks for the effort
I'm getting the following error. It looks like the protobuf package is out of date?
$ python3 -m pynlp
Traceback (most recent call last):
File "/Users/sooheon/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/Users/sooheon/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 142, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/Users/sooheon/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/sooheon/.pyenv/versions/nlp/lib/python3.6/site-packages/pynlp/__init__.py", line 1, in <module>
from .client import StanfordCoreNLP
File "/Users/sooheon/.pyenv/versions/nlp/lib/python3.6/site-packages/pynlp/client.py", line 3, in <module>
from .wrapper import Document
File "/Users/sooheon/.pyenv/versions/nlp/lib/python3.6/site-packages/pynlp/wrapper.py", line 1, in <module>
from pynlp.protobuf import from_bytes, to_bytes
File "/Users/sooheon/.pyenv/versions/nlp/lib/python3.6/site-packages/pynlp/protobuf/__init__.py", line 5, in <module>
from .CoreNLP_pb2 import Document
File "/Users/sooheon/.pyenv/versions/nlp/lib/python3.6/site-packages/pynlp/protobuf/CoreNLP_pb2.py", line 203, in <module>
options=None, file=DESCRIPTOR),
TypeError: __init__() got an unexpected keyword argument 'file'
Hi,
Is there any plan to write the result to a JSON file in the same format as CoreNLP's JSON outputFormat?
Hi,
In the examples, the "openie" annotator is used, but the outputs still do not include the OpenIE results. When are you planning to add OpenIE support to the outputs?
From your source code, I think it would be developed under the relations function defined in the Sentence class.
Hello,
I received a CoreNLPServerError when trying to use pynlp with version 3.9.1 of CoreNLP. Does it support the latest version for NER? Thanks!
When running the analysis on a long list of strings, I always get this error after successfully processing a number of strings:
google.protobuf.message.DecodeError: Tag had invalid wire type.
I'm crawling random webpages, so it doesn't seem to matter what the actual contents of the string are. I'm using BeautifulSoup to extract just the text, and it's coerced into a string to ensure it's unicode.
From what I've read about this error, it seems it occurs when trying to write over an existing file. I think it would be ideal if I could reset the CoreNLP server after each iteration.
My current workflow is
## start corenlp server from command line
$ python3 -m pynlp
In python:
from pynlp import StanfordCoreNLP
annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment'
nlp = StanfordCoreNLP(annotators=annotators)
document = nlp(str(line['text'])) ## line['text'] is a line of unicode text
The traceback is:
Traceback (most recent call last):
File "/Users/adamg/Dropbox/Northwestern/Classes/Text_Analytics/homework/ta-hw4/extract_debates.py", line 188, in <module>
debate_sentiment_dct = analyze_utterances(analysis.get_lines())
File "/Users/adamg/Dropbox/Northwestern/Classes/Text_Analytics/homework/ta-hw4/sentiment.py", line 14, in analyze_utterances
document = nlp(str(line['text']))
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/pynlp/client.py", line 65, in __call__
return self.annotate(text)
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/pynlp/client.py", line 72, in annotate
return Document(_annotate(text, self._annotators, self._options, self._port))
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/pynlp/client.py", line 34, in _annotate
return from_bytes(_annotate_binary(text, annotators, options, port))
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/pynlp/client.py", line 39, in from_bytes
core.parseFromDelimitedString(doc, protobuf)
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/corenlp_protobuf/__init__.py", line 18, in parseFromDelimitedString
obj.ParseFromString(buf[offset+pos:offset+pos+size])
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/google/protobuf/message.py", line 185, in ParseFromString
self.MergeFromString(serialized)
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 1069, in MergeFromString
if self._InternalParse(serialized, 0, length) != length:
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 1095, in InternalParse
new_pos = local_SkipField(buffer, new_pos, end, tag_bytes)
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/google/protobuf/internal/decoder.py", line 850, in SkipField
return WIRETYPE_TO_SKIPPER[wire_type](buffer, pos, end)
File "/Users/adamg/miniconda2/envs/text_analytics3/lib/python3.6/site-packages/google/protobuf/internal/decoder.py", line 820, in _RaiseInvalidWireType
raise _DecodeError('Tag had invalid wire type.')
google.protobuf.message.DecodeError: Tag had invalid wire type.
On the command line, the CoreNLP server raises the error:
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:662)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Is there an obvious cause for this error? Alternatively, is there a way to restart the CoreNLP server after each loop within python?
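The java.util.concurrent.TimeoutException on the server side suggests that the annotation timed out, in which case the client may have been handed a reply that is not a valid protobuf message. One possible workaround, short of restarting the server, is to retry the call a few times and skip the text if it keeps failing. The sketch below is generic and makes no assumptions about pynlp internals: `annotate_with_retry` and its parameters are hypothetical names, and `annotate` stands for any callable such as the `nlp` object above.

```python
import time


def annotate_with_retry(annotate, text, retries=3, delay=1.0, exceptions=(Exception,)):
    """Call annotate(text), retrying on the given exceptions with a growing pause."""
    last_error = None
    for attempt in range(retries):
        try:
            return annotate(text)
        except exceptions as err:
            last_error = err
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
    raise last_error


# Stand-in for a flaky annotator: fails twice, then succeeds.
calls = {"count": 0}


def flaky_annotate(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("simulated decode failure")
    return text.upper()


print(annotate_with_retry(flaky_annotate, "some text", delay=0))
```

The CoreNLP server also takes a `-timeout` option in milliseconds when started directly; whether and how `python3 -m pynlp` exposes it is not something this sketch assumes.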
Hello,
I have followed the instructions in the README and installed the library via pip3 install pynlp.
When I go to the prompt and execute from pynlp import StanfordCoreNLP, I get the following error:
>>> from pynlp import StanfordCoreNLP
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pynlp'
Is there something I am doing wrong?
Thank you for your assistance.
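A common cause of this symptom is that pip3 installed the package for a different interpreter than the one running the prompt. That is only a guess for this report, but it can be checked with the standard library alone. The helper name `is_importable` below is hypothetical.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Shows which Python binary is actually running and whether it can see pynlp.
print("interpreter:", sys.executable)
print("pynlp importable:", is_importable("pynlp"))
```

If the interpreter path is not the one pip3 belongs to, installing with `python3 -m pip install pynlp` ties the install to the interpreter that `python3` starts.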
Is there a way to install pynlp in a conda distribution, or is pip the only option?
Traceback (most recent call last):
File "main_core.py", line 3, in <module>
from pynlp import StanfordCoreNLP
File "/Users/avelino/.virtualenvs/nuveo.nlp/lib/python2.7/site-packages/pynlp/__init__.py", line 1, in <module>
from .client import StanfordCoreNLP
File "/Users/avelino/.virtualenvs/nuveo.nlp/lib/python2.7/site-packages/pynlp/client.py", line 66
def __init__(self, properties: Properties):
^
SyntaxError: invalid syntax
Line 66 in 80a235f
Hi
Any plans for this class RelationExtractorAnnotator?
Thanks
Where can I invoke CorefChainAnnotation? Should it be directly in the server start?
Thx
When I run the command above, I get the error:
adamg:~ adamg$ python3 -m pynlp
/usr/local/opt/python3/bin/python3.5: Error while finding spec for 'pynlp.__main__' (<class 'ImportError'>: No module named 'corenlp_protobuf'); 'pynlp' is a package and cannot be directly executed
I have set my CORE_NLP variable, and started a new Terminal session.
Hello,
Does pynlp keep the original "O" tag, which marks the non-entity parts?
For example,
sentence = "Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife"
Expected result:
[('Nora Jani', 'PERSON'), ('a single person', 'O'), ('Matt Jani', 'PERSON'), ('and', 'O'), ('Susan Jani', 'PERSON'), ('husband and wife', 'O')]
Thanks.
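Independently of what pynlp exposes, the expected result can be built from token-level tags by merging consecutive tokens that share a tag. The sketch below assumes the tags are already available as (token, tag) pairs. The `group_by_tag` helper and the hand-written `tagged` list are illustrative only and are not pynlp output.

```python
from itertools import groupby
import string


def group_by_tag(tagged_tokens):
    """Merge consecutive tokens with the same tag into (text, tag) spans, keeping 'O' runs."""
    spans = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        # Drop punctuation-only tokens so spans contain words only.
        words = [tok for tok, _ in run if not all(ch in string.punctuation for ch in tok)]
        if words:
            spans.append((" ".join(words), tag))
    return spans


# Hand-tagged tokens for the sentence in the question (illustrative, not real output).
tagged = [
    ("Nora", "PERSON"), ("Jani", "PERSON"), (",", "O"), ("a", "O"), ("single", "O"),
    ("person", "O"), (",", "O"), ("Matt", "PERSON"), ("Jani", "PERSON"), ("and", "O"),
    ("Susan", "PERSON"), ("Jani", "PERSON"), (",", "O"), ("husband", "O"),
    ("and", "O"), ("wife", "O"),
]

for span in group_by_tag(tagged):
    print(span)
```

Grouping before dropping punctuation keeps two entities separated by a comma from being merged into one span.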
I have installed this package with pip3.
There seems to be a circular dependency between the modules.
I get the following exception:
Traceback (most recent call last):
File "SemEval-2013.py", line 12, in <module>
from pynlp import StanfordCoreNLP
File "/inf/pynlp/pynlp/__init__.py", line 1, in <module>
from .client import stanford_core_nlp
File "/inf/pynlp/pynlp/client.py", line 5, in <module>
from pynlp.wrapper import Document
File "/inf/pynlp/pynlp/wrapper.py", line 2, in <module>
from pynlp import client
ImportError: cannot import name 'client'
According to Stanford's website, SUTime is provided automatically in corenlp. Is it included in this wrapper as well? If so, is there any documentation or can anyone provide an example as to how to use it (specifically to go from tagged entities to storing/printing a TIMEX3 object)?