
predpatt's Introduction

PredPatt: Predicate-Argument Extraction from Universal Dependencies

We present PredPatt, a framework of extensible, interpretable, language-neutral predicate-argument extraction patterns. PredPatt bridges the deep syntax of the Universal Dependency project to an initial shallow semantic layer: this can form the basis for future layering of semantic annotations atop Universal Dependency treebanks, and separately can be considered a linguistically well-founded component of a "Universal IE" mechanism.

PredPatt is part of a wider initiative on decompositional semantics at Johns Hopkins University. To that end, it has been used to bootstrap semantic annotations in our recent EMNLP 2016 paper (White et al., 2016).

PredPatt shows the best precision and recall when compared with several prominent Open IE tools on a large benchmark (Zhang et al., 2017).

PredPatt extracts predicates and arguments from text.

?a extracts ?b from ?c
    ?a: PredPatt
    ?b: predicates
    ?c: text
?a extracts ?b from ?c
    ?a: PredPatt
    ?b: arguments
    ?c: text
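
The listing above pairs a predicate pattern containing placeholders (?a, ?b, ?c) with the argument phrase bound to each placeholder. As a minimal sketch of how to read that format, the snippet below substitutes the bindings back into the pattern. The helper name `fill_pattern` and the dictionary layout are assumptions made for illustration only; they are not part of PredPatt's API.

```python
import re

def fill_pattern(pattern, bindings):
    """Replace each ?x placeholder in `pattern` with its bound phrase.

    Hypothetical helper for illustration; not a PredPatt function.
    Placeholders without a binding are left unchanged.
    """
    def lookup(match):
        name = match.group(0)
        return bindings.get(name, name)
    return re.sub(r"\?[a-z]+", lookup, pattern)

pattern = "?a extracts ?b from ?c"
extractions = [
    {"?a": "PredPatt", "?b": "predicates", "?c": "text"},
    {"?a": "PredPatt", "?b": "arguments", "?c": "text"},
]

# One filled-in statement per extraction shown above.
sentences = [fill_pattern(pattern, b) for b in extractions]
for s in sentences:
    print(s)
```

Read this way, the single input sentence yields two simpler statements, one per conjoined object.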


Citation

If you use PredPatt please cite it as follows.

@InProceedings{zhang-EtAl:2017:IWCS,
    author = {Zhang, Sheng and Rudinger, Rachel and {Van Durme}, Ben },
    title = {{An Evaluation of PredPatt and Open IE via Stage 1 Semantic Role Labeling}},
    booktitle = {Proceedings of the 12th International Conference on Computational Semantics (IWCS)},
    month = {September},
    year = {2017},
    address = {Montpellier, France}
}

@InProceedings{white-EtAl:2016:EMNLP2016,
    author    = {White, Aaron Steven  and  Reisinger, Drew  and  Sakaguchi, Keisuke  and  Vieira, Tim  and  Zhang, Sheng  and  Rudinger, Rachel  and  Rawlins, Kyle  and  {Van Durme}, Benjamin},
    title     = {{Universal Decompositional Semantics on Universal Dependencies}},
    booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
    month     = {November},
    year      = {2016},
    address   = {Austin, Texas},
    publisher = {Association for Computational Linguistics},
    pages     = {1713--1723},
    url       = {https://aclweb.org/anthology/D16-1177}
}

License

BSD

predpatt's People

Contributors

azpoliak, esteng, johnb30, rawlins, sheng-z, timvieira, vandurme, venkatasg, wswu

predpatt's Issues

fatal error and crash in PredPatt.from_sentence(sen)

Look at "input_text_1" and "input_text_2".
The input sentences contain some obvious spelling errors. Processing them with PredPatt.from_sentence(sen) triggers a fatal error. However, passing a parse tree to PredPatt.from_constituency(sen) works fine.

=====================================================
Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import nltk
from nltk.tokenize import sent_tokenize
from predpatt import PredPatt

input_text_1 = "Items that are not dishwasher safemay melt and create a potential fire hazard. NSF certified residential dishwashers are not intended for licensed food establishments."

input_text_2 = "Items that are not dishwasher safemay melt and create a potential fire hazard. NSF certified residential dishwashers are not intended for licensed food establishments. NSF certified residential dishwashers are not intendedfor licensed food establishments."

def sen_2_arg(paragraph):
...     sent_tokenize_list = sent_tokenize(paragraph)
...     for sen in sent_tokenize_list:
...         print(sen)
...         decomposed_sen = PredPatt.from_sentence(sen)
...         for ds in decomposed_sen.instances:
...             print(ds, ds.phrase())
...             for a in ds.arguments:
...                 print(' ', a, a.phrase())
...

sen_2_arg(input_text_1)
Items that are not dishwasher safemay melt and create a potential fire hazard.
Predicate(safemay/5) ?a are not dishwasher safemay
Argument(that/1) that
Predicate(melt/6) ?a melt
Argument(Items/0) Items that are not dishwasher safemay
Predicate(create/8) ?a create ?b
Argument(Items/0) Items that are not dishwasher safemay
Argument(hazard/12) a potential fire hazard
NSF certified residential dishwashers are not intended for licensed food establishments.
Predicate(intended/6) ?a are not intended for ?b
Argument(dishwashers/3) NSF certified residential dishwashers
Argument(establishments/10) licensed food establishments

sen_2_arg(input_text_2)
Items that are not dishwasher safemay melt and create a potential fire hazard.
Predicate(safemay/5) ?a are not dishwasher safemay
Argument(that/1) that
Predicate(melt/6) ?a melt
Argument(Items/0) Items that are not dishwasher safemay
Predicate(create/8) ?a create ?b
Argument(Items/0) Items that are not dishwasher safemay
Argument(hazard/12) a potential fire hazard
NSF certified residential dishwashers are not intended for licensed food establishments.
Predicate(intended/6) ?a are not intended for ?b
Argument(dishwashers/3) NSF certified residential dishwashers
Argument(establishments/10) licensed food establishments
NSF certified residential dishwashers are not intendedfor licensed food establishments.

A fatal error has been detected by the Java Runtime Environment:

SIGBUS (0xa) at pc=0x0000000101863f49, pid=75707, tid=0x0000000000000307

JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode bsd-amd64 compressed oops)

Problematic frame:

C [libsystem_platform.dylib+0x4f49] _platform_memmove$VARIANT$Haswell+0x29

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

=============================
Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import nltk
from nltk.tokenize import sent_tokenize
from stanfordcorenlp import StanfordCoreNLP
from predpatt import PredPatt
stfd_nlp = StanfordCoreNLP('http://localhost', port=9000)

input_text_1 = "Items that are not dishwasher safemay melt and create a potential fire hazard. NSF certified residential dishwashers are not intended for licensed food establishments."

input_text_2 = "Items that are not dishwasher safemay melt and create a potential fire hazard. NSF certified residential dishwashers are not intended for licensed food establishments. NSF certified residential dishwashers are not intendedfor licensed food establishments."

def sen_2_arg(paragraph):
...     sent_tokenize_list = sent_tokenize(paragraph)
...     for sen in sent_tokenize_list:
...         print(sen)
...         pst = stfd_nlp.parse(sen)
...         decomposed_sen = PredPatt.from_constituency(pst)
...         for ds in decomposed_sen.instances:
...             print(ds, ds.phrase())
...

sen_2_arg(input_text_1)
Items that are not dishwasher safemay melt and create a potential fire hazard.
Predicate(dishwasher/4) ?a are not dishwasher safemay
Predicate(melt/6) ?a melt ?b
Predicate(create/8) ?a create
NSF certified residential dishwashers are not intended for licensed food establishments.
Predicate(certified/1) ?a certified ?b
Predicate(intended/6) ?a are not intended for ?b

sen_2_arg(input_text_2)
Items that are not dishwasher safemay melt and create a potential fire hazard.
Predicate(dishwasher/4) ?a are not dishwasher safemay
Predicate(melt/6) ?a melt ?b
Predicate(create/8) ?a create
NSF certified residential dishwashers are not intended for licensed food establishments.
Predicate(certified/1) ?a certified ?b
Predicate(intended/6) ?a are not intended for ?b
NSF certified residential dishwashers are not intendedfor licensed food establishments.
Predicate(certified/1) ?a certified ?b
Predicate(establishments/9) ?a are not intendedfor licensed food establishments

ppatt.instances returns an empty list when using own CoNLL example

I followed the tutorial for loading an example from a CoNLL file. I used a sentence from a Slovenian UD corpus. That works fine, but when I tried out the commands from the "Play with PredPatt" section, the command ppatt.instances returned an empty list.

The commands worked fine when I used the Concrete file in the test directory.

Installation on OSX Mojave

To get the installation working on OSX Mojave, I found that you first have to run the command: "export CFLAGS='-stdlib=libc++'"

Any plan to release it on pypi?

I am interested in using PredPatt in my project. Do you plan to release this library on PyPI?
It would be very helpful, because it would simplify the installation workflow.

License

PredPatt currently doesn't have a license file, which makes it difficult for some people to use it in a meaningful manner.

cc @vandurme

Python2.7 vs 3 conflict in requirements--need version numbers

Hello,

It looks like PredPatt was written in Python2.7, but some of its required packages such as concrete and jpype1 require Python3 to install. I think I'd need to install previous versions of them to run PredPatt. Would you be able to provide us with specific version numbers with which PredPatt works?

Thank you!

Edit: The code in the tutorial and in using_predpatt.py, written in Python 2, is now defunct. The authors now advise building and using a RESTful API instead.

Error when running using_predpatt.py

I get this error when I try to run doc/using_predpatt.py. Any help, please!

Traceback (most recent call last):
  File "C:\Windows\System32\PredPatt\doc\using_predpatt.py", line 8, in
    P = PredPatt.from_sentence(sentence)
  File "c:\windows\system32\predpatt\predpatt\patt.py", line 382, in from_sentence
    parse = _PARSER(sentence)
  File "c:\windows\system32\predpatt\predpatt\util\UDParser.py", line 105, in call
    x = self.fresh(*args, **kwargs)
  File "c:\windows\system32\predpatt\predpatt\util\UDParser.py", line 189, in fresh
    raise e
IOError: [Errno 22] Invalid argument

load_conllu throws errors with enhanced dependencies in UD-r2.2

In UD-r2.2, to accommodate the use of empty nodes for the analysis of ellipsis in enhanced dependencies, the HEAD column (gov in the code) is set to _ for those nodes. This throws an error in the load_conllu function, since DepTriple is called with int(gov) as one of its arguments. UD explains these nodes here and here.

The fix is easy enough: check the first column for '.', since UD stipulates that empty nodes must have an index of the form i.1, where i is the index of the referent of the ellipsis. If '.' is present, ignore that line. Unless there is some information we can extract from the empty node?
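
The proposed fix can be sketched as a pre-processing step that runs before the text is handed to load_conllu. This is only an illustration of the idea described above, not a patch to PredPatt: the function name `drop_empty_nodes` is hypothetical, and it assumes tab-separated CoNLL-U lines with the token ID in the first column.

```python
def drop_empty_nodes(conllu_text):
    """Return CoNLL-U text without empty-node lines (IDs such as 5.1).

    Hypothetical helper: comment lines and blank lines are kept as-is,
    and only the first column is inspected.
    """
    kept = []
    for line in conllu_text.splitlines():
        if line and not line.startswith("#"):
            token_id = line.split("\t", 1)[0]
            if "." in token_id:
                # Empty node from the enhanced representation; HEAD is "_".
                continue
        kept.append(line)
    return "\n".join(kept)

sample = "\n".join([
    "# text = Sue likes coffee and Bill tea",
    "1\tSue\t_\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tlikes\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "5\tBill\t_\tPROPN\t_\t_\t2\tconj\t_\t_",
    "5.1\tlikes\t_\tVERB\t_\t_\t_\t_\t_\t_",
    "6\ttea\t_\tNOUN\t_\t_\t5\torphan\t_\t_",
])
cleaned = drop_empty_nodes(sample)
print(cleaned)
```

Dropping the line discards whatever the empty node encodes, so this only addresses the int(gov) failure; it does not answer the open question of whether that information is worth keeping.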

UnicodeDecodeError

I am getting the following error. Can anyone please help me resolve this?

==========
    pd = PredPatt.from_sentence(sen)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/predpatt/patt.py", line 385, in from_sentence
    parse = _PARSER(sentence)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/predpatt/util/UDParser.py", line 102, in call
    s = s.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 16: invalid continuation byte
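
The traceback shows that the input bytes are not valid UTF-8. Byte 0xf3 is the letter "ó" in Latin-1, so one possible explanation (an assumption, since the report does not include the input) is that the sentence was read from a Latin-1 encoded source. A minimal sketch of a workaround is to decode the text yourself before calling PredPatt; the helper `to_text` and its fallback order are hypothetical.

```python
def to_text(data, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in order; pass str through.

    Hypothetical helper. Latin-1 accepts any byte sequence, so it acts
    as a last resort and may produce wrong characters for other encodings.
    """
    if isinstance(data, str):
        return data
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input with any of: %r" % (encodings,))

# A Latin-1 byte string containing 0xf3, which fails under UTF-8.
raw = "construcción segura".encode("latin-1")
sentence = to_text(raw)
print(sentence)
```

If the real encoding of the data is known, passing it explicitly when opening the file is the more reliable choice than guessing.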

isNotCopula shouldn't take predicate as input parameter

Currently the filter isNotCopula takes a predicate object as its input parameter and returns True if the predicate passed doesn't have gov_rel as cop, or if it is a list of copula verbs. However, this filter isn't useful, since PredPatt doesn't annotate copula verbs as predicates (because UD uses the cop relation only for pure copulas, that is, when the nonverbal predicate is the head of the clause).

isNotCopula should take a Token as its input parameter, and the function should use token.gov_rel and token.text instead of pred.root.gov_rel and pred.root.text.
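
A token-based version of the filter might look like the sketch below. It follows one reading of the description above (a token passes when its relation is not cop and its text is not a known copula form). The `Token` stand-in, the snake_case name, and the contents of `COPULA_VERBS` are all assumptions for illustration; PredPatt's own Token class and verb list may differ.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only the two attributes named in the issue.
Token = namedtuple("Token", ["text", "gov_rel"])

# Illustrative list of English copula forms (an assumption).
COPULA_VERBS = {"be", "am", "is", "are", "was", "were", "being", "been"}

def is_not_copula(token):
    """Return True when the token is not attached as cop and is not a copula form."""
    return token.gov_rel != "cop" and token.text.lower() not in COPULA_VERBS

tokens = [Token("dishwashers", "nsubj"), Token("are", "cop"), Token("safe", "root")]
kept = [t.text for t in tokens if is_not_copula(t)]
print(kept)
```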

PredPatt parser running indefinitely for some specific options

I encountered a strange problem while parsing a particular sentence.
Consider:

conll_example = '''
1	The	_	DET	DT	_	2	det	_	_
2	action	_	NOUN	NN	_	2	ROOT	_	_
3	followed	_	VERB	VBD	_	2	acl	_	_
4	by	_	ADP	IN	_	3	agent	_	_
5	one	_	NUM	CD	_	6	nummod	_	_
6	day	_	NOUN	NN	_	4	pobj	_	_
7	an	_	DET	DT	_	9	det	_	_
8	Intelogic	_	PROPN	NNP	_	9	amod	_	_
9	announcement	_	NOUN	NN	_	3	dobj	_	_
10	that	_	ADP	IN	_	13	mark	_	_
11	it	_	PRON	PRP	_	13	nsubj	_	_
12	will	_	VERB	MD	_	13	aux	_	_
13	retain	_	VERB	VB	_	9	acl	_	_
14	an	_	DET	DT	_	16	det	_	_
15	investment	_	NOUN	NN	_	16	compound	_	_
16	banker	_	NOUN	NN	_	13	dobj	_	_
17	to	_	PART	TO	_	18	aux	_	_
18	explore	_	VERB	VB	_	13	advcl	_	_
19	alternatives	_	NOUN	NNS	_	18	dobj	_	_
20	"	_	PUNCT	''	_	19	punct	_	_
21	to	_	PART	TO	_	22	aux	_	_
22	maximize	_	VERB	VB	_	19	relcl	_	_
23	shareholder	_	NOUN	NN	_	24	compound	_	_
24	value	_	NOUN	NN	_	22	dobj	_	_
25	,	_	PUNCT	,	_	2	punct	_	_
26	"	_	PUNCT	''	_	2	punct	_	_
27	including	_	VERB	VBG	_	2	prep	_	_
28	the	_	DET	DT	_	30	det	_	_
29	possible	_	ADJ	JJ	_	30	amod	_	_
30	sale	_	NOUN	NN	_	27	pobj	_	_
31	of	_	ADP	IN	_	30	prep	_	_
32	the	_	DET	DT	_	33	det	_	_
33	company	_	NOUN	NN	_	31	pobj	_	_
34	.	_	PUNCT	.	_	2	punct	_	_
'''

conll_example = [ud_parse for sent_id, ud_parse in load_conllu(conll_example)][0]

obj = PredPatt(conll_example)

The above lines run fine, but when I add the options as follows:
options = PredPattOpts(resolve_relcl=True, borrow_arg_for_relcl=True, resolve_conj=False, cut=True)

Then, the following line keeps running forever with no error messages:
obj = PredPatt(conll_example, opts=options)

PC Configurations:
OS: MacOS High Sierra
Python version: 3.6.5 |Anaconda custom (64-bit)
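
One detail of the CoNLL example above is that token 2 (action) has 2 in its HEAD column, so it points at itself. Whether this is what triggers the non-termination is not established in the report. As a sanity check before constructing PredPatt, a small validator can flag tokens whose head chain never leaves the sentence. The function `find_head_cycles` is a hypothetical helper, and treating 0 as the artificial root is an assumption based on the CoNLL-U convention.

```python
def find_head_cycles(heads):
    """Return the sorted token ids that lie on a cycle of head pointers.

    `heads` maps token id -> head id, with 0 as the artificial root.
    Hypothetical helper; heads pointing outside the mapping end the walk.
    """
    on_cycle = set()
    for start in heads:
        seen = []
        node = start
        while node != 0 and node in heads and node not in seen:
            seen.append(node)
            node = heads[node]
        if node in seen:
            # Everything from the first revisit onward forms the cycle.
            on_cycle.update(seen[seen.index(node):])
    return sorted(on_cycle)

# HEAD column of the first four tokens from the example above.
example_heads = {1: 2, 2: 2, 3: 2, 4: 3}
cycles = find_head_cycles(example_heads)
print(cycles)
```

An empty result means every token eventually reaches the root, which is what a well-formed dependency tree requires.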
