
eventtriplesextraction's Introduction

EventTriplesExtraction

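  Event triple extraction based on dependency parsing and semantic role labeling. Text representation has always been an important problem, and representing the information in a text clearly and concisely is a long-term task.
I have tried representing text through keywords and through relations between entities, visualized with textgrapher, but using words as the unit of text information does not work especially well. This project therefore tries to represent text as event triples.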

Method 1: event triple extraction based on LTP dependency parsing and semantic role labeling

    from triple_extraction import *  # the module file is triple_extraction.py
    extractor = TripleExtractor()
    svos = extractor.triples_main(content)  # content is the input text
    print('svos', svos)

Method 1: test results

    content = '李克强总理今天来我家了,我感到非常荣幸'
    svos = [
              ['李克强总理', '来', '我家'],
              ['我', '感到', '荣幸']
             ]

    content = ''' 以色列国防军20日对加沙地带实施轰炸,造成3名巴勒斯坦武装人员死亡。此外,巴勒斯坦人与以色列士兵当天在加沙地带与以交界地区发生冲突,一名巴勒斯坦人被打死。当天的冲突还造成210名巴勒斯坦人受伤。
当天,数千名巴勒斯坦人在加沙地带边境地区继续“回归大游行”抗议活动。部分示威者燃烧轮胎,并向以军投掷石块、燃烧瓶等,驻守边境的以军士兵向示威人群发射催泪瓦斯并开枪射击。'''
    svos = [
             ['以色列国防军', '实施', '轰炸'],
             ['冲突', '发生', '巴勒斯坦人与以色列士兵'],
             ['当天冲突', '造成', '受伤'],
             ['数千名巴勒斯坦人', '继续', '回归大游行抗议活动'],
             ['部分示威者', '投掷', '石块'],
             ['驻守边境以军士兵', '发射', '催泪瓦斯']
             ]

Method 2: event triple extraction based on Baidu DDParser dependency parsing

    from baidu_svo_extract import *
    extractor = SVOParser()
    svos = extractor.triples_main(content)  # content is the input text
    print('svos', svos)

Method 2: test results

    content = '李克强总理今天来我家了,我感到非常荣幸'
    svos = [
             ['总理李克强', '来', '我家'],
             ['我', '感到', '荣幸']
             ]

    content = ''' 以色列国防军20日对加沙地带实施轰炸,造成3名巴勒斯坦武装人员死亡。此外,巴勒斯坦人与以色列士兵当天在加沙地带与以交界地区发生冲突,一名巴勒斯坦人被打死。当天的冲突还造成210名巴勒斯坦人受伤。
当天,数千名巴勒斯坦人在加沙地带边境地区继续“回归大游行”抗议活动。部分示威者燃烧轮胎,并向以军投掷石块、燃烧瓶等,驻守边境的以军士兵向示威人群发射催泪瓦斯并开枪射击。'''
    svos = [
              ['20日', '实施', '轰炸'],
              ['当天冲突', '造成', '210名'],
              ['巴勒斯坦人', '回归', '大游行'],
              ['部分示威者', '燃烧', '轮胎']
             ]

Method 3: event triple extraction based on part-of-speech pattern rules

    from pattern_event_triples import *
    handler = ExtractEvent()
    events, spos = handler.phrase_ip(content)  # content is the input text
    spos = [i for i in spos if i[0] and i[2]]  # keep triples with a non-empty subject and object
    print('svos', spos)
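
The list comprehension in the snippet above keeps only triples whose first and third elements are non-empty. A minimal sketch of that filter on made-up triples (the sample data and the function name `keep_complete_triples` are illustrative and not part of the project):

```python
def keep_complete_triples(spos):
    """Keep only triples whose subject (index 0) and object (index 2) are non-empty."""
    return [spo for spo in spos if spo[0] and spo[2]]


# Made-up triples for illustration only, not output from the extractor.
sample = [
    ('部分示威者', '燃烧', '轮胎'),   # complete triple, kept
    ('', '发射', '催泪瓦斯'),         # empty subject, dropped
    ('以军士兵', '开枪', ''),         # empty object, dropped
]

print(keep_complete_triples(sample))
```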

Method 3: test results

    content = '李克强总理今天来我家了,我感到非常荣幸'
    svos = [
              ('李克强总理', '来', '我家'),
              ('李克强', '感到', '荣幸')
             ]

    content = ''' 以色列国防军20日对加沙地带实施轰炸,造成3名巴勒斯坦武装人员死亡。此外,巴勒斯坦人与以色列士兵当天在加沙地带与以交界地区发生冲突,一名巴勒斯坦人被打死。当天的冲突还造成210名巴勒斯坦人受伤。
当天,数千名巴勒斯坦人在加沙地带边境地区继续“回归大游行”抗议活动。部分示威者燃烧轮胎,并向以军投掷石块、燃烧瓶等,驻守边境的以军士兵向示威人群发射催泪瓦斯并开枪射击。'''
    svos = [
              ('数千名巴勒斯坦人在加沙地带边境地区', '继续回归游行', '抗议活动'),
              ('部分示威者', '燃烧', '轮胎'),
              ('边境', '以军', '士兵向示威人群发射催泪瓦斯开枪射击'),
              ('士兵向示威人群', '发射', '催泪瓦斯开枪射击')
             ]

Summary

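This project publishes three event triple extraction methods, based on LTP dependency parsing with semantic role labeling, on Baidu DDParser, and on lexical patterns, together with example results. The following conclusions can be drawn:
1. Beyond the dependency parsing that DDParser also offers, LTP provides semantic role labeling, which is a useful supplement for event triple extraction.
2. LTP is faster than DDParser.
3. Lexical-pattern-based extraction is the fastest, but its quality depends on the performance of word segmentation and part-of-speech tagging.
4. Lexical patterns can yield triple elements that cover longer spans of meaning.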

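If you have any questions about the project or about me, see https://liuhuanyong.github.io/
For questions or collaboration on natural language processing, knowledge graphs, event logic graphs, social computing, or language resource construction, you can contact me:
1. My GitHub project overview: https://liuhuanyong.github.io
2. My CSDN blog: https://blog.csdn.net/lhy2014
3. About me: Liu Huanyong (刘焕勇), Institute of Software, ** Academy of Sciences, [email protected]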

eventtriplesextraction's Issues

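Extraction from negative sentences

When extracting from negative sentences, the negation word seems to be dropped. Doesn't that change the original meaning?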

pattern_event_triples.py has somewhat low precision

events, spos = handler.phrase_ip("李克强总理今天来我家了,我感到非常荣幸")
events
['李克强总理来我家', '李克强感到非常荣幸']
There seems to be a problem in the code logic.

incompatible constructor arguments.

from triple_extraction import * raises an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_687/1960098357.py in <module>
----> 1 from triple_extraction import *
~/triple_extraction.py in <module>
135 print('svos', svos)
136
--> 137 test()
~/triple_extraction.py in test()
131 content5 = ''' 以色列国防军20日对加沙地带实施轰炸,造成3名巴勒斯坦武装人员死亡。此外,巴勒斯坦人与以色列士兵当天在加沙地带与以交界地区发生冲突,一名巴勒斯坦人被打死。当天的冲突还造成210名巴勒斯坦人受伤。
132 当天,数千名巴勒斯坦人在加沙地带边境地区继续“回归大游行”抗议活动。部分示威者燃烧轮胎,并向以军投掷石块、燃烧瓶等,驻守边境的以军士兵向示威人群发射催泪瓦斯并开枪射击。'''
--> 133 extractor = TripleExtractor()
134 svos = extractor.triples_main(content2)
135 print('svos', svos)
~/triple_extraction.py in __init__(self)
9 class TripleExtractor:
10 def __init__(self):
---> 11 self.parser = LtpParser()
12
13 '''文章分句处理, 切分长句,冒号,分号,感叹号等做切分标识'''
~/sentence_parser.py in __init__(self)
10 def __init__(self):
11 LTP_DIR = "./ltp_data_v3.4.0"
---> 12 self.segmentor = Segmentor()
13 self.segmentor.load(os.path.join(LTP_DIR, "cws.model"))
14
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. pyltp.Segmentor(model_path: str, lexicon_path: str = None, force_lexicon_path: str = None)

Invoked with:

Error

'WindowsPath' object has no attribute 'read_text'

Argument types are supported!

__init__(): incompatible constructor arguments. The following argument types are supported:
1. pyltp.Segmentor(model_path: str, lexicon_path: str = None, force_lexicon_path: str = None)

Invoked with:

Running the code produces the error above, with pyltp == 0.4.0 and ltp_data 3.4.0.
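
The error message lists a constructor that takes the model path directly, while the traceback shows sentence_parser.py calling Segmentor() with no arguments and then load(). A minimal sketch of a loader that tolerates both styles, assuming the no-argument constructor raises TypeError as in the traceback. The helper `load_ltp_component` and the two stand-in classes are hypothetical and only mimic the two call shapes; they are not part of pyltp or of this project.

```python
def load_ltp_component(component_cls, model_path):
    """Instantiate a component whichever of the two constructor styles it expects."""
    try:
        # Style used in sentence_parser.py: construct without arguments, then load().
        component = component_cls()
        component.load(model_path)
    except TypeError:
        # Style listed in the error message: the model path goes to the constructor.
        component = component_cls(model_path)
    return component


class OldStyleComponent:
    """Hypothetical stand-in for a class built as Cls() followed by .load(path)."""

    def __init__(self):
        self.path = None

    def load(self, path):
        self.path = path


class NewStyleComponent:
    """Hypothetical stand-in for a class built as Cls(model_path, ...)."""

    def __init__(self, model_path, lexicon_path=None, force_lexicon_path=None):
        self.path = model_path


print(load_ltp_component(OldStyleComponent, "cws.model").path)
print(load_ltp_component(NewStyleComponent, "cws.model").path)
```

With pyltp installed, the same helper could presumably be called as load_ltp_component(Segmentor, os.path.join(LTP_DIR, "cws.model")), but that call is untested here.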

Line 28 of baidu_svo_extract.py is wrong

        for arc_index in range(len(rel_id)):
            if rel_id[arc_index] == index+1:   # arc indices start at 1
                if rel_id[arc_index] in child_dict:
                    child_dict[relation[arc_index]].append(arc_index)
                else:
                    child_dict[relation[arc_index]] = []
                    child_dict[relation[arc_index]].append(arc_index)

It should be:

        for arc_index in range(len(rel_id)):
            if rel_id[arc_index] == index+1:   # arc indices start at 1
                if relation[arc_index] in child_dict:
                    child_dict[relation[arc_index]].append(arc_index)
                else:
                    child_dict[relation[arc_index]] = []
                    child_dict[relation[arc_index]].append(arc_index)

The code mistakenly has if rel_id[arc_index] in child_dict: where if relation[arc_index] in child_dict: was intended.
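
To see why the membership test matters, the sketch below runs both variants on a small made-up parse. It assumes relation labels are strings and head ids are 1-based integers, as the quoted loop suggests. The function names and the sample data are hypothetical. With the test on rel_id, the integer is never found among the string keys, so the else branch runs every time and overwrites the list of any relation seen before.

```python
def build_child_dict(rel_id, relation, index):
    """Group the arcs whose head is word `index` (0-based) by relation label."""
    child_dict = {}
    for arc_index in range(len(rel_id)):
        if rel_id[arc_index] == index + 1:  # head ids are 1-based
            # setdefault creates the list once and reuses it afterwards
            child_dict.setdefault(relation[arc_index], []).append(arc_index)
    return child_dict


def build_child_dict_buggy(rel_id, relation, index):
    """Same loop with the membership test on rel_id, as in the reported line."""
    child_dict = {}
    for arc_index in range(len(rel_id)):
        if rel_id[arc_index] == index + 1:
            if rel_id[arc_index] in child_dict:  # an int is never among the string keys
                child_dict[relation[arc_index]].append(arc_index)
            else:
                child_dict[relation[arc_index]] = []  # resets an existing list
                child_dict[relation[arc_index]].append(arc_index)
    return child_dict


# Hypothetical parse: words 0, 2 and 3 all depend on word 1 (head id 2).
rel_id = [2, 0, 2, 2]
relation = ['ATT', 'HED', 'ATT', 'VOB']

print(build_child_dict(rel_id, relation, 1))        # {'ATT': [0, 2], 'VOB': [3]}
print(build_child_dict_buggy(rel_id, relation, 1))  # {'ATT': [2], 'VOB': [3]}
```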

Cannot load the model?

I get "Segmentor: Model not loaded!" and the model cannot be loaded. What causes this?

load pisrl.model RunTimeError

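Hello,
I ran into a problem at self.labeller.load(os.path.join(LTP_DIR, 'pisrl.model'))
RuntimeError: incompatible native format - size of long
Could you describe the software environment in which it runs successfully for you? Operating system, Python version, pyltp version, ltp_data version.
I have read the issues here, and one mentions that this may be an ltp_data version problem.
I am using Win10, Python 3.6.5, pyltp 0.2.1, ltp_data v3.4.0.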

get_ips

get_ips() in pattern_event_triples.py
phrase_postags = ''.join(phrase_postag_).replace('m', '').replace('q','').replace('a', '').replace('t', '')
Is this a mistake? Above this line phrase_postags is iterated as a list, but here it becomes a string.
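
The difference the question points at can be shown on a few made-up tags. Joining the tags and then calling replace deletes single characters anywhere in the string, including inside longer tags, whereas filtering the list drops whole tags. This is only a sketch: the tag values are illustrative, and whether the string form is intended cannot be determined from the quoted line alone.

```python
# Illustrative POS tags only; the real tag set depends on the tagger in use.
tags = ['n', 'm', 'q', 'nt']

# String route, mirroring the quoted line: join first, then delete characters.
as_string = ''.join(tags).replace('m', '').replace('q', '').replace('a', '').replace('t', '')

# List route: drop whole tags and keep the list structure.
drop = {'m', 'q', 'a', 't'}
as_list = [tag for tag in tags if tag not in drop]

print(as_string)  # nn
print(as_list)    # ['n', 'nt']
```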

about paper

Hello, I would like to ask about "Event Triples Extraction based on dependency parser and semantic role labeling": I could not find a paper with this title. Could you share it?

Model not loaded!

mldl@ub1604:/ub16_prj/EventTriplesExtraction$ python3 triple_extraction.py
[dynet] random seed: 3227227130
[dynet] allocating memory: 2000MB
[dynet] memory allocation done.
Segmentor: Model not loaded!
Postagger: Model not loaded!
Parser: Model not loaded!
Parser: Model not loaded!
SRL: Model not loaded!
svos []
mldl@ub1604:/ub16_prj/EventTriplesExtraction$
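
One possible cause of "Model not loaded!", assumed here and not confirmed by the log, is that the model files are not found at the configured path. The traceback elsewhere on this page shows sentence_parser.py using the relative path "./ltp_data_v3.4.0", which is resolved against the current working directory. A minimal sketch that checks for the files before loading; the helper `missing_model_files` is hypothetical, and the list contains only the two model file names that appear on this page.

```python
import os


def missing_model_files(ltp_dir, model_names):
    """Return the model file names that do not exist under ltp_dir."""
    return [
        name for name in model_names
        if not os.path.isfile(os.path.join(ltp_dir, name))
    ]


# Directory name as it appears in sentence_parser.py in the traceback.
LTP_DIR = "./ltp_data_v3.4.0"
# Only the file names mentioned on this page; extend the list as needed.
MODEL_NAMES = ["cws.model", "pisrl.model"]

missing = missing_model_files(LTP_DIR, MODEL_NAMES)
if missing:
    print("missing under", os.path.abspath(LTP_DIR), ":", missing)
```

Running the check from the same directory as the script makes it clear whether the relative path points where expected.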
