mdredze / speech_ner_entity_linking_data
A dataset of transcribed speech that contains both named entity and entity linking annotations.
Authors:
adrian dot benton at gmail dot com
mark at dredze dot com

This distribution contains named entity tagged and entity linked speech data. The named entity data was annotated as described in this paper:

Carolina Parada, Mark Dredze, Frederick Jelinek. OOV Sensitive Named-Entity Recognition in Speech. International Speech Communication Association (INTERSPEECH), 2011.

A subset of the person entities in this dataset was annotated for entity linking information as described in this paper:

Adrian Benton, Mark Dredze. Entity Linking for Spoken Language. North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

The utterances were drawn from the HUB4 dataset (https://catalog.ldc.upenn.edu/LDC98S71), and the knowledge base is the same Wikipedia dump used in the TAC 2009 KBP track.

This distribution includes the annotations only. The HUB4 data must be obtained from the LDC.

The files contain the following:

- "folds.txt": Mapping from each entity linking query ID to fold

- "el_queries.txt": Entity linking queries.
  Format: QUERY_ID ENTITY_MENTION HUB4_UTTERANCE_ID ENTITY_TYPE KB_ID

- "ne_el_labels.txt": All named entities annotated in the HUB4 transcripts. Any entity that does not have a linking annotation to the Wikipedia KB is marked as "NOT_ANNOTATED".
  Format: HUB4_UTTERANCE_ID ENTITY_MENTION START_SPAN END_SPAN ENTITY_TYPE KB_ID
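The two record formats listed above can be read with a small parser. The following is a minimal sketch, not code shipped with this distribution. It assumes the fields are tab-separated (the README does not state the delimiter, and entity mentions may contain spaces) and that START_SPAN and END_SPAN are integers. The class names, function names, and the sample line are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

NOT_ANNOTATED = "NOT_ANNOTATED"  # marker named in the README


@dataclass
class NamedEntityLabel:
    """One record of ne_el_labels.txt (hypothetical container)."""
    utterance_id: str
    mention: str
    start_span: int
    end_span: int
    entity_type: str
    kb_id: Optional[str]  # None when the entity has no linking annotation


@dataclass
class EntityLinkingQuery:
    """One record of el_queries.txt (hypothetical container)."""
    query_id: str
    mention: str
    utterance_id: str
    entity_type: str
    kb_id: str


def _split(line: str, expected: int, sep: str) -> list:
    # ASSUMPTION: fields are separated by `sep` (tab by default).
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}: {line!r}")
    return fields


def parse_ne_el_line(line: str, sep: str = "\t") -> NamedEntityLabel:
    utt, mention, start, end, etype, kb = _split(line, 6, sep)
    # ASSUMPTION: spans are integer offsets.
    return NamedEntityLabel(utt, mention, int(start), int(end), etype,
                            None if kb == NOT_ANNOTATED else kb)


def parse_el_query_line(line: str, sep: str = "\t") -> EntityLinkingQuery:
    qid, mention, utt, etype, kb = _split(line, 5, sep)
    return EntityLinkingQuery(qid, mention, utt, etype, kb)


# Made-up line for illustration only; real IDs come from the data files.
example = parse_ne_el_line("utt_001\tbill clinton\t3\t5\tPER\tNOT_ANNOTATED\n")
```

If the files turn out to use a different delimiter, pass it through the `sep` argument. Splitting on arbitrary whitespace is not safe here because a multi-word mention would be broken into several fields.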