Character Identification

Character Identification is an entity linking task that finds the global entity of each personal mention in multiparty dialogue. Let a mention be a nominal referring to a person (e.g., she, mom, Judy), and an entity be a character in a dialogue. The goal is to assign each mention to its entity, who may or may not participate in the dialogue. In the following example, the mention "mom" does not refer to any of the speakers; nonetheless, it clearly refers to a specific person, Judy Geller, who could appear in another dialogue. Identifying such mentions as real characters requires cross-document entity resolution, which makes this task challenging.

[Figure: Character Identification example]

This task is a part of the Character Mining project led by the Emory NLP research group.

Dataset

All personal mentions are annotated with their global entities. In the above example, the first mention "I" is annotated with its global entity, Ross Geller, the second mention "mom" with Judy Geller, and so on. Mention detection is first performed automatically and then corrected manually. The entity annotation is mostly crowdsourced, although many annotations were corrected manually by experts.

Statistics

For each season, episodes 1 to 19 are used for training (TRN), episodes 20 and 21 for development (DEV), and episodes 22 onward for evaluation (TST).

Dataset  Episodes  Scenes  Utterances   Tokens  Speakers  Mentions  Entities
TRN            76     987      18,789  262,650       265    36,385       628
DEV             8     122       2,142   28,523        48     3,932       102
TST            13     192       3,597   50,232        91     7,050       165
Total          97   1,301      24,528  341,405       331    47,367       781
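
The episode-based split described above can be sketched in a few lines of Python. This is only an illustration, not part of the released data tools: the helper name `split_for` is hypothetical, and it assumes that the `e` field of an ID such as "s01_e01_c01_u039" is the episode number within its season.

```python
import re

# Assumption: IDs look like "s01_e01_c01_u039" and the "e" field
# holds the episode number within the season.
_EPISODE = re.compile(r"_e(\d+)")


def split_for(utterance_id: str) -> str:
    """Return "TRN", "DEV", or "TST" for an utterance ID (hypothetical helper)."""
    match = _EPISODE.search(utterance_id)
    if match is None:
        raise ValueError(f"no episode field in {utterance_id!r}")
    episode = int(match.group(1))
    if episode <= 19:
        return "TRN"  # episodes 1 to 19
    if episode <= 21:
        return "DEV"  # episodes 20 and 21
    return "TST"      # episodes 22 onward


print(split_for("s01_e01_c01_u039"))
```

Keying the split on the episode number alone keeps it identical across seasons, which matches the per-season rule stated above.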

Annotation

Each utterance is split into sentences, and the personal mentions in every sentence are annotated with their entities. In the example below, the utterance consists of one sentence containing four mentions. The first three mentions, I, mom, and dad, are singular and refer to Ross Geller, Judy Geller, and Jack Geller, respectively. The last mention, they, is plural and refers to both Judy Geller and Jack Geller.

{
  "utterance_id": "s01_e01_c01_u039",
  "speakers": ["Ross Geller"],
  "transcript": "I told mom and dad last night, they seemed to take it pretty well.",
  "tokens": [
    ["I", "told", "mom", "and", "dad", "last", "night", ",", "they", "seemed", "to", "take", "it", "pretty", "well", "."]
  ],
  "character_entities": [
    [[0, 1, "Ross Geller"], [2, 3, "Judy Geller"], [4, 5, "Jack Geller"], [8, 9, "Jack Geller", "Judy Geller"]]
  ]
}

Each mention is annotated by the following scheme:

[begin_index, end_index, entity(, entity)*]
  • begin_index: int - the beginning token index of the mention (inclusive).
  • end_index: int - the ending token index of the mention (exclusive).
  • entity: str - the label of the entity.
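
As a minimal sketch of how this scheme can be read, the Python snippet below walks over the example utterance shown earlier and pairs each mention's text with its entity labels. The function name `iter_mentions` is hypothetical, and the snippet assumes that `tokens` and `character_entities` are parallel lists with one entry per sentence, as in the example.

```python
import json

# The example utterance from this section, embedded as a JSON string.
record = json.loads("""
{
  "utterance_id": "s01_e01_c01_u039",
  "speakers": ["Ross Geller"],
  "transcript": "I told mom and dad last night, they seemed to take it pretty well.",
  "tokens": [
    ["I", "told", "mom", "and", "dad", "last", "night", ",", "they",
     "seemed", "to", "take", "it", "pretty", "well", "."]
  ],
  "character_entities": [
    [[0, 1, "Ross Geller"], [2, 3, "Judy Geller"], [4, 5, "Jack Geller"],
     [8, 9, "Jack Geller", "Judy Geller"]]
  ]
}
""")


def iter_mentions(utterance):
    """Yield (mention_text, entity_labels) pairs (hypothetical helper)."""
    # Assumption: sentence i in "tokens" aligns with entry i in "character_entities".
    for tokens, mentions in zip(utterance["tokens"], utterance["character_entities"]):
        for begin, end, *entities in mentions:
            # begin is inclusive and end is exclusive, so a plain slice works.
            yield " ".join(tokens[begin:end]), entities


mentions = list(iter_mentions(record))
for text, entities in mentions:
    print(text, "->", entities)
```

Because a mention may carry more than one entity, the star-unpacking collects every label after the two indices into a list, so singular and plural mentions are handled by the same code path.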

Citation

References

Shared Task

Contact

Contributors

jdchoi77
