

Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation

This repository contains the code for replicating the experiments in Jorge Martinez-Gil's paper, Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation. Further details are available in the arXiv preprint and in a Medium article.

๐ŸŒ Overview

Word sense disambiguation (WSD) is a core task in Natural Language Processing (NLP). It consists of determining which sense of an ambiguous word is intended in a given context, which matters for applications such as machine translation and information retrieval.

This repository implements an unsupervised approach to WSD based on context-aware semantic similarity:

  1. Preprocessing: Clean and prepare your text data.
  2. Context Extraction: Identify the context surrounding the ambiguous word.
  3. Semantic Similarity: Encode the context and each candidate sense with pre-trained sentence embeddings and compare them using cosine similarity.
  4. Sense Selection: Choose the sense with the highest similarity score.
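The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: `embed` is a toy bag-of-words stand-in for a pre-trained sentence encoder such as BERT or USE, and the sense descriptions in `senses` are hypothetical glosses written for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a pre-trained sentence encoder (assumption):
    lowercases the text and returns a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_descriptions, encoder=embed):
    """Score every candidate sense against the context and
    return the best-scoring sense together with all scores."""
    context_vec = encoder(context)
    scores = {
        sense: cosine(context_vec, encoder(description))
        for sense, description in sense_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical sense glosses for the ambiguous word "java".
senses = {
    "island": "an island of Indonesia in the ocean",
    "programming language": "an object-oriented programming language used to write software",
}
sentence = (
    "Typed object-oriented programming languages, such as java and c++, "
    "often do not support first-class methods"
)

best, scores = disambiguate(sentence, senses)
print(best)  # programming language
```

Passing a different `encoder` callable is how a real sentence-embedding model would be plugged in, provided `cosine` is adapted to the vector type that model returns.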

Included are the necessary code, pre-trained embeddings, and test data for thorough evaluation.

๐Ÿ› ๏ธ Installation

pip install -r requirements.txt

Dataset

The CoarseWSD-20 dataset, a well-known resource for coarse-grained WSD, forms the backbone of our experiments. It encompasses 20 commonly ambiguous words.

Usage Guide

Follow these steps to apply our method:

  1. Clone this repository.
  2. Install dependencies (refer to the installation section).
  3. Download the pre-trained word embeddings and place them in the data directory.
  4. Execute the script of your choice and observe the results in your console.

๐Ÿ“ Evaluation

Evaluate our approach using the provided test data.

Unsupervised Word Sense Disambiguation (UWSD):

  • python uwsd_bert.py - BERT
  • python uwsd_elmo.py - ELMo
  • python uwsd_use.py - Universal Sentence Encoder (USE)
  • python uwsd_wmd.py - Word Mover's Distance (WMD)

Context-Aware Semantic Similarity (CASS):

  • python cass-wordnet+bert.py - CASS using WordNet and BERT
  • python cass-word2vec+bert.py - CASS using word2vec and BERT
  • python cass-webscrapping+bert.py - CASS using web scraping and BERT

Example Scenario UWSD:

  • "Typed object-oriented programming languages, such as java and c++, often do not support first-class methods" (target word: java; candidate senses: island, programming language)
    • uwsd_bert: programming language
    • uwsd_elmo: programming language
    • uwsd_use: programming language
    • uwsd_wmd: programming language
    • ChatGPT-4: programming language

Example Scenario CASS:

  • Vienna is a nice city situated in the center of the European continent.
    • cass-wordnet+bert: middle
    • cass-word2vec+bert: hub
    • cass-webscrapping+bert: mid
    • ChatGPT-4: middle

Performance Results

Disambiguation results on the CoarseWSD-20 dataset:

Strategy        Hits     Accuracy
UWSD+BERT       7,927    77.74%
MFS-Baseline    7,487    73.43%
UWSD+USE        7,335    71.94%
UWSD+ELMo       7,010    68.75%
UWSD+WMD        5,868    57.55%
RO-Baseline     4,459    43.73%
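The Hits and Accuracy columns can be read as the number of correctly disambiguated instances and that number divided by the total number of test instances. The sketch below shows this computation on invented toy labels. It also assumes that "MFS" stands for a most-frequent-sense baseline, which the table itself does not spell out. The helper names `evaluate` and `most_frequent_sense` are hypothetical and do not come from the repository.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return (hits, accuracy) for two equally long lists of sense labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(g == p for g, p in zip(gold, predicted))
    accuracy = hits / len(gold) if gold else 0.0
    return hits, accuracy


def most_frequent_sense(train_labels):
    """Assumed meaning of the MFS baseline: always predict the sense
    that occurs most often in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


# Invented toy data, for illustration only.
train_labels = ["programming language", "programming language", "island"]
gold = ["programming language", "island", "programming language", "programming language"]
predicted = ["programming language", "programming language", "programming language", "island"]

hits, accuracy = evaluate(gold, predicted)
print(f"model: {hits} hits, {accuracy:.2%}")  # model: 2 hits, 50.00%

mfs = most_frequent_sense(train_labels)
baseline = [mfs] * len(gold)
b_hits, b_accuracy = evaluate(gold, baseline)
print(f"baseline: {b_hits} hits, {b_accuracy:.2%}")  # baseline: 3 hits, 75.00%
```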

Citation

If you use this work, please cite:

@article{martinez2023b,
  author     = {Jorge Martinez-Gil},
  title      = {Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation},
  journal    = {CoRR},
  volume     = {abs/2305.03520},
  year       = {2023},
  url        = {https://arxiv.org/abs/2305.03520},
  doi        = {10.48550/arXiv.2305.03520},
  eprinttype = {arXiv},
  eprint     = {2305.03520}
}

License

Released under the MIT License.
