computational-stylistic-variations

Stylistic Variations in Distributional Vector Space Models

This repository contains implementations for the following papers:

  1. Xing Niu and Marine Carpuat. "Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases". Workshop on Stylistic Variation at EMNLP 2017.
@InProceedings{niu-carpuat:2017:StyVa,
  author    = {Niu, Xing  and  Carpuat, Marine},
  title     = {Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases},
  booktitle = {Proceedings of the Workshop on Stylistic Variation},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  pages     = {20--27}
}
  2. Xing Niu, Marianna Martindale, and Marine Carpuat. "A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output". EMNLP 2017.
@InProceedings{niu-martindale-carpuat:2017:EMNLP2017,
  author    = {Niu, Xing  and  Martindale, Marianna  and  Carpuat, Marine},
  title     = {A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output},
  booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  pages     = {2804--2809}
}

Dependencies

Usage Instructions

  1. Set up parameters and pointers in global.cfg.
  • See the Evaluation section below for guidance on parameter settings.
  • Choose word2vec-based models (e.g., SVM-W2V-subspace) for ranking purposes.
  • Choose LSA-based models (e.g., PCA-LSA) for scoring purposes.
  2. Initialize and test.
> bash formality/evaluate.sh
  3. Calculate lexical formality scores for lines of text.
> bash formality/calc-formality-score.sh -i input-file -o output-file -p -s
Usage: calc-formality-score.sh -i INPUT_FILE -o OUTPUT_FILE [-s] [-l] [-p]
Optional arguments:
  -i INPUT_FILE    input file (absolute path)
  -o OUTPUT_FILE   output file (absolute path)
  -s               sort lines by formality score
  -l               only output lexical scores
  -p               preprocess input file (tokenization and lowercasing)
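Because the script expects absolute paths for `-i` and `-o`, a small shell wrapper can resolve relative paths before calling it. The snippet below is a hypothetical helper, not part of this repository: the function names `abs_path` and `score_formality` and the `REPO_DIR` variable are assumptions, and the wrapper simply passes `-p -s` as in step 3.

```bash
# Hypothetical helpers (not part of this repository). Source this file from
# the repository root, or set REPO_DIR to your checkout first.
: "${REPO_DIR:=$PWD}"

# Print the absolute, symlink-resolved path of $1. The file itself need not
# exist (an output file usually does not yet), but its directory must.
abs_path() {
  local dir base
  dir=$(cd -- "$(dirname -- "$1")" && pwd -P) || return
  base=$(basename -- "$1")
  printf '%s/%s\n' "${dir%/}" "$base"
}

# Run calc-formality-score.sh with preprocessing (-p) and sorting (-s),
# converting INPUT_FILE and OUTPUT_FILE to the absolute paths it expects.
score_formality() {
  if [ "$#" -ne 2 ]; then
    echo "usage: score_formality INPUT_FILE OUTPUT_FILE" >&2
    return 2
  fi
  local input output
  input=$(abs_path "$1") || return
  output=$(abs_path "$2") || return
  bash "$REPO_DIR/formality/calc-formality-score.sh" -i "$input" -o "$output" -p -s
}
```

For example, after sourcing the snippet from the repository root, `score_formality sentences.txt sentences.scored` (hypothetical file names) runs the script on the resolved absolute paths. Drop `-p` from the wrapper if your input is already tokenized and lowercased.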

Evaluation

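| Method | VSM | Dimension | PCA-Data | Sub-Dim | CTRW Accuracy | BEAN Spearman's r | BEAN RMSE |
|---|---|---|---|---|---|---|---|
| SVM | W2V | 10 | | | 0.776 | 0.566 | 0.424 |
| PCA | W2V | 10 | | | 0.770 | 0.656 | 0.390 |
| SimDiff | W2V | 10 | | | 0.780 | 0.646 | 0.404 |
| SVM | W2V | 300 | ppdb | 20 | 0.844 | 0.662 | 0.372 |
| PCA | W2V | 300 | ppdb | 20 | 0.829 | 0.660 | 0.389 |
| SimDiff | W2V | 300 | ppdb | 20 | 0.832 | 0.662 | 0.386 |
| SVM | W2V | 300 | seed | 20 | 0.801 | 0.576 | 0.384 |
| PCA | W2V | 300 | seed | 20 | 0.768 | 0.653 | 0.377 |
| SimDiff | W2V | 300 | seed | 20 | 0.781 | 0.658 | 0.364 |
| SVM | LSA | 10 | | | 0.737 | 0.661 | 0.361 |
| PCA | LSA | 10 | | | 0.730 | 0.655 | 0.352 |
| SimDiff | LSA | 10 | | | 0.780 | 0.646 | 0.353 |
| SVM | LSA | 300 | ppdb | 20 | 0.712 | 0.457 | 0.641 |
| PCA | LSA | 300 | ppdb | 20 | 0.671 | 0.498 | 0.545 |
| SimDiff | LSA | 300 | ppdb | 20 | 0.686 | 0.492 | 0.563 |
| SVM | LSA | 300 | seed | 20 | 0.727 | 0.481 | 0.575 |
| PCA | LSA | 300 | seed | 20 | 0.699 | 0.522 | 0.513 |
| SimDiff | LSA | 300 | seed | 20 | 0.714 | 0.524 | 0.526 |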
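  • VSM: Vector Space Model
  • W2V: word2vec
  • LSA: Latent Semantic Analysis
  • SVM: Support Vector Machine
  • PCA: Principal Component Analysis
  • PPDB: Paraphrase Database
  • Dimension: dimensionality of the VSM
  • Sub-Dim: subspace dimension
  • CTRW: Choose the Right Word, see paper 1
  • BEAN: Blog, Email, Answers and News, see paper 2
  • RMSE: Root-Mean-Square Error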
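The README does not spell out how the BEAN metrics are computed. As a rough illustration, assuming the standard definitions, RMSE is the square root of the mean squared difference between predicted and gold scores, and Spearman's r is the Pearson correlation of their ranks, with tied values sharing the average of the ranks they occupy. The awk-based sketch below computes both; the function name `eval_scores` and the two-column `predicted gold` input format are assumptions, and it is not the repository's evaluation code (any score normalization used in the papers is not reproduced).

```bash
# Sketch (not the repository's evaluation code): compute RMSE and Spearman's
# rank correlation from lines of "predicted gold" score pairs, read from the
# files given as arguments or from stdin.
eval_scores() {
  awk '
    # Average rank of v[i] among v[1..n]: tied values share the mean of the
    # ranks they occupy. O(n) per call, so O(n^2) overall.
    function avg_rank(v, n, i,    j, less, eq) {
      less = 0; eq = 0
      for (j = 1; j <= n; j++) {
        if (v[j] < v[i]) less++
        else if (v[j] == v[i]) eq++
      }
      return less + (eq + 1) / 2
    }
    # Keep lines with at least two fields; "+ 0" forces numeric values.
    NF >= 2 { n++; p[n] = $1 + 0; g[n] = $2 + 0 }
    END {
      if (n < 2) { print "eval_scores: need at least two pairs" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) {
        se += (p[i] - g[i]) ^ 2          # squared error for RMSE
        rp[i] = avg_rank(p, n, i); rg[i] = avg_rank(g, n, i)
        sp += rp[i]; sg += rg[i]
      }
      mp = sp / n; mg = sg / n
      # Spearman = Pearson correlation computed on the ranks.
      for (i = 1; i <= n; i++) {
        cov += (rp[i] - mp) * (rg[i] - mg)
        vp += (rp[i] - mp) ^ 2; vg += (rg[i] - mg) ^ 2
      }
      rmse = sqrt(se / n)
      if (vp == 0 || vg == 0)   # constant ranks: correlation is undefined
        printf "rmse=%.3f spearman=undefined\n", rmse
      else
        printf "rmse=%.3f spearman=%.3f\n", rmse, cov / sqrt(vp * vg)
    }
  ' "$@"
}
```

For example, `printf '1 1\n2 2\n3 3\n' | eval_scores` prints `rmse=0.000 spearman=1.000`. Because each rank is found by a linear scan, the sketch is quadratic in the number of pairs; that is fine for test sets of a few thousand sentences, but a sort-based ranking would scale better.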
