uiuc-wmt15

Submission for WMT 2015

Vignesh Raja and Yisi Liu

Results

Toy experiment

  • Used 1000 sentences from Europarl parallel corpus
  • Tested on News test 2013
test: 2.22 (0.876) BLEU-c ; 2.35 (0.876) BLEU

Baseline experiments

  • Used full Europarl parallel corpus (~600,000 sentences)
  • Tested on News test 2013
test: 17.72 (0.983) BLEU-c ; 18.59 (0.983) BLEU
  • Used Common Crawl corpus and News Commentary v10.
  • Tested on News test 2013
test: 19.83 (0.982) BLEU-c ; 20.69 (0.982) BLEU

Stem Czech words

  • Used hard stemming with a length of 6
  • Used full Europarl parallel corpus (~600,000 sentences)
  • Tested on News test 2013
test: 17.08 (0.948) BLEU-c ; 17.88 (0.948) BLEU
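
The stemming script itself is not shown here. As a minimal sketch, assuming "hard-stemming length of 6" means truncating every token to its first 6 characters, the pre-processing could look like the following. The function names `hard_stem` and `stem_line` are hypothetical and chosen only for illustration.

```python
import unicodedata


def hard_stem(token: str, length: int = 6) -> str:
    """Truncate a token to its first `length` characters.

    The token is NFC-normalized first so that an accented Czech letter
    counts as one character rather than a base letter plus a combining mark.
    """
    return unicodedata.normalize("NFC", token)[:length]


def stem_line(line: str, length: int = 6) -> str:
    """Apply hard stemming to every whitespace-separated token in a line."""
    return " ".join(hard_stem(tok, length) for tok in line.split())


if __name__ == "__main__":
    # Assumption: the input is already tokenized, one sentence per line.
    print(stem_line("nejkrásnější města jsou v evropě"))
```

Tokens shorter than the cut-off pass through unchanged, so only longer, more heavily inflected forms are collapsed.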

Use Morfessor during pre-processing to extract morphemes of Czech words

  • Trained Morfessor on toy data (1000 Czech sentences from Europarl)
  • Trained Moses on 1000 sentences from Europarl
  • Tested on News test 2013
test: 1.00 (1.489) BLEU-c ; 1.06 (1.489) BLEU
  • Trained Morfessor on full Europarl Czech corpus
  • Trained Moses on 1000 sentences from Europarl
  • Tested on News test 2013
test: 1.02 (1.253) BLEU-c ; 1.09 (1.253) BLEU
  • Trained Morfessor on full Europarl Czech corpus
  • Trained Moses on full Europarl parallel corpus
  • Tested on News test 2013
test: 15.74 (1.071) BLEU-c ; 16.48 (1.071) BLEU
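
For illustration, the morpheme-splitting step can be sketched as below. This is an assumption-laden sketch, not the authors' script: it assumes the Morfessor 2.0 Python package and a binary model saved beforehand, the path `model.bin` is hypothetical, and writing the morphemes as plain space-separated tokens is only one possible output convention. The import is done lazily so the line-level helper can be tried without Morfessor installed.

```python
from typing import Callable, List


def load_morfessor_segmenter(model_path: str) -> Callable[[str], List[str]]:
    """Return a word -> morphemes function backed by a trained Morfessor model."""
    import morfessor  # third-party package, imported only when needed

    model = morfessor.MorfessorIO().read_binary_model_file(model_path)

    def segment(word: str) -> List[str]:
        # viterbi_segment returns (list of morphs, cost); the cost is ignored here.
        morphs, _cost = model.viterbi_segment(word)
        return morphs

    return segment


def segment_line(line: str, segment_word: Callable[[str], List[str]]) -> str:
    """Replace each token in a line by its morphemes, separated by spaces."""
    pieces: List[str] = []
    for token in line.split():
        pieces.extend(segment_word(token))
    return " ".join(pieces)


if __name__ == "__main__":
    # Toy stand-in segmenter (hypothetical): split after the third character.
    def toy(word: str) -> List[str]:
        return [word[:3], word[3:]] if len(word) > 3 else [word]

    print(segment_line("abcdef gh", toy))
    # With a real model (hypothetical path):
    # segment = load_morfessor_segmenter("model.bin")
    # print(segment_line("některá česká slova", segment))
```

Splitting words this way makes the Czech side longer, so the output would have to be re-joined before scoring against unsegmented references.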

Score 100-best list with POS intersection

  • Trained on the full Europarl data plus Common Crawl and News Commentary
  • Tested on News test 2014
  • Scored the 100 candidate outputs for each source sentence and output the one with the largest POS intersection
Total TER: 0.6890355884250757 (41646.0/60441.0)
NIST score = 6.6273
BLEU score (mteval) = 0.2056
BLEU score (multibleu) = 17.51
BLEU-c (multibleu) = 15.19
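
The selection step described above can be sketched as follows. The exact definition of the intersection is not given here, so this sketch assumes it is the multiset overlap between the POS tags of the source sentence and those of each candidate translation, with both sides tagged in a shared tag set. The names `pos_overlap` and `pick_best` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pos_overlap(source_tags: Sequence[str], candidate_tags: Sequence[str]) -> int:
    """Size of the multiset intersection of two POS tag sequences."""
    common = Counter(source_tags) & Counter(candidate_tags)
    return sum(common.values())


def pick_best(
    source_tags: Sequence[str],
    candidates: List[Tuple[str, Sequence[str]]],
) -> str:
    """Return the candidate text whose tags overlap most with the source.

    `candidates` holds (text, tags) pairs in decoder rank order. max() keeps
    the first maximum, so ties fall back to the decoder's own ranking.
    """
    best_text, _tags = max(candidates, key=lambda c: pos_overlap(source_tags, c[1]))
    return best_text


if __name__ == "__main__":
    source = ["DET", "NOUN", "VERB"]
    nbest = [
        ("candidate one", ["NOUN"]),
        ("candidate two", ["DET", "NOUN", "VERB"]),
        ("candidate three", ["DET", "NOUN", "VERB", "ADJ"]),
    ]
    print(pick_best(source, nbest))
```

Because ties resolve to the higher-ranked candidate, a sentence whose candidates all score the same simply keeps the decoder's 1-best output.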

HOWTO translate using our model

  • Tokenize (this command only extracts the text of each <seg> element from the SGM file):
perl -ne 'print $1."\n" if /<seg[^>]+>\s*(.*\S)\s*<.seg>/i;' < SRC.sgm > SRC.tok
  • Lowercase:
/opt/moses/scripts/tokenizer/lowercase.perl < SRC.tok > SRC.input
  • Filter the model for the input:
perl filter-model-given-input.pl FILTERED_DIR ~/DIR_YOU_TRAINED_YOUR_MODEL/tuning/moses.tuned.ini.* SRC.input
  • Run Moses:
/opt/moses/bin/moses -f FILTERED_DIR/CONFIG_FILE -i THE_FILE_IN_FILTERED_DIR > OUTPUT_FILE
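
For readers who want to see what the input preparation does, here is a rough Python sketch of the first two steps. It assumes the Perl one-liner keeps only the text inside `<seg ...>` elements, one per line, and that `lowercase.perl` amounts to plain lowercasing. It does not replace the filtering or decoding steps, and the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Same pattern as the Perl one-liner, with the closing tag written out.
SEG_RE = re.compile(r"<seg[^>]+>\s*(.*\S)\s*</seg>", re.IGNORECASE)


def extract_segments(sgm_lines: Iterable[str]) -> Iterator[str]:
    """Yield the trimmed text of every <seg ...> element, one per input line."""
    for line in sgm_lines:
        match = SEG_RE.search(line)
        if match:
            yield match.group(1)


def prepare_input(sgm_lines: Iterable[str]) -> List[str]:
    """Extract segment text and lowercase it, mirroring SRC.sgm -> SRC.input."""
    return [segment.lower() for segment in extract_segments(sgm_lines)]


if __name__ == "__main__":
    sample = [
        '<doc docid="example">',
        '<seg id="1">  Hello World . </seg>',
        "</doc>",
    ]
    for sentence in prepare_input(sample):
        print(sentence)
```

Like the original pattern, this skips lines without a `<seg>` element and any `<seg>` tag that carries no attributes.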
