
deep-text-corrector's Introduction

This model is adapted from Deep Text Corrector.

Train the model:

python3 train.py

Use the model:

python3 test.py

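Model requirements

Correct six types of errors in English sentences:

  • Preposition errors
  • Noun errors
  • Article errors
  • Verb errors
  • Predicate errors
  • Punctuation errors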

Runtime environment

  • Python 3.6
  • TensorFlow 1.10
  • NLTK 3.3
  • NumPy 1.13

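Model and parameters

The model is mainly sequence-to-sequence with an attention mechanism; for details, see Bahdanau et al., 2014.

Size and number of RNN layers in the encoder and decoder: 512 units per layer, 4 layers

RNN cell: LSTM

Maximum number of words (tokens), i.e. the vocabulary size: 2000

Because a strategy for handling OOV (out-of-vocabulary) words is used, a small vocabulary is sufficient; see Addressing the Rare Word Problem in NMT.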
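To illustrate why OOV handling allows a small vocabulary, here is a minimal sketch under simplifying assumptions. It is not the project's implementation: the names UNK, encode and restore_unknowns are hypothetical, and unknown tokens are simply copied back from the source in order, whereas the cited paper relies on alignment information.

```python
UNK = "<unk>"  # hypothetical placeholder token for out-of-vocabulary words


def encode(tokens, vocab):
    """Replace every token that is not in the vocabulary with UNK."""
    return [t if t in vocab else UNK for t in tokens]


def restore_unknowns(source_tokens, decoded_tokens, vocab):
    """Fill each UNK in the decoder output with the next OOV word of the source.

    Assumption: OOV words keep their relative order between input and output,
    which is plausible for correction, where most words are left unchanged.
    """
    oov_words = [t for t in source_tokens if t not in vocab]
    restored = []
    next_oov = 0
    for token in decoded_tokens:
        if token == UNK and next_oov < len(oov_words):
            restored.append(oov_words[next_oov])
            next_oov += 1
        else:
            restored.append(token)
    return restored
```

With this kind of post-processing, rare words such as proper names never need their own vocabulary entry, since the model only has to decide where an unknown word belongs.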

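Dataset

Data processing

Training data is essentially obtained by adding random "noise" to correct sentences, with the following cases handled separately:

  • Prepositions, articles, predicates, punctuation: randomly replaced with another word of the same kind, or removed
  • Verbs: the tense is randomly changed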
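The replace-or-remove rule above can be sketched roughly as follows. This is an illustrative sketch, not the code in corrector/noise.py: the word lists, the add_noise function and the probability p are all assumptions, and only prepositions and articles are covered.

```python
import random

# Hypothetical word lists for illustration; the project may use different ones.
PREPOSITIONS = ["in", "on", "at", "for", "with", "to", "of"]
ARTICLES = ["a", "an", "the"]


def add_noise(tokens, rng, p=0.3):
    """Return a noisy copy of tokens.

    Each preposition or article is, with probability p, either dropped or
    swapped for a different word of the same class.
    """
    noisy = []
    for tok in tokens:
        for group in (PREPOSITIONS, ARTICLES):
            if tok.lower() in group and rng.random() < p:
                if rng.random() < 0.5:
                    tok = None  # drop the word
                else:
                    tok = rng.choice([w for w in group if w != tok.lower()])
                break
        if tok is not None:
            noisy.append(tok)
    return noisy


rng = random.Random(0)  # seeded so that runs are repeatable
clean = "the cat sat on a mat".split()
print(add_noise(clean, rng, p=1.0))
```

Each (noisy, clean) pair produced this way can then serve as one (input, target) training example.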

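Error labeling

The original sentence and the corrected sentence are each POS-tagged with NLTK; the differing (corrected) words are then found greedily, and the error type is determined from each word's part of speech.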
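The labeling procedure can be sketched roughly as below. This is a sketch under stated assumptions, not the project's code: label_errors and error_type are hypothetical names, the mapping from Penn Treebank tags to error types is a guess (predicate errors are not distinguished here), and the input is assumed to be already tagged as (word, tag) pairs.

```python
# Hypothetical mapping from Penn Treebank tag prefixes to error types.
TAG_PREFIX_TO_ERROR = [
    ("IN", "preposition"),
    ("DT", "article"),
    ("NN", "noun"),
    ("VB", "verb"),
]
PUNCT_TAGS = {".", ",", ":", "``", "''"}


def error_type(tag):
    """Map a POS tag to a coarse error type."""
    if tag in PUNCT_TAGS:
        return "punctuation"
    for prefix, name in TAG_PREFIX_TO_ERROR:
        if tag.startswith(prefix):
            return name
    return "other"


def label_errors(src, tgt):
    """Greedily align two tagged sentences and label each difference.

    src and tgt are lists of (word, tag) pairs. Returns a list of
    (source_word, corrected_word, error_type); None marks a missing word.
    """
    errors = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        if src[i][0] == tgt[j][0]:
            i += 1
            j += 1
        elif i + 1 < len(src) and src[i + 1][0] == tgt[j][0]:
            # Extra word in the source that the correction removed.
            errors.append((src[i][0], None, error_type(src[i][1])))
            i += 1
        elif j + 1 < len(tgt) and src[i][0] == tgt[j + 1][0]:
            # Word missing from the source that the correction inserted.
            errors.append((None, tgt[j][0], error_type(tgt[j][1])))
            j += 1
        else:
            # Substitution: classify by the tag of the corrected word.
            errors.append((src[i][0], tgt[j][0], error_type(tgt[j][1])))
            i += 1
            j += 1
    # Anything left over on either side is an unmatched deletion or insertion.
    errors.extend((w, None, error_type(t)) for w, t in src[i:])
    errors.extend((None, w, error_type(t)) for w, t in tgt[j:])
    return errors
```

In practice the (word, tag) pairs could come from nltk.pos_tag(nltk.word_tokenize(sentence)), which requires the NLTK tokenizer and tagger data to be downloaded first.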

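Code structure

  • corrector/corrector.py defines the Corrector class, which wraps the functions for training and for correcting sentences; see train.py and test.py for usage
  • corrector/data_reader.py and text_corrector_data_readers.py read and process the datasets
  • corrector/text_corrector_models.py implements the seq2seq model
  • corrector/seq2seq.py is the seq2seq package from an older version of TensorFlow
  • corrector/noise.py provides functions for adding noise to text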

