Code Monkey home page Code Monkey logo

cnn-question-classification-keras's Introduction

Recurrent Convolutional Neural Networks for Chinese Question Classification on BQuLD

Architecture Overview

Alt text

For more details Click Here.

Bilingual Question Labelling Dataset (BQuLD)

This dataset is a bilingual (traditional Chinese & English) question labelling dataset designed for NLP researchers.
It originally consists of 1216 pairs of question and question label, which first published by the author of this GitHub tim5go
There are 9 question types in total, namely:

  1. NUMBER
  2. PERSON
  3. LOCATION
  4. ORGANIZATION
  5. ARTIFACT
  6. TIME
  7. PROCEDURE
  8. AFFIRMATION
  9. CAUSALITY

Embedding Preparation

In my experiment, I built a word2vec model on 全網新聞數據(SogouCA) Sogou Labs

For example, in Linux:

  1. clean XML tag
$ cat news_tensite_xml.dat | iconv -f gbk -t utf-8 -c | grep "<content>" 
  | sed 's\<content>\\' | sed 's\</content>\\' > corpus.txt
  1. word segmentation using LTP command line
$ cws_cmdline --threads 4 --input corpus.txt --segmentor-model cws.model > corpus.seg.txt
  1. simplified to traditional Chinese conversion using OpenCC
$ opencc -i corpus.seg.txt -o corpus_trad.txt -c s2t.json
  1. word2Vec training using Google Word2vec
$ nohup ./word2vec -train corpus_trad.txt -output sogou_vectors.bin -cbow 0 
  -size 200 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 24 -binary 1 -iter 20 -min-count 1 &

Result

Training Loss Training Accuracy Validation Loss Validation Accuracy
0.7000 87.11% 0.8945 77.87%

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.