
vietnamese-poem-classifier's Introduction

Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifier built on BertForSequenceClassification, with an accuracy of 99.7%

This is a side project developed while building our Vietnamese poem generator

Features

  • Classify Vietnamese poems into five categories: 4 chu, 5 chu, 7 chu, luc bat and 8 chu
  • Score the quality of each poem based solely on how well it conforms to the strict rules of its genre, using three criteria, Length (L), Tone (T) and Rhyme (R): score = L/10 + 3T/10 + 6R/10
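The weighting in the score formula can be sketched in a few lines of Python. The function name `poem_score` and the assumption that each component score lies in [0, 1] are illustrative and not taken from the package's source.

```python
def poem_score(l_score: float, t_score: float, r_score: float) -> float:
    """Combine length, tone and rhyme scores as L/10 + 3T/10 + 6R/10."""
    # Assumption: each component is a fraction between 0 and 1.
    for name, value in (("l_score", l_score), ("t_score", t_score), ("r_score", r_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (l_score + 3 * t_score + 6 * r_score) / 10


# Perfect length and tone, half of the rhymes satisfied.
print(poem_score(1.0, 1.0, 0.5))  # prints 0.7
```

Rhyme carries 60% of the weight, so a poem with flawless length and tone but no valid rhymes would score at most 0.4 under this formula.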

The rules for each genre are defined below:

Genre: 4 chu
Length:
- 4 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng)
- Vice versa
Rhyme, on the last (4th) word of each line:
- Continuous rhyme (gieo vần tiếp)
- Alternating rhyme (gieo vần tréo)
- Three-line rhyme (gieo vần ba)

Genre: 5 chu
Length:
- 5 words per line
- 4 lines per stanza (optional)
Tone:
- Same as "4 chu"
Rhyme:
- Same as "4 chu"

Genre: 7 chu
Length:
- 7 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc)
- The 5th word and the last word (7th) must have different tones
Rhyme:
- The last words of the 1st, 2nd and 4th lines of each stanza must have the same tone and rhyme

Genre: luc bat
Length:
- 6 words in odd lines
- 8 words in even lines
- 4 lines per stanza (optional)
Tone, for each 6-word line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc)
Tone, for each 8-word line:
- Must follow the same pattern as the previous 6-word line
- The last word (8th) must have the same tone as the 6th word but a different accent
Rhyme:
- The last word (6th) of a 6-word line must rhyme with the 6th word of the next 8-word line and the 8th word of the previous 8-word line

Genre: 8 chu
Length:
- 8 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 3rd word is uneven (trắc), the 5th word is even (bằng) and the 8th word is uneven (trắc)
Rhyme:
- Same as "4 chu"
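As a rough illustration of how the length and tone rules could be checked, the sketch below classifies a syllable as even (bằng) or uneven (trắc) from its tone mark and then tests the "4 chu" tone rule. It relies on the standard convention that the sắc, hỏi, ngã and nặng tones are trắc while unmarked and huyền syllables are bằng. The helper names (`tone_class`, `length_ok`, `four_chu_tone_ok`) are hypothetical, the sample lines are made up, and this is not the package's actual implementation.

```python
import unicodedata

# Combining marks of the four uneven (trắc) tones: sắc, hỏi, ngã, nặng.
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}

# Words per line for the fixed-length genres listed above.
WORDS_PER_LINE = {"4 chu": 4, "5 chu": 5, "7 chu": 7, "8 chu": 8}


def tone_class(word: str) -> str:
    """Return 'trac' (uneven) or 'bang' (even) for one Vietnamese syllable."""
    # NFD splits each letter into a base character plus combining marks,
    # so the tone mark can be detected independently of the vowel shape.
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"


def length_ok(line: str, genre: str) -> bool:
    """Check that a line has the word count required by a fixed-length genre."""
    return len(line.split()) == WORDS_PER_LINE[genre]


def four_chu_tone_ok(line: str) -> bool:
    """4 chu rule: the 2nd and 4th words must have opposite tone classes."""
    words = line.split()
    if len(words) != 4:
        return False
    return tone_class(words[1]) != tone_class(words[3])


# Made-up sample lines, used only to exercise the helpers.
print(four_chu_tone_ok("mưa rơi trên lá"))
print(four_chu_tone_ok("mưa rơi trên cành"))
```

A full scorer would also need the rhyme check and the alternating 6/8 line lengths of luc bat, both of which are left out of this sketch.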

Data

A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download here

For more details, see the Acknowledgments section

Training

The training code is in our Vietnamese poem generator repo

Run:

python poem_classifier_training.py

Installation

pip install vietnamese-poem-classifier

Or

pip install git+https://github.com/Anshler/vietnamese-poem-classifier

Inference

from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
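The sample output suggests that `predict` returns a list of dictionaries, one per poem. Assuming that shape and those keys, a small hypothetical helper such as `summarize` below could turn a result into a readable line. The dictionary here is copied from the sample output rather than produced by running the model.

```python
# Result copied from the sample output; no model is loaded in this sketch.
results = [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75,
            'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]


def summarize(result: dict) -> str:
    """Format one prediction as 'genre (confidence), score [components]'."""
    return (
        f"{result['label']} (confidence {result['confidence']:.2%}), "
        f"score {result['poem_score']:.2f} "
        f"[length {result['l_score']:.2f}, tone {result['t_score']:.2f}, "
        f"rhyme {result['r_score']:.2f}]"
    )


for result in results:
    print(summarize(result))
```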

Model

The model's weights are published on Hugging Face at Anshler/vietnamese-poem-classifier

Acknowledgments

This project was inspired by the evaluation method from fsoft-ailab's SP-GPT2 Poem-Generator

The dataset is also taken from their repo
