A Vietnamese poem classifier built on BertForSequenceClassification, with 99.7% accuracy.
This is a side project developed alongside our Vietnamese poem generator.
- Classify Vietnamese poems into five genres: 4 chu, 5 chu, 7 chu, luc bat and 8 chu
- Score the quality of each poem, based solely on how well it conforms to the strict rules of its Vietnamese poem genre. Three criteria are used, Length (L), Tone (T) and Rhyme (R), combined as follows:
  score = L/10 + 3T/10 + 6R/10
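As a quick check of the weighting, the sketch below applies the formula to the sub-scores reported in the usage example further down (l_score = 1.0, t_score = 1.0, r_score ≈ 0.583). The function name `poem_score` is hypothetical: it only restates the formula and is not taken from the package.

```python
def poem_score(l: float, t: float, r: float) -> float:
    """Weighted quality score: length 10%, tone 30%, rhyme 60%.

    Hypothetical helper restating score = L/10 + 3T/10 + 6R/10.
    """
    return l / 10 + 3 * t / 10 + 6 * r / 10


# Sub-scores taken from the example output in this README.
print(round(poem_score(1.0, 1.0, 0.5833333333333333), 4))  # → 0.75
```

Rhyme carries the largest weight, so a poem with perfect length and tone but imperfect rhyme, as in the example, still loses a quarter of the maximum score.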
The rules for each genre are defined below:
| Genre | Length | Tone | Rhyme |
|---|---|---|---|
| 4 chu | - 4 words per line<br>- 4 lines per stanza (optional) | For each line:<br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng)<br>- Vice versa | Last word (4th) of each line:<br>- Continuous rhyme (gieo vần tiếp)<br>- Alternating rhyme (gieo vần tréo)<br>- Three-line rhyme (gieo vần ba) |
| 5 chu | - 5 words per line<br>- 4 lines per stanza (optional) | Same as "4 chu" | Same as "4 chu" |
| 7 chu | - 7 words per line<br>- 4 lines per stanza (optional) | For each line:<br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc)<br>- The 5th word and the last word (7th) must have different tones | The last words of the 1st, 2nd and 4th lines of each stanza must share the same tone and rhyme |
| luc bat | - 6 words in odd lines<br>- 8 words in even lines<br>- 4 lines per stanza (optional) | For 6-word lines:<br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc)<br>For 8-word lines:<br>- Must follow the same pattern as the previous 6-word line<br>- The last word (8th) must have the same tone as the 6th word but a different accent | The last word (6th) of each 6-word line must rhyme with the 6th word of the next 8-word line and the 8th word of the previous 8-word line |
| 8 chu | - 8 words per line<br>- 4 lines per stanza (optional) | For each line:<br>- If the 3rd word is uneven (trắc), the 5th word is even (bằng) and the 8th word is uneven (trắc) | Same as "4 chu" |
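To make the table concrete, here is a minimal sketch of two of the checks: the length rule and the 4 chu tone rule. It assumes words are whitespace-separated syllables and classifies tones through Unicode NFD decomposition, treating sắc, hỏi, ngã and nặng as uneven (trắc) and ngang and huyền as even (bằng). This is the usual Vietnamese convention, but it is not necessarily how the package implements its scoring, and all names (`is_trac`, `check_length`, `check_4chu_tone`) are hypothetical.

```python
import unicodedata

# Hypothetical sketch, not the package's implementation.
# Combining marks of the four uneven (trắc) tones after NFD decomposition:
# sắc (acute), hỏi (hook above), ngã (tilde), nặng (dot below).
# The even (bằng) tones are ngang (no mark) and huyền (grave).
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}

# Words per line for the fixed-length genres (luc bat alternates 6 and 8).
WORDS_PER_LINE = {"4 chu": 4, "5 chu": 5, "7 chu": 7, "8 chu": 8}


def is_trac(word: str) -> bool:
    """Return True if the syllable carries an uneven (trắc) tone."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return any(ch in TRAC_MARKS for ch in decomposed)


def check_length(line: str, genre: str) -> bool:
    """Length rule: the line has the expected number of words for its genre."""
    return len(line.split()) == WORDS_PER_LINE[genre]


def check_4chu_tone(line: str) -> bool:
    """4 chu tone rule: the 2nd and 4th words differ (one trắc, one bằng)."""
    words = line.split()
    if len(words) != 4:
        return False
    return is_trac(words[1]) != is_trac(words[3])


print(check_length("Người đi theo gió", "4 chu"))  # True
print(check_4chu_tone("Người đi theo gió"))        # True: "đi" is bằng, "gió" is trắc
print(check_4chu_tone("Tôi buồn nhớ mây"))         # False: "buồn" and "mây" are both bằng
```

Decomposing to NFD separates the tone mark from vowel-quality marks such as the circumflex in "â" or the horn in "ư", so only the tone marks need to be tested.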
A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download here
For more details, see the Acknowledgments section.
The training code is in our Vietnamese poem generator repo.
Run:
```bash
python poem_classifier_training.py
```
```bash
pip install vietnamese-poem-classifier
```
Or install directly from the GitHub repository:
```bash
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
```
```python
from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()
poem = '''Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
B岷 th岷 tôi ngẫm triền miên thói đời.'''
classifier.predict(poem)
#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
```
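As the example output shows, `predict` returns a list of dictionaries, so its result can be post-processed directly, for instance to keep only generated poems of a target genre that reach a quality threshold. The sketch below works on the sample output copied from above, so it runs without the model; the helper name `keep_good_poems` and the 0.7 threshold are hypothetical choices, not part of the package.

```python
from typing import Any

# Sample output copied from the predict() call above (one dict per prediction).
predictions: list[dict[str, Any]] = [
    {"label": "luc bat", "confidence": 0.9999017715454102, "poem_score": 0.75,
     "l_score": 1.0, "t_score": 1.0, "r_score": 0.5833333333333333},
]


def keep_good_poems(preds: list[dict[str, Any]], genre: str,
                    min_score: float = 0.7) -> list[dict[str, Any]]:
    """Hypothetical filter: keep predictions of `genre` with poem_score >= min_score."""
    return [p for p in preds if p["label"] == genre and p["poem_score"] >= min_score]


print(len(keep_good_poems(predictions, "luc bat")))  # 1
print(len(keep_good_poems(predictions, "7 chu")))    # 0
```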
The model weights are published on Hugging Face at Anshler/vietnamese-poem-classifier.
This project was inspired by the evaluation method of fsoft-ailab's SP-GPT2 Poem-Generator. The dataset was also taken from their repo.