Marta Bañón's Projects
Several benchmarks on sentence splitting and language identification
Anonymizer module for Bicleaner's pipeline (WIP)
Utility that helps you ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.
Hunspell dictionaries in UTF-8
Targeted language identifier, based on FastText and Hunspell.
Dictionaries for FastSpell
Flask RESTful project example, including JWT authentication, rq asynchronous tasks, Swagger documentation, Redoc documentation, Docker deployment, uwsgi, supervisor...
Python module to interface with Java Loomchild sentence segmenter
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
Program used to split text into segments
Stopwords removal
Transform TMX to text