
truecase's Introduction

TrueCase


A language-independent, statistical tool in Python, based on language modeling, that restores case information in text.

The model is inspired by the paper "tRuEcasIng" by Lucian Vlad Lita et al., with some simplifications.

A model trained on the NLTK English corpus ships with the package by default. For other languages, a script is provided to create the model. The default model is not perfect; for the best results, train the system on a large and recent dataset (e.g. a recent dump of Wikipedia).

Prerequisites

  • Python 3

The project uses NLTK. See the NLTK documentation for installation instructions.

Installing

pip3 install truecase

Usage

Simple use case:

>>> import truecase
>>> truecase.get_true_case('hey, what is the weather in new york?')
'Hey, what is the weather in New York?'

Training your own model

TODO. For now refer to Trainer.py

Contributing

I see a lot of room for improvement. Feel free to fork and improve, and do send a pull request.

Authors

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

truecase's People

Contributors

brucewuzhang, daltonfury42, discoveredcheck, keshprad, louismartin, richecr, sergialonsaco, tannonk, tiberiuichim

truecase's Issues

Incorrect handling of contractions

correctedStr = truecase.get_true_case("I didn't get the best price.")
print(correctedStr)

I did n't get the best price.

correctedStr = truecase.get_true_case("I don't want another.")
print(correctedStr)

I do n't want another.
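
Until the split is handled inside the library, one possible workaround is to post-process the returned string. The sketch below is not part of truecase: the helper name rejoin_contractions and the list of clitics are assumptions chosen for illustration, and the pattern only covers common English contractions.

```python
import re

# Clitics that Penn Treebank-style tokenizers split off from the preceding word.
# This list is an illustrative assumption, not something taken from truecase.
_CLITICS = r"n't|'s|'re|'ve|'ll|'d|'m"
_SPLIT_CLITIC = re.compile(r"(?<=\w) (" + _CLITICS + r")\b", re.IGNORECASE)


def rejoin_contractions(text: str) -> str:
    """Remove the space inserted before a split-off clitic ("did n't" -> "didn't")."""
    return _SPLIT_CLITIC.sub(r"\1", text)


print(rejoin_contractions("I did n't get the best price."))
print(rejoin_contractions("I do n't want another."))
```

The helper would be applied to the return value of truecase.get_true_case. It leaves text without a detached clitic unchanged.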

ModuleNotFoundError

After installing the package, running

import truecase

raises:

ModuleNotFoundError: No module named 'truecase'
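
A common cause of this kind of error is that pip installed the package for a different Python interpreter than the one running the script. The following diagnostic sketch uses only the standard library; the helper name find_module_origin is hypothetical, and the snippet makes no claim about what caused this particular report.

```python
import importlib.util
import sys


def find_module_origin(name: str):
    """Return the file the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Which interpreter is running, and can it see the package?
print("interpreter:", sys.executable)
print("truecase found at:", find_module_origin("truecase"))
```

If the second line prints None, installing with the printed interpreter (for example, python3 -m pip install truecase) puts the package where that interpreter looks for it.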

Case of wanNA, gonNa and ...

Informal spoken American English words like WANNA and GONNA are converted to wanNa and gonNa.

>>> import truecase
>>> truecase.get_true_case("I DON'T WANNA GO TO SCHOOL.")
"I don't wanNA go to school."

Bug report

Thanks for making this repo.

There are bugs in __function_one and __function_two of Trainer.py.

This is a logical bug. I checked the original implementation at

https://github.com/nreimers/truecaser

That implementation first goes through the whole corpus to collect all casing information. Here, casing information is collected on the fly, so when a casing appears for the first time for a lowercased token, its 2-gram and 3-gram statistics are never counted. This is not the desired behavior for this algorithm.
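
The difference between the two strategies can be shown with a small sketch. This is a simplified illustration, not the code from Trainer.py or nreimers/truecaser: the function names are hypothetical, only bigrams are counted, and the rule that n-grams are recorded only for tokens with at least two observed casings is an assumption made for the example.

```python
from collections import Counter, defaultdict


def count_two_pass(sentences):
    """Pass 1 collects every casing per lowercased token; pass 2 counts bigrams."""
    casings = defaultdict(set)
    for sentence in sentences:
        for word in sentence:
            casings[word.lower()].add(word)

    bigrams = Counter()
    for sentence in sentences:
        for prev, word in zip(sentence, sentence[1:]):
            # Assumed gating rule: only tokens with ambiguous casing are counted.
            if len(casings[word.lower()]) >= 2:
                bigrams[(prev, word)] += 1
    return bigrams


def count_on_the_fly(sentences):
    """Single pass: casing info is only as complete as the text read so far."""
    casings = defaultdict(set)
    bigrams = Counter()
    for sentence in sentences:
        for word in sentence:
            casings[word.lower()].add(word)
        for prev, word in zip(sentence, sentence[1:]):
            if len(casings[word.lower()]) >= 2:
                bigrams[(prev, word)] += 1
    return bigrams


corpus = [
    ["we", "ate", "an", "apple"],  # only one casing of "apple" seen so far
    ["we", "visited", "Apple"],    # the second casing appears here
]

print(count_two_pass(corpus))
print(count_on_the_fly(corpus))
```

In this toy corpus the two-pass version counts both ("an", "apple") and ("visited", "Apple"), while the single-pass version misses ("an", "apple") because "apple" did not yet look ambiguous when that sentence was processed.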

module 'truecase' has no attribute 'get_true_case'

When I use the code, I get this error:

module 'truecase' has no attribute 'get_true_case'

How can I fix it?

I am running Ubuntu 20.04.4 LTS.
I have installed NLTK.
I use Python 3.7.

Thank you for your answer
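
One thing worth checking for this kind of AttributeError is whether Python is importing a different module than expected, for example a local file or directory named truecase that shadows the installed package. The sketch below is only a diagnostic under that assumption; the helper name inspect_module is hypothetical.

```python
import importlib


def inspect_module(name: str, attribute: str):
    """Import `name` and report its file path and whether `attribute` is defined."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None), hasattr(module, attribute)


try:
    path, has_attr = inspect_module("truecase", "get_true_case")
    print("imported from:", path)
    print("has get_true_case:", has_attr)
except ImportError as exc:
    print("import failed:", exc)
```

If the printed path points into the working directory rather than site-packages, renaming the local file or directory is a likely fix.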
