normalizer's People

Contributors

asdofindia, balasankarc, copyninja, diadara, harish2704, jerinphilip, jishnu7, kavyamanohar, santhoshtr, sara-02, subins2000


normalizer's Issues

Improve loading rules

  • Open the rules file with a with statement so that it is closed automatically after reading
  • Python 3 compatibility
  • Use a for loop to iterate over all lines
  • Simplify how comments are ignored
  • Simplify the rule syntax (are the "" quotes necessary?)
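
The points above could be combined along the following lines. This is only a sketch, not the project's actual loader: the names load_rules and load_rules_from_file are hypothetical, and it assumes one lhs=rhs rule per line, comments starting with #, and optional surrounding double quotes.

```python
import io


def load_rules(stream):
    """Parse 'lhs=rhs' rules from an open text stream (hypothetical helper)."""
    rules = {}
    # enumerate() supplies a 1-based line number without manual counting.
    for line_number, line in enumerate(stream, start=1):
        text = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not text or text.startswith("#"):
            continue
        parts = text.split("=")
        if len(parts) != 2:
            raise ValueError(
                "Syntax error in the rules. Line number: %d" % line_number
            )
        # Assumption: surrounding double quotes are optional and dropped.
        lhs, rhs = (part.strip().strip('"') for part in parts)
        rules[lhs] = rhs
    return rules


def load_rules_from_file(path):
    # The with statement closes the file even if parsing raises.
    # io.open accepts an encoding on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as rules_file:
        return load_rules(rules_file)
```

Keeping the parsing separate from the file handling means the parser can be exercised with an in-memory stream, without touching the real rules file.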

Include tests

There should be a test folder with various tests that check that the normalization is correct.

After that, we could also consider integrating Travis CI.
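
A first test module might look roughly like this. It is a sketch only: the normalize function here is a stand-in defined inside the example, not the project's API, and the rules are ASCII placeholders rather than real entries from normalizer_ml.rules.

```python
import unittest


def normalize(text, rules):
    # Stand-in for the real normalizer (hypothetical): apply each
    # replacement rule to the text in turn.
    for source, target in rules.items():
        text = text.replace(source, target)
    return text


class NormalizeTest(unittest.TestCase):
    # Placeholder rule; real tests would load the project's rules file.
    RULES = {"aa": "b"}

    def test_rule_is_applied(self):
        self.assertEqual(normalize("caat", self.RULES), "cbt")

    def test_text_without_matches_is_unchanged(self):
        self.assertEqual(normalize("cat", self.RULES), "cat")
```

Placed in a test folder, a module like this can be discovered and run with python -m unittest discover.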

Line Number value not correct in LoadRules error

I added another line (line 14) to normalizer_ml.rules, ൿ=ക്‍=ൿ=ക്‍, and raised an exception in the LoadRules function of core.py:


if(len(line.split("=")) != 2):
    raise Exception("[Error] Syntax Error in the Rules. Line number: %d"%line_number)
    # print(
    #     "[Error] Syntax Error in the Rules. Line number: ",
    #     line_number)
    # print("Line: " + text)
Output generated:

Exception: [Error] Syntax Error in the Rules. Line number: 27

Expected output:

Exception: [Error] Syntax Error in the Rules. Line number: 14

line_number is incremented twice within the loop, hence the difference in output.
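
The reported numbers fit this diagnosis: if the counter starts at 1 and is bumped twice per iteration, physical line n is reported as 2n - 1, so line 14 becomes 27. The sketch below models that counter and shows one possible fix using enumerate. Both function names are hypothetical, and the model assumes every physical line passes through both increments.

```python
def doubled_counter(physical_line):
    # Models a counter that starts at 1 and is incremented twice
    # for every line read before reaching physical_line.
    line_number = 1
    for _ in range(physical_line - 1):
        line_number += 1
        line_number += 1
    return line_number


def first_bad_line(lines):
    """Return the 1-based number of the first malformed rule, or None."""
    # enumerate() ties the number to the physical line, so it cannot drift.
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        # Assumption: blank lines and '#' comments are skipped but counted.
        if not text or text.startswith("#"):
            continue
        if len(text.split("=")) != 2:
            return line_number
    return None
```

With 13 valid rules followed by a malformed one, first_bad_line reports 14, while doubled_counter(14) reproduces the 27 seen in the exception.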

Make punctuation stripping a configurable option.

There are cases where the normalized text is expected to retain its punctuation. As of now, the normalizer removes all punctuation. It would be good to make this configurable by passing an argument to the normalize method.
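
One way the option could look is sketched below. The signature, the strip_punctuation keyword and its default are assumptions, not the existing normalize method, and string.punctuation only covers ASCII punctuation, so a real implementation would probably need a broader set.

```python
import string


def normalize(text, rules, strip_punctuation=True):
    """Apply replacement rules, optionally removing punctuation first.

    Hypothetical signature: the default of True keeps the current
    behaviour of removing punctuation.
    """
    if strip_punctuation:
        # Assumption: only ASCII punctuation is removed in this sketch.
        text = "".join(ch for ch in text if ch not in string.punctuation)
    for source, target in rules.items():
        text = text.replace(source, target)
    return text
```

Callers that need the punctuation preserved would pass strip_punctuation=False, while existing callers would see no change.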
