neolo's People

Contributors

adiel-mittmann, jcrowgey

Forkers

adiel-mittmann

neolo's Issues

add setup.py

This project should be installable as a package.

-s option is dropping some data

There is an off-by-one error in sent_split which causes lines to be dropped from the output. Whether or not a line is dropped depends on how many sentence-splitting punctuation marks it contains.

This is a severe bug which causes inaccurate results. It is in the same function which has the bug reported in #4, but is not related to that bug.
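
A regression check for this kind of bug could be a round-trip test: whatever the splitter does, flattening its output should give back exactly the input. The sketch below is not neolo's sent_split; the function names, the token-list input, and the set of sentence-ending marks are assumptions made for illustration.

```python
# Hypothetical sketch, not the actual neolo code.
SENTENCE_END = {".", "!", "?"}  # assumed set of sentence-splitting marks


def sent_split_sketch(tokens):
    """Group tokens into sentences, closing a sentence at each end mark."""
    sentences = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        # Keep trailing tokens that have no closing punctuation;
        # forgetting this remainder is a typical way to lose data.
        sentences.append(current)
    return sentences


def no_tokens_dropped(tokens):
    """Round-trip check: flattening the sentences must reproduce the input."""
    flat = [tok for sent in sent_split_sketch(tokens) for tok in sent]
    return flat == list(tokens)
```

Running no_tokens_dropped over inputs with zero, one, and several punctuation marks would exercise the cases described above.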

move punctuation ratio up in the output

Reported via email

There is one change that would be very desirable: I wonder how difficult
it would be to move the last line of results Neolo gives us up,
so that it is grouped with the first results at the very top.
This is the last line of a result:

Punctuation ratio: 0.0799721500682859

Since it comes after all the
lists and dictionaries, very few people pay attention to it.

I think it would be good to have it like this:

Punctuation ratio: 0.0799721500682859
17021 sentences in your text.
Average sentence length: 10.969684507373245 tokens.

----------------- Neologism List ----------------
abeakutic
abovementioned
abrines
abulafia
accep
accidens
achates,,,
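
One way to get this ordering is to build the whole report in a single function that emits the summary statistics before the long lists. This is only a sketch: format_report and its parameters are hypothetical names, not part of neolo, and the line wording is copied from the requested output above.

```python
# Hypothetical sketch; the function and parameter names are assumptions.
def format_report(punct_ratio, n_sentences, avg_sentence_len, neologisms):
    """Return the report text with all summary statistics at the top."""
    lines = [
        f"Punctuation ratio: {punct_ratio}",
        f"{n_sentences} sentences in your text.",
        f"Average sentence length: {avg_sentence_len} tokens.",
        "",
        "----------------- Neologism List ----------------",
    ]
    # The (potentially very long) word list comes last.
    lines.extend(neologisms)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(0.08, 3, 10.5, ["abeakutic", "abovementioned"]))
```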

mltd ZeroDivisionError on short texts

MLTD may be undefined on certain texts; even in that case, neolo shouldn't try to divide by zero.

$ neolo mary.txt --dicts /usr/share/dict/words --mltd
Neologism list:
,
.

Statistics:
-----------
Text size: 25 tokens in 20 types.
Number of hapax legomena: 15
TTR (type-token ratio): 0.8
HTR (hapax-token ratio): 0.6
HTyR (hapax-type ratio): 0.75
Neologisms:  2 types not found in 1 dictionaries
Dictionaries contained 160707 tokens in 72376 types.
MLTD...Traceback (most recent call last):
  File "/home/j/src/neolo/venv/bin/neolo", line 10, in <module>
    sys.exit(main())
  File "/home/j/src/neolo/venv/lib/python3.7/site-packages/neolo/neolo.py", line 436, in main
    print(mltd(clean_text))
  File "/home/j/src/neolo/venv/lib/python3.7/site-packages/neolo/neolo.py", line 203, in mltd
    return float(toks)/float(seg_n)
ZeroDivisionError: float division by zero
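
A possible guard is sketched below. It is not neolo's mltd function: the segment-counting loop is a simplified stand-in, the 0.72 threshold is an assumed default, and returning NaN for "undefined" is one choice among several (raising a descriptive error or printing a notice would also work). The only point is that the segment count is checked before the division seen in the traceback.

```python
import math


# Hypothetical sketch; simplified and NOT the actual neolo implementation.
def mltd_sketch(tokens, threshold=0.72):
    """Return tokens per completed segment, or NaN if no segment completed."""
    seg_n = 0        # completed segments
    types = set()    # distinct tokens in the current segment
    seg_len = 0      # tokens in the current segment
    for tok in tokens:
        types.add(tok)
        seg_len += 1
        # A segment is complete once its type-token ratio drops below
        # the threshold; then counting starts over.
        if len(types) / seg_len < threshold:
            seg_n += 1
            types.clear()
            seg_len = 0
    if seg_n == 0:
        # Short or highly varied texts may never complete a segment,
        # so the measure is undefined rather than a division by zero.
        return math.nan
    return len(tokens) / seg_n
```

With this guard, a short text such as the one in the report above would yield NaN, which the caller can print as "undefined" instead of crashing.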
