
duppub's People

Contributors

bmsauer, cannibalcheeseburger, khinezarthwe, maniis, matejklemen, neizod, nickalaskreynolds


duppub's Issues

Better algorithm for detecting similar strings

Right now I use edit distance to check the similarity between two strings. It runs so slowly that I had to limit the string size. Is there a better algorithm?

What I have in mind:

  • Approximate algorithms are fine!
  • I need to check all pairs, not just two strings.
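
One possible direction, given that approximate answers are acceptable and every pair has to be checked, is to compare sets of character trigrams (Jaccard similarity) and to use an inverted index so that only pairs sharing at least one trigram are ever scored. The sketch below is an assumption, not code from this project: the names `trigrams`, `jaccard`, and `similar_pairs` are hypothetical, and the scores will not match edit-distance percentages.

```python
from collections import defaultdict
from itertools import combinations


def trigrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets, in the range [0.0, 1.0]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(strings, threshold=0.8):
    """Return (i, j, score) for every index pair whose trigram Jaccard score >= threshold."""
    grams = [trigrams(s) for s in strings]

    # Inverted index: trigram -> indices of the strings that contain it.
    index = defaultdict(set)
    for i, gram_set in enumerate(grams):
        for gram in gram_set:
            index[gram].add(i)

    # Only pairs that share at least one trigram can have a non-zero score.
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))

    result = []
    for i, j in sorted(candidates):
        score = jaccard(grams[i], grams[j])
        if score >= threshold:
            result.append((i, j, score))
    return result
```

Set operations cost roughly the length of the strings rather than the product of their lengths, so the character limit may become unnecessary. Very common trigrams can still generate many candidate pairs; if that turns out to be a problem, MinHash with locality-sensitive hashing is a commonly used next step.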

Need unit tests (or some way to automate testing)

Right now we have a manual test, which is run from the command line:

python report.py example_input.csv

Can we automate it somehow? The result is a similarity percentage, and different algorithms report different percentages. Any ideas?
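
One idea is to test properties that any reasonable algorithm should satisfy, plus which side of the threshold a pair lands on, instead of asserting exact percentages. The sketch below uses the standard `unittest` module. The `edit_distance` and `similarity` functions are hypothetical stand-ins so the example is self-contained; the real tests would import the scoring function from the project instead.

```python
import unittest


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance (stand-in implementation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Hypothetical scorer: similarity as a percentage in [0, 100]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


class SimilarityProperties(unittest.TestCase):
    THRESHOLD = 80.0  # assumed to match the report's default threshold

    def test_identical_strings_score_100(self):
        self.assertAlmostEqual(similarity("same title", "same title"), 100.0)

    def test_symmetric(self):
        a, b = "kitten", "sitting"
        self.assertAlmostEqual(similarity(a, b), similarity(b, a))

    def test_near_duplicate_is_above_threshold(self):
        score = similarity("Deep learning for images", "Deep learning for image")
        self.assertGreaterEqual(score, self.THRESHOLD)

    def test_unrelated_strings_are_below_threshold(self):
        self.assertLess(similarity("abc", "xyz"), self.THRESHOLD)
```

Saved as, say, `test_similarity.py` (a hypothetical file name), this can be run with `python -m unittest`. Because the assertions only check properties and threshold sides, the same test class could be reused for each algorithm.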

Command-line arguments with argparse

Right now the script can be invoked with only one argument, the CSV file.

It'd be nice if we could invoke it with more arguments to tune the performance, such as:

  • Threshold (default: 80%) -- report similarity only above this threshold.
  • Limit Characters (default: 100 chars) -- right now the algorithm runs so slowly that we need to truncate the strings being compared; this limit should be tunable.
  • Algorithm (default: levenshtein) -- the other, slower algorithm available is edit_distant.

Technical info: the main function currently hard-codes argument parsing with sys.argv. I think Python's argparse can do this job better and more cleanly.
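
A minimal sketch of what the parser could look like with `argparse`. The flag names (`--threshold`, `--limit`, `--algorithm`) and the `build_parser` helper are assumptions for illustration; only the defaults and the two algorithm names come from the description above.

```python
import argparse


def build_parser():
    """Build the command-line parser (flag names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(
        description="Report similar entries in a CSV file.")
    parser.add_argument("csv_file", help="path to the input CSV file")
    parser.add_argument(
        "-t", "--threshold", type=float, default=80.0,
        help="report only pairs at or above this similarity percentage "
             "(default: %(default)s)")
    parser.add_argument(
        "-l", "--limit", type=int, default=100,
        help="compare only the first N characters of each string "
             "(default: %(default)s)")
    parser.add_argument(
        "-a", "--algorithm", choices=["levenshtein", "edit_distant"],
        default="levenshtein",
        help="similarity algorithm to use (default: %(default)s)")
    return parser
```

The main function could then call `build_parser().parse_args()` in place of reading `sys.argv` directly. The existing invocation `python report.py example_input.csv` would keep working, and something like `python report.py example_input.csv --threshold 90 --limit 200` would override the defaults.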
