anknown / ahocorasick
A faster and more efficient Golang implementation of the Aho-Corasick algorithm using a Double Array Trie
License: MIT License
Hi, can I use one machine from several goroutines?
Currently, matching is always case-sensitive. The machine could of course be built from a lowercased dictionary, with the input text lowercased as well, but it would be more efficient for case-insensitive matching to happen in the matching logic itself, especially when returnImmediately=true.
It seems that this person has plagiarized your code. link
Currently, if the machine's trie contains "abc", it will match 3 times in "abcde abc zabc".
I am proposing a flag to match whole words only, in which case it would match once.
Hi, what would be the license of this fine code?
Thanks!
I recently attempted to consume this module. But being the paranoid individual that I am, I decided to run a robust test against the algorithm to check its correctness.
My tests revealed that the algorithm matched keywords incorrectly in marginal cases.
I am willing to contribute the tests that I implemented for this algorithm, and also help figure out where/why it fails.
I am currently using the cloudflare/ahocorasick library because it is accurate 100% of the time, but I really do wish to see the bug in this library fixed because of its major edge in efficiency.
Hi, I have seen the issue at #4.
But ExactSearch just seems to match a single word against a single word. It doesn't match a whole word in the middle of a string at word boundaries, as in the problem originally reported.
I.e., with ExactSearch, "abc" will NOT match at all in "abcde abc zabc", but it will match if the string is exactly "abc" (so it's basically acting like a map).
But with MultiPatternSearch, "abc" will match 3 times.
It would be good to have an option where it can match inside an arbitrarily long string, but only at word boundaries on either side (e.g. if there is whitespace or end of line next to the match). I'd be happy to add a specific boundary character between words if it helps.
Hope that makes sense!
Where could I find gitlab.baidu.com/hanshinan/godarts ?
https://github.com/anknown/ahocorasick/blob/master/ahocorasick.go#L8 ?
I cannot access http://gitlab.baidu.com/hanshinan/godarts
Hi @anknown ,
Thank you for your library.
I have a question related to thread safety in your lib.
The lib has provided:
Hi anknown, or should I call you 韩诗楠.
Thank you for your effort porting Aho-Corasick with DoubleArrayTrie to Golang. But please give appropriate credit to other projects you benefited from.
Your benchmark is copied from my project. Why didn't you even mention it?
I haven't read your code yet, but we (including the pioneers I mentioned in my projects) are implementing almost the same algorithm. Did you borrow any ideas from them? If you did, you should give credit to them.
In a worldwide community like GitHub, we Chinese should show more respect for copyright. Otherwise we are always the joke.
Thank you!
Hi, I just opened an issue because I'm not sure how else to contact you. I really like this lib and am using it, thank you for open sourcing it!
Just a quick question. Given a Machine that is returned from Build, is it safe for multiple goroutines to call MultiPatternSearch concurrently on that Machine? I think the answer is yes but just want to be sure.
Thanks, Kevin