for example:
If stack
is only found once on a page, but stackoverflow
is found a dozen times, then stackoverflow
should not be suggested before stack
in the suggestion list for the Find Suggest add-on. stack
should come before stackoverflow
and have a frequency count of 13 (+ count of other strings to which /^stack/.test(string) == true
).
I can only think of a few ways to do this which are O(n^2) atm.. currently the getSortedWords
function is O(nlog(n)), and n can be very large. This function does two things atm: get all of the words on the page, ands sorts them by frequency.
One way to handle this which I like so far is to change getSortedWords
to getWords
which is a array of Word
objects which store the word and the count in properties; when the suggest
function is called we do a O(n) check for x matches, and then sort those results, so:
getWords
would be O(n)
suggest
would be O(n) + O(x^2) where x <= ~50
Instead of a O(nlog(n)) getSortedWords
function.
Also the O(x^2) function could be optimized such that it'd be O(x) in some cases after reuse on the same page. This would be done by flagging a Word
object and updating it's count
property, so that we know the count is the true frequency and we don't have to recalculate it (which is O(n)).
If there is a way to include this calculation that I'm describing in a O(nlog(n)) sort then suggest
should just be O(n) + O(xlog(x)) imo, instead of being O(nlog(n))
on first execution and O(n)
afterward, which it is now.