
Comments (4)

ryanpitts avatar ryanpitts commented on August 28, 2024

The stopwords list is all managed on the backend, and I believe we're now pointing the dev server at a common elasticsearch instance that serves several Mozilla projects. (Hopefully Ross can confirm here?)

Not sure if there's a custom stopwords filter set up there, but by default elasticsearch just uses a standard English set. If we do want to change this, details and a couple relevant links here: http://stackoverflow.com/questions/4927629/can-i-customize-elastic-search-to-use-my-own-stop-word-list
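For anyone who does want to try a custom list, here is a rough sketch of what the index settings might look like, built as a plain Python dict so it can be dumped to JSON. The names `custom_stop` and `custom_text` and the word list itself are made up for illustration; only the `stop` filter type, the `standard` tokenizer, and the `lowercase` filter are standard Elasticsearch pieces. Nothing here reflects how the shared instance is actually configured.

```python
import json

# Hypothetical stop word list, for illustration only; the real list
# would be a project decision.
CUSTOM_STOPWORDS = ["a", "an", "and", "of", "or"]


def build_index_settings(stopwords):
    """Return an index-settings body that wires a custom stop filter
    into a custom analyzer (filter and analyzer names are assumptions)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # "stop" is the token filter type that drops stop words.
                    "custom_stop": {"type": "stop", "stopwords": list(stopwords)}
                },
                "analyzer": {
                    "custom_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the stop list matches regardless of case.
                        "filter": ["lowercase", "custom_stop"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(CUSTOM_STOPWORDS), indent=2))
```

The resulting JSON would be sent as the body when creating the index, and fields would then need to reference the `custom_text` analyzer in their mapping.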

I think current behavior is probably OK though? Anyone else want to weigh in?


dansinker avatar dansinker commented on August 28, 2024

I feel like skipping things like "in the" is just fine behavior. Certainly far better than the alternative (any instance of the word "the" pops a search result). Can't quite come up with a reason that I'd want any behavior that is counter to that.


ryanpitts avatar ryanpitts commented on August 28, 2024

Safe to close this one?


Pomax avatar Pomax commented on August 28, 2024

Those are reasonable justifications. It seems that some more operative words are not considered stop words, so it's unlikely to lead to searches missing relevant results. Things like "in the" are kind of borderline; they consist of stop words, but the combination is meaningful and far less common than the individual stop words. That said, keeping it the way it is now and having a look at the search logs at some point to see if people were actually searching using stop word phrases is less work, with the same net effect =)
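The log check could be as simple as something like this. It's a minimal sketch that assumes the logs can be reduced to a plain list of query strings; the `STOPWORDS` set below is a small made-up subset, not the list Elasticsearch actually uses, and `stopword_only_queries` is a hypothetical helper name.

```python
from collections import Counter

# Assumed stop word set: a small illustrative subset, not the real list.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def stopword_only_queries(queries, stopwords=STOPWORDS):
    """Count queries whose every token is a stop word.

    Queries are lowercased and split on whitespace; empty queries are skipped.
    """
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        if tokens and all(token in stopwords for token in tokens):
            counts[" ".join(tokens)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample queries, standing in for real search logs.
    sample = ["in the", "In The", "the news", "data", ""]
    for phrase, count in stopword_only_queries(sample).most_common():
        print(f"{count:4d}  {phrase}")
```

If phrases like "in the" show up often in the output, that would be the signal to revisit the stop word list; if they never do, leaving things alone costs nothing.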
