
Comments (6)

pudo commented on June 11, 2024

Hi. I'm just working on an integration test harness for the API, so this comes in useful. What I think is happening: "fuzziness": "AUTO" brings in a Levenshtein tolerance of only 1 for a string the length of "Barrrack Obama", and adding two extra "r"s exceeds that threshold. So I guess the best option would be to make fuzziness default to something other than AUTO, e.g. 2. I don't want to do this on the public API that we operate, since it carries a massive performance penalty, but we could introduce an environment setting?

from yente.

skrafft commented on June 11, 2024

Hi,

I don't think this is related to the AUTO value. I've tested multiple combinations directly on Elasticsearch with fuzziness=AUTO, 1 or 2, and it does not change the results. As a matter of fact, the query https://api.opensanctions.org/search/default?q=Barrack%20Obama returns 1 result and https://api.opensanctions.org/search/default?q=Barrock%20Obama%fuzzy=true (changing one "a" to an "o") does not return anything.

I think there's something wrong with the mapping, but I could not figure out what, so I ended up rewriting the query.


pudo commented on June 11, 2024

Just to be clear: the guy is called Barack Obama (https://en.wikipedia.org/wiki/Barack_Obama). Barrack Obama is fuzziness=1, Barrock Obama is fuzziness=2. Am I totally confused here?
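To make the edit distances in this comment concrete, here is a minimal sketch of a plain Levenshtein distance in Python. The function name `levenshtein` is just an illustration; it is not taken from yente or Elasticsearch, and Elasticsearch's own fuzzy matching may count edits slightly differently (for example, it can treat a transposition as a single edit).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep if equal)
            ))
        prev = curr
    return prev[-1]


for variant in ("Barrack", "Barrock", "Barock"):
    print(variant, levenshtein("Barack", variant))
```

Measured against the canonical spelling "Barack", this gives 1 for "Barrack", 2 for "Barrock" and 1 for "Barock".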


skrafft commented on June 11, 2024

That's true, but he also has aliases like Barrack Obama in the data, so Barrack Obama is a perfect match according to Elasticsearch (which makes the edit distance 1 when you replace the "a" with an "o"). Anyway, searching https://api.opensanctions.org/search/default?q=Barock%20Obama does not return any result either.


AndreiD commented on June 11, 2024

So for it to return a result for Barock%20Obama, is there something that can be configured or added?


pudo commented on June 11, 2024

OK, so I've solved this question, but the answer is less than amazing. Basically: Elasticsearch never does fuzzy search on all the terms in a query_string query - that's something you have to actively indicate by adding a tilde to the fuzzy term: barock~ obama gives a result.
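For anyone who wants fuzzy matching on every term, a rough sketch of doing the tilde rewrite on the client side before sending the query could look like this. The helper names `fuzzify` and `build_query` are made up for illustration and are not part of yente; the sketch only handles plain alphanumeric terms and leaves quoted phrases, field prefixes and boolean operators untouched.

```python
import json
from typing import Optional

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def fuzzify(query: str, max_edits: Optional[int] = None) -> str:
    """Append the query_string fuzzy operator (~) to every plain term."""
    suffix = "~" if max_edits is None else f"~{max_edits}"
    terms = []
    for term in query.split():
        # Leave operators and anything with special syntax (quotes, colons,
        # an existing tilde, ...) exactly as the user wrote it.
        if term in BOOLEAN_OPERATORS or not term.isalnum():
            terms.append(term)
        else:
            terms.append(term + suffix)
    return " ".join(terms)


def build_query(query: str) -> dict:
    """Wrap the rewritten string in a query_string query body."""
    return {"query": {"query_string": {"query": fuzzify(query)}}}


print(json.dumps(build_query("barock obama")))
```

With `max_edits` left unset, the bare `~` lets Elasticsearch pick its default edit distance; passing 1 or 2 produces `barock~1` or `barock~2` instead.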

My take-away: probably a good idea to use /match in yente most of the time if you're trying to match entities. The search API is just that: a way for people to search on the web site...
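As a rough sketch of what going through /match instead might look like: the snippet below only builds the request and does not send it. The payload shape (a `queries` object keyed by an arbitrary ID, each entry with a `schema` and `properties`) and the `default` dataset name are my assumptions here and should be checked against the yente API documentation; `build_match_request` is a hypothetical helper name.

```python
import json

# Assumed base URL, taken from the search URLs quoted earlier in this thread.
API_BASE = "https://api.opensanctions.org"


def build_match_request(name: str, dataset: str = "default"):
    """Return (url, payload) for a single-entity match query (assumed shape)."""
    url = f"{API_BASE}/match/{dataset}"
    payload = {
        "queries": {
            # "q1" is an arbitrary key used to identify this query.
            "q1": {
                "schema": "Person",
                "properties": {"name": [name]},
            }
        }
    }
    return url, payload


url, payload = build_match_request("Barock Obama")
print("POST", url)
print(json.dumps(payload, indent=2))
```

The idea is that the misspelled name is sent as a structured entity rather than as a query string, so no tilde syntax is involved.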

cf. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

