Comments (6)
Hi. I'm just working on an integration test harness for the API, so this comes in useful. What I think is happening: `"fuzziness": "AUTO"` brings in a Levenshtein tolerance of only 1 for a string the length of "Barrrack Obama", and adding two extra "r"s exceeds that threshold. So I guess the best option would be to make fuzziness default to something other than AUTO, e.g. 2. I don't want to do this on the public API that we operate, since it's a massive performance penalty, but we could introduce an environment setting?
from yente.
Hi,
I don't think this is related to the AUTO value. I've tested multiple combinations directly on Elasticsearch with fuzziness=AUTO, 1 or 2, and it does not change the results. As a matter of fact, the query https://api.opensanctions.org/search/default?q=Barrack%20Obama returns 1 result and https://api.opensanctions.org/search/default?q=Barrock%20Obama%fuzzy=true (changing one "a" to one "o") does not return anything.
I think there's something wrong with the mapping but could not figure out what, so I ended up rewriting the query.
from yente.
Just to be clear: the guy is called Barack Obama (https://en.wikipedia.org/wiki/Barack_Obama). "Barrack Obama" is fuzziness=1, "Barrock Obama" is fuzziness=2. Am I totally confused here?
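To make the arithmetic above easy to check, here is a minimal sketch of a plain Levenshtein distance in Python, applied to the lowercased first name only. Two caveats: Elasticsearch applies fuzziness per term and, as far as I know, counts transpositions as a single edit by default, so this is only an approximation of what the engine does. The function name and the list of variants are just for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep on a match)
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Distance of each spelling from the thread to the canonical "barack".
    for variant in ("barrack", "barrock", "barock", "barrrack"):
        print(variant, levenshtein("barack", variant))
```

Measured against an alias such as "barrack" instead of "barack", "barrock" is only one edit away, which matters if the alias is also indexed.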
from yente.
That's true, but he also has aliases like Barrack Obama in the data, so Barrack Obama is a perfect match according to Elasticsearch (which makes the fuzziness 1 when you replace the "a" with an "o"). Anyway, searching https://api.opensanctions.org/search/default?q=Barock%20Obama does not return any results either.
from yente.
So is there something that can be configured or added for it to return a result for Barock%20Obama?
from yente.
OK, so I've solved this question, but the answer is less than amazing. Basically: Elasticsearch never does fuzzy search on all the terms in a `query_string` query. That's something you have to actively indicate by adding a tilde to the fuzzy term: `barock~ obama` gives a result.
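As a rough illustration of the tilde syntax, a client could mark every term as fuzzy before sending the query. This is only a sketch: `fuzzify` and `build_query_string` are hypothetical helpers, the `name` field is an assumed field name rather than yente's actual mapping, and the naive whitespace split does not handle quoted phrases (where a tilde means proximity), boolean operators, or other reserved characters.

```python
from typing import Optional


def fuzzify(query: str, max_edits: Optional[int] = None) -> str:
    """Append the fuzzy operator (~ or ~N) to every whitespace-separated term."""
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return " ".join(f"{term}{suffix}" for term in query.split())


def build_query_string(query: str, field: str = "name") -> dict:
    """Wrap the fuzzified text in an Elasticsearch query_string query body.

    The default field name is an assumption for illustration only.
    """
    return {
        "query": {
            "query_string": {
                "query": fuzzify(query),
                "default_field": field,
            }
        }
    }


if __name__ == "__main__":
    print(fuzzify("barock obama"))
    print(build_query_string("barock obama"))
```

Marking only the suspicious term, as in `barock~ obama`, keeps the query cheaper than making every term fuzzy.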
My take-away: it's probably a good idea to use `/match` in yente most of the time if you're trying to match entities. The search API is just that: a way for people to search on the web site...
cf. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
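For the `/match` take-away, a request body might look roughly like the sketch below. The payload shape (a `queries` map of entity examples with `schema` and `properties`) is my understanding of the yente matching API and should be checked against the API docs. `build_match_payload`, the `q1` key, and the endpoint URL in the comment are assumptions for illustration, and nothing is actually sent here.

```python
import json


def build_match_payload(name: str, schema: str = "Person") -> dict:
    """Build a query-by-example body for a /match request (assumed shape)."""
    return {
        "queries": {
            # "q1" is an arbitrary client-chosen key identifying this query.
            "q1": {
                "schema": schema,
                "properties": {"name": [name]},
            }
        }
    }


if __name__ == "__main__":
    payload = build_match_payload("Barock Obama")
    # Assumed usage: POST this JSON to e.g.
    # https://api.opensanctions.org/match/default
    print(json.dumps(payload, indent=2))
```

Unlike the search endpoint, this describes the entity being looked for rather than a free-text query, so no tilde syntax is involved.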
from yente.