Comments (9)

adelcasse commented on August 15, 2024

@missinglink sorry, I didn't see your message before. We had something working on our side with libpostal, as @elsa-pato described. We would need to look at it again, but sure, we're interested in contributing to the discussion if useful (I don't know if there have been changes on that subject since 31 March).

from interpolation.

missinglink commented on August 15, 2024

Hi @adelcasse, this feature would need to be addressed in the pelias/api repository to enable autocomplete in Pelias.

The interpolation engine was designed to be a standalone service. I would prefer not to implement autocomplete here because it's a linguistics/syntax/natural language problem and not directly related to address schemas, geodesic math etc.

I think it would be 'cleaner' if an outside system (eg. Pelias using an elasticsearch index, or any other system) was able to resolve partially completed address inputs into a complete 'street name' and desired 'house number' (we also need a lat/lon anywhere inside the bounding box of the street in order to disambiguate two streets with the same name in the same city).

Once this has been done you can send those three values (street name, house number, filter point) to the interpolation service and it will return a value, either an interpolation or an exact match.
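As a rough illustration of that hand-off, here is a minimal Python sketch that packs the three values into a request URL. The `/search/geojson` path, the `street`, `number`, `lat` and `lon` parameter names, and the host and port are assumptions for illustration only; check the interpolation readme for the actual HTTP API.

```python
from urllib.parse import urlencode


def build_interpolation_url(base_url, street, number, lat, lon):
    """Build a lookup URL from the street name, house number and filter point.

    The endpoint path and parameter names are assumed, not confirmed by this thread.
    """
    query = urlencode({
        "street": street,   # complete street name, already resolved upstream
        "number": number,   # desired house number
        "lat": lat,         # any point inside the street's bounding box
        "lon": lon,
    })
    return f"{base_url}/search/geojson?{query}"


# Hypothetical local service address and example values.
url = build_interpolation_url(
    "http://localhost:3000", "glasgow street", "16", -41.288788, 174.766843
)
print(url)
```

The filter point only needs to fall somewhere near the street, so a street centroid is good enough to tell apart two streets that share a name.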

see: #112 (comment) regarding integration.

In that comment, I wrote a little about our current integration between 'Pelias proper' and the interpolation engine:

- If a user requests an address and elasticsearch returns a street, send a second request to the interpolation
  service for a result. If successful, use the interpolated result, otherwise use the street centroid.

This also only applies to the /v1/search endpoint and not to the autocomplete endpoint at this stage.

Let me explain a little more about how that currently works.

more info: [design doc] [relationship to pelias] [existing standards] [conflation]

indexing

  1. We join the road segments from OpenStreetMap into their longest contiguous linestring and export them as 'polylines' (our .0sv file format).
  2. We import the street data into the Pelias layer named street, including the centroid (midpoint) of each linestring.
  3. We build the interpolation index from the exact same .0sv file as above. This is fairly complex, but you can find more info in the wiki links I posted above.

searching

  1. We parse the input text and check which constituent parts it contains. If the query is identified as containing a street name and a house number then it's a candidate for an address search.
  2. We query elasticsearch for an exact address match (we have addresses there also) and fall back to the street if we don't have a match.
  3. There is some logic in the pelias/api codebase which is able to detect that the user requested an address and got back a street.
  4. In this case, we pass the name of the street, the requested house number and the street centroid to the interpolation engine, which returns an interpolated result.
  5. The street-level-accuracy result is substituted with the address-level-accuracy result and returned to the user.
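The search steps above can be sketched in a few lines of Python. This is only an outline of the control flow, not the pelias/api code: `find_address`, `find_street` and `interpolate` are hypothetical callables standing in for the elasticsearch queries and the interpolation request, and the `centroid` key is an assumed field name.

```python
from typing import Callable, Optional


def resolve_address(
    street_name: str,
    house_number: str,
    find_address: Callable[[str, str], Optional[dict]],
    find_street: Callable[[str], Optional[dict]],
    interpolate: Callable[[str, str, tuple], Optional[dict]],
) -> Optional[dict]:
    # Step 2: prefer an exact address match when the index has one.
    exact = find_address(street_name, house_number)
    if exact is not None:
        return exact

    # Step 2 (fallback): settle for the street record.
    street = find_street(street_name)
    if street is None:
        return None

    # Steps 3-4: an address was requested but a street came back, so ask the
    # interpolation engine, passing the street centroid as the filter point.
    point = interpolate(street_name, house_number, street["centroid"])

    # Step 5: substitute the address-level result, else keep the street.
    return point if point is not None else street
```

If the interpolation lookup finds nothing, the caller still gets the street centroid, which matches the fallback described earlier in this comment.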

sorry for the wall of text :)
so... regarding autocomplete

If we want to enable this feature for autocomplete then we need a parsing engine capable of (at minimum) parsing partially completed input text into a house number and a street name. It should really also be able to identify postalcodes and administrative areas (I recently enabled autocomplete on https://github.com/pelias/placeholder, so that could probably handle the admin portion).

Writing a geographic text parser is not an easy undertaking, and one that is autocomplete-aware is even more difficult. We currently use three parsing engines:

  • libpostal is used by the /v1/search endpoint and is fairly accurate in most cases, but it lacks two features which we would like to have:
    • it does not support autocomplete
    • it does not handle ambiguities (like 'ontario, ca' being both Canada and California)
  • addressit is a simple parser based on regular expressions. We use it as a fallback parser; it's not very robust, but it can handle some very generic address formats.
  • placeholder is a library I wrote last year which supports autocomplete. Currently it only supports administrative areas (towns, cities, countries etc) but it could potentially be expanded to include streets. It also supports languages, ambiguities and synonyms.

Have a look at the readme docs for those repos to get a better understanding of how they work.

The current obstacle to enabling interpolation on autocomplete is that none of these engines is sufficiently capable of parsing partially completed address input (eg. "1 Ma").

Even if they were able to do so, a prefix such as "Ma" would probably match 10,000+ street names globally.
Each of these streets would need to be queried against the interpolation index, which could cause performance issues at scale.

There may be some workarounds for this (like only using the top 10), but they would also need to be considered.
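To make the 'top 10' idea concrete, here is a small Python sketch that caps the number of prefix-matched streets before any of them would be sent to the interpolation index. The function name and the assumption that street names arrive already sorted by rank are hypothetical; this is not how Pelias selects candidates today.

```python
from itertools import islice


def candidate_streets(prefix, streets_by_rank, limit=10):
    """Return at most `limit` street names starting with `prefix`.

    `streets_by_rank` is assumed to be ordered best-first, so the cap keeps
    the highest ranked matches and bounds the number of interpolation lookups.
    """
    needle = prefix.casefold()
    matches = (s for s in streets_by_rank if s.casefold().startswith(needle))
    return list(islice(matches, limit))


streets = ["Main Street", "Market Street", "Oak Avenue", "Madison Avenue"]
print(candidate_streets("Ma", streets, limit=2))
```

The trade-off is that the street the user actually wants may fall outside the cap, which is why a workaround like this would need to be weighed carefully.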

Again, sorry for the wall of text, hopefully that gives some background to the feature and an idea of its complexity.

If you're still interested in discussing this further we could set up a call. Depending on your timelines we might be available for consultancy work, if that interests you, or I can continue to help out for free on the issue tracker :)

elsa-pato commented on August 15, 2024

Hi @missinglink ,

I work with @adelcasse and I would like to add this feature to Pelias.
I'm totally new to Pelias and only started looking into it last week, so I haven't gone too far yet :)

For now I've tried to do the following in pelias-api :

- if the basic autocomplete query did not succeed : 
-- call libpostal
-- if libpostal found a street & house number : 
--- repeat first query, without the house number, and filters to return streets only
- call interpolation

This works well for my test cases (French streets), but it might not work worldwide. That's why I'd like to have your opinion on how to proceed :)
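For readers who prefer code, the flow described above might look roughly like this in Python. It is a sketch of the control flow only, not the actual pelias-api change: all four callables are hypothetical stand-ins, the parse result is assumed to be a list of label/value pairs, and it assumes interpolation is only attempted when the parser found both a street and a house number.

```python
def autocomplete_with_fallback(text, autocomplete, parse, search_streets, interpolate):
    # Normal path: the basic autocomplete query succeeded.
    results = autocomplete(text)
    if results:
        return results

    # Fallback: ask the parser for a house number and a street name.
    parsed = {item["label"]: item["value"] for item in parse(text)}
    number, road = parsed.get("house_number"), parsed.get("road")
    if not number or not road:
        return []

    # Repeat the query without the house number, restricted to streets,
    # then try to interpolate the requested number on each street found.
    interpolated = []
    for street in search_streets(road):
        hit = interpolate(street, number)
        if hit is not None:
            interpolated.append(hit)
    return interpolated
```

Because the parser and the interpolation lookups only run when the first query comes back empty, the extra cost is limited to inputs that would otherwise return nothing.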

I have a few questions in mind:

  • When should we trigger interpolation on autocomplete? In my scenario, I only trigger it if the basic autocomplete doesn't find any result. This should help solve the performance issue, but it might not be the best for the user.
  • You say libpostal doesn't support autocomplete yet. What's missing exactly? In my test cases libpostal behaves well, so I haven't really looked into that part yet.

missinglink commented on August 15, 2024

Hi @elsa-pato, sorry for the late reply.

I'd suggest looking into your second point a bit more before you continue:

"in my test cases libpostal behaves well so I haven't really looked into that part yet."

This hasn't been my experience; libpostal isn't designed to work with partially specified inputs.

Some basic examples:

http://localhost:4400/parse?address=Rue

[
  {
    "label": "city",
    "value": "rue"
  }
]

http://localhost:4400/parse?address=Champs-E

[
  {
    "label": "house",
    "value": "champs-e"
  }
]

http://localhost:4400/parse?address=Boulevard

[
  {
    "label": "suburb",
    "value": "boulevard"
  }
]

http://localhost:4400/parse?address=s

[
  {
    "label": "city_district",
    "value": "s"
  }
]

missinglink commented on August 15, 2024

In a lot of cases it also struggles with fully specified street names:

http://localhost:4400/parse?address=L’Esplanade des Invalides

[
  {
    "label": "house",
    "value": "l'esplanade des invalides"
  }
]

It really wasn't designed to work for anything less than a full postal address, and really must have a city or region specified in the input to work correctly.

elsa-pato commented on August 15, 2024

Hi,
Thanks for your reply :)
The thing is that in this specific case, we are looking for a house number in order to interpolate, so the input address would look more like "76 rue ..." which works way better.

http://localhost:4400/parse?address=410%20Boulevard
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "boulevard"
  }
]

http://localhost:4400/parse?address=410%20Rue
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "rue"
  }
]

http://localhost:4400/parse?address=410%20s
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "s"
  }
]

But sure, it's not perfect yet..

http://localhost:4400/parse?address=410%20L%E2%80%99Esplanade%20des%20Invalides
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "house",
    "value": "l'esplanade des invalides"
  }
]

(This result is actually quite weird: as far as I know, in France at least, an "esplanade" is rarely a house name and is more like a square. But maybe it's different in other countries. I'm currently trying to get a planet build so I can run more tests.)

http://localhost:4400/parse?address=410%20Champs-E
[
  {
    "label": "postcode",
    "value": "410"
  },
  {
    "label": "country",
    "value": "champs-e"
  }
]

ok, this one really fails :p
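One way to act on these results is to treat a parse as usable for interpolation only when it contains both a `house_number` and a `road` label. Here is a small Python sketch using the label/value structure shown in the outputs above; the helper name is hypothetical.

```python
def interpolation_inputs(parsed):
    """Return (house_number, road) when both labels are present, else None.

    `parsed` is a list of {"label": ..., "value": ...} items, as in the
    parse outputs quoted in this thread.
    """
    labels = {item["label"]: item["value"] for item in parsed}
    if "house_number" in labels and "road" in labels:
        return labels["house_number"], labels["road"]
    return None


# Results copied from the examples above.
boulevard = [
    {"label": "house_number", "value": "410"},
    {"label": "road", "value": "boulevard"},
]
champs = [
    {"label": "postcode", "value": "410"},
    {"label": "country", "value": "champs-e"},
]
print(interpolation_inputs(boulevard))
print(interpolation_inputs(champs))
```

With this check, the "410 Champs-E" and "410 L’Esplanade des Invalides" parses would simply be skipped rather than sent to the interpolation service with the wrong fields.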

One more thing: as I implemented it, we call libpostal and interpolate only if the standard autocomplete search didn't return any result. That usually means the user wrote quite a precise address, which I guess helps libpostal and might help autocomplete stay performant.

You can check what I did here: https://gitlab.scity.coop/pelias-contrib/api/commit/d24f37121f4cb184b26c99934199c338cc8ddf56 (it's really just a quick & dirty solution to start with)

That said, I'll take a deeper look at libpostal and see what I can do :)

missinglink commented on August 15, 2024

We are hoping to merge pelias/api#1287 soon, which replaces the addressit parser with https://github.com/pelias/parser.
Once that work is complete it will be possible to tackle this issue and enable interpolation for autocomplete.

missinglink commented on August 15, 2024

The core team are looking at this again right now.

I've spent some time making the interpolation service more performant, and it can now handle around 6k requests/s on a single thread, so that should be adequate to handle the load.

The remaining problem is the logic for when to call the interpolation service in autocomplete mode. Since this issue was opened we've completely refactored the parsing logic for autocomplete to use our own parser, which may help make this problem a little easier.

@adelcasse did you manage to find a solution that worked for you? Are you still interested in contributing to the discussion of how this might work?

adelcasse commented on August 15, 2024

@missinglink I've run tests with the "compare" tool (to see the differences between your geocode.earth servers and ours), and I see that your dev environment is "less strict" on house numbers than your production one (it returns the street first when there is no matching house number). Is that the result of pelias/api#1432 or something else? Is your dev environment code available somewhere already?
