osmnames's People

Contributors

and01, breunigs, charsleysa, dagnelies, hixi, julienfastre, kharesimran, klokan, krahulreddy, lukasmartinelli, martinmikita, pablocm, pekarja5, philippks

osmnames's Issues

Export made for release 1.1

We are ready to update osmnames.klokantech.com, but the release has not been made yet.

What is the status of the export for release 1.1?

improve display_name

Sometimes the name is missing at the beginning of display_name and only the higher-level places are listed.

Cities reported as administrative boundaries

When searching for "Paris" at:
http://nominatim.openstreetmap.org/search.php?q=paris&polygon=1&viewbox=
the first two records are reported by Nominatim as "City" and "County".

The same search on OSMNames:
http://osmnames.klokantech.com/?q=paris&format=html
returns both records - but both are "administrative boundaries".

Isn't this caused by merging the bounding box record to the city record?

The two records should be distinguishable. What can be done to improve the type reported - so people can choose between two different records (city and county) instead of between two equal records?

Probably related to #33

Empty city/state for few records

I have found a few records with empty city and state values, although the display name contains everything.

For example:

{
 'id': 15341364,
 'alternative_names': "",
 'boundingbox': [13.736494, 50.10368, 13.738399, 50.105],
 'city': "",
 'class': "highway",
 'country': "Czech Republic",
 'country_code': "cz",
 'county': "Středočeský kraj",
 'display_name': "Hálkova, Rakovník, okres Rakovník, Středočeský kraj, Czech Republic",
 'importance': 0.1,
 'lat': 50.104366,
 'lon': 13.737429,
 'name': "Hálkova",
 'name_suffix': "CZ",
 'osm_id': "30985694",
 'osm_type': "way",
 'place_rank': 26.0,
 'rank': 2680111.75,
 'state': "",
 'street': "Hálkova",
 'type': "residential",
 'wikidata': "",
 'wikipedia': ""}

{
 'id': 15334559,
 'alternative_names': "",
 'boundingbox': [14.163356, 49.786163, 14.172256, 49.788288],
 'city': "",
 'class': "highway",
 'country': "Czech Republic",
 'country_code': "cz",
 'county': "Středočeský kraj",
 'display_name': "Hálkova, Dobříš, okres Příbram, Středočeský kraj, Czech Republic",
 'importance': 0.1,
 'lat': 49.787533,
 'lon': 14.167943,
 'name': "Hálkova",
 'name_suffix': "CZ",
 'osm_id': "25218558",
 'osm_type': "way",
 'place_rank': 26.0,
 'rank': 2680111.75,
 'state': "",
 'street': "Hálkova",
 'type': "residential",
 'wikidata': "",
 'wikipedia': ""}

Other records with the same class/type contain values.
Is it possible to fix this problem?

Handling street geometries as aggregates

Streets in OSM that belong together (same name) are often split into pieces, e.g. Spitalstrasse here: http://www.openstreetmap.org/search?query=Spitalstrasse%2070%20Wetzikon#map=15/47.3249/8.8135 . Nominatim, for example, treats this as 4 independent streets when the house number is not present. That means the (non-existent) house number 70 is not placed in the middle of the street... For geocoding purposes, such related streets should be merged into multi-linestrings.
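
A minimal sketch of such an aggregation, under stated assumptions: each street piece is given as a name plus a list of (lon, lat) points, and pieces that belong together share an exact endpoint coordinate (as ways joined at a common OSM node would). The function name `merge_street_segments` and the input layout are hypothetical and not part of OSMNames or Nominatim. Same-named streets in different towns stay separate because they never touch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]          # (lon, lat)
Line = List[Point]                   # one street piece


def merge_street_segments(segments: List[Tuple[str, Line]]) -> List[Tuple[str, List[Line]]]:
    """Group same-named pieces that touch at an endpoint into multi-linestrings."""
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two pieces with the same name and a common endpoint go into one group.
    first_seen: Dict[Tuple[str, Point], int] = {}
    for idx, (name, line) in enumerate(segments):
        for endpoint in (line[0], line[-1]):
            key = (name, endpoint)
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups: Dict[Tuple[str, int], List[Line]] = defaultdict(list)
    for idx, (name, line) in enumerate(segments):
        groups[(name, find(idx))].append(line)
    return [(name, lines) for (name, _root), lines in groups.items()]
```

Pieces separated by a small gap would not be merged by this sketch; a distance tolerance would be needed for that.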

Write documentation

Update the readme. Update OSMNames.org. Make explicit the difference to building addresses...

Analyse: Precalculated ranking: place_rank / importance via Wikipedia import

For the export we need a predefined rank for individual records. We want a ranking similar to the one implemented in Nominatim, with the maximum amount of information precalculated.

It seems the relevant documentation for Nominatim is at:
http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview

importance is defined by Wikipedia linking to OSM - we need to integrate the Wikipedia import into the docker workflow of OSM2VectorTiles.

place_rank is not described in the wiki.

The details of this work depend on the analysis of Nominatim (#2) done by @and01

get rid of duplicates

In the current dataset of v0.2 there are still some duplicates. The reason is that sometimes the name is the same for the village and the administrative boundary. Add some logic to prevent this, e.g. if an administrative entry is available, keep it and delete the other one, as the boundaries are more precise.
Example:
Erstfeld boundary administrative 8.600908996 46.81278292 16 0.35 Erstfeld, Uri, Schweiz, Suisse, Svizzera, Svizra 8.517504723 46.77636908 8.683019567 46.84932148
and
Erstfeld place village 8.64996352 46.82150873 19 0.275 Erstfeld, Uri, Schweiz, Suisse, Svizzera, Svizra 8.64996352 46.82150873 8.64996352 46.82150873
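
The rule above could be sketched as follows. This is only an illustration under assumptions: records are dicts with the export's `name`, `display_name`, `class` and `type` fields, and two records count as duplicates when `name` and `display_name` are identical. That duplicate key and the function names are hypothetical choices, not the project's actual logic.

```python
from typing import Dict, Iterable, List, Tuple


def is_admin_boundary(rec: dict) -> bool:
    """True for records like 'Erstfeld boundary administrative'."""
    return rec["class"] == "boundary" and rec["type"] == "administrative"


def drop_duplicate_places(records: Iterable[dict]) -> List[dict]:
    """Keep one record per (name, display_name), preferring administrative boundaries."""
    best: Dict[Tuple[str, str], dict] = {}
    for rec in records:
        key = (rec["name"], rec["display_name"])  # assumed duplicate key
        kept = best.get(key)
        # Replace the kept record only if the new one is a boundary and the kept one is not.
        if kept is None or (is_admin_boundary(rec) and not is_admin_boundary(kept)):
            best[key] = rec
    return list(best.values())
```

For the Erstfeld example, this keeps the boundary record with its real bounding box and drops the village node, regardless of the order in which the two appear.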

Add more info about data format / schema

The current docs lack a formal schema (which fields are optional, data types) and mention e.g. class and type, which appear to be enumerations. Add these enumeration values.

Nominatim Analysis

  • install nominatim
  • find out how the importing process and ranking process works in detail

Formal names vs shorter names of cities

Could the name of cities contain the shorter name? I expect it is part of the OSM record merged with the administrative boundary to get the bounding box, or it is simply available elsewhere in the OSM attributes.

The official long names of cities are sometimes exported now, and these are bad for fulltext ranking and usability.

Example of existing problematic records in OSMNames:
"City of Edinburgh" -> should be "Edinburgh"
"Autonomous City of Buenos Aires" -> "Buenos Aires"
"Hlavní město Praha" -> "Praha"

BTW it seems this is already done for country names, as there is "China" in OSMNames, not the official "People's Republic of China" (also available in OSM and sometimes drawn on the maps).

Any idea how to get this in for the records marked with type city - as planned in #35 @and01 ?

Basic Export

  • Basic Export with bounding box, Name, Point(Centroid)
  • Without ranking or hierarchy

OSM or place_id identifier

It seems we have completely lost the OSM id on the way to export...

It would be great to have it included in the export - so the links like https://nominatim.openstreetmap.org/details.php?place_id=145024048
can be constructed automatically, such as mentioned in #33.

We have three options for format:

This is primarily for linking back to OSM objects for more info, and for debugging purposes.
Nominatim provides the IDs too, and performs almost the same operation as our code.

What do you think about this @and01?
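
As an illustration of the linking use case, here is a small sketch that builds a link back to the OSM object. It assumes the export carries `osm_type` and `osm_id` columns (as in the JSON records quoted in another issue on this page) and uses the openstreetmap.org object URL pattern; the function name `osm_object_url` is hypothetical. It does not cover Nominatim's place_id, which is internal to a Nominatim installation.

```python
from typing import Union

OSM_BASE_URL = "https://www.openstreetmap.org"
VALID_OSM_TYPES = {"node", "way", "relation"}


def osm_object_url(osm_type: str, osm_id: Union[str, int]) -> str:
    """Build a browse link for an OSM object from its type and id."""
    if osm_type not in VALID_OSM_TYPES:
        raise ValueError(f"unknown osm_type: {osm_type!r}")
    return f"{OSM_BASE_URL}/{osm_type}/{osm_id}"


# Example with the values from the "Hálkova" record quoted elsewhere on this page.
print(osm_object_url("way", "30985694"))
```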

Invalid lines in data version 0.4

Lines in the data.tsv: 1 841 264 (first line is header, which is skipped)
Imported lines: 1 841 250

Invalid lines (last column is missing):

line    name    class   type    lon lat place_rank  importance  street  city    county  state   country country_code    display_name    west    south   east    north   wikipedia

91864   Am Kirchberg    secondary   secondary   6.841595989904307   49.16163618488043   26  0.09999999999999998 Am Kirchberg    Großrosseln    Regionalverband Saarbrücken    Saarland    fr  Am Kirchberg, Nassweiler, Großrosseln, Regionalverband Saarbrücken, Saarland, Germany 6.838062095707301   49.1560293622095    6.8447050892484365  49.168108690510984
312501  Bremerhof   street  residential 6.841469674623482   49.154695885233714  26  0.09999999999999998 Bremerhof   Großrosseln    Regionalverband Saarbrücken    Saarland    fr  Bremerhof, Nassweiler, Großrosseln, Regionalverband Saarbrücken, Saarland, Germany    6.83416719294064    49.15132024136862   6.84734413146262    49.15787623075483
407149  Doktor-Alfred-Meiche-Weg    multiple    path,track  14.363858298560643  50.94067266995829   26  0.09999999999999998 Doktor-Alfred-Meiche-Weg    Sebnitz Landkreis Sächsische Schweiz-Osterzgebirge Saxony  de  Doktor-Alfred-Meiche-Weg, Sebnitz, Landkreis Sächsische Schweiz-Osterzgebirge, Saxony, Germany 14.360647023816826  50.940260531779266  14.36586827894166   50.94148621748033
460100  Einsiedler Straße  multiple    residential,service,tertiary    13.31406297991839   50.58950277822012   26  0.09999999999999998 Einsiedler Straß   Marienberg  Erzgebirgskreis Saxony  de  Einsiedler Straße, Marienberg, Erzgebirgskreis, Saxony, Germany    13.303206822747086  50.581045857013194  13.3230629646319    50.59806858034928
628486  GR 53 Variante Kappelstein  multiple    path,track  7.799475023069675   49.05598137336894   26  0.09999999999999998 GR 53 Variante Kappelstein              fr  GR 53 Variante Kappelstein, Wingen  7.798648986511921   49.049671057564694  7.813862476047975   49.06360228355331
634078  Graslitzer Straße  secondary   secondary   12.468319226703128  50.35478685822314   26  0.09999999999999998 Graslitzer Straße  Klingenthal Vogtlandkreis   Saxony  de  Graslitzer Straße, Klingenthal, Vogtlandkreis, Saxony, Germany 12.468123844540145  50.35457798119605   12.468470520055405  50.35511098641888
990056  Kriegwaldweg    street  residential 13.2857859590103    50.5880190975394    26  0.09999999999999998 Kriegwaldweg    Marienberg  Erzgebirgskreis Saxony  de  Kriegwaldweg, Marienberg, Erzgebirgskreis, Saxony, Germany  13.282151398156202  50.57649666288481   13.290877462274011  50.59671683082446
1289634 Pleiler Straße street  residential 13.087813620317007  50.50581282779383   26  0.09999999999999998 Pleiler Straße Jöhstadt   Erzgebirgskreis Saxony  de  Pleiler Straße, Jöhstadt, Erzgebirgskreis, Saxony, Germany    13.083079186128458  50.50025663181819   13.090216628137824  50.51393288031222
1367414 Rolnická   street  residential 14.574697193901413  50.947453964901804  26  0.09999999999999998 Rolnická               de  Rolnická, Horní Jindřichov   14.566575967916608  50.942408226829436  14.581660124686692  50.95028922746927
1416633 Saldernbrücke  primary primary 14.12193528293898   52.84334616634002   26  0.09999999999999998 Saldernbrücke  Bad Freienwalde (Oder)  Landkreis Märkisch-Oderland    Brandenburg pl  Saldernbrücke, Hohenwutzen, Bad Freienwalde (Oder), Landkreis Märkisch-Oderland, Brandenburg, Germany 14.12124218336555   52.84317391822985   14.123412593373303  52.84342839281019
1530555 Sommerweg   multiple    residential,unclassified    13.770007750972923  50.733779052622275  26  0.09999999999999998 Sommerweg   Altenber    Landkreis Sächsische Schweiz-Osterzgebirge Saxony  de  Sommerweg, Altenberg, Altenberg, Landkreis Sächsische Schweiz-Osterzgebirge, Saxony, Germany   13.767690154745454  50.73331788030968   13.77365597432919   50.73575986398025
1584087 Straßburger Straße    primary primary 7.812207888361513   48.576551263993466  26  0.09999999999999998 Straßburger Straße    Kehl    Ortenaukreis    Baden-Württemberg  fr  Straßburger Straße, Kehl, Ortenaukreis, Regierungsbezirk Freiburg, Baden-Württemberg, Germany    7.8018327686133375  48.57369311883033   7.8162894561118605  48.57670515573573

Invalid lines, two columns are missing (maybe country_code and wikipedia link):

line    name    class   type    lon lat place_rank  importance  street  city    county  state   country country_code    display_name    west    south   east    north   wikipedia

727992  Heinrich-Noë-Steig path    path    11.290707990062097  47.42556250037941   27  0.07499999999999996 Heinrich-Noë-Steig Mittenwald  Landkreis Garmisch-Partenkirchen    Free State of Bavaria   Heinrich-Noë-Steig, Mittenwald, Landkreis Garmisch-Partenkirchen, Upper Bavaria, Free State of Bavaria, Germany    11.290695442256094  47.42522234799208   11.290856290977985  47.42583355637149
908231  Karwendel-Klettersteig  path    path    11.297986228374295  47.428914575112074  27  0.07499999999999996 Karwendel-Klettersteig                  11.296963177992154  47.428440495896524  11.298769729583142  47.42985578024737

Invalid lines in data version 1.0 (World/Planet)

Input file has 20 842 915 entries (without first header line).
Sphinxsearch has processed 20 735 819 entries.

107 096 lines are invalid: the number of columns isn't 19 because some column values are missing.

10 lines contain field values with a tab (this is not acceptable, and cannot be fixed with pre-processing):

1529261-\t-"Avenida -\t-A el Cargadero"-\t-primary-\t-primary-\t--102.99770443241843-\t-22.65882699820638-\t-26-\t-0.09999999999999998-\t-"Avenida -\t-A el Cargadero"-\t-" "-\t-Jerez-\t-Zacatecas-\t-Mexico-\t-mx-\t-"Avenida -\t-A el Cargadero, Jerez, Zacatecas, Mexico"-\t--102.99841050106902-\t-22.657710165970382-\t--102.99699831802235-\t-22.659943859347116-\t-" "

See attached tsv, ods and xls files packed in ZIP
planet-invalid-lines.zip

BTW command for checking:
cat data.tsv | sed -e 's/\r/ /g' | awk -F"\t" 'NR > 1 && NF != 19 {print NR"\t"$0}' > data-invalid-lines.tsv
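
For environments without sed/awk, the same check can be sketched in Python. This is an approximation of the command above, not a drop-in replacement: it skips the header line, replaces carriage returns with spaces, and reports every line whose tab-separated field count differs from 19. The function name `invalid_lines` is hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

EXPECTED_COLUMNS = 19  # column count stated in this issue


def invalid_lines(stream: TextIO, expected: int = EXPECTED_COLUMNS) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for data lines with the wrong number of columns."""
    for line_no, line in enumerate(stream, start=1):
        if line_no == 1:
            continue  # header line is skipped, like NR > 1 in the awk command
        row = line.rstrip("\n").replace("\r", " ")
        if len(row.split("\t")) != expected:
            yield line_no, row


# Tiny in-memory demo: one header, one valid row, one row missing a column.
demo = "header\n" + "\t".join(["x"] * 19) + "\n" + "\t".join(["x"] * 18) + "\n"
for line_no, row in invalid_lines(io.StringIO(demo)):
    print(line_no, row, sep="\t")
```

When reading the real file, opening it with `open("data.tsv", encoding="utf-8", newline="\n")` keeps stray carriage returns inside a line instead of treating them as line breaks, which is closer to what the sed step does.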

Data format of hierarchical geonames

The goal of the project is a data export that is extremely easy to use with any fulltext search engine or SQL database.

The best practice meeting these criteria is the GeoNames.org project, with its .tsv format defined at: http://download.geonames.org/export/dump/

and GISGraphy:
http://download.gisgraphy.com/format.txt

My initial suggestion for the format is:

*osm_id - MUST BE UNIQUE "DOCUMENT ID" across the complete database

display_name - exactly as in Nominatim (may be improved later)

*name (=utf-8)
name_en
name_de
name_es
name_fr
name_ru
name_zh

*class
*type

*north (=boundingbox)
*south
*east
*west

*lat
*lon

scalerank - we have it
place_rank - nominatim has it

importance - calculated exactly as in Nominatim

country (=country code, ISO-3166 2-letter country code)

street=<housenumber> <streetname>
city=<city>
county=<county>
state=<state>
country=<country>
postalcode=<postalcode>

(= a la nominatim http://wiki.openstreetmap.org/wiki/Nominatim)

? timestamp - osm modification?

This is mostly derived from JSON provided by Nominatim - see: http://wiki.openstreetmap.org/wiki/Nominatim and
http://nominatim.klokantech.com/?q=paris&format=jsonv2&addressdetails=1
and from the fields we have already in vector tiles in OSM2VectorTiles project.

On this ticket we should agree on the details of the export data format.

Export shorter Wikipedia + Wikidata ID

Nominatim returns the Wikipedia link in a short variant such as "es:Buenos Aires".
It would be better for OSMNames to return this short variant too, not a complete URL.

Another helpful piece of information in the export would be the Wikidata ID (such as "Q23436").

In Nominatim JSON these are columns "wikipedia" and "wikidata".

Merging via the relation "label" may be required - as mentioned in #41 and #35
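
A sketch of the URL shortening, assuming the currently exported value is a standard article URL of the form https://<lang>.wikipedia.org/wiki/<Title>. The function name `short_wikipedia_ref` is hypothetical, and values that do not match that form are returned unchanged rather than guessed at.

```python
from urllib.parse import unquote, urlparse


def short_wikipedia_ref(url: str) -> str:
    """Turn a full Wikipedia article URL into the short 'lang:Title' variant."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    prefix = "/wiki/"
    if not host.endswith(".wikipedia.org") or not parsed.path.startswith(prefix):
        return url  # not a recognised article URL, leave as-is
    lang = host.split(".")[0]
    # Decode percent-escapes and use spaces instead of underscores in the title.
    title = unquote(parsed.path[len(prefix):]).replace("_", " ")
    return f"{lang}:{title}"
```

For example, `short_wikipedia_ref("https://es.wikipedia.org/wiki/Buenos_Aires")` yields the "es:Buenos Aires" form mentioned above.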

Attach bbox larger than a point to all entities

Currently not all entities have a 'real' bbox: those coming from a node, for example, have zero extent. Try to attach a realistic bbox by estimating or guessing, e.g. by deriving the bbox from the extent of the enclosing entity, or by assuming some 100 meters if the entity is of type "village".
Other ideas?
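
One possible sketch of the "guess some 100 meters" idea: pad a zero-extent bbox by a fixed radius, converting meters to degrees with the common approximation of about 111 320 m per degree of latitude. The constant, the default radius and the function name `pad_point_bbox` are illustrative assumptions; the sketch does not handle the poles or the antimeridian, and it does not derive anything from the enclosing entity.

```python
import math
from typing import Tuple

METERS_PER_DEGREE_LAT = 111_320.0  # rough approximation, assumed for this sketch

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def pad_point_bbox(west: float, south: float, east: float, north: float,
                   radius_m: float = 100.0) -> BBox:
    """Return the bbox unchanged unless it is a single point; then pad it by radius_m."""
    if west != east or south != north:
        return west, south, east, north
    dlat = radius_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude); guard against division by ~0.
    cos_lat = max(math.cos(math.radians(south)), 1e-6)
    dlon = radius_m / (METERS_PER_DEGREE_LAT * cos_lat)
    return west - dlon, south - dlat, east + dlon, north + dlat
```

The radius could be made to depend on the record's type (e.g. larger for a town than for a village), which is left open here.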

Export of native & alternative names

It would certainly be good to export the native name of places (name_en vs name as in vector tiles).

Currently a native name like "Praha" cannot be found. For other scripts (such as Japanese, Chinese, Thai, etc.) the original names cannot be found either.

GeoNames has a single "alternativenames" record.

To be discussed on Monday @and01

Duplicate of same results?

I have found a duplicate of the same entry:
http://osmnames.klokantech.com/?q=humenne+presov&format=html

Entry lines 2876052 and 2876053:

Humenné    boundary    administrative  21.90590524496128   48.94311364081  18  0.3     okres Humenné  " " Prešovský kraj    Slovakia    sk  Humenné, okres Humenné, Prešovský kraj, Východné Slovensko    21.85670398275343   48.906349733391124  21.952359183778924  48.97980253019544   " "
Humenné    boundary    administrative  21.90590524496128   48.94311364081  20  0.25        okres Humenné  " " Prešovský kraj    Slovakia    sk  Humenné, okres Humenné, Prešovský kraj, Východné Slovensko    21.85670398275343   48.906349733391124  21.952359183778924  48.97980253019544   " "

Invalid lines in data version 0.6 (Europe)

Input file has 8 478 324 entries (without first header line).
Sphinxsearch has processed 8 478 284 entries.

40 lines are invalid: the number of columns isn't 19 because some column values are missing.

See attached tsv, ods and xls files packed in ZIP.

europe-invalid.zip

Analyse: Nominatim hierarchy

A very important point we need to understand soon is how exactly Nominatim builds the "display_name" variable: which part of the hierarchy reconstruction is pre-calculated (and stored in SQL tables) during the import of the OSM planet, and which part is done at query time.

The purpose of this ticket is to document and link the relevant code related to the gazetteer hierarchy built from OSM in Nominatim.

Towns in extracts

Adjust the docker workflow (source code in this repo) to include towns and the assignment of towns to a country (parsing from places and first hierarchical import).

Run it on the planet dump extract of Switzerland, and deliver the exported .csv as part of the milestone release.

This is the essential step to make progress with this project and should not be delayed. It was discussed on April 25th.

It would be good to have an alpha version during this week (by Friday at the latest?). The result should be delivered and presented at the milestone meeting (Monday 16:00) on Skype at the latest.

@and01 could you please provide progress report here?

Wikipedia importance

Import Wikipedia into a Postgres database and improve the calculation of importance based on this data.

To be done with a separate Dockerfile included in the importing process.

Based on #4

add language support

Currently the exported tsv includes all languages. This results in a long display_name, e.g.
Erschwil, Bezirk Thierstein, Amtei Dorneck-Thierstein, Solothurn, Schweiz, Suisse, Svizzera, Svizra
Therefore it would be good to have a language-based export.

Importance for streets

OSMNames seems to assign the same importance value to a street in a capital city as to an equally named street in a village.

For fulltext search results (without viewbox provided) the streets in big cities should be shown first. That's what Google does too.

Indexing house numbers and zip in the street record

For fulltext search it would make sense to have a "house_numbers" field in the TSV dump, listing all house numbers, and also the "zip" code.

This way the fulltext search would be able to verify that an address is correct and directly return the record for a street as a result.
On the search server there could be external logic for parsing addresses, looking up the LonLat of the individual house for a given street in an external non-fulltext database.

During indexing as well as during searching, a library for parsing and normalizing addresses would be helpful, to cover the variety of formats and alternatives in how people write an address.
Something like libpostal described in libpostal-overview - (tested and not-yet-used in production for now, see: klokantech/osmnames-sphinxsearch#9)

This may be a way to extend the OSMNames search to support postal address search as well, with acceptable quality and performance.
