gazetteer's Introduction

OpenStreetMap (OSM) geocoder

The main purpose of this project is to provide an easy-to-use geocoder/geoindexer.

The project consists of two parts: Gazetteer and GazetteerWeb.

Gazetteer

Gazetteer is used to parse OSM data and do all the dirty work with geometry.

You can use Gazetteer as a standalone OSM processor to dump addresses from OSM.

You can ignore GazetteerWeb and use the data in your own geocoding/geosearching applications. Take an osm.bz2 dump and generate JSON with:

  • fully geocoded buildings
  • fully geocoded POIs
  • streets
  • cities
  • administrative boundaries

Details are here: https://github.com/kiselev-dv/gazetteer/tree/develop/Gazetteer

You can find data extracts here: http://data.osm.me/dumps/

GazetteerWeb

GazetteerWeb is the second part of the project. You may treat it as an example implementation of a search engine for Gazetteer-generated data, or use it for your own purposes.

Details are here: https://github.com/kiselev-dv/gazetteer/tree/develop/GazetteerWeb

gazetteer's People

Contributors

bushmank, dependabot[bot], kiselev-dv, matkoniecz

gazetteer's Issues

Extract only addresses?

Another question I have: Is it possible to extract only addresses, and no highways or other data?

Thanks a lot!

Output to CSV - Syntax

Hi,

While working with Gazetteer (great tool, by the way!) to extract addresses from OSM, I found a note in the README saying that it can also output CSV. Unfortunately, I could not figure out how to do this. Could you please give me a small example?

Thanks a lot!

Failed to compile

mvn clean compile assembly:single -f Gazetteer/pom.xml
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Gazetteer 1.4-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for me.osm.osm-doc:osm-doc-java:jar:0.11 is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.868s
[INFO] Finished at: Sat Sep 19 02:34:41 CEST 2015
[INFO] Final Memory: 6M/236M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project Gazetteer: Could not resolve dependencies for project me.osm.Gazetteer:Gazetteer:jar:1.4-SNAPSHOT: Failure to find me.osm.osm-doc:osm-doc-java:jar:0.11 in http://raw.githubusercontent.com/kiselev-dv/mvn-repository/master/releases/ was cached in the local repository, resolution will not be reattempted until the update interval of osm-doc-mvn-repo has elapsed or updates are forced -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Remove the urban district (городской округ, ГО) of the same city from addr-full-text

Example: , Гусевский район, Гусевское городское поселение, Гусев, улица
I don't see an option that gives 100% hits. Options:

  1. ГО like '%' + city + '%'
  2. The city boundary coincides with the ГО boundary
  3. The city must be spatially inside the ГО

Don't forget about Saint Petersburg and Moscow.
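
A minimal sketch of option 1 (the substring match), assuming the address is available as a list of text parts and the city name is known separately. The function name `drop_matching_district` and the `marker` default are hypothetical and not part of Gazetteer. It does not handle the Saint Petersburg and Moscow cases.

```python
def drop_matching_district(parts, city, marker="городской округ"):
    """Drop address parts that look like an urban district named after `city`.

    `parts` is a list of address components, `city` is the city name.
    Both the input shape and the marker text are assumptions.
    """
    city_key = city.casefold()
    kept = []
    for part in parts:
        folded = part.casefold()
        # Option 1: the part is a district and its name contains the city name.
        if marker in folded and city_key in folded:
            continue
        kept.append(part)
    return kept


parts = [
    "Калининградская область",
    "городской округ Калининград",
    "Калининград",
    "улица Лермонтова",
    "9",
]
cleaned = drop_matching_district(parts, "Калининград")
```

The comparison is case-insensitive, so a lowercase city name still matches its district. A substring match can give false positives when one city name is contained in another, which is why the boundary-based options 2 and 3 may still be needed.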

Refactor places processing

  • (done) Separate places and admin. boundaries. Move places to PlacePointsBuilder
  • (done) Add place point and place boundary junction
  • (done) Add place in polygon indexing

Fuzzy matching utils

Add different fuzzy matchers for matching street names and place names.
They must be accessible from full-text address formatters and so on.
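
As an illustration only, here is one possible fuzzy matcher built on Python's standard `difflib.SequenceMatcher`. The helper names `similarity` and `best_match` and the 0.8 threshold are assumptions for this sketch, not the matchers the project uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None if none is close enough."""
    best = None
    best_score = 0.0
    for candidate in candidates:
        score = similarity(name, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


street = best_match("Lermontova street", ["Lermontov street", "Lenin street"])
```

A real matcher for street names would likely also need to normalize word order and status words such as "street" or "улица" before comparing.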

Add sqlite support for slices

  • Add a DAO layer
  • Add an in-memory DAO implementation (mainly for tests)
  • Add an SQLite DAO implementation
  • Move synchronization to the DAO (different implementations will use different synchronization)
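
The list above could be sketched roughly as follows. This is a hypothetical design: the `SliceDao` interface, its `append`/`read` methods, and the `slice_lines` table are invented for illustration, and the project itself is written in Java. Each implementation owns its own lock, which is one way to read "move synchronization to the DAO".

```python
import sqlite3
import threading
from abc import ABC, abstractmethod


class SliceDao(ABC):
    """Hypothetical storage interface for slice (stripe) lines."""

    @abstractmethod
    def append(self, slice_name, line):
        """Store one line for the given slice."""

    @abstractmethod
    def read(self, slice_name):
        """Return all lines of the slice in insertion order."""


class InMemorySliceDao(SliceDao):
    """Keeps slices in a dict; intended mainly for tests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slices = {}

    def append(self, slice_name, line):
        with self._lock:
            self._slices.setdefault(slice_name, []).append(line)

    def read(self, slice_name):
        with self._lock:
            return list(self._slices.get(slice_name, []))


class SqliteSliceDao(SliceDao):
    """Stores slices in a single SQLite table."""

    def __init__(self, path=":memory:"):
        self._lock = threading.Lock()
        # check_same_thread=False lets worker threads share the connection;
        # the lock above serializes access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS slice_lines ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "slice TEXT NOT NULL, line TEXT NOT NULL)"
        )

    def append(self, slice_name, line):
        # Using the connection as a context manager commits the insert.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT INTO slice_lines (slice, line) VALUES (?, ?)",
                (slice_name, line),
            )

    def read(self, slice_name):
        with self._lock:
            rows = self._conn.execute(
                "SELECT line FROM slice_lines WHERE slice = ? ORDER BY id",
                (slice_name,),
            ).fetchall()
        return [row[0] for row in rows]
```

Because both classes share one interface, tests can run against `InMemorySliceDao` while production code uses `SqliteSliceDao` with a file path.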

Filter out incomplete addresses

Buildings and cities of other countries remain after a country is clipped by its bbox.
Either filter by country, or drop addresses that have no country/region.
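
A minimal sketch of the second approach (dropping addresses without a country or region), assuming each address is a dict. The key names `country` and `region` are hypothetical placeholders, and requiring both of them is an assumption; the actual Gazetteer output fields may differ.

```python
def is_complete(address, required=("country", "region")):
    """Return True if every required key is present and non-empty."""
    return all(address.get(key) for key in required)


def filter_complete(addresses, required=("country", "region")):
    """Keep only addresses that have all required parts."""
    return [a for a in addresses if is_complete(a, required)]


addresses = [
    {"country": "Россия", "region": "Калининградская область", "city": "Гусев"},
    {"city": "Leftover city from a neighbouring country"},
    {"country": "Россия", "region": "", "city": "No region"},
]
complete = filter_complete(addresses)
```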

Add a GUI and SQL / SQLite export

Adding a GUI for Gazetteer would improve usability, especially when exporting.
Adding an option to export to SQL or SQLite would also make it easier and faster to use on Android.

Netherlands - Out of memory

Good morning master,

I need your help once more. It seems that we need some tricks to process one of the most detailed countries on OSM: the Netherlands. I ran the application as you suggested:

1st step
bzcat $inputFile | java -jar gazetteer-1.4.jar split - none

2nd step
java -jar gazetteer-1.4.jar slice --x10

3rd step
java -jar gazetteer-1.4.jar join --handlers out-gazetteer $outFile

2015-11-20 10.01.17.187 [join-stripe18544.gjson.gz] ERROR JoinSliceRunable - Join failed. File: data/stripe18544.gjson.gz.
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2367)
...

More stripes fail after this one.

Source of the file: http://download.geofabrik.de/europe/netherlands-latest.osm.bz2

What can be done here?
Thank you in advance.

Add full address text formatter configuration

Add a template string, or maybe a file with a JS or Groovy formatter script.
An address is a simple JSON object, so using JS or Groovy with the json.org library preimported should be simple enough.
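
To illustrate the template-string variant, here is a sketch using Python's standard `string.Template`. The field names (`city`, `street`, `housenumber`) and the default template are assumptions made for the example, not the project's actual address schema.

```python
from string import Template


def format_address(address, template="$city, $street, $housenumber"):
    """Fill a user-supplied template from an address dict.

    Raises KeyError if the template names a field the address lacks.
    """
    return Template(template).substitute(address)


address = {"city": "Калининград", "street": "улица Лермонтова", "housenumber": "9"}
text = format_address(address)
short = format_address(address, "$street $housenumber")
```

A template string covers simple reordering. Conditional logic, such as skipping empty parts, is where a script-based formatter would be more suitable.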

fix object namings to follow GeoJSON standards

Hi,
In the data extracts produced by Gazetteer, objects don't follow GeoJSON type naming because the names are all lowercase.

Could you please fix that, as the GeoJSON specification requires:

A geometry is a GeoJSON object where the type member's value is one of the following strings: "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", or "GeometryCollection".
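
Until that is fixed, a consumer could normalize the names on its side. The following is a sketch under the assumption that only the casing of `type` is wrong; the function names are hypothetical.

```python
GEOJSON_TYPES = (
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
)
_BY_LOWER = {name.lower(): name for name in GEOJSON_TYPES}


def canonical_type(name):
    """Map a geometry type in any casing to its GeoJSON spelling."""
    try:
        return _BY_LOWER[name.lower()]
    except KeyError:
        raise ValueError(f"not a GeoJSON geometry type: {name!r}")


def normalize_geometry(geometry):
    """Return a copy of the geometry dict with a canonical `type` value."""
    fixed = dict(geometry)
    fixed["type"] = canonical_type(geometry["type"])
    if fixed["type"] == "GeometryCollection":
        # Nested geometries need the same treatment.
        fixed["geometries"] = [
            normalize_geometry(g) for g in geometry.get("geometries", [])
        ]
    return fixed


point = normalize_geometry({"type": "point", "coordinates": [20.5, 54.7]})
```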

Errors mending

Mend common types of geometry errors:

  • Self-intersections and melted edges like >----< or >--
  • Shared edges between outer and inner shells
  • Unclosed polygons
  • Inner ring outside the outer ring
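
Of these, unclosed polygons are the simplest case. A minimal sketch, assuming a ring is a list of coordinate pairs; `close_ring` is a hypothetical helper and does not address the other error types:

```python
def close_ring(coords):
    """Return a copy of the ring whose last point equals its first point."""
    ring = list(coords)
    if ring and ring[0] != ring[-1]:
        # Repeat the first point at the end to close the polygon.
        ring.append(ring[0])
    return ring


open_ring = [(0, 0), (1, 0), (1, 1)]
closed = close_ring(open_ring)
```

Simply appending the first point is only safe when the gap is a data-entry omission. A ring that is open because a member way is missing would be closed with an edge that does not exist in the source data.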

Gazetteer hangs on JoinSliceRunable

Gazetteer hangs on the latest OSM dumps when joining.
java -Xmx2048m -jar bin/Gazetteer.jar join --handlers out-gazetteer latest.json.gz

[join-stripe2061.gjson.gz] INFO JoinSliceRunable - stripe2061.gjson.gz done in 0:00:41.550. 57 left

Statistic module

Most important:

  • Broken polygons
  • Addresses by city. Alerts for cities with large address losses.

Geocoding inside a building, inside an OSM boundary

Geocoding over a couple of hundred POIs inside a building may look like overkill, but it has a very direct use in OpenLevelUp. The main thing is that the osmid is returned in the response.

API parameters

The Gazetteer web API needs a parameter that restricts the geocoding area.

Something similar was described here: https://github.com/kiselev-dv/gazetteer/tree/develop/Gazetteer#3-how-to-filter-data-by-boundary

osm_id=w00000001
osm_id=r00000001

Obviously, this requires clients to know the osmid (which runs counter to geocoding without workarounds), but OpenLevelUp can do that because it has the data from Overpass.

It may be possible to simplify everything for users with two parameters:
restrict=admin_level + restrict_query=Москва
restrict=building + restrict_query=Афимолл Сити
restrict=mall + restrict_query=Centre Commercial Le Coudoulet

Indexing

The simplest implementation is to add is_in_Афимолл_Сити tags to all objects inside the building.

Malls/retail buildings can be singled out from all other "houses" to reduce the index size.

For admin boundaries this feature is less in demand and can wait.

Query processing

Match the two parameters against these tags, even by plain equality (already better than Nominatim).

Identical results can be sorted by distance to the lat/lon/zoom parameters from the client application (OpenLevelUp has them in its URL). That is, overwrite the internal ES weights with the distance in meters to lat/lon.

This post-sorting by distance to the screen center can be made optional.
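
A sketch of such a post-sort, assuming each result is a dict with `lat` and `lon` keys in degrees (a hypothetical response shape). It uses the haversine great-circle formula with a mean Earth radius of 6,371 km and is independent of how ES scores the results.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sort_by_distance(results, lat, lon):
    """Return results ordered by distance to the screen center (lat, lon)."""
    return sorted(
        results,
        key=lambda r: haversine_m(lat, lon, r["lat"], r["lon"]),
    )


results = [
    {"name": "far", "lat": 55.80, "lon": 37.70},
    {"name": "near", "lat": 55.75, "lon": 37.54},
]
ordered = sort_by_distance(results, 55.749, 37.539)
```

Keeping the sort as a separate step after the search makes it easy to switch off, which fits the idea of making it optional.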

PS.
OpenLevelUp ticket: https://framagit.org/PanierAvide/OpenLevelUp/issues/10
Photon ticket: komoot/photon#226

add osm node ids to Gazetteer objects

Please add OSM node ids to Gazetteer objects. Currently the "id" key starts with the tag name and the OSM node id is somewhere in the middle, so it is very difficult to parse.
It would be great to have a structured OSM id object with the OSM node or way id, type and tag, e.g. {'id': '25496583', 'type': 'node', 'highway': 'traffic_signals'} or {'id': '25496583', 'type': 'way', 'highway': 'motorway'}
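
As a client-side workaround, the OSM reference can be pulled out of the id string. This sketch assumes the id contains a dash-separated token made of a type letter (n, w or r) followed by digits, as in ids such as adrpnt-0141602584-w90640819 seen elsewhere in these issues. That pattern is inferred from examples only and may not hold for every object type; `parse_osm_ref` is a hypothetical helper.

```python
import re

# A token like "w90640819" preceded by a dash and followed by a dash or the end.
_OSM_TOKEN = re.compile(r"-([nwr])(\d+)(?=-|$)")
_TYPE_NAMES = {"n": "node", "w": "way", "r": "relation"}


def parse_osm_ref(feature_id):
    """Return {'type': ..., 'id': ...} for the first OSM token, or None."""
    match = _OSM_TOKEN.search(feature_id)
    if match is None:
        return None
    return {"type": _TYPE_NAMES[match.group(1)], "id": int(match.group(2))}


way_ref = parse_osm_ref("adrpnt-0141602584-w90640819")
node_ref = parse_osm_ref("adrpnt-0141716322-n1771515686")
```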

Different spelling of the city in addr-full-text

adrpnt-0141602584-w90640819 Калининградская область, городской округ Калининград, калининград, улица Лермонтова, 9,

adrpnt-0141716322-n1771515686 Калининградская область, городской округ Калининград, Калининград, Корсунская улица, 10,

Add streets processing

Imports

  • [done] Import osm ways for streets
  • [done] Import associated street relations

Indexing

  • [done] Index building to nearest street
  • [done] Index street in polygons
