Code Monkey home page Code Monkey logo

ansa.geonames's Introduction

Ansa.GeoNames

This ASP.NET Core 2.0 console application will populate a data store with Gazetteer data from the GeoNames geographical database. These data can already be accessed via a public API or text-file data dumps but sometimes it is useful to be able to write SQL queries to a relational database. Ansa.GeoNames wraps the excellent NGeoNames library, which handles the heavy-lifting in terms of retrieving and parsing the GeoNames data, and will generate your choice of a Sqlite or SQL Server database.

Status

I wrote this application to address a need common to a number of my customers. It is not, as yet, comprehensive for all other use cases. The output data set includes tables for (1) GeoNames, (2) alternate Names, and (3) country info. There are not yet tables for feature codes, admin codes, time zones and the other reference lists included in the GeoNames database. I feel the provided tables do solve the most common needs of users, but I am interested in producing a fully-featured version of the application, particularly if other developers are interested.

Schema

The database schema mirrors the GeoNames Gazetteer data structure described here.

Intallation and setup

First, clone the repository to your local machine:

git clone https://github.com/rodoch/Ansa.GeoNames

Then, build the ASP.NET solution, specifying your target runtime environment, e.g.:

dotnet build -r win10-x64

Tip: See a list of target runtime identifiers here.

This will produce an .exe file and an appsettings.json file that you can grab from <PATH-TO-YOUR-APP>/bin/Debug/netcoreapp2.0/.

Or else, you can just run the application from within Visual Studio.

Configuration

Use the appsettings.json file to configure your target database:

{
  "Database": "Sqlite",
  "ConnectionString": "Data Source=D:\\GeoNamesOutput\\Geonames.sqlite",
  "DataSourcePath": "D:\\GeoNamesData",
  "GeoNames": {
    "AlternateNamesLanguageCodes": "ga,en",
    "CitiesMinimumPopulation": "15000"
  }
}
Key Description Values
Database Specifies the type of database you wish to build Sqlite OR SQLServer
ConnectionString DB connection string Note that a connection string for a Sqlite DB must include the Data Source= prefix
DataSourcePath Where you intend to store the raw text files from the GeoNames data store, prior to DB input. Directory path
GeoNames:AlternateNamesLanguageCodes Specify the languages for which you wish to have alternate toponymic names Comma-separated string of ISO Alpha-2 language codes. An empty value means all languages will be retrieved.
GeoNames:CitiesMinimumPopulation The minimum population a population centre must have to be included in the GeoNames dataset 1000 OR 5000 OR 15000

Things to note:

  • You will not need to retrieve the GeoNames raw data dumps yourself. The application will do this for you. You need to specify the DataSourcePath in order to decide where these files will be stored.
  • Some GeoNames raw text files range from 0.5-1.5GB unzipped, depending on the configuration you choose. Ensure you have adequate disk space.
  • Probably the biggest factor in download size, processing speed, and the size of the subsequent database is the number of alternate languages you specify.
  • The next biggest factor is the minimum population setting. Higher minima mean smaller file sizes.

Unsurprisingly, the SQL Server implementation writes data to the DB significantly faster. If creating a Sqlite database you might want to make yourself a cup of tea.

Building a Sqlite database

Ansa.GeoNames will create the database and all the necessary tables automatically. You just need to run the program with the appropriate configuration, e.g.

{
  "Database": "Sqlite",
  "ConnectionString": "Data Source=D:\\GeoNamesOutput\\GeoNames.sqlite",
  "DataSourcePath": "D:\\GeoNamesData",
  "GeoNames": {
    "AlternateNamesLanguageCodes": "ga,en",
    "CitiesMinimumPopulation": "15000"
  }
}

Remember to:

  1. Specify "Sqlite" as your Database value.
  2. Specify a connection string, including the Data Source= prefix.

Building a SQL Server database

  1. Create a new database on your SQL Server instance and give it an appropriate name, e.g. GeoNames. Set the recovery model and compatibility according to your own requirements. Latin1_General_CI_AI collation seems to be the best choice for this type of database.
  2. Run the provided SQL script in /DBScripts/SqlServer/ to generate the tables.
  3. Specify your configuration in appsettings.json, e.g.:
{
  "Database": "SQLServer",
  "ConnectionString": "Server=localhost;Database=GeoNames;Trusted_Connection=True;",
  "DataSourcePath": "D:\\GeoNamesData",
  "GeoNames": {
    "AlternateNamesLanguageCodes": "ga,en",
    "CitiesMinimumPopulation": "15000"
  }
}

Permissions

Please note that the database user specified in your SQL Server connection string must have ALTER permission on the database you are writing to. This is required so that IDENTITY_INSERT can be turned on for the duration of the data input process as we are inserting premade IDs in to the PK/Identity columns of the GeoNames and AlternateNames tables.

ansa.geonames's People

Contributors

rodoch avatar

Stargazers

Mahmoud Abdallah avatar

Watchers

Mahmoud Abdallah avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.