The accompanying code and data for the Springer 2017 publication "What's missing in geographical parsing?" in Language Resources and Evaluation.

Home Page: https://link.springer.com/article/10.1007/s10579-017-9385-8

License: GNU General Public License v3.0

What's Missing In Geoparsing?


NEWS UPDATE 31.9.2019 - We have a LONG FOLLOW-UP PAPER OUT NOW that greatly expands on this topic. The title is "A Pragmatic Guide to Geoparsing Evaluation." It has now been published in the Springer LREV journal. For the project/paper repository, follow this link.


"Science is a wonderful thing if one does not have to earn one's living at it." -- Albert Einstein

Summary

Thanks for stopping by! In this repository, you will find the accompanying code and data for the publication "What's missing in geographical parsing?" in the journal Language Resources and Evaluation. In the unlikely case that any files are missing, please track me down and I'll upload them 👍

What's included

  1. data - This is the output of all systems on both datasets (2 * 5 files) plus the gold standard (2 files)
  2. The dataset WikToR(SciPaper).xml is the original data as described and used in the paper.
  3. The LGL dataset, which is also used for evaluation, is included as lgl.xml
  4. Essential experiment files (plus supporting scripts)

How to replicate

You should have some basic Python libraries like NumPy, NLTK, Matplotlib (if you want graphics), ... to start with.

  • methods.py is the main Python script for running the experiments (requires the yahoo.py script)
  • Please install GeoPy to calculate the distances between coordinates.
  • Also install Wikipedia for Python, a nice API wrapper 👍
  • Scroll down to the end of the file to see example usage; I included all necessary instructions and comments.
  • Enjoy!
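
For orientation, here is a minimal sketch of the kind of distance calculation GeoPy is installed for: the great-circle distance between a predicted and a gold coordinate. It is not taken from methods.py. To stay dependency-free it uses a standard-library haversine formula rather than GeoPy itself, and the function name haversine_km and the example coordinates are illustrative assumptions.

```python
import math

# Mean Earth radius in kilometres (a sphere approximation; assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees.

    Hypothetical helper for illustration; the repository itself relies on GeoPy.
    """
    lat1, lon1 = (math.radians(v) for v in a)
    lat2, lon2 = (math.radians(v) for v in b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


if __name__ == "__main__":
    gold = (51.5074, -0.1278)       # illustrative gold coordinate (London)
    predicted = (48.8566, 2.3522)   # illustrative system output (Paris)
    print(f"error distance: {haversine_km(gold, predicted):.1f} km")
```

With GeoPy installed, a comparable spherical distance should be available from its geopy.distance module; the stdlib version above is only a stand-in so the example runs anywhere.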

How to (re)create and modify WikToR

The dataset (WikToR) can be created (and unit tested) from scratch, extended, reduced, with more or fewer sentences added, etc. If you wish to do that, great! Here's what you need:

  • The wiktor.py file is the Python script used to (re)generate and unit test WikToR.
  • Download the allCountries.txt data dump from GeoNames and save it in the same directory as the script.
  • Please sign up for a GeoNames account and a USERNAME, which you will need to fill in on line 42 to ensure the API query works.
  • The first half of wiktor.py is for CORPUS CREATION, the second half is for CORPUS TESTING.
  • Enjoy!
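
As a rough illustration of working with the GeoNames dump, the sketch below reads allCountries.txt as tab-separated text using the 19-column layout documented in the GeoNames readme, and builds a simple name-to-coordinates lookup. It is not how wiktor.py is confirmed to read the file; the helper names read_geonames and build_name_index are hypothetical.

```python
# Column order of the GeoNames "geoname" table, as listed in the GeoNames dump readme.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def read_geonames(lines):
    """Yield one dict per well-formed row of a GeoNames dump (hypothetical helper)."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(GEONAMES_COLUMNS):
            continue  # skip blank or malformed rows instead of failing
        record = dict(zip(GEONAMES_COLUMNS, fields))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        yield record


def build_name_index(records):
    """Map each lower-cased place name to all (lat, lon) candidates sharing it."""
    index = {}
    for record in records:
        key = record["name"].lower()
        index.setdefault(key, []).append((record["latitude"], record["longitude"]))
    return index


# Usage, assuming allCountries.txt sits next to the script as described above:
# with open("allCountries.txt", encoding="utf-8") as handle:
#     index = build_name_index(read_geonames(handle))
```

Keeping every candidate per name, rather than only the first, preserves the ambiguity between places that share a name.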

"The science of today is the technology of tomorrow." -- Edward Teller
