
geograpy's Introduction

This project is no longer being maintained and has been archived. Please check the Forks list for newer versions.

Forks

We are aware of two 3rd party forks for this library:

Geograpy

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Install & Setup

Grab the package using pip (this will take a few minutes):

pip install geograpy

Geograpy uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.

geograpy-nltk

Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

Now you have access to information about all the places mentioned in the linked article.

  • places.countries contains a list of country names
  • places.regions contains a list of region names
  • places.cities contains a list of city names
  • places.other lists everything that wasn't clearly a country, region or city

Note that the other list can be useful for shorter texts, where it pulls out things like street names and points of interest. At the moment it gets messy on longer texts that contain adjectival forms of place names (like "Russian" instead of "Russia").

But Wait, There's More

In addition to listing the names of discovered places, you'll also get some information about the relationships between places.

  • places.country_regions regions broken down by country
  • places.country_cities cities broken down by country
  • places.address_strings city, region, country strings useful for geocoding

Last But Not Least

While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.

  • places.country_mentions
  • places.region_mentions
  • places.city_mentions

Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:

[('Russian Federation', 14), (u'Ukraine', 11), (u'Lithuania', 1)]  
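The shape of these lists matches what `collections.Counter.most_common()` returns. Here is a minimal sketch of building such (name, count) tuples and reading off the most-mentioned place. It assumes nothing about Geograpy's internals, and the `names` list is made-up input rather than output from the library:

```python
from collections import Counter

# Hypothetical input: one entry per mention found in a text.
names = ["Ukraine", "Russian Federation", "Ukraine", "Lithuania", "Ukraine"]

# most_common() returns (name, count) tuples sorted by count, descending.
mentions = Counter(names).most_common()

# The first tuple is the most frequently mentioned place.
top_name, top_count = mentions[0]
print(mentions)
print(top_name, top_count)
```

The same unpacking works on `places.country_mentions`, since it has the same list-of-tuples shape.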

If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print(e.places)

Place context is handled in the places module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']

And of course all of the other information shown above (country_regions etc) is available after the corresponding set_ method is called.

Credits

Geograpy uses the following excellent libraries:

Geograpy uses the following data sources:

Hat tip to Chris Albon for the name.

geograpy's Issues

geograpy-nltk error

This is an issue fork from #4 by @shun-liang. I have the same problem.

When trying to run geograpy-nltk, I get the following error:

Traceback (most recent call last):
File "/Users/shun/.virtualenvs/hn_hiring_trend/bin/geograpy-nltk", line 5, in
nltk.downloader('maxent_ne_chunker')
TypeError: 'module' object is not callable
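The traceback shows the `nltk.downloader` module being called as if it were a function. NLTK's download entry point is `nltk.download(...)`. Below is a minimal sketch of a replacement for the failing script. It assumes the resource names posted in the UnicodeDecodeError report on this page are the ones the library needs. The helper name `download_models` and its `download` parameter are hypothetical and not part of Geograpy:

```python
# NLTK resource names as listed in another report on this page.
NLTK_MODELS = [
    "maxent_ne_chunker",
    "words",
    "treebank",
    "maxent_treebank_pos_tagger",
    "punkt",
    "averaged_perceptron_tagger",
]


def download_models(download=None):
    """Download each resource and return the names that failed."""
    if download is None:
        # Imported lazily so the function can be exercised without NLTK.
        import nltk
        download = nltk.download  # a function, unlike the nltk.downloader module
    return [name for name in NLTK_MODELS if not download(name)]
```

Calling `download_models()` with no argument uses NLTK itself; an empty list back means every download reported success.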

regions/countries returning all proper nouns

import geograpy as gp
url = "https://www.politico.eu/article/italy-incurable-economy/"
places = gp.get_place_context(url=url)
places.regions

This returns a list of proper nouns from the article. The same goes for places.countries.

places.country_cities seems to do better but still gives a funky return:
{'Italy': ['Rome', 'Naples', 'Codogno'], 'United States': ['Rome', 'Naples', 'Pierre', 'Brussels', 'Italy'], 'Belgium': ['Brussels'], 'France': ['Pierre']}
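The dictionary above suggests that a city name is listed under every country that has a place of that name (there are towns called Rome and Naples in the United States). One possible workaround, sketched here as post-processing rather than as a fix to the library, is to keep only the countries the text is known to discuss. The `mentioned` set is an assumption for illustration; in practice it would have to come from a source that is reliable for the article:

```python
country_cities = {
    "Italy": ["Rome", "Naples", "Codogno"],
    "United States": ["Rome", "Naples", "Pierre", "Brussels", "Italy"],
    "Belgium": ["Brussels"],
    "France": ["Pierre"],
}

# Assumed for illustration: countries known to be discussed in the article.
mentioned = {"Italy", "Belgium"}


def keep_mentioned(country_cities, mentioned):
    """Drop every country that is not in the mentioned set."""
    return {
        country: cities
        for country, cities in country_cities.items()
        if country in mentioned
    }


filtered = keep_mentioned(country_cities, mentioned)
print(filtered)
```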

Unable to run it due to label() exception on extraction.py

Traceback (most recent call last):
  File "ale.py", line 7, in <module>
    places = geograpy.get_place_context(url="https://www.cntraveler.com/hotels/hong-kong-s-a-r-/jordan/mandarin-oriental-hong-kong")
  File "/usr/local/lib/python2.7/site-packages/geograpy/__init__.py", line 6, in get_place_context
    e.find_entities()
  File "/usr/local/lib/python2.7/site-packages/geograpy/extraction.py", line 31, in find_entities
    if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
  File "/usr/local/lib/python2.7/site-packages/nltk/tree.py", line 217, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
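The traceback points at `ne.node`, which NLTK 3 replaced with the `label()` method. Here is a minimal sketch of a helper that works with either API. `node_label` and the two stand-in classes are hypothetical and exist only so the example runs without NLTK installed:

```python
def node_label(tree):
    """Return a tree's label via label() if present, else the old .node."""
    # Check for label() first: on NLTK 3 merely touching .node raises
    # NotImplementedError, so .node must only be the fallback.
    label = getattr(tree, "label", None)
    if callable(label):
        return label()
    return tree.node


class NewStyleTree(object):
    """Stand-in for an NLTK 3 tree."""

    def label(self):
        return "GPE"

    @property
    def node(self):
        raise NotImplementedError("Use label() to access a node label.")


class OldStyleTree(object):
    """Stand-in for a pre-3.0 NLTK tree."""

    node = "PERSON"


print(node_label(NewStyleTree()))
print(node_label(OldStyleTree()))
```

With a helper like this, the check on line 31 of extraction.py could compare `node_label(ne)` against 'GPE' and 'PERSON' instead of reading `ne.node`.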

All examples return empty result

import geograpy
from geograpy import places
from pprint import pprint

# address = geograpy.get_place_context(text='United States')
# pprint (vars(address))
from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
pprint (e.places)
pprint (vars(e.places))

Results:

{'address_strings': [],
 'cities': [],
 'city_mentions': [],
 'conn': <sqlite3.Connection object at 0x110769858>,
 'countries': [],
 'country_cities': {},
 'country_mentions': [],
 'country_regions': {},
 'other': [],
 'places': [],
 'region_mentions': [],
 'regions': []}
[]

UnicodeDecodeError: 'charmap' codec can't decode...

I did @PandaWhoCodes's pip install git+https://github.com/reach2ashish/geograpy.git, plus:
nltk.downloader.download('maxent_ne_chunker')
nltk.downloader.download('words')
nltk.downloader.download('treebank')
nltk.downloader.download('maxent_treebank_pos_tagger')
nltk.downloader.download('punkt')
nltk.download('averaged_perceptron_tagger')

and it seemed to be going well until I tried the example
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 274: character maps to <undefined>
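The 'charmap' codec in the message is what Python uses for single-byte Windows code pages such as cp1252, in which byte 0x8d is undefined. A plausible cause, not confirmed by the report, is a UTF-8 file being opened without an explicit encoding. This minimal sketch reproduces the error and shows the usual remedy; the file and its contents are made up:

```python
import os
import tempfile

# "\u010d" encodes to the bytes C4 8D in UTF-8; 8D is undefined in cp1252.
sample = "Gra\u010danica"

path = os.path.join(tempfile.mkdtemp(), "cities.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Reading with cp1252 (a common Windows default) fails.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Reading with the encoding the file was written in succeeds.
with open(path, encoding="utf-8") as f:
    text = f.read()
print(text == sample)
```

If this is the cause, the fix belongs wherever the library opens its data files: pass `encoding="utf-8"` to `open`.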

Python 3.6 Windows
Any thoughts? (or alternatives? I need to pull out city names. I've used GeoText for the country names (not positive it's working right yet) but GeoText's cities doesn't work very well.)

sqlite3.OperationalError

Fresh install on Python3 on Ubuntu 16.04

import geograpy
address = geograpy.get_place_context(text='United States')

results in

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/geograpy/__init__.py", line 9, in get_place_context
    pc = PlaceContext(e.places)
  File "/usr/local/lib/python3.5/dist-packages/geograpy/places.py", line 17, in __init__
    self.conn = sqlite3.connect(db_file)
sqlite3.OperationalError: unable to open database file
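SQLite raises this error when it cannot open or create the file at the given path, for example because the containing directory does not exist or the current user cannot write to it. Whether that is the cause here is an assumption. Below is a minimal sketch of a check that reports the directory problem before connecting; `connect_checked` is a hypothetical helper, not part of Geograpy:

```python
import os
import sqlite3
import tempfile


def connect_checked(db_file):
    """Connect to db_file, failing early with a clearer message."""
    folder = os.path.dirname(os.path.abspath(db_file))
    if not os.path.isdir(folder):
        raise IOError("directory does not exist: %s" % folder)
    # A new database file can only be created in a writable directory.
    if not os.path.exists(db_file) and not os.access(folder, os.W_OK):
        raise IOError("cannot create database in: %s" % folder)
    return sqlite3.connect(db_file)


# A path inside an existing, writable directory opens fine.
good = os.path.join(tempfile.mkdtemp(), "locs.db")
conn = connect_checked(good)
conn.close()

# A path inside a missing directory is reported before SQLite is involved.
bad = os.path.join(tempfile.mkdtemp(), "missing", "locs.db")
try:
    connect_checked(bad)
    message = None
except IOError as exc:
    message = str(exc)
print(message)
```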

ImportError: cannot import name 'PlaceContext'


ImportError Traceback (most recent call last)
in ()
----> 1 from geograpy import places

~\Anaconda3\envs\goodreads\lib\site-packages\geograpy-0.3.7-py3.6.egg\geograpy\__init__.py in ()
1 from extraction import Extractor
----> 2 from places import PlaceContext
3
4 def get_place_context(url=None, text=None):
5 e = Extractor(url=url, text=text)

ImportError: cannot import name 'PlaceContext'

Can Someone Help me please?

NotImplementedError: Use label() to access a node label.

pip installed geograpy on archlinux box and tried out this simple program

import geograpy
txt = 'India is a country'
p = geograpy.get_place_context(text=txt)

I got the following error

In [3]: p = geograpy.get_place_context(text=txt)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-ae088bc75e75> in <module>()
----> 1 p = geograpy.get_place_context(text=txt)

/home/m/.local/lib/python2.7/site-packages/geograpy/__init__.pyc in get_place_context(url, text)
      4 def get_place_context(url=None, text=None):
      5     e = Extractor(url=url, text=text)
----> 6     e.find_entities()
      7
      8     pc = PlaceContext(e.places)

/home/m/.local/lib/python2.7/site-packages/geograpy/extraction.pyc in find_entities(self)
     29         for ne in nes:
     30             if len(ne) == 1:
---> 31                 if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
     32                     self.places.append(ne[0][0])

/usr/lib/python2.7/site-packages/nltk/tree.pyc in _get_node(self)
    196     def _get_node(self):
    197         """Outdated method to access the node value; use the label() method instead."""
--> 198         raise NotImplementedError("Use label() to access a node label.")
    199     def _set_node(self, value):
    200         """Outdated method to set the node value; use the set_label() method instead."""

NotImplementedError: Use label() to access a node label.
Python version 2.7.11
Linux zero 4.5.4-1-ARCH #1 SMP PREEMPT Wed May 11 22:21:28 CEST 2016 x86_64 GNU/Linux

ImportError: No module named 'extraction'

Hi,
I just came across this library and was trying it out but every time I try to....

import geograpy

I get this error:
Traceback (most recent call last):
File "", line 1, in
File "/Users/Larry/miniconda3/envs/fido/lib/python3.5/site-packages/geograpy/__init__.py", line 1, in
from extraction import Extractor
ImportError: No module named 'extraction'
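`from extraction import Extractor` is an implicit relative import, which Python 2 allowed inside a package and Python 3 does not. In Python 3 the line needs a leading dot: `from .extraction import Extractor`. This minimal sketch builds a throwaway package on disk to show the explicit form working. The package name `demo_pkg` and its contents are made up and only mirror the module names in the traceback:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg")
os.mkdir(pkg)

# A sibling module, standing in for extraction.py.
with open(os.path.join(pkg, "extraction.py"), "w") as f:
    f.write("class Extractor(object):\n    pass\n")

# The leading dot makes the import relative to the package itself.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .extraction import Extractor\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.Extractor.__name__)
```

The "cannot import name 'PlaceContext'" report on this page has the same shape: `from places import PlaceContext` looks for a top-level module called places rather than the one inside the package.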

Error processing data (from demo)

NLTK seems to have changed this: http://www.nltk.org/_modules/nltk/tree.html

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/geograpy/__init__.py", line 6, in get_place_context
    e.find_entities()
  File "/usr/local/lib/python2.7/dist-packages/geograpy/extraction.py", line 31, in find_entities
    if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
  File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.

geograpy wont download

I get this error when I run the command pip install geograpy, and other errors follow it.
"Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
status = self.run(options, args)"

Way to Reset the initial Place Context

Hi,

Ideally I would like to be able to reset the place context. At the moment it is not reset between runs. Let's say I initialize it with Afghanistan and then on the next iteration add Libya: it will still display Afghanistan in country_mentions. It should be reset.

python 3 support

The newspaper module won't download because it has been renamed to newspaper3k. The library itself also seems to have some Python 3 incompatibilities that could be fixed by running 2to3.

Error in installation

I am getting the following error during installation.

Could not find a version that satisfies the requirement geograpy (from versions: )
No matching distribution found for geograpy.

OperationalError: unable to open database file

plz I am having a big error here
Traceback (most recent call last):
File "sm.py", line 36, in
pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])
File "/usr/local/lib/python3.6/dist-packages/geograpy3/places.py", line 34, in __init__
self.conn = sqlite3.connect(db_file)
sqlite3.OperationalError: unable to open database file

AttributeError: 'NoneType' object has no attribute 'name'

p = geograpy.get_place_context(text='Pristina')

Traceback (most recent call last):
File "", line 1, in
File "/home/cusco/VirtualEnvs/data_parser/lib/python3.7/site-packages/geograpy/__init__.py", line 12, in get_place_context
pc.set_cities()
File "/home/cusco/VirtualEnvs/data_parser/lib/python3.7/site-packages/geograpy/places.py", line 160, in set_cities
country_name = country.name
AttributeError: 'NoneType' object has no attribute 'name'
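The last frame shows `country.name` being read from a value that is `None`, which suggests the country for this city could not be resolved. Here is a minimal sketch of a guard, with a made-up lookup table standing in for whatever set_cities actually queries. `COUNTRIES`, `country_name` and the codes used are hypothetical:

```python
from collections import namedtuple

Country = namedtuple("Country", ["name"])

# Hypothetical lookup table; "XK" is deliberately missing.
COUNTRIES = {"US": Country("United States")}


def country_name(code):
    """Return the country's name, or None when the code is unknown."""
    country = COUNTRIES.get(code)
    if country is None:
        return None  # skip the entry instead of raising AttributeError
    return country.name


print(country_name("US"))
print(country_name("XK"))
```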

'NoneType' object has no attribute 'name'
