
pyaddress's Introduction

address

address is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apartment search and apartment spider applications.

Installation

pip install address

Example

First, we create an AddressParser. AddressParser allows us to feed in lists of cities, streets, and address suffixes. Then we call parse_address on our address string, which returns an Address instance with its attributes filled in. From there, we can print parts of the address, change them, validate them, create a database model to store them, or do anything else we need.

from address import AddressParser, Address

ap = AddressParser()
address = ap.parse_address('123 West Mifflin Street, Madison, WI, 53703')
# Print the street portion: house number, direction prefix, street name, suffix
print("Address is: {0} {1} {2} {3}".format(address.house_number, address.street_prefix, address.street, address.street_suffix))

> Address is: 123 W. Mifflin St.

AddressParser

AddressParser(self, suffixes=None, cities=None, streets=None)

suffixes, cities, and streets all accept lists as arguments. If you leave them as None, the parser reads the default files bundled with the package: suffixes.csv, cities.csv, and streets.csv. The bundled streets.csv is intentionally blank.

You can provide lists of acceptable suffixes, cities, and streets to reduce false positives. If you know all the addresses you are processing are in a small area, you can provide a list of the cities in that area and should get more accurate results. If you are only processing addresses from one city, you could provide that single city in a list, along with a list of all the streets in that city.
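To illustrate why a narrower list helps, here is a minimal sketch of whole-word matching against a list of known cities. It is a hypothetical helper written for this explanation, not pyaddress's internal matching algorithm: with a broad list, a street name such as "Washington" is also accepted as a city, while a list restricted to your area (the same idea as passing cities=['Madison'] to AddressParser) leaves a single candidate.

```python
import re
from typing import List, Sequence


def city_candidates(address: str, known_cities: Sequence[str]) -> List[str]:
    """Return every known city that appears as a whole word in ``address``.

    Hypothetical helper for illustration only; it is not pyaddress's
    matching algorithm. Matching is case-insensitive.
    """
    found = []
    for city in known_cities:
        # \b anchors stop "Madison" from matching inside "Madisonville".
        pattern = r"\b" + re.escape(city) + r"\b"
        if re.search(pattern, address, flags=re.IGNORECASE):
            found.append(city)
    return found


addr = "500 Washington Ave, Madison, WI 53703"
# A broad list also matches the street name, so the city is ambiguous.
print(city_candidates(addr, ["Chicago", "Madison", "Washington"]))  # ['Madison', 'Washington']
# A list restricted to the area you process leaves a single candidate.
print(city_candidates(addr, ["Madison"]))  # ['Madison']
```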

Address

Addresses are returned by AddressParser.parse_address(). They have the following attributes:

house_number

The number on a house. This is required for all valid addresses. E.g. 123 W. Mifflin St.

street_prefix

The direction before the street name. Always represented as one or two letters followed by a period. Not required. E.g. 123 W. Mifflin St.
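A minimal sketch of the normalization this attribute describes: mapping a direction word to a one- or two-letter abbreviation followed by a period. The DIRECTIONS table and the normalize_prefix function are hypothetical; pyaddress's own lookup is not documented in this README.

```python
from typing import Optional

# Hypothetical lookup table: full direction word -> abbreviated prefix.
DIRECTIONS = {
    "north": "N.",
    "south": "S.",
    "east": "E.",
    "west": "W.",
    "northeast": "NE.",
    "northwest": "NW.",
    "southeast": "SE.",
    "southwest": "SW.",
}


def normalize_prefix(word: str) -> Optional[str]:
    """Return the abbreviated direction for ``word``, or None if it is not one."""
    key = word.strip().rstrip(".").lower()
    if key in DIRECTIONS:
        return DIRECTIONS[key]
    # Accept input that is already abbreviated, such as "w" or "NW.".
    abbreviated = key.upper() + "."
    if abbreviated in DIRECTIONS.values():
        return abbreviated
    return None


print(normalize_prefix("West"))     # W.
print(normalize_prefix("nw."))      # NW.
print(normalize_prefix("Mifflin"))  # None
```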

street

The name of the street. Potentially multiple words. This is required for a valid address. E.g. 123 W. Mifflin St.

street_suffix

The ending of a street. This will always be the USPS abbreviation followed by a period. Not required, but highly recommended. E.g. 123 W. Mifflin St.
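The following sketch shows what "USPS abbreviation followed by a period" means in practice. The table holds only a handful of suffixes from USPS Publication 28, written in the title case this README uses ("St.", "Rd."); the package's suffixes.csv holds the list pyaddress actually reads. USPS_SUFFIXES and normalize_suffix are hypothetical names.

```python
from typing import Optional

# A few USPS Publication 28 street-suffix abbreviations, in title case.
# Illustrative subset only; pyaddress reads its own list from suffixes.csv.
USPS_SUFFIXES = {
    "street": "St",
    "avenue": "Ave",
    "road": "Rd",
    "boulevard": "Blvd",
    "drive": "Dr",
    "lane": "Ln",
    "court": "Ct",
}


def normalize_suffix(word: str) -> Optional[str]:
    """Map a suffix such as "Street" or "st." to its abbreviation plus a period."""
    key = word.strip().rstrip(".").lower()
    abbrev = USPS_SUFFIXES.get(key)
    if abbrev is None:
        # Accept input that is already abbreviated, e.g. "ave" or "Rd.".
        for value in USPS_SUFFIXES.values():
            if value.lower() == key:
                abbrev = value
                break
    return abbrev + "." if abbrev else None


print(normalize_suffix("Street"))  # St.
print(normalize_suffix("ROAD"))    # Rd.
```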

apartment

The apartment number, unit, or any other designator identifying a specific part of an address. Not required. E.g. 123 W. Mifflin St. Apt 10

building

Sometimes addresses are grouped into buildings, or are more commonly known by building names. Not required, and often in parentheses. E.g. 123 W. Mifflin St. Apt 10 (The Estates)
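Because building names usually appear in parentheses, one way to separate them from the rest of the address is a single regular expression. This is a minimal sketch under that assumption; split_building is a hypothetical helper, not part of pyaddress, and it only handles the first parenthesized group.

```python
import re
from typing import Optional, Tuple

# The first "( ... )" group, plus any whitespace around it.
_BUILDING = re.compile(r"\s*\(([^()]*)\)\s*")


def split_building(address: str) -> Tuple[str, Optional[str]]:
    """Return (address_without_building, building_name_or_None)."""
    match = _BUILDING.search(address)
    if match is None:
        return " ".join(address.split()), None
    building = match.group(1).strip() or None
    # Drop the group, collapse repeated spaces, and tidy a space left before a comma.
    remainder = " ".join(_BUILDING.sub(" ", address, count=1).split())
    return remainder.replace(" ,", ","), building


print(split_building("123 W. Mifflin St. Apt 10 (The Estates)"))
# ('123 W. Mifflin St. Apt 10', 'The Estates')
```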

city

The city part of the address, preferably following a comma. E.g. 123 W. Mifflin St., Madison, WI 53703

state

The state of the address, preferably following the city and a comma. Always two capital letters. E.g. 123 W. Mifflin St., Madison, WI 53703
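A sketch of the "two capital letters" rule as a validation step. It assumes the 50 USPS state codes plus DC are the accepted set and that lowercase input should be uppercased; the README does not say whether pyaddress accepts territories, lowercase codes, or full state names, so treat these as assumptions. US_STATES and normalize_state are hypothetical names.

```python
from typing import Optional

# USPS two-letter codes for the 50 states plus DC (territories omitted).
US_STATES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)


def normalize_state(value: str) -> Optional[str]:
    """Return ``value`` as two capital letters if it is a known state code."""
    code = value.strip().upper()
    return code if code in US_STATES else None


print(normalize_state("wi"))         # WI
print(normalize_state("Wisconsin"))  # None
```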

zip

The 5-digit ZIP code of the address, preferably following the state. 9-digit ZIP+4 codes are not yet supported. E.g. 123 W. Mifflin St., Madison, WI 53703
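Since 9-digit codes are not supported, one workaround is to reduce a ZIP+4 to its first five digits before parsing. This is a hypothetical pre-processing sketch (zip5 is not a pyaddress function). It uses [0-9] rather than \d because, on Python 3 strings, \d also matches non-ASCII digits.

```python
import re
from typing import Optional

ZIP5 = re.compile(r"[0-9]{5}")
ZIP_PLUS_4 = re.compile(r"([0-9]{5})-[0-9]{4}")


def zip5(value: str) -> Optional[str]:
    """Return the 5-digit ZIP code in ``value``, or None if it is malformed.

    A ZIP+4 such as "53703-1234" is truncated to its first five digits.
    """
    value = value.strip()
    if ZIP5.fullmatch(value):
        return value
    plus4 = ZIP_PLUS_4.fullmatch(value)
    return plus4.group(1) if plus4 else None


print(zip5("53703"))       # 53703
print(zip5("53703-1234"))  # 53703
print(zip5("5370"))        # None
```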

full_address()

Returns a human-readable version of the address for display. Follows the same style rules as the above attributes. Example return: (The Estates) 123 W. Mifflin St. Apt 10, Madison, WI 53703
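To make the documented display format concrete, here is a sketch that assembles the same string from the attributes listed above. AddressParts is a hypothetical stand-in, not pyaddress's Address class; it assumes apartment already includes its designator (e.g. "Apt 10") and skips any part that is missing. It uses dataclasses, so it needs Python 3.7 or later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressParts:
    """Hypothetical stand-in mirroring the Address attributes described above."""
    house_number: int
    street: str
    street_prefix: Optional[str] = None
    street_suffix: Optional[str] = None
    apartment: Optional[str] = None
    building: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip: Optional[str] = None

    def full_address(self) -> str:
        # Street line: (building) number prefix street suffix apartment.
        street_line = " ".join(
            part
            for part in (
                f"({self.building})" if self.building else None,
                str(self.house_number),
                self.street_prefix,
                self.street,
                self.street_suffix,
                self.apartment,
            )
            if part
        )
        # Locality: "City, ST 12345", omitting whatever is missing.
        state_zip = " ".join(part for part in (self.state, self.zip) if part)
        pieces = [street_line] + [p for p in (self.city, state_zip) if p]
        return ", ".join(pieces)


addr = AddressParts(house_number=123, street_prefix="W.", street="Mifflin",
                    street_suffix="St.", apartment="Apt 10", building="The Estates",
                    city="Madison", state="WI", zip="53703")
print(addr.full_address())  # (The Estates) 123 W. Mifflin St. Apt 10, Madison, WI 53703
```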

Todo

  • Add verification of an address through Google Maps API, given an API key.
  • Allow custom validation conditions in AddressParser for what counts as a correct address or not.
  • Add exceptions for incorrect addresses instead of failing silently and leaving validation to the user.

GitHub

File support requests and obtain the source from https://github.com/SwoopSearch/pyaddress

Authors

  • Josh Gachnang
  • Rob Jauquet

License and Copyright

Copyright (c) 2013 Swoop Search LLC.

This library is released under the New BSD License.

pyaddress's People

Contributors

qbotts84


pyaddress's Issues

Memory Leak in parse_address

I just wanted to leave a note that we confirmed a fairly large memory leak associated with parse_address in our Heroku/Django application. Using heapy in a heroku bash shell, we deduced that the cities.csv file was continually being loaded and saved in memory. We didn't have the resources to find a workaround, so we moved to another package, but I wanted to file this ticket in case it flags the issue for anyone in the future who can save some time.
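The report does not identify the exact code path, so the following is only a generic mitigation sketch, not a confirmed fix for pyaddress: read a reference file such as cities.csv once and reuse the parsed result, here via functools.lru_cache. It assumes one city name per row in the first column; load_city_set is a hypothetical name. In application code the analogous step would be to build one AddressParser at startup and reuse it across requests, though whether that alone avoids the reported leak has not been verified.

```python
import csv
import os
import tempfile
from functools import lru_cache
from typing import FrozenSet


@lru_cache(maxsize=None)
def load_city_set(path: str) -> FrozenSet[str]:
    """Read a one-column CSV of city names, caching the result per path.

    Repeated calls with the same path return the same frozenset instead of
    re-reading the file and allocating a new copy each time.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return frozenset(row[0].strip().lower() for row in csv.reader(handle) if row)


# Demonstration with a throwaway file standing in for cities.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Madison\nChicago\n")
try:
    first = load_city_set(tmp.name)
    second = load_city_set(tmp.name)
    print(first is second)                    # True: the file was read once
    print(load_city_set.cache_info().misses)  # 1
finally:
    os.remove(tmp.name)
```

Returning a frozenset keeps the shared cached value immutable, so one caller cannot modify the set another caller is using.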

SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unmatched token: ", token)?

When I try to run a basic call to pyaddress, I get the error below:

Sample Code:

import usaddress
from address import AddressParser
address_line1 = '728 Nashville Ave'
addr = usaddress.parse(address_line1)
ad = AddressParser()
addr2 = ad.parse_address(address_line1)
#perform some cleanup and functions on addr...
addr2.street_suffix

Error:

Traceback (most recent call last):

  File C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3369 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Input In [25] in <cell line: 2>
    from address import AddressParser

  File ~\AppData\Roaming\Python\Python39\site-packages\address\__init__.py:1 in <module>
    from .address import Address, AddressParser

  File ~\AppData\Roaming\Python\Python39\site-packages\address\address.py:185
    print "Unmatched token: ", token
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unmatched token: ", token)?
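The failing line (line 185 of address.py) uses Python 2's print statement, which Python 3 refuses to compile, so the import fails before any parsing happens. The change the error message suggests looks like this, wrapped in a hypothetical function, report_unmatched, for illustration. Other Python 2-only constructs elsewhere in the module, if any, would need the same treatment.

```python
def report_unmatched(token):
    # Python 2 statement form, which Python 3 rejects with SyntaxError:
    #     print "Unmatched token: ", token
    # Function-call form, valid on Python 3. On Python 2 it behaves the same
    # only if the module starts with: from __future__ import print_function
    print("Unmatched token: ", token)


report_unmatched("#400")  # prints: Unmatched token:  #400
```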

Parsing fails with a very simple address

from address import AddressParser, Address
ap = AddressParser()
address = ap.parse_address('351 King St. #400, San Francisco, CA, 94158')
address
Address - House number: 351 Prefix: None Street: San Suffix: None Apartment: King City,State,Zip: Francisco, CA 94158
address.apartment
'King'
address.street
'San'
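One possible pre-processing workaround, assuming the problem is the "#" unit designator (which the output above suggests but does not prove): remove a "#unit" token before calling parse_address and keep the unit separately. split_hash_unit is a hypothetical helper, not part of pyaddress, and whether the parser then reads the remaining string correctly has not been checked here.

```python
import re
from typing import Optional, Tuple

# A "#" unit designator such as "#400" or "# 12B", plus the whitespace before it.
_HASH_UNIT = re.compile(r"\s*#\s*([A-Za-z0-9-]+)")


def split_hash_unit(address: str) -> Tuple[str, Optional[str]]:
    """Remove the first "#unit" token and return (cleaned_address, unit)."""
    match = _HASH_UNIT.search(address)
    if match is None:
        return address, None
    cleaned = address[: match.start()] + address[match.end():]
    return cleaned, match.group(1)


cleaned, unit = split_hash_unit("351 King St. #400, San Francisco, CA, 94158")
print(cleaned)  # 351 King St., San Francisco, CA, 94158
print(unit)     # 400
```

The cleaned string would then be passed to ap.parse_address(), with the unit stored alongside the result.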

Please Export AddressException

I am checking through a list of addresses. It would be nice if I could just write:

try:
    address = ap.parse_address(WorkAddress)
except AddressException:
    # handle error here
    pass

I expect my list to have errors; I just don't want to go through thousands of them manually.

Thank You
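Until an exception type is exported from the package root, the batch pattern this request describes can be written generically by passing in the parse function and the exception class to catch. Judging by the traceback in the next issue, the exception raised today is InvalidAddressException, raised from the address.address module; that import location is an inference from the traceback, not documented API. parse_all and the fake parser below are hypothetical stand-ins so the sketch runs without the library.

```python
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def parse_all(
    raw_addresses: List[str],
    parse: Callable[[str], T],
    error_type: Type[Exception],
) -> Tuple[List[T], List[Tuple[str, str]]]:
    """Parse each address, collecting failures instead of stopping at the first.

    Returns (parsed, failures), where each failure is (raw_address, message).
    Only ``error_type`` is caught, so unrelated bugs still surface.
    """
    parsed: List[T] = []
    failures: List[Tuple[str, str]] = []
    for raw in raw_addresses:
        try:
            parsed.append(parse(raw))
        except error_type as exc:
            failures.append((raw, str(exc)))
    return parsed, failures


class FakeInvalidAddress(Exception):
    """Stand-in for the library's exception in this example."""


def fake_parse(raw: str) -> str:
    # Stand-in for ap.parse_address: rejects addresses without a leading number.
    if not raw[:1].isdigit():
        raise FakeInvalidAddress("Addresses must have house numbers.")
    return raw.upper()


good, bad = parse_all(["123 W. Mifflin St.", "Mifflin St."], fake_parse, FakeInvalidAddress)
print(good)  # ['123 W. MIFFLIN ST.']
print(bad)   # [('Mifflin St.', 'Addresses must have house numbers.')]
```

With the library installed, the call would presumably be parse_all(addresses, ap.parse_address, InvalidAddressException).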

Unable to parse 5-digit-long house number in street address

In [17]: ap = AddressParser()

In [18]: ap.parse_address('5169 North Scottsdale Road')
Out[18]: Address - House number: 5169 Prefix: N. Street: Scottsdale Suffix: Rd. Apartment: None City,State,Zip: None, None None

In [19]: ap.parse_address('51691 North Scottsdale Road')

InvalidAddressException                   Traceback (most recent call last)
/home/xxx/<ipython-input-19-a178d7194cb2> in <module>()
----> 1 ap.parse_address('51691 North Scottsdale Road')

/usr/local/lib/python2.7/dist-packages/address/address.pyc in parse_address(self, address, line_number)
     86         loaded suffixes, cities, etc.
     87         """
---> 88         return Address(address, self, line_number, self.logger)
     89 
     90     def dstk_multi_address(self, address_list):

/usr/local/lib/python2.7/dist-packages/address/address.pyc in __init__(self, address, parser, line_number, logger, dstk_pre_parse)
    190 
    191         if self.house_number is None or self.house_number <= 0:
--> 192             raise InvalidAddressException("Addresses must have house numbers.")
    193         elif self.street is None or self.street == "":
    194             raise InvalidAddressException("Addresses must have streets.")

InvalidAddressException: Addresses must have house numbers.
