pyap's People

Contributors

kczar, termopro, tomfunk, vladimarius

pyap's Issues

House numbers with letters in them do not parse correctly

It seems that for an address like N8780 Something Blvd Cape Canaveral, FL 11111, a house number that contains a letter is only parsed from after that letter onward, so the street address becomes 8780 Something Blvd. On further testing, if a letter is placed in the middle of the house number, the part of the number before the last letter is dropped.
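
A minimal sketch of the behaviour described above, using only the standard re module. Both patterns are hypothetical stand-ins and not pyap's actual street-number rule: a digits-only pattern drops the leading letter, while a pattern that allows one optional letter on either side of the digits keeps it.

```python
import re

# Hypothetical patterns for illustration only; pyap's real rule may differ.
DIGITS_ONLY = re.compile(r"\d+")
WITH_LETTERS = re.compile(r"\b[A-Za-z]?\d+[A-Za-z]?\b")


def first_house_number(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


address = "N8780 Something Blvd Cape Canaveral, FL 11111"
digits_only_result = first_house_number(DIGITS_ONLY, address)    # letter is lost
with_letters_result = first_house_number(WITH_LETTERS, address)  # letter is kept
```

This only covers a single letter before or after the digits; a letter in the middle of the number would need a broader pattern.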

Can't install pypa in python 3.8.2

Hi,

Thanks for your great work! When I try to install, I get this error:

(venv) User@SOZ-MBP16 sample_project % pip install pypa
Collecting pypa
  Downloading pyPA-1.0rc.tar.gz (38 kB)
    ERROR: Command errored out with exit status 1:
     command: /Users/User/PycharmProjects/sample_project/venv/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/6k/6tmz3rc91759jdxs80_fjd180000gp/T/pip-install-4z09r1ik/pypa_857e7edb458246b0bbdc50f992f13718/setup.py'"'"'; __file__='"'"'/private/var/folders/6k/6tmz3rc91759jdxs80_fjd180000gp/T/pip-install-4z09r1ik/pypa_857e7edb458246b0bbdc50f992f13718/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/6k/6tmz3rc91759jdxs80_fjd180000gp/T/pip-pip-egg-info-tkpdibrw
         cwd: /private/var/folders/6k/6tmz3rc91759jdxs80_fjd180000gp/T/pip-install-4z09r1ik/pypa_857e7edb458246b0bbdc50f992f13718/
    Complete output (6 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/6k/6tmz3rc91759jdxs80_fjd180000gp/T/pip-install-4z09r1ik/pypa_857e7edb458246b0bbdc50f992f13718/setup.py", line 17
        except ImportError, e:
                          ^
    SyntaxError: invalid syntax
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/bc/5a/2964cadcb8bc8d875768a16f023abb328deb895fad65fb1406dd3abc6219/pyPA-1.0rc.tar.gz#sha256=8c5f32fed2f192bd2c07912f17e3f770ec3e09ebae2aef4091171dcdca875c72 (from https://pypi.org/simple/pypa/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Do you know what the problem is?
I am running Python 3.8.2 on macOS.

Parsing French Canadian Address with Occupancy

Hello Vlad! Congratulations on this amazing work; I really appreciate it. However, I noticed that the parser has problems with some French Canadian addresses.

Examples
This address doesn't parse at all:

  • 2545, rue De Lorimier, bureau 100, Longueuil, QC, J4K3P7

This one parses, but the result is wrong: occupancy is set to None and city is set to bureau 100 Longueuil:

  • 2545, rue De Lorimier, bureau 100 Longueuil, QC, J4K3P7
    {'full_address': '2545, rue De Lorimier, bureau 100 Longueuil, QC, J4K3P7, Canada', 'full_street': '2545, rue De Lorimier', 'street_number': '2545', 'street_type': 'rue', 'street_name': 'De Lorimier', 'route_id': None, 'post_direction': None, 'floor': None, 'building_id': None, 'occupancy': None, 'postal_box': None, 'city': 'bureau 100 Longueuil', 'region1': 'QC', 'postal_code': 'J4K3P7', 'country_id': 'CA'}

Here are some other examples of addresses that do not parse but seem valid to me, probably because they contain hyphens, parentheses or something else:

  • 110-395 Rue des Érables, Salaberry-de-Valleyfield, QC, J6T6T5, Canada
  • 1095 Rue de la Visitation, Saint-Charles-Borromée, QC, J6E0W7, Canada
  • 461 Rue Dufferin, Salaberry-de-Valleyfield, QC, J6S2B3, Canada
  • 200-1345 Boul Dagenais Ouest (Sainte-Rose), Laval, QC, H7L5Z9, Canada
  • 3149 Boul Dagenais Ouest (Fabreville), Laval, QC, H7P1T8, Canada
  • 655 Rue Boucher, Saint-Jean-sur-Richelieu, QC, J3B8P4, Canada
  • 101-2575 32e Avenue (LaSalle), Montréal, QC, H8T3G9, Canada
  • 3875 Boul Sainte-Rose (Laval-Ouest), Laval, QC, H7R1V2, Canada
  • 1840 32e Avenue (Lachine), Montréal, QC, H8T3M6, Canada
  • 123 Rue Huot, Notre-Dame-de-l'Île-Perrot, QC, J7V7M4, Canada
  • 1468 Boul Monseigneur-Langlois, Salaberry-de-Valleyfield, QC, J6S1C2, Canada
  • 93 Ave Conrad-Gosselin, Saint-Jean-sur-Richelieu, QC, J2X0A1, Canada
  • 795 Ave de Grande-Île, Salaberry-de-Valleyfield, QC, J6S3N9, Canada
  • 525 Rue Gadbois, Saint-Jean-sur-Richelieu, QC, J3A1V1, Canada
  • 400 Rue Croisetière, Saint-Jean-sur-Richelieu, QC, J2X0E5, Canada
  • 695 DU PONT, Terrebonne, QC, J6W1A2, Canada
  • 5150 Boul Dagenais Ouest (Laval-Ouest), Laval, QC, H7R1L8, Canada
  • 1250 Boul Dagenais Ouest (Fabreville), Laval, QC, H7L5E3, Canada
  • 1889 Boul Dagenais Ouest (Sainte-Rose), Laval, QC, H7L5A3, Canada
  • 149 MTEE DU MOULIN, Laval, QC, H7N3Y8, Canada
  • 3675 Boul Dagenais Ouest (Fabreville), Laval, QC, H7P5C9, Canada
  • 398 Boul Curé-Labelle (Chomedey), Laval, QC, H7V2S3, Canada
  • 3251 Boul Dagenais Ouest (Fabreville), Laval, QC, H7P1V3, Canada
  • 196 Rue St-Louis, Saint-Jean-sur-Richelieu, QC, J3B1Y1, Canada
  • 2525B Rue du Pont, Marieville, QC, J3M0C5, Canada
  • 1585 du Chevrotin, Richelieu, QC, J3L4Y3, Canada
  • 91 Ave Conrad-Gosselin, Saint-Jean-sur-Richelieu, QC, J2X0A1, Canada
  • 1000 Boul du Séminaire N, Saint-Jean-sur-Richelieu, QC, J3A1E5, Canada
  • A-1645A Aut Jean-Noël-Lavoie, Laval, QC, H7L3W3, Canada

Could you please take a look? My knowledge of regular expressions is very limited.

How to change the order of street number and street name in the regex rules

Dear vladimarius,

Thank you for publishing this program. I want to use it for European countries that have a different street address format,
for example:
AUSTRIA
"""Mr J Brownhall
264 High Street
ALLAMBIE NSW 2100
AUSTRALIA"""

It has this formatting. In the US code, you implemented the format like this:
full_street = r"""
    (?:
        (?P<full_street>
            {street_number}?,?\ ?
            {street_name}?,?\ ?
            (?:[\ \,]{street_type})\,?\ ?
            {post_direction}?\,?\ ?
            {floor}?\,?\ ?
            {building}?\,?\ ?
            {occupancy}?\,?\ ?
            {po_box}?
        )
    )""".format(street_name=street_name,
                street_number=street_number,
                street_type=street_type,
                post_direction=post_direction,
                floor=floor,
                building=building,
                occupancy=occupancy,
                po_box=po_box,
                )

I run into trouble when I swap the order of the street number and the street name:
full_street = r"""
    (?:
        (?P<full_street>
            {street_name}?,?\ ?
            {street_number}
            (?:[\ ,]{street_type}),?\ ?
            {post_direction}?,?\ ?
            {floor}?,?\ ?
            {building}?,?\ ?
            {occupancy}?,?\ ?
            {po_box}?
        )
    )""".format(street_name=street_name,
                street_number=street_number,
                street_type=street_type,
                post_direction=post_direction,
                floor=floor,
                building=building,
                occupancy=occupancy,
                po_box=po_box,
                )

and the error is:
raise source.error(msg, len(condname) + 1)
re.error: unknown group name 'street_number' at position 142 (line 8, column 28)

Could you possibly help me?
I would be grateful.
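
One possible explanation, sketched with the standard re module only: Python raises "unknown group name" when a conditional reference such as (?(street_number)...) appears before the named group it refers to has been defined. The sub-patterns below are hypothetical and are not copied from pyap; whether pyap's street_name rule really contains such a conditional is an assumption.

```python
import re

# Hypothetical sub-patterns; the conditional (?(street_number)...) is an
# assumption about what triggers the error, not a quote from pyap.
street_number = r"(?P<street_number>\d+)"
street_name = r"(?P<street_name>(?(street_number)[A-Za-z\ ]+|[A-Za-z]+))"


def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern, re.VERBOSE)
        return None
    except re.error as exc:
        return str(exc)


# Number first: the group exists before the conditional refers to it.
number_first = r"{street_number}\ {street_name}".format(
    street_number=street_number, street_name=street_name)

# Name first: the conditional refers to a group that is not defined yet.
name_first = r"{street_name}\ {street_number}".format(
    street_number=street_number, street_name=street_name)

number_first_error = try_compile(number_first)
name_first_error = try_compile(name_first)
```

If this is the cause, swapping the two placeholders is not enough: any sub-pattern that refers back to street_number would also have to be rewritten so that it no longer depends on that group being defined earlier.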

Certain Addresses not captured

pyap doesn't seem to capture any addresses containing PO Boxes, Unit (number), or Floor (number) / (number)th Floor.

St. (Saint) and Mount / Mountain in City name failing to properly parse

It seems that when you include Saint or Mount / Mountain in the city name, it does not properly understand what the city is. Take the two examples:

  • 111 Example Name Mount Pleasant SC 11111
  • 111 Example Name St. Augustine SC 11111

In the first example, the city will be "Pleasant" and "Mount" will be part of the street. In the second example, "Augustine" will be the city and "St." will be part of the street. If you convert "St." to "Saint", it works great.

French addresses

Hi Vladimarius,
Do you know if any work has been done on French addresses? I have to parse some French addresses. libpostal seems nice, but I have problems installing it and I have found no other reliable solution. Do you know of other parsers for international addresses?
Thanks for your work!
Romain

Support traditional (non-postal) province/state abbreviations

For example, 123 Main St Kingston ON will parse, but 123 Main St Kingston Ont. will not. This longer abbreviation is extremely common formatting, especially for PEI and NWT. Also, YK is probably a more common abbreviation than YT, yet YK does not parse.

Likewise French Canadian speakers would probably use TNL, IPE, etc. Quebec should also have the two letter acronyms PQ and QB.

Postal  English  French
AB      Alta.    Alb.
BC      B.C.     C.-B.
MB      Man.     Man.
NB      N.B.     N.-B.
NL      N.L.     T.-N.-L
NS      N.S.     N.-É.
ON      Ont.     Ont.
PE      P.E.I    Î.-P.-É
QC      Que.     Qc / P.Q.
SK      Sask.    Sask.
NT      N.W.T.   T.N.-O
NU      Nvt.     Nt.
YT      Yuk.     YK

State Name      USPS Abbreviation  Traditional Abbreviation
Alabama         AL                 Ala.
Alaska          AK                 Alaska
Arizona         AZ                 Ariz.
Arkansas        AR                 Ark.
California      CA                 Calif.
Colorado        CO                 Colo.
Connecticut     CT                 Conn.
Delaware        DE                 Del.
Florida         FL                 Fla.
Georgia         GA                 Ga.
Hawaii          HI                 Hawaii
Idaho           ID                 Idaho
Illinois        IL                 Ill.
Indiana         IN                 Ind.
Iowa            IA                 Iowa
Kansas          KS                 Kans.
Kentucky        KY                 Ky.
Louisiana       LA                 La.
Maine           ME                 Maine
Maryland        MD                 Md.
Massachusetts   MA                 Mass.
Michigan        MI                 Mich.
Minnesota       MN                 Minn.
Mississippi     MS                 Miss.
Missouri        MO                 Mo.
Montana         MT                 Mont.
Nebraska        NE                 Neb. or Nebr.
Nevada          NV                 Nev.
New Hampshire   NH                 N.H.
New Jersey      NJ                 N.J.
New Mexico      NM                 N.Mex.
New York        NY                 N.Y.
North Carolina  NC                 N.C.
North Dakota    ND                 N.Dak.
Ohio            OH                 Ohio
Oklahoma        OK                 Okla.
Oregon          OR                 Ore. or Oreg.
Pennsylvania    PA                 Pa.
Rhode Island    RI                 R.I.
South Carolina  SC                 S.C.
South Dakota    SD                 S.Dak.
Tennessee       TN                 Tenn.
Texas           TX                 Tex. or Texas
Utah            UT                 Utah
Vermont         VT                 Vt.
Virginia        VA                 Va.
Washington      WA                 Wash.
West Virginia   WV                 W.Va.
Wisconsin       WI                 Wis. or Wisc.
Wyoming         WY                 Wyo.
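
Until such abbreviations are supported, one possible workaround is to rewrite them to postal codes before handing the text to the parser. The sketch below uses only the standard re module; the mapping holds a handful of entries from the tables above, and the names TRADITIONAL_TO_POSTAL and normalize_regions are made up for this example.

```python
import re

# A few entries taken from the tables above; extend as needed.
TRADITIONAL_TO_POSTAL = {
    "Ont.": "ON",
    "Que.": "QC",
    "Alta.": "AB",
    "Sask.": "SK",
    "Calif.": "CA",
    "Fla.": "FL",
}

# Longest keys first so that a longer abbreviation wins over a shorter prefix.
_REGION_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(key) for key in
               sorted(TRADITIONAL_TO_POSTAL, key=len, reverse=True))
    + r")(?!\w)"
)


def normalize_regions(text):
    """Replace known traditional abbreviations with postal codes."""
    return _REGION_PATTERN.sub(
        lambda match: TRADITIONAL_TO_POSTAL[match.group(0)], text)


normalized = normalize_regions("123 Main St Kingston Ont.")
```

A blind text substitution like this can also rewrite an abbreviation that happens to appear in a street or city name, so it is a stopgap rather than a substitute for parser support.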

Is there a way to return the raw, unformatted addresses

When I use pyap.parse() on the address below, the full address is reformatted: the newline character \n is replaced by a comma and a space. I wonder if there is a way to also get the extracted but unformatted address. This might be useful if, say, a user would like to get the span of an address in the original text from which it was extracted. Thanks!

import pyap

address = """14234 Wilshire Blvd
Los Angeles, CA 90011"""

# full_address joins the two lines with ", " instead of the original newline
print(pyap.parse(address, country='US')[0].full_address)
# 14234 Wilshire Blvd, Los Angeles, CA 90011
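
As a possible workaround, the span can be recovered after the fact by searching the original text for the formatted address while tolerating differences in commas and whitespace. This is a minimal sketch using only the standard re module; find_original_span is a hypothetical helper, and the formatted string is hard-coded from the output quoted above instead of coming from a pyap call.

```python
import re


def find_original_span(formatted, original):
    """Locate a formatted address inside the original text.

    Any run of commas and whitespace in the formatted address may
    correspond to any run of commas and whitespace in the original.
    Returns a (start, end) tuple, or None if nothing matches.
    """
    tokens = [tok for tok in re.split(r"[,\s]+", formatted) if tok]
    pattern = r"[,\s]+".join(re.escape(tok) for tok in tokens)
    match = re.search(pattern, original)
    return match.span() if match else None


text = "Ship to:\n14234 Wilshire Blvd\nLos Angeles, CA 90011\nThanks!"
formatted = "14234 Wilshire Blvd, Los Angeles, CA 90011"

span = find_original_span(formatted, text)
raw_address = text[span[0]:span[1]]  # keeps the original newline
```

If the same address occurs more than once in the text, this returns only the first occurrence.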

US address parsing issue

Unable to parse this address. Canadian parsing is fine, but US parsing is very poor:

"1607 23rd Street NW, Washington, DC 20008"

Matching condominium units for US addresses

Hi, thanks for the fantastic package. I'm finding it really useful.

Occasionally, some addresses aren't identified in natural text, and I've traced this to their use of "Unit \d+" to denote the unit of a condo. E.g.:

5625 NW 109th Ave, Unit 65, Doral, FL, 33178
451 Ives Dairy Road, Unit 204-1, Miami, Florida 33179

I searched the repo but couldn't tell whether this variant has been mentioned before. If there's a reason this isn't implemented, sorry for missing that. If this is a possible improvement, I'd be happy to make the PR; just let me know.

Until then, I might just replace all "Unit" with "Room" or one of the other current variants. Thanks!
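
The stopgap mentioned above could look roughly like this, using only the standard re module. The helper name unit_to_room is made up for this example, and it only rewrites "Unit" when a number follows, so that words such as "United" are left alone.

```python
import re

# Match the word "Unit" only when it is followed by whitespace and a digit.
UNIT_BEFORE_NUMBER = re.compile(r"\bUnit(?=\s+\d)")


def unit_to_room(text):
    """Rewrite 'Unit <number>' as 'Room <number>' before parsing."""
    return UNIT_BEFORE_NUMBER.sub("Room", text)


rewritten = unit_to_room("5625 NW 109th Ave, Unit 65, Doral, FL, 33178")
```

The rewritten text would then be passed to the parser in place of the original; any address returned will of course say "Room" where the source said "Unit".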

Street type issue

The app catches words like 'Drug', thinking it's a driveway. Shouldn't the div be changed from [\.\ ,]? to [\.\ ,] in line 177 of data.py?
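
The difference between the two delimiters can be illustrated with the standard re module. The abbreviation "Dr" is an assumed example and the patterns are simplified stand-ins, not the contents of data.py. A third variant with a lookahead is included because a strictly required delimiter no longer matches a street type at the very end of the text.

```python
import re

# Simplified, hypothetical stand-ins for a street-type rule.
OPTIONAL_DELIM = re.compile(r"\bDr[\.\ ,]?")        # delimiter may be absent
REQUIRED_DELIM = re.compile(r"\bDr[\.\ ,]")         # delimiter must be present
LOOKAHEAD_DELIM = re.compile(r"\bDr(?=[\.\ ,]|$)")  # delimiter or end of text


def matches(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return pattern.search(text) is not None


drug_with_optional = matches(OPTIONAL_DELIM, "12 Drug Store Ln")  # false positive
drug_with_required = matches(REQUIRED_DELIM, "12 Drug Store Ln")  # rejected
end_with_required = matches(REQUIRED_DELIM, "12 Oak Dr")          # missed at end
end_with_lookahead = matches(LOOKAHEAD_DELIM, "12 Oak Dr")        # still found
```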

Parsing issue with some US addresses

Hi Vlad! Thanks for this amazing work. However, I noticed that the parser has problems with some of the US addresses.

Examples
The following addresses don't parse at all:

  • 20555 Devonshire Street #116 Chatsworth CA 91311

  • 260-C North El Camino Real Encinitas CA 92024-2852

  • 623 H St NW Floor 3 Washington DC 20001

Could you please take a look?

"and" causing issues

I'm using pyap to separate addresses from names. Thank you for the work done, as it is an amazing package.
It usually works great, but when two names precede the address, it seems to wrongly treat the "and ExampleName" as part of the street.

Input: "ExampleName and ExampleName 111 Rock Rd Pittsboro, NC 1111"

Interestingly, using the "&" symbol seems to not cause this issue and works as expected.

address.as_dict() values are no longer guaranteed to be strings

Sometime between 0.2.0 and 0.3.0, non-string data started getting stored in addresses. If I iterate through the values in address.as_dict(), street numbers are now type int instead of type string (which broke some stuff I was using it for).

Not sure if this is intentional or not. I wouldn't expect users to want numerical parts of addresses to be stored numerically, but it's not backwards compatible, so I thought I'd bring it up.

Cheers!
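
For code that relied on the older behaviour, one possible workaround is to coerce the values back to strings after calling as_dict(). The sketch below is plain Python; stringify_values is a hypothetical helper and the sample dictionary is hand-written for illustration, not actual pyap output.

```python
def stringify_values(address_dict):
    """Return a copy with every non-None value converted to str."""
    return {
        key: (value if value is None else str(value))
        for key, value in address_dict.items()
    }


# Hand-written sample standing in for the result of address.as_dict().
sample = {"street_number": 2545, "street_name": "De Lorimier", "occupancy": None}
as_strings = stringify_values(sample)
```

None is left untouched so that missing fields can still be told apart from fields whose text happens to be "None".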

Doesn't work well.

This doesn't work well. If I change your example to even:

225 E. John Carpenter Fwy, Suite 1500 Irving, Texas 75062

It isn't detected.

It doesn't identify short addresses

Thanks for the amazing code.
It's really helpful for me.
But it can't find short addresses like:

Roselle, NJ
Delta, BC
Waminster PA
