address-net's People

Contributors

jasonrig, niranjanaryan

address-net's Issues

Tensorflow version

Hello, this looks perfect for the project I'm working on right now, but I ran into an issue with what seems to be the TensorFlow version dependency. I understand the TF version needs to be higher than 1.12, and I'm using 2.2, but there seem to have been changes to the TF API.

E.g. in the model.py file, get_variable is now Variable and random_normal is now random.normal.

I tried to fix these and it worked, but now I'm getting an error in the TensorFlow file tensorflow\python\ops\variables.py, line 261: TypeError: _variable_v2_call() got an unexpected keyword argument 'initializer'.

Can you please let me know which specific TensorFlow version you used during development?
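
The names mentioned above (get_variable, random_normal) are TensorFlow 1.x names that are no longer available in the top-level tf namespace in 2.x, so one cautious option is to check the installed version before importing the package. A minimal sketch, assuming the code targets the 1.x API from 1.12 onward; the upper bound is an assumption based on those 1.x-style calls, not something the maintainers have stated, and is_supported_tf is a hypothetical helper:

```python
import re

# Assumed supported range: 1.12 <= version < 2.0. The upper bound is a guess
# based on the TF 1.x-style calls in model.py, not a documented requirement.
MIN_VERSION = (1, 12)
MAX_VERSION_EXCLUSIVE = (2, 0)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.15.0' or '2.2.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported_tf(version: str) -> bool:
    """Return True if the version falls inside the assumed supported range."""
    return MIN_VERSION <= parse_major_minor(version) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ("1.12.0", "1.15.0", "2.2.0"):
        print(candidate, is_supported_tf(candidate))
```

In practice the string to check would be tf.__version__ from the installed TensorFlow.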

Addressnet installation Issues

I have installed Python 3.6.8 and TensorFlow 1.15 and am still running into a few installation issues. When I try to run
from addressnet.predict import predict_one, I get SyntaxError: future feature annotations is not defined.

As far as I know, function annotations are a feature that was introduced in Python 3.0. This should ideally work given that I am on Python 3.

Can you please help me fix this installation issue? It would be better if anyone could paste the installation steps.
Any help on this would be really appreciated.
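
For context on the error above: function annotations themselves date from Python 3.0, but the statement from __future__ import annotations is a separate feature that the interpreter only accepts from Python 3.7 onward, which would explain the SyntaxError on 3.6.8. A small check using only the standard library (supports_future_annotations is a hypothetical helper name):

```python
import sys

# `from __future__ import annotations` (postponed evaluation of annotations)
# is only recognised by Python 3.7 and later.
FUTURE_ANNOTATIONS_MIN = (3, 7)


def supports_future_annotations(version_info=sys.version_info) -> bool:
    """Return True if this interpreter accepts `from __future__ import annotations`."""
    return tuple(version_info[:2]) >= FUTURE_ANNOTATIONS_MIN


if __name__ == "__main__":
    if supports_future_annotations():
        print("OK: this interpreter accepts 'from __future__ import annotations'")
    else:
        print("Too old: upgrade to Python 3.7+ before importing the package")
```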

It's messing up a few of the characters and putting them in other categories.

For addresses like
Example 1:
'rathbone mirrison bakery kenmore road'
it gives this:
{"building_name": "RATHBONE MIRRISONB", "street_name": " AKERY KENMORE", "street_type": "ROAD"}

Example2:
pontefract general infirmary southgate pontefract
{"street_name": "PONTEFRAT", "building_name": "CGENERALINFIRMARYSOUTHATE", "locality_name": "GPONTEFR", "state": "AUSTRALIAN CAPITAL TERRITORY", "street_type": "COURT"}

Can you tell me how it can be fixed? It seems like a problem in the script rather than the model.

Retrain model

Hello,
I have used your package and it is very useful. But my data is Vietnamese text in Unicode, and it is not working well. So can I use your code to retrain a new model for my own Vietnamese data? If yes, can you please help me? Thank you a lot.
For example, in "Số nhà 25, ngõ 294 Kim Mã, Phường Kim Mã, Quận Ba Đình, Thành phố Hà Nội", "street" is now "ngõ", "state" is now "Quận", ...
Sorry for my bad English.
Looking forward to hearing from you soon.

Finding the probability or confidence score of model output

Hello @jasonrig @Stallon-niranjan,
I was working on retraining addressnet and did it successfully. Now I want to find the confidence score / probability of the model output, i.e. how confident my model is (for example, 86% confident in the address result generated).
For this I tried using tf.nn.softmax, but it's throwing an error:
ValueError: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".

Is there any way you could help me find a confidence score or probability function, so that I can use addressnet over millions of addresses?

Any help would be appreciated.

Thanks & Regards
Aj.
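
One way to approach the confidence question, sketched in plain Python rather than with the package's own code: if per-character class logits can be obtained from the model (an assumption; how to get them out of the estimator is not shown here), a softmax over each character's logits gives class probabilities, and the probabilities of the chosen classes can be summarised into a single score. The function names and the choice of the mean as the summary are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one character's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def address_confidence(char_logits: Sequence[Sequence[float]]) -> float:
    """Mean, over characters, of the probability of the most likely class."""
    if not char_logits:
        raise ValueError("need logits for at least one character")
    best = [max(softmax(row)) for row in char_logits]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Hypothetical logits: 3 characters, 4 classes each
    example = [[4.0, 0.5, 0.1, 0.1], [0.2, 3.0, 0.1, 0.3], [1.0, 1.0, 1.0, 1.0]]
    print(f"confidence: {address_confidence(example):.2%}")
```

Taking the minimum instead of the mean would give a stricter score, since a single uncertain character would then lower the confidence of the whole address.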

Lot Number over 3 characters issue

Hi,

With the pretrained model if I have an address like this: Lot 442, 123 AAA RD, BBB, WA 6000 it will get parsed nicely like this:
"flat_number": "442",
"flat_type": "LOT",
"locality_name": "BBB",
"number_first": "123",
"postcode": "6000",
"state": "WESTERN AUSTRALIA",
"street_name": "AAA",
"street_type": "ROAD"
Nice !

However if the Lot number increases to 4 characters like this: Lot 4424, 123 AAA RD, BBB, WA 6000 then I get odd results like this:
"building_name": "O",
"flat_number": "4424",
"flat_type": "LOT",
"locality_name": "BBB",
"number_first": "123",
"postcode": "6000",
"state": "WESTERN AUSTRALIA",
"street_name": "AAA",
"street_type": "ROAD"

Is there a way to fix this?

P.S. Really great program, by the way!

Showing abbreviated street type

Hi,

When applying addressnet to an address like "677 Timpany BLVD", predict_one returns street_type as "BLVD" instead of "BOULEVARD".

Apart from this, I want to apply it to USA addresses. Could you please guide me on that?

Any help would be appreciated.
Thanks
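
Until the model itself handles this, one possible workaround for the abbreviated street type is a post-processing step on the parsed result. A minimal sketch, assuming the output is a dict with a street_type key as in the outputs quoted in these issues; the mapping table and the helper name expand_street_type are hypothetical and not part of addressnet:

```python
from typing import Dict

# Small illustrative mapping; a real table would need many more entries.
STREET_TYPE_EXPANSIONS: Dict[str, str] = {
    "BLVD": "BOULEVARD",
    "RD": "ROAD",
    "ST": "STREET",
    "AVE": "AVENUE",
}


def expand_street_type(parsed: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a parsed address with a known street_type abbreviation expanded."""
    result = dict(parsed)
    street_type = result.get("street_type")
    if street_type is not None:
        # Normalise case and drop a trailing full stop before the lookup
        key = street_type.strip().upper().rstrip(".")
        result["street_type"] = STREET_TYPE_EXPANSIONS.get(key, street_type)
    return result


if __name__ == "__main__":
    print(expand_street_type({"number_first": "677", "street_name": "TIMPANY", "street_type": "BLVD"}))
```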

Incorrect prediction if ",AU" in address string

python 3.5.2

import addressnet.predict as address_lib
print(address_lib.predict_one("Jubilee Street Newport,VIC,3015,AU")["locality_name"])
>NEWPORTU # Should be NEWPORT

print(address_lib.predict_one("Jubilee Street Newport,VIC,3015")["locality_name"])
>NEWPORT # Correct
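
As a stopgap for the behaviour above, the trailing country token can be stripped before the string reaches predict_one. This is a preprocessing workaround under stated assumptions, not a fix to the model: the list of country tokens and the helper name strip_country_suffix are made up for illustration.

```python
import re

# Trailing country tokens to drop; this list is an assumption for illustration.
_COUNTRY_SUFFIX = re.compile(r"[\s,]+(?:AU|AUS|AUSTRALIA)\s*$", re.IGNORECASE)


def strip_country_suffix(address: str) -> str:
    """Remove a trailing ',AU'-style country token, leaving the rest untouched."""
    return _COUNTRY_SUFFIX.sub("", address)


if __name__ == "__main__":
    print(strip_country_suffix("Jubilee Street Newport,VIC,3015,AU"))
```

The cleaned string would then be passed to predict_one in place of the original.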

Model input and output shape

I'm trying to re-implement this in Keras. What's the output shape for this model? Does it output the indices around text that falls in each category, or something completely different?

Misconversion in some situations

Hi Jason,

I used your default trainer to decompose about 2500 addresses that I am trying to match to GNAF (or more specifically VICMAP_ADDRESS). Thanks for posting it.
It worked pretty well, though slow. Up to 7 seconds each on my micro cloud shell.
Maybe this was due to the warning message.

WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7eff508251e0>) includes params argument, but params are not passed to Estimator.
row 4 decomposing Address:  146/2 NOONE STREET CLIFTON HILL
predicting for 146/2 NOONE STREET CLIFTON HILL

These addresses were scraped from an old council document so don't follow modern standards. There is no postcode.

https://www.yarracity.vic.gov.au/-/media/files/the-area/heritage/city-of-yarra-heritage-review-appendix-8.pdf?la=en&hash=5818FC0071A12F3C6C8FB489BA0582681264F0AD

Here is a subset of the results.

NormalAddress,number_last_suffix,state,postcode,number_first,street_type,number_last,locality_name,building_name,street_name,flat_number
142 NOONE STREET CLIFTON HILL,,,,142,STREET,,CLIFTON HILL,,NOONE,
144 NOONE STREET CLIFTON HILL,,,,144,STREET,,CLIFTON HILL,,NOONE,
146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,
146/1 NOONE STREET CLIFTON HILL,,,,,STREET,1,CLIFTON HILL,,NOONE,146
146/2 NOONE STREET CLIFTON HILL,,,,,STREET,2,CLIFTON HILL,,NOONE,146
146/7 NOONE STREET CLIFTON HILL,,,,,STREET,7,CLIFTON HILL,,NOONE,146
146/8 NOONE STREET CLIFTON HILL,,,,48,STREET,,CLIFTON HILL,,NOONE,16
146/0 NOONE STREET CLIFTON HILL,,,,40,STREET,,CLIFTON HILL,,NOONE,16
160 NOONE STREET CLIFTON HILL,,,,160,STREET,,CLIFTON HILL,,NOONE,
162 NOONE STREET CLIFTON HILL,,,,162,STREET,,CLIFTON HILL,,NOONE,

Notice that most addresses were decomposed correctly, but 146/8 and 146/0 were converted incorrectly. It is interesting that the RNN generated new numbers, 16 and 48, which are not in the input data. It's repeatable. Adding a postcode does not change the behaviour.

To be sure, this is not the standard way to write a unit address. Note that 8/146 converts fine.

8/146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,8
0/146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,0

Also, when the number has a suffix, it sometimes gets added to number_first:
176B NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,B,,,,
176C NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,C,,,,
176D NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,D,,,,
176E NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,E,,,,
176G NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,G,,,,
176G NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,G,,,,
176H NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,H,,,,
176I NOONE STREET CLIFTON HILL,,,,176I,STREET,,CLIFTON HILL,,NOONE,,,,,,
176J NOONE STREET CLIFTON HILL,,,,176J,STREET,,CLIFTON HILL,,NOONE,,,,,,

Also, there are some issues when the address is
88 THE ESPLANADE CLIFTON HILL,,,,88,,,CLIFTON HILL,,THE ESPLANADE,,,
It does not split THE ESPLANADE into a street_name and a street_type.

Retrain model script

Hello, First of all, thank you for the opportunity to use the code you wrote.

I'm trying to train a new model, but the result I get after that is very wrong.

{'street_name': '168A SEPARATION STREET NO', 'locality_name': 'COTE, VIC 3070'}

The code I use is the following. Can you share your code, or any information about where I might be mistaken?

Thank you so much.

import argparse
import datetime
import os

import tensorflow as tf

import addressnet.dataset as dataset
from addressnet.model import model_fn


def _get_estimator(model_fn, model_dir):
    # Fixed seed for reproducibility; checkpoint every 2000 steps and keep the last 5
    config = tf.estimator.RunConfig(tf_random_seed=17, keep_checkpoint_max=5, log_step_count_steps=2000,
                                    save_checkpoints_steps=2000)
    return tf.estimator.Estimator(model_fn=model_fn, model_dir=model_dir, config=config)


def train(tfrecord_input_file: str, model_output_file: str):
    # Checkpoints go to a sub-directory named after the input file
    input_file_only = os.path.basename(tfrecord_input_file)
    model_output_file_path = f'{model_output_file}/{input_file_only}'

    address_net_estimator = _get_estimator(model_fn, model_output_file_path)

    # The result of dataset.dataset(...) is passed to the estimator as its input_fn
    tfdataset = dataset.dataset(tfrecord_input_file)

    start = datetime.datetime.now()
    model = address_net_estimator.train(tfdataset)
    end = datetime.datetime.now()

    # Note: this evaluates on the training data itself, not on a held-out set
    print('Evaluate model...')
    evaluation = model.evaluate(tfdataset)
    print(f'evaluation={evaluation}')

    print(f'Finished training in {(end - start).total_seconds():.0f} sec on file {input_file_only}. '
          f'Model saved to {model_output_file_path}')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--tfrecord_input_file", help="Tfrecord input file from generate_tf_records.py")
    parser.add_argument("--model_output_file", help="Model output file")
    args = parser.parse_args()

    train(args.tfrecord_input_file, args.model_output_file)

Re-train model

Currently, I want to retrain a new model, but it's hard for me.
As you said, "you are free to train this model using the model_fn provided" https://github.com/jasonrig/address-net#pretrained-model
So I have a question:
Is the model_fn function in model.py meant for training a new model? If not, how do I train a new model? Could you please explain it to me?

Another common abbreviation for Level

Hi,

An (unfortunately) common abbreviation for level that I have come across is a simple L. For example: UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000. I even tried adding L to lookups.py and deleting the cache, but to no avail. The kind of result I get is:
"flat_number": "9009",
"flat_number_prefix": "L",
"flat_type": "UNIT",
"locality_name": "HOOHAAVILLE",
"number_first": "50",
"original": "UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000",
"postcode": "3000",
"state": "VICTORIA",
"street_name": "THINGO",
"street_type": "STREET"

or L9 with no space drags the 9 into the 50.

Is there a way to get L in and recognised?
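
One possible workaround, independent of lookups.py, is to expand the abbreviation in the input string before parsing. A minimal sketch, assuming a standalone L followed by digits always means a level; whether the pretrained model then labels the expanded text correctly has not been verified here, and expand_level_abbreviation is a hypothetical helper, not part of addressnet:

```python
import re

# A standalone "L" followed by digits, e.g. "L 9", "L9" or "l 12".
# The leading \b stops words that merely end in L (such as "HILL 9") from matching.
_LEVEL_ABBREVIATION = re.compile(r"\bL\s*(\d+)\b", re.IGNORECASE)


def expand_level_abbreviation(address: str) -> str:
    """Rewrite 'L 9' or 'L9' as 'LEVEL 9', leaving the rest of the address unchanged."""
    return _LEVEL_ABBREVIATION.sub(r"LEVEL \1", address)


if __name__ == "__main__":
    print(expand_level_abbreviation("UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000"))
```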

Addressnet installation Issues

Hi,
I've downloaded "addressnet" from this repo, but unfortunately it is not working with the latest version of TensorFlow. I'm using Python 3.11.4 and TF 2.14. Do you have any plans to release a new version of "addressnet" that supports the latest versions of Python and TF?

Thx in advance.

postal address

Hi Jason,
Great stuff and thanks for posting this up.

In the real world, a lot of people use a postal address, like this:
GPO Box 500606 Canberra Act 2004
When it goes through your parser, it returns :

{'street_name': 'GPO', 'street_type': 'BROW', 'postcode': '5006062004', 'locality_name': 'CANBERRA', 'state': 'AUSTRALIAN CAPITAL TERRITORY'}

Not sure why it generated a street type BROW. Is there training data that could address this?
Or would that require some re-coding?

Cheers,
Nick
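
Short of retraining, one option is to detect postal-box addresses before they reach the parser and handle the box part separately. A rough sketch under stated assumptions: the pattern only covers the GPO, PO and P.O. prefixes, and split_po_box with its returned field names is hypothetical rather than anything addressnet provides.

```python
import re
from typing import Dict, Optional

# Leading "GPO Box 123", "PO Box 123" or "P.O. Box 123", followed by the remainder
_PO_BOX = re.compile(
    r"^\s*(?P<box_type>GPO|PO|P\.O\.)\s*BOX\s+(?P<box_number>\d+)\b[\s,]*(?P<rest>.*)$",
    re.IGNORECASE,
)


def split_po_box(address: str) -> Optional[Dict[str, str]]:
    """Split a postal-box address into box type, box number and the remaining text.

    Returns None when the address does not start with a recognised box prefix.
    """
    match = _PO_BOX.match(address)
    if match is None:
        return None
    return {
        "box_type": match.group("box_type").upper() + " BOX",
        "box_number": match.group("box_number"),
        "rest": match.group("rest").strip(),
    }


if __name__ == "__main__":
    print(split_po_box("GPO Box 500606 Canberra Act 2004"))
```

The remaining text (locality, state, postcode) could still be passed through the parser, while addresses that return None go through unchanged.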

Tensorflow version problem

code : from addressnet.predict import predict_one

Output : AttributeError: module 'tensorflow' has no attribute 'FixedLenFeature'
