
Comments (14)

daviddrysdale commented on August 24, 2024

As you know, occupancy wasn't really something I was concerned about when I did this port, so I'm not surprised you've got problems in a multi-process, limited-memory environment.

The core of the problem will be the loading of the metadata -- all of the patterns for all of the world's phone numbers, including geolocation data -- which a) all gets pulled in at library load time (i.e. when you import phonenumbers), and b) is in the form of generated Python code, rather than some sort of more efficient serialized form. (Both of these are different than the core Java code.)

My guess is that attacking a) would give the biggest win, in particular:

  • only load geolocation data if the geolocation functionality is needed
  • load each country's metadata only on demand (so if you only see phone numbers from 5-10 countries, you don't bother pulling in the other ~250 sets of metadata).

I'll have a think and see what I can do.

In the meantime, a quick fix you could try would be to replace geodata/__init__.py with a stub (GEOCODE_DATA={}), assuming you don't use the geocoding stuff (I think the geolocation data is ~75% of the space).

from python-phonenumbers.

potiuk commented on August 24, 2024

Thanks for the quick reaction :) Yeah, that could indeed help; it would be nice to incorporate a solution like that in the next version (e.g. some flag specifying which data should be loaded). I do not know which countries the numbers will come from (this project has roots in the US, Poland, the Middle East and Egypt), so I cannot really limit the regions. In the meantime I have found a workaround which I think will serve my needs pretty well. It is quite a bit simpler, since I really only want to guess whether I should add +CC in front or remove a leading zero, and strip all non-numeric characters (I do not even have to check whether the number is valid). So I came up with a simple hack (though possibly one that will stay for a long time): I extracted the region/dial code mapping from your code (with proper attribution ;)) and implemented a really simple function, which probably covers 99% of cases and is good enough for me:

# ISOCC_TO_DIALING_CODE (defined elsewhere) maps a region code such as "PL" to its dial code, 48.
def normalize_phone_number(isocc, phone_number):
    '''
    :param isocc: country code in ISO 3166-1 alpha-2 format (e.g. "PL")
    :param phone_number: number to normalize
    :return:
    normalized number in E.164 format (+48123123123). Note that apart from standard normalisation it
    also removes a leading 0
    '''
    if isocc is None:
        # TODO: make reverse geo coding (?)
        isocc = "US"
    isocc = isocc.upper()
    dialing_code = ISOCC_TO_DIALING_CODE[isocc]
    # Remove everything except digits and "+"
    phone_number = ''.join(char for char in phone_number if char in "0123456789+")
    if phone_number.startswith("00"):
        # Replace 00 with the "+" international prefix.
        phone_number = "+%s" % (phone_number[2:])
    if isocc not in ["IT", "SM", "VA"] and dialing_code and phone_number.startswith("0"):
        # Italy, San Marino and Vatican City are special cases: a leading 0 is part of a valid number.
        # Elsewhere a leading 0 is treated as a local (trunk) prefix and replaced by the dial code :(
        phone_number = "+%s%s" % (dialing_code, phone_number[1:])
    if not phone_number.startswith("+"):
        # Prepend the dial code unless the number already starts with it.
        if dialing_code and not phone_number.startswith(str(dialing_code)):
            phone_number = "+%s%s" % (dialing_code, phone_number)
        else:
            phone_number = "+%s" % (phone_number,)
    return phone_number


daviddrysdale commented on August 24, 2024

Your hack is potentially going to be rather brittle in the face of future changes to dialling plans, and is already likely to be broken for some countries (e.g. 00 is not a universal international dialling prefix) -- unfortunately, there's a reason why the base library is as complicated as it is!
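To illustrate the point about 00 not being universal, here is a small sketch with a hand-picked excerpt of international dialling prefixes. The table and the helper name `strip_international_prefix` are assumptions made for this example, not part of the library, and the values are a snapshot that could change with future dialling plans.

```python
# Illustrative excerpt only: international dialling prefixes for a few regions.
INTERNATIONAL_PREFIX = {
    "US": "011",
    "AU": "0011",
    "JP": "010",
    "PL": "00",
    "EG": "00",
}


def strip_international_prefix(region, digits):
    """Return (was_international, remaining_digits) for a digits-only string."""
    prefix = INTERNATIONAL_PREFIX[region]
    if digits.startswith(prefix):
        return True, digits[len(prefix):]
    return False, digits


print(strip_international_prefix("PL", "0044123456789"))   # (True, '44123456789')
print(strip_international_prefix("US", "01144123456789"))  # (True, '44123456789')
print(strip_international_prefix("US", "0044123456789"))   # (False, '0044123456789')
```

A number dialled from the US with a leading 00 is not recognised as international here, which is exactly the case a hard-coded "00" check gets wrong.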

In the meantime, I've pushed a potential improvement to the lowmem branch, which only loads the geocoder data on first use of a function from the geocoder sub-module -- would you be able to try it out and see if it helps with your memory problem?


potiuk commented on August 24, 2024

Sure. I will let you know when I try it. I do realize my solution is an over-simplistic hack and there are many cases where it won't work (and, as you say, future changes might break it even more).

Thanks for looking into it, I really appreciate it.


potiuk commented on August 24, 2024

I added the lowmem version of the library to my "experimental" branch and deployed it on the dev system; everything seems to work fine now. Memory use is up by around 4MB, and I can continue with the same number of workers as before. Thanks a lot :). We will deploy it in production tomorrow. If you could merge the lowmem change into the main branch, that would be even cooler. I am watching your repo now and waiting for the release, so I hope to be able to switch to the official library from pip whenever you release a new version (which, from the commits, I gather is pretty soon).

Thanks a lot for your help!


AlexisH commented on August 24, 2024

I had just noticed the same issue and here you are reporting it and fixing it... thank you both!
It went from 20MB to 2MB memory used by the lib during my tests.

btw the time to load the lib has also improved, from 1300ms to 500ms. It would be nice to have the lib progressively load the "data" module as well, but I guess that would require more work...
(A quick look at the code makes me think that you could pre-generate the list of supported countries, and then load each "region_XX" module only when needed instead of loading all of them in __init__)
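For anyone wanting to take comparable measurements, here is one possible sketch using only the standard library. It is not necessarily how the figures above were obtained; `measure_import` is a hypothetical helper, and `colorsys` is just a small stand-in for the module under test. Note that `tracemalloc` only counts allocations made through Python's allocator, so it will not match a process-level figure such as resident memory.

```python
import importlib
import sys
import time
import tracemalloc


def measure_import(module_name):
    """Return (seconds, bytes_allocated) for a fresh import of module_name."""
    # Drop cached copies so the import really runs again.
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    started = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - started
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, after - before


# "colorsys" is a stand-in; swap in the module you want to measure.
seconds, allocated = measure_import("colorsys")
print(f"import took {seconds * 1000:.1f} ms, allocated about {allocated / 1024:.0f} KiB")
```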


daviddrysdale commented on August 24, 2024

OK, I've pulled the on-demand loading of the geodata into the main dev branch (so it will be included in the next point release -- which is triggered when the upstream Java project does a release).

I've also put some code into the lowmem branch to do on-demand loading of ordinary metadata -- let me know how you get on with it.


potiuk commented on August 24, 2024

Hello David,

Thanks. I will try it as soon as we get back to working on the app that uses it.
We have put the app on hold for a short while and are not doing any work on it
(don't ask ;) ) but I hope it will be resurrected very soon, and I will try the
latest version right after...

I tested it locally and it seems Good To Go. All our tests are passing,
memory usage is ok when starting the app locally.

One comment: I saw it's now loading data per region, which is good on one
hand (for local apps) but might be misleading (re: memory use) for apps
which are 'global': your tests might not reveal how much memory you will
actually need. So I suggest a "memory use" chapter in the documentation (I might
contribute if you don't like docs :) ). I saw that there is already a
"load_all" method, which is cool. On a related note, we might also need
"load_all_geocode_data" or something like that, for those who would like to
load all the data upfront.

Regards,

J.


daviddrysdale commented on August 24, 2024

Good point, I've added a note in the readme in the lowmem branch (which includes the observation that import phonenumbers.geocoder will force the load of all of the geodata, btw).


potiuk commented on August 24, 2024

+1

Very clear and straightforward.

J.


On Sun, Feb 24, 2013 at 4:57 PM, David Drysdale wrote:

> Good point, I've added a note in the readme in the lowmem branch (which
> includes the observation that import phonenumbers.geocoder will force the
> load of all of the geodata, btw).



AlexisH commented on August 24, 2024

Thanks David!
It works well so far.

Just a comment: it looks like a waste of memory to create all these "loader" functions. Why not use a single function (it would need an argument) and make the import statement dynamic?

Something like:

def _load_region(code):
    code = str(code)
    __import__('region_'+code, globals=globals(), fromlist=['PHONE_METADATA_'+code])

PhoneMetadata.register_nongeo_region_loader(800, _load_region)
PhoneMetadata.register_nongeo_region_loader(808, _load_region)
...


daviddrysdale commented on August 24, 2024

Good idea; I've pushed something along those lines.


daviddrysdale commented on August 24, 2024

lowmem branch pulled into dev


potiuk commented on August 24, 2024

Cool.

On Tue, Mar 5, 2013 at 12:31 PM, David Drysdale wrote:

> lowmem branch pulled into dev

