I think we need to pay more attention to the way we load ACT modules. Currently it tak

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Speed up import time to load ACT about act HOT 24 CLOSED

arm-doe commented on May 30, 2024

Speed up import time to load ACT

from act.

Comments (24)

AdamTheisen commented on May 30, 2024

@rcjackson Is moving the imports into the classes a good way to get around the import time?

from act.

AdamTheisen commented on May 30, 2024

It sounds reasonable to me

from act.

rcjackson commented on May 30, 2024

Once you import the modules once they don't load again until you restart. So it won't really matter if we put them in the classes or not, I suspect we will see the same performance either way.

…

________________________________ From: Adam Theisen <[email protected]> Sent: Thursday, May 23, 2019 2:49:59 PM To: ANL-DIGR/ACT Cc: Jackson, Robert; Mention Subject: Re: [ANL-DIGR/ACT] Speed up import time to load ACT (#62) @rcjackson<https://github.com/rcjackson> Is moving the imports into the classes a good way to get around the import time? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#62>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AFIQA5CSCHJJRGWD3ACRFPLPW3YOPANCNFSM4HNX35PA>.

from act.

kenkehoe commented on May 30, 2024

OK. I am having a bit of issues with using ACT on my local machine. I can't get cartopy installed locally. Since the module is not used with time series plotting I can't do that. Should we work on only requiring the modules that get used so the repo is useable by everyone?

from act.

AdamTheisen commented on May 30, 2024

@kenkehoe If you can move the imports around and make it work for yourself without causing functionality issues, please go ahead.

from act.

AdamTheisen commented on May 30, 2024

I'm doing some testing and will just post some results in here.

First import try was 7.9 seconds, after that the average was about 1.4 seconds

The single largest time sync seemed to be importing mpcalc from metpy which adds about 0.4 seconds to the import

from act.

zssherman commented on May 30, 2024

I'll run through the code and see if I can find some changes, but some of the imports you only need to import certain functions, so instead of:
import xarray as xr,

if your only using open_mfdataset:
from xarray import open_mfdataset

Importing just needed functions might possibly speed up the code, and also in the init's there is full module imports, like in the io init.py there is:
from . import armfiles
this can be changed to:
from .armfiles import read_netcdf etc.

from act.

zssherman commented on May 30, 2024

Actually nvm on the first half, after more research and local test it seems that import specific functions won't really effect speed...

from act.

zssherman commented on May 30, 2024

import time: self [us] | cumulative | imported package
import time: 1975 | 119753 | act.plotting.SkewTDisplay
import time: 2569 | 2569 | cartopy.io.img_tiles
import time: 2082 | 4650 | act.plotting.XSectionDisplay
import time: 2155 | 2155 | act.plotting.GeoDisplay
import time: 2125 | 2125 | act.plotting.HistogramDisplay
import time: 2573 | 1217256 | act.plotting
import time: 2528 | 2528 | act.corrections.common
import time: 1918 | 1918 | act.corrections.ceil
import time: 1975 | 1975 | act.corrections.mpl
import time: 3180 | 9599 | act.corrections
import time: 2528 | 2528 | act.tests.sample_files
import time: 2928 | 5455 | act.tests
import time: 2473 | 2473 | act.discovery.get_armfiles
import time: 2965 | 5437 | act.discovery
import time: 237 | 237 | metpy
import time: 57 | 293 | metpy.calc
import time: 2350 | 2642 | act.retrievals.stability_indices
import time: 2986 | 5627 | act.retrievals
import time: 2182 | 2182 | act._version
import time: 242112 | 3633517 | act

Might help narrow it down.

from act.

AdamTheisen commented on May 30, 2024

@kenkehoe it seems to me that this is importing in an ok amount of time (2-4 seconds). Do you still see issues at all?

from act.

kenkehoe commented on May 30, 2024

It's OK. I'll close it.

from act.

AdamTheisen commented on May 30, 2024

@zssherman we've been asked to look into this issue again as the import times may be causing issues with the DQ processing.

from act.

zssherman commented on May 30, 2024

@AdamTheisen Gotcha, what were we using previous to check import times for act?

from act.

AdamTheisen commented on May 30, 2024

@zssherman I think you ran some tests in the past but I was using the newer python -X importtime script.py. @kenkehoe was going to help and run this on their systems/environments.

from act.

zssherman commented on May 30, 2024

@AdamTheisen Gotcha, yeah if he runs it on their system, I can get a better idea of where the hold up is in the code.

from act.

kenkehoe commented on May 30, 2024

System | Self Time [us] | Cumulative Time [us]

procnode1-dev (i.e. mercury) | 700,725 | 5,145,136
emerald | 1,470,465 | 11,659,648
emerald (using local disk only) | 687,683 | 2,826,976
dq1 (i.e. gold) | 1,186 | 2,335,415
my Mac laptop | 129,503 | 4,654,856

from act.

AdamTheisen commented on May 30, 2024

Thanks @kenkehoe ! That's an interesting range of times. Do you have the full output at all? That will help narrow down the areas taking the longest

from act.

kenkehoe commented on May 30, 2024

Ummm, yeah I can get the full output too. It's very long. I was trying to think of ways to summarize it. I'll look into it today.

from act.

mgrover1 commented on May 30, 2024

Here is profile how long each of the imports takes

Currently, we import everything at the top level:

import act

The only way to speed things up would be to change the api a bit, essentially requiring the user to import what they need. For example, if someone is doing io/corrections, they would do

import act.corrections
import act.io

from act.io import read_netcdf
from act.corrections import correct_ceil

For MetPy, at the top level, they import the basics (essentially just the xarray accessor), and require the user to import the rest when they are working with different parts of the package. For example:

from metpy.cbook import get_test_data
from metpy.plots import MapPanel, PanelContainer

Perhaps we could discuss this more in detail at the next ACT meeting. This would be a pretty big change, but it is the only way to get around the import time issue. There aren't any classes, moving imports, etc. we can change to magically fix this - there is a tradeoff between importing everything in one line, and importing things as the user needs them.

from act.

AdamTheisen commented on May 30, 2024

This definitely needs more discussion but it might be worth it in the long run if we expect ACT to continue to grow in size. Or we work on excluding the test files and data first and then go from there. Definitely open, but we would want to run this by other users. I would be curious what @maxwellevin would have to say with the other projects that use ACT and if the import time is an issue.

from act.

kenkehoe commented on May 30, 2024

I agree this is worth discussing and testing. Moving the testing files out of the installed repo and improving the speed would be worth the effort to move ACT inline with other highly used and regarded repos. Now is our chance to make these changes that could affect usage. Making the change in the future will just be more difficult. Would it be possible to go the numpy route with both options working but issuing warnings for the full import at the start and then transitioning to not importing everything at the start in a few versions to give people time to make changes now?

from act.

maxwelllevin commented on May 30, 2024

@AdamTheisen for tsdat we haven't thought much about optimizing module loading, but we definitely will at some point. I don't know much about how to improve it outside of what has already been suggested here, so I am definitely interested in learning more about this. For kicks I ran the same thing as @mgrover1 (I think?) to generate tsdat's import profile and it looks pretty evenly split between xarray and act. I'm certain there are things we could be doing from the tsdat side of things to mitigate this, I just don't know how yet.

code

pip install tsdat tuna
python -X importtime -c "import tsdat" 2 > import.log
tuna import.log

I like @mgrover1's suggestion of changing how modules are loaded to improve performance in most cases. The downside, as already mentioned, is that the API does change quite a bit since you now have to manually import the submodules you need. That can be a bit annoying if your workflow involves using all the act submodules, but I think most people only use one or two immediately.

For tsdat we are currently only actually using the act.utils module directly, but we spend a good chunk of time importing the act.discovery and act.plotting modules. I've seen the split submodule approach in other libraries too – rich and scipy come to mind for me, but I'm sure there's more.

from act.

mgrover1 commented on May 30, 2024

@AdamTheisen @kenkehoe At SciPy last week, the Scientific Python group mentioned they have a list of SPECs for the community - including one on lazy loading of submodules (which is what we are aiming for here), speeding up import times.

Here is the SPEC - https://scientific-python.org/specs/spec-0001/

It is worth looking into this strategy + method to see if we can use it here.

from act.

AdamTheisen commented on May 30, 2024

@mgrover1 this looks like a good option. Would take some initial effort up front to implement but could be worth it

from act.

Speed up import time to load ACT about act HOT 24 CLOSED

Comments (24)

System | Self Time [us] | Cumulative Time [us]

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent