Comments (24)
@rcjackson Is moving the imports into the classes a good way to get around the import time?
from act.
It sounds reasonable to me
from act.
from act.
OK. I am having a bit of issues with using ACT on my local machine. I can't get cartopy installed locally. Since the module is not used with time series plotting I can't do that. Should we work on only requiring the modules that get used so the repo is useable by everyone?
from act.
@kenkehoe If you can move the imports around and make it work for yourself without causing functionality issues, please go ahead.
from act.
I'm doing some testing and will just post some results in here.
First import try was 7.9 seconds, after that the average was about 1.4 seconds
The single largest time sync seemed to be importing mpcalc from metpy which adds about 0.4 seconds to the import
from act.
I'll run through the code and see if I can find some changes, but some of the imports you only need to import certain functions, so instead of:
import xarray as xr,
if your only using open_mfdataset:
from xarray import open_mfdataset
Importing just needed functions might possibly speed up the code, and also in the init's there is full module imports, like in the io init.py there is:
from . import armfiles
this can be changed to:
from .armfiles import read_netcdf etc.
from act.
Actually nvm on the first half, after more research and local test it seems that import specific functions won't really effect speed...
from act.
import time: self [us] | cumulative | imported package
import time: 1975 | 119753 | act.plotting.SkewTDisplay
import time: 2569 | 2569 | cartopy.io.img_tiles
import time: 2082 | 4650 | act.plotting.XSectionDisplay
import time: 2155 | 2155 | act.plotting.GeoDisplay
import time: 2125 | 2125 | act.plotting.HistogramDisplay
import time: 2573 | 1217256 | act.plotting
import time: 2528 | 2528 | act.corrections.common
import time: 1918 | 1918 | act.corrections.ceil
import time: 1975 | 1975 | act.corrections.mpl
import time: 3180 | 9599 | act.corrections
import time: 2528 | 2528 | act.tests.sample_files
import time: 2928 | 5455 | act.tests
import time: 2473 | 2473 | act.discovery.get_armfiles
import time: 2965 | 5437 | act.discovery
import time: 237 | 237 | metpy
import time: 57 | 293 | metpy.calc
import time: 2350 | 2642 | act.retrievals.stability_indices
import time: 2986 | 5627 | act.retrievals
import time: 2182 | 2182 | act._version
import time: 242112 | 3633517 | act
Might help narrow it down.
from act.
@kenkehoe it seems to me that this is importing in an ok amount of time (2-4 seconds). Do you still see issues at all?
from act.
It's OK. I'll close it.
from act.
@zssherman we've been asked to look into this issue again as the import times may be causing issues with the DQ processing.
from act.
@AdamTheisen Gotcha, what were we using previous to check import times for act?
from act.
@zssherman I think you ran some tests in the past but I was using the newer python -X importtime script.py. @kenkehoe was going to help and run this on their systems/environments.
from act.
@AdamTheisen Gotcha, yeah if he runs it on their system, I can get a better idea of where the hold up is in the code.
from act.
System | Self Time [us] | Cumulative Time [us]
procnode1-dev (i.e. mercury) | 700,725 | 5,145,136
emerald | 1,470,465 | 11,659,648
emerald (using local disk only) | 687,683 | 2,826,976
dq1 (i.e. gold) | 1,186 | 2,335,415
my Mac laptop | 129,503 | 4,654,856
from act.
Thanks @kenkehoe ! That's an interesting range of times. Do you have the full output at all? That will help narrow down the areas taking the longest
from act.
Ummm, yeah I can get the full output too. It's very long. I was trying to think of ways to summarize it. I'll look into it today.
from act.
Here is profile how long each of the imports takes
Currently, we import everything at the top level:
import act
The only way to speed things up would be to change the api a bit, essentially requiring the user to import what they need. For example, if someone is doing io/corrections, they would do
import act.corrections
import act.io
or
from act.io import read_netcdf
from act.corrections import correct_ceil
For MetPy, at the top level, they import the basics (essentially just the xarray accessor), and require the user to import the rest when they are working with different parts of the package. For example:
from metpy.cbook import get_test_data
from metpy.plots import MapPanel, PanelContainer
Perhaps we could discuss this more in detail at the next ACT meeting. This would be a pretty big change, but it is the only way to get around the import time issue. There aren't any classes, moving imports, etc. we can change to magically fix this - there is a tradeoff between importing everything in one line, and importing things as the user needs them.
from act.
This definitely needs more discussion but it might be worth it in the long run if we expect ACT to continue to grow in size. Or we work on excluding the test files and data first and then go from there. Definitely open, but we would want to run this by other users. I would be curious what @maxwellevin would have to say with the other projects that use ACT and if the import time is an issue.
from act.
I agree this is worth discussing and testing. Moving the testing files out of the installed repo and improving the speed would be worth the effort to move ACT inline with other highly used and regarded repos. Now is our chance to make these changes that could affect usage. Making the change in the future will just be more difficult. Would it be possible to go the numpy route with both options working but issuing warnings for the full import at the start and then transitioning to not importing everything at the start in a few versions to give people time to make changes now?
from act.
@AdamTheisen for tsdat
we haven't thought much about optimizing module loading, but we definitely will at some point. I don't know much about how to improve it outside of what has already been suggested here, so I am definitely interested in learning more about this. For kicks I ran the same thing as @mgrover1 (I think?) to generate tsdat's import profile and it looks pretty evenly split between xarray
and act
. I'm certain there are things we could be doing from the tsdat
side of things to mitigate this, I just don't know how yet.
code
pip install tsdat tuna
python -X importtime -c "import tsdat" 2 > import.log
tuna import.log
I like @mgrover1's suggestion of changing how modules are loaded to improve performance in most cases. The downside, as already mentioned, is that the API does change quite a bit since you now have to manually import the submodules you need. That can be a bit annoying if your workflow involves using all the act
submodules, but I think most people only use one or two immediately.
For tsdat
we are currently only actually using the act.utils
module directly, but we spend a good chunk of time importing the act.discovery
and act.plotting
modules. I've seen the split submodule approach in other libraries too – rich
and scipy
come to mind for me, but I'm sure there's more.
from act.
@AdamTheisen @kenkehoe At SciPy last week, the Scientific Python group mentioned they have a list of SPECs for the community - including one on lazy loading of submodules (which is what we are aiming for here), speeding up import times.
Here is the SPEC - https://scientific-python.org/specs/spec-0001/
It is worth looking into this strategy + method to see if we can use it here.
from act.
@mgrover1 this looks like a good option. Would take some initial effort up front to implement but could be worth it
from act.
Related Issues (20)
- New Example Template
- Standardize discovery module HOT 7
- DistributionDisplay Consistent Names
- Adding plotting options to Skew-T HOT 4
- 2023 ARM-ASR Joint PI Meeting - I/O Suggestions HOT 3
- QC-aware transformations HOT 8
- UserWarning for Time HOT 5
- Replace current CVD colormaps with CMWeather HOT 1
- Epoch addition to ACT Discovery HOT 6
- Support for Writing ICARTT formatted files
- GeoDisplay Attribution HOT 1
- Sunset Stamen maps in GeoDisplay and potentially replace HOT 1
- Consistent IO Names for V2.0.0
- Reorganize the Unit Tests HOT 4
- Change default behavior of datafilter
- Update ARM DQR webservice HOT 3
- setup.cfg excluding armfiles.py HOT 4
- ACT HomeyerRainbow cmap not found HOT 6
- Error while using add_dqr_to_qc: flag_values and flag_masks not set as expected HOT 9
- Make version 2.0 release prominent on landing page HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from act.