Comments (11)
ok, so this is too much effort. I don't think a startup is an issue for most people. One shouldn't be starting up pymatgen again and again in most use cases anyway. I will revert the change. If you have a solution, feel free to implement in monty and release.
from pymatgen.
@mturiansky thanks for profiling! such analysis is hugely appreciated!
lazy-loading seems like a good UX improvement here. PR incoming
from pymatgen.
Actually, I'm not sure this solves the problem. Sorry, I probably could have been more clear in my description.
The reason this doesn't work is because the importing of pandas is not actually being done within pymatgen.core.structure
. Pandas will be imported at the first invocation and then later invocations will simply read from the cached module. If you print out the full tree structure of python -Ximporttime
(not putting here because of length), you'll see that pandas is actually being within monty.json
. In monty, pandas is just an optional dependency though. If it exists, however, it will be imported.
Perhaps I should raise the issue with monty instead. If monty implements lazy loading also, then the change made here should work.
Thoughts?
from pymatgen.
By the way, here is the output from running importtime to show that the problem still exists:
import time: self [us] | cumulative | imported package
import time: 2518 | 100511 | numpy
import time: 242 | 129682 | pandas.core.api
import time: 511 | 176807 | pandas
import time: 411 | 284171 | monty.json
import time: 732 | 327149 | pymatgen.core.composition
import time: 343 | 109818 | scipy.spatial
import time: 644 | 112443 | pymatgen.core.lattice
import time: 3964 | 534195 | pymatgen.core
from pymatgen.
@mturiansky thanks for following up. materialsvirtuallab/monty#604 refactors to lazy loading in monty
also
from pymatgen.
This has become way more complicated than I had originally anticipated. The change implemented in monty is actually a significant regression:
import time: self [us] | cumulative | imported package
import time: 2044 | 100229 | numpy
import time: 347 | 106559 | monty.json
import time: 713 | 151276 | pymatgen.core.composition
import time: 338 | 101914 | scipy.spatial
import time: 677 | 104410 | pymatgen.core.lattice
import time: 210 | 121982 | pandas.core.api
import time: 441 | 164693 | pandas
import time: 472242 | 6429146 | pymatgen.symmetry.groups
import time: 778 | 6432398 | pymatgen.symmetry.maggroups
import time: 4142 | 6450731 | pymatgen.core.structure
import time: 3584 | 6760861 | pymatgen.core
It seems that not only is pandas still getting imported, it's getting imported with considerably higher overhead. Furthermore, since pydantic, bson, and torch are not installed in my environment, the interpreter attempts to import them over and over. The try except statement allows it to make this attempt over and over rather than just failing.
from pymatgen.
The use case that comes to mind is in a CLI, where you might call a script several times.
I will take a look at monty and see what I can do when I have some free time. Are you opposed to removing the try-except statements or if my solution does away with them would that be okay?
from pymatgen.
I would be opposed to removing try except because I want monty to be usable without those dependencies. If your solution keeps that and does not require try except, it is fine iwth me.
from pymatgen.
I think @mturiansky raises a good point overall, we want pymatgen to be lean, and that includes startup times. As long as the fix doesn’t create a mess, I’m in favor of it.
from pymatgen.
I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once. Anyway, I am all for a solution that reduces startup times without mess.
from pymatgen.
I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once.
Sure that's an obvious thing to say, but that's not always possible if you're trying to develop modular code, allowing the user to choose how they would like to interact with it. Returning to my original example of CLI, consider a script which can be used to do preprocessing and postprocessing of data, while running some other program in between. Sure, you could "throw it in a loop" and have python run the program for you, but that's not feasible for everyone. That also may entail developing significant amounts of automation code to run the program. Not everyone has a data-centric approach.
You could argue that two imports isn't a big deal for performance, and I'd agree. This is mainly a UX issue: half a second load times is noticeable for a user, especially when the import time is longer than the time it takes for the code to process the data. Half a second was just the number on my computer, some clusters have slow home directories for which import times can be much longer. @mkhorton hit the nail on the head with my intention for this request.
from pymatgen.
Related Issues (20)
- Pymatgen cannot appropriately parse CIF file download from Materials Project HOT 1
- `Vasprun` parser raises a `FileNotFoundError` when searching for POTCARs with pymatgen 2024.1.26 HOT 2
- Super slow import times on `2024.1.26` HOT 1
- String representation of Spin enum HOT 1
- Issue when using pymatgen to plot dos HOT 2
- tribchem: "Fixing" some issues in pymatgen.
- Compare two descriptions of crystal structures to see if the have the same underlying structure. HOT 6
- `PBE_64` configured via pmg doesn't work with pymatgen. HOT 9
- Vasprun get_potcars fails if no explicit directories in file path HOT 3
- Bug on DiscretizeOccupanciesTransformation
- Line mode and reciprocal density bug int MPHSEBSSet HOT 2
- Should we let the `StructureMatcher` method work at the affine space group classification level instead of the crystallographic space group classification level? HOT 5
- Finding the Lowest Energy Stacking Configuration for Two Layers with Specific Miller Indices Using Pymatgen. HOT 1
- Error when initializing Vasprun when LDAUL/J or MAGMOM exception triggers
- CifParser check() method does not account for different site multiplicities HOT 1
- POTCAR setup instructions should be made clearer HOT 3
- `bader_analysis_from_path` misusing `_get_filepath` and hence not finding input VASP files HOT 1
- `BadInputSetWarning` should not be raised for ISMEAR = 0 relaxations with metals if SIGMA is small
- `pymatgen.io.res` - compatibility issue with the AIRSS package. HOT 2
- how to load an MD from LAMMPS into a Trajectory object?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pymatgen.