Feature Requested I noticed that pymatgen can have quite an overhe

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I think <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-

I would just say that anything that requires calling a several times probably i

minimizing import overhead about pymatgen HOT 11 CLOSED

mturiansky commented on June 2, 2024 2

minimizing import overhead

from pymatgen.

Comments (11)

shyuep commented on June 2, 2024 1

ok, so this is too much effort. I don't think a startup is an issue for most people. One shouldn't be starting up pymatgen again and again in most use cases anyway. I will revert the change. If you have a solution, feel free to implement in monty and release.

from pymatgen.

janosh commented on June 2, 2024

@mturiansky thanks for profiling! such analysis is hugely appreciated!

lazy-loading seems like a good UX improvement here. PR incoming

from pymatgen.

mturiansky commented on June 2, 2024

Actually, I'm not sure this solves the problem. Sorry, I probably could have been more clear in my description.

The reason this doesn't work is because the importing of pandas is not actually being done within pymatgen.core.structure. Pandas will be imported at the first invocation and then later invocations will simply read from the cached module. If you print out the full tree structure of python -Ximporttime (not putting here because of length), you'll see that pandas is actually being within monty.json. In monty, pandas is just an optional dependency though. If it exists, however, it will be imported.

Perhaps I should raise the issue with monty instead. If monty implements lazy loading also, then the change made here should work.

Thoughts?

from pymatgen.

mturiansky commented on June 2, 2024

By the way, here is the output from running importtime to show that the problem still exists:

import time: self [us] | cumulative | imported package
import time:      2518 |     100511 |       numpy
import time:       242 |     129682 |         pandas.core.api
import time:       511 |     176807 |       pandas
import time:       411 |     284171 |     monty.json
import time:       732 |     327149 |   pymatgen.core.composition
import time:       343 |     109818 |     scipy.spatial
import time:       644 |     112443 |   pymatgen.core.lattice
import time:      3964 |     534195 | pymatgen.core

from pymatgen.

janosh commented on June 2, 2024

@mturiansky thanks for following up. materialsvirtuallab/monty#604 refactors to lazy loading in monty also

from pymatgen.

mturiansky commented on June 2, 2024

This has become way more complicated than I had originally anticipated. The change implemented in monty is actually a significant regression:

import time: self [us] | cumulative | imported package
import time:      2044 |     100229 |       numpy
import time:       347 |     106559 |     monty.json
import time:       713 |     151276 |   pymatgen.core.composition
import time:       338 |     101914 |     scipy.spatial
import time:       677 |     104410 |   pymatgen.core.lattice
import time:       210 |     121982 |           pandas.core.api
import time:       441 |     164693 |         pandas
import time:    472242 |    6429146 |       pymatgen.symmetry.groups
import time:       778 |    6432398 |     pymatgen.symmetry.maggroups
import time:      4142 |    6450731 |   pymatgen.core.structure
import time:      3584 |    6760861 | pymatgen.core

It seems that not only is pandas still getting imported, it's getting imported with considerably higher overhead. Furthermore, since pydantic, bson, and torch are not installed in my environment, the interpreter attempts to import them over and over. The try except statement allows it to make this attempt over and over rather than just failing.

from pymatgen.

mturiansky commented on June 2, 2024

The use case that comes to mind is in a CLI, where you might call a script several times.

I will take a look at monty and see what I can do when I have some free time. Are you opposed to removing the try-except statements or if my solution does away with them would that be okay?

from pymatgen.

shyuep commented on June 2, 2024

I would be opposed to removing try except because I want monty to be usable without those dependencies. If your solution keeps that and does not require try except, it is fine iwth me.

from pymatgen.

mkhorton commented on June 2, 2024

I think @mturiansky raises a good point overall, we want pymatgen to be lean, and that includes startup times. As long as the fix doesn’t create a mess, I’m in favor of it.

from pymatgen.

shyuep commented on June 2, 2024

I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once. Anyway, I am all for a solution that reduces startup times without mess.

from pymatgen.

mturiansky commented on June 2, 2024

I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once.

Sure that's an obvious thing to say, but that's not always possible if you're trying to develop modular code, allowing the user to choose how they would like to interact with it. Returning to my original example of CLI, consider a script which can be used to do preprocessing and postprocessing of data, while running some other program in between. Sure, you could "throw it in a loop" and have python run the program for you, but that's not feasible for everyone. That also may entail developing significant amounts of automation code to run the program. Not everyone has a data-centric approach.

You could argue that two imports isn't a big deal for performance, and I'd agree. This is mainly a UX issue: half a second load times is noticeable for a user, especially when the import time is longer than the time it takes for the code to process the data. Half a second was just the number on my computer, some clusters have slow home directories for which import times can be much longer. @mkhorton hit the nail on the head with my intention for this request.

from pymatgen.

minimizing import overhead about pymatgen HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent