Code Monkey home page Code Monkey logo

Comments (11)

shyuep avatar shyuep commented on June 2, 2024 1

ok, so this is too much effort. I don't think a startup is an issue for most people. One shouldn't be starting up pymatgen again and again in most use cases anyway. I will revert the change. If you have a solution, feel free to implement in monty and release.

from pymatgen.

janosh avatar janosh commented on June 2, 2024

@mturiansky thanks for profiling! such analysis is hugely appreciated!

lazy-loading seems like a good UX improvement here. PR incoming

from pymatgen.

mturiansky avatar mturiansky commented on June 2, 2024

Actually, I'm not sure this solves the problem. Sorry, I probably could have been more clear in my description.

The reason this doesn't work is because the importing of pandas is not actually being done within pymatgen.core.structure. Pandas will be imported at the first invocation and then later invocations will simply read from the cached module. If you print out the full tree structure of python -Ximporttime (not putting here because of length), you'll see that pandas is actually being within monty.json. In monty, pandas is just an optional dependency though. If it exists, however, it will be imported.

Perhaps I should raise the issue with monty instead. If monty implements lazy loading also, then the change made here should work.

Thoughts?

from pymatgen.

mturiansky avatar mturiansky commented on June 2, 2024

By the way, here is the output from running importtime to show that the problem still exists:

import time: self [us] | cumulative | imported package
import time:      2518 |     100511 |       numpy
import time:       242 |     129682 |         pandas.core.api
import time:       511 |     176807 |       pandas
import time:       411 |     284171 |     monty.json
import time:       732 |     327149 |   pymatgen.core.composition
import time:       343 |     109818 |     scipy.spatial
import time:       644 |     112443 |   pymatgen.core.lattice
import time:      3964 |     534195 | pymatgen.core

from pymatgen.

janosh avatar janosh commented on June 2, 2024

@mturiansky thanks for following up. materialsvirtuallab/monty#604 refactors to lazy loading in monty also

from pymatgen.

mturiansky avatar mturiansky commented on June 2, 2024

This has become way more complicated than I had originally anticipated. The change implemented in monty is actually a significant regression:

import time: self [us] | cumulative | imported package
import time:      2044 |     100229 |       numpy
import time:       347 |     106559 |     monty.json
import time:       713 |     151276 |   pymatgen.core.composition
import time:       338 |     101914 |     scipy.spatial
import time:       677 |     104410 |   pymatgen.core.lattice
import time:       210 |     121982 |           pandas.core.api
import time:       441 |     164693 |         pandas
import time:    472242 |    6429146 |       pymatgen.symmetry.groups
import time:       778 |    6432398 |     pymatgen.symmetry.maggroups
import time:      4142 |    6450731 |   pymatgen.core.structure
import time:      3584 |    6760861 | pymatgen.core

It seems that not only is pandas still getting imported, it's getting imported with considerably higher overhead. Furthermore, since pydantic, bson, and torch are not installed in my environment, the interpreter attempts to import them over and over. The try except statement allows it to make this attempt over and over rather than just failing.

from pymatgen.

mturiansky avatar mturiansky commented on June 2, 2024

The use case that comes to mind is in a CLI, where you might call a script several times.

I will take a look at monty and see what I can do when I have some free time. Are you opposed to removing the try-except statements or if my solution does away with them would that be okay?

from pymatgen.

shyuep avatar shyuep commented on June 2, 2024

I would be opposed to removing try except because I want monty to be usable without those dependencies. If your solution keeps that and does not require try except, it is fine iwth me.

from pymatgen.

mkhorton avatar mkhorton commented on June 2, 2024

I think @mturiansky raises a good point overall, we want pymatgen to be lean, and that includes startup times. As long as the fix doesn’t create a mess, I’m in favor of it.

from pymatgen.

shyuep avatar shyuep commented on June 2, 2024

I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once. Anyway, I am all for a solution that reduces startup times without mess.

from pymatgen.

mturiansky avatar mturiansky commented on June 2, 2024

I would just say that anything that requires calling a script several times probably is better off looping in python, in which the import is done only once.

Sure that's an obvious thing to say, but that's not always possible if you're trying to develop modular code, allowing the user to choose how they would like to interact with it. Returning to my original example of CLI, consider a script which can be used to do preprocessing and postprocessing of data, while running some other program in between. Sure, you could "throw it in a loop" and have python run the program for you, but that's not feasible for everyone. That also may entail developing significant amounts of automation code to run the program. Not everyone has a data-centric approach.

You could argue that two imports isn't a big deal for performance, and I'd agree. This is mainly a UX issue: half a second load times is noticeable for a user, especially when the import time is longer than the time it takes for the code to process the data. Half a second was just the number on my computer, some clusters have slow home directories for which import times can be much longer. @mkhorton hit the nail on the head with my intention for this request.

from pymatgen.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.