Code Monkey home page Code Monkey logo

Comments (10)

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by devin-petersohn
Thursday Mar 10, 2022 at 20:53 GMT


Hi @skrawcz, I'm happy to help support this idea.

  • Would (1) require Modin to be a hard dependency? Would your users want that type of behavior from hamilton?
  • 2 seems much more natural to me. Modin users can be hamilton users in that case, right?

What is required from the Modin side for an integration?

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Friday Mar 11, 2022 at 00:13 GMT


@devin-petersohn

In my ideal world, we could we enable people to maintain https://github.com/stitchfix/hamilton/blob/main/examples/hello_world/my_functions.py without touching it.

Reasoning:

  1. that code is reusable in different contexts very easily.
  2. if someone wanted to switch to dask, ray, koalas, they "technically" wouldn't have to change that code. Maybe a platform team wants to control this, and the user shouldn't need to?

idea 1

Add this to the top of the file, if people want

from hamilton.augment import pandas as pd

And if modin was detected as being installed, it would be returned, otherwise vanilla pandas would be.

idea 2

We would require users to "hard code" modin -- i.e. replace the pandas import in https://github.com/stitchfix/hamilton/blob/main/examples/hello_world/my_functions.py with modin.
Or do some if else based on an environment variable, or something like that.
Essentially like idea (1) -- but the user controls it. The downside is that they have to know about modin.

idea 3

Is there some other python duck typing way (this might be considered hacky), that "driver" or "graph adapter" code would own?

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by elijahbenizzy
Friday Mar 11, 2022 at 03:43 GMT


Either one could be OK I think.

That said, I want to try something like the following, in the driver:

sys.modules['pandas'] = modin.pandas

So long as its the first thing executed (big IF, but it should be), then this should work...

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Monday Mar 28, 2022 at 05:08 GMT


Either one could be OK I think.

That said, I want to try something like the following, in the driver:

sys.modules['pandas'] = modin.pandas

So long as its the first thing executed (big IF, but it should be), then this should work...

Hmm, that could work. Will have to prototype and see how the ergonomics feel.

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Monday Mar 28, 2022 at 05:10 GMT


And if modin was detected as being installed, it would be returned, otherwise vanilla pandas would be.

Just putting it in here, that we could use https://docs.python.org/3/library/importlib.html#checking-if-a-module-can-be-imported or something like that to check if modin is installed.

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by devin-petersohn
Thursday Apr 07, 2022 at 01:12 GMT


Thanks @skrawcz and @elijahbenizzy! Is there some way we can help to support this on the Modin side?

Typically the approach I have taken with Modin is that the choice should be the users and that users should be aware that Modin is being used. We make it easy to not only move to Modin, but also back to pandas from Modin if you choose.

Replacing sys.modules could work, but there may be some considerations here:

  • When would you change sys.modules?
    • If it happens on import, and the user doesn't end up using Hamilton after import, they may still end up using Modin thinking it is pandas (weird edge case)
  • Would users be able to opt-out of this (or opt-in)?
  • If Modin had a configuration that did this for you, could that help?

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Tuesday Apr 12, 2022 at 05:37 GMT


Thanks @devin-petersohn comments inline:

When would you change sys.modules?

  • If it happens on import, and the user doesn't end up using Hamilton after import, they may still end up using Modin thinking it is pandas (weird edge case)

It wouldn't happen on import of hamilton no. It would be as part of a script/flow of execution:

from hamilton import driver, switch_modin_for_pandas, switch_pandas_for_modin

# do switch here
switch_modin_for_pandas()
# have to import after doing the switch
import func_module
dr = driver.Driver({}, func_module)
df = dr.execute(['a', 'b', ...])
# switch it back
switch_pandas_for_modin()
save_df(df)

Would users be able to opt-out of this (or opt-in)?

The idea is that they opt-in to this, but ideally they don't have to change any of their logic to do so.

If Modin had a configuration that did this for you, could that help?

Potentially. It would enable us to depend on it, rather than having to write and maintain it ourselves.

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by devin-petersohn
Tuesday Apr 19, 2022 at 02:46 GMT


Ok this should actually be pretty straightforward as an opt-in utility. I think we can probably start with a hacky approach in Hamilton and then merge it into Modin as it matures as a configuration (i.e. modin.config.OverrideAllPandasCalls.enable()). I like the idea of the config, but let's make sure it solves the use case first is the thinking here. Would that process make sense to you @skrawcz and @elijahbenizzy?

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Thursday Apr 21, 2022 at 15:41 GMT


@devin-petersohn sounds good.

Do you happen by chance to have that hacky incantation?

from hamilton.

HamiltonRepoMigrationBot avatar HamiltonRepoMigrationBot commented on May 13, 2024

Comment by skrawcz
Monday May 23, 2022 at 23:38 GMT


@devin-petersohn okay so we can't play with sys modules, because modin still requires access to pandas. So you'd have to provide a means to do it. Created modin-project/modin#4488 to track.

Otherwise I am going to prototype idea 1 and see how that feels.

from hamilton.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.