Description:
I encountered an issue while trying to apply multiple data validation decorators to a single function in the Hamilton DAG framework. Specifically, I am trying to validate different columns of a DataFrame using multiple instances of the @check_output_custom decorator. However, I receive a ValueError indicating that the function cannot be defined more than once.
Steps to Reproduce:
- Define a function to process a DataFrame.
- Apply multiple @check_output_custom decorators to the function, each with different validation parameters.
- Attempt to run the decorated function.
Example code snippet:
1st issue code snippet
@check_output_custom(CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass
This raises the error:
ValueError: Cannot define function process_order_data_raw more than once. Already defined by function <function process_order_data
2nd issue code snippet
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="ProductID", allowed_values=[10, 20, 30], importance="warn"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass
This raises the error:
ValueError: Cannot define function process_order_data_CategoricalValuesValidator more than once. Already defined by function <function process_order_data
Expected Behavior
Applying multiple @check_output_custom decorators to a single function should allow for different validation checks on various columns of the DataFrame without raising a ValueError.
Actual Behavior
A ValueError is raised, indicating that the function cannot be defined more than once by the same validator.
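The two error messages suggest that each decorator application derives node names from the function name plus a suffix (`_raw`, or the validator's name), so a second application produces names that already exist. The following is a minimal, self-contained sketch of that kind of collision. It does not use Hamilton; `derived_node_names` and `register` are hypothetical helpers, and the naming rule is an assumption inferred only from the messages above.

```python
from typing import Dict, List


def derived_node_names(fn_name: str, validator_names: List[str]) -> List[str]:
    """Hypothetical naming rule: one `_raw` node plus one node per validator."""
    return [f"{fn_name}_raw"] + [f"{fn_name}_{v}" for v in validator_names]


def register(graph: Dict[str, str], names: List[str], source: str) -> None:
    """Add names to a toy graph, refusing any name that is already present."""
    for name in names:
        if name in graph:
            raise ValueError(
                f"Cannot define function {name} more than once. "
                f"Already defined by {graph[name]}"
            )
        graph[name] = source


graph: Dict[str, str] = {}
names = derived_node_names("process_order_data", ["CategoricalValuesValidator"])

# The first decorator application registers its nodes without trouble.
register(graph, names, "first decorator")

# The second application derives the same names and collides on the first one.
try:
    register(graph, names, "second decorator")
    collision_message = ""
except ValueError as err:
    collision_message = str(err)
```

In this toy model the clash occurs on `process_order_data_raw`, which mirrors the first error message; whether the framework checks names in this order is not confirmed by the thread.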
Library & System Information
python version = 3.9.5
hamilton library version = 1.65.0
Additional Context:
This issue prevents the application of multiple validators to a single function, which is necessary for comprehensive data validation in our use case. It would be helpful if the framework could support multiple validators on the same function without raising errors.
Thank you for your attention to this issue.
Thanks for opening! I think this is a limitation. Two validators that have the same name add another complexity. I think we can build a fix, but just to check: if you pass them both to the same decorator (e.g. as follows), does it work? My guess is not, but it's worth a try:
@check_output_custom(
    CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"),
    CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
)
Another thought would be to add another custom validator that takes in multiple validators... 🤔
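As a rough illustration of the "validator that takes in multiple validators" idea, here is a self-contained sketch. It deliberately avoids Hamilton and PySpark: `Result`, `AllowedValues`, and `CompositeValidator` are hypothetical stand-ins, and the data is a plain list of dicts rather than a DataFrame. A real version would need to implement the framework's own validator interface.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Protocol


@dataclass
class Result:
    """Hypothetical stand-in for a validation result."""
    passes: bool
    message: str


class Validator(Protocol):
    def validate(self, rows: List[Dict[str, Any]]) -> Result: ...


class AllowedValues:
    """Checks that every value in one column is in an allowed set."""

    def __init__(self, column: str, allowed: Iterable[Any]):
        self.column = column
        self.allowed = set(allowed)

    def validate(self, rows: List[Dict[str, Any]]) -> Result:
        bad = [r[self.column] for r in rows if r[self.column] not in self.allowed]
        return Result(not bad, f"{self.column}: {len(bad)} disallowed value(s)")


class CompositeValidator:
    """Runs several validators and passes only if all of them pass."""

    def __init__(self, *validators: Validator):
        self.validators = validators

    def validate(self, rows: List[Dict[str, Any]]) -> Result:
        failures = [
            r.message
            for r in (v.validate(rows) for v in self.validators)
            if not r.passes
        ]
        return Result(not failures, "; ".join(failures) or "all checks passed")


rows = [
    {"CategoryID": 1, "ProductID": 10},
    {"CategoryID": 9, "ProductID": 20},
]
composite = CompositeValidator(
    AllowedValues("CategoryID", [1, 2, 3]),
    AllowedValues("ProductID", [10, 20, 30]),
)
result = composite.validate(rows)
```

One trade-off of this design is that the composite reports a single pass/fail outcome, so per-validator settings such as `importance="fail"` versus `importance="warn"` would have to be handled inside the composite.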
Otherwise, I think a potential avenue to scope would be to include some name_ kwarg to help name the node so it doesn't clash...
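To make the naming idea concrete, here is a small sketch of one way node names could be kept unique: use an explicit name when one is given, and otherwise append a counter to repeated validator names. The helper `unique_node_names` and its behaviour are assumptions for illustration only; the thread does not say how the released fix actually names nodes.

```python
from collections import Counter
from typing import List, Optional, Tuple


def unique_node_names(
    fn_name: str, validators: List[Tuple[str, Optional[str]]]
) -> List[str]:
    """Build one node name per (validator_name, explicit_name) pair.

    Hypothetical scheme: prefer the explicit name, otherwise fall back to the
    validator name, and suffix repeats with _1, _2, ... to avoid clashes.
    """
    seen: Counter = Counter()
    names: List[str] = []
    for validator_name, explicit_name in validators:
        base = explicit_name or validator_name
        seen[base] += 1
        suffix = "" if seen[base] == 1 else f"_{seen[base] - 1}"
        names.append(f"{fn_name}_{base}{suffix}")
    return names


# Same validator twice, no explicit names: the second one gets a counter suffix.
auto_named = unique_node_names(
    "process_order_data",
    [("CategoricalValuesValidator", None), ("CategoricalValuesValidator", None)],
)

# Same validator twice, each with a caller-supplied name.
explicitly_named = unique_node_names(
    "process_order_data",
    [
        ("CategoricalValuesValidator", "category_check"),
        ("CategoricalValuesValidator", "product_check"),
    ],
)
```

Explicit names give stable, readable node names, while the counter fallback keeps existing single-validator names unchanged. This sketch does not guard against an explicit name that happens to equal an auto-generated one.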
@elijahbenizzy
Yes, it's working. Thanks for your help. It would be helpful for newcomers if this were mentioned in the documentation.
But the 2nd issue is still present: passing the same data validator twice raises an error, as below:
@check_output_custom(
    CategoricalValuesValidatorPySparkDataFrame(column="ReportingId", allowed_values=[156], importance="fail"),
    CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
)
@rohithrockzz could you try installing
pip install sf-hamilton==1.66.1rc0
and see if that fixes your issue please?
@skrawcz
Yes, it worked. Thank you so much for the quick fix
@rohithrockzz great, thanks for verifying. I will publish a non-RC version in the morning.
@rohithrockzz this has been released under sf-hamilton==1.66.1