Comments (13)
BTW: masking semi-structured data is on my bucket list as well. However, I am a bit concerned about performance implications of the solution I linked above. Wondering to what extent this solution will hamper the columnar access of single VARIANT paths.
That is my concern as well. Let's keep this thread open to share thoughts on performance and design considerations for this feature.
from dbt-snow-mask.
Hello @rumbin, interesting thoughts.
- The idea was to have the same flexibility for masking policies that tests currently have, where they are defined per model.
- We could also create a single yml with all models and their related meta; that should work as well, the same as for tests.
- If we need a dbt-snow-mask yml, that needs some additional design and development.
- Regarding masking semi-structured data, I just came across a use case last week, so I will read the reference you shared and try to implement it in dbt-snow-mask as well.
@entechlog
I am not sure I understand your thoughts about a central dbt-snow-mask.yml.
So I will simply elaborate a bit more on my thoughts and hope we'll converge on the same solution at some point ;-).
My understanding of a masking policy is that it is something rather reusable, just like the general definition of a test method. E.g., `not_null` or `unique` are each defined centrally as one macro, to be used in many models' columns.
Or, just like a UDF or a UDTF, which are also defined once in a central schema such as `public` or `utils`.
Specifying which policy to apply to which column is currently done via the schema YAML in the `meta` section, and I feel that this is the most natural approach. So this should be kept, in my eyes.
I think that your point of view and mine differ when it comes to the reusability of a masking policy.
I understand that there may be use cases where a masking policy is pretty schema-specific, since the masking conditions are tailored to specific roles, which might in turn be closely related to the schema's content. In such a case it might be useful to place the policy in the same schema as the model it applies to. However, the policy's creation macro will need to be uniquely named anyway, so it would not harm to put the policies in a common, central schema as well.
Having the masking policies in a central schema ensures we have no duplication, and it lets us configure different access control for the policies than for the models' schemas.
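To make the reuse argument concrete, here is a minimal Python sketch of how one centrally named policy can be referenced from many columns via `meta`. The dictionary stands in for a parsed schema YAML; the `masking_policy` meta key and every model, column, and policy name are assumptions for illustration, not taken from the package's code.

```python
# Hypothetical stand-in for a parsed schema YAML. The `masking_policy` meta key
# and all model, column, and policy names are assumptions for illustration.
schema = {
    "models": [
        {
            "name": "customers",
            "columns": [
                {"name": "email", "meta": {"masking_policy": "mp_email"}},
                {"name": "customer_id"},
            ],
        },
        {
            "name": "orders",
            "columns": [
                {"name": "billing_email", "meta": {"masking_policy": "mp_email"}},
            ],
        },
    ]
}


def columns_by_policy(schema):
    """Group (model, column) pairs by the policy named in each column's meta."""
    grouped = {}
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            policy = column.get("meta", {}).get("masking_policy")
            if policy is not None:
                grouped.setdefault(policy, []).append((model["name"], column["name"]))
    return grouped


print(columns_by_policy(schema))
```

In this sketch a single policy name serves columns in two different models, which is the situation where defining the policy once in a central schema avoids duplication.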
BTW: masking semi-structured data is on my bucket list as well. However, I am a bit concerned about performance implications of the solution I linked above. Wondering to what extent this solution will hamper the columnar access of single VARIANT paths.
Thank you for the detailed insights on your thoughts, We are totally on same page here
Had the same thoughts when I originally started: to have a schema for `compliance`, define the masking policies in a central schema, and reuse them in the required target database/schema. I had to take the current approach for ease of development, but this is an enhancement that can be worked on in the future.
Great. I think the location/name of this compliance schema should be configurable via variables in the dbt_project.yml, similar to how the dbt-artifacts package solves this.
I'll be on holidays the next couple of weeks but looking forward to helping on this task.
One more consideration that comes to my mind:
When applying the masking policy to a table column, the policy is currently referenced with its fully-qualified path, including the database name.
This will lead to unexpected behaviour when cloning the database. In the clone the masked columns will still point to the original location of the masking policy and not to the cloned one.
To remedy this, the masking has to be applied to the column by pointing to a relative location, omitting the database name and just specifying schema and name of the policy.
Specifying the database name is only relevant if the policies are stored in a database that differs from the `target.database`. While there might be use cases for such an architecture, my guess is that cases where the policies are stored in the same database will be the majority.
Conclusion:
In my eyes, we should not include the database name in the policy path when applying the policy, or, even better, we should make it configurable whether the fully-qualified path is used.
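The proposal can be sketched in Python as follows. This is only an illustration of the idea, not the package's macro: the helper names `policy_reference` and `set_policy_sql` are hypothetical, and it assumes that a schema-qualified name is resolved against the session's current database.

```python
def policy_reference(schema, name, database=None):
    """Return a policy path; the database is included only when it is given."""
    parts = [schema, name] if database is None else [database, schema, name]
    return ".".join(parts)


def set_policy_sql(table, column, policy_ref):
    """Build the statement that attaches a masking policy to a column."""
    return (
        f"alter table {table} modify column {column} "
        f"set masking policy {policy_ref}"
    )


# Relative reference: schema and policy name only (hypothetical names).
relative = policy_reference("compliance", "mp_email")
# Fully-qualified reference, for policies kept in a different database.
qualified = policy_reference("compliance", "mp_email", database="analytics")

print(set_policy_sql("customers", "email", relative))
print(set_policy_sql("customers", "email", qualified))
```

Making `database` optional keeps both behaviours available, so a project that stores policies in another database can still opt into the fully-qualified form.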
Regarding the fully-qualified path of the masking policy (see my last comment), I have just experienced that this is really troublesome with cloned databases:
The cloned DB is still referencing the original policy then.
As a consequence, updating an existing policy in the original DB will fail as soon as we try to `create or replace` the policy.
It cannot be replaced, since the cloned DB refers to it, and the policy is not automatically unapplied by dbt_snow_mask, as it is not its job to care for DBs other than the `target.database`.
Thank you @rumbin for your inputs on this, and sorry for the very late response. I might try to find some time to get started on this one. Before I start, I just wanted to check whether you have started on any of the enhancements:
- make the compliance schema configurable via variables in the dbt_project.yml, as opposed to creating the masking policies in multiple schemas
- mitigate the cloned DB referencing the original policy because of the fully-qualified path of the masking policy
@entechlog, unfortunately, I haven't found any time to start working on this.
If you need any input or review from me, I'll happily help!
No issues. Sure, I will pass it on for your review/testing once I have something on this one.
Hello @rumbin @robscriva,
I just made changes to add this feature and created a release candidate: https://github.com/entechlog/dbt-snow-mask/releases/tag/0.1.8rc1
Would you have some time to review and test the changes? I also added integration tests as part of this change, and they are passing.
Test result with config enabled
vars:
  use_common_masking_policy_db: "True"
  common_masking_policy_db: "DEMO_DB"
  common_masking_policy_schema: "GDPR"
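As a rough illustration of what these variables could control, the Python sketch below picks the database and schema for a policy. How the package actually reads the variables is not shown in this thread, so the string comparison on `"True"`, the fallback to the model's own location, and the `policy_location` helper itself are all assumptions.

```python
def policy_location(project_vars, target_database, model_schema):
    """Pick the (database, schema) pair in which a masking policy lives.

    Assumption: the flag arrives as a string such as "True", as in the vars
    above, and without it the policy sits next to the model.
    """
    flag = str(project_vars.get("use_common_masking_policy_db", "False"))
    if flag.strip().lower() == "true":
        return (
            project_vars["common_masking_policy_db"],
            project_vars["common_masking_policy_schema"],
        )
    return (target_database, model_schema)


# Values copied from the vars block above; "ANALYTICS" and "SALES" are made up.
enabled = {
    "use_common_masking_policy_db": "True",
    "common_masking_policy_db": "DEMO_DB",
    "common_masking_policy_schema": "GDPR",
}

print(policy_location(enabled, "ANALYTICS", "SALES"))
print(policy_location({}, "ANALYTICS", "SALES"))
```

The second call mirrors a project without the config, where the sketch falls back to the model's own database and schema.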
Test result without config in project
This feature has been implemented now. Closing the issue.