
Comments (13)

entechlog commented on July 21, 2024

> BTW: masking semi-structured data is on my bucket list as well. However, I am a bit concerned about the performance implications of the solution I linked above. I wonder to what extent it will hamper columnar access to individual VARIANT paths.

That is my concern as well. Let's keep this thread open to share thoughts on performance and design considerations for this feature.

from dbt-snow-mask.

entechlog commented on July 21, 2024

Hello @rumbin, interesting thoughts.

  • The idea was to give masking policies the same flexibility that tests currently have, which are defined per model.
  • We could also create a single yml with all models and their related meta; that should work as well, the same as for tests.
  • If we need a dbt-snow-mask yml, that needs some additional design and development.
  • Regarding masking semi-structured data, I just came across a use case last week, so I will read the reference you shared and try to implement it in dbt-snow-mask as well.


rumbin commented on July 21, 2024

@entechlog
I am not sure I understand your thoughts about a central dbt-snow-mask.yml.
So I will simply elaborate a bit more on my thoughts and hope we'll converge on the same solution at some point ;-).

My understanding is that a masking policy is something rather reusable, just like the general definition of a test. For example, not_null and unique are each defined centrally as one macro and then used on many models' columns.
Or, just like a UDF or a UDTF, which is also defined once in a central schema such as public or utils.
Which policy applies to which column is currently specified in the meta section of the schema YAML, and I feel that this is the most natural approach. In my view, this should be kept.
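To make the reuse argument concrete, here is a minimal Python sketch. It assumes that each column's meta section carries a `masking_policy` key naming the policy; that key, the helper `columns_by_policy`, and all model, column, and policy names are illustrative assumptions, and the dictionaries only stand in for what a parsed schema YAML might look like.

```python
# Hypothetical stand-in for parsed schema YAML; every name here is illustrative.
models = [
    {
        "name": "customers",
        "columns": [
            {"name": "id"},
            {"name": "email", "meta": {"masking_policy": "mp_email"}},
        ],
    },
    {
        "name": "leads",
        "columns": [
            {"name": "contact_email", "meta": {"masking_policy": "mp_email"}},
        ],
    },
]


def columns_by_policy(models):
    """Group 'model.column' names by the policy named in each column's meta."""
    result = {}
    for model in models:
        for column in model.get("columns", []):
            # Assumed key name; columns without a policy are skipped.
            policy = column.get("meta", {}).get("masking_policy")
            if policy:
                target = f"{model['name']}.{column['name']}"
                result.setdefault(policy, []).append(target)
    return result


print(columns_by_policy(models))
```

One policy name ends up referenced from columns in two different models, which is the same pattern as a generic test being defined once and used in many places.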

I think that your and my points of view differ when it comes to considering the reusability of a masking policy.
I understand that there may be use cases where a masking policy is pretty schema-specific, since the masking conditions are tailored to specific roles, which might in turn be closely related to the schema's content. In such a case it might be useful to place the policy in the same schema as the model it applies to. However, the policy's creation macro will need to be uniquely named anyway, so it would do no harm to put the policies in a common, central schema as well.

Having the masking policies in a central schema ensures that we have no duplication and lets us configure different access control for the policies than for the models' schemas.

BTW: masking semi-structured data is on my bucket list as well. However, I am a bit concerned about the performance implications of the solution I linked above. I wonder to what extent it will hamper columnar access to individual VARIANT paths.


entechlog commented on July 21, 2024

> I am not sure I understand your thoughts about a central dbt-snow-mask.yml.
> So I will simply elaborate a bit more on my thoughts and hope we'll converge on the same solution at some point ;-).
>
> My understanding is that a masking policy is something rather reusable, just like the general definition of a test. For example, not_null and unique are each defined centrally as one macro and then used on many models' columns.
> Or, just like a UDF or a UDTF, which is also defined once in a central schema such as public or utils.
> Which policy applies to which column is currently specified in the meta section of the schema YAML, and I feel that this is the most natural approach. In my view, this should be kept.

Thank you for the detailed insights. We are totally on the same page here.


entechlog commented on July 21, 2024

> I think that your and my points of view differ when it comes to considering the reusability of a masking policy.
> I understand that there may be use cases where a masking policy is pretty schema-specific, since the masking conditions are tailored to specific roles, which might in turn be closely related to the schema's content. In such a case it might be useful to place the policy in the same schema as the model it applies to. However, the policy's creation macro will need to be uniquely named anyway, so it would do no harm to put the policies in a common, central schema as well.

I had the same thoughts when I originally started: have a schema for compliance, define the masking policies in that central schema, and reuse them in the required target database/schema. I had to take the current approach for ease of development, but this is an enhancement that can be worked on in the future.


rumbin commented on July 21, 2024

> I had the same thoughts when I originally started: have a schema for compliance, define the masking policies in that central schema, and reuse them in the required target database/schema. I had to take the current approach for ease of development, but this is an enhancement that can be worked on in the future.

Great. I think the location/name of this compliance schema should be configurable via variables in dbt_project.yml, similar to how the dbt-artifacts package solves this.
I'll be on holiday for the next couple of weeks, but I'm looking forward to helping with this task.


rumbin commented on July 21, 2024

One more consideration that comes to my mind:

When applying the masking policy to a table column, the policy is currently referenced with its fully-qualified path, including the database name.

This will lead to unexpected behaviour when cloning the database. In the clone the masked columns will still point to the original location of the masking policy and not to the cloned one.
To remedy this, the masking has to be applied to the column by pointing to a relative location, omitting the database name and just specifying schema and name of the policy.

Specifying the database name is only relevant if the policies are stored in a database that differs from the target.database. While there might be use cases for such an architecture, my guess is that setups where the policies are stored in the same database will be the majority.

Conclusion:
In my eyes, we should not include the database name in the policy path when applying the policy. Or, even better, we make it configurable whether the fully-qualified path should be used.
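The difference between the two ways of referencing a policy can be sketched in a few lines of Python. This is only an illustration of the string-building logic, not the package's code (dbt-snow-mask is implemented as dbt macros); the helpers `policy_reference` and `apply_policy_sql` and all object names are hypothetical.

```python
from typing import Optional


def policy_reference(schema: str, policy: str, database: Optional[str] = None) -> str:
    """Return the policy name, including the database only when one is given."""
    parts = [database, schema, policy] if database else [schema, policy]
    return ".".join(parts)


def apply_policy_sql(table: str, column: str, policy_ref: str) -> str:
    """Build the statement that attaches a masking policy to a column."""
    return (
        f"alter table {table} modify column {column} "
        f"set masking policy {policy_ref}"
    )


# Hypothetical names, for illustration only.
relative = policy_reference("compliance", "mp_email")
qualified = policy_reference("compliance", "mp_email", database="prod_db")

print(apply_policy_sql("customers", "email", relative))
print(apply_policy_sql("customers", "email", qualified))
```

With the relative form, the statement carries no database name, which is the behaviour proposed above for same-database setups; the qualified form stays available for the case where policies live in a different database.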


rumbin commented on July 21, 2024

Regarding the fully-qualified path of the masking policy (see my last comment), I have just experienced first-hand that this is really troublesome with cloned databases:

The cloned DB is still referencing the original policy.
As a consequence, updating an existing policy in the original DB fails as soon as we try to create or replace the policy.
It cannot be replaced, since the cloned DB refers to it, and dbt_snow_mask does not automatically unapply the policy there, because it is not its job to care for DBs other than the target.database.


entechlog commented on July 21, 2024

Thank you @rumbin for your input on this, and sorry for the very late response. I might try to find some time to get started on this one. Before I start, I wanted to check whether you have already started on any of these enhancements:

  • make the compliance schema configurable via variables in dbt_project.yml, as opposed to creating the masking policies in multiple schemas
  • mitigate the cloned DB referencing the original policy because of the fully-qualified path of the masking policy


rumbin commented on July 21, 2024

@entechlog, unfortunately, I haven't found any time to start working on this.
If you need any input or review from me, I'll happily help!


entechlog commented on July 21, 2024

No problem. Sure, I will pass it on for your review/testing once I have something on this one.


entechlog commented on July 21, 2024

Hello @rumbin @robscriva,

I just made changes to add this feature and created a release candidate: https://github.com/entechlog/dbt-snow-mask/releases/tag/0.1.8rc1

Will you have some time to review and test the changes? I also added integration tests as part of this change, and they are passing.

Test result with config enabled

vars:
  use_common_masking_policy_db: "True"
  common_masking_policy_db: "DEMO_DB"
  common_masking_policy_schema: "GDPR"

[image]

Test result without config in project
[image]

Data in Snowflake
[image]
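For reviewers, the effect of the three vars shown above can be sketched in Python. This is an assumption about the intended behaviour, not the package's actual macro code: the helper `resolve_policy_location` is hypothetical, and the fallback to the model's own database and schema when the flag is absent is inferred from the discussion in this thread.

```python
def resolve_policy_location(project_vars: dict, model_database: str, model_schema: str) -> tuple:
    """Return the (database, schema) where masking policies would be created."""
    # The flag is given as the string "True" in the vars, so compare as text.
    flag = str(project_vars.get("use_common_masking_policy_db", "False"))
    if flag.strip().lower() == "true":
        return (
            project_vars["common_masking_policy_db"],
            project_vars["common_masking_policy_schema"],
        )
    # Assumed fallback: keep the policy next to the model it protects.
    return (model_database, model_schema)


with_config = {
    "use_common_masking_policy_db": "True",
    "common_masking_policy_db": "DEMO_DB",
    "common_masking_policy_schema": "GDPR",
}

# "ANALYTICS" and "SALES" are made-up model locations for illustration.
print(resolve_policy_location(with_config, "ANALYTICS", "SALES"))
print(resolve_policy_location({}, "ANALYTICS", "SALES"))
```

The first call corresponds to the test run with the config enabled, the second to the run without the config in the project.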


nadesansiva commented on July 21, 2024

This feature has been implemented now. Closing the issue.

