
dbt-snow-mask's People

Contributors

808sandbr, donjae, dtger, entechlog, il-dat, jodybuz, jonhopper-dataengineers, lbk-fishtown, melissa-ellevation, robscriva, santhin, stchena


dbt-snow-mask's Issues

Apply masking policy with FORCE keyword

Describe the feature:
Snowflake now allows masking policies to be applied with the FORCE keyword:
ALTER TABLE ... ALTER COLUMN ... SET MASKING POLICY ... FORCE

This opens the door for us to use this keyword in the APPLY MASKING POLICY functionality.

With this feature, we would remove the annoying step of having to unset policies first and then re-apply them. It is also a great benefit during development when performing a CLONE operation, because we would no longer need to care about unsetting at all.

Suggested Solution:
Add a variable use_force_applying_masking_policy: false. If this value is set to true, apply masking policies with the FORCE keyword.
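
As a rough illustration only (not the package's current implementation), the apply statement inside the existing apply macro could be gated on that variable; the surrounding variables are assumed to already be set by the macro:

    {# Hedged sketch: append FORCE when the proposed variable is enabled #}
    {% set query %}
        alter {{ materialization }} {{ database }}.{{ schema }}.{{ alias }}
            modify column {{ column }}
            set masking policy {{ masking_policy_db }}.{{ masking_policy_schema }}.{{ masking_policy_name }}
            {%- if var('use_force_applying_masking_policy', 'False') | upper in ['TRUE', 'YES'] %} force{% endif %};
    {% endset %}
    {% do run_query(query) %}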

Improvement/bug: alert or warn when a specified masking policy does not exist

Context:
We currently use Terraform to manage/maintain our Snowflake account. This includes creating masking policies. Due to limitations in the current Terraform provider, it is not possible to apply masking policies to columns.

Problem:
We encountered an issue where we accidentally wrote the wrong masking policy name and ran dbt. The output reported success. We then tested the column policy by running a SELECT query, and it turned out the policy had not been applied.

After we realised it was a typo and dbt was re-run, we could see the dbt output showing the masking policies it had added. We confirmed by running a SELECT query and things were as expected 👍

Expectation:
We would have expected some sort of warning or error to be raised when an incorrect masking policy name has been provided.
The rationale is that applying a masking policy is fundamentally about protecting visibility of sensitive data and ensuring the right governance rules are applied to it. For something this important, it is worth raising awareness whenever a policy could not be applied.
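
One possible shape for such a check, sketched here with dbt's built-in exceptions helpers; the variable name fail_on_missing_masking_policy and the surrounding variables are illustrative assumptions, not existing package options:

    {# Hedged sketch: guard after comparing the requested policy against the show masking policies output #}
    {% if full_masking_policy_name not in policies_in_database %}
        {% if var('fail_on_missing_masking_policy', 'False') | upper in ['TRUE', 'YES'] %}
            {{ exceptions.raise_compiler_error("Masking policy " ~ full_masking_policy_name ~ " does not exist and cannot be applied to column " ~ column) }}
        {% else %}
            {% do exceptions.warn("Masking policy " ~ full_masking_policy_name ~ " does not exist and was not applied to column " ~ column) %}
        {% endif %}
    {% endif %}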

Feature request: treat applying masking policies as mandatory

Currently, when a .yml file specifies that a certain column in a model should have a masking policy applied, that masking policy is not applied if it has not already been created. This can lead to data being available unmasked when it is not supposed to be.

In our scenario, we use personal development schemas. Data is loaded as a variant into a general schema, let's say DB1.STG.TABLE1.
Our dbt models then load/transform these tables from the STG schema to a different schema, let's say DB1.DBO.
If I run the dbt models that load the data from STG to DBO, a personal schema is created: DB1.coone_mtb_DBO.TABLE1.

Since I didn't explicitly create the masking policies first, running 'apply masking policies' (which in our case is set as a post-hook) will not apply the masking policies to the tables in this personal schema, even though they are specified in the .yml file.

In these cases, we would expect applying the masking policies to return an error, because the .yml file specifies that a masking policy should be applied but it is not present in the database.

Ability to pass in node and custom_schema_name

We use a custom generate_schema_name macro for our models. I have been unable to figure out how to duplicate this logic for determining where the masking policy should be placed. When our ADO pipelines run for new PRs, the package doesn't seem to be able to determine the correct schema to place it in. I can either hard-code it using the common schema name variable, or leave it to place a copy in every schema that a model uses.

In this case, the common masking policy schema is effectively variable: for local dev and PRs (we use target.name to determine whether it's a PR) the default schema should be used, while for our dev/sandbox/prod dbt environments a common schema should be used.

The purpose of this is that we want to use a common schema (I don't need 30 identical copies of a masking policy, one in each schema where a model uses it), but we also don't want people's development runs creating objects in the "real" schemas.
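
A hedged sketch of the kind of switch being asked for, assuming it runs inside the package's create/apply loop where node is the current model; the target names 'ci' and 'pr' are illustrative:

    {# Hedged sketch: choose the masking policy schema based on target.name #}
    {% if target.name in ['ci', 'pr'] %}
        {# PR / local dev builds: keep policies in the model's own (personal) schema #}
        {% set masking_policy_schema = node.schema %}
    {% else %}
        {# dev / sandbox / prod environments: use the shared schema from the project variable #}
        {% set masking_policy_schema = var('common_masking_policy_schema') | string %}
    {% endif %}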

dbt snow mask versioning

I noticed the latest snow-mask version is not compatible with dbt 1.2.0. Now that dbt 1.3.0 is already out, when will there be an update for compatibility with the latest versions, and how frequently are you planning to keep up with new dbt releases? We want to use snow-mask in our project, but this limitation blocks its use for deployment on our side.

Post-hook executed as a pre-hook

Hi,

Thanks for your work!
I use this package to set a masking policy on one of my tables. To set it, I use a post-hook in the relevant model.
But I see a strange behavior: when I add it in my dbt_project.yml, it works fine:

models:
  analytics:
    result:
      +post-hook:
        - "{{ config(post_hook=dbt_snow_mask.apply_masking_policy('models')) }}"

But when I add it directly in the SQL file defining my model, it is executed as a pre-hook (resulting in the creation of a table with no masking policy):
{{ config( post_hook=dbt_snow_mask.apply_masking_policy('models') ) }}
I can reproduce it at will.
Don't hesitate if you need further details.

Best,
Arnaud

Improvement: Allow for default dbt schema name generation

Currently, specifying a value for the 'common_masking_policy_schema' variable results in dbt using that value, as-is, as the target schema for the masking policies. This is different from the default dbt behaviour of prefixing the custom schema from the dbt_project.yml file with the target schema from the profiles.yml file.

I propose adding a new configuration variable, 'common_masking_policy_dbt_schema_override'. The value would be True/False. The default value would be 'True', which would cause the code to behave as it does currently (preserving backwards compatibility). If the value specified is 'False', then the dbt macro 'generate_schema_name' would be called to calculate the schema name.

I have cloned the repository and tried out changes to support this. For example, the changes to 'create_masking_policy.sql' are:

diff --git a/macros/snow-mask/create-policy/create_masking_policy.sql b/macros/snow-mask/create-policy/create_masking_policy.sql
index 4570393..795af4e 100644
--- a/macros/snow-mask/create-policy/create_masking_policy.sql
+++ b/macros/snow-mask/create-policy/create_masking_policy.sql
@@ -19,14 +19,24 @@
         {%- if (var('use_common_masking_policy_db', 'False')|upper in ['TRUE','YES']) -%}
             {% if (var('common_masking_policy_db') and var('common_masking_policy_schema')) %}
                 {% set masking_policy_db = var('common_masking_policy_db') | string  %}
-                {% set masking_policy_schema = var('common_masking_policy_schema') | string  %}
+
+                {%- if (var('common_masking_policy_dbt_schema_override', 'True')|upper in ['TRUE', 'YES']) -%}
+                    {% set masking_policy_schema = var('common_masking_policy_schema') | string  %}
+                {% else %}
+                    {% set masking_policy_schema = generate_schema_name(var('common_masking_policy_schema') | string, node) %}
+                {% endif %}
             {% endif %}
         {% endif %}

         {# Override the schema name (in the masking_policy_db) when use_common_masking_policy_schema_only flag is set #}
         {%- if (var('use_common_masking_policy_schema_only', 'False')|upper in ['TRUE','YES']) and (var('use_common_masking_policy_db', 'False')|upper in ['FALSE','NO']) -%}
             {% if var('common_masking_policy_schema') %}
-                {% set masking_policy_schema = var('common_masking_policy_schema') | string  %}
+
+                {%- if (var('common_masking_policy_dbt_schema_override', 'True')|upper in ['TRUE', 'YES']) -%}
+                    {% set masking_policy_schema = var('common_masking_policy_schema') | string  %}
+                {% else %}
+                    {% set masking_policy_schema = generate_schema_name(var('common_masking_policy_schema') | string, node) %}
+                {% endif %}
             {% endif %}
         {% endif %}

Thoughts?

I have a branch ready to be committed, if this sounds like something that would be acceptable.

Thanks,

Applying Masking Policy for Sources Results in incorrect or blank model/schema names

The code below in apply_masking_policy_list_for_sources.sql was added in #25 to add logic for a common masking db. It is a copy of the code for applying policies to models (model.database, model.schema), so it needs to be updated to work for source nodes.

{% set masking_policy_db = model.database %}
{% set masking_policy_schema = model.schema %}

{# Override the database and schema name when use common_masking_policy_db flag is set #}
{%- if (var('use_common_masking_policy_db', 'False')|upper == 'TRUE') or (var('use_common_masking_policy_db', 'False')|upper == 'YES') -%}
    {% if var('common_masking_policy_db') and var('common_masking_policy_schema') %}
        {% set masking_policy_db = var('common_masking_policy_db') | string %}
        {% set masking_policy_schema = var('common_masking_policy_schema') | string %}
    {% endif %}
{% endif %}

{% set masking_policy_list_sql %}
    show masking policies in {{masking_policy_db}}.{{masking_policy_schema}};
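
A hedged sketch of what the source-specific version might look like instead, taking the defaults from each source node rather than from model; the surrounding code is assumed:

    {# Hedged sketch: inside the loop over graph.sources.values(), derive defaults from the source node itself #}
    {% for node in graph.sources.values() %}
        {% set masking_policy_db = node.database %}
        {% set masking_policy_schema = node.schema %}
        {# ...then apply the same common_masking_policy_db / common_masking_policy_schema overrides as for models... #}
    {% endfor %}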

Snow-mask version 0.1.8 or 0.1.9 introduces code that attempts to "create schema if not exists" before creating a masking policy; this is a permission we don't want to give to the dbt role

Hello, first of all thank you for making this great package! We've happily been using it in production for quite a while now.

Issue summary

We tried upgrading from version 0.1.7 to 0.1.9 today, however, we experienced a failure when trying to create a new masking policy.

After upgrading to 0.1.9, it seems snow-mask first runs create schema if not exists db_name.schema_name before trying to create the masking policy. However, the Snowflake role that we use to execute DDL does not have permission to create a new schema in this database; we very much do not want dbt creating schemas in our RAW database, which is the database context for this failure. Additionally, we did not see anything in the release notes about this new behavior.

I found this out by looking at the full context in the dbt.log file, below is the relevant excerpt:

12:52:44.927852 [debug] [MainThread]: Creating schema "_ReferenceKey(database='raw', schema='elation_firefly_health', identifier=None)"
12:52:44.932793 [debug] [MainThread]: Using snowflake connection "macro_create_masking_policy"
12:52:44.932913 [debug] [MainThread]: On macro_create_masking_policy: /* {"app": "dbt", "dbt_version": "1.1.0", "profile_name": "default", "target_name": "dev", "connection_name": "macro_create_masking_policy"} */
create schema if not exists raw.elation_firefly_health -- HERE IS THE ISSUE
12:52:44.932981 [debug] [MainThread]: Opening a new connection, currently in state init
12:52:45.697066 [debug] [MainThread]: Snowflake adapter: Snowflake query id: 01a480a4-0504-a4df-0000-98a50fdebaae
12:52:45.697691 [debug] [MainThread]: Snowflake adapter: Snowflake error: 003001 (42501): SQL access control error:
Insufficient privileges to operate on database 'RAW'
12:52:45.698276 [debug] [MainThread]: Snowflake adapter: Error running SQL: macro create_schema
12:52:45.698552 [debug] [MainThread]: Snowflake adapter: Rolling back transaction.
12:52:45.698949 [debug] [MainThread]: Snowflake adapter: Error running SQL: macro create_masking_policy
12:52:45.699234 [debug] [MainThread]: Snowflake adapter: Rolling back transaction.
12:52:45.699601 [debug] [MainThread]: On macro_create_masking_policy: Close
12:52:45.942882 [error] [MainThread]: Encountered an error while running operation: Database Error
  003001 (42501): SQL access control error:
  Insufficient privileges to operate on database 'RAW'
12:52:45.943639 [debug] [MainThread]: 
12:52:45.944479 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1071ddb20>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1071dd9d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1071ddaf0>]}
12:52:46.309796 [debug] [MainThread]: Connection 'macro_create_masking_policy' was properly closed.

I added a comment on the relevant line, see the spot where it says HERE IS THE ISSUE.

Question for the snow mask maintainers

Ideally snow-mask would not try to create a schema; this is pretty surprising behavior. Could this behavior be made configurable?
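
One way it could be made opt-out is to gate the schema creation behind a variable; a hedged sketch, where the variable name create_masking_policy_schema_if_not_exists is hypothetical and not an existing package option:

    {# Hedged sketch: only attempt schema creation when explicitly allowed #}
    {% if var('create_masking_policy_schema_if_not_exists', 'True') | upper in ['TRUE', 'YES'] %}
        {% do run_query("create schema if not exists " ~ masking_policy_db ~ "." ~ masking_policy_schema) %}
    {% endif %}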

Masking policy does not apply for models or snapshots

Hello - I've noticed what I believe to be a bug.

I am running pre- and post-hook create and apply for masking policies. I've found that policies can be created for all of sources, models, and snapshots. However, masking policies are only being applied to sources and are not being applied to models or snapshots.

I believe the problem is in apply_masking_policy_list_for_models.sql, though I'm not actually sure exactly what it is... for some reason this macro does not seem to do anything. My quick fix was to copy/rename apply_masking_policy_list_for_sources.sql and replace graph.sources.values() in line 5 with graph.nodes.values().

This worked like a charm and did not seem to change the functionality. Curious if you have plans to address this bug more officially any time soon? Thank you!
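
For reference, a hedged sketch of the kind of loop a models/snapshots path would need, mirroring the sources macro but iterating graph.nodes; the apply logic itself is elided:

    {# Hedged sketch: iterate models and snapshots instead of sources #}
    {% for node in graph.nodes.values() if node.resource_type in ['model', 'snapshot'] %}
        {% set meta_columns = dbt_snow_mask.get_meta_objects(node.unique_id, 'masking_policy', node.resource_type) %}
        {# ...the same apply logic used for sources would go here, targeting node.database / node.schema... #}
    {% endfor %}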

Feature request: Handle updated policies automatically

Currently, if an existing policy is updated, it needs to be unapplied, deleted, created (or CREATE OR REPLACE ...) and applied again.
This is, in my opinion, a manual process which is cumbersome in production environments.
It would be great if we could find a solution which handles these steps automatically.

Model build time unapply

Currently, the unapply operation in the apply_masking_policy_list_for_models macro loops through the nodes in the graph. Perhaps this could be decoupled so that a pre-hook can unapply the policy for a single model as well.

So there would be two approaches to applying policies:

  • unapply / apply in pre-post hook steps
  • unapply / apply in batch via run-operations or on-run-start/end

Create Masking Policy in a .gitlab-ci.yml File

Hello!

So I am trying to run the dbt 'run-operation' command from a YAML file (.gitlab-ci.yml).

Here is the dbt command:
dbt run-operation create_masking_policy --args '{"resource_type": "models"}'

I would like to place this command in my YAML file between the seed and run commands:
script:

  • export DBT_SNOWFLAKE_DATABASE="ANALYTICS_PRODUCTION"
  • export API_KEY=$DBT_CLOUD_API_KEY
  • dbt deps --profiles-dir . --target prod
  • dbt seed --full-refresh --profiles-dir . --target prod
  • ADD THE COMMAND HERE
  • dbt run --profiles-dir . --target prod --exclude tag:cleaned_snapshots

Do you know if it is possible to incorporate that create_masking_policy command into my YAML file for deployment to our production environment? I keep getting syntax errors when trying to do so...
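
For what it's worth, a hedged sketch of how that step might slot into the script list; keeping the single quotes around the --args JSON intact usually avoids YAML syntax errors, and the targets and paths below simply mirror the surrounding commands:

    script:
      - dbt deps --profiles-dir . --target prod
      - dbt seed --full-refresh --profiles-dir . --target prod
      - dbt run-operation create_masking_policy --args '{"resource_type": "models"}' --profiles-dir . --target prod
      - dbt run --profiles-dir . --target prod --exclude tag:cleaned_snapshots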

Thank you so much, and amazing project btw!
Salah

Improved support for slim CI & defer

I am using dbt-snow-mask as part of a CI/CD pipeline.
We use dbt build with the state:modified and defer flags. We also create a new database per pull request.
We want masking to be applied in the test databases as well.
We also use custom schemas.

This introduces a few issues:

  1. There may be models with masking rules that are not in the test db
  2. There may be models in schemas that do not exist in the test db

The script also does the deployment into production and this also introduces some issues:

  1. Masking policies need to be created automatically
  2. Masking policies must be recreated on future runs in case the policy is redefined
  3. Since recreating a policy requires un-assigning it first, and not all models will run during a deploy, the policies must be reassigned to all models outside of a run

Right now I have solved this via some custom macros, but I think it could be simplified by creating the policies in one schema (e.g. public) and having a run-operation for models to re-apply masking policies. It would also be good to handle sources and models from one run-operation.

These are the steps we run

    {{ log("Creating schemas for masking policies", true) }}
    {% do create_missing_schemas_with_masking_policy() %}

    {{ log("Unapplying masking policies", true) }}
    {% do snow_mask_reapply_policies('unapply') %}

    {{ log("Create masking policies", true) }}
    {% do dbt_snow_mask.create_masking_policy(resource_type='sources') %}
    {% do dbt_snow_mask.create_masking_policy(resource_type='models') %}

    {{ log("Reapply masking policies", true) }}
    {% do snow_mask_reapply_policies('apply') %}

and this is the macro for removing/adding the policies without a dbt run.

{# This macro reapplies masking policies #}
{#
    To run: 
    dbt run-operation snow_mask_reapply_policies
#}

{% macro snow_mask_reapply_policies(operation_type="apply") %}

    {% set schema_list = [] %}
    {% set schemas =  list_schemas() %}
    {% for schema in schemas %}
        {% do schema_list.append(schema['name']) %}
    {% endfor %}

    {% for node in graph.nodes.values() -%}

        {% set database = node.database | string %}
        {% set schema   = node.schema | string | upper %}
        {% set name   = node.name | string %}
        {% set identifier = (node.identifier | default(name, True)) | string %}

        {% set unique_id = node.unique_id | string %}
        {% set resource_type = node.resource_type | string %}
        {% set materialization = node.config.materialized | string %}

        {% set meta_key = 'masking_policy' %}

        {# Only run if there is a schema in the db for this relation #}
        {% set schema_exists = schema in schema_list %}

        {# If other materializations are added that create views or tables, #}
        {# this list will need to update #}
        {% if (schema_exists) and (materialization in ['table', 'view', 'incremental']) %}

            {# override materialization_type for incremental to table #}
            {% if materialization == 'incremental' %}
                {% set materialization = 'table' %}
            {% endif %}
            
            {% set meta_columns = dbt_snow_mask.get_meta_objects(unique_id, meta_key, resource_type) %}

            {% set masking_policy_list_sql %}
                show masking policies in {{database}}.{{schema}};
                select $3||'.'||$4||'.'||$2 as masking_policy from table(result_scan(last_query_id()));
            {% endset %}

            {%- for meta_tuple in meta_columns if meta_columns | length > 0 %}
                {% set column   = meta_tuple[0] %}
                {% set masking_policy_name  = meta_tuple[1] %}

                {% if masking_policy_name is not none %}

                    {% set masking_policy_list = dbt_utils.get_query_results_as_dict(masking_policy_list_sql) %}

                    {% set timestamp = modules.datetime.datetime.now().strftime("%H:%M:%S") ~ " | " %}

                    {% set policies_in_database = masking_policy_list['MASKING_POLICY'] | list %}
                    {% set full_masking_policy_name = database|upper ~ '.' ~ schema|upper ~ '.' ~ masking_policy_name|upper %}
                    {% if full_masking_policy_name not in policies_in_database %}
                        {{ log(timestamp ~ "Missing Masking policy defined for "~ unique_id ~" column: " ~ column,true) }}
                        {{ log(timestamp ~ "Policy Name: " ~ full_masking_policy_name, true) }}
                        {# Force an exit when masking policy is missing #}
                        {{ 0/0 }}
                    {% endif %} 

                    {% for masking_policy_in_db in masking_policy_list['MASKING_POLICY'] %}
            
                        {%- set relation = adapter.get_relation(
                            database=database,
                            schema=schema,
                            identifier=identifier) -%}

                        {% set relation_exists = relation is not none  %}

                        {% if (relation_exists) and (full_masking_policy_name == masking_policy_in_db) %}
                            {{ log(timestamp ~ operation_type ~ "ing masking policy to model : " ~ full_masking_policy_name ~ " on " ~ database ~ '.' ~ schema ~ '.' ~ identifier ~ '.' ~ column, info=True) }}
                            {% set query %}
                                {% if operation_type == "apply" %}
                                    alter {{materialization}}  {{database}}.{{schema}}.{{identifier}} modify column  {{column}} set masking policy  {{database}}.{{schema}}.{{masking_policy_name}}
                                {% elif operation_type == "unapply" %}
                                    alter {{materialization}}  {{database}}.{{schema}}.{{identifier}} modify column  {{column}} unset masking policy
                                {% endif %}
                            {% endset %}

                            {% do run_query(query) %}
                        {% elif (full_masking_policy_name == masking_policy_in_db) %}
                            {{ log(timestamp ~ "Skipping non-existant relation " ~ database ~ '.' ~ schema ~ '.' ~ identifier ~ ' column: ' ~ column, info=True) }}
                        {% endif %}
                    {% endfor %}
                {% endif %}

            {% endfor %}
        {% endif %}

    {% endfor %}

{% endmacro %} 

apply policy to all models

Is there a reason that
dbt run-operation apply_masking_policy --args '{"resource_type": "sources"}'
works for sources but not for models, and that models need to have masking policies applied as a post-hook or on-run-end? I am able to run the above command with a resource_type of models, but it does nothing.

It would be very helpful to have this feature, to mitigate the difficulty of unapplying and reapplying masking policies when they change: running all models can take a long time, but applying masking policies should be very quick.

Feature Request: Make log optional

In our projects we work a lot with the logs, both the log file and stdout.
This log line in particular, even when it only goes to the log file, grows our logs heavily:

{% for node in graph.nodes.values() -%}

    {{ log(modules.datetime.datetime.now().strftime("%H:%M:%S") ~ " | macro - now processing : " ~ node.unique_id | string , info=False) }}

I'm open to any solution, such as making these logs optional via vars, as preferred by entechlog.
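
A minimal sketch of the vars-based approach; the variable name dbt_snow_mask_verbose_logs is hypothetical:

    {# Hedged sketch: only emit the per-node trace when verbose logging is switched on #}
    {% if var('dbt_snow_mask_verbose_logs', 'False') | upper in ['TRUE', 'YES'] %}
        {{ log(modules.datetime.datetime.now().strftime("%H:%M:%S") ~ " | macro - now processing : " ~ node.unique_id | string, info=False) }}
    {% endif %}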

on-run-start with 'models' parameter failed

I tried to put {{ dbt_snow_mask.create_masking_policy('models') }} in on-run-start, however it produces the error message below. Any idea what I did wrong?

dbt=1.7.0

on-run-start:
  "{{ dbt_snow_mask.create_masking_policy('models') }}"

'dict object' has no attribute 'create_masking_policy_mp_encrypt_pii'

When executed manually, it creates the policy without issue.

dbt run-operation create_masking_policy --args "{'resource_type': 'models'}"

Ability to update existing masking functions

Hello!

My team has started using this package for masking and it's been working great. But one thing we ran into is having to manually update the masking functions (alter function statements) if, for example, we need to add an additional role to the masking targets.

Is the only way to do this currently through dbt to unapply all masking, drop the functions, recreate them, and reapply them?
Ideally we wouldn't want to do this because it would expose sensitive data for a short period of time. Am I missing some functionality that already exists?
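
For the specific case of adding a role, Snowflake's ALTER MASKING POLICY ... SET BODY can change a policy definition in place without unsetting it from columns first. A hedged sketch wrapped as a run-operation macro, assuming a policy with a (val string) signature like the examples elsewhere on this page; the names are illustrative:

    {% macro update_mp_email_roles() %}
        {% set query %}
            alter masking policy my_db.my_schema.mp_email set body ->
                case
                    when current_role() in ('ANALYST', 'NEW_ROLE') then val
                    else '***MASKED***'
                end
        {% endset %}
        {% do run_query(query) %}
    {% endmacro %}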

Thanks!

Feature Request: Conditional Masking Policy

Description:

dbt-snow-mask currently supports only simple, static rules based on the masked column (defined in meta) and the user's role.
It would be great to introduce conditional masking policies to allow for more fine-grained access control.

For example, in our use case we couldn't proceed with dbt-snow-mask because our data scientists are divided into two groups with mutually exclusive lists of client applications and their data. We wanted ds1 to have full access to their own PII data but only masked access to ds2's PII data, and vice versa.
With conditional masking policies, this could be possible.

Link to Snowflake Documentation

Proposed Solution:

An example of how DDLs could be declared with the new feature:

{% macro create_masking_policy_mp_pii(node_database, node_schema, masked_column, conditional_columns) %}
    
    CREATE MASKING POLICY IF NOT EXISTS {{node_database}}.{{node_schema}}.mp_{{masked_column}}_pii AS
    (
        {{masked_column}} string, 

        {% for cond_col in conditional_columns %}
        {{cond_col}} string{{ "," if not loop.last }}
        {% endfor %}
    ) RETURNS string ->
        CASE
            WHEN CURRENT_ROLE() IN ('DS1') AND <your-conditions-here> THEN {{masked_column}}
        ELSE '***MASKED***'
        END
{% endmacro %}

An example of how to declare conditional columns in meta:

...
- name: email
  description: User personal email
  meta:
    masking_policy: mp_pii
    conditional_columns:
      - cond_col1
      - cond_col2
      - ...
...

Modify get_meta_objects.sql macro to extract conditional columns.

Modify get_masking_policy_list_for_models.sql and create_masking_policy.sql macros to include the newly extracted parameters.

Modify the apply query in the following way:

alter {{materialization}} {{database}}.{{schema}}.{{alias}} modify column {{column}} set masking policy {{masking_policy_db}}.{{masking_policy_schema}}.{{masking_policy_name}} using ({{column}}, {{conditional_columns}}) {% if var('use_force_applying_masking_policy','False')|upper in ['TRUE','YES'] %} force {% endif %};

Benefits:

Allow companies to enforce more fine-grained access control. This functionality brings together the best of both worlds from Snowflake's column masking policies and RBAC.
The introduction of conditional masking policies would add a lot of flexibility to the dbt-snow-mask tool. Users would be able to define masking policies that are more specific and nuanced than what is currently possible with static rules, which would make the tool more powerful and easier to use in complex data environments.

Challenges:

  • Masking policy naming issues - it would be optimal for every new masking policy to include the table name and column name, to ensure that a new policy does not replace an existing or just-applied policy in Snowflake. I imagine this could create quite a few debugging issues.

Additional Information:

I'd be happy to help implement this.

Thank you for considering this feature suggestion!

Masking policy is not applied

Hi,
I've followed the instructions but am unable to see the masking policy being applied. In my setup, it does the following:

  • Create the masking policy per my macro: mp_email
  • Create the model as a view: ref_contacts

However, the view is never set with the masking policy. I'm not sure what I've done wrong.

Here's the repo with my test models (It's a public one just for testing): https://github.com/dweaver33/snowflake_mask_test

Output of dbt run:

20:12:02  Running with dbt=1.1.0
20:12:02  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 436 macros, 3 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
20:12:02  
20:12:04  
20:12:04  Running 2 on-run-start hooks
20:12:05  1 of 2 START hook: data_pii_testing.on-run-start.0 ............................. [RUN]
20:12:05  1 of 2 OK hook: data_pii_testing.on-run-start.0 ................................ [OK in 0.00s]
20:12:05  2 of 2 START hook: data_pii_testing.on-run-start.1 ............................. [RUN]
20:12:05  2 of 2 OK hook: data_pii_testing.on-run-start.1 ................................ [OK in 0.00s]
20:12:05  
20:12:05  Concurrency: 1 threads (target='dev')
20:12:05  
20:12:05  1 of 1 START view model dev.ref_contacts ....................................... [RUN]
20:12:06  1 of 1 OK created view model dev.ref_contacts .................................. [SUCCESS 1 in 0.68s]
20:12:06  
20:12:06  Running 1 on-run-end hook
20:12:06  1 of 1 START hook: data_pii_testing.on-run-end.0 ............................... [RUN]
20:12:06  1 of 1 OK hook: data_pii_testing.on-run-end.0 .................................. [OK in 0.00s]
20:12:06  
20:12:06  
20:12:06  Finished running 1 view model, 3 hooks in 3.10s.
20:12:06  
20:12:06  Completed successfully
20:12:06  
20:12:06  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Set masking_policy_list should be outside of the loop

In apply_masking_policy_list_for_sources.sql and apply_masking_policy_list_for_models.sql the variable masking_policy_list is set for each column in the model, so if you have 30 masked columns, the same query is run 30 times.
The reason is presumably that you don't want to execute this query if no masking policies are set for the model. But I think it should run once per macro invocation, so the masking_policy_list assignment should be moved a few rows up.

        {% set masking_policy_list_sql %}     
            show masking policies in {{masking_policy_db}}.{{masking_policy_schema}};
            select $3||'.'||$4||'.'||$2 as masking_policy from table(result_scan(last_query_id()));
        {% endset %}

--  {% set masking_policy_list = dbt_utils.get_query_results_as_dict(masking_policy_list_sql) %}  👈 here instead

        {%- for meta_tuple in meta_columns if meta_columns | length > 0 %}
            {% set column   = meta_tuple[0] %}
            {% set masking_policy_name  = meta_tuple[1] %}
                {% if masking_policy_name is not none %}

                {% set masking_policy_list = dbt_utils.get_query_results_as_dict(masking_policy_list_sql) %} -- should be above 👆

In the example shown in the screenshot (omitted here), I have a table with 14 columns, so an unnecessary number of queries is run against the database.

What do you think?

dbt-snow-mask does not appear to be compatible with dbt 1.0.1

Upon running dbt deps I get the following error:

18:39:34  Installing entechlog/dbt_snow_mask
Runtime Error
  Failed to read package: Runtime Error
    This version of dbt is not supported with the 'dbt_snow_mask' package.
      Installed version of dbt: =1.0.1
      Required version of dbt for 'dbt_snow_mask': ['>=0.18.0', '<=0.22.0']
    Check the requirements for the 'dbt_snow_mask' package, or run dbt again with --no-version-check

Is there something specific to dbt 1.0.1 that makes it incompatible?

How would you apply salt?

How would you apply a salt when using a hash function? I guess we could create a separate UDF and put the logic to get the salt there, but it would be nice for the package to address it.
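
A hedged sketch of one option: keep the salt out of the project code and fetch it from a Snowflake secure function, concatenating it inside the policy body before hashing. The function my_db.my_schema.get_salt() is purely illustrative and would have to be created separately:

    {% macro create_masking_policy_mp_hash_email(node_database, node_schema) %}
        create masking policy if not exists {{ node_database }}.{{ node_schema }}.mp_hash_email as (val string)
            returns string ->
                case
                    when current_role() in ('ANALYST') then val
                    -- illustrative: get_salt() would be a secure function returning the salt
                    else sha2(val || my_db.my_schema.get_salt(), 256)
                end
    {% endmacro %}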

Improvement: include source name or model name in post-hook to minimise overhead on model run tasks

#41

I've edited the code to allow the following post-hooks in the project file, so that only the relevant masking policies are processed on a run task.

                +post-hook: "{{ dbt_snow_mask.apply_masking_policy('models','mymodelname')}}" 
                +post-hook: "{{ dbt_snow_mask.apply_masking_policy('sources','mysourcename')}}" 

-- Altered macro: [apply_masking_policy] (50% the old macro)

{% macro apply_masking_policy(resource_type="models",resource_name="undefined",meta_key="masking_policy") %}

    {% if execute %}

        {% if resource_type == "sources" and  resource_name == "undefined" %}
            {{ dbt_snow_mask.apply_masking_policy_list_for_sources(meta_key) }}
        {% elif resource_type == "models" and resource_name == "undefined" %}
            {{ dbt_snow_mask.apply_masking_policy_list_for_models(meta_key) }}           
        {% elif resource_type == "sources" and resource_name != "undefined" %}
            {{ dbt_snow_mask.apply_masking_policy_list_for_onesource(meta_key,resource_name) }}
        {% elif resource_type == "models" and resource_name != "undefined" %}
            {{ dbt_snow_mask.apply_masking_policy_list_for_onemodel(meta_key,resource_name) }}
        {% endif %}
    {% endif %}

{% endmacro %}

-- New macro: [apply_masking_policy_list_for_onesource] (95% the old macro apply_masking_policy_list_for_sources)
--Most important change:

{% macro apply_masking_policy_list_for_onesource(meta_key,resource_name,operation_type="apply") %}

    {% for node in graph.sources.values() -%}
        {%- if node.source_name == resource_name -%}
 [...]

-- New macro: [apply_masking_policy_list_for_onemodel] (95% the old macro apply_masking_policy_list_for_models)
--Most important change:

{% macro apply_masking_policy_list_for_onemodel(meta_key,resource_name,operation_type="apply") %}
{% set resource_name = '/' + resource_name + '/' %}
    {% if operation_type == "apply" %}    
        {% for node in graph.nodes.values() %}
            {%- if resource_name in node.path  -%}          
 [...]

If you need the whole code, I have created a pull request.

Fix masking policy schema only, not db

Hello, thanks for this great package!

I would like to use the use_common_masking_policy_db functionality to fix only the schema, but not the database. For example, my masking policies should always live in the public schema, but I want to maintain separation between my prod and dev environments. Otherwise, as I understand it, testing out a new masking policy definition in dev could update my prod environment!

If my understanding is correct and this change fits with your product vision, I'd be happy to contribute a fork. The line in question is here; it should be a pretty simple change.

Compile masking policies but don't run them

Hello,

I am looking to generate the masking policy SQL in the target folder without actually running it against Snowflake. I would then like to send the SQL via email to our Snowflake admin.

I have tried placing the create and apply macros in a model and compiling the model. However, I don't have the permissions required to create masking policies in Snowflake, so dbt compile returns an insufficient-privileges error.

Is there a way to generate the masking policies as SQL without connecting to Snowflake?

Missing packages.yml file

This project uses a dbt_utils function (get_query_results_as_dict, in the apply_masking_policy_list_for_sources.sql macro) but does not declare dbt_utils as a dependency. If someone uses this package without having independently installed dbt_utils, they will get the error:
'dbt_utils' is undefined

Add a packages.yml file that installs dbt_utils.
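
A hedged example of the kind of entry that would cover this; the version range is illustrative and would need to match the dbt versions the package supports:

    # packages.yml
    packages:
      - package: dbt-labs/dbt_utils
        version: [">=0.8.0", "<2.0.0"]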

Unapply policies should handle whether the materialization exists or not

Is it possible to have the unapply masking policy macros validate whether the source, model, or snapshot exists before trying to remove the mask?

Currently I'm getting the following error

[2023-10-04, 00:13:01 UTC] {subprocess.py:93} INFO - 00:13:01  00:13:01 | unapplying masking policy to model  : MY_DB.MY_SCHEMA.MY_MASK on MY_DB.MY_SCHEMA.MY_MODEL.col1
[2023-10-04, 00:13:02 UTC] {subprocess.py:93} INFO - 00:13:02  Database error while running on-run-start
[2023-10-04, 00:13:02 UTC] {subprocess.py:93} INFO - 00:13:02  Encountered an error:
[2023-10-04, 00:13:02 UTC] {subprocess.py:93} INFO - Database Error
[2023-10-04, 00:13:02 UTC] {subprocess.py:93} INFO -   002003 (42S02): SQL compilation error:
[2023-10-04, 00:13:02 UTC] {subprocess.py:93} INFO -   Table 'MY_DB.MY_SCHEMA.MY_MODEL' does not exist or not authorized.

How to reproduce it:

  1. Create a model file not previously materialized and set the masking_policy meta field.
  2. In dbt_project.yml have the following set up:
on-run-start:
  - "{{ dbt_snow_mask.unapply_masking_policy('sources') }}"
  - "{{ dbt_snow_mask.unapply_masking_policy('models') }}"
  - "{{ dbt_snow_mask.create_masking_policy('sources') }}"
  - "{{ dbt_snow_mask.create_masking_policy('models') }}"
  - "{{ dbt_snow_mask.apply_masking_policy('sources') }}"
 
models:
  +post-hook:
    - "{{ dbt_snow_mask.apply_masking_policy('models') }}"

  3. Run the dbt model

We unset, recreate, and reset the masking policies so that changes to the policy body are handled transparently.

This is how our create_masking_policy_my_mask.sql looks:

{% macro create_masking_policy_my_mask(node_database,node_schema) %}

CREATE OR REPLACE MASKING POLICY {{node_database}}.{{node_schema}}.my_mask AS (val string)
...

{% endmacro %}

Also, let me know if this is the intended way to manage the lifecycle of the policies using this package.
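
A hedged sketch of the kind of guard the unapply path could use, reusing dbt's adapter.get_relation (already used by other macros quoted on this page); the surrounding variables are assumed to be set by the macro:

    {# Hedged sketch: skip the unset when the relation has not been materialized yet #}
    {% set relation = adapter.get_relation(database=database, schema=schema, identifier=alias) %}
    {% if relation is none %}
        {{ log("Skipping unapply for " ~ database ~ "." ~ schema ~ "." ~ alias ~ " because it does not exist yet", info=True) }}
    {% else %}
        {% do run_query("alter " ~ materialization ~ " " ~ database ~ "." ~ schema ~ "." ~ alias ~ " modify column " ~ column ~ " unset masking policy") %}
    {% endif %}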

Thanks in advance.

Conditional Masking Policy does not need `{{masked_column}}` in its body

Description:

After introducing #62 and using the change for a while, I discovered that masking policy bodies don't really need the specific column name.

You can quite literally name it whatever you like, as long as you pass a proper column name in the ALTER TABLE ... ALTER COLUMN ... SET MASKING POLICY statement.

Example:

I create a masking policy called default_string_mp_pii for the column email. Its signature, visible after executing DESCRIBE MASKING POLICY default_string_mp_pii, is then seen as containing (EMAIL STRING, ...).

When I then create or replace the exact same masking policy on the column vulnerable_last_name, the previous signature is lost, replaced by a new signature containing (VULNERABLE_LAST_NAME STRING, ...).

This Jinja part is not needed at all, and I feel it should be removed - relying instead on a generic "variable"-style name that stays the same regardless of the column the masking policy is applied to.

Proposed Solution:

  • Change {{masked_column}} to masked_column in masking policy DDL. This will prevent dbt from overwriting this parameter on each call.
  • remove the use of conditionally_masked_column in create_masking_policy.sql. It is not needed.
  • Remove the setting of conditionally_masked_column in get_masking_policy_list_for.... It is not needed.

Benefits:

Challenges:

  • Removing the parameter might break compatibility, but as far as I remember, I made it an optional parameter...

Additional Information:

When I find some time, I can fix it.

Insufficient privilege to create a masking policy

Hello,

I think there is some sort of bug in dbt when configuring the dbt_project.yml file of "dbt_snow_mask".
I tried to configure dbt_project.yml (e.g.
name: 'dbt_snow_mask'
version: '1.0.0'

require-dbt-version: [">=0.18.0"]
config-version: 2

target-path: "target"
clean-targets: ["target", "dbt_modules"]
macro-paths: ["macros"]
log-path: "logs"

vars:
  use_common_masking_policy_db: "True"
  common_masking_policy_db: "MY_DB"
  common_masking_policy_schema: "MY_SCHEMA"
  use_force_applying_masking_policy: "True"
)
to force dbt to create masking policies in a dedicated database and schema in our Snowflake instance, but I get the error "Insufficient privileges to operate on database 'MY_DB'". However, when I use the same role as my dbt user directly in Snowflake, I have no problem creating or applying masking policies in that same location.

Please advise.

Applying policy on tables

We currently have dbt_snow_mask configured in our dbt_project.yml to create masking policies for all models on run start and to apply policies as a post-hook. When materializing a model as a table that references columns from an upstream model with a masking policy applied to it, we get the following database error.

String '357715109e6a2d0c23343a5dda82be162326e4715ec69bb8f24755848739d910' is too long and would be truncated

dbt_project.yml

on-run-start:
  - "{{ dbt_snow_mask.create_masking_policy('models')}}"

vars:
  use_common_masking_policy_schema_only: "True"
  common_masking_policy_schema: "PUBLIC"

models:
  +post-hook:
    - "{{ dbt_snow_mask.apply_masking_policy('models') }}"
