Comments (5)
I'd define the flags as confidentiality_dropdown and topics_dropdown. That aligns with the other feature flag, glue_crawler. IMO these are "features" and we enable/disable them. preview_data should be renamed to data_preview.
I would rename custom_confidentiality_list to custom_confidentiality_values and make it a dict like {"Custom_Secret": {"hide_schema": true, "hide_preview": true}} or, even simpler, {"Custom_Secret": "secret", "Custom_Secret2": "public"}.
For the catalog, make sure to hide topics or classification from search if those features are disabled. Then it shouldn't affect search.
from dataall.
EDIT
Changes required for this feature enhancement
Frontend Changes and Config Additions
- Add feature flags for topics and confidentiality to the config so they can be enabled/disabled in the UI:
  "datasets": {
      "active": true,
      "features": {
          ...,
          "preview_data": false,
          "glue_crawler": false,
          "confidentiality_dropdown": true,
          "custom_confidentiality_mapping": {
              "Public": "Unclassified",
              "Custom Confidentiality": "Official",
              "Custom Confidential": "Secret",
              "Another Confidentiality": "Official"
          },
          "topics_dropdown": false
      }
  },
- Add a custom list of confidentiality levels and keep the defaults in constants (frontend/src/modules/constants.js).
- Make the UI in DatasetCreateForm.js, DatasetEditForm.js, DatasetImportForm.js and DataGovernance.js render topics and confidentiality conditionally, based on the feature flags.
- Change the Catalog.js view to display topics and confidentiality based on the config.
Backend Changes
Confidentiality and topics are present as enums in the backend (e.g. dataall/modules/datasets_base/db/enums.py and dataall/modules/datasets/api/dataset/enums.py). They are used to validate inputs at the GraphQL level when a dataset is created. These enums are also used for filtering and in conditions in various dataset-related functions.
- Modify the enum classes to include the custom config's list, or create a new class and use it as the custom config's enum. For this I am thinking of making the ConfidentialityClassification enum take the custom values as class variables if they are present in config.json, and otherwise default to the three standard confidentiality levels. The same approach could be applied to topics if custom topics are present.
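As a rough illustration of the enum idea above, here is a minimal sketch, assuming the config has already been loaded into a dict shaped like the "datasets" block shown earlier. The helper name build_confidentiality_enum and the underscore-based member naming are hypothetical and are not taken from the data.all code base.

```python
from enum import Enum

# The three standard levels named in this issue.
DEFAULT_LEVELS = ["Unclassified", "Official", "Secret"]


def build_confidentiality_enum(config):
    """Build a ConfidentialityClassification enum from a config dict.

    Hypothetical helper: uses the keys of custom_confidentiality_mapping
    when present, otherwise falls back to the standard levels.
    """
    features = config.get("datasets", {}).get("features", {})
    mapping = features.get("custom_confidentiality_mapping") or {}
    names = list(mapping) if mapping else DEFAULT_LEVELS
    # Member names are made identifier-friendly; values keep the display text.
    members = {name.replace(" ", "_"): name for name in names}
    return Enum("ConfidentialityClassification", members)


config = {
    "datasets": {
        "features": {
            "custom_confidentiality_mapping": {
                "Public": "Unclassified",
                "Custom Confidential": "Secret",
            }
        }
    }
}
CustomLevels = build_confidentiality_enum(config)
DefaultLevels = build_confidentiality_enum({})
```

With the sample config, CustomLevels has the members Public and Custom_Confidential, while DefaultLevels falls back to the three standard levels. Validation code could then compare inputs against the enum values either way.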
Question - The default confidentiality levels (i.e. Unclassified, Secret, Official) are used in the code for filtering and in some conditions, so the new custom list has to work with them. One way I can think of is to have a map in the custom config that specifies which standard level (Unclassified, Secret, etc.) each custom confidentiality level corresponds to, and then translate the custom level wherever conditions rely on the standard levels. See the _check_preview_permissions_if_needed function in dataall/modules/datasets/services/dataset_profiling_service.py for an example.
OR change how confidentiality levels are interpreted and come up with more generic logic that isn't tightly bound to the standard confidentiality levels. @dlpzx , @noah-paige , @zsaltys could you please let me know your thoughts on this.
EDIT - Going forward with mapping the custom confidentiality levels to the existing confidentiality levels.
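The mapping approach could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, standard level names are assumed to remain valid alongside custom ones, and the preview check is assumed to only distinguish Unclassified from everything else. The actual logic in _check_preview_permissions_if_needed may differ.

```python
STANDARD_LEVELS = {"Unclassified", "Official", "Secret"}


def to_standard_level(confidentiality, custom_mapping=None):
    """Translate a (possibly custom) confidentiality level to a standard one.

    Hypothetical helper: custom_mapping has the shape of
    custom_confidentiality_mapping from the config above.
    """
    if confidentiality in STANDARD_LEVELS:
        return confidentiality
    mapped = (custom_mapping or {}).get(confidentiality)
    if mapped not in STANDARD_LEVELS:
        raise ValueError(f"Unknown confidentiality level: {confidentiality!r}")
    return mapped


def preview_needs_permission_check(confidentiality, custom_mapping=None):
    # Assumption: only Unclassified data can be previewed without a check.
    return to_standard_level(confidentiality, custom_mapping) != "Unclassified"


mapping = {
    "Public": "Unclassified",
    "Custom Confidentiality": "Official",
    "Custom Confidential": "Secret",
}
public_needs_check = preview_needs_permission_check("Public", mapping)
secret_needs_check = preview_needs_permission_check("Custom Confidential", mapping)
```

Keeping the translation in one function means existing conditions written against Unclassified, Official and Secret would not need to know about custom names.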
Other questions to answer and clarify
- How does this change affect indexing in OpenSearch? -> After checking the code, I found that the dataset indexes get updated automatically. If the user updates datasets with new confidentiality levels, the index will update and the change should be reflected in the catalog.
- Does hiding search selectables like topics and classification (confidentiality levels) create issues with Catalog search? -> This doesn't cause issues.
- Check whether topics and confidentiality are used in other modules like Glossaries, Quicksight Dashboard, etc. @dlpzx , @noah-paige -> Did not find any in my testing.
from dataall.
Hi @zsaltys , thanks for the suggestions. I have edited the design / code change document on this issue.
from dataall.
We are working offline with @TejasRGitHub on the implementation of this issue.
from dataall.
Completed as part of #1049
from dataall.