Comments (3)
Hi @Macklon, thanks for opening an issue. This is a very frequent question, and one that we have also asked ourselves a couple of times.
Why bootstrapping?
Bootstrapping is not automated as part of environment creation because keeping it separate gives us a step that requires manual action and "approves" the trust between data.all and the onboarded account. We want to keep this explicit approval, which validates that an account can be accessed by data.all.
Having said that, customers typically automate the bootstrapping of their accounts through different processes. Because each process is very particular, in the open-source repo we keep only the minimum implementation, which is manual bootstrapping.
Ideation
We are open to ideas to make bootstrapping more streamlined. Reviewing your ideation proposal, the first thoughts that come to mind are:
- As a general guideline, it is better to use IAM roles than IAM users. If the credentials get compromised, the temporary credentials of a role reduce the blast radius.
- Who is going to store the SSM parameter? It looks like the environment AWS account admins will depend on the central data.all team to link an Environment if an SSM parameter needs to be created every time.
- What are the IAM policies of the role going to look like, and how do we ensure that the central account does not get too much access? If the IAM user has broad permissions, or someone in the environment makes a mistake and opens up the role permissions, data.all would have "unlimited" access to the account.
As a suggestion, I would avoid the security trouble of storing credentials, especially the permanent credentials of IAM users. If possible, I would also avoid having environment users create SSM parameters or any other resource in the deployment account, or depend on an admin team that manages SSM parameters for them, because that can become a bottleneck.
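To illustrate the role-based approach and the question of limiting access, here is a minimal sketch of a trust policy for a role in the environment account that only the central account can assume. The helper name `build_trust_policy`, the account ID, and the external ID are hypothetical placeholders; this is not the policy that data.all itself creates.

```python
import json


def build_trust_policy(central_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets one central account assume a role.

    Hypothetical helper for illustration only. The permissions granted to the
    role are a separate policy and should be scoped as narrowly as possible.
    """
    if not (central_account_id.isdigit() and len(central_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only principals in the central account may assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{central_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID must match on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values, not real accounts.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

With a role like this, the central account obtains short-lived credentials through `sts:AssumeRole` instead of storing long-lived IAM user keys.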
Here are some alternatives that I could brainstorm:
- CI/CD pipeline with base infrastructure - most customers already have some sort of base infra deployed in accounts from a tooling account. They would add the CDKToolkit with the necessary parameters as part of their CI/CD base infra process.
- Automation script - This is a little outdated, because data.all onboarding has been simplified a lot over the years. In early projects some customers used a script to run several actions on the onboarded account, including bootstrapping.
- AWS Organizations - I think it is not your use case, but a very cool feature of AWS Organizations is CloudFormation StackSets. We could use them to deploy the CDKToolkit to all accounts in the Organization.
- Let the customer enter temporary AWS credentials in the UI at the time they are trying to link the environment, and let the data.all backend execute the bootstrap API with those credentials. We need to evaluate whether that implies any security issue.
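The CI/CD pipeline and automation-script alternatives both come down to running `cdk bootstrap` once per environment account, with the central account as the trusted account. Below is a minimal sketch that only builds the command lines. The account IDs, region, and execution policy ARN are placeholders, the helper name is hypothetical, and the exact flags data.all expects should be taken from its documentation.

```python
import shlex
from typing import List


def bootstrap_command(
    env_account: str,
    region: str,
    trusted_account: str,
    execution_policy_arn: str,
) -> List[str]:
    """Build the `cdk bootstrap` argv for one environment account.

    Hypothetical helper: it does not run anything, it only assembles the
    command so a pipeline or script can execute it with its own credentials.
    """
    return [
        "cdk",
        "bootstrap",
        # Account allowed to deploy into the bootstrapped environment.
        "--trust",
        trusted_account,
        # Managed policy that CloudFormation uses when deploying stacks.
        "--cloudformation-execution-policies",
        execution_policy_arn,
        f"aws://{env_account}/{region}",
    ]


if __name__ == "__main__":
    # Placeholder accounts and policy, not real values.
    for account in ["111122223333", "777788889999"]:
        cmd = bootstrap_command(
            account,
            "eu-west-1",
            "444455556666",
            f"arn:aws:iam::{account}:policy/ExampleCdkExecutionPolicy",
        )
        print(shlex.join(cmd))
```

A pipeline could pass each list to `subprocess.run(cmd, check=True)` while running with credentials for the target account. The manual approval step discussed above then has to happen elsewhere in the process.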
We will think about more alternatives internally, to see if we can come up with solutions that make this process easier.
from dataall.
FYI, environment accounts are created and owned by us when onboarding customers. After onboarding, data and access are owned by the customer.
I think the AWS StackSets solution is very neat if the user is already using Organizations. It would also allow us to deploy other stacks in the accounts if need be, so it is useful for more than bootstrapping. Having said that, I am not sure it is worth enforcing the use of Organizations in data.all just for this; we need to understand the implications better.