aws-samples / aws-cloudhsm-cloudformation-template

Automated deployment of AWS CloudHSM resources using AWS CloudFormation
License: MIT No Attribution
I really appreciate this template. Since AL2 is going EOL in 2025, it would be nice to see AL2023 support added for the EC2 instance.
Although the documentation states that Internet connectivity is required to deploy the stack, it can be changed so that the stack works without this requirement. The majority of the software (with the exception of AWS CLI v2) is hosted on S3, so a private VPC with an S3 VPC endpoint works just fine (if one comments out the forced update of AWS CLI v2). I've raised a corresponding feature request in the AWS CLI v2 repo to reconsider moving the release hosting location to S3 as well.
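For what it's worth, a minimal sketch of the endpoint setup that makes this work (the VPC ID, route table ID, and Region below are placeholders, not values from this template):

```sh
# A private VPC with an S3 gateway endpoint is enough to reach the S3-hosted packages.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0
```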
Given that the auxiliary EC2 instance is spawned from Amazon Linux 2, which already has the AWS CLI installed, do we really need to forcibly re-install the package?
Update the automation to download and use v5 of the CloudHSM client. Currently, v3 is used.
Examples of changes to the CLI:
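For illustration, the download step might change roughly as follows (the EL7 "latest" URLs follow the pattern in the AWS CloudHSM user guide; treat the exact file names as assumptions):

```sh
# Before (v3 client, as currently used by the automation):
wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-latest.el7.x86_64.rpm
sudo yum install -y ./cloudhsm-client-latest.el7.x86_64.rpm

# After (v5 CloudHSM CLI):
wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-cli-latest.el7.x86_64.rpm
sudo yum install -y ./cloudhsm-cli-latest.el7.x86_64.rpm
```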
The current AWS CloudHSM documentation describes the cluster activation process as follows:
https://docs.aws.amazon.com/cloudhsm/latest/userguide/activate-cluster.html
Note the use of the `cluster activate` command.
The current implementation still works, but it uses a slightly different, and perhaps outdated, means of triggering cluster activation: it sets the Crypto Officer (CO) password to an initial value to trigger the activation.
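A sketch of activation using the v5 CLI, per the linked docs (the interactive steps are shown as comments):

```sh
sudo /opt/cloudhsm/bin/cloudhsm-cli interactive
# Then, at the interactive prompt:
#   aws-cloudhsm > cluster activate
# The CLI prompts you to set the initial admin (CO) password.
```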
When you choose to create a custom key store, the stack gets stuck for more than 1.5 hours and then rolls back. The custom key store gets created but fails to connect to the HSM cluster with the error "KMS cannot connect the custom key store to its CloudHSM cluster. Error code: USER_NOT_FOUND". I assume `kmsuser` is not getting configured correctly.
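For comparison, a hedged sketch of how `kmsuser` is typically created with the v5 CLI before connecting the key store (per the AWS docs; whether the template's automation performs this step correctly is the question here):

```sh
/opt/cloudhsm/bin/cloudhsm-cli interactive
# At the interactive prompt (admin credentials are placeholders):
#   aws-cloudhsm > login --username admin --role admin
#   aws-cloudhsm > user create --username kmsuser --role crypto-user
```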
Expand the scope of cluster creation orchestration to the create cluster state machine so that a single end-to-end workflow represents the creation process. Doing so will help support external cluster cert signing processes.
Consider using AWS Systems Manager automation tasks integrated with the Step Functions create cluster state machine to carry out only those operations needed on the initial EC2 client.
Rework the custom resource and the `oClusterInfo`/`oCloudHsmKeyStoreId` outputs to be more aligned with the format of properties of other resources that have IDs, e.g. EC2 instances.
Support use of distinct KMS keys for resources
Add inline comments around the egress rule for port 443 to explain its purposes (AWS package downloads, connectivity to AWS Systems Manager and other AWS services; be specific). Also highlight that, in a more formal implementation, the CIDR should be changed from `0.0.0.0/0` to something more specific that aligns with how connectivity to such endpoints is established.
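Something along these lines, where the resource and parameter names are hypothetical, not the template's actual identifiers:

```yaml
rClientSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: CloudHSM EC2 client security group
    VpcId: !Ref pVpcId
    SecurityGroupEgress:
      # HTTPS egress used for AWS package downloads and for connectivity to
      # AWS Systems Manager and other AWS service endpoints. In a more formal
      # implementation, replace 0.0.0.0/0 with CIDRs that match how these
      # endpoints are actually reached (for example, VPC endpoint CIDRs).
      - IpProtocol: tcp
        FromPort: 443
        ToPort: 443
        CidrIp: 0.0.0.0/0
```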
Implement some degree of test automation so that the effort required to validate changes is greatly reduced. See the `TESTING.md` file for an initial set of test cases to consider automating.
When `cloudhsm-cli` is selected for installation during a create stack operation, using `ssm-user` to execute the CLI manually after stack creation results in warning messages and stdout/stderr content from the CLI being displayed; for example, when a user uses AWS Systems Manager Session Manager to access the EC2 client and executes the `cloudhsm-cli` command, in which case the user is `ssm-user`.
You can still use the CLI, but the output messages are annoying.
```
sh-4.2$ /opt/cloudhsm/bin/cloudhsm-cli interactive
thread 'CloudHSM Worker' panicked at 'failed to create appender: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.2/src/rolling.rs:499:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error writing to log file. Falling back to standard error.
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] Adding HSM connection to connection pool: HsmConnection { hsm_info: HSM { IP: "10.4.12.221", Port: 2223 } }
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] Adding HSM connection to connection pool: HsmConnection { hsm_info: HSM { IP: "10.4.19.44", Port: 2223 } }
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.12.221:2223 is connecting
2023-04-27T18:33:55.159Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_connection::common] Initializing new connection: HSM { IP: "10.4.12.221", Port: 2223 }
2023-04-27T18:33:55.160Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.19.44:2223 is connecting
2023-04-27T18:33:55.165Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_connection::common] Initializing new connection: HSM { IP: "10.4.19.44", Port: 2223 }
2023-04-27T18:33:55.216Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_properties] Version handshake with server succeeded. Received version: ComponentVersion { major: 2, minor: 8 }
2023-04-27T18:33:55.216Z INFO [793] ThreadId(1) [hsm1_marshaling::server_handshake] Reporting sdk version CLI:5.8.0-el7:CodeBuildBatchProject-uFu5sNXfquqK:1ce78aba-ddf5-4c08-aaab-3d9eda62e152
2023-04-27T18:33:55.217Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_properties] Version handshake with server succeeded. Received version: ComponentVersion { major: 2, minor: 8 }
2023-04-27T18:33:55.217Z INFO [793] ThreadId(1) [hsm1_marshaling::server_handshake] Reporting sdk version CLI:5.8.0-el7:CodeBuildBatchProject-uFu5sNXfquqK:1ce78aba-ddf5-4c08-aaab-3d9eda62e152
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] Current cluster version is 0; incoming cluster version is 199391178
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] HSMs to be added: {HSM { IP: "10.4.19.44", Port: 2223 }, HSM { IP: "10.4.12.221", Port: 2223 }}
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] HSMs to be removed: {}
2023-04-27T18:33:55.311Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] Updating the state of HSM 10.4.19.44:2223
2023-04-27T18:33:55.311Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.19.44:2223 is connected and ready
2023-04-27T18:33:55.317Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] Updating the state of HSM 10.4.12.221:2223
2023-04-27T18:33:55.318Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.12.221:2223 is connected and ready
2023-04-27T18:33:55.320Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] HSM Connection already in pool 10.4.19.44:2223
aws-cloudhsm > 2023-04-27T18:33:55.320Z INFO [793] ThreadId(3) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] Current cluster version is 199391178; incoming cluster version is 199391178
```
The issue is likely due to the activate operation being carried out as the root user, so the underlying CLI log file is created with the root user's ID and group ID. Subsequent attempts by non-root users to execute the CLI result in the warning message and log output being written to the terminal.
Prior to running the command as the `ssm-user`, the `run/` directory looked like this:
```
sh-4.2$ ls -alR /opt/cloudhsm/run
/opt/cloudhsm/run:
total 4
drwxrwxrwt 2 root root 41 May 25 18:49 .
drwxr-xr-x 7 root root 61 May 25 18:33 ..
-rw-r--r-- 1 root root 3193 May 25 18:49 cloudhsm-cli.log.2023-05-25
```
Two methods to work around this:
1. Reinstall the `cloudhsm-cli` package. On a suitable Linux instance, remove and reinstall the `cloudhsm-cli` package, then run `/opt/cloudhsm/bin/cloudhsm-cli interactive` again.
2. Don't select `cloudhsm-cli` for installation at stack creation; instead, install and run `cloudhsm-cli` as the `ssm-user`.
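For reference, a minimal sketch of directly fixing the log ownership described above (assumes relaxing ownership of the CLI log to `ssm-user` is acceptable in your environment):

```sh
# Hand the root-owned CLI log to ssm-user so the log appender can open it.
sudo chown ssm-user:ssm-user /opt/cloudhsm/run/cloudhsm-cli.log.*
```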
Add support for being able to create a cluster from a backup and automatically reconnect an existing disconnected key store that was associated with the cluster from which the backup was made.
The template currently supports creation of a new CloudHSM cluster from a backup when `pStackScope` is set to `cluster-and-client-only`, but support has not been added to enable the same operation when the scope is set to `with-custom-key-store`. Currently, under that condition, the template will create a new custom key store and connect it to the newly instantiated cluster. The desire is to at least have an option to attempt to reconnect an existing key store to the new cluster.
Update the template with the currently supported set of AWS Regions, given that `ap-northeast-3` and `sa-east-1` now support CloudHSM while the `cn-*` Regions do not.
In this section of the doc:
Make clear that the customer CA cert associated with the cluster from which the backup was taken must exist in Secrets Manager under the name `/{system_id}/{backup_cluster_id}/customer-ca-cert`, where `{system_id}` is the value of the `pSystem` parameter used for both the stack associated with the original cluster from which the backup was taken and the new cluster to be created from the backup. Also highlight that the `pSystem` parameter value for both stacks needs to be the same.
The original customer CA cert is used during the process of creating a new cluster from a backup to configure the EC2 client with the proper CA cert so that the CloudHSM client tools can interact with HSMs in the newly created cluster.
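For example, a hedged sketch of seeding that secret (the segments in braces are the placeholders described above, and the cert file name is illustrative):

```sh
# Store the original cluster's customer CA cert under the expected secret name.
aws secretsmanager create-secret \
  --name "/{system_id}/{backup_cluster_id}/customer-ca-cert" \
  --secret-string file://customerCA.crt
```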
Hi, since this is a CloudFormation template, it would be nice to have the template URL associated with the current commit SHA for easier deployment.
The reason is that `cloudhsm.yaml` is larger than 50 kB, resulting in: `at 'templateBody' failed to satisfy constraint: Member must have length less than or equal to 51200`.
Enable users to use their own cluster cert signing process as an option. Continue with the default behavior of using a self-signed CA cert and automating the process of signing the cluster cert.
Since CloudHSM SDK v3 provides key management commands that are not in SDK v5, add an option to support installing v3 in place of v5. Continue to default to using v5.
While making this change, determine if there's a "latest" reference for the v5 SDK so that the example can avoid pinning to a potentially old version of v5.
Currently it's set to `0.0.0.0/0`, but it can be tightened to a CIDR within the VPC in which the HSM ENIs are being created, e.g. `pHsmsVpcCidr`.
In Regions where more than 3 AZs exist, CloudHSM might not be supported in all AZs. Currently, the template may encounter an unsupported AZ either during creation of the first HSM or during creation of subsequent HSMs. In the latter case, there can be a delay of minutes before it is recognized that an unsupported subnet/AZ has been specified.
Ideally, the template would check the specified AZs up front and issue an error prior to creation of the first HSM when an unsupported subnet/AZ has been specified.
See Editing CloudHSM key store
Implement an update state machine to support:
- updating the `kmsuser` password in support of common operations

In the spirit of the blog post "How to lower costs by automatically deleting and recreating HSMs", add support for removing all HSMs from a cluster and adding HSMs back later on.
Consider changing the default value of the `pStackScope` parameter from the current default of `with-custom-key-store` to `cluster-and-client-only`. The goal of this change is to make the simpler, faster-to-achieve configuration the default.
The template would still default to 2 HSMs so that it demonstrates the best practice of deploying HSMs to at least 2 AZs.
For example, enable users of the custom resource to get the list of HSM IP addresses via `!GetAtt`.
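Something like the following, where the logical ID `rCloudHsmCluster` and the attribute `HsmIpAddresses` are hypothetical names for the reworked resource, not its current interface:

```yaml
Outputs:
  oHsmIpAddresses:
    Description: Comma-separated HSM ENI IP addresses
    Value: !Join [",", !GetAtt rCloudHsmCluster.HsmIpAddresses]
```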
Although the current template gets the job done, it would benefit from being reengineered into a set of proper, modular CloudFormation custom resources/extensions that could be published to a private registry and easily consumed by other templates.
Consider enhancing the EC2 client automation to use an Auto Scaling group with a desired capacity of 1 to help enhance the resiliency of the client. Balance this potential enhancement against the option to power off the client when it is not needed.
HSMs can end up being replaced by AWS due to internal failures and other circumstances. When the first HSM in a cluster gets replaced prior to the cluster being initialized, the cluster private key, and consequently the CSR, are also replaced. This means that the external/BYO PKI process needs to be restarted to use the new CSR. The longer the BYO PKI process takes, the greater the exposure of the initial HSM to being replaced.
This issue calls for the IaC to be enhanced to help minimize the impact of this situation.
During a stack delete operation, detect whether zero HSMs exist and, if so, skip the 30-second wait before checking the state of HSMs. For example, if an error occurs during stack creation before any HSMs have been created, a delete operation will currently still wait 30 seconds before checking the state of HSMs in the cluster and recognizing that none exist.
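A sketch of the kind of check that could short-circuit the wait (the cluster ID is a placeholder, and `length()` on a missing `Hsms` list would need guarding in real code):

```sh
count=$(aws cloudhsmv2 describe-clusters \
  --filters clusterIds=cluster-quhwuyosn7k \
  --query 'length(Clusters[0].Hsms)' --output text)
if [ "$count" = "0" ]; then
  echo "No HSMs exist; skipping the 30-second wait."
else
  sleep 30   # existing behavior: wait before polling HSM state
fi
```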
Address the scenario in which an HSM enters the "degraded" state during creation of the HSM.
During testing of our IaC, we have seen cases in which, during cluster creation, the first HSM to be created doesn't enter the `ACTIVE` state but instead enters a degraded state. With the current code, this state is not caught. Eventually, the create operation times out and an auto rollback of the stack is attempted.
When additional HSMs are created beyond the first HSM, any of the create actions could result in an HSM entering the degraded state.
An HSM in a degraded state can be deleted; i.e., the HSM won't automatically transition to an `ACTIVE` state later of its own accord.
This is what the state of such an HSM looks like:
"Hsms": [
{
"AvailabilityZone": "us-east-2a",
"ClusterId": "cluster-quhwuyosn7k",
"SubnetId": "subnet-04a76758b58c05023",
"EniId": "eni-09422fd6133d93b07",
"EniIp": "10.4.14.83",
"HsmId": "hsm-n5yjka6nfos",
"State": "DEGRADED",
"StateMessage": "HSM creation failed. Please delete this HSM and try again."
}
],
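A hedged sketch of detecting and cleaning up such an HSM so the create can be retried (the cluster ID is a placeholder):

```sh
hsm_id=$(aws cloudhsmv2 describe-clusters \
  --filters clusterIds=cluster-quhwuyosn7k \
  --query "Clusters[0].Hsms[?State=='DEGRADED'].HsmId | [0]" --output text)
if [ -n "$hsm_id" ] && [ "$hsm_id" != "None" ]; then
  # Per the StateMessage above: delete the degraded HSM and try again.
  aws cloudhsmv2 delete-hsm --cluster-id cluster-quhwuyosn7k --hsm-id "$hsm_id"
fi
```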