aws-samples / aws-cloudhsm-cloudformation-template

Automated deployment of AWS CloudHSM resources using AWS CloudFormation
License: MIT No Attribution
I really appreciate this template. Since AL2 is going EOL in 2025, it would be nice to see AL2023 support added for the EC2 instance.
Although the documentation states that Internet connectivity is required to deploy the stack, it can be changed so that the stack works without this requirement. The majority of the software (with the exception of AWS CLI v2) is hosted on S3, so a private VPC with an S3 VPC endpoint works just fine (if one comments out the forced update of AWS CLI v2). I've raised a corresponding feature request in the AWS CLI v2 repo to reconsider moving the release hosting location to S3 as well.
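For what it's worth, a minimal sketch of the endpoint setup that makes this work (the VPC ID, route table ID, and Region below are placeholders, not values from this template):

```sh
# A private VPC with an S3 gateway endpoint is enough to reach the S3-hosted packages.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0
```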
Given that the auxiliary EC2 instance is spawned from Amazon Linux 2, which already has the AWS CLI installed, do we really need to forcibly re-install the package?
Update the automation to download and use v5 of the CloudHSM client. Currently, v3 is used.
Examples of changes to the CLI:
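For illustration, the download step might change roughly as follows (the EL7 "latest" URLs follow the pattern in the AWS CloudHSM user guide; treat the exact file names as assumptions):

```sh
# Before (v3 client, as currently used by the automation):
wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-latest.el7.x86_64.rpm
sudo yum install -y ./cloudhsm-client-latest.el7.x86_64.rpm

# After (v5 CloudHSM CLI):
wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-cli-latest.el7.x86_64.rpm
sudo yum install -y ./cloudhsm-cli-latest.el7.x86_64.rpm
```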
The current AWS CloudHSM documentation describes the cluster activation process as follows:
https://docs.aws.amazon.com/cloudhsm/latest/userguide/activate-cluster.html
Note the use of the `cluster activate` command.
The current implementation still works, but it uses a slightly different, and perhaps outdated, means of triggering cluster activation: it sets the Crypto Officer (CO) password to an initial value to trigger the activation.
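A sketch of activation using the v5 CLI, per the linked docs (the interactive steps are shown as comments):

```sh
sudo /opt/cloudhsm/bin/cloudhsm-cli interactive
# Then, at the interactive prompt:
#   aws-cloudhsm > cluster activate
# The CLI prompts you to set the initial admin (CO) password.
```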
When you choose to create a custom key store, the stack gets stuck for more than 1.5 hours and then rolls back. The custom key store gets created but fails to connect to the HSM cluster with the error "KMS cannot connect the custom key store to its CloudHSM cluster. Error code: USER_NOT_FOUND". I assume `kmsuser` is not getting configured correctly.
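For comparison, a hedged sketch of how `kmsuser` is typically created with the v5 CLI before connecting the key store (per the AWS docs; whether the template's automation performs this step correctly is the question here):

```sh
/opt/cloudhsm/bin/cloudhsm-cli interactive
# At the interactive prompt (admin credentials are placeholders):
#   aws-cloudhsm > login --username admin --role admin
#   aws-cloudhsm > user create --username kmsuser --role crypto-user
```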
Expand the scope of cluster creation orchestration to the create cluster state machine so that a single end-to-end workflow represents the creation process. Doing so will help support external cluster cert signing processes.
Consider using AWS Systems Manager automation tasks integrated with the Step Functions create cluster state machine to carry out only those operations needed on the initial EC2 client.
Rework the custom resource and the `oClusterInfo`/`oCloudHsmKeyStoreId` outputs to be more aligned with the format of properties of other resources that have IDs, e.g. EC2 instances.
Support use of distinct KMS keys for resources
Add inline comments around the egress rule for port 443 to explain its purposes (AWS package downloads, connectivity to AWS Systems Manager and other AWS services; be specific). Also highlight that, in a more formal implementation, the CIDR should be changed from `0.0.0.0/0` to something more specific that aligns with how connectivity to such endpoints is established.
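Something along these lines, where the resource and parameter names are hypothetical, not the template's actual identifiers:

```yaml
rClientSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: CloudHSM EC2 client security group
    VpcId: !Ref pVpcId
    SecurityGroupEgress:
      # HTTPS egress used for AWS package downloads and for connectivity to
      # AWS Systems Manager and other AWS service endpoints. In a more formal
      # implementation, replace 0.0.0.0/0 with CIDRs that match how these
      # endpoints are actually reached (for example, VPC endpoint CIDRs).
      - IpProtocol: tcp
        FromPort: 443
        ToPort: 443
        CidrIp: 0.0.0.0/0
```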
Implement some degree of test automation so that the effort required to validate changes is greatly reduced. See the `TESTING.md` file for an initial set of test cases to consider automating.
When `cloudhsm-cli` is selected for installation during a create stack operation, using `ssm-user` to execute the CLI manually after stack creation results in warning messages and stdout/stderr content from the CLI being displayed; for example, when a user uses AWS Systems Manager Session Manager to access the EC2 client and executes the `cloudhsm-cli` command, in which case the user is `ssm-user`.
You can still use the CLI, but the output messages are annoying.
```
sh-4.2$ /opt/cloudhsm/bin/cloudhsm-cli interactive
thread 'CloudHSM Worker' panicked at 'failed to create appender: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-appender-0.2.2/src/rolling.rs:499:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error writing to log file. Falling back to standard error.
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] Adding HSM connection to connection pool: HsmConnection { hsm_info: HSM { IP: "10.4.12.221", Port: 2223 } }
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] Adding HSM connection to connection pool: HsmConnection { hsm_info: HSM { IP: "10.4.19.44", Port: 2223 } }
2023-04-27T18:33:55.150Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.12.221:2223 is connecting
2023-04-27T18:33:55.159Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_connection::common] Initializing new connection: HSM { IP: "10.4.12.221", Port: 2223 }
2023-04-27T18:33:55.160Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.19.44:2223 is connecting
2023-04-27T18:33:55.165Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_connection::common] Initializing new connection: HSM { IP: "10.4.19.44", Port: 2223 }
2023-04-27T18:33:55.216Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_properties] Version handshake with server succeeded. Received version: ComponentVersion { major: 2, minor: 8 }
2023-04-27T18:33:55.216Z INFO [793] ThreadId(1) [hsm1_marshaling::server_handshake] Reporting sdk version CLI:5.8.0-el7:CodeBuildBatchProject-uFu5sNXfquqK:1ce78aba-ddf5-4c08-aaab-3d9eda62e152
2023-04-27T18:33:55.217Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::server_properties] Version handshake with server succeeded. Received version: ComponentVersion { major: 2, minor: 8 }
2023-04-27T18:33:55.217Z INFO [793] ThreadId(1) [hsm1_marshaling::server_handshake] Reporting sdk version CLI:5.8.0-el7:CodeBuildBatchProject-uFu5sNXfquqK:1ce78aba-ddf5-4c08-aaab-3d9eda62e152
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] Current cluster version is 0; incoming cluster version is 199391178
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] HSMs to be added: {HSM { IP: "10.4.19.44", Port: 2223 }, HSM { IP: "10.4.12.221", Port: 2223 }}
2023-04-27T18:33:55.309Z INFO [793] ThreadId(2) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] HSMs to be removed: {}
2023-04-27T18:33:55.311Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] Updating the state of HSM 10.4.19.44:2223
2023-04-27T18:33:55.311Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.19.44:2223 is connected and ready
2023-04-27T18:33:55.317Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] Updating the state of HSM 10.4.12.221:2223
2023-04-27T18:33:55.318Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::hsm_connection_impl] HSM 10.4.12.221:2223 is connected and ready
2023-04-27T18:33:55.320Z INFO [793] ThreadId(1) [cloudhsm_provider::hsm1::connection::connection_pool] HSM Connection already in pool 10.4.19.44:2223
aws-cloudhsm > 2023-04-27T18:33:55.320Z INFO [793] ThreadId(3) [cloudhsm_provider::hsm1::connection::connection_pool::cluster_info_message] Current cluster version is 199391178; incoming cluster version is 199391178
```
The issue is likely due to the activate operation being carried out as the root user, so the underlying CLI log file is created with the root user's ID and group ID. Subsequent attempts by non-root users to execute the CLI result in the warning message and log output being written to the terminal.
Prior to running the command as the `ssm-user`, the `run/` directory looked like this:
```
sh-4.2$ ls -alR /opt/cloudhsm/run
/opt/cloudhsm/run:
total 4
drwxrwxrwt 2 root root 41 May 25 18:49 .
drwxr-xr-x 7 root root 61 May 25 18:33 ..
-rw-r--r-- 1 root root 3193 May 25 18:49 cloudhsm-cli.log.2023-05-25
```
Two methods to work around this:
1. Reinstall the `cloudhsm-cli` package. On a suitable Linux instance, remove and reinstall the `cloudhsm-cli` package, then run `/opt/cloudhsm/bin/cloudhsm-cli interactive` again.
2. Don't select `cloudhsm-cli` for installation at stack creation; instead, install and run `cloudhsm-cli` as the `ssm-user`.
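For reference, a minimal sketch of directly fixing the log ownership described above (assumes relaxing ownership of the CLI log to `ssm-user` is acceptable in your environment):

```sh
# Hand the root-owned CLI log to ssm-user so the log appender can open it.
sudo chown ssm-user:ssm-user /opt/cloudhsm/run/cloudhsm-cli.log.*
```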
Add support for being able to create a cluster from a backup and automatically reconnect an existing disconnected key store that was associated with the cluster from which the backup was made.
The template currently supports creation of a new CloudHSM cluster from a backup when `pStackScope` is set to `cluster-and-client-only`, but support has not been added to enable the same operation when the scope is set to `with-custom-key-store`. Currently, under that condition, the template will create a new custom key store and connect it to the newly instantiated cluster. The desire is to at least have an option to attempt to reconnect an existing key store to the new cluster.
Update the template with the currently supported set of AWS Regions, given that `ap-northeast-3` and `sa-east-1` now support CloudHSM while the `cn-*` Regions do not.
In this section of the doc:
Make clear that the customer CA cert associated with the cluster from which the backup was taken must exist in Secrets Manager under the name `/{system_id}/{backup_cluster_id}/customer-ca-cert`, where `{system_id}` is the value of the `pSystem` parameter used for both the stack associated with the original cluster from which the backup was taken and the new cluster to be created from the backup. Also highlight that the `pSystem` parameter value for both stacks needs to be the same.
The original customer CA cert is used during the process of creating a new cluster from a backup to configure the EC2 client with the proper CA cert so that the CloudHSM client tools can interact with HSMs in the newly created cluster.
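For example, a hedged sketch of seeding that secret (the segments in braces are the placeholders described above, and the cert file name is illustrative):

```sh
# Store the original cluster's customer CA cert under the expected secret name.
aws secretsmanager create-secret \
  --name "/{system_id}/{backup_cluster_id}/customer-ca-cert" \
  --secret-string file://customerCA.crt
```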
Hi, since this is a CloudFormation template, it would be nice to have the template URL associated with the current commit SHA for easier deployment.
The reason is that `cloudhsm.yaml` is larger than 50 kB, resulting in: `at 'templateBody' failed to satisfy constraint: Member must have length less than or equal to 51200`.
Enable users to use their own cluster cert signing process as an option. Continue with the default behavior of using a self-signed CA cert and automating the process of signing the cluster cert.
Since CloudHSM SDK v3 provides key management commands that are not in SDK v5, add an option to support installing v3 in place of v5. Continue to default to using v5.
While making this change, determine if there's a "latest" reference for the v5 SDK so that the example can avoid pinning to a potentially old version of v5.
Currently it's set to `0.0.0.0/0`, but it can be tightened to a CIDR within the VPC in which the HSM ENIs are being created, e.g. `pHsmsVpcCidr`.
In Regions where more than 3 AZs exist, CloudHSM might not be supported in all AZs. Currently, the template may encounter an unsupported AZ either during creation of the first HSM or during creation of subsequent HSMs. In the latter case, there can be a delay of minutes before it is recognized that an unsupported subnet/AZ has been specified.
Ideally, the template would check the specified AZs up front and issue an error prior to creation of the first HSM when an unsupported subnet/AZ has been specified.
See Editing CloudHSM key store
Implement an update state machine to support:
- updating the `kmsuser` password in support of common operations

In the spirit of the blog post "How to lower costs by automatically deleting and recreating HSMs", add support for removing all HSMs from a cluster and adding HSMs back later on.
Consider changing the default value of the `pStackScope` parameter from the current default of `with-custom-key-store` to `cluster-and-client-only`. The goal of this change is to make the simpler, faster-to-achieve configuration the default.
The template would still default to 2 HSMs so that it demonstrates the best practice of deploying HSMs to at least 2 AZs.
For example, enable users of the custom resource to get the list of HSM IP addresses via `!GetAtt`.
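Something like the following, where the logical ID `rCloudHsmCluster` and the attribute `HsmIpAddresses` are hypothetical names for the reworked resource, not its current interface:

```yaml
Outputs:
  oHsmIpAddresses:
    Description: Comma-separated HSM ENI IP addresses
    Value: !Join [",", !GetAtt rCloudHsmCluster.HsmIpAddresses]
```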
Although the current template gets the job done, it would benefit from being reengineered into a set of proper, modular CloudFormation custom resources/extensions that could be published to a private registry and easily consumed by other templates.
Consider enhancing the EC2 client automation to use an Auto Scaling group with a desired capacity of 1 to help enhance the resiliency of the client. Balance this potential enhancement against the option to power off the client when it is not needed.
HSMs can end up being replaced by AWS due to internal failures and other circumstances. When the first HSM in a cluster gets replaced prior to the cluster being initialized, the cluster private key, and consequently the CSR, are also replaced. This means that the external/BYO PKI process needs to be restarted to use the new CSR. The longer the BYO PKI process takes, the greater the exposure of the initial HSM to being replaced.
This issue calls for the IaC to be enhanced to help minimize the impact of this situation.
During a stack delete operation, detect whether zero HSMs exist and, if so, skip the 30-second wait before checking the state of HSMs. For example, if an error occurs during stack creation before any HSMs have been created, a delete operation will currently still wait 30 seconds before checking the state of HSMs in the cluster and recognizing that none exist.
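A sketch of the kind of check that could short-circuit the wait (the cluster ID is a placeholder, and `length()` on a missing `Hsms` list would need guarding in real code):

```sh
count=$(aws cloudhsmv2 describe-clusters \
  --filters clusterIds=cluster-quhwuyosn7k \
  --query 'length(Clusters[0].Hsms)' --output text)
if [ "$count" = "0" ]; then
  echo "No HSMs exist; skipping the 30-second wait."
else
  sleep 30   # existing behavior: wait before polling HSM state
fi
```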
Address the scenario in which an HSM enters the "degraded" state during creation of the HSM.
During testing of our IaC, we have seen cases in which, during cluster creation, the first HSM to be created doesn't enter the `ACTIVE` state but instead enters a degraded state. With the current code, this state is not caught. Eventually, the create operation times out and an auto rollback of the stack is attempted.
When additional HSMs are created beyond the first HSM, any of the create actions could result in an HSM entering the degraded state.
An HSM in a degraded state can be deleted; i.e., the HSM won't automatically transition to an `ACTIVE` state later of its own accord.
This is what the state of such an HSM looks like:
"Hsms": [
{
"AvailabilityZone": "us-east-2a",
"ClusterId": "cluster-quhwuyosn7k",
"SubnetId": "subnet-04a76758b58c05023",
"EniId": "eni-09422fd6133d93b07",
"EniIp": "10.4.14.83",
"HsmId": "hsm-n5yjka6nfos",
"State": "DEGRADED",
"StateMessage": "HSM creation failed. Please delete this HSM and try again."
}
],
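A hedged sketch of detecting and cleaning up such an HSM so the create can be retried (the cluster ID is a placeholder):

```sh
hsm_id=$(aws cloudhsmv2 describe-clusters \
  --filters clusterIds=cluster-quhwuyosn7k \
  --query "Clusters[0].Hsms[?State=='DEGRADED'].HsmId | [0]" --output text)
if [ -n "$hsm_id" ] && [ "$hsm_id" != "None" ]; then
  # Per the StateMessage above: delete the degraded HSM and try again.
  aws cloudhsmv2 delete-hsm --cluster-id cluster-quhwuyosn7k --hsm-id "$hsm_id"
fi
```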