aws / aws-rfdk Goto Github PK

The Render Farm Deployment Kit on AWS is a library for use with the AWS Cloud Development Kit that helps you define your render farm cloud infrastructure as code.

Home Page: https://docs.aws.amazon.com/rfdk/index.html

License: Apache License 2.0

Shell 6.91% JavaScript 1.85% TypeScript 87.35% Dockerfile 0.04% PowerShell 0.69% Python 3.17%

aws cdk infrastructure-as-code cloud-infrastructure

aws-rfdk's Introduction

Render Farm Deployment Kit on AWS (RFDK)

The Render Farm Deployment Kit on AWS is an open-source library for use with the AWS Cloud Development Kit that is designed to help you deploy, configure, and maintain your render farm infrastructure in the cloud.

It offers high-level object-oriented abstractions to define render farm infrastructure using the power of Python and Typescript.

The RFDK is available in:

Javascript, Typescript (Node.js >= 18.0.0 officially supported, Node.js >= 14.15.0 unofficially supported)
- We recommend using an Active LTS Release
Python (Python >= 3.6)

Note: Language version compatibility is the greater of those listed above and the versions listed in the AWS CDK.

Reporting Bugs/Feature Requests
Security issue notifications
Contributing
Code of Conduct
Licensing

Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

A reproducible test case or series of steps
The version of our code being used
Any modifications you've made relevant to the bug
Anything unusual about your environment or deployment

Security issue notifications

If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public github issue.

Contributing

Contributions to the RFDK are encouraged. If you want to fix a problem, or want to enhance the library in any way, then we are happy to accept your contribution. Information on contributing to the RFDK can be found in CONTRIBUTING.md.

Code of Conduct

This project has adopted the Amazon Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Licensing

See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.

We may ask you to sign a Contributor License Agreement (CLA) for larger changes.

aws-rfdk's People

Contributors

Stargazers

Watchers

Forkers

horsmand jusiskin ddneilson ryyakobe grbartel yashda gdday jericht kozlove-aws aws-painec batty16 alcousins addemko charlesmjm looeye-at-3lift qpc-database cloudbridgetechnologies christian-korneck test-mass-forker-org-1 tryweirdier xuetinp cherie-chen leongdl lucaseck aws-samuel crowecawcaw dustinkrejci michael-liao-amazon halfmu edwards-aws ramonfigueiredo cheolmin seanpm2001 sakshie95 khalid-kifayat rondeau-aws radmehrdarsi gintian mgway

aws-rfdk's Issues

Configurable expiry for generated X.509 certificates

All X.509 certificates generated by the X509CertificatePem construct have a default expiry of 3 years, and that is not configurable. The request is to make that expiry configurable.

Use Case

Prevent render farms from becoming inoperable after 3 years by extending the duration of X.509 certificates.

Other

N/A

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Cannot create StaticPrivateIpServer

Something seems to have changed in the IAM permissions required to set up a SNS Topic for ASG Lifecycle events. The RFDK can no longer successfully deploy a StaticPrivateIpServer nor, by extension, a MongoDbInstance.

Reproduction Steps

Attempt to deploy a StaticPrivateIpServer

Error Log

Unable to publish test message to notification target arn:aws:sns:us-west-2:00000000000000:RFDKInteg-DL-ComponentTier1602182965287344326-AttachEniNotificationTopicc8b1e9a6783c4954b191204dd5e3b9e0F5D22665-YH1IJMTK90DR using IAM role arn:aws:iam::00000000000000:role/RFDKInteg-DL-ComponentTie-AttachEniNotificationRol-YTWHO4LENVPB. Please check your target and role configuration and try to put lifecycle hook again. (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: 9af0bc15-462d-4af3-912b-315d4b89c7ef; Proxy: null)

Environment

CDK CLI Version : 1.57.0
CDK Framework Version: 1.57.0
RFDK Version: 0.17.0
Deadline Version: N/A
Node.js Version: Any
OS : Any
Language (Version): All

This is 🐛 Bug Report

I-named interfaces cannot be instantiated in Python

A quirk of jsii means that any of the interfaces that we have named as I<Something> cannot be instantiated in Python. This is okay for most interfaces, but is a problem for a couple that we need to instantiate as part of construct properties. Namely:

IDistinguishedName
IMongoDbUsers
IMongoDbX509User

Reproduction Steps

% python3 -m venv bug_env
% source bug_env/bin/activate
% pip install aws_rfdk
% python
>>> name = rfdk.IDistingushedName()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'aws_rfdk' has no attribute 'IDistingushedName'
>>> user = rfdk.IMongoDbUsers()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/home/neilsd/github/bug_env/lib/python3.6/site-packages/typing_extensions.py", line 1211, in _no_init
    raise TypeError('Protocols cannot be instantiated')
TypeError: Protocols cannot be instantiated
>>> user = rfdk.IMongoDbX509User()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/home/neilsd/github/bug_env/lib/python3.6/site-packages/typing_extensions.py", line 1211, in _no_init
    raise TypeError('Protocols cannot be instantiated')
TypeError: Protocols cannot be instantiated

Environment

CDK CLI Version : 1.57.0
CDK Framework Version: 1.57.0
RFDK Framework Version: 0.16.0
Node.js Version: any
OS : all
Language (Version): Python

Other

A fix is incoming

This is 🐛 Bug Report

Worker configuration doesn't work with Deadline 10.1.11

The RFDK integration tests are failing to be able to assign workers to pools and groups in the worker fleet tests when running with Deadline 10.1.11.5. In this Deadline release, worker AMI's were changed so that the workers won't auto-start, which is a likely culprit for why the RFDK group/pool setup code isn't working.

Reproduction Steps

The results that I saw failures in are from my branch (which is up-to-date with 5f2ce7f and not missing any major code changes) where I'm enabling the integration tests to run from a CodeBuild Project. I've split up the worker fleet tests into 2 groups of tests, workers with HTTP connections and workers with HTTPS connections, so these are only the results for the HTTP tests, but these same tests passed when using Deadline 10.1.10.6 so I have reason to believe it's a compatibility issue with 10.1.11.5 and we will see the same failures when using the release or mainline branches. I haven't modified the tests or their setup at all from those in mainline.

Attempt a run of the integration tests with the following configuration in integ/test-config.sh:

export USER_ACCEPTS_SSPL_FOR_RFDK_TESTS=true
export DEADLINE_VERSION='10.1.11.5'
export SKIP_deadline_01_repository_TEST=true
export SKIP_deadline_02_renderQueue_TEST=true

Error Log

Here are my test results, with some output from the first failing tests (WF 1-2 and WF 1-3). The other failures looked similar.

119 | Deadline WorkerFleet tests (Linux Worker HTTP mode)
120 | Worker node tests
121 | ✓ WF-1-1: Workers can be attached to the Render Queue (3221 ms)
122 | ✕ WF-1-2: Workers can be added to groups, pools and regions (7439 ms)
123 | ✕ WF-1-3: Workers can be assigned jobs submitted to a group (23670 ms)
124 | ✕ WF-1-4: Workers can be assigned jobs submitted to a pool (21612 ms)
125 | Deadline WorkerFleet tests (Windows Worker HTTP mode)
126 | Worker node tests
127 | ✓ WF-2-1: Workers can be attached to the Render Queue (3213 ms)
128 | ✕ WF-2-2: Workers can be added to groups, pools and regions (4231 ms)
129 | ✕ WF-2-3: Workers can be assigned jobs submitted to a group (22657 ms)
130 | ✕ WF-2-4: Workers can be assigned jobs submitted to a pool (22609 ms)
131 |  
132 | ● Deadline WorkerFleet tests (Linux Worker HTTP mode) › Worker node tests › WF-1-2: Workers can be added to groups, pools and regions
133 |  
134 | expect(received).toMatch(expected)
135 |  
136 | Expected pattern: /testpool\ntestgroup\ntestregion/
137 | Received string:  "testpool
138 | testgroup
139 | none
140 | "
141 |  
142 | 124 \|       return awaitSsmCommand(bastionId, params).then( response => {
143 | 125 \|         var responseOutput = response.output;
144 | > 126 \|         expect(responseOutput).toMatch(/testpool\ntestgroup\ntestregion/);
145 | \|                                ^
146 | 127 \|       });
147 | 128 \|     });
148 | 129 \|
149 |  
150 | at awaitSsmCommand_1.default.then.response (components/deadline/deadline_03_workerFleetHttp/test/deadline_03_workerFleetHttp.test.ts:126:32)
151 |  
152 | ● Deadline WorkerFleet tests (Linux Worker HTTP mode) › Worker node tests › WF-1-3: Workers can be assigned jobs submitted to a group
153 |  
154 | thrown: Object {
155 | "Name": "aws:runShellScript",
156 | "Output": "
157 | ----------ERROR-------
158 | failed to run commands: exit status 1",
159 | "OutputS3BucketName": "",
160 | "OutputS3KeyPrefix": "",
161 | "OutputS3Region": "us-west-2",
162 | "ResponseCode": 1,
163 | "ResponseFinishDateTime": 2020-11-17T03:17:56.207Z,
164 | "ResponseStartDateTime": 2020-11-17T03:17:34.875Z,
165 | "StandardErrorUrl": "",
166 | "StandardOutputUrl": "",
167 | "Status": "Failed",
168 | "StatusDetails": "Failed",
169 | }
170 |  
171 | 134 \|
172 | 135 \|     // eslint-disable-next-line @typescript-eslint/no-shadow
173 | > 136 \|     test.each(setConfigs)(`WF-${id}-%i: Workers can be assigned jobs submitted to a %s`, async (_, name, arg) => {
174 | \|                          ^
175 | 137 \|       /**********************************************************************************************************
176 | 138 \|        * TestID:          WF-3, WF-4
177 | 139 \|        * Description:     Confirm that jobs sent to a specified group/pool/region are routed to a worker in that set
178 |  
179 | at new Spec (../node_modules/jest-jasmine2/build/jasmine/Spec.js:116:22)
180 | at Array.forEach (<anonymous>)
181 | at Suite.describe (components/deadline/deadline_03_workerFleetHttp/test/deadline_03_workerFleetHttp.test.ts:136:26)
182 | at Object.<anonymous>.describe.each (components/deadline/deadline_03_workerFleetHttp/test/deadline_03_workerFleetHttp.test.ts:58:3)
183 | at Array.forEach (<anonymous>)
184 | at Object.<anonymous> (components/deadline/deadline_03_workerFleetHttp/test/deadline_03_workerFleetHttp.test.ts:57:25)
185 |  

325 | Test Suites: 1 failed, 1 total
326 | Tests:       6 failed, 2 passed, 8 total
327 | Snapshots:   0 total
328 | Time:        124.712 s
329 | Ran all test suites matching /deadline_03_workerFleetHttp.test/i.
330 | Test results written to: .e2etemp/deadline_03_workerFleetHttp.json
331 | error Command failed with exit code 1.

Below is the portion of a worker's cloud init log where configureWorker.sh gets run. I believe the line WORKER_NAMES=() shows an issue; there should have been a worker in this list and the lines after that are trying to apply the group testgroup to the empty list.

2020-11-16T20:45:37.136-06:00   + '[' -z '' ']'
2020-11-16T20:45:37.136-06:00   + echo 'INFO: WORKER_REGION not provided'
2020-11-16T20:45:37.136-06:00   INFO: WORKER_REGION not provided
2020-11-16T20:45:37.136-06:00   ++ hostname -s
2020-11-16T20:45:37.136-06:00   + WORKER_NAME_PREFIX=ip-10-0-193-155
2020-11-16T20:45:37.136-06:00   + WORKER_NAMES=()
2020-11-16T20:45:37.136-06:00   + shopt -s dotglob
2020-11-16T20:45:37.136-06:00   + for file in '/var/lib/Thinkbox/Deadline10/slaves/*'
2020-11-16T20:45:37.136-06:00   + file='*'
2020-11-16T20:45:37.136-06:00   + workerSuffix='*'
2020-11-16T20:45:37.136-06:00   + '[' -z '*' ']'
2020-11-16T20:45:37.136-06:00   + WORKER_NAMES+=("$WORKER_NAME_PREFIX"-$workerSuffix)
2020-11-16T20:45:37.136-06:00   + shopt -u dotglob
2020-11-16T20:45:37.136-06:00   + '[' 1 -gt 0 ']'
2020-11-16T20:45:37.136-06:00   + for group in '"${WORKER_GROUPS[@]}"'
2020-11-16T20:45:37.136-06:00   + existingGroups=($("$DEADLINE_COMMAND" -GetGroupNames))
2020-11-16T20:45:37.886-06:00   ++ /opt/Thinkbox/Deadline10/bin/deadlinecommand -GetGroupNames
2020-11-16T20:45:37.886-06:00   + [[ ! none =~ testgroup ]]
2020-11-16T20:45:38.887-06:00   + /opt/Thinkbox/Deadline10/bin/deadlinecommand -AddGroup testgroup
2020-11-16T20:45:38.887-06:00   Group testgroup added
2020-11-16T20:45:38.887-06:00   Successfully added group: testgroup
2020-11-16T20:45:38.887-06:00   ++ IFS=,
2020-11-16T20:45:38.887-06:00   ++ echo 'ip-10-0-193-155-*'
2020-11-16T20:45:38.887-06:00   ++ IFS=,
2020-11-16T20:45:38.887-06:00   ++ echo testgroup
2020-11-16T20:45:40.891-06:00   + /opt/Thinkbox/Deadline10/bin/deadlinecommand -SetGroupsForSlave 'ip-10-0-193-155-*' testgroup
2020-11-16T20:45:40.891-06:00   Set groups to testgroup
2020-11-16T20:45:40.891-06:00   + '[' 0 -gt 0 ']'
2020-11-16T20:45:40.891-06:00   + service --status-all

Environment

CDK CLI Version : 1.72.0
CDK Framework Version: 1.72.0
RFDK Version: 0.19.0
Deadline Version: 10.1.11.5
Node.js Version:
OS : AL2
Language (Version): TypeScript

Other

Next steps will be to confirm that the issue is with the changes made to workers not automatically starting, then assess how to update the UserData script to configure the workers to handle this.

This is 🐛 Bug Report

Windows Worker Fleet Example

Hi! The current example seems to be for linux support only. If there is Windows support can this be added as an example / documented?

Thank you!

This is a 📕 documentation issue

Integration tests failing during start-up

The integration tests are unable to run due to a failure in our rfdk-integ-e2e.sh script.

Reproduction Steps

Fill in integ/test-config.sh without using any pre-component hooks and then run yarn run e2e from the integ directory.

Error Log

./scripts/bash/rfdk-integ-e2e.sh: line 77: PRE_COMPONENT_HOOK: unbound variable

Environment

This failure is happening in a bash script in our integration test directory. This is happening trying to run the integration tests on Linux using our current mainline branch.

Solution

if [ -n "PRE_COMPONENT_HOOK" ]

Should be written as:

if [ ! -z "${PRE_COMPONENT_HOOK+x}" ]

This is 🐛 Bug Report

Add linting for whitespace in TypeScript type declarations

Currently, the linters configured in the code repository do not lint for consistent whitespace between TypeScript variable names and their type declaration.

For example, all of the following whitespace usages are accepted:

const myVar:string;
const myVar: string;
const myvar:       string;

Similarly, function and class method argument declarations are not linted:

class Foo {
  constructor(a:string) {
  }
}
class Foo {
  constructor(a: string) {
  }
}
class Foo {
  constructor(a:    string) {
  }
}

This issue proposes adding/configuring the linters to check whitespace in such cases.

Use Case

Without linting, code style can become inconsistent between different files. It also causes unnecessary cycles in code review. This issue originates from a code review discussion that had the contributor change all of their whitespace only to later discover that the change made their code inconsistent with the rest of the repository's codebase. Having proper linting for this code style removes enforces this code style convention before code review and makes the convention easily discovered by new contributors or to prior contributors if we decide to change the convention.

Proposed Solution

Investigate the various linters we use in the project and determine if any support this type of linting. If so, configure the linter and apply the style to any non-compliant source files. If not, investigate the possibility of creating a linter rule for this.

Other

N/A

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Leverage ECS circuit breakers to fail ECS deployments

There is a new ECS feature called "deployment circuit breakers" -- https://aws.amazon.com/blogs/containers/announcing-amazon-ecs-deployment-circuit-breaker/ -- that should be used by the RFDK during its ECS-based deployments.

Use Case

The current RFDK cannot fail a deployment of its ECS-based components (Deadline RenderQueue & UsageBasedLicensing); if there's a misconfiguration leading to a failed deployment, then the ECS task will just repeatedly fail and restart, but the CloudFormation deployment will succeed. It would be much better if the CloudFormation deployment also failed in this situation.

Proposed Solution

Explore whether or not circuit breakers can solve this problem for us -- https://aws.amazon.com/blogs/containers/announcing-amazon-ecs-deployment-circuit-breaker/

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Update version bump script to bump python example too

Currently our bump script can bump all of the lerna packages contained in our repository, but not the python package contained in the examples directory.

Use Case

When doing a version bump, we need to be able to bump the python version at the same time. This will save time when doing releases and make it impossible to forget to do, which can happen when it's a manual process.

Proposed Solution

Update the bump script so it updates the RFDK version used in examples/deadline/All-In-AWS-Infrastructure-Basic/python. We should take the time to evaluate if we can do this in a way that will also cover any other python examples that get added to the examples directory in the future.

Other

Pull request #171 can be used as a reference to see where the version bump needs to be done. The AWS CDK version bump in this PR can be ignored. We are currently doing CDK version upgrades manually, and that was a separate miss that it didn't get updated when the rest of the repository had its CDK version updated.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Incorrect Katana and Maxwell Usage-Based Licensing Port

The aws-rfdk.deadline.UsageBasedLicense.forKatana() method adds a security group ingress rule for port 4101, when the licensing traffic for Katana is actually served on port 4151.

Similarly, aws-rfdk.deadline.UsageBasedLicense.forKatana() adds a rule for port 5055 but should be 5555.

This is 🐛 Bug Report

Worker nodes and UserData

My team would like to be able to mount storage (such as FSx for Lustre) on the Worker nodes before the worker nodes connect to the Deadline server. This storage contains the data needed to be able to run jobs.

That isn't currently possible to do using the CDK & RFDK. I've added code to mount the storage to the UserData, but I only have the ability to add the code after the UserData that the RFDK adds to the Auto-Scaling Group. I can do it manually in the EC2 console, but not in the CDK application itself.

I would like the ability to either add my UserData before the RFDK's UserData, or I would like to be able to get the RFDK's UserData so that I can do this myself, and then I'm responsible for setting the UserData on the ASG.

I wasn't sure whether this was a bug, or a feature request. It's a little of both - ideally workers shouldn't connect to the server until they're ready to run jobs.

Reproduction Steps

Error Log

Environment

CDK CLI Version :
CDK Framework Version:
RFDK Version:
Deadline Version:
Node.js Version:
OS :
Language (Version):

Other

This is 🐛 Bug Report

Documentation for deploying in isolated subnets

Some customers want to deploy RFDK constructs into isolated subnets that have no internet access. The instances/resources deployed by the RFDK need to interact with some external sites (ex: MongoDbInstance construct accesses the MongoDB yum repository), and/or AWS services (ex: CloudFormation to signal successful startup).

The RFDK documentation should include a section on requirements for isolated subnets so that customers know which VPC Interface/Gateway endpoints they will need to deploy, and other limitations related to isolated subnets.

This is a 📕 documentation issue

Expand configuration options for Security Group configuration

It is not currently possible to provide/configure the Security Group for:

Repository -- https://docs.aws.amazon.com/rfdk/api/latest/docs/aws-rfdk.deadline.Repository.html
The RCS host instance in RenderQueue -- https://docs.aws.amazon.com/rfdk/api/latest/docs/aws-rfdk.deadline.RenderQueue.html

It would be grand to be able to have fine-grained control over those security groups.

Use Case

Presently, the security groups created for these resources allow full-egress by default. Customers that are aiming for enhanced layers of network-level access controls have a need to set these security groups to deny all egress by default, and to explicitly add their own egress rules.

Customers may also have created their own security group that, say, controls access to VPC Interface Endpoints, and they need a means by which those security groups can be added to the Repository & RenderQueue's hosts.

Proposed Solution

Property on the construct(s) that allow providing the security group that will be used. Note: RenderQueue should provide separate properties for the ALB & ECS host; the Connections object of the RenderQueue should remain as the ALB's SG, but there should also be an easy-access way to get to the RCS-host's SG.
Addition of an addSecurityGroup() method (ex: https://docs.aws.amazon.com/rfdk/api/latest/docs/aws-rfdk.deadline.WorkerInstanceFleet.html#add-wbr-security-wbr-groupsecuritygroup ) to these constructs that allows the customer to add additional security groups to the construct after creation. In the RenderQueue, there should be separate methods for the ALB & RCS SGs. Note: These additional security groups should not be added to the Connections object of the construct -- doing that would make any use of the construct's Connections object also change the added SGs.

Other

N/A

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Expose UpdatePolicy on WorkerInstanceFleet

There's no way to perform updates to the AMI used by a worker fleet and have the fleet immediately deploy them, the current configuration will only use the new AMI for new instances that get added to the auto scaling group and leave the old ones as-is. I would like a way to be able to kill any currently running worker instances and start new ones with the updated AMI in their place, with a single CDK deployment.

Use Case

I want to be able to upgrade the version of Deadline on the AMI my workers are using and only have to run cdk deploy once to get all my running workers replaced with instances running the new Deadline version.

Proposed Solution

I propose we expose the AutoScalingGroup's updatePolicy construct property on the WorkerInstanceFleet, which would allow customers to choose if they'd like to do a blue-green deployment or a rolling update.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Allow subnet selection for RenderQueue's ALB

The ask is for the ability to control which subnets the RenderQueue will deploy its ALB(s) into.

Use Case

When deploying into a Local Zone (ex: the LA Local Zone), the Local Zone is actually a part of a greater region (us-west-2 in the case of the LA zone). An ALB cannot be deployed into subnets that are both in the region and in the local zone. So, to support local zones we need the ability for the user to select which subnets the ALB will be created within (either entirely in the LZ or entirely in the regular Region AZs).

Proposed Solution

Adding a vpcAlbSubnets property to the RenderQueue props.

Other

N/A

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Add an example for using EC2 Image Builder

Use Case

Keeping software updated on custom AMI's can be a pain if you're creating them manually. The EC2 Image Builder service has matured to the point that it is a viable option for automating the process. While the support in CloudFormation and CDK is still a bit lacking, it is usable and can be worked into an RFDK app to build the required AMIs for the worker fleet on the fly.

Proposed Solution

Write up an example app that uses the EC2 Image Builder L1 constructs to take any base AMI, install Deadline on it, and then create a new AMI that is used by a worker fleet.

This is a 🚀 Feature Request

Modify behavior of CloudWatchAgent agent installation

Many RFDK constructs use the CloudWatchAgent construct ( https://docs.aws.amazon.com/rfdk/api/latest/docs/aws-rfdk.CloudWatchAgent.html ) by default. This construct always ensures that the CloudWatchAgent is installed (either pre-installed, or by installing it itself), and will fail the UserData script if it cannot be installed.

I suggest a modification to the scripting to not always install the agent if it is not present.

Note: The instance initialization (ex: cloud-init) can be viewed in the EC2 console, or API, by querying the instance's syslog; on Linux, at least.

Use Case

The S3 method of installing the CloudWatchAgent requires that the host be able to reach the us-east-1 endpoints of S3, as that is where the bucket containing the agent is located. If deploying to any region other than us-east-1, this necessitates public internet access (tcp port 443) for any construct that uses the RFDK's CloudWatchAgent.

Using yum install on an Amazon Linux 2 instance only requires port 80 access to the regional S3 endpoints, which can be obtained, securely, through the use of a VPC Gateway Endpoint for S3 in the VPC.

For Windows, the current CloudWatchAgent construct also requires public internet access to fetch the gpg program. This makes it impossible to deploy an RFDK worker to an isolated subnet, as required by some security standards in the industry.

The current construct also make it impossible for a customer to use Linux distributions that do not support RPM.

Proposed Solution

I suggest a change of behavior to:

Only try to install the agent on Amazon Linux 2 ( this will preserve functionality for the Repository ), but only do the install via yum (check for whether we're running on AL2 before trying to install); do not install the agent using the S3
Do not fail the script if CloudWatchAgent is not present, or cannot be installed.
Document, clearly, in the CloudWatchAgent construct and all RFDK constructs that use it, that customers should pre-install the CloudWatchAgent on their AMI, or use AL2 with access to regional S3 on port 80, if they want to enable log streaming functionality on their host(s).

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

openssl "string too long"

When using openssl to generate certificates, there are limits on the length that the common name, organization, and organizational unit strings can be. Our code that is utilizing openssl to generate certificates doesn't have any of these validations, so it's possible to have the Lambda fail while attempting to create the certificate.

Reproduction Steps

Create an X509CertificatePem, giving it a common name longer than 64 characters
Attempt a cdk deploy

Error Log

This output is from running the integration tests found in the integ directory:

  66/101 | 11:46:49 PM | CREATE_FAILED        | Custom::RFDK_X509Generator                  | RenderStructWFS1/RenderQueueCertPEM1604523630410737353/Default/Default (RenderStructWFS1RenderQueueCertPEM16045236304107373535BCBE002) Failed to create resource. Command failed: openssl req -passout env:CERT_PASSPHRASE -newkey rsa:2048 -days 1095 -out /tmp/tmp.WaO7Ph/cert.csr -keyout /tmp/tmp.WaO7Ph/cert.key -subj /CN=renderqueue.RFDKInteg-WFS1-ComponentTier1604523630410737353.local/O=AWS/OU=Thinkbox
Generating a 2048 bit RSA private key
........................................................+++
.............................................................+++
unable to write 'random state'
writing new private key to '/tmp/tmp.WaO7Ph/cert.key'
-----
problems making Certificate Request
139781724301216:error:0D07A097:asn1 encoding routines:ASN1_mbstring_ncopy:string too long:a_mbstr.c:158:maxsize=64

Error: Command failed: openssl req -passout env:CERT_PASSPHRASE -newkey rsa:2048 -days 1095 -out /tmp/tmp.WaO7Ph/cert.csr -keyout /tmp/tmp.WaO7Ph/cert.key -subj /CN=renderqueue.RFDKInteg-WFS1-ComponentTier1604523630410737353.local/O=AWS/OU=Thinkbox
Generating a 2048 bit RSA private key
........................................................+++
..

Environment

The openssl command is being run inside a Lambda using AL2 with the Lambda layer published by RFDK, which installs "OpenSSL 1.0.2k-fips for Amazon Linux 2".

Other

Fixing this should include doing the validation for any other constraints put on the fields of a distinguished name.

This is 🐛 Bug Report

Deadline Monitor unable to connect to workers to get logs

My team is having issues with getting access to worker logs when the workers are created using the Spot Event Plugin. They don't stream their logs to CloudWatch, and it looks like the local port (that Deadline Monitor would connect to in order to access logs) on each worker is randomized. That means that the Security Group for the workers has to open up all ports - so it isn't very secure.

Is there any chance of either making the port be static instead of random? Any other solution would also be acceptable - so long as it doesn't involve opening all of the ports.

Reproduction Steps

Error Log

Environment

**CDK CLI Version : 1.66.0
**CDK Framework Version: 1.66.0
**RFDK Version: 0.18
Deadline Version:
Node.js Version:
OS :
Language (Version):

Other

This is 🐛 Bug Report

Error in documentation

Step 10 of this readme is not correct. This is a copy paste issue from the TypeScript documentation:
https://github.com/aws/aws-rfdk/blob/mainline/examples/deadline/All-In-AWS-Infrastructure-Basic/python/README.md

Required IAM permissions

Customers in enterprise AWS environments need to know in advance the IAM permissions that are required:

to deploy RFDK in a default configuration
while a default RFDK configuration is operating

This is a 📕 documentation issue

Worker Fleet integration tests failing to create ASG's

The Worker Fleet integration tests are failing when trying to create autoscaling groups named similarly to WorkerStructWF1Worker3ASG803D4B7C and WorkerStructWF1Worker2ASG6D4C29FF, both with the reason:

Received 1 FAILURE signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement

Reproduction Steps

Checkout our release candidate branch bump/0.18.0
Setup integ/test-config.sh with these values:
- DEADLINE_VERSION='10.1.10.6'
- LINUX_DEADLINE_AMI_ID='ami-05d4887175201bde8'
- WINDOWS_DEADLINE_AMI_ID='ami-09c712180218564f2'
- SKIP_deadline_01_repository_TEST=true
- SKIP_deadline_02_renderQueue_TEST=true
Configure your AWS credentials
Run yarn run e2e from the directory integ

Error Log

2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: BEGIN - ip-###-###-###-###\ec2-user
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Operating System: Linux
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: CPU Architecture: x86_64
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: CPUs: 2
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Video Card: Cirrus Logic GD 5446
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Deadline Worker 10.1 [v10.1.10.6 Release (1a2a926fa)]
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: AccessDeniedException occurred while fetching tags for the instance:
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Got Access Denied when trying to DescribeTags in EC2 Instance EC2 Instance. Please make sure your user has the 'iam:GetUser' IAM Permission to make these error messages better.
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Please make sure your IAM user <unknown> has the following IAM Permission(s) to access EC2 Instance.
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: ec2:DescribeTags
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: (Deadline.AWS.AWSPortalAccessDeniedException)
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Is tracked by resource tracker: true
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:20: Scanning for auto configuration
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:23: Auto Configuration: No auto configuration for Repository Path could be detected, using local configuration
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:23: Connecting to repository
2020-10-08T17:12:26.551-05:00	2020-10-08 22:12:23: Could not connect to Deadline Repository: The configured root CA ('/var/lib/Thinkbox/Deadline10/gateway_certs/ca.crt') does not exist.
2020-10-08T17:12:31.551-05:00	2020-10-08 22:12:23: Deadline Worker will try to connect again in 10 seconds...
2020-10-08T17:12:33.806-05:00	2020-10-08 22:12:33: Could not connect to Deadline Repository: The configured root CA ('/var/lib/Thinkbox/Deadline10/gateway_certs/ca.crt') does not exist.
2020-10-08T17:12:38.551-05:00	2020-10-08 22:12:33: Deadline Worker will try to connect again in 10 seconds...
2020-10-08T17:12:43.812-05:00	2020-10-08 22:12:43: Could not connect to Deadline Repository: The configured root CA ('/var/lib/Thinkbox/Deadline10/gateway_certs/ca.crt') does not exist.
2020-10-08T17:12:48.551-05:00	2020-10-08 22:12:43: Deadline Worker will try to connect again in 10 seconds...
2020-10-08T17:12:54.069-05:00	2020-10-08 22:12:53: Could not connect to Deadline Repository: The configured root CA ('/var/lib/Thinkbox/Deadline10/gateway_certs/ca.crt') does not exist.
 2020-10-08T17:12:58.551-05:00	2020-10-08 22:12:53: Deadline Worker will try to connect again in 10 seconds...

Environment

CDK CLI Version: 1.66.0
CDK Framework Version: 1.66.0
RFDK Version: 0.18.0 (release candidate)
Deadline Version: 10.1.10.6
Node.js Version: 12.18.3
OS : AL2
Language (Version): TypeScript (~4.0.3)

Other

This is 🐛 Bug Report

Allow TS examples to build without requiring a full package build

Currently to build our TS examples like All-In-AWS-Infrastructure-Basic you need to first run the build.sh script in the base directory of the repo. This is cumbersome and time consuming, so if we can set out repo up to avoid this, it would be easier to use the examples.

Use Case

When someone wants to use a TS example, they need to follow these steps:

Run build.sh from the base directory, waiting for the full install, build, and tests to run for all the packages in the repo.
Navigate to the example directory and run yarn run build to execute the build.

This can be error prone and the error messaging for trying to run the example build before the full build doesn't make the solution obvious.

It would be ideal if we could simplify this to just being able to run yarn and yarn build from the examples directory and not have to worry about the base directory's build.

Proposed Solution

We need to do these things to get this to work:

Make packages/aws-rfdk/tsconfig.json a static file rather than being generated by build.sh.
Change the build script target for the examples to use tsc -b instead of just tsc.
We've updated some example to contain the extra steps to get the build to work using the build.sh script, so those need to be updated to use the simplified method.

tsconfig.js

We currently have this line in build.sh to generate the tsconfig.json for the aws-rfdk package:

/bin/bash scripts/generate-aggregate-tsconfig.sh > tsconfig.json

We originally had this because it's what CDK does to avoid having to add and update tsconfig.json files in all of their packages. Since we made the decision to switch away from multiple packages in a repo (we had core and deadline as their own packages rather than in a shared package), we don't really need this feature anymore. The tsconfig.json file content shouldn't need to change often, so it can be made into a static file and delete this script.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Windows worker fleets failing to start

Trying to deploy a worker fleet (or new instance into an existing worker fleet) that uses any Windows AMI is failing due to a bug in a script that we execute on the host as part of its initialization process (in the UserData). The nature of the bug means that it affects all versions of RFDK in production, and all worker fleets that try to deploy using an AMI with a Windows operating system.

The script that is failing attempts to install and configure the CloudWatch agent onto the instance. Because this script fails, the script that is supposed to configure the Deadline worker to connect to the render queue and start it never gets executed and the health check fails, causing the CDK deployment to fail and roll back.

Since the failure prevents CloudWatch from setting up properly, no logs are uploaded to CloudWatch and viewable from the AWS Console, and the host gets terminated.

Log statement seen that signals we're falling back to the latest version of the CloudWatch agent rather than the version we try to pin to: https://github.com/aws/aws-rfdk/blob/mainline/packages/aws-rfdk/lib/core/scripts/powershell/configureCloudWatchAgent.ps1#L26
Error message that was observed: https://github.com/aws/aws-rfdk/blob/mainline/packages/aws-rfdk/lib/core/scripts/powershell/configureCloudWatchAgent.ps1#L52

Environment

CDK CLI Version: all
CDK Framework Version: all
RFDK Version: all
Deadline Version: all

This is 🐛 Bug Report

Create a Construct to supply Deadline installers

We would like a construct which determines the S3 URI's for the Deadline installers and Deadline Docker recipes. The version of Deadline for these can be either specified by the user or defaulted to the latest version.

For now, this will be integrated with the Repository construct to get the repository installer for it to use. In the future we can extend it out to be used by other constructs that require Deadline installers or the Docker recipes as well.

Use Case

Currently to get the repository installer, we use hard-coded values that make assumptions about where the installer should be found in S3:

aws-rfdk/packages/aws-rfdk/lib/deadline/lib/version.ts

Lines 293 to 310 in 75d0f9f

    
           const installerBucket = Bucket.fromBucketName(this, 'ThinkboxInstallers', ExactVersion.INSTALLER_BUCKET); 
        
           const { majorVersion, minorVersion, releaseVersion, patchVersion } = versionComponents; 
        
           this.majorVersion = majorVersion; 
        
           this.minorVersion = minorVersion; 
        
           this.releaseVersion = releaseVersion; 
        
           const fullVersionString = this.fullVersionString(patchVersion); 
        
           const objectKey = `Deadline/${fullVersionString}/Linux/DeadlineRepository-${fullVersionString}-linux-x64-installer.run`; 
        
           this.linuxInstallers = { 
        
             patchVersion, 
        
             repository: { 
        
               s3Bucket: installerBucket, 
        
               objectKey, 
        
             }, 
        
           };

This would provide a more robust way of determining the installer locations

Proposed Solution

A VersionQuery Construct that contains a CustomResource that downloads the version index file and parses it, then returns the required information about the installers for a specific Deadline version.

The Repository construct will then take the versioned installers supplied by the VersionQuery construct, and use the repo installer.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Implement using Construct.construct to stay ahead of deprecation of Construct.node

Upstream CDK has deprecated the use of Construct.node in favor of Construct.construct. We'll need to update places where we access through Construct.node to use Construct.construct instead.

Reference: aws/aws-cdk#9557

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

stage-deadline errors for RFDK 0.18.0

Description

There are two issues with the stage-deadline script in the RFDK 0.18.0 release.

1. Failed import of `aws-sdk`

Running the stage-deadline script in isolation is useful for quickly importing Deadline assets for use by the app, and also is used in our integration tests, and is also currently the expected way to stage Deadline for a Python app. This does not work for RFDK 0.18.0 and it outputs the error:

Cannot find module 'aws-sdk'

Reproduction Steps

Clone the RFDK
cd to the examples/deadline/All-In-AWS-Infrastructure-Basic/python directory
Install dependencies with python setup.py install
Execute npx [email protected] stage-deadline 10.1.9.2

Actual Result

internal/modules/cjs/loader.js:968
  throw err;
  ^

Error: Cannot find module 'aws-sdk'
Require stack:
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/dynamodb/composite-table.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/dynamodb/index.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/custom-resource/dynamo-backed-resource.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/custom-resource/index.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/version-provider/handler.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/version-provider/index.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/bin/stage-deadline.js
- /local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/bin/stage-deadline
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:965:15)
    at Function.Module._load (internal/modules/cjs/loader.js:841:27)
    at Module.require (internal/modules/cjs/loader.js:1025:19)
    at require (internal/modules/cjs/helpers.js:72:18)
    at Object.<anonymous> (/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/dynamodb/composite-table.js:10:19)
    at Module._compile (internal/modules/cjs/loader.js:1137:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1157:10)
    at Module.load (internal/modules/cjs/loader.js:985:32)
    at Function.Module._load (internal/modules/cjs/loader.js:878:14)
    at Module.require (internal/modules/cjs/loader.js:1025:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/dynamodb/composite-table.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/dynamodb/index.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/custom-resource/dynamo-backed-resource.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/lib/custom-resource/index.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/version-provider/handler.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/lib/core/lambdas/nodejs/version-provider/index.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/bin/stage-deadline.js',
    '/local/home/painec/.npm/_npx/18416/lib/node_modules/aws-rfdk/bin/stage-deadline'
  ]
}

Expected Result

The script executes and downloads Deadline assets to the stage folder

Workaround

Typescript apps can locally run npx stage-deadline 10.1.9.2, but Python apps cannot, so for Python the user has to stage Deadline elsewhere and copy the staging folder over

Environment

RFDK Version: 0.18.0
Deadline Version: 10.1.9.2
Node.js Version: 12.18.3
OS : AL2

Other

I tried installing the missing module with npm install -g aws-sdk but it didn't help.

2. HTTP 301 Redirect from HTTP -> HTTPS causes abort

When the aws-sdk is available in the node search path, a different error message is displayed:

Expected status code 200, but got 301

Reproduction Steps

Clone the RFDK
Build the project (./build.sh)
cd examples/deadline/All-In-AWS-Infrastructure-Basic/ts
Run npx stage-deadline

Actual Result

Expected status code 200, but got 301

Expected Result

The script executes and downloads Deadline assets to the stage folder

This is 🐛 Bug Report

Update version bump script to add CDK version to changelog

Running the version bump script should update the changelog to include a section that displays the version of CDK that the new version of RFDK is depending on.

Use Case

Currently the process of adding the version of CDK is done manually after the bump script is ran. We would like to automate this process and combine it into the bump script.

Proposed Solution

This can be accomplished by pulling the CDK version from the package.json using node and then using sed in bash.sh to find the correct spot in the changelog and add in the new section.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

configureWorker.sh attempts to start Launcher in GUI mode

It seems we have an issue in configureWorker.sh where we attempt to start the Launcher in GUI mode if it was not installed as a service which throws the following error in the cloud-init script:

QXcbConnection: Could not connect to display 
QXcbConnection: Could not connect to display 
"/tmp/assets/1234567bc1d4fd78061765b059dcc8e32568828e5cf479b08115489651491c8f.sh: line 137:  3296 Aborted                 ""$DEADLINE_LAUNCHER"""

I believe the workaround is to include the -nogui flag to the parameters

Reproduction Steps

Build a Worker AMI where the Launcher has not been installed as a service
Use the example from What is RFDK? and plug this AMI id into the configuration

Error Log

1612823870830,DEBUG: The Worker is also capable of decompressing responses using Brotli
1612823870830,Successfully set ip-10-248-3-190 to listen on port 56032
1612823870830,+ port_offset=1
1612823870830,+ service --status-all
1612823870830,+ grep -q 'Deadline 10 Launcher'
1612823870830,+ DEADLINE_LAUNCHER=/opt/Thinkbox/Deadline10/bin/deadlinelauncher
1612823871080,+ /opt/Thinkbox/Deadline10/bin/deadlinelauncher -shutdownall
1612823871080,No Launcher to shutdown
1612823871080,+ sudo killall -w deadlineworker
1612823871080,deadlineworker: no process found
1612823871080,+ true
1612823871330,+ /opt/Thinkbox/Deadline10/bin/deadlinelauncher
1612823871330,daemon was not added to the CommandLineParser.
1612823871330,daemon was not added to the CommandLineParser.
1612823871830,DEBUG: The Worker is also capable of decompressing responses using Brotli
1612823871830,Auto Configuration: Picking configuration based on: ip-10-248-3-190.example.com / 10.248.3.190
1612823871830,"Auto Configuration: No auto configuration could be detected, using local configuration"
1612823871830,Launcher Thread - Launcher thread initializing...
1612823871830,Launcher Thread - opening remote TCP listening port 17000
1612823871830,Launcher Thread - creating local listening TCP socket on an available port...
1612823871830,Launcher Thread - local TCP port bound to: [::1]:34433
1612823871830,Launcher Thread -  updating local listening port in launcher file: 34433
1612823871830,Launcher Thread - Launcher thread listening on port 17000
1612823871830,upgraded was not added to the CommandLineParser.
1612823871830,upgradefailed was not added to the CommandLineParser.
1612823871830,service was not added to the CommandLineParser.
1612823872080,daemon was not added to the CommandLineParser.
1612823872830,Launching Worker: 
1612823872830,Launcher Thread - Remote Administration is now disabled
1612823877414,Launcher Thread - Automatic Updates is now disabled
1612823881834,QXcbConnection: Could not connect to display 
1612823881834,QXcbConnection: Could not connect to display 
1612823882334,"/tmp/assets/1234563bc1d4fd78061765b059dcc8e32568828e5cf479b08115489651491c8f.sh: line 137:  3296 Aborted                 ""$DEADLINE_LAUNCHER"""
1612823882334,Feb 08 22:38:02 cloud-init[2513]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [134]
1612823882334,Feb 08 22:38:02 cloud-init[2513]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
1612823882334,Feb 08 22:38:02 cloud-init[2513]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
1612823886414,"Cloud-init v. 19.3-5.amzn2 finished at Mon, 08 Feb 2021 22:38:02 +0000. Datasource DataSourceEc2.  Up 109.73 seconds"

Environment

**CDK CLI Version : Unknown
**CDK Framework Version: Unknown
**RFDK Version: Main
**Deadline Version: 10.1.13
**Node.js Version: Unknown
**OS : Unknown
**Language (Version): Python (Unknown)

Other

This is 🐛 Bug Report

Auto-Approval for Dependabot

Having to rubber stamp all the Dependabot issues to allow them to be merged is cumbersome. We need to integrate the auto-approve GitHub action in the aws-rfdk repo that will auto-approve any MR's created by Dependabot that successfully build.

Add construct for Deadline Webservice

I would like to have as well a Deadline web service running in my render farm.

Use Case

I use deadline web service to submit jobs via HTTP api programatically.
As well read info about jobs/taks for monitoring.

Proposed Solution

Implement a construct to add Deadline Webservice into the service stack.

MountableEFS.mountToLinuxInstance causes all future userdata commands to not run

When you run the method MountableEFS.mountToLinuxInstance it adds user datacommand to download the file mountEfs.sh. It then executes this code to perform the mounting. The last line in this file is exit $?, since this script is being run in the same shell as the main Userdata and not it's own subshell this exits the entire userdata shell causing all lines after it in the userdata to not execute.

In order to fix this we need to either:

Run mountEfs.sh in a subshell
Remove the call to exit from mountEFS.

Reproduction Steps

This can be reproduced by doing the following:

Create a MountableEFS file system
Create an instance
Call fs.mount_to_linux_instance(instance)
Call instance.user_data.add_commands(<your commands here>)
Deploy your app and see if the extra commands run.

Error Log

I am seeing the User data exit before running my additional commands

Environment

**CDK CLI Version :1.57.0
**CDK Framework Version:1.57.0
**RFDK Version:0.17
**Deadline Version:10.1.9.2

This is 🐛 Bug Report

Expose option to select subnets for deployment of HealthMonitor

As the title says, expose a SubnetSelection option in the HealthMonitor construct.

Use Case

Building out a bespoke subnet topology where there may be many private subnets, but a dedicated subnet specifically for workers & its health monitor. Network traffic between subnets is tightly controlled by network access control lists.

Proposed Solution

Follow the usual vpcSubnets pattern that's followed in all of the other constructs. The option should control the placement of the load balancers.

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

RenderQueue endpoint does not expose fully qualified domain name for the load balancer

RenderQueue constructs with TLS enabled require a PrivateHostedZone to be created on the farm's VPC. The hostname and zoneName on the zone create a fully qualified domain name for referring encrypted traffic to the renderQueue. In the RFDK, the endpoint property on the RenderQueue construct exposes the queue's address. But this address is taken from the load balancer's DNS name, so for TLS-enabled render queue's, it doesn't show the fully qualified domain name.

Reproduction Steps

Create a RenderQueue construct, providing in the properties external TLS as traffic encryption and a PrivateHostedZone as hostname
Examine the exposed .endpoint.hostname property on the RenderQueue

Expected

Hostname on the render queue matches the fully qualified domain name for the private zone

Result

Hostname contains the DNS name for the load balancer with no zone name

Environment

CDK CLI Version : 1.61.1
CDK Framework Version: 1.61.1
RFDK Version: 0.17
Deadline Version: 10.1.9.2
Node.js Version: 12.18.3
OS : Linux
Language (Version): TypeScript

Other

I tried patching myself by adjusting line 422 of render-queue.ts to use the loadBalancerFQDN variable to set the endpoint's address but this caused it to fail one of the unit tests, so investigation by someone with more familiarity might be needed.

This is 🐛 Bug Report

Deadline VersionQuery fails with poor error message

When using the VersionQuery construct to query for a version of Deadline that does not exist, then the error message that you get does not clearly indicate what the problem is.

Reproduction Steps

Either VersionQuery with a non-existent Deadline version (ex: 10.1.99), or

In the RFDK repository, modify scripts/getSupportedDeadlineVersion.ts to add versionString: "10.1.99" to the props of the call to getVersionUris
From the jsii/superchain docker container in the root directory of the RFDK repo, run: node ./scripts/getSupportedDeadlineVersions.ts

Error Log

When using the VersionQuery:

Failed to create resource. Cannot convert undefined or null to object
TypeError: Cannot convert undefined or null to object
at Function.keys (<anonymous>)
at VersionProvider.getRequestedUriVersion (/var/task/lib/version-provider/version-provider.js:150:46)
at VersionProvider.getUrisForPlatform (/var/task/lib/version-provider/version-provider.js:117:21)
at VersionProvider.getVersionUris (/var/task/lib/version-provider/version-provider.js:44:40)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async VersionProviderResource.doCreate (/var/task/version-provider/handler.js:30:47)
at async VersionProviderResource.handler (/var/task/lib/custom-resource/simple-resource.js:41:27)

When using getSupportedDeadlineVersion.ts:

ERROR - Cannot convert undefined or null to object

Environment

CDK CLI Version : 1.75.0
CDK Framework Version: 1.75.0
RFDK Version: 0.21.0
Deadline Version: 10.1.11
Node.js Version: N/A
OS : Linux
Language (Version): all

This is 🐛 Bug Report

Add Example Application for configuring a farm to use Deadline's SEP

Currently there isn't an example in the RFDK for configuring the Spot event Plugin. While long term this is not our recommended way of auto scaling for now this is a stepping stone.

This will involve creating a new example farm and a writeup on the additional manual steps that are required to configure a render farm.

This is a 📕 documentation issue

Document how to connect Deadline Monitor to Render Queue

Customers need to be able to connect the Deadline Monitor to the render farms that are deployed with RFDK, yet there is currently no documentation explaining this is done.

This was originally requested in #105 (comment)

This is a 📕 documentation issue

Render Farm Dashboard construct to monitor Functional and Billing Metrics

Use Case

The render farm created and deployed using RFDK can grow and become more complex for a customer to monitor all of its components. RFDK should provide a way so that a user (Ops engineer) can monitor whole render farm from a single place.

Proposed Solution

The proposal is to add additional methods to each deadline construct which returns CDK Widget(s) to monitor.

For eg. a worker-fleet construct can expose a methods to return

LogQueryWidget for error logs.
GraphWidget for Fleet size, memory & CPU usages.
GraphWidget for no. of running jobs/tasks, completed vs in progress jobs/tasks.

This feature also adds a construct named ‘RenderFarmDashboard’, which creates a cloudwatch dashboard with necessary widgets for all the high-level deadline constructs eg. Respository, Render Queue, Worker Fleet & UBL Server.

A user can:

either modify this ‘RenderFarmDashboard’ by adding/removing widgets of their interest,
or create a new Dashboard and add the widgets by fetching it from different deadline constructs.

This way, a user will have a single place to monitor all the required metrics/alarms, include running costs for all the components in Render Farm.

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

RFDK support for other AWS partitions

Originally posted by @zxkane in #176 (comment)

Any plan to mirror the Deadline installers to other aws partitions?

The Deadline installers and Docker recipes are currently published to an S3 bucket that exists in the main aws partition. These are not accessible to the China or GovCloud regions. We also need to investigate whether other functional parts of RFDK will work in these partitions.

Architectural diagram

Customers would like to see an architectural diagram of what a default RFDK deployment looks like on AWS. Using standard AWS iconography would be helpful.

This is a 📕 documentation issue

DocDB version is not locked to v3.6 for Deadline

Officially, Deadline only supports version 3.6 of the MongoDB API. Amazon DocumentDB has just GA'd their 4.0 engine ( https://aws.amazon.com/blogs/database/introducing-amazon-documentdb-with-mongodb-compatibility-4-0/ ).

According to the CloudFormation documentation, the DocumentDB cluster defaults to the latest engine version when being created -- this is now 4.0.0. In a quick test, it looks like the default is still to create a 3.6.0 engine, but I think we can safely assume that will change.

Other

Fix will be to simply specify the engine version for Document DB as 3.6.0.

This is 🐛 Bug Report

Bad filepath in Repository when using VersionQuery

There is an unresolved CDK token in the file path that the Repository generates for container-based connectors. The result is a REPO_URI being passed to the container that looks like:

             {
                "Name": "REPO_URI",
                "Value": "file:///opt/Thinkbox/DeadlineRepository-1.888154589708898e+289"
              },

Oddly, it doesn't seem to cause issues in some regions (us-west-2) but did in others (ca-central-1).

This is 🐛 Bug Report

Support for NFS filesystems in Deadline Render Farm

This is a feature request for support of NFS filesystems within the Deadline render farm. Specifically, for use as the Repository Filesystem and support for using one as an asset filesystem on worker nodes.

Use Case

Basically flexibility. The current default EFS filesystem for the repository filesystem can become a bottleneck under certain extreme workloads. In these cases the ability to switch to a different filesystem type is required to get past the bottleneck.

Proposed Solution

The interface that the RFDK provides for the Repository's filesystem is generic enough to support other filesystem types; they just need to have integrations created for them.

An initial solution for NFS support is implemented in -- https://github.com/ddneilson/aws-rfdk/tree/mountable_nfs -- however it is not yet ready for integration into the RFDK. Specifically, the MountableNfs class -- https://github.com/ddneilson/aws-rfdk/blob/mountable_nfs/packages/aws-rfdk/lib/core/lib/mountable-nfs.ts This solution specifically only supports insecure/unauthenticated NFS.

The solution should support authenticated users (ex: kerberos) before becoming a part of the RFDK, but that requires work on integrating authentication providers into the RFDK first.

Other

Note: To use the MountableNfs that is in the fork, simply create an instance of MountableNfs and pass it as the fileSystem property on the deadline.Repository construct.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Add Helper Method for Configuring SSM

It's very common to require access to a host in a render farm for troubleshooting. A simple way of connecting to a host is the AWS Systems Manager Session Manager. It would be nice to have a helper method that applies the required permissions for using Session Manager.

Use Case

An example of where this can be used is on a worker fleet that is having issues. I would like to be able to pass the worker fleet into the helper method and then be able to go into the AWS console and connect to the host using the Session Manager to be able to access the host from the browser without having to configure ssh keys and permissions.

Proposed Solution

A helper method that applies this policy:

new PolicyStatement(
    actions: [
        'ssmmessages:CreateControlChannel',
        'ssmmessages:CreateDataChannel',
        'ssmmessages:OpenControlChannel',
        'ssmmessages:OpenDataChannel',
        'ssm:UpdateInstanceInformation',
    ],
    resources: ['*']
)

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

dependabot unnecessarily trying to update cdk toolchain packages

Dependabot creates PRs to automatically update packages that are part of the CDK/JSII toolchain. Our process is to update these dependencies manually when we are updating to use the latest CDK version since we want the tool versions we use to match what is used to produce our target version of CDK.

For this reason, we should configure dependabot to ignore these dependencies.

Construct for configuring Deadline Workers

We've had an ask from an internal team for a convenient way to configure an arbitrary Deadline Worker in the same way that we configure the workers within a WorkerInstanceFleet.

Use Case

Creating a Launch Template for use with Deadline's Spot Event Plugin

Proposed Solution

Refactoring the CloudWatch log stream & Worker settings steps out of the WorkerInstanceFleet code into a self-contained WorkerConfiguration construct/class.

Other

N/A

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Enable integration tests to be run in a pipeline

Currently the integration tests are configured in a way that allows people with the aws-rfdk repository checked out locally to manually configure and run them. However, if someone wants to automate them to be run in the testing step of a pipeline, it isn't currently possible. We need some tooling around allowing that to be possible.

Use Case

I want to be able to set a pipeline that pulls the aws-rfdk repository, runs the build and package, and then starts the integration tests with the artifacts creating from the packaging step.

Proposed Solution

To accomplish this we need to:

Package the integration tests and anything they require to be used during the build/package step so they are accessible alongside the aws-rfdk package in any following pipeline steps.
Write a script that can be used as the entry point for the testing step that will do any necessary configuration and then execute rfdk-integ-e2e.sh to run the tests.
Make sure that if anything fails, the tests clean up after themselves.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Cyclic dependency when using UsageBasedLicensing

Given a tiered architecture like in the RFDK examples where the UsageBasedLicensing is in a prerequisite tier of the WorkerInstanceFleet, attempts to use UBL's grantPortAccess() method result in a cyclic dependency between the Worker's stack & the UBL stack. This cyclic dependency is caused by the security group ingress rule that must be created on the UBL's security group.

Reproduction Steps

Add a grantPortAccess() call to an RFDK's example app.

Error Log

Cyclic dependency.

Environment

CDK CLI Version : 1.72.0
CDK Framework Version: 1.72.0
RFDK Version: 0.20.0
Deadline Version: 10.1.11
Node.js Version: any
OS : any
Language (Version): all

Other

I will fix this.

This is 🐛 Bug Report

Upgrade the "All in AWS" python example

❓ General Issue

After bumping to 0.18.0 we noticed that our bump script doesn't bump the python example. The RFDK version used in examples/deadline/All-In-AWS_Infrastructure-Basic/python should be manually bumped to 0.18.0. The CDK version should be bumped to 1.66.0 as well, to match the version used across the rest of the repository.

Move Lambdas Directory

I propose that we move the lambdas folder up a directory level, out of core, to be at the same level as core and deadline.

Use Case

Right now we have a nice lib directory inside core/lambdas/nodejs that has some very useful classes, but we wouldn't be able to use them in a Lambda if the Lambda itself was put under deadline/lambdas/nodejs.

Proposed Solution

Having lambdas at the same directory level as core and deadline, instead of individual lambdas directories in those other directories would allow us to more share our Lambdas library with Lambdas that are meant to be used by classes in both core and deadline.

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

	const installerBucket = Bucket.fromBucketName(this, 'ThinkboxInstallers', ExactVersion.INSTALLER_BUCKET);

	const { majorVersion, minorVersion, releaseVersion, patchVersion } = versionComponents;

	this.majorVersion = majorVersion;
	this.minorVersion = minorVersion;
	this.releaseVersion = releaseVersion;

	const fullVersionString = this.fullVersionString(patchVersion);
	const objectKey = `Deadline/${fullVersionString}/Linux/DeadlineRepository-${fullVersionString}-linux-x64-installer.run`;

	this.linuxInstallers = {
	patchVersion,
	repository: {
	s3Bucket: installerBucket,
	objectKey,
	},
	};

aws / aws-rfdk Goto Github PK

aws-rfdk's Introduction

Render Farm Deployment Kit on AWS (RFDK)

Table of Contents

Reporting Bugs/Feature Requests

Security issue notifications

Contributing

Code of Conduct

Licensing

aws-rfdk's People

Contributors

Stargazers

Watchers

Forkers

aws-rfdk's Issues

Use Case

Other

Reproduction Steps

Error Log

Environment

Reproduction Steps

Environment

Other

Reproduction Steps

Error Log

Environment

Other

Reproduction Steps

Error Log

Environment

Solution

Use Case

Proposed Solution

Other

Use Case

Proposed Solution

Use Case

Proposed Solution

Other

Reproduction Steps

Error Log

Environment

Other

Use Case

Proposed Solution

Other

Use Case

Proposed Solution

Use Case

Proposed Solution

Other

Use Case

Proposed Solution

Use Case

Proposed Solution

Other

Reproduction Steps

Error Log

Environment

Other

Reproduction Steps

Error Log

Environment

Other

Reproduction Steps

Error Log

Environment

Other

Use Case

Proposed Solution

tsconfig.js

Environment

Use Case

Proposed Solution

Description

1. Failed import of aws-sdk

Reproduction Steps

Actual Result

Expected Result

Workaround

1. Failed import of `aws-sdk`