NVIDIA Omniverse Nucleus on Amazon EC2

NVIDIA Omniverse is a scalable, multi-GPU, real-time platform for building and operating metaverse applications, based on Pixar's Universal Scene Description (USD) and NVIDIA RTX technology. USD is a powerful, extensible 3D framework and ecosystem that enables 3D designers and developers to connect and collaborate between industry-leading 3D content creation, rendering, and simulation applications. Omniverse helps individual creators connect and enhance their 3D artistic process, and helps enterprises build and simulate large-scale virtual worlds for industrial applications.

With Omniverse, everyone involved in the lifecycle of 3D data has access to high-quality visualizations, authoring, and review tools. Teams do not need additional overhead to manage complex 3D data pipelines. Instead, they can focus on their unique contributions to bring value to the market. Non-technical stakeholders do not need to subject themselves to applications with steep learning curves, nor do results need to be compromised for the sake of iteration reviews.

To support distributed Omniverse users, Nucleus should be deployed in a secure environment. With on-demand compute, storage, and networking resources, AWS infrastructure is well suited to all spatial computing workloads, including Omniverse Nucleus. This repository provides the steps and infrastructure for an Omniverse Enterprise Nucleus Server deployment on Amazon EC2.

Prerequisites

To learn more, reference the official documentation from NVIDIA: https://docs.omniverse.nvidia.com/prod_nucleus/prod_nucleus/enterprise/cloud_aws_ec2.html

Architecture

(Architecture diagram)

Deployment

1. Download Nucleus Deployment Artifacts from NVIDIA

Place them in ./src/tools/nucleusServer/stack

For example: ./src/tools/nucleusServer/stack/nucleus-stack-2022.1.0+tag-2022.1.0.gitlab.3983146.613004ac.tar.gz

Consult NVIDIA documentation to find the appropriate packages.
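Before moving on, it can help to confirm that the archive landed in the expected directory. The following is a minimal sketch, not part of this repository: list_nucleus_builds is a hypothetical helper that lists each archive's name without the .tar.gz suffix, which is the form the NUCLEUS_BUILD variable in step 2 uses.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this repo): list Nucleus stack
# archives in the stack directory and print each build name, i.e. the
# file name with the .tar.gz suffix removed.
list_nucleus_builds() {
  stack_dir="${1:-./src/tools/nucleusServer/stack}"
  found=1   # non-zero (failure) until at least one archive turns up
  for archive in "$stack_dir"/nucleus-stack-*.tar.gz; do
    # When nothing matches, the glob stays unexpanded; skip it.
    [ -f "$archive" ] || continue
    basename "$archive" .tar.gz
    found=0
  done
  if [ "$found" -ne 0 ]; then
    echo "no nucleus-stack-*.tar.gz found in $stack_dir" >&2
  fi
  return "$found"
}
```

Running list_nucleus_builds from the repository root prints the build name(s) it finds, or returns a non-zero status if the directory holds no archive.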

Note: This deployment includes a templated copy of nucleus-stack.env located at ./src/tools/nucleusServer/templates/nucleus-stack.env. This template may need to be updated if NVIDIA changes the nucleus-stack.env file packaged with their archive.

The same applies to NVIDIA's reverse proxy nginx.conf, whose template is located at ./src/tools/reverseProxy/templates/nginx.conf

2. Configure the .env file

Create ./.env

Set the following variables:

  export APP_STACK_NAME=omni-app
  export AWS_DEFAULT_REGION=us-west-2

  # STACK INPUTS
  export OMNIVERSE_ARTIFACTS_BUCKETNAME=example-bucket-name
  export ROOT_DOMAIN=example-domain.com
  export NUCLEUS_SERVER_PREFIX=nucleus
  export NUCLEUS_BUILD=nucleus-stack-2022.1.0+tag-2022.1.0.gitlab.3983146.613004ac # from Step 1
  export ALLOWED_CIDR_RANGE_01=cidr-range-with-public-access
  export DEV_MODE=true

NOTE: This deployment assumes you have a public hosted zone in Route53 for the ROOT_DOMAIN. The deployment will add a CNAME record to that hosted zone.
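A quick pre-flight check can catch a missing variable before cdk deploy runs. This is a sketch under the assumption that every variable listed above must be non-empty. check_env is a hypothetical helper and is not part of deploy.sh.

```shell
#!/bin/sh
# Hypothetical helper: report any variable from the .env listing above that
# is unset or empty in the current shell. Returns 0 only if all are set.
check_env() {
  missing=0
  for name in APP_STACK_NAME AWS_DEFAULT_REGION \
              OMNIVERSE_ARTIFACTS_BUCKETNAME ROOT_DOMAIN \
              NUCLEUS_SERVER_PREFIX NUCLEUS_BUILD \
              ALLOWED_CIDR_RANGE_01 DEV_MODE; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `. ./.env && check_env` sources the file into the current shell and then reports anything that is still unset.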

3. Run the deployment

The following script will run cdk deploy. The calling process must be authenticated with sufficient permissions to deploy AWS resources.

chmod +x ./deploy.sh
./deploy.sh

NOTE: The deployment requires a running Docker session to build the Python Lambda functions.

NOTE: It can take a few minutes for the instances to get up and running. After the deployment script finishes, review your EC2 instances and check that they are in a running state.
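The instance check can also be done from the command line. The sketch below assumes the AWS CLI is installed and authenticated; list_instance_states and not_running are hypothetical helper names. The listing covers every instance in the region, so unrelated or recently terminated instances will also appear.

```shell
#!/bin/sh
# Hypothetical helper: print "<instance-id> <state>" for each EC2 instance
# in the current region (requires AWS credentials).
list_instance_states() {
  aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
    --output text
}

# Hypothetical helper: read "<instance-id> <state>" lines on stdin, print
# the ones that are not in the running state, and exit non-zero if any exist.
not_running() {
  awk '$2 != "running" { print; bad = 1 } END { exit bad }'
}

# Usage:
#   list_instance_states | not_running && echo "all instances running"
```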

4. Test the connection

Test a connection to <NUCLEUS_SERVER_PREFIX>.<ROOT_DOMAIN> from within the ALLOWED_CIDR_RANGE_01 set in the .env file. Do so by browsing to https://<NUCLEUS_SERVER_PREFIX>.<ROOT_DOMAIN> in your web browser.
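The same test can be scripted. This is a minimal sketch, assuming the variables from the .env file are loaded in the current shell and that curl is available. nucleus_url and check_nucleus are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: build the server URL from the .env variables.
# The ${VAR:?} expansions abort with a message if a variable is unset.
nucleus_url() {
  : "${NUCLEUS_SERVER_PREFIX:?set NUCLEUS_SERVER_PREFIX first}"
  : "${ROOT_DOMAIN:?set ROOT_DOMAIN first}"
  printf 'https://%s.%s\n' "$NUCLEUS_SERVER_PREFIX" "$ROOT_DOMAIN"
}

# Hypothetical helper: print only the HTTP status code returned by the server.
# -sS hides the progress bar but keeps errors; -o /dev/null drops the body.
check_nucleus() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "$(nucleus_url)"
}
```

A connection timeout from check_nucleus usually suggests the request originates outside the allowed CIDR range or that the instances are not yet ready.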

The default admin username for the Nucleus server is 'omniverse'. You can find the password in a Secrets Manager resource via the AWS Secrets Manager Console. Alternatively, from the Omniverse WebUI, you can create a new username and password.

Troubleshooting

Unable to connect to the Nucleus Server

If you cannot connect to the Nucleus server, review the status of the Nginx service and the Nucleus Docker stack. To do so, connect to your instances from the EC2 Console via Session Manager - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/session-manager.html.

  • On the Nginx server, run sudo journalctl -u nginx.service. If this produces no output, the Nginx service is not running.

  • On the Nucleus server, run sudo docker ps. You should see a list of running Nucleus containers.

If either check fails, the Lambda and/or SSM run commands that configure the instances likely failed. Browse to the Lambda Console (https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/functions, adjusting the region to match your deployment) and search for the respective Lambda functions:

  • STACK_NAME-ReverseProxyConfig-CustomResource
  • STACK_NAME-NucleusServerConfig-CustomResource

Review the function CloudWatch Logs.
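The logs can also be reached from the AWS CLI. The sketch below assumes the deployed function names contain the Config-CustomResource suffix shown above and that the functions write to Lambda's default log group, /aws/lambda/<function-name>. find_config_functions and lambda_log_group are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: list Lambda functions whose name contains the
# Config-CustomResource suffix (requires AWS credentials).
find_config_functions() {
  aws lambda list-functions \
    --query "Functions[?contains(FunctionName, 'Config-CustomResource')].FunctionName" \
    --output text
}

# Hypothetical helper: map a function name to Lambda's default log group name.
lambda_log_group() {
  printf '/aws/lambda/%s\n' "$1"
}

# Usage (aws logs tail is available in AWS CLI v2):
#   for fn in $(find_config_functions); do
#     aws logs tail "$(lambda_log_group "$fn")" --since 1h
#   done
```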

No service log entries, or unable to restart nitro-enclave service

If there are issues with either of these, it is likely there was an issue with the Lambda and/or SSM run commands that configure the instances. Browse to the Lambda Console and search for the STACK_NAME-ReverseProxyConfig-CustomResource Lambda Function, then review the CloudWatch Logs.

At times the Reverse Proxy custom resource Lambda function does not trigger on an initial stack deployment. If the reverse proxy instance is in a running state but there are no invocations or logs, terminate the instance, give the auto scaling group a few minutes to create another one, and then try again. Afterwards, check the CloudWatch Logs for the Lambda function: ReverseProxyAutoScalingLifecycleLambdaFunction

Additional Nginx Commands

View Nginx service logs:

sudo journalctl -u nginx.service

Viewing Nginx Logs

sudo cat /var/log/nginx/error.log

sudo cat /var/log/nginx/access.log

Restart Nginx

sudo systemctl restart nginx.service

Additional Nucleus server notes

Review NVIDIA's Documentation - https://docs.omniverse.nvidia.com/prod_nucleus/prod_nucleus/enterprise/installation/quick_start_tips.html

Default base stack and config location: /opt/ove/

Default Omniverse data directory: /var/lib/omni/nucleus-data

Interacting with the Nucleus Server docker compose stack:

sudo docker-compose --env-file ./nucleus-stack.env -f ./nucleus-stack-ssl.yml pull

sudo docker-compose --env-file ./nucleus-stack.env -f ./nucleus-stack-ssl.yml up -d

sudo docker-compose --env-file ./nucleus-stack.env -f ./nucleus-stack-ssl.yml down

sudo docker-compose --env-file ./nucleus-stack.env -f ./nucleus-stack-ssl.yml ps
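Because all four commands share the same flags, a small wrapper can reduce typing. This is a sketch only: nucleus_compose is a hypothetical function, DRY_RUN is a hypothetical switch added for illustration, and the function assumes it is run from the directory that contains nucleus-stack.env and nucleus-stack-ssl.yml.

```shell
#!/bin/sh
# Hypothetical wrapper: prepend the shared flags to any docker-compose
# subcommand. With DRY_RUN=1 the command is printed instead of executed.
nucleus_compose() {
  set -- --env-file ./nucleus-stack.env -f ./nucleus-stack-ssl.yml "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo docker-compose $*"
  else
    sudo docker-compose "$@"
  fi
}

# Usage:
#   nucleus_compose pull
#   nucleus_compose up -d
#   nucleus_compose ps
```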

Generate new secrets

sudo rm -fr secrets && sudo ./generate-sample-insecure-secrets.sh

Getting Help

If you have questions as you explore this sample project, post them to the Issues section of this repository. To report bugs, request new features, or contribute to this open source project, see CONTRIBUTING.md.

Changelog

To view the history and recent changes to this repository, see CHANGELOG.md

Security

See CONTRIBUTING for more information.

License

This sample code is licensed under the MIT-0 License. See the LICENSE file.

References

NVIDIA Omniverse

Learn more about the NVIDIA Omniverse Platform

Omniverse Nucleus

Learn more about the NVIDIA Omniverse Nucleus

Contributors

amazon-auto, dependabot[bot], kellan-cartledge, krokoko

nvidia-omniverse-nucleus-on-amazon-ec2's Issues

Creating EC2 routes fails during deployment because the routes already exist

Environment: Deploying version v2023.1.0 of this repo from the Visual Studio Code console using WSL Ubuntu

  • I am under the impression that the deploy process is making concurrent requests to create the same or similar routes, which is causing the deploy to fail.
  • Perhaps too many route tables are being created, as opposed to one with multiple routes.

Here is the error on the console.

omni-app-nucleus-dev: deploying... [1/1]
omni-app-nucleus-dev: creating CloudFormation changeset...
11:35:59 AM | CREATE_FAILED        | AWS::EC2::Route                           | VpcResourcesOmniVp...faultRouteB86AFBA2
Resource handler returned message: "AlreadyExists" (RequestToken: 080ab3df-22a7-5b69-db67-1cecf711a294, HandlerErrorCode: AlreadyExists)

11:35:59 AM | CREATE_FAILED        | AWS::EC2::Route                           | VpcResourcesOmniVp...faultRouteA291E5F8
Resource handler returned message: "AlreadyExists" (RequestToken: 269005d6-5bb1-ed5b-0e80-20e66be29dac, HandlerErrorCode: AlreadyExists)
Resource handler returned message: "AlreadyExists" (RequestToken: 269005d6-5bb1-ed5b-0e80-20e66be29dac, HandlerErrorCode: AlreadyExists)

Resource handler returned message: "AlreadyExists" (RequestToken: 269005d6-5bb1-ed5b-0e80-20e66be29dac, HandlerErrorCode: AlreadyExists)

11:35:59 AM | CREATE_FAILED        | AWS::EC2::Route                           | VpcResourcesOmniVp...faultRoute118D220B
Resource handler returned message: "AlreadyExists" (RequestToken: 8fd72df0-6e87-020a-5d61-52a01412c72e, HandlerErrorCode: AlreadyExists)

I ran the deploy script, but bootstrapping failed with the message below. I have tried deleting the CDKToolkit stack from CloudFormation and removing the "cdk" buckets from S3, but that did not fix the issue. Please help.

⏳ Bootstrapping environment aws://xxxxxxxxxx/us-west-2...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize.
CDKToolkit: creating CloudFormation changeset...
12:28:32 PM | CREATE_FAILED | AWS::IAM::Policy | ImagePublishingRoleDefaultPolicy
Unable to retrieve Arn attribute for AWS::ECR::Repository, with error message null

❌ Environment aws://xxxxxxxxxxx/us-west-2 failed bootstrapping: Error: The stack named CDKToolkit failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Unable to retrieve Arn attribute for AWS::ECR::Repository, with error message null
at FullCloudFormationDeployment.monitorDeployment (/home/ssm-user/.nvm/versions/node/v16.17.1/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:496:13)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at /home/ssm-user/.nvm/versions/node/v16.17.1/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:626:24
at async Promise.all (index 0)
at CdkToolkit.bootstrap (/home/ssm-user/.nvm/versions/node/v16.17.1/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:623:5)
at initCommandLine (/home/ssm-user/.nvm/versions/node/v16.17.1/lib/node_modules/aws-cdk/lib/cli.ts:357:12)

The stack named CDKToolkit failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Unable to retrieve Arn attribute for AWS::ECR::Repository, with error message null

Using a new version of nucleus-stack causes a configuration error

I used nucleus-stack-2023.2.3 and updated nucleus-stack.env and nginx.conf. CDK ran without errors. When I connect to the server, it asks for a login. After logging in, it says: {nucleus}.{domain}:443 service is not accessible. Please check your network connection and the server status with administrator.

The reverse proxy and all containers run without problem.

Do you have any idea for this error?

Banny

Possible extension of this deployment to incorporate the NVIDIA Deep Search extension

It would be a great addition to allow the NVIDIA Deep Search extension to be added to this VNC. If it cannot be deployed at the same time, some documentation of how it could be added to this VNC setup would be very helpful.

Currently I am unsure if a new EC2 instance should be added for the deep search service and the OpenSearch service.

Here is more info of the service design: https://docs.omniverse.nvidia.com/services/latest/services/deepsearch/design.html
