Cloud Pak for Data 3.0 on AWS and Azure

Cloud Pak for Data is an end-to-end platform that helps organizations in their journey to AI. It enables data engineers, data stewards, data scientists, and business analysts to collaborate on an integrated multicloud platform. Cloud Pak for Data uses IBM’s deep analytics portfolio to help organizations meet data and analytics challenges. The required building blocks (collect, organize, analyze, infuse) for an information architecture are available through Cloud Pak for Data on AWS and Azure.

Cloud Pak for Data uses cloud-native services and features, including VNets, VPCs, Availability Zones, security groups, Managed Disks, and Load Balancers, to build a highly available, reliable, and scalable cloud platform.

This deployment guide provides step-by-step instructions for deploying IBM Cloud Pak for Data on a Red Hat OpenShift Container Platform 4.3 cluster on AWS and Azure.

This reference deployment provides Terraform scripts to deploy Cloud Pak for Data on a new Red Hat OpenShift Container Platform 4.3 cluster on AWS and Azure. This cluster includes:

  • A Red Hat OpenShift Container Platform cluster created in a new or existing VPC on Red Hat CoreOS (RHCOS) instances, using the Red Hat OpenShift installer-provisioned infrastructure (IPI) method.
  • A highly available storage infrastructure with Portworx or OpenShift Container Storage (OCS). You also have the option of NFS on Azure or Amazon Elastic File System (EFS) on AWS.
  • Scalable OpenShift compute nodes running Cloud Pak for Data services. See Services for the services that are enabled in this deployment.

Cost and licenses

The deployment module includes configuration parameters that you can customize; see the AWS and Azure deployment topology for details. Some of these parameters, such as instance type and count, affect the cost of the deployment. For cost estimates, see the pricing pages for the AWS and Azure services you will be using. Prices are subject to change. This deployment requires a Red Hat OpenShift subscription and a Cloud Pak for Data subscription; a 60-day trial license is available for each. See the Prerequisites section.
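
As an illustration only, cost-relevant settings such as instance type and count are passed to Terraform as input variables. The variable names below are placeholders, not the documented parameters; the real names are listed in the cloud-specific deployment guides.

# Hypothetical example: larger or more worker instances increase the cost.
# Variable names are placeholders; check the AWS/Azure guide for the real ones.
terraform apply \
  -var "worker-instance-type=m5.4xlarge" \
  -var "worker-node-count=3"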

Prerequisites

Step 1. Sign up for a Red Hat Subscription

This deployment requires a Red Hat subscription. You’ll need to provide your OpenShift installer-provisioned infrastructure (IPI) pull secret.

If you don’t have a Red Hat account, you can register on the Red Hat website. (Note that registration may require a non-personal email address.) To procure a 60-day evaluation license for OpenShift, follow the instructions at Evaluate Red Hat OpenShift Container Platform. Download the OpenShift pull secret and supply its file location through the Terraform script parameters.
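
For example, passing the pull secret location might look like the following sketch. The parameter name "pull-secret-file-path" is illustrative only; the exact name is defined in the Terraform variables for your chosen cloud.

# Download the pull secret from the Red Hat OpenShift cluster manager, save it
# locally, and point the deployment at it (variable name is a placeholder).
terraform apply -var "pull-secret-file-path=$HOME/pull-secret.txt"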

Step 2. Cloud Pak for Data Subscription

You will need a Cloud Pak for Data entitlement API key to download images from the IBM entitled Cloud Pak registry. If you don't have a paid entitlement, you can create a 60-day trial subscription key. Note: After the 60-day trial expires, contact IBM Cloud Pak for Data sales.
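
One way to supply the key without writing it into a variables file is Terraform's TF_VAR_ environment-variable convention. The variable name "api_key" below is illustrative, not the documented parameter.

# Terraform maps TF_VAR_<name> environment variables onto input variables,
# which keeps the entitlement key out of version-controlled .tfvars files.
export TF_VAR_api_key='<your entitlement API key>'
terraform apply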

Step 3. Storage Subscription

You can select one of two container storage options when installing this Quick Start.

Note: You also have the option of NFS on Azure or Elastic File System on AWS, in which case no additional storage subscription is required.

Portworx

When you select Portworx as the persistent storage layer, you need to specify the install spec from your Portworx account. You can generate a new spec using the Spec Generator. Note that the Portworx trial edition expires 30 days after installation, after which you need to upgrade to the Enterprise edition.

OpenShift Container Storage (OCS) Subscription

The Red Hat OCS license is linked as a separate entitlement to your Red Hat subscription. If you do not have a separate subscription for OCS, a 60-day trial version is installed. Note that OCS v4.x is available only for AWS.

Deployment topology

See AWS topology for AWS details.

See Azure topology for Azure details.

Resource requirements for each service

The following table lists the resource requirements for each service; these requirements determine the number of compute nodes needed for the deployment. Note that the base platform, without any services installed, uses 4 vCPUs. A worked sizing example follows the table.

Service name | CPU cores (vCPUs) | Memory (GB)
--- | --- | ---
Watson Studio Local (non-HA) | 12 | 48
Watson Knowledge Catalog (Small, non-HA) | 27 | 104
Watson Machine Learning (Small) | 16 | 64
Data Virtualization (Small) | 16 | 64
Watson OpenScale (Small, includes WML) | 30 | 120
Spark Engine | 7 | 28
Cognos Dashboards Engine | 4 | 16
Streams | 0.8 | 17
Streams Flows | 0.3 | 0.384
Db2 Warehouse (SMP) | 9 | 102
Db2 Warehouse (MPP) | 41 | 614
DataStage Enterprise Plus | 6 | 24
Cognos Analytics | 11 | 29
Db2 Advanced Edition | 5 | 14
Decision Optimization | 0.9 | 1.5
SPSS Modeler | 11 | 84
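
For example, enabling Watson Studio Local (12 vCPUs, 48 GB) and Watson Machine Learning (16 vCPUs, 64 GB) on top of the base platform requires roughly 12 + 16 + 4 = 32 vCPUs and at least 48 + 64 = 112 GB of memory for the services alone, before counting the base platform's memory and per-node system overhead. On 16-vCPU/64-GB worker nodes, that points to at least three nodes rather than two, to leave scheduling headroom.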

How to Deploy

You need Terraform installed on your client machine.

See the AWS deployment documentation for AWS.

See the Azure deployment documentation for Azure.
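
At a high level, both deployments follow the standard Terraform workflow. The sketch below is illustrative: the directory layout and variables file name are assumptions, and the authoritative steps are in the cloud-specific documentation above.

# Clone the deployment scripts and move into the directory for your cloud
# (layout shown here is illustrative).
git clone https://github.com/IBM/cp4d-deployment.git
cd cp4d-deployment/aws

# Initialize providers, preview the changes, then deploy.
terraform init
terraform plan -var-file=my-settings.tfvars    # hypothetical variables file
terraform apply -var-file=my-settings.tfvars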

Auto Scaling

The number of compute nodes in the cluster is controlled by MachineSets. The cluster creates new compute nodes from a MachineSet if:

  • A pod is unschedulable due to a lack of resources.
  • A node fails its health check for 300 seconds.

Note: Health checks do not run on the master nodes. See the Red Hat documentation for details.
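
Pod-driven scale-up of a MachineSet is configured with a MachineAutoscaler resource, paired with a ClusterAutoscaler resource named "default". A minimal sketch, assuming the cluster autoscaler operator is in use; the names and replica bounds are hypothetical:

oc apply -f - <<EOF
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-2a                      # hypothetical name
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: mycluster-abc12-worker-us-east-2a    # hypothetical MachineSet name
EOF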

To manually scale up or scale down the cluster:

  • Find the MachineSet for the zone that you want to scale:
oc get machineset -n openshift-machine-api
  • To manually increase or decrease the nodes in a zone, set the replicas to the desired count:
oc scale --replicas=<number of nodes for the machineset> machineset <machineset> -n openshift-machine-api
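
For example, assuming oc get machineset listed a MachineSet named mycluster-abc12-worker-us-east-2a (a hypothetical name), scaling that zone to five nodes looks like:

# Scale one zone's workers to 5 nodes; the MachineSet name is hypothetical.
oc scale --replicas=5 machineset mycluster-abc12-worker-us-east-2a -n openshift-machine-api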

Cloud Pak for Data Services

You can browse the various services that are available for use by navigating to the services catalog page in Cloud Pak for Data.

As part of the deployment, the following services can be enabled:

  • Watson Studio
  • Watson Knowledge Catalog
  • Watson Machine Learning
  • Data Virtualization
  • Watson OpenScale
  • Apache Spark
  • Cognos Dashboards
  • Streams
  • Streams Flows
  • Db2 Warehouse
  • DataStage Enterprise Plus
  • Cognos Analytics
  • Db2 Advanced Edition
  • Decision Optimization
  • SPSS Modeler

For information on the other services that are available, visit the Cloud Pak for Data Service Catalog.

Activating Portworx using a key

  • After the installation is complete, activate the license:
# Look up the name of one running Portworx pod.
PX_POD=$(oc get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
# Run pxctl inside that pod to activate the license with your activation ID.
oc exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl license activate <activation id>
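
To confirm that the activation took effect, you can list the installed licenses from the same pod:

# List the licenses Portworx currently has installed.
oc exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl license list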
