
AWS DMS migration runbook with RDS, Redshift, S3 as target

AWS Database Migration Service

AWS Schema Conversion Tool

Oracle to PostgreSQL, Redshift, S3, Athena -- Lab Guide


Dickson Yue, Solutions Architect

[email protected]

RUN THIS WORKSHOP IN AP-SOUTHEAST-1 (SINGAPORE)

Objective

In this lab, you will be performing a migration from Oracle to PostgreSQL using SCT and DMS

High Level Steps

  • Create an AWS CloudFormation stack

    • Source: RDS Oracle - snapshot

    • Targets: RDS PostgreSQL, Redshift, S3

    • IAM Role for DMS S3 access

  • Create AWS Database Migration Instances

  • Connect to your environment

  • Setup AWS Schema Conversion Tool

  • Convert the Oracle schema to PostgreSQL

  • Create Source Endpoint in AWS DMS

  • Create Target Endpoint in AWS DMS -- RDS PostgreSQL

  • Create Migration Task in AWS DMS -- RDS PostgreSQL target

  • Start the migration

  • Generate transactions on Oracle and see the data being migrated to PostgreSQL -- CDC

Optional

  • Create 2 Target Endpoints in AWS DMS -- Redshift and S3

  • Create 2 Migration Tasks in AWS DMS -- Redshift and S3 target

  • Create database and table in Athena data catalog

  • Query data in Redshift and Athena

Prerequisites

Install a SQL Client

Install SCT

Download JDBC Drivers

Or launch an EC2 Windows Instance in the Singapore Region

Please see the optional "Using an EC2 instance for Workshop tools" section below for the procedure

Create AWS Cloudformation Stack

In this step, you will launch an AWS CloudFormation template that will set up the resources needed for this lab.

Instructions

  • Source RDS Oracle Database Configuration
OracleDBName ORCL
OracleDBPassword oraadmin123
OracleDBStorage 100
OracleInstanceType db.t2.medium
  • Target RDS PostgreSQL Database Configuration
RDSDBName postgres
RDSDBUsername postadmin
RDSDBPassword postadmin123
RDSInstanceType db.t2.medium
RDSDBStorage 100
  • Target Redshift Database Configuration
RsDBName dw
RsDBUsername rsadmin
RsDBPassword rsAdmin123
RsInstanceType dc1.large
  • Click Next

  • Tags

Key purpose
Value dms-lab
  • Click 'Next'

Checked "I acknowledge that AWS CloudFormation might create IAM resources."

  • Review and Click Create

-Optional-

Using an EC2 instance for Workshop tools

Launch a Windows EC2 instance from a pre-installed AMI

  • Sign in to the AWS Console and switch to the Singapore Region

  • Go to the EC2 Services

  • Create and Launch an Instance

  • Go to "Community AMIs" to select the AMI with all tools pre-installed

  • Search and Select the AMI "DMS_Workshop_Win_Tools"

  • Select an instance type: t2.micro is sufficient for this workshop

  • Next: Configure Instance Details

  • Number of instances: 1

  • Network: Select the VPC created by the CloudFormation template (e.g. vpc-XXXXXX | dmsworkshop)

  • Auto-assign Public IP: Enable

  • No change to other options

  • Next: Add Storage

  • Next: Add Tags

  • Next: Configure Security Group

  • Allow RDP (port: 3389), from source: "My IP"

  • Review and Launch

  • Launch if no changes are required.

  • Proceed without a key pair and Launch Instances

  • The instance is being provisioned; this can take a few minutes

  • Go to "View Instances"

  • Once the instance is in the running state with Status Checks 2/2, retrieve the public IP address

  • Connect using Microsoft Remote Desktop

  • Hostname: IP or DNS name retrieved from the console

  • Username: Administrator

  • Password: Qo;u@-4Tea!b.*wvnvughW-ll!8sy*vl

AWS Schema Conversion Tool

Install AWS Schema Conversion Tool (on your laptop)

  • In the Global Settings window,

    • Go to Drivers on the left panel

    • Oracle Driver Path: Select the downloaded ojdbc jar file

    • PostgreSQL Driver Path: Select the downloaded postgresql jar file

    • Click OK to Proceed

Open Security Groups from Source / Target Database Instances

For you to access the source and target databases, you will have to add your laptop's IP to the Oracle, PostgreSQL, and Redshift security groups

https://ap-southeast-1.console.aws.amazon.com/ec2/v2/home?region=ap-southeast-1#SecurityGroups:sort=groupId

Add the following inbound rules to the respective security groups

  • Oracle Port -- 1521 > Open to 'My IP'

  • Postgres Port -- 5432 > Open to 'My IP'

  • Redshift Port -- 5439 > Open to 'My IP'

Create a new SCT project

  • In AWS SCT, select File > New Project Wizard

  • Step 1 -- Select Source

Project Name DMS-Workshop-Oracle2PostgreSQL
Location Leave Default - Transactional Database (OLTP)
Source Database Engine Oracle
Target Database Engine Amazon RDS for PostgreSQL
  • Click OK to proceed

  • Step 2 -- Connect to Source Database

    • At the top of the window, click the "Connect to Oracle" icon
Type SID
Server Name DNS name of your Oracle RDS instance
(follow next screen)
Server Port 1521
Oracle SID ORCL
User name dbmaster
Password oraadmin123

RDS endpoints


  • Click 'Test Connection' -- Make sure you get a 'Connection Successful' message

  • Click 'Next' to proceed

  • Step 3 -- Select Target

    • At the top of the window, click the "Connect to Amazon RDS for PostgreSQL" icon
Server Name DNS name of your RDS PostgreSQL instance
Database postgres
User name postadmin
Password postadmin123
  • Click 'Test Connection' -- Make sure you get a 'Connection Successful' message

  • Click 'Finish' to proceed

Run Schema Conversion in SCT

Review the project screen and familiarize yourself with the layout

  • Uncheck all schemas on the left except for the DMS_SAMPLE schema.

  • Select DMS_SAMPLE

  • Click 'Actions' > 'Create Report'

  • Go to the 'Summary' tab on the top and review the generated report

  • Look through which Oracle objects could be automatically converted and which could not. Now, right-click the DMS_SAMPLE schema and click "Convert schema". The schema will be converted and shown on the PostgreSQL instance (it has not been applied yet).

  • Take a few minutes to review the objects being converted.

  • Since the majority of the objects which could not be converted are secondary objects like functions or procedures, right-click the created schema in the right panel and click "Apply to database". This will apply all the converted objects to the PostgreSQL target.

  • The above steps convert all your Oracle objects into PostgreSQL objects. Objects which could not be converted automatically must be handled manually after the migration.

  • At this point, most of the objects from your source Oracle database have been converted and applied to the PostgreSQL target
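As a quick sanity check (a minimal sketch using the standard information_schema catalog, nothing workshop-specific), you can list the applied tables from any SQL client connected to the PostgreSQL target:

-- List the tables SCT created in the converted schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'dms_sample'
ORDER BY table_name;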

Database Migration Service

Create Database Migration Instance

Name dms-workshop-instance
Description dms instance for workshop
Instance Class dms.t2.medium
VPC [cf-stack-name]
Multi-AZ No
Publicly accessible Checked
  • Click 'Create Replication Instance' to proceed

Wait for a couple of minutes for the migration instance to start and change the status to 'available'

Create Source / Target Endpoints

Create Source Endpoint

Endpoint Type Source
Endpoint identifier dms-workshop-oracle
Source engine: oracle
Server name <oracle-rds-dns-endpoint>
get this from here
https://ap-southeast-1.console.aws.amazon.com/rds/home?region=ap-southeast-1#dbinstances
Port 1521
SSL Mode none
User name dbmaster
Password oraadmin123
SID ORCL
VPC [cf-stack-name]
Replication instance dms-workshop-instance
Refresh schemas after successful connection test Checked

  • Click 'Run Test' -> ensure you get the 'Connection tested successfully' message

  • Click on 'Save' to proceed
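Before creating the target endpoint, it can help to confirm that the sample schema is visible on the source. This is a hedged sketch using Oracle's standard dba_tables view (it assumes the dbmaster user has catalog access, which is typical for an RDS master user):

-- Count the tables owned by the sample schema on the Oracle source
SELECT COUNT(*) AS table_count
FROM dba_tables
WHERE owner = 'DMS_SAMPLE';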

Create Target Endpoint

Endpoint Type Target
Endpoint identifier dms-workshop-postgres
Target engine: postgres
Server name <postgres-rds-dns-endpoint>
get this from here
https://ap-southeast-1.console.aws.amazon.com/rds/home?region=ap-southeast-1#dbinstances
Port 5432
SSL Mode none
User name postadmin
Password postadmin123
Database Name postgres
VPC [cf-stack-name]
Replication instance dms-workshop-instance
Refresh schemas after successful connection test Checked

  • Click 'Run Test' -> ensure you get the 'Connection tested successfully' message

  • Click on 'Save' to proceed

  • Once both the source and target database endpoints have been created and successfully tested, you can proceed to the next step.

Create DMS Migration Task

Enter these Details

  • Basic Info

Make sure your configuration looks like the image below

Replication instance dms-workshop-instance
Source endpoint dms-workshop-oracle
Target endpoint dms-workshop-postgres
Migration type Migrate existing data and replicate ongoing changes
Start task on create Checked

  • Task Settings
Target table preparation mode Do nothing
CDC stop mode Don't use custom CDC stop mode
Include LOB columns in replication Limited LOB Mode
Max LOB size (kb) 32KB
Enable logging Checked

Make sure your configuration looks like the image below

  • Table Mappings
Schema Name is DMS_SAMPLE
Table name is like %
Action Include
  • Click 'Add Selection Rule'

  • Under 'Transformation rules' section click on 'add transformation rule' (we will be creating 3 rules here)

  • Rule 1:

    • Target: 'Schema'

    • Schema name is: 'DMS_SAMPLE'

    • Action: 'make lower case'

Target Schema
Schema Name is DMS_SAMPLE
Action Make lowercase

  • Click 'Add transformation rule'.

  • Rule 2:
Target Table
Schema Name is DMS_SAMPLE
Table Name is like %
Action Make lowercase
  • Click 'Add transformation rule'.

  • Rule 3:
Target Column
Schema Name is DMS_SAMPLE
Table Name is like %
Column name is like %
Action Make lowercase
  • Click 'Add transformation rule'.

  • Make sure your configuration looks like the image below.

  • Take a few minutes to review the generated JSON (a sketch of what it should roughly look like follows this list)

  • Click on 'Create Task'
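For reference, the table-mapping JSON generated for the selection rule and the three transformation rules above should look roughly like this sketch (rule IDs and names are assigned by the console; the exact values will differ):

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": { "schema-name": "DMS_SAMPLE", "table-name": "%" },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "2",
      "rule-target": "schema",
      "object-locator": { "schema-name": "DMS_SAMPLE" },
      "rule-action": "convert-lowercase"
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "3",
      "rule-target": "table",
      "object-locator": { "schema-name": "DMS_SAMPLE", "table-name": "%" },
      "rule-action": "convert-lowercase"
    },
    {
      "rule-type": "transformation",
      "rule-id": "4",
      "rule-name": "4",
      "rule-target": "column",
      "object-locator": { "schema-name": "DMS_SAMPLE", "table-name": "%", "column-name": "%" },
      "rule-action": "convert-lowercase"
    }
  ]
}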

Wait for the task to get created and start running.

At this stage, the database migration task should load 100% of the data from Oracle to PostgreSQL (this usually takes tens of minutes).

  • Monitoring the progress for your database migration task

    • Select your newly created database migration task

    • Click on the 'Task monitoring' tab & review the CloudWatch metrics for your task

    • Click on 'Table statistics' tab & review table level stats for your migration

Executing transactions on the source to test CDC

Once your task's initial load is complete, you might want to execute a few transactions on the source. Connect to your source database as dbmaster using your favorite tool: SQL Developer, DBeaver, or even SQL*Plus!

  • Execute the following to sell some tickets:

    • This is a stored procedure in Oracle; it will take roughly 3 minutes to perform 1,000 transactions

exec ticketManagement.generateTicketActivity(0.01,1000);

Once the transactions are committed on source, you should see them on the target.

Check the status on your console > Task > Table Statistics
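You can also check from the PostgreSQL target directly. A hedged sketch, assuming (as the sample schema appears to model it) that selling a ticket populates ticketholder_id; the count should grow as CDC applies the changes:

-- Tickets with a holder assigned, on the PostgreSQL target
SELECT COUNT(*) AS sold_tickets
FROM dms_sample.sporting_event_ticket
WHERE ticketholder_id IS NOT NULL;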

Once you've "sold" some tickets, you can execute the following to "transfer" some tickets


exec ticketManagement.generateTransferActivity(0.1,1000);

Optional: Create DMS endpoints and task for Redshift

Create Target Endpoint

Endpoint Type Target
Endpoint identifier dms-workshop-redshift
Target engine: redshift
Server name* get this from here https://ap-southeast-1.console.aws.amazon.com/redshift/home?region=ap-southeast-1#cluster-list:
Port 5439
SSL mode none
User name rsadmin
Password rsAdmin123
Database name dw
  • Click 'Run Test' -> ensure you get the 'Connection tested successfully' message

  • Click on 'Save' to proceed

  • Once both the source and target database endpoints have been created and successfully tested, you can proceed to the next step.

Create DMS Migration Task

Enter these Details

  • Basic Info

Make sure your configuration looks like the image below

Task name dms-workshop-redshift
Replication instance dms-workshop-instance
Source endpoint dms-workshop-oracle
Target endpoint dms-workshop-redshift
Migration type Migrate existing data and replicate ongoing changes
Start task on create Checked
  • Task Settings
Target table preparation mode Do nothing
CDC stop mode Don't use custom CDC stop mode
Include LOB columns in replication Limited LOB Mode
Max LOB size (kb) 32KB
Enable logging Checked

Let's sync only one table for simplicity

  • Table Mappings
Schema Name is DMS_SAMPLE
Table name is like SPORTING_EVENT_TICKET
Action Include
  • Click on 'Create Task'

Wait for the task to get created and start running.

At this stage, the database migration task should load 100% of the SPORTING_EVENT_TICKET data from Oracle to Redshift (this usually takes tens of minutes).

  • Monitoring the progress for your database migration task

    • Select your newly created database migration task

    • Click on the 'Task monitoring' tab & review the CloudWatch metrics for your task

    • Click on 'Table statistics' tab & review table level stats for your migration

Query data in Redshift

Configure your SQL client (e.g. SQL Workbench/J) with a new Redshift connection

Driver Redshift (com.amazon.redshift.jdbc.Driver)
If you don't have the driver, add one and download it from
https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC42-1.2.7.1003.jar
Url jdbc:redshift://{your redshift endpoint}.ap-southeast-1.redshift.amazonaws.com:5439/dw
get this from here
https://ap-southeast-1.console.aws.amazon.com/redshift/home?region=ap-southeast-1#cluster-list:
User name rsadmin
Password rsAdmin123

Run the script below to verify


select count(*) from dms_sample.sporting_event_ticket;
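If the count looks right, you can go one step further with a small illustrative aggregate (column names are taken from the table definition used elsewhere in this guide):

-- Top sporting events by ticket revenue on Redshift
SELECT sporting_event_id,
       COUNT(*) AS tickets,
       SUM(ticket_price) AS revenue
FROM dms_sample.sporting_event_ticket
GROUP BY sporting_event_id
ORDER BY revenue DESC
LIMIT 10;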

Remark

Note that we did not create a schema in Redshift beforehand: DMS will create the schema if it does not exist. This is not preferred in most cases, as the data types DMS picks might not be optimal, and the distribution key and sort key design are critical to Redshift query performance. The suggestion is to pre-create the target tables with appropriate data types, distribution keys, and sort keys before starting the task (a sketch follows).
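For example, a hand-tuned version of the ticket table might be pre-created like this. This is a minimal sketch: the column types, DISTKEY, and SORTKEY choices here are illustrative assumptions, not values prescribed by the workshop:

-- Pre-create the target table with explicit distribution and sort keys
CREATE SCHEMA IF NOT EXISTS dms_sample;

CREATE TABLE dms_sample.sporting_event_ticket (
  id                BIGINT NOT NULL,
  sporting_event_id BIGINT,
  sport_location_id BIGINT,
  seat_level        SMALLINT,
  seat_section      VARCHAR(15),
  seat_row          VARCHAR(10),
  seat              VARCHAR(10),
  ticketholder_id   BIGINT,
  ticket_price      DECIMAL(8,2)
)
DISTKEY (sporting_event_id)
SORTKEY (sporting_event_id, sport_location_id);

With tables pre-created this way, the task's 'Do nothing' target table preparation mode will load into them instead of letting DMS create its own.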

Optional: Create DMS endpoints and task for S3

Create Target Endpoint

Endpoint Type Target
Endpoint identifier dms-workshop-s3
Target engine: S3
Service Access Role ARN* arn:aws:iam::{account-id}:role/{role-name}
get this from here
https://console.aws.amazon.com/iam/home?region=ap-southeast-1#/roles
Target bucket name* {bucket-name}
Target bucket folder* dms
VPC [cf-stack-name]
Replication instance dms-workshop-instance
Refresh schemas after successful connection test Checked

IAM DMS Role

AWS console -> IAM -> Role

https://console.aws.amazon.com/iam/home?region=ap-southeast-1#/roles

Filter: {cfn-stack-name}

S3

AWS console -> S3

Filter: {cfn-stack-name}

https://s3.console.aws.amazon.com/s3/home?region=ap-southeast-1

Create DMS Migration Task

Enter these Details

  • Basic Info

Make sure your configuration looks like the image below

Task name dms-workshop-s3
Replication instance dms-workshop-instance
Source endpoint dms-workshop-oracle
Target endpoint dms-workshop-s3
Migration type Migrate existing data and replicate ongoing changes
Start task on create Checked
  • Task Settings
Target table preparation mode Do nothing
CDC stop mode Don't use custom CDC stop mode
Include LOB columns in replication Limited LOB Mode
Max LOB size (kb) 32KB
Enable logging Checked

Let's sync only one table for simplicity

  • Table Mappings
Schema Name is DMS_SAMPLE
Table name is like SPORTING_EVENT_TICKET
Action Include
  • Click on 'Create Task'

Wait for the task to get created and start running.

At this stage, the database migration task should load 100% of the SPORTING_EVENT_TICKET data from Oracle to S3 (this usually takes tens of minutes).

  • Monitoring the progress for your database migration task

    • Select your newly created database migration task

    • Click on the 'Task monitoring' tab & review the CloudWatch metrics for your task

    • Click on 'Table statistics' tab & review table level stats for your migration

Once the S3 DMS task is complete, you can verify the files in S3.

In the Amazon S3 bucket, you will find the CSV files.

Configure Athena


Go to Athena console

https://ap-southeast-1.console.aws.amazon.com/athena/home?force&region=ap-southeast-1#query

Create database

Run the script below

CREATE DATABASE IF NOT EXISTS dmsworkshop;

Create the table, changing the S3 location to your bucket

CREATE EXTERNAL TABLE IF NOT EXISTS dmsworkshop.sporting_event_ticket (
  `ID` int,
  `SPORTING_EVENT_ID` int,
  `SPORT_LOCATION_ID` int,
  `SEAT_LEVEL` double,
  `SEAT_SECTION` string,
  `SEAT_ROW` string,
  `SEAT` string,
  `TICKETHOLDER_ID` int,
  `TICKET_PRICE` double
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION 's3://{bucket_name}/dms/DMS_SAMPLE/SPORTING_EVENT_TICKET/'
TBLPROPERTIES ('has_encrypted_data'='false');

Run the script below to verify


SELECT COUNT(*) FROM dmsworkshop.sporting_event_ticket;
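As a further illustrative check (a sketch over the same external table, using the columns declared in the DDL above):

-- Ticket volume and revenue per venue, straight from the CSVs on S3
SELECT sport_location_id,
       COUNT(*) AS tickets,
       SUM(ticket_price) AS revenue
FROM dmsworkshop.sporting_event_ticket
GROUP BY sport_location_id
ORDER BY revenue DESC
LIMIT 10;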

Clean Up Your Lab Environment:

DO NOT FORGET TO TAKE DOWN YOUR ENVIRONMENT

  1. Stop and delete your database migration tasks in DMS

  2. Delete the source/target endpoints in DMS

  3. Delete your DMS replication instance

  4. Delete the CloudFormation stack

Delete the CloudFormation stack from the CloudFormation console.

Appendix

Oracle - Get row count for all tables
SELECT table_name, num_rows
FROM dba_tables
WHERE owner = 'DMS_SAMPLE'
ORDER BY table_name;
Postgres - Get row count for all tables
SELECT relname AS table_name, n_live_tup AS num_rows
FROM pg_stat_user_tables
WHERE schemaname = 'dms_sample'
ORDER BY table_name;
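Note that n_live_tup is an estimate maintained by the statistics collector, so it can lag slightly behind reality. For an exact figure on a specific table, an ordinary count works (the table name here is just an example):

SELECT COUNT(*) FROM dms_sample.sporting_event_ticket;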

Oracle - Command to get database size on disk
SELECT owner, SUM(bytes) / 1024 / 1024 AS size_mb
FROM dba_segments
WHERE owner = 'DMS_SAMPLE'
GROUP BY owner;
Postgres - Command to get database size on disk
SELECT pg_size_pretty(CAST((
  SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name))
  FROM information_schema.tables
  WHERE table_schema = 'dms_sample'
) AS BIGINT)) AS tables_schema_size;

Drop dms_sample and restart the DMS task

drop schema dms_sample cascade;
