
azure-synapse-analytics-workshop-400's Introduction

Azure Synapse Analytics Workshop (level 400, 4 days)

Event preflight checklist.

Workshop Agenda

Day 1: Ingest & Transform
08:00  Azure Synapse Analytics 101
08:30  Demo Walkthrough - Synapse Analytics In Action
08:45  Break
09:00  Design Presentation: Data Loading & Data Lake Organization (*)
10:00  Q&A / Digital Cohort Activity: Data Lake Design & Security Considerations (**)
11:00  Break
11:15  Build Hands-on: Data Integration Part 1 (***)
11:45 / 12:00  Main Break
13:00  Design Presentation: Data Transformations (*)
13:45  Q&A / Digital Cohort Activity: Data Engineering Discussion (**)
14:00  Build Hands-on: Data Integration Part 2 (***)
15:00  End of Day 1

Day 2: Optimize, Query & Secure
08:00  Design Presentation: DW Optimization (*)
08:45  Build Hands-on: DW Optimization Part 1 (***)
10:00  Q&A / Digital Cohort Activity: DW Optimization (**)
10:30  Break
10:45  Design Presentation: DW Optimization Part 2 (*)
11:15  Build Hands-on: DW Optimization Part 2 (***)
11:45 / 12:00  Main Break
12:45  Design Presentation: Security (*)
13:30  Q&A / Digital Cohort Activity: Security (**)
14:00  Break
14:15  Build Hands-on: Security (***)
14:45  Q&A
15:00  End of Day 2

Day 3: Work with Spark, Machine Learning, Monitor & Manage
08:00  Design Presentation: Synapse Spark (*)
09:00  Build Hands-on: Machine Learning (***)
10:00  Break
10:15  Build Hands-on: Spark (***)
11:00  Demo Walkthrough - CI/CD with Synapse Workspaces (**)
11:45 / 12:00  Main Break
12:45  Design Presentation: Monitor & Manage (*)
13:15  Q&A / Digital Cohort Activity: Monitor & Manage (**)
13:45  Break
14:00  Build Hands-on: Monitoring (***)
14:45  Closing (if 3-day event)
15:00  End of Day 3

Day 4: Challenge-based Learning
08:00  Challenge Introduction (****)
08:30  Begin POC Challenge (****)
09:30  Break
09:45  Continue POC Challenge (****)
11:00  Break
11:15  Continue POC Challenge (****)
11:45 / 12:00  Main Break
13:00  Finish POC Challenge (****)
14:00  Challenge Debrief (****)
14:30  Closing
15:00  End of Day 4
 
Key
*     SME Presentations (L400 Technical)
**    Q&A Discussion and small group cohort activity
***   Guided virtual lab experience with remote SME assistance
****  Challenge & team-based learning experience (hack)

Session tracks: Design (addresses the "why" behind designing this way; 8 hr 45 min total), Build (addresses the "how"; 7 hr 15 min total), Deploy (applies the "how" and "why" learning to a scenario; 5 hr 30 min total). Content levels: LEVEL 400 / LEVEL 200-300.

azure-synapse-analytics-workshop-400's People

Contributors

ciprianjichici, codingbandit, daronyondem, givenscj, jasonhorner, joelhulen, kcris, kylebunting, manesh-r, roxanagoidaci, srushti-714, thetrainingboss, zoinertejada


azure-synapse-analytics-workshop-400's Issues

L400 - Day 3 - Lab 6 - Exercise 1

When attempting to check if XGBoost is properly installed in the notebook, the following error is returned:

Exception : Not support to display this type of data: <class 'list'>
Traceback (most recent call last):
File "", line 891, in
File "", line 718, in display
File "", line 671, in _convert
Exception: Not support to display this type of data: <class 'list'>

image

Lab 1 / Exercise 1 / Task 1 / Step 2

You are asking the user to give the new linked service (to Azure Cosmos DB) the same name as the storage account, but when I tried that, it failed with "Name is not unique".

image

Lab 1 / Exercise 1 / Task 2 / Step 1

You're telling the user to create a new dataset, but you are not saying how to do it (clicking the ellipsis next to Datasets). That may be fine for an L400 course, but I'd reconsider for the sake of the user experience.

Lab 2 / Exercise 2 / Task 1 / Step 6

When I clicked "Import projections", I was shown a dialog

image

I picked the IR we created at the beginning of the exercise and clicked "Turn on debug".

DW Optimizer/Task 2/Step 4

The complex query is throwing an error:
SELECT
AVG(TotalProfit) as AvgMonthlyCustomerProfit
FROM
(
SELECT
S.CustomerId
,D.Year
,D.Month
,SUM(S.TotalAmount) as TotalAmount
,AVG(S.TotalAmount) as AvgAmount
,SUM(S.ProfitAmount) as TotalProfit
,AVG(S.ProfitAmount) as AvgProfit
FROM
[wwi_perf].[Sale_Partition01] S
join [wwi].[Date] D on
D.DateId = S.TransactionDateId
WHERE
D.Year in (2010, 2011, 2012, 2013, 2014)
GROUP BY
S.CustomerId
,D.Year
,D.Month
) T

Lab 2 / Exercise 3 / Task 3 / Step 7

For some reason, I'm seeing 5,216,822 rows as opposed to the 15M rows you are getting. It could be that I messed something up in the data flow, but I've reviewed everything and can't find any differences. I'd advise you to test it again and see whether you get the same results.

image

DW Optimization/Exercise 1/Task1/Step 5

It would be good to explicitly state that the user needs to run the query:

SELECT TOP 1000 * FROM
(
SELECT
S.CustomerId
,SUM(S.TotalAmount) as TotalAmount
FROM
[wwi_perf].[Sale_Heap] S
GROUP BY
S.CustomerId
) T
OPTION (LABEL = 'Lab03: Heap')

before querying sys.dm_pdw_exec_requests.
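For reference, a minimal sketch of that follow-up DMV query, reusing the 'Lab03: Heap' label from the statement above:

-- Find the labeled request in the dedicated SQL pool's request DMV;
-- the label value matches the OPTION (LABEL = ...) hint used above.
SELECT request_id, command, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Lab03: Heap';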

Missing .\artifacts\environment-setup\labfiles\AzureCred.ps1

Could you please post an example of what this AzureCred.ps1 file should look like, since it is missing from the repo? I'd like to get an idea of what I need to write so the repo's scripts can create the environment and all of the linked services.

DW Optimization Exercise1/Task1

Please explicitly state that the query needs to run on the dedicated SQL pool. Initially I didn't realize it was running on SQL on-demand, and it was throwing an error.

Lab 2 / Exercise 4 / Step 8

Possibly because I'm getting much less data (see the previous issue I logged), from this step forward I'm not getting any rows back from the queries.

Step 8:
image

Step 9:
image

Step 10:
image

Step 11:
image

Step 12:
image

Datasets

Could you please document (and respond to this issue with) which datasets need to be in the data lake for these labs to succeed?

DW Optimizer/Exercise2/Task2/Step 4

  1. Same issue: I can't see the query plan, and it asks me to download SSMS to view it.

  2. Also, it's not clear where in the query plan you can see the optimization resulting from the materialized view.

L400 - Lab 6 – Exercise 1

When running the job executions in Lab 6, Exercise 1, cell 7 returns an error. I wonder whether a previous lab includes an Azure AD integration step, or whether some steps are missing from this lab.
Creating the Spark temporary views for sales and products fails with an authentication error (cell 7). See below.

Error : com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ''. ClientConnectionId:f934fdeb-4dbd-4e17-b8b4-f3e4c90d00fd at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258) at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:256) at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:108) at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290) at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157) at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$100(SQLServerConnection.java:82) at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026) at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866) at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54) at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70) at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70) at scala.util.Try$.apply(Try.scala:192) at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper.createConnection(SQLAnalyticsJDBCWrapper.scala:70) at com.microsoft.spark.sqlanalytics.utils.Utils$.createConnection(Utils.scala:287) at com.microsoft.spark.sqlanalytics.read.SQLAnalyticsReader.com$microsoft$spark$sqlanalytics$read$SQLAnalyticsReader$$connection$lzycompute(SQLAnalyticsReader.scala:39)

It appears we are missing instructions for an authentication prerequisite, or it was not set up correctly as part of the lab: https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-ver15

DW Optimization/Exercise 1/Task1/Step 4

It is not clear from the instructions what the user needs to do to use the EXPLAIN statement to get the data movement recommendations and the cost of the query operations.
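As a hedged illustration (not the lab's exact text), EXPLAIN is simply prefixed to the statement being analyzed, and WITH_RECOMMENDATIONS adds data movement recommendations; the query below reuses the Sale_Heap table from the earlier steps:

-- Returns an XML query plan with estimated costs, data movement operations,
-- and (with WITH_RECOMMENDATIONS) suggested changes such as distribution choices.
EXPLAIN WITH_RECOMMENDATIONS
SELECT S.CustomerId, SUM(S.TotalAmount) AS TotalAmount
FROM [wwi_perf].[Sale_Heap] S
GROUP BY S.CustomerId;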

CLI 2.0 Commands

In the 01-environment-setup.ps1 script, you are not using the newest Azure CLI 2.0 command. I had to update to the following:

$userName = ((az account show) | ConvertFrom-JSON).user.name

Versus the deprecated command shown below, at line 65:
$userName = ((az ad signed-in-user show) | ConvertFrom-JSON).UserPrincipalName

Lab 1 / Exercise 4 / Task 2 / Step 19

The pipeline run is failing for me with an error message:

{
"errorCode": "2200",
"message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,Source=.Net SqlClient Data Provider,SqlErrorNumber=105211,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=105211,State=1,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,},],'",
"failureType": "UserError",
"target": "Copy sales",
"details": []
}

L400 – Lab 6 Exercise 1 – Documentation

This exercise highlights four basic things that need to be performed, but when you open the notebook there are about 51 cells to run. Does running these cells cover everything needed to complete the lab? I would suggest a clearer connection between what needs to be performed and what we are actually doing in the lab exercise.

Day 2/Lab2/Exercise 2/Part2

While I was able to pick another table and run the transaction, it would be great to add a bit more information about the use case here. Just executing the steps didn't make it clear to me how the table distribution impacts query performance.
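A small sketch of the kind of context that would help, assuming the hash-distributed and heap tables already used elsewhere in the labs (wwi_perf.Sale_Hash and wwi_perf.Sale_Heap); the example table name in the CTAS is hypothetical:

-- Compare how rows are spread across the 60 distributions for the two tables;
-- a hash-distributed table co-locates rows sharing the distribution key,
-- which avoids data movement when queries join or aggregate on that key.
DBCC PDW_SHOWSPACEUSED('wwi_perf.Sale_Hash');
DBCC PDW_SHOWSPACEUSED('wwi_perf.Sale_Heap');

-- Hypothetical CTAS showing how the distribution choice is declared
CREATE TABLE wwi_perf.Sale_Hash_Example
WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM wwi_perf.Sale_Heap;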

Lab 1 / Exercise 6 / Task 2 / Step 14

My destination column names are slightly different from yours. For some reason, they do not show the type after the name, as yours do.

image

I double-checked that my source and sink match yours.

image

image

Lab 3/Exercise 2/Step 4

It would be good to provide some guidance on why this feature is important before the user begins this exercise: for example, why we need to turn on the statistics and why they matter to the query optimizer.
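For instance, a hedged sketch of the kind of explanation that would help (table and column names borrowed from earlier labs):

-- Check whether automatic statistics creation is enabled for the pool;
-- without up-to-date statistics the optimizer's cardinality estimates, and
-- therefore its plan choices, can be far off.
SELECT name, is_auto_create_stats_on
FROM sys.databases;

-- Manually create single-column statistics the optimizer can use
CREATE STATISTICS Sale_Hash_CustomerId_stats
ON wwi_perf.Sale_Hash (CustomerId);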

Lab 1 / Exercise 3 / Task 5 / Step 5

At first, my chart displayed differently from yours.

image

I noticed that in your chart the categories are shown in descending order; mine are not ordered at all. To get the same results, I added an ORDER BY clause to the query.

image
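For illustration, the shape of the change I made (table and column names here are hypothetical, since the lab's actual query isn't reproduced in this issue):

-- Adding ORDER BY ... DESC makes the chart categories come back sorted by the
-- measure, matching the guide's screenshot.
SELECT ProductCategory, SUM(ProfitAmount) AS TotalProfit
FROM SalesByCategory            -- hypothetical source for the chart
GROUP BY ProductCategory
ORDER BY TotalProfit DESC;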

Lab 1 / Exercise 4 / Task 1 / Header

It feels like the second paragraph is a duplicate and conveys the same idea as the first:

Task 1: Configure workload management classification
When loading a large amount of data, it is best to run only one load job at a time for fastest performance. If this isn't possible, run a minimal number of loads concurrently. Be sure that you allocate enough memory to the pipeline session. To do this, increase the resource class of a user which has permissions to rebuild the index on this table to the recommended minimum.

For fastest loading speed, run only one load job at a time. If that isn't feasible, run a minimal number of loads concurrently. If you expect a large loading job, consider scaling up your SQL pool before the load.
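For context, a minimal sketch of what the task is describing, with a hypothetical login name (asa.sql.import01) standing in for whichever user the lab actually uses:

-- Option 1: put the loading user into a larger static resource class
EXEC sp_addrolemember 'staticrc60', 'asa.sql.import01';

-- Option 2: classify the user's requests into a larger resource class via workload management
CREATE WORKLOAD CLASSIFIER [wcELTLoads]
WITH (WORKLOAD_GROUP = 'xlargerc', MEMBERNAME = 'asa.sql.import01');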

DW Optimization/Exercise/Task1/Step 3

Small typo:

Check the structure of the Sale_Heap table, by right clicking on it in the Data bub and selecting New SQL script and then CREATE. Take a look at the script used to create the table:

Please correct it to Data Hub

Lab3/Exercise2/task3

Setting the result set cache to ON throws a syntax error. I am using the following command:
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON
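One likely cause, worth double-checking: at the database level this statement has to be executed while connected to the master database of the server, not to the SQL pool itself. A sketch:

-- Run while connected to master (not to SQLPool01); database-level
-- RESULT_SET_CACHING can only be changed from master.
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON;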

Lab 1 / Exercise 3 / Task 1 / Step 4

When trying to create the [external] schema, it tells me the schema already exists. It could be a bug in the lab guide, or it could be that someone already ran this lab and the environment wasn't cleaned up.
image
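If it helps triage, a hedged sketch of a guard that would let the step tolerate re-runs:

-- Create the schema only if it is not already present, so the step can be re-run safely
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'external')
    EXEC('CREATE SCHEMA [external]');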

L400 – Lab 6 – Documentation

This lab can be very confusing. A user can spend quite a bit of time digging through all of the text just trying to figure out where to go to perform the lab. I would highlight more clearly that this lab needs to be performed in "asaworkspace03" and that "Synapse Analytics" can be found in the upper right. For example, there is so much text about ONNX, but only half a line telling you where to go to perform the lab.

Lab 3/Exercise 2/Step 5/Step 2

The query using the clustered index column retrieved the results in 1 second, versus the 15 seconds it took with the CCI option.

This is contrary to what is mentioned in the steps.
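To make the comparison easier to reproduce, a sketch of how both runs could be timed from the DMV (the label filter is hypothetical); a warm run or result set caching, if enabled, can make the second query look unrealistically fast:

-- After re-running each variant with OPTION (LABEL = '...'), compare elapsed times here
SELECT [label], total_elapsed_time, submit_time
FROM sys.dm_pdw_exec_requests
WHERE [label] LIKE 'Lab03%'
ORDER BY submit_time DESC;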

Lab 2 / Exercise 2 / Task 3 / Step 13

It's unclear to me how we're changing the ProductCategory column in this transformation. Both before and after the Derived column task, I see Décor shown the same way:

Before:
image

After: (notice how the Revenue column is well formed, but no changes to the category you highlighted in the instructions)
image

DW Optimizer/Lab 1/Exercise 2/Step 6

DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD ( 'wwi_perf.mvCustomerSales' )

Unable to run this query as it says invalid object name:

Failed to execute query. Error: Invalid object name 'SQLPool01.wwi_perf.mvCustomerSales'.
Total execution time: 00:00:00.288
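In case the earlier step that creates the view was skipped, a hypothetical sketch of creating a materialized view with that name (this is not necessarily the lab's exact definition):

-- Hypothetical definition so that DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD has an object to inspect
CREATE MATERIALIZED VIEW wwi_perf.mvCustomerSales
WITH (DISTRIBUTION = HASH(CustomerId))
AS
SELECT S.CustomerId,
       SUM(S.TotalAmount) AS TotalAmount,
       COUNT_BIG(*) AS RowCountBig
FROM [wwi_perf].[Sale_Partition01] S
GROUP BY S.CustomerId;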

Lab 1 / Exercise 1 / Task 1 / Step 1

You're asking the user to go to the Synapse workspace, but you don't provide a link. I know this is L400, but that doesn't mean everyone knows the URL web.azuresynapse.net. It may be helpful to provide a link.

Lab 1 / Exercise 2 / Task 2 / Step 1

Minor, but worth noting: you're asking the user to "navigate" to the folder where the Parquet file is, but we're already there from the previous exercise. Typically in other Microsoft courses, you would say "navigate to the folder if needed".

Lab 2 / Exercise 1 / Task 3 / Step 3

You are telling the user to leave configurations as they are, but for every field you say "Compression type: Select none", "Column delimiter: Select Comma (,)", etc. It should not say "select", because it feels like we need to change something.

Empty Storage Containers with Cosmos DB Data

The Cosmos DB documents are not copying over via AzCopy. It appears there may not be any data at the path below:

https://solliancepublicdata.blob.core.windows.net/wwi-02/online-user-profiles-02/

Would you please verify if this is the correct path? I can modify the code below to use the updated path.

The relevant PowerShell code is as follows:
# Copy each public source directory into the data lake, preserving the folder layout
foreach ($dataDirectory in $dataDirectories.Keys) {

    $vals = $dataDirectories[$dataDirectory].tostring().split(",");

    $source = $publicDataUrl + $vals[1];

    $path = $vals[0];

    $destination = $dataLakeStorageBlobUrl + $path + $destinationSasKey
    Write-Information "Copying directory $($source) to $($destination)"
    & $azCopyCommand copy $source $destination --recursive=true
}

Day 2/Lab2/Exercise 2/Part1

DBCC PDW_SHOWSPACEUSED('wwi_perf.Sale_Hash')

Running this command throws the following error:
Failed to execute query. Error: Could not find object 'Sale_Hash' or you do not have permission.
Total execution time: 00:00:00.311
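One quick check worth adding to the guide (a sketch): confirm the table actually exists under the expected schema before running the DBCC command; as far as I can tell it is created by a CTAS step earlier in the lab.

-- Verify the table was created by the earlier lab steps before inspecting its space usage
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE t.name = 'Sale_Hash';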
