azure-synapse-analytics-workshop-400's Issues
L400 – Lab 6 Exercise 1 – Documentation
This exercise seems to highlight that there are 4 basic things that need to be performed, but when you log into the notebook there are about 51 cells that need to be run. Does running these cells complete the items needed to finish this lab? I would suggest a clearer connection between what needs to be performed and what we are doing in the lab exercise.
Lab 2 / Exercise 1 / Task 1 / Step 4
This step is no longer needed
Lab 3/Exercise 2/Step 5/Step 2
The query with the clustered index column retrieved the results in 1 second, versus the 15 seconds it took with the CCI option.
This is contrary to what is stated in the steps.
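For reference, here is a minimal sketch of the two table variants being compared; the table and column names are assumptions for illustration, not taken verbatim from the lab:
CREATE TABLE wwi_perf.Sale_Index
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED INDEX (CustomerId)  -- rowstore clustered index
)
AS SELECT * FROM wwi_perf.Sale_Heap;

CREATE TABLE wwi_perf.Sale_CCI
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX  -- the Synapse default
)
AS SELECT * FROM wwi_perf.Sale_Heap;
For a highly selective lookup, a rowstore clustered index can legitimately beat a CCI, so the observed timings may be correct and the lab text may need to qualify its claim.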
L400 - Lab 6 – Exercise 1
When running the lab job executions in Lab 6, Exercise 1, Cell 7 returns an error. I wonder whether a previous lab sets up an AD integration, or whether some steps are missing in this lab.
Creating Spark temporary views for sales and products fails with an authentication error (cell 7). See below.
Error: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ''. ClientConnectionId:f934fdeb-4dbd-4e17-b8b4-f3e4c90d00fd
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:256)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:108)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$100(SQLServerConnection.java:82)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70)
at scala.util.Try$.apply(Try.scala:192)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper.createConnection(SQLAnalyticsJDBCWrapper.scala:70)
at com.microsoft.spark.sqlanalytics.utils.Utils$.createConnection(Utils.scala:287)
at com.microsoft.spark.sqlanalytics.read.SQLAnalyticsReader.com$microsoft$spark$sqlanalytics$read$SQLAnalyticsReader$$connection$lzycompute(SQLAnalyticsReader.scala:39)
It appears that instructions for an authentication prerequisite are missing, or it was not set up correctly as part of the lab: https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-ver15
Lab 2 / Exercise 1 / Task 3 / Step 3
You are telling the user to leave the configurations as they are, but for every field you say "Compression type: Select none", "Column delimiter: Select Comma (,)", etc. It should not say "select", because it feels like we need to change something.
Lab 1 / Exercise 1 / Task 2 / Step 1
You're telling the user to create a new dataset, but you are not saying how to do it (clicking the ellipsis next to Datasets). This could be OK for an L400 course, but I'd reconsider it for the sake of the user experience.
L400 – Lab 6 – Documentation
This lab can be very confusing. A user can spend quite a bit of time digging through all of the text just trying to figure out where to go to perform the lab. I would highlight more clearly that this lab needs to be performed on “asaworkspace03” and that “Synapse Analytics” can be located in the upper right. For example, there is so much text about ONNX, but only half a line telling you where to go to perform the lab.
Day 2/Lab2/Exercise 2/Part1
DBCC PDW_SHOWSPACEUSED('wwi_perf.Sale_Hash')
Running this command throws the following error:
Failed to execute query. Error: Could not find object 'Sale_Hash' or you do not have permission.
Total execution time: 00:00:00.311
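As a quick sanity check, the following (a minimal sketch; the schema and table names are the ones from the lab) confirms whether the table exists in the pool you are connected to:
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = 'wwi_perf' AND t.name = 'Sale_Hash';
If this returns no rows, the earlier CTAS step that creates Sale_Hash was probably skipped or run against a different pool.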
DW Optimization/Exercise/Task1/Step 2
Even though this is level 400 content, it would be good to link certain concepts to their docs pages for better understanding. For example, sys.dm_pdw_exec_requests could link to: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-exec-requests-transact-sql?view=aps-pdw-2016-au7
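For context, a typical use of that DMV (a hedged sketch, not taken from the lab) is listing the slowest recent requests:
SELECT TOP 10 request_id, [status], total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;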
DW Optimization/Exercise 1/Task1/Step 4
It is not clear from the instructions what the user needs to do to use the EXPLAIN function to get the data movement recommendations and the cost of the query operations.
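For anyone else who is stuck: the pattern is to prefix the query with EXPLAIN, which returns the estimated plan as XML instead of executing the query. A minimal sketch, assuming the Sale_Heap table from this lab:
EXPLAIN WITH_RECOMMENDATIONS
SELECT COUNT_BIG(*)
FROM [wwi_perf].[Sale_Heap];
The WITH_RECOMMENDATIONS option is what surfaces the data movement recommendations.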
Lab 1 / Exercise 1 / Task 1 / Step 2
DW Optimization/Exercise 1/Task 2
Please add that users cannot generate the query plan from Synapse Studio and have to go to SSMS to download it.
Empty Storage Containers with Cosmos DB Data
The Cosmos DB documents are not copying over via AzCopy. It appears there may not be any data at the path below:
https://solliancepublicdata.blob.core.windows.net/wwi-02/online-user-profiles-02/
Would you please verify if this is the correct path? I can modify the code below to use the updated path.
The relevant PowerShell code is as follows:
foreach ($dataDirectory in $dataDirectories.Keys) {
    # Each entry is "<destination path>,<source suffix>"
    $vals = $dataDirectories[$dataDirectory].tostring().split(",");
    $source = $publicDataUrl + $vals[1];
    $path = $vals[0];
    $destination = $dataLakeStorageBlobUrl + $path + $destinationSasKey
    Write-Information "Copying directory $($source) to $($destination)"
    & $azCopyCommand copy $source $destination --recursive=true
}
DW Optimization Exercise1/Task1
Please explicitly state that the query needs to run on the SQL pool. Initially, I didn't realize it was running on SQL on-demand and throwing an error.
DW Optimizer/Lab 1/Exercise 2/Step 6
DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD ( 'wwi_perf.mvCustomerSales' )
Unable to run this query, as it says invalid object name:
Failed to execute query. Error: Invalid object name 'SQLPool01.wwi_perf.mvCustomerSales'.
Total execution time: 00:00:00.288
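If the view is genuinely missing rather than mis-referenced, recreating it should unblock the DBCC command. A hedged sketch, assuming a per-customer aggregation like the one used earlier in the lab (the source table and columns are assumptions):
CREATE MATERIALIZED VIEW wwi_perf.mvCustomerSales
WITH (DISTRIBUTION = HASH(CustomerId))
AS
SELECT
    S.CustomerId
    ,COUNT_BIG(*) AS SaleCount
    ,SUM(S.TotalAmount) AS TotalAmount
FROM [wwi_perf].[Sale_Hash] S
GROUP BY S.CustomerId;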
Lab 1 / Exercise 3 / Task 5 / Step 5
Lab 2 / Exercise 4 / Step 8
DW Optimizer/Task 2/Step 4
The complex query is throwing an error:
SELECT
AVG(TotalProfit) as AvgMonthlyCustomerProfit
FROM
(
SELECT
S.CustomerId
,D.Year
,D.Month
,SUM(S.TotalAmount) as TotalAmount
,AVG(S.TotalAmount) as AvgAmount
,SUM(S.ProfitAmount) as TotalProfit
,AVG(S.ProfitAmount) as AvgProfit
FROM
[wwi_perf].[Sale_Partition01] S
join [wwi].[Date] D on
D.DateId = S.TransactionDateId
WHERE
D.Year in (2010, 2011, 2012, 2013, 2014)
GROUP BY
S.CustomerId
,D.Year
,D.Month
) T
Lab 1 / Exercise 4 / Task 2 / Step 19
The pipeline run is failing for me with an error message:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,Source=.Net SqlClient Data Provider,SqlErrorNumber=105211,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=105211,State=1,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,},],'",
"failureType": "UserError",
"target": "Copy sales",
"details": []
}
DW Optimizer/Exercise2/Task2/Step 4
- Same issue: I can't see the query plan, and it asks me to download SSMS to see it.
- Also, it's not clear where in the query plan you see the optimization resulting from the materialized view (one workaround sketched below).
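One workaround that avoids SSMS (a hedged sketch; the query shape is assumed, mvCustomerSales is the view from the earlier exercise) is to prefix the aggregate query with EXPLAIN and search the returned plan XML for the view's name, since the optimizer rewrites matching queries against it:
EXPLAIN
SELECT S.CustomerId, SUM(S.TotalAmount) AS TotalAmount
FROM [wwi_perf].[Sale_Hash] S
GROUP BY S.CustomerId;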
Lab 1 / Exercise 2 / Task 2 / Step 1
Minor, but worth noting: you're asking the user to "navigate" to the folder where the Parquet file is, but note that we're already there because of the previous exercise. Typically, in other Microsoft courses, you would say "navigate to the folder if needed".
L400 - Day 3 - Lab 6 - Exercise 1
When attempting to check if XGBoost is properly installed in the notebook, the following error is returned:
Exception : Not support to display this type of data: <class 'list'>
Traceback (most recent call last):
File "", line 891, in
File "", line 718, in display
File "", line 671, in _convert
Exception: Not support to display this type of data: <class 'list'>
Lab 1 / Exercise 4 / Task 1 / Header
It feels like the second paragraph is duplicated and conveys the same idea as the first
Task 1: Configure workload management classification
When loading a large amount of data, it is best to run only one load job at a time for fastest performance. If this isn't possible, run a minimal number of loads concurrently. Be sure that you allocate enough memory to the pipeline session. To do this, increase the resource class of a user which has permissions to rebuild the index on this table to the recommended minimum.
For fastest loading speed, run only one load job at a time. If that isn't feasible, run a minimal number of loads concurrently. If you expect a large loading job, consider scaling up your SQL pool before the load.
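For reference, these are the two common ways to do this in a dedicated SQL pool; a minimal sketch, where the login 'asa.sql.import01' and the classifier name are illustrative assumptions:
-- Option 1: assign the loading user to a larger static resource class
EXEC sp_addrolemember 'staticrc60', 'asa.sql.import01';

-- Option 2: workload management classification
CREATE WORKLOAD CLASSIFIER [wgcLoadUser]
WITH
(
    WORKLOAD_GROUP = 'xlargerc',
    MEMBERNAME = 'asa.sql.import01',
    IMPORTANCE = HIGH
);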
Lab 1 / Exercise 6 / Task 2 / Step 14
Lab 2 / Exercise 2 / Task 4 / Step 2
Missing step to expand the "Tables" item
Day 2/Lab2/Exercise 2/Part2
While I was able to pick another table and run the transaction, it would be great to add a bit more info on the use case here. Just executing the steps didn't make it click for me how the table distribution impacts query performance.
Lab 1 / Exercise 3 / Task 2 / Step 4
Lab 2 / Exercise 2 / Task 1 / Step 6
Lab 2 / Exercise 1 / Task 1 / Step 3
Datasets
Could you please document, and respond to this issue with, the datasets that need to be in the data lake for these labs to be successful?
L400-Day 3-Lab 6-Exercise 1 Cell 7 line magic UsageError
Lab 1 / Exercise 3 / Task 1 / Step 4
Lab 2 / Exercise 2 / Task 3 / Step 13
DW Optimization/Exercise/Task1/Step 3
Small typo:
Check the structure of the Sale_Heap table, by right clicking on it in the Data bub and selecting New SQL script and then CREATE. Take a look at the script used to create the table:
Please correct it to Data Hub
CLI 2.0 Commands
In the 01-environment-setup.ps1 script, you are not using the newest CLI 2.0 command. I had to update it to the following:
$userName = ((az account show) | ConvertFrom-JSON).user.name
versus the deprecated command shown below, at line 65:
$userName = ((az ad signed-in-user show) | ConvertFrom-JSON).UserPrincipalName
Lab3/Exercise2/task3
Setting the result set cache to "On" throws a syntax error. I am using the following command:
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON
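In case it helps others: this statement must be executed while connected to the master database, which is a likely cause of the syntax error. A minimal sketch using the pool name from the lab:
-- Run from a connection to master, not to SQLPool01 itself
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON;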
Lab 2 / Exercise 3 / Task 3 / Step 7
Missing .\artifacts\environment-setup\labfiles\AzureCred.ps1
Could you please post an example of what this AzureCred.ps1 file should look like, as it is missing from the repo? I'd like to get an idea of what I need to write to make the repo work to create the environment and linked services.
DW Optimization/Exercise 1/Task1/Step 5
It would be good to explicitly state that the user needs to run the query:
SELECT TOP 1000 * FROM
(
SELECT
S.CustomerId
,SUM(S.TotalAmount) as TotalAmount
FROM
[wwi_perf].[Sale_Heap] S
GROUP BY
S.CustomerId
) T
OPTION (LABEL = 'Lab03: Heap')
before querying sys.dm_pdw_exec_requests.
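The follow-up query would then filter the DMV on that label; a minimal sketch using the label from the query above:
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Lab03: Heap';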
Lab 3/Exercise 2/Step 4
It would be good to provide some guidance on the importance of this feature before the user begins the exercise, e.g. why we need to turn on the stats and why they are important for the query optimizer.
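For context, the optimizer relies on statistics to estimate row counts when choosing join strategies and data movement, so missing or stale stats lead to poor plans. A minimal sketch of the mechanics (the pool, table, and statistics names are assumptions):
-- Enable automatic creation of single-column statistics for the pool
ALTER DATABASE [SQLPool01]
SET AUTO_CREATE_STATISTICS ON;

-- Or create and inspect statistics manually
CREATE STATISTICS Stat_Sale_Hash_CustomerId ON wwi_perf.Sale_Hash (CustomerId);
DBCC SHOW_STATISTICS ('wwi_perf.Sale_Hash', Stat_Sale_Hash_CustomerId);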
Lab 1 / Exercise 1 / Task 1 / Step 1
You're asking the user to go to the Synapse workspace, but you don't provide a link. I know this is L400, but that doesn't mean everyone knows the link to web.azuresynapse.net. It may be helpful to provide one.
L400 – Lab 6 – Broken links
It would appear that all links on https://github.com/solliancenet/azure-synapse-analytics-workshop-400/blob/master/day-03/lab06-machine-learning.md#exercise-1---training-models are broken.
For example, the links in Exercise 2, Steps 1 and 3 return an error. This seems to be the case for nearly all links on this page.
Lab 1 / Exercise 6 / Task 2 / Step 19
You say we can check the execution of the pipeline "here", but you don't mention where that is. I went to Monitor -> Pipeline runs.