azure-synapse-analytics-workshop-400's Issues
L400 – Lab 6 Exercise 1 – Documentation
This exercise seems to highlight that there are 4 basic things that need to be performed, but when you log into the notebook there are about 51 cells that need to be run. Does running these cells complete the items needed to finish this lab? I would suggest a clearer connection between what needs to be performed and what we are doing in the lab exercise.
Lab 2 / Exercise 1 / Task 1 / Step 4
This step is no longer needed
Lab 3/Exercise 2/Step 5/Step 2
The query with the clustered index column retrieved the results in 1 second, versus the 15 seconds it took with the CCI option.
This is contrary to what is stated in the steps.
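For reference, here is a minimal sketch of the two table variants being compared; the table and column names are assumptions for illustration, not taken verbatim from the lab:
CREATE TABLE wwi_perf.Sale_Index
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED INDEX (CustomerId)  -- rowstore clustered index
)
AS SELECT * FROM wwi_perf.Sale_Heap;

CREATE TABLE wwi_perf.Sale_CCI
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX  -- the Synapse default
)
AS SELECT * FROM wwi_perf.Sale_Heap;
For a highly selective lookup, a rowstore clustered index can legitimately beat a CCI, so the observed timings may be correct and the lab text may need to qualify its claim.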
L400 - Lab 6 – Exercise 1
When running the lab job executions in Lab 6, Exercise 1, Cell 7 returns an error. I wonder whether a previous lab sets up an AD integration, or whether some steps are missing in this lab.
Creating Spark temporary views for sales and products fails with an authentication error (cell 7). See below.
Error: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ''. ClientConnectionId:f934fdeb-4dbd-4e17-b8b4-f3e4c90d00fd
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:256)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:108)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$100(SQLServerConnection.java:82)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper$$anonfun$3.apply(SQLAnalyticsJDBCWrapper.scala:70)
at scala.util.Try$.apply(Try.scala:192)
at com.microsoft.spark.sqlanalytics.utils.SQLAnalyticsJDBCWrapper.createConnection(SQLAnalyticsJDBCWrapper.scala:70)
at com.microsoft.spark.sqlanalytics.utils.Utils$.createConnection(Utils.scala:287)
at com.microsoft.spark.sqlanalytics.read.SQLAnalyticsReader.com$microsoft$spark$sqlanalytics$read$SQLAnalyticsReader$$connection$lzycompute(SQLAnalyticsReader.scala:39)
It appears that instructions for an authentication prerequisite are missing, or it was not set up correctly as part of the lab: https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-ver15
Lab 2 / Exercise 1 / Task 3 / Step 3
You are telling the user to leave the configurations as they are, but for every field you say "Compression type: Select none", "Column delimiter: Select Comma (,)", etc. It should not say "select", because it feels like we need to change something.
Lab 1 / Exercise 1 / Task 2 / Step 1
You're telling the user to create a new dataset, but you are not saying how to do it (clicking the ellipsis next to Datasets). This could be OK for an L400 course, but I'd reconsider it for the sake of the user experience.
L400 – Lab 6 – Documentation
This lab can be very confusing. A user can spend quite a bit of time digging through all of the text just trying to figure out where to go to perform the lab. I would highlight more clearly that this lab needs to be performed on “asaworkspace03” and that “Synapse Analytics” can be located in the upper right. For example, there is so much text about ONNX, but only half a line telling you where to go to perform the lab.
Day 2/Lab2/Exercise 2/Part1
DBCC PDW_SHOWSPACEUSED('wwi_perf.Sale_Hash')
Running this command throws the following error:
Failed to execute query. Error: Could not find object 'Sale_Hash' or you do not have permission.
Total execution time: 00:00:00.311
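As a quick sanity check, the following (a minimal sketch; the schema and table names are the ones from the lab) confirms whether the table exists in the pool you are connected to:
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = 'wwi_perf' AND t.name = 'Sale_Hash';
If this returns no rows, the earlier CTAS step that creates Sale_Hash was probably skipped or run against a different pool.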
DW Optimization/Exercise/Task1/Step 2
Even though this is level 400 content, it would be good to link certain concepts to their docs pages for better understanding. For example, sys.dm_pdw_exec_requests could link to: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-pdw-exec-requests-transact-sql?view=aps-pdw-2016-au7
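For context, a typical use of that DMV (a hedged sketch, not taken from the lab) is listing the slowest recent requests:
SELECT TOP 10 request_id, [status], total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;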
DW Optimization/Exercise 1/Task1/Step 4
It is not clear from the instructions what the user needs to do to use the EXPLAIN function to get the data movement recommendations and the cost of the query operations.
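For anyone else who is stuck: the pattern is to prefix the query with EXPLAIN, which returns the estimated plan as XML instead of executing the query. A minimal sketch, assuming the Sale_Heap table from this lab:
EXPLAIN WITH_RECOMMENDATIONS
SELECT COUNT_BIG(*)
FROM [wwi_perf].[Sale_Heap];
The WITH_RECOMMENDATIONS option is what surfaces the data movement recommendations.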
Lab 1 / Exercise 1 / Task 1 / Step 2
DW Optimization/Exercise 1/Task 2
Please add that users cannot generate the query plan from Synapse Studio and have to go to SSMS to download it.
Empty Storage Containers with Cosmos DB Data
The Cosmos DB documents are not copying over via AzCopy. It appears there may not be any data at the path below:
https://solliancepublicdata.blob.core.windows.net/wwi-02/online-user-profiles-02/
Would you please verify if this is the correct path? I can modify the code below to use the updated path.
The relevant PowerShell code is as follows:
foreach ($dataDirectory in $dataDirectories.Keys) {
    # Each entry is "<destination path>,<source suffix>"
    $vals = $dataDirectories[$dataDirectory].tostring().split(",");
    $source = $publicDataUrl + $vals[1];
    $path = $vals[0];
    $destination = $dataLakeStorageBlobUrl + $path + $destinationSasKey
    Write-Information "Copying directory $($source) to $($destination)"
    & $azCopyCommand copy $source $destination --recursive=true
}
DW Optimization Exercise1/Task1
Please explicitly state that the query needs to run on the SQL pool. Initially, I didn't realize it was running on SQL on-demand and throwing an error.
DW Optimizer/Lab 1/Exercise 2/Step 6
DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD ( 'wwi_perf.mvCustomerSales' )
Unable to run this query, as it says invalid object name:
Failed to execute query. Error: Invalid object name 'SQLPool01.wwi_perf.mvCustomerSales'.
Total execution time: 00:00:00.288
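If the view is genuinely missing rather than mis-referenced, recreating it should unblock the DBCC command. A hedged sketch, assuming a per-customer aggregation like the one used earlier in the lab (the source table and columns are assumptions):
CREATE MATERIALIZED VIEW wwi_perf.mvCustomerSales
WITH (DISTRIBUTION = HASH(CustomerId))
AS
SELECT
    S.CustomerId
    ,COUNT_BIG(*) AS SaleCount
    ,SUM(S.TotalAmount) AS TotalAmount
FROM [wwi_perf].[Sale_Hash] S
GROUP BY S.CustomerId;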
Lab 1 / Exercise 3 / Task 5 / Step 5
Lab 2 / Exercise 4 / Step 8
DW Optimizer/Task 2/Step 4
The complex query is throwing an error:
SELECT
AVG(TotalProfit) as AvgMonthlyCustomerProfit
FROM
(
SELECT
S.CustomerId
,D.Year
,D.Month
,SUM(S.TotalAmount) as TotalAmount
,AVG(S.TotalAmount) as AvgAmount
,SUM(S.ProfitAmount) as TotalProfit
,AVG(S.ProfitAmount) as AvgProfit
FROM
[wwi_perf].[Sale_Partition01] S
join [wwi].[Date] D on
D.DateId = S.TransactionDateId
WHERE
D.Year in (2010, 2011, 2012, 2013, 2014)
GROUP BY
S.CustomerId
,D.Year
,D.Month
) T
Lab 1 / Exercise 4 / Task 2 / Step 19
The pipeline run is failing for me with an error message:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,Source=.Net SqlClient Data Provider,SqlErrorNumber=105211,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=105211,State=1,Message=COPY statement failed because non nullable column 'TransactionDate' cannot have the NULL value as its default.,},],'",
"failureType": "UserError",
"target": "Copy sales",
"details": []
}
DW Optimizer/Exercise2/Task2/Step 4
- Same issue: I can't see the query plan, and it asks me to download SSMS to see it.
- Also, it's not clear where in the query plan you see the optimization resulting from the materialized view (one workaround sketched below).
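One workaround that avoids SSMS (a hedged sketch; the query shape is assumed, mvCustomerSales is the view from the earlier exercise) is to prefix the aggregate query with EXPLAIN and search the returned plan XML for the view's name, since the optimizer rewrites matching queries against it:
EXPLAIN
SELECT S.CustomerId, SUM(S.TotalAmount) AS TotalAmount
FROM [wwi_perf].[Sale_Hash] S
GROUP BY S.CustomerId;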
Lab 1 / Exercise 2 / Task 2 / Step 1
Minor, but worth noting: you're asking the user to "navigate" to the folder where the Parquet file is, but note that we're already there because of the previous exercise. Typically, in other Microsoft courses, you would say "navigate to the folder if needed".
L400 - Day 3 - Lab 6 - Exercise 1
When attempting to check if XGBoost is properly installed in the notebook, the following error is returned:
Exception : Not support to display this type of data: <class 'list'>
Traceback (most recent call last):
File "", line 891, in
File "", line 718, in display
File "", line 671, in _convert
Exception: Not support to display this type of data: <class 'list'>
Lab 1 / Exercise 4 / Task 1 / Header
It feels like the second paragraph is duplicated and conveys the same idea as the first
Task 1: Configure workload management classification
When loading a large amount of data, it is best to run only one load job at a time for fastest performance. If this isn't possible, run a minimal number of loads concurrently. Be sure that you allocate enough memory to the pipeline session. To do this, increase the resource class of a user which has permissions to rebuild the index on this table to the recommended minimum.
For fastest loading speed, run only one load job at a time. If that isn't feasible, run a minimal number of loads concurrently. If you expect a large loading job, consider scaling up your SQL pool before the load.
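For reference, these are the two common ways to do this in a dedicated SQL pool; a minimal sketch, where the login 'asa.sql.import01' and the classifier name are illustrative assumptions:
-- Option 1: assign the loading user to a larger static resource class
EXEC sp_addrolemember 'staticrc60', 'asa.sql.import01';

-- Option 2: workload management classification
CREATE WORKLOAD CLASSIFIER [wgcLoadUser]
WITH
(
    WORKLOAD_GROUP = 'xlargerc',
    MEMBERNAME = 'asa.sql.import01',
    IMPORTANCE = HIGH
);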
Lab 1 / Exercise 6 / Task 2 / Step 14
Lab 2 / Exercise 2 / Task 4 / Step 2
Missing step to expand the "Tables" item
Day 2/Lab2/Exercise 2/Part2
While I was able to pick another table and run the transaction, it would be great to add a bit more info on the use case here. Just executing the steps didn't make it click for me how the table distribution impacts query performance.
Lab 1 / Exercise 3 / Task 2 / Step 4
Lab 2 / Exercise 2 / Task 1 / Step 6
Lab 2 / Exercise 1 / Task 1 / Step 3
Datasets
Could you please document, and respond to this issue with, the datasets that need to be in the data lake for these labs to be successful?
L400-Day 3-Lab 6-Exercise 1 Cell 7 line magic UsageError
Lab 1 / Exercise 3 / Task 1 / Step 4
Lab 2 / Exercise 2 / Task 3 / Step 13
DW Optimization/Exercise/Task1/Step 3
Small typo:
Check the structure of the Sale_Heap table, by right clicking on it in the Data bub and selecting New SQL script and then CREATE. Take a look at the script used to create the table:
Please correct it to Data Hub
CLI 2.0 Commands
In the 01-environment-setup.ps1 script, you are not using the newest CLI 2.0 command. I had to update it to the following:
$userName = ((az account show) | ConvertFrom-JSON).user.name
versus the deprecated command shown below, at line 65:
$userName = ((az ad signed-in-user show) | ConvertFrom-JSON).UserPrincipalName
Lab3/Exercise2/task3
Setting the result set cache to "On" throws a syntax error. I am using the following command:
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON
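In case it helps others: this statement must be executed while connected to the master database, which is a likely cause of the syntax error. A minimal sketch using the pool name from the lab:
-- Run from a connection to master, not to SQLPool01 itself
ALTER DATABASE [SQLPool01]
SET RESULT_SET_CACHING ON;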
Lab 2 / Exercise 3 / Task 3 / Step 7
Missing .\artifacts\environment-setup\labfiles\AzureCred.ps1
Could you please post an example of what this AzureCred.ps1 file should look like, as it is missing from the repo? I'd like to get an idea of what I need to write to make the repo work to create the environment and linked services.
DW Optimization/Exercise 1/Task1/Step 5
It would be good to explicitly state that the user needs to run the query:
SELECT TOP 1000 * FROM
(
SELECT
S.CustomerId
,SUM(S.TotalAmount) as TotalAmount
FROM
[wwi_perf].[Sale_Heap] S
GROUP BY
S.CustomerId
) T
OPTION (LABEL = 'Lab03: Heap')
before querying sys.dm_pdw_exec_requests.
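The follow-up query would then filter the DMV on that label; a minimal sketch using the label from the query above:
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Lab03: Heap';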
Lab 3/Exercise 2/Step 4
It would be good to provide some guidance on the importance of this feature before the user begins the exercise, e.g. why we need to turn on the stats and why they are important for the query optimizer.
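For context, the optimizer relies on statistics to estimate row counts when choosing join strategies and data movement, so missing or stale stats lead to poor plans. A minimal sketch of the mechanics (the pool, table, and statistics names are assumptions):
-- Enable automatic creation of single-column statistics for the pool
ALTER DATABASE [SQLPool01]
SET AUTO_CREATE_STATISTICS ON;

-- Or create and inspect statistics manually
CREATE STATISTICS Stat_Sale_Hash_CustomerId ON wwi_perf.Sale_Hash (CustomerId);
DBCC SHOW_STATISTICS ('wwi_perf.Sale_Hash', Stat_Sale_Hash_CustomerId);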
Lab 1 / Exercise 1 / Task 1 / Step 1
You're asking the user to go to the Synapse workspace, but you don't provide a link. I know this is L400, but that doesn't mean everyone knows the link to web.azuresynapse.net. It may be helpful to provide one.
L400 – Lab 6 – Broken links
It would appear that all links on https://github.com/solliancenet/azure-synapse-analytics-workshop-400/blob/master/day-03/lab06-machine-learning.md#exercise-1---training-models are broken.
For example, the links in Exercise 2, Steps 1 and 3 return an error. This seems to be the case for nearly all links on this page.
Lab 1 / Exercise 6 / Task 2 / Step 19
You say we can check the execution of the pipeline "here", but you don't mention where that is. I went to Monitor -> Pipeline runs.