
Fabric

The following code snippets may be executed in a Microsoft Fabric notebook. These scripts primarily leverage the semantic-link (a.k.a. SemPy) library.

This repo is intended to help business intelligence analysts/developers and data scientists become more familiar with notebooks (and therefore Python) in Microsoft Fabric, and with the potential benefits of using the Semantic Link library.

Requirements

Semantic-link library version 0.6.0 or higher

Load the Semantic-Link library inside of your notebook

%pip install semantic-link

Load Semantic-link into a custom Fabric environment

An even better way to ensure the Semantic-link library is available in your workspace/notebooks is to load it as a library in a custom Fabric environment. If you do this, you will not have to run the above '%pip install semantic-link' code every time in your notebook. Please follow the steps below to create an environment and load the Semantic-Link library.

Create a custom environment

  1. Navigate to your Fabric workspace
  2. Click 'New' -> More options
  3. Within 'Data Science', click 'Environment'
  4. Name your environment, click 'Create'

Add Semantic-Link as a library to the environment

  1. Within 'Public Libraries', click 'Add from PyPI'
  2. Enter 'Semantic-link' in the text box under 'Library'
  3. Click 'Save' at the top right of the screen
  4. Click 'Publish' at the top right of the screen
  5. Click 'Publish All'

Update your notebook to use the new environment (must wait for the environment to finish publishing)

  1. Navigate to your Notebook
  2. Select your newly created environment within the 'Environment' drop down in the navigation bar at the top of the notebook

Packages, functions and SemPy help

Useful packages for python notebooks (SemPy and more)

import sempy
import sempy.fabric as fabric
import pandas as pd
import numpy as np
import pyspark
import pyspark.sql.functions as F
import pyarrow.dataset as ds
import pyarrow.parquet as pq
import os
from delta.tables import DeltaTable
from powerbiclient import Report

Show the directory of all SemPy functions

import sempy
import sempy.fabric
dir(sempy.fabric)

Show useful information about a given function

import sempy
import sempy.fabric as fabric
help(fabric.list_datasets) #Replace 'list_datasets' with any function shown in the dir(sempy.fabric) output

Show the version of a Python library within your notebook

import pkg_resources
library_name = 'semantic-link' #Enter the name of the library
version = pkg_resources.get_distribution(library_name).version
version

Show the latest version available of a Python library

import requests
library_name = 'semantic-link' #Enter the name of the library
url = f"https://pypi.org/pypi/{library_name}/json"
response = requests.get(url)
data = response.json()
latest_version = data["info"]["version"]
print(f"The latest version of '{library_name}' is: {latest_version}")

Identify if you have the latest version of a given library installed in your notebook

import pkg_resources
import requests

library_name = 'semantic-link' #Enter the name of the library
version = pkg_resources.get_distribution(library_name).version

url = f"https://pypi.org/pypi/{library_name}/json"
response = requests.get(url)
data = response.json()
latest_version = data["info"]["version"]

if version == latest_version:
    print(f"You have the latest version of '{library_name}' installed.")
else:
    print(f"A new version '{latest_version}' of the '{library_name}' library is available.")

Show all available libraries in your notebook

import pkg_resources
import pandas as pd
installed_packages = [pkg.key for pkg in pkg_resources.working_set]
title = 'Installed Packages'
df = pd.DataFrame({title: installed_packages})
display(df)

Refresh the TOM cache. If the semantic model has been updated during your notebook session, run this script to ensure you get the latest model metadata.

import sempy
import sempy.fabric as fabric
fabric.refresh_tom_cache()

Capacities

Show a list of all accessible capacities

import sempy
import sempy.fabric as fabric
x = fabric.list_capacities()
x

Gets the Artifact ID (of the notebook)

import sempy
import sempy.fabric as fabric
x = fabric.get_artifact_id()
x

Lakehouse

Gets the Lakehouse ID from the current lakehouse

import sempy
import sempy.fabric as fabric
x = fabric.get_lakehouse_id()
x

Gets the SQL Endpoint for a given workspace/lakehouse

import sempy
import sempy.fabric as fabric
import pandas as pd

def get_sql_endpoint(workspaceName = None, lakehouseName = None):

    if workspaceName is None:
        workspaceID = fabric.get_workspace_id()
    else:
        workspaceID = fabric.resolve_workspace_id(workspaceName)
    if lakehouseName is None:
        lakehouseID = fabric.get_lakehouse_id()
    else:
        dfItems = fabric.list_items()
        dfItems = dfItems[(dfItems['Display Name'] == lakehouseName) & (dfItems['Type'] == 'Lakehouse')]
        lakehouseID = dfItems['Id'].iloc[0]

    client = fabric.FabricRestClient()
    response = client.get(f"/v1/workspaces/{workspaceID}/lakehouses/{lakehouseID}")
    responseJson = response.json()
    sqlEndpoint = responseJson['properties']['sqlEndpointProperties']['connectionString']

    return sqlEndpoint

get_sql_endpoint()

Show a list of the tables in your lakehouse using the List Tables API

import sempy
import sempy.fabric as fabric
import pandas as pd
import json

def get_lakehouse_tables(workspaceId = None, lakehouseId = None):

    if workspaceId is None:
        workspaceId = fabric.get_workspace_id()
    if lakehouseId is None:
        lakehouseId = fabric.get_lakehouse_id()

    client = fabric.FabricRestClient()
    wName = fabric.resolve_workspace_name(workspaceId)
    response = client.get(f"/v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}/tables")
    tableList = response.json()['data']
    dfItems = fabric.list_items()
    dfItems = dfItems[dfItems['Id'] == lakehouseId]
    lakehouseName = dfItems['Display Name'].iloc[0]

    df = pd.DataFrame({'Workspace Name': [], 'Lakehouse Name': [], 'Table Name': [], 'Type': [], 'Location': [], 'Format': []})

    for t in tableList:
        tName = t['name']
        tType = t['type']
        tLocation = t['location']
        tFormat = t['format']

        new_data = {'Workspace Name': wName, 'Lakehouse Name': lakehouseName, 'Table Name': tName, 'Type': tType, 'Location': tLocation, 'Format': tFormat}
        df = pd.concat([df, pd.DataFrame(new_data, index=[0])], ignore_index=True)

    return df

get_lakehouse_tables()

Load a parquet or csv file within your lakehouse as a delta table in your lakehouse

import sempy
import sempy.fabric as fabric
import pandas as pd
import os

def load_table(tableName, source, mode = None, workspaceId = None, lakehouseId = None):

    #Mode values: Overwrite; Append
    if workspaceId is None:
        workspaceId = fabric.get_workspace_id()
    if lakehouseId is None:
        lakehouseId = fabric.get_lakehouse_id()
    if mode is None:
        mode = 'Overwrite'
    if mode not in ['Overwrite', 'Append']:
        return print("Invalid Mode value. Mode must be either 'Overwrite' or 'Append'.")

    payload = None
    file_extension = os.path.splitext(source)[1]

    if file_extension == '.csv':
        payload = {
            "relativePath": source,
            "pathType": "File",
            "mode": mode,
            "formatOptions": {
                "format": "Csv",
                "header": "true",
                "delimiter": ","
            }
        }
    elif file_extension == '.parquet':
        payload = {
            "relativePath": source,
            "pathType": "File",
            "mode": mode,
            "formatOptions": {
                "format": "Parquet",
                "header": "true"
            }
        }
    else:
        return print("Invalid file extension. Only .csv and .parquet are supported.")

    client = fabric.FabricRestClient()
    response = client.post(f"/v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}/tables/{tableName}/load", json = payload)

    return response

load_table(
     tableName = 'TestFile' #Enter the name of the table to be created
    ,source = 'Files/Folder/Myfilename.parquet' #Enter the file path of your parquet or csv file
    )

Workspaces

Show a list of all objects in your workspace

import sempy
import sempy.fabric as fabric
x = fabric.list_items()
x

Shows a list of your accessible workspaces, sorted alphabetically

import sempy
import sempy.fabric as fabric
x = fabric.list_workspaces().sort_values(by='Name', ascending=True)
x

Filter to a particular workspace

import sempy
import sempy.fabric as fabric
workspaceName = '' #Enter the workspace name to be used as a filter
x = fabric.list_workspaces()
filter_condition = [workspaceName]
x = x[x['Name'].isin(filter_condition)]
x

Filter to a particular workspace and extract the value

import sempy
import sempy.fabric as fabric
workspaceName = '' #Enter the workspace name to be used as a filter
x = fabric.list_workspaces()
filter_condition = [workspaceName]
x = x[x['Name'].isin(filter_condition)]
y = x["Id"].iloc[0]
z = fabric.resolve_workspace_name(y)
z

Show the Capacity SKU, Region, State for your workspaces

import sempy
import sempy.fabric as fabric
import pandas as pd

dfC = fabric.list_capacities()
dfW = fabric.list_workspaces().sort_values(by='Name', ascending=True)
dfC.rename(columns={'Id': 'Capacity Id'}, inplace=True)
dfC['Capacity Id'] = dfC['Capacity Id'].str.upper()
x = pd.merge(dfW, dfC[['Capacity Id', 'Sku', 'Region', 'State']], on='Capacity Id', how='inner')
x

Find the workspace ID for a given workspace name

import sempy
import sempy.fabric as fabric
x = '' #Enter the workspace name
id = fabric.resolve_workspace_id(x)
id

Find the workspace name for a given workspace ID

import sempy
import sempy.fabric as fabric
id = '' #Enter the workspace ID
x = fabric.resolve_workspace_name(id)
x

Get the current workspace ID

import sempy
import sempy.fabric as fabric
x = fabric.get_workspace_id()
x

Apps

Shows a list of apps within the workspace

import sempy
import sempy.fabric as fabric
x = fabric.list_apps()
x

Gateways

Shows a list of gateways within the workspace

import sempy
import sempy.fabric as fabric
x = fabric.list_gateways()
x

Reports

Shows a list of reports within the workspace

import sempy
import sempy.fabric as fabric
x = fabric.list_reports()
x

Rebind a report to a dataset

  • First run the code below to create the report_rebind function.
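The repo links to the full function; as a stand-in, here is a minimal sketch that rebinds a report to a dataset via the Power BI REST API's Rebind endpoint, assuming the column names returned by fabric.list_reports() ('Name', 'Id') and fabric.list_datasets() ('Dataset Name', 'Dataset ID') shown elsewhere on this page.

import sempy
import sempy.fabric as fabric

def report_rebind(reportName, datasetName):

    workspaceId = fabric.get_workspace_id()

    #Resolve the report name to its Id (column names per fabric.list_reports)
    dfR = fabric.list_reports()
    reportId = dfR[dfR['Name'] == reportName]['Id'].iloc[0]

    #Resolve the dataset name to its Id (column names per fabric.list_datasets)
    dfD = fabric.list_datasets()
    datasetId = dfD[dfD['Dataset Name'] == datasetName]['Dataset ID'].iloc[0]

    #Rebind the report via the Power BI REST API Rebind endpoint
    client = fabric.PowerBIRestClient()
    response = client.post(
        f"/v1.0/myorg/groups/{workspaceId}/reports/{reportId}/Rebind",
        json = {"datasetId": str(datasetId)})

    if response.status_code == 200:
        print(f"'{reportName}' has been rebound to the '{datasetName}' dataset.")
    else:
        print(f"Rebind failed with status code {response.status_code}.")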
reportName = '' #Enter the report name
datasetName = '' #Enter the dataset name
report_rebind(reportName, datasetName)

Rebind a list of reports to a single dataset

  • First create the report_rebind function (see the sketch in the previous section).
report_list = ['', '', '', '', ''] #Enter your list of reports
datasetName = '' #Enter the dataset name
for item in report_list:
    report_rebind(item, datasetName)

Visualize a Power BI Report within a Fabric notebook

import sempy
import sempy.fabric as fabric
from powerbiclient import Report

def launch_report(reportName):

    dfR = fabric.list_reports()
    filtered_df = dfR[(dfR['Name'] == reportName)]
    report_id = filtered_df['Id'].iloc[0]

    report = Report(group_id=None, report_id=report_id)
    return report

launch_report('') #Enter report name

Dashboards

List dashboards in a given workspace

import sempy
import sempy.fabric as fabric
import json
import pandas as pd

def list_dashboards(workspaceName=None):

    df = pd.DataFrame(columns=['Dashboard ID', 'Dashboard Name', 'Read Only', 'Web URL', 'Embed URL', 'Data Classification'])

    if workspaceName is None:
        groupId = fabric.get_workspace_id()
    else:
        groupId = fabric.resolve_workspace_id(workspaceName)

    client = fabric.PowerBIRestClient()
    response = client.get(f"/v1.0/myorg/groups/{groupId}/dashboards")
    responseJson = response.json()

    for v in responseJson['value']:
        dashboardID = v['id']
        displayName = v['displayName']
        isReadOnly = v['isReadOnly']
        webURL = v['webUrl']
        embedURL = v['embedUrl']
        dataClass = v['dataClassification']

        new_data = {'Dashboard ID': dashboardID, 'Dashboard Name': displayName, 'Read Only': isReadOnly, 'Web URL': webURL, 'Embed URL': embedURL, 'Data Classification': dataClass}
        df = pd.concat([df, pd.DataFrame(new_data, index=[0])], ignore_index=True) 

    df['Read Only'] = df['Read Only'].astype(bool)

    return df

list_dashboards()

Dataset and dataset objects

Shows a list of datasets in your current workspace

import sempy
import sempy.fabric as fabric
x = fabric.list_datasets()
x

Shows a list of datasets in your current workspace with additional properties

import sempy
import sempy.fabric as fabric
x = fabric.list_datasets(additional_xmla_properties=['Model.DefaultMode', 'Model.DirectLakeBehavior', 'CompatibilityLevel'])
x

Shows the dataset ID for a given dataset name

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter your dataset name
x = fabric.list_datasets()
x = x[x['Dataset Name'] == datasetName]
datasetID = x["Dataset ID"].values[0]
datasetID

Shows the TMSL for a given dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter your dataset name
workspaceName = '' #Enter your workspace name
x = fabric.get_tmsl(datasetName, workspaceName)
print(x)

List the data sources within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_datasources(datasetName)
x

List the tables within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_tables(datasetName)
x

List the columns within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_columns(datasetName)
x

List the partitions within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_partitions(datasetName)
x

List the partitions within a given dataset (semantic model) and properly format the Query column

import sempy
import sempy.fabric as fabric
import pandas as pd

datasetName = '' #Enter dataset name
x = fabric.list_partitions(datasetName)

def format_sql_query(value):
    return value.replace('\n', '<br>').replace('\t', '&emsp;')

x_styled = x.style.format({'Query': format_sql_query})
x_styled

List the tables of a Direct Lake semantic model and show the lakehouse table from which it derives ('Query')

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name

dfP = fabric.list_partitions(datasetName)
dfP_filtered = dfP[dfP['Mode'] == 'DirectLake'][['Table Name', 'Query']]
dfP_filtered

List the measures within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_measures(datasetName)
x

List the measures within a given dataset (semantic model) and properly format the Measure Expression column

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_measures(datasetName)

def format_sql_query(value):
    return value.replace('\n', '<br>').replace('\t', '&emsp;')

x_styled = x.style.format({'Measure Expression': format_sql_query})
x_styled

List the hierarchies within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_hierarchies(datasetName)
x

List the relationships within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_relationships(datasetName)
x

Plot the relationships for a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
from sempy.relationships import plot_relationship_metadata
datasetName = '' #Enter dataset name
relationships = fabric.list_relationships(datasetName)
plot_relationship_metadata(relationships)

List the roles within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.get_roles(datasetName)
x

List the roles and role members within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.get_roles(dataset = datasetName, include_members = True)
x

List the row level security (RLS) for each role within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.get_row_level_security_permissions(datasetName)
x

List the calculation groups and calculation items within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_calculation_items(datasetName)
x

List the perspectives and perspective objects within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_perspectives(datasetName)
x

List the translations within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_translations(datasetName)
x

List the expressions (parameters) within a given dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_expressions(datasetName)
x

Create a new blank semantic model

import sempy
import sempy.fabric as fabric

def create_blank_semantic_model(datasetName, compatibilityLevel = 1604, workspaceName = None):

  if workspaceName is None:
    workspaceId = fabric.get_workspace_id()
    workspaceName = fabric.resolve_workspace_name(workspaceId)

  tmsl = f'''
  {{
    "createOrReplace": {{
      "object": {{
        "database": "{datasetName}"
      }},
      "database": {{
        "name": "{datasetName}",
        "compatibilityLevel": {compatibilityLevel},
        "model": {{
          "culture": "en-US",
          "defaultPowerBIDataSourceVersion": "powerBI_V3"
        }}
      }}
    }}
  }}
  '''

  return fabric.execute_tmsl(script = tmsl, workspace = workspaceName)

create_blank_semantic_model('') #Enter semantic model name to be created

Dataset Refresh

Valid options for refresh_type: 'full', 'automatic', 'dataOnly', 'calculate', 'clearValues', 'defragment'. Default is 'automatic'.

Refresh a dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
fabric.refresh_dataset(dataset = datasetName, refresh_type = 'full')

Refresh specific table(s) in a dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
my_objects = [
    {"table": "tableName1"}, #Update 'tableName1' with your table name
    {"table": "tableName2"}  #Update 'tableName2' with your table name
]
fabric.refresh_dataset(dataset = datasetName, refresh_type = 'full', objects = my_objects)

Refresh specific partition(s) in a dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
my_objects = [
    {"table": "table1", "partition": "partition1"}, #Update 'table1' with your table name and 'partition1' with the partition name
    {"table": "table1", "partition": "partition2"}  #Update 'table1' with your table name and 'partition2' with the partition name
]
fabric.refresh_dataset(dataset = datasetName, refresh_type = 'full', objects = my_objects)

Refresh a combination of tables/partition(s) in a dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
my_objects = [
    {"table": "table1"}, #Update 'table1' with your table name
    {"table": "table2", "partition": "partition2"}  #Update 'table2' with your table name and 'partition2' with the partition name
]
fabric.refresh_dataset(dataset = datasetName, refresh_type = 'full', objects = my_objects)

Show refresh requests for a given dataset

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.list_refresh_requests(datasetName)
x

Read data from a dataset

Show a preview of the data in a given table from a dataset (semantic model)

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
tableName = '' #Enter table name
rowLimit = 100
x = fabric.read_table(datasetName, tableName, False, rowLimit)
x

Tabular Object Model

Connect to the Tabular Object Model (TOM); prints each table name

import sempy
import sempy.fabric as fabric
from sempy.fabric._client import DatasetXmlaClient
from sempy.fabric._cache import _get_or_create_workspace_client
sempy.fabric._client._utils._init_analysis_services()
workspaceName = '' #Enter workspace name
datasetName = '' #Enter dataset name
workspace_client = _get_or_create_workspace_client(workspaceName)
ds = workspace_client.get_dataset(datasetName)

m = ds.Model

for t in m.Tables:
    print(t.Name)

Create a new measure in a semantic model

import sempy
import sempy.fabric as fabric
sempy.fabric._client._utils._init_analysis_services()
from sempy.fabric._cache import _get_or_create_workspace_client
from sempy.fabric._client._connection_mode import ConnectionMode
import Microsoft.AnalysisServices.Tabular as TOM

def create_measure(datasetName, tableName, measureName, measureExpression, workspaceName = None):

    if workspaceName == None:
        workspaceID = fabric.get_workspace_id()
        workspaceName = fabric.resolve_workspace_name(workspaceID)

    tom_server = _get_or_create_workspace_client(workspaceName).get_dataset_client(datasetName, ConnectionMode.XMLA)._get_connected_dataset_server(readonly=False)

    measure = TOM.Measure()
    measure.Name = measureName
    measure.Expression = measureExpression
    
    for d in tom_server.Databases:
        if d.Name == datasetName:
            print(f"Updating {d.Name}...")
            m = d.Model
            for t in m.Tables:
                if t.Name == tableName:
                    if not any(existing_measure.Name == measureName for existing_measure in t.Measures):
                        t.Measures.Add(measure)
                        print(f"'{measureName}' added to the '{tableName}' table.")
                    else:
                        print(f"'{measureName}' already exists as a measure. No measure was added.")
            m.SaveChanges()

create_measure(datasetName = '',tableName = '', measureName = '', measureExpression = '') # Enter dataset name, table name, measure name, measure expression (DAX)

Update the DAX expression of an existing measure in a semantic model

import sempy
import sempy.fabric as fabric
sempy.fabric._client._utils._init_analysis_services()
from sempy.fabric._cache import _get_or_create_workspace_client
from sempy.fabric._client._connection_mode import ConnectionMode
import Microsoft.AnalysisServices.Tabular as TOM

workspaceName = '' #Enter workspace name
datasetName = '' #Enter dataset name
tom_server = _get_or_create_workspace_client(workspaceName).get_dataset_client(datasetName, ConnectionMode.XMLA)._get_connected_dataset_server(readonly=False)

mName = '' #Enter measure name
mExpr = '' #Enter DAX expression
measure = TOM.Measure()
measure.Name = mName
measure.Expression = mExpr
for d in tom_server.Databases:
    if d.Name == datasetName:
        print(f"Updating {d.Name}...")
        m = d.Model
        for t in m.Tables:
            for ms in t.Measures:
                if ms.Name == mName:
                    ms.Expression = mExpr
                    print(f"The DAX expression for '{mName}' has been updated.")
        m.SaveChanges()

Example code to make sure you always get the latest model metadata

import sempy
import sempy.fabric as fabric
from sempy.fabric._client import DatasetXmlaClient
from sempy.fabric._cache import _get_or_create_workspace_client
sempy.fabric._client._utils._init_analysis_services()
workspaceName = '' #Enter workspace name
datasetName = '' #Enter dataset name

tom_server = _get_or_create_workspace_client(workspaceName)._create_tom_server(False)
for d in tom_server.Databases:
    if d.Name == datasetName:
       m = d.Model
       for t in m.Tables:
         for ms in t.Measures:
            print(ms.Name + ' : ' + ms.Expression)

tom_server.Disconnect(True)
tom_server.Dispose()

DAX

Run DAX via evaluate_dax()

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
    datasetName,
    """
    EVALUATE
    SUMMARIZECOLUMNS(
    "Header Name",[Measure Name] // update with actual measure name
    )
    """)
x

Run Dynamic Management Views (DMVs) via evaluate_dax()

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
        datasetName,
        """
        SELECT
        MEASURE_GROUP_NAME AS [TableName]
        ,ATTRIBUTE_NAME AS [ColumnName]
        ,DATATYPE AS [DataType]
        ,DICTIONARY_SIZE AS [DictionarySize]
        ,DICTIONARY_ISRESIDENT AS [IsResident]
        ,DICTIONARY_TEMPERATURE AS [Temperature]
        ,DICTIONARY_LAST_ACCESSED AS [LastAccessed]
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
        WHERE [COLUMN_TYPE] = 'BASIC_DATA'
        AND NOT [ISROWNUMBER]
        ORDER BY [DICTIONARY_TEMPERATURE] DESC
        """)
x

Examples using the new 'INFO' DAX functions

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
    datasetName,
    """
    // Select specific columns and rename columns      	
    EVALUATE
    SELECTCOLUMNS(INFO.TABLES(),"Table Name", [Name],"Hidden", [IsHidden],"Data Category", [DataCategory],[Description])    
    """)
x

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
    datasetName,
    """
    // Filter by multiple columns
    EVALUATE
    FILTER(INFO.TABLES(),[Name] = "TableName" && [IsHidden] = False) // Update filter with the actual table name
    """)
x

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
    datasetName,
    """
    // Select specific columns and filter
    EVALUATE
    FILTER(SELECTCOLUMNS(INFO.TABLES(),[Name],[IsHidden],[DataCategory],[Description]),[Name] = "TableName") // Update filter with the actual table name
    """)
x

import sempy
import sempy.fabric as fabric
datasetName = '' #Enter dataset name
x = fabric.evaluate_dax(
    datasetName,
    """
    // View specific columns and order the table by a column in descending order
    EVALUATE
    SELECTCOLUMNS(COLUMNSTATISTICS(),"Table Name",[Table Name], [Column Name], [Cardinality])
    ORDER BY [Cardinality] DESC
    """)
x

Run a single measure against 1+ columns in your dataset

import sempy
import sempy.fabric as fabric
x = fabric.evaluate_measure(
    "DatasetName", #Enter your dataset name
    "MeasureName", #Enter the measure name from your dataset   
    ["'TableName'[ColumnName]", "TableName[ColumnName]"]) # Enter columns
x

Enable DAX cell magic (ability to run DAX directly in a notebook cell using %%dax)

%load_ext sempy

Run DAX using DAX cell magic (replace "DatasetName" and "WorkspaceName" below with your dataset and workspace names)

%%dax "DatasetName" -w "WorkspaceName"

EVALUATE
SUMMARIZECOLUMNS(
"Header Name",[Measure Name] // Update with actual measure name
)

Run Dynamic Management Views (DMVs) using DAX cell magic (replace "DatasetName" and "WorkspaceName" below with your dataset and workspace names)

%%dax "DatasetName" -w "WorkspaceName"

SELECT * FROM $SYSTEM.DISCOVER_SESSIONS

Vertipaq Info

Shows the current in-memory model size, based on the columns that are resident in memory

import sempy
import sempy.fabric as fabric

datasetName = '' #Enter dataset name

dfC = fabric.list_columns(datasetName, extended=True)
dfC_filt = dfC[dfC['Is Resident']]
total_size_sum = dfC_filt['Total Size'].sum()
total_size_sum

Direct Lake

View the columns which have the highest temperature and are in memory

import sempy
import sempy.fabric as fabric

datasetName = '' #Enter dataset name

dfC = fabric.list_columns(datasetName, extended=True).sort_values(by='Temperature', ascending=False)
dfC

Show the guardrails by SKU for Direct Lake models

import pandas as pd
url = 'https://learn.microsoft.com/power-bi/enterprise/directlake-overview'

tables = pd.read_html(url)
df = tables[0]
df['Fabric/Power BI SKUs'] = df['Fabric/Power BI SKUs'].str.split('/')
df = df.explode('Fabric/Power BI SKUs', ignore_index=True)
df

Show the Direct Lake guardrails for your workspace

import sempy
import sempy.fabric as fabric
import pandas as pd

def get_sku_size(workspaceName = None):

    if workspaceName is None:
        workspaceId = fabric.get_workspace_id()
        workspaceName = fabric.resolve_workspace_name(workspaceId)

    dfC = fabric.list_capacities()
    dfW = fabric.list_workspaces().sort_values(by='Name', ascending=True)
    dfC.rename(columns={'Id': 'Capacity Id'}, inplace=True)
    dfCW = pd.merge(dfW, dfC[['Capacity Id', 'Sku', 'Region', 'State']], on='Capacity Id', how='inner')
    sku_value = dfCW.loc[dfCW['Name'] == workspaceName, 'Sku'].iloc[0]
    
    return sku_value

sku_value = get_sku_size()

def get_directlake_guardrails(skuSize):
    url = 'https://learn.microsoft.com/power-bi/enterprise/directlake-overview'

    tables = pd.read_html(url)
    df = tables[0]
    df['Fabric/Power BI SKUs'] = df['Fabric/Power BI SKUs'].str.split('/')
    df = df.explode('Fabric/Power BI SKUs', ignore_index=True)
    filtered_df = df[df['Fabric/Power BI SKUs'] == skuSize]
    
    return filtered_df

get_directlake_guardrails(sku_value)

Migrate Power Query logic to Dataflows Gen2

Power Query logic can be migrated to Dataflows Gen2 by using a Power Query Template file and then importing that file into Dataflows Gen2. This migrates all of your tables to Dataflows Gen2 in one go.

The following process dynamically creates the Power Query Template (.pqt) file so you can import it into Dataflows Gen2 to create delta tables in your Fabric lakehouse.

Note: For multi-partitioned tables, this process will take just the first partition from the table. In the case of Incremental Refresh, it will take the Source Expression (M query) from the table and ignore the individual partitions.
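As a rough illustration of that note (this is a sketch, not the repo's actual script), the per-table M expressions can be collected with fabric.list_partitions, keeping only the first partition of each table; the 'Table Name' and 'Query' column names follow the list_partitions output shown earlier on this page.

import sempy
import sempy.fabric as fabric

datasetName = '' #Enter dataset name

#One row per partition; the 'Query' column holds each partition's source expression
dfP = fabric.list_partitions(datasetName)

#Keep only the first partition of each table, as described in the note above
dfM = dfP.drop_duplicates(subset = 'Table Name', keep = 'first')[['Table Name', 'Query']]
dfM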

Method 1: Fabric Notebook

  1. Open a notebook in Fabric and make sure that the semantic-link library is installed
  2. Connect the notebook to your lakehouse
  3. Run this script in the notebook, specifying the semantic model name in the last line of the script which calls the function
  4. Make sure that you have installed OneLake file explorer
  5. Create a new Dataflow Gen2 within your Fabric workspace
  6. Select the PowerQueryTemplate.pqt file created in step 3 (note: you may have to right click on the folder in the Windows file explorer and select 'OneLake -> Sync from OneLake')
  7. Click 'Configure connection' to configure the connection to the data source
  8. Select a destination for each table (your desired lakehouse)
  9. Click 'Publish'

Method 2: Tabular Editor

  1. Open your model in Tabular Editor (version 2 or 3)
  2. Paste this script into the C# script window in Tabular Editor
  3. Update the folderPath parameter in line 29 to be the folder where the .pqt (Power Query Template) file will be created
  4. Run the script
  5. Create a new Dataflow Gen2 within your Fabric workspace
  6. Click on the link 'Import from a Power Query template'
  7. Select the PowerQueryTemplate.pqt file created in step 4
  8. Click 'Configure connection' to configure the connection to the data source
  9. Select a destination for each table (your desired lakehouse)
  10. Click 'Publish'

Direct Lake migration

The following process automates the migration of an import/DirectQuery model to a new Direct Lake model. The first step is specifically applicable to models which use Power Query to perform data transformations. If your model does not use Power Query, you must migrate the base tables used in your semantic model to a Fabric lakehouse.

Prerequisites

  • Make sure you enable XMLA Read/Write for your capacity
  • Make sure you have a lakehouse in a Fabric workspace
  • Enable the following setting: Workspace -> Workspace Settings -> General -> Data model settings -> Users can edit data models in the Power BI service

Instructions

  1. Download this notebook. Use version 0.1.3 or higher only.
  2. Make sure you are in the 'Data Engineering' persona. Click the icon at the bottom left corner of your Workspace screen and select 'Data Engineering'
  3. In your workspace, select 'New -> Import notebook' and import the notebook from step 1.
  4. Add your lakehouse to your Fabric notebook
  5. Follow the instructions within the notebook.

The migration process

Note: The first 4 steps are only necessary if you have logic in Power Query. Otherwise, you will need to migrate your semantic model source tables to lakehouse tables.

  1. The first step of the notebook creates a Power Query Template (.pqt) file which eases the migration of Power Query logic to Dataflows Gen2.
  2. After the .pqt file is created, sync files from your OneLake file explorer, create a new Dataflow Gen2, and import the Power Query Template file.
  3. Manually map each table to its destination (your lakehouse).
  4. Publish the Dataflow Gen2 and wait for it to finish creating the delta lake tables in your lakehouse.
  5. Back in the notebook, the next step will create your new Direct Lake semantic model with the name of your choice, taking all the valuable properties from the original semantic model and refreshing/framing your new semantic model.
  6. Finally, you can easily rebind all of your reports that use the import/DQ semantic model to the new Direct Lake semantic model in one click.

Completing these steps will do the following:

  • Offload your Power Query logic to Dataflows Gen2 inside of Fabric (where it can be maintained and development can continue).
  • Dataflows Gen2 will create delta tables in your Fabric lakehouse. These tables can then be used for your Direct Lake model.
  • Create a new semantic model in Direct Lake mode containing all the standard tables and columns, calculation groups, measures, relationships, hierarchies, roles, row level security, perspectives, and translations from your original semantic model.
  • Non-supported objects are not transferred (e.g. calculated columns, calculated tables, relationships using columns with unsupported data types, etc.).
  • Reports that used your original semantic model will be rebound to your new semantic model.
