Code Monkey home page Code Monkey logo

ssis's Introduction

SSIS

These tools enable machine learning in the HC platform by setting up extensibility points that call out to R/Python scripts during ETLs.

Data Flow Process

  1. A SAM Batch ETL runs.
  2. The SAM pauses ETLs after a specific entity runs.
  3. The R/Python script runs making predictions and populates a table.
  4. The SAM resumes ETLs.

Extensibility Points Configuration

Configuration for extensibility points are stored in two tables:

  • ETLEngineConfig (some generic configuration)
  • ObjectAttributeBase (additional detailed configuration)

Details

The ExternalScriptExecution.dtsx runs as an extensibility after the OnPostTableLoad on the empty destination table. The empty destination table should have already been designed in SAMD with the appropriate columns and data types. The binding query for that empty destination table should execute a select statement from the source table where 1 = 2. For example:

SELECT {columns with Aliases matching destination columns}
FROM {DatabaseNM}.{SchemaNM}.{SourceTableNM}
WHERE 1 = 2.

This simple binding query creates a dependency upon the source table.

Once the destination table loads its empty binding, the OnPostTableLoad event will trigger the ExternalScriptExecution. This script will execute the intended function on the intended external script (Python or R) if all the required Object Attributes are defined correctly.

Required Object Attributes:

  • ExternalRScript: System-level object attribute containing the R script. Warning: It's possible that this is table-level, depending on the version of the package.
  • ExternalPythonScript: System-level object attribute containing the Python script. Warning: It's possible that this is table-level, depending on the version of the package.
  • RInterpreterPath: System-level local path of the RScript.exe interpreter
  • PythonInterpreterPath: System-level local path of the Python interpreter
  • ExternalScriptType: Table-level variable, R or Python
  • ExternalScriptFunction: Table-level variable, function within script. For example: FindTrends
  • ExternalScriptSourceEntity: Table-level variable, source table from which to query
  • ExternalScriptArguments: Table-level variable, space-delimited extra arguments for script function, i.e. "z2" "Test.txt"

If any of these attributes are not defined, the package will exit and log the reason in EDW Console.

Glossary

  • SSIS
    • SQL Server Integration Services.
  • SSMS
    • SQL Server Management Studio. Where work happens.
  • SSDT
    • SQL Server Data Tools. This is where the ISPAC file is first installed.
  • EDW Console
    • Catalyst web app for viewing and configuring ETLs.
    • Errors are surfaced here.
  • Batches
    • ETLs scheduled in EDW Console (web portal) by a TD.
  • SSIS packages
    • are found in: SSMS > Integration Services Catalogs > CatalystExtensibility
  • ISPAC File
    • The project deployment file is a self-contained unit of deployment that includes only the essential information about the packages and parameters in the project. reference
  • DTX File
    • The actual SSIS package files that are installed into SSIS via SSMS.
  • EDWAdmin.CatalystAdmin.AttributeBASE
    • A table that stores keys and descriptions that can be used to assign values to instances of objects.
  • EDWAdmin.CatalystAdmin.ObjectAttributeBASE
    • A table that stores instances of key values for specific objects.

ssis's People

Contributors

aylr avatar mdpettit avatar mmastand avatar taylorlarsen avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.