Code Monkey home page Code Monkey logo

madlib-sas-macros's Introduction

madlib-sas-macros

This repository contains a set of SAS MACROs which allow users to publish SAS models into Greenplum, HAWQ and PostgreSQL databases with MADlib SQL-based advanced analytics and machine learning library installed for in-database scoring.

Two types of MACROs exist today

  • Publish MACROs: Transform and copy SAS model output tables to Greenplum/HAWQ/Postgres database.
  • Score MACROs: Use tables generated by publish MACROs and MADlib scoring functions to apply SAS model to new data in-database.

Why publish SAS models to Greenplum/HAWQ/PostgreSQL for scoring?

In production environments data typically flows into databases for storage and use in supporting applications. Traditional modeling tools, like SAS, require data to be moved into an environment external from these databases for modeling and subsequent model scoring.

This process can involve many steps and for large data volumes can be time consuming (if not impossible). Publishing SAS models into Greenplum, HAWQ or Postgres where the data lives shortens the number of steps required to score data and increases the speed in which data can be scored.

Requirements

  • MADlib: In-database machine learning library. (Developed using MADlib 1.8)
  • SAS/Base: (Developed using SAS/Base 9.4)
  • SAS/STAT: (Developed using SAS/STAT 12.3)
  • SAS/Access to Greenplum or SAS/Access to HAWQ or SAS/Access to Postgres:
    (Developed using SAS/Access to Greenplum 9.4)

Currently supported SAS-MADlib models

Publish

  • Linear Regression
  • Logistic Regression

Score

  • Linear Regression
  • Logistic Regression

Limitations

Publish and Score MACROS:

  • SAS PROCs allow for a large number of variations through the use of additional parameters and options. Current publish and score MACROs only support a limited set of SAS PROC functionality. Please see examples for supported functionality.

Examples

Note - Models included in these examples are for code test purposes only. No effort has been made to include appropriate features and generate accurate models.

/************** Setup/Connection Details **************/

%let repopath=c:\Desktop;       * Location of madlib-sas-macros repo;
%let engine=greenplm;           * SAS/Access engine (greenplm, hawq or postgres);
%let server='XXX.XXX.XXX.XXX';  * database server/host IP address;
%let port=5432;                 * database port number;
%let database='public';         * database name;
%let schema=sasgpdb;            * database schema;
%let user='username';           * database credentials - username;
%let password='password';       * database credentials - password;


/************** Establish libref to database **************/

* Establish libref to database;
libname mydblib &engine. server=&server. port=&port. database=&database. user=&user. password=&password. schema=&schema.;


/************** Data preparation **************/

* Copy SAShelp 'cars' dataset to database for testing;
proc sql;
  CREATE TABLE mydblib.cars AS
  SELECT *
  FROM sashelp.cars;
quit;


/************** Run MACROS files **************/

%macro include_files;

  * Determine operating system and set correct / or \ for filenames;
  %let del=/; * UNIX OS;
  %if &sysscp=WIN %then %let del=\; * Windows OS;

  * linear regression MACROs;
  %include "&repopath.&del.madlib-sas-macros&del.linear-regression&del.madlib_sas_publish_linreg.sas";
  %include "&repopath.&del.madlib-sas-macros&del.linear-regression&del.madlib_sas_score_linreg.sas";

  * logistic regression MACROs;
  %include "&repopath.&del.madlib-sas-macros&del.logistic-regression&del.madlib_sas_publish_logreg.sas";
  %include "&repopath.&del.madlib-sas-macros&del.logistic-regression&del.madlib_sas_score_logreg.sas";

%mend include_files;
%include_files;

/************** Linear Regression **************/

* Linear regression model predicting mpg_highway;
proc reg data=sashelp.cars;
  model mpg_highway=Horsepower Weight EngineSize;
  * Output parameter estimates to a dataset;
  ods output parameterEstimates = sas_mpg_highway_lrm;
run;
quit;

* Run model publishing macro;
%madlib_sas_publish_linreg(
  modelDataset=sas_mpg_highway_lrm
 ,modelTable=&schema..db_mpg_highway_lrm
 ,server=&server
 ,db=&database
 ,port=&port
 ,user=&user
 ,password=&password
 ,drop=1
);

* Run model scoring macro;
%madlib_sas_score_linreg(
  inTable=&schema..cars
 ,outTable=&schema..cars_scored_lrm
 ,modelTable=&schema..db_mpg_highway_lrm
 ,server=&server
 ,db=&database
 ,port=&port
 ,user=&user
 ,password=&password
 ,predictColumnName=predict
 ,residualColumnName=err
 ,drop=1
);

* View results;
proc print data=mydblib.cars_scored_lrm;
run;


/************** Logistic Regression **************/

* Prep data for logistic - create binary column (BMW or Mercedes-Benz then yes) from make column;
proc sql;
  CREATE TABLE work.cars_binary AS
  SELECT *
        ,CASE WHEN make IN ('BMW','Mercedes-Benz') THEN 1 ELSE 0 END AS bmw_mb_flag
  FROM sashelp.cars;
quit;

* Logistic regression model predicting whether car is BMW or Mercedes-Benz;
proc logistic data=work.cars_binary;
  model bmw_mb_flag (event='1') = msrp enginesize cylinders weight;
  ods output parameterEstimates = sas_bmw_mb_flag_lgrm;
run;
quit;

* Run model publishing macro;
%madlib_sas_publish_logreg(
  modelDataset=sas_bmw_mb_flag_lgrm
 ,modelTable=&schema..db_bmw_mb_flag_lgrm
 ,server=&server
 ,db=&database
 ,port=&port
 ,user=&user
 ,password=&password
 ,drop=1
);

* Run model scoring macro;
%madlib_sas_score_logreg(
  inTable=&schema..cars
 ,outTable=&schema..cars_scored_lgrm
 ,modelTable=&schema..db_bmw_mb_flag_lgrm
 ,server=&server
 ,db=&database
 ,port=&port
 ,user=&user
 ,password=&password
 ,predictColumnName=predict
 ,drop=1
);

* View results;
proc print data=mydblib.cars_scored_lgrm;
run;

References

Contact

Jarrod Vawdrey

@jjvawdrey

[email protected]

madlib-sas-macros's People

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.