Building an Azure ML SSIS Task

In several previous blog posts (HERE and HERE), I’ve introduced and discussed the Azure Machine Learning service, its features, benefits and general capabilities. Since that time I have been toying with the idea of building a custom SSIS Task to integrate Azure ML into SSIS.

My vision for the project is pretty simple and is roughly modeled after the built-in Execute SQL and Web Service Tasks. First, since the Azure ML batch execution service requires the input data to be hosted in an Azure Blob Storage account, the task would be capable of either uploading the required input data to an Azure Blob Storage account or simply using a URI to an existing blob. If the input source is not an existing blob, the data could come from a variable, a file connection or via direct input.

Next, the task would execute and monitor the batch prediction job via the provided endpoint. The task would also periodically log the job status for visibility. When the job completes, the task would allow the results to be downloaded to either a string variable or a file connection.
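To make the upload half of that concrete, the staging step amounts to something like the following sketch, which assumes the classic Microsoft.WindowsAzure.Storage client library; the account name, key, container and file paths are placeholders:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;

// Connect to the storage account that will host the experiment input data.
// "myaccount" and "mykey" are placeholders for your storage credentials.
var credentials = new StorageCredentials("myaccount", "mykey");
var account = new CloudStorageAccount(credentials, useHttps: true);
var client = account.CreateCloudBlobClient();

// Stage the input file as a block blob; the batch job then references its URI.
var container = client.GetContainerReference("experiment-input");
container.CreateIfNotExists();

var blob = container.GetBlockBlobReference("input.csv");
using (var stream = File.OpenRead(@"C:\Data\input.csv"))
{
    blob.UploadFromStream(stream);
}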

To start, I re-purposed the Azure Storage Connection Manager found in the SSIS Azure Blob Storage sample project on MSDN. The project itself consists of just a single task, which I called ExecuteAzureMLBatch, and its supporting classes. While a deep dive into the code is outside the scope of this post, those who are interested can download the project directly from my GitHub repository: https://github.com/bluewatersql/AzureMLSSIS.

Building the ExecuteAzureMLBatch Task

1. Download the code from https://github.com/bluewatersql/AzureMLSSIS

2. The solution consists of two projects: one for the connection manager and one for the task itself. The build events for both projects handle all the heavy lifting for using the components on your local machine. Ensure that the paths to GacUtil.exe and to your SQL Server installation found in the post-build events are correct for your machine, as shown in the snippet below. Note that you need to do this for both projects.
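For reference, the post-build events look something like the following. The GacUtil.exe and SQL Server paths shown are typical defaults for a SQL Server 2012 installation and will likely differ on your machine; the connection manager project copies to the DTS\Connections folder instead of DTS\Tasks.

"C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\gacutil.exe" /if "$(TargetPath)"
copy /Y "$(TargetPath)" "C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Tasks\"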


3. Build the project. The post-build events discussed in step 2 take care of registering the required assemblies in the GAC and copying the files to the required directories. If you get any squirrelly errors, you may need to run Visual Studio as an Administrator to successfully run GacUtil.

Using the ExecuteAzureMLBatch Task

After the solution has been successfully built, you’re ready to use the task in an SSIS package. To use the task, follow these steps:

  1. Add a new Windows Azure Storage connection manager to your package. This connection manager does not have a design-time UI, so you will need to use the Properties window to set the following properties for the storage account where the experiment input data will live:
    • UseDevelopmentStorage = false
    • StorageName = <STORAGE_ACCOUNT />
    • AccessKey = <STORAGE_ACCOUNT_ACCESS_KEY />
    • Path = <CONTAINER_NAME />
  2. Add the ExecuteAzureMLBatch task to the control flow design surface and configure it:
    • Set the Azure ML Batch URL Endpoint (ex: https://ussouthcentral.services.azureml.net/workspaces/<….>/services/<…>/jobs)
    • Set the Azure ML Key
    • The experiment input source can come from a variable, a file connection or direct input.
    • The experiment destination can be none, a variable or a file connection, and determines where the results are loaded. Note that when none is specified, the results are not downloaded.
  3. Run your package.

Once the task runs, you can process the results as required. I hope you find this as interesting as I do, and feel free to let me know if you have suggestions, feedback or would like to actively contribute.

Till next time!

Chris

Programmatically Executing SSIS Packages

While working on the next iteration of my SSIS ETL Framework, I’ve discovered that the capabilities of the out-of-the-box Execute Package task are quite lacking. Luckily, with SQL Server 2012, it has never been easier to execute SSIS packages programmatically. In this post, we will look at two different options for executing SSIS packages from .NET code: first by calling stored procedures found in the SSISDB catalog, and then by using the provided SSIS API.

SSIS Catalog Stored Procedures

One of the coolest features of the SSIS Catalog in SQL Server 2012 is the ability to completely control your SSIS deployment through stored procedures. Executing a package deployed to the catalog is no different.

In its simplest form, to kick off and run an SSIS package there are two procedures to call: [catalog].[create_execution] and [catalog].[start_execution].

To start, you must first call create_execution, which includes the reference to the project, folder and package. Additionally, a reference to an environment can be included, as well as the option to run the package with the 32-bit runtime. The procedure, after checking whether the caller is authorized to run the package, creates the execution shell and returns an identifier for the execution. The execution identifier is important and is used to subsequently start the execution and, as we will see later, to set parameter configurations.

In the example below, we create an execution for DemoPackage.dtsx, which is found in the Blog folder and ETL Demo project. We use a variable (@executionID) to capture the identifier for the execution.


DECLARE @executionID BIGINT

EXEC [catalog].[create_execution] 
    @folder_name=N'Blog',
    @project_name=N'ETL Demo',
    @package_name=N'DemoPackage.dtsx',
    @reference_id=NULL,
    @use32bitruntime=FALSE,     
    @execution_id=@executionID OUTPUT

Once the execution is created, we can asynchronously kick it off by calling the start_execution procedure. As seen below, it accepts a single argument: the execution ID from above.

EXEC [catalog].[start_execution] @executionID

After the SSIS package execution has started, you can monitor its status by querying the [internal].[operations] table (also exposed through the [catalog].[operations] view). This table contains all the execution information, including when the execution was created, started and completed, who created it and, most importantly, the status.

SELECT [STATUS]
FROM [SSISDB].[internal].[operations]
WHERE operation_id = @executionID

The status column is represented as an integer and can be decoded using the list below.

  • 1 – Created
  • 2 – Running
  • 3 – Cancelled
  • 4 – Failed
  • 5 – Pending
  • 6 – Ended Unexpectedly
  • 7 – Success
  • 8 – Stopping
  • 9 – Complete

After you’ve gotten your package to run, one of the next logical questions is: how can I configure it or set parameter values prior to execution? Like before, a call to another stored procedure, [catalog].[set_execution_parameter_value], allows you to specify the value of a package or project parameter for a given execution.

EXEC [catalog].[set_execution_parameter_value] 
    @executionID,
    @object_type=30, 
    @parameter_name=N'MyPkgParam',
    @parameter_value='Test Value'

In the example above, we pass in our execution identifier, the parameter name and the value we wish to specify. The object type for this procedure takes one of three values:

  • 20 – Project Parameter
  • 30 – Package Parameter
  • 50 – Environment Parameter

Taken together, these three stored procedures allow us to programmatically execute packages stored in the SSIS Catalog using nothing more than basic database calls.
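To illustrate, here is a minimal sketch that wires the three procedures together from .NET using plain ADO.NET; the connection string, folder, project, package and parameter names are placeholders:

using System;
using System.Data;
using System.Data.SqlClient;

// Placeholder connection string pointing at the server hosting SSISDB.
using (var conn = new SqlConnection("Server=localhost;Database=SSISDB;Integrated Security=SSPI;"))
{
    conn.Open();

    // Create the execution shell and capture the execution identifier.
    var create = new SqlCommand("[catalog].[create_execution]", conn) { CommandType = CommandType.StoredProcedure };
    create.Parameters.AddWithValue("@folder_name", "Blog");
    create.Parameters.AddWithValue("@project_name", "ETL Demo");
    create.Parameters.AddWithValue("@package_name", "DemoPackage.dtsx");
    create.Parameters.AddWithValue("@use32bitruntime", false);
    create.Parameters.AddWithValue("@reference_id", DBNull.Value);
    var executionId = create.Parameters.Add("@execution_id", SqlDbType.BigInt);
    executionId.Direction = ParameterDirection.Output;
    create.ExecuteNonQuery();

    // Optionally set a package parameter (object type 30) before starting.
    var setParam = new SqlCommand("[catalog].[set_execution_parameter_value]", conn) { CommandType = CommandType.StoredProcedure };
    setParam.Parameters.AddWithValue("@execution_id", executionId.Value);
    setParam.Parameters.AddWithValue("@object_type", 30);
    setParam.Parameters.AddWithValue("@parameter_name", "MyPkgParam");
    setParam.Parameters.AddWithValue("@parameter_value", "Test Value");
    setParam.ExecuteNonQuery();

    // Kick off the execution asynchronously; monitor via [internal].[operations].
    var start = new SqlCommand("[catalog].[start_execution]", conn) { CommandType = CommandType.StoredProcedure };
    start.Parameters.AddWithValue("@execution_id", executionId.Value);
    start.ExecuteNonQuery();
}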

SSIS API

If calling T-SQL stored procedures directly is not quite your thing, a second option for executing SSIS packages programmatically exists in the form of the SSIS API. To use the SSIS API, we need to reference a couple of namespaces:

  • Microsoft.SqlServer.Management.IntegrationServices: found in the GAC (Global Assembly Cache)
  • Microsoft.SqlServer.Management.Smo: found at C:\Program Files\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Management.Sdk.Sfc.dll

It should be noted at this point that the API calls simply use the stored procedures discussed previously. With the namespaces in place, we have everything we need to get started, and we can begin by reviewing the hierarchy of objects that must be navigated to retrieve a reference to our package.

The whole process starts with a reference to a Server. This can be created either by using the Server SMO object or by passing a connection string to the Integration Services API. From the Server, we must find the Catalog. Currently, there can be only a single catalog on each server and its name will always be SSISDB. In the Catalog, we must next find our Folder, which contains our Project, which ultimately contains the Package.

Our package is implemented as a PackageInfo object within the API. From this object, we can call the Execute method, which accepts arguments much like the preceding stored procedures (32-bit runtime and environment reference), as well as collections for passing in execution parameter values. The Execute method returns the identifier of the execution, which can subsequently be used to monitor it.

In the sample code below, I show one way of doing this.

using System.Threading;
using Microsoft.SqlServer.Management.IntegrationServices;
using Microsoft.SqlServer.Management.Smo;

Operation.ServerOperationStatus status = Operation.ServerOperationStatus.Created;

// Connect to the local server via SMO and wrap it with the Integration Services API.
var server = new Server("localhost");
var ssis = new IntegrationServices(server);

// Walk the hierarchy: Catalog -> Folder -> Project -> Package.
if (ssis.Catalogs.Contains("SSISDB"))
{
    var catalog = ssis.Catalogs["SSISDB"];

    if (catalog.Folders.Contains("Blog"))
    {
        var folder = catalog.Folders["Blog"];

        if (folder.Projects.Contains("ETL Demo"))
        {
            var project = folder.Projects["ETL Demo"];

            if (project.Packages.Contains("DemoPackage.dtsx"))
            {
                var package = project.Packages["DemoPackage.dtsx"];

                // Execute is asynchronous and returns the execution identifier.
                long executionId = package.Execute(false, null, null, null);

                var execution = catalog.Executions[executionId];

                // Poll until the execution completes so we can return its final status.
                while (!execution.Completed)
                {
                    execution.Refresh();
                    Thread.Sleep(1000);
                }

                return execution.Status;
            }
        }
    }
}

return status;

There are a couple of things to point out in this example before wrapping up. First, the Execute method runs asynchronously. To make this code run synchronously, I poll and wait for the package to finish executing so that the execution status can be returned. Second, the Execute method accepts a collection of PackageInfo.ExecutionValueParameterSet objects that allow you to configure parameter values for your execution. This value was passed as null in the example, but it could have easily been set as seen below.

Collection<PackageInfo.ExecutionValueParameterSet> executionParams = new Collection<PackageInfo.ExecutionValueParameterSet>();

executionParams.Add(new PackageInfo.ExecutionValueParameterSet()
{
    ParameterName = "MyParam",
    ParameterValue = "TestValue",
    ObjectType = 30
});

long executionId = package.Execute(false, null, executionParams, null);

Wrap-Up

In this post, we looked at the different options that are available to execute SSIS packages programmatically from the SQL Server 2012 catalog. This hardly scratches the surface of the programmatic possibilities of the catalog, but in my case it allowed me to work around a limitation of the out-of-the-box Execute Package task.

Till next time!

Chris

Building Custom SSIS Azure Service Bus Components

Over the last couple of weeks, I have been experimenting with and working on a set of custom SSIS Service Bus components that will ultimately be the basis of the ETL framework described in my previous post (HERE). While these components are being developed as a proof of concept to test out my theories, they should be generally useful in other use cases as well. As such, I am making these components (and their associated sources) available to the community through a new Codeplex project (https://ssisservicebus.codeplex.com/) for you to use and abuse.

Overview

The SSIS components described here are all built to allow for straightforward interaction with Azure Service Bus (What is it? Go HERE) Topics and Queues. There are four components included in the initial project (more to come later): a custom connection manager, a ForEach enumerator and two control flow tasks that allow you to send, receive and acknowledge messages in the form of a handshake.


I’ll apologize in advance for the cheesy and irrelevant icons, but I was less worried about how the components look on the design surface and more interested in how they work at runtime. So without further ado, a summary of each component follows.

Azure Service Bus Connection Manager

The connection manager supports connections to Azure Service Bus Queues, Topics and Subscriptions. The AcquireConnection method returns the base class MessageClientEntity for the configured type. To simplify my life, I made the decision to require each task to ensure that the connection type is appropriate, rather than implement a separate connection manager per entity.
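A minimal sketch of the idea, assuming the Microsoft.ServiceBus.Messaging client library; the ConnectionString, EntityType, Path and SubscriptionName members are illustrative stand-ins for the actual implementation:

// Sketch only: hand back a queue, topic or subscription client as a MessageClientEntity.
public override object AcquireConnection(object txn)
{
    var factory = MessagingFactory.CreateFromConnectionString(this.ConnectionString);

    switch (this.EntityType)   // hypothetical enum on the connection manager
    {
        case ServiceBusEntityType.Queue:
            return factory.CreateQueueClient(this.Path);
        case ServiceBusEntityType.Topic:
            return factory.CreateTopicClient(this.Path);
        default:
            return factory.CreateSubscriptionClient(this.Path, this.SubscriptionName);
    }
}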


Brokered Message ForEach Enumerator

This custom ForEach enumerator allows brokered messages to be read off a queue or subscription using the SSIS Foreach Loop container. The enumerator supports individual and batch-based reads and can be configured with a message cap, or maximum number of messages to read before exiting. Brokered message objects can be piped into an object variable using the variable mappings, allowing messages to be consumed by other tasks.
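Internally, the reads boil down to calls like these (a sketch against the Microsoft.ServiceBus.Messaging client; queueClient and messageCap stand in for the enumerator’s configured connection and cap):

// Batch mode: pull up to the configured cap in one round trip.
var messages = queueClient.ReceiveBatch(messageCap);

// Individual mode: one message per iteration, honoring a server wait timeout.
var message = queueClient.Receive(TimeSpan.FromSeconds(5));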


Brokered Message Handshake Task

The handshake task, when mapped to a variable containing a BrokeredMessage object, can be configured to call either the Complete or Abandon method, thereby either finishing the conversation or abandoning the message so it can be processed again at a later time. This handshake is an essential part of processing brokered messages within the control flow.
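For those unfamiliar with the handshake, the underlying BrokeredMessage calls look roughly like this (a sketch assuming the message was received in PeekLock mode and piped into an object variable; User::Message and processedSuccessfully are names I’ve made up for illustration):

// Inside a Script Task: retrieve the message captured by the enumerator.
var message = (BrokeredMessage)Dts.Variables["User::Message"].Value;

if (processedSuccessfully)
{
    message.Complete();   // removes the message from the queue/subscription
}
else
{
    message.Abandon();    // releases the lock so the message can be redelivered
}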


Send Brokered Message Task

The last task is capable of sending a brokered message to a queue or topic, depending on the configuration of the connection manager. The brokered message data is sourced either from an SSIS variable or from a directly input string, and it can be sent synchronously or asynchronously.
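Under the covers, the synchronous path amounts to little more than the following sketch; the payload and the client acquisition are simplified for illustration:

// The connection manager hands back a MessageClientEntity; the task verifies the type.
var client = (QueueClient)connectionManager.AcquireConnection(null);

using (var message = new BrokeredMessage("my message payload"))
{
    client.Send(message);   // synchronous send; SendAsync covers the asynchronous case
}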


The current implementation of these components is relatively sparse in terms of Azure Service Bus whiz-bang features, as they simply send and receive. The plan as of now is to build them out iteratively, adding features as needed. Also note that while I have made an effort to ensure that the design-time UI experience is in line with the out-of-the-box SSIS controls, I have not added robust design-time error handling and reporting.

At this point, while production-ready is a ways off, I would invite you to check the project out. If you are interested in contributing your time, ideas or feedback, I would gladly welcome it. If you have any questions, feel free to contact me directly.


Till next time!

Chris