May 3rd, 2011

Use PowerShell to Perform SQL Data Mining

Summary: Learn how to use Windows PowerShell to perform SQL Server data mining.

Microsoft Scripting Guy, Ed Wilson, is here. Today I continue the week about SQL in honor of next week’s SQLRally in Orlando. I have the honor of introducing Mark Tabladillo, who will talk to us about data mining.

Photo of Mark Tabladillo

Mark Tabladillo Ph.D. is a business intelligence consultant and trainer, based in the United States, with clients around the world. He has worked with business intelligence applications and development since 1998. He focuses on data mining and predictive analytics.

Mark has earned the Microsoft Certified Application Developer (MCAD) designation for .NET, Microsoft Certified IT Professional (MCITP), and he is a Microsoft Certified Trainer (MCT). He has also served the SAS community by presenting at conferences and organizing events. Mark has a doctorate (Ph.D.) in Industrial Engineering from Georgia Tech. His data mining resource is at http://marktab.net.

Windows PowerShell with SQL Server data mining

Windows PowerShell version 2.0 provides an excellent maintenance foundation for SQL Server data mining, and this blog post will focus on that aspect. I first used business intelligence software in the UNIX operating system, and the comparable shells available are well known by professionals. Windows PowerShell 2.0 implements the vision and momentum that started with version 1, and it provides a robust and stable technology for enterprise maintenance and operations—and in this case, enterprise data mining. Because Windows PowerShell ships with the Windows operating systems, it can and should be part of any business intelligence solution.

This blog post provides a basic outline for using Windows PowerShell with SQL Server data mining—only one aspect of the complete Microsoft Business Intelligence tools. The sections of this post include:

  • Motivation
  • What a script for data mining looks like
  • APIs for data mining
  • A complete example
  • Promotion for SQL Rally 2011

Motivation

I realize that many readers of this blog already are Windows PowerShell fans, but many business intelligence people still need to warm up to the concept of this powerful technology. Windows PowerShell is Microsoft’s powerful object-oriented scripting language. You can download Windows PowerShell from the Microsoft website, but beginning with Windows 7 and Windows Server 2008 R2, you should have Windows PowerShell as part of the standard installation (look in the Accessories folder). Two separate applications are referred to as Windows PowerShell 2.0: the command screen window and the Integrated Scripting Environment (ISE). The icons for these applications are shown in the following image.

PowerShell icons

I believe that these applications are so important that I have pinned them to my start menu on all my Windows-based computers. The Windows PowerShell functionally has replaced the command prompt for me, because it can do all that and more. Even in restricted corporate environments that disallow Windows PowerShell scripts (which you would write with the ISE), you can still open a Windows PowerShell window, and issue individual commands.

Because of the pipe facility, a single command can have a number of subsequent options in one line. Some Windows PowerShell geeks like to see what they can cram into one line. Even scripts can often achieve results in fewer lines than writing comparable C# code, and new aliases (shortcuts) continue to be invented. The scripts themselves are text files, typically with a .ps1 extension. A Windows PowerShell file could be created by another process and then later executed, and you can use any text editor to modify the scripts themselves. What makes the technology particularly powerful is access to the full range of .NET classes.

What a script for data mining looks like

I do not intend to make this blog post exceedingly long (based on other guest posts I have read on Scripting Guys). However, developing enterprise-level data mining with Windows PowerShell typically will require more than just a few lines. The actual length could possibly be shortened by creating your own aliases, but for the simplicity of this post, I assume that we are using commands that ship with Windows PowerShell (with no added extensions).

The following code shows how to create a Microsoft data mining structure, and then how to add data mining models to that structure. These routines are part of a much longer piece of working code, which you can try for yourself. I will provide that link at the end of this blog post. Also, to keep this topic interesting, I will skip a detailed explanation of the code (even though I added some comments inline).

################################################################

#

# Listing 16.4 — Creating the mining structure

#

#

 

function CreateMiningStructure {

      param(

            [Microsoft.AnalysisServices.Database] $db

        )       

 

    # Initialize a new mining structure

    [Microsoft.AnalysisServices.MiningStructure] $ms =

        new-object Microsoft.AnalysisServices.MiningStructure(“PayChannelAnalysis”, “PayChannelAnalysis”)

    $ms.Source = new-object Microsoft.AnalysisServices.DataSourceViewBinding(“MovieClick”)

 

    # Create the columns of the mining structure

    # setting the type, content and data binding

 

    # User Id column

    [Microsoft.AnalysisServices.ScalarMiningStructureColumn] $UserID =

        new-object Microsoft.AnalysisServices.ScalarMiningStructureColumn(“UserId”, “UserId”)

    $UserID.Type = [Microsoft.AnalysisServices.MiningStructureColumnTypes]::Long

    $UserID.Content = [Microsoft.AnalysisServices.MiningStructureColumnContents]::Key

    $UserID.IsKey = $true

    # Add data binding to the column

    # Methods need to pipe to null to remove output from the returned object

    $UserID.KeyColumns.Add(“Customers”, “SurveyTakenID”, [System.Data.OleDb.OleDbType]::Integer) > $null

    # Add the column to the mining structure

    $ms.Columns.Add($UserID) > $null

 

    # Generation column

    [Microsoft.AnalysisServices.ScalarMiningStructureColumn] $Generation =

        new-object Microsoft.AnalysisServices.ScalarMiningStructureColumn(“Generation”, “Generation”)

    $Generation.Type = [Microsoft.AnalysisServices.MiningStructureColumnTypes]::Text

    $Generation.Content = [Microsoft.AnalysisServices.MiningStructureColumnContents]::Discrete

    # Add data binding to the column

    $Generation.KeyColumns.Add(“Customers”, “Generation”, [System.Data.OleDb.OleDbType]::WChar) > $null

    # Add the column to the mining structure

    $ms.Columns.Add($Generation) > $null

 

    # Add Nested table by creating a table column and adding

    # a key column to the nested table

    [Microsoft.AnalysisServices.TableMiningStructureColumn] $PayChannels =

        new-object Microsoft.AnalysisServices.TableMiningStructureColumn(“PayChannels”, “PayChannels”)

    $PayChannels.ForeignKeyColumns.Add(“PayChannels”, “SurveyTakenID”, [System.Data.OleDb.OleDbType]::Integer) > $null

 

    [Microsoft.AnalysisServices.ScalarMiningStructureColumn] $Channel =

        new-object Microsoft.AnalysisServices.ScalarMiningStructureColumn(“Channel”, “Channel”)

    $Channel.Type = [Microsoft.AnalysisServices.MiningStructureColumnTypes]::Text

    $Channel.Content = [Microsoft.AnalysisServices.MiningStructureColumnContents]::Key

    $Channel.IsKey = $true

    # Add data binding to the column

    $Channel.KeyColumns.Add(“PayChannels”, “Channel”, [System.Data.OleDb.OleDbType]::WChar) > $null

    $PayChannels.Columns.Add($Channel) > $null

    $ms.Columns.Add($PayChannels) > $null

 

    # Add the mining structure to the database

    $db.MiningStructures.Add($ms) > $null

    $ms.Update() > $null

 

    $ms

    }

 

################################################################

#

# Listing 16.5 — Adding mining models to the structure

#

#

the mining structure

function CreateModels {

      param(

            [Microsoft.AnalysisServices.MiningStructure] $ms

        )       

 

    [Microsoft.AnalysisServices.MiningModel] $ClusterModel

    [Microsoft.AnalysisServices.MiningModel] $TreeModel

    [Microsoft.AnalysisServices.MiningModelColumn] $mmc

 

    # Create the Cluster model and set the algorithm

    # and parameters

    $ClusterModel = $ms.CreateMiningModel($true, “Premium Generation Clusters”)

    $ClusterModel.Algorithm = “Microsoft_Clustering”

    $ClusterModel.AlgorithmParameters.Add(“CLUSTER_COUNT”, 0)

 

    # The CreateMiningModel method adds

    # all the structure columns to the collection

 

    # Copy the Cluster model and change the necessary properties

    $TreeModel = $ClusterModel.Clone()

    $TreeModel.Name = “Generation Trees”

    $TreeModel.ID = “Generation Trees”

    $TreeModel.Algorithm = “Microsoft_Decision_Trees”

    $TreeModel.AlgorithmParameters.Clear()

    $TreeModel.Columns[“Generation”].Usage = “Predict”

    $TreeModel.Columns[“PayChannels”].Usage = “Predict”

 

    # Add an aliased copy of the PayChannels table to the trees model

    $mmc = $TreeModel.Columns.Add(“PayChannels_Hbo_Encore”)

    $mmc.SourceColumnID = “PayChannels”

    $mmc = $mmc.Columns.Add(“Channel”)

    $mmc.SourceColumnID = “Channel”

    $mmc.Usage = “Key”

 

    # Now set a filter on the PayChannels_Hbo_Encore table and use it

    # as input to predict other channels

    $TreeModel.Columns[“PayChannels_Hbo_Encore”].Filter = “Channel=’HBO’ OR Channel=’Encore'”

 

    # Set a complementary filter on the payChannels predictable nested table

    $TreeModel.Columns[“PayChannels”].Filter = “Channel<>’HBO’ AND Channel<>’Encore'”

 

    $ms.MiningModels.Add($TreeModel)

 

    # Submit the models to the server

    $ClusterModel.Update()

    $TreeModel.Update()

    }

The data mining model code adds both a Microsoft Clustering and a Microsoft Decision Tree model to the structure. You can read an orientation to the algorithms available with SQL Server data mining on my website.

APIs for data mining

Several application programming interfaces (APIs) provide the classes for accessing SQL Server data mining technology. From the SQL Server 2008 R2 documentation on MSDN, these APIs include:

ADOMD.NET (ActiveX data objects multidimensional for .NET)

AMO (Analysis Management Objects)

OLE DB (object linking and embedding for databases—essentially Data Mining Extensions (DMX))

ASSL (Analysis Services Scripting Language, including XMLA and XML for analysis)

Clicking the links goes directly to the documentation. The prerequisite for understanding this documentation include a basic foundation in object-oriented programming and an introduction to .NET. You can find many free resources (including video) for learning these technologies throughout the TechNet website.

A complete example

Earlier, I shared some Windows PowerShell code, and I said it was part of longer code. I originally took code from a book that I recommend: Data Mining with Microsoft SQL Server 2008. This book provides some C# data mining code, which I then converted to Windows PowerShell 2.0. Key to this conversion was being able to use the try-catch-finally implemented in Windows PowerShell 2.0. Robust enterprise code should be using this type of error handling, and Windows PowerShell 2.0 provides the way to implement the concept. You can see the full example, along with instructions about how to replicate the results yourself in my article, Programming SQL Server Data Mining with PowerShell 2.0,on my website.

In production, a developer or architect could develop scripts for regular activities. Windows PowerShell allows for creating extensions and aliases, and experienced coders can make these extensions based on regular data mining processing. I believe that developers could also write some Windows PowerShell scripts for data mining analysts to use, for activities such as uploading structures and models into production, and querying development or production data mining models. For advanced users, I recommend studying the many free Windows PowerShell extensions available (including the Windows 7 Resource Kit PowerShell Pack), and consider developing your own.

Promotion for SQL Rally 2011

As I described in my blog, I am going to be speaking at SQL Rally 2011 in Orlando, FL in May. My topic will be Enterprise Data Mining, and I will include some Windows PowerShell examples too (based on Contoso Retail). Also at this event, there will be many other knowledgeable speakers including Aaron Nelson and Ed Wilson. You should consider coming. For more information about the conference, see the SQL Rally 2011 website.

Conclusion and contact information

Windows PowerShell should be seriously considered for enterprise data mining (whether Microsoft or not). The advanced features in Windows PowerShell 2.0 make the technology robust for error handling. Tasks, which would have previously required C #.NET or Visual Basic.NET console applications can be now achieved with this free technology. I believe the scripting interface lowers the adoption barrier while simultaneously raising (or opening) the scope of .NET classes available.

I provide a data mining resource at http://marktab.net and my blog is at http://marktab.net/datamining. Both resources have a contact form for sending me a message. Also on my blog, check out my recent interview with Scripting Guy guru, Ed Wilson.

SQL Week will continue tomorrow when our guest blogger will be Michael Wells.

Thank you, Mark, for sharing this post about data mining.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

Author

0 comments

Discussion are closed.