New samples at @devscope #msbi codeplex - II: #PowerShell Dynamic Outliers with #SSAS mining models

Finally! We updated our codeplex project with an improved sample for this. :) By the way, see the first post on this thread for the other PowerShell MSBI data driven/dynamic samples & utilities.

II - #PowerShell Dynamic Outliers Monitoring with #SSAS mining models

The idea for this sample goes right back to the initial releases of the Excel Data Mining Add-in. That's a cool Excel add-in; in fact it's even more valuable today, and it's a little sad that data mining is somewhat forgotten in the MSBI stack :( (yes, it is…). The issue with SSAS Data Mining, and with this add-in in particular, is that it really was ahead of its time: now, in our always changing, data driven world, it is exactly the time for such agile & self-service data mining/data intelligence scenarios.

Anyway, that add-in showed that it was possible to do 100% dynamic outlier detection based only on the given data/schema: models would be generated at runtime for each dataset.

And so, finally, we had the time to take this concept and turn to PowerShell, making a lot of other data driven scenarios scriptable and able to run completely automated (e.g. scheduled/intelligent alerts, trends and anomaly monitoring).

So, what if we could use this to monitor the several data streams we have available in our solutions? We already have & use several very agile, data driven KPIs/reports/datasets… but those still usually involve fixed alert thresholds. What if we had something looking at these data streams every day and telling us, "hey, here's something that's not usual, are you aware?"? (i.e. something that would do what I usually do, so that I can go on doing several other things… like blogging and reading :) ).

And that's precisely what the sample script does:

  • iterate all the views in a predefined schema (love this pattern for DBA/BI monitoring)
  • execute & get the datasets; we used SQL Server, but you can use any DataTable
  • pass the DataTables to the data mining module (like the Excel Data Mining Add-in, it will create a temporary model dynamically for each table, do the outlier detection plus some nasty hacks of our own, and get an outlier probability value for each row)
  • then filter (in this scenario) for today's outliers, using a convention-styled approach: choosing the first date column available (a better approach would be to save the last execution time for each view and use that value as the date filter; see the sketch right after this list)
  • if there are outliers, send the report by mail (using the cmdlets I talked about in my last post)
  • if not, keep quiet!
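To make the "save the last execution time per view" idea from the list above concrete, here's a minimal sketch (not part of the sample; the state file and view names are made up for illustration):

# persist a per-view timestamp and use it as the date filter on the next run
$stateFile = ".\LastRunTimes.xml"    # hypothetical state file
$lastRunTimes = if (Test-Path $stateFile) { Import-Clixml $stateFile } else { @{} }

$viewName = "mon.SsasProcessingLog"  # hypothetical monitored view
$since = if ($lastRunTimes.ContainsKey($viewName)) { $lastRunTimes[$viewName] } else { [datetime]::Now.Date }

# ... run the outlier detection and keep only rows where $_.$firstDateCol -ge $since ...

$lastRunTimes[$viewName] = [datetime]::Now
Export-Clixml -Path $stateFile -InputObject $lastRunTimes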

And anytime I want to monitor another dataset, I just create another view in that schema and the script will adapt accordingly… hence the data driven mantra applies :) (a data+services mindset to explore in a future post…)

Like this:

A schema (mon) where I can put the data streams I want to monitor for outliers/anomalies.

[screenshot: views in the mon schema]

As a sample, I use a dataset of SQL Server Analysis Services processing logs/elapsed times, by object (measure group, partition, dimension, …).

I ensure there is a date column to allow for proper date/time filtering after the outlier detection (one-month-old outliers aren't that interesting anyway).
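Setting a stream up is plain T-SQL: create the schema once and add one view per data stream, keeping a date column around. A minimal sketch (the view and source table names are made up; only the mon schema and the date column convention come from the sample), assuming Invoke-OLEDBCommand accepts non-SELECT statements, otherwise just run the same T-SQL from SSMS:

# create the monitoring schema once
Invoke-OLEDBCommand -connectionString $connStr -sql "CREATE SCHEMA mon"

# one view per data stream to monitor; keep a date column so the script can filter for "today"
Invoke-OLEDBCommand -connectionString $connStr -sql @"
CREATE VIEW mon.SsasProcessingLog AS
SELECT ObjectName,      -- measure group / partition / dimension
       ObjectType,
       StartedAt,       -- the date column used for the daily filter
       ElapsedSeconds
FROM dbo.SsasProcessingLog
"@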

 

[screenshot: the sample SSAS processing log dataset with its date column]

Now imagine that I force something strange into that data stream, changing the start date so as to increase the processing time for that SSAS object (creating an anomaly)…

[screenshot: the modified rows simulating an anomaly]

And I run the provided outlier sample script… :)

[screenshot: running the outlier sample script]

and… amazingly (remember that I did not configure any kind of warning thresholds…), my two "simulated" outliers are now highlighted in my mailbox:

[screenshot: the outlier report email with the two highlighted rows]

How cool is that? :) The script doesn't have to know the data in any way; anything goes, and that's the beauty of it.

(Defending myself :) from possible comments by the MSBI team regarding this screenshot… I can only say that I promise to reassess my mobile stack after the "new" release of the MSBI stack for mobile… :) )

Be advised though that the script is just a sample; there are a few known issues at the moment:

  • Do not expect perfect results; we didn't spend much time tuning the SSAS mining models yet, and we are limited to the accuracy we can expect from Analysis Services mining models (it would be great to get some help/feedback from data mining experts!)
  • Some column types can cause the mining model/structure not to be created
  • We had to use a hack… and introduce an outlier probability of 2 when we were getting an outlier probability of 0 (I know, we'll have to dig deeper into the reason for this; we surely messed up somewhere)
  • Still missing is highlighting the column where the outlier is most probable; that's possible (the Data Mining Add-in does it), we just hadn't time for it yet, sorry
  • several other issues will appear, I'm sure… (I did say it's a sample, right? :) )

That’s it for today, browse/get the sample at http://devscopebisamples.codeplex.com !

Note that there’s a bundle download available with Excel Data Insights AddIn, Mail DataTable Reports & this outliers sample.

The “tiny” main script:

# for each view found
$viewName = $reportViewsSchema + "." + $_.Name

$data = Invoke-OLEDBCommand -connectionString $connStr -sql "select * from $viewName"

# convention: use the first datetime column of the result as the date filter column
$firstDateCol = ($data[0].Table.Columns | ? { $_.DataType -eq [datetime] } | select -First 1).ColumnName

# create a temporary SSAS mining model for this table and get an OutlierProbability per row
$dmTable = Set-Outliers -dataTable $data[0].Table -connectionString $ssasconn

write-host "Evaluating OutlierProbability threshold..."
# use the 6th highest probability as the highlight threshold
$x = $dmTable | sort-object OutlierProbability -descending | select -index 5 | select OutlierProbability

write-host "Highlighting..."
# keep only today's rows and flag the ones above the threshold for highlighting
$dmTable = $dmTable | ? { $_.$firstDateCol -ge [System.DateTime]::Now.Date } |
    select *, @{name="SysRowCssClass"; expression={ if ($_.OutlierProbability -ge $x.OutlierProbability) { "highlight" } else { "" } }} |
    sort-object $firstDateCol -descending

$alertCount = ($dmTable | ? { $_.SysRowCssClass -like "highlight" }).Count
write-host "Today outliers: $alertCount"

# send mail only if there are outliers today
if ($alertCount -gt 0)
{
    $html = Out-Email -subject "Outliers in $viewName : $alertCount (today)" -inputObject $dmTable -to $mailto -from $mailfrom -smtpServer $mailsmtp
}
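For context, the body above runs once per view; the outer loop is the same sys.views pattern used by the mail reports sample further down in this post, just pointed at the mon schema:

$reportViewsSchema = "mon"

Invoke-OLEDBCommand -connectionString $connStr `
    -sql "select Name from sys.views where SCHEMA_NAME(schema_id) = '$reportViewsSchema'" | % {

    # ... the "tiny" main script above goes here, once per view ...
}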

PS - and don't tell my boss that PowerShell & data mining are doing our work now! ;) We'll figure out something else to do eventually…

Take care,

 

Rui


New samples at @devscope #msbi codeplex - I: Quick Dynamic #PowerShell Mail DataTable Reports

Just uploaded a few more BI samples to our codeplex MSBI samples project. This time, some purely data driven PowerShell cmdlets & samples for sending data table reports and… :) detecting outliers (dynamically & at runtime) using SSAS data mining temporary models (very similar to what the Excel Data Mining Add-in does, in fact).

Usage for these scripts is mainly agile monitoring & alerting needs. Be advised that the outliers report is a work in progress though; really just a concept sample.

btw, PowerShell scripts/samples credits go to Rui Romano, impressive as always :)

I - Quick, agile, data driven, dynamic mail reports from your queries

[screenshot: a sample mail DataTable report]

Just produce a DataTable anywhere with PowerShell (the useful Invoke-OLEDBCommand cmdlet is also included in the sample).

Then send it quickly by mail; no UI binding needed, no reports, your data is the report. Note that the table formatting is tuned a little beyond the PowerShell default HTML tables: you can highlight rows, and there's a more data friendly handling of numeric values, among other things.

Import-Module .\DevScope.BI.Samples.PowerShell.psm1 -Force

$data = Invoke-OLEDBCommand -connectionString $connStr -sql "select * from $viewName"

$html = Out-Email -subject "Alerts in $viewName : $alertCount" -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp
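The row highlighting works through a convention column: if the input has a SysRowCssClass column set to "highlight", the row gets marked in the generated HTML (the outliers sample above uses the same convention). A minimal sketch, where the Alert condition is just an illustrative example:

# flag the rows to highlight before mailing
$data = $data | select *, @{ name = "SysRowCssClass"; expression = { if ($_.Alert -eq 1) { "highlight" } else { "" } } }

$html = Out-Email -subject "Alerts in $viewName" -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp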

The fun starts when you combine this with a complete agile/data driven mindset. For example: mailing every view in a predefined schema as long as it has at least one row with an Alert column flagged to 1. That's precisely what the provided sample script does (and our preferred use):

$reportViewsSchema = "mail"

Invoke-OLEDBCommand -connectionString $connStr `
    -sql "select Name from sys.views where SCHEMA_NAME(schema_id) = '$reportViewsSchema'" | % {

    # for each view found
    $viewName = $reportViewsSchema + "." + $_.Name

    $data = Invoke-OLEDBCommand -connectionString $connStr -sql "select * from $viewName"

    $alertCount = ($data | ? { $_.Alert -eq 1 }).Count
    write-host "Alerts found: $alertCount"

    # send mail only if an Alert column is present and flagged to 1 in at least one row
    if ($alertCount -gt 0)
    {
        $html = Out-Email -subject "Alerts in $viewName : $alertCount" -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp
    }

}

 

Mix this with SQL DMVs, SSIS logs, SSAS logs & whatever info you gather for monitoring/operating your BI (or other) solutions, and the result is pure fun & almost effortless insights! :)

1) prepare your data

[screenshot: a view prepared for mail reporting]

2) configure your scripts (edit ExecConfig.ps1 for the settings) and run them:

[screenshot: running the script]

or even better :)

[screenshot]
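For reference, the settings are just a handful of variables; here's a minimal sketch of what an ExecConfig.ps1 might contain (the values are placeholders, only the variable names come from the scripts in these posts):

# source database connection (OLEDB) and SSAS instance used for the temporary mining models
$connStr  = "Provider=SQLNCLI11;Data Source=.;Initial Catalog=MyDW;Integrated Security=SSPI"
$ssasconn = "Data Source=localhost"

# schema holding the views to iterate
$reportViewsSchema = "mail"

# mail settings used by Out-Email
$mailto   = "bi.team@example.com"
$mailfrom = "alerts@example.com"
$mailsmtp = "smtp.example.com"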

 

That's BI for the BI team, as we don't actually have a budget for that, really. :) We need the insights though!

Just download the scripts at devscopebisamples.codeplex.com. It would be great to improve them and get some feedback!

[screenshot]

Next post: Dynamic Outlier Detection with PowerShell and SSAS Mining models!

 

Rui