New samples at @devscope #msbi codeplex- II–#PowerShell Dynamic Outliers with #SSAS mining models

Finally… ! Updated our codeproject project with an improved sample for this. :) btw-See the first post on this thread for other Powershell MSBI data driven/dynamic samples & utilities

II-#PowerShell Dynamic Outliers Monitoring with #SSAS mining models

The idea for this sample goes right back to the initial releases of the Excel Data Mining Addin. Now that’s a cool Excel Addin, in fact it’s way more valuable today and it’s a little sad that data mining is even a little bit forgotten in the MSBI stack :( (yes, it is…). The issue with SSAS Data Mining, particularly this addin I think… it  really was ahead of its time! because now, in our always changing data driven world, it’s the time for such agile & self service data mining/data intelligence scenarios.

Anyway, that Addin showed that is was possible to do 100% dynamic outlier detection based only on the given data/schema. Models would be generated in runtime for each dataset.

And so, finally, we had the time to take this concept and turn to PowerShell to make a lot of other data driven scenarios  scriptable & able to run completely automated (ex: scheduled/intelligent alerts/trends/anomalies monitoring).

So, what if we could use this to monitor several data streams we have available in our solutions?  We already have & use several very agile & data driven kpis/reports/datasets… but that still usually involves fixed alert thresholds, what if we would have something looking at these data streams every day and tell us, “hey, here’s something that’s not usual, are you aware?”? (ie something that would do what I usually do, so that I can go on doing several other things… like blogging and reading :) ).

And that’s precisely what the sample scripts does:

  • iterate all the views in a predefined schema (love this pattern for dba/bi monitoring)
  • execute & get the datasets, we used SQL server but you can use any datatable
  • pass the datatables to the data mining module (it will, like the excel data mining addin, create a temporary model dynamically for each table, do the outlier stuff, some nasty hacks of our own, get an outlier probability value for each row
  • then filtering (in this scenario) for today outliers, using a convention styled approach – choosing the first date column available (btw-a better approach would be save the last execution time for each view and use that date value as a date filter)
  • if there are outliers, send the report by mail (using the cmdlets I talked about in my last post)
  • if not, keep quiet!

And anytime I want to monitor another dataset, just create another view in that schema, the script will adapt accordingly… hence the data driven mantra applies:) (data+services mindset to explore in a future post…)

Like this:

A schema (mon) where I can put the data streams I want to monitor for outliers/anomalies.

image

As a sample I use a dataset of SQL Analysis Services processing logs/elapsed times, by object (measure group, partition, dimension,…).

Ensure that I have a date column to allow for proper date/time filtering-after the outlier detection-, (1 month old outliers aren’t that interesting anyway).

 

image

Now imagine that I force something strange in that data stream, changing the started date as so to increase the processing time for that SSAS object (creating an anomaly)…

image

And I run the provided outlier sample script…. :)

image

and…amazingly (remembering that I did not configure any kind of warning thresholds…), my two “simulated” outliers are now highlighted in my mailbox

image

How cool is that? :) The script doesn’t know have to know the data in anyway, anything goes, that’s the beauty of it.

(defending myself :) from possible comments by MSBI team regarding this screenshot… I can only say that I promise to reassess my mobile stack after the “new” release of the MSBI stack for mobile… :)  )

Be advised though that the script is just a sample, there are a –few- known “issues” at this moment:

  • Do not expect perfect results, we didn’t spend much time with SSAS mining model tuning for the moment, and we are limited to the accuracy we can expect to get from analysis services mining models (would be great to get some data mining experts help/feedback!)
  • Some column types can cause the mining model/structure not to be created
  • We had to use a hack… and introduce a outlier probability of 2 when we were getting an outlier probability of 0… (cof, I know, will have to dig deeper as for the reason for this… sure we messed up anywhere)
  • Still missing is the column highlight where the outlier is most probable, that’s possible, the data mining addin does this but we hadn’t time for that yet, sorry
  • several others issues will appear I’m sure…( I did say that’s a sample right? :) )

That’s it for today, browse/get the sample at http://devscopebisamples.codeplex.com !

Note that there’s a bundle download available with Excel Data Insights AddIn, Mail DataTable Reports & this outliers sample.

The “tiny” main script:

# for each view found
$viewName = $reportViewsSchema+”.”+$_.Name

$data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

$firstDateCol= $data[0].Table.Columns | ? {$_.DataType -eq [datetime]} | select -index 1 | select $_.ColumnName

$dmTable = Set-Outliers -dataTable $data[0].Table -connectionString $ssasconn

write-host “Evaluating OutlierProbability threshold…”
$x=$dmTable | sort-object OutlierProbability -descending | select -index 5| select OutlierProbability

write-host “Highlighting…”
$dmTable=$dmTable | ? {($_.$firstDateCol -ge [System.DateTime]::Now.Date )} | select *,@{name=”SysRowCssClass”;expression={if ($_.OutlierProbability -ge $x.OutlierProbability ){ “highlight” } else { “” }}} | sort-object $firstDateCol.ColumnName -descending

$alertCount=($dmTable | ? {$_.SysRowCssClass -like “highlight”}).COunt
write-host “Today outliers: $alertCount”

# send mail only if alert column is present and its 1
if ($alertCount -gt -1)
{
$html = Out-Email -subject “Outliers in $viewName : $alertCount (today)” -inputObject $dmTable -to $mailto -from $mailfrom -smtpServer $mailsmtp
}

ps-and don’t tell my boss that PowerShell & Data Mining is doing our work now! ;) We’ll figure out something else to do eventually….

Take care,

 

Rui

About these ads

About Rui Quintino
Developer & Software Architect @ DevScope

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 279 other followers

%d bloggers like this: