New samples at @devscope #msbi codeplex- II–#PowerShell Dynamic Outliers with #SSAS mining models

Finally… ! Updated our codeproject project with an improved sample for this. :) btw-See the first post on this thread for other Powershell MSBI data driven/dynamic samples & utilities

II-#PowerShell Dynamic Outliers Monitoring with #SSAS mining models

The idea for this sample goes right back to the initial releases of the Excel Data Mining Addin. Now that’s a cool Excel Addin, in fact it’s way more valuable today and it’s a little sad that data mining is even a little bit forgotten in the MSBI stack :( (yes, it is…). The issue with SSAS Data Mining, particularly this addin I think… it  really was ahead of its time! because now, in our always changing data driven world, it’s the time for such agile & self service data mining/data intelligence scenarios.

Anyway, that Addin showed that is was possible to do 100% dynamic outlier detection based only on the given data/schema. Models would be generated in runtime for each dataset.

And so, finally, we had the time to take this concept and turn to PowerShell to make a lot of other data driven scenarios  scriptable & able to run completely automated (ex: scheduled/intelligent alerts/trends/anomalies monitoring).

So, what if we could use this to monitor several data streams we have available in our solutions?  We already have & use several very agile & data driven kpis/reports/datasets… but that still usually involves fixed alert thresholds, what if we would have something looking at these data streams every day and tell us, “hey, here’s something that’s not usual, are you aware?”? (ie something that would do what I usually do, so that I can go on doing several other things… like blogging and reading :) ).

And that’s precisely what the sample scripts does:

  • iterate all the views in a predefined schema (love this pattern for dba/bi monitoring)
  • execute & get the datasets, we used SQL server but you can use any datatable
  • pass the datatables to the data mining module (it will, like the excel data mining addin, create a temporary model dynamically for each table, do the outlier stuff, some nasty hacks of our own, get an outlier probability value for each row
  • then filtering (in this scenario) for today outliers, using a convention styled approach – choosing the first date column available (btw-a better approach would be save the last execution time for each view and use that date value as a date filter)
  • if there are outliers, send the report by mail (using the cmdlets I talked about in my last post)
  • if not, keep quiet!

And anytime I want to monitor another dataset, just create another view in that schema, the script will adapt accordingly… hence the data driven mantra applies:) (data+services mindset to explore in a future post…)

Like this:

A schema (mon) where I can put the data streams I want to monitor for outliers/anomalies.

image

As a sample I use a dataset of SQL Analysis Services processing logs/elapsed times, by object (measure group, partition, dimension,…).

Ensure that I have a date column to allow for proper date/time filtering-after the outlier detection-, (1 month old outliers aren’t that interesting anyway).

 

image

Now imagine that I force something strange in that data stream, changing the started date as so to increase the processing time for that SSAS object (creating an anomaly)…

image

And I run the provided outlier sample script…. :)

image

and…amazingly (remembering that I did not configure any kind of warning thresholds…), my two “simulated” outliers are now highlighted in my mailbox

image

How cool is that? :) The script doesn’t know have to know the data in anyway, anything goes, that’s the beauty of it.

(defending myself :) from possible comments by MSBI team regarding this screenshot… I can only say that I promise to reassess my mobile stack after the “new” release of the MSBI stack for mobile… :)  )

Be advised though that the script is just a sample, there are a –few- known “issues” at this moment:

  • Do not expect perfect results, we didn’t spend much time with SSAS mining model tuning for the moment, and we are limited to the accuracy we can expect to get from analysis services mining models (would be great to get some data mining experts help/feedback!)
  • Some column types can cause the mining model/structure not to be created
  • We had to use a hack… and introduce a outlier probability of 2 when we were getting an outlier probability of 0… (cof, I know, will have to dig deeper as for the reason for this… sure we messed up anywhere)
  • Still missing is the column highlight where the outlier is most probable, that’s possible, the data mining addin does this but we hadn’t time for that yet, sorry
  • several others issues will appear I’m sure…( I did say that’s a sample right? :) )

That’s it for today, browse/get the sample at http://devscopebisamples.codeplex.com !

Note that there’s a bundle download available with Excel Data Insights AddIn, Mail DataTable Reports & this outliers sample.

The “tiny” main script:

# for each view found
$viewName = $reportViewsSchema+”.”+$_.Name

$data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

$firstDateCol= $data[0].Table.Columns | ? {$_.DataType -eq [datetime]} | select -index 1 | select $_.ColumnName

$dmTable = Set-Outliers -dataTable $data[0].Table -connectionString $ssasconn

write-host “Evaluating OutlierProbability threshold…”
$x=$dmTable | sort-object OutlierProbability -descending | select -index 5| select OutlierProbability

write-host “Highlighting…”
$dmTable=$dmTable | ? {($_.$firstDateCol -ge [System.DateTime]::Now.Date )} | select *,@{name=”SysRowCssClass”;expression={if ($_.OutlierProbability -ge $x.OutlierProbability ){ “highlight” } else { “” }}} | sort-object $firstDateCol.ColumnName -descending

$alertCount=($dmTable | ? {$_.SysRowCssClass -like “highlight”}).COunt
write-host “Today outliers: $alertCount”

# send mail only if alert column is present and its 1
if ($alertCount -gt -1)
{
$html = Out-Email -subject “Outliers in $viewName : $alertCount (today)” -inputObject $dmTable -to $mailto -from $mailfrom -smtpServer $mailsmtp
}

ps-and don’t tell my boss that PowerShell & Data Mining is doing our work now! ;) We’ll figure out something else to do eventually….

Take care,

 

Rui

New samples at @devscope #msbi codeplex- I–Quick Dynamic #PowerShell Mail DataTable Reports

Just uploaded a few more BI samples at our codeplex MSBI samples project. This time some purely data driven PowerShell cmdlets & samples for sending data table reports and… :) detecting outliers (dynamically & at runtime) using SSAS data mining temporary models (very similar to what Excel Data Mining addin does in fact).

Usage for this scripts  is mainly for agile monitoring & alerting needs. Be advised that the outliers reports is work in progress though – really just a concept sample.

btw-Powershell scripts/samples credits go to Rui Romano, impressive as always :)

I-Quick, Agile, Data driven, dynamic, mail reports from your queries

image

Just produce a DataTable anywhere with PowerShell (useful Invoke-OLEDBCommand cmdlet also in the sample)

Then send it quickly by mail, no UI binding needed, no reports, your data is the report. Note that table formatting is a little tuned beyond the PowerShell default html tables. You can highlight, there’s also a more data friendly handling of numeric values and some others.

Import-Module .\DevScope.BI.Samples.PowerShell.psm1 –Force

$data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

    $html = Out-Email -subject “Alerts in $viewName : $alertCount” -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp

The fun starts when you combine this with a complete agile/data driven mindset. For example, mailing every view in predefined schema as long as it as at least a row with a column alert flagged to 1. That’s precisely what the sample script provided does (and our preferred use):

$reportViewsSchema=”mail”

Invoke-OLEDBCommand -connectionString $connStr `
    -sql “select Name from sys.views where SCHEMA_NAME(schema_id) = ‘$reportViewsSchema’” |%  {
   
   # for each view found
    $viewName = $reportViewsSchema+”.”+$_.Name
   
    $data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

   
    $alertCount=($data | ? {$_.Alert -eq 1}).COunt
    write-host “Alerts found: $alertCount”

    # send mail only if alert column is present and its 1
    if ($alertCount -gt 0)
        {
        $html = Out-Email -subject “Alerts in $viewName : $alertCount” -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp
        }

}

 

Mix this with SQL DMVs, SSIS Logs, SSAS Logs & whatever info you gather for monitoring/operations your BI/other solution, result: pure fun & almost effortless insights! :)

1) prepare your data

image

2)configure your scripts, edit the ExecConfig.ps1 for settings, and

image

or even better :)

image

 

That’s BI for the BI team, as we don’t actually have a budget for that really. :) We need the insights though!

Just download the scripts at devscopebisamples.codeplex.com. Would be great to improve and get some feedback!

image

Next post: Dynamic Outlier Detection with PowerShell and SSAS Mining models!

 

Rui

An alternative model for faster Excel Self Service BI–Excel Data Insights Addon– Part I

Finally some minutes to share my thoughts & some light into what I presented this last saturday at SqlSat Portugal . I had a great time doing this session (as expected, public speaking can turn into an addiction really, particularly for shy folks and introverts like myself :)… thrilling experience…  ).

Feedback was great and some people were even a little doubtful :) of what was really going on under the hood… for such a fast/single click to insights experience/self service BI with Excel 2013. :)

The presentation slides are available here, you can browse through them online. They are not very extensive but they should guide through  the main topics:

the why - in the age of data, big data, and vast diversity of key performance improvement indicators –or better, datasets-, the need for more agility, productivity, an instant/single click self service BI with Excel 2013, PowerPivot and PowerView is needed (IMO)

and the “what” -a somewhat “twisted” self service model/concept that loads your data into existing reusable powerpivot/powerview templates, supported by a new Excel addin (concept sample/preview) that for now goes with the name of Data Insights

Have to say that all this is currently at proof of concept/sample level. Although I am very happy, even surprised, with the results, and yet to discover a big showstopper. Being the reason to fully share this soon, to allow for anyone to contribute and we can move the idea further. Some help from the PowerPivot/PowerView product teams would be particularly helpful as I think this kind of models could be near perfect with a few adjustments in powerpivot/excel/powerview core modules. (also could enable a much wider adoption of data usage & analytics with Excel 2013… just saying…)

The concept is very simple to grasp from an end user perspective, just grab your data (database connections, data explorer, whatever), put it in a regular table:

image

and click Explore data in the Data Insights Addon Excel ribbon:

image

Wait a few seconds (currently I’m not totally happy with performance, part due to the sample addon data handling code-that we can tune a little bit-, part for the excel data model/powerview loading/refreshing times-that will be hard for us to tune if not impossible ;) )

image

Either way, a click and 10 secs to 40 secs latter (depending on the data), we get a full PowerView/PowerPivot model to explore, prefilled with dashboards, commonly used dimensions & visualization patterns. We can easily change what measure to focus on, reusing the same visual & interactive patterns.

image

 

(as a side bonus you don’t even need PowerPivot enabled for this, although PoweView is a must honestly – note to the product team: startup times could benefit a little tuning!)

The original table is added to the model, so you can use all your fields and further customize the model & dashboards, (ex: search). You’ll have both conformed/prebuilt dimensions like date, time, categories, pictures,  together with your original data.

image

image

This was for sales data, the same single click experience can be used to get the same data insights experience for any other dataset as longs as it shares some common concepts like date/measures/categories/images.

Take tweet data for example, a table with tweets in a few msbi hashtags, with users and pictures:

image 

Another click on Explore Data, and:

image

image

Finally can get some insights when’s the best weekdays and time of day to tweet and blog :) : (although the dataset is very small to jump to clear conclusions…)image

image

image

 

This is self service that makes sense to me. Will it target all self service scenarios? Of course not! Are there shortcomings? Yes, they are, more on that later. But for a lot of end user scenarios, simple and small datasets, this would be just perfect! Finally, not more date tables, time dimensions, wasted effort doing the same models over and over again.

All this works from a template based model, so I can edit once directly in Excel, and reuse many times with all kinds of datasets:

image

So here it is, I think there’s something to explore here… what do you think? Trimming the sample & code, few tweaks missing, to share soon.

Final thanks to DevScope BI Team, Rui Romano for the excel addin draft code, Jorge Esteves for the dynamic PowerPivot model, Sónia Gomes for the AddIn icon and help with layout, and Jason Thomas, even if he’s not aware :), for his  work on dynamic measure & group selection. (we were very confortable with this on regular SSAS cubes, the posts by Jason were helpful bringing the concept to powerpivot/dax) . The work by Tom Gleeson at his blog (Gobán Saor) is also of particular relevance to this concept. This one for example :) will probably be very helpful taking this concept further…

Take care,

Rui

Join me on #SqlSatPortugal next saturday for #MSBI #powerpivot #powerview … “with a twist”

Hope to see you all this Saturday at the #188 SqlSat Portugal event. It will be an amazing event! Amazing speakers, amazing schedule & content.

I’ll be talking about my passion for data (and lack of time! :) ) right at 9am with a session titled  “Faster than the speed of light”… with #MSBI”

clip_image001

What will we be focusing on this session? Data, big data, small data, lots of data, *real* self-service bi. Trying to get data insights quicker with the help of Excel PowerView, PowerPivot, Data Explorer… and a few other addons :)  Will it work…? really hope so!

 

And I would bet that not every SqlSat events gets to show a new Excel Addin ;) Can’t say more, you’ll have to attend the session. (and I’ll have to get this damn thing working…)

What are you waiting for?

image 

Take care!

Rui

SmartPivot new build, release notes

Just finished publishing, download here as usual. Not a major release for now, more to come soon… not until #sqlsaturday 188 here in Portugal I’m afraid! :)

As for changes in this release, 2.33:

image

No need to request activation keys for free features, just install & use. (free activation key will still be needed only for testing full features, can’t change that, sorry!)

Bug fix when building the search cache for large hierarchies, now works with large hierarchies for levels up to 100.000 members. previously it would skip the entire hierarchy, not good (sorry about that…).

You can now connect directly using the search option, you will be asked for a connection using quick connect and the search pane will open instantly after that. fast!

Filter by list now works with specific member names including characters like ‘.

Search & parent child hierarchies bug fixes and other fast search performance improvements.

And finally got that tooltips filled in the ribbon! :)

image

(as for the future, following pivotviewer, cubes/powerpivot fast search I wonder if we can help with global saved sets/clusters… been getting interesting feedback on this, 2013 named sets are almost useless I’m afraid to say, let’s see…)

So now, sqlsat mode=on, clickpivot=on… ;)

Take care!

Rui

SmartPivot new build, quick connect with recent cubes list & full text search adjustments

Just finishing up uploading a new build of SmartPivot. (this is a “minor” release, more to come soon – as we are processing feedback and working out on some usability improvements on the full text/instant search over ssas olap/powerpivot feature)

Meantime… :)

-you now have a recent cubes list when using QuickConnect (kudos to our team members José Barbosa & Rui Romano for this one, handy indeed! :) )

image

image

-in this release we opted for automatically create the fast text search cache automatically whenever it’s not still available (previously we were doing this only for local cubes/ppivot models). We got so used to the feature that waiting for a refresh click already seemed too much. :)

image

Take care!

Rui

ps-download page is now at http://www.devscope.net/products/SmartPivot

#SmartPivot new beta release with #SSAS and #PowerPivot instant text search, now supports #Excel2013

We’ve just finished publishing a new beta release for SmartPivot, our addin for Excel OLAP cubes (and now also PowerPivot/Excel 2013 Data Models).

You can download it here. Give it a try, just remember it’s a beta release, use it for testing purposes only! :)  It would be great to get feedback on the new features, particularly the “fast full text search” over Cubes/PowerPivot.

Now, onto the new stuff!

A new instant Search *beta* feature, for both OLAP cubes and PowerPivot Models

This is probably the main new feature for this release. We hope that this can take Excel olap/ppivot pivot tables usage to a whole new usability level. But tell us what you think.

We are still working on some issues with the feature, but we hope it’s stable enough on the beta , allowing for “crowdsourcing” the final adjustments. :)

It’s rather self –explanatory, just select a pivot and click search…

image

On the first run for a cube/ppivot, wait a few moments for SmartPivot to read/cache all dimension data (at the moment it will do this automatically only for PowerPivot models or local SSAS connections).

And you’ll be able to search both data (member) and metadata (measures/hierarchies) with instant search results:

image

image

This makes exploring the data much easier for end users, as they usually know the data very well (but not always the cube concepts of measures, dimensions, attributes and others).  They can start their own model discovery.

image

Can we hope in a near future that a feature like this would  be pervasive in every frontend and handled internally by the ssas/powerpivot engines? ;) (good note to post on the ssas team equest for feedback survey) But until that’s available we hope this SmartPivot feature can help a little bit. :)

 

And it now supports Excel 2007/2010/2013 and PowerPivot

 

Excel 2013

image

Excel 2010

image

PowerPivot 2013 (my kindle book stats model)

image

PowerPivot 2010 (the “Understanding the US debt” Book excel sample )companion)

image

 

Dev Tools

Not finished yet, but you can already explore the SSAS rowsets available when working with Cubes/Ppivot models, useful for troubleshooting and advanced models discovery

image

image

image

Other small improvements in this release

  • Cell Value from Table Reports -You can now start a table report from a cell value (see a previous post here), like you would do with the Detail By option, but it allows you to build a table report for that specific context.
  • Duplicate pivot option– just an handy tiny feature  to save a few clicks
  • Auto update notification

Now, we just need you feedback! Download it and tell us what you think.

Regards,

Rui

Follow

Get every new post delivered to your Inbox.

Join 161 other followers