#SqlPort upcoming CTP hands-on workshop – Cloud #MachineLearning & #DataMining using #AzureML 29 Nov Lisbon

It’s already this Saturday, the 29th, that I’ll be hosting the first AzureML hands-on workshop in Portugal. It has been a pleasant surprise to see that it’s filling up rather quickly, thanks largely to the restless SqlPort community team (particularly Niko & Paulo’s amazing work, allowing me to fully focus on the workshop itself… more pressure for sure… no excuses!).

I’m rather curious to see how it goes, as there will surely be an interesting mix of profiles attending this workshop. I hope that will help make this a very interactive, personally demanding but enjoyable day!

The event is organized by SQLPort, and all the resulting revenue will go toward community support & upcoming events.

Check out the workshop agenda below or on the event page here. The event will take place at Microsoft Portugal.

Also disclosing that every participant will be granted access to my very own personal notes & best highlights on some of my favorite data books… (got to love the Amazon/Kindle platform), all of which I’m reviewing right now ;).

  • Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
  • The Signal and the Noise: Why So Many Predictions Fail-but Some Don't
  • Naked Statistics: Stripping the Dread from the Data

See you there!

Rui

CTP – Community Technology Preview 2014 Edition – Cloud Machine Learning & Data Mining using AzureML

SQLPort

Saturday, 29 November 2014, from 09:00 to 17:30 (WET)

Lisbon, Portugal


Abstract

A full one-day hands-on workshop, including an introductory course on the Machine Learning process, use cases, models & algorithms, plus how to model, test & deploy Machine Learning solutions with the new Microsoft AzureML cloud service.


Objectives

  • Understand Machine Learning & Data Science Fundamentals
  • Model & Deploy Machine Learning Solutions with AzureML, ML Studio & AzureML Web Services


Attendee prerequisites/profile

  • Data Analysts & Enthusiasts, Business Intelligence Professionals, Data Scientists & Machine Learning Practitioners


Contents

  • Introduction to Data Science and Machine Learning
  • Machine Learning Use Cases, Models & Workflow
  • Introduction to Azure Machine Learning & ML Studio
    • Data Cleansing & Transformation
    • Data Analysis & Visualization
    • Machine Learning Models: Classification, Regression, Clustering & Text Mining
    • Model Tuning, Scoring & Evaluation
    • Using R in Azure ML
    • Publishing Models as Web Services
    • Using Excel with Azure ML
  • Books & Resources for AzureML, Data Science & Machine Learning

New samples at @devscope #msbi codeplex – II – #PowerShell Dynamic Outliers with #SSAS mining models

Finally…! Updated our codeplex project with an improved sample for this. :) btw, see the first post in this thread for other PowerShell MSBI data-driven/dynamic samples & utilities.

II – #PowerShell Dynamic Outliers Monitoring with #SSAS mining models

The idea for this sample goes right back to the initial releases of the Excel Data Mining Add-in. Now that’s a cool Excel add-in; in fact, it’s way more valuable today, and it’s a little sad that data mining is even a little bit forgotten in the MSBI stack :( (yes, it is…). The issue with SSAS Data Mining, and particularly this add-in, I think, is that it really was ahead of its time! Because now, in our ever-changing data-driven world, it’s the time for such agile & self-service data mining/data intelligence scenarios.

Anyway, that add-in showed that it was possible to do 100% dynamic outlier detection based only on the given data/schema. Models would be generated at runtime for each dataset.

And so, finally, we had the time to take this concept and turn to PowerShell to make a lot of other data-driven scenarios scriptable & able to run completely automated (e.g., scheduled/intelligent alerts/trends/anomalies monitoring).

So, what if we could use this to monitor the several data streams we have available in our solutions? We already have & use several very agile & data-driven KPIs/reports/datasets, but those still usually involve fixed alert thresholds. What if we had something looking at these data streams every day, telling us, “hey, here’s something that’s not usual, are you aware?” (i.e., something that would do what I usually do, so that I can go on doing several other things… like blogging and reading :) ).

And that’s precisely what the sample script does:

  • iterate over all the views in a predefined schema (I love this pattern for DBA/BI monitoring)
  • execute them & get the datasets; we used SQL Server, but you can use any DataTable
  • pass the DataTables to the data mining module (like the Excel Data Mining Add-in, it will create a temporary model dynamically for each table, do the outlier stuff, apply some nasty hacks of our own, and get an outlier probability value for each row)
  • then filter (in this scenario) for today’s outliers, using a convention-styled approach: choosing the first date column available (btw, a better approach would be to save the last execution time for each view and use that value as the date filter)
  • if there are outliers, send the report by mail (using the cmdlets I talked about in my last post)
  • if not, keep quiet!

And anytime I want to monitor another dataset, I just create another view in that schema and the script will adapt accordingly… hence the data-driven mantra applies :) (a data+services mindset to explore in a future post…)
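For illustration, here’s a minimal sketch of what that outer “for each view” loop might look like, assuming the same Invoke-OLEDBCommand cmdlet the sample already uses and a plain INFORMATION_SCHEMA query; the published script may enumerate the views differently:

# Sketch (assumption): enumerate every view in the monitoring schema and
# run the per-view outlier logic (the main script below) once for each one.
$reportViewsSchema = "mon"
$views = Invoke-OLEDBCommand -connectionString $connStr `
    -sql "select TABLE_NAME as Name from INFORMATION_SCHEMA.VIEWS where TABLE_SCHEMA = '$reportViewsSchema'"

$views | ForEach-Object {
    # $_.Name is the view name the per-view script expects
    # ... per-view outlier detection & mail logic goes here ...
}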

The setup looks like this:

A schema (mon) where I can put the data streams I want to monitor for outliers/anomalies.

[Screenshot: the mon schema and its monitored views]

As a sample, I use a dataset of SQL Analysis Services processing logs/elapsed times, by object (measure group, partition, dimension, …).

Ensure that I have a date column to allow for proper date/time filtering after the outlier detection (1-month-old outliers aren’t that interesting anyway).


[Screenshot: the sample dataset of SSAS processing logs with its date column]

Now imagine that I force something strange into that data stream, changing the start date so as to increase the processing time for that SSAS object (creating an anomaly)…

[Screenshot: the altered rows simulating an anomaly in the processing times]

And I run the provided outlier sample script… :)

[Screenshot: the outlier sample script running in PowerShell]

and… amazingly (remembering that I did not configure any kind of warning thresholds…), my two “simulated” outliers are now highlighted in my mailbox:

[Screenshot: the alert email with the two simulated outliers highlighted]

How cool is that? :) The script doesn’t have to know the data in any way, anything goes; that’s the beauty of it.

(Defending myself :) from possible comments by the MSBI team regarding this screenshot… I can only say that I promise to reassess my mobile stack after the “new” release of the MSBI stack for mobile… :) )

Be advised though that the script is just a sample; there are a few known “issues” at this moment:

  • Do not expect perfect results; we didn’t spend much time on SSAS mining model tuning for the moment, and we are limited to the accuracy we can expect from Analysis Services mining models (it would be great to get some data mining experts’ help/feedback!)
  • Some column types can cause the mining model/structure not to be created
  • We had to use a hack… and introduce an outlier probability of 2 when we were getting an outlier probability of 0… (cof, I know, we’ll have to dig deeper into the reason for this… surely we messed up somewhere)
  • Still missing is highlighting the column where the outlier is most probable; that’s possible (the Data Mining Add-in does this), but we haven’t had time for that yet, sorry
  • several other issues will appear, I’m sure… (I did say it’s a sample, right? :) )

That’s it for today; browse/get the sample at http://devscopebisamples.codeplex.com!

Note that there’s a bundle download available with Excel Data Insights AddIn, Mail DataTable Reports & this outliers sample.

The “tiny” main script:

# Runs once for each view found in the monitoring schema
$viewName = $reportViewsSchema + "." + $_.Name

$data = Invoke-OLEDBCommand -connectionString $connStr -sql "select * from $viewName"

# convention: take the first datetime column for date filtering
$firstDateCol = ($data[0].Table.Columns | ? {$_.DataType -eq [datetime]} | select -First 1).ColumnName

# build a temporary mining model and score every row with an OutlierProbability
$dmTable = Set-Outliers -dataTable $data[0].Table -connectionString $ssasconn

write-host "Evaluating OutlierProbability threshold..."
# use the 6th highest probability as the highlight threshold (so only the top rows get flagged)
$x = $dmTable | sort-object OutlierProbability -descending | select -index 5 | select OutlierProbability

write-host "Highlighting..."
# keep only today's rows and flag those at or above the threshold
$dmTable = $dmTable | ? { $_.$firstDateCol -ge [System.DateTime]::Now.Date } |
    select *, @{name = "SysRowCssClass"; expression = { if ($_.OutlierProbability -ge $x.OutlierProbability) { "highlight" } else { "" } }} |
    sort-object $firstDateCol -descending

$alertCount = ($dmTable | ? {$_.SysRowCssClass -like "highlight"}).Count
write-host "Today's outliers: $alertCount"

# send the report by mail only if there are outliers today; otherwise keep quiet
if ($alertCount -gt 0)
{
    $html = Out-Email -subject "Outliers in $viewName : $alertCount (today)" -inputObject $dmTable -to $mailto -from $mailfrom -smtpServer $mailsmtp
}
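For completeness, the script above expects a handful of variables to be defined up front; here’s a minimal placeholder setup (the connection strings and addresses below are assumptions for illustration, not the sample’s actual values):

# Placeholder setup (assumption); substitute your own servers & addresses
$reportViewsSchema = "mon"                             # schema holding the monitored views
$connStr  = "Provider=SQLNCLI11;Server=.;Database=MyDb;Integrated Security=SSPI"  # SQL Server source
$ssasconn = "Data Source=.;Initial Catalog=MyTempDM"   # SSAS instance for the temporary mining models
$mailto   = "me@example.com"
$mailfrom = "alerts@example.com"
$mailsmtp = "smtp.example.com"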

PS: and don’t tell my boss that PowerShell & Data Mining are doing our work now! ;) We’ll figure out something else to do eventually…

Take care,


Rui