AzureML Web Service Scoring with Excel and Power Query

While drafting some samples & labs for this Saturday's SQLPort AzureML workshop, I decided to dig a little and try using a regular AzureML scoring web service (request/response) directly in Excel, using Power Query. :)

So… can it be done? Sure! Advisable? Far, far from it… but for a quick sample/test scenario it might be handy. :)

To say it again: what I'll show here is far from the recommended way to use AzureML web services in Excel. It will be slow, probably insecure if you need to distribute your file, and I'm sure there will soon be better ways to do it with the batch web service mode (without the need to go through the Azure Marketplace).

How it works

To the point: you can use the Power Query Web.Contents function to invoke the AzureML web service API, as long as you are careful to: 1) use POST and not GET, 2) set the Authorization header expected by AzureML, 3) set the Content-Type to JSON, and 4) build the JSON request as shown in the API help page.

Power Query handles the JSON response automatically. We encapsulate the web request in a function, and can then use it to build our scored table, passing the score input column values and getting the score in return.
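As a minimal sketch of that idea (the endpoint URI, API key and hard-coded feature values below are placeholders, not real ones; the full, reusable function is further down in the post), the core of the call looks something like this:

let
    // placeholders: replace with your own AzureML request/response endpoint and API key
    serviceUri = "*yourserviceuri*",
    serviceKey = "*yourservicekey*",
    // JSON request body, built as shown in the AzureML API help page (values hard-coded here just for illustration)
    PostContents = "{""Id"":""score00001"",""Instance"":{""FeatureVector"":{""sepal_length"":""5.1"",""sepal_width"":""3.5"",""petal_length"":""1.4"",""petal_width"":""0.2""},""GlobalParameters"":{}}}",
    // POST (not GET), with the Authorization header and JSON content type AzureML expects
    Response = Web.Contents(serviceUri,
        [Content = Text.ToBinary(PostContents),
         Headers = [Authorization = "Bearer " & serviceKey,
                    #"Content-Type" = "application/json; charset=utf-8"]]),
    // Power Query parses the JSON reply for us
    Scored = Json.Document(Response)
in
    Scored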

Step by Step: From AzureML to Excel Power Query Scoring

You'll need an AzureML scoring web service. I used a simple two-class logistic regression model to classify the well-known Fisher Iris dataset (with only two classes, as the original has three):

image

The AzureML experiment & web service experiment: it can't get much simpler than this.

Training Experiment

image

Web Service Experiment

image

Publishing the trained model as a Web Service

AzureML-Iris Classifier Web Service API Help page

image

This JSON request format will be needed in Power Query, as will the service API key:

image

In Power Query you’ll then create a function like this (text version below):

image

image

image

Invoking the Power Query scoring function

Then you just import the original table into Power Query and add a custom column that invokes the new scoring function (a minimal sketch of the whole step follows below):

image

Note: to test this, it is advisable to filter down to the top 1 or 2 rows first; it is not exactly lightning fast… :)

image
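If the imported query were called, say, Iris (a hypothetical query name, just for illustration), the filter and custom column steps would look roughly like this:

let
    Source = Iris,
    // while testing, keep only the first couple of rows: each row triggers a (slow) web service call
    Top2 = Table.FirstN(Source, 2),
    // invoke the Score function (defined at the end of this post) for every row, passing the four feature columns
    Scored = Table.AddColumn(Top2, "ScoredLabel", each Score([sepal_length], [sepal_width], [petal_length], [petal_width]))
in
    Scored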

The result should now be visible; a little more Power Query expanding and renaming and that's it!

image

image

To check that I was getting the expected scores, I just used another small tool (more on this later):

image

Power Query results

image

Scoring the whole 100-row Iris dataset

The problem with using the regular Power Query Web.Contents function is that you will start receiving API usage threshold errors from the AzureML API very, very soon. So I used another trick, inspired (again) by this post from Chris Webb (an amazing tip by Alejandro Lopez-Lago), to add some delay between requests.
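The trick, shown in context in the full function at the end of this post, is a small Wait helper that spins for a given number of seconds before invoking the web request (requestOptions below just stands in for the Content/Headers record shown in the full function):

    // spin until the requested number of seconds has elapsed, then invoke the action
    Wait = (seconds as number, action as function) =>
        if List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0, 0, 0, seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0
        then null
        else action(),

    // each scoring call now pauses briefly before hitting the service
    // (serviceUri and requestOptions are the ones from the full Score function below)
    Source = Wait(0.05, () => Web.Contents(serviceUri, requestOptions))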

Just remove the top filter and refresh the query. Expect to wait between 30 and 60 seconds (this will depend on the kind of endpoint you are using; AzureML is changing this whole staging/production web service model right now, from what I'm seeing).

image

Note: I think the latest Power Query update now shows the row count progress. Neat! Perfect for this :)

image

image

image

Some misses there :) I guess the model needs some additional training!

image

Download the sample here (note: you will need your own AzureML endpoint & key; you do know that AzureML has a free tier, right? :) ). The full Power Query score function for this model is below.

Take care!

Rui

let
    Score = (sepal_length as any, sepal_width, petal_length, petal_width) =>
let
    // replace with your own AzureML endpoint (note: the URI ends with /score) and API key
    serviceUri = "*yourserviceuri_note:ends with score*",
    serviceKey = "yourservicekey",

    // JSON request body, built as shown in the AzureML API help page
    // (Text.From makes the concatenation safe when the input columns are numeric)
    PostContents = "
{
  ""Id"": ""score00001"",
  ""Instance"": {
    ""FeatureVector"": {
      ""sepal_length"": """ & Text.From(sepal_length) & """,
      ""sepal_width"": """ & Text.From(sepal_width) & """,
      ""petal_length"": """ & Text.From(petal_length) & """,
      ""petal_width"": """ & Text.From(petal_width) & """
    },
    ""GlobalParameters"": {}
  }
}
",

    // Wait function: spins until the given number of seconds has elapsed, then invokes the action
    // (throttling trick by Alejandro Lopez-Lago, via Chris Webb's blog)
    Wait = (seconds as number, action as function) =>
        if List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0, 0, 0, seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0
        then null
        else action(),

    // wait a fraction of a second, then POST the request with the AzureML Bearer token and JSON content type
    Source = Wait(0.05, () => Web.Contents(serviceUri,
        [Content = Text.ToBinary(PostContents),
         Headers = [Authorization = "Bearer " & serviceKey,
                    #"Content-Type" = "application/json; charset=utf-8"]])),

    // Power Query parses the JSON response; turn it into a single-row table
    #"Imported JSON" = Json.Document(Source),
    #"Table from List" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Transposed Table" = Table.Transpose(#"Table from List")
in
    #"Transposed Table"
in
    Score

#SqlPort upcoming CTP hands-on workshop – Cloud #MachineLearning & #DataMining using #AzureML, 29 Nov, Lisbon

This Saturday the 29th I'll be hosting the first AzureML hands-on workshop in Portugal. It has been a pleasant surprise to see that it's filling up rather quickly, thanks largely to the restless SqlPort community team (particularly Niko & Paulo's amazing work… allowing me to fully focus on the workshop itself… more pressure for sure… no excuses!).

I'm rather curious to see how it goes, as there will surely be an interesting mix of profiles attending this workshop. I hope that will help make this a very interactive, personally demanding but enjoyable day!

The event is organized by SQLPort; all the resulting revenue will go to community support & upcoming events.

Check out the workshop agenda below or on the event page here. The event will take place at Microsoft Portugal.

I can also disclose that every participant will be granted access to my own personal notes & best highlights from some of my favorite data books… (got to love the Amazon/Kindle platform), all of which I'm reviewing right now ;).

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
The Signal and the Noise: Why So Many Predictions Fail – but Some Don't
Naked Statistics: Stripping the Dread from the Data

See you there!

Rui


CTP – Community Technology Preview 2014 Edition – Cloud Machine Learning & Data Mining using AzureML


SQLPort


Saturday, 29 November 2014, from 09:00 to 17:30 (WET)

Lisbon, Portugal

 

Abstract

A full one-day hands-on workshop, including an introductory course on the Machine Learning process, use cases, models & algorithms, as well as how to model, test & deploy Machine Learning solutions with the new Microsoft AzureML cloud service.


Objectives

  • Understand Machine Learning & Data Science Fundamentals
  • Model & Deploy Machine Learning Solutions with AzureML, ML Studio & AzureML Web Services


Attendee’s pre-requisites/profile

  • Data Analysts & Enthusiasts, Business Intelligence Professionals, Data Scientists & Machine Learning


Contents

  • Introduction to Data Science and Machine Learning
  • Machine Learning Use Cases, Models & Workflow
  • Introduction to Azure Machine Learning & ML Studio
    • Data Cleansing & Transformation
    • Data Analysis & Visualization
    • Machine Learning Models: Classification, Regression, Clustering & Text Mining
    • Model Tuning, Scoring & Evaluation
    • Using R in Azure ML
    • Publishing Models as Web Services
    • Using Excel with Azure ML
  • Books & Resources for AzureML, Data Science & Machine Learning

#SQLSaturday Portugal is coming to Porto! tomorrow!

SQL Saturday is coming to Porto for the first time and it's already tomorrow! :) One entire day of free sessions for Data Enthusiasts and Professionals!

The SqlSatPortugal team has put in everything they've got (as usual!); no doubt it will be an awesome event. Great speakers & great sessions, as you would expect. Topics range from the usual suspects (SQL Server, Analytics, Business Intelligence) to Mobile Development and PowerShell tips, not forgetting the most recent data technologies from Microsoft like AzureML!

Some close friends and teammates will also be sharing their knowledge: Sandro Pereira (DevScope Integration Architect & MVP for BizTalk), José António Silva (R&D Director at DevScope) and Rui Marinho (former teammate at DevScope, now pursuing his mobile passion at Xamarin). It's inspiring to work every day among such a group of amazing minds.

Check out the schedule. Not registered yet? Hurry up!

To celebrate SqlSaturday we are also finishing some updates to our Office Apps, and will be covering these and other data/dataviz-related tips in our DevScope-sponsored session. Excel will never be the same!

(Expect some goodies at the end, including invitations to upcoming workshops on Azure Machine Learning that I'll be doing soon.)

Just a teaser ;) … see you all tomorrow!

Take Care,

Rui

image

image

image

SmartCharts, SmartPivot & Gantt Chart for #Excel - latest features & releases @DevScope #Apps

We updated some of our apps and add-ins last week. Here's a brief post introducing some of the new features:

SmartPivot – now with measure group/table filtering in the search pane, plus Table Report & Filter Tools improvements

A much-requested feature is now available (it should have been in the initial Search release, honestly…): the SmartSearch pane now allows for measure group/table filtering, much like the regular PivotTable fields browser in Excel.

Download the new SmartPivot release here.

image

We also fixed some issues with Table Report & Filter Tools (both Saved Filters & Filter by List).

image

 

SmartCharts - now with a real-time Share feature & free download during SharePoint Conference 2014

You can now share your charts session, opening it in a browser or tablet/mobile window (disclaimer: some Safari/OS issues still to fix). You can then collaborate in real time with anyone connected to that SmartCharts session.

Also, you can download the SmartCharts task pane for free from the Office Store during #SPC2014. :) Feedback will be appreciated!

Download SmartCharts Task Pane in Office Store

image

 

Gantt Chart & SmartCharts for SharePoint

SmartCharts for SharePoint will be published soon ;) stay tuned.

image

Gantt Chart for Excel, on the other hand, is already available in the Office Store, and it's free! :)

Download Gantt Chart for Excel

Gantt Chart for Excel

 

Take care,

 

Rui

To Microsoft Office Apps/Developer Teams: Some requests to improve #Office #Apps Experience–a developer’s view :)

SharePoint Conference 2014 is around the corner. The Office team & Office developer team have been posting some new content on their blogs, with useful tips for anyone building Office apps for the Office Store. Be sure to check these recent posts for guidance on Office apps development:

(Also, Office 2013 SP1 was released this week; SP1's new features for the SharePoint & Office apps API/SDK are expected to be announced during the SharePoint Conference next week.)

Anyway, this post is not for developers, but for Office Apps developer/seller dashboard teams :)

It's my personal view, after publishing, supporting & improving a few apps (SmartCharts, Gantt Chart, and a few others coming soon ;) ), on how the experience could be improved for end users and developers, hoping to drive much higher acceptance of Office Apps & the app model in Office 365/SharePoint.

So, what have we learned and what would be great to see improved in the Office Apps experience?

1 - Apps in the Office ribbon are difficult to find (available only through Insert/Apps… :( )

We frequently get people asking where they can find the app after installing it! We need a dedicated Apps tab, where we could pin our favorite apps, see the full list of installed apps, and get quick access to the store & recent apps. And task pane apps aren't particularly "insertable" anyway.

Sorry, this needs an urgent fix! A new Apps ribbon tab!

image

2 - No app icons? Except for local catalogs… this needs fixing too. Why is that? We get the icon in the About dialog, but not in our apps list…

image

Developer/Local Catalogs Apps with Icon

image

Office Apps-no icon :(

(SP1 for Office 2013 seems to bring some minor improvements, but far from what I describe above… we get app icons for some apps, but you can't configure which apps show in the ribbon… currently you get Bing Maps and People Graph from Microsoft.)

image

Office 2013 SP1 tab for Apps

3 - Users can't get details/screenshots for an app using the Excel built-in app browser

How are they supposed to pay for, install & trust an app if they don't know what it's supposed to do?

image

We click Try/Buy/Install and the only available info is…

image

Come on!? It doesn't make sense when we have rich descriptions and screenshots available in the store.

Also, this should be an easy fix for the Office team, because all we are seeing is a web page from the store (you can see that page here).

(Also note that on this screen there is no way, that I know of, to click through and open the app details in the browser… you need to click the publisher link and then browse all their apps…)

4 - App analytics in the Seller Dashboard is… well… not great :(

Trust me, the one thing you should do from day one is configure Google Analytics, your own custom analytics framework, or both (and yes, be sure to reflect that in your privacy page).

Office Seller Dashboard analytics is not very useful for gaining insights into how to tune your app and how it's being used. The main pain points for analytics in the app Seller Dashboard (Office apps):

  • we get historic data only for the latest 4 weeks :(
  • we only get weekly totals, no detailed data, not even daily data (no demographics, time, date, browser versions, nothing…)
  • the CSV you can download also has only weekly total aggregates, which is not very useful
  • data usually lags 2-3 days behind, and recently, due to some issues in the Office Store, it was even weeks behind the current date
  • if you are actually selling apps, there's no metric whatsoever regarding revenue (you'll have to wait for the end of the month and use the very limited tax/payout reports, and even then you'll have to convert from international currencies to your local currency…)
  • paid downloads & free downloads are bundled together in the downloads metric

image

image

I hope all of these can be fixed soon; in the world of data, the available dashboards and data are really lacking (and I'm being kind here, honestly…).

I would gladly skip the built-in Seller Dashboard "dashboards" and exchange all that for the detailed "fact" data on downloads and browser hits. I'll do my own "dashboarding" on that! :)

5-Developer support could be a little better

While developing SmartCharts and getting it approved we had several issues with the API: supporting IE9 in the Excel Web App, strange redirect issues breaking the Office API context model, and undocumented issues with permissions and data read API calls. I must say that, having been in touch with some Office apps support teams along the way, we were on our own almost all the time :(. Expect delays for advanced issues, as these are the ones that are hard to reproduce (e.g. Excel Web App in IE9/Chrome).

There are a few places to send feedback (forums, the Office/SharePoint dev UserVoice site), but getting through and actually getting real help has been a struggle for us. My initial post with the addFromPromptAsync/read permission question has yet to see an answer (we just went ahead and requested full read/write permissions instead of read, problem "solved"… but it looks like a bug/very undocumented behaviour to me…).

6 - Some bugs/issues remain

Some reported issues that seem to be related to the core Office apps JS API remain:

  • for example, the Excel Web App & dates have issues:

clip_image001

  • a very annoying issue: some users buy the apps but Excel keeps showing the app as a trial. I have seen it happen, there are app reviews from users pointing out this issue, and we've also received reports from some users with this problem (yet to find a proper workaround… we've been using an undocumented one to fix this…)
    (note: we received two user reports today with this specific issue!… not good…)

That’s it…

I hope this can reach someone on the Office teams and help a little bit with some of these issues. Do you know someone there? Can you please pass this along? Thanks! ;)

By the way, be sure to check out the SharePoint Conference App Awards – People's Choice! contest, now in its final voting period until February 28, 4:00 PM PST. (I'm a little disappointed to learn that SmartCharts didn't make it to the final in the people's choice awards; I wasn't aware of this voting until very recently :( but I'm hoping we still get a chance in other categories :) )

Take care!

 

Rui

“Multi-threading” the SQL Server Analysis Services Formula Engine II - a parallel query msmdpump proxy

Following a previous post (“Multi-threading” the SQL Server Analysis Services Formula Engine – I #ssas #mdx), we recently returned to the issue of multi-threading and the SSAS formula engine. See that post or this post by James Serra for reference.

Honestly, it's kind of amazing that something like the SSAS engine can run incredibly well using a single-threaded model for each query. It's damn smart, and as long as it runs smart you usually don't need brute force. :)

Until… you find yourself with a very (very) complex cube, together with a rather complex scorecard model also built completely in SSAS/MDX, and… PerformancePoint with its WITH clauses (and, as Chris Webb has pointed out before, the WITH clause disables the FE cache).

A sample query generated by PerformancePoint (a scorecard model, KPIs on rows), running in 50-60 seconds:

image


Returning a complex and FE-intensive scorecard with several different KPIs (completely different measure groups), with values/metrics like value, YTD value, prior year, end-of-year projection, target, YTD target, score and so on… well, it requires a not-so-usual amount of computation by SSAS. Worse, due to the, let's call it "reasonable" :) amount of MDX scopes involved, it triggered some internal thresholds and stopped being smart… maybe switching to cell-by-cell mode? Query the 3 KPIs individually and none exceeds 2-3 seconds; put the 3 together in the same query –> 30-40 seconds…

After exhausting all the tuning we could possibly remember… we got the query down from 1m20s to under 40-50 seconds, but it was still annoying: why all that CPU power if we are waiting a damn 50 seconds?

Another thing to note was that every row was a different KPI, completely isolated from all the other rows returned: really a very good candidate for partitioning and multi-threading. But SSAS doesn't do that (a query hint would be great, SSAS team ;) ).

so….

(And this is where I have to give the usual disclaimer: please do this at home, never at work! It's not supported… aside from tolerating those 50 seconds, what follows is about the worst possible thing you could do…)

We had previously built some SSAS HTTP msmdpump proxies (msmdpump allows you to query an SSAS server over HTTP), adding some "extra" features we needed…

So why not try intercepting that query in a "fake" msmdpump (proxy) built in ASP.NET, partitioning it by the member rows requested, running a bunch of parallel MDX queries against the SSAS DB, getting the results, joining the cellsets back together, and returning them as an SSAS XMLA pump reply…? (Kinda… nuts… yes.)

And well, far, far away from being a reusable module for these scenarios, we built it for this specific one only, changed the connections in PerformancePoint to point to the HTTP endpoint (only for these KPIs/scorecards), and gave it a go:

From here, with the query running at around 50 seconds and almost no CPU activity:

image

To here, with the query now running in under 14 seconds (the multi-threading is very noticeable; obviously you should have spare CPU available, so this is not for an already heavily loaded server):

image

Short story: how does it work?

1 - A web app receives the XMLA HTTP requests targeted at the ISAPI SSAS msmdpump and inspects them for a very specific pattern. Everything else is passed through to the regular msmdpump (debug/testing) or blocked (on the production server we only need to answer a very specific pattern coming from PerformancePoint).

2 - It splits the member list requested on the MDX rows axis, generating an identical MDX query for each group of N members:

image

3 - Using the .NET parallel APIs, it then sends x simultaneous queries/threads to the local SSAS DB:

 

image

image

image

4 - And the tricky part: getting the resulting cellset XML for each query and aggregating it all into a single XMLA cellset that is then returned to the original HTTP request made by PerformancePoint (you can also fire a Management Studio MDX query and pass it through the proxy).

Closing Notes

Honestly, I don't even know if it's a good snippet to share, but we confirmed that it can work in very, very specific scenarios, as a last resort when you can't influence the queries being made (tried that too… it didn't work out so well…). If you risk being fired for not tuning that query the CEO runs every day, you may want to test something like this… otherwise, forget what you've read. :)

But if this post happens to generate some requests where it can be put to good use, I'll consider sharing it privately or even on our CodePlex samples site.

Please be warned that in most cases you won't need anything like this; the SSAS formula engine should work just fine because it's very smart (and we are not! :) ). Only for very complex cubes and specific scenarios can something as risky as this possibly be useful.

PS: this also opens up a rather interesting possibility: spreading the inner requests to secondary servers in a scale-out farm… imagine that working automatically for some queries… will we get this in SQL 2016? ;)

As for the code, it goes something like this (lots of complexity removed, just the core steps):

 

image

image

….

image

 

Take care,

Rui

Introducing SmartCharts Data Discovery App for Excel

Finally! :) Time to briefly introduce our SmartCharts app for Excel, now available in the Office Apps Store (download here).

image  image  image image

So, yet another "chart" app/tool for Excel… what's the point? :) What's it for?

Well, beyond being a DevScope research project and a place where we will be test-driving lots of #dataviz features using the latest technologies, there were some other drivers for building the app:

  • There's so much data available these days, but most people still don't acknowledge that there's a lot of hidden value in data, be it small, medium or big data… so they don't even start exploring it :(
  • Provide a data discovery tool that keeps the user's focus on the data: no modeling, no chart designing, just slice, dice & visual data mining
  • There are a lot of patterns for dataviz, but there seems to be a lack of reusable, pre-built analytic models for the simplest data intelligence tasks: time series, pattern recognition, drill-down views
  • We are building an excessive number of dashboards (really! we are), repeating the same patterns over and over again, and the same goes for data mining… can't we get reusable analytic modules that quickly adapt to different datasets? Data + services interconnected through already-available metadata?
  • An Excel app seemed an awesome way to continue the research work we started with our Data Insights add-in a few months ago (same core concepts, really; we will be back to that eventually ;) )
  • A tool that we can now use to quickly & easily explore small datasets instantly in Excel (under 100k-150k rows): all kinds of log tables, log streams, ETL logs, usage logs.
  • (That said, do not expect much value if you're using it with very small tables, or tables without time/date fields.) It's a data discovery tool, not a chart designing tool.

All of these are of course "experimental hypotheses" that we can now test and research (and we are already collecting lots of useful feedback :) ).

And stay tuned, we will be releasing some new features this week ;).

Download here & get us some feedback.

http://www.devscope.net/products/SmartCharts

Finishing this post with a few typical usage scenarios…

… these are from real data; I loaded the app with different tables I regularly check (clickstream logs, usage logs, ETL logs, SSAS query logs, …).

(some values masked, sorry!)

Integration load resource usage (~80k-row dataset)

(Side note: the first time I loaded this data into the app, I immediately realized I had been completely unaware of where integration load resources were being spent, and how those patterns were evolving over time.)

image

Integration loads vs extracted rows

image

Drill down view of integration time over time

image

Time of day patterns (integration times)

image

Time of day patterns (rows extracted)

image

Now a completely different dataset (120k rows), obtained from an Analysis Services query trace log table; instant insights (again, note that no modeling or chart designing was necessary… just click, load & explore):

image

 

image

image

 

http://www.devscope.net/products/SmartCharts

Take care!

 

Rui
