“Multi-threading” the Sql Server Analysis Services Formula Engine II-a parallel query msmdpump proxy

Following a previous post (“Multi-threading” the Sql Server Analysis Services Formula Engine – I #ssas #mdx ), we returned recently to the issue of multi threading and SSAS formula engine. See that last post or this post by James Serra for reference.

Honestly it’s kind of amazing that something like the SSAS engine can run incredibly  well using a single threaded model for each query. It’s damn smart, and as long it runs smart you usually don’t need brute force. :)

Until… you find yourself with a very (very) complex cube, together with a rather complex scorecard model also completely built in SSAS/mdx, and…. PerformancePoint with its “with clauses” (and  like Chris Webb pointed before, with clause disables FE cache)

Sample query generated by PerformancePoint (a scorecard model, kpis on rows), running in 50-60 secs.

image

.

To return a complex and FE intensive scorecard with several different KPIs (completely different measure groups), with values/metrics like value, ytd value, prior year, end of year projection, target, ytd target, score and so on… well, requires a not so usual amount of computation by SSAS. Worst, due to the, let’s call it a “reasonable”, :) amount of mdx scopes involved it triggered some internal thresholds i, and it stops being smart… maybe switching to cell by cell mode? Query 3 kpis individually, none exceeds 2-3 secs, get the 3 together in the same query, –> 30-40 secs….

After exhausting all the tuning we could possibly remember… reducing down the query from 1m20 secs, to under 40-50 secs, but still annoying, why all the CPU power if we are waiting for damn 50 secs?

Another thing to note was that every row was a different KPI, completely isolated from all the other rows returned, really a very good candidate for partitioning and multi threading. But SSAS doesn’t do that (query hint would be great SSAS team ;) ),

so….

(and this where I have to say the usual disclaimer, please do this at home, never at work! it’s not supported… aside from tolerating that 50 secs what follows is the worst possible thing you could do…)

We had previously built some SSAS http msmdpump proxies before (it allows you to query SSAS server over http) adding some “extra” features we needed…

So why not trying intercepting that query in a “fake” msmdpump (proxy) built in asp.net, partitioning it by member rows requested, run a bunch of parallel MDX queries against the SSAS DB, get the results, join the cell set together, and return as a SSAS XMLA pump reply….? (kinda…nuts….yes)

And well, far, far, far away from being a reusable module for this scenarios we built it for only this specific one, changed the connections in PerformancePoint pointing to the http endpoint (only for this kpis/scorecards), and gave it a go:

From here, query running to 50 secs, almost no cpu activiy:

image

To here, query now running under 14 secs (multi-threading is very noticeable, obviously you should have spare cpu available, not for a already heavily loaded server ):

image

Short story, how it works?

1-a web app gets the XMLA http requests targeted to the isapi SSAS msmdpump, inspects them for a very specific pattern. everything else passthrough to the regular msmdpump (debug/testing) or blocked (production server, we only need to answer a very specific pattern from performance point)

2-it splits the member list requested in the mdx rows (on rows), generates an identical MDX query for each group of N members

image

3-using .net parallel apis then sends x simultaneous queries/threads to the local SSAS db:

 

image

image

image

4-and the tricky part, getting the result cell set xml for each query, and aggregating that in a unique cellset xmla that then is returned to the original http request made by performance point (you can also fire a man studio mdx query and passing it through the proxy)

Closing Notes

Honestly I don’t even know if it’s a good snippet to share , but well we confirmed that it can work in very, very specific scenarios, as a last resource when you can’t  influence queries being made (tried that also… didn’t work out  so good…) …. If you risk being fired for not tuning that query that the CEO runs every day,  you may want to test something like this… otherwise forget what you’ve read. :)

But if this post happens to generate some requests where it can be put to good use I consider to share it privately or even on our codeplex sample site.

Please be warned that in most cases you won’t need anything like this, SSAS FE engine should work just fine  because it’s very smart (and we are not! :) ), only for very complex cubes and specific scenarios something as risky as this can be possibly useful.

ps-this also opens a rather interesting possibility, spreading the inner requests to secondary servers in a scale out farm…  imagine that working automatically for some queries… will we get this for SQL 2016? ;)

As for the code, its goes something like this (lots of complexity removed, just the core steps):

 

image

image

….

image

 

Take care,

Rui

Advertisements

New samples at @devscope #msbi codeplex- I–Quick Dynamic #PowerShell Mail DataTable Reports

Just uploaded a few more BI samples at our codeplex MSBI samples project. This time some purely data driven PowerShell cmdlets & samples for sending data table reports and… :) detecting outliers (dynamically & at runtime) using SSAS data mining temporary models (very similar to what Excel Data Mining addin does in fact).

Usage for this scripts  is mainly for agile monitoring & alerting needs. Be advised that the outliers reports is work in progress though – really just a concept sample.

btw-Powershell scripts/samples credits go to Rui Romano, impressive as always :)

I-Quick, Agile, Data driven, dynamic, mail reports from your queries

image

Just produce a DataTable anywhere with PowerShell (useful Invoke-OLEDBCommand cmdlet also in the sample)

Then send it quickly by mail, no UI binding needed, no reports, your data is the report. Note that table formatting is a little tuned beyond the PowerShell default html tables. You can highlight, there’s also a more data friendly handling of numeric values and some others.

Import-Module .\DevScope.BI.Samples.PowerShell.psm1 –Force

$data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

    $html = Out-Email -subject “Alerts in $viewName : $alertCount” -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp

The fun starts when you combine this with a complete agile/data driven mindset. For example, mailing every view in predefined schema as long as it as at least a row with a column alert flagged to 1. That’s precisely what the sample script provided does (and our preferred use):

$reportViewsSchema=”mail”

Invoke-OLEDBCommand -connectionString $connStr `
    -sql “select Name from sys.views where SCHEMA_NAME(schema_id) = ‘$reportViewsSchema'” |%  {
   
   # for each view found
    $viewName = $reportViewsSchema+”.”+$_.Name
   
    $data = Invoke-OLEDBCommand -connectionString $connStr -sql “select * from $viewName”

   
    $alertCount=($data | ? {$_.Alert -eq 1}).COunt
    write-host “Alerts found: $alertCount”

    # send mail only if alert column is present and its 1
    if ($alertCount -gt 0)
        {
        $html = Out-Email -subject “Alerts in $viewName : $alertCount” -inputObject $data -to $mailto -from $mailfrom -smtpServer $mailsmtp
        }

}

 

Mix this with SQL DMVs, SSIS Logs, SSAS Logs & whatever info you gather for monitoring/operations your BI/other solution, result: pure fun & almost effortless insights! :)

1) prepare your data

image

2)configure your scripts, edit the ExecConfig.ps1 for settings, and

image

or even better :)

image

 

That’s BI for the BI team, as we don’t actually have a budget for that really. :) We need the insights though!

Just download the scripts at devscopebisamples.codeplex.com. Would be great to improve and get some feedback!

image

Next post: Dynamic Outlier Detection with PowerShell and SSAS Mining models!

 

Rui

“Multi-threading” the Sql Server Analysis Services Formula Engine – I #ssas #mdx

The SSAS single threaded Formula Engine (FE)

It’s a known (also unfortunate…) fact that analysis services (multidimensional) formula engine (FE) is single threaded.

Now, for anyone not familiar with SSAS this should come as a very unusual  limiting factor. Honestly, in a world dominated by the cloud, map/reduce, MPP/shared nothing platforms, (imo) the only plausible reason to still have a single threaded SSAS FE, is that we are really looking at an amazing product from a group of amazing minds that, unfortunately, hasn’t got any major upgrade since its very first 2005 release. (aside from 2008 block computation improvements ), and yes the reason for that reason would need another kind of post…

Well, not really single threaded…

A relevant note nevertheless is that, when we say SSAS FE is single threaded, we’re not talking about a single threaded engine FE per server, where we would have a single thread handling all session queries. That would be obviously unsustainable. What seems to happen -as far as I can guess-  is that the FE is single thread per query. When the work is heavily FE bounded, a single thread will handle the FE load. Other simultaneous queries will get their own FE thread from the thread pool. (note: the number of total available worker threads is not unlimited of course)

A very simple query will show this behavior a (using with clause to inhibit the FE cache –a tip from Chris I use a lot – and median calculation to make it very CPU/FE bound):

with
   member [Measures].[Test1] as
   ‘median([Customer].[Customer].[Customer],[Measures].[Internet Sales Amount])’
   member [Measures].[Test2] as
   ‘median([Customer].[Customer].[Customer],[Measures].[Internet Tax Amount])’
   member [Measures].[Test3] as
   ‘median([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])’

select
non empty {[Measures].[Test1],[Measures].[Test2],[Measures].[Test3] } on 0,
non empty [Date].[Calendar].[Month]*[Product].[Category].[Category] on 1
from [Adventure Works]

This query running on my laptop (SQL 2012 SSAS) will average 45-55 secs duration, with a very constant 20%-25% cpu usage (4 listed cores):

image

   

image

 

image

In a multi-core server environment  this would be even more noticeable, with very low total cpu% usage during that query FE execution.

The real issue (for end users at least) is that adding more cores/cpus will not have any significant performance  improvement for queries like this –but faster cpus should help a bit-. A single thread will handle all the load, all the cpu% power on recent hardware will be completely wasted. You will see several FE heavy queries taking several seconds to complete, almost without any kind of cpu/io pressure. “Uncomfortable” to watch at least.

“Partitioning” MDX queries & multi threading

If a “final round” can be expected for SSAS multidimensional I really hope that this could be somehow fixed (configurable at least), in a way that multiple threads could handle the FE load when appropriate.

(relevant update, see comments below, thanks Chris! don’t miss this post from Chris Webb, Parallelism and MDX Queries, an amazing piece of work using ssas proc/clr to multi thread some specific query patterns, may be this can be fine tuned for a wider range of scenarios)

Partitioning the workload by unrelated measure makes for a very strong candidate. (related measures will share intermediate calculations,cache, and probably won’t benefit so much from multi threading).

An example, let’s split that previous query into three, one for each measure requested, (note that I used 3 unrelated base measures)

(query 1, the others are similar)

image

Executing them in parallel will result in something like: query1: 24 secs, query2: 23 secs, query 3: 23 secs. But as they executed simultaneously, we only have to wait for the longest one to get all the data we need, ~24 secs. Less 50% of the original single query duration.

CPU usage now spikes predictably to ~75% during the ~25 secs queries duration.

 

image

Possible scenarios for query partitioning

So, what do we take from here? Well, it depends a lot on each specific query workload… But it can come handy in some scenarios:

  • you have a system were cpu capacity is clearly underused most of time
  • you control the queries sent to ssas, through a web frontend for example or you are using a frontend that can use multiple parallel requests (tip: performance point)
  • queries are easy “partitionable“, eg: by measure
  • you already did everything possible regarding the usual ssas performance tuning best practices (at least everything -you think- is possible)

In these scenarios,  you can try to partition the queries, execute and aggregate the results. There’s a chance that your users can benefit from better response times.

(and of course partitioning queries like this is a poor solution for badly written MDX that tends to be very FE intensive. For that my recipe is always refer back to Chris, Marco, Mosha, Piasevoli, Greg, Kasper, Richard, and many others, never forgetting our amazing BI team at DevScope :) )

Next post: “Multi-threading” the Sql Server Analysis Services Formula Engine II – Performance Point

Stay tuned!

Rui