“Multi-threading” the SQL Server Analysis Services Formula Engine II – a parallel query msmdpump proxy

Following a previous post (“Multi-threading” the SQL Server Analysis Services Formula Engine – I #ssas #mdx), we recently returned to the issue of multi-threading and the SSAS formula engine. See that post, or this post by James Serra, for reference.

Honestly, it’s kind of amazing that something like the SSAS engine can run incredibly well using a single-threaded model for each query. It’s damn smart, and as long as it runs smart you usually don’t need brute force. :)

Until… you find yourself with a very (very) complex cube, together with a rather complex scorecard model also completely built in SSAS/MDX, and… PerformancePoint with its “with clauses” (and, as Chris Webb pointed out before, the WITH clause disables the FE cache).

Sample query generated by PerformancePoint (a scorecard model, KPIs on rows), running in 50-60 secs:

[image: the PerformancePoint-generated MDX query]

Returning a complex and FE-intensive scorecard with several different KPIs (completely different measure groups), with values/metrics like value, YTD value, prior year, end-of-year projection, target, YTD target, score and so on… well, it requires a not so usual amount of computation by SSAS. Worse, due to the, let’s call it “reasonable” :), amount of MDX scopes involved, it triggered some internal thresholds and stopped being smart… maybe switching to cell-by-cell mode? Query the 3 KPIs individually and none exceeds 2-3 secs; get the 3 together in the same query –> 30-40 secs…

After exhausting all the tuning we could possibly remember… we got the query down from 1m20s to 40-50 secs. Still annoying: why all that CPU power if we are left waiting for a damn 50 secs?

Another thing to note was that every row was a different KPI, completely isolated from all the other rows returned: really a very good candidate for partitioning and multi-threading. But SSAS doesn’t do that (a query hint would be great, SSAS team ;)),

so….

(and this is where I have to say the usual disclaimer: please do this at home, never at work! It’s not supported… and aside from tolerating those 50 secs, what follows is the worst possible thing you could do…)

We had previously built some SSAS HTTP msmdpump proxies (msmdpump lets you query an SSAS server over HTTP), adding some “extra” features we needed…

So why not try intercepting that query in a “fake” msmdpump (proxy) built in ASP.NET, partitioning it by the member rows requested, running a bunch of parallel MDX queries against the SSAS DB, getting the results, joining the cell sets together, and returning everything as an SSAS XMLA pump reply…? (kinda… nuts… yes)

And well, far, far, far away from being a reusable module for these scenarios, we built it for this specific one only, changed the connections in PerformancePoint to point to the HTTP endpoint (only for these KPIs/scorecards), and gave it a go:

From here, query running at 50 secs, almost no CPU activity:

[image: CPU usage during the original ~50-sec query]

To here, query now running under 14 secs (the multi-threading is very noticeable; obviously you should have spare CPU available, so not for an already heavily loaded server):

[image: CPU usage during the ~14-sec parallel version]

Short story: how does it work?

1-a web app gets the XMLA HTTP requests targeted at the ISAPI SSAS msmdpump and inspects them for a very specific pattern. Everything else is passed through to the regular msmdpump (debug/testing) or blocked (production server, we only need to answer a very specific pattern from PerformancePoint). A bare-bones sketch of this entry point follows after these steps.

2-it splits the member list requested on the MDX rows axis, generating an identical MDX query for each group of N members

[image: the generated per-group MDX queries]

3-using the .NET parallel APIs, it then sends x simultaneous queries/threads to the local SSAS DB:

[images: parallel query execution]

4-and the tricky part: getting the result cellset XML for each query and aggregating it into a single XMLA cellset that is then returned to the original HTTP request made by PerformancePoint (you can also fire a Management Studio MDX query and pass it through the proxy)
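In code terms, the entry point (step 1) could look roughly like this bare-bones ASP.NET sketch – not the real code; IsScorecardPattern and HandleScorecardQuery are hypothetical placeholders for the real, far more specific logic:

using System.IO;
using System.Web;

public class FakeMsmdPumpHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // read the raw XMLA/SOAP body that PerformancePoint posts to the pump
        string body = new StreamReader(context.Request.InputStream).ReadToEnd();

        if (IsScorecardPattern(body))
        {
            // steps 2-4: split, run in parallel, merge, reply as XMLA
            context.Response.ContentType = "text/xml";
            context.Response.Write(HandleScorecardQuery(body));
        }
        else
        {
            // everything else: pass through to the real msmdpump (debug/testing)
            // or simply block it (production)
            context.Response.StatusCode = 403;
        }
    }

    // placeholders only – the real checks/logic are very scenario specific
    static bool IsScorecardPattern(string xmla) { return xmla.Contains("<Statement>"); }
    static string HandleScorecardQuery(string xmla) { return xmla; }
}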

Closing Notes

Honestly, I don’t even know if it’s a good snippet to share, but well, we confirmed that it can work in very, very specific scenarios, as a last resort when you can’t influence the queries being made (tried that also… didn’t work out so well…). If you risk being fired for not tuning that query the CEO runs every day, you may want to test something like this… otherwise forget what you’ve read. :)

But if this post happens to generate some requests where it can be put to good use, I’ll consider sharing it privately or even on our codeplex sample site.

Please be warned that in most cases you won’t need anything like this; the SSAS FE should work just fine because it’s very smart (and we are not! :) ). Only for very complex cubes and specific scenarios can something as risky as this possibly be useful.

ps-this also opens a rather interesting possibility: spreading the inner requests to secondary servers in a scale-out farm… imagine that working automatically for some queries… will we get this for SQL 2016? ;)

As for the code, it goes something like this (lots of complexity removed, just the core steps):

 

[images: code snippets]
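In rough C# terms, the parallel core might look like the sketch below (a reconstruction under assumptions, not the original code: it assumes ADOMD.NET, the per-group queries from step 2 arrive already built, and the cellset merge from step 4 – where the really messy work lives – is only hinted at in the comments):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AnalysisServices.AdomdClient;

static class PumpProxyCore
{
    // runs the per-group MDX queries (from step 2) in parallel against the
    // local SSAS db, collecting the raw XMLA cellset of each one
    public static IList<string> ExecuteParallel(IEnumerable<string> queries, string connStr)
    {
        var cellSets = new ConcurrentBag<string>();

        Parallel.ForEach(queries, query =>
        {
            using (var conn = new AdomdConnection(connStr))
            {
                conn.Open();
                using (var cmd = new AdomdCommand(query, conn))
                using (var reader = cmd.ExecuteXmlReader())
                {
                    reader.MoveToContent();
                    cellSets.Add(reader.ReadOuterXml());
                }
            }
        });

        // step 4 (not shown): merge the axes/cells of all these cellsets into a
        // single XMLA reply – renumbering cell ordinals – before answering the
        // original http request
        return new List<string>(cellSets);
    }
}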

 

Take care,

Rui

New samples at @devscope #msbi codeplex – II – #PowerShell Dynamic Outliers with #SSAS mining models

Finally…! Updated our codeplex project with an improved sample for this. :) btw – see the first post on this thread for other PowerShell MSBI data driven/dynamic samples & utilities.

II-#PowerShell Dynamic Outliers Monitoring with #SSAS mining models

The idea for this sample goes right back to the initial releases of the Excel Data Mining Addin. Now that’s a cool Excel addin; in fact it’s way more valuable today, and it’s a little sad that data mining is even a little bit forgotten in the MSBI stack :( (yes, it is…). The issue with SSAS Data Mining, and particularly this addin, I think, is that it really was ahead of its time! Because now, in our always changing data driven world, it’s the time for such agile & self service data mining/data intelligence scenarios.

Anyway, that addin showed that it was possible to do 100% dynamic outlier detection based only on the given data/schema. Models would be generated at runtime for each dataset.

And so, finally, we had the time to take this concept and turn to PowerShell to make a lot of other data driven scenarios scriptable & able to run completely automated (ex: scheduled/intelligent alerts/trends/anomalies monitoring).

So, what if we could use this to monitor the several data streams we have available in our solutions? We already have & use several very agile & data driven KPIs/reports/datasets… but those still usually involve fixed alert thresholds. What if we had something looking at these data streams every day that would tell us, “hey, here’s something that’s not usual, are you aware?”? (i.e. something that would do what I usually do, so that I can go on doing several other things… like blogging and reading :) ).

And that’s precisely what the sample script does:

  • iterate all the views in a predefined schema (love this pattern for dba/bi monitoring)
  • execute & get the datasets; we used SQL Server but you can use any datatable
  • pass the datatables to the data mining module (it will, like the Excel data mining addin, dynamically create a temporary model for each table, do the outlier stuff – plus some nasty hacks of our own – and get an outlier probability value for each row)
  • then filter (in this scenario) for today’s outliers, using a convention styled approach – choosing the first date column available (btw, a better approach would be to save the last execution time for each view and use that as the date filter)
  • if there are outliers, send the report by mail (using the cmdlets I talked about in my last post)
  • if not, keep quiet!

And anytime I want to monitor another dataset, I just create another view in that schema and the script will adapt accordingly… hence the data driven mantra applies :) (a data+services mindset to explore in a future post…)

Like this:

A schema (mon) where I can put the data streams I want to monitor for outliers/anomalies.

[image: the mon schema views]

As a sample I use a dataset of SQL Analysis Services processing logs/elapsed times, by object (measure group, partition, dimension,…).

Ensure that there is a date column to allow for proper date/time filtering – after the outlier detection – (1-month-old outliers aren’t that interesting anyway).

[image: the dataset and its date column]

Now imagine that I force something strange into that data stream, changing the start date so as to increase the processing time for that SSAS object (creating an anomaly)…

[image: the modified rows creating the anomaly]

And I run the provided outlier sample script… :)

[image: running the outlier sample script]

and… amazingly (remember that I did not configure any kind of warning thresholds…), my two “simulated” outliers are now highlighted in my mailbox:

[image: the outlier report mail with the two rows highlighted]

How cool is that? :) The script doesn’t have to know the data in any way, anything goes; that’s the beauty of it.

(defending myself :) from possible comments by the MSBI team regarding this screenshot… I can only say that I promise to reassess my mobile stack after the “new” release of the MSBI stack for mobile… :) )

Be advised though that the script is just a sample; there are a -few- known “issues” at this moment:

  • Do not expect perfect results; we didn’t spend much time on SSAS mining model tuning for the moment, and we are limited to the accuracy we can expect to get from Analysis Services mining models (would be great to get some data mining experts’ help/feedback!)
  • Some column types can cause the mining model/structure not to be created
  • We had to use a hack… and introduce an outlier probability of 2 when we were getting an outlier probability of 0… (cof, I know, we’ll have to dig deeper into the reason for this… surely we messed up somewhere)
  • Still missing is highlighting the column where the outlier is most probable; that’s possible – the data mining addin does it – but we didn’t have time for that yet, sorry
  • several other issues will appear, I’m sure… (I did say it’s a sample, right? :) )

That’s it for today, browse/get the sample at http://devscopebisamples.codeplex.com !

Note that there’s a bundle download available with Excel Data Insights AddIn, Mail DataTable Reports & this outliers sample.

The “tiny” main script:

# for each view found
$viewName = $reportViewsSchema + "." + $_.Name

$data = Invoke-OLEDBCommand -connectionString $connStr -sql "select * from $viewName"

# convention: pick the first datetime column for the date filtering below
$firstDateCol = $data[0].Table.Columns | ? {$_.DataType -eq [datetime]} | select -First 1 -ExpandProperty ColumnName

# build a temporary mining model and get an OutlierProbability for each row
$dmTable = Set-Outliers -dataTable $data[0].Table -connectionString $ssasconn

write-host "Evaluating OutlierProbability threshold..."
# use the 6th highest probability as the highlight threshold (top rows only)
$x = $dmTable | sort-object OutlierProbability -descending | select -index 5 | select OutlierProbability

write-host "Highlighting..."
# keep only today's rows and flag the ones at/above the threshold
$dmTable = $dmTable | ? {($_.$firstDateCol -ge [System.DateTime]::Now.Date)} | select *,@{name="SysRowCssClass";expression={if ($_.OutlierProbability -ge $x.OutlierProbability) { "highlight" } else { "" }}} | sort-object $firstDateCol -descending

$alertCount = ($dmTable | ? {$_.SysRowCssClass -like "highlight"}).Count
write-host "Today outliers: $alertCount"

# send the report by mail only if there are outliers today
if ($alertCount -gt 0)
{
    $html = Out-Email -subject "Outliers in $viewName : $alertCount (today)" -inputObject $dmTable -to $mailto -from $mailfrom -smtpServer $mailsmtp
}

ps-and don’t tell my boss that PowerShell & Data Mining are doing our work now! ;) We’ll figure out something else to do eventually…

Take care,

 

Rui

This post is a reminder that we are still waiting for #SQL #PowerView to be able to explore #SSAS Multidimensional…

Not the most inspired post I know :) Just a reminder.

I would also add that the possibility that this would only work for SQL 2012… would be a tiny little bit difficult to understand honestly… Or at least, give us strong technical reasons please.

And don’t forget to vote on the Connect request.

Rui

“Multi-threading” the Sql Server Analysis Services Formula Engine – I #ssas #mdx

The SSAS single-threaded Formula Engine (FE)

It’s a known (also unfortunate…) fact that the Analysis Services (Multidimensional) formula engine (FE) is single-threaded.

Now, for anyone not familiar with SSAS this should come across as a very unusual limiting factor. Honestly, in a world dominated by the cloud, map/reduce and MPP/shared nothing platforms, (imo) the only plausible reason to still have a single-threaded SSAS FE is that we are really looking at an amazing product, from a group of amazing minds, that unfortunately hasn’t had any major upgrade since its very first 2005 release (aside from the 2008 block computation improvements), and yes, the reason for that reason would need another kind of post…

Well, not really single threaded…

A relevant note nevertheless: when we say the SSAS FE is single-threaded, we’re not talking about a single FE thread per server, where one thread would handle all session queries. That would obviously be unsustainable. What seems to happen – as far as I can guess – is that the FE is single-threaded per query. When the work is heavily FE bound, a single thread will handle the FE load. Other simultaneous queries will get their own FE thread from the thread pool. (note: the total number of available worker threads is not unlimited, of course)

A very simple query will show this behavior (using the WITH clause to inhibit the FE cache – a tip from Chris I use a lot – and a median calculation to make it very CPU/FE bound):

with
   member [Measures].[Test1] as
   'median([Customer].[Customer].[Customer],[Measures].[Internet Sales Amount])'
   member [Measures].[Test2] as
   'median([Customer].[Customer].[Customer],[Measures].[Internet Tax Amount])'
   member [Measures].[Test3] as
   'median([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])'

select
non empty {[Measures].[Test1],[Measures].[Test2],[Measures].[Test3] } on 0,
non empty [Date].[Calendar].[Month]*[Product].[Category].[Category] on 1
from [Adventure Works]

This query running on my laptop (SQL 2012 SSAS) will average 45-55 secs duration, with a very constant 20%-25% CPU usage (4 listed cores):

[images: query duration (~45-55 secs) and the steady 20-25% CPU usage]

In a multi-core server environment this would be even more noticeable, with very low total CPU% usage during that query’s FE execution.

The real issue (for end users at least) is that adding more cores/CPUs will not bring any significant performance improvement for queries like this – though faster CPUs should help a bit. A single thread will handle all the load, and all the CPU power of recent hardware will be completely wasted. You will see several FE-heavy queries taking many seconds to complete, almost without any kind of CPU/IO pressure. “Uncomfortable” to watch, at least.

“Partitioning” MDX queries & multi-threading

If a “final round” can be expected for SSAS Multidimensional, I really hope that this could somehow be fixed (made configurable at least), in a way that multiple threads could handle the FE load when appropriate.

(relevant update, see comments below, thanks Chris! Don’t miss this post from Chris Webb, Parallelism and MDX Queries, an amazing piece of work using an SSAS sproc/CLR to multi-thread some specific query patterns; maybe this can be fine-tuned for a wider range of scenarios)

Partitioning the workload by unrelated measures makes for a very strong candidate (related measures will share intermediate calculations and cache, and probably won’t benefit as much from multi-threading).

An example: let’s split that previous query into three, one for each measure requested (note that I used 3 unrelated base measures).

(query 1, the others are similar)

[image: query 1 – the same query, keeping only [Measures].[Test1] on columns]

Executing them in parallel results in something like: query 1: 24 secs, query 2: 23 secs, query 3: 23 secs. But as they execute simultaneously, we only have to wait for the longest one to get all the data we need: ~24 secs. Less than 50% of the original single-query duration.

CPU usage now spikes predictably to ~75% during the ~25 secs the queries take:

[image: CPU usage at ~75% during the parallel execution]
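For the curious, here’s a minimal C# sketch of this client-side test (an illustration only, assuming ADOMD.NET and a local Adventure Works instance; the connection string and timing harness are illustrative, not from the original runs):

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AnalysisServices.AdomdClient;

class ParallelMdxTest
{
    const string ConnStr = "Data Source=localhost;Catalog=Adventure Works DW";

    static void Main()
    {
        // one partitioned query per unrelated base measure
        var measures = new[] {
            "[Measures].[Internet Sales Amount]",
            "[Measures].[Internet Tax Amount]",
            "[Measures].[Internet Order Quantity]" };

        var queries = measures.Select(m => string.Format(
            "with member [Measures].[Test] as " +
            "'median([Customer].[Customer].[Customer],{0})' " +
            "select non empty {{[Measures].[Test]}} on 0, " +
            "non empty [Date].[Calendar].[Month]*[Product].[Category].[Category] on 1 " +
            "from [Adventure Works]", m)).ToArray();

        var sw = Stopwatch.StartNew();
        // fire the three queries simultaneously: total duration should land
        // near the slowest single query (~24 secs), not the sum of the three
        Parallel.ForEach(queries, mdx =>
        {
            using (var conn = new AdomdConnection(ConnStr))
            {
                conn.Open();
                new AdomdCommand(mdx, conn).ExecuteCellSet();
            }
        });
        Console.WriteLine("Total elapsed: {0:N0} ms", sw.ElapsedMilliseconds);
    }
}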

Possible scenarios for query partitioning

So, what do we take from here? Well, it depends a lot on each specific query workload… But it can come in handy in some scenarios:

  • you have a system where CPU capacity is clearly underused most of the time
  • you control the queries sent to SSAS (through a web frontend, for example) or you are using a frontend that can issue multiple parallel requests (tip: PerformancePoint)
  • queries are easily “partitionable”, e.g. by measure
  • you already did everything possible regarding the usual SSAS performance tuning best practices (at least everything -you think- is possible)

In these scenarios, you can try to partition the queries, execute them and aggregate the results. There’s a chance that your users can benefit from better response times.

(and of course, partitioning queries like this is a poor solution for badly written MDX that tends to be very FE intensive. For that, my recipe is to always refer back to Chris, Marco, Mosha, Piasevoli, Greg, Kasper, Richard, and many others, never forgetting our amazing BI team at DevScope :) )

Next post: “Multi-threading” the Sql Server Analysis Services Formula Engine II – Performance Point

Stay tuned!

Rui

Tabular Queries in Excel 2013, Good News, Bad News, DAX Queries and the missing DAX visual query builder

(update: SmartPivot for Excel version 2.3 – a much improved version – has just been published; please visit the SmartPivot Product Page for the latest Excel BI features, including instant search, quick connect, a stunning pivot viewer visualization for Excel tables & many others)


SmartPivot Latest Version for Excel 2007, 2010, 2013 & PowerPivot

The never-ending thread… :) Anyway, I just finished some tests with Excel 2013; like the title says, good news and bad news (that is, imo).

So, bad news first, so that I can finish this post in a good mood!

Bad News – “Tabular” Pivot Tables on OLAP cubes: the scalability/performance problem is still there

Not that I was expecting improvements here, but it’s sad to note that for now I get the same very bad experience both with 2010 & 2013.

Just load an OLAP pivot table with 11 columns and a tiny rowset of 244 rows, and I had to wait more than a minute for the data (in the process I also crashed SQL Management Studio and Excel due to the abnormal number of cells returned)…

[image: the OLAP pivot table taking over a minute]

Fortunately, I was glad to see that the SmartPivot tabular report works perfectly on the Excel 2013 Preview.

[image: SmartPivot tabular report on Excel 2013]

Same 244 rows in a native table, 11 columns, ~1.6 secs… No dumps.

[image: the same 244 rows in a native table, ~1.6 secs]

Anyway, it would be great if native Excel cube pivot tables could be fixed to enable this scenario. Let’s see in the RTM…

So, now for the good news…

Good News – Edit DAX Query in Excel tables shows potential, but we’ll need a friendly query editor soon!

If there’s one thing I always hoped the tabular model could fix (as the name suggests…), it was real tabular queries, with real Excel tables. A single semantic model (read *real* semantic, not BISM…) to rule them all (both pivot and table queries).

Yes, there’s more to data science than pivot tables and low grain aggregated values.

For a lot of (good) reasons it’s very common for users to need that style of data layout. Why the hell can’t users use the same platform/solution to get a simple list of customers by country, or products by categories (yes, with no metrics involved)?

And although I don’t feel the problem is properly solved in the Excel 2013 Preview, there’s a new table menu option that immediately caught my attention: Edit DAX.

[image: the Edit DAX table menu option]

That allows editing the underlying DAX model query being used, switching from a table-style query to a full DAX query.

So, for example, I can create a table from one of my data model’s existing tables,

[image: table created from a data model table]

and then change the table query to a DAX query, crafting it to return an additional calculated column (ex: score based on average rating):

[image: the edited DAX query with an added calculated column]

I can only guess (hope!) that sometime soon the old field list/pivot layout that today supports most pivot table/pivot chart data analysis will finally be replaced by a full-grown OLAP/tabular friendly query editor that can target lots of data layouts (pivot/table style). Like the field list in a pivot table, there should be a field list for a data model based table query. That would be pretty amazing.

Because, and I’ll have to disagree with some MSBI most valuable guys :), DAX (like MDX) is extremely difficult to craft manually beyond the basic stuff. For pivot based layouts we can live (barely…) with the drag & drop field list. For tabular queries (particularly DAX queries) users will need a new (very easy & very visual) query editor tool that explores the full potential of tabular backends.

That reminds me that although I can’t say that DAX is a lot more complex than MDX (I’m tempted to… :) ), or the other way around, one thing is for sure: Excel OLAP users never needed to know MDX to work with OLAP, and I hope they’ll be able to get the full power of tabular without knowing DAX too!

Btw, regarding DAX queries, be sure to check Paul te Braak’s promising DAX Studio and his latest posts on DAX:

DAX Querying Part I

DAX II – Extending the use of Evaluate

DAX Studio @codeplex

 

Final note – Slicers also work with DAX query tables, but they use visual filtering in the Excel table.

It was rather intriguing to note that I could use a table slicer (a new feature of Excel 2013) even when using a custom DAX query. In fact the newly added DAX column was immediately available to use as a slicer. But the reason for that became pretty obvious: the slicer uses the Excel table column filters to filter the visible rows only; it doesn’t change the underlying table query. Ex:

[image: a slicer visually filtering the DAX query table]

So, that’s it for now. Take care,

 

Rui

Missing in Action – The Analysis Services UDM/Dimensional Team?

Right from the start I know this won’t be a pretty post. :) Sorry about that, about the rant… it’s just the way I see it.

Let me say honestly that my contribution to MS Connect has been non-existent… shame on me, no excuses; that probably somehow undermines some of the thoughts & perspectives that follow… But honestly, I’m irrelevant here, that’s not the point.

My friend & teammate Rui Romano was pointing me to a recent issue on MS Connect (see Subscription and Alerts Issues with Analysis Services in SQL Server 2012 by Teo).

Reading (and voting btw) on the issue submitted by Teo led me to browse & search for lots of related connect issues on analysis services (mainly dimensional)…

And again it bothers me, as it appears (to me at least) that, regarding SSAS dimensional, either there is no team anymore or that team is actually doing zip-zap-nothing, since from the 2005 release onward I would say the effort put into improving the product (dimensional) has been extremely lacking…

Almost all the suggestions regarding SSAS dimensional, since the 2005 release, are still active or, worse, closed as won’t fix.

And the few features that made it into 2008 are not even particularly memorable… I prefer BIDS Helper for aggregations, and the designer for attribute relations is imo worse and confusing (linear restore/backup time is one of the few I actually find useful, on par with block mode optimizations).

It seems that the SSAS dimensional model boils down to some lonely guy, an intern, working part time, doing… what?… documentation?

I really do find extremely valuable and needed that MS now has a very strong self service BI platform, the promising Power View and – finally – data alerts! (although I can’t believe that this amazing feature is actually “locked” to Reporting Services reports….. ). But aside from that, honestly, letting go of a great (and I mean great!) product like Analysis Services dimensional… what the hell?

By now MS should really be clear on this: where are we going, with what tools & models… and assuring us that we’ll have a better platform, not just another -different- one.

And don’t even start with the BI “Semantic” Model! I don’t see any sign of that for now (mind you, it’s RC0). There are BISM models, which are tabular, and there is the UDM, which is dimensional. I’m yet to see the BI semantic model that will properly bind and unify these two worlds.

The famous post from Chris Webb comes to my mind frequently. Chris wrote at the time:

“The point I was trying to make in my original post was that the announcements made at PASS, as I and everyone I spoke to there interpreted them, made me very concerned (to say the least) for the future of the UDM and the multidimensional model. First of all there was the news that Microsoft was putting all of its development efforts into Vertipaq and BISM, while the UDM was (for yet another release) getting very few obvious improvements. Then there was the news that Project Crescent was only going to support BISM as a data source and not the UDM, which made it seem like the UDM was a second class citizen in this regard. And finally there was a lack of clarity in the roadmap which meant I wasn’t sure whether BISM was meant to replace the UDM or not, or whether BISM would ever be able to do the same things that the UDM can do today.”

The concern seems pretty up to date to me. Actually a little reinforced, as we are now on RC0.

It’s really unfortunate that so many obvious issues, extremely requested & frequently raised by customers, remain on Connect without any kind of follow-up (which makes sense if there’s no one on the team anyway).

Some of my favorites… still waiting…:

https://connect.microsoft.com/SQLServer/feedback/details/403083/drillthrough-improvements-to-make-usable-in-large-cubes#tabs

https://connect.microsoft.com/SQLServer/feedback/details/249697/change-measures-display-decouple-them-from-measure-groups-to-the-client

https://connect.microsoft.com/SQLServer/feedback/details/392972/drill-through-on-semi-additive-measures-in-analysis-services

https://connect.microsoft.com/SQLServer/feedback/details/713196/enable-power-view-to-access-multidimensional-models#tabs

https://connect.microsoft.com/SQLServer/feedback/details/225212/enable-drillthrough-on-calculated-measures

 

(ok, this one is priceless :) I forgive the team for this one)

https://connect.microsoft.com/SQLServer/feedback/details/346154/attributes-limitation-in-analysis-services

 

So I would end with a public Connect request: can the SSAS team please make the SSAS UDM server code base public?… so that at least anyone can take it further to where it could be today, seven years past the great & amazing 2005 UDM release? :)

Not a pretty post as I said :), hoping that I’m dead wrong too!

Ending on a more positive tone: it comes to mind that MS should start working on a proper Data Server, the ultimate real semantic model server that will join together OLAP with tabular, OData driven & enabled, with real data alerting, incorporating also the kind of scenarios that Data Explorer is targeting today. Running in the cloud or on-premise. That would be Data Intelligence on steroids! :)

Sorry about this one!

Rui

Excel 2007/2010 Pivot Tables: Getting detailed/granular table reports from OLAP… in seconds!

(update: SmartPivot for Excel version 2.3 – a much improved version – has just been published; please visit the SmartPivot Product Page for the latest Excel BI features, including fast tabular reports, instant search, quick connect, a stunning pivot viewer visualization for Excel tables & many others)


SmartPivot Latest Version for Excel 2007, 2010, 2013 & PowerPivot

 

Well, I’ve been delaying & postponing this post, the main reason being wanting some assurance that it could have some actual value :). Really, I’m not sure, but anyway, here it goes!

 

A few weeks ago (actually I think I can now say months… :) ) I was called to help a customer solve a set of somewhat advanced scenarios on Analysis Services. One of the issues, actually the big one, was that end users were having a real hard time getting detailed reports (near drill-through, maximum granularity “detail” queries) working with Excel 2007 pivot tables.

 

My first thought was, honestly :), “sure… OLAP & Analysis Services are mostly aggregate oriented, not the best option for creating large facts/rows detailed reports”. We could say, being a little “naive”, that there are other good options like using Reporting Services, etc. IMO that really doesn’t cut it anymore; my reasons:

 

  • End users really don’t care whether they want aggregates or detail rows; they want a single BI repository for their information needs. And they’re right! :)
  • Solutions like building custom reports have the problem that they don’t allow free adhoc report authoring & filtering
  • From a “single version of the truth” perspective (whatever that means :) ), using several tools & repositories to show almost the same information isn’t the best way to go (IMO)

 

So, perceiving that end users have good reasons, I went a little further to see why, when getting back *only* about 10,000 records with 8/9 fields in a pivot table, Excel 2007 mostly… ahh… “crashed”, or took several minutes to bring the “tiny” 10,000-row pivot table back.

 

And the results were somewhat surprising. In this specific scenario it really has nothing to do with OLAP/Analysis Services, but with the MDX query Excel 2007 generates, which, even with hidden grand totals (to mimic a table result), requests aggregated values from OLAP (the All member).

 

The problem with this? Well, for a result of 784 real rows (what the end user sees), using 13 columns, OLAP processes approx. 125,000 total cells and returns approx. 10,000 total rows. But if the result rows are now 6,207, we get 1,211,896 total cells from OLAP and 86,564 total OLAP rows. Why is that?

 

The problem is that Excel still requests the All member aggregations even when it is not showing those values (that’s a “table” report layout, remember?). In this case the yellow rows are returned from Analysis Services but never shown.

[image: returned rows, the never-shown subtotal rows in yellow]

The query goes something like this (note the DrilldownMember style of query):

[image: the Excel-generated MDX query]

 

This is unfortunate because a simple custom MDX query using standard cross joins to get the same “table” style results actually returns in just a few seconds!!! Something like:

[image: the equivalent cross join MDX query]

 

(note: a few weeks later Richard Lees posted about exactly the same problem, with great insights & tips: using properties or custom “reporting” hierarchies. Please see here. That’s a great post. It came after my issue; it sure would have helped me at the time! :) )

 

So, although testing this in Excel is difficult due to the performance, maybe we’re talking about something like this (relation between real rows vs OLAP rows with 13 columns):

[image: real rows vs OLAP rows ratio]

 

So can we do something to ease things out, while still maintaining end user authoring & filtering? :) Well, yes we can! We really can’t tailor the MDX query generated by Excel… but… we can change the MDX query of an external data table query, and we can get the filters from the pivot and use them in our external table query! :)

 

The results are really impressive, actually I was deeply surprised: warm cache queries for 6,207 rows are under a second, 73,094 rows under 15 seconds, and we were actually able to return more than 1 million rows from Analysis Services, using this technique, in a few minutes.

 

(note: when using a native pivot table for the 73,094-row scenario, I couldn’t get the query to finish without first crashing my Excel – if you use Prism HUD (great tool btw), the memory & CPU footprint of these native pivot table queries is *very* noticeable…

in fact, be careful when exploring several-column Excel pivot tables with Analysis Services, particularly if both are on the same box… I got the impression that my disk almost “fried” due to excessive memory usage…)

So, here’s the sample I’ve worked on; I would really like to get some feedback. :) You can filter the table report like a regular pivot table. It checks the results so that the grand totals match, and shows the refresh time in seconds. It also does the trick of changing the column labels to the friendly ones.

 

[image: the sample tabular report, filterable like a pivot table]

Some results (note Excel result rows vs OLAP rows):

[image: sample timings – Excel result rows vs OLAP rows]

 

Some notes:

 

  • When opening you must enable macros; that’s how the custom MDX query is changed when adding/removing filters on the fast sample
  • I use a named instance for SQL 2008, so if you have a default instance you have to change it on the connections tab
  • The sample is based on the Contoso Retail OLAP sample
  • If you need to add more columns to the sample, just edit the query on the config sheet, as long as you leave the {filters} token
  • It doesn’t support multi-select filters (hey, it’s just a sample! actually my customer did an awesome job with this startup, and the final workable Excel was great)
  • There’s a strange result with employee jae,pak which doesn’t show up on the native pivot table (feel free to send feedback on this, my time is limited :) )

[image: the jae,pak result difference]

 

 

 

 

So, that’s it. I think there’s some potential here and I would like to share & receive feedback. I talked with several people about the possibility that we were missing some Excel option to fix this, and that’s still open :); if you know one please share it.

 

Richard Lees mentions that this should be fixed in 2010, but I’m still getting the same behavior with the sample; again, maybe we missed something. I also would like to share some final notes:

 

  • End users really shouldn’t have to worry about using Analysis Services only for aggregated values. What they want is a single repository for their information needs, especially if the data scope is the same! I actually think there should be a really, cof, :) Unified Model that covers both types of scenarios.
  • As it seems we can actually use OLAP to do, at least, some more granular queries… why doesn’t Excel offer a real table-style, no-totals, end user authored pivot table? Doesn’t seem difficult to me…

 

Thanks!

 

Rui

 

Update Aug-2010:

Update Sep-2011: (some notes from Chris Webb on the issue, now regarding Denali tabular & DAX scenarios – surprisingly… still a problem!)

Sample Download:

 http://cid-8b11b4c8b8ea33ad.skydrive.live.com/redir.aspx?resid=8B11B4C8B8EA33AD!223&Bpub=SDX.Docs&Bsrc=GetSharingLink