Notes from Cortana Women’s Health Risk #MachineLearning Competition-I

Another Cortana Machine Learning competition is now over – Women’s Health Risk Assessment. Be sure to check the winners and, most importantly, their submissions and overall approaches to the problem.

Must confess I hadn’t had so much fun in a while working with data, stats, code & dataviz. It’s been an amazing learning journey, really effective at consolidating more than a year of almost exclusive deep dive/personal quest into data, stats, R, machine learning (and yes, some math involved…).

Have to say I’m almost equally frustrated at missing the top 10… again: 6th on the public leaderboard, 13th in the final rankings. Far from one of my goals, actually winning the competition. And trust me, I gave it all, to the last mile.
(ah… and yes, the prizes would be a great help for research funds ;) )

(I had got a very lucky 34th place in the previous brain signals competition, but that one was completely out of reach due to the knowledge needed on signal processing/ECoG; gave up midway, really.)

So, I decided to take this one very seriously, as I knew it would be the only way to maximize learning, on the competition data, the ML process and everything data/stats/ML related. I would only stop when there was absolutely nothing more I could do, and so it was.

Another goal: ensure that DevScope ML minds could also maximize learning, have some fun, and hopefully get into the winners, or at least the top 10. Very happy to see that we’re all top 25. :) Huge congrats Bruno & Fábio. (Still couldn’t get the most brilliant mind at DevScope to enter this one though, hope for better luck next time… ;) )

I put so many hours… days, weeks into this one that, if there hadn’t been so much learning involved (and personal enjoyment), I would classify this as my most unproductive period… ever.

So, I’ll publish as much as I possibly can in upcoming blog posts: what worked out, what didn’t, code, tips, insights that still intrigue me, from Azure ML to Power BI, R, caret, xgboost, random forests, ensembles, learning with counts, even some Python. I also kept as many notes as I could in the code files; OneNote was a huge help, and it’s great now to look back and have all this.

Another goal was to pursue both native Azure ML & custom R tracks to compete, but preferably win with Azure ML native modules (as I knew a few top competitors would be all R/Python). I still believe that using AzureML modules, no code required, would be enough for a top 10-20 rank. But I also believe that, with current AzureML capabilities, if you want to win you’re better off using native R/Python from the start.

Also relevant: the top 3 winning submissions use base R, xgboost, all features, and no AzureML native modules (AzureML teams, some hints here…).

Still, it isn’t over yet, as now I can look back, try to understand what I could have done differently, and learn from the top submissions. How much randomness is involved in a competition as close as this one? Was I bitten by excessive variance in my submissions? By multiple comparisons? Can an ensemble of the best techniques and models from the top submissions improve the score we have to date? What more insights can we get from the competition data? From the submissions data?
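On the randomness question, one rough way to get a feeling for it is to simulate it. The sketch below is only an illustration under loud assumptions: the number of competitors, their "true" accuracies and the public/private test set sizes are all made up, not the competition’s real figures, and accuracy is used as a stand-in metric.

```python
import random


def simulate_rank_shift(n_competitors=20, n_public=500, n_private=1500,
                        base_acc=0.86, spread=0.01, seed=42):
    """Return (public_rank, private_rank) pairs for simulated competitors.

    Every parameter is hypothetical; none comes from the real competition.
    """
    rng = random.Random(seed)
    # Each competitor has a "true" accuracy very close to everyone else's.
    true_acc = [base_acc + rng.uniform(-spread, spread)
                for _ in range(n_competitors)]

    def observed(acc, n_rows):
        # Accuracy measured on n_rows test rows, each correct with probability acc.
        return sum(rng.random() < acc for _ in range(n_rows)) / n_rows

    public = [observed(acc, n_public) for acc in true_acc]
    private = [observed(acc, n_private) for acc in true_acc]

    def ranks(scores):
        # Rank 1 is the best (highest) score.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        rank_of = [0] * len(scores)
        for position, competitor in enumerate(order, start=1):
            rank_of[competitor] = position
        return rank_of

    return list(zip(ranks(public), ranks(private)))


if __name__ == "__main__":
    for public_rank, private_rank in sorted(simulate_rank_shift()):
        print(f"public #{public_rank:2d} -> private #{private_rank:2d}")
```

Changing the seed, the spread or the test set sizes gives a sense of how far ranks can move between the two leaderboards when the models are nearly tied.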

Awesome journey, so much still to learn… Hope to share as much as I can, stay tuned!


(just the tip of the iceberg really…)

online PowerBI /EDA Companion









2 Published Data Stories in #PowerBI Data Stories Gallery

A few others to follow; in the meantime, I recently published these. Be sure to check out the whole gallery, there are really amazing & inspiring data stories there.

Cortana Machine Learning Competition – Women’s Health Risk Assessment – Power BI Companion

Small companion for anyone participating in the Cortana Machine Learning competition – Women’s Health Risk Assessment

The Power BI file allows for easy data profiling over the supplied training dataset: overall patterns, data quality, missing data, feature selection. Note: you’ll still have to build the models though!

Although the current version of Power BI Publish to web doesn’t allow  R visuals, they’re highly recommended (ex: density plots & others), and can easily be added to the data/column profile pages.

So what are you waiting for? :) You can be running your first submission in minutes using the online tutorial. Then it’s up to you.


The Story of Dr Ignaz Semmelweis – “Savior of Mothers”

This sample Power BI workbook tells the real story of Dr Ignaz Semmelweis. The year is 1846. Dr Semmelweis begins his 3-year work at the Vienna Hospital maternity clinic and is immediately disturbed by the high incidence of puerperal fever (also known as "childbed fever") in the hospital, which caused extremely high maternal mortality between 1823 and 1850.

Semmelweis was one of the first data “detectives” of our time, using data and statistics to test his hypothesis and theories.

This workbook allows you to explore the data collected by Semmelweis through those years and shows typical exploratory patterns we can use to hint at important information and possible causes in observational time series data.

You can see the video walkthrough here:


A sense of Wonder… reflecting on the #PowerBI amazing journey

A few hours left in the day; it’s been an amazing day for a lot of reasons… Better late than never, they usually say, right?

So… finally dumping a blog post that has been forming in my mind for several months (cof, years…) now… Anyway, today it’s even more relevant than it was then, so…
Warning: there won’t be fancy templates, crazy ideas, strange tools, insights, snippets or new downloads in this post. In fact it will be a completely useless post for anyone reading, I guess; I just need to get it out. I don’t have a clear objective, or a clear picture of what will come up. Also, I won’t be the slightest bit concerned if it’s too short or too long, and I won’t massively edit it (ok, slightly…). Fellow reader, I’m not writing for you today, I’m sorry.

Still, A-post-must-be-written.

So, how to start? How did we get here so fast? Seems only yesterday we were seeing those new Power BI Dashboards screenshots shown at WPC 2014. ~7-8 months after, seeing Power BI GA, by then already an immensely powerful & stable Enterprise/Self-Service/even Personal BI platform; soon after that, our first major project with the technology was online.

I’m not easily surprised, and yet time & time again I was completely baffled by what the Power BI teams kept delivering: beyond the features & needs most requested by users & the community, a lot of others I honestly didn’t know I was missing, and yet can’t live without today.

Such an amazing journey… honestly, not in my wildest dreams would I have said we would get here. And boy, did we need it. Back in 2013-2014, remember, we were getting very uncomfortable, mainly in the medium-sized/enterprise BI scenarios, at least for any BI team that only works with the Microsoft BI stack: a powerful but completely stale OLAP engine, no clear roadmap, failed products ("Planning" anyone?), acquisitions that ended up bringing no value in the long term, a huge number of disparate tools with poor integration, pieces that didn’t fit properly together aside from very superficial demonstrations. (We deliver solutions we hope can add value, not fancy toys that work great in demos.)

There was the nagging feeling that just selling the latest tools was more important than adding value. I’m still a big believer that revenue is a side effect of the value we bring to customers. Add value, then capture some of it so you can add more value, repeat. You can make the other way work temporarily, but it won’t be sustainable in the long term, and even if it were… that’s not the point, right? That was the feeling, honestly. Yes, we had some great tools (heck, Power View on Silverlight was great for its time; we couldn’t have Power BI today without “project crescent”, right? :) ), but still msbi kept lagging dangerously behind new vendors like Tableau, Qlik, etc. in several key areas.

Something was missing; purely speculating here. Maybe that finally came with Nadella, and also with James Phillips’ strong vision, bringing the needed kind of startup mindset to the Power BI vision (reminder, speculating here!).

I don’t have much visibility into Microsoft’s internal Power BI journey. For years now I’ve been inspired by the lean startup and customer development movement and mindset, and from the outside at least, this seems a perfect lean startup case study. Hope one day some of that backstage story can be told, really.

A new or reinforced why: building a platform that could add immense value through the use of data, with none of the typical limiting constraints, an inspiring vision, a fine-tuned learning machine powered by real feedback and real needs, really hearing customers & the (still) strong and devoted msbi community. And, no less amazing, brilliant, flawless execution. Incrementally, steadily, the pieces simply kept fitting together perfectly. How can these teams have such agility without ever even slightly compromising a stable, robust, and coherent platform down the road? I don’t know. It shouldn’t be possible. Am I dreaming?

Mission, vision, execution, brilliantly orchestrated.

To the unsung heroes, the amazing minds that could plan, design, envision such a brilliant architecture where pieces keep fitting magically together release after release… To the teams that then gradually brought it to life. Here’s to you. I know a few, but it’s pretty clear that’s only the tip of the iceberg. :)

I remember that right about Power BI GA, back in ~July 2015, I was reading “Meaningful: The Story of Ideas That Fly” (truly inspiring book btw) and really felt I could be reading the story of Power BI there: the mindset, the why. Equally inspiring. Heck, even the book color reminded me of Power BI :)

From the book description…

“We don’t change the world by starting with our brilliant ideas, our dreams; we change the world by helping others to live their dreams. The story of ideas that fly is the story of the people who embrace them, love them, adopt them, care about them and share them. Successful ideas are the ones that become meaningful to others—helping them to see what’s possible for them.”

You see…? :)

Guess when you stop having customers and start having followers instead, amazing things are bound to happen.

Long post, I warned you, right? Anyway, speaking also of Azure, SQL Server, Microsoft’s new support for open source platforms, R, Hadoop, Spark and so many others: such a powerful data platform we now have in our hands. Huge responsibility for us, data professionals, to take all this and deliver on the promise of immensely valuable data.

It should not surprise many that I feel we’re not there yet (far from it; I’ve never been one to be easily pleased with only gorgeous dashboards, sorry!). I believe we’re still capturing only a tiny part of the potential value hidden in all this data. But those thoughts will have to wait for another day. (Hope it won’t take another year anyway…)

Disclaimer: I don’t work for Microsoft and never have, although I have worked pretty much exclusively with the Microsoft stack since I started my professional life, and I’m also very sure that I will stick with the msft stack to the end of my work life. (That is, aside from my faithful Kindle and those precious things called Moleskines.)

So, anyone still reading should take this into consideration. I may be biased… slightly…

Better late than never, Happy Birthday Power BI! :)


Guest interview in Power BI Developer Blog and a new Send To Power BI app for Excel

Be sure not to miss the latest post on the official Power BI Developer Blog :) The amazing Power BI team asked us to share some thoughts on our latest work & research with the new Power BI APIs and capabilities, and we were deeply honored to contribute. How cool is that? :)

Read the full post here.

And celebrating the day :) we’re also announcing that, even if you can already use SmartCharts to push & sync data to the new Power BI Preview directly from Excel, you now have a dedicated app just for that :) a brand new Send To Power BI Office App for Excel.

Give it a test drive and leave us your feedback or review. :) It will be greatly appreciated!

Download the new app from the App store.


For more info:

DevScope GitHub repository
Send To Power BI Excel App
SmartCharts Excel App
DevScope Office Apps
Rui Quintino Blog
Rui Romano Blog

Awesome day! :)


More Power BI API awesomeness, now on PowerShell & SmartCharts

Amazing week for Data lovers & Business Intelligence minds, Power BI Preview is now *globally* available! Be sure to check out the public announcement  for details. And Chris Webb’s BI blog for additional insights.

By now you know that there’s a new & strong developer story for the new Power BI, and even if at the moment we only have a simple API to test drive, it already lets us think of interesting scenarios, particularly real-time dashboards or the click-to-insights pattern I happen to obsess over so much…

Really, seems there will be no place for data to hide from Power BI :). When Power BI can’t get there… we will bring it to Power BI in a blink of an eye, in real time if needed, all using the brand new APIs.

Don’t have Power BI Designer, or even Excel, and need to quickly load some data to explore? In real time? Check out the DevScope GitHub & use the awesome Power BI API PowerShell modules to load your data into Power BI directly from the command line (for more info & a cool real-time sample don’t miss Rui Romano’s blog post).
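For a feeling of what such a push looks like at the HTTP level, here is a minimal sketch in Python, not the PowerShell modules mentioned above. It only builds the request and does not send it. It assumes the "rows" endpoint shape of the Power BI REST API as documented today under the v1.0 root (the preview API this post refers to may have used a different path); the dataset id, table name and access token are placeholders you would have to obtain yourself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the REST API root as documented today; the preview may have differed.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_push_rows_request(dataset_id, table_name, rows, access_token):
    """Build (but do not send) a POST request that appends rows to a table.

    dataset_id and access_token are placeholders supplied by the caller.
    """
    url = (f"{API_ROOT}/datasets/{urllib.parse.quote(dataset_id)}"
           f"/tables/{urllib.parse.quote(table_name)}/rows")
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_push_rows_request(
        "my-dataset-id", "Sales", [{"Product": "A", "Units": 3}], "TOKEN")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending it would be urllib.request.urlopen(request), with a real token.
```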

Not forgetting we already have NuGet Power BI API modules for .NET, Xamarin, iOS & Android by Sylvain Pontoreau. Or, more recently, new sample web apps for Power BI API scenarios.

Adding that, as of… now :) you can also use SmartCharts to quickly send & live sync data to Power BI directly from Excel. :)

Start with your data as usual or the sample dataset and just click Send To Power BI.


Authenticate & authorize the app for Power BI access, then choose a dataset and a table name:


and your data will be available in your Power BI account, ready for more Power BI awesomeness

(note: if using date/time columns we also add some extra time related columns we happen to use in the app).
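As an illustration of that kind of date/time expansion, here is a small sketch. The extra column names below are hypothetical; the exact columns SmartCharts adds are not listed here.

```python
from datetime import datetime


def add_time_columns(row, column="Date"):
    """Return a copy of row with extra columns derived from row[column].

    The derived column names are made up for this example.
    """
    value = row[column]
    extended = dict(row)
    extended[f"{column}_Year"] = value.year
    extended[f"{column}_Month"] = value.month
    extended[f"{column}_Day"] = value.day
    extended[f"{column}_Hour"] = value.hour
    # ISO weekday: 1 = Monday ... 7 = Sunday (locale independent).
    extended[f"{column}_Weekday"] = value.isoweekday()
    return extended


if __name__ == "__main__":
    sample = {"Date": datetime(2015, 1, 30, 14, 5), "Units": 3}
    print(add_time_columns(sample))
```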


Your data ready to explore in Power BI:




Note that you should be able to do all this in trial mode without any restrictions. Got feedback or requests? Let us know (there’s a feedback form in the app :) we will get back to you).

Get SmartCharts here, or check out our other DevScope Labs Office Apps here.

If you want to know more about the new APIs, some additional recommended reading:

More to come :) stay tuned


How #AzureML and #MachineLearning are helping manage my Inbox–Part I

In this post: how to build an AzureML model to auto-suggest folders/labels for email classification & archiving, and how to consume the AzureML web service directly from Outlook.

Note: the sample AzureML experiments are now published on the brand new AzureML gallery: Email Classifier – Training & Email Classifier – Scoring.

Context: for years I’ve opted for an archive-by-folders strategy to handle my Inbox (not obsessed with inbox zero, but I try to keep it below ~50 mails and use the inbox as a buffer instead, ever since my first contact with David Allen’s GTD years ago). True, once in a while I try the single archive folder approach, but so far I’ve always reverted back to my folders to organize the many parallel projects and threads going on.

For this to work, I think one thing is absolutely mandatory: very fast folder archiving and switching. Although I have tried a few tools for this, I ended up coding some Outlook macros over the years to fit this… “peculiar” way of working and unproductive task switching…

So to archive a mail I just press alt-4, this window pops up, I type my search terms, press enter, and it’s done. Mail archived.


Or to switch context to a specific project or fast search, just press alt-3, search, enter and I’m there with the latest thread mails immediately available.


(that I can admit: I obsess over searching… the “as you type” kind of search :) )

Fast forward to “the present”, #MachineLearning #DataIntelligence #AzureML era. I now have a few thousand “labeled” mails (to use machine learning terms) on several active folders. My macros were needing an improvement….  :)

So I exported my archived mails to a csv (tsv in fact), one mail per row, in the form: | from | to | subject | time offset since a fixed day |
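The export itself was done from Outlook, but the row format is easy to sketch. The Python below is just an illustration of that tsv layout, not the macro I used; the reference day, the column names and the leading folder (label) column are assumptions made for the example.

```python
import csv
import sys
from datetime import datetime

# Assumption: any fixed day works as the origin for the time offset.
REFERENCE_DAY = datetime(2014, 1, 1)

HEADER = ["folder", "from", "to", "subject", "days_since_reference"]


def mail_to_row(folder, sender, recipients, subject, received):
    """Turn one archived mail into a flat row; folder is the label to learn."""
    offset_days = (received - REFERENCE_DAY).days
    # Collapse tabs/newlines so the subject cannot break the tsv layout.
    clean_subject = " ".join(subject.split())
    return [folder, sender, ";".join(recipients), clean_subject, offset_days]


def write_tsv(rows, handle):
    """Write a header plus the rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [mail_to_row("Projects/SmartCharts", "a@example.com",
                        ["b@example.com"], "New features",
                        datetime(2014, 1, 11, 9, 30))]
    write_tsv(rows, sys.stdout)
```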


and let AzureML do the heavy lifting of building my very personal email classifier suggestion web service, putting some multiclass classification models & also text handling /feature hashing AzureML features to work in a more useful scenario than classifying flowers :)
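If feature hashing is new to you, the idea can be sketched in a few lines: every token of the text is hashed into one of a fixed number of buckets, so any subject line becomes a numeric vector of the same length. This is not the AzureML module’s implementation; md5 is swapped in here only to keep the example standard-library Python, and the bucket count is an arbitrary choice.

```python
import hashlib
import re


def hash_features(text, n_buckets=1024):
    """Map free text to a fixed-length vector of token counts per bucket."""
    vector = [0] * n_buckets
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as a stable bucket index.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1
    return vector


if __name__ == "__main__":
    features = hash_features("SmartCharts new features")
    print(len(features), sum(features))
```

Different tokens can collide in the same bucket; that is the price paid for a fixed-size vector with no vocabulary to maintain.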



Sticking with the neural network model for now (default params), I created and adjusted the scoring experiment & published the web service.



Used the VBA code sample from the new cool AzureML generated Excel files,


added a few lines of code to my Outlook macros, based on the AzureML Excel VBA code/macros,


and we get a pretty impressive auto classifier, ready to use & help manage our inbox, suggesting the folders where the message belongs when archiving. (Note: it is also triggered when sending messages, archiving both the sent and the original message if needed.)

So for example,  if I’m disturbing Joana with another annoying mail :) about SmartCharts new features AzureML advises me:


On the other hand, if it were a help request to Romano on some Stream Analytics samples, AzureML would opt for:


Press Enter (for now it’s needed… :) ) and that’s done. How cool is that? :)

(have to say that the accuracy is not 100% obviously , but pretty damn useful already)
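Since the accuracy is not 100%, it helps to show a few candidate folders rather than a single one. A minimal sketch of that suggestion step follows, assuming the web service response has already been parsed into a folder-to-probability mapping (that shape, the folder names and the thresholds are assumptions for the example, not the real service output).

```python
def suggest_folders(scored_probabilities, top_n=3, min_probability=0.05):
    """Return up to top_n (folder, probability) pairs, best first.

    Folders scoring below min_probability are dropped from the suggestions.
    """
    ranked = sorted(scored_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [(folder, p) for folder, p in ranked[:top_n] if p >= min_probability]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    scores = {"Projects/SmartCharts": 0.71, "Projects/StreamAnalytics": 0.18,
              "Admin": 0.08, "Personal": 0.03}
    for folder, probability in suggest_folders(scores):
        print(f"{folder}: {probability:.0%}")
```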

All this running on a free AzureML workspace. Up & running in minutes, from training to online web service

(excluding the time to slightly adapt AzureML VBA code to call the web service & get my mail properly exported)

Training & scoring experiments are available in the AzureML gallery for you to test drive. Of course I only loaded a small sample of my mail in those, so you will have to load your own to really see how it works.

…now, to get close to perfection this would benefit from automated data updates & model retraining, and that will be a case for using the new AzureML training APIs.

But that will have to wait! :) (true, reduced some time processing my inbox, but not that much…!)

btw- hope to get feedback on this scenario, ex: feature engineering tips and model tuning suggestions to improve the model results



SmartCharts for Excel new features: guest post by Joana Barbosa

Sharing the first guest post on my blog (cool!) and proudly introducing my colleague Joana Barbosa (who I hope will soon have her own tech blog ;) ). She will share some of the most recent features of our tiny (but cool) dataviz App for Office, SmartCharts.

Joana works on DevScope’s “multi-faceted” BI Team :) Among several amazing contributions, she’s the main dev working on SmartCharts’ coolest features (namely d3/dataviz). (Adding to that… fixing my bugs, sorry Joana! :) )

PS: feedback on features, or suggestions? Please use the app’s built-in feedback form. It will reach me or Joana in a moment.

Go Joana!

Hi everyone,

So, recently SmartCharts got some new features for you to play with. And, since the 2014 FIFA Ballon d’Or winner was announced this week, I’m going to demonstrate them using a soccer dataset: Champions League – Best Scorer By Year.

First of all, we’ll load our dataset.


Our table has columns for date, player’s name, country and team as well as goals scored. But we also have three special columns: images of the player, team and country, containing links to pictures.

What can we do with this? We can enhance our charts of course!

One of SmartCharts’ new features is binding an image column to another dimension column. For that, we need to name it according to the following rule:

– Consider our column ‘Country’. To bind an image column to this dimension, we must call the new one ‘Country_image’.
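The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention, not SmartCharts’ actual code; in particular, treating the match as case sensitive is an assumption.

```python
IMAGE_SUFFIX = "_image"


def find_image_bindings(columns):
    """Map each dimension column to its '<dimension>_image' column, if present."""
    names = set(columns)
    bindings = {}
    for name in columns:
        if name.endswith(IMAGE_SUFFIX):
            dimension = name[: -len(IMAGE_SUFFIX)]
            # Only bind when the dimension column itself exists in the table.
            if dimension in names:
                bindings[dimension] = name
    return bindings


if __name__ == "__main__":
    columns = ["Date", "Player", "Player_image", "Country",
               "Country_image", "Team", "Team_image", "Goals"]
    print(find_image_bindings(columns))
```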

Applying this to our dataset, we’ll see the following result:


So, our Bubble Chart has become much nicer, with flag images on it. Also, when you hover over a bubble or a slice in the Pie Chart, you’ll see the dimension’s chart over time accompanied by the dimension image we added.

Imagine now that we want to see all players from our country in the set. How could we do it?

Before, we had to alter our dataset in the spreadsheet. Now all you have to do is click on whatever you want to filter by, for instance the Portugal bubble. Then, by changing the dimension to Player, we’ll only see the Portuguese players. That easy!


Of course, we can go even further by selecting and filtering another dimension (e.g. Team: Real Madrid). Thus, we’ll only see the players from Portugal who were also playing for Real Madrid. In this case, the single result is Cristiano Ronaldo.
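Conceptually, each click adds one more condition and the rows must satisfy all of them. A tiny sketch of that chained filtering, written in Python for illustration (it is not the app’s code, and apart from the Cristiano Ronaldo row the sample rows use placeholder names rather than the real dataset):

```python
def filter_rows(rows, **selected):
    """Keep only the rows matching every selected dimension value."""
    return [row for row in rows
            if all(row.get(dimension) == value
                   for dimension, value in selected.items())]


# Placeholder rows for illustration; only the first one comes from the post.
ROWS = [
    {"Player": "Cristiano Ronaldo", "Country": "Portugal", "Team": "Real Madrid"},
    {"Player": "Player A", "Country": "Portugal", "Team": "Team X"},
    {"Player": "Player B", "Country": "Country Y", "Team": "Real Madrid"},
]

if __name__ == "__main__":
    # First click: the Portugal bubble. Second click: Team = Real Madrid.
    portuguese = filter_rows(ROWS, Country="Portugal")
    both = filter_rows(portuguese, Team="Real Madrid")
    print([row["Player"] for row in both])
```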

Oh, if you want to print one of your charts, or take a screenshot to copy to a document but you don’t want that toolbar messing with your nice chart, just click on the small arrow icon in the upper right corner, and the toolbar will be gone, hidden until you re-click the button.


And that’s it! Our brand new features that I hope you try and enjoy.

The dataset is available for you to check here:

So, want to build some cool fresh interactive charts?

Then, Go Smart and make your own SmartCharts!