Notes from Cortana Women's Health Risk #MachineLearning Competition - I

Another Cortana Machine Learning competition is now over, the Women's Health Risk Assessment. Be sure to check the winners and, most importantly, their submissions and overall approaches to the problem. (Update 26/10: Power BI companion with final results uploaded & online here)

Must confess I hadn't had so much fun in a while working with data, stats, code & dataviz. It's been an amazing learning journey, really effective at consolidating more than a year of almost exclusive deep dive/personal quest in data, stats, R, machine learning (and yes, some math involved…).

Have to say I'm almost equally frustrated at missing the top 10… again: 6th on the public leaderboard, 13th in the final rankings. Far from one of my goals, actually winning the competition. And trust me, I gave it all, to the last mile.
(ah… and yes, prizes would be a great help for research funds ;) )

(I had gotten a very lucky 34th place in the previous brain signals competition, but that one was completely out of reach due to the knowledge needed on signal processing/ECoG; gave up midway, really.)

So, I decided to take this one very seriously, as I knew it would be the only way to maximize learning, both on the competition data and the ML process and on everything data/stats/ML related. I would only stop when there was absolutely nothing more I could do, and so it was.

Other goal: ensure that the DevScope ML minds could also maximize learning, have some fun, and hopefully get into the winners, or at least the top 10. Very happy to see that we're all top 25. :) Huge congrats Bruno & Fábio. (Still couldn't get the most brilliant mind at DevScope to enter this one though, hope for better luck next time… ;) )

I put so many hours… days, weeks into this one that, if there hadn't been so much learning involved (and personal enjoyment), I would classify this as my most unproductive period… ever.

So, I'll publish as much as I possibly can in upcoming blog posts: what worked out, what didn't, code, tips, insights that still intrigue me, from Azure ML to Power BI, R, caret, xgboost, random forests, ensembles, learning with counts, even some Python. I also noted as much as I could in the code files. OneNote was a huge help, and it's great now to look back and have all this.

Another goal was to pursue both the native Azure ML and custom R tracks to compete, but preferably win with Azure ML native modules (as I knew a few top competitors would be all R/Python). I still believe that using Azure ML modules, no code required, would be enough for a top 10-20 rank. But I also believe that, with current Azure ML capabilities, if you want to win you're better off using native R/Python from the start.

That's relevant. Also relevant: the top 3 winners' submissions use base R, xgboost, all features, and no Azure ML native modules (Azure ML teams, some hints here…).

Still, it isn't over yet, as now I can look back, try to understand what I could have done differently, and learn from the top submissions. How much randomness is involved in a competition as close as this one? Was I bitten by excessive variance in my submissions? By multiple comparisons? Can an ensemble of the best techniques and models from the top submissions improve the score we have to date? What more insights can we get from the competition data? From the submissions data?
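To get a feel for the randomness question, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration (the number of teams, the public/private test set sizes, the 0.86 base accuracy and the ±0.01 spread); none of it comes from the actual competition data. It only shows how much ranks can move between a public and a private leaderboard when the models are nearly equally good.

```python
import random

def simulate_shakeup(n_teams=50, n_public=1000, n_private=3000,
                     base_acc=0.86, spread=0.01, seed=0):
    """Return one (public_rank, private_rank) pair per team, 1 = best."""
    rng = random.Random(seed)
    # Hypothetical "true" accuracies, tightly clustered as in a close competition.
    true_acc = [base_acc + rng.uniform(-spread, spread) for _ in range(n_teams)]

    def observed(acc, n):
        # Share of n test rows predicted correctly, each with probability acc.
        return sum(rng.random() < acc for _ in range(n)) / n

    public = [observed(a, n_public) for a in true_acc]
    private = [observed(a, n_private) for a in true_acc]

    def ranks(scores):
        # Highest score gets rank 1; ties are broken by team index.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        result = [0] * len(scores)
        for position, team in enumerate(order, start=1):
            result[team] = position
        return result

    return list(zip(ranks(public), ranks(private)))

pairs = simulate_shakeup()
mean_move = sum(abs(pub - prv) for pub, prv in pairs) / len(pairs)
print(f"mean absolute rank change, public -> private: {mean_move:.1f}")
```

Changing `spread` or the test set sizes shows how quickly the shake-up grows as the field gets tighter or the evaluation sets get smaller.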

Awesome journey, so much still to learn… hope to share as much as I can, stay tuned!


(just the tip of the iceberg really…)

Online Power BI/EDA companion

2 Published Data Stories in #PowerBI Data Stories Gallery

A few others to follow; in the meantime, I recently published these. Be sure to check out the whole gallery, really amazing & inspiring data stories there.

Cortana Machine Learning Competition – Women's Health Risk Assessment – Power BI Companion

Small companion for anyone participating in the Cortana Machine Learning competition – Women’s Health Risk Assessment

The Power BI file allows for easy data profiling of the supplied training dataset: overall patterns, data quality, missing data, feature selection. Note: you'll still have to build the models though!
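For readers who prefer code, the same kind of column profile can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Power BI file does internally. The `profile_columns` helper, the list of values treated as missing, and the toy rows with their column names are all made up here and are not the competition schema.

```python
from collections import Counter

def profile_columns(rows, missing=("", None, "NA")):
    """Per column: share of missing values, distinct count, most common value."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in missing]
        counts = Counter(present)
        report[col] = {
            "missing_share": 1 - len(present) / len(values) if values else 0.0,
            "distinct": len(counts),
            "top": counts.most_common(1)[0][0] if counts else None,
        }
    return report

# Toy rows with hypothetical column names, NOT the competition schema.
rows = [
    {"age_group": "20-24", "region": "A", "label": "1"},
    {"age_group": "", "region": "A", "label": "0"},
    {"age_group": "25-29", "region": "NA", "label": "1"},
    {"age_group": "20-24", "region": "B", "label": "1"},
]
for name, stats in profile_columns(rows).items():
    print(name, stats)
```

Columns with a high missing share or a single distinct value are the first candidates to look at before feature selection.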

Although the current version of Power BI Publish to web doesn't allow R visuals, they're highly recommended (e.g. density plots & others) and can easily be added to the data/column profile pages.

So what are you waiting for? :) You can be running your first submission in minutes using the online tutorial. Then it's up to you.


The Story of Dr Ignaz Semmelweis – “Savior of Mothers”

This sample Power BI workbook tells the real story of Dr Ignaz Semmelweis. The year is 1846. Dr Semmelweis begins his 3-year work at the Vienna Hospital maternity clinic and is immediately disturbed by the high incidence of puerperal fever (also known as "childbed fever") at the hospital, which caused extremely high maternal mortality between 1823 and 1850.

Semmelweis was one of the first data “detectives” of our time, using data and statistics to test his hypotheses and theories.

This workbook allows you to explore the data collected by Semmelweis through those years and shows typical exploratory patterns we can use to hint at important information and possible causes in observational time series data.

You can see the video walkthrough here:


A sense of Wonder… reflecting on the #PowerBI amazing journey

A few hours left in the day; it's been an amazing day for a lot of reasons… better late than never, as they usually say, right?

So… finally dumping a blog post that has been forming in my mind for several months (cof, years…) now… Anyway, today it is even more relevant than then, so…
Warning: there won't be fancy templates, crazy ideas, strange tools, insights, snippets or new downloads in this post. In fact, it will be a completely useless post for anyone reading, I guess; I just need to get it out. I don't have a clear objective, or a clear picture of what will come up. Also, I won't be the slightest bit concerned if it's too short or too long, and I won't massively edit it (ok, slightly…). Fellow reader, I'm not writing for you today, I'm sorry.

Still, A-post-must-be-written.

So, how to start? How did we get here so fast? Seems only yesterday we were seeing these new Power BI Dashboards screenshots shown at WPC 2014. ~7-8 months after seeing Power BI GA, then already an immensely powerful & stable Enterprise/Self-Service/even Personal BI platform, soon after that our first major project with the technology is online.

I'm not easily surprised, and yet time & time again I was completely baffled by what the Power BI teams kept delivering: beyond the features & needs most requested by users & the community, a lot of others I honestly didn't know I was missing, and yet can't live without today.

Such an amazing journey… honestly, not in my wildest dreams would I have said we would get here. And boy did we need it. Back in 2013-2014, remember, we were getting very uncomfortable, mainly in the medium-sized/enterprise BI scenarios, at least for any BI team that only works with the Microsoft BI stack: a powerful but completely stale OLAP engine, no clear roadmap, failed products ("Planning" anyone?), acquisitions that ended up bringing no value in the long term, a huge number of disparate tools with poor integration, pieces that didn't fit properly together aside from very superficial demonstrations. (We deliver solutions we hope can add value, not fancy toys that work great in demos.)

The nagging feeling that just selling the latest tools was more important than adding value. I'm still a big believer that revenue is a side effect of the value we bring to customers. Add value, then capture some of it so you can add more value, repeat. You can make the other way work temporarily, but it won't be sustainable in the long term, and even if it were… that's not the point, right? That was the feeling, honestly. Yes, we had some great tools (heck, Power View on Silverlight was great for its time, and we couldn't have Power BI today without “project crescent”, right? :) ), but still msbi kept lagging dangerously behind newer vendors like Tableau, Qlik, etc. in several key areas.

Something was missing, purely speculating here. Maybe that finally came with Nadella, and also with James Phillips' strong vision, bringing the needed kind of startup mindset to the Power BI vision (reminder, speculating here!).

I don't have much visibility into Microsoft's internal Power BI journey. For years now I've been inspired by the lean startup and customer development movement and mindset, and from the outside at least, it seems a perfect lean startup case study. Hope one day some of that backstage story can be told, really.

A new or reinforced why: building a platform that could add immense value through the use of data, with none of the typical limiting constraints, an inspiring vision, a fine-tuned learning machine, powered by real feedback, real needs, really hearing customers & msbi's (still) strong and devoted community. And, no less amazing, brilliant, flawless execution. Incrementally, steadily, the pieces simply kept fitting together perfectly. How can these teams have such agility without ever slightly compromising a stable, robust, and coherent platform down the road? I don't know. It shouldn't be possible. Am I dreaming?

Mission, vision, execution, brilliantly orchestrated.

To the unsung heroes, the amazing minds that could plan, design, envision such a brilliant architecture where pieces keep fitting magically together release after release… To the teams that then gradually brought it to life. Here's to you. I know a few, but it's pretty clear it's only the tip of the iceberg. :)

I remember that right around Power BI GA, back in ~July 2015, I was reading “Meaningful: The Story of Ideas That Fly” (truly inspiring book btw) and really felt I could be reading the story of Power BI there: the mindset, the why. Equally inspiring. Heck, even the book color reminded me of Power BI :)

From the book description…

“We don’t change the world by starting with our brilliant ideas, our dreams; we change the world by helping others to live their dreams. The story of ideas that fly is the story of the people who embrace them, love them, adopt them, care about them and share them. Successful ideas are the ones that become meaningful to others, helping them to see what’s possible for them.”

You see…? :)

Guess when you stop having customers and start having followers instead, amazing things are bound to happen.

Long post, I warned you, right? Anyway, and also speaking of Azure, SQL Server, Microsoft's new support for open source platforms, R, Hadoop, Spark and so many others: such a powerful data platform we now have in our hands. Huge responsibility for us, data professionals, to take all this and deliver on the promise of immensely valuable data.

It should not surprise many that I feel we're not there yet (far from it; I've never been one to be easily pleased with just gorgeous dashboards, sorry!). I believe we're still capturing only a tiny part of the value hidden in all this data. But those thoughts will have to wait for another day. (Hope it won't take another year anyway…)

Disclaimer: I don't work for Microsoft and never have, although I've worked pretty much exclusively with the Microsoft stack since I started my professional life, and I'm also very sure that I will stick with the msft stack to the end of my working life. (That is, aside from my faithful Kindle and those precious things called Moleskines.)

So, anyone still reading should take this into consideration. I may be biased… slightly…

Better late than never, Happy Birthday Power BI! :)
