Notes from Cortana Women’s Health Risk #MachineLearning Competition-I

Another Cortana Machine Learning competition is now over, the Women’s Health Risk Assessment. Be sure to check the winners and, most importantly, their submissions and overall approaches to the problem. (Update 26/10: Power BI companion with final results uploaded & online here.)
image

Must confess I hadn’t had so much fun in a while working with data, stats, code & dataviz. It’s been an amazing learning journey, really effective at consolidating more than a year of almost exclusive deep dive/personal quest into data, stats, R and machine learning (and yes, some math involved…).

Have to say I’m almost equally frustrated at missing the top 10… again: 6th on the public leaderboard, 13th in the final rankings. Far from one of my goals, which was actually winning the competition. And trust me, I gave it my all, to the last mile.
(ah… and yes, the prizes would have been a great help for research funds ;) )

(I had got a very lucky 34th place in the previous brain signals competition, but that one was completely out of reach due to the knowledge needed on signal processing/ECoG; I gave up midway, really.)

So I decided to take this one very seriously, as I knew it would be the only way to maximize learning, on the competition data, the ML process and everything data/stats/ML related. I would only stop when there was absolutely nothing more I could do, and so it was.

Another goal: ensure that the DevScope ML minds could also maximize learning, have some fun, and hopefully get into the winners, or at least the top 10. Very happy to see that we’re all in the top 25. :) Huge congrats Bruno & Fábio. (Still couldn’t get the most brilliant mind at DevScope to enter this one though, hope for better luck next time… ;) )

I put so many hours… days, weeks into this one that, if there hadn’t been so much learning involved (and personal enjoyment), I would classify this as my most unproductive period… ever.

So I’ll publish as much as I possibly can in upcoming blog posts: what worked out, what didn’t, code, tips, insights that still intrigue me, from Azure ML to Power BI, R, caret, xgboost, random forests, ensembles, learning with counts, even some Python. I also noted as much as I could in the code files; OneNote was a huge help, and it’s great to look back now and have all this.

Another goal was to pursue both the native Azure ML and custom R tracks to compete, but preferably win with Azure ML native modules (as I knew a few top competitors would be all R/Python). I still believe that using AzureML modules, no code required, would be enough for a top 10-20 rank. But I also believe that, with current AzureML capabilities, if you want to win you’re better off using native R/Python from the start.

That’s relevant, as is the fact that the top 3 winning submissions use base R, xgboost and all features, with no AzureML native modules (AzureML team, some hints here…).

Still, it isn’t over yet, as now I can look back, try to understand what I could have done differently, and learn from the top submissions. How much randomness is involved in a competition as close as this one? Was I bitten by excessive variance in my submissions? By multiple comparisons? Can an ensemble of the best techniques and models from the top submissions improve on the best score to date? What more insights can we get from the competition data? From the submissions data?

Awesome journey, so much still to learn… hope to share as much as I can, stay tuned!

Rui

(just the tip of the iceberg really…)

online PowerBI /EDA Companion


Guest interview in Power BI Developer Blog and a new Send To Power BI app for Excel

Be sure not to miss the latest post on the official Power BI Developer Blog :) The amazing Power BI team asked us to share some thoughts on our latest work & research with the new Power BI APIs and capabilities, and we were deeply honored to contribute. How cool is that? :)

Read the full post here.

And to celebrate the day :) we are also announcing that, even though you can already use SmartCharts to push & sync data to the new Power BI Preview directly from Excel, you now have a dedicated app just for that :) a brand new Send To Power BI Office App for Excel.

Give it a test drive and leave us your feedback or review. :) It will be greatly appreciated!

Download the new app from the App store.

clip_image002

For more info:

DevScope GitHub repository
Send To Power BI Excel App
SmartCharts Excel App
DevScope Office Apps
Rui Quintino Blog
Rui Romano Blog
www.devscope.net

Awesome day! :)

Rui

More Power BI API awesomeness, now on PowerShell & SmartCharts

Amazing week for data lovers & Business Intelligence minds: Power BI Preview is now *globally* available! Be sure to check out the public announcement for details, and Chris Webb’s BI blog for additional insights.

By now you know that there’s a new & strong developer story for the new Power BI, and even if at the moment we only have a simple API to test drive, it already lets us think of interesting scenarios, particularly real-time dashboards or the click-to-insights pattern I happen to obsess about so much…

Really, it seems there will be no place for data to hide from Power BI :). When Power BI can’t get there… we will bring the data to Power BI in the blink of an eye, in real time if needed, all using the brand new APIs.

Don’t have Power BI Designer, or even Excel, and need to quickly load some data to explore? In real time? Check out the DevScope GitHub & use the awesome Power BI API PowerShell modules to load your data into Power BI directly from the command line (for more info & a cool real-time sample don’t miss Rui Romano’s blog post).

Not forgetting that we already have NuGet Power BI API modules for .NET, Xamarin, iOS & Android by Sylvain Pontoreau, or, more recently, new sample web apps for Power BI API scenarios.

Adding to that, as of… now :) you can also use SmartCharts to quickly send & live sync data to Power BI directly from Excel. :)

Start with your data as usual or the sample dataset and just click Send To Power BI.

image

Authenticate & authorize the app for Power BI access, then choose a dataset and a table name:

image

and your data will be available in your Power BI account, ready for more Power BI awesomeness

(note: if you use date/time columns, we also add some extra time-related columns we happen to use in the app).

image

Your data ready to explore in Power BI:

image

image

image

Note that you should be able to do all this in trial mode without any restrictions. Got feedback or requests? Let us know (there’s a feedback form in the app :) we will get back to you).

Get SmartCharts here, or check out our other DevScope Labs Office Apps here.

If you want to know more about the new APIs, some additional recommended reading:

More to come :) stay tuned

Rui

SmartCharts for Excel new features: guest post by Joana Barbosa

Sharing the first guest post on my blog (cool!) and proudly introducing my colleague Joana Barbosa (who I hope will soon have her own tech blog ;) ). She will share some of the most recent features of our tiny (but cool) dataviz App for Office, SmartCharts.

Joana works on the DevScope “multi-faceted” BI team :) and, among several amazing contributions, she’s the main dev working on SmartCharts’ coolest features (namely d3/dataviz). (Adding to that… fixing my bugs, sorry Joana! :) )

PS: feedback on features or suggestions? Please use the app’s built-in feedback form. It will reach me or Joana in a moment.

Go Joana!

Hi everyone,

So, recently SmartCharts got some new features for you to play with. And, since the 2014 FIFA Ballon d’Or winner was announced this week, I’m going to demonstrate them using a soccer dataset: Champions League – Best Scorer By Year.

First of all, we’ll load our dataset.

clip_image002

Our table has columns for date, player’s name, country and team, as well as goals scored. But we also have three special columns, with images of the player, team and country, containing links to pictures.

What can we do with this? We can enhance our charts of course!

One of SmartCharts’ new features is binding an image column to another dimension column. For that, we need to name it according to the following rule:

– Consider our column ‘Country’. To bind an image column to this dimension, we must call the new one ‘Country_image’.
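
To make the naming rule concrete, here is a small Python sketch of how such a binding could be detected. This is only an illustration of the rule and not how SmartCharts is implemented; the function name and the sample column names are assumptions.

```python
def find_image_bindings(columns, suffix="_image"):
    """Map each dimension column to its image column, following the naming rule."""
    names = set(columns)
    bindings = {}
    for name in columns:
        if name.endswith(suffix):
            dimension = name[: -len(suffix)]
            # Only bind when the matching dimension column actually exists.
            if dimension in names:
                bindings[dimension] = name
    return bindings


# Hypothetical column names, loosely based on the dataset described above.
columns = ["Date", "Player", "Player_image", "Country", "Country_image",
           "Team", "Team_image", "Goals"]
bindings = find_image_bindings(columns)
```

A column with the suffix but no matching dimension (say ‘Stadium_image’ without ‘Stadium’) is simply left unbound in this sketch.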

Applying this to our dataset, we’ll see the following result:

clip_image004

So, our Bubble Chart has become much nicer, with flag images on it. Also, when you hover over a bubble or a slice in the Pie Chart, you’ll see the dimension’s chart over time accompanied by the dimension image we added.

Imagine now that we want to see all players from our country in the set. How could we do it?

Before, we had to alter our dataset in the spreadsheet. Now all you have to do is click on whatever you want to filter by, for instance, the Portugal bubble. Then, by changing the dimension to Player, we’ll only see the Portuguese players. That easy!

clip_image006

Of course, we can go even further by selecting and filtering another dimension (e.g. Team: Real Madrid). Thus, we’ll only see the players from Portugal who were also playing for Real Madrid. In this case, the single result is Cristiano Ronaldo.

Oh, if you want to print one of your charts, or take a screenshot to copy to a document but you don’t want that toolbar messing with your nice chart, just click on the small arrow icon in the upper right corner, and the toolbar will be gone, hidden until you re-click the button.

clip_image008

And that’s it! These are our brand new features, which I hope you try and enjoy.

The dataset is available for you to check here:

http://1drv.ms/1CcHLe4

So, want to build some cool fresh interactive charts?

Then, Go Smart and make your own SmartCharts!

Joana

https://twitter.com/JoanaVBarbosa

AzureML Web Service Scoring with Excel and Power Query

While drafting some samples & labs for this Saturday’s SQLPort AzureML workshop, I decided to dig a little bit and try using some regular AzureML scoring web services (request/response) directly in Excel, using Power Query. :)

So… can it be done? Sure! Advisable? Far, far from it… but for a quick sample/test scenario it might be handy. :)

Saying it again: what I’ll show here is far from a recommended way to use AzureML web services in Excel. It will be slow, probably insecure if you need to distribute your file, and I’m sure there will be better ways to do it with the batch web service mode, soon I hope (without the need to go through the Azure Marketplace).

Side notes:

    How it works

    To the point: you can use the Power Query Web.Contents function to invoke the AzureML web service API, as long as you are careful about: 1) using POST and not GET, 2) setting the authorization header expected by AzureML, 3) setting the content type to JSON, 4) building the JSON request as shown in the API help page.
    Power Query will handle the JSON response automatically. If we encapsulate the web request in a function, we can then use it to build our scored table, passing the score input column values and getting the score in return.
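
For readers who want to see the same four points outside Power Query, here is a minimal Python sketch of the request. It only builds the request and does not send it. The URL is a placeholder, the function name is made up for illustration, and the JSON layout simply mirrors the request used in the Power Query function in this post, so check your own service’s API help page for the exact shape.

```python
import json
import urllib.request


def build_score_request(service_uri, service_key, features):
    """Build (but do not send) a request/response scoring call."""
    body = {
        "Id": "score00001",
        "Instance": {
            # Values are sent as strings, as in the Power Query function.
            "FeatureVector": {name: str(value) for name, value in features.items()},
            "GlobalParameters": {},
        },
    }
    return urllib.request.Request(
        service_uri,
        data=json.dumps(body).encode("utf-8"),  # 4) JSON request body
        method="POST",  # 1) POST, not GET
        headers={
            "Authorization": "Bearer " + service_key,  # 2) authorization header
            "Content-Type": "application/json; charset=utf-8",  # 3) JSON content type
        },
    )


request = build_score_request(
    "https://example.invalid/score",  # placeholder, use your own endpoint
    "yourservicekey",  # placeholder, use your own key
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
)
```

Sending it would then be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response, which is the part Power Query does for you.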

Step by Step-From AzureML to Excel Power Query Scoring

You’ll need an AzureML scoring web service. I used a simple two-class logistic regression model to classify the well-known Fisher Iris dataset (with only two classes, as the original has three):

image

The AzureML Experiment & Web Service Experiment; it can’t get much simpler than this.

Training Experiment

image

Web Service Experiment

image

Publishing the trained model as a Web Service

AzureML-Iris Classifier Web Service API Help page

image

This JSON request will be needed for Power Query, as will the service key:

image

In Power Query you’ll then create a function like this (text version below):

image

image

image

Invoking the Power Query scoring function

Then you just import the original table into Power Query and add a custom column invoking the new scoring function

image

Note: to test this it is advisable to filter the top 1 or 2 rows first, as it is not lightning fast… :)

image

The result should now be visible; a little more Power Query expanding and renaming and that’s it!

image

image

To check that I’m getting the expected score, I just used another small tool (more on this later)

image

Power Query results

image

Scoring the whole 100 rows iris dataset

The problem with using the regular Power Query Web.Contents function is that you will start receiving API usage threshold errors from the AzureML API very, very soon. So I used another trick, inspired (again) by this post by Chris Webb (amazing tip by Alejandro Lopez-Lago), to add some delay between requests.

Just remove the top filter and refresh the query. Wait between 30 and 60 seconds (this will depend on the kind of endpoint you are using; AzureML is changing this whole “staging/production web service model” right now from what I’m seeing).
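
The delay idea is not specific to Power Query. As a rough Python sketch in the same spirit (the function name, the default delay and the stand-in scorer are all assumptions for illustration, not part of the sample), you can pause between consecutive calls so the requests are spread out:

```python
import time


def score_with_delay(rows, score, delay_seconds=0.05, sleep=time.sleep):
    """Call score(row) for each row, pausing between consecutive calls."""
    results = []
    for index, row in enumerate(rows):
        if index > 0:
            # Spread the requests out to stay under the usage threshold.
            sleep(delay_seconds)
        results.append(score(row))
    return results


# Stand-in scorer for the demo; a real one would call the web service.
demo = score_with_delay([1, 2, 3], lambda row: row * 2, delay_seconds=0.01)
```

The right delay depends on the endpoint’s limits, so treat the value as something to tune rather than a recommendation.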

image

Note: I think the latest Power Query update now shows the row count progress, neat! Perfect for this :)

image

image

image

Some misses there :) guess the model needs some additional training!

image

Download the sample here (note: you will need your own AzureML endpoint & key; you do know that AzureML has a free tier, right? :) ). The full Power Query score function for this model is below.

Take care!

Rui

let
    // Scores one iris row against the AzureML request/response web service
    Score = (sepal_length as any, sepal_width as any, petal_length as any, petal_width as any) =>
    let
        serviceUri = "*yourserviceuri_note:ends with score*",
        serviceKey = "yourservicekey",
        // JSON request body, built as shown in the web service API help page.
        // Text.From allows numeric columns to be concatenated; the "en-US" culture keeps "." as decimal separator.
        PostContents = "
{
  ""Id"": ""score00001"",
  ""Instance"": {
    ""FeatureVector"": {
      ""sepal_length"": """ & Text.From(sepal_length, "en-US") & """,
      ""sepal_width"": """ & Text.From(sepal_width, "en-US") & """,
      ""petal_length"": """ & Text.From(petal_length, "en-US") & """,
      ""petal_width"": """ & Text.From(petal_width, "en-US") & """
    },
    ""GlobalParameters"": {}
  }
}
",
        // Wait function: loops until the given number of seconds has passed, then runs the action
        Wait = (seconds as number, action as function) =>
            if (List.Count(
                List.Generate(
                    () => DateTimeZone.LocalNow() + #duration(0, 0, 0, seconds),
                    (x) => DateTimeZone.LocalNow() < x,
                    (x) => x)
                ) = 0)
            then null else action(),
        // Wait 0.05 seconds, then POST the request (passing Content makes Web.Contents use POST)
        Source = Wait(0.05, () => Web.Contents(serviceUri,
            [Content = Text.ToBinary(PostContents),
             Headers = [Authorization = "Bearer " & serviceKey, #"Content-Type" = "application/json; charset=utf-8"]])),
        #"Imported JSON" = Json.Document(Source),
        #"Table from List" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Transposed Table" = Table.Transpose(#"Table from List")
    in
        #"Transposed Table"
in
    Score

#SqlPort upcoming CTP hands on workshop–Cloud #MachineLearning & #DataMining using #AzureML 29 Nov Lisbon

It’s already this Saturday, the 29th, that I’ll be hosting the first AzureML hands-on workshop in Portugal. It has been a pleasant surprise to see that it’s filling up rather quickly, much of it thanks to the restless SqlPort community team (particularly Niko & Paulo’s amazing work… allowing me to fully focus on the workshop itself… more pressure for sure… no excuses!).

I’m rather curious to see how it goes, as there will surely be an interesting mix of profiles attending this workshop. And I hope that will contribute to making this a very interactive, personally demanding but enjoyable day!

The event is organized by SQLPort, and all the resulting revenue will go to community support & upcoming events.

Check out the workshop agenda below or in the event page here . The event will take place at Microsoft Portugal.

Also disclosing that every participant will be granted access to my very own personal notes & best highlights on some of my favorite data books… (got to love the Amazon/Kindle platform) all of which I’m reviewing right now ;) .

  • Data Science for Business: What you need to know about data mining and data-analytic thinking
  • The Signal and the Noise: Why So Many Predictions Fail-but Some Don't
  • Naked Statistics: Stripping the Dread from the Data

See you there!

Rui

CTP – Community Technology Preview 2014 Edition – Cloud Machine Learning & Data Mining using AzureML

SQLPort

Saturday, 29 November 2014, from 09:00 to 17:30 (WET)

Lisbon, Portugal

 

Abstract

Full one day hands-on workshop including introductory course to Machine Learning process, use cases, models & algorithms and also how to model, test & deploy Machine Learning solutions with the new Microsoft AzureML cloud service.


Objectives

  • Understand Machine Learning & Data Science Fundamentals
  • Model & Deploy Machine Learning Solutions with AzureML, ML Studio & AzureML Web Services


Attendee’s pre-requisites/profile

  • Data Analysts & Enthusiasts, Business Intelligence Professionals, Data Scientists & Machine Learning


Contents

  • Introduction to Data Science and Machine Learning
  • Machine Learning Use Cases, Models & Workflow
  • Introduction to Azure Machine Learning & ML Studio
    • Data Cleansing & Transformation
    • Data Analysis & Visualization
    • Machine Learning Models: Classification, Regression, Clustering, & Text Mining
    • Model Tuning, Scoring & Evaluation
    • Using R in Azure ML
    • Publishing Models as Web Services
    • Using Excel with Azure ML
  • Books & Resources for AzureML, Data Science & Machine Learning

#SQLSaturday Portugal is coming to Porto! tomorrow!

SQL Saturday is coming to Porto for the first time and it’s already tomorrow! :) One entire day of free sessions for Data Enthusiasts and Professionals!

The SqlSatPortugal team has put in everything they’ve got (as usual!), so no doubt it will be an awesome event. Great speakers & great sessions, as you would expect. Topics range from the usual suspects (SQL Server, Analytics, Business Intelligence) to Mobile Development and PowerShell tips, not forgetting the most recent data technologies from Msft like AzureML!

Some close friends and teammates will also be sharing their knowledge: Sandro Pereira (DevScope Integration Architect & BizTalk MVP), José António Silva (R&D Director at DevScope) and Rui Marinho (former teammate at DevScope, now pursuing his mobile passion at Xamarin). It’s inspiring to work every day among such a group of amazing minds.

Check out the schedule . Not registered yet? Hurry up!

To celebrate SqlSaturday we are also finishing some updates to our Office Apps, and will be covering these and other data/dataviz related tips in our DevScope sponsored session. Excel will never be the same!

(Expect some goodies at the end, including invitations to upcoming workshops on Azure Machine Learning I’ll be doing soon.)

Just a teaser ;) … see you all tomorrow!

Take Care,

Rui
