As more businesses look to become data-driven with their digital analytics, the tools that help a company work with data are becoming more popular. You can’t shift to a data-driven culture while still relying on the tools you were using before, such as Excel.
Collecting, processing, modelling and visualising data so you can make decisions quickly means you need tools that provide scale, reproducibility, flexibility and quick iteration on output.
For many people this means using R and its supporting infrastructure, but other solutions include Python, Tableau, Alteryx, Adobe Data Workbench, BigQuery and others.
If you don’t know R, or know it and would like to see how it can apply to digital analytics, then check out www.dartistics.com, which is based on course material for an R Workshop Tim Wilson and I first ran in September 2016.
EARL stands for “Effective Applications of the R Language”, and it is a key event for seeing how users are putting R into action beyond theoretical research within universities.
Attending this year were several web-focused companies such as eBay, The Telegraph, MoneySuperMarket and MEC (a WPP media agency), who showed how R had helped them specifically with digital analytics, and I left with lots of ideas.
A brief review of some of the event is below; overall I thought it was one of the most constructive conferences I’ve ever attended – many thanks to the EARL organisers.
I was fortunate to have my talk accepted for the second day, about a data infrastructure we have been experimenting with to deploy R models straight into client websites via a tag manager. I had a chance to deploy a proof of concept before the conference, so I could report back some exciting initial results: we sped up a website without touching its server, purely by deploying a prediction model through Google Tag Manager via R.
My presentation is here if you would like to hear more – Super-charging websites using a real-time R API
There were lots more great talks but I have stuck with the ones I saw that were most applicable to digital analytics.
First up was Joe from RStudio, a company that dedicates 70% of its engineering effort to free open-source R packages, which has helped R become a thriving ecosystem. He spoke about some of the many new features they had been working on in 2016, with a focus on R Notebooks, a way of doing interactive shared analysis in the cloud, and sparklyr, the exciting interface to the new big-data wunderkind, Spark.
Spark allows distributed data analysis across a cluster of machines, so if you are collecting a lot of data in, say, display publishing, it can cope with as much data as you throw at it, up to the petabyte scale. This was later demonstrated by Vincent Warmerdam of GoDataDriven, who noted that on datasets larger than 2GB Spark started to pay off speed-wise.
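To give a flavour of what that looks like from the R side, here is a minimal sparklyr sketch (assuming a local Spark installation and the `sparklyr` and `dplyr` packages – in practice you would point `spark_connect()` at your cluster’s master URL and read data from HDFS, S3 or similar rather than copying a demo dataset over):

```r
library(sparklyr)
library(dplyr)

# Connect to Spark - "local" for testing, your cluster master in production
sc <- spark_connect(master = "local")

# Copy a demo dataset into Spark; in real use you would read from
# distributed storage instead
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")

# dplyr verbs are translated to Spark SQL and executed on the cluster
result <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()  # bring only the small aggregated result back into R

spark_disconnect(sc)
```

The key point is that the heavy lifting happens in Spark, and only the aggregated result is collected back into the R session.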
We also heard from David Smith, who works with Microsoft R. Microsoft has recently invested heavily in putting R into its products, including running native R in Microsoft SQL Server 2016, and has a very impressive offering to help bring R firmly into enterprise business. David ran through some of these solutions, which are well worth checking out, and a later talk by Adam Rich of Beazley demonstrated them in detail, inside a corporation where using R on corporate infrastructure may have been impossible before Microsoft got involved.
The Telegraph’s data scientist @magdapiatkowsa talked about the problems and solutions involved in predicting user journeys within the online publisher.
She noted that the problems solved by Netflix and Amazon don’t necessarily apply to a news website due to differing user expectations, and she walked through how The Telegraph improved its engagement metrics, starting with simple algorithms such as “most viewed” and progressing up the chain of sophistication to tailored, context-tree, path-dependent recommendations.
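The “most viewed” starting point really is as simple as it sounds – a minimal sketch in R, assuming a hypothetical pageview log (the column names here are made up for illustration):

```r
library(dplyr)

# Hypothetical pageview log - in practice this would come from your
# web analytics data
pageviews <- data.frame(
  article = c("politics-1", "sport-1", "politics-1",
              "culture-1", "politics-1", "sport-1")
)

# "Most viewed" recommender: just count views and sort descending
most_viewed <- pageviews %>%
  count(article, sort = TRUE)

head(most_viewed, 3)  # top articles to recommend to everyone
```

Starting with something this simple gives a baseline to beat before investing in personalised models.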
Next up, Jack Wright from MoneySuperMarket presented a web application, with R as its backend, that “wowed” their CEO – building data products is key to MoneySuperMarket’s strategy. Jack showed how they built on the NAME web development stack rather than the more traditional route of using Shiny, which stressed the point that R sometimes needs to work within the infrastructure your business already has, rather than trying to force non-R types into yours.
Juan Hernandez gave an impressive talk about how his company is automating media mix modelling for clients via their R app “The Beast”, combining automated Google Trends data with client marketing costs.
He went through how to pick out events, long-term trends and the actual campaign effect on business revenue, which helps inform which campaigns are working and which are not. This should help gauge the effectiveness of, say, display campaigns before they end, to see whether they should have more or less budget.
Maciej Bledowski showed us how far ahead eBay is by demoing the models they had created for A/B testing TV campaigns. At eBay’s scale they could measure TV campaigns per country region, but a true A/B test can’t be done because those regions have different trends and demographics – what to do?
Well, in eBay’s case it was to create a synthetic control trend from a combination of other regions, meaning they could more accurately measure the effectiveness of different TV creatives on their web traffic.
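The synthetic control idea can be illustrated in a few lines of R. This is a simplified sketch with simulated data – not eBay’s actual model, which would use properly constrained weights and far more data – but the principle is the same: fit weights so a combination of untreated regions tracks the treated region before the campaign, then read the post-campaign gap over that synthetic baseline as the estimated uplift.

```r
set.seed(42)
weeks <- 1:40
treat_start <- 30  # TV campaign starts at week 30

# Simulated weekly visits for three untreated "control" regions
control <- data.frame(
  c1 = 100 + 0.5 * weeks + rnorm(40, sd = 2),
  c2 = 200 - 0.3 * weeks + rnorm(40, sd = 2),
  c3 = 150 + 0.1 * weeks + rnorm(40, sd = 2)
)

# Treated region follows a mix of the controls, plus a +20 campaign lift
treated <- 0.6 * control$c1 + 0.4 * control$c3 + rnorm(40, sd = 2)
treated[weeks >= treat_start] <- treated[weeks >= treat_start] + 20

# Fit weights on the pre-campaign period only
pre <- weeks < treat_start
fit <- lm(treated[pre] ~ ., data = control[pre, ])

# Synthetic control: what the treated region would have done without TV
synthetic <- predict(fit, newdata = control)
lift <- mean(treated[!pre] - synthetic[!pre])
round(lift)  # close to the simulated +20 lift
```

The estimated lift recovers the simulated campaign effect even though the treated region never had a true like-for-like control.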
Timothy Wong works at Centrica, the company behind British Gas, and he demonstrated a nice application of recurrent neural nets for predicting when gas boilers will fail. Neural nets can make very accurate predictions, but they do not easily provide confidence intervals or an idea of which factors drove a prediction – if you want explanation, something like linear regression may be more useful, but if you want pure predictive power, neural nets are the way to go.
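That trade-off is easy to see in miniature. Here is a sketch in R, using the built-in mtcars data and the `nnet` package purely as an illustration (not Timothy’s actual boiler model):

```r
# Linear regression: effect sizes and confidence intervals come for free
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)     # one interpretable coefficient per factor
confint(fit)  # 95% confidence intervals around each effect

# Neural net: potentially better predictions on complex data, but a
# black box - no per-factor effects or confidence intervals
library(nnet)
set.seed(1)
nn <- nnet(mpg ~ wt + hp, data = mtcars, size = 3,
           linout = TRUE, trace = FALSE)
predict(nn, newdata = mtcars[1:3, ])  # predictions only
```

With `lm()` you can report “each extra 1,000 lbs of weight costs roughly this many mpg, give or take this much”; with the neural net you only get the prediction itself.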