Over the weekend there has been (more) in-fighting among the analytics community on Twitter about who has the best, or most accurate, expected goals model. Despite xG being nowhere near a widespread term in general football circles, Twitter once again devolved into a bunfight about models, data sources and who was right and who was wrong.
To address some of the points made, I’ve written the below about StrataData and how it differs from other providers.
Our primary purpose as a company is sports trading. We use expert insight and smart technology to better predict the outcomes of sporting events. Part of this is our Data Collection arm: we have a team of around 60 analysts spread across the globe who watch and collect bespoke data on every one of the games from the 24 competitions we cover.
The key word here is bespoke. We don’t collect data in the same way as Opta. They are a well established company with their own way of doing things, and it would be utterly pointless (and probably not cost effective) to replicate what they collect. That doesn’t mean everything they are doing is ideal for what we want.
The crux of our data collection focuses on Chances. Note this is NOT shots, but Chances. There are several differences between what we record and what is in Opta data.
I’ll give examples of some of these below, but let’s start by seeing what some of the fuss was about.
Here is the @Caley_graphics Shot Chart and xG for Tottenham vs West Brom – using Opta data
Here is the @11tegen11 Race Chart for the same game, again using Opta data – not the exact same amounts of xG but close
Now here is the Race Chart by @GoalCharts – who use StrataData. Quite a big difference, primarily in Tottenham’s score.
I’ve also included below a Shot Graphic produced by @GoalCharts so it can be compared to the one by @Caley_graphics
Of note is that Tottenham have 4 large circles within the 6 Yard Box on the @GoalCharts graphic but only 2 on the @Caley_graphics shot map
The difference is that at Stratagem we capture Dangerous Moments – these are instances where a chance is created that on a repeatable basis would result in a shot/goal – however, the end result of these is not a shot. These are not recorded by Opta (as no Shot is taken) which they are well within their rights to do – that is how they have defined their collection process.
Examples of the 2 Dangerous Moments for Tottenham are shown below
In the first, Kane is inches from connecting with Son’s low cross; he almost has an open goal if he can get anything on it. The 2nd is Alli throwing himself at a bouncing shot from a half-cleared corner. Again, he’s very close to connecting, with only the keeper to beat from inside the 6 yard box.
Should these ‘Chances’ be included? As the primary function of expected goals is to look at repeatability, we feel the clear answer is yes. On another day Kane or Alli at the very least connect with the ball, which changes a 0.00 chance rating into something much, much higher given their proximity to goal.
There are a few other factors in what we do and don’t record. We only record the best chance in each attacking move, because only one chance is ABLE to be scored during that move. As this is the overriding factor in what we look at, a goalmouth scramble with 2 or 3 shots can still only have a maximum value equal to that of its highest quality chance.
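To make the "best chance per attacking move" idea concrete, here is a minimal Python sketch. It is only an illustration of the principle, not our collection process: the move IDs, the chance ratings and the function names are all made up for the example.

```python
# Hypothetical records: (attacking move ID, chance rating).
# The IDs and ratings are illustrative values, not real StrataData.
chances = [
    ("move_1", 0.10),  # first effort in a goalmouth scramble
    ("move_1", 0.35),  # rebound, the best chance in the move
    ("move_1", 0.20),  # second rebound
    ("move_2", 0.05),  # a separate attacking move
]


def best_chance_per_move(records):
    """Keep only the highest rated chance within each attacking move."""
    best = {}
    for move_id, rating in records:
        if move_id not in best or rating > best[move_id]:
            best[move_id] = rating
    return best


def total_chance_value(records):
    """Sum the best chance from each move rather than every effort."""
    return sum(best_chance_per_move(records).values())


best = best_chance_per_move(chances)
total = total_chance_value(chances)
```

In this made-up scramble the three efforts in `move_1` count once, at the value of the best of them, instead of being added together.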
We also don’t include shots taken from outside the box that travel less than 5 yards before being blocked and don’t enter the area. The repeatable chance of these being scored is minimal (well below 1%), and so we took the decision not to record them.
These 3 factors combined are why you will always see Stratagem data refer to Chances and not Shots. Shots are what Opta collect, and they do a very good job of this. It also means that our total number of chances will differ from Opta’s total number of shots.
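The exclusion rule above can be read as a simple filter. The sketch below is one interpretation of that rule in Python, assuming all of the conditions must hold together; the `Shot` fields and the `is_excluded` name are hypothetical and do not reflect how the data is actually stored.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical fields chosen for this example only.
    taken_outside_box: bool
    blocked: bool
    yards_travelled: float
    entered_area: bool


def is_excluded(shot, max_yards=5.0):
    """True when a shot meets every condition of the exclusion rule:
    taken outside the box, blocked after travelling less than
    max_yards, and never entering the area."""
    return (
        shot.taken_outside_box
        and shot.blocked
        and shot.yards_travelled < max_yards
        and not shot.entered_area
    )


# A long-range effort charged down almost immediately: not recorded.
charged_down = Shot(True, True, 2.0, False)
# A long-range effort that gets through to the area: still recorded.
reaches_area = Shot(True, False, 25.0, True)
```

Under this reading, a shot only drops out when every condition is met, so a long-range effort that reaches the area is still recorded.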
On top of this we collect several other aspects of each chance rating – one of the most notable being Defensive Pressure and Number of Players between the Ball and the goal – work has been done on this by several bloggers who have access to our data and the results have been overwhelmingly positive.
As well as being used several times in articles about Burnley – with 2 great examples below
it has been used in a more holistic way with 2 excellent articles below
The main thing to highlight is that Stratagem did not set out with the goal of collecting data for use by the analytics community, clubs etc. The primary aim continues to be collecting data for the company to use to provide insights and help predict outcomes. However, I’ve been an amateur blogger myself. I know the limitations of freely available data, and anything that helps expand the breadth of analysis available can only be a good thing. The reasons for opening up the data weren’t wholly altruistic: StrataBet, Stratagem and StrataData are growing all the time as a recognisable brand, due in large part to some of the excellent articles written by many talented analysts using the data.
It’s my hope that both data providers continue to flourish and the analytics community on Twitter embrace both ideas and sets of data.
At least until the next crisis…
If you are interested in seeing more about StrataData please send me a DM with your email address to @donceno – data is open to all (with stipulations) to aid the blogging community.