The Value of a Ball-Playing Centre-Back

When Manchester United completed the signing of Harry Maguire from Leicester City for a record fee of €87M, people predictably said he was not worth the money: “he’s not worth more than Van Dijk”, “Manchester United have been robbed”, and so on. I responded by arguing that a player is worth whatever a team is willing to pay for him, and how a club values that player (be it the “wrong” way or the “right” way) is its own business and way of operating. But I want to use this example to explore centre-backs, and how their growing importance, coupled with generally inflated transfer prices, has affected their market values.

Over the last half-decade or so, increasing importance has been placed on centre-backs who can “play out of the back” and who are “good with their feet”. What has caused this? One reason is that more teams have started pressing higher up the pitch and counter-pressing to regain possession in the final third. To beat that press, teams need players in every area of the pitch to be comfortable on the ball, including defenders who can string passes together from the back. This is one of several reasons for the rising demand for these versatile centre-backs. Soccer is a skills business, and as in any skills business, the value of a skill set should rise as it becomes more clearly defined and rarer. That has been the case with centre-backs as well. Demand for centre-backs who are not only defenders but also contributors in possession has driven this inflation of their prices. Another symptom is that we seem to find fewer top-grade centre-backs now, because our criteria for judging them have changed. Quite simply, the standards have shifted.

The following visual shows the increase in xGBuildup contributed by Premier League teams’ centre-backs over the past five seasons. xGBuildup is defined as the “total xG of every possession the player is involved in without key passes and shots”.

Each point represents a single team, and how much xGBuildup their centre-backs contributed – xG data from understat.com.
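To make that definition concrete, here is a minimal sketch of how a team's centre-back xGBuildup could be totalled, assuming possession-level data where each possession ending in a shot lists the players involved, the shooter, the key passer and the shot's xG. The `Possession` record and its fields are hypothetical, not understat's actual data format, and reading "without key passes and shots" as "the shooter and key passer get no buildup credit for that possession" is my interpretation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Possession:
    """One possession ending in a shot (hypothetical record layout)."""
    team: str
    xg: float                      # xG of the shot that ended the possession
    players: set[str]              # everyone involved in the possession
    shooter: str
    key_passer: str | None = None  # None if the shot was unassisted


def xg_buildup(possessions: list[Possession]) -> dict[str, float]:
    """Credit each involved player with the possession's xG, skipping the
    shooter and the key passer (one reading of understat's definition)."""
    totals: dict[str, float] = defaultdict(float)
    for p in possessions:
        for player in p.players:
            if player in (p.shooter, p.key_passer):
                continue  # shots and key passes are excluded
            totals[player] += p.xg
    return dict(totals)


def team_cb_buildup(possessions: list[Possession], centre_backs: set[str]) -> float:
    """Sum xGBuildup over a team's centre-backs: one point in the plot."""
    per_player = xg_buildup(possessions)
    return sum(per_player.get(cb, 0.0) for cb in centre_backs)
```

Running `team_cb_buildup` once per team-season gives one value per team, which is the quantity each point in the plot represents.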

So this is pretty clear: centre-backs are now more involved in starting, or being part of, meaningful possessions than before, and this trend looks set to continue. Now let’s compare this with centre-back transfer prices. These are rising rapidly as well, reflecting the value being attached to these players. Of course, we should keep in mind the general inflation of transfer prices over the last half-decade, but centre-back valuations have gone up drastically even allowing for this inflation.

Top 10 transfer prices of centre-backs for each of the last six seasons – transfer data from transfermarkt.com.
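One way to check the claim that centre-back fees have outpaced general inflation is to deflate each fee by a market-wide index for its season and see whether the deflated centre-back figures still rise. This is a minimal sketch, assuming per-season lists of fees; using the median of all transfer fees in a season as the index is my assumption, not something the figure above does.

```python
from statistics import mean, median


def deflate_fees(cb_fees, all_fees, base_season):
    """Express centre-back fees relative to the overall transfer market.

    cb_fees and all_fees map season -> list of fees (same currency).
    Each season's index is the median of all fees that season, scaled
    so that base_season has an index of 1.0.
    """
    base = median(all_fees[base_season])
    adjusted = {}
    for season, fees in cb_fees.items():
        index = median(all_fees[season]) / base
        adjusted[season] = [fee / index for fee in fees]
    return adjusted


def growth_vs_market(adjusted, early, late):
    """Ratio of mean deflated CB fee in `late` vs `early`.

    A value above 1.0 means CB fees rose faster than the market overall.
    """
    return mean(adjusted[late]) / mean(adjusted[early])
```

Fed with the top 10 centre-back fees per season and a full list of that season's transfers, a ratio well above 1.0 would support the point that the rise is not just general inflation.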

Furthermore, if we categorize the top 25 centre-back transfers of all time by season, we can see a heavy bias towards the past few years, again signalling the increase in value and demand for players in this position.

Transfer data from transfermarkt.com.
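The season breakdown above amounts to a sort-and-count. A minimal sketch, assuming the transfers are available as `(player, season, fee)` tuples; the function name and tuple layout are hypothetical.

```python
from collections import Counter


def season_counts(transfers, top_n=25):
    """Count how many of the top_n most expensive centre-back transfers
    fall in each season.

    transfers: iterable of (player, season, fee) tuples.
    """
    top = sorted(transfers, key=lambda t: t[2], reverse=True)[:top_n]
    return Counter(season for _, season, _ in top)
```

Calling `.most_common()` on the result lists the seasons with the most top-25 transfers first, which is the bias the chart shows.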

Taking into account how rare a good ball-playing centre-back is in England’s first division, how the inflation of transfer prices has created a market of free spending, and the fact that Manchester United fielded 23 different defensive lines last season, United’s decision to spend a record fee on Harry Maguire is at least somewhat justified. They were willing to splash out on a player who can solve their defensive instability, is still relatively young (26), is English, and can confidently carry the ball into midfield, which is something Ole Gunnar Solskjær looks to implement in his team. I tweeted a short thread about his intent and centre-back options going into the new season, and how he finds himself with three very comfortable ball-playing centre-backs.


This entire post is starting to lean towards a defense of Manchester United’s purchase of Maguire, but I think what stems from that idea is a bigger discussion about how the game is changing. The game has changed the demands on centre-backs faster than centre-backs have evolved to meet them, causing this misalignment between our valuation of players and their actual transfer prices. However, I don’t think this trend will continue for too long, as teams and coaches start training defenders differently to adapt to this change, creating a supply of the “young ball-playing centre-backs” who are in such heavy demand.

Feedback welcome, as always.

Twitter: https://twitter.com/yatin_kapur 

Tracking Time Spent Leading vs. Trailing

Not too long ago, during the World Cup, I created a visualisation exploring how long each team spent in front and how long it spent behind during the World Cup group stages. The purpose was to see which teams just scraped through and which were convincing throughout their three games.

I was collecting this data from football-lineups.com, and I noticed that the script I use to fetch it could also collect data for the Premier League. So I went ahead and collected all of the Premier League data the site had. Initially I was storing this locally, but once I saw that there were almost 20 seasons’ worth of data, I realized it would be better practice to use a cloud database, which is why I migrated the SQL database to Google Cloud Platform.

I spent a few days mulling over what to do with this data, and then I remembered this tweet by Tom:

This really sparked my imagination and showed me how I could summarise, over a whole season, when a team was leading and when it was trailing at every minute of the game. I made a simple Flask app and used D3.js for a visualisation very similar to the one in the above tweet. The following is what I’ve put together; from left to right, the chart runs from minute 0 to 90.

The method for determining when a team was leading and trailing was pretty straightforward. I first parsed the goal times for every game and inserted them into the scores table. Then, in a separate script, I fetched these times for a given game and ran through minutes 0 to 90, adding an entry for the home score and away score at each minute. The obvious issue is a goal scored in stoppage time, i.e. in the (90+x)th minute. I dealt with this by adding x+1 more entries for that game, which means that not all games have score entries exactly in the [0, 90] interval. This creates a slight inconsistency in the data, but such cases are few and far between, so it’s not a major problem. Finally, I wrote a SQL query that returns ‘1’ when the team was in front, ‘2’ when the scores were level, and ‘3’ when it was behind. I send this data via a POST request to the HTML template, and then use D3.js to append SVG rectangles and colour every minute.
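The two steps described above (expanding goal times into a per-minute score, then coding each minute as leading, level or trailing) can be sketched in Python rather than SQL. This is a minimal sketch, assuming goals arrive as `(minute, side)` pairs with a stoppage-time goal written as its running minute (e.g. 93). The function names are hypothetical, and extending the timeline to the minute of the latest goal is a simplification of the extra-entries approach, not a copy of the original script.

```python
def score_by_minute(goals, full_time=90):
    """Return [(home, away), ...] for every minute 0..end.

    goals: list of (minute, side) with side "home" or "away". A goal is
    counted from the minute it was scored. The timeline runs to
    full_time, or further if a goal came in stoppage time.
    """
    end = max([full_time] + [minute for minute, _ in goals])
    ordered = sorted(goals)
    home = away = 0
    i = 0
    timeline = []
    for minute in range(end + 1):
        # apply every goal scored up to and including this minute
        while i < len(ordered) and ordered[i][0] <= minute:
            if ordered[i][1] == "home":
                home += 1
            else:
                away += 1
            i += 1
        timeline.append((home, away))
    return timeline


def game_state(timeline, side):
    """Code each minute as 1 (leading), 2 (level) or 3 (trailing),
    mirroring the 1/2/3 values the SQL query returns."""
    states = []
    for home, away in timeline:
        diff = home - away if side == "home" else away - home
        states.append(1 if diff > 0 else 2 if diff == 0 else 3)
    return states
```

A game with a home goal in minute 10, an away goal in minute 55 and an away winner in minute 93 produces 94 entries, so the home side's row ends in state 3.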

The use case for this tool depends on the questions being asked. Teams can use it to analyse when an opponent is prone to losing a lead and when it tends to score during a game. Game-state effects can also be measured: once a team understands what usually follows after the opposition has taken or lost the lead, it can adjust its tactics and strategy accordingly. Managers and coaching staff can simulate these situations in training and observe how players respond to each of the three game states, with different amounts of time remaining on the clock.
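For the question of when a team is prone to losing its lead, one option is to scan each game's per-minute state codes for transitions out of state 1. A minimal sketch, assuming one list of 1/2/3 codes per game like those the SQL query returns; `lead_losses`, `lead_loss_profile` and the 15-minute bucket size are hypothetical choices.

```python
from collections import Counter


def lead_losses(states):
    """Return the minutes at which a lead was lost (1 -> 2 or 1 -> 3)."""
    return [
        minute
        for minute in range(1, len(states))
        if states[minute - 1] == 1 and states[minute] != 1
    ]


def lead_loss_profile(games, bucket=15):
    """Count lead losses per time bucket (0-14, 15-29, ...) across games."""
    counts = Counter()
    for states in games:
        for minute in lead_losses(states):
            counts[minute // bucket * bucket] += 1
    return counts
```

The same transition scan, pointed at 3 -> 2 or 3 -> 1, would show when a team tends to come back from behind.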

Long-term effects can also be measured by looking at how long a team is able to sustain its pattern of winning or losing, and how current form affects future games. Managers can act on all of this to adapt to circumstances accordingly.
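How long a team sustains a pattern of results can be measured as the longest unbroken run of each result. A minimal sketch, assuming a season's results as a string such as `"WWDLWWW"`; the helper name is hypothetical.

```python
from itertools import groupby


def longest_runs(results):
    """Longest unbroken run for each result in a sequence like "WWDLWWW".

    Returns a dict mapping each result to its longest run length.
    """
    best = {}
    for result, group in groupby(results):
        length = sum(1 for _ in group)
        best[result] = max(best.get(result, 0), length)
    return best
```

For `"WWDLWWW"` this gives a longest winning run of 3 and runs of 1 for the draw and the defeat.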

I deployed the app on Google App Engine last week, before the start of the season, and it seems to be working alright with the new data collected for Premier League Matchday 1 so far.

I hope to eventually develop a model that can help predict a team’s final points tally from its average time spent leading and average time spent trailing per 90 minutes. I’ve got 440 data points to work with (22 seasons × 20 league positions). A clustering or regression algorithm could help, but I want to learn about more parameters I can add to a model of game state. Perhaps something to do with how many times a team loses a lead or retakes the lead might be helpful. Feedback always welcome!

GitHub repo: github.com/yatin-kapur/leadingtrailing

How Possession Adjusted Shot Frequency Impacts Shots on Target

The idea behind this much, much overdue post is to explore how time-to-shoot (TTS) impacts shot quality. Due to a lack of xG data, I’ll use shots on target for now. I’ll also compare these with the number of goals scored to understand how goals relate to TTS and shots on target: are teams who shoot more frequently converting more of their shots on target than teams who shoot less frequently? That’s roughly the premise of this analysis.

More importantly, I’ll explore how playing at home or away affects this: we’ll see how away teams generally perform, and what approach they are forced, or choose, to take when attacking the home team.

One may ask how this differs from simply looking at shots and shots on target, since TTS is, after all, a function of the number of shots. The difference is that TTS is possession-adjusted, so a lower TTS doesn’t necessarily mean a higher number of shots on target; it also captures how well a team used the possession it had. So if a team has a low TTS and a high number of shots on target, it has made very good use of its possession, as opposed to a team with a lower number of shots on target.

Before even looking at any data, I’m inclined to say that shots on target plotted against TTS will take the shape of a very rough bell curve.

My thinking is that if a team takes an unusually short time to shoot, it’s likely because it is shooting from outside the box or taking very low-xG shots, so not many of them would end up on target. As TTS rises into a range that is higher but still probably normal, I’d expect most teams to be getting wiser, picking their opportunities rather than shooting wildly like the teams with very low TTS numbers. Beyond some point, though, diminishing returns will set in, and teams that are too ponderous on the ball will fail to register many shots on target simply because they take too long to shoot.

All of my data comes from https://football-lineups.com: I web-scraped all the fixtures from the 2016/17 season and collected the shots, shots on target, and possession stats to calculate the TTS for each team in each fixture.

The calculation for TTS was fairly simple:

TTS = team possession × (90 × 60) / shots

where team possession is a decimal between 0.00 and 1.00, and 90 × 60 = 5,400 is the number of seconds in a 90-minute game. I took data from the top 5 leagues for every game in the 2016/17 season.
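The formula translates directly into code. A minimal sketch, assuming each fixture gives the home side's possession as a decimal plus both sides' shot counts; the function names are hypothetical, and returning `None` for a side with zero shots is my assumption, since TTS is undefined in that case. For example, a team with 50% possession and 10 shots has a TTS of 0.5 × 5,400 / 10 = 270 seconds per shot.

```python
SECONDS_PER_GAME = 90 * 60  # 5,400 seconds, ignoring stoppage time


def time_to_shoot(possession, shots):
    """Seconds of possession per shot: possession * 5400 / shots.

    possession is a decimal between 0.00 and 1.00. Returns None when a
    team took no shots, since TTS is undefined in that case.
    """
    if shots == 0:
        return None
    return possession * SECONDS_PER_GAME / shots


def fixture_tts(home_possession, home_shots, away_shots):
    """TTS for both sides of one fixture; away possession is 1 - home."""
    return (
        time_to_shoot(home_possession, home_shots),
        time_to_shoot(1 - home_possession, away_shots),
    )
```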

Plotting this data gave the following:

Home team data for SOT plotted against TTS.
Home team data for Goals plotted against TTS.

We observe a very basic trend: generally, teams with more shots on target took less time to shoot. The home team regularly gets a higher number of shots on target while shooting at a good rate. It’s also worth noting that although the negative relationship is fairly clear, it stabilises slightly at 9+ shots on target. This could suggest that teams shooting every 100-200 seconds who reach those numbers have better-quality shooters who can hit the target more often. By contrast, teams near 1-4 shots on target that still shoot every 100-200 seconds are making good use of their time on the ball, but because TTS = possession × 5,400 / shots, the same TTS with fewer shots means they simply had less of the ball. To reach higher numbers of shots on target, they likely need players who can keep the ball better rather than better finishers.
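Rearranging the TTS formula shows why equal TTS values can hide very different shot volumes: shots = possession × 5,400 / TTS. A small worked example, with possession figures chosen for illustration rather than taken from the data: at 150 seconds per shot, a side with 60% possession takes about 21.6 shots, while a side with 30% takes about 10.8.

```python
def implied_shots(possession, tts):
    """Rearranged TTS formula: shots = possession * 5400 / TTS."""
    return possession * 90 * 60 / tts


# Two hypothetical teams, both shooting once every 150 seconds on the ball.
for possession in (0.6, 0.3):
    print(possession, round(implied_shots(possession, 150), 1))
```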

As for goals, the plot takes a similar shape: the less time a team takes to shoot, the more goals it tends to score. I’m interested in understanding why the points spread out from the 4-6 goals mark. My assumption is that some teams rely on taking a lot of shots to score that many goals, while others take relatively few shots but create higher-percentage chances.
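One way to test that assumption is to split the team-fixtures in the 4-6 goal band by conversion rate (goals per shot): a high-volume side scores from many low-conversion shots, while a selective side needs fewer. A minimal sketch, assuming per-fixture records with `team`, `goals` and `shots` keys (hypothetical field names); the 0.15 threshold is an arbitrary placeholder, not a figure from the data.

```python
def split_by_conversion(rows, low=4, high=6, threshold=0.15):
    """Split team-fixtures scoring between low and high goals (inclusive)
    into high-volume and high-conversion groups.

    rows: list of dicts with "team", "goals" and "shots" keys.
    Returns (volume, selective) lists of team names.
    """
    volume, selective = [], []
    for row in rows:
        if not low <= row["goals"] <= high or row["shots"] == 0:
            continue  # outside the goal band, or no shots to divide by
        rate = row["goals"] / row["shots"]
        if rate >= threshold:
            selective.append(row["team"])
        else:
            volume.append(row["team"])
    return volume, selective
```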

It would be good to see whether xG changes the shape of these graphs, since we could then clearly separate high- and low-quality chances and shots.


Away team data on SOT plotted against TTS.

The away team data is fascinating. It looks similar, but the away team is rarely able to get 10+ shots on target. The away team’s approach is also probably quite different: it doesn’t always look to shoot as frequently, preferring to take better-quality shots. This is reflected in a weaker negative trend that is closer to flat, although shots on target still decline as TTS rises. I’d also assume that away teams have less possession, which affects their time to shoot; it seems to be higher on average than the home team’s. So we can roughly tell that the away team has less possession and is more selective about when to shoot. I think this is a typical trait of a team with less of the ball: it wants to make the most of its possession and not rush into chances, in contrast to the home teams, who posted lower TTS numbers.
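The assumption that away sides have less of the ball and a higher TTS on average can be checked directly rather than read off the plots. A minimal sketch, assuming per-fixture records with the hypothetical field names shown and the same 5,400-second TTS formula; sides with zero shots are left out of the TTS averages.

```python
from statistics import mean


def venue_averages(fixtures):
    """Average possession and TTS for home and away sides.

    fixtures: list of dicts with "home_possession", "home_shots" and
    "away_shots"; away possession is taken as 1 - home possession.
    Raises StatisticsError if no side in a group took a shot.
    """
    home_poss, home_tts, away_tts = [], [], []
    for f in fixtures:
        hp = f["home_possession"]
        home_poss.append(hp)
        if f["home_shots"]:
            home_tts.append(hp * 5400 / f["home_shots"])
        if f["away_shots"]:
            away_tts.append((1 - hp) * 5400 / f["away_shots"])
    return {
        "home_possession": mean(home_poss),
        "away_possession": 1 - mean(home_poss),
        "home_tts": mean(home_tts),
        "away_tts": mean(away_tts),
    }
```

If the assumption holds across the 2016/17 fixtures, `away_possession` should come out below 0.5 and `away_tts` above `home_tts`.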

The goals data for the away team is very similar to its SOT data as well, as you’d expect after looking at the home team data.

Away team data on Goals plotted against TTS.

After writing this I felt slightly dissatisfied, because the analysis wasn’t thorough enough and didn’t point to anything as concrete as I was hoping for when I started out. Nevertheless, I’ll try to do more, and better, from here on out. I expected the graphs to come out as a bell curve, but that wasn’t the case; I should have thought about how TTS is a function of shots taken and how that would shape the graphs. Clearly, more shots means a lower TTS, and more shots also means more shots on target, but I hope this helped in understanding how TTS takes into account the possession statistics, which can dictate the way a team selects its shots.
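That built-in relationship can be made explicit. Writing possession as p, shots as S, and shots on target as S multiplied by an accuracy rate a = SOT / S (a quantity introduced here for illustration, not one measured above), then substituting the TTS formula:

```latex
\mathrm{TTS} = \frac{5400\,p}{S}
\quad\Longrightarrow\quad
S = \frac{5400\,p}{\mathrm{TTS}},
\qquad
\mathrm{SOT} = a\,S = \frac{5400\,a\,p}{\mathrm{TTS}}
```

For fixed accuracy and possession, shots on target fall off as 1/TTS, a hyperbola rather than a bell curve, so a negative relationship is baked into the plots before shooting quality enters the picture.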

