Trying To Find Similar Players (Again)

I’ve written on here before (and since deleted it) about trying to find similar players. The last method involved clustering passes and working out the percentage of each player’s passes that fell within each cluster. I could then measure the distance between players’ distributions of percentages, and also the distance between their outputs (things like xG, number of passes etc). The problem was that I didn’t normalise these output numbers, so figures on larger scales would have skewed the results quite a bit.
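To show why the missing normalisation matters, here is a minimal sketch rather than the code I actually used. The player names, the two metrics and all the numbers are made up, and standardising to z-scores before taking a Euclidean distance is just one common choice I’m assuming for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical per-player output figures (invented numbers, illustration only).
players = {
    "Player A": {"xg": 0.10, "passes": 62.0},
    "Player B": {"xg": 0.45, "passes": 60.0},
    "Player C": {"xg": 0.12, "passes": 55.0},
}
metrics = ["xg", "passes"]


def euclidean(a, b):
    """Straight-line distance between two players over the chosen metrics."""
    return sqrt(sum((a[m] - b[m]) ** 2 for m in metrics))


def standardise(table):
    """Rescale every metric to mean 0 and standard deviation 1 (z-scores)."""
    out = {name: {} for name in table}
    for m in metrics:
        values = [row[m] for row in table.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, row in table.items():
            # A metric with no spread carries no information, so map it to 0.
            out[name][m] = (row[m] - mu) / sigma if sigma else 0.0
    return out


# Raw distances: pass counts are in the tens, xG is well below 1,
# so the pass column decides almost everything.
raw_ab = euclidean(players["Player A"], players["Player B"])
raw_ac = euclidean(players["Player A"], players["Player C"])

# Standardised distances: both metrics now contribute on the same scale.
z = standardise(players)
z_ab = euclidean(z["Player A"], z["Player B"])
z_ac = euclidean(z["Player A"], z["Player C"])

print(f"raw:          A-B {raw_ab:.2f}  A-C {raw_ac:.2f}")
print(f"standardised: A-B {z_ab:.2f}  A-C {z_ac:.2f}")
```

On the raw numbers Player B looks far closer to Player A than Player C does, purely because their pass counts are close, even though B’s xG is completely different. After standardising, that xG gap actually counts for something.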

Now, I don’t have event data, so I won’t be able to do the clustering, but last year I downloaded a load of the player data on WyScout. It won’t be as good as having the event data and there’s been lots of talk on Twitter about some questionable WyScout figures, but I thought what the hell. I’ve also been reading ‘Football Hackers’ by Christoph Biermann, where Sven Mislintat mentioned seeing Lucas Torreira pop up as a similar player to N’Golo Kanté, which made me want to try and do something like this again.

I have tried the new method with FAWSL data, but with only twelve teams the results aren’t too exciting. It may be something I come back to, but for now, I’ll just use the player data from WyScout.

The data is from the 2018/19 season rather than this season, but it covers players in 67 leagues. I’ve also required players to have played more than 1350 minutes (the equivalent of 15 full 90-minute matches) to try and weed out numbers that come from small samples. I want to focus on players in smaller leagues, as a sort of hybrid with the piece I did about players in other leagues. That piece can be seen on the Wayback Machine here, which I normally wouldn’t link, but it mentioned Erling Haaland before he joined Salzburg (though the transfer was already arranged), which I want to gloat about.

The method uses 78 metrics, which is basically everything on WyScout that isn’t a total or a set-piece figure. I’ll probably fine-tune certain filters though, depending on what aspects of the player’s game seem the most ‘them’. Anyway, after rambling on for a while, let’s get into some players.
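For anyone who wants the rough shape of it first, here is a hedged sketch of the general idea, not the exact method. The rows, player names and metric columns (`interceptions`, `prog_passes`, `dribbles`) are invented stand-ins for the 78 WyScout metrics, and the z-score plus Euclidean distance ranking is an assumption on my part for the example. The `metrics` argument is one way of reading "fine-tuning the filters": pass in only the columns that feel most like the player.

```python
from math import sqrt
from statistics import mean, pstdev

MIN_MINUTES = 1350  # players must have played more than this to be included

# Hypothetical rows: every name and value here is made up for illustration.
rows = [
    {"player": "Target", "minutes": 2700, "interceptions": 7.1, "prog_passes": 6.0, "dribbles": 1.2},
    {"player": "Alike", "minutes": 1900, "interceptions": 6.8, "prog_passes": 5.7, "dribbles": 1.0},
    {"player": "Winger", "minutes": 2500, "interceptions": 2.0, "prog_passes": 3.1, "dribbles": 6.5},
    {"player": "Passer", "minutes": 2200, "interceptions": 4.0, "prog_passes": 9.5, "dribbles": 1.5},
    {"player": "Cameo", "minutes": 400, "interceptions": 7.0, "prog_passes": 6.0, "dribbles": 1.2},
]


def most_similar(rows, target, metrics, top_n=3):
    """Rank the players closest to `target` over the chosen metrics."""
    # Drop small samples first so they do not distort the means and spreads.
    pool = [r for r in rows if r["minutes"] > MIN_MINUTES]

    # Standardise each metric so no single column dominates the distance.
    z = {r["player"]: [] for r in pool}
    for m in metrics:
        values = [r[m] for r in pool]
        mu, sigma = mean(values), pstdev(values)
        for r in pool:
            z[r["player"]].append((r[m] - mu) / sigma if sigma else 0.0)

    # Euclidean distance from the target to everyone else, smallest first.
    base = z[target]  # raises KeyError if the target fails the minutes filter
    ranked = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(base, vec))), name)
        for name, vec in z.items()
        if name != target
    )
    return [name for _, name in ranked[:top_n]]


all_metrics = ["interceptions", "prog_passes", "dribbles"]
print(most_similar(rows, "Target", all_metrics))
print(most_similar(rows, "Target", ["dribbles"], top_n=1))
```

Note that "Cameo" has almost identical numbers to "Target" but never shows up, because 400 minutes falls below the cut-off.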