This piece explains how a large number of player-level statistics can quickly be turned into valuable information for scouting and recruiting. It presents a model used to compare players, find similar players, and place players in a larger context. To explain how our model works, we present a lower-complexity version that illustrates the underlying mechanism and methods. The methods discussed form the core of our work with clients at Helpside Basketball. We put them into practice to gain a head start in scouting by combining traditional scouting with data-driven investigation.
Often, the number of available stats for players (particularly those offered by companies such as InStat or Synergy Sports) is overwhelming. Since categorizing players into the traditional 1-5 positions is becoming an increasingly ineffective approach to modern scouting, a new way of capturing player behavior and player types is necessary. The presented models therefore fulfill two purposes. First, they reduce the wide array of statistics for each player to fewer variables without losing too much information. This enables us to develop a perspective on a player faster, compare players, and evaluate them in a larger context. Second, we use this as a basis to develop clusters of players and thereby find players of similar types. This approach is depicted in Figure 1, where a larger number of variables characterizing players is reduced to, in the most drastic case, a single player type.
To illustrate how the numerous variables that describe a player can be reduced in complexity, we apply the methods to a dataset of roughly 1000 ‘overseas’ players. The dataset contains 10 variables, one per play-type, each indicating that play-type’s share of the player’s total scoring attempts. In a first step, we create a visual contextualization of these players based on the play-type shares. We apply two methods: principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). If you are curious about these techniques and how we apply them in detail, find this research document attached. Roughly speaking, PCA allows us to create a map that reduces the ten play-type variables for each player to fewer variables. To do so, it creates new variables (principal components) that seek to explain as much as possible of the variation in the original ten variables. The result is displayed in Figure 2, with each point representing a player.
Here, the ten variables indicating the play-type shares are reduced to only two, creating a good overview and general roadmap of all players. With some players and their positions on this map highlighted, notice how we find different player types in different regions. Joffrey Lauvergne, a rather traditional big man, sits in the east of our map. Point guard Tyrese Rice is situated at the very opposite, western part of the map. What PCA is great at is pointing out this global-level structure of players: bigs to the right, guards to the left, and everyone else in the middle. It maintains the basic principle that similar players are closer together and different players are further apart.
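To make the reduction from ten play-type shares to two map coordinates concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. It is not our production code: the data is a synthetic stand-in (random shares that sum to 1 per player), and the names `pca_project`, `shares`, `coords`, and `explained` are assumptions chosen for illustration.

```python
import numpy as np

def pca_project(shares, n_components=2):
    """Project each row of `shares` onto the first principal components."""
    # Center every play-type column so the components describe variation
    centered = shares - shares.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each player on the leading axes (the "map" positions)
    coords = centered @ vt[:n_components].T
    # Share of the total variance captured by each component
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]

# Synthetic stand-in for the real dataset (an assumption, not actual data):
# 1000 players x 10 play-type shares, each row summing to 1
rng = np.random.default_rng(0)
shares = rng.dirichlet(np.ones(10), size=1000)

coords, explained = pca_project(shares)
print(coords.shape)        # one (x, y) position per player
print(explained.round(3))  # variance explained by the two components
```

Plotting `coords` as a scatter plot gives a map of the kind described above. On real data, the explained-variance values indicate how much information the two-dimensional map retains.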
Yet, to detect more fine-grained differences and local-level clusters of similar players, the t-SNE method is more effective. t-SNE is applied to reveal these clusters (find more details here). A local cluster describes players who display significant similarities in the play-type-based variables, which set them apart from the remainder of the analyzed players. While t-SNE is unable to preserve the global structure generated by PCA (Figure 2), it is great at pointing out clusters on a rather local level. For example, the cloud of points marked in red (see Figure 3), which separates itself from the remainder, is home to Vladimir Micov, Jeremy Nzeulie, Milenko Tepic, James Anderson, and Egehan Arna, among others. We would therefore evaluate them as similar player types, measured by how they use different play-types to score.
The tools presented so far fulfill the first purpose we stated initially: to develop a perspective on a player faster, compare players, and evaluate them in a larger context. For systematic scouting, i.e. developing player clusters and searching for similar players within a large database, these tools fall short.
Given a particular player, the model we present distinguishes between players with similar, somewhat similar, or completely different profiles. To do so, we divide the entire set of players into clusters. Players in the same cluster can be said to show a ‘similar profile’. A ‘similar profile’ might refer to offensive behavior, offensive skills, defensive strengths, or an overall player profile. The definition depends heavily on the criteria by which you want to find a similar player. Here, we again use offensive behavior as a case study, based on the previously introduced dataset of play-type shares. Roughly speaking, players are in the same cluster here if they attempt to score from similar play-types.
The model is based on Kohonen maps (also known as self-organizing maps), a machine-learning method. It behaves similarly to the previously presented methods in that it accepts a dataset with multiple variables (here, all the play-type shares) and creates a less complex data structure that retains most of the information. Instead of creating a visual plot, Kohonen maps sort observations (i.e. players) into clusters. A final Kohonen map consists of multiple clusters (arranged in a map structure), each containing a list of players that are similar to each other. Furthermore, clusters located closer to each other on the map are more similar than those further apart.
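The mechanism of a Kohonen map can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation behind our visualization: the 4x4 grid size, the linear decay schedules, the function names `train_som` and `assign_cells`, and the synthetic data are all choices made for this example.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular Kohonen map.

    Returns one prototype profile per cell, shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    # Grid position of every cell, used to measure distance on the map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each prototype with a randomly chosen observation
    weights = data[rng.choice(len(data), size=rows * cols, replace=False)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks to 0.5
        x = data[rng.integers(len(data))]           # one random player
        # Best-matching unit: the cell whose prototype is closest to x
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Cells near the BMU on the grid are pulled towards x as well
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma**2))
        weights += lr * influence[:, None] * (x - weights)
    return weights

def assign_cells(data, weights):
    """Index of the closest prototype (i.e. the cluster) for every row."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic stand-in for the play-type shares (an assumption, not real data)
rng = np.random.default_rng(1)
shares = rng.dirichlet(np.ones(10), size=1000)

weights = train_som(shares)
cells = assign_cells(shares, weights)
print(np.bincount(cells, minlength=len(weights)))  # players per cluster
```

Because neighbouring cells are updated together, prototypes that end up close on the grid also end up with similar profiles, which is why nearby clusters on the map are more alike than distant ones.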
See the clusters in the interactive visualization of the Kohonen map: each cluster is home to a rather unique combination of play-types its players use to score. Hovering over a cell (i.e. a cluster) shows a list of the players assigned to that cluster below. The cluster’s profile can be inferred from the bar plot within each cell: the bars display the average shares of the respective play-types for the players within the cluster. Here is an example.
For cluster 13 (top-left), players most commonly attempt to score through spot-up situations (25% of their possessions), while on average they post up in only 2% of their possessions. By contrast, players in cluster 4 (bottom-right) post up in 16% of their possessions and use spot-ups to score less often. Each of the clusters depicted in the figure therefore has a more or less unique distribution of play-type shares (i.e. which play-types the players assigned to that cluster use when attempting a field goal). Based on these clusters, we are able to identify players that are similar to each other and display similar tendencies in their scoring behavior. For scouting, this allows us to quickly create lists of players similar to the type we are searching for. Say a team wants to replace a player: this methodology allows us to identify the cluster the player belongs to and search specifically for other players within this cluster that match the specified criteria. Combining this with traditional scouting allows us to scout a large database of potential players at a much faster pace. The same approach can be used in even greater detail. While the clusters above use play-type data as a rather general criterion, we can be more precise and instead include variables (which Synergy provides, e.g. on PnR behavior) to look for players matching a certain style and certain strengths in their PnR game, e.g. good passers or players with a good ability to attack the basket afterward. In this way, the approach can be scaled to basically any more specific part of the game.
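The replacement search described above can be sketched as a simple lookup once every player has a cluster id. The example below is hypothetical throughout: the player names, the three play-type columns, the share values, and the helper names `cluster_profile` and `similar_players` are made up for illustration, and ranking candidates by Euclidean distance within the cluster is an assumption rather than a description of our workflow.

```python
import numpy as np

def cluster_profile(cell, shares, cells):
    """Average play-type shares of all players assigned to `cell`."""
    mask = np.asarray(cells) == cell
    return shares[mask].mean(axis=0)

def similar_players(target, names, shares, cells, top_n=5):
    """Players in the target's cluster, closest profile first."""
    idx = names.index(target)
    # Keep only other players that share the target's cluster
    candidates = [i for i in range(len(names))
                  if cells[i] == cells[idx] and i != idx]
    # Rank the candidates by distance between play-type profiles
    candidates.sort(key=lambda i: float(np.linalg.norm(shares[i] - shares[idx])))
    return [names[i] for i in candidates[:top_n]]

# Hypothetical toy data: three of the play-type shares per player
# (spot-up, post-up, PnR ball handler); the values are invented
names = ["Player A", "Player B", "Player C", "Player D", "Player E"]
shares = np.array([
    [0.25, 0.02, 0.40],
    [0.24, 0.03, 0.38],
    [0.05, 0.16, 0.02],
    [0.30, 0.01, 0.45],
    [0.06, 0.18, 0.01],
])
cells = [13, 13, 4, 13, 4]  # cluster ids as a trained map might assign them

print(cluster_profile(4, shares, cells))                   # bar heights for cluster 4
print(similar_players("Player A", names, shares, cells))   # ['Player B', 'Player D']
```

In practice, the shortlist returned this way would then be filtered by further criteria (age, contract, league) and handed to traditional scouting.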
The visualization is based on the aweSOM package in R.
Our model at Helpside Basketball uses these workflows of mapping and clustering players to enhance scouting. In particular, it offers a head start, as we are able to shrink a large list of players down to only a few that match the behavior of the desired type of player. As discussed, ‘behavior’ here can be adapted to the context. We might look at players’ general offensive behavior and how they score (i.e. what we did here using the play-type shares). But, given the right data, we can also investigate more closely, for example clustering centers only by their behavior in the post.