ScoutGPT: Capturing Player Impact from Team Action Sequences Using GPT-Based Framework

By: Miru Hong, Minho Lee, Geonhee Jo, Jae-Hee So, Pascal Bauer, Sang-Ki Ko

Published: 2025-12-22

Category: cs.AI

Abstract

This paper introduces ScoutGPT, a GPT-based framework designed to analyze team action sequences and quantify individual player impact in sports. By leveraging advanced language model capabilities, ScoutGPT aims to provide nuanced insights into player performance and contributions, moving beyond simple statistics to understand complex team dynamics.

Impact

practical

💡 Simple Explanation

Imagine a computer program that reads a football match like a book. Every pass, dribble, and shot is a 'word,' and the whole game is a 'story.' This paper introduces ScoutGPT, which uses the same technology behind ChatGPT to understand the 'story' of a match. By learning which sequences of actions usually lead to a goal, it can give a score to every player involved, even if they didn't score or assist. This helps teams find hidden talent—players who make smart moves that help the team win, but might be overlooked by traditional statistics.

🎯 Problem Statement

Standard football metrics (goals, assists) fail to capture the contributions of playmakers and defensive midfielders. Advanced metrics like Expected Goals (xG) evaluate shots but not the buildup. Existing possession value models often rely on handcrafted features or assume that the value of an action depends only on the current game state (the Markov property), so they fail to capture the long-term strategic context and flow of a match.

🔬 Methodology

The authors frame match analysis as a sequence modeling problem. They construct a vocabulary of actions (e.g., 'Pass', 'Dribble') combined with spatial tokens (grid coordinates). A Transformer decoder (GPT architecture) is trained on a large dataset of match logs to predict two things: the next token (action), and the probability of a goal occurring within the next 'k' actions. Player impact is defined as the 'Uplift': the change in goal probability produced by a player's action relative to the state immediately before it. Summed over a player's actions, this quantifies the value the player adds in context, including actions that are far from the eventual shot.
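The tokenization and uplift steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the pitch size, the 12x8 grid, the token format, and the names `Event`, `grid_cell`, `tokenize`, and `uplift_per_player` are all hypothetical, and the goal probabilities are placeholder numbers standing in for the output of a trained model.

```python
from dataclasses import dataclass

# Assumed pitch dimensions (metres) and grid resolution. The summary does not
# state the values used in the paper, so these are illustrative only.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
GRID_COLS, GRID_ROWS = 12, 8


@dataclass(frozen=True)
class Event:
    player: str
    action: str  # e.g. "Pass", "Dribble", "Shot"
    x: float     # position along the pitch length, in metres
    y: float     # position along the pitch width, in metres


def grid_cell(x: float, y: float) -> int:
    """Map pitch coordinates to a row-major grid-cell index."""
    col = min(int(x / PITCH_LENGTH * GRID_COLS), GRID_COLS - 1)
    row = min(int(y / PITCH_WIDTH * GRID_ROWS), GRID_ROWS - 1)
    return row * GRID_COLS + col


def tokenize(events):
    """Emit an action token followed by a spatial token for each event."""
    tokens = []
    for e in events:
        tokens.append(f"<{e.action.upper()}>")
        tokens.append(f"<CELL_{grid_cell(e.x, e.y)}>")
    return tokens


def uplift_per_player(events, goal_probs, baseline):
    """Sum, per player, the change in goal probability caused by each action.

    goal_probs[i] is the (hypothetical) model's goal probability after
    event i; baseline is the probability before the first event.
    """
    totals = {}
    prev = baseline
    for e, p in zip(events, goal_probs):
        totals[e.player] = totals.get(e.player, 0.0) + (p - prev)
        prev = p
    return totals


# Toy possession: placeholder probabilities, not real model output.
events = [
    Event("A", "Pass", 30.0, 34.0),
    Event("B", "Dribble", 72.0, 20.0),
    Event("A", "Shot", 95.0, 34.0),
]
print(tokenize(events))
print(uplift_per_player(events, [0.02, 0.08, 0.30], baseline=0.01))
```

In this toy possession, player B receives credit for the dribble even though B neither shoots nor assists, which is the kind of contribution the uplift definition is meant to surface.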

📊 Results

ScoutGPT achieved lower perplexity in next-action prediction than LSTM and N-gram baselines. In terms of player evaluation, the derived rankings correlated highly with expert consensus (e.g., Ballon d'Or voting) and transfer market values, and they specifically identified high-impact midfielders who were undervalued by traditional stats. The model demonstrated 'zero-shot' capabilities in recognizing tactical patterns it wasn't explicitly labeled for, such as counter-attacking efficiency.
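For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability a model assigns to the tokens that actually occurred, so lower is better. The sketch below shows the standard definition; the function name `perplexity` and the example probabilities are illustrative and are not figures from the paper.

```python
import math


def perplexity(true_token_probs):
    """Return exp(mean negative log-probability) over the observed tokens.

    true_token_probs holds, for each position, the probability the model
    assigned to the token that actually came next.
    """
    if not true_token_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(mean_nll)


# A model that is uniformly unsure among four next actions has perplexity
# close to 4; a more confident (and correct) model scores lower.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.6, 0.7, 0.5, 0.8]))
```

Intuitively, a perplexity near 4 means the model is about as uncertain as if it were choosing uniformly among four possible next actions.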

Key Takeaways

Treating sports as a language is a powerful paradigm that unlocks the use of LLM architectures for behavioral modeling. ScoutGPT shows that automated, context-aware player evaluation is feasible and can surpass human heuristics. The future of sports analytics lies in foundation models trained on massive historical match databases.

🔍 Critical Analysis

ScoutGPT represents a logical evolution in sports analytics, moving from discrete, independence-assuming models to sequence-aware architectures. Its strength lies in capturing the nuance of 'buildup play' that doesn't immediately result in a shot. However, the paper likely glosses over the 'black box' problem: scouts need to know *why* a player is rated highly, not just that the AI says so. Additionally, the computational overhead compared to lightweight XGBoost models (which may achieve roughly 90-95% of the performance) might hinder adoption in lower-budget environments. Finally, the model relies on event data, which ignores off-ball movement, and this remains a hard ceiling on accuracy.

💰 Practical Applications

  • Subscription-based Scouting Platform: Monthly fee for clubs to access the ScoutGPT database.
  • Data API for Gambling: Selling real-time 'threat' streams to betting operators.
  • Consumer App: Premium 'Super-Fan' stats for fantasy football leagues.
  • Consultancy Services: Custom model training for clubs using their proprietary tracking data.

🏷️ Tags

#Sports Analytics, #NLP, #Transformer, #Football, #Player Valuation, #Generative AI, #Event Data

🏢 Relevant Industries

Sports Analytics, Professional Football (Soccer), Sports Betting, Sports Media & Broadcasting, Talent Identification