Football Tracking

The intersection of sports analytics and computer vision has revolutionized how football is analyzed, coached, and understood. Modern data science enables teams to extract actionable insights from video footage, transforming subjective observations into objective, quantifiable metrics. This project leverages state-of-the-art deep learning techniques to automatically annotate football matches, providing detailed tactical analysis, player performance metrics, and game event detection.

Project Objective: Develop an automated system for real-time football match analysis using computer vision and deep learning, capable of detecting players, tracking movements, identifying game phases, and recognizing key events such as shots, passes, and tactical formations.

Technical Stack: YOLOv4, OpenPose, Python, OpenCV, PyTorch, Hungarian Algorithm, Real-time object detection, Pose estimation, Video processing pipelines.

Key Applications:

  • Performance analysis: Player positioning, movement patterns, work rate metrics
  • Tactical insights: Formation detection, pressing intensity, space utilization
  • Automated highlight generation: Key moments identification and compilation
  • Scouting intelligence: Player evaluation through objective metrics
  • Training optimization: Data-driven coaching decisions

Player Detection & Tracking - YOLO + Hungarian Algorithm

YOLOv4 for Real-Time Player Detection

Technology Overview: YOLO (You Only Look Once) represents a paradigm shift in object detection, processing entire images in a single forward pass through the neural network, enabling real-time performance crucial for sports analysis.

Why YOLOv4:

  • Real-time Performance: Processes 30+ frames per second on standard GPU hardware, essential for live match analysis
  • Pre-trained on COCO: The COCO dataset covers 80 object categories, including "person", providing robust baseline performance for player detection without extensive retraining
  • Accuracy-Speed Trade-off: YOLOv4 optimizes the balance between detection accuracy and inference speed through architectural innovations (CSPDarknet53 backbone, PANet, SAM attention)
  • Handles Occlusion: Effectively detects partially obscured players, common in crowded match scenarios
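As a minimal illustration of the detection post-processing, the sketch below keeps only confident "person" detections and removes duplicate boxes with greedy non-maximum suppression. It assumes raw detections arrive as (class_id, confidence, box) tuples from a YOLOv4 forward pass, with COCO class 0 = "person"; the thresholds are illustrative defaults, not the project's tuned values.

```python
def filter_person_detections(detections, conf_threshold=0.5, nms_iou=0.45,
                             person_class_id=0):
    """Keep confident 'person' detections from a YOLO forward pass.

    Each detection is (class_id, confidence, (x1, y1, x2, y2)).
    Greedy NMS suppresses overlapping boxes for the same player.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    # drop non-person classes and low-confidence boxes, best first
    people = sorted((d for d in detections
                     if d[0] == person_class_id and d[1] >= conf_threshold),
                    key=lambda d: d[1], reverse=True)
    kept = []
    for det in people:
        # keep a box only if it does not heavily overlap an already-kept one
        if all(iou(det[2], k[2]) < nms_iou for k in kept):
            kept.append(det)
    return kept
```

In a full pipeline these filtered boxes are what the tracker consumes frame by frame.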

Multi-Object Tracking with Hungarian Algorithm

The Challenge: Detecting players in individual frames is insufficient for analysis—we need to maintain consistent player identities across the entire video sequence to track movement patterns, calculate distances covered, and analyze tactical positioning over time.

Hungarian Algorithm Solution: Originally developed for assignment optimization problems, the Hungarian algorithm solves the task of matching detections across consecutive frames by minimizing assignment costs.

Implementation Details:

  • Cost Metric - Intersection over Union (IoU):
    • Calculates overlap between bounding boxes from consecutive frames
    • Assumes players don't teleport between frames (temporal coherence)
    • IoU formula: Area(Intersection) / Area(Union) of two bounding boxes
    • High IoU (>0.5) indicates same player; low IoU suggests different players
  • Tracking Pipeline:
    1. Detect all players in frame N using YOLOv4
    2. Detect all players in frame N+1
    3. Calculate IoU matrix between all N and N+1 detections
    4. Apply the Hungarian algorithm to the cost matrix (1 − IoU) to find the optimal assignment maximizing total IoU
    5. Assign consistent player IDs maintaining temporal continuity
  • Handling Edge Cases:
    • Players entering/exiting frame: Initialize new tracks or terminate existing ones
    • Detection failures: Implement Kalman filtering to predict positions during brief occlusions
    • Jersey number recognition: Integrate OCR for initial ID assignment assistance
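The IoU cost and frame-to-frame matching steps above can be sketched as follows. Boxes are (x1, y1, x2, y2) tuples; brute-force enumeration keeps the sketch dependency-free, whereas production code would run the Hungarian algorithm proper via scipy.optimize.linear_sum_assignment on the 1 − IoU cost matrix.

```python
from itertools import permutations

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes: Area(Intersection) / Area(Union)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_detections(prev_boxes, curr_boxes, iou_threshold=0.5):
    """Optimal one-to-one matching of frame-N tracks to frame-N+1
    detections, maximizing total IoU (assumes no more tracks than
    detections, for brevity). Returns (prev_index, curr_index) pairs."""
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(curr_boxes)), len(prev_boxes)):
        total = sum(iou(p, curr_boxes[j]) for p, j in zip(prev_boxes, perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # pairs below the IoU threshold become terminated or new tracks
    return [(i, j) for i, j in enumerate(best_perm)
            if iou(prev_boxes[i], curr_boxes[j]) >= iou_threshold]
```

Detections left unmatched by the threshold are exactly the edge cases listed above: new tracks to initialize or existing tracks to terminate or bridge with a Kalman prediction.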

Game Phase Segmentation - Half-Time Detection

Objective: Automatically identify match phases (first half, halftime, second half, extra time) to enable phase-specific analysis and ensure accurate temporal event alignment.

Methodology:

  • Activity Heuristics: Halftime characterized by minimal player activity, players leaving the field, empty pitch scenes
  • Player Count Analysis: Track number of detected players over time; dramatic drops indicate phase transitions
  • Motion Analysis: Calculate aggregate player movement using optical flow; halftime shows near-zero motion
  • Loss Function: Define error as percentage deviation between predicted and ground-truth phase boundaries (start/end timestamps)
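A simplified version of the player-count heuristic can be sketched as below, assuming a per-frame sequence of detected-person counts from the tracker. The thresholds are illustrative placeholders, not the project's tuned values.

```python
def find_low_activity_spans(player_counts, fps=30, min_players=4,
                            min_duration_s=60):
    """Flag candidate phase transitions (e.g. halftime): runs of frames
    where fewer than `min_players` people are detected, lasting at least
    `min_duration_s`. Returns (start_s, end_s) spans in seconds."""
    spans, start = [], None
    # sentinel value flushes a run that lasts to the end of the footage
    for i, count in enumerate(player_counts + [min_players]):
        if count < min_players:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) >= min_duration_s * fps:
                spans.append((start / fps, i / fps))
            start = None
    return spans
```

Candidate spans would then be cross-checked against the optical-flow motion signal before being labeled as halftime rather than, say, a long VAR review.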

Performance Metrics:

  • Achieved ±5 second accuracy in 90% of test cases
  • Handles broadcast variations (replays, commentary overlays, camera angles)
  • Robust to unusual events (injuries, VAR reviews, crowd disturbances)

Applications: Enables automated splitting of match footage for phase-specific analysis, fatigue assessment comparing first vs. second half performance, and precise temporal alignment with event data from other sources.


Shot Detection - Multi-Modal Approach

Initial Approach: YOLO Ball Tracking

Methodology: The first attempt used YOLOv4 to detect both players and the ball simultaneously, analyzing ball trajectory patterns to identify shots.

Challenges Encountered:

  • Scale Disparity: The ball is far smaller than the players (10-100x smaller in pixel area), making reliable detection difficult
  • Motion Blur: Ball moves at high velocity during shots (70+ mph), causing severe motion blur in standard 30fps video
  • Occlusion: Ball frequently obscured by players' bodies during crucial shot moments
  • False Positives: Similar objects (white markings, referee equipment, crowd items) misclassified as ball
  • Limited Accuracy: Achieved only 62% shot detection accuracy—insufficient for production use

Enhanced Approach: OpenPose Human Pose Estimation

Strategic Pivot: Rather than tracking the ball directly, identify shots by recognizing the characteristic body poses and movement sequences of a shooting player. This approach is more robust as human pose is less affected by motion blur and scale variations.

OpenPose Technology:

  • Skeletal Keypoint Detection: Identifies 25 body keypoints (BODY_25 model: ankles, knees, hips, shoulders, elbows, wrists, etc.) with sub-pixel accuracy
  • Real-time Performance: Processes video at 15-20 FPS on GPU hardware
  • Multi-person Detection: Simultaneously tracks poses of all visible players
  • Robust to Clothing: Works regardless of uniform color or style

Implementation Architecture:

  • Temporal Window Analysis:
    • Analyzed annotation data to determine optimal sequence length: 3 seconds (90 frames at 30fps)
    • Captures complete shooting motion: wind-up, contact, follow-through
    • Fixed duration enables consistent input to classification model
  • Feature Engineering:
    • Joint angles: Calculated angles between connected keypoints (e.g., hip-knee-ankle angle)
    • Angular velocities: Rate of change of joint angles indicates motion dynamics
    • Spatial relationships: Distances between keypoints (foot height, body orientation)
    • Temporal sequences: Stacked features across time creating motion signatures
  • Classification Model:
    • Architecture: LSTM (Long Short-Term Memory) network to capture temporal dependencies
    • Input: Sequence of 90 pose vectors (each containing 50+ features from keypoints)
    • Output: Binary classification (shot vs. non-shot) with confidence score
    • Training: Supervised learning on 5,000+ manually annotated shot sequences
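Two of the engineered features above can be sketched in plain Python. Keypoints are assumed to be (x, y) pixel tuples from OpenPose; per-frame values like these would be stacked into the 90-frame sequences fed to the LSTM.

```python
import math

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by segments b->a and b->c,
    e.g. the hip-knee-ankle angle of the kicking leg."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate: coincident keypoints
    cos_ang = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_ang))))

def angular_velocity(angles, fps=30):
    """Frame-to-frame rate of change (degrees/second) of a joint-angle
    series; large spikes mark the explosive kicking motion."""
    return [(b - a) * fps for a, b in zip(angles, angles[1:])]
```

Stacking such angles, velocities, and keypoint distances per frame yields the 50+-dimensional pose vectors the classifier consumes.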

Advanced Capabilities:

  • Player Attribution: Identify which specific player took the shot by associating pose keypoints with player tracking IDs
  • Shot Type Classification: Distinguish between:
    • Right foot vs. left foot shots (by analyzing which leg performs kicking motion)
    • Headers (elevated body position, neck/head orientation)
    • Volleys vs. ground shots (foot height at ball contact)
  • Quality Assessment: Estimate shot power and accuracy based on body mechanics:
    • Follow-through distance correlates with shot power
    • Body balance (center of mass position) indicates accuracy potential
    • Approach angle affects shot trajectory
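The left-foot/right-foot distinction can be illustrated with a toy heuristic (not the project's trained classifier): over the shot window, the kicking leg's ankle travels much further than the standing leg's, so comparing total ankle displacement is a reasonable first cut.

```python
def kicking_foot(left_ankle_ys, right_ankle_ys):
    """Toy heuristic for foot attribution: the ankle with the greater
    total vertical travel over the shot window is taken to be the
    kicking foot. Inputs are per-frame ankle y-coordinates (pixels)."""
    def travel(ys):
        return sum(abs(b - a) for a, b in zip(ys, ys[1:]))
    return "left" if travel(left_ankle_ys) > travel(right_ankle_ys) else "right"
```

In practice this signal would be one feature among many; the LSTM learns the full wind-up/contact/follow-through signature rather than a single displacement cue.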

Performance Improvements:

  • Achieved 87% shot detection accuracy (a 25-percentage-point improvement over the 62% ball-tracking baseline)
  • Reduced false positives by 60% through pose context
  • Added player-level shot attribution (impossible with ball tracking alone)
  • Enabled shot type classification providing richer analytics

Technical Advantages:

  • Scalability: OpenPose pre-trained on large datasets, no domain-specific retraining required
  • Generalization: Works across different camera angles, stadiums, and lighting conditions
  • Interpretability: Pose-based features are human-understandable, facilitating debugging and model refinement
  • Multi-use: Same pose detection pipeline enables other analyses (running distance, player fatigue, injury risk assessment)

Project Outcomes & Future Directions

Key Achievements

This project successfully demonstrates the application of cutting-edge computer vision and deep learning techniques to automated sports analytics. The developed system provides:

  • Real-time Player Tracking: Consistent player identification throughout match footage with 90%+ tracking accuracy
  • Automated Game Phase Detection: Precise segmentation of match periods with ±5 second accuracy
  • Intelligent Event Recognition: 87% accuracy in shot detection with player attribution and shot type classification
  • Scalable Architecture: Leverages pre-trained models (YOLOv4, OpenPose) minimizing custom training requirements
  • Multi-level Analytics: From individual player metrics to team-level tactical insights

Technical Learnings

  • Transfer Learning Effectiveness: Pre-trained models provide excellent starting points, dramatically reducing development time and data requirements
  • Multi-Modal Approaches: Combining different detection modalities (object detection + pose estimation) yields better results than single-method approaches
  • Domain Adaptation: Sports video presents unique challenges (fast motion, occlusions, camera angles) requiring specialized preprocessing and post-processing pipelines
  • Iterative Refinement: Initial ball-tracking approach failure led to creative pivot toward pose-based detection—demonstrating importance of flexible problem-solving

Limitations & Challenges

  • Video Quality Dependency: Performance degrades with low-resolution footage or poor lighting conditions
  • Broadcast Variations: Different camera angles and zoom levels require adaptive processing
  • Computational Cost: Real-time processing demands high-end GPU hardware
  • Pass Detection Complexity: Accurate pass identification requires near-perfect ball tracking and remains an unsolved challenge

Future Enhancement Opportunities

  • Pass Network Analysis: Develop robust ball tracking combined with player recognition to map passing patterns, quantify team connectivity, and identify playmakers
  • Formation Recognition: Apply clustering algorithms to player positions to automatically identify tactical formations (4-4-2, 4-3-3, etc.) and formation transitions
  • Space Control Metrics: Calculate pitch control probabilities using player positions and movement vectors, visualizing territorial dominance
  • Defensive Line Tracking: Monitor defensive line positioning to assess offside trap effectiveness and defensive organization
  • Physical Performance Metrics: Estimate distances covered, sprint counts, high-intensity running periods for fitness monitoring
  • Camera Calibration: Implement homography transformations to map pixel coordinates to real-world pitch positions, enabling precise distance measurements
  • Multi-Camera Fusion: Integrate feeds from multiple camera angles for complete pitch coverage and improved tracking robustness
  • Predictive Analytics: Build models predicting dangerous situations (expected goals - xG) based on player positions and game context
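The formation-recognition idea could be prototyped with a simple 1-D k-means over the outfield players' longitudinal positions (own goal toward opponent goal), grouping them into defensive, midfield, and attacking lines. This is a toy sketch under strong assumptions (three lines, static snapshot); a real system would cluster full 2-D positions over time.

```python
def formation_string(x_positions, n_lines=3, iters=20):
    """Toy 1-D k-means on the 10 outfield players' longitudinal
    coordinates, reporting players per line from back to front,
    e.g. "4-3-3". Assumes n_lines >= 2."""
    xs = sorted(x_positions)
    # seed centroids spread across the occupied range
    cents = [xs[int(i * (len(xs) - 1) / (n_lines - 1))]
             for i in range(n_lines)]
    for _ in range(iters):
        groups = [[] for _ in range(n_lines)]
        for x in xs:
            nearest = min(range(n_lines), key=lambda k: abs(x - cents[k]))
            groups[nearest].append(x)
        cents = [sum(g) / len(g) if g else cents[k]
                 for k, g in enumerate(groups)]
    # order lines from defense (low x) to attack (high x)
    counts = sorted((sum(g) / len(g), len(g)) for g in groups if g)
    return "-".join(str(c) for _, c in counts)
```

Tracking the cluster assignments frame by frame would expose formation transitions, e.g. a 4-3-3 collapsing into a 4-5-1 when defending.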

Practical Applications

  • Professional Clubs: Enhance scouting reports with objective performance data, reduce manual video analysis workload
  • Broadcast Enhancement: Generate real-time graphics and statistics for viewer engagement
  • Youth Development: Provide data-driven feedback to young players, track development progress objectively
  • Betting & Fantasy Sports: Inform prediction models with granular match data
  • Academic Research: Enable large-scale studies on tactics, training effectiveness, and game theory

Conclusion: This project illustrates the transformative potential of AI in sports analytics. By automating tedious annotation tasks, we free analysts to focus on interpretation and strategic insights. The techniques developed here are transferable to other sports (basketball, hockey, rugby) and domains requiring video-based activity recognition. As computer vision models continue improving and computational costs decrease, automated sports analytics will become accessible to organizations at all levels, democratizing data-driven decision-making in athletics.