Mastering Mathematical Optimization in Python for Fantasy Football
Chapter 1: Introduction to Mathematical Optimization
Mathematical optimization might not be as glamorous as large language models, yet it's a crucial skill in demand by top tech companies like Amazon and Meta. This tutorial is designed for beginners, guiding you through the use of OR-Tools, a powerful optimization library from Google, to create the ideal Fantasy Premier League (FPL) team. While basic Python knowledge is assumed, we'll begin with the fundamentals of OR-Tools, linear optimization, and the world of fantasy football.
Fantasy Premier League Overview
The examples in this tutorial utilize data from the Fantasy Premier League (FPL), which is the largest fantasy football league globally, featuring over 11 million players. As a manager, you are allocated a fictional budget of £100 million to assemble a squad of 15 players. Your team earns points based on player performances throughout the season (e.g., 4 points for a goal, 3 points for an assist, etc.). Each week, you can make substitutions, and a leaderboard tracks your ranking against other fantasy managers, making for a highly competitive environment!
Given the millions of possible player combinations, FPL is an ideal case for applying optimization techniques to maximize points.
Preparing the Data
To start, we will use FPL data available on Kaggle. Create a free account on Kaggle if you don't already have one to access the dataset, which includes 75 attributes for each of the 778 Premier League players, such as their team, position, cost (now_cost), and points_per_game.
To follow this tutorial, ensure you have the following libraries installed: numpy, pandas, and Google's OR-Tools. If you need to install them, run:
pip install numpy pandas ortools
Now, let's import these libraries and load the data:
import numpy as np
import pandas as pd
from ortools.sat.python import cp_model
# Load data
df = pd.read_csv('players.csv') # Adjust the path as needed
# Preview the dataset
print(df.shape)
display(df.head())
Note that the cost column (now_cost) is measured in £100k. For instance, the first player listed (Granit Xhaka) will cost 48 * £100k = £4.8 million. Keep in mind that this data is updated weekly, so figures may vary based on when you access them.
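As a quick check of the unit conversion, here is a minimal sketch. The helper name `cost_in_millions` and the constant `BUDGET_UNITS` are made up for illustration and are not part of the dataset or any library.

```python
def cost_in_millions(now_cost: int) -> float:
    """Convert an FPL now_cost value (units of £100k) to millions of pounds."""
    return now_cost / 10


# The £100 million budget expressed in the same £100k units as now_cost
BUDGET_UNITS = 100 * 10

# Granit Xhaka's now_cost from the text above, converted to £m
print(cost_in_millions(48))
```

Keeping everything in £100k units means costs stay whole numbers, which is convenient later on because the solver used in this tutorial works with integers.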
Before diving into optimization, let's randomly select 15 players and evaluate our team:
# Calculate the cost of selecting a random set of 15 players
total_cost = 0
total_points_per_game = 0
# replace=False ensures the same player is never drawn twice
for i in np.random.choice(df['id'], 15, replace=False):
    player = df[df['id'] == i]
    cost = player.now_cost.values[0] / 10  # now_cost is in £100k; divide by 10 for £m
    points_per_game = player.points_per_game.values[0]
    position = player.position.values[0]
    print(f"Player {i}: Cost = £{cost}m, Position = {position}")
    total_cost += cost
    total_points_per_game += points_per_game
print(f"\nTotal cost: £{total_cost:.1f}m")
print(f"\nTotal points_per_game: {total_points_per_game:.1f}")
While our team cost is under the £100 million budget (a positive outcome), we have no forwards and only one goalkeeper — not ideal. Moreover, we've wasted some budget, spending only £72.2 million, which suggests we haven't selected the best players. Let's explore how we can enhance our selections through optimization.
Understanding Optimization
Mathematical optimization involves various techniques aimed at identifying the best solution from a vast array of possibilities. Sometimes the goal is merely to find feasible solutions (e.g., "identify all meals purchasable within a £10.32 budget"). This type of problem is known as constraint programming.
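To make the "find every feasible solution" idea concrete, here is a small sketch that simply enumerates all meals that fit the budget. The menu items and prices are invented for illustration, and plain brute force is used here rather than a constraint-programming solver, which would be the tool of choice once the search space gets large.

```python
from itertools import combinations

# Hypothetical menu: item -> price in pence
menu = {"soup": 350, "fish": 800, "salad": 400, "bread": 150}
BUDGET = 1032  # £10.32 expressed in pence

# Try every non-empty combination of items and keep those within budget
feasible = [
    combo
    for size in range(1, len(menu) + 1)
    for combo in combinations(menu, size)
    if sum(menu[item] for item in combo) <= BUDGET
]

for combo in feasible:
    print(combo, sum(menu[item] for item in combo))
```

None of these meals is "best"; each one merely satisfies the constraint.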
Other times, the focus is on determining the optimal solution under specific constraints (e.g., "find the least expensive meal with at least 1000 calories, containing two vegetables and one fish"). Such problems typically fall under linear programming (LP), where the objective is to maximize or minimize a value subject to certain linear constraints. It's worth noting that "programming" here relates more to "planning" or "solving" than to traditional coding.
Setting Constraints
To begin, we need to define our model and establish some basic constraints:
- We need to select 2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards (totaling 15 players).
- Our total expenditure must not exceed £100 million.
- No more than 3 players can be chosen from the same team (a rule enforced by FPL).
Additionally, we must ensure that each player can only be selected once.
While we could impose further constraints (e.g., excluding injured players), we will focus on these fundamental requirements for now.
In OR-Tools, constraints are added as linear equations and inequalities over the decision variables. For example, the requirement to select exactly 5 defenders can be expressed as:
def1 + def2 + def3 + ... + def262 = 5
where each def variable corresponds to a specific defender. Let's translate our constraints into linear equations using OR-Tools.
Constraint #1: Selecting the Required Player Types
# Define the model
model = cp_model.CpModel()
# Specify how many players we want to select for each position
POSITION_MAP = {
    'Goalkeeper': {'code': 'GKP', 'count': 2},
    'Defender': {'code': 'DEF', 'count': 5},
    'Midfielder': {'code': 'MID', 'count': 5},
    'Forward': {'code': 'FWD', 'count': 3}
}
# Initialize an empty dictionary for decision variables
decision_variables = {}
# Loop through POSITION_MAP to add constraints for each position
for position, details in POSITION_MAP.items():
    players_in_position = list(df[df['position'] == details['code']].id.values)
    player_count = details['count']
    player_variables = {i: model.NewBoolVar(f"player{i}") for i in players_in_position}
    decision_variables.update(player_variables)
    model.Add(sum(player_variables.values()) == player_count)
After executing this code, we've added all decision variables to the model. Let's check the decision variables to ensure they were added correctly:
# Display decision variables
print(decision_variables.values())
# Verify the number of decision variables (should be 778)
print(len(decision_variables))
Constraint #2: Budget Limit
This constraint is straightforward. We express the budget in the same £100k units as now_cost (so £100 million becomes 1000) and require that the summed cost of the selected players does not exceed it.
# Constraint #2 - total cost must not exceed the budget
BUDGET = 1000
player_costs = {player: df[df['id'] == player]['now_cost'].values[0] for player in df['id']}
model.Add(sum(var * player_costs[i] for i, var in decision_variables.items()) <= BUDGET)
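Filtering the DataFrame once per player works, but it rescans the whole table each time. A possible alternative, sketched here on a made-up three-row table, is to build the id-to-cost lookup in one step with pandas. The `toy` frame and the `selected` list are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature version of the players table
toy = pd.DataFrame({"id": [1, 2, 3], "now_cost": [48, 125, 60]})

# Build the id -> cost lookup in a single pass
costs = toy.set_index("id")["now_cost"].to_dict()

# Cost of an example selection, still in £100k units
selected = [1, 3]
total = sum(costs[i] for i in selected)
print(total, "units =", total / 10, "£m")
```

The resulting dictionary can be used in the budget constraint in the same way as `player_costs` above.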
Constraint #3: Team Player Limit
This constraint is more complex, as we need to create a linear equation for each team.
# Constraint #3 - limit of 3 players per team
MAX_PLAYERS_PER_TEAM = 3
teams = df['team'].unique()
for team in teams:
    eligible_players = df[df['team'] == team].id.values
    model.Add(sum(decision_variables[i] for i in eligible_players) <= MAX_PLAYERS_PER_TEAM)
Adding the Objective Function
Now that we've defined the constraints, it's time to establish our objective function. This function tells the algorithm which criterion we aim to optimize.
To succeed in FPL, our goal is to accumulate the highest points. Thus, we need to select the combination of players that maximizes our potential points.
Although predicting which players will score the most points is challenging, we can use players' historical average points per game as a reasonable proxy.
# Add objective function: maximize points_per_game
player_points_per_game = {player: df[df['id'] == player]['points_per_game'].values[0] for player in decision_variables.keys()}
total_points = sum(var * player_points_per_game[i] for i, var in decision_variables.items())
model.Maximize(total_points)
Solving the Model
Finally, we instruct the model to find the optimal solution based on our constraints and objective function:
solver = cp_model.CpSolver()
status = solver.Solve(model)
Display the Solution
def show_solution(status):
    """
    Print out the solution (if any).
    """
    if status == cp_model.OPTIMAL:
        print("Optimal solution found. Players selected:\n")
        total_cost = 0
        total_points_per_game = 0
        players = {'GKP': [], 'DEF': [], 'MID': [], 'FWD': []}
        for i, var in decision_variables.items():
            if solver.Value(var) == 1:
                player_position = df[df['id'] == i].position.values[0]
                players[player_position].append(i)
        for position, ids in players.items():
            print(f"\nPlayers in {position}:")
            for i in ids:
                player_name = df[df['id'] == i]['name'].values[0]
                player_cost = df[df['id'] == i]['now_cost'].values[0] / 10
                player_team = df[df['id'] == i]['team'].values[0]
                player_points_per_game = df[df['id'] == i]['points_per_game'].values[0]
                print(f"{player_name}: {player_team}, £{player_cost}, {player_points_per_game}")
                total_cost += player_cost
                total_points_per_game += player_points_per_game
        print("\nTotal cost: ", total_cost)
        print("\nTotal points_per_game: ", total_points_per_game)
    else:
        print("No solution found.")

show_solution(status)
By successfully optimizing our selections, we formed a team within the budget and achieved a total points per game of 86.0, a significant improvement from the 23.8 points scored through random selection!
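A cheap way to gain confidence in any squad, whether it comes from the solver or from your own picks, is to re-check the rules directly with pandas. The sketch below assumes the squad is available as a DataFrame with id, position, team, and now_cost columns. The `check_squad` helper and the toy squad are made up for illustration.

```python
import pandas as pd

REQUIRED = {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}


def check_squad(squad, budget=1000, max_per_team=3):
    """Return a dict of rule name -> True/False for a 15-player squad."""
    return {
        "unique_players": bool(squad["id"].is_unique),
        "positions": squad["position"].value_counts().to_dict() == REQUIRED,
        "budget": bool(squad["now_cost"].sum() <= budget),
        "team_limit": bool(squad["team"].value_counts().max() <= max_per_team),
    }


# Hypothetical squad: 15 players spread evenly over 5 teams, each costing £6.0m
toy_squad = pd.DataFrame({
    "id": range(1, 16),
    "position": ["GKP"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3,
    "team": [f"Team {i % 5}" for i in range(15)],
    "now_cost": [60] * 15,
})

print(check_squad(toy_squad))
```

If any entry comes back False for a solver-produced squad, that points to a mistake in how the corresponding constraint was written.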
Conclusion
Thank you for reading! I hope you found this guide insightful. Feel free to connect with me on Twitter or LinkedIn! 😊
Chapter 2: Video Tutorials
In this chapter, we will enhance our understanding through practical video demonstrations.
This first video, titled "Optimization for FPL with Python - Episode 1: Goalkeeper Selection," provides a comprehensive overview of optimizing your FPL team selection.
The second video, "Using Machine Learning to Make Fantasy Football Projections," explores how machine learning can enhance your fantasy football strategies.