import time
import math
import json
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold, GridSearchCV
from sklearn.base import clone
import xgboost as xgb
from xgboost.callback import TrainingCallback
import plotly.graph_objects as go
import pickle
import random
1 XGBoost classification model for pairwise ranking
1.1 Objective
Objective: Testing the performance of an XGBoost model trained to rank pairs of neighboring schedules.
1.2 Background
In this experiment we develop a Machine Learning model using XGBoost that can evaluate two neighboring schedules and rank them according to preference. This ranking model can be applied to quickly guide the search process towards a ‘good enough’ solution.
The choice of an ordinal model over a cardinal model is based on the consideration that it is significantly easier to determine whether alternative A is superior to B than to quantify the exact difference between A and B. This makes intuitive sense when considering the scenario of holding two identical-looking packages and deciding which one is heavier, as opposed to estimating the precise weight difference between them (Ho et al. 2000).
1.3 Hypothesis
An XGBoost ranking model achieves superior computational efficiency compared to evaluating each element of a pair individually, leading to faster overall performance in ranking tasks.
1.4 Methodology
1.4.1 Tools and Materials
We use packages from Scikit-learn to prepare the training data and evaluate the model, and the XGBClassifier interface from the XGBoost library.
1.4.2 Experimental Design
To compare an XGBoost Machine Learning model with a simple evaluation of each individual element of the pair, we will use a pairwise ranking approach. The objective is to rank two neighboring schedules according to preference.
from functions import get_v_star
N = 12 # Number of patients
T = 18 # Number of intervals
d = 5 # Length of each interval
s = [0.0, 0.27, 0.28, 0.2, 0.15, 0.1] # Service times distribution
q = 0.20 # Probability of a scheduled patient not showing up
w = 0.8 # Weight for the waiting time in objective function
num_schedules = 20000 # Number of schedules to sample
v_star = get_v_star(T)
We will create a random set of pairs of neighboring schedules with \(N = 12\) patients and \(T = 18\) intervals of length \(d = 5\).
A neighbor of a schedule x is defined as a schedule x’ in which single patients have been shifted one interval to the left. E.g.: ([2,1,1,2], [1,2,0,3]) are neighbors and ([2,1,1,2], [2,1,3,0]) are not, because [1,2,0,3] - [2,1,1,2] = [-1, 1, -1, 1] and [2,1,3,0] - [2,1,1,2] = [0, 0, 2, -2].
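For illustration, the difference vectors above can be computed directly; schedule_difference is a hypothetical helper used only for this example (the experiment itself uses get_v_star and create_neighbors_list from functions.py):

def schedule_difference(x, x_prime):
    # Elementwise difference x' - x between two schedules (illustration only).
    return [b - a for a, b in zip(x, x_prime)]

print(schedule_difference([2, 1, 1, 2], [1, 2, 0, 3]))  # [-1, 1, -1, 1] -> neighbors
print(schedule_difference([2, 1, 1, 2], [2, 1, 3, 0]))  # [0, 0, 2, -2]  -> not neighbors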
Service times will have a discrete distribution. The probability of a scheduled patient not showing up will be \(q = 0.2\).
The objective function will be the weighted average of the total waiting time of all patients and the overtime. The model will be trained to predict which of the two neighboring schedules has the lower objective value, and the prediction time will be recorded. The same schedules will then be ranked by computing their objective values directly, so that both approaches can be compared.
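In symbols, our reading of this objective (the exact computation lives in calculate_objective in functions.py, assumed here to return the waiting-time and overtime components), with \(w\) the weight for the waiting time:

\[ C(x) = w \cdot W(x) + (1 - w) \cdot L(x) \]

where \(W(x)\) is the expected total waiting time of all patients and \(L(x)\) the expected overtime of schedule \(x\). The ranking label of a pair \((x, x')\) is then the index of the schedule with the smaller \(C\).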
1.4.3 Variables
- Independent Variables: A list of tuples with pairs of neighboring schedules.
- Dependent Variables: A list with rankings for each tuple of pairwise schedules. E.g.: if the rank for ([2,1,1], [1,1,2]) equals 0, the schedule with index 0 ([2,1,1]) has the lower objective value.
1.4.4 Data Collection
The data set will be generated using simulation in which random samples will be drawn from the population of all possible schedules. For each sample a random neighboring schedule will be created.
1.4.5 Sample Size and Selection
Sample Size: The total population size equals \({{N + T -1}\choose{N}} \approx\) 52 million. For this experiment we will be using a relatively small sample of 20,000 pairs of schedules.
Sample Selection: The samples will be drawn from a lexicographic order of possible schedules in order to accurately reflect the combinatorial nature of the problem and to ensure unbiased sampling from the entire combinatorial space.
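As a quick sanity check of the population size (a minimal sketch; the experiment itself samples with random_combination_with_replacement from functions.py), the count can be verified with math.comb:

import math

# Number of ways to distribute N = 12 patients over T = 18 intervals,
# i.e. the size of the schedule population.
N, T = 12, 18
print(f"Total number of combinations: {math.comb(N + T - 1, N):,}")  # 51,895,935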
1.4.6 Experimental Procedure
The experiment involves multiple steps, beginning with data preparation and concluding with model evaluation. The diagram below illustrates the sequence of steps.
Step 1: Randomly select a subset of schedules.
from functions import random_combination_with_replacement
start = time.time()
schedules = random_combination_with_replacement(T, N, num_schedules)
print(f"Sampled: {len(schedules)} schedules\n")
h = random.choices(range(len(schedules)), k=7)
print(f"Sampled schedules: {h}")
for i in h:
    print(f"Schedule: {schedules[i]}")
end = time.time()
data_prep_time = end - start

print(f"\nProcessing time: {data_prep_time} seconds\n")
Total number of combinations: 51,895,935
Sampled: 20000 schedules
Sampled schedules: [3143, 16624, 16871, 2155, 2873, 3097, 6994]
Schedule: [7, 1, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Schedule: [6, 1, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Schedule: [5, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
Schedule: [5, 1, 3, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Schedule: [4, 2, 1, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Schedule: [3, 3, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
Schedule: [7, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Processing time: 0.2260730266571045 seconds
Step 2: Create pairs of neighboring schedules.
from functions import create_neighbors_list
start = time.time()
neighbors_list = [create_neighbors_list(schedule, v_star) for schedule in schedules] # This can be done in parallel to improve speed
end = time.time()
for i in h:
    original_schedule = neighbors_list[i][0]
    neighbor_schedule = neighbors_list[i][1]
    difference = [x - y for x, y in zip(neighbors_list[i][0], neighbors_list[i][1])]
    print(f"Neighbors\n{original_schedule}\n{neighbor_schedule}\n{difference}")
training_set_feat_time = end - start
print(f"\nProcessing time: {training_set_feat_time} seconds\n")
Neighbors
[7, 1, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[7, 1, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Neighbors
[6, 1, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[7, 1, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[-1, 0, 1, 0, 0, 0, 0, -1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Neighbors
[5, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[5, 0, 3, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
[0, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1]
Neighbors
[5, 1, 3, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[4, 1, 4, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1]
Neighbors
[4, 2, 1, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[-1, 1, -1, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Neighbors
[3, 3, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[3, 3, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0, 0, 0, -1]
Neighbors
[7, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[7, 1, 2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, -1, 0, 1, -1, 0, 0, 1, 0, -1, 1, 0, 0, 0, 0, 0, 0, 0]
Processing time: 1.674152135848999 seconds
Step 3: For each schedule in each pair calculate the objective. For each pair save the index of the schedule that has the lowest objective value.
from functions import calculate_objective
objectives_schedule_1 = [w * calculate_objective(neighbor[0], s, d, q)[0] + (1 - w) * calculate_objective(neighbor[0], s, d, q)[1] for neighbor in neighbors_list]
start = time.time()
objectives_schedule_2 = [w * calculate_objective(neighbor[1], s, d, q)[0] + (1 - w) * calculate_objective(neighbor[1], s, d, q)[1] for neighbor in neighbors_list]
end = time.time()
training_set_lab_time = end - start
objectives = [[obj, objectives_schedule_2[i]] for i, obj in enumerate(objectives_schedule_1)]
rankings = np.argmin(objectives, axis=1).tolist()
for i in range(5):
    print(f"Objectives: {objectives[i]}, Ranking: {rankings[i]}")
print(f"\nProcessing time: {training_set_lab_time} seconds\n")

# Saving neighbors_list and objectives to a pickle file
file_path = 'datasets/neighbors_and_objectives.pkl'
with open(file_path, 'wb') as f:
    pickle.dump({'neighbors_list': neighbors_list, 'objectives': objectives, 'rankings': rankings}, f)
print(f"Data saved successfully to '{file_path}'")
Objectives: [40.7132903786003, 44.29093770976293], Ranking: 0
Objectives: [53.109972522414175, 43.64776101084806], Ranking: 1
Objectives: [53.4264119642738, 46.525074507348194], Ranking: 1
Objectives: [47.571610887503645, 47.571610887503645], Ranking: 0
Objectives: [49.47536904144756, 40.17117767200208], Ranking: 1
Processing time: 17.09611225128174 seconds
Data saved successfully to 'datasets/neighbors_and_objectives.pkl'
Step 4: Create training and test sets.
# Prepare the dataset
X = []
for neighbors in neighbors_list:
    X.append(neighbors[0] + neighbors[1])

X = np.array(X)
y = np.array(rankings)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 5: Train the XGBoost model.
class CustomCallback(TrainingCallback):
    def __init__(self, period=10):
        self.period = period

    def after_iteration(self, model, epoch, evals_log):
        if (epoch + 1) % self.period == 0:
            print(f"Epoch {epoch}, Evaluation log: {evals_log['validation_0']['logloss'][epoch]}")
        return False


def fit_and_score(estimator, X_train, X_test, y_train, y_test):
    """Fit the estimator on the train set and score it on both sets"""
    estimator.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=0)

    train_score = estimator.score(X_train, y_train)
    test_score = estimator.score(X_test, y_test)

    return estimator, train_score, test_score


cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=94)

# Load the model parameters from a JSON file.
with open("model_params.json", "r") as f:
    model_params = json.load(f)

# Initialize the EarlyStopping callback with validation dataset
early_stop = xgb.callback.EarlyStopping(
    rounds=10, metric_name='logloss', data_name='validation_0', save_best=True
)

clf = xgb.XGBClassifier(
    tree_method="hist",
    max_depth=model_params["max_depth"],
    min_child_weight=model_params["min_child_weight"],
    gamma=model_params["gamma"],
    subsample=model_params["subsample"],
    colsample_bytree=model_params["colsample_bytree"],
    learning_rate=model_params["learning_rate"],
    n_estimators=model_params["n_estimators"],
    early_stopping_rounds=9,
    # callbacks=[CustomCallback(period=50), early_stop],
    callbacks=[CustomCallback(period=50)],
)
print("Params: ")
for key, value in model_params.items():
    print(f"    {key}: {value}")

start = time.time()
results = []

for train_idx, test_idx in cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    est, train_score, test_score = fit_and_score(
        clone(clf), X_train, X_test, y_train, y_test
    )
    results.append((est, train_score, test_score))

end = time.time()
training_time = end - start
print(f"\nTraining time: {training_time} seconds\n")
Params:
max_depth: 6
min_child_weight: 1
gamma: 0.1
subsample: 0.8
colsample_bytree: 0.8
learning_rate: 0.1
n_estimators: 100
Epoch 49, Evaluation log: 0.05455219040368683
Epoch 99, Evaluation log: 0.04216296073211015
Epoch 49, Evaluation log: 0.05368845318281092
Epoch 99, Evaluation log: 0.04094662195513956
Epoch 49, Evaluation log: 0.04904532701848075
Epoch 99, Evaluation log: 0.03687723997185094
Epoch 49, Evaluation log: 0.05073861599853263
Epoch 99, Evaluation log: 0.0393132749446886
Epoch 49, Evaluation log: 0.04947142685431754
Epoch 99, Evaluation log: 0.03850802336719062
Training time: 0.9882938861846924 seconds
Step 6: To evaluate the performance of the XGBoost ranking model, we will use Stratified K-Fold Cross-Validation with 5 splits, ensuring each fold maintains the same class distribution as the original dataset. Using StratifiedKFold(n_splits=5, shuffle=True, random_state=94), the dataset will be divided into five folds. In each iteration, the model will be trained on four folds and evaluated on the remaining fold. A custom callback, CustomCallback(period=50), will print the evaluation log every 50 epochs.
The fit_and_score function will fit the model and score it on both the training and test sets, storing the results for each fold. This provides insight into the model’s performance across different subsets of the data, helps in understanding how well the model generalizes to unseen data and identifies potential overfitting or underfitting issues. The overall processing time for the cross-validation will also be recorded.
# Print results
for i, (est, train_score, test_score) in enumerate(results):
    print(f"Fold {i+1} - Train Score (Accuracy): {train_score:.4f}, Test Score (Accuracy): {test_score:.4f}")
Fold 1 - Train Score (Accuracy): 0.9925, Test Score (Accuracy): 0.9862
Fold 2 - Train Score (Accuracy): 0.9922, Test Score (Accuracy): 0.9872
Fold 3 - Train Score (Accuracy): 0.9920, Test Score (Accuracy): 0.9890
Fold 4 - Train Score (Accuracy): 0.9924, Test Score (Accuracy): 0.9878
Fold 5 - Train Score (Accuracy): 0.9922, Test Score (Accuracy): 0.9882
Training the model on the entire dataset provides a final model that has learned from all available data. Recording the training time helps in understanding the computational efficiency and scalability of the model with the given hyperparameters.
# Fit the model on the entire dataset
# Initialize the XGBClassifier without early stopping here
start = time.time()

clf = xgb.XGBClassifier(
    tree_method="hist",
    max_depth=model_params["max_depth"],
    min_child_weight=model_params["min_child_weight"],
    gamma=model_params["gamma"],
    subsample=model_params["subsample"],
    colsample_bytree=model_params["colsample_bytree"],
    learning_rate=model_params["learning_rate"],
    n_estimators=model_params["n_estimators"],
)
clf.fit(X, y)
end = time.time()
modeling_time = end - start

# Calculate and print the training accuracy
training_accuracy = clf.score(X, y)
print(f"Training accuracy: {training_accuracy * 100:.2f}%\n")
print(f"\nTraining time: {modeling_time} seconds\n")
Training accuracy: 99.23%
Training time: 0.1360487937927246 seconds
1.4.7 Validation
Generating test schedules and calculating their objectives and rankings allows us to create a new dataset for evaluating the model’s performance on unseen data.
num_test_schedules = 1000

test_schedules = random_combination_with_replacement(T, N, num_test_schedules)
test_neighbors = [create_neighbors_list(test_schedule, v_star) for test_schedule in test_schedules] # This can be done in parallel to improve speed

print(f"Sampled: {len(test_schedules)} schedules\n")
test_objectives_schedule_1 = [w * calculate_objective(test_neighbor[0], s, d, q)[0] + (1 - w) * calculate_objective(test_neighbor[0], s, d, q)[1] for test_neighbor in test_neighbors]
# Start time measurement for the evaluation
start = time.time()
test_objectives_schedule_2 = [w * calculate_objective(test_neighbor[1], s, d, q)[0] + (1 - w) * calculate_objective(test_neighbor[1], s, d, q)[1] for test_neighbor in test_neighbors]
test_rankings = [0 if test_obj < test_objectives_schedule_2[i] else 1 for i, test_obj in enumerate(test_objectives_schedule_1)]
end = time.time()
evaluation_time = end - start

# Combine the objectives for each pair for later processing
test_objectives = [[test_obj, test_objectives_schedule_2[i]] for i, test_obj in enumerate(test_objectives_schedule_1)]

print(f"\nEvaluation time: {evaluation_time} seconds\n")
for i in range(6):
    print(f"Neighbors: {test_neighbors[i]},\nObjectives: {test_objectives[i]}, Ranking: {test_rankings[i]}\n")
Total number of combinations: 51,895,935
Sampled: 1000 schedules
Evaluation time: 0.898392915725708 seconds
Neighbors: ([6, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], [6, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]),
Objectives: [34.10370626270946, 36.3890775770396], Ranking: 0
Neighbors: ([6, 1, 2, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [6, 0, 2, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]),
Objectives: [42.168055975490745, 33.26579003483442], Ranking: 1
Neighbors: ([4, 4, 1, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 5, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
Objectives: [43.17359852761168, 47.900340916589684], Ranking: 0
Neighbors: ([5, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [5, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
Objectives: [55.90431273631407, 63.446546787123296], Ranking: 0
Neighbors: ([7, 1, 2, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [7, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]),
Objectives: [53.4264119642738, 45.89097380607223], Ranking: 1
Neighbors: ([4, 1, 5, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [5, 1, 4, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]),
Objectives: [34.31133011310308, 39.00838306816456], Ranking: 0
Making predictions on new data and comparing them to the actual rankings provides an evaluation of the model’s performance in practical applications. Recording the prediction time helps in understanding the model’s efficiency during inference.
input_X = test_neighbors
X_new = []
for test_neighbor in input_X:
    X_new.append(test_neighbor[0] + test_neighbor[1])

# Predict the target for new data
y_pred = clf.predict(X_new)

# Probability estimates
start = time.time()
y_pred_proba = clf.predict_proba(X_new)
end = time.time()
prediction_time = end - start
print(f"\nPrediction time: {prediction_time} seconds\n")
print(f"test_rankings = {np.array(test_rankings)[:6]}, \ny_pred = {y_pred[:6]}, \ny_pred_proba = \n{y_pred_proba[:6]}")
Prediction time: 0.004464864730834961 seconds
test_rankings = [0 1 0 0 1 0],
y_pred = [0 1 0 0 1 0],
y_pred_proba =
[[0.97218204 0.02781794]
[0.00402802 0.995972 ]
[0.99254835 0.00745166]
[0.99486387 0.00513611]
[0.07889068 0.9211093 ]
[0.99056184 0.00943818]]
Calculating the ambiguousness of the predicted probabilities helps in understanding the model’s confidence in its predictions. High ambiguousness indicates uncertain predictions, while low ambiguousness indicates confident predictions.
Ambiguousness is calculated using the formula for entropy:
\[ H(X) = - \sum_{i} p(x_i) \log_b p(x_i) \]
Where in our case:

- \(H(X)\) is the ambiguousness of the random variable \(X\), the set of probability scores for the predicted rankings,
- \(p(x_i)\) is the probability score \(x_i\),
- \(\log_b\) is the logarithm with base \(b\) (here \(\log_2\), as we have two predicted values),
- the sum is taken over all possible outcomes of \(X\).
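As a reference, a minimal sketch of how this entropy could be computed from the predicted probabilities; the experiment itself uses calculate_ambiguousness from functions.py, whose actual implementation may differ:

def ambiguousness_sketch(proba: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Shannon entropy (base 2) per prediction, assuming proba is an (n, 2) array
    # of class probabilities as returned by predict_proba.
    p = np.clip(proba, eps, 1.0)
    return -np.sum(p * np.log2(p), axis=1)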
Calculating cumulative error rate and cumulative accuracy helps in understanding how the model’s performance evolves over the dataset.
Visualizing the relationship between ambiguousness and error provides insights into how uncertainty in the model’s predictions correlates with its accuracy. This can help in identifying patterns and understanding the conditions under which the model performs well or poorly.
from functions import calculate_ambiguousness
errors = np.abs(y_pred - np.array(test_rankings))

ambiguousness: np.ndarray = calculate_ambiguousness(y_pred_proba)
df = pd.DataFrame({"Ambiguousness": ambiguousness, "Error": errors}).sort_values(by="Ambiguousness")
df['Cumulative error rate'] = df['Error'].expanding().mean()
# Calculate cumulative accuracy
df['Cumulative accuracy'] = 1 - df['Cumulative error rate']

df.head()
# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["Ambiguousness"], y=df["Error"],
                         mode="markers",
                         name="Error",
                         marker=dict(size=9)))
fig.add_trace(go.Scatter(x=df["Ambiguousness"], y=df["Cumulative accuracy"],
                         mode="lines",
                         name="Cum. accuracy",
                         line=dict(width=3, dash='dash')))

fig.update_layout(
    title={
        'text': f"Error vs Ambiguousness</br></br><sub>n={num_test_schedules}</sub>",
        'y': 0.95, # Keep the title slightly higher
        'x': 0.02,
        'xanchor': 'left',
        'yanchor': 'top'
    },
    xaxis_title="Ambiguousness",
    yaxis_title="Error / Accuracy",
    hoverlabel=dict(font=dict(color='white')),
    margin=dict(t=70) # Add more space at the top of the chart
)
fig.show()
1.4.8 Hyperparameter Optimization
In the initial model, the choice of hyperparameters was based on default values, examples from demos, or trial and error. To improve the model’s performance, we applied hyperparameter optimization: a grid search with cross-validation over a predefined set of hyperparameters for the XGBoost model (a sketch is given below). The best hyperparameters were selected based on the model’s performance on the validation set and were then used to train the final model.
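The sketch below illustrates the kind of grid search described; the parameter grid and scoring metric are assumptions for illustration, since only the resulting best_trial_params.json is used in this document:

# Illustrative grid search sketch; the actual grid that produced
# best_trial_params.json is not shown here.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.1, 0.2, 0.3],
    "n_estimators": [100, 250, 500],
    "subsample": [0.8, 0.9, 1.0],
    "colsample_bytree": [0.8, 0.9, 1.0],
}

grid_search = GridSearchCV(
    estimator=xgb.XGBClassifier(tree_method="hist"),
    param_grid=param_grid,
    scoring="accuracy",
    cv=cv,  # the StratifiedKFold defined earlier
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)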
from functions import compare_json
with open("best_trial_params.json", "r") as f:
= json.load(f)
best_trial_params
= compare_json(model_params, best_trial_params)
differences
= pd.DataFrame(differences)
params_tbl ={'json1_value': 'base parameters', 'json2_value': 'optimized parameters'}, inplace=True)
params_tbl.rename(indexprint(params_tbl)
                      max_depth     gamma  subsample  colsample_bytree  learning_rate  n_estimators
base parameters               6  0.100000   0.800000          0.800000       0.100000           100
optimized parameters          5  0.304548   0.781029          0.922528       0.239488           490
# Fit the model on the entire dataset
# Initialize the XGBClassifier without early stopping here
# Load the best trial parameters from a JSON file.
with open("best_trial_params.json", "r") as f:
    best_trial_params = json.load(f)

start = time.time()

clf = xgb.XGBClassifier(
    tree_method="hist",
    max_depth=best_trial_params["max_depth"],
    min_child_weight=best_trial_params["min_child_weight"],
    gamma=best_trial_params["gamma"],
    subsample=best_trial_params["subsample"],
    colsample_bytree=best_trial_params["colsample_bytree"],
    learning_rate=best_trial_params["learning_rate"],
    n_estimators=best_trial_params["n_estimators"],
)
clf.fit(X, y)
end = time.time()
modeling_time = end - start
print(f"\nTraining time: {modeling_time} seconds\n")

# Calculate and print the training accuracy
training_accuracy = clf.score(X, y)
print(f"Training accuracy: {training_accuracy * 100:.2f}%")
Training time: 0.36020803451538086 seconds
Training accuracy: 99.97%
# Predict the target for new data
y_pred = clf.predict(X_new)

# Probability estimates
start = time.time()
y_pred_proba = clf.predict_proba(X_new)
end = time.time()
prediction_time = end - start
print(f"\nPrediction time: {prediction_time} seconds\n")
print(f"test_rankings = {np.array(test_rankings)[:6]}, \ny_pred = {y_pred[:6]}, \ny_pred_proba = \n{y_pred_proba[:6]}")
Prediction time: 0.004562854766845703 seconds
test_rankings = [0 1 0 0 1 0],
y_pred = [0 1 0 0 1 0],
y_pred_proba =
[[9.5283890e-01 4.7161117e-02]
[1.6450882e-05 9.9998355e-01]
[9.9937177e-01 6.2821433e-04]
[9.9999166e-01 8.3718114e-06]
[8.5286498e-03 9.9147135e-01]
[9.9326497e-01 6.7350427e-03]]
errors = np.abs(y_pred - np.array(test_rankings))
ambiguousness: np.ndarray = calculate_ambiguousness(y_pred_proba)
df = pd.DataFrame({"Ambiguousness": ambiguousness, "Error": errors, "Schedules": test_neighbors, "Objectives": test_objectives}).sort_values(by="Ambiguousness")
df['Cumulative error rate'] = df['Error'].expanding().mean()
# Calculate cumulative accuracy
df['Cumulative accuracy'] = 1 - df['Cumulative error rate']
df.head()
|     | Ambiguousness | Error | Schedules | Objectives | Cumulative error rate | Cumulative accuracy |
|-----|---------------|-------|-----------|------------|-----------------------|---------------------|
| 757 | 3.321928e-09 | 0 | ([5, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [29.27167190264852, 22.08056364258747] | 0.0 | 1.0 |
| 85 | 3.321928e-09 | 0 | ([5, 3, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0,... | [36.82335597592348, 22.700339918266497] | 0.0 | 1.0 |
| 181 | 2.913796e-06 | 0 | ([7, 0, 1, 0, 2, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,... | [39.958752516119404, 27.90929658917552] | 0.0 | 1.0 |
| 768 | 2.913796e-06 | 0 | ([6, 1, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,... | [47.25699782218697, 32.81150132439508] | 0.0 | 1.0 |
| 353 | 2.913796e-06 | 0 | ([4, 4, 1, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0,... | [36.869197819247226, 26.235646300158415] | 0.0 | 1.0 |
# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["Ambiguousness"], y=df["Error"],
                         mode="markers",
                         name="Error",
                         marker=dict(size=9),
                         customdata=df[["Schedules", "Objectives"]],
                         hovertemplate=
                         "Ambiguousness: %{x} <br>" +
                         "Error: %{y} <br>" +
                         "Schedules: %{customdata[0][0]} / %{customdata[0][1]} <br>" +
                         "Objectives: %{customdata[1]} <br>"
                         ))
fig.add_trace(go.Scatter(x=df["Ambiguousness"], y=df["Cumulative accuracy"],
                         mode="lines",
                         name="Cum. accuracy",
                         line=dict(width=3, dash='dash')))

fig.update_layout(
    title={
        'text': f"Error vs Ambiguousness</br></br><sub>n={num_test_schedules}</sub>",
        'y': 0.95, # Keep the title slightly higher
        'x': 0.02,
        'xanchor': 'left',
        'yanchor': 'top'
    },
    xaxis_title="Ambiguousness",
    yaxis_title="Error / Accuracy",
    hoverlabel=dict(font=dict(color='white')),
    margin=dict(t=70) # Add more space at the top of the chart
)
fig.show()
1.5 Results
We wanted to test whether an XGBoost classification model could be used to assess and rank the quality of pairs of schedules. For performance benchmarking we used the conventional calculation method utilizing Lindley recursions.
We trained the XGBoost ranking model with a limited set of features (schedules) and labels (objectives). The total number of possible schedules is approximately 52 million. For training and validation, we sampled 20,000 schedules. Generating the feature and label set took a total of 18.9963 seconds, with the calculation of objective values accounting for 17.0961 seconds.
The model demonstrates strong and consistent performance, with high accuracies for both training and testing, good generalization and stability. Total training time for the final model was 0.3602 seconds. The evaluation of 1000 test schedules took 0.0046 seconds for the XGBoost model and 0.8984 seconds for the conventional method, an improvement of 196X.
1.6 Discussion
training_time = round(modeling_time, 4)
conventional_time = round(evaluation_time, 4)
xgboost_time = round(prediction_time, 4)

# Define time values for plotting
time_values = np.linspace(0, training_time + 0.1, 1000)  # from 0 to just past the training time

# Calculate evaluations for method 1
method1_evaluations = np.where(time_values >= training_time, (time_values - training_time) / xgboost_time * 1000, 0)

# Calculate evaluations for method 2
method2_evaluations = time_values / conventional_time * 1000

# Create line chart
fig = go.Figure()

# Add method 1 trace
fig.add_trace(go.Scatter(x=time_values, y=method1_evaluations, mode='lines', name='Ranking model'))

# Add method 2 trace
fig.add_trace(go.Scatter(x=time_values, y=method2_evaluations, mode='lines', name='Conventional method'))

# Update layout
fig.update_layout(
    title="Speed comparison between XGBoost ranking model and conventional method",
    xaxis_title="Time (seconds)",
    yaxis_title="Number of Evaluations",
    legend_title="Methods",
    template="plotly_white"
)

fig.show()
1.7 Timeline
This experiment was started on 25-07-2024. The completion date was 28-08-2024.