2  XGBoost regression model for objective calculation

2.1 Objective

Compare the performance (speed and accuracy) of a surrogate model (an XGBoost regressor) with the conventional calculation of the appointment scheduling objective function, and with a ranking model.

2.2 Background

In this experiment we will develop a machine learning model using XGBoost to evaluate a single schedule and let it compete with the conventional method as well as with the ranking model.

2.3 Hypothesis

We expect the ranking model to be superior in speed to an XGBoost regressor model, and the XGBoost regressor model to outperform the conventional method in speed.

2.4 Methodology

2.4.1 Tools and Materials

We use packages from Scikit-learn to prepare the training data and evaluate the model, and the XGBRegressor interface from the XGBoost library.

import time
import math
import json
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, KFold, GridSearchCV
from sklearn.base import clone
from sklearn.metrics import mean_squared_log_error
import xgboost as xgb
from xgboost.callback import TrainingCallback
import plotly.graph_objects as go
import pickle

2.4.2 Experimental Design

from functions import get_v_star
N = 12 # Number of patients
T = 18 # Number of intervals
d = 5 # Length of each interval
s = [0.0, 0.27, 0.28, 0.2, 0.15, 0.1] # Service times distribution
q = 0.20 # Probability of a scheduled patient not showing up
w = 0.8 # Weight for the waiting time in objective function
num_schedules = 20000 # Number of schedules to sample
v_star = get_v_star(T)

We will create a random set of pairs of neighboring schedules with \(N = 12\) patients and \(T = 18\) intervals of length \(d = 5\).

A neighbor of a schedule x is a schedule x' in which single patients have been shifted one interval to the left. E.g., ([2,1,1,2], [1,2,0,3]) are neighbors and ([2,1,1,2], [2,1,3,0]) are not, because [1,2,0,3] - [2,1,1,2] = [-1, 1, -1, 1], whereas [2,1,3,0] - [2,1,1,2] = [0, 0, 2, -2].
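As an illustration, a minimal check of this property could look as follows. This is a sketch only: the experiment itself generates neighbors with create_neighbors_list and the shift vectors from get_v_star, and the helper below is hypothetical and only captures a necessary condition.

import numpy as np

def looks_like_neighbor(x, x_prime):
    """Hypothetical check illustrated by the examples above: every entry of
    x' - x is -1, 0 or 1 and the total number of patients is unchanged.
    This is not the full neighbor test used in the experiment."""
    diff = np.array(x_prime) - np.array(x)
    return diff.sum() == 0 and bool(np.all(np.isin(diff, [-1, 0, 1])))

print(looks_like_neighbor([2, 1, 1, 2], [1, 2, 0, 3]))  # True
print(looks_like_neighbor([2, 1, 1, 2], [2, 1, 3, 0]))  # False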

Service times will have a discrete distribution. The probability of a scheduled patient not showing up will be \(q = 0.2\).

The objective function will be the weighted average of the total waiting time of all patients and the overtime. First, all paired schedules will be evaluated by computing their objective values. Then an XGBoost regressor model for predicting objective values will be trained and evaluated.
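A minimal sketch of how a single schedule's objective is computed, assuming (as in the validation code further below) that calculate_objective from functions returns the total waiting time at index 0 and the overtime at index 1:

from functions import calculate_objective

schedule = [3, 4, 2, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # example schedule from the data shown further below
result = calculate_objective(schedule, s, d, q)   # assumption: index 0 = waiting time, index 1 = overtime
objective = w * result[0] + (1 - w) * result[1]   # weighted average with weight w for waiting time
print(objective)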

The model will be validated using a new sample of paired schedules the model has never seen (not used in the training or the evaluation phase). All the objective values will be computed and the computation time will be recorded. Using the regressor model the objectives will be predicted and the prediction time will be measured. The predicted values will be compared to the actual values and the accuracy of the model will be assessed.

In order to compare the objective regressor to the ranking model from the other experiment, we will also predict the rankings of the paired schedules and compare them to the actual rankings. An opaqueness measure will be calculated for each prediction to assess the model's confidence and relate it to accuracy.

2.4.3 Variables

  • Independent Variables: A list of tuples with pairs of neighboring schedules.
  • Dependent Variables:
    1. A list of tuples with the objective values for each pair of neighboring schedules.
    2. Lists with rankings for each tuple of pairwise schedules. E.g.: if the rank for ([2,1,1], [1,1,2]) equals 0, this means that the schedule with index 0 ([2,1,1]) has the lowest objective value (a minimal sketch follows below).
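For illustration, deriving the ranking for one pair of objective values might look like this (lower objective wins, as described above; the values are hypothetical):

objectives = [33.92, 40.86]                            # hypothetical pair of objective values
ranking = 0 if objectives[0] < objectives[1] else 1    # index of the schedule with the lowest objective
print(ranking)  # -> 0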

2.4.4 Data Collection

The data set will be generated using simulation in which random samples will be drawn from the population of all possible schedules. For each sample a random neighboring schedule will be created.

2.4.5 Sample Size and Selection

Sample Size: The total population size equals \({{N + T -1}\choose{N}} \approx\) 51.9 million. For this experiment we will be using a relatively small sample of 20,000 schedules.
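This figure can be verified directly from the parameters defined above:

import math

population_size = math.comb(N + T - 1, N)  # number of ways to distribute N patients over T intervals
print(f"{population_size:,}")              # 51,895,935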

Sample Selection: The samples will be drawn uniformly at random from the lexicographically ordered set of all possible schedules, in order to accurately reflect the combinatorial nature of the problem and to ensure unbiased sampling from the entire combinatorial space.
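One way to draw such a sample uniformly is the stars-and-bars construction sketched below. This is an illustration under our own assumptions, not necessarily how random_combination_with_replacement in functions is implemented:

import numpy as np

def sample_schedule(T, N, rng=None):
    """Draw one schedule uniformly at random from all C(N + T - 1, N) possibilities."""
    rng = rng or np.random.default_rng()
    # A schedule is a weak composition of N patients over T intervals. Choosing the
    # positions of the N "stars" among the N + T - 1 slots yields a uniform sample.
    star_positions = np.sort(rng.choice(N + T - 1, size=N, replace=False))
    schedule = np.zeros(T, dtype=int)
    for i, pos in enumerate(star_positions):
        schedule[pos - i] += 1  # pos - i = number of "bars" before this star = interval index
    return schedule.tolist()

print(sample_schedule(T, N))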

2.4.6 Data Loading and Analysis

The data sample has been generated in an earlier experiment using simulation in which random samples were drawn from the population of all possible schedules.

# Load the data from the pickle file
with open('neighbors_and_objectives.pkl', 'rb') as f:
    data = pickle.load(f)

# Extract the variables from the loaded data
neighbors_list = data['neighbors_list']
objectives_list = data['objectives']
rankings_list = data['rankings']

print("Data loaded successfully.\n")
for neighbors in neighbors_list[:2]: print(neighbors, "\n")
for objectives in objectives_list[:2]: print(objectives, "\n")
for rankings in rankings_list[:2]: print(rankings, "\n")
Data loaded successfully.

([3, 4, 2, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 4, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 

([5, 3, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [5, 3, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]) 

[33.92084835176509, 40.85746812024389] 

[38.76980916837894, 39.52497576696366] 

0 

0 

The experiment involves multiple steps, beginning with data preparation and concluding with model evaluation. The diagram below illustrates the sequence of steps.

graph TD
    A["From population"] -->|"Sample"| B["Random subset"]
    B --> |Create neighbors| C["Features: Schedule pairs"]
    C --> |Calculate objectives| D["Labels: Objective values"]
    D --> |Flatten lists| E["Features and labels"]
    E --> |"Split"| F["Training set"]
    E --> |"Split"| G["Test set"]
    F --> |"Train"| H["Model"]
    H["Model"] --> |"Apply"| G["Test set"]
    G["Test set"] --> |"Evaluate"| I["Performance"]

  1. Prepare the data for training the XGBoost regressor model.
# Flatten the paired schedules (features) and objectives (labels) into flat lists
X = [item for tup in neighbors_list for item in tup]
y = [item for tup in objectives_list for item in tup]
print(f"Flattened neighbors list: {X[:3]}")
print(f"Flattened objectives list: {y[:3]}")
print(f"Number of schedules: {len(X)}")
Flattened neighbors list: [[3, 4, 2, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 4, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [5, 3, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]]
Flattened objectives list: [33.92084835176509, 40.85746812024389, 38.76980916837894]
Number of schedules: 40000
  2. Run hyperparameter optimization for the XGBoost regressor model and record the time taken to find the optimal hyperparameters.
# #=========================================================================
# # XGBoost regression: 
# # Parameters: 
# # n_estimators  "Number of gradient boosted trees. Equivalent to number 
# #                of boosting rounds."
# # learning_rate "Boosting learning rate (also known as “eta”)"
# # max_depth     "Maximum depth of a tree. Increasing this value will make 
# #                the model more complex and more likely to overfit." 
# #=========================================================================
# regressor=xgb.XGBRegressor(eval_metric='rmsle')
# 
# #=========================================================================
# # exhaustively search for the optimal hyperparameters
# #=========================================================================
# from sklearn.model_selection import GridSearchCV
# # set up our search grid
# param_grid = {"max_depth":    [4, 5, 7],
#               "n_estimators": [500, 700, 900],
#               "learning_rate": [0.05, 0.1, 0.15]}
# 
# # try out every combination of the above values
# start = time.time()
# search = GridSearchCV(regressor, param_grid, cv=5, verbose=3, n_jobs=-1).fit(X_train, y_train)
# end = time.time()
# hyper_search_time = end - start
# print(f'Hyperparameter optimization time: {hyper_search_time}')
# 
# print("The best hyperparameters are ",search.best_params_)
  3. Train the XGBoost regressor model to predict objective values from given schedules and measure training time.
class CustomCallback(TrainingCallback):
    def __init__(self, period=10):
        self.period = period

    def after_iteration(self, model, epoch, evals_log):
        if (epoch + 1) % self.period == 0:
            print(f"Epoch {epoch}, Evaluation log: {evals_log['validation_0']['rmse'][epoch]}")
        return False

def fit_and_score(estimator, X_train, X_test, y_train, y_test):
    """Fit the estimator on the train set and score it on both sets"""
    estimator.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=0)
    
    train_score = estimator.score(X_train, y_train)
    test_score = estimator.score(X_test, y_test)

    return estimator, train_score, test_score

# Ensure that X and y are NumPy arrays
X = np.array(X)
y = np.array(y)

# Check the shapes of X and y to ensure compatibility
print(f"X shape: {X.shape}, y shape: {y.shape}")

# Use KFold instead of StratifiedKFold, as stratification is not necessary for regression
cv = KFold(n_splits=5, shuffle=True, random_state=94)

# Load the best trial parameters from a JSON file
with open("best_regressor_trial_params.json", "r") as f:
    model_params = json.load(f)

# Initialize the XGBRegressor with the loaded parameters
regressor = xgb.XGBRegressor(
    tree_method="hist",
    max_depth=model_params["max_depth"],
    min_child_weight=model_params["min_child_weight"],
    gamma=model_params["gamma"],
    subsample=model_params["subsample"],
    colsample_bytree=model_params["colsample_bytree"],
    learning_rate=model_params["learning_rate"],
    n_estimators=model_params["n_estimators"],
    callbacks=[CustomCallback(period=50)],
)

print("Params: ")
for key, value in model_params.items():
    print(f" {key}: {value}")

start = time.time()
results = []

for train_idx, test_idx in cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    
    est, train_score, test_score = fit_and_score(
        clone(regressor), X_train, X_test, y_train, y_test
    )
    results.append((est, train_score, test_score))
end = time.time()
training_time = end - start
print(f"\nTraining time: {training_time} seconds\n")
X shape: (40000, 18), y shape: (40000,)
Params: 
 max_depth: 7
 min_child_weight: 8
 gamma: 0.014781560271184178
 subsample: 0.939540319301831
 colsample_bytree: 0.981185586324404
 learning_rate: 0.19710059853968986
 n_estimators: 339
Epoch 49, Evaluation log: 0.7842675890542508
Epoch 99, Evaluation log: 0.5841304745168888
Epoch 149, Evaluation log: 0.5205105576054171
Epoch 199, Evaluation log: 0.49100139293300615
Epoch 249, Evaluation log: 0.4725865535197702
Epoch 299, Evaluation log: 0.46768840443511245
Epoch 49, Evaluation log: 0.8459649633654832
Epoch 99, Evaluation log: 0.6707248343759513
Epoch 149, Evaluation log: 0.612484916667559
Epoch 199, Evaluation log: 0.5938950511994303
Epoch 249, Evaluation log: 0.5831474784528811
Epoch 299, Evaluation log: 0.5696855171251721
Epoch 49, Evaluation log: 0.8168490844451659
Epoch 99, Evaluation log: 0.620164837967189
Epoch 149, Evaluation log: 0.5494073276576844
Epoch 199, Evaluation log: 0.5259532836037151
Epoch 249, Evaluation log: 0.5005621293366569
Epoch 299, Evaluation log: 0.48866907252237435
Epoch 49, Evaluation log: 0.8179180833280669
Epoch 99, Evaluation log: 0.6199463741784181
Epoch 149, Evaluation log: 0.5610159807146109
Epoch 199, Evaluation log: 0.5346341440253888
Epoch 249, Evaluation log: 0.5206202507814842
Epoch 299, Evaluation log: 0.5038149767723248
Epoch 49, Evaluation log: 0.8459593913794298
Epoch 99, Evaluation log: 0.6399426459119163
Epoch 149, Evaluation log: 0.5836571877365068
Epoch 199, Evaluation log: 0.5576078958053741
Epoch 249, Evaluation log: 0.5435028981273413
Epoch 299, Evaluation log: 0.5367994961487046

Training time: 3.8138859272003174 seconds
# Print results
for i, (est, train_score, test_score) in enumerate(results):
    print(f"Fold {i+1} - Train Score (R²): {train_score:.4f}, Test Score (R²): {test_score:.4f}")
Fold 1 - Train Score (R²): 0.9988, Test Score (R²): 0.9982
Fold 2 - Train Score (R²): 0.9989, Test Score (R²): 0.9971
Fold 3 - Train Score (R²): 0.9988, Test Score (R²): 0.9980
Fold 4 - Train Score (R²): 0.9988, Test Score (R²): 0.9978
Fold 5 - Train Score (R²): 0.9987, Test Score (R²): 0.9975
# regressor=xgb.XGBRegressor(learning_rate = search.best_params_["learning_rate"],
#                        n_estimators  = search.best_params_["n_estimators"],
#                        max_depth     = search.best_params_["max_depth"],
#                        eval_metric='rmsle')
# 
# start = time.time()
# regressor.fit(X_train, y_train)
# end = time.time()
# training_time = end - start
# print(f"\nTraining time: {training_time} seconds\n")
  4. Use the trained model to predict the objective values for the test set and calculate the Mean Absolute Percentage Error (MAPE) between the predicted and true values; the definition of MAPE is given below.
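MAPE is defined as

\[ \text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \]

where \(y_i\) are the true objective values and \(\hat{y}_i\) the predicted values.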
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBRegressor with the loaded parameters
regressor = xgb.XGBRegressor(
    tree_method="hist",
    max_depth=model_params["max_depth"],
    min_child_weight=model_params["min_child_weight"],
    gamma=model_params["gamma"],
    subsample=model_params["subsample"],
    colsample_bytree=model_params["colsample_bytree"],
    learning_rate=model_params["learning_rate"],
    n_estimators=model_params["n_estimators"],
)

regressor.fit(X_train, y_train)
regressor.save_model('models/regressor.json')
predictions = regressor.predict(X_test)

# Calculate Mean Absolute Percentage Error (MAPE)
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape = mean_absolute_percentage_error(y_test, predictions)
print(f'MAPE: {mape:.2f}%')
MAPE: 0.75%
# Create the scatter plot
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=y_test, 
    y=predictions, 
    mode='markers',
    marker=dict(color='blue'),
    name='Predictions vs. true values'
))
fig.add_trace(go.Scatter(
    x=[0, max(max(y_test), max(predictions))],
    y=[0, max(max(y_test), max(predictions))],
    mode='lines',
    line=dict(color='tomato', dash='dash'),
    name='Base line',
))

# Add axis labels and a title
fig.update_layout(
    title='Predictions vs. true values',
    xaxis_title='True values',
    yaxis_title='Predictions',
    showlegend=True
)

# Show the plot
fig.show()

2.4.7 Validation

  1. Create a validation set with pairs of neighboring schedules and calculate their objectives. Measure calculation time.
from functions import random_combination_with_replacement, create_neighbors_list, calculate_objective

num_test_schedules = 1000

test_schedules = random_combination_with_replacement(T, N, num_test_schedules)
test_neighbors = [create_neighbors_list(test_schedule, v_star) for test_schedule in test_schedules] # This can be done in parallel to improve speed

print(f"Sampled: {len(test_schedules)} schedules\n")

# Start time measurement for the evaluation
start = time.time()
test_objectives_schedule_1 = [w * calculate_objective(test_neighbor[0], s, d, q)[0] + (1 - w) * calculate_objective(test_neighbor[0], s, d, q)[1] for test_neighbor in test_neighbors]
end = time.time()
evaluation_time = end - start
print(f"Evaluation time: {evaluation_time} seconds,\nNumber of evaluated schedules: {len(test_schedules)}\n")
test_objectives_schedule_2 = [w * calculate_objective(test_neighbor[1], s, d, q)[0] + (1 - w) * calculate_objective(test_neighbor[1], s, d, q)[1] for test_neighbor in test_neighbors]
test_rankings = [0 if test_obj < test_objectives_schedule_2[i] else 1 for i, test_obj in enumerate(test_objectives_schedule_1)]

# Combine the objectives for each pair for later processing
test_objectives = [[test_obj, test_objectives_schedule_2[i]] for i, test_obj in enumerate(test_objectives_schedule_1)]


for i in range(6):
    print(f"Neighbors: {test_neighbors[i]},\nObjectives: {test_objectives[i]}, Ranking: {test_rankings[i]}\n")
Total number of combinations: 51,895,935
Sampled: 1000 schedules

Evaluation time: 0.8680870532989502 seconds,
Number of evaluated schedules: 1000

Neighbors: ([6, 2, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0], [6, 2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]),
Objectives: [45.52353226715654, 39.28968414105995], Ranking: 1

Neighbors: ([6, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], [6, 1, 2, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]),
Objectives: [35.515142981242484, 40.420143342402774], Ranking: 0

Neighbors: ([4, 4, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 3, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
Objectives: [40.75650171481233, 29.93577017017845], Ranking: 1

Neighbors: ([5, 4, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [6, 4, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
Objectives: [50.97612604849105, 59.37432141247381], Ranking: 0

Neighbors: ([7, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [6, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]),
Objectives: [46.56245265301212, 33.6342448518005], Ranking: 1

Neighbors: ([8, 0, 2, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
Objectives: [57.1136398876704, 43.64936609033637], Ranking: 1
  2. Predict the objectives for each schedule in the validation set using the regressor model. Measure prediction time.
def predict_objective(neighbors):
    neighbors_array = [np.array(neighbor) for neighbor in neighbors] # Convert schedules to a NumPy array
    neighbors_array = np.vstack(neighbors_array)
    predictions = regressor.predict(neighbors_array)
    return predictions

# Start time measurement for the prediction
start = time.time()
predictions = regressor.predict(test_schedules)
end = time.time()
prediction_time = end - start
print(f"Prediction time: {prediction_time},\nNumber of predicted schedules: {len(predictions)}\n")

# Calculate the rankings based on the predicted objectives
predictions = [predict_objective(neighbors) for neighbors in test_neighbors]
pred_rankings = [np.argmin(objectives) for objectives in predictions]
for i in range(6):
    print(f"Neighbors: {test_neighbors[i]},\nPredictions: {predictions[i]}, Ranking: {pred_rankings[i]}\n")
Prediction time: 0.0039348602294921875,
Number of predicted schedules: 1000

Neighbors: ([6, 2, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0], [6, 2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]),
Predictions: [45.412872 38.887188], Ranking: 1

Neighbors: ([6, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], [6, 1, 2, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]),
Predictions: [35.4165  40.10864], Ranking: 0

Neighbors: ([4, 4, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 3, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
Predictions: [40.62893  29.863707], Ranking: 1

Neighbors: ([5, 4, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [6, 4, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
Predictions: [50.934414 59.48574 ], Ranking: 0

Neighbors: ([7, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [6, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]),
Predictions: [46.498203 34.146675], Ranking: 1

Neighbors: ([8, 0, 2, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
Predictions: [57.351826 43.583366], Ranking: 1
  3. Calculate opaqueness and accuracy by comparing true and predicted rankings.

Opaqueness is calculated using the formula for entropy:

\[ H(X) = - \sum_{i} p(x_i) \log_b p(x_i) \]

Where in our case:

  • \(H(X)\) is the opaqueness of the random variable \(X\) - the set of predicted normalized objective values for each of the paired schedules,
  • \(p(x_i)\) is the normalized predicted objective value \(x_i\), interpreted as a probability,
  • \(\log_b\) is the logarithm with base \(b\) (here \(\log_2\) as we have two predicted values),
  • The sum is taken over all possible outcomes of \(X\).
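A minimal sketch of such an entropy calculation (the experiment uses calculate_opaqueness from functions; the implementation below is an assumption, taking the normalized pair of predicted objective values as input):

import numpy as np

def entropy_opaqueness(p, eps=1e-12):
    """Shannon entropy (in bits) of a normalized vector p whose entries sum to 1."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p + eps)))

print(entropy_opaqueness([0.5, 0.5]))  # 1.0: maximally opaque, the two predictions are indistinguishable
print(entropy_opaqueness([0.9, 0.1]))  # ~0.47: the model separates the pair more clearly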
from functions import calculate_opaqueness

errors = np.abs(np.array(test_rankings) - pred_rankings)
accuracy = 1 - errors.mean()
print(f"Accuracy = {accuracy}")

# Calculate the opaqueness of each prediction
normalised_predictions = [prediction / np.sum(prediction) for prediction in predictions]
opaqueness = [calculate_opaqueness(vector) for vector in normalised_predictions]
Accuracy = 0.954
predicted_values_left = [prediction[0] for prediction in predictions]
df = pd.DataFrame({"Opaqueness": opaqueness, "Error": errors, "Predictions": predictions}).sort_values(by="Opaqueness")
df['Cumulative error rate'] = df['Error'].expanding().mean()
# Calculate cumulative accuracy
df['Cumulative accuracy'] = 1 - df['Cumulative error rate']
print(df.head())

# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["Opaqueness"], y=df["Error"],
                    mode="markers",
                    name="Error",
                    marker=dict(size=9),
                    text=[f'{prediction}' for prediction in df["Predictions"]],))
fig.add_trace(go.Scatter(x=df["Opaqueness"], y=df["Cumulative accuracy"],
                    mode="lines",
                    name="Cum. accuracy",
                    line = dict(width = 3, dash = 'dash')))
fig.update_layout(
    title={
        'text': f"Error vs Opaqueness</br></br><sub>n={num_test_schedules}</sub>",
        'y': 0.95,  # Keep the title slightly higher
        'x': 0.02,
        'xanchor': 'left',
        'yanchor': 'top'
    },
    xaxis_title="Opaqueness",
    yaxis_title="Error / Accuracy",
    hoverlabel=dict(font=dict(color='white')),
    margin=dict(t=70)  # Add more space at the top of the chart
)
fig.show()
fig.write_html("images/objective-results.html")
     Opaqueness  Error             Predictions  Cumulative error rate  \
639    0.941917      0  [31.386139, 17.584324]                    0.0   
967    0.956972      0   [34.25956, 20.863909]                    0.0   
398    0.963658      0  [26.794538, 17.004993]                    0.0   
940    0.963955      0  [16.938108, 26.638329]                    0.0   
79     0.965459      0  [30.839458, 19.802292]                    0.0   

     Cumulative accuracy  
639                  1.0  
967                  1.0  
398                  1.0  
940                  1.0  
79                   1.0  

2.5 Results

We wanted to test whether an XGBoost regressor model could be used to assess the objective values of schedules. For performance benchmarking we use the conventional calculation method utilizing Lindley recursions.

We trained the XGBoost regressor model with a limited set of features (schedules) and labels (objectives). The total number of possible schedules is approximately 51.9 million. For training and validation, we sampled 20,000 schedules and generated a neighbor for each, giving 40,000 schedule-objective pairs.

The model demonstrates strong and consistent performance, with high predictive accuracy on both the training and test sets, good generalization, and stability. Total training time for the final model was 3.8139 seconds. Evaluating 1000 test schedules took 0.0039 seconds with the XGBoost model versus 0.8681 seconds with the conventional method, a speedup of roughly 220x.

2.6 Discussion

training_time = round(training_time, 4)
conventional_time = round(evaluation_time, 4)
xgboost_time = round(prediction_time, 4)

# Define time values for plotting (from 0 to just past the training time)
time_values = np.linspace(0, training_time+0.1, 1000)

# Regressor model: no evaluations until training has finished, then predictions at the measured rate
method1_evaluations = np.where(time_values >= training_time, (time_values - training_time) / xgboost_time * 1000, 0)

# Conventional method: evaluations accumulate from t = 0 at the measured rate
method2_evaluations = time_values / conventional_time * 1000

# Create line chart
fig = go.Figure()

# Add method 1 trace
fig.add_trace(go.Scatter(x=time_values, y=method1_evaluations, mode='lines', name='Regressor model'))

# Add method 2 trace
fig.add_trace(go.Scatter(x=time_values, y=method2_evaluations, mode='lines', name='Conventional method'))

# Update layout
fig.update_layout(
    title="Speed comparison between XGBoost regressor model and conventional method",
    xaxis_title="Time (seconds)",
    yaxis_title="Number of Evaluations",
    legend_title="Methods",
    template="plotly_white"
)

fig.show()
fig.write_html("images/objectives-speed.html")

2.7 Timeline

This experiment was started on 30-08-2024. The expected completion date is 09-09-2024.
