Idea from Joost: cardinal analysis - is schedule A better than B? -> Yes/No
Experiment: Application of ML models to calculate loss values
In this experiment we try out several machine learning models to predict the value of a loss function for a given schedule with T intervals of length d. A schedule with N patients can be defined as a vector
x = (x_1, x_2, \ldots, x_T) \in \mathbb{N}_0^T
\text{ such that } \sum_{t=1}^{T} x_t = N.
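For example, with T = 3 and N = 3 the vector x = (2, 1, 0) is a valid schedule: two patients are booked at the start of the first interval, one at the start of the second, and none in the third (this is the schedule used in the setup example below).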
Waiting times are calculated using a Lindley recursion:
W_{1, t} = \max(0, W_{1, t-1} + V_{t-1} - d)
W_{n, t} = W_{1,t} * S^{(n-1)}
where W_{n, t} is the waiting time distribution of patient n in interval t, V_t is the distribution of the total amount of work in interval t, S^{(k)} is the k-fold convolution of the service time distribution S, and W * S denotes the convolution of the waiting time distribution W with S.
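To make the recursion concrete, below is a minimal sketch of one Lindley step for discrete distributions. Representing distributions as numpy arrays of probability masses and the helper names are illustrative assumptions; this is not the implementation used inside the Schedule class.

import numpy as np

def max0_shift(pmf, d):
    # max(0, X - d) for a pmf on {0, 1, 2, ...}: mass at or below d collapses
    # onto 0, the remaining mass shifts d steps to the left.
    out = np.zeros(max(len(pmf) - d, 1))
    out[0] = pmf[:d + 1].sum()
    if len(pmf) > d + 1:
        out[1:len(pmf) - d] = pmf[d + 1:]
    return out

def lindley_interval(w1_prev, v_prev, s, n_patients, d):
    # W_{1,t} = max(0, W_{1,t-1} + V_{t-1} - d); the sum of independent
    # variables has the convolution of their pmfs as its pmf.
    w1 = max0_shift(np.convolve(w1_prev, v_prev), d)
    dists = [w1]
    # W_{n,t} = W_{1,t} * S^{(n-1)}, built up one convolution at a time.
    for _ in range(1, n_patients):
        dists.append(np.convolve(dists[-1], s))
    return dists

# Example: an empty first interval followed by two patients in interval t.
s = np.array([0.1, 0.3, 0.25, 0.2, 0.15])   # service-time pmf used in the setup below
point_mass_zero = np.array([1.0])
dists = lindley_interval(point_mass_zero, point_mass_zero, s, n_patients=2, d=3)
ew = [float(np.arange(len(w)) @ w) for w in dists]   # expected waiting time per patient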
We are interested in calculating, for each schedule x, the expected total waiting time and overtime.
The loss function is a weighted average of the expected waiting time and overtime:
C(x) = \alpha_W E(W(x)) + \alpha_L E(L(x))
Directly mapping schedules to loss values would significantly reduce the practical applicability of our method. The weights in the loss function are subjective values that reflect the perceived relative importance of their corresponding factors. Consequently, each time the weights are adjusted, the model would need to be retrained.
A more effective approach would be to train separate models for each factor in the loss function. Once we predict the levels of each factor, we can input them into the loss function using the desired weights. In this experiment we will even go a step further and try to predict the expected waiting times in each interval instead of the aggregated values for the whole schedule.
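As a sketch of how this decomposition would be used in practice: once the factor levels have been predicted, evaluating the loss is a cheap post-processing step that can be repeated for any choice of weights without retraining. The function and variable names below are placeholders, not part of the notebook's code.

def weighted_loss(ew_per_interval, expected_overtime, alpha_W, alpha_L):
    # C(x) = alpha_W * E(W(x)) + alpha_L * E(L(x)); E(W(x)) is obtained here by
    # summing the predicted per-interval expected waiting times.
    return alpha_W * sum(ew_per_interval) + alpha_L * expected_overtime

# The same predictions can be re-scored under different weightings, e.g.:
# weighted_loss(pred_ew, pred_overtime, alpha_W=0.5, alpha_L=0.5)
# weighted_loss(pred_ew, pred_overtime, alpha_W=0.8, alpha_L=0.2)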
Setup
Load all packages, the schedule class and selected functions.
from schedule_class import Schedule, generate_schedules
from functions import generate_all_schedules
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Example usage:
x = np.array([2, 1, 0])
d = 3
s = [0.1, 0.3, 0.25, 0.2, 0.15]
q = 0.0
omega = 0.5
schedule = Schedule(x, d, s, q, omega)
print(schedule)
Runner functions for creating and running a schedule instance.
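The helper run_schedule used in the next cell comes from these runner functions. A minimal sketch of what it is assumed to do is shown here; it assumes that constructing a Schedule computes the system states, including the per-interval expected waiting times in schedule.system['ew']. The actual implementation in the notebook may differ.

# Hypothetical sketch of the runner helper, not the notebook's actual code.
def run_schedule(x, d, s, q, omega, verbose=False):
    schedule = Schedule(x, d, s, q, omega)  # assumed to compute the system states
    if verbose:
        print(schedule)
    return schedule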
Generate a dataset for training and testing containing all schedules with N \in \{1, \dots, 18\} patients and the corresponding expected waiting times, aggregated per interval.
N = 18
data = {'x0': [], 'x1': [], 'x2': [], 'ew0': [], 'ew1': [], 'ew2': []}
df = pd.DataFrame.from_dict(data)
for n in range(1, N + 1):
    schedules = generate_schedules(n)  # Generates all possible schedules with T hard-coded to 3
    for schedule in schedules:
        x = np.array(schedule, dtype=np.int64)
        sch = run_schedule(x, d, s, q, omega, False)
        x0, x1, x2 = x
        data['x0'].append(x0)
        data['x1'].append(x1)
        data['x2'].append(x2)
        data['ew0'].append(sch.system['ew'][0])
        data['ew1'].append(sch.system['ew'][1])
        data['ew2'].append(sch.system['ew'][2])
    # Convert the current data dictionary to a DataFrame and append it to the main DataFrame
    temp_df = pd.DataFrame.from_dict(data)
    df = pd.concat([df, temp_df])
    data = {'x0': [], 'x1': [], 'x2': [], 'ew0': [], 'ew1': [], 'ew2': []}
df
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split data into features and labels
X = df[['x0', 'x1', 'x2']]
y1 = df['ew0']
y2 = df['ew1']
y3 = df['ew2']

# Split into training and testing datasets
X_train, X_test, y1_train, y1_test = train_test_split(X, y1, test_size=0.2, random_state=42)
_, _, y2_train, y2_test = train_test_split(X, y2, test_size=0.2, random_state=42)
_, _, y3_train, y3_test = train_test_split(X, y3, test_size=0.2, random_state=42)

# Train models
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'eta': 0.1,
    'verbosity': 1
}
num_round = 100

# Each model only sees the intervals up to and including its own, since by the
# Lindley recursion later arrivals cannot affect earlier waiting times.

# Model for output_1 (ew0), trained on x0 only
dtrain1 = xgb.DMatrix(X_train.iloc[:, 0], label=y1_train)
dtest1 = xgb.DMatrix(X_test.iloc[:, 0], label=y1_test)
model1 = xgb.train(params, dtrain1, num_round)

# Model for output_2 (ew1), trained on x0 and x1
dtrain2 = xgb.DMatrix(X_train.iloc[:, 0:2], label=y2_train)
dtest2 = xgb.DMatrix(X_test.iloc[:, 0:2], label=y2_test)
model2 = xgb.train(params, dtrain2, num_round)

# Model for output_3 (ew2), trained on all three features
dtrain3 = xgb.DMatrix(X_train, label=y3_train)
dtest3 = xgb.DMatrix(X_test, label=y3_test)
model3 = xgb.train(params, dtrain3, num_round)
# Make predictions
preds1 = model1.predict(dtest1)
preds2 = model2.predict(dtest2)
preds3 = model3.predict(dtest3)

# Combine predictions into a single output vector for each input
predictions = pd.DataFrame({
    'output_1': preds1,
    'output_2': preds2,
    'output_3': preds3
})
print(predictions)
mse1 = mean_squared_error(y1_test, preds1)
mse2 = mean_squared_error(y2_test, preds2)
mse3 = mean_squared_error(y3_test, preds3)
print(f'MSE for output_1: {mse1}')
print(f'MSE for output_2: {mse2}')
print(f'MSE for output_3: {mse3}')
MSE for output_1: 4.374190847504203
MSE for output_2: 9.273653843255532
MSE for output_3: 74.13544710299959
# Create a DataFrame with actual and predicted values for comparison
comparison_df = pd.DataFrame({
    'Actual_output_1': y1_test.values,
    'Predicted_output_1': preds1,
    'Actual_output_2': y2_test.values,
    'Predicted_output_2': preds2,
    'Actual_output_3': y3_test.values,
    'Predicted_output_3': preds3
})

# Create scatter plots for each output
fig1 = px.scatter(comparison_df, x='Actual_output_1', y='Predicted_output_1', title='Actual vs Predicted - Output 1')
fig2 = px.scatter(comparison_df, x='Actual_output_2', y='Predicted_output_2', title='Actual vs Predicted - Output 2')
fig3 = px.scatter(comparison_df, x='Actual_output_3', y='Predicted_output_3', title='Actual vs Predicted - Output 3')

# Show the plots
fig1.show()
fig2.show()
fig3.show()
Description of the Setup
Import Necessary Libraries:
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
xgboost: Library for gradient boosting.
train_test_split from sklearn: Function to split data into training and testing sets.
mean_squared_error from sklearn: Function to evaluate the performance of the models.
Prepare Data:
# Split data into features and labels
X = df[['x0', 'x1', 'x2']]
y1 = df['ew0']
y2 = df['ew1']
y3 = df['ew2']
X: Feature matrix containing columns x0, x1, and x2.
y1, y2, y3: Target variables corresponding to different outputs.
Next, a convolutional neural network (CNN) is trained on the same data to predict the expected waiting times of all three intervals jointly.

X = df[['x0', 'x1', 'x2']].values
y = df[['ew0', 'ew1', 'ew2']].values

# Reshape X to have the shape (samples, height, width, channels)
X = X.reshape((X.shape[0], X.shape[1], 1, 1))

# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the model
model = Sequential()
model.add(Input(shape=(3, 1, 1)))
model.add(Conv2D(64, (2, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(3))  # Output layer with 3 units (one for each output value)

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Print model summary
model.summary()
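The model is then trained. The fit call below (also described in the training walkthrough further down) is what produces the History object shown as output:

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=2)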
<keras.src.callbacks.history.History at 0x159ab52b0>
# Evaluate the model
loss = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Loss: {loss}')

# Make predictions
predictions = model.predict(X_test)
Test Loss: 3.7186384201049805
# Create a DataFrame with actual and predicted values for comparison
comparison_df = pd.DataFrame({
    'Actual_output_1': y_test[:, 0],
    'Predicted_output_1': predictions[:, 0],
    'Actual_output_2': y_test[:, 1],
    'Predicted_output_2': predictions[:, 1],
    'Actual_output_3': y_test[:, 2],
    'Predicted_output_3': predictions[:, 2]
})

# Create scatter plots for each output
fig4 = px.scatter(comparison_df, x='Actual_output_1', y='Predicted_output_1', title='Actual vs Predicted - Output 1')
fig5 = px.scatter(comparison_df, x='Actual_output_2', y='Predicted_output_2', title='Actual vs Predicted - Output 2')
fig6 = px.scatter(comparison_df, x='Actual_output_3', y='Predicted_output_3', title='Actual vs Predicted - Output 3')

# Show the plots
fig4.show()
fig5.show()
fig6.show()
Data Preparation
Import Necessary Libraries:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
numpy: Library for numerical computations.
train_test_split from sklearn: Function to split data into training and testing sets.
Input, Sequential, Conv2D, Flatten, Dense from tensorflow.keras: Classes for building and training neural networks.
Adam from tensorflow.keras.optimizers: Optimizer for training the neural network.
Prepare Data:
X = df[['x0', 'x1', 'x2']].values
y = df[['ew0', 'ew1', 'ew2']].values
X: Feature matrix containing columns x0, x1, and x2.
y: Target matrix containing columns ew0, ew1, and ew2.
Reshape Data:
In Convolutional Neural Networks (CNNs), the input data is typically expected to be in a specific shape to allow the convolutional layers to process it correctly. The required shape usually depends on the nature of the data and the framework being used. Here’s a detailed explanation of the reshaping process:
Assuming we have a dataset X where:
Each row corresponds to a different sample.
Each column corresponds to a different feature.
For example, if X has 3 features (x0, x1, x2), the original shape of X would be (samples, 3).
CNNs expect input data in the form of a 4D tensor:
samples: Number of samples in the dataset.
height: Height of the input data (typically the number of rows in an image).
width: Width of the input data (typically the number of columns in an image).
channels: Number of channels in the input data (e.g., 1 for grayscale images, 3 for RGB images).
For this specific case, we want to reshape X to have a shape suitable for a CNN. The new shape is (samples, height, width, channels).
Here’s the code for reshaping X:
# Reshape X to have the shape (samples, height, width, channels)
X = X.reshape((X.shape[0], X.shape[1], 1, 1))
X.shape[0]: This represents the number of samples.
X.shape[1]: This represents the number of features. In this case, each feature will be treated as a separate “row” in the “image”.
1: This represents the width of each “image”. Since each feature is treated as a single “pixel” row, the width is 1.
1: This represents the number of channels. Since the data isn’t color-coded and only consists of numerical values for each feature, we use 1 channel.
Split Data into Training and Testing Sets:
# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
train_test_split splits the data with 80% for training and 20% for testing, using a fixed random seed (random_state=42) for reproducibility.
Model Building
Build the Model:
# Build the model
model = Sequential()
model.add(Input(shape=(3, 1, 1)))
model.add(Conv2D(64, (2, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(3))  # Output layer with 3 units (one for each output value)
Creates a Sequential model.
Adds an Input layer with a shape of (3, 1, 1) to define the input shape of the model, which in this case is a 3x1x1 tensor. This explicitly sets the expected input dimensions for the model, which is useful for ensuring the data is correctly formatted before passing it through subsequent layers.
Adds a Conv2D layer:
The layer will learn 64 different filters.
A kernel size of (2, 1), indicating the height and width of the 2D convolution window.
ReLU (Rectified Linear Unit) activation function is used to introduce non-linearity to the model, helping it learn more complex patterns.
Adds a Flatten layer to convert the 2D output of the convolutional layer to a 1D array, preparing it to be fed into a fully connected (dense) layer. This conversion is necessary because dense layers expect 1D input.
Adds a Dense layer with 64 units (neurons) and ReLU activation function.
Adds an output Dense layer with 3 units (one for each target variable). This layer does not use an activation function, allowing it to produce a wide range of values as output.
Compile the Model:
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
Compiles the model with the Adam optimizer and mean squared error loss function.
Print Model Summary:
# Print model summary
model.summary()
Prints a summary of the model architecture.
Model Training and Evaluation
Train the Model:
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=2)
Trains the model for 50 epochs with a batch size of 1. The verbose=2 option provides detailed logs during training.
Evaluate the Model:
# Evaluate the model
loss = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Loss: {loss}')
Evaluates the model on the test data and prints the test loss.
Make Predictions:
# Make predictions
predictions = model.predict(X_test)
Makes predictions on the test data.
Several options are available for adjusting the model and its training:
Model Architecture:
Layer Types: Add or remove layers such as additional Conv2D, MaxPooling2D, Dense, or other layers to change the network’s complexity.
Layer Parameters: Adjust the number of filters in Conv2D, the size of kernels, or the number of units in Dense layers.
Activation Functions:
Change activation functions (e.g., relu, sigmoid, tanh, softmax) to experiment with different non-linearities.
Optimizer:
Change the optimizer (e.g., SGD, RMSprop, Adam, Adagrad). Adjust learning rates and other optimizer-specific parameters.
Loss Function:
Use different loss functions (e.g., mean_absolute_error, mean_squared_logarithmic_error) depending on the specific task and data distribution.
Regularization:
Add regularization techniques such as Dropout layers, L2 regularization (kernel_regularizer=l2()), or early stopping during training to prevent overfitting.
Batch Size and Epochs:
Adjust the batch_size and epochs to change the training dynamics. Larger batch sizes can provide more stable updates, while more epochs allow the model to learn more but can lead to overfitting.
Input Data Shape:
Reshape the input data differently if experimenting with other network architectures that expect different input shapes.
Evaluation Metrics:
Monitor different metrics during training and validation (e.g., mean_absolute_error, r2_score) for a better understanding of model performance.
Adjusting these options allows for fine-tuning the neural network model to better fit the data and improve performance; a sketch of one such variant is shown below.
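The following is one illustrative variant that combines several of these options (Dropout, L2 regularization, a different optimizer, an extra metric and early stopping). The specific values are untuned placeholders, not recommended settings.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

# Variant of the CNN with regularization and a tunable optimizer
variant = Sequential([
    Input(shape=(3, 1, 1)),
    Conv2D(32, (2, 1), activation='relu', kernel_regularizer=l2(1e-4)),
    Flatten(),
    Dense(64, activation='relu', kernel_regularizer=l2(1e-4)),
    Dropout(0.2),
    Dense(3)
])
variant.compile(optimizer=RMSprop(learning_rate=1e-3),
                loss='mean_squared_error',
                metrics=['mean_absolute_error'])

# Stop training when the validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# variant.fit(X_train, y_train, validation_split=0.2, epochs=200, batch_size=16,
#             callbacks=[early_stop], verbose=0)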
# Create a 2x3 subplot
title_txt = "Performance comparison of XGBoost vs Convolutional Neural Network<br><br><sub>Label: Expected waiting times</sub>"
subplot_titles = ('Interval 1', 'Interval 2', 'Interval 3', 'Interval 1', 'Interval 2', 'Interval 3')
fig = make_subplots(rows=2, cols=3, subplot_titles=subplot_titles)

# Add each scatter plot to the subplot
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.add_trace(fig3.data[0], row=1, col=3)
fig.add_trace(fig4.data[0], row=2, col=1)
fig.add_trace(fig5.data[0], row=2, col=2)
fig.add_trace(fig6.data[0], row=2, col=3)

# Add annotations for model labels
fig.add_annotation(dict(font=dict(size=16), x=0.5, y=1.14, showarrow=False, text="XGBoost", xref="paper", yref="paper"))
fig.add_annotation(dict(font=dict(size=16), x=0.5, y=0.48, showarrow=False, text="CNN", xref="paper", yref="paper"))

# Add shared axis labels
fig.add_annotation(dict(font=dict(size=16), x=0.5, y=-0.1, showarrow=False, text="Actual", xref="paper", yref="paper"))
fig.add_annotation(dict(font=dict(size=16), x=-0.1, y=0.5, showarrow=False, text="Predicted", textangle=-90, xref="paper", yref="paper"))

# Update layout to add more left margin and increase top margin
fig.update_layout(margin=dict(l=150, r=20, t=180, b=100), height=800, width=1200, title_text=title_txt, title_y=0.95)

# Show the combined plot
fig.show()