2024-06-27

Author

Witek ten Hove

OBP:

Agreements:

We will look further into XGBoost.
Data: optimal schedules; labels: waiting times and tardiness.

Idea from Joost: cardinal analysis - is schedule A better than B? -> Yes/No

Experiment: Application of ML models to calculate loss values

In this experiment we will try out several Machine Learning models to predict the outcome of a loss function from a given schedule with T intervals of length d. A schedule with N patients can be defined as a vector

x = (x_1, x_2, \ldots, x_T) \in \mathbb{N}_0^T \text{ such that } \sum_{t=1}^{T} x_t = N.

Waiting times are calculated using a Lindley recursion:

W_{1, t} = \max(0, W_{1, t-1} + V_{t-1} - d)

W_{n, t} = W_{1,t} * S^{(n-1)}

where W_{n, t} is the waiting time distribution of patient n in interval t, V_{t} is the distribution of the total amount of work in interval t (so V_{T} refers to the last interval), S^{(k)} is the k-fold convolution of the service time distribution S, and W*S is the convolution of the waiting time distribution W with S.
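
The recursion can be sketched numerically for discrete distributions stored as probability vectors (index k holds P(X = k)). This is an illustration, not the `Schedule` class used later: the helper name `lindley_step` is hypothetical, and the sketch assumes independent service times, no no-shows (q = 0), and that the work V_t in an interval is the x_t-fold convolution of S.

```python
import numpy as np

def lindley_step(w_prev, v_prev, d):
    """Return the distribution of max(0, W + V - d) for independent W and V.

    Distributions are probability vectors: index k holds P(X = k).
    """
    total = np.convolve(w_prev, v_prev)       # distribution of W + V
    out = np.zeros(max(len(total) - d, 1))
    out[0] = total[: d + 1].sum()             # outcomes <= d leave no backlog
    out[1:] = total[d + 1:]                   # outcome d + j leaves backlog j
    return out

s = np.array([0.1, 0.3, 0.25, 0.2, 0.15])     # service time distribution S
d = 3                                         # interval length

w_11 = np.array([1.0])                        # first patient, first interval: no wait
w_21 = np.convolve(w_11, s)                   # second patient waits for one service: W * S
v_1 = np.convolve(s, s)                       # work of two patients in interval 1: S^(2)
w_12 = lindley_step(w_11, v_1, d)             # first patient of interval 2

print(w_21)
print(w_12)
```

For the example schedule x = (2, 1, 0), the values of `v_1` coincide with the nonzero entries of `p_plus` in the printed `Schedule` output in the Setup section.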

We are interested in calculating for each schedule x the expected total waiting time and overtime

E(W(x)) = \sum_{t=1}^{T} \sum_{n=1}^{x_t} E(W_{n, t})

E(L(x)) = E(\max(0, W_{1, T} + V_{T} - d)).

The loss function is a weighted sum of expected waiting time and overtime:

C(x) = \alpha_W E(W(x)) + \alpha_L E(L(x))
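
The three formulas above can be evaluated end to end with a short sketch. The function names (`fold`, `mean`, `evaluate_schedule`) are hypothetical and the weights 0.5/0.5 are placeholders; the sketch assumes no no-shows (q = 0) and is not the `Schedule` class used in the experiment.

```python
import numpy as np

def fold(dist, d):
    """Distribution of max(0, X - d) for X given as a probability vector."""
    out = np.zeros(max(len(dist) - d, 1))
    out[0] = dist[: d + 1].sum()
    out[1:] = dist[d + 1:]
    return out

def mean(dist):
    """Expected value of a probability vector (index k holds P(X = k))."""
    return float(np.arange(len(dist)) @ dist)

def evaluate_schedule(x, s, d, alpha_w, alpha_l):
    """Return (per-interval E(W), E(L), C(x)) for schedule x."""
    s = np.asarray(s, dtype=float)
    w = np.array([1.0])                  # W_{1,1}: the system starts empty
    ew = []
    for x_t in x:
        ew_t = 0.0
        for _ in range(int(x_t)):
            ew_t += mean(w)              # E(W_{n,t}) for the current patient
            w = np.convolve(w, s)        # the next patient also waits for this service
        ew.append(ew_t)
        w = fold(w, d)                   # carry-over: max(0, W_{1,t} + V_t - d)
    el = mean(w)                         # after interval T this is the overtime L
    cost = alpha_w * sum(ew) + alpha_l * el
    return ew, el, cost

s = [0.1, 0.3, 0.25, 0.2, 0.15]
ew, el, cost = evaluate_schedule([0, 1, 1], s, d=3, alpha_w=0.5, alpha_l=0.5)
print(ew, el, cost)
```

Under these assumptions the per-interval waiting times agree with rows of the dataset generated further below, for example ew2 = 0.15 for x = (0, 1, 1) and ew0 = 306 for x = (18, 0, 0).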

Directly mapping schedules to loss values would significantly reduce the practical applicability of our method. The weights in the loss function are subjective values that reflect the perceived relative importance of their corresponding factors. Consequently, each time the weights are adjusted, the model would need to be retrained.

A more effective approach would be to train separate models for each factor in the loss function. Once we predict the levels of each factor, we can input them into the loss function using the desired weights. In this experiment we will even go a step further and try to predict the expected waiting times in each interval instead of the aggregated values for the whole schedule.
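
A minimal sketch of that idea: once the components are predicted, the loss can be recomputed for any choice of weights without retraining. The function name and the numbers are hypothetical, and an overtime prediction is assumed to be available even though this experiment only predicts waiting times.

```python
def loss_from_components(ew_per_interval, expected_overtime, alpha_w, alpha_l):
    """Combine component predictions into C(x) = alpha_W E(W) + alpha_L E(L)."""
    return alpha_w * sum(ew_per_interval) + alpha_l * expected_overtime

# Hypothetical model outputs for one schedule (illustrative numbers only)
pred_ew = [0.0, 0.0, 0.15]
pred_el = 0.2

# The same predictions serve every weighting; no model is refitted here
for alpha_w in (0.2, 0.5, 0.8):
    alpha_l = 1 - alpha_w
    print(alpha_w, loss_from_components(pred_ew, pred_el, alpha_w, alpha_l))
```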

Setup

Load all packages, the schedule class and selected functions.

from schedule_class import Schedule, generate_schedules
from functions import generate_all_schedules
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Example usage:
x = np.array([2, 1, 0])
d = 3
s = [0.1, 0.3, 0.25, 0.2, 0.15]
q = 0.0
omega = 0.5

schedule = Schedule(x, d, s, q, omega)
print(schedule)

Zero arrays are: [[array([0., 0., 0., 0.])], [], [array([0., 0., 0., 0.]), array([0., 0., 0., 0.]), array([0., 0., 0., 0.])]]
p_min = [array([1., 0., 0., 0., 0.])] 
w = [[array([1., 0., 0., 0., 0.]), array([0.1 , 0.3 , 0.25, 0.2 , 0.15, 0.  , 0.  , 0.  , 0.  ])], [], []] 
p_plus = [array([0.01  , 0.06  , 0.14  , 0.19  , 0.2125, 0.19  , 0.115 , 0.06  ,
       0.0225, 0.    , 0.    , 0.    , 0.    ])] 
ew = [] 
tardiness = [1. 0. 0. 0. 0.] 
loss = None


Runner functions for creating and running a schedule instance.

def run_schedule(x, d, s, q, omega, print_system=True):
    schedule = Schedule(x=x, d=d, s=s, q=q, omega=omega)
    schedule.calculate_system_states(until=len(x))
    schedule.calculate_wait_times()
    schedule.calculate_loss()
    if print_system:
        print(schedule)
    return schedule

Generate a dataset for training and testing of various schedules with N \in \{1, \dots, 18\} and corresponding aggregated expected waiting times in each interval.

N = 18
data = {'x0': [], 'x1': [], 'x2': [], 'ew0': [], 'ew1': [], 'ew2': []}
df = pd.DataFrame.from_dict(data)

for n in range(1, N+1):
    schedules = generate_schedules(n) # Generates all possible schedules with T hard-coded 3
    for schedule in schedules:
      x = np.array(schedule, dtype=np.int64)
      sch = run_schedule(x, d, s, q, omega, False)
      x0, x1, x2 = x
      data['x0'].append(x0)
      data['x1'].append(x1)
      data['x2'].append(x2)
      data['ew0'].append(sch.system['ew'][0])
      data['ew1'].append(sch.system['ew'][1])
      data['ew2'].append(sch.system['ew'][2])
      
      # Convert the current data dictionary to a DataFrame and append it to the main DataFrame
      temp_df = pd.DataFrame.from_dict(data)
      df = pd.concat([df, temp_df])
      data = {'x0': [], 'x1': [], 'x2': [], 'ew0': [], 'ew1': [], 'ew2': []}

df

	x0	x1	x2	ew0	ew1	ew2
0	0.0	0.0	1.0	0.0	0.0	0.00
0	0.0	1.0	0.0	0.0	0.0	0.00
0	1.0	0.0	0.0	0.0	0.0	0.00
0	0.0	0.0	2.0	0.0	0.0	2.00
0	0.0	1.0	1.0	0.0	0.0	0.15
...	...	...	...	...	...	...
0	16.0	1.0	1.0	240.0	29.0	28.00
0	16.0	2.0	0.0	240.0	60.0	0.00
0	17.0	0.0	1.0	272.0	0.0	28.00
0	17.0	1.0	0.0	272.0	31.0	0.00
0	18.0	0.0	0.0	306.0	0.0	0.00

1329 rows × 6 columns
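
The row count can be checked independently. A schedule with T = 3 intervals and n patients is a composition of n into three non-negative parts, and there are C(n + 2, 2) of those. The brute-force helper below is a hypothetical stand-in for illustration, not the `generate_schedules` function imported above.

```python
from itertools import product
from math import comb

def enumerate_schedules(n, T=3):
    """All vectors of T non-negative integers that sum to n (brute force)."""
    return [x for x in product(range(n + 1), repeat=T) if sum(x) == n]

counts = [len(enumerate_schedules(n)) for n in range(1, 19)]
closed_form = [comb(n + 2, 2) for n in range(1, 19)]   # stars and bars for T = 3

print(sum(counts))   # 1329, matching the number of rows in df
```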

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split data into features and labels
X = df[['x0', 'x1', 'x2']]
y1 = df['ew0']
y2 = df['ew1']
y3 = df['ew2']

# Split into training and testing datasets
X_train, X_test, y1_train, y1_test = train_test_split(X, y1, test_size=0.2, random_state=42)
_, _, y2_train, y2_test = train_test_split(X, y2, test_size=0.2, random_state=42)
_, _, y3_train, y3_test = train_test_split(X, y3, test_size=0.2, random_state=42)

# Train models
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'eta': 0.1,
    'verbosity': 1
}
num_round = 100

# Model for output_1
dtrain1 = xgb.DMatrix(X_train.iloc[:, 0], label=y1_train)
dtest1 = xgb.DMatrix(X_test.iloc[:, 0], label=y1_test)
model1 = xgb.train(params, dtrain1, num_round)

# Model for output_2
dtrain2 = xgb.DMatrix(X_train.iloc[:, 0:2], label=y2_train)
dtest2 = xgb.DMatrix(X_test.iloc[:, 0:2], label=y2_test)
model2 = xgb.train(params, dtrain2, num_round)

# Model for output_3
dtrain3 = xgb.DMatrix(X_train, label=y3_train)
dtest3 = xgb.DMatrix(X_test, label=y3_test)
model3 = xgb.train(params, dtrain3, num_round)

# Make predictions
preds1 = model1.predict(dtest1)
preds2 = model2.predict(dtest2)
preds3 = model3.predict(dtest3)

# Combine predictions into a single output vector for each input
predictions = pd.DataFrame({
    'output_1': preds1,
    'output_2': preds2,
    'output_3': preds3
})

print(predictions)

       output_1    output_2    output_3
0     19.999617   28.139517  131.985245
1     19.999617   17.529625   -4.471128
2     29.999422   30.009802   65.133102
3      0.002860   -1.054563   48.786777
4      0.002860   -1.054563    9.816395
..          ...         ...         ...
261    2.000951  176.956543   26.927626
262  109.995979   21.386503   27.181793
263    6.000697   15.429869  166.299469
264    0.002860   43.581989  101.111626
265   12.000317   44.678936  168.129898

[266 rows x 3 columns]
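
Some predictions above are negative (for example -1.054563 and -4.471128), although expected waiting times cannot be below zero. One possible post-processing step, not applied in this experiment, is to clip predictions at zero. The arrays below are illustrative values, not the notebook's data.

```python
import numpy as np

# Illustrative values only: true waits are non-negative, predictions may not be
y_true = np.array([0.0, 0.0, 2.0, 0.15])
y_pred = np.array([-1.05, 0.003, 2.4, -4.47])

y_clipped = np.clip(y_pred, 0.0, None)      # replace negative predictions by 0

mse_raw = np.mean((y_true - y_pred) ** 2)
mse_clipped = np.mean((y_true - y_clipped) ** 2)
print(mse_raw, mse_clipped)
```

Because the true values are non-negative, clipping can only move a negative prediction closer to its target, so the clipped MSE is never larger than the raw MSE.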

mse1 = mean_squared_error(y1_test, preds1)
mse2 = mean_squared_error(y2_test, preds2)
mse3 = mean_squared_error(y3_test, preds3)

print(f'MSE for output_1: {mse1}')
print(f'MSE for output_2: {mse2}')
print(f'MSE for output_3: {mse3}')

MSE for output_1: 4.374190847504203
MSE for output_2: 9.273653843255532
MSE for output_3: 74.13544710299959
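
MSE is expressed in squared time units. Taking the square root gives the RMSE, which is in the same units as the waiting times and is easier to compare against the label values. The sketch below simply reuses the three MSE values printed above.

```python
import math

mse = {'output_1': 4.374190847504203,
       'output_2': 9.273653843255532,
       'output_3': 74.13544710299959}

# RMSE is the square root of the MSE
rmse = {name: math.sqrt(value) for name, value in mse.items()}
for name, value in rmse.items():
    print(f'RMSE for {name}: {value:.2f}')
```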

# Create a DataFrame with actual and predicted values for comparison
comparison_df = pd.DataFrame({
    'Actual_output_1': y1_test.values,
    'Predicted_output_1': preds1,
    'Actual_output_2': y2_test.values,
    'Predicted_output_2': preds2,
    'Actual_output_3': y3_test.values,
    'Predicted_output_3': preds3
})

# Create scatter plots for each output
fig1 = px.scatter(comparison_df, x='Actual_output_1', y='Predicted_output_1', title='Actual vs Predicted - Output 1')
fig2 = px.scatter(comparison_df, x='Actual_output_2', y='Predicted_output_2', title='Actual vs Predicted - Output 2')
fig3 = px.scatter(comparison_df, x='Actual_output_3', y='Predicted_output_3', title='Actual vs Predicted - Output 3')

# Show the plots
fig1.show()
fig2.show()
fig3.show()

Description of the Setup

Import Necessary Libraries:
```
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```
- xgboost: Library for gradient boosting.
- train_test_split from sklearn: Function to split data into training and testing sets.
- mean_squared_error from sklearn: Function to evaluate the performance of the models.
Prepare Data:
```
# Split data into features and labels
X = df[['x0', 'x1', 'x2']]
y1 = df['ew0']
y2 = df['ew1']
y3 = df['ew2']
```
- X: Feature matrix containing columns x0, x1, and x2.
- y1, y2, y3: Target variables corresponding to different outputs.

Split Data into Training and Testing Sets:

```
# Split into training and testing datasets
X_train, X_test, y1_train, y1_test = train_test_split(X, y1, test_size=0.2, random_state=42)
_, _, y2_train, y2_test = train_test_split(X, y2, test_size=0.2, random_state=42)
_, _, y3_train, y3_test = train_test_split(X, y3, test_size=0.2, random_state=42)
```

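- train_test_split splits the data with 80% for training and 20% for testing, using a fixed random seed (random_state=42) for reproducibility.
- Splits are performed separately for y1, y2, and y3. Because every call uses the same X, test_size, and random_state, all three calls select the same rows, so the X_train and X_test returned by the first call line up with each set of labels.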
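
As an alternative sketch, `train_test_split` accepts any number of arrays and applies one row partition to all of them, which removes the need for three separate calls. The stand-in data below only mimics the shape of the dataset (1329 rows); it is not the notebook's `df`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the dataset (1329)
n = 1329
X = np.arange(n * 3).reshape(n, 3)
y1, y2, y3 = X[:, 0], X[:, 1], X[:, 2]

# One call splits all arrays with the same row partition
(X_train, X_test,
 y1_train, y1_test,
 y2_train, y2_test,
 y3_train, y3_test) = train_test_split(X, y1, y2, y3, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
```

With 1329 rows and test_size=0.2 this yields 1063 training rows and 266 test rows, the same sizes that appear in the outputs of this notebook.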

Define XGBoost Parameters:
```
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'eta': 0.1,
    'verbosity': 1
}
num_round = 100
```
- params: Dictionary of parameters for XGBoost.
  - objective: Specifies the learning task and the corresponding objective. Here, it’s set to regression with squared error.
  - max_depth: Maximum depth of a tree.
  - eta: Learning rate.
  - verbosity: Verbosity of printed messages (0 = silent, 1 = warning, 2 = info, 3 = debug).
- num_round: Number of boosting rounds.

Train Models:

For output_1:

```
dtrain1 = xgb.DMatrix(X_train.iloc[:, 0], label=y1_train)
dtest1 = xgb.DMatrix(X_test.iloc[:, 0], label=y1_test)
model1 = xgb.train(params, dtrain1, num_round)
```
- dtrain1, dtest1: DMatrix objects for training and testing with only the first feature column (x0). Waiting times in the first interval cannot depend on patients scheduled later, so x1 and x2 are left out.
- model1: Trained model for y1.

For output_2:

```
dtrain2 = xgb.DMatrix(X_train.iloc[:, 0:2], label=y2_train)
dtest2 = xgb.DMatrix(X_test.iloc[:, 0:2], label=y2_test)
model2 = xgb.train(params, dtrain2, num_round)
```
- dtrain2, dtest2: DMatrix objects for training and testing with the first two feature columns (x0 and x1). Waiting times in the second interval depend on the backlog from the first interval and on x1, but not on x2.
- model2: Trained model for y2.

For output_3:

```
dtrain3 = xgb.DMatrix(X_train, label=y3_train)
dtest3 = xgb.DMatrix(X_test, label=y3_test)
model3 = xgb.train(params, dtrain3, num_round)
```
- dtrain3, dtest3: DMatrix objects for training and testing with all three feature columns (x0, x1, and x2).
- model3: Trained model for y3.

Learning Task Parameters:
- objective: Defines the loss function to be minimized (e.g., reg:squarederror, binary:logistic, multi:softmax).
Tree Parameters:
- max_depth: Maximum depth of the decision trees. Larger values can lead to overfitting.
- min_child_weight: Minimum sum of instance weight (hessian) needed in a child.
- gamma: Minimum loss reduction required to make a further partition on a leaf node.
Booster Parameters:
- eta (alias: learning_rate): Step size shrinkage used to prevent overfitting.
- subsample: Proportion of training instances to use for each tree. Helps prevent overfitting.
- colsample_bytree: Subsample ratio of columns when constructing each tree.
- lambda (alias: reg_lambda): L2 regularization term on weights.
- alpha (alias: reg_alpha): L1 regularization term on weights.
Learning Task Customization:
- scale_pos_weight: Control the balance of positive and negative weights, useful for unbalanced classes.
- eval_metric: Evaluation metrics to be used (e.g., rmse, logloss, error).
Control Parameters:
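- n_estimators: Number of trees to fit in the scikit-learn wrapper (e.g., XGBRegressor). With the native xgb.train interface used here, the equivalent is the num_boost_round argument (num_round above).
- early_stopping_rounds: Stop training if the evaluation metric doesn’t improve after a given number of rounds. This requires an evaluation set.
Tree Method Parameters:
- tree_method: Algorithm used to construct trees (e.g., auto, exact, approx, hist). GPU training used to be selected with gpu_hist; XGBoost 2.0 and later deprecate it in favor of tree_method='hist' combined with device='cuda'.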
- grow_policy: Controls the growth policy for the trees. depthwise or lossguide.

Adjusting these parameters allows for fine-tuning the XGBoost model to better fit the data and improve performance.

X = df[['x0', 'x1', 'x2']].values
y = df[['ew0', 'ew1', 'ew2']].values

# Reshape X to have the shape (samples, height, width, channels)
X = X.reshape((X.shape[0], X.shape[1], 1, 1))

# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model
model = Sequential()
model.add(Input(shape=(3, 1, 1)))
model.add(Conv2D(64, (2, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(3))  # Output layer with 3 units (one for each output value)

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Print model summary
model.summary()

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 2, 1, 64)       │           192 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 3)              │           195 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 8,643 (33.76 KB)

 Trainable params: 8,643 (33.76 KB)

 Non-trainable params: 0 (0.00 B)

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=2)

Epoch 1/50
1063/1063 - 2s - 2ms/step - loss: 1294.3595
Epoch 2/50
1063/1063 - 1s - 616us/step - loss: 298.1531
Epoch 3/50
1063/1063 - 1s - 748us/step - loss: 234.6372
Epoch 4/50
1063/1063 - 1s - 768us/step - loss: 175.7784
Epoch 5/50
1063/1063 - 1s - 1ms/step - loss: 124.5925
Epoch 6/50
1063/1063 - 1s - 1ms/step - loss: 86.6902
Epoch 7/50
1063/1063 - 1s - 770us/step - loss: 58.3374
Epoch 8/50
1063/1063 - 1s - 726us/step - loss: 41.1547
Epoch 9/50
1063/1063 - 1s - 629us/step - loss: 29.9633
Epoch 10/50
1063/1063 - 1s - 736us/step - loss: 23.1881
Epoch 11/50
1063/1063 - 1s - 720us/step - loss: 20.1258
Epoch 12/50
1063/1063 - 1s - 738us/step - loss: 16.1180
Epoch 13/50
1063/1063 - 1s - 680us/step - loss: 12.8433
Epoch 14/50
1063/1063 - 1s - 617us/step - loss: 12.7816
Epoch 15/50
1063/1063 - 1s - 629us/step - loss: 11.3140
Epoch 16/50
1063/1063 - 1s - 632us/step - loss: 9.8082
Epoch 17/50
1063/1063 - 1s - 847us/step - loss: 9.4677
Epoch 18/50
1063/1063 - 1s - 640us/step - loss: 8.9950
Epoch 19/50
1063/1063 - 1s - 618us/step - loss: 7.7667
Epoch 20/50
1063/1063 - 1s - 637us/step - loss: 8.6389
Epoch 21/50
1063/1063 - 1s - 624us/step - loss: 6.5911
Epoch 22/50
1063/1063 - 1s - 681us/step - loss: 6.8794
Epoch 23/50
1063/1063 - 1s - 635us/step - loss: 6.0032
Epoch 24/50
1063/1063 - 1s - 641us/step - loss: 6.2592
Epoch 25/50
1063/1063 - 1s - 618us/step - loss: 6.0970
Epoch 26/50
1063/1063 - 1s - 737us/step - loss: 5.0670
Epoch 27/50
1063/1063 - 1s - 661us/step - loss: 5.9386
Epoch 28/50
1063/1063 - 1s - 637us/step - loss: 6.0425
Epoch 29/50
1063/1063 - 1s - 612us/step - loss: 6.2369
Epoch 30/50
1063/1063 - 1s - 628us/step - loss: 4.1562
Epoch 31/50
1063/1063 - 1s - 617us/step - loss: 5.4373
Epoch 32/50
1063/1063 - 1s - 640us/step - loss: 5.1314
Epoch 33/50
1063/1063 - 1s - 633us/step - loss: 4.6944
Epoch 34/50
1063/1063 - 1s - 620us/step - loss: 5.2129
Epoch 35/50
1063/1063 - 1s - 701us/step - loss: 4.9801
Epoch 36/50
1063/1063 - 1s - 729us/step - loss: 4.9067
Epoch 37/50
1063/1063 - 1s - 633us/step - loss: 4.1799
Epoch 38/50
1063/1063 - 1s - 635us/step - loss: 5.0204
Epoch 39/50
1063/1063 - 1s - 628us/step - loss: 4.3506
Epoch 40/50
1063/1063 - 1s - 622us/step - loss: 5.0035
Epoch 41/50
1063/1063 - 1s - 630us/step - loss: 3.9933
Epoch 42/50
1063/1063 - 1s - 652us/step - loss: 5.1471
Epoch 43/50
1063/1063 - 1s - 633us/step - loss: 4.3212
Epoch 44/50
1063/1063 - 1s - 631us/step - loss: 3.9416
Epoch 45/50
1063/1063 - 1s - 618us/step - loss: 4.6743
Epoch 46/50
1063/1063 - 1s - 628us/step - loss: 3.8187
Epoch 47/50
1063/1063 - 1s - 649us/step - loss: 4.0055
Epoch 48/50
1063/1063 - 1s - 722us/step - loss: 3.9888
Epoch 49/50
1063/1063 - 1s - 637us/step - loss: 4.2608
Epoch 50/50
1063/1063 - 1s - 634us/step - loss: 4.3603


# Evaluate the model
loss = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Loss: {loss}')

# Make predictions
predictions = model.predict(X_test)

Test Loss: 3.7186384201049805
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step

# Create a DataFrame with actual and predicted values for comparison
comparison_df = pd.DataFrame({
    'Actual_output_1': y_test[:, 0],
    'Predicted_output_1': predictions[:, 0],
    'Actual_output_2': y_test[:, 1],
    'Predicted_output_2': predictions[:, 1],
    'Actual_output_3': y_test[:, 2],
    'Predicted_output_3': predictions[:, 2]
})

# Create scatter plots for each output
fig4 = px.scatter(comparison_df, x='Actual_output_1', y='Predicted_output_1', title='Actual vs Predicted - Output 1')
fig5 = px.scatter(comparison_df, x='Actual_output_2', y='Predicted_output_2', title='Actual vs Predicted - Output 2')
fig6 = px.scatter(comparison_df, x='Actual_output_3', y='Predicted_output_3', title='Actual vs Predicted - Output 3')

# Show the plots
fig4.show()
fig5.show()
fig6.show()

Data Preparation

Import Necessary Libraries:
```
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
```
- numpy: Library for numerical computations.
- train_test_split from sklearn: Function to split data into training and testing sets.
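- Sequential, Input, Conv2D, Flatten, Dense from tensorflow.keras: Classes for building and training neural networks.
- Adam from tensorflow.keras.optimizers: Optimizer for training the neural network. The import is only needed when passing a configured Adam instance; the code here selects the optimizer with the string 'adam'.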
Prepare Data:
```
X = df[['x0', 'x1', 'x2']].values
y = df[['ew0', 'ew1', 'ew2']].values
```
- X: Feature matrix containing columns x0, x1, and x2.
- y: Target matrix containing columns ew0, ew1, and ew2.
Reshape Data:

Convolutional layers expect their input in a specific shape, which depends on the layer type and the framework being used. The reshaping works as follows.

Assuming we have a dataset X where:
- Each row corresponds to a different sample.
- Each column corresponds to a different feature.
For example, if X has 3 features (x0, x1, x2), the original shape of X would be (samples, 3).

Keras Conv2D layers expect input data in the form of a 4D tensor (in the default channels_last layout):
- samples: Number of samples in the dataset.
- height: Height of the input data (typically the number of rows in an image).
- width: Width of the input data (typically the number of columns in an image).
- channels: Number of channels in the input data (e.g., 1 for grayscale images, 3 for RGB images).
For this specific case, we want to reshape X to have a shape suitable for a CNN. The new shape is (samples, height, width, channels).

Here’s the code for reshaping X:
```
# Reshape X to have the shape (samples, height, width, channels)
X = X.reshape((X.shape[0], X.shape[1], 1, 1))
```
- X.shape[0]: This represents the number of samples.
- X.shape[1]: This represents the number of features. In this case, each feature will be treated as a separate “row” in the “image”.
- 1: This represents the width of each “image”. Since each feature is treated as a single “pixel” row, the width is 1.
- 1: This represents the number of channels. Since the data isn’t color-coded and only consists of numerical values for each feature, we use 1 channel.
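
A small check of what the reshape does, using two made-up schedules rather than the full dataset: the values are unchanged and only two trailing axes of length 1 are added.

```python
import numpy as np

X = np.array([[2, 1, 0],
              [0, 1, 1]], dtype=float)          # two schedules, three features

# (samples, features) -> (samples, height, width, channels)
X_4d = X.reshape((X.shape[0], X.shape[1], 1, 1))

print(X_4d.shape)            # (2, 3, 1, 1)
print(X_4d[0, :, 0, 0])      # first schedule, read along the height axis
```

The same result can be obtained with `X[:, :, None, None]`; which form to use is a matter of taste.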
Split Data into Training and Testing Sets:
```
# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
- train_test_split splits the data with 80% for training and 20% for testing, using a fixed random seed (random_state=42) for reproducibility.

Model Building

Build the Model:
```
# Build the model
model = Sequential()
model.add(Input(shape=(3, 1, 1)))
model.add(Conv2D(64, (2, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(3))  # Output layer with 3 units (one for each output value)
```
- Creates a Sequential model.
- Adds an Input layer with shape (3, 1, 1), which fixes the expected dimensions of each sample: 3 rows, 1 column, and 1 channel. Declaring the input shape explicitly helps catch wrongly formatted data before it reaches the subsequent layers.
- Adds a Conv2D layer:
  - The layer will learn 64 different filters.
  - A kernel size of (2, 1), indicating the height and width of the 2D convolution window.
  - ReLU (Rectified Linear Unit) activation function is used to introduce non-linearity to the model, helping it learn more complex patterns.
- Adds a Flatten layer to convert the (2, 1, 64) output of the convolutional layer into a vector of length 128, preparing it to be fed into a fully connected (dense) layer.
- Adds a Dense layer with 64 units (neurons) and ReLU activation function.
- Adds an output Dense layer with 3 units (one for each target variable). This layer does not use an activation function, allowing it to produce a wide range of values as output.
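
To make the shapes concrete, the forward pass of the convolution and flatten steps can be imitated in NumPy. This is a sketch with random, hypothetical weights (not the trained Keras weights), assuming the Keras defaults of 'valid' padding and stride 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([2.0, 1.0, 0.0])             # one schedule (x0, x1, x2)

n_filters = 64
kernel = rng.normal(size=(2, n_filters))  # hypothetical weights: 2 taps per filter
bias = np.zeros(n_filters)

# A (2, 1) kernel on a (3, 1, 1) input slides over two positions:
# (x0, x1) and (x1, x2).
windows = np.stack([x[0:2], x[1:3]])                  # shape (2, 2)
conv_out = np.maximum(windows @ kernel + bias, 0.0)   # ReLU, shape (2, 64)

flat = conv_out.reshape(-1)               # Flatten: 2 * 64 = 128 values
print(conv_out.shape, flat.shape)
```

Each filter therefore sees pairs of adjacent intervals, which is one way for the network to pick up how backlog carries over from one interval to the next.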
Compile the Model:
```
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
```
- Compiles the model with the Adam optimizer and mean squared error loss function.
Print Model Summary:
```
# Print model summary
model.summary()
```
- Prints a summary of the model architecture.
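
The parameter counts reported in the summary can be reproduced by hand from the layer definitions, as in this short arithmetic sketch.

```python
# Conv2D: kernel height * width * input channels * filters, plus one bias per filter
conv_params = (2 * 1 * 1) * 64 + 64

# Flatten has no parameters; its output size is height * width * filters
flatten_size = 2 * 1 * 64

# Dense layers: inputs * units, plus one bias per unit
dense_params = flatten_size * 64 + 64
output_params = 64 * 3 + 3

total = conv_params + dense_params + output_params
print(conv_params, dense_params, output_params, total)   # 192 8256 195 8643
```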

Model Training and Evaluation

Train the Model:
```
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=2)
```
- Trains the model for 50 epochs with a batch size of 1, so each epoch performs one weight update per training sample (1063 steps). The verbose=2 option prints one summary line per epoch instead of a progress bar.

Evaluate the Model:

```
# Evaluate the model
loss = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Loss: {loss}')
```
- Evaluates the model on the test data and prints the test loss.

Make Predictions:

```
# Make predictions
predictions = model.predict(X_test)
```
- Makes predictions on the test data.
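
The Keras mean squared error loss averages over all three outputs, so the single test loss is not directly comparable with the three per-output MSE values reported for XGBoost. A sketch of a per-output comparison follows; the helper name `per_output_mse` and the small arrays are illustrative, and in the notebook the inputs would be `y_test` and `predictions`.

```python
import numpy as np

def per_output_mse(y_true, y_pred):
    """MSE computed separately for each output column."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2, axis=0)

# Illustrative arrays; in the notebook these would be y_test and predictions
y_true = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 0.15], [2.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.0, 1.5], [0.0, 0.2, 0.15], [2.0, 1.0, 0.3]])

mse_columns = per_output_mse(y_true, y_pred)
print(mse_columns, mse_columns.mean())   # the mean equals the overall MSE

# Average of the three XGBoost MSE values reported earlier, for reference
xgb_mse = [4.374190847504203, 9.273653843255532, 74.13544710299959]
print(np.mean(xgb_mse))
```

Averaged over the three outputs, the XGBoost MSE is roughly 29.26, against a CNN test loss of roughly 3.72 on the same 266 test rows.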

Model Architecture:
- Layer Types: Add or remove layers such as additional Conv2D, MaxPooling2D, Dense, or other layers to change the network’s complexity.
- Layer Parameters: Adjust the number of filters in Conv2D, the size of kernels, or the number of units in Dense layers.
Activation Functions:
- Change activation functions (e.g., relu, sigmoid, tanh, softmax) to experiment with different non-linearities.
Optimizer:
- Change the optimizer (e.g., SGD, RMSprop, Adam, Adagrad). Adjust learning rates and other optimizer-specific parameters.
Loss Function:
- Use different loss functions (e.g., mean_absolute_error, mean_squared_logarithmic_error) depending on the specific task and data distribution.
Regularization:
- Add regularization techniques such as Dropout layers, L2 regularization (kernel_regularizer=l2()), or early stopping during training to prevent overfitting.
Batch Size and Epochs:
- Adjust the batch_size and epochs to change the training dynamics. Larger batch sizes can provide more stable updates, while more epochs allow the model to learn more but can lead to overfitting.
Input Data Shape:
- Reshape the input data differently if experimenting with other network architectures that expect different input shapes.
Evaluation Metrics:
- Monitor different metrics during training and validation (e.g., mean_absolute_error as a Keras metric), or compute scores such as scikit-learn's r2_score on the predictions afterwards, for a better understanding of model performance.

Adjusting these options allows for fine-tuning the neural network model to better fit the data and improve performance.

# Create a 2x3 subplot
title_txt = "Performance comparison of XGBoost vs Convolutional Neural Network<br><br><sub>Label: Expected waiting times</sub>"
subplot_titles = ('Interval 1', 'Interval 2', 
                  'Interval 3', 'Interval 1', 
                  'Interval 2', 'Interval 3')

fig = make_subplots(rows=2, cols=3, subplot_titles=subplot_titles)

# Add each scatter plot to the subplot
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.add_trace(fig3.data[0], row=1, col=3)
fig.add_trace(fig4.data[0], row=2, col=1)
fig.add_trace(fig5.data[0], row=2, col=2)
fig.add_trace(fig6.data[0], row=2, col=3)

# Add annotations for model labels
fig.add_annotation(dict(font=dict(size=16),
                        x=0.5,
                        y=1.14,
                        showarrow=False,
                        text="XGBoost",
                        xref="paper",
                        yref="paper"))

fig.add_annotation(dict(font=dict(size=16),
                        x=0.5,
                        y=0.48,
                        showarrow=False,
                        text="CNN",
                        xref="paper",
                        yref="paper"))

# Add shared axis labels
fig.add_annotation(dict(font=dict(size=16),
                        x=0.5,
                        y=-0.1,
                        showarrow=False,
                        text="Actual",
                        xref="paper",
                        yref="paper"))

fig.add_annotation(dict(font=dict(size=16),
                        x=-0.1,
                        y=0.5,
                        showarrow=False,
                        text="Predicted",
                        textangle=-90,
                        xref="paper",
                        yref="paper"))

# Update layout to add more left margin and increase top margin
fig.update_layout(margin=dict(l=150, r=20, t=180, b=100), height=800, width=1200, title_text=title_txt, title_y=0.95)

# Show the combined plot
fig.show()