2024-07-16

Author

Witek ten Hove

Next directions to explore:

Train an XGBoost model that takes a list of two schedules as input and outputs the index of the schedule with the highest objective value. See: https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html
Train a model whose accuracy increases as the objective value or ranking improves. In that case the ranking plot would be more cone-shaped and the ranking would be more accurate for better-ranked schedules. See: https://elicit.com/notebook/aa0448de-dc7e-4d8b-8bd8-c1875679265f#17e0ddd35b0962672caf3894fb9da5b4
Test the speed of the current XGBoost model against computing the exact value.
Combine computing the true value with predicting it via the model. For this we need to check whether the model can also give an estimate of its confidence alongside the predicted value. If the confidence is low, the true value is computed instead.
Mental note: look into multiprocessing and joblib
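
The first idea in the list above (a model that picks the better of two schedules) needs training data made of schedule pairs. The following is a minimal sketch of how such a dataset could be built, assuming the schedules are available as an array `X` and their objective values as an array `y`. The helper name `make_pairwise_dataset` and the toy data are hypothetical, and dropping tied pairs is a design choice made here, not something taken from the notes.

```python
import numpy as np

def make_pairwise_dataset(X, y, n_pairs, seed=0):
    """Build (features, labels) for a pairwise comparison model.

    X: (n, T) array of schedules; y: (n,) array of objective values.
    Each feature row concatenates two schedules; the label is the index
    (0 or 1) of the schedule in the pair with the highest objective value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    keep = y[i] != y[j]  # drop ties: they have no well-defined winner
    i, j = i[keep], j[keep]
    features = np.hstack([X[i], X[j]])
    labels = (y[j] > y[i]).astype(int)
    return features, labels

# Toy data (illustrative only): three schedules with T = 3
X_toy = np.array([[2, 1, 0], [1, 1, 1], [3, 0, 0]])
y_toy = np.array([5.0, 2.0, 9.0])
features, labels = make_pairwise_dataset(X_toy, y_toy, n_pairs=20)
print(features.shape, labels[:5])
```

The resulting arrays could be passed to any binary classifier, for example `xgb.XGBClassifier`. That choice is an assumption; the linked tutorial covers XGBoost's dedicated ranking objectives, which are a different route to the same goal.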

from schedule_class import NewSchedule, generate_schedules, service_time_with_no_shows
from functions import generate_all_schedules
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pickle

Zero arrays are: [[array([0., 0., 0., 0.])], [], [array([0., 0., 0., 0.]), array([0., 0., 0., 0.]), array([0., 0., 0., 0.])]]

def run_schedule(x, d, s, q, omega, print_system=True):
    """Build a NewSchedule for x, evaluate it, and return the schedule object."""
    schedule = NewSchedule(x=x, d=d, s=s, q=q, omega=omega)
    schedule.calculate_system_states(until=len(x))
    schedule.calculate_wait_times()
    schedule.calculate_loss()
    if print_system:
        print(schedule)
    return schedule

N = 10
T = 7
d = 3
s = [0.0, 0.27, 0.28, 0.2, 0.15, 0.1]  # service time distribution
indices = np.arange(len(s))
exp_s = (indices * s).sum()
q = 0.2  # no-show probability
s_adj = service_time_with_no_shows(s, q)
indices = np.arange(len(s_adj))
exp_s_adj = (indices * s_adj).sum()
print(f'service time distribution with no-shows: {s_adj} with expected value: {exp_s_adj}')
omega = 0.5

samples_names = [f'x_{t}' for t in range(T)]
labels_names = [f'ew_{t}' for t in range(T)]

schedules = generate_all_schedules(N, T) # Generates all possible schedules with length T
print(f'N = {N}, # of schedules = {len(schedules)}')

# Collect one row per schedule and build each DataFrame once,
# which avoids repeated pd.concat calls inside the loop.
sample_rows = []
label_rows = []
for schedule in schedules:
  x = np.array(schedule, dtype=np.int64)
  sch = run_schedule(x, d, s, q, omega, False)
  sample_rows.append(x)
  label_rows.append(sch.system['ew'])

samples = pd.DataFrame(sample_rows, columns=samples_names).astype(np.int64)
labels = pd.DataFrame(label_rows, columns=labels_names).astype(np.float64)
labels['obj'] = labels.sum(axis=1)
labels['obj_rank'] = labels['obj'].rank().astype(np.float64)
samples.tail()
labels.tail()

service time distribution with no-shows: [0.2, 0.21600000000000003, 0.22400000000000003, 0.16000000000000003, 0.12, 0.08000000000000002] with expected value: 2.024
N = 10, # of schedules = 8008

	ew_0	ew_1	ew_2	ew_3	ew_4	obj	obj_rank
8003	72.864	0.000000	0.000000	0.000000	6.380606	79.244606	7968.0
8004	72.864	0.000000	0.000000	9.241482	0.000000	82.105482	7982.5
8005	72.864	0.000000	12.217883	0.000000	0.000000	85.081883	7993.0
8006	72.864	15.216038	0.000000	0.000000	0.000000	88.080038	7998.5
8007	91.080	0.000000	0.000000	0.000000	0.000000	91.080000	8005.0
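
The no-show adjusted distribution printed before this table is consistent with a simple rule: a patient is a no-show with probability q, which counts as a service time of zero, and otherwise draws a service time from s. The body of `service_time_with_no_shows` is not shown here, so the function below, `adjust_for_no_shows`, is a hypothetical reconstruction under that assumption rather than the actual implementation.

```python
import numpy as np

def adjust_for_no_shows(s, q):
    """Hypothetical stand-in for service_time_with_no_shows.

    Assumes a no-show (probability q) has service time 0 and that a
    patient who shows up draws a service time from s.
    """
    s_adj = (1 - q) * np.asarray(s, dtype=float)
    s_adj[0] += q  # all no-show mass lands on service time 0
    return s_adj

s = [0.0, 0.27, 0.28, 0.2, 0.15, 0.1]
q = 0.2
s_adj = adjust_for_no_shows(s, q)
exp_s_adj = (np.arange(len(s_adj)) * s_adj).sum()
print(s_adj, exp_s_adj)
```

Under this rule the expected value scales linearly: the unadjusted expectation is 0.27 + 2(0.28) + 3(0.2) + 4(0.15) + 5(0.1) = 2.53, and (1 - 0.2) × 2.53 = 2.024, which agrees with the printed value.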

Using pickle to save and reuse the dataset

class ScheduleData:
  """Container for the generated schedules (samples) and their labels."""
  def __init__(self, N: int, T: int, samples, labels):
    self.N = N
    self.T = T
    self.samples = samples
    self.labels = labels

  def describe_data(self):
    """Print N, T, and the last ten rows of samples and labels."""
    print(f'N = {self.N}', f'T = {self.T}', '\nSamples', self.samples.tail(10), '\nLabels', self.labels.tail(10), sep="\n")

curr_sch_data: ScheduleData = ScheduleData(N, T, samples, labels)
curr_sch_data.describe_data()

N = 10
T = 7

Samples
      x_0  x_1  x_2  x_3  x_4  x_5  x_6
7998    8    1    0    1    0    0    0
7999    8    1    1    0    0    0    0
8000    8    2    0    0    0    0    0
8001    9    0    0    0    0    0    1
8002    9    0    0    0    0    1    0
8003    9    0    0    0    1    0    0
8004    9    0    0    1    0    0    0
8005    9    0    1    0    0    0    0
8006    9    1    0    0    0    0    0
8007   10    0    0    0    0    0    0

Labels
        ew_0       ew_1       ew_2      ew_3      ew_4      ew_5    ew_6  \
7998  56.672  13.192158   0.000000  9.241482  0.000000  0.000000  0.0000   
7999  56.672  13.192158  12.217911  0.000000  0.000000  0.000000  0.0000   
8000  56.672  28.408317   0.000000  0.000000  0.000000  0.000000  0.0000   
8001  72.864   0.000000   0.000000  0.000000  0.000000  0.000000  1.9547   
8002  72.864   0.000000   0.000000  0.000000  0.000000  3.858871  0.0000   
8003  72.864   0.000000   0.000000  0.000000  6.380606  0.000000  0.0000   
8004  72.864   0.000000   0.000000  9.241482  0.000000  0.000000  0.0000   
8005  72.864   0.000000  12.217883  0.000000  0.000000  0.000000  0.0000   
8006  72.864  15.216038   0.000000  0.000000  0.000000  0.000000  0.0000   
8007  91.080   0.000000   0.000000  0.000000  0.000000  0.000000  0.0000   

            obj  obj_rank  
7998  79.105640    7964.5  
7999  82.082070    7978.0  
8000  85.080317    7987.5  
8001  74.818700    7911.0  
8002  76.722871    7945.5  
8003  79.244606    7968.0  
8004  82.105482    7982.5  
8005  85.081883    7993.0  
8006  88.080038    7998.5  
8007  91.080000    8005.0
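
The Samples output above ends at index 8007, so there are 8008 schedules. That matches the stars-and-bars count of ways to write N = 10 as an ordered sum of T = 7 non-negative integers, C(N + T - 1, T - 1) = C(16, 6) = 8008. `generate_all_schedules` comes from the `functions` module and is not shown, so the generator below, `enumerate_schedules`, is only a hypothetical sketch. It yields tuples in lexicographic order, which agrees with the last rows printed above.

```python
from math import comb

def enumerate_schedules(N, T):
    """Yield every tuple of T non-negative integers summing to N,
    in lexicographic order (hypothetical stand-in for generate_all_schedules)."""
    if T == 1:
        yield (N,)
        return
    for first in range(N + 1):
        # Fix the first interval, then distribute the remainder over T - 1 intervals
        for rest in enumerate_schedules(N - first, T - 1):
            yield (first,) + rest

all_schedules = list(enumerate_schedules(10, 7))
print(len(all_schedules), comb(10 + 7 - 1, 7 - 1))
print(all_schedules[-3:])
```

Because the count grows combinatorially in N and T, full enumeration like this is only practical for small instances, which is one motivation for approximating the objective with a trained model.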

with open('./experiments/data.pickle', 'wb') as file:
  pickle.dump(curr_sch_data, file)

with open('./experiments/data.pickle', 'rb') as file:
  curr_sch_data_test: ScheduleData = pickle.load(file)

curr_sch_data_test.describe_data()

N = 10
T = 7

Samples
      x_0  x_1  x_2  x_3  x_4  x_5  x_6
7998    8    1    0    1    0    0    0
7999    8    1    1    0    0    0    0
8000    8    2    0    0    0    0    0
8001    9    0    0    0    0    0    1
8002    9    0    0    0    0    1    0
8003    9    0    0    0    1    0    0
8004    9    0    0    1    0    0    0
8005    9    0    1    0    0    0    0
8006    9    1    0    0    0    0    0
8007   10    0    0    0    0    0    0

Labels
        ew_0       ew_1       ew_2      ew_3      ew_4      ew_5    ew_6  \
7998  56.672  13.192158   0.000000  9.241482  0.000000  0.000000  0.0000   
7999  56.672  13.192158  12.217911  0.000000  0.000000  0.000000  0.0000   
8000  56.672  28.408317   0.000000  0.000000  0.000000  0.000000  0.0000   
8001  72.864   0.000000   0.000000  0.000000  0.000000  0.000000  1.9547   
8002  72.864   0.000000   0.000000  0.000000  0.000000  3.858871  0.0000   
8003  72.864   0.000000   0.000000  0.000000  6.380606  0.000000  0.0000   
8004  72.864   0.000000   0.000000  9.241482  0.000000  0.000000  0.0000   
8005  72.864   0.000000  12.217883  0.000000  0.000000  0.000000  0.0000   
8006  72.864  15.216038   0.000000  0.000000  0.000000  0.000000  0.0000   
8007  91.080   0.000000   0.000000  0.000000  0.000000  0.000000  0.0000   

            obj  obj_rank  
7998  79.105640    7964.5  
7999  82.082070    7978.0  
8000  85.080317    7987.5  
8001  74.818700    7911.0  
8002  76.722871    7945.5  
8003  79.244606    7968.0  
8004  82.105482    7982.5  
8005  85.081883    7993.0  
8006  88.080038    7998.5  
8007  91.080000    8005.0
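
The reloaded output matches the original by eye. A programmatic round-trip check could look like the sketch below. To stay self-contained it pickles a plain dictionary with illustrative values into a temporary directory instead of a `ScheduleData` instance in `./experiments/`; the payload layout is an assumption made for this example.

```python
import os
import pickle
import tempfile

# Illustrative payload: one schedule and its expected waiting times
payload = {
    'N': 10,
    'T': 7,
    'samples': [[10, 0, 0, 0, 0, 0, 0]],
    'labels': [[91.08, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.pickle')
    with open(path, 'wb') as file:
        pickle.dump(payload, file)
    with open(path, 'rb') as file:
        restored = pickle.load(file)

# Dictionaries compare key by key, so this checks the whole round trip
round_trip_ok = restored == payload
print(round_trip_ok)
```

For the DataFrames stored in `ScheduleData`, `DataFrame.equals` would be the analogous comparison. Note that loading a pickled class instance requires the class definition to be importable in the loading session, and that pickle files should only be loaded from trusted sources.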