import numpy as np
import json
import time
from itertools import chain, combinations
import sys
from math import comb # Python 3.8 and later
import xgboost as xgb
import pickle
from typing import List, Tuple, Dict, Iterable, TypeVar, Union, Any, Optional, Literal
import logging
# Logging configuration
log_level = logging.DEBUG  # DEBUG or INFO
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

# Log to a file instead of to the console:
logging.basicConfig(level=log_level, format=log_format, filename='search.log', filemode='w')

# Get a logger instance
logger = logging.getLogger(__name__)
5 Large instance local search with trained XGBoost classifier model
5.1 Objective
Test the behavior and performance of a previously trained XGBoost Classifier model in a local search application.
5.2 Background
In previous experiments, we trained an XGBoost Classifier model to predict whether neighboring schedules improve the objective value. In this experiment, we will use the trained model to perform a local search for the best schedule.
5.3 Hypothesis
The XGBoost Classifier model will be able to efficiently guide the local search algorithm to find a schedule with a lower objective value than the initial schedule.
5.4 Methodology
5.4.1 Tools and Materials
5.4.2 Load Parameters
N = 22 # Number of patients
T = 20 # Number of intervals
l = 10 # Target service time length

file_path_parameters = f"datasets/parameters_{N}_{T}_{l}.pkl" # For retrieving saved scheduling parameters

# Load the data from the pickle file
with open(file_path_parameters, 'rb') as f:
    data_params = pickle.load(f)

N = data_params['N'] # Number of patients
T = data_params['T'] # Number of intervals
d = data_params['d'] # Length of each interval
max_s = data_params['max_s'] # Maximum service time
q = data_params['q'] # Probability of a scheduled patient not showing up
w = data_params['w'] # Weight for the waiting time in objective function
l = data_params['l']

num_schedules = data_params['num_schedules'] # Size of training set
convolutions = data_params['convolutions'] # Service time distributions used in training phase adjusted for no-shows

print(f"Parameters loaded: N={N}, T={T}, l={l}, d={d}, max_s={max_s}, q={q}, w={w}, num_schedules={num_schedules}")
Parameters loaded: N=22, T=20, l=10, d=5, max_s=20, q=0.2, w=0.1, num_schedules=300000
5.4.3 Experimental Design
We will use the trained XGBoost Classifier model to guide a local search algorithm to find the best schedule. The local search algorithm will start with an initial schedule and iteratively explore the neighborhood of the current schedule to find a better one. As an initial schedule, we will use the schedule with the lowest objective value from the training dataset that was used to train the XGBoost Classifier model.
5.4.4 Variables
- Independent Variables:
- Initial schedule, trained XGBoost Classifier
- Dependent Variables:
- Speed, accuracy, and convergence of the local search algorithm.
5.4.5 Data Collection
We will use the training dataset to initialize the local search algorithm.
5.4.6 Sample Size and Selection
5.4.7 Experimental Procedure
graph TD
    A[Start] --> B("Initialize schedule x");
    B --> C{"Iterate through all subsets U of V*"};
    C -- "For each U" --> D{"Compute y = x + sum(v in U)"};
    D -- "Check y >= 0" --> E{"Compute cost C(y)"};
    E --> F{"Is C(y) < C(x)?"};
    F -- "Yes" --> G["Update x := y"];
    G --> C;
    F -- "No" --> H{"Finished iterating all U?"};
    H -- "Yes" --> I["End: x is optimal schedule"];
    H -- "No" --> C;
    D -- "If y < 0" --> C;
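The flowchart corresponds to the following minimal Python sketch. This is an illustration only, not the `local_search` implementation imported from `functions` later; `cost` is a hypothetical stand-in for the expensive objective \(C(\cdot)\), and `v_star` is the matrix of vectors \(V^*\) defined in section 5.5.2.

def local_search_sketch(x, v_star, cost, max_size):
    # Repeatedly scan subsets U of the rows of v_star and accept the first improving move.
    x = np.array(x)
    improved = True
    while improved:
        improved = False
        for t in range(1, max_size + 1):
            for U in combinations(range(len(v_star)), t):
                y = x + v_star[list(U)].sum(axis=0)       # y = x + sum(v in U)
                if (y >= 0).all() and cost(y) < cost(x):  # feasibility, then improvement
                    x = y                                 # update x := y and restart the scan
                    improved = True
                    break
            if improved:
                break
    return x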
5.5 Results
5.5.1 Load the initial best schedule.
Start with the best solution found so far \(\{x^*, C(x^*)\}\) from the training set.
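Throughout this section, the scalar cost \(C(x)\) is the weighted combination of the two components returned by `calculate_objective_serv_time_lookup`; the first component is the expected waiting time (per the description of the weight \(w\) above), and the second is assumed here to be the remaining objective (e.g. overtime):

\[ C(x) = w \cdot g_1(x) + (1 - w) \cdot g_2(x) \]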
# Load the best solution from the training dataset
= f"datasets/best_schedule_{N}_{T}_{l}.pkl"
file_path_schedules # Load the data from the pickle file
with open(file_path_schedules, 'rb') as f:
= pickle.load(f)
best_schedule_data
print(f"The data has following keys: {[key for key in best_schedule_data.keys()]}")
print(f"The current best schedule is: {best_schedule_data['best_schedule']} with objective value {best_schedule_data['objective']}.")
# Set the initial schedule to the best solution from the training dataset
= best_schedule_data['best_schedule'] initial_schedule
The data has following keys: ['best_schedule', 'objective']
The current best schedule is: [2, 2, 0, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3] with objective value 24.028625804839624.
5.5.2 Generate the neighborhood of \(x^*\).
5.5.2.1 Define \(V^*\) and \(U_t\).
Define the vectors \(V^*\) as follows:
\[ \left\{ \begin{array}{c} \vec{v_1}, \\ \vec{v_2}, \\ \vec{v_3}, \\ \vdots \\ \vec{v_{T-1}}, \\ \vec{v_T} \end{array} \right\} = \left\{ \begin{array}{c} (-1, 0, \dots, 0, 1), \\ (1, -1, 0, \dots, 0), \\ (0, 1, -1, \dots, 0), \\ \vdots \\ (0, \dots, 1, -1, 0), \\ (0, \dots, 0, 1, -1) \end{array} \right\} \]
Define \(U_t\) as the set of all possible subsets of \(V^*\) such that each subset contains exactly \(t\) elements, i.e.,
\[ U_t = \{ S \subseteq V^* \mid |S| = t \}, \quad t \in \{1, 2, \dots, T\}. \]
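The helper `get_v_star` imported in the next code block presumably constructs exactly these vectors; a minimal sketch of that construction (an assumption, not the actual implementation from `functions`) is:

def get_v_star_sketch(T: int) -> np.ndarray:
    # Assumption: row i has a -1 at position i and a +1 at the cyclically preceding
    # position, i.e. adding the vector moves one patient to the previous interval.
    v = np.zeros((T, T), dtype=int)
    for i in range(T):
        v[i, i] = -1
        v[i, (i - 1) % T] = 1
    return v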
from functions import get_v_star

def powerset(iterable, size=1):
    "powerset([1,2,3], 2) --> (1,2) (1,3) (2,3)"
    return [[i for i in item] for item in combinations(iterable, size)]

x = initial_schedule

# Generate a matrix 'v_star' using the 'get_v_star' function
v_star = get_v_star(T)

# Generate all subsets of size 'size' of the set {0, 1, 2, ..., T-1}
# 'ids' will be a list of lists, where each inner list is a subset of indices
size = 2
ids = powerset(range(T), size)
len(ids)
ids[:T]
[[0, 1],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[0, 7],
[0, 8],
[0, 9],
[0, 10],
[0, 11],
[0, 12],
[0, 13],
[0, 14],
[0, 15],
[0, 16],
[0, 17],
[0, 18],
[0, 19],
[1, 2]]
5.5.2.2 Define the neighborhood of \(x\)
Define the neighborhood of \(x\) as all vectors of the form \(x + \sum_{\vec{v} \in S} \vec{v}\) for each \(S \in U_t\).
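`get_neighborhood` is imported from `functions` below; given the definition above, a minimal sketch of its presumed behavior (an assumption, not the actual implementation) is:

def get_neighborhood_sketch(x, v_star, ids):
    # Assumption: add the selected rows of v_star to x for every index subset in
    # 'ids' and keep only schedules with non-negative patient counts.
    x = np.asarray(x)
    candidates = (x + v_star[list(subset)].sum(axis=0) for subset in ids)
    return np.array([y for y in candidates if (y >= 0).all()])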
from functions import get_neighborhood

test_nh = get_neighborhood(x, v_star, ids)
print(f"All neighborhoods with {size} patients switched:\n x = {np.array(x)}: \n {test_nh}")
All neighborhoods with 2 patients switched:
x = [2 2 0 2 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 3]:
[[2 1 0 ... 1 1 4]
[1 2 1 ... 1 1 4]
[1 2 0 ... 1 1 4]
...
[2 2 0 ... 1 0 3]
[2 2 0 ... 0 2 2]
[2 2 0 ... 2 1 2]]
5.5.3 Local search algorithm with prediction
Load the pre-trained model and use it for evaluating schedules within a local search algorithm. The search algorithm checks for false positives (predicted improvement = "True", actual improvement = "False") and false negatives (predicted improvement = "False", actual improvement = "True"). In both cases the model is updated using the schedules and their associated objective values (labels).
graph TD
    %% --- Part 1: Initialization & Outer Loop ---
    A[Start: local_search_predict_update] --> B{Inputs: x, w, v_star, clf, params, size, restarts, threshold};
    B --> C{"Validate Inputs (clf, x length)"};
    C -- Valid --> D["Initialize: x_star, T, restart_count=0, t=1"];
    C -- Invalid --> Z_Err1["Raise ValueError"];
    D --> E{"Calculate Initial cost_star"};
    E -- Success --> F{"Outer Loop: t <= size AND restart_count < restarts?"};
    E -- Error --> Z_Err2["Return x_star, clf"];

    %% Connections FROM other parts back to the Outer Loop check (F)
    Connector_O([From Part 2: Break Inner Loop]) --> F;
    Connector_CC_Yes([From Part 3: Found Better at Level t]) --> F;
    Connector_DD([From Part 3: Incremented t]) --> F;

    %% Connections TO other parts
    F -- No --> Y["End: Return x_star, clf"];
    F -- Yes --> G["Generate Neighborhood (level t)"];
    G --> Connector_H([To Part 2: Start Inner Loop]);
graph TD
    %% --- Part 2: Inner Loop - Neighbor Evaluation ---
    Connector_G([From Part 1: Generate Neighborhood]) --> H{"Inner Loop: For each neighbor"};
    H -- Next Neighbor --> I{"Predict with clf: prediction, P(0)"};
    I -- Error Predicting --> I_Err["Log Error, Assume P=0"];
    I_Err --> J;
    I -- Success --> J{"Perform Expensive Check? (Pred=1 OR P(0) < threshold)"};
    J -- No --> H_Next[Next Neighbor];
    %% Skip expensive check
    J -- Yes --> K{"Calculate True Cost (Expensive Objective Func)"};
    K -- Error --> K_Err["Log Error"];
    K_Err --> H_Next;
    K -- Success --> L{"Is neighbor truly better? (cost_neighbor < cost_star)"};

    %% Path 1: Improvement Found
    L -- Yes --> M["Update x_star, cost_star, T"];
    M --> N["Set found_better=True, t=1, restart_count++"];
    N --> O["Break Inner Loop"];
    O --> Connector_F1([To Part 1: Outer Loop Check]);
    %% Connects back to F

    %% Path 2: No Improvement
    L -- No --> P{"Misprediction? (Pred=1 AND Actual=0)"};
    P -- No --> Q["Log Borderline/Correct Pred=0"];
    Q --> H_Next;
    P -- Yes --> R["Log Misprediction"];
    R --> Connector_S([To Part 3: Start Retraining]);
    %% Trigger Retraining

    %% Loop Control
    H_Next --> H;
    %% Process next neighbor
    H -- End of Neighbors --> BB{"End Inner Loop"};
    BB --> Connector_BB([To Part 3: Check Level Result]);
graph TD
    %% --- Part 3: Retraining Sub-routine & Loop Control ---
    %% Retraining Sub-routine Start
    Connector_R([From Part 2: Misprediction Detected]) --> S["Start Retraining Sub-routine"];

    subgraph Retraining Sub-routine
        direction TB
        S --> T{"Calculate True Costs for ALL neighbors at level t"};
        T --> U{"Opportunistic Better Found during Cost Calc?"};
        U -- Yes --> V["Update x_star, cost_star, T"];
        V --> W["Set found_better_retrain=True"];
        W --> X["Collect Data: Append features/labels for update"];
        U -- No --> X;
        X --> X_Loop{"More neighbors to process for retraining?"};
        X_Loop -- Yes --> T;
        X_Loop -- No --> Y_Fit{"Fit clf incrementally"};
        Y_Fit -- Error --> Y_FitErr["Log Fit Error"];
        Y_FitErr --> Z_CheckOpp{"Check if found_better_retrain?"};
        Y_Fit -- Success --> Z_CheckOpp;
    end

    %% Retraining Outcome
    Z_CheckOpp -- Yes --> AA["Set found_better=True, t=1, restart_count++"];
    AA --> Connector_O([To Part 1: Outer Loop Check via Break]);
    %% Connects back to F via O
    Z_CheckOpp -- No --> Connector_H_Next([To Part 2: Next Neighbor]);
    %% Retraining finished, continue inner loop

    %% Inner Loop Finished - Level Control Logic
    Connector_BB([From Part 2: End Inner Loop]) --> CC{"Found better solution at level t?"};
    CC -- Yes --> Connector_F2([To Part 1: Outer Loop Check]);
    %% Restart checks from t=1
    CC -- No --> DD["Increment t"];
    DD --> Connector_F3([To Part 1: Outer Loop Check]);
    %% Continue outer loop with next t
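The retraining step in the diagrams relies on XGBoost's support for continuing training from an existing booster. The idiom used inside `local_search_predict` below boils down to the following, where `X_new` and `y_new` are hypothetical arrays of newly labeled feature pairs:

def incremental_update(clf: xgb.XGBClassifier, X_new: np.ndarray, y_new: np.ndarray) -> xgb.XGBClassifier:
    # Continue boosting from the existing trees instead of refitting from scratch.
    clf.fit(X_new, y_new, xgb_model=clf.get_booster())
    return clf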
def local_search_predict(
    x: List[int],
    w: float,
    v_star: np.ndarray,
    clf: xgb.XGBClassifier,
    obj_func_params: Dict[str, Any],
    size: int = 2,
    restarts: int = 3,
    check_proba_threshold: float = 0.7,
    retrain_on: Literal['both', 'fp', 'fn', 'none'] = 'fp'
) -> Tuple[List[int], xgb.XGBClassifier]:
    """
    Performs local search guided by an XGBClassifier, minimizing expensive
    objective calls. Verifies prediction=0 if P(class=0) is below threshold.
    Updates the classifier incrementally when specified mispredictions occur.
    Uses logging instead of print statements. T is inferred from len(x).

    Args:
        x (List[int]): Starting point.
        w (float): Weight for combining objectives.
        v_star (np.ndarray): Matrix of neighborhood shift vectors (V*).
        clf (xgb.XGBClassifier): Pre-trained XGBoost Classifier.
        obj_func_params (Dict[str, Any]): Parameters for objective function.
        size (int, optional): Max neighborhood size. Defaults to 2.
        restarts (int, optional): Max restarts. Defaults to 3.
        check_proba_threshold (float, optional): Threshold for P(class=0) verification. Defaults to 0.7.
        retrain_on (Literal['both', 'fp', 'fn', 'none'], optional):
            Specifies when to trigger retraining based on misprediction type:
            - 'both': Retrain on False Positives (P=1, A=0) and False Negatives (P=0, A=1).
            - 'fp': Retrain only on False Positives.
            - 'fn': Retrain only on False Negatives.
            - 'none': Never retrain based on mispredictions.
            Defaults to 'fp'.

    Returns:
        Tuple[List[int], xgb.XGBClassifier]: Best solution found and potentially updated classifier.
    """
    # --- Input Validation ---
    # Check if clf appears fitted (basic check)
    if not hasattr(clf, 'classes_') or not hasattr(clf, 'n_features_in_'):
        logger.warning("Classifier 'clf' may not be fitted. Proceeding with caution.")
        # Depending on strictness, you might raise an error here instead.
        # raise ValueError("Classifier 'clf' must be fitted before use.")

    if not x:
        logger.error("Input schedule x cannot be empty (length must be positive).")
        raise ValueError("Input schedule x cannot be empty (length must be positive).")

    allowed_retrain_values = {'both', 'fp', 'fn', 'none'}
    if retrain_on not in allowed_retrain_values:
        logger.error("Invalid value for 'retrain_on': %s. Must be one of %s", retrain_on, allowed_retrain_values)
        raise ValueError(f"Invalid value for 'retrain_on'. Must be one of {allowed_retrain_values}")

    # --- Initialization ---
    x_star = list(x)  # Work with a copy
    T = len(x_star)   # Infer T from the length - calculated once initially
    restart_count = 0
    t = 1             # Start with neighborhood size 1

    # Calculate initial cost
    try:
        logger.info("Calculating initial cost...")
        objectives_star = calculate_objective_serv_time_lookup(x_star, **obj_func_params)
        cost_star = w * objectives_star[0] + (1 - w) * objectives_star[1]
        logger.info("Initial solution cost: %.4f", cost_star)
    except Exception as e:
        logger.exception("Error calculating initial cost: %s", e)
        return x_star, clf  # Return current best and original classifier on error

    # --- Main Search Loop ---
    while t <= size and restart_count < restarts:
        logger.info("--- Running local search level t=%d (Restart %d/%d) ---", t, restart_count + 1, restarts)

        ids_gen_iterable = powerset(range(T), t)  # Use current T
        # Pass x_star (current best) to neighborhood generation
        neighborhood_iter = get_neighborhood(x_star, v_star, ids_gen_iterable)

        found_better_solution_at_level_t = False
        neighbors_data_at_level_t: List[Dict[str, Any]] = []  # Store data for potential retraining
        neighbors_processed_count = 0

        for neighbor_np in neighborhood_iter:
            neighbors_processed_count += 1
            neighbor = neighbor_np.tolist()  # Convert numpy array to list
            neighbor_info = {"schedule": neighbor, "cost": None, "true_label": None, "prediction": None}
            # Add neighbor info early
            neighbors_data_at_level_t.append(neighbor_info)

            # --- Feature Creation ---
            # Feature is concatenation - ensure this matches how clf was trained
            feature_pair = x_star + neighbor

            # --- 1. Predict using the CHEAP classifier ---
            prediction = 0       # Default prediction
            proba_class_0 = 1.0  # Default probability
            try:
                # Reshape feature_pair for XGBoost if needed (expects 2D array)
                feature_pair_np = np.array(feature_pair).reshape(1, -1)
                prediction = clf.predict(feature_pair_np)[0]
                proba = clf.predict_proba(feature_pair_np)[0]
                # Ensure proba has expected structure (e.g., 2 elements for binary class)
                if len(proba) > 0:
                    proba_class_0 = proba[0]  # Probability of class 0
                else:
                    logger.warning("Predict_proba returned unexpected structure: %s. Using default P(0)=1.0", proba)
            except Exception as e:
                logger.warning("Error predicting for neighbor %d: %s. Assuming prediction=0.", neighbors_processed_count, e)
                # Keep default prediction=0, proba_class_0=1.0

            neighbor_info["prediction"] = prediction  # Store prediction
            logger.debug("  Neighbor %d: Predicted=%d (P(0)=%.3f)", neighbors_processed_count, prediction, proba_class_0)

            # --- 2. Decide whether to perform expensive check ---
            perform_expensive_check = False
            check_reason = ""

            if prediction == 1:
                perform_expensive_check = True
                check_reason = "Predicted 1"
            elif proba_class_0 < check_proba_threshold:
                perform_expensive_check = True
                check_reason = f"Borderline P(0) < {check_proba_threshold:.3f}"
            else:  # prediction == 0 and proba_class_0 >= threshold
                logger.debug("  -> Skipping objective function call (Confident P=0).")
            # --- 3. Perform EXPENSIVE check if needed ---
            if perform_expensive_check:
                logger.debug("  -> Verifying (%s)...", check_reason)
                try:
                    objectives_neighbor = calculate_objective_serv_time_lookup(neighbor, **obj_func_params)
                    cost_neighbor = w * objectives_neighbor[0] + (1 - w) * objectives_neighbor[1]
                    is_truly_better = cost_neighbor < cost_star
                    true_label = 1 if is_truly_better else 0

                    # Store results in neighbor_info
                    neighbor_info["cost"] = cost_neighbor
                    neighbor_info["true_label"] = true_label

                    logger.debug("    True Cost=%.4f (Current Best=%.4f) -> Actual Better=%s",
                                 cost_neighbor, cost_star, is_truly_better)

                    # --- 4. Check for Misprediction and Trigger Retraining (Conditional) ---
                    misprediction = (prediction != true_label)
                    trigger_retraining = False
                    opportunistic_update_occurred = False  # Reset for this neighbor check

                    if misprediction and retrain_on != 'none':
                        misprediction_type = ""
                        should_retrain_this_type = False

                        if prediction == 1 and not is_truly_better:  # False Positive (P=1, A=0)
                            misprediction_type = "False Positive (P=1, A=0)"
                            should_retrain_this_type = retrain_on in ['both', 'fp']
                        elif prediction == 0 and is_truly_better:  # False Negative (P=0, A=1)
                            misprediction_type = "False Negative (P=0, A=1)"
                            should_retrain_this_type = retrain_on in ['both', 'fn']

                        if should_retrain_this_type:
                            logger.warning("    Misprediction! (%s). Triggering retraining process based on 'retrain_on=%s'.",
                                           misprediction_type, retrain_on)
                            trigger_retraining = True
                        elif misprediction_type:  # Misprediction occurred but not the type we retrain on
                            logger.info("    Misprediction occurred (%s), but retraining is disabled for this type ('retrain_on=%s').",
                                        misprediction_type, retrain_on)

                    # --- Retraining Sub-routine (if triggered) ---
                    if trigger_retraining:
                        features_for_update: List[List[int]] = []
                        labels_for_update: List[int] = []
                        best_opportunistic_neighbor = None
                        best_opportunistic_cost = cost_star  # Initialize with current best cost

                        logger.info("    Calculating true costs for %d neighbors at level %d for retraining...",
                                    len(neighbors_data_at_level_t), t)

                        for n_idx, n_info in enumerate(neighbors_data_at_level_t):
                            n_schedule = n_info["schedule"]
                            n_cost = n_info["cost"]
                            n_true_label = n_info["true_label"]

                            # Calculate cost if not already done (e.g., for neighbors skipped earlier)
                            if n_cost is None or n_true_label is None:
                                try:
                                    logger.debug("      Calculating missing cost for neighbor %d...", n_idx + 1)
                                    n_objectives = calculate_objective_serv_time_lookup(n_schedule, **obj_func_params)
                                    n_cost = w * n_objectives[0] + (1 - w) * n_objectives[1]
                                    n_is_better = n_cost < cost_star
                                    n_true_label = 1 if n_is_better else 0
                                    n_info["cost"] = n_cost  # Update info cache
                                    n_info["true_label"] = n_true_label
                                except Exception as e:
                                    logger.warning("      Error calculating cost for neighbor %d (%s) during retraining: %s. Skipping.",
                                                   n_idx + 1, n_schedule, e)
                                    continue  # Skip this neighbor for training data

                            # Prepare data for fitting
                            n_feature_pair = x_star + n_schedule  # Create feature pair for this neighbor
                            features_for_update.append(n_feature_pair)
                            labels_for_update.append(n_true_label)
                            logger.debug("      Neighbor %d: Cost=%.4f, True Label=%d (Used for training)",
                                         n_idx + 1, n_cost, n_true_label)

                            # Check for opportunistic update (find the best neighbor *among those evaluated*)
                            if n_true_label == 1 and n_cost < best_opportunistic_cost:
                                logger.info("      Opportunistic Update Candidate! Found/Confirmed better neighbor (%d) during cost calculation.", n_idx + 1)
                                best_opportunistic_neighbor = list(n_schedule)  # Store a copy of the schedule
                                best_opportunistic_cost = n_cost  # Update best cost found *during retraining*
                                opportunistic_update_occurred = True

                        # Perform incremental fit if data was gathered
                        if features_for_update:
                            logger.info("    Fitting model incrementally with %d data points...", len(labels_for_update))
                            try:
                                X_update = np.array(features_for_update)  # Convert list of lists to 2D numpy array
                                y_update = np.array(labels_for_update)

                                # Ensure clf is fitted before incremental update if it's the first time.
                                # XGBoost's fit with xgb_model handles this correctly.
                                clf.fit(X_update, y_update, xgb_model=clf.get_booster())  # Pass the existing booster
                                logger.info("    Model update complete.")
                            except Exception as e:
                                logger.exception("    Error during incremental model update: %s", e)
                        else:
                            logger.warning("    No valid data gathered for retraining.")

                        # If an opportunistic update was found, apply it now
                        if opportunistic_update_occurred:
                            logger.info(f"    Applying opportunistic update. New best: {best_opportunistic_neighbor} with cost = {best_opportunistic_cost:.4f}.")
                            x_star = best_opportunistic_neighbor  # Use the best one found (already a list)
                            cost_star = best_opportunistic_cost
                            T = len(x_star)  # Update T as length might have changed
                    # --- End of Retraining Sub-routine ---

                    # --- 5. Handle Updates & Loop Control ---
                    # Check if we should update x_star and restart the search level
                    if opportunistic_update_occurred:
                        found_better_solution_at_level_t = True  # Mark improvement found
                        t = 1  # Reset level
                        restart_count += 1
                        logger.info("    Restarting search from t=1 due to opportunistic update during retraining. Restart count: %d", restart_count)
                        break  # Exit inner loop (for neighbor_np in neighborhood_iter)
                    elif is_truly_better:  # True Positive or handled False Negative (update to the current neighbor)
                        logger.info(f"    Confirmed better solution (or handled FN). Updating x_star to {neighbor} with cost = {cost_neighbor:.4f}.")
                        x_star = neighbor  # Assign neighbor directly as it's already a list
                        cost_star = cost_neighbor
                        T = len(x_star)  # Update T as length might have changed
                        found_better_solution_at_level_t = True
                        t = 1  # Reset level
                        restart_count += 1
                        logger.info("    Restarting search from t=1. Restart count: %d", restart_count)
                        break  # Exit inner loop (for neighbor_np in neighborhood_iter)
                    # else: (Not truly better and no opportunistic update) -> continue to next neighbor implicitly

                except Exception as e:
                    logger.warning("    Error calculating objective or handling result for neighbor %d (%s): %s.",
                                   neighbors_processed_count, neighbor, e)
            # --- End of 'if perform_expensive_check:' ---
        # --- End of neighbor loop (for neighbor_np in neighborhood_iter) ---

        # If we finished the loop for level t without finding a better solution (or breaking early)
        if not found_better_solution_at_level_t:
            if neighbors_processed_count > 0:
                logger.info("No improving solution found or confirmed at level t=%d.", t)
            else:
                logger.info("No neighbors generated or processed at level t=%d.", t)
            t += 1  # Move to the next neighborhood size level

    # --- End of outer while loop ---
    logger.info("Local search finished after %d restarts or reaching max size %d.", restart_count, size)
    logger.info("Final solution: %s", x_star)
    logger.info("Final cost: %.4f", cost_star)

    return x_star, clf
from functions import calculate_objective_serv_time_lookup
start = time.time()

# Define the path to the saved model
model_path = "models/classifier_large_instance.json"  # Make sure this path is correct

with open("best_trial_params.json", "r") as f:
    best_trial_params = json.load(f)

clf = xgb.XGBClassifier(
    tree_method="hist",
    max_depth=best_trial_params["max_depth"],
    min_child_weight=best_trial_params["min_child_weight"],
    gamma=best_trial_params["gamma"],
    subsample=best_trial_params["subsample"],
    colsample_bytree=best_trial_params["colsample_bytree"],
    learning_rate=best_trial_params["learning_rate"],
    n_estimators=best_trial_params["n_estimators"],
)

# Load the model directly from the file path
clf.load_model(model_path)

initial_objectives = calculate_objective_serv_time_lookup(x, d, convolutions)
initial_c_star = w * initial_objectives[0] + (1 - w) * initial_objectives[1]
x_star = local_search_predict(x, w, v_star, clf, {'d': d, 'convolutions': convolutions}, size=T, restarts=T)[0]
final_objectives = calculate_objective_serv_time_lookup(x_star, d, convolutions)
final_c_star = w * final_objectives[0] + (1 - w) * final_objectives[1]
end = time.time()

print(f"\nInitial schedule: {x}, with objective value: {initial_c_star}.\nFinal schedule: {x_star}, with objective value: {final_c_star}. Search time {end - start:.2f} seconds.")
Initial schedule: [2, 2, 0, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3], with objective value: 24.028625804839624.
Final schedule: [2, 2, 0, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3], with objective value: 24.028625804839624. Search time 830.96 seconds.
5.5.4 Run the conventional local search algorithm for validation
We will run a conventional local search algorithm to evaluate the new method, assessing both the quality of the results and its computational efficiency.
from functions import local_search
# Computing optimal solution with real cost
print(f"Initial schedule: {x}")
start = time.time()
test_x = local_search(x, d, convolutions, w, v_star, T, echo=True)
end = time.time()
Initial schedule: [2, 2, 0, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3]
Initial solution: [2 2 0 2 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 3], cost: 24.028625804839624
Running local search with switching 1 patient(s)
Size of neighborhood: 17
Running local search with switching 2 patient(s)
Size of neighborhood: 139
Running local search with switching 3 patient(s)
Size of neighborhood: 728
Running local search with switching 4 patient(s)
Size of neighborhood: 2743
Running local search with switching 5 patient(s)
Size of neighborhood: 7913
Running local search with switching 6 patient(s)
Size of neighborhood: 18152
Running local search with switching 7 patient(s)
Size of neighborhood: 33931
Running local search with switching 8 patient(s)
Size of neighborhood: 52520
Running local search with switching 9 patient(s)
Size of neighborhood: 68003
Running local search with switching 10 patient(s)
Size of neighborhood: 74074
Running local search with switching 11 patient(s)
Size of neighborhood: 68003
Running local search with switching 12 patient(s)
Size of neighborhood: 52520
Running local search with switching 13 patient(s)
Size of neighborhood: 33931
Found better solution: [2 2 0 2 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2], cost: 23.443089273560844
Running local search with switching 1 patient(s)
Size of neighborhood: 18
Found better solution: [1 2 0 2 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3], cost: 23.290731519864124
Running local search with switching 1 patient(s)
Size of neighborhood: 18
Found better solution: [1 2 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3], cost: 23.17349058000479
Running local search with switching 1 patient(s)
Size of neighborhood: 19
Found better solution: [2 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3], cost: 23.049992753251537
Running local search with switching 1 patient(s)
Size of neighborhood: 19
Found better solution: [2 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 3], cost: 22.966437323340017
Running local search with switching 1 patient(s)
Size of neighborhood: 19
Running local search with switching 2 patient(s)
Size of neighborhood: 172
Running local search with switching 3 patient(s)
Size of neighborhood: 987
Running local search with switching 4 patient(s)
Size of neighborhood: 4029
Running local search with switching 5 patient(s)
Size of neighborhood: 12444
Running local search with switching 6 patient(s)
Size of neighborhood: 30192
Running local search with switching 7 patient(s)
Size of neighborhood: 58956
Running local search with switching 8 patient(s)
Size of neighborhood: 94146
Running local search with switching 9 patient(s)
Size of neighborhood: 124202
Running local search with switching 10 patient(s)
Size of neighborhood: 136136
Running local search with switching 11 patient(s)
Size of neighborhood: 124202
Running local search with switching 12 patient(s)
Size of neighborhood: 94146
Running local search with switching 13 patient(s)
Size of neighborhood: 58956
Running local search with switching 14 patient(s)
Size of neighborhood: 30192
Found better solution: [2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2], cost: 22.627834923300007
Running local search with switching 1 patient(s)
Size of neighborhood: 20
Running local search with switching 2 patient(s)
Size of neighborhood: 190
Running local search with switching 3 patient(s)
Size of neighborhood: 1140
Running local search with switching 4 patient(s)
Size of neighborhood: 4845
Running local search with switching 5 patient(s)
Size of neighborhood: 15504
Running local search with switching 6 patient(s)
Size of neighborhood: 38760
Running local search with switching 7 patient(s)
Size of neighborhood: 77520
Running local search with switching 8 patient(s)
Size of neighborhood: 125970
Running local search with switching 9 patient(s)
Size of neighborhood: 167960
Running local search with switching 10 patient(s)
Size of neighborhood: 184756
Running local search with switching 11 patient(s)
Size of neighborhood: 167960
Running local search with switching 12 patient(s)
Size of neighborhood: 125970
Running local search with switching 13 patient(s)
Size of neighborhood: 77520
Running local search with switching 14 patient(s)
Size of neighborhood: 38760
Running local search with switching 15 patient(s)
Size of neighborhood: 15504
Running local search with switching 16 patient(s)
Size of neighborhood: 4845
Running local search with switching 17 patient(s)
Size of neighborhood: 1140
Running local search with switching 18 patient(s)
Size of neighborhood: 190
Running local search with switching 19 patient(s)
Size of neighborhood: 20
print(f"Initial schedule: {x}\nFinal schedule: {test_x[0]}\nDifference: {test_x[0] - x}\nObjective value: {test_x[1]}. Search time: {end - start:.2f} seconds.")
test_res = calculate_objective_serv_time_lookup(test_x[0], d, convolutions)
Initial schedule: [2, 2, 0, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3]
Final schedule: [2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2]
Difference: [ 0 -1 1 -1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 -1]
Objective value: 22.627834923300007. Search time: 968.40 seconds.
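To relate the two runs, the relative gap between the model-guided result from section 5.5.3 and the conventional optimum above can be computed as in the sketch below; the timing variables `start`/`end` were reused between runs, so only objective quality is compared here.

# Relative objective gap of the model-guided schedule versus the conventional optimum.
quality_gap = (final_c_star - test_x[1]) / test_x[1] * 100
print(f"Model-guided search ended {quality_gap:.2f}% above the conventional local-search optimum.")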
5.6 Discussion
5.7 Timeline
This experiment was started on 01-04-2025 and concluded on 17-04-2025.