Markov Decision Processes - Week 4

Author

LNMB

Homework Assigmment 1.6 p.11 Lecture notes

Exercise 1.6 (Airline Overbooking)

An airline seeks a reservation policy for a flight with \(S\) seats that maximizes its expected profit from the flight. Reservation requests arrive hourly according to a Bernoulli process with \(p\) being the probability of a reservation request per hour (at most one reservation request will arrive per hour). A passenger with a booked reservation pays the fare \(f > 0\) at flight time. If \(b \geq 0\) passengers with booked reservations are denied boarding at flight time, they do not pay the fare, and the airline pays them a penalty \(c(b)\) (divided among them) where \(b \mapsto c(b)\) is increasing with \(c(0) = 0\).

Consider the \(n\)-th hour before flight time \(T\). At the beginning of the hour, the airline reviews the number of booked reservations on hand, \(r\) say, and decides whether to book (accept) or decline a reservation request arriving during the next hour. Each of the \(r\) booked reservations may cancel during the hour, independently of each other, with probability \(q\).

For this reason, the airline is considering the possibility of overbooking the flight to compensate for cancellations. Let \(V^*_n(r)\) be the maximum expected future profit when there are \(r\) booked reservations at the beginning of the hour, before the accept/decline decision has been taken, and reservation requests and cancellations during the hour have occurred. Let \(W^*_n(r)\) be the maximum expected future profit when there are \(r\) booked reservations after booking or declining a reservation request, but before cancellations. The aim is to determine an optimal reservation policy for any value of the number of booked reservations at the beginning of each hour till the flight time \(T\).

a) Markov Decision Model

Formulate the problem as a Markov decision model, by determining the state space, action spaces, rewards, terminal rewards, and the transition probabilities. Formulate the optimality equation from which an optimal reservation policy can be determined.

b) Optimality of Booking-Limit Policies

Assume, as can be shown, that if \(g\) is a quasiconcave function on the integers, then \(r \mapsto \mathbb{E}(g(B_r))\) is quasiconcave, where \(B_r\) is a sum of independent identically distributed Bernoulli random variables. We recall that \(g\) is quasiconcave on the (positive) integers when there exists a number \(a\) such that \(g\) is increasing on \([0, a]\) and decreasing on \([a, \infty]\).

Use this result to show the following facts. First, show that \(r \mapsto W^*_n(r)\) is quasiconcave.

Let \(b_n = \arg\max_r W^*_n(r)\). Call \(b_n\) the booking limit. Then show that \(r \mapsto V^*_n(r)\) is quasiconcave with maximum \(b_n\). Finally, show that it is optimal to accept a reservation if and only if \(r < b_n\), with \(r\) the number of booked reservations on hand at the beginning of the hour (before a decision has been taken).

PROOF that \(r \mapsto W^*_n(r)\) is quasiconcave for all \(n \leq T\)

We will use proof by complete induction and first proof that (1) that \(W_T^*(r)\) is quasi-concave. Then we will proof that (2) if \(W_n^*(r)\) is quasi concave, the same holds for \(W_{n-1}^*(r)\)

Step 1

First define the function \(g(\tilde{r} )\) as the profit function at the last epoch in which all reservations and cancellations have already occurred and there are \(\tilde{r}\) customers left in the system. Clearly \(g(\tilde{r} )\) is concave because for each excess customer, there is a penalty incurred. Therefore the optimal amount of customers is exactly the amount of seats available. The maximum profit is equal to the fare times the amount of seats: \(f \cdot S\). Hence there is a point (in this case \(S\)) where \(g(\tilde{r} )\) is increasing on \([0,S]\) and decreasing on \([S, \inf]\) and therefore is quasi-concave.

Now we consider \(W_T^*(r)\) which is the maximum expected profit in the last epoch, just before the last cancellations arrive. Clearly:

\(W_T^*(r) = E (g (r))\) where \(r\) is a sum of i.i.d. binomial variables. Hence \(W_T^*(r)\) is quasi concave.

Under optimal policy, we know from the Bellman equation that:

\[ W^*_{n-1}(r) = \max \{ p \cdot E[W^*_{n}(r+1)] + (1-p) E[W^*_{n}(r)] , E[W^*_{n}(r)] \} \]

Since we assumed that \(W^*_{n}(r)\) is quasi concave, we can conclude that \(E[W^*_{n}(r)]\) is also quasi concave since taking the expectation preserves quasi concavity.

Define \(h(r)\), the first term in the max function:

\[ h(r) = p \cdot E[W^*_{n}(r+1)] + (1-p) E[W^*_{n}(r)] \]

Now we have to show that \(\max \{ h(r), W^*_{n}(r) \}\) is quasi-concave (we drop the expectations since they do not influence quasi-concavity).

Because \(W^*_{n}(r)\) is assumed to be quasi concave it reaches a maximum when \(r = a\). Then we have the following:

So \(\max \{ h(a), W^*_n(a) \}\) reaches it maximum when \(r = a\) and we have shown that \(\max \{h(r), W^*_{n}(r) \}\) is increasing on \([0,a]\) and decreasing on \([a,\inf]\), hence \(W^*_{n-1}(r)\) is quasi concave. Because we have already proven that \(W_T^*(r)\) is quasi concave via complete induction we can now conclude that \(W^*_{n}(r)\) is quasi concave for \(n \leq T\).

c) Solving the Problem

Solve the problem when the parameters are as follows:

  • \(T = 30\)
  • \(c(b) = f \cdot b\)
  • \(S = 10\)
  • \(f = \euro{300}\)
  • \(p = 0.2\) and \(0.3\)
  • \(q = 0.05\) and \(0.10\)
  • \(r \leq 20\) (so there is an upper bound on the total number of reservations).

Make graphs of the different combinations. In each case, estimate the booking limit ten hours before flight time from your graphs. Discuss whether your graphs confirm the claim in (b) that \(r \mapsto V^*_n(r)\) is quasiconcave.

What conjectures do the graphs suggest about the optimal reservation policy and/or maximum expected reward and their variation with the various data elements? You will lose points on your conjectures only if your graphs are inconsistent with or do not support your conjectures, or if you don’t make enough interesting conjectures. The idea here is to brainstorm intelligently.

Solutions

a) Markov Decision Model

The state space is the number of booked reservations \(r \in \{0, 1,2, \dots,S,Z\}\) with \(Z = S+b\) at the beginning of each hour.

The transition probability \(T\) from state \(r\) to state \(r'\) is the probability that \(r'\) booked reservations remain after the hour, given that there are \(r\) booked reservations at the beginning of the hour. It can be divided up in two separate transition probabilities \(C\) (cancellation) and \(O\) (order) with:

\[ C =\begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & 4 & \dots & S & \dots & Z-1\\ \hline 0 & 1 & & & & \\ 1 & q & 1-q & & \\ 2 & q^2 &2q(1-q) & (1-q)^2 & & \\ 3 & q^3 & 3q^2(1-q) & 3q(1-q)^2 & (1-q)^3 & \\ \vdots \\ S & q^S & Sq^{S-1}(1-q) & \binom{S}{2}q^{S-2}(1-q)^2 & \binom{S}{3}q^{S-3}(1-q)^3 & \dots & \dots & (1-q)^S \\ \vdots \\ Z-1 & q^{Z-1} & {(Z-1)}q^{Z-2}(1-q) & \binom{Z-1}{2}q^{Z-3}(1-q)^2 & \binom{Z-1}{3}q^{Z-4}(1-q)^3 & \dots & \dots & {(Z-1)}q^{Z-1 - S}(1-q)^S & \dots & {(Z-1)}(1-q)^{Z-1} \\ \end{array} \] ,

\[ O =\begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & 4 & \dots & S & S+1 & \dots & Z-1 & Z\\ \hline 0 & 1-p & p \\ 1 & & 1-p & p \\ 2 & & & 1-p & p \\ 3 & & & & 1-p & p \\ \vdots \\ S & & & & & & & 1-p & p\\ \vdots \\ Z-1 & & & & & & & & & & 1-p & p\\ \end{array} \]

, and

\[ T = C \circ O. \]

The action set is the decision to decline or accept a reservation request.

\[ A = \{decline,accept\}, \]

and the reward function is the expected profit or loss from the combined bookings and cancellations depending on the action taken at time \(n\). The reward function is given by:

\[ R_n(r,a) = \begin{cases} 0 & \text{if } n < T \\ fr & \text{if } n=T, r \leq S \\ fS - c(r - S) & \text{if } n=T, r > S \\ \end{cases} \]

b) Optimality of Booking-Limit Policies

\[ rW_n^* = \begin{cases} f\mathbb{E}(r) & \forall \ r \leq S \\ fS - c(\mathbb{E}(r) - S) & \forall \ r > S \\ \end{cases} \]

In any state \(r\) is either a set of