## Wednesday, May 16, 2012

### Active Minimax Forecasting

This is a continuation of my previous post where I noted that the Minimax Forecaster is tantalizing from an active learning perspective. Fortunately I noticed a paper by Abernethy et. al. called A Stochastic View of Optimal Regret through Minimax Duality. In this paper the authors unwrap online convex optimization in a general fashion by leveraging von Neumann's minimax theorem. By doing this they derive a general formula for the value of an online game to the adversary. The intuition of the previous post is that differences in game value between observing and not observing a particular outcome will be key to making active learning decisions in an adversarial setting, so this formula is very interesting.

Abernethy et. al. start out with a more general setup than the previous post, but I'll adapt their conventions to the previous post where there are differences. There is a game with $T$ rounds in it. There is a set $\mathcal{F}$ of experts. On each round, each expert $f$ produces a prediction $f_t$, player produces a prediction $p_t$, adversary simultaneously produces an outcome $y_t$, and player suffers an instantaneous loss $l (p_t, y_t)$. The experts are static (their predictions to do not depend upon previously observed outcomes), so essentially each expert is an sequence $f_{1:T}$. The player wants to generate a sequence of predictions $p_{1:T}$ which minimizes worst-case regret $\sup_{y_{1:T} \in \mathcal{Y}^T} L (p_{1:T}, y_{1:T}) - \inf_{f \in \mathcal{F}} L (f_{1:T}, y_{1:T}),$ where $L (p_{1:T}, y_{1:T}) = \sum_s l (p_s, y_s)$ is the total loss. (To ease notation supremums and infimums will capture the entire rest of the expression unless explicitly limited by parenthesis.) The Minimax Forecaster of the previous post is an explicit solution to a specific case of this problem, whereas Abernethy et. al. are concerned with characterizing such games in general. They consider the value of the game to the adversary under optimal play, \begin{aligned} R^* (\mathcal{F}) &= \inf_{p_1 \in \mathcal{P}} \sup_{y_1 \in \mathcal{Y}} \ldots \inf_{p_T \in \mathcal{P}} \sup_{y_T \in \mathcal{Y}} \sum_{t=1}^T l (p_t, y_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T l (f_t, y_t). \end{aligned} The amazing result is that this is the same as \begin{aligned} R^* (\mathcal{F}) &= \sup_{Y \in \mathbb{P} (\mathcal{Y}^T)} \mathbb{E}_{y_{1:T} \sim Y} \left[ \sum_{t=1}^T \inf_{p_t \in \mathcal{P}} \biggl( \mathbb{E}_{\tilde y_t} \left[ l (p_t, \tilde y_t) | y_{1:t-1} \right] \biggr) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T l (f_t, y_t) \right], \end{aligned} where the supremum is over distributions of outcome sequences $\mathbb{P} (\mathcal{Y}^T)$. In other words, this looks like a game played with an oblivious (non-stationary) environment, but played in the worst possible environment. This is a nifty result, and leads in subsequent work to interpolating between the IID and adversarial settings by constraining the supremum over sequence distributions.

Now with active learning, we make the adversary more powerful by choosing not to observe some of the outcomes (represented by the variable $z_s \in \{ 0, 1 \}$). I can upper bound the game value to the adversary as \begin{aligned} R^* (\mathcal{F} | z_{1:T}) &\leq \sup_{Y \in \mathbb{P} (\mathcal{Y}^T)} \mathbb{E}_{y_{1:T} \sim Y} \left[ \sum_{t=1}^T \inf_{p_t \in \mathcal{P}} \biggl( \mathbb{E}_{\tilde y_t} \left[ l (p_t, \tilde y_t) | \Omega_{t-1} \right] \biggr) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T l (f_t, y_t) \right], \end{aligned} where $\Omega_t = \{ y_s | s \leq t, z_s = 1 \}$ denotes observed outcomes. This is intuitively pleasing because the inner conditional expectation represents the player's knowledge. To derive the upper bound follow the procedure in Appendix A of the paper, after transforming the game value expression whenever $z_s = 0$ from $\inf_{p_1 \in \mathcal{P}} \sup_{y_1 \in \mathcal{Y}} \cdots \inf_{p_s \in \mathcal{P}} \sup_{y_s \in \mathcal{Y}} \cdots \inf_{p_T \in \mathcal{P}} \sup_{y_T \in \mathcal{Y}} \sum_{t=1}^T l (p_t, y_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T l (f_t, y_t),$ to $\inf_{p_1 \in \mathcal{P}} \sup_{y_1 \in \mathcal{Y}} \cdots \inf_{p_s \in \mathcal{P}} \cdots \inf_{p_T \in \mathcal{P}} \sup_{y_T \in \mathcal{Y}} \sup_{y_s \in \mathcal{Y}} \sum_{t=1}^T l (p_t, y_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T l (f_t, y_t),$ i.e., by letting the adversary defer selection of the unobserved values until the end of the game. This is just an upper bound because in reality the adversary has to chose the outcome for round $s$ on round $s$ so possibly I'm being too generous to the adversary.

#### Minimax Forecasting

If we design a Minimax Forecaster on the extensive form game associated with the bound, we get an algorithm which optimizes the bound. Consider the case where all outcomes are observed except for round $s$. Repeating the backward induction steps from the original minimax forecaster paper yields the expressions, \begin{aligned} p_t^* &= \frac{1}{2} \biggl( 1 - R^* (\mathcal{F}, y_{1:t-1}0 | z_s = 0) + R^* (\mathcal{F}, y_{1:t-1}1 | z_s = 0) \biggr). & (t > s) \end{aligned} and \begin{aligned} &R^* (\mathcal{F}, y_{1:t} | z_s = 0) \\ &= \frac{T - t}{2} + \mathbb{E}_{\sigma_{t+1:T}}\left[ \sup_{y_s \in \mathcal{Y}} |p_s - y_s| - \inf_{f \in \mathcal{F}} L (f_{1:T}, y_{1:t} \sigma_{t+1:T}) \right] & (t > s) \\ &= R^* (\mathcal{F}, y_{1:t}) \\ &\quad + \mathbb{E}_{\sigma_{t+1:T}}\left[ \left| p_s - \frac{1}{2} \left(1 + \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 0 \sigma_{s+1:T}) \right) - \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 1 \sigma_{s+1:T}) \right) \right) \right| \right]. \end{aligned} Thus the residual game value having played $p_s$ and not observed outcome $s$ is equal to the fully observed residual game value plus a penalty related to expected absolute loss averaged over all Rademacher distributed playouts. This implies the optimal choice of $p_s$ is a median \begin{aligned} p_s^* &= \mathop{\operatorname{median}} \left( \frac{1}{2} \left( 1 + \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 0 \sigma_{s+1:T}) \right) - \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 1 \sigma_{s+1:T}) \right) \right) \right). \end{aligned} In the case where there are no additional rounds of play between $s$ and $T$, there is only one point in the distribution so the median is that point, so the optimal play $p_s^*$ is the same as in the fully observed case. In the case where there is one additional round of play between $s$ and $T$, there are two points in the distribution so the mean is a median, and again the optimal play $p_s^*$ is the same as in the fully observed game (consistent with the previous blog post). In the case of more than one additional round of play between $s$ and $T$, the mean is not necessarily a median (i.e., does not necessary minimize expected absolute loss) so optimal play $p_s^*$ is different. Maybe this is why Mathematica pooped out'' when I tried to solve a longer version of the unobserved game.

Once we've hopped over'' the unobserved point and determined $p_s^*$, we can continue the backwards induction; but I think the interesting bit is the penalty term, which says what the value of observing the outcome on round $s$ given all previous and subsequent rounds will be observed, $\mathbb{E}_{\sigma_{t+1:T}}\left[ \left| p_s - \frac{1}{2} \left(1 + \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 0 \sigma_{s+1:T}) \right) - \inf_{f \in \mathcal{F}} \left( L (f_{1:T}, y_{1:s-1} 1 \sigma_{s+1:T}) \right) \right) \right| \right].$ This penalty term would be important for deciding whether or not to query the outcome on round $s$. It'd be nice to generalize it to the case where some of the previous outcomes are not observed and also with respect to potentially unobserved future outcomes. Of course, planning the latter might be exponentially difficult.

For this to be practical, it would be nice if medians and expected absolute losses could be cheaply and accurately estimated, e.g., using random playouts. Also perhaps deciding whether to query the outcome needs to be done on a game value bound, e.g., assuming all subsequent observations will be observed rather than taking into account future decisions to observe the outcome. Furthermore, the technique is still fundamentally transductive since the expert predictions for the planning horizon $f_{1:T}$ need to be known. Even with all those caveats, however, there might be an interesting application for this, e.g., in recommendation systems.