Machined Learnings: Constrained CSMC

I'm beginning to detect the wisp of an inkling of a research program, paraphrased as follows: ``OR has spent 60+ years reducing decision problems to regression; now it is time to revisit all the problems they considered and instead reduce them to binary classification via cost-sensitive multiclass classification (CSMC)." Of course this begs the question, ``Why? We have solutions already. Isn't there something better to do?'' The serious response is ``for noisy problems, reduction to binary classification is empirically better at selecting the maximum value from a set of alternatives than reduction to regression; perhaps more complicated optimization problems will also show improved solutions in the presence of noise.'' My actual response is more like, ``I have some free time and this is a good way to build intuition.''

However there is already a cloud on the horizon: constraints. Here are two situations I've already encountered:

In SSP without recourse, the graph structure for a particular realization might not be fully connected. I want to ensure the underlying CSMC doesn't select an infeasible edge.
In the Ad Selection Problem, the main challenge was ensuring that the same ad was not picked repeatedly.

OR deals with constraints pervasively, and I suspect reduction to regression is convenient in this regard.

I've been wondering what the right formulation of constrained CSMC would be, and have thought of two scenarios. The first ``average formulation'' would be appropriate for when the set of feasible choices is related to the problem instances in a stationary fashion. In this case, there is a distribution

$D = D_x \times D_{\omega|x} \times D_{c|\omega,x}$ , where

$c: K \to \mathbf{R}$ takes values in the extended reals

$\mathbf{R} = \mathbb{R} \cup \{ \infty \}$ , and the components of

$c$ which are

$\infty$ -valued for a particular instance are revealed as part of the problem instance via

$\omega \in \mathcal{P} (K)$ (i.e.,

$\omega$ is a subset of

$K$ ). The regret of a particular classifier

$h: X \times \mathcal{P} (K) \to K$ is given by

$r_{av} (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ E_{c \sim D_{c|\omega,x}} \left[ c (h (x, \omega)) - \min_{k \in K}\; E_{c \sim D_{c|\omega,x}} \left[ c (k) \right] \right] \right].$ The second ``minimax formulation'' would be appropriate for when the set of feasible choices will vary in an ad-hoc fashion which is unpredictable from past data. In this case, there is a distribution

$D = D_x \times D_{\tilde c|x}$ , where

$\tilde c: K \to \mathbb{R}$ takes values in the regular reals

$\mathbb{R}$ . Then, an adversary comes in and manufactures a cost vector

$c: K \to \mathbf{R}$ in the extended reals

$\mathbf{R}$ by setting some of the components to

$\infty$ ; these choices are revealed via

$\omega$ prior to a decision being elicited. In this case the regret of a particular classifier is given by

$r_{mm} (h) = E_{x \sim D_x} \left[ E_{\tilde c \sim D_{\tilde c|x}} \left[ \max_{\omega \in \mathcal{P} (K)}\; \left\{ c (h (x, \omega)) - \min_{k \in K}\; E_{c \sim D_{\tilde c|x}} \left[ c (k) \right] \right\} \right] \right].$ Intuitively it feels like SSP without recourse would better fit the average formulation better, and ad serving would better fit the minimax formulation.

Now for some speculation. I suspect that the relative efficiency of the filter tree reduction of CSMC to binary is because it does not handle constraints well. In other words, reduction to regression is prepared to handle arbitrary constraints by limiting the scope of the argmax, but one pays for this additional flexibility with a worse regret bound. If there is a rule of thumb in machine learning, it's ``always solve the easiest version of your problem'', and there are lots of CSMC problems which are unconstrained (i.e., all decisions are allowed on all instances), so this additional flexibility is not warranted.

So the next step is to look at the various reductions of CSMC to regression or binary classification and analyze how well they perform on ``constrained CSMC''.

Machined Learnings

Tuesday, August 31, 2010

Constrained CSMC

No comments:

Post a Comment