Machined Learnings: Constrained Cost-Sensitive: Regression Reduction Analysis

In a previous post I introduced two versions of constrained cost-sensitive multiclass classification (CSMC): average, with regret \[ r_{av} (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ E_{c \sim D_{c|\omega,x}} \left[ c (h (x, \omega)) - \min_{k \in K}\; E_{c \sim D_{c|\omega,x}} \left[ c (k) \right] \right] \right], \] and minimax, with regret \[ r_{mm} (h) = E_{x \sim D_x} \left[ E_{\tilde c \sim D_{\tilde c|x}} \left[ \max_{\omega \in \mathcal{P} (K)} \left\{ c (h (x, \omega)) - \min_{k \in K}\; E_{\tilde c \sim D_{\tilde c|x}} \left[ c (k) \right] \right\} \right] \right]. \] Now I'm going to state two results that are not that surprising.

Result:Average CSMC Regression Regret

For all average constrained CSMC distributions $D$, and all cost-sensitive classifiers $h_g$ derived from a regressor $g$, \[ r_{av} (h_g) \leq \sqrt{2 |K| \epsilon_{L^2} (g)}, \] where $\epsilon_{L^2} (g)$ is the $L^2$ loss on the underlying regression problem.

Proof: See Average Analysis below.

Result:Minimax CSMC Regression Regret

For all minimax constrained CSMC distributions $D$, and all cost-sensitive classifiers $h_g$ derived from a regressor $g$, \[ r_{mm} (h_g) \leq \sqrt{2 |K| \epsilon_{L^2} (g)}, \] where $\epsilon_{L^2} (g)$ is the $L^2$ loss on the underlying regression problem.

Proof: See Minimax Analysis below.

Those bounds should look familiar: they are the same as in the unconstrained CSMC case. Together these results indicate that a regression reduction is truly indifferent to constraints, even constraints that are adversarially imposed at application time. It will be interesting to see whether other reductions of unconstrained CSMC have different properties for constrained CSMC.

Average Analysis

In this case, there is a distribution $D = D_x \times D_{\omega|x} \times D_{c|\omega,x}$, where $c: K \to \mathbf{R}$ takes values in the extended reals $\mathbf{R} = \mathbb{R} \cup \{ \infty \}$, and the components of $c$ which are $\infty$-valued for a particular instance are revealed as part of the problem instance via $\omega \in \mathcal{P} (K)$ (i.e., $\omega$ is a subset of $K$). The regret of a particular classifier $h: X \times \mathcal{P} (K) \to K$ is given by \[ r_{av} (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ E_{c \sim D_{c|\omega,x}} \left[ c (h (x, \omega)) - \min_{k \in K}\; E_{c \sim D_{c|\omega,x}} \left[ c (k) \right] \right] \right]. \] An argmax regression strategy to solve cost-sensitive multiclass classifier is a function $g: X \times \mathcal{P} (K) \times K \to \mathbf{R}$ which defines an associated cost-sensitive multiclass classifier $h_g: X \times \mathcal{P} (K) \to K$ according to \[ h_g (x, \omega) = \underset{k \in K}{\operatorname{arg\,min\;}} g (x, \omega, k). \] I would like to bound $r_{av} (h_g)$ in terms of the regret of $g$ on the regression problem, \[ \epsilon_{L} (g) = q_{L} (g) - \min_g\; q_{L} (g), \] where $q$ is the error on the regression problem \[ q_{L} (g) = E_{(x, \omega, c) \sim D} \left[ \frac{1}{|K|} \sum_{k \in K} L \bigl (g (x, \omega, k), c (k) \bigr) \right], \] and $L$ is a loss function for the regression problem (defined on the extended reals). I'll focus on $L^2$ loss for the regressor defined on the extended reals via $L^2 (\infty, \infty) = 0$, $L^2 (\infty, \cdot) = \infty$, and $L^2 (\cdot, \infty) = \infty$.

Consider a single instance $(x, \omega)$ with associated conditional per-instance cost-vector distribution $D_{c|\omega,x}$, and suppose our regressor has cost estimates which differ from a minimum error regressor's estimate $c^* (x, \omega, k)$ by $\delta (x, \omega, k)$, \[ g (x, \omega, k) = c^* (x, \omega, k) + \delta (x, \omega, k). \] For $k \in \omega$, $\delta (x, \omega, k) = 0$ since both $c^* (x, \omega, k)$ and our regressor $g (x, \omega, k)$ will be $\infty$. The associated classifier $h_g$ is \[ h_g (x, \omega) = \underset{k \in K}{\operatorname{arg\,min\,}} \bigl( c^* (x, \omega, k) + \delta (x, \omega, k) \bigr). \] Imagine an adversary which attempts to create a certain amount of CSMC regret on this instance while minimizing the amount of regression regret on this instance. This adversary is faced with the following family of problems indexed by $k^{**} \in K \setminus \omega$: \[ \begin{aligned} &\min_{\delta}\; \sum_{k \in K} \delta (x, \omega, k)^2 \\ \mathrm{s.t.} \; \forall k \neq k^{**}, \; & c^* (x, \omega, k^{**}) + \delta (x, \omega, k^{**}) \leq c^* (x, \omega, k) + \delta (x, \omega, k). \end{aligned} \] This is the same as the unconstrained CSMC reduction to regression but with $k^{**}$ restricted to the set $K \setminus \omega$. When $|K \setminus \omega| \leq 1$, the CSMC regret is zero; otherwise the adversary's strategy is unchanged: perturb the $k^*$ and $k^{**}$ estimates and leave others alone. Thus leveraging the previous analysis yields \[ r_{av} (h_g) \leq \sqrt{2 |K| \epsilon_{L^2} (g)}. \] It should also be noted that the regression regret will be at most the regression regret in the unconstrained case, since the additional information contained in $\omega$ allows the regressor to have a perfect estimate for some values of $k$.

Minimax Analysis

In this case, there is a distribution $D = D_x \times D_{\tilde c|x}$, where $\tilde c: K \to \mathbb{R}$ takes values in the regular reals $\mathbb{R}$. Then, an adversary comes in and manufactures a cost vector $c$ in the extended reals $\mathbf{R}$ by setting some of the components to $\infty$; these choices are revealed via $\omega$ prior to a decision being elicited. In this case the regret of a particular classifier is given by \[ r_{mm} (h) = E_{x \sim D_x} \left[ E_{\tilde c \sim D_{\tilde c|x}} \left[ \max_{\omega \in \mathcal{P} (K)} \left\{ c (h (x, \omega)) - \min_{k \in K}\; E_{\tilde c \sim D_{\tilde c|x}} \left[ c (k) \right] \right\} \right] \right]. \] Consider a single instance $x$ with associated conditional per-instance cost-vector distribution $D_{c|x}$; in addition the adversary can pick $\omega$ to construct the complete problem instance $(x, \omega)$. As above the regressor has cost estimates which differ from a minimum error regressor's estimate $c^* (x, \omega, k)$ by $\delta (x, \omega, k)$ and when $k \in \omega$ the estimates are perfect, i.e., $\delta (x, \omega, k) = 0$.

Imagine an adversary which attempts to create a certain amount of CSMC regret on this instance while minimizing the amount of regression regret on this instance. This adversary is faced with the following family of problems indexed by $\omega$ and $k^{**} \in K \setminus \omega$: \[ \begin{aligned} &\min_{\delta}\; \sum_{k \in K} \delta (x, \omega, k)^2 \\ \mathrm{s.t.} \; \forall k \neq k^{**}, \; & c^* (x, \omega, k^{**}) + \delta (x, \omega, k^{**}) \leq c^* (x, \omega, k) + \delta (x, \omega, k). \end{aligned} \] Again when $|K \setminus \omega| \leq 1$, the CSMC regret is zero; otherwise the adversary's strategy is the same for each $\omega$ and the leveraging the previous analysis yields \[ r_{mm} (h_g) \leq \sqrt{2 |K| \epsilon_{L^2} (h_g)}. \]

Machined Learnings

Wednesday, September 1, 2010

Constrained Cost-Sensitive: Regression Reduction Analysis

Average Analysis

Minimax Analysis

No comments:

Post a Comment