The constrained CSBM definition is as follows. There is a distribution $D = D_x \times D_{\omega|x} \times D_{r|\omega,x}$, where $r: A \to [0, 1] \cup \{ -\infty \}$ takes values in the unit interval augmented with $-\infty$, and the components of $r$ which are $-\infty$-valued for a particular instance are revealed as part of the problem instance via $\omega \in \mathcal{P} (A)$ (i.e., $\omega$ is a subset of $A$). Allowed outputs in response to a problem instance are subsets of $A$ of size $m$, denoted \[ S_m = \{ S | S \subseteq A, |S| = m \}.\] The regret of a particular deterministic policy $h: X \times \mathcal{P} (A) \to S_m$ is given by \[ v (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{s \in S_m}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in h (x, \omega)} r (a) \right] \right]. \] Note when $|A \setminus \omega| < m$, any strategy achieves zero regret (via $-\infty$ reward); therefore the ``interesting'' parts of the problem space are when $|A \setminus \omega| \geq m$.

There are two plausible scenarios for CSBM with partial feedback.

- Only the total reward associated with the set of actions chosen is observed. There is actually a version of this problem at my current gig, since there is a page on the site whose elements are designed to act in concert to elicit a single response.
- The reward associated with each action chosen is observed. For instance advertisements are generally chosen in a set, but provide individualized feedback.

The reduction works as follows: first the highest reward choice is chosen, then its reward is adjusted to $-\infty$, and the process is repeated until a set of size $m$ has been achieved. The individual steps are posed as constrained CSMC with partial feedback (CSMC-PF) problems, which is essentially CSBM with $m = 1$. The forfeit filter-offset tree was designed for constrained CSMC-PF, and in particular can be used as the $\mbox{Learn}$ oracle below. The forfeit filter-offset tree has the property that it always achieves finite regret, i.e., it chooses a feasible class whenever possible. In this context, it means the subproblems will never create duplicates.

Algorithm:Partial Feedback Set Select Train

**Input:**Action labels $A$, (maximum) size of set to select $m \leq |A| / 2$.

**Input:**Constrained CSMC-PF classifier $\mbox{Learn}$.

**Data:**Training data set $S$.

**Result:**Trained classifiers $\{\Psi_n | n \in [1, m] \}$.

- Define $\gamma_0 (\cdot, \cdot) = \emptyset$.
- For each $n$ from 1 to $m$:
- $S_n = \emptyset$.
- For each example $\bigl(x, \omega, \mathcal{A}, \{ r (a) | a \in \mathcal{A} \}, p (\cdot | x, \omega) \bigr) \in S$ such that $|A \setminus \omega| \geq m$:
- Let $\gamma_{n-1} (x, \omega)$ be the predicted best set from the previous iteration.
- For each action $a$:
- If $a \in \gamma_{n-1} (x, \omega)$, $r (n, a) = -\infty$;
- else $r (n, a) = r (a)$.
- $S_n \leftarrow S_n \cup \left\{\bigl( x, \omega \cup \gamma_{n-1} (x), \mathcal{A}, \{ r (n, a) | a \in \mathcal{A} \}, p (\cdot | x, \omega) \bigr) \right\}$.
- Let $\Psi_n = \mbox{Learn} (S_n)$.
- Let $\gamma_n (x) = \Psi_n (x) \cup \gamma_{n-1} (x)$.
- Return $\{ \Psi_n | n \in [1, m] \}$.

**Comment:**If $m > |A|/2$, negate all finite rewards and choose complement of size $|A| - m$.

Algorithm:Set Select Test

**Data:**Class labels $A$, number of positions to populate $l \leq m \leq |A|/2$.

**Data:**Instance feature realization $(x, \omega)$.

**Data:**Trained classifiers $\{\Psi_n | n \in [1, m] \}$.

**Result:**Set $h^\Psi: X \to S_l$.

- $\gamma_0^\Psi (x, \omega) = \emptyset$.
- For $n$ from 1 to $l$:
- $\gamma_n^\Psi (x, \omega) = \gamma_{n-1}^\Psi (x, \omega) \cup \Psi_n (x, \omega \cup \gamma_{n-1}^\Psi (x, \omega))$.
- If $|\gamma_l^\Psi (x, \omega)| = l$, $h^\Psi (x, \omega) = \gamma_l^\Psi (x, \omega)$;
- else set $h^\Psi (x, \omega)$ to an arbitrary element of $S_l$.

**Comment:**If $l \geq m > |A|/2$, negate all finite costs, and choose complement of size $|A| - l$.

My goal is to bound the average constrained CSBM regret \[ v (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{s \in S_m}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in h (x, \omega)} r (a) \right] \right] \] in terms of the average constrained CSMC regret on the induced subproblems. Once again I'll leverage a trick from the filter tree derivation and collapse the multiple subproblems into a single subproblem by defining an induced distribution. Let $D$ be the distribution of average constrained CSBM instances $(x, \omega, r)$. Define the induced distribution $D^\prime (\Psi, l)$ where $l \leq m$ of constrained CSMC-PF instances $(x^\prime, \omega^\prime, \mathcal{A}, \{ r^\prime (a) | a \in \mathcal{A} \}, p^\prime (\cdot | x^\prime, \omega^\prime))$.

- Draw $(x, \omega, r)$ from $D$.
- Draw $n$ uniform on $[1, l]$.
- Let $x^\prime = (x, n)$.
- Let $\omega^\prime = \omega \cup \gamma_{n-1} (x, \omega)$.
- For each action $a$:
- If $a \in \gamma_{n-1} (x, \omega)$, $r^\prime (a) = -\infty$;
- else $r^\prime (a) = r (a)$.
- Let $p^\prime (\cdot | x^\prime, \omega^\prime) = p (\cdot | x, \omega)$.
- Create constrained CSMC-PF example $(x^\prime, \omega^\prime, \mathcal{A}, \{ r^\prime (a) | a \in \mathcal{A} \}, p^\prime (\cdot | x^\prime, \omega^\prime))$.

Theorem:Regret Bound

For all average constrained CSBM distributions $D$, and all average constrained CSMC classifiers $\Psi$, \[ v (h^\Psi) \leq l\, q (\Psi, l). \]

**Proof:**See Appendix.The remarks from the previous version of this reduction still apply.

The reduction still seems inefficient when comparing reduction to regression directly ($\sqrt{m} \sqrt{|A|} \sqrt{\epsilon_{L^2}}$) versus reduction to regression via CSMC ($m \sqrt{|A|} \sqrt{\epsilon_{L^2}}$). This suggests there is a way to reduce this problem which only leverages $\sqrt{m}$ CSMC subproblems. One possible source of inefficiency: the reduction is retrieving the elements in order, whereas the objective function is indifferent to order.

The regret bound indicates the following property: once I have trained to select sets of size $m$, I can get a regret bound for selecting sets of size $l$ for any $l \leq m$. This suggests a variant with $m = |A|$ could be used to reduce minimax constrained CSMC-PF to average constrained CSMC-PF. I'll explore that in a future blog post.

#### Appendix

This is the proof of the regret bound.

If $\Psi$ achieves infinite regret on the induced subproblem, the bound holds trivially. Thus consider a $\Psi$ that achieves finite regret.

If $|A \setminus \omega| < l$, then $v = 0$ for any choice in $S_l$, and the bound conditionally holds trivially. Thus consider $|A \setminus \omega| \geq l$: since $\Psi$ achieves finite regret no duplicates are generated from any sub-classifier and $h^\Psi (x, \omega) = \gamma^\Psi_l (x, \omega)$.

Consider a fixed $(x, \omega)$ with $|A \setminus \omega| \geq l$. It is convenient to talk about \[ v (h | x, \omega, n) = \max_{s \in S_m}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in \gamma^\Psi_n (x, \omega)} r (a) \right], \] the conditional regret on this instance at the $n^\mathrm{th}$ step in Partial Feedback Set Select Test. Let \[ s^* (x, \omega, n) = \underset{s \in S_m}{\operatorname{arg\,max\,}} E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] \] be any maximizer of the first term (which is unique up to ties); note any $s^* (x, \omega, n)$ will select $n$ classes with respect to largest conditional expected reward.

Proof is by demonstrating the property $v (h^\Psi | x, \omega, n) \leq \sum_{r=1}^n q_r (\Psi, l)$. The property holds with equality for $n = 1$. For $n > 1$ note \[ \begin{aligned} v (h^\Psi | x, \omega, n) - v (h^\Psi | x, \omega, n - 1) &= \max_{a \in A \setminus s^* (x, \omega, n - 1)} E_{r \sim D_{r|\omega,x}} \left[ r (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ r \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\ &\leq \max_{a \in A \setminus \gamma^\Psi_{n-1} (x, \omega)} E_{r \sim D_{r|\omega,x}} \left[ r (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ r \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\\ &\leq \max_{a \in A \setminus \gamma^\Psi_{n-1} (x, \omega)} E_{r \sim D_{r|\omega,x}} \left[ \tilde r_n (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ \tilde r_n \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\ &= q_n (\Psi, l | x, \omega), \end{aligned} \] where the second inequality is due to the optimality of $s^* (x, \omega, n - 1)$ and the third inequality is because $\tilde r_n (a) \leq r (a)$ with equality if $a \not \in \gamma^\Psi_{n-1} (x, \omega)$. Summing the telescoping series establishes \[ v (h^\Psi | x, \omega) = v (h^\Psi | x, \omega, l) \leq \sum_{r=1}^l q_r (\Psi, l | x, \omega) = l\, q (\Psi, l | x, \omega). \] Taking the expectation with respect to $D_{x, \omega}$ completes the proof.

## No comments:

## Post a Comment