## Saturday, October 9, 2010

### Dependent Reward Revelation: Part II

In my previous post I talked about price differentiation and how it motivates dependent reward revelation, which is a fancy way of saying which rewards are revealed depends upon the reward values. I have to admit I'm somewhat stuck about how to exploit this additional information.

I'll repeat the setup here:
1. World chooses $(x, \omega, r)$ from $D$ and reveals $(x, \omega)$.
2. Player chooses $a \in A$ via $p (a | x, \omega)$.
3. World chooses $\mathcal{A} \in \mathcal{P} (A)$ via $q (\mathcal{A} | x, \omega, r, a)$, where $a \in \mathcal{A}$ is required.
4. World reveals $\{ r (a) | a \in \mathcal{A} \}$.
The usual see what you did'' scenario is $q (\mathcal{A} | x, \omega, r, a) = 1_{\mathcal{A} = \{ a \}}$, and for price differentiation it is $q (\mathcal{A} | x, \omega, r, a) = \begin{cases} \{ a^\prime | a^\prime \leq a \} & \mbox{if } r (a) > 0; \\ \{ a^\prime | a^\prime \geq a \} & \mbox{if } r (a) = 0. \end{cases}$ Since $a \in \mathcal{A}$ is required, I can always throw away the additional information and convert any $q$ into $q = 1_{\mathcal{A} = \{ a \}}$, then use the offset tree. That seems wasteful, but at the moment I don't have another option for the price differentiation problem.

#### A Filter-Offset Style Update (Fail)

Consider a filter-offset tree style solution. Fix $(x, \omega, r)$, and consider a fixed internal node with inputs $\lambda \not \in \omega$ and $\phi \not \in \omega$. The expected importance weight of $\lambda$ would be \begin{aligned} w_{\lambda|r} &= E_{a \sim p} \biggl[ E_{\mathcal{A} \sim q|r,a} \biggl[ \alpha_{\lambda,\neg \phi} 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} 1_{r (\lambda) \geq \frac{1}{2}} \left( r(\lambda) - \frac{1}{2} \right) \\ &\quad\quad\quad + \alpha_{\neg \lambda, \phi} 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r (\phi) \leq \frac{1}{2}} \left( \frac{1}{2} - r(\phi) \right) \\ &\quad\quad\quad + \alpha_{\lambda, \phi} 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r (\lambda) > r (\phi)} \left( r (\lambda) - r (\phi) \right) \biggr] \biggr] \biggl/ \\ &E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]. \end{aligned} Analogy with the filter-offset update suggests the choices \begin{aligned} \alpha_{\lambda, \neg \phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in A} 1_{\phi \not \in \mathcal{A}} \right] \right]}, \\ \alpha_{\neg \lambda, \phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \not \in A} 1_{\phi \in \mathcal{A}} \right] \right]}, \\ \alpha_{\lambda, \phi} &= \gamma \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in A} 1_{\phi \in \mathcal{A}} \right] \right]}, \end{aligned} for some $\gamma \in [0, 1]$. Unfortunately in general these quantities cannot be computed since $r$ is only partially revealed per instance. For the price differentiation $q$, for instance, only when $a$ is the largest possible price and $r (a) > 0$, or when $a$ is the smallest possible price and $r (a) = 0$, can these quantities be computed.

My suspicion is that the only way to proceed with this filter-offset style update is if the set of rewards that $q$ depends upon is always revealed. So something like $q (\mathcal{A} | x, \omega, r, a) = \begin{cases} \{ \tilde a \} \cup \{ a^\prime | a^\prime \geq a \} & \mbox{if } r (\tilde a) > 0; \\ \{ a, \tilde a \} & \mbox{if } r (\tilde a) = 0; \\ \{ \tilde a \} \cup \{ a^\prime | a^\prime \leq a \} & \mbox{if } r (\tilde a) < 0, \end{cases}$ would work since $q$ only depends upon $r (\tilde a)$ which is always revealed, so the above expectations can always be computed. With such a cooperative $q$, the rest of the filter-offset tree crank can be turned and the weighting factors would be \begin{aligned} \alpha_{\lambda, \neg \phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in A} 1_{\phi \not \in \mathcal{A}} \right] \right]}, \\ \alpha_{\neg \lambda, \phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not \in \mathcal{A}} + 1_{\lambda \not \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \not \in A} 1_{\phi \in \mathcal{A}} \right] \right]}, \\ \alpha_{\lambda, \phi} &= 1, \end{aligned} which is neat, but still leaves me wondering how to exploit the additional information available in the price differentiation problem.