## Monday, September 27, 2010

### Positional Effects: Part II

In a previous post I talked about cost-sensitive best m with partial feedback (CSBM-PF) in the presence of positional effects, as I try to muddle through how to optimize a combinatorial content page with a single feedback event. Several different possibilities suggested themselves:
1. Assume nothing about positional dependence, negative externalities, etc. In this case one can treat entire sets as actions and use the offset tree directly. Disadvantages of this approach include: generalization is restricted to combinations which have historical support; and training time computation scales as the number of combinations. However if the number of combinations is small this is perhaps the best way to go.
2. Assume a greedy approach to constructing result vectors is optimal, but otherwise do not assume anything about positional dependence (or negative externalities?). In this case a sequence of offset trees arranged in greedy fashion reduces CSBM-PF to CSMC-PF. Generalization is restricted to combinations in which the individual action-position pairs have historical support, rather than entire combinations. Training time calculation is $m |A|$ rather than ${|A| \choose m}$.
1. Ideally, I could find a necessary condition for the greedy approach to be optimal and use that in the learning algorithm, but I haven't found one.
3. Assume swap supremacy (sufficient for the greedy approach to be optimal) and preservation of relative order by position (orthogonal to greedy, but necessary to make what follows work). Again a sequence of offset trees arranged in greedy fashion reduces CSBM-PF to CSMC-PF, but data is reused at later positions. Generalization is restricted to combinations in which the individual action has historical support at or before the position where it is being used. I had hoped to be able to use data at later positions with these assumptions, but further reflection suggests no. For instance, consider the following expected reward distribution as $\epsilon \to 0$, $\begin{array}{|c|c|c|} \mbox{Action} & r (a, 1) & r (a, 2) \\ \hline 1 & 1 + 1 / \epsilon & 1 + \epsilon \\ 2 & 1 & 1 \end{array}$ which obeys swap supremacy and preserves relative order by position; yet a small regret in the second position leads to a large regret in the first position. Basically the assumptions I have so far are not strong enough to let me go backwards. This is sad because exploration at the earlier positions is the most expensive, in the sense that they are the higher value spots.
Well I tried to be fancy, and it was informative, but ultimately I'm going to go with a common assumption: that the positional dependence and the action dependence factorizes, i.e., $r (a, i) = \kappa (i) \tilde r (a)$, where $\kappa$ and $\tilde r$ are independent random variables. If $\kappa (i)$ is non-negative and monotonically decreasing then this implies swap supremacy and preservation of relative order by position. The difference is that it gives a projection factor for changing positions from $i$ to $j$, $E_{\kappa | x, \omega} [ \kappa (j) ] / E_{\kappa | x,\omega}[ \kappa (i) ]$, which will (hopefully) allow me to generalize across position and mitigate the historical support requirement.

Now to just make that work.