A paper by Tu and Lin caught my attention at ICML 2010 because of their use of a one-sided absolute loss in their reduction of cost-sensitive multiclass classification (CSMC) to regression. One-sided here means that, for a particular training instance, underestimating the lowest component of the cost vector is not penalized, and overestimating the other (non-lowest) components is not penalized; neither kind of error can change which class attains the minimum cost. Since they bound regret on CSMC in terms of the error (not regret) on the underlying regression problem, and since the one-sided loss is bounded above by the two-sided loss, the one-sided loss yields the tighter bound.
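To make the definition concrete, here is a minimal sketch of the one-sided absolute loss next to its two-sided counterpart. The function names and the plain-list representation of cost vectors are mine for illustration, not Tu and Lin's notation:

```python
def one_sided_abs_loss(costs, estimates):
    """One-sided absolute loss: only errors that could change the
    argmin of the cost vector are penalized (illustrative sketch)."""
    k = min(range(len(costs)), key=costs.__getitem__)  # lowest-cost class
    total = 0.0
    for j, (c, chat) in enumerate(zip(costs, estimates)):
        if j == k:
            total += max(0.0, chat - c)  # underestimating the winner is free
        else:
            total += max(0.0, c - chat)  # overestimating a loser is free
    return total

def two_sided_abs_loss(costs, estimates):
    """Ordinary absolute loss, penalizing errors in both directions."""
    return sum(abs(c - chat) for c, chat in zip(costs, estimates))
```

For example, with true costs `[1.0, 3.0, 2.0]`, the estimates `[0.5, 3.5, 2.5]` incur zero one-sided loss (every error is in a harmless direction) even though their two-sided loss is 1.5; pointwise, the one-sided loss never exceeds the two-sided loss, which is why it gives the tighter error-based bound.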
Now, thinking of a CSMC reduction to regression as a linear program solved with estimated coefficients, this suggests the following strategy in more complicated situations:
- Solve the linear program for each training instance (or for the subset of training data associated with each linear program instance).
- Use the solution to beneficially inform the loss function in the regression problem.
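The two steps above can be sketched in the CSMC case, where the per-instance linear program is trivial. Everything here is my own illustrative construction under that framing, not code from the paper: "solving the LP" over the probability simplex reduces to an argmin over the cost vector, and the solution then determines which direction of estimation error each coordinate's loss should penalize:

```python
import random

def solve_instance_lp(costs):
    # For CSMC the per-instance LP, min_x c.x over the probability
    # simplex, is solved by the indicator of the cheapest class, so
    # solving it reduces to an argmin over the cost vector.
    return min(range(len(costs)), key=costs.__getitem__)

def informed_subgradient(costs, estimates, winner):
    # Subgradient of the LP-informed one-sided absolute loss: the
    # winning coordinate is penalized only for overestimates, the
    # losing coordinates only for underestimates; errors in the
    # harmless direction contribute no gradient at all.
    g = []
    for j, (c, chat) in enumerate(zip(costs, estimates)):
        if j == winner:
            g.append(1.0 if chat > c else 0.0)
        else:
            g.append(-1.0 if chat < c else 0.0)
    return g

# Toy training loop: per-class constant predictors fit by subgradient
# descent on the informed loss, over two hypothetical cost vectors.
random.seed(0)
K = 3
theta = [0.0] * K
data = [[1.0, 3.0, 2.0], [0.5, 2.0, 4.0]]
for _ in range(200):
    costs = random.choice(data)
    w = solve_instance_lp(costs)                # step 1: solve the LP
    g = informed_subgradient(costs, theta, w)   # step 2: informed loss
    theta = [t - 0.1 * gj for t, gj in zip(theta, g)]
```

Note that the winner's estimate is never pushed at all here (it starts below the true cost and underestimates are free), yet the learned predictor still selects the correct class: the informed loss spends its effort only on errors that could flip the argmin.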
Thus it appears that, in the case of squared loss, modifying the regression loss based on the solution to the linear program on the training instance is actually harmful. However, in more complicated situations an error bound may be the best we can achieve, and in that regime this idea of using the solution of the linear program on the training instances might have utility. (Parenthetically, Tu and Lin report strong empirical performance of their resulting CSMC algorithm despite the lack of a regret bound.)