I recently was inspired to summarize progress in the application of

PAC-style algorithm design for interactive settings, and ended up with this small table.

| Probably | Approximately Correct |

Passive Batch Learning | wrt draws of the data set | IID Concentration |

Passive Online Learning | wrt draws of the data set | Martingale (``Online to Batch'' conversion) |

Active Online Learning | wrt draws of the data set and sequence of label inspection decisions | Martingale (custom?) |

Contextual Bandits | wrt draws of the data set and sequence of actions chosen | Martingale (Freedman-style) |

Two things jump out. The first is the versatility of

martingales for taming interactive learning. How much further can this style of reasoning go? It'd be foolish to prognosticate with any certainty, but it feels like there is still some room to run. Ultimately there are the limits of the stochastic setting, of course: not all environments are well approximated as oblivious. So eventually we'll all be doing game theory, unless something

like this has killed everybody. Before then, I worry about practically insurmountable sample complexity issues trying to tackle credit assignment problems in reinforcement learning. One can already see hints of this, e.g., worst-case instantaneous regret for agnostic stochastic contextual bandits indicates data requirements scale linearly with the number of possible actions. On the other hand, data volumes are scaling even faster than computation. If the relative growth continues indefinitely than computational complexity will trump sample complexity as an algorithmic design issue.

The second is the fundamental nature of online learning. In the 1990s you could be forgiven if (like me!) you thought of online learning as an implementation detail, a type of optimization strategy which was particularly good for the types of objective functions that arise in machine learning (I was totally unaware of

prediction of individual sequences). With the benefit of hindsight, it is clear that online learning lends itself better to thinking about how information is revealed to (or collected by!) an algorithm over time, and that passive batch learning is more like a special case with particular implementation possibilities.

Ran into your blog. You might be interested in another application of a Freedman-style inequality (http://arxiv.org/abs/1002.4058). Here, the data can be adversarial (e.g. not necessarily iid), but the algorithm can act randomly and still do well. This doesn't get into game theory, however, because of how regret is defined -- this may be a limitation of the model.

ReplyDeleteThanks for pointing that out, I'll put that on my reading list.

DeleteRe: game theory, I may have been too hasty in declaring the future to be all game theory :) proposition 2 of http://arxiv.org/abs/1104.5070 indicates for OCO-style problems there is always a minimax strategy for the adversary which is oblivious (but nonstationary). So maybe statistical reasoning will be sufficient to master these problems.

Well I clearly need to chew on your paper for a while, but the overall point is immediately apparent: you can use martingales in an adversarial setting to prove results about your own randomizations.

DeleteViva la martingale!

definitely!

Delete