## Sunday, April 15, 2012

### Interactive PAC

I was recently inspired to summarize progress in applying PAC-style algorithm design to interactive settings, and ended up with this small table.
| Setting | Probably approximately correct with respect to | Concentration tool |
| --- | --- | --- |
| Passive batch learning | draws of the data set | IID concentration |
| Passive online learning | draws of the data set | Martingale ("online to batch" conversion) |
| Active online learning | draws of the data set and the sequence of label inspection decisions | Martingale (custom?) |
| Contextual bandits | draws of the data set and the sequence of actions chosen | Martingale (Freedman-style) |
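For concreteness, the "online to batch" conversion can be sketched in a few lines: run an online learner over the data stream and return the averaged iterate, whose excess risk is then controlled by a martingale (Azuma-Hoeffding style) argument rather than IID concentration. The sketch below is illustrative only; the squared loss, learning rate, and toy data stream are my own choices, not anything from the table.

```python
import random

def online_to_batch(stream, dim, lr=0.1):
    """Run online gradient descent on a stream of (x, y) pairs and
    return the running average of the iterates -- the standard
    online-to-batch device."""
    w = [0.0] * dim
    avg = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        g = 2.0 * (pred - y)  # gradient of squared loss wrt the prediction
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]  # running mean
    return avg

# Hypothetical toy stream: y = 2*x0 - x1 plus Gaussian noise.
random.seed(0)
def stream(n):
    for _ in range(n):
        x = [random.uniform(-1, 1), random.uniform(-1, 1)]
        yield x, 2 * x[0] - x[1] + random.gauss(0, 0.1)

w_bar = online_to_batch(stream(5000), dim=2)
print(w_bar)  # should land near the true weights [2, -1]
```

The point of returning the average (rather than the last iterate) is exactly what makes the martingale argument go through: the per-round excess losses form a martingale difference sequence relative to the draws of the data.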
Two things jump out. The first is the versatility of martingales for taming interactive learning. How much further can this style of reasoning go? It'd be foolish to prognosticate with any certainty, but it feels like there is still some room to run. Ultimately there are the limits of the stochastic setting, of course: not all environments are well approximated as oblivious. So eventually we'll all be doing game theory, unless something like this has killed everybody. Before then, I worry about practically insurmountable sample complexity issues in tackling credit assignment problems in reinforcement learning. One can already see hints of this: e.g., the worst-case instantaneous regret for agnostic stochastic contextual bandits indicates that data requirements scale linearly with the number of possible actions. On the other hand, data volumes are scaling even faster than computation. If the relative growth continues indefinitely, then computational complexity will trump sample complexity as an algorithmic design issue.
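To make the sample complexity worry concrete: the minimax regret for agnostic contextual bandits with $K$ actions and $N$ policies scales like $\sqrt{K T \ln N}$, so driving average per-round regret below $\epsilon$ requires on the order of $K \ln N / \epsilon^2$ rounds, i.e., linearly many in the number of actions. A back-of-the-envelope calculation (constants dropped; the bound's shape is standard, the particular numbers are just illustrative):

```python
from math import ceil, log

def rounds_needed(num_actions, num_policies, eps):
    """Rounds T at which the sqrt(K * T * ln N) regret rate implies
    average per-round regret below eps: T >= K * ln(N) / eps**2.
    Constants are dropped, so treat this as order-of-magnitude only."""
    return ceil(num_actions * log(num_policies) / eps ** 2)

# With N = 1000 policies and a 1% per-round regret target,
# the round count grows linearly as actions go from 2 to 100.
for k in (2, 10, 100):
    print(k, rounds_needed(k, num_policies=1000, eps=0.01))
```

Even at modest targets the linear dependence on actions dominates, which is the hint at trouble for richer credit assignment problems.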

The second is the fundamental nature of online learning. In the 1990s you could be forgiven if (like me!) you thought of online learning as an implementation detail, a type of optimization strategy which was particularly good for the types of objective functions that arise in machine learning (I was totally unaware of prediction of individual sequences). With the benefit of hindsight, it is clear that online learning lends itself better to thinking about how information is revealed to (or collected by!) an algorithm over time, and that passive batch learning is more like a special case with particular implementation possibilities.