Comments on Machined Learnings: "Why do Ad Servers use Regression?"

Paul Mineiro (2010-11-08 10:31):

Hey Noel.
The offset tree is an offline policy constructor for the contextual bandit problem (handling the "warm start" problem), but it can also be updated online. In practice it is coupled with an exploration strategy that I do not discuss at all here.

So the discussion here is roughly about: I've done some exploration with a new advertisement somehow and decided to admit it to my exploitation policy (i.e., I want the offline policy constructor to compete with a larger class of policies that includes this new action). Can I incrementally add an action to an offset tree, or do I need to retrain completely from scratch? Since the offset tree can be maintained incrementally when the set of actions is fixed, it seems wasteful to have to completely retrain in the (common) event of the introduction or removal of actions.

Noel (2010-11-08 07:32):

I'm fairly new to both learning reductions and ad server applications of ML. That said, I've recently been reading about bandit algorithms with an eye towards their application in ad serving (actually, content optimisation). Why don't you consider bandit algorithms? It seems they better model the true situation (limited feedback) than classification does.
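To make the "limited feedback" point concrete, here is a minimal epsilon-greedy bandit sketch (not the offset tree itself, and not from the original discussion; the class name and parameters are illustrative). Unlike supervised classification, where the correct label for every example is revealed, a bandit learner only observes the reward of the single action it actually took:

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit over a fixed set of actions."""

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions    # number of times each action was taken
        self.values = [0.0] * n_actions  # running mean reward per action

    def select(self):
        # With probability epsilon, explore a random action;
        # otherwise exploit the current best empirical estimate.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Only the chosen action's estimate is updated: this is the
        # limited-feedback property that distinguishes bandits from
        # full-information classification.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

In an ad-serving simulation with two ads paying off at hypothetical click rates of 0.2 and 0.8, the learner concentrates its pulls on the better ad while its value estimate converges toward the true rate.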