I gave a presentation today on practical tips for solving contextual bandit problems on the internet. Unless one faces a complete cold start situation, there is an existing system making decisions which you are expected to improve. Such systems are often not explicitly constructed decision theoretically, and a common case is a pile of heuristics developed by a set of plucky humans armed with spreadsheet software. This led naturally to the topic of how to beat such systems.
It's important to note what is likely not to work: taking the same information used by the human-spreadsheet algorithm (HSA), applying the latest whiz-bang Machine Learning algorithm to it, and hoping to do better. You might get lucky but especially in a consulting context it's good to stack the deck in your favor. The reason why this is not a good strategy is that Machine Learning algorithms are at their core optimization algorithms, but HSA is also an optimization algorithm, and given enough time HSA can be quite competitive so it's important to understand its weaknesses.
HSA tends to use counting estimators driven by a small number of features (a typical number is 5). These features can themselves be nonlinear hand-tuned combinations of small numbers of base pieces of information (again, typically 5), which can often be found in other ``tabs'' in the spreadsheet. Therefore I find it useful to think of HSA as a low-dimensional deep-learning algorithm.
The deep-learning in HSA is not state-of-the-art but given that it has typically had several man-years of running time, it can be pretty good. Therefore you are asking for trouble hoping to best HSA purely by improved deep-learning. The true Achilles heel of HSA is the low-dimensionality: taking the union of all the base pieces of information in all the tabs and you are generally looking at less than 20 pieces of information. Furthermore, the pieces of information are typically dense in the original data stream.
This suggests a strategy for the win: look for additional pieces of information available to the decision system not considered by the HSA. In particular look for sparse features which individually occur too rarely to support counting statistics, but which in concert provide lots of information (raw text fields can have this property).
Another potential weakness of HSA is the manual aspect and the interaction with nonstationarity. Sometimes the heuristics in the serving system are not frequently updated, in which case automation and processing more recent data can provide lift. Sometimes the HSA contains separate different seasonal models driven by subsets of the data, in which case you do better by training on all the data and including temporal, seasonal, and holiday features (recency weighting the historical data is a good idea here: also make sure any cross-validation is done temporally out-of-sample).
In any event I would advise a healthy respect for HSA-driven systems. Naturally when I first started I figured such systems would be easy targets and I've learned humility ``the hard way.''