## Friday, June 29, 2012

### My ICML 2012 Notables

I've already devoted entire blog posts to some of the ICML 2012 papers, but there are some other papers that caught my attention for which I only have a quick comment.

• Online Structured Prediction via Coactive Learning: read the full blog post.
• Predicting Accurate Probabilities with a Ranking Loss: read the full blog post.
• Training Restricted Boltzmann Machines on Word Observations: I haven't used RBMs in over a decade; for practical text classification problems a bag-of-bigrams representation is often sufficient, and LDA is my go-to technique for unsupervised feature extraction from text. So why do I like this paper? First, the computational efficiency improvement appears substantial, which is always of interest: I like deep learning in theory, but in practice I'm very impatient. Second, the idea of discovering higher-order structure in text (5-grams!) is intriguing. Third, like LDA, the technique is clearly more generally applicable, and I wonder what it would do on a social graph. All of that suggests there is some chance I might actually try this on a real problem.
• Fast Prediction of New Feature Utility: I'm constantly in the situation of trying to choose which features to try next, and correlating candidate features with the negative gradient of the loss function makes intuitive sense.
• Plug-in Martingales for Testing Exchangeability On-Line: how awesome would it be if VW in online learning mode could output a warning that says "the input data does not appear to be generated by an exchangeable distribution; try randomly shuffling your data to improve generalization"?
• Dimensionality Reduction by Local Discriminative Gaussians: This seems eminently practical. The major limitation is that it is a supervised dimensionality reduction technique, so it would apply to cases where there is one problem with a deficit of labeled data and a related problem using the same features with an abundance of labeled data (a special case of transfer learning). I usually find myself in the "little labeled data and lots of unlabeled data" case, which demands an unsupervised technique, but that could be because I don't ask myself the following question often enough: "is there a related problem which has lots of training data associated with it?"
• Finding Botnets Using Minimal Graph Clusterings: Very entertaining. I was once asked in a job interview how I would go about identifying and filtering out automated traffic from search logs. There's no "right answer," and black-letter machine learning techniques don't obviously apply, so creativity is at a premium.
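The core idea in the feature-utility bullet, scoring a candidate feature by how well it correlates with the negative gradient of the loss at the current model's predictions, fits in a few lines. This is a minimal sketch under stated assumptions, not the paper's estimator: it assumes logistic loss with 0/1 labels and uses absolute Pearson correlation as the score, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def negative_gradient_logistic(y, scores):
    """Negative gradient of logistic loss w.r.t. raw scores, assuming labels y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return y - p

def feature_utility(candidates, residuals):
    """Absolute Pearson correlation of each candidate column with the pseudo-residuals.

    Constant columns (zero variance) get a score of 0 instead of NaN.
    """
    X = candidates - candidates.mean(axis=0)
    r = residuals - residuals.mean()
    denom = np.sqrt((X ** 2).sum(axis=0) * (r ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (X.T @ r) / denom
    return np.abs(np.nan_to_num(corr))

# Hypothetical synthetic setup: the label depends on candidate 0,
# but the current model only uses the existing feature.
rng = np.random.default_rng(0)
n = 2000
existing = rng.normal(size=n)
candidates = rng.normal(size=(n, 3))
logits = 1.5 * existing + 2.0 * candidates[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
scores = 1.5 * existing  # current model's raw scores

util = feature_utility(candidates, negative_gradient_logistic(y, scores))
print("candidate ranking:", np.argsort(-util))
```

For squared loss the negative gradient is just the residual `y - scores`, so the same scoring function applies unchanged; only the pseudo-residual computation depends on the loss.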
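The exchangeability-testing bullet rests on a neat construction: under exchangeability, smoothed conformal p-values are i.i.d. uniform, so repeatedly "betting" with a density estimated from past p-values yields a nonnegative martingale that should rarely grow large. The sketch below is a simplified illustration under stated assumptions, not the paper's method: it assumes a scalar stream, uses distance from the running mean as the nonconformity score, and substitutes a Laplace-smoothed histogram for the density estimate (the paper's estimator may differ). All function names are hypothetical.

```python
import numpy as np

def conformal_p_values(z, rng):
    """Smoothed conformal p-values for a 1-D stream.

    Nonconformity score (an illustrative choice): distance of each point
    from the mean of all points seen so far.
    """
    p = np.empty(len(z))
    for n in range(1, len(z) + 1):
        bag = z[:n]
        alpha = np.abs(bag - bag.mean())
        a_new = alpha[-1]
        greater = np.sum(alpha > a_new)
        equal = np.sum(alpha == a_new)  # always counts the new point itself
        # Random tie-breaking makes p uniform on [0, 1] under exchangeability.
        p[n - 1] = (greater + rng.random() * equal) / n
    return p

def histogram_plug_in_martingale(p, bins=10):
    """Log-value of a test martingale that bets with a smoothed histogram of past p-values.

    Each betting density integrates to 1 over [0, 1] and depends only on
    earlier p-values, so the running product is a martingale when the
    p-values are i.i.d. uniform.
    """
    counts = np.zeros(bins)
    log_m = np.empty(len(p))
    total = 0.0
    for i, p_i in enumerate(p):
        b = min(int(p_i * bins), bins - 1)
        density = bins * (counts[b] + 1) / (counts.sum() + bins)  # Laplace-smoothed
        total += np.log(density)
        log_m[i] = total
        counts[b] += 1
    return log_m

rng = np.random.default_rng(1)
trend = np.arange(300.0) + rng.normal(size=300)  # drifting stream: not exchangeable
shuffled = rng.permutation(trend)                # same values in random order

p_trend = conformal_p_values(trend, rng)
log_m_trend = histogram_plug_in_martingale(p_trend)
log_m_shuffled = histogram_plug_in_martingale(conformal_p_values(shuffled, rng))

# By Ville's inequality, reaching M >= 100 has probability at most 1%
# under exchangeability.
print("trend flagged:", log_m_trend[-1] > np.log(100))
print("shuffled flagged:", log_m_shuffled[-1] > np.log(100))
```

A warning of the kind wished for in that bullet could, under these assumptions, fire once the log-martingale crosses log(C) for a chosen threshold C, with false-alarm probability at most 1/C on exchangeable data.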