Monday, June 30, 2014

ICML 2014 Review

ICML 2014 went well, kudos to the organizers. The location (Beijing) and overlap with CVPR definitely impacted the distribution of attendees, so the conference felt different from last year. (I also learned that my blog is blocked in China, collateral damage from some kind of spat between Google and the Chinese government.)

Deep learning was by far the most popular conference track, to the extent that the room for this track was overwhelmed and past standing room only. I missed several talks I wanted to attend because it was physically impossible to get in. This is despite the fact that many deep learning luminaries and their grad students were at CVPR. Fortunately Yoshua Bengio chose ICML and, via several talks, provided enough insight into deep learning to merit another blog post. Overall the theme is: having conquered computer vision, deep learning researchers are now turning their attention to natural language text, with some notable early successes, e.g., paragraph vector. And of course the brand is riding high, which explains some of the paper title choices, e.g., “deep boosting”. There was also a conference track titled “Neural Theory and Spectral Methods” ... interesting bedfellows!

ADMM suddenly became popular (about 18 months ago, given the latency between idea, conference submission, and presentation). By this I don't mean using ADMM for distributed optimization, although there was a bit of that. Rather, there were several papers using ADMM to solve constrained optimization problems that would otherwise be vexing. The take-home lesson is: before coming up with a customized solver for whatever constrained optimization problem confronts you, try ADMM.
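For the unfamiliar, here is a minimal sketch of the pattern (my own toy example, not from any of the ICML papers): non-negative least squares stands in for the vexing constrained problem, the constraint is split off into a second variable, and ADMM alternates a linear solve, a projection, and a dual update.

```python
# ADMM sketch: min 0.5*||Ax - b||^2 subject to x >= 0, split as
# min 0.5*||Ax - b||^2 + I_{z >= 0}(z) subject to x = z.
import numpy as np

def nnls_admm(A, b, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA = A.T @ A + rho * np.eye(n)   # form once, reuse every iteration
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))  # smooth subproblem
        z = np.maximum(0.0, x + u)                     # projection onto the constraint
        u = u + x - z                                  # scaled dual update
    return z

# Usage: recover a non-negative signal from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.maximum(0.0, rng.standard_normal(20))
b = A @ x_true + 0.01 * rng.standard_normal(100)
print(np.round(nnls_admm(A, b), 2))
```

The appeal is exactly this: the awkward constraint only ever shows up through a projection (or proximal) step, while the rest of the objective is handled by whatever solver you already trust.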

Now for the laundry list of papers (in addition to those described above):
  1. Input Warping for Bayesian Optimization of Non-stationary Functions. Hyperparameter tuning matters: if you want to get the community's attention, you have to hit the numbers, so don't bring a knife to a gunfight.
  2. Nuclear Norm Minimization via Active Subspace Selection. The inimitable Cho-Jui Hsieh has done it again, this time applying ideas from active variable methods to nuclear norm regularization.
  3. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. A significant improvement in the computational complexity required for agnostic contextual bandits.
  4. Efficient programmable learning to search. Additional improvements to the imperative approach to learning to search since NIPS. If you are doing structured prediction, especially in industrial settings where you need to put things into production, you'll want to investigate this methodology. First, it eases the burden of specifying a complicated structured prediction task. Second, it reduces the difference between training and evaluation code paths, which not only means faster deployment, but also fewer defects introduced between experiments and the production system.
  5. Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. It is good to have confirmation that quasi-random numbers can work better than i.i.d. sampling for randomized feature maps (a toy sketch of the idea appears after this list).
  6. A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data. I'll need to spend some quality time with this paper.
  7. Multiresolution Matrix Factorization. Nikos and I have had good luck learning discriminative representations using classical matrix decompositions. I'm hoping this new decomposition technique can be analogously adapted.
  8. Sample-based Approximate Regularization. I find data-dependent regularization promising (e.g., dropout on least-squares is equivalent to a scale-free L2 regularizer; the short calculation is spelled out after this list), so this paper caught my attention.
  9. Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm. No experiments in the paper, so maybe this is a “pure theory win”, but it looks interesting.
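Regarding item 5, here is a toy sketch of what quasi-random means there: the usual random Fourier feature construction for an RBF kernel, with the i.i.d. Gaussian frequencies replaced by a scrambled Halton sequence pushed through the Gaussian inverse CDF. The function and parameter names are mine, it assumes a SciPy recent enough to ship scipy.stats.qmc, and the paper itself considers more refined constructions.

```python
# Quasi-Monte Carlo random Fourier features for the RBF kernel
# k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
import numpy as np
from scipy.stats import norm, qmc

def rff_features(X, num_features=256, sigma=1.0, quasi=True, seed=0):
    n, d = X.shape
    if quasi:
        # Low-discrepancy points in [0,1]^(d+1): d dims for frequencies, 1 for phases.
        pts = qmc.Halton(d=d + 1, scramble=True, seed=seed).random(num_features)
    else:
        pts = np.random.default_rng(seed).random((num_features, d + 1))
    W = norm.ppf(pts[:, :d]) / sigma   # frequencies distributed as N(0, sigma^-2 I)
    b = 2.0 * np.pi * pts[:, d]        # phases in [0, 2*pi)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W.T + b)

# Usage: the feature map's Gram matrix should approximate the exact RBF kernel.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
Phi = rff_features(X, quasi=True)
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
print(np.abs(Phi @ Phi.T - K_exact).max())
```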
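And the parenthetical in item 8, spelled out. This is a standard calculation (my notation, not taken from the paper): apply an inverted dropout mask z to the features in a least-squares loss, where each coordinate of z is 1/(1-delta) with probability 1-delta and 0 otherwise, and take the expectation over the mask.

```latex
% E[z_j] = 1 and Var(z_j) = delta/(1-delta), with independent coordinates.
\begin{align*}
\mathbb{E}_z\big[(y - w^\top (z \odot x))^2\big]
  &= (y - w^\top x)^2 + \sum_j w_j^2 x_j^2 \,\mathrm{Var}(z_j) \\
  &= (y - w^\top x)^2 + \frac{\delta}{1-\delta} \sum_j x_j^2 w_j^2 .
\end{align*}
```

Summed over the data this is ordinary least squares plus a per-coordinate L2 penalty weighted by the squared norm of each feature column; rescaling a column rescales the corresponding weight inversely at the optimum and leaves the penalty unchanged, which is the sense in which it is scale-free.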

Monday, June 16, 2014

Microsoft starts an ML blog and an ML product

My employer, Microsoft, has started a new blog around ML and also announced a new product for ML.

The blog is exciting, as there are multiple ML luminaries at Microsoft who will hopefully contribute. Joseph Sirosh is also involved, so there will presumably be a healthy mix of application-oriented content as well.

The product is also exciting. However, if you are an ML expert already comfortable with a particular toolchain, you might wonder why the world needs this product. Those who work at large companies like Microsoft, Google, Facebook, or Yahoo are presumably aware that there is an army of engineers who maintain and improve the systems infrastructure underlying the data science (e.g., data collection, ingest, and organization; automated model retraining and deployment; monitoring and quality assurance; production experimentation). However, if you've never worked at a startup, you aren't really aware of how much work all those people are doing to enable data science. If those functions become available as part of a service offering, then an individual data scientist with a hot idea has a chance of competing with the big guys. More realistically, given my experience at startups, the individual data scientist will have a chance to determine that their hot idea is not so hot before having to invest a large amount of capital developing infrastructure :)

Of course there is a lot more that has to happen for “Machine Learning as a Service” to be fully mature, but this product announcement is a nice first step.