Thursday, August 10, 2017

ICML 2017 Thoughts

ICML 2017 has just ended. While Sydney is remote for those in Europe and North America, the conference center
is a wonderful venue (with good coffee!), and the city is a lot of fun. Everything went smoothly and the
organizers did a great job.

You can get a list of papers that I liked from my Twitter feed, so instead I'd like to discuss some broad themes
I sensed.

  • Multitask regularization to mitigate sample complexity in RL. Both in video games and in dialog, it is useful to add extra (auxiliary) tasks in order to accelerate learning.
  • Leveraging knowledge and memory. Our current models are powerful function approximators, but in NLP especially we need to go beyond "the current example" in order exhibit competence.
  • Gradient descent as inference. Whether it's inpainting with a GAN or BLUE score maximization with an RNN, gradient descent is an unreasonably good inference algorithm.
  • Careful initialization is important. I suppose traditional optimization people would say "of course", but we're starting to appreciate the importance of good initialization for deep learning. In particular, start close to linear with eigenvalues close to 1. (Balduzzi et. al. , Poole et. al.)
  • Convolutions are as good as, and faster than, recurrent models for NLP. Nice work out of Facebook on causal convolutions for seq2seq. This aligns with my personal experience: we use convolutional NLP models in production for computational performance reasons.
  • Neural networks are overparameterized. They can be made much sparser without losing accuracy (Molchanov et. al., Lobacheva et. al.).
  • maluuba had the best party. Woot!
Finally, I kept thinking the papers are all “old”. While there were lots of papers I was seeing for the first time, it nonetheless felt like the results were all dated because I've become addicted to “fresh results” on arxiv.


  1. Hi Paul,
    For your first point - i.e. multitask regularization, could you give some citations both from RL world and dialog world.

    1. Consider slide 84 of ... the primary task is story comprehension but adding the additional task of predicting the (exact) teacher response helps.

      I also just noticed slide 27 ... but I'm not familiar with that part.