Friday, May 10, 2019

ICLR 2019 Thoughts

ICLR 2019 was reminiscent of the early NeurIPS days (sans skiing): a single track of talks, vibrant poster sessions, and a large mid-day break. The Tuesday morning talks were on climate change, modeling proteins, generating music, and modeling the visual cortex. Except for climate change, these were all hot topics at NeurIPS in the late 1990s. History doesn't repeat, but it does rhyme.

My favorite talk was by Pierre-Yves Oudeyer, whose research in curiosity based learning spans both human subjects and robotics. Pierre's presentation was an entertaining tour de force of cognitive science, and I highly recommend watching the video (starts about 9 minutes 30 seconds). These ideas have extensively influenced the reinforcement learning community: the well-known Achilles' Heel of reinforcement learning is sample complexity, and recently practitioners have attacked it inspired by ideas from curiosity based learning (e.g., the Burda et. al. poster at the conference). Furthermore the view “exploration is for building world models” is reflected in recent theoretical results in contextual decision processes.

The strangest moment for me at the conference was seeing the GLUE poster. Apparently with the latency of conference review and publication, GLUE is just now being presented. Of course, it is already obsolete, so the presenters had another poster about their new dataset called SuperGLUE. Things are moving so quickly that the former “fast path” of conference proceedings is now noticeably behind.

Here's some stuff that caught my attention:
  • Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach: A couple years ago Zhang et. al. stunned the community by demonstrating convnets can fit Imagenet labels to randomly generated images, destroying the common belief that convnets generalized well due to capacity control. Here Zhou et. al. show that an MDL-style generalization bound applies, i.e., networks whose representation can be compressed after training have tighter deviation bounds. This is a (training) data-dependent bound and they seal the argument by noting networks trained on randomized training data do not compress as well.
  • Episodic Curiousity through Reachability: One of many curiosity-based exploration posters, Savinov et. al. propose a combination of a memory and something akin to a policy cover, with promising results. Also cool: the poster includes QR codes which trigger videos of agents learning to move via different algorithms.
  • Supervised Policy Update for Deep Reinforcement Learning: Vuong et. al. presented a plausible improvement over TRPO and PPO by convexifying the constrained policy optimization.