Machined Learnings: On the Sustainability of Open Industrial Research

I'm glad OpenAI exists: the more science, the better! Having said that, there was a strange happenstance at NIPS this year. OpenAI released OpenAI universe, which is their second big release of a platform for measuring and training counterfactual learning algorithms. This is the kind of behaviour you would expect from an organization which is promoting the general advancement of AI without consideration of financial gain. At the same time, Google, Facebook, and Microsoft all announced analogous platforms. Nobody blinked an eyelash at the fact that three for-profit organizations were tripping over themselves to give away basic research technologies.

A naive train of thought says that basic research is a public good, subject to the free-rider problem, and therefore will be underfunded by for-profit organizations. If you think this is a strawman position, you haven't heard of the Cisco model for innovation. When this article was written:

…Cisco has no “pure” blue-sky research organization. Rather, when Cisco invests research dollars, it has a specific product in mind. The company relies on acquisitions to take the place of pure research …

Articles like that used to worry me alot. So why (apparently) is this time different?

Factor 1: Labor Market Scarcity

Informal discussions with my colleagues generally end up at this explanation template. Specific surface forms include:

“You can't recruit the best people without good public research.” Facially, I think this statement is true, but the logic is somewhat circular. You certainly can't recruit the best researchers without good public research, but why do you want them in the first place? So is the statement more like “With good public research, you can recruit the best people, and then convince them to do some non-public research.” (?) Alot of grad students do seem to graduate and then “disappear”, so there is probably some truth to this.
“The best people want to publish: it's a perk that you are paying them.” Definitely, getting public recognition for your work is rewarding, and it makes total sense for knowledge workers to want to balance financial capital and social capital. Public displays of competence are transferable to a new gig, for instance. But this line of thought assumes that public research is a cost for employers that they chose to pay in lieu of, e.g., higher salaries.

I not only suspect this factor is only part of the picture: I strongly hope that it is only part of the picture. Because if it is the whole picture, as soon as the labor market softens, privately funded public research will experience a big pullback, which would suck.

Factor 2: Positive Externalities

This argument is: “researchers improve the productivity of those nearby such that it is worth paying them just to hang out.” In this line of thinking even a few weeks lead time on the latest ideas, plus the chance to talk in person with thought leaders in order to explain the nuances of the latest approaches, is worth their entire salary. There is some truth to this, e.g., Geoffrey Hinton performed some magic for the speech team here back in the day. The problem I have with this picture is that, in practice, it can be easier to communicate and collaborate with somebody across the planet than with somebody downstairs. It's also really hard to measure, so if I had to convince the board of directors to fund a research division based upon this, I think I would fail.

This is another favorite argument that comes up in conversation, by the way. It's funny to hear people characterize the current situation as “ we're scarce and totally awesome.” As Douglas Adams points out, there is little benefit to having a sense of perspective.

Factor 3: Quality Assurance

The idea here is basically “contributing to the public research discussion ensures the high quality of ideas within the organization.” The key word here is contributing, as the alternative strategy is something more akin to free-riding, e.g., sending employees to conferences to attend but not contribute.

There is definite value in preparing ideas for public consumption. Writing the related work section of a paper is often an enlightening experience, although honestly it tends to happen after the work has been done, rather than before. Before is more like a vague sense that there is no good solution to whatever the problem is, hopefully informed by a general sense of where the state-of-the-art is. Writing the experiment section, in my experience, is more of a mixed bag: you often need to dock with a standard metric or benchmark task that seems at best idiosyncratic and at worst unrelated to the thrust of your work and therefore forcing particular hacks to get over the finish line. (Maybe this is why everybody is investing so heavily in defining the next generation of benchmark tasks.)

The funny thing is most of the preceeding benefits occur during the preparation for publication. Plausibly, at that point, you could throw the paper away and still experience the benefits (should we call these “the arxiv benefits”?). Running the reviewer gauntlet is a way of measuring whether you are doing quality work, but it is a noisy signal. Quality peer feedback can suggest improvements and new directions, but is a scarce resource. Philanthropic organizations that want to advance science should attack this scarcity, e.g., by funding high quality dedicated reviewers or inventing a new model for peer feedback.

I don't find this factor very compelling as a rationale for funding basic research, i.e., if I were the head of a research department arguing for funding from the board of directors, I wouldn't heavily leverage this line of attack. Truth is less important than perception here, and I think the accounting department would rather test the quality of their ideas in the marketplace of products.

Factor 4: Marketing

A company can use their basic research accolades as a public display of the fitness and excellence of their products. The big players definitely make sure their research achievements are discussed in high profile publications such as the New York Times. However this mostly feels like an afterthought to me. What seems to happen is that researchers are making their choices on what to investigate, some of it ends up being newsworthy, and another part of the organization has dedicated individuals whose job it is to identify and promote newsworthy research. IBM is the big exception, e.g., Watson going after Jeopardy.

This is arguably sustainable (IBM has been at it for a while), but it creates activity that looks like big pushes around specific sensational goals, rather than distribution of basic research tools and techniques. In other words, it doesn't look like what was happening at this year's NIPS.

Factor 5: Monopolies

I find this explanation agreeable: that technology has created more natural monopolies and natural monopolies fund research, c.f., Bell Labs and Xerox PARC. All market positions are subject to disruption and erosion but Microsoft, Google, and Facebook all have large competitive moats in their respective areas (OS, search, and social), so they are currently funding public basic research. This factor predicts that as Amazon's competitive moats in retail (and cloud computing) widen, they will engage in more public basic research, something we have seen recently.

For AI (née machine learning) in particular, the key monopoly is data (which derives from customer relationships). Arguably the big tech giants would love for AI technologies to be commodities, because they would then be in the best position to exploit such technologies due to their existing customer relationships. Conversely, if a privately discovered disruptive AI technology were to emerge, it would be one of the “majors” being disrupted by a start-up. So the major companies get both benefits and insurance from a vibrant public research ecosystem around AI.

Nonetheless, a largish company with a decent defensive moat might look at the current level of public research activity and say, “hey good enough, let's free ride.” (Not explicitly, perhaps, but implicitly). Imagine you are in charge of Apple or Salesforce, what do you do? I don't see a clear “right answer”, although both companies appear to be moving in the direction of more open basic research.

Factor 6: Firms are Irrational

Tech firms are ruled by founder-emperors whose personal predilections can decide policies such as whether you can bring a dog to work. The existence of a research department with a large budget, in practice, can be similarly motivated. All the above factors are partially true but difficult to measure, so it comes down to a judgement call, and as long as a company is kicking ass deference for the founder(s) will be extreme.

If this factor is important, however, then when the company hits a rough patch, or experiences a transition at the top, things can go south quickly. There have been examples of that in the last 10 years for sure.

Machined Learnings

Saturday, December 17, 2016

On the Sustainability of Open Industrial Research