Saturday, March 24, 2012

Are We The Bad Guys?

America has experienced increasing income inequality for the past few decades, and there is lively debate in the econoblogosphere about the causes. One post by Karl Smith caught my attention:
My longer thesis is that the rising return to unskilled labor is a function of industrialization and that industrialization is unique in this. The wage rate on unskilled labor never benefited before and its not immediately clear that it will ever benefit again.

This is because rents always accrue to the scarce factors of production. Industrialization meant that the only thing we were short on were “control systems” everything else in the production process was effectively cheap.

However, any mentally healthy human being is a decent control system. So, this meant huge returns to being a human.
If this theory is correct, it indicates anybody working in Artificial Intelligence and related fields is contributing to income inequality. Doh!

Karl goes on to say
You need there to be a shortage of something that human beings have a comparative advantage at simply by being human beings.
Mechanical Turk shows that people still have the ability to trade their inherent excellent perceptual capabilities. Identifying obscenity and tagging images for \$2.00 an hour may not sound like your idea of the good life, but those who subsist on landfills in Nicaragua would presumably consider it an improvement. It would be great if it were feasible to connect the world's poorest to Mechanical Turk to improve their welfare.

Any charity with such ambitions needs to hurry, however. Within a decade or two we will have cracked all the problems that are commonly encountered on Mechanical Turk today, closing this window of development opportunity.

Monday, March 5, 2012

PU Learning and AUC

At my new job the first major problem I'm dealing with is characterized by a prevalence of positive labeled data, no negative labeled data, and abundant unlabeled data; also known as p-u learning. Despite the ``stinky'' moniker, it is possible to make progress under these conditions. Because this is a common predicament there is extensive treatment in the literature: researchers have advocated a variety of different approaches and associated statistical assumptions and it's been tricky for me to distill best practices. Fortunately I noticed a gem of a result due to Zhang and Lee which is proving useful.

The setup is extremely natural: assume features $x$ and (binary) labels $y$ are distributed jointly via $D = D_x \times D_{y|x} = D_y \times D_{x|y}$; assume you have access to samples from the distribution $D_{x|1}$ of features $x$ given a positive label $y = 1$, i.e., positive labeled examples; and assume that you have to access to samples from the unconditional distribution of features $D_x$, i.e., unlabeled examples. Note that you do not have access to samples from the distribution $D_{x|0}$, i.e., you do not have any negative labeled examples.

It turns out if AUC is the objective function you can optimize directly on the positive and unlabeled data. This is demonstrated by relating AUC on the p-u dataset, \[
\mathop{\mathrm{PUAUC}} (\Phi) &= \mathbb{E}_{(x_+, x_-) \sim D_{x|1} \times D_x}\left[ 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right],
\] to the standard AUC computed using the (inaccessible) negative labeled examples, \[
\mathop{\mathrm{AUC}} (\Phi) &= \mathbb{E}_{(x_+, x_-) \sim D_{x|1} \times D_{x|0}}\left[ 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right].
\] In particular, \[
&\mathop{\mathrm{AUC}} (\Phi) \\
&= \mathbb{E}_{(x_+, x_-) \sim D_{x|1} \times D_{x|0}}\left[ 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right] \\
&= \frac{\mathbb{E}_{(x_+,(x_-,y)) \sim D_{x|1} \times D}\left[ 1_{y=0} \left( 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right) \right] }{\mathbb{E}_{(x,y) \sim D}\left[ 1_{y=0} \right]} \\
&= \frac{\mathop{\mathrm{PUAUC}} (\Phi) - \mathbb{E}_{(x_+,(x_-,y)) \sim D_{x|1} \times D}\left[ 1_{y=1} \left( 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right) \right]}{\mathbb{E}_{(x,y) \sim D}\left[ 1_{y=0} \right]} \\
&= \frac{\mathop{\mathrm{PUAUC}} (\Phi) - \mathbb{E}_{(x, y) \sim D} \left[ 1_{y = 1} \right] \mathbb{E}_{(x_+,x_-) \sim D_{x|1} \times D_{x|1}}\left[ \left( 1_{\Phi (x_+) > \Phi (x_-)} + \frac{1}{2} 1_{\Phi (x_+) = \Phi (x_-)} \right) \right]}{\mathbb{E}_{(x,y) \sim D}\left[ 1_{y=0} \right]} \\
&= \frac{\mathop{\mathrm{PUAUC}} (\Phi) - \frac{1}{2} \mathbb{E}_{(x, y) \sim D} \left[ 1_{y = 1} \right]}{\mathbb{E}_{(x,y) \sim D}\left[ 1_{y=0} \right]} \\
&= \frac{\mathop{\mathrm{PUAUC}} (\Phi) - \frac{1}{2}}{\mathbb{E}_{(x,y) \sim D}\left[ 1_{y=0} \right]} + \frac{1}{2},
\] which they write in the paper as \[
\mathop{\mathrm{AUC}} (\Phi) - \frac{1}{2} \propto \mathop{\mathrm{PUAUC}} (\Phi) - \frac{1}{2}.
\] This results in the following extremely simple procedure: treat unlabeled data as negative data and optimize for AUC.