Thursday, June 7, 2012

Stealth No More!

The startup I'm currently at publicly launched today. It's a social image sharing site called LoveIt. This is a crowded space at the moment, but we've tried to throw in some innovative new features. One machine-learning-related bit that I worked on is the recommendation system; here's an example screenshot with the recommendations on the bottom right-hand side.

The image is for a mashup by DJ Earworm (who is totally awesome!). In this case the second recommendation is a music collection, which is very sensible, but the first recommendation is more questionable (focusing on the costume ball aspect). Hopefully the system will get better as we generate more behavioral data exhaust. I have noticed image recommendation is more forgiving than text recommendation: images have less precise meaning, so people are more willing to invent a reason why a quirky recommendation makes sense.

Conceptually the system is heavily Elkan-inspired. The implementation is a combination of Elasticsearch and Vowpal Wabbit, strung together with Erlang. The tricky part is getting it to compute something quickly (in roughly 100 ms), and both Elasticsearch and Vowpal Wabbit are excellent pieces of software in this regard!
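For readers who want a feel for the general shape of such a pipeline, here is a toy sketch in Python of retrieve-then-rerank under a latency budget. To be clear, this is not the LoveIt code: the in-memory tag index merely stands in for a search engine, the hand-set pair weights stand in for a learned linear model, and every name here (`build_index`, `score`, `recommend`, `ITEMS`, `WEIGHTS`) is made up for illustration.

```python
import time
from collections import defaultdict

def build_index(items):
    """Map each tag to the ids of items carrying it (toy stand-in for a search index)."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag].add(item_id)
    return index

def score(query_tags, item_tags, weights):
    """Linear score over (query tag, item tag) pair features; unseen pairs count as 0."""
    return sum(weights.get((q, t), 0.0) for q in query_tags for t in item_tags)

def recommend(query_id, items, index, weights, k=2, budget_s=0.1):
    """Return up to k item ids related to query_id, scoring only until the budget expires."""
    deadline = time.monotonic() + budget_s
    query_tags = items[query_id]

    # Stage 1: cheap candidate retrieval (anything sharing at least one tag).
    candidates = set()
    for tag in query_tags:
        candidates |= index.get(tag, set())
    candidates.discard(query_id)

    # Stage 2: rerank candidates with the model, stopping when time runs out.
    scored = []
    for item_id in sorted(candidates):
        if time.monotonic() > deadline:
            break
        scored.append((score(query_tags, items[item_id], weights), item_id))

    # Highest score first; ties broken by item id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]

# Hypothetical data, loosely echoing the mashup example above.
ITEMS = {
    "mashup": {"music", "costume"},
    "playlists": {"music"},
    "masquerade": {"costume"},
    "kittens": {"cats"},
}
WEIGHTS = {("music", "music"): 1.0, ("costume", "costume"): 0.4}

print(recommend("mashup", ITEMS, build_index(ITEMS), WEIGHTS))
```

The point of the two stages is that the expensive scorer only ever sees a short candidate list, and the deadline check means a slow request returns whatever has been scored so far instead of blowing the budget.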

The Bigger Picture

When I first started on the internet, the most common demand for machine learning I encountered was for optimizing performance marketing (the other big one would have been algorithmic search, but southern California wasn't a major player in that space). Nowadays there are many big smart companies focused on the science of advertising. In my opinion, if you have some machine learning acumen and some plucky post-series-A startup claiming to revolutionize internet advertising with a new algorithm attempts to recruit you, run the other way! There are probably still many smaller exits to be had in this space selling to the major ad networks, but unless you have a large equity share it won't change your life.

Fortunately there is a new nexus of ubiquitous machine learning need: content recommendation, personalization, summarization, and visualization. This is driven by the intersection of several trends, including the rise in user-generated content, social networks, and smartphones. For example, Twitter has turned everybody into an intelligence analyst lost in a sea of intercepts. Technologies that can scan all of Twitter and surface the (personalized) good stuff in real-time would be very interesting. Furthermore, as Google has proven, if you position yourself as a trusted discovery tool for users you can easily monetize. Thus if you get a recruiting call from a startup claiming to attack such problems, my advice is to seriously consider it.
