I attended a few sessions of this conference here in St. Louis, organized by Alex Miller. Miller has the uncanny ability to find the geekiest person at major organizations and convince them to come to town and talk about some of the really big issues that they are dealing with in their code. The conference started off with Hilary Mason, a computer scientist and mathematician working for bit.ly. Did you know that you can get all sorts of analytics for any shortened bit.ly URL just by appending a plus sign to the end of it? Yup. She spoke about machine learning, and about understanding and predicting behavior from large data sets. For example, while World Cup games were being played, they observed all sorts of traffic coming from the countries that were competing. As soon as a game was over, the losing country’s traffic dropped to nothing. Obvious, but interesting. She also gave one of the best illustrations of Bayesian probability analysis that I have seen this side of grad school (and that has been a very long time for me).
I got to hear from Eben Hewitt, who wrote the O’Reilly book on Cassandra, an open source database project that is part of Apache and the current favorite of the large data set folks. He spoke about the really big data guys and how we have to talk in petabytes: WalMart’s customer database is half a PB, and Google processes 24 PB each day. The data that was assembled to make the movie Avatar was around a PB.
Finally, there was Brian Sletten, an independent consultant based in LA, talking about new Web technologies. He mentioned the Powerhouse Museum in Sydney, which is doing some interesting things with Web services. Now how cool is that? I can feed my museum addiction by going to a geeky conference.
You should put this on your radar for next year. This is very high signal, almost no noise. Some of the speakers could use some polishing, but the raw data is excellent.