Speech: Big Data Applied in the Real World

The use of Big Data is growing rapidly as companies seek to use its power to help improve business efficiency and look for new business opportunities. In this presentation for the local chapter of the Data Warehousing Institute, I will discuss how big data is being applied across a variety of industries, including even farm tractor maker John Deere, illustrated above.

You can download my slide deck here.


ITworld: A/B tests: Cut the fluff and spend the pixels on what works

A/B testing is like many things that can be vexing about the Web: a simple concept can turn into a complex programming project. The idea is simple: produce two (or more) different web pages for your site and instrument them to see which one drives more traffic or more sales. But getting it to work can be fraught with politics and the actual implementation details.

Why bother? Mainly because there is almost nothing else that you can do that can have such a big effect. Just by changing the text size or button color you can generate a 50% increase in clickthrough rates.

You can read more about A/B tests in this article for ITworld and also view an accompanying slideshow that illustrates how to improve your own Web pages with four interesting examples, such as the one above showing three different versions of the Sur La Table website.
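As a rough illustration of the "instrument them and see which one wins" step, here is a minimal Python sketch that compares two page variants. The visitor and click counts are made up for illustration, and the two-proportion z-test is just one common way to check whether a difference is more than noise; it is not taken from the ITworld article.

```python
import math

def ab_lift_and_pvalue(visitors_a, clicks_a, visitors_b, clicks_b):
    """Compare variant B against variant A.

    Returns the relative lift in clickthrough rate, the z statistic of a
    two-proportion z-test, and the two-sided p-value.
    """
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    lift = (rate_b - rate_a) / rate_a

    # Pooled rate under the assumption that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err

    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, z, p_value

# Hypothetical counts: 2% clickthrough on page A, 3% on page B.
lift, z, p = ab_lift_and_pvalue(10_000, 200, 10_000, 300)
print(f"relative lift: {lift:.0%}, z = {z:.2f}, p = {p:.2g}")
```

With these invented numbers the lift works out to 50%, the kind of jump mentioned above. On a real site you would want enough visitors in each variant before trusting the result.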

Your next supercomputer is just a click away in the cloud


This week at the Amazon developer’s conference in Vegas we got to see the latest supercomputer. I have been fascinated with this genre for many years. In the olden days, these were mammoth beasts, occupying rooms full of gear and burning up electricity like crazy. They cost millions of dollars, and had minions of folks tending to their care and feeding.

Back in 2004, I was fortunate enough to go out to San Francisco where a bunch of random folks were trying to assemble the first “flash mob” computer. You brought your own laptop (or desktop if you were strong enough) and left it for the weekend while they hooked it up to a switching fabric and tried to get every PC in sync.

Both the flash mob and the traditional supercomputer are old style. Today we have supercomputers in the cloud. With a few mouse clicks, you could be running on thousands of virtual cores. It was bound to happen, and this week the folks at CycleComputing showed what they were doing. I have to say I was impressed.

In one case, they managed to put together 30,000 cores for a big pharma company to do molecular modeling, at a cost of less than $1300 an hour. For the Amazon show, they had virtual machines running across the globe on all eight of Amazon’s data centers. They were able to provision 150,000 cores to run 264 years of compute time in less than a day’s actual elapsed time for a materials modeling application. Wow!

CycleComputing worked with Amazon to set this all up, so they could get all their virtual machines running in about the same time frame. If you had to create this computer in the real world, it would cost $68 million. Cycle had a bill from Amazon of $30,000. While that is a lot of money, for the horsepower that they put together it really isn’t. I remember when some high-end PC servers cost that much for a single core not too long ago.
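The numbers above hang together, as a quick back-of-the-envelope check in Python shows. This is only a sketch: it assumes "264 years of compute time" means core-years, that the work splits perfectly across cores, and that a year is 365.25 days. None of those assumptions come from CycleComputing.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length

def ideal_elapsed_hours(compute_years, cores):
    """Wall-clock hours if the work divides evenly across all cores."""
    return compute_years * HOURS_PER_YEAR / cores

# 264 core-years spread across 150,000 cores.
elapsed = ideal_elapsed_hours(264, 150_000)

# Upper bound on the price per core-hour for the 30,000-core pharma run.
cost_per_core_hour = 1300 / 30_000

# Quoted cost of building the hardware versus the Amazon bill.
savings_ratio = 68_000_000 / 30_000

print(f"ideal elapsed time: {elapsed:.1f} hours")
print(f"cost per core-hour: under ${cost_per_core_hour:.3f}")
print(f"build vs. rent: about {savings_ratio:,.0f} to 1")
```

Under those assumptions the elapsed time comes out well inside a single day, which matches the claim, and renting costs a tiny fraction of building.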

Think about that for a moment: in the past, you couldn’t get all this hardware set up in a matter of months, let alone moments. Most supercomputers take years to build, and then they are almost obsolete, because someone else is building a bigger one. On the Top500.org list of the biggest ones, the current champ is a Chinese computer with more than three million cores. Just on cores alone, the CycleComputing assemblage would rank in the top 20 on this list.

Pretty amazing. Silicon Angle has this video interview from the show floor with Jason Stowe of the company.

If you thought that the cloud was just a passing fad, this should convince you otherwise.

ITworld: 6 mapping trends from Techonomy13

If you haven’t yet gotten into mapping your data, now might be a good time to take a closer look at the technologies available. While maps have been around for thousands of years, the digital kind are a more recent innovation and more of a communications language, used to visually display content and get context. Plus, they are universally recognized.

In this article for ITworld and the accompanying slideshow of maps, I talk about six trends that businesses can capitalize on by using these tools.

Website Lessons Learned from Williams-Sonoma

Is your website as classy as your brand?

For Williams-Sonoma the goal is to match great looking Web pages with top-shelf analytics to keep track of customers. I sat in on a talk they gave at the recent Teradata Partners annual conference in Dallas. By developing a website that elegantly weaves together design and analytics, good things happen for both company and customer.

You can read the full post in today’s Mozy blog here.

Does your relationship have enough diversity?

The old saying that opposites attract now has some mathematical validity, at least according to a new research pre-print made available this week. The paper, by Lars Backstrom of Facebook and Jon Kleinberg of Cornell University, looks at Facebook friend network data. The authors attempt to predict a member’s spouse or romantic partner from the shape and density of their network. They built some algorithms and did so successfully.

To aid in their model, they invent the concept of network dispersion to characterize your friend network. Their measure of dispersion works differently from what you might initially assume: the more, the better. Put another way, the more diverse the collection of friends that you and your spouse share, the more powerful the bond is between the two of you and the more likely it is your marriage (or relationship) will last longer. The two authors also did time-series analysis, looking at how your friend networks change and relating that to the changing relationship status that you post publicly to Facebook.
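To make the dispersion idea a bit more concrete, here is a simplified Python sketch. It counts pairs of mutual friends who are not themselves friends with each other, so a higher score means the friends you share with someone come from more separate circles. This is a stand-in for illustration, not the authors' exact measure (the paper's definition may include further conditions), and the tiny friend graph and the `simple_dispersion` name are invented.

```python
from itertools import combinations

def simple_dispersion(friends, u, v):
    """Count pairs of mutual friends of u and v who are not directly linked.

    `friends` maps each person to the set of people they are friends with.
    Simplified illustration only, not the paper's exact definition.
    """
    mutual = (friends[u] & friends[v]) - {u, v}
    return sum(
        1
        for s, t in combinations(sorted(mutual), 2)
        if t not in friends[s]
    )

# Hypothetical friend graph (every friendship is listed in both directions).
friends = {
    "ann": {"bob", "cat", "dan", "eve", "fay"},
    "bob": {"ann", "cat", "dan", "eve"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "bob"},
    "eve": {"ann", "bob", "fay"},
    "fay": {"ann", "eve"},
}

# Rank ann's friends by how dispersed the shared friends are.
scores = {v: simple_dispersion(friends, "ann", v) for v in friends["ann"]}
for name, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(name, score)
```

In this toy graph, bob scores highest for ann because the three friends they share do not know each other, which is the pattern the paper associates with a spouse or partner.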

Another interesting aspect of the paper is that the researchers used two different datasets: one was a subset of the other and focused on larger and denser friend networks. Both sets used randomized and anonymous data from across a large swath of Facebook members. The results were very similar, showing that the effect of diversity doesn’t depend on how big your network is.

It is also heartening to see that people who are “connectors” (as Brad Feld uses the term) play important roles in one’s friend network. We connectors (and I count myself among the group) are the ones who create the network path diversity, and help to stitch together the smaller sub-groups that boost the dispersion factor. This makes sense to me.

There are many other efforts involved in mapping relationships, including one that Dean Collins developed many years ago, but it is no longer available.

One that you can access quickly and for free is LinkedIn Inmaps here. When I first heard about this program several years ago I was fascinated with my own map, but sadly, they have tuned the software to only look at smaller networks than mine. Perhaps your account will be able to work with their software.

So there you have it: not only do opposites attract, but they stay attracted longer. If you have any links to other relationship mapping or visualizations, share them here in the comments.

Smartbear: How Riot Games conquered Hadoop, seriously

Life at a gaming company isn’t always fun-and-games. It’s also a demanding IT environment with a huge amount of data to manage. Using various Hadoop open source tools (including Honu, see the diagram at right), the gaming company behind League of Legends supported hypergrowth and delivered more timely analytics. I spent some time with them at the StampedeCon conference to learn more about how they pulled this off.

My article in Smartbear’s blog can be found here.

Come See the Software Side of Sears

Thanks to Apache Hadoop and other data-analytics technologies, the international retailer Sears has managed to not only transform its IT operations, but also decommission some of its mainframe computers. The company has been so successful with this project that it has spun off the group responsible into a separate company that is now selling its services to others. Call this one of the bigger proof-points of using Hadoop in the enterprise.

Read the rest of my article on Sears at StampedeCon here on Slashdot and see the software side of Sears.

Modern Infrastructure: Hyperscale data center means different hardware needs

Remember when data centers had separate racks, staffs and management tools for servers, storage, routers and other networking infrastructure? Those days seem like a fond memory in today’s hyperscale data center. That setup worked well when applications were relatively separate or they made use of local server resources such as RAM and disk and had few reasons to connect to the Internet.

I describe the new needs of the modern hyperscale data center in an article for Modern Infrastructure Magazine here.

Solution Providers for Retail: GPS in Retail Stores Helps Convert Browsers to Buyers

In another post on geofencing, I mentioned efforts by a number of retailers to make use of location-based information. There is another perspective on location, that of the provider of the geospatial databases that drive many of the location-aware mobile apps that are being developed. Redlands, Calif.-based ESRI has been one of the leaders in this space. I spoke to two of their key managers about how they work with developers and how the business is changing.

You can read my post today on the SPfR blog here.