Data daze: Big Data demystified

Big Data is coming, heralded by a bit of noisy hype - but many experts think it is hype that will be fully realised very shortly. Accel Partners, a global venture capital firm, is betting on it - the firm has established a $100 million Big Data fund.

"Big Data is one of the biggest transformational changes in the data centre and IT landscape," Ping Li, an Accel partner, recently told a conference in Silicon Valley.

Big Data is more than simply a way of handling a lot of data - it is also about handling a lot of data in new ways quickly, and making links between sets of unstructured data.

An example of Big Data was supplied by Steve Hillion, chief product officer at Alpine Data Labs in San Mateo, who said that Salt Lake City-based Zions Bank found some unusual patterns among its high-end customers by using Big Data analytics.

The data showed a small cluster of customers who owned small businesses and generated a lot of value. The individuals within the group showed similar behaviour, but the group itself was so small that the individuals wouldn't have appeared in a structured, aggregated view where some of the data would have been truncated or discarded. In a Big Data environment, where the information remains in raw form rather than being aggregated, the bank was able to discern some faint patterns, and once the customers were identified it could target them with small business products.

The ability to use this additional data offers the opportunity to make a lot of money by micro-targeting groups that might not otherwise have been identified, Hillion concluded. Leading users of Big Data are trying to develop inventive ways of stratifying consumers so they can target them much more narrowly than they do now. That requires what he called unsupervised learning, where the user doesn't know a model or key variable in advance but waits to learn it from the data.
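To make the idea of unsupervised learning a little more concrete, here is a minimal sketch using k-means clustering, one common technique of this kind. It is not a description of what Alpine Data Labs or Zions Bank actually ran: the `kmeans` function, the `customers` list and every number in it are invented purely for illustration. The point is only that no one tells the program which customers belong together; the grouping is learned from the raw records.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: group points without any labels supplied in advance."""
    # Deterministic start (an assumption for repeatability):
    # use the first k points as the initial cluster centres.
    centres = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters

# Hypothetical raw records: (transactions per month, account value in $000s).
# The last three rows stand in for a small, high-value group of customers.
customers = [
    (5, 10), (6, 12), (4, 9), (5, 11), (6, 10), (4, 12), (5, 9), (6, 11),
    (40, 90), (42, 95), (41, 88),
]

centres, clusters = kmeans(customers, k=2)
small_group = min(clusters, key=len)
print("cluster sizes:", sorted(len(c) for c in clusters))
print("smallest cluster:", sorted(small_group))
```

In an averaged, per-segment report the three unusual customers would be swamped by the other eight; working on the raw rows lets them surface as a group of their own.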

Keith Collins, vice president and chief technology officer at SAS, said Big Data can be useful in fighting fraud, which has increased sharply in recent years. SAS turns the links in social analysis upside down to spot patterns and anomalies, sometimes running over a long time, that can indicate fraudsters working together.

Randy Lea, vice president of the Aster Data Center of Innovation at data specialist Teradata, said marketers need to know what a customer has been doing over time and across channels. Most know what a customer has been doing on their website, but how did that customer get there - was it an email marketing campaign or a click-through on a search engine ad? A cookie can identify a returning visitor, and if that visitor clicks on a newsletter, the marketing department gets an email address and perhaps Facebook or Twitter links. A week or two later, when the visitor returns to browse some more, the bank can present a targeted offer.

This is not, in Lea's opinion, Big Data but simple application of the tools Teradata offers.

While there is debate over whether we are talking about Big Data in the sense that a computer scientist might think of the topic, data volumes are growing rapidly, generated by social networking, mobile internet, geographic information and sensors.

McKinsey Global Institute, in a widely cited paper from last May, estimated that by 2009 nearly all sectors in the US economy with companies of more than 1,000 employees held an average of 200 terabytes of data, twice the size of Wal-Mart's data warehouse in 1999 - which at the time was considered one of the seven wonders of the IT world.

The definition of Big Data is fairly loose: one general definition says it is data that can't be processed in memory and must be distributed and processed in parallel.

Sceptics say that this is still too broad. Rasmus Wegener, a partner in the IT practice at management consultancy Bain, defines Big Data as a significant amount of structured and unstructured data that needs to be analysed at the same time, requiring complex, very fast computing operations. By his reckoning, finance, electric grids and a few internet companies such as Yahoo and Google, which analyse users in real time so they can deliver targeted adverts, are dealing with Big Data. Computing massive amounts of complex data to deliver a credit decision within seconds is Big Data - delivering the results overnight is just Large Data that users can run through Oracle, Teradata or SAS. Indeed, a Teradata consultant said that more than 25 of the firm's clients were running over a petabyte on Teradata systems.

Understanding the difference can be worth millions. A cruise ship line that Bain worked with wanted to improve its pricing on 300 trips a week with about 3,000 passengers per ship. It must be Big Data, said the business side, tossing it to IT; IT proposed using the Apache Hadoop software library, which is designed to allow distributed processing of large data sets across clusters of machines, and came up with a project budget of $1 million. Bain recommended the firm simply concentrate several of its scattered business analysts into a team, hire a director, and work on pricing. Adjusting the prices for rear-facing cabins generated $5 million in new revenue in the first year, without Hadoop.

Whatever their differences on the definition of Big Data, experts agree on the value of multi-disciplinary teams that combine expertise in large-scale mathematics, statistics and computer science with deep knowledge of the business.

Anand Rajaraman, senior vice president at Wal-Mart Global e-Commerce and co-founder of @WalmartLabs, said Big Data doesn't just break through hardware and software, it also breaks down traditional analytics. Many people know how to work with data, but that doesn't mean they are ready to work with Big Data. "The tools are very different. Many of the fundamental algorithms for predictive analytics depend crucially on keeping the data in main memory with a single CPU to access it. Big Data breaks that condition. The data can't all be in memory at the same time, so it needs to be processed in a distributed fashion. That requires a new programming model," he said.
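The shift Rajaraman describes can be sketched in a few lines. The example below is only an illustration under stated assumptions, not any vendor's API: the function names (`partition`, `map_partial`, `combine`, `distributed_mean`) are hypothetical, and the "partitions" are ordinary Python lists standing in for data held on separate machines. Each partition is summarised on its own, and only the small summaries are combined, so no single step needs the whole data set in memory.

```python
from functools import reduce

def partition(records, n_parts):
    """Split records into n_parts slices, standing in for data on separate machines."""
    return [records[i::n_parts] for i in range(n_parts)]

def map_partial(chunk):
    """Work done locally on one partition: return a small (sum, count) summary."""
    return (sum(chunk), len(chunk))

def combine(a, b):
    """Merge two partial summaries; order does not matter, so merging can be parallel."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(partitions):
    """Compute a mean from per-partition summaries rather than from all the raw data."""
    total, count = reduce(combine, map(map_partial, partitions), (0, 0))
    return total / count

# Made-up values purely for illustration.
values = list(range(1, 101))
parts = partition(values, 4)
mean = distributed_mean(parts)
print(mean)
```

A mean is easy to split up this way because sums and counts add together cleanly. Many predictive algorithms are harder to decompose, which is the practical reason a different programming model is needed.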

Whether that requires someone with a degree in the new field of data science or merely teaching the best business analysts in the company how to use the newest tools remains, predictably, a subject of debate.

One ray of hope in all of this is that Alpine's Hillion thinks the new tools arriving over the next few years will let ordinary business people work with Big Data, even if they aren't statisticians. "We need to demystify this a bit, remove the cloud of obfuscation and make Big Data available to people," he said.

May 2012
