Data, captured

Data warehouses are about to explode, with new data types and huge volumes - but let's not forget what they are actually for.

There seems to come a point in any technology's life cycle when it moves from being a tightly-defined academic description to being the equivalent of go-faster stripes on a cheap production car.

Complex Event Processing is a current example, but one that has gone round the loop a few times is data warehousing, and it is now linking with another buzz phrase, Business Intelligence.

The concept of data warehouses came out of IBM and was intended to provide an architectural model for the flow of data from operational systems to decision support environments.  There were two main issues - the cost, and the amount of redundancy of information that existed.

As complexity has escalated and the cost of storage has continued to fall, those original concerns have taken a back seat to the issues around dealing with the data, and that is where the business intelligence angle comes in.

While there are plenty of conversations going on about the best architectural approach in building data warehouses, the attraction for business is more about getting the information out of them.

The ability to analyse petabytes of data in near real time is "not just about competitive advantage, but about survival," according to Mike Koehler, president and chief executive of data warehousing specialist Teradata.

Koehler was speaking at the company's annual user conference, held in Las Vegas last year, where it made a series of product and services announcements that it claims will make petabyte-scale analytics affordable for an increasing number of businesses.

Among the announcements was a new version of its primary database, featuring a virtual storage capability that automatically manages data, putting 'hot', frequently accessed data on the fastest storage devices and migrating 'colder' data to the slowest media without user intervention.
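
The hot and cold placement described above can be illustrated with a minimal sketch. It is not Teradata's implementation: the tier names, the `Block` record and the `assign_tiers` function are all hypothetical, and it assumes that a simple access count per measurement window is enough to rank data by 'temperature'.

```python
from dataclasses import dataclass

# Hypothetical tier names, ordered from fastest to slowest media.
TIERS = ["ssd", "fast_disk", "slow_disk"]


@dataclass
class Block:
    block_id: str
    access_count: int  # accesses seen in the current measurement window


def assign_tiers(blocks, capacity_per_tier):
    """Place the most frequently accessed blocks on the fastest tiers.

    capacity_per_tier maps each bounded tier name to the number of blocks
    it can hold; the slowest tier is treated as unbounded.
    """
    # Hottest first; ties are broken by block_id so the result is deterministic.
    ranked = sorted(blocks, key=lambda b: (-b.access_count, b.block_id))
    placement = {}
    position = 0
    for tier in TIERS[:-1]:
        capacity = capacity_per_tier[tier]
        for block in ranked[position:position + capacity]:
            placement[block.block_id] = tier
        position += capacity
    # Everything left over is 'cold' and goes to the slowest tier.
    for block in ranked[position:]:
        placement[block.block_id] = TIERS[-1]
    return placement
```

Re-running `assign_tiers` at the end of each measurement window would move blocks between tiers as their access patterns change, which is the "without user intervention" part of the idea.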

This is a feature that will become more important as solid state disk drives become widely available. The company was demonstrating a working prototype of an SSD-based data warehouse. As well as providing faster access speeds than traditional disks, these also offer the promise of a 50% reduction in power use.

Marking the emergence of petabyte-scale warehousing, the company announced five customers that have data warehousing capabilities greater than 1 petabyte. Bank of America and another financial services company are among the five, led by eBay, which has 5 petabytes, and WalMart with 2.5 petabytes. Dell is in fifth place among the "Petabyte Power Players" with 1 petabyte.

Andrew Bond, technology director for database, business intelligence and data warehousing at Oracle UK, says that there is "a lot of confusion" in the market, some of it deliberately put about by supporters of one approach or another, but it all comes down to the data. "You have the data, which is the information management part, and you have the information access," he says. "Together they equal business intelligence technologies."

Conceptually separating the two is important, he says, because the granularity of the stored data is preserved, and can be retrieved later, either for something like dispute resolution or for some novel form of analysis.

The underlying structure is less important than understanding what the information is for, says Bond. "I don't think that most people in an organisation need to know that they are 'consuming business intelligence'. They just want to make informed decisions supported by derived information."

Colin Rickard, managing director EMEA at data management specialist DataFlux, also cautions against becoming bogged down in the technical issues. "A data warehouse is not in itself a business benefit, and that's not a reason for a bank to build one," he says. "They should build one to better understand risks, to improve the business processes, key performance indicators and margins - unless it does one or more of those things there is no reason."

Rickard also says that the notion of a giant repository of data that the phrase conjures up is false. "Any fool can build a huge database and build reports from it, but no bank has the Mother of All Data warehouses; typically you will have several," he says. "The single giant database would involve lots of hardware and plays well for vendors in that space."

The thing to remember, he says, is the separation of the data and the analytics. "IT is responsible for the bits and bytes, and the people responsible for the content are the business users, so they have to work together. There is a responsibility incumbent on IT to make the business users understand the limitations of the technology."

One of the limitations is as old as the computer industry itself - the garbage in, garbage out maxim.

"Data quality is still a massive issue," says Bond. "It takes time and there is no alternative." Bringing data from operational systems into the warehouse does help as part of this, but "you still need to trust the data", he says.

Rickard says that in financial services firms the nature of the data is crucial. "There is a lot of constantly changing data, and that means a constantly changing risk picture," he says. "Now we are starting to work on data governance."

This is also being pushed by the constant need for regulatory compliance, but Rickard says that there is still scope to use the systems simply to improve management of the business. "It's a less popular discussion these days, but there are a whole set of business processes that could be brought into the flow - people have been talking about the single customer view for as long as I can remember, for instance."

A further complication that is both exciting and challenging is to apply the analytic techniques that have been applied to traditional forms of data to new and emerging forms to create ever richer understandings of the market.

"There is a lot of capability around unstructured data," says Bond. "The approach is to first secure it in the warehouse, and then to combine it with structured data to derive greater value from it using text mining and other techniques - and that goes back to supporting people making informed decisions."

One potentially enormous area for new forms of data comes from geographic information systems, says Steve Brobst, chief technology officer at Teradata. He told the recent conference that there will be an explosion in this kind of information as mobile technologies continue to be deployed and the application of RFID spreads beyond the logistics industry.

These will allow the creation of new applications, such as using location data to make security or anti-fraud decisions in real time - for instance, if a card is being used at a petrol filling station, the system can compare the location against the card's history. Most people use five or fewer filling stations on a regular basis.
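
A rough sketch of the filling station check might look like the following. The function names and the plain list of station identifiers are assumptions made for illustration; the default of five simply reflects the rule of thumb quoted above, and a production system would combine far more signals than this.

```python
from collections import Counter


def usual_stations(history, top_n=5):
    """Return the top_n most frequently used station ids in a card's history.

    history is a hypothetical list of station ids, one per past transaction.
    """
    counts = Counter(history)
    return {station for station, _ in counts.most_common(top_n)}


def needs_review(history, station_id, top_n=5):
    """Flag a transaction at a station outside the card's usual set."""
    return station_id not in usual_stations(history, top_n)
```

A flagged transaction would not necessarily be declined; it is more plausibly one input among several to a real-time decision.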

Taking this further will see the kind of dynamic customer recommendations that users of Amazon or eBay are already familiar with, but on a ubiquitous level - with participating stores able to create on-the-fly offers based on previous purchasing histories, or meteorological data, as the consumer moves around in the physical world. How consumers take to what could be a rather intrusive experience is a whole different question, but the world of what Brobst calls "pervasive business intelligence" is already opening up.

Back in the world as it currently exists, Rickard sees two clear needs for data management in banking. The first is the recent announcement from the UK's Financial Services Authority that banks must invest in systems to ensure they can rapidly declare their exposures to customers in the event of a failure.

The second is the related phenomenon of the enforced mergers that are happening across the industry. "What's different about the current spate of mergers is that they can't run the systems as standalone," he says. "They have to drive value and that means consolidation of people, process and systems. There are going to be challenges."

CASE STUDY: GE and enterprise risk

by Tom Groenfeldt

Decentralising a conglomerate like GE Commercial Finance and controlling through P&L can be a fine way to encourage local initiative and avoid micromanagement from headquarters, but it does make enterprise risk management a challenge.

At the most recent Teradata Partners conference, Chris Watkins, senior solution architect for GE Commercial Finance, described the challenges of consolidating measures of risk in an enterprise data warehouse. The commercial finance unit of GE is one of the largest non-banks in the world with diversified P&Ls that reflect its huge array of businesses - from lending to dentists and providing leases on copiers to financing a corporate jet or underwriting project finance on power plants.

Building a view of risk across the enterprise starts with AAA data - accuracy, availability and accessibility, he said. And obtaining value from data stored in a data warehouse requires a common data model, common definitions, and governance on data quality and auditability. Business units wanted to maintain their own version of the truth, but for the data warehouse to work for risk management, headquarters had to insist on a single standard which can provide the basis for audits for Sarbanes-Oxley compliance, Basel II, and internal auditors.

With a culture of business unit autonomy, developing a single standard required senior executive support. "We needed to make clear that this was important to the company as a whole and there have to be consequences for those who don't comply," he said. "An individual business may have to compromise on how they look at their data to support the enterprise." Tracking progress through a project management tool which flagged laggards in red helped pull along reluctant unit leaders, he added.

The project leaders also worked to sell business units on the value of a data warehouse. "You need to show that you can do new things with this information, that we are building an information resource to enable analysis and decision making."

The data warehouse pulls data from 75 systems of record across 20 business units into a common core data model, the financial services logical data model. It took time to educate the business units on the value of a data model, and the project leaders had to understand how many different terms businesses would use to describe parties to a transaction, such as a borrower, vendor or guarantor. And real estate is unique, or as Watkins put it, "a funny animal".

"They don't have parties; the party is really the building itself." Business units and IT have to understand the different vocabularies before they can develop data standards. Reference data is much more difficult than most people appreciate, they learned during the economic crisis.  The firm found it had exposure to AIG through subsidiaries of the insurer which didn't have AIG anywhere in their name.

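The reference data problem described above, where exposure to one group is hidden behind subsidiaries with unrelated names, can be sketched as a roll-up over a parent hierarchy. This is an illustrative example only, not GE's data model: the `parent_of` mapping, the function names and the sample entity names in the usage note are all hypothetical.

```python
def ultimate_parent(entity, parent_of):
    """Follow parent links until reaching an entity with no recorded parent.

    parent_of is a hypothetical reference-data mapping of entity -> parent.
    The seen set guards against accidental cycles in the reference data.
    """
    seen = set()
    while entity in parent_of and entity not in seen:
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def exposure_by_group(exposures, parent_of):
    """Sum exposures per ultimate parent rather than per legal entity name."""
    totals = {}
    for counterparty, amount in exposures:
        group = ultimate_parent(counterparty, parent_of)
        totals[group] = totals.get(group, 0) + amount
    return totals
```

Given `parent_of = {"Sub A": "Holding X", "Holding X": "Group Z", "Sub B": "Group Z"}`, exposures booked against "Sub A" and "Sub B" are both reported under "Group Z", even though neither name mentions the group. The roll-up is only as good as the hierarchy behind it, which is why the reference data has to be maintained.
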
"You need to have this discussion early and set clear data quality rules and measurement standards," added Watkins. "Business must take ownership but IT has to be there with its expertise in the way data is used."