Monday 1 June 2009

Small Children, Energy and Efficient Data Warehousing

Last week, I referred to Peter Thomas, and his article Using multiple BI tools in a BI implementation – Part II, In the article, Peter points out that the way to drive consistency across dimensions and measures is to define as much logic as possible in the data warehouse.

I was musing over this again this weekend (I hear the cries of what an interesting life you have) whilst out for lunch with friends and their small children (ages 9 and 7) On the short walk to the restaurant I was amused by how different their approach at getting from A to B was to the 'grown ups'. We were focused on getting to the destination in a relatively direct and efficient way. However, the children ran to and fro, stopped, doubled-back, looped a few circles and even randomly waved their arms in the air. They generally spent as much time in a state of motion as possible. Clearly the criteria of small children, when en-route to a destination, is to use the maximum amount of energy possible!

This, if you will go with me on this, is rather like trying to implement data warehouse consistency in the BI tools rather than further downstream in the data warehouse. You do eventually get to the destination, but will probably be exhausted, out of breath and hungry. This is fine if you are two children out for dinner with some stuffy grown-ups but not an efficient use of the somewhat limited time of a BI practitioner.

A typical BI architecture comprises tiers that include;

  • Source Systems
  • ETL
  • Data (Data Warehouse, Data Marts, OLAP Cubes)
  • BI Metadata
  • BI Application (Reports, Scorecards, Dashboards)

A properly architected data warehouse (more on this in later blogs) should have been built against an enterprise schema and is therefore *the* consistent representation of business information. Common definitions of customers, departments, profit and products live here. If there is one good reason for this (although there are many) then it is simply that there these can all be defined once in the data warehouse but would have to be defined many, many times in what can often be hundreds of reports that comprise a BI solution.

One of the reasons that we fail to do this is that inconsistency is often made visible for the first time by the BI tools. At this point the project momentum is around building metadata models and reports. Inconsistencies are fixed where the resources are focused ... in reports. Add to this that revisiting the design may need involvement from the ETL developer, the DW designer and the business analyst and, if there is a lack of clarity, the business users. It is no surprise that the report author is inclined to fix it where they stand. After all, the tools make the fix simple and it is only when the report author has built the same calculation for the tenth time that they become suspicious about starting it in the first place. And of course, the initial build of the BI application is only the beginning. Many more reports will be built over the life of the BI application.

So work hard to establish the correct definitions during the design of the data warehouse and it will reap productivity gains. Leave it to the BI application only if you have the carefree attitude, the free-time and the energy levels of an eight year old.

No comments:

Post a Comment