Wednesday 23 September 2009

Data Warehouse Design, A Two or Three Tier Approach?

A recent LinkedIn discussion on data warehouse design approaches has reassured me that after a decade of running and managing BI projects that there is still healthy and lively debate on what is the 'right' approach. This particular discussion is focused on if there should be two 'tiers' or three with a tier representing a transformation step and (usually) a persisted data schema.

A two tier approach comprises loading or staging which is transformed into a second, dimensional tier. The three tier approach includes an interim step, a relational integrated schema.

One of the common (and understandable concerns) is that three tiers will increase development times. The logic is seemingly sound. It *must* take longer to build three things than it does to build two things. However, in reality it doesn't!

The reason it doesn't is simple. Whilst there are many successful implementation using only two tiers, there are always three critical transformations to make, three essential data problems to 'fix'.

These are;

1. Load. This is the job of extracting data from source databases quickly and without disrupting source applications.


2. Integrate. Critically, this is integrating and representing the loaded data into a single, consistent (relational) format.


3. Present. This is simplifying the enterprise model into a dimensional schema that is highly performant and is easier for end users to navigate.

The choice then is to do this in three simpler steps or two more complex (actually one simple load step and one *really* complex integrate/presentation) step.

Finally, a three tier approach has consistently proven to be easier to maintain and support when the inevitable change requests start to come through. Actually a three tier approach implemented correctly can prove to be easier and therefore more cost effective than two tier over the total life of an application.