Building the Data Warehouse


■    Sometimes the correction formula is so complex that an adjustment cannot be made.

■    Choice 3: Reset the account to the proper value on August 16. An entry on August 16 reflects the balance of the account at that moment regardless of any past activity. An entry would be made for $750 on August 16. But this approach has its own drawbacks:

■    The ability to simply reset an account as of one moment in time requires application and procedural conventions.

■    Such a resetting of values does not accurately account for the error that has been made.

Choice 3 is what likely happens when you cannot balance your checking account at the end of the month. Instead of trying to find out what the bank has done, you simply take the bank’s word for it and reset the account balance.
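A minimal sketch of Choice 3 in Python follows. It assumes an append-only list of entries in which a "reset" entry replaces the running balance and an "activity" entry adds to it. The Entry class, the balance_as_of function, the year 2024, and the earlier July entry are all hypothetical illustrations; only the $750 reset on August 16 comes from the text.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    entry_date: date
    amount: Decimal
    kind: str  # "activity" adds to the balance; "reset" replaces it


def balance_as_of(entries, as_of):
    """Replay entries in date order up to and including as_of."""
    balance = Decimal("0")
    for entry in sorted(entries, key=lambda e: e.entry_date):
        if entry.entry_date > as_of:
            break
        if entry.kind == "reset":
            balance = entry.amount   # past activity is ignored from here on
        else:
            balance += entry.amount
    return balance


# Hypothetical history: one earlier entry that later turns out to be wrong.
history = [Entry(date(2024, 7, 1), Decimal("5000"), "activity")]

# Choice 3: append a reset entry instead of editing or offsetting the past.
history.append(Entry(date(2024, 8, 16), Decimal("750"), "reset"))

print(balance_as_of(history, date(2024, 8, 15)))  # balance before the reset
print(balance_as_of(history, date(2024, 8, 16)))  # balance after the reset
```

In this sketch the balance for any date before August 16 still reflects the erroneous entry, which illustrates the second drawback: the reset fixes the value going forward but does not account for the error itself.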

There are, then, at least three ways to handle incorrect data as it enters the data warehouse. Depending on the circumstances, one approach will yield better results than another.


The two most important design decisions that can be made concern the granularity of data and the partitioning of data. For most organizations, a dual level of granularity makes the most sense. Partitioning of data breaks it down into small physical units. As a rule, partitioning is done at the application level rather than at the system level.
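As a rough illustration of partitioning at the application level, the Python sketch below has application code decide which physical unit each row belongs to, rather than leaving that to the DBMS. Partitioning by subject and calendar month is an assumption made for the example, and partition_name, partition_rows, and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_name(subject: str, record_date: date) -> str:
    """Build the name of the physical unit (for example, a table or file)."""
    return f"{subject}_{record_date.year}_{record_date.month:02d}"


def partition_rows(subject, rows):
    """Group rows into partitions chosen by application code."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_name(subject, row["date"])].append(row)
    return dict(partitions)


# Hypothetical sample data.
rows = [
    {"date": date(2024, 7, 1), "amount": 120},
    {"date": date(2024, 7, 15), "amount": 80},
    {"date": date(2024, 8, 16), "amount": 750},
]

parts = partition_rows("account_activity", rows)
for name in sorted(parts):
    print(name, len(parts[name]))
```

Because the routing rule lives in the application, each unit can be loaded, indexed, or archived on its own, and the rule can change from one period to the next without a system-level definition having to change with it.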

Data warehouse development is best done iteratively. First one part of the data warehouse is constructed, then another. It is never appropriate to develop the data warehouse under the “big bang” approach. One reason is that the end user of the warehouse operates in a discovery mode, so only after the warehouse’s first iteration is built can the developer tell what is really needed in the warehouse.
