Building the Data Warehouse


Figure 3.29 Selective use of redundancy. In one layout, description is nonredundant and is used frequently, but is seldom updated. In the other, description is redundantly spread over the many places it is used; it must be updated in many places when it changes, but it seldom, if ever, does.

Occasionally, introducing derived (i.e., calculated) data into the physical database design can reduce the amount of I/O needed. Figure 3.31 shows such a case. A program accesses payroll data regularly in order to calculate the annual pay and taxes that have been paid. If the program is run regularly and again at year's end, it makes sense to create fields to store the calculated data. The data then has to be calculated only once, and all future requirements can access the calculated field. This approach has another advantage: once the field is calculated, it does not have to be calculated again, which removes the risk that a faulty algorithm later produces an incorrect value.
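The payroll case can be illustrated with a minimal Python sketch. The record layout, the field names (`annual_pay`, `annual_taxes`), and the helper `store_annual_totals` are hypothetical and chosen only for illustration; the point is that the totals are calculated once, stored alongside the detail, and read directly afterward.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PayrollRecord:
    """One employee-year of payroll data (hypothetical layout)."""
    employee_id: str
    year: int
    # One (gross pay, taxes paid) pair per pay period.
    pay_periods: List[Tuple[float, float]]
    # Derived fields: empty until the year-end calculation fills them in.
    annual_pay: Optional[float] = None
    annual_taxes: Optional[float] = None


def store_annual_totals(record: PayrollRecord) -> PayrollRecord:
    """Calculate the annual totals once and store them on the record."""
    record.annual_pay = sum(pay for pay, _ in record.pay_periods)
    record.annual_taxes = sum(tax for _, tax in record.pay_periods)
    return record


# Later requests read the stored fields instead of rescanning every pay period.
record = store_annual_totals(
    PayrollRecord("E1", 2023, [(1000.0, 200.0), (1500.0, 300.0)])
)
print(record.annual_pay, record.annual_taxes)
```

Storing the totals trades a small amount of space for fewer reads of the detailed pay-period data, which is the I/O saving the text describes.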

Figure 3.30 Further separation of data based on a wide disparity in the probability of access. Panel labels: low probability of access; very high probability of access.

Figure 3.31 Introducing derived data.

One of the most innovative techniques in building a data warehouse is what can be termed a “creative” index, or a creative profile (a term coined by Les Moore). Figure 3.32 shows an example of a creative index. A creative index is built as data passes from the operational environment to the data warehouse environment. Because each unit of data has to be handled at that point in any case, calculating or creating an index during the pass adds very little overhead.
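The idea of building the index during the load pass can be sketched roughly as follows. The function name `load_with_creative_index`, the row layout, and the particular profile items (top customers by volume, largest transaction) are assumptions made for illustration; the text does not specify what the index in Figure 3.32 contains.

```python
from collections import defaultdict


def load_with_creative_index(operational_rows, warehouse, top_n=3):
    """Load rows into the warehouse and build a small profile in the same pass.

    `operational_rows` is an iterable of dicts with hypothetical keys
    "customer" and "amount"; `warehouse` is any list-like target.
    """
    volume_by_customer = defaultdict(float)
    largest = None

    for row in operational_rows:
        warehouse.append(row)  # the normal load step

        # Each row is already in hand, so profiling it here costs very little.
        volume_by_customer[row["customer"]] += row["amount"]
        if largest is None or row["amount"] > largest["amount"]:
            largest = row

    # Rank customers by total volume seen during this load.
    top_customers = sorted(
        volume_by_customer.items(), key=lambda item: item[1], reverse=True
    )[:top_n]

    return {"top_customers": top_customers, "largest_transaction": largest}
```

The profile is returned separately from the loaded rows, so analysts can consult it without scanning the warehouse data itself.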
