Building the Data Warehouse



A more rational approach is to physically merge the tables so that minimal I/O is consumed, as seen in Figure 3.27. Now the same program operates as before, only it needs much less I/O to accomplish the same task.
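The effect of merging can be sketched with a toy cost model. The sketch below assumes, purely for illustration, that every lookup against a table costs exactly one physical I/O; the table names (`customers`, `orders`, `shipments`) and their columns are hypothetical and do not come from the figures.

```python
class Table:
    """Toy table: each call to read() is counted as one physical I/O (an assumption)."""

    def __init__(self, rows):
        self.rows = rows        # key -> dict of column values
        self.io_count = 0       # number of simulated physical reads

    def read(self, key):
        self.io_count += 1
        return self.rows[key]


# Hypothetical separate tables that share the same key.
customers = Table({1: {"name": "Acme"}})
orders = Table({1: {"order_total": 250}})
shipments = Table({1: {"carrier": "rail"}})


def fetch_separate(key):
    # One read per table: three I/Os to assemble a single logical record.
    return {**customers.read(key), **orders.read(key), **shipments.read(key)}


# The merged table is built once at design/load time, so the construction
# below deliberately bypasses read() and is not charged to the program.
merged = Table({
    key: {**customers.rows[key], **orders.rows[key], **shipments.rows[key]}
    for key in customers.rows
})


def fetch_merged(key):
    # One read returns the whole logical record.
    return dict(merged.read(key))


separate_row = fetch_separate(1)
merged_row = fetch_merged(1)
separate_io = customers.io_count + orders.io_count + shipments.io_count
print(separate_io, merged.io_count)  # reads used by each design in this model
```

Both functions return the same record; only the number of simulated reads differs. A real system's savings depend on block sizes, buffering, and access patterns, which this model ignores.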


The question, then, becomes what is a sane strategy to merge the tables so that the maximum benefit is derived? It is in answering this question that the physical database designer earns his or her reward.


Merging tables is only one design technique that can save I/O. Another very useful technique is creating an array of data. In Figure 3.28, data is normalized so that each occurrence of a sequence of data resides in a different physical location. Retrieving each occurrence, n, n + 1, n + 2, … , requires a physical I/O to get the data. If the data were placed in a single row in an array, then a single I/O would suffice to retrieve it, as shown at the bottom of Figure 3.28.
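The difference between the two layouts can be illustrated with a small sketch. It assumes a simplified disk in which each block is read with one physical I/O; the block numbers, occurrence labels, and values are invented for the example.

```python
# Hypothetical layouts: block number -> list of (occurrence, value) records.
# Normalized: each occurrence lives in a different physical block.
normalized_blocks = {
    10: [("n", 100)],
    57: [("n+1", 110)],
    93: [("n+2", 125)],
}

# Array: all occurrences are stored together in one row in one block.
array_blocks = {
    10: [("n", 100), ("n+1", 110), ("n+2", 125)],
}


def read_sequence(blocks, wanted):
    """Return the wanted values in order, plus how many blocks had to be read.

    A block counts as read (one simulated I/O) if it holds any wanted occurrence.
    """
    values = {}
    blocks_read = 0
    for records in blocks.values():
        hits = {occ: val for occ, val in records if occ in wanted}
        if hits:
            blocks_read += 1
            values.update(hits)
    return [values[occ] for occ in wanted], blocks_read


wanted = ["n", "n+1", "n+2"]
scattered_values, scattered_io = read_sequence(normalized_blocks, wanted)
packed_values, packed_io = read_sequence(array_blocks, wanted)
print(scattered_io, packed_io)  # simulated I/Os for each layout
```

The retrieved values are identical in both cases; the array layout simply touches fewer blocks.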


Of course, it does not make sense to create an array of data in every case. Only when there are a stable number of occurrences, where the data is accessed in sequence, where it is created and/or updated in a statistically well-behaved sequence, and so forth, does creating an array pay off.

Figure 3.26 When there are many tables, much I/O is required for dynamic interconnectability.

Figure 3.27 When tables are physically merged, much less I/O is required.


Interestingly, in the data warehouse these circumstances occur regularly because of the time-based orientation of the data. Data warehouse data is always relevant to some moment in time, and units of time occur with great regularity. In the data warehouse, creating an array by month, for example, is a very easy, natural thing to do.
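As a rough sketch of what an array by month might look like, the code below pivots normalized rows (one row per account per month) into one row per account per year with twelve monthly slots. The row layout `(account_id, year, month, amount)`, the sample values, and the function name `to_monthly_arrays` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical normalized rows: (account_id, year, month, amount).
normalized = [
    ("A1", 2023, 1, 120.0),
    ("A1", 2023, 2, 95.5),
    ("A1", 2023, 12, 40.0),
    ("B7", 2023, 3, 10.0),
]


def to_monthly_arrays(rows):
    """Pivot one-row-per-month data into one row per (account, year).

    Each resulting row is a 12-element list, January first; months with
    no data stay as None.
    """
    arrays = defaultdict(lambda: [None] * 12)
    for account_id, year, month, amount in rows:
        arrays[(account_id, year)][month - 1] = amount
    return dict(arrays)


monthly = to_monthly_arrays(normalized)
print(monthly[("A1", 2023)])
```

Because a year always has twelve months, the number of occurrences is stable, which is the condition under which the array design pays off.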
