Building the Data Warehouse


Figure 3.28 Under the right circumstances, creating arrays of data can save considerable resources.

Another important design technique that is especially relevant to the data warehouse environment is the deliberate introduction of redundant data. Figure 3.29 shows an example where the deliberate introduction of redundant data pays a big dividend. In the top of Figure 3.29, the description field is normalized and exists nonredundantly. As a result, every process that must see the description of a part must access the base parts table. Access to the data is very expensive, although update of the data is optimal.

In the bottom of Figure 3.29, the description data element has been deliberately placed in the many tables where it is likely to be used. Access to the data becomes more efficient, but update of the data is no longer optimal. For data that is widely used and stable (such as description), however, there is little reason to worry about update. In particular, in the data warehouse environment there is no concern whatsoever for update.
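The trade-off can be sketched with a small in-memory SQLite database. This is only an illustration: the table names (part, order_line, order_line_wide), the columns, and the sample rows are assumptions made for the example and are not taken from Figure 3.29.

```python
import sqlite3

# Hypothetical schema: names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: description lives only in the base parts table.
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE order_line (order_id INTEGER, part_id INTEGER, qty INTEGER);

    -- Redundant: description is copied into the table that uses it.
    CREATE TABLE order_line_wide (
        order_id INTEGER, part_id INTEGER, qty INTEGER, description TEXT
    );

    INSERT INTO part VALUES (1, 'hex bolt'), (2, 'flat washer');
    INSERT INTO order_line VALUES (100, 1, 40), (100, 2, 80);
    INSERT INTO order_line_wide
        SELECT o.order_id, o.part_id, o.qty, p.description
        FROM order_line o JOIN part p ON p.part_id = o.part_id;
""")


def descriptions_normalized(order_id):
    """Every read must join back to the base parts table."""
    rows = conn.execute(
        "SELECT p.description FROM order_line o "
        "JOIN part p ON p.part_id = o.part_id "
        "WHERE o.order_id = ? ORDER BY o.part_id",
        (order_id,),
    )
    return [r[0] for r in rows]


def descriptions_redundant(order_id):
    """The read is satisfied from a single table, with no join."""
    rows = conn.execute(
        "SELECT description FROM order_line_wide "
        "WHERE order_id = ? ORDER BY part_id",
        (order_id,),
    )
    return [r[0] for r in rows]
```

Both functions return the same descriptions, but the redundant form never touches the parts table. The cost is that a change to a description would have to be applied to every copy, which matters little when the data is stable and is loaded rather than updated.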

Another useful technique is the further separation of data when there is a wide disparity in the probability of access. Figure 3.30 shows such a case.

In Figure 3.30, concerning a bank account, the domicile of the account and the date opened for the account are normalized together with the balance of the account. Yet the balance of the account has a very different probability of access than the other two data elements. The balance of an account is accessed very frequently, while the other data is hardly ever accessed. To make I/O more efficient and to store the data more compactly, it makes sense to split the normalized table into two separate tables, as shown.
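A minimal sketch of this split, again using an in-memory SQLite database. The table names (account, account_static, account_balance), the column names, and the sample values are assumptions chosen for illustration; only the three data elements themselves come from the text.

```python
import sqlite3

# Hypothetical schema: names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original normalized table: all three elements stored together.
    CREATE TABLE account (
        acct_id INTEGER PRIMARY KEY,
        domicile TEXT,
        date_opened TEXT,
        balance REAL
    );
    INSERT INTO account VALUES
        (1, 'Denver', '1995-03-14', 250.0),
        (2, 'Austin', '1998-11-02', 1200.5);

    -- Split by probability of access: rarely read data in one table,
    -- the frequently read balance in a second, narrower table.
    CREATE TABLE account_static (
        acct_id INTEGER PRIMARY KEY, domicile TEXT, date_opened TEXT
    );
    CREATE TABLE account_balance (
        acct_id INTEGER PRIMARY KEY, balance REAL
    );
    INSERT INTO account_static
        SELECT acct_id, domicile, date_opened FROM account;
    INSERT INTO account_balance
        SELECT acct_id, balance FROM account;
""")


def balance(acct_id):
    """Read the popular field from the narrow table only."""
    row = conn.execute(
        "SELECT balance FROM account_balance WHERE acct_id = ?",
        (acct_id,),
    ).fetchone()
    return row[0] if row else None
```

Because account_balance rows carry only the key and the balance, more of them fit in each physical block, so the frequent balance lookups read less data than they would against the combined table.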
