Building the Data Warehouse

There is an important feedback loop for the data warehouse environment. Upon building the data warehouse’s first iteration, the data architect listens very carefully to the feedback from the end user. Adjustments are made based on the user’s input.

Another important consideration is the levels of granularity needed by the different architectural components that will be fed from the data warehouse. When data goes into overflow—away from disk storage to a form of alternate storage—the granularity can be as low as desired. When overflow storage is not used, the designer will be constrained in the selection of the level of granularity when there is a significant amount of data.

For overflow storage to operate properly, two pieces of software are necessary: a cross-media storage manager, which manages the traffic between the disk environment and the alternate storage environment, and an activity monitor. The activity monitor is needed to determine what data should be in overflow and what data should be on disk.
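The division of labor between the two pieces of software can be illustrated with a minimal sketch. It is not drawn from any particular product: the class names, the access-count threshold, and the in-memory dictionaries standing in for disk and alternate storage are all assumptions made for illustration.

```python
from collections import Counter


class ActivityMonitor:
    """Hypothetical monitor that counts how often each record is accessed."""

    def __init__(self):
        self.hits = Counter()

    def record_access(self, key):
        self.hits[key] += 1

    def is_cold(self, key, threshold):
        # A record is treated as "cold" when it has fewer than `threshold` accesses.
        return self.hits[key] < threshold


class CrossMediaStorageManager:
    """Hypothetical manager that moves records between disk and overflow."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.disk = {}      # stands in for the disk environment
        self.overflow = {}  # stands in for the alternate storage environment

    def write(self, key, value):
        self.disk[key] = value

    def read(self, key):
        self.monitor.record_access(key)
        if key in self.overflow:
            # Stage the record back onto disk before returning it.
            self.disk[key] = self.overflow.pop(key)
        return self.disk[key]

    def rebalance(self, threshold):
        # Move every cold record from disk to overflow storage.
        cold_keys = [k for k in self.disk if self.monitor.is_cold(k, threshold)]
        for key in cold_keys:
            self.overflow[key] = self.disk.pop(key)


monitor = ActivityMonitor()
manager = CrossMediaStorageManager(monitor)
manager.write("sales_2023", "detail rows")
manager.write("sales_1998", "detail rows")
for _ in range(3):
    manager.read("sales_2023")
manager.rebalance(threshold=2)
```

In this sketch the frequently read record stays on disk while the unread one moves to overflow, and a later read of the overflow record stages it back onto disk. A real activity monitor would likely weigh recency and query patterns rather than a simple count.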

The Data Warehouse and Technology

In many ways, the data warehouse requires a simpler set of technological features than its predecessors. Online updating of the data warehouse is not needed, locking needs are minimal, only a very basic teleprocessing interface is required, and so forth. Nevertheless, there are a fair number of technological requirements for the data warehouse. This chapter outlines some of these.

Managing Large Amounts of Data

Prior to data warehousing, the terms terabyte and petabyte were unknown; data capacity was measured in megabytes and gigabytes. After data warehousing, the whole perception changed. Suddenly, what was large one day was trifling the next. The explosion of data volume came about because the data warehouse required that both detail and history be mixed in the same environment. The issue of data volume is so important that it pervades all other aspects of data warehousing. With this in mind, the first and most important technological requirement for the data warehouse is the ability to manage large amounts of data, as shown in Figure 5.1. There are many approaches, and in a large warehouse environment, more than one approach will be used.

