Building the Data Warehouse

■■ Multiple outputs may result. Data may be produced at different levels of summarization by the same data warehouse creation program.

■■ Default values must be supplied. Under some conditions an output value in the data warehouse will have no source of data. In this case, the default value to be used must be specified.

■■ The efficiency of selecting input data for extraction often becomes a real issue. Consider the case where, at the moment of refreshment, there is no way to distinguish operational data that needs to be extracted from operational data that does not. When this occurs, the entire operational file must be read. Reading the entire file is especially inefficient because only a fraction of the records is actually needed. This type of processing also requires the online environment to be active, which further squeezes other processing in that environment.

■■ Summarization of data is often required. Multiple operational input records are combined into a single “profile” data warehouse record. To do summarization, the detailed input records to be summarized must be properly sequenced. In the case where different record types contribute to the single summarized data warehouse record, the arrival of the different input record types must be coordinated so that a single record is produced.

■■ Renaming of data elements as they are moved from the operational environment to the data warehouse must be tracked. As a data element moves from the operational environment to the data warehouse environment, it usually changes its name. That change must be documented.
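Several of the considerations above (multiple outputs, default values, summarization of sequenced input, and tracked renaming) can be illustrated together in a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the field names (cust_no, amt, txn_dt, region), the RENAMES and DEFAULTS tables, and the two summary levels. It is not a description of any particular warehouse load program.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rename map: operational name -> data warehouse name.
# Keeping it as data documents every name change in one place.
RENAMES = {"cust_no": "customer_id", "amt": "amount", "txn_dt": "activity_date"}

# Hypothetical defaults for warehouse fields that may have no source value.
DEFAULTS = {"region": "UNKNOWN"}


def to_warehouse_record(record):
    """Rename operational fields and supply defaults for missing values."""
    renamed = {RENAMES.get(name, name): value for name, value in record.items()}
    for field, default in DEFAULTS.items():
        if renamed.get(field) is None:
            renamed[field] = default
    return renamed


def summarize(operational_records):
    """Produce two outputs at different levels of summarization."""
    rows = [to_warehouse_record(r) for r in operational_records]

    # Summarization needs sequenced input: groupby merges only adjacent rows.
    rows.sort(key=itemgetter("customer_id"))
    profiles = []
    for customer_id, group in groupby(rows, key=itemgetter("customer_id")):
        group = list(group)
        # Prefer a real region over the default if any record supplies one.
        region = next(
            (g["region"] for g in group if g["region"] != DEFAULTS["region"]),
            DEFAULTS["region"],
        )
        profiles.append({
            "customer_id": customer_id,
            "region": region,
            "activity_count": len(group),
            "total_amount": sum(g["amount"] for g in group),
        })

    # Second output: the same data summarized by customer and month (YYYY-MM).
    def month_key(row):
        return (row["customer_id"], row["activity_date"][:7])

    monthly = [
        {"customer_id": cid, "month": month,
         "total_amount": sum(g["amount"] for g in group)}
        for (cid, month), group in groupby(sorted(rows, key=month_key), key=month_key)
    ]
    return profiles, monthly


# Made-up sample input: one record lacks a region and one has it set to None.
sample = [
    {"cust_no": 7, "amt": 20.0, "txn_dt": "2024-01-05", "region": "EAST"},
    {"cust_no": 3, "amt": 5.0, "txn_dt": "2024-01-09"},
    {"cust_no": 7, "amt": 15.0, "txn_dt": "2024-02-11", "region": None},
]
profiles, monthly = summarize(sample)
```

Sorting before grouping is the sequencing step the summarization bullet calls for; without it, records for the same customer that are not adjacent would yield more than one profile record.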

