Building the Data Warehouse


■    Nonkey data is reformatted as it passes from the operational environment to the data warehouse environment. As a simple example, an input date is read as YYYY/MM/DD and is written to the output file as DD/MM/YYYY. (Reformatting operational data before it is ready to go into a data warehouse often becomes much more complex than this simple example.)

■    Data is cleansed as it passes from the operational environment to the data warehouse environment. In some cases, a simple algorithm is applied to input data in order to make it correct. In complex cases, artificial intelligence subroutines are invoked to scrub input data into an acceptable output form. There are many forms of data cleansing, including domain checking, cross-record verification, and simple formatting verification.

■    Multiple input sources of data exist and must be merged as they pass into the data warehouse. Under one set of conditions the source of a data warehouse data element is one file, and under another set of conditions the source of data for the data warehouse is another file. Logic must be spelled out to have the appropriate source of data contribute its data under the right set of conditions.

■    When there are multiple input files, key resolution must be done before the files can be merged. This means that if different key structures are used in the different input files, the merging program must have the logic embedded that allows resolution.

■    With multiple input files, the sequence of the files may not be the same or even compatible. In this case, the input files need to be resequenced. This is not a problem unless many records must be resequenced, which unfortunately is almost always the case.
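
The steps in the list above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the book's method: the two source names ("billing" and "orders"), their key formats, the status domain, and the rule that a nonblank billing value takes precedence are all hypothetical, chosen to show reformatting, domain checking, source selection, key resolution, and resequencing in one place.

```python
from datetime import datetime

VALID_STATUS = {"A", "I"}  # assumed domain of legal status codes


def reformat_date(value):
    """Reformat a nonkey date from YYYY/MM/DD to DD/MM/YYYY."""
    return datetime.strptime(value, "%Y/%m/%d").strftime("%d/%m/%Y")


def cleanse_status(value):
    """Domain check: normalize the code; use 'U' (unknown) if it is not legal."""
    code = value.strip().upper()
    return code if code in VALID_STATUS else "U"


def resolve_key(source, raw_key):
    """Map each file's key structure onto one integer key (formats are assumed)."""
    if source == "billing":      # assumed shape: "CUST-000123"
        return int(raw_key.split("-", 1)[1])
    if source == "orders":       # assumed shape: "123"
        return int(raw_key)
    raise ValueError(f"unknown source: {source}")


def merge_sources(billing, orders):
    """Resolve keys, merge the two inputs, and resequence on the resolved key."""
    merged = {}
    # Orders are applied first so that billing, the preferred source here,
    # can override them afterwards.
    for source, records in (("orders", orders), ("billing", billing)):
        for rec in records:
            key = resolve_key(source, rec["key"])
            row = merged.setdefault(key, {})
            for field in ("opened", "status"):
                # Assumed source-selection rule: a blank value never overwrites.
                if rec.get(field):
                    row[field] = rec[field]

    result = []
    for key in sorted(merged):   # resequence in memory on the common key
        row = merged[key]
        result.append({
            "key": key,
            "opened": reformat_date(row["opened"]) if "opened" in row else None,
            "status": cleanse_status(row.get("status", "")),
        })
    return result
```

The sketch resequences by sorting in memory, which is only reasonable for small inputs. With the large record volumes the text warns about, an external sort or a sorted load into a staging table would be the more realistic choice.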
