Building the Data Warehouse

The data warehouse is the ideal place to store external and unstructured data. If external and unstructured data is not stored in a centrally located place, several problems are sure to arise. Figure 8.2 shows that when this type of data enters the corporation in an undisciplined fashion, the identity of the source of the data is lost, and there is no coordination whatsoever in the orderly use of the data.

Typically, when external data is not entered into the data warehouse, it comes into the corporation by means of the PC. There is nothing wrong per se with entering data at the PC level. But almost always, the data is entered manually through a spreadsheet, and absolutely no attempt is made to capture information about its source. For example, in Figure 8.2 an analyst sees a report in the Wall Street Journal. The next day, the analyst uses the data from the Wall Street Journal as part of a report, but the original source of the data is lost as it is entered into the corporate mainstream of data.

Figure 8.1 External and unstructured data both belong in the data warehouse.

Figure 8.2 Problems with unstructured data. [Figure labels: Business Week, Wall Street Journal, Los Angeles Times]

Another difficulty with the laissez-faire approach to external data is that the data is hard to retrieve at a later time. It is entered into the corporation’s systems, used once, and then it disappears. Even a few weeks later, it is hard to find the data and reprocess it for further use. This is unfortunate because much of the data coming from external sources remains useful over a long period of time.
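
As a rough illustration of the alternative, the sketch below stores each piece of external data together with its source in one central place, so that it can be found again later. It is only a minimal in-memory stand-in, not a description of any particular warehouse product. The names ExternalRecord, ExternalDataStore, and find_by_source are hypothetical, as are the sample value and dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExternalRecord:
    """One piece of external data plus the facts needed to trace it (hypothetical layout)."""
    value: str        # the data itself, e.g. a figure quoted in a report
    source: str       # where it came from (publication, vendor, and so on)
    published: date   # date of the source document
    entered: date     # date the data entered the corporation
    entered_by: str   # who brought it in


class ExternalDataStore:
    """A central, in-memory stand-in for the external data held in the warehouse."""

    def __init__(self) -> None:
        self._records: list[ExternalRecord] = []

    def add(self, record: ExternalRecord) -> None:
        # Refuse data whose origin is unknown, so the source is never lost on entry.
        if not record.source.strip():
            raise ValueError("external data must carry its source")
        self._records.append(record)

    def find_by_source(self, source: str) -> list[ExternalRecord]:
        # Retrieve everything that came from one source, even long after entry.
        return [r for r in self._records if r.source == source]


store = ExternalDataStore()
store.add(ExternalRecord(
    value="hypothetical market figure",   # sample value, not real data
    source="Wall Street Journal",
    published=date(2001, 3, 5),           # hypothetical dates
    entered=date(2001, 3, 6),
    entered_by="analyst",
))
matches = store.find_by_source("Wall Street Journal")
```

Because the source travels with the data, the analyst's report can cite its origin, and the same record can be located again weeks later instead of disappearing after one use.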

