Building the Data Warehouse



In some cases, the level of granularity of the external data will not match that of the internal systems of the corporation. For example, suppose a corporation has individual household information. Now suppose the corporation purchases a list of income by zip code. The external list says that the average household income in the zip code is $X. The internal household records are matched so that each household in a zip code is assigned the income specified by the external file. (This means that some households will be assigned an income below their actual income and others an income above it. But, on average, the household income will be about right.) Once this arbitrary assignment of income is done, the data can be sliced and diced into many other patterns.
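The assignment described above can be sketched roughly as follows. This is only an illustration, not a method given in the text: the record layout, the field names (`household_id`, `zip_code`, `assigned_income`, `income_is_estimate`), the function `assign_income`, and all the values are hypothetical.

```python
# Hypothetical internal household records (illustrative values only).
households = [
    {"household_id": 1, "zip_code": "80301"},
    {"household_id": 2, "zip_code": "80301"},
    {"household_id": 3, "zip_code": "80302"},
]

# Hypothetical external list: average household income by zip code.
avg_income_by_zip = {"80301": 72000, "80302": 58000}


def assign_income(households, avg_income_by_zip):
    """Return copies of the household records with the zip-level average attached."""
    enriched = []
    for household in households:
        row = dict(household)  # copy, so the internal record is left untouched
        income = avg_income_by_zip.get(household["zip_code"])  # None if zip not in the list
        row["assigned_income"] = income
        # Flag the value as a zip-level estimate rather than a measured income.
        row["income_is_estimate"] = income is not None
        enriched.append(row)
    return enriched


enriched = assign_income(households, avg_income_by_zip)
```

Every household in the same zip code receives the same figure, so the flag is one way to remind later analysis that the income is an external average and not a per-household measurement.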


The third factor that makes external data hard to capture is its unpredictability. External data may come from practically any source at almost any time.


In addition to external data that might come from a magazine article or a consultant’s report, another whole class of data is only now becoming possible to automate: unstructured data. The two most common types of unstructured data are image and voice data. Image data is stored as pictures; voice data is stored digitally and can be translated back into a voice format. The issues with image and voice data stem primarily from technology. The technology to capture and manipulate image and voice data is not nearly as mature as more conventional technology. In addition, even when image and voice data can be captured, their storage requires huge amounts of DASD (direct access storage device) space, and their recall and display or playback can be awkward and slow.
