Building the Data Warehouse

Figure 1.4 The reasons for the predictability of the crisis in data credibility in the naturally evolving architecture.

The second reason is the algorithmic differential. For example, one department has chosen to analyze all old accounts. Another department has chosen to analyze all large accounts. Is there any necessary correlation between the characteristics of customers who have old accounts and customers who have large accounts? Probably not. So why should a very different result surprise anyone?
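The algorithmic differential can be made concrete with a small sketch. The account records, the cutoff date, and the balance threshold below are all invented for illustration; the point is only that two departments applying different selection rules to the same source data will report different figures.

```python
from datetime import date

# Hypothetical account records; every value here is made up for illustration.
accounts = [
    {"id": 1, "opened": date(1985, 3, 1), "balance": 1_200},
    {"id": 2, "opened": date(1988, 7, 15), "balance": 250_000},
    {"id": 3, "opened": date(2019, 1, 10), "balance": 900_000},
    {"id": 4, "opened": date(2021, 6, 5), "balance": 3_400},
]


def old_accounts(rows, cutoff=date(1990, 1, 1)):
    """One department's extract: accounts opened before an assumed cutoff."""
    return [r for r in rows if r["opened"] < cutoff]


def large_accounts(rows, threshold=100_000):
    """Another department's extract: balances at or above an assumed threshold."""
    return [r for r in rows if r["balance"] >= threshold]


def mean_balance(rows):
    """Average balance over whichever rows the extract selected."""
    return sum(r["balance"] for r in rows) / len(rows)


# Same source data, same metric, different selection algorithm.
dept_a = mean_balance(old_accounts(accounts))
dept_b = mean_balance(large_accounts(accounts))
print("old accounts:  ", dept_a)
print("large accounts:", dept_b)
```

Both departments can truthfully label their number "average account balance," yet the two figures describe different populations and cannot be expected to agree.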

The third reason is one that merely magnifies the first two reasons. Every time a new extraction is done, the probability of a discrepancy rises because of the timing or the algorithmic differential. And it is not unusual for a corporation to have eight or nine levels of extraction being done from the time the data enters the corporation’s system to the time analysis is prepared for management. There are extracts, extracts of extracts, extracts of extracts of extracts, and so on. Each new level of extraction exaggerates the other problems that occur.
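A rough way to see how levels of extraction magnify the problem is to suppose, purely as a simplifying assumption, that each level independently introduces a discrepancy with some fixed probability. Neither the independence nor the 10 percent figure below comes from any measurement; they are placeholders chosen to show the shape of the effect.

```python
def chance_of_discrepancy(p_per_level: float, levels: int) -> float:
    """Probability that at least one of `levels` extractions introduces a
    discrepancy, assuming each does so independently with probability
    `p_per_level` (an illustrative simplification, not a measured rate)."""
    return 1.0 - (1.0 - p_per_level) ** levels


# Assumed 10 percent chance per level, shown at a few extraction depths.
for levels in (1, 3, 8):
    print(levels, round(chance_of_discrepancy(0.10, levels), 3))
```

Under these assumptions, a per-level chance of one in ten grows to better than even odds by the eighth level of extraction.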

The fourth reason for the lack of credibility is the problem posed by external data. With today’s technologies at the PC level, it is very easy to bring in data from outside sources. For example, Figure 1.5 shows one analyst bringing data into the mainstream of analysis from the Wall Street Journal, and another analyst bringing data in from Business Week. However, when the analyst brings data in, he or she strips the identity of the external data. Because the origin of the data is not captured, it becomes generic data that could have come from any source.
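The loss of identity described above can be sketched in a few lines. The record type, field names, and figures here are hypothetical; the sketch only contrasts a value that carries its source with the same value after the source has been stripped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExternalFigure:
    """A value brought in from outside, with its origin kept alongside it.
    The type and its field names are assumptions made for this sketch."""
    name: str
    value: float
    source: str
    as_of: date


def strip_identity(fig: ExternalFigure) -> tuple:
    """What the analyst effectively does: keep the number, drop the origin."""
    return (fig.name, fig.value)


# Two analysts import the "same" measure from different publications.
# The values and dates are invented for illustration.
a = ExternalFigure("industry_growth_pct", 4.2, "Wall Street Journal", date(1992, 1, 6))
b = ExternalFigure("industry_growth_pct", 3.1, "Business Week", date(1992, 1, 13))

# With the origin captured, the disagreement can be explained.
print(a.source, a.as_of, a.value)
print(b.source, b.as_of, b.value)

# With the origin stripped, only two conflicting numbers remain.
print(strip_identity(a))
print(strip_identity(b))
```

Once the source and date are gone, nothing in the data distinguishes the two figures, and there is no basis for deciding which one management should believe.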
