Building the Data Warehouse


But the technology required to make this powerful interaction happen is not trivial, and there are obstacles to understanding the data that comes from the Web environment. For example, Web-generated data is at a very low level of detail; in fact, the level is so low that the data is fit neither for analysis nor for entry into the data warehouse. To make clickstream data useful for both analysis and the warehouse, the log data must be read and refined.
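To see just how low-level raw clickstream data is, consider what a single Web-server log line looks like and what it takes to read one. The sketch below parses a line in the common Web-server log style; the field names, the regular expression, and the sample line are illustrative assumptions, not taken from the book.

```python
import re
from datetime import datetime

# Illustrative pattern for a CLF-style Web log line (an assumption about
# the log layout, not the book's specification).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clickstream_record(line):
    """Turn one raw log line into a dictionary of typed fields."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # malformed line; a real system would count or quarantine these
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

line = '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /cart HTTP/1.1" 200 512'
rec = parse_clickstream_record(line)
```

Each line records a single HTTP request, not a business event, which is why the data must be refined before it is useful.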

Figure 10.2 shows that Web log clickstream data is passed through software that is called a Granularity Manager before entry into the data warehouse environment.

A lot of processing occurs in the Granularity Manager, which reads clickstream data and does the following:

■■ Edits out extraneous data

■■ Creates a single record out of multiple, related clickstream log records

Figure 10.1 The activity of the Web environment is spun off into Web logs in records called clickstream records.



Figure 10.2 Data passes through the Granularity Manager before entering the data warehouse.


■■ Edits out incorrect data

■■ Converts data that is unique to the Web environment, especially key data that needs to be used in the integration with other corporate data

■■ Summarizes data

■■ Aggregates data

As a rule of thumb, about 90 percent of raw clickstream data is discarded or summarized as it passes through the Granularity Manager. Once passed through the manager into the data warehouse, the clickstream data is ready for integration into the mainstream of corporate processing.
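The steps above can be sketched in miniature. The following is a minimal illustration of a Granularity Manager, assuming a simple record layout (visitor, page, status, bytes) and illustrative filtering rules of my own choosing; it edits out extraneous and incorrect records, then rolls the many related log records for each visitor up into one summarized record.

```python
from collections import defaultdict

# Illustrative rules, not the book's specification: page assets are
# "extraneous" and HTTP errors are "incorrect."
EXTRANEOUS = (".gif", ".png", ".css", ".js", ".ico")

def is_extraneous(rec):
    return rec["page"].endswith(EXTRANEOUS)

def is_incorrect(rec):
    return rec["status"] >= 400

def granularity_manager(records):
    """Filter raw clickstream records, then create a single summarized
    record out of the multiple related records for each visitor."""
    sessions = defaultdict(lambda: {"pages": 0, "bytes": 0})
    for rec in records:
        if is_extraneous(rec) or is_incorrect(rec):
            continue  # edit out extraneous / incorrect data
        s = sessions[rec["visitor"]]
        s["pages"] += 1             # aggregate page views
        s["bytes"] += rec["bytes"]  # summarize transfer volume
    # One record per visitor replaces many low-level log records.
    return [{"visitor": v, **s} for v, s in sessions.items()]

raw = [
    {"visitor": "a", "page": "/home",     "status": 200, "bytes": 900},
    {"visitor": "a", "page": "/logo.png", "status": 200, "bytes": 40},
    {"visitor": "a", "page": "/cart",     "status": 200, "bytes": 700},
    {"visitor": "b", "page": "/missing",  "status": 404, "bytes": 0},
]
refined = granularity_manager(raw)
```

Here four raw records collapse to one warehouse-ready record, which is the same kind of reduction the 90-percent rule of thumb describes at scale.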
