Building the Data Warehouse

•    a fraction of data in the warehouse

•    used for very efficient formulation of a query

•    cannot be used for general-purpose analysis; can be used only for statistical analysis

Figure 2.18 Living sample data: another way of changing the granularity of data.

A living sample database is not a general-purpose database. If you wanted to find out whether J Jones is a customer, you would not look in a living sample database for that information. It is entirely possible for J Jones to be a customer and yet not be on record in the living sample. These databases are good for statistical analysis and for looking at trends, and they can offer very promising results when data must be looked at collectively. They are not at all useful for dealing with individual records of data.
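To make the distinction between collective and individual questions concrete, here is a minimal sketch. It assumes a hypothetical warehouse of 100,000 customers with synthetic balances and a hypothetical living sample of 1 percent of them chosen at random; the customer id 12345 is an arbitrary stand-in for J Jones. The names `warehouse`, `living_sample`, and `target` are illustrative only.

```python
import random
import statistics

# Hypothetical stand-in for the full warehouse: 100,000 customers with
# synthetic account balances (generated data, not real records).
rng = random.Random(42)
warehouse = {cust_id: rng.uniform(0.0, 1000.0) for cust_id in range(100_000)}

# Hypothetical living sample: 1 percent of the customers, chosen at random.
sample_ids = rng.sample(range(100_000), k=1_000)
living_sample = {cid: warehouse[cid] for cid in sample_ids}

# Collective question: the sample mean estimates the warehouse mean.
full_mean = statistics.fmean(warehouse.values())
sample_mean = statistics.fmean(living_sample.values())
print(f"warehouse mean: {full_mean:.1f}  sample mean: {sample_mean:.1f}")

# Individual question: a miss in the sample does not mean the customer
# is missing from the warehouse. id 12345 is a hypothetical "J Jones".
target = 12_345
print("in warehouse:", target in warehouse)
print("in living sample:", target in living_sample)
```

The two means differ only by sampling error, which for 1,000 records is small relative to the spread of the balances. The membership check, by contrast, reports the customer in the warehouse but will most likely report the customer absent from the sample, since any one customer has only a 1 percent chance of being selected.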

One of the important aspects of building a living sample database is how the data is loaded, which determines both the amount of data in the database and how random that data will be. Consider how a living sample database is typically loaded. An extract/selection program rummages through a large database, choosing every one-hundredth or every one-thousandth record. Each selected record is then shipped to the living sample database. The resulting living sample database is therefore one-hundredth or one-thousandth the size of the original database, and a query that operates against it uses one-hundredth or one-thousandth of the resources of a query that operates directly against the full data warehouse.
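The extract/selection program described above can be sketched in a few lines. This is a minimal illustration, assuming the warehouse can be streamed as an iterable of records; the function name `load_living_sample` and the row layout are hypothetical, not taken from any particular product.

```python
from itertools import islice
from typing import Iterable, TypeVar

T = TypeVar("T")


def load_living_sample(source: Iterable[T], n: int) -> list[T]:
    """Hypothetical extract/selection program: make one pass over the
    source and keep every n-th record (the n-th, 2n-th, 3n-th, ...)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # islice(start=n-1, stop=None, step=n) consumes the source in order
    # and never stores the records that are not selected.
    return list(islice(source, n - 1, None, n))


# Hypothetical warehouse of 1,000,000 rows, streamed rather than held in memory.
warehouse_rows = ({"row_id": i, "amount": i % 97} for i in range(1_000_000))
sample = load_living_sample(warehouse_rows, 100)

print(len(sample))          # 10000
print(sample[0]["row_id"])  # 99
```

Selecting every n-th record is a systematic sample: it is cheap and makes the sample size exactly predictable. If the source order has a pattern that repeats with period n (for example, records written in batches of 100), it can land on the same position in every batch and skew the sample, a risk that random selection avoids. Assuming query cost scales with rows scanned, a query over the 10,000-row sample does about one-hundredth of the work of a full scan.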

The selection of records for inclusion in the living sample is usually random. On occasion, a judgment sample is taken, in which a record must meet certain criteria in order to be selected. The problem with judgment samples is that they almost always introduce bias into the living sample data. The problem with a random selection of data is that it may not produce statistical significance. However it is done, a subset of the data warehouse is selected for the living sample. The fact that any given record is not found in the living sample database

Скачать в pdf «Building the Data Warehouse»