The data warehouse is a very useful source of data for the explorer and the data miner. The data found in the data warehouse is cleansed, integrated, and organized, and it is historical. This foundation is precisely what the data miner and the explorer need in order to begin exploration and data mining. Note, however, that while the data warehouse provides an excellent source of data for the miner and the explorer, it is often not the only source. External data and other data can be freely mixed with data warehouse data in the course of exploration and mining. Refer to the book Exploration Warehousing (Wiley, 2000) for more information on this topic.

Living Sample Database

Occasionally, it is necessary to create a different kind of data warehouse. Sometimes there is simply too much data for normal access and analysis. When this happens, special design approaches may be used.

An interesting hybrid form of a data warehouse is the living sample database, which is useful when the volume of data in the warehouse has grown very large. The living sample database is a subset of either true archival data or lightly summarized data taken from a data warehouse. The term “sample” stems from the fact that it is a subset (a sample) of a larger database, and the term “living” stems from the fact that the database must be refreshed periodically. Figure 2.18 shows a living sample database.
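The two properties that give the living sample database its name, being a subset and being refreshed periodically, can be sketched in a few lines. The following is a minimal illustration only, assuming that the sample is drawn as a simple random sample (without replacement) of a fixed fraction of warehouse rows and that each refresh simply redraws it from the current warehouse contents. The function name `refresh_living_sample` and its parameters are hypothetical; the text does not prescribe a particular sampling technique.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def refresh_living_sample(
    rows: Sequence[Row], fraction: float, seed: Optional[int] = None
) -> List[Row]:
    """Hypothetical helper: rebuild a living sample from the warehouse rows.

    Draws a simple random sample, without replacement, containing roughly
    `fraction` of the rows. Running it again on each refresh cycle keeps the
    sample "living" as the underlying warehouse grows and changes.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # At least one row, rounded to the nearest whole row count.
    k = max(1, round(len(rows) * fraction))
    # A seeded generator makes a given refresh reproducible for auditing.
    rng = random.Random(seed)
    return rng.sample(list(rows), k)


# Usage: a 1% sample of a (toy) one-million-row warehouse.
warehouse = list(range(1_000_000))
sample = refresh_living_sample(warehouse, 0.01, seed=42)
print(len(sample))  # 10000
```

A simple random sample is chosen here because it keeps every row equally likely to appear, which is what population-level statistical analysis usually assumes; a real design might instead stratify or sample by key.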

In some circumstances (for example, statistical analysis of a population, or profiling), a living sample database can be very useful and can save huge amounts of resources. But the approach has some severe restrictions, and the designer should not build such a database as part of the data warehouse without being aware of its limitations.
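To illustrate why a sample can stand in for the full population in statistical analysis, the sketch below estimates a population mean and total from sampled values and reports the standard error of the mean, which quantifies sampling error. This is a generic survey-sampling estimator (simple random sampling with a finite population correction), offered as an assumption rather than a method taken from the text; the function `estimate_population` and its inputs are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def estimate_population(
    sample_values: Sequence[float], population_size: int
) -> Dict[str, float]:
    """Hypothetical helper: estimate population statistics from a simple random sample.

    Returns the estimated mean, the estimated total (mean scaled up to the
    population size), and the standard error of the mean with a finite
    population correction.
    """
    n = len(sample_values)
    if n < 2:
        raise ValueError("need at least two sampled values")
    if population_size < n:
        raise ValueError("population_size must be at least the sample size")
    mean = statistics.fmean(sample_values)
    # Finite population correction: sampling error shrinks as n approaches N.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    se_mean = statistics.stdev(sample_values) / math.sqrt(n) * fpc
    return {"mean": mean, "total": mean * population_size, "se_mean": se_mean}


# Usage: four sampled purchase amounts standing in for 400 customers.
est = estimate_population([10.0, 20.0, 30.0, 40.0], population_size=400)
print(est["mean"], est["total"])  # 25.0 10000.0
```

The standard error is one way to see the trade-off the text warns about: the estimate is cheap to compute, but it is only an estimate, and its reliability depends on the sample being representative.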

