Building the Data Warehouse



The primary issue of granularity is getting it at the right level. The level of granularity needs to be neither too high nor too low.


The trade-off in choosing the right level of granularity, as discussed in Chapter 2, centers on managing the volume of data: if data is stored at too high a level of granularity, the detailed data becomes so voluminous that it is unusable. In addition, if the amount of data is truly large, consider moving the inactive portion of the data into overflow storage.
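As a rough illustration of separating inactive data for overflow storage, the sketch below partitions rows by their last access date. It is a minimal sketch under stated assumptions: the `Row` type, the `split_active_inactive` function, and the 365-day cutoff are all hypothetical, and "inactive" is assumed here to mean "not accessed since the cutoff"; the text does not prescribe a specific rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Row:
    key: str
    last_accessed: date  # hypothetical field used to judge activity


def split_active_inactive(rows, as_of, inactive_after_days=365):
    """Partition rows into (active, overflow) lists.

    A row counts as inactive if it has not been accessed within
    inactive_after_days of as_of. The 365-day default is an
    illustrative assumption, not a value taken from the text.
    """
    cutoff = as_of - timedelta(days=inactive_after_days)
    active = [r for r in rows if r.last_accessed >= cutoff]
    overflow = [r for r in rows if r.last_accessed < cutoff]
    return active, overflow
```

For example, with `as_of=date(2024, 6, 1)`, a row last accessed on 2024-01-10 stays active, while one last accessed on 2021-06-01 is a candidate for overflow storage.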

Raw Estimates


The starting point for determining the appropriate level of granularity is a raw estimate of the number of rows of data and the amount of DASD (direct access storage device) that the data warehouse will occupy. Admittedly, even in the best of circumstances, only an estimate can be made, but all that is required at the inception of building the warehouse is an order-of-magnitude estimate.


The raw estimate of the number of rows of data that will reside in the data warehouse tells the architect a great deal. If there are only 10,000 rows, almost any level of granularity will do. If there are 10 million rows, a low level of granularity is needed. If there are 10 billion rows, not only is a low level of granularity needed, but a major portion of the data must go into overflow storage.
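The three cases above can be sketched as a simple lookup on the row estimate. This is a minimal, hypothetical sketch: the text only describes row counts of 10,000, 10 million, and 10 billion, so the cut-over points used here (below 10 million, below 10 billion, and 10 billion or more) are an assumption about how to treat the values in between.

```python
def granularity_guidance(estimated_rows: int) -> str:
    """Map an order-of-magnitude row estimate to the guidance in the text.

    The boundaries at 10 million and 10 billion rows are assumed; the
    text gives only three example points, not exact thresholds.
    """
    if estimated_rows < 0:
        raise ValueError("row estimate must be non-negative")
    if estimated_rows < 10_000_000:
        return "almost any level of granularity will do"
    if estimated_rows < 10_000_000_000:
        return "a low level of granularity is needed"
    return ("a low level of granularity is needed, and a major portion "
            "of the data must go into overflow storage")
```

Because only an order-of-magnitude estimate is available, a lookup on powers of ten is about as precise as this decision needs to be.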


Figure 4.1 shows an algorithmic path to calculate the space occupied by a data warehouse. The first step is to identify all the tables to be built. As a rule of thumb, there will be one or two really large tables and many smaller supporting tables. Next, estimate the size of the row in each table. The exact size will likely not be known, so a lower-bound estimate and an upper-bound estimate are sufficient.
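The steps above (list the tables, then bound each table's row size) can be carried through to a space estimate, as in the sketch below. It is a minimal sketch under stated assumptions: the per-table row-count bounds and the final "multiply and sum" step are assumptions about how the estimate is completed, since this passage stops at row size. `TableEstimate`, `space_bounds`, and all example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    name: str
    row_bytes_low: int   # lower-bound estimate of row size, in bytes
    row_bytes_high: int  # upper-bound estimate of row size, in bytes
    rows_low: int        # assumed lower bound on row count
    rows_high: int       # assumed upper bound on row count


def space_bounds(tables):
    """Return (low, high) total bytes across all tables.

    Each bound is the sum over tables of row size times row count,
    pairing low with low and high with high.
    """
    for t in tables:
        if t.row_bytes_low > t.row_bytes_high or t.rows_low > t.rows_high:
            raise ValueError(f"inconsistent bounds for table {t.name}")
    low = sum(t.row_bytes_low * t.rows_low for t in tables)
    high = sum(t.row_bytes_high * t.rows_high for t in tables)
    return low, high


# Hypothetical figures: one large table plus one smaller supporting table.
tables = [
    TableEstimate("transaction", 100, 200, 50_000_000, 80_000_000),
    TableEstimate("customer", 300, 500, 1_000_000, 2_000_000),
]
low, high = space_bounds(tables)
print(f"between {low / 1e9:.1f} GB and {high / 1e9:.1f} GB")
```

With these made-up inputs the large table dominates both bounds, which matches the rule of thumb that one or two tables account for most of the space.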
