Building the Data Warehouse


End users operate in a mode that can be called the “discovery mode.” End users don’t know what their requirements are until they see what the possibilities are. Initially populating large amounts of data into the data warehouse is dangerous: it is a sure thing that the data will change once populated. Jon Geiger says that the mode of building the data warehouse is “build it wrong the first time.” This tongue-in-cheek assessment has a strong element of truth in it.

The population and feedback processes continue indefinitely. In addition, the data in the warehouse continues to change. Over time, as the data becomes stable, it changes less and less.

A word of caution: If you wait for existing systems to be cleaned up, you will never build a data warehouse. The issues and activities of the existing systems’ operational environment must be independent of the issues and activities of the data warehouse environment. One train of thought says, “Don’t build the data warehouse until the operational environment is cleaned up.” This way of thinking may be theoretically appealing, but in truth it is not practical at all.

One observation worth making at this point concerns how frequently data is refreshed into the data warehouse. As a rule, data warehouse data should be refreshed no more frequently than every 24 hours. By making sure that there is at least a 24-hour time delay in the loading of data, the data warehouse developer minimizes the temptation to turn the data warehouse into an operational environment. By strictly enforcing this time lag, the data warehouse serves the DSS needs of the company, not the operational needs. Most operational processing depends on data being accurate as of the moment of access (i.e., current-value data). By ensuring that there is a delay of at least 24 hours, the data warehouse developer adds an important ingredient that maximizes the chances for success.
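
The 24-hour rule can be pictured as a simple filter in the load step. The following is a minimal sketch in Python, not a method taken from the text: the function name `eligible_for_load`, the record layout, and the `updated_at` field are all hypothetical, and a real load process would read from operational sources rather than an in-memory list.

```python
from datetime import datetime, timedelta

# Assumption: the minimum lag is the 24-hour floor described in the text.
MIN_LAG = timedelta(hours=24)


def eligible_for_load(records, now, min_lag=MIN_LAG):
    """Return only the records that are at least `min_lag` old.

    `records` is a list of dicts with a hypothetical `updated_at`
    datetime field taken from the operational system.
    """
    cutoff = now - min_lag
    return [r for r in records if r["updated_at"] <= cutoff]


# Illustrative data only.
now = datetime(2024, 1, 2, 12, 0)
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},   # 27 hours old
    {"id": 2, "updated_at": datetime(2024, 1, 2, 8, 0)},   # 4 hours old
    {"id": 3, "updated_at": datetime(2024, 1, 1, 12, 0)},  # exactly 24 hours old
]

loaded = eligible_for_load(records, now)
loaded_ids = [r["id"] for r in loaded]
```

Here the record changed 4 hours ago is held back until a later load, so the warehouse never holds current-value data that might tempt anyone to use it for operational processing.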
