There is every reason to do capacity planning for the data warehouse, DSS environment. Data warehouses grow at a tremendous rate. Data warehouses cost money. Data warehouses operate on a variety of technologies. Data warehouses have performance problems just like other technological environments. Therefore it only makes sense to plan your data warehouse environment. In doing so, you place your corporation in a proactive position, not a reactive one.
Who Does Data Warehouse Capacity Planning?
But come on. How many corporations do capacity planning for the data warehouse, DSS environment? And of those few corporations that actually do capacity planning, how many have done capacity planning for the data warehouse environment well and effectively?
The answer is that there are at best a handful of companies that have done data warehouse, DSS capacity planning effectively.
Does that mean that data warehouse DSS capacity planning should not be done? Not at all. There is every good reason to do capacity planning for the data warehouse environment. It's just that capacity planning for the data warehouse, DSS environment is ... gulp ... hard to do. It is ... gulp, gulp ... "different" from anything that most capacity planners have ever done before. Therefore, to many practitioners, capacity planning for the data warehouse DSS environment is an oxymoron.
Three Approaches
This article will describe three approaches to doing capacity planning for the data warehouse DSS environment. After reading about the three approaches, you can decide why capacity planning for the data warehouse, DSS environment is so difficult. And if you are one of the hardy souls who likes to be proactive, you might even choose one of the approaches.
What Needs To Be Planned?
What is it that needs to be planned in the data warehouse DSS environment? While there are many facets to the data warehouse DSS environment, the two most important aspects of capacity planning are planning for storage and planning for processors.
It is noteworthy that there are plenty of other things that come with the territory:
- what dbms to use,
- what form of networking to be used,
- what end user access and analysis tools should be used,
- what kind of integration and transformation tools should be used, and so forth.
But the heart of the matter is how much storage and how many and what kind of processors should be used for the data warehouse.
Key Factors
The two key factors the capacity planner looks at are the amount of data there will be and the workload that will be run against the data warehouse.
Unfortunately, both of these factors in the data warehouse DSS environment are very difficult to ascertain.
The Analytical Approach
The first approach to capacity planning is the analytical approach. The analytical approach is one where the capacity planner attempts to calculate and/or predict capacity needs before the equipment is purchased. In the analytical approach the analyst attempts to quantify such things as how many tables there will be, how many rows each table will hold, how large each row will be, how much history will be kept, and how much summary data will accompany the detail.

Each of these interrelated questions must be answered in order for the analyst to determine how much data there will be in the warehouse. And if you have ever struggled through an exercise of trying to answer these questions accurately, you know that accuracy is very difficult to come by. In all honesty, a good guess is about the best that can be achieved.
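To see the arithmetic involved, consider a minimal back-of-the-envelope storage estimate. Every table name and figure below is a hypothetical assumption chosen for illustration, not a number from any real warehouse:

```python
# Back-of-the-envelope storage estimate for the analytical approach.
# All table names, row counts, and overhead factors are assumptions.

tables = {
    # table name: (rows added per month, bytes per row)
    "sales_detail":  (5_000_000, 200),
    "customer_snap": (1_000_000, 350),
    "shipment_fact": (2_000_000, 150),
}

months_of_history = 36     # how much history will be kept
index_overhead = 0.30      # indexes as a fraction of raw data
summary_overhead = 0.15    # summary tables as a fraction of detail

raw_bytes = sum(rows * width for rows, width in tables.values()) * months_of_history
total_bytes = raw_bytes * (1 + index_overhead + summary_overhead)

print(f"raw detail: {raw_bytes / 1e9:.1f} GB")                   # 59.4 GB
print(f"with indexes and summaries: {total_bytes / 1e9:.1f} GB") # 86.1 GB
```

The point of the sketch is not the answer but the fragility: every input is a guess, and the guesses multiply together.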
But volumes of data are only one aspect of capacity planning. The other side of capacity planning in the data warehouse DSS environment is that of workload projection. And if you thought trying to predict volumes of data was difficult, wait until you try to predict what the workload for the DSS environment is going to look like.
There are many factors that must be considered when trying to profile the data warehouse DSS workload. Some of the more interesting factors are:
- how many farmers will you have?
- how many explorers do you have?
- what does the average farmer query look like?
- what does the pattern of submission for the farmers look like?
- what does the explorer query look like?
- is there any pattern to the submission of analysis by the explorer community?
- has the explorer community ever been addressed or graced with an infrastructure before?
- are there predictable peaks and valleys of processing:
  - throughout the day?
  - throughout the month?
  - throughout the quarter?
- will there be an attempt to use a resource governor? And so on.
There are then many questions that need to be answered in order to portray the data warehouse DSS workload. As in the case of volumes of data, an accurate picture simply cannot be painted.
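To see why the workload side is so slippery, here is a toy daily workload model built from the kinds of factors listed above. Every count and per-query cost in it is an assumption that, in practice, would have to be guessed at:

```python
# Toy daily workload model for a data warehouse DSS environment.
# All user counts and per-query costs below are hypothetical assumptions.

farmers = 200                      # predictable, repetitive users
explorers = 10                     # unpredictable, heavy users

farmer_queries_per_day = 15        # small, well-understood queries
farmer_io_per_query = 5_000        # I/Os per farmer query (assumed)

explorer_queries_per_day = 3       # huge, one-of-a-kind queries
explorer_io_per_query = 2_000_000  # I/Os per explorer query (assumed)

farmer_load = farmers * farmer_queries_per_day * farmer_io_per_query
explorer_load = explorers * explorer_queries_per_day * explorer_io_per_query

print(f"farmer load:   {farmer_load:,} I/Os per day")    # 15,000,000
print(f"explorer load: {explorer_load:,} I/Os per day")  # 60,000,000
```

Under these made-up numbers, a handful of explorers generate four times the load of two hundred farmers, which is exactly why predicting the DSS workload from user counts alone is so unreliable.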
But perhaps the biggest enemy of the analytical approach is that of the attitude of the community of people using the data warehouse DSS environment. In most cases the data warehouse DSS environment is one of discovery. People simply don't know what is going to happen until they get there. People don't know what they will do until they know what the possibilities are. And where people really don't know what they will do, trying to look into a crystal ball and predict what will happen is black magic.
The Calibrated Extrapolation Approach
Which leads to the second approach to capacity planning: the calibrated extrapolation approach. In the calibrated extrapolation approach there is at best a rudimentary attempt at analytical capacity planning up front. But after the first or second iteration of the warehouse is created, and after the first few users have become enamored of the data warehouse, careful track is kept of the warehouse and its usage. Over calibrated periods of time the growth of the warehouse is tracked. Based on the incremental growth that is being measured, an extrapolation of future capacity needs is made. The extrapolation of capacity needs then becomes an educated guess. Of course the educated guess can be refined. The analyst can factor in known growth factors such as the addition of new subject areas, the addition of history, and the like. In doing so the analyst combines the best of the calibrated extrapolation approach and the analytical approach.
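The mechanics of calibrated extrapolation are simple. A minimal sketch, using invented month-end measurements and a plain average-growth projection:

```python
# Calibrated extrapolation: measure warehouse growth, then project forward.
# The monthly size measurements below are invented for illustration.

measured_gb = [120, 138, 155, 171, 190, 206]  # month-end sizes, in GB

# Average month-over-month growth across the calibration window.
deltas = [b - a for a, b in zip(measured_gb, measured_gb[1:])]
growth_per_month = sum(deltas) / len(deltas)  # 17.2 GB/month here

# Project a short horizon; beyond roughly six months the guess degrades fast.
for months_ahead in (3, 6):
    projected = measured_gb[-1] + growth_per_month * months_ahead
    print(f"{months_ahead} months out: ~{projected:.0f} GB")
```

Known one-time events, such as a new subject area or added history, can then be layered on top of the trend, which is the blend of extrapolation and analysis described above.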
But even when the calibrated extrapolation approach is used wisely and well, the calibrated extrapolation approach has only a short time horizon for effectiveness. In other words, trying to project outward into the long term future using the calibrated extrapolation approach is a dicey venture. Extrapolation can be done for three months or maybe even for six months. But anything beyond that is questionable.
The Copycat Approach
The third approach is the "copycat" approach. In the copycat approach you go and find some company with roughly the same characteristics as your company but where the company has advanced into data warehousing further than your company. In this case you simply ask what environment they are operating in and ask how things are going. The copycat approach is by far the easiest approach. When the copycat approach to capacity planning works well, nothing beats it.
But there are pitfalls with the copycat approach. The examined company may differ from yours in size, in industry, in the maturity of its data warehouse, and in the mix of farmers and explorers it serves. All of these factors mean that the comparison between your company and the examined company may produce very misleading results.
Enter The Vendor
There is of course a fourth alternative. That alternative is to let a hardware vendor come in and do capacity planning for you. This is surely the laziest way to go. But don't be surprised when the vendor discovers that the only way to meet your capacity needs is to buy the vendor's hardware. In short, the capacity planning done by hardware and dbms vendors is an exercise in subtle hard selling. You may actually get some useful capacity projections. You will certainly get a hard sell for the vendor's products.