The Search For Intelligent Life in the Cost-Based Optimizer
Tim Gorman, Evergreen Database Technologies, Inc.
IMPORTANT NOTE:
This paper was written based on observations and testing in Oracle8 version 8.0 and Oracle8i version 8.1.
The recommendations in this paper are ENTIRELY OBSOLETE for Oracle9i, Oracle10g, Oracle11g, and higher.
Please DO NOT use the concepts in this paper to modify the OPTIMIZER_INDEX_* parameters from Oracle9i and higher, but instead use the GATHER_SYSTEM_STATS procedure in the DBMS_STATS package.
I am leaving this paper visible on the internet only for historical interest, because I enjoyed researching and writing this paper 10 years ago, and so that people can see this disclaimer and explanation.
My sincere apologies for any harm that this out-of-date paper and its recommendations may have caused in the 6+ years since Oracle9i R1 was released in 2001.
-Tim Gorman, January 2009
rbo versus cbo
For almost a decade since the cost-based optimizer (CBO) was introduced, its existence has been distinguished primarily by frustration and disappointment. It debuted in Oracle7 version 7.0 with the promise of magically improving application performance, with the promise of transforming existing SQL statements into lightning, by making decisions that we ourselves would make.
Instead, the CBO seemed bent on self-destruction, choosing execution plans which could only be described in scatological terms. When, seemingly by chance, it did choose a sensible execution plan, it was not the end of the story. The CBO became infamous for flip-flopping unexpectedly away from a good plan to a bad plan without warning, purportedly as data conditions changed.
As a result, the majority of application developers struggle to retain the venerable rule-based optimizer (RBO), if only for its perceived predictability and stability. The only thing that the CBO seemed to have going for it was that it supported all of the nifty new features introduced by Oracle from version 7.1 onwards. It seemed that risking the use of the CBO was the price one paid in order to use new features. Other than that, conventional wisdom holds that you stick with the RBO unless you thrive on stress and regret.
Although the situation with the CBO quietly became drastically better in Oracle8 v8.0 and has become orders of magnitude better in Oracle8i, this widely held perception persists. In fact, one major new feature of Oracle8i, stored outlines, was touted by Oracle itself as optimizer plan stability. Unfortunately, what Oracle had intended that phrase to mean is different from the way the rest of the world interprets it. It was intended as a mechanism for augmenting the CBO in testing and migration situations. But everybody (at least everybody with whom I have spoken) seems to interpret this new feature as an enormous “band-aid” for the CBO, as a tacit admission by Oracle that the CBO simply cannot be relied upon, that it does not work. In a funny way, CBO hints on SQL statements are perceived similarly. In truth, neither of these is a tacit admission of any shortcoming, because the CBO actually works.
The truth is that, while these negative perceptions were true for Oracle7, in Oracle8 the CBO works just fine, most especially in Oracle8i. This is a rather bold statement to make, but please allow me the chance to convince you that I am neither a shill for Oracle Corporation nor (totally) insane. Any anomalous behavior you may have witnessed or heard about is more due to abuse (i.e., poor input or poor configuration) rather than use.
Cut to the chase…
OK, to save you having to read this whole paper, I’m going to cut to the chase and reveal the major recommendations, right here in the beginning…
Starting in Oracle8 v8.0, two new initialization parameters were made available, but not documented. They were eventually documented with the release of Oracle8i, in the Oracle8i Server Reference manual:
· OPTIMIZER_INDEX_CACHING
This initialization parameter represents a percentage value, ranging between the values of 0 and 99. The default value of 0 indicates to the CBO that 0% of database blocks accessed using indexed access can be expected to be found in the Buffer Cache of the Oracle SGA. This implies that all index accesses will require a physical read from the I/O subsystem for every logical read from the Buffer Cache, also known as a 0% hit ratio on the Buffer Cache. This parameter applies only to the CBO’s calculations of accesses for blocks in an index, not for the blocks in the table related to the index.
· OPTIMIZER_INDEX_COST_ADJ
This initialization parameter is also a percentage value, ranging between 1 and 10000, representing a comparison between the relative cost of physical I/O requests for indexed access and full table-scans. The default value of 100 indicates to the cost-based optimizer that indexed access is 100% as costly (i.e., equally costly) as FULL table scan access.
As it turns out, the default values for each of these parameters are wildly unsuitable and unrealistic. I’ll prove this assertion later in this paper, but for now, suffice it to say that OPTIMIZER_INDEX_CACHING should be set to 90 and OPTIMIZER_INDEX_COST_ADJ should be set to a value which usually ranges between 10 and 50 for most online transaction processing (OLTP) systems. For data warehousing or other decision-support systems (DSS), it might be prudent to simply set this parameter to 50. A good way of calculating this with some accuracy is included at the end of the paper…
So, that’s it!
Test it out. Take a formerly incorrigible SQL statement, get rid of all the hints that have been painstakingly plastered all over it, set these two parameters using ALTER SESSION SET, and do an EXPLAIN PLAN.
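A quick test session might look something like the following sketch. The statement shown is only a placeholder for your own problem query, the value of 25 for OPTIMIZER_INDEX_COST_ADJ is merely an illustrative OLTP-range figure, and a PLAN_TABLE created with utlxplan.sql is assumed to exist:

ALTER SESSION SET OPTIMIZER_INDEX_CACHING = 90;
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 25;

DELETE FROM PLAN_TABLE;
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM some_table WHERE some_column = 'SOME VALUE';

SELECT LPAD(' ', 2*LEVEL) || OPERATION || ' ' || OPTIONS || ' ' || OBJECT_NAME AS query_plan
FROM PLAN_TABLE
START WITH ID = 0
CONNECT BY PRIOR ID = PARENT_ID;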
…the rest of the story…
Still here?
Don’t you hate it when someone makes a wild assertion with absolutely no substantiation? That is the way with most recommendations, isn’t it? “Tips and techniques” are the junk food of professional education, so it is important to chew on something more substantial instead. So OK, get ready to chow…
First, I’d like to explain the story from the beginning, starting with the RBO and how it works, and then progress to the CBO and explain (and contrast) how it works. I want to show how the two parameters mentioned above have a dramatic effect on the workings of the CBO. Added to Oracle8 v8.0 as an undocumented afterthought, these parameters have an impact on the costing calculations of the CBO in much the same way that deductions against your gross income have an impact on the net income tax owed.
How the rule-based optimizer works
But first, let’s start at the beginning…
The rule-based optimizer (RBO) has only a small amount of information to use in deciding upon an execution plan for a SQL statement:
· The text of the SQL statement itself
· Basic information about the objects in the SQL statement, such as the tables, clusters, and views in the FROM clause and the data type of the columns referenced in the other clauses
· Basic information about indexes associated with the tables referenced by the SQL statement
· Data dictionary information is only available for the local database. If you’re referencing a remote database, the remote dictionary information is not available to the RBO…
In order to determine the execution plan, the RBO first examines the WHERE clause of the statement, separating the predicates from one another for evaluation, starting from the bottom of the statement. It assigns a score to each predicate, using the following fifteen access methods, ranked by their alleged merit:
1. Single row by ROWID
2. Single row by cluster join
3. Single row by hash cluster key with unique key
4. Single row by unique index
5. Cluster join
6. Hash cluster key
7. Indexed cluster key
8. Composite key
9. Single-column non-unique index
10. Bounded range search on indexed columns
11. Unbounded range search on indexed columns
12. Sort-merge join
13. MAX or MIN of indexed column
14. ORDER BY on indexed columns
15. Full table-scan
Any person who has spent any time tuning SQL statements knows that full table-scans are not uniformly evil, as the ranking of 15 out of 15 would suggest. There are several situations where a full table-scan is far superior to indexed access. The simple presence of an index does not uniformly mean that it is the best access method, although that is what is coded into the RBO. So, right away, experience tells us that a system of simple linear rankings is bound to cause problems.
Also, please note the heavy bias toward the use of table clusters, both indexed and hashed. In fact, of the first seven rules in the list, five involve clusters. In over 10 years of application development in Oracle, I personally have taken advantage of table clusters only two or three times, and only once as part of an actual production application. Each time they were considered, their disadvantages outweighed their advantages. How many times have you ever used clusters in your career?
So, right away, only ten of the total of fifteen rules are relevant for normal applications. Ten. That is not a very large palette of choices for something as complex as SQL execution in Oracle8.
To illustrate how absurd the RBO can be, let’s take an example, such as the following SQL statement:
SELECT COUNT(*)
FROM A, B, C
WHERE A.STATUS = B.STATUS
AND A.B_ID = B.ID
AND B.STATUS = 'OPEN'
AND B.ID = C.B_ID
AND C.STATUS = 'OPEN';
This is a simple three-table join. Some things to note:
· There is a UNIQUE index on the column ID in the table B (a primary key)
· There are NONUNIQUE indices on the columns STATUS on tables A and B, but not on C.
· There is a NONUNIQUE index on the column B_ID on table C, not (as one might expect) on the same column on table A. This is an intentional oversight to illustrate a point…
· The table B (the parent entity of the three) has been populated with 100 rows, while the two child entities (tables A and C) have been populated with 1,000 rows apiece, 10 rows apiece for each row in table B.
· All of the STATUS columns are populated with a single value of ‘OPEN’
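If you would like to reproduce the experiment, the following sketch builds a schema matching the conditions just described. The data types and the use of ALL_OBJECTS as a convenient row source are my assumptions; the table names, indexes, and row counts come straight from the bullets above:

CREATE TABLE b (id NUMBER, status VARCHAR2(10));
CREATE TABLE a (b_id NUMBER, status VARCHAR2(10));
CREATE TABLE c (b_id NUMBER, status VARCHAR2(10));

CREATE UNIQUE INDEX b_pk ON b (id);      -- the primary key on B.ID
CREATE INDEX a_status ON a (status);     -- NONUNIQUE index on A.STATUS
CREATE INDEX b_status ON b (status);     -- NONUNIQUE index on B.STATUS
CREATE INDEX c_b_id ON c (b_id);         -- NONUNIQUE index on C.B_ID (deliberately none on A.B_ID)

REM 100 parent rows in B, 1,000 child rows apiece in A and C, all with STATUS = 'OPEN'
INSERT INTO b SELECT ROWNUM, 'OPEN' FROM all_objects WHERE ROWNUM <= 100;
INSERT INTO a SELECT MOD(ROWNUM, 100) + 1, 'OPEN' FROM all_objects WHERE ROWNUM <= 1000;
INSERT INTO c SELECT MOD(ROWNUM, 100) + 1, 'OPEN' FROM all_objects WHERE ROWNUM <= 1000;
COMMIT;

REM for the CBO comparison later in the paper, statistics would also need to be gathered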
Looking at the SQL statement as the RBO would, we see the WHERE clause has three join predicates (the first, second, and fourth lines) and two filtering predicates (the third and fifth lines). Thinking like the RBO, in order to determine how to devise an execution plan for the query, we will assign scores for each predicate, starting with the last line and working up toward the first. So, using the list of 15 rules above, the RBO will assign scores to each predicate:
1. WHERE A.STATUS = B.STATUS (FULL table-scan on either A or B; score = 15)
2. AND A.B_ID = B.ID (FULL table-scan on either A or B; score = 15)
3. AND B.STATUS = ‘OPEN’ (NONUNIQUE index on B.STATUS; score = 9)
4. AND B.ID = C.B_ID (FULL table-scan on either B or C; score = 15)
5. AND C.STATUS = ‘OPEN’; (no index on C.STATUS, so FULL table-scan; score = 15)
One of the filtering predicates (line #5) scored 15 – the score for a FULL table-scan – because there is simply no index on the STATUS column on the C table.
The three join predicates (lines #1, #2, and #4) had to score as FULL table-scans as well, even though indexes exist on most of the columns mentioned in those predicates. The reason for this dismal score of 15 is that neither side of the equation is a known value. Rather, both sides are unknown values. Therefore, the only possible way to access the row-sources using these predicates is a FULL table-scan.
This left the one remaining filtering predicate (line #3) with a score of 9, due to its NONUNIQUE index. So, as a result, the query will start with an indexed RANGE scan on the B table, using the index on the STATUS column.
…so far, so good. Let’s continue…
Having made the initial decision to make B the driving table of the query, the RBO now knows that it must join either from B to A or from B to C. To determine which table to join next, it once again scans the predicates in the WHERE clause from last to first, considering each predicate given a starting point of the B row-source:
1. WHERE A.STATUS = B.STATUS (NONUNIQUE index on A.STATUS; score = 9)
2. AND A.B_ID = B.ID (FULL table scan, no index; score = 15)
4. AND B.ID = C.B_ID (NONUNIQUE index on C.B_ID; score = 9)
5. AND C.STATUS = ‘OPEN’; (no index, so FULL table-scan; score = 15)
Line #3 has already been used, so it was removed from consideration in this second pass.
It seems that we have a two-way tie this time, since the predicates on lines #1 and #4 each achieved a score of 9 for their NONUNIQUE indices. How to break this tie?
Well, the RBO only has rudimentary information to work with. It knows that each of these columns has a NONUNIQUE index, but nothing else. So, to break the tie, the RBO can use only the minimal information stored in the data dictionary, or clues embedded in the SQL statement itself.
When the RBO has to break a tie between two different row-sources, it scans the list of row-sources in the FROM clause of the SQL statement, moving from right to left, from the end of the list to the beginning. In this case, we are choosing between A and C. Table C is encountered first when reading the FROM clause from right to left, so that becomes the next row-source in the query, joined via the NONUNIQUE index on C.B_ID. There is a filtering predicate (line #5) as well, so as rows are retrieved from C they will also be filtered (restricted) against the STATUS column. This doesn’t affect how the RBO does the scoring, but the predicate gets used this way anyway…
Now, we consider the remaining predicates in the WHERE clause, this time attempting to decide how to join to the A row-source from either table B or C:
1. WHERE A.STATUS = B.STATUS (NONUNIQUE index on A.STATUS; score = 9)
2. AND A.B_ID = B.ID (FULL table scan, no index; score = 15)
There are only joins from table B to table A, and of the two, there is an indexed join (score = 9 for NONUNIQUE index) on line #1. So, that’s how we do the join from table B to table A, using the index on STATUS, and filtering using the values from B.ID against the values in A.B_ID.
Obviously, any application developer will comment that there SHOULD have been an index supporting the foreign-key on the B_ID column on table A. If there had been, what would have happened?
We would have had another tie, but this time between predicates on the same row-source. Thus, we could not have broken this tie using the ordering of row-sources in the FROM clause. Since both predicates would have scored the same due to a NONUNIQUE index (i.e., score = 9), the RBO would have had to choose the index with the highest OBJECT_ID value in the DBA_OBJECTS view. Essentially, this method of breaking the tie would favor indexes that were simply newer. Not completely irrational, but there are better criteria by which to choose an index, certainly.
NOTE:
If you stop to think about it, what other possible criteria does the RBO have, to make this choice?
FURTHER NOTE:
Even more disturbing, think about how often this kind of tie-breaking happens. After all, the RBO effectively has only ten basic access methods from which to choose!
So, in the end, our query performs an indexed RANGE scan on the B.STATUS column, seeking rows on B with a value of “OPEN”. As every row has a value of “OPEN”, this means we will scan the entire table using this index.
Then, the query will join to the C table using another indexed RANGE scan on the B_ID column, filtering the results against the STATUS column, again seeking rows with a value of “OPEN”. Last, the query will join from the B table to the A table, using the index on the STATUS column and filtering the data values in the A.B_ID column against the values provided by B.ID. Remember that every row has been populated with STATUS = “OPEN”, so this is a plan that really doesn’t work very well. Think about all the rows that are retrieved because they have STATUS = “OPEN” but then discarded because A.B_ID does not match B.ID. Ouch!
As proof, in actual execution this SQL statement ran in 21.98 seconds on a v8.1.7 database, using a two CPU SPARC machine running Solaris 2.8. According to the AUTOTRACE utility in SQL*Plus, it consumed 1,692,075 logical reads (and 0 physical reads) to count 10,000 rows.
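If you would like to see the same kind of figures for yourself, the AUTOTRACE facility in SQL*Plus reports them; a minimal sketch (assuming the PLUSTRACE role and a PLAN_TABLE are already in place):

SET TIMING ON
SET AUTOTRACE TRACEONLY      -- execute the statement, suppress its output, report the plan and statistics
SELECT COUNT(*)
FROM A, B, C
WHERE A.STATUS = B.STATUS
AND A.B_ID = B.ID
AND B.STATUS = 'OPEN'
AND B.ID = C.B_ID
AND C.STATUS = 'OPEN';
SET AUTOTRACE OFF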
Some observations illustrated here:
· The RBO only considers a small number of possible access methods. If Oracle had chosen to continue to develop the RBO as more access methods became available, how would they have been ranked? Would bitmapped indexes have been considered very desirable in all cases? How would HASH joins have been ranked, especially since they are frequently performed using FULL table-scans? If you stop to think about it, the idea of a single-dimensional ranking system for access methods starts to become completely absurd…
· The presence of an index is automatically good to the RBO, even though we know that it isn’t…
· The RBO uses nonsensical tricks to break frequently occurring ties. Why should the ordering of the FROM clause have anything to do with this? Even more bizarre, why should the age of an index have anything to do with anything? How many database administrators rebuild indexes periodically? Many people have complained about the instability of the cost-based optimizer; how many times have RBO plans changed after a round of well-meaning index rebuilds?
· Please note that this query had a 100% Buffer Cache hit ratio (BCHR), meaning that no physical I/O resulted from any of the logical I/Os. This should clearly illustrate the point that the BCHR is almost meaningless as a measure of performance optimization. Reducing the amount of work performed (that is, reducing the number of logical I/Os) is a better goal than disguising too much work by keeping it memory-bound.
To paraphrase a well-known saying, “this is no way to run a business”. The RBO was a stopgap measure that was bound to become more and more absurd the longer it continued to be enhanced. Oracle abandoned the RBO for good reason: because it had to. An optimizer that uses more information about the attributes and organization of data is the only feasible long-term direction.
Unfortunately, it took almost 7 long years after its introduction before the cost-based optimizer could properly be said to be ready for production systems.
How the Cost-Based Optimizer works
The CBO is a mathematical processor. It uses formulas to calculate the cost of a SQL statement. Cost essentially translates to physical I/O, those logical I/Os that result in calls to the I/O sub-system, instead of being handled entirely in the Buffer Cache in the Oracle SGA.
Just for the sake of interest, the exact same SQL statement used as an example above, run against the exact same v8.1.7 database using the CBO (instead of the RBO), ran in 0.58 seconds (instead of 21.98 seconds) and consumed only 10 logical reads (instead of 1.6 million logical reads). This difference is entirely due to the way the CBO works. It examines all possible access methods, using the data dictionary information available to the RBO plus information about the data (i.e., number of rows, size of tables and indexes, distribution of data in columns, etc). It then calculates the number of logical I/Os for every possible access method, and simply chooses the one with the lowest cost.
From the earlier example illustrating the RBO, please note clearly that the RBO makes linear, step-by-step decisions. It makes a decision, and then uses that decision as the basis for considering the next decision. As it progresses from each step to another, it considers an ever-narrowing set of possibilities, dependent on earlier decisions. Thus, the quality of initial decisions determines the quality of subsequent choices.
playing chess
Consider the viability of playing a game of chess in this manner. As a matter of fact, this is how I play chess – like the RBO. I am only capable of considering the very next move, and am incapable of visualizing two, or three, or ten moves into the future for one possible permutation, let alone multiple permutations. I don’t win very often.
Unlike the RBO, the CBO determines the cost of all possible execution plans. Like the RBO, it does work in steps, starting with all possible table accesses, and then calculating subsequent joins or table accesses from that starting point. Each step into the execution is recursively calculated this way. It tries all possible permutations of execution plan, up to the limit imposed by the initialization parameter, OPTIMIZER_MAX_PERMUTATIONS, which defaults to a value of 80,000. That’s a lot of permutations to consider, and the CBO will often consider hundreds or thousands of such permutations in sub-second time.
NOTE:
If you doubt this, there is a trace event for “CBO trace dump” (event 10053) documented on my website (http://www.EvDBT.com/library.htm), at the bottom of that web-page. Using this event, you can dump all of the considerations made by the CBO to a “.trc” trace file in the USER_DUMP_DEST directory on the database server.
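A minimal sketch of using the event for a single session follows; EXPLAIN PLAN is just one convenient way to force the parse that produces the dump, and the resulting “.trc” file lands in USER_DUMP_DEST:

ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM A, B, C
WHERE A.STATUS = B.STATUS AND A.B_ID = B.ID AND B.STATUS = 'OPEN'
AND B.ID = C.B_ID AND C.STATUS = 'OPEN';
ALTER SESSION SET EVENTS '10053 trace name context off';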
Again, consider the game of chess. The CBO is not unlike a grandmaster capable of considering all potential moves, along with subsequent moves, deciding on the next move that offers the greatest possible advantage.
What can go wrong?
So it is with the CBO. It calculates the cost of all possible permutations of execution for a SQL statement and simply chooses the lowest one. This is all very straightforward and logical. What could possibly go wrong?
At least two things can go wrong with a mathematical processor:
· it could receive bad data as input (i.e., the old garbage-in, garbage-out problem)
· one or more formulas may fail to include important factors or otherwise be in error
Throughout the lifecycle of Oracle7, the problem was both. The ANALYZE command in Oracle7 was fraught with bugs, producing poor statistics for the CBO as input. In Oracle8, many of the bugs in the ANALYZE command have been fixed, resolving the problems with bad input. In Oracle8i, the DBMS_STATS package provides even better statistics than the ANALYZE command, as attested by several “bugs” logged into the MetaLink support system (http://metalink.oracle.com/).
Using DBMS_STATS in conjunction with the new Oracle8i MONITORING feature on tables makes the task of gathering valid and useful CBO statistics reliable and easy. So, that problem has been dealt with.
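As a rough sketch of what that combination might look like (the schema name SCOTT, the 10% sample size, and the GATHER STALE option are illustrative assumptions; GATHER STALE relies on the MONITORING attribute having been set on the tables):

ALTER TABLE scott.b MONITORING;      -- let Oracle track how much DML has hit the table

BEGIN
   DBMS_STATS.GATHER_SCHEMA_STATS(
      ownname          => 'SCOTT',
      options          => 'GATHER STALE',   -- re-gather only where MONITORING says the data has changed
      estimate_percent => 10,
      cascade          => TRUE);            -- gather statistics on the indexes as well
END;
/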
What remains are some subtle corrections to the formulas that the CBO uses to calculate cost. Let’s examine this…
What does “cost” mean?
The cost calculated by the CBO is primarily comprised of physical I/O. The actual formula is documented as
IO + CPU/1000 + NetIO*1.5
In this formula, “IO” represents physical I/O requests, “CPU” represents logical I/O requests, and “NetIO” represents logical I/O requests to/from a remote database across a database link.
This implies that physical I/O requests are the most expensive component, so they make up the majority of the costing. Distributed database operations are also represented as very expensive, but for the scope of this paper we’ll just concentrate on operations within a single local database, if you don’t mind…
What is the distinction between logical I/O and physical I/O? The former are requests for database blocks cached in the Buffer Cache of the Oracle SGA. The latter are requests from Oracle to the underlying I/O subsystem, for blocks that are not cached in the Buffer Cache. The CBO will try to calculate the number of physical I/O requests for all possible execution plans for a SQL statement, and reject all but the plan with the lowest number.
To understand how complicated this can be, let’s first do a quick review of I/O in Oracle…
A review of the rules of I/O in Oracle
Oracle operations do not access rows; they access database blocks that contain rows. Moreover, Oracle does not attempt to find blocks on disk; it expects to find them cached in shared memory, in the Buffer Cache of the System Global Area (SGA). Reads against block buffers in the Buffer Cache are known as logical reads. To be even more specific, there are two major types of logical reads: consistent gets and db block gets (a.k.a. current gets). Consistent gets are blocks retrieved as of a specific version or point-in-time, usually requested by SELECT statements. On the other hand, db block gets or current gets are blocks retrieved at the most up-to-date or current point-in-time, usually retrieved for the purpose of making a modification, such as an INSERT, UPDATE, or DELETE statement. Differentiating between the two types of logical reads has no further bearing on the rest of this paper; it is presented here only as a matter of passing interest. But if you go looking for statistics in V$SYSSTAT or V$SESSTAT, you won’t find a statistic named “logical reads”; instead you’ll find “consistent gets” and “db block gets”. Just add the two together.
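For example, the two components can be seen (and summed) at the instance level with a quick query:

SELECT NAME, VALUE
FROM V$SYSSTAT
WHERE NAME IN ('consistent gets', 'db block gets');

Adding the two VALUE figures together gives the total number of logical reads since instance startup.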
Obviously, if the real database resides on hard disk and not in shared memory, then those buffers in the Buffer Cache had to get populated somehow! In general, this happens when an Oracle server process checks the Buffer Cache for the presence of a specific database block (a.k.a. a logical read). If the database block is in the buffer, then all is well and good; the process continues its work. But if the database block does not reside in any buffer in the Buffer Cache, then the server process that has performed the logical read will itself make a request to the I/O sub-system (a.k.a. a physical read) in order to populate the buffers. Then, the server process continues on its merry way.
What happens if a server process performs a logical read for a block that is not resident in the Buffer Cache, finds that it is not present, performs a physical read, and another server process attempts a logical read before the first process completes its physical read? Will both processes issue physical reads? The answer is “no”. The first process, upon finding that the database block required is not resident in the Buffer Cache, will allocate a buffer for the block before actually performing the physical read. Then, it will “lock” that buffer and issue the physical I/O request. If the second process attempts a logical read while the buffer is locked, then the second process will post a buffer busy wait wait-event and simply wait for the block to become available. After all, no sense causing more I/O than is necessary…
Working on the “cache buffers chain” gang…
The Buffer Cache is a cache full of buffers and is managed by a least recently used (LRU) algorithm. Using this algorithm, buffers that are accessed most frequently are placed at the most recently used (MRU) end of a linked-list. Newly read blocks are read into the cache and placed at the MRU end of the list. As logical reads are performed upon buffers, they are moved back to the MRU end of the list. If a block is not accessed, it will gradually slide further and further toward the LRU end of the list. Eventually, when it reaches the end of the list, a newly read block will overwrite its buffer. Another way of thinking about it is to say that the block will fall off the LRU end of the list.
NOTE:
The rules of this LRU algorithm have changed dramatically for Oracle8i, where a new mid-point insertion algorithm is used to try to minimize latch manipulation while attaching and detaching blocks from the list, especially when moving “hot” blocks to the MRU end of the list. I don’t have any more information on this; I’ll leave details for a subsequent update of this paper…
Two types of I/O
Oracle has two types of read I/O on datafiles:
· Single-block, random-access reads, usually pertaining to indexed access (wait-event db file sequential read is posted during this I/O request)
· Multi-block, sequential-access reads, usually pertaining to FULL table scan access (wait-event db file scattered read is posted during this I/O request), but might also pertain to sorting and parallel queries (wait-event direct path read is posted during these I/O requests)
Now, let’s begin at the beginning, as they say…
Simplistic “cached I/O” algorithm and the resulting “flush” effect
The rules for both types of I/O can be quite simple. When a process wants a database block, it first checks the Buffer Cache. If the block is present in the Buffer Cache, then it is used and “moved” to the MRU (a.k.a. most recently used) end of the LRU (a.k.a. least recently used) chain. If it is not in the Buffer Cache, then it is read from datafiles on disk, placed into a buffer in the Buffer Cache, and “placed” at the MRU end of the LRU chain.
But there is a grave problem with this simple formula. Using this formula, what would happen if one were to scan the entire contents – every single row from every single database block – of a table that is larger than the entire Buffer Cache? In other words, perform a FULL table scan against a large table?
What would inevitably happen is that each newly read database block would be placed at the MRU end of the LRU chain, while blocks read a while ago would inexorably be pushed toward the LRU end of the LRU chain, eventually to be “pushed out” of the Buffer Cache and overwritten. Like a steamroller, a FULL table scan would completely flush the contents of the Buffer Cache.
Cached I/O in Oracle
So, in Oracle7 version 7.0, a subtle but staggering series of changes was introduced to prevent this cache-flushing effect. Actually, the changes about to be described were first implemented in Oracle version 6.0, but for the sake of storytelling, let’s start with Oracle7 version 7.0…
First of all, multi-block sequential I/O, the hallmark of FULL table scans, would be handled differently with regards to the LRU algorithm. When blocks were read into the Buffer Cache during a FULL table scan, they would be placed at the LRU (not the MRU) end of the LRU chain. Single-block reads for indexed scans were not changed; they continued to be placed at the MRU end of the LRU chain. But blocks from FULL table scans, unless they were accessed again almost immediately (thereby moving them to the MRU end of the chain), would be overwritten almost immediately.
And why not? When you think about it, logical reads during a FULL table scan only rarely encounter the blocks they want already in the Buffer Cache, simply due to sheer volume. Therefore, FULL table scans are always characterized by high numbers of physical reads. But how often are the blocks read by FULL table scans ever re-accessed? The simple answer is rarely, in most cases. So, playing the odds, the designers of Oracle7 acknowledged this fact by allowing blocks read during FULL table scans to be overwritten almost immediately, while simultaneously preserving the effectiveness of the Buffer Cache as a real cache.
However, the Oracle7 designers realized that there were some situations where you wanted to hold onto the blocks read during a FULL table scan: when reading very small tables, which were often heavily accessed. Another feature of very small tables is that they would never end up flushing the contents of the Buffer Cache! So, blocks read during FULL table scans of very small tables were placed on the MRU end of the LRU chain, as an exception. An initialization parameter, SMALL_TABLE_THRESHOLD, with a default value of 2% of DB_BLOCK_BUFFERS, was created to define the meaning of very small table. SMALL_TABLE_THRESHOLD was expressed as a number of blocks; any table with fewer blocks was considered very small.
NOTE: This algorithm also pertained to Oracle release 6.0, where the SMALL_TABLE_THRESHOLD was hard-coded into the kernel (not externalized as a parameter) with the value of 4 blocks. It did not become a “visible” parameter until Oracle7.
So, they thought of everything, didn’t they? Such clever people! However, these clever designers of Oracle7 neglected to consider human nature…
Application developers and database administrators have always been enamored of the idea of pinning entire tables into the Buffer Cache. This is popularly thought to be a good practice, though if you stop and think about it, it is simply impossible in most cases. After all, by definition a cache is a relatively small subset of the greater database. Even with the acres of RAM available to servers today, it is still only a few gigabytes of space. Tables much larger than this are as common as earthworms in a garden in most database applications. But, humans are an optimistic sort, so while whistling a snappy tune we do our best to pin those tables into memory, convinced that this will somehow optimize performance.
Given the new rules of Oracle7 release 7.0, database administrators immediately tended to reset the parameter SMALL_TABLE_THRESHOLD to an absurdly high value, seeking that elusive pinned table, avoiding disk-based I/O altogether. At the same time, they undid all of the clever work wrought by the Oracle7 designers, and we were thrown into the exact situation (cache flushing) which we were trying to avoid, with FULL table scans busily flushing the Buffer Cache down the proverbial creek, benefiting no one.
I/O in Oracle7 version 7.1 and beyond
OK, along came Oracle7 version 7.1…
This time around, the Oracle7 designers showed even more clearly what a clever lot they were. First, they took the ill-treated parameter SMALL_TABLE_THRESHOLD and made it “undocumented” by renaming it “_SMALL_TABLE_THRESHOLD”. This had a similar effect to putting a skull-and-crossbones on the parameter, effectively saying “Change me at your peril”. Good idea!
However, they recognized that developers desired to pin certain tables into the Buffer Cache. So, they created the CACHE and NOCACHE keywords for the CREATE TABLE and ALTER TABLE commands. With NOCACHE as the default setting, a developer could mark a table as CACHE. What this meant is that, like very small tables, tables set for CACHE would also be placed at the MRU end of the LRU chain, even during a FULL table scan. Their NOCACHE brethren continue to be placed at the LRU end.
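For example (the table name is hypothetical):

ALTER TABLE state_codes CACHE;      -- blocks read by FULL scans of this table now go to the MRU end
ALTER TABLE state_codes NOCACHE;    -- revert to the default LRU-end behavior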
However, knowing human nature, and remembering the popular but misguided concept that pinning tables into memory would somehow make every application scream, what do you think would happen? Of course! Everybody would mark all of their tables as CACHE!
Having learned a thing or two from the SMALL_TABLE_THRESHOLD situation, the Oracle7 designers added a governor to help prevent this very thing from happening. They created another parameter, CACHE_SIZE_THRESHOLD, which designated a maximum size for tables to be set to CACHE. If a table was marked CACHE but it was larger than the number of blocks indicated by CACHE_SIZE_THRESHOLD, Oracle would treat the table as NOCACHE, effectively ignoring the setting to CACHE. The parameter defaulted to 10% of the total size of the Buffer Cache, but it could be raised or lowered by the database administrators (or by developers who had hacked the DBA’s passwords) as needed.
So, these are the rules that have held for the past ten years, since Oracle7 version 7.1. Indexed scans (using single-block I/O) are always placed at the MRU end of the LRU chain of the Buffer Cache, and subsequently tend to stay in the Buffer Cache for a long period of time. The very tree-structure of B*-Tree indexes tends to enhance the reuse of these buffers, constantly moving them back to the MRU end as well. So, indexes tend to cache very well.
FULL table scans, on the other hand, will be read from disk into buffers in the Buffer Cache which will be placed immediately at the LRU end of the LRU chain. Unless they are small tables, in which case they will be read into buffers placed at the MRU end. Another exception is tables that are marked CACHE and are smaller than CACHE_SIZE_THRESHOLD, which will also go to the MRU end.
summary of the characteristics of I/O in Oracle
In summary, FULL table scans are generally noted by the fact that almost every logical read results in a physical read, and this is completely normal and beneficial. FULL table scans usually have a Buffer Cache hit ratio of 20% or less (often as low as 1% or 5%), meaning that all but a very small number of logical reads result in corresponding physical reads. Indexed scans, on the other hand, are generally characterized by very high Buffer Cache hit ratios of 95% or higher.
Calculating “logical I/O” as opposed to “physical I/O”
So, the cost-based optimizer is supposed to calculate cost, which roughly equates to physical I/O. That’s not as easy as it sounds…
Calculating logical reads is much easier, and that is in fact what the CBO does. Given the information in the data dictionary about the size of tables and indexes, the number of rows in tables, the number of distinct keys in indexes, the number of levels in a B*-Tree index, the average number of blocks in a table for a distinct data value, the average number of index leaf nodes for a distinct data value. Given knowledge about how the four methods of joining tables operate. Given information from histograms in the data dictionary about the selectivity (or lack thereof) of column values. Given knowledge about how different types of B*-Tree and bitmapped index operations work. Given all of these things, the formulas embedded into the CBO (which Oracle will not share with anybody) can quickly and accurately determine how many logical reads should be expected for each of the handful, dozens, hundreds, or thousands of possible execution plans for any given SQL statement. The CBO will calculate logical reads for all of these execution plans, attempt to convert these to physical reads (a.k.a. cost), and choose the lowest resulting value.
But how does the CBO make that jump from total logical reads to total physical reads?
For FULL table scans, this is relatively easy! It takes the total number of logical reads (i.e., the number of blocks in the table), divides by the value of the parameter DB_FILE_MULTIBLOCK_READ_COUNT, and hey presto we have physical reads. Since FULL table scans are distinguished by a very low (almost non-existent) Buffer Cache hit-ratio, this is mostly a valid formula.
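For example (purely illustrative figures): a FULL scan of a 10,000-block table with DB_FILE_MULTIBLOCK_READ_COUNT set to 8 would be costed at roughly 10,000 / 8 = 1,250 physical read requests.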
For indexed scans, which are always more highly cacheable, this formula needs some adjustment…
Indexed scans are generally single-block reads (UNIQUE, RANGE, and FULL indexed scans use single-block reads, though FAST FULL indexed scans use multi-block reads just like FULL table scans), so there is usually no adjustment to perform using DB_FILE_MULTIBLOCK_READ_COUNT here. Instead, starting in Oracle8 release 8.0, a new parameter named OPTIMIZER_INDEX_CACHING was introduced to help with this adjustment from logical to physical I/O, as follows:
CALCULATED-LOGICAL-READS * (1 – (OPTIMIZER_INDEX_CACHING / 100)) = CALCULATED-PHYSICAL-READS = COST
The parameter, however, has a lousy default setting of zero. Plug the value of “0” into the formula, and logical reads translate one-for-one into physical reads, just like FULL table scans. The default value negates this formula. This is simply not the way it really is!
I like to set OPTIMIZER_INDEX_CACHING to 90, meaning that 90% of all logical reads resulting from indexed scans can be expected to be found in the Buffer Cache. More likely, for most applications, this percentage is probably closer to 95%, or 98%, or even 100%. But 90% is a good, conservative value to use. Start with that and you should notice a dramatic difference in the choices made by the CBO.
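To make the arithmetic concrete (illustrative figures only): an indexed access path that the CBO reckons at 1,000 logical reads is costed at 1,000 * (1 – (0 / 100)) = 1,000 with the default setting, but at 1,000 * (1 – (90 / 100)) = 100 with OPTIMIZER_INDEX_CACHING = 90, a ten-fold reduction that makes indexed plans far more competitive.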
Don’t take my word for it. Try it out! Verify these assertions empirically, for your own self.
You can change this parameter using ALTER SESSION SET OPTIMIZER_INDEX_CACHING = 90. The range of values is 0 (the default) to 99. Start with a value of 90, and then try different values. I tend to see crazy behavior when I set it to 99; try it out. Run this command and perform an EXPLAIN PLAN on an especially troublesome query, after removing the hints you’ve painstakingly added. Odds are that the CBO will choose the right plan. If you find this happening often enough under testing conditions, perhaps you’ll want to set the initialization parameter in the “init.ora” file, and see what happens to overall system performance when the setting is global.
It’s like a whole new attitude for the surly old CBO…
Adjusting for different types of I/O
But wait! That’s not all!
The CBO also wants to know the relative cost of indexed access in comparison to FULL table scans. The parameter OPTIMIZER_INDEX_COST_ADJ represents this comparison. The default value of 100 implies to the CBO that indexed access takes roughly the same amount of elapsed time as FULL table scan access. The CBO uses the value of this parameter to perform yet another refinement of calculated cost for indexed scans:
PREVIOUS-COST * (OPTIMIZER_INDEX_COST_ADJ / 100) = FINAL-COST
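Continuing the illustrative arithmetic from above: with OPTIMIZER_INDEX_COST_ADJ set to, say, 15 (a value derived from real timings in the next section), the 100 physical reads calculated for the indexed path become a final cost of 100 * (15 / 100) = 15.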
As you can plainly see, the default value negates this formula, just as with the OPTIMIZER_INDEX_CACHING default value.
Luckily, a valid value for OPTIMIZER_INDEX_COST_ADJ can easily be retrieved from the Oracle database itself. The answer lies in the V$SYSTEM_EVENT view, in the column AVERAGE_WAIT. This is yet another example of the incalculable value of setting TIMED_STATISTICS to TRUE, so that the wait-event views (of which V$SYSTEM_EVENT is one) are populated with timing information.
After the database has been up and running normally for a sufficient period of time (i.e., a couple hours or more), perform the following query:
SELECT EVENT,
       AVERAGE_WAIT
FROM V$SYSTEM_EVENT
WHERE EVENT LIKE 'db file s%';
This query will retrieve information on the two I/O wait events db file scattered read (a.k.a. FULL table scans) and db file sequential read (a.k.a. indexed scans). The AVERAGE_WAIT column contains the average timing, in 1/100ths of a second, of these events:
EVENT                     AVERAGE_WAIT
========================= ==============
db file sequential read       .33178629
db file scattered read         2.190087
In this example, each indexed scan I/O request takes only about 15% as long as each FULL table-scan I/O request (.33178629 / 2.190087 ≈ 0.15). So, set OPTIMIZER_INDEX_COST_ADJ to 15.
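If you would rather let the database do the arithmetic, a query along these lines (a sketch, assuming the standard wait-event names) computes the suggested setting directly; on the figures above it returns 15:

SELECT ROUND(100 *
       MAX(DECODE(EVENT, 'db file sequential read', AVERAGE_WAIT)) /
       MAX(DECODE(EVENT, 'db file scattered read', AVERAGE_WAIT))) AS suggested_cost_adj
FROM V$SYSTEM_EVENT
WHERE EVENT IN ('db file sequential read', 'db file scattered read');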
Summary
There are other parameters that affect the CBO, which are unfortunately beyond the scope of this paper. One such has been mentioned here: DB_FILE_MULTIBLOCK_READ_COUNT.
Pop quiz:
how do you think a very high value for DB_FILE_MULTIBLOCK_READ_COUNT influences the CBO? Is a high value always a good thing? What would be the potential advantages (and disadvantages) of a low value, in terms of behavioral impact on the CBO? Deceiving the CBO by setting this parameter too high is one of the most common reasons for poor decisions by the CBO when migrating from RBO.
There is solid evidence that the DBMS_STATS package (introduced in Oracle8i) does a much better job of calculating table, index, and column statistics than the reportedly-soon-to-be-obsolete ANALYZE command. Don’t take my word for it; search in MetaLink for the keyword “dbms_stats”, making sure to do an “Advanced” search and including entries from the “bug database”. Several issues logged into MetaLink against the DBMS_STATS package have subsequently been resolved as “bugs” against the ANALYZE command, which was used as a basis of comparison.
But the impact of understanding and appropriately setting these two parameters is earth shaking. Try it out.
I hope that you’ll come to see that there is, in fact, hope for the CBO. In fact, there is intelligent life in there and it is getting better, faster, and smarter very rapidly.
Acknowledgements
The author sincerely thanks the following experts for generously providing sage advice, for correcting where incorrect, for clarifying where unclear, and for guiding where lost. In alphabetical order:
Jonathan Lewis
(JL Computer Consultancy - UK - http://www.jlcomp.demon.co.uk/)
Jeff Maresh
(Maresh Consulting, Inc., Conifer, CO - USA - http://www.EvDBT.com/)
Cary Millsap
(Hotsos Enterprises, Ltd., Southlake, TX - USA - http://www.Hotsos.com/)
Mogens Norgaard
(Miracle A/S - Denmark - http://www.MiracleAS.dk/)
Craig Shallahamer
(OraPub, Inc., Lake Oswego, OR - USA - http://www.OraPub.com/)
Any errors or oversights in this paper are those of the author alone.
About the author
Tim Gorman is a consultant for Evergreen Database Technologies, Inc, based in Colorado, USA. He has been a "C" programmer on databases on UNIX and VMS since 1983 and has worked with Oracle technology since 1990. Tim is co-author of "Oracle8 Data Warehousing" and "Essential Oracle8i Data Warehousing", published in 1998 and 2000, respectively, by John Wiley & Sons. Tim can be reached at:
Corporate Website:
http://www.EvDBT.com/
E-mail:
Tim@EvDBT.com
Personal Website:
http://www.EvDBT.com/