大表查询的执行计划问题
这篇文章举例说明了大表在各种存储模式下进行查询时的执行计划。不同的字段查询会形成相对应的执行计划,而且对于索引的INDEX FAST FULL SCAN也有一个举例,对全表扫描和快速索引扫描有很好的说明用处。非常牛逼!
Wide Load Storage
By Tom Kyte
I'm working at the site of a client who has a data warehouse with partitioned fact tables with about 400 columns. Performance is not too good, and I am trying to optimize the system a bit.
The client told me that most of the columns are queried rarely, if ever, but the client wants to store the data anyway in case it is ever needed, so my idea of speeding up the inevitable full table scan by dropping all never-used columns was rejected. Obviously, I could store a redundant set of skinnier tables and use compression on them to reduce the size even more, but storage space is an issue (as in, “we can't afford any more storage”), and users would have to know which table to query for which column. The client wants only ETL [extract, transform, and load] logic in the database but no query logic at all, because “that's what we have our BI tools for.” So do you have any suggestions on how to tune such a database?
“We can't afford any more storage”: Well, if they want performance, they just might need to rethink that. Sometimes extra storage is necessary. Indexes consume extra storage, materialized views consume extra storage, and both are vital for data warehouse performance.
However, you might be able to use vertical partitioning here and add a new primary key constraint, which will result in an index (but in general, you'll want that index in order to put the data back together!).
If you tell Oracle Database there is a one-to-one mandatory relationship between the two tables, you can include table elimination in your plan. To be one-to-one mandatory, there will be a primary key on the join column and there will be reciprocal foreign keys, which can be validated or not, if you load the data clean—it'll be OK to just say they exist. The code in Listing 2 does this, telling the database that a foreign key exists.
Code Listing 2:
Creating T1 and T2 tables and constraints
SQL> create table t1 as select
2 OBJECT_ID,
3 OWNER, OBJECT_NAME, SUBOBJECT_NAME,
4 DATA_OBJECT_ID, OBJECT_TYPE
5 from all_objects where 1=0;
Table created.
SQL> alter table t1
2 add constraint t1_pk
3 primary key(object_id)
4 rely;
Table altered.
SQL> create table t2 as select
2 OBJECT_ID,
3 CREATED, LAST_DDL_TIME, TIMESTAMP,
4 STATUS, TEMPORARY, GENERATED, SECONDARY
5 from all_objects where 1=0;
Table created.
SQL> alter table t2
2 add constraint t2_pk
3 primary key(object_id)
4 rely;
Table altered.
We'll use DBMS_STATS to tell the optimizer that T1 and T2 are big tables, as they would be in real life:
SQL> begin
2 dbms_stats.set_table_stats
3 ( user, 'T1',
4 numrows => 100000000,
5 numblks => 1000000 );
6 dbms_stats.set_table_stats
7 ( user, 'T2',
8 numrows => 100000000,
9 numblks => 1000000 );
10 end;
11 /
PL/SQL procedure successfully completed.
And then applications would use the VW view, shown in Listing 3, which hides the fact that there are really two tables underneath.
Code Listing 3:
Creating VW view and constraints
SQL> create or replace view vw
2 as
3 select
4 t1.OBJECT_ID,
5 t1.OWNER, t1.OBJECT_NAME, t1.SUBOBJECT_NAME,
6 t1.DATA_OBJECT_ID, t1.OBJECT_TYPE,
7 t2.CREATED, t2.LAST_DDL_TIME, t2.TIMESTAMP,
8 t2.STATUS, t2.TEMPORARY, t2.GENERATED, t2.SECONDARY
9 from t1, t2
10 where t1.object_id = t2.object_id;
View created.
SQL> alter table t2
2 add constraint t2_fk_t1
3 foreign key(object_id)
4 references t1(object_id)
5 rely disable novalidate;
Table altered.
SQL> alter table t1
2 add constraint t1_fk_t2
3 foreign key(object_id)
4 references t2(object_id)
5 rely disable novalidate;
Table altered.
Now when we need columns from both tables, performance will be negatively affected, because we have to join them back together. The query uses an index if a small set of rows is returned or a large hash join if there are many rows to be output, as shown in Listing 4.
Code Listing 4:
Query on VW view requiring hash join and full table scan
SQL> set autotrace traceonly explain
SQL> select * from vw;
Execution Plan
-----------------------------------------
Plan hash value: 2959412835
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100M| 13G| | 1490K (3)|
|* 1 | HASH JOIN | | 100M| 13G| 6198M| 1490K (3)|
| 2 | TABLE ACCESS FULL| T2 | 100M| 5054M| | 317K (4)|
| 3 | TABLE ACCESS FULL| T1 | 100M| 8392M| | 316K (4)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------
1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
But if hardly any of your queries access the columns from the second table, T2, you can remove the second table from the query plan altogether. This happens transparently; as shown in Listing 5, you need do nothing for this to occur.
Code Listing 5:
Transparently removing T2 from query plan
SQL> select OWNER, OBJECT_NAME,
2 SUBOBJECT_NAME,
3 DATA_OBJECT_ID,
4 OBJECT_TYPE
5 from vw;
Execution Plan
----------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Byte | Cost (%CPU) | Time
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100M| 8392M| 316K (4)| 00:25:03
| 1 | TABLE ACCESS FULL| T1 | 100M| 8392M| 316K (4)| 00:25:03
--------------------------------------------------------------------------
Note that the ability to eliminate the unnecessary table was new in Oracle Database 10g Release 2. And finally, yes, use COMPRESS on the tables!
An alternative solution would be to use an index to create a skinny version of the data that is frequently queried. The optimizer will—when possible—use an index fast full scan (a full scan-style execution path, using multiblock I/O like a full table scan) to read the subset of columns and have efficient access to all the data in the table when it needs it. For example, borrowing on the previous example and assuming that the same six columns are the popular ones, you would create the skinny version of the data as shown in Listing 6.
Code Listing 6:
Creating skinny version of the data
SQL> create table t1 as
2 select *
3 from all_objects
4 where 1=0;
Table created.
SQL> create index t1_idx on
2 t1( OBJECT_ID, OWNER,
3 OBJECT_NAME, SUBOBJECT_NAME,
4 DATA_OBJECT_ID, OBJECT_TYPE );
Index created.
SQL> begin
2 dbms_stats.set_table_stats
3 ( user, 'T1',
4 numrows => 100000000,
5 numblks => 1000000 );
6 dbms_stats.set_index_stats
7 ( user, 'T1_IDX',
8 numrows => 100000000,
9 numlblks => 1000000/2 );
10 end;
11 /
PL/SQL procedure successfully completed.
Now the optimizer has two structures it can access—the table (T1) or the index (T1_IDX)—to retrieve those six columns.
Note that these structures rely on at least one of the attributes in the index defined as NOT NULL. If all columns in the index are nullable, this approach would not work, because entirely null entries are not placed in the index, so not every row would necessarily be indexed. In this example, OBJECT_ID is NOT NULL in the table, and hence the index will point to every row in the table.
Now when we query all columns (or at least one nonindexed column), we will get a full scan of the single table (T1), as shown in Listing 7.
Code Listing 7:
Query using full table scan of T1
SQL> set autotrace traceonly explain
SQL> select * from t1;
Execution Plan
----------------
Plan hash value: 3617692013
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100M| 9536M| 320K (5)| 00:25:25 |
| 1 | TABLE ACCESS FULL| T1 | 100M| 9536M| 320K (5)| 00:25:25 |
-------------------------------------------------------------------------
And when we ask for only the indexed columns, we will get an index fast full scan, as shown in Listing 8. Note that because we used DBMS_STATS to tell the optimizer that the index is about half the size of the table, the cost of the full scan of the index (157K) is about half the cost of the full scan of the table (320K), and the runtime of the full scan of the index (12:31) is about half the runtime of the full scan of the table (25:25).
Code Listing 8:
Query using index fast full scan of T1_IDX
SQL> select OWNER, OBJECT_NAME,
2 SUBOBJECT_NAME,
3 DATA_OBJECT_ID,
4 OBJECT_TYPE
5 from t1;
Execution Plan
--------------------------
Plan hash value: 1294651092
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100M| 7152M| 157K (4)| 00:12:31 |
| 1 | INDEX FAST FULL SCAN | T1_IDX | 100M| 7152M| 157K (4)| 00:12:31 |
---------------------------------------------------------------------------------