Tuesday, May 26, 2009

User objects created in the SYS schema and the (cost based) optimizer

The answer - as already disclosed by Nicolas Gasparotto - to the question that I asked here in the "Weekend Quiz" is to run the script as SYS user, and then run the query shown against these objects in the SYS schema (tested against 10g XE, 10.2.0.4 and 11.1.0.7 on Win32).

Note: It's not recommended to create any non-SYS objects in the SYS schema and you should only perform this (if at all) in a test database.

All this came up in this recent OTN forum thread where it became obvious that the issue can only be reproduced if the objects are owned by SYS.

There are two interesting points to derive from this (apart from the obvious that one should not create any user objects in the SYS schema):

1. The optimizer seems to treat objects owned by SYS differently, in particular regarding the transformations applied. Note that the crucial point is not that the query is executed as SYS user, but that the objects are owned by the SYS user. Granting appropriate privileges to a non-SYS user on the objects owned by SYS allows to reproduce the issue even with a non-SYS user.

2. It's something to remind if there is the need to understand a performance issue with a recursive dictionary query performed on SYS-owned objects. Although you obviously can't influence the SQL generated by Oracle itself it might help to understand the issue and take appropriate steps to rectify the issue.

Oh, by the way, have I already mentioned that it's really a bad idea to create user objects in the SYS schema?

Sunday, May 24, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 4

Back to part 3

Using objects residing in multiple blocksizes

I've already mentioned it several times on my blog but I would like to take the chance here again to stress the point that the cost based optimizer does a bad job when it comes to calculating costs for full table scans of objects residing in non-default block sizes. It really looks like that this feature has been introduced to support transportable tablespaces but it obviously hasn't been tested very thoroughly when it comes to cost calculation.

Each of the different modes has its deficiencies when dealing with objects in non-default blocksizes. The somehow odd thing is that the traditional I/O costing does the best job, and all system statistics based calculations are utterly wrong.

Traditional I/O based costing

The traditional I/O based costing simply scales the MBRC up or down according to the non-default blocksize to come to the same I/O read request size. So if you e.g. have a MBRC of 8 and a default blocksize of 8KB and now calculate the cost for an object residing in a 2KB tablespace, the MBRC will be multiplied 4, which results in a MBRC of 32. The I/O cost will be different although due to the different adjustment used with the higher MBRC setting. The adjusted MBRC for 32 is 16.39 whereas the adjusted MBRC for 8 is 6.59, so the calculated cost for the full table scan of the object residing in the 2KB tablespace will be higher. Likewise the same happens when using an object in a 16KB non-default tablespace. The MBRC will be reduced accordingly to 4 to get the same I/O read size again. Since adjusted MBRC for MBRC = 4 is 4.17, the cost calculated will actually be less for the object residing the 16KB tablespace.

System statistics

Things get more complicated when using NOWORKLOAD or WORKLOAD system statistics.
To recall the formula to calculate the I/O cost of a full table scan with system statistics is:

Number of blocks / MBRC * MREADTIM / SREADTIM

And in case of NOWORKLOAD system statistics the MREADTIM and SREADTIM are synthesized using the following formula:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTRFSPEED

MREADTIM = IOSEEKTIM + DB_BLOCK_SIZE * MBRC / IOTRFSPEED

Now if the object resides in a non-default blocksize tablespace, the following inconsistent adjustments are applied to the formula:

NOWORKLOAD system statistics:

SREADTIM as above, using _DEFAULT_ DB_BLOCK_SIZE

MREADTIM = IOSEEKTIM + _DEFAULT_ DB_BLOCK_SIZE * scaled MBRC / IOTRFSPEED

I/O cost = Number of blocks / scaled MBRC * MREADTIM / SREADTIM

So obviously something is odd in above formulas: The SREADTIM and MREADTIM values are synthesized with a mixture of a scaled MBRC (according to the block size) but a non-adjusted default DB_BLOCK_SIZE, resulting in a large variation in cost, so a full table scan in a small blocksize is much more expensive than in the default blocksize, and likewise a full table scan in a large blocksize is much cheaper. Unfortunately this doesn't reflect at all the runtime behaviour, since Oracle acutally scales the I/O read request size accordingly meaning that the runtime difference usually is negligible, but the cost calculated is dramatically different.

WORKLOAD system statistics:

MBRC as measured/set

SREADTIM as measured/set

MREADTIM as measured/set

I/O cost = Number of blocks / MBRC * MREADTIM / SREADTIM

This is even worse than above NOWORKLOAD result because the I/O cost calculated simply is different by the factor of number of blocks in non-default block size / number of blocks in default block size, e.g. an object residing in a 2KB block size will have an I/O cost four times higher than an object residing in a 8KB default blocksize, and the MBRC is not adjusted at all for the calculation.

Some calculation examples

The following examples attempt to summarize what you get if you're using an object in a non-default blocksize tablespace:

1. I/O Cost of a full table scan of a 10,000 blocks / 80MB segment residing in a default blocksize of 8KB (in case of system statistics excluding the CPU cost) and using an MBRC of 8:

traditional I/O:
round(10,000 blocks / 6.59) = 1,517

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
MREADTIM = 10 + 8192*8/4096 = 10 + 16 = 26

round((10,000 / 8) * 26 / 12) = 1,250 * 26 / 12 = 2,708

WORKLOAD system statistics:
same as above provided you use the same for MREADTIM and SREADTIM otherwise as measured

2. I/O Cost of a full table scan of a 40,000 blocks / 80MB segment residing in a non-default blocksize of 2KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:

traditional I/O:
MBRC used for 2KB blocksize = 8 * 8KB / 2KB = 8 * 4 = 32

adjusted MBRC for MBRC = 32: 16.39

round(40,000 blocks / 16.39) = 2,441

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
(should be 10 + 2048/4096 = 10 + 0.5 = 10.5)

MREADTIM = 10 + 8192*32/4096 = 10 + 64 = 74
(should be 10 + 2048*32/4096 = 10 + 16 = 26)

round((40,000 / 32) * 74 / 12) = 1,250 * 74 / 12 = 7,708
(should be round((40,000 / 32) * 26 / 10.5) = 1,250 * 26 / 10.5 = 3,095

WORKLOAD system statistics (for the sake of comparison using the NOWORKLOAD defaults):
MBRC = 8
SREADTIM = 12
MREADTIM = 26

round((40,000 / 8) * 26 / 12) = 5000 * 26 / 12 = 10,833
(should be round((40,000 / 32) * 26 / 12) = 1,250 * 26 / 12 = 2,708

3. I/O Cost of a full table scan of a 5,000 blocks / 80MB segment residing in a non-default blocksize of 16KB (in case of system statistics excluding the CPU cost) and using a default blocksize of 8KB and a MBRC of 8:

traditional I/O:
MBRC used for 16KB blocksize = 8 * 8KB / 16KB = 8 * 0.5 = 4

adjusted MBRC for MBRC = 4: 4.17

round(5,000 blocks / 4.17) = 1,199

(default) NOWORKLOAD system statistics:
SREADTIM = 10 + 8192/4096 = 10 + 2 = 12
(should be 10 + 16384/4096 = 10 + 4 = 14)

MREADTIM = 10 + 8192*4/4096 = 10 + 8 = 18
(should be 10 + 16384*4/4096 = 10 + 16 = 26)

round((5,000 / 4) * 18 / 12) = 1,250 * 18 / 12 = 1,875
(should be round((5,000 / 4) * 26 / 14) = 1,250 * 26 / 14 = 2,321

WORKLOAD system statistics:
MBRC = 8
SREADTIM = 12
MREADTIM = 26

round((5,000 / 8) * 26 / 12) = 625 * 26 / 12 = 1,354
(should be (5,000 / 4) * 26 / 12) = 1,250 * 26 / 12 = 2,708

Plan change by moving to non-default block size

And here is a final example that shows by simply moving an object to a non-default blocksize tablespace a change in the execution plan can be observed. An index scan is used instead of the full table scan since the cost of the full table scan operation is significantly increased. The SQL trace will reveal that at runtime however there is no significant difference between the full table scan executed in the default tablespace and non-standard blocksize tablespace.


SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> create table t1
2 as
3 select
4 trunc(dbms_random.value(0,25)) as col1
5 , trunc(dbms_random.value(0,25)) as col2
6 , rpad('x', 200, 'x') as filler
7 from
8 (
9 select
10 level as id
11 from
12 dual
13 connect by
14 level <= 10000
15 );

Table created.

Elapsed: 00:00:00.49
SQL>
SQL> create index t1_idx on t1 (col1, col2)
2 ;

Index created.

Elapsed: 00:00:00.14
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.55
SQL>
SQL> explain plan for
2 select /*+ full(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 83 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 83 (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.13
SQL>
SQL> explain plan for
2 select /*+ index(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 96 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 96 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 83 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 83 (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.05
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.02
SQL>
SQL> alter session set tracefile_identifier = 'fts_8k';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ full(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL>
SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set tracefile_identifier = 'idx_8k';

Session altered.

Elapsed: 00:00:00.02
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ index(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL>
SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.70
SQL>
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> create table t1
2 tablespace test_2k
3 as
4 select
5 trunc(dbms_random.value(0,25)) as col1
6 , trunc(dbms_random.value(0,25)) as col2
7 , rpad('x', 200, 'x') as filler
8 from
9 (
10 select
11 level as id
12 from
13 dual
14 connect by
15 level <= 10000
16 );

Table created.

Elapsed: 00:00:00.46
SQL>
SQL> create index t1_idx on t1 (col1, col2)
2 tablespace test_2k;

Index created.

Elapsed: 00:00:00.13
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.69
SQL>
SQL> explain plan for
2 select /*+ full(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.07
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 245 (1)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| T1 | 96 | 19776 | 245 (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COL1"=1 AND ("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR
"COL2"=4 OR "COL2"=5 OR "COL2"=6))

14 rows selected.

Elapsed: 00:00:00.11
SQL>
SQL> explain plan for
2 select /*+ index(t1) */
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.06
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 col1 = 1
8 and col2 in (1,2,3,4,5,6);

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 19776 | 102 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 96 | 19776 | 102 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | T1_IDX | 96 | | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COL1"=1)
filter("COL2"=1 OR "COL2"=2 OR "COL2"=3 OR "COL2"=4 OR "COL2"=5 OR
"COL2"=6)

16 rows selected.

Elapsed: 00:00:00.10
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL>
SQL> alter session set tracefile_identifier = 'fts_2k';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> set termout off
SQL>
SQL> select /*+ full(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set tracefile_identifier = 'idx_2k';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set events '10046 trace name context forever, level 8';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set termout off
SQL>
SQL> select /*+ index(t1) */
2 *
3 from
4 t1
5 where
6 col1 = 1
7 and col2 in (1,2,3,4,5,6);

SQL> set termout on
SQL>
SQL> alter session set events '10046 trace name context off';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> spool off


And here's what we get from the generated trace files.

The 8KB FTS does these multi-block reads:

WAIT #35: nam='db file scattered read' ela= 2296 file#=8 block#=12050 blocks=8 obj#=61857 tim=107988983808
WAIT #35: nam='db file scattered read' ela= 916 file#=8 block#=12058 blocks=8 obj#=61857 tim=107988984970
WAIT #35: nam='db file scattered read' ela= 911 file#=8 block#=12066 blocks=8 obj#=61857 tim=107988986104


And the 2KB FTS does these multi-block reads:

WAIT #37: nam='db file scattered read' ela= 798 file#=6 block#=66 blocks=32 obj#=61859 tim=107991673671
WAIT #37: nam='db file scattered read' ela= 828 file#=6 block#=98 blocks=32 obj#=61859 tim=107991674822
WAIT #37: nam='db file scattered read' ela= 829 file#=6 block#=130 blocks=32 obj#=61859 tim=107991675980


And the TKPROF output confirms that there is not a relevant difference in the wait times.

The 8KB output:

********************************************************************************

select /*+ full(t1) */
*
from
t1
where
col1 = 1
and col2 in (1,2,3,4,5,6)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.06 296 299 0 93
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.06 296 299 0 93

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
93 TABLE ACCESS FULL T1 (cr=299 pr=296 pw=0 time=21098 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 1 0.00 0.00
db file scattered read 37 0.02 0.05
SQL*Net message from client 2 0.00 0.00
********************************************************************************


The 2KB output:


********************************************************************************

select /*+ full(t1) */
*
from
t1
where
col1 = 1
and col2 in (1,2,3,4,5,6)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.07 1251 1254 0 93
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.07 1251 1254 0 93

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
93 TABLE ACCESS FULL T1 (cr=1254 pr=1251 pw=0 time=21314 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 1 0.00 0.00
db file scattered read 40 0.02 0.05
SQL*Net message from client 2 0.00 0.00
********************************************************************************


Don't be mislead by the different number of consistent gets, this is just due to the different block size. As can be seen from the raw trace files the multi-block reads in the 2KB block size are reading 32 blocks at a time, whereas in the 8KB default block size 8 blocks are read at a time, so the number of I/O requests will be similar given a segment of the same size.

Friday, May 22, 2009

Weekend quiz

Quiz questions seem to be in nowadays, so here's one for you regarding the cost based optimizer:

Given this simple script (Run this on any 10.2.x (including XE) or 11.1.0.6/7):


create table test1
as
select * from dba_objects;

create table test2
as
select * from dba_objects;

create index i_test1_1 on test1(object_name);

create index i_test1_2 on test1(object_id);

create index i_test2_1 on test2(object_name);

create index i_test2_2 on test2(object_id);

exec dbms_stats.gather_table_stats(null, 'TEST1', estimate_percent=>null, method_opt=>'for all columns size 1');

exec dbms_stats.gather_table_stats(null, 'TEST2', estimate_percent=>null, method_opt=>'for all columns size 1');


And this simple query:


explain plan for
select
*
from
test1
where
object_name='TEST1'
and object_id in (
select /*+ unnest */
object_id
from
test2
);


with this plan:

Plan 1 (the obvious one):


Plan hash value: 2107173885

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 212 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 2 | 212 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST1 | 2 | 202 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | I_TEST1_1 | 2 | | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | I_TEST2_2 | 69450 | 339K| 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("OBJECT_NAME"='TEST1')
4 - access("OBJECT_ID"="OBJECT_ID")


How is it possible to get the following different plans for the same statement in the same database with the same settings and using the same script:

Plan 2a (10.2.x):

Plan hash value: 2706897970

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 99 | 53 (4)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 1 | 99 | 53 (4)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST1 | 1 | 86 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | I_TEST1_1 | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW | VW_NSO_1 | 12685 | 161K| 50 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST2 | 12685 | 63425 | 50 (2)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("OBJECT_ID"="$nso_col_1")
3 - access("OBJECT_NAME"='TEST1')


Plan 2b (11.1):

Plan hash value: 1356524942

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 210 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS SEMI | | 2 | 210 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST1 | 2 | 202 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | I_TEST1_1 | 2 | | 3 (0)| 00:00:01 |
| 4 | VIEW PUSHED PREDICATE | VW_NSO_1 | 1 | 4 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | I_TEST2_2 | 1 | 5 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("OBJECT_NAME"='TEST1')
5 - access("OBJECT_ID"="OBJECT_ID")


Note 1: It's assumed that everything is equal that is (normally) considered by the cost based optimizer: Object statistics, System statistics, segment sizes, the actual data and optimizer related settings including all "underscore" parameters. It can be reproduced on the same database.

Hint: The solution should not be applied to a production system.

Note 2: And no, Timur, you're not allowed to participate, since you already know the answer!

Sunday, May 17, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 3

Back to part 2 Forward to part 4

System statistics in 9i

In 10g the CPU costing mode is enabled by default and is supported by the default NOWORKLOAD system statistics.

But you can use system statistics already in 9i, although you have to enable them explicitly.

Oracle 9i is out of support, but I guess there are still a lot of systems out there that are using the 9.2 release, therefore I find it worth to mention what you can do in 9i with system statistics. This can be helpful if you consider to test your application already in the 9i environment with system statistics before upgrading to 10g.

In most descriptions about 9i and system statistics only WORKLOAD system statistics are mentioned, but 9i also supports NOWORKLOAD system statistics, although not in the same full flavour as 10g does.

You can activate CPU costing in 9i by running DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD'), but this seems to work differently than in 10g and later.

Whereas in 10g and later this actually measures the IOSEEKTIM and IOTFRSPEED values, this seems to activate in 9i something that is comparable with the default NOWORKLOAD system statistics of 10g.

The SYS.AUX_STATS$ does not show any actual values, but tests revealed that 9i (at least 9.2.0.8) in that case seems to measure the CPU speed (in 10g this is the CPUSPEEDNW value) and uses the same default values for IOSEEKTIM and IOTFRSPEED as 10g does (10ms and 4096 bytes/ms resp.).

Running some tests showed that you arrive at the same I/O cost as you do in 10g with the default NOWORKLOAD system statistics.


exec dbms_stats.delete_system_stats;

exec dbms_stats.gather_system_stats('NOWORKLOAD')

alter session set "_table_scan_cost_plus_one" = false;

explain plan for
select
max(val)
from
t1;


Since you can't modify the NOWORKLOAD system statistics values in 9i (i.e. you can't manipulate CPUSPEEDNW, IOSEEKTIM and IOTFRSPEED using DBMS_STATS.SET_SYSTEM_STATS), see the following excerpt from the 9.2.0.8 10053 trace file for the 10,000 blocks MSSM table with 8kb blocksize when using NOWORKLOAD system statistics:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2715 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2715 CDN: 10000 RSC: 2714 RSP: 2714 BYTES: 40000
IO-RSC: 2709 IO-RSP: 2709 CPU-RSC: 72914400 CPU-RSP: 72914400


The I/O cost calculated seems to correspond to what 10g calculates when using default NOWORKLOAD system statistics.

Other variants also seem to correspond to the 10g default NOWORKLOAD model. E.g. using a DB_FILE_MULTIBLOCK_READ_COUNT = 16 we get the following cost calculation:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2194 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2194 CDN: 10000 RSC: 2193 RSP: 2193 BYTES: 40000
IO-RSC: 2188 IO-RSP: 2188 CPU-RSC: 72914400 CPU-RSP: 72914400


If you gather or set WORKLOAD system statistics 9i calculates the same cost as 10g does, based on the SREADTIM, MREADTIM and MBRC values.

You can manipulate the WORKLOAD system statistics using DBMS_STATS.SET_SYSTEM_STATS like you can do in 10g. So the following code snippet mimics the default NOWORKLOAD SREADTIM and MREADTIM values for a 8kb blocksize with a DB_FILE_MULTIBLOCK_READ_COUNT = 8.


drop table t1;

create table t1
pctfree 99
pctused 1
-- tablespace test_2k
-- tablespace test_4k
tablespace test_8k
-- tablespace test_16k
as
with generator as (
select --+ materialize
rownum id
from all_objects
where rownum <= 3000
)
select
/*+ ordered use_nl(v2) */
rownum id,
trunc(100 * dbms_random.normal) val,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;


begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/

begin
dbms_stats.set_system_stats('MBRC',8);
dbms_stats.set_system_stats('MREADTIM',26.0);
dbms_stats.set_system_stats('SREADTIM',12.0);
dbms_stats.set_system_stats('CPUSPEED',1000000);
end;
/

alter session set "_table_scan_cost_plus_one" = false;

alter session set tracefile_identifier = '9i_workload_system_stats';

alter session set events '10053 trace name context forever, level 1';

explain plan for
select max(val)
from t1;

alter session set events '10053 trace name context off';


Which should arrive at the same cost as the default NOWORKLOAD system statistics. And indeed it gives you this 10053 trace file excerpt:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 2710 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 2710 CDN: 10000 RSC: 2709 RSP: 2709 BYTES: 40000
IO-RSC: 2709 IO-RSP: 2709 CPU-RSC: 72914400 CPU-RSP: 72914400


If the WORKLOAD system statistics are invalid (e.g. the MREADTIM is not greater than the SREADTIM value or some values are missing) then 9i falls back to traditional I/O cost based costing for the I/O cost but adds obviously CPU costs, which is different from 10g which falls back to NOWORKLOAD system statistics.

If you change in the above code snippet the corresponding code block like this:


begin
dbms_stats.set_system_stats('MBRC',8);
dbms_stats.set_system_stats('MREADTIM',12.0);
dbms_stats.set_system_stats('SREADTIM',12.0);
dbms_stats.set_system_stats('CPUSPEED',1000000);
end;
/


You'll see the following output from the 10053 trace file:


Join order[1]: T1[T1]#0
Best so far: TABLE#: 0 CST: 1519 CDN: 10000 BYTES: 40000
Final - All Rows Plan:
JOIN ORDER: 1
CST: 1519 CDN: 10000 RSC: 1518 RSP: 1518 BYTES: 40000
IO-RSC: 1518 IO-RSP: 1518 CPU-RSC: 72914400 CPU-RSP: 72914400


This an interesting mixture of traditional I/O costing and CPU costing.

Quirks and oddities

While running my tests I came across several quirks and oddities:

- When gathering NOWORKLOAD system statistics with 10.2.0.4 and 11.1.0.7 on my test environment it happened regularly that the IOTFRSPEED was left unchanged at the default of 4096, but sometimes it also changed to a value that looked more like a measured one.

- You obviously can't set the CPUSPEEDNW/IOSEEKTIM/IOTFRSPEED values using DBMS_STATS.SET_SYSTEM_STATS in 9i for the NOWORKLOAD system statistics, so you're bound to the default values used by Oracle.

- If you're on 10g and later and use OPTIMIZER_FEATURES_ENABLE = '9.2.0.8' the NOWORKLOAD system statistics are not used properly. Although the 10053 optimizer trace suggests otherwise by mentioning the NOWORKLOAD values the final cost calculated falls back to traditional I/O based costing (and no CPU costing at all). This is probably caused by the fact that 9i doesn't support the NOWORKLOAD parameters in the SYS.AUX_STATS$ table that are always present in 10g. System statistics work however in 9.2.0.8 compatibility mode if you're using WORKLOAD system statistics with their corresponding values in SYS.AUX_STATS$.

- Altering the SYSTEM value of db_file_multiblock_read_count: This seems to be buggy because the value set is not getting used in the session that altered the system. You need to set other optimizer related values in the same session to make the change of the parameter effective. For more information, see Dion Cho's blog post about the issue.

The next part of the series will cover my "favorite" topic using objects in multiple block sizes and effects on costing, in particular when using system statistics.

Sunday, May 3, 2009

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 2

Back to part 1 Forward to part 3

Before heading on to the remaining modes of system statistics, let's summarize what has been observed in part 1 regarding the default NOWORKLOAD system statistics in 10g and later. The following table shows what the test case from the previous post demonstrated:

Table 1: 8KB MSSM locally managed tablespace 10,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
8 |12 | 26 | 2.16 |1,518 | 6.59 |2,709 |1.78
16 |12 | 42 | 3.5 | 962 |10.39 |2,188 |2.27
32 |12 | 74 | 6.16 | 610 |16.39 |1,928 |3.16
64 |12 |138 |11.5 | 387 |25.84 |1,798 |4.64
128 |12 |266 |22.16 | 245 |40.82 |1,732 |7.07


If you happen to have a 16KB default blocksize the results would look like the following. Note that the table is now only 5,000 blocks in size, and the SREADTIM is now a bit longer (10+16384/4096=14ms instead of 10+8192/4096=12ms) therefore the 16KB blocksize calculation makes the full table scan look a bit cheaper to the optimizer when using the default NOWORKLOAD system statistics.

Table 2: 16KB MSSM locally managed tablespace 5,000 blocks table segment
default NOWORKLOAD system statistics:


MBRC|SREADTIM|MREADTIM|MREADTIM/SREADTIM|NOCPU cost|adjusted MBRC|CPU cost|CPU/NOCPU cost
----|--------|--------|-----------------|----------|-------------|--------|--------------
4 |14 | 26 | 1.86 |1,119 | 4.17 |2,322 |2.08
8 |14 | 42 | 3.0 | 759 | 6.59 |1,875 |2.47
16 |14 | 74 | 5.3 | 481 |10.39 |1,652 |3.43
32 |14 |138 | 9.86 | 305 |16.39 |1,540 |5.05
64 |14 |266 |19.0 | 194 |25.84 |1,485 |7.65


Gathered NOWORKLOAD system statistics

If you gather NOWORKLOAD system statistics using DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD') the values IOSEEKTIM and IOTFRSPEED will actually get measured and used accordingly.

The remaining calculations how to derive the SREADTIM and MREADTIM values correspond to what has been outlined above.

Gathering NOWORKLOAD statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD')

This actually gathers the IOTFRSPEED and IOSEEKTIM values in addition to CPUSPEEDNW rather than using the default values of 4096 and 10.

In 10g and later this may take from a couple of seconds to a couple of minutes depending on the size of your database. Note that this puts additional load onto your system while gathering the NOWORKLOAD system statistics since it submits random reads against all data files.

The following test case shows the different I/O cost calculations when using default NOWORKLOAD system statistics and custom gathered NOWORKLOAD system statistics. It creates again the 10,000 blocks table in a 8KB default block size locally managed tablespace using manual segment space management:


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.03
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.22
SQL>
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.29
SQL> -- default NOWORKLOAD system statistics
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.02
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.01
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2709 (0)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2709 (0)| 00:00:33 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.08
SQL> -- gather NOWORKLOAD system statistics
SQL> exec dbms_stats.gather_system_stats('NOWORKLOAD')

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.43
SQL> -- ignore CPU costs for the moment
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.02
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 04-26-2009 14:21
SYSSTATS_INFO DSTOP 04-26-2009 14:21
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 14.226
SYSSTATS_MAIN IOTFRSPEED 32517.754
SYSSTATS_MAIN SREADTIM
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1403 (0)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1403 (0)| 00:00:21 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.04
SQL>
SQL> spool off


Based on the gathered IOSEEKTIM and IOTFRSPEED values the I/O cost calculated is significantly different.

Applying the known formulas we can reproduce the calculated figures:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED

SREADTIM = 14.226 + 8192 / 32,517.754 = 14.478

MREADTIM = IOSEEKTIM + MBRC * DB_BLOCK_SIZE / IOTFRSPEED

MREADTIM = 14.226 + 8 * 8192 / 32,517.754 = 16.241

FTS cost = Blocks below HWM / MBRC * MREADTIM / SREADTIM

FTS cost = 10,000 / 8 * 16.241 / 14.478 = 1,403

Gathered WORKLOAD system statistics

Gathering WORKLOAD system statistics measures a different set of values, including the actual MBRC, SREADTIM and MREADTIM values. The cost calculation therefore doesn't use the synthesized SREADTIM and MREADTIM values any longer, nor does it use the "_db_file_optimizer_read_count" parameter in 10g and later, but uses simply the measured values.

Therefore the I/O costs calculated with WORKLOAD system statistics are not dependent on the "db_file_multiblock_read_count" value used, but the important point to keep in mind is that the gathered WORKLOAD system statistics are based on the "db_file_multiblock_read_count" (in 10g and later on the internal parameter "_db_file_exec_read_count") value used at runtime, so the values measured are obviously influenced by this setting ("_db_file_exec_read_count" equals "db_file_multiblock_read_count" if this has been set and the underscore parameter hasn't been modified).

As already mentioned in part 1 Oracle has introduced with Oracle 10.2 that if the "db_file_multiblock_read_count" is left unset different values for cost calculation and at execution time will be used (8 for calculation and the largest possible I/O size at runtime, usually 1MB on most platforms), so that points in general into the right direction since it allows the calibration code to work out the largest MBRC possible at runtime that can be achieved. Note that Christian Antognini doesn't agree to this approach in his book "Troubleshooting Oracle Performance" where he advices to manually work out the "optimal" MBRC setting running suitable I/O tests.

Note that in 10g and later the runtime engine still uses the "_db_file_exec_read_count", regardless of the MBRC used to calculate the cost.

If you run the following code snippet in 10g and later and check the resulting trace files, you'll see this confirmed:


alter session set tracefile_identifier = 'exec_count_16';

alter session set "_db_file_exec_read_count" = 16;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';

alter session set tracefile_identifier = 'exec_count_128';

alter session set "_db_file_exec_read_count" = 128;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

select max(val)
from t1;

alter session set events '10046 trace name context off';


The resulting trace files look like the following:

The 16 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 1732 file#=8 block#=12058 blocks=16 obj#=62088 tim=69006657688
WAIT #2: nam='db file scattered read' ela= 1725 file#=8 block#=12074 blocks=16 obj#=62088 tim=69006659628
WAIT #2: nam='db file scattered read' ela= 1726 file#=8 block#=12090 blocks=16 obj#=62088 tim=69006661566
.
.
.


The 128 blocks setting:


.
.
.
WAIT #2: nam='db file scattered read' ela= 13842 file#=8 block#=12169 blocks=128 obj#=62088 tim=69008775308
WAIT #2: nam='db file scattered read' ela= 15513 file#=8 block#=12297 blocks=128 obj#=62088 tim=69008793460
WAIT #2: nam='db file scattered read' ela= 26437 file#=8 block#=12425 blocks=128 obj#=62088 tim=69008822434
.
.
.


Gathering WORKLOAD system statistics:

exec DBMS_STATS.GATHER_SYSTEM_STATS('START')
-- some significant (ideally "representative") workload needs to be performed
-- otherwise some or all of the measured values will be missing
exec DBMS_STATS.GATHER_SYSTEM_STATS('STOP')

or

exec DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', )

Note that gathering workload system statistics doesn't put additional load onto your system, since the values are derived from the delta in statistics already maintained by Oracle during database activity.

Furthermore if your workload doesn't use "db file scattered read" i.e. multi-block reads that are working with the buffer cache, then you might end up with WORKLOAD system statistics that are missing the MBRC and MREADTIM component (null values). This might happen if you e.g. use only index access paths with table row random lookups by ROWID and/or all your tablescans are all going parallel, or in 11g use serial direct reads that bypass the buffer cache (which can be activated in pre-11g using the hidden parameter "_serial_direct_read").

The same applies to "db file sequential read" i.e. single-block reads, if you only perform multi-block reads in your workload then the SREADTIM information might be missing from the gathered statistics.

Although the official documentation says in 10.2 for that case the following:
"During the gathering process of workload statistics, it is possible that mbrc and mreadtim will not be gathered if no table scans are performed during serial workloads, as is often the case with OLTP systems. On the other hand, FTS occur frequently on DSS systems but may run parallel and bypass the buffer cache. In such cases, sreadtim will still be gathered since index lookup are performed using the buffer cache. If Oracle cannot gather or validate gathered mbrc or mreadtim, but has gathered sreadtim and cpuspeed, then only sreadtim and cpuspeed will be used for costing. FTS cost will be computed using analytical algorithm implemented in previous releases. Another alternative to computing mbrc and mreadtim is to force FTS in serial mode to allow the optimizer to gather the data."

And the 11g documentation says this:
"If Oracle Database cannot gather or validate gathered mbrc or mreadtim values, but has gathered sreadtim and cpuspeed values, then only the sreadtim and cpuspeed values are used for costing. In this case, the optimizer uses the value of the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to cost a full table scan. However, if DB_FILE_MULTIBLOCK_READ_COUNT is not set or is set to 0 (zero), then the optimizer uses a value of 8 for costing."

But when testing this it looked like that if either MBRC or MREADTIM was missing then the optimizer simply reverted to the available NOWORKLOAD system statistics (Note that this applies to 10g and later; I'll show in the next part of the series what happens in 9i since things are different there).

Note in order to have the optimizer accept the WORKLOAD system statistic the MREADTIM needs to greater than the SREADTIM. If your multi-block read requests are served from a cache or your storage system performs aggresssive read-aheads the measured MREADTIM can be less than the SREADTIM. In this case you might need to adjust the MREADTIM manually using the GET_SYSTEM_STATS/SET_SYSTEM_STATS API, which will be covered below.

One interesting oddity showed up when MBRC was available but MREADTIM was missing or not greater than SREADTIM: In that case it looks like that the NOWORKLOAD statistics use the MBRC set in their calculations for synthesizing the MREADTIM and calculating the full table scan cost. This makes sense but is interesting mixture of NOWORKLOAD and WORKLOAD system statistics.

The following test case shows how to gather WORKLOAD system statistics, and how to correct manually a MREADTIM value gathered too low.


SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:02.44
SQL>
SQL> create table t1
2 pctfree 99
3 pctused 1
4 -- tablespace test_2k
5 -- tablespace test_4k
6 tablespace test_8k
7 -- tablespace test_16k
8 as
9 with generator as (
10 select --+ materialize
11 rownum id
12 from all_objects
13 where rownum <= 3000
14 )
15 select
16 /*+ ordered use_nl(v2) */
17 rownum id,
18 trunc(100 * dbms_random.normal) val,
19 rpad('x',100) padding
20 from
21 generator v1,
22 generator v2
23 where
24 rownum <= 10000
25 ;

Table created.

Elapsed: 00:00:02.27
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 't1',
5 cascade => true,
6 estimate_percent => null,
7 method_opt => 'for all columns size 1'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.60
SQL>
SQL> begin
2 dbms_stats.delete_system_stats;
3 dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.07
SQL>
SQL> alter session set "_table_scan_cost_plus_one" = false;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.gather_system_stats('START')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.12
SQL>
SQL> begin
2 for i in 1..10 loop
3 for rec in (
4 select
5 max(val)
6 from
7 t1
8 ) loop
9 execute immediate 'alter system flush buffer_cache';
10 end loop;
11 end loop;
12 for rec in (
13 select /*+ use_nl(a t1) */ max(val) from t1,
14 (
15 select /*+ no_merge no_eliminate_oby */
16 rowid as row_id
17 from
18 t1
19 order by
20 dbms_random.value
21 ) a
22 where a.row_id = t1.rowid
23 ) loop
24 null;
25 end loop;
26 end;
27 /

PL/SQL procedure successfully completed.

Elapsed: 00:01:17.73
SQL>
SQL> exec dbms_stats.gather_system_stats('STOP')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.74
SQL>
SQL> begin
2 dbms_stats.set_system_stats('CPUSPEED',1000000);
3 end;
4 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> declare
2 s_status varchar2(200);
3 dt_dstart date;
4 dt_dstop date;
5 n_pvalue number;
6 begin
7 dbms_stats.get_system_stats (
8 s_status,
9 dt_dstart,
10 dt_dstop,
11 'MREADTIM',
12 n_pvalue);
13 dbms_output.put_line('Status: ' || s_status);
14 dbms_output.put_line('Dstart: ' || to_char(dt_dstart, 'DD.MM.YYYY HH24:MI:SS'));
15 dbms_output.put_line('Dstop : ' || to_char(dt_dstop , 'DD.MM.YYYY HH24:MI:SS'));
16 dbms_output.put_line('Value : ' || to_char(n_pvalue, 'TM'));
17 dbms_stats.set_system_stats('MREADTIM', 10 * n_pvalue);
18 end;
19 /
Status: COMPLETED
Dstart: 03.05.2009 13:24:00
Dstop : 03.05.2009 13:24:00
Value : 1.293

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SQL>
SQL> column sname format a20
SQL> column pname format a20
SQL> column pval2 format a20
SQL>
SQL> select
2 sname
3 , pname
4 , pval1
5 , pval2
6 from
7 sys.aux_stats$;

SNAME PNAME PVAL1 PVAL2
-------------------- -------------------- ---------- --------------------
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_INFO DSTART 05-03-2009 13:24
SYSSTATS_INFO DSTOP 05-03-2009 13:24
SYSSTATS_INFO FLAGS 1
SYSSTATS_MAIN CPUSPEEDNW 1000000
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN SREADTIM 8.021
SYSSTATS_MAIN MREADTIM 12.93
SYSSTATS_MAIN CPUSPEED 1000000
SYSSTATS_MAIN MBRC 8
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN SLAVETHR

13 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> explain plan for
2 select
3 max(val)
4 from
5 t1;

Explained.

Elapsed: 00:00:00.02
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2016 (0)| 00:00:17 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2016 (0)| 00:00:17 |
---------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.17
SQL>
SQL> spool off


Using the known formula we can confirm the cost calculation above:

Blocks below HWM / MBRC: Number of multi-block read requests required to scan the segment

Number of multi-block read requests * MREADTIM = time it takes to perform these number of read requests in milliseconds.

Finally this is divided by SREADTIM to arrive at the known unit used for cost representation which is number of single read requests.

10,000 / 8 = 1,250 multi-block read requests

1,250 * 12.93 = 16,162.5 ms execution time (which is shown as 17 seconds in the plan by the way)

16,162.5 / 8.021 = 2,015.02 (which is shown as 2,016 in the plan)

Manually writing and maintaining system statistics

Using the DBMS_STATS.GET_SYSTEM_STATS / SET_SYSTEM_STATS API you can write your own set of system statistics for both NOWORKLOAD and WORKLOAD values.

You can use DBMS_STATS.DELETE_SYSTEM_STATS to remove the system statistics, which will activate the default NOWORKLOAD system statistics in 10g and disable CPU costing in 9i.

You can use DBMS_STATS.EXPORT_SYSTEM_STATS / IMPORT_SYSTEM_STATS to export and import system statistics to a user statistics table created with DBMS_STATS.CREATE_STAT_TABLE.

Note that DBMS_STATS.GATHER_SYSTEM_STATS when used with an user stats table (created with DBMS_STATS.CREATE_STAT_TABLE) behaves differently than e.g. DBMS_STATS.GATHER_TABLE_STATS: Whereas object related statistics always go to the data dictionary and you only have the option to save the current statistics to the user stats table before replacing them with the new values, GATHER_SYSTEM_STATS actually writes the system statistics into the user stats table and doesn't change the actual system statistics if you're supplying a user stats table name.

For further discussion how you could use system statistics see Jonathan Lewis' thoughts about this topic:

Part 1
Part 2
Part 3

The next part of the series will cover the usage of system statistics in 9i, highlight some quirks and oddities observed and finally show what happens if you attempt to use multiple block sizes for "tuning" purposes.