Monday, July 23, 2007

Subpartitions and optimizer statistics

UPDATE: If you came here because you seem to have issues with execution plans after upgrading to 10.2.0.4, please read this note about a severe bug in optimizer of the 10.2.0.4 patch set regarding subpartition pruning to a single subpartition.

Recently I came across the following issue when analyzing some long running SQL statements at a customer site:

Some of their tables are using range-list composite partitioning, and the list subpartitions represent feeds which are loaded into the database by a generic ETL process. Some of these feeds are fairly large (millions of rows) but some others can be small (like only 1000 rows). So the resulting subpartitions differ dramatically in size. Statistics on subpartition level are updated properly after each data load and represent the different sizes very well. So far, so good.

In the course of the process, some of these subpartitions are then joined to some other big tables, again having millions of rows. The large table being joined has an index on the join column which is unique. The SQL being generated uses explicit subpartition pruning so that the optimizer knows at parse time exactly which single subpartition will be accessed by the statement.

Now we observed the following weird behaviour when this join took place: Regardless of the size of the subpartition being joined to the large table, the optimizer always chose a hash join with full tablescans on both tables. Even when a very small subpartition was joined to the large table, when running the statement in serial mode it took up to 15 minutes to complete the statement due to full tablescan on the large table.

Now we started to check all the usual things that might go wrong and mislead the optimizer, but all the things we checked worked out to be reasonably set including all the table, index and column statistics in charge.

Still, when looking at the execution plan, it was quite obvious what was driving the optimizer to that hash join: The estimated cardinality of the subpartition was terribly wrong and seemed to be taken from the partition level of the statistics rather than the appropriate subpartition statistics.

In addition the 10g explain plan included an operation called "partition combined iterator" with "KEY,KEY" as partition start and stop criteria which was irritating since the explicit subpartition pruning should give enough information to use exactly that partition as start and stop. The actual full table scan line below that "iterator" in the execution then again showed exactly the same partition start and stop key, as you would expect in this case, but - as already mentioned - the cardinality was way off and correlated to the cardinality of the parent range partition.

So I decided to create a generic test case in order to open a Service Request with Oracle to sort this out. And here is the irritating result (tested on 10gR2, 10.2.0.2, AIX 5.3.0.4).

First testcase focuses on the estimated cardinality when performing explicit subpartition pruning on range-list partitioned tables:

SQL>
SQL> drop table partition_test;

Table dropped.

SQL>
SQL> create table partition_test (
2 x_pkey number not null,
3 x_slice varchar2(20) not null,
4 data1 number
5 )
6 partition by range (x_pkey)
7 subpartition by list (x_slice)
8 (
9 partition pkey_0 values less than (1)
10 (
11 subpartition pkey_0_xxx values (' xxx '),
12 subpartition pkey_0_001 values ('001')
13 ),
14 partition pkey_1 values less than (2)
15 (
16 subpartition pkey_1_xxx values (' xxx '),
17 subpartition pkey_1_001 values ('001'),
18 subpartition pkey_1_101 values ('101')
19 )
20 );

Table created.

OK, now we have a range-list partitioned table with two range partitions and each having several list subpartitions. So let's put some data into it, and the important point here is that it differs in row count.

SQL>
SQL> insert into partition_test (x_pkey, x_slice, data1)
2 select 1, '001', seq
3 from (
4 select level as seq from dual connect by level <= 10
5 );

10 rows created.

SQL>
SQL> insert into partition_test (x_pkey, x_slice, data1)
2 select 1, '101', seq
3 from (
4 select level as seq from dual connect by level <= 1000
5 );

1000 rows created.

SQL>
SQL> begin dbms_stats.gather_table_stats(ownname=>USER,tabname=>'PARTITION_TEST',granularity=>'ALL'); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, partition_name, subpartition_name, num_rows from user_tab_statistics where table_name = 'PARTITION_TEST';

TABLE_NAMEִִִִִִִִִִִִִִִִִִִִִPARTITION_NAMEִִִִִִִִִִִִִִִִִSUBPARTITION_NAMEִִִִִִִִִִִִִִִִNUM_ROWS
------------------------------ִ------------------------------ִ------------------------------ִ----------
PARTITION_TESTִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ1010
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_0ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ0
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_1ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ1010
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_0ִִִִִִִִִִִִִִִִִִִִִִִִִPKEY_0_XXXִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ0
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_0ִִִִִִִִִִִִִִִִִִִִִִִִִPKEY_0_001ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ0
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_1ִִִִִִִִִִִִִִִִִִִִִִִִִPKEY_1_XXXִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ0
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_1ִִִִִִִִִִִִִִִִִִִִִִִִִPKEY_1_001ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ10
PARTITION_TESTִִִִִִִִִִִִִִִִִPKEY_1ִִִִִִִִִִִִִִִִִִִִִִִִִPKEY_1_101ִִִִִִִִִִִִִִִִִִִִִִִִִִִ1000


8 rows selected.


OK, data is in, and we have proper statistics generated. Now let's check some simple cardinality estimations when performing explicit (or implicit, doesn't really matter, the implicit pruning using where clauses might lead to different smaller cardinalities due to the additional filter predicates used to calculate the cardinality, but the main issue is still the same) partition pruning:

SQL>
SQL> explain plan for select data1 from partition_test subpartition (pkey_1_001);

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 315551227

--------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|ִCostִ(%CPU)|ִTimeִִִִִ|ִPstart|ִPstopִ|
--------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִ|ִִ1010ִ|ִִ4040ִ|ִִִִִ4ִִִ(0)|ִ00:00:01ִ|ִִִִִִִ|ִִִִִִִ|
|ִִִ1ִ|ִִPARTITIONִCOMBINEDִITERATOR|ִִִִִִִִִִִִִִִִ|ִִ1010ִ|ִִ4040ִ|ִִִִִ4ִִִ(0)|ִ00:00:01ִ|ִִִKEYִ|ִִִKEYִ|
|ִִִ2ִ|ִִִTABLEִACCESSִFULLִִִִִִִִִ|ִPARTITION_TESTִ|ִִ1010ִ|ִִ4040ִ|ִִִִִ4ִִִ(0)|ִ00:00:01ִ|ִִִִִ4ִ|ִִִִִ4ִ|
--------------------------------------------------------------------------------------------------------------

9 rows selected.

SQL>
SQL> select /*+ gather_plan_statistics */ data1 from partition_test subpartition (pkey_1_001);

ִִִִִDATA1
----------
ִִִִִִִִִ1
ִִִִִִִִִ2
ִִִִִִִִִ3
ִִִִִִִִִ4
ִִִִִִִִִ5
ִִִִִִִִִ6
ִִִִִִִִִ7
ִִִִִִִִִ8
ִִִִִִִִִ9
ִִִִִִִִ10


10 rows selected.


SQL>
SQL> select * from table(dbms_xplan.display_cursor(NULL, 0, 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8r0m2ku4f5vaw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ data1 from partition_test subpartition (pkey_1_001)

Plan hash value: 315551227

--------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִ|ִStartsִ|ִE-Rowsִ|ִA-Rowsִ|ִִִA-Timeִִִ|ִBuffersִ|
--------------------------------------------------------------------------------------------------------
|ִִִ1ִ|ִִPARTITIONִCOMBINEDִITERATOR|ִִִִִִִִִִִִִִִִ|ִִִִִִ1ִ|ִִִ1010ִ|ִִִִִ10ִ|00:00:00.01ִ|ִִִִִִִ8ִ|
|ִִִ2ִ|ִִִTABLEִACCESSִFULLִִִִִִִִִ|ִPARTITION_TESTִ|ִִִִִִ1ִ|ִִִ1010ִ|ִִִִִ10ִ|00:00:00.01ִ|ִִִִִִִ8ִ|
--------------------------------------------------------------------------------------------------------


13 rows selected.

SQL>

Wow, that's weird. What the heck is the optimizer doing here? Why does it refuse to use the obviously available subpartition statistics? Well, it would probably need a 10053 optimizer trace to get a clue, but I've not done that yet. By the way, the plans are not that detailed in 9iR2 (9.2.0.2 and 9.2.0.8 tested), but the results are the same. So this looks like a feature? Works as designed?

Here's a second testcase showing the impact of the issue. Now we create a similar table (range-list composite partitioning) and in addition a second table which represents our large join table mentioned above. Instead of actually loading a large amount of data I'll instead fake the optimizer statistics so that the tables look to the optimizer like being very large:


SQL>
SQL> drop table partition_test2;

Table dropped.

SQL>
SQL> create table partition_test2 (
2 x_pkey number not null,
3 x_slice varchar2(20) not null,
4 data1 number not null
5 )
6 partition by range (x_pkey)
7 subpartition by list (x_slice)
8 (
9 partition pkey_0 values less than (1)
10 (
11 subpartition pkey_0_xxx values (' xxx '),
12 subpartition pkey_0_001 values ('001')
13 ),
14 partition pkey_1 values less than (2)
15 (
16 subpartition pkey_1_xxx values (' xxx '),
17 subpartition pkey_1_001 values ('001'),
18 subpartition pkey_1_101 values ('101')
19 )
20 );

Table created.

SQL>
SQL> insert into partition_test2 (x_pkey, x_slice, data1)
2 select 1, '001', seq
3 from (
4 select level as seq from dual connect by level <= 10
5 );

10 rows created.

SQL>
SQL> insert into partition_test2 (x_pkey, x_slice, data1)
2 select 1, '101', seq
3 from (
4 select level as seq from dual connect by level <= 1000
5 );

1000 rows created.

SQL>
SQL> begin dbms_stats.gather_table_stats(ownname=>USER,tabname=>'PARTITION_TEST2',granularity=>'ALL'); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> begin dbms_stats.set_table_stats(ownname=>USER,tabname=>'PARTITION_TEST2',partname=>'PKEY_1_101',numrows=>1000000,numblks=>100000); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> begin dbms_stats.set_table_stats(ownname=>USER,tabname=>'PARTITION_TEST2',partname=>'PKEY_1',numrows=>1000010,numblks=>100005); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> begin dbms_stats.set_table_stats(ownname=>USER,tabname=>'PARTITION_TEST2',numrows=>1000010,numblks=>100005); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> declare
2 col_stat_rec dbms_stats.StatRec;
3 col_vals dbms_stats.numarray := dbms_stats.numarray(1, 1000000);
4 begin
5 col_stat_rec.epc := 2;
6 col_vals(1) := 1;
7 col_vals(2) := 1000000;
8 dbms_stats.prepare_column_values(col_stat_rec, col_vals);
9 dbms_stats.set_column_stats(ownname=>USER,tabname=>'PARTITION_TEST2',partname=>'PKEY_1_101',colname=>'DATA1',distcnt=>1000000,density=>1/1000000, srec=>col_stat_rec);
10 dbms_stats.set_column_stats(ownname=>USER,tabname=>'PARTITION_TEST2',partname=>'PKEY_1',colname=>'DATA1',distcnt=>1000000,density=>1/1000000, srec=>col_stat_rec);
11 dbms_stats.set_column_stats(ownname=>USER,tabname=>'PARTITION_TEST2',colname=>'DATA1',distcnt=>1000000,density=>1/1000000, srec=>col_stat_rec);
12 end;
13 /

PL/SQL procedure successfully completed.

SQL>
SQL> column table_name format a20
SQL> column index_name format a20
SQL> column partition_name format a10
SQL> column subpartition_name format a15
SQL> column column_name format a15
SQL> column plan_table_output format a300
SQL>
SQL> select table_name, partition_name, subpartition_name, num_rows, blocks from user_tab_statistics where table_name = 'PARTITION_TEST2';

TABLE_NAMEִִִִִִִִִִִPARTITION_ִSUBPARTITION_NAִִִNUM_ROWSִִִִִBLOCKS
--------------------ִ----------ִ---------------ִ----------ִ----------
PARTITION_TEST2ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ1000010ִִִִִ100005
PARTITION_TEST2ִִִִִִPKEY_0ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0
PARTITION_TEST2ִִִִִִPKEY_1ִִִִִִִִִִִִִִִִִִִִִִִִ1000010ִִִִִ100005
PARTITION_TEST2ִִִִִִPKEY_0ִִִִִPKEY_0_XXXִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0
PARTITION_TEST2ִִִִִִPKEY_0ִִִִִPKEY_0_001ִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0
PARTITION_TEST2ִִִִִִPKEY_1ִִִִִPKEY_1_XXXִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0
PARTITION_TEST2ִִִִִִPKEY_1ִִִִִPKEY_1_001ִִִִִִִִִִִִִִ10ִִִִִִִִִִ5
PARTITION_TEST2ִִִִִִPKEY_1ִִִִִPKEY_1_101ִִִִִִִִִ1000000ִִִִִ100000


8 rows selected.


SQL>
SQL> select column_name, num_distinct, density, num_nulls from user_tab_col_statistics where table_name = 'PARTITION_TEST2';

COLUMN_NAMEִִִִִNUM_DISTINCTִִִִDENSITYִִNUM_NULLS
---------------ִ------------ִ----------ִ----------
X_PKEYִִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
X_SLICEִִִִִִִִִִִִִִִִִִִִ2ִִִִִִִִִ.5ִִִִִִִִִִ0
DATA1ִִִִִִִִִִִִִִִִ1000000ִִִִ.000001ִִִִִִִִִִ0


SQL>
SQL> select partition_name, column_name, num_distinct, density, num_nulls from user_part_col_statistics where table_name = 'PARTITION_TEST2';

PARTITION_ִCOLUMN_NAMEִִִִִNUM_DISTINCTִִִִDENSITYִִNUM_NULLS
----------ִ---------------ִ------------ִ----------ִ----------
PKEY_0ִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0ִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0ִִִִִDATA1ִִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_1ִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
PKEY_1ִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ2ִִִִִִִִִ.5ִִִִִִִִִִ0
PKEY_1ִִִִִDATA1ִִִִִִִִִִִִִִִִ1000000ִִִִ.000001ִִִִִִִִִִ0


6 rows selected.

SQL>
SQL> select subpartition_name, column_name, num_distinct, density, num_nulls from user_subpart_col_statistics where table_name = 'PARTITION_TEST2';

SUBPARTITION_NAִCOLUMN_NAMEִִִִִNUM_DISTINCTִִִִDENSITYִִNUM_NULLS
---------------ִ---------------ִ------------ִ----------ִ----------
PKEY_1_101ִִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
PKEY_1_001ִִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
PKEY_1_XXXִִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_1_101ִִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
PKEY_1_001ִִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ1ִִִִִִִִִִ1ִִִִִִִִִִ0
PKEY_1_XXXִִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_1_101ִִִִִִDATA1ִִִִִִִִִִִִִִִִ1000000ִִִִ.000001ִִִִִִִִִִ0
PKEY_1_001ִִִִִִDATA1ִִִִִִִִִִִִִִִִִִִִִ10ִִִִִִִִִ.1ִִִִִִִִִִ0
PKEY_1_XXXִִִִִִDATA1ִִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_XXXִִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_001ִִִִִִX_PKEYִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_XXXִִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_001ִִִִִִX_SLICEִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_XXXִִִִִִDATA1ִִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0
PKEY_0_001ִִִִִִDATA1ִִִִִִִִִִִִִִִִִִִִִִ0ִִִִִִִִִִ0ִִִִִִִִִִ0


15 rows selected.

SQL>
SQL> drop table join_table;

Table dropped.

SQL>
SQL> create table join_table (
2 data1 number not null,
3 data2 number
4 );

Table created.

SQL>
SQL> insert into join_table (data1, data2)
2 select seq, seq
3 from (
4 select level as seq from dual connect by level <= 1000
5 );

1000 rows created.

SQL>
SQL> create unique index join_table_pk on join_table (data1);

Index created.

SQL>
SQL> begin dbms_stats.gather_table_stats(ownname=>USER,tabname=>'JOIN_TABLE', cascade=>true); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> begin dbms_stats.set_table_stats(ownname=>USER,tabname=>'JOIN_TABLE',numrows=>10000000,numblks=>1000000); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> begin dbms_stats.set_index_stats(ownname=>USER,indname=>'JOIN_TABLE_PK',numrows=>10000000,numlblks=>100000,numdist=>10000000,clstfct=>1000000,indlevel=>3); end;
2 /

PL/SQL procedure successfully completed.

SQL>
SQL> declare
2 col_stat_rec dbms_stats.StatRec;
3 col_vals dbms_stats.numarray := dbms_stats.numarray(1, 10000000);
4 begin
5 col_stat_rec.epc := 2;
6 col_vals(1) := 1;
7 col_vals(2) := 10000000;
8 dbms_stats.prepare_column_values(col_stat_rec, col_vals);
9 dbms_stats.set_column_stats(ownname=>USER,tabname=>'JOIN_TABLE',colname=>'DATA1',distcnt=>10000000,density=>1/10000000, srec=>col_stat_rec);
10 end;
11 /

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows, blocks from user_tab_statistics where table_name = 'JOIN_TABLE';

TABLE_NAMEִִִִִִִִִִִִִNUM_ROWSִִִִִBLOCKS
--------------------ִ----------ִ----------
JOIN_TABLEִִִִִִִִִִִִִ10000000ִִִִ1000000


SQL>
SQL> select column_name, num_distinct, density, num_nulls from user_tab_col_statistics where table_name = 'JOIN_TABLE';

COLUMN_NAMEִִִִִNUM_DISTINCTִִִִDENSITYִִNUM_NULLS
---------------ִ------------ִ----------ִ----------
DATA1ִִִִִִִִִִִִִִִ10000000ִִִ.0000001ִִִִִִִִִִ0
DATA2ִִִִִִִִִִִִִִִִִִִ1000ִִִִִִִ.001ִִִִִִִִִִ0


SQL>
SQL> select table_name, index_name, num_rows, leaf_blocks, clustering_factor, distinct_keys, blevel
2 from user_ind_statistics where table_name = 'JOIN_TABLE';

TABLE_NAMEִִִִִִִִִִִINDEX_NAMEִִִִִִִִִִִִִNUM_ROWSִLEAF_BLOCKSִCLUSTERING_FACTORִDISTINCT_KEYSִִִִִBLEVEL
--------------------ִ--------------------ִ----------ִ-----------ִ-----------------ִ-------------ִ----------
JOIN_TABLEִִִִִִִִִִִJOIN_TABLE_PKִִִִִִִִִִ10000000ִִִִִִ100000ִִִִִִִִִִִ1000000ִִִִִִ10000000ִִִִִִִִִִ3


SQL>

Now we have two tables:

The range-list partitioned table is supposed to have a subpartition holding 10 records and another one holding 1.000.000 records. The remaining ones are left empty.

The large join table is supposed to have 10.000.000 records and a unique index on the join column.

Here is the plan when joining the main range partition holding the 1.000.010 records (both subpartitions included) to the large table:

SQL>
SQL> explain plan for select a.data1, b.data2 from partition_test2 partition (pkey_1) a, join_table b where a.data1 = b.data1;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1688927127

-------------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|TempSpc|ִCostִ(%CPU)|ִTimeִִִִִ|ִPstart|ִPstopִ|
-------------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִִִ10M|ִִִִִִִ|ִִִ223Kִִ(1)|ִ00:44:42ִ|ִִִִִִִ|ִִִִִִִ|
|*ִִ1ִ|ִִHASHִJOINִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִִִ10M|ִִִִ15M|ִִִ223Kִִ(1)|ִ00:44:42ִ|ִִִִִִִ|ִִִִִִִ|
|ִִִ2ִ|ִִִPARTITIONִRANGEִSINGLE|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִ3906K|ִִִִִִִ|ִ19401ִִִ(1)|ִ00:03:53ִ|ִִִִִ2ִ|ִִִִִ2ִ|
|ִִִ3ִ|ִִִִPARTITIONִLISTִALLִִִ|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִ3906K|ִִִִִִִ|ִ19401ִִִ(1)|ִ00:03:53ִ|ִִִִִ1ִ|ִִִִִ3ִ|
|ִִִ4ִ|ִִִִִTABLEִACCESSִFULLִִִ|ִPARTITION_TEST2ִ|ִִ1000K|ִִ3906K|ִִִִִִִ|ִ19401ִִִ(1)|ִ00:03:53ִ|ִִִִִ3ִ|ִִִִִ5ִ|
|ִִִ5ִ|ִִִTABLEִACCESSִFULLִִִִִ|ִJOIN_TABLEִִִִִִ|ִִִִ10M|ִִִִ66M|ִִִִִִִ|ִִִ193Kִִ(1)|ִ00:38:48ִ|ִִִִִִִ|ִִִִִִִ|
-------------------------------------------------------------------------------------------------------------------

PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------
ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ
ִִִ1ִ-ִaccess("A"."DATA1"="B"."DATA1")ִִִִִִִִִִִִִ


17 rows selected.

This seems to make sense, two large sets to join, full table scans with hash join looks reasonable.

So let's join now the very small subpartition to this large table, as it might happen in our real world scenario:

SQL>
SQL> explain plan for select a.data1, b.data2 from partition_test2 subpartition (pkey_1_001) a, join_table b where a.data1 = b.data1;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1183211070

------------------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|TempSpc|ִCostִ(%CPU)|ִTimeִִִִִ|ִPstart|ִPstopִ|
------------------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִִִ10M|ִִִִִִִ|ִִִ223Kִִ(1)|ִ00:44:42ִ|ִִִִִִִ|ִִִִִִִ|
|*ִִ1ִ|ִִHASHִJOINִִִִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִִִ10M|ִִִִ15M|ִִִ223Kִִ(1)|ִ00:44:42ִ|ִִִִִִִ|ִִִִִִִ|
|ִִִ2ִ|ִִִPARTITIONִCOMBINEDִITERATOR|ִִִִִִִִִִִִִִִִִ|ִִ1000K|ִִ3906K|ִִִִִִִ|ִ19401ִִִ(1)|ִ00:03:53ִ|ִִִKEYִ|ִִִKEYִ|
|ִִִ3ִ|ִִִִTABLEִACCESSִFULLִִִִִִִִִ|ִPARTITION_TEST2ִ|ִִ1000K|ִִ3906K|ִִִִִִִ|ִ19401ִִִ(1)|ִ00:03:53ִ|ִִִִִ4ִ|ִִִִִ4ִ|
|ִִִ4ִ|ִִִTABLEִACCESSִFULLִִִִִִִִִִ|ִJOIN_TABLEִִִִִִ|ִִִִ10M|ִִִִ66M|ִִִִִִִ|ִִִ193Kִִ(1)|ִ00:38:48ִ|ִִִִִִִ|ִִִִִִִ|
------------------------------------------------------------------------------------------------------------------------


PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------
ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ
ִִִ1ִ-ִaccess("A"."DATA1"="B"."DATA1")ִִִִִִִִִִִִִ


16 rows selected.

SQL>

Again this weird "partition combined iterator, key, key", and again a single partition access shown in the full table scan. But yet again, it refuses to accept that this single subpartition - according to the statistics - just holds 10 rows rather than 1.000.000 rows. Therefore its conclusion is: the cheapest operation is hash join / full table scan, which really hurts for the big table.

Let's assume the optimizer would take the correct statistics, what would happen then to the plan? Here's a proposal:

SQL> explain plan for select /*+ cardinality(a, 10) */ a.data1, b.data2 from partition_test2 subpartition (pkey_1_001) a, join_table b where a.data1 = b.data1;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2056129735

----------------------------------------------------------------------------------------------------------------
|ִIdִִ|ִOperationִִִִִִִִִִִִִִִִִִִִ|ִNameִִִִִִִִִִִִ|ִRowsִִ|ִBytesִ|ִCostִ(%CPU)|ִTimeִִִִִ|ִPstart|ִPstopִ|
----------------------------------------------------------------------------------------------------------------
|ִִִ0ִ|ִSELECTִSTATEMENTִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִ10ִ|ִִִ110ִ|ִ19423ִִִ(1)|ִ00:03:54ִ|ִִִִִִִ|ִִִִִִִ|
|ִִִ1ִ|ִִNESTEDִLOOPSִִִִִִִִִִִִִִִִ|ִִִִִִִִִִִִִִִִִ|ִִִִ10ִ|ִִִ110ִ|ִ19423ִִִ(1)|ִ00:03:54ִ|ִִִִִִִ|ִִִִִִִ|
|ִִִ2ִ|ִִִPARTITIONִCOMBINEDִITERATOR|ִִִִִִִִִִִִִִִִִ|ִִִִ10ִ|ִִִִ40ִ|ִ19393ִִִ(1)|ִ00:03:53ִ|ִִִKEYִ|ִִִKEYִ|
|ִִִ3ִ|ִִִִTABLEִACCESSִFULLִִִִִִִִִ|ִPARTITION_TEST2ִ|ִִִִ10ִ|ִִִִ40ִ|ִ19393ִִִ(1)|ִ00:03:53ִ|ִִִִִ4ִ|ִִִִִ4ִ|
|ִִִ4ִ|ִִִTABLEִACCESSִBYִINDEXִROWID|ִJOIN_TABLEִִִִִִ|ִִִִִ1ִ|ִִִִִ7ִ|ִִִִִ3ִִִ(0)|ִ00:00:01ִ|ִִִִִִִ|ִִִִִִִ|
|*ִִ5ִ|ִִִִINDEXִUNIQUEִSCANִִִִִִִִִ|ִJOIN_TABLE_PKִִִ|ִִִִִ1ִ|ִִִִִִִ|ִִִִִ2ִִִ(0)|ִ00:00:01ִ|ִִִִִִִ|ִִִִִִִ|
----------------------------------------------------------------------------------------------------------------


PredicateִInformationִ(identifiedִbyִoperationִid):
---------------------------------------------------
ִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִִ
ִִִ5ִ-ִaccess("A"."DATA1"="B"."DATA1")ִִִִִִִִִִִִִ


17 rows selected.

SQL>

This looks quite promising, and if a similar hint is applied to our real world example mentioned in the beginning, the serial statement execution run time drops from 15 minutes to just a couple of seconds. This looks like a vast improvement.

The Service Request is still being worked on, so stay tuned what Oracle Support has to say about this.

Update (August 7th, 2007): Oracle has accepted this as a bug (BUG5996801) and it has already been fixed for the upcoming version 11g. They are currently planning to include this fix also in
the next patch-set release 10.2.0.4 for Oracle 10gR2.