Thursday, February 19, 2015

12c Parallel Execution New Features: Hybrid Hash Distribution - Part 2

In the second part of this post (go to part 1) I want to focus on the hybrid distribution for skewed join expressions.

2. Hybrid Distribution For Skewed Join Expressions


The HYBRID HASH distribution allows addressing, to some degree, data distribution skew in the case of HASH distributions, something I've already described in detail in the past. A summary post that links to all other relevant articles regarding Parallel Execution Skew can be found here, an overview of the relevant feature can be found here and a detailed description can be found here.

Another side effect of the truly hybrid distribution in case of skew (a mixture of BROADCAST / HASH for one row source and ROUND-ROBIN / HASH for the other row source) is that HASH distributions following such a hybrid distribution need to redistribute the data again, even if the following joins use the same join / distribution keys. If these were regular HASH distributions, the data would already be suitably distributed and no further redistribution would be required.

Here's an example of this, using the test case setup mentioned here:
-- Here the HYBRID SKEW distribution works for B->C
-- But the (B->C)->A join is affected by the same skew
-- So the HASH re-distribution of the resulting B.ID is skewed, too
-- And hence the HASH JOIN/SORT AGGREGATE (operation 4+5) are affected by the skew
-- The big question is: Why is there a re-distribution (operation 12+11)?
-- The data is already distributed on B.ID??
-- If there wasn't a re-distribution no skew would happen
-- In 11.2 no-redistribution happens no matter if C is probe or hash row source
-- So it looks like a side-effect of the hybrid distribution
-- Which makes sense as it is not really HASH distributed, but hybrid
select count(t_2_filler) from (
select  /*+ monitor
            leading(b c a)
            use_hash(c a)
            swap_join_inputs(a)
            no_swap_join_inputs(c)
            pq_distribute(a hash hash)
            pq_distribute(c hash hash)
            --optimizer_features_enable('11.2.0.4')
            pq_skew(c) 
        */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     a.id = b.id
);

-- 11.2 plan
----------------------------------------------------------------------------
| Id  | Operation                  | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |        |      |            |
|   1 |  SORT AGGREGATE            |          |        |      |            |
|   2 |   PX COORDINATOR           |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)     | :TQ10003 |  Q1,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE         |          |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN             |          |  Q1,03 | PCWP |            |
|   6 |       PX RECEIVE           |          |  Q1,03 | PCWP |            |
|   7 |        PX SEND HASH        | :TQ10000 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR  |          |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL | T_1      |  Q1,00 | PCWP |            |
|* 10 |       HASH JOIN            |          |  Q1,03 | PCWP |            |
|  11 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  12 |         PX SEND HASH       | :TQ10001 |  Q1,01 | P->P | HASH       |
|  13 |          PX BLOCK ITERATOR |          |  Q1,01 | PCWC |            |
|  14 |           TABLE ACCESS FULL| T_1      |  Q1,01 | PCWP |            |
|  15 |        PX RECEIVE          |          |  Q1,03 | PCWP |            |
|  16 |         PX SEND HASH       | :TQ10002 |  Q1,02 | P->P | HASH       |
|  17 |          PX BLOCK ITERATOR |          |  Q1,02 | PCWC |            |
|  18 |           TABLE ACCESS FULL| T_2      |  Q1,02 | PCWP |            |
----------------------------------------------------------------------------

-- 12.1 plan
-------------------------------------------------------------------------------------
| Id  | Operation                           | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |        |      |            |
|   1 |  SORT AGGREGATE                     |          |        |      |            |
|   2 |   PX COORDINATOR                    |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)              | :TQ10004 |  Q1,04 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                  |          |  Q1,04 | PCWP |            |
|*  5 |      HASH JOIN                      |          |  Q1,04 | PCWP |            |
|   6 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|   7 |        PX SEND HYBRID HASH          | :TQ10002 |  Q1,02 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR        |          |  Q1,02 | PCWC |            |
|   9 |          PX BLOCK ITERATOR          |          |  Q1,02 | PCWC |            |
|  10 |           TABLE ACCESS FULL         | T_1      |  Q1,02 | PCWP |            |
|  11 |       PX RECEIVE                    |          |  Q1,04 | PCWP |            |
|  12 |        PX SEND HYBRID HASH          | :TQ10003 |  Q1,03 | P->P | HYBRID HASH|
|* 13 |         HASH JOIN BUFFERED          |          |  Q1,03 | PCWP |            |
|  14 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  15 |           PX SEND HYBRID HASH       | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|  16 |            STATISTICS COLLECTOR     |          |  Q1,00 | PCWC |            |
|  17 |             PX BLOCK ITERATOR       |          |  Q1,00 | PCWC |            |
|  18 |              TABLE ACCESS FULL      | T_1      |  Q1,00 | PCWP |            |
|  19 |          PX RECEIVE                 |          |  Q1,03 | PCWP |            |
|  20 |           PX SEND HYBRID HASH (SKEW)| :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  21 |            PX BLOCK ITERATOR        |          |  Q1,01 | PCWC |            |
|  22 |             TABLE ACCESS FULL       | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------

Note that both joins, the one to A and the one to C, are based on B.ID. As you can see from the 11.2 plan, the final hash join (operation ID 5) therefore doesn't need the output of the previous hash join (operation ID 10) redistributed, since the data is already distributed in a suitable way (as a consequence both joins will be affected by skewed values in T_2.FK_ID_SKEW, but no BUFFERED join variant is required).

Now look at the 12c plan when SKEW is detected: since the SKEW handling in fact leads to a potential mixture of HASH / BROADCAST and HASH / ROUND-ROBIN distribution, the data gets redistributed again for the final join (operation ID 11 + 12), which has several bad side effects. First, it adds the overhead of an additional redistribution, and as a side effect this turns one of the hash joins into its BUFFERED variant. Furthermore, since the SKEW distribution (at present) is only supported if the right side of the join is a table (and not the result of another join), this following join will actually be affected by the very skew that was just addressed by the special SKEW handling in the join before (assuming the HYBRID HASH distributions in operation ID 6+7 / 11+12 operate in HASH / HASH, not BROADCAST / ROUND-ROBIN mode)...
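
As a side note, the test query above already contains a commented-out OPTIMIZER_FEATURES_ENABLE hint. Enabling it (and dropping the PQ_SKEW hint) should reproduce the 11.2 plan shape on 12c for comparison; a sketch of that variant, not verified here:

select count(t_2_filler) from (
select  /*+ monitor
            leading(b c a)
            use_hash(c a)
            swap_join_inputs(a)
            no_swap_join_inputs(c)
            pq_distribute(a hash hash)
            pq_distribute(c hash hash)
            optimizer_features_enable('11.2.0.4')
        */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     a.id = b.id
);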

Monday, February 16, 2015

12c Parallel Execution New Features: Hybrid Hash Distribution - Part 1

In this blog post I want to cover some aspects of the new HYBRID HASH adaptive distribution method that I haven't covered yet in my other posts.

As far as I know it serves two purposes for parallel HASH and MERGE JOINs: adaptive broadcast distribution and hybrid distribution for skewed join expressions. In the first part of this post I want to focus on the former (go to part 2).

1. Adaptive Broadcast Distribution For Small Left Row Sources


It allows the PX SEND / RECEIVE operation for the left row source of the hash join (the one estimated to be smaller) to decide dynamically at runtime, actually at each execution, whether it should use a BROADCAST or a HASH distribution, and correspondingly whether the other row source should then use a ROUND-ROBIN or a HASH distribution. This is described for example in the corresponding white paper by Maria Colgan here.

It's important to emphasize that this decision is really made at each execution of the same cursor, so the same cursor can do a BROADCAST distribution for the left row source at one execution and a HASH distribution at another execution, depending on whether the number of rows detected by the STATISTICS COLLECTOR operator exceeds the threshold or not. This is different from the behaviour of "adaptive joins", where the final plan gets resolved at the first execution and is re-used from then on, so a STATISTICS COLLECTOR operator that is part of an adaptive plan will no longer be evaluated after the first execution.

Here is a simple script demonstrating that the distribution method is evaluated at each execution:
define dop = 4

create table t_1
compress
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(&dop*2) */ * from dual
connect by
        level <= &dop*2) a
;

exec dbms_stats.gather_table_stats(null, 't_1', method_opt=>'for all columns size 1')

create table t_2
compress
as
select
        rownum as id
      , mod(rownum, &dop) + 1 as fk_id
      , rpad('x', 100) as filler
from
        (select /*+ cardinality(1e5) */ * from dual
connect by
        level <= 1e5) a
;

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1')

alter table t_1 parallel &dop cache;

alter table t_2 parallel &dop cache;

select /*+ leading(t_1) no_swap_join_inputs(t_2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;

@pqstat

delete from t_1 where rownum <= 1;

select count(*) from t_1;

select /*+ leading(t_1) no_swap_join_inputs(t_2) pq_distribute(t_2 hash hash) */ max(t_2.id) from t_1, t_2 where t_1.id = t_2.fk_id;

@pqstat

rollback;

For table queue 0 (the distribution of T_1), the distribution for the first execution of the above script looks like this:
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH     
---------- ---------- ---------- -------- ---------- ---------- ----------
         0 Producer            1 P004              8        100 ##########
                                 P005              0          0           
                                 P006              0          0           
                                 P007              0          0           
           ********** **********          ----------
           Total                                   8

           Consumer            1 P000              3         38 ##########
                                 P001              1         13 ###       
                                 P002              2         25 #######   
                                 P003              2         25 #######   
           ********** **********          ----------
           Total                                   8

So the eight rows are presumably distributed by hash. But for the second execution, with only seven rows in T_1, I get this output:
     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH
---------- ---------- ---------- -------- ---------- ---------- ----------
         0 Producer            1 P004             28        100 ##########
                                 P005              0          0
                                 P006              0          0
                                 P007              0          0
           ********** **********          ----------
           Total                                  28

           Consumer            1 P000              7         25 ##########
                                 P001              7         25 ##########
                                 P002              7         25 ##########
                                 P003              7         25 ##########
           ********** **********          ----------
           Total                                  28

So this time the seven rows were broadcast: each of the four consumers received all seven rows, which is why the producer side shows 28 rows.

The "pqstat" script is simply a query on V$PQ_TQSTAT, which I've mentioned for example here.

So I ran the same query twice: the first time the number of rows reached the threshold and a HASH distribution took place. After deleting one row, the second execution of the same cursor turned into a BROADCAST / ROUND-ROBIN distribution. You can verify that this is the same parent / child cursor via DBMS_XPLAN.DISPLAY_CURSOR / V$SQL. Real-Time SQL Monitoring can also provide more details about the distribution methods used (click on the "binoculars" icon in the "Other" column of the active report for the PX SEND HYBRID HASH operations).
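
A minimal sketch of such a check (the SQL_TEXT filter is just one way of locating the cursor and assumes the hint text from the script above):

-- Same parent / child cursor for both executions: expect a single child
-- with EXECUTIONS = 2 and an unchanged PLAN_HASH_VALUE.
select  sql_id
      , child_number
      , executions
      , plan_hash_value
from    v$sql
where   sql_text like 'select /*+ leading(t_1)%';

-- Display the runtime plan of that child cursor.
select  *
from    table(dbms_xplan.display_cursor('&sql_id', 0));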

Note that the dynamic switch from HASH to BROADCAST unfortunately isn't the same as the optimizer deciding at parse time to use a BROADCAST distribution, because in such a case the other row source won't be distributed at all, which comes with some important side effects:

Not only can the redistribution of larger row sources take significant time and resources (CPU, and in case of RAC, network), but there is also the (in 12c still existing) limitation of Parallel Execution that only a single redistribution is allowed to be active concurrently. Because of that limitation, reducing the number of redistributions in the plan can, simply as a side effect, reduce the number of BUFFERED operations (mostly HASH JOIN BUFFERED, but possibly additional BUFFER SORTs, too), which are a threat to Parallel Execution performance in general.

Here is a very simple example showing the difference:

-- HYBRID HASH with possible BROADCAST distribution of T_1
----------------------------------------------------------------------------
| Id  | Operation                  | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |        |      |            |
|   1 |  PX COORDINATOR            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)      | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED      |          |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE             |          |  Q1,02 | PCWP |            |
|   5 |      PX SEND HYBRID HASH   | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|   6 |       STATISTICS COLLECTOR |          |  Q1,00 | PCWC |            |
|   7 |        PX BLOCK ITERATOR   |          |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL  | T_1      |  Q1,00 | PCWP |            |
|   9 |     PX RECEIVE             |          |  Q1,02 | PCWP |            |
|  10 |      PX SEND HYBRID HASH   | :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  11 |       PX BLOCK ITERATOR    |          |  Q1,01 | PCWC |            |
|  12 |        TABLE ACCESS FULL   | T_2      |  Q1,01 | PCWP |            |
----------------------------------------------------------------------------

-- TRUE BROADCAST of T_1
-------------------------------------------------------------------------
| Id  | Operation               | Name     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |        |      |            |
|   1 |  PX COORDINATOR         |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN            |          |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  Q1,01 | PCWP |            |
|   5 |      PX SEND BROADCAST  | :TQ10000 |  Q1,00 | P->P | BROADCAST  |
|   6 |       PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T_1      |  Q1,00 | PCWP |            |
|   8 |     PX BLOCK ITERATOR   |          |  Q1,01 | PCWC |            |
|   9 |      TABLE ACCESS FULL  | T_2      |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------

So even if in the first plan the T_1 row source really has less than 2*DOP rows and the HYBRID HASH distribution turns into a BROADCAST distribution, this doesn't change the overall plan shape generated by the optimizer. The second HYBRID HASH distribution won't be skipped, but will turn into a ROUND-ROBIN distribution instead, which can be confirmed by looking at the output from V$PQ_TQSTAT, for example. So the data of the second row source still needs to be distributed, and hence the HASH JOIN will operate as a BUFFERED join due to the plan shape and the limitation that only a single PX SEND / RECEIVE pair can be active at the same time.

In the second plan the BROADCAST distribution of T_1 means that T_2 will not be re-distributed, hence there is no need to operate the HASH JOIN in buffered mode.
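
For reference, a minimal sketch (an assumption, not taken from the original test case) of how the two plan shapes above could be requested via hints for the same join:

-- 1. HYBRID HASH distribution of both row sources (as used in the script above)
select /*+ leading(t_1) no_swap_join_inputs(t_2) pq_distribute(t_2 hash hash) */
       max(t_2.id)
from   t_1, t_2
where  t_1.id = t_2.fk_id;

-- 2. True BROADCAST of T_1 decided at parse time, T_2 not redistributed at all
select /*+ leading(t_1) no_swap_join_inputs(t_2) pq_distribute(t_2 broadcast none) */
       max(t_2.id)
from   t_1, t_2
where  t_1.id = t_2.fk_id;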

So the only purpose of this particular adaptive HYBRID HASH distribution is obviously to avoid skew if there are only a couple of rows (and hence possible join key values) in the left row source, because a HASH distribution based on such a low number of distinct values doesn't work well. Oracle's algorithm needs a certain number of distinct values, otherwise it can end up with a bad distribution: with a DOP of 4 and only two distinct join key values, for example, a HASH distribution can use at most two of the four receiving PX servers, no matter how the hash function maps those values. This probably also explains why the threshold of 2*DOP was chosen so low.
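
As a toy illustration of that last point (ORA_HASH just stands in here for whatever internal hash function is actually used, which is an assumption): two distinct key values can never occupy more than two out of DOP buckets.

-- Two distinct join key values hashed into 4 buckets (0..3, i.e. DOP 4)
-- can hit at most two buckets, leaving at least two PX servers without data.
select  id
      , ora_hash(id, 3) as bucket
from   (select 1 as id from dual
        union all
        select 2 as id from dual);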