Monday, October 17, 2011

Volatile Data, Dynamic Sampling And Shared Cursors

For the next couple of weeks I'll be picking up various random notes I've made during the sessions that I attended at OOW. This particular topic was also a problem recently discussed at one of my clients, so it's certainly worth publishing here.

In one of the optimizer-related sessions it was mentioned that for highly volatile data - as often found in Global Temporary Tables (GTTs), for example - it's recommended to use Dynamic Sampling rather than attempting to gather statistics. Gathering statistics is particularly problematic for GTTs because the statistics are used globally and shared across all sessions, yet each session's GTT could hold a completely different data volume and distribution, so sharing the statistics doesn't make sense in such scenarios.

So using Dynamic Sampling sounds like reasonable advice, and it probably is in many such cases.
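If you follow that advice it can also make sense to ensure that the GTT really stays without statistics - a minimal sketch (assuming a GTT called GTT_DYN, as used in the test case below):

-- Remove any existing statistics and lock them so that nothing
-- gathers statistics behind your back and the optimizer keeps
-- falling back to Dynamic Sampling
exec dbms_stats.delete_table_stats(user, 'GTT_DYN')

exec dbms_stats.lock_table_stats(user, 'GTT_DYN')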

However, there is still a potential problem even when resorting to Dynamic Sampling: if a cursor based on Dynamic Sampling gets shared between sessions, it won't be re-optimized, even if the GTT contents in one session are completely different from those in the session that originally created the shared cursor.

So you can still end up with shared cursors and execution plans that are inappropriate to share across the different sessions. Using Dynamic Sampling doesn't address this issue by itself - it only helps if the cursors do not get shared in the first place, for example because they use literals that differ, so that different cursors are generated based on the text matching.

Here is a simple test case that demonstrates the problem:


drop view v_gtt_dyn;

drop table gtt_dyn;

-- Create a Global Temporary Table with an index on it
create global temporary table gtt_dyn (
id number not null
, vc1 varchar2(100)
, filler varchar2(255)
)
on commit preserve rows
;

create index gtt_dyn_idx on gtt_dyn (id);

-- Create a simple view - it will become obvious later
-- why this has been used
create or replace view v_gtt_dyn as select * from gtt_dyn;

-- Run in Session 1
set echo on timing on

-- Unique value in ID column
insert into gtt_dyn
select
rownum as id
, rpad('x', 100) as vc1
, rpad('y', 255) as filler
from
dual
connect by level <= 10000
;

commit;

alter session set statistics_level = all;

alter session set optimizer_dynamic_sampling = 2;

select * from v_gtt_dyn where id = 10 and rownum > 1;

set linesize 200 pagesize 0 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Run in Session 2
set echo on timing on

-- Single value in ID column
insert into gtt_dyn
select
10 as id
, rpad('x', 100) as vc1
, rpad('y', 255) as filler
from
dual
connect by level <= 10000
;

commit;

alter session set statistics_level = all;

alter session set optimizer_dynamic_sampling = 2;

select * from v_gtt_dyn where id = 10 and rownum > 1;

set linesize 200 pagesize 0 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));


Now this is probably an extreme case of data distribution differences, but if you run it you'll see that it makes the point obvious: in the second session the data distribution of the GTT is completely different, and although there are no statistics gathered on the GTT - and hence Dynamic Sampling gets used to arrive at an execution plan - the plan gets shared in the second session (there is only a child number 0). But that plan is completely inappropriate for the data distribution of the GTT in that session; you just need to compare the E-Rows and A-Rows columns of the runtime profile:


SQL> select * from v_gtt_dyn where id = 10 and rownum > 1;

no rows selected

Elapsed: 00:00:00.07
SQL>
SQL> set linesize 200 pagesize 0 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID bjax3mksw1uza, child number 0
-------------------------------------
select * from v_gtt_dyn where id = 10 and rownum > 1

Plan hash value: 471827990

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.08 | 1117 |
| 1 | COUNT | | 1 | | 0 |00:00:00.08 | 1117 |
|* 2 | FILTER | | 1 | | 0 |00:00:00.08 | 1117 |
| 3 | TABLE ACCESS BY INDEX ROWID| GTT_DYN | 1 | 1 | 10000 |00:00:00.06 | 1117 |
|* 4 | INDEX RANGE SCAN | GTT_DYN_IDX | 1 | 1 | 10000 |00:00:00.02 | 63 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
4 - access("ID"=10)

Note
-----
- dynamic sampling used for this statement (level=2)
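
You can confirm that both sessions picked up the same child cursor by querying V$SQL, for example like this:

-- One parent and a single child cursor means the plan was shared
select
sql_id
, child_number
, plan_hash_value
, executions
from
v$sql
where
sql_text like 'select * from v_gtt_dyn%'
;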



Imagine a more complex plan with joins and a larger data volume and this is a recipe for disaster.

If this problem cannot be addressed from the application side by helping the database to generate different cursors for the different data distributions (for example by simply adding different predicates that don't change the result, like 1 = 1, 2 = 2 etc.), then you might be able to handle the issue by using Virtual Private Database (VPD, a.k.a. Row Level Security / RLS, Fine Grained Access Control / FGAC). I've already demonstrated the general approach in the past here, but in this case a slightly more sophisticated approach could make sense.
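Just to illustrate the application-side variant: the two statements below differ only in a redundant literal predicate, so text matching generates two separate parent cursors that get optimized independently:

-- Different redundant predicates per "session type" unshare the cursors
select * from v_gtt_dyn where id = 10 and rownum > 1 and 1 = 1;

select * from v_gtt_dyn where id = 10 and rownum > 1 and 2 = 2;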

By adding the following code and RLS policy I can make Oracle perform a re-optimization only in those cases where it is appropriate. This limits the damage that the general approach - unconditionally generating potentially numerous child cursors - does to the Shared Pool.


create or replace package pkg_rls_force_hard_parse is
function force_hard_parse (in_schema varchar2, in_object varchar2) return varchar2;
end pkg_rls_force_hard_parse;
/

create or replace package body pkg_rls_force_hard_parse is
-- Cache the count in session state
g_cnt number;

function force_hard_parse (in_schema varchar2, in_object varchar2) return varchar2
is
s_predicate varchar2(100);
begin
-- Only execute query once in session
-- Change if re-evaluation is desired
if g_cnt is null then
select
count(*)
into
g_cnt
from
gtt_dyn
where
id = 10
and rownum <= 10;
end if;

-- We end up with exactly two child cursors
-- with the desired different plans
-- These child cursors will be shared accordingly
if g_cnt > 1 then
s_predicate := '1 = 1';
else
s_predicate := '2 = 2';
end if;

return s_predicate;
end force_hard_parse;
end pkg_rls_force_hard_parse;
/

-- CONTEXT_SENSITIVE avoids re-evaluation of policy function at execution time
-- Note however that it doesn't avoid re-evaluation at parse time
exec DBMS_RLS.ADD_POLICY (USER, 'v_gtt_dyn', 'hard_parse_policy', USER, 'pkg_rls_force_hard_parse.force_hard_parse', 'select', policy_type => DBMS_RLS.CONTEXT_SENSITIVE);


Now if you repeat the above exercise - ideally with SQL trace enabled to see the additional activity caused by the RLS policy - you'll notice that the different sessions end up with different child cursors and execution plans, based on the check made.
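For the SQL trace part something like this should do (assuming execute privileges on DBMS_MONITOR):

-- Enable extended SQL trace for the current session
exec dbms_monitor.session_trace_enable(waits => true, binds => true)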

Now the reason why the view is in place should become obvious: an RLS policy on the base table would have led to infinite recursive execution of the RLS policy function, due to the query performed within the function itself. There are other options for dealing with that - for example, storing the RLS policy function in a separate schema that has the EXEMPT ACCESS POLICY privilege should also work.
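Just as a sketch of that alternative - the RLS_OWNER schema name is made up for illustration, and a DBA needs to grant the privilege:

-- Queries issued by the owner of this system privilege bypass RLS
-- policies, which avoids the recursion without the intermediate view
grant exempt access policy to rls_owner;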

This is the result in the second session now:


SQL> select * from v_gtt_dyn where id = 10 and rownum > 1;

no rows selected

Elapsed: 00:00:00.12
SQL>
SQL> set linesize 200 pagesize 0 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID bjax3mksw1uza, child number 1
-------------------------------------
select * from v_gtt_dyn where id = 10 and rownum > 1

Plan hash value: 424976618

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.04 | 1003 |
| 1 | COUNT | | 1 | | 0 |00:00:00.04 | 1003 |
|* 2 | FILTER | | 1 | | 0 |00:00:00.04 | 1003 |
|* 3 | TABLE ACCESS FULL| GTT_DYN | 1 | 9288 | 10000 |00:00:00.03 | 1003 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM>1)
3 - filter("ID"=10)

Note
-----
- dynamic sampling used for this statement (level=2)


Notice how a second child cursor has been generated and that the cardinality estimate is now much closer to reality.
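If you're curious about the reason for the new child cursor, V$SQL_SHARED_CURSOR can be consulted - with an RLS policy in place I would expect the AUTH_CHECK_MISMATCH column to be flagged, although the exact reason reported may vary by version:

-- Check why the child cursor was not shared
select
child_number
, auth_check_mismatch
from
v$sql_shared_cursor
where
sql_id = 'bjax3mksw1uza'
;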

Adaptive Cursor Sharing / Cardinality Feedback

I was curious to see if recent features like Adaptive Cursor Sharing or Cardinality Feedback would be able to solve the issue when using the 11g releases.

Cardinality Feedback (introduced in 11.2) unfortunately doesn't get used in the scenario described here, because Dynamic Sampling disables Cardinality Feedback in the current implementation.
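As of 11.2 this can be cross-checked via the USE_FEEDBACK_STATS column of V$SQL_SHARED_CURSOR - in this scenario I would expect it to remain at 'N' (substitute the SQL_ID in question):

-- 'Y' would indicate a child cursor created due to Cardinality Feedback
select
child_number
, use_feedback_stats
from
v$sql_shared_cursor
where
sql_id = '&sql_id'
;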

Note that the usage of bind variables also disables Cardinality Feedback for those parts of the plan affected by the bind variables - as described in the Optimizer blog post that can be found here.

So maybe Adaptive Cursor Sharing (ACS, introduced in 11.1) can come to the rescue when bind variables get used.

Of course the usage of bind variables increases the probability of cursor sharing in the above scenario. As already outlined in a previous note, ACS is a "reactive" and "non-persistent" feature, so it will only be able to correct things that have already gone wrong at least once. Furthermore, if the ACS information gets aged out of the Shared Pool, the mistakes will have to be repeated before ACS recognizes them again.

However, it is interesting to note that I wasn't able to get ACS working in a slightly modified scenario like the following (without the RLS policy in place, of course):


-- Session 1
set echo on timing on

set linesize 200 pagesize 0 trimspool on tab off

insert into gtt_dyn
select
rownum as id
, rpad('x', 100) as vc1
, rpad('y', 255) as filler
from
dual
connect by level <= 10000
;

commit;

alter session set statistics_level = all;

variable n1 number

exec :n1 := 10

select * from v_gtt_dyn where id <= :n1 and rownum > 1;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Session 2
set echo on timing on

set linesize 200 pagesize 0 trimspool on tab off

insert into gtt_dyn
select
10 as id
, rpad('x', 100) as vc1
, rpad('y', 255) as filler
from
dual
connect by level <= 10000
;

commit;

alter session set statistics_level = all;

variable n1 number

exec :n1 := 10

select * from v_gtt_dyn where id <= :n1 and rownum > 1;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Second execution to allow ACS potentially kicking in
select * from v_gtt_dyn where id <= :n1 and rownum > 1;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));


There are some interesting points to notice when running this example:

1. A cursor that uses non-equality operators like the above "less than or equal" together with bind variables usually gets marked as bind-sensitive and will be monitored by ACS. But in this case the cursor was not marked as bind-sensitive, and hence ACS didn't even bother to monitor it (this can be verified with the query shown after this list)

2. Consequently the two sessions share the single child cursor and the problem is not addressed by ACS even in subsequent executions

3. It looks like, once again, the usage of Dynamic Sampling disables ACS
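
The bind sensitivity status can be verified in V$SQL, for example like this:

-- Flags maintained by ACS - here the cursor stays not bind-sensitive
select
sql_id
, child_number
, is_bind_sensitive
, is_bind_aware
from
v$sql
where
sql_text like 'select * from v_gtt_dyn where id <= :n1%'
;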

Looking at the way ACS manages the cursor sharing criteria (check V$SQL_CS_SELECTIVITY, for example), I see the problem that ACS probably wouldn't be able to cope with the fact that the same value for the bind variable results in completely different selectivity ranges.
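For reference, this is how the selectivity ranges tracked by ACS can be inspected (substitute the SQL_ID of the bind variable version of the query):

-- Selectivity ranges maintained by ACS per child cursor
select
child_number
, predicate
, low
, high
from
v$sql_cs_selectivity
where
sql_id = '&sql_id'
;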

Maybe this is an explanation why ACS is not activated for cursors that use Dynamic Sampling - ACS may only be able to cope with different bind value ranges that lead to different selectivities.

So even when using bind variables on 11g with ACS, it looks like only the RLS policy approach allows this issue to be addressed from the database side alone. Ideally the application should be "data-aware" in such cases and help the database arrive at reasonable execution plans by actively unsharing the cursors.

2 comments:

  1. Great post - I always have this problem with GTTs

  2. Hi Randolf,

    Another solution would be to gather statistics on the GTT.
    Then we could force the use of Dynamic Sampling with a DYNAMIC_SAMPLING hint at level 6, and invalidate the cursor before the query is executed by calling dbms_stats.set_table_stats with NO_INVALIDATE => FALSE.

    A FORCE_HARD_PARSE hint would be so useful :-)
