Thursday, January 31, 2013

Exadata Smart Scan Projection Limitation

Here is an interesting limitation of Exadata Smart Scans - if more than 254 columns from a table (not HCC compressed, more on that in a moment) need to be projected, Smart Scans for that particular segment will be disabled and Exadata will fall back to conventional I/O. This means that the number of columns in the projection clause can make a significant difference to performance, since only Smart Scans allow you to take advantage of offloading, and in particular of avoiding I/O via Storage Indexes.

Now the expression "254 columns" might ring a bell, since it is the maximum number of columns that Oracle can store in a single row piece - tables consisting of more than 254 columns will have to be stored in multiple row pieces.
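
In case you want to check whether you have such candidates in your database, a simple (and simplified - it only counts the visible columns listed in the dictionary) query like the following could be used:

select
        owner
      , table_name
      , count(*) as column_count
from
        dba_tab_columns
group by
        owner
      , table_name
having
        count(*) > 254
order by
        column_count desc;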

However, what I'm talking about here is not related to such columns residing in different row pieces of a row - Smart Scans still happily work even if columns from different row pieces are projected (something that was subject to several bugs in the past). You might, however, end up with additional "cell single block reads" in the case of truly chained rows, rather than just the additional logical I/O for picking up the different row pieces from the same block, also sometimes called "intra-block" chaining.
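
If you want to see whether such additional single block reads actually happen for your data, a quick sanity check (just a sketch, using the same kind of session statistics query as in the transcripts below) is to look at the "table fetch continued row" statistic before and after running the query:

select a.name, b.value
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
and a.name = 'table fetch continued row';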

No, the limitation simply seems to be that Smart Scans - broadly speaking and ignoring edge cases - can only transport a maximum of 254 columns from a single (non-HCC) segment. Requesting more columns will simply disable Smart Scans for that segment.
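
If you want to check after the fact whether a particular cursor was actually offloaded, one way (just a sketch - these columns are available in 11.2) is to look at the I/O related columns of V$SQL; zero offload-eligible bytes for a large full segment scan suggests that the Smart Scan was not used:

select
        sql_id
      , io_cell_offload_eligible_bytes
      , io_interconnect_bytes
from
        v$sql
where
        sql_id = '&sql_id';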

Now you might say that offloading, and in particular offloading column projection, isn't that relevant if you select that many columns from a table anyway, but the point is that you lose the ability to benefit from Storage Indexes and from transporting only the relevant rows back to the compute nodes.

Both features can speed up the processing significantly, in particular if the number of rows selected is only a fraction of the total number of rows, and/or the cells could avoid a significant amount of I/O via Storage Indexes.

To demonstrate the point I've put together a simple test case that generates a test table with more than 254 columns - the script below generates a table of approx. 40GB uncompressed size, so that a significant difference in performance can be measured.

set echo on timing on time on

-- MAX is 999, there is an ID column as first col
define num_cols = 300

define compression = nocompress
--define compression = "compress for query low"

drop table many_cols_rg;

purge table many_cols_rg;

declare
  s_sql varchar2(32767);
begin
  s_sql := q'!
create table many_cols_rg pctfree 0
&compression
parallel nologging
as
with generator1 as
(
select  /*+ materialize cardinality(1000) */
        rownum as id
        -- this makes the rowsource wider otherwise PX BLOCK ITERATOR has problems properly spreading work among the slaves
      , rpad('x', 4000) as padding
from
        dual
connect by
        level <= 1e3
),
generator2 as
(
select  /*+ materialize cardinality(10000) */
        rownum as id
        -- this makes the rowsource wider otherwise PX BLOCK ITERATOR has problems properly spreading work among the slaves
      , rpad('x', 4000) as padding
from
        dual
connect by
        level <= 1e4
)
select
        num_id as id
!';
  for i in 1..&num_cols loop
    s_sql := s_sql || ', char_id as col' || to_char(i, 'FM000');
  end loop;
  s_sql := s_sql || q'!
from
        (
        select  /*+ no_merge */
                a.id + (b.id - 1) * 1e3 as num_id
              , cast(to_char(a.id + (b.id - 1) * 1e3, 'FM0000000000') as varchar2(10)) as char_id
        from
                generator1 a
              , generator2 b
        )
  !';
  execute immediate s_sql;
end;
/

exec dbms_stats.gather_table_stats(null, 'many_cols_rg')

Assuming a Storage Index was generated on the ID column, a query like the following (using Parallel Query at a DOP of 32 in the test runs here) can benefit from offloading, since in principle all I/O can be avoided via the Storage Index, and virtually no data needs to be transported from the cells to the compute nodes.

Note that it projects exactly 254 columns from different row pieces.
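
The select lists in the following transcripts are truncated for readability - a small helper like this (assuming the col001..col300 naming from the generation script above) can be used to generate the first 253 column names instead of typing them manually:

select
        listagg('col' || to_char(level, 'FM000'), ',' || chr(10))
          within group (order by level) as select_list
from
        dual
connect by
        level <= 253;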

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

11:28:22 SQL>
11:28:22 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan                0

Elapsed: 00:00:00.00
11:28:22 SQL>
11:28:22 SQL> select
11:28:22   2  col001,
11:28:22   3  col002,
.
.
.
11:28:23 254  col253,
11:28:23 255  col300/*,
11:28:23 256  col254*/
11:28:23 257  from many_cols_rg where id between -2 and -1;

no rows selected

Elapsed: 00:00:02.40
11:28:25 SQL>
11:28:25 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    2,1195E+10
cell physical IO interconnect bytes returned by smart scan          3000784

Elapsed: 00:00:00.01
11:28:25 SQL>

As you can see from the snippet, it took less than 2.5 seconds to run the query against the 40GB segment, and the session statistics report roughly 20GB of I/O avoided via the Storage Index (which seems to be an instrumentation bug, as it always reports at most 50% of the total segment size; this output was taken from 11.2.0.2 Exadata BP14). Furthermore, only a couple of MB were exchanged between the cells and the compute nodes.

The corresponding Real-Time SQL Monitoring report confirms the "Smart Scan":


Increasing the number of columns projected from the segment above 254 (and, as outlined above, it doesn't matter which row pieces these columns come from) disables the Smart Scan, and it takes more than 9 seconds to run essentially the same query, pumping all 40GB through the compute nodes to filter all rows.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

11:29:14 SQL>
11:29:14 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan                0

Elapsed: 00:00:00.00
11:29:14 SQL>
11:29:14 SQL> select
11:29:14   2  col001,
11:29:14   3  col002,
.
.
.
11:29:15 254  col253,
11:29:15 255  col300,
11:29:15 256  col254
11:29:15 257  from many_cols_rg where id between -2 and -1;

no rows selected

Elapsed: 00:00:09.22
11:29:24 SQL>
11:29:24 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan                0

Elapsed: 00:00:00.01
11:29:24 SQL>

The corresponding Real-Time SQL Monitoring report confirms the fallback to "direct path reads":


Funnily enough, in this deliberately crafted, extreme case it is much faster to access the segment twice and pick up the remaining columns via a self-join in order to benefit from the offloading features - it only takes 4.7 seconds to run the self-join, and the session statistics confirm that both segment scans could leverage offloading and in particular avoid I/O via Storage Indexes:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

11:29:37 SQL>
11:29:37 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan                0

Elapsed: 00:00:00.00
11:29:37 SQL>
11:29:37 SQL> select
11:29:37   2  a.col001,
11:29:37   3  a.col002,
.
.
.
11:29:37 254  a.col253,
11:29:37 255  a.col300,
11:29:37 256  b.col254
11:29:37 257  from many_cols_rg a, many_cols_rg b
11:29:37 258  where a.id between -2 and -1 and b.id between -2 and -1
11:29:37 259  and a.id = b.id;

no rows selected

Elapsed: 00:00:04.77
11:29:42 SQL>
11:29:42 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    4,2390E+10
cell physical IO interconnect bytes returned by smart scan          6001568

Elapsed: 00:00:00.01
11:29:42 SQL>
11:29:42 SQL>

Note: This is one of the cases where you don't want the optimizer to eliminate a self-join via a query transformation based on corresponding constraints on the join columns :-)
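
Just as a hypothetical sketch of what is meant by that: if ID were declared as primary key, the optimizer might be able to remove the second instance of MANY_COLS_RG via join elimination, defeating the purpose of the double segment access. In such a case the transformation could be disabled for this query, for example via the NO_ELIMINATE_JOIN hint (whether the elimination actually kicks in depends on the version and the constraints in place):

select /*+ no_eliminate_join(b) */
        a.col001
        -- ... remaining columns ...
      , b.col254
from
        many_cols_rg a
      , many_cols_rg b
where
        a.id between -2 and -1
and     b.id between -2 and -1
and     a.id = b.id;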

HCC Compression


Now the funny thing is that if you repeat the table creation script but uncomment the HCC compression, the Smart Scan happily works with up to 1,000 columns of such a compressed segment.
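
To verify which kind of compression the segment actually ended up with, a quick look at the dictionary is sufficient (a sketch, using the compression columns of USER_TABLES):

select
        table_name
      , compression
      , compress_for
from
        user_tables
where
        table_name = 'MANY_COLS_RG';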

So obviously the general code implementation supports transporting rows with more than 254 columns from the cells to the compute nodes - the question is why it only does so for HCC compressed segments. That's probably a question my client will raise with Oracle Support to find out the answer.

Footnote


At least only the number of "raw" columns projected counts towards the limitation - expressions based on columns don't count, so you can actually project more than 254 expressions from a segment and still benefit from Smart Scans, as long as the expressions refer to a maximum of 254 "base" columns.
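
As an illustration (just a sketch based on the observation above, not something shown in the transcripts): a select list like the following can be extended well beyond 254 entries, but since all entries refer to only two "base" columns it should still be eligible for a Smart Scan:

select
        col001
      , upper(col001)
      , substr(col001, 1, 5)
      , col001 || col002
        -- ... further expressions on the same few base columns ...
from
        many_cols_rg
where
        id between -2 and -1;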

The same limitation could also be reproduced with (almost) the latest Exadata version available as of writing this (11.2.0.3 BP12 - almost, because BP14 has just come out as far as I know).

6 comments:

  1. Thanks Randolf, very interesting post .
    GregG

    ReplyDelete
  2. Hi Randolf,

    Just FYI, this is documented in the Exadata Storage Server Owners Guide - "A query that has more than 255 columns referenced and heap table is uncompressed, or Basic or OLTP compressed. However such queries on Exadata Hybrid Columnar Compression-compressed tables are offloaded".
    It's on Page 162 of the version I have (August 2011)

    Cheers,
    Tim

    ReplyDelete
  3. Hi Tim,

    cool, thanks for the pointer. So this is a documented limitation - I didn't know that.

    Randolf

    ReplyDelete
  4. Very good post, thanks!

    > but the question is why does it only do so with HCC compressed segments?

    => I guess to force usage of HCC in order to speed-up finding and fixing of bugs.

    cheers,
    goran

    ReplyDelete
  5. Interesting post, Thanks!

    Will smartscan work with other types of compression e.g. OLTP with > 254 column referenced

    ReplyDelete
  6. > Will smartscan work with other types of compression e.g. OLTP with > 254 column referenced

    OLTP and basic compression only work with tables having less than 255 columns, so there won't be any actual compression - Oracle doesn't complain, it just silently falls back to no compression.

    Since the internal organisation of a block therefore doesn't change, I'm pretty sure the same restriction applies to such a scenario, too.

    Randolf

    ReplyDelete