6

I'm using an indexed column used as a filter by putting it 'between' two literal values. (The column is in the second position of the index and actually makes execution slower; I will deal with that later).

What's confusing me is that Oracle (11.2.0.3.0) uses or ignores said index depending on the format of the value and format strings supplied to to_date:

This ignores the index:

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610 000000','yyyymmdd hh24miss') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 990804809

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |   350 |   219K|   242K  (1)| 00:56:42 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |   350 |   219K|   242K  (1)| 00:56:42 |    74 |    74 |
|   2 |   PARTITION LIST ALL   |                     |   350 |   219K|   242K  (1)| 00:56:42 |     1 |     3 |
|*  3 |    TABLE ACCESS FULL   | GPRS_HISTORY_IMPORT |   350 |   219K|   242K  (1)| 00:56:42 |   220 |   222 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

This one does use the index (notice the space after the date part in line 4):

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610 ','yyyymmdd ') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 464458373

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                          |   350 |   219K|  2795K  (1)| 10:52:15 |       |       |
|*  1 |  FILTER                              |                          |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR           |                          |   350 |   219K|  2795K  (1)| 10:52:15 |   KEY |    74 |
|   3 |    PARTITION LIST ALL                |                          |   350 |   219K|  2795K  (1)| 10:52:15 |     1 |     3 |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GPRS_HISTORY_IMPORT      |   350 |   219K|  2795K  (1)| 10:52:15 |   KEY |   222 |
|*  5 |      INDEX SKIP SCAN                 | GPRS_HISTORY_IMPORT_IDX1 |     1 |       |  2795K  (1)| 10:52:15 |   KEY |   222 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('20140610 ','yyyymmdd ')<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

(The filter in (1) seems a bit silly, as if Oracle didn't understand the expression)

Again, this one doesn't (I removed the trailing space):

SQL> SELECT *
  2  FROM gprs_history_import  gh
  3  WHERE start_call_date_time BETWEEN
  4      to_date('20140610','yyyymmdd') AND
  5      to_date('20140610 235959','yyyymmdd hh24miss')
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 990804809

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |   350 |   219K|   242K  (1)| 00:56:42 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |   350 |   219K|   242K  (1)| 00:56:42 |    74 |    74 |
|   2 |   PARTITION LIST ALL   |                     |   350 |   219K|   242K  (1)| 00:56:42 |     1 |     3 |
|*  3 |    TABLE ACCESS FULL   | GPRS_HISTORY_IMPORT |   350 |   219K|   242K  (1)| 00:56:42 |   220 |   222 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Putting quotes around the space precludes the index from ever being used.

What gives?

arielCo
  • 371
  • 3
  • 15
  • dba.stackexchange.com would be a better site to address this topic, where you can get detailed and more accurate solution. Flagging to migrate this issue. – Rachcha Jun 17 '14 at 01:23
  • @Rachcha When a new user asks an interesting question that is a good fit for this site we should all rejoice, not try to move it somewhere else. Also, very few DBAs know how to solve these kinds of issues. – Jon Heller Jun 17 '14 at 01:34
  • @jonearles: Perhaps you are right. I just thought the OP would get better response there, just my perception. – Rachcha Jun 17 '14 at 01:36
  • 1
    What is the table partitioned on? Presumably a date range, but what granularity? – APC Jun 17 '14 at 05:32
  • If the table contains CDRs I guess it might have a high turnover and the records for a particular day might not be around for long, but also the stats might struggle to keep up? It's interesting that the index version has a higher cost, and the non-index doesn't filter the lower date bound. Were these first run at different times, and were there records on and before 2014-06-10 when each of them was hard-parsed? Wondering if the actual format you're using might not be relevant, they're just different queries parsed with different data/stats and each looked optimal at that point in time? – Alex Poole Jun 17 '14 at 08:53
  • @APC: It's partitioned per day (thus the `partition range single` in the second execution plan). – arielCo Jun 17 '14 at 17:49
  • @AlexPoole: Records arrive daily and last ~90 days. All plans were run within a few minutes, and definitely against the same table data. – arielCo Jun 17 '14 at 18:21

2 Answers2

2

Ok - I'll give it a try, this is mostly deduction from the availabe Information:

Why does Oracle choose a different execution plan?

It seems in your second query with the unusual Date-Format, the optimizer has no idea what the value of the resulting date is. You see the Filter Predicate:

1 - filter(TO_DATE('20140610 ','yyyymmdd ')<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Which means the optimizer is not even sure that the first date is smaller than the second! That means the optimizer has no idea about the number of returned rows and will just use a generic plan without taking specific statistics into account. It would be the same if you had a user-defined function xyt() which would return a date for the range. The optimizer has no way to know what date-value will result - This means you get a generall all purpose plan, which should be pretty decent for any date-range specified.

In the first and third case, the optimizer seems to understand the date directly and can guess the number of rows which are in the date range by using statistics. So while the second Query was to the Optimizer like BETWEEN X AND 3 this Query is like BETWEEN 1 AND 3 So he optimizes the query plan for the predicted number of returned rows!

The Strange thing seems to be, that the query optimizer has such problems with a strange date format, could be filed as a bug/request for improvement...

But an important Point:

  1. A full table scan does not have to be a BAD plan... As well as using an index is not always faster!
  2. The cost in the query plan is in no way directly related to the actual execution time or performance - it is an internal measurement to compare different plans for the SAME QUERY (So you cannot compare the cost of different querys like your queries 1,2 and 3)

Basically if you return a high number of rows from a table a full table scan without index access will in many cases be much faster, especially when operating on certain partitions! - The Table scan will only access the pertition for the matching date range - so only for the date in question and returns all rows from this partition. This is much faster than queriyng the index for each single row and then extracting the row by index access... Try to profile the querys - the full table scan on partition should be 3 times as fast with much less IO

Falco
  • 3,097
  • 20
  • 23
  • I agree with that Oracle likely doesn't understand the format at the time of optimization in the 2nd query. It may be getting confused by the format with a literal space at the end. I'm glad that my idea isn't totally crazy. (cont) – arielCo Jun 28 '14 at 05:13
  • The index skip scan is a terrible choice because the date column is second in the index to another column with several million values, and for each value of the 1st column there may be 5-10 dates. It takes at least 100x more than a pruned full scan. It turns out that the statistics may be completely stale (at least according to user_*.last_analyzed); that could also be why it estimates 350 rows for one partition instead of tens of millions. I have to discuss the possible cause with the DBAs - maybe the gathering job times out? Thanks for the insights! – arielCo Jun 28 '14 at 05:16
  • For a certain number of rows the index scan could be faster (e.g. if you only want one row from the whole table even with the multi-column index it should be faster than a FTS) - so if the optimizer expects a low number of rows to match the query, it will use the index. If it cannot discern the value of to_date at parse-time, it will have to guess - and if it guesses wrong a bad plan is the result... You could try adding the optimizer hint for the expected cardinality – Falco Jun 29 '14 at 12:48
2

Bugs in the optimizer or the parser cause some date formats to downgrade static partition pruning to dynamic partition pruning. Partition pruning changes lead to different cardinality and costs, which then lead to significant changes to many other parts of the plan.

This answer only partially explains the problem and contains some speculation. Hopefully it will at least shed a little light on what the problem is and is not. It is at least a good starting point if you really need a complete explanation and want to submit a service request to Oracle.

Terminology and some background reading

Static partition pruning is when the optimizer determines at compile time which partition will be used. Statistics are per-partition, leading to better cardinality estimates, leading to better plans. For example, think of a table partitioned by status where the partition for CANCELLED is tiny and the partition for ACTIVE is large. Knowing which partition is used can completely change the join order and access methods of the optimal plan. Pstart and Pstop will be numeric values when static partition pruning is used.

Dynamic partition pruning is when the optimizer cannot determine the partition until runtime. Data is only retrieved from the required partitions but the execution plan is built without special knowledge of which partition is used. Some partition statistic estimates will be a simple average of all available partitions. In the above example of a table partitioned by status, the average of a tiny partition and a large partition does not accurately represent either. Either Pstart or Pstop will include the word KEY when dynamic partition pruning is used.

The Oracle® Database VLDB and Partitioning Guide includes a section about Datatype Conversions that is worth reading. For example, one relevant quote from the manual:

Only a properly applied TO_DATE function guarantees that the database is capable of uniquely determining the date value and using it potentially for static pruning, which is especially beneficial for single partition access.

Sample schema and data

This simple test case demonstrates the problem. It also rules out common performance issues, such as missing statistics.

First, create a sample table with 2 partitions, one large and one small.

create table gprs_history_import(id number, start_call_date_time date)
partition by range (start_call_date_time)
(
    partition p_large values less than (date '2014-06-01'),
    partition p_small values less than (date '2014-07-01')
);

insert into gprs_history_import
select level, date '2014-05-01'
from dual connect by level <= 1000;

insert into gprs_history_import
select level, date '2014-06-01'
from dual connect by level <= 10;

begin
    dbms_stats.gather_table_stats(user, 'GPRS_HISTORY_IMPORT');
end;
/

select count(*) from gprs_history_import partition (p_large); -- 1000
select count(*) from gprs_history_import partition (p_small); --   10

Static to dynamic causes bad cardinality estimates

The static cardinality estimate is a perfect 1000. The extra space in second date format changes Pstop from 1 to KEY. The plan changes from static to dynamic partition pruning. The dynamic estimate is an inaccurate 505, the average of 1000 and 10

For simplicity this example only shows a bad cardinality estimate. It is not necessary to show a query running slow, since bad row estimates inevitably lead to bad execution plans for many reasons.

explain plan for select /* static partition pruning */ *
from gprs_history_import
where start_call_date_time < to_date('20140601 000000','yyyymmdd hh24miss');

select * from table(dbms_xplan.display);

Plan hash value: 452971246

--------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                     |  1000 | 12000 |    16   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|                     |  1000 | 12000 |    16   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS FULL    | GPRS_HISTORY_IMPORT |  1000 | 12000 |    16   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------

explain plan for select /* dybnamic partition pruning */ *
from gprs_history_import
where start_call_date_time < to_date('20140601 ','yyyymmdd ');

select * from table(dbms_xplan.display);


Plan hash value: 2464174375

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                     |   505 |  6060 |    29   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|                     |   505 |  6060 |    29   (0)| 00:00:01 |     1 |   KEY |
|*  2 |   TABLE ACCESS FULL      | GPRS_HISTORY_IMPORT |   505 |  6060 |    29   (0)| 00:00:01 |     1 |   KEY |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("START_CALL_DATE_TIME"<TO_DATE('20140601 ','yyyymmdd '))

Date format parsing problems

Now for some speculation as to why the query moves from static to dynamic partition pruning.

It is not always obvious when the optimizer can use static and dynamic partitioning. In general, literals allow static pruning and variables require dynamic pruning.

--#1: Obviously static: It uses an unambiguous ANSI date literal.
select * from gprs_history_import where start_call_date_time = date '2000-11-01';

--#2: Obviously dyanmic: It uses a bind variable.
select * from gprs_history_import where start_call_date_time = :date;

--#3: Probably dynamic: The optimizer cannot always infer the literal value. 
select * from gprs_history_import where start_call_date_time = 
    (select date '2000-11-01' from dual);

--#4: Probably static: FEB is not always valid, but Oracle can figure that out.
select * from gprs_history_import where start_call_date_time = 
    to_date('01-FEB-2000', 'DD-MON-YYYY');

When you consider all the performance and internationalization issues around case #4 it becomes clear how difficult it is to parse dates. The value of to_date('01-FEB-2000', 'DD-MON-YYYY') depends on several NLS parameters, such as NLS_DATE_LANGUAGE. The query is valid for English but not German. And if NLS_CALENDAR is not set to GREGORIAN than even an all-numeric date format can be wrong. The to_date string is not a bind value but it's not clearly a literal either.

The difference between true date literals and formatted strings is more apparent if hard parses are counted. Query #1 will not force a hard-parse even if the language is changed, but query #4 will. This can be demonstrated by running several variations of each, changing the language, and then running select value from v$sesstat natural join v$statname where name = 'parse count (hard)' and sid = userenv('SID');.

Oracle must have a variable somewhere to denote "this is not a bind variable but could lead to different plans based on NLS settings". That variable does not always lead to dynamic partition pruning but there must be some bugs somewhere that occasionally break it.

Jon Heller
  • 31,663
  • 6
  • 63
  • 114
  • _The Oracle® Database VLDB and Partitioning Guide ..._ - I like this! I wasn't aware of dynamic vs static pruning, and the warning about `to_date` fits with my hunch that Oracle didn't have the value of the to_date expression ready before execution because that particular format threw it off. – arielCo Jun 28 '14 at 06:39
  • Sure enough, the rows=350 estimate in my examples is totally off (20MM is closer to reality). And the skip-scan plan is simply terrible, because the skipped column has several million different values and a few dates per node. But I just found that I may have stale _statistics_ even though they're supposedly gathered automatically; I need to double-check on Monday and discuss the possible causes with the DBAs. *Thanks for all the leads* - I'll get on it when I have the time. – arielCo Jun 28 '14 at 06:42