Is Your PC? |
|---|
| * Slow |
| * Crashing |
| * Blue-Screening |
| * Not Connecting |
| * Misbehaving |
| * Infected |
Products |
|---|
| * Home PCs |
| * Anti-Virus Software |
Services |
|---|
| * Computer Service |
| * Network Setup |
| * Web Hosting |
|   |
| * Price List |
| * Home |
| * Contact Us |
| Useful Stuff | Oracle Scripts | Unix Scripts |
This started as a problem to find the latest date from three related tables but ended up being an exercise in getting the optimizer to build a plan that used Range Scans instead of Fast Full Scans.
Three tables contain a date column - some sort of a last-update-date and the SQL has to find the latest one given that the tables are related in some way. To help clarify the problem, here are the essentials of the tables concerned...
table U (80 rows)
---------------------
u_id number
v_id number
stuff varchar2(700)
|
table W (800 rows)
---------------------
w_id number
u_id number
udate date
stuff varchar2(50)
|
table O (25000 rows)
---------------------
o_id number
u_id number
udate date
stuff varchar2(80)
|
table R (25000 rows)
---------------------
r_id number
o_id number
udate date
stuff varchar2(70)
|
Primary key constraints are present on u.u_id, w.w_id, o.o_id and r.r_id but there are no other indexes which is perhaps not too surprising given the relatively small number of rows in each table.
Some test data was inserted using ( It's worth checking that your dba_objects has at least 25000 rows in it - if it does, you don't have to join dba_objects to "u" with a cartesian join just to generate rows )...
insert into u select rownum, trunc(dbms_random.value*8)+1, rpad(object_name,650,'x') from dba_objects where rownum <= 80; insert into w select rownum, trunc(dbms_random.value*80), sysdate-trunc(dbms_random.value*700), rpad(object_name,45,'x') from dba_objects where rownum <= 800; insert into o select rownum, trunc(dbms_random.value*80), sysdate-trunc(dbms_random.value*700), rpad(object_name,75,'x') from dba_objects, u where rownum <= 25000; insert into r select rownum, trunc(dbms_random.value*25000), sysdate-trunc(dbms_random.value*700), rpad(object_name,65,'x') from dba_objects, u where rownum <= 25000;
In the query, u.v_id is supplied programatically via a bind variable and the distribution u_id's for each v_id is as follows...
SQL> select v_id, count(*) from u group by v_id;
V_ID COUNT(*)
---------- ----------
1 13
2 4
3 13
4 11
5 13
6 11
7 10
8 5
We know the table design and the data but imagine if you will that the execution plan here ( labelled "original execution plan" ) is your first contact with this problem. There is a number of things that are quite striking about it. Here it is together with the original SQL and my test value ...
var v_v_id number;
exec :v_v_id := 5;
set autotr on
select greatest( max(unique(r.udate)), max(unique(o.udate)), max(unique(w.udate)))
from r, o, w
where w.u_id in (select u_id from u where v_id=:v_v_id)
and o.u_id = w.u_id
and r.o_id = o.o_id;
set autotr off
GREATEST(
---------
18-MAR-11
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | | 453 (2)| 00:00:06 |
| 1 | SORT AGGREGATE | | 1 | 46 | | | |
|* 2 | HASH JOIN | | 31426 | 1411K| | 453 (2)| 00:00:06 |
|* 3 | TABLE ACCESS FULL | U | 10 | 60 | | 5 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 251K| 9820K| | 446 (2)| 00:00:06 |
| 5 | TABLE ACCESS FULL | W | 800 | 8800 | | 5 (0)| 00:00:01 |
| 6 | MERGE JOIN | | 25000 | 708K| | 439 (2)| 00:00:06 |
| 7 | SORT JOIN | | 25000 | 317K| 1192K| 208 (1)| 00:00:03 |
| 8 | TABLE ACCESS FULL| R | 25000 | 317K| | 85 (0)| 00:00:02 |
|* 9 | SORT JOIN | | 25000 | 390K| 1384K| 231 (2)| 00:00:03 |
| 10 | TABLE ACCESS FULL| O | 25000 | 390K| | 94 (0)| 00:00:02 |
---------------------------------------------------------------------------------------
( original execution plan )
Statistics
----------------------------------------------------------
683 consistent gets
0 physical reads
6 sorts (memory)
0 sorts (disk)
Although the query runs quickly, all of the table access is via full table scans and there are two sorts. It is obvious that this query is quite likely to run into problems if, or more probably when, the tables grow.
The tables have generated keys that use just a single column number. This type of key is not without problems but is very compact and somethimes the best that can be done when a natural key does not exist. Because only a few columns are needed for this query, I added some indexes, gathered stats and ran my test again ...
create index u_i on u( v_id, u_id );
create index w_i on w( u_id, udate );
create index o_i on o( u_id, udate, o_id );
create index r_i on r( o_id, udate );
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | | 308 (4)| 00:00:04 |
| 1 | SORT AGGREGATE | | 1 | 46 | | | |
|* 2 | HASH JOIN | | 69227 | 3109K| | 308 (4)| 00:00:04 |
|* 3 | INDEX RANGE SCAN | U_I | 8 | 48 | | 1 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 398K| 15M| | 304 (3)| 00:00:04 |
| 5 | INDEX FAST FULL SCAN | W_I | 793 | 8723 | | 3 (0)| 00:00:01 |
| 6 | MERGE JOIN | | 24158 | 684K| | 299 (2)| 00:00:04 |
| 7 | SORT JOIN | | 24158 | 306K| 1160K| 141 (2)| 00:00:02 |
| 8 | INDEX FAST FULL SCAN| R_I | 24158 | 306K| | 23 (0)| 00:00:01 |
|* 9 | SORT JOIN | | 24158 | 377K| 1352K| 157 (2)| 00:00:02 |
| 10 | INDEX FAST FULL SCAN| O_I | 24158 | 377K| | 26 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
( first-attempt execution plan )
Statistics
----------------------------------------------------------
189 consistent gets
0 physical reads
2 sorts (memory)
0 sorts (disk)
An improvement - only the indexes are being accessed which, because they are smaller than the tables means the same statement completes using only about 40% of the effort needed previously.
I'm still concerned though. This execution plan is hardly different to the original one in so much as it requires three
full object scans instead of four and there is still a couple of sorts in there as well. Despite being faster, this plan
suffers from exactly the same problem as the original query namely - it's not scalable. Not only will the fast full
index scans start to take longer as the number of rows increase but eventually the sorts will become too big to be memory
sorts and there will be a definite slow down. Even if the number of rows remains about the same, ordinary "vanilla-flavoured"
B-tree indexes grow in size as keys are added and deleted, making fast full scans take longer.
Is this the best that can be done?
How about this ...
select greatest( max(r.udate), max(o.udate), max(w.udate))
from u, w, o, r
where o.u_id = u.u_id
and u.v_id = :v_v_id
and o.u_id = w.u_id
and r.o_id = o.o_id
and o.udate > ( select max(w.udate) udate from u, w where u.u_id = w.u_id and u.v_id = :v_v_id) ;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 42 (3)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 46 | | |
|* 2 | HASH JOIN | | 2490 | 111K| 39 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 246 | 8610 | 36 (3)| 00:00:01 |
| 4 | NESTED LOOPS | | 156 | 3432 | 11 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | U_I | 10 | 60 | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | O_I | 16 | 256 | 1 (0)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 17 | | |
| 8 | NESTED LOOPS | | 100 | 1700 | 3 (0)| 00:00:01 |
| 9 | INDEX FAST FULL SCAN| W_I | 800 | 8800 | 3 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | U_I | 1 | 6 | 0 (0)| 00:00:01 |
| 11 | INDEX FAST FULL SCAN | R_I | 25000 | 317K| 24 (0)| 00:00:01 |
| 12 | INDEX FAST FULL SCAN | W_I | 800 | 8800 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------
( second-attempt execution plan )
Statistics
----------------------------------------------------------
123 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
Better but only a little bit. I've got rid of the fast full scan of O_I and there are no memory sorts. The way O_I is accessed is using both u_id and udate. This means it works out to be cheaper to do a range scan compared to a fast full scan.
Although the benefit looks small ( 123 consistent gets compared to 189 ) the change could be significant. As it happens, the max(udate) in W, O and R stay roughly in line so the max(udate) that we want from O is probably amongst the "top" 1000 rows. If the tables grew to ten times their original size by getting filled up with historical data, a full scan of O_I would end up reading 250000 rows with the SQL from the first attempt but still just 1000 with this second attempt.
If getting rid of the fast full scan of index O_I is a good thing, then it must follow that getting rid of the fast full scan of R_I would be a good thing too...
Drop the indexes O_I and R_I and re-create them thus ...
create index o_i on o( udate, u_id, o_id );
create index r_i on r( udate, o_id );
with top_o as
(
select o_id, u_id, udate
from o
where udate > ( select max(w.udate) udate from u, w where u.u_id = w.u_id and u.v_id = :v_v_id)
),
top_r as
(
select o_id, udate
from r
where udate > ( select max(w.udate) udate from u, w where u.u_id = w.u_id and u.v_id = :v_v_id)
)
select greatest( max(r.udate), max(o.udate), max(w.udate))
from u, w, top_o o, top_r r
where o.u_id = u.u_id
and u.v_id = :v_v_id
and o.u_id = w.u_id
and r.o_id = o.o_id;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 14 (8)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 46 | | |
|* 2 | HASH JOIN | | 1577 | 72542 | 8 (13)| 00:00:01 |
|* 3 | HASH JOIN | | 158 | 5530 | 5 (20)| 00:00:01 |
| 4 | NESTED LOOPS | | 156 | 3432 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | O_I | 1250 | 20000 | 2 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 17 | | |
| 7 | NESTED LOOPS | | 100 | 1700 | 3 (0)| 00:00:01 |
| 8 | INDEX FAST FULL SCAN| W_I | 800 | 8800 | 3 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | U_I | 1 | 6 | 0 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | U_I | 1 | 6 | 0 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | R_I | 1250 | 16250 | 2 (0)| 00:00:01 |
| 12 | SORT AGGREGATE | | 1 | 17 | | |
| 13 | NESTED LOOPS | | 100 | 1700 | 3 (0)| 00:00:01 |
| 14 | INDEX FAST FULL SCAN | W_I | 800 | 8800 | 3 (0)| 00:00:01 |
|* 15 | INDEX RANGE SCAN | U_I | 1 | 6 | 0 (0)| 00:00:01 |
| 16 | INDEX FAST FULL SCAN | W_I | 800 | 8800 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------
( final execution plan )
Statistics
----------------------------------------------------------
29 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
|   Software4Students   |   Your advert here - call 0116 2870 610   |   Internet Connection Speedtest   |   Premier Maps Online   |