At work I came across a query that was using correlated sub select and I thought it is worth sharing. I have seen many queries where correlated sub select is used which can easily be written using analytics. In this particular query correlated was used to select rows with maximum effective date for every contrived key and category. This query was taking anywhere between 2-4 seconds to complete. I was able to re-write this query using analytics and brought the execution time to less than 500 milliseconds.

I’ll demonstrate by re-generating the similar situation.

Let’s create a members table:


SQL> drop table members;

Table dropped.

SQL>
SQL> create table members (
2    Member_ID varchar2(10),
3    member_ck varchar2(10),
4    Category char(1),
5    eff_date date
6    ) ;

Table created.

Now load some data into members table:

SQL> insert into members values ('A','1','Z',sysdate) ;

1 row created.

SQL>
SQL> insert into members values ('A','1','Z',sysdate-5) ;

1 row created.

SQL>
SQL> insert into members values ('A','1','X',sysdate) ;

1 row created.

SQL>
SQL> insert into members values ('A','1','X',sysdate-5) ;

1 row created.

SQL>
SQL> insert into members values ('A','2','Z',sysdate-90) ;

1 row created.

SQL>
SQL> insert into members values ('A','2','Z',sysdate-100) ;

1 row created.

SQL>
SQL> insert into members values ('A','2','X',sysdate-80) ;

1 row created.

SQL>
SQL> insert into members values ('A','2','X',sysdate-150) ;

1 row created.

SQL>
SQL> insert into members values ('B','3','Z',sysdate-90) ;

1 row created.

SQL>
SQL> insert into members values ('B','3','X',sysdate-200) ;

1 row created.

SQL> commit;

Following are rows for member ‘A.’ From this data set we need to select rownum 1,3,5 and 7. These are rows with maximum eff_date for member_ck and category.


SQL> select rownum, member_id, member_ck, category, eff_date
2  from members a where member_id = 'A'
3  ;

ROWNUM MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- ---------- - ---------
1 A	      1 		 Z 24-NOV-16
2 A	      1 		 Z 19-NOV-16
3 A	      1 		 X 24-NOV-16
4 A	      1 		 X 19-NOV-16
5 A	      2 		 Z 26-AUG-16
6 A	      2 		 Z 16-AUG-16
7 A	      2 		 X 05-SEP-16
8 A	      2 		 X 27-JUN-16

8 rows selected.

Following correlated sub query gets the rows that we are interested in. But as table grows, this will take long time to complete. You can see this from the explain plan. Optimizer needs to do full table scan twice, 31 recursive calls and 101 consistent gets.


SQL> set autotrace on
SQL>
SQL> SELECT member_id,
2    member_ck,
3    category,
4    eff_date
5  FROM members a
6  WHERE member_id = 'A'
7  AND eff_date   IN
8    (SELECT MAX(eff_date)
9    FROM members b
10    WHERE b.member_ck = a.member_ck
11    AND b.category	 = a.category
12    AND b.member_id	 = a.member_id
13    )
14  ORDER BY member_id,
15    member_ck,
16    category,
17    eff_date ;

MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- - ---------
A		   1	      X 24-NOV-16
A		   1	      Z 24-NOV-16
A		   2	      X 05-SEP-16
A		   2	      Z 26-AUG-16

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3201527363

--------------------------------------------------------------------------------
| Id  | Operation	 	     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	       |     6 |   156 |    12	 (9)| 00:00:01 |
|   1 |  SORT ORDER BY	     |	       |     6 |   156 |    12	 (9)| 00:00:01 |
|*  2 |   FILTER		     |	       |       |       |	    	|	       |
|*  3 |    TABLE ACCESS FULL | MEMBERS |     8 |   208 |     3	 (0)| 00:00:01 |
|   4 |    SORT AGGREGATE    |	       |     1 |    26 |		    |	       |
|*  5 |     TABLE ACCESS FULL| MEMBERS |     1 |    26 |     3	 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("EFF_DATE"= (SELECT MAX("EFF_DATE") FROM "MEMBERS" "B"
WHERE "B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND
"B"."CATEGORY"=:B3))
3 - filter("MEMBER_ID"='A')
5 - filter("B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND
"B"."CATEGORY"=:B3)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
31  recursive calls
2  db block gets
101  consistent gets
0  physical reads
0  redo size
862  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
4  sorts (memory)
0  sorts (disk)
4  rows processed

Now let’s re-write using analytics function max() over (partition by) clause. Note the cost is much less and same result is achieved by only doing 5 recursive calls and 15 consistent gets. Clearly running this query is much cheaper than correlated sub select.

SQL>
SQL> SELECT m.member_id,
2    m.member_ck,
3    m.category,
4    m.eff_date
5  FROM
6    (SELECT member_id,
7  	 member_ck,
8  	 category,
9  	 eff_date,
10  	 MAX(eff_date) over (partition BY category, member_ck) max_eff_date
11    FROM members
12    WHERE member_id = 'A'
13    ) m
14  WHERE m.max_eff_date = m.eff_date
15  ORDER BY m.member_id,
16    m.member_ck,
17    m.category,
18    m.eff_date ;

MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- - ---------
A		   1		  X 24-NOV-16
A		   1		  Z 24-NOV-16
A		   2		  X 05-SEP-16
A		   2		  Z 26-AUG-16

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3057722337

--------------------------------------------------------------------------------
| Id  | Operation	    	 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	       |     8 |   280 |     5	(40)| 00:00:01 |
|   1 |  SORT ORDER BY	     |	       |     8 |   280 |     5	(40)| 00:00:01 |
|*  2 |   VIEW			     |	       |     8 |   280 |     4	(25)| 00:00:01 |
|   3 |    WINDOW SORT	     |	       |     8 |   208 |     4	(25)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| MEMBERS |     8 |   208 |     3	 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("M"."MAX_EFF_DATE"="M"."EFF_DATE")
4 - filter("MEMBER_ID"='A')

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
5  recursive calls
0  db block gets
15  consistent gets
0  physical reads
0  redo size
862  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
2  sorts (memory)
0  sorts (disk)
4  rows processed

Bottom line is to avoid writing correlated sub queries.

I hope you found this useful. Thank you for reading and visiting my blog.

Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *