Skip to content

Correlated Subquery to Analytics

At work I came across a query that was using correlated sub select and I thought it is worth sharing. I have seen many queries where correlated sub select is used which can easily be written using analytics. In this particular query correlated was used to select rows with maximum effective date for every contrived key and category. This query was taking anywhere between 2-4 seconds to complete. I was able to re-write this query using analytics and brought the execution time to less than 500 milliseconds.

I’ll demonstrate by re-generating the similar situation.

Let’s create a members table:


SQL> drop table members;

Table dropped.

SQL> 
SQL> create table members (
  2    Member_ID varchar2(10),
  3    member_ck varchar2(10),
  4    Category char(1),
  5    eff_date date
  6    ) ;

Table created.

Now load some data into members table:

SQL> insert into members values ('A','1','Z',sysdate) ;

1 row created.

SQL> 
SQL> insert into members values ('A','1','Z',sysdate-5) ;

1 row created.

SQL> 
SQL> insert into members values ('A','1','X',sysdate) ;

1 row created.

SQL> 
SQL> insert into members values ('A','1','X',sysdate-5) ;

1 row created.

SQL> 
SQL> insert into members values ('A','2','Z',sysdate-90) ;

1 row created.

SQL> 
SQL> insert into members values ('A','2','Z',sysdate-100) ;

1 row created.

SQL> 
SQL> insert into members values ('A','2','X',sysdate-80) ;

1 row created.

SQL> 
SQL> insert into members values ('A','2','X',sysdate-150) ;

1 row created.

SQL> 
SQL> insert into members values ('B','3','Z',sysdate-90) ;

1 row created.

SQL> 
SQL> insert into members values ('B','3','X',sysdate-200) ;

1 row created.

SQL> commit;

Following are rows for member ‘A.’ From this data set we need to select rownum 1,3,5 and 7. These are rows with maximum eff_date for member_ck and category.


SQL> select rownum, member_id, member_ck, category, eff_date
  2  from members a where member_id = 'A'
  3  ;

    ROWNUM MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- ---------- - ---------
		 1 A	      1 		 Z 24-NOV-16
		 2 A	      1 		 Z 19-NOV-16
		 3 A	      1 		 X 24-NOV-16
	 	 4 A	      1 		 X 19-NOV-16
	 	 5 A	      2 		 Z 26-AUG-16
	 	 6 A	      2 		 Z 16-AUG-16
	 	 7 A	      2 		 X 05-SEP-16
	 	 8 A	      2 		 X 27-JUN-16

8 rows selected.

Following correlated sub query gets the rows that we are interested in. But as table grows, this will take long time to complete. You can see this from the explain plan. Optimizer needs to do full table scan twice, 31 recursive calls and 101 consistent gets.


SQL> set autotrace on
SQL> 
SQL> SELECT member_id,
  2    member_ck,
  3    category,
  4    eff_date
  5  FROM members a
  6  WHERE member_id = 'A'
  7  AND eff_date   IN
  8    (SELECT MAX(eff_date)
  9    FROM members b
 10    WHERE b.member_ck = a.member_ck
 11    AND b.category	 = a.category
 12    AND b.member_id	 = a.member_id
 13    )
 14  ORDER BY member_id,
 15    member_ck,
 16    category,
 17    eff_date ;

MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- - ---------
A		   1	      X 24-NOV-16
A		   1	      Z 24-NOV-16
A		   2	      X 05-SEP-16
A		   2	      Z 26-AUG-16

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 3201527363

--------------------------------------------------------------------------------
| Id  | Operation	 	     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	       |     6 |   156 |    12	 (9)| 00:00:01 |
|   1 |  SORT ORDER BY	     |	       |     6 |   156 |    12	 (9)| 00:00:01 |
|*  2 |   FILTER		     |	       |       |       |	    	|	       |
|*  3 |    TABLE ACCESS FULL | MEMBERS |     8 |   208 |     3	 (0)| 00:00:01 |
|   4 |    SORT AGGREGATE    |	       |     1 |    26 |		    |	       |
|*  5 |     TABLE ACCESS FULL| MEMBERS |     1 |    26 |     3	 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("EFF_DATE"= (SELECT MAX("EFF_DATE") FROM "MEMBERS" "B"
	      WHERE "B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND
	      "B"."CATEGORY"=:B3))
   3 - filter("MEMBER_ID"='A')
   5 - filter("B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND
	      "B"."CATEGORY"=:B3)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
	 31  recursive calls
	  2  db block gets
	101  consistent gets
	  0  physical reads
	  0  redo size
	862  bytes sent via SQL*Net to client
	551  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  4  sorts (memory)
	  0  sorts (disk)
	  4  rows processed

Now let’s re-write using analytics function max() over (partition by) clause. Note the cost is much less and same result is achieved by only doing 5 recursive calls and 15 consistent gets. Clearly running this query is much cheaper than correlated sub select.

SQL> 
SQL> SELECT m.member_id,
  2    m.member_ck,
  3    m.category,
  4    m.eff_date
  5  FROM
  6    (SELECT member_id,
  7  	 member_ck,
  8  	 category,
  9  	 eff_date,
 10  	 MAX(eff_date) over (partition BY category, member_ck) max_eff_date
 11    FROM members
 12    WHERE member_id = 'A'
 13    ) m
 14  WHERE m.max_eff_date = m.eff_date
 15  ORDER BY m.member_id,
 16    m.member_ck,
 17    m.category,
 18    m.eff_date ;

MEMBER_ID  MEMBER_CK  C EFF_DATE
---------- ---------- - ---------
A		   1		  X 24-NOV-16
A		   1		  Z 24-NOV-16
A		   2		  X 05-SEP-16
A		   2		  Z 26-AUG-16

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3057722337

--------------------------------------------------------------------------------
| Id  | Operation	    	 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	       |     8 |   280 |     5	(40)| 00:00:01 |
|   1 |  SORT ORDER BY	     |	       |     8 |   280 |     5	(40)| 00:00:01 |
|*  2 |   VIEW			     |	       |     8 |   280 |     4	(25)| 00:00:01 |
|   3 |    WINDOW SORT	     |	       |     8 |   208 |     4	(25)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| MEMBERS |     8 |   208 |     3	 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("M"."MAX_EFF_DATE"="M"."EFF_DATE")
   4 - filter("MEMBER_ID"='A')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
	  5  recursive calls
	  0  db block gets
	 15  consistent gets
	  0  physical reads
	  0  redo size
	862  bytes sent via SQL*Net to client
	551  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  2  sorts (memory)
	  0  sorts (disk)
	  4  rows processed

Bottom line is to avoid writing correlated sub queries.

I hope you found this useful. Thank you for reading and visiting my blog.

Published inSQL Query performance

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *