At work I came across a query that was using correlated sub select and I thought it is worth sharing. I have seen many queries where correlated sub select is used which can easily be written using analytics. In this particular query correlated was used to select rows with maximum effective date for every contrived key and category. This query was taking anywhere between 2-4 seconds to complete. I was able to re-write this query using analytics and brought the execution time to less than 500 milliseconds.
I’ll demonstrate by re-generating the similar situation.
Let’s create a members table:
SQL> drop table members; Table dropped. SQL> SQL> create table members ( 2 Member_ID varchar2(10), 3 member_ck varchar2(10), 4 Category char(1), 5 eff_date date 6 ) ; Table created.
Now load some data into members table:
SQL> insert into members values ('A','1','Z',sysdate) ; 1 row created. SQL> SQL> insert into members values ('A','1','Z',sysdate-5) ; 1 row created. SQL> SQL> insert into members values ('A','1','X',sysdate) ; 1 row created. SQL> SQL> insert into members values ('A','1','X',sysdate-5) ; 1 row created. SQL> SQL> insert into members values ('A','2','Z',sysdate-90) ; 1 row created. SQL> SQL> insert into members values ('A','2','Z',sysdate-100) ; 1 row created. SQL> SQL> insert into members values ('A','2','X',sysdate-80) ; 1 row created. SQL> SQL> insert into members values ('A','2','X',sysdate-150) ; 1 row created. SQL> SQL> insert into members values ('B','3','Z',sysdate-90) ; 1 row created. SQL> SQL> insert into members values ('B','3','X',sysdate-200) ; 1 row created. SQL> commit;
Following are rows for member ‘A.’ From this data set we need to select rownum 1,3,5 and 7. These are rows with maximum eff_date for member_ck and category.
SQL> select rownum, member_id, member_ck, category, eff_date 2 from members a where member_id = 'A' 3 ; ROWNUM MEMBER_ID MEMBER_CK C EFF_DATE ---------- ---------- ---------- - --------- 1 A 1 Z 24-NOV-16 2 A 1 Z 19-NOV-16 3 A 1 X 24-NOV-16 4 A 1 X 19-NOV-16 5 A 2 Z 26-AUG-16 6 A 2 Z 16-AUG-16 7 A 2 X 05-SEP-16 8 A 2 X 27-JUN-16 8 rows selected.
Following correlated sub query gets the rows that we are interested in. But as table grows, this will take long time to complete. You can see this from the explain plan. Optimizer needs to do full table scan twice, 31 recursive calls and 101 consistent gets.
SQL> set autotrace on SQL> SQL> SELECT member_id, 2 member_ck, 3 category, 4 eff_date 5 FROM members a 6 WHERE member_id = 'A' 7 AND eff_date IN 8 (SELECT MAX(eff_date) 9 FROM members b 10 WHERE b.member_ck = a.member_ck 11 AND b.category = a.category 12 AND b.member_id = a.member_id 13 ) 14 ORDER BY member_id, 15 member_ck, 16 category, 17 eff_date ; MEMBER_ID MEMBER_CK C EFF_DATE ---------- ---------- - --------- A 1 X 24-NOV-16 A 1 Z 24-NOV-16 A 2 X 05-SEP-16 A 2 Z 26-AUG-16 Elapsed: 00:00:00.04 Execution Plan ---------------------------------------------------------- Plan hash value: 3201527363 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 156 | 12 (9)| 00:00:01 | | 1 | SORT ORDER BY | | 6 | 156 | 12 (9)| 00:00:01 | |* 2 | FILTER | | | | | | |* 3 | TABLE ACCESS FULL | MEMBERS | 8 | 208 | 3 (0)| 00:00:01 | | 4 | SORT AGGREGATE | | 1 | 26 | | | |* 5 | TABLE ACCESS FULL| MEMBERS | 1 | 26 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("EFF_DATE"= (SELECT MAX("EFF_DATE") FROM "MEMBERS" "B" WHERE "B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND "B"."CATEGORY"=:B3)) 3 - filter("MEMBER_ID"='A') 5 - filter("B"."MEMBER_CK"=:B1 AND "B"."MEMBER_ID"=:B2 AND "B"."CATEGORY"=:B3) Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 31 recursive calls 2 db block gets 101 consistent gets 0 physical reads 0 redo size 862 bytes sent via SQL*Net to client 551 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 4 sorts (memory) 0 sorts (disk) 4 rows processed
Now let’s re-write using analytics function max() over (partition by) clause. Note the cost is much less and same result is achieved by only doing 5 recursive calls and 15 consistent gets. Clearly running this query is much cheaper than correlated sub select.
SQL> SQL> SELECT m.member_id, 2 m.member_ck, 3 m.category, 4 m.eff_date 5 FROM 6 (SELECT member_id, 7 member_ck, 8 category, 9 eff_date, 10 MAX(eff_date) over (partition BY category, member_ck) max_eff_date 11 FROM members 12 WHERE member_id = 'A' 13 ) m 14 WHERE m.max_eff_date = m.eff_date 15 ORDER BY m.member_id, 16 m.member_ck, 17 m.category, 18 m.eff_date ; MEMBER_ID MEMBER_CK C EFF_DATE ---------- ---------- - --------- A 1 X 24-NOV-16 A 1 Z 24-NOV-16 A 2 X 05-SEP-16 A 2 Z 26-AUG-16 Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 3057722337 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8 | 280 | 5 (40)| 00:00:01 | | 1 | SORT ORDER BY | | 8 | 280 | 5 (40)| 00:00:01 | |* 2 | VIEW | | 8 | 280 | 4 (25)| 00:00:01 | | 3 | WINDOW SORT | | 8 | 208 | 4 (25)| 00:00:01 | |* 4 | TABLE ACCESS FULL| MEMBERS | 8 | 208 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("M"."MAX_EFF_DATE"="M"."EFF_DATE") 4 - filter("MEMBER_ID"='A') Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 5 recursive calls 0 db block gets 15 consistent gets 0 physical reads 0 redo size 862 bytes sent via SQL*Net to client 551 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 4 rows processed
Bottom line is to avoid writing correlated sub queries.
I hope you found this useful. Thank you for reading and visiting my blog.