Performance of PL/SQL Functions using SQL vs Pure Code

I received a comment the other day from a user of my PL/SQL packages about how he had extracted the GEOM.isCompound() function from the GEOM package, and re-written it in pure PL/SQL getting rid of the SQL pivot that that I use.

That’s the sort of behaviour I encourage.

The user’s comment was this:

This [i.e., his rewrite without the SQL] runs about 20-30 times faster and answers the same question.

And this is about what I would have expected!

But this comment reminded me about that article I have been meaning to write (for the past few years) on how to examine the performance differences between two implementations of a function in PL/SQL. I used the technique in my article on a fast method for creating multi-point geometries but I never got around to the sort of article that follows. At the end of the article I will discuss the SQL vs pure programmatic approach: there is no right answer, just some observations from someone who has written the odd PL/SQL function over the years.

The original GEOM.isCompound() function was extracted for the purpose of this article. With a slight rename, here it is:

 create or replace
 Function is_Compound(p_elem_info in mdsys.sdo_elem_info_array)
   return boolean
 Is
   v_compound_element_count number := 0;
 Begin
   SELECT count(*) as c_element_count
     INTO v_compound_element_count
     FROM (SELECT e.id,
                  e.etype,
                  e.offset,
                  e.interpretation
             FROM (SELECT trunc((rownum - 1) / 3,0) as id,
                          sum(case when mod(rownum,3) = 1 then sei.column_value else null end) as offset,
                          sum(case when mod(rownum,3) = 2 then sei.column_value else null end) as etype,
                          sum(case when mod(rownum,3) = 0 then sei.column_value else null end) as interpretation
                     FROM TABLE(p_elem_info) sei
                    GROUP BY trunc((rownum - 1) / 3,0)
                   ) e
          ) i
    WHERE ( i.etype = 2 AND i.interpretation = 2 )
       OR ( i.etype in (1003,2003) AND i.interpretation IN (2,4) );
   Return v_compound_element_count > 0;
 End is_Compound;

The commentator’s function is as follows:

 create or replace
 function has_compound_curves ( geom_in in sdo_geometry) 
   return integer 
 as
   etype number;
   interpret number;
 begin
   for i in geom_in.sdo_elem_info.first .. geom_in.sdo_elem_info.last loop
     case
       when mod(i,3) = 1 then null; -- continue;
       when mod(i,3) = 2 then etype := geom_in.sdo_elem_info(i);
       when mod(i,3) = 0 
       then
         interpret := geom_in.sdo_elem_info(i);
         if((etype = 2 and interpret = 2) or (etype in (1003,2003) and interpret in (2,4))) then
           return 1;
         end if;
     end case;
   end loop;
   return 0;
 end has_compound_curves;

Now, let’s compare the two using Tom Kyte’s runstats_pkg package.

 set serveroutput on size unlimited
 declare 
   v_OK   integer;
   v_geom mdsys.sdo_geometry := MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,5,2,1,9,2,1,17,4,2,17,2,1,23,2,2),MDSYS.SDO_ORDINATE_ARRAY(50,15,55,15,60,15,65,15,10,25,20,30,25,25,30,30,110,178,110,175,120,175,120,178,115,180,110,178));
 begin
  runstats_pkg.rs_start;
  v_OK := has_compound_curves(v_geom);
  runstats_pkg.rs_middle;
  v_OK := case when is_compound(v_geom.sdo_elem_info) = TRUE then 1 else 0 end;
  runstats_pkg.rs_stop;
 end;
 /
 
 anonymous block completed
 Run1 ran in 0 hsecs
 Run2 ran in 1 hsecs
 Run 1 ran in 0 % of the time
 ...

There appears to be little difference. So, I’ve short-circuited any more testing as a single execution of each function occurs in less than 1/100th of a second and so cannot be reasonably compared. The only way to compare the algorithms is to write a procedure that executes each 1000 times as follows:

 create or replace
 procedure test_compound(p_geom in mdsys.sdo_geometry,
                         p_is   in integer)
 As
   v_OK          integer;
   v_is_compound boolean := case when p_is > 0 then true else false end;
 Begin
   For i IN 1..1000 LOOP
     If ( v_is_compound ) Then
       v_OK := case when is_compound(p_geom.sdo_elem_info) then 1 else 0 end;
     Else
       v_OK := has_compound_curves(p_geom);
     End If;
   End Loop;
 end test_compound;

Which we would execute using the runstats_pkg as follows:

 set serveroutput on size unlimited;
 declare 
   v_geom mdsys.sdo_geometry := MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,5,2,1,9,2,1,17,4,2,17,2,1,23,2,2),MDSYS.SDO_ORDINATE_ARRAY(50,15,55,15,60,15,65,15,10,25,20,30,25,25,30,30,110,178,110,175,120,175,120,178,115,180,110,178));
 begin
  runstats_pkg.rs_start;
  test_compound(v_geom,0);
  runstats_pkg.rs_middle;
  test_compound(v_geom,1);
  runstats_pkg.rs_stop(10);
 end;
 
 anonymous block completed
 Run1 ran in 1 hsecs
 Run2 ran in 91 hsecs
 Run 1 ran in 1.1 % of the time
 
 Name                                  Run1        Run2        Diff
 LATCH.cache buffers chains             121         105         -16
 STAT...undo change vector size       2,072       2,140          68
 STAT...recursive cpu usage               0          91          91
 STAT...CPU used by this sessio           1          93          92
 STAT...redo size                     2,720       2,860         140
 STAT...execute count                     1       1,001       1,000
 STAT...calls to get snapshot s           1       1,001       1,000
 STAT...recursive calls                   1       1,003       1,002
 LATCH.shared pool                        0       1,028       1,028
 LATCH.SQL memory manager worka           6       2,006       2,000
 LATCH.library cache lock                 0       2,003       2,003
 LATCH.library cache pin                  4       2,007       2,003
 LATCH.library cache                      5       3,017       3,012
 LATCH.row cache objects                 12      15,024      15,012
 STAT...session pga memory           65,536           0     -65,536
 Run1 latches total versus runs -- difference AND pct
         Run1        Run2        Diff          Pct
          158      25,210      25,052           1%

This tells me that the pure PL/SQL implementation runs in 1.1% of the time of the original implementation based on SQL. That is a huge difference between the two which is expected and appears to be in agreement with my commentator.

Can we modify is_Compound and close the performance gap?

Let’s modify is_Compound by:

  1. making it return an integer and
  2. removing one of the SELECT in-line views that is not needed.
 create or replace
 Function is_Compound(p_elem_info in mdsys.sdo_elem_info_array)
   return integer
 Is
   v_compound_element_count number := 0;
 Begin
   SELECT count(*) as c_element_count
     INTO v_compound_element_count
     FROM (SELECT trunc((rownum - 1) / 3,0) as id,
                  sum(case when mod(rownum,3) = 1 then sei.column_value else null end) as offset,
                  sum(case when mod(rownum,3) = 2 then sei.column_value else null end) as etype,
                  sum(case when mod(rownum,3) = 0 then sei.column_value else null end) as interpretation
             FROM TABLE(p_elem_info) sei
            GROUP BY trunc((rownum - 1) / 3,0)
          ) e
    WHERE ( e.etype = 2 AND e.interpretation = 2 )
       OR ( e.etype in (1003,2003) AND e.interpretation IN (2,4) );
   Return v_compound_element_count;
 End is_Compound;
 /

And then run it again against has_compound_curves() function. Note that first we need to modify the test_compound procedure to handle the changed return type by removing the “case when is_compound….end” construct and replacing it by a simple call to is_compound.

The new results are:

 anonymous block completed
 Run1 ran in 3 hsecs
 Run2 ran in 93 hsecs
 Run 1 ran in 3.23 % of the time
  	
 Name                                  Run1        Run2        Diff
 LATCH.channel operations paren           0          14          14
 LATCH.checkpoint queue latch             0          16          16
 LATCH.cache buffers chains             121          96         -25
 STAT...undo change vector size       2,072       2,140          68
 STAT...CPU used by this sessio           5          94          89
 STAT...recursive cpu usage               2          91          89
 STAT...redo size                     2,720       2,860         140
 STAT...calls to get snapshot s           1       1,001       1,000
 STAT...execute count                     1       1,001       1,000
 STAT...recursive calls                   1       1,003       1,002
 LATCH.shared pool                        0       1,028       1,028
 LATCH.library cache pin                  4       2,007       2,003
 LATCH.library cache lock                 0       2,003       2,003
 LATCH.SQL memory manager worka           6       2,073       2,067
 LATCH.library cache                      5       3,018       3,013
 LATCH.row cache objects                 12      15,027      15,015
  
 Run1 latches total versus runs -- difference AND pct
         Run1        Run2        Diff          Pct
          160      25,310      25,150           1%

A very, very slight improvement.

Let’s try modifying the function by adding in an “and rownum < 2” predicate to “short circuit” the query once we have found a compound element. Also, let’s add in a test to short-circuit when we find a compound element (4,1005,2005) and not just a potential sub-element of one.

 create or replace
 Function is_Compound(p_elem_info in mdsys.sdo_elem_info_array)
   return integer
 Is
   v_compound_element_count number := 0;
 Begin
   SELECT count(*) as c_element_count
     INTO v_compound_element_count
     FROM (SELECT trunc((rownum - 1) / 3,0) as id,
                  sum(case when mod(rownum,3) = 1 then sei.column_value else null end) as offset,
                  sum(case when mod(rownum,3) = 2 then sei.column_value else null end) as etype,
                  sum(case when mod(rownum,3) = 0 then sei.column_value else null end) as interpretation
             FROM TABLE(p_elem_info) sei
            GROUP BY trunc((rownum - 1) / 3,0)
          ) e
    WHERE ( e.etype = 2 AND e.interpretation = 2 )
       OR ( e.etype in (1003,2003) AND e.interpretation IN (2,4) )
      AND rownum < 2;
   Return v_compound_element_count;
 End is_Compound;

Now, one last final run against the new has_compound_curves PL/SQL function….

 anonymous block completed
 Run1 ran in 3 hsecs
 Run2 ran in 90 hsecs
 Run 1 ran in 3.33 % of the time
 	
 Name                                  Run1        Run2        Diff
 LATCH.cache buffers chains             128         112         -16
 STAT...undo change vector size       2,140       2,212          72
 STAT...CPU used by this sessio           5          90          85
 STAT...recursive cpu usage               2          90          88
 STAT...redo size                     2,860       3,004         144
 STAT...calls to get snapshot s           1       1,001       1,000
 STAT...execute count                     1       1,001       1,000
 STAT...recursive calls                   1       1,003       1,002
 LATCH.shared pool                        0       1,029       1,029
 LATCH.SQL memory manager worka           6       2,006       2,000
 LATCH.library cache lock                 1       2,003       2,002
 LATCH.library cache pin                  4       2,007       2,003
 LATCH.library cache                      5       3,017       3,012
 LATCH.row cache objects                 12      15,024      15,012
 
 Run1 latches total versus runs -- difference AND pct
         Run1        Run2        Diff          Pct
          180      25,246      25,066           1%

Again, not much change.

Let’s try reordering the geometry so the compound element is first…

 select sdo_aggr_union(sdoaggrtype(geom,0.05)) as geo m from (
 select rin,geom from (
 select rownum as rin,e.geometry as geom
 from table(geom.extractelementspiped(MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,5,2,1,9,2,1,17,4,2,17,2,1,23,2,2),MDSYS.SDO_ORDINATE_ARRAY(50,15,55,15,60,15,65,15,10,25,20,30,25,25,30,30,110,178,110,175,120,175,120,178,115,180,110,178)),0) ) e
 ) order by rin desc );
  
 GEOM
 -----------------------------------------
 MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,4,2,1,2,2,5,2,1,13,2,1,17,2,1,21,2,1),MDSYS.SDO_ORDINATE_ARRAY(120,178,115,180,110,178,110,175,120,175,120,178,60,15,65,15,50,15,55,15,10,25,20,30,25,25,30,30))
 
 1 row selected

And run the new geometry against the new has_compound_curves PL/SQL function….

 set serveroutput on size unlimited;
 declare 
   v_geom mdsys.sdo_geometry := MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,4,2,1,2,2,5,2,1,13,2,1,17,2,1,21,2,1),MDSYS.SDO_ORDINATE_ARRAY(120,178,115,180,110,178,110,175,120,175,120,178,60,15,65,15,50,15,55,15,10,25,20,30,25,25,30,30));
 begin
  runstats_pkg.rs_start;
  test_compound(v_geom,0);
  runstats_pkg.rs_middle;
  test_compound(v_geom,1);
  runstats_pkg.rs_stop(10);
 end;
 /
 
 anonymous block completed
 Run1 ran in 2 hsecs
 Run2 ran in 92 hsecs
 Run 1 ran in 2.17 % of the time
  
 Name                                  Run1        Run2        Diff
 STAT...session logical reads            26          38          12
 LATCH.cache buffers chains             137         105         -32
 STAT...recursive cpu usage               0          91          91
 STAT...CPU used by this sessio           2          93          91
 STAT...undo change vector size       2,076       2,204         128
 STAT...redo size                     2,724       2,996         272
 STAT...recursive calls                   1       1,001       1,000
 LATCH.shared pool                        0       1,000       1,000
 STAT...calls to get snapshot s           1       1,001       1,000
 STAT...execute count                     1       1,001       1,000
 LATCH.library cache pin                  4       2,003       1,999
 LATCH.library cache lock                 0       2,000       2,000
 LATCH.SQL memory manager worka           6       2,006       2,000
 LATCH.library cache                      4       3,003       2,999
 LATCH.row cache objects                 12      15,012      15,000
 
 Run1 latches total versus runs -- difference AND pct
         Run1        Run2        Diff          Pct
          190      25,152      24,962           1%

Reorganising the SDO_GEOMETRY had little effect on the SQL algorithm.

Even though I always suspected the issue was and is the overhead of setting up multiple SQL calls, I still want to complete the last of the modifications to the SQL function.

 create or replace
 Function is_Compound(p_elem_info in mdsys.sdo_elem_info_array)
   return integer
 Is
   v_compound_element_count integer := 0;
 Begin
   SELECT 1 as c_element_count
     INTO v_compound_element_count
     FROM (SELECT trunc((rownum - 1) / 3,0) as id,
                  sum(case when mod(rownum,3) = 1 then sei.column_value else null end) as offset,
                  sum(case when mod(rownum,3) = 2 then sei.column_value else null end) as etype,
                  sum(case when mod(rownum,3) = 0 then sei.column_value else null end) as interpretation
             FROM TABLE(p_elem_info) sei
            GROUP BY trunc((rownum - 1) / 3,0)
          ) e
    WHERE ( ( e.etype = 2 AND e.interpretation = 2 /* circular arc*/ )
         OR ( e.etype in (4,1005,2005) )
         OR ( e.etype in (1003,2003) AND e.interpretation IN (2/*circular arc*/,4/*circle*/) )
          )
      AND rownum < 2;
   Return 1;
   Exception
     When NO_DATA_FOUND Then
       Return 0;
 End is_Compound;

Since I have decided to handle etype 4,1005 and 2005 elements, I should change the has_compound_curves() function. Since I am changing it I will change the function to have it accept and mdsys.sdo_elem_info_array as does the is_compound
() function.

 create or replace
 function has_compound_curves ( elem_in in mdsys.sdo_elem_info_array) 
   return integer 
 as
   etype number;
   interpret number;
 begin
   for i in elem_in.first .. elem_in.last loop
     case
       when mod(i,3) = 1 then null; -- continue;
       when mod(i,3) = 2 then etype := elem_in(i);
       when mod(i,3) = 0 
       then
         interpret := elem_in(i);
         if( (etype = 2 and interpret = 2) 
             or 
             ( etype in (4,1005,2005) )
             or 
             (etype in (1003,2003) and interpret in (2,4))
           ) then
           return 1;
         end if;
     end case;
   end loop;
   return 0;
 end has_compound_curves;
  
 create or replace
 procedure test_compound(p_elem in mdsys.sdo_elem_info_array,
                         p_is   in integer)
 As
   v_OK          integer;
   v_is_compound boolean := case when p_is > 0 then true else false end;
 Begin
   For i IN 1..1000 LOOP
     If ( v_is_compound ) Then
       v_OK := is_compound(p_elem);
     Else
       v_OK := has_compound_curves(p_elem);
     End If;
   End Loop;
 end test_compound;

Testing reveals:

 set serveroutput on size unlimited;
 declare 
   v_geom mdsys.sdo_geometry := MDSYS.SDO_GEOMETRY(2006,null,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,4,2,1,2,2,5,2,1,13,2,1,17,2,1,21,2,1),MDSYS.SDO_ORDINATE_ARRAY(120,178,115,180,110,178,110,175,120,175,120,178,60,15,65,15,50,15,55,15,10,25,20,30,25,25,30,30));
 begin
  runstats_pkg.rs_start;
  test_compound(v_geom.sdo_elem_info,0);
  runstats_pkg.rs_middle;
  test_compound(v_geom.sdo_elem_info,1);
  runstats_pkg.rs_stop(10);
 end;
 /
 anonymous block completed
 Run1 ran in 0 hsecs
 Run2 ran in 90 hsecs
 Run 1 ran in 0 % of the time
 
 Name                                  Run1        Run2        Diff
 LATCH.cache buffers chains             112         126          14
 LATCH.channel operations paren           0          14          14
 LATCH.checkpoint queue latch             0          16          16
 STAT...undo change vector size       2,076       2,144          68
 STAT...recursive cpu usage               2          92          90
 STAT...CPU used by this sessio           2          92          90
 STAT...redo size                     2,724       2,924         200
 STAT...calls to get snapshot s           1       1,001       1,000
 STAT...execute count                     1       1,001       1,000
 STAT...recursive calls                   1       1,003       1,002
 LATCH.shared pool                        0       1,029       1,029
 LATCH.library cache lock                 8       2,003       1,995
 LATCH.library cache pin                 11       2,007       1,996
 LATCH.SQL memory manager worka           6       2,073       2,067
 LATCH.library cache                     20       3,018       2,998
 LATCH.row cache objects                 12      15,024      15,012
 
 Run1 latches total versus runs -- difference AND pct
         Run1        Run2        Diff          Pct
          178      25,343      25,165           1%

Conclusion

It does not appear to matter how much effort we put in to making the SQL version of is_compound() more efficient, the non-SQL version will beat us every time. The performance difference is enormous with the gap, after having been close a small amount, immediately widened by removing of the need to “dereference” the sdo_elem_info_array (this reminds me of the performance difference I extracted from my VB6 library when I removed as much “dereferencing” of OraObjects in runtime).

Discussion

First off, even when you use SQL in a function, it doesn’t hurt to do some
profiling and improvement of its performance. In this case what I thought
was a pretty simple, fast piece of SQL, turned out to be capable of being
improved.

Secondly, why do I use SQL when I could use pure programmatic PL/SQL? I started out using programmatic SQL (as Pro Oracle Spatial uses) but I generally found:

  1. The code harder to write when one tries to handle 2, 3 and 4D data in the algorithm;
  2. The time to complete a standard compile-debug-test cycle longer and more complex than using pure SQL;
  3. The code required more documentation.

The thing I liked about using SQL is that it is:

  1. Easier and faster executing a single loop of the standard compile-debug-test cycle;
  2. Declarative and capable of being given to others for executing using a variety of tools (SQLPlus, SQL Developer, TOAD etc);
  3. Self-documenting because of its declarative nature;
  4. Finally, I have always assumed that SQL execution is the Sine qua non (without which there is nothing) of being an Oracle database!

However easier an algorithm is to write in SQL (and not all are), when one has it running correctly (the goal of all programming), if the algorithm is, as is said in project management, “on the critical path”, then it has to be converted to straight PL/SQL (as my commentator has done).

Of course I always meant to test that my SQL approach is NOT orders of magnitude slower than non-SQL approaches. Sadly, my simple testing above shows that that is not the case (well, at least in this case). Obviously the efficiency of one approach versus another depends on how it is used and the sensitivity of the process to the processing time of the function. But since a user has discovered this then, perhaps, now is the time to swap out the SQL algorithm for a much faster PL/SQL based on i.e., his!

Thanks very much to him for bringing my attention to the speed issues of the original isCompound().

I hope others find this article useful.

Leave a Reply

Your email address will not be published. Required fields are marked *