Spatial Pipelining

This technical blog article describes the benefits in using pipelined functions in Oracle to manipulate sdo_geometry objects. In doing so it will describe a number of functions available in the COGO and GEOM packages in the PL/SQL packages I make available for a free download from this site.

What I will do is introduce a realistic “business need” and then show how to construct a non-pipelined function that can be used in implementing that need. I will also create a pipelined version of the function and “compare and contrast” the two approaches in terms of memory use and performance.

Business Need

Imagine a company has a large Oracle Spatial database in which are stored land parcel (land record) polygons. The company would like to be able to display the bearings and distances (metes and bounds) of each boundary of each polygon dynamically by not relying on a second layer of sdo_geometry linestrings. This is displayed pictorially as follows.

!/images/16.png (Business Requirement: Dynamic bearings and distances)!

How can we achieve this?

Steps to create table

First let’s start by creating a table and populating it.

 CREATE TABLE land_parcel (
  gid  INTEGER,
  geom SDO_GEOMETRY
 );
 INSERT INTO land_parcel
 VALUES(1,
        SDO_GEOMETRY(2003, NULL, NULL,
                     SDO_ELEM_INFO_ARRAY(1,1003,1),
                     SDO_ORDINATE_ARRAY(100,0,400,0,400,150,250,100,250,200,400,150,400,300,100,300,100,0)));
 INSERT INTO user_sdo_geom_metadata
 VALUES('LAND_PARCEL','GEOM',SDO_DIM_ARRAY(SDO_DIM_ELEMENT('X',0,1000,0.05), SDO_DIM_ELEMENT('Y',0,1000,0.05)),NULL);
 COMMIT;

Vector Elements

Now we need a method for accessing the individual lines that make up our sdo_geometry polygon. For this we will need a function that takes the polygon and splits it up into its constituent vectors (where a vector is defined as being a linestring made up of only a starting and ending vertex).

First we create a vector data structure as follows:

 CREATE OR REPLACE TYPE Coord2DType AS OBJECT (
   x NUMBER,
   y NUMBER
 );
 CREATE OR REPLACE TYPE Vector2DType AS OBJECT (
   startCoord Coord2DType,
   endCoord   Coord2DType
 );

Once defined we can now create a set of vectors (to hold all those in any one land parcel polygon) as follows:

 CREATE OR REPLACE TYPE Vector2DSetType AS TABLE OF Vector2DType;

Finally, having created our data structures we can create our function. The one I created in my GEOM package is GetVector2D. Its essentials (for a full implementation see my packages) are:

 FUNCTION GetVector2D ( p_geometry IN mdsys.sdo_geometry)
   RETURN CODESYS.Vector2DSetType DETERMINISTIC;
     vectors Vector2DSetType := Vector2DSetType();
 BEGIN
   ...
   WHILE v_partToProcess Loop
     ...
     IF v_vertex = 1 THEN
       vectors.EXTEND;
       v_vector := vectors.LAST;
       vectors(v_vector) := Vector2DType(Coord2DType(-1,1),Coord2DType(-1,1));
       vectors(v_vector).startCoord.x := v_coord.x;
       vectors(v_vector).startCoord.y := v_coord.y;
     ELSE
       vectors(v_vector).endCoord.x := v_coord.x;
       vectors(v_vector).endCoord.y := v_coord.y;
       vectors.EXTEND;
       v_vector := vectors.LAST;
       vectors(v_vector) := Vector2DType(Coord2DType(-1,1),Coord2DType(-1,1));
       vectors(v_vector).startCoord.x := v_coord.x;
       vectors(v_vector).startCoord.y := v_coord.y;
     END IF;
     ...
   END LOOP;
   ...
   RETURN vectors;
 END;

Finally, we will need two functions that, given a single vector, can return a bearing and distance. I won’t go into the details of how to do this, all I will do is point out that, in my COGO package, are two functions:

 CREATE OR REPLACE PACKAGE COGO
 AS
     FUNCTION Bearing( dE1 in number,
                       dN1 in number,
                       dE2 in number,
                       dN2 in number)
     RETURN NUMBER DETERMINISTIC;
     FUNCTION Distance( dE1 in number,
                        dN1 in number,
                        dE2 in number,
                        dN2 in number)
     RETURN NUMBER DETERMINISTIC;
 ...
 END COGO;

(In the CONSTANTS package is a definition of PI which we will also use.)

View

Now we can construct a view that will take a land parcel (or set of land parcels) and return the vectors that compose it. From these vectors we will create sdo_geometry linestrings and also columns containing the bearings and distances computed from those vectors.

 CREATE OR REPLACE VIEW metes_and_bounds
 AS
 SELECT rownum AS gid,
        codesys.Cogo.DD2DMS(
               codesys.Cogo.Bearing(startx,starty,endx,endy)
               *
               (180/codesys.Constants.PI) )
               AS bearing,
        ROUND(codesys.Cogo.Distance(startx,starty,endx,endy),2)
               AS distance,
        MDSYS.sdo_geometry(2002,NULL,NULL,
               MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),
               MDSYS.SDO_ORDINATE_ARRAY(startx,startY,endX,endY))
               AS geometry
   FROM ( SELECT DISTINCT c.StartCoord.X AS startX,
                          c.StartCoord.Y AS startY,
                          c.EndCoord.X AS endX,
                          c.EndCoord.Y AS endY
            FROM land_parcel a,
                 TABLE(codesys.Geom.GetVector2D(a.geom)) c
        );

Note that all the “heavy lifting” is done by the GetVector2D function that returns a set of vectors into the TABLE disaggregator.

Finally, if the land_parcel table held a lot of records, we should really consider using a fast refreshable materialized view instead and create a spatial index over its geom column. This is because we cannot create a function based index over the above view.

Pipelining and Pipelined Functions

OK, so we can now implement our business requirement. Why do we need to go further?

We should always be on the lookout for performance improvements even in these days of fast computer hardware. This is because, in any organisation, database servers are shared across multiple databases and accessed by multiple applications and users. Anything we can do to reduce our application “footprint” helps all who use the resource. Pipelining is one method of improving the performance and memory use of a function.

Firstly, what is a pipelined function?

Tom Kyte says in this article that:

Pipelined functions are simply code that you can pretend is a database table.
Pipelined functions give you the (amazing, to me) ability to use

SELECT * FROM <PLSQL_FUNCTION>;


A pipelined function needs to return a collection type …

In fact it is

SELECT * FROM TABLE(<PLSQL_FUNCTION>)

In the metes_and_bounds view above we have used a function that returns a collection type (Vector2DSetType) inside a TABLE() function. This is allowed, but this is still not a pipelined function!

Let’s turn it into a pipelined function and then I can explain the difference.

 FUNCTION GetVector2D ( p_geometry IN mdsys.sdo_geometry)
   RETURN CODESYS.Vector2DSetType PIPELINED;
     v_vector         codesys.Vector2DType := codesys.Vector2DType(
                                      codesys.Coord2DType(codesys.Constants.c_MinVal,codesys.Constants.c_MinVal),
                              codesys.Coord2DType(codesys.Constants.c_MinVal,codesys.Constants.c_MinVal));
 BEGIN
   ...
   WHILE v_partToProcess Loop
     ...
     IF v_vertex = 1 THEN
       v_vector.startCoord.x := v_coord.x;
       v_vector.startCoord.y := v_coord.y;
     ELSE
       v_vector.endCoord.x := v_coord.x;
       v_vector.endCoord.y := v_coord.y;
       PIPE ROW(v_vector); 
       v_vector := vectors.LAST;
       v_vector.startCoord.x := v_coord.x;
       v_vector.startCoord.y := v_coord.y;
     END IF;
     ...
   END LOOP;
   ...
   RETURN;
 END;

The three points of difference are:

  • The definition of the function uses the PIPELINED keyword rather than the DETERMINISTIC keyword.
  • Instead of allocating memory as in the use of EXTEND on the
    private collection variable vectors (EXEND will allocate memory) in the pipelined function a single vector is simply pushed into the pipeline as it is constructed.
  • In the pipelined function no variable is returned using the final RETURN statement; in the non-pipelined function the local collection variable, vectors, which holds all vector objects constructed during function execution are return as one.

So, fairly obviously, a function is pipelined if it uses the PIPELINED keyword and pushes what it creates into the pipe via the PIPE ROW() statement as they are created.

The big thing to notice is that very little memory is created or used by the pipelined function whereas the non-pipelined function has to allocated memory to hold all the vector objects until the function’s end. Because the non-piplined function waits until the end before it can return its results, the calling SELECT statement itself if forced to wait before it can do any processing: not so with the pipelined function. The Oracle 10gR2 help confirms this:

Rows from a collection returned by a table function can also be pipelined, that is, iteratively returned as they are produced instead of in a batch after all processing of the table function’s input is completed.

That help also outlines the benefits of pipelining:

Streaming, pipelining, and parallel execution of table functions can improve performance:

  • By enabling multithreaded, concurrent execution of table functions
  • By eliminating intermediate staging between processes
  • By improving query response time: With non-pipelined table functions, the entire collection returned by a table function must be constructed and returned to the server before the query can return a single result row. Pipelining enables rows to be returned iteratively, as they are produced. This also reduces the memory that a table function requires, as the object cache does not need to materialize the entire collection.
  • By iteratively providing result rows from the collection returned by a table function as the rows are produced instead of waiting until the entire collection is staged in tables or memory and then returning the entire collectio

Demonstration of Performance Improvements

With the land_parcel table above we don’t have enough objects to demonstrate the performance improvements the pipelined function has over the non-pipelined function. So, I used some customer data (I have permission for this).

 SELECT count(*)
   FROM parcel;
   COUNT(*)
 ----------
      57453
 CREATE TABLE pipelined_version
 AS
 SELECT rownum AS gid,
        MDSYS.SDO_GEOMETRY(2002,NULL,NULL,
        MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),
        MDSYS.SDO_ORDINATE_ARRAY(startx,startY,endX,endY))
        AS geometry
   FROM ( SELECT DISTINCT
                 c.StartCoord.X AS startX,
                 c.StartCoord.Y AS startY,
                 c.EndCoord.X AS endX,
                 c.EndCoord.Y AS endY
            FROM ( SELECT geometry
                     FROM SP_PARCEL
                 ) a,
                 TABLE(codesys.Geom.Get{Piped}Vector2D(a.geometry)) c
         );

Note that I run this statement twice with the braces {} removed.

 SELECT COUNT(*)
   FROM PIPELINED_VERSION;
   COUNT(*)
 ----------
     763916

Performance numbers were:

Function TimeInSeconds
GetVector2D 02:18.13
GetPipedVector2D 00:47.90

In summary, pipelining improved the performance of the operation by 287% ( ( 1 / ( 48 / 138 ) * 100 ). Now that is quite an improvement even for this small amount of data. I have used this GetPipedVector2D function on some huge spatial datasets and have seen such substantial performance improvements I now use them in preference to previous techniques (I have left the previous implementations in my PL/SQL packages as occasionally they are useful).

I hope this little article helped whet your appetite for pipelined table functions in Oracle Spatial.