Generating multi-points from single point records in Oracle Spatial

Introduction
The background to this blog article comes from some work I did for GeoScience Australia in May-June 2007. This work involved the creation of a ‘distribution’ database from a ‘production’ database using Oracle replication with presentation by UMN MapServer.

The ‘production’ database included a single table (containing around 70 million individual point records that described around 13,000 individual hydrographic surveys. Dynamic searching and visualising individual surveys in the base table was considered to be too slow to be used in the interactive mapping that was required for the data discovery web application so ‘surrogate’ vector linestrings existed against which searching and visualisation would occur.

Surrogate Survey Line vs Actual Survey Points

No one had yet been able to implement search and visualisation against the base point data: until I came along.

I have not included a description of the original tables in this blog; what I have done is create a set of tables and scripts to generate random data which are included in a download for you to run yourself. Also, to simplify the material for this article, I will not discuss how the original code handles surveys with more 3D points than Oracle”s Sdo_Geometry Sdo_Ordinate_Array can hold (which is ( 1048576 ” 1 ) / 3 = 349,525 coordinates): all generated data will not break this limit.

The rest of this article describes the testing that was done testing different methods for the creation of single multi-point records from the base single point data. The article also shows how Oracle”s PL/SQL profiling package was used in improving the performance of the actual code.

Note that all timings in this article are based on Oracle”s DBMS_PROFILER package whose use will be described in another blog article.
Creating and Populating the base table.
The base table holding the individual survey points is declared as follows :

CREATE TABLE single beam_bathumetry (
 POINTNO     Integer,
 SB_ENO      Integer,
 DATETIME    Date,
 Geom        MDSYS.SDO_Geometry
)
PCTFREE 0 TABLESPACE USERS NOLOGGING;

Note, sb_eno is the enterprise identifier of a single singlebeam survey; the pointno is the survey point within a singlebeam survey. Normally sb_eno,pointno is the primary key but, for simplicity, the generated pointno will actually be unique across all surveys. To populate this table we can do this:

Prompt Define some constants and variable ...
DEFINE NUM_SB_ENOS=&1
DEFINE NUM_PTS=&2
Prompt Create the data ...
DECLARE
     v_SRID number := 8311; -- GDA94
     v_MINX number := 105;
     v_MAXX number := 179;
     v_MINY number := -70; 
     v_MAXY number := -5;
     v_MINZ number := -1000;
     v_MAXZ number := 1000;
BEGIN
     INSERT /*+APPEND*/
       INTO single beam_bathymetry
            (POINTNO,SB_ENO,datetime,GEOM)
     SELECT rownum,
            TRUN(dbms_random.value(1,&&NUM_SB_ENOS. + 1)),
            SYSDATE,
            MDSYS.SDO_GEOMETRY(3001,
                               v_SRID,
                  MDSYS.SDO_POINT_TYPE(
                        ROUND(dbms_random.value(v_MINX,v_MAXX),2,),
                        ROUND(dbms_random.value(v_MINY,v_MAXY),2,), 
                        ROUND(dbms_random.value(v_MINZ,v_MAXZ),2,)),
                  NULL,NULL)
    FROM DUAL
 CONNECT BY LEVEL <= &&NUM_PTS.;
 COMMIT;
END;
/
SHOW ERRORS
ALTER TABLE simglebeam_bathymetry LOGGING;

This data does not need to be indexed so this is not covered.

Target Tables

The target table that will hold our multipoint sdo_geometry objects is:

DROP TABLE single beam_mPOINT;
CREATE TABLE single beam-mpoint (
  SB_ENO          NUMBER(8),
  POINTCOUNT      NUMBER(8),
  GEOM_MPOINT     MDSYS.SDO_GEOMETRY );

Generating the MultiPoint objects

Method One: Simple is Best

“Forget about the maneuvers. Go straight at them.” Admiral Horatio Nelson

Of course, in a perfect world, the generation of the multipoint sdo_geometries should be as simple as (the APPEND hint provides a more efficient ‘direct path’ insert into the table):

INSERT /*+APPEND */ INTO single beam_mpoint
SELECT SB_,COUNT(*),
       SDO_AGGR_UNION{MDSYS.SDOAGGRTYPE(a.geom,0.05))
  FROM singlebeam_bathymetry a
 GROUP BY SB_ENO;

However, except for small datasets, this method is slow (some numbers justifying this assertion will be provided later).

Method Two: ‘Appendix D’

The Oracle Spatial Team”s recommended approach for the most efficient use of the SDO_AGGR_UNION function is documented in Section D.4 of Appendix D of the Oracle Spatial User”s Guide and Reference 10g Release 2 (10.2) documentation.

INSERT /*+APPEND */ INTO singlebeam_mpoint SELECT
SDO_AGGR_UNION(MDSYS.SDOAGGRTYPE(ageom,0.05)) FROM (SELECT sb_eno,
SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SELECT sb_eno,
SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SELECT sb_eno,
SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SElECT a.SB_ENO,
SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM
mv_singlebeam_bathymetry a GROUP BY a.SB_ENO, mod(rownum,16) ) GROUP BY
SB_ENO, mod (rownum, 8) ) GROUP BY SB_ENO, mod (rownum, 4 ) Group BY SB_ENO,
mod (rownum, 2) ) GROUP BY SB_ENO;

The general idea here is to avoid expensive ‘context swapping’ in the Oracle aggregation engine by implementing a form of ‘parallelisation’. The numbers presented later will show that this is effective but that it still does not perform fast enough compared to other methods.

Method Three: Pipelined Function

During my testing at GeoScience Australia I had the idea of using PL/SQL to generate the multi-points using a well-known, scalable, Oracle collection aggregation technique. The best way to describe this is by example in the following PL/SQL (the code includes a number of standard optimisations familiar to PL/SQL developers i.e., the use of CURSORs, BULK COLLECTion, FORALL “array” insertion, ROWTYPEs etc). It is the SQL statement that is of most interest and has been highlighted in the code.

This first method included the use of a pipelined function (pipelined function use will be covered in another blog) which turned an Sdo_Point object into an Sdo_Ordinate_Array object which could then be BULK COLLECTED in the highlighted SQL:

CREATE OR REPLACE FUNCTION asOrdinates( p_point_geom IN MDSYS.SDO_GEOMETRY)
           RETURN MDSYS.SDO_ORDINATE_ARRAY PIPELINED
  IS
  BEGIN
    PIPE ROW (p_point_geom.sdo_point.x);
    PIPE ROW (p_point_geom.sdo_point.y);
    PIPE ROW (p_point_geom.sdo_point.z);
    RETURN;
  END;
/
SHOW ERRORS
DECLARE
  TYPE t_recs        IS TABLE OF singlebeam_mpoint%ROWTYPE INDEX BY PLS_INTEGER;
  v_mpoint_recs      t_recs;
  v_3D_ordinates     mdsys.sdo_ordinate_array;
  v_mpoint_rec       singlebeam_mpoint%ROWTYPE;
  v_rec_count        PLS_INTEGER := 0;
  v_commit           PLS_INTEGER := 50;
  CURSOR c_eno_list
  Is
   SELECT sbl.SB_ENO,
          COUNT(*) AS POINTCOUNT
     FROM singlebeam_bathymetry sbl
    GROUP BY sbl.SB_ENO;
BEGIN
  FOR rec IN c_eno_list LOOP
     SELECT b.*
       BULK COLLECT INTO v_3D_ordinates
       FROM singlebeam_bathymetry a,
            TABLE(asOrdinates(a.geom)) b
      WHERE a.sb_eno = rec.sb_eno
      ORDER BY a.pointno;
     v_mpoint_rec.sb_eno      := rec.sb_eno;
     v_mpoint_rec.pointcount  := rec.pointcount;
     v_mpoint_rec.geom_mpoint := MDSYS.SDO_GEOMETRY(3005,8311,NULL,
                                        MDSYS.SDO_ELEM_INFO_ARRAY(
                                              1,1,v_mpoint_rec.pointcount ),
                                        v_3D_ordinates);
     v_rec_count := v_rec_count + 1;
     v_mpoint_recs(v_rec_count) := v_mpoint_rec;
     IF ( v_rec_count &gt; v_commit ) THEN
        FORALL i IN 1..v_rec_count
          INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i);
        COMMIT;
        v_rec_count := 0;
     END IF;
  END LOOP;
  IF ( v_rec_count &gt; 0 ) THEN -- Write out any left over ...
     FORALL i IN 1..v_rec_count
       INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i);
     COMMIT;
     v_rec_count := 0;
  END IF;
END;
/
SHOW ERRORS

The BULK COLLECT aggregates the individual Sdo_Ordinate_Arrays returned by the function into a single array for the survey. What I discovered was this approach was substantially faster than the previous two methods.

After getting this to work, I then realised I didn”t need the Function as is described in the next method.

Method Four: PL/SQL SDO_Ordinate_Array aggregation

The next improvement to the algorithm was to get rid of the custom “asOrdinates” function in favour of using the Oracle Spatial type library directly.

...
BEGIN
...
     SELECT ords.*
       BULK COLLECT INTO v_3D_ordinates
       FROM singlebeam_bathymetry a,
            TABLE(mdsys.sdo_ordinate_array(a.geom.sdo_point.x,
                                           a.geom.sdo_point.y,
                                           a.geom.sdo_point.z) ) ords
      WHERE a.sb_eno = rec.sb_eno
      ORDER BY a.pointno;
...
END;
/
SHOW ERRORS

All the heavy lifting in this approach is done by directly referencing the individual ordinates coded in the geometry”s 3D Sdo_Point object and converting them into an Sdo_Ordinate_Array which can then be BULK COLLECTED into a resultant Sdo_Ordinate_Array.

This method was the fastest of all the methods used. However, because some surveys where larger than can be represented in an Sdo_Ordinate_Array, I did look at creating “custom” point arrays that did not have the million ordinate restriction that is in the MDSYS.SDO_ORDINATE_ARRAY object.

Method Five: Custom array of SDO_Point_Type

Another method using aggregation involved the use of a custom array of SDO_Point_Type as in the following PL/SQL. (One of the advantages of this approach is the fact that the aggregation SELECT statement can be used for all multi-point aggregation regardless as to whether the resulting object breaks the SDO_Ordinate_Array limit; a negative is the need to turn the array of SDO_Point_Type back into array bound “safe” SDO_Ordinate_Arrays on insert. But the method is illustrative of what can be done if one “thinks outside the square”.)

DECLARE
  TYPE t_recs           IS TABLE OF singlebeam_mpoint%ROWTYPE INDEX BY PLS_INTEGER;
  /* Create array of SDO_POINT_TYPE than is bigger than the ordinary SDO_ORDINATE_ARRAY and larger than max number points per survey */
  TYPE t_3D_Point_Array IS VARRAY(4194304) OF MDSYS.SDO_POINT_TYPE; 
  v_3D_Points           t_3D_Point_Array;
  v_mpoint_recs      t_recs;
  v_3D_ordinates     mdsys.sdo_ordinate_array;
  v_mpoint_rec       singlebeam_mpoint%ROWTYPE;
  v_index            PLS_INTEGER := 0;
  v_rec_count        PLS_INTEGER := 0;
  v_commit           PLS_INTEGER := 50;
  CURSOR c_eno_list
  Is
   SELECT sbl.SB_ENO,
          COUNT(*) AS POINTCOUNT
     FROM singlebeam_bathymetry sbl;
BEGIN
  FOR rec IN c_eno_list LOOP
     SELECT a.geom.sdo_point
       BULK COLLECT INTO v_3D_Points
       FROM singlebeam_bathymetry a
      ORDER BY a.pointno;
     /* To insert we need the data to be held in the SDO_Ordinate_Array */
     v_index := 0;
     FOR v_point IN v_3D_Points.FIRST..v_3D_Points.LAST LOOP
         v_index := v_index + 1;
         v_3D_ordinates(v_index) := v_3D_Points(v_point).X;
         v_index := v_index + 1;
         v_3D_ordinates(v_index) := v_3D_Points(v_point).Y;
         v_index := v_index + 1;
         v_3D_ordinates(v_index) := v_3D_Points(v_point).Z;
     END LOOP;
     v_mpoint_rec.sb_eno      := rec.sb_eno;
     v_mpoint_rec.pointcount  := rec.pointcount;
     v_mpoint_rec.geom_mpoint := MDSYS.SDO_GEOMETRY(3005,8311,NULL,
                                        MDSYS.SDO_ELEM_INFO_ARRAY(
                                              1,1,v_mpoint_rec.pointcount ),
                                        v_3D_ordinates);
     v_rec_count := v_rec_count + 1;
     v_mpoint_recs(v_rec_count) := v_mpoint_rec;
     IF ( v_rec_count &gt; v_commit ) THEN
        FORALL i IN 1..v_rec_count
          INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i);
        COMMIT;
        v_rec_count := 0;
     END IF;
  END LOOP;
  IF ( v_rec_count &gt; 0 ) THEN -- Write out any left over ...
     FORALL i IN 1..v_rec_count
       INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i);
     COMMIT;
     v_rec_count := 0;
  END IF;
END;
/
SHOW ERRORS

Timing

To ensure that the performance of all four methods can be compared, the first two methods were “wrapped” by PL/SQL such that the SELECT statement was executed once for each sb_eno. Also, the final methods went through a few different interations in the handling of those surveys whose points broke the SDO_Ordinate_Array limit (see the actual SQL for how this was handled).

The first chart shows how substantially different the “native” versus “custom” methods can be: one cannot see the actual timings for the “custom” methods:

Performance of Oracle Aggregate vs PL/SQL Methods

When one removes the “native” methods, one can see more clearly the relative performance of the “custom” methods.

Performance of Individual PL/SQL Methods

Conclusion

Replication of the full, 67 million row raw point data, to a 13,000 row multi-point table was achieved in less than 50 minutes of processing. This was a substantial improvement and as a result of the work GeoScience Australia moved to a database replication strategy that included the efficient generation of multi-point survey data from the raw observations. This also gives the end-user the ability to work with the actual points in a web application instead of the surrogate averaged linework resulting in more accurate analysis and data discovery.

The “take home message” for users of Oracle Spatial/Locator should look to the power of the whole database to solve data warehousing and analysis problems and not just the Spatial packages themselves.

A fully worked example showing the fastest method for you to try youself is available here