Top 5 Recent Articles
ARTICLES CATEGORIES
- Algorithms (22)
- All (399)
- Biography (1)
- Blog (44)
- Business Requirements (1)
- Commentary (1)
- Conversion (2)
- Customers (2)
- Data Models (1)
- Education (2)
- GeoRaptor (13)
- GPS (1)
- Image Processing (2)
- Import Export (8)
- Licensing (2)
- LiDAR (1)
- Linear Referencing (4)
- Manifold GIS (3)
- Mapping (1)
- MySQL Spatial (7)
- Networking and Routing (including Optimization) (5)
- Open Source (18)
- Oracle Spatial and Locator (194)
- Partitioning (1)
- PostGIS (36)
- Projections (1)
- Published Articles (1)
- qGIS (1)
- Recommendations (1)
- Services (1)
- Software Change Log (1)
- Source Code (37)
- Space Curves (9)
- Spatial Database Functions (109)
- Spatial DB comparison (1)
- Spatial XML Processing (11)
- SQL Server Spatial (92)
- Standards (3)
- Stored Procedure (17)
- Tessellation or Gridding (10)
- Tools (2)
- Topological Relationships (1)
- Training (2)
- Triangulation (2)
Generating multi-points from single point records in Oracle Spatial
Introduction
The background to this blog article comes from some work I did for GeoScience Australia in May-June 2007. This work involved the creation of a ‘distribution’ database from a ‘production’ database using Oracle replication with presentation by UMN MapServer.
The ‘production’ database included a single table (containing around 70 million individual point records that described around 13,000 individual hydrographic surveys. Dynamic searching and visualising individual surveys in the base table was considered to be too slow to be used in the interactive mapping that was required for the data discovery web application so ‘surrogate’ vector linestrings existed against which searching and visualisation would occur.
No one had yet been able to implement search and visualisation against the base point data: until I came along.
I have not included a description of the original tables in this blog; what I have done is create a set of tables and scripts to generate random data which are included in a download for you to run yourself. Also, to simplify the material for this article, I will not discuss how the original code handles surveys with more 3D points than Oracle”s Sdo_Geometry Sdo_Ordinate_Array can hold (which is ( 1048576 ” 1 ) / 3 = 349,525 coordinates): all generated data will not break this limit.
The rest of this article describes the testing that was done testing different methods for the creation of single multi-point records from the base single point data. The article also shows how Oracle”s PL/SQL profiling package was used in improving the performance of the actual code.
Note that all timings in this article are based on Oracle”s DBMS_PROFILER package whose use will be described in another blog article.
Creating and Populating the base table.
The base table holding the individual survey points is declared as follows :
CREATE TABLE single beam_bathumetry ( POINTNO Integer, SB_ENO Integer, DATETIME Date, Geom MDSYS.SDO_Geometry ) PCTFREE 0 TABLESPACE USERS NOLOGGING;
Note, sb_eno is the enterprise identifier of a single singlebeam survey; the pointno is the survey point within a singlebeam survey. Normally sb_eno,pointno is the primary key but, for simplicity, the generated pointno will actually be unique across all surveys. To populate this table we can do this:
Prompt Define some constants and variable ... DEFINE NUM_SB_ENOS=&1 DEFINE NUM_PTS=&2 Prompt Create the data ... DECLARE v_SRID number := 8311; -- GDA94 v_MINX number := 105; v_MAXX number := 179; v_MINY number := -70; v_MAXY number := -5; v_MINZ number := -1000; v_MAXZ number := 1000; BEGIN INSERT /*+APPEND*/ INTO single beam_bathymetry (POINTNO,SB_ENO,datetime,GEOM) SELECT rownum, TRUN(dbms_random.value(1,&&NUM_SB_ENOS. + 1)), SYSDATE, MDSYS.SDO_GEOMETRY(3001, v_SRID, MDSYS.SDO_POINT_TYPE( ROUND(dbms_random.value(v_MINX,v_MAXX),2,), ROUND(dbms_random.value(v_MINY,v_MAXY),2,), ROUND(dbms_random.value(v_MINZ,v_MAXZ),2,)), NULL,NULL) FROM DUAL CONNECT BY LEVEL <= &&NUM_PTS.; COMMIT; END; / SHOW ERRORS ALTER TABLE simglebeam_bathymetry LOGGING;
This data does not need to be indexed so this is not covered.
Target Tables
The target table that will hold our multipoint sdo_geometry objects is:
DROP TABLE single beam_mPOINT; CREATE TABLE single beam-mpoint ( SB_ENO NUMBER(8), POINTCOUNT NUMBER(8), GEOM_MPOINT MDSYS.SDO_GEOMETRY );
Generating the MultiPoint objects
Method One: Simple is Best
“Forget about the maneuvers. Go straight at them.” Admiral Horatio Nelson
Of course, in a perfect world, the generation of the multipoint sdo_geometries should be as simple as (the APPEND hint provides a more efficient ‘direct path’ insert into the table):
INSERT /*+APPEND */ INTO single beam_mpoint SELECT SB_,COUNT(*), SDO_AGGR_UNION{MDSYS.SDOAGGRTYPE(a.geom,0.05)) FROM singlebeam_bathymetry a GROUP BY SB_ENO;
However, except for small datasets, this method is slow (some numbers justifying this assertion will be provided later).
Method Two: ‘Appendix D’
The Oracle Spatial Team”s recommended approach for the most efficient use of the SDO_AGGR_UNION function is documented in Section D.4 of Appendix D of the Oracle Spatial User”s Guide and Reference 10g Release 2 (10.2) documentation.
INSERT /*+APPEND */ INTO singlebeam_mpoint SELECT SDO_AGGR_UNION(MDSYS.SDOAGGRTYPE(ageom,0.05)) FROM (SELECT sb_eno, SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SELECT sb_eno, SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SELECT sb_eno, SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM (SElECT a.SB_ENO, SDO_AFFR_UNION(MDSYS.SDOAFFRTYPE)ageom,0.05)) ageom FROM mv_singlebeam_bathymetry a GROUP BY a.SB_ENO, mod(rownum,16) ) GROUP BY SB_ENO, mod (rownum, 8) ) GROUP BY SB_ENO, mod (rownum, 4 ) Group BY SB_ENO, mod (rownum, 2) ) GROUP BY SB_ENO;
The general idea here is to avoid expensive ‘context swapping’ in the Oracle aggregation engine by implementing a form of ‘parallelisation’. The numbers presented later will show that this is effective but that it still does not perform fast enough compared to other methods.
Method Three: Pipelined Function
During my testing at GeoScience Australia I had the idea of using PL/SQL to generate the multi-points using a well-known, scalable, Oracle collection aggregation technique. The best way to describe this is by example in the following PL/SQL (the code includes a number of standard optimisations familiar to PL/SQL developers i.e., the use of CURSORs, BULK COLLECTion, FORALL “array” insertion, ROWTYPEs etc). It is the SQL statement that is of most interest and has been highlighted in the code.
This first method included the use of a pipelined function (pipelined function use will be covered in another blog) which turned an Sdo_Point object into an Sdo_Ordinate_Array object which could then be BULK COLLECTED in the highlighted SQL:
CREATE OR REPLACE FUNCTION asOrdinates( p_point_geom IN MDSYS.SDO_GEOMETRY) RETURN MDSYS.SDO_ORDINATE_ARRAY PIPELINED IS BEGIN PIPE ROW (p_point_geom.sdo_point.x); PIPE ROW (p_point_geom.sdo_point.y); PIPE ROW (p_point_geom.sdo_point.z); RETURN; END; / SHOW ERRORS DECLARE TYPE t_recs IS TABLE OF singlebeam_mpoint%ROWTYPE INDEX BY PLS_INTEGER; v_mpoint_recs t_recs; v_3D_ordinates mdsys.sdo_ordinate_array; v_mpoint_rec singlebeam_mpoint%ROWTYPE; v_rec_count PLS_INTEGER := 0; v_commit PLS_INTEGER := 50; CURSOR c_eno_list Is SELECT sbl.SB_ENO, COUNT(*) AS POINTCOUNT FROM singlebeam_bathymetry sbl GROUP BY sbl.SB_ENO; BEGIN FOR rec IN c_eno_list LOOP SELECT b.* BULK COLLECT INTO v_3D_ordinates FROM singlebeam_bathymetry a, TABLE(asOrdinates(a.geom)) b WHERE a.sb_eno = rec.sb_eno ORDER BY a.pointno; v_mpoint_rec.sb_eno := rec.sb_eno; v_mpoint_rec.pointcount := rec.pointcount; v_mpoint_rec.geom_mpoint := MDSYS.SDO_GEOMETRY(3005,8311,NULL, MDSYS.SDO_ELEM_INFO_ARRAY( 1,1,v_mpoint_rec.pointcount ), v_3D_ordinates); v_rec_count := v_rec_count + 1; v_mpoint_recs(v_rec_count) := v_mpoint_rec; IF ( v_rec_count > v_commit ) THEN FORALL i IN 1..v_rec_count INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i); COMMIT; v_rec_count := 0; END IF; END LOOP; IF ( v_rec_count > 0 ) THEN -- Write out any left over ... FORALL i IN 1..v_rec_count INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i); COMMIT; v_rec_count := 0; END IF; END; / SHOW ERRORS
The BULK COLLECT aggregates the individual Sdo_Ordinate_Arrays returned by the function into a single array for the survey. What I discovered was this approach was substantially faster than the previous two methods.
After getting this to work, I then realised I didn”t need the Function as is described in the next method.
Method Four: PL/SQL SDO_Ordinate_Array aggregation
The next improvement to the algorithm was to get rid of the custom “asOrdinates” function in favour of using the Oracle Spatial type library directly.
... BEGIN ... SELECT ords.* BULK COLLECT INTO v_3D_ordinates FROM singlebeam_bathymetry a, TABLE(mdsys.sdo_ordinate_array(a.geom.sdo_point.x, a.geom.sdo_point.y, a.geom.sdo_point.z) ) ords WHERE a.sb_eno = rec.sb_eno ORDER BY a.pointno; ... END; / SHOW ERRORS
All the heavy lifting in this approach is done by directly referencing the individual ordinates coded in the geometry”s 3D Sdo_Point object and converting them into an Sdo_Ordinate_Array which can then be BULK COLLECTED into a resultant Sdo_Ordinate_Array.
This method was the fastest of all the methods used. However, because some surveys where larger than can be represented in an Sdo_Ordinate_Array, I did look at creating “custom” point arrays that did not have the million ordinate restriction that is in the MDSYS.SDO_ORDINATE_ARRAY object.
Method Five: Custom array of SDO_Point_Type
Another method using aggregation involved the use of a custom array of SDO_Point_Type as in the following PL/SQL. (One of the advantages of this approach is the fact that the aggregation SELECT statement can be used for all multi-point aggregation regardless as to whether the resulting object breaks the SDO_Ordinate_Array limit; a negative is the need to turn the array of SDO_Point_Type back into array bound “safe” SDO_Ordinate_Arrays on insert. But the method is illustrative of what can be done if one “thinks outside the square”.)
DECLARE TYPE t_recs IS TABLE OF singlebeam_mpoint%ROWTYPE INDEX BY PLS_INTEGER; /* Create array of SDO_POINT_TYPE than is bigger than the ordinary SDO_ORDINATE_ARRAY and larger than max number points per survey */ TYPE t_3D_Point_Array IS VARRAY(4194304) OF MDSYS.SDO_POINT_TYPE; v_3D_Points t_3D_Point_Array; v_mpoint_recs t_recs; v_3D_ordinates mdsys.sdo_ordinate_array; v_mpoint_rec singlebeam_mpoint%ROWTYPE; v_index PLS_INTEGER := 0; v_rec_count PLS_INTEGER := 0; v_commit PLS_INTEGER := 50; CURSOR c_eno_list Is SELECT sbl.SB_ENO, COUNT(*) AS POINTCOUNT FROM singlebeam_bathymetry sbl; BEGIN FOR rec IN c_eno_list LOOP SELECT a.geom.sdo_point BULK COLLECT INTO v_3D_Points FROM singlebeam_bathymetry a ORDER BY a.pointno; /* To insert we need the data to be held in the SDO_Ordinate_Array */ v_index := 0; FOR v_point IN v_3D_Points.FIRST..v_3D_Points.LAST LOOP v_index := v_index + 1; v_3D_ordinates(v_index) := v_3D_Points(v_point).X; v_index := v_index + 1; v_3D_ordinates(v_index) := v_3D_Points(v_point).Y; v_index := v_index + 1; v_3D_ordinates(v_index) := v_3D_Points(v_point).Z; END LOOP; v_mpoint_rec.sb_eno := rec.sb_eno; v_mpoint_rec.pointcount := rec.pointcount; v_mpoint_rec.geom_mpoint := MDSYS.SDO_GEOMETRY(3005,8311,NULL, MDSYS.SDO_ELEM_INFO_ARRAY( 1,1,v_mpoint_rec.pointcount ), v_3D_ordinates); v_rec_count := v_rec_count + 1; v_mpoint_recs(v_rec_count) := v_mpoint_rec; IF ( v_rec_count > v_commit ) THEN FORALL i IN 1..v_rec_count INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i); COMMIT; v_rec_count := 0; END IF; END LOOP; IF ( v_rec_count > 0 ) THEN -- Write out any left over ... FORALL i IN 1..v_rec_count INSERT INTO singlebeam_mpoint VALUES v_mpoint_recs(i); COMMIT; v_rec_count := 0; END IF; END; / SHOW ERRORS
Timing
To ensure that the performance of all four methods can be compared, the first two methods were “wrapped” by PL/SQL such that the SELECT statement was executed once for each sb_eno. Also, the final methods went through a few different interations in the handling of those surveys whose points broke the SDO_Ordinate_Array limit (see the actual SQL for how this was handled).
The first chart shows how substantially different the “native” versus “custom” methods can be: one cannot see the actual timings for the “custom” methods:
When one removes the “native” methods, one can see more clearly the relative performance of the “custom” methods.
Conclusion
Replication of the full, 67 million row raw point data, to a 13,000 row multi-point table was achieved in less than 50 minutes of processing. This was a substantial improvement and as a result of the work GeoScience Australia moved to a database replication strategy that included the efficient generation of multi-point survey data from the raw observations. This also gives the end-user the ability to work with the actual points in a web application instead of the surrogate averaged linework resulting in more accurate analysis and data discovery.
The “take home message” for users of Oracle Spatial/Locator should look to the power of the whole database to solve data warehousing and analysis problems and not just the Spatial packages themselves.
A fully worked example showing the fastest method for you to try youself is available here
Documentation
- MySQL Spatial General Functions
- Oracle LRS Objects
- Oracle Spatial Exporter (Java + pl/SQL)
- Oracle Spatial Object Functions
- Oracle Spatial Object Functions (Multi Page)
- PostGIS pl/pgSQL Functions
- SC4O Oracle Java Topology Suite (Java + pl/SQL)
- SQL Server Spatial General TSQL Functions
- SQL Server Spatial LRS TSQL Functions