Geomatics Degrees, Space Curves and Oracle Spatial

One of the things that should be taught to GIS students at Uni but isn’t in a lot of courses is the notion of space keys or space-filling curves. The mathematics behind these keys was done before computing by mathematicians like:

The one that most GIS people are introduced to is the z-order space-filling curve because it is well known in computer science (cf Donald Knuth’s classic work on Algorithms) as a location preserving hash function. Therefore it has been implemented many times. It is known as Morton-ordering.

See here for a fuller discussion on space-filling curves. Also, my Tesselate PL/SQL package has a good implementation of a QuadTree that also includes a Peano-Key.

In the end, the goal for any location preserving hash function (no matter whose algorithm is used), is to map a point in a multidimensional coordinate space to a scalar value.

The thing to note about space-filling curves – no matter what algorithm is used – can be seen in the figure in the Wikipedia “Space-filling curve” page titled “Six iterations of the Hilbert curve, a space-filling curve devised by mathematician David Hilbert”. Spot the repetition of the basic pattern as you move from top left across then down from left to right. Notice how the functions not only correlate the data in any one bucket, but the buckets themselves get correlated as well! Neat huh!

I have some implementations of Hilbert, Peano and Morton keys in C but it is back at home in Oz. But what we really need is an implementation for us Oracle Spatial users…

But as Aragorn said to H�leth, son of Hama at the battle for Helm’s Deep:

“There’s always hope”…

What is little known is that Oracle Spatial started out life as a solution to the storage of large volumes of hydrographic soundings (point data with Z and time dimensions) by Oracle Canada. What they needed was a method for storage and access for the point data that would be scalable and fast. They turned to space-filling curves as a method for creating a single numeric key which could be the primary key for a standard relational table. They took the basic space-fulling curve implementations and generalised them to cope with multiple-dimensions – hence the MD in MDSYS (Multi-Dimension). Of course, this approach was never going to work for linear and polygonal data that is why Oracle Spatial before 8i really wasn’t much use…. but that is another story.

Anyway, the implementation of their space-filling curve (based on Peano—See HHEncode link on this site) is still available and is in the MD package.

SQL> desc md

Specifically, the HHENCODE function.

Here is an example of how to use it to create a table of point data sorted in HHENCODE ordering…

 
CREATE OR REPLACE FUNCTION linear_key (  p_shape   in mdsys.sdo_geometry,
                                         P_diminfo in mdsys.sdo_dim_array )
RETURN RAW DETERMINISTIC
IS
  v_ctr   MDSYS.SDO_GEOMETRY;
  v_rval  RAW;
  v_lvl   INTEGER;
BEGIN
 —Compute the centroid of the geometry
 —The geometric centroid is good enough (does not have to fall within poygon object)
  v_ctr := MDSYS.SDO_GEOM.SDO_CENTROID(p_shape,p_diminfo);
  v_lvl := 8;—Encoding level for hhcode function
  p_rval := MD.HHENCODE.sdo_lb, p_diminfo(1).sdo_ub, v_lvl,
                         v_ctr.sdo_point.y, p_diminfo(2).sdo_lb, p_diminfo(2).sdo_ub, v_lvl);
  RETURN p_rval;
END;
/

CREATE TABLE surface.lidar_1000_sa10 AS
SELECT * FROM all_sdo_geom_metadata b, gis.lidar_1000_sa10 a WHERE ( b.owner = ‘GIS’ and b.table_name = ‘LIDAR_1000_SA10’ and b.column_name = ‘SHAPE’ )
ORDER BY linear_key( a.shape, b.diminfo )
TABLESPACE SURFACE PCTFREE 90 PCTUSED 10;

insert into user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
select table_name, column_name, diminfo, srid from all_sdo_geom_metadata where owner = ‘GIS’ and table_name = ‘LIDAR_1000_SA10’ and column_name = ‘SHAPE’;

create index lidar_1000_sa10_shp on lidar_1000_sa10(shape)
indextype is MDSYS.SPATIAL_INDEX PARAMETERS;

Now, as long as the table can be created such that the blocks it uses are sequential on physical disk, the performance of any reads of the data (display all points in this query window) will be at an absolute maximum.