Spatial Sorting of Data via Morton Key

I have often advocated the use of a spatial sort when loading static spatial data into Oracle (See 1. Spatially sort read-only data. in my article on performance tips). The idea here is to try and place spatial data that is close together in space (called spatial autocorrelation), close together on disk. Then, when the hard-disk’s head goes down on the disk to retrieve one or more records, the head will not have to travel far to retrieve the geographically close data. Imagine you zooming in to a particular area in your GIS. The search will be a minimum bounding rectangle based query that is specifically requesting data that is close together geographically. If disk activity can be minimised by spatial autocorrelation then performance will improve.

But really, I had never gotten around to producing a robust solution for use solely within the Oracle database.

Until now.

Object Relational Databases like Oracle are ideally suited to this task through the SELECT statement’s ORDER BY clause such that we can create a new table and populate it as follows:

 CREATE TABLE spatially_sorted 
 AS 
 SELECT id, geom
   FROM spatially_unsorted
  ORDER BY <some sort of spatial sort on the geom column>;

(Of course we are assuming that when Oracle creates the table it will use contiguous blocks on disk. This will only be true is the tablespace is new or has been degragmented.)

Now, the problem is that a sort order is not defined on an Oracle Sdo_Geometry object such that one could sort as follows:

 CREATE TABLE spatially_sorted 
 AS 
 SELECT id, geom
   FROM spatially_unsorted
  ORDER BY geom  -- geom is an SDO_Geometry object

Oracle’s lack of a sort of any kind based on the SDO_Geometry object is why I have not provided a practical example before now. (Though it is worth noting that PostGIS can sort by geometry with Regina Obe observing that the sort is a btree sort: not rtree as in PostGIS’s CLUSTER ON geometry).

The standard way of sorting based on the Oracle RTree is to simply do an SDO_Filter search using the MBR of the whole of the data as follows:

 CREATE TABLE spatially_sorted 
 AS 
 SELECT id, geom
   FROM spatially_unsorted su,
       (SELECT sdo_aggr_mbr(su2.geom) as mbr
          FROM spatially_unsorted su2) su3
  WHERE SDO_Filter(su.geom,su3.mbr) = 'TRUE'

One could also sort simply by X and Y via use of any point in a geometry. Thus, for Oracle point data, this would also work:

 CREATE TABLE spatially_sorted 
 AS 
 SELECT id, geom
   FROM spatially_unsorted su,
 ORDER BY su.geom.sdo_point.x, su.geom.sdo_point.y;

Suitability of Sort

Of course both the PostGIS “ORDER BY geometry” sort and the Oracle RTree (SDO_Filter…) and XY based sorts will sort the data, but they will do so in strips based on Y and X.

What other theory can be used to define a spatial sort function?

Space curves

There is a body of theory that describes a space-filling curve such as in Z-Order) or Peano (my favourite). These curves, I would assert, when implemented, provide a better method for taking advantage of the spatially autocorrelation that is implicit to spatial data.

PostGIS 1.4 has implemented the ST_GeoHash function which I assume is a form of a space curve – see GeoHash Website – but this only for geodetic – longitude/latitude data.

Oracle is supposed to have implemented the Z-Order access method. From what I recall this would have been part of the original MultiDimension implementation in the 1990s. Now, the only place I have can find an implementation is MD.HHENCODE. I have tried to use this over the years but have been unsatisfied that it was working as I wanted. (The MD.HHENCODE function is undocumented.)

So, to implement a spatial sort that best represents spatial autocorrelation we must implement space curve key generation function on SDO_Geometry.

Implementation

I have Peano and Morton space-filling curves implemented in C but an implementation in Oracle (as EXPROC) based on these, while technically not difficult, does required changes to the LISTENER which is not popular with many DBAs.

I have also converted both these algorithms into Java and have successfully deployed them into the Oracle Java Virtual Machine with PL/SQL wrappers. While not difficult it does require DBA involvement to implement.

So, today, after having received an email from a colleague (whom I admire a lot) asking the following question, I decided to try and create a solution based entirely on PL/SQL:

We have a production instance in Oracle for our spatial data and another instance for web viewing to which we regularly copy our production data.

Normally we run a script which first removes indices, empties the tables, fills them with an INSERTSELECT … and recreates the indices.

Was wondering if it is possible (and a good idea) to do the INSERT in such a manner that the data is spatially clustered when copied, maybe by using some sort of ORDER BY in the INSERTSELECT.

Morton Implementation

The original C code for generating a Morton space key is as follows:

 unsigned xy_to_morton (col,row)
 /*  this procedure calculates the Morton number of a cell
     at the given row and col[umn]   

     Written:  D.M. Mark, Jan 1984;
     Converted to Vax/VMS: July 1985
 */
 unsigned col,row;
 {
    unsigned key;
    int level, left_bit, right_bit, quadrant;

    key = 0;
    level = 0;
    while ((row>0) || (col>0))
    {
 /*   split off the row (left_bit) and column (right_bit) bits and
      then combine them to form a bit-pair representing the
      quadrant                                                  */

      left_bit  = row % 2;
      right_bit = col % 2;
      quadrant = right_bit + 2*left_bit;

      key += quadrant<<(2*level);

 /*   row, column, and level are then modified before the loop
      continues                                                */

      row /= 2;
      col /= 2;
      level++;

    }
    return (key);
 }

Now there are a few difficulties to over-come to get a successful implementation. The first is the use of unsigned integers. To be frank, I am going to ignore this and worry only it if I have to use all 32 bits in generating a key value. The second is the need for a bitwise left << operator. This can be done via the following PL/SQL:

     Function Left_Shift( p_val Integer, p_shift Integer)
     Return Integer
     As
     Begin
        Return trunc(p_val * power(2,p_shift));
     End;

OK, now let’s do the translation to PL/SQL.

 create or replace Function Morton (p_col Natural, p_row Natural)
 Return INTEGER
 As
     v_row       Natural := ABS(p_row);
     v_col       Natural := ABS(p_col);
     v_key       Natural := 0;
     v_level     BINARY_INTEGER := 0;
     v_left_bit  BINARY_INTEGER;
     v_right_bit BINARY_INTEGER;
     v_quadrant  BINARY_INTEGER;
  
     Function Left_Shift( p_val Natural, p_shift Natural)
     Return PLS_Integer
     As
     Begin
        Return trunc(p_val * power(2,p_shift));
     End;
     
 Begin
     while ((v_row>0) Or (v_col>0)) Loop
        /*   split off the row (left_bit) and column (right_bit) bits and
             then combine them to form a bit-pair representing the
             quadrant                                                  */
        v_left_bit  := MOD(v_row,2);
        v_right_bit := MOD(v_col,2);
        v_quadrant  := v_right_bit + ( 2 * v_left_bit );
        v_key       := v_key + Left_Shift( v_quadrant,( 2 * v_level ) );
        /*   row, column, and level are then modified before the loop
             continues                                                */
        v_row := trunc( v_row / 2 );
        v_col := trunc( v_col / 2 );
        v_level := v_level + 1;
      End Loop;
     return (v_key);
 End Morton;

A quick test:

 SQL> select codesys.morton(100,50) from dual;

 CODESYS.MORTON(100,50)
 -----------------------
                    7704

Morton Restrictions

There is a restriction to how a Morton space curve can be applied: the grid must be regular. This means that while the X and Y dimensions of the grid may be different (ie giving a rectangle not a square) the number of grids either side must be the same (ie a 100×100 grid not a 150×100 grid). Another requirement that affects the generation of a correct space curve is that the (row,column) index values must always be the same. So, in a 100×100 grid the lower left grid should be referenced as (0,0) or (1234,1234) and not (0,1) or (1234, 3456).

Apply to Real Data

First off let’s generate something that is ideal to our task: random point data. Note that the area, 5000m x 5000m, can be divided in to 100 × 100 × 500m grid cells.

 DROP   TABLE MORTONED_P;
 DROP   TABLE MORTON_P;
 CREATE TABLE MORTON_P ( 
   id integer, 
   morton_key integer, 
   geom mdsys.sdo_geometry 
 );

 -- Now create the points and write then to the table
 --
 SET FEEDBACK OFF
     INSERT INTO MORTON_P
       SELECT rownum,
              0,
              mdsys.sdo_geometry(2001,NULL,
                    MDSYS.SDO_POINT_TYPE(
                          ROUND(dbms_random.value(350000  - ( 10000 / 2 ),  
                                                  350000  + ( 10000 / 2 )),2),
                          ROUND(dbms_random.value(5400000 - ( 10000 / 2 ), 
                                                  5400000 + ( 10000 / 2 )),2),
                          NULL),
                    NULL,NULL)
        FROM DUAL
      CONNECT BY LEVEL <= 1000; 
 COMMIT;
 SET FEEDBACK ON

 SET HEADING OFF
 SELECT 'Inserted '||count(*)||' records into MORTON_P' FROM MORTON_P;
 SET HEADING ON

 Inserted 1000 records into MORTON_P

 -- Create User_Sdo_Geom_Metadata
 --
 DELETE FROM USER_SDO_GEOM_METADATA WHERE TABLE_NAME = 'MORTON_P';
 DECLARE
   v_shape mdsys.sdo_geometry;
 BEGIN
   SELECT SDO_AGGR_MBR(geom) INTO v_shape FROM MORTON_P;
   INSERT INTO USER_SDO_GEOM_METADATA(TABLE_NAME,COLUMN_NAME,DIMINFO,SRID)
     VALUES ('MORTON_P','GEOM',MDSYS.SDO_DIM_ARRAY(
     MDSYS.SDO_DIM_ELEMENT('X',V_SHAPE.SDO_ORDINATES(1),V_SHAPE.SDO_ORDINATES(3),0.05),
     MDSYS.SDO_DIM_ELEMENT('Y',V_SHAPE.SDO_ORDINATES(2),V_SHAPE.SDO_ORDINATES(4),0.05)),NULL);
 END;
 /
 SHOW ERRORS
 COMMIT;

We are not going to create a spatial index as it is not needed.

What does this data look like? If we select against the table with no sort we can assume we are getting data as it is physically on disk. Note also, that we want to ensure that our column/row addresses are normalised to the start of the grid.

 SQL> set  pagesize 1000 linesize 131
 SQL> Prompt  Inspect rows to show random ordering...
 Inspect rows to show random ordering...
 SQL> list
   1  SELECT rowid,
   2         id,
   3         a.geom.sdo_point.y as y,
   4         a.geom.sdo_point.x as x,
   5         FLOOR(a.geom.sdo_point.y/500) - MIN(FLOOR(a.geom.sdo_point.y/500)) OVER (ORDER BY FLOOR(a.geom.sdo_point.y/500)) as morton_gx,
   6         FLOOR(a.geom.sdo_point.x/500) - MIN(FLOOR(a.geom.sdo_point.x/500)) OVER (ORDER BY FLOOR(a.geom.sdo_point.x/500)) as morton_gy
   7    FROM MORTON_P a
   8*  WHERE rownum < 20
 SQL> /

 ROWID                      ID          Y          X  MORTON_GX  MORTON_GY
 ------------------ ---------- ---------- ---------- ---------- ----------
 AAAvuoAAEAADF30AAP        253 5398446.61  345393.81          5          0
 AAAvuoAAEAADF30AAD        241 5398851.27  345741.16          6          1
 AAAvuoAAEAADF30AAR        255 5402238.68  348014.59         13          6
 AAAvuoAAEAADF30AAH        245 5401531.45  348593.19         12          7
 AAAvuoAAEAADF30AAL        249 5401286.57  348614.63         11          7
 AAAvuoAAEAADF30AAS        256 5397924.49  349504.67          4          9
 AAAvuoAAEAADF30AAK        248 5401027.36  349820.31         11          9
 AAAvuoAAEAADF30AAG        244 5403216.65  350262.56         15         10
 AAAvuoAAEAADF30AAO        252 5398864.63  350892.57          6         11
 AAAvuoAAEAADF30AAQ        254 5398583.57  350732.86          6         11
 AAAvuoAAEAADF30AAJ        247 5404038.95  351249.54         17         12
 AAAvuoAAEAADF30AAN        251 5404272.95  351816.35         17         13
 AAAvuoAAEAADF30AAC        240 5401672.14  352233.77         12         14
 AAAvuoAAEAADF30AAF        243 5404463.24  352343.83         17         14
 AAAvuoAAEAADF30AAA        238  5402457.6  352071.85         13         14
 AAAvuoAAEAADF30AAM        250 5403813.73  353353.01         16         16
 AAAvuoAAEAADF30AAI        246 5403652.16  353834.18         16         17
 AAAvuoAAEAADF30AAB        239 5395604.75  353661.09          0         17
 AAAvuoAAEAADF30AAE        242 5401454.78  354527.93         11         19

 19 rows selected.

Note how the data is pretty unsorted.

The following image was generated by linking to the MORTON_P table using Manifold GIS and rendering the data by the ID column such that sequential ID column values have roughly the same colour. Note that the colours are displaying randomly which befits the way the data was created and written.

Morton Space-curve - Random Point data

Now, let’s compute the morton key and add it to the base table.

 SQL> Prompt Now Apply Morton Key ....
 Now Apply Morton Key ....
 SQL> UPDATE MORTON_P A 
   2     SET a.morton_key = codesys.Morton(
   3   (SELECT FLOOR(b.geom.sdo_point.y/500) - MIN(FLOOR(b.geom.sdo_point.y/500)) OVER (ORDER BY FLOOR(b.geom.sdo_point.y/500)) FROM Morton_P b WHERE b.id = a.ID ),
   4   (SELECT FLOOR(c.geom.sdo_point.x/500) - MIN(FLOOR (c.geom.sdo_point.x/500)) OVER (ORDER BY FLOOR(c.geom.sdo_point.x/500)) FROM Morton_P c WHERE c.id = a.ID ));

 1000 rows updated.

 SQL> COMMIT;

 Commit complete.

Now let’s create our sorted table using the morton_key value on the base table.

 SQL> drop table mortoned_p;

 Table dropped.

 SQL> Create table mortoned_p
   2  as 
   3  select rownum as id, morton_key, geom
   4    from (select morton_key, geom
   5           from morton_p mp
   6          order by mp.morton_key desc
   7         );

 Table created.

Now select against the table with no sort (assume we are getting data as physically on disk).

 SQL> Prompt Inspect rows to show ordering with morton attribute assigned...
 Inspect rows to show ordering with morton attribute assigned...
 SQL> SELECT rowid,
   2         id,
   3         a.geom.sdo_point.x as x, 
   4         a.geom.sdo_point.y as x, 
   5         a.morton_key
   6    FROM MORTONED_P a
   7   WHERE rownum < 20;

 ROWID                      ID          X          X MORTON_KEY
 ------------------ ---------- ---------- ---------- ----------
 AAAvxSAAEAADJIUAAA          1  354863.68 5404756.97  143415955
 AAAvxSAAEAADJIUAAB          2  354149.72 5404920.69  143415954
 AAAvxSAAEAADJIUAAC          3  354258.89 5404744.74  143415954
 AAAvxSAAEAADJIUAAD          4  354683.14 5404263.14  143415953
 AAAvxSAAEAADJIUAAE          5  354737.04 5404003.85  143415953
 AAAvxSAAEAADJIUAAF          6  354075.34 5404131.93  143415952
 AAAvxSAAEAADJIUAAG          7  353977.42  5404713.1  143415943
 AAAvxSAAEAADJIUAAH          8  353564.38  5404992.3  143415943
 AAAvxSAAEAADJIUAAI          9  353807.76 5404522.37  143415943
 AAAvxSAAEAADJIUAAJ         10   353691.9 5404567.27  143415943
 AAAvxSAAEAADJIUAAK         11  353124.72 5404703.87  143415942
 AAAvxSAAEAADJIUAAL         12  353317.99 5404915.13  143415942
 AAAvxSAAEAADJIUAAM         13  353256.95 5404421.45  143415940
 AAAvxSAAEAADJIUAAN         14  353358.12 5404093.68  143415940
 AAAvxSAAEAADJIUAAO         15  353276.15 5404120.26  143415940
 AAAvxSAAEAADJIUAAP         16  352671.65 5404985.66  143415939
 AAAvxSAAEAADJIUAAQ         17  352068.16 5404722.24  143415938
 AAAvxSAAEAADJIUAAR         18  352317.38 5404995.05  143415938
 AAAvxSAAEAADJIUAAS         19  352019.32 5404902.57  143415938

 19 rows selected.

Yes, the data is nicely sorted as we want.

And visually?

Visualising the Morton Key

The best way to understand what a space curve is doing is to visualise it via the actual grid and also the actual space curve that passes through the key points in the grid.

First, let’s create the grid and visualise it.

 SQL> drop table morton_grid;

 Table dropped.

 SQL> create table morton_grid
   2  as 
   3  select rownum as id,
   4         c.geometry as geom,
   5         codesys.Morton(FLOOR(c.centroid.sdo_point.y/500) - MIN(FLOOR(c.centroid.sdo_point.y/500)) OVER (ORDER BY FLOOR(c.centroid.sdo_point.y/500)),
   6                        FLOOR(c.centroid.sdo_point.x/500) - MIN(FLOOR(c.centroid.sdo_point.x/500)) OVER (ORDER BY FLOOR(c.centroid.sdo_point.x/500))) as morton_key
   7   from (select m.geometry,
   8                codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geometry,0.5),0.5,0) as centroid
   9           from table(codesys.tesselate.regularGrid((select sdo_aggr_mbr(p.geom) as mbr
  10                                                       from morton_p p),
  11                                                     mdsys.sdo_point_type(500,500,null))) m
  12        ) c;

 Table created.

Now, let’s look at it. The following image was generated by linking to the MORTONED_GRID table using Manifold GIS and rendering and labelling the data by the MORTON_KEY value.

500m x 500m Grid Rendered and Labelled by Morton Key

Notice how the data is colouring shows the relatedness of spatial data next to each grid.

But to get a real sense as to how the space curve sort is going to work, let’s create a linestring that passes through the centroids of each grid order by the Morton space curve value.

 SQL> Prompt Create linestring representing the morton space curve
 Create linestring representing the morton space curve
 SQL> drop table Morton_L;

 Table dropped.

 SQL> create table Morton_L
   2  as
   3  SELECT rownum as id,
   4         mdsys.sdo_geometry(2002,NULL,NULL,mdsys.sdo_elem_info_array(1,2,1),
   5                  CAST(MULTISET(SELECT ord
   6                                  FROM (select m.morton_Key,
   7                                               v.x as ord
   8                                          from morton_grid m,
   9                                               table(mdsys.sdo_util.getvertices(codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geom,0.5),0.5,0))) v
  10                                        union all
  11                                        select m.morton_Key,
  12                                               v.y as ord
  13                                          from morton_grid m,
  14                                               table(mdsys.sdo_util.getvertices(codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geom,0.5),0.5,0))) v
  15                                      )
  16                                  ORDER BY morton_key, ord
  17                                ) as mdsys.sdo_ordinate_array)) as geom
  18    from dual;

 Table created.

Here is the space curve superimposed on the labelled Morton grid:

Morton Grid and Space Curve Represented as a Linestring

We can also look at the actual point data rendered by the MORTON_KEY and ID columns. Note that the colours are displaying the ordering of the Morton space-curve.

Morton - Random Points After Morton Key Applied - Rendered by ID

Morton - Points After Morton Key Applied - Rendered by Morton_Key

Finally, let’s superimpose the actual point (geometry) data on the 500m x 500m grid with everything rendered by the Morton Key.

500m x 500m grid showing how points are gridded by Morton Key

I think we can safely assume we have achieved our goal.

I hope this is useful to someone.