# Spatial Sorting of Data via Morton Key For Oracle Spatial/Locator

I have often advocated the use of a spatial sort when loading static spatial data into Oracle (See 1. Spatially sort read-only data. in my article on performance tips). The idea here is to try and place spatial data that is close together in space (called spatial autocorrelation), close together on disk. Then, when the hard-disk’s head goes down on the disk to retrieve one or more records, the head will not have to travel far to retrieve the geographically close data. Imagine you zooming in to a particular area in your GIS. The search will be a minimum bounding rectangle based query that is specifically requesting data that is close together geographically. If disk activity can be minimised by spatial autocorrelation then performance will improve.

But really, I had never gotten around to producing a robust solution for use solely within the Oracle database.

Until now.

Object Relational Databases like Oracle are ideally suited to this task through the **SELECT** statement’s **ORDER BY** clause such that we can create a new table and populate it as follows:

CREATE TABLE spatially_sorted AS SELECT id, geom FROM spatially_unsorted ORDER BY <some sort of spatial sort on the geom column>;

(Of course we are assuming that when Oracle creates the table it will use contiguous blocks on disk. This will only be true is the tablespace is new or has been defragmented.)

Now, the problem is that a sort order is not defined on an Oracle SDO_GEOMETRY object such that one could sort as follows:

CREATE TABLE spatially_sorted AS SELECT id, geom FROM spatially_unsorted ORDER BY geom -- geom is an SDO_GEOMETRY object

Oracle’s lack of a sort of any kind based on the SDO_GEOMETRY object is why I have not provided a practical example before now. (Though it is worth noting that PostGIS can sort by geometry with Regina Obe observing that the sort is a btree sort: not rtree as in PostGIS’s CLUSTER ON geometry).

The standard way of sorting based on the Oracle RTree is to simply do an SDO_Filter search using the Minimum Bounding Rectangle (MBR) or Envelope of the whole of the data as follows:

CREATE TABLE spatially_sorted AS SELECT id, geom FROM spatially_unsorted su, (SELECT sdo_aggr_mbr(su2.geom) as mbr FROM spatially_unsorted su2) su3 WHERE SDO_Filter(su.geom,su3.mbr) = 'TRUE'

One could also sort simply by X and Y via use of any point in a geometry. Thus, for Oracle point data, this would also work:

CREATE TABLE spatially_sorted AS SELECT id, geom FROM spatially_unsorted su, ORDER BY su.geom.sdo_point.x, su.geom.sdo_point.y;

**Suitability of Sort**

Of course both the PostGIS ‘ORDER BY geometry’ sort and the Oracle RTree (SDO_Filter’) and XY based sorts will sort the data, but they will do so in strips based on Y and X.

What other theory can be used to define a spatial sort function?

**Space curves**

There is a body of theory that describes a space-filling curve such as in Z-Order) or Peano (my favourite). These curves, I would assert, when implemented, provide a better method for taking advantage of the spatially autocorrelation that is implicit to spatial data.

PostGIS 1.4 has implemented the ST_GeoHash function which I assume is a form of a space curve ‘ see GeoHash Website ‘ but this only for geodetic ‘ longitude/latitude data.

Oracle is supposed to have implemented the Z-Order access method. From what I recall this would have been part of the original MultiDimension implementation in the 1990s. Now, the only place I have can find an implementation is MD.HHENCODE. I have tried to use this over the years but have been unsatisfied that it was working as I wanted. (The MD.HHENCODE function is undocumented.)

So, to implement a spatial sort that best represents spatial autocorrelation we must implement space curve key generation function on SDO_GEOMETRY.

**Implementation**

I have Peano and Morton space-filling curves implemented in C but an implementation in Oracle (as EXPROC) based on these, while technically not difficult, does required changes to the LISTENER which is not popular with many DBAs.

I have also converted both these algorithms into Java and have successfully deployed them into the Oracle Java Virtual Machine with PL/SQL wrappers. While not difficult it does require DBA involvement to implement.

So, today, after having received an email from a colleague (whom I admire a lot) asking the following question, I decided to try and create a solution based entirely on PL/SQL:

We have a production instance in Oracle for our spatial data and another instance for web viewing to which we regularly copy our production data. Normally we run a script which first removes indices, empties the tables, fills them with an INSERT SELECT and recreates the indices. Was wondering if it is possible (and a good idea) to do the INSERT in such a manner that the data is spatially clustered when copied, maybe by using some sort of ORDER BY in the INSERT SELECT.

**Morton Implementation**

The original C code for generating a Morton space key is as follows:

unsigned xy_to_morton (col,row) /* this procedure calculates the Morton number of a cell at the given row and col[umn] Written: D.M. Mark, Jan 1984; Converted to Vax/VMS: July 1985 */ unsigned col,row; { unsigned key; int level, left_bit, right_bit, quadrant; key = 0; level = 0; while ((row>0) || (col>0)) { /* split off the row (left_bit) and column (right_bit) bits and then combine them to form a bit-pair representing the quadrant */ left_bit = row % 2; right_bit = col % 2; quadrant = right_bit + 2*left_bit; key += quadrant<<(2*level); /* row, column, and level are then modified before the loop continues */ row /= 2; col /= 2; level++; } return (key); }

Now there are a few difficulties to over-come to get a successful implementation. The first is the use of unsigned integers. To be frank, I am going to ignore this and worry only it if I have to use all 32 bits in generating a key value. The second is the need for a bitwise left << operator. This can be done via the following PL/SQL:

Function Left_Shift( p_val Integer, p_shift Integer) Return Integer As Begin Return trunc(p_val * power(2,p_shift)); End;

OK, now let’s do the translation to PL/SQL.

create or replace Function Morton (p_col Natural, p_row Natural) Return INTEGER As v_row Natural := ABS(p_row); v_col Natural := ABS(p_col); v_key Natural := 0; v_level BINARY_INTEGER := 0; v_left_bit BINARY_INTEGER; v_right_bit BINARY_INTEGER; v_quadrant BINARY_INTEGER; Function Left_Shift( p_val Natural, p_shift Natural) Return PLS_Integer As Begin Return trunc(p_val * power(2,p_shift)); End; Begin while ((v_row>0) Or (v_col>0)) Loop /* split off the row (left_bit) and column (right_bit) bits and then combine them to form a bit-pair representing the quadrant */ v_left_bit := MOD(v_row,2); v_right_bit := MOD(v_col,2); v_quadrant := v_right_bit + ( 2 * v_left_bit ); v_key := v_key + Left_Shift( v_quadrant,( 2 * v_level ) ); /* row, column, and level are then modified before the loop continues */ v_row := trunc( v_row / 2 ); v_col := trunc( v_col / 2 ); v_level := v_level + 1; End Loop; return (v_key); End Morton;

A quick test:

SQL> select codesys.morton(100,50) from dual; CODESYS.MORTON(100,50) ----------------------- 7704

**Morton Restrictions**

There is a restriction to how a Morton space curve can be applied: the grid must be regular. This means that while the X and Y dimensions of the grid may be different (ie giving a rectangle not a square) the number of grids either side must be the **same** (ie a 100×100 grid not a 150×100 grid).

Additionally, for a “perfect” looking grid, the grid cells must not just be the same in X and Y, but they must be a multiple of 4. So, while 100 x 100 grid will work, 96×96 or 128×128 would be better as they are divisible perfectly by 4.

Another requirement that affects the generation of a correct space curve is that the (row,column) index values must always be the same. So, in a 100×100 grid the lower left grid should be referenced as (0,0) or (1234,1234) and not (0,1) or (1234, 3456).

**Apply to Real Data**

First off let’s generate something that is ideal to our task: random point data. Note that the area, 5000m x 5000m, can be divided in to 100 × 100 × 500m grid cells (even though 96 x 96 might be “better”).

DROP TABLE MORTONED_P; DROP TABLE MORTON_P; CREATE TABLE MORTON_P ( id integer, morton_key integer, geom mdsys.sdo_geometry ); -- Now create the points and write then to the table -- SET FEEDBACK OFF INSERT INTO MORTON_P SELECT rownum, 0, mdsys.sdo_geometry(2001,NULL, MDSYS.SDO_POINT_TYPE( ROUND(dbms_random.value(350000 - ( 10000 / 2 ), 350000 + ( 10000 / 2 )),2), ROUND(dbms_random.value(5400000 - ( 10000 / 2 ), 5400000 + ( 10000 / 2 )),2), NULL), NULL,NULL) FROM DUAL CONNECT BY LEVEL <= 1000; COMMIT; SET FEEDBACK ON SET HEADING OFF SELECT 'Inserted '||count(*)||' records into MORTON_P' FROM MORTON_P; SET HEADING ON Inserted 1000 records into MORTON_P -- Create User_Sdo_Geom_Metadata -- DELETE FROM USER_SDO_GEOM_METADATA WHERE TABLE_NAME = 'MORTON_P'; DECLARE v_shape mdsys.sdo_geometry; BEGIN SELECT SDO_AGGR_MBR(geom) INTO v_shape FROM MORTON_P; INSERT INTO USER_SDO_GEOM_METADATA(TABLE_NAME,COLUMN_NAME,DIMINFO,SRID) VALUES ('MORTON_P','GEOM',MDSYS.SDO_DIM_ARRAY( MDSYS.SDO_DIM_ELEMENT('X',V_SHAPE.SDO_ORDINATES(1),V_SHAPE.SDO_ORDINATES(3),0.05), MDSYS.SDO_DIM_ELEMENT('Y',V_SHAPE.SDO_ORDINATES(2),V_SHAPE.SDO_ORDINATES(4),0.05)),NULL); END; / SHOW ERRORS COMMIT;

We are not going to create a spatial index as it is not needed.

What does this data look like? If we select against the table with no sort we can assume we are getting data as it is physically on disk. Note also, that we want to ensure that our column/row addresses are normalised to the start of the grid.

SQL> set pagesize 1000 linesize 131 SQL> Prompt Inspect rows to show random ordering... Inspect rows to show random ordering... SQL> list 1 SELECT rowid, 2 id, 3 a.geom.sdo_point.y as y, 4 a.geom.sdo_point.x as x, 5 FLOOR(a.geom.sdo_point.y/500) - MIN(FLOOR(a.geom.sdo_point.y/500)) OVER (ORDER BY FLOOR(a.geom.sdo_point.y/500)) as morton_gx, 6 FLOOR(a.geom.sdo_point.x/500) - MIN(FLOOR(a.geom.sdo_point.x/500)) OVER (ORDER BY FLOOR(a.geom.sdo_point.x/500)) as morton_gy 7 FROM MORTON_P a 8* WHERE rownum < 20 SQL> / ROWID ID Y X MORTON_GX MORTON_GY ------------------ ---------- ---------- ---------- ---------- ---------- AAAvuoAAEAADF30AAP 253 5398446.61 345393.81 5 0 AAAvuoAAEAADF30AAD 241 5398851.27 345741.16 6 1 AAAvuoAAEAADF30AAR 255 5402238.68 348014.59 13 6 AAAvuoAAEAADF30AAH 245 5401531.45 348593.19 12 7 AAAvuoAAEAADF30AAL 249 5401286.57 348614.63 11 7 AAAvuoAAEAADF30AAS 256 5397924.49 349504.67 4 9 AAAvuoAAEAADF30AAK 248 5401027.36 349820.31 11 9 AAAvuoAAEAADF30AAG 244 5403216.65 350262.56 15 10 AAAvuoAAEAADF30AAO 252 5398864.63 350892.57 6 11 AAAvuoAAEAADF30AAQ 254 5398583.57 350732.86 6 11 AAAvuoAAEAADF30AAJ 247 5404038.95 351249.54 17 12 AAAvuoAAEAADF30AAN 251 5404272.95 351816.35 17 13 AAAvuoAAEAADF30AAC 240 5401672.14 352233.77 12 14 AAAvuoAAEAADF30AAF 243 5404463.24 352343.83 17 14 AAAvuoAAEAADF30AAA 238 5402457.6 352071.85 13 14 AAAvuoAAEAADF30AAM 250 5403813.73 353353.01 16 16 AAAvuoAAEAADF30AAI 246 5403652.16 353834.18 16 17 AAAvuoAAEAADF30AAB 239 5395604.75 353661.09 0 17 AAAvuoAAEAADF30AAE 242 5401454.78 354527.93 11 19 19 rows selected.

Note how the data is pretty unsorted.

The following image was generated by linking to the MORTON_P table using Manifold GIS and rendering the data by the ID column such that sequential ID column values have roughly the same colour. Note that the colours are displaying randomly which befits the way the data was created and written.

“Morton Space-curve – Random Point data” />

Now, let’s compute the morton key and add it to the base table.

-- Now Apply Morton Key .... SQL> UPDATE MORTON_P A 2 SET a.morton_key = codesys.Morton( 3 (SELECT FLOOR(b.geom.sdo_point.y/500) - MIN(FLOOR(b.geom.sdo_point.y/500)) OVER (ORDER BY FLOOR(b.geom.sdo_point.y/500)) FROM Morton_P b WHERE b.id = a.ID ), 4 (SELECT FLOOR(c.geom.sdo_point.x/500) - MIN(FLOOR (c.geom.sdo_point.x/500)) OVER (ORDER BY FLOOR(c.geom.sdo_point.x/500)) FROM Morton_P c WHERE c.id = a.ID )); 1000 rows updated. SQL> COMMIT; Commit complete.

Now let’s create our sorted table using the morton_key value on the base table.

SQL> drop table mortoned_p; Table dropped. SQL> Create table mortoned_p 2 as 3 select rownum as id, morton_key, geom 4 from (select morton_key, geom 5 from morton_p mp 6 order by mp.morton_key desc 7 ); Table created.

Now select against the table with no sort (assume we are getting data as physically on disk).

-- Inspect rows to show ordering with morton attribute assigned... SQL> SELECT rowid, 2 id, 3 a.geom.sdo_point.x as x, 4 a.geom.sdo_point.y as x, 5 a.morton_key 6 FROM MORTONED_P a 7 WHERE rownum < 20; ROWID ID X X MORTON_KEY ------------------ ---------- ---------- ---------- ---------- AAAvxSAAEAADJIUAAA 1 354863.68 5404756.97 143415955 AAAvxSAAEAADJIUAAB 2 354149.72 5404920.69 143415954 AAAvxSAAEAADJIUAAC 3 354258.89 5404744.74 143415954 AAAvxSAAEAADJIUAAD 4 354683.14 5404263.14 143415953 AAAvxSAAEAADJIUAAE 5 354737.04 5404003.85 143415953 AAAvxSAAEAADJIUAAF 6 354075.34 5404131.93 143415952 AAAvxSAAEAADJIUAAG 7 353977.42 5404713.1 143415943 AAAvxSAAEAADJIUAAH 8 353564.38 5404992.3 143415943 AAAvxSAAEAADJIUAAI 9 353807.76 5404522.37 143415943 AAAvxSAAEAADJIUAAJ 10 353691.9 5404567.27 143415943 AAAvxSAAEAADJIUAAK 11 353124.72 5404703.87 143415942 AAAvxSAAEAADJIUAAL 12 353317.99 5404915.13 143415942 AAAvxSAAEAADJIUAAM 13 353256.95 5404421.45 143415940 AAAvxSAAEAADJIUAAN 14 353358.12 5404093.68 143415940 AAAvxSAAEAADJIUAAO 15 353276.15 5404120.26 143415940 AAAvxSAAEAADJIUAAP 16 352671.65 5404985.66 143415939 AAAvxSAAEAADJIUAAQ 17 352068.16 5404722.24 143415938 AAAvxSAAEAADJIUAAR 18 352317.38 5404995.05 143415938 AAAvxSAAEAADJIUAAS 19 352019.32 5404902.57 143415938 19 rows selected.

Yes, the data is nicely sorted as we want.

And visually?

**Visualising the Morton Key**

The best way to understand what a space curve is doing is to visualise it via the actual grid and also the actual space curve that passes through the key points in the grid.

First, let’s create the grid and visualise it.

SQL> drop table morton_grid; Table dropped. SQL> create table morton_grid 2 as 3 select rownum as id, 4 c.geometry as geom, 5 codesys.Morton(FLOOR(c.centroid.sdo_point.y/500) - MIN(FLOOR(c.centroid.sdo_point.y/500)) OVER (ORDER BY FLOOR(c.centroid.sdo_point.y/500)), 6 FLOOR(c.centroid.sdo_point.x/500) - MIN(FLOOR(c.centroid.sdo_point.x/500)) OVER (ORDER BY FLOOR(c.centroid.sdo_point.x/500))) as morton_key 7 from (select m.geometry, 8 codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geometry,0.5),0.5,0) as centroid 9 from table(codesys.tesselate.regularGrid((select sdo_aggr_mbr(p.geom) as mbr 10 from morton_p p), 11 mdsys.sdo_point_type(500,500,null))) m 12 ) c; Table created.

Now, let’s look at it. The following image was generated by linking to the MORTONED_GRID table using Manifold GIS and rendering and labelling the data by the **MORTON_KEY** value.

Notice how the data is colouring shows the relatedness of spatial data next to each grid.

But to get a real sense as to how the space curve sort is going to work, let’s create a linestring that passes through the centroids of each grid order by the Morton space curve value.

SQL> Prompt Create linestring representing the morton space curve Create linestring representing the morton space curve SQL> drop table Morton_L; Table dropped. SQL> create table Morton_L 2 as 3 SELECT rownum as id, 4 mdsys.sdo_geometry(2002,NULL,NULL,mdsys.sdo_elem_info_array(1,2,1), 5 CAST(MULTISET(SELECT ord 6 FROM (select m.morton_Key, 7 v.x as ord 8 from morton_grid m, 9 table(mdsys.sdo_util.getvertices(codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geom,0.5),0.5,0))) v 10 union all 11 select m.morton_Key, 12 v.y as ord 13 from morton_grid m, 14 table(mdsys.sdo_util.getvertices(codesys.centroid.sdo_centroid(codesys.centroid.convertGeometry(m.geom,0.5),0.5,0))) v 15 ) 16 ORDER BY morton_key, ord 17 ) as mdsys.sdo_ordinate_array)) as geom 18 from dual; Table created.

Here is the space curve superimposed on the labelled Morton grid:

We can also look at the actual point data rendered by the **MORTON_KEY** and **ID** columns. Note that the colours are displaying the ordering of the Morton space-curve.

Finally, let’s superimpose the actual point (geometry) data on the 500m x 500m grid with everything rendered by the Morton Key.

I think we can safely assume we have achieved our goal.

I hope this is useful to someone.