The Sad State of GIS SQL Standards

The background to this blog comes from two separate sources: the first is my testing of SQL Server 2008 “Katmai”, and secondly, a blog article that Charlie Savage wrote a while back on The Sad State of GIS Web Standards.

Web Mapping Standards

As an ex-GIS Manager who monitored GIS standards for possible deployment within the business I worked for I can only concur with Charlie’s comments.

In particular, Web Mapping Service (WMS) was pretty useless as the basis of an feature-rich, middle-tier, internet standards based, rendering engine given it was designed by the standards committee as a “raster representation” of spatial data when what the industry wanted was an IMS standard that enabled the querying of individual objects within individual layers and the visualisation of selections against the geographic objects within those layers (not just what occurred at a pixel). For those who say that Web Feature Service (WFS) can be used for just such a selection I can only retort that it took years to become available and, even now, selection and visualistion is still a custom use of WMS/WFS.

In the end I only found two uses for OGC web standards: one was for the display of a dumb raster image in Trimble Pathfinder Office behind GPS data before uploading into the corporate database (a good visual check the data was in the right area); the other was for accessing external public data in either WMS or WFS form. But for internal business-centric use where higher functionality was needed they were, and are, useless.

SQL Standards

But my main contribution to Charlie’s blog article is to outline a few issues with the current geospatial SQL standards: in particular the OpenGIS Simple Features Implementation Specification for SQL and SQL/MM standards. Microsoft’s SQL Server Spatial (“Katmai”) functionality is based on the OGC 1.1 standard.

Generally, I don’t have much of an issue with these standards as they implement a workable set of basic methods needed for the storage, search, geoprocessing and retrieval of spatial data.

However, I do have issue with the lack of support within either standard for Minimum Bounding Rectangle (MBR) objects, MBR-based search/retrieval and spatial aggregates. (After all so much of what we do with spatial data depends on MBR processing.)

Search Operators

In the OGC 1.1 and SQL/MM standards, the only search operators are:

  • ST_Equals,
  • ST_Disjoint,
  • ST_Intersects,
  • ST_Touches,
  • ST_Crosses,
  • ST_Within,
  • ST_Contains,
  • ST_Overlaps and
  • ST_Relate

There is no ST_MBR() or ST_Envelope() search operator. (There is a ST_Envelope() operator that returns the MBR of a geometry object as a polygon but there are no methods for “envelopes” or search operators.)

Currently, the best performing operator the creator of a geospatial application can use to get shapes out of an OGC/SQLMM compliant database for a “find all objects inside the current display extent” I can get is via ST_Intersects():

Example from SQL Server 2008: note STIntesects() and not ST_Intersects().
SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 WHERE d.geom.STIntersects(geometry::STPolyFromText('POLYGON((350000 5400000,355000 5400000,355000 5405000,350000 5405000,350000 5400000))',28355)) = 1;

Two comments on MBR filtering and MBR objects are needed.

1. Construction of Search Object

The construction of the search area (MBR) via a 5 vertex polygon is a pain. Another method one might use is to generate a search polygon via use of the ST_Envelope() OGC 1.1/SQLMM method as in the following from SQL Server “Katmai”:

SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 WHERE d.geom.STIntersects(geometry::STLineFromText('LINESTRING(350000 5400000,355000 5405000)',28355).STEnvelope()) = 1;

This is far more elegant a way to express the search rectangle.

But why can’t we use some sort of MBR (cf Oracle’s optimized rectangle) object more directly?

SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 WHERE d.geom.STIntersects(geometry::STEnvelope(350000 5400000,355000 5405000,28355)) = 1;

So, that’s one suggestion: a rectangle/MBR type and operators.

2. Efficent and effective searching.

Back earlier this year I was asked to look at why Oracle Spatial was running so slowly at a client site. (One felt that Oracle was being “blamed” as it is “common knowledge” that Oracle performs badly.) When I got in to that site I discovered that Oracle’s MapViewer was running very, very fast: fast enough for me to observe that it was the performance benchmark against which all other GIS access should be measured – to whit the customer agreed. Now, Oracle’s MapViewer executes SDO_FILTER() to get its data for map rendering while Deegree (one of the geospatial applications accessing the database) was using SDO_RELATE(… ‘mask=ANYINTERACT’…). SDO_RELATE(.. ‘mask=ANYINTERACT’…) is roughly equivalent to STIntersects(). In executing some performance measures I discovered the following performance differences:

Dataset Deegree vs MapViewer (%) Notes
Parcels 19.7 Polygon data
Contours 30.3 Contours have a complex spatial description
Transport 48.2 Transport spatial geometries are relatively simple (small avg vertices)
PlaceNames 86.8 No geometric comparisons needed for points: Sdo_Relate effectively an Sdo_Filter

So, by the OGC not having a Filter operator it effectively slows down data access for 98% (a guestimate based on experience) of all spatial queries against a spatial database! The following SQL statement shows just what a filter that compares MBRs of geometries and returns those for which there is an interaction might look like:

SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 WHERE d.geom.ST_MBR(350000,5400000,355000,5405000,28355) = 1;

The above query in PostGIS is (note the BOX3D operator cf my suggestion of an MBR object in the relevant standards):

SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 d.geom && SetSRID('BOX3D(350000 5400000,355000 5405000)'::box3d,28355);

And in Oracle is:

SELECT d.delaunay_num, d.geom.STAsText()
  FROM Delaunay d
 WHERE sdo_filter(d.geom,sdo_geometry(2003,28355,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,3), SDO_ORDINATE_ARRAY(350000,5400000,355000,5405000)) = 'TRUE'

So, both Oracle and PostGIS have a Box/MBR/optimised rectangle object support and a fast primary filter/search capability.

Spatial Aggregates

The relevant spatial SQL standards do not mention one of the most important functions that a user executes: group by aggregates. An example from Oracle:

SELECT d.table_attribute,
       SDO_AGGR_UNION(SDOAGGRTYPE(d.geom))
  FROM Delaunay d
 WHERE sdo_filter(d.geom,sdo_geometry(2003,28355,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,3), SDO_ORDINATE_ARRAY(350000,5400000,355000,5405000)) = 'TRUE'
  GROUP BY d.table_attribute

As you can see the custom aggregate functions have type signatures that are specific to Oracle limiting cross-database portability (more on portability in another blog). But if we take Oracle’s approach and apply it to the standard (though this is probably invalid as the Oracle approach is dictated by the architecture of the Oracle database) perhaps a standards compliant SQL statement might look like this:

SELECT d.table_attribute, 
       d.geom.STAggrUnion()
  FROM Delaunay d
 WHERE d.geom.ST_MBR(350000,5400000,355000,5405000,28355) = 1;
  GROUP BY d.table_attribute

Though I would think that d.geom.STUnion() might still be acceptable with the context (ie the GROUP BY clause) telling the database query engine that the version of the STUnion() (no supplied geometry) function that is needed is an aggregate function and not just a union of two shapes.

Conclusion

Why do the relevant standards not implement or suggest fast search operators, MBR types and aggregate operators? Is it because the relevant committes were more interested in formal (theoretical) aspects of spatial data (cf WMS) or because it leaves vendors (who have representatives on those standards bodies) wiggly room to add in the necessary additional functionality they know is needed to make the products useable in real world situations (with a claim that they have to implement their custom extensions for reasons of speed/functionality)?

Comments?