Spatial Database Independence

I often hear asserted in discussions about accessing data in databases that the approach taken should be “database independent”. The assertion is always made as if it were some sort of Scientific Truth (Dogma even). And it is always assumed that this is easy to achieve. Finally, it always seems to be an accepted reality as if it was backed by some sort of consensus in the Computer Science Literature.

(I have almost always heard this when discussing non-Microsoft databases and technologies but rarely when the Microsoft technologies are the principle components of a solution. But, perhaps that is just a selective reading based on my own career!)

This article will provide a simple comparison based on generating “vectors” from the rings of a multi-polygon using Oracle’s ST_Geometry, SQL Server 2008’s STGeometry and PostgreSQL/PostGIS’s ST_Geometry.

Initial Comments on Hierarchy

In my comparison work I use the OGC SFS 1.1 standard with preference for the SQL/MM standard. Since all three databases are “Object” relational databases (ORDBMS) we need to start with a comment about this aspect of the implementation.

Oracle’s ST_Geometry is a true type hierarchy with only those functions defined on specific elements available for geometries of that type. So, for example, you cannot call ST_Area on an ST_CURVE (ST_LINESTRING, ST_MULTILINESTRING etc), whereas in SQL Server 2008 and PostGIS you can.

Oracle

 SELECT a.geom.ST_Area() AS area
   FROM (SELECT TREAT(ST_Geometry.FROM_WKT('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',NULL) AS st_linestring) AS geom
           FROM DUAL ) a;
 -- Result
 ORA-00904: "A"."GEOM"."ST_AREA": invalid identifier
 00904. 00000 -  "%s: invalid identifier"

The availability of a true object hierarchy was not always the case. Up to 10g Oracle only had a singly-inherited type library. The original SDO_GEOMETRY approach is still singly inherited: only the SQLMM/OGC implementation from 10g onwards is not.

In the above SQL you will notice a lack of an ST_LineFromText() constructor that returns an ST_LineString object: this is not available in the ST_CURVE or ST_LINESTRING sub-class objects. Given the use of ST_Geometry one would have thought an ST_GeomFromText() method would be available rather than the non-standard FROM_WKT: nope, sorry, Oracle has not implemented one. The only ST*FromText() method available are:

 SELECT object_name || '.' || procedure_name AS conversion_method
   FROM all_procedures
  WHERE owner = 'MDSYS'
    AND object_name LIKE 'ST%'
    AND procedure_name LIKE 'ST%TEXT';
 -- Result
 CONVERSION_METHOD
 -------------------------------------
ST_ANNOTATION_TEXT.ST_ANNOTATION_TEXT
 ST_MULTIPOLYGON.ST_BDMPOLYFROMTEXT 
 ST_MULTIPOLYGON.ST_BDMPOLYFROMTEXT
 ST_POLYGON.ST_BDPOLYFROMTEXT
 ST_POLYGON.ST_BDPOLYFROMTEXT

Finally, with Oracle, some constructors or accessors return ST_Geometry objects even if they are ST_LineStrings etc. To access the methods of the underlying sub-class object, we need to “cast” them to the required type via the Oracle TREAT function. eg SELECT TREAT (geom as ST_LineString) as line FROM …..

PostGIS

Note that in PostGIS you cannot use the . notation to access the methods of an object.

 SELECT ST_Area(a.geom) AS area
   FROM (SELECT ST_LineFromText('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',0) AS geom ) a;
 -- Result
 area
 ----
 0

Because PostGIS is “singly inherited”, it does not use a “dot” object.method invocation from a base class. Thus ST_LineFromText() function is an independent function that returns a geometry (ST_Geometry): No casting is required. SQL Server 2008

 SELECT a.geom.STArea() AS area
   FROM (SELECT geometry::STLineFromText('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',0) AS geom) a;
 -- Result
 area
 ----
 0

SQL Server 2008 is a “singly inherited” type system that uses a “dot” object.method invocation from a base class in object creation or method execution. Thus ST_STLineFromText() function requires the use of “geometry::” that acts as a surrogate for a missing ST_Geometry abstract class.

Comparisons

The following three SQL statements are the result of trying to minimise the differences in SQL between the three databases. Where possible, I avoided use of anything that is specific to a database (eg rownum in Oracle). In addition, I use my SQL Server and Oracle versions of PostgreSQL’s generate_series() set returning function.

Oracle

 -- MultiPoly with two elements (polygons) one with an Inner Ring
 WITH poly AS (
 SELECT ROW_NUMBER() OVER (PARTITION BY b.n ORDER BY b.n) AS polyNo,
        a.geom
   FROM (SELECT TREAT(VALUE(t) AS st_polygon) AS geom
           FROM (SELECT ST_MultiPolygon.ST_BdMPolyFromText(
                        'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
                                      (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
                                     ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
                        AS geom
                   FROM dual) o,
                 TABLE(o.geom.ST_GEOMETRIES()) t
        ) a,
        (SELECT 1 AS n FROM dual) b
  WHERE a.geom.ST_GeometryType() = 'ST_POLYGON'
 )
 SELECT a.polyNo,
        a.ringNo,
        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
        a.sp.ST_X() AS sx, a.sp.ST_Y() AS sy,
        a.ep.ST_X() AS ex, a.ep.ST_Y() AS ey
   FROM ( SELECT g.polyNo,
                 g.ringNo,
                 g.geom.ST_PointN(p.column_value  ) AS sp,
                 g.geom.ST_PointN(p.column_value+1) AS ep
            FROM (SELECT p.polyNo,
                         1 AS ringNo,
                         p.geom.ST_ExteriorRing() AS geom
                    FROM poly p
                   UNION ALL
                  SELECT p.polyNo,
                         p.column_value+1 AS ringNo,
                         p.geom.ST_InteriorRingN(p.column_value) AS geom
                    FROM poly p,
                         TABLE(codesys.geom.generate_series(1,p.geom.ST_NumInteriorRing(),1)) p
                   WHERE p.column_value < p.geom.ST_NumInteriorRing()  /* Reports 2 but actually only 1: the first */
                 ) g,
                 TABLE(codesys.geom.generate_series(1,g.geom.ST_NumPoints(),1)) p
           WHERE p.column_value < g.geom.ST_NumPoints()
        ) a,
        (SELECT 1 AS n FROM dual) b
  ORDER BY a.polyNo, a.ringNo, 3;
 -- Result
 POLYNO RINGNO VECTORNO SX   SY  EX  EY
 ------ ------ -------- ---- --- --- ---
 1      1      1        10   10  15  10
 1      1      2        15   10  15  15
 1      1      3        15   15  10  15
 1      1      4        10   15  10  10
 1      2      1        12   12  14  12
 1      2      2        12   14  12  12
 1      2      3        14   14  12  14
 1      2      4        14   12  14  14
 2      1      1        100  100 150 100
 2      1      2        150  100 150 150
 2      1      3        150  150 100 150
 2      1      4        100  150 100 100
 -- Result
  12 ROWS selected

Notes:

  • Need for “(select 1 as n from dual) b” in order to generate a unique row number as Row_Number() cannot be used outside of a partition or order by (this is the same problem across all three databases).
  • generate_series return referenced by database specific “column_value” column name;
  • The use of TREAT to ensure the polygons extracted from the MultiPolygon are correctly cast to ST_Polygons so its methods eg ST_ExteriorRing() can be used.
  • ST_NumInteriorRing() seems to also report the exterior ring in its count yet it cannot be used (this is not the same as PostGIS and SQL Server 2008).
  • FROM dual” cannot be avoided;
  • ST_GeometryType returns “ST_POLYGON” (uppercase);
  • TABLE functions cannot be avoided but they can be used with values from other tables in a query;
  • TABLE function must be used to disaggregate the individual polygons returned by the non-standard ST_GEOMETRIES() method – Oracle does not implement ST_GeometryN();
  • Constructor function name ST_BdMPolyFromText different from other databases.

However, an additional comment is warranted with regards to ROW_NUMBER. In Oracle – but not the other 2 databases – it is possible to generate a row number as follows:

 SELECT ROW_NUMBER() OVER (partition BY 1 ORDER BY 1) AS rowno, b.column_value
   FROM TABLE(codesys.geom.generate_series(1,5,1)) b;
 -- Result
 ROWNO COLUMN_VALUE
 ----- ------------
 1     1
 2     2
 3     3
 4     4
 5     5

To ensure commonality across databases in generating row numbers, I had to use the constructs as shown in the different SQL.

SQL Server 2008

 -- MultiPoly with two elements (polygons) one with an Inner Ring
 WITH poly AS (
   SELECT ROW_NUMBER() OVER (PARTITION BY B.N ORDER BY B.N) AS polyNo,
          a.geom
     FROM (SELECT o.geom.STGeometryN(t.IntValue) AS geom
       FROM (SELECT geometry::STMPolyFromText(
                      'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
                                     (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
                                    ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
                    AS geom
          ) o
          CROSS APPLY
          generate_series(1,o.geom.STNumGeometries(),1) t
          ) a,
          (SELECT 1 AS n) b
    WHERE UPPER(a.geom.STGeometryType()) = 'POLYGON'
 )
 SELECT a.polyNo,
        a.ringNo,
        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
        a.sp.STX AS sx, a.sp.STY AS sy,
        a.ep.STX AS ex, a.ep.STY AS ey
   FROM ( SELECT g.polyNo,
                 g.ringNo,
                 g.geom.STPointN(p.IntValue  ) AS sp,
                 g.geom.STPointN(p.IntValue+1) AS ep
            FROM (SELECT polyNo,
                         1 AS ringNo,
                         p.geom.STExteriorRing() AS geom
                    FROM poly p
                   UNION ALL
                  SELECT polyNo,
                         t.IntValue+1 AS ringNo,
                         p.geom.STInteriorRingN(t.IntValue) AS geom
                    FROM poly p
                         CROSS APPLY
                         generate_series(1,p.geom.STNumInteriorRing(),1) t
                 ) g
                 CROSS APPLY
                 generate_series(1,g.geom.STNumPoints(),1) p
           WHERE p.IntValue < g.geom.STNumPoints()
        ) a,
        (SELECT 1 AS n) b
  ORDER BY a.polyNo, a.ringNo, 3;
 -- Results
 polyNo ringNo vectorNo sx  sy  ex  ey
 ------ ------ -------- --- --- --- ---
 1      1      1        10  10  15  10
 1      1      2        15  10  15  15
 1      1      3        15  15  10  15
 1      1      4        10  15  10  10
 1      2      1        12  12  12  14
 1      2      2        12  14  14  14
 1      2      3        14  14  14  12
 1      2      4        14  12  12  12
 2      1      1        100 100 150 100
 2      1      2        150 100 150 150
 2      1      3        150 150 100 150
 2      1      4        100 150 100 100

Notes:

  • Use of non-standard “CROSS APPLY” to allow for table values to be used by the generate_series function;
  • generate_series return referenced by database specific “IntValue” column name;
  • Methods have ST prefix with no underscore;
  • Use of “geometry::” cast.
  • STGeometryType returns “Polygon” (no ST_ prefix as in other two databases);
  • Constructor function name STMPolyFromText different from other databases;
  • STNumInteriorRing reports only interior rings;
  • STX/STY are not methods.

PostgreSQL/PostGIS

 WITH poly AS (
 SELECT ROW_NUMBER() OVER (PARTITION BY b.n ORDER BY b.n) AS polyNo,
        a.geom
   FROM (SELECT ST_GeometryN(o.geom,generate_series(1,ST_NumGeometries(o.geom),1)) AS geom
           FROM (SELECT ST_MultiPolygonFromText(
                       'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
                                     (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
                                    ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
                        AS geom
                 ) o
        ) a,
        (SELECT 1 AS n ) b
  WHERE UPPER(ST_GeometryType(a.geom)) = 'ST_POLYGON'
 )
 SELECT a.polyNo,
        a.ringNo,
        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
        ST_X(a.sp) AS sx,ST_Y(a.sp) AS sy,
        ST_X(a.ep) AS ex,ST_Y(a.ep) AS ey
   FROM ( SELECT g.polyNo,
                 g.ringNo,
                 ST_PointN(g.geom,generate_series(1,ST_NumPoints(g.geom)-1,1)) AS sp,
                 ST_PointN(g.geom,generate_series(2,ST_NumPoints(g.geom)  ,1)) AS ep
            FROM (SELECT p.polyNo,
                         1 AS ringNo,
                         ST_ExteriorRing(p.geom) AS geom
                    FROM poly p
                   UNION ALL
                  SELECT p.polyNo,
                         r.ringNo + 1,
                         ST_InteriorRingN(p.geom,r.ringNo) AS geom
                    FROM poly p,
                        (SELECT generate_series(1,ST_NumInteriorRing(p.geom),1) AS ringNo FROM poly p) AS r
                 ) g
        ) a,
        (SELECT 1 AS n) b
  ORDER BY a.polyNo, a.ringNo, 3;
 -- Results
 polyNo ringNo vectorNo sx  sy  ex  ey
 ------ ------ -------- --- --- --- ---
 1      1      1        10  10  15  10
 1      1      2        15  10  15  15
 1      1      3        15  15  10  15
 1      1      4        10  15  10  10
 1      2      1        12  12  12  14
 1      2      2        12  14  14  14
 1      2      3        14  14  14  12
 1      2      4        14  12  12  12
 2      1      1        100 100 150 100
 2      1      2        150 100 150 150
 2      1      3        150 150 100 150
 2      1      4        100 150 100 100

Notes:

  • Constructor function name MultiPolygonFromText different from other databases;
  • STNumInteriorRing reports only interior rings.
  • No object.method notation;
  • ST_GeometryType returns “ST_Polygon” which is upper cased for cross-database comparison purposes;
  • set returning function must be implemented as an attribute in a select statement in order to be able to use geometry methods.

Set Returning Functions

PostgreSQL’s inability for set returning functions eg generate_series() to draw their values/parameters from another table’s columns in the SQL is one of the major issues in trying to create cross-database SQL.

Here are some examples that show the issue. The first two show how generate_series can be used reasonably “naturally” with the value from the first table “a” being fed into the generate_series of “b”. The only difference is SQL Server’s keywords “CROSS APPLY” vs Oracle’s “, TABLE”.

 -- SQL Server 2008
 SELECT b.*
   FROM (SELECT 5 AS maxValue) a
        CROSS APPLY
        generate_series(1,a.maxValue,1) b;
 -- Result
 IntValue
 --------
 1
 2
 3
 4
 5
 -- Oracle
 SELECT b.*
   FROM (SELECT 5 AS maxValue FROM DUAL) a,
        TABLE(codesys.geom.generate_series(1,a.maxValue,1)) b;
 -- Result
 COLUMN_VALUE
 ------------
 1
 2
 3
 4
 5

But with PostgreSQL the above pattern cannot be applied.

 -- PostgreSQL - Attempt 1
 SELECT b.*
   FROM (SELECT 5 AS maxValue) a,
        generate_series(1,a.maxValue,1) b;
 -- Result
 ERROR:  FUNCTION expression IN FROM cannot refer TO other relations OF same query level
 LINE 3:        generate_series(1,a.maxValue,1) b; 
 -- PostgreSQL - Attempt 2
 SELECT b.*
   FROM (SELECT 5 AS maxValue) a,
        (SELECT generate_series(1,a.maxValue,1) AS column_value) b;
 -- Result
 ERROR:  subquery IN FROM cannot refer TO other relations OF same query level
 LINE 3:        (SELECT generate_series(1,a.maxValue,1) AS column_val...
 -- PostgreSQL - Attempt 3
 WITH aTable AS (
 SELECT 5 AS maxValue
 )
 SELECT b.*, a.maxValue
   FROM aTable a,
        (SELECT generate_series(1,a.maxValue,1) AS column_value FROM aTable a) b;
 -- Result
 column_value maxValue
 ------------ --------
 1            5
 2            5
 3            5
 4            5
 5            5

Finally, we get a result we can work with with PostgreSQL. However, to use this as the template for “cross database” SQL coding is, I think, pretty unacceptable. It is ugly, wordy and contorted, leading to semantic interpretation problems, never mind the potential for voluminous SQL (with multiple CTEs).

There is a better method in PostgreSQL which is as follows.

 -- PostgreSQL - Attempt 4
 SELECT generate_series(1,a.maxValue,1) b, a.maxValue
   FROM (SELECT 5 AS maxValue) a;
 -- Result
 column_value maxValue
 ------------ --------
 1            5
 2            5
 3            5
 4            5
 5            5

However, this use of a table function as an attribute is not possible in SQL Server and Oracle, thus it cannot be a basis for a cross-platform SQL “template”.

I am sure there are better methods. Anyone?

Summary

Here is a table comparing the OGC/SQLMM Methods Used:

Standard (SQL/MM ie ISO TC211) Oracle (11gR2) SQL Server 2008 PostGIS 1.5
ST_MPolyFromText() ST_BdMPolyFromText() STMPolyFromText() ST_MultiPolygonFromText()
ST_ExteriorRing() ST_ExteriorRing() STExteriorRing() ST_ExteriorRing()
ST_GeometryN() ST_Geometries() STGeometryN() ST_GeometryN()
ST_GeometryType() ST_GeometryType() STGeometryType() ST_GeometryType()
ST_InteriorRingN() ST_InteriorRingN() STInteriorRingN() ST_InteriorRingN()
ST_NumGeometries() none STNumGeometries() ST_NumGeometries()
ST_NumInteriorRing() ST_NumInteriorRing() STNumInteriorRing() ST_NumInteriorRing()
ST_NumPoints() ST_NumPoints() STNumPoints() ST_NumPoints()
ST_PointN() ST_PointN() STPointN() ST_PointN()
ST_X() ST_X() STX ST_X()
ST_Y() ST_Y() STY ST_Y()

In short, the OGC/SQLMM side of the database access problem is reasonably straight-forward (if anything ever is). Sure, there are naming differences but they can overcome. The lack of a true object hierarchy does provide some interesting implementation issues but these, I think, can be encapsulated via WITH or CTE expressions.

But, on the whole “database (or any technology) independence” is not easy to achieve for a whole host of non-spatial reasons.

  • Implementations of the relevant SQL standards (eg SQL99 Analytics) differ across all databases;
  • Non-standard database-specific functionality exists eg ROWNUM/DUAL/TABLE function (Oracle) etc;
  • Lack of “common” methods across all database products for handling collections eg PostgreSQL has no similar “CROSS APPLY”/TABLE function handling capability (can do so via methods that the other databases do not provide);
  • OGC/SQLMM implementation varies across databases with with methods missing (eg ST_GeometryN in Oracle), and limited or no inheritance;
  • Interpretation of OGC/SQLMM standards differ eg SQL Server 2008 + PostGIS has geography datatype which the standard does not mention or require.
  • API keywords vary;
  • Database architectures differ.

Conclusion

Well, this is one attempt at looking at the cross-database. I have not discussed other architectural/philosophical design patterns like Oracle’s “Writers never block readers”, database ROI etc. That stuff is well covered elsewhere eg Tom Kyte’s views are well represented in this article by Mark Rittman called The Cost Of Database Independence and Comparison of different SQL implementations.

I have probably made some errors as I don’t profess to being a PostgreSQL expert (just an experienced user). If anyone else wishes to contribute to this discussion please do via some constructive comments.