Loading and Processing GPX 1.1 files using Oracle XMLDB

There are a large number of geospatial data storage, delivery and interchange formats around nowadays. Including (I limit this discussion to vector data formats only):

  • Geography Markup Language (GML)
  • Keyhole Markup Language
  • GPS eXchange Format (GPX)
  • Vector Markup Language (VML)
  • Scalable Vector Graphics (SVG)
  • Victorian Government’s (Aust) Incremental Update Format (IUF)
  • New South Wales Government’s (Aust) Spatial Data Exchange (�SIX�) (and XML incremental update data stream)
  • LandXML
  • X3D (successor to Virtual Reality Modelling Language – VRML)

Each has multiple versions and the list do including all the GML Application Schemas like Geoscience Markup Language (GeoSciML), CityGML, LandGML, Climate Science Modelling Language (CSML) etc.

So what does all this mean? Certainly that the use of XML for geospatial purposes is firmly in place and likely to increase rather than decrease. And for the Oracle Spatial data practitioner? That they are, inevitably, be exposed to these formats and the need to load new, and update existing, Sdo_Geometry based data.

Over on Oracle Technet’s Spatial forum a user asked the following question :

“I have a GPX data that is XML based I know about FME and convert to shape then to oracle spatial, But what I want exactly to develop a small J2EE application to read the data from GPX file then load the data to database direct without third party software. Please if any one knows how to do it please help me.”

Now, this question cuts to the nub of the alternate way of approaching the processing of geospatial data. Why use an external, geospatial specific, tool when there are perfectly good, generic, integrated tools to do a job? The issues with using external tools are:

  • Need for a separately installed and configured tool either on the same server as the database (frowned upon in most organisations) – why purchase/install a separate tool if the existing database has the functionality?;
  • Cost of purchasing/learning external tool;
  • Need to re-test integration as external tool is upgraded (eg version 1 to version 2);
  • Need for specific expertise in a tool other than the database – what happens if the external tool specialist – who has written a special (idiosyncratic) script (in a non standard language) to implement the load/update – goes on leave or leaves the organisation entirely and the load script fails?;
  • Need for network/database connection – if the external application is on a separate server (or client PC) then a network connection is required when inserting/updating. If a batch process (via crontab or Windows Sheduled Task), what happens if the network goes down?

A lot of these issues are real as I have experienced them myself over the years as a consultant, a GIS Manager and an IT programmer.

Because all the above formats are XML, and Oracle Database (and its competitors) support XML processing inside the database, I thought I would write a short article on loading a GPX 1.1 file into Oracle and then extracting and generating Sdo_Geometry objects from it.

While I have given a presentation on the use of Oracle’s XMLDB before, I have not written a blog article showing, step by step, how to process a GPX file. Here is the article.

Instance Data and Schema

Firstly, the instance dataset. Late last year i downloaded GPX2SHP. Included with it is a file, pinatest2.gpx. This file forms the basis of my work.

Secondly, the GPX file is based on verison 1.1 of an XML Schema which was designed and is hosted by
Topographix. I downloaded and stored this xsd file with my GPX file.

Lots of programs support GPX and there are lots of examples of GPX instance documents available on the web. One source documenting programs and data is at the GISWiki.

There are other sample instance documents at the actual Topographix website but I chose not to use them as some include more complex extensions (which I didn’t want “clouding” the simplicity of this article) or were for GPX version 1.0 (the person asking the question on the Oracle forum has a version 1.1 document).

After having processed the GPX2SHP GPX file I found a great GPX document on the 2005 Tour de France. I wish I had found it earlier as it looks like a great dataset to process (except it too has a lot of complicating extensions holding additional data)!

Checking the data

One of the things about XML data is that often it is constructed informally (no XML Schema or Document Type Definition – DTD ) or is constructed and not checked against its XML Schema in order to ensure validity. Since I will be loading the data using XMLDB’s schema validation capabilties I thought I had best check the dataset I would be using. For this I used Altova’s XMLSpy 2009.

When I opened the pinatest2.gpx document in XMLSpy and validated it, XMLSpy indicated a range of errors in the document.

  1. The metadata section was incorrectly organised
  2. The WayPoint (wpt), Track (trk) and Route (rte) data were incorrectly ordered.

The metadata had to be corrected as follows.

Metadata – Before

 	<metadata>
		<name>Manila to Mt. Pinatubo</name>
		<copyright>Toshihiro Hiraoka</copyright>
		<author>Toshihiro Hiraoka</author>
		<keyword>Manila Mt. Pinatubo gpx2shp</keyword>
		<link>http://gpx2shp.sourceforge.jp</link>
		<description>This is test data for gpx2shp.</description>
		<time>2004-12-29T08:23:55Z</time>
	</metadata>

Metadata – After

	<metadata>
		<name>Manila to Mt. Pinatubo</name>
		<desc>This is test data for gpx2shp.</desc>
		<copyright author="Toshihiro Hiraoka">
			<year>2004</year>
			<license>http://gpx2shp.sourceforge.jp</license>
		</copyright>
		<link href="http://gpx2shp.sourceforge.jp">
			<text>Toshihiro Hiraoka</text>
		</link>
		<time>2004-12-29T08:23:55</time>
		<keywords>Manila Mt. Pinatubo gpx2shp</keywords>
		<bounds maxlat="90.0" maxlon="179.9" minlat="-90.0" minlon="-180.0"/>
	</metadata>

The ordering of the geospatial components in the original file was:

 <gpx ....>
   <metadata>
   <trk>
   <rte>
   <wpt>
 </gpx>

Whereas, the XML Schema requires the following order:

 <gpx ....>
   <metadata>
   <wpt>
   <rte>
   <trk>
 </gpx>

After correction I saved the file to a new name called gpxtest.gpx which you can download from this website.

Loading in to Oracle

After having corrected the GPX file I then proceeded to load it in to Oracle.

First off I needed to grant my schema some additional privileges:

 connect system/******@gisdb
 grant XDBADMIN to gis;
 grant alter session to gis;

Then I need to create an Oracle Directory so that I can access and load the GPX instance data and schema.

 connect gis/*****@gisdb
 CREATE DIRECTORY gpxdir AS 'E:\BlogBits\TipsTricks\GPX';
 GRANT READ ON gpxdir TO PUBLIC;

While not strictly speaking needed, the following function I have found useful as a simple method of loading XML documents from a file in a directory easier.

 create or replace function getClobDocument(
     p_directoryname In varchar2,
     p_filename      In varchar2,
     p_charset       In varchar2 default NULL)
     return          CLOB deterministic
 is
   v_file          bfile := bfilename(p_directoryname,p_filename);
   v_charContent   CLOB := ' ';
   v_targetFile    bfile;
   v_lang_ctx      number := DBMS_LOB.default_lang_ctx;
   v_charset_id    number := 0;
   v_src_offset    number := 1 ;
   v_dst_offset    number := 1 ;
   v_warning       number;
 begin
   if p_charset is not null then
     v_charset_id := NLS_CHARSET_ID(p_charset);
   end if;
   v_targetFile := v_file;
   DBMS_LOB.fileopen(v_targetFile, DBMS_LOB.file_readonly);
   DBMS_LOB.LOADCLOBFROMFILE(v_charContent, 
                             v_targetFile,
                             DBMS_LOB.getLength(v_targetFile), 
                             v_src_offset, 
                             v_dst_offset,
                             v_charset_id, 
                             v_lang_ctx,
                             v_warning);
   DBMS_LOB.fileclose(v_targetFile);
   return v_charContent;
 end;
 /

Before loading our GPX XML data we need to register the XML Schema that describes the XML content using DBMS_XMLSCHEMA.REGISTERSCHEMA. This is done via a two-step process of creating a resource and then registering the actual schema.

 -- Create some useful variable for use in the whole script
 var schemaURL varchar2(256)
 var schemaDoc varchar2(256)

 begin
   :schemaURL := 'http://www.topografix.com/GPX/1/1/gpx.xsd';
   :schemaDoc := 'gpx.xsd';
 end;
 /

 declare
   xmlSchema xmlType;
   res       boolean;
 begin
   xmlSchema := XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8'));
   if (dbms_xdb.existsResource(:schemaDoc)) then
       dbms_xdb.deleteResource(:schemaDoc);
   end if;
   res := dbms_xdb.createResource(:schemaDoc,xmlSchema);
 end;
 /

 -- May not exist, but attempt to delete anyway.
 execute dbms_xmlSchema.deleteSchema(:schemaURL,DBMS_XMLSCHEMA.DELETE_CASCADE_FORCE);

 -- Now register the schema
 begin
   dbms_xmlschema.registerSchema
   (
     :schemaURL,
     XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8')),
     TRUE,TRUE,FALSE,FALSE
   );
 end;
 /

 -- Queries to see if it exists
 select * 
   from user_xml_schemas
  where schema_url = :schemaURL;

 SCHEMA_URL                                LOCAL SCHEMA      INT_OBJNAME                QUAL_SCHEMA_URL                                                            HIER_TYPE   
 ----------------------------------------- ----- ----------- -------------------------- -------------------------------------------------------------------------- ---------
 http://www.topografix.com/GPX/1/1/gpx.xsd YES   SYS.XMLTYPE XD79rS1GmSRMqW2gvRg8ufhQ== http://xmlns.oracle.com/xdb/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd CONTENTS

 1 rows selected

 SELECT PATH,
        extract(LINK, 
                '/LINK/Name/text()').getstringval() as linkName,
        extract(LINK,
                '/LINK/ParentName/text()').getstringval() as ParentName,
        extract(LINK,
                '/LINK/ChildName/text()').getstringval() as ChildName,
        extract(RES, 
                '/Resource/DisplayName/text()').getstringval() as DisplayName
   FROM PATH_VIEW
  WHERE PATH like '/sys/schemas/GIS/www.topo%';

 PATH                                                LINKNAME           PARENTNAME         CHILDNAME          DISPLAYNAME
 --------------------------------------------------- ------------------ ------------------ ------------------ ------------------
 /sys/schemas/GIS/www.topografix.com                 www.topografix.com GIS                www.topografix.com www.topografix.com 
 /sys/schemas/GIS/www.topografix.com/GPX             GPX                www.topografix.com GPX                GPX
 /sys/schemas/GIS/www.topografix.com/GPX/1           1                  GPX                1                  1
 /sys/schemas/GIS/www.topografix.com/GPX/1/1         1                  1                  1                  1 
 /sys/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd gpx.xsd            1                  gpx.xsd            gpx.xsd 

 5 rows selected

Now that the schema is loaded and registered, we can create a table to hold the GPX XML data. (I will not discuss the different types of XML storage that XMLDB provides, sufficeth to say that we will create a table of XMLType object type. That is, we will create an object-table.)

 Prompt Create table to hold the sample GPX document
 drop table GPX;

 CREATE TABLE gpx
     OF XMLType 
     XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";

 -- Now we can load the data itself
 INSERT INTO GPX 
        VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest.gpx')));
 commit;

Additional Editing

When loading XML data into a table attributed with its XMLSCHEMA using the above, XMLDB will validate the data as it is loaded. The first-time I loaded the data Oracle XMLDB informed me that

ORA-01830: date format picture ends before converting entire input string xsd:dateTime

Upon investigation, the GPX file contains dateTime values like the following:

 <time>2004-12-29T08:23:55Z</time>

With Schema-based XML with nodes of type xsd:dateTime, if an instance document contains dates with TimeZone information (ie the “Z” at the end of the time value above), then the XML schema must be annotated with xsd:SQLType=“TIMESTAMP WITH TIME ZONE” in order to tell XMLDB to expect Time Zone info.

While this is not that difficult to do it would entail changing both the XSD and XML. Something I don’t think my readers would want to do to their GPX documents. So, in order to get around XMLDB I simply removed the “Z” suffix from the end of all time values in the GPX document.

Checking the table creation and load we get.

 select *
   from user_xml_tables 
  where table_name = 'GPX';

 TABLE_NAME XMLSCHEMA                                 SCHEMA_OWNER ELEMENT_NAME  STORAGE_TYPE
 ---------- ----------------------------------------- ------------ ------------- -----------------
 GPX        http://www.topografix.com/GPX/1/1/gpx.xsd GIS          gpx           OBJECT-RELATIONAL

 1 rows selected

And, an additional check of the validity of the loaded XML shows all is well.

 SELECT XMLIsValid(g.OBJECT_VALUE,'http://www.topografix.com/GPX/1/1/gpx.xsd') 
   FROM gpx g;

 XMLISVALID(G.OBJECT_VALUE,'HTTP://WWW.TOPOGRAFIX.COM/GPX/1/1/GPX.XSD') 
 ---------------------------------------------------------------------- 
 1                                                                      

 1 rows selected

Note that it is possible to define the table to have a CHECk constraint as follows:

 CREATE TABLE gpx
     OF XMLType 
    (CHECK (XMLIsValid(object_value) = 1))
     XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";

Querying

Let’s start our querying with a very simple request to see the name of the GPX file.

SELECT extractValue(g.object_value,’/gpx/metadata/name’,’xmlns="http://www.topografix.com/GPX/1/1"’) metadataName
FROM GPX g;

METADATANAME
————————-
Manila to Mt. Pinatubo

1 rows selected

Let’s now see the full metadata.

 SELECT extractValue(g.object_value,'/gpx/metadata/name','xmlns="http://www.topografix.com/GPX/1/1"') as Name,
         extractValue(g.object_value,'/gpx/metadata/desc','xmlns="http://www.topografix.com/GPX/1/1"') as Description,
         extractValue(g.object_value,'/gpx/metadata/copyright/year','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_Year,
         extractValue(g.object_value,'/gpx/metadata/copyright/license','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_License,
         extractValue(g.object_value,'/gpx/metadata/link/@href','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink,
         extractValue(g.object_value,'/gpx/metadata/link/text','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink_Text,
         extractValue(g.object_value,'/gpx/metadata/time','xmlns="http://www.topografix.com/GPX/1/1"') as Document_DateTime,
         extractValue(g.object_value,'/gpx/metadata/keywords','xmlns="http://www.topografix.com/GPX/1/1"') as keywords,
         extractValue(g.object_value,'/gpx/metadata/bounds/@minlon','xmlns="http://www.topografix.com/GPX/1/1"') as MinLong,
         extractValue(g.object_value,'/gpx/metadata/bounds/@minlat','xmlns="http://www.topografix.com/GPX/1/1"') as MinLat,
         extractValue(g.object_value,'/gpx/metadata/bounds/@maxlon','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLong,
         extractValue(g.object_value,'/gpx/metadata/bounds/@maxlat','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLat
  FROM GPX g;

 NAME                   DESCRIPTION                    COPYRIGHT_YEAR COPYRIGHT_LICENSE             HYPERLINK                     HYPERLINK_TEXT    DOCUMENT_DATETIME               KEYWORDS                    MINLONG MINLAT MAXLONG MAXLAT
 ---------------------- ------------------------------ -------------- ----------------------------- ----------------------------- ----------------- ------------------------------- --------------------------- ------- ------ ------- ------
 Manila to Mt. Pinatubo This is test data for gpx2shp. 01/JAN/04      http://gpx2shp.sourceforge.jp http://gpx2shp.sourceforge.jp Toshihiro Hiraoka 29/DEC/04 08:23:55.000000000 AM Manila Mt. Pinatubo gpx2shp    -180    -90   179.9     90
 
 1 rows selected

Now let’s extract our waypoints (only name and coordinates).

 SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')) as longitude,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')) as latitude,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele'))  as Elevation
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

 NAME                 LONGITUDE              LATITUDE               ELEVATION
 -------------------- ---------------------- ---------------------- ----------
 001                  121.043382715          14.636015547           45.307495
 002                  121.042653322          14.637198653           50.594727
 003                  121.043165457          14.640581002           46.989868
 004                  120.155537082          14.975596117           38.097656
 ...
 111                  120.917379661          14.474465596           5.172729
 112                  120.938029736          14.942621375           28.965332
 113                  124.379444299          12.255321192           21.995728
 GARMIN               -94.799016668          38.855549991           325.049072
 GRMEUR               -1.463899976           50.982883293           35.934692
 GRMPHX               -111.946110008         33.330189949           361.098145
 GRMTWN               121.640266674          25.061783362           38.097656
 NEDA4                121.043607602          14.636060139           8.296997

 118 rows selected

Since our target is SDO_Geometry let’s re-do the above query and make it generate Oracle Spatial data.

 SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name,
         SDO_GEOMETRY(3001,
                      8307,
                      SDO_POINT_TYPE(TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')),
                                     TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')),
                                     TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele'))),
  	             NULL,
 		 NULL) as geom
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

 NAME                 GEOM
 ------ --------------------------------------------------------------------------------------------------
 001    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043382715,14.636015547,45.307495),null,null)
 002    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.042653322,14.637198653,50.594727),null,null)
 003    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043165457,14.640581002,46.989868),null,null)
 004    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.155537082,14.975596117,38.097656),null,null)
 ...
 111    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.917379661,14.474465596,5.172729),null,null)
 112    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.938029736,14.942621375,28.965332),null,null)
 113    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(124.379444299,12.255321192,21.995728),null,null)
 GARMIN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-94.799016668,38.855549991,325.049072),null,null)
 GRMEUR MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-1.463899976,50.982883293,35.934692),null,null)
 GRMPHX MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-111.946110008,33.330189949,361.098145),null,null)
 GRMTWN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.640266674,25.061783362,38.097656),null,null)
 NEDA4  MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043607602,14.636060139,8.296997),null,null)

 118 rows selected

Now waypoints are the bread and butter of most GPS use, but we also find tracks and sometimes routes. Both of these are linear elements so we need to see how to process them.

First, let’s look at the routes in the GPX file.

 -- Show all routes
 SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
        TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number
   FROM GPX g,
        TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

 ROUTE_NAME ROUTE_NUMBER
 ---------- ------------
 10-SEP-04  4

 1 rows selected

 -- Show the route and all its points (as XML)
 SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
        TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number,
        EXTRACT(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"') as route_pts
   FROM GPX g,
        TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

 ROUTE_NAME ROUTE_NUMBER           ROUTE_PTS
 ---------- ------------ ---------------------------------------------------------------------------------
 10-SEP-04  4            <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="15.044639" lon="120.25491">
                           <ele>209.704834</ele>
                           <time>2004-09-09T01:57:08.000000</time>
                         </rtept>
 ..
                         <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="14.980567" lon="120.193627">
                           <ele>68.871582</ele>
                           <time>2004-09-10T00:51:48.000000</time>
                         </rtept>

 1 rows selected

 -- Finally, show the route with its point objects (same definition as a waypoint point object)
 SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
         TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude,
         TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"')) as elevation
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

 ROUTE_NAME ROUTE_NUMBER LONGITUDE  LATITUDE   ELEVATION
 ---------- ------------ ---------- ---------- ----------
 10-SEP-04  4            120.25491  15.044639  209.704834
 10-SEP-04  4            120.25491  15.044446  206.340088
 10-SEP-04  4            120.254931 15.044231  206.340088
 10-SEP-04  4            120.255103 15.043373  207.782104
 10-SEP-04  4            120.25506  15.04333   206.820679
 .....
 10-SEP-04  4            120.193799 14.980609  69.832886
 10-SEP-04  4            120.193627 14.980567  68.871582 

 323 rows selected

Finally, let’s use some SQL trickery to turn this single route into a linear SDO_Geometry object.

 -- Turn (Single) Route into geometry
 SELECT mdsys.sdo_geometry(3002,8307,null,mdsys.sdo_elem_info_array(1,2,1),
 CAST(MULTISET( 
  SELECT case when r.rin = 1 then longitude
              when r.rin = 2 then latitude
              when r.rin = 3 then elevation
          end as ordinate
    FROM (select level rin from dual connect by level < 4) r,
         (SELECT rownum as pt,
                  TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude,
                  TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude,
                  TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"'))  as elevation
            FROM GPX g,
                 TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
         ) x
    order by x.pt, r.rin
 ) AS mdsys.sdo_ordinate_array)) as geom
   FROM DUAL ;

 GEOM
 --------------------------------------------
 MDSYS.SDO_GEOMETRY(3002,8307,null,
       MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),
       MDSYS.SDO_ORDINATE_ARRAY(
             120.25491,15.044639,209.704834,
             120.25491,15.044446,206.340088,
             120.254931,15.044231,206.340088,
             120.255103,15.043373,207.782104,
             120.25506,15.04333,206.820679,
             ...
             120.193799,14.980609,69.832886,
             120.193627,14.980567,68.871582))

 1 rows selected

The above geometry (which also passed SDO_GEOM.VALIDATE_GEOMETRY testing) was edited to aid presentation as it contains 323 3D coordinates.

Note you cannot use SDO_AGGR_UNION with point objects to generate a linestring (change the 3005 to 2002 and change the SDO_ELEM_INFO_ARRAY). The reason is that, at 10gR2, the resultant aggregated points can only be 2D.

Now the above approach is fine if the file contains a single route: what if it contains many?

The GPX file I loaded contains a number of tracks. Let’s look at them.

 -- Get all tracks
 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name
   FROM GPX g,
        TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

 TRACK_NUMBER           TRACK_NAME 
 ---------------------- ---------- 
 1                      07-SEP-04  
 2                      08-SEP-04  
 3                      08-SEP-04  
 7                      Only 5 poi 

 4 rows selected

We have 4 tracks.

Now, tracks are organised a little differently than routes. Tracks can contain multiple segments (trkseg) each of which can have multiple points (trkpt).

 -- Get each track's segments
 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
        EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg
   FROM GPX g,
        TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

 TRACK_NUMBER           TRACK_NAME TRACKSEG 
 ---------------------- ---------- -----------------------------------------------------------------------------------------------------------------------------
 1                      07-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.585381" lon="121.056118"><ele>42.916016</ele>  ... </trkseg>
 2                      08-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg>
 3                      08-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg>
 7                      Only 5 poi <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.907889" lon="120.558472"><ele>28.976929</ele>  ... </trkseg> 

 4 rows selected

Of course I have edited and truncated the trkseg data.

Now, let’s turn these tracks into linestring geometries.

 -- Turn (Multiple) Tracks into geometry
 -- Note, all can only have one track segment
 -- (If more than one track segment would need to handle by generating more than one row per track and group output below by SDO_AGGR_UNION)
 -- 
 WITH tracks As (
 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
        EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg
   FROM GPX g,
        TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1
 )
 SELECT trk.track_number,
        trk.track_name,
        sdo_geom.validate_geometry(mdsys.sdo_geometry(3002,8307,null,sdo_elem_info_array(1,2,1),
        CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                  when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                  when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                              end ordinate
                        FROM (select level as rin from dual connect by level < 4) r,
                              TABLE(XMLSequence(EXTRACT(trk.trackSeg,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                     ) as mdsys.sdo_ordinate_array
            )),0.05) as geom
   FROM tracks trk;

 TRACK_NUMBER           TRACK_NAME GEOM
 ---------------------- ---------- ----
 1                      07-SEP-04  TRUE
 2                      08-SEP-04  TRUE
 3                      08-SEP-04  TRUE
 7                      Only 5 poi TRUE

 4 rows selected

Luckily, each of these tracks only has one track segment. If a track had more than one track segment we would have to generate MULTILINESTRINGS and not LINESTRINGS. For me that was slightly more involved as I ran in difficulties pursuing an pure-SQL based approach.

MultiLineStrings

The above SQL works because it splits out each trkseg that is associated with a track via a separate line. One could use SDO_AGGR_UNION or SDO_AGGR_CONCAT_LINES on the resultant geometry objects but I have found, after modifying the above SQL with a suitable GROUP BY clause that I could not generate the correct answer with either spatial aggregate (the main problem is the generation of the SDO_ELEM_INFO_ARRAY).

In the end I decided that I would have to do the geometry generation myself.

Firstly, I modified the gpxtest.gpx file merging the last two tracks:

Tracks and Segments – Before

 			 <trkpt lat="15.044639" lon="120.254910">
			 	 <ele>209.704834</ele>
				 <time>2004-09-09T01:56:51</time>
			 </trkpt>
		 </trkseg>
	 </trk>
	 <trk>
		 <name>Only 5 points</name>
		 <number>7</number>
		 <trkseg>
			 <trkpt lat="14.907889" lon="120.558472">
				 <ele>28.976929</ele>
				 <time>2004-09-10T08:35:38</time>
			 </trkpt>

Tracks and Segments – After

			 <trkpt lat="15.044639" lon="120.254910">
				 <ele>209.704834</ele>
				 <time>2004-09-09T01:56:51</time>
			 </trkpt>
		 </trkseg>
		 <trkseg>
			 <trkpt lat="14.907889" lon="120.558472">
				 <ele>28.976929</ele>
				 <time>2004-09-10T08:35:38</time>
			 </trkpt>

Then I loaded the modified GPX file into a new table.

 drop table GPX2;
 purge recyclebin;
 create table GPX2
     of xmltype
     XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";
 INSERT INTO GPX2 VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest2.gpx')));
 commit;

Now let’s query out the number of segments per track.

 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
        count(*) as segments
   FROM GPX2 g,
        TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
        TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"'))) t2
  GROUP BY TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')),
           SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  
  order by 1;

 TRACK_NUMBER           TRACK_NAME SEGMENTS
 ---------------------- ---------- ----------------------
 1                      07-SEP-04  1
 2                      08-SEP-04  1
 3                      08-SEP-04  2

 3 rows selected

You will note that I have to make extensive use of standard SQL’s COUNT function while, in many cases, use of XPath’s COUNT function would be far neater. However, XMLDB does not support the XPath COUNT function at 10gR2. It is, however, supported at 11gR1.

Now, to generate a single geometry for all segments in a track one must, of course, ensure that the SDO_ELEM_INFO_ARRAY is correctly generated. I looked at different ways to do this in pure SQL but I could not get a base SQL statement that I could to extend to work. For example, the following SQL statement, which tries to use the CAST(MULTISET approach that worked successfully for generating the SDO_ORDINATE_ARRAY, simply refused to terminate. I could get elements of the SQL to work via a “divide-and-conquery” approach but I was thwarted at every turn.

 -- This doesn't work as its execution never returns
 WITH tracks As (
 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
        (SELECT count(*)
           FROM TABLE(XMLSequence(EXTRACT(VALUE(t1),
                                          'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg',
                                          'xmlns="http://www.topografix.com/GPX/1/1"')))) as SegCount,
        EXTRACT(VALUE(t1),
                'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg',
                'xmlns="http://www.topografix.com/GPX/1/1"') as allSegments
   FROM GPX2 g,
        TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1
 )
 SELECT trk.track_number,
        trk.track_name,
        CASE WHEN trk.segcount = 1 
             THEN mdsys.sdo_elem_info_array(1,2,1) 
             ELSE 
        CAST(MULTISET(SELECT count(*)
                        FROM TABLE(codesys.geom.generate_series(1,trk.segCount,1)) gs,
                             TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg['||gs.column_value||']/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                       GROUP BY gs.column_value
                      ) AS mdsys.sdo_elem_info_array ) 
         END as coordarray,
        mdsys.sdo_geometry(3002,8307,null,
          mdsys.sdo_elem_info_array(1,2,1),
          /* mdsys.sdo_ordinate_array */
          CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                    when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                    when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                                end ordinate
                          FROM (select level as rin from dual connect by level < 4) r,
                                TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                       ) as mdsys.sdo_ordinate_array
              )) as geom
   FROM tracks trk;
 -- Does not terminate

In the end, I decided to encapsulate the SDO_ELEM_INFO_ARRAY processing into a PL/SQL Function.

 CREATE OR REPLACE FUNCTION GetElemInfoFromXML ( p_trkseg_xml IN XMLType
 ) RETURN mdsys.Sdo_Elem_Info_Array 
 AS
   v_seg_count number;
   v_array     mdsys.sdo_ordinate_array;
   v_elem_info mdsys.Sdo_Elem_Info_Array := new mdsys.Sdo_Elem_Info_Array(1,2,1);
 BEGIN
   SELECT count(*)
     INTO v_seg_count
     FROM TABLE(XMLSequence(EXTRACT(p_trkseg_xml,
                                    'trkseg',
                                    'xmlns="http://www.topografix.com/GPX/1/1"')));
   If ( v_seg_count > 0 ) Then
     SELECT count(*)
       BULK COLLECT INTO v_array
       FROM TABLE(codesys.geom.generate_series(1,v_seg_count,1)) gs,
            TABLE(XMLSequence(EXTRACT(p_trkseg_xml,
                                              '/trkseg['||gs.column_value||']/trkpt',
                                              'xmlns="http://www.topografix.com/GPX/1/1"'))) t
      GROUP BY gs.column_value
      ORDER BY gs.column_value;
   
     IF ( v_array is not null AND v_array.COUNT > 1 ) THEN
       FOR i IN v_array.FIRST..(v_array.LAST-1) LOOP
         v_elem_info.EXTEND(3);
         v_elem_info(v_elem_info.COUNT-2) := ( v_array(i) * 3 ) + v_elem_info(v_elem_info.COUNT-5);
         v_elem_info(v_elem_info.COUNT-1) := 2;
         v_elem_info(v_elem_info.COUNT  ) := 1;
       END LOOP;
     END IF;
   END IF;
   RETURN v_elem_info;
 END GetElemInfoFromXML;
 /
 show errors

Which I then used as follows:

 SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
        SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
        mdsys.sdo_geometry(CASE WHEN (SELECT COUNT(*) 
                                        FROM TABLE(XMLSequence(EXTRACT(VALUE(allSegments),
                                                                      'trk/trkseg',
                                                                      'xmlns="http://www.topografix.com/GPX/1/1"')))) > 1
                                THEN 3006
                                ELSE 3002
                            END,
                           8307,
                           NULL,
          GetElemInfoFromXML(EXTRACT(VALUE(allSegments),'trk/trkseg','xmlns="http://www.topografix.com/GPX/1/1"')),
          CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                    when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                    when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                                end ordinate
                          FROM (select level as rin from dual connect by level < 4) r,
                                TABLE(XMLSequence(EXTRACT(VALUE(allSegments),'trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                       ) as mdsys.sdo_ordinate_array
              )) as geom
   FROM GPX2 g,
        TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1, 
        TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']','xmlns="http://www.topografix.com/GPX/1/1"'))) allSegments;
  
 TRACK_NUMBER TRACK_NAME GEOM
 ------------ ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1            07-SEP-04  MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(121.056118,14.595809,49.645264,...,120.246134,15.043287,209.224121)) 
 2            08-SEP-04  MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.236864,15.034811,114.053589)) 
 3            08-SEP-04  MDSYS.SDO_GEOMETRY(3006,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,478,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.558879,14.908383,28.015503)) 
   
 3 rows selected

So, a combination of SQL and PL/SQL solved the problem of how to generate MULTILINESTRINGS for tracks with multiple segments. Once I upgrade to 11gR1 I will re-look at this problem again to see if use of XPath’s COUNT function can help me implement a solution based on pure SQL.

Conclusion

XMLDB (or even any “horizontal market” XML processing tool) is an excellent tool for handling and processing GPX and other geospatial XML formats. The benefits for use of XMLDB include:

  • Fully integrated into the Oracle database;
  • A powerful, generic tool not limited to geospatial data processing;
  • No additional licensing cost;
  • No need for a separate application or application server;
  • Updates and processing can be executed by Oracle’s DBMS_JOB or DBMS_SCHEDULER queues.

I don’t profess to be an XML or XMLDB expert. If you find any errors in what I have written, please write to me and let me know and I will correct them.

I hope this is of use to someone.