Loading and Processing GPX 1.1 files using Oracle XMLDB

Note

This is an article I wrote some years back which I have ported to my new website.

Introduction

There are a large number of geospatial data storage, delivery and interchange formats around nowadays. Including (I limit this discussion to vector data formats only):

  • Geography Markup Language (GML)
  • Keyhole Markup Language
  • GPS eXchange Format (GPX)
  • Vector Markup Language (VML)
  • Scalable Vector Graphics (SVG)
  • Victorian Government’s (Aust) Incremental Update Format (IUF)
  • New South Wales Government’s (Aust) Spatial Data Exchange (SIX) (and XML incremental update data stream)
  • LandXML
  • X3D (successor to Virtual Reality Modelling Language VRML)

Each has multiple versions and the list do including all the GML Application Schemas like Geoscience Markup Language (GeoSciML), CityGML, LandGML, Climate Science Modelling Language (CSML) etc.

So what does all this mean? Certainly that the use of XML for geospatial purposes is firmly in place and likely to increase rather than decrease. And for the Oracle Spatial data practitioner? That they are, inevitably, be exposed to these formats and the need to load new, and update existing, SDO_GEOMETRY based data.

Over on Oracle Technet’s Spatial forum a user asked the following question :

“I have a GPX data that is XML based I know about FME and convert to shape then to oracle spatial, But what I want exactly to develop a small J2EE application to read the data from GPX file then load the data to database direct without third party software. Please if any one knows how to do it please help me.”

Now, this question cuts to the nub of the alternate way of approaching the processing of geospatial data. Why use an external, geospatial specific, tool when there are perfectly good, generic, integrated tools to do a job? The issues with using external tools are:

  • Need for a separately installed and configured tool either on the same server as the database (frowned upon in most organisations) – why purchase/install a separate tool if the existing database has the functionality?;
  • Cost of purchasing/learning external tool;
  • Need to re-test integration as external tool is upgraded (eg version 1 to version 2);
  • Need for specific expertise in a tool other than the database – what happens if the external tool specialist – who has written a special (idiosyncratic) script (in a non standard language) to implement the load/update – goes on leave or leaves the organisation entirely and the load script fails?;
  • Need for network/database connection – if the external application is on a separate server (or client PC) then a network connection is required when inserting/updating. If a batch process (via crontab or Windows Sheduled Task), what happens if the network goes down?

A lot of these issues are real as I have experienced them myself over the years as a consultant, a GIS Manager and an IT programmer.

Because all the above formats are XML, and Oracle Database (and its competitors – see other articles) support XML processing inside the database, I thought I would write a short article on loading a GPX 1.1 file into Oracle and then extracting and generating SDO_GEOMETRY objects from it.

This article article shows, step by step, how to process a GPX file.

Instance Data and Schema

Firstly, the instance dataset. When I downloaded GPX2SHP, it included with it a file, gpxtest.gpx. This file forms the basis of this article.

Secondly, the GPX file is based on verison 1.1 of an XML Schema which was designed and is hosted by Topographix. I downloaded and stored this XSD file with my GPX file.

Lots of programs support GPX and there are lots of examples of GPX instance documents available on the web. One source documenting programs and data is at the GISWiki.

There are other sample instance documents at the actual Topographix website but I chose not to use them as some include more complex extensions (which I didn’t want “clouding” the simplicity of this article) or were for GPX version 1.0 (the person asking the question on the Oracle forum has a version 1.1 document).

Checking the data

One of the things about XML data is that often it is constructed informally (no XML Schema or Document Type Definition – DTD ) or is constructed and not checked against its XML Schema in order to ensure validity. Since I will be loading the data using XMLDB’s schema validation capabilties I thought I had best check the dataset I would be using. For this I used Altova’s XMLSpy 2009.

When I opened the pinatest2.gpx document in XMLSpy and validated it, XMLSpy indicated a range of errors in the document.

  1. The metadata section was incorrectly organised
  2. The WayPoint (wpt), Track (trk) and Route (rte) data were incorrectly ordered.

The metadata had to be corrected as follows.

MetadataBefore

<metadata>
  <name>Manila to Mt. Pinatubo</name>
  <copyright>Toshihiro Hiraoka</copyright>
  <author>Toshihiro Hiraoka</author>
  <keyword>Manila Mt. Pinatubo gpx2shp</keyword>
  <link>http://gpx2shp.sourceforge.jp</link>
  <description>This is test data for gpx2shp.</description>
  <time>2004-12-29T08:23:55Z</time>
</metadata>

MetadataAfter

<metadata>
  <name>Manila to Mt. Pinatubo</name>
  <desc>This is test data for gpx2shp.</desc>
  <copyright author="Toshihiro Hiraoka">
    <year>2004</year>
    <license>http://gpx2shp.sourceforge.jp</license>
  </copyright>
  <link href="http://gpx2shp.sourceforge.jp">
    <text>Toshihiro Hiraoka</text>
  </link>
  <time>2004-12-29T08:23:55</time>
  <keywords>Manila Mt. Pinatubo gpx2shp</keywords>
  <bounds maxlat="90.0" maxlon="179.9" minlat="-90.0" minlon="-180.0"/>
</metadata>

The ordering of the geospatial components in the original file was:

<gpx ....>
  <metadata>
  <trk>
  <rte>
  <wpt>
</gpx>

Whereas, the XML Schema requires the following order:

<gpx ....>
  <metadata>
  <wpt>
  <rte>
  <trk>
</gpx>

After correction I saved the file to a new name called gpxtest.gpx which you can download from this website.

Loading in to Oracle

After having corrected the GPX file I then proceeded to load it in to Oracle.

First off I needed to grant my schema some additional privileges:

connect system/******@gisdb
grant XDBADMIN to gis;
grant alter session to gis;

Then I need to create an Oracle Directory so that I can access and load the GPX instance data and schema.

connect gis/*****@gisdb
CREATE DIRECTORY gpxdir AS 'E:\BlogBits\TipsTricks\GPX';
GRANT READ ON gpxdir TO PUBLIC;

While not strictly speaking needed, the following function I have found useful as a simple method of loading XML documents from a file in a directory easier.

create or replace function getClobDocument(
    p_directoryname In varchar2,
    p_filename      In varchar2,
    p_charset       In varchar2 default NULL)
    return          CLOB deterministic
is
  v_file          bfile := bfilename(p_directoryname,p_filename);
  v_charContent   CLOB := ' ';
  v_targetFile    bfile;
  v_lang_ctx      number := DBMS_LOB.default_lang_ctx;
  v_charset_id    number := 0;
  v_src_offset    number := 1 ;
  v_dst_offset    number := 1 ;
  v_warning       number;
begin
  if p_charset is not null then
    v_charset_id := NLS_CHARSET_ID(p_charset);
  end if;
  v_targetFile := v_file;
  DBMS_LOB.fileopen(v_targetFile, DBMS_LOB.file_readonly);
  DBMS_LOB.LOADCLOBFROMFILE(v_charContent, 
                            v_targetFile,
                            DBMS_LOB.getLength(v_targetFile), 
                            v_src_offset, 
                            v_dst_offset,
                            v_charset_id, 
                            v_lang_ctx,
                            v_warning);
  DBMS_LOB.fileclose(v_targetFile);
  return v_charContent;
end;
/

Before loading our GPX XML data we need to register the XML Schema that describes the XML content using DBMS_XMLSCHEMA.REGISTERSCHEMA. This is done via a two-step process of creating a resource and then registering the actual schema.

-- Create some useful variable for use in the whole script
var schemaURL varchar2(256)
var schemaDoc varchar2(256)

begin
  :schemaURL := 'http://www.topografix.com/GPX/1/1/gpx.xsd';
  :schemaDoc := 'gpx.xsd';
end;
/

declare
  xmlSchema xmlType;
  res       boolean;
begin
  xmlSchema := XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8'));
  if (dbms_xdb.existsResource(:schemaDoc)) then
      dbms_xdb.deleteResource(:schemaDoc);
  end if;
  res := dbms_xdb.createResource(:schemaDoc,xmlSchema);
end;
/

-- May not exist, but attempt to delete anyway.
execute dbms_xmlSchema.deleteSchema(:schemaURL,DBMS_XMLSCHEMA.DELETE_CASCADE_FORCE);

-- Now register the schema
begin
  dbms_xmlschema.registerSchema
  (
    :schemaURL,
    XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8')),
    TRUE,TRUE,FALSE,FALSE
  );
end;
/

-- Queries to see if it exists
select * 
  from user_xml_schemas
 where schema_url = :schemaURL;

SCHEMA_URL                                LOCAL SCHEMA      INT_OBJNAME                QUAL_SCHEMA_URL                                                            HIER_TYPE   
----------------------------------------- ----- ----------- -------------------------- -------------------------------------------------------------------------- ---------
http://www.topografix.com/GPX/1/1/gpx.xsd YES   SYS.XMLTYPE XD79rS1GmSRMqW2gvRg8ufhQ== http://xmlns.oracle.com/xdb/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd CONTENTS

1 rows selected

SELECT PATH,
       extract(LINK, 
               '/LINK/Name/text()').getstringval() as linkName,
       extract(LINK,
               '/LINK/ParentName/text()').getstringval() as ParentName,
       extract(LINK,
               '/LINK/ChildName/text()').getstringval() as ChildName,
       extract(RES, 
               '/Resource/DisplayName/text()').getstringval() as DisplayName
  FROM PATH_VIEW
 WHERE PATH like '/sys/schemas/GIS/www.topo%';

PATH                                                LINKNAME           PARENTNAME         CHILDNAME          DISPLAYNAME
--------------------------------------------------- ------------------ ------------------ ------------------ ------------------
/sys/schemas/GIS/www.topografix.com                 www.topografix.com GIS                www.topografix.com www.topografix.com 
/sys/schemas/GIS/www.topografix.com/GPX             GPX                www.topografix.com GPX                GPX
/sys/schemas/GIS/www.topografix.com/GPX/1           1                  GPX                1                  1
/sys/schemas/GIS/www.topografix.com/GPX/1/1         1                  1                  1                  1 
/sys/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd gpx.xsd            1                  gpx.xsd            gpx.xsd 

5 rows selected

Now that the schema is loaded and registered, we can create a table to hold the GPX XML data. (I will not discuss the different types of XML storage that XMLDB provides, sufficeth to say that we will create a table of XMLType object type. That is, we will create an object-table.)

Prompt Create table to hold the sample GPX document
drop table GPX;

CREATE TABLE gpx
    OF XMLType 
    XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";

-- Now we can load the data itself
INSERT INTO GPX 
       VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest.gpx')));
commit;

Additional Editing

When loading XML data into a table attributed with its XMLSCHEMA using the above, XMLDB will validate the data as it is loaded. The first-time I loaded the data Oracle XMLDB informed me that

ORA-01830: date format picture ends before converting entire input string xsd:dateTime

Upon investigation, the GPX file contains dateTime values like the following:

<time>2004-12-29T08:23:55Z</time>

With Schema-based XML with nodes of type xsd:dateTime, if an instance document contains dates with TimeZone information (ie the “Z” at the end of the time value above), then the XML schema must be annotated with xsd:SQLType=“TIMESTAMP WITH TIME ZONE” in order to tell XMLDB to expect Time Zone info.

While this is not that difficult to do it would entail changing both the XSD and XML. Something I don’t think my readers would want to do to their GPX documents. So, in order to get around XMLDB I simply removed the “Z” suffix from the end of all time values in the GPX document.

Checking the table creation and load we get.

select *
  from user_xml_tables 
 where table_name = 'GPX';

TABLE_NAME XMLSCHEMA                                 SCHEMA_OWNER ELEMENT_NAME  STORAGE_TYPE
---------- ----------------------------------------- ------------ ------------- -----------------
GPX        http://www.topografix.com/GPX/1/1/gpx.xsd GIS          gpx           OBJECT-RELATIONAL

1 rows selected

And, an additional check of the validity of the loaded XML shows all is well.

SELECT XMLIsValid(g.OBJECT_VALUE,'http://www.topografix.com/GPX/1/1/gpx.xsd') 
  FROM gpx g;

XMLISVALID(G.OBJECT_VALUE,'HTTP://WWW.TOPOGRAFIX.COM/GPX/1/1/GPX.XSD') 
---------------------------------------------------------------------- 
1                                                                      

1 rows selected

Note that it is possible to define the table to have a CHECK constraint as follows:

CREATE TABLE gpx
    OF XMLType 
   (CHECK (XMLIsValid(object_value) = 1))
    XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";

Querying

Let’s start our querying with a very simple request to see the name of the GPX file.

SELECT extractValue(g.object_value,'/gpx/metadata/name','xmlns="http://www.topografix.com/GPX/1/1"') metadataName
 FROM GPX g;

METADATANAME
-------------------------
Manila to Mt. Pinatubo

1 rows selected

Let’s now see the full metadata.

SELECT extractValue(g.object_value,'/gpx/metadata/name','xmlns="http://www.topografix.com/GPX/1/1"') as Name,
       extractValue(g.object_value,'/gpx/metadata/desc','xmlns="http://www.topografix.com/GPX/1/1"') as Description,
       extractValue(g.object_value,'/gpx/metadata/copyright/year','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_Year,
       extractValue(g.object_value,'/gpx/metadata/copyright/license','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_License,
       extractValue(g.object_value,'/gpx/metadata/link/@href','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink,
       extractValue(g.object_value,'/gpx/metadata/link/text','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink_Text,
       extractValue(g.object_value,'/gpx/metadata/time','xmlns="http://www.topografix.com/GPX/1/1"') as Document_DateTime,
       extractValue(g.object_value,'/gpx/metadata/keywords','xmlns="http://www.topografix.com/GPX/1/1"') as keywords,
       extractValue(g.object_value,'/gpx/metadata/bounds/@minlon','xmlns="http://www.topografix.com/GPX/1/1"') as MinLong,
       extractValue(g.object_value,'/gpx/metadata/bounds/@minlat','xmlns="http://www.topografix.com/GPX/1/1"') as MinLat,
       extractValue(g.object_value,'/gpx/metadata/bounds/@maxlon','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLong,
       extractValue(g.object_value,'/gpx/metadata/bounds/@maxlat','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLat
 FROM GPX g;

NAME                   DESCRIPTION                    COPYRIGHT_YEAR COPYRIGHT_LICENSE             HYPERLINK                     HYPERLINK_TEXT    DOCUMENT_DATETIME               KEYWORDS                    MINLONG MINLAT MAXLONG MAXLAT
---------------------- ------------------------------ -------------- ----------------------------- ----------------------------- ----------------- ------------------------------- --------------------------- ------- ------ ------- ------
Manila to Mt. Pinatubo This is test data for gpx2shp. 01/JAN/04      http://gpx2shp.sourceforge.jp http://gpx2shp.sourceforge.jp Toshihiro Hiraoka 29/DEC/04 08:23:55.000000000 AM Manila Mt. Pinatubo gpx2shp    -180    -90   179.9     90

1 rows selected

Now let’s extract our waypoints (only name and coordinates).

SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')) as longitude,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')) as latitude,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele'))  as Elevation
 FROM GPX g,
      TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

NAME   LONGITUDE      LATITUDE     ELEVATION
------ -------------- ------------ ----------
001    121.043382715  14.636015547  45.307495
002    121.042653322  14.637198653  50.594727
003    121.043165457  14.640581002  46.989868
004    120.155537082  14.975596117  38.097656
...
111    120.917379661  14.474465596   5.172729
112    120.938029736  14.942621375  28.965332
113    124.379444299  12.255321192  21.995728
GARMIN -94.799016668  38.855549991 325.049072
GRMEUR -1.463899976   50.982883293  35.934692
GRMPHX -111.946110008 33.330189949 361.098145
GRMTWN 121.640266674  25.061783362  38.097656
NEDA4  121.043607602  14.636060139   8.296997

118 rows selected

Since our target is SDO_GEOMETRY let’s re-do the above query and make it generate Oracle Spatial data.

SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name,
       SDO_GEOMETRY(3001,
                    8307,
                    SDO_POINT_TYPE(TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')),
                                   TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')),
                                   TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele'))),
	            NULL,
		NULL) as geom
 FROM GPX g,
      TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

NAME                 GEOM
------ --------------------------------------------------------------------------------------------------
001    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043382715,14.636015547,45.307495),null,null)
002    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.042653322,14.637198653,50.594727),null,null)
003    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043165457,14.640581002,46.989868),null,null)
004    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.155537082,14.975596117,38.097656),null,null)
...
111    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.917379661,14.474465596,5.172729),null,null)
112    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.938029736,14.942621375,28.965332),null,null)
113    MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(124.379444299,12.255321192,21.995728),null,null)
GARMIN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-94.799016668,38.855549991,325.049072),null,null)
GRMEUR MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-1.463899976,50.982883293,35.934692),null,null)
GRMPHX MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-111.946110008,33.330189949,361.098145),null,null)
GRMTWN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.640266674,25.061783362,38.097656),null,null)
NEDA4  MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043607602,14.636060139,8.296997),null,null)

118 rows selected

Now waypoints are the bread and butter of most GPS use, but we also find tracks and sometimes routes. Both of these are linear elements so we need to see how to process them.

First, let’s look at the routes in the GPX file.

-- Show all routes
SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
       TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

ROUTE_NAME ROUTE_NUMBER
---------- ------------
10-SEP-04  4

1 rows selected

-- Show the route and all its points (as XML)
SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
       TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number,
       EXTRACT(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"') as route_pts
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

ROUTE_NAME ROUTE_NUMBER           ROUTE_PTS
---------- ------------ ---------------------------------------------------------------------------------
10-SEP-04  4            <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="15.044639" lon="120.25491">
                          <ele>209.704834</ele>
                          <time>2004-09-09T01:57:08.000000</time>
                        </rtept>
..
                        <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="14.980567" lon="120.193627">
                          <ele>68.871582</ele>
                          <time>2004-09-10T00:51:48.000000</time>
                        </rtept>

1 rows selected

-- Finally, show the route with its point objects (same definition as a waypoint point object)
SELECT SUBSTR(   EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name,
       TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number'))    as route_number,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude,
       TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"')) as elevation
 FROM GPX g,
      TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
      TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t;

ROUTE_NAME ROUTE_NUMBER LONGITUDE  LATITUDE   ELEVATION
---------- ------------ ---------- ---------- ----------
10-SEP-04  4            120.25491  15.044639  209.704834
10-SEP-04  4            120.25491  15.044446  206.340088
10-SEP-04  4            120.254931 15.044231  206.340088
10-SEP-04  4            120.255103 15.043373  207.782104
10-SEP-04  4            120.25506  15.04333   206.820679
.....
10-SEP-04  4            120.193799 14.980609  69.832886
10-SEP-04  4            120.193627 14.980567  68.871582

323 rows selected

Finally, let’s use some SQL trickery to turn this single route into a linear SDO_GEOMETRY object.

-- Turn (Single) Route into geometry
SELECT MDSYS.SDO_GEOMETRY(3002,8307,null,mdsys.sdo_elem_info_array(1,2,1),
CAST(MULTISET( 
 SELECT case when r.rin = 1 then longitude
             when r.rin = 2 then latitude
             when r.rin = 3 then elevation
         end as ordinate
   FROM (select level rin from dual connect by level < 4) r,
        (SELECT rownum as pt,
                TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude,
                TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude,
                TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"'))  as elevation
           FROM GPX g,
                TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t
        ) x
   order by x.pt, r.rin
) AS mdsys.sdo_ordinate_array)) as geom
  FROM DUAL ;

GEOM
--------------------------------------------
MDSYS.SDO_GEOMETRY(3002,8307,null,
      MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),
      MDSYS.SDO_ORDINATE_ARRAY(
            120.25491,15.044639,209.704834,
            120.25491,15.044446,206.340088,
            120.254931,15.044231,206.340088,
            120.255103,15.043373,207.782104,
            120.25506,15.04333,206.820679,
            ...
            120.193799,14.980609,69.832886,
            120.193627,14.980567,68.871582))

1 rows selected

The above geometry (which also passed SDO_GEOM.VALIDATE_GEOMETRY testing) was edited to aid presentation as it contains 323 3D coordinates.

Note you cannot use SDO_AGGR_UNION with point objects to generate a linestring (change the 3005 to 2002 and change the SDO_ELEM_INFO_ARRAY). The reason is that, at 10gR2, the resultant aggregated points can only be 2D.

Now the above approach is fine if the file contains a single route: what if it contains many?

The GPX file I loaded contains a number of tracks. Let’s look at them.

-- Get all tracks
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

TRACK_NUMBER TRACK_NAME 
------------ ---------- 
1            07-SEP-04  
2            08-SEP-04  
3            08-SEP-04  
7            Only 5 poi 

4 rows selected

We have 4 tracks.

Now, tracks are organised a little differently than routes. Tracks can contain multiple segments (trkseg) each of which can have multiple points (trkpt).

-- Get each track's segments
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
       EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1;

TRACK_NUMBER TRACK_NAME TRACKSEG 
------------ ---------- -----------------------------------------------------------------------------------------------------------------------------
1            07-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.585381" lon="121.056118"><ele>42.916016</ele>  ... </trkseg>
2            08-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg>
3            08-SEP-04  <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg>
7            Only 5 poi <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.907889" lon="120.558472"><ele>28.976929</ele>  ... </trkseg>

4 rows selected

Of course I have edited and truncated the trkseg data.

Now, let’s turn these tracks into linestring geometries.

-- Turn (Multiple) Tracks into geometry
-- Note, all can only have one track segment
-- (If more than one track segment would need to handle by generating more than one row per track and group output below by SDO_AGGR_UNION)
-- 
WITH tracks As (
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
       EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg
  FROM GPX g,
       TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1
)
SELECT trk.track_number,
       trk.track_name,
       sdo_geom.validate_geometry(mdsys.sdo_geometry(3002,8307,null,sdo_elem_info_array(1,2,1),
       CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                 when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                 when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                             end ordinate
                       FROM (select level as rin from dual connect by level < 4) r,
                             TABLE(XMLSequence(EXTRACT(trk.trackSeg,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                    ) as mdsys.sdo_ordinate_array
           )),0.05) as geom
  FROM tracks trk;

TRACK_NUMBER TRACK_NAME GEOM
------------ ---------- ----
1            07-SEP-04  TRUE
2            08-SEP-04  TRUE
3            08-SEP-04  TRUE
7            Only 5 poi TRUE

4 rows selected

Luckily, each of these tracks only has one track segment. If a track had more than one track segment we would have to generate MULTILINESTRINGS and not LINESTRINGS. For me that was slightly more involved as I ran in difficulties pursuing an pure-SQL based approach.

MultiLineStrings

The above SQL works because it splits out each trkseg that is associated with a track via a separate line. One could use SDO_AGGR_UNION or SDO_AGGR_CONCAT_LINES on the resultant geometry objects but I have found, after modifying the above SQL with a suitable GROUP BY clause that I could not generate the correct answer with either spatial aggregate (the main problem is the generation of the SDO_ELEM_INFO_ARRAY).

In the end I decided that I would have to do the geometry generation myself.

Firstly, I modified the gpxtest.gpx file merging the last two tracks:

Tracks and SegmentsBefore

			<trkpt lat="15.044639" lon="120.254910">
				<ele>209.704834</ele>
				<time>2004-09-09T01:56:51</time>
			</trkpt>
		</trkseg>
	</trk>
	<trk>
		<name>Only 5 points</name>
		<number>7</number>
		<trkseg>
			<trkpt lat="14.907889" lon="120.558472">
				<ele>28.976929</ele>
				<time>2004-09-10T08:35:38</time>
			</trkpt>

Tracks and SegmentsAfter

			<trkpt lat="15.044639" lon="120.254910">
				<ele>209.704834</ele>
				<time>2004-09-09T01:56:51</time>
			</trkpt>
		</trkseg>
		<trkseg>
			<trkpt lat="14.907889" lon="120.558472">
				<ele>28.976929</ele>
				<time>2004-09-10T08:35:38</time>
			</trkpt>

Then I loaded the modified GPX file into a new table.

drop table GPX2;
purge recyclebin;
create table GPX2
    of xmltype
    XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";
INSERT INTO GPX2 VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest2.gpx')));
commit;

Now let’s query out the number of segments per track.

SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
       count(*) as segments
  FROM GPX2 g,
       TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
       TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"'))) t2
 GROUP BY TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')),
          SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  
 order by 1;

TRACK_NUMBER TRACK_NAME SEGMENTS
------------ ---------- --------
1            07-SEP-04  1
2            08-SEP-04  1
3            08-SEP-04  2

3 rows selected

You will note that I have to make extensive use of standard SQL’s COUNT function while, in many cases, use of XPath’s COUNT function would be far neater. However, XMLDB does not support the XPath COUNT function at 10gR2. It is, however, supported at 11gR1.

Now, to generate a single geometry for all segments in a track one must, of course, ensure that the SDO_ELEM_INFO_ARRAY is correctly generated. I looked at different ways to do this in pure SQL but I could not get a base SQL statement that I could to extend to work. For example, the following SQL statement, which tries to use the CAST(MULTISET approach that worked successfully for generating the SDO_ORDINATE_ARRAY, simply refused to terminate. I could get elements of the SQL to work via a “divide-and-conquery” approach but I was thwarted at every turn.

-- This doesn't work as its execution never returns
WITH tracks As (
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
       (SELECT count(*)
          FROM TABLE(XMLSequence(EXTRACT(VALUE(t1),
                                         'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg',
                                         'xmlns="http://www.topografix.com/GPX/1/1"')))) as SegCount,
       EXTRACT(VALUE(t1),
               'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg',
               'xmlns="http://www.topografix.com/GPX/1/1"') as allSegments
  FROM GPX2 g,
       TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1
)
SELECT trk.track_number,
       trk.track_name,
       CASE WHEN trk.segcount = 1 
            THEN mdsys.sdo_elem_info_array(1,2,1) 
            ELSE 
       CAST(MULTISET(SELECT count(*)
                       FROM TABLE(codesys.geom.generate_series(1,trk.segCount,1)) gs,
                            TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg['||gs.column_value||']/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                      GROUP BY gs.column_value
                     ) AS mdsys.sdo_elem_info_array ) 
        END as coordarray,
       mdsys.sdo_geometry(3002,8307,null,
         mdsys.sdo_elem_info_array(1,2,1),
         /* mdsys.sdo_ordinate_array */
         CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                   when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                   when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                               end ordinate
                         FROM (select level as rin from dual connect by level < 4) r,
                               TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                      ) as mdsys.sdo_ordinate_array
             )) as geom
  FROM tracks trk;
-- Does not terminate

In the end, I decided to encapsulate the SDO_ELEM_INFO_ARRAY processing into a PL/SQL Function.

CREATE OR REPLACE FUNCTION GetElemInfoFromXML ( p_trkseg_xml IN XMLType
) RETURN mdsys.Sdo_Elem_Info_Array 
AS
  v_seg_count number;
  v_array     mdsys.sdo_ordinate_array;
  v_elem_info mdsys.Sdo_Elem_Info_Array := new mdsys.Sdo_Elem_Info_Array(1,2,1);
BEGIN
  SELECT count(*)
    INTO v_seg_count
    FROM TABLE(XMLSequence(EXTRACT(p_trkseg_xml,
                                   'trkseg',
                                   'xmlns="http://www.topografix.com/GPX/1/1"')));
  If ( v_seg_count > 0 ) Then
    SELECT count(*)
      BULK COLLECT INTO v_array
      FROM TABLE(codesys.geom.generate_series(1,v_seg_count,1)) gs,
           TABLE(XMLSequence(EXTRACT(p_trkseg_xml,
                                             '/trkseg['||gs.column_value||']/trkpt',
                                             'xmlns="http://www.topografix.com/GPX/1/1"'))) t
     GROUP BY gs.column_value
     ORDER BY gs.column_value;
  
    IF ( v_array is not null AND v_array.COUNT > 1 ) THEN
      FOR i IN v_array.FIRST..(v_array.LAST-1) LOOP
        v_elem_info.EXTEND(3);
        v_elem_info(v_elem_info.COUNT-2) := ( v_array(i) * 3 ) + v_elem_info(v_elem_info.COUNT-5);
        v_elem_info(v_elem_info.COUNT-1) := 2;
        v_elem_info(v_elem_info.COUNT  ) := 1;
      END LOOP;
    END IF;
  END IF;
  RETURN v_elem_info;
END GetElemInfoFromXML;
/
show errors

Which I then used as follows:

SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
       SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10)  as track_name,
       mdsys.sdo_geometry(CASE WHEN (SELECT COUNT(*) 
                                       FROM TABLE(XMLSequence(EXTRACT(VALUE(allSegments),
                                                                     'trk/trkseg',
                                                                     'xmlns="http://www.topografix.com/GPX/1/1"')))) > 1
                               THEN 3006
                               ELSE 3002
                           END,
                          8307,
                          NULL,
         GetElemInfoFromXML(EXTRACT(VALUE(allSegments),'trk/trkseg','xmlns="http://www.topografix.com/GPX/1/1"')),
         CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"'))
                                   when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"'))
                                   when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"'))
                               end ordinate
                         FROM (select level as rin from dual connect by level < 4) r,
                               TABLE(XMLSequence(EXTRACT(VALUE(allSegments),'trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t
                      ) as mdsys.sdo_ordinate_array
             )) as geom
  FROM GPX2 g,
       TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1, 
       TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']','xmlns="http://www.topografix.com/GPX/1/1"'))) allSegments;
  
TRACK_NUMBER TRACK_NAME GEOM
------------ ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1            07-SEP-04  MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(121.056118,14.595809,49.645264,...,120.246134,15.043287,209.224121)) 
2            08-SEP-04  MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.236864,15.034811,114.053589)) 
3            08-SEP-04  MDSYS.SDO_GEOMETRY(3006,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,478,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.558879,14.908383,28.015503)) 
  
3 rows selected

So, a combination of SQL and PL/SQL solved the problem of how to generate MULTILINESTRINGS for tracks with multiple segments. Once I upgrade to 11gR1 I will re-look at this problem again to see if use of XPath’s COUNT function can help me implement a solution based on pure SQL.

Conclusion

XMLDB (or even any “horizontal market” XML processing tool) is an excellent tool for handling and processing GPX and other geospatial XML formats. The benefits for use of XMLDB include:

  • Fully integrated into the Oracle database;
  • A powerful, generic tool not limited to geospatial data processing;
  • No additional licensing cost;
  • No need for a separate application or application server;
  • Updates and processing can be executed by Oracle’s DBMS_JOB or DBMS_SCHEDULER queues.
  • I don’t profess to be an XML or XMLDB expert. If you find any errors in what I have written, please write to me and let me know and I will correct them.

    I hope this is of use to someone.