Top 5 Recent Articles
ARTICLES CATEGORIES
- Algorithms (22)
- All (399)
- Biography (1)
- Blog (44)
- Business Requirements (1)
- Commentary (1)
- Conversion (2)
- Customers (2)
- Data Models (1)
- Education (2)
- GeoRaptor (13)
- GPS (1)
- Image Processing (2)
- Import Export (8)
- Licensing (2)
- LiDAR (1)
- Linear Referencing (4)
- Manifold GIS (3)
- Mapping (1)
- MySQL Spatial (7)
- Networking and Routing (including Optimization) (5)
- Open Source (18)
- Oracle Spatial and Locator (194)
- Partitioning (1)
- PostGIS (36)
- Projections (1)
- Published Articles (1)
- qGIS (1)
- Recommendations (1)
- Services (1)
- Software Change Log (1)
- Source Code (37)
- Space Curves (9)
- Spatial Database Functions (109)
- Spatial DB comparison (1)
- Spatial XML Processing (11)
- SQL Server Spatial (92)
- Standards (3)
- Stored Procedure (17)
- Tessellation or Gridding (10)
- Tools (2)
- Topological Relationships (1)
- Training (2)
- Triangulation (2)
Loading and Processing GPX 1.1 files using Oracle XMLDB
There are a large number of geospatial data storage, delivery and interchange formats around nowadays. Including (I limit this discussion to vector data formats only):
- Geography Markup Language (GML)
- Keyhole Markup Language
- GPS eXchange Format (GPX)
- Vector Markup Language (VML)
- Scalable Vector Graphics (SVG)
- Victorian Government’s (Aust) Incremental Update Format (IUF)
- New South Wales Government’s (Aust) Spatial Data Exchange (�SIX�) (and XML incremental update data stream)
- LandXML
- X3D (successor to Virtual Reality Modelling Language – VRML)
Each has multiple versions and the list do including all the GML Application Schemas like Geoscience Markup Language (GeoSciML), CityGML, LandGML, Climate Science Modelling Language (CSML) etc.
So what does all this mean? Certainly that the use of XML for geospatial purposes is firmly in place and likely to increase rather than decrease. And for the Oracle Spatial data practitioner? That they are, inevitably, be exposed to these formats and the need to load new, and update existing, Sdo_Geometry based data.
Over on Oracle Technet’s Spatial forum a user asked the following question :
“I have a GPX data that is XML based I know about FME and convert to shape then to oracle spatial, But what I want exactly to develop a small J2EE application to read the data from GPX file then load the data to database direct without third party software. Please if any one knows how to do it please help me.”
Now, this question cuts to the nub of the alternate way of approaching the processing of geospatial data. Why use an external, geospatial specific, tool when there are perfectly good, generic, integrated tools to do a job? The issues with using external tools are:
- Need for a separately installed and configured tool either on the same server as the database (frowned upon in most organisations) – why purchase/install a separate tool if the existing database has the functionality?;
- Cost of purchasing/learning external tool;
- Need to re-test integration as external tool is upgraded (eg version 1 to version 2);
- Need for specific expertise in a tool other than the database – what happens if the external tool specialist – who has written a special (idiosyncratic) script (in a non standard language) to implement the load/update – goes on leave or leaves the organisation entirely and the load script fails?;
- Need for network/database connection – if the external application is on a separate server (or client PC) then a network connection is required when inserting/updating. If a batch process (via crontab or Windows Sheduled Task), what happens if the network goes down?
A lot of these issues are real as I have experienced them myself over the years as a consultant, a GIS Manager and an IT programmer.
Because all the above formats are XML, and Oracle Database (and its competitors) support XML processing inside the database, I thought I would write a short article on loading a GPX 1.1 file into Oracle and then extracting and generating Sdo_Geometry objects from it.
While I have given a presentation on the use of Oracle’s XMLDB before, I have not written a blog article showing, step by step, how to process a GPX file. Here is the article.
Instance Data and Schema
Firstly, the instance dataset. Late last year i downloaded GPX2SHP. Included with it is a file, pinatest2.gpx. This file forms the basis of my work.
Secondly, the GPX file is based on verison 1.1 of an XML Schema which was designed and is hosted by
Topographix. I downloaded and stored this xsd file with my GPX file.
Lots of programs support GPX and there are lots of examples of GPX instance documents available on the web. One source documenting programs and data is at the GISWiki.
There are other sample instance documents at the actual Topographix website but I chose not to use them as some include more complex extensions (which I didn’t want “clouding” the simplicity of this article) or were for GPX version 1.0 (the person asking the question on the Oracle forum has a version 1.1 document).
After having processed the GPX2SHP GPX file I found a great GPX document on the 2005 Tour de France. I wish I had found it earlier as it looks like a great dataset to process (except it too has a lot of complicating extensions holding additional data)!
Checking the data
One of the things about XML data is that often it is constructed informally (no XML Schema or Document Type Definition – DTD ) or is constructed and not checked against its XML Schema in order to ensure validity. Since I will be loading the data using XMLDB’s schema validation capabilties I thought I had best check the dataset I would be using. For this I used Altova’s XMLSpy 2009.
When I opened the pinatest2.gpx document in XMLSpy and validated it, XMLSpy indicated a range of errors in the document.
- The metadata section was incorrectly organised
- The WayPoint (wpt), Track (trk) and Route (rte) data were incorrectly ordered.
The metadata had to be corrected as follows.
Metadata – Before
<metadata> <name>Manila to Mt. Pinatubo</name> <copyright>Toshihiro Hiraoka</copyright> <author>Toshihiro Hiraoka</author> <keyword>Manila Mt. Pinatubo gpx2shp</keyword> <link>http://gpx2shp.sourceforge.jp</link> <description>This is test data for gpx2shp.</description> <time>2004-12-29T08:23:55Z</time> </metadata>
Metadata – After
<metadata> <name>Manila to Mt. Pinatubo</name> <desc>This is test data for gpx2shp.</desc> <copyright author="Toshihiro Hiraoka"> <year>2004</year> <license>http://gpx2shp.sourceforge.jp</license> </copyright> <link href="http://gpx2shp.sourceforge.jp"> <text>Toshihiro Hiraoka</text> </link> <time>2004-12-29T08:23:55</time> <keywords>Manila Mt. Pinatubo gpx2shp</keywords> <bounds maxlat="90.0" maxlon="179.9" minlat="-90.0" minlon="-180.0"/> </metadata>
The ordering of the geospatial components in the original file was:
<gpx ....> <metadata> <trk> <rte> <wpt> </gpx>
Whereas, the XML Schema requires the following order:
<gpx ....> <metadata> <wpt> <rte> <trk> </gpx>
After correction I saved the file to a new name called gpxtest.gpx which you can download from this website.
Loading in to Oracle
After having corrected the GPX file I then proceeded to load it in to Oracle.
First off I needed to grant my schema some additional privileges:
connect system/******@gisdb grant XDBADMIN to gis; grant alter session to gis;
Then I need to create an Oracle Directory so that I can access and load the GPX instance data and schema.
connect gis/*****@gisdb CREATE DIRECTORY gpxdir AS 'E:\BlogBits\TipsTricks\GPX'; GRANT READ ON gpxdir TO PUBLIC;
While not strictly speaking needed, the following function I have found useful as a simple method of loading XML documents from a file in a directory easier.
create or replace function getClobDocument( p_directoryname In varchar2, p_filename In varchar2, p_charset In varchar2 default NULL) return CLOB deterministic is v_file bfile := bfilename(p_directoryname,p_filename); v_charContent CLOB := ' '; v_targetFile bfile; v_lang_ctx number := DBMS_LOB.default_lang_ctx; v_charset_id number := 0; v_src_offset number := 1 ; v_dst_offset number := 1 ; v_warning number; begin if p_charset is not null then v_charset_id := NLS_CHARSET_ID(p_charset); end if; v_targetFile := v_file; DBMS_LOB.fileopen(v_targetFile, DBMS_LOB.file_readonly); DBMS_LOB.LOADCLOBFROMFILE(v_charContent, v_targetFile, DBMS_LOB.getLength(v_targetFile), v_src_offset, v_dst_offset, v_charset_id, v_lang_ctx, v_warning); DBMS_LOB.fileclose(v_targetFile); return v_charContent; end; /
Before loading our GPX XML data we need to register the XML Schema that describes the XML content using DBMS_XMLSCHEMA.REGISTERSCHEMA. This is done via a two-step process of creating a resource and then registering the actual schema.
-- Create some useful variable for use in the whole script var schemaURL varchar2(256) var schemaDoc varchar2(256) begin :schemaURL := 'http://www.topografix.com/GPX/1/1/gpx.xsd'; :schemaDoc := 'gpx.xsd'; end; / declare xmlSchema xmlType; res boolean; begin xmlSchema := XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8')); if (dbms_xdb.existsResource(:schemaDoc)) then dbms_xdb.deleteResource(:schemaDoc); end if; res := dbms_xdb.createResource(:schemaDoc,xmlSchema); end; / -- May not exist, but attempt to delete anyway. execute dbms_xmlSchema.deleteSchema(:schemaURL,DBMS_XMLSCHEMA.DELETE_CASCADE_FORCE); -- Now register the schema begin dbms_xmlschema.registerSchema ( :schemaURL, XMLTYPE(getCLOBDocument('GPXDIR',:schemaDoc,'AL32UTF8')), TRUE,TRUE,FALSE,FALSE ); end; / -- Queries to see if it exists select * from user_xml_schemas where schema_url = :schemaURL; SCHEMA_URL LOCAL SCHEMA INT_OBJNAME QUAL_SCHEMA_URL HIER_TYPE ----------------------------------------- ----- ----------- -------------------------- -------------------------------------------------------------------------- --------- http://www.topografix.com/GPX/1/1/gpx.xsd YES SYS.XMLTYPE XD79rS1GmSRMqW2gvRg8ufhQ== http://xmlns.oracle.com/xdb/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd CONTENTS 1 rows selected SELECT PATH, extract(LINK, '/LINK/Name/text()').getstringval() as linkName, extract(LINK, '/LINK/ParentName/text()').getstringval() as ParentName, extract(LINK, '/LINK/ChildName/text()').getstringval() as ChildName, extract(RES, '/Resource/DisplayName/text()').getstringval() as DisplayName FROM PATH_VIEW WHERE PATH like '/sys/schemas/GIS/www.topo%'; PATH LINKNAME PARENTNAME CHILDNAME DISPLAYNAME --------------------------------------------------- ------------------ ------------------ ------------------ ------------------ /sys/schemas/GIS/www.topografix.com www.topografix.com GIS www.topografix.com www.topografix.com /sys/schemas/GIS/www.topografix.com/GPX GPX www.topografix.com GPX GPX /sys/schemas/GIS/www.topografix.com/GPX/1 1 GPX 1 1 /sys/schemas/GIS/www.topografix.com/GPX/1/1 1 1 1 1 /sys/schemas/GIS/www.topografix.com/GPX/1/1/gpx.xsd gpx.xsd 1 gpx.xsd gpx.xsd 5 rows selected
Now that the schema is loaded and registered, we can create a table to hold the GPX XML data. (I will not discuss the different types of XML storage that XMLDB provides, sufficeth to say that we will create a table of XMLType object type. That is, we will create an object-table.)
Prompt Create table to hold the sample GPX document drop table GPX; CREATE TABLE gpx OF XMLType XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx"; -- Now we can load the data itself INSERT INTO GPX VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest.gpx'))); commit;
Additional Editing
When loading XML data into a table attributed with its XMLSCHEMA using the above, XMLDB will validate the data as it is loaded. The first-time I loaded the data Oracle XMLDB informed me that
ORA-01830: date format picture ends before converting entire input string xsd:dateTime
Upon investigation, the GPX file contains dateTime values like the following:
<time>2004-12-29T08:23:55Z</time>
With Schema-based XML with nodes of type xsd:dateTime, if an instance document contains dates with TimeZone information (ie the “Z” at the end of the time value above), then the XML schema must be annotated with xsd:SQLType=“TIMESTAMP WITH TIME ZONE” in order to tell XMLDB to expect Time Zone info.
While this is not that difficult to do it would entail changing both the XSD and XML. Something I don’t think my readers would want to do to their GPX documents. So, in order to get around XMLDB I simply removed the “Z” suffix from the end of all time values in the GPX document.
Checking the table creation and load we get.
select * from user_xml_tables where table_name = 'GPX'; TABLE_NAME XMLSCHEMA SCHEMA_OWNER ELEMENT_NAME STORAGE_TYPE ---------- ----------------------------------------- ------------ ------------- ----------------- GPX http://www.topografix.com/GPX/1/1/gpx.xsd GIS gpx OBJECT-RELATIONAL 1 rows selected
And, an additional check of the validity of the loaded XML shows all is well.
SELECT XMLIsValid(g.OBJECT_VALUE,'http://www.topografix.com/GPX/1/1/gpx.xsd') FROM gpx g; XMLISVALID(G.OBJECT_VALUE,'HTTP://WWW.TOPOGRAFIX.COM/GPX/1/1/GPX.XSD') ---------------------------------------------------------------------- 1 1 rows selected
Note that it is possible to define the table to have a CHECk constraint as follows:
CREATE TABLE gpx OF XMLType (CHECK (XMLIsValid(object_value) = 1)) XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx";
Querying
Let’s start our querying with a very simple request to see the name of the GPX file.
SELECT extractValue(g.object_value,’/gpx/metadata/name’,’xmlns="http://www.topografix.com/GPX/1/1"’) metadataName
FROM GPX g;
METADATANAME
————————-
Manila to Mt. Pinatubo
1 rows selected
Let’s now see the full metadata.
SELECT extractValue(g.object_value,'/gpx/metadata/name','xmlns="http://www.topografix.com/GPX/1/1"') as Name, extractValue(g.object_value,'/gpx/metadata/desc','xmlns="http://www.topografix.com/GPX/1/1"') as Description, extractValue(g.object_value,'/gpx/metadata/copyright/year','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_Year, extractValue(g.object_value,'/gpx/metadata/copyright/license','xmlns="http://www.topografix.com/GPX/1/1"') as Copyright_License, extractValue(g.object_value,'/gpx/metadata/link/@href','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink, extractValue(g.object_value,'/gpx/metadata/link/text','xmlns="http://www.topografix.com/GPX/1/1"') as Hyperlink_Text, extractValue(g.object_value,'/gpx/metadata/time','xmlns="http://www.topografix.com/GPX/1/1"') as Document_DateTime, extractValue(g.object_value,'/gpx/metadata/keywords','xmlns="http://www.topografix.com/GPX/1/1"') as keywords, extractValue(g.object_value,'/gpx/metadata/bounds/@minlon','xmlns="http://www.topografix.com/GPX/1/1"') as MinLong, extractValue(g.object_value,'/gpx/metadata/bounds/@minlat','xmlns="http://www.topografix.com/GPX/1/1"') as MinLat, extractValue(g.object_value,'/gpx/metadata/bounds/@maxlon','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLong, extractValue(g.object_value,'/gpx/metadata/bounds/@maxlat','xmlns="http://www.topografix.com/GPX/1/1"') as MaxLat FROM GPX g; NAME DESCRIPTION COPYRIGHT_YEAR COPYRIGHT_LICENSE HYPERLINK HYPERLINK_TEXT DOCUMENT_DATETIME KEYWORDS MINLONG MINLAT MAXLONG MAXLAT ---------------------- ------------------------------ -------------- ----------------------------- ----------------------------- ----------------- ------------------------------- --------------------------- ------- ------ ------- ------ Manila to Mt. Pinatubo This is test data for gpx2shp. 01/JAN/04 http://gpx2shp.sourceforge.jp http://gpx2shp.sourceforge.jp Toshihiro Hiraoka 29/DEC/04 08:23:55.000000000 AM Manila Mt. Pinatubo gpx2shp -180 -90 179.9 90 1 rows selected
Now let’s extract our waypoints (only name and coordinates).
SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')) as longitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')) as latitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele')) as Elevation FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t; NAME LONGITUDE LATITUDE ELEVATION -------------------- ---------------------- ---------------------- ---------- 001 121.043382715 14.636015547 45.307495 002 121.042653322 14.637198653 50.594727 003 121.043165457 14.640581002 46.989868 004 120.155537082 14.975596117 38.097656 ... 111 120.917379661 14.474465596 5.172729 112 120.938029736 14.942621375 28.965332 113 124.379444299 12.255321192 21.995728 GARMIN -94.799016668 38.855549991 325.049072 GRMEUR -1.463899976 50.982883293 35.934692 GRMPHX -111.946110008 33.330189949 361.098145 GRMTWN 121.640266674 25.061783362 38.097656 NEDA4 121.043607602 14.636060139 8.296997 118 rows selected
Since our target is SDO_Geometry let’s re-do the above query and make it generate Oracle Spatial data.
SELECT SUBSTR(EXTRACTVALUE(VALUE(t), 'wpt/name'),1,20) as Name, SDO_GEOMETRY(3001, 8307, SDO_POINT_TYPE(TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lon')), TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/@lat')), TO_NUMBER(EXTRACTVALUE(VALUE(t), 'wpt/ele'))), NULL, NULL) as geom FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/wpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t; NAME GEOM ------ -------------------------------------------------------------------------------------------------- 001 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043382715,14.636015547,45.307495),null,null) 002 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.042653322,14.637198653,50.594727),null,null) 003 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043165457,14.640581002,46.989868),null,null) 004 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.155537082,14.975596117,38.097656),null,null) ... 111 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.917379661,14.474465596,5.172729),null,null) 112 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(120.938029736,14.942621375,28.965332),null,null) 113 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(124.379444299,12.255321192,21.995728),null,null) GARMIN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-94.799016668,38.855549991,325.049072),null,null) GRMEUR MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-1.463899976,50.982883293,35.934692),null,null) GRMPHX MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(-111.946110008,33.330189949,361.098145),null,null) GRMTWN MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.640266674,25.061783362,38.097656),null,null) NEDA4 MDSYS.SDO_GEOMETRY(3001,8307,MDSYS.SDO_POINT_TYPE(121.043607602,14.636060139,8.296997),null,null) 118 rows selected
Now waypoints are the bread and butter of most GPS use, but we also find tracks and sometimes routes. Both of these are linear elements so we need to see how to process them.
First, let’s look at the routes in the GPX file.
-- Show all routes SELECT SUBSTR( EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name, TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number')) as route_number FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte','xmlns="http://www.topografix.com/GPX/1/1"'))) t1; ROUTE_NAME ROUTE_NUMBER ---------- ------------ 10-SEP-04 4 1 rows selected -- Show the route and all its points (as XML) SELECT SUBSTR( EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name, TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number')) as route_number, EXTRACT(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"') as route_pts FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1; ROUTE_NAME ROUTE_NUMBER ROUTE_PTS ---------- ------------ --------------------------------------------------------------------------------- 10-SEP-04 4 <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="15.044639" lon="120.25491"> <ele>209.704834</ele> <time>2004-09-09T01:57:08.000000</time> </rtept> .. <rtept xmlns="http://www.topografix.com/GPX/1/1" lat="14.980567" lon="120.193627"> <ele>68.871582</ele> <time>2004-09-10T00:51:48.000000</time> </rtept> 1 rows selected -- Finally, show the route with its point objects (same definition as a waypoint point object) SELECT SUBSTR( EXTRACTVALUE(VALUE(t1), 'rte/name'),1,10) as route_name, TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'rte/number')) as route_number, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"')) as elevation FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]','xmlns="http://www.topografix.com/GPX/1/1"'))) t1, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte[number=4]/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t; ROUTE_NAME ROUTE_NUMBER LONGITUDE LATITUDE ELEVATION ---------- ------------ ---------- ---------- ---------- 10-SEP-04 4 120.25491 15.044639 209.704834 10-SEP-04 4 120.25491 15.044446 206.340088 10-SEP-04 4 120.254931 15.044231 206.340088 10-SEP-04 4 120.255103 15.043373 207.782104 10-SEP-04 4 120.25506 15.04333 206.820679 ..... 10-SEP-04 4 120.193799 14.980609 69.832886 10-SEP-04 4 120.193627 14.980567 68.871582 323 rows selected
Finally, let’s use some SQL trickery to turn this single route into a linear SDO_Geometry object.
-- Turn (Single) Route into geometry SELECT mdsys.sdo_geometry(3002,8307,null,mdsys.sdo_elem_info_array(1,2,1), CAST(MULTISET( SELECT case when r.rin = 1 then longitude when r.rin = 2 then latitude when r.rin = 3 then elevation end as ordinate FROM (select level rin from dual connect by level < 4) r, (SELECT rownum as pt, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) as longitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) as latitude, TO_NUMBER(EXTRACTVALUE(VALUE(t), 'rtept/ele','xmlns="http://www.topografix.com/GPX/1/1"')) as elevation FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/rte/rtept','xmlns="http://www.topografix.com/GPX/1/1"'))) t ) x order by x.pt, r.rin ) AS mdsys.sdo_ordinate_array)) as geom FROM DUAL ; GEOM -------------------------------------------- MDSYS.SDO_GEOMETRY(3002,8307,null, MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1), MDSYS.SDO_ORDINATE_ARRAY( 120.25491,15.044639,209.704834, 120.25491,15.044446,206.340088, 120.254931,15.044231,206.340088, 120.255103,15.043373,207.782104, 120.25506,15.04333,206.820679, ... 120.193799,14.980609,69.832886, 120.193627,14.980567,68.871582)) 1 rows selected
The above geometry (which also passed SDO_GEOM.VALIDATE_GEOMETRY testing) was edited to aid presentation as it contains 323 3D coordinates.
Note you cannot use SDO_AGGR_UNION with point objects to generate a linestring (change the 3005 to 2002 and change the SDO_ELEM_INFO_ARRAY). The reason is that, at 10gR2, the resultant aggregated points can only be 2D.
Now the above approach is fine if the file contains a single route: what if it contains many?
The GPX file I loaded contains a number of tracks. Let’s look at them.
-- Get all tracks SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1; TRACK_NUMBER TRACK_NAME ---------------------- ---------- 1 07-SEP-04 2 08-SEP-04 3 08-SEP-04 7 Only 5 poi 4 rows selected
We have 4 tracks.
Now, tracks are organised a little differently than routes. Tracks can contain multiple segments (trkseg) each of which can have multiple points (trkpt).
-- Get each track's segments SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name, EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1; TRACK_NUMBER TRACK_NAME TRACKSEG ---------------------- ---------- ----------------------------------------------------------------------------------------------------------------------------- 1 07-SEP-04 <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.585381" lon="121.056118"><ele>42.916016</ele> ... </trkseg> 2 08-SEP-04 <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg> 3 08-SEP-04 <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="15.043309" lon="120.254095"><ele>206.340088</ele> ... </trkseg> 7 Only 5 poi <trkseg xmlns="http://www.topografix.com/GPX/1/1"><trkpt lat="14.907889" lon="120.558472"><ele>28.976929</ele> ... </trkseg> 4 rows selected
Of course I have edited and truncated the trkseg data.
Now, let’s turn these tracks into linestring geometries.
-- Turn (Multiple) Tracks into geometry -- Note, all can only have one track segment -- (If more than one track segment would need to handle by generating more than one row per track and group output below by SDO_AGGR_UNION) -- WITH tracks As ( SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name, EXTRACT(g.OBJECT_VALUE,'/gpx/trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"') as trackSeg FROM GPX g, TABLE(XMLSequence(extract(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1 ) SELECT trk.track_number, trk.track_name, sdo_geom.validate_geometry(mdsys.sdo_geometry(3002,8307,null,sdo_elem_info_array(1,2,1), CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"')) end ordinate FROM (select level as rin from dual connect by level < 4) r, TABLE(XMLSequence(EXTRACT(trk.trackSeg,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t ) as mdsys.sdo_ordinate_array )),0.05) as geom FROM tracks trk; TRACK_NUMBER TRACK_NAME GEOM ---------------------- ---------- ---- 1 07-SEP-04 TRUE 2 08-SEP-04 TRUE 3 08-SEP-04 TRUE 7 Only 5 poi TRUE 4 rows selected
Luckily, each of these tracks only has one track segment. If a track had more than one track segment we would have to generate MULTILINESTRINGS and not LINESTRINGS. For me that was slightly more involved as I ran in difficulties pursuing an pure-SQL based approach.
MultiLineStrings
The above SQL works because it splits out each trkseg that is associated with a track via a separate line. One could use SDO_AGGR_UNION or SDO_AGGR_CONCAT_LINES on the resultant geometry objects but I have found, after modifying the above SQL with a suitable GROUP BY clause that I could not generate the correct answer with either spatial aggregate (the main problem is the generation of the SDO_ELEM_INFO_ARRAY).
In the end I decided that I would have to do the geometry generation myself.
Firstly, I modified the gpxtest.gpx file merging the last two tracks:
Tracks and Segments – Before
<trkpt lat="15.044639" lon="120.254910"> <ele>209.704834</ele> <time>2004-09-09T01:56:51</time> </trkpt> </trkseg> </trk> <trk> <name>Only 5 points</name> <number>7</number> <trkseg> <trkpt lat="14.907889" lon="120.558472"> <ele>28.976929</ele> <time>2004-09-10T08:35:38</time> </trkpt>
Tracks and Segments – After
<trkpt lat="15.044639" lon="120.254910"> <ele>209.704834</ele> <time>2004-09-09T01:56:51</time> </trkpt> </trkseg> <trkseg> <trkpt lat="14.907889" lon="120.558472"> <ele>28.976929</ele> <time>2004-09-10T08:35:38</time> </trkpt>
Then I loaded the modified GPX file into a new table.
drop table GPX2; purge recyclebin; create table GPX2 of xmltype XMLSCHEMA "http://www.topografix.com/GPX/1/1/gpx.xsd" ELEMENT "gpx"; INSERT INTO GPX2 VALUES (XMLTYPE(getCLOBDocument('GPXDIR','gpxtest2.gpx'))); commit;
Now let’s query out the number of segments per track.
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name, count(*) as segments FROM GPX2 g, TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1, TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg','xmlns="http://www.topografix.com/GPX/1/1"'))) t2 GROUP BY TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')), SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) order by 1; TRACK_NUMBER TRACK_NAME SEGMENTS ---------------------- ---------- ---------------------- 1 07-SEP-04 1 2 08-SEP-04 1 3 08-SEP-04 2 3 rows selected
You will note that I have to make extensive use of standard SQL’s COUNT function while, in many cases, use of XPath’s COUNT function would be far neater. However, XMLDB does not support the XPath COUNT function at 10gR2. It is, however, supported at 11gR1.
Now, to generate a single geometry for all segments in a track one must, of course, ensure that the SDO_ELEM_INFO_ARRAY is correctly generated. I looked at different ways to do this in pure SQL but I could not get a base SQL statement that I could to extend to work. For example, the following SQL statement, which tries to use the CAST(MULTISET approach that worked successfully for generating the SDO_ORDINATE_ARRAY, simply refused to terminate. I could get elements of the SQL to work via a “divide-and-conquery” approach but I was thwarted at every turn.
-- This doesn't work as its execution never returns WITH tracks As ( SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name, (SELECT count(*) FROM TABLE(XMLSequence(EXTRACT(VALUE(t1), 'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg', 'xmlns="http://www.topografix.com/GPX/1/1"')))) as SegCount, EXTRACT(VALUE(t1), 'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']/trkseg', 'xmlns="http://www.topografix.com/GPX/1/1"') as allSegments FROM GPX2 g, TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1 ) SELECT trk.track_number, trk.track_name, CASE WHEN trk.segcount = 1 THEN mdsys.sdo_elem_info_array(1,2,1) ELSE CAST(MULTISET(SELECT count(*) FROM TABLE(codesys.geom.generate_series(1,trk.segCount,1)) gs, TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg['||gs.column_value||']/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t GROUP BY gs.column_value ) AS mdsys.sdo_elem_info_array ) END as coordarray, mdsys.sdo_geometry(3002,8307,null, mdsys.sdo_elem_info_array(1,2,1), /* mdsys.sdo_ordinate_array */ CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"')) end ordinate FROM (select level as rin from dual connect by level < 4) r, TABLE(XMLSequence(EXTRACT(trk.allSegments,'/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t ) as mdsys.sdo_ordinate_array )) as geom FROM tracks trk; -- Does not terminate
In the end, I decided to encapsulate the SDO_ELEM_INFO_ARRAY processing into a PL/SQL Function.
CREATE OR REPLACE FUNCTION GetElemInfoFromXML ( p_trkseg_xml IN XMLType ) RETURN mdsys.Sdo_Elem_Info_Array AS v_seg_count number; v_array mdsys.sdo_ordinate_array; v_elem_info mdsys.Sdo_Elem_Info_Array := new mdsys.Sdo_Elem_Info_Array(1,2,1); BEGIN SELECT count(*) INTO v_seg_count FROM TABLE(XMLSequence(EXTRACT(p_trkseg_xml, 'trkseg', 'xmlns="http://www.topografix.com/GPX/1/1"'))); If ( v_seg_count > 0 ) Then SELECT count(*) BULK COLLECT INTO v_array FROM TABLE(codesys.geom.generate_series(1,v_seg_count,1)) gs, TABLE(XMLSequence(EXTRACT(p_trkseg_xml, '/trkseg['||gs.column_value||']/trkpt', 'xmlns="http://www.topografix.com/GPX/1/1"'))) t GROUP BY gs.column_value ORDER BY gs.column_value; IF ( v_array is not null AND v_array.COUNT > 1 ) THEN FOR i IN v_array.FIRST..(v_array.LAST-1) LOOP v_elem_info.EXTEND(3); v_elem_info(v_elem_info.COUNT-2) := ( v_array(i) * 3 ) + v_elem_info(v_elem_info.COUNT-5); v_elem_info(v_elem_info.COUNT-1) := 2; v_elem_info(v_elem_info.COUNT ) := 1; END LOOP; END IF; END IF; RETURN v_elem_info; END GetElemInfoFromXML; / show errors
Which I then used as follows:
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number, SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name, mdsys.sdo_geometry(CASE WHEN (SELECT COUNT(*) FROM TABLE(XMLSequence(EXTRACT(VALUE(allSegments), 'trk/trkseg', 'xmlns="http://www.topografix.com/GPX/1/1"')))) > 1 THEN 3006 ELSE 3002 END, 8307, NULL, GetElemInfoFromXML(EXTRACT(VALUE(allSegments),'trk/trkseg','xmlns="http://www.topografix.com/GPX/1/1"')), CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://www.topografix.com/GPX/1/1"')) when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="http://www.topografix.com/GPX/1/1"')) end ordinate FROM (select level as rin from dual connect by level < 4) r, TABLE(XMLSequence(EXTRACT(VALUE(allSegments),'trk/trkseg/trkpt','xmlns="http://www.topografix.com/GPX/1/1"'))) t ) as mdsys.sdo_ordinate_array )) as geom FROM GPX2 g, TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1, TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']','xmlns="http://www.topografix.com/GPX/1/1"'))) allSegments; TRACK_NUMBER TRACK_NAME GEOM ------------ ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 1 07-SEP-04 MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(121.056118,14.595809,49.645264,...,120.246134,15.043287,209.224121)) 2 08-SEP-04 MDSYS.SDO_GEOMETRY(3002,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.236864,15.034811,114.053589)) 3 08-SEP-04 MDSYS.SDO_GEOMETRY(3006,8307,null,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,478,2,1),MDSYS.SDO_ORDINATE_ARRAY(120.254095,15.043309,210.666016,...,120.558879,14.908383,28.015503)) 3 rows selected
So, a combination of SQL and PL/SQL solved the problem of how to generate MULTILINESTRINGS for tracks with multiple segments. Once I upgrade to 11gR1 I will re-look at this problem again to see if use of XPath’s COUNT function can help me implement a solution based on pure SQL.
Conclusion
XMLDB (or even any “horizontal market” XML processing tool) is an excellent tool for handling and processing GPX and other geospatial XML formats. The benefits for use of XMLDB include:
- Fully integrated into the Oracle database;
- A powerful, generic tool not limited to geospatial data processing;
- No additional licensing cost;
- No need for a separate application or application server;
- Updates and processing can be executed by Oracle’s DBMS_JOB or DBMS_SCHEDULER queues.
I don’t profess to be an XML or XMLDB expert. If you find any errors in what I have written, please write to me and let me know and I will correct them.
I hope this is of use to someone.
Documentation
- MySQL Spatial General Functions
- Oracle LRS Objects
- Oracle Spatial Exporter (Java + pl/SQL)
- Oracle Spatial Object Functions
- Oracle Spatial Object Functions (Multi Page)
- PostGIS pl/pgSQL Functions
- SC4O Oracle Java Topology Suite (Java + pl/SQL)
- SQL Server Spatial General TSQL Functions
- SQL Server Spatial LRS TSQL Functions