Using PostgreSQL’s XML processing to load spatial data

Using PostgreSQL’s XML processing to load spatial data

Spatial data comes in many forms, but the most common is, probably, XML.

If you have a need to load XML data – whether that be GML or other XML with spatial data – you may not know it, but you can use native PostgreSQL tools to load the XML data along with its geometry data.

Does this mean we no longer need traditional spatial/GIS tools such as FME or ogr2ogr?

Absolutely not!

It is just that in certain circumstances, the only tool in your tool-box just might be ordinary old XML type processing.

There are two scenarios

1. Loading WKT as part of a simple XML document

Here is a snipped of XML data that contains WKT data:

<hotels>
 <hotel id="mancha">
  <name>La Mancha</name>
  <location srid="4326"><strong>POINT(-43.23737 147.14756)</strong></location>
  <rooms>
   <room id="201"><capacity>3</capacity><comment>Great view of the Channel</comment></room>
   <room id="202"><capacity>5</capacity></room>
  </rooms>
 </hotel>
 <hotel id="valpo">
  <name>Valparaíso</name>
  <location srid="4326"><strong>POINT(-43.26737 147.29756)</strong></location>
  <rooms>
   <room id="201"><capacity>2</capacity><comment>Very noisy</comment></room>
   <room id="202"><capacity>2</capacity></room>
  </rooms>
 </hotel>
</hotels>

Normally one would use the COPY command in psql or pgAdmin IV to load an XML file from the file system into a table. But for the first part of article, we will use the XML directly.

The following SQL Select statement uses PostgreSQL’s XML processing capability to access the XML and extract the hotel information including accessing the WKT and converting it to an geography object.

With hoteldata as (
SELECT 
'<hotels>
 <hotel id="mancha">
  <name>La Mancha</name>
  <location srid="4326"><strong>POINT(-43.23737 147.14756)</strong></location>
  <rooms>
   <room id="201"><capacity>3</capacity><comment>Great view of the Channel</comment></room>
   <room id="202"><capacity>5</capacity></room>
  </rooms>
 </hotel>
 <hotel id="valpo">
  <name>Valparaíso</name>
  <location srid="4326"><strong>POINT(-43.26737 147.29756)</strong></location>
  <rooms>
   <room id="201"><capacity>2</capacity><comment>Very noisy</comment></room>
   <room id="202"><capacity>2</capacity></room>
   <room id="203"><capacity>3</capacity><comment>Very comfortable</comment></room>
  </rooms>
 </hotel>
</hotels>'::xml as hotels
)
-- LOCATION AS wkt
SELECT hotel_id,
       hotel_name,
       count(*) as total_rooms,
       sum(guests_per_room) as total_guests,
       ST_AsEWKT(
        ST_GeogFromText(
          ST_AsEWKT(
            ST_SwapOrdinates(
              ST_GeomFromText(CONCAT('SRID=',srid,'; ',wkt)),
              'xy'
            )
         )
        )
       ) as location
  from (SELECT (XPATH('@id',                 hotel))[1]::text as hotel_id,
               (XPATH('name/text()',         hotel))[1]::text as hotel_name,
               CAST((XPATH('location/text()',hotel))[1]::text as text) as wkt,
               CAST((XPATH('location/@srid', hotel))[1]::text as text) as srid,
               (XPATH('@id',            rooms))[1]::text::integer as room_number,
               (XPATH('capacity/text()',rooms))[1]::text::integer as guests_per_room
          FROM (select hotel.*,
                       UNNEST(XPATH('rooms/room',hotel)) as rooms
                 from hoteldata as x,
                      unnest(XPATH('/hotels/hotel',x.hotels)) as hotel
                ) q
        ) as f
 group by hotel_id, hotel_name,wkt,srid;

hotel_id hotel_name total_rooms total_guests location
valpo    Valparaíso           3            7 SRID=4326;POINT(147.29756 -43.26737)
mancha   La Mancha            2            8 SRID=4326;POINT(147.14756 -43.23737)

GML

The next examp[le, extracts the room information for each hotel and converts each rooms location (say a bungalow in away from the main hotel grounds). The GML is extracted and reformatted as a valid geography object.

With hoteldata as (
SELECT '<hotels>
  <hotel id="mancha">
  <name>La Mancha</name>
  <location>
    <Point srsName="EPSG:4326">
       <pos>-43.23737 147.14756</pos>
    </Point>
  </location>
  <rooms>
   <room id="201">
     <capacity>3</capacity>
     <comment>Great view of the Channel</comment>
     <location>
      <Point srsName="EPSG:4326">
        <pos>-43.23731 147.14751</pos>
      </Point>
     </location>
  </room>
   <room id="202">
     <capacity>5</capacity>
     <location>
      <Point srsName="EPSG:4326">
        <pos>-43.23732 147.14758</pos>
      </Point>
     </location>
    </room>
  </rooms>
 </hotel>
 <hotel id="valpo">
  <name>Valparaíso</name>
  <location>
    <Point srsName="EPSG:4326">
       <pos>-43.26737 147.29756</pos>
    </Point>
  </location>
  <rooms>
   <room id="201">
     <capacity>2</capacity>
     <comment>Very noisy</comment>
     <location>
       <Point srsName="EPSG:4326">
          <pos>-43.26729 147.29750</pos>
       </Point>
     </location>
   </room>
   <room id="202">
     <capacity>2</capacity>
     <location>
       <Point srsName="EPSG:4326">
          <pos>-43.26741 147.29734</pos>
       </Point>
     </location>
   </room>
   <room id="203">
     <capacity>4</capacity>
     <comment>Very comfortable</comment>
     <location>
       <Point srsName="EPSG:4326">
          <pos>-43.26740 147.29760</pos>
       </Point>
     </location>
   </room>
  </rooms>
 </hotel>
</hotels>'::xml as hotels
)
-- Just extract Hotel entity
SELECT hotel_name,
       ST_AsEWKT(
        ST_GeogFromText(
         ST_AsEWKT(
          ST_SwapOrdinates(
           ST_GeomFromGml(wkt),
           'xy'
          )
         )
        )
       ) as location,
       room_number,
       guests_per_room
  from (SELECT (XPATH('name/text()',         hotel))[1]::text as hotel_name,
               CAST((XPATH('location/Point',
                           rooms,
                           ARRAY[ARRAY['gml','"http://www.opengis.net/gml"']]
                          )
                    )[1]::text      as text)::text as wkt,
               (XPATH('@id',            rooms))[1]::text::integer as room_number,
               (XPATH('capacity/text()',rooms))[1]::text::integer as guests_per_room
          FROM (select hotel.*, rooms.*
                  from hoteldata as x,
                       unnest(XPATH('/hotels/hotel',x.hotels)) as hotel,
                       unnest(XPATH('rooms/room',
                                    hotel.*,
                                    ARRAY[ARRAY['gml','"http://www.opengis.net/gml"']]
                                   )
                             ) as rooms
                ) q
        ) as f
order by hotel_name, room_number;

hotel_name location                             room_number guests_per_room
La Mancha  SRID=4326;POINT(147.14751 -43.23731)         201               3
La Mancha  SRID=4326;POINT(147.14758 -43.23732)         202               5
Valparaíso SRID=4326;POINT(147.2975 -43.26729)          201               2
Valparaíso SRID=4326;POINT(147.29734 -43.26741)         202               2
Valparaíso SRID=4326;POINT(147.2976 -43.2674)           203               4

Using the COPY command.

The examples above hard-code the XML into the SQL statement.

Normally, the XML is provided as a disk file.

Here we can load the XML data using the PostgreSQL COPY command:

drop table if exists public.hoteldata;

create table public.hoteldata (
  hotels text /* Loading as text allows for validation: see below */
);

COPY public.hoteldata (hotels) FROM 'C:\temp\hotels.xml';

-- Validate
select XMLPARSE (DOCUMENT a.hotels) from public.hoteldata as a;
select XMLPARSE (CONTENT  a.hotels) from public.hoteldata as a;
select xml_is_well_formed(a.hotels),
       xml_is_well_formed_content(a.hotels)
from public.hoteldata as a;

Once the XML is loaded, it can be queried as above:

SELECT hotel_name,
       ST_AsEWKT(
        ST_GeogFromText(
         ST_AsEWKT(
          ST_SwapOrdinates(
           ST_GeomFromGml(gml),
           'xy'
          )
         )
        )
       ) as location,
	   room_number,
	   guests_per_room
  from (SELECT (XPATH('name/text()',         hotel))[1]::text as hotel_name,
               CAST((XPATH('location/Point',
                           rooms,
                           ARRAY[ARRAY['gml','"http://www.opengis.net/gml"']]
                          )
                    )[1]::text      as text)::text as gml,
               (XPATH('@id',            rooms))[1]::text::integer as room_number,
               (XPATH('capacity/text()',rooms))[1]::text::integer as guests_per_room
          FROM (select hotel.*, rooms.*
                  from public.hoteldata as x,
				       unnest(XPATH('/hotels/hotel',x.hotels::xml)) as hotel,
                       unnest(XPATH('rooms/room',
                                    hotel.*,
                                    ARRAY[ARRAY['gml','"http://www.opengis.net/gml"']]
                                   )
                             ) as rooms
                ) q
        ) as f
order by hotel_name, room_number;

hotel_name location                             room_number guests_per_room
La Mancha  SRID=4326;POINT(147.14751 -43.23731)         201               3
La Mancha  SRID=4326;POINT(147.14758 -43.23732)         202               5
Valparaíso SRID=4326;POINT(147.2975 -43.26729)          201               2
Valparaíso SRID=4326;POINT(147.29734 -43.26741)         202               2
Valparaíso SRID=4326;POINT(147.2976 -43.2674)           203               4

These points, when mapped, look like this:

I hope this article is of interest to someone out there!