Using SQL Server XML and Spatial to import VicMap seed/Incremental Update Format (IUF) data (Part 1)

Introduction

This article is about importing Victorian Government’s Incremental Update File (IUF) XML data into SQL Server.

The IUF XML is used to publish whole (complete) datasets (e.g. Local Government Areas) or deltas/changes.

As will be seen, it encodes geometry objects in its own unique way, and thus is a good dataset to demonstrate the use SQL Server Spatial and XML processing to import the data.

This article is one of three:

  • Part 1 introduces the structure of the IUF XML data and show how to process it to create polygons with exterior and interior rings, and attributes.
  • Part 2 investigates methods for pre-processing the XML and its effects of processing complexity and speed.
  • Part 3 describes a method for reducing XML processing complexity to radically speed processing.

Background

The Victorian Department of Sustainability and Environment releases its spatial data for free under a Creative Commons License.

The data it releases are grouped by themes:

  • Vicmap Address
  • Vicmap Admin
  • Vicmap Crown Land Tenure
  • Vicmap Elevation
  • Vicmap Features of Interest
  • Vicmap Hydro
  • Vicmap Imagery
  • Vicmap Lite
  • Vicmap Planning
  • Vicmap Position
  • Vicmap Property
  • Vicmap Topographic Mapping
  • Vicmap Transport
  • Vicmap Vegetation

And within each theme exists a number of layers.

Vicmap Admin (Boundaries and Names)

  • Local Government Areas (LGA)
  • Localities
  • Postcodes
  • State Electorates
  • Local Government Wards
  • Parishes and Townships
  • State Government Departments
  • CFA (Country Fire Authority) Regions, Districts and Total Fire Ban Districts
  • MFB (Melbourne Fire Brigade) Regions and Districts

Formats

All datasets other than VicMap Admin and Property are released in various binary formats (eg the venerable ESRI shapefile format);

The VicMap Admin and VicMap Property data are released in a proprietary XML format described by a Document Type Definition (DTD) and not an XML Schema.

There are two types of release types for these two themes:

  • Seed – A complete copy of the data is released at a point in time;
  • Incremental Update Format (IUF) – Contains deltas, or changes, the base (seed) data.

The seed and IUF data formats are the same except (controlled by same DTD) that the IUF format has INSERT, REPLACE and DELETE high level tags, with the seed only having an INSERT tag.

There are spatial software packages that support the VicMap DTD such as the ubiquitous and excellent Feature Manipulation Engine (FME).

But there are situations where one might not have FME but still need to be able to process and import the data into a database using standard IT tools.

Perhaps there is a requirement to process the XML in a Cloud environment such as Azure using only Python using as light a footprint as possible.

This article shows how to use a few IT tools, and SQL Server’s XML capabilities, to load a some layers from VicMap Admin theme.

Sample Data

The example document we will use for our examples is a cut-down version of a basic VicMap seed document (with fabricated) polygons.

To give an idea of how large a seed document can be, the all_seed_data_17-feb-2019.xml is 342 MB (358,757,517 bytes) in size!

Our cut down VicMap Admin seed document only contains the following layers out of the total that are possible (see above):

  • Local Government Area (LGA) boundaries and names
  • Locality boundaries and names
  • Postcode boundaries and numbers
  • Parish boundaries and names
  • Township boundaries and names

The tag hierarchy for each theme (with simple polygon coordinates) is demonstrated by the following example:

<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
   <DESCRIPTION>
      <AUTHORITY>LAND VICTORIA</AUTHORITY>
      <DATASET>VICMAP DIGITAL PROPERTY</DATASET>
      <EXPORT SCHEMA="VICTORIA" LGA="ALL" FROM_DATE="17-FEB-2019"
              TO_DATE="17-FEB-2019" FORMAT="LLD GDA94" DATE="15-FEB-2019" BY="SKM">
      </EXPORT>
   </DESCRIPTION>
   <INSERT>
      <POLYGON>
         <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="LGA_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
               ... Polygon Description
            </POLYGON_ITEM>
            ... More Polygons
         </POLYGON_NAME>

         <!--   SEED for POLYGON table LOCALITY_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="LOCALITY_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
               ... Polygon Description
            </POLYGON_ITEM>
            ... More Polygons
         </POLYGON_NAME>

         <!--   SEED for POLYGON table POSTCODE_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="POSTCODE_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
               ... Polygon Description
            </POLYGON_ITEM>
            ... More Polygons
         </POLYGON_NAME>

         <!--   SEED for POLYGON table PARISH_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="PARISH_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
               ... Polygon Description
            </POLYGON_ITEM>
            ... More Polygons
         </POLYGON_NAME>

         <!--   SEED for POLYGON table TOWNSHIP_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="TOWNSHIP_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
               ... Polygon Description
            </POLYGON_ITEM>
            ... More Polygons
         </POLYGON_NAME>
      </POLYGON>
   </INSERT>
</IUF>

The top level XML tag element IUF is used for both seed and delta documents.

This seed XML data document does not contain POINT or LINE elements as all the data is polygonal.

The absence of REPLACE or DELETE elements (INSERT tags only) indicates that the IUF docment is a seeddocument; their presence indicates that the document is incremental update document. If it did contain incremental data it might look like this:

<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<iuf>
   <description>
      <authority>LAND VICTORIA</authority>
      <dataset>VICMAP DIGITAL PROPERTY</dataset>
      <export schema="VICTORIA" lga="ALL" from_date="17-FEB-2019" to_date="17-FEB-2019" format="LLD GDA94" date="15-FEB-2019" by="SKM">
      </export>
   </description>
   <insert>
     <polygon>...</polygon>
   </insert>
   <replace>
     <polygon>...</polygon>
   </replace>
   <delete>
     <polygon>...</polygon>
   </delete>
</iuf>

1. Selecting Objects by Type Using XML Attributes

When processing a document in SQL we often process the data a single layer at a time. This enables the process to write to a dedicated table eg [dbo].[Locality].

This requires the filtering of the IUF document.

The filtering can be done using XPATH against the XML document, as the following example shows.

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
   <INSERT>
      <POLYGON>
         <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="LGA_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
                 <OUTER><XY>1,2</XY></OUTER>
            </POLYGON_ITEM>
         </POLYGON_NAME>
         <!--   SEED for POLYGON table LOCALITY_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="LOCALITY_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
                 <OUTER><XY>1,2</XY></OUTER>
            </POLYGON_ITEM>
         </POLYGON_NAME>
         <!--   SEED for POLYGON table PARISH_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="PARISH_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
                 <OUTER><XY>1,2</XY></OUTER>
            </POLYGON_ITEM>
         </POLYGON_NAME>
         <!--   SEED for POLYGON table POSTCODE_POLYGON commencing   -->
         <POLYGON_NAME>
            <POLYGON_ITEM>
               <HEADER TABLE="POSTCODE_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER>
                 <OUTER><XY>1,2</XY></OUTER>
            </POLYGON_ITEM>
         </POLYGON_NAME>
      </POLYGON>
   </INSERT>
</IUF>',2) as iuf
)
select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)') as table_name
  from data as d
       CROSS APPLY 
       d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c);
GO

table_name
----------------
LOCALITY_POLYGON

Filtering (selection) can also be done using a SQL WHERE clause predicate on the HEADER/@TABLE attribute as follows:

WITH data AS (
...
)
SELECT n.c.value('(HEADER/@TABLE)[1]','varchar(max)') as table_name
  FROM data as d
       CROSS APPLY
       d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM') n(c)
 WHERE n.c.value('(HEADER/@TABLE)[1]','varchar(max)') ='LOCALITY_POLYGON'
GO

table_name
----------------
LOCALITY_POLYGON

Processing Polygon Layer Spatial Elements

The following (fabricated) XML shows how the IUF encodes an admin polygon’s exterior and interior rings, and its attributes:

<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <INNER>
            <XY>148.04452460167477,-36.391801308813505</XY>
            <XY>148.04452460173229,-36.391881931347996</XY>
            <XY>148.04481426826769,-36.391881931347953</XY>
            <XY>148.04481426832533,-36.391801308813427</XY>
            <XY>148.04452460167477,-36.391801308813505</XY>
          </INNER>
          <OUTER>
            <XY>148.0445607,-36.3918597</XY>
            <XY>148.0446579,-36.3918318</XY>
            <XY>148.0445518,-36.3918129</XY>
            <XY>148.0445607,-36.3918597</XY>
            <XY>148.0446211,-36.3917394</XY>
          </OUTER>
          <INNER>
            <XY>148.0447562,-36.3917394</XY>
            <XY>148.0447562,-36.3916796</XY>
            <XY>148.04462,-36.391679</XY>
            <XY>148.04462,-36.3916778</XY>
            <XY>148.0446211,-36.3917394</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>

We’ll see what this locality polygon looks like soon.

Cases

There are a number of cases/situations for encoding a polygon that the XML parsing must handle:

  • A polygon with a single exterior/OUTER ring and no interior/INNER ring.
  • A polygon with a single exterior/OUTER ring and one or more interior/INNER rings.
  • A polygon composed of more than one exterior/OUTER rings and one or more interior/INNER rings: a MultiPolygon.

Case 1: A polygon with a single exterior/OUTER ring

SQL Server XML processing does not allow for the parsing of XML with internal subset DTDs. To handle this, CONVERT with style option 2 is used to enable limited internal subset DTD support.

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="TOWNSHIP_POLYGON" PFI="4608465" CREATED="20190208064929" UFI="568037229"></HEADER>
          <OUTER>
            <XY>141.95512062,-36.83868465</XY>
            <XY>141.96277778,-36.83945073</XY>
            <XY>141.96301054,-36.83947401</XY>
            <XY>141.96419852,-36.83959281</XY>
            <XY>141.95512062,-36.83868465</XY>
          </OUTER>
          <ATT>TOWNSHIP_CODE="5601"</ATT>
          <ATT>TOWNSHIP_NAME="NORADJUHA TP"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
select f.table_name, f.pfi, f.ufi, f.create_date,
       geography::STGeomFromText('POLYGON ((' + STRING_AGG(REPLACE(f.XY,',',' '),',') + '))',4283).STAsText() as polygon
  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)') as table_name,
               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)') as pfi,
               n.c.value('(HEADER/@UFI)[1]',    'varchar(100)') as ufi,
               n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date,
               o.x.value('(text())[1]','varchar(max)')          as XY
          from data as d
               CROSS APPLY 
               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
               CROSS APPLY
               n.c.nodes('OUTER/XY') o(x)
       ) as f
 group by f.table_name, f.pfi, f.ufi, f.create_date
GO

table_name       pfi       ufi       create_date    polygon
LOCALITY_POLYGON 205407825 468965778 20131024153045 POLYGON ((148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625, 148.04446887 -36.39192699))

This looks like:

Visual Representation of Polygon with a single exterior ring

Case 2: A polygon with a single exterior/OUTER ring, one interior/INNER ring.

Note that outer rings are in Counter-ClockWise order (CCW) order with inner rings being in Clockwise order.

The approach taken is to:

  • Construct WKT from the outer rings using the XY elements in the order they appear in the document.
  • Construct a WKT polygon for each inner ring but reversing the order of the XY ordinates to produce a valid exterior ring for each;
  • Use STSymDifference to create a polygon with the exterior and interior rings correctly processed to produce a valid polygon.
with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <INNER>
            <XY>148.04452460167477,-36.391801308813505</XY>
            <XY>148.04481426832533,-36.391801308813427</XY>
            <XY>148.04481426826769,-36.391881931347953</XY>
            <XY>148.04452460173229,-36.391881931347996</XY>
            <XY>148.04452460167477,-36.391801308813505</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
select a.table_name, a.pfi, a.ufi, a.create_date,
       /* Take into account that the polygon may not have any holes (INNER) */
       (case when b.interior_ring is not null
             then a.exterior_ring.STSymDifference(b.interior_ring) 
             else a.exterior_ring
        end).STAsText() as polygon
  from (select f.table_name, f.pfi, f.ufi, f.create_date,
               geography::STGeomFromText('POLYGON ((' + STRING_AGG(REPLACE(f.XY,',',' '),',') + '))',4283) as exterior_ring
          from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)') as table_name,
                       n.c.value('(HEADER/@PFI)[1]',    'varchar(100)') as pfi,
                       n.c.value('(HEADER/@UFI)[1]',    'varchar(100)') as ufi,
                       n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date,
                       o.x.value('(text())[1]','varchar(max)')          as XY
                  from data as d
                       CROSS APPLY 
                       d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                       CROSS APPLY
                       n.c.nodes('OUTER/XY') o(x)
               ) as f
        group by f.table_name, f.pfi, f.ufi, f.create_date
     ) as a
     LEFT OUTER JOIN /* In case no inners */
     (select f.table_name, f.pfi, f.ufi, f.create_date,
                geography::STGeomFromText('POLYGON ((' + (STRING_AGG(REPLACE(f.XY,',',' '),',') WITHIN GROUP (ORDER BY f.coord_position DESC)) + '))',4283) as interior_ring
        from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)') as table_name,
                     n.c.value('(HEADER/@PFI)[1]',    'varchar(100)') as pfi,
                     n.c.value('(HEADER/@UFI)[1]',    'varchar(100)') as ufi,
                     n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date,
                     i.x.value('(text())[1]',         'varchar(max)') as XY,
                     /* Generate sequence number For all XY pairs for current INNER in XML order */ 
                     i.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position
                from data as d
                     CROSS APPLY 
                     d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                     CROSS APPLY
                     n.c.nodes('INNER/XY') i(x)
            ) as f
      group by f.table_name, f.pfi, f.ufi, f.create_date
      ) as b
      ON (b.table_name = a.table_name 
      and b.pfi = a.pfi
      and b.ufi = a.ufi
      and b.create_date = a.create_date
      );

GO

table_name       pfi       ufi       create_date    polygon
LOCALITY_POLYGON 205407825 468965778 20131024153045 POLYGON ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505))

This looks like:

Visual Representation of a Polygon with a single Exterior Ring and a Single Interior Ring

Case 3. MultiPolygon composed of two exterior/OUTER rings, one with a single interior/INNER ring, the other with two INNER rings.

The seed/IUF XML does not provide any ordering of the OUTER/INNER/XY elements.

Because SQL does not guarantee order (fundamental property of relational theory), some method for sorting the OUTER/INNER/XY tags within a polygon is needed.

Why is this a problem?

Well, if the XY coordinate elements described thousands of vertices, it would be unsafe to assume that the query optimizer would maintain row order from one select to its parent.

One way this could have been done is through the original IUF DTD allowing the OUTER/INNER/XY tags to have an order attribute. For example:

<outer id="1">
  <xy id="1"">148.04446887,-36.39192699</xy>
  ...
</outer>
<inner id="1">
  <xy id="1"">148.02938287,-36.31287299</xy>
  ...
</inner>
<inner id="2">
  <xy id="1"">148.12097854 -36.1289439</xy>
  ...
</inner>

But this is not available to us.

Since we have only XML processing available to us, we need to find a way of generating surrogate identifiers.
One way is to use an XML VALUE method which maintains the order the OUTER/INNER/XY tags appear in the document.

This is done in the following XML processing. Take note of the XPATH that generates the following elements:

  • outer_position
  • inner_position
  • coord_position

(Another method for ensuring XY order is described in another article.)

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <INNER>
            <XY>148.04452460167477,-36.391801308813505</XY>
            <XY>148.04481426832533,-36.391801308813427</XY>
            <XY>148.04481426826769,-36.391881931347953</XY>
            <XY>148.04452460173229,-36.391881931347996</XY>
            <XY>148.04452460167477,-36.391801308813505</XY>
          </INNER>
          <OUTER>
            <XY>148.044619,-36.3917393</XY>
            <XY>148.0447565,-36.3917393</XY>
            <XY>148.0447565,-36.391678</XY>
            <XY>148.044619,-36.391678</XY>
            <XY>148.044619,-36.3917393</XY>
          </OUTER>
          <INNER>
            <XY>148.04463,-36.39169</XY>
            <XY>148.04475,-36.39169</XY>
            <XY>148.04475,-36.39173</XY> 
            <XY>148.04463,-36.39169</XY>
          </INNER>
          <INNER>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
            <XY>148.0446664800863914 -36.39173171778336524</XY>
            <XY>148.04463186538401942 -36.39173115494267563</XY>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
select /* Remove INNER polygons from OUTER polygons */
       a.table_name, a.pfi, a.ufi, a.create_date,
       /* Take into account that the polygon may not have any holes (INNER) */
       (case when b.interior_ring is not null
             then a.exterior_ring.STSymDifference(b.interior_ring) 
             else a.exterior_ring
         end).STAsText() as polygon
  from (select /* Aggregate all individual OUTER polygons within table/polygon_item */
               g.table_name, g.pfi, g.ufi, g.create_date,
               geography::CollectionAggregate(g.exterior_ring) as exterior_ring
          from (/* Generate OUTER polygons for each individual geometry within polygon_name */
                select f.table_name, f.pfi, f.ufi, f.create_date,
                       geography::STGeomFromText('POLYGON ((' +  
                                      (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position ASC) ) + 
                                     '))',
                                     4283
                       ) as exterior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                               n.c.value('(HEADER/@UFI)[1]',    'varchar(100)')                as ufi,
                               n.c.value('(HEADER/@CREATED)[1]','varchar(100)')                as create_date,
                               o.x.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                               /* Generate sequence number For all INNERs in current polygon_item */ 
                               o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as outer_position,
                               /* Generate sequence number For all XY pairs for current INNER in XML order */ 
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as coord_XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                               CROSS APPLY
                               n.c.nodes('OUTER') o(x)  /* Extract OUTER node to enable counting */
                               CROSS APPLY
                               o.x.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                       ) as f
                 group by f.table_name, f.pfi, f.ufi, f.create_date, f.outer_position
                ) as g
          group by g.table_name, g.pfi, g.ufi, g.create_date
       ) as a
       LEFT OUTER JOIN /* In case no inners */
       (select /* Aggregate all individual OUTER polygons within table/polygon_item */
               g.table_name, g.pfi, g.ufi, g.create_date,
               geography::CollectionAggregate(g.interior_ring) as interior_ring
          from (/* Generate INNER rings as polygons for each individual geometry within polygon_name.
                   To create valid polygons we need to revese the XY order for each INNER ring */
                select f.table_name, f.pfi, f.ufi, f.create_date,
                       geography::STGeomFromText('POLYGON ((' +  
                                      (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position DESC) ) + 
                                  '))',
                                  4283
                       ) as interior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                               n.c.value('(HEADER/@UFI)[1]',    'varchar(100)')                as ufi,
                               n.c.value('(HEADER/@CREATED)[1]','varchar(100)')                as create_date,
                               o.x.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                               /* Generate sequence number For all INNERs in current polygon_item */ 
                               o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as inner_position,
                               /* Generate sequence number For all XY pairs for current INNER in XML order */ 
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as coord_XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                               CROSS APPLY
                               n.c.nodes('INNER') o(x)  /* Extract INNER node to enable counting */
                               CROSS APPLY
                               o.x.nodes('XY') x(y)     /* Extract XY coordinates for current INNER */
                       ) as f
                 group by f.table_name, f.pfi, f.ufi, f.create_date, f.inner_position
               ) as g
          group by g.table_name, g.pfi, g.ufi, g.create_date
      ) as b
      ON (b.table_name = a.table_name 
      and b.pfi = a.pfi
      and b.ufi = a.ufi
      and b.create_date = a.create_date
      );
GO

table_name       pfi       ufi       create_date    polygon
LOCALITY_POLYGON 205407825 468965778 20131024153045 MULTIPOLYGON (((148.044619 -36.391678, 148.044619 -36.3917393, 148.0447565 -36.3917393, 148.0447565 -36.391678, 148.044619 -36.391678), (148.04463073970265 -36.391711455518568, 148.04466648008639 -36.391731717783365, 148.04463186538402 -36.391731154942676, 148.04463073970265 -36.391711455518568), (148.04463 -36.39169, 148.04475 -36.39169, 148.04475 -36.39173, 148.04463 -36.39169)), ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505)))

This looks like:

MultiPolygon composed of two exterior/OUTER rings, one with a single interior/INNER ring, the other two INNER rings.

This SQL and XML processing will handle all possible (multi)polygon exterior/interior ring combinations.

Less is More?

We can reduce the use of SQL Server Spatial geometry constructors and functions by a fuller generation of the Well Known Text (WKT) for a polygon rings before constructing the geometry.

One might want to do this if the XML is processed externally to the database with the geography objects being created by a simple INSERT statement.

The following is a version of the previous where SQL Server Spatial constructors and function calls are only used in the last SELECT.

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LOCALITY_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <INNER>
            <XY>148.04452460167477,-36.391801308813505</XY>
            <XY>148.04481426832533,-36.391801308813427</XY>
            <XY>148.04481426826769,-36.391881931347953</XY>
            <XY>148.04452460173229,-36.391881931347996</XY>
            <XY>148.04452460167477,-36.391801308813505</XY>
          </INNER>
          <OUTER>
            <XY>148.044619,-36.3917393</XY>
            <XY>148.0447565,-36.3917393</XY>
            <XY>148.0447565,-36.391678</XY>
            <XY>148.044619,-36.391678</XY>
            <XY>148.044619,-36.3917393</XY>
          </OUTER>
          <INNER>
            <XY>148.04463,-36.39169</XY>
            <XY>148.04475,-36.39169</XY>
            <XY>148.04475,-36.39173</XY> 
            <XY>148.04463,-36.39169</XY>
          </INNER>
          <INNER>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
             <XY>148.0446664800863914 -36.39173171778336524</XY>
            <XY>148.04463186538401942 -36.39173115494267563</XY>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205423232" CREATED="20191024153045" UFI="461413148"></HEADER>
          <OUTER>
            <XY>148.0444519, -36.3917393</XY>
            <XY>148.044619,  -36.3917393</XY>
            <XY>148.044619,  -36.391678</XY>
            <XY>148.0444519, -36.391678</XY>
            <XY>148.0444519, -36.3917393</XY>
          </OUTER>
          <INNER>
            <XY>148.044453,-36.39169</XY>
            <XY>148.04459,-36.39169</XY>
            <XY>148.04459,-36.39173</XY> 
            <XY>148.044453,-36.39169</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOORARRA WEST"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOORARRA WEST"</ATT>
          <ATT>VICNAMES_ID="103414"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
select /* Remove INNER polygons from OUTER polygons */
       a.table_name, a.pfi, a.ufi, a.create_date,
       /* Take into account that the polygon may not have any holes (INNER) */
       (case when b.interior_rings is not null
            then geography::STGeomFromText(
                     case when CHARINDEX(',',a.exterior_rings) <> 0 
                          then 'MULTIPOLYGON(' + a.exterior_rings + ')'
                          else 'POLYGON' + a.exterior_rings 
                      end,
                     4283
                 ).STSymDifference(
                   geography::STGeomFromText(
                     case when CHARINDEX(',',b.interior_rings) <> 0 
                          then 'MULTIPOLYGON(' + b.interior_rings + ')'
                          else 'POLYGON' + b.interior_rings 
                      end,
                     4283
                   )
                 ) 
            else geography::STGeomFromText(
                     case when CHARINDEX(',',a.exterior_rings) <> 0 
                          then 'MULTIPOLYGON(' + a.exterior_rings + ')'
                          else 'POLYGON' + a.exterior_rings 
                       end,
                     4283
                 )
        end).STAsText() as polygon
  from (select /* Aggregate all individual OUTER polygons within table/polygon_item */
               g.table_name, g.pfi, g.ufi, g.create_date,
               '('+ STRING_AGG(g.exterior_ring,'),(') + ')' as exterior_rings
          from (/* Generate OUTER polygons for each individual geometry within polygon_name */
                select f.table_name, f.pfi, f.ufi, f.create_date,
                       '(' +  (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position ASC)) + ')'
                        as exterior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                               n.c.value('(HEADER/@UFI)[1]',    'varchar(100)')                as ufi,
                               n.c.value('(HEADER/@CREATED)[1]','varchar(100)')                as create_date,
                               o.x.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                               /* Generate sequence number For all INNERs in current polygon_item */ 
                               o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as outer_position,
                               /* Generate sequence number For all XY pairs for current INNER in XML order */ 
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as coord_XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                               CROSS APPLY
                               n.c.nodes('OUTER') o(x)  /* Extract OUTER node to enable counting */
                               CROSS APPLY
                               o.x.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                       ) as f
                 group by f.table_name, f.pfi, f.ufi, f.create_date, f.outer_position
                ) as g
          group by g.table_name, g.pfi, g.ufi, g.create_date
       ) as a
       LEFT OUTER JOIN /* In case no inners */
       (select /* Aggregate all individual OUTER polygons within table/polygon_item */
               g.table_name, g.pfi, '('+ STRING_AGG(g.interior_ring,'),(') + ')' as interior_rings
          from (/* Generate INNER polygons for each individual geometry within polygon_name */
                select f.table_name, f.pfi,
                       '(' +  (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position DESC)) + ')' as interior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                  as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                  as pfi,
                               o.x.value('(local-name(.))',     'varchar(100)')                  as coord_tag,
                               /* Generate sequence number For all INNERs in current polygon_item */ 
                               o.x.value('for $i in . return count(../INNER/*[. << $i]) + 1', 'int') as inner_position,
                               n.c.value('count(INNER)', 'int')                                      as inner_ring_count,
                               /* Generate sequence number For all XY pairs for current INNER in XML order */ 
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int')   as coord_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')         as coord_XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM') n(c)
                               CROSS APPLY
                               n.c.nodes('INNER') o(x)  /* Extract INNER node to enable counting */
                               CROSS APPLY
                               o.x.nodes('XY') x(y)     /* Extract XY coordinates for current INNER */
                       ) as f
                 group by f.table_name, f.pfi, f.inner_position
               ) as g
          group by g.table_name, g.pfi
      ) as b
      ON (b.table_name = a.table_name 
      and b.pfi = a.pfi
      )
GO

table_name       pfi       ufi       create_date    polygon
LOCALITY_POLYGON 205407825 468965778 20131024153045 MULTIPOLYGON (((148.044619 -36.391678, 148.044619 -36.3917393, 148.0447565 -36.3917393, 148.0447565 -36.391678, 148.044619 -36.391678), (148.04463073970265 -36.391711455518568, 148.04466648008639 -36.391731717783365, 148.04463186538402 -36.391731154942676, 148.04463073970265 -36.391711455518568), (148.04463 -36.39169, 148.04475 -36.39169, 148.04475 -36.39173, 148.04463 -36.39169)), ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505)))
LOCALITY_POLYGON 205423232 461413148 20191024153045 POLYGON ((148.0444519 -36.391678, 148.0444519 -36.3917393, 148.044619 -36.3917393, 148.044619 -36.391678, 148.0444519 -36.391678), (148.044453 -36.39169, 148.04459 -36.39169, 148.04459 -36.39173, 148.044453 -36.39169))

This looks like:

Less is More.

Attribute Extraction

Extracting attributes involves processing the tags.

Each attribute is encoded as a name/value pair.

The attributes and their values are extracted, in the following example, for a known layer type.

Extraction uses the SQL Server PIVOT operator.

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LGA_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,   -36.39192699</XY>
            <XY>148.04487,   -36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
SELECT [table_name],[pfi],[ufi],[create_date], 
       [LOCALITY_NAME], [GAZETTED_LOCALITY_NAME], [VICNAMES_ID], [PFI_CREATED]
  FROM (select [table_name], [pfi],[ufi],[create_date], [attribute_name], [attribute_value]
          from (select n.c.value('(../HEADER/@TABLE)[1]',  'varchar(max)') as table_name,
                       n.c.value('(../HEADER/@PFI)[1]',    'varchar(100)') as pfi,
                       n.c.value('(../HEADER/@UFI)[1]',    'varchar(100)') as ufi,
                       n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date,
                       SUBSTRING(n.c.value('(text())[1]',  'varchar(max)'),
                                 1,
                                 CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1
                                 ) as attribute_name,
                       REPLACE(
                         SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'),
                                   CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1,
                                   1000
                                  ),
                         '"','') as attribute_value
                  from data as d
                       CROSS APPLY 
                       d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c)
               ) as f
       ) AS T
  PIVOT
  (
      max(attribute_value)
      FOR attribute_name in ( LOCALITY_NAME, GAZETTED_LOCALITY_NAME, VICNAMES_ID, PFI_CREATED )
) AS V
GO

table_name       pfi       ufi       create_date    LOCALITY_NAME GAZETTED_LOCALITY_NAME VICNAMES_ID PFI_CREATED
LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD     WOODFIELD              103388      20050927080612

Merge Spatial and Attributes

The following SQL/XML processing brings all the elements together to create a representation of a POLYGON_ITEM that includes is spatial and attribute data.

It also introduces a simplified way of creating the final polygon from the inner/outer elements.

Instead of selecting the final polygon, the SQL validates the generated polygon using the STIsValid() geography method.

with data as (
select CONVERT(xml, 
'<?xml version="1.0"?>
<!DOCTYPE IUF SYSTEM "iuf_full.dtd">
<IUF>
  <INSERT>
    <POLYGON>
      <!--   SEED for POLYGON table LOCALITY_POLYGON commencing   -->
      <POLYGON_NAME>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER>
          <OUTER>
            <XY>148.04446887,-36.39192699</XY>
            <XY>148.04487,-36.39192699</XY>
            <XY>148.04487,-36.39175625</XY>
            <XY>148.04446887,-36.39175625</XY>
            <XY>148.04446887,-36.39192699</XY>
          </OUTER>
          <INNER>
            <XY>148.04452460167477,-36.391801308813505</XY>
            <XY>148.04481426832533,-36.391801308813427</XY>
            <XY>148.04481426826769,-36.391881931347953</XY>
            <XY>148.04452460173229,-36.391881931347996</XY>
            <XY>148.04452460167477,-36.391801308813505</XY>
          </INNER>
          <OUTER>
            <XY>148.044619,-36.3917393</XY>
            <XY>148.0447565,-36.3917393</XY>
            <XY>148.0447565,-36.391678</XY>
            <XY>148.044619,-36.391678</XY>
            <XY>148.044619,-36.3917393</XY>
          </OUTER>
          <INNER>
            <XY>148.04463,-36.39169</XY>
            <XY>148.04475,-36.39169</XY>
            <XY>148.04475,-36.39173</XY> 
            <XY>148.04463,-36.39169</XY>
          </INNER>
          <INNER>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
            <XY>148.0446664800863914 -36.39173171778336524</XY>
            <XY>148.04463186538401942 -36.39173115494267563</XY>
            <XY>148.04463073970265441 -36.39171145551856767</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT>
          <ATT>VICNAMES_ID="103388"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
        <POLYGON_ITEM>
          <HEADER TABLE="LOCALITY_POLYGON" PFI="205423232" CREATED="20191024153045" UFI="461413148"></HEADER>
          <OUTER>
            <XY>148.0444519, -36.3917393</XY>
            <XY>148.044619,  -36.3917393</XY>
            <XY>148.044619,  -36.391678</XY>
            <XY>148.0444519, -36.391678</XY>
            <XY>148.0444519, -36.3917393</XY>
          </OUTER>
          <INNER>
            <XY>148.044453,-36.39169</XY>
            <XY>148.04459,-36.39169</XY>
            <XY>148.04459,-36.39173</XY> 
            <XY>148.044453,-36.39169</XY>
          </INNER>
          <ATT>LOCALITY_NAME="WOORARRA WEST"</ATT>
          <ATT>GAZETTED_LOCALITY_NAME="WOORARRA WEST"</ATT>
          <ATT>VICNAMES_ID="103414"</ATT>
          <ATT>PFI_CREATED="20050927080612"</ATT>
        </POLYGON_ITEM>
      </POLYGON_NAME>
    </POLYGON>
  </INSERT>
</IUF>',2) as iuf
)
select a.table_name, a.pfi, a.ufi, a.create_date, c.LOCALITY_NAME, c.GAZETTED_LOCALITY_NAME, c.VICNAMES_ID, c.PFI_CREATED,
       geography::STGeomFromText(a.exterior_rings,4326)
           .STSymDifference( 
              geography::STGeomFromText(COALESCE(b.interior_rings,'POLYGON EMPTY'),4326)
       ).STIsValid() as isValidPolygon
  from (select g.table_name, g.pfi, g.ufi, g.create_date, 
              case when count(*) = 1 
                   then 'POLYGON('      + STRING_AGG(    g.exterior_ring,',') + ')'
                   else 'MULTIPOLYGON(' + STRING_AGG('('+g.exterior_ring+')',',') + ')'
               end as exterior_rings
        from (select f.table_name, f.pfi, f.ufi, f.create_date,  '(' +  (STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position ASC)) + ')' as exterior_ring
               from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                            n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                            n.c.value('(HEADER/@UFI)[1]',    'varchar(100)')                as ufi,
                            n.c.value('(HEADER/@CREATED)[1]','varchar(100)')                as create_date,
                            o.i.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                            o.i.value('for $i in . return count(../*[. << $i]) + 1','int')  as ring_position,
                            x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position,
                            REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as XY
                       from data as d
                            CROSS APPLY 
                            d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                            CROSS APPLY
                            n.c.nodes('OUTER') o(i)  /* Extract OUTER node to enable counting */
                            CROSS APPLY
                            o.i.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                    ) as f
              group by f.table_name, f.pfi, f.ufi, f.create_date, f.ring_position
            ) as g
        group by g.table_name, g.pfi, g.ufi, g.create_date
      ) as a
      left outer join
      (select g.table_name, g.pfi, 
              case when count(*) = 1 
                   then 'POLYGON('      + STRING_AGG(g.interior_ring,',') + ')'
                   else 'MULTIPOLYGON(' + STRING_AGG('('+g.interior_ring+')',',') + ')'
               end as interior_rings
         from (select f.table_name, f.pfi, f.coord_tag, f.ring_position, '('+(STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position DESC))+')' as interior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                               o.i.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                               o.i.value('for $i in . return count(../*[. << $i]) + 1','int')  as ring_position,
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                               CROSS APPLY
                               n.c.nodes('INNER') o(i)  /* Extract OUTER node to enable counting */
                               CROSS APPLY
                               o.i.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                        ) as f
                        group by f.table_name, f.pfi, f.coord_tag, f.ring_position
                ) as g
             group by g.table_name, g.pfi
      ) as b
      on (b.table_name = a.table_name and b.pfi = a.pfi)
      LEFT OUTER JOIN /* Attributes */
       (SELECT pv.pfi,pv.ufi,pv.LOCALITY_NAME,pv.GAZETTED_LOCALITY_NAME,pv.VICNAMES_ID,pv.PFI_CREATED
          FROM (select f.pfi,f.ufi,f.attribute_name,f.attribute_value
                  from (select n.c.value('(../HEADER/@PFI)[1]',  'varchar(100)') as pfi,
                               n.c.value('(../HEADER/@UFI)[1]','varchar(100)') as ufi,
                               n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date,
                               SUBSTRING(n.c.value('(text())[1]',  'varchar(max)'),
                                        1,
                                        CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1
                                        ) as attribute_name,
                              REPLACE(
                                SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'),
                                          CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1,
                                          1000
                                         ),
                                '"','') as attribute_value
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c)
                       ) as f
               ) AS T
         PIVOT (
             MAX(attribute_value)
             FOR attribute_name in ( 
                 LOCALITY_NAME, 
                 GAZETTED_LOCALITY_NAME, 
                 VICNAMES_ID, 
                 PFI_CREATED 
             )
         ) as pv
       ) as c
       ON (c.pfi = a.pfi and c.ufi=a.ufi)
GO

table_name       pfi       ufi       create_date    LOCALITY_NAME GAZETTED_LOCALITY_NAME VICNAMES_ID PFI_CREATED    isValidPolygon
LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD     WOODFIELD              103388      20050927080612 1
LOCALITY_POLYGON 205423232 461413148 20191024153045 WOORARRA WEST WOORARRA WEST          103414      20050927080612 1

Reading from a File

Part 2 of this series will involve processing all of an IUF seed document containing all the LGA, Locality and Postcode boundaries for Victoria.

But until then the following example shows how to load and process an IUF XML document from a file.

Firstly we take the IUF document from the CTE (WITH) clause in the last example, and save it to a file called admin_seed_example.xml.

The we can use the following modified query which directly accesses this document using the OPENROWSET function.

The SQL is slightly different fro, the last as it returns polygon area via STArea().

with data as (
SELECT CONVERT(xml, BulkColumn, 2) as iuf
  FROM OPENROWSET(BULK N'C:\Users\Simon\Downloads\admin_seed_example.xml', 
                  CODEPAGE ='RAW',
                  SINGLE_CLOB) AS Document
)
select a.table_name, a.pfi, a.ufi, a.create_date, c.LOCALITY_NAME, c.GAZETTED_LOCALITY_NAME, c.VICNAMES_ID, c.PFI_CREATED,
       geography::STGeomFromText(a.exterior_rings,4326)
           .STSymDifference( 
              geography::STGeomFromText(COALESCE(b.interior_rings,'POLYGON EMPTY'),4326)
       ).STArea() as polygon_area
 from (select g.table_name, g.pfi, g.ufi, g.create_date, 
              case when count(*) = 1 
                   then 'POLYGON('      + STRING_AGG(    g.exterior_ring,',') + ')'
                   else 'MULTIPOLYGON(' + STRING_AGG('('+g.exterior_ring+')',',') + ')'
               end as exterior_rings
        from (select f.table_name, f.pfi, f.ufi, f.create_date,  '(' +  (STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position ASC)) + ')' as exterior_ring
               from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                            n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                            n.c.value('(HEADER/@UFI)[1]',    'varchar(100)')                as ufi,
                            n.c.value('(HEADER/@CREATED)[1]','varchar(100)')                as create_date,
                            o.i.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                            o.i.value('for $i in . return count(../*[. << $i]) + 1','int')  as ring_position,
                            x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position,
                            REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as XY
                       from data as d
                            CROSS APPLY 
                            d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                            CROSS APPLY
                            n.c.nodes('OUTER') o(i)  /* Extract OUTER node to enable counting */
                            CROSS APPLY
                            o.i.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                    ) as f
              group by f.table_name, f.pfi, f.ufi, f.create_date, f.ring_position
            ) as g
        group by g.table_name, g.pfi, g.ufi, g.create_date
      ) as a
      left outer join
      (select g.table_name, g.pfi, 
              case when count(*) = 1 
                   then 'POLYGON('      + STRING_AGG(g.interior_ring,',') + ')'
                   else 'MULTIPOLYGON(' + STRING_AGG('('+g.interior_ring+')',',') + ')'
               end as interior_rings
         from (select f.table_name, f.pfi, f.coord_tag, f.ring_position, '('+(STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position DESC))+')' as interior_ring
                  from (select n.c.value('(HEADER/@TABLE)[1]',  'varchar(max)')                as table_name,
                               n.c.value('(HEADER/@PFI)[1]',    'varchar(100)')                as pfi,
                               o.i.value('(local-name(.))',     'varchar(100)')                as coord_tag,
                               o.i.value('for $i in . return count(../*[. << $i]) + 1','int')  as ring_position,
                               x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position,
                               REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ')       as XY
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c)
                               CROSS APPLY
                               n.c.nodes('INNER') o(i)  /* Extract OUTER node to enable counting */
                               CROSS APPLY
                               o.i.nodes('XY') x(y)     /* Extract XY coordinates for current OUTER */
                        ) as f
                        group by f.table_name, f.pfi, f.coord_tag, f.ring_position
                ) as g
             group by g.table_name, g.pfi
      ) as b
      on (b.table_name = a.table_name and b.pfi = a.pfi)
      LEFT OUTER JOIN /* Attributes */
       (SELECT pv.pfi,pv.ufi,pv.LOCALITY_NAME,pv.GAZETTED_LOCALITY_NAME,pv.VICNAMES_ID,pv.PFI_CREATED
          FROM (select f.pfi,f.ufi,f.attribute_name,f.attribute_value
                  from (select n.c.value('(../HEADER/@PFI)[1]',  'varchar(100)') as pfi,
                               n.c.value('(../HEADER/@UFI)[1]','varchar(100)') as ufi,
                               n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date,
                               SUBSTRING(n.c.value('(text())[1]',  'varchar(max)'),
                                        1,
                                        CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1
                                        ) as attribute_name,
                              REPLACE(
                                SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'),
                                          CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1,
                                          1000
                                         ),
                                '"','') as attribute_value
                          from data as d
                               CROSS APPLY 
                               d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c)
                       ) as f
               ) AS T
         PIVOT (
             MAX(attribute_value)
             FOR attribute_name in ( 
                 LOCALITY_NAME, 
                 GAZETTED_LOCALITY_NAME, 
                 VICNAMES_ID, 
                 PFI_CREATED 
             )
         ) as pv
       ) as c
       ON (c.pfi = a.pfi and c.ufi=a.ufi)
GO

table_name       pfi       ufi       create_date    locality_name gazetted_locality_name vicnames_id pfi_created    polygon_area
LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD     WOODFIELD              103388      20050927080612 505.967401109636
LOCALITY_POLYGON 205423232 461413148 20191024153045 WOORARRA WEST WOORARRA WEST          103414      20050927080612 74.6977753117681

Conclusion

The SQL and XML processing in this article by no means covers all aspects of the VicMap IUF. As already indicated, this article does not cover the following XML objects:

  • LINE/LINE_NAME/LINE_ITEM
  • POINT/POINT_NAME/POINT_ITEM
  • ASPATIAL
  • CONNECT

(LINE and POLYGON geography objects are much easier to handle than polygons and rings.)

Neither does this article describe how to handle IUF actions other than INSERT:

  • REPLACE
  • DELETE

The part two article will concentrate on how to process a large IUF document. It will introduce and explain some pre-processing techniques which help reduce file size, complexity and improve the SQL processing of the XML.

Part three will show how to use the OPENXML SQL Server operator to process an IUF document.

In these articles on XML processing of documents containing spatial elements, nothing pejorative should be inferred about the lack of use of, or reference to, mainstream geospatial technology such as FME, qGIS, og2ogr, Java Topology Suite (JTS) etc. Rather, these articles try to embrace existing IT technology to show how they can be used, on their own to deliver spatial processing within their more traditional IT areas of use. It may also be that such IT-tool-centric solution may be best for an existing IT project, where introducing heavy components could complicate the solution.

Don’t get me wrong, I love geospatial technology as much as anyone!

However, vigilance is need in solution designs that mainly involve IT tools and technologies so that we do not make the following mistake:

To a person with a hammer, everything is a nail.

Comment below or contact me directly.