Top 5 Recent Articles
ARTICLES CATEGORIES
- Algorithms (22)
- All (399)
- Biography (1)
- Blog (44)
- Business Requirements (1)
- Commentary (1)
- Conversion (2)
- Customers (2)
- Data Models (1)
- Education (2)
- GeoRaptor (13)
- GPS (1)
- Image Processing (2)
- Import Export (8)
- Licensing (2)
- LiDAR (1)
- Linear Referencing (4)
- Manifold GIS (3)
- Mapping (1)
- MySQL Spatial (7)
- Networking and Routing (including Optimization) (5)
- Open Source (18)
- Oracle Spatial and Locator (194)
- Partitioning (1)
- PostGIS (36)
- Projections (1)
- Published Articles (1)
- qGIS (1)
- Recommendations (1)
- Services (1)
- Software Change Log (1)
- Source Code (37)
- Space Curves (9)
- Spatial Database Functions (109)
- Spatial DB comparison (1)
- Spatial XML Processing (11)
- SQL Server Spatial (92)
- Standards (3)
- Stored Procedure (17)
- Tessellation or Gridding (10)
- Tools (2)
- Topological Relationships (1)
- Training (2)
- Triangulation (2)
Using SQL Server XML and Spatial to import VicMap seed/Incremental Update Format (IUF) data (Part 1)
Introduction
This article is about importing Victorian Government’s Incremental Update File (IUF) XML data into SQL Server.
The IUF XML is used to publish whole (complete) datasets (e.g. Local Government Areas) or deltas/changes.
As will be seen, it encodes geometry objects in its own unique way, and thus is a good dataset to demonstrate the use SQL Server Spatial and XML processing to import the data.
This article is one of three:
- Part 1 introduces the structure of the IUF XML data and show how to process it to create polygons with exterior and interior rings, and attributes.
- Part 2 investigates methods for pre-processing the XML and its effects of processing complexity and speed.
- Part 3 describes a method for reducing XML processing complexity to radically speed processing.
Background
The Victorian Department of Sustainability and Environment releases its spatial data for free under a Creative Commons License.
The data it releases are grouped by themes:
- Vicmap Address
- Vicmap Admin
- Vicmap Crown Land Tenure
- Vicmap Elevation
- Vicmap Features of Interest
- Vicmap Hydro
- Vicmap Imagery
- Vicmap Lite
- Vicmap Planning
- Vicmap Position
- Vicmap Property
- Vicmap Topographic Mapping
- Vicmap Transport
- Vicmap Vegetation
And within each theme exists a number of layers.
Vicmap Admin (Boundaries and Names)
- Local Government Areas (LGA)
- Localities
- Postcodes
- State Electorates
- Local Government Wards
- Parishes and Townships
- State Government Departments
- CFA (Country Fire Authority) Regions, Districts and Total Fire Ban Districts
- MFB (Melbourne Fire Brigade) Regions and Districts
Formats
All datasets other than VicMap Admin and Property are released in various binary formats (eg the venerable ESRI shapefile format);
The VicMap Admin and VicMap Property data are released in a proprietary XML format described by a Document Type Definition (DTD) and not an XML Schema.
There are two types of release types for these two themes:
- Seed – A complete copy of the data is released at a point in time;
- Incremental Update Format (IUF) – Contains deltas, or changes, the base (seed) data.
The seed and IUF data formats are the same except (controlled by same DTD) that the IUF format has INSERT, REPLACE and DELETE high level tags, with the seed only having an INSERT tag.
There are spatial software packages that support the VicMap DTD such as the ubiquitous and excellent Feature Manipulation Engine (FME).
But there are situations where one might not have FME but still need to be able to process and import the data into a database using standard IT tools.
Perhaps there is a requirement to process the XML in a Cloud environment such as Azure using only Python using as light a footprint as possible.
This article shows how to use a few IT tools, and SQL Server’s XML capabilities, to load a some layers from VicMap Admin theme.
Sample Data
The example document we will use for our examples is a cut-down version of a basic VicMap seed document (with fabricated) polygons.
To give an idea of how large a seed document can be, the all_seed_data_17-feb-2019.xml is 342 MB (358,757,517 bytes) in size!
Our cut down VicMap Admin seed document only contains the following layers out of the total that are possible (see above):
- Local Government Area (LGA) boundaries and names
- Locality boundaries and names
- Postcode boundaries and numbers
- Parish boundaries and names
- Township boundaries and names
The tag hierarchy for each theme (with simple polygon coordinates) is demonstrated by the following example:
<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <DESCRIPTION> <AUTHORITY>LAND VICTORIA</AUTHORITY> <DATASET>VICMAP DIGITAL PROPERTY</DATASET> <EXPORT SCHEMA="VICTORIA" LGA="ALL" FROM_DATE="17-FEB-2019" TO_DATE="17-FEB-2019" FORMAT="LLD GDA94" DATE="15-FEB-2019" BY="SKM"> </EXPORT> </DESCRIPTION> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LGA_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> ... Polygon Description </POLYGON_ITEM> ... More Polygons </POLYGON_NAME> <!-- SEED for POLYGON table LOCALITY_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> ... Polygon Description </POLYGON_ITEM> ... More Polygons </POLYGON_NAME> <!-- SEED for POLYGON table POSTCODE_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="POSTCODE_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> ... Polygon Description </POLYGON_ITEM> ... More Polygons </POLYGON_NAME> <!-- SEED for POLYGON table PARISH_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="PARISH_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> ... Polygon Description </POLYGON_ITEM> ... More Polygons </POLYGON_NAME> <!-- SEED for POLYGON table TOWNSHIP_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="TOWNSHIP_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> ... Polygon Description </POLYGON_ITEM> ... More Polygons </POLYGON_NAME> </POLYGON> </INSERT> </IUF>
The top level XML tag element IUF is used for both seed and delta documents.
This seed XML data document does not contain POINT or LINE elements as all the data is polygonal.
The absence of REPLACE or DELETE elements (INSERT tags only) indicates that the IUF docment is a seeddocument; their presence indicates that the document is incremental update document. If it did contain incremental data it might look like this:
<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <iuf> <description> <authority>LAND VICTORIA</authority> <dataset>VICMAP DIGITAL PROPERTY</dataset> <export schema="VICTORIA" lga="ALL" from_date="17-FEB-2019" to_date="17-FEB-2019" format="LLD GDA94" date="15-FEB-2019" by="SKM"> </export> </description> <insert> <polygon>...</polygon> </insert> <replace> <polygon>...</polygon> </replace> <delete> <polygon>...</polygon> </delete> </iuf>
1. Selecting Objects by Type Using XML Attributes
When processing a document in SQL we often process the data a single layer at a time. This enables the process to write to a dedicated table eg [dbo].[Locality].
This requires the filtering of the IUF document.
The filtering can be done using XPATH against the XML document, as the following example shows.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LGA_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> <OUTER><XY>1,2</XY></OUTER> </POLYGON_ITEM> </POLYGON_NAME> <!-- SEED for POLYGON table LOCALITY_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> <OUTER><XY>1,2</XY></OUTER> </POLYGON_ITEM> </POLYGON_NAME> <!-- SEED for POLYGON table PARISH_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="PARISH_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> <OUTER><XY>1,2</XY></OUTER> </POLYGON_ITEM> </POLYGON_NAME> <!-- SEED for POLYGON table POSTCODE_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="POSTCODE_POLYGON" PFI="367" CREATED="20131024152212" UFI="468592502"></HEADER> <OUTER><XY>1,2</XY></OUTER> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c); GO table_name ---------------- LOCALITY_POLYGON
Filtering (selection) can also be done using a SQL WHERE clause predicate on the HEADER/@TABLE attribute as follows:
WITH data AS ( ... ) SELECT n.c.value('(HEADER/@TABLE)[1]','varchar(max)') as table_name FROM data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM') n(c) WHERE n.c.value('(HEADER/@TABLE)[1]','varchar(max)') ='LOCALITY_POLYGON' GO table_name ---------------- LOCALITY_POLYGON
Processing Polygon Layer Spatial Elements
The following (fabricated) XML shows how the IUF encodes an admin polygon’s exterior and interior rings, and its attributes:
<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <INNER> <XY>148.04452460167477,-36.391801308813505</XY> <XY>148.04452460173229,-36.391881931347996</XY> <XY>148.04481426826769,-36.391881931347953</XY> <XY>148.04481426832533,-36.391801308813427</XY> <XY>148.04452460167477,-36.391801308813505</XY> </INNER> <OUTER> <XY>148.0445607,-36.3918597</XY> <XY>148.0446579,-36.3918318</XY> <XY>148.0445518,-36.3918129</XY> <XY>148.0445607,-36.3918597</XY> <XY>148.0446211,-36.3917394</XY> </OUTER> <INNER> <XY>148.0447562,-36.3917394</XY> <XY>148.0447562,-36.3916796</XY> <XY>148.04462,-36.391679</XY> <XY>148.04462,-36.3916778</XY> <XY>148.0446211,-36.3917394</XY> </INNER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>
We’ll see what this locality polygon looks like soon.
Cases
There are a number of cases/situations for encoding a polygon that the XML parsing must handle:
- A polygon with a single exterior/OUTER ring and no interior/INNER ring.
- A polygon with a single exterior/OUTER ring and one or more interior/INNER rings.
- A polygon composed of more than one exterior/OUTER rings and one or more interior/INNER rings: a MultiPolygon.
Case 1: A polygon with a single exterior/OUTER ring
SQL Server XML processing does not allow for the parsing of XML with internal subset DTDs. To handle this, CONVERT with style option 2 is used to enable limited internal subset DTD support.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="TOWNSHIP_POLYGON" PFI="4608465" CREATED="20190208064929" UFI="568037229"></HEADER> <OUTER> <XY>141.95512062,-36.83868465</XY> <XY>141.96277778,-36.83945073</XY> <XY>141.96301054,-36.83947401</XY> <XY>141.96419852,-36.83959281</XY> <XY>141.95512062,-36.83868465</XY> </OUTER> <ATT>TOWNSHIP_CODE="5601"</ATT> <ATT>TOWNSHIP_NAME="NORADJUHA TP"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select f.table_name, f.pfi, f.ufi, f.create_date, geography::STGeomFromText('POLYGON ((' + STRING_AGG(REPLACE(f.XY,',',' '),',') + '))',4283).STAsText() as polygon from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.x.value('(text())[1]','varchar(max)') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER/XY') o(x) ) as f group by f.table_name, f.pfi, f.ufi, f.create_date GO table_name pfi ufi create_date polygon LOCALITY_POLYGON 205407825 468965778 20131024153045 POLYGON ((148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625, 148.04446887 -36.39192699))
This looks like:
Case 2: A polygon with a single exterior/OUTER ring, one interior/INNER ring.
Note that outer rings are in Counter-ClockWise order (CCW) order with inner rings being in Clockwise order.
The approach taken is to:
- Construct WKT from the outer rings using the XY elements in the order they appear in the document.
- Construct a WKT polygon for each inner ring but reversing the order of the XY ordinates to produce a valid exterior ring for each;
- Use STSymDifference to create a polygon with the exterior and interior rings correctly processed to produce a valid polygon.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <INNER> <XY>148.04452460167477,-36.391801308813505</XY> <XY>148.04481426832533,-36.391801308813427</XY> <XY>148.04481426826769,-36.391881931347953</XY> <XY>148.04452460173229,-36.391881931347996</XY> <XY>148.04452460167477,-36.391801308813505</XY> </INNER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select a.table_name, a.pfi, a.ufi, a.create_date, /* Take into account that the polygon may not have any holes (INNER) */ (case when b.interior_ring is not null then a.exterior_ring.STSymDifference(b.interior_ring) else a.exterior_ring end).STAsText() as polygon from (select f.table_name, f.pfi, f.ufi, f.create_date, geography::STGeomFromText('POLYGON ((' + STRING_AGG(REPLACE(f.XY,',',' '),',') + '))',4283) as exterior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.x.value('(text())[1]','varchar(max)') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER/XY') o(x) ) as f group by f.table_name, f.pfi, f.ufi, f.create_date ) as a LEFT OUTER JOIN /* In case no inners */ (select f.table_name, f.pfi, f.ufi, f.create_date, geography::STGeomFromText('POLYGON ((' + (STRING_AGG(REPLACE(f.XY,',',' '),',') WITHIN GROUP (ORDER BY f.coord_position DESC)) + '))',4283) as interior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, i.x.value('(text())[1]', 'varchar(max)') as XY, /* Generate sequence number For all XY pairs for current INNER in XML order */ i.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('INNER/XY') i(x) ) as f group by f.table_name, f.pfi, f.ufi, f.create_date ) as b ON (b.table_name = a.table_name and b.pfi = a.pfi and b.ufi = a.ufi and b.create_date = a.create_date ); GO table_name pfi ufi create_date polygon LOCALITY_POLYGON 205407825 468965778 20131024153045 POLYGON ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505))
This looks like:
Case 3. MultiPolygon composed of two exterior/OUTER rings, one with a single interior/INNER ring, the other with two INNER rings.
The seed/IUF XML does not provide any ordering of the OUTER/INNER/XY elements.
Because SQL does not guarantee order (fundamental property of relational theory), some method for sorting the OUTER/INNER/XY tags within a polygon is needed.
Why is this a problem?
Well, if the XY coordinate elements described thousands of vertices, it would be unsafe to assume that the query optimizer would maintain row order from one select to its parent.
One way this could have been done is through the original IUF DTD allowing the OUTER/INNER/XY tags to have an order attribute. For example:
<outer id="1"> <xy id="1"">148.04446887,-36.39192699</xy> ... </outer> <inner id="1"> <xy id="1"">148.02938287,-36.31287299</xy> ... </inner> <inner id="2"> <xy id="1"">148.12097854 -36.1289439</xy> ... </inner>
But this is not available to us.
Since we have only XML processing available to us, we need to find a way of generating surrogate identifiers.
One way is to use an XML VALUE method which maintains the order the OUTER/INNER/XY tags appear in the document.
This is done in the following XML processing. Take note of the XPATH that generates the following elements:
- outer_position
- inner_position
- coord_position
(Another method for ensuring XY order is described in another article.)
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <INNER> <XY>148.04452460167477,-36.391801308813505</XY> <XY>148.04481426832533,-36.391801308813427</XY> <XY>148.04481426826769,-36.391881931347953</XY> <XY>148.04452460173229,-36.391881931347996</XY> <XY>148.04452460167477,-36.391801308813505</XY> </INNER> <OUTER> <XY>148.044619,-36.3917393</XY> <XY>148.0447565,-36.3917393</XY> <XY>148.0447565,-36.391678</XY> <XY>148.044619,-36.391678</XY> <XY>148.044619,-36.3917393</XY> </OUTER> <INNER> <XY>148.04463,-36.39169</XY> <XY>148.04475,-36.39169</XY> <XY>148.04475,-36.39173</XY> <XY>148.04463,-36.39169</XY> </INNER> <INNER> <XY>148.04463073970265441 -36.39171145551856767</XY> <XY>148.0446664800863914 -36.39173171778336524</XY> <XY>148.04463186538401942 -36.39173115494267563</XY> <XY>148.04463073970265441 -36.39171145551856767</XY> </INNER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select /* Remove INNER polygons from OUTER polygons */ a.table_name, a.pfi, a.ufi, a.create_date, /* Take into account that the polygon may not have any holes (INNER) */ (case when b.interior_ring is not null then a.exterior_ring.STSymDifference(b.interior_ring) else a.exterior_ring end).STAsText() as polygon from (select /* Aggregate all individual OUTER polygons within table/polygon_item */ g.table_name, g.pfi, g.ufi, g.create_date, geography::CollectionAggregate(g.exterior_ring) as exterior_ring from (/* Generate OUTER polygons for each individual geometry within polygon_name */ select f.table_name, f.pfi, f.ufi, f.create_date, geography::STGeomFromText('POLYGON ((' + (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position ASC) ) + '))', 4283 ) as exterior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.x.value('(local-name(.))', 'varchar(100)') as coord_tag, /* Generate sequence number For all INNERs in current polygon_item */ o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as outer_position, /* Generate sequence number For all XY pairs for current INNER in XML order */ x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as coord_XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER') o(x) /* Extract OUTER node to enable counting */ CROSS APPLY o.x.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.ufi, f.create_date, f.outer_position ) as g group by g.table_name, g.pfi, g.ufi, g.create_date ) as a LEFT OUTER JOIN /* In case no inners */ (select /* Aggregate all individual OUTER polygons within table/polygon_item */ g.table_name, g.pfi, g.ufi, g.create_date, geography::CollectionAggregate(g.interior_ring) as interior_ring from (/* Generate INNER rings as polygons for each individual geometry within polygon_name. To create valid polygons we need to revese the XY order for each INNER ring */ select f.table_name, f.pfi, f.ufi, f.create_date, geography::STGeomFromText('POLYGON ((' + (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position DESC) ) + '))', 4283 ) as interior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.x.value('(local-name(.))', 'varchar(100)') as coord_tag, /* Generate sequence number For all INNERs in current polygon_item */ o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as inner_position, /* Generate sequence number For all XY pairs for current INNER in XML order */ x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as coord_XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('INNER') o(x) /* Extract INNER node to enable counting */ CROSS APPLY o.x.nodes('XY') x(y) /* Extract XY coordinates for current INNER */ ) as f group by f.table_name, f.pfi, f.ufi, f.create_date, f.inner_position ) as g group by g.table_name, g.pfi, g.ufi, g.create_date ) as b ON (b.table_name = a.table_name and b.pfi = a.pfi and b.ufi = a.ufi and b.create_date = a.create_date ); GO table_name pfi ufi create_date polygon LOCALITY_POLYGON 205407825 468965778 20131024153045 MULTIPOLYGON (((148.044619 -36.391678, 148.044619 -36.3917393, 148.0447565 -36.3917393, 148.0447565 -36.391678, 148.044619 -36.391678), (148.04463073970265 -36.391711455518568, 148.04466648008639 -36.391731717783365, 148.04463186538402 -36.391731154942676, 148.04463073970265 -36.391711455518568), (148.04463 -36.39169, 148.04475 -36.39169, 148.04475 -36.39173, 148.04463 -36.39169)), ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505)))
This looks like:
This SQL and XML processing will handle all possible (multi)polygon exterior/interior ring combinations.
Less is More?
We can reduce the use of SQL Server Spatial geometry constructors and functions by a fuller generation of the Well Known Text (WKT) for a polygon rings before constructing the geometry.
One might want to do this if the XML is processed externally to the database with the geography objects being created by a simple INSERT statement.
The following is a version of the previous where SQL Server Spatial constructors and function calls are only used in the last SELECT.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LOCALITY_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <INNER> <XY>148.04452460167477,-36.391801308813505</XY> <XY>148.04481426832533,-36.391801308813427</XY> <XY>148.04481426826769,-36.391881931347953</XY> <XY>148.04452460173229,-36.391881931347996</XY> <XY>148.04452460167477,-36.391801308813505</XY> </INNER> <OUTER> <XY>148.044619,-36.3917393</XY> <XY>148.0447565,-36.3917393</XY> <XY>148.0447565,-36.391678</XY> <XY>148.044619,-36.391678</XY> <XY>148.044619,-36.3917393</XY> </OUTER> <INNER> <XY>148.04463,-36.39169</XY> <XY>148.04475,-36.39169</XY> <XY>148.04475,-36.39173</XY> <XY>148.04463,-36.39169</XY> </INNER> <INNER> <XY>148.04463073970265441 -36.39171145551856767</XY> <XY>148.0446664800863914 -36.39173171778336524</XY> <XY>148.04463186538401942 -36.39173115494267563</XY> <XY>148.04463073970265441 -36.39171145551856767</XY> </INNER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205423232" CREATED="20191024153045" UFI="461413148"></HEADER> <OUTER> <XY>148.0444519, -36.3917393</XY> <XY>148.044619, -36.3917393</XY> <XY>148.044619, -36.391678</XY> <XY>148.0444519, -36.391678</XY> <XY>148.0444519, -36.3917393</XY> </OUTER> <INNER> <XY>148.044453,-36.39169</XY> <XY>148.04459,-36.39169</XY> <XY>148.04459,-36.39173</XY> <XY>148.044453,-36.39169</XY> </INNER> <ATT>LOCALITY_NAME="WOORARRA WEST"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOORARRA WEST"</ATT> <ATT>VICNAMES_ID="103414"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select /* Remove INNER polygons from OUTER polygons */ a.table_name, a.pfi, a.ufi, a.create_date, /* Take into account that the polygon may not have any holes (INNER) */ (case when b.interior_rings is not null then geography::STGeomFromText( case when CHARINDEX(',',a.exterior_rings) <> 0 then 'MULTIPOLYGON(' + a.exterior_rings + ')' else 'POLYGON' + a.exterior_rings end, 4283 ).STSymDifference( geography::STGeomFromText( case when CHARINDEX(',',b.interior_rings) <> 0 then 'MULTIPOLYGON(' + b.interior_rings + ')' else 'POLYGON' + b.interior_rings end, 4283 ) ) else geography::STGeomFromText( case when CHARINDEX(',',a.exterior_rings) <> 0 then 'MULTIPOLYGON(' + a.exterior_rings + ')' else 'POLYGON' + a.exterior_rings end, 4283 ) end).STAsText() as polygon from (select /* Aggregate all individual OUTER polygons within table/polygon_item */ g.table_name, g.pfi, g.ufi, g.create_date, '('+ STRING_AGG(g.exterior_ring,'),(') + ')' as exterior_rings from (/* Generate OUTER polygons for each individual geometry within polygon_name */ select f.table_name, f.pfi, f.ufi, f.create_date, '(' + (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position ASC)) + ')' as exterior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.x.value('(local-name(.))', 'varchar(100)') as coord_tag, /* Generate sequence number For all INNERs in current polygon_item */ o.x.value('for $i in . return count(../*[. << $i]) + 1', 'int') as outer_position, /* Generate sequence number For all XY pairs for current INNER in XML order */ x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as coord_XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER') o(x) /* Extract OUTER node to enable counting */ CROSS APPLY o.x.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.ufi, f.create_date, f.outer_position ) as g group by g.table_name, g.pfi, g.ufi, g.create_date ) as a LEFT OUTER JOIN /* In case no inners */ (select /* Aggregate all individual OUTER polygons within table/polygon_item */ g.table_name, g.pfi, '('+ STRING_AGG(g.interior_ring,'),(') + ')' as interior_rings from (/* Generate INNER polygons for each individual geometry within polygon_name */ select f.table_name, f.pfi, '(' + (STRING_AGG(f.coord_XY,',') WITHIN GROUP (ORDER BY f.coord_position DESC)) + ')' as interior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, o.x.value('(local-name(.))', 'varchar(100)') as coord_tag, /* Generate sequence number For all INNERs in current polygon_item */ o.x.value('for $i in . return count(../INNER/*[. << $i]) + 1', 'int') as inner_position, n.c.value('count(INNER)', 'int') as inner_ring_count, /* Generate sequence number For all XY pairs for current INNER in XML order */ x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as coord_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as coord_XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM') n(c) CROSS APPLY n.c.nodes('INNER') o(x) /* Extract INNER node to enable counting */ CROSS APPLY o.x.nodes('XY') x(y) /* Extract XY coordinates for current INNER */ ) as f group by f.table_name, f.pfi, f.inner_position ) as g group by g.table_name, g.pfi ) as b ON (b.table_name = a.table_name and b.pfi = a.pfi ) GO table_name pfi ufi create_date polygon LOCALITY_POLYGON 205407825 468965778 20131024153045 MULTIPOLYGON (((148.044619 -36.391678, 148.044619 -36.3917393, 148.0447565 -36.3917393, 148.0447565 -36.391678, 148.044619 -36.391678), (148.04463073970265 -36.391711455518568, 148.04466648008639 -36.391731717783365, 148.04463186538402 -36.391731154942676, 148.04463073970265 -36.391711455518568), (148.04463 -36.39169, 148.04475 -36.39169, 148.04475 -36.39173, 148.04463 -36.39169)), ((148.04446887 -36.39175625, 148.04446887 -36.39192699, 148.04487 -36.39192699, 148.04487 -36.39175625, 148.04446887 -36.39175625), (148.04452460167477 -36.391801308813505, 148.04481426832533 -36.391801308813427, 148.04481426826769 -36.391881931347953, 148.04452460173229 -36.391881931347996, 148.04452460167477 -36.391801308813505))) LOCALITY_POLYGON 205423232 461413148 20191024153045 POLYGON ((148.0444519 -36.391678, 148.0444519 -36.3917393, 148.044619 -36.3917393, 148.044619 -36.391678, 148.0444519 -36.391678), (148.044453 -36.39169, 148.04459 -36.39169, 148.04459 -36.39173, 148.044453 -36.39169))
This looks like:
Attribute Extraction
Extracting attributes involves processing the
Each attribute is encoded as a name/value pair.
The attributes and their values are extracted, in the following example, for a known layer type.
Extraction uses the SQL Server PIVOT operator.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LGA_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487, -36.39192699</XY> <XY>148.04487, -36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) SELECT [table_name],[pfi],[ufi],[create_date], [LOCALITY_NAME], [GAZETTED_LOCALITY_NAME], [VICNAMES_ID], [PFI_CREATED] FROM (select [table_name], [pfi],[ufi],[create_date], [attribute_name], [attribute_value] from (select n.c.value('(../HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(../HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(../HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date, SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), 1, CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1 ) as attribute_name, REPLACE( SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1, 1000 ), '"','') as attribute_value from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c) ) as f ) AS T PIVOT ( max(attribute_value) FOR attribute_name in ( LOCALITY_NAME, GAZETTED_LOCALITY_NAME, VICNAMES_ID, PFI_CREATED ) ) AS V GO table_name pfi ufi create_date LOCALITY_NAME GAZETTED_LOCALITY_NAME VICNAMES_ID PFI_CREATED LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD WOODFIELD 103388 20050927080612
Merge Spatial and Attributes
The following SQL/XML processing brings all the elements together to create a representation of a POLYGON_ITEM that includes is spatial and attribute data.
It also introduces a simplified way of creating the final polygon from the inner/outer elements.
Instead of selecting the final polygon, the SQL validates the generated polygon using the STIsValid() geography method.
with data as ( select CONVERT(xml, '<?xml version="1.0"?> <!DOCTYPE IUF SYSTEM "iuf_full.dtd"> <IUF> <INSERT> <POLYGON> <!-- SEED for POLYGON table LOCALITY_POLYGON commencing --> <POLYGON_NAME> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205407825" CREATED="20131024153045" UFI="468965778"></HEADER> <OUTER> <XY>148.04446887,-36.39192699</XY> <XY>148.04487,-36.39192699</XY> <XY>148.04487,-36.39175625</XY> <XY>148.04446887,-36.39175625</XY> <XY>148.04446887,-36.39192699</XY> </OUTER> <INNER> <XY>148.04452460167477,-36.391801308813505</XY> <XY>148.04481426832533,-36.391801308813427</XY> <XY>148.04481426826769,-36.391881931347953</XY> <XY>148.04452460173229,-36.391881931347996</XY> <XY>148.04452460167477,-36.391801308813505</XY> </INNER> <OUTER> <XY>148.044619,-36.3917393</XY> <XY>148.0447565,-36.3917393</XY> <XY>148.0447565,-36.391678</XY> <XY>148.044619,-36.391678</XY> <XY>148.044619,-36.3917393</XY> </OUTER> <INNER> <XY>148.04463,-36.39169</XY> <XY>148.04475,-36.39169</XY> <XY>148.04475,-36.39173</XY> <XY>148.04463,-36.39169</XY> </INNER> <INNER> <XY>148.04463073970265441 -36.39171145551856767</XY> <XY>148.0446664800863914 -36.39173171778336524</XY> <XY>148.04463186538401942 -36.39173115494267563</XY> <XY>148.04463073970265441 -36.39171145551856767</XY> </INNER> <ATT>LOCALITY_NAME="WOODFIELD"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOODFIELD"</ATT> <ATT>VICNAMES_ID="103388"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> <POLYGON_ITEM> <HEADER TABLE="LOCALITY_POLYGON" PFI="205423232" CREATED="20191024153045" UFI="461413148"></HEADER> <OUTER> <XY>148.0444519, -36.3917393</XY> <XY>148.044619, -36.3917393</XY> <XY>148.044619, -36.391678</XY> <XY>148.0444519, -36.391678</XY> <XY>148.0444519, -36.3917393</XY> </OUTER> <INNER> <XY>148.044453,-36.39169</XY> <XY>148.04459,-36.39169</XY> <XY>148.04459,-36.39173</XY> <XY>148.044453,-36.39169</XY> </INNER> <ATT>LOCALITY_NAME="WOORARRA WEST"</ATT> <ATT>GAZETTED_LOCALITY_NAME="WOORARRA WEST"</ATT> <ATT>VICNAMES_ID="103414"</ATT> <ATT>PFI_CREATED="20050927080612"</ATT> </POLYGON_ITEM> </POLYGON_NAME> </POLYGON> </INSERT> </IUF>',2) as iuf ) select a.table_name, a.pfi, a.ufi, a.create_date, c.LOCALITY_NAME, c.GAZETTED_LOCALITY_NAME, c.VICNAMES_ID, c.PFI_CREATED, geography::STGeomFromText(a.exterior_rings,4326) .STSymDifference( geography::STGeomFromText(COALESCE(b.interior_rings,'POLYGON EMPTY'),4326) ).STIsValid() as isValidPolygon from (select g.table_name, g.pfi, g.ufi, g.create_date, case when count(*) = 1 then 'POLYGON(' + STRING_AGG( g.exterior_ring,',') + ')' else 'MULTIPOLYGON(' + STRING_AGG('('+g.exterior_ring+')',',') + ')' end as exterior_rings from (select f.table_name, f.pfi, f.ufi, f.create_date, '(' + (STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position ASC)) + ')' as exterior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.i.value('(local-name(.))', 'varchar(100)') as coord_tag, o.i.value('for $i in . return count(../*[. << $i]) + 1','int') as ring_position, x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER') o(i) /* Extract OUTER node to enable counting */ CROSS APPLY o.i.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.ufi, f.create_date, f.ring_position ) as g group by g.table_name, g.pfi, g.ufi, g.create_date ) as a left outer join (select g.table_name, g.pfi, case when count(*) = 1 then 'POLYGON(' + STRING_AGG(g.interior_ring,',') + ')' else 'MULTIPOLYGON(' + STRING_AGG('('+g.interior_ring+')',',') + ')' end as interior_rings from (select f.table_name, f.pfi, f.coord_tag, f.ring_position, '('+(STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position DESC))+')' as interior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, o.i.value('(local-name(.))', 'varchar(100)') as coord_tag, o.i.value('for $i in . return count(../*[. << $i]) + 1','int') as ring_position, x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('INNER') o(i) /* Extract OUTER node to enable counting */ CROSS APPLY o.i.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.coord_tag, f.ring_position ) as g group by g.table_name, g.pfi ) as b on (b.table_name = a.table_name and b.pfi = a.pfi) LEFT OUTER JOIN /* Attributes */ (SELECT pv.pfi,pv.ufi,pv.LOCALITY_NAME,pv.GAZETTED_LOCALITY_NAME,pv.VICNAMES_ID,pv.PFI_CREATED FROM (select f.pfi,f.ufi,f.attribute_name,f.attribute_value from (select n.c.value('(../HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(../HEADER/@UFI)[1]','varchar(100)') as ufi, n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date, SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), 1, CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1 ) as attribute_name, REPLACE( SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1, 1000 ), '"','') as attribute_value from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c) ) as f ) AS T PIVOT ( MAX(attribute_value) FOR attribute_name in ( LOCALITY_NAME, GAZETTED_LOCALITY_NAME, VICNAMES_ID, PFI_CREATED ) ) as pv ) as c ON (c.pfi = a.pfi and c.ufi=a.ufi) GO table_name pfi ufi create_date LOCALITY_NAME GAZETTED_LOCALITY_NAME VICNAMES_ID PFI_CREATED isValidPolygon LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD WOODFIELD 103388 20050927080612 1 LOCALITY_POLYGON 205423232 461413148 20191024153045 WOORARRA WEST WOORARRA WEST 103414 20050927080612 1
Reading from a File
Part 2 of this series will involve processing all of an IUF seed document containing all the LGA, Locality and Postcode boundaries for Victoria.
But until then the following example shows how to load and process an IUF XML document from a file.
Firstly we take the IUF document from the CTE (WITH) clause in the last example, and save it to a file called admin_seed_example.xml.
The we can use the following modified query which directly accesses this document using the OPENROWSET function.
The SQL is slightly different fro, the last as it returns polygon area via STArea().
with data as ( SELECT CONVERT(xml, BulkColumn, 2) as iuf FROM OPENROWSET(BULK N'C:\Users\Simon\Downloads\admin_seed_example.xml', CODEPAGE ='RAW', SINGLE_CLOB) AS Document ) select a.table_name, a.pfi, a.ufi, a.create_date, c.LOCALITY_NAME, c.GAZETTED_LOCALITY_NAME, c.VICNAMES_ID, c.PFI_CREATED, geography::STGeomFromText(a.exterior_rings,4326) .STSymDifference( geography::STGeomFromText(COALESCE(b.interior_rings,'POLYGON EMPTY'),4326) ).STArea() as polygon_area from (select g.table_name, g.pfi, g.ufi, g.create_date, case when count(*) = 1 then 'POLYGON(' + STRING_AGG( g.exterior_ring,',') + ')' else 'MULTIPOLYGON(' + STRING_AGG('('+g.exterior_ring+')',',') + ')' end as exterior_rings from (select f.table_name, f.pfi, f.ufi, f.create_date, '(' + (STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position ASC)) + ')' as exterior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(HEADER/@UFI)[1]', 'varchar(100)') as ufi, n.c.value('(HEADER/@CREATED)[1]','varchar(100)') as create_date, o.i.value('(local-name(.))', 'varchar(100)') as coord_tag, o.i.value('for $i in . return count(../*[. << $i]) + 1','int') as ring_position, x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('OUTER') o(i) /* Extract OUTER node to enable counting */ CROSS APPLY o.i.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.ufi, f.create_date, f.ring_position ) as g group by g.table_name, g.pfi, g.ufi, g.create_date ) as a left outer join (select g.table_name, g.pfi, case when count(*) = 1 then 'POLYGON(' + STRING_AGG(g.interior_ring,',') + ')' else 'MULTIPOLYGON(' + STRING_AGG('('+g.interior_ring+')',',') + ')' end as interior_rings from (select f.table_name, f.pfi, f.coord_tag, f.ring_position, '('+(STRING_AGG(f.XY,',') WITHIN GROUP (ORDER BY f.xy_position DESC))+')' as interior_ring from (select n.c.value('(HEADER/@TABLE)[1]', 'varchar(max)') as table_name, n.c.value('(HEADER/@PFI)[1]', 'varchar(100)') as pfi, o.i.value('(local-name(.))', 'varchar(100)') as coord_tag, o.i.value('for $i in . return count(../*[. << $i]) + 1','int') as ring_position, x.y.value('for $i in . return count(../*[. << $i]) + 1', 'int') as xy_position, REPLACE(x.y.value('(text())[1]', 'varchar(max)'),',',' ') as XY from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]') n(c) CROSS APPLY n.c.nodes('INNER') o(i) /* Extract OUTER node to enable counting */ CROSS APPLY o.i.nodes('XY') x(y) /* Extract XY coordinates for current OUTER */ ) as f group by f.table_name, f.pfi, f.coord_tag, f.ring_position ) as g group by g.table_name, g.pfi ) as b on (b.table_name = a.table_name and b.pfi = a.pfi) LEFT OUTER JOIN /* Attributes */ (SELECT pv.pfi,pv.ufi,pv.LOCALITY_NAME,pv.GAZETTED_LOCALITY_NAME,pv.VICNAMES_ID,pv.PFI_CREATED FROM (select f.pfi,f.ufi,f.attribute_name,f.attribute_value from (select n.c.value('(../HEADER/@PFI)[1]', 'varchar(100)') as pfi, n.c.value('(../HEADER/@UFI)[1]','varchar(100)') as ufi, n.c.value('(../HEADER/@CREATED)[1]','varchar(100)') as create_date, SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), 1, CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))-1 ) as attribute_name, REPLACE( SUBSTRING(n.c.value('(text())[1]', 'varchar(max)'), CHARINDEX('=',n.c.value('(text())[1]', 'varchar(max)'))+1, 1000 ), '"','') as attribute_value from data as d CROSS APPLY d.iuf.nodes('/IUF/INSERT/POLYGON/POLYGON_NAME/POLYGON_ITEM[HEADER/@TABLE="LOCALITY_POLYGON"]/ATT') n(c) ) as f ) AS T PIVOT ( MAX(attribute_value) FOR attribute_name in ( LOCALITY_NAME, GAZETTED_LOCALITY_NAME, VICNAMES_ID, PFI_CREATED ) ) as pv ) as c ON (c.pfi = a.pfi and c.ufi=a.ufi) GO table_name pfi ufi create_date locality_name gazetted_locality_name vicnames_id pfi_created polygon_area LOCALITY_POLYGON 205407825 468965778 20131024153045 WOODFIELD WOODFIELD 103388 20050927080612 505.967401109636 LOCALITY_POLYGON 205423232 461413148 20191024153045 WOORARRA WEST WOORARRA WEST 103414 20050927080612 74.6977753117681
Conclusion
The SQL and XML processing in this article by no means covers all aspects of the VicMap IUF. As already indicated, this article does not cover the following XML objects:
- LINE/LINE_NAME/LINE_ITEM
- POINT/POINT_NAME/POINT_ITEM
- ASPATIAL
- CONNECT
(LINE and POLYGON geography objects are much easier to handle than polygons and rings.)
Neither does this article describe how to handle IUF actions other than INSERT:
- REPLACE
- DELETE
The part two article will concentrate on how to process a large IUF document. It will introduce and explain some pre-processing techniques which help reduce file size, complexity and improve the SQL processing of the XML.
Part three will show how to use the OPENXML SQL Server operator to process an IUF document.
In these articles on XML processing of documents containing spatial elements, nothing pejorative should be inferred about the lack of use of, or reference to, mainstream geospatial technology such as FME, qGIS, og2ogr, Java Topology Suite (JTS) etc. Rather, these articles try to embrace existing IT technology to show how they can be used, on their own to deliver spatial processing within their more traditional IT areas of use. It may also be that such IT-tool-centric solution may be best for an existing IT project, where introducing heavy components could complicate the solution.
Don’t get me wrong, I love geospatial technology as much as anyone!
However, vigilance is need in solution designs that mainly involve IT tools and technologies so that we do not make the following mistake:
To a person with a hammer, everything is a nail.
Comment below or contact me directly.
Documentation
- MySQL Spatial General Functions
- Oracle LRS Objects
- Oracle Spatial Exporter (Java + pl/SQL)
- Oracle Spatial Object Functions
- Oracle Spatial Object Functions (Multi Page)
- PostGIS pl/pgSQL Functions
- SC4O Oracle Java Topology Suite (Java + pl/SQL)
- SQL Server Spatial General TSQL Functions
- SQL Server Spatial LRS TSQL Functions