Filtering Rings in Polygon (PostGIS)

This is my first blog article on PostGIS.

On the PostGIS discussion list, a question was asked:

> I have a polygon table that has many small areas and holes. Now, I would
> like to remove small areas and holes that are 2800 m^2.
>
> Any help or advice would be really appreciated.

I decided to have a bash at this as part of my continuing to learn PostGIS.

This article will firstly look at a simple (single) polygon (I will use WKT to construct a polygon with two inner holes) which will make it easy for my readers to replicate. Then it will consider how to handle situations where a supplied polygon has no holes (inner rings) and how to handle polygons and multipolygons together before concluding with some performance figures.

1. Single Polygon with Inner Rings (holes)

The function that can help us do what we want is the ST_DumpRings(geometry) function. Here we can see how the function, applied to our test geomety, will extract all the rings for processing:

 SELECT ST_AsText(b.the_geom) as final_geom, ST_Area(b.the_geom) as area
   FROM (SELECT (ST_DumpRings(a.geom)).geom As the_geom
           FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') as geom
                 ) a
         ) b;

Results:

geometry area
“POLYGON ((0 0,20 0,20 20,0 20,0 0))” 400
“POLYGON ((10 10,10 11,11 11,11 10,10 10))” 1
“POLYGON ((5 5,5 7,7 7,7 5,5 5))” 4

My first attempt to use ST_DumpRings to achieve the required outcome resulted in the following SQL (note how we can apply our area filter to the extracted rings):

 SELECT ST_AsText(c.the_geom) as final_geom
   FROM (SELECT ST_Collect(b.the_geom) as the_geom
           FROM (SELECT (ST_DumpRings(a.geom)).geom As the_geom
                   FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') as geom
                        ) a
                 ) b
           WHERE ST_Area(b.the_geom) > 1
         ) c;
 -- Result
 "MULTIPOLYGON(((0 0,20 0,20 20,0 20,0 0)),((5 5,5 7,7 7,7 5,5 5)))"

Note that the output is a MULTIPOLYGON with each ring being a separate polygon in the collection and not a single POLYGON (as was the source data) with inner rings which is what we do want as the result.

Searching the PostGIS documentation indicated that I should be able to reconstitute the rings into a single polygon via use of the ST_Difference aggregate. Here we want to take the outer ring as the first argument to ST_Difference and subtract the set of all inner rings from it:

 SELECT ST_AsEWKT(d.final_geom)
   FROM (SELECT ST_Difference(        ST_GeometryN(c.the_geom,1),
                              (SELECT ST_Collect(ST_GeometryN(c.the_geom,
                                                              generate_series(2,numgeometries(c.the_geom)))))::geometry
                             ) as final_geom
           FROM (SELECT ST_Collect(b.the_geom) as the_geom
                   FROM (SELECT (ST_DumpRings(a.geom)).geom As the_geom
                           FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') as geom ) a
                         ) b
                  WHERE ST_Area(b.the_geom) > 1
                ) c
          GROUP BY c.the_geom
         ) d;
 -- Result
 ERROR: set-valued function called in context that cannot accept a set

This is because the ST_Collect returns geometry[] (ie a multi-geometry) and not a single geometry. (See documentation note: “Do not call with a GeometryCollection as an argument”)

Are there other solutions?

The ST_MakePolygon function looks useful:

ST_MakePolygon(linestring, [linestring[]])

Creates a Polygon formed by the given shell and array of holes. You can construct a geometry array using ST_Accum.

Though it requires linestring and not polygon arguments which will result in more complex SQL (as is shown below):

 SELECT ST_AsEWKT(ST_MakePolygon(c.outer_ring, d.inner_rings )) as final_geom
   FROM (/* Get outer ring of polygon */
         SELECT ST_ExteriorRing(b.the_geom) as outer_ring
           FROM (SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path
                   FROM (SELECT ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') as geom
                        ) a
                 ) b
           WHERE path[1] = 0 /* ie the outer ring */
         ) c,
        (/* Get all inner rings > a particular area */
         SELECT ST_Accum(ST_ExteriorRing(b.the_geom)) as inner_rings
           FROM (SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path
                   FROM (SELECT ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') as geom
                        ) a
                 ) b
           WHERE path[1] > 0 /* ie not the outer ring */
             AND ST_Area(b.the_geom) > 1
         ) d;
 -- Result
 "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"

Note, that we get our desired result.

The splitting of the SQL into the two halves to extract the outer ring and the inner rings (separately) had to be done because the only method of reconstructing the polygon was via the ST_MakePolygon function and it needs two inputs: a linestring for the outer ring and an array of linestrings for the inner rings left after being filtered by area. Sadly, ST_Collect() on the straight output of ST_DumpRings (with no filtering by path only area) just generates a multipolygon. I tried playing around with ST_Intersection etc but these still return a multipolygon.

Now, this SQL is fine for a single, hardcoded test geometry, but I think it would be messy if we tried to use it to process a lot of polygons in a table. The best way to approach this is to encapsulate the above SQL (a complete algorithm) in a function and then use the function when processing our tabular data.

 CREATE OR REPLACE FUNCTION Filter_Rings(geometry,float)
 RETURNS geometry AS
 $$
 SELECT ST_MakePolygon(c.outer_ring, d.inner_rings) as final_geom
   FROM (/* Get outer ring of polygon */
         SELECT ST_ExteriorRing(b.the_geom) as outer_ring
           FROM (SELECT (ST_DumpRings($1)).geom As the_geom, path(ST_DumpRings($1)) as path) b
           WHERE b.path[1] = 0 /* ie the outer ring */
        ) c,
       (/* Get all inner rings > a particular area */
        SELECT ST_Accum(ST_ExteriorRing(b.the_geom)) as inner_rings
          FROM (SELECT (ST_DumpRings($1)).geom As the_geom, path(ST_DumpRings($1)) as path) b
          WHERE b.path[1] > 0 /* ie not the outer ring */
            AND ST_Area(b.the_geom) > $2
        ) d
 $$
   LANGUAGE 'sql' IMMUTABLE;

And example of the use of Filter_Rings is:

 SELECT ST_AsEWKT(Filter_Rings(ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') ,1::float));
 -- Result
 "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"

p. After submission to the PostGIS discussion list, the greatly skilled and experienced Regina Obe suggested a few small amendments to my SQL and function.

“You should do this:

SELECT (ST_DumpRings(a.geom)).*

Instead of this:

SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path

(Which would mean in the upper part you would need to reference by .geom instead of the_geom.)

The reason for that is internally PostgreSQL will call ST_DumpRings twice [cf Deterministic functions in Oracle). This was pointed out to me by a very experienced PostgreSQL developer: His blog entry about it is here

Implementing this suggestion leads to this change in the *filter_rings* function.

 CREATE OR REPLACE FUNCTION filter_rings(geometry, float)
   RETURNS geometry AS
 $BODY$
 SELECT ST_MakePolygon(c.outer_ring, d.inner_rings) as final_geom
   FROM (/* Get outer ring of polygon */
         SELECT ST_ExteriorRing(b.geom) as outer_ring
           FROM (SELECT (ST_DumpRings($1)).*) b
           WHERE b.path[1] = 0 /* ie the outer ring */
         ) c,
        (/* Get all inner rings > a particular area */
         SELECT ST_Accum(ST_ExteriorRing(b.geom)) as inner_rings
           FROM (SELECT (ST_DumpRings($1)).*) b
           WHERE b.path[1] > 0 /* ie not the outer ring */
             AND ST_Area(b.geom) > $2
         ) d
 $BODY$
   LANGUAGE 'sql' IMMUTABLE;

However, Regina also pointed out a way to shorten the SQL to something like that which I wanted at the start of this article by suggesting:

“I think ST_BuildArea might be better than ST_MakePolygon in this regard. It will work fine with a single closed ring and if multiple, it turns the inners to holes.

What I was thinking of was this.

 CREATE OR REPLACE FUNCTION upgis_filter_rings(geometry,float) RETURNS geometry AS
 $$ SELECT ST_BuildArea(ST_Collect(a.geom)) as final_geom
         FROM ST_DumpRings($1) AS a
           WHERE a.path[1] = 0 OR
                 (a.path[1] > 0 AND ST_Area(a.geom) > $2)
 $$
   LANGUAGE 'sql' IMMUTABLE;

“BuildArea just assumes everything outside is a polygon and everything inside is hole where as ST_MakePolygon takes your categorization of hole/vs shell as gospel.

So […] compar[ing]:

 SELECT ST_AsText(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))'),2));
 -- Result
 "POLYGON((0 0,0 20,20 20,20 0,0 0),(5 5,7 5,7 7,5 7,5 5))"
 -- Another ....
 SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))'),2));
 -- Result
 "POLYGON((0 0,0 20,20 20,20 0,0 0),(5 5,7 5,7 7,5 7,5 5))"
 .

We note that they yield the same answer.”

Regina goes on to say:

“The main disadvantage aside from possibly speed over Simon’s is that if you have 3D polygons, I think the above will squash them to 2D where as his approach will support 3D.”

Testing the 3D/2D issue leads to:

 SELECT ST_AsEWKT(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
 -- Result
 "POLYGON((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3))"
 .
 SELECT ST_AsEWKT(      filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
 -- Result
 "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"
 .

Which shows that both approaches honour the third dimension.

2. Handling of Polygons with no Inner Rings

It is important that any function we deploy handles polygons with an outer ring but no inner rings. Testing Regina’s function and mine leads to this:

 SELECT ST_AsText(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
 -- Result
 "POLYGON((0 0,0 20,20 20,20 0,0 0))"
 .
 SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
 -- Result
 ""

So, the implementation based on ST_MakePolygon fails while Regina’s, based on ST_BuildArea, passes.

The problem with my approach appears to be in the handling of the “ST_Collect”ing (or “ST_Accum”ulating) of the inner rings. Regina again:

“It has to do with this annoying behavior of ST_MakePolygon and hmm ST_Accum that when the second argument is null it nullifies everything and I think ST_Accum is returning null when no result.

Now if you changed your function to this — it would work fine and ARRAY is faster than ST_Accum anyway.

Also you really don’t need the subselects so I took that out as well.”

So, the new filter_rings based on ST_MakePolygon becomes:

 CREATE OR REPLACE FUNCTION filter_rings(geometry, double precision)
   RETURNS geometry AS
 $BODY$
 SELECT ST_MakePolygon((/* Get outer ring of polygon */
         SELECT ST_ExteriorRing(geom) as outer_ring
           FROM ST_DumpRings($1)
           WHERE path[1] = 0 /* ie the outer ring */
         ),  ARRAY(/* Get all inner rings > a particular area */
         SELECT ST_ExteriorRing(geom) as inner_rings
           FROM ST_DumpRings($1)
           WHERE path[1] > 0 /* ie not the outer ring */
             AND ST_Area(geom) > $2
         ) ) as final_geom
 $BODY$
   LANGUAGE 'sql' IMMUTABLE;

And, testing:

 SELECT ST_AsEWKT(filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
 -- Result
 "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"
 .
 SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
 -- Result
 "POLYGON((0 0,20 0,20 20,0 20,0 0))"

Success: thanks Regina!

3. Multi-Polygons

Of course, having a filter function for polygons is great, but many real-world definitions of area features involves multiple outer shells. So, we really need an implementation that handles mutli-polygons. Let’s try the following.

 SELECT ST_AsEWKT(ST_Collect(c.final_geom)) as filtered_geom
   FROM (SELECT ST_MakePolygon((/* Get outer ring of polygon */
     SELECT ST_ExteriorRing(b.the_geom) as outer_ring /* ie the outer ring */
     ), ARRAY(/* Get all inner rings > a particular area */
      SELECT ST_ExteriorRing(e.geom) as inner_ring
        FROM (SELECT (ST_DumpRings(b.the_geom)).*) e
       WHERE e.path[1] > 0 /* ie not the outer ring */
         AND ST_Area(e.geom) > 2
     ) ) as final_geom
          FROM (SELECT ST_GeometryN(a.geom,generate_series(1,numgeometries(a.geom))) as the_geom
                  FROM (SELECT ST_Multi(ST_GeomFromEWKT('MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))')) as geom
                        ) a
               ) b
        ) c;
 -- Result
 "MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))"

Encapsulating this in our filter_rings function:

 CREATE OR REPLACE FUNCTION filter_rings(geometry, double precision)
   RETURNS geometry AS
 $BODY$
 SELECT ST_BuildArea(ST_Collect(b.final_geom)) as filtered_geom
   FROM (SELECT ST_MakePolygon((/* Get outer ring of polygon */
     SELECT ST_ExteriorRing(a.the_geom) as outer_ring /* ie the outer ring */
     ), ARRAY(/* Get all inner rings > a particular area */
      SELECT ST_ExteriorRing(b.geom) as inner_ring
        FROM (SELECT (ST_DumpRings(a.the_geom)).*) b
       WHERE b.path[1] > 0 /* ie not the outer ring */
         AND ST_Area(b.geom) > $2
     ) ) as final_geom
          FROM (SELECT ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
                                    generate_series(1,ST_NumGeometries(ST_Multi($1)))
                                    ) as the_geom
                ) a
        ) b
 $BODY$
   LANGUAGE 'sql' IMMUTABLE;

Which, testing, gives us:

 SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
 -- Result
 "POLYGON((0 10,0 20,20 20,20 0,0 0))"
 .
 SELECT ST_AsEWKT(filter_rings(ST_GeomFromEWKT('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
 -- Result
 "POLYGON((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3))"
 .
 SELECT ST_AsEWKT(filter_rings(ST_GeomFromEWKT('MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))'),2));
 -- Result
 "MULTIPOLYGON(((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))"

Which, looks pretty good to me.

In summary, I think Regina’s suggested change to use ST_BuildArea is neater and cleaner than my use of ST_MakePolygon. But my final solution is a combination of both.

4. Performance

What is now needed is some performance analysis of Regina and my approaches especially given Regina’s comment alluding to performance.

I have a dataset which contains some very complicated rural polygons with multiple inner rings so I thought I would test using that. Here are the two test SQL statements:

 -- Test SQL 1
 SELECT upgis_filter_rings(a.geom,10000)
   FROM rural_areas a
  WHERE GeometryType(geom) = 'POLYGON'
  ORDER BY random()
  limit 1000;
 -- Test SQL 2
 SELECT filter_rings(a.geom,10000) /* , and ms */
   FROM rural_areas a
  WHERE GeometryType(geom) = 'POLYGON'
  ORDER BY random()
  LIMIT 1000;

I also recompiled filter_rings to include the fuller multi-geometry handling and ran the last SQL statement again. The relative performance can be seen in the table below.

            Function Run 1 ms Run 2 ms
upgis_filter_rings 77453 76500
filter_rings 23781 23890
(multi) filter_rings 86563 87625

Clearly the single polygon-handling ST_MakePolygon-based filter_rings performs faster than the ST_BuildArea-based upgis_filter_rings function. Converting filter_rings so that it handles multi-polygons pushes the performance out: this is probably due to the use of ST_BuildArea and the multi-polygon disaggregation/aggregation. I suspect changing Regina’s upgis_filter_rings to be multi-polygon aware would push the performance of that function out even further.

My attempt at converting Regina’s function is:

 CREATE OR REPLACE FUNCTION upgis_filter_rings(geometry,float) RETURNS geometry AS
 $$ SELECT ST_BuildArea(ST_Collect(d.built_geom)) as filtered_geom
      FROM (SELECT ST_BuildArea(ST_Collect(c.geom)) as built_geom
              FROM (SELECT b.geom
                      FROM (SELECT (ST_DumpRings(ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
                                                generate_series(1,ST_NumGeometries(ST_Multi($1)) )
                                                ))).*
                           ) b
                    WHERE b.path[1] = 0 OR
                         (b.path[1] > 0 AND ST_Area(b.geom) > $2)
                   ) c
            ) d
 $$
    LANGUAGE 'sql' IMMUTABLE;

Testing against the rural_areas table showed the function ran at 147281 (Run 1) and 144813 (Run 2) ms. Regina was right: upgis_filter_rings is slower because of its use of ST_BuildArea.

Summary

In summary, this has demonstrated to me that PostGIS is very functionally rich in the handling of basic simple features geometries. It has more construction and editing tools than any of the current commercial products though many of those have functionality that PostGIS still does not have.

I hope this article gives, the reader, some idea as to the direction you could take…