Using Oracle’s SDO_NN Operator – Some examples

This little article was occasioned by someone emailing me and asking:

Actually I do have a topic for you – clustering. Before we moved to Oracle Spatial, we had a request for a client to: “Show us all the stores that are close to each other”. I’m somewhat embarrassed to say that we did this by coordinate match.

What would be nice is to understand how to implement the query: Show me all stores within x distance of each other.

Well, there is nothing to be embarrassed about because you did this before you had Oracle Spatial (actually all the functionality used below is available to Locator users – thanks Oracle)! What I will do is outline the Oracle functionality that can do what the person wanted and give a practical example I used recently to solve a particular problem.

First off the function we need in Oracle is the SDO_NN spatial operator. (My preference for everything I do in Oracle is to try and push as much processing into a SQL statement before launching into PL/SQL ie programming.) The Oracle documentation on SDO_NN is quite thorough so I recommend that you read it in consort with this article: For example, the only thing I will say say about the SDO_NN_DISTANCE ancillary operator (used in the first example) is that it returns the actual distance Oracle computed between the searched for object and the search object supplied to the SDO_NN operator. The examples below will clarify its use. For a detailed discussion on the operators read the documentation. For now, I will limit myself to a few comments and examples.

 SDO_BATCH_SIZE vs SDO_NUM_RES 

One of the first things to understand is use of the sdo_batch_size and sdo_num_res parameters. (Again the documentation is quite thorough on these parameters.)

 sdo_num_res=<N>

simply returns exactly N nearest objects to the search object. So, in the example that follows the nearest object only is returned.

 gis@XE> select id,
   2            storetype,
   3            sdo_nn_distance(1)
   4       from store s
   5      where sdo_nn(s.geom,
   6                   mdsys.sdo_geometry(2003,NULL,NULL,
   7                         mdsys.sdo_elem_info_array(1,1003,3),
   8                         mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9*                  'sdo_num_res=1',1) = 'TRUE'
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        271 SHOE                215195.12

(The script to create the STORE table can be accessed here.)

Note that the store is a SHOE store. Let’s “widen” the search a little (note the use of the ORDER BY clause):

 gis@XE> select id,
   2         storetype,
   3         sdo_nn_distance(1)
   4    from store s
   5   where sdo_nn(s.geom,
   6                mdsys.sdo_geometry(2003,NULL,NULL,
   7                      mdsys.sdo_elem_info_array(1,1003,3),
   8                      mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9                'sdo_num_res=5',1) = 'TRUE'
  10*  order by 3
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        271 SHOE                215195.12
        100 COMPUTER           215228.103
         60 SHOE                215307.04
        494 VIDEO              215341.161
        493 FURNITURE          215347.298

So, what if we only want the nearest three SHOE stores to our search point? Can we just add the “storetype = ‘SHOE'”?

 gis>XE>  select id,
   2         storetype,
   3         sdo_nn_distance(1)
   4    from store s
   5   where sdo_nn(s.geom,
   6                mdsys.sdo_geometry(2003,NULL,NULL,
   7                      mdsys.sdo_elem_info_array(1,1003,3),
   8                      mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9                'sdo_num_res=5',1) = 'TRUE'
  10     and s.storetype = 'SHOE'
  11*  order by 3
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        271 SHOE                215195.12
         60 SHOE                215307.04

No this will not do what we want. Why? Because the sdo_num_res parameter returns the objects based on their geometry not their attributes: the 5 stores returned by SDO_NN with sdo_num_res=5 are the 5 nearest. No more are returned. These 5 objects are then passed to the added predicate resulting in only 2 of the 5 passing the test. Can we move the predicate around? No because the SDO_NN will still only return the same 5 objects to the SELECT statement. We have to find a way to increase the set that can be returned by SDO_NN. Why not just increase the sdo_num_res to, say, 10?

 gis@XE> select id,
   2         storetype,
   3         sdo_nn_distance(1)
   4    from store s
   5   where sdo_nn(s.geom,
   6                mdsys.sdo_geometry(2003,NULL,NULL,
   7                      mdsys.sdo_elem_info_array(1,1003,3),
   8                      mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9                'sdo_num_res=10',1) = 'TRUE'
  10     and s.storetype = 'SHOE'
  11*  order by 3
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        271 SHOE                215195.12
         60 SHOE                215307.04
        380 SHOE                215909.35

Yes, it works, but it is subject to one guessing a value for sdo_num_res that will work in all cases. For example, sdo_num_res=10 does not work for COMPUTER stores:

 gis@XE>  select id,
   2         storetype,
   3         sdo_nn_distance(1)
   4    from store s
   5   where sdo_nn(s.geom,
   6                mdsys.sdo_geometry(2003,NULL,NULL,
   7                      mdsys.sdo_elem_info_array(1,1003,3),
   8                      mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9                'sdo_num_res=10',1) = 'TRUE'
  10     and s.storetype = 'COMPUTER'
  11*  order by 3
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        100 COMPUTER           215228.103
        465 COMPUTER            215717.65

See, we only got two! Sure, we would increase the sdo_num_res batch size again but we are playing a guessing game.

This is why the sdo_batch_size parameter exists. It exists, as the documentation says,”If any geometries in the table might be nearer than the geometries specified in the WHERE clause”. We know SHOE stores in the table are nearer than our COMPUTER stores but it is COMPUTER stores that we want.

 gis@XE>  select id,
   2         storetype,
   3         sdo_nn_distance(1)
   4    from store s
   5   where sdo_nn(s.geom,
   6                mdsys.sdo_geometry(2003,NULL,NULL,
   7                      mdsys.sdo_elem_info_array(1,1003,3),
   8                      mdsys.sdo_ordinate_array(380004,5100003,390000,5160000)),
   9                'sdo_batch_size=0',1) = 'TRUE'
  10     and s.storetype = 'COMPUTER'
  11     and rownum < 4
  12*  order by 3
 gis@XE> /
         ID STORETYPE  SDO_NN_DISTANCE(1)
 ---------- ---------- ------------------
        100 COMPUTER           215228.103
        465 COMPUTER            215717.65
          1 COMPUTER           216857.982

A correct result. I will leave you to read up on sdo_batch_size parameter value setting in the documentation.

How do we find the 3 nearest SHOE stores to every SHOE store in the STORE table? As follows?

 gis@XE> select /*+ ORDERED USE_NL(s,s2)*/
   2         s.id,
   3         s2.id as nearestStoreId,
   4         sdo_nn_distance(1) as distance
   5    from store s,
   6   store s2
   7   where s.storetype = 'SHOE'
   8     and sdo_nn(s2.geom,
   9                s.geom,
  10                'sdo_batch_size=10',1) = 'TRUE'
  11     and s2.storetype = 'SHOE'
  12     and s2.id <> s.id
  13     and rownum < 4
  14   order by 1,3
  15  /
         ID NEARESTSTOREID   DISTANCE
 ---------- -------------- ----------
        394             39 5631.82065
        394            413  6387.1917
        394            462 6927.07224

Anyone spot the error? The rownum < 4 is applied to the final rowset and not to each s.store and its 3 nearest neighbours!

What we need to do is order the query such that the rownum test is applied to the neighbours of each and every store. One way to do this is via a correlated subquery as follows:

 gis@XE> select /*+ ORDERED USE_NL(s,s2)*/
   2         s.id,
   3         s2.id as nearestStoreId,
   4         mdsys.sdo_geom.sdo_distance(s.geom,s2.geom,0.05) as distance
   5    from store s,
   6   store s2
   7   where s.storetype = 'SHOE'
   8.    AND s2.id in (select id
   9                from store s3
  10               where sdo_nn(s3.geom,s.geom,'sdo_batch_size=10',1) = 'TRUE'
  11                      and s3.storetype = 'SHOE'
  12                      and s3.id <> s.id
  13                      and rownum < 4)
  14*  order by 1,2
 gis@XE> /
         ID NEARESTSTOREID   DISTANCE
 ---------- -------------- ----------
 ....
        490             82 7933.95789
        490            233 7704.35301
        490            480 5899.48249
        495            159 4850.15973
        495            243 5073.66333
        495            318 7008.11984
        499             41 2635.73385
        499             74 4104.74433
        499            430 4327.98326

A correct result.

Note: Because of limitations with returned values from correlated sub-queries, we cannot access any sdo_nn_distance() values so we simply use the sdo_geom.sdo_distance function to get out required distance. If I can come up with a better query (or perhaps my readers may have a better approach) I will amend this blog posting.

Quality of Returned Distance

What I want to turn to the types of distances SDO_NN calculates. It is an approximation to the actual geometric distance done so that the speed of the SDO_NN operator remains fast (which it is) through mainly RTree processing or is it computed by looking at the actual geometries? All I will do is conduct a very simple test using a point and a line.

In the Codesys.ProjLine2D table there is a simple, straight line geometry with linetype of ‘STRAIGHTVERTEX’ as follows:

 INSERT INTO ProjLine2D VALUES(
         'STRAIGHTVERTEX',
         SDO_GEOMETRY(2002, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY( 380000.0, 5100000.0, 380000.0, 5160000.0)));
 

You will note that it is a horizontal line of length 60,000 meters. If I search this feature using two points to show that the SDO_NN distance calculation is quite accurate.

 gis@XE> select sdo_nn_distance(1)
   2    from codesys.projline2d p
   3    where p.linetype = 'STRAIGHTVERTEX'
   4    and sdo_nn(p.geom,mdsys.sdo_geometry(2001,NULL,mdsys.sdo_point_type(380010.0, 5160000.0,NULL),NULL,NULL),
   5*  'sdo_num_res=1',1) = 'TRUE'
 gis@XE> /
 SDO_NN_DISTANCE(1)
 ------------------
                 10
 gis@XE> select sdo_nn_distance(1)
   2    from codesys.projline2d p
   3    where p.linetype = 'STRAIGHTVERTEX'
   4    and sdo_nn(p.geom,mdsys.sdo_geometry(2001,NULL,mdsys.sdo_point_type(380010.0, 5155550.0,NULL),NULL,NULL),
   5*  'sdo_num_res=1',1) = 'TRUE'
 gis@XE> /
 SDO_NN_DISTANCE(1)
 ------------------
                 10

So, SDO_NN must look at the geometries of the objects it deals with to calculate its distances and not rely on MBRs and object vertices.

Hints

You will have noted that that all the queries above that involve a single table require no hints (as expected). However, the query against the two STORE tables above used the ORDERED and USE_NL (USE_Nested_Loops) hints. Why is this? Occasionally, queries just won’t run. For example, here is a query that does not work for the data I ship with my free PL/SQL packages on the copy of XE (10gr) I am running:

 gis@XE> select p.id,
   2         l.linetype,
   3         sdo_nn_distance(1)
   4    from codesys.projpoint2d sample (0.1) p,
   5         codesys.projline2d l
   6   where sdo_nn(l.geom,p.geom,'sdo_num_res=5',1) = 'TRUE'
   7     and l.linetype like '%VERTEX%'
   8  /
 select p.id,
 *
 ERROR at line 1:
 ORA-13249: SDO_NN cannot be evaluated without using index
 ORA-06512: at "MDSYS.MD", line 1723
 ORA-06512: at "MDSYS.MDERR", line 17
 ORA-06512: at "MDSYS.PRVT_IDX", line 9

I have note that this can happen where a non-spatial attribute used in a predicate has its own index (and sometimes not). This issue is documented in the Spatial Operators chapter of the Oracle Spatial documentation and also in the excellent book Pro Oracle Spatial. The recommended solution is to include hints. Sometimes adding the /*+ORDERED*/ hint on its own can work but in other cases not. The ones recommended in the book involve knowing about the name of the RTRee and ordinary indexes over the CODESYS.PROJLINE2D table (/*+INDEX(rtree index name)*/). Similarly for the documentation though it does suggested the LEADING hint in cases involving a join between the two tables in the FROM clause used in the SDO_NN operator. I have found that, at times, the ORDERED, INDEX and NO_INDEX hints are not enough: I have had to specify the LEADING or USE_NL (USE_Nested_Loops) hints as in the following examples:

 gis@XE> select /*+ORDERED USE_NL(p,l) */
   2         p.id,
   3         l.linetype,
   4         sdo_nn_distance(1)
   5    from codesys.projpoint2d sample (0.1) p,
   6         codesys.projline2d l
   7   where sdo_nn(l.geom,p.geom,'sdo_num_res=5',1) = 'TRUE'
   8     and l.linetype like '%VERTEX%'
   9 /
         ID LINETYPE                                 SDO_NN_DISTANCE(1)
 ---------- ---------------------------------------- ------------------
        393 VERTEXARC                                        156738.398
        393 VERTEX                                           241507.531
        393 STRAIGHTVERTEX                                   249322.965
        393 45DEGREEVERTEX                                   300109.767

You should read the documentation and be prepared to experiment with different hints.

A Real Example

Finally, I thought I would leave you with a real example derived from something I had to do for a customer recently. What we will do is find the nearest vertex-to-vertex segment (greater than 5 meters in length) of a land parcel polygon to a road centreline with the same name. Here is a picture of some data and a road centreline. Note that the distance to the road centreline will be the same for a parcel side boundary where it joins a front boundary as both share the same (corner) vertex. To get around this we will use the mid-point of each segment.

!/images/15.gif (Before sdo_nn/min-point processing)!

The identifiers of the hightlighted land parcels are: oid in ( 26388, 26386, 26387, 26392, 26390, 26391, 26389 )

Here is a query that achieves what we want to do.

 gis@XE> select /*+ ORDERED USE_NL(a,s)*/
   2           a.oid,
   3             sdo_geom.sdo_length(sdo_geometry(2002,8307,null,
   4                                              sdo_elem_info_array(1,2,1),
   5                                              sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)),
   6                                    0.05) length
   7          from land_parcel a,
   8               table(codesys.geom.getpipedvector2d(a.geom)) b,
   9               road_centreline s
  10          where a.oid in ( 26388, 26386, 26387, 26392, 26390, 26391, 26389 )
  11            and sdo_nn(s.geom,
  12                       sdo_geometry(2001,8307, sdo_point_type((b.startcoord.x+b.endcoord.x)/2,(b.startcoord.y+b.endcoord.y)/2,null),
  13                                    NULL,NULL),
  14                       'sdo_num_res=1') = 'TRUE'
  15            and s.street_name = a.street_name
  16            and sdo_geom.sdo_length(sdo_geometry(2002,8307,null,
  17                                                 sdo_elem_info_array(1,2,1),
  18                                                 sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)),
  19                                    0.05) > 5
  20  order by 1,2
  21  /
        OID     LENGTH
 ---------- ----------
      26386  16.598107
      26386 35.6314592
      26386 35.6419791
      26387 15.5455833
      26387 17.5226582
      26387 32.3342913
      26387 35.6419791
      26388 11.9649317
      26388 32.3342913
      26388 37.0525111
      26389 13.0049383
      26389 36.2842051
      26389 37.0525111
      26390 16.9069198
      26390 36.2708158
      26390 36.2842051
      26391 18.5583617
      26391 19.6084577
      26391 36.2663495
      26391 36.6570959
      26392  18.900781
      26392 36.2663495
      26392 36.2708158
 
 23 rows selected.

But we want the nearest vector for each OID. The SQL is:

 gis@XE> select /*+ ORDERED USE_NL(a,s) */
   2             a.oid,
   3             min(sdo_geom.sdo_length(sdo_geometry(2002,8307,null,
   4                                              sdo_elem_info_array(1,2,1),
   5                                              sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)),
   6                                    0.05) ) as length
   7          from land_parcel a,cad_poly
   8               table(codesys.geom.getpipedvector2d(a.geom)) b,
   9               road_centreline s
  10          where a.oid in ( 26388, 26386, 26387, 26392, 26390, 26391, 26389 )
  11            and sdo_nn(s.geom,
  12                         sdo_geometry(2001,8307, sdo_point_type((b.startcoord.x+b.endcoord.x)/2,(b.startcoord.y+b.endcoord.y)/2,null),
  13                                      NULL,NULL),
  14                       'sdo_num_res=1') = 'TRUE'
  15            and s.street_name = a.street_name
  16            and sdo_geom.sdo_length(sdo_geometry(2002,8307,null,
  17                                                 sdo_elem_info_array(1,2,1),
  18                                                 sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)),
  19                                    0.05) > 5
  20        group by a.oid
  21  /
        OID     LENGTH
 ---------- ----------
      26388 11.9649317
      26392  18.900781
      26391 18.5583617
      26387 15.5455833
      26390 16.9069198
      26386  16.598107
      26389 13.0049383
 
 7 rows selected.

Finally, while we could use the MIN() function with an appropriate GROUP BY clause to return the vector with the minimum distance to the road centreline, we cannot easily return the midpoint of the nearest vector of each OID using these operators. To do so requires some tricky SQL using a rank/partition analytic to find the first vector whose distance to the road centreline is the minimum of all vectors that make up any one land parcel (rank = 1). Here is the final result.

 gis@XE> select oid,cldist,midpoint,parcel_vector
   2   from (select oid,cldist,midpoint,parcel_vector,
   3                RANK() OVER (PARTITION BY oid ORDER BY cldist) as oidrank
   4.          from ( select /*+ ORDERED USE_NL(a,s)*/
   5                         a.oid,
   6                         sdo_geom.sdo_distance(s.geom,
   7                                               sdo_geometry(2001,8307,
   8                                               sdo_point_type((b.startcoord.x+b.endcoord.x)/2,(b.startcoord.y+b.endcoord.y)/2,null),
   9                                               NULL,NULL),
  10                                               0.05) as cldist,
  11                         sdo_geometry(2002,8307,null,
  12                                      sdo_elem_info_array(1,2,1),
  13                                      sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)) as parcel_vector,
  14                         sdo_geometry(2001,8307,
  15                                      sdo_point_type((b.startcoord.x+b.endcoord.x)/2,(b.startcoord.y+b.endcoord.y)/2,null),
  16                                      NULL,NULL) as midpoint
  17                    from land_parcel a,
  18                         table(codesys.geom.getpipedvector2d(a.geom)) b,
  19                         road_centreline s
  20                    where a.oid in ( 26388, 26386, 26387, 26392, 26390, 26391, 26389 )
  21                      and sdo_nn(s.geom,
  22                                sdo_geometry(2001,8307,sdo_point_type((b.startcoord.x+b.endcoord.x)/2,(b.startcoord.y+b.endcoord.y)/2,null),NULL,NULL),
  23                                'sdo_num_res=1') = 'TRUE'
  24                     and s.street_name = a.street_name
  25                     and sdo_geom.sdo_length(sdo_geometry(2002,8307,null,
  26                                                          sdo_elem_info_array(1,2,1),
  27                                                          sdo_ordinate_array(b.startcoord.x,b.startcoord.y,b.endcoord.x,b.endcoord.y)),
  28                                             0.05) > 5
  29               )
  30        )
  31   where oidrank = 1
  32  /
        OID     CLDIST
 ---------- ----------
 MIDPOINT(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
 -------------------------------------------------------------------------------
 PARCEL_VECTOR(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINATES)
 ------------------------------------------------------------------------------------
      26386 5.72710973
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.673904, -37.862795, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.673898, -37.862869, 144.673909, -37.86272))
      26387 10.1054417
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.67386, -37.862933, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.673822, -37.862996, 144.673898, -37.862869))
     26388 16.3701483
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.673864, -37.863038, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.673905, -37.863081, 144.673822, -37.862996))
     26389 11.580563
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.673974, -37.863058, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.674042, -37.863036, 144.673905, -37.863081))
     26390 6.43017772
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.674138, -37.863039, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.674234, -37.863043, 144.674042, -37.863036))
     26391 5.70850051
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.674554, -37.863055, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.674659, -37.863059, 144.674449, -37.863051))
     26392 6.07655561
 SDO_GEOMETRY(2001, 4030, SDO_POINT_TYPE(144.674341, -37.863047, NULL), NULL, NULL)
 SDO_GEOMETRY(2002, 4030, NULL, SDO_ELEM_INFO_ARRAY(1, 2, 1), SDO_ORDINATE_ARRAY(144.674449, -37.863051, 144.674234, -37.863043))
 
 7 rows selected.

Mapping these midpoints (large gray circles) produces the following.

!/images/14.gif (After sdo_nn/min-point processing)!

Given Thomas’s comments below I thought I would show an image of the solution I created from a new algorithm that is much improved algorithm over the one above. Here the requirement is to place a centroid point 3 meters inside the road boundary of a land parcel: if possible the centroid should be in the middle of the road frontage. If anyone is interested in using this algorithm (encapsulated inside a fairly short PL/SQL Procedure), please contact me, but generally the algorithm:

  • Filters out all boundaries that are shared in adjacent polygons;
  • Takes advantage of the anti-clockwise winding of the outer shell of a polygon to create the point 3 meters inside the boundary;
  • Uses SDO_NN to find the correct boundary (both the land parcel and road centreline data I was provided have road name as an attribute);
  • Uses my CENTROID.sdo_centroid() function to find the middle point of the chosen road frontage.

Here is the finaly image after processing (I know it is a bit garish but I have deliberately coloured the parcel and road centrelines by their road name to show how the algorithm correctly finds the land parcel frontage closest to its named road).

!http://www.spatialdbadvisor.com/images/65.gif (Land Parcel Centroids 3m Inside Road Frontage)!

If you reached the end of this article, thanks for persevering with the material. I hope you got something out of it.