Duplicate Geometry data and Data Models

Over on the Oracle Technology Spatial Forum this question was asked:

I want to store duplicate geometry intentionally, will it have any problems ??
Because same data comes from different sources and going to be big database system.

(The article is here )

Here is my answer repackaged for this blog.

The questioner’s question can be turned around to point to a more generic question (for remember: “spatial is not special”!).

Do database tables with date columns mind if you store the same date more than once? Same for numbers, strings etc?

Clearly, the answer is no!

Attribute Uniqueness

Now, if a duplicate date was not allowed for the same observation of a plant in a table called FLORA, we would stop this by creating a unique constraint on the table:

 ALTER TABLE flora ADD CONSTRAINT flora_plant_uq UNIQUE(plant_id,obs_date);

So the issue about what constitutes valid duplication or not is the way you implement constraints on the table.

Constraints and Models

Constraints reflect the “business rules” that constrain the data you are storing so that it is “fit for [your] purpose”

So, in the end, it is all about the data model.

One should start all data storage by creating a data model that exposes the rules you want for the data.

So, for our OBS_DATE column in our FLORA table we may wish that the values entered are valid. But we define valid to mean from 1st January 2005 onwards.

 ALTER TABLE flora ADD CONSTRAINT flora_obs_date_ck CHECK(obs_date >= '01/JAN/2005');

SQL Developer 3.x has a free data modelling tool built in: it is an excellent tool for designing basic data models.

Geometry Constraints

It is difficult to implement uniqueness constraints that involve sdo_geometry data. Perhaps one day our SQL/OGC standards bodies and implementing database vendors will catch up on this problem.

Even so, we can do something. For example, if the flora table had an sdo_geometry column, called location, which was a point, this will work:

 ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (location.sdo_point.x,location.sdo_point.y);
 -- or with other columns
 ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (plant_id,obs_date,location.sdo_point.x,location.sdo_point.y);

However, this would not work for linestring/polygon data. For example, let’s assume the FLORA table allowed observations of a plant to include a patch ie not a single plant (eg a tree) but an area of blackberries. We can’t do the following because one cannot reference a function in the constraint:

 ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (plant_id,obs_date,location.get_wkt());

OK, so how can we do this?

AFAIK, we can only do this via a trigger:

 CREATE OR REPLACE TRIGGER flora_location_ai
 after INSERT ON flora
 FOR each ROW
 DECLARE
   v_ok pls_integer;
 BEGIN
   SELECT 1
     INTO v_ok
     FROM flora g
    WHERE g.plant_id = NEW.plant_id
      AND g.obs_date = NEW.obs_date
      AND SDO_EQUAL(g.geom,:NEW.geom) = 'TRUE';
   -- We found a duplicate, so raise an error
   raise_application_error(-20001,'Duplicate location found for (' || :NEW.plant_id || ',' || NEW.obs_date || ')');
   EXCEPTION
      WHEN NO_DATA_FOUND THEN
          RETURN; -- No duplicates
 END;
 /

Summary

So, in short, you can store anything in any Oracle table. In a table with 1 million rows you could have:

1. Only 2 unique values of an column called SPECIES.
2. 20,000 unique OBS_DATE values.
3. 200,00 unique LOCATION values.

But whether this is right or not (garbage) depends on the model and the constraints that represent “correctness”.

Folks, the best advice I can ever give you is: Start with the model.