Top 5 Recent Articles
- Algorithms (15)
- All (399)
- Biography (1)
- Blog (44)
- Business Requirements (1)
- Commentary (1)
- Customers (2)
- Data Models (1)
- Education (2)
- GeoRaptor (13)
- Image Processing (2)
- Import Export (8)
- Licensing (2)
- Linear Referencing (4)
- Manifold GIS (3)
- Mapping (1)
- MySQL Spatial (7)
- Networking and Routing (including Optimization) (3)
- Open Source (18)
- Oracle Spatial and Locator (192)
- PostGIS (34)
- Published Articles (1)
- Recommendations (1)
- Services (1)
- Software Change Log (1)
- Source Code (35)
- Space Curves (9)
- Spatial Database Functions (106)
- Spatial DB comparison (1)
- Spatial XML Processing (10)
- SQL Server Spatial (General) (83)
- SQL Server Spatial (LRS) (38)
- Standards (1)
- Stored Procedure (15)
- Tessellation or Gridding (10)
- Tools (2)
- Topological Relationships (1)
- Training (2)
- Triangulation (2)
Duplicate Geometry data and Data Models
Over on the Oracle Technology Spatial Forum this question was asked:
I want to store duplicate geometry intentionally, will it have any problems ??
Because same data comes from different sources and going to be big database system.
(The article is here )
Here is my answer repackaged for this blog.
The questioner’s question can be turned around to point to a more generic question (for remember: “spatial is not special”!).
Do database tables with date columns mind if you store the same date more than once? Same for numbers, strings etc?
Clearly, the answer is no!
Now, if a duplicate date was not allowed for the same observation of a plant in a table called FLORA, we would stop this by creating a unique constraint on the table:
ALTER TABLE flora ADD CONSTRAINT flora_plant_uq UNIQUE(plant_id,obs_date);
So the issue about what constitutes valid duplication or not is the way you implement constraints on the table.
Constraints and Models
Constraints reflect the “business rules” that constrain the data you are storing so that it is “fit for [your] purpose”
So, in the end, it is all about the data model.
One should start all data storage by creating a data model that exposes the rules you want for the data.
So, for our OBS_DATE column in our FLORA table we may wish that the values entered are valid. But we define valid to mean from 1st January 2005 onwards.
ALTER TABLE flora ADD CONSTRAINT flora_obs_date_ck CHECK(obs_date >= '01/JAN/2005');
SQL Developer 3.x has a free data modelling tool built in: it is an excellent tool for designing basic data models.
It is difficult to implement uniqueness constraints that involve sdo_geometry data. Perhaps one day our SQL/OGC standards bodies and implementing database vendors will catch up on this problem.
Even so, we can do something. For example, if the flora table had an sdo_geometry column, called location, which was a point, this will work:
ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (location.sdo_point.x,location.sdo_point.y); -- or with other columns ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (plant_id,obs_date,location.sdo_point.x,location.sdo_point.y);
However, this would not work for linestring/polygon data. For example, let’s assume the FLORA table allowed observations of a plant to include a patch ie not a single plant (eg a tree) but an area of blackberries. We can’t do the following because one cannot reference a function in the constraint:
ALTER TABLE flora ADD CONSTRAINT flora_location_uq UNIQUE (plant_id,obs_date,location.get_wkt());
OK, so how can we do this?
AFAIK, we can only do this via a trigger:
CREATE OR REPLACE TRIGGER flora_location_ai after INSERT ON flora FOR each ROW DECLARE v_ok pls_integer; BEGIN SELECT 1 INTO v_ok FROM flora g WHERE g.plant_id = NEW.plant_id AND g.obs_date = NEW.obs_date AND SDO_EQUAL(g.geom,:NEW.geom) = 'TRUE'; -- We found a duplicate, so raise an error raise_application_error(-20001,'Duplicate location found for (' || :NEW.plant_id || ',' || NEW.obs_date || ')'); EXCEPTION WHEN NO_DATA_FOUND THEN RETURN; -- No duplicates END; /
So, in short, you can store anything in any Oracle table. In a table with 1 million rows you could have:
1. Only 2 unique values of an column called SPECIES.
2. 20,000 unique OBS_DATE values.
3. 200,00 unique LOCATION values.
But whether this is right or not (garbage) depends on the model and the constraints that represent “correctness”.
Folks, the best advice I can ever give you is: Start with the model.