Top 5 Recent Articles
ARTICLES CATEGORIES
- Algorithms (22)
- All (399)
- Biography (1)
- Blog (44)
- Business Requirements (1)
- Commentary (1)
- Conversion (2)
- Customers (2)
- Data Models (1)
- Education (2)
- GeoRaptor (13)
- GPS (1)
- Image Processing (2)
- Import Export (8)
- Licensing (2)
- LiDAR (1)
- Linear Referencing (4)
- Manifold GIS (3)
- Mapping (1)
- MySQL Spatial (7)
- Networking and Routing (including Optimization) (5)
- Open Source (18)
- Oracle Spatial and Locator (194)
- Partitioning (1)
- PostGIS (36)
- Projections (1)
- Published Articles (1)
- qGIS (1)
- Recommendations (1)
- Services (1)
- Software Change Log (1)
- Source Code (37)
- Space Curves (9)
- Spatial Database Functions (109)
- Spatial DB comparison (1)
- Spatial XML Processing (11)
- SQL Server Spatial (92)
- Standards (3)
- Stored Procedure (17)
- Tessellation or Gridding (10)
- Tools (2)
- Topological Relationships (1)
- Training (2)
- Triangulation (2)
Effects of Sdo_Geometry Ordinate Precision on Storage
In this article I am going look at the effect ordinate precision has on the storage space occupied by tables containing sdo_geometry data. In a related article I will also look at the effect storage precision may have on performance (ie searching and returning data in SQL).
Sdo_Geometry Coordinate Precision
There is a little known side effect of using too many digits of precision when describing and storing spatial data using Oracle’s SDO_GEOMETRY object type. That side effect is that the storage requirements can be significantly greater than expected.
Unlike other spatial storage formats (normally some sort of compressed binary) Oracle storage is open. Very open. Oracle stores the ordinates describing the spatial data (SDO_GEOMETRY) in an array of NUMBER called SDO_ORDINATE_ARRAY.
Note: This is an array of NUMBER and not an array of binary double. Oracle stores a NUMBER in a variable storage format so small numbers take less space than larger NUMBERs. A BINARY_DOUBLE’s storage size is fixed regardless as to the precision of the data being held.
The ramification of this is that your storage costs are in relation to the number of digits used.
Simple Example
The Google Mercator Map coordinates that are created by projection from the original longitude/latitude data are unnecessarily large as can be shown from the following example:
SELECT sdo_cs.transform(sdo_geometry(2002,8311,NULL, sdo_elem_info_array(1,2,1), sdo_ordinate_array(147.123,-32.456,147.672,-33.739)),3785) AS geom FROM dual; -- Results -- GEOM ------------------------------------------------------------------------------------------------------------------- MDSYS.SDO_GEOMETRY(2002,3785,NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1), MDSYS.SDO_ORDINATE_ARRAY(16377657.4439788,-3800381.82007675,16438771.8444243,-3970070.49100647))
Being that the input data was only specified to 0.001 of a degree, an output – in meters – specified to 8 decimal places seems somewhat excessive.
But if I rounded this data to 1 decimal place what would be the potential saving in storage?
Oracle’s documentation shows how to calculate the size of a NUMBER as stored in the database. As such, an analysis was carried out on a table of Australian land parcels to see what the effect of rounding the ordinates to 1cm and 1mm would have on data size.
Let’s apply a bit of SQL to compute NUMBER size for the above Google Mercator geometry before and after a potential rounding of its ordinates:
SELECT SUMX,SUMY, SUMXR2,SUMYR2,100 - ROUND(((SUMXR2+SUMYR2)/(SUMX+SUMY) * 100),1) AS PCT_Save_cm, SUMXR3,SUMYR3,100 - ROUND(((SUMXR3+SUMYR3)/(SUMX+SUMY) * 100),1) AS PCT_Save_mm FROM (SELECT SUM(vsize(t.x)) AS SUMX, SUM(vsize(ROUND(T.X,2))) AS SUMXR2, SUM(vsize(ROUND(T.X,3))) AS SUMXR3, SUM(vsize(t.y)) AS SUMY, SUM(vsize(ROUND(T.Y,2))) AS SUMYR2, SUM(vsize(ROUND(T.Y,3))) AS SUMYR3 FROM (SELECT sdo_cs.transform( sdo_geometry(2002,8311,NULL, sdo_elem_info_array(1,2,1), sdo_ordinate_array(147.123,-32.456,147.672,-33.739)),3785) AS geom FROM dual) S, TABLE(SDO_UTIL.GETVERTICES(S.GEOM)) T ) f; -- Results -- SUMX SUMY SUMXR2 SUMYR2 PCT_SAVE_CM SUMXR3 SUMYR3 PCT_SAVE_MM ---- ---- ------ ------ ----------- ------ ------ ----------- 18 20 12 14 31.6 14 15 23.7
So, trimming the X and Y ordinates to 1cm (2 digits of precision) should produce a 31.6 percent reduction in storage, while trimming to 1mm should produce a 23.7 percent reduction in storage.
That’s a single object, what about geometries stored in tables?
Trimming or Rounding
To execute the trimming of an actual sdo_geometry object requires a relatively simply PL/SQL function. I created just such a function a long time ago which is available in my GEOM PL/SQL package available for free on this website. A standalone version of this function is presented in an associated article on this website called Rounding Coordinates or Ordinates in SDO_GEOMETRY
Baseline
Note that the original parcel data is also over specified:
SELECT geom FROM parcel a WHERE sdo_util.getNumVertices(a.geom) < 6 AND rownum < 2; -- Results -- GEOM ---------------------------------------------------------------------------------------------------------------- MDSYS.SDO_GEOMETRY(2003,82469,NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1,1003,1), MDSYS.SDO_ORDINATE_ARRAY(447291.935307683,6470098.06758179,447294.953958522,6470116.87270875, 447258.829362703,6470122.5382358,447255.811490383,6470103.73299092, 447291.935307683,6470098.06758179))
So, before using the RoundOrdinates function, we need to get a baseline measure of storage.
WITH object_storage AS ( SELECT segment_name AS TABLE_NAME, 'ALL' AS column_name, '' AS segment_name, '' AS segment_subtype, s.bytes, s.blocks, s.extents FROM dba_segments s WHERE s.owner = 'LAND' AND s.segment_name IN ('PARCEL') UNION ALL SELECT l.TABLE_NAME,l.column_name, s.segment_name, '' AS segment_subtype, s.bytes, s.blocks, s.extents FROM user_lobs l, dba_segments s WHERE l.TABLE_NAME IN ('PARCEL') AND s.segment_name = l.segment_name ) SELECT TABLE_NAME,column_name,segment_name, to_char(SUM(bytes),'FM9,999,999,999,999') AS bytes,SUM(blocks) AS blocks,SUM(extents) AS extents FROM object_storage GROUP BY TABLE_NAME,column_name,segment_name; . TABLE_NAME COLUMN_NAME SEGMENT_NAME BYTES BLOCKS EXTENTS ------ ---------------------- ------------------------- ---------- ------ ------- PARCEL ALL 29,360,128 3584 43 PARCEL "GEOM"."SDO_ORDINATES" SYS_LOB0000077718C00021$$ 6,291,456 768 21 PARCEL "GEOM"."SDO_ELEM_INFO" SYS_LOB0000077718C00020$$ 65,536 8 1
Now, let’s make the change to the ordinates.
Changed Ordinates
DROP TABLE LM_PARCEL_P; . CREATE TABLE LM_PARCEL_P (FID,AREA,GID,LOT_NUMBER,SECTION,DP_NUMBER,ID_PARCEL_TYPE,ID_PARCEL_LEGAL_STATUS,ASSET,ADMIN_CODE,ID_ORIGIN,ID_ACCURACY,NARRATIVE,GEOM) NOLOGGING AS SELECT FID,AREA,GID,LOT_NUMBER,SECTION,DP_NUMBER,ID_PARCEL_TYPE,ID_PARCEL_LEGAL_STATUS,ASSET,ADMIN_CODE,ID_ORIGIN,ID_ACCURACY,NARRATIVE,CODESYS.ROUNDORDINATES(p.GEOM,0.0005) AS GEOMETRY FROM PARCEL p; commit; . ALTER TABLE LM_PARCEL_P LOGGING; . -- Calculate space usage of table and LOBs WITH object_storage AS ( SELECT segment_name AS TABLE_NAME, 'ALL' AS column_name, '' AS segment_name, '' AS segment_subtype, s.bytes, s.blocks, s.extents FROM dba_segments s WHERE s.owner = 'LAND' AND s.segment_name IN ('PARCEL_P') UNION ALL SELECT l.TABLE_NAME,l.column_name, s.segment_name, '' AS segment_subtype, s.bytes, s.blocks, s.extents FROM user_lobs l, dba_segments s WHERE l.TABLE_NAME IN ('PARCEL_P') AND s.segment_name = l.segment_name ) SELECT TABLE_NAME,column_name,segment_name, to_char(SUM(bytes),'FM9,999,999,999,999') AS bytes,SUM(blocks) AS blocks,SUM(extents) AS extents FROM object_storage GROUP BY TABLE_NAME,column_name,segment_name; . TABLE_NAME COLUMN_NAME SEGMENT_NAME BYTES BLOCKS EXTENTS -------- ---------------------- ------------------------- ----------- ------ ------- PARCEL_P ALL 25,165,824 3072 39 PARCEL_P "GEOM"."SDO_ORDINATES" SYS_LOB0000077747C00021$$ 3,145,728 384 18 PARCEL_P "GEOM"."SDO_ELEM_INFO" SYS_LOB0000077747C00020$$ 65,536 8 1
Results
The difference in sdo_ordinate storage is:
3,145,728 / 6,291,456 * 100 = 50%.
And of the table’s storage:
25,165,824 / 29,360,128 = 85%
Not bad.
So, the benefits of a decrease in disk storage and traffic should be obvious to all. But can we see any statistical benefits in this change? (These calculations should be ratified before making any decisions with respect to your data.) I will include some numbers in the next few days.
I hope this is useful to someone.
If you found this article informative, try the followup article on the effect rounding has on precision.
Documentation
- MySQL Spatial General Functions
- Oracle LRS Objects
- Oracle Spatial Exporter (Java + pl/SQL)
- Oracle Spatial Object Functions
- Oracle Spatial Object Functions (Multi Page)
- PostGIS pl/pgSQL Functions
- SC4O Oracle Java Topology Suite (Java + pl/SQL)
- SQL Server Spatial General TSQL Functions
- SQL Server Spatial LRS TSQL Functions