Loading Spatial Data from an external CSV file in Oracle

There have been many, many articles written on the web describing how to access external data (data held in operating system files not under the control of Oracle and not stored within its own datafiles and tablespaces) from within Oracle such that one can query the data using SQL.

In short, Oracle allows a user to access external data by wrapping a table definition over the top of the external data files and using its SQL loader and DataPump technologies to convert the data “on the fly” as users query it. As the Oracle documentation notes, this is a read-only table whose metadata is stored in the database but whose data in stored outside the database.

The table “wrapper” is achieved by using the CREATE TABLE statement to define the columns of the table in the normal manner, but the appending to the definition its physical_properties, in particular, the ORGANIZATION EXTERNAL clause.

This clause is made up of two dependent sub-clauses, the external_table_clause and the external_data_properties that allows you to specify the properties so described. For example, the TYPE clause of external_table_clause allows you to indicate the driver to be used to access external table’s data. Oracle database provides two access drivers: ORACLE_LOADER and ORACLE_DATAPUMP. Whereas the ACCESS PARAMETERS clause of “external_data_properties” allows you to describe the physical layout of the file (what type of delimiter, number of fields etc).

The best way to describe the external data loading capability of Oracle is to show you how to use it to load a small sample of LiDAR data (after it has been post-processed).

Firstly, we need to place our data in an operating system directory which Oracle can access, (in this example I have placed a file called ground.csv in C:\Temp\Lidar), and then create an Oracle Directory object and grant the user that will create and load the data (here my standard “CODESYS” user), read access on that directory.

 -- Create directory that points to actual data file.
 CREATE OR REPLACE DIRECTORY EXTERNAL_DATA_DIRECTORY AS 'C:\temp\lidar';

 CREATE OR REPLACE DIRECTORY succeeded.

 -- If create directory done by DBA as SYSTEM then you would need to grant the user doing the processing the ability to read from that directory via:
 GRANT READ ON DIRECTORY EXTERNAL_DATA_DIRECTORY TO codesys;

 GRANT READ succeeded.

Now we need to define our table object in Oracle. We also need to describe its external properties as part of the one definition.

 DROP   TABLE lidar;

 DROP   succeeded.

 CREATE TABLE lidar(
   easting NUMBER,
  northing NUMBER,
    height NUMBER
 )
 ORGANIZATION EXTERNAL
 (TYPE ORACLE_LOADER
 DEFAULT DIRECTORY DATA_PUMP_DIR
 ACCESS PARAMETERS (
          RECORDS DELIMITED BY NEWLINE 
          FIELDS TERMINATED BY ',' 
          MISSING FIELD VALUES ARE NULL 
          (easting, northing, height) 
 	 )
 LOCATION (EXTERNAL_DATA_DIRECTORY:'ground.csv'))
 REJECT LIMIT UNLIMITED;

 CREATE TABLE succeeded.

 --Note that if our external file was TAB delimited we would use the following clause:
 -- FIELDS TERMINATED BY 0x'09' 

Now we execute SQL SELECT statements against the table.

 SELECT count(*) 
   FROM lidar;

 COUNT(*)               
 ---------------------- 
 40401                  

 SELECT count(*)
   FROM lidar l
  WHERE l.easting  BETWEEN 478040  AND 478060
    AND l.northing BETWEEN 5228050 AND 5228070
    AND l.height   BETWEEN 200     AND 600;

 1 rows selected
 COUNT(*)               
 ---------------------- 
 301                    

 1 rows selected

I will now demonstrate that you cannot create spatial indexes on this table. Firstly, note that I have no SDO_GEOMETRY column in this table that I can index. However, I could try creating a function-based index (even though I doubt it would work because Oracle indexes normally contain ROWIDs that link the index to the actual row in the table holding the data. External tables do not have ROWIDs.)

It is worth noting the following from the Oracle documentation.

Restrictions on External Tables
External tables are subject to the following restrictions:

  • An external table cannot be a temporary table.
  • You cannot specify constraints on an external table.
  • An external table cannot have object type, varray, or LONG columns.

(I will look at the last limitation at the end of this article.)

Let’s create the function based index in the common manner, including the metadata.

 -- Create Function that returns a point geometry so we can create a function-based index
 --
 create or replace function observation(p_srid in number,
                                          p_x    in number, 
                                          p_y    in number, 
                                          p_z    in number )
        return mdsys.sdo_geometry deterministic
 Is
 Begin
   Return MDSYS.SDO_GEOMETRY(3001,p_srid,mdsys.sdo_point_type(p_x,p_y,p_z),NULL,NULL);
 End observation;
 /
 show errors

 function observation(p_srid Compiled.
 No Errors.

 -- Generate sdo_geom_metadata entry
 --
 DELETE FROM USER_SDO_GEOM_METADATA
       WHERE table_name = 'LIDAR';

 1 rows deleted
 COMMIT;

 COMMIT succeeded.
 
 -- Generate USER_SDO_GEOM_METADATA entry based on actual extent of data in external table.
 DECLARE
   v_minx Number;
   v_maxx Number;
   v_miny Number;
   v_maxy Number;
   v_minz Number;
   v_maxz Number;
 BEGIN
   SELECT min(easting),max(easting),min(northing),max(northing),min(height),max(height)
     INTO       v_minx,      v_maxx,        v_miny,       v_maxy,      v_minz,     v_maxz
     FROM LIDAR;
   INSERT INTO USER_SDO_GEOM_METADATA(
     TABLE_NAME,
     COLUMN_NAME,
     DIMINFO,
     SRID
     ) VALUES (
     'LIDAR',
     'OBSERVATION(28355,EASTING,NORTHING,HEIGHT)',
     MDSYS.SDO_DIM_ARRAY(
       MDSYS.SDO_DIM_ELEMENT('X',v_minx,v_maxx,0.005),
       MDSYS.SDO_DIM_ELEMENT('Y',v_miny,v_maxy,0.005),
       MDSYS.SDO_DIM_ELEMENT('Z',v_minz,v_maxz,0.005)
       ),
     28355
   );
 END;
 /
 SHOW ERRORS

 anonymous block completed
 No Errors.

 COMMIT;

 COMMIT succeeded.

 -- Let's check to make sure the entry has been created
 SELECT TABLE_NAME,COLUMN_NAME,DIMINFO 
   FROM USER_SDO_GEOM_METADATA 
  WHERE TABLE_NAME = 'LIDAR';

 TABLE_NAME COLUMN_NAME                                DIMINFO
 ---------- ------------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------
 LIDAR      OBSERVATION(28355,EASTING,NORTHING,HEIGHT) MDSYS.SDO_DIM_ARRAY(MDSYS.SDO_DIM_ELEMENT(X,478000,478100,0.005),MDSYS.SDO_DIM_ELEMENT(Y,5228000,5228100,0.005),MDSYS.SDO_DIM_ELEMENT(Z,1.019,1999.959,0.005))

 1 rows selected

 -- Now try and create an index on this function

 DROP INDEX LIDAR_GEOM;

 Error starting at line 84 in command:
 DROP INDEX LIDAR_GEOM
 Error report:
 SQL Error: ORA-01418: specified index does not exist
 01418. 00000 -  "specified index does not exist"

 CREATE INDEX LIDAR_GEOM
        ON LIDAR(OBSERVATION(28355,EASTING,NORTHING,HEIGHT))
        INDEXTYPE is mdsys.spatial_index 
        parameters('sdo_indx_dims=2, layer_gtype=point');

 Error starting at line 85 in command:
 CREATE INDEX LIDAR_GEOM
        ON LIDAR(OBSERVATION(28355,EASTING,NORTHING,HEIGHT))
        INDEXTYPE is mdsys.spatial_index 
        parameters('sdo_indx_dims=2, layer_gtype=point')
 Error at Command Line:86 Column:10
 Error report:
 SQL Error: ORA-30657: operation not supported on external organized table
 30657.0000 -  "operation not supported on external organized table"
 *Cause:    User attempted on operation on an external table which is
            not supported.
 *Action:   Don't do that!

Don’t you love the recommended Action!

OK, so our external data of LiDAR points can really on read via full-table scans as one cannot create an index on an externally organized table.

If we want to spatially index we have to bring the data inside the database which we will now do.


 -- Rename our existing LiDAR table to LiDAR_EXT
 DROP TABLE lidar_ext;

 Error starting at line 90 in command:
 drop table lidar_ext
 Error report:
 SQL Error: ORA-00942: table or view does not exist
 00942. 00000 - "table or view does not exist"

 RENAME lidar TO lidar_ext;

 rename lidar succeeded.

 CREATE TABLE lidar
 AS
 SELECT sdo_geometry(3001,28355,sdo_point_type(easting,northing,height),null,null) as geom
   FROM lidar_ext;

 create table succeeded.

 -- Change column name in existing USER_SDO_GEOM_METADATA
 UPDATE user_sdo_geom_metadata
    SET column_name = 'GEOM'
  WHERE table_name = 'LIDAR';

 1 rows updated 

 COMMIT;

 commit succeeded.

 -- Now, rebuild index.
   DROP INDEX LIDAR_GEOM;

 Error starting at line 104 in command:
 DROP INDEX LIDAR_GEOM
 Error report:
 SQL Error: ORA-01418: specified index does not exist
 01418. 00000 - "specified index does not exist"

 CREATE INDEX LIDAR_GEOM
        ON LIDAR(GEOM) 
        INDEXTYPE is mdsys.spatial_index
        parameters('sdo_indx_dims=2, layer_gtype=point');

 CREATE INDEX succeeded.
 
 DROP TABLE LIDAR_EXT;
 
 DROP TABLE LIDAR_EXT succeeded.

So, now we can query the data.

 SELECT count(*)
   FROM lidar l
  WHERE SDO_ANYINTERACT(l.geom,sdo_geometry(2003,28355,null,sdo_elem_info_array(1,3,3),sdo_ordinate_array(478040,5228050,478060,5228070))) = 'TRUE'
    AND l.geom.sdo_point.z BETWEEN 200 AND 600;

 COUNT(*)
 ---------------------- 
 301

 1 rows selected

Data Other than Points

Our example is based on point data that is loaded into three numeric attributes rather than to an SDO_GEOMETRY object directly. As noted above:

  • An external table cannot have object type, varray, or LONG columns.

Instead of modifying my existing example to try and define the SDO_GEOMETRY object directly on the X, Y and Z fields in the CSV, I thought I would attempt to load the linear data associated with my article on executing shp2sdo from within the database.

The shp2sdo utility creates a SQL Loader control file with the actual data stored in-line inside the control file or externally in a data file. This is controlled by the “-d” parameter. I re-ran the tool so that I got the shapefile’s data in the separate data file. I the opened the CTL file, grabbed the load parameters and constructed an appropriate CREATE TABLE statement from it inside Oracle database. Here is that statement:

 CREATE OR REPLACE DIRECTORY TEMP_DATA_DIRECTORY AS 'C:\temp';

 -- GRANT READ ON DIRECTORY TEMP_DATA_DIRECTORY TO codesys;

 CREATE TABLE GEODATA250K_TAS_ROADS 
    ( 	 GID NUMBER(38,0), 
 	     CLASS VARCHAR2(20 BYTE), 
 	     FEATWIDTH VARCHAR2(20 BYTE), 
 	     NRN VARCHAR2(20 BYTE), 
	     PID VARCHAR2(20 BYTE), 
	     FEATREL VARCHAR2(20 BYTE), 
	     FORMATION VARCHAR2(20 BYTE), 
	     ATTRREL VARCHAR2(20 BYTE), 
	     RETIRED VARCHAR2(20 BYTE), 
	     SRN VARCHAR2(20 BYTE), 
	     SOURCE VARCHAR2(20 BYTE), 
	     SHAPE_LENG VARCHAR2(20 BYTE), 
	     NAME VARCHAR2(20 BYTE), 
	     TEXTNOTE VARCHAR2(20 BYTE), 
             FEATTYPE VARCHAR2(20 BYTE), 
	     PLANACC VARCHAR2(20 BYTE), 
	     SYMBOL VARCHAR2(20 BYTE), 
	     CREATED VARCHAR2(20 BYTE), 
	     PEANO NUMBER, 
	     GEOM MDSYS.SDO_GEOMETRY
 )
 ORGANIZATION EXTERNAL
 (TYPE ORACLE_LOADER
 DEFAULT DIRECTORY TEMP_DATA_DIRECTORY
 ACCESS PARAMETERS (
  CONTINUEIF NEXT(1:1) = '#'
  FIELDS TERMINATED BY '|'
  TRAILING NULLCOLS (
    GID INTEGER EXTERNAL,
    CLASS 	     NULLIF CLASS = BLANKS,
    FEATWIDTH 	 NULLIF FEATWIDTH = BLANKS,
    NRN  NULLIF NRN = BLANKS,
    PID  NULLIF PID = BLANKS,
    FEATREL 	 NULLIF FEATREL = BLANKS,
    FORMATION 	 NULLIF FORMATION = BLANKS,
    ATTRREL 	 NULLIF ATTRREL = BLANKS,
    RETIRED 	 NULLIF RETIRED = BLANKS,
    SRN  NULLIF SRN = BLANKS,
    SOURCE 	     NULLIF SOURCE = BLANKS,
    SHAPE_LENG 	 NULLIF SHAPE_LENG = BLANKS,
    NAME 	     NULLIF NAME = BLANKS,
    TEXTNOTE 	 NULLIF TEXTNOTE = BLANKS,
    FEATTYPE 	 NULLIF FEATTYPE = BLANKS,
    PLANACC 	 NULLIF PLANACC = BLANKS,
    SYMBOL 	     NULLIF SYMBOL = BLANKS,
    CREATED 	 NULLIF CREATED = BLANKS,
    PEANO,
    GEOM COLUMN OBJECT 
    (
      SDO_GTYPE       INTEGER EXTERNAL, 
      SDO_SRID        INTEGER EXTERNAL, 
      SDO_ELEM_INFO   VARRAY TERMINATED BY '|/' 
        (X            FLOAT EXTERNAL), 
      SDO_ORDINATES   VARRAY TERMINATED BY '|/' 
        (X            FLOAT EXTERNAL) 
    )
 )
 )
 LOCATION (TEMP_DATA_DIRECTORY:'GeoData250K_Tas_roads.dat'))
 REJECT LIMIT UNLIMITED;

On execution this is what happened.

 Error starting at line 4 in command:
 CREATE TABLE GEODATA250K_TAS_ROADS 
    (	 GID NUMBER(38,0), 
	     CLASS VARCHAR2(20 BYTE), 
	     FEATWIDTH VARCHAR2(20 BYTE), 
	     NRN VARCHAR2(20 BYTE), 
	     PID VARCHAR2(20 BYTE), 
	     FEATREL VARCHAR2(20 BYTE), 
	     FORMATION VARCHAR2(20 BYTE), 
	     ATTRREL VARCHAR2(20 BYTE), 
	     RETIRED VARCHAR2(20 BYTE), 
	     SRN VARCHAR2(20 BYTE), 
	     SOURCE VARCHAR2(20 BYTE), 
	     SHAPE_LENG VARCHAR2(20 BYTE), 
	     NAME VARCHAR2(20 BYTE), 
	     TEXTNOTE VARCHAR2(20 BYTE), 
	     FEATTYPE VARCHAR2(20 BYTE), 
	     PLANACC VARCHAR2(20 BYTE), 
	     SYMBOL VARCHAR2(20 BYTE), 
	     CREATED VARCHAR2(20 BYTE), 
	     PEANO NUMBER, 
	     GEOM MDSYS.SDO_GEOMETRY
 )
 ORGANIZATION EXTERNAL
 (TYPE ORACLE_LOADER
 DEFAULT DIRECTORY TEMP_DATA_DIRECTORY
 ACCESS PARAMETERS (
  CONTINUEIF NEXT(1:1) = '#'
  FIELDS TERMINATED BY '|'
  TRAILING NULLCOLS (
    GID INTEGER EXTERNAL,
    CLASS 	     NULLIF CLASS = BLANKS,
    FEATWIDTH 	 NULLIF FEATWIDTH = BLANKS,
    NRN  NULLIF NRN = BLANKS,
    PID  NULLIF PID = BLANKS,
    FEATREL 	 NULLIF FEATREL = BLANKS,
    FORMATION 	 NULLIF FORMATION = BLANKS,
    ATTRREL 	 NULLIF ATTRREL = BLANKS,
    RETIRED 	 NULLIF RETIRED = BLANKS,
    SRN  NULLIF SRN = BLANKS,
    SOURCE    	 NULLIF SOURCE = BLANKS,
    SHAPE_LENG 	 NULLIF SHAPE_LENG = BLANKS,
    NAME 	     NULLIF NAME = BLANKS,
    TEXTNOTE 	 NULLIF TEXTNOTE = BLANKS,
    FEATTYPE 	 NULLIF FEATTYPE = BLANKS,
    PLANACC 	 NULLIF PLANACC = BLANKS,
    SYMBOL 	     NULLIF SYMBOL = BLANKS,
    CREATED 	 NULLIF CREATED = BLANKS,
    PEANO,
    GEOM COLUMN OBJECT 
    (
      SDO_GTYPE       INTEGER EXTERNAL, 
      SDO_SRID        INTEGER EXTERNAL, 
      SDO_ELEM_INFO   VARRAY TERMINATED BY '|/' 
        (X            FLOAT EXTERNAL), 
      SDO_ORDINATES   VARRAY TERMINATED BY '|/' 
        (X            FLOAT EXTERNAL) 
    )
 )
 )
 LOCATION (TEMP_DATA_DIRECTORY:'GeoData250K_Tas_roads.dat'))
 REJECT LIMIT UNLIMITED
 Error at Command Line:24 Column:1
 Error report:
 SQL Error: ORA-30656: column type not supported on external organized table
 30656.0000 -  "column type not supported on external organized table"
 *Cause:    Attempt to create an external organized table with a column
            of type LONG, LOB, BFILE, ADT, or VARRAY.
 *Action:   These column types are not supported, change the DDL.

Note, as per the documentation, “These column types are not supported”. For non-point geometry data you must use other methods for loading.

I hope this is useful to someone.