Viscosity's Blog

Loading Fixed Width Data into ATP/ADW with DBMS_CLOUD

Written by Sean Scott | Mar 7, 2020 1:55:15 AM

Oracle Autonomous Databases are great, but there are some adjustments experienced developers and administrators need to learn to fully leverage the power of the Oracle Cloud. This is particularly true when it comes to loading data from files into external tables.

On a classic database we have access to traditional filesystems where files can be staged, and access to tools like SQL*Loader for processing file contents into the database. An Autonomous Database doesn’t have a filesystem we can access. Instead, there’s Object Storage.

Processing the contents of files in an object store is done with the DBMS_CLOUD package, and reviewing the documentation reveals procedures for copying data and even creating external tables. The syntax for the CREATE_EXTERNAL_TABLE procedure will look familiar to those who’ve worked with SQL*Loader or used the ORGANIZATION EXTERNAL directive for creating an external table.

Oracle provides a number of references for using the COPY_DATA and CREATE_EXTERNAL_TABLE subprograms to load data into an Autonomous Database. Nilay Panchal has one I’ve used more than once, “So you have your CSV, TSV, and JSON data lying in your Oracle Cloud Object Store. How do you get it over into your Autonomous Database?

But… fixed width? Why no fixed width?

Fixed width seems like a no-brainer, but I looked and… didn’t find anything. I found lots of documentation and examples for loading CSV files and of course the Oracle documentation for DBMS_CLOUD itself. But those documents mostly end with an example of how to process CSV and then link to the Loader Access driver specification.

Needless to say, I tried a variety of things that I thought would work since they worked in an external table definition, but didn’t, namely, RECORDS FIXED. An Oracle contact indicated that fixed-width files are not supported by DBMS_CLOUD. It looked like the file would need to be loaded as a single, long field, then processed with SUBSTR in SQL.

Convinced that wasn’t the final-final answer, I did some poking and prodding at the interface and present The Definitive (and perhaps only) Guide to Loading Fixed-Width Data Into an Autonomous Database using DBMS_CLOUD.


Let’s generate some fixed data from DBA_OBJECTS:

select lpad(to_char(object_id), 10, ' ')
|| rpad(substr(object_name, 1, 30), 30, ' ')
|| rpad(object_type, 25, ' ')
|| to_char(created, 'YYMMDD')
|| to_char(created, 'YYYYMMDD')
|| to_char(last_ddl_time, 'YYYYMMDD')
|| rpad(status, 10, ' ')
from dba_objects where rownum < 51

Upload the file to an OCI bucket and create the necessary credentials and grants for a user to process the data, then create an external table to match this:

begin
dbms_cloud.create_external_table(
table_name => 'FIXED_WIDTH_OBJECTS'
, credential_name => 'ATBDB_CRED'
, file_uri_list => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/.../fixedwidth.txt'
, format => json_object('trimspaces' value 'rtrim', 'skipheaders' value '1', 'dateformat' value 'YYYYMMDD')
, field_list => 'object_id (1:10) char
, object_name (11:40) char
, object_type (41:65) char
, created_date1 (66:71) date mask "YYMMDD"
, created_date2 (72:79) date
, last_ddl_time (80:87) date
, status (88:97)'
, column_list => 'object_id number
, object_name varchar2(30)
, object_type varchar2(25)
, status varchar2(10)
, created_date1 date
, created_date2 date
, last_ddl_time date');
end;
/

 

This demonstrates the field lists and column lists remain independent in DBMS_CLOUD (as expected) and a default date format can be overridden by date masking at the field level. Checking the length of data in the result shows the trimspaces option removes trailing spaces, and the ID column is converted to a number. All is good.

 

Now here’s the interesting part.

The SQL to create the table displayed from SQL*Developer or DBMS_METADATA.GET_DDL is an ordinary-looking external table specification:

 CREATE TABLE "ADMIN"."FIXED_WIDTH_OBJECTS"
( "OBJECT_ID" NUMBER,
"OBJECT_NAME" VARCHAR2(30 BYTE) COLLATE "USING_NLS_COMP",
"OBJECT_TYPE" VARCHAR2(25 BYTE) COLLATE "USING_NLS_COMP",
"STATUS" VARCHAR2(10 BYTE) COLLATE "USING_NLS_COMP",
"CREATED_DATE1" DATE,
"CREATED_DATE2" DATE,
"LAST_DDL_TIME" DATE
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( RECORDS IGNORE_HEADER=1 DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE READSIZE=10000000
FIELDS TERMINATED BY '|' RTRIM DATE_FORMAT DATE MASK 'YYYYMMDD' ( object_id (1:10) char
, object_name (11:40) char
, object_type (41:65) char
, created_date1 (66:71) date mask "YYMMDD"
, created_date2 (72:79) date
, last_ddl_time (80:87) date
, status (88:97) )
)
LOCATION
( 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/.../fixedwidth.txt'
)
)
PARALLEL ;


Notice the ACCESS PARAMETERS section. Fields terminated by ‘|’? Where did that come from? That specifies a delimited file, not something fixed width. I would have expected just the RECORDS DELIMITED BY NEWLINE (or an ordinary RECORDS FIXED). I also didn’t find any documentation describing the IGNORE_HEADER option, though there is a reference to error messages where it’s used alongside SKIP. Sure, it seems pretty obvious but maybe deserving of a paragraph in the loader spec.

Despite the odd DDL, the data is loaded and everyone is happy.