The Load step (under Step in a Query Object) loads the processed data coming from the workflow (after extraction and transformation) into a target system (file, relational database, HDFS, NoSQL/columnar system, etc.).

You need to select a target Intellicus connection (already created under Configure > Databases) into which data should be loaded. One important configuration is deselecting the ‘Read Only’ setting of the target connection. If it is not deselected, the connection will not appear in the list of target systems into which the extracted data can be loaded.

There are target connection-specific properties required to load data into the system.

On opening the Target Connection list, you will see the option “Default.” Selecting it corresponds to the default connection that the user selected on the Configure page; in that case, the Load step works against that default connection.

...

You can fetch the list of existing tables/files from the connection.

Loading into File/RDBMS

Figure 1: Loading into RDBMS

...

The properties of the Load step (when the target system is a file or a relational database) are:

Property (Values): Comments

Target Connection (Select from list): Select a target connection into which data should be loaded.

Table/File Properties

Target Table/File Name (Select from list): Select the target table/file name using the selected Target Connection. Type in the table/file name if it does not exist; the table/file gets created under the selected target connection.

If Table/File Exists (Select from list):

  • Append Data

  • Delete Existing Data

  • Skip Load

...

system, such as a file system, relational database (RDBMS), HDFS, NoSQL systems like HBASE, or columnar databases. This step finalizes the ETL process by depositing data into its destination, ensuring it’s ready for analysis or reporting.

General Features of the Load Step

  1. Target Connection Selection:

    • Requires selection of a target connection configured under Configure > Databases.

    • Ensure the target system’s “Read Only” option is deselected; otherwise, the connection will not be available for loading.

    • Use the "Default" option to load data into the default connection specified during configuration.

  2. Run and Scheduling:

    • Query Objects with a Load step can be executed immediately or scheduled for one-time or recurring runs.

    • Execution runs any transformations and writes data into the specified target system.

  3. Error Handling:

    • If the Load step fails, a CSV file containing the failed records (<TargetTableName>_failed_records.csv) is created at the specified installation path.

    • The location of this CSV file is configured via the LOAD_STEP_FAILS_DUMP property in the ReportEngine_Default.properties file.
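
As a rough illustration of this failure-dump behaviour (not Intellicus source code), the sketch below writes rejected records to a <TargetTableName>_failed_records.csv file under a dump directory. The directory path, table name, and record layout are made-up examples; only the file-name pattern and the LOAD_STEP_FAILS_DUMP property name come from the documentation above.

```python
import csv
import os

# Directory that would come from the LOAD_STEP_FAILS_DUMP property in
# ReportEngine_Default.properties (the path shown here is a made-up example).
LOAD_STEP_FAILS_DUMP = "/opt/Intellicus/failed_loads"

def dump_failed_records(target_table, failed_rows, columns):
    """Write rows that failed to load into <TargetTableName>_failed_records.csv."""
    os.makedirs(LOAD_STEP_FAILS_DUMP, exist_ok=True)
    out_path = os.path.join(LOAD_STEP_FAILS_DUMP, f"{target_table}_failed_records.csv")
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)        # header row
        writer.writerows(failed_rows)   # one row per failed record
    return out_path

# Example: two hypothetical records that failed during the Load step.
print(dump_failed_records("SALES_FACT", [(1, "bad date"), (7, "null key")], ("ID", "ERROR")))
```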

Target-Specific Properties

1. Loading into File/RDBMS

  • Target Connection: Select the target connection for loading data.

  • Table/File Properties:

    • Target Table/File Name: Choose an existing table/file or specify a new one (created during the load step).

    • If Table/File Exists: Define the action if the table/file already exists:

      • Append Data

      • Delete Existing Data

      • Skip Load

      • Drop and Create New Table/Delete and Create New File

      • Truncate Existing Data

Action to be performed if the table/file already exists:

  • Append Data: appends data at the end of the existing table/file.

  • Delete Existing Data: deletes the data in the existing table.

  • Skip Load: performs no action on the table/file and skips the Load step.

  • Drop and Create New Table/Delete and Create New File: drops the table and creates a new table/file with the same name and a new schema.

  • Truncate Existing Data: deletes the data in the existing table, but an Admin user can roll back this data.

Encoding (Select from list): Select the file encoding type.

Compression (Select from list): Select the compression type for the data.

Suffix TimeStamp (Check/Uncheck): Check to suffix a timestamp, in the chosen format, to the table/file name. Select the format from the available list.

Data Properties

If Record Exists (Select from list): Action to be performed if the record already exists:

  • Insert Only: inserts the new record.

  • Update Only: updates the existing record with the new one.

  • Update or Insert: if the record exists, updates it with the new record; otherwise inserts the new record.

Ignore Empty Rows (Check/Uncheck): Check to ignore empty rows in the table while loading.

Row Properties

Line Separator (Type yourself): Specify the separator used to separate rows in the file.

Split After N Rows (Type yourself): Specify the maximum number of rows after which the file is split.

Include Header (Check/Uncheck): Check to include the headers specified in the file.

Ignore Empty Rows (Check/Uncheck): Check to ignore empty rows specified in the file.

Column Properties

Fetch Columns (Click the option): Fetches the updated columns of the existing table. You can see the field details of the data to be loaded under the Column Properties section.

Remove All Columns (Click the option): Removes all columns in the target table.

Enclosure (Type yourself): Specify the enclosure character if the file contains enclosures within the fields.

Separator (Type yourself): Specify the field separator for the file.

Error Handling

Error Threshold (Type yourself): Specify the count of errors after which the loading process should stop. Any positive number N (> 0) means processing stops after N + 1 errors; for example, if its value is 2, processing stops upon the 3rd error.
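
The Error Threshold arithmetic above (a value of N stops the load on error N + 1) can be pictured with a minimal Python sketch; the record list and the failing loader here are made-up placeholders.

```python
def load_with_threshold(records, load_one, error_threshold):
    """Stop the load once the number of errors exceeds error_threshold.

    With error_threshold = 2, the 3rd error aborts the run, matching the
    documented "N + 1" behaviour.
    """
    errors = 0
    for record in records:
        try:
            load_one(record)              # placeholder for the actual write
        except Exception:
            errors += 1
            if errors > error_threshold:  # threshold exceeded: abort the load
                raise RuntimeError(f"Load stopped after {errors} errors")

# Example with a loader that rejects negative values:
def loader(value):
    if value < 0:
        raise ValueError("negative value")

try:
    load_with_threshold([1, -1, 2, -2, 3, -3, 4], loader, error_threshold=2)
except RuntimeError as exc:
    print(exc)  # "Load stopped after 3 errors"
```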

Loading into NoSQL (HBASE)

The properties of the Load step (when the target system is HBASE) are:

Property (Values): Comments

Target Connection (Select from list): Select a target connection into which data should be loaded.

Table Properties

Target Table Name (Select from list): Select the target table name using the selected Target Connection.

If Table Exists (Select from list): Action to be performed if the table already exists:

  • Append Data: appends data at the end of the existing table.

  • Skip Load: performs no action on the table and skips the Load step.

  • Drop and Create New Table: drops the table and creates a new table with the same name and a new schema.

Suffix TimeStamp (Check/Uncheck): Check to suffix a timestamp, in the chosen format, to the table name. Select the format from the available list.

Row Properties

Row Key (Select from the list of available fields): Unique identifier/field for each row.

DataType (Select from list): Data type of the Row Key.

Error Handling

Error Threshold (Type yourself): Specify the count of errors after which the loading process should stop. Any positive number N (> 0) means processing stops after N + 1 errors; for example, if its value is 2, processing stops upon the 3rd error.

Column Properties

Fetch Column Families (Click the option): Fetches the updated column families of the existing table. You can see the field details of the data to be loaded under the Column Properties section.

Remove All Columns (Click the option): Removes all columns in the target table.
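
Intellicus performs the HBASE write itself; purely to illustrate what the Row Key and column families mean during such a load, here is a hedged sketch using the open-source happybase client. The host, port, table name, column family, and column names are all hypothetical.

```python
import happybase  # third-party HBase Thrift client, used here only for illustration

# Hypothetical connection and table; in Intellicus these come from the
# selected Target Connection and the Target Table Name property.
connection = happybase.Connection("hbase-host", port=9090)
table = connection.table("sales_load")

rows = [
    {"order_id": "1001", "amount": "250.00", "region": "EAST"},
    {"order_id": "1002", "amount": "125.50", "region": "WEST"},
]

# "Row Key" = the field chosen as the unique identifier for each row;
# every other field is written under a column family (here, "cf").
with table.batch() as batch:
    for row in rows:
        row_key = row["order_id"].encode()
        batch.put(row_key, {f"cf:{k}".encode(): v.encode()
                            for k, v in row.items() if k != "order_id"})

connection.close()
```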

Loading into HDFS

The properties of the Load step (when the target system is HDFS) are:

Property (Values): Comments

Target Connection (Select from list): Select a target connection into which data should be loaded.

File Properties

Target File Name (Select from list): Select the target file name using the selected Target Connection. You should have Read/Write/Execute access permissions on the target directory in order to select a target file from it.

If File Exists (Select from list): Action to be performed if the file already exists:

  • Append Data: appends data at the end of the existing file.

  • Skip Load: performs no action on the file and skips the Load step.

  • Delete and Create New File: deletes the existing file and creates a new file with the same name.

Encoding (Select from list): Option to encode the file in the available formats (ASCII, ISO-8859-1, UTF-8).

Compression (Select from list): Option to create the file in a compressed format (zip and gzip).

Suffix TimeStamp (Check/Uncheck): Check to suffix a timestamp, in the chosen format, to the file name. Select the format from the available list.

Row Properties

Line Separator (Type yourself): Specify a separator to distinguish rows/records.

Split After N Rows (Type yourself): Option to split the output into multiple files. Splitting is performed on the basis of the number of records: blank/0 means no splitting, and any positive number means the output is split after that many records.

Include Header (Check/Uncheck): Check to include the header from the file/table. This option applies only to a new file.

Ignore Empty Rows (Check/Uncheck): Check to ignore empty rows in the table while loading.

Column Properties

Enclosure (Type yourself): Specify the enclosure for column values.

Separator (Type yourself): Specify the column separator.

Error Handling

Error Threshold (Type yourself): Specify the count of errors after which the loading process should stop. Any positive number N (> 0) means processing stops after N + 1 errors; for example, if its value is 2, processing stops upon the 3rd error.
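
As a non-Intellicus illustration of how the file-oriented properties above interact (column Separator, Enclosure, Include Header, Split After N Rows, Suffix TimeStamp), the sketch below writes local delimited files with Python's csv module. The file names and data are invented, and an actual HDFS load would additionally go through an HDFS client.

```python
import csv
from datetime import datetime

def write_split_files(base_name, header, rows, separator=",", enclosure='"',
                      split_after=0, include_header=True, suffix_timestamp=True):
    """Write rows into one or more delimited files, mimicking the Load-step options."""
    # Suffix TimeStamp: append a timestamp in a chosen format to the file name.
    stamp = datetime.now().strftime("_%Y%m%d_%H%M%S") if suffix_timestamp else ""
    # Split After N Rows: blank/0 means a single file, N > 0 starts a new file every N rows.
    chunk = split_after if split_after > 0 else len(rows) or 1
    paths = []
    for part, start in enumerate(range(0, len(rows), chunk), start=1):
        path = f"{base_name}{stamp}_part{part}.csv"
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, delimiter=separator, quotechar=enclosure)
            if include_header:                     # header written into each new file
                writer.writerow(header)
            writer.writerows(rows[start:start + chunk])
        paths.append(path)
    return paths

print(write_split_files("orders", ["id", "region"],
                        [[1, "EAST"], [2, "WEST"], [3, "NORTH"]],
                        separator="|", split_after=2))
```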

You can run and schedule Load-type Query Objects from under Explorer.
Running a Query Object executes the transformation and loads the data into the target system. You can also schedule the Query Object in once or recurring mode.

The Load step can also be followed by the Data Source step and then the Format step, to be used for generating reports.

...

    • Encoding: Set the file encoding type.

    • Compression: Select a compression format (e.g., zip, gzip).

    • Suffix Timestamp: Add a timestamp suffix to the table/file name.

  • Data Properties:

    • If Record Exists: Define the handling of duplicate records (a sketch of these modes follows this property list):

      • Insert Only

      • Update Only

      • Update or Insert

  • Row Properties:

    • Line Separator: Specify the row separator.

    • Split After N Rows: Break the file into smaller parts after a set number of rows.

    • Include Header: Include column headers in the output file.

    • Ignore Empty Rows: Ignore rows with no data.

  • Column Properties:

    • Fetch Columns: Retrieve the updated columns of an existing table.

    • Remove All Columns: Delete all columns in the target table.

    • Enclosure: Specify the enclosure character for fields.

    • Separator: Define the field separator.
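
The three If Record Exists modes listed under Data Properties above boil down to different key-matching rules. The dictionary-based sketch below is an illustration only, not Intellicus code; the key column and sample data are invented.

```python
# Target "table" keyed by a unique column, e.g. order_id (hypothetical).
target = {101: {"amount": 250.0}, 102: {"amount": 125.5}}

def load_record(key, record, mode):
    """Apply one incoming record according to the If Record Exists mode."""
    exists = key in target
    if mode == "Insert Only" and not exists:
        target[key] = record   # new keys only; existing rows are left untouched
    elif mode == "Update Only" and exists:
        target[key] = record   # existing keys only; new rows are skipped
    elif mode == "Update or Insert":
        target[key] = record   # upsert: update if present, otherwise insert

load_record(101, {"amount": 300.0}, "Update Only")      # updates 101
load_record(103, {"amount": 75.0}, "Insert Only")        # inserts 103
load_record(104, {"amount": 10.0}, "Update or Insert")   # inserts 104
print(target)
```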

2. Loading into NoSQL (HBASE)

  • Target Connection: Select the HBASE target connection.

  • Table Properties:

    • Target Table Name: Choose an existing table or create a new one.

    • If Table Exists: Specify actions:

      • Append Data

      • Skip Load

      • Drop and Create New Table

    • Suffix Timestamp: Add a timestamp to the table name.

  • Row Properties:

    • Row Key: Choose a unique identifier for each row.

    • Data Type: Set the data type for the row key.

  • Column Properties:

    • Fetch Column Families: Retrieve the column families of an existing table.

    • Remove All Columns: Remove all columns in the target table.

3. Loading into HDFS

  • Target Connection: Choose the HDFS target connection.

  • File Properties:

    • Target File Name: Select or specify the target file.

    • If File Exists: Define actions:

      • Append Data

      • Skip Load

      • Delete and Create New File

    • Encoding: Choose encoding type (e.g., UTF-8, ASCII).

    • Compression: Select compression format (e.g., zip, gzip).

    • Suffix Timestamp: Add a timestamp to the file name.

  • Row Properties:

    • Line Separator: Define the row delimiter.

    • Split After N Rows: Split the output file after a specified number of rows.

    • Include Header: Include headers in the file.

    • Ignore Empty Rows: Ignore rows without data.

  • Column Properties:

    • Enclosure: Specify the enclosure character.

    • Separator: Define the column delimiter.

Post-Load Workflow

The Load step can be succeeded by the Data Source Step, followed by the Format Step, to prepare the loaded data for reporting. This enables seamless integration into subsequent Query Objects or report generation.