Load Step

The Load Step in a Query Object workflow transfers processed data (after extraction and transformation) into a target system, such as a file system, relational database (RDBMS), HDFS, NoSQL systems like HBASE, or columnar databases. This step finalizes the ETL process by depositing data into its destination, ensuring it’s ready for analysis or reporting.

General Features of the Load Step

  1. Target Connection Selection:

    • Requires selection of a target connection configured under Configure > Databases.

    • Ensure the target system’s “Read Only” option is deselected; otherwise, the connection will not be available for loading.

    • Use the "Default" option to load data into the default connection specified during configuration.

  2. Run and Scheduling:

    • Query Objects with a Load step can be executed immediately or scheduled for one-time or recurring runs.

    • Execution applies the configured transformations and writes the data into the specified target system.

  3. Error Handling:

    • If the Load step fails, the failed records are written to a CSV file named <TargetTableName>_failed_records.csv.

    • The location of this CSV file is configured via the LOAD_STEP_FAILS_DUMP property in the ReportEngine_Default.properties file.
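For example, the dump location might be set as follows in ReportEngine_Default.properties (the path shown is illustrative, not a default):

```properties
# Directory where <TargetTableName>_failed_records.csv files are written
LOAD_STEP_FAILS_DUMP=/opt/kyvos/failed_records
```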

Target-Specific Properties

1. Loading into File/RDBMS

  • Target Connection: Select the target connection for loading data.

  • Table/File Properties:

    • Target Table/File Name: Choose an existing table/file or specify a new one (created during the load step).

    • If Table/File Exists: Define the action if the table/file already exists:

      • Append Data

      • Delete Existing Data

      • Skip Load

      • Drop and Create New Table/Delete and Create New File

      • Truncate Existing Data

    • Encoding: Set the file encoding type.

    • Compression: Select a compression format (e.g., zip, gzip).

    • Suffix Timestamp: Add a timestamp suffix to the table/file name.
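The Suffix Timestamp option behaves like the following sketch. The exact suffix format the product uses is not documented here; a yyyyMMdd_HHmmss pattern is assumed for illustration:

```python
from datetime import datetime

def with_timestamp_suffix(name: str) -> str:
    # Append a timestamp suffix to a table/file name
    # (the yyyyMMdd_HHmmss pattern is an assumption).
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{name}_{stamp}"

print(with_timestamp_suffix("sales_load"))  # e.g. sales_load_20240101_120000
```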

  • Data Properties:

    • If Record Exists: Define the handling of duplicate records:

      • Insert Only

      • Update Only

      • Update or Insert
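The three duplicate-handling modes correspond to standard SQL patterns. This sqlite3 sketch (table and column names are hypothetical, not part of the product) mirrors each mode against a keyed table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO target VALUES (1, 100.0)")

# Insert Only: new keys are added; records whose key already exists are skipped.
conn.execute("INSERT OR IGNORE INTO target VALUES (1, 999.0)")  # skipped
conn.execute("INSERT OR IGNORE INTO target VALUES (2, 200.0)")  # added

# Update Only: existing keys are updated; new keys are not inserted.
conn.execute("UPDATE target SET amount = 150.0 WHERE id = 1")

# Update or Insert (upsert): insert, or update the row on a key conflict.
conn.execute(
    "INSERT INTO target VALUES (3, 300.0) "
    "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount"
)

result = sorted(conn.execute("SELECT * FROM target").fetchall())
print(result)  # [(1, 150.0), (2, 200.0), (3, 300.0)]
```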

  • Row Properties:

    • Line Separator: Specify the row separator.

    • Split After N Rows: Break the file into smaller parts after a set number of rows.

    • Include Header: Include column headers in the output file.

    • Ignore Empty Rows: Ignore rows with no data.

  • Column Properties:

    • Fetch Columns: Retrieve the updated columns of an existing table.

    • Remove All Columns: Delete all columns in the target table.

    • Enclosure: Specify the enclosure character for fields.

    • Separator: Define the field separator.
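The row and column properties above map onto the options of an ordinary delimited-file writer. A minimal sketch, assuming comma as the separator, a double-quote enclosure, a header row, UTF-8 encoding, and gzip compression (the field names and file name are illustrative):

```python
import csv
import gzip
import io

header = ["id", "city"]                    # Include Header
rows = [[1, "Pune, IN"], [2, "Delhi"]]     # hypothetical records to load

buf = io.StringIO()
writer = csv.writer(buf, delimiter=",",    # Separator
                    quotechar='"',         # Enclosure
                    quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")   # Line Separator
writer.writerow(header)
writer.writerows(rows)

# Compression (gzip) and Encoding (UTF-8)
with gzip.open("target_file.csv.gz", "wt", encoding="utf-8") as f:
    f.write(buf.getvalue())

with gzip.open("target_file.csv.gz", "rt", encoding="utf-8") as f:
    content = f.read()
print(content)
```

Note that the field containing the separator ("Pune, IN") is the only one wrapped in the enclosure character, which is the usual minimal-quoting behavior.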

2. Loading into NoSQL (HBASE)

  • Target Connection: Select the HBASE target connection.

  • Table Properties:

    • Target Table Name: Choose an existing table or create a new one.

    • If Table Exists: Specify actions:

      • Append Data

      • Skip Load

      • Drop and Create New Table

    • Suffix Timestamp: Add a timestamp to the table name.

  • Row Properties:

    • Row Key: Choose a unique identifier for each row.

    • Data Type: Set the data type for the row key.

  • Column Properties:

    • Fetch Column Families: Retrieve the column families of an existing table.

    • Remove All Columns: Remove all columns in the target table.
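Conceptually, each loaded record becomes one HBASE row keyed by the chosen Row Key, with the remaining columns stored as cells qualified by a column family. A pure-Python sketch of that layout (no HBASE client; the column family "cf" and field names are hypothetical):

```python
# Simulate the HBASE layout produced by the Load step: one row per
# record, keyed by the Row Key column, with each remaining column
# stored under a "family:qualifier" cell name.
records = [
    {"order_id": "1001", "customer": "acme", "total": "250.0"},
    {"order_id": "1002", "customer": "beta", "total": "75.5"},
]

table = {}
for rec in records:
    row_key = rec["order_id"]          # Row Key: must be unique per row
    table[row_key] = {
        f"cf:{col}": val               # "cf" is a hypothetical column family
        for col, val in rec.items() if col != "order_id"
    }

print(table["1001"])  # {'cf:customer': 'acme', 'cf:total': '250.0'}
```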

3. Loading into HDFS

  • Target Connection: Choose the HDFS target connection.

  • File Properties:

    • Target File Name: Select or specify the target file.

    • If File Exists: Define actions:

      • Append Data

      • Skip Load

      • Delete and Create New File

    • Encoding: Choose encoding type (e.g., UTF-8, ASCII).

    • Compression: Select compression format (e.g., zip, gzip).

    • Suffix Timestamp: Add a timestamp to the file name.

  • Row Properties:

    • Line Separator: Define the row delimiter.

    • Split After N Rows: Split the output file after a specified number of rows.

    • Include Header: Include headers in the file.

    • Ignore Empty Rows: Ignore rows without data.

  • Column Properties:

    • Enclosure: Specify the enclosure character.

    • Separator: Define the column delimiter.
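The Split After N Rows option partitions the output into part-files of at most N rows each. A sketch of the splitting behavior (part-file naming is not shown):

```python
def split_rows(rows, n):
    # Yield successive chunks of at most n rows (Split After N Rows).
    for i in range(0, len(rows), n):
        yield rows[i:i + n]

# With 5 rows and a split size of 2, three part-files would be written.
rows = [f"row{i}" for i in range(5)]
parts = list(split_rows(rows, 2))
print([len(p) for p in parts])  # [2, 2, 1]
```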

Post-Load Workflow

The Load step can be followed by a Data Source Step and then a Format Step to prepare the loaded data for reporting. This enables seamless integration into subsequent Query Objects or report generation.

Copyright Kyvos, Inc. All rights reserved.