Document toolboxDocument toolbox

Formatting, sorting, filtering file data while registering datasets

Applies to: Kyvos Enterprise  Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace   Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)


When you are registering datasets, you can specify how data is formatted, which data is included, how it is sorted, and more.

Note

Kyvos does not support some special characters/values. Either remove them from the source data or contact the Kyvos Support team for assistance.
Example of unsupported values: [New line character 'n', HTML tags, KYVOS_NULL_VALUE, !==!, ~, !__!, !@!, !@@!]

Formatting data

Kyvos allows you to change the data type for columns while registering datasets. The supported Data Types are char, date, and number.

If it is a numeric field, then select Data Sub Type from:

  • Long: If the column value contains integers values, use this option.

  • Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that; thus decreasing precision. This data type should never be used for precise values, such as currency. 

  • Double :   The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.

  • Decimal: For a decimal data type, ensure the following: 

    • If the data type scale is greater than 0, and the precision value is greater than 16, the Numeric data type will be automatically mapped to the Decimal data type.

    • If the data type has a scale of 0 and the precision value is greater than 21, the Numeric data type will be automatically mapped to the Decimal data type.

Note

  • If the raw data contains decimal values, Kyvos detects it as a  Double, by default. If you do not want any precision loss post semantic model process-in querying, then set the  Data Type  of the respective field as  Decimal  and set the correct  precisions  and  scale  to avoid any data inaccuracy and precision loss. Semantic model process size may increase if the data type is  Decimal.

  • From the Kyvos 2023.1 release onwards, the Decimal data type is applicable only when registering a dataset using a BigQuery connection. The Decimal data type will only be applicable to new Register datasets, while the previous data types (long, float, and double) will be used for existing register files.

Working with Dates

Generally, date columns store dates as strings. Kyvos then reads the date using the default format YYYY/MM/DD.  If the date format of the source data doesn't match the default format, then Kyvos will throw an error while validating the registered dataset. To improve performance and avoid errors, you should specify the format string for the date.

To match the data type and data value format, perform the following steps. 

  • The custom date format should be in line with the underlying data. 

  • The register dataset format of date columns should match with the date format of the underlying file’s/table’s/SQL’s date format.
    For example, if underlying data is in format yyyy-MM-dd, you must mark the format at Kyvos register file at yyyy-MM-dd.   

  • For raw data querying, Kyvos does not  support raw data in the following format:

    8/6/2021 17:57:100 (MM/DD/YYYY HH: SS:000)

    You need to change the format to 8/6/2021 at 17:57, and the format on the registered dataset should be set as MM/dd/yyyy HH: ss

Note

From the Kyvos 2023.1 release onwards, for any timestamp data field, the .S value for milliseconds will be auto-suggested. This is applicable to all warehouses. 

Sort and filter

When you register a dataset, you can specify details about what data is included and how it is organized. You can choose to view the data as a column list or in a preview table and sort or filter the data before it gets used.



To specify a table is a dimension or a fact, select  Dimension  or  Fact  from the list. Set a file as a fact to use it as a fact table in relationships. 

Use the icons to view either the Column list or the Preview table.

Click  Sort Columns  and choose by Name, Data type, or Visibility. For  Name, specify ascending or descending order. You can also clear the sort settings. A checkmark indicates which options are currently selected.

Click  Filter Columns  and choose by Data type Data SubType or Visibility.

  • For Data type,  specify char, number, or date. 

  • For Data SubType, when using numeric values, you can specify double, long, float, or decimal.

    • Long: If the column value contains integers values, use this option.

    • Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that; thus decreasing precision. This data type should never be used for precise values, such as currency. 

    • Double:  The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.

  • For Visibility, can choose to filter only Visible or Hidden.

You can also remove filtering.

To sort columns, perform the following steps. 

  1. From the Toolbox, select the object type, such as  Files.

  2. Select a registered file.

  3. Click the icon to view the Column list if needed.

  4. Click  Sort columns.

  5. For  Name, choose Ascending or Descending to set the order.

  6. Click  Data Type  to sort by data type.

  7. Click  Visibility  to sort by visibility. 

  8. To remove sorting, select  Remove sorting.

To filter columns, perform the following steps. 

  1. From the Toolbox, select the object type, such as  Files.

  2. Select a registered dataset.

  3. Click the icon to view the Column list if needed.

  4. Click  Filter columns.

  5. To filter by data type, choose  Data Type  and select Char, Number, or Date.

  6. For numeric data, to filter by data subtype, choose  Data SubType  and Double, Long, Float, or Decimal.

  7. Click  Visibility  to filter by visibility. 

  8. To remove filtering, select  Remove filtering.

To filter data, perform the following steps. 

  1. From the Toolbox, select the object type, such as  Files.

  2. Select a registered dataset.

  3. Click the icon to view the Column list if needed.

  4. Click  Filter data.

  5. Enter the field, criteria, compare value, value, and to create complex filters, use parentheses, And or Or, and click the  Plus icon  to add more additional criteria.

  6. Click  Apply. See Using filters to learn more
            For  Ex: You want to filter the data as per the nation "Canada". 
                         a. Click on  column c_nation.
                         b. Click on  Filter data.
                         c.  Fill in the details.
                         d.  Click on  Apply.
                         e.  Click on  Preview table 
    Now the table is shown below.

Column properties

Depending on your data source, when you are viewing the column list, you can select a column to view Column properties. See Formatting columns.

If you have a large number of columns, use Search to quickly find a column.

Use Column properties to hide a column, set the relationship key to primary or foreign, edit the field name, and set the data type. If you choose a numeric datatype, use the drop-down list to specify whether it is a double, long, float, or decimal.  

For numeric data types:

  • Long: If the column value contains integers values, use this option.

  • Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that, thus decreasing precision. This data type should never be used for precise values, such as currency. 

  • Double:  The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.

  • Decimal:  Use for fractional numbers.


Use the incremental identifier checkbox to identify incremental data for successive processes. The current value will be used to filter the next process data.

Click the Actions menu (...) to the right of Format to set the format for the data. See Formatting measures.

File properties

You can also specify advanced properties for the registered file using the Actions menu (...).

See the  Setting properties section to know more.

 

Copyright Kyvos, Inc. All rights reserved.