Formatting, sorting, filtering file data while registering datasets

Applies to: Kyvos Enterprise Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)

When you are registering datasets, you can specify how data is formatted, which data is included, how it is sorted, and more.

Note

Kyvos does not support some special characters/values. Either remove them from the source data or contact the Kyvos Support team for assistance.
Example of unsupported values: [New line character 'n', HTML tags, KYVOS_NULL_VALUE, !==!, ~, !__!, !@!, !@@!]

Formatting data

Kyvos allows you to change the data type for columns while registering datasets. The supported Data Types are char, date, and number.

If it is a numeric field, then select Data Sub Type from:

Long: If the column value contains integers values, use this option.
Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that; thus decreasing precision. This data type should never be used for precise values, such as currency.
Double : The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.
Decimal: For a decimal data type, ensure the following:
- If the data type scale is greater than 0, and the precision value is greater than 16, the Numeric data type will be automatically mapped to the Decimal data type.
- If the data type has a scale of 0 and the precision value is greater than 21, the Numeric data type will be automatically mapped to the Decimal data type.

Note

If the raw data contains decimal values, Kyvos detects it as a Double, by default. If you do not want any precision loss post cube build-in querying, then set the Data Type of the respective field as Decimal and set the correct precisions and scale to avoid any data inaccuracy and precision loss. Cube build size may increase if the data type is Decimal.
From the Kyvos 2023.1 release onwards, the Decimal data type is applicable only when registering a dataset using a BigQuery connection. The Decimal data type will only be applicable to new Register datasets, while the previous data types (long, float, and double) will be used for existing register files.

Working with Dates

Generally, date columns store dates as strings. Kyvos then reads the date using the default format YYYY/MM/DD. If the date format of the source data doesn't match the default format, then Kyvos will throw an error while validating the registered dataset. To improve performance and avoid errors, you should specify the format string for the date.

To match the data type and data value format, perform the following steps.

The custom date format should be in line with the underlying data.
The register dataset format of date columns should match with the date format of the underlying file’s/table’s/SQL’s date format.
For example, if underlying data is in format yyyy-MM-dd, you must mark the format at Kyvos register file at yyyy-MM-dd.
For raw data querying, Kyvos does not support raw data in the following format:
8/6/2021 17:57:100 (MM/DD/YYYY HH: SS:000)
You need to change the format to 8/6/2021 at 17:57, and the format on the registered dataset should be set as MM/dd/yyyy HH: ss

Note

From the Kyvos 2023.1 release onwards, for any timestamp data field, the .S value for milliseconds will be auto-suggested. This is applicable to all warehouses.

Sort and filter

When you register a dataset, you can specify details about what data is included and how it is organized. You can choose to view the data as a column list or in a preview table and sort or filter the data before it gets used.

To specify a table is a dimension or a fact, select Dimension or Fact from the list. Set a file as a fact to use it as a fact table in relationships.

Use the icons to view either the Column list or the Preview table.

Click Sort Columns and choose by Name, Data type, or Visibility. For Name, specify ascending or descending order. You can also clear the sort settings. A checkmark indicates which options are currently selected.

Click Filter Columns and choose by Data type Data SubType or Visibility.

For Data type, specify char, number, or date.
For Data SubType, when using numeric values, you can specify double, long, float, or decimal.
- Long: If the column value contains integers values, use this option.
- Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that; thus decreasing precision. This data type should never be used for precise values, such as currency.
- Double: The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.
For Visibility, can choose to filter only Visible or Hidden.

You can also remove filtering.

To sort columns, perform the following steps.

From the Toolbox, select the object type, such as Files.
Select a registered file.
Click the icon to view the Column list if needed.
Click Sort columns.
For Name, choose Ascending or Descending to set the order.
Click Data Type to sort by data type.
Click Visibility to sort by visibility.
To remove sorting, select Remove sorting.

To filter columns, perform the following steps.

From the Toolbox, select the object type, such as Files.
Select a registered dataset.
Click the icon to view the Column list if needed.
Click Filter columns.
To filter by data type, choose Data Type and select Char, Number, or Date.
For numeric data, to filter by data subtype, choose Data SubType and Double, Long, Float, or Decimal.
Click Visibility to filter by visibility.
To remove filtering, select Remove filtering.

To filter data, perform the following steps.

From the Toolbox, select the object type, such as Files.
Select a registered dataset.
Click the icon to view the Column list if needed.
Click Filter data.
Enter the field, criteria, compare value, value, and to create complex filters, use parentheses, And or Or, and click the Plus icon to add more additional criteria.
Click Apply. See Using filters to learn more
For Ex: You want to filter the data as per the nation "Canada".
a. Click on column c_nation.
b. Click on Filter data.
c. Fill in the details.
d. Click on Apply.
e. Click on Preview table
Now the table is shown below.

Column properties

Depending on your data source, when you are viewing the column list, you can select a column to view Column properties. See Formatting columns.

If you have a large number of columns, use Search to quickly find a column.

Use Column properties to hide a column, set the relationship key to primary or foreign, edit the field name, and set the data type. If you choose a numeric datatype, use the drop-down list to specify whether it is a double, long, float, or decimal.

For numeric data types:

Long: If the column value contains integers values, use this option.
Float: Float is useful when you have large decimal numbers. It can store about 7 digits after the decimal point, and rounds off decimal places beyond that, thus decreasing precision. This data type should never be used for precise values, such as currency.
Double: The double data type can store about 15 digits after the decimal point. It's almost double the float, so we can say that float is a single-point precision, and double is a double-point precision.
Decimal: Use for fractional numbers.

Use the incremental identifier checkbox to identify incremental data for successive processes. The current value will be used to filter the next process data.

Click the Actions menu (...) to the right of Format to set the format for the data. See Formatting measures.

File properties

You can also specify advanced properties for the registered file using the Actions menu (...).

See the Setting properties section to know more.