Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
Datasets identify the data to use from a data source so that Kyvos can access and work with it. You can register a dataset or a folder. Kyvos supports advanced data types such as maps, arrays, and structs in HCatalog tables. To extract values of these types, add a formula step to the dataset and use the two formula functions provided for this purpose. You can also apply formatting and filters to the file when you register it. Formatting standardizes the appearance of data, such as how dates are displayed. Filtering lets you exclude data from the original source file that you don't plan to use.
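As a conceptual illustration only, the following Python sketch shows what formatting and filtering mean for a handful of rows: dates arriving in mixed formats are rewritten to one display format, and rows that are not needed are dropped. It does not use any Kyvos API. The column names, the sample rows, the accepted date formats, and the `TEST` region used as the filter condition are all hypothetical.

```python
from datetime import datetime

# Hypothetical input formats that the sample data may use for dates.
DATE_INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(value: str, output_format: str = "%Y-%m-%d") -> str:
    """Formatting: rewrite a date string into a single display format."""
    for fmt in DATE_INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(output_format)
        except ValueError:
            continue  # try the next known input format
    raise ValueError(f"unrecognized date: {value!r}")


def prepare(rows):
    """Filter out unwanted rows, then standardize the date column."""
    # Filtering: exclude rows that are not planned for use (hypothetical rule).
    kept = [row for row in rows if row["region"] != "TEST"]
    return [{**row, "order_date": standardize_date(row["order_date"])} for row in kept]


# Hypothetical sample rows standing in for a source file.
rows = [
    {"order_date": "2023-01-15", "region": "EU", "amount": 120.0},
    {"order_date": "03/02/2023", "region": "US", "amount": 80.0},
    {"order_date": "2023-02-20", "region": "TEST", "amount": 0.0},
]

for row in prepare(rows):
    print(row)
```

In Kyvos itself, these choices are made through the dataset registration options rather than in code.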
If your instance of Kyvos is configured to support it through the portal.properties file, you can create a dataset with a Presto connection. However, you can't create a dataset or semantic model process using this dataset.
If you specify Lookup when you register the dataset, the complete data is processed for both full and incremental builds, regardless of the data source type. See Using lookup to learn more.
You can set up properties to control how Kyvos handles data. For example, you can set a property to ignore invalid rows during a transformation process.
Important
Your account must have the appropriate security access to the underlying databases and tables to do some of the tasks related to these files.
You can avoid the dependency of creating a dataset by writing SQL code when registering a dataset.
Supported encoding types include ASCII, ISO-8859-1, and UTF-8. LZO compression is supported.
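Before registering a file, it can help to check which of the listed encodings its bytes are valid under. The following is a minimal sketch using only the Python standard library; it is not part of Kyvos. The function name and the sample byte strings are hypothetical.

```python
from typing import List

# The encodings listed above, written as Python codec names.
SUPPORTED_ENCODINGS = ("ascii", "utf-8", "iso-8859-1")


def compatible_encodings(data: bytes) -> List[str]:
    """Return the listed encodings under which `data` decodes without error."""
    matches = []
    for name in SUPPORTED_ENCODINGS:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        matches.append(name)
    return matches


# Hypothetical sample contents standing in for source files.
plain = "id,name\n1,Ana\n".encode("ascii")
accented_utf8 = "id,name\n1,José\n".encode("utf-8")
accented_latin1 = "id,name\n1,José\n".encode("iso-8859-1")

print(compatible_encodings(plain))
print(compatible_encodings(accented_utf8))
print(compatible_encodings(accented_latin1))
```

ISO-8859-1 assigns a character to every byte value, so it always appears in the result. Treat the output as a hint about which encodings are ruled out, not as proof of how the file was written.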
Note
The Kyvos 2023.1 release includes a new RFSQL-specific parser for cases where the existing parser fails to parse SQL that filters on a date column with >= or <= and an explicit type cast. If this occurs, add the kyvos.rf.sqlparser.enabled property to the Hadoop connection and set its value to true.
Also refer to the Common actions section to learn more about the actions that you can perform on datasets.