Working with non-materialized or raw data semantic models

Applies to:Kyvos Enterprise Kyvos Cloud (Managed Services on AWS) Kyvos Azure Marketplace

Kyvos AWS Marketplace Kyvos Single Node Installation (Kyvos SNI) Kyvos Free (Limited offering for AWS)

There are some differences between using Kyvos with a materialized or a non-materialized semantic model. See MOLAP, ROLAP, and HOLAP for the distinction between the different semantic model types.

Kyvos allows you to work with raw data that is not processed into a semantic model. When you use raw data, Kyvos quickly analyzes the data to know what types of data it contains.

Info

A non-materialized semantic model doesn't include pre-calculated query results or summarized data but uses an external SQL engine to calculate values at the time of a query.

Consider using raw data when:

The data is small (a few GB in size).
When dimensions have high cardinality and pre-aggregating, the data is unlikely to improve performance.
You want to start exploring the data quickly, and once you know how to use it, you can plan to process it later.
You want to generate drill-through reports

The value of RAW_DATA_QUERYING_ENABLED property should be set to true in the olapengine.properties on the Kyvos Manager.

Kyvos Supported SQL engines for raw-data querying and drill through

Kyvos supports multiple SQL connections for raw data querying. You must select Enable Raw Data Querying from the semantic model Properties to support working with raw data. Once enabled, you can select the required SQL connection from the Semantic Model Designer page. You can select the required SQL connection from the Raw Data SQL Connection list to use the connection for raw data querying. This feature enables you to select the SQL connection at the cube level.

For example, consider a case where the Snowflake connection is selected as the default SQL connection for raw data querying. Now, if you want to use the Hive connection, where underlying semantic model entities are created for raw data querying, you can select the System Identified option displayed in the Raw Data SQL connection list.

Here's a list of Kyvos-supported SQL engines for raw-data querying and drill through:

Presto
Spark
Hive
Snowflake
Athena
BigQuery

Info

To use a Spark or Hive connection, the Is Default SQL Engine option must be enabled in the Hadoop connection.
For a Spark connection, you need to provide the Spark thrift server connection string URL. For example :jdbc:hive2://<IP Address>:<port>/default;transportMode=http;httpPath=cliservice;principal=hive/<installation node>.kyvostest.com@KYVOSTEST.COM

Points to remember

When working with raw data semantic models, there are certain limitations:

Crosstabs are not available. However, you can view a table view of raw data.
Once you process a semantic model, the crosstab option becomes available.
You won't see the calculated measures in raw data semantic models.
If you are working with a time dimension with a pre-defined hierarchy such as Year - Month - Day, you can specify whether month data should combine years' worth of data into a single month.
You cannot use a dataset over a DFS file directly for raw data browsing. To browse a non-built semantic model (using raw data SQL engine) register the dataset with HCatalog.
Kyvos supports single file, star schema, multifact, and snowflake schemas for a non-built semantic model.

Enabling raw data querying on the semantic model

To allow raw data querying on a semantic model, perform the following steps.

Navigate to the semantic model where you want to enable raw data querying.
From the Semantic Model Properties, select any of the following from the Raw Data Querying option to control the ability to query the semantic model without processing it.
Select from the following options:
- Enable: Allows raw data querying using an external SQL engine for all types of queries. After enabling raw data querying on a particular semantic model, the Raw Data SQL Connection list is displayed. You can select the required SQL connection to execute raw data queries. This option applies only to non-materialized semantic models.
  In the Raw Data SQL Connection list, select any one of the following:
  - System Default: Use this option to select the default SQL connection for raw data querying.
  - System Identified: Use this option to select the connections on which underlying semantic model entities are created. This is applicable only when the semantic model entities are created on a single connection. Otherwise, the default SQL connection is used.
  - Custom connections: Use this option to select the available SQL connections that can be used for raw data querying.
    Info
    
    If you select the System Identified or any custom SQL connection from the Raw Data SQL Connection list, Kyvos replaces the default SQL connection (if enabled while creating a connection) with the selected SQL connection.
- Disable: Disables raw data querying for the semantic model.
- Only For Drillthrough: Allows raw data querying only for drill-through queries. This option applies only to materialized semantic models.

You can use raw data like you use data in a semantic model, but you can also use data profiles of raw data to see how to create better data relationships or semantic model designs. When creating a worksheet using raw data, you can specify whether it should show summarized data or detailed data.

You can pre-cache metadata queries for a raw data semantic model. This allows you to auto-populate the metadata queries in advance for a non-materialized semantic model before exposing it to a business user.

Info

If you don't see the option Raw Data Querying in the Properties pane, you need to create a connection to the raw data. See Configuring the connection setting for raw data.

Table view

When a semantic model with raw data is viewed as a table, all levels, attributes, and measures are seen as individual columns. Consider a semantic model design that has the following two measures defined over the same underlying column, `sales` from the fact table:

min_sales = min(sales)

max_sales = max(sales)

These are seen as two columns of min(sales) and max(sales).

If you create a query that asks for an aggregated value on a column, which is an attribute or level in the semantic model, the query behaves differently before and after the semantic model is processed.

Exporting raw data

You can export the table data in CSV format.

Related topics