Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


Prerequisites for semantic model process size and time optimization  

Optimizations through semantic model design 

Check the semantic model job summary, and then analyze the following:

  1. Explore whether you can reduce the number of dimensions by:

    1. Combining dimensions: At the dataset level, whenever possible, combine dimensions that have only a few attributes (1-3) into one dimension. For example, some dimensions can be merged at the register file level to reduce the number of dimensions at the semantic model level.

    2. Multiple hierarchies: Consider using multiple hierarchies within one dimension if you need two views of the same data for different purposes, such as year-month-day and year-quarter-month-day for time, or division-region-district-location and state-county-city-location for location.

    3. Dimension merging: During semantic model processing, dimensions and combinations of dimensions are pre-aggregated (materialized) and stored on disk. If a semantic model has too many dimensions, the materialization becomes too large for the existing architecture to accommodate. This leads to longer process time, a larger size on disk, and higher read time during querying.
      You can merge two or more dimensions, related to the same or different transformations, into a single dimension, provided that these transformations relate to the same subset of facts. Merging saves disk space, reduces the time needed to materialize the semantic model, and improves query performance.

  2. Revisit distinct count measures:

    1. Check whether any distinct count measures can be removed because they are not needed.

    2. Check whether accurate distinct counts can be converted to approximate distinct counts.

    3. Explore using boundary-based distinct counts wherever possible for high-cardinality columns.

    4. Explore whether the Sum or Count functions can be used instead of Distinct Count to meet the same requirement.
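
As an illustration of the last point, the following minimal Python sketch checks whether a plain count would return the same result as a distinct count for a given column. It is not Kyvos functionality; the sample rows and the helper names (`distinct_count`, `plain_count`, `can_replace_distinct_with_count`) are hypothetical and exist only for this example. The idea is that when a column is unique per fact row, Count and Distinct Count return the same value, so the cheaper function may be sufficient.

```python
def distinct_count(rows, column):
    """Number of distinct non-null values in the column."""
    return len({row[column] for row in rows if row[column] is not None})


def plain_count(rows, column):
    """Number of rows with a non-null value in the column."""
    return sum(1 for row in rows if row[column] is not None)


def can_replace_distinct_with_count(rows, column):
    """True when the column is unique per row in this sample, so Count equals Distinct Count."""
    return distinct_count(rows, column) == plain_count(rows, column)


# Hypothetical sample of fact rows.
orders = [
    {"order_id": 1, "customer_id": "A"},
    {"order_id": 2, "customer_id": "A"},
    {"order_id": 3, "customer_id": "B"},
]

# order_id is unique per row, customer_id is not.
order_id_replaceable = can_replace_distinct_with_count(orders, "order_id")
customer_id_replaceable = can_replace_distinct_with_count(orders, "customer_id")
```

A check like this on a sample only suggests a candidate. Whether the column is unique across the full dataset, and at every aggregation level that is queried, still needs to be confirmed against the data model.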

Optimization through aggregation strategy

On the Aggregation Strategy tab in the semantic model designer, you can modify the aggregation properties to control which dimension combinations are materialized.
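
To give a feel for why the number of dimensions and the number of dimension combinations matter, here is a rough Python sketch that counts candidate combinations. It assumes a worst case in which every non-empty combination of dimensions is a candidate for materialization; this is a simplification for illustration, not a description of how Kyvos selects what to materialize. The function name and the `max_dims_per_combination` parameter are hypothetical and do not correspond to Kyvos properties.

```python
from math import comb


def count_dimension_combinations(num_dimensions, max_dims_per_combination=None):
    """Count non-empty dimension combinations, optionally capped by combination size."""
    if max_dims_per_combination is None:
        limit = num_dimensions
    else:
        limit = min(max_dims_per_combination, num_dimensions)
    # Sum of "n choose k" for k = 1..limit.
    return sum(comb(num_dimensions, k) for k in range(1, limit + 1))


# Ten dimensions versus eight dimensions (for example, after merging two pairs).
before_merge = count_dimension_combinations(10)
after_merge = count_dimension_combinations(8)

# Ten dimensions, but only combinations of up to three dimensions.
capped = count_dimension_combinations(10, max_dims_per_combination=3)
```

Under this assumption the candidate space grows exponentially with the number of dimensions, which is why both merging dimensions and restricting combinations can reduce process time and size on disk.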

Optimization through Spark properties 

Users should know the available cluster resources (nodes, cores, and memory) in advance and should observe the time taken at each level during semantic model processing. This helps identify bottlenecks and further reduce process time by tuning Spark properties that increase task parallelism and make full use of the Spark distributed environment within the limits of the available resources.
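
As a starting point for relating cluster resources to Spark settings, the following Python sketch derives a few values from the node count, cores, and memory. It is only a rough heuristic under stated assumptions: one core and 1 GB per node are reserved for the operating system, five cores per executor is used as a common rule of thumb, one executor slot is left for the driver, and about 10% of executor memory is set aside for overhead. `spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`, and `spark.sql.shuffle.partitions` are standard Apache Spark properties; whether they are the right ones to tune for a given Kyvos deployment is an assumption, and the function name is hypothetical.

```python
def suggest_spark_properties(nodes, cores_per_node, memory_gb_per_node,
                             cores_per_executor=5, overhead_fraction=0.10):
    """Return a rough first guess for Spark executor sizing (heuristic only)."""
    # Reserve one core and 1 GB per node for the OS and daemons (assumption).
    usable_cores = max(1, cores_per_node - 1)
    usable_memory_gb = max(1, memory_gb_per_node - 1)

    executors_per_node = max(1, usable_cores // cores_per_executor)
    executor_cores = min(cores_per_executor, usable_cores)

    # Leave one executor slot for the driver (assumption).
    total_executors = max(1, nodes * executors_per_node - 1)

    # Split node memory across executors and keep a share for overhead.
    memory_per_executor_gb = usable_memory_gb / executors_per_node
    heap_gb = max(1, int(memory_per_executor_gb * (1 - overhead_fraction)))

    return {
        "spark.executor.instances": str(total_executors),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{heap_gb}g",
        # Roughly two tasks per available core (assumption).
        "spark.sql.shuffle.partitions": str(total_executors * executor_cores * 2),
    }


# Hypothetical cluster: 4 worker nodes, 16 cores and 64 GB each.
suggested = suggest_spark_properties(nodes=4, cores_per_node=16, memory_gb_per_node=64)
```

Treat the result as an initial configuration to validate against the per-level timings from the job summary, not as recommended values.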

Note

For Azure Databricks-based environments, you need to explicitly define or modify the Spark properties in the Databricks Advanced Spark Configuration.

Semantic model data replication optimization through Cloud Native CLI

Kyvos supports multiple flows for copying semantic model metadata from the DFS to the local disk.

For cloud-based clusters, using the Cloud Native CLI improves copy performance. You can configure this on the Connections page.

Property name: kyvos.build.datacopy.useCLI

Description: This property configures whether the cloud-native CLI is used for copying semantic model data. If the CLI command fails to copy the data, the copy falls back to the existing Hadoop API.

Possible Values:

Default value: NONE

Scope: BI Server, Query Engine

Comes into effect: The property is hot deployed on the BI Server, but requires a restart of the Query Engines if the CLI is used to copy cuboids.

Impact Area: Metadata copy at BI Server and cuboid copying at Query Engine

Environment: Applicable in all cloud environments

Note

We recommend using the METADATA value for the kyvos.build.datacopy.useCLI property in Kyvos version 2022.1.
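
The fallback behavior described for kyvos.build.datacopy.useCLI can be pictured with the following minimal Python sketch. It is a conceptual illustration only, not Kyvos code: the function `copy_with_fallback` and the two copy callables passed to it are hypothetical stand-ins for the cloud-native CLI copy and the Hadoop API copy.

```python
import logging
from typing import Callable

logger = logging.getLogger("datacopy_sketch")

CopyFunction = Callable[[str, str], None]


def copy_with_fallback(source: str, destination: str, use_cli: bool,
                       copy_via_cli: CopyFunction,
                       copy_via_hadoop_api: CopyFunction) -> str:
    """Try the CLI copy first when enabled; fall back to the Hadoop API on failure.

    Returns the name of the path that completed the copy.
    """
    if use_cli:
        try:
            copy_via_cli(source, destination)
            return "cli"
        except Exception as error:
            # The CLI copy failed, so continue with the existing copy path.
            logger.warning("CLI copy failed (%s); falling back to Hadoop API", error)
    copy_via_hadoop_api(source, destination)
    return "hadoop_api"
```

In this sketch a CLI failure is logged and does not fail the copy, which mirrors the description above: the CLI is an optimization, and the Hadoop API remains the baseline path.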

Optimization through cluster configuration