Tuning and optimizing semantic model

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

Prerequisites for semantic model process size and time optimization

Build the semantic model for 1-2 base partitions. For example, if the data is partitioned by day, build it for 1-2 days; if it is partitioned by week, build it for 1-2 weeks, and so on.
Have at least 10-20 different business queries available to run against the processed semantic model so that you can collect query performance numbers.
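One way to collect these baseline numbers is a small timing harness. The sketch below is illustrative only: `run_query` is a hypothetical stand-in for whatever client you use to submit a query to the semantic model, and the query names and texts are placeholders.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_queries(
    queries: Dict[str, str],
    run_query: Callable[[str], object],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each named query."""
    medians: Dict[str, float] = {}
    for name, text in queries.items():
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(text)  # hypothetical: replace with your real client call
            samples.append(time.perf_counter() - start)
        medians[name] = statistics.median(samples)
    return medians


if __name__ == "__main__":
    # Placeholder queries and a stub runner, for illustration only.
    sample_queries = {"sales_by_region": "SELECT ...", "daily_revenue": "SELECT ..."}
    baseline = time_queries(sample_queries, run_query=lambda q: None)
    for name, seconds in sorted(baseline.items()):
        print(f"{name}: {seconds:.6f}s")
```

Recording the median over a few repeats, before and after each design change, gives a like-for-like comparison across the same set of queries.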

Optimizations through semantic model design

Check the semantic model job summary and thereafter analyze the following:

Explore the possibility of reducing the number of dimensions by:
1. Combining dimensions: At the dataset level, whenever possible, combine dimensions that have only a few attributes (1-3) into one dimension. For example, some dimensions can be merged at the register file level to reduce the number of dimensions at the semantic model level.
2. Multiple hierarchies: Consider using multiple hierarchies in a single dimension if you need two types of time data, such as year-month-day and year-quarter-month-day, for different purposes, or two types of location data, such as division-region-district-location and state-county-city-location.
3. Dimension merging: During semantic model processing, dimensions and combinations of dimensions are pre-aggregated (materialized) and stored on disk. If a semantic model has too many dimensions, the materialization becomes too large for the existing architecture to accommodate. This leads to longer process time, larger size on disk, and longer read time while querying.
  You can merge two or more dimensions, related to the same or different transformations, into a single dimension, provided that the subset of facts to which these transformations relate is the same. This saves disk space, reduces the time needed to materialize the semantic model, and improves query performance.
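To see why the dimension count matters, consider a simplified model in which every non-empty combination of dimensions is a candidate for pre-aggregation. This is an assumption made for illustration, not a description of how Kyvos actually chooses combinations, but it shows how quickly the candidate space shrinks when dimensions are merged.

```python
def candidate_combinations(num_dimensions: int) -> int:
    """Count the non-empty subsets of a set of dimensions (simplified upper bound)."""
    if num_dimensions < 0:
        raise ValueError("num_dimensions must be non-negative")
    return 2 ** num_dimensions - 1


if __name__ == "__main__":
    for n in (20, 15, 10):
        print(f"{n} dimensions -> {candidate_combinations(n):,} candidate combinations")
```

Under this assumption, going from 20 dimensions to 15 reduces the candidate combinations from 1,048,575 to 32,767.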
Revisit distinct count measures
1. Check whether any distinct count measures can be removed because they are not needed.
2. Check whether accurate counts can be converted to approximate counts.
3. Explore using Boundary Based Distinct Counts wherever possible for high-cardinality data.
4. Explore whether Sum or Count functions can be used instead of Distinct Count to meet the same requirement.
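As an illustration of the last point, a distinct count over a column that is already unique at the grain of the fact data returns the same result as a plain count. The sketch below checks this on made-up rows with hypothetical column names; it is not Kyvos code.

```python
from typing import Dict, Hashable, List


def can_replace_distinct_with_count(rows: List[Dict[str, Hashable]], column: str) -> bool:
    """Return True when every value in `column` is unique, so COUNT equals DISTINCT COUNT."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


if __name__ == "__main__":
    # Hypothetical fact rows: order_id is unique per row, customer_id repeats.
    facts = [
        {"order_id": 1, "customer_id": "A"},
        {"order_id": 2, "customer_id": "A"},
        {"order_id": 3, "customer_id": "B"},
    ]
    print(can_replace_distinct_with_count(facts, "order_id"))     # True
    print(can_replace_distinct_with_count(facts, "customer_id"))  # False
```

A check like this on a data sample is only a hint; the uniqueness has to hold for the full data before the measure is changed.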

Optimization through aggregation strategy

On the Aggregation Strategy tab in the semantic model designer, you can modify the aggregation properties to control dimension combinations and materializations.

Selective dimension materialization: Use this property to control which dimension combinations are pre-aggregated in a semantic model process. You can specify the dimension names for materialization in the property dialog. In each dimension combination, dimensions are kept in a specific order (defined in the property kyvos.build.dimension.order).
When you choose dimension combinations to materialize using this property, only the combinations starting with the selected dimension(s) are materialized, which reduces semantic model size and process time.
Reducing aggregation can impact query performance, so identify carefully which dimensions can be selectively materialized.
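The effect of selective dimension materialization can be sketched as follows. This is one reading of the rule described above, not Kyvos's implementation: the dimension names are hypothetical, every non-empty combination is assumed to be a candidate, and a combination "starts with" a dimension when that dimension comes first in the configured order.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple


def ordered_combinations(order: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty dimension combinations, each kept in the configured order."""
    return [
        combo
        for size in range(1, len(order) + 1)
        for combo in combinations(order, size)
    ]


def selectively_materialized(
    order: Sequence[str], selected: Iterable[str]
) -> List[Tuple[str, ...]]:
    """Keep only the combinations whose first dimension is one of the selected ones."""
    chosen = set(selected)
    return [combo for combo in ordered_combinations(order) if combo[0] in chosen]


if __name__ == "__main__":
    dimension_order = ["TIME", "PRODUCT", "STORE"]  # hypothetical dimension order
    print(len(ordered_combinations(dimension_order)))                  # 7
    print(len(selectively_materialized(dimension_order, ["TIME"])))    # 4
```

In this toy case, selecting only TIME keeps 4 of the 7 combinations; queries that need the dropped combinations would be aggregated at run time instead.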
Selective hierarchy materialization: Use this property to specify the highest level of a dimension to be materialized, allowing you to reduce the semantic model process time and size. Levels higher than the specified level will be aggregated at run time. Dimensions not specified in this property will be materialized based on default settings. You can also choose to materialize individual levels if needed. The property value comes into effect after a full semantic model process.
For example:
Consider a TIME dimension with a four-level hierarchy: YEAR, QTR, MONTH, DAY. If you select QTR, Kyvos pre-aggregates only QTR and DAY (DAY is the lowest level). A query containing YEAR is served from QTR, and a query containing MONTH is served from DAY; both are aggregated at run time. To materialize YEAR as well, select it individually.
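The TIME example can be expressed as a small lookup: each level is served from the nearest materialized level at or below it. The function below is a sketch of that rule as described in the example, with the assumption that the lowest level is always materialized; the function name is hypothetical.

```python
from typing import Dict, Sequence


def serving_levels(levels: Sequence[str], selected: Sequence[str]) -> Dict[str, str]:
    """Map each hierarchy level to the materialized level that would serve it.

    `levels` is ordered from highest (coarsest) to lowest (finest). The lowest
    level is treated as always materialized, as in the TIME example.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    unknown = set(selected) - set(levels)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    materialized = set(selected) | {levels[-1]}
    result: Dict[str, str] = {}
    for index, level in enumerate(levels):
        # The first materialized level at or below the requested level.
        result[level] = next(l for l in levels[index:] if l in materialized)
    return result


if __name__ == "__main__":
    time_levels = ["YEAR", "QTR", "MONTH", "DAY"]
    print(serving_levels(time_levels, ["QTR"]))
    # {'YEAR': 'QTR', 'QTR': 'QTR', 'MONTH': 'DAY', 'DAY': 'DAY'}
```

Selecting YEAR in addition to QTR changes the first entry so that YEAR is served from its own materialization.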
Recommendation Aggregates Configurations: Click Recommend me to view a recommended strategy, including base partitions and subpartitions, recommendation details, and reasons. You can also get aggregation strategy recommendations, which are based on how the data is being queried. Kyvos recommends aggregates automatically based on its internal logic to improve performance, and displays the number of recommendations. See Using aggregates for additional details.

Optimization through Spark properties

You should know the available cluster resources (nodes, cores, memory) and observe the time taken by each level during semantic model processing. This helps you identify bottlenecks and reduce the process time further by using the Spark properties below to increase task parallelism and make use of Spark's distributed environment within the limits of the available resources.

kyvos.build.spark.levelJob.tasks: Use this property to configure the number of tasks for the reduce stage of Level1 and Level_DistCount jobs in a full and/or incremental semantic model process job executed through the Spark engine. You can set this property to any positive integer; that number of tasks is then launched for the reduce stage, increasing parallelism if the required resources are available on the cluster.
The default value is determined automatically based on the data load.
spark.dynamicAllocation.minExecutors: Use this Spark property to set the lower bound for the number of executors if dynamic allocation is enabled. See details.
spark.yarn.executor.memoryOverhead: Use this Spark property to set the amount of off-heap memory (in megabytes) to be allocated per executor. This memory accounts for VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%). See details.
spark.driver.memory: Use this property to set the amount of memory to be used for the driver process, that is, where SparkContext is initialized (e.g., 1g, 2g).
kyvos.spark.executor.memory.level1: Use this property to set the Spark executor memory for level1 job(s) launched during the full and/or incremental semantic model process.
kyvos.spark.executor.cores.level1: Use this property to set the Spark executor cores for level1 job(s) launched during the full and/or incremental semantic model process.
spark.executor.memory: Use this property to set the amount of memory to be used per executor process (e.g., 2g, 8g). See details.
spark.executor.cores: Use this property to set the number of cores to be used on each executor. This property applies to YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker if there are enough cores on that worker. Otherwise, only one executor per application will run on each worker. See details.
spark.dynamicAllocation.maxExecutors: Use this property to set the upper bound for the number of executors if dynamic allocation is enabled. See details.
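Before applying a set of values, it can help to derive and sanity-check them together. The sketch below is not a recommendation: the numbers are placeholders, the helper names are hypothetical, and the overhead is estimated at 10% of executor memory, the upper end of the 6-10% range mentioned above.

```python
import math
from typing import Dict


def memory_overhead_mb(executor_memory_gb: int, fraction: float = 0.10) -> int:
    """Estimate per-executor off-heap overhead in MB as a fraction of executor memory."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return math.ceil(executor_memory_gb * 1024 * fraction)


def build_spark_properties(
    executor_memory_gb: int,
    executor_cores: int,
    min_executors: int,
    max_executors: int,
    driver_memory_gb: int,
) -> Dict[str, str]:
    """Assemble the Spark properties discussed above after basic consistency checks."""
    if min(executor_memory_gb, executor_cores, driver_memory_gb) <= 0:
        raise ValueError("memory and core values must be positive")
    if not 0 <= min_executors <= max_executors:
        raise ValueError("minExecutors must be between 0 and maxExecutors")
    return {
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.cores": str(executor_cores),
        "spark.yarn.executor.memoryOverhead": str(memory_overhead_mb(executor_memory_gb)),
        "spark.driver.memory": f"{driver_memory_gb}g",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


if __name__ == "__main__":
    # Placeholder values; size these to your own cluster.
    for key, value in build_spark_properties(8, 4, 2, 10, 2).items():
        print(f"{key}={value}")
```

With 8g executors, the 10% estimate works out to 820 MB of overhead per executor, which must also fit within the memory available on each node.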

Note

For Azure Databricks-based environments, you need to explicitly define or modify the Spark properties in the Databricks Advanced Spark Configuration.

Semantic model data replication optimization through Cloud Native CLI

Kyvos supports multiple flows for copying semantic model metadata from the DFS to the local disk.

For cloud-based clusters, using the Cloud Native CLI improves copy performance. You can configure this through the Connections page.

Property name: kyvos.build.datacopy.useCLI

Description: This property configures whether the cloud-native CLI is used for copying semantic model data. If the CLI command fails to copy the data, the copy falls back to the existing Hadoop API.

Possible Values:

NONE: Semantic model metadata and cuboids will be copied using the existing Hadoop API.
METADATA: Semantic model metadata will be copied using the cloud-native CLI, and cuboids will be copied using the existing Hadoop API.
CUBOID: Cuboids will be copied using the cloud-native CLI, and metadata will be copied using the existing Hadoop API.
BOTH: Semantic model metadata and cuboids will be copied using the cloud-native CLI.

Default value: NONE

Scope: BI Server, Query Engine

Comes into effect: This is hot deployed at BI Server but requires a restart of Query Engines if CLI is used to copy cuboids.

Impact Area: Metadata copy at BI Server and cuboid copying at Query Engine

Environment: Applicable in all cloud environments
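The four values and the fallback behavior described above can be summarized as a lookup. This is an illustrative model of the documented behavior, not Kyvos code; the function name and the "cli" and "hadoop_api" labels are hypothetical.

```python
from typing import Dict

# Copy path per artifact for each documented value of kyvos.build.datacopy.useCLI.
COPY_MODES: Dict[str, Dict[str, str]] = {
    "NONE": {"metadata": "hadoop_api", "cuboids": "hadoop_api"},
    "METADATA": {"metadata": "cli", "cuboids": "hadoop_api"},
    "CUBOID": {"metadata": "hadoop_api", "cuboids": "cli"},
    "BOTH": {"metadata": "cli", "cuboids": "cli"},
}


def copy_path(use_cli: str, artifact: str, cli_succeeded: bool = True) -> str:
    """Return the path used to copy `artifact`, falling back to the Hadoop API if the CLI fails."""
    if use_cli not in COPY_MODES:
        raise ValueError(f"unsupported value: {use_cli!r}")
    if artifact not in ("metadata", "cuboids"):
        raise ValueError(f"unknown artifact: {artifact!r}")
    planned = COPY_MODES[use_cli][artifact]
    if planned == "cli" and not cli_succeeded:
        return "hadoop_api"
    return planned


if __name__ == "__main__":
    print(copy_path("METADATA", "metadata"))                       # cli
    print(copy_path("METADATA", "cuboids"))                        # hadoop_api
    print(copy_path("BOTH", "cuboids", cli_succeeded=False))       # hadoop_api
```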

Note

We recommend using the METADATA value for the kyvos.build.datacopy.useCLI property in Kyvos 2022.1.

Optimization through cluster configuration

For cloud-based environments, configure autoscaling and define the minimum and maximum number of worker nodes according to the computational load. This ensures proper utilization of cloud resources and, if used judiciously, helps reduce semantic model process time.
Additional details are available in the cloud best practices.

Kyvos 2024.1.x