
Tuning and optimizing semantic model

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


Prerequisites for semantic model process size and time optimization  

  • The semantic model should be built for 1-2 base partitions. For example, if the partition is on day, build 1-2 days; if it is on week, build 1-2 weeks; and so on.

  • You should have at least 10-20 different business queries available to execute on the processed semantic model so that you can collect query performance numbers.
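
One way to collect the query performance numbers is a small timing harness such as the sketch below. It assumes a hypothetical `run_query` callable that submits one query to the semantic model through whatever client you already use and blocks until the result arrives; it is not a Kyvos API, and the query names and texts are placeholders.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark_queries(
    queries: Dict[str, str],
    run_query: Callable[[str], object],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each named query.

    `run_query` is a hypothetical callable (an assumption of this sketch)
    that executes one query against the semantic model and blocks until
    the result is returned.
    """
    results: Dict[str, float] = {}
    for name, text in queries.items():
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(text)
            timings.append(time.perf_counter() - start)
        # The median is less sensitive to one slow run than the mean.
        results[name] = statistics.median(timings)
    return results
```

Running the same set of queries before and after each tuning change gives comparable numbers for the optimizations described in the rest of this page.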

Optimizations through semantic model design 

Check the semantic model job summary and thereafter analyze the following:  

  1. Explore the possibility of reducing the number of dimensions by:

    1. Combining dimensions: At the dataset level, whenever possible, combine dimensions that have few attributes (1-3) into one dimension. For example, some dimensions can be merged at the register file level to reduce the number of dimensions at the semantic model level.

    2. Multiple hierarchies: Consider using multiple hierarchies if you need two types of time data, such as year-month-day and year-quarter-month-day, for different purposes, or two types of location data, such as division-region-district-location and state-county-city-location.

    3. Dimension merging: At the time of semantic model processing, the dimensions and combinations of different dimensions are pre-aggregated (materialized) and stored on disk. If the number of dimensions in a semantic model is too high, the materialization becomes too large for the existing architecture to accommodate. This leads to high process time, increased size on disk, and higher read time while querying.
      The subset of facts to which these multiple transformations relate should be the same. Merging dimensions in this way saves disk space, reduces the process time needed to materialize semantic models, and improves query performance.

  2. Revisit distinct count measures:

    1. Check whether any distinct count measures can be removed because they are not needed.

    2. Check whether accurate counts can be converted to approximate counts.

    3. Explore the possibility of using boundary-based distinct counts wherever possible for high cardinalities.

    4. Check whether the Sum or Count functions can be used instead of Distinct Count to meet the same requirement.
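
To see why reducing the number of dimensions matters, the sketch below counts the possible dimension combinations before and after two dimensions are combined. It assumes, as an upper bound, that every non-empty combination is a candidate for materialization; the number Kyvos actually materializes depends on the aggregation strategy, and the dimension names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def dimension_combinations(dimensions: List[str]) -> List[Tuple[str, ...]]:
    """List every non-empty combination of the given dimensions."""
    combos: List[Tuple[str, ...]] = []
    for size in range(1, len(dimensions) + 1):
        combos.extend(combinations(dimensions, size))
    return combos


# Hypothetical dimension names, for illustration only.
before = ["Time", "Store", "Product", "Channel", "Promotion"]
# "Channel" and "Promotion" combined into a single dimension.
after = ["Time", "Store", "Product", "ChannelPromotion"]

print(len(dimension_combinations(before)))  # 31, that is 2**5 - 1
print(len(dimension_combinations(after)))   # 15, that is 2**4 - 1
```

Under this assumption, each dimension removed roughly halves the number of candidate combinations, which is why combining small dimensions can have a large effect on process time and size.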

Optimization through aggregation strategy

On the Aggregation Strategy tab in the semantic model designer, you can modify the aggregation properties to control the dimension combinations and materializations.

  • Selective dimension materialization: Use this property to control which dimension combinations are pre-aggregated during a semantic model process. You can specify the dimension names for materialization in the property dialog. In each dimension combination, dimensions are kept in a specific order (defined in the property kyvos.build.dimension.order).
    When you choose dimension combinations to materialize using this property, only the combinations starting with the selected dimension(s) are materialized, which reduces semantic model size and process time.
    Reducing aggregation can impact query performance, so identify carefully which dimensions can be selectively materialized.

  • Selective hierarchy materialization: Use this property to specify the highest level of a dimension to be materialized, which reduces the semantic model process time and size. Levels higher than the specified level are aggregated at run time. Dimensions not specified in this property are materialized based on default settings. You can also materialize individual levels if needed. The property value comes into effect after a full semantic model process.
    For example:
    Consider a TIME dimension with a four-level hierarchy: YEAR, QTR, MONTH, DAY. If you select QTR, Kyvos pre-aggregates only QTR and DAY (DAY is the lowest level). Any query containing YEAR is served from QTR, and any query containing MONTH is served from DAY; both are aggregated at run time. To materialize YEAR as well, you can select it individually.

  • Recommendation Aggregates Configurations: Click Recommend me to view a recommended strategy, including base partitions and subpartitions, recommendation details, and reasons. You can also get aggregation strategy recommendations, which are based on how the data is being used for querying. Kyvos recommends aggregates automatically based on its internal logic with the aim of improving performance, and displays the number of recommendations. See Using aggregates for additional details.
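
The TIME example for selective hierarchy materialization can be modeled with a short sketch. It assumes that a query on a given level is served by the nearest materialized level at or below it, and that the lowest level is always materialized, which is how the example above reads; the function is illustrative and not part of Kyvos.

```python
from typing import Dict, List, Set


def serving_level(hierarchy: List[str], materialized: Set[str]) -> Dict[str, str]:
    """Map each level to the materialized level assumed to serve it.

    `hierarchy` is ordered from the highest level to the lowest. Each level
    is served by the nearest materialized level at or below it.
    """
    if hierarchy[-1] not in materialized:
        raise ValueError("the lowest level must be materialized")
    plan: Dict[str, str] = {}
    for index, level in enumerate(hierarchy):
        # Walk down the hierarchy until a materialized level is found.
        plan[level] = next(l for l in hierarchy[index:] if l in materialized)
    return plan


time_levels = ["YEAR", "QTR", "MONTH", "DAY"]
plan = serving_level(time_levels, {"QTR", "DAY"})
print(plan)
# {'YEAR': 'QTR', 'QTR': 'QTR', 'MONTH': 'DAY', 'DAY': 'DAY'}
```

A level that maps to a different level than itself is one that would be aggregated at run time, so the mapping gives a quick view of which queries trade materialization size for query-time work.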

Optimization through Spark properties 

Users should have prior knowledge of the available cluster resources (nodes, cores, memory) and should observe the time taken by each level during semantic model processing. This helps identify bottlenecks and optimize the time further by using the Spark properties below to increase task parallelism and make use of the Spark distributed environment within the limits of the given resources.

  • kyvos.build.spark.levelJob.tasks: Use this property to configure the number of tasks for the reduce stage of Level1 and Level_DistCount jobs for full and/or incremental semantic model process jobs executed through the Spark engine. You can set this property to any positive integer; that number of tasks is then launched for the reduce stage, increasing parallelism if the required resources are available on the cluster.
    The default value is determined automatically based on data loads.

  • spark.dynamicAllocation.minExecutors: Use this Spark property to set the lower bound for the number of executors if dynamic allocation is enabled. See details.

  • spark.yarn.executor.memoryOverhead: Use this Spark property to set the amount of off-heap memory (in megabytes) to be allocated per executor. This memory accounts for VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%). See details.

  • spark.driver.memory: Use this property to set the amount of memory to be used for the driver process, that is, where SparkContext is initialized (e.g., 1g, 2g).

  • kyvos.spark.executor.memory.level1: Use this property to set the Spark executor memory for level1 job(s) launched during the full and/or incremental semantic model process.

  • kyvos.spark.executor.cores.level1: Use this property to set the Spark executor cores for level1 job(s) launched during the full and/or incremental semantic model process.

  • spark.executor.memory: Use this property to set the amount of memory to be used per executor process (e.g., 2g, 8g). See details.

  • spark.executor.cores: Use this property to set the number of cores to be used on each executor. This property applies only to YARN and standalone modes in Spark. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker if there are enough cores on that worker. Otherwise, only one executor per application will run on each worker. See details.

  • spark.dynamicAllocation.maxExecutors: Use this property to set the upper bound for the number of executors if dynamic allocation is enabled. See details.
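
As a rough starting point, the sketch below derives values for some of these properties from the cluster resources. The sizing rules are assumptions of this example, not Kyvos recommendations: it packs executors by core count, splits each executor's share of node memory into heap and overhead using a 10% overhead with a 384 MB floor, and reserves nothing for the operating system or other services. Treat the output as a first guess to refine while observing each level's time.

```python
from typing import Dict


def sketch_spark_properties(
    worker_nodes: int,
    cores_per_node: int,
    memory_gb_per_node: int,
    executor_cores: int = 4,
    overhead_fraction: float = 0.10,
) -> Dict[str, str]:
    """Derive illustrative Spark property values from cluster resources."""
    executors_per_node = max(1, cores_per_node // executor_cores)
    # Memory share of one executor, heap plus off-heap overhead, in MB.
    container_mb = (memory_gb_per_node * 1024) // executors_per_node
    # Size the overhead as a fraction of the heap, with a 384 MB floor.
    heap_mb = int(container_mb / (1 + overhead_fraction))
    overhead_mb = max(384, container_mb - heap_mb)
    heap_mb = container_mb - overhead_mb
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{heap_mb}m",
        "spark.yarn.executor.memoryOverhead": str(overhead_mb),
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": str(worker_nodes * executors_per_node),
    }


# Hypothetical cluster: 4 worker nodes, 16 cores and 64 GB each.
for key, value in sketch_spark_properties(4, 16, 64).items():
    print(f"{key}={value}")
```

In practice you would leave headroom on each node for the operating system and the resource manager, so real values are usually lower than what this sketch prints.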

Note

For Azure Databricks-based environments, you need to explicitly define or modify the Spark properties in the Databricks Advanced Spark Configuration.

Semantic model data replication optimization through Cloud Native CLI

Kyvos supports multiple flows for copying semantic model metadata from the DFS to the local disk.

For cloud-based clusters, using the Cloud Native CLI optimizes the performance. You can configure this through the Connections page.

Property name: kyvos.build.datacopy.useCLI

Description: This property configures whether the cloud-native CLI should be used for copying semantic model data. If the CLI command fails to copy the data, the copy falls back to the existing Hadoop API.

Possible Values:

  • NONE: Semantic model metadata and cuboids will be copied using existing Hadoop API

  • METADATA: Semantic model metadata will be copied using cloud-native CLI and cuboids will be copied using existing Hadoop API

  • CUBOID: Cuboids will be copied using cloud-native CLI and metadata will be copied using existing Hadoop API

  • BOTH: Semantic model metadata and cuboids will be copied using cloud-native CLI

Default value: NONE

Scope: BI Server, Query Engine

Comes into effect: This is hot deployed at BI Server but requires a restart of Query Engines if CLI is used to copy cuboids.

Impact Area: Metadata copy at BI Server and cuboid copying at Query Engine

Environment: Applicable in all cloud environments

Note

We recommend using the METADATA value for the kyvos.build.datacopy.useCLI property in Kyvos version 2022.1.
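
The effect of each kyvos.build.datacopy.useCLI value, together with the fallback described above, can be summarized in a short sketch. The enum, the function names, and the `cli_copy` and `hadoop_copy` callables are hypothetical stand-ins used only to illustrate the documented behavior; they are not Kyvos code.

```python
from enum import Enum
from typing import Callable, Dict


class UseCli(Enum):
    """Documented values of kyvos.build.datacopy.useCLI."""
    NONE = "NONE"
    METADATA = "METADATA"
    CUBOID = "CUBOID"
    BOTH = "BOTH"


def copy_methods(mode: UseCli) -> Dict[str, str]:
    """Return the copy path each artifact type takes for a given value."""
    return {
        "metadata": "cli" if mode in (UseCli.METADATA, UseCli.BOTH) else "hadoop_api",
        "cuboids": "cli" if mode in (UseCli.CUBOID, UseCli.BOTH) else "hadoop_api",
    }


def copy_with_fallback(
    method: str,
    cli_copy: Callable[[], None],
    hadoop_copy: Callable[[], None],
) -> str:
    """Try the CLI copy when selected; fall back to the Hadoop API on failure."""
    if method == "cli":
        try:
            cli_copy()
            return "cli"
        except Exception:
            pass  # CLI failed, so continue with the Hadoop API below.
    hadoop_copy()
    return "hadoop_api"


print(copy_methods(UseCli.METADATA))
# {'metadata': 'cli', 'cuboids': 'hadoop_api'}
```

With the recommended METADATA value, only the metadata copy at the BI Server changes path, which is consistent with the property being hot deployed there without a Query Engine restart.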

Optimization through cluster configuration

  • For cloud-based environments, configure autoscaling and define the minimum and maximum number of worker nodes according to the computational load. Used judiciously, this ensures proper utilization of cloud resources and helps optimize semantic model process time.

  • Additional details are mentioned in the cloud best practices.

Copyright Kyvos, Inc. All rights reserved.