Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, and Kyvos Single Node Installation (Kyvos SNI)

...

Prerequisites for semantic model process size and time optimization  

  • The semantic model should be built for 1-2 partitions of the base partitions. For example, if the partition is on Day, then 1-2 days; for a Week partition, 1-2 weeks; and so on.

  • You should have at least 10-20 different business queries available to run against the processed semantic model to obtain query performance numbers.

Optimizations through semantic model design 

Check the semantic model job summary and thereafter analyze the following:  

  1. Explore the possibility to reduce the number of dimensions by:

    1. Combining dimensions: At the dataset level, whenever possible, combine dimensions with a fewer number of attributes (1-3) into one dimension to reduce the number of dimensions. For example, some dimensions can be merged at the register file level to reduce the number of dimensions at the semantic model level.

    2. Multiple hierarchies: Consider using multiple hierarchies if you need two types of time data, such as year-month-day and year-quarter-month-day, for different purposes, or if you need two types of location data, such as division-region-district-location and state-county-city-location.

    3. Dimension merging: At the time of semantic model processing, the dimensions and combinations of different dimensions are pre-aggregated/materialized and stored on disk. If the number of dimensions in a semantic model is too high, the size of materialization becomes too large for the existing architecture to accommodate. This leads to high process time, increased size on disk, and higher read time while querying.
      You can merge two or more dimensions related to the same or different transformations into a single dimension. The subset of facts to which these multiple transformations relate should be the same. This saves disk space, reduces the process time needed to materialize semantic models, and improves query performance.

  2. Revisit Distinct Count measures:

    1. Explore the possibility of removing any distinct count measures that are not needed.

    2. Check whether accurate counts can be converted to approximate counts (see the sketch after this list).

    3. Explore the possibility of using boundary-based distinct counts wherever possible for high cardinalities.

    4. Also, explore whether Sum or Count functions can be used instead of Distinct Count to derive the same requirement.
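
To gauge whether an approximate count is acceptable before changing the measure, you can compare exact and approximate distinct counts on the underlying data outside Kyvos. The following is a minimal PySpark sketch; the table name sales_facts and column customer_id are hypothetical, and Spark's built-in approx_count_distinct is used for the approximation:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Illustrative check only: compare an exact distinct count with an approximate
    # one on the raw fact data. Table and column names are hypothetical.
    spark = SparkSession.builder.appName("distinct-count-check").getOrCreate()

    facts = spark.table("sales_facts")  # hypothetical registered fact table

    facts.agg(
        F.countDistinct("customer_id").alias("exact_distinct"),
        F.approx_count_distinct("customer_id", rsd=0.02).alias("approx_distinct"),  # ~2% relative error
    ).show()

If the approximate figure is within an acceptable tolerance for your business queries, converting the accurate count measure to an approximate one is a reasonable candidate.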

Optimization through aggregation strategy

On the Aggregation Strategy tab in the semantic model designer, you can modify the aggregation properties to control dimension combinations and materializations.

  • Selective dimension materialization: Use this property to control which dimension combinations are pre-aggregated in a semantic model process. You can specify the dimension names for materialization using the property dialog. In each dimension combination, dimensions are kept in a specific order (defined in the property kyvos.build.dimension.order).
    When you choose dimension combinations to materialize using this property, only the combinations starting with the selected dimension(s) are materialized, resulting in reduced semantic model size and process time (a conceptual sketch of this prefix rule follows this list).
    Reducing aggregation can impact query performance, so you need to carefully identify which dimensions can be selectively materialized.

  • Selective hierarchy materialization: Use this property to specify the highest level of a dimension to be materialized, allowing you to reduce the semantic model process time and size. Levels higher than the specified level will be aggregated at run time. Dimensions not specified in this property will be materialized based on default settings. You can also specify individual levels to materialize if needed. The property value comes into effect after a full semantic model process.
    For example:
    Consider a TIME dimension with 4 levels of hierarchy: YEAR, QTR, MONTH, DAY. If you select QTR, Kyvos will pre-aggregate only QTR and DAY (as DAY is the lowest level). Any query containing YEAR will be served from QTR, while MONTH will be served from DAY, and they will be aggregated at run time. To materialize YEAR, you can also select it individually.

  • Recommendation Aggregates Configurations: Click Recommend me to view a recommended strategy, including base and subpartitions, recommendation details, and reasons. You can also get aggregation strategy recommendations, which are based on how the data is being used for querying. Kyvos automatically recommends aggregates based on its internal logic to improve performance and displays the number of recommendations. See Using aggregates for additional details.
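
The effect of selective dimension materialization can be pictured with a small, purely conceptual Python sketch. It is not Kyvos's implementation; the dimension names and their order are hypothetical, and it only mimics the prefix rule described above (a combination is kept only if it starts with a selected dimension):

    from itertools import combinations

    # Conceptual illustration of the prefix rule for selective dimension
    # materialization; dimension names and their order are hypothetical.
    dimension_order = ["Time", "Product", "Region", "Customer"]
    selected = {"Time"}  # dimensions chosen for materialization

    all_combos = [
        combo
        for size in range(1, len(dimension_order) + 1)
        for combo in combinations(dimension_order, size)
    ]

    materialized = [c for c in all_combos if c[0] in selected]
    aggregated_at_query_time = [c for c in all_combos if c[0] not in selected]

    print(f"{len(materialized)} of {len(all_combos)} combinations materialized")
    # Combinations such as ('Product', 'Region') are not pre-aggregated and
    # would be answered by aggregating at query time.

Fewer materialized combinations mean a smaller semantic model and faster processing, at the cost of more aggregation work at query time for the skipped combinations.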

Optimization through Spark properties 

Users should have prior knowledge of the available cluster resources (nodes, cores, memory) and should observe the time taken at each level during semantic model processing. This helps identify bottlenecks and further optimize process time by using the Spark properties below to increase task parallelism and make use of the Spark distributed environment within the limits of the given resources. An illustrative configuration sketch follows the property list.

  • kyvos.build.spark.levelJob.tasks: Use this property to configure the number of tasks for the reducer stage of Level1 and Level_DistCount jobs in full and/or incremental semantic model process jobs when executing through the Spark engine. You can set this property to any positive integer; that number of tasks will then be launched for the reducer stage to increase parallelism, provided the required resources are available on the cluster.
    The default value is automatic, based on data loads.

  • spark.dynamicAllocation.minExecutors: Use this Spark property to set the lower bound for the number of executors if dynamic allocation is enabled. See details.

  • spark.yarn.executor.memoryOverhead: Use this Spark property to set the amount of off-heap memory (in megabytes) to be allocated per executor. This memory accounts for things like VM overheads, interned strings, and other native overheads, and tends to grow with the executor size (typically 6-10%). See details.

  • spark.driver.memory: Use this property to set the amount of memory to be used for the driver process, i.e., where SparkContext is initialized (e.g., 1g, 2g).

  • kyvos.spark.executor.memory.level1: Use this property to set the Spark executor memory for level1 job(s) launched during the full and/or incremental semantic model process.

  • kyvos.spark.executor.cores.level1: Use this property to set the Spark executor cores for level1 job(s) launched during the full and/or incremental semantic model process.

  • spark.executor.memory: Use this property to set the amount of memory to be used per executor process (e.g., 2g, 8g). See details.

  • spark.executor.cores: Use this property to set the number of cores to be used on each executor. This property applies to YARN and standalone mode only in Spark. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided there are enough cores on that worker. Otherwise, only one executor per application will run on each worker. See details.

  • spark.dynamicAllocation.maxExecutors: Use this property to set the upper bound for the number of executors if dynamic allocation is enabled. See details.
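
As a rough illustration of how the standard spark.* properties above fit together, the following PySpark sketch sets placeholder values. The numbers are examples only and must be sized against your actual cluster; in a Kyvos deployment, these values are supplied through whichever mechanism your environment uses for Spark configuration rather than in your own script:

    from pyspark.sql import SparkSession

    # Placeholder values for illustration only; size them to your cluster.
    spark = (
        SparkSession.builder
        .appName("semantic-model-process")
        # Driver memory normally has to be set before the driver JVM starts
        # (e.g., via spark-defaults or spark-submit); shown here only for grouping.
        .config("spark.driver.memory", "4g")
        .config("spark.executor.memory", "8g")
        .config("spark.executor.cores", "4")
        .config("spark.yarn.executor.memoryOverhead", "1024")  # MB, roughly 6-10% of executor memory
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "2")
        .config("spark.dynamicAllocation.maxExecutors", "20")
        .getOrCreate()
    )

Leave headroom for other workloads when setting the dynamic allocation bounds and executor sizes; overcommitting cores or memory typically hurts process time rather than helping it.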

...


Note

For an Azure Databricks-based environment, you must explicitly define or modify the Spark properties in the Databricks Advanced Spark Configuration.

Semantic model data replication optimization through Cloud Native CLI

...

Description: This property is used to configure whether the cloud-native CLI should be used for copying semantic model data. If the CLI command fails to copy the data, it falls back to the existing Hadoop API.

Possible Values:

  • NONE: Semantic model metadata and cuboids will be copied using the existing Hadoop API.

  • METADATA: Semantic model metadata will be copied using the cloud-native CLI, and cuboids will be copied using the existing Hadoop API.

  • CUBOID: Cuboids will be copied using the cloud-native CLI, and metadata will be copied using the existing Hadoop API.

  • BOTH: Semantic model metadata and cuboids will be copied using the cloud-native CLI.

Default value: NONE

Scope: BI Server, Query Engine

...

Environment: Applicable in all cloud environments

...


Note

We recommend using the METADATA value for the kyvos.build.datacopy.useCLI property in Kyvos version 2022.1.

Optimization through cluster configuration

  • For cloud-based environments, configure autoscaling and define the minimum and maximum worker nodes per the computational load. This ensures proper utilization of cloud resources and, if used judiciously, helps optimize semantic model process time.

  • Additional details are mentioned in the cloud best practices.