Configure load-based scaling

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

When the cluster usage pattern is not fixed, use load-based scaling. This type of scaling automatically scales the Query Engines up or down based on the resource utilization of the Query Engine instances. By configuring cluster scaling, you can improve the utilization of your cloud cluster and save on compute costs.

From Kyvos 2024.2 onwards, scaling is based on the CPU and Memory usage of the Query Engine instances. System resources are monitored for all BI Servers and Query Engines every 30 seconds, and the Query Engines are scaled based on this data.

The cluster is scaled up step by step. For instance, the cluster is scaled from Low to Moderate and then from Moderate to High. Similarly, the cluster is scaled down step by step, from High to Moderate and then from Moderate to Low.
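The step-by-step behavior described above can be illustrated with a minimal Python sketch. This is not Kyvos code: the `CAPACITY_LEVELS` list and the `next_capacity` function are hypothetical names used only to show that each scaling action moves to the adjacent level and never skips one.

```python
# Hypothetical ordered capacity levels, taken from the Low/Moderate/High
# example in the text. This is an illustration, not the Kyvos implementation.
CAPACITY_LEVELS = ["Low", "Moderate", "High"]


def next_capacity(current: str, direction: str) -> str:
    """Return the adjacent capacity level for one scaling step.

    A single step never skips a level. At the ends of the range the
    current level is returned unchanged.
    """
    index = CAPACITY_LEVELS.index(current)
    if direction == "up":
        index = min(index + 1, len(CAPACITY_LEVELS) - 1)
    elif direction == "down":
        index = max(index - 1, 0)
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return CAPACITY_LEVELS[index]
```

In this sketch, moving from Low to High takes two calls with `direction="up"`, which mirrors the Low to Moderate to High sequence described above.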

By default, load-based cluster scaling is enabled. To configure it further, use the Load option on the Cluster Scaling page.

To set load-based scaling, from the Toolbox, click Setup > Cluster Scaling. The Cluster Scaling page is displayed.

Kyvos provides the following scaling modes for Load-based scaling:

Managed: In Managed scaling, Kyvos intelligently manages the cluster capacity to scale up and scale down Query Engine instances.
Custom: In Custom scaling, you can set up rules based on your cluster usage patterns. Currently, the custom scaling mode supports CPU and Memory utilization-based scaling. When the CPU or Memory load condition meets the configured parameters, the Query Engine cluster can be scaled up or down.

Note

The capacity of the BI Servers cannot be changed.
All BI Servers can be shut down except the Coordination Master. If there is only one BI Server, this BI Server is treated as the Coordination Master.
The Settings option and Add Schedule option are disabled on the Load screen.
You can view on-screen notifications that provide you with timely information about the state of the cluster.
When you scale down the Query Engines, you reduce the capacity of the node, including the number of cores and memory. Conversely, when you scale up the Query Engines, you increase the capacity of the node by adding more cores and memory.

Points to know

If the cluster is down and a query is executed, the first query triggers the cluster startup process, and all queries fail until the cluster is up and running. In this case, the following messages are displayed:
- Message 1: "Could not serve the query as Query Engine Cluster is not available. Query Engine is launched. Please try after some time."
- Message 2: "Could not serve the query as Query Engine Cluster is starting. Please try after some time."
By default, queries fail until the Query Engines are started. If you want to hold queries while the Query Engines are down, specify the hold time (in minutes) by using the QE_STARTUP_QUERY_HOLD_TIME property.
If the Query Engines become active before the configured time elapses, the query is served; otherwise, it fails.
The Query Engines do not start for any ROLAP queries.
During the transitioning period of the Query Engines (such as scale-up, scale-down, or shutdown), you can design and refine the semantic model because the Coordination Master is always up and running.
The following tables specify the approximate time required to complete the process during the transitioning period.
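
As an illustration of the query hold behavior described in the points above, the following Python sketch models a query that waits for the Query Engines for a configured number of minutes. It is a simplified model, not Kyvos code: the `hold_query` function, its parameters, and the polling approach are assumptions. Only the idea of a hold time in minutes (the QE_STARTUP_QUERY_HOLD_TIME property) comes from the text.

```python
import time


def hold_query(is_cluster_up, hold_minutes, poll_seconds=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Model of holding a query while the Query Engines start.

    is_cluster_up: hypothetical callable returning True once the
                   Query Engines are available.
    hold_minutes:  stands in for QE_STARTUP_QUERY_HOLD_TIME; 0 models
                   the default behavior, where the query fails at once.
    Returns "served" if the cluster comes up before the hold time
    elapses, otherwise "failed".
    """
    deadline = clock() + hold_minutes * 60
    while True:
        if is_cluster_up():
            return "served"
        if clock() >= deadline:
            return "failed"
        sleep(poll_seconds)
```

With `hold_minutes=0` the sketch fails immediately when the cluster is down, which corresponds to the default behavior. With a positive value, the query is served only if the cluster becomes available within that window.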

To set the scaling modes for Load-Based scaling, perform the following steps.

On the Cluster Scaling page, the Load page is displayed by default.
To set the scaling mode, select one of the following:
1. Managed: Select the required capacity from the list to start the Query Engines when a query is fired.
2. Custom: Select this option to set the rules for resource utilization.
  - To set scale-up rules:
    1. Select the required capacity from the list to start the Query Engines when a query is fired.
    2. Enter a percentage to scale up the cluster if CPU and Memory utilization goes above the specified percentage. Also, set the number of data points and the total number of data points.
  - To set scale-down rules:
    1. From the list, select the BI Servers and/or Query Engines to shut down when no queries are fired for the specified period.
    2. Enter a percentage to scale down the cluster if CPU and Memory utilization remains below the specified percentage. Also, set the number of data points and the total number of data points.
Click Save. The load-based scaling mode is set.

Note

A data point is information on resource utilization captured every 30 seconds.
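
To make the data point settings more concrete, here is a small Python sketch of one plausible way such a rule could be evaluated. It assumes that the rule triggers when at least a given number of the most recent data points cross the threshold. That interpretation, along with the `should_scale_up` and `should_scale_down` function names and their parameters, is an assumption for illustration and is not confirmed Kyvos behavior. Only the 30-second interval comes from the text.

```python
# A data point is captured every 30 seconds (from the text).
DATA_POINT_INTERVAL_SECONDS = 30


def should_scale_up(samples, threshold_pct, required_points, total_points):
    """Assumed rule: scale up when at least `required_points` of the
    last `total_points` utilization samples are above the threshold."""
    recent = list(samples)[-total_points:]
    if len(recent) < total_points:
        return False  # not enough data points collected yet
    breaching = sum(1 for value in recent if value > threshold_pct)
    return breaching >= required_points


def should_scale_down(samples, threshold_pct, required_points, total_points):
    """Assumed rule: scale down when at least `required_points` of the
    last `total_points` utilization samples are below the threshold."""
    recent = list(samples)[-total_points:]
    if len(recent) < total_points:
        return False  # not enough data points collected yet
    below = sum(1 for value in recent if value < threshold_pct)
    return below >= required_points


def window_seconds(total_points):
    """Length of the evaluation window implied by the 30-second interval."""
    return total_points * DATA_POINT_INTERVAL_SECONDS
```

Under these assumptions, a rule with 3 of 4 data points and an 80 percent threshold looks at the last `window_seconds(4)` = 120 seconds of samples and triggers a scale-up only if at least three of them exceed 80 percent.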


Additionally, you can set rules to automatically shut down or start the BI Servers and Query Engines to further optimize resource utilization.

To implement a scale-down rule, you can configure a setting to automatically reduce the capacity of your Query Engines when the cluster is underused. Additionally, you can set a rule to automatically shut down the Query Engines and BI Servers if the cluster is not in use for a specified period. Similarly, for a scale-up rule, you can set a rule to automatically start the BI Servers and Query Engines with the required capacity when the cluster utilization exceeds the configured threshold over a defined period.

On this page, you can specify rules to scale the Query Engines down or up, as well as rules to shut down or start the BI Servers and Query Engines.

You can set the default settings for the query engine cluster or create a new schedule using the Schedule option displayed on the Cluster Scheduling page. See the Creating schedule-based scaling section for more details.