Managing an on-demand cluster

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


A Hadoop or YARN cluster is a critical resource for running the Spark or MapReduce jobs that perform build activities. Running a build cluster in a cloud environment such as AWS, Azure, or Google Cloud incurs costs, so the cluster should be up only while a processing activity is running. When the activity completes, the cluster should be terminated or stopped to save costs. To reduce the cost of cloud resources and improve the scalability and performance of the cluster, Kyvos lets you configure an on-demand Hadoop or YARN cluster by setting Kyvos properties in Kyvos Manager. Configuring an on-demand EMR or Dataproc cluster typically reduces your cloud costs.
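To make the cost argument concrete, the following Python sketch compares an always-on cluster with one that runs only during processing. All figures are hypothetical: the hourly rate, the daily build hours, and the `monthly_cluster_cost` helper are assumptions for illustration and are not taken from Kyvos or from any cloud vendor's pricing. The sketch also ignores the time spent creating and terminating the cluster.

```python
def monthly_cluster_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of keeping a cluster up for hours_per_day on each of the given days."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days


# Hypothetical figures, for illustration only.
HOURLY_RATE = 10.0  # assumed combined hourly cost of all cluster nodes
BUILD_HOURS = 3.0   # assumed hours per day spent on processing activities

always_on = monthly_cluster_cost(HOURLY_RATE, 24)
on_demand = monthly_cluster_cost(HOURLY_RATE, BUILD_HOURS)
savings = always_on - on_demand

print(f"always-on: {always_on:.2f}, on-demand: {on_demand:.2f}, saved: {savings:.2f}")
# prints "always-on: 7200.00, on-demand: 900.00, saved: 6300.00"
```

Under these assumed numbers, the savings grow with the share of the day in which no processing is scheduled.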

After configuring the on-demand cluster, you can perform the following operations:

  • Starting the cluster at times of demand

  • Stopping the cluster when no job is scheduled for processing

  • Creating the EMR or Dataproc cluster for the next scheduled process

The supported cloud vendors are AWS (EMR cluster), GCP (Dataproc cluster), and Azure (Databricks cluster). Depending on the cloud vendor, you can perform the following operations:

  • Launching or terminating the EMR cluster

  • Starting or stopping the Dataproc cluster

Note

  • Do NOT enable this feature if the source data resides on HCatalog or Hive. These services are coupled with the job cluster, so they also stop when the job cluster stops.

  • Ensure that Azure Databricks can start and stop the cluster on demand. Kyvos does not manage the Databricks cluster.

  • Upon completion of the processing activity, Kyvos scales in the EMR cluster, which reduces the cluster's overall cost. 

The process activities are of the following types:

  • Semantic model process 

  • Semantic model profile

  • Register file profile

  • Update aggregates

Configuring an on-demand cluster  

Kyvos Manager lets you configure an on-demand cluster through properties in the olapengine.properties file. For more details, see the Managing Kyvos properties section.

The on-demand cluster has the following configuration properties:

  • BUILD_CLUSTER_INACTIVE_TIME_INTERVAL: Specifies the inactive time for scheduled processes.

  • BUILD_CLUSTER_SETUP_THRESHOLD: Specifies the time required for creating a job cluster.

  • BUILD_CLUSTER_SHUTDOWN_INTERVAL: Specifies the time to terminate a job cluster.
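As a rough illustration of how three timing values like these might interact, the following Python sketch models a start/stop decision. It is not Kyvos code and does not describe how Kyvos behaves internally: the `OnDemandSettings` class, the `decide` function, the use of minutes as the unit, and the interpretation given to each property are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class OnDemandSettings:
    """Hypothetical holder for the three property values (assumed semantics)."""
    inactive_time_interval: timedelta  # BUILD_CLUSTER_INACTIVE_TIME_INTERVAL
    setup_threshold: timedelta         # BUILD_CLUSTER_SETUP_THRESHOLD
    shutdown_interval: timedelta       # BUILD_CLUSTER_SHUTDOWN_INTERVAL


def decide(now: datetime, cluster_up: bool, job_running: bool,
           idle_since: Optional[datetime], next_job_start: Optional[datetime],
           settings: OnDemandSettings) -> str:
    """Return 'start', 'stop', or 'keep' for the job cluster."""
    if job_running:
        return "keep"  # never interrupt an active process

    if not cluster_up:
        # Assumption: create the cluster early enough to be ready on time.
        if next_job_start is not None and next_job_start - now <= settings.setup_threshold:
            return "start"
        return "keep"

    # The cluster is up and idle: wait out the assumed inactivity window first.
    idle_for = now - idle_since if idle_since is not None else timedelta(0)
    if idle_for < settings.inactive_time_interval:
        return "keep"
    if next_job_start is None:
        return "stop"
    # Assumption: stopping only pays off if the gap covers shutdown plus setup.
    turnaround = settings.shutdown_interval + settings.setup_threshold
    return "stop" if next_job_start - now > turnaround else "keep"


# Hypothetical values: 15 min inactivity, 10 min setup, 5 min shutdown.
settings = OnDemandSettings(timedelta(minutes=15), timedelta(minutes=10), timedelta(minutes=5))
noon = datetime(2024, 1, 1, 12, 0)

# Cluster is down and a process is due in 8 minutes, within the setup threshold.
print(decide(noon, False, False, None, noon + timedelta(minutes=8), settings))  # start

# Cluster has been idle for 30 minutes and the next process is 2 hours away.
print(decide(noon, True, False, noon - timedelta(minutes=30),
             noon + timedelta(hours=2), settings))  # stop
```

The sketch keeps the decision as a pure function of the current time and the schedule, which makes the assumed rules easy to check against sample schedules before changing any property value.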

Note

You can modify the default value of any of these properties as needed.

To modify the default value of a property, perform the following steps.

  1. On the navigation pane, click the cluster name > Kyvos and Ecosystem > Kyvos Properties.
    The Kyvos Properties dialog is displayed.

  2. Expand the olapengine.properties file and locate the on-demand cluster properties.

  3. The default value is displayed next to each property name. Change the value as needed.

  4. In this section, you can also do the following:

    1. Click the Expand All or Collapse All link to show or hide all properties.

    2. Click the i icon corresponding to a property to view its description. Alternatively, click the Show Description button at the bottom of the page to view the descriptions of all properties.

Enabling or disabling the on-demand cluster 

You can enable or disable the on-demand EMR or Dataproc cluster at any time. The steps are the same for both cluster types.

To enable or disable the on-demand cluster, perform the following steps. 

  1. On the title bar, click the Menu button (hamburger icon).
    The Kyvos Manager menu appears. 

  2. On the navigation pane, click Kyvos and Ecosystem > EMR.
    For GCP (Dataproc) deployments, the Dataproc option is displayed instead. Click the Dataproc option.

  3. On the EMR page, select the Automatically Launch or Terminate EMR Cluster checkbox to enable the on-demand EMR cluster. To disable the on-demand EMR cluster, clear the checkbox.
    For GCP (Dataproc) deployments, select the Automatically Start or Stop Dataproc Cluster checkbox to enable the on-demand Dataproc cluster. To disable the on-demand Dataproc cluster, clear the checkbox.

EMR page 

Dataproc page 

Copyright Kyvos, Inc. All rights reserved.