Configuring Spot instances for EMR

Applies to: Kyvos Enterprise  Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace   Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)


Because Kyvos supports on-demand EMR, you can configure Spot instances for EMR to optimize resource utilization during semantic model processing. Running EMR on Spot instances can significantly reduce cost and allows for much higher semantic model processing capacity.

You can configure Spot instances using a pre-created AWS CloudFormation template or manually from the AWS Console, as explained in the sections below.

Notes

AWS offers Spot instances at lower prices, which Kyvos can use to reduce semantic model processing costs. However, AWS can forcibly reclaim Spot instances when capacity is insufficient. This is quite common when using Spot instances. A task that was running on a reclaimed Spot node fails and is reattempted, which increases the semantic model processing time.

Tip

You can use the same configuration steps for shared EMR also.

Configuring Spot instances using AWS CloudFormation Template

To configure Spot instances using the CloudFormation template, perform the following steps:

  1. Download the emr_spot.json file from the AWS Installation files folder.

  2. Log in to your AWS CloudFormation Console.

  3. From the top-right, click Create Stack > With new resources.

  4. Click Upload a template file and select the downloaded emr-spot.json file.

  5. Click Next.

  6. Enter the stack details as follows:

| Parameter | Description |
| --- | --- |
| Stack Name | Provide a stack name of your choice. |
| VPC | Select the VPC in which the EMR instances will be launched. |
| Subnet | Select the subnet that is attached to the Kyvos instances. |
| Security group | Select the security group to be attached to EMR. NOTE: Select the same security group that is attached to the EC2 instances for the BI Server or Query Engine of the cluster. |
| Key Pair | Select the name of the Key Pair to be used with the EMR instances. |
| EnableSSHFlag | Select true to enable SSH to the EMR cluster. |
| S3 bucket | Enter the name of the S3 bucket used for storing the Kyvos semantic model. |
| Core EC2 Instances | Enter the number of Core EC2 instances to be launched with EMR. |
| Minimum number of Core EC2 instances | Enter the minimum number of Core EC2 instances that should be kept running. |
| Maximum number of Core EC2 instances | Enter the maximum number of Core EC2 instances that can be running. |
| EMR Version | Select the EMR version to be launched. |
| Use Graviton | Set the value to true to use Graviton instances for the EMR cluster. |
| Enable TLS Encryption | Select true to enable TLS encryption for the EMR cluster. |
| S3object ARM | Enter the S3 object ARN of the TLS certificate. |

  7. Click Next.

  8. Add tags if needed.

  9. Click Next.

  10. Click Create Stack.

  11. Navigate to Kyvos Manager and configure the private IP address of the EMR master node on the EMR Configuration page.

    1. Enter the private IP address of the EMR master node in the EMR Master Node IP/Host Name field.

    2. Click outside the textbox. The system automatically populates the EMR configuration.

    3. Select Sync Configuration and click Apply.

  12. Submit a process request to start EMR on demand.
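If you prefer to script the stack creation rather than use the console, the parameter table above maps onto the `Parameters` list of the CloudFormation CreateStack API. The following is a minimal sketch of building that request in Python. Except for `EnableSSHFlag`, which appears in the table, every parameter key and value below is a hypothetical placeholder; the real keys are defined inside the downloaded template and should be read from it.

```python
# Sketch: build the keyword arguments for a CloudFormation CreateStack call.
# Assumption: all parameter keys except EnableSSHFlag are made-up placeholders;
# check the downloaded template for the real key names.

def to_parameter_value(value):
    """CloudFormation parameter values are strings; booleans become 'true'/'false'."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_stack_request(stack_name, template_body, values):
    """Return a dict shaped like the arguments of the CreateStack API."""
    parameters = [
        {"ParameterKey": key, "ParameterValue": to_parameter_value(value)}
        for key, value in values.items()
    ]
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": parameters,
    }


example_values = {
    "VpcId": "vpc-0123456789abcdef0",        # hypothetical key and value
    "SubnetId": "subnet-0123456789abcdef0",  # hypothetical key and value
    "KeyPairName": "my-key-pair",            # hypothetical key and value
    "EnableSSHFlag": True,
    "CoreInstanceCount": 3,                  # hypothetical key and value
}

request = build_stack_request(
    "kyvos-emr-spot",
    "<contents of the downloaded template file>",
    example_values,
)

# With boto3 installed and AWS credentials configured, the request could be
# submitted as: boto3.client("cloudformation").create_stack(**request)
```

Keeping the request construction separate from the API call makes it easy to review the values before anything is created in your account.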

Note

The number and types of nodes vary by use case. In the Kyvos test labs, we used 3 task groups (r5.2xlarge, r4.2xlarge, r3.2xlarge) with 10 Spot nodes in each task group while testing various semantic models.

Manually Configuring Spot instances for EMR

To configure Spot instances manually, perform the following steps:

  1. Log in to your AWS console.

  2. Go to EMR and click the cluster on which you want to configure spot instances.

  3. Click the Hardware tab to configure Spot instances with the task node.

    1. Click Add task instance group.

    2. Provide the required details for the task nodes:

      1. Name

      2. EC2 Instance type: The node type should be similar to that of the core nodes. For example, if the core node type is r5.2xlarge, we recommend configuring at least 3 task groups with the same configuration.

        • Non-Graviton instance-based

          • r5.2xlarge

          • r3.2xlarge

          • r4.2xlarge

        • From Kyvos 2023.2 onwards, Graviton instance-based EMR is supported.

          • r6g.2xlarge

          • c6g.2xlarge

          • m6g.2xlarge

      3. Instance Count: 10

      4. Select Request Spot with Use on-demand as max price.

      5. Repeat the above steps for each additional task group.

Non-Graviton instance-based

| Node Group Type | Node Type |
| --- | --- |
| Core Group | r5.2xlarge |
| Task-1 (Spot) | r4.2xlarge |
| Task-2 (Spot) | r5.2xlarge |
| Task-3 (Spot) | r3.2xlarge |

Graviton instance-based

| Node Group Type | Node Type |
| --- | --- |
| Core Group | r6g.2xlarge |
| Task-1 (Spot) | r6g.2xlarge |
| Task-2 (Spot) | c6g.2xlarge |
| Task-3 (Spot) | m6g.2xlarge |
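The task group layouts above can also be expressed as instance group definitions in the shape used by the EMR AddInstanceGroups API. The sketch below only builds the definitions and makes no AWS call. The helper name `spot_task_groups` is made up for illustration, and the count of 10 per group comes from the steps above. Leaving out a bid price means the Spot price is capped at the On-Demand price, which corresponds to the Use on-demand as max price option.

```python
# Sketch: describe Spot task groups in the shape expected by the EMR
# AddInstanceGroups API. The helper name is hypothetical.

def spot_task_groups(instance_types, count_per_group=10):
    """Return one Spot TASK instance group definition per instance type."""
    groups = []
    for index, instance_type in enumerate(instance_types, start=1):
        groups.append({
            "Name": f"Task-{index} (Spot)",
            "InstanceRole": "TASK",
            # No BidPrice is set, so the maximum Spot price defaults to the
            # On-Demand price.
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": count_per_group,
        })
    return groups


non_graviton_groups = spot_task_groups(["r4.2xlarge", "r5.2xlarge", "r3.2xlarge"])
graviton_groups = spot_task_groups(["r6g.2xlarge", "c6g.2xlarge", "m6g.2xlarge"])

# With boto3 installed and AWS credentials configured, a list could be applied as:
#   boto3.client("emr").add_instance_groups(
#       JobFlowId="<your cluster id>", InstanceGroups=non_graviton_groups)
```

Using several instance types across the task groups spreads the Spot requests over more than one capacity pool, which is why the tables list a different node type per task group.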

Once you are done with the above steps, enable autoscaling in Core and Task groups.

For this, perform the following steps.

Example for: 35 nodes

Click the Edit button in Cluster Scaling Policy.

Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.

For CORE group

Scale-out

Rule1

Add 2 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Rule2

Add 2 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 4 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.

For all TASK groups

Scale-out

Add 3* instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 9 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
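For reference, the CORE group rules from the 35-node example can be written as an automatic scaling policy in the shape used by the EMR PutAutoScalingPolicy API. This is a sketch under stated assumptions: the rule names and the `MinCapacity` and `MaxCapacity` values are illustrative and do not come from this page, so replace them with the limits that suit your cluster.

```python
# Sketch: the CORE group rules of the 35-node example as an EMR automatic
# scaling policy. Rule names and the capacity constraints are assumptions.

def scaling_rule(name, metric, operator, threshold, unit,
                 adjustment, periods, cooldown):
    """Build one scaling rule driven by a CloudWatch metric of the cluster."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # negative values terminate instances
                "CoolDown": cooldown,             # seconds
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "Namespace": "AWS/ElasticMapReduce",
                "MetricName": metric,
                "ComparisonOperator": operator,
                "Threshold": threshold,
                "Unit": unit,
                "Statistic": "AVERAGE",
                "Period": 300,                    # one five-minute period
                "EvaluationPeriods": periods,
                "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
            }
        },
    }


core_policy = {
    # Assumed limits, not taken from this page.
    "Constraints": {"MinCapacity": 1, "MaxCapacity": 5},
    "Rules": [
        scaling_rule("CoreScaleOutMemory", "YARNMemoryAvailablePercentage",
                     "LESS_THAN", 25, "PERCENT", 2, 1, 180),
        scaling_rule("CoreScaleOutPending", "AppsPending",
                     "GREATER_THAN_OR_EQUAL", 1, "COUNT", 2, 1, 180),
        scaling_rule("CoreScaleIn", "AppsRunning",
                     "LESS_THAN_OR_EQUAL", 0, "COUNT", -4, 5, 300),
    ],
}

# With boto3 installed and AWS credentials configured, the policy could be attached as:
#   boto3.client("emr").put_auto_scaling_policy(
#       ClusterId="<cluster id>", InstanceGroupId="<core group id>",
#       AutoScalingPolicy=core_policy)
```

The TASK group rules follow the same pattern with their own adjustment values.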

To persist configuration changes (for example, adding or removing task nodes) for subsequent EMR clusters, click Apply on the EMR Configuration page in Kyvos Manager.

In the case of shared EMR, follow steps 2 and 3 above.

Example for: 100 nodes

Click the Edit button in Cluster Scaling Policy.

Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.

For CORE group

Scale-out

Rule1

Add 5 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Rule2

Add 5 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 5 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.

For all TASK groups

Scale-out

Add 9 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 29 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.

Points to remember

  1. Because the availability of Spot instances is not guaranteed, using them may impact semantic model processing SLAs, and long-running jobs have a high probability of failure.

  2. If your tasks fail multiple times because Spot nodes are reclaimed, increase the retry count using the spark.task.maxFailures property.
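As an illustration of the second point, the sketch below builds an EMR configuration entry that raises spark.task.maxFailures through the spark-defaults classification. This is only one possible place to set the property. This page does not state where Kyvos expects it to be set, so treat the classification route and the value 8 as assumptions (the Spark default is 4).

```python
# Sketch: an EMR configuration entry that raises the Spark task retry limit.
# Assumption: the property is set through the spark-defaults classification;
# the value 8 is an arbitrary example.
import json


def spark_retry_configuration(max_failures):
    """Return an EMR configuration list that sets spark.task.maxFailures."""
    if max_failures < 1:
        raise ValueError("max_failures must be at least 1")
    return [
        {
            "Classification": "spark-defaults",
            # EMR configuration property values are passed as strings.
            "Properties": {"spark.task.maxFailures": str(max_failures)},
        }
    ]


configuration = spark_retry_configuration(8)
print(json.dumps(configuration, indent=2))
```

A higher retry count lets a job survive more reclaimed nodes, at the cost of a longer wait before a genuinely failing task aborts the job.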

Copyright Kyvos, Inc. All rights reserved.