Configuring Spot instances for EMR

Applies to: Kyvos Enterprise Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)

Because Kyvos supports on-demand EMR, you can configure Spot instances for EMR to optimize resource utilization during semantic model processing. Running EMR on Spot instances can substantially reduce cost and allows for significantly higher semantic model processing capacity.

You can configure Spot instances using a pre-created AWS CloudFormation template or manually from the AWS Console, as explained in the sections below.

Note

AWS offers Spot instances at lower prices, which Kyvos can use to reduce semantic model processing costs. However, AWS can reclaim Spot instances when it needs the capacity back. Such reclamation is common, and any task that was running on a reclaimed Spot node must be retried, which can increase semantic model processing time.

Tip

You can also use the same configuration steps for shared EMR.

Configuring Spot instances using AWS CloudFormation Template

To configure Spot instances using the CloudFormation template, perform the following steps:

Download the emr_spot.json file from the AWS Installation files folder.
Log in to your AWS CloudFormation Console.
From the top-right, click Create Stack > With new resources.
Click Upload a template file and select the template file you downloaded.
Click Next.
Enter the following details:

Parameter	Description
Stack Name	Provide a stack name of your choice.
VPC	Select the VPC in which EMR instances will be launched.
Subnet	Select the subnet attached to the Kyvos instances.
Security group	Select the security group to attach to EMR. NOTE: Select the same security group that is attached to the EC2 instances for the BI Server or Query Engine of the cluster.
Key Pair	Select the name of the key pair to use with the EMR instances.
EnableSSHFlag	Select true to enable SSH access to the EMR cluster.
S3 bucket	Enter the name of the S3 bucket used for storing the Kyvos semantic model.
Core EC2 Instances	Enter the number of Core EC2 instances to launch with EMR.
Minimum number of Core EC2 instances	Enter the minimum number of Core EC2 instances that should be kept running.
Maximum number of Core EC2 instances	Enter the maximum number of Core EC2 instances that can run at any time.
EMR Version	Select the EMR version to launch.
Use Graviton	Select true to use Graviton instances for the EMR cluster.
Enable TLS Encryption	Select true to enable TLS encryption for the EMR cluster.
S3 object ARN	Enter the ARN of the S3 object that contains the TLS certificate.

Click Next.
Add tags, if needed.
Click Next.
Click Create Stack.
Navigate to Kyvos Manager and configure the private IP address of the EMR master node on the EMR Configuration page.
1. Enter the private IP address of the EMR master node in the EMR Master Node IP/Host Name field.
2. Click outside the text box. The system automatically populates the EMR configuration.
3. Select Sync Configuration and click Apply.
Submit a process request to start EMR on demand.
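If you prefer to script stack creation instead of filling in the console form, the parameter list can be prepared in code. The following Python sketch assumes hypothetical parameter keys (CoreInstanceCount, MinCoreInstanceCount, MaxCoreInstanceCount, UseGraviton, and so on); replace them with the keys actually defined in the downloaded template. It also assumes that the initial Core count should lie between the minimum and maximum values.

```python
def check_core_counts(core: int, minimum: int, maximum: int) -> None:
    """Assumed rule: 1 <= minimum <= initial Core count <= maximum."""
    if not 1 <= minimum <= core <= maximum:
        raise ValueError(
            f"expected 1 <= min ({minimum}) <= core ({core}) <= max ({maximum})"
        )


def build_stack_parameters(values: dict[str, object]) -> list[dict[str, str]]:
    """Convert {key: value} pairs into CloudFormation's Parameters list shape."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


core, minimum, maximum = 3, 2, 10
check_core_counts(core, minimum, maximum)

# Hypothetical ParameterKey names: use the keys defined in your copy of the template.
params = build_stack_parameters({
    "CoreInstanceCount": core,
    "MinCoreInstanceCount": minimum,
    "MaxCoreInstanceCount": maximum,
    "EnableSSHFlag": "true",
    "UseGraviton": "false",
})
```

The resulting `params` list has the shape accepted by the `Parameters` argument of boto3's CloudFormation `create_stack(StackName=..., TemplateBody=..., Parameters=params)`. The Kyvos Manager steps above are still required after the stack is created.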

Note

The number and types of nodes vary by use case. In the Kyvos test labs, we used 3 task groups (r5.2xlarge, r4.2xlarge, r3.2xlarge) with 10 Spot nodes each while testing various semantic models.

Manually Configuring Spot instances for EMR

To configure Spot instances manually, perform the following steps:

Log in to your AWS console.
Go to EMR and click the cluster on which you want to configure spot instances.
Click the Hardware tab to configure Spot instances with the task node.
1. Click Add task instance group.
2. Provide the required details for the task nodes:
  1. Name
  2. EC2 Instance type: The node type should be similar to the core node type. For example, if the core node type is r5.2xlarge, we recommend configuring at least 3 task groups with a similar configuration, such as:
    - Non-Graviton instance-based
      - r5.2xlarge
      - r3.2xlarge
      - r4.2xlarge
    - Graviton instance-based (supported from Kyvos 2023.2 onwards)
      - r6g.2xlarge
      - c6g.2xlarge
      - m6g.2xlarge
  3. Instance count: 10
  4. Select Request Spot, and select Use on-demand as max price.
  5. Repeat the above steps to add another task group.
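The task group settings above map onto the instance group structure of the EMR API. This Python sketch builds three Spot task groups with 10 instances each. It assumes the boto3-style field names InstanceRole, Market, BidPrice, InstanceType, and InstanceCount, and uses BidPrice "OnDemandPrice" to mirror the Use on-demand as max price option; verify both against the current EMR API reference before relying on them.

```python
def spot_task_group(name: str, instance_type: str, count: int = 10) -> dict:
    """One Spot task instance group in the shape used by EMR AddInstanceGroups."""
    return {
        "Name": name,
        "InstanceRole": "TASK",
        "Market": "SPOT",
        # Caps the Spot price at the On-Demand price (assumed equivalent of the
        # console's "Use on-demand as max price" option).
        "BidPrice": "OnDemandPrice",
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


# Three task groups, as used in the Kyvos test labs.
task_groups = [
    spot_task_group(f"Task-{i}", instance_type)
    for i, instance_type in enumerate(["r4.2xlarge", "r5.2xlarge", "r3.2xlarge"], start=1)
]
```

The list could then be passed to boto3 as `emr.add_instance_groups(JobFlowId=cluster_id, InstanceGroups=task_groups)`, where `cluster_id` is the ID of your EMR cluster.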

Node group (Non-Graviton instance-based)	Node type
Core Group	r5.2xlarge
Task-1 (Spot)	r4.2xlarge
Task-2 (Spot)	r5.2xlarge
Task-3 (Spot)	r3.2xlarge

Node group (Graviton instance-based)	Node type
Core Group	r6g.2xlarge
Task-1 (Spot)	r6g.2xlarge
Task-2 (Spot)	c6g.2xlarge
Task-3 (Spot)	m6g.2xlarge
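To keep the two layouts from the tables above in one place, they can be encoded as data and checked for consistency. This sketch uses a naming heuristic (not an AWS rule) that treats families such as r6g, c6g, and m6g as Graviton, and flags any group whose architecture or instance size differs from the Core group.

```python
import re

# Layouts copied from the two tables above.
LAYOUTS = {
    "non_graviton": {
        "Core": "r5.2xlarge",
        "Task-1": "r4.2xlarge",
        "Task-2": "r5.2xlarge",
        "Task-3": "r3.2xlarge",
    },
    "graviton": {
        "Core": "r6g.2xlarge",
        "Task-1": "r6g.2xlarge",
        "Task-2": "c6g.2xlarge",
        "Task-3": "m6g.2xlarge",
    },
}

# Heuristic: Graviton families have "g" right after the generation digit (r6g, m6gd, ...).
_GRAVITON_FAMILY = re.compile(r"^[a-z]+\d+g")


def is_graviton(instance_type: str) -> bool:
    return _GRAVITON_FAMILY.match(instance_type) is not None


def check_layout(layout: dict[str, str]) -> None:
    """Raise ValueError if a group differs from the Core group in architecture or size."""
    core = layout["Core"]
    core_size = core.split(".", 1)[1]
    for group, instance_type in layout.items():
        if is_graviton(instance_type) != is_graviton(core):
            raise ValueError(f"{group}: {instance_type} and Core {core} use different architectures")
        if instance_type.split(".", 1)[1] != core_size:
            raise ValueError(f"{group}: {instance_type} size differs from Core {core}")


for name, layout in LAYOUTS.items():
    check_layout(layout)
    print(f"{name}: OK")
```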

After completing the above steps, enable autoscaling for the Core and Task groups.

For this, perform the following steps.

Example for: 35 nodes

Click the Edit button in Cluster Scaling Policy.

Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.

For CORE group

Scale-out

Rule1

Add 2 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Rule2

Add 2 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 4 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
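The CORE group rules for the 35-node example can also be expressed as an EMR automatic scaling policy document. In this Python sketch, the field names follow the AutoScalingPolicy structure of the EMR API, each five-minute period becomes Period=300, and terminating instances is a negative ScalingAdjustment. The Constraints values are hypothetical, because the steps above do not state a minimum or maximum capacity.

```python
import json


def rule(name, metric, operator, threshold, adjustment, periods, cooldown):
    """Build one rule in the EMR AutoScalingPolicy "Rules" shape."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # negative values remove instances
                "CoolDown": cooldown,  # seconds
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": periods,
                "MetricName": metric,
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,  # one five-minute period
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
            }
        },
    }


core_policy_35 = {
    # Hypothetical limits: the steps above do not state them.
    "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
    "Rules": [
        rule("Rule1-LowYarnMemory", "YARNMemoryAvailablePercentage", "LESS_THAN", 25.0, 2, 1, 180),
        rule("Rule2-AppsPending", "AppsPending", "GREATER_THAN_OR_EQUAL", 1.0, 2, 1, 180),
        rule("ScaleIn-Idle", "AppsRunning", "LESS_THAN_OR_EQUAL", 0.0, -4, 5, 300),
    ],
}

print(json.dumps(core_policy_35, indent=2))
```

Saved to a file, the printed JSON could be supplied to `aws emr put-auto-scaling-policy` through its `--auto-scaling-policy` option, together with the cluster ID and the CORE instance group ID.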

For all TASK groups

Scale-out

Add 3* instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 9 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
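The TASK group rules for the 35-node example follow the same pattern, with one scale-out rule and one scale-in rule applied to each task group. This sketch reuses the AutoScalingPolicy field names of the EMR API; the MinCapacity and MaxCapacity values are hypothetical placeholders.

```python
def alarm(metric, operator, threshold, periods):
    """CloudWatch alarm evaluated over five-minute (300-second) periods."""
    return {
        "CloudWatchAlarmDefinition": {
            "ComparisonOperator": operator,
            "EvaluationPeriods": periods,
            "MetricName": metric,
            "Namespace": "AWS/ElasticMapReduce",
            "Period": 300,
            "Statistic": "AVERAGE",
            "Threshold": threshold,
            "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
        }
    }


def change(adjustment, cooldown):
    """Add (positive) or remove (negative) instances, then wait `cooldown` seconds."""
    return {
        "SimpleScalingPolicyConfiguration": {
            "AdjustmentType": "CHANGE_IN_CAPACITY",
            "ScalingAdjustment": adjustment,
            "CoolDown": cooldown,
        }
    }


def task_group_policy(add, terminate, min_capacity, max_capacity):
    """Scale out on low YARN memory; scale in after 5 periods with no running apps."""
    return {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [
            {"Name": "ScaleOut-LowYarnMemory", "Action": change(add, 180),
             "Trigger": alarm("YARNMemoryAvailablePercentage", "LESS_THAN", 25.0, 1)},
            {"Name": "ScaleIn-Idle", "Action": change(-terminate, 300),
             "Trigger": alarm("AppsRunning", "LESS_THAN_OR_EQUAL", 0.0, 5)},
        ],
    }


# 35-node example; MinCapacity and MaxCapacity are hypothetical placeholders.
task_policy_35 = task_group_policy(add=3, terminate=9, min_capacity=0, max_capacity=10)
```

The same `task_policy_35` document would be attached separately to each of the three task groups.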

To persist configuration changes (for example, adding or removing task nodes) for EMR clusters launched later, click Apply on the EMR Configuration page in Kyvos Manager.

In the case of shared EMR, follow steps 2 and 3 above.

Example for: 100 nodes

Click the Edit button in Cluster Scaling Policy.

Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.

For CORE group

Scale-out

Rule1

Add 5 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Rule2

Add 5 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 5 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.

For all TASK groups

Scale-out

Add 9 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.

Scale-in

Terminate 29 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
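For the 100-node example, the policies can be attached to every group of a running cluster in one pass. The sketch below assumes that `emr` behaves like a boto3 EMR client (`list_instance_groups`, `put_auto_scaling_policy`), ignores result pagination for brevity, and uses hypothetical Constraints values.

```python
def rule(name, metric, operator, threshold, adjustment, periods, cooldown):
    """One AutoScalingPolicy rule; each evaluation period is five minutes (300 s)."""
    return {
        "Name": name,
        "Action": {"SimpleScalingPolicyConfiguration": {
            "AdjustmentType": "CHANGE_IN_CAPACITY",
            "ScalingAdjustment": adjustment,
            "CoolDown": cooldown,
        }},
        "Trigger": {"CloudWatchAlarmDefinition": {
            "ComparisonOperator": operator,
            "EvaluationPeriods": periods,
            "MetricName": metric,
            "Namespace": "AWS/ElasticMapReduce",
            "Period": 300,
            "Statistic": "AVERAGE",
            "Threshold": threshold,
            "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
        }},
    }


# 100-node example; the Constraints values are hypothetical.
CORE_100 = {
    "Constraints": {"MinCapacity": 5, "MaxCapacity": 20},
    "Rules": [
        rule("Rule1", "YARNMemoryAvailablePercentage", "LESS_THAN", 25.0, 5, 1, 180),
        rule("Rule2", "AppsPending", "GREATER_THAN_OR_EQUAL", 1.0, 5, 1, 180),
        rule("ScaleIn", "AppsRunning", "LESS_THAN_OR_EQUAL", 0.0, -5, 5, 300),
    ],
}
TASK_100 = {
    "Constraints": {"MinCapacity": 0, "MaxCapacity": 30},
    "Rules": [
        rule("ScaleOut", "YARNMemoryAvailablePercentage", "LESS_THAN", 25.0, 9, 1, 180),
        rule("ScaleIn", "AppsRunning", "LESS_THAN_OR_EQUAL", 0.0, -29, 5, 300),
    ],
}


def apply_policies(emr, cluster_id, core_policy, task_policy):
    """Attach core_policy to the CORE group and task_policy to every TASK group.

    `emr` is assumed to behave like a boto3 EMR client. Pagination of
    list_instance_groups (the "Marker" field) is ignored for brevity.
    """
    policies = {"CORE": core_policy, "TASK": task_policy}
    applied = []
    for group in emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]:
        policy = policies.get(group["InstanceGroupType"])
        if policy is None:
            continue  # e.g. the MASTER group, which is not scaled here
        emr.put_auto_scaling_policy(
            ClusterId=cluster_id,
            InstanceGroupId=group["Id"],
            AutoScalingPolicy=policy,
        )
        applied.append(group["Id"])
    return applied
```

For example, `apply_policies(boto3.client("emr"), cluster_id, CORE_100, TASK_100)`. Custom automatic scaling policies on instance groups typically also require the cluster to have an automatic scaling IAM role.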

Points to remember

Because Spot instance availability is not guaranteed, using Spot instances may affect semantic model processing SLAs, and long-running jobs have a high probability of failure.
If tasks fail multiple times because Spot nodes are reclaimed, increase the retry count using the spark.task.maxFailures property.
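One way to raise the retry limit at the EMR level is a spark-defaults configuration classification. The sketch below builds that configuration object; the value 8 is only an illustration (the Spark default for spark.task.maxFailures is 4), and Kyvos may expose Spark properties in its own settings, which this sketch does not cover.

```python
import json


def spark_retry_configuration(max_failures: int) -> list[dict]:
    """EMR configuration object that raises Spark's per-task failure limit."""
    if max_failures < 1:
        raise ValueError("spark.task.maxFailures must be at least 1")
    return [{
        "Classification": "spark-defaults",
        # EMR configuration property values are strings.
        "Properties": {"spark.task.maxFailures": str(max_failures)},
    }]


# 8 is an illustrative value; Spark's default is 4.
print(json.dumps(spark_retry_configuration(8), indent=2))
```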