Configuring Spot instances for EMR
Applies to: Kyvos Enterprise Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace
Kyvos Azure Marketplace  Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)
Because Kyvos supports on-demand EMR, you can configure Spot instances for use with EMR to optimize resource utilization during semantic model processing. Running EMR on Spot instances can substantially reduce cost and allows significantly higher semantic model processing capacity.
You can configure Spot instances using a pre-created AWS CloudFormation template or manually from the AWS Console, as explained in the sections below.
Notes
AWS provides Spot instances at lower prices, which Kyvos can leverage to reduce semantic model processing costs. However, AWS can forcibly reclaim Spot instances when capacity is insufficient. Such reclaims are quite common when using Spot instances. A task that was running on a reclaimed Spot node fails and is reattempted, which can increase the semantic model processing time.
Tip
You can use the same configuration steps for shared EMR also.
Configuring Spot instances using AWS CloudFormation Template
To configure Spot instances using the CloudFormation template, perform the following steps:
Download the emr_spot.json file from the AWS Installation files folder.
Log in to your AWS CloudFormation Console.
From the top-right, click Create Stack > With new resources.
Click Upload a template file and select the downloaded emr-spot.json file.
Click Next.
Enter the following details:
Parameter | Description |
---|---|
Stack Name | Provide a stack name of your choice. |
VPC | Select the VPC in which EMR instances will be launched. |
Subnet | Select the subnet which is attached with Kyvos Instances. |
Security group | Select the Security group to be attached to EMR. NOTE: Select the same security group which is attached to EC2 instances for the BI Server or Query Engine of the cluster. |
Key Pair | Select the name of the Key Pair to be used with EMR instances. |
EnableSSHFlag | Select the value as true to enable SSH to the EMR cluster. |
S3 bucket | Enter the name of the S3 bucket used for storing the Kyvos semantic model. |
Core EC2 Instances | Enter the number of Core EC2 Instances to be launched with EMR. |
Minimum number of Core EC2 instances | Enter the minimum number of Core EC2 instances that should be kept running. |
Maximum number of Core EC2 instances | Enter the maximum number of Core EC2 instances that can be running. |
EMR Version | Select the version of EMR to be launched. |
Use Graviton | Set the value to true to use Graviton instances for the EMR cluster. |
Enable TLS Encryption | Select true to enable TLS encryption for the EMR cluster. |
S3object ARM | Enter the S3 object ARN of the TLS certificate. |
Click Next.
Add tags if needed.
Click Next.
Click Create Stack.
Now, navigate to Kyvos Manager and configure the private IP address of the EMR master node on the EMR Configuration page.
Enter the IP address of the EMR master node in the EMR Master Node IP/Host Name field.
Click outside the textbox. The system automatically populates the EMR configuration.
Select Sync Configuration and click Apply.
Submit a process request to start EMR on demand.
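The stack creation steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The parameter keys in `example_values` are hypothetical placeholders, not the names used in the Kyvos template; check the Parameters section of the downloaded template for the actual keys. Depending on what the template creates, the call may also need a `Capabilities` argument.

```python
def build_stack_parameters(values):
    """Convert a {key: value} mapping into the Parameters list
    expected by the CloudFormation CreateStack API."""
    params = []
    for key, value in values.items():
        # CloudFormation parameter values are passed as strings;
        # booleans are written as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


def create_spot_stack(stack_name, template_path, values, tags=None):
    """Create the stack from a local template file."""
    # Imported lazily so build_stack_parameters can be used without boto3.
    import boto3

    with open(template_path) as fh:
        template_body = fh.read()
    cfn = boto3.client("cloudformation")
    return cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_stack_parameters(values),
        Tags=[{"Key": k, "Value": v} for k, v in (tags or {}).items()],
    )


# Hypothetical keys and placeholder values, for illustration only.
example_values = {
    "VpcId": "vpc-EXAMPLE",
    "SubnetId": "subnet-EXAMPLE",
    "SecurityGroupId": "sg-EXAMPLE",
    "KeyPairName": "my-key-pair",
    "EnableSSHFlag": True,
    "S3Bucket": "my-kyvos-bucket",
    "CoreInstanceCount": 3,
}
```

Calling `create_spot_stack("my-emr-spot-stack", "emr_spot.json", example_values)` would submit the request; the remaining Kyvos Manager steps are still performed in the UI.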
Note
The number and types of nodes vary by use case. In the Kyvos test labs, we used 3 task groups (r5.2xlarge, r4.2xlarge, r3.2xlarge) with 10 Spot nodes in each task group while testing various semantic models.
Manually Configuring Spot instances for EMR
To configure Spot instances manually, perform the following steps:
Log in to your AWS console.
Go to EMR and click the cluster on which you want to configure spot instances.
Click the Hardware tab to configure Spot instances with the task node.
Click Add task instance group.
Provide the required details for the task nodes:
Name
EC2 Instance type: The task node type should be similar to the core node type. For example, if the core node type is r5.2xlarge, we recommend configuring at least 3 task groups with a similar configuration.
Non-Graviton instance-based
r5.2xlarge
r3.2xlarge
r4.2xlarge
Graviton instance-based (supported from Kyvos 2023.2 onwards)
r6g.2xlarge
c6g.2xlarge
m6g.2xlarge
Instance Count: 10
Select the Request Spot check box and use the On-Demand price as the maximum price.
Repeat the above steps for each task group you want to add.
Example node groups (non-Graviton instances):
Node Group Type | Node Type |
---|---|
Core Group | r5.2xlarge |
Task-1 (Spot) | r4.2xlarge |
Task-2 (Spot) | r5.2xlarge |
Task-3 (Spot) | r3.2xlarge |
Example node groups (Graviton instances):
Node Group Type | Node Type |
---|---|
Core Group | r6g.2xlarge |
Task-1 (Spot) | r6g.2xlarge |
Task-2 (Spot) | c6g.2xlarge |
Task-3 (Spot) | m6g.2xlarge |
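The task groups in the tables above can also be described programmatically. The sketch below builds instance group definitions in the shape accepted by the EMR `AddInstanceGroups` API, using `BidPrice` set to `OnDemandPrice` to mirror the "On-Demand as max price" option. It assumes the cluster uses instance groups (not instance fleets) and that boto3 is available; the helper names are illustrative and not part of Kyvos.

```python
NON_GRAVITON_TASK_TYPES = ["r4.2xlarge", "r5.2xlarge", "r3.2xlarge"]
GRAVITON_TASK_TYPES = ["r6g.2xlarge", "c6g.2xlarge", "m6g.2xlarge"]


def build_spot_task_groups(instance_types, instance_count=10):
    """Return one Spot task instance group definition per instance type."""
    groups = []
    for index, instance_type in enumerate(instance_types, start=1):
        groups.append({
            "Name": f"Task-{index} (Spot)",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            # Cap the Spot price at the On-Demand price.
            "BidPrice": "OnDemandPrice",
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        })
    return groups


def add_task_groups(cluster_id, instance_types):
    """Attach the task groups to a running cluster."""
    # Imported lazily so the builder above works without boto3.
    import boto3

    emr = boto3.client("emr")
    return emr.add_instance_groups(
        JobFlowId=cluster_id,
        InstanceGroups=build_spot_task_groups(instance_types),
    )
```

For example, `add_task_groups("j-EXAMPLE", GRAVITON_TASK_TYPES)` would request three Spot task groups of 10 nodes each, where `j-EXAMPLE` is a placeholder cluster ID.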
Once you are done with the above steps, enable autoscaling in Core and Task groups.
For this, perform the following steps.
Example for: 35 nodes
Click the Edit button in Cluster Scaling Policy.
Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.
For CORE group
Scale-out
Rule1
Add 2 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.
Rule2
Add 2 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.
Scale-in
Terminate 4 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
For all TASK groups
Scale-out
Add 3 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.
Scale-in
Terminate 9 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
To persist configuration changes (for example, adding or removing task nodes) for EMR clusters launched later, click Apply on the EMR Configuration page in Kyvos Manager.
In the case of shared EMR, follow steps 2 and 3 above.
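The 35-node rules above map onto the EMR automatic scaling policy structure. The following sketch builds the policies as Python dictionaries. The function names are illustrative, and the minimum and maximum capacities are left as arguments because this page does not state them. The `${emr.clusterId}` dimension value is the placeholder EMR substitutes with the cluster ID.

```python
def scaling_rule(name, metric, unit, operator, threshold,
                 periods, adjustment, cooldown):
    """Build one rule of an EMR automatic scaling policy."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                # Positive adds instances, negative terminates them.
                "ScalingAdjustment": adjustment,
                "CoolDown": cooldown,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": periods,
                "MetricName": metric,
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,  # one five-minute period, in seconds
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": unit,
                "Dimensions": [
                    {"Key": "JobFlowId", "Value": "${emr.clusterId}"}
                ],
            }
        },
    }


def core_group_policy(min_capacity, max_capacity):
    """CORE group policy for the 35-node example."""
    return {
        "Constraints": {"MinCapacity": min_capacity,
                        "MaxCapacity": max_capacity},
        "Rules": [
            scaling_rule("ScaleOutOnMemory", "YARNMemoryAvailablePercentage",
                         "PERCENT", "LESS_THAN", 25, 1, 2, 180),
            scaling_rule("ScaleOutOnPendingApps", "AppsPending",
                         "COUNT", "GREATER_THAN_OR_EQUAL", 1, 1, 2, 180),
            scaling_rule("ScaleInWhenIdle", "AppsRunning",
                         "COUNT", "LESS_THAN_OR_EQUAL", 0, 5, -4, 300),
        ],
    }


def task_group_policy(min_capacity, max_capacity):
    """TASK group policy for the 35-node example."""
    return {
        "Constraints": {"MinCapacity": min_capacity,
                        "MaxCapacity": max_capacity},
        "Rules": [
            scaling_rule("ScaleOutOnMemory", "YARNMemoryAvailablePercentage",
                         "PERCENT", "LESS_THAN", 25, 1, 3, 180),
            scaling_rule("ScaleInWhenIdle", "AppsRunning",
                         "COUNT", "LESS_THAN_OR_EQUAL", 0, 5, -9, 300),
        ],
    }
```

A policy built this way could be passed as the `AutoScalingPolicy` argument of the boto3 EMR client's `put_auto_scaling_policy` call, together with the cluster ID and instance group ID, provided the cluster was launched with an automatic scaling role.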
Example for: 100 nodes
Click the Edit button in Cluster Scaling Policy.
Click the Edit icon corresponding to each Task and Core group to create the autoscaling policy, as follows.
For CORE group
Scale-out
Rule1
Add 5 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.
Rule2
Add 5 instances if AppsPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 180 seconds.
Scale-in
Terminate 5 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
For all TASK groups
Scale-out
Add 9 instances if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 180 seconds.
Scale-in
Terminate 29 instances if AppsRunning is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds.
Points to remember
As the availability of Spot instances is not guaranteed, using them may affect semantic model processing SLAs, and long-running jobs have a high probability of failure.
If tasks fail repeatedly because Spot nodes are reclaimed, increase the retry count using the spark.task.maxFailures property.
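As an illustration of raising the retry count, the sketch below builds an EMR configuration classification that sets `spark.task.maxFailures` in `spark-defaults`. This is one general way to set a Spark property on EMR; where Kyvos expects the property to be set is not stated on this page, so treat the mechanism as an assumption. The value 8 is an arbitrary example (Spark's default is 4).

```python
import json


def spark_retry_configuration(max_failures):
    """Return an EMR configuration list that sets spark.task.maxFailures."""
    if max_failures < 1:
        raise ValueError("spark.task.maxFailures must be at least 1")
    return [{
        "Classification": "spark-defaults",
        # EMR configuration property values are strings.
        "Properties": {"spark.task.maxFailures": str(max_failures)},
    }]


# Example value only; tune it to how often Spot nodes are reclaimed.
print(json.dumps(spark_retry_configuration(8), indent=2))
```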
Copyright Kyvos, Inc. All rights reserved.