
Best practices for working in an AWS environment

Applies to: Kyvos Enterprise  Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace   Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)


  1. Ensure the same AWS Region for the source data and the Kyvos deployment
    Deploy Kyvos in the same AWS Region as the data source (S3, Snowflake, or Redshift). Using different Regions incurs data transfer costs and delays.

  2. Query Engines, BI Server, and S3 storage must be in the same Region

  3. Configure cluster and Query Engine scheduling to save costs and use cloud resources only when needed.
    You can create schedules to:

    1. Shut down the cluster for any time interval.

    2. Start the cluster for any time interval.

    3. Schedule Query Engines for any time interval.

  4. Elastic semantic model processes on EMR
    Configure EMR so that the cluster can scale in or out and use only the resources it needs. This reduces the cost of the semantic model process compared with running static worker nodes.

  5. Use On-Demand EMR
    An on-demand EMR cluster is created when a semantic model process is launched and is terminated when no semantic model process is running or none is scheduled within the next 30 minutes. This ensures that EMR is used only for the duration of the semantic model process.

  6. Use Spot instances to save EMR costs in the semantic model process
    To reduce resource costs while building a Kyvos semantic model, configure Spot instances. AWS offers Spot instances at a discounted price, which can significantly reduce the cost of the semantic model process. Note that AWS can reclaim Spot instances when capacity is insufficient, which is quite common. A task that was running on a reclaimed Spot node fails and may be reattempted, which can increase the semantic model process time.

  7. Use the Glue service for Hive table metadata storage
    Using Glue as the HCatalog metastore avoids recreating HCatalog tables each time the EMR cluster is recreated for the same deployment. Table metadata is preserved in Glue even if the EMR cluster is terminated.

  8. Set the cuboid replication type to NONE for all semantic models that are not eligible for querying.

  9. Ensure that enough local disk space is available on the Query Engine to replicate the built semantic model.

  10. For environments without sufficient local disk space (local disk smaller than the semantic model size), create a segment and a dedicated metadata folder, then allocate the production semantic model to this segment and the remaining semantic models to the default segment.
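
The local disk checks in the last two items can be sketched as a small pre-flight test run on a Query Engine node. This is a minimal illustration, not part of Kyvos: the function name `can_replicate`, the `headroom` parameter, and the idea of passing the semantic model size in bytes are all assumptions made for the example. It uses only the Python standard library.

```python
import shutil


def can_replicate(path: str, model_size_bytes: int, headroom: float = 0.0) -> bool:
    """Return True if the filesystem holding `path` can fit the semantic model.

    `path` is any directory on the Query Engine's local disk (hypothetical;
    use whatever location your deployment replicates to).
    `headroom` is an optional safety margin, e.g. 0.2 for 20% extra space.
    """
    free_bytes = shutil.disk_usage(path).free
    required_bytes = int(model_size_bytes * (1 + headroom))
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Example: check whether a 50 GiB model fits with 20% headroom.
    fits = can_replicate(".", 50 * 1024**3, headroom=0.2)
    print("enough local disk" if fits else "consider a dedicated segment")
```

If the check fails for a production semantic model, that is the situation item 10 describes, where a dedicated segment and metadata folder are the suggested approach.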

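The termination rule for the on-demand EMR cluster (item 5) can be illustrated with a short decision function. This is only a sketch of the rule as it might be read, not Kyvos code: it assumes the cluster is kept while any semantic model process is running, and is terminated once nothing is running and nothing is scheduled within the next 30 minutes. The names `should_terminate`, `running_jobs`, and `next_scheduled` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# 30-minute look-ahead window taken from the description in item 5.
IDLE_WINDOW = timedelta(minutes=30)


def should_terminate(
    running_jobs: int,
    next_scheduled: Optional[datetime],
    now: datetime,
) -> bool:
    """Decide whether an on-demand cluster could be terminated.

    Assumption: a running process always keeps the cluster alive.
    """
    if running_jobs > 0:
        return False
    if next_scheduled is None:
        # Nothing running and nothing scheduled at all.
        return True
    # Terminate only if the next scheduled process is beyond the window.
    return next_scheduled - now > IDLE_WINDOW
```

For example, with no process running and the next one scheduled two hours away, the function returns `True`; with one scheduled ten minutes away, it returns `False` so the cluster is reused.
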
Copyright Kyvos, Inc. All rights reserved.