
No-Spark Semantic Model Processing through Kyvos Native Compute Engines

Platform for Compute Type

You can choose Kyvos Native or an external engine as the compute engine for semantic model processing.

 

Platform | External Compute  | Kyvos Native Compute
AWS      | EMR / Databricks  | SHARED QE or Dedicated Compute (K8S)
AZURE    | Databricks        | SHARED QE or Dedicated Compute (K8S)
GCP      | Dataproc          | SHARED QE or Dedicated Compute (K8S)
ON PREM  | Hadoop            | SHARED QE or Dedicated Compute

Deploy Kyvos through Kyvos Native Compute

Kyvos Native Compute operates independently of any external compute clusters when processing semantic models. It uses its proprietary Kyvos Analytical Store, which reduces costs, strengthens security, and removes the dependency on external cluster permissions.

Note
When you process a semantic model without Spark, cuboids are stored in persistent storage; a copy of these cuboids is also kept in local storage (local disk).

  • Shared Query Engine: In this mode, the query engine not only serves queries but also handles semantic model processing. This dual role is called SHARED because the same process performs both activities.

Note

From Kyvos 2024.9.1 onwards, if you use Query Engines as a compute server:

  • For Load-based scaling: Query Engines are started automatically when the semantic model is processed.

  • For Schedule-based scaling: Query Engines are not started automatically when the semantic model is processed. In this case, Kyvos recommends switching to Load-based scaling.

  • Dedicated Compute: In this mode, the semantic model is processed by a dedicated service. In cloud-based deployments, the semantic model is processed on Kubernetes (K8S) cluster-based nodes, while in ON PREM environments, models are processed on dedicated nodes.

You can change from an external compute cluster to Kyvos Native for processing semantic models through Kyvos Manager on the Compute Cluster page.

From Kyvos 2024.10, you can:

  • Process semantic models without Spark using the Shared Query Engine and a dedicated Kubernetes cluster on AWS Managed Services.

  • Resume a failed or canceled semantic model process job.

  • Run a test data semantic model process job.

Supported Platforms

 

Platform | Supported Environments
AWS      | Kyvos Enterprise, Marketplace, Managed Services
AZURE    | Enterprise, Marketplace
GCP      | Enterprise, Marketplace
ON PREM  |

Supported Native Types: SHARED QE, Dedicated Compute (K8S)

  • AWS: For Kubernetes, Kyvos processes the semantic model using Amazon Elastic Kubernetes Service (Amazon EKS). You can select Query Engine or Kubernetes as the compute engine for no-Spark semantic model processing.
    For further details about deployment, see the Automated deployment for AWS via CloudFormation with Kyvos Native section.

  • Azure: For Kubernetes, Kyvos processes the semantic model using Azure’s managed service AKS (Azure Kubernetes Service).

    • The Azure cluster is deployed via ARM templates. Within the ARM templates, you can create a cluster without Spark or process the semantic model using Spark mode.
      For further details about deployment, see the Automated deployment on Azure with Kyvos Native section.

    • From Kyvos 2024.3 onwards, you can select the compute cluster as the Query engine or Kubernetes when deploying Kyvos through Azure Template Specs.

  • GCP: For Kubernetes, Kyvos processes the semantic model using Google Cloud's managed service GKE (Google Kubernetes Engine). The GKE cluster is deployed through the GCP Installation Files. Using the scripts, you can select a no-Spark-based cluster or process the semantic model using Spark mode.
    Optionally, for no-Spark deployments, you can use either a new or an existing Dataproc cluster.
    For further details about Kyvos deployment on GCP using the no-Spark model, see the corresponding GCP deployment section.

  • On Premises: For on-premises deployment, you can deploy using the no-Spark types SHARED_QE and Compute Server. For further details about on-premises deployment with no-Spark, see the Deploying no Hadoop no Spark section.

Using existing Kubernetes cluster

Note

  • This applies only to AWS and GCP.

  • You can use the Kubernetes (K8s) enabled Kyvos cluster in the following cases:

    • Fresh Automated deployment

    • Fresh Wizard based deployment

    • Configuring K8s in an existing external compute-based cluster

Points to know before using existing Kubernetes (K8s) cluster

  1. A shared Kubernetes cluster is not supported on AWS and GCP.

  2. Namespaces for the existing K8s cluster must be fixed as kyvos-compute and kyvos-monitoring on AWS and GCP. (A minimal sketch for pre-creating these namespaces with the Kubernetes Python client appears after this list.)

  3. The node pool of the Kubernetes cluster must be dedicated to Kyvos use.

  4. Even though a dedicated node pool is required for Kyvos, a single Kubernetes cluster can currently be used with only one Kyvos cluster. You cannot use one dedicated node pool of a K8s cluster with one Kyvos cluster and another dedicated node pool of the same K8s cluster with a different Kyvos cluster.

  5. The node pool used for Kyvos must use a single instance type.

  6. Node pools with multiple instance types are not supported.

  7. Currently, an existing Kubernetes cluster is not supported on Azure in either of the following cases:

    1. Fresh automated deployment

    2. Fresh wizard-based deployment

  8. The node pool instance type must have a minimum of 4 cores and 16 GB memory.

  9. You must have the permissions required to list Kubernetes clusters and their node pools; without these permissions, the input is shown as a text field rather than a dropdown.

  10. The node pool for a GCP Kubernetes cluster must be in a single zone. Multi-zone node pools are not supported.
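
If you are preparing an existing cluster manually, the following minimal sketch shows one way to pre-create the two fixed namespaces with the official Kubernetes Python client. It assumes your kubeconfig already points at the target cluster; whether your deployment workflow creates these namespaces for you is not covered here, so treat it purely as an illustration.

    # Minimal sketch (assumption: kubeconfig already points at the existing K8s cluster).
    # Creates the fixed namespaces required on AWS and GCP if they are missing.
    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    config.load_kube_config()          # use the current kubeconfig context
    v1 = client.CoreV1Api()

    for name in ("kyvos-compute", "kyvos-monitoring"):
        try:
            v1.read_namespace(name)    # namespace already exists, nothing to do
            print(f"namespace {name} already exists")
        except ApiException as err:
            if err.status != 404:
                raise
            v1.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=name)))
            print(f"created namespace {name}")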

Support for using existing Kubernetes cluster with Kyvos

  1. The name of the Kubernetes cluster provided by the user can be arbitrary; a fixed name is not required.

  2. The name of the (user) node pool can also be arbitrary; a fixed name is not required. Thus, a user-provided pool can be used with Kyvos.

  3. Regardless of the method used to create a pre-existing Kubernetes cluster (UI, Terraform, ARM/CFT), it can be used with Kyvos.

  4. The role or identity used for creating the Kubernetes cluster may be identical to or different from the one used for creating Kyvos resources. As long as the Kyvos role has the required permissions to access the Kubernetes cluster, it will function properly.

  5. If the user's Kubernetes cluster is in a different VPC, VPC peering is required.

  6. Ideally, the Kubernetes cluster should use the same security group and subnet as the Kyvos cluster. However, if the security group or subnet of the provided Kubernetes cluster differs, it can still be used after access to that subnet and the required ports are added to the security group used in the Kyvos cluster (for AWS, a scripted sketch of opening such ports follows this list).
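
For AWS, the port opening described in point 6 can also be scripted. The sketch below uses boto3 to allow inbound traffic from the Kubernetes cluster's security group into the security group used by the Kyvos cluster; the security group IDs and the port are placeholders, since the exact ports required depend on your Kyvos deployment.

    # Minimal sketch (assumptions: boto3 configured with AWS credentials; placeholder IDs and port).
    # Allows inbound traffic from the K8s cluster's security group to the Kyvos security group.
    import boto3

    ec2 = boto3.client("ec2")

    kyvos_sg_id = "sg-0123456789abcdef0"   # placeholder: security group used by the Kyvos cluster
    k8s_sg_id = "sg-0fedcba9876543210"     # placeholder: security group of the existing K8s cluster
    required_port = 443                    # placeholder: replace with the port(s) Kyvos requires

    ec2.authorize_security_group_ingress(
        GroupId=kyvos_sg_id,
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": required_port,
                "ToPort": required_port,
                "UserIdGroupPairs": [{"GroupId": k8s_sg_id}],
            }
        ],
    )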

Post deployment steps for all clouds (AWS, Azure and GCP)

After deploying Kyvos using the no-Spark processing model, perform the following post-deployment steps.

  • Modify the values of the following properties in the advanced properties of the semantic model job:

    • kyvos.process.compute.type=KYVOS_COMPUTE

    • kyvos.build.aggregate.type=TABULAR

Post deployment steps on Azure

You need to add the Storage Blob Data Contributor role and add an external location on the Azure portal. Scripted alternatives to both portal procedures are sketched after the steps below.

  • To add the Storage Blob Data Contributor role,

    1. On the Home page of the Azure portal, search for Storage Accounts.

    2. On the Storage Accounts page, select the storage account that is used for deployment.

    3. Navigate to Access Control (IAM).

    4. Select the Storage Blob Data Contributor role from the list.

    5. In the Assign Access to section, select Managed Identity.

    6. Click Select Member.

    7. On the Select Managed Identity dialog, select the Access Connector for Azure Databricks from the list.

    8. Click Review + assign to save the permission.

  • To add an external location,

    1. Go to Databricks workspace. In the left pane, click Catalog.

    2. Click Settings, and then click External Locations.

    3. On the External Location page, click Create location.

    4. On the Create Location dialog, enter the external location name, select the credential from the list, and enter the URL.
      The URL must be in the format abfss://<Container name>@<Storage name>.dfs.core.windows.net/<Cluster engine_work directory>
      For example, abfss://kyvoscontainer@kyvossa05751.dfs.core.windows.net/user/engine_work

    5. Click Create. Click Grant, select the CREATE EXTERNAL TABLE and WRITE FILES privileges, and grant them to the user whose token is used while creating the SQL Warehouse connection.

    6. Click Grant. The permission is assigned.
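
If you prefer scripting the role assignment instead of clicking through the portal, the sketch below wraps the Azure CLI command in Python. It is a minimal illustration only: the subscription, resource group, storage account, and the Access Connector's managed identity object ID are placeholder values, and it assumes the Azure CLI is installed and logged in.

    # Minimal sketch (assumptions: Azure CLI installed and logged in; placeholder values below).
    # Assigns the Storage Blob Data Contributor role to the Access Connector's managed identity
    # on the storage account used for deployment.
    import subprocess

    subscription_id = "<subscription-id>"            # placeholder
    resource_group = "<resource-group>"              # placeholder
    storage_account = "<storage-account>"            # placeholder
    connector_object_id = "<access-connector-managed-identity-object-id>"  # placeholder

    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )

    subprocess.run(
        [
            "az", "role", "assignment", "create",
            "--assignee", connector_object_id,
            "--role", "Storage Blob Data Contributor",
            "--scope", scope,
        ],
        check=True,
    )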
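
The external location and its grants can likewise be created with SQL instead of the Catalog UI. The sketch below runs the equivalent Unity Catalog SQL statements through the databricks-sql-connector package; the connection details, the location name kyvos_engine_work, the storage credential name, and the grantee are placeholder assumptions, and the abfss URL must match your engine_work directory.

    # Minimal sketch (assumptions: databricks-sql-connector installed; placeholder connection
    # details and names below). Creates the external location and grants the privileges
    # described in the steps above.
    from databricks import sql

    with sql.connect(
        server_hostname="<workspace-host>",        # placeholder
        http_path="<sql-warehouse-http-path>",     # placeholder
        access_token="<personal-access-token>",    # placeholder
    ) as conn, conn.cursor() as cursor:
        cursor.execute("""
            CREATE EXTERNAL LOCATION IF NOT EXISTS kyvos_engine_work
            URL 'abfss://kyvoscontainer@kyvossa05751.dfs.core.windows.net/user/engine_work'
            WITH (STORAGE CREDENTIAL `<credential-name>`)
        """)
        cursor.execute("""
            GRANT CREATE EXTERNAL TABLE, WRITE FILES
            ON EXTERNAL LOCATION kyvos_engine_work
            TO `<user-whose-token-is-used>`
        """)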

Important points to know

To process a semantic model without Spark, you must do the following:

  • To execute the sanity suite for no-spark and Kubernetes (K8S) supported deployments, refer to the Prerequisites for no-spark deployments section.

  • Modify the values of the semantic model advanced properties on the Kyvos Web Portal, as listed in the post-deployment steps above.

  • You can also set the subtype for no-Spark semantic model processing via a property.
    To do this, navigate to the Kyvos Properties page, update the KYVOS_PROCESS_COMPUTE_SUBTYPE property, and restart the Kyvos services.

  • For no-Spark deployments with enhanced security, when creating a Kyvos warehouse connection to a Databricks managed workspace, ensure that the firewall settings on the storage account are disabled.
