Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, and Kyvos Single Node Installation (Kyvos SNI)
Important
By default, the Kyvos SNI is created with a node size of Standard D16s v4 (16 vCPUs, 64 GB memory) and a 500 GB disk.
Before you start the automated installation of the Kyvos application on Azure, ensure that you have the following information.
Basic Configurations
To install Kyvos in your Azure environment, you must have an Azure account with an active subscription.
Permissions
Kyvos can be deployed from the Azure Marketplace using an existing or new resource group.
Important
To deploy Kyvos, you must have the required permissions, as explained below. To obtain these permissions, contact your Azure Administrator.
To verify access for a user to Azure resources, refer to Microsoft documentation.
If you have the Owner, Contributor, or Managed Application Contributor role at the subscription level, you can skip the prerequisites for both new and existing resource groups.
Deploying Kyvos in a new Resource Group
The Managed Application Contributor Role must be assigned to the user at the subscription level.
Deploying Kyvos using an existing Resource Group
The Owner role must be assigned to the user on the Resource Group in which Kyvos is being created.
A custom role must be assigned to the user at the subscription level. Contact your administrator to create the custom role or to share its name.
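If you want to confirm your assignments before starting, the following is a minimal sketch (not an official Kyvos utility) that lists a user's role assignments at the subscription scope using the Python azure-identity and azure-mgmt-authorization packages; the subscription ID and principal object ID are placeholders you must supply.

# Sketch: list the role assignments a principal holds at subscription scope.
# Assumes: pip install azure-identity azure-mgmt-authorization
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

SUBSCRIPTION_ID = "<subscription-id>"        # placeholder
PRINCIPAL_OBJECT_ID = "<user-object-id>"     # placeholder: deploying user's object ID

credential = DefaultAzureCredential()
client = AuthorizationManagementClient(credential, SUBSCRIPTION_ID)

scope = f"/subscriptions/{SUBSCRIPTION_ID}"
assignments = client.role_assignments.list_for_scope(
    scope, filter=f"principalId eq '{PRINCIPAL_OBJECT_ID}'"
)
for assignment in assignments:
    # Resolve the role definition ID to a readable role name.
    role = client.role_definitions.get_by_id(assignment.role_definition_id)
    print(role.role_name, "at", assignment.scope)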
Register Microsoft Resource Providers at the Subscription Level
To deploy Kyvos, ensure that the following Microsoft Resource Providers are registered at the subscription level.
To learn how to verify and register a Resource Provider, see the Verifying and Registering Microsoft Providers section.
Important
If you are unable to register Microsoft Resource Providers, contact your Azure Account Administrator to do so.
Microsoft Resource Providers
Microsoft.Storage
Microsoft.Compute
Microsoft.ManagedIdentity
Microsoft.Network
Microsoft.KeyVault
Microsoft.insights
Microsoft.Web
Microsoft.Databricks
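As a convenience, the following is a minimal sketch (not part of the Kyvos installer) that checks and registers each required provider using the Python azure-identity and azure-mgmt-resource packages; the subscription ID is a placeholder.

# Sketch: verify and register the required resource providers.
# Assumes: pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder

REQUIRED_PROVIDERS = [
    "Microsoft.Storage", "Microsoft.Compute", "Microsoft.ManagedIdentity",
    "Microsoft.Network", "Microsoft.KeyVault", "Microsoft.insights",
    "Microsoft.Web", "Microsoft.Databricks",
]

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for namespace in REQUIRED_PROVIDERS:
    provider = client.providers.get(namespace)
    if provider.registration_state != "Registered":
        print(f"Registering {namespace} ...")
        client.providers.register(namespace)  # registration completes asynchronously
    else:
        print(f"{namespace} is already registered")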
Network Configurations
For an existing virtual network, you must have the following permissions on that network:
Note
This is not required if you are creating network resources using the Kyvos-provided template.
Microsoft.Network/virtualNetworks/subnets/read
Microsoft.Network/virtualNetworks/read
Microsoft.Network/virtualNetworks/subnets/joinViaServiceEndpoint/action
Microsoft.Network/virtualNetworks/subnets/write
Microsoft.Network/virtualNetworks/subnets/join/action
OR
The Network Contributor role must be assigned to the user. See the Configuring Roles for Deployment User section for details on creating and assigning roles.
Prerequisites
Two subnets must be available for the deployment of Kyvos. These subnets must be within the following CIDR ranges for the Kyvos Azure Marketplace deployment:
Subnet for Kyvos Instances: /16 to /26
Subnet for Application Gateway: /16 to /27
No subnet delegations must be attached to either subnet.
Service Endpoints are required on the Subnet for Kyvos Instances:
Azure Storage (Microsoft.Storage): This model secures and controls the level of access to your storage accounts so that only applications requesting data over the specified set of networks or through the specified set of Azure resources can access a storage account.
Azure Key Vault (Microsoft.KeyVault): The virtual network service endpoints for Azure Key Vault allow you to restrict access to a specified virtual network and a list of IPv4 (Internet Protocol version 4) address ranges.
Azure App Service (Microsoft.Web): By setting up access restrictions, you can create a priority-ordered allow/deny list to control network access to your application.
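If you manage the virtual network yourself, the following is a minimal sketch (assuming the Python azure-identity and azure-mgmt-network packages; all names are placeholders) of adding the three service endpoints to the Kyvos instances subnet.

# Sketch: add the required service endpoints to the Kyvos instances subnet.
# Assumes: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"        # placeholders
RESOURCE_GROUP = "<resource-group>"
VNET_NAME = "<vnet-name>"
SUBNET_NAME = "<kyvos-instances-subnet>"

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Read the current subnet so the update keeps its address prefix and other settings.
subnet = client.subnets.get(RESOURCE_GROUP, VNET_NAME, SUBNET_NAME)
subnet.service_endpoints = [
    {"service": "Microsoft.Storage"},
    {"service": "Microsoft.KeyVault"},
    {"service": "Microsoft.Web"},
]
client.subnets.begin_create_or_update(
    RESOURCE_GROUP, VNET_NAME, SUBNET_NAME, subnet
).result()  # wait for the update to complete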
Databricks Configurations
The Kyvos application requires a Databricks cluster for building cubes and running data profiling jobs.
Note
You must have access to an existing Databricks workspace, or you can create a new one. Refer to Microsoft documentation to create an Azure Databricks workspace.
You can either create a new Databricks cluster along with the Kyvos application or configure your own/existing Databricks cluster to deploy Kyvos using the Azure Marketplace wizard.
Following are the common prerequisites when using the Azure Marketplace wizard to create a new Databricks cluster or to use your own/existing Databricks cluster.
Workspace URL: See Microsoft Documentation to learn how to get a workspace URL.
Azure Databricks Personal Access Token: You can use an existing token or create a new token.
To create an Azure Databricks personal access token for an Azure Databricks user, perform the following steps.
Log in to your Azure Databricks workspace by using the URL you obtained from Step 1.
NOTE: If you are unable to log in, contact your administrator to give you access. Refer to Microsoft documentation to learn how to add a user to your Azure Databricks account using the account console.
In your Azure Databricks workspace, click your Azure Databricks username in the top right bar, and then select User Settings from the list.
On the Access tokens tab, click Generate new token.
Optionally, enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
Click Generate.
Copy the displayed token, and then click Done.
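To confirm the token works before entering it in the wizard, the following is a minimal sketch (assuming the Python requests package; the workspace URL and token are placeholders) that calls the Databricks SCIM Me endpoint with the token.

# Sketch: validate a personal access token against the workspace.
# Assumes: pip install requests
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()  # raises if the token is invalid or expired
print("Token is valid for user:", resp.json().get("userName"))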
Service Principal: You need the Service Principal Client ID, Service Principal Client Secret, and Tenant ID. You can either create a new Service Principal or use an existing Service Principal.
New Service Principal
You must have the required rights to create a new Service Principal.
Refer to the Provision a service principal in Azure portal section to learn how to create an Azure AD Service Principal and obtain the Service Principal Client ID (also known as Application (client) ID), the Service Principal Client Secret, and the Tenant ID (also known as Directory (tenant) ID).
Existing Service Principal
If you do not have permission to view the Service Principal (App registration), contact your administrator.
To get the details from the existing Azure AD Service Principal, see the Getting Azure AD Service Principal Details section.
Service Principal Object ID:
To obtain this, search for App registrations in the Azure Portal, and select the Service Principal that you created or used in Step 3, as explained above.
Click the value of Managed application in local directory. The overview page will display the Object ID of the Service Principal.
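To confirm the Service Principal credentials before deployment, the following is a minimal sketch (assuming the Python requests package; all values are placeholders) that requests an OAuth token from the same Azure AD endpoint the Spark configuration below uses.

# Sketch: verify the Service Principal Client ID/Secret and Tenant ID.
# Assumes: pip install requests
import requests

TENANT_ID = "<tenant-id>"                     # placeholders
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<service-principal-client-secret>"

resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "resource": "https://storage.azure.com/",  # token audience: Azure Storage
    },
)
resp.raise_for_status()  # raises if the credentials are wrong
print("Credentials are valid; token expires in", resp.json()["expires_in"], "seconds")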
To use an existing Databricks cluster, refer to Databricks documentation to learn how to get the Cluster ID.
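Alternatively, the following is a minimal sketch (assuming the Python requests package; the workspace URL, token, and cluster name are placeholders) that looks up the Cluster ID by name through the Databricks Clusters API.

# Sketch: find the Cluster ID of an existing cluster by its name.
# Assumes: pip install requests
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                      # placeholder
CLUSTER_NAME = "<existing-cluster-name>"                               # placeholder

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    if cluster["cluster_name"] == CLUSTER_NAME:
        print("Cluster ID:", cluster["cluster_id"])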
Existing Databricks Cluster
To configure the existing Databricks cluster to deploy the Kyvos application, perform the following steps.
Log in to Databricks.
Click Compute.
Select the cluster that you want to configure in Kyvos.
Configure the existing Databricks cluster as follows:
Databricks Runtime Version
Kyvos supports:
Version 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)
Version 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)
Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
Autoscaling Options
Enable autoscaling: Select this to enable autoscaling.
Terminate after ___ minutes of inactivity: Set the value to 30.
Worker type
Recommended type: Standard_E16ds_v4
Min Workers: Recommended value 1
Max Workers: Recommended value 10
To use Databricks with Spot instances, select the corresponding checkbox (not recommended for production use).
Driver Type
Recommended type: Standard_E8ds_v4
In the Advanced Options, define the Spark Configurations as follows:
Sample configuration:
spark.sql.parquet.int96AsTimestamp true
spark.databricks.delta.preview.enabled true
spark.hadoop.spark.sql.parquet.binaryAsString false
spark.hadoop.fs.azure.account.oauth2.client.secret {Service-Principal-Client-Secret}
spark.databricks.preemption.enabled false
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/{Tenant-ID}/oauth2/token
spark.sql.parquet.binaryAsString false
spark.databricks.service.server.enabled true
spark.hadoop.fs.azure.account.oauth2.client.id {Service-Principal-Client-ID}
spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.spark.sql.parquet.int96AsTimestamp true
Important
To use Databricks 10.4 LTS, you must update the following properties in Databricks Advanced Options > Spark Configurations.
spark.sql.caseSensitive false
spark.hadoop.spark.sql.caseSensitive false
Replace Service-Principal-Client-Secret, Tenant-ID, and Service-Principal-Client-ID with the values you obtained in Step 3, as explained above.
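If you prefer to script the change instead of editing it in the UI, the following is a rough sketch (assuming the Python requests package and the Databricks Clusters API 2.0; all values are placeholders, and the cluster restarts when edited) of applying the Spark configuration to an existing cluster.

# Sketch: apply the Spark configuration to an existing cluster via the API.
# Assumes: pip install requests
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"   # placeholder
CLUSTER_ID = "<cluster-id>"         # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

spark_conf = {
    "spark.sql.parquet.int96AsTimestamp": "true",
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth2.client.id": "<Service-Principal-Client-ID>",
    "spark.hadoop.fs.azure.account.oauth2.client.secret": "<Service-Principal-Client-Secret>",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<Tenant-ID>/oauth2/token",
    # ... add the remaining properties from the sample configuration above
}

# Read the current spec so the edit request keeps the required fields.
current = requests.get(
    f"{WORKSPACE_URL}/api/2.0/clusters/get",
    headers=HEADERS, params={"cluster_id": CLUSTER_ID},
).json()

payload = {
    "cluster_id": CLUSTER_ID,
    "cluster_name": current["cluster_name"],
    "spark_version": current["spark_version"],
    "node_type_id": current["node_type_id"],
    "autotermination_minutes": current.get("autotermination_minutes", 30),
    "spark_conf": spark_conf,
}
# Keep whichever sizing mode the cluster already uses.
if "autoscale" in current:
    payload["autoscale"] = current["autoscale"]
else:
    payload["num_workers"] = current.get("num_workers", 1)

resp = requests.post(f"{WORKSPACE_URL}/api/2.0/clusters/edit", headers=HEADERS, json=payload)
resp.raise_for_status()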