
Prerequisites for Kyvos Deployment on Azure

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


Before you start the automated installation for Kyvos on Azure, ensure you have the following information.

Tip

  1. Resource Group for all Kyvos resources. We recommend you keep an empty resource group that will only be used for deploying Kyvos resources. The deployment user must have Owner rights on this resource group.

  2. If your network resources (for deploying Kyvos) are available in a separate Resource Group (other than the one mentioned in Point 1), create a custom role for the user deploying the cluster with the following permissions; a CLI sketch for creating and assigning such a role follows the permission list. Refer to the Configuring Roles for Deployment User section for details on creating and assigning roles.
    NOTE: This is not required if you are creating network resources using the Kyvos provided template.

    1. Microsoft.Network/virtualNetworks/subnets/read

    2. Microsoft.Network/virtualNetworks/read

    3. Microsoft.Network/networkSecurityGroups/read

    4. Microsoft.Network/virtualNetworks/subnets/joinViaServiceEndpoint/action

    5. Microsoft.Network/virtualNetworks/subnets/write

    6. Microsoft.Network/virtualNetworks/subnets/join/action

    7. Microsoft.Network/networkSecurityGroups/join/action
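
    The following is a minimal Azure CLI sketch for defining and assigning such a custom role; the role name, the role.json file name, and all bracketed identifiers are placeholders for illustration.

      # role.json - hypothetical custom role containing the permissions listed above
      {
        "Name": "Kyvos Network Custom Role",
        "IsCustom": true,
        "Description": "Network permissions for the user deploying Kyvos",
        "Actions": [
          "Microsoft.Network/virtualNetworks/subnets/read",
          "Microsoft.Network/virtualNetworks/read",
          "Microsoft.Network/networkSecurityGroups/read",
          "Microsoft.Network/virtualNetworks/subnets/joinViaServiceEndpoint/action",
          "Microsoft.Network/virtualNetworks/subnets/write",
          "Microsoft.Network/virtualNetworks/subnets/join/action",
          "Microsoft.Network/networkSecurityGroups/join/action"
        ],
        "AssignableScopes": ["/subscriptions/<subscription-id>/resourceGroups/<network-resource-group>"]
      }

      # Create the role and assign it to the deployment user at the network Resource Group scope
      az role definition create --role-definition role.json
      az role assignment create --assignee <deployment-user-object-id> --role "Kyvos Network Custom Role" --scope "/subscriptions/<subscription-id>/resourceGroups/<network-resource-group>"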

  3. Managed Identity for Kyvos resources with the following information:
    NOTE: As mentioned in the attached Prerequisites sheet, this is optional. It will be created if the value for Enable Managed Identity Creation is set to True in the ARM template.

    1. If you want to use your existing Managed Identity, you will need these details: 

      1. Managed Identity Name

      2. Managed Identity Resource Group Name

    NOTE: If using an existing Managed Identity, ensure that NO permissions are assigned to it.
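
    The following is a quick Azure CLI sketch, assuming the CLI is installed, for looking up these details or for creating a fresh identity with no role assignments; all bracketed names are placeholders.

      # List user-assigned managed identities with their resource groups
      az identity list --query "[].{name:name, resourceGroup:resourceGroup}" -o table

      # Or create an empty identity (no permissions assigned) for Kyvos to use
      az identity create --name <kyvos-identity-name> --resource-group <kyvos-resource-group>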

  4. To use an Azure Active Directory (AAD) token for Managed Identity authentication, we recommend using an existing Managed Identity and providing the permissions to access Databricks as explained in Configure Managed Identity in Azure Databricks section.
    NOTE: The below-mentioned permissions were added in the Kyvos 2022.3 release.
    If you want to create a new Managed Identity with the deployment, perform the following steps after completing the cluster deployment.

    1. Configure Managed Identity in Azure Databricks 

    2. Run JAR services from Kyvos Manager. To do this, navigate to the Kyvos Manager home page, click the Actions menu on the dashboard, and select the Sync Operation > Run JAR services option.

    3. Restart the BI Server.

      NOTE: Post deployment of the cluster, you can change the Databricks Authentication type from the Databricks page on Kyvos Manager. 

  5. Valid License file for Kyvos.

  6. Secret Key to access the Kyvos bundle.

  7. For wizard-based deployment, the Managed Identity should have the following permissions; an Azure CLI sketch of these assignments follows the list.

    1. Tag Contributor role attached to the Resource group. This allows you to apply tags on the Kyvos machine and storage without having access to the entity itself.

    2. Virtual Machine Contributor role on the Resource group. This allows Kyvos to start, stop, create, and manage virtual machines in it.

    3. Reader on the Resource group. This allows a user to read deployment-related information.

    4. Website Contributor role attached to the Resource group. This allows you to update the schedule expression on the Azure function app (used for automatic scheduling of Query Engines/BI Server).

    5. Storage Blob Data Contributor role attached to the storage account/container. This grants read/write access to the storage.

    6. Contributor role attached to the Managed Identity. This is used to retrieve the Subscription ID and the permissions attached to the Managed Identity.

    7. Contributor role attached to the Managed Identity at the resource level, to start/stop the Postgres Flexible Server through scheduling.
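
    The following is a hedged Azure CLI sketch of these role assignments; identity names, scopes, and the storage account are placeholders, and the exact scopes should follow the list above.

      # Resolve the Managed Identity principal and the Resource Group scope
      MI_PRINCIPAL_ID=$(az identity show --name <kyvos-identity-name> --resource-group <kyvos-resource-group> --query principalId -o tsv)
      RG_SCOPE="/subscriptions/<subscription-id>/resourceGroups/<kyvos-resource-group>"

      # Roles granted at the Resource Group scope
      for ROLE in "Tag Contributor" "Virtual Machine Contributor" "Reader" "Website Contributor"; do
        az role assignment create --assignee "$MI_PRINCIPAL_ID" --role "$ROLE" --scope "$RG_SCOPE"
      done

      # Storage Blob Data Contributor granted at the storage account scope
      az role assignment create --assignee "$MI_PRINCIPAL_ID" --role "Storage Blob Data Contributor" --scope "/subscriptions/<subscription-id>/resourceGroups/<storage-rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"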

  8. Service Endpoints required on the Subnet (a CLI sketch for enabling them follows the list):

    1. Azure Storage (Microsoft.Storage): This model enables you to secure and control the level of access to your storage accounts so that only applications requesting data over the specified set of networks or through the specified set of Azure resources can access a storage account.

    2. Azure Key Vault (Microsoft.KeyVault): The virtual network service endpoints for Azure Key Vault allow you to restrict access to a specified virtual network. The endpoints also allow you to restrict access to a list of IPv4 (internet protocol version 4) address ranges.

    3. Azure App Service (Microsoft.Web): By setting up access restrictions, you can define a priority-ordered allow/deny list that controls network access to your app.

    4. Azure SQL Database (Microsoft.Sql): A security feature that controls whether the server for your databases and elastic pools in Azure SQL Database, or for your dedicated SQL pool (formerly SQL DW) databases in Azure Synapse Analytics, accepts communications sent from particular subnets in virtual networks.
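
    A minimal sketch for enabling all four service endpoints on the Kyvos subnet; the subnet, VNet, and Resource Group names are placeholders.

      az network vnet subnet update --name <kyvos-subnet> --vnet-name <kyvos-vnet> --resource-group <network-resource-group> --service-endpoints Microsoft.Storage Microsoft.KeyVault Microsoft.Web Microsoft.Sql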

  9. Databricks cluster with the following parameters:

    1. Databricks Runtime Version: Select version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) or 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)

    2. Autopilot Options: Select the following: 

      1. Enable autoscaling: Select this to enable autoscaling.

      2. Terminate after ___ minutes of inactivity: Set the value to 30.

    3. Worker type: Recommended value E16dsv4

      1. Min Worker: Recommended value 2

      2. Max Workers: Recommended value 10

    4. Driver Type: Recommended value E8dsv4 

    5. Advanced options
      You MUST assign read rights on the ADLS Gen2 storage from which you want to read raw data.

      1. In the Spark Configurations, define the required Service Principal to read and write on ADLS Gen2.
        Sample configuration

        spark.sql.parquet.int96AsTimestamp true
        spark.hadoop.spark.sql.parquet.binaryAsString false
        spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/kyvoskeys/kyvossecret}}
        spark.databricks.preemption.enabled false
        spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/{Tenant-ID}/oauth2/token
        spark.sql.parquet.binaryAsString false
        spark.hadoop.fs.azure.account.oauth2.client.id {Service-Principal-Client-ID}
        spark.hadoop.fs.azure.account.auth.type OAuth
        spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
        spark.hadoop.spark.sql.parquet.int96AsTimestamp true
        spark.hadoop.fs.azure.account.oauth2.msi.tenant {Tenant-ID}
        spark.sql.caseSensitive false
        spark.hadoop.spark.sql.caseSensitive false

        NOTE: You can directly provide the Secret for your Service Principal.
        From Kyvos 2024.3 onwards, Databricks version 14.3 LTS is supported for Kyvos Enterprise. If you use Databricks 14.3 LTS, you must include the properties mentioned above as well as the following properties.
        Sample configuration

        spark.sql.parquet.inferTimestampNTZ.enabled false
        spark.hadoop.spark.sql.parquet.inferTimestampNTZ.enabled false
        spark.sql.legacy.parquet.nanosAsLong false
        spark.hadoop.spark.sql.legacy.parquet.nanosAsLong false
      2. You must change the Spark configurations to use the managed disk. Ensure that you do not change the configuration of the default root (/tmp) volume.

        1. In the Spark Configurations, add the spark.local.dir /local_disk0 property where the local_disk0 is the managed disk.

        2. Optionally, you can execute the df -h command from a notebook for verification, as sketched after this list.

        3. Add the SPARK_WORKER_DIR=/local_disk0 in the Environment variables.
          For security reasons, put the Secrets in a Key Vault and assign it to Databricks. See the Azure Key Vault Configuration for Kyvos section for details.
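
        For example, a notebook cell such as the following (a verification sketch, not a required step) should show /local_disk0 mounted after the configuration change:

          %sh
          df -h | grep local_disk0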

  10. Databricks information:

    1. Databricks Cluster Id: To obtain this ID, click the Cluster Name on the Clusters page in Databricks.
      The page URL is of the form https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the value after the /clusters/ component in the URL of this page.

    2. Databricks Cluster Organization ID: To obtain this ID, click the Cluster Name on the Clusters page in Databricks.
      The number after o= in the workspace URL is the organization ID. For example, if the workspace URL is https://westus.azuredatabricks.net/?o=7692xxxxxxxx, then the organization ID is 7692xxxxxxxx.

    3. Object ID of the Service Principal assigned to Databricks. To obtain this, go to App registrations, then search for and select your Service Principal name. Click the value of Managed application in local directory. The overview page will display the Object ID of the Service Principal.

  11. Key Vault URL
    If this is not provided, Kyvos will automatically create a new Key Vault for Azure passwords.
    NOTE: You can create your own Key Vault for use with Kyvos. 
    If using an existing Key vault, ensure that the Soft Delete property is enabled, or you can enable it later.

    1. Permissions needed on Key Vault:

      1. The assigned Managed Identity should have Secret permissions (GET, LIST, and SET).

    2. For externally created Key Vault, provide the Databricks token manually in the Key Vault:

      1. Go to Key Vault Secret.

      2. Add a secret named KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01, provide the Databricks token value, and save.
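
    The following is a hedged Azure CLI sketch covering both the Secret permissions and the manually added Databricks token; the vault name and identity are placeholders, and the set-policy command applies only when the Key Vault uses access policies rather than Azure RBAC.

      # Grant GET, LIST, and SET secret permissions to the assigned Managed Identity
      az keyvault set-policy --name <kyvos-key-vault> --object-id <managed-identity-principal-id> --secret-permissions get list set

      # Add the Databricks token as a secret in an externally created Key Vault
      az keyvault secret set --vault-name <kyvos-key-vault> --name KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01 --value <databricks-token>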

  12. In case of Automated deployment, Wizard-based deployment, and/or if using an existing Azure Database for PostgreSQL Flexible Server, ensure that a separate subnet is attached to it with the Microsoft.DBforPostgreSQL/flexibleServers delegation and the Storage, KeyVault, SQL, and Web service endpoints.
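
    A minimal sketch for delegating an existing subnet to the Flexible Server and enabling the required service endpoints; all names are placeholders.

      az network vnet subnet update --name <postgres-subnet> --vnet-name <kyvos-vnet> --resource-group <network-resource-group> --delegations Microsoft.DBforPostgreSQL/flexibleServers --service-endpoints Microsoft.Storage Microsoft.KeyVault Microsoft.Sql Microsoft.Web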

  13. To use an externally created Flexible Server in deployments, use the ARM templates (FlexibleServerKyvosManagerRepository and FlexibleServerKyvosRepository) to create a Flexible Server that can be used in the deployments directly. Alternatively, if you create the Flexible Server through Microsoft, complete the following steps; a tagging sketch follows the lists below. For more information about how to create a Flexible Server, refer to the Microsoft documentation.

    1. For Kyvos repository

      1. Database name must be delverepo.

      2. Username must be Postgres

      3. Following tags are expected on the external repository:
        UsedBy - Kyvos
        ROLE - DATABASE
        LAYER - Metadata_Storage

    2. For Kyvos Manager repository

      1. Database name must be kmrepo.

      2. Username must be kmdbuser

      3. Following tags are expected on the external repository:
        UsedBy - Kyvos
        ROLE - DATABASE_KM
        LAYER - Metadata_Storage
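
    The following is a hedged Azure CLI sketch for applying the expected tags to externally created servers; all server, Resource Group, and subscription identifiers are placeholders, and the -i flag merges tags incrementally rather than replacing them.

      # Kyvos repository server
      az resource tag -i --tags UsedBy=Kyvos ROLE=DATABASE LAYER=Metadata_Storage --ids "/subscriptions/<subscription-id>/resourceGroups/<db-rg>/providers/Microsoft.DBforPostgreSQL/flexibleServers/<kyvos-repo-server>"

      # Kyvos Manager repository server
      az resource tag -i --tags UsedBy=Kyvos ROLE=DATABASE_KM LAYER=Metadata_Storage --ids "/subscriptions/<subscription-id>/resourceGroups/<db-rg>/providers/Microsoft.DBforPostgreSQL/flexibleServers/<km-repo-server>"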

  14. To deploy the Kyvos cluster through the Kyvos Manager wizard method using password-based authentication for service nodes, ensure that the permissions listed here are available on all the VM instances for Linux users deploying the cluster.

  15. To deploy the Kyvos cluster through the Kyvos Manager wizard method using custom hostnames for resources, ensure that the steps listed here are completed.  

  16. The Azure logged-in user should have the following rights to create Kyvos resources using ARM templates.

    1. Owner Access on Resource group being used for deployment of Kyvos resources.

    2. Key and Secret Management rights on the Key vault if using an existing Key vault.

  17. Networking: The Kyvos ARM template needs information about the VNet, Subnet, and Network Interface/Security Group that will be used by the Kyvos machines.

    1. Create a Network Interface/Security Group with the following ports opened in Inbound rules (a CLI sketch follows below).
      6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440, 6605, 8081, 8080, 45421, 45564, 4000, 7009, 22, 8443, 8444, 9443, 9444.
      To enable Web Portal High Availability:

      1. If using Session Management, you will need ports 45564 and 4000 opened in Inbound rules.

      2. If using Azure Load Balancer, you will need port 80 opened in Inbound rules.
        See Ports required for Kyvos for details.
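
      The following is a sketch for opening these ports on the network security group; the rule name, priority, and resource names are placeholders.

        az network nsg rule create --resource-group <network-resource-group> --nsg-name <kyvos-nsg> --name Allow-Kyvos-Inbound --priority 1000 --direction Inbound --access Allow --protocol Tcp --destination-port-ranges 6602 6903 6703 45450 45460 6603 6803 45440 6605 8081 8080 45421 45564 4000 7009 22 8443 8444 9443 9444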

  18. SSH Key pair consisting of a private key and a public key.
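
    For example, a key pair can be generated locally with ssh-keygen; the output file name is a placeholder.

      ssh-keygen -t rsa -b 4096 -f ~/.ssh/kyvos_azure_key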

  19. Storage account permissions and recommendations (a role-assignment sketch follows the list):

    1. The Managed Identity attached to the storage account/container should have the Storage Blob Data Contributor permission.

    2. If the storage account is in a separate Resource Group (different from the one in which the Managed Identity exists), then the Managed Identity should have a Reader role assigned to it at the Storage Account level. This permission is needed by the Kyvos Manager validation framework to check if the Storage Account is accessible or not.

    3. The Service Principal attached to the Databricks cluster should have the Storage Blob Data Contributor permission on the storage account/container.

    4. Soft deletion property must be disabled.

    5. Storage account must be of type ADLS GEN 2.
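
    A hedged sketch of these role assignments at the storage account scope; all identifiers are placeholders.

      STORAGE_SCOPE="/subscriptions/<subscription-id>/resourceGroups/<storage-rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"

      # Managed Identity: blob data access, plus Reader when the account is in a different Resource Group
      az role assignment create --assignee <managed-identity-principal-id> --role "Storage Blob Data Contributor" --scope "$STORAGE_SCOPE"
      az role assignment create --assignee <managed-identity-principal-id> --role "Reader" --scope "$STORAGE_SCOPE"

      # Service Principal used by the Databricks cluster
      az role assignment create --assignee <service-principal-object-id> --role "Storage Blob Data Contributor" --scope "$STORAGE_SCOPE"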

  20. If the Databricks cluster and Kyvos resources are deployed in different networks, then VNET peering MUST be enabled between both networks. 
    Refer to VNet Peering for more information. 
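
    A minimal peering sketch between the two virtual networks; names are placeholders, and a matching peering must also be created in the reverse direction.

      az network vnet peering create --name kyvos-to-databricks --resource-group <kyvos-network-rg> --vnet-name <kyvos-vnet> --remote-vnet <databricks-vnet-resource-id> --allow-vnet-access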

  21. If you want to use Environment Validation to verify resources before Kyvos deployment, Databricks network peering, and port validation, you need to grant additional permissions to the Managed Identity. If the Managed Identity does not have the required permissions, environment validation will fail when Kyvos and the Databricks cluster use different networks.
    In this case, you can add permissions to the Managed Identity using either of the two methods explained below:

    1. Define a custom role with the following permissions.

      1. Microsoft.Network/virtualNetworks/subnets/read

      2. Microsoft.Network/virtualNetworks/read

      3. Microsoft.Network/networkSecurityGroups/read

      4. Microsoft.Network/virtualNetworks/virtualNetworkPeerings/read
        OR

    2. Assign the following permissions without creating custom roles: 

      1. Reader permission on both virtual networks used in peering

      2. Reader permission on network security group [For port validation]

  22. To access the Usage Dashboard, you need to provide permissions after completing the deployment.

  23. When upgrading Kyvos from a version older than 2022.1, you must ensure the following:

    1. jq utility must be installed on the Query Engine instances.

      Manual steps to download jq utility on the Query Engine instances:

      1. SSH into the instance using the command:
        ssh -i pemfile kyvos@qe_ip

      2. Run the following commands:

        cd /data/kyvos/installs/bin/
        wget https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64 -O jq
        chmod a+x jq

    2. The Upgrade Policy for Virtual Machine Scale Set should be set as Manual.

  24. When upgrading Kyvos from a version older than 2023.1, you must ensure the following:

    1. jq utility must be installed on the BI Server instances.

      Manual steps to download jq utility on the BI Server instances:

      1. SSH into the instance using the command:
        ssh -i pemfile kyvos@bi_ip

      2. Run the following commands:

        cd /data/kyvos/installs/bin/
        wget https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64 -O jq
        chmod a+x jq

  25. When upgrading Kyvos from a version older than 2023.1, you must run the openFirewallPortBi.sh script as the sudo user on all BI Server instances.

  26. For Automated Azure deployment,

    1. Newly created Flexible Server: The user-provided password will be used for the repository. No password change is required.

    2. Existing Flexible Server: The password of the existing repository needs to be provided. No password change is required.

  27. For Wizard-based Azure deployment,

    1. For the Kyvos shared repository, the password of the existing repository needs to be provided.

    2. For the Kyvos non-shared repository, the repository password can be provided as any value, and the password will then be changed to that value. However, for this change to work properly, the current password of the repository needs to be delve123@.

    3. If using a pre-created external repository for Kyvos Manager, the initial password must be kyvosmanager#123.

Important

Please save these details for future reference, as the deployment will fail if you provide incorrect details.

Optional Information

In addition to the prerequisites, you can also keep the following information handy according to your business requirements.

  1. Boot Diagnostics Storage Account Uri: This is needed to store the boot-up logs of the VMs. If the value is empty, the boot diagnostics feature will be disabled for all Kyvos VMs.

  2. Log Analytics Workspace Name and Resource Group Name: This is needed for enabling the Log Analytics virtual machine extension on VMs, used for monitoring Azure VMs.

  3. Shared Image Gallery Information: If you want to use your hardened images as the base OS for Kyvos VMs, you will need the information for the following parameters:

    1. Gallery Resource Group Name: Name of the Resource Group in which the Gallery resides.

    2. Gallery Subscription ID: Subscription ID in which Gallery resides.

    3. Gallery Name: Name of the Shared Image Gallery. An Azure image gallery is a repository for managing and sharing custom images. An image source can be an existing Azure VM.

    4. Gallery Image Definition Name:  Name of the Image Definition. Image definitions are created within a gallery and carry information about the image and requirements for using it internally. This includes whether the image is Windows or Linux, release notes, and minimum and maximum memory requirements. It is a definition of a type of image.

    5. Gallery Image Version Name:  Name of the Image Version in the <MajorVersion>.<MinorVersion>.<Patch> format.

  4. Azure Function Name: If you want to use pre-created functions, you can either provide the name of the function at the time of creating Kyvos resources through Kyvos Manager (Wizard-based deployment), or you can create the function externally (from the Azure portal) using the azure_functions.json template file or the azure_functions_secure.json file (for enhanced security) available in the Azure Installation Files folder. If upgrading from a version older than 2021.3, you can use the steps mentioned in the Post-upgrade Steps section.
