Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace
...
Before you start the automated installation for Kyvos on Azure, ensure that you have the following information.
Resource Group for all Kyvos resources. We recommend keeping an empty resource group that is used only for deploying Kyvos resources. The deployment user must have Owner rights on this resource group.
If your network resources (for deploying Kyvos) are available in a separate Resource Group (other than the one mentioned in Point 1), create a Custom role for the user deploying the cluster with the following permissions. Refer to the Configuring Roles for Deployment User section for details on creating and assigning roles.
NOTE: This is not required if you are creating network resources using the Kyvos-provided template.
Microsoft.Network/virtualNetworks/subnets/read
Microsoft.Network/virtualNetworks/read
Microsoft.Network/networkSecurityGroups/read
Microsoft.Network/virtualNetworks/subnets/joinViaServiceEndpoint/action
Microsoft.Network/virtualNetworks/subnets/write
Microsoft.Network/virtualNetworks/subnets/join/action
Microsoft.Network/networkSecurityGroups/join/action
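If you prefer to script this, the custom role can be created and assigned with the Azure CLI. The snippet below is a minimal sketch: the role name "Kyvos Network Deployment Role" is an example, and the subscription ID, resource group, and deployment user are placeholders you must replace.
# Save the custom role definition to a file (placeholders: <subscription-id>, <network-resource-group>)
cat > kyvos-network-role.json <<'EOF'
{
  "Name": "Kyvos Network Deployment Role",
  "IsCustom": true,
  "Description": "Read/join permissions on network resources used by the Kyvos deployment",
  "Actions": [
    "Microsoft.Network/virtualNetworks/subnets/read",
    "Microsoft.Network/virtualNetworks/read",
    "Microsoft.Network/networkSecurityGroups/read",
    "Microsoft.Network/virtualNetworks/subnets/joinViaServiceEndpoint/action",
    "Microsoft.Network/virtualNetworks/subnets/write",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Network/networkSecurityGroups/join/action"
  ],
  "AssignableScopes": ["/subscriptions/<subscription-id>/resourceGroups/<network-resource-group>"]
}
EOF
# Create the role and assign it to the deployment user on the network resource group
az role definition create --role-definition @kyvos-network-role.json
az role assignment create --assignee <deployment-user-upn-or-object-id> --role "Kyvos Network Deployment Role" --scope /subscriptions/<subscription-id>/resourceGroups/<network-resource-group>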
Managed Identity for Kyvos resources with the following information:
NOTE: As mentioned in the attached Prerequisites sheet, this is optional. It will be created if the value for Enable Managed Identity Creation is set as True in the ARM.
If you want to use your existing Managed Identity, you will need these details:
Managed Identity Name
Managed Identity Resource Group Name
NOTE: If using an existing Managed Identity, ensure that NO permissions are assigned to it.
To use an Azure Active Directory (AAD) token for Managed Identity authentication, we recommend using an existing Managed Identity and providing the permissions to access Databricks as explained in Configure Managed Identity in Azure Databricks section.
NOTE: The below-mentioned permissions have been added to the Kyvos 2022.3 release.
If you want to create a new Managed Identity with the deployment, perform the following steps after completing the cluster deployment.
Run JAR services from the Kyvos Manager. For this, navigate to the Kyvos Manager home page, and click the Actions menu on the dashboard. Select the Sync Operation > Run JAR services option.
Restart the BI Server.
NOTE: Post deployment of the cluster, you can change the Databricks Authentication type from the Databricks page on Kyvos Manager.
Important: When switching from AAD to PAT, you need to provide a personal access token, which is saved as a secret in your Azure Key Vault and is read from there for authentication purposes. For this:
- Go to Key Vault Secret.
- Add KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01, provide the Databricks token value, and click Save.
When switching from PAT to AAD, perform the following steps:
- Configure Managed Identity in Azure Databricks
- Run JAR services from the Kyvos Manager. For this, navigate to the Kyvos Manager home page, and click the Actions menu on the dashboard. Select Sync Operation > Run JAR services option.
- Restart the BI Server
Valid License file for Kyvos.
Secret Key to access the Kyvos bundle.
For wizard-based deployment, the Managed Identity should have the following permissions.
Tag Contributor role attached to the Resource group. This allows you to apply tags on the Kyvos machine and storage without having access to the entity itself.
Virtual Machine Contributor role on the Resource group. This allows Kyvos to start, stop, create, and manage virtual machines in it.
Reader on the Resource group. This allows a user to read deployment-related information.
Website Contributor role attached to the Resource group. This allows you to update the schedule expression on the Azure function app (used for automatic scheduling of Query Engines/BI Server).
Storage Blob Data Contributor role attached to the storage account/container. This grants read/write access.
Contributor role attached to the Managed Identity, used to get the Subscription ID and the permissions attached to the Managed Identity.
Contributor role attached to the Managed Identity at the resource level to start/stop the Postgres Flexible Server through scheduling.
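For reference, these role assignments can also be granted with the Azure CLI. The commands below are a sketch only; the Managed Identity object ID, subscription ID, resource group, and storage account names are placeholders, and you may prefer assigning the roles through the portal instead.
# Placeholders: <managed-identity-object-id>, <subscription-id>, <kyvos-resource-group>, <storage-account>
SCOPE_RG=/subscriptions/<subscription-id>/resourceGroups/<kyvos-resource-group>
for ROLE in "Tag Contributor" "Virtual Machine Contributor" "Reader" "Website Contributor"; do
  az role assignment create --assignee <managed-identity-object-id> --role "$ROLE" --scope "$SCOPE_RG"
done
# Storage Blob Data Contributor at the storage-account level
az role assignment create --assignee <managed-identity-object-id> --role "Storage Blob Data Contributor" \
  --scope "$SCOPE_RG/providers/Microsoft.Storage/storageAccounts/<storage-account>"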
Service Endpoints required on the Subnet:
Azure Storage (Microsoft.Storage): This model enables you to secure and control the level of access to your storage accounts so that only applications requesting data over the specified set of networks or through the specified set of Azure resources can access a storage account.
Azure Key Vault (Microsoft.KeyVault): The virtual network service endpoints for Azure Key Vault allow you to restrict access to a specified virtual network. The endpoints also allow you to restrict access to a list of IPv4 (internet protocol version 4) address ranges.
Azure App Service (Microsoft.Web): By setting up access restrictions, you can define a priority-ordered allow/deny list that controls network access to your app.
Azure SQL Database (Microsoft.Sql): Security feature that controls whether the server for your databases and elastic pools in Azure SQL Database or for your dedicated SQL pool (formerly SQL DW) databases in Azure Synapse Analytics accepts communications that are sent from particular subnets in virtual networks.
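As an illustration, all four service endpoints can be enabled on the Kyvos subnet with a single Azure CLI call. The resource group, VNet, and subnet names below are placeholders.
# Enable the required service endpoints on the subnet used by Kyvos
az network vnet subnet update \
  --resource-group <network-resource-group> \
  --vnet-name <kyvos-vnet> \
  --name <kyvos-subnet> \
  --service-endpoints Microsoft.Storage Microsoft.KeyVault Microsoft.Web Microsoft.Sql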
Databricks cluster with the following parameters:
Databricks Runtime Version: Select version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
Autopilot Options: Select the following:
Enable autoscaling: Select this to enable autoscaling.
Terminate after ___ minutes of inactivity. Set the value as 30.
Worker type: Recommended value E16dsv4
Min Worker: Recommended value 2
Max Workers: Recommended value 10
Driver Type: Recommended value E8dsv4
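If you create the cluster through the Databricks CLI instead of the UI, a cluster specification along these lines matches the recommended values above. This is a sketch, not a definitive configuration: the cluster name is arbitrary, and the node type identifiers (Standard_E16ds_v4, Standard_E8ds_v4) are the Azure VM sizes assumed to correspond to E16dsv4 and E8dsv4, so confirm them against the node types available in your workspace.
# Create a Databricks cluster with the recommended sizing (legacy Databricks CLI)
databricks clusters create --json '{
  "cluster_name": "kyvos-databricks-cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_E16ds_v4",
  "driver_node_type_id": "Standard_E8ds_v4",
  "autoscale": { "min_workers": 2, "max_workers": 10 },
  "autotermination_minutes": 30
}'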
Advanced options
You MUST assign read rights on the ADLS Gen2 account from where you want to read raw data.
In the Spark Configurations, define the required Service Principal to read and write on ADLS Gen2.
Sample configuration:
spark.sql.parquet.int96AsTimestamp true
spark.hadoop.spark.sql.parquet.binaryAsString false
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/kyvoskeys/kyvossecret}}
spark.databricks.preemption.enabled false
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/{Tenant-ID}/oauth2/token
spark.sql.parquet.binaryAsString false
spark.hadoop.fs.azure.account.oauth2.client.id {Service-Principal-Client-ID}
spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.spark.sql.parquet.int96AsTimestamp true
spark.hadoop.fs.azure.account.oauth2.msi.tenant {Tenant-ID}
spark.sql.caseSensitive false
spark.hadoop.spark.sql.caseSensitive false
NOTE: You can directly provide the Secret for your Service Principal.
You must change the Spark configuration to use a managed disk. Ensure that you do not change the configuration of the default root (/tmp) volume.
In the Spark Configurations, add the spark.local.dir /local_disk0 property where the local_disk0 is the managed disk.
Optionally, you can execute the df -h command from a notebook for verification.
Add SPARK_WORKER_DIR=/local_disk0 to the Environment variables.
For security reasons, store the secrets in Key Vault and assign it to Databricks. See Azure Key Vault Configuration for Kyvos for details.
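One common way to do this is an Azure Key Vault-backed Databricks secret scope, which is what the {{secrets/kyvoskeys/kyvossecret}} reference in the sample configuration above assumes. The sketch below uses the legacy Databricks CLI; the scope name kyvoskeys and the Key Vault names are examples only.
# Create a Key Vault-backed secret scope named "kyvoskeys" (legacy Databricks CLI)
databricks secrets create-scope --scope kyvoskeys \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/<subscription-id>/resourceGroups/<kv-resource-group>/providers/Microsoft.KeyVault/vaults/<key-vault-name> \
  --dns-name https://<key-vault-name>.vault.azure.net/
# The Service Principal secret stored in the vault is then referenced in Spark config as {{secrets/kyvoskeys/kyvossecret}}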
Databricks information:
Databricks Cluster Id: To obtain this ID, click the Cluster Name on the Clusters page in Databricks.
The page URL shows https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the number after the /clusters/ component in the URL of this page.
Databricks Cluster Organization ID: To obtain this ID, click the Cluster Name on the Clusters page in Databricks.
The number after o= in the workspace URL is the organization ID. For example, if the workspace URL is https://westus.azuredatabricks.net/?o=7692xxxxxxxx, then the organization ID is 7692xxxxxxxx.
Object ID of the Service Principal assigned to Databricks: To obtain this, search App registration, then search and select your Service Principal name. Click the value of Managed application in local directory. The overview page displays the Object ID of the Service Principal.
Key Vault URL
If this is not provided, Kyvos will automatically create a new Key Vault for Azure passwords.
NOTE: You can create your own Key Vault for use with Kyvos.
If using an existing Key Vault, ensure that the Soft Delete property is enabled, or you can enable it later.
Permissions needed on Key Vault:
The assigned Managed Identity should have Secret Permissions (GET, LIST, and SET).
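For example, this access policy can be granted with the Azure CLI; the vault name and Managed Identity object ID below are placeholders.
# Grant GET, LIST, and SET secret permissions to the Managed Identity on the Key Vault
az keyvault set-policy --name <key-vault-name> \
  --object-id <managed-identity-object-id> \
  --secret-permissions get list set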
For externally created Key Vault, provide the Databricks token manually in the Key Vault:
Go to Key Vault Secret.
Add KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01, provide the Databricks token value, and save.
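If you script this instead of using the portal, the secret can be added as shown below; the vault name and token value are placeholders.
# Store the Databricks personal access token as a Key Vault secret
az keyvault secret set --vault-name <key-vault-name> \
  --name KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01 \
  --value <databricks-token>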
In case of Automated deployment, Wizard-based deployment, and/or if using an existing Azure Database for PostgreSQL Flexible Server, ensure that a separate subnet is attached to it with delegation (Microsoft.DBforPostgreSQL/flexibleServers) and service endpoints (Storage, KeyVault, SQL, and Web).
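A delegated subnet with the required service endpoints can be created with a command along these lines. The names and address prefix are placeholders; adjust them to your network layout.
# Create a subnet delegated to PostgreSQL Flexible Server with the required service endpoints
az network vnet subnet create \
  --resource-group <network-resource-group> \
  --vnet-name <kyvos-vnet> \
  --name <flexible-server-subnet> \
  --address-prefixes 10.0.2.0/24 \
  --delegations Microsoft.DBforPostgreSQL/flexibleServers \
  --service-endpoints Microsoft.Storage Microsoft.KeyVault Microsoft.Sql Microsoft.Web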
To use an externally created Flexible Server in deployments, use the ARM templates (FlexibleServerKyvosManagerRepository and FlexibleServerKyvosRepository) to create a Flexible Server that can be used in the deployments directly. Alternatively, if you create the Flexible Server through Microsoft, you need to complete the following steps. For more information about how to create a Flexible Server, refer to the Microsoft documentation.
For Kyvos repository
Database name must be delverepo.
Username must be Postgres
Following tags are expected on the external repository:
UsedBy - Kyvos
ROLE - DATABASE
LAYER - Metadata_Storage
For Kyvos Manager repository
Database name must be kmrepo.
Username must be kmdbuser
Following tags are expected on the external repository:
UsedBy - Kyvos
ROLE - DATABASE_KM
LAYER - Metadata_Storage
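The expected tags can be applied to externally created Flexible Servers with the Azure CLI, for example as below. The server and resource group names are placeholders; the tag values correspond to the Kyvos repository and the Kyvos Manager repository respectively.
# Tag the external Kyvos repository server
az postgres flexible-server update --resource-group <rg> --name <kyvos-repo-server> \
  --tags UsedBy=Kyvos ROLE=DATABASE LAYER=Metadata_Storage
# Tag the external Kyvos Manager repository server
az postgres flexible-server update --resource-group <rg> --name <km-repo-server> \
  --tags UsedBy=Kyvos ROLE=DATABASE_KM LAYER=Metadata_Storage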
To deploy the Kyvos cluster through the Kyvos Manager wizard method using password-based authentication for service nodes, ensure that the permissions listed here are available on all the VM instances for Linux users deploying the cluster.
To deploy the Kyvos cluster through the Kyvos Manager wizard method using custom hostnames for resources, ensure that the steps listed here are completed.
The Azure logged-in user should have the following rights to create Kyvos resources using ARM templates.
Owner Access on Resource group being used for deployment of Kyvos resources.
Key and Secret Management rights on the Key vault if using an existing Key vault.
Networking: The Kyvos ARM template needs information about the VNet, Subnet, and Network Interface/Security Group that will be used by the Kyvos machines.
Create a Network Interface/Security Group with the following ports opened in Inbound rules.
6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440, 6605, 8081, 8080, 45421, 45564, 4000, 7009, 22, 8443, 8444, 9443, 9444.
To enable Web Portal High Availability:
If using Session Management, you will need ports 45564 and 4000 opened in Inbound rules.
If using Azure Load Balancer, you will need port 80.
See Ports required for Kyvos for details.
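If you create the security group rule from the CLI, a single inbound rule can cover the port list above. This is a sketch; the NSG name, rule priority, and source prefix are placeholders, and you may want to restrict the source range further.
# Open the Kyvos inbound ports on the Network Security Group
az network nsg rule create \
  --resource-group <network-resource-group> \
  --nsg-name <kyvos-nsg> \
  --name Allow-Kyvos-Inbound \
  --priority 1000 \
  --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes <allowed-cidr> \
  --destination-port-ranges 6602 6903 6703 45450 45460 6603 6803 45440 6605 8081 8080 45421 45564 4000 7009 22 8443 8444 9443 9444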
SSH Key pair consisting of a private key and a public key.
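If you do not already have a key pair, one can be generated locally; the file name below is only an example.
# Generate an RSA key pair for SSH access to the Kyvos VMs
ssh-keygen -t rsa -b 4096 -m PEM -f kyvos-ssh-key -C "kyvos-deployment"
# kyvos-ssh-key is the private key; kyvos-ssh-key.pub is the public key to supply during deployment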
Storage account permission and recommendations:
Managed identity attached to the storage/container should have storage blob data contributor permission.
If the storage account is in a separate Resource Group (different from the one in which the Managed Identity exists), then the Managed Identity should have a Reader role assigned to it at the Storage Account level. This permission is needed by the Kyvos Manager validation framework to check if the Storage Account is accessible or not.
Service Principal attached to the Databricks cluster should have Storage Blob Data Contributor permission on the storage account/container.
Soft deletion property must be disabled.
Storage account must be of type ADLS GEN 2.
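A quick way to confirm the last two points from the CLI is shown below; the account and resource group names are placeholders.
# Verify the account is ADLS Gen2 (hierarchical namespace enabled)
az storage account show --name <storage-account> --resource-group <rg> --query "isHnsEnabled"
# Verify that blob soft delete is disabled
az storage account blob-service-properties show --account-name <storage-account> --resource-group <rg> --query "deleteRetentionPolicy.enabled"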
If the Databricks cluster and Kyvos resources are deployed in different networks, then VNET peering MUST be enabled between both networks.
Refer to VNet Peering for more information.
If you want to use Environment Validation to verify resources before Kyvos deployment, Databricks network peering, and port validation, you need to provide a set of additional permissions on the Managed Identity. If the Managed Identity does not have the required permissions, environment validation will fail when different networks are used for Kyvos and the Databricks cluster.
In this case, you can add permissions to the Managed Identity using either of the two methods explained below:
Define a custom role with the following permissions.
Microsoft.Network/virtualNetworks/subnets/read
Microsoft.Network/virtualNetworks/read
Microsoft.Network/networkSecurityGroups/read
Microsoft.Network/virtualNetworks/virtualNetworkPeerings/read
OR
Assign the following permissions without creating custom roles:
Reader permission on both virtual networks used in peering
Reader permission on network security group [For port validation]
To access the Usage Dashboard, you need to provide permissions after completing the deployment.
When upgrading Kyvos from a version older than 2022.1, you must ensure the following:
jq utility must be installed on the Query Engine instances.
Manual steps to download jq utility on the Query Engine instances:
SSH into the instance using the command:
ssh -i pemfile kyvos@qe_ip
Run the following commands:
cd /data/kyvos/installs/bin/
wget https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64 -O jq
chmod a+x jq
The Upgrade Policy for Virtual Machine Scale Set should be set as Manual.
When upgrading Kyvos from a version older than 2023.1, you must ensure the following:
jq utility must be installed on the BI Server instances.
Manual steps to download jq utility on the BI Server instances:
SSH into the instance using the command:
ssh -i pemfile kyvos@bi_ip
Run the following commands:
cd /data/kyvos/installs/bin/
wget https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64 -O jq
chmod a+x jq
When upgrading Kyvos from an older version than 2023.1, you must run the openFirewallPortBi.sh script from the sudo user on all BI Server Instances.
For Automated Azure deployment,
Newly created Flexible Server: The user-provided password will be used for the repository. No password change is required.
Existing Flexible Server: The password of the existing repository needs to be provided. No password change is required.
For Wizard-based Azure deployment,
For the Kyvos shared repository, the password of the existing repository needs to be provided.
For the Kyvos non-shared repository, the repository password can be provided as any value, to which it will then be changed. However, for this change to work properly, the actual password of the repository must be delve123@
If using a pre-created external repository for Kyvos Manager, the initial password must be kyvosmanager#123
...