Automated Kyvos deployment with CDP on Azure
Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
Prerequisites
Before you start the automated installation for Kyvos on Azure, ensure that you have the following information.
Resource Group for all Kyvos resources. Kyvos recommends having an empty resource group that will be only used for deploying Kyvos resources.
Managed Identity for Kyvos resources, with the following information.
Managed Identity Name
Managed Identity Resource Group Name
Key Vault URL for storing Postgres password.
The logged-in Azure user must have the following rights to create Kyvos resources using ARM templates.
Contributor Access on Resource group being used for deployment of Kyvos resources.
Key and Secret Management rights on the Key Vault if using an existing Key Vault.
Networking: The Kyvos ARM template needs information about the VNet, subnet, and network interface/security group that the Kyvos machines will use.
Create a network interface/security group with the following inbound ports open:
6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440, 6605, 8081, 8080, 45421, 45564, 4000, 7009, and 22.
See Ports required for Kyvos for details.
SSH Key pair consisting of a private key and a public key.
Custom image with the following configuration files included in it:
conf file and includer file
Cacerts file: Copy the cacerts file from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/lib/security/cacerts on the master node to the /opt folder.
cm-auto-global_truststore.jks file: Copy the file from /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks to the same location in the image.
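The inbound ports listed under Networking above can be turned into security rule definitions programmatically. The following is a minimal sketch, not part of the Kyvos template: it builds one ARM-style inbound rule per port. The rule names, the TCP protocol, the starting priority, and the source range are all assumptions for illustration; adjust them to your own network policy.

```python
import json

# Inbound ports listed in the prerequisites above.
KYVOS_INBOUND_PORTS = [6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440,
                       6605, 8081, 8080, 45421, 45564, 4000, 7009, 22]


def build_inbound_rules(ports, source_prefix, first_priority=100):
    """Return one ARM-style inbound allow rule (as a dict) per port."""
    rules = []
    for offset, port in enumerate(ports):
        rules.append({
            "name": f"kyvos-allow-{port}",  # hypothetical naming scheme
            "properties": {
                "priority": first_priority + offset,  # unique within the group
                "direction": "Inbound",
                "access": "Allow",
                "protocol": "Tcp",  # assumption: the source does not name a protocol
                "sourceAddressPrefix": source_prefix,
                "sourcePortRange": "*",
                "destinationAddressPrefix": "*",
                "destinationPortRange": str(port),
            },
        })
    return rules


if __name__ == "__main__":
    # 10.0.0.0/16 is a placeholder source range, not a Kyvos default.
    rules = build_inbound_rules(KYVOS_INBOUND_PORTS, "10.0.0.0/16")
    print(json.dumps(rules, indent=2))
```

The printed JSON could be pasted into the securityRules array of a network security group resource, or used as a checklist when creating the rules by hand in the portal.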
Additional Information
Key Vault URL for storing secrets, if using the existing key vault.
Boot Diagnostics Storage Account Uri
Shared Image Gallery Information: Since the custom image will be used as the base OS for Kyvos VMs, the user needs to provide the below parameters:
Gallery Resource Group Name
Gallery Subscription ID
Gallery Name
Gallery Image Definition Name
Gallery Image Version Name
Deploying Kyvos using Azure Resource Manager Template
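To deploy Kyvos with CDP on Azure, follow the two-step process described below: the automated deployment of Kyvos on Databricks, followed by the post-deployment steps to use CDP on Azure.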
Automated deployment of Kyvos on Databricks
To deploy Kyvos using the Azure Resource Manager (ARM) template, perform the following steps.
Log in to the Azure Portal as a user with the permissions and information listed in the prerequisites.
Use the Search box to search for Deploy a Custom Template.
On the Custom Deployment page, click the Build your own template in the editor option.
On the Edit Template page, click Load File. Upload your Kyvos ARM Template.
Click Save.
The Custom Deployment page is loaded with parameters required for deployment.
Enter the details as follows.
Fields marked * are mandatory.
Parameter | Description |
---|---|
Subscription* | Your account subscription. |
Resource Group* | Enter the name of your resource group. The resource group is a collection of resources that share the same lifecycle, permissions, and policies. |
Region* | Choose the Azure region that's right for you and your customers. Not every resource is available in every region. |
VnetAddress | Enter the CIDR notation for the new VPC that will be created in the deployment. NOTE: This option is displayed only when the CreateVPC option is selected. |
NetworkSecurityGroupIpWhiteList | Provide the range of IP addresses allowed to access Kyvos Instances. Use 0.0.0.0/0 to allow all users access. NOTE: This parameter is displayed only when a new network security group is created within the deployment. |
Virtual Network Name* | Name of Virtual Network in which your VMs will run. |
VM Subnet Name* | Name of Subnet in which your VMs will run. This Subnet should be part of the above Virtual Network. |
ApplicationGatewaySubnetName* | Name of the Subnet in which Application Gateway will be created. The Subnet should be part of the above Virtual Network. NOTE: This parameter will display only if an existing VPC is used for deployment. |
Security Group Name* | Name of the Security group that can be used to access the VMs. |
Network Resource Group Name* | Name of Resource Group in which Virtual Network and Subnet are deployed. |
Security Group Resource Group Name | Name of Resource Group in which SecurityGroup is deployed. |
Enable Managed Identity Creation | Select True to Create New Managed Identity for Kyvos. Select False to use an already existing managed identity. |
Managed Identity Name* | Enter the name of User-Managed Identity to be attached with all Kyvos VMs. |
Managed Identity Resource Group Name | The Name of Resource Group in which Managed Identity is deployed. |
Databricks Authentication Type | Select the authentication type for the Databricks cluster. |
Databricks Token* | Specifies the value of the token used to connect to Databricks Cluster |
Kyvos Work Directory | Enter the path for the Kyvos work directory. |
SSH Public Key* | Provide an RSA public key in the single-line format (starting with "ssh-rsa") or the multi-line PEM format. You can generate SSH keys using ssh-keygen on Linux and OS X, or PuTTYGen on Windows. |
Additional Tags | Enter the additional tags to put on all resources. Use the syntax as: {"Key1": "Value1", "Key2" : "Value2"} |
Storage Account Name | Enter the name of the Storage Account to be used for Kyvos. |
Storage Account Container Name | Enter the name of Container in Storage Account which will be used for Kyvos. |
CustomPrefixVirtualMachines | Enter a custom prefix that you want to append before the name of the virtual machines to be used for Kyvos. |
CustomPrefixVPC | Enter the custom prefix you want to append before the name of VPC in case a new VPC is created for use with Kyvos. |
CustomPrefixNSG | Enter the custom prefix you want to append before the name of the Network Security Group in case a new group is created for use with Kyvos. |
CustomPrefixKeyVault | Enter the custom prefix you want to append before the name of Key Vault in case a new Key Vault is created for use with Kyvos. |
CustomPrefixScaleSet | Enter the custom prefix you want to append before the name of Scaleset that will be created for use with Kyvos. |
Vault URL* | If you have saved your secrets in the Key Vault, provide its URL. |
Vault Resource Group* | Enter the name of the Resource Group in which the Key Vault is deployed. |
Boot Diagnostics Storage Account Resource ID | Resource ID of a storage account of type gen1 for enabling Boot Diagnostics of VMs. If left blank, a storage account of type gen1 will be created. |
Storage Account Resource Group | Enter the name of the Resource Group in which the Storage Account is deployed. |
Object Id of Service Principal* | The Object ID assigned to the Service principal. This maps to the ID inside the Active Directory. |
SSH Private Key* | Provide the RSA private key in a single-line format. |
Kyvos Cluster Name | Provide a name for your Kyvos cluster. |
Kyvos Installation Path | Enter the installation path to deploy Kyvos. |
Databricks URL* | Provide the URL in https://<account>.cloud.databricks.com format. |
Databricks Cluster ID* | Enter the Cluster ID of your Azure cluster. To obtain this ID, click the Cluster Name on the Clusters page in Databricks. The page URL has the form https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the <cluster-id> segment of that URL. |
Databricks Cluster Organization ID* | Enter the Cluster Organization ID of your Azure cluster. To obtain this ID, click the Cluster Name on the Clusters page in Databricks. |
Postgres Password* | Provide the password to be used for Postgres. |
License File Value* | Enter a valid Kyvos license key. |
Secret Key For Kyvos Bundle Download* | Enter the Secret key to access Kyvos bundle. |
Enable Public IP | Select True to enable Public IP for Kyvos Web portal. |
DNS Label Prefix | Unique DNS Name for the Public IP used to access the Virtual Machine. |
Perform Env Validation | Select True to perform environment validation before cluster deployment to ensure all the resources are created correctly. |
Host Name Based Deployment | Select True to use hostnames instead of IP Addresses for instances during cluster deployment. |
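Several parameters in the table above are free-form strings, so it can help to check them locally before pasting them into the form. The following is a minimal sketch under stated assumptions: the function names and sample values are illustrative and are not part of the Kyvos template. The SSH check covers only the single-line "ssh-rsa" format (the multi-line PEM format that the form also accepts is not checked), and the cluster ID helper assumes the page URL shape described for the Databricks Cluster ID parameter.

```python
import base64
import ipaddress
import json
from urllib.parse import urlparse


def is_single_line_rsa_public_key(text):
    """True if text looks like 'ssh-rsa <base64-blob> [comment]' on one line."""
    text = text.strip()
    if "\n" in text:
        return False
    parts = text.split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    # The decoded blob starts with a 4-byte length, then the key type name.
    return blob[4:11] == b"ssh-rsa"


def parse_tags(text):
    """Parse an Additional Tags value; raise ValueError if it is not a
    JSON object whose values are all strings."""
    tags = json.loads(text)
    if not isinstance(tags, dict) or not all(isinstance(v, str) for v in tags.values()):
        raise ValueError("tags must be a JSON object with string values")
    return tags


def is_cidr(text):
    """True if text is a valid network in CIDR notation, such as 0.0.0.0/0."""
    if "/" not in text:
        return False
    try:
        ipaddress.ip_network(text, strict=True)
    except ValueError:
        return False
    return True


def cluster_id_from_url(url):
    """Return the segment after 'clusters' in the URL fragment, or None."""
    segments = urlparse(url).fragment.strip("/").split("/")
    if "clusters" in segments:
        index = segments.index("clusters")
        if index + 1 < len(segments):
            return segments[index + 1]
    return None


if __name__ == "__main__":
    print(parse_tags('{"Key1": "Value1", "Key2" : "Value2"}'))
    print(is_cidr("0.0.0.0/0"))
```

These checks only catch formatting mistakes. They do not confirm that a key, address range, or cluster actually exists.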
Click Review + Create.
The system validates your inputs and displays a summary of the provided inputs. To continue with deployment, click Create.
This creates all the resources and services required for deploying the Kyvos cluster. The Outputs page displays the details for all your services and the Kyvos Manager URL.
Note
For accessing the Usage Dashboard, you need to provide permissions after completing deployment.
Post deployment steps to use CDP on Azure
After the Kyvos deployment completes, perform the following steps to configure Kyvos to use CDP on Azure:
Configure the CDP cluster to access the storage account used by Kyvos from within Spark jobs.
Log in to the Cloudera management console and go to Data Hub Clusters.
Open the Cloudera Manager application by clicking the CM URL link.
On the Compute cluster, open Spark configurations from the Services dropdown.
On the Configuration tab, search for the yarn.access.hadoopFileSystems property and update its value to include the location of the container used by Kyvos. If the property does not exist, add it with the container location as its value.
For example:
yarn.access.hadoopFileSystems=abfs://data@kyvoscdp.dfs.core.windows.net,abfs://kyvoscontainer92604@kyvos33333.dfs.core.windows.net
Here, the second entry, abfs://kyvoscontainer92604@kyvos33333.dfs.core.windows.net, is the part added to the existing value.
Restart affected services.
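The yarn.access.hadoopFileSystems value in the step above is a comma-separated list, so adding the Kyvos container amounts to appending one entry. The following is a minimal sketch of composing the new value: the helper names are hypothetical, and the abfs:// location format is taken from the example above.

```python
def abfs_location(container, storage_account):
    """Build an abfs:// location in the form used by the example above."""
    return f"abfs://{container}@{storage_account}.dfs.core.windows.net"


def add_filesystem(existing_value, location):
    """Append location to a comma-separated value unless it is already listed."""
    entries = [e.strip() for e in existing_value.split(",") if e.strip()]
    if location not in entries:
        entries.append(location)
    return ",".join(entries)


if __name__ == "__main__":
    current = "abfs://data@kyvoscdp.dfs.core.windows.net"
    kyvos = abfs_location("kyvoscontainer92604", "kyvos33333")
    print(add_filesystem(current, kyvos))
```

Because the helper skips entries that are already present, it can be applied to the same value more than once without creating duplicates. The resulting string still has to be entered in Cloudera Manager by hand.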
Configure IDBroker mapping for Kyvos user on CDP, using the following steps.
On the navigation pane, click Environments, and then click Actions > Manage Access.
Click the IDBroker Mappings tab and add user or group by clicking the Edit option in the Current Mappings section.
Click the plus icon to add a user or group, select a user or group from the dropdown, and enter a role in the Role input.
Click Save and Sync.
Ensure that the CDP DataAssumerRole is assigned the Storage Blob Data Contributor role on the Kyvos storage account.
Copy the cacerts file from /opt to the /data/kyvos/app/kyvos/jre/jre/lib/security folder on all nodes (Kyvos Manager, BI Server, and Query Engines). This file was copied from the CDP master node to /opt when the image was created.
On the BI Server nodes, add an entry to /etc/hosts that maps the CDP master hostname to the CDP master private IP address.
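The /etc/hosts step above can be sketched as a small text transformation that is safe to repeat. This is an illustration only: the function name, hostname, and IP address are hypothetical placeholders, and writing the result back to /etc/hosts (which requires root) is left out.

```python
def ensure_hosts_entry(hosts_text, ip_address, hostname):
    """Return hosts_text with an 'ip<TAB>hostname' line appended.

    The text is returned unchanged if hostname already appears as a name
    on any line, even when it is mapped to a different address.
    """
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip_address}\t{hostname}\n"


if __name__ == "__main__":
    # Placeholder values; use the real CDP master hostname and private IP.
    sample = "127.0.0.1 localhost\n"
    print(ensure_hosts_entry(sample, "10.0.0.5", "cdp-master.example.internal"))
```

Since an existing mapping is never overwritten, a stale entry for the CDP master still has to be corrected manually.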
Change vendor from Databricks to Cloudera from Kyvos Manager, using the following steps.
Log on to Kyvos Manager, and navigate to Manage Kyvos > Hadoop Ecosystem Configuration.
Change the value for vendor from DATABRICKS to CLOUDERA.
Change file system type from ABFSS to ABFS.
In the HDInsights Home field, provide the value as /opt/cloudera/parcels/CDH/meta/
Provide the Namenode IP.
Change the value for Hadoop version to 3.1.1
Provide Cloudera version in Hadoop parameters as 7.2.2
Select Hive version as 3.1
Provide Hive JDBC URL. You can copy the Hive JDBC URL from the Cloudera manager application.
Select Spark version as 2.4
Update the Spark Library Path to /opt/cloudera/parcels/CDH/lib/spark/jars,/opt/cloudera/parcels/CDH/jars/spark-hive_2.11-2.4.5.7.2.2.7-6.jar,/opt/cloudera/parcels/CDH/jars/spark-atlas-connector-assembly-0.1.0.7.2.2.7-6.jar
Provide Spark history server URL.
Select both Sync Library and Sync Configurations.
On the Security configuration, provide the following details:
Change the Hadoop Security Type from SIMPLE to KERBEROS.
Provide Keytab User Name and Keytab file details.
Refer to the Cloudera documentation to download the Keytab file.
Upload the local policy jar and the US export policy jar.
Go back to Hadoop Ecosystem Configurations screen on Kyvos Manager. Select both Sync Library and Sync Configurations and submit.
Click Apply from the top-right of the screen.
Change the owner of the azcopy executable file from adminuser to kyvos.
Location: /data/kyvos/installs/bin
Do this on all nodes: Kyvos Manager, BI Server, and Query Engines.
Restart the Kyvos services.
To design a cube over data residing in a Snowflake data source, add the following property on the Kyvos connection screen and restart the BI service.
Property | Value |
---|---|
spark.driver.extraJavaOptions | -Dnet.snowflake.jdbc.ocspResponseCacheDir=/tmp |
This completes the Kyvos installation and deployment in your environment. You can now access Kyvos to start creating your cubes.
Copyright Kyvos, Inc. All rights reserved.