
Automated Kyvos deployment with CDP on Azure

Applies to: Kyvos Enterprise  Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace   Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)


Prerequisites

Before you start the automated installation for Kyvos on Azure, ensure that you have the following information.

  1. Resource Group for all Kyvos resources. Kyvos recommends using an empty resource group that is used only for deploying Kyvos resources.

  2. Managed Identity for Kyvos resources, with the following information.

    1. Managed Identity Name

    2. Managed Identity Resource Group Name

  3. Key Vault URL for storing the Postgres password.

  4. The logged-in Azure user must have the following rights to create Kyvos resources using ARM templates:

    1. Contributor access on the resource group used for deploying Kyvos resources.

    2. Key and Secret Management rights on the Key Vault, if you are using an existing Key Vault.

  5. Networking: The Kyvos ARM template needs information about the VNet, Subnet, and Network Interface/Security Group that the Kyvos machines will use.

    1. Create a Network Interface/Security Group with the following inbound ports open:
      6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440, 6605, 8081, 8080, 45421, 45564, 4000, 7009, and 22.
      See Ports required for Kyvos for details.

  6. SSH Key pair consisting of a private key and a public key.

  7. Custom image with the following configuration files included in it:

    1. conf file and includer file

    2. cacerts file: Copy the cacerts file from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/lib/security/cacerts on the CDP master node and place it in the /opt folder.

    3. cm-auto-global_truststore.jks file: Copy the file from /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks to the same location in the image.
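The inbound ports listed in prerequisite 5 can be turned into security rule definitions programmatically. The following is a minimal sketch, not part of the Kyvos ARM template: the rule names, priorities, the TCP protocol, and the source address prefix are illustrative assumptions, and only the port numbers come from this page. The dictionary keys follow the property names used by Azure network security rules.

```python
# Inbound ports listed in prerequisite 5 of this page.
KYVOS_INBOUND_PORTS = [6602, 6903, 6703, 45450, 45460, 6603, 6803, 45440,
                       6605, 8081, 8080, 45421, 45564, 4000, 7009, 22]


def build_inbound_rules(ports, source_prefix, base_priority=100):
    """Return one ARM-style inbound security rule dict per unique port.

    Rule names, priorities, protocol, and source prefix are assumptions
    made for illustration; they are not taken from the Kyvos template.
    """
    rules = []
    for offset, port in enumerate(sorted(set(ports))):
        rules.append({
            "name": f"kyvos-allow-{port}",  # hypothetical naming scheme
            "properties": {
                "priority": base_priority + offset,
                "direction": "Inbound",
                "access": "Allow",
                "protocol": "Tcp",  # assumption: all listed ports use TCP
                "sourceAddressPrefix": source_prefix,
                "sourcePortRange": "*",
                "destinationAddressPrefix": "*",
                "destinationPortRange": str(port),
            },
        })
    return rules


if __name__ == "__main__":
    # "10.0.0.0/16" is a placeholder source range, not a recommended value.
    for rule in build_inbound_rules(KYVOS_INBOUND_PORTS, "10.0.0.0/16"):
        print(rule["name"], rule["properties"]["priority"])
```

Generating the rules from a single port list keeps the security group in step with the port list on this page if it changes.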

Additional Information

  1. Key Vault URL for storing secrets, if using the existing key vault.

  2. Boot Diagnostics Storage Account URI.

  3. Shared Image Gallery information: Because the custom image is used as the base OS for Kyvos VMs, provide the following parameters:

    1. Gallery Resource Group Name

    2. Gallery Subscription ID

    3. Gallery Name

    4. Gallery Image Definition Name

    5. Gallery Image Version Name

  4. Managed Identity for Kyvos resources, with the following information.

    1. Managed Identity Name

    2. Managed Identity Resource Group Name
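The five Shared Image Gallery parameters listed above together identify one image version. As a rough sketch, they can be combined into the standard Azure resource ID for a gallery image version. The function name and the sample values are hypothetical, and whether the Kyvos template builds the ID this way is an assumption.

```python
def gallery_image_version_id(subscription_id, resource_group, gallery_name,
                             image_definition, image_version):
    """Build the Azure resource ID of a Shared Image Gallery image version."""
    fields = {
        "Gallery Subscription ID": subscription_id,
        "Gallery Resource Group Name": resource_group,
        "Gallery Name": gallery_name,
        "Gallery Image Definition Name": image_definition,
        "Gallery Image Version Name": image_version,
    }
    # Reject empty values and values that would break the ID path.
    for label, value in fields.items():
        if not value or "/" in value:
            raise ValueError(f"{label} is empty or contains '/': {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/galleries/{gallery_name}"
        f"/images/{image_definition}"
        f"/versions/{image_version}"
    )


if __name__ == "__main__":
    # All values below are placeholders.
    print(gallery_image_version_id(
        "00000000-0000-0000-0000-000000000000",
        "gallery-rg", "kyvosGallery", "kyvos-base", "1.0.0"))
```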

Deploying Kyvos using Azure Resource Manager Template

To deploy Kyvos with CDP on Azure, follow this two-step process:

  1. Automated deployment of Kyvos on Databricks

  2. Post deployment steps to use CDP on Azure

Automated deployment of Kyvos on Databricks

To deploy Kyvos using the Azure Resource Manager (ARM) template, perform the following steps.

  1. Log in to the Azure Portal as a user who has the permissions and information listed in the prerequisites.

  2. Use the Search box to search for Deploy a custom template.

  3. On the Custom Deployment page, click the Build your own template in the editor option.

  4. On the Edit Template page, click Load File. Upload your Kyvos ARM Template.

  5. Click Save.

  6. The Custom Deployment page is loaded with parameters required for deployment.

  7. Here, enter the details as:
    Fields marked * are mandatory.

Parameter

Description

Subscription*

Your account subscription.

Resource Group*

Enter the name of your resource group. The resource group is a collection of resources that share the same lifecycle, permissions, and policies.

Region*

Choose the Azure region that's right for you and your customers. Not every resource is available in every region.

VnetAddress

Enter the CIDR notation for the new VPC that will be created in the deployment.

NOTE: This option is displayed only when the CreateVPC option is selected.
If a new VPC is created and you have enabled WebPortal HA (from the Kyvos Manager), you must perform the post-deployment steps after deploying the cluster.

NetworkSecurityGroupIpWhiteList

Provide the range of IP addresses allowed to access Kyvos Instances. Use 0.0.0.0/0 to allow all users access.

NOTE: This parameter is displayed only when a new network security group is created within the deployment. 

Virtual Network Name*

Name of Virtual Network in which your VMs will run.

VM Subnet Name*

Name of Subnet in which your VMs will run. This Subnet should be part of the above Virtual Network.

ApplicationGatewaySubnetName* 

Name of the Subnet in which Application Gateway will be created. The Subnet should be part of the above Virtual Network. 

NOTE: This parameter is displayed only if an existing VPC is used for deployment.

Security Group Name*

Name of the Security group that can be used to access the VMs.

Network Resource Group Name*

Name of Resource Group in which Virtual Network and Subnet are deployed.

Security Group Resource Group Name

Name of Resource Group in which SecurityGroup is deployed.

Enable Managed Identity Creation

Select True to create a new managed identity for Kyvos.

Select False to use an already existing managed identity.

Managed Identity Name*

Enter the name of the user-assigned managed identity to attach to all Kyvos VMs.

Managed Identity Resource Group Name

Name of the resource group in which the managed identity is deployed.

Databricks Authentication Type

Select the authentication type for the Databricks cluster from:

  • AAD Token Using Managed Identity: This option is supported only with premium workspace.

  • Personal Access Token

Databricks Token*

Specifies the token value used to connect to the Databricks cluster.

Kyvos Work Directory

Enter the path for the Kyvos work directory.

SSH Public Key*

Provide an RSA public key in the single-line format (starting with "ssh-rsa") or the multi-line PEM format.

You can generate SSH keys using ssh-keygen on Linux and OS X, or PuTTYGen on Windows.

Additional Tags

Enter the additional tags to put on all resources.

Use the syntax as: {"Key1": "Value1", "Key2" : "Value2"}

Storage Account Name

Enter the name of the Storage Account to be used for Kyvos.

Storage Account Container Name

Enter the name of Container in Storage Account which will be used for Kyvos.

CustomPrefixVirtualMachines

Enter a custom prefix to add before the names of the virtual machines used for Kyvos.

CustomPrefixVPC

Enter a custom prefix to add before the name of the VPC, if a new VPC is created for use with Kyvos.

CustomPrefixNSG

Enter a custom prefix to add before the name of the Network Security Group, if a new group is created for use with Kyvos.

CustomPrefixKeyVault

Enter a custom prefix to add before the name of the Key Vault, if a new Key Vault is created for use with Kyvos.

CustomPrefixScaleSet

Enter a custom prefix to add before the name of the Scale Set that is created for use with Kyvos.

Vault URL*

If you have saved your secrets in the Key Vault, provide its URL.

Vault Resource Group*

Enter the name of the Resource Group in which the Key Vault is deployed.

Boot Diagnostics Storage Account Resource ID

Resource ID of a storage account of type gen1 for enabling Boot Diagnostics of VMs. If left blank, a storage account of type gen1 is created.

Storage Account Resource Group

Enter the name of the Resource Group in which the Storage Account is deployed.

Object Id of Service Principal*

The Object ID assigned to the Service principal. This maps to the ID inside the Active Directory.

SSH Private Key*

Provide the RSA private key in a single-line format.

Kyvos Cluster Name

Provide a name for your Kyvos cluster.

Kyvos Installation Path

Enter the installation path to deploy Kyvos.

Databricks URL*

Provide the URL in the https://<account>.cloud.databricks.com format.

Databricks Cluster ID*

Enter the Cluster ID of your Azure Databricks cluster.

To obtain this ID, click the Cluster Name on the Clusters page in Databricks.

The page URL has the form https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the value after the /clusters/ component in the URL of this page.

Databricks Cluster Organization ID*

Enter the Cluster Organization ID of your Azure Databricks cluster. To obtain this ID, click the Cluster Name on the Clusters page in Databricks.
The number after o= in the workspace URL is the organization ID. For example, if the workspace URL is https://westus.azuredatabricks.net/?o=7692xxxxxxxx, then the organization ID is 7692xxxxxxxx.

Postgres Password*

Provide the password to be used for Postgres.

License File Value*

Enter a valid Kyvos license key.

Secret Key For Kyvos Bundle Download*

Enter the secret key used to access the Kyvos bundle.

Enable Public IP

Select True to enable Public IP for Kyvos Web portal.

DNS Label Prefix

Unique DNS Name for the Public IP used to access the Virtual Machine.

Perform Env Validation

Select True to perform environment validation before cluster deployment to ensure all the resources are created correctly.

Host Name Based Deployment  

Select True to use hostnames instead of IP Addresses for instances during cluster deployment.

  8. Click Review + Create.

  9. The system validates your inputs and displays a summary of the provided inputs. To continue with deployment, click Create.

This creates all the resources and services required for deploying the Kyvos cluster. The Outputs page displays the details for all your services and the Kyvos Manager URL.
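Several parameters in the table above have a fixed syntax (VnetAddress and NetworkSecurityGroupIpWhiteList take CIDR notation, Additional Tags takes a JSON object, SSH Public Key takes a key that starts with "ssh-rsa"). A small pre-check before clicking Review + Create can catch typos early. This is only a sketch under assumptions: the function names are hypothetical, the SSH check covers only the single-line format, and the Azure Portal performs its own validation regardless.

```python
import ipaddress
import json


def check_cidr(value):
    """Return the network as a string, or raise ValueError for a bad CIDR."""
    # strict=True rejects values with host bits set, such as 10.0.0.1/16.
    return str(ipaddress.ip_network(value, strict=True))


def check_tags(value):
    """Parse the Additional Tags value and require a flat string-to-string map."""
    tags = json.loads(value)
    if not isinstance(tags, dict):
        raise ValueError("tags must be a JSON object")
    for key, tag_value in tags.items():
        if not isinstance(tag_value, str):
            raise ValueError(f"tag {key!r} must have a string value")
    return tags


def check_ssh_public_key(value):
    """Return True for a single-line key of the form 'ssh-rsa <data> [comment]'.

    The multi-line PEM format mentioned in the table is not handled here.
    """
    text = value.strip()
    fields = text.split()
    return len(fields) >= 2 and fields[0] == "ssh-rsa" and "\n" not in text


if __name__ == "__main__":
    print(check_cidr("0.0.0.0/0"))
    print(check_tags('{"Key1": "Value1", "Key2" : "Value2"}'))
    # The key body below is a truncated placeholder, not a usable key.
    print(check_ssh_public_key("ssh-rsa AAAAB3NzaC1yc2E user@host"))
```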

Note

To access the Usage Dashboard, you must provide permissions after the deployment completes.
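The Databricks Cluster ID and Databricks Cluster Organization ID parameters described in the table are both read from Databricks URLs. The following sketch shows one way to extract them, assuming the URL shapes given in the parameter descriptions. The function names and the sample cluster ID are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def organization_id(workspace_url):
    """Return the value of the o= query parameter, or None if it is absent."""
    values = parse_qs(urlparse(workspace_url).query).get("o")
    return values[0] if values else None


def cluster_id(page_url):
    """Return the path segment that follows /clusters/, or None if absent."""
    match = re.search(r"/clusters/([^/?#]+)", page_url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Workspace URL taken from the example on this page.
    print(organization_id("https://westus.azuredatabricks.net/?o=7692xxxxxxxx"))
    # Placeholder cluster page URL; the ID shown is made up.
    print(cluster_id(
        "https://westus.azuredatabricks.net/#/settings/clusters/"
        "0000-000000-example1/configuration"))
```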

Post deployment steps to use CDP on Azure

After the Kyvos deployment completes, perform the following steps to configure Kyvos to use CDP on Azure:

  1. Configure the CDP cluster to access the storage account used by Kyvos from within Spark jobs.

    1. Log in to the Cloudera management console and go to Data Hub Clusters.

    2. Open the Cloudera Manager application by clicking the CM URL link.

    3. On the Compute cluster, open Spark configurations from the Services dropdown.

    4. On the Configuration tab, search for the yarn.access.hadoopFileSystems property and update its existing value to include the location of the container used by Kyvos. If the property does not exist, add it with the container location as its value.
      For example:
      yarn.access.hadoopFileSystems=abfs://data@kyvoscdp.dfs.core.windows.net,abfs://kyvoscontainer92604@kyvos33333.dfs.core.windows.net
      Here, the second entry (the Kyvos container location) is added to the existing value.

    5. Restart affected services.

  2. Configure IDBroker mapping for Kyvos user on CDP, using the following steps.

    1. On the navigation pane, click Environments, and then click Actions > Manage Access.

    2. Click the IDBroker Mappings tab and add user or group by clicking the Edit option in the Current Mappings section.

    3. Click the plus icon to add a user or group, select a user or group from the dropdown, and enter a role in the Role input.

    4. Click Save and Sync.

  3. Ensure that the CDP DataAssumerRole is assigned the Storage Blob Data Contributor role on the Kyvos storage account.

  4. Copy the cacerts file from /opt to the /data/kyvos/app/kyvos/jre/jre/lib/security folder on all nodes (Kyvos Manager, BI Server, and Query Engines). This file was copied from the CDP master node to /opt at the time of image creation.

  5. On the BI Server nodes, add an entry to /etc/hosts that maps the CDP master hostname to the CDP master private IP address.

  6. Change vendor from Databricks to Cloudera from Kyvos Manager, using the following steps.

    1. Log on to Kyvos Manager, and navigate to Manage Kyvos > Hadoop Ecosystem Configuration.

    2. Change the value for vendor from DATABRICKS to CLOUDERA.

    3. Change file system type from ABFSS to ABFS.

    4. In the HDInsights Home field, provide the value as /opt/cloudera/parcels/CDH/meta/

    5. Provide the Namenode IP.

    6. Change the value for Hadoop version to 3.1.1

    7. Provide Cloudera version in Hadoop parameters as 7.2.2

    8. Select Hive version as 3.1

    9. Provide Hive JDBC URL. You can copy the Hive JDBC URL from the Cloudera manager application.

    10. Select Spark version as 2.4

    11. Update the Spark Library Path to /opt/cloudera/parcels/CDH/lib/spark/jars,/opt/cloudera/parcels/CDH/jars/spark-hive_2.11-2.4.5.7.2.2.7-6.jar,/opt/cloudera/parcels/CDH/jars/spark-atlas-connector-assembly-0.1.0.7.2.2.7-6.jar

    12. Provide Spark history server URL.

    13. Select both Sync Library and Sync Configurations.

  7. On the Security configuration, provide details as:

    1. Change the Hadoop Security Type from SIMPLE to KERBEROS.

    2. Provide Keytab User Name and Keytab file details.
      Refer to the Cloudera documentation to download the Keytab file.

    3. Upload local policy jar and US export policy jar.

  8. Go back to the Hadoop Ecosystem Configuration screen on Kyvos Manager. Select both Sync Library and Sync Configurations, and then submit.

  9. Click Apply from the top-right of the screen.

  10. Change the owner of the azcopy executable file from adminuser to kyvos.

    1. Location: /data/kyvos/installs/bin

    2. Do this on all nodes for Kyvos Manager, BI Server, and Query Engines.

    3. Restart the Kyvos services.

  11. To design a semantic model over data residing on a Snowflake data source, add the following property on the Kyvos connection screen and restart the BI service.

Property

Value

spark.driver.extraJavaOptions

-Dnet.snowflake.jdbc.ocspResponseCacheDir=/tmp
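Step 1 of the post-deployment procedure edits the comma-separated yarn.access.hadoopFileSystems value in Cloudera Manager. As an illustrative sketch, the helper below computes the new value by appending the Kyvos container location only when it is not already present. The function name is hypothetical, and treating the second entry of the example as the Kyvos container is an assumption based on that example.

```python
def add_filesystem(existing_value, container_uri):
    """Append container_uri to a comma-separated list if it is missing.

    Surrounding whitespace and empty entries are dropped; the order of
    the existing entries is preserved.
    """
    entries = [e.strip() for e in existing_value.split(",") if e.strip()]
    if container_uri not in entries:
        entries.append(container_uri)
    return ",".join(entries)


if __name__ == "__main__":
    # Values taken from the example in step 1.
    current = "abfs://data@kyvoscdp.dfs.core.windows.net"
    kyvos = "abfs://kyvoscontainer92604@kyvos33333.dfs.core.windows.net"
    print("yarn.access.hadoopFileSystems=" + add_filesystem(current, kyvos))
```

Because the helper is idempotent, running it again against the updated value leaves the property unchanged.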

This completes the Kyvos installation and deployment in your environment. You can now access Kyvos to start creating your semantic models.

Copyright Kyvos, Inc. All rights reserved.