Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
...
Kyvos allows you to use separate build process and read connections, as well as multiple build process connections, on the AWS, Azure, and GCP platforms. This lets user groups across an organization use separate clusters, which improves semantic model process time because resources are not shared across a larger group, and also lets each group estimate its own usage.
Is Data Process – This connection holds the data on which the semantic model will be processed.
Use as Source – This connection is primarily used for read operations, such as semantic model build process and data profiling jobs. However, you can also mark a build process connection as a read connection, as it can be used for creating datasets and designing semantic models.
You create a base build process connection during deployment and can subsequently add new build process connections through Kyvos. A new connection must have the same configuration in terms of cluster version and Hadoop, Hive, and Spark versions.
...
If you have configured multiple build process connections, you can choose the connection to be used for launching build process jobs at the time of scheduling build processes.
By default, all Warehouse connections are Use as Source connections, which can only be used to read data for registering files.
AWS
Creating a Databricks connection

To create a Databricks connection for AWS, perform the following steps.

1. From the Toolbox, click Setup, then choose Connections.
2. From the Actions menu ( ⋮ ), click Add Connection.
3. Enter a Name for the connection and provide information as follows:
Category – Select the Build Process option.

Providers – Select the Databricks option.

Databricks Service Address – Enter the URL of the Databricks workspace.

Databricks Personal Access Token – Provide the personal access token to access and connect to your Databricks workspace. Refer to the Databricks documentation to learn how to get your token.

Databricks Cluster Id – Enter the ID of your Databricks cluster. To obtain this ID, click the Cluster Name on the Clusters page in Databricks. The page URL shows https://<databricks-instance>/#/settings/clusters/<cluster-id>

Is Data Process – By default, this checkbox is selected.

Use as source – Select the checkbox to use this as a read connection. In this case, the connection is used to read data (creating registered datasets) on which the semantic model will be created.

Metastore Type – Metastore type to be used for fetching database and table listings or writing SQL queries to design registered datasets. You can select from the DEFAULT or GLUE options. The GLUE option is displayed and set by default only if your AWS cluster was deployed with Glue enabled. NOTE: This option is displayed only if you select the Use as source checkbox.

Hive Server JDBC URL – Databricks cluster JDBC URL, used to connect to the Databricks internal metastore over a JDBC connection.

SQL Engine – Select the HIVE or SPARK option from the list. Is default SQL engine: To enable the connection for raw data querying, select the checkbox to set this connection as the default SQL engine.

Configure Job Cluster – Use this option to allow Kyvos to execute Spark jobs on a Job Cluster to reduce the cost of build process jobs. This feature is helpful and recommended in limited scenarios; see the recommendations and best practices sections below for details.

Autoscaling – If needed, enable autoscaling and specify the minimum and maximum number of worker nodes for the cluster. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Use same instance pool for worker and driver – Select the checkbox to use the same instance pool for worker and driver nodes. Kyvos does not perform any heavy operation on the driver node, so it is recommended to use a pool of cheaper nodes for the Spark driver, preferably Standard_DS3_v2. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Instance Pool Id – Instance pool ID to be used for worker nodes. If the Use same instance pool for worker and driver option is selected, this pool is also used for driver nodes. The runtime version of the instance pool must be the same as that of the Databricks cluster. You can provide the ID of an existing instance pool; see the Databricks documentation for details. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Driver Instance Pool Id – Instance pool ID to be used for driver nodes. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Spark config – Enter your Spark configuration to fine-tune Spark job performance. Provide a space-separated key-value pair for each property; multiple properties must also be separated by spaces. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Tags (JSON string) – You can add additional tags for the cluster by providing the tags in JSON format. Both the cluster-level tags and those inherited from pools are applied. You cannot add a cluster-specific tag with the same key name as a custom tag inherited from a pool (that is, you cannot override a custom tag that is inherited from the pool). Example: {"key1": "val1","key2": "val2"} NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.

Cluster log path (DBFS) – You can configure the DBFS location where the system should persist the Spark job logs. If you leave it blank, the system persists the logs at the dbfs:/cluster-logs location. NOTE: This option is displayed only when you select the Configure Job Cluster checkbox.
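As a sketch of the formats the Spark config and Tags fields described above expect (space-separated key-value pairs, and a flat JSON object), the following Python helpers assemble and check those values before you paste them into the connection form. The helper names and the Spark property values are illustrative, not part of Kyvos.

```python
import json

def format_spark_config(props: dict) -> str:
    """Format Spark properties as described above: each key and its value
    separated by a space, and multiple properties also separated by spaces."""
    return " ".join(f"{key} {value}" for key, value in props.items())

def validate_cluster_tags(tags_json: str) -> dict:
    """Parse the Tags (JSON string) field value, ensuring it is a flat
    JSON object like {"key1": "val1","key2": "val2"}."""
    tags = json.loads(tags_json)
    if not isinstance(tags, dict):
        raise ValueError("Tags must be a JSON object")
    return tags

# Example with two ordinary Spark properties (illustrative values only).
config = format_spark_config({
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
})
print(config)  # spark.executor.memory 4g spark.executor.cores 2

print(validate_cluster_tags('{"key1": "val1","key2": "val2"}'))
```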
Azure
Creating a build process and read connection

1. From the Toolbox, click Setup, then choose Connections.
2. From the Actions menu ( ⋮ ), click Add Connection.
3. Enter a Name for the connection.
4. From the Category drop-down list, select the BUILD PROCESS option.
5. From the Provider list, select the Databricks option, and provide the following:
   - Databricks Cluster Id: Enter the ID of your Databricks cluster. To obtain this ID, click the Cluster Name on the Clusters page in Databricks. The page URL shows https://<databricks-instance>/#/settings/clusters/<cluster-id>
   - Databricks Service Address: Enter the URL of your Databricks workspace.
   - Databricks Personal Access Token: Enter the personal access token to access and connect to your Databricks workspace. Refer to the Databricks documentation to get your token.
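The cluster ID can be read straight off the cluster page URL pattern shown above. The following Python sketch does that extraction; the function name and the example instance name are illustrative only.

```python
from urllib.parse import urlparse

def cluster_id_from_url(page_url: str) -> str:
    """Extract the Databricks cluster ID from a cluster page URL of the
    form https://<databricks-instance>/#/settings/clusters/<cluster-id>,
    as shown above. Helper name is illustrative."""
    fragment = urlparse(page_url).fragment  # "/settings/clusters/<cluster-id>"
    prefix = "/settings/clusters/"
    if not fragment.startswith(prefix):
        raise ValueError("URL does not look like a Databricks cluster page")
    return fragment[len(prefix):]

# Example with a made-up workspace instance and cluster ID:
print(cluster_id_from_url(
    "https://adb-1234.5.azuredatabricks.net/#/settings/clusters/0123-456789-abcde"
))  # 0123-456789-abcde
```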
To also use this connection as a read connection, select the Is Read checkbox and provide the Hive Server JDBC URL. You can find this URL in Kyvos Manager: navigate to the Hadoop Ecosystem Configuration page > Hadoop Parameters and copy the value of the kyvos.hiveserver2.jdbc.url parameter, provided in the format:

jdbc:spark://adb-<Databricks ID>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<organization ID>/<Databricks cluster ID>;AuthMech=3;

Ensure that you update the Databricks ID, Organization ID, and Databricks Cluster ID according to the cluster that you are working on. To use this connection as the default SQL engine, select the Is Default SQL Engine checkbox, and select the SQL engine from Spark, Hive, or Databricks SQL Warehouse. See the Provider parameters table for details. To use the Databricks SQL engine, provide the Server URL and Alternate Server URL.
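The JDBC URL above is a fixed template with three cluster-specific substitutions. As a sketch, this Python helper fills the template so the three IDs cannot be mixed up by hand-editing; the function name and example ID values are illustrative only.

```python
def build_hive_jdbc_url(databricks_id: str, org_id: str, cluster_id: str) -> str:
    """Fill the kyvos.hiveserver2.jdbc.url template shown above with the
    Databricks ID, Organization ID, and Databricks cluster ID.
    Helper name is illustrative."""
    return (
        f"jdbc:spark://adb-{databricks_id}.azuredatabricks.net:443/default;"
        "transportMode=http;ssl=1;"
        f"httpPath=sql/protocolv1/o/{org_id}/{cluster_id};AuthMech=3;"
    )

# Example with placeholder IDs:
print(build_hive_jdbc_url("1234567890123456", "7890", "0123-456789-abcde"))
```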
GCP
Creating a build process and read connection

1. From the Toolbox, click Setup, then choose Connections.
2. From the Actions menu ( ⋮ ), click Add Connection.
3. Enter a Name for the connection.
4. From the Category drop-down list, select the BUILD PROCESS option.
5. From the Provider list, select the Dataproc option.
6. Provide the Dataproc Cluster Name.
7. Select the required Hive Authentication from the list.
8. By default, the Is Data Process checkbox is selected. To also use this connection as a read connection, select the Use as Source checkbox.
9. Click the Save button. The system auto-populates the History server Url in the http://[MASTER_NODE_IP]:18080 format and the Livy server Url in the http://[MASTER_NODE_IP]:8998 format, according to the provided Dataproc cluster name.
10. To use this connection as the default SQL engine, select the Is Default SQL Engine checkbox, and select the SQL engine from Spark or Hive. See the Provider parameters table for details. To use the Spark engine, provide the Server URL and Alternate Server URL.
11. Click the Save button.
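The auto-populated URLs above follow a fixed pattern: the Spark History server on port 18080 and the Livy server on port 8998 of the Dataproc master node. This Python sketch reproduces that derivation, for instance to sanity-check the values Kyvos fills in; the function name and example IP are illustrative only.

```python
def dataproc_service_urls(master_node_ip: str) -> dict:
    """Derive the History server and Livy server URLs in the formats
    documented above: http://[MASTER_NODE_IP]:18080 for the History
    server and http://[MASTER_NODE_IP]:8998 for Livy.
    Helper name is illustrative."""
    return {
        "history_server_url": f"http://{master_node_ip}:18080",
        "livy_server_url": f"http://{master_node_ip}:8998",
    }

# Example with a placeholder internal IP:
print(dataproc_service_urls("10.0.0.5"))
```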
...