...

  1. From the Toolbox, click Setup, then choose Connections.

  2. From the Actions menu ( ⋮ ), click Add Connection.

  3. Enter a Name for the connection.

  4. From the Category drop-down list, select the BUILD option.

  5. From the Provider list, select the Databricks option, and provide the following:

    1. Databricks Cluster Id: Enter the ID of your Databricks cluster.
      To obtain this ID, click the Cluster Name on the Clusters page in Databricks. The page URL is of the form https://<databricks-instance>/#/settings/clusters/<cluster-id>.
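
      For example, a minimal Python sketch (with a hypothetical page URL) that pulls the trailing <cluster-id> segment out of such a URL:

        # Hypothetical page URL; the cluster ID is the last path segment.
        url = "https://adb-1234567890123456.7.azuredatabricks.net/#/settings/clusters/0123-456789-abcde123"
        cluster_id = url.rstrip("/").split("/")[-1]
        print(cluster_id)  # 0123-456789-abcde123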

    2. Databricks Service Address: Enter the URL of your Databricks workspace.

    3. Databricks Personal Access Token: Enter the personal access token to access and connect to your Databricks workspace. Refer to the Databricks documentation to get your token.
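
    As a quick sanity check, the Python sketch below (all values are placeholders) calls the Databricks Clusters REST API (GET /api/2.0/clusters/get) to confirm that the service address, cluster ID, and token work together:

      import requests

      service_address = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
      cluster_id = "0123-456789-abcde123"                                     # placeholder
      token = "<personal-access-token>"                                       # placeholder

      # A 200 response with a cluster state confirms the three values are consistent.
      resp = requests.get(
          f"{service_address}/api/2.0/clusters/get",
          headers={"Authorization": f"Bearer {token}"},
          params={"cluster_id": cluster_id},
          timeout=30,
      )
      resp.raise_for_status()
      print(resp.json().get("state"))  # e.g. RUNNING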

  6. To also use this connection as a read connection, select the Use as Source checkbox.

  7. Provide the Hive Server JDBC URL. You can find this URL in Kyvos Manager: navigate to the Hadoop Ecosystem Configuration page > Hadoop Parameters and copy the value of the kyvos.hiveserver2.jdbc.url parameter, which is provided in the following format:
    jdbc:spark://adb-<Databricks ID>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<organization ID>/<Databricks cluster ID>;AuthMech=3;
    Ensure that you update the Databricks ID, Organization ID, and Databricks Cluster ID according to the cluster that you are working on.
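
    For illustration, a small Python sketch (placeholder IDs) that fills this template:

      # Placeholders: substitute the values for your own cluster.
      databricks_id = "1234567890123456.7"
      organization_id = "1234567890123456"
      cluster_id = "0123-456789-abcde123"

      jdbc_url = (
          f"jdbc:spark://adb-{databricks_id}.azuredatabricks.net:443/default;"
          f"transportMode=http;ssl=1;"
          f"httpPath=sql/protocolv1/o/{organization_id}/{cluster_id};AuthMech=3;"
      )
      print(jdbc_url)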

  8. From the SQL engine list, select Databricks SQL Warehouse.
    See the Provider parameters table for details.

    1. Enter Databricks SQL Warehouse JDBC URL. For more information, see Microsoft documentation.

    2. Enter Databricks SQL Personal Access Token for the Databricks SQL workspace.
      NOTE: This field is optional. In the absence of this PAT, the system attempts to connect using the PAT provided for the Databricks connection.
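
    As an optional check that the SQL Warehouse values are valid, the Python sketch below uses the databricks-sql-connector package (pip install databricks-sql-connector); the hostname, HTTP path, and token are placeholders corresponding to the values embedded in the JDBC URL:

      from databricks import sql

      with sql.connect(
          server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
          http_path="/sql/1.0/warehouses/abcdef0123456789",              # placeholder
          access_token="<databricks-sql-personal-access-token>",         # placeholder
      ) as conn:
          with conn.cursor() as cur:
              cur.execute("SELECT 1")
              print(cur.fetchone())  # (1,) if the warehouse is reachable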

  9. To use this connection as the default SQL engine, select the Is Default SQL Engine checkbox.

  10. Enter Min and Max Workers.

  11. Select the Use Same Instance Pool For Worker And Driver checkbox to use a pool of cheaper nodes for the Spark driver, preferably Standard_DS3_v2.

  12. Enter Instance Pool Id. The Runtime version of the instance pool must be the same as that of the Databricks cluster. You can provide the ID of an existing instance pool or create a new one.
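
    One way to confirm the runtime versions match is sketched below (Python, placeholder IDs and token), using the Databricks clusters/get and instance-pools/get REST endpoints:

      import requests

      service_address = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
      headers = {"Authorization": "Bearer <personal-access-token>"}           # placeholder

      cluster = requests.get(f"{service_address}/api/2.0/clusters/get",
                             headers=headers,
                             params={"cluster_id": "0123-456789-abcde123"},  # placeholder
                             timeout=30).json()
      pool = requests.get(f"{service_address}/api/2.0/instance-pools/get",
                          headers=headers,
                          params={"instance_pool_id": "pool-0123456789ab"},  # placeholder
                          timeout=30).json()

      # The cluster's runtime should appear among the pool's preloaded runtimes.
      print(cluster.get("spark_version"))
      print(pool.get("preloaded_spark_versions", []))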

  13. Enter your Spark Config to fine-tune Spark job performance. Provide each property as a space-separated key-value pair, and separate multiple properties with spaces as well. For more information, see Microsoft documentation.
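
    For example, a Spark Config value setting two illustrative properties would look like:

      spark.sql.shuffle.partitions 200 spark.executor.memory 8g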

  14. Enter Tags (JSON String) for the cluster by providing the tags in JSON format. Both the cluster-level tags and the tags inherited from pools are applied. For more information, see Microsoft documentation.
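
    For example, with illustrative keys and values:

      {"team": "analytics", "environment": "dev"}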

  15. To use the Databricks SQL engine, provide the Server URL and Alternate Server URL.

...


  1. Configure the Cluster Log Path (DBFS) location where the system should persist the Spark job logs.
    NOTE: If you leave this blank, the system persists the logs at dbfs:/cluster-logs

  2. Select the Catalog Enabled checkbox.

  3. Click the Properties link to view or set properties.

  4. After you finish configuring the settings using the table below, click the Test button at the top left to validate the connection settings.

  5. If the connection is valid, click the Save button.

  6. To refresh connections, click the Actions menu ( ⋮ ) at the top of the Connections column and select Refresh.

...