...


Note

  • Supported only with a Premium workspace.

  • Supported only with Personal Access Token authentication.

  • The logged-in user must have the Storage Blob Data Contributor role on the storage account.

  • You must have permission to create (and map) storage credentials and external locations in Unity Catalog.

Configuring Databricks SQL warehouse

  • To create a Databricks SQL warehouse, refer to the Microsoft documentation. Note the following:

    1. Type: Only the Serverless type is supported.

    2. Unity Catalog must be enabled. If Unity Catalog is not enabled for your workspace, you do not see this option.

  • To configure SQL parameters for Databricks SQL warehouses, perform the following steps:

    1. Open your Databricks workspace.

    2. Click your username in the top bar of the workspace and select Admin Settings from the list.

    3. Click the SQL Warehouse Settings tab.

    4. In the SQL Configuration Parameters textbox, specify the following key-value pair:
      LEGACY_TIME_PARSER_POLICY LEGACY

    5. Click Save.
      For more information, see SQL configuration parameters and LEGACY_TIME_PARSER_POLICY in the Azure Databricks documentation.
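If you manage several workspaces, you may prefer to script this setting instead of clicking through Admin Settings. The sketch below is a minimal illustration under stated assumptions: it renders the pairs in the one "KEY VALUE" per line form typed into the textbox, and it builds (but does not send) a JSON body for the SQL warehouse workspace-configuration REST endpoint, assumed here to be PUT /api/2.0/sql/config/warehouses with a sql_configuration_parameters.configuration_pairs list. Check the endpoint and field names against the Databricks REST API reference before relying on them. The helper names are hypothetical.

```python
import json

# The parameter this guide requires on the SQL warehouses.
REQUIRED_PARAMS = {"LEGACY_TIME_PARSER_POLICY": "LEGACY"}


def to_textbox(params: dict[str, str]) -> str:
    """Render parameters as one 'KEY VALUE' pair per line, the form
    entered in the SQL Configuration Parameters textbox."""
    return "\n".join(f"{key} {value}" for key, value in params.items())


def merge_config_pairs(existing: list[dict], params: dict[str, str]) -> list[dict]:
    """Merge params into an existing configuration_pairs list.

    Keys that already exist are updated in place and new keys are appended.
    Other pairs are kept because a PUT to the assumed endpoint replaces
    the stored workspace configuration rather than patching it.
    """
    merged = {pair["key"]: pair["value"] for pair in existing}
    merged.update(params)
    return [{"key": key, "value": value} for key, value in merged.items()]


def build_payload(existing_pairs: list[dict]) -> str:
    """Build the JSON body for the assumed
    PUT /api/2.0/sql/config/warehouses request (nothing is sent here)."""
    body = {
        "sql_configuration_parameters": {
            "configuration_pairs": merge_config_pairs(existing_pairs, REQUIRED_PARAMS)
        }
    }
    return json.dumps(body, indent=2)
```

In practice you would first read the current configuration (for example, with a GET on the same path), pass its existing pairs to build_payload, and include the other fields you read back, so that settings unrelated to this guide are not lost.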

...

  • You can modify the existing default Datalake connection by selecting Databricks SQL Warehouse as the SQL engine.
    See the Provider parameters table for details.

    1. Enter the Databricks SQL Warehouse JDBC URL. For more information, see the Microsoft documentation.

    2. Enter the Databricks SQL Personal Access Token (PAT) for the Databricks SQL workspace.
      NOTE: This token is optional. If you do not provide it, the system attempts to connect using the PAT provided for the Databricks connection.

    3. Select the Catalog Enabled checkbox. This is mandatory for Databricks SQL.

  • You can also create a new build and read connection that uses Databricks SQL Warehouse as the SQL engine. To do this, see the Creating Databricks SQL Warehouse connection section below.
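The Databricks SQL Warehouse JDBC URL is normally copied from the warehouse's Connection details tab in Databricks. The sketch below shows how such a URL is typically assembled and how to tell a warehouse URL apart from a cluster URL. It assumes the jdbc:spark URL style that this page uses for cluster URLs and an HTTP path of the form /sql/1.0/warehouses/<warehouse-id>. Newer Databricks JDBC drivers use a jdbc:databricks:// prefix instead, so prefer the exact URL Databricks shows you. The function names are hypothetical.

```python
def warehouse_jdbc_url(server_hostname: str, warehouse_id: str) -> str:
    """Assemble a JDBC URL for a Databricks SQL warehouse.

    Assumes the jdbc:spark style and the /sql/1.0/warehouses/<id> HTTP path.
    The PAT is left out because it is entered in its own field.
    """
    if not server_hostname or "/" in server_hostname or ":" in server_hostname:
        raise ValueError("expected a bare host name such as adb-123.4.azuredatabricks.net")
    if not warehouse_id:
        raise ValueError("warehouse_id must not be empty")
    return (
        f"jdbc:spark://{server_hostname}:443/default;"
        "transportMode=http;ssl=1;"
        f"httpPath=/sql/1.0/warehouses/{warehouse_id};"
        "AuthMech=3;"
    )


def jdbc_params(url: str) -> dict[str, str]:
    """Split the ';'-separated key=value options that follow the host part."""
    _, _, options = url.partition(";")
    params = {}
    for item in options.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            params[key] = value
    return params


def is_sql_warehouse_url(url: str) -> bool:
    """True if httpPath points at a SQL warehouse rather than a cluster
    (cluster URLs use sql/protocolv1/o/<org>/<cluster-id>)."""
    http_path = jdbc_params(url).get("httpPath", "")
    return http_path.lstrip("/").startswith("sql/1.0/warehouses/")
```

A check like is_sql_warehouse_url can catch the common mistake of pasting the cluster (HiveServer) URL into the Databricks SQL Warehouse JDBC URL field.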

Creating Databricks SQL Warehouse connection

  1. From the Toolbox, click Setup, then choose Connections.

  2. From the Actions menu ( ⋮ ), click Add Connection.

  3. Enter a Name for the connection.

  4. From the Category drop-down list, select the BUILD option.

  5. From the Provider list, select the Databricks option, and provide the following:

    1. Databricks Cluster Id: Enter the ID of your Databricks cluster.
      To obtain this ID, click the cluster name on the Clusters page in Databricks. The page URL has the form https://<databricks-instance>/#/settings/clusters/<cluster-id>

    2. Databricks Service Address: Enter the URL of your Databricks workspace.

    3. Databricks Personal Access Token: Enter the personal access token used to access and connect to your Databricks workspace. Refer to the Databricks documentation to get your token.

  6. To also use this connection as a read connection, select the Use as Source checkbox, and then provide the Hive Server JDBC URL. You can find this URL in Kyvos Manager: navigate to the Hadoop Ecosystem Configuration page > Hadoop Parameters, and copy the value of the kyvos.hiveserver2.jdbc.url parameter, which is provided in the following format:
    jdbc:spark://adb-<Databricks ID>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<organization ID>/<Databricks cluster ID>;AuthMech=3;
    Ensure that you update the Databricks ID, Organization ID, and Databricks Cluster ID to match the cluster that you are working on.

  7. Select Databricks SQL Warehouse as the SQL engine.
    See the Provider parameters table for details.

    1. Enter the Databricks SQL Warehouse JDBC URL. For more information, see the Microsoft documentation.

    2. Enter the Databricks SQL Personal Access Token (PAT) for the Databricks SQL workspace.
      NOTE: This token is optional. If you do not provide it, the system attempts to connect using the PAT provided for the Databricks connection.

  8. To use this connection as the default SQL engine, select the Is Default SQL Engine checkbox.

  9. Enter the Min Workers and Max Workers values.

  10. Select the Use Same Instance Pool For Worker And Driver checkbox to use the same pool of cheaper nodes (preferably Standard_DS3_v2) for the Spark driver as well as the workers.

  11. Enter the Instance Pool Id. The runtime version of the instance pool must match that of the cluster specified in Databricks Cluster Id. You can provide the ID of an existing instance pool or create a new one.

  12. Enter your Spark Config to fine-tune Spark job performance. Separate each property's key and value with a space, and separate multiple properties with spaces as well. For more information, see the Microsoft documentation.

  13. Enter Tags (JSON String) for the cluster by providing the tags in JSON format. Both the cluster-level tags and the tags inherited from pools are applied. For more information, see the Microsoft documentation.

  14. Configure the Cluster Log Path (DBFS), the location where the system persists the Spark job logs.
      NOTE: If you leave it blank, the system persists the logs at dbfs:/cluster-logs.

  15. Select the Catalog Enabled checkbox. This is mandatory for Databricks SQL.

  16. Click the Properties link to view or set properties.

  17. After you finish configuring the settings using the table shown below, click the Test button at the top left to validate the connection settings. If the connection is valid, click the Save button.

  18. To refresh connections, click the Actions menu ( ⋮ ) at the top of the Connections column and select Refresh.
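Steps 5 and 6 involve two manual lookups: reading the cluster ID out of the cluster page URL, and substituting IDs into the kyvos.hiveserver2.jdbc.url template. The sketch below automates both, assuming the ID is the path segment right after "clusters/" (newer Databricks UIs may append further segments, which the pattern tolerates) and mirroring the URL template exactly as printed in step 6. The function names are hypothetical, and the values in the usage note are placeholders.

```python
import re

# Assumption: the cluster ID is the path segment that follows "/clusters/",
# as in https://<databricks-instance>/#/settings/clusters/<cluster-id>.
_CLUSTER_ID = re.compile(r"/clusters/([^/?#]+)")


def cluster_id_from_page_url(page_url: str) -> str:
    """Extract the Databricks cluster ID from a cluster page URL."""
    match = _CLUSTER_ID.search(page_url)
    if match is None:
        raise ValueError(f"no cluster ID found in {page_url!r}")
    return match.group(1)


def hiveserver_jdbc_url(databricks_id: str, organization_id: str, cluster_id: str) -> str:
    """Fill in the kyvos.hiveserver2.jdbc.url template shown in step 6."""
    return (
        f"jdbc:spark://adb-{databricks_id}.azuredatabricks.net:443/default;"
        "transportMode=http;ssl=1;"
        f"httpPath=sql/protocolv1/o/{organization_id}/{cluster_id};"
        "AuthMech=3;"
    )
```

For example, cluster_id_from_page_url("https://adb-1.azuredatabricks.net/#/settings/clusters/0123-456789-abcde123") returns "0123-456789-abcde123", which you can then pass to hiveserver_jdbc_url together with your Databricks ID and organization ID.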
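The Spark Config (step 12) and Tags (JSON String) (step 13) fields are free text, so a typo only surfaces when the job runs. The sketch below checks both before you paste them in. It assumes that Spark Config keys and values contain no spaces (so tokens can be paired in order) and that tags form a flat JSON object with string values, matching the key/value map that Databricks cluster tags use. The function names are hypothetical.

```python
import json


def parse_spark_config(text: str) -> dict[str, str]:
    """Parse 'key value key value ...' into a dict.

    Assumes keys and values contain no spaces, so whitespace-separated
    tokens can be paired in order.
    """
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens; missing a value for {tokens[-1]!r}")
    return dict(zip(tokens[0::2], tokens[1::2]))


def parse_tags(text: str) -> dict[str, str]:
    """Check that Tags (JSON String) is a flat JSON object of string values."""
    tags = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(tags, dict):
        raise ValueError("tags must be a JSON object")
    for key, value in tags.items():
        if not isinstance(value, str):
            raise ValueError(f"tag {key!r} must have a string value")
    return tags
```

For example, "spark.sql.shuffle.partitions 200 spark.executor.memory 4g" parses into two properties, while {"cost": 5} is rejected because the tag value is a number rather than a string.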
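Several of the rules in steps 9 to 15 can be checked together before you click Test: the worker range, the dbfs:/cluster-logs fallback for a blank Cluster Log Path, and the requirement to select Catalog Enabled for Databricks SQL. The sketch below models only those fields. The class and field names are hypothetical, the "Databricks SQL Warehouse" engine label is assumed, and the checks that worker counts are non-negative and that the log path starts with dbfs:/ are assumptions rather than documented Kyvos rules.

```python
from dataclasses import dataclass

# Documented fallback when Cluster Log Path (DBFS) is left blank (step 14).
DEFAULT_LOG_PATH = "dbfs:/cluster-logs"


@dataclass
class BuildConnectionSettings:
    """Hypothetical model of a few fields from the connection form."""
    min_workers: int
    max_workers: int
    sql_engine: str = "Databricks SQL Warehouse"  # assumed engine label
    catalog_enabled: bool = False
    cluster_log_path: str = ""

    def effective_log_path(self) -> str:
        """Return the log path the system would use, applying the default."""
        return self.cluster_log_path.strip() or DEFAULT_LOG_PATH

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means none were found."""
        issues = []
        if self.min_workers < 0 or self.max_workers < 0:
            issues.append("worker counts must not be negative")
        if self.min_workers > self.max_workers:
            issues.append("Min Workers must not exceed Max Workers")
        if self.sql_engine == "Databricks SQL Warehouse" and not self.catalog_enabled:
            issues.append("Catalog Enabled is mandatory for Databricks SQL")
        if not self.effective_log_path().startswith("dbfs:/"):
            issues.append("Cluster Log Path should be a DBFS location (dbfs:/...)")
        return issues
```

For example, BuildConnectionSettings(min_workers=2, max_workers=8, catalog_enabled=True) reports no problems and resolves its blank log path to dbfs:/cluster-logs.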

...