Kyvos on Azure - SQL Queries from Python Code in Databricks Notebook

Applies to: Kyvos Enterprise Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)

This section explains the steps and procedures required to install and configure Azure Databricks machine, connect to Kyvos semantic model using Python and execute SQL queries on Kyvos semantic model. Although these steps are performed on Databricks, you can use the same steps on any Linux machine running Ubuntu OS.

Notes

The steps given in this section are for the Ubuntu machine and Debian ODBC package. However, for other flavors of Linux, the package would change to rpm.
These steps are executed in Python notebook on Databricks and all the progress is lost when the Databricks cluster is terminated. To automate these steps, you can write init scripts which are shell scripts that run at the startup of each cluster.

Step 1: Install pyodbc
1. Create a Databricks Notebook in Python and run the following command to install the pyodbc library.

%sh
apt-get update
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

2. Execute the following command from Databricks notebook, to verify if the library has been installed successfully. Select the Language as python for the notebook.

import pyodbc

Step 2: Install unixODBC and other dependencies

%sh
apt-get -y install unixodbc-dev
sudo apt install libsasl2-modules-gssapi-mit

Step 3: Download and Install Simba Spark ODBC
1. Download the Simba_Spark_ODBC_Debian.zip file using the following command.

2. Unzip and install the package.

Note

The steps listed here are recommended in Simba spark connector documentation. If you installed the driver in a different directory specify the path accordingly.

Step 4: Configure Environment Variable
1. Specify the locations of the Driver Configuration files.

2. Specify the path for the ODBC Driver Managers library.

Step 5: Configure DSN
1. Configure the DSN parameters in the ODBC.ini file.
a. View the template that comes with the connector, using the following command.

b. Overwrite the ODBC.ini file, using the following command.

Here,

Kyvos supports ONLY the following configurations

SparkServerType=3
Here, 3 stands for SparkThrist Server

ThriftTransport=2
Here, 2 stands for HTTP protocol

AuthMech=1 or AuthMech=3
Here, 1 stands for Kerberos, and 3 stands for Username and Password authentication mechanism.

Important

Port 8081 is used for HTTP and 443 is used for HTTPS (SSL enabled).

To configure SSL, set the value of SSL=1

If you have configured SSL, you will need to add your certificate (PEM format) to the following file

%sh echo "Your certificate content here" >> /opt/simba/spark/lib/64/cacerts.pem

2. Check the ODBC.ini and other configuration files are present in /etc/.

3. Install the ODBC driver and provide the template.
The below command installs an instance of the ODBC driver based on the parameters provided in the template file.

Step 6: Python Code to create connection and execute SQL query

Optional Step:

Configure simba.sparkodbc.ini, overwrite LogLevel = 5 (debug), and specify a location for the LogPath to generate connection logs to debug any error in the connection.

This will create 2 log files in the /ODBC_logs/ folder:

log – created one time for each instance of the driver installed
log – created for each connection