Kyvos on Azure - SQL Queries from Python Code in Databricks Notebook

Applies to:Kyvos Enterprise Kyvos Cloud Kyvos Azure Marketplace Kyvos Single Node Installation

Kyvos Single Node Standard Installation Kyvos Free

This section explains the steps and procedures required to install and configure Azure Databricks machine, connect to Kyvos semantic model using Python and execute SQL queries on Kyvos semantic model . Although these steps are performed on Databricks, you can use the same steps on any Linux machine running Ubuntu OS.

The steps given in this section are for the Ubuntu machine and Debian ODBC package. However, for other flavors of Linux, the package would change to rpm.
These steps are executed in Python notebook on Databricks and all the progress is lost when the Databricks cluster is terminated. To automate these steps, you can write init scripts which are shell scripts that run at the startup of each cluster.

Step 1: Install pyodbc
1. Create a Databricks Notebook in Python and run the following command to install the pyodbc library.

%sh
apt-get update
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

2. Execute the following command from Databricks notebook, to verify if the library has been installed successfully. Select the Language as python for the notebook.

Import pyodbc

Step 2: Install unixODBC and other dependencies

%sh
apt-get -y install unixodbc-dev
sudo apt install libsasl2-modules-gssapi-mit

Step 3: Download and Install Simba Spark ODBC
1. Download the Simba_Spark_ODBC_Debian.zip file using the following command.

%sh
wget https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/odbc/2.6.17/SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip

2. Unzip and install the package.

%sh
unzip SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip
dpkg --install SimbaSparkODBC-2.6.17.0024-Debian-64bit/simbaspark_2.6.17.0024-2_amd64.deb

The steps listed here are recommended in Simba spark connector documentation. If you installed the driver in a different directory specify the path accordingly.

Step 4: Configure Environment Variable
1. Specify the locations of the Driver Configuration files.

%sh
export ODBCINI=/etc/odbc.ini
export ODBCSYSINI=/etc
export SIMBASPARKINI=/etc/simba.sparkodbc.ini

2. Specify the path for the ODBC Driver Managers library.

%sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/simba/spark/lib/

Step 5: Configure DSN
1. Configure the DSN parameters in the ODBC.ini file.
a. View the template that comes with the connector, using the following command.

%sh
cat /opt/simba/spark/Setup/odbc.ini

b. Overwrite the ODBC.ini file, using the following command.

%sh
echo "[Simba Spark 64-bit]
Description=Simba Spark ODBC Driver (64-bit) DSN
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
HOST=start.trial.kyvosinsights.com
PORT=443
SparkServerType=3
AuthMech=3
ThriftTransport=2
UseNativeQuery=0
UID=user
PWD=abshgf
HTTPPath = kyvos/sql
SSL=1
" > /opt/simba/spark/Setup/odbc.ini

Here,

Kyvos supports ONLY the following configurations

SparkServerType=3
Here, 3 stands for SparkThrist Server

ThriftTransport=2
Here, 2 stands for HTTP protocol

AuthMech=1 or AuthMech=3
Here, 1 stands for Kerberos, and 3 stands for Username and Password authentication mechanism.

Port 8081 is used for HTTP and 443 is used for HTTPS (SSL enabled).

To configure SSL, set the value of SSL=1

If you have configured SSL, you will need to add your certificate (PEM format) to the following file

%sh echo "Your certificate content here" >> /opt/simba/spark/lib/64/cacerts.pem

2. Check the ODBC.ini and other configuration files are present in /etc/.

3. Install the ODBC driver and provide the template.
The below command installs an instance of the ODBC driver based on the parameters provided in the template file.

Step 6: Python Code to create connection and execute SQL query

Import pyodbc
conn = pyodbc.connect( 'DRIVER={Simba Spark 64-bit};'
                       'HOST=start.trial.kyvosinsights.com;PORT=80;'
                       'HTTPPath=kyvos/sql;'
                       'AuthMech=3;'
                       'SparkServerType=3;'
                       'ThriftTransport=2;'
                      'IgnoreTransactions=1;'
                       'UID=user;'
                       'PWD=abshgf')
# Example getting records back from stored procedure (could also be a SELECT statement)
cursor = conn.cursor()
# Example doing a simple execute
cursor.execute('SELECT `nike_uc4`.`athlete_nm` AS `athlete_nm`,   SUM(`nike_uc4`.`demand`) AS `sum_demand_ok` FROM `user`.`nike_uc4` `nike_uc4` GROUP BY `nike_uc4`.`athlete_nm`')
#iterate the results
D1=cursor.fetchall()
for showD1 in D1:
  print(showD1)

Optional Step:

Configure simba.sparkodbc.ini, overwrite LogLevel = 5 (debug), and specify a location for the LogPath to generate connection logs to debug any error in the connection.

%sh
echo "[Driver]
DriverManagerEncoding=UTF-16
ErrorMessagesPath=/opt/simba/spark/ErrorMessages/
LogLevel=5
LogPath=/odbc_logs/
SwapFilePath=/tmp
ODBCInstLib=libodbcinst.so.1" >/opt/simba/spark/lib/64/simba.sparkodbc.ini

This will create 2 log files in the /ODBC_logs/ folder:

log – created one time for each instance of the driver installed
log – created for each connection