Applies to: Kyvos Enterprise, Kyvos Azure Marketplace, Kyvos AWS Marketplace, Kyvos Free
This section explains the steps required to install and configure an Azure Databricks machine, connect to a Kyvos semantic model using Python, and execute SQL queries against the semantic model. Although these steps are performed on Databricks, you can use the same steps on any Linux machine running Ubuntu.
- The steps given in this section are for an Ubuntu machine and the Debian ODBC package. For other flavors of Linux, the package would change to rpm.
- These steps are executed in a Python notebook on Databricks, and all progress is lost when the Databricks cluster is terminated. To automate these steps, you can write init scripts, which are shell scripts that run at the startup of each cluster; a minimal sketch follows these notes.
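For reference, the following is a minimal init script sketch that consolidates the install commands from Steps 1 to 3 below. It is an illustration rather than a tested script; the package versions and paths are the ones used in this section, so adjust them for your environment.
#!/bin/bash
# Hypothetical init script consolidating the install commands from Steps 1-3.
# Init scripts run as root, so sudo is not needed here.
set -e

# Step 1: pip and pyodbc
apt-get update
apt-get install -y python3-pip
pip3 install --upgrade pyodbc

# Step 2: unixODBC and the SASL GSSAPI module
apt-get install -y unixodbc-dev libsasl2-modules-gssapi-mit

# Step 3: download and install the Simba Spark ODBC driver
wget https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/odbc/2.6.17/SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip
unzip SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip
dpkg --install SimbaSparkODBC-2.6.17.0024-Debian-64bit/simbaspark_2.6.17.0024-2_amd64.deb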
Step 1: Install pyodbc
1. Create a Databricks notebook in Python and run the following commands to install the pyodbc library.
%sh
apt-get update
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc
2. Execute the following command from the Databricks notebook to verify that the library has been installed successfully. Select Python as the language for the notebook.
import pyodbc
Step 2: Install unixODBC and other dependencies
%sh
apt-get -y install unixodbc-dev
sudo apt-get -y install libsasl2-modules-gssapi-mit
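To confirm that unixODBC is installed, you can ask the driver manager where it looks for its configuration files; the paths it reports should correspond to the files used in the later steps.
%sh
# Print the unixODBC version and its configuration file locations
odbcinst -j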
Step 3: Download and Install Simba Spark ODBC
1. Download the SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip file using the following command.
%sh wget https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/odbc/2.6.17/SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip
2. Unzip and install the package.
%sh
unzip SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip
dpkg --install SimbaSparkODBC-2.6.17.0024-Debian-64bit/simbaspark_2.6.17.0024-2_amd64.deb
The steps listed here are the ones recommended in the Simba Spark connector documentation. If you installed the driver in a different directory, specify the paths accordingly.
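As a quick check, you can verify that the driver's shared library is at the path the DSN in Step 5 expects (this path assumes the default install location):
%sh
# Confirm the driver library exists at the default location
ls -l /opt/simba/spark/lib/64/libsparkodbc_sb64.so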
Step 4: Configure Environment Variables
1. Specify the locations of the Driver Configuration files.
%sh
export ODBCINI=/etc/odbc.ini
export ODBCSYSINI=/etc
export SIMBASPARKINI=/etc/simba.sparkodbc.ini
2. Specify the path for the ODBC Driver Manager's library.
%sh export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/simba/spark/lib/
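Note that each %sh cell runs in a fresh shell, so exported variables do not carry over to later cells. If a subsequent command needs these variables, a single cell along the following lines sets them all and verifies them together:
%sh
# Set all driver-related variables and confirm them in one cell
export ODBCINI=/etc/odbc.ini
export ODBCSYSINI=/etc
export SIMBASPARKINI=/etc/simba.sparkodbc.ini
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/simba/spark/lib/
env | grep -E 'ODBC|SIMBA|LD_LIBRARY'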
Step 5: Configure DSN
1. Configure the DSN parameters in the odbc.ini file.
a. View the template that comes with the connector, using the following command.
%sh cat /opt/simba/spark/Setup/odbc.ini
b. Overwrite the odbc.ini file using the following command.
%sh echo "[Simba Spark 64-bit] Description=Simba Spark ODBC Driver (64-bit) DSN Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so HOST=start.trial.kyvosinsights.com PORT=443 SparkServerType=3 AuthMech=3 ThriftTransport=2 UseNativeQuery=0 UID=user PWD=abshgf HTTPPath = kyvos/sql SSL=1 " > /opt/simba/spark/Setup/odbc.ini
Here, Kyvos supports ONLY the following configurations:
- SparkServerType=3, where 3 stands for Spark Thrift Server.
- ThriftTransport=2, where 2 stands for the HTTP protocol.
- AuthMech=1 or AuthMech=3, where 1 stands for Kerberos and 3 stands for the Username and Password authentication mechanism.
- Port 8081 is used for HTTP, and 443 is used for HTTPS (SSL enabled).
- To configure SSL, set SSL=1.
If you have configured SSL, you need to add your certificate (PEM format) to the following file:
%sh echo "Your certificate content here" >> /opt/simba/spark/lib/64/cacerts.pem
2. Check that the odbc.ini and other configuration files are present in /etc/.
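For example, assuming the file names referenced by the environment variables in Step 4:
%sh
# Confirm the configuration files exist under /etc
ls -l /etc/odbc.ini /etc/odbcinst.ini /etc/simba.sparkodbc.ini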
3. Install the ODBC driver and provide the template.
The command below installs an instance of the ODBC driver based on the parameters provided in the template file.
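With unixODBC, this is typically done with odbcinst; the following is a sketch, assuming the template shipped at the driver's default install location:
%sh
# Register the Simba Spark driver with the driver manager from the template
# (the template path is an assumption; adjust it to match your installation)
odbcinst -i -d -f /opt/simba/spark/Setup/odbcinst.ini
Once the DSN is present in /etc/odbc.ini, you can also test it from the shell with unixODBC's isql utility, for example: isql -v "Simba Spark 64-bit".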
Step 6: Python code to create a connection and execute a SQL query
import pyodbc

# Create the connection using the Simba Spark ODBC driver
conn = pyodbc.connect(
    'DRIVER={Simba Spark 64-bit};'
    'HOST=start.trial.kyvosinsights.com;'
    'PORT=80;'
    'HTTPPath=kyvos/sql;'
    'AuthMech=3;'
    'SparkServerType=3;'
    'ThriftTransport=2;'
    'IgnoreTransactions=1;'
    'UID=user;'
    'PWD=abshgf')

# Get records back with a cursor (could also be a stored procedure
# instead of a SELECT statement)
cursor = conn.cursor()

# Example doing a simple execute
cursor.execute('SELECT `nike_uc4`.`athlete_nm` AS `athlete_nm`, '
               'SUM(`nike_uc4`.`demand`) AS `sum_demand_ok` '
               'FROM `user`.`nike_uc4` `nike_uc4` '
               'GROUP BY `nike_uc4`.`athlete_nm`')

# Iterate over the results
D1 = cursor.fetchall()
for showD1 in D1:
    print(showD1)

# Close the cursor and connection when done
cursor.close()
conn.close()
Optional Step:
Configure simba.sparkodbc.ini, overwriting LogLevel=5 (debug) and specifying a location for LogPath, to generate connection logs that help debug any connection errors.
%sh echo "[Driver] DriverManagerEncoding=UTF-16 ErrorMessagesPath=/opt/simba/spark/ErrorMessages/ LogLevel=5 LogPath=/odbc_logs/ SwapFilePath=/tmp ODBCInstLib=libodbcinst.so.1" >/opt/simba/spark/lib/64/simba.sparkodbc.ini
This will create two log files in the /odbc_logs/ folder:
- a driver log, created once for each instance of the driver installed
- a connection log, created for each connection
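The driver may not create the log directory itself, so it is worth creating it before reconnecting, then listing it to confirm the files appear (the directory name is taken from the LogPath above):
%sh
# Create the log directory if needed, then list it after a connection attempt
mkdir -p /odbc_logs
ls -l /odbc_logs/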