For Kubernetes, Kyvos builds the semantic model using Google Cloud's managed service GKE (Google Kubernetes Engine). The GKE cluster is deployed through Google Deployment Manager scripts.
Using the scripts, you can either select a no-Spark cluster or process the semantic model in Spark mode. Currently, to use the no-Spark deployment mode, you must use a Dataproc cluster (either new or existing).
Prerequisites
Before deploying the Kubernetes cluster, it is recommended that you refer to the Prerequisites for deploying Kyvos in a GCP environment section for the complete set of permissions required for deploying Kyvos.
Additionally, for creating a GKE cluster, you must complete the following prerequisites.
Create a GKE cluster
Case 1: If using an existing Virtual Network, creating a GKE cluster requires two secondary IPv4 address ranges in the subnet. Additionally, if using a shared Virtual Network, the default Kubernetes service account (service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com) requires the following roles and permissions on the project that hosts the shared Virtual Network:
Compute Network User
kubernetes_role (a custom role that you create with the following permissions):
compute.firewalls.create
compute.firewalls.delete
compute.firewalls.get
compute.firewalls.list
compute.firewalls.update
compute.networks.updatePolicy
compute.subnetworks.get
container.hostServiceAgent.use
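The shared Virtual Network grants above can be scripted. The following is a minimal sketch that only prints the gcloud commands so they can be reviewed before they are run. HOST_PROJECT_ID and PROJECT_NUMBER are placeholders, and the sketch assumes that every permission listed above belongs to the custom kubernetes_role and that roles/compute.networkUser is the role ID behind Compute Network User.

```shell
#!/bin/sh
# Print (do not run) the gcloud commands for the shared Virtual Network grants.
# $1 = host project ID, $2 = number of the project that owns the GKE cluster.
print_shared_vpc_grants() {
  host_project="$1"
  project_number="$2"
  member="serviceAccount:service-${project_number}@container-engine-robot.iam.gserviceaccount.com"
  # Assumption: all permissions listed in the document go into the custom role.
  perms="compute.firewalls.create,compute.firewalls.delete,compute.firewalls.get,compute.firewalls.list,compute.firewalls.update,compute.networks.updatePolicy,compute.subnetworks.get,container.hostServiceAgent.use"

  # Step 1: create the custom role in the host project.
  echo "gcloud iam roles create kubernetes_role --project=${host_project} --title=kubernetes_role --permissions=${perms}"

  # Step 2: grant Compute Network User and the custom role to the GKE service account.
  for role in "roles/compute.networkUser" "projects/${host_project}/roles/kubernetes_role"; do
    echo "gcloud projects add-iam-policy-binding ${host_project} --member=${member} --role=${role}"
  done
}

# Example invocation with placeholder values; replace them before use.
print_shared_vpc_grants HOST_PROJECT_ID PROJECT_NUMBER
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and leaves the final decision with whoever administers the host project.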
Case 2: If using an existing IAM service account
Add the following roles to the existing IAM service account:
roles/iam.serviceAccountTokenCreator (Service Account Token Creator)
roles/container.developer (Kubernetes Engine Developer)
roles/container.clusterAdmin (Kubernetes Engine Cluster Admin)
Add the following permissions to the Kyvos role:
compute.instanceGroupManagers.update
compute.instanceGroupManagers.get
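As an illustration, the roles and permissions for Case 2 could be applied with commands like the ones this sketch prints. It assumes the roles are granted at the project level and that the Kyvos role is a custom role in the same project. PROJECT_ID, SA_EMAIL, and KYVOS_ROLE_ID are hypothetical placeholders, not names taken from the Kyvos scripts.

```shell
#!/bin/sh
# Print (do not run) the gcloud commands that add the roles and permissions above.
# $1 = project ID, $2 = e-mail of the existing IAM service account,
# $3 = ID of the custom Kyvos role (hypothetical placeholder).
print_service_account_grants() {
  project_id="$1"
  sa_email="$2"
  kyvos_role_id="$3"

  # Assumption: the three roles are bound at the project level.
  for role in roles/iam.serviceAccountTokenCreator roles/container.developer roles/container.clusterAdmin; do
    echo "gcloud projects add-iam-policy-binding ${project_id} --member=serviceAccount:${sa_email} --role=${role}"
  done

  # Add the two instance group manager permissions to the custom Kyvos role.
  echo "gcloud iam roles update ${kyvos_role_id} --project=${project_id} --add-permissions=compute.instanceGroupManagers.update,compute.instanceGroupManagers.get"
}

# Example invocation with placeholder values; replace them before use.
print_service_account_grants PROJECT_ID SA_EMAIL KYVOS_ROLE_ID
```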
Kyvos Deployment in a GCP environment
Update the following parameters in kyvos-template.yaml (provided with the Google Deployment Manager scripts) to create the Kubernetes cluster:
gkeSubnetName
secondaryRangeName1
secondaryRangeName2
dataprocMetastoreURI
createGKE
gkeWorkerInitialNodeCount
gkeWorkerInstancetype
minWorkerNodeCount
maxWorkerNodeCount
Note: Refer to the Kyvos Deployment document for the other parameters that must be updated to create the remaining resources and deploy Kyvos.
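Before starting a deployment, it can help to confirm that none of the GKE parameters listed above was dropped from the template. The helper below is a rough sketch: check_gke_params is a hypothetical function name, and it assumes each parameter appears in the file as a `name:` key, which may not match the actual layout of kyvos-template.yaml.

```shell
#!/bin/sh
# Report which GKE-related parameters are missing from a template file.
# Assumption: each parameter appears as "name:" somewhere in the file.
# Returns 0 when all parameters are found, 1 otherwise.
check_gke_params() {
  template="$1"
  missing=0
  for param in gkeSubnetName secondaryRangeName1 secondaryRangeName2 \
               dataprocMetastoreURI createGKE gkeWorkerInitialNodeCount \
               gkeWorkerInstancetype minWorkerNodeCount maxWorkerNodeCount; do
    if ! grep -q "${param}[[:space:]]*:" "$template"; then
      echo "missing: ${param}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_gke_params kyvos-template.yaml` prints one line per missing parameter and returns a non-zero status if any is absent. It checks only for presence, not for valid values.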
If using an existing service account:
Once the resources are created, run the following commands with the gcloud CLI to link the Kubernetes service account to the IAM service account.
gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@IAM_SA_PROJECT_ID.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:PROJECT_ID.svc.id.goog[kyvos-monitoring/default]"
gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@IAM_SA_PROJECT_ID.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:PROJECT_ID.svc.id.goog[kyvos-compute/default]"
Replace the following:
IAM_SA_NAME: The name of your IAM service account.
IAM_SA_PROJECT_ID: The project ID of your IAM service account.
PROJECT_ID: The ID of your Google Cloud project.
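The two binding commands differ only in the namespace, so the placeholder substitution can be done once. The following sketch (print_workload_identity_bindings is a hypothetical helper name) prints both commands with the values filled in, ready to be reviewed and pasted into a terminal.

```shell
#!/bin/sh
# Print the Workload Identity binding command for each Kyvos namespace.
# $1 = IAM_SA_NAME, $2 = IAM_SA_PROJECT_ID, $3 = PROJECT_ID
print_workload_identity_bindings() {
  iam_sa_name="$1"
  iam_sa_project_id="$2"
  project_id="$3"
  for ns in kyvos-monitoring kyvos-compute; do
    # The member is the "default" Kubernetes service account of the namespace.
    echo "gcloud iam service-accounts add-iam-policy-binding ${iam_sa_name}@${iam_sa_project_id}.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member \"serviceAccount:${project_id}.svc.id.goog[${ns}/default]\""
  done
}

# Example invocation with placeholder values; replace them before use.
print_workload_identity_bindings IAM_SA_NAME IAM_SA_PROJECT_ID PROJECT_ID
```

After running the printed commands, `gcloud iam service-accounts get-iam-policy` on the same service account can be used to confirm that both members appear under roles/iam.workloadIdentityUser.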
After deployment, complete the following steps:
Set property in connections: Users must add the following property from the Kyvos connections page:
kyvos.connection.readUsingCustomFS.jobs.internal=NONE
Set properties at cube level: Users should modify the values of the following properties in the advanced properties of the cube build:
kyvos.process.compute.type=KYVOS_COMPUTE
kyvos.build.aggregate.type=TABULAR
Set the following property in the Hadoop connection properties, and then restart the Kyvos services:
Property - kyvos.process.datastore.properties
Value - SET disabled_optimizers = 'join_order';SET memory_limit='40GB';SET threads TO 1;
Note: To change the Kyvos compute cluster mode, users must modify the value of the KYVOS_PROCESS_COMPUTE_SUBTYPE property on the KM Kyvos Properties page.
Debugging
The GKE cluster consists of two namespaces:
a. kyvos-compute: This namespace hosts all Kyvos computation workers.
b. kyvos-monitoring: This namespace hosts the Kyvos monitoring server, which creates and scales the Kyvos computation workers.
kubectl commands:
View the list of running pods across all namespaces:
kubectl get pods --all-namespaces
View the Google Kubernetes Engine (GKE) worker nodes:
kubectl get nodes
View the monitoring pods in the kyvos-monitoring namespace:
kubectl get pods -n kyvos-monitoring
View the Kyvos compute worker pods in the kyvos-compute namespace:
kubectl get pods -n kyvos-compute
View the bootup logs of Kyvos compute worker pods:
kubectl logs kyvos-compute-pod-name -n kyvos-compute -c kyvos-compute-worker
View the logs of the Kyvos monitoring pod:
kubectl logs kyvos-monitoring-server-pod-name -n kyvos-monitoring -c kyvos-monitoring-server
If the pods do not come up and remain in the Pending state, describe them to see the related events:
For Kyvos compute:
kubectl describe pods kyvos-compute-pod-name -n kyvos-compute
For Kyvos monitoring:
kubectl describe pods kyvos-monitoring-server-pod-name -n kyvos-monitoring
Open a shell in a pod:
kubectl exec -it kyvos-compute-pod-name -n kyvos-compute -c kyvos-compute-worker -- bash
Replace:
`kyvos-compute-pod-name` with the actual name of the pod obtained using `kubectl get pods -n kyvos-compute`.
`kyvos-monitoring-server-pod-name` with the actual name of the pod obtained using `kubectl get pods -n kyvos-monitoring`.
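When several pods are stuck, looking up and describing each one by hand gets tedious. The sketch below combines the commands above into one pass over both namespaces. describe_pending_pods is a hypothetical helper name, and the sketch assumes kubectl is already configured for the GKE cluster.

```shell
#!/bin/sh
# Describe every pod that is in the Pending phase in the two Kyvos namespaces.
# Assumption: kubectl is installed and its current context is the GKE cluster.
describe_pending_pods() {
  for ns in kyvos-compute kyvos-monitoring; do
    # "-o name" prints one "pod/<name>" per line; the field selector
    # keeps only pods whose phase is Pending.
    for pod in $(kubectl get pods -n "$ns" --field-selector=status.phase=Pending -o name); do
      echo "=== ${ns} ${pod} ==="
      kubectl describe "$pod" -n "$ns"
    done
  done
}
```

Calling `describe_pending_pods` with no arguments prints a header and the `kubectl describe` output for each pending pod, and prints nothing when no pod is pending. The Events section at the end of each description usually shows why the pod could not be scheduled.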