Kyvos now supports automated resource creation for GCP using Terraform.

To create Kyvos resources, read the following:

Prerequisites to deploy Kyvos

  • You need a valid Google Cloud Platform account. This account will be used to authenticate Terraform to interact with GCP resources.

  • The following permissions must be given to the logged-in user account:

    • Editor

    • Secret Manager Admin

    • Storage Object Admin

    • Create a custom role, assign the following permissions to it, and attach the role to the logged-in user account.

      • iam.roles.create  

      • iam.serviceAccounts.setIamPolicy

      • resourcemanager.projects.setIamPolicy

  • Manual resource creation: Google Console users should have the privilege to launch Google resources like Instances, Dataproc cluster, Google Storage, and Disks in the Project.  

  • Dataproc Service Agent service account: Dataproc creates this service account with the Dataproc Service Agent role in a Dataproc user's Google Cloud project. This service account cannot be replaced by a user-specified service account when you create a cluster. This service agent account is used to perform Dataproc control plane operations, such as creating, updating, and deleting cluster VMs. Please refer to Dataproc Service Agent (Control Plane identity) for details.
    By default, Dataproc uses the service-[project-number]@dataproc-accounts.iam.gserviceaccount.com as the service agent account. If that service account doesn't exist, Dataproc uses the Google APIs service agent account, [project-number]@cloudservices.gserviceaccount.com, for control plane operations.   

    Permissions required:

    1. The above service account must have the Dataproc Service Agent predefined role attached

    2. Compute Network User: If using a Shared Network, grant the above service account the 'Compute Network User' predefined role to the project where the network originally resides.

  • Kyvos needs a service account to launch the Kyvos instance. Refer to the steps given in the Service Account section to create it.

  • The logged-in user will need access to the VPC, Subnet, Network Interface/Security Group, and Service Account that Kyvos will use to launch compute engines, Dataproc, and Instance Group.

  • Ensure that the following ports are opened/allowed in the Firewall inbound rules for all internal communication between Kyvos instances.
    2121, 2181, 2888, 3888, 4000, 6602, 6903, 6703, 45450, 45460, 45461, 45462, 45463, 45464, 45465, 6603, 6702, 6803, 7003, 45440, 6605, 45421, 45564, 8080, 8081, 8005, 8009, 8443, 8444, 9443, 22, and 9444.

  • Ensure that the following ports are opened/allowed in the Firewall inbound rules for all internal communication between the Dataproc cluster and Kyvos.
    3306, 8030, 8031, 8032, 8033, 8042, 8088, 9083, 8188, 18080, 8050, 8051, 8020, 10020, 19888, 10033, 9870, 10200, 10000, 10002, 22, 45460, 9866, 8998, and 9867.
    NOTE: The port 8998 is required for Livy.

  • Ports 22, 8080, and 8081 should be accessible from outside of the cluster from where you want to access the Web application.

  • Create a firewall rule with all ports open between the Dataproc master and worker nodes, using the network tags that will be attached to the Dataproc cluster as targets.
    For more information about the required ports between the Dataproc master nodes and the worker nodes, refer to GCP documentation at: Dataproc Cluster Network Configuration

  • If the Kyvos instances and Dataproc clusters are launched in different VPCs/Subnets, Network Peering must be created between both networks.

  • There should be a private and public key for creating the Kyvos instances and the Dataproc cluster.

  • Kyvos will need the Storage Legacy Bucket Owner role on the storage bucket to store data (semantic models).

  • To access the storage bucket from the Kyvos instances, a NAT Gateway in VPC or Endpoint between storage and VPC should be available.  

  • To send requests to your VPC network and receive the corresponding responses without using the public internet, you must use the Serverless VPC Access connector.
    Serverless VPC Access uses the Serverless VPC Access Service Agent service account. This service account's email address has the following form:

    service-PROJECT_NUMBER@gcp-sa-vpcaccess.iam.gserviceaccount.com

    Permissions required:

    1. By default, the above service account has the Serverless VPC Access Service Agent role (roles/vpcaccess.serviceAgent). Serverless VPC Access operations may fail if you change this account's permissions.

    2. If using a Shared Network, grant the above service account the Serverless VPC Access Service Agent predefined role to the project where the network originally resides.

      NOTE: You can refer to the GCP documentation to create a Serverless VPC Access connector. 

  • Create an Autoscaling policy using Kyvos recommended configuration for Dataproc.

  • Private Google Access must be enabled for the subnet that you will use for deploying Kyvos and Dataproc clusters.

  • To enable external Hive metastore, the role attached to the Kyvos Manager node must have the following permissions:

    1. resourcemanager.projects.list

    2. dataproc.clusters.get

    3. compute.instances.get

    4. If your bucket is in another project, then for cross-project bucket access, you must provide the following permissions on your bucket.

      • storage.objects.list

      • storage.objects.get

    5. For cross-project metastore and Dataproc, assign the following roles on the project having metastore. Refer to the GCP documentation for details.

      • Dataproc Service Agent

      • Dataproc Metastore Service Agent

  • Ensure that the Kyvos deployment and the Dataproc cluster for use with Kyvos run in the same Project and Region.

  • Kyvos recommended instance configuration:

    1. Machine type for Kyvos Manager, Query Engine, and BI Server
      Kyvos Manager: n2-standard-4
      Query Engine: n2-highmem-4
      BI Server: n2-standard-8

    2. Master and worker nodes of Dataproc cluster
      Master Node:
      Series: N2
      Machine Type: n2-highmem-4 (4 vCPU and 32 GB)

      Worker Node:
      Series: N2
      Machine Type: n2-highmem-8 (8 vCPU and 64 GB)

  • If the Dataproc cluster is in a different region, then under compute metadata VmDnsSetting, set the value as GlobalDefault.

  • For a non-SSH-based cluster, if you use an existing Dataproc cluster and an existing bucket, you must execute the dataproc.sh script (available in the GCP Installation Files folder) on the Dataproc master node after changing the values of DEPLOYMENT_BUCKET, WORK_DIR, COPY_LIB, and DATAPROC_VERSION to the name of the existing bucket.

  • To store repository credentials and other confidential credentials on the Secret Manager, you need to create a Secret.

  • To deploy the Kyvos cluster using password-based authentication for service nodes, ensure that the permissions listed here are available on all the VM instances for the Linux user deploying the cluster.

  • To deploy the Kyvos cluster using custom hostnames for resources, ensure that the steps listed here are completed on the resources created for use in the Kyvos cluster.

  • If using shared VPC, the VPC must be shared with the project that you want to access.

    1. Navigate to the VPC network.

    2. Click the Shared VPC.

    3. Go to the ATTACHED PROJECTS tab and attach the project.
      NOTE: This should be performed from the project where the shared VPC network originally resides.
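The inbound firewall prerequisite for internal communication between Kyvos instances could be scripted along the following lines. This is only a sketch under stated assumptions: the project, network, rule, and tag names are hypothetical placeholders, all listed ports are assumed to be TCP, and network tags are assumed to identify the Kyvos instances. The script prints the gcloud command by default and runs it only when DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical values; replace them with your own project, network, and tag.
PROJECT_ID="${PROJECT_ID:-my-project}"
NETWORK="${NETWORK:-kyvos-vpc}"
KYVOS_TAG="${KYVOS_TAG:-kyvos-node}"
RULE_NAME="${RULE_NAME:-kyvos-internal}"

# Ports required between Kyvos instances, as listed above (each port once).
PORTS="2121 2181 2888 3888 4000 6602 6903 6703 45450 45460 45461 45462 45463 45464 45465 6603 6702 6803 7003 45440 6605 45421 45564 8080 8081 8005 8009 8443 8444 9443 22 9444"

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the comma-separated value for --allow, e.g. tcp:2121,tcp:2181,...
# Assumption: all listed ports are TCP.
build_allow_list() {
  allow=""
  for port in $PORTS; do
    allow="${allow:+$allow,}tcp:$port"
  done
  printf '%s\n' "$allow"
}

# Create one ingress rule that applies between instances carrying the tag.
create_kyvos_internal_rule() {
  run gcloud compute firewall-rules create "$RULE_NAME" \
    --project="$PROJECT_ID" \
    --network="$NETWORK" \
    --direction=INGRESS \
    --allow="$(build_allow_list)" \
    --source-tags="$KYVOS_TAG" \
    --target-tags="$KYVOS_TAG"
}

create_kyvos_internal_rule
```

A similar rule with the Dataproc port list and the appropriate tags would cover communication between the Dataproc cluster and Kyvos.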

Note

The gcloud compute instances re

  • For additional permissions, refer to the Prerequisites for deploying Kyvos in a GCP environment using Deployment Manager section from Step 2 to Step 27.

  • When using an existing VPC, the subnet must have a minimum mask range of /22.

  • Subnets in which the Kubernetes cluster is launched should have connectivity to the subnets in which Kyvos instances are launched.

  • When using an existing VPC, ensure that the subnet has two secondary IP ranges with valid mask ranges, as these will be used by the Kubernetes cluster.

  • Click Roles > Create new role. Provide a name like Kyvos-role for storage service, and assign the following permissions. This role should be attached to the Kyvos service account.

    • deploymentmanager.deployments.list

    • deploymentmanager.resources.list

    • deploymentmanager.manifests.list

    • cloudfunctions.functions.get

    • dataproc.clusters.list

    • dataproc.clusters.get

    • compute.disks.setLabels

    • compute.instances.start

    • compute.instances.stop

    • compute.instances.list

    • compute.instances.setLabels

    • storage.buckets.get

    • storage.buckets.list

    • storage.objects.create

    • storage.objects.delete

    • storage.buckets.update

    • compute.disks.get

    • compute.instances.get

    • dataproc.clusters.update

    • storage.objects.get

    • storage.objects.list

    • storage.objects.update

    • cloudfunctions.functions.update

    • compute.subnetworks.get

    • resourcemanager.projects.getIamPolicy

    • compute.firewalls.list

    • iam.roles.get

    • compute.machineTypes.get

    • compute.machineTypes.list

    • compute.instances.setMachineType

    • compute.instances.setMetadata

  • Add the following predefined roles to the service account used by the Kyvos cluster.

    • BigQuery Data Viewer

    • BigQuery User

    • Dataproc Worker

    • Cloud Functions Admin

    • Cloud Scheduler Admin

    • Cloud Scheduler Service Agent

    • Service Account User

    • Logs Writer

    • Workload Identity User

  • Permissions for Cross-Project Datasets Access with BigQuery:

    1. Use the same service account that is being used by Kyvos VMs.

    2. Give the following roles to the above-created service account on the BigQuery Project.

      • BigQuery Data Viewer

      • BigQuery User

  • Prerequisites for Cross-Project BigQuery setup and Kyvos VMs.

    1. Use the same service account that is being used by Kyvos VMs.

    2. To the service account used by Kyvos VMs, give the following roles on the BigQuery Project:

      • BigQuery Data Viewer

      • BigQuery User

  • For accessing BigQuery Views, add the following permissions to the Kyvos custom role (created above).

    • bigquery.tables.create

    • bigquery.tables.delete

    • bigquery.tables.update

    • bigquery.tables.updateData

  • Permissions to generate Temporary Views in Separate Dataset when performing the validation/preview operation from Kyvos on Google BigQuery.

    • bigquery.tables.create: permission to create a new table

    • bigquery.tables.updateData: permission to write data to a new table, overwrite a table, or append data to a table
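As an illustration of adding the BigQuery view permissions to the Kyvos custom role, the following sketch uses gcloud iam roles update with --add-permissions. The project ID and role ID are hypothetical placeholders, and the role is assumed to be a project-level custom role that already exists. The command is printed by default and executed only when DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical values; replace them with your own project and custom role ID.
PROJECT_ID="${PROJECT_ID:-my-project}"
ROLE_ID="${ROLE_ID:-kyvos_role}"

# Permissions listed above for accessing BigQuery views.
BQ_VIEW_PERMISSIONS="bigquery.tables.create,bigquery.tables.delete,bigquery.tables.update,bigquery.tables.updateData"

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Append the permissions to the existing project-level custom role.
add_bigquery_view_permissions() {
  run gcloud iam roles update "$ROLE_ID" \
    --project="$PROJECT_ID" \
    --add-permissions="$BQ_VIEW_PERMISSIONS"
}

add_bigquery_view_permissions
```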

Prerequisites to run Terraform from a local machine

  • Download and install Terraform on your local machine.

  • To install Terraform, refer to the Terraform documentation.

  • Execute the terraform init command to verify that Terraform is installed successfully.

  • jq must be installed on your local machine.

  • You need a GCP account to create and manage resources. Ensure that you have the necessary permissions.

  • Configure GCP on your local machine.

  • For gcloud initialization, refer to the Google documentation.
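A quick way to confirm the local tooling is to check that each command is on the PATH. The sketch below assumes the executables are named terraform, jq, and gcloud. It only reports what is found and does not install or configure anything.

```shell
#!/bin/sh
set -eu

# Report whether each required command-line tool is on PATH.
# Returns 0 when all tools are found and 1 when any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

if check_tools terraform jq gcloud; then
  echo "All prerequisites are installed."
else
  echo "Install the missing tools before running Terraform."
fi
```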

Prerequisites to use Customer Managed Key (CMK) or Bring Your Own Key (BYOK) deployment

  • To use an existing service account for deployments, the following predefined roles are needed on Kyvos Service Account:

    • Cloud KMS CryptoKey Decrypter

    • Cloud KMS CryptoKey Encrypter

    • Cloud KMS CryptoKey Encrypter/Decrypter

Note

  • Encryption will be enabled for the following components:

    • Disk

    • Cloud storage

    • Secret manager

  • The service agent must be present in the project where the user is going to create Google Cloud Storage and Secret Manager. For more details, refer to Google documentation.

  • Cloud Key Management Service (KMS) API must be enabled in the project before deployment.

  • The existing CMK must be in the same region as deployment.

  • The existing CMK location must be regional; global keys are not supported by GCS buckets. For more details, refer to Google documentation.

  • To use the BYOK (Bring Your Own Key):
    The service agent must be present in the project where the user is going to create Google Cloud Storage and Secret Manager. For more details, refer to Google documentation.
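The KMS prerequisites above might be applied with gcloud roughly as follows. All names here (project, key ring, key, location, and service account) are hypothetical placeholders, and granting the roles at the key level rather than at the project level is an assumption of this sketch. Commands are printed by default and executed only when DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical values; replace them with your own project, key, and account.
PROJECT_ID="${PROJECT_ID:-my-project}"
KEY_LOCATION="${KEY_LOCATION:-us-central1}"
KEY_RING="${KEY_RING:-kyvos-keyring}"
KEY_NAME="${KEY_NAME:-kyvos-key}"
KYVOS_SA="${KYVOS_SA:-kyvos-sa@my-project.iam.gserviceaccount.com}"

# Role IDs for Cloud KMS CryptoKey Decrypter, Encrypter, and Encrypter/Decrypter.
KMS_ROLES="roles/cloudkms.cryptoKeyDecrypter roles/cloudkms.cryptoKeyEncrypter roles/cloudkms.cryptoKeyEncrypterDecrypter"

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Enable the Cloud KMS API in the deployment project.
enable_kms_api() {
  run gcloud services enable cloudkms.googleapis.com --project="$PROJECT_ID"
}

# Grant each role to the Kyvos service account on the existing key.
grant_kms_roles() {
  for role in $KMS_ROLES; do
    run gcloud kms keys add-iam-policy-binding "$KEY_NAME" \
      --project="$PROJECT_ID" \
      --location="$KEY_LOCATION" \
      --keyring="$KEY_RING" \
      --member="serviceAccount:$KYVOS_SA" \
      --role="$role"
  done
}

enable_kms_api
grant_kms_roles
```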

Additional permissions required to run autoscaling for GCP Enterprise

Apart from the existing permissions mentioned in the Creating a service account from Google Cloud Console section, you need the following permissions for GCP Enterprise:

Permissions required in GCP

  • compute.instanceGroups.get

  • compute.instances.create

  • compute.disks.create

  • compute.disks.use

  • compute.subnetworks.use

  • compute.instances.setServiceAccount

  • compute.instances.delete

  • compute.instanceGroups.update

  • compute.instances.use

  • compute.instances.detachDisk

  • compute.disks.delete

  • compute.instances.attachDisk

Conditional permission needed if using Shared Network

  • compute.subnetworks.use (on the Kyvos service account in the project where your network resides)
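For the shared-network case, one possible approach is to create a small custom role in the project that hosts the network and bind it to the Kyvos service account. This is a sketch, not the documented Kyvos procedure: the host project ID, role ID, and service account are hypothetical placeholders, and using a dedicated custom role for the single permission is an assumption. Commands are printed by default and executed only when DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical values; replace them with your own host project and account.
HOST_PROJECT_ID="${HOST_PROJECT_ID:-network-host-project}"
SUBNET_ROLE_ID="${SUBNET_ROLE_ID:-kyvos_subnet_user}"
KYVOS_SA="${KYVOS_SA:-kyvos-sa@my-project.iam.gserviceaccount.com}"

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create a custom role in the host project with only compute.subnetworks.use.
create_subnet_role() {
  run gcloud iam roles create "$SUBNET_ROLE_ID" \
    --project="$HOST_PROJECT_ID" \
    --title="$SUBNET_ROLE_ID" \
    --permissions=compute.subnetworks.use
}

# Bind the custom role to the Kyvos service account on the host project.
bind_subnet_role() {
  run gcloud projects add-iam-policy-binding "$HOST_PROJECT_ID" \
    --member="serviceAccount:$KYVOS_SA" \
    --role="projects/$HOST_PROJECT_ID/roles/$SUBNET_ROLE_ID"
}

create_subnet_role
bind_subnet_role
```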

Prerequisites to deploy Kyvos using Kubernetes

Refer to the /wiki/spaces/KD202411/pages/268639615 section for the complete set of permissions required for deploying Kyvos.

Additionally, for creating a GKE cluster, you must complete the following prerequisites.

Create a GKE cluster

  • Ensure that the GKE service agent’s default service account (service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com) has the Kubernetes Engine Service Agent role attached to it.

  • Existing Virtual Network

    • Using an existing Virtual Network to create a GKE cluster requires two secondary IPv4 address ranges in the subnet. Additionally, if using a shared Virtual Network, the following roles and permissions are required by the default Kubernetes service account (service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com) on the project of the shared Virtual Network.

      • Compute Network User

      • kubernetes_role: You must create a custom role. To do this, click Roles > Create new role. Provide a name like kubernetes_role; assign the following permissions, and then attach to the service account:

        Google documentation.

      • Ports 2181, 45460, and 6903 must be allowed in the Firewall inbound rules for all internal communication between the Kubernetes cluster and Kyvos.

    • Existing (IAM) Service account

      1. Add the following predefined roles to the existing IAM service account:

        1. Service Account Token Creator

        2. Kubernetes Engine Developer

        3. Kubernetes Engine Cluster Admin

      2. Add the following permissions to the kubernetes_role custom role that you created above.

        1. compute.instanceGroupManagers.update

        2. compute.instanceGroupManagers.get
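The predefined roles for the existing IAM service account could be granted with a short loop such as the one below. The project ID and service account email are hypothetical placeholders, and granting the roles at the project level is an assumption of this sketch. Commands are printed by default and executed only when DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical values; replace them with your own project and service account.
PROJECT_ID="${PROJECT_ID:-my-project}"
KYVOS_SA="${KYVOS_SA:-kyvos-sa@my-project.iam.gserviceaccount.com}"

# Role IDs for Service Account Token Creator, Kubernetes Engine Developer,
# and Kubernetes Engine Cluster Admin.
GKE_ROLES="roles/iam.serviceAccountTokenCreator roles/container.developer roles/container.clusterAdmin"

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Bind each predefined role to the service account on the project.
grant_gke_roles() {
  for role in $GKE_ROLES; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:$KYVOS_SA" \
      --role="$role"
  done
}

grant_gke_roles
```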

Prerequisites for using existing GKE Cluster

You must have an existing GKE cluster to complete the following prerequisites.

  1. VPC network peering is necessary if the Kyvos VPC differs from the VPC associated with the existing Kubernetes cluster.

  2. Firewall rule on GKE Cluster VPC:

    1. An inbound rule that allows TCP traffic on port 6903 is required, with the source IP range set to that of the Kyvos VPC.

  3. Permissions required by GKE Service Account: For the GKE Service Account, the following roles and permissions are required:
    IAM Roles:

    • roles/iam.serviceAccountTokenCreator

    • roles/iam.workloadIdentityUser
      The above role (roles/iam.workloadIdentityUser) is associated with the Kubernetes namespace and service account used for Kyvos deployment.

Command:

gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@IAM_SA_PROJECT_ID.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:PROJECT_ID.svc.id.goog[kyvos-monitoring/default]"
gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@IAM_SA_PROJECT_ID.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:PROJECT_ID.svc.id.goog[kyvos-compute/default]"