Applies to: Kyvos Enterprise

Does not apply to: Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


Prerequisites

Important

Before starting the deployment for AWS, you must have the following.

If you are using the CloudFormation template for IAM roles and VPC, you must have administrative privileges to create IAM roles and a VPC.

  1. AWS CloudFormation template. Contact Kyvos support to get your custom template. Alternatively, download the default template Kyvos_AWS_Default_Template_Databricks2023.2.json file, or create a template as per your requirements.

  2. The CloudFormation template can be deployed through the logged-in user or a role. The logged-in user must have the required policies given in the aws-console-user-iam-policy.json file.

  3. EC2 key pair, consisting of a private key and a public key. You can create the key pair if needed.

  4. Networking requirements:

    1. Use the Network CloudFormation template to create network resources (VPC, Subnet, and Security Group) automatically. 

      • If you want to deploy your network with a NAT Gateway, use the NATGateway Template (vpc_nat.json file).
        OR

    2. If you want to use existing network resources, perform the following steps in your VPC. 

      1. You must create VPC Endpoints within your VPC to connect with the AWS services. Otherwise, the subnet must have internet access and a NAT Gateway.

        List of VPC Endpoints for AWS services required by Kyvos:

        | AWS Service Name | Description/Purpose | VPC Endpoint Name |
        |------------------|---------------------|-------------------|
        | CloudWatch Logs | Used to send bootstrap logs of the EC2 machines to CloudWatch Logs. | com.amazonaws.{AWS-REGION}.logs |
        | Glue | Used to connect to Glue from the Kyvos BI Server and fetch metadata of the tables stored. | com.amazonaws.{AWS-REGION}.glue |
        | CloudFormation | Used by Kyvos Manager at the time of deployment to validate and get details from the AWS stack in CloudFormation. | com.amazonaws.{AWS-REGION}.cloudformation |
        | CloudWatch Events | Used to schedule events on CloudWatch Events for scheduled starting of the Kyvos BI Server. | com.amazonaws.{AWS-REGION}.events |
        | S3 | Used to connect to an S3 bucket for reading raw data and writing metadata. | com.amazonaws.{AWS-REGION}.s3 |
        | RDS | Used for scheduled start/stop of the Kyvos cluster along with RDS. | com.amazonaws.{AWS-REGION}.rds |
        | EC2 | Used by Kyvos Manager to describe EC2 and Kyvos BI Server for scheduled start/stop of Query Engines. | com.amazonaws.{AWS-REGION}.ec2 |
        | Secrets Manager | Used by the Kyvos BI Server to get the passwords stored in AWS Secrets Manager. | com.amazonaws.{AWS-REGION}.secretsmanager |

        In the table above, replace {AWS-REGION} with the region in which you are deploying Kyvos.
        AWS does not provide a VPC endpoint for the Cost Explorer service, so the Kyvos Resource Usage feature will not work without internet access.
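The endpoint names in the table above all follow the same pattern, so the full list for a region can be generated rather than typed by hand. The following is a minimal sketch, not part of the Kyvos tooling; the function name vpc_endpoint_names and the constant KYVOS_ENDPOINT_SERVICES are hypothetical, and the service suffixes are copied from the table.

```python
# Service suffixes copied from the VPC endpoint table above.
# The constant and function names are illustrative only.
KYVOS_ENDPOINT_SERVICES = [
    "logs",            # CloudWatch Logs
    "glue",            # Glue
    "cloudformation",  # CloudFormation
    "events",          # CloudWatch Events
    "s3",              # S3
    "rds",             # RDS
    "ec2",             # EC2
    "secretsmanager",  # Secrets Manager
]


def vpc_endpoint_names(region):
    """Return the VPC endpoint service names Kyvos needs in the given region."""
    if not region or not region.strip():
        raise ValueError("region must be a non-empty string, e.g. 'us-east-1'")
    region = region.strip()
    return [f"com.amazonaws.{region}.{service}" for service in KYVOS_ENDPOINT_SERVICES]


if __name__ == "__main__":
    for name in vpc_endpoint_names("us-east-1"):
        print(name)
```

The resulting list can be compared against the endpoints that already exist in your VPC to spot any that are missing.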

  5. Permission requirements:

    1. You can create IAM roles using the CloudFormation template (automated_deployment_iam_role.json file).
      OR

    2. Create IAM roles for the following. To create new roles, refer to the section /wiki/spaces/KD20233/pages/18448740.

      1. EC2 that will be attached to all Kyvos instances. This role contains all the permissions required by Kyvos Services and Kyvos Manager.
        See the details of the permissions required for EC2.

      2. Lambda that will be attached to the Kyvos-created Lambda functions. This role contains all the permissions required by the Lambda functions to run.

  6. Port 443 of the Databricks cluster should be accessible by Kyvos.

  7. Create Databricks-instanceprofile-role with the following permissions:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
        "Sid": "GrantCatalogAccessToGlue",
        "Effect": "Allow",
        "Action": [
          "glue:BatchCreatePartition",
          "glue:BatchDeletePartition",
          "glue:BatchGetPartition",
          "glue:CreateDatabase",
          "glue:CreateTable",
          "glue:CreateUserDefinedFunction",
          "glue:DeleteDatabase",
          "glue:DeletePartition",
          "glue:DeleteTable",
          "glue:DeleteUserDefinedFunction",
          "glue:GetDatabase",
          "glue:GetDatabases",
          "glue:GetPartition",
          "glue:GetPartitions",
          "glue:GetTable",
          "glue:GetTables",
          "glue:GetUserDefinedFunction",
          "glue:GetUserDefinedFunctions",
          "glue:UpdateDatabase",
          "glue:UpdatePartition",
          "glue:UpdateTable",
          "glue:UpdateUserDefinedFunction"
        ],
        "Resource": [
          "*"
        ]
        },
        {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket"
        ],
        "Resource": "*"
        }
      ]
    }
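Before attaching a policy like the one above, it can be useful to check locally that it lists every action you expect. The following is a minimal sketch under stated assumptions: the helper missing_actions is hypothetical, it only does exact string matching (wildcards such as s3:* or glue:Get* are not expanded), and it ignores Resource and Condition elements.

```python
import json

# S3 read actions listed in the instance profile policy above.
S3_READ_ACTIONS = {"s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket"}


def missing_actions(policy_json, required):
    """Return the required actions that no Allow statement lists verbatim.

    Assumption: exact-match comparison only; wildcards are not expanded.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        granted.update(actions)

    return set(required) - granted
```

An empty result means every required action appears in at least one Allow statement of the policy text you passed in.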
  8. S3 Bucket permissions

    If you want to use an existing S3 bucket and IAM role, or if you want to read data from an S3 bucket other than where Kyvos is deployed, then the IAM role must have the following permissions on the S3 bucket.

    Here, replace:

    <Bucket Name> with the name of your bucket.

    <AWS Account ID> with the ID of your AWS account.

    <Lambda Role> with the name of your Lambda Role.

    <EC2 Role> with the name of your EC2 Role.

    { 
      "Version": "2008-10-17", 
      "Statement": [ 
        { 
        "Sid": "Ec2LambdaRoleBucketPolicy", 
        "Effect": "Allow", 
        "Principal": { 
          "AWS": [ 
            "arn:aws:iam::<AWS Account ID>:role/<EC2 Role>",
            "arn:aws:iam::<AWS Account ID>:role/<Lambda Role>",
            "arn:aws:iam::<AWS Account ID>:role/Databricks-instanceprofile-role"
         ] 
        }, 
        "Action": [ 
          "s3:PutAnalyticsConfiguration", 
          "s3:GetObjectVersionTagging", 
          "s3:ReplicateObject", 
          "s3:GetObjectAcl", 
          "s3:GetBucketObjectLockConfiguration", 
          "s3:DeleteBucketWebsite", 
          "s3:PutLifecycleConfiguration", 
          "s3:GetObjectVersionAcl", 
          "s3:DeleteObject", 
          "s3:GetBucketPolicyStatus", 
          "s3:GetObjectRetention", 
          "s3:GetBucketWebsite", 
          "s3:PutReplicationConfiguration", 
          "s3:PutObjectLegalHold", 
          "s3:GetObjectLegalHold", 
          "s3:GetBucketNotification", 
          "s3:PutBucketCORS", 
          "s3:GetReplicationConfiguration", 
          "s3:ListMultipartUploadParts", 
          "s3:PutObject", 
          "s3:GetObject", 
          "s3:PutBucketNotification", 
          "s3:PutBucketLogging", 
          "s3:GetAnalyticsConfiguration", 
          "s3:PutBucketObjectLockConfiguration", 
          "s3:GetObjectVersionForReplication", 
          "s3:GetLifecycleConfiguration", 
          "s3:GetInventoryConfiguration", 
          "s3:GetBucketTagging", 
          "s3:PutAccelerateConfiguration", 
          "s3:DeleteObjectVersion", 
          "s3:GetBucketLogging", 
          "s3:ListBucketVersions", 
          "s3:RestoreObject", 
          "s3:ListBucket", 
          "s3:GetAccelerateConfiguration", 
          "s3:GetBucketPolicy", 
          "s3:PutEncryptionConfiguration", 
          "s3:GetEncryptionConfiguration", 
          "s3:GetObjectVersionTorrent", 
          "s3:AbortMultipartUpload", 
          "s3:GetBucketRequestPayment", 
          "s3:GetObjectTagging", 
          "s3:GetMetricsConfiguration", 
          "s3:DeleteBucket", 
          "s3:PutBucketVersioning", 
          "s3:GetBucketPublicAccessBlock", 
          "s3:ListBucketMultipartUploads", 
          "s3:PutMetricsConfiguration", 
          "s3:GetBucketVersioning", 
          "s3:GetBucketAcl", 
          "s3:PutInventoryConfiguration", 
          "s3:GetObjectTorrent", 
          "s3:PutBucketWebsite", 
          "s3:PutBucketRequestPayment", 
          "s3:PutObjectRetention", 
          "s3:GetBucketCORS", 
          "s3:GetBucketLocation", 
          "s3:ReplicateDelete", 
          "s3:GetObjectVersion", 
          "s3:PutBucketTagging" 
        ], 
        "Resource": [ 
          "arn:aws:s3:::<Bucket Name>/*", 
          "arn:aws:s3:::<Bucket Name>" 
        ] 
        } 
      ] 
    }
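The placeholder substitution described above can be sketched in code. This is an illustrative helper only, not part of the Kyvos package: the function name bucket_policy_arns is hypothetical, and it assumes all three roles live in the same AWS account and that the Databricks role keeps the name Databricks-instanceprofile-role used in this page.

```python
def bucket_policy_arns(account_id, ec2_role, lambda_role, bucket_name,
                       databricks_role="Databricks-instanceprofile-role"):
    """Build the Principal and Resource values for the bucket policy above.

    Assumption: every role belongs to the same 12-digit AWS account.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")

    principals = [
        f"arn:aws:iam::{account_id}:role/{role}"
        for role in (ec2_role, lambda_role, databricks_role)
    ]
    resources = [
        f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
        f"arn:aws:s3:::{bucket_name}",    # the bucket itself
    ]
    return {"Principal": {"AWS": principals}, "Resource": resources}
```

The returned values can be merged into the statement shown above in place of the placeholder ARNs; the Action list stays as documented.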
  9. You must have the Access Key and Secret Key to access the Kyvos bundle. Contact Kyvos Support for details.

  10. A valid Kyvos license file.

  11. Databricks cluster for semantic model processing and processing aggregations, with the following parameters:  

    1. Databricks Runtime Version: Select Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

    2. Autopilot Options: Select the following:  

      1. Enable autoscaling: Select this to enable autoscaling.  

      2. Terminate after ___ minutes of inactivity: Set the value to 30.

    3. Worker type: Recommended value r5.4xlarge  

      1. Min Workers: Recommended value 1  

      2. Max Workers: Recommended value 10  

    4. Driver Type: Recommended value r5.xlarge  

    5. Advanced options

      1. By default, the Spot fall back to On-demand checkbox is selected. Kyvos recommends you clear this checkbox.

      2. For a Glue-based deployment, define the following property in the Spark Configurations:

        • spark.databricks.hive.metastore.glueCatalog.enabled true

      3. If cross-account Glue is to be used, define the following property to access it: spark.hadoop.hive.metastore.glue.catalogid <GLUE_CATALOG_ID>
        Then set the following configuration properties (Parquet handling, case sensitivity, and preemption):

        1. spark.hadoop.spark.sql.parquet.int96AsTimestamp true  

        2. spark.sql.parquet.binaryAsString false

        3. spark.sql.parquet.int96AsTimestamp true  

        4. spark.hadoop.spark.sql.parquet.binaryAsString false

        5. spark.sql.caseSensitive false

        6. spark.hadoop.spark.sql.caseSensitive false

        7. spark.databricks.preemption.enabled false

      4. You must change the Spark configurations to use the managed disk. Ensure that you do not change the configuration in the default root (/tmp) volume.

        1. In the Spark Configurations, add the spark.local.dir /local_disk0 property, where local_disk0 is the managed disk.

        2. Optionally, you can execute the df -h command from a notebook for verification.

        3. Add the SPARK_WORKER_DIR=/local_disk0 value in the Environment variables.

    6. Tags: A UsedBy tag with the value Kyvos is required to run the cluster.  

    7. Instance profile: Copy the Instance Profile ARN of the role created in point 7 above.  

      1. In Databricks console, go to Admin Console > Instance Profile and click Add Instance Profile. Paste the Instance Profile ARN in the text box.  

      2. Select the Skip Validation checkbox and then click Add.  

      3. In Cluster settings, go to Advanced Options, and select the instance profile created above in the Instance Profile field.
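The Spark properties listed in the Advanced options above are entered as one "key value" pair per line. The following is a minimal sketch that assembles that text so it can be pasted into the Spark Configurations box; the function name kyvos_spark_conf and its parameters are hypothetical, and the property names and values are taken from this page.

```python
def kyvos_spark_conf(glue_enabled=True, glue_catalog_id=None):
    """Return the Spark configuration text, one 'key value' pair per line.

    glue_enabled: include the Glue catalog property (Glue-based deployment).
    glue_catalog_id: set only when cross-account Glue is used.
    """
    conf = {}
    if glue_enabled:
        conf["spark.databricks.hive.metastore.glueCatalog.enabled"] = "true"
        if glue_catalog_id:
            conf["spark.hadoop.hive.metastore.glue.catalogid"] = glue_catalog_id

    # Properties listed on this page, in the same order.
    conf.update({
        "spark.hadoop.spark.sql.parquet.int96AsTimestamp": "true",
        "spark.sql.parquet.binaryAsString": "false",
        "spark.sql.parquet.int96AsTimestamp": "true",
        "spark.hadoop.spark.sql.parquet.binaryAsString": "false",
        "spark.sql.caseSensitive": "false",
        "spark.hadoop.spark.sql.caseSensitive": "false",
        "spark.databricks.preemption.enabled": "false",
    })

    # Point Spark's local directory at the managed disk.
    conf["spark.local.dir"] = "/local_disk0"

    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(kyvos_spark_conf())
```

The SPARK_WORKER_DIR=/local_disk0 environment variable is set separately in the Environment variables field and is not part of this text.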

  12. Databricks information:  

    1. Databricks Cluster Id:  To obtain this ID, click the Cluster Name on the Clusters page in Databricks.  
      The page URL has the form https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the value after the /clusters/ component in the URL of this page.  

    2. Databricks Cluster Organization ID:  To obtain this ID, click the Cluster Name on the Clusters page in Databricks.  
      The number after o= in the workspace URL is the organization ID. For example, if the workspace URL is https://westus.azuredatabricks.net/?o=7692xxxxxxxx, then the organization ID is 7692xxxxxxxx.  

    3. Databricks Role ARN: Use the ARN of the Databricks-instanceprofile-role created in point 7 above. 
      The ARN looks like this: arn:aws:iam::45653*******:role/AssumeRoleTest.
      The role you created for the Databricks workspace must have the "iam:PassRole" permission for this Databricks role.
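The cluster ID and organization ID described above can also be read from the URLs programmatically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the host example.cloud.databricks.com in the usage is a placeholder, not a real workspace.

```python
from urllib.parse import urlparse, parse_qs


def cluster_id_from_url(url):
    """Return the path segment that follows 'clusters' in the URL fragment."""
    # The part after '#' looks like '/settings/clusters/<cluster-id>'.
    segments = [s for s in urlparse(url).fragment.split("/") if s]
    if "clusters" not in segments:
        raise ValueError("URL has no /clusters/ component after '#'")
    index = segments.index("clusters")
    if index + 1 >= len(segments):
        raise ValueError("URL has no cluster ID after /clusters/")
    return segments[index + 1]


def organization_id_from_url(url):
    """Return the value of the 'o' query parameter in the workspace URL."""
    values = parse_qs(urlparse(url).query).get("o")
    if not values:
        raise ValueError("URL has no 'o=' query parameter")
    return values[0]


if __name__ == "__main__":
    # Hypothetical cluster page URL.
    print(cluster_id_from_url(
        "https://example.cloud.databricks.com/#/settings/clusters/0123-456789-abcde123"))
    # Workspace URL from the example in this page.
    print(organization_id_from_url("https://westus.azuredatabricks.net/?o=7692xxxxxxxx"))
```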

  13. If using an existing Secrets Manager, ensure that the KYVOS-DATABRICKS-SERVICE-TOKEN-DefaultHadoopCluster01 key is added to it.

Creating CloudFormation template

The Kyvos CloudFormation template can create the following resources:

  • EC2 instances for Kyvos services - BI Server, Query Engines, Kyvos Manager, Kyvos Web Portal, and Postgres.

  • S3 for storing the Kyvos semantic model.

  • RDS for use as the Kyvos repository if you don't want to use the default Postgres database provided in the Kyvos package.

  • Lambda to use the scheduling (cluster ON) features.

  • API Gateway to get the REST URL for the Lambda function.

  • CloudWatch event for scheduling the Kyvos BI Server.

  • Secrets Manager for storing passwords, such as the Kyvos DB password, Active Directory password, and SMTP password (if configured).

  • Security Group for Kyvos and Databricks instances.

Note

The Security Group created by the template allows all the requisite ports. To know more about specific inbound rules, see Port requirements.
You must ensure proper connectivity between the Security Group used by Databricks and the Kyvos instances.

Important

Kyvos uses Lambda to start the BI Server instances. The CloudFormation template will create three Lambda functions for:

  1. Scheduled start of the BI Server

  2. Forced start of the BI Server

  3. Display cluster status on the Kyvos Web UI.

Next: Deploy Kyvos using CloudFormation Template
