Applies to: Kyvos Enterprise    Kyvos Cloud (Managed Services on AWS)    Kyvos Azure Marketplace

Kyvos AWS Marketplace    Kyvos Single Node Installation (Kyvos SNI)    Kyvos Free (Limited offering for AWS)

...

Kyvos provides the following methods for wizard-based deployment on AWS:

...

Regardless of the installation type, the following prerequisites must be available.

  1. An EC2 key pair, consisting of a private key and a public key. You can create the key pair if needed (a minimal sketch follows).
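
    If you prefer to script this step, the following is a minimal sketch using boto3 (the AWS SDK for Python); the key pair name, region, and output path are illustrative placeholders, not values prescribed by Kyvos.

    Code Block
    import boto3

    # Create an EC2 key pair and save the private key locally.
    # Key name, region, and file path are placeholders.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.create_key_pair(KeyName="kyvos-deployment-key")

    with open("kyvos-deployment-key.pem", "w") as pem_file:
        pem_file.write(response["KeyMaterial"])
    print("Created key pair:", response["KeyName"])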

  2. Networking requirements:
    1. Use the Network CloudFormation template to create network resources (VPC, Subnet, and Security Group) automatically.
      1. If you want to deploy your network with a NAT Gateway, use the NATGateway Template (vpc_nat.json file).
      2. If you want to deploy your network with Endpoints, use the Endpoints Template (vpc_internet_gateway.json file).
        OR
    2. If you want to use existing network resources, perform the following steps in your VPC.
      1. Create VPC Endpoints within your VPC to connect with the AWS services. Otherwise, the subnet must have internet access and a NAT Gateway.

        List of VPC Endpoints for AWS services required by Kyvos:

        AWS Service Name | Description/Purpose | VPC Endpoint Name
        CloudWatch Logs | Used to send bootstrap logs of the EC2 machines to CloudWatch Logs. | com.amazonaws.{AWS-REGION}.logs
        Databricks | Used to connect to Databricks from the Kyvos BI Server for submitting jobs to the cluster and other Databricks-related activities. |
        Glue | Used to connect to Glue from the Kyvos BI Server and fetch metadata of the stored tables. | com.amazonaws.{AWS-REGION}.glue
        CloudFormation | Used by Kyvos Manager at the time of deployment to validate and get details from the AWS stack in CloudFormation. | com.amazonaws.{AWS-REGION}.cloudformation
        CloudWatch Events | Used to schedule events on CloudWatch Events for scheduled starting of the Kyvos BI Server. | com.amazonaws.{AWS-REGION}.events
        S3 | Used to connect to the S3 bucket for reading raw data and writing metadata. | com.amazonaws.{AWS-REGION}.s3
        RDS | Used for scheduled start/stop of the Kyvos cluster along with RDS. | com.amazonaws.{AWS-REGION}.rds
        EC2 | Used by Kyvos Manager to describe EC2, and by the Kyvos BI Server for scheduled start/stop of Query Engines. | com.amazonaws.{AWS-REGION}.ec2
        Secrets Manager | Used by the Kyvos BI Server to get the passwords stored in AWS Secrets Manager. | com.amazonaws.{AWS-REGION}.secretsmanager

        In the table above, replace {AWS-REGION} with the region in which you are deploying Kyvos.
        AWS does not provide a VPC endpoint for the Cost Explorer service, so the Kyvos Resource Usage feature will not work without internet access. If you are creating the endpoints yourself, see the sketch below.
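
        If you are creating these endpoints yourself rather than using the Endpoints CloudFormation template, the sketch below shows one way to do it with boto3. The region, VPC ID, subnet IDs, security group ID, and route table ID are placeholders, and the endpoint types follow the usual AWS pattern (a Gateway endpoint for S3, Interface endpoints for the rest) rather than anything prescribed by Kyvos.

        Code Block
        import boto3

        REGION = "us-east-1"                        # placeholder region
        VPC_ID = "vpc-0123456789abcdef0"            # placeholder VPC
        SUBNET_IDS = ["subnet-0123456789abcdef0"]   # placeholder Kyvos subnets
        SG_IDS = ["sg-0123456789abcdef0"]           # placeholder security group

        ec2 = boto3.client("ec2", region_name=REGION)

        # S3 uses a Gateway endpoint attached to the route table(s) of the subnets.
        ec2.create_vpc_endpoint(
            VpcEndpointType="Gateway",
            VpcId=VPC_ID,
            ServiceName=f"com.amazonaws.{REGION}.s3",
            RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder route table
        )

        # The remaining services use Interface endpoints in the Kyvos subnets.
        for service in ["logs", "glue", "cloudformation", "events", "rds", "ec2", "secretsmanager"]:
            ec2.create_vpc_endpoint(
                VpcEndpointType="Interface",
                VpcId=VPC_ID,
                ServiceName=f"com.amazonaws.{REGION}.{service}",
                SubnetIds=SUBNET_IDS,
                SecurityGroupIds=SG_IDS,
                PrivateDnsEnabled=True,
            )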

  3. Permission requirements:
    1. You can create the IAM roles using the CloudFormation template (wizard_based_deployment_iam_role.json file in the Installation Files folder); a sketch for launching this template appears after this item.
      OR
    2. Create an IAM Role for:
      1. EC2, which will be attached to all Kyvos instances. This role contains all the permissions required by Kyvos Services and Kyvos Manager.
        Details for permissions required for EC2: you need to provide all the permissions mentioned in the Permissions required for Automated CloudFormation template-based deployment and Additional permissions required for Wizard-based deployment sections.
      2. Lambda, which will be attached to the Lambda functions created by Kyvos. This role contains all the permissions required for the Lambda functions to run.
        Download the ec2_iam_policy.json and lambda_iam_policy.json files.
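
    As a sketch of the template-based option in 3.1, the snippet below creates a CloudFormation stack from the downloaded wizard_based_deployment_iam_role.json file using boto3. The stack name, region, and file path are illustrative placeholders, and the parameter-free invocation is an assumption (supply whatever parameters your copy of the template defines).

    Code Block
    import boto3

    # Launch the IAM-role CloudFormation template downloaded from the Installation Files folder.
    # Stack name, region, and file path are placeholders.
    cfn = boto3.client("cloudformation", region_name="us-east-1")

    with open("wizard_based_deployment_iam_role.json") as template_file:
        template_body = template_file.read()

    cfn.create_stack(
        StackName="kyvos-iam-roles",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # required for stacks that create named IAM roles
    )

    # Wait until the roles are created before continuing with the deployment.
    cfn.get_waiter("stack_create_complete").wait(StackName="kyvos-iam-roles")
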
  4. Create a Databricks-instanceprofile-role with the following permissions; a sketch for creating the role and its instance profile follows the policy below.

    Code Block
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GrantCatalogAccessToGlue",
                "Effect": "Allow",
                "Action": [
                    "glue:BatchCreatePartition",
                    "glue:BatchDeletePartition",
                    "glue:BatchGetPartition",
                    "glue:CreateDatabase",
                    "glue:CreateTable",
                    "glue:CreateUserDefinedFunction",
                    "glue:DeleteDatabase",
                    "glue:DeletePartition",
                    "glue:DeleteTable",
                    "glue:DeleteUserDefinedFunction",
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetUserDefinedFunction",
                    "glue:GetUserDefinedFunctions",
                    "glue:UpdateDatabase",
                    "glue:UpdatePartition",
                    "glue:UpdateTable",
                    "glue:UpdateUserDefinedFunction"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": "*"
            }
        ]
    }
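
    A minimal sketch for creating this role and its instance profile with boto3 is shown below. The trust policy for ec2.amazonaws.com and the local policy file name are assumptions based on how Databricks instance profiles are typically set up, not values prescribed by Kyvos.

    Code Block
    import json
    import boto3

    iam = boto3.client("iam")
    ROLE_NAME = "Databricks-instanceprofile-role"

    # EC2 trust policy, since Databricks attaches the instance profile to its EC2 workers.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    iam.create_role(RoleName=ROLE_NAME, AssumeRolePolicyDocument=json.dumps(trust_policy))

    # Attach the Glue/S3 permissions shown above (saved locally; file name is a placeholder).
    with open("databricks_instanceprofile_policy.json") as policy_file:
        iam.put_role_policy(
            RoleName=ROLE_NAME,
            PolicyName="kyvos-databricks-glue-s3",
            PolicyDocument=policy_file.read(),
        )

    # Create the instance profile whose ARN is later registered in Databricks (see point 8 below).
    iam.create_instance_profile(InstanceProfileName=ROLE_NAME)
    iam.add_role_to_instance_profile(InstanceProfileName=ROLE_NAME, RoleName=ROLE_NAME)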


  5. S3 Bucket permissions for using an existing bucket

    If you want to use an existing S3 bucket and IAM role, or if you want to read data from an S3 bucket other than the one where Kyvos is deployed, then the IAM role must have the following permissions on the S3 bucket.

    Info

    Ensure that the bucket name conforms to the AWS naming conventions. Additionally, Kyvos does not allow dots (.) in the bucket name.


    In the policy below, replace:
    <Account-ID> with your AWS account ID.
    <Bucket Name> with the name of your bucket.
    <Lambda Role> with the name of your Lambda Role.
    <EC2 Role> with the name of your EC2 Role.
    A sketch for attaching this policy to the bucket appears after it.

    Code Block
    { 
        "Version": "2008-10-17", 
        "Statement": [ 
            { 
                "Sid": "Ec2LambdaRoleBucketPolicy", 
                "Effect": "Allow", 
                "Principal": { 
                    "AWS": [ 
                        "arn:aws:iam::456531263183:role/EC2-Role", 
                        "arn:aws:iam::456531263183:role/Databricks-instanceprofile-role"  
                   ] 
                }, 
                "Action": [ 
                    "s3:PutAnalyticsConfiguration", 
                    "s3:GetObjectVersionTagging", 
                    "s3:ReplicateObject", 
                    "s3:GetObjectAcl", 
                    "s3:GetBucketObjectLockConfiguration", 
                    "s3:DeleteBucketWebsite", 
                    "s3:PutLifecycleConfiguration", 
                    "s3:GetObjectVersionAcl", 
                    "s3:DeleteObject", 
                    "s3:GetBucketPolicyStatus", 
                    "s3:GetObjectRetention", 
                    "s3:GetBucketWebsite", 
                    "s3:PutReplicationConfiguration", 
                    "s3:PutObjectLegalHold", 
                    "s3:GetObjectLegalHold", 
                    "s3:GetBucketNotification", 
                    "s3:PutBucketCORS", 
                    "s3:GetReplicationConfiguration", 
                    "s3:ListMultipartUploadParts", 
                    "s3:PutObject", 
                    "s3:GetObject", 
                    "s3:PutBucketNotification", 
                    "s3:PutBucketLogging", 
                    "s3:GetAnalyticsConfiguration", 
                    "s3:PutBucketObjectLockConfiguration", 
                    "s3:GetObjectVersionForReplication", 
                    "s3:GetLifecycleConfiguration", 
                    "s3:GetInventoryConfiguration", 
                    "s3:GetBucketTagging", 
                    "s3:PutAccelerateConfiguration", 
                    "s3:DeleteObjectVersion", 
                    "s3:GetBucketLogging", 
                    "s3:ListBucketVersions", 
                    "s3:RestoreObject", 
                    "s3:ListBucket", 
                    "s3:GetAccelerateConfiguration", 
                    "s3:GetBucketPolicy", 
                    "s3:PutEncryptionConfiguration", 
                    "s3:GetEncryptionConfiguration", 
                    "s3:GetObjectVersionTorrent", 
                    "s3:AbortMultipartUpload", 
                    "s3:GetBucketRequestPayment", 
                    "s3:GetObjectTagging", 
                    "s3:GetMetricsConfiguration", 
                    "s3:DeleteBucket", 
                    "s3:PutBucketVersioning", 
                    "s3:GetBucketPublicAccessBlock", 
                    "s3:ListBucketMultipartUploads", 
                    "s3:PutMetricsConfiguration", 
                    "s3:GetBucketVersioning", 
                    "s3:GetBucketAcl", 
                    "s3:PutInventoryConfiguration", 
                    "s3:GetObjectTorrent", 
                    "s3:PutBucketWebsite", 
                    "s3:PutBucketRequestPayment", 
                    "s3:PutObjectRetention", 
                    "s3:GetBucketCORS", 
                    "s3:GetBucketLocation", 
                    "s3:ReplicateDelete", 
                    "s3:GetObjectVersion", 
                    "s3:PutBucketTagging" 
                ], 
                "Resource": [ 
                    "arn:aws:s3:::bucket-name/*", 
                    "arn:aws:s3:::bucket-name" 
                ] 
            } 
        ] 
    } 
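
    If you manage the bucket policy from code rather than the console, the following sketch attaches the policy above with boto3; the bucket name and the local policy file path are placeholders.

    Code Block
    import boto3

    BUCKET_NAME = "my-kyvos-bucket"  # placeholder: your existing bucket

    # Load the bucket policy shown above (saved locally with the placeholders filled in)
    # and attach it to the existing bucket.
    with open("kyvos_bucket_policy.json") as policy_file:
        policy_json = policy_file.read()

    s3 = boto3.client("s3")
    s3.put_bucket_policy(Bucket=BUCKET_NAME, Policy=policy_json)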


  6. You must have the Access Key and Secret Key to access the Kyvos bundle. Contact Kyvos Support for details.
  7. A valid Kyvos license file.
  8. A Databricks cluster with the following parameters (an API-based sketch of these settings appears after this item):
    1. Databricks Runtime Version: Select 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) or 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).
    2. Autopilot Options: Select the following:
      1. Enable autoscaling: Select this to enable autoscaling.
      2. Terminate after ___ minutes of inactivity: Set the value to 30.
    3. Worker Type: Recommended value r5.4xlarge
      1. Min Workers: Recommended value 1
      2. Max Workers: Recommended value 10
    4. Driver Type: Recommended value r5.xlarge
    5. Advanced options
      1. To use Databricks with Spot Instances:
        • Select the Spot fall back to On-demand checkbox in the On-demand/spot composition area.
        • Specify the number of workers.
      2. In the Spark Configurations, define the following property in case of Glue-based deployment:
        • spark.databricks.hive.metastore.glueCatalog.enabled true
      3. If cross-account Glue is to be used, define the following property to access the cross-account Glue catalog:
        • spark.hadoop.hive.metastore.glue.catalogid <GLUE_CATALOG_ID>
      4. After these, set the following Parquet-specific configuration properties:
        • spark.hadoop.spark.sql.parquet.int96AsTimestamp true
        • spark.sql.parquet.binaryAsString false
        • spark.sql.parquet.int96AsTimestamp true
        • spark.hadoop.spark.sql.parquet.binaryAsString false
        • spark.databricks.preemption.enabled false
        • spark.sql.caseSensitive false
        • spark.hadoop.spark.sql.caseSensitive false
    6. Tags: Owner and JIRA tags are required to run the cluster.
    7. Instance profile: Copy the Instance Profile ARN of the role created in Point 4 above.
      1. In the Databricks console, go to Admin Console > Instance Profile and click Add Instance Profile. Paste the Instance Profile ARN in the text box.
      2. Select the Skip Validation checkbox and then click Add.
      3. In Cluster settings, go to Advanced Options, and in the Instance Profile field, select the instance profile created above.
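
    The list above describes the cluster through the Databricks UI. As an alternative sketch, the snippet below expresses the same settings as a cluster specification for the Databricks Clusters REST API (POST /api/2.0/clusters/create). The workspace URL, token, cluster name, tag values, and instance profile ARN are placeholders, and using the REST API instead of the UI is an assumption for illustration, not a Kyvos requirement.

    Code Block
    import requests

    DATABRICKS_HOST = "https://<databricks-instance>"   # placeholder workspace URL
    DATABRICKS_TOKEN = "<personal-access-token>"        # placeholder token

    # Cluster specification mirroring the recommended values listed above.
    cluster_spec = {
        "cluster_name": "kyvos-databricks-cluster",      # placeholder name
        "spark_version": "12.2.x-scala2.12",             # 12.2 LTS (or "10.4.x-scala2.12" for 10.4 LTS)
        "node_type_id": "r5.4xlarge",
        "driver_node_type_id": "r5.xlarge",
        "autoscale": {"min_workers": 1, "max_workers": 10},
        "autotermination_minutes": 30,
        "aws_attributes": {
            "availability": "SPOT_WITH_FALLBACK",        # Spot with fall back to On-demand
            "instance_profile_arn": "arn:aws:iam::<Account-ID>:instance-profile/Databricks-instanceprofile-role",
        },
        "custom_tags": {"Owner": "<owner>", "JIRA": "<ticket>"},
        "spark_conf": {
            "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
            "spark.hadoop.spark.sql.parquet.int96AsTimestamp": "true",
            "spark.sql.parquet.binaryAsString": "false",
            "spark.sql.parquet.int96AsTimestamp": "true",
            "spark.hadoop.spark.sql.parquet.binaryAsString": "false",
            "spark.databricks.preemption.enabled": "false",
            "spark.sql.caseSensitive": "false",
            "spark.hadoop.spark.sql.caseSensitive": "false",
        },
    }

    response = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json=cluster_spec,
    )
    response.raise_for_status()
    print("Created cluster:", response.json()["cluster_id"])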
  9. Databricks information:
    1. Databricks Cluster ID: To obtain this ID, click the cluster name on the Clusters page in Databricks.
      The page URL looks like https://<databricks-instance>/#/settings/clusters/<cluster-id>. The cluster ID is the value after the /clusters/ component in the URL of this page.
    2. Databricks Cluster Organization ID: To obtain this ID, click the cluster name on the Clusters page in Databricks.
      The number after o= in the workspace URL is the organization ID. For example, if the workspace URL is https://westus.azuredatabricks.net/?o=7692xxxxxxxx, then the organization ID is 7692xxxxxxxx.
    3. Databricks Role ARN: Use the ARN of the Databricks-instanceprofile-role created in point 4 above.
      The ARN looks like this: arn:aws:iam::45653*******:role/AssumeRoleTest.
      This Databricks Role should have the "iam:PassRole" permission in the role you have created for the Databricks workspace.
  10. If using an existing Secrets Manager, ensure that the KYVOS-CONNECTION-DATABRICKS-TOKEN key is added to it; a sketch for adding the key follows.
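
    The sketch below adds the key to an existing secret with boto3, assuming the secret stores its keys as a JSON object; the secret name, region, and token value are placeholders.

    Code Block
    import json
    import boto3

    SECRET_NAME = "kyvos-secrets"              # placeholder: your existing secret
    DATABRICKS_TOKEN = "<databricks-token>"    # placeholder token value

    sm = boto3.client("secretsmanager", region_name="us-east-1")

    # Read the current key/value pairs, add the Databricks token key, and write back.
    current = json.loads(sm.get_secret_value(SecretId=SECRET_NAME)["SecretString"])
    current["KYVOS-CONNECTION-DATABRICKS-TOKEN"] = DATABRICKS_TOKEN
    sm.put_secret_value(SecretId=SECRET_NAME, SecretString=json.dumps(current))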

Using Kyvos Public AMI

In addition to the prerequisites mentioned in the Common section, you must have the following:

...