
Disaster Recovery

Applies to: Kyvos Enterprise  Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace   Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)


Guided Recovery

Kyvos Manager provides a dedicated wizard that guides you through disaster recovery for all cloud clusters.

Note

You need to manually create nodes for Kyvos Manager from the terminal.

Prerequisites

  1. Create a new node for Kyvos Manager, and ensure the following:

    1. This node must match the original Kyvos Manager node (which no longer exists) in roles, tags (UsedBy / CreatedBy, CLUSTER_ID, ROLE : KM, LAYER : KM_Service), network access rules and permissions (VirtualNetwork, Subnet, Security Group, Resource Group), credentials, size and instance type, and disk organization (mount point, disks, and the directories where Kyvos Manager and Kyvos are installed).

    2. For access purposes, either attach the same security group or attach a security group that has the same set of access rules and permissions.

    3. If Secrets Manager/Key Vault is in use, then ensure that the roles assigned to the new Kyvos Manager node have access to the Secrets Manager/Key Vault.

    4. Ensure that roles assigned to the new Kyvos Manager node have access to the S3 bucket/ABFS account.

  2. If the Kyvos Manager node is created by attaching a disk image of an old Kyvos Manager node, ensure the following, in the sequence listed:

    1. The agent service is stopped on that node.

    2. The agent cron entry is deleted from crontab.

    3. The Kyvos Manager Agent and Kyvos folders are deleted from the node.

  3. The OS commands must be present in the path of a non-interactive login session for the user account used to log in to the nodes.

  4. To restore Kyvos Manager on the new node, download a script file named disaster-recovery-kyvosmanager.sh from the DFS at path <engine_work>/setup/scripts/ and execute that script. This will restore the Kyvos Manager server and the Kyvos Manager service will start automatically.
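
The last two prerequisites can be sketched in Python as a small pre-flight helper. This is only an illustrative sketch: the function names are hypothetical, the list of OS commands to check is not specified in this document and must be supplied by you, and `shutil.which` inspects the PATH of the process running the script, so the check is only meaningful if the script itself is run from a non-interactive login session of the relevant user account.

```python
import posixpath
import shutil

# Script name and DFS sub-path as stated in the prerequisites above.
RECOVERY_SCRIPT = "disaster-recovery-kyvosmanager.sh"


def recovery_script_path(engine_work: str) -> str:
    """Build the DFS path of the recovery script under the engine work directory."""
    return posixpath.join(engine_work, "setup", "scripts", RECOVERY_SCRIPT)


def missing_commands(commands):
    """Return the commands that cannot be resolved on the current PATH.

    The command list is an assumption supplied by the caller; this document
    does not enumerate which OS commands are required.
    """
    return [name for name in commands if shutil.which(name) is None]
```

For example, `recovery_script_path("/data/engine_work")` yields `/data/engine_work/setup/scripts/disaster-recovery-kyvosmanager.sh`, where `/data/engine_work` is a placeholder for your cluster's engine work directory.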

Note

  • Keep the following handy during disaster recovery, depending on what is affected in your cluster:

    • New certificates, if the existing settings (domain/subdomain) change after recovery.

    • A production license for the new BI nodes, in case any BI Server is impacted.

  • You must use the disaster recovery capability in any of the following scenarios: 

    • If Kyvos Manager and the BI Server or Query Engine nodes are affected. 

    • If only the Kyvos Manager nodes are affected. 

    • If Kyvos Manager and all Kyvos nodes (BI Servers, Query Engines, Web Portal, and Postgres Server) are affected. 

  • If only the BI Server or Query Engine nodes are affected, add a node for that service to restore the cluster. You do not need the disaster recovery capability in this case.

  • If TLS is enabled for Kyvos Manager and the Kyvos application, the TLS option is not applicable during disaster recovery restoration. After successful restoration, the TLS-related certificates are restored, and you can continue with the TLS option. 

Disaster recovery through the guided flow on Kyvos Manager

  1. Log on to the Kyvos Manager portal and restart the services.

  2. On the navigation pane, click Utilities > Disaster Recovery.  

Note

When you log on to the restored Kyvos Manager, you will be automatically redirected to the Disaster Recovery page, showing the state of current nodes and steps to restore the cluster. 

  1. Click the Uninstall button corresponding to Step 1: Uninstall Zookeeper in the Restore Cluster area.

  2. On the displayed confirmation dialog box, provide your Kyvos Manager password, and click the Uninstall button.


    A new browser tab opens, showing the uninstall Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you can perform the next step of deleting the offline nodes.


  3. Click the Delete button corresponding to Step 2: Delete Offline Nodes.

  4. From the Delete Offline Nodes dialog box, select the nodes you want to delete and provide your Kyvos Manager Password.
    Note that you will see only the Offline nodes in this list.

  5. Click the Delete button.
    NOTE: Once deleted, nodes cannot be retrieved.
    A new browser tab opens, showing the delete nodes operation details and status. 
    You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for adding new nodes.

  6. Click the Add button corresponding to Step 3: Add Nodes.

  7. On the Add Nodes to Cluster dialog box, provide the Node Name or IP Address, and click the Add to List button.
    You can add as many new nodes as you need, each with the desired roles (the image does not list all roles).

  8. Once done, provide your Kyvos Manager Password, and click the Add button.


    A new browser tab is opened, showing add node operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for installing Zookeeper.
     

  9. Click the Install button corresponding to Step 4: Install Zookeeper.

  10. Provide your Kyvos Manager Password on the confirmation box and click the Install button.

    A new browser tab opens, showing the install Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for switching the repository.

  11. Click the Switch button corresponding to Step 5: Switch Repository. You will be redirected to the Switch Repository page.
    Refer to the Manage Kyvos Repository section to learn more.
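
The guided flow above is strictly sequential: each step becomes available only after the previous one completes. The following minimal sketch models that ordering. The step names are taken from the wizard as described above; the function and its counter argument are hypothetical and are not part of Kyvos Manager.

```python
from typing import Optional

# Step names as shown in the Restore Cluster area of the wizard.
RESTORE_STEPS = (
    "Uninstall Zookeeper",
    "Delete Offline Nodes",
    "Add Nodes",
    "Install Zookeeper",
    "Switch Repository",
)


def next_restore_step(completed_count: int) -> Optional[str]:
    """Return the next step to perform, or None when all steps are done."""
    if not 0 <= completed_count <= len(RESTORE_STEPS):
        raise ValueError("completed_count is out of range")
    if completed_count == len(RESTORE_STEPS):
        return None
    return RESTORE_STEPS[completed_count]
```

A runbook script could use such a helper to remind the operator which wizard step comes next after each operation tab reports completion.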

Checkpoints

Verify the following checkpoints after completing the disaster recovery process.

  1. Log in to Kyvos Manager.

  2. After login, the same old cluster should be visible by default.

  3. After the addition of the new Kyvos Manager node, the license error on the Kyvos Manager cluster Dashboard goes away.

  4. Log in to the Kyvos Web portal.

  5. Run the Sanity suite.

Snapshot Bundles

Name and location to find snapshot bundles in DFS

| Bundle Name | DFS folder location inside the Engine Work directory specified for the cluster | Purpose / Contents |
| --- | --- | --- |
| biserver_conf_snapshot.tar.gz | setup/conf/ | BI configurations (bin, conf) |
| biserver_lib_snapshot.tar.gz | setup/binaries/ | BI Binaries/lib (jars) |
| biserver_connections_snapshot.tar.gz | setup/conf/ | BI Connections (Hadoop, DataLake, DBRepo) |
| queryengine_conf_snapshot.tar.gz | setup/conf/ | QE Conf (bin, conf) |
| kyvos_webapp_conf_snapshot.tar.gz | setup/conf/ | Kyvos WebApp Conf |
| kyvos_webapp_binaries_snapshot.tar.gz | setup/binaries/ | Kyvos WebApp binaries (tomcat & Kyvos WebApp) |
| hadoop_connection_conf_snapshot.tar.gz | setup/conf/ | Hadoop conf |
| hadoop_connection_lib_snapshot.tar.gz | setup/binaries/ | Hadoop lib |
| kyvos_commons_snapshot.tar.gz | setup/conf/ | Kyvos commons (Includes Acknowledgement & SanitySuite) |
| jre_snapshot.tar.gz | setup/binaries/ | Jre folder |
| postgres_snapshot.tar.gz | setup/binaries/ | Postgres bundle |
| km_agent_conf_snapshot.tar.gz | setup/conf/ | KM agent conf |
| km_agent_lib_snapshot.tar.gz | setup/binaries/ | KM agent lib |
| km_snapshot.tar.gz | setup/binaries/ | Kyvos Manager snapshot (Kyvos Manager application) |
| km_data_snapshot.tar.gz | setup/conf/ | Kyvos Manager Data |
| km_db_snapshot.tar.gz | setup/binaries/ | Kyvos Manager Repo |
| km_conf_snapshot.tar.gz | setup/conf/ | Kyvos Manager configuration |
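
As a convenience when locating bundles, the listing above can be expressed as a lookup table. The sketch below is illustrative only: the bundle names and folders are copied from the listing, while the function name and the example engine work directory are hypothetical.

```python
import posixpath

# Bundle name -> DFS folder, relative to the cluster's engine work directory.
BUNDLE_FOLDERS = {
    "biserver_conf_snapshot.tar.gz": "setup/conf/",
    "biserver_lib_snapshot.tar.gz": "setup/binaries/",
    "biserver_connections_snapshot.tar.gz": "setup/conf/",
    "queryengine_conf_snapshot.tar.gz": "setup/conf/",
    "kyvos_webapp_conf_snapshot.tar.gz": "setup/conf/",
    "kyvos_webapp_binaries_snapshot.tar.gz": "setup/binaries/",
    "hadoop_connection_conf_snapshot.tar.gz": "setup/conf/",
    "hadoop_connection_lib_snapshot.tar.gz": "setup/binaries/",
    "kyvos_commons_snapshot.tar.gz": "setup/conf/",
    "jre_snapshot.tar.gz": "setup/binaries/",
    "postgres_snapshot.tar.gz": "setup/binaries/",
    "km_agent_conf_snapshot.tar.gz": "setup/conf/",
    "km_agent_lib_snapshot.tar.gz": "setup/binaries/",
    "km_snapshot.tar.gz": "setup/binaries/",
    "km_data_snapshot.tar.gz": "setup/conf/",
    "km_db_snapshot.tar.gz": "setup/binaries/",
    "km_conf_snapshot.tar.gz": "setup/conf/",
}


def bundle_dfs_path(engine_work: str, bundle_name: str) -> str:
    """Return the full DFS path of a snapshot bundle.

    Raises KeyError if the bundle name is not in the listing.
    """
    return posixpath.join(engine_work, BUNDLE_FOLDERS[bundle_name], bundle_name)
```

For instance, with a placeholder engine work directory of `/engine_work`, `bundle_dfs_path("/engine_work", "km_snapshot.tar.gz")` gives `/engine_work/setup/binaries/km_snapshot.tar.gz`.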

Manually creating a Kyvos Manager machine for disaster recovery

If the original cluster was not deployed through automated deployment, create the Kyvos Manager machine the same way the original Kyvos Manager machine was created. For automated deployments, follow approach 2 (recommended) from the possible approaches below.

Possible approaches to create Kyvos Manager machine

These approaches mainly apply when Kyvos Manager was originally created using automated deployment. Refer to the original template for the machine type and image-related details of the original Kyvos Manager machine:

  1. Creating Kyvos Manager machine image

    1. One time after cluster deployment

      1. Pros: One-time image creation is required, and steps remain common.

      2. Cons: Binaries also need to be updated from the snapshot.

    2. Per Upgrade

      1. Pros: Binaries are already the latest; only the data snapshot update is required.

      2. Cons: Per upgrade, image creation is required.

  2. Kyvos Manager disk Snapshot (For OS as well as data disk) (Recommended approach for Azure)

    1. Create a snapshot of the OS disk in the same resource group

      1. Cons: Requires terminal access to the Kyvos Manager node and sudo access to comment out the fstab entry before creating the disk snapshot.

    2. Using created disk snapshot

      1. Create a managed disk using the above snapshot

      2. Create a new VM using this managed disk

  3. Create a template for an automated deployment Stack using existing bucket/Secrets Manager/KeyVault (BI and KM node only) and use KM part only (Cons: Kyvos Manager required)

    1. Remove the BI and QE resources from the template

    2. Delete extra created resources

  4. Clone the existing KM machine (AWS only; for testing purposes only, not for actual recovery)

  5. Launch the VM with the required image and attach a bootstrap script to it.

  6. Manually create a machine.

Steps for using created disk snapshot

If the Kyvos Manager node is restored using an image or disk snapshot, then ensure that:

  • Kyvos Manager agent process running on a node is stopped

  • Crontab entry for the agent is removed

  • Folders kyvos, kyvosmanagerdata, and Kyvos_States_Backup parallel to it are also deleted before adding this new Kyvos Manager node to the cluster.
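
The folder cleanup in the last item can be verified with a small, non-destructive check before the node is added to the cluster. This sketch assumes that the three folders sit side by side under one parent directory, which you must supply; the function name is hypothetical, and the check does not cover the agent process or the crontab entry.

```python
from pathlib import Path

# Folder names as listed above; their common parent directory is an assumption.
STALE_FOLDERS = ("kyvos", "kyvosmanagerdata", "Kyvos_States_Backup")


def leftover_folders(parent):
    """Return the stale folder names that still exist under the given parent."""
    base = Path(parent)
    return [name for name in STALE_FOLDERS if (base / name).is_dir()]
```

An empty result means that none of the listed folders remain under that parent directory.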

When the backup is created

  1. Backups of the Kyvos Postgres and Kyvos Manager databases are taken every 6 hours (at 00:00, 06:00, 12:00, and 18:00). Any operations, entities, users, or other information added to the database after the last backup cannot be recovered from that backup.

  2. A backup of the lib and conf of all components is always taken in four operations (Deploy, Upgrade, Patch Deploy, and Rollback).

  3. In addition, each operation backs up the files that can change in that operation and uploads the snapshot bundles that contain those files.
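
To estimate what a database restore will and will not contain, the 6-hour schedule above can be turned into a small calculation. This is a sketch under stated assumptions: timestamps are in the same time zone as the backup schedule, every scheduled backup completed successfully, and a change made exactly at a backup time is conservatively treated as not captured. The function names are hypothetical.

```python
from datetime import datetime

BACKUP_INTERVAL_HOURS = 6  # backups at 00:00, 06:00, 12:00, and 18:00


def last_scheduled_backup(moment: datetime) -> datetime:
    """Return the most recent scheduled backup time at or before `moment`."""
    hour = (moment.hour // BACKUP_INTERVAL_HOURS) * BACKUP_INTERVAL_HOURS
    return moment.replace(hour=hour, minute=0, second=0, microsecond=0)


def is_covered_by_backup(change_time: datetime, failure_time: datetime) -> bool:
    """Return True if a change predates the last backup taken before the failure."""
    return change_time < last_scheduled_backup(failure_time)
```

For a failure at 14:30, the last scheduled backup is the one at 12:00 on the same day, so a change made at 11:00 is covered while a change made at 13:00 is not.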

Copyright Kyvos, Inc. All rights reserved.