Disaster Recovery
Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
Guided Recovery
Kyvos Manager provides a dedicated wizard for disaster recovery of cloud clusters. The wizard provides a guided flow for performing disaster recovery across all cloud clusters.
Note
You need to manually create nodes for Kyvos Manager from the terminal.
Prerequisites
Create a new node for Kyvos Manager, and ensure the following:
This node must have the same permissions and configuration as the original Kyvos Manager node that no longer exists: roles; tags (UsedBy/CreatedBy, CLUSTER_ID, ROLE : KM, LAYER : KM_Service); network access rules and permissions (Virtual Network, Subnet, Security Group, Resource Group); credentials; size and instance type; and disk organization (mount points, disks, and the directories where Kyvos Manager and Kyvos are installed).
For access purposes, you need to either add the same security group or the security group added must have the same set of access rules and permissions.
If Secrets Manager/Key Vault is in use, then ensure that the roles assigned to the new Kyvos Manager node have access to the Secrets Manager/Key Vault.
Ensure that roles assigned to the new Kyvos Manager node have access to the S3 bucket/ABFS account.
If the Kyvos Manager node is created by attaching a disk image of an old Kyvos Manager node, then ensure the following, in the order listed:
The agent service is stopped on that node.
The agent cron entry is deleted from the crontab.
The Kyvos Manager Agent and Kyvos folders are deleted from the node.
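The three cleanup steps above can be sketched as a small script. This is a minimal, non-destructive sketch: the agent process name and install root are assumptions, the crontab is only filtered into a review file rather than installed, and the folder removal is left commented out so you can verify the paths first.

```shell
#!/bin/sh
# Pre-attach cleanup when the new Kyvos Manager node was built from a
# disk image of the old node. AGENT_PATTERN and KM_ROOT are assumptions;
# adjust them to match your deployment.
AGENT_PATTERN='kyvosmanageragent'          # assumed agent process name
KM_ROOT="${KM_ROOT:-/data}"                # assumed install root

# 1. Stop the agent service if it is running on this node.
pkill -f "$AGENT_PATTERN" 2>/dev/null || true

# 2. Build a crontab without the agent entry; review the file, then
#    install it with `crontab /tmp/crontab.cleaned` once satisfied.
crontab -l 2>/dev/null | grep -v "$AGENT_PATTERN" > /tmp/crontab.cleaned || true

# 3. Remove the stale Kyvos Manager Agent and Kyvos folders.
#    Destructive -- uncomment only after verifying the paths.
# rm -rf "$KM_ROOT/kyvos" "$KM_ROOT/kyvosmanageragent"
echo "pre-attach cleanup prepared"
```

Running the removal steps in this order matters: if the cron entry survives, it can restart the agent you just stopped.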
The OS commands must be present in the path of a non-interactive login session for the user account used to log in to the nodes.
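A quick way to verify that the required OS commands resolve for a non-interactive session is to probe them with `command -v`. The command list below is illustrative, not the exact set Kyvos requires; in practice, run the check over SSH (for example, `ssh <node> 'command -v tar'`) so the non-interactive PATH is the one being tested.

```shell
#!/bin/sh
# Probe a few common OS commands and record the results; commands that
# do not resolve in the current PATH are reported as MISSING.
for cmd in tar grep sed awk; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: ok"
  else
    echo "$cmd: MISSING"
  fi
done | tee /tmp/cmd_check.txt
```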
To restore Kyvos Manager on the new node, download the script file named disaster-recovery-kyvosmanager.sh from the DFS at the path <engine_work>/setup/scripts/ and execute it. This restores the Kyvos Manager server, and the Kyvos Manager service starts automatically.
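The download-and-run flow looks like the following sketch. Because the real script lives on your DFS, the download here is simulated with a local stub so the flow can be exercised end to end; replace the `cp` with your DFS client (for example, `aws s3 cp` or `hadoop fs -get`), and note that the stub's contents stand in for the real vendor script.

```shell
#!/bin/sh
# Simulated DFS location; on a real node this is <engine_work>/setup/scripts/.
DEMO_DFS=/tmp/engine_work_demo/setup/scripts
mkdir -p "$DEMO_DFS"
# Stub that stands in for the real disaster-recovery-kyvosmanager.sh.
printf '#!/bin/sh\necho "restore complete"\n' \
  > "$DEMO_DFS/disaster-recovery-kyvosmanager.sh"

# Download the script (simulated with a local copy), make it executable, run it.
cp "$DEMO_DFS/disaster-recovery-kyvosmanager.sh" /tmp/
chmod +x /tmp/disaster-recovery-kyvosmanager.sh
/tmp/disaster-recovery-kyvosmanager.sh > /tmp/restore_demo.out
cat /tmp/restore_demo.out
```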
Note
Keep the following handy during disaster recovery, depending on what is affected in your cluster:
New certificates, if existing settings (domain/subdomain) are changed after recovery.
A production license matching the new BI nodes, in case any BI Server is impacted.
You must use the disaster recovery capability in any of the following scenarios:
If Kyvos Manager, BI Server, or Query Engine nodes are affected.
If only the Kyvos Manager nodes are affected.
If Kyvos Manager and all Kyvos nodes (BI Servers, Query Engines, Web Portal, and Postgres Server) are affected.
If only the BI Server or Query Engine nodes are affected, add a node for that service and restore the cluster; the disaster recovery capability is not needed in this case.
If TLS is enabled for Kyvos Manager and the Kyvos application, the TLS option is not available during disaster recovery restoration. After successful restoration, the TLS-related certificates are restored, and you can continue using TLS.
Disaster recovery through the guided flow on Kyvos Manager
Log on to the Kyvos Manager portal and restart the services.
On the navigation pane, click Utilities > Disaster Recovery.
Note
When you log on to the restored Kyvos Manager, you will be automatically redirected to the Disaster Recovery page, showing the state of current nodes and steps to restore the cluster.
Click the Uninstall button corresponding to Step 1: Uninstall Zookeeper in the Restore Cluster area.
On the displayed confirmation dialog box, provide your Kyvos Manager password, and click the Uninstall button.
A new browser tab opens, showing the uninstall Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, its status is updated on the page, and you can proceed to the next step of deleting the offline nodes. Click the Delete button corresponding to Step 2: Delete Offline Nodes.
From the Delete Offline Nodes dialog box, select the nodes you want to delete and provide your Kyvos Manager Password.
Note that only the Offline nodes are shown in this list. Click the Delete button.
Note: Once deleted, nodes cannot be retrieved.
A new browser tab opens, showing the delete offline nodes operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, its status is updated on the page, and you can proceed to the next step of adding new nodes. Click the Add button corresponding to Step 3: Add Nodes.
On the Add Nodes to Cluster dialog box, provide the Node Name or IP Address, and click the Add to List button.
You can add as many new nodes with the desired roles as you need. Once done, provide your Kyvos Manager password and click the Add button.
A new browser tab opens, showing the add node operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, its status is updated on the page, and you can proceed to the next step of installing Zookeeper.
Click the Install button corresponding to Step 4: Install Zookeeper.
Provide your Kyvos Manager password in the confirmation box and click the Install button.
A new browser tab opens, showing the install Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, its status is updated on the page, and you can proceed to the next step of switching the repository. Click the Switch button corresponding to Step 5: Switch Repository. You are redirected to the Switch Repository page.
Refer to the Manage Kyvos Repository section to learn more.
Important
When Kyvos Manager HA is enabled and managed Zookeeper is used, then after completing the disaster recovery activity, stop and start Kyvos Manager from the terminal (not from the Kyvos Manager UI), irrespective of whether TLS is enabled. Prior to the Kyvos 2024.1 release, a Kyvos Manager restart was required only when TLS was enabled.
After completing disaster recovery, ensure that the snapshot bundles listed below are pushed from Kyvos Manager. To do this, navigate to Utilities > Update Snapshot Bundles.
Checkpoints
Verify the following checkpoints after completing the disaster recovery process:
Log in to Kyvos Manager.
After login, the existing cluster should be visible by default.
After the new Kyvos Manager node is added, the license error on the Kyvos Manager cluster dashboard goes away.
Log in to the Kyvos Web portal.
Run the Sanity suite.
Names and locations of snapshot bundles in the DFS
| Bundle Name | DFS folder location inside the Engine Work directory specified for the cluster | Purpose / Contents |
| --- | --- | --- |
| biserver_conf_snapshot.tar.gz | setup/conf/ | BI configurations (bin, conf) |
| biserver_lib_snapshot.tar.gz | setup/binaries/ | BI binaries/lib (jars) |
| biserver_connections_snapshot.tar.gz | setup/conf/ | BI connections (Hadoop, DataLake, DBRepo) |
| queryengine_conf_snapshot.tar.gz | setup/conf/ | QE conf (bin, conf) |
| kyvos_webapp_conf_snapshot.tar.gz | setup/conf/ | Kyvos WebApp conf |
| kyvos_webapp_binaries_snapshot.tar.gz | setup/binaries/ | Kyvos WebApp binaries (Tomcat & Kyvos WebApp) |
| hadoop_connection_conf_snapshot.tar.gz | setup/conf/ | Hadoop conf |
| hadoop_connection_lib_snapshot.tar.gz | setup/binaries/ | Hadoop lib |
| kyvos_commons_snapshot.tar.gz | setup/conf/ | Kyvos commons (includes Acknowledgement & SanitySuite) |
| jre_snapshot.tar.gz | setup/binaries/ | JRE folder |
| postgres_snapshot.tar.gz | setup/binaries/ | Postgres bundle |
| km_agent_conf_snapshot.tar.gz | setup/conf/ | KM agent conf |
| km_agent_lib_snapshot.tar.gz | setup/binaries/ | KM agent lib |
| km_snapshot.tar.gz | setup/binaries/ | Kyvos Manager snapshot (Kyvos Manager application) |
| km_data_snapshot.tar.gz | setup/conf/ | Kyvos Manager data |
| km_db_snapshot.tar.gz | setup/binaries/ | Kyvos Manager repository |
| km_conf_snapshot.tar.gz | setup/conf/ | Kyvos Manager configuration |
Note
Any manual changes that need to be included in snapshots must be applied on the Kyvos Manager node (the node having the Web Portal role, and the Postgres role when the bundled repository is used).
The start scripts of the BI Server and Query Engine download snapshot bundles from the DFS to sync their state. If manual changes were made on a node but are not yet part of the snapshot bundle, starting the BI Server or Query Engine syncs the bundles at startup, and any local changes to same-named files that already exist in the bundle are lost.
For the Web Portal, there are two scripts, start-dc.sh and start-dc_sync.sh, in the jakarta/bin folder. When the Web Portal service is started normally from the Kyvos Manager UI, start-dc.sh is executed. However, for Services HA or a disaster recovery-based Web Portal start, Kyvos Manager executes the start-dc_sync.sh script.
During any operation performed from Kyvos Manager, only the bundles containing files that may change during that particular operation are uploaded.
Manually creating a Kyvos Manager machine to restore Kyvos Manager for disaster recovery
If the original cluster deployment was not automated, create the Kyvos Manager machine in the same way the original Kyvos Manager machine was created. For automated deployment cases, follow the possible approaches described below.
Note
You MUST create a backup/snapshot/image of the Kyvos Manager node after deployment/upgrade completion.
Possible approaches to create Kyvos Manager machine
These approaches mainly apply when Kyvos Manager was originally created using automated deployment. Refer to the original template for the machine type and image-related details of the original Kyvos Manager machine:
Creating Kyvos Manager machine image
One time after cluster deployment
Pros: Image creation is required only once, and the steps remain common.
Cons: Binaries must also be updated from the snapshot.
Per Upgrade
Pros: Binaries are already the latest; only the data snapshots are required.
Cons: Image creation is required for every upgrade.
Kyvos Manager disk snapshot (for the OS disk as well as the data disk) (recommended approach for Azure)
Create a snapshot of the OS disk in the same resource group.
Cons: Needs terminal access to the Kyvos Manager node and sudo access to comment out the fstab entry before creating the disk snapshot.
Using the created disk snapshot:
Create a managed disk using the above snapshot.
Create a new VM using this managed disk.
Create a template for an automated deployment stack using the existing bucket/Secrets Manager/Key Vault (BI and KM nodes only) and use only the KM part. (Cons: Kyvos Manager is required.)
Remove the BI and QE resources from the template.
Delete the extra created resources.
Clone the existing KM machine. This is applicable only for AWS (for testing purposes only, not for production use).
Launch the VM with the required image and attach a bootstrap script to it.
Manually create a machine.
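The fstab step in the disk-snapshot approach above (commenting out the data-disk entry before taking the OS-disk snapshot, so the cloned VM does not try to mount a disk that is not attached yet) can be sketched with sed. This demo edits a copy of an fstab-like file; on a real node you would run the same sed against /etc/fstab with sudo, and the /data mount point is an assumption.

```shell
#!/bin/sh
# Create a small fstab-like file to demonstrate on (contents illustrative).
cat > /tmp/fstab.demo <<'EOF'
/dev/sda1 / ext4 defaults 0 1
/dev/sdb1 /data ext4 defaults 0 2
EOF

# Prefix the /data line with '#'; -i.bak keeps the original for rollback.
sed -i.bak 's|^\([^#].* /data .*\)|# \1|' /tmp/fstab.demo
cat /tmp/fstab.demo
```

After the snapshot-based VM is created and its data disk is attached, the entry can be uncommented (or restored from the .bak file).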
Steps for using the created disk snapshot
If the Kyvos Manager node is restored using an image or disk snapshot, then ensure that:
The Kyvos Manager agent process running on the node is stopped.
The crontab entry for the agent is removed.
The kyvos and kyvosmanagerdata folders, and the Kyvos_States_Backup folder parallel to them, are deleted before adding this new Kyvos Manager node to the cluster.
When the backup is created
Backups of the Kyvos Postgres and Kyvos Manager databases are taken periodically at an interval of six hours (00:00, 06:00, 12:00, and 18:00). Any operations, entities, users, or information added to the database after the last backup will not be recovered from that backup.
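For reference, the documented six-hour cadence corresponds to a cron schedule of the following shape. This is illustrative only: the backup job is managed by Kyvos Manager itself, and the script path shown is hypothetical.

```
# min  hour          dom mon dow  command
  0    0,6,12,18     *   *   *    /path/to/kyvos-repo-backup.sh   # hypothetical path
```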
Backups of all libs and confs of all the components are always taken in four operations: Deploy, Upgrade, Patch Deploy, and Rollback.
Other than this, in each operation a backup is taken of the files that can change in that operation, and the respective snapshot bundle that those files are part of is uploaded.
Copyright Kyvos, Inc. All rights reserved.