Disaster Recovery
Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
Guided Recovery
Kyvos Manager provides a dedicated wizard that guides you through disaster recovery for all cloud clusters.
Note
You need to manually create nodes for Kyvos Manager from the terminal.
Prerequisites
Create a new node for Kyvos Manager, and ensure the following:
This node should have the same set of permissions in terms of roles, tags (UsedBy / CreatedBy, CLUSTER_ID, ROLE : KM, LAYER : KM_Service), network access rules and permissions (VirtualNetwork, Subnet, Security Group, Resource Group), credentials, size and instance type, and disk organization (mount point, disks, directories where Kyvos Manager and Kyvos are installed) as the original Kyvos Manager node, which no longer exists.
For access purposes, you need to either add the same security group or the security group added must have the same set of access rules and permissions.
If Secrets Manager/Key Vault is in use, then ensure that the roles assigned to the new Kyvos Manager node have access to the Secrets Manager/Key Vault.
Ensure that roles assigned to the new Kyvos Manager node have access to the S3 bucket/ABFS account.
If the Kyvos Manager node is created by attaching a disk image of an old Kyvos Manager node, then ensure the following, in the sequence listed:
The Agent service is stopped on that node.
The Agent cron entry is deleted from crontab.
The Kyvos Manager Agent and Kyvos folders are deleted from the node.
To restore Kyvos Manager on the new node, download a script file named disaster-recovery-kyvosmanager.sh from the DFS at path <engine_work>/setup/scripts/ and execute that script. This will restore the Kyvos Manager server.
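The download-and-execute step above can be sketched as a small shell helper. This is only an illustration for an AWS deployment: the function names are hypothetical, the engine work directory is assumed to be passed as an s3:// URI, and whether the script needs arguments or elevated privileges is not stated here, so none are passed.

```shell
#!/bin/sh
# Sketch: fetch and run the Kyvos Manager recovery script on the new node.
# Assumption: the engine work directory is given as an s3:// URI (AWS case).

SCRIPT_NAME="disaster-recovery-kyvosmanager.sh"

# Build the DFS location of the script: <engine_work>/setup/scripts/<script>.
recovery_script_uri() {
    engine_work="${1%/}"    # drop one trailing slash, if present
    printf '%s/setup/scripts/%s\n' "$engine_work" "$SCRIPT_NAME"
}

# Download the script into the current directory and execute it.
# Hypothetical helper; the script is run without arguments.
fetch_and_run_recovery_script() {
    uri="$(recovery_script_uri "$1")"
    aws s3 cp "$uri" "./$SCRIPT_NAME" || return 1
    chmod +x "./$SCRIPT_NAME" || return 1
    "./$SCRIPT_NAME"
}
```

A call would look like `fetch_and_run_recovery_script s3://<bucket>/<engine_work>`. On Azure, the same location could be fetched with `azcopy login --identity` followed by `azcopy cp` instead of `aws s3 cp`.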
Note
Keep the following things handy during disaster recovery, depending on what is affected in your cluster.
New certificates, if existing settings (domain/subdomain) are changed after recovery.
A production license for the new BI nodes, if any BI Server is impacted.
You must use the disaster recovery capability in any of the following scenarios:
If Kyvos Manager, BI Server, or Query Engine nodes are affected.
If only the Kyvos Manager nodes are affected.
If Kyvos Manager and all Kyvos nodes (BI Servers, Query Engines, WebPortal, and Postgres Server) are affected.
If only the BI Server or Query Engine nodes are affected, then add a node for that service, and the cluster can be restored. You will not need to use disaster recovery capability for this case.
If you enable TLS for Kyvos Manager and Kyvos application, the TLS option is not applicable during the Disaster Recovery restoration. After successful restoration, the TLS-related certificates are restored, and you can continue with the TLS option.
Disaster recovery through the guided flow on Kyvos Manager
Log on to the Kyvos Manager portal and restart the services.
On the navigation pane, click Utilities > Disaster Recovery.
Note
When you log on to the restored Kyvos Manager, you will be automatically redirected to the Disaster Recovery page, showing the state of current nodes and steps to restore the cluster.
Click the Uninstall button corresponding to Step 1: Uninstall Zookeeper in the Restore Cluster area.
On the displayed confirmation dialog box, provide your Kyvos Manager password, and click the Uninstall button.
A new browser tab opens, showing the uninstall Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for deleting the offline nodes.
Click the Delete button corresponding to Step 2: Delete Offline Nodes.
From the Delete Offline Nodes dialog box, select the nodes you want to delete and provide your Kyvos Manager Password.
Note that you will see only the Offline nodes in this list.
Click the Delete button.
NOTE: Once deleted, nodes cannot be retrieved.
A new browser tab opens, showing the delete node operation details and status.
You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for adding new nodes.
Click the Add button corresponding to Step 3: Add Nodes.
On the Add Nodes to Cluster dialog box, provide the Node Name or IP Address, and click the Add to List button.
You can add as many new nodes with the desired roles (not all roles are listed in the image) as you need.
Once done, provide your Kyvos Manager Password, and click the Add button.
A new browser tab is opened, showing add node operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for installing Zookeeper.
Click the Install button corresponding to Step 4: Install Zookeeper.
Provide your Kyvos Manager Password on the confirmation box and click the Install button.
A new browser tab opens, showing the install Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for switching the repository.
Click the Switch button corresponding to Step 5: Switch Repository. You will be redirected to the Switch Repository page.
Refer to the Manage Kyvos Repository section to learn more.
Manual Recovery
Steps for Manual Recovery of Kyvos Manager Node and Roles on it
Create a new node for Kyvos Manager, and ensure the following:
This node should have the same set of permissions in terms of roles, tags (UsedBy / CreatedBy, CLUSTER_ID, ROLE : KM, LAYER : KM_Service), network access rules and permissions (VirtualNetwork, Subnet, Security Group, Resource Group), credentials, size and instance type, and disk organization (mount point, disks, directories where Kyvos Manager and Kyvos are installed) as the original Kyvos Manager node, which no longer exists.
For access purposes, you need to either add the same security group or the security group added must have the same set of access rules and permissions.
If Secrets Manager/Key Vault is in use, then ensure that the roles assigned to the new Kyvos Manager node have access to the Secrets Manager/Key Vault.
Ensure that roles assigned to the new Kyvos Manager node have access to the S3 bucket/ABFS account.
Download the snapshots (KyvosManager, KyvosManager data, and KyvosManager DB) from DFS to the node created above. Refer to the Snapshot Bundles table below for the location of each folder on DFS. On Azure, you can find the URL for downloading individual snapshot bundles in the Azure portal by navigating to that folder within the container of the ABFS account used in the deployment.
Azure: For the commands below to work, ensure that an identity is already attached to the newly created Kyvos Manager node.
azcopy login --identity
azcopy cp ABFS-folder-path local-path
AWS:
aws s3 cp s3-path local-path
Untar these bundles in the order listed above (KyvosManager, KyvosManager data, KyvosManager DB), at the same paths they occupied on the original Kyvos Manager node.
Start Kyvos Manager using the startup.sh script.
In Kyvos Manager, navigate to Kyvos Manager > Settings, and perform the following steps.
In the Kyvos Manager Server Details area, click Reconfigure.
Update the Hostname and Port for Kyvos Manager.
Click the Validate button. You will see the validation error "Server accessibility failed from 1 node". This is due to the unavailability of the old Kyvos Manager node.
Click Apply.
Navigate to the Dashboard. The cluster dashboard will show an "Unable to get license info" error (see image below). Ignore it until the new Kyvos Manager node is added to the cluster.
Stop Kyvos component services using the Actions menu for each component.
Click Manage Kyvos > Disaster Recovery on the navigation pane.
Depending on the current state of the system, you may see up to 3 links.
If Kyvos Manager-managed multi-node Zookeeper was deployed, then the first link will appear for Zookeeper removal. For a single-node Kyvos Manager-managed Zookeeper, no such link will appear.
Then, you will see a link for removing the old Kyvos Manager node, which is no longer available.
Thereafter, you will see a link to Add a new (current) Kyvos Manager node.
Remove Zookeeper using the link.
Remove the old Kyvos Manager node using the Remove Unreachable Node link. This initiates the Remove Node operation for removing the node having a WebPortal role (and Postgres role if bundled Repository is being used) from the cluster. You will be redirected to the Remove Node operation details page.
Go to the Disaster Recovery page, and perform the following steps.
Click the Add Node link to add a new Kyvos Manager node to the cluster. This will initiate the Add Node operation for adding the Web Portal role on the new Kyvos Manager node. You will be redirected to the Add Node operation details page.
In case of any failure, re-perform this operation.
On successful completion of this operation, the Kyvos folder will be added to this node.
If bundled Postgres was in use, then:
Download the Postgres snapshot bundle from the binaries folder, delete the existing Postgres folder on the Kyvos Manager node, and set up the folder extracted from this snapshot as Postgres on the node. Extract the snapshot so that the resulting folder sits parallel to the kyvos folder.
Download the latest/applicable Postgres dump bundle from DFS (from the data folder) to the new Kyvos Manager node.
Start Postgres service on the Kyvos Manager node.
Import the dump in the Postgres instance (see the Manage Kyvos Repository section)
On the Switch Repository page, configure the bundled repository on the Kyvos Manager node (see the Manage Kyvos Repository section).
If any additional nodes are impacted, then:
Remove those nodes using the Delete Node functionality of Kyvos Manager.
Add the newly created node with the required roles on it.
For cloud-based clusters, add Zookeeper to the cluster depending on how Zookeeper was used earlier.
If non-managed Zookeeper was in use, then set the Zookeeper string on the Hadoop Ecosystem configuration page to the new Kyvos Manager node's IP address followed by port 2181 (ip:2181).
If Kyvos Manager-managed Zookeeper was in use, then deploy the Zookeeper component from the Hadoop Ecosystem configuration page.
Start Kyvos Component services from the Dashboard using the Actions menu.
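The snapshot download and extraction steps at the start of this manual procedure could be scripted roughly as follows. This is a sketch under several assumptions: an AWS deployment, the engine work directory passed as an s3:// URI, and a single hypothetical extraction directory. In practice each bundle must be extracted at the path it occupied on the original node, which may differ per bundle. Bundle names and folders are taken from the Snapshot Bundles table.

```shell
#!/bin/sh
# Sketch: restore the three Kyvos Manager bundles in the documented order.

# Print the DFS paths (relative to the engine work directory) in the order
# the bundles must be extracted: KyvosManager, KyvosManager data, KyvosManager DB.
km_bundles_in_order() {
    printf '%s\n' \
        "setup/binaries/km_snapshot.tar.gz" \
        "setup/conf/km_data_snapshot.tar.gz" \
        "setup/binaries/km_db_snapshot.tar.gz"
}

# Download each bundle and extract it before moving on to the next one.
# $1 = engine work URI (assumed s3://...), $2 = extraction directory (assumed).
restore_km_bundles() {
    engine_work="${1%/}"
    target_dir="$2"
    for entry in $(km_bundles_in_order); do
        bundle="${entry##*/}"    # file name without the folder part
        aws s3 cp "$engine_work/$entry" "/tmp/$bundle" || return 1
        tar -xzf "/tmp/$bundle" -C "$target_dir" || return 1
    done
}
```

Stopping at the first failed download or extraction keeps a later bundle from being unpacked on top of an incomplete earlier one.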
Checkpoints
Verify the following checkpoints after completing the disaster recovery process:
Log in to Kyvos Manager.
After login, the same old cluster should be visible by default.
After the addition of the new Kyvos Manager node, the license error on the Kyvos Manager cluster Dashboard goes away.
Log in to the Kyvos Web portal.
Run the Sanity suite.
Snapshot Bundles:
Name and location to find snapshot bundles in DFS
Bundle Name | DFS folder Location inside Engine Work directory specified for cluster | Purpose / Contents |
biserver_conf_snapshot.tar.gz | setup/conf/ | BI configurations (bin, conf) |
biserver_lib_snapshot.tar.gz | setup/binaries/ | BI Binaries/lib (jars) |
biserver_connections_snapshot.tar.gz | setup/conf/ | BI Connections (Hadoop, DataLake, DBRepo) |
queryengine_conf_snapshot.tar.gz | setup/conf/ | QE Conf (bin, conf) |
kyvos_webapp_conf_snapshot.tar.gz | setup/conf/ | Kyvos WebApp Conf |
kyvos_webapp_binaries_snapshot.tar.gz | setup/binaries/ | Kyvos WebApp binaries (tomcat & Kyvos WebApp) |
hadoop_connection_conf_snapshot.tar.gz | setup/conf/ | Hadoop conf |
hadoop_connection_lib_snapshot.tar.gz | setup/binaries/ | Hadoop lib |
kyvos_commons_snapshot.tar.gz | setup/conf/ | Kyvos commons (Includes Acknowledgement & SanitySuite) |
jre_snapshot.tar.gz | setup/binaries/ | Jre folder |
postgres_snapshot.tar.gz | setup/binaries/ | Postgres bundle |
km_agent_conf_snapshot.tar.gz | setup/conf/ | KM agent conf |
km_agent_lib_snapshot.tar.gz | setup/binaries/ | KM agent lib |
km_snapshot.tar.gz | setup/binaries/ | KyvosManager snapshot (kyvosmanager application) |
km_data_snapshot.tar.gz | setup/conf/ | KyvosManager Data |
km_db_snapshot.tar.gz | setup/binaries/ | KyvosManager Repo |
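When scripting downloads, the table above can be encoded as a lookup so that a bundle name resolves to its DFS folder. The function name is hypothetical; the mapping itself simply mirrors the table rows.

```shell
#!/bin/sh
# Sketch: map a snapshot bundle name to its folder inside the engine work
# directory, as listed in the Snapshot Bundles table.
bundle_dfs_folder() {
    case "$1" in
        biserver_conf_snapshot.tar.gz | biserver_connections_snapshot.tar.gz) echo "setup/conf/" ;;
        queryengine_conf_snapshot.tar.gz | kyvos_webapp_conf_snapshot.tar.gz) echo "setup/conf/" ;;
        hadoop_connection_conf_snapshot.tar.gz | kyvos_commons_snapshot.tar.gz) echo "setup/conf/" ;;
        km_agent_conf_snapshot.tar.gz | km_data_snapshot.tar.gz) echo "setup/conf/" ;;
        biserver_lib_snapshot.tar.gz | kyvos_webapp_binaries_snapshot.tar.gz) echo "setup/binaries/" ;;
        hadoop_connection_lib_snapshot.tar.gz | jre_snapshot.tar.gz) echo "setup/binaries/" ;;
        postgres_snapshot.tar.gz | km_agent_lib_snapshot.tar.gz) echo "setup/binaries/" ;;
        km_snapshot.tar.gz | km_db_snapshot.tar.gz) echo "setup/binaries/" ;;
        *) echo "unknown bundle: $1" >&2; return 1 ;;
    esac
}
```

For example, `bundle_dfs_folder postgres_snapshot.tar.gz` prints `setup/binaries/`, which can then be appended to the engine work directory path.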
Manually creating a Kyvos Manager machine for disaster recovery purposes to restore Kyvos Manager.
If the original cluster deployment was not an automated cluster deployment, then create the Kyvos Manager machine the same way the original Kyvos Manager machine was created. For automated deployment cases, follow approach 2, recommended among the possible approaches below.
Possible approaches to create Kyvos Manager machine
These approaches apply mainly when Kyvos Manager was originally created using automated deployment. Refer to the original template for the machine type and image-related details of the original Kyvos Manager machine:
Creating Kyvos Manager machine image
One time after cluster deployment
Pros: One-time image creation is required, and steps remain common.
Cons: Binaries must also be updated from the snapshot.
Per Upgrade
Pros: Binaries are already the latest; only the data snapshot is required.
Cons: Per upgrade, image creation is required.
Kyvos Manager disk Snapshot (For OS as well as data disk) (Recommended approach for Azure)
Create a snapshot of the OS disk in the same resource group
Cons: Requires terminal access to the Kyvos Manager node and sudo access to comment out the fstab entry before creating the disk snapshot.
Using created disk snapshot
Create a managed disk using the above snapshot
Create a new VM using this managed disk
Create a template for an automated deployment Stack using existing bucket/Secrets Manager/KeyVault (BI and KM node only) and use KM part only (Cons: Kyvos Manager required)
Remove the BI and QE resources from the template
Delete extra created resources
Clone the existing KM machine (AWS only; for testing purposes, not for actual recovery)
Launch the VM with the required image and attach a bootstrap script to it.
Manually create a machine.
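For the disk snapshot approach on Azure (snapshot, then managed disk, then new VM), the Azure CLI calls involved can be sketched as below. The helper only prints the commands so they can be reviewed before being run. All resource names are hypothetical, a Linux OS disk is assumed, and only the OS disk is covered; a data disk would need its own snapshot and managed disk, and options such as VM size, network, and identity must match the original node.

```shell
#!/bin/sh
# Sketch: print (not run) the Azure CLI commands for the OS-disk flow.
# $1 = resource group, $2 = original OS disk name or ID, $3 = new VM name.
print_km_disk_restore_commands() {
    rg="$1"
    os_disk="$2"
    new_vm="$3"
    snap="${new_vm}-os-snap"    # hypothetical snapshot name
    disk="${new_vm}-os-disk"    # hypothetical managed disk name
    echo "az snapshot create --resource-group $rg --name $snap --source $os_disk"
    echo "az disk create --resource-group $rg --name $disk --source $snap"
    echo "az vm create --resource-group $rg --name $new_vm --attach-os-disk $disk --os-type linux"
}
```

Printing first is a deliberate choice: the snapshot step is only valid after the fstab entry has been commented out on the source node, so the commands should not be run blindly.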
Steps for using created disk snapshot
If the Kyvos Manager node is restored using an image or disk snapshot, then ensure that:
Kyvosmanager agent process running on a node is stopped
Crontab entry for the agent is removed
Folders kyvos, kyvosmanagerdata, and Kyvos_States_Backup parallel to it are also deleted before adding this new Kyvos Manager node to the cluster.
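Part of this cleanup can be sketched in shell. The helpers below are illustrative only: the cron match pattern and the parent directory are assumptions that depend on how the original node was set up, and stopping the agent process is left out because the exact service command is not given here.

```shell
#!/bin/sh
# Sketch: helpers for cleaning a node restored from an image or disk snapshot.

# Read crontab text on stdin and drop every line containing the given pattern.
# The pattern identifying the agent entry is an assumption.
filter_agent_cron() {
    grep -v -- "$1" || true    # grep exits 1 when nothing is left; not an error here
}

# List the folders to delete, assuming they sit side by side under one parent.
# $1 = parent directory (assumed; use the actual installation location).
stale_km_folders() {
    base="${1%/}"
    for name in kyvos kyvosmanagerdata Kyvos_States_Backup; do
        printf '%s/%s\n' "$base" "$name"
    done
}
```

A possible use is `crontab -l | filter_agent_cron <pattern> | crontab -` to rewrite the crontab, followed by reviewing the output of `stale_km_folders <parent-dir>` before removing those folders.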
When the backup is created
A backup of the Kyvos Postgres and Kyvos Manager databases is taken periodically, every 6 hours (00:00, 06:00, 12:00, and 18:00). Any operations, entities, users, or information added to the database after the last backup cannot be recovered from it.
A backup of all lib and conf of all the components is always taken in 4 operations (Deploy, Upgrade, Patch, and Rollback).
In addition, each operation backs up the files that can change in that operation, and the snapshot bundle that contains those files is uploaded again.
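To estimate how much database activity could be lost, the 6-hour schedule can be turned into simple arithmetic: the last backup ran at the most recent multiple of 6 hours. The helpers below are a sketch; the function names are hypothetical, and the schedule is assumed to follow the same clock as the time passed in.

```shell
#!/bin/sh
# Sketch: work out the most recent scheduled database backup (00, 06, 12, 18).

# Strip one leading zero so values like "08" are not read as octal.
to_decimal() {
    n="${1#0}"
    echo "${n:-0}"
}

# $1 = hour of day (0-23). Prints the hour of the last scheduled backup.
last_backup_hour() {
    hour="$(to_decimal "$1")"
    echo $(( hour / 6 * 6 ))
}

# $1 = hour, $2 = minute. Prints minutes elapsed since the last backup,
# i.e. the window of changes that would not be recoverable.
minutes_since_last_backup() {
    hour="$(to_decimal "$1")"
    minute="$(to_decimal "$2")"
    echo $(( hour % 6 * 60 + minute ))
}
```

For a failure at 14:30, `last_backup_hour 14` prints `12` and `minutes_since_last_backup 14 30` prints `150`, so roughly two and a half hours of database changes would be missing after recovery.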
Copyright Kyvos, Inc. All rights reserved.