Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace

...


Note

  • Keep the following handy during disaster recovery, depending on what is affected in your cluster:

    • New certificates, if the existing settings (domain/subdomain) change after recovery.

    • A production license for the new BI nodes, if any BI Server is impacted.

  • You must use the disaster recovery capability in any of the following scenarios: 

    • If Kyvos Manager nodes are affected along with BI Server or Query Engine nodes.

    • If only the Kyvos Manager nodes are affected. 

    • If Kyvos Manager and all Kyvos nodes (BI Servers, Query Engines, Web Portal, and Postgres Server) are affected. 

  • If only the BI Server or Query Engine nodes are affected, add a node for that service to restore the cluster. You do not need the disaster recovery capability in this case.

  • If you enable TLS for Kyvos Manager and Kyvos application, the TLS option is not applicable during the Disaster Recovery restoration. After successful restoration, the TLS-related certificates are restored, and you can continue with the TLS option. 
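The scenarios in the note above can be summarized as a small decision rule. The following is a minimal sketch, not part of Kyvos tooling: the function name recovery_path and the component labels km, bi, and qe are hypothetical, and it assumes that disaster recovery is needed whenever a Kyvos Manager node is among the affected nodes.

```shell
#!/bin/sh
# Sketch only: recovery_path and the labels km/bi/qe are hypothetical names.
# Assumption: disaster recovery is needed whenever a Kyvos Manager node is
# affected; if only BI Server or Query Engine nodes are affected, adding a
# node for that service is enough.
recovery_path() {
  km_hit=no
  service_hit=no
  for component in "$@"; do
    case "$component" in
      km) km_hit=yes ;;
      bi|qe) service_hit=yes ;;
      *)
        echo "unknown component: $component" >&2
        return 2
        ;;
    esac
  done
  if [ "$km_hit" = yes ]; then
    echo "disaster-recovery"
  elif [ "$service_hit" = yes ]; then
    echo "add-node"
  else
    echo "none"
  fi
}
```

For example, recovery_path km qe prints disaster-recovery, while recovery_path bi prints add-node.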

Disaster recovery through the guided flow on Kyvos Manager

...

  1. Click the Uninstall button corresponding to Step 1: Uninstall Zookeeper in the Restore Cluster area.

  2. On the displayed confirmation dialog box, provide your Kyvos Manager password, and click the Uninstall button.


    A new browser tab is opened, showing the uninstall Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for deleting the offline nodes.

  3. Click the Delete button corresponding to Step 2: Delete Offline Nodes.

  4. From the Delete Offline Nodes dialog box, select the nodes you want to delete and provide your Kyvos Manager Password.
    Note that you will see only the Offline nodes in this list.

  5. Click the Delete button.
    NOTE: Once deleted, nodes cannot be retrieved.
    A new browser tab is opened, showing the delete node operation details and status.
    You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for adding new nodes.

  6. Click the Add button corresponding to Step 3: Add Nodes.

  7. On the Add Nodes to Cluster dialog box, provide the Node Name or IP Address, and click the Add to List button.
    You can add as many new nodes as you need, each with the desired roles (not all roles are shown in the image).

  8. Once done, provide your Kyvos Manager Password, and click the Add button.


    A new browser tab is opened, showing add node operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for installing Zookeeper.
     

  9. Click the Install button corresponding to Step 4: Install Zookeeper.

  10. Provide your Kyvos Manager Password on the confirmation box, and click the Install button.

    A new browser tab is opened, showing the install Zookeeper operation details and status. You may switch back to the Disaster Recovery browser tab.
    Once the operation is completed, you will see the status shown in the following figure. At this point, you will be able to perform the next step for switching the repository.

  11. Click the Switch button corresponding to Step 5: Switch Repository. You will be redirected to the Switch Repository page.
    Refer to the Manage Kyvos Repository section to learn more.
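In the guided flow above, each step becomes available only after the previous operation completes. The ordering can be sketched as follows. This is an illustration only: next_guided_step is a hypothetical helper, and passing the number of the last completed step (0 if none) is an assumed convention, not something Kyvos Manager exposes.

```shell
#!/bin/sh
# Sketch only: next_guided_step is a hypothetical helper, not a Kyvos command.
# $1: number of the last completed guided-flow step (0 if none completed yet).
next_guided_step() {
  case "$1" in
    0) echo "Step 1: Uninstall Zookeeper" ;;
    1) echo "Step 2: Delete Offline Nodes" ;;
    2) echo "Step 3: Add Nodes" ;;
    3) echo "Step 4: Install Zookeeper" ;;
    4) echo "Step 5: Switch Repository" ;;
    5) echo "done" ;;
    *)
      echo "invalid step number: $1" >&2
      return 1
      ;;
  esac
}
```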

...

Manual Recovery

Steps for Manual Recovery of Kyvos Manager Node and Roles on it

  1. Create a new node for Kyvos Manager, and ensure the following:

    1. This node should match the original Kyvos Manager node, which no longer exists, in the following respects: roles and their permissions, tags (UsedBy / CreatedBy, CLUSTER_ID, ROLE : KM, LAYER : KM_Service), network access rules and permissions (Virtual Network, Subnet, Security Group, Resource Group), credentials, size and instance type, and disk organization (mount point, disks, and the directories where Kyvos Manager and Kyvos are installed).

    2. For access purposes, either add the same security group, or ensure that the security group you add has the same set of access rules and permissions.

    3. If Secrets Manager/Key Vault is in use, then ensure that the roles assigned to the new Kyvos Manager node have access to the Secrets Manager/Key Vault.

    4. Ensure that roles assigned to the new Kyvos Manager node have access to the S3 bucket/ABFS account.

  2. Download the snapshots (KyvosManager, KyvosManager data, and KyvosManager DB) from DFS to the node created above. Refer to the Snapshot bundles table below for the location of the folder on DFS. For Azure, you can find the URL for downloading individual snapshot bundles in the Azure portal by navigating to that folder within the container of the ABFS account used in the deployment.

    1. Azure: For the commands below to work, ensure that an identity is already attached to the newly created Kyvos Manager node.
      azcopy login --identity
      azcopy cp ABFS-folder-path local-path

    2. AWS:
      aws s3 cp s3-path local-path

  3. Untar these bundles in the order listed above, at the same paths they occupied on the original Kyvos Manager node.


Note

The expected path of the kyvosmanager and kyvosmanagerdata folders can be cross-checked with variables configured in the setenv.sh file at kyvosmanager_war/kyvosmanager/setenv.sh after untar of km_snapshot.tar.gz.

  • tar -xvf km_snapshot.tar.gz

  • tar -xvf km_data_snapshot.tar.gz

  • tar -xvf km_db_snapshot.tar.gz


Note

Place the Kyvos Manager DB snapshot tar at the same level as the kyvosmanagerdata folder before extracting it. This ensures that after you untar km_db_snapshot.tar.gz, the ankushdb folder is created at the kyvosmanagerdata/server/db/ location.
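The download and untar steps above can be sketched as two small shell helpers. This is a minimal sketch under stated assumptions, not Kyvos tooling: the function names print_download_cmds and extract_km_snapshots are hypothetical, the first one only prints the commands shown in step 2 rather than running them, and the second assumes that kyvosmanager and kyvosmanagerdata share one parent directory on your node. Cross-check the actual paths against setenv.sh as described in the note above.

```shell
#!/bin/sh
# Sketch only. Function names, arguments, and the single shared parent
# directory are assumptions for illustration, not part of Kyvos tooling.

# Print (do not run) the download commands from step 2 for one bundle.
# $1: provider (azure | aws), $2: bundle path on DFS, $3: local path.
print_download_cmds() {
  case "$1" in
    azure)
      printf 'azcopy login --identity\n'
      printf 'azcopy cp "%s" "%s"\n' "$2" "$3"
      ;;
    aws)
      printf 'aws s3 cp "%s" "%s"\n' "$2" "$3"
      ;;
    *)
      echo "unknown provider: $1" >&2
      return 1
      ;;
  esac
}

# Extract the three bundles in the documented order, then confirm that the
# DB snapshot produced kyvosmanagerdata/server/db/ankushdb.
# $1: directory holding the downloaded bundles
# $2: parent directory of kyvosmanager and kyvosmanagerdata (assumed shared)
extract_km_snapshots() {
  for bundle in km_snapshot.tar.gz km_data_snapshot.tar.gz km_db_snapshot.tar.gz; do
    if [ ! -f "$1/$bundle" ]; then
      echo "missing bundle: $1/$bundle" >&2
      return 1
    fi
    tar -xf "$1/$bundle" -C "$2" || return 1
  done
  if [ ! -d "$2/kyvosmanagerdata/server/db/ankushdb" ]; then
    echo "ankushdb not found under $2/kyvosmanagerdata/server/db/" >&2
    return 2
  fi
  echo "snapshots extracted under $2"
}
```

The extraction helper stops at the first missing bundle, so a partial download is caught before Kyvos Manager is started against an incomplete folder layout.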

  1. Start Kyvos Manager using the startup.sh script.

  2. In Kyvos Manager, navigate to Kyvos Manager > Settings, and perform the following steps.

    1. In the Kyvos Manager Server Details area, click Reconfigure.

    2. Update the Hostname and Port for Kyvos Manager.

    3. Click the Validate button. You will see the validation error "Server accessibility failed from 1 node". This is because the old Kyvos Manager node is unavailable.

    4. Click Apply.

  3. Navigate to the Dashboard. The cluster dashboard will show an "Unable to get license info" error. Ignore it until the new KM node is added to the cluster.

  4. Stop Kyvos component services using the Actions menu for each component.

  5. Click Manage Kyvos > Disaster Recovery on the navigation pane.
    Depending on the current state of the system, you may see up to 3 links.

    1. If Kyvos Manager-managed multi-node Zookeeper was deployed, then the first link will appear for Zookeeper removal. For a single-node Kyvos Manager-managed zookeeper, no such link will appear.

    2. Then, you will see a link for removing the old Kyvos Manager node, which is no longer available.

    3. Thereafter, you will see a link to Add a new (current) Kyvos Manager node.

      Warning

      You MUST click the links in the order in which they are listed: first remove Zookeeper (if applicable), then remove the unreachable node, and finally add the new node.

  6. Remove Zookeeper using the link.

  7. Remove the old Kyvos Manager node using the Remove Unreachable Node link. This initiates the Remove Node operation, which removes from the cluster the node that has the Web Portal role (and the Postgres role, if the bundled repository is in use). You will be redirected to the Remove Node operation details page.

  8. Go to the Disaster Recovery page, and perform the following steps.

    1. Click the Add Node link to add a new Kyvos Manager node to the cluster. This will initiate the Add Node operation for adding the Web Portal role on the new Kyvos Manager node. You will be redirected to the Add Node operation details page.

    2. In case of any failure, re-perform this operation.

    3. On successful completion of this operation, the Kyvos folder will be added to this node.

  9. If bundled Postgres was in use, then:

    1. Download the Postgres snapshot bundle from binaries, delete the existing Postgres folder on the KM node, and set up the folder extracted from this snapshot as Postgres on the node. Extract this Postgres snapshot by copying it to the same level as the kyvos folder.

    2. Download the latest/applicable Postgres dump bundle from DFS (from the data folder) to the new Kyvos Manager node.

    3. Start Postgres service on the Kyvos Manager node.

    4. Import the dump into the Postgres instance (see the Manage Kyvos Repository section).

  10. On the Switch Repository page, configure the bundled repository on the Kyvos Manager node (see the Manage Kyvos Repository section).

  11. If any additional nodes are impacted, then:

    1. Remove those nodes using the Delete Node functionality of Kyvos Manager.

    2. Add the newly created node with the required roles on it.

  12. For cloud-based clusters, add Zookeeper to the cluster depending on how Zookeeper was used earlier.

    1. If non-managed Zookeeper was in use, configure the new KM node's IP address followed by :2181 (that is, ip:2181) as the value of the Zookeeper string on the Hadoop Ecosystem configuration page.

    2. If Kyvos Manager-managed Zookeeper was in use, then deploy the Zookeeper component from the Hadoop Ecosystem configuration page.

  13. Start Kyvos Component services from the Dashboard using the Actions menu.
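For the non-managed Zookeeper case in step 12, the value to enter has the form ip:2181. A tiny sketch of building that value follows. The helper name zookeeper_string is hypothetical and the address in the usage note is a placeholder; substitute the IP address of your new Kyvos Manager node.

```shell
#!/bin/sh
# Sketch only: zookeeper_string is a hypothetical helper.
# $1: IP address of the new Kyvos Manager node.
# Prints the value for the Zookeeper string in the form ip:2181.
zookeeper_string() {
  if [ -z "$1" ]; then
    echo "usage: zookeeper_string <new-km-node-ip>" >&2
    return 1
  fi
  printf '%s:2181\n' "$1"
}
```

For example, zookeeper_string 10.0.0.5 prints 10.0.0.5:2181, which is the value to paste on the Hadoop Ecosystem configuration page.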

Important

  • When Kyvos Manager HA is enabled and managed Zookeeper is used, restart Kyvos Manager after completing the disaster recovery activity, irrespective of whether TLS is enabled. Prior to the Kyvos 2024.1 release, a Kyvos Manager restart was required only when TLS was enabled.

  • After completing disaster recovery, ensure that the snapshot bundles are pushed from Kyvos Manager. To do this, navigate to Utilities > Update Snapshot Bundles.

Checkpoints

Verify the following checkpoints after completing the disaster recovery process.

...


Note

  1. Any manual changes that need to be included in snapshots must be applied on the Kyvos Manager node (which has the Web Portal role, and the Postgres role when the bundled repository is used).

  2. The start scripts of BI Server and Query Engine download snapshot bundles from DFS to sync their state. If manual changes made on a node are not yet part of the snapshot bundle, running the start script syncs the bundles during startup, and local changes to files that also exist in the bundle are lost.

  3. For Web Portal, there are two scripts, start-dc.sh and start-dc_sync.sh, in the jakarta/bin folder. When the Web Portal service is started normally from the Kyvos Manager UI, start-dc.sh is executed. However, for a Services HA or disaster recovery-based Web Portal start, Kyvos Manager executes the start-dc_sync.sh script.

  4. Each operation performed from Kyvos Manager uploads the bundles that contain files that may change during that particular operation.
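The choice between the two Web Portal start scripts described in the note above can be sketched as follows. This only illustrates the rule and is not Kyvos Manager's actual code: the function name web_portal_start_script and the mode labels normal, services-ha, and disaster-recovery are hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name and mode labels are hypothetical.
# $1: how the Web Portal start is triggered.
# Prints the script in jakarta/bin that applies to that start mode.
web_portal_start_script() {
  case "$1" in
    normal) echo "start-dc.sh" ;;
    services-ha|disaster-recovery) echo "start-dc_sync.sh" ;;
    *)
      echo "unknown start mode: $1" >&2
      return 1
      ;;
  esac
}
```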

...

These approaches apply mainly when Kyvos Manager was originally created using automated deployment. Refer to the original template for the machine type and image-related details of the original Kyvos Manager machine:

...