Applies to: Kyvos Enterprise, Kyvos Azure Marketplace, Kyvos AWS Marketplace, Kyvos Free (
The primary purpose of the caching service is to improve server performance by managing static and non-static Java objects. The performance gain comes from reducing the number of trips to the repository or other external sources of information, avoiding the cost of repeatedly re-creating objects, sharing objects between threads in a process (and, when possible, between processes), and using process resources efficiently.
Kyvos uses caches extensively to deliver high performance and optimize the consumption of resources. Multiple caches have been integrated in different modules of Kyvos.
This document describes the caches maintained by the various components in Kyvos, covering each individual cache and its properties, such as the type of information held, the time of initialization, and the events that cause its invalidation.
BI Server caches
The BI Server caches both static and non-static information. Information such as security details, entities, users, and groups is also cached on the BI Server. Because this data typically remains the same, it takes little space on disk and does not grow incrementally.
Query caching at the BI Server layer is called the Result Cache or the Warm cache. If the same query has been executed previously, the BI Server serves the cached result directly to the BI tool without referring to the Query Engine. If the BI Server receives the query for the first time, it passes the query on to the Query Engine, where it is served using the cuboid cache, the local disk, or a combination of both.
The following sections explain the various caches maintained on the BI Server.
Entity cache
The entities created in Kyvos, such as datasets, transformations, relationships, semantic models, worksheets, and workbooks, are persisted in the repository. To prevent extensive and expensive hits on the repository, an in-memory entity cache (local to each BI Server) is maintained.
The cache is warmed at BI Server bootup and synced every 15 minutes. The sync interval is configurable.
It gets updated in case of the following events:
- Creating a new Kyvos entity
- Updating or modifying an existing entity
- Deleting an entity
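The warm-at-bootup, update-on-change, and periodic-sync behavior described above can be sketched as follows. This is a minimal illustration rather than the Kyvos implementation: the `EntityCache` class, its method names, and the use of a plain dictionary in place of the repository are all assumptions made for the example.

```python
import time


class EntityCache:
    """Hypothetical in-memory cache in front of a repository (a dict stands in for it)."""

    def __init__(self, repository, sync_interval_s=15 * 60, clock=time.monotonic):
        self._repo = repository
        self._sync_interval_s = sync_interval_s  # 15 minutes by default, configurable
        self._clock = clock
        self._entries = {}
        self._last_sync = None
        self.sync()  # warm the cache at bootup

    def sync(self):
        # Reload all entities from the repository and remember when this happened.
        self._entries = dict(self._repo)
        self._last_sync = self._clock()

    def sync_if_due(self):
        # Intended to be called periodically by a sync thread.
        if self._clock() - self._last_sync >= self._sync_interval_s:
            self.sync()
            return True
        return False

    def get(self, entity_id):
        # Reads are served from memory only; the repository is not touched.
        return self._entries.get(entity_id)

    def put(self, entity_id, entity):
        # Create or update: persist to the repository and update the cache.
        self._repo[entity_id] = entity
        self._entries[entity_id] = entity

    def delete(self, entity_id):
        # Delete: remove from both the repository and the cache.
        self._repo.pop(entity_id, None)
        self._entries.pop(entity_id, None)
```

In this sketch, a change made directly in the repository by another BI Server becomes visible only after the next sync, which is why a periodic sync is needed in addition to the event-driven updates.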
Folder cache
Logically related entities can be grouped together in a Folder for easy accessibility, sharing, and organization. Folder details are also cached in memory to reduce dependency on the repository for fetching the details.
The folder cache is warmed at BI Server bootup and synced every 15 minutes by the cache sync thread.
The following events trigger cache updates:
- Creating a new folder
- Updating or modifying existing folder details
- Deleting a folder
Entity properties cache
Kyvos lets you apply a wide range of properties on entities to derive optimal performance and utilization of resources. These properties also offer a greater degree of control over the configurations and resource utilization. Each entity can be uniquely configured according to its design or performance requirements.
Properties associated with each entity are stored in the repository and cached in memory. The cache is created at BI Server bootup and synced every 15 minutes with the repository.
The properties cache gets updated whenever:
- A new property is applied to any entity
- The value of the existing property is modified
- An applied property is removed
Users’ cache
User details are stored in the repository. A user profile can be imported from an external directory such as LDAP, or the user can be native to Kyvos. Access restrictions are often governed by user permissions or roles. To enforce security checks without degrading system performance or overloading the repository with requests, user details are cached in memory.
User details are synced every 15 minutes with the repository. User details get updated when:
- A new Kyvos user profile is created
- An existing Kyvos user profile is modified
- An existing Kyvos user profile is deleted
- Changes are identified while syncing the Kyvos user repository with the external active directory
- A user logs in to Kyvos
- A user is added to a group
- A user is removed from a group
Group cache
To simplify user management and to perform batch operations on sets of users, users are organized into groups, either in Kyvos or in an external directory. These group details are persisted in the repository and also maintained in an in-memory cache.
The details are synced every 15 minutes with the Kyvos repository. Group details get updated when:
- A new group is created
- An existing group is modified
- An existing group is deleted
- A new user is added to the group
- An existing user is removed from the group
Access rights cache
Entities in Kyvos are shareable across users with the following three access types:
- Read-only
- Read-write
- Deny
Users can share folders or individual entities. These access rights come into play throughout user interaction with Kyvos, such as:
- Showing the list of entities.
- Performing CRUD operations on these entities.
These access rights are cached in memory at bootup for faster access.
The cache gets updated when:
- An entity is shared
- Permissions on an already shared entity are modified or revoked
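As an illustration of how cached access rights could be consulted when listing entities or performing CRUD operations, consider the sketch below. The `AccessRightsCache` class and its methods are hypothetical, and the rule that an entity with no cached access right is treated as inaccessible is an assumption of this example, not documented Kyvos behavior.

```python
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    DENY = "deny"


class AccessRightsCache:
    """Hypothetical in-memory map of (user, entity) to an access type."""

    def __init__(self):
        self._rights = {}

    def share(self, user, entity_id, access):
        # Called when an entity is shared or its permissions are modified.
        self._rights[(user, entity_id)] = access

    def revoke(self, user, entity_id):
        # Called when permissions on a shared entity are revoked.
        self._rights.pop((user, entity_id), None)

    def can_read(self, user, entity_id):
        # Assumption: no cached right means no access.
        return self._rights.get((user, entity_id)) in (Access.READ_ONLY, Access.READ_WRITE)

    def can_write(self, user, entity_id):
        return self._rights.get((user, entity_id)) is Access.READ_WRITE

    def visible_entities(self, user, entity_ids):
        # Filters an entity listing without querying the repository.
        return [e for e in entity_ids if self.can_read(user, e)]
```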
Data security rules cache
To control the data visible to users, Kyvos provides the functionality of defining row-level and column-level data security rules. These rules are implicitly applied when a user browses data.
Accessing these constraints from the repository during browsing can impede the browsing performance. Hence, the rules are cached in memory for faster processing.
The cache is warmed at bootup and synced with the repository every 15 minutes.
The cache gets updated when:
- A new rule is created
- An existing rule is modified
- An existing rule is deleted
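The following sketch shows one way cached row-level and column-level rules could be applied implicitly to browsed data. The rule representation (one row filter and one set of hidden columns per user) and the `DataSecurityRulesCache` name are assumptions made for illustration; the actual rule model in Kyvos may differ.

```python
class DataSecurityRulesCache:
    """Hypothetical cache of per-user row-level and column-level rules."""

    def __init__(self):
        # user -> (row filter callable or None, frozenset of hidden column names)
        self._rules = {}

    def set_rule(self, user, row_filter=None, hidden_columns=()):
        # Called when a rule is created or modified.
        self._rules[user] = (row_filter, frozenset(hidden_columns))

    def delete_rule(self, user):
        # Called when a rule is deleted.
        self._rules.pop(user, None)

    def apply(self, user, rows):
        # Assumption: a user without a cached rule sees all rows and columns.
        row_filter, hidden = self._rules.get(user, (None, frozenset()))
        visible = []
        for row in rows:
            if row_filter is not None and not row_filter(row):
                continue  # row-level rule hides this row
            # column-level rule hides selected columns
            visible.append({col: val for col, val in row.items() if col not in hidden})
        return visible
```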
Cuboid distribution cache
There are many pertinent details of cuboids required while serving any browsing requests, such as:
- Dimension/measure set/version they belong to
- Query Engines where they are replicated
- Other details which may help in filtering the cuboids
These details are persisted in the repository and require the joining of two or more tables, which is an expensive operation. To improve browsing performance, the cuboid distribution cache is created.
The cache is closely driven by the health of the query engines and by browsing requests. It is populated at bootup and invalidated when query engines get disconnected from the network (due to disconnection from Zookeeper or machine-related issues). Upon invalidation, it is recreated when the next browsing request comes in.
To re-sync the cache externally, a REST API allows the user to purge the existing cache.
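The populate-at-bootup, invalidate-on-disconnect, and recreate-on-next-request lifecycle can be sketched as below. The class and method names are hypothetical, and `load_distribution` stands in for the expensive multi-table repository query; the REST API itself is not modeled here.

```python
class CuboidDistributionCache:
    """Hypothetical cache of cuboid-to-query-engine assignments."""

    def __init__(self, load_distribution):
        # load_distribution represents the expensive repository join.
        self._load = load_distribution
        self._distribution = self._load()  # populated at bootup

    def on_query_engine_disconnected(self, engine):
        # Invalidate only; the cache is rebuilt lazily on the next request.
        self._distribution = None

    def purge(self):
        # What an external purge request would trigger in this sketch.
        self._distribution = None

    def engines_for(self, cuboid_id):
        # Called while serving a browsing request.
        if self._distribution is None:
            self._distribution = self._load()
        return self._distribution.get(cuboid_id, [])
```

Rebuilding lazily keeps the disconnect handler cheap, at the cost of the first browsing request after an invalidation paying for the reload.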
Result cache
Serving browsing requests in Kyvos incurs data processing costs at the BI Servers and query engines, and network costs when data travels from the query engines to the BI Servers and from the BI Servers to the client. As the cardinality of data increases, the associated costs also rise. To reduce these costs, query results are cached at the BI Server in the result cache. Because the query engines are not queried on a cache hit, this eliminates their data processing and network transfer costs, and it also reduces the processing and network costs at the BI Servers.
The cache gets warmed at bootup. It gets invalidated when:
- Semantic model process is completed successfully (complete cache is invalidated)
- Semantic model definition gets altered (partial cache is invalidated)
- Semantic model is deleted
Successful queries served by query engines are continually cached.
Upon invalidation, the cache can be re-warmed by running the auto-population suite, which executes historical queries. A REST API is also exposed to invalidate the cache externally.
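The difference between complete and partial invalidation can be illustrated with the sketch below. The source does not say which part of the cache is dropped when a semantic model definition is altered; this example assumes that each cached result records the fields it used and that only results touching an altered field are dropped. All names are hypothetical.

```python
class ResultCache:
    """Hypothetical result cache keyed by semantic model and query text."""

    def __init__(self):
        # (model, query) -> (fields used by the query, cached result)
        self._entries = {}

    def put(self, model, query, fields, result):
        # Called after a query engine has served a query successfully.
        self._entries[(model, query)] = (frozenset(fields), result)

    def get(self, model, query):
        entry = self._entries.get((model, query))
        return None if entry is None else entry[1]

    def invalidate_model(self, model):
        # Complete invalidation: model processed successfully or deleted.
        self._entries = {k: v for k, v in self._entries.items() if k[0] != model}

    def invalidate_fields(self, model, changed_fields):
        # Partial invalidation (assumed policy): drop results that used an altered field.
        changed = frozenset(changed_fields)
        self._entries = {
            k: v for k, v in self._entries.items()
            if not (k[0] == model and v[0] & changed)
        }
```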
Query Engine caches
Kyvos provides multi-level caching. When the BI Server receives a query, it checks whether that query has been executed previously. If so, the output is served from the BI Server cache, also known as the Result cache.
If not, the query is passed on to the Query Engine. Any portion of the node information that is available in the Query Engine heap memory is served from what is known as the Look Ahead Cache. A node is the smallest unit of a cuboid.
After that, the result is served from the Query Engine off-heap (main) memory. If a block is not available in the off-heap block cache, the cuboids are read from the local disk, which holds a replica of the persistent storage. Any cuboids still not found on the local disk are fetched from persistent storage such as HDFS or S3.
The cuboids used to serve the query are cached in the Query Engine RAM; this is known as the cold cache or cuboid cache strategy.
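The lookup order on the Query Engine can be pictured as a walk through an ordered list of tiers, returning the first hit. The sketch below is only a simplified model of that order; the tier names, the `read_block` function, and the dictionary-backed tiers are assumptions for illustration.

```python
def read_block(block_id, tiers):
    """Return (tier_name, block) from the first tier that holds block_id.

    tiers is an ordered list of (name, mapping) pairs, fastest tier first.
    """
    for name, store in tiers:
        if block_id in store:
            return name, store[block_id]
    raise KeyError(f"block {block_id!r} not found in any tier")


# Hypothetical tier contents, ordered as described in the text above.
tiers = [
    ("look ahead cache (heap)", {"b1": b"block-1"}),
    ("cuboid block cache (off-heap)", {"b1": b"block-1", "b2": b"block-2"}),
    ("local disk", {"b1": b"block-1", "b2": b"block-2", "b3": b"block-3"}),
    ("persistent storage", {"b1": b"block-1", "b2": b"block-2",
                            "b3": b"block-3", "b4": b"block-4"}),
]
```

With these contents, `read_block("b2", tiers)` is answered by the off-heap tier, while `read_block("b4", tiers)` falls through to persistent storage.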
The following sections explain the various caches maintained on the Query Engines.
Look ahead cache
When serving a query, a single cuboid block may be accessed multiple times. To avoid reading the same block from disk and decompressing it multiple times, the cuboid block is kept in memory as look ahead cache for the current request.
This cache is maintained in the heap memory of the Query Engine (Xmx). It is an in-memory cache maintained for each query, and it is purged once the query has been served.
The look ahead cache size on the query engine for a single request is determined by the amount of memory available to each cuboid reader thread, and the cache follows an LRU (Least Recently Used) eviction policy. Typically, 50% of heap memory is allocated to the look ahead cache. The larger the look ahead cache, the better the query performance.
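A per-request LRU cache of this kind can be sketched with an ordered dictionary. For simplicity, this example measures capacity in number of blocks rather than bytes of heap memory, and the `LookAheadCache` class and `read_block` callback are hypothetical names.

```python
from collections import OrderedDict


class LookAheadCache:
    """Hypothetical per-request LRU cache of decompressed cuboid blocks."""

    def __init__(self, capacity, read_block):
        self._capacity = capacity      # maximum number of blocks kept (assumed unit)
        self._read_block = read_block  # reads and decompresses a block from disk
        self._blocks = OrderedDict()   # least recently used block comes first

    def get(self, block_id):
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            return self._blocks[block_id]
        block = self._read_block(block_id)
        self._blocks[block_id] = block
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict the least recently used block
        return block

    def clear(self):
        # Called once the request has been served.
        self._blocks.clear()
```

A repeated access to a block that is still cached skips the disk read and decompression entirely, which is the saving the look ahead cache is meant to provide.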
MOLAP semantic model cache
The MOLAP semantic model in Kyvos holds the pertinent metadata, which plays a vital role in the semantic model process and in browsing operations. It is persisted on the distributed file system and copied to the local disk on BI Servers and Query Engines. Initializing the MOLAP semantic model is a time- and memory-intensive operation because it initializes multiple lookup maps and other data structures. For active browsing, it is initialized once and cached.
The cached reference gets invalidated and removed from the cache when:
- The semantic model is not browsed for 15 minutes
- The semantic model is successfully built
To close the cached reference externally, a REST API has been exposed.
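The initialize-once behavior with idle-based eviction can be sketched as follows. The `SemanticModelCache` class, its methods, and the injected `initialize` callback are assumptions for illustration; the 15-minute idle timeout comes from the text above.

```python
import time


class SemanticModelCache:
    """Hypothetical cache of initialized semantic model references."""

    def __init__(self, initialize, idle_timeout_s=15 * 60, clock=time.monotonic):
        self._initialize = initialize  # expensive: builds lookup maps and other structures
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._models = {}  # name -> (model, time of last browse)

    def get(self, name):
        # Called on each browse; initializes the model only on a cache miss.
        entry = self._models.get(name)
        model = self._initialize(name) if entry is None else entry[0]
        self._models[name] = (model, self._clock())
        return model

    def evict_idle(self):
        # Removes references that have not been browsed within the idle timeout.
        now = self._clock()
        idle = [n for n, (_, last) in self._models.items()
                if now - last >= self._idle_timeout_s]
        for name in idle:
            del self._models[name]
        return idle

    def close(self, name):
        # Called after a successful build or on an external close request.
        self._models.pop(name, None)
```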
Cuboid blocks cache
Cuboid blocks are the building blocks for cuboids in Kyvos. When a cuboid is read to serve a query, it is actually the blocks that are read and queried. The cuboid typically rests on the distributed file system and is available on a local disk (configurable). To ensure higher throughput of queries and preserve optimal performance, these blocks are cached in off-heap memory.
The cache gets populated as the blocks are queried, and the blocks remain in memory until they are cleaned up. The purge thread cleans up the cache when the following conditions are met:
- Cache has consumed 100% of configured memory.
- There are some blocks older than a day (Time to live).
The above-mentioned conditions are configurable, and they trigger a cleanup of 25% (a configurable parameter) of the memory consumed by the cache.
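One pass of such a purge thread might look like the sketch below. Two points are assumptions rather than documented behavior: that either condition alone triggers a cleanup, and that the oldest blocks are removed first. The function name and parameters are hypothetical.

```python
def purge_blocks(blocks, now, capacity_bytes, ttl_s=24 * 3600,
                 usage_threshold=1.0, cleanup_fraction=0.25):
    """Remove cached blocks in place and return the ids that were removed.

    blocks maps block_id -> (size_bytes, cached_at_seconds).
    """
    used = sum(size for size, _ in blocks.values())
    over_capacity = used >= usage_threshold * capacity_bytes
    has_expired = any(now - cached_at > ttl_s for _, cached_at in blocks.values())
    if not (over_capacity or has_expired):
        return []

    target = cleanup_fraction * used  # free 25% of the consumed memory by default
    freed = 0
    removed = []
    # Assumed policy: evict the oldest blocks first until the target is reached.
    for block_id, (size, _) in sorted(blocks.items(), key=lambda kv: kv[1][1]):
        if freed >= target:
            break
        del blocks[block_id]
        freed += size
        removed.append(block_id)
    return removed
```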
Advanced properties for managing caching strategy
Kyvos provides multiple advanced properties for configuring the caching strategy on both the BI Server and the Query Engines. However, the default caches, such as the Entity cache, Folder cache, Entity properties cache, Users’ cache, Group cache, Access rights cache, and Data security rules cache, are not configurable.
Advanced properties for Cuboid Distribution
- cuboid.replication.type: Specifies where cuboid replication should take place. Local stores the cuboids on the local disk of each query engine, NFS replicates the cuboids to a location shared by all Query Engines, and None does not replicate any cuboids.
- cuboid.balance.threshold: Specifies the maximum allowed difference in the range/cuboid count for a dimension between any two query engines. If the difference exceeds this value, the ranges/cuboids for that dimension should be balanced.
- cuboid.replication.factor: Specifies the number of query engines on which a particular cuboid will be assigned for querying.
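The check implied by cuboid.balance.threshold can be expressed compactly: if the difference between any two query engines exceeds the threshold, then in particular the difference between the largest and smallest counts does. The function below is a hypothetical sketch of that comparison, not the Kyvos balancing logic.

```python
def needs_rebalance(cuboid_counts, threshold):
    """Return True if any two query engines differ by more than threshold.

    cuboid_counts maps a query engine name to its range/cuboid count
    for one dimension.
    """
    counts = list(cuboid_counts.values())
    # Comparing the extremes covers every pair of engines.
    return len(counts) >= 2 and max(counts) - min(counts) > threshold
```

For example, with counts of 12, 9, and 5 and a threshold of 5, the largest difference is 7, so the function reports that balancing is needed.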
Advanced properties for Query Cache
- query.resultcache.lookup.strategy: Specifies the records cache lookup strategy. In case of Exact strategy, only exact lookup is performed. In case of Auto strategy, complete records cache is iterated to serve the request.
- query.resultcache.strategy: Specifies BI Server's strategy for caching metadata and user query results. You can choose NONE, ALL, or AUTO options. When the value is set to AUTO, the system decides which query results to cache based on whether the execution time exceeds the query cache time threshold.
- query.cuboidcache.compression: Specifies the compression status of cached cuboid blocks. You can choose to keep the cuboid blocks as compressed or uncompressed.
- query.cuboidcache.strategy: Specifies the cuboid cache level. This is an off-heap memory-based cache on the query engines. Enabling this cache helps improve the performance of queries. The maximum number of cuboids that can be cached depends on the off-heap memory configurations of the query engines.
- query.cuboidcache.tier: Specifies the location where cuboid blocks are cached. You can choose to cache the blocks in off-heap memory or in disk-backed off-heap memory. If cached to disk, the blocks are always stored in compressed form.
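To make the NONE, ALL, and AUTO options of query.resultcache.strategy concrete, the sketch below shows how such a decision could be made for a single query. The function name and the millisecond units are assumptions, and the AUTO branch reflects one reading of the description above: cache only results whose execution time exceeds the threshold.

```python
def should_cache_result(strategy, execution_time_ms, threshold_ms):
    """Decide whether a query result should be cached (hypothetical helper)."""
    strategy = strategy.upper()
    if strategy == "NONE":
        return False  # never cache
    if strategy == "ALL":
        return True   # always cache
    if strategy == "AUTO":
        # Cache only queries that were expensive enough to be worth keeping.
        return execution_time_ms > threshold_ms
    raise ValueError(f"unknown strategy: {strategy!r}")
```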