Sample Query Engine memory calculation

Applies to: Kyvos Enterprise Kyvos Cloud (SaaS on AWS) Kyvos AWS Marketplace

Kyvos Azure Marketplace Kyvos GCP Marketplace Kyvos Single Node Installation (Kyvos SNI)

Each Query Engine runs inside a YARN container. Query Engine memory has two components - Xmx and off-heap.

The Xmx memory is used for query processing and the off-heap memory is used for cuboid caching.

QE Container memory = Xmx + MaxDirectMemorySize

Further, there are two types of cache in the container memory.

Look Ahead cache (local): This cache is maintained in the Xmx memory of the Query Engine and cleared once a request is served. This is maintained only for one session.

Global/Cuboid cache (Off heap): This cache is maintained in the MaxDirectMemorySize memory of the Query Engine. To ensure higher throughput of queries and preserve optimal performance, cuboids are cached in off-heap memory. The cache gets populated as the blocks get queried and remain in the memory until they are cleaned up. The purge thread performs cleanup of the cache when the following conditions are met:

Cache has consumed 100% of configured memory.
There are some blocks older than a day (Time to live).

Consider the following example.

Let’s assume there are 3 Query Engines configured on the Kyvos cluster and the memory is configured as:

Max Memory (Xmx) = 20GB

Off Heap Memory = 12GB

Assuming there is a Sales semantic model with 300 cuboids. A request came in from a user which needs to query all 300 cuboids. A thread (THRIFT_WORKER_THREAD – default value is 10 meaning 10 users can query simultaneously) is run from OLAP Engine to Query Engines for this request.

Memory used for each worker thread is calculated as:

Total_Xmx/THRIFT_WORKER_THREAD = 20GB/20 = 2GB (2000 MB)

This means each Query Engine can use 2GB for the look ahead cache.

On each Query Engine, one thread is used for each cuboid. Hence, for 100 cuboids 100 threads are needed per QE.

If PARALLEL_THREAD_PER_QUERY is configured as 20, this means 20 cuboids can be read parallelly for each request per Query Engine. Once this is done, the next 20 cuboids will be read, and so on.

Now, 50% of (2GB/20 = 100 MB) which is 50 MB, can be used for look ahead cache per cuboid.

If the Block size = 1 MB (default is 512 kB). Then per thread 50MB/1 MB = 50 blocks/cuboids can be kept in look ahead cache.

Here, the number of cuboids per Query Engine is 100, and we can store only 50 cuboids at a time in look ahead cache. So, the remaining 50 cuboids will be missed. In this case, the Off Heap(Global) cache will come into the picture and the result will be searched from the global cache. Still, if the result is not there, then the OLAP engine will read from the disk. This will increase the I/O time.

Hence, you must ensure that the caches are fully utilized and tweak memory configuration if required.

If you increase the thrift worker threads for concurrency, then memory will also need an increase, as Look Ahead capacity will be decreased. You must decide these as per your use case.

If HDFS is slow then use global cache (increase off-heap memory and decrease LA memory) if HDFS is fast, then use local cache (increase LA cache memory decrease off-heap). This can be checked per Query Engine in BI Server logs.

PFB sample

Time=4686 ms RecordsProcessed=1171 cuboidReadTime=4593 ms [io=4327 decompress=60 cacheEntryToNodeBuild=19 =4406 ms] readFromFC=4567 ms getAndCacheBlockCalls=46777]  nodesRead=46698 [STAR=36891]  blocksAccessed=78 [LA_Cache capacity=18 :: hit=93400 miss=168 (readThruCache=90 readThruDisk=78)] readFrom=HDFS NL_Blocks=48 size=13302896 L_Blocks=30 size=5636120 viewNodesRead=0  filterTime=47ms  filterMapdbLookupTime=0  filterCriterionTime