Frequently Asked Questions
What is the cardinality of a dimension?
Cardinality refers to the number of distinct values available in a column. For example, the cardinality of states in the US is 50 whereas the cardinality of the number of customers in the US could be in millions. The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row.
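If you are unsure of a column's cardinality, you can measure it directly on the source data before modeling the dimension. A minimal PySpark sketch, with illustrative table and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative table and column names; replace with your own source.
df = spark.table("sales.customers")

# Cardinality = number of distinct values in the column.
state_cardinality = df.select("state").distinct().count()
customer_cardinality = df.select("customer_id").distinct().count()

print(f"state cardinality: {state_cardinality}")            # e.g., ~50
print(f"customer_id cardinality: {customer_cardinality}")   # could be millions
```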
For Kyvos Free, each dimension can have a maximum cardinality of 50 million.
How can I control the number of dimensions?
Reduce the number of dimensions by using any or all of the following options:
Combining dimensions: At the Register File level, whenever possible, combine dimensions that have only a few attributes (1-3) into a single dimension. Merging such dimensions at the register file level reduces the number of dimensions at the cube level.
Using multiple hierarchies: Consider using multiple hierarchies if you need two types of time data such as year-month-day and year-quarter-month-day for different purposes. Or if you need two types of location data such as division-region-district-location and state-county-city-location.
What are the supported dimension types and how to configure them?
Kyvos supports Time, Location, and Regular dimension types. The dimension type can be configured through the Dimension Properties panel.
What are the supported hierarchy types and how to create them?
Predefined Time Hierarchy: By selecting the hierarchy from Kyvos' pre-defined hierarchy list (allowed only on hierarchies for Time Type Dimension).
Custom Time Hierarchy: By selecting the Date Data Type for each level of the Time hierarchy individually from the Level Properties panel (allowed only on hierarchies of Time Type Dimension)
NOTE: If Date Data Type is not configured on a level of the custom time hierarchy, none of the levels below it will behave as Date Data Type, even if they are configured.
Custom Calendar: Use this when the Quarter, Month, etc. in your data model do not match the definitions in the Gregorian calendar (allowed only on a Custom Time Hierarchy of a Time Type Dimension).
Alternate Hierarchy: By selecting the Has Alternate Path checkbox on the Hierarchy Properties panel.
Parent Child Hierarchy: By selecting the Has Parent Child Relation checkbox on the Hierarchy Properties panel.
Unbalanced and Ragged Hierarchies: You can hide redundant levels using the Hide Member property from the Level properties panel.
How to create a Time dimension in Kyvos?
At the time of adding dimensions, choose the dimension Type as Time from the Dimension Properties pane on the right.
Choose the hierarchy and field by right-clicking the dimension and making a selection.
If this is a time hierarchy, choose Year-Month-Date, Year-Quarter-Month, or Year-Quarter-Week-Date.
If a suitable pre-defined time hierarchy is not shown, set the field data type to a Date type such as year, quarter, or month.
Kyvos allows you to use time hierarchies in two ways:
Predefined: These are pre-created hierarchies in the system. You can select and add them to the cube from the hierarchies list: Year-Month-Date, Year-Quarter-Month, or Year-Quarter-Week-Date.
Custom: In this type, different columns hold different date parts, for example year=2021 and month=09. In this case, you drag the individual columns and set the Date Data Type on each field, such as year, quarter, month, and day.
You can use the following to indicate time values:
Year: YYYY, for example 2020
Quarter: Q1, Quarter1, 1, 01
Month: Jan, January, 01, 1
Week: w1, 01, 1, Week1
Day: 01, 1
If your data contains values like "2020-Sept" for a month, it is not a valid month value and you will need to convert the data to a supported format, as sketched below.
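For example, a column holding values such as "2020-Sept" can be converted into separate year, quarter, and month columns before registering the file. A minimal PySpark sketch, assuming an illustrative input column named year_month:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative input: a column "year_month" with values such as "2020-Sept".
df = spark.createDataFrame([("2020-Sept",), ("2021-Jan",)], ["year_month"])

# "Sept" is not a standard 3-letter month abbreviation, so keep only the
# first 3 letters of the month, add an explicit day, and parse to a date.
parsed = df.withColumn(
    "dt",
    F.to_date(
        F.concat(F.substring("year_month", 1, 4), F.lit("-"),
                 F.substring("year_month", 6, 3), F.lit("-01")),
        "yyyy-MMM-dd",
    ),
)

# Derive columns that match Kyvos-supported time values.
result = (parsed
          .withColumn("year", F.year("dt"))        # 2020
          .withColumn("quarter", F.quarter("dt"))  # 3
          .withColumn("month", F.month("dt")))     # 9

result.show()
```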
How can I process newer data without a full build?
You can use the Incremental build to update only the latest changes to an existing cube, without a full build. This option is faster than a full build.
For example, use it to update the cube with daily changes. You can also replace already processed data using the replace partition option.
NOTE: You can run an incremental build only on a successfully built cube.
You can also define a schedule for an incremental build, such that the build job is triggered automatically as per the defined schedule.
Read more: Semantic Model Process Types
I want to load incremental data in a built cube, what should I know?
Refer to the terminology and details in the table below. The cube entities whose source tables have Incremental Data are processed when an Incremental build job is scheduled on the cube.
There is no difference in the “Incremental data” definition for the dimension table and fact table.
Complete processing will be skipped during the incremental build for the cube entities whose source table does not have Incremental Data.
NOTE: If none of the source tables are found to have “Incremental Data”, the incremental build will be marked as skipped. Thus, the cube will continue to remain in the same state as before the incremental build.
| SQL source | Other data source |
Incremental Identifier | Any one column of the source table can be configured as the Incremental Identifier on the registered file. It is not mandatory to configure the Incremental Identifier. | Software intelligence:
Incremental Data | | Since these data sources are backed by a file system, the records of all files that meet the Incremental Identifier criteria are considered "Incremental data".
I want to load incremental data in existing partitions/add new partitions, what should I know?
Kyvos allows the user to configure what must be done with the “Incremental Data". While scheduling the Incremental build job, you can choose from either of the Replace Partition options as detailed below.
None: Doesn't replace the partition. Adds incremental data in already processed partitions.
Auto: Identifies the partition automatically, and replaces data of partitions (existing data will be dropped from the cube and incremental/new data will be added).
NOTE: The Incremental Data will simply be added into the cube in the following scenarios:
If partitioning strategy is configured on the cube: New partitions from the source table.
If Partitioning strategy is NOT configured on the cube: Complete Incremental Data from the source table.
How can I connect to Kyvos cubes from Tableau and Excel?
Kyvos integrates seamlessly with Tableau (SQL and MDX connectivity) and Excel (MDX connectivity).
To connect Kyvos cubes:
What is partition strategy, how and why to set it up before full build?
Partitions are a flexible means of managing cubes. They are internal structures used by Kyvos to manage and drop or update data in a cube. The base partition must be created on the Date type value.
For example, a cube that contains sales information can contain a partition of data for each past year and partitions for each quarter of the current year.
When new information is added, only the current quarter's partition needs to be processed. At the end of the year, the four quarters of accumulated partition data can be merged into a single partition for the year and a new partition created for the first quarter of the new year.
To create partitions, the cube must have a date type dimension defined.
Kyvos provides options for getting recommendations based on a data profile build or queries executed on the cube to further optimize the partitioning strategy.
Note that partitions are not visible to business users of the cube. However, users with Cube modify roles assigned to them can configure, add, or drop partitions.
Read more:
What are the prerequisites to create a partition and/or sub-partition?
You can create cube partitions/sub-partitions from the Refine tab on the cube designer screen.
Following are the prerequisites:
General
Can only be created on levels of a dimension hierarchy.
Can only be created on levels of the first hierarchy of a dimension (NOTE: a dimension may have multiple hierarchies).
Cannot be created on columns of Slowly Changing Type1 dimension.
Base Partition
Can only be created on a level of Time type hierarchy.
Can be created on any level of Predefined Time hierarchy (Date Data Type is implicitly created in Predefined Time hierarchy).
Cannot be created on levels on which the Date Data Type is not configured.
Cannot be created on a level of a hierarchy that is marked as a Custom Calendar.
Sub Partition
Can be created on any level of hierarchy excluding the ones which are mentioned in the General section.
What are the recommendations to create cube partitions and/or sub-partitions?
It is recommended to create a partition/sub-partition on columns that are likely to be used as filters while browsing.
Example: If Month and Region are likely to be used as filters in queries, configure these columns as the cube partition/sub-partition. This helps in pruning aggregations and hence achieves better query response times.
The total number of combinations of partitions and sub-partitions must not be huge; it is recommended to keep the combinations in the thousands, as checked in the sketch below.
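One way to sanity-check this before configuring the strategy is to count the distinct partition/sub-partition value pairs in the source fact table. A minimal PySpark sketch, with illustrative table and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative fact table and partition/sub-partition columns.
fact = spark.table("sales.fact_orders")

# Number of distinct (month, region) pairs = number of partition/sub-partition
# combinations the build would create; keep this in the low thousands.
combinations = fact.select("month", "region").distinct().count()
print(f"partition x sub-partition combinations: {combinations}")
```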
How does the partition size for the base partition work?
Kyvos assigns a partition number to each fact record based on static logic. The partition number is calculated from the time difference between the record and the base date "01-Jan-1970". Below are some examples for various Date Data Types:
Year: Partition Number = (Number of years between the record's Year and 1970) / Partition Size
Example (Base Partition on Year, Partition Size = 2)
Year Range (Partition window) | Partition# |
1971-72 | 1 |
1973-74 | 2 |
1975-76 | 3 |
1977-78 | 4 |
1979-80 | 5 |
Day: Partition Number = (Number of days between the record's Date and 1-Jan-1970) / Partition Size
Example (Base Partition on Day, Partition Size = 15)
Date Range (Partition window) | Partition# |
1 Jan 1970 - 15 Jan 1970 | 1 |
16 Jan 1970 - 30 Jan 1970 | 2 |
31 Jan 1970 - 14 Feb 1970 | 3 |
15 Feb 1970 – 1 Mar 1970 | 4 |
2 Mar 1970 – 16 Mar 1970 | 5 |
Thus, the partition numbers are a simple and static calculation depending upon:
Date Data Type of base partition Level
Partition Size configured on cube
Kyvos considers all the records of a partition number (Date Range) as an atomic unit and all the jobs like Drop Partition, Replace Partition etc. work accordingly.
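As an illustration, the day-based calculation can be reproduced with a few lines of Python; the 1-based numbering and integer division are assumptions inferred from the 15-day example table above:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def day_partition_number(record_date: date, partition_size: int) -> int:
    """Approximate partition number for a day-based base partition.

    Assumes partitions are numbered from 1 and sized in whole days,
    matching the 15-day example table above.
    """
    days_since_epoch = (record_date - EPOCH).days
    return days_since_epoch // partition_size + 1

print(day_partition_number(date(1970, 1, 15), 15))  # 1
print(day_partition_number(date(1970, 1, 16), 15))  # 2
print(day_partition_number(date(1970, 2, 14), 15))  # 3
```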
What is the difference between Expert mode vs Smart mode aggregation strategy?
Expert (user-driven) is the configuration-driven mode that allows you to tune your cube performance by setting up some key materialization values and the precompute threshold level. From the summary screen, you can see how many dimension or hierarchy materializations are selected. From there you can view details, make changes, or learn more by clicking the i symbol. You can also see the degree of materialization and the precompute threshold. For all of these items, you can see whether it is the default, inherited, or modified. Use this option if you want to have more control over the process.
Smart (system-driven) is query-based and allows you to filter or analyze historical queries and work with data profile information. You can set the materialization level and specify whether to create fresh aggregates or add new aggregates on top of any existing ones. Use this option if you want to get quick insights with a low learning curve. You will be prompted to regenerate aggregates after changes to the cube design.
How to create a Slowly Changing Type1 dimension? What are the prerequisites?
On the Dimension Properties pane, select the Is Slowly Changing Type 1 Dimension checkbox.
Prerequisites: At least one of the following rules must be fulfilled:
The column that relates the dimension table and fact table (join key column) is the lowest level in the dimension hierarchy.
OR The column that relates the dimension table and fact table (join key column) is one of the attributes in the dimension.
OR The values of the lowest level of the hierarchy (or of any attribute) have a 1-1 relation with the join key column values.
NOTE: Kyvos compares the join key column value to identify whether an existing record has been updated as part of SCD1.
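To verify the third rule before enabling SCD1, you can check whether the candidate level or attribute maps to exactly one join key value. A minimal PySpark sketch, with illustrative table and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative dimension table, candidate level/attribute, and join key column.
dim = spark.table("sales.dim_customer")

# Each candidate value should map to exactly one join key value; any row
# returned here breaks the prerequisite. Repeat the check in the other
# direction for a strict 1-1 relation.
violations = (dim.groupBy("customer_code")                               # candidate level/attribute
                 .agg(F.countDistinct("customer_key").alias("n_keys"))   # join key column
                 .filter("n_keys > 1"))

violations.show()
```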
Login Failure
Unable to connect to database - Check whether Postgres is up from Kyvos Manager.
If not, start it from Kyvos Manager.
Could not connect to BI server - Check whether the BI Server is up and running from Kyvos Manager. If not, start it.
If the BI Server is not coming up, check whether any of its ports is being used by another process. End that process, or contact Kyvos Support to configure different ports for the BI Server.
Could not retrieve the users list
Check if the BI server and Postgres are running and accepting connections on their ports.
If not, start the services accordingly.
If LDAP is enabled, check whether Admin users are able to log in and whether the LDAP user is listed in the user list (Kyvos Menu > Setup > Users).
Cube Build Failed
Dependency on External Services
S3
503 SlowDown Error.
EMR File Inconsistency Issues.
Access Rights issues. (403 Access denied)
Service down (Region/Availability Zone failure).
EMR
Error while scaling the EMR cluster (for example, out of memory).
Incorrect configuration (for example, Glue not enabled properly).
Connectivity issue between EMR and Kyvos (different subnets).
Incorrect JDBC URL specified for Hive.
Incorrect History Server URL given for Spark.
EMR is terminated while running the job.
The EMR cluster that is synced with Kyvos is terminated.
Configurations are changed on a running EMR cluster (for example, scaling settings).
EMR has no rights to access S3.
EMR doesn't have rights on Glue tables (rights are revoked, also on the registered file).
Cluster is resized while running.
A single EMR cluster is shared across multiple Kyvos deployments.
Spot Instances are not available or are decommissioned.
Spot Instances with MapReduce.
In a non-Glue-based deployment, EMR is terminated.
DynamoDB
Capacity of DynamoDB is reached.
The same DynamoDB is shared across multiple Kyvos deployments.
The DynamoDB metadata table is deleted.
EC2
A BI Server/KM/QE node is accidentally shut down.
Attached disks are removed from the node for any reason.
Attached volumes are not sufficient to build the cube.
New volumes are attached after an earlier build failed due to lack of disk space, and the cube is rebuilt.
GLUE
BI Server doesn't have rights on Glue tables.
Multiple Kyvos deployments use the same Glue databases.
Table is modified in source after the registered file is created.
SECURITY
IAM Role Doesn’t have sufficient privileges on:
GLUE
S3
EMR
IAM Role is removed during the cube build process.
Snowflake
Access rights on the database and tables on which the cube is built.
Credits are consumed.
Access to create a stage on the database that is mentioned on the connection.
Miscellaneous
AWS Lambda time out.
Cube build is failing with File Not Found Exception
Check the cluster for the presence of the file shown in the error snippet.
An ETL job may have been running that temporarily deleted the data file.
Check on the cluster when this file was last changed.
Cube build failed with out of memory error / Java heap
Increase the memory for the mapper and reducer for the cube (if you are using MapReduce to build the cube).
Properties to change: mapreduce.map.memory.mb and mapreduce.reduce.memory.mb.
If you are using Spark, increase the value of the spark.executor.memory property.
Cube Build slow
Cube build stuck at indexing job
Check the available resources on the Resource Manager.
Check the cube build job status on the Resource Manager; it can be in the Accepted, New, or Submitted state.
Wait for the other jobs to complete or decrease the value for mapper and reducer memory in cube properties.
Register File Issue
Unable to preview, getting File_Name Execution failed - Try changing the execution engine for the file to either MapReduce or Spark.
To do so, add the build.execution.engine property in the Register File properties.
By default, the Spark SQL context is used to preview the files, so try changing to MapReduce.
Restart the Spark SQL context by an API call or by restarting the BI Server.
Connection error: Browser is not able to connect to server or the server is down - This occurs when you are registering the file from HCatalog.
Check whether the JDBC URL at the connection level is correct.
Check whether the table is accessible from the Beeline or Hive shell.
Worksheet/Dashboard Preview Fail
Could not serve query as SQL operation asynchronous executor is stopped / Failed to execute query through OLAP layout
The SQL context running on the BI Server node is stopped.
Restart the SQL context through the REST API, or restart the BI Server, which automatically starts the SQL context.
Could not execute the query as no query engine is active in the allocated or default segment - Check the state of the query engines on the Activity Monitor.
If they are in an initializing state, wait until they come up.
If they are down, start them from Kyvos Manager and wait until they come up.
Please build the cube before browsing.- This happens when the cube is not built.
Build the cube or enable raw data querying on that cube.
Failed to execute SQL query through OLAP layout, Connection URL not provided for external engine HIVE - Check the value of the hiveserver2.jdbc.url property in the Hadoop Connection (navigate to Kyvos Menu > Setup > Connections > HadoopConnection).
Verify whether the correct JDBC URL is set.
If yes, try to access Hive through Beeline with the same JDBC URL.
Kyvos Services not Starting
Web Portal is not coming up after starting - Check whether the ports used by the Web Portal are being used by other processes.
The default ports used by the Web Portal are 6605, 8081, 8009, and 8005.
Make sure the Kyvos Manager Agent is running on the Web Portal’s node.
BI Server is not coming up after starting - Check the ports used by the BI Server, such as 6602, 45460, and 45450.
Make sure the Kyvos Manager Agent is running on the BI Server's node.
Query Engines are not coming up (YARN mode) - Check on the Resource Manager which state the Query Engines application is in.
Check the available resources on the Resource Manager.
Check the available resources for the YARN Queue
Postgres showing down status - Check whether port 45421 (used by Postgres) is being used by another process.
Check whether the Kyvos Manager Agent is up and running.
Kyvos Manager showing click to upgrade, what shall I do? - This happens when you start Kyvos Manager from the old version's directory.
Make sure to start the correct Kyvos Manager from the correct directory.
Data Integrity Concerns
Measure values aren’t matching with expected results/other BI software
Check the filter conditions on register files
Check the SQL query on register files
Check the number of records in the cube build summary
Ensure that the source table had valid records when the cube was scheduled for build
Ensure that no data/ETL load was in progress while the cube build was running
In the Kyvos register files, validate that the column data types are correct as per the source data
Check the summary function of measure(s) in the cube design
Are your records falling beyond the configured cube sliding window strategy?
Check the join condition/column of your dimension and fact tables in cube DRD
Check if the discrepancy is because aggregates are shown as “unknown”
NOTE: NULL or blank values do not contribute to aggregations in Kyvos; check whether your BI tool handles them differently.
Dimension records are missing in viewer filter/browsing results
In the raw data of the dimension table, check the values in all dimension columns for the missing record; are all column values valid as per the column data type?
Ensure that the missing record(s) were present in the source table when the cube was scheduled for build.
Are you missing record(s) which has NULL/blank value(s) in any column?
If it is because of NULL/blank values, then set appropriate configurations as mentioned in the valid values table.
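A quick way to locate such records is to scan the dimension source for NULL or blank values in the columns used by the dimension. A minimal PySpark sketch, with illustrative table and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative dimension source table and the columns used in the Kyvos dimension.
dim = spark.table("sales.dim_product")
dim_columns = ["product_id", "category", "sub_category"]

# Records with NULL or blank values in any dimension column are the usual
# suspects when members are missing from filters or browsing results.
condition = None
for c in dim_columns:
    check = F.col(c).isNull() | (F.trim(F.col(c)) == "")
    condition = check if condition is None else (condition | check)

dim.filter(condition).show()
```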
How does Kyvos handle special values like NULL, invalid value, and blanks?
Refer to the following table that shows how various special values are treated with respect to the data type of source column and the cube entity which is created using the source column.
Also, see Refine > Advanced Properties > Data Cleansing on the cube designer screen.
Data Type | Example | Level | Attribute | Measure | Primary Key |
CHAR | “” (blank) | Valid | Valid | Invalid | Invalid |
CHAR | NULL | Invalid | Invalid | Invalid | Invalid |
NUMBER (long) | | Invalid | Invalid | Invalid | Invalid |
NUMBER (double, float, decimal) | | Invalid | Invalid | Invalid | Invalid |
Date | | Invalid | Invalid | Invalid | Invalid |
Behaviour of kyvos.build.blankvalue.badrow
Entity | TRUE (Default) | FALSE |
Dimension | The complete record gets skipped. | Invalid value is shown as NULL. |
Dimension | The complete record gets skipped. | The complete record gets skipped. |
Measure | Measure value will not be considered in totals. | Measure value will not be considered in totals. |
Primary Key | Record will be skipped. | Record will be skipped. |
Foreign Key | The totals of this record will be shown as unknown. | The totals of this record will be shown as unknown. |
What are valid/invalid values for Levels on which Date Data Type is configured?
Date Data Type is not respected/supported on:
Regular/Location Type dimension
Level if Date Data Type is not configured on any of its parent Levels in the hierarchy
Attribute(s) of any dimension
Date Data Type | Valid Values |
Year | Any valid number >= 1970 |
Quarter | |
Month | |
Week (Of Year) | |
Day (Of Month) | Valid number according to the Year-Month to which it belongs, as per the Gregorian calendar |
DATE | Any valid date >= 1-Jan-1970 as per configured date format |
Rule for week start day
Kyvos supports SUNDAY as the week start day. Hence, any custom time hierarchy having a WEEK-type level must comply with this prerequisite.
If the hierarchy is Y-Q-M-W-D, then the week number for 2009-Jan-03 must be 1, while the week number for 2009-Jan-04 (a SUNDAY) must be 2.
Invalid data example:
| Year | Quarter | Month | Week | Day |
Record 1 | 2009 | 1 | 1 | 1 | 3 |
Record 2 | 2009 | 1 | 1 | 1* | 4 |
*Shows invalid data.
Rule for a week that spans across the year
If a WEEK is such that it spans across 2 Years in the Gregorian calendar, then the WEEK number should be 53 for Days of December and 1 for Days of January.
For example, consider the last WEEK of the year 2008 (its first 4 days fall in December 2008 and its last 3 days in January 2009).
Valid data example:
| Year | Quarter | Month | Week | Day |
Day 1 | 2008 | 4 | 12 | 53 | 28 |
Day 2 | 2008 | 4 | 12 | 53 | 29 |
Day 3 | 2008 | 4 | 12 | 53 | 30 |
Day 4 | 2008 | 4 | 12 | 53 | 31 |
Day 5 | 2009 | 1 | 1 | 1 | 1 |
Day 6 | 2009 | 1 | 1 | 1 | 2 |
Day 7 | 2009 | 1 | 1 | 1 | 3 |
Invalid data example:
| Year | Quarter | Month | Week | Day |
Example 1 | 2008 | 4 | 12 | 1* | 29 |
Example 2 | 2009 | 1 | 1 | 53* | 2 |
*Shows invalid data.
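Both rules can be spot-checked with plain Python date arithmetic; the interpretation below is taken from the examples above:

```python
from datetime import date

def is_sunday(d: date) -> bool:
    """Kyvos expects weeks to start on SUNDAY (weekday(): Monday=0 ... Sunday=6)."""
    return d.weekday() == 6

# Rule for week start day: 2009-Jan-04 is a Sunday, so it must open a new
# week; its week number has to be one higher than that of 2009-Jan-03.
print(is_sunday(date(2009, 1, 3)), is_sunday(date(2009, 1, 4)))  # False True

# Rule for a week spanning two years: the week starting Sunday 2008-Dec-28
# runs into January 2009, so its December days carry week 53 and its
# January days carry week 1.
print(is_sunday(date(2008, 12, 28)))  # True
```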
Miscellaneous
How to get a count of queries that were run against the dimension/measures for Kyvos cube?
The Query Analyzer feature in Kyvos provides details about the count of queries that were run against the dimensions/measures of a cube.
Navigate to the cube > Click on the three dots button > Analyze Queries.
How to find the performance of Kyvos queries?
Query performance can be seen from Live Query Analysis and Historic Query Analysis in the Workbooks section.
Navigate to Kyvos Menu > Workbooks > Query Analysis.
Getting frequent alerts for memory shoot-up on Kyvos Manager
Check the buffer/cache memory on the node for which the alerts are triggered.
It's safe to clear the buffer/cache memory from the terminal.
Or change the alert level for these alerts in the Alerts and High Availability section from Kyvos Manager.
Unable to initiate new operations, such as restart cluster or restart BI server
This is generally caused by the locking of the Derby DB.
Steps to remove Derby DB lock:
Stop the Kyvos Manager server from the terminal by executing the stop command.
Navigate to the kyvosmanagerdata/server/db/ folder and delete all the lock files ending with the .lck extension.
Navigate to kyvosmanagerdata/server/db/ankushdb/ folder and delete the db.lck and dbex.lck files.
Start the Kyvos Manager server from the terminal by executing the start command.
Preview on Register file created over SQL based registered file is failing for dynamic schema
For a dynamic schema where the table is created over two Parquet files, one with the column data type as int and the other as double, add the property spark.sql.hive.convertMetastoreParquet=false in the Cube Advanced properties.
The property controls whether to use the built-in Parquet reader and writer for Hive tables with the parquet storage format (instead of Hive SerDe). The default value is set as true.
Internally, when set to true, this property enables the RelationConversions logical rule to convert HiveTableRelations to HadoopFsRelation.
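For reference, the equivalent setting applied directly on a Spark session (outside Kyvos) looks like this minimal sketch:

```python
from pyspark.sql import SparkSession

# Disable the built-in Parquet reader/writer for Hive Parquet tables so the
# Hive SerDe is used instead (mirrors the advanced property described above).
spark = (SparkSession.builder
         .config("spark.sql.hive.convertMetastoreParquet", "false")
         .enableHiveSupport()
         .getOrCreate())
```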
Copyright Kyvos, Inc. All rights reserved.