Distinct count measures

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

A distinct count is the number of unique values in a column (a dimension or a measure) or an expression.

Kyvos provides support for two types of distinct count:

Approximate: The fact value is an estimated distinct count of raw data values.
Accurate: The fact value gives the exact distinct count of raw data values.

Approximate distinct counts are computationally cheaper than accurate counts. The Approximate count improves performance, whereas the Accurate count increases semantic model size and processing time. Use Approximate distinct counts when exact values are unnecessary, when performance matters more than accuracy, or when you need to keep the semantic model size small.
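The trade-off between the two count types can be illustrated with a minimal Python sketch. It is not the algorithm Kyvos uses, which this page does not describe. It swaps in a generic K-minimum-values (KMV) estimator purely to show why an estimate needs less memory than an exact count. The function names and the parameter k are hypothetical; k is only loosely analogous to an accuracy level, in that a larger k keeps more data and typically gives a closer estimate.

```python
import hashlib
import heapq


def exact_distinct(values):
    """Accurate count: every unique value is held in memory."""
    return len(set(values))


def _hash01(value):
    """Map a value to a deterministic float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(values, k=256):
    """Approximate count with a KMV sketch (illustrative assumption only).

    Only the k smallest distinct hash values are retained, so memory is
    bounded by k regardless of how many unique values exist.
    """
    heap = []        # max-heap (via negation) of the k smallest hashes
    members = set()  # hashes currently held in the heap
    for value in values:
        h = _hash01(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k unique values seen: count is exact
    kth_smallest = -heap[0]
    return int(round((k - 1) / kth_smallest))


if __name__ == "__main__":
    data = [i % 5000 for i in range(20000)]  # 5000 unique values
    print("accurate:", exact_distinct(data))
    print("approximate:", approx_distinct(data, k=256))
```

The exact version stores all unique values, while the sketch stores at most k hashes. That difference is the general reason an approximate measure can be smaller and faster to process than an accurate one.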

To compute both approximate and accurate values, Kyvos performs indexing.

From Kyvos 2023.2 onwards, indexing is no longer required for Approximate distinct count measures. Instead, Kyvos uses the member values directly to estimate distinct counts, which helps reduce semantic model processing time, cost, and metadata size. This change applies only to the Approximate distinct count measure. Additionally, you can add or delete a measure for the subsequent semantic model processes (Incremental process, Update Aggregation, DropPartition).

Important points to remember

This feature is enabled only for semantic models created in Kyvos 2023.2 or a later version using one of the following:
- Add a semantic model
- Duplicate or Save As on an existing semantic model
The distinct count measure must be selected as an Approximate distinct count measure.
The source column of the measure must belong to the fact dataset.

To add a distinct count measure, perform the following steps.

From the Toolbox, click Semantic Models.
Select a semantic model from the list.
From the Source Fields section, drag candidate measures to the measures area of the semantic model design worksheet. Use Ctrl+click to select multiple fields.
In the Measure Properties section, select the Distinct Count option from the Function list.
Select the Count type as Approximate or Accurate.
Select the dataset from the Count On list. The 'Count On' feature is used to calculate the distinct count value of a field in one dataset based on another related dataset.
Select the Accuracy Level from the list.

Note

To improve the precision of the distinct count approximation, you can select a higher accuracy level. However, this will require more computational memory and disk storage, thereby increasing the size of the semantic model.

Select the Value of <Measure name> does not repeat across base partition checkbox to avoid repetition of measure values across base partitions.

Note

You must provide an appropriate partition strategy. See the Partition strategy section for more details.

To change the Format, click the Actions menu (…) button. Click the i icon to learn more.
Choose the desired format type from the list and enter relevant values.
Specify the name of the semantic model in the Display Folder field to display the semantic model metadata while browsing in Excel.
Specify whether to materialize the measure. Yes means that this field will be part of the processed semantic model.
Optionally, select the Visible checkbox to display the measure for use in worksheets. If you don’t want users to use this data directly, clear the checkbox.
Optionally, add a description of up to 200 characters. The description appears when you hover the cursor over the measure in a worksheet.
Click Save.
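The Value of <Measure name> does not repeat across base partition option in the steps above depends on how the data is partitioned. The following Python sketch shows one plausible reason this matters; it is an assumption, not a description of Kyvos internals. When no value appears in more than one partition, the per-partition distinct counts can simply be added. When values repeat across partitions, adding them overcounts. The function names and sample data are hypothetical.

```python
def sum_of_partition_counts(partitions):
    """Add up the distinct count computed separately in each partition."""
    return sum(len(set(values)) for values in partitions.values())


def global_distinct(partitions):
    """Distinct count over the union of all partitions."""
    return len(set().union(*partitions.values()))


if __name__ == "__main__":
    # Hypothetical customer IDs partitioned by region: no ID repeats
    # across partitions, so the per-partition counts are additive.
    by_region = {"east": [1, 2, 2, 3], "west": [4, 5]}
    print(sum_of_partition_counts(by_region), global_distinct(by_region))

    # The same kind of data partitioned by month: IDs 2 and 3 appear in
    # both partitions, so adding per-partition counts overcounts.
    by_month = {"jan": [1, 2, 3], "feb": [2, 3, 4]}
    print(sum_of_partition_counts(by_month), global_distinct(by_month))
```

Under this reading, the checkbox is only safe to select when the partition strategy guarantees that values are disjoint across base partitions, as in the region example.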