...

Interactive Cluster 

All Spark jobs are submitted to a common cluster, and the virtual machines are shared across the jobs. 

  • Better control over the total number of virtual machines, since it is independent of the number of build jobs, the semantic model design, etc. 

  • Lower hardware costs due to optimal utilization of the virtual machines 
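The shared-cluster pattern described above can be sketched as a Databricks Jobs API 2.1 payload that targets an already-running interactive (all-purpose) cluster via `existing_cluster_id`. This is a minimal illustration only; the job name, cluster ID, and notebook path are hypothetical placeholders, and Kyvos builds these settings internally.

```python
# Sketch: run a Databricks job on a shared interactive (all-purpose)
# cluster by referencing it with the Jobs API 2.1 `existing_cluster_id`
# field. All identifiers below are hypothetical placeholders.

def interactive_job_settings(job_name: str, cluster_id: str, notebook_path: str) -> dict:
    """Job settings that reuse one shared interactive cluster for every run."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "process",
                # Reuse the already-running interactive cluster:
                # its VMs are shared across every job submitted this way.
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }

settings = interactive_job_settings(
    "kyvos-semantic-model-process", "0101-000000-abcd123", "/Kyvos/process"
)
```

Because every job lands on the same cluster, the total VM count stays bounded by that one cluster's configuration, regardless of how many jobs are submitted.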

Job Cluster 

Each Spark job is submitted to a new, dedicated cluster; the virtual machines are not shared across the jobs. 

  • Many Kyvos jobs do not fully utilize the capacity of the virtual machines, so the system might underutilize the hardware and incur higher costs 

  • When multiple jobs run in parallel, more virtual machines are used simultaneously, making the system more vulnerable to the following errors:

...
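The per-job (ephemeral) cluster pattern can likewise be sketched as a Jobs API 2.1 payload using the `new_cluster` field, so each run spins up its own dedicated VMs. The Spark version, node type, and worker counts below are hypothetical placeholders, not Kyvos defaults.

```python
# Sketch: a dedicated cluster per job via the Jobs API 2.1 `new_cluster`
# field. Spark version, node type, and autoscale bounds are hypothetical
# placeholders; each run provisions fresh VMs that are released afterwards.

def job_cluster_settings(job_name: str, notebook_path: str, max_workers: int) -> dict:
    """Job settings that create a fresh, dedicated cluster for each run."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "process",
                # A new cluster per job: VMs are NOT shared across jobs,
                # so N parallel jobs can multiply total VM usage by N.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "autoscale": {"min_workers": 1, "max_workers": max_workers},
                },
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }
```

The key contrast with the interactive-cluster payload is that VM consumption here scales with the number of concurrently running jobs, which is what exposes the system to pool and quota limits.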

Recommended Cluster by Semantic Model Scenario 

  • Scenario: Huge fact data processing (>300M row count) is required during the Full or Incremental semantic model process, and the semantic model has no other fact datasets with low data volume.
    Details: Level job processing time is three hours or more.
    Recommended cluster: Job

  • Scenario: The indexing time of the semantic model process job is low.
    Details: Indexing job time is 20% or less of the total semantic model process time.
    Recommended cluster: Job

  • Scenario: The semantic model has a large number of fact transformations.
    Details: Virtual machine/core utilization could exceed the instance pool's maximum nodes or the subscription quota. For example, with ten fact transformations and a cluster configured to use at most 20 worker nodes, the system might end up using up to 200 virtual machines simultaneously, resulting in usage and quota limit errors.
    Recommended cluster: Interactive

  • Scenario: Process jobs run on low data volume.
    Details: Low usage of Azure Databricks.
    Recommended cluster: Interactive

  • Scenario: Wide semantic model with a huge number of dimensions or attributes.
    Details: Too many process jobs underutilize the virtual machines.
    Recommended cluster: Interactive
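The quota arithmetic in the fact-transformation scenario above can be made explicit. A rough worst-case estimate, assuming every job cluster autoscales to its configured maximum and ignoring driver nodes:

```python
# Worst-case simultaneous VM usage when each parallel job gets its own
# cluster. Simplifying assumptions: driver nodes are ignored, and every
# cluster scales to its configured maximum. The numbers mirror the
# table's example (10 fact transformations x 20 max worker nodes).

def peak_worker_vms(parallel_jobs: int, max_workers_per_cluster: int) -> int:
    """Worst-case worker VM count across all concurrent job clusters."""
    return parallel_jobs * max_workers_per_cluster

def exceeds_quota(parallel_jobs: int, max_workers: int, vm_quota: int) -> bool:
    """True when the worst case would trip the pool/subscription quota."""
    return peak_worker_vms(parallel_jobs, max_workers) > vm_quota

print(peak_worker_vms(10, 20))     # 200 worker VMs in the worst case
print(exceeds_quota(10, 20, 150))  # True: this would hit a quota limit error
```

This is why the table recommends an interactive cluster for models with many fact transformations: a single shared cluster caps the worst case at one cluster's maximum instead of multiplying it by the number of parallel jobs.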

...