Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

Use this page to set the Hadoop properties for cluster deployment.

Note

The user account used to deploy the cluster must have READ permission on the namenode for the Hadoop Library Path, Hadoop Native Library Path, and Hadoop Configuration Path. You cannot use the root user for the Hadoop ecosystem configuration.

Hadoop configuration

The following figure displays the Hadoop Configuration area.

Because Kyvos Manager also supports deploying cloud clusters, you will see cloud-specific options as well. Ignore these options for on-premises clusters.

To configure Hadoop properties:

Enter the following details:

Area	Parameter/Field	Comments/Description
	Vendor	Select your Hadoop vendor from the list.
	File System Type	Select HDFS as the file system used by the Hadoop cluster.
Node and Authentication	EMR Master Node IP/Host Name	Host name or IP address of the node where all the Hadoop library and configuration files are available. NOTE: If the namenode is not part of the cluster being deployed, you must still have access to it.
Node and Authentication	Use different user account for accessing Name Node	Select the checkbox if you want to use a different user account (other than the logged-in Hadoop Node authentication user) to access the namenode. NOTE: If you select this option, you will be prompted to provide a Username, Authentication Type, and Password/Shared Key for authentication.
Paths and Version	Version	Enter your Hadoop vendor version.
	EMR and Hadoop Library Path	Enter the absolute paths of the library (JAR) files to include. Use a comma-separated list for multiple paths. Refer to the Appendix for details. Example: /home/hadoop/,/home/hadoop/lib/
	Hadoop Native Library Path	Enter the absolute paths of the Hadoop native library (.so) files to include. Use a comma-separated list for multiple paths. Refer to the Appendix for details. Example: /home/hadoop/native,/home/hadoop/lib/native
	EMR and Hadoop Configuration Path	Enter the absolute paths of the configuration files to include. Use a comma-separated list for multiple paths. Refer to the Appendix for details. Example: /home/hadoop/conf,/etc/hadoop/conf/
Zookeeper	Zookeeper Connection String	Enter the ZooKeeper connection string in IP:Port or HostName:Port format. Use a comma-separated list for multiple nodes.
Input Compression	Input Compression	Select the relevant checkbox to enable Snappy or LZO support for input file compression across the whole cluster. Depending on the selected compression type, provide either the Snappy Native Library Path, or the LZO Native Library Path and LZO Library Path. Refer to the Appendix for the Hadoop paths that enable compression in Kyvos Manager.
Hadoop Parameters	Hadoop Parameters	Use this to add custom Hadoop parameters for your cluster. NOTE: If you selected Cloudera as your Hadoop vendor, you must provide a value for the cloudera.version parameter.
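Kyvos Manager runs its own checks when you click Validate Hadoop Configuration. You may want to sanity-check values before entering them. The following Python sketch applies only the format rules stated in the table:
- comma-separated absolute paths
- IP:Port or HostName:Port entries for ZooKeeper
- the cloudera.version requirement for Cloudera

The function names, argument names, and the relative path in the usage example are hypothetical. They are not part of Kyvos Manager, and the sketch does not reproduce how Kyvos Manager itself validates fields.

```python
import re

# One ZooKeeper entry: an IP address or host name, a colon, and a port number.
ZK_ENTRY = re.compile(r"^[A-Za-z0-9.\-]+:(\d{1,5})$")


def split_csv(value):
    """Split a comma-separated field, trimming spaces and dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_hadoop_fields(vendor, path_fields, zookeeper, hadoop_params):
    """Return a list of problems found; an empty list means every check passed.

    vendor        -- the selected Hadoop vendor, e.g. "Cloudera"
    path_fields   -- maps a field label to its comma-separated path list
    zookeeper     -- the ZooKeeper connection string
    hadoop_params -- custom Hadoop parameters as name -> value
    """
    problems = []
    # Library, native library, and configuration paths must all be absolute.
    for label, value in path_fields.items():
        for path in split_csv(value):
            if not path.startswith("/"):
                problems.append(f"{label}: {path!r} is not an absolute path")
    # Each ZooKeeper node must be given as IP:Port or HostName:Port.
    for entry in split_csv(zookeeper):
        match = ZK_ENTRY.match(entry)
        if match is None or not 0 < int(match.group(1)) <= 65535:
            problems.append(f"ZooKeeper: {entry!r} is not in Host:Port format")
    # Cloudera deployments need an explicit cloudera.version parameter.
    if vendor.lower() == "cloudera" and not hadoop_params.get("cloudera.version"):
        problems.append("Hadoop Parameters: cloudera.version is required for Cloudera")
    return problems


if __name__ == "__main__":
    issues = check_hadoop_fields(
        vendor="Cloudera",
        path_fields={
            "Hadoop Library Path": "/home/hadoop/,/home/hadoop/lib/",
            "Hadoop Configuration Path": "conf,/etc/hadoop/conf/",
        },
        zookeeper="10.0.0.5:2181,zk2.example.com:2181",
        hadoop_params={},
    )
    for issue in issues:
        print(issue)
```

In this example, the relative entry conf and the missing cloudera.version parameter are both reported. The ZooKeeper entries pass the format check.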

Click Validate Hadoop Configuration.

The system validates user authentication and the configured paths on the namenode. If validation succeeds, proceed to the HCatalog configuration. Otherwise, click the back button, edit the information as necessary, and click Revalidate.
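Validation requires two things: the deployment user must have READ permission on every configured path, and the deployment user must not be root. The following Python sketch is a hypothetical pre-check that you could run on the namenode as the deployment user before clicking Validate. It is not how Kyvos Manager performs validation, and the function names are illustrative. The path list combines the example values from the Hadoop Configuration table.

```python
import os


def unreadable_paths(path_list):
    """Return the entries of a comma-separated path list that this user cannot read."""
    missing = []
    for path in (p.strip() for p in path_list.split(",")):
        if not path:
            continue
        # Listing a directory needs read permission; opening files inside it
        # also needs execute (search) permission on the directory.
        mode = os.R_OK | os.X_OK if os.path.isdir(path) else os.R_OK
        if not os.path.exists(path) or not os.access(path, mode):
            missing.append(path)
    return missing


def is_root(uid=None):
    """Return True if uid (default: the current effective user, POSIX only) is root."""
    if uid is None:
        uid = os.geteuid()
    return uid == 0


if __name__ == "__main__":
    if is_root():
        print("Run this check as the deployment user, not root.")
    paths = "/home/hadoop/,/home/hadoop/lib/,/etc/hadoop/conf/"
    for path in unreadable_paths(paths):
        print(f"Not readable: {path}")
```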

HCatalog configuration

HCatalog is a table and storage management layer for Hadoop that exposes the tabular data of the Hive metastore to other Hadoop applications. It lets users of different data processing tools read and write data on the grid more easily by providing a common read and write interface for these tools.

To configure the HCatalog properties for the cluster:

Select the Enable HCatalog checkbox to present a relational view of data in the Hadoop Distributed File System (HDFS).
Select the Use Hive as data source checkbox if your data is stored in Hive.
Enter the following details:

Area	Parameter/Field	Comments/Description
Node and Authentication	Hive Source Node	To use the namenode as the Hive source node, select the Same As NameNode option. If Hive services run on another node, select the Other Node option.
	Hive Node Host Name	If you selected the Other Node option, enter the host name or IP address of the Hive node.
	Use different user account for accessing Hive Node	Select this checkbox to use a user account other than the Hadoop Node authentication user to access the Hive node.
Paths and Version	Hive Version	Select the Hive version from the list.
	HCatalog Library Path	Enter the absolute paths of the HCatalog library (JAR) files to include. Use a comma-separated list for multiple paths. Refer to the Appendix for details. Example: /home/hadoop/,/home/hadoop/lib/
	HCatalog Configuration Path	Enter the absolute paths of the configuration files to include. Use a comma-separated list for multiple paths. Example: /home/hadoop/conf,/etc/hadoop/conf/
HCatalog Parameters	HCatalog Parameters	Use this to add custom HCatalog parameters for your cluster. NOTE: You must add the kyvos.hiveserver2.jdbc.url parameter with its value set to jdbc:hive2://<hiveserver2 hostname>:<port>/. Kyvos uses this parameter to connect to HCatalog tables over JDBC. You can also set this parameter after deployment under the DefaultHadoopCluster01 Connection on the Kyvos portal. Example: jdbc:hive2://clnode2:10000/
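The kyvos.hiveserver2.jdbc.url value must follow the form jdbc:hive2://<hiveserver2 hostname>:<port>/. The following Python sketch builds and checks that value.

Three assumptions apply:
- The helper names are hypothetical.
- The default port of 10000 is the usual HiveServer2 Thrift port; confirm the port your cluster uses.
- The check enforces the documented form strictly, including the trailing slash, and does not accept database names or session settings in the URL.

```python
import re

# Documented form: jdbc:hive2://<hiveserver2 hostname>:<port>/
URL_PATTERN = re.compile(r"^jdbc:hive2://([A-Za-z0-9.\-]+):(\d{1,5})/$")


def hiveserver2_url(host, port=10000):
    """Build the kyvos.hiveserver2.jdbc.url value for a HiveServer2 host and port."""
    if not 0 < port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{host}:{port}/"


def is_valid_hiveserver2_url(url):
    """Return True if url matches the documented jdbc:hive2://host:port/ form."""
    match = URL_PATTERN.match(url)
    return match is not None and 0 < int(match.group(2)) <= 65535


if __name__ == "__main__":
    print(hiveserver2_url("clnode2"))  # jdbc:hive2://clnode2:10000/
    print(is_valid_hiveserver2_url("jdbc:hive2://clnode2:10000"))  # False: no trailing slash
```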

Click Validate Hive File Paths.

The system validates user authentication and the configured paths on the namenode. If validation succeeds, proceed to the Execution Engine configuration. Otherwise, click the back button, edit the information as necessary, and click Revalidate.

Execution Engine Configuration

MapReduce is the default execution engine for Hive. Kyvos also supports Spark for running queries on Hive. Configure the execution engine in this area as required.

Note

The fields shown in the following figure appear only if you select the Spark option.

To configure execution engine properties for the cluster:

Enter the following details:

Area	Parameter/Field	Comments/Description
	Execution Engine Name	Select the execution engine from the list.
	Deployment Mode	Select the yarn-cluster option if your Spark deployment mode is YARN cluster; otherwise, select the yarn-client option.
Node and Authentication	Spark Source Node	To use the namenode as the Spark source node, select the Same As NameNode option. Otherwise, select the Other Node option.
	Spark Node Host Name	If you selected the Other Node option, enter the host name or IP address of the Spark node.
	Use different user account for accessing Spark Node	Select the checkbox to use a user account other than the Hadoop Node authentication user to access the Spark node. NOTE: If you select this option, you will be prompted to provide a Username, Authentication Type, and Password/Shared Key for authentication.
Paths and Version	Spark Version	Select the Spark version from the list.
	Spark Home Directory	Enter the Spark home directory.
	Spark Library Path	Enter the path of the Spark library files. Refer to the Appendix for details.
	Spark Configuration Path	Enter the path of the Spark configuration files. Refer to the Appendix for details.
Spark Parameters	Spark Parameters	Use this to add custom Spark parameters for your cluster. NOTE: You must provide the spark.yarn.historyServer.address parameter.
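To clarify what the Deployment Mode options mean in standard Spark terms, the following Python sketch maps them to spark-submit arguments. It also enforces the required spark.yarn.historyServer.address parameter. yarn-cluster and yarn-client are legacy Spark master values; newer Spark releases express them as --master yarn with --deploy-mode cluster or client.

This is an illustration only. It assumes nothing about how Kyvos actually launches Spark jobs, and the function and constant names are hypothetical. The history server address in the example (host:port, no scheme) is a placeholder.

```python
# Legacy Deployment Mode values mapped to their equivalent spark-submit arguments.
DEPLOY_MODES = {
    "yarn-cluster": ["--master", "yarn", "--deploy-mode", "cluster"],
    "yarn-client": ["--master", "yarn", "--deploy-mode", "client"],
}

# Parameters this page marks as mandatory under Spark Parameters.
REQUIRED_SPARK_PARAMS = ("spark.yarn.historyServer.address",)


def spark_submit_args(deployment_mode, spark_params):
    """Return spark-submit arguments for the chosen mode and custom parameters.

    Raises ValueError for an unknown mode or a missing required parameter.
    """
    if deployment_mode not in DEPLOY_MODES:
        raise ValueError(f"unknown deployment mode: {deployment_mode}")
    missing = [name for name in REQUIRED_SPARK_PARAMS if not spark_params.get(name)]
    if missing:
        raise ValueError(f"missing required Spark parameters: {', '.join(missing)}")
    args = list(DEPLOY_MODES[deployment_mode])
    # Each custom parameter becomes a --conf name=value pair, in sorted order.
    for name, value in sorted(spark_params.items()):
        args += ["--conf", f"{name}={value}"]
    return args


if __name__ == "__main__":
    print(spark_submit_args(
        "yarn-client",
        {"spark.yarn.historyServer.address": "historyhost:18080"},
    ))
```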

Click Validate Spark file paths. The system validates user authentication and connection for paths.
Click Next. You are directed to the Security page.

Next: Configure Security Properties

Kyvos 2023.3

Defining Hadoop Properties from Kyvos Manager
