
Defining Hadoop Properties from Kyvos Manager

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


Use this to set Hadoop properties for cluster deployment.

Note

The user account used to deploy the cluster must have READ permissions on the namenode for Hadoop Library Path, Hadoop Native Library Path, and Hadoop Configuration Path. The root user cannot be used for the Hadoop ecosystem configuration.
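Before starting deployment, you can confirm from the namenode shell that the deployment user actually has read access to the paths you plan to enter. This is a minimal sketch; the paths passed at the bottom are examples only and should be replaced with the values you intend to use for the Hadoop Library, Native Library, and Configuration paths.

```shell
# Check that each given path is readable by the current (deployment) user.
check_readable() {
  for p in "$@"; do
    if [ -r "$p" ]; then
      echo "OK: $p"
    else
      echo "UNREADABLE: $p"
    fi
  done
}

# Example paths (assumptions); substitute your own values.
check_readable /home/hadoop/lib /etc/hadoop/conf /home/hadoop/lib/native
```

Run this as the same user account that Kyvos Manager will use, not as root.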

Hadoop configuration

The following figure displays the Hadoop Configuration area.

As Kyvos Manager also allows the deployment of Cloud clusters, you will see cloud-specific options too. For on-premises clusters, ignore these configurations.

To configure Hadoop properties:

  1. Enter details as:

Area

Parameter/Field

Comments/Description


Vendor

Select your Hadoop vendor from the list.

File System Type

Select HDFS as the file system to be used by the Hadoop cluster.

Node and Authentication

EMR Master Node IP/Host Name

Hostname/IP address of the node where all the Hadoop library and configuration files are available.
NOTE: You must have access to the namenode if it is not part of the cluster being deployed.

Use different user account for accessing Name Node

Select the check box if you want to use a different user account (other than the logged-in Hadoop Node authentication user) for accessing the namenode.

NOTE: If you select this option, you are prompted to provide a Username, Authentication Type, and Password/Shared Key for authentication.

Paths and Version

Version

Enter your Hadoop vendor version.

EMR and Hadoop Library Path

Enter library absolute paths for jar inclusion. Use a comma-separated list for multiple paths.

Example: /home/hadoop/,/home/hadoop/lib/

Refer to the Appendix for details.

Hadoop Native Library Path

Enter native library absolute paths for Hadoop (.so) files inclusion. Use a comma-separated list for multiple paths.

Example: /home/hadoop/native,/home/hadoop/lib/native

Refer to the Appendix for details.

EMR and Hadoop Configuration Path

Enter configuration absolute paths for configuration file inclusion. Use a comma-separated list for multiple paths.

Example: /home/hadoop/conf,/etc/hadoop/conf/

Refer to the Appendix for details.

Zookeeper

Zookeeper Connection String

Enter the connection string for Zookeeper in the IP:Port OR HostName:Port format. Use a comma-separated list for multiple nodes.
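For example, a three-node ZooKeeper ensemble would be entered as follows (hostnames are hypothetical; 2181 is ZooKeeper's default client port):

```
zknode1.example.com:2181,zknode2.example.com:2181,zknode3.example.com:2181
```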

Input Compression

Input Compression

Select the relevant check box to enable Snappy or LZO support for input file compression for the whole cluster.

You need to provide the Snappy Native Library Path, or the LZO Native Library Path and LZO Library Path, depending on the selected compression type.

Refer to the Appendix for Hadoop paths to enable compression in Kyvos Manager.
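Before enabling compression, you can check which native codecs the Hadoop installation on the node can load. The standard Hadoop CLI reports Snappy among others; note that LZO ships separately from the Hadoop distribution and may not appear in this report, so verify its library path independently.

```
hadoop checknative -a
```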

Hadoop Parameters

Hadoop Parameters

Use this to add custom Hadoop parameters for your cluster.

NOTE: You must provide a value for the cloudera.version parameter if you have selected Cloudera as your Hadoop vendor.

  2. Click Validate Hadoop Configuration.

The system validates user authentication and the paths used to connect to the namenode. If validation is successful, proceed to the HCatalog configuration. Otherwise, click the back button, edit the information wherever necessary, and click Revalidate.

HCatalog configuration

HCatalog is a table storage management tool for Hadoop that exposes the tabular data of the Hive metastore to other Hadoop applications. It enables users with different data processing tools to easily write data onto the grid, and it provides read and write interfaces for those tools.

To configure the HCatalog properties for the cluster:

  1. Select the Enable HCatalog checkbox to present a relational view of data in the Hadoop Distributed File System (HDFS).

  2. Select the Use Hive as data source check box if your data is stored in Hive.

  3. Enter the details as:

Area

Parameter/Field

Comments/Description

Node and Authentication

Hive Source Node

Select the Same As NameNode option if the Hive source is on the namenode. To use a different node for Hive services, select the Other Node option.

Hive Node Host Name

If you selected the Other Node option above, enter the IP address or hostname of your Hive node.

Use different user account for accessing Hive Node

Select this checkbox to use a user account other than the Hadoop Node authentication user for accessing the Hive node.

Paths and Version

Hive Version

Select the Hive version from the list.

HCatalog Library Path

Enter library absolute paths for jar inclusion. Use a comma-separated list for multiple paths.

Example: /home/hadoop/,/home/hadoop/lib/

Refer to the Appendix for details.

HCatalog Configuration Path

Enter configuration absolute paths for configuration file inclusion. Use a comma-separated list for multiple paths.

Example: /home/hadoop/conf,/etc/hadoop/conf/

HCatalog Parameters

HCatalog Parameters

Use this to add custom HCatalog parameters for your cluster.
NOTE: You must add the kyvos.hiveserver2.jdbc.url parameter with the value set as jdbc:hive2://<hiveserver2 hostname>:<port>/. Kyvos uses this parameter to connect to the HCatalog table using JDBC.
Example: jdbc:hive2://clnode2:10000/
This can also be set post-deployment under the DefaultHadoopCluster01 connection on the Kyvos portal.
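Before entering the JDBC URL, you can optionally confirm that the HiveServer2 endpoint is reachable from the node, for example with Beeline (the hostname and port below are illustrative and match the example above):

```
beeline -u "jdbc:hive2://clnode2:10000/" -e "SHOW DATABASES;"
```

If the connection succeeds, Beeline prints the list of databases; a connection refusal usually indicates a wrong host, port, or a HiveServer2 service that is not running.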

  4. Click Validate Hive File Paths.

The system validates user authentication and the paths used to connect to the namenode. If validation is successful, proceed to the Execution Engine configuration. Otherwise, click the back button, edit the information wherever necessary, and click Revalidate.

Execution Engine Configuration

MapReduce is the default execution engine for Hive. Kyvos also supports Spark for running queries on Hive. You can configure the execution engine in this area according to your requirements.

Note

The fields shown in the following figure are displayed ONLY if you select the Spark option.

To configure execution engine properties for the cluster:

  1. Enter details as:

Area

Parameter/Field

Comments/Description


Execution Engine Name

Select the execution engine from the list.

Deployment Mode

Select the yarn-cluster option if your Spark deployment mode is YARN cluster; otherwise, select the yarn-client option.

Node and Authentication

Spark Source Node

To use the namenode as the Spark source node, select the Same As NameNode option; otherwise, select the Other Node option.

Spark Node Host Name

If you selected the Other Node option above, enter the IP address or hostname of your Spark node.

Use different user account for accessing Spark Node

Select the checkbox to use a user account other than the Hadoop Node authentication user for accessing the Spark node.
NOTE: If you select this option, you are prompted to provide a Username, Authentication Type, and Password/Shared Key for authentication.

Paths and Version

Spark Version

Select the Spark version from the list.

Spark Home Directory

Provide the Spark home directory.

Spark Library Path

Enter the library file paths for Spark. Refer to the Appendix for details.

Spark Configuration Path

Enter the configuration file paths for Spark. Refer to the Appendix for details.

Spark Parameters

Spark Parameters

Use this to add custom Spark parameters for your cluster.
NOTE: You must provide the spark.yarn.historyServer.address parameter.
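For example, the parameter takes a host:port value (the hostname below is hypothetical; 18080 is the Spark history server's default port):

```
spark.yarn.historyServer.address=sparkhistory.example.com:18080
```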

  2. Click Validate Spark file paths. The system validates user authentication and the connection for the specified paths.

  3. Click Next. You are directed to the Security page.

Next: Configure Security Properties

Copyright Kyvos, Inc. All rights reserved.