Defining Hadoop Properties from Kyvos Manager
Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)
Use this page to set the Hadoop properties for cluster deployment.
Note
The user account used to deploy the cluster must have READ permission on the Hadoop Library Path, Hadoop Native Library Path, and Hadoop Configuration Path on the namenode. The root user cannot be used for the Hadoop ecosystem configuration.
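The following Python sketch shows one way to pre-check both requirements in this note before deployment. It assumes it runs on the namenode under the account you plan to deploy with. The function name check_deploy_user is hypothetical and is not part of Kyvos Manager, which performs its own validation.

```python
import os
from typing import Iterable, List


def check_deploy_user(paths: Iterable[str], uid: int) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass.

    Hypothetical pre-check, not part of Kyvos Manager.
    """
    problems: List[str] = []
    # The note forbids the root user (UID 0) for Hadoop configuration.
    if uid == 0:
        problems.append("root user cannot be used for Hadoop configuration")
    for path in paths:
        # os.access tests READ permission for the real UID/GID of this
        # process, so run the script as the deploying user.
        if not os.access(path, os.R_OK):
            problems.append(f"no READ permission on {path}")
    return problems


if __name__ == "__main__":
    # Example paths based on the field descriptions; replace with your own.
    example_paths = ["/home/hadoop/lib/", "/home/hadoop/lib/native", "/etc/hadoop/conf/"]
    for problem in check_deploy_user(example_paths, os.getuid()):
        print(problem)
```

Passing the UID explicitly keeps the root check testable; in the usage above it is taken from os.getuid(), which matches the identity that os.access checks.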
Hadoop configuration
The following figure displays the Hadoop Configuration area.
Because Kyvos Manager also supports deploying cloud clusters, you will see cloud-specific options as well. For on-premises clusters, ignore these options.
To configure Hadoop properties:
Enter the details as follows:
Area | Parameter/Field | Comments/Description |
---|---|---|
 | Vendor | Select your Hadoop vendor from the list. |
 | File System Type | Select HDFS as the file system for the Hadoop cluster. |
Node and Authentication | EMR Master Node IP/Host Name | Enter the hostname or IP address of the node where all the Hadoop library and configuration files are available. |
 | Use different user account for accessing Name Node | Select this check box to use a user account other than the logged-in Hadoop Node authentication user for accessing the namenode. NOTE: If you select this option, you are prompted to provide a Username, Authentication Type, and Password/Shared Key for authentication. |
Paths and Version | Version | Enter your Hadoop vendor version. |
 | EMR and Hadoop Library Path | Enter the absolute paths of the library directories whose JAR files should be included. Use a comma-separated list for multiple paths; refer to the Appendix for details. Example: /home/hadoop/,/home/hadoop/lib/ |
 | Hadoop Native Library Path | Enter the absolute paths of the directories containing the Hadoop native library (.so) files. Use a comma-separated list for multiple paths; refer to the Appendix for details. Example: /home/hadoop/native,/home/hadoop/lib/native |
 | EMR and Hadoop Configuration Path | Enter the absolute paths of the directories containing the configuration files. Use a comma-separated list for multiple paths; refer to the Appendix for details. Example: /home/hadoop/conf,/etc/hadoop/conf/ |
Zookeeper | Zookeeper Connection String | Enter the Zookeeper connection string in IP:Port or HostName:Port format. Use a comma-separated list for multiple nodes. |
Input Compression | Input Compression | Select the relevant check box to enable Snappy or LZO support for input file compression across the whole cluster. Depending on the selected compression type, you must provide the Snappy Native Library Path, or the LZO Native Library Path and the LZO Library Path. Refer to the Appendix for the Hadoop paths that enable compression in Kyvos Manager. |
Hadoop Parameters | Hadoop Parameters | Use this to add custom Hadoop parameters for your cluster. NOTE: If you selected Cloudera as your Hadoop vendor, you must provide a value for the cloudera.version parameter. |
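The format rules in this table (comma-separated absolute paths, Zookeeper entries in IP:Port or HostName:Port form, and the cloudera.version requirement) can be sketched as input checks in Python. All function names are hypothetical, and representing Hadoop Parameters as a key/value dictionary is an assumption; Kyvos Manager's own checks may differ.

```python
from typing import Dict, List, Tuple


def parse_path_list(value: str) -> List[str]:
    """Split a comma-separated path field and require absolute paths."""
    # Surrounding whitespace and empty entries (e.g. "a,,b") are ignored.
    paths = [p.strip() for p in value.split(",") if p.strip()]
    for p in paths:
        if not p.startswith("/"):
            raise ValueError(f"not an absolute path: {p!r}")
    return paths


def parse_zookeeper(value: str) -> List[Tuple[str, int]]:
    """Parse comma-separated IP:Port or HostName:Port entries."""
    nodes: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the host part may itself contain colons.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected IP:Port or HostName:Port, got {entry!r}")
        port_num = int(port)
        if not 0 < port_num < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        nodes.append((host, port_num))
    return nodes


def check_vendor_params(vendor: str, params: Dict[str, str]) -> None:
    """Enforce the documented rule: Cloudera requires cloudera.version."""
    if vendor.strip().lower() == "cloudera" and not params.get("cloudera.version"):
        raise ValueError("cloudera.version is required when the vendor is Cloudera")
```

For example, parse_zookeeper("10.0.0.1:2181,zk2.example.com:2181") returns [("10.0.0.1", 2181), ("zk2.example.com", 2181)]; the hosts and ports here are illustrative only.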
Click Validate Hadoop Configuration.
The system connects to the namenode and validates the user authentication and the specified paths. If validation succeeds, proceed to the HCatalog configuration. Otherwise, click the back button, correct the information as needed, and click Revalidate.
HCatalog configuration
HCatalog is a table and storage management layer for Hadoop that exposes the tabular data in the Hive metastore to other Hadoop applications. It provides read and write interfaces that let users of different data processing tools read and write data on the grid more easily.
To configure the HCatalog properties for the cluster:
Select the Enable HCatalog check box to present a relational view of the data in the Hadoop Distributed File System (HDFS).
Select the Use Hive as data source check box if your data is stored in Hive.
Enter the details as follows:
Area | Parameter/Field | Comments/Description |
---|---|---|
Node and Authentication | Hive Source Node | Select Same As NameNode if Hive runs on the namenode. If the Hive services run on a different node, select Other Node. |
 | Hive Node Host Name | If you selected Other Node above, enter the IP address of that node here. |
 | Use different user account for accessing Hive Node | Select this check box to use a user account other than the Hadoop Node authentication user for accessing the Hive node. |
Paths and Version | Hive Version | Select the Hive version from the list. |
 | HCatalog Library Path | Enter the absolute paths of the library directories whose JAR files should be included. Use a comma-separated list for multiple paths; refer to the Appendix for details. Example: /home/hadoop/,/home/hadoop/lib/ |
 | HCatalog Configuration Path | Enter the absolute paths of the directories containing the configuration files. Use a comma-separated list for multiple paths. Example: /home/hadoop/conf,/etc/hadoop/conf/ |
HCatalog Parameters | HCatalog Parameters | Use this to add custom HCatalog parameters for your cluster. |
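The Same As NameNode / Other Node choice amounts to a simple resolution rule, sketched below in Python. The option strings mirror the UI labels, but the function and its parameters are hypothetical and not a Kyvos API. The Spark Source Node and Spark Node Host Name fields in the execution engine settings follow the same pattern.

```python
from typing import Optional

# Option labels as shown for the Hive Source Node field.
SAME_AS_NAMENODE = "Same As NameNode"
OTHER_NODE = "Other Node"


def resolve_source_node(option: str, namenode_host: str,
                        node_host_name: Optional[str] = None) -> str:
    """Return the host selected for the Hive (or Spark) source node.

    Hypothetical helper mirroring the Source Node and Node Host Name fields.
    """
    if option == SAME_AS_NAMENODE:
        # Reuse the namenode configured in the Hadoop section.
        return namenode_host
    if option == OTHER_NODE:
        # The host name field is only used, and then required, for Other Node.
        if not node_host_name:
            raise ValueError("a node host name is required when Other Node is selected")
        return node_host_name
    raise ValueError(f"unknown source node option: {option!r}")
```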
Click Validate Hive File Paths.
The system validates the user authentication and the specified Hive paths. If validation succeeds, proceed to the Execution Engine configuration. Otherwise, click the back button, correct the information as needed, and click Revalidate.
Execution Engine Configuration
MapReduce is the default execution engine for Hive. Kyvos also supports Spark for running queries on Hive. Configure the execution engine in this area as required.
Note
The fields shown in the following figure appear ONLY if you select the Spark option.
To configure execution engine properties for the cluster:
Enter the details as follows:
Area | Parameter/Field | Comments/Description |
---|---|---|
 | Execution Engine Name | Select the execution engine from the list. |
 | Deployment Mode | Select yarn-cluster if your Spark deployment mode is YARN cluster; otherwise, select yarn-client. |
Node and Authentication | Spark Source Node | Select Same As NameNode if Spark runs on the namenode. Otherwise, select Other Node. |
 | Spark Node Host Name | If you selected Other Node above, enter the IP address of that node here. |
 | Use different user account for accessing Spark Node | Select this check box to use a user account other than the Hadoop Node authentication user for accessing the Spark node. |
Paths and Version | Spark Version | Select the Spark version from the list. |
 | Spark Home Directory | Enter the Spark home directory. |
 | Spark Library Path | Enter the path of the Spark library files. Refer to the Appendix for details. |
 | Spark Configuration Path | Enter the path of the Spark configuration files. Refer to the Appendix for details. |
Spark Parameters | Spark Parameters | Use this to add custom Spark parameters for your cluster. |
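In Apache Spark 2.0 and later, the yarn-cluster and yarn-client master values are deprecated in favor of --master yarn combined with --deploy-mode cluster or client. In cluster mode the driver runs inside the YARN ApplicationMaster; in client mode it runs in the process that submitted the job. The hypothetical Python helper below maps the Deployment Mode option to the equivalent spark-submit arguments. It only illustrates the two modes and does not describe how Kyvos itself launches Spark jobs.

```python
from typing import List

# Deployment Mode option -> value for spark-submit's --deploy-mode flag.
_DEPLOY_MODES = {
    "yarn-cluster": "cluster",  # driver runs in the YARN ApplicationMaster
    "yarn-client": "client",    # driver runs in the submitting process
}


def spark_submit_args(deployment_mode: str) -> List[str]:
    """Translate a Deployment Mode option into spark-submit arguments."""
    try:
        mode = _DEPLOY_MODES[deployment_mode]
    except KeyError:
        raise ValueError(f"unsupported deployment mode: {deployment_mode!r}") from None
    return ["--master", "yarn", "--deploy-mode", mode]


if __name__ == "__main__":
    print(" ".join(["spark-submit", *spark_submit_args("yarn-cluster")]))
```

Run as a script, this prints spark-submit --master yarn --deploy-mode cluster. The same choice can also be expressed through the spark.submit.deployMode property.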
Click Validate Spark File Paths. The system validates the user authentication and the specified paths.
Click Next. You are directed to the Security page.
Next: Configure Security Properties
Copyright Kyvos, Inc. All rights reserved.