Cloudera Certified Administrator For Apache Hadoop (CCAH) CCA-500 Exam Questions And Answers

Want to know the features? Want to learn more about the exam experience? Get a pass on the Cloudera CCA-500 (Cloudera Certified Administrator for Apache Hadoop (CCAH)) test with an absolute guarantee of success on your first attempt.

Free CCA-500 Demo Online For Cloudera Certification:

Page: 1 / 5
Total 60 questions Full Exam Access
Question 1
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
My answer: -
Reference answer: B
Reference analysis:

None

Question 2
You are configuring a server running HDFS and MapReduce version 2 (MRv2) on YARN, running Linux. How must you format the underlying file system of each DataNode?
My answer: -
Reference answer: B
Reference analysis:

None

Question 3
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
My answer: -
Reference answer: D
Reference analysis:

None

Question 4
Your cluster’s mapred-site.xml includes the following parameters:
My answer: -
Reference answer: D
Reference analysis:

None

Question 5
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
My answer: -
Reference answer: B
Reference analysis:

Apache Flume is a service for streaming logs into Hadoop.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows; and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.
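As an illustration of the Flume approach described above, here is a minimal single-agent configuration sketch. The agent name `a1`, the log path, and the NameNode hostname are all placeholders, not values from the exam question:

```properties
# Hypothetical agent "a1": tail a web server log and deliver events to HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow the web server access log (exec source is one common choice)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into HDFS, bucketed by date
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/logs/web/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

One such agent would run on (or receive from) each of the 200 web servers; Flume's tunable channel reliability is what makes this more robust than ad-hoc copying.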

Question 6
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?
My answer: -
Reference answer: E
Reference analysis:

None
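For context: with plain (splittable) text files and the default input format, each HDFS block normally becomes one input split, and each split gets one map task. The arithmetic behind the question is then just:

```python
# Each plain-text HDFS block typically yields one input split = one mapper.
files = 10          # input files in the directory (from the question)
blocks_per_file = 3 # HDFS blocks per file (from the question)

mappers = files * blocks_per_file
print(mappers)  # → 30
```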

Question 7
You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
My answer: -
Reference answer: D
Reference analysis:

None

Question 8
Which two features does Kerberos security add to a Hadoop cluster? (Choose two)
My answer: -
Reference answer: AD
Reference analysis:

None

Question 9
You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node. What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
My answer: -
Reference answer: A
Reference analysis:

None
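For context on what an empty dfs.hosts policy means: when dfs.hosts *is* set, only DataNodes listed in the referenced include file may register with the NameNode; when it is absent, as in this question, any DataNode pointed at the NameNode is permitted to connect. A hedged hdfs-site.xml sketch (the include-file path is a placeholder):

```xml
<!-- Hypothetical include-file location; omitting this property entirely
     means no host-based admission filtering is applied. -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.hosts.include</value>
</property>
```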

Question 10
You’re upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128MB for all new files written to the cluster after the upgrade. What should you do?
My answer: -
Reference answer: C
Reference analysis:

None
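For reference, dfs.blocksize is normally a client-side setting; marking it `final` in the cluster-wide hdfs-site.xml is the usual mechanism for *enforcing* it rather than merely defaulting it. A sketch (128 MB = 134217728 bytes):

```xml
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB, in bytes -->
  <final>true</final>      <!-- prevent client-side overrides -->
</property>
```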

Question 11
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?
My answer: -
Reference answer: B
Reference analysis:

None
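The block arithmetic behind the question: a 300 MB file with a 64 MB block size occupies five blocks (four full plus one partial), and after 200 MB have been written, three of those blocks are complete. A quick sketch:

```python
import math

file_mb, block_mb, written_mb = 300, 64, 200

total_blocks = math.ceil(file_mb / block_mb)   # blocks the finished file will occupy
complete_blocks = written_mb // block_mb       # blocks fully written so far
print(total_blocks, complete_blocks)  # → 5 3
```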

Question 12
You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum Storage. What is the purpose of ZooKeeper in such a configuration?
My answer: -
Reference answer: A
Reference analysis:

Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)
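The automatic-failover pieces the question refers to are configured roughly as follows (ZooKeeper hostnames are placeholders). ZooKeeper is used by the ZKFailoverController processes for failure detection and active-NameNode election:

```xml
<!-- hdfs-site.xml: enable automatic failover -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: ZooKeeper quorum used by the ZKFailoverControllers -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```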

Question 13
Table schemas in Hive are:
My answer: -
Reference answer: B
Reference analysis:

None

Question 14
You want a node to swap Hadoop daemon data from RAM to disk only when absolutely necessary. What should you do?
My answer: -
Reference answer: D
Reference analysis:

None
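The standard Linux control for this behavior is the vm.swappiness kernel parameter; values near 0 tell the kernel to avoid swapping application (daemon) memory except under real memory pressure. A sketch of the persistent setting (file location per local convention):

```properties
# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
vm.swappiness = 1
```

The running value can be changed immediately with `sysctl -w vm.swappiness=1`.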

Question 15
Given:
[Exhibit: list of MapReduce jobs with their Job IDs and States]
You want to clean up this list by removing jobs where the State is KILLED. What command do you enter?
My answer: -
Reference answer: B
Reference analysis:

Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/common_mrv2_commands.html

Question 16
A slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?
My answer: -
Reference answer: B
Reference analysis:

None
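The crux of this question is that dfs.datanode.du.reserved is specified in bytes and applies per volume (per configured disk), not per node. A hedged hdfs-site.xml sketch for 100 GB:

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 100 GB in bytes, reserved for non-HDFS use on EACH data volume -->
  <value>107374182400</value>
</property>
```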

Question 17
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?
My answer: -
Reference answer: B
Reference analysis:

None
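The relevant HDFS shell operation here is `hadoop fs -getmerge <hdfs-dir> <local-file>`, which concatenates a directory's part files into a single local file. A purely local Python sketch of that concatenation idea (file names and contents are illustrative, not from the exam):

```python
import os
import tempfile

# Simulate an HDFS job-output directory containing part files
d = tempfile.mkdtemp()
for i, text in enumerate(["alice\n", "bob\n"]):
    with open(os.path.join(d, f"part-r-0000{i}"), "w") as f:
        f.write(text)

# Concatenate the part files in name order, like `hadoop fs -getmerge`
merged = ""
for name in sorted(os.listdir(d)):
    with open(os.path.join(d, name)) as f:
        merged += f.read()

print(merged)
```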

Question 18
Which three basic configuration parameters must you set to migrate your cluster from MapReduce 1 (MRv1) to MapReduce v2 (MRv2)? (Choose three)
My answer: -
Reference answer: AEF
Reference analysis:

None
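Without asserting which lettered options "AEF" correspond to, the configuration properties most commonly cited for an MRv1-to-MRv2 migration are sketched below (the ResourceManager hostname is a placeholder):

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: where the ResourceManager runs -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>

<!-- yarn-site.xml: auxiliary shuffle service MapReduce needs -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```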

Question 19
Each node in your Hadoop cluster, running YARN, has 64GB memory and 24 cores. Your yarn-site.xml has the following configuration:
My answer: -
Reference answer: A
Reference analysis:

None
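This question turns on the NodeManager resource limits set in yarn-site.xml. A hedged sketch for a 64 GB / 24-core node that reserves some memory for the OS and Hadoop daemons (the exact values here are illustrative, not the exam's):

```xml
<!-- Memory the NodeManager may allocate to containers (e.g. 48 GB of 64 GB) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value>
</property>

<!-- Virtual cores available for containers -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>24</value>
</property>
```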

Question 20
Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?
My answer: -
Reference answer: B
Reference analysis:

None

Question 21
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?
My answer: -
Reference answer: D
Reference analysis:

None
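For orientation: YARN container (task) logs are written on each worker node under the NodeManager's configured log directories, and can optionally be aggregated into HDFS after the application finishes. The relevant yarn-site.xml properties, sketched with a placeholder path:

```xml
<!-- Local container-log location on each NodeManager -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>

<!-- Optionally aggregate finished-application logs into HDFS -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```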

Question 22
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?
My answer: -
Reference answer: A
Reference analysis:

None
