Hadoop Final


Hadoop
Joshua Nester, Garrison Vaughan, Calvin Sauerbier, Jonathan Pingilley, and Adam Albertson
Operating System and Network Configuration
• The first thing we did was install Ubuntu 10.04 LTS on every machine.
• After all of the nodes were up and running Ubuntu, we connected all of them to the switch and gave each one a static IP address; a sketch of the network configuration follows.
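Ubuntu 10.04 configured wired interfaces through /etc/network/interfaces, so a minimal sketch of the static addressing on one node would look like this (the eth0 name and the 192.168.1.x addresses are placeholders, not our actual values):

    # /etc/network/interfaces (per node; addresses are illustrative)
    auto eth0
    iface eth0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        gateway 192.168.1.1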
Basic Configuration - Java
• After we got all of the machines connected to the switch, we had to install some of the packages we needed for Hadoop.
• The Ubuntu installation did not come with Java, so we installed the JDK on each machine and then configured the PATH variable so each machine could find the Java binaries; a sketch follows.
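A minimal sketch of that Java setup, assuming the OpenJDK 6 packages that shipped with Ubuntu 10.04 (the JVM path is the OpenJDK default; adjust for other JDKs):

    # Install the JDK on each machine
    sudo apt-get install openjdk-6-jdk

    # Appended to each machine's ~/.bashrc so the Java binaries are on PATH
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
    export PATH=$JAVA_HOME/bin:$PATH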
Basic Configuration - Hadoop
• After getting the Java Development Kit installed, we installed the Hadoop files, set the HADOOP_HOME environment variable, and added Hadoop's bin directory to the PATH.
• We then had to create a Hadoop user account and group on each node and change the ownership of the Hadoop files over to that new user, roughly as sketched below.
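A sketch of the account and environment setup on each node (the hduser name and the /usr/local/hadoop install path are assumptions for illustration):

    # Dedicated group and user for the Hadoop daemons
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hduser

    # Hand the unpacked Hadoop tree over to that account
    sudo chown -R hduser:hadoop /usr/local/hadoop

    # Appended to the Hadoop user's ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$HADOOP_HOME/bin:$PATH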
Basic Configuration - SSH
• After setting up the Hadoop accounts on each node, we had to set up the authorized_keys for the master node so it could shell into the Hadoop accounts on the other nodes; the key distribution is sketched below.
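A sketch of that key distribution, run as the Hadoop user on the master (host and user names are placeholders):

    # Generate a passwordless key pair on the master
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    # Authorize the master against itself and against each slave
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    ssh-copy-id hduser@slave1    # repeat for each slave node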
File System Configuration
• On each node, we had to configure the XML files that were used for the distributed file system configuration.
• After setting up the DFS configuration, we had to format the namenode (master node).
• Once all configuration was done, we ran the distributed file system and MapReduce startup scripts, which got the datanodes and tasktrackers running on all of the slaves; the configuration and startup are sketched below.
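The core of that configuration, sketched for a 0.20-era Hadoop release (the master host name, ports, and replication factor here are illustrative, not our exact values):

    <!-- conf/core-site.xml: where the default file system lives -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>

    <!-- conf/mapred-site.xml: where the JobTracker listens -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
    </property>

    <!-- conf/hdfs-site.xml: how many copies of each block to keep -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

Then, on the master node:

    hadoop namenode -format    # one-time format of the namenode
    start-dfs.sh               # namenode on the master, datanodes on the slaves
    start-mapred.sh            # jobtracker on the master, tasktrackers on the slaves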
Test Run
• For our test run, we gave Hadoop seven different books to run against the word counting program provided with the installation (the command is sketched after this list).
• The first time we ran the test, the cluster successfully mapped all of the work, but failed to reduce.
• The problem ended up being caused by an error in the /etc/hosts configuration.
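The run itself would have been along these lines (the HDFS paths and the examples jar name vary by release; these are placeholders):

    # Copy the seven plain-text books into HDFS
    hadoop fs -put ~/books /user/hduser/books

    # Run the word count example bundled with the installation
    hadoop jar $HADOOP_HOME/hadoop-*-examples.jar wordcount \
        /user/hduser/books /user/hduser/books-output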
Test Run
• When the node running the reducer went to fetch the output of its own map tasks, it resolved its own host name to find the address of the task tracker it was running.
• What we did not realize was that the nodes were referencing themselves through an entry in /etc/hosts, set up by the Ubuntu installation, which pointed to 127.0.1.1 (nodeName-desktop).
• We changed the IP of this entry, on each node, to that specific node's static IP address, as shown below. This resolved the fetch failure issue we were having with the maps.
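In /etc/hosts terms, the change on each node amounted to the following (the host name and static address are illustrative):

    # Before: the installer-generated entry the node resolved itself to
    127.0.0.1    localhost
    127.0.1.1    node1-desktop

    # After: the node's name now maps to its static LAN address
    127.0.0.1    localhost
    192.168.1.11 node1-desktop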
Test Run
• Once the problem was resolved, our Hadoop cluster successfully counted the occurrence of each word in the input files.