Sunday, October 21, 2012

Installing Cloudera CDH4 on Ubuntu 12.04 LTS

These are some short notes on installing Cloudera CDH4 on Ubuntu 12.04 LTS running as a guest OS under Oracle's VirtualBox. For those unfamiliar with Cloudera and CDH, CDH is Cloudera’s 100% open source Hadoop distribution. This is not a complete tutorial, but rather a collection of tips to be used alongside the product's documentation to make installing Cloudera on Ubuntu easier.

Prerequisites

Creating the VM

Create a new virtual machine using the new VM wizard and the downloaded Ubuntu ISO. Be sure to use the 64-bit LTS ISO, or Cloudera Manager will not start.
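Once the guest is installed, a quick check like the one below can confirm that it really is a 64-bit system. This is only a sketch: the function name is_64bit_arch is made up for illustration, and it assumes that uname -m reports x86_64 (or amd64) on a 64-bit install.

```shell
# Return success if the given machine string names a 64-bit x86 architecture.
# (is_64bit_arch is a hypothetical helper, not part of any Cloudera tooling.)
is_64bit_arch() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

arch="$(uname -m)"
if is_64bit_arch "$arch"; then
  echo "OK: $arch looks like a 64-bit system"
else
  echo "Warning: $arch does not look like 64-bit x86"
fi
```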

VM Settings:

  • 4GB RAM (minimum)
  • 2 CPUs
  • 128MB Display Memory
  • 25GB Dynamic Disk
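If you prefer the command line to the wizard, the same settings can probably be applied with VBoxManage on the host. The sketch below only prints the commands so you can review them before running anything. The VM name and disk path are placeholders I made up, and the createhd subcommand is the one used by the VirtualBox 4.x releases (newer releases call it createmedium disk). Attaching the disk and the ISO is left to the wizard or the VirtualBox documentation.

```shell
# Print (do not run) VBoxManage commands matching the settings listed above.
# The VM name and disk path passed in are placeholders, not values from this post.
print_vm_commands() {
  name="$1"
  disk="$2"
  printf 'VBoxManage createvm --name "%s" --ostype Ubuntu_64 --register\n' "$name"
  # 4 GB RAM, 2 CPUs, 128 MB display memory
  printf 'VBoxManage modifyvm "%s" --memory 4096 --cpus 2 --vram 128\n' "$name"
  # 25 GB dynamically allocated disk (size is given in MB)
  printf 'VBoxManage createhd --filename "%s" --size 25600 --variant Standard\n' "$disk"
}

print_vm_commands "cdh4-ubuntu" "cdh4-ubuntu.vdi"
```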

When finished, the VM's settings in VirtualBox should look similar to the list above.

Configuring Ubuntu 12.04 LTS

Once you have started the Ubuntu VM and logged in, set a password for root. Cloudera Manager will need the password to install the cluster. Alternatively, you are free to use a passwordless sudo setup.

sudo passwd root
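If you go the passwordless sudo route instead, the usual approach is a NOPASSWD rule in a file under /etc/sudoers.d. The sketch below only generates the rule. The helper name sudoers_line and the file name in the comments are my own placeholders, so adapt them to your account.

```shell
# Print a sudoers rule that lets the given user run any command without a password.
# (sudoers_line is a hypothetical helper for illustration.)
sudoers_line() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Show the rule for the current user.
sudoers_line "$(id -un)"

# To install it, something along these lines should work (file name is a placeholder):
#   sudoers_line "$(id -un)" | sudo tee /etc/sudoers.d/cloudera-install
#   sudo chmod 0440 /etc/sudoers.d/cloudera-install
#   sudo visudo -c -f /etc/sudoers.d/cloudera-install   # syntax check
```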

Next, install the SSH server and client. Cloudera Manager needs them for cluster installation:

sudo apt-get install openssh-client
sudo apt-get install openssh-server
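To double-check that both pieces landed, a small test like this can help. It is a rough sketch: have_cmd is a made-up helper, and the fallback path /usr/sbin assumes the standard Ubuntu location for sshd.

```shell
# Succeed if the named command can be found on PATH.
# (have_cmd is a hypothetical helper for illustration.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# ssh is the client, sshd is the server daemon (normally in /usr/sbin).
for tool in ssh sshd; do
  if have_cmd "$tool" || [ -x "/usr/sbin/$tool" ]; then
    echo "$tool: installed"
  else
    echo "$tool: missing"
  fi
done
```

Running ssh localhost afterwards is a simple way to confirm that the server actually accepts connections.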

Make the following changes to /etc/hosts. Leaving the file unmodified will cause a number of cluster startup errors, such as HBase failing to start or a number of default directories not being created:

127.0.0.1 KRDAVIS-CLOUDERA localhost
#127.0.0.1 localhost
#127.0.1.1 KRDAVIS-CLOUDERA
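The same edit can be scripted. The sketch below reads a hosts file on standard input and writes the modified version to standard output, so nothing is changed until you copy the result into place yourself. The function name rewrite_hosts and the temporary file in the comments are placeholders of mine.

```shell
# Read a hosts file on stdin and write the edited version to stdout:
# one combined 127.0.0.1 line for the given host name, with the original
# 127.0.0.1 and 127.0.1.1 lines commented out.
# (rewrite_hosts is a hypothetical helper for illustration.)
rewrite_hosts() {
  host="$1"
  printf '127.0.0.1 %s localhost\n' "$host"
  sed -e 's/^127\.0\.0\.1[[:space:]]/#&/' \
      -e 's/^127\.0\.1\.1[[:space:]]/#&/'
}

# Demonstration on sample input rather than the real file.
printf '127.0.0.1 localhost\n127.0.1.1 KRDAVIS-CLOUDERA\n' | rewrite_hosts "KRDAVIS-CLOUDERA"

# Against the real file, review the result before copying it back:
#   rewrite_hosts "$(hostname)" < /etc/hosts > /tmp/hosts.new
#   sudo cp /etc/hosts /etc/hosts.bak && sudo cp /tmp/hosts.new /etc/hosts
```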

Install the GNOME session fallback package:

sudo apt-get install gnome-session-fallback

Log out and select "Gnome Classic (no effects)" for your session. This will prevent any weirdness with running Compiz under the VM. You can now log back in.

Install Cloudera CDH4

Start the cluster installation by running the Cloudera Manager installer:

chmod 755 cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin

Follow the instructions and accept any default values. When you are done, you should have a single-node CDH4 cluster running in your VM!
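To see whether the Cloudera management web app is up after the installer finishes, you could poll it from a terminal. This sketch assumes the web app listens on port 7180 of localhost, which is the usual Cloudera Manager default but is not something the installer output above confirms. The function names are made up for illustration, and curl must be installed.

```shell
# Build the URL of the management web app; host and port default to
# localhost and 7180 (an assumed default, adjust if yours differs).
cm_url() {
  printf 'http://%s:%s/\n' "${1:-localhost}" "${2:-7180}"
}

# Poll the URL every 10 seconds, for up to 30 attempts (about 5 minutes).
wait_for_cm() {
  url="$(cm_url "$@")"
  tries=0
  while [ "$tries" -lt 30 ]; do
    # curl exits 0 as soon as the server returns any HTTP response.
    if curl -s -o /dev/null "$url"; then
      echo "Cloudera Manager is answering at $url"
      return 0
    fi
    tries=$((tries + 1))
    sleep 10
  done
  echo "No response from $url after 30 attempts" >&2
  return 1
}

# Usage: wait_for_cm            (localhost:7180)
#        wait_for_cm myhost 7180
```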

Shutting down the cluster and VM

When it comes time to shut down the VM, I found I have fewer problems if I first shut down the cluster by logging into the Cloudera management web app. Select "All Services" from the "Services" menu. For the cluster, select "Stop..." from the Actions dropdown menu. Wait for all services to come to a stop.

After verifying that all cluster services are stopped, shut down the VM by opening a terminal and running the following command:

sudo /sbin/shutdown -h now

Selecting shutdown from the Ubuntu UI appears to only log out of the system without shutting it down. That is a problem for another day.