These are short notes on installing Cloudera CDH4 on Ubuntu 12.04 LTS running as a guest OS on Oracle’s VirtualBox. For those unfamiliar with Cloudera and CDH, CDH is Cloudera’s 100% open source Hadoop distribution. This is not a complete tutorial, but rather a set of tips to be used in conjunction with the product’s documentation to make installing Cloudera on Ubuntu easier.
- Windows 7 Professional 64-bit:
- Oracle VirtualBox:
- Ubuntu 12.04 LTS 64-bit ISO:
- Cloudera Manager Free Edition:
Creating the VM
Create a new virtual machine using the new VM wizard and the downloaded Ubuntu ISO. It is important to use the 64-bit LTS ISO, or Cloudera Manager will not start. Settings for the VM:
- 4GB RAM (minimum)
- 2 CPUs
- 128MB Display Memory
- 25GB Dynamic Disk
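Once Ubuntu is installed in the guest, the settings above can be sanity-checked from a terminal. The following is a minimal sketch, not part of the Cloudera tooling: the function names are hypothetical, and the 3.5 GB threshold is an assumption chosen because a guest configured with 4GB reports slightly less than that in /proc/meminfo.

```shell
# Assumed thresholds: a 4 GB guest shows a bit less than 4 GB in
# /proc/meminfo, so the memory check uses 3.5 GB (in kB) as its floor.
MIN_MEM_KB=3670016
MIN_CPUS=2

# Succeeds only if the given values meet the minimums listed above.
# Arguments: architecture (as printed by uname -m), memory in kB, CPU count.
meets_minimum() {
    arch=$1
    mem_kb=$2
    cpus=$3
    if [ "$arch" != "x86_64" ]; then
        echo "FAIL: guest is $arch, expected x86_64"
        return 1
    fi
    if [ "$mem_kb" -lt "$MIN_MEM_KB" ]; then
        echo "FAIL: only $mem_kb kB of RAM visible"
        return 1
    fi
    if [ "$cpus" -lt "$MIN_CPUS" ]; then
        echo "FAIL: only $cpus CPU(s) visible"
        return 1
    fi
    echo "OK"
}

# Gathers the real values from the running guest and checks them.
check_this_guest() {
    meets_minimum "$(uname -m)" \
        "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" \
        "$(grep -c '^processor' /proc/cpuinfo)"
}
```

Running check_this_guest inside the VM prints OK or the first requirement that is not met.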
When finished, the VM summary in VirtualBox should reflect the settings above.
Configuring Ubuntu 12.04 LTS
Once you have started the Ubuntu VM and logged in, set a password for root. Cloudera Manager will need the password to install the cluster. Alternatively, you can use a passwordless sudo setup.
sudo passwd root
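For the passwordless sudo alternative, one common approach on Ubuntu is a drop-in file under /etc/sudoers.d. The sketch below only generates the rule; the helper name and the file name cloudera-nopasswd are hypothetical, and the install steps are left as comments so nothing is changed until you run them yourself.

```shell
# Print a sudoers rule granting passwordless sudo to one user.
# Rejects empty names and names outside a conservative character set.
sudoers_rule() {
    user=$1
    case "$user" in
        ""|*[!a-z0-9_-]*)
            echo "invalid user name: $user" >&2
            return 1
            ;;
    esac
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user"
}

# Install sketch (run by hand; the file name is an assumption):
#   sudoers_rule "$USER" | sudo tee /etc/sudoers.d/cloudera-nopasswd >/dev/null
#   sudo chmod 0440 /etc/sudoers.d/cloudera-nopasswd
#   sudo visudo -c -f /etc/sudoers.d/cloudera-nopasswd   # syntax check
```

Keep a root shell open while testing the new rule, since a broken sudoers file can lock you out of sudo.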
Next, you will need to install the SSH server and client. Cloudera Manager needs these for cluster installation:
sudo apt-get install openssh-client
sudo apt-get install openssh-server
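After the packages are installed, it can be worth confirming that the SSH server is actually accepting connections. This is a small sketch under two assumptions: sshd uses the default port 22, and the input has the column layout of netstat -tln. The function name is hypothetical.

```shell
# Reads `netstat -tln` style output on stdin and succeeds if some
# TCP socket is listening on port 22 (assumed default sshd port).
ssh_port_listening() {
    awk '$1 ~ /^tcp/ && $4 ~ /:22$/ && $6 == "LISTEN" { found = 1 }
         END { exit !found }'
}

# Usage on the guest:
#   netstat -tln | ssh_port_listening && echo "sshd is listening"
```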
Make the following changes to /etc/hosts. Leaving the file unmodified causes a number of cluster startup errors, such as HBase failing to start or a number of default directories not being created:
127.0.0.1 KRDAVIS-CLOUDERA localhost
#127.0.0.1 localhost
#127.0.1.1 KRDAVIS-CLOUDERA
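To confirm the edit took effect, you can check that the machine's host name appears on an uncommented 127.0.0.1 line. The following is a rough sketch with a hypothetical function name; it ignores comments and compares names case-insensitively.

```shell
# Succeeds if NAME is listed on an uncommented 127.0.0.1 line of FILE.
# Usage: hostname_on_loopback FILE NAME
hostname_on_loopback() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # strip comments, fields are re-split
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) found = 1
        }
        END { exit !found }
    ' "$hosts_file"
}

# Usage on the guest:
#   hostname_on_loopback /etc/hosts "$(hostname)" || echo "check /etc/hosts"
```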
Install the GNOME session fallback package:
sudo apt-get install gnome-session-fallback
Log out and select “Gnome Classic (no effects)” for your session. This will prevent any weirdness with running Compiz under the VM. You can now log back in.
Install Cloudera CDH4
Start the cluster installation by running the Cloudera Manager installer:
chmod 755 cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
Follow the instructions and accept any default values. When you are done, you should have a single-node CDH4 cluster running in your VM!
Shutting down the cluster and VM
When it comes time to shut down the VM, I have found that I have fewer problems if I first shut down the cluster by logging into the Cloudera management web app. Select “All Services” from the “Services” menu. For the cluster, select “Stop…” from the Actions dropdown menu. Wait for all services to come to a stop.
After verifying that all cluster services are stopped, shut down the VM by opening a terminal and running the following command:
sudo /sbin/shutdown -h now
Selecting shutdown from the Ubuntu UI appears to only log out of the system without shutting it down. That is a problem for another day.