These are some short notes on installing Cloudera CDH4 on Ubuntu 12.04 LTS running as a guest OS on Oracle’s VirtualBox. For those unfamiliar with it, CDH is Cloudera’s 100% open source Hadoop distribution. This is not a complete tutorial, but rather a set of tips to be used in conjunction with the product’s documentation to make installing CDH on Ubuntu easier.
I needed to find a simple way to implement a text search on a table without using MySQL’s fulltext search functionality. I am currently limited to using a version of MySQL that only supports fulltext search on the MyISAM storage engine. My tables use the InnoDB storage engine, hence the problem. This post will show a quick and dirty way to search multiple database fields with multiple search terms using only SQL.
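As a rough illustration of the idea, here is a minimal sketch of one way to do it: build a `WHERE` clause in which every search term must match at least one of the fields via `LIKE`. This is an assumption about the general shape of such a query, not necessarily the exact SQL from the post. The `articles` table, its columns, and the helper names are hypothetical, and SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3


def build_search_query(table, fields, terms):
    """Return (sql, params) where every term must appear in at least one field.

    table and fields are interpolated into the SQL text, so they must be
    trusted identifiers; only the search terms are bound as parameters.
    """
    if not fields or not terms:
        raise ValueError("need at least one field and one search term")
    # One group per term: (field1 LIKE ? OR field2 LIKE ? ...)
    per_term = "(" + " OR ".join(f"{field} LIKE ?" for field in fields) + ")"
    # All term groups are ANDed together.
    where = " AND ".join(per_term for _ in terms)
    sql = f"SELECT id FROM {table} WHERE {where} ORDER BY id"
    # Each term is repeated once per field, in placeholder order.
    params = [f"%{term}%" for term in terms for _ in fields]
    return sql, params


def search(conn, table, fields, terms):
    """Run the generated query and return the matching ids."""
    sql, params = build_search_query(table, fields, terms)
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Installing Hadoop", "Notes on Ubuntu and VirtualBox"),
        (2, "MySQL search", "Searching InnoDB tables without fulltext"),
        (3, "Ubuntu tips", "Running MySQL on Ubuntu"),
    ],
)

results = search(conn, "articles", ["title", "body"], ["mysql", "ubuntu"])
print(results)
```

A leading-wildcard `LIKE` generally cannot use an ordinary index, so this approach scans the table and is best suited to small tables. The sketch also does not escape `%` or `_` inside the search terms, and whether `LIKE` is case-insensitive in MySQL depends on the column's collation.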
The Dynamic Data System (DDS) makes loading a data warehouse much easier and faster. It loads standard and partitioned tables using SSIS packages that are created on the fly from metadata, which eliminates the need to maintain complex SSIS packages. Only basic T-SQL skills are needed. The Dynamic Data System is available on CodePlex.
My current data warehouse project uses SQL Server 2008 R2. Out of the box, I found a couple of minor configuration issues that prevented me from getting started with DDS. The following notes, in addition to the documentation provided with the product, should get you on your way.