Thursday, October 4, 2012

PHP & MySQL - Search Multiple Fields and Terms

Background

I needed to find a simple way to implement a text search on a table without using MySQL's fulltext search functionality. I am currently limited to using a version of MySQL that only supports fulltext search on the MyISAM storage engine. My tables use the InnoDB storage engine, hence the problem. This post will show a quick and dirty way to search multiple database fields with multiple search terms using only SQL.

Requirements

  • Search for multiple terms across multiple database fields.
  • Specify which table fields are to be searched.
  • Search terms to be space delimited.
  • Find partial and exact terms.
  • Find several terms in a specific order preserving whitespace.
  • Identify and work with dates supplied in common formats.
  • Identify and work with numeric search terms.
  • Provide a few hints to control search behavior.
  • Generate SQL suitable for use in a WHERE clause.
  • Table independent for re-usability.

Note: High performance was not a consideration as the number of records searched would be relatively small.

Dates

While strings are straight forward to compare against, dates posed a bit of a problem. I needed to accept a few common date formats, check if the date was valid, and convert the search format to MySQL's internal date format to make a proper comparison.

The first step was to create an is_date() function to determine if a search term was valid date. The function below is straight forward. We break down the string into the appropriate date components and use the PHP checkdate() function to determine if the date is valid. The function is shown below.

    /**
     * Evaluates if a string is a valid date
     *
     * @param $str string - The date string to evaluate
     *
     * @return bool TRUE if the string is a valid date, FALSE otherwise 
     */
    public function is_date($str) { 
        $stamp = strtotime(str_replace(array('-', '.'), '/', $str)); 

        if (!is_numeric($stamp)) 
            return FALSE; 
        
        $month = date( 'm', $stamp ); 
        $day   = date( 'd', $stamp ); 
        $year  = date( 'Y', $stamp ); 
        
        if (checkdate($month, $day, $year)) 
            return TRUE; 
        
        return FALSE; 
    }

Search Hints

To meet some of the requirements, I needed to add a few query hints that a user could add to their query to help get the expected results.

To find an exact term match in a field, prefix the term with a plus '+'. The comparison will be performed using equality.

To find a series of terms in a specific order while preserving whitespace, surround the terms with double quotes '"'. The comparison will be performed using the LIKE operator.

The Code

The filter function is very simple. Pass a string of space delimited terms and an array of fields to be searched. Two nested loops create a comparison for each term for every field passed in the array. Each comparison is linked using an OR. Each of resulting OR'd comparison sets are linked together by an AND.

   /**
     * Creates a SQL filter expression suitable for use in a WHERE clause.
     *
     * Terms will be parsed by whitespace.
     *
     * Terms that are identified as valid dates will be converted to the database date
     * format and compared with equality e.g. =.
     *
     * Terms identified as valid numbers will be compared with equality e.g. =. It
     * should be noted that if a comparison is made against a non numeric field, implicit
     * conversions may occur that affect performance.
     *
     * Terms prefixed with a + will be compared with equality e.g. =.
     *
     * Terms surrounded by double quotes will be compared with LIKE in the order submitted with all
     * whitespace within the quotes preserved.
     *
     * All other terms will be compared with a LIKE without any leading or
     * trailing whitespace.
     *
     * @param $searchTerms   string Space separated list of search terms
     * @param $searchFields  array  Array of database fields to be searched
     *
     * @return string The filter expression in SQL format
     */
    protected function create_filter_expression($searchTerms, $searchFields) {
        // Setup the filter
        $filter = "";
 
        if(!empty($searchTerms) && count($searchFields) > 0) {
            // Explode search words into an array
            preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $searchTerms, $arraySearch);
            $arraySearch = $arraySearch[0];
 
            // Set the counters    
            $countFields = count($searchFields);
            $countSearch = count($arraySearch);        
            $a = 0;
            $b = 0;
          
            $filter = "(";
 
            // Loop through the terms
            while ($a < $countSearch)
            {
                // Loop through the fields
                while ($b < $countFields) {
                    // Date term
                    if (preg_match('/[^A-Za-z\"]/', $arraySearch[$a]) && $this->is_date($arraySearch[$a])) {
                        $filter .= "$searchFields[$b] = '" . date(self::MYSQL_DATE_FORMAT,
                            strtotime(str_replace(array('-', '.'), '/',
                                $arraySearch[$a]))) . "'";
                    }
                    // Numeric term
                    elseif (is_numeric($arraySearch[$a])) {
                        $filter .= "$searchFields[$b] = $arraySearch[$a]";
                    }
                    // Specific term - starts with a +
                    elseif (strchr($arraySearch[$a], '+')) {
                        $filter .= "$searchFields[$b] = '" . str_replace('+', '', $arraySearch[$a]) . "'";
                    }
                    // Default term
                    else {
                        $filter .= "$searchFields[$b] LIKE '%%" . str_replace('"', '', $arraySearch[$a]) . "%%'";
                    }
 
                    // Increment the field counter
                    $b++;
 
                    if ($b < $countFields) {
                        $filter .= " OR ";
                    }
                }
 
                // Increment all loop counters
                $b = 0;
                $a++;
          
                if ($a < $countSearch) {
                    $filter .= ") AND (";
                }
            }
 
            // Close everything up
            $filter .= ")";
        }
      
        return $filter;    
    }

Usage

The following will create an appropriate series of comparisons for the terms "Max Payne +male 3/19/1956" on the specified fields:

$searchTerms = "Max Payne +male 3/19/1956";
$searchFields = array(0 => 'firstname', 1 => 'lastname', 2 => 'date_of_birth', 3 => 'gender');

$where = $this->create_filter_expression($searchTerms, $searchFields);
Generated filter code:
(firstname LIKE '%Max%' OR lastname LIKE '%Max%' OR gender LIKE '%Max%')
AND (firstname LIKE '%Payne%' OR lastname LIKE '%Payne%' OR gender LIKE '%Payne%')
AND (firstname = 'male' OR lastname = 'male' OR gender = 'male')
AND (firstname = '1956-03-19' OR lastname = '1956-03-19' OR gender = '1956-03-19') 

Conclusion

The are probably a number of ways to improve upon what has been presented. The solution has been meeting my needs without any problems to date. The reader should remember that any dynamically generated SQL needs to be checked for the possibility of injection attacks. Comments are always welcome.