Ranked search results (Includes our coded solution)

Post by ArmgaSys »

Hi again,

phpMyFAQ is a tremendous product, keep up the awesome work!

During internal testing, our test group noted that the search results (both advanced and instant) were not ranked or weighted. For a FAQ or knowledge base with a non-trivial number of solutions, this can quickly render the search ineffective. We tackled the problem internally and coded a solution that works well for us. We are sharing it because it may work well for others and may also be a good fit for integration into the current trunk.

Goals:
* Rank / weight search results based on Keywords, Title, and Answer text
* Support multiple keyword / search term ranking (emulate AND matching with weighting)
* Use existing methodologies already in place within phpMyFAQ
* Produce as little performance impact as possible

Step#1: Include the database keywords column in the search query output.
Modify ./inc/PMF/Search.php at line 135 (version 2.8.22) to return the keywords column:

Code:

        $search->setTable($fdTable)
               ->setResultColumns(array(
                    $fdTable . '.id AS id',
                    $fdTable . '.lang AS lang',
                    $fdTable . '.solution_id AS solution_id',
                    $fcrTable . '.category_id AS category_id',
                    $fdTable . '.thema AS question',
                    $fdTable . '.content AS answer',
                    $fdTable . '.keywords AS keywords'))   // <--- New line of code added here
               ->setJoinedTable($fcrTable)
               ->setJoinedColumns(array(
                    $fdTable . '.id = ' . $fcrTable . '.record_id',
                    $fdTable . '.lang = ' . $fcrTable . '.record_lang'))
               ->setConditions($condition);

Step#2: Modify the search resultset handler to weight and rank search results.
Modify ./inc/PMF/Search/Resultset.php at line 109 (version 2.8.22) by replacing the entire reviewResultset function with the following code:

Code:

    public function reviewResultset(Array $resultset, $searchterm = '')
    {
        $this->setResultset($resultset);

        $resultRanks      = array();   // raw result key => rank value
        $processedResults = array();   // FAQ ids already ranked (duplicate filter)
        $searchTerms      = array();
        $searchTermCount  = 0;
        if (!is_numeric($searchterm)) {
            // Split on whitespace and drop empty terms (empty or padded input):
            // substr_count() does not accept an empty needle
            $searchTerms = array_filter(
                PMF_String::preg_split("/\s+/", strtoupper($searchterm)),
                'strlen'
            );
            $searchTermCount = count($searchTerms);
        }

        $currentUserId = $this->user->getUserId();
        if ('medium' === $this->_config->get('security.permLevel')) {
            $currentGroupIds = $this->user->perm->getUserGroups($currentUserId);
        } else {
            $currentGroupIds = array(-1);
        }

        foreach ($this->rawResultset as $key => $result) {

            $permission = false;
            // check permissions for groups
            if ('medium' === $this->_config->get('security.permLevel')) {
                $groupPermission = $this->faq->getPermission('group', $result->id);
                if (count($groupPermission) && in_array($groupPermission[0], $currentGroupIds)) {
                    $permission = true;
                }
            }
            // check permission for user
            if ($permission || 'basic' === $this->_config->get('security.permLevel')) {
                $userPermission = $this->faq->getPermission('user', $result->id);
                if (in_array(-1, $userPermission) || in_array($this->user->getUserId(), $userPermission)) {
                    $permission = true;
                } else {
                    $permission = false;
                }
            }

            if ($permission) {
                if (isset($processedResults[$result->id])) {
                    continue;   // Already processed (duplicate), skip
                }
                $rankValue        = 0;
                $matchedTermCount = 0;
                if ($searchTermCount > 0) {
                    foreach ($searchTerms as $term) {
                        // Weights: keywords x3, question x2, answer x0.25
                        $termRank  = 0;
                        $termRank += (PMF_String::substr_count(strtoupper($result->keywords), $term) * 3);
                        $termRank += (PMF_String::substr_count(strtoupper($result->question), $term) * 2);
                        $termRank += (PMF_String::substr_count(strtoupper($result->answer), $term) * 0.25);
                        if ($termRank > 0) {
                            $matchedTermCount++;
                        }
                        $rankValue += $termRank;
                    }

                    // Reduce ranking if not all terms matched
                    $rankValue = $rankValue * ($matchedTermCount / $searchTermCount);
                }
                $processedResults[$result->id] = 1;
                $resultRanks[$key] = $rankValue;
            }
        }

        // Final sort and filter: highest rank first, keys preserved
        arsort($resultRanks);
        foreach ($resultRanks as $key => $rank) {
            $this->reviewedResultset[] = $resultset[$key];
        }

        $this->setNumberOfResults($this->reviewedResultset);
    }

Step#3: Modify existing calls to the reviewResultset function to include the user-entered search terms.
Modify ./ajaxresponse.php at line 93 (version 2.8.22) by replacing the line with the following code:

Code:

$faqSearchResult->reviewResultset($searchResult, $searchString);

Modify ./search.php at line 92 (version 2.8.22) by replacing the line with the following code:

Code:

$faqSearchResult->reviewResultset($searchResults, $inputSearchTerm);

Solution Notes:
For our solution, we weighted keywords highest (x3 per match), the title second highest (x2), and the solution content lowest (x0.25). Others may want to weight and rank differently!
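
For anyone who wants to try different weights before patching Resultset.php, here is a minimal standalone sketch of the same scoring idea. The function name rankResult and the sample strings are made up for illustration and are not part of phpMyFAQ. It uses plain substr_count() and strtoupper() rather than PMF_String, so it assumes single-byte text.

```php
<?php
// Hypothetical helper for experimenting with weights; not phpMyFAQ code.
// Weights mirror the patch above: keywords x3, question x2, answer x0.25.
function rankResult(array $terms, $keywords, $question, $answer)
{
    // Drop empty terms: substr_count() does not accept an empty needle
    $terms = array_filter($terms, 'strlen');
    if (count($terms) === 0) {
        return 0.0;
    }

    $rank    = 0.0;
    $matched = 0;
    foreach ($terms as $term) {
        $term      = strtoupper($term);
        $termRank  = substr_count(strtoupper($keywords), $term) * 3;
        $termRank += substr_count(strtoupper($question), $term) * 2;
        $termRank += substr_count(strtoupper($answer), $term) * 0.25;
        if ($termRank > 0) {
            $matched++;
        }
        $rank += $termRank;
    }

    // Scale the total down when only some of the terms matched
    return $rank * ($matched / count($terms));
}

$terms = array('reset', 'password');

// Both terms match: (2) + (3 + 2 + 0.25) = 7.25, scaled by 2/2
echo rankResult($terms, 'password, login', 'How do I reset my password?',
    'Open the login page and click "Forgot password".'), "\n";   // 7.25

// Only "reset" matches, and only in the answer: 0.25 scaled by 1/2
echo rankResult($terms, 'install', 'How do I install updates?',
    'Reset the cache after installing.'), "\n";                  // 0.125
```

The second entry still appears in the results, but far below the first. That is the "emulate AND matching" goal: partial matches are kept and pushed down rather than dropped.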

Hope this helps someone else out!
Last edited by ArmgaSys on Sun May 24, 2015 6:58 pm, edited 1 time in total.

Post by Thorsten »

Hi,

cool solution! With phpMyFAQ 2.9 we'll use the keywords as well, together with the internal scoring of MySQL fulltext search, for better results.

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer

Post by ArmgaSys »

Thanks Thorsten,
The scored MySQL solution sounds great... but, unfortunately, the corporate powers that be in our organization have targeted SQL Server for our install :( So we need a database-agnostic solution. The solution presented made our test group very happy!

Speaking of our test group, they found an error in the duplicate-checking logic of our solution (i.e. duplicates were being allowed through). We have patched the code and updated the original post above.
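
For reference, the patched duplicate handling boils down to the pattern below. This is a simplified sketch with made-up rows, not the actual Resultset.php code. Our assumption is that the same FAQ id can come back more than once from the joined query, for example once per category.

```php
<?php
// Hypothetical rows for illustration; ranks are already computed.
$rows = array(
    array('id' => 7, 'rank' => 1.5),
    array('id' => 3, 'rank' => 4.0),
    array('id' => 7, 'rank' => 1.5),   // duplicate of the first row
    array('id' => 9, 'rank' => 2.25),
);

$seen  = array();   // FAQ ids already accepted
$ranks = array();   // row key => rank
foreach ($rows as $key => $row) {
    if (isset($seen[$row['id']])) {
        continue;   // duplicate id, skip it
    }
    $seen[$row['id']] = true;
    $ranks[$key]      = $row['rank'];
}

// arsort() sorts by value, highest first, and keeps the row keys
arsort($ranks);

$orderedIds = array();
foreach ($ranks as $key => $rank) {
    $orderedIds[] = $rows[$key]['id'];
}

echo implode(', ', $orderedIds), "\n";   // 3, 9, 7
```

Keying the lookup array by id keeps the duplicate check to a single isset() call per row, so the performance cost stays small.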

Enjoy!

Post by Thorsten »

Hi,

thanks! Maybe I can combine your solution with mine.

bye
Thorsten

Post by ArmgaSys »

Absolutely!
If it makes someone else's test group experience easier, combine away!