phpMyFaq is a tremendous product, keep up the awesome work!
As part of our internal testing, our test group noted that the search results (both advanced and instant) were not ranked or weighted. For an FAQ or knowledge base with a non-trivial number of solutions, this can quickly render the search ineffective, because the most relevant entry can end up anywhere in the list. We tackled the problem internally and coded a solution that works well for us. We are sharing it because it may work well for others and may also prove to be a good fit for integration into the current trunk.
Goals:
* Rank / weight search results based on Keywords, Title, and Answer text
* Support multiple keyword / search term ranking (emulate AND matching with weighting)
* Use existing methodologies already in place within phpMyFaq
* Produce as little performance impact as possible
Step#1: Include the database keywords column in the search query output.
Modify ./inc/PMF/Search.php at line 135 (version 2.8.22) so that the query returns the keywords column:
Code:
$search->setTable($fdTable)
    ->setResultColumns(array(
        $fdTable . '.id AS id',
        $fdTable . '.lang AS lang',
        $fdTable . '.solution_id AS solution_id',
        $fcrTable . '.category_id AS category_id',
        $fdTable . '.thema AS question',
        $fdTable . '.content AS answer',
        $fdTable . '.keywords AS keywords')) // <--- New line of code added here
    ->setJoinedTable($fcrTable)
    ->setJoinedColumns(array(
        $fdTable . '.id = ' . $fcrTable . '.record_id',
        $fdTable . '.lang = ' . $fcrTable . '.record_lang'))
    ->setConditions($condition);
Modify ./inc/PMF/Search/Resultset.php at line 109 (version 2.8.22) by replacing the entire reviewResultset function with the following code:
Code:
public function reviewResultset(Array $resultset, $searchterm = '')
{
    $this->setResultset($resultset);

    $resultRanks      = array();
    $processedResults = array();
    $searchTerms      = array();
    $searchTermCount  = 0;

    // Purely numeric search terms are not ranked
    if (!is_numeric($searchterm)) {
        // Split on whitespace and drop empty terms; an empty needle
        // would make substr_count() fail
        $searchTerms = array_values(array_filter(
            PMF_String::preg_split("/\s+/", strtoupper($searchterm)),
            'strlen'
        ));
        $searchTermCount = count($searchTerms);
    }

    $currentUserId = $this->user->getUserId();
    if ('medium' === $this->_config->get('security.permLevel')) {
        $currentGroupIds = $this->user->perm->getUserGroups($currentUserId);
    } else {
        $currentGroupIds = array(-1);
    }

    foreach ($this->rawResultset as $key => $result) {
        $permission = false;

        // check permissions for groups
        if ('medium' === $this->_config->get('security.permLevel')) {
            $groupPermission = $this->faq->getPermission('group', $result->id);
            if (count($groupPermission) && in_array($groupPermission[0], $currentGroupIds)) {
                $permission = true;
            }
        }

        // check permission for user
        if ($permission || 'basic' === $this->_config->get('security.permLevel')) {
            $userPermission = $this->faq->getPermission('user', $result->id);
            if (in_array(-1, $userPermission) || in_array($currentUserId, $userPermission)) {
                $permission = true;
            } else {
                $permission = false;
            }
        }

        if ($permission) {
            if (isset($processedResults[$result->id])) {
                continue; // Already processed (duplicate), skip
            }

            $rankValue        = 0;
            $matchedTermCount = 0;
            if ($searchTermCount > 0) {
                foreach ($searchTerms as $term) {
                    // Weights: keywords x3, question (title) x2, answer x0.25
                    $termRank  = 0;
                    $termRank += PMF_String::substr_count(strtoupper($result->keywords), $term) * 3;
                    $termRank += PMF_String::substr_count(strtoupper($result->question), $term) * 2;
                    $termRank += PMF_String::substr_count(strtoupper($result->answer), $term) * 0.25;
                    if ($termRank > 0) {
                        $matchedTermCount++;
                    }
                    $rankValue += $termRank;
                }
                // Reduce ranking if not all terms matched
                $rankValue = $rankValue * ($matchedTermCount / $searchTermCount);
            }

            $processedResults[$result->id] = 1;
            $resultRanks[$key]             = $rankValue;
        }
    }

    // Final sort (highest rank first) and filter
    arsort($resultRanks);
    foreach ($resultRanks as $key => $rank) {
        $this->reviewedResultset[] = $resultset[$key];
    }

    $this->setNumberOfResults($this->reviewedResultset);
}
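For anyone who wants to try the scoring rule outside of phpMyFAQ first, here is a minimal standalone sketch of the same arithmetic. It is only an illustration: rankFaq() is a hypothetical helper name, it uses PHP's plain substr_count() and strtoupper() instead of the PMF_String wrappers, and the sample texts are made up.

```php
<?php
// Hypothetical standalone helper (not part of phpMyFAQ): scores one FAQ entry
// with the same weights as the patch (keywords x3, question x2, answer x0.25).
function rankFaq(string $searchTerm, string $keywords, string $question, string $answer): float
{
    // Split on whitespace and drop empty terms; substr_count() rejects an empty needle.
    $terms = preg_split('/\s+/', strtoupper(trim($searchTerm)), -1, PREG_SPLIT_NO_EMPTY);
    if (count($terms) === 0) {
        return 0.0;
    }

    $rank    = 0.0;
    $matched = 0;
    foreach ($terms as $term) {
        $termRank = substr_count(strtoupper($keywords), $term) * 3
                  + substr_count(strtoupper($question), $term) * 2
                  + substr_count(strtoupper($answer), $term) * 0.25;
        if ($termRank > 0) {
            $matched++;
        }
        $rank += $termRank;
    }

    // Scale down by the fraction of search terms that matched anywhere.
    return $rank * ($matched / count($terms));
}

// Both terms match: RESET scores 2 + 0.5, PASSWORD scores 3 + 2, factor 2/2.
$full = rankFaq(
    'reset password',
    'password, login',
    'How do I reset my password?',
    'Click the reset link. The reset link expires.'
); // 7.5

// Only PASSWORD matches (2 points in the question), factor 1/2.
$partial = rankFaq(
    'reset password',
    'login',
    'How do I change my password?',
    'Open your profile.'
); // 1.0
```

The second call shows the "emulated AND" from the goals: an entry that matches only one of two terms is still listed, but its score is halved.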
Modify ./ajaxresponse.php at line 93 (version 2.8.22) by replacing the line with the following code:
Code:
$faqSearchResult->reviewResultset($searchResult, $searchString);
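Make the same change at the other call to reviewResultset (the one used by the regular, non-instant search), passing the search term as the second argument: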
$faqSearchResult->reviewResultset($searchResults, $inputSearchTerm);
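For our solution, we weighted keyword matches highest (x3), title matches second highest (x2), and matches in the solution content lowest (x0.25). A result that matches only some of the search terms has its score scaled down by the fraction of terms it matched. Others may want to weight and rank differently!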
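If you want to experiment with different weights, one option is to pass them in as an array instead of hard-coding them. The sketch below is only an illustration under that assumption: scoreFields() and rankAll() are hypothetical helper names, the two sample entries are invented, and plain PHP string functions stand in for the PMF_String wrappers.

```php
<?php
// Hypothetical helper: same scoring idea as the patch, with per-field weights passed in.
// $terms must not contain empty strings (substr_count() rejects an empty needle).
function scoreFields(array $terms, array $fields, array $weights): float
{
    if (count($terms) === 0) {
        return 0.0;
    }
    $total   = 0.0;
    $matched = 0;
    foreach ($terms as $term) {
        $termScore = 0.0;
        foreach ($weights as $field => $weight) {
            $text = strtoupper($fields[$field] ?? '');
            $termScore += substr_count($text, strtoupper($term)) * $weight;
        }
        if ($termScore > 0) {
            $matched++;
        }
        $total += $termScore;
    }
    // Scale down by the fraction of terms that matched.
    return $total * ($matched / count($terms));
}

// Hypothetical helper: returns the entry IDs ordered by descending score.
function rankAll(array $faqs, array $terms, array $weights): array
{
    $ranks = array();
    foreach ($faqs as $id => $fields) {
        $ranks[$id] = scoreFields($terms, $fields, $weights);
    }
    arsort($ranks);
    return array_keys($ranks);
}

$faqs = array(
    'a' => array('keywords' => 'vpn', 'question' => 'Remote access', 'answer' => 'Install the client.'),
    'b' => array('keywords' => '', 'question' => 'VPN setup', 'answer' => 'The vpn client needs a vpn profile.'),
);

// Weights from the post: entry a scores 3, entry b scores 2 + 0.5 = 2.5.
$postOrder = rankAll($faqs, array('vpn'), array('keywords' => 3, 'question' => 2, 'answer' => 0.25));
// array('a', 'b')

// Answer-heavy weights: entry a scores 1, entry b scores 1 + 4 = 5.
$answerOrder = rankAll($faqs, array('vpn'), array('keywords' => 1, 'question' => 1, 'answer' => 2));
// array('b', 'a')
```

With the weights from the post, the entry tagged with the keyword wins; with answer-heavy weights, the entry that mentions the term most often in its text comes first.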
Hope this helps someone else out!