FAQ search result duplication when using multiple categories


Moderator: Thorsten

dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

FAQ search result duplication when using multiple categories

Post by dajoker » Fri May 28, 2010 9:32 pm

I'm putting this here since I'm guessing it is not a bug, but it's not helping either. If I search through PMF and one of the FAQ entries returned is associated with multiple categories, then I get that FAQ entry multiple times, once per category. Would it be possible to have an option to return just one category's entry, or maybe to show in the results that a FAQ entry is associated with multiple categories?

I suppose the problem would go away if the user searched within just one category, but that requires more clicking, may involve knowing something about the system that isn't available, and may unnecessarily limit the results. Anyway, this is probably by design, but it'd be nice if there were an option to show only unique FAQ entries (faqdata rows) rather than returning each result as many times as it appears in faqcategoryrelations.

Thanks.

Thorsten
Posts: 15091
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Re: FAQ search result duplication when using multiple categories

Post by Thorsten » Sat May 29, 2010 8:33 am

Hi,

I've known about this issue for years, and no one has been able to give me a proper solution for selecting the FAQ entry from just one category. When an entry is listed in multiple categories, which category's entry is the most important?

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer

dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

Re: FAQ search result duplication when using multiple categories

Post by dajoker » Sat May 29, 2010 2:02 pm

Yes, this is a tricky one. Since the FAQ is exactly the same regardless of the category chosen, perhaps it does not matter which category is shown (the first one returned, for example). Or perhaps the duplication could be detected when ordering the results or formatting them for output, and the category could then be replaced with something like 'Multiple Categories'.
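
A minimal sketch of that idea, assuming the search rows arrive as one row per (FAQ id, category) pair. The row layout, the collapseByFaqId name, and the sample data are all hypothetical and not taken from phpMyFAQ:

```php
<?php
// Hypothetical rows shaped like a faqdata/faqcategoryrelations join:
// one row per (FAQ id, category) pair.
$rows = array(
    array('id' => 7, 'category' => 'Install', 'question' => 'How do I upgrade?'),
    array('id' => 7, 'category' => 'Maintenance', 'question' => 'How do I upgrade?'),
    array('id' => 9, 'category' => 'Install', 'question' => 'Which PHP version?'),
);

/**
 * Collapses rows to one entry per FAQ id, keeping first-seen order.
 * An entry seen more than once gets its category replaced by $multiLabel.
 */
function collapseByFaqId(array $rows, $multiLabel = 'Multiple Categories')
{
    $unique = array();
    foreach ($rows as $row) {
        $id = $row['id'];
        if (!isset($unique[$id])) {
            // First time this FAQ shows up: keep the row as-is.
            $unique[$id] = $row;
            $unique[$id]['categories'] = array($row['category']);
        } else {
            // Seen before: remember the extra category and relabel.
            $unique[$id]['categories'][] = $row['category'];
            $unique[$id]['category'] = $multiLabel;
        }
    }
    return array_values($unique);
}

$collapsed = collapseByFaqId($rows);
foreach ($collapsed as $faq) {
    echo $faq['id'], ': ', $faq['question'], ' [', $faq['category'], "]\n";
}
```

Keeping the full list in a separate 'categories' key means the display could still show every category for an entry if that turns out to be more useful than a generic label.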

This is probably something too big to change in 2.6.x but it'd be nice to be able to work through it at some point. As it stands I do not think a user benefits from having duplicates at all (they're literally the exact same data, right?) except that they can see which categories return something for their item.

Thanks.

dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

Re: FAQ search result duplication when using multiple categories

Post by dajoker » Thu Jun 10, 2010 12:00 am

This change removes duplicate FAQs from search results when a FAQ is associated with multiple categories.
It takes several changes, and there is more to do to make it work for all possible databases (plus testing), but it seems to work without completely rewriting the searchEngine function.

First, enable it in the database (use false, or nothing at all, to disable the feature in all environments):
INSERT INTO faqconfig (config_name, config_value) VALUES ('records.removemulticategorydupes', 'true');

Next, while working through this, I found I needed a way to reset the pointer of the $result resource for a database query. That is easy in PHP, and to keep it modular for PMF I added this:
<quote file='./phpmyfaq/inc/PMF_DB/Driver.php' version='2.6.5'>
25a26,38
> * Resets the result pointer.
> *
> * Resets the pointer within the query result to
> * a specified location, or to the beginning if
> * nothing is specified.
> *
> * @param resource $result
> * @param integer $pointer
> * @return boolean
> */
> public function result_seek($result, $pointer);
>
> /**
</quote>

That method also needs to be defined in Pgsql.php, along with the files for every other RDBMS. The Pgsql version is shown below:
<quote file='./phpmyfaq/inc/PMF_DB/Pgsql.php' version='2.6.5'>
108a109,126
> * Resets the result pointer.
> *
> * Resets the pointer within the query result to
> * a specified location, or to the beginning if
> * nothing is specified.
> *
> * @param resource $result
> * @param integer $pointer
> * @return boolean
> * @author Aaron Burgemeister <dajoker@gmail.com>
> * @since 2010-06-09
> */
> public function result_seek($result, $pointer = 0)
> {
>     return pg_result_seek($result, $pointer);
> }
>
> /**
</quote>
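
To try the seek behaviour without a database, here is a rough array-backed stand-in. The SeekableDriver interface, ArrayResult, and ArrayDriver are hypothetical names invented for this sketch; only the fetch_object and result_seek method names mirror the patch above:

```php
<?php
// Hypothetical, cut-down interface: only the two methods this sketch needs.
interface SeekableDriver
{
    public function fetch_object($result);
    public function result_seek($result, $pointer = 0);
}

// Stand-in for a query result: the rows plus a cursor position.
class ArrayResult
{
    public $rows;
    public $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }
}

class ArrayDriver implements SeekableDriver
{
    // Returns the next row as an object, or false once the rows run out.
    public function fetch_object($result)
    {
        if ($result->position >= count($result->rows)) {
            return false;
        }
        return (object) $result->rows[$result->position++];
    }

    // Moves the cursor to $pointer; returns false if it is out of range.
    public function result_seek($result, $pointer = 0)
    {
        if ($pointer < 0 || $pointer >= count($result->rows)) {
            return false;
        }
        $result->position = $pointer;
        return true;
    }
}

$db = new ArrayDriver();
$result = new ArrayResult(array(array('id' => 1), array('id' => 2)));

// First pass consumes every row.
$firstPass = 0;
while ($row = $db->fetch_object($result)) {
    $firstPass++;
}

// Rewind and read again without building a new result.
$db->result_seek($result, 0);
$again = $db->fetch_object($result);
```

A real driver would delegate to the native function for its extension, as the Pgsql version does with pg_result_seek.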

Beyond removing the category duplicates discussed here, I think the result_seek method could also slightly reduce the performance hit on any page after the first. The current logic in the searchEngine function (./phpmyfaq/inc/functions.php) loops through every result that precedes the ones wanted for the page, which a single call to the seek function could replace, saving some cycles. The current code is shown below:
<code>
if ($counter <= $first) {
    continue;
}
</code>
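
A small comparison of the two approaches, using SPL's ArrayIterator as a stand-in for a database result. The sample rows, page size, and variable names other than $counter and $first are assumptions made for this sketch:

```php
<?php
// Hypothetical result set of ten rows.
$rows = array();
for ($i = 1; $i <= 10; $i++) {
    $rows[] = array('id' => $i);
}

$perPage = 3;
$page = 3;                        // 1-based page number
$first = ($page - 1) * $perPage;  // number of rows before the wanted page

// Loop-and-skip, in the spirit of the current code: every earlier row
// is still visited, just not used.
$skipped = array();
$counter = 0;
foreach ($rows as $row) {
    $counter++;
    if ($counter <= $first) {
        continue;
    }
    if (count($skipped) >= $perPage) {
        break;
    }
    $skipped[] = $row['id'];
}

// Seek: jump straight to the first wanted row.
$it = new ArrayIterator($rows);
$sought = array();
if ($first < count($it)) {
    $it->seek($first);
    while ($it->valid() && count($sought) < $perPage) {
        $current = $it->current();
        $sought[] = $current['id'];
        $it->next();
    }
}

echo implode(',', $skipped), "\n";
echo implode(',', $sought), "\n";
```

One caveat: once duplicates are being dropped, the raw row offset no longer matches the offset in unique FAQs, so a plain seek to $first would only be safe when deduplication is turned off or the offset is worked out from the unique entries.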

Finally, it is time to update ./phpmyfaq/inc/functions.php to find the unique FAQs and display only the first link for each. Getting it to work was a bit less effort than I originally expected, but it took a bit more effort to keep the page and FAQ counts from being completely off, because the total number of FAQs is calculated at different points. I'm sure this can be optimized somewhat, though I cannot say with certainty to what degree.
<quote file='./phpmyfaq/inc/functions.php' version='2.6.5'>
552a553
> $dupeFAQs = array();
567a569,585
> //Get a new $num based on non-duplicate entries. Rewriting this function with some small changes may prevent
> //this from being necessary which would be a good thing.
> if ((@$faqconfig->get('records.removemulticategorydupes') != '') && ($faqconfig->get('records.removemulticategorydupes'))) {
>     while ($row = $db->fetch_object($result)) {
>         if (!isset($dupeFAQs[$row->id])) {
>             $dupeFAQs[$row->id] = 1;
>         }
>         else {
>             ++$dupeFAQs[$row->id];
>             continue;
>         }
>     }
>     $num = count(array_keys($dupeFAQs));
>     $dupeFAQs = array();
>     $db->result_seek($result, 0);
> }
>
590a609,620
> //Check for this FAQ's presence previously (probably associated with a different category)
> //and, if configured to do so, skip it.
> if ((@$faqconfig->get('records.removemulticategorydupes') != '') && ($faqconfig->get('records.removemulticategorydupes'))) {
>     if (!isset($dupeFAQs[$row->id])) {
>         $dupeFAQs[$row->id] = 1;
>     }
>     else {
>         ++$dupeFAQs[$row->id];
>         continue;
>     }
> }
>
</quote>

The first change is just the initialization of an array to be used later. The second section sits before the while loop that actually generates all of the links and FAQ summaries. Its only job is to count the non-duplicated FAQs, so that the count at the top of the results and the page links (pagination) come out right and the end user does not see empty pages or "incorrect" counts.

It checks that the config option is set and then loops through the results, building a quick associative array keyed by FAQ ID. (I could just as easily have used a counter, but I already had the code from the next section, which did exactly what I wanted, and this also records how often each FAQ appears in case that is useful later in life.) Once done, it sets $num to the number of unique FAQs, clears out the array, and moves the result pointer back to the first row, so the database is not hit again.

The last section above is actually within the while loop and prevents duplicates from being shown by skipping the rest of the code in the loop when a duplicate is detected.
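
The two passes described above can be tried in isolation with something like the following. This is only a sketch: ArrayIterator stands in for the database result, rewind() plays the role of result_seek($result, 0), and $removeDupes stands in for the config lookup; the sample rows are made up.

```php
<?php
// Hypothetical joined rows: FAQs 7 and 9 each appear in two categories.
$result = new ArrayIterator(array(
    array('id' => 7, 'category' => 'Install'),
    array('id' => 7, 'category' => 'Maintenance'),
    array('id' => 9, 'category' => 'Install'),
    array('id' => 12, 'category' => 'Usage'),
    array('id' => 9, 'category' => 'Usage'),
));
$removeDupes = true; // stands in for records.removemulticategorydupes

// Pass 1: count unique FAQ ids so the total and the pagination are right.
$dupeFAQs = array();
$num = count($result);
if ($removeDupes) {
    while ($result->valid()) {
        $row = $result->current();
        $result->next();
        if (!isset($dupeFAQs[$row['id']])) {
            $dupeFAQs[$row['id']] = 1;
        } else {
            ++$dupeFAQs[$row['id']];
        }
    }
    $num = count($dupeFAQs);
    $dupeFAQs = array();
    $result->rewind(); // back to the first row, no second query
}

// Pass 2: build the output, skipping any FAQ id already shown.
$output = array();
while ($result->valid()) {
    $row = $result->current();
    $result->next();
    if ($removeDupes) {
        if (isset($dupeFAQs[$row['id']])) {
            ++$dupeFAQs[$row['id']];
            continue;
        }
        $dupeFAQs[$row['id']] = 1;
    }
    $output[] = $row['id'] . ' (' . $row['category'] . ')';
}

echo $num, " unique FAQs\n", implode("\n", $output), "\n";
```

With the flag off, $num stays at the raw row count and every row is listed, which matches the behaviour before the change.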

Another option would be to hold off on counting the total FAQs and generating the page links until after the FAQs are deduplicated, using a couple more $output-like variables. This could be added later without touching the database side of things, so it may be worth doing for efficiency of processing, though maybe not for clarity of code.
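
One way that single-pass variant might look, as a sketch only. It buffers the unique rows in an array rather than using extra $output-like string variables, and uniqueFaqs, $perPage, and the sample rows are hypothetical:

```php
<?php
// Hypothetical joined rows, one per (FAQ id, category) pair.
$rows = array(
    array('id' => 7, 'category' => 'Install'),
    array('id' => 7, 'category' => 'Maintenance'),
    array('id' => 9, 'category' => 'Install'),
    array('id' => 12, 'category' => 'Usage'),
    array('id' => 9, 'category' => 'Usage'),
);

/**
 * Returns the first row seen for each FAQ id, in original order.
 */
function uniqueFaqs(array $rows)
{
    $seen = array();
    $unique = array();
    foreach ($rows as $row) {
        if (isset($seen[$row['id']])) {
            continue; // same FAQ under another category
        }
        $seen[$row['id']] = true;
        $unique[] = $row;
    }
    return $unique;
}

// Deduplicate once, then derive the count and the pages from the result.
$unique = uniqueFaqs($rows);
$num = count($unique);
$perPage = 2;
$pages = (int) ceil($num / $perPage);

// Rows for page 2 (pages are 1-based here).
$page = 2;
$pageRows = array_slice($unique, ($page - 1) * $perPage, $perPage);

echo $num, ' unique FAQs on ', $pages, " pages\n";
```

The trade-off is memory: the whole unique result set is held in PHP at once, whereas the two-pass version only keeps the ID counts.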

I'm mailing the completed files for clarity. As mentioned, this needs to be implemented in each RDBMS's class using things like mysql_data_seek, sybase_data_seek, mssql_data_seek, and seek (for SQLite, though that may be the wrong function). I'm not sure what to use for DB2, OCI8 or Firebird, but maybe it doesn't matter for those databases.

Thanks.

Thorsten
Posts: 15091
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Re: FAQ search result duplication when using multiple categories

Post by Thorsten » Fri Jun 11, 2010 12:24 pm

Hi,

Thank you very much. We will review the code and test it on other databases, and if everything is okay we'll include it in phpMyFAQ.

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer
