PMF Fulltext Search Addition (postgresql)

You have a suggestion for a future version of phpMyFAQ? Then post it here!

Moderator: Thorsten

Post Reply
dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

PMF Fulltext Search Addition (postgresql)

Post by dajoker » Mon May 31, 2010 9:43 pm

I've been playing around a bit today trying to get queries to come back significantly faster while also getting results that aren't crap. The result follows and is an implementation of fulltext search within PostgreSQL. I do not know that it is the best way (correct code in correct files) but it works. Responses from the database dropped from (measuring from the browser with 'related FAQs' disabled completely on the backend, so keep that in mind) around six seconds (I think it was closer to ten when 'related FAQs' functionality was still on) to almost no time at all. It is literally taking longer to draw the resulting FAQ (once clicking on it) than the time required to get (reportedly) 589 search results from the database.

As a quick summary I added a column to the faqdata table to be the one holding the indexed data (tsvector type), populated it initially with all FAQ data (UPDATE statement), added an index on it, and then a trigger to keep it up-to-date as FAQs changed.

I modified two .php files which have differences below. ./phpmyfaq/inc/functions.php was modified to send non-numeric-only search queries to a new function defined for PostgreSQL in the ./phpmyfaq/inc/PMF_DB/Pgsql.php file. The ./phpmyfaq/inc/PMF_DB/Pgsql.php file simply had a new method added to it to pull off the advanced search.

As a last note I added a new config value to the faqconfig table to make this something that can be enabled/disabled per implementation as this functionality is eventually added to other database systems.

I believe MySQL could probably do this fairly easily as it has a FULLTEXT column type. I have not looked into it. I presume Oracle and others have something similar as well but, again, no experience there.



Notes:

Adding tsearch2 functionality to PostgreSQL's phpmyfaqdb database.
http://barryp.org/blog/entries/postgres ... ch-django/
http://www.sai.msu.su/~megera/postgres/ ... intro.html
http://www.postgresql.org/docs/8.3/stat ... intro.html

--Create index column.
ALTER TABLE faqdata ADD COLUMN faqdataft0 tsvector;

--Copy existing data into the column joining the fields I care about.
UPDATE faqdata SET faqdataft0=to_tsvector('english',coalesce(keywords,'') ||' '|| coalesce(thema,'') ||' '|| coalesce(content,''));
VACUUM FULL ANALYZE;

--Create an index on the column.
CREATE INDEX faqdataft0_idx ON faqdata USING gist(faqdataft0);
VACUUM FULL ANALYZE;

--Create trigger to continuously update the indexed column as inserts/updates happen within the database
CREATE TRIGGER faqdatatsvectorupdate BEFORE INSERT OR UPDATE ON faqdata FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(faqdataft0, 'pg_catalog.english', keywords, thema, content);

--Test searching with ranking - http://www.postgresql.org/docs/8.3/stat ... trols.html
SELECT solution_id, thema, ts_rank_cd(faqdataft0, query, 32) AS rank FROM faqdata, plainto_tsquery('query terms') as query WHERE faqdataft0 @@ plainto_tsquery('query terms') ORDER BY rank DESC;

--Get "headlines" from the results, specifically in the content field (add HTML bold tags by default to the resulting data).
SELECT solution_id, ts_headline(content, query) AS snippet, ts_rank_cd(faqdataft0, query, 32) AS rank FROM faqdata, plainto_tsquery('query terms') as query WHERE faqdataft0 @@ plainto_tsquery('query terms') ORDER BY rank DESC;

phpMyFAQ add-in to facilitate fulltext search:
INSERT INTO faqconfig (config_name, config_value) VALUES ('db.fulltextsearch', 'true');

Making the PHP search properly
<quote file='./phpmyfaq/inc/PMF_DB/Pgsql.php' version='2.6.5'>
240,311d239
< * Generates a result based on a fulltext search. This functionality is database-specific
< * and as a result must be implemented in the database-specific PHP file.
< *
< * @access public
< * @author Aaron Burgemeister <dajoker@gmail.com>
< * @since 2010-05-31
< */
< public function fullTextSearch($table, Array $assoc, $joinedTable = '', Array $joinAssoc = array(), $match = array(), $string = '', Array $cond = array())
< {
< $string = pg_escape_string(trim($string));
< $fields = '';
< $joined = '';
< $where = '';
<
< //Get a list of fields to be retrieved via SQL.
< foreach ($assoc as $field) {
<
< if (empty($fields)) {
<
< $fields = $field;
< } else {
<
< $fields .= ', '.$field;
< }
< }
<
< //Include any joined-in tables
< if (isset($joinedTable) && $joinedTable != '') {
< $joined .= ' LEFT JOIN '.$joinedTable.' ON ';
< }
<
< if (is_array($joinAssoc)) {
< foreach ($joinAssoc as $joinedFields) {
< $joined .= $joinedFields.' AND ';
< }
< $joined = PMF_String::substr($joined, 0, -4);
< }
<
< foreach ($cond as $field => $data) {
< if (empty($where)) {
< $where .= $field.' = '.$data;
< } else {
< $where .= ' AND '.$field.' = '.$data;
< }
< }
<
< $match = implode("|| ' ' ||", $match);
<
< $query = 'SELECT ' . $fields . ' FROM plainto_tsquery($SQLQUOTE$' . $string . '$SQLQUOTE$) as query, ' . $table . $joined . ' WHERE ('.$match.") @@ plainto_tsquery('" . $string . "')";
<
< if (!empty($where)) {
< $query .= ' AND ('.$where.')';
< }
<
< //Order by rank, descending, when using fulltext searches.
< /*
< $firstOrderBy = true;
< foreach ($orderBy as $field) {
< if ($firstOrderBy) {
< $query .= " ORDER BY ".$field;
< $firstOrderBy = false;
< } else {
< $query .= ", ".$field;
< }
< }
< */
< $query .= ' ORDER BY rank DESC';
<
< return $this->query($query);
< }
<
< /**
</quote>
<quote file='./phpmyfaq/inc/functions.php' version='2.6.5'>
447,460c447,484
< $result = $db->search(SQLPREFIX."faqdata",
< array(SQLPREFIX."faqdata.id AS id",
< SQLPREFIX."faqdata.lang AS lang",
< SQLPREFIX."faqcategoryrelations.category_id AS category_id",
< SQLPREFIX."faqdata.thema AS thema",
< SQLPREFIX."faqdata.content AS content"),
< SQLPREFIX."faqcategoryrelations",
< array(SQLPREFIX."faqdata.id = ".SQLPREFIX."faqcategoryrelations.record_id",
< SQLPREFIX."faqdata.lang = ".SQLPREFIX."faqcategoryrelations.record_lang"),
< array(SQLPREFIX."faqdata.thema",
< SQLPREFIX."faqdata.content",
< SQLPREFIX."faqdata.keywords"),
< $searchterm,
< $cond);
---
> //Adding in capability for fulltext search. This should probably be in the Pgsql.php file somehow
> //but I am not sure how to do it properly there so we'll start with this and let Thorsten work out
> //details later as this is enabled for other databases. One difficulty of having it elsewhere is
> //around specifying the table/column to query. Maybe this is the best place after all. Hmmmm....
> if (PMF_Configuration::getInstance()->get('db.fulltextsearch') == true) {
> //public function fullTextSearch($table, Array $assoc, $joinedTable = '', Array $joinAssoc = array(), $match = array(), $string = '', Array $cond = array())
> //FROM faqdata, plainto_tsquery('partition merge') as query WHERE faqdataft0 @@ plainto_tsquery('partition merge') ORDER BY rank DESC;
>
> $result = $db->fullTextSearch(SQLPREFIX.'faqdata',
> array(SQLPREFIX.'faqdata.id AS id',
> SQLPREFIX.'faqdata.lang AS lang',
> SQLPREFIX.'faqcategoryrelations.category_id AS category_id',
> SQLPREFIX.'faqdata.thema AS thema',
> SQLPREFIX.'faqdata.content AS content',
> SQLPREFIX.'ts_rank_cd(faqdataft0, query, 32) AS rank'),
> SQLPREFIX."faqcategoryrelations",
> array(SQLPREFIX."faqdata.id = ".SQLPREFIX."faqcategoryrelations.record_id",
> SQLPREFIX."faqdata.lang = ".SQLPREFIX."faqcategoryrelations.record_lang"),
> array(SQLPREFIX."faqdata.faqdataft0"),
> $searchterm,
> $cond);
> }
> else {
> $result = $db->search(SQLPREFIX."faqdata",
> array(SQLPREFIX."faqdata.id AS id",
> SQLPREFIX."faqdata.lang AS lang",
> SQLPREFIX."faqcategoryrelations.category_id AS category_id",
> SQLPREFIX."faqdata.thema AS thema",
> SQLPREFIX."faqdata.content AS content"),
> SQLPREFIX."faqcategoryrelations",
> array(SQLPREFIX."faqdata.id = ".SQLPREFIX."faqcategoryrelations.record_id",
> SQLPREFIX."faqdata.lang = ".SQLPREFIX."faqcategoryrelations.record_lang"),
> array(SQLPREFIX."faqdata.thema",
> SQLPREFIX."faqdata.content",
> SQLPREFIX."faqdata.keywords"),
> $searchterm,
> $cond);
> }
</quote>

Thorsten
Posts: 14769
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq
Contact:

Re: PMF Fulltext Search Addition (postgresql)

Post by Thorsten » Tue Jun 01, 2010 10:13 am

Hi,

wow, what a great patch! :-)

Is it possible to get your 2 patched files by e-mail? I would like to add the code to the next bigger release of phpMyFAQ.

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer
amazon.de Wishlist

dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

Re: PMF Fulltext Search Addition (postgresql)

Post by dajoker » Tue Jun 01, 2010 1:40 pm

Mailed. I'm pretty sure it was just the two files but if anything is amiss let me know and I can send you my entire setup.

Thanks.

dajoker
Posts: 59
Joined: Sat Jan 30, 2010 1:01 am

Re: PMF Fulltext Search Addition (postgresql)

Post by dajoker » Sun Jun 13, 2010 7:55 am

Found that after making the changes above the difference between the faqdata and faqdata_revisions table caused a warning to be thrown when modifying FAQs. The following additional SQL statement resolves it by getting the two tables to match but there is no need to add the trigger or index on faqdata_revisions like there is on faqdata (afaik):

ALTER TABLE faqdata_revisions ADD COLUMN faqdataft0 tsvector;

The error received before doing this follows:

<quote>
phpMyFAQ warning [2]: pg_query(): Query failed: ERROR: INSERT has more expressions than target columns LINE 4: SELECT * FROM ^ in Pgsql.php on line 91
The changes were saved successfully
</quote>

Post Reply