How-to on making phpMyFAQ spiderable (static pages)

Thorsten
Posts: 15725
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Post by Thorsten »

Hi Nick,

mod_rewrite support will be included in 1.5.0, which I'm currently working on.

bye
Thorsten
phpMyFAQ Maintainer and Lead Developer
amazon.de Wishlist
nickn
Posts: 23
Joined: Mon Mar 15, 2004 5:58 pm

Post by nickn »

Sounds great. 8)

Looking forward to it. By the way, is there any way to get an English version of your wishlist? It's kind of hard to navigate when it's in German.
Thorsten

Post by Thorsten »

Hi,

Hm, I don't know if amazon.de offers an English wishlist...

bye
Thorsten
Thorsten

Post by Thorsten »

Hi,

Today I committed the first lines of code for phpMyFAQ's mod_rewrite support to CVS, and it runs successfully on my server. :-)
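For readers wondering what mod_rewrite buys you for spidering: the spider sees a static-looking path, and a rewrite rule in Apache maps that path back onto the dynamic index.php query. The PHP sketch below shows that mapping in both directions. The path scheme `/content/{cat}/{id}/{lang}.html` and both function names are hypothetical and are not the scheme phpMyFAQ 1.5.0 actually uses; only the `action=artikel`, `cat`, `id` and `artlang` parameters are taken from the article URLs quoted later in this thread.

```php
<?php
// HYPOTHETICAL spider-friendly URL scheme: /content/{cat}/{id}/{lang}.html
// The query parameters it maps to (action=artikel, cat, id, artlang) are the
// ones used by the phpMyFAQ article URLs quoted in this thread.

// Build the static-looking path for one FAQ article.
function staticArticlePath(int $cat, int $id, string $lang): string
{
    return sprintf('/content/%d/%d/%s.html', $cat, $id, $lang);
}

// Map a static-looking path back to the dynamic query parameters, which is
// the job a RewriteRule does inside Apache. Returns null if the path doesn't match.
function staticPathToQuery(string $path): ?array
{
    if (!preg_match('#^/content/(\d+)/(\d+)/([a-z]{2})\.html$#', $path, $m)) {
        return null;
    }
    return [
        'action'  => 'artikel',
        'cat'     => (int) $m[1],
        'id'      => (int) $m[2],
        'artlang' => $m[3],
    ];
}

$path = staticArticlePath(2, 103, 'en');
echo $path, "\n";                                                          // /content/2/103/en.html
echo 'index.php?', http_build_query(staticPathToQuery($path), '', '&'), "\n"; // index.php?action=artikel&cat=2&id=103&artlang=en
```

In Apache the same mapping would be something along the lines of `RewriteEngine On` followed by `RewriteRule ^content/([0-9]+)/([0-9]+)/([a-z]{2})\.html$ index.php?action=artikel&cat=$1&id=$2&artlang=$3 [L]` in an .htaccess file next to index.php; treat that rule as an untested illustration of the same hypothetical scheme.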

bye
Thorsten
nickn

Post by nickn »

That's great... can't wait for the 1.5-ish release!
Paul D. Buck
Posts: 38
Joined: Fri Dec 31, 2004 11:13 pm
Location: USA

Post by Paul D. Buck »

I did it a little differently...

I created a page that lists all the URLs, then I copy them into phpDig a group at a time and spider them that way.

Because of the super linking I do, phpDig seems to choke if I try any of the automatic updates, and the force feed also seems to fail if I try more than 10-15 pages at a time...

<?php
/* ****************************************************************************************************************** */
// Include the initialization script 
/* ------------------------------------------------------------------------------------------------------------------ */
$gOrginalIncludePath = ini_get("include_path");
ini_set("include_path", ".:./_include/:../_include/:../../_include/:../../../_include/");
require_once("_page-0-initialize.php");


/* ****************************************************************************************************************** */
// Define main values needed for this page (and *THIS* page *ONLY*!) 
/* ------------------------------------------------------------------------------------------------------------------ */
$gPageTitle    = "Paul's BOINC Documentation Site";
$gContentTitle = "File List";


/* ****************************************************************************************************************** */
// Now make the actual page header
/* ------------------------------------------------------------------------------------------------------------------ */
require_once("_page-header.php");


/* ****************************************************************************************************************** */
// Misc. initialization
/* ------------------------------------------------------------------------------------------------------------------ */
// create metadata keywords
$gObjectTypeMetaTag = "Document";


/* ****************************************************************************************************************** */
// Content start
/* ------------------------------------------------------------------------------------------------------------------ */
BodyStart();

// count
$PageCount = 0;

function OutputFileList($Location) {

    global $PageCount;
    
    // open the directory (paths are relative to the parent of this page)
    $Dir = dir("../" . $Location);
    if (!$Dir) {
        return; // directory missing or unreadable: nothing to list
    }

    // loop through the contents; compare with false so an entry named "0" doesn't end the loop
    while (false !== ($Entry = $Dir->read())) {

        // make the site-relative path
        $FullPath = $Location . $Entry;

        // only list .php pages; directories, "." / ".." and other files are skipped
        if (substr($Entry, -4) === ".php") {

            echo "<tr><td>http://boinc-doc.net/" . $FullPath . "</td></tr>\r\n";

            // update the page count and output a blank row after every 10 URLs
            $PageCount = $PageCount + 1;
            if (($PageCount % 10) == 0) {
                echo "<tr><td></td></tr>\r\n";
            }
        }
    }

    // close handle
    $Dir->close();
}
echo "<table summary=\"File list for the site\" width=\"100%\">\r\n";
// handle the index page first
echo "<tr><td>http://boinc-doc.net/index.php</td></tr>\r\n";
echo "<tr><td></td></tr>\r\n";
$PageCount = $PageCount + 1;
OutputFileList("site-boinc/oman-app/");
OutputFileList("site-common/glossary/");
OutputFileList("site-common/oman-web/");
OutputFileList("site-cpdn/");
OutputFileList("site-cpdn/oman-web/");
OutputFileList("site-einstein/");
OutputFileList("site-einstein/oman-web/");
OutputFileList("site-lhc/");
OutputFileList("site-lhc/oman-web/");
OutputFileList("site-misc/");
OutputFileList("site-predictor/");
OutputFileList("site-predictor/oman-web/");
OutputFileList("site-seti/");
OutputFileList("site-seti/news-technical/");
OutputFileList("site-seti/oman-web/");
OutputFileList("site-seti/seti-science/");
$PhpPageCount = $PageCount;
// now do the part with the FAQ
require_once($gPathLevel . "_phpMyFAQ/inc/data.php");
// $DB comes from phpMyFAQ's inc/data.php; mysqli replaces the old mysql_* functions
$dbc = mysqli_connect($DB["server"], $DB["user"], $DB["password"], $DB["db"]) or die('Could not connect to MySQL: ' . mysqli_connect_error());


// one article URL per FAQ entry: `rubrik` is the category, `id` the article
$Query = "SELECT concat('http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat='," .
         "              `rubrik`," .
         "              '&id='," .
         "              `id`," .
         "              '&artlang=en')" .
         "  FROM `faqdata`" .
         " ORDER BY `rubrik`, `id`";

// Do the fetch
$Result = mysqli_query($dbc, $Query);

// if we got something, work with it
if ($Result) {

    while ($Row = mysqli_fetch_array($Result, MYSQLI_NUM)) {
        // htmlspecialchars() writes '&' as '&amp;' so the table cell is valid HTML
        echo "<tr><td>" . htmlspecialchars($Row[0]) . "</td></tr>\r\n";

        // update the page count and output a blank row after every 10 URLs
        $PageCount = $PageCount + 1;
        if (($PageCount % 10) == 0) {
            echo "<tr><td></td></tr>\r\n";
        }
    }
    mysqli_free_result($Result);
}
    
// Ok, wrap it up!
echo "</table>\r\n";
echo "<p />\r\n";
echo "PHP Page count: " . $PhpPageCount;
echo "<p />\r\n";
echo "FAQ Page count: " . ($PageCount - $PhpPageCount);
echo "<p />\r\n";
echo "Total Page count: " . $PageCount;



?>

<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<!--                End the *ACTUAL* Body                 -->
<!-- ++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
<?php
include_once("_page-footer.php");
?>

<!-- End of file -->
This produces a page with output like this:

http://boinc-doc.net/site-seti/seti-science/science-content-seti.php

http://boinc-doc.net/site-seti/seti-science/seti-sounds.php
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=2&id=103&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=2&id=486&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=2&id=491&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=27&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=28&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=29&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=30&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=31&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=32&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=33&artlang=en
http://boinc-doc.net/_phpMyFAQ/index.php?action=artikel&cat=291798&id=34&artlang=en

Then I copy them into phpDig, with the engine set to index just that one page and not follow any links off it.
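Since phpDig's force feed fails at more than 10-15 pages at a time, the grouping step can also be done in code. A minimal PHP sketch, assuming a batch size of 10 (the same grouping the listing script marks with a blank row); the function name `batchUrls` is hypothetical and not part of phpDig or phpMyFAQ, and the example URLs are placeholders.

```php
<?php
// Split a long list of URLs into batches small enough for phpDig to take in
// one go. The default of 10 is an ASSUMPTION based on the 10-15 page limit.

/**
 * @param string[] $urls
 * @return string[][] batches of at most $batchSize URLs each, in original order
 */
function batchUrls(array $urls, int $batchSize = 10): array
{
    // array_values() renumbers the keys; array_chunk() does the actual splitting
    return array_chunk(array_values($urls), $batchSize);
}

// Example: 23 placeholder URLs come out as batches of 10, 10 and 3
$urls = [];
for ($i = 1; $i <= 23; $i++) {
    $urls[] = "http://example.com/page-$i.php";
}
foreach (batchUrls($urls) as $n => $batch) {
    echo "--- Batch ", $n + 1, " (", count($batch), " URLs) ---\n";
    echo implode("\n", $batch), "\n";
}
```

Each printed batch would then be pasted into phpDig as a separate run.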

I use a similar listing technique to output lists of topics for my Owner's Manual, so the lists keep up as the content of the phpMyFAQ database changes with newly added error messages (each with a topic and keywords defined so they can be told apart).
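The `concat()` in the query above does nothing more than glue `rubrik` (the category) and `id` from `faqdata` into the article URL. If you would rather build the URL in PHP, for instance to escape it once for HTML output, a minimal sketch could look like this. The function name `faqArticleUrl` and the sample rows are illustrative only; the parameter names and base URL are the ones from the listing above.

```php
<?php
// Build a phpMyFAQ article URL in PHP instead of via concat() in SQL.
// Parameter names (action=artikel, cat, id, artlang) match the listing above.
function faqArticleUrl(string $base, int $cat, int $id, string $lang = 'en'): string
{
    $query = http_build_query(
        ['action' => 'artikel', 'cat' => $cat, 'id' => $id, 'artlang' => $lang],
        '',
        '&' // force a plain '&' regardless of the arg_separator.output ini setting
    );
    return $base . '/_phpMyFAQ/index.php?' . $query;
}

// Sample rows shaped like the result of: SELECT rubrik, id FROM faqdata ORDER BY rubrik, id
$rows = [[2, 103], [2, 486], [291798, 27]];
foreach ($rows as [$cat, $id]) {
    $url = faqArticleUrl('http://boinc-doc.net', $cat, $id);
    // htmlspecialchars() turns '&' into '&amp;' so the table cell is valid HTML
    echo '<tr><td>', htmlspecialchars($url), "</td></tr>\n";
}
```

Keeping the URL format in one PHP function means a later change (say, to rewritten URLs) only has to be made in one place rather than in the SQL.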
NickRac
Posts: 5
Joined: Fri Jul 08, 2005 2:21 pm

Post by NickRac »

Hey guys, my name is also Nick. I was the owner of cpanelFAQ AFTER nickn, and all I can say is that this hack/mod was amazing.

Google spidered every page, they all got listed, they all got indexed and the results were AMAZING.

I have since sold the site and am working on starting a few others, including WHMFAQ.com and vbulletinFAQ.com, both of which will have this hack (unless 1.5.0 is released soon).

Good Luck!
Nick