Instance gets spammed with search requests / How to track down originating IP

Brückentroll
Posts: 5
Joined: Thu Oct 10, 2024 11:18 am

Post by Brückentroll »

Hi all,

an instance of phpMyFAQ that I manage is being spammed with identical searches every other second; I can see the _faqsearches table grow. This went unnoticed for a while, so the table reached 2.5 million rows and pushed the (rather low-end) server to 100% CPU usage, because it struggled with the query that runs every time a search is entered. (I verified that this was the cause by looking at the running MySQL processes: the queries took 5+ seconds each, and new ones came in faster than that.)

I managed to temporarily "fix" the performance issue by emptying the _faqsearches table (the statistics based on it were useless anyway because of the number of spam searches). That shortened the query time enough to avoid the load issue, but the spam queries keep coming in.

I have updated to 3.2.9, no change.
I am using a SLIGHTLY modified default theme (no functional changes, only removing some sections that are unwanted for our use-case, such as the comment-block).
I have tried banning some suspicious IPs that turned up in _faqsessions via an .htaccess block, but those seem to have been unrelated.

a) Any idea what could be causing this?

b) Does anyone have a suggestion for where I could change the code to temporarily store the searching user's IP as well, so I can try to pin down the source of the requests? I was unable to find the relevant section, since I am not versed in OOP-style PHP.

Thank you for any suggestions.
Thorsten
Posts: 15723
Joined: Tue Sep 25, 2001 11:14 am
Location: #phpmyfaq

Re: Instance gets spammed with search requests / How to track down originating IP

Post by Thorsten »

Hi,

that's bad, sorry for this!

To avoid new entries, you can edit the Search class in phpmyfaq/src/phpMyFAQ/Search.php:

From

    public function logSearchTerm(string $searchTerm): void
    {
        if (Strings::strlen($searchTerm) === 0) {
            return;
        }

        $dateTime = new DateTime();
        $query = sprintf(
            "INSERT INTO %s (id, lang, searchterm, searchdate) VALUES (%d, '%s', '%s', '%s')",
            $this->table,
            $this->configuration->getDb()->nextId($this->table, 'id'),
            $this->configuration->getLanguage()->getLanguage(),
            $this->configuration->getDb()->escape($searchTerm),
            $dateTime->format('Y-m-d H:i:s')
        );

        $this->configuration->getDb()->query($query);
    }
to

    public function logSearchTerm(string $searchTerm): void
    {
        return; // Temporary: don't store anything, the code below is no longer reached

        if (Strings::strlen($searchTerm) === 0) {
            return;
        }

        $dateTime = new DateTime();
        $query = sprintf(
            "INSERT INTO %s (id, lang, searchterm, searchdate) VALUES (%d, '%s', '%s', '%s')",
            $this->table,
            $this->configuration->getDb()->nextId($this->table, 'id'),
            $this->configuration->getLanguage()->getLanguage(),
            $this->configuration->getDb()->escape($searchTerm),
            $dateTime->format('Y-m-d H:i:s')
        );

        $this->configuration->getDb()->query($query);
    }
bye
Thorsten
phpMyFAQ Maintainer and Lead Developer
Thorsten

Post by Thorsten »

Hi,

to get the full IP addresses, you have to remove this line:

https://github.com/thorsten/phpMyFAQ/bl ... n.php#L347
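
As an alternative to editing that line, the IP could also be captured right next to the search term. A minimal sketch, under assumptions: the function name appendSearchAuditLine and the log file location are made up for illustration and are not part of phpMyFAQ, and the call would have to be added by hand, for example at the top of logSearchTerm().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of phpMyFAQ: appends one tab-separated line
 * (timestamp, client IP, user agent, search term) to a plain-text log file.
 */
function appendSearchAuditLine(string $logFile, string $ip, string $userAgent, string $searchTerm): void
{
    // Replace control characters (tabs, newlines) so every entry stays on one line.
    $clean = static fn (string $value): string => (string) preg_replace('/[\x00-\x1F\x7F]+/', ' ', $value);

    $line = implode("\t", [
        date('Y-m-d H:i:s'),
        $clean($ip),
        $clean($userAgent),
        $clean($searchTerm),
    ]) . PHP_EOL;

    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved writes.
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Example call with an assumed log location. REMOTE_ADDR is the address of the
// direct peer; behind a reverse proxy it is the proxy's address, not the visitor's.
appendSearchAuditLine(
    sys_get_temp_dir() . '/faq-search-audit.log',
    $_SERVER['REMOTE_ADDR'] ?? 'unknown',
    $_SERVER['HTTP_USER_AGENT'] ?? 'unknown',
    'unable log on'
);
```

Logging the user agent along with the IP makes it easier to tell a crawler from a real visitor. The call should be removed again once the source has been found.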

bye
Thorsten
Brückentroll

Post by Brückentroll »

Thank you, that helped me out a lot. I can now log the relevant IPs and we will see where that gets me.
Brückentroll

Post by Brückentroll »

That was fast: it turns out the requests are coming from Facebook/Meta IPs.

I assume that somewhere a crawler or the link preview functionality for a Facebook post is hitting a URL that includes a search request, so every time someone opens that post a new search is registered. I will now look into blocking FB/Meta IPs/bots to fix this issue.

Thanks again for your support!
Thorsten

Post by Thorsten »

Hi,

awesome! Which bot is it? Then we could add it to our default list of blocked bots.

bye
Thorsten
Brückentroll

Post by Brückentroll »

Not sure how to figure that out, but these are some of the relevant IPs:

173.252.107.12
57.141.0.6
57.141.0.11
57.141.0.4
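
One way to figure out which bot is behind such IPs is to look at the User-Agent the web server logged for them. A minimal sketch, assuming the access log uses the common "combined" format (client IP first, quoted user agent last); the function name countUserAgentsForIps, the sample log lines, and the ExampleBot user agent are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: counts requests per user agent for the given client IPs.
 * Assumes the "combined" access log format, where each line starts with the
 * client IP and ends with the quoted user agent.
 *
 * @param iterable<string> $logLines
 * @param string[]         $ips
 * @return array<string, int> user agent => number of requests, busiest first
 */
function countUserAgentsForIps(iterable $logLines, array $ips): array
{
    $counts = [];
    foreach ($logLines as $line) {
        // Capture the leading IP and the last quoted field of the line.
        if (preg_match('/^(\S+) .*"([^"]*)"\s*$/', $line, $m) !== 1) {
            continue; // line is not in the expected format
        }
        if (!in_array($m[1], $ips, true)) {
            continue; // request came from another client
        }
        $counts[$m[2]] = ($counts[$m[2]] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

// Made-up sample lines; for a real log, an SplFileObject opened on the
// access log file can be passed instead, since it iterates line by line.
$sample = [
    '57.141.0.6 - - [10/Oct/2024:11:20:01 +0000] "GET /example?search=foo HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '57.141.0.11 - - [10/Oct/2024:11:20:03 +0000] "GET /example?search=foo HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.9 - - [10/Oct/2024:11:20:04 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
];
print_r(countUserAgentsForIps($sample, ['57.141.0.6', '57.141.0.11']));
```

The user agent string usually names the crawler, which is what a robots.txt entry or a block rule needs.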
R0CKY
Posts: 68
Joined: Mon May 19, 2008 9:18 pm

Post by R0CKY »

Hi

Just to say this is still a problem. I had exactly the same issue as Brückentroll, and it was also the Meta spider, searching for "unable log on" every 4 seconds. There have been 1125876 of these searches on my site.

It got so bad that my webhost took my site offline due to 100% CPU usage.

1. Blocked Meta using robots.txt
2. Blocked the Meta IP prefix 57.141.7. in .htaccess
R0CKY

Post by R0CKY »

Well, that didn't work for long. The FAQ is being bombarded again today with "unable log on" search queries every few seconds, but I have been unable to identify which IP it is this time.

I have had to take the FAQ offline until this can be resolved.
Last edited by R0CKY on Tue Nov 05, 2024 9:02 pm, edited 1 time in total.
Thorsten

Post by Thorsten »

Hi,

wow, this is bad. Did you add something like this in your robots.txt?

User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: Meta-ExternalAgent
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: Scrapy
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: YouBot
User-agent: Meta-ExternalFetcher
User-agent: Applebot
Disallow: /
bye
Thorsten
Thorsten

Post by Thorsten »

R0CKY wrote: Mon Nov 04, 2024 9:37 pm Well that didn't work for long - the FAQ is being bombarded today again with "unable log on" search queries every few seconds but I have been unable to identify which IP it is this time.

I have had to take the FAQ offline until this can be resolved.

Do these search queries also come from the Meta Bot?

bye
Thorsten
R0CKY

Post by R0CKY »

Hi Thorsten

I found these in the server logs; does this help?

facebookexternalhit/1.1
meta-externalagent/1.1

https://developers.facebook.com/docs/sh ... rs/crawler
http://www.facebook.com/externalhit_uatext.php
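
Because robots.txt only asks crawlers to stay away, another option is to refuse these user agents before any database work happens. A minimal sketch of an application-side check, under assumptions: the function name isBlockedCrawler is made up, this is not how phpMyFAQ itself handles bots, and the check would have to run very early in request handling.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if the User-Agent header contains one of the
 * given crawler tokens (case-insensitive substring match).
 *
 * @param string[] $blockedTokens
 */
function isBlockedCrawler(string $userAgent, array $blockedTokens): bool
{
    foreach ($blockedTokens as $token) {
        // Skip empty tokens, which would otherwise match every user agent.
        if ($token !== '' && stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The two user agent tokens reported from the server logs above.
$blocked = ['facebookexternalhit', 'meta-externalagent'];

if (isBlockedCrawler($_SERVER['HTTP_USER_AGENT'] ?? '', $blocked)) {
    // Answer with 403 and stop before the search (and its logging) runs.
    http_response_code(403);
    exit('Forbidden');
}
```

Matching on a substring rather than the full string keeps the check working when the version suffix (such as /1.1) changes. A user agent can be faked, so this only stops crawlers that identify themselves honestly.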
R0CKY

Post by R0CKY »

Thorsten wrote: Tue Nov 05, 2024 6:46 pm Hi,

wow, this is bad. Did you add something like this in your robots.txt?

Hi, no, I only added:

User-agent: meta-externalagent
Disallow: /faqs/ # Disallow a specific directory
R0CKY

Post by R0CKY »

I have now blocked these IP prefixes in .htaccess:

Deny from 57.141.7.
Deny from 173.252.
Deny from 66.220.149
Deny from 69.171.

And I changed my robots.txt to:

User-agent: meta-externalagent
User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: PetalBot
User-agent: Meta-ExternalFetcher
Disallow: /faqs/

I will put FAQs back online and see what happens by tomorrow.
R0CKY

Post by R0CKY »

Things seem okay now.