Hi,
thanks for the one piece of good news today.
bye
Thorsten
Instance gets spammed with search requests / How to track down originating IP
Moderator: Thorsten
Re: Instance gets spammed with search requests / How to track down originating IP
phpMyFAQ Maintainer and Lead Developer
- Posts: 5
- Joined: Thu Oct 10, 2024 11:18 am
Re: Instance gets spammed with search requests / How to track down originating IP
It has been a while since I checked into this thread; thanks for all the additional info.
By logging the HTTP_USER_AGENT I also came across the apparently new crawler "meta-externalagent" (see https://developers.facebook.com/docs/sh ... b-crawlers).
Blocking it via robots.txt has apparently resolved the issue. I will add the other agents that came up in this thread as needed.
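For anyone who wants to do the same user-agent check from the web server log instead of from inside the application, here is a minimal Python sketch. It assumes the access log is in Apache's "combined" format and that search requests contain the string "search" in the request line; the function name, the sample log lines, the IP addresses and the URLs are all made up for illustration and are not taken from phpMyFAQ.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# IP identity user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def tally_search_requests(lines, needle="search"):
    """Count (ip, user agent) pairs for requests whose request line contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("request"):
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts

# Hypothetical sample lines; in practice, pass an open log file instead.
sample = [
    '203.0.113.5 - - [10/Oct/2024:11:18:00 +0200] "GET /index.php?action=search&search=foo HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '203.0.113.5 - - [10/Oct/2024:11:18:01 +0200] "GET /index.php?action=search&search=bar HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '198.51.100.7 - - [10/Oct/2024:11:18:02 +0200] "GET /index.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

# Print the heaviest (ip, agent) pairs first.
for (ip, agent), hits in tally_search_requests(sample).most_common():
    print(ip, agent, hits)
```

Sorting by hit count puts the spamming client at the top, with both its originating IP and its user agent string, which is enough to decide what to put into robots.txt or a block rule.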
Edit: since robots.txt appears to get ignored (at least to some extent), I blocked anything called facebook/meta, along with generic bot/crawl/spider user agents, via htaccess (while still allowing the Google bot) by adapting this (https://help.webhostinghub.com/hc/en-us ... wling-Bots):
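# Adapted from https://help.webhostinghub.com/hc/en-us ... wling-Bots
RewriteEngine On
# Match user agents containing any of these substrings (case-insensitive):
RewriteCond %{HTTP_USER_AGENT} (facebook|meta|bot|crawl|spider) [NC]
# Allow Googlebot through:
RewriteCond %{HTTP_USER_AGENT} !Google [NC]
# Keep robots.txt reachable for everyone:
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Answer every other request from a matching agent with 403 Forbidden:
RewriteRule .* - [F]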
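Before putting a rule like this live, it can help to check which user agents it would actually catch. The following is a rough Python sketch that mirrors the three RewriteCond lines above; it only approximates Apache's matching, the function name is my own, and the sample user agent strings are illustrative examples rather than entries from my log.

```python
import re

# Same alternation as the first RewriteCond, case-insensitive like [NC].
BLOCKED = re.compile(r"(facebook|meta|bot|crawl|spider)", re.IGNORECASE)
# Exemption from the second RewriteCond (!Google [NC]).
ALLOWED = re.compile(r"Google", re.IGNORECASE)

def would_be_forbidden(user_agent, request_uri="/"):
    """Return True if all three conditions hold, i.e. the request would get a 403."""
    return (
        BLOCKED.search(user_agent) is not None
        and ALLOWED.search(user_agent) is None
        and request_uri != "/robots.txt"
    )

# Illustrative user agent strings (assumed, not copied from a real log).
for agent in [
    "meta-externalagent/1.1",
    "facebookexternalhit/1.1",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:131.0) Gecko/20100101 Firefox/131.0",
]:
    print(agent, "->", "403" if would_be_forbidden(agent) else "allowed")
```

Two things to keep in mind: the exemption lets through any client whose user agent merely contains "Google", so a spoofed string would pass, and the "bot|crawl|spider" part also blocks other search engine crawlers that one might want to keep.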