Protect your WordPress-based web site from AI-bots

Scenario: a customer's web site is being hit hard by AI bots. The server runs Apache with a single virtual host. The task is to protect the service from AI bots while still allowing normal traffic.

Configuration fragment of Apache server:

<VirtualHost *:443>
  ServerName your-great-site.ca
  SSLEngine on
</VirtualHost>

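Solution: run Anubis in a Docker container between the public Apache virtual host and a local-only backend virtual host that serves WordPress. Implementation steps:

  • Install Docker and docker-compose
  • Configure the Anubis Docker container for basic operation
  • Start the Anubis container
  • Make sure mod_proxy for Apache is installed and enabled
  • Change the Apache configuration to proxy requests to Anubis
  • Add an Apache virtual host that serves as the backend for Anubis
  • Restart Apache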

Implementation (the process below is for SLES; other Linux distributions need only minor adjustments).

Install Docker and docker-compose:

zypper install -y docker docker-compose

Enable and start the Docker service:

systemctl enable --now docker.service

Create the directory structure and the Compose file:

mkdir -p /opt/anubis
mkdir -p /opt/anubis/config
touch /opt/anubis/docker-compose.yml
touch /opt/anubis/config/policies.yml

Content of docker-compose.yml:

services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    container_name: anubis
    restart: always
    network_mode: "host"
    environment:
      # Host networking makes Compose port mappings inapplicable,
      # so BIND restricts Anubis to localhost instead
      - BIND=127.0.0.1:8923
      # TARGET must point to your internal WordPress Apache backend
      - TARGET=http://127.0.0.1:8023
      - COOKIE_DOMAIN=your-great-site.ca
      - OG_PASSTHROUGH=true
      - OG_EXPIRY_TIME=1h
      - OG_CACHE_CONSIDER_HOST=true
      # Add this to allow your domain
      - REDIRECT_DOMAINS=your-great-site.ca
    volumes:
      - ./config:/data

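Content of policies.yml (a very basic policy; adjust it as needed). Anubis reads it only if POLICY_FNAME=/data/policies.yml is added to the environment section of docker-compose.yml:

# Order matters: admin-ajax.php must be allowed before /wp-admin is challenged
bots:
  - {name: robots-txt, path_regex: '^/robots\.txt$', action: ALLOW}
  - {name: wp-content, path_regex: '^/wp-content', action: ALLOW}
  - {name: wp-includes, path_regex: '^/wp-includes', action: ALLOW}
  - {name: admin-ajax, path_regex: '^/wp-admin/admin-ajax\.php', action: ALLOW}
  - {name: googlebot, user_agent_regex: 'Googlebot', action: ALLOW}
  - {name: trusted-network, remote_addresses: ['203.0.113.0/24'], action: ALLOW}
  - {name: wp-admin, path_regex: '^/wp-admin', action: CHALLENGE}
  - {name: scrape-api, path_regex: '^/api/scrape', action: DENY}
  # Challenge everything else that identifies itself as a browser
  - {name: generic-browser, user_agent_regex: 'Mozilla', action: CHALLENGE}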
The Anubis part is done. Start the container from the directory that holds docker-compose.yml:

cd /opt/anubis && docker-compose up -d
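
Before moving on to Apache, it may be worth confirming that the container really is up. The following is a minimal sketch, assuming the container is named anubis as in the Compose file above; container_running is a hypothetical helper name, while docker inspect is the standard Docker CLI command.

```shell
# Succeed only if the named container exists and is in the running state.
# container_running is an illustrative helper, not part of Docker or Anubis.
container_running() {
    # docker inspect prints "true" or "false" for .State.Running;
    # a missing container makes it fail, which is treated as "not running".
    state=$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null) || state=""
    [ "$state" = "true" ]
}
```

A possible way to use it: container_running anubis || docker logs --tail 50 anubis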

Apache preparation.

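First, make sure mod_proxy and mod_proxy_http are installed and enabled; mod_headers is also needed for the RequestHeader directives used below. This is the key point: without a working mod_proxy, the setup will fail.

a2enmod proxy
a2enmod proxy_http
a2enmod headers
apachectl -M | grep -E 'proxy|headers'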
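
Reading the module list by eye is error-prone, so here is a small sketch that reports what is missing. It assumes the three modules this setup relies on (proxy, proxy_http, and headers, the last one because RequestHeader comes from mod_headers); missing_modules is a hypothetical helper name, and it only parses text given on standard input.

```shell
# Print the required Apache modules that are absent from a module listing.
# Reads the output of `apachectl -M` on stdin; prints one missing module per line.
# missing_modules is an illustrative helper, not an Apache tool.
missing_modules() {
    loaded=$(cat)
    for mod in proxy_module proxy_http_module headers_module; do
        # apachectl -M prints lines such as " proxy_module (shared)"
        if ! printf '%s\n' "$loaded" | grep -qw "$mod"; then
            printf '%s\n' "$mod"
        fi
    done
}
```

A possible way to use it: apachectl -M 2>/dev/null | missing_modules
Empty output means all three modules are loaded.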

Modify the original VirtualHost configuration so that all incoming traffic is passed to Anubis. The result should look something like this:

<VirtualHost *:443>
  ServerName your-great-site.ca
  SSLEngine on

  # Preserve original host & IP headers for Anubis
  ProxyPreserveHost On
  RequestHeader set X-Forwarded-Proto "https"
  RequestHeader set X-Forwarded-For %{REMOTE_ADDR}s
  RequestHeader set X-Real-IP %{REMOTE_ADDR}s

  ProxyPass        /  http://127.0.0.1:8923/
  ProxyPassReverse /  http://127.0.0.1:8923/

</VirtualHost>

Create a second listener for the site backend. It should look something like this:

# Listen on localhost only, so clients cannot bypass Anubis
Listen 127.0.0.1:8023

<VirtualHost 127.0.0.1:8023>
    ServerName your-great-site-backend.ca

    DocumentRoot /srv/www/htdocs
    <Directory "/srv/www/htdocs">
        Options FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>

    # Optional: disable SSL on backend
    SSLEngine off

    ErrorLog /var/log/apache2/wordpress-backend-error.log
    CustomLog /var/log/apache2/wordpress-backend-access.log combined
</VirtualHost>

Check the configuration and restart Apache (on SLES the service is named apache2):

apachectl -t
systemctl restart apache2
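
After the restart, each hop of the chain can be smoke-tested in turn. The sketch below assumes curl is installed and uses the ports from this walkthrough; http_status and expect_up are hypothetical helper names, and treating 000 (no connection) and 5xx as failures is a simplifying assumption, since redirects and challenge pages are both acceptable answers here.

```shell
# Print the HTTP status code returned for a URL (curl prints 000 if the connection fails).
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Succeed when the URL answers with any status other than 000 or 5xx.
# expect_up is an illustrative helper, not part of curl, Apache, or Anubis.
expect_up() {
    code=$(http_status "$1") || true
    case "$code" in
        000|5??) echo "FAIL $1 -> $code"; return 1 ;;
        *)       echo "OK   $1 -> $code"; return 0 ;;
    esac
}
```

A possible check order, from the inside out:
expect_up http://127.0.0.1:8023/       (WordPress backend)
expect_up http://127.0.0.1:8923/       (Anubis)
expect_up https://your-great-site.ca/  (public virtual host)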

Enjoy your server, free of unnecessary workload from AI bots!