Scenario: A customer's website is being hit hard by AI bots. The server runs Apache with a single virtual host. The task is to protect the service from AI bots while still allowing normal traffic.
Configuration fragment of Apache server:
<VirtualHost *:443>
ServerName your-great-site.ca
SSLEngine on
</VirtualHost>
Solution implementation steps:
- Install Docker
- Make sure mod_proxy is installed and enabled in Apache
- Change the Apache configuration to proxy requests to Anubis
- Add an Apache configuration to serve as the backend for Anubis
- Restart Apache
- Configure the Anubis Docker container for basic operation
- Start the Anubis container
Implementation (the process below is for SLES; other Linux distributions need only minor adjustments).
Install Docker and docker-compose:
zypper install docker docker-compose -y
Start docker service:
systemctl enable --now docker.service
Create the directory structure and the configuration files:
mkdir -p /opt/anubis
mkdir -p /opt/anubis/config
touch /opt/anubis/docker-compose.yml
touch /opt/anubis/config/policies.yml
Content of docker-compose.yml:
services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    container_name: anubis
    restart: always
    # Host networking lets the container reach Apache on 127.0.0.1:8023.
    # Published ports ("ports:") have no effect in this mode, so the listen
    # address is set with BIND instead.
    network_mode: "host"
    environment:
      - BIND=127.0.0.1:8923 # bind only to localhost
      # TARGET must point to your internal WordPress Apache backend
      - TARGET=http://127.0.0.1:8023
      - COOKIE_DOMAIN=your-great-site.ca
      - OG_PASSTHROUGH=true
      - OG_EXPIRY_TIME=1h
      - OG_CACHE_CONSIDER_HOST=true
      # Add this to allow your domain
      - REDIRECT_DOMAINS=your-great-site.ca
    volumes:
      - ./config:/data
Content of policies.yml (a very basic policy; adjust it as needed):
allow:
  - path: '/robots.txt'
  - path_regex: '^/wp-content'
  - path_regex: '^/wp-includes'
  - path_regex: '^/wp-admin/admin-ajax.php'
  - user_agent_regex: 'Googlebot'
  - ip_cidr: '203.0.113.0/24'
challenge:
  - path_regex: '^/wp-admin'
block:
  - path_regex: '^/api/scrape'
The Anubis part is done. Let's start the container from the directory that holds docker-compose.yml:
cd /opt/anubis && docker-compose up -d
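Right after the container starts, it can take a moment before Anubis accepts connections. Below is a minimal sketch of a readiness check, assuming curl is installed and Anubis answers on 127.0.0.1:8923 as configured above. wait_for_http is a hypothetical helper name, not part of Anubis or Docker.

```shell
# Poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = number of attempts (default 10), $3 = seconds between attempts (default 1)
wait_for_http() {
    url="$1"; tries="${2:-10}"; delay="${3:-1}"
    while [ "$tries" -gt 0 ]; do
        # -s hides progress output, -o discards the body.
        # Without -f, curl exits 0 on any HTTP response, so any answer counts as "up".
        if curl -s -o /dev/null --max-time 3 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage on the server:
#   wait_for_http http://127.0.0.1:8923/ && echo "Anubis is answering"
```

If the check keeps failing, running docker-compose logs anubis from /opt/anubis is a reasonable next step.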
Apache preparation.
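First, check that mod_proxy is installed and enabled. Attention: this is a key point. Without a working mod_proxy, the setup will fail. The RequestHeader directives used below also require mod_headers.
a2enmod proxy
a2enmod proxy_http
a2enmod headers
apachectl -M | grep -E 'proxy|headers'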
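Reading the module list by eye is easy to get wrong. The sketch below reports which of the modules this setup relies on are missing: proxy_module and proxy_http_module for ProxyPass, and headers_module for the RequestHeader directives. missing_modules is a hypothetical helper name; it only parses text in the format printed by apachectl -M.

```shell
# Print every required Apache module that is absent from a module listing.
# Reads the listing (as printed by "apachectl -M") on standard input.
missing_modules() {
    listing="$(cat)"
    for mod in proxy_module proxy_http_module headers_module; do
        # -q suppresses output, -w matches whole words only
        printf '%s\n' "$listing" | grep -qw "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on the server (prints nothing when all three modules are loaded):
#   apachectl -M 2>/dev/null | missing_modules
```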
Modify the original VirtualHost configuration to pass all incoming traffic to Anubis. The result should look something like this:
<VirtualHost *:443>
ServerName your-great-site.ca
SSLEngine on
# Preserve original host & IP headers for Anubis
ProxyPreserveHost On
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set X-Forwarded-For %{REMOTE_ADDR}s
RequestHeader set X-Real-IP %{REMOTE_ADDR}s
ProxyPass / http://127.0.0.1:8923/
ProxyPassReverse / http://127.0.0.1:8923/
</VirtualHost>
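Create a second listener for the site's backend. Bind it to localhost so that clients cannot reach it directly and bypass Anubis. It should look something like this:
Listen 127.0.0.1:8023
<VirtualHost 127.0.0.1:8023>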
ServerName your-great-site-backend.ca
DocumentRoot /srv/www/htdocs
<Directory "/srv/www/htdocs">
Options FollowSymLinks
AllowOverride All
Require all granted
</Directory>
# Optional: disable SSL on backend
SSLEngine off
ErrorLog /var/log/apache2/wordpress-backend-error.log
CustomLog /var/log/apache2/wordpress-backend-access.log combined
</VirtualHost>
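The backend serves the site without any challenge, so it is worth confirming that port 8023 is not listening on a public address. The sketch below filters the output of ss -ltn and prints any listener on the given port that is not bound to a loopback address. exposed_listeners is a hypothetical helper name, and it assumes the usual ss column layout, with the local address in the fourth column.

```shell
# Print local addresses listening on port $1 that are not loopback.
# Reads "ss -ltn" output on standard input.
exposed_listeners() {
    awk -v port="$1" '
        {
            addr = $4                  # local address:port column
            n = length(port) + 1       # length of the ":PORT" suffix
            if (substr(addr, length(addr) - n + 1) != ":" port) next
            host = substr(addr, 1, length(addr) - n)
            if (host !~ /^127\./ && host != "[::1]") print addr
        }'
}

# Usage on the server (no output means the port is loopback-only or not open):
#   ss -ltn | exposed_listeners 8023
```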
Check the configuration syntax and restart Apache (on SLES the service is named apache2):
apachectl -t
systemctl restart apache2.service
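Once Apache is back up, comparing the public URL with the backend helps confirm that requests travel through the whole chain. The sketch below assumes curl is available; http_status is a hypothetical helper name. The status codes you see depend on the Anubis policy in effect, so none are shown here.

```shell
# Print only the HTTP status code for a URL.
# $1 = URL, $2 = User-Agent string to send
http_status() {
    # -k skips certificate verification (handy when testing from the server itself),
    # -A sets the User-Agent, -w '%{http_code}' prints the status code
    curl -k -s -o /dev/null -A "$2" -w '%{http_code}' "$1"
}

# Usage on the server:
#   http_status https://your-great-site.ca/ "Mozilla/5.0 (X11; Linux x86_64)"
#   http_status http://127.0.0.1:8023/ "Mozilla/5.0 (X11; Linux x86_64)"
```

The first call goes through Apache and Anubis, the second talks to the backend directly, so a difference between the two responses is a hint that Anubis is intercepting the request.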
Enjoy your server, free of unnecessary workload from AI bots!