Logging unconfigured domain names in Apache

Joor Loohuis, April 1, 2010

When using virtual hosting with Apache, occasionally you may have domain names pointing to your webserver that you don't know about yet. A little tweak to the logging format will help you find out what these domain names are.


Over here at Loco we host an interesting variety of websites and web applications on Apache with a fairly conventional virtual hosting setup. Most of the domains are configured using name-based virtual hosting, with a default host that serves a standard 'missing domain' page. This has served us well so far, but it has one drawback: if a domain name is configured in DNS to point to one of our servers but has no virtual host of its own, requests for it end up at the default virtual host together with all kinds of other traffic. This makes it hard to detect whether one of our customers has registered another domain name and just expects it to magically display the correct website.

The problem is that requests are normally logged with the 'combined' log format defined in the Apache configuration. This format logs the first line of each HTTP request, which normally contains only the path of the requested URL and not the domain name. Fortunately, the very property that makes name-based virtual hosting possible can be used to fix this in Apache 2.0 and up. HTTP/1.1, which is used by almost all user agents, requires a 'Host' request header containing the domain name of the request. We can use this bit of information to customize what Apache logs. The default 'combined' log format looks like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

The %r bit is the first line of the HTTP request, and we're extending this a little bit in another log format:

LogFormat "%h %l %u %t \"%m http://%{Host}i%U %H\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" hostnames

We reconstruct the first line of the HTTP request from the request method (%m), the value of the 'Host' HTTP request header (%{Host}i), the requested URL path (%U), and the request protocol (%H). Note that %U does not include the query string, which %r would have logged; appending %q directly after %U restores it. We call this log format 'hostnames', and we apply it only to the default virtual hosts:

CustomLog logs/access_log hostnames
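
For context, the pieces above might fit together as follows in the Apache configuration. This is a minimal sketch, not a copy of a real setup: the ServerName and DocumentRoot values are placeholders, and it assumes the default virtual host is simply the first one defined for *:80, which is how Apache picks the default for name-based virtual hosting.

```apache
# Defined once in the main server configuration, next to 'combined'.
LogFormat "%h %l %u %t \"%m http://%{Host}i%U %H\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" hostnames

# Apache 2.2 and earlier also need a "NameVirtualHost *:80" line before this block.
# The first <VirtualHost> listed for an address and port acts as the default
# for requests whose Host header matches no other virtual host.
<VirtualHost *:80>
    # Placeholder name and path, chosen for this example only.
    ServerName default.example.com
    DocumentRoot /var/www/missing-domain

    # Only this virtual host logs with the 'hostnames' format.
    CustomLog logs/access_log hostnames
</VirtualHost>
```

All other virtual hosts can keep logging with 'combined', since their domain name is already known from the configuration.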

The result is that rather than the more or less useless

23.75.345.200 - - [01/Apr/2010:14:42:17 +0200] "GET / HTTP/1.1" 404 1283 "-" "Some User Agent"

we log the following

23.75.345.200 - - [01/Apr/2010:14:42:17 +0200] "GET http://www.missing-domain.tld/ HTTP/1.1" 
404 1283 "-" "Some User Agent"

(line break added for clarity). Since we preserved the structure of the 'combined' log format, all our log parsers handle these lines without modification. In particular, we use logwatch to report on the status of the virtual hosts on our servers, and in the section for the default virtual hosts we now get a nice overview of any domain names that we may not know about. Mind you, this overview will also contain requests with spoofed Host headers and requests made to numeric IP addresses, but it still allows us to stay one step ahead of our customers, which is a good thing.
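
If requests made directly to an IP address add too much noise, they could probably be kept out of this log with conditional logging. The following is a sketch rather than a tested configuration: it assumes mod_setenvif is loaded, it only recognizes IPv4 addresses, and the variable name numeric_host is made up for the example.

```apache
# Flag requests whose Host header is a bare IPv4 address, optionally with a port.
SetEnvIf Host "^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?$" numeric_host

# Replaces the plain CustomLog line in the default virtual host:
# write a log entry only when the flag above is not set.
CustomLog logs/access_log hostnames env=!numeric_host
```

Requests with spoofed Host headers that look like domain names would still be logged, so the overview needs the same critical reading as before.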
