Digging deeper into AWstats

Armijn Hemel, January 21, 2011.

AWstats is a popular program for analyzing web server log files. The data files that AWstats produces are neat little databases from which you can extract a wealth of information.


In a previous article we looked at how to mine the files that AWstats generates.

Specifically, we looked at how to extract the per-day data. As it turns out, this does not always give you the complete picture; whether it does depends on the directives in your AWstats configuration. By default, visits by scripts and robots are not counted in the per-day data but are stored in a separate section specifically for robots. If your site is popular with search engines, this can account for quite a lot of traffic.
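To see how much traffic the robots account for, the robot section can also be read on its own. The following is a minimal sketch, assuming that each record in the BEGIN_ROBOT section has the robot ID in the first field and the bandwidth in bytes in the third field (the same field the script later in this article uses). The function name robot_bandwidth and the sample records are made up for illustration.

```python
# Hypothetical helper: bandwidth per robot from the lines of a
# BEGIN_ROBOT ... END_ROBOT section. Assumed record layout:
# robot-id hits bandwidth lastvisit hits-on-robots.txt

def robot_bandwidth(section_lines):
    """Map each robot ID to its bandwidth in bytes (third field)."""
    usage = {}
    for line in section_lines:
        fields = line.split()
        # Skip blank lines, comments and the BEGIN_/END_ marker lines.
        if len(fields) < 3 or fields[0].startswith(("#", "BEGIN_", "END_")):
            continue
        usage[fields[0]] = int(fields[2])
    return usage

# Invented sample records, not real data.
sample = """# invented sample records
BEGIN_ROBOT 2
googlebot 1520 48213377 20110120231502 12
msnbot 311 9120044 20110120180312 4
END_ROBOT""".splitlines()

# Print the robots, heaviest first.
usage = robot_bandwidth(sample)
for robot, nbytes in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print("%-12s %d" % (robot, nbytes))
```

With real data, you would pass in the lines between BEGIN_ROBOT and END_ROBOT from your own AWstats data file.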

Another source of bandwidth that we did not measure was the errors. The default configuration only counts bandwidth from requests with status 200 and 304 as valid and treats the rest as errors. This excludes codes such as 206 (partial content), which is common with large downloads, and 207 (multi-status), which shows up when running WebDAV (Subversion, web disks).
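To estimate how much bandwidth is hiding in the errors section, for example to see whether 206 and 207 are worth adding to the valid codes, you can break that section down by status code. This is a minimal sketch, assuming each record in the BEGIN_ERRORS section reads "code hits bandwidth"; the function name and the sample numbers are hypothetical.

```python
# Hypothetical helper: bandwidth per HTTP status code from the lines of a
# BEGIN_ERRORS ... END_ERRORS section (assumed layout: code hits bandwidth).

def bandwidth_by_code(section_lines):
    """Map each numeric status code to its bandwidth in bytes."""
    usage = {}
    for line in section_lines:
        fields = line.split()
        # Error records start with a numeric code; markers and comments do not.
        if len(fields) >= 3 and fields[0].isdigit():
            usage[int(fields[0])] = int(fields[2])
    return usage

# Invented sample records, not real data.
sample = [
    "BEGIN_ERRORS 3",
    "206 412 734003200",
    "207 1893 2203145",
    "404 87 312004",
    "END_ERRORS",
]

usage = bandwidth_by_code(sample)
# Codes we are considering treating as valid.
extra_valid = {206, 207}
moved = sum(nbytes for code, nbytes in usage.items() if code in extra_valid)
print("bandwidth in codes %s: %d bytes" % (sorted(extra_valid), moved))
```

In this invented sample nearly all of the "error" bandwidth comes from partial content, which is the kind of traffic you would probably want to count.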

One solution is to adapt the AWstats configuration to include more HTTP codes:

ValidHTTPCodes="200 206 207 304"

and to disable robot detection, so that robot traffic is counted along with normal visits:

LevelForRobotsDetection=0

Disabling robot detection does mean that you lose the per-robot statistics, which you might otherwise be interested in.
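Before changing anything, you may want to check which values your configuration file currently sets. The sketch below is a hypothetical, simplified reader for lines of the form Name=value or Name="value"; it skips comments and Include lines, does not follow includes, and keeps the last value if a directive appears more than once.

```python
# Hypothetical, simplified reader for AWstats-style "Name=value" directives.
# It does not follow Include statements or handle anything more elaborate.

def read_directives(lines):
    """Return a dict of directive name -> value (surrounding quotes removed)."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        # Ignore blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings

# Invented sample configuration lines.
sample = [
    "# Robots",
    "LevelForRobotsDetection=0",
    'ValidHTTPCodes="200 206 207 304"',
]

settings = read_directives(sample)
print("LevelForRobotsDetection:", settings.get("LevelForRobotsDetection"))
print("ValidHTTPCodes:", settings.get("ValidHTTPCodes", "").split())
```

To check a real configuration, pass it the lines of your own file, for example the result of open(path).readlines().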

Another, perhaps better, solution is to extend the script we wrote earlier so that it also reports the bandwidth from robots and errors:

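#! /usr/bin/python
# Sum the bandwidth recorded in the day, robot and error sections
# of the AWstats data files given on the command line.

import sys

def sumSection(awfile, offset, column):
        # Jump to the BEGIN_<SECTION> line; its second field is the record count.
        awfile.seek(offset)
        numrecords = int(awfile.readline().split()[1])
        sectiontotal = 0
        for i in range(numrecords):
                # 'column' is the index of the bandwidth field in each record.
                sectiontotal += int(awfile.readline().split()[column])
        return sectiontotal

def processFile(awfile):
        pos = 0
        robots = 0
        errors = 0
        # The map at the top of the file gives the byte offset of each section.
        # Keep reading until all three offsets are found: stopping at POS_DAY
        # would miss POS_ROBOT and POS_ERRORS if they are listed after it.
        line = awfile.readline()
        while line:
                fields = line.split()
                if len(fields) >= 2:
                        if fields[0] == b"POS_DAY":
                                pos = int(fields[1])
                        elif fields[0] == b"POS_ROBOT":
                                robots = int(fields[1])
                        elif fields[0] == b"POS_ERRORS":
                                errors = int(fields[1])
                if pos and robots and errors:
                        break
                line = awfile.readline()

        tmptotal = 0
        # Day records: date pages hits bandwidth visits (bandwidth is field 3).
        if pos:
                tmptotal += sumSection(awfile, pos, 3)
        # Robot records: ID hits bandwidth ... (bandwidth is field 2).
        if robots:
                tmptotal += sumSection(awfile, robots, 2)
        # Error records: code hits bandwidth (bandwidth is field 2).
        if errors:
                tmptotal += sumSection(awfile, errors, 2)
        return tmptotal

total = 0
for i in sys.argv[1:]:
        # Binary mode, so the byte offsets from the map can be passed to seek().
        with open(i, "rb") as awfile:
                total += processFile(awfile)

print(total)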
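To try the script without a real AWstats data file, you can generate a tiny one. The sketch below is a hypothetical test helper: it writes only a map and the DAY, ROBOT and ERRORS sections, and it zero-pads the offsets in the map to a fixed width (real AWstats files do not do this) so that the map's length does not depend on the offset values. The record layouts are assumed to match the fields the script reads.

```python
# Hypothetical test helper: build a minimal AWstats-like data file holding
# only a map plus the given sections, with byte offsets matching the content.

def build_sample(sections, width=10):
    """sections: list of (NAME, [record, ...]); returns the file as bytes."""
    def render_map(offsets):
        lines = ["BEGIN_MAP %d" % len(sections)]
        # Zero-padded offsets keep the map the same length whatever the values.
        lines += ["POS_%s %0*d" % (name, width, offsets.get(name, 0))
                  for name, _ in sections]
        lines.append("END_MAP")
        return "".join(line + "\n" for line in lines)

    header = render_map({})
    body = ""
    offsets = {}
    for name, records in sections:
        # Offset of the BEGIN_<NAME> line, counted from the start of the file.
        offsets[name] = len(header) + len(body)
        body += "BEGIN_%s %d\n" % (name, len(records))
        body += "".join(record + "\n" for record in records)
        body += "END_%s\n" % name
    return (render_map(offsets) + body).encode("ascii")

# Invented records: day bandwidth 3000, robot bandwidth 500, error bandwidth 100.
sample = build_sample([
    ("DAY", ["20110101 10 20 3000 5"]),
    ("ROBOT", ["googlebot 4 500 20110101120000 1"]),
    ("ERRORS", ["404 2 100"]),
])
print(sample.decode("ascii"))
```

Write sample to a file opened with mode "wb" and pass that file to the script. For this sample the day, robot and error bandwidth add up to 3000 + 500 + 100 = 3600 bytes, which is what a correct run should print.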