Digging deeper into AWstats
January 21, 2011
AWstats is a popular program for analyzing log files. The data files it generates are neat little databases from which you can extract a wealth of information.
In a previous article we looked at how to mine the files that AWstats generates.
We specifically looked at how to extract the per-day data. As it turns out, that data does not always give you the complete picture; whether it does depends on the directives in your AWstats configuration. By default, visits by scripts and robots are not counted in the per-day section. They are stored in a separate section dedicated to robots instead. If your site is popular with search engines, that can add up to quite a lot of traffic.
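The per-day figures and the robot figures live in separate sections of the same data file. The following is a minimal sketch, not AWstats' own code. It assumes a MAP section near the top of the file with "POS_<NAME> <offset>" lines, ending in END_MAP. At each offset it expects a "BEGIN_<NAME> <count>" line followed by that many records. With those assumptions, a hypothetical read_section helper can pull out any one section:

```python
def read_section(awfile, name):
    """Return the records of one section (e.g. "DAY" or "ROBOT") as field lists.

    awfile must be opened in binary mode, since the MAP offsets are byte
    positions (an assumption about the data-file layout).
    """
    key = b"POS_" + name.encode("ascii")
    offset = None
    awfile.seek(0)
    # Look the section up in the MAP index at the top of the file
    for line in iter(awfile.readline, b""):
        fields = line.split()
        if fields and fields[0] == b"END_MAP":
            break
        if len(fields) >= 2 and fields[0] == key:
            offset = int(fields[1])
            break
    if offset is None:
        raise KeyError(name)
    # At the offset: "BEGIN_<NAME> <count>", then <count> records
    awfile.seek(offset)
    header = awfile.readline().split()
    if not header or header[0] != b"BEGIN_" + name.encode("ascii"):
        raise ValueError("offset for %s does not point at BEGIN_%s" % (name, name))
    return [awfile.readline().split() for _ in range(int(header[1]))]
```

Assuming the usual column order, sum(int(row[3]) for row in read_section(f, "DAY")) would give the regular bandwidth. The robots' share would then sit in the third field of each record returned by read_section(f, "ROBOT").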
Another source of bandwidth that we did not measure is errors. The default configuration treats only responses with status 200 or 304 as valid and counts everything else as an error. That excludes codes such as 206 (Partial Content) and 207 (Multi-Status), which we see often with big downloads or when running WebDAV (Subversion, web disks).
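To see why the totals drift apart, here is a simplified model of the rule just described. It is a sketch, not AWstats' own code. parse_valid_codes and split_bandwidth are hypothetical helpers, and the request list is made up:

```python
def parse_valid_codes(value):
    """Turn a ValidHTTPCodes value such as "200 304" into a set of ints."""
    return {int(code) for code in value.split()}


def split_bandwidth(requests, valid_codes):
    """Split (status, bytes) pairs into (valid bytes, error bytes)."""
    valid = errors = 0
    for status, size in requests:
        if status in valid_codes:
            valid += size
        else:
            errors += size
    return valid, errors


# Made-up requests: a page, a cache revalidation, a ranged download chunk
requests = [(200, 15000), (304, 0), (206, 50000000)]
print(split_bandwidth(requests, parse_valid_codes("200 304")))
# (15000, 50000000)
print(split_bandwidth(requests, parse_valid_codes("200 206 207 304")))
# (50015000, 0)
```

With only 200 and 304 accepted, the large ranged download ends up entirely on the error side.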
One solution is to adapt the AWstats configuration so that it accepts more HTTP status codes as valid:
ValidHTTPCodes="200 206 207 304"
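and to ignore robots, so that their traffic is counted along with regular visits.
For robots, though, this means losing information you might otherwise be interested in, such as which crawlers visit your site and how often.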
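In the stock awstats.conf, robot detection is controlled by the LevelForRobotsDetection directive, whose comments document a value of 0 as disabling detection. Assuming your AWstats version behaves the same (check the comments in your own configuration file), ignoring robots could look like this:

```
LevelForRobotsDetection=0
```

Note that configuration changes like this one, or the ValidHTTPCodes change, only affect log lines processed afterwards. Figures already stored in the AWstats data files are not recalculated unless you rebuild them from the logs.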
Another, perhaps better, solution is to extend the script we wrote earlier so that its total also includes the bandwidth used by robots and errors.
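#! /usr/bin/python
# Sum the bandwidth (in bytes) stored in one or more AWstats data files:
# regular traffic from the DAY section plus the ROBOT and ERRORS sections.
import sys

def sumColumn(awfile, offset, column):
    # Jump to a "BEGIN_<SECTION> <count>" line and add up the given
    # column of the <count> records that follow it.
    awfile.seek(offset)
    count = int(awfile.readline().split()[1])
    subtotal = 0
    for _ in range(count):
        subtotal += int(awfile.readline().split()[column])
    return subtotal

def processFile(awfile):
    pos = robots = errors = 0
    # The MAP section at the top of the file lists the byte offset of
    # every section, e.g. "POS_DAY 12345", and ends with "END_MAP".
    line = awfile.readline()
    while line:
        fields = line.split()
        if fields and fields[0] == b"END_MAP":
            break
        if len(fields) >= 2:
            if fields[0] == b"POS_DAY":
                pos = int(fields[1])
            elif fields[0] == b"POS_ROBOT":
                robots = int(fields[1])
            elif fields[0] == b"POS_ERRORS":
                errors = int(fields[1])
        if pos and robots and errors:
            break
        line = awfile.readline()

    tmptotal = 0
    # Bandwidth is the 4th field of a DAY record (date, pages, hits,
    # bandwidth, visits) and the 3rd field of a ROBOT or ERRORS record.
    for offset, column in ((pos, 3), (robots, 2), (errors, 2)):
        if offset:  # skip sections that are missing from the map
            tmptotal += sumColumn(awfile, offset, column)
    return tmptotal

total = 0
for name in sys.argv[1:]:
    # Binary mode, because the MAP offsets are byte positions
    with open(name, "rb") as awfile:
        total += processFile(awfile)
print(total)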
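The script folds everything into a single number. If you would rather see the three sources separately, a variant along these lines could work. It is a sketch under the same assumptions about the data-file layout and column positions. bandwidth_by_section and COLUMNS are hypothetical names:

```python
import sys

# Field index of the bandwidth column in each section's records (assumed):
# DAY is "date pages hits bandwidth visits"; ROBOT and ERRORS records
# carry bandwidth in their third field.
COLUMNS = {b"DAY": 3, b"ROBOT": 2, b"ERRORS": 2}


def bandwidth_by_section(awfile):
    """Return {"DAY": bytes, "ROBOT": bytes, "ERRORS": bytes} for one file."""
    # Collect section offsets from the MAP ("POS_<NAME> <offset>" lines)
    offsets = {}
    for line in iter(awfile.readline, b""):
        fields = line.split()
        if fields and fields[0] == b"END_MAP":
            break
        if len(fields) >= 2 and fields[0].startswith(b"POS_"):
            name = fields[0][len(b"POS_"):]
            if name in COLUMNS:
                offsets[name] = int(fields[1])

    result = {}
    for name, offset in offsets.items():
        # Each section starts with "BEGIN_<NAME> <count>"
        awfile.seek(offset)
        count = int(awfile.readline().split()[1])
        column = COLUMNS[name]
        result[name.decode("ascii")] = sum(
            int(awfile.readline().split()[column]) for _ in range(count))
    return result


if __name__ == "__main__":
    for filename in sys.argv[1:]:
        # Binary mode: the MAP offsets are byte positions
        with open(filename, "rb") as awfile:
            for name, size in sorted(bandwidth_by_section(awfile).items()):
                print("%s %s %d" % (filename, name, size))
```

Run against a month's data file (for example a hypothetical awstats012011.example.txt), it prints one line per section. That makes it easy to see how much of the traffic came from robots or was booked as errors.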