Digging deeper into AWstats
Armijn Hemel, January 21, 2011
AWstats is a popular program to analyze logfiles. The AWstats files themselves are neat little databases from which you can extract a wealth of information.
Tags: administration, awstats, web, webdevelopment
In a previous article we looked at how to mine the files that AWstats generates.
We specifically looked at how to extract the per-day data. As it turns out, this does not always give you the full picture, depending on the directives in your AWstats configuration. By default, visits by scripts and robots are not counted in the per-day data but are stored in a separate section of their own. This can be quite a lot of traffic if your site is popular with search engines.
Another source of bandwidth that we did not measure was errors. The default configuration only counts bandwidth from requests with status 200 and 304 as valid and treats the rest as errors. This excludes codes such as 206 (partial content) and 207 (multi-status), which we often see with big downloads or when running WebDAV (Subversion, webdisks).
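To make the effect of that default concrete, here is a minimal sketch (not AWstats code) that splits bandwidth the way the paragraph above describes. The request list and the function name split_bandwidth are made up for illustration.

```python
# Illustrative only: mimics how a list of valid status codes decides
# whether a request's bytes count as regular traffic or as errors.
DEFAULT_VALID = {200, 304}


def split_bandwidth(requests, valid_codes=DEFAULT_VALID):
    """Return (valid_bytes, error_bytes) for a list of (status, size) pairs."""
    valid = sum(size for status, size in requests if status in valid_codes)
    errors = sum(size for status, size in requests if status not in valid_codes)
    return valid, errors


# Hypothetical requests: (HTTP status, bytes sent)
requests = [(200, 5000), (206, 700000), (207, 1200), (304, 0), (404, 300)]

print(split_bandwidth(requests))                        # default codes
print(split_bandwidth(requests, {200, 206, 207, 304}))  # extended codes
```

With the default codes the single large 206 download ends up entirely on the error side, which is why per-day totals can look far too low on a site that serves big files.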
One solution is to adapt the AWstats configuration to include more HTTP codes:
ValidHTTPCodes="200 206 207 304"
and to turn off robot detection, so that robot traffic is counted along with regular visits:
LevelForRobotsDetection=0
Turning off robot detection has a cost: you lose the separate robot statistics, which you might otherwise be interested in.
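Before changing anything it can help to see which values are currently in effect. The following is a rough sketch that looks at a single piece of configuration text only. The function name read_directives and the sample text (including its LogFile path) are hypothetical, and the defaults in DEFAULTS are assumptions you should check against your AWstats version.

```python
import re

# Assumed defaults; verify them against the documentation of your version.
DEFAULTS = {"ValidHTTPCodes": "200 304", "LevelForRobotsDetection": "2"}


def read_directives(text, names=DEFAULTS):
    """Return the effective value of each directive in `names`."""
    values = dict(names)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        match = re.match(r'(\w+)\s*=\s*"?([^"#]*)"?', line)
        if match and match.group(1) in values:
            values[match.group(1)] = match.group(2).strip()
    return values


# Hypothetical configuration fragment
sample = '''
# comment
LogFile="/var/log/apache2/access.log"
ValidHTTPCodes="200 206 207 304"
#LevelForRobotsDetection=0
'''

print(read_directives(sample))
```

In the sample the robot directive is commented out, so the assumed default is reported for it.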
Another, perhaps better solution is to extend the script we wrote earlier and also have it report the bandwidth from robots and errors.
#!/usr/bin/env python3
import sys

total = 0

def processFile(awfile):
    tmptotal = 0
    pos = None
    robots = None
    errors = None
    # Read the map at the top of the file to find the section offsets.
    line = awfile.readline()
    while line:
        fields = line.split()
        if len(fields) > 1:
            if fields[0] == b"POS_DAY":
                pos = int(fields[1])
            elif fields[0] == b"POS_ROBOT":
                robots = int(fields[1])
            elif fields[0] == b"POS_ERRORS":
                errors = int(fields[1])
        # Stop only once all three offsets are known.
        if pos is not None and robots is not None and errors is not None:
            break
        line = awfile.readline()
    # Sum the bandwidth column of each section; a section that is missing
    # from the map is skipped. Bandwidth is the fourth field of a day
    # record and the third field of a robot or error record.
    for offset, column in ((pos, 3), (robots, 2), (errors, 2)):
        if offset is None:
            continue
        awfile.seek(offset)
        count = int(awfile.readline().split()[1])
        for i in range(count):
            tmptotal += int(awfile.readline().split()[column])
    return tmptotal

for name in sys.argv[1:]:
    # Binary mode keeps the byte offsets from the map valid for seek().
    with open(name, "rb") as awfile:
        total += processFile(awfile)
print(total)
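If you want to see how much each section contributes instead of a single grand total, the same seek-and-sum idea can be sketched as follows. This is a variation for illustration, not the script above: the names section_bandwidth, bandwidth_by_section and make_sample are made up, and the sample file is a tiny invented one that only follows the layout the article relies on (a map of POS_ offsets, then BEGIN_/END_ sections).

```python
import io


def section_bandwidth(f, offset, column):
    """Sum one whitespace-separated column over the records of a section."""
    f.seek(offset)
    count = int(f.readline().split()[1])  # e.g. b"BEGIN_DAY 2"
    return sum(int(f.readline().split()[column]) for _ in range(count))


def bandwidth_by_section(f):
    """Return the bandwidth found in the DAY, ROBOT and ERRORS sections."""
    wanted = {
        b"POS_DAY": ("days", 3),
        b"POS_ROBOT": ("robots", 2),
        b"POS_ERRORS": ("errors", 2),
    }
    offsets = {}
    for line in iter(f.readline, b""):
        fields = line.split()
        if len(fields) > 1 and fields[0] in wanted:
            offsets[fields[0]] = int(fields[1])
            if len(offsets) == len(wanted):
                break
    return {
        wanted[key][0]: section_bandwidth(f, offset, wanted[key][1])
        for key, offset in offsets.items()
    }


def make_sample():
    """Build a tiny, invented file that follows the assumed layout."""
    sections = [
        (b"DAY", [b"20110101 10 20 1000 3", b"20110102 5 8 500 2"]),
        (b"ROBOT", [b"examplebot 4 250 20110102120000 1"]),
        (b"ERRORS", [b"404 2 60", b"206 1 4000"]),
    ]
    bodies = []
    for name, records in sections:
        lines = [b"BEGIN_" + name + b" %d" % len(records)] + records
        lines.append(b"END_" + name)
        bodies.append(b"\n".join(lines) + b"\n")
    # Offsets are padded to a fixed width so the map length is known up front.
    map_len = sum(len(b"POS_" + name + b" ") + 10 + 1 for name, _ in sections)
    header, offset = b"", map_len
    for (name, _), body in zip(sections, bodies):
        header += b"POS_" + name + b" " + b"%-10d" % offset + b"\n"
        offset += len(body)
    return io.BytesIO(header + b"".join(bodies))


print(bandwidth_by_section(make_sample()))
```

On a real file you would pass an object opened with open(name, "rb") instead of the in-memory sample.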
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Netherlands License.










