Digging through AWstats files

Armijn Hemel, May 20, 2010

AWstats is a popular program to analyze logfiles. The AWstats files themselves are neat little databases from which you can extract a wealth of information.

With AWstats you can do neat things with Apache logfiles, and it crunches large logfiles very efficiently. AWstats has been developed and optimized for many years and it is fast, so we let it do the heavy lifting and then reuse the information it has generated. What AWstats extracts from the logfiles ranges from bandwidth used to hits and search engine query strings. It writes all of this to a simple text file, typically one per month per domain. The location of these files varies, but /var/www/awstats/ is fairly typical.

At the beginning of the AWstats file there is a section listing the byte offsets within the file of the categories that AWstats reports on, for example:

POS_VISITOR 52400

This says that the section with the list of visitors starts at byte offset 52400 in the file.
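As an illustration, the offset list could be read into a dictionary in one pass. This is only a sketch: the function name `read_offsets` is made up for this example, the assumption that the list ends at a line starting with `END_MAP` should be checked against your own files, and the sample data (apart from the POS_VISITOR line above) is invented.

```python
import io

def read_offsets(awfile):
    """Collect POS_<SECTION> entries from the offset list at the top of the file.

    Expects a file object opened in binary mode. Stopping at END_MAP is an
    assumption about the file layout; without it the whole file is scanned.
    """
    offsets = {}
    for line in iter(awfile.readline, b""):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == b"END_MAP":
            break
        if fields[0].startswith(b"POS_") and len(fields) >= 2:
            # Strip the "POS_" prefix so the key is the section name
            offsets[fields[0][4:].decode("ascii")] = int(fields[1])
    return offsets

# Invented sample; only the POS_VISITOR line comes from the article
sample = io.BytesIO(b"BEGIN_MAP 2\nPOS_VISITOR 52400\nPOS_DAY 1234\nEND_MAP\n")
offsets = read_offsets(sample)
print(offsets["VISITOR"])
```

With a dictionary like this, any section can be reached with a single seek() to the stored offset.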

One of the things that AWstats can measure is how many bytes were sent by Apache. This is useful for us, since we use it to monitor bandwidth trends. The information is kept in the section with daily reports, whose offset is listed as POS_DAY. By looking up that offset and jumping to the section we can easily get at the right data.

The section header specifies for how many days there are logs:

BEGIN_DAY 19

After that there is a line of data per day. A line of data looks like this:

20100501 768 1396 8933719 126

The first number is obviously the date. The remaining numbers are the number of pages served on that day, the number of hits, the bandwidth in bytes and the number of unique visits.
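To make the field order concrete, here is a small sketch that turns one such line into a named record. The names `DayStats` and `parse_day_line` are hypothetical and chosen only for this example; the function expects a text string, so a line read in binary mode would need decoding first.

```python
from collections import namedtuple
from datetime import datetime

# Field order as described above: date, pages, hits, bandwidth, visits
DayStats = namedtuple("DayStats", "date pages hits bandwidth visits")

def parse_day_line(line):
    """Parse one line of the daily section into a DayStats record."""
    fields = line.split()
    date = datetime.strptime(fields[0], "%Y%m%d").date()
    pages, hits, bandwidth, visits = (int(f) for f in fields[1:5])
    return DayStats(date, pages, hits, bandwidth, visits)

stats = parse_day_line("20100501 768 1396 8933719 126")
print(stats.date.isoformat(), stats.bandwidth)
```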

So a script that prints the total amount of data sent over all days in a single month (we use one file per month) needs to do the following:

  1. find the offset of the daily section in the file
  2. jump there and find out how many days there are
  3. loop over the right number of lines (which is the number of days) and sum the bandwidth field

or, in Python:

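#!/usr/bin/env python3

import sys

def process_file(awfile):
    """Return the total bandwidth in bytes from the daily section of one file."""
    pos = None
    # The list at the top of the file maps section names to byte offsets
    for line in iter(awfile.readline, b""):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == b"POS_DAY":
            pos = int(fields[1])
            break
    if pos is None:
        # No daily section listed, so there is nothing to add
        return 0

    # Jump to the daily section; its header reads "BEGIN_DAY <number of days>"
    awfile.seek(pos)
    days = int(awfile.readline().split()[1])

    # Each line holds: date, pages, hits, bandwidth, visits
    subtotal = 0
    for _ in range(days):
        subtotal += int(awfile.readline().split()[3])
    return subtotal

total = 0
for path in sys.argv[1:]:
    # Binary mode, because the offsets in the file are byte offsets
    with open(path, "rb") as awfile:
        total += process_file(awfile)

print(total)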

You could run this script with a command similar to this (it accepts more than one file and prints the sum over all of them):

$ awstats.py /path/to/awstatsfile
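
Since the goal is to watch bandwidth trends, a per-day breakdown may be more useful than a single total. The following is a sketch along the same lines as the script; the function name `daily_bandwidth` is made up here, and the sample file is built in memory with an invented second day so the example runs without a real AWstats file.

```python
import io

def daily_bandwidth(awfile):
    """Return a list of (date string, bytes sent) pairs from the daily section."""
    pos = None
    for line in iter(awfile.readline, b""):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == b"POS_DAY":
            pos = int(fields[1])
            break
    if pos is None:
        return []
    awfile.seek(pos)
    days = int(awfile.readline().split()[1])
    result = []
    for _ in range(days):
        fields = awfile.readline().split()
        result.append((fields[0].decode("ascii"), int(fields[3])))
    return result

# Tiny stand-in for a real file: the header line is exactly 11 bytes long,
# so the daily section starts at byte offset 11
header = b"POS_DAY 11\n"
section = (b"BEGIN_DAY 2\n"
           b"20100501 768 1396 8933719 126\n"
           b"20100502 800 1500 9000000 130\n"   # invented data
           b"END_DAY\n")
trend = daily_bandwidth(io.BytesIO(header + section))
for day, sent in trend:
    print(day, sent)
```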