Digging through AWstats files
May 20, 2010
AWstats is a popular program for analyzing logfiles. The data files it writes are neat little databases from which you can extract a wealth of information.
AWstats crunches large Apache logfiles very efficiently. It has been developed and optimized for many years, so we let it do the heavy lifting and then work with the data it has generated. That data ranges from bandwidth used to hits and search engine query strings. AWstats writes it to a simple text file, typically one per month per domain. The location of these files varies, but /var/www/awstats/ is fairly typical.
At the beginning of the AWstats file there is a section that lists the byte offsets inside the file for each category that AWstats reports on, for example:
This says that the section with the list of visits can be found at byte offset 52400 in the file.
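As a minimal sketch of how such an offset list could be read: the function below collects every line whose first field starts with `POS_` into a dictionary of byte offsets. The `POS_` prefix follows the `POS_DAY` key that the script in this post looks for. The sample data, the `POS_VISITOR` key, the `END_MAP` terminator and the function name `read_offsets` are assumptions made for illustration, not copied from a real file.

```python
import io


def read_offsets(awfile):
    """Collect 'POS_<SECTION> <offset>' lines into a dict of byte offsets."""
    offsets = {}
    for line in awfile:
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith(b"POS_"):
            offsets[fields[0].decode("ascii")] = int(fields[1])
        elif offsets:
            # Assumption: the offset lines are contiguous, so the first
            # non-matching line after them ends the list.
            break
    return offsets


# Hypothetical sample data, opened as bytes because the offsets are byte offsets.
sample = io.BytesIO(b"POS_VISITOR 52400\nPOS_DAY 61234\nEND_MAP\n")
print(read_offsets(sample))
```

With a dictionary like this, jumping to a section is a matter of calling `seek()` with the stored offset on a file opened in binary mode.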
One of the things AWstats measures is how many bytes Apache sent. We use this to monitor bandwidth trends. The figures are kept in the section with daily reports, so by looking up its offset and jumping to that section we get direct access to the data.
The section header specifies for how many days there are logs:
After that there is a line of data per day. A line of data looks like this:
20100501 768 1396 8933719 126
The first number is the date, written as YYYYMMDD. The remaining numbers are the number of pages served that day, the number of hits, the bandwidth in bytes and the number of unique visits.
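To make the column order concrete, here is a small sketch that splits one daily line into named fields. The names `DayStats` and `parse_day_line` are made up for this example. The field order follows the description above.

```python
from collections import namedtuple

# Field order as described in the text: date, pages, hits, bandwidth, visits.
DayStats = namedtuple("DayStats", "date pages hits bandwidth visits")


def parse_day_line(line):
    """Split one daily line into named fields; the date stays a string."""
    fields = line.split()
    return DayStats(fields[0], *(int(value) for value in fields[1:5]))


stats = parse_day_line("20100501 768 1396 8933719 126")
print(stats.bandwidth)  # 8933719
```

Naming the columns once avoids scattering bare indexes such as `[3]` through the rest of a script.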
So a script that prints the total bandwidth for all days in a single month (we use one file per month) needs to do the following:
- find the offset of the daily section in the file
- find out how many days there are
- loop over that many lines (one per day) and sum the bandwidth column
or, in Python:
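#!/usr/bin/env python3
import sys


def process_file(awfile):
    """Sum the bandwidth column of the daily section of one AWstats file."""
    # Step 1: scan the offset list for the byte offset of the daily section.
    pos = None
    for line in awfile:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == b"POS_DAY":
            pos = int(fields[1])
            break
    if pos is None:
        return 0

    # Step 2: jump to the section. The day count is assumed to be the
    # second field of the section header.
    awfile.seek(pos)
    days = int(awfile.readline().split()[1])

    # Step 3: each line is "date pages hits bandwidth visits".
    subtotal = 0
    for _ in range(days):
        subtotal += int(awfile.readline().split()[3])
    return subtotal


total = 0
for name in sys.argv[1:]:
    # Binary mode, because the offsets in the file are byte offsets.
    with open(name, "rb") as awfile:
        total += process_file(awfile)
print(total)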
You could run this script with a command similar to this:
$ awstats.py /path/to/awstatsfile