For reasons that are pretty much historical, I use Webalizer for analysing my web site access logs. (I’m in the market for something better: any suggestions would be welcome.) Since earlier this month, however, my statistics have been broken: Webalizer crashed with a segmentation fault (SIGSEGV) every time it tried to parse the access log. The culprit wasn’t too difficult to find, but it was an interesting exercise that probably benefits from being passed on.
In order to try to work out what was happening, I employed the services of strace, a handy debugging tool that monitors the system calls made by an application.
strace -o strace.log webalizer (normal command-line options)
After a few minutes of parsing, Webalizer died as before. I checked the last few lines of the trace:
read(3, ".1" 200 5981 "http://po-ru.com/p"..., 4096) = 4096
read(3, "ul/small/IMG_2233.JPG HTTP/1.1" "..., 4096) = 4096
read(3, "ot-Mobile/2.1; +http://www.googl"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
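Only the end of the trace matters here, since the last read() before the crash holds the data Webalizer was chewing on when it died. A minimal sketch of pulling that out is below; to keep it self-contained, it builds a stand-in trace file from the lines quoted above rather than reading the real strace.log.

```shell
#!/bin/sh
# Stand-in for strace.log: the final lines of the trace quoted above.
trace=$(mktemp)
trap 'rm -f "$trace"' EXIT
cat > "$trace" <<'EOF'
read(3, ".1" 200 5981 "http://po-ru.com/p"..., 4096) = 4096
read(3, "ul/small/IMG_2233.JPG HTTP/1.1" "..., 4096) = 4096
read(3, "ot-Mobile/2.1; +http://www.googl"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
EOF

# Keep only read() calls, then take the last one before the signal.
last_read=$(grep '^read(' "$trace" | tail -n 1)
echo "$last_read"
```

Against the real trace, the equivalent one-liner would be grep '^read(' strace.log | tail -n 1.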
From this, I could see that the last chunk read before the crash contained the text ‘ot-Mobile’. That looked like a fragment of an unusual user agent, which piqued my interest. Running grep for it on the access log turned up something worth following up: a user agent I’d never seen before.
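The grep step might look something like this sketch. The sample log is hypothetical (made-up addresses and placeholder dates in Apache's combined log format); only the Googlebot-Mobile user agent is the one quoted in this post. In that format the user agent is the third double-quoted field, so splitting on the quote character puts it in awk's sixth field, assuming no stray quotes inside the request or referrer.

```shell
#!/bin/sh
# Hypothetical sample in combined log format; only the second agent is real.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "GET / HTTP/1.1" 200 5981 "-" "Mozilla/5.0"
192.0.2.2 - - [01/Jan/2000:00:00:01 +0000] "GET / HTTP/1.1" 200 5981 "-" "Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
EOF

# Count the lines containing the fragment from the trace...
matches=$(grep -c 'ot-Mobile' "$log")
# ...and list the distinct user agents behind them.
agents=$(grep 'ot-Mobile' "$log" | awk -F'"' '{print $6}' | sort -u)
echo "$matches matching line(s)"
echo "$agents"
```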
"Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
From there, I did something very simple: I just removed the four lines containing ‘Googlebot-Mobile’ from the access log before feeding it to Webalizer.
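Removing the offending lines amounts to an inverted grep. Here is a minimal sketch, again run against a hypothetical sample log rather than the real one. It writes a filtered copy so that the original log stays untouched and Webalizer can be pointed at the copy instead.

```shell
#!/bin/sh
# Hypothetical three-line sample; one line carries the problem user agent.
log=$(mktemp)
cleaned=$(mktemp)
trap 'rm -f "$log" "$cleaned"' EXIT
cat > "$log" <<'EOF'
192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "GET / HTTP/1.1" 200 5981 "-" "Mozilla/5.0"
192.0.2.2 - - [01/Jan/2000:00:00:01 +0000] "GET / HTTP/1.1" 200 5981 "-" "Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
192.0.2.3 - - [01/Jan/2000:00:00:02 +0000] "GET / HTTP/1.1" 200 5981 "-" "Mozilla/5.0"
EOF

# -v keeps lines that do NOT match; -F treats the pattern as a fixed string.
grep -vF 'Googlebot-Mobile' "$log" > "$cleaned"

before=$(wc -l < "$log")
after=$(wc -l < "$cleaned")
echo "removed $((before - after)) line(s)"
```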
This warrants further investigation, but for now, I’m happy to have fixed the problem and to have got my logs back.