Sunday, July 11, 2010

Analyzing Web Logs with AWStats

One of the important tasks in improving a web site's performance is analyzing the web server access logs. It is often overlooked, but it is an important task. Access logs provide valuable input to continuous improvement initiatives, ranging from architecture and content enhancements to traffic generation.

In this post we discuss a simple way of analyzing web logs. Analyzing web logs is usually a resource-intensive task and is typically performed offline. We will look at a simple tool called AWStats, which can do both offline and real-time analysis of the logs. This post covers offline processing.

What is AWStats?

AWStats is a simple open source log analyzer which generates various web site statistics. It can analyze several log file formats, such as the Common Log Format and W3C, and it can also be customized to handle other log formats.

Information / statistics provided by AWStats:

- Number of unique users and visits
- Hit ratios
- Bandwidth used
- Peak hour information
- Lists of hosts, browsers, and operating systems used to access the web site
- Type of content served / accessed.
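
To make these statistics a little more concrete, here is a minimal Python sketch that computes a few of them (hits, unique hosts, bandwidth, and peak hour) from Common Log Format lines. It is only an illustration and not how AWStats itself works; the function name summarize and the sample log lines are made up for this example.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Return hits, unique hosts, bytes served, and the busiest hour."""
    hosts = set()
    hits_per_hour = Counter()
    hits = 0
    total_bytes = 0
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in Common Log Format
        hits += 1
        hosts.add(match.group("host"))
        if match.group("size") != "-":
            total_bytes += int(match.group("size"))
        # The time field looks like 30/Jun/2010:10:15:02 +0530;
        # the hour is the part after the first colon.
        hour = int(match.group("time").split(":")[1])
        hits_per_hour[hour] += 1
    peak_hour = hits_per_hour.most_common(1)[0][0] if hits_per_hour else None
    return {
        "hits": hits,
        "unique_hosts": len(hosts),
        "bytes": total_bytes,
        "peak_hour": peak_hour,
    }


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [30/Jun/2010:10:15:02 +0530] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [30/Jun/2010:10:47:11 +0530] "GET /logo.gif HTTP/1.1" 200 1024',
    '10.0.0.1 - - [30/Jun/2010:11:05:45 +0530] "GET /about.html HTTP/1.1" 304 -',
    'this line is not a log record',
]

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

AWStats reports far more than this (visits, browsers, operating systems, and so on), which is why it is worth using the tool rather than a hand-written script.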

Other Features of AWStats:

- Logs can be analyzed via a CGI program or via offline command-line scripts.
- Statistics can be updated incrementally. For example, you can process May's logs and then June's, and AWStats will provide combined statistics for the site.
- It can be customized to generate different types of reports. It supports a standard plug-in model.

The steps below cover offline processing.

System Requirements:

- It can run on any operating system that supports Perl. You just need Perl or ActivePerl to run it.

Steps to Analyze:

(1) Install AWStats:
Download AWStats. For Windows it is available as a zip or an exe. Download the exe and install it. Let's say we have installed AWStats in the e:\tools\AWStats directory.

(2) Install Perl:
If Perl is already installed on the system, you can skip this step. Otherwise, install the latest version of Perl or ActivePerl.

(3) Prepare the Access / Request Log:
Each web server has a different way of generating the access log.
  • If you are using Apache Web Server, you can edit the httpd.conf file and enable the access log. Look for the CustomLog directive in httpd.conf and enable it. For example, the directive "CustomLog c:\temp\access.log common" writes access entries to c:\temp\access.log in the common format.
  • If you are using Apache Tomcat, you can enable the access log by uncommenting the access log valve in server.xml.
  • Once the necessary changes are made, restart the server processes. On some operating systems this may not be necessary.
  • Once the access logs are generated, copy them to the machine where AWStats is installed.
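
Before configuring AWStats it helps to know which format the web server actually wrote. The following is a rough Python sketch that guesses whether a log line is in the Apache common or combined format. The function name detect_log_format and the sample lines are made up for illustration, and the patterns are simplified (for example, they do not handle escaped quotes inside the user agent).

```python
import re

# host ident user [time] "request" status bytes
COMMON = r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)'
COMMON_RE = re.compile("^" + COMMON + "$")
# The combined format adds "referer" and "user agent" at the end.
COMBINED_RE = re.compile("^" + COMMON + r' "[^"]*" "[^"]*"$')


def detect_log_format(line):
    """Return 'combined', 'common', or None for an unrecognized line."""
    line = line.strip()
    if COMBINED_RE.match(line):
        return "combined"
    if COMMON_RE.match(line):
        return "common"
    return None


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    samples = [
        '10.0.0.1 - - [30/Jun/2010:10:15:02 +0530] "GET / HTTP/1.1" 200 2326',
        '10.0.0.1 - - [30/Jun/2010:10:15:02 +0530] "GET / HTTP/1.1" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    ]
    for sample in samples:
        print(detect_log_format(sample))
```

As far as I know, AWStats uses a different LogFormat value for each of these two formats (1 for combined, 4 for common), so please check the result against the documentation of your AWStats version before setting it.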
(4) Create AWStats configuration file:
  • AWStats requires a configuration file for each web site being analyzed. Each web site needs a unique configuration name, which is used when running the AWStats script. Let's say "www.example.com" is the configuration name.
  • A sample configuration file is present at e:\tools\AWStats\wwwroot\cgi-bin\awstats.model.conf. Copy this file to awstats.www.example.com.conf in the same directory. The file uses a simple name=value properties format.
  • Change the following entries in the conf file:
SiteDomain="www.example.com" #site domain
HostAliases="www.example.com localhost 1.1.1.1 127.0.0.1" #Aliases for the web site, including hostnames and IP addresses.
LogType=W
LogFormat=1 #1 is the Apache combined log format; use 4 if the log is in the Apache common format (as with a "CustomLog ... common" directive). Refer to the AWStats documentation for the other types.
DNSLookup=1 #Enables reverse DNS lookup (i.e. IP address to hostname). If the web server already writes hostnames to the access log, set this to 0 (zero) to disable it.
  • By default, requests for the following file types are not counted as pages (they are still counted as hits): "css js class gif jpg jpeg png bmp ico rss xml swf". You can change this with the NotPageList property in the conf file.
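
If you maintain several sites, copying and editing the model file by hand gets tedious. Here is a minimal Python sketch of the copy-and-edit step: it takes the text of a model file and replaces the chosen name=value entries. The function name apply_overrides is made up, and the model text in the example is a tiny hand-written excerpt, not the real awstats.model.conf.

```python
import re

# Matches an active "Name=value" line; commented lines start with '#'
# and therefore do not match.
SETTING_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=")


def apply_overrides(model_text, overrides):
    """Return model_text with the given settings replaced.

    Settings that do not appear in the model are appended at the end.
    """
    remaining = dict(overrides)
    output = []
    for line in model_text.splitlines():
        match = SETTING_RE.match(line)
        if match and match.group(1) in remaining:
            name = match.group(1)
            output.append(f"{name}={remaining.pop(name)}")
        else:
            output.append(line)  # keep comments and other settings as-is
    for name, value in remaining.items():
        output.append(f"{name}={value}")
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    # Hand-written excerpt standing in for the real model file.
    model = 'LogType=W\nLogFormat=1\nSiteDomain=""\n# Reverse lookup\nDNSLookup=2\n'
    print(apply_overrides(model, {
        "SiteDomain": '"www.example.com"',
        "DNSLookup": "1",
    }))
```

In practice you would read awstats.model.conf, pass its text through this function, and write the result to awstats.www.example.com.conf.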

(5) Generate Statistics:
Run the following command:
e:\Perl\bin\perl.exe e:\tools\AWStats\wwwroot\cgi-bin\awstats.pl -config=www.example.com -update -LogFile=e:\temp\access_log_Jun30

Here e:\temp\access_log_Jun30 is the access log file, and www.example.com is the configuration name given in the previous step.
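
If you have several log files to process (say, one per month), the same command can be run in a loop. Below is a small Python sketch that builds the command shown above and runs it once per log file. The function names are made up for this example; the options are the ones used in the command above. The caller is expected to pass the log files oldest first, since records older than what has already been processed are likely to be skipped.

```python
import subprocess


def build_update_command(perl, awstats_pl, config, log_file):
    """Build the awstats.pl update command as an argument list."""
    return [
        perl,
        awstats_pl,
        f"-config={config}",
        "-update",
        f"-LogFile={log_file}",
    ]


def update_statistics(perl, awstats_pl, config, log_files):
    """Run one update per log file, in the order given (oldest first)."""
    for log_file in log_files:
        command = build_update_command(perl, awstats_pl, config, log_file)
        # check=True raises CalledProcessError if the script fails.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call update_statistics() to actually run it.
    print(build_update_command(
        r"e:\Perl\bin\perl.exe",
        r"e:\tools\AWStats\wwwroot\cgi-bin\awstats.pl",
        "www.example.com",
        r"e:\temp\access_log_Jun30",
    ))
```

Because the command is passed as a list, paths containing spaces do not need extra quoting.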

You will see output similar to the following at the command prompt:

Update for config "e:\tools\AWStats\wwwroot\cgi-bin\awstats.www.example.com.conf"
With data in log file "e:\temp\access_log_Jun30"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Phase 2 : Now process new records (Flush history on disk after 20000 hosts)...
Jumped lines in file: 0
Parsed lines in file: 539
Found 1 dropped records,
Found 4 corrupted records,
Found 0 old records,
Found 534 new qualified records.
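
If you run the update from a script, it can be handy to pull the counts out of this summary, for example to raise an alert when many records are corrupted. Here is a minimal Python sketch that parses output of the shape shown above. The function name parse_update_summary is made up, and the exact wording of the output may differ between AWStats versions, so treat the patterns as assumptions.

```python
import re

FOUND_RE = re.compile(r"Found (\d+) (dropped|corrupted|old|new qualified) records")
LINES_RE = re.compile(r"(Jumped|Parsed) lines in file: (\d+)")


def parse_update_summary(output):
    """Extract the record counts from the update summary text."""
    counts = {}
    for number, kind in FOUND_RE.findall(output):
        counts[kind] = int(number)
    for label, number in LINES_RE.findall(output):
        counts[label.lower()] = int(number)
    return counts


# The summary lines from the sample output above.
SAMPLE_OUTPUT = """\
Jumped lines in file: 0
Parsed lines in file: 539
Found 1 dropped records,
Found 4 corrupted records,
Found 0 old records,
Found 534 new qualified records.
"""

if __name__ == "__main__":
    print(parse_update_summary(SAMPLE_OUTPUT))
```

In the sample output the dropped, corrupted, old, and new qualified records add up to the number of parsed lines (1 + 4 + 0 + 534 = 539), which is a quick sanity check on a run.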

This will generate a statistics file named awstatsMMYYYY.www.example.com.txt (where MMYYYY is the month and year of the processed records) in the same directory as awstats.pl.

(6) Generate Report:
Run the following command to generate the report:

e:\Perl\bin\perl.exe e:\tools\AWStats\tools\awstats_buildstaticpages.pl -config=www.example.com -lang=en -awstatsprog="e:\tools\AWStats\wwwroot\cgi-bin\awstats.pl" -dir="e:\temp\report" -diricons="e:\tools\AWStats\wwwroot\icon"

This will generate a set of HTML reports in e:\temp\report. You can change this to your preferred directory. By default the report covers the current month. If you want to generate it for a specific month, for example June, add the "-month=06" option to the above command.
The main index page will be "awstats.www.example.com.html". You can put the reports in the htdocs directory of a web server so that everyone can access them, or you can just open the file in a browser.
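
To generate reports for several months in one go, the report command can be built programmatically. The following Python sketch assembles the command shown above, with an optional month. The function names are made up for this example; the options are the ones used in this step.

```python
def build_report_command(perl, buildstaticpages_pl, awstats_pl, config,
                         out_dir, icon_dir=None, month=None, lang="en"):
    """Build the awstats_buildstaticpages.pl command as an argument list."""
    command = [
        perl,
        buildstaticpages_pl,
        f"-config={config}",
        f"-lang={lang}",
        f"-awstatsprog={awstats_pl}",
        f"-dir={out_dir}",
    ]
    if icon_dir is not None:
        command.append(f"-diricons={icon_dir}")
    if month is not None:
        if not 1 <= month <= 12:
            raise ValueError("month must be between 1 and 12")
        command.append(f"-month={month:02d}")  # e.g. 6 becomes -month=06
    return command


def main_report_name(config):
    """Name of the main index page generated for a configuration."""
    return f"awstats.{config}.html"


if __name__ == "__main__":
    # Prints the command for June; pass it to subprocess.run() to execute it.
    print(build_report_command(
        r"e:\Perl\bin\perl.exe",
        r"e:\tools\AWStats\tools\awstats_buildstaticpages.pl",
        r"e:\tools\AWStats\wwwroot\cgi-bin\awstats.pl",
        "www.example.com",
        r"e:\temp\report",
        icon_dir=r"e:\tools\AWStats\wwwroot\icon",
        month=6,
    ))
    print(main_report_name("www.example.com"))
```

When the list is passed to subprocess.run(), the quotes around the paths in the command line above are not needed.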

I hope this information is useful to you. Your comments are very much appreciated.
