Rob's plants

Rob's access log tool

In the textbox below is a PHP script I wrote to view my web server's access logs in a readable format. It works great for me (moderate-traffic site), and maybe it will work for you too. Instructions for using the tool are listed on this page. The tool is strictly for viewing and doesn't attempt any kind of statistics. Please feel free to play around with it, and let me know if it works for you!
rob AT robsplants.com.

Click here to download a zip file with the access log tool

Mini-documentation

Installation

  • Download the zip file by clicking the link above.
  • Open the zip file, and copy the four files it contains to a folder in your www structure. It's best to put them in a directory that requires user authentication, to prevent access by people who just stumble into it.
  • Note: all four files should be in the same folder.
  • Adjust the parameters in the configuration file to suit your needs. Instructions on how to use the file are listed below. You'll especially need to set the file naming convention and directory so that it can find your logs.

Using the access log tool

  • In your browser, just open the tool's .php file (showlog.php in the examples below).
  • A request without a querystring (GET parameters) will retrieve the log for the current day.
  • The following querystring parameters will modify how the tool works:
    • showlog.php?date=<date> returns results for the specified date
    • By default, the log will not include accesses by IPs you've defined as your own. To see your own accesses, use showlog.php?me=1
    • You can override the skipping of common requests (see logconfig.ini below) by using showlog.php?all=1
    • Use showlog.php?find=<search term> to display only entries that have the search term as part of the requested file. This setting overrides the [Excludes] settings and the all=1 parameter. Examples: find=accesslog.php would retrieve all requests for the page you are reading; find=POST would find all POST requests (e.g., form submissions).
    • Use showlog.php?cluster=[1|2|3|4] to override the $ipcluster setting, which governs how many segments of the IP address are used to decide whether subsequent requests come from the same person (this helps with visitors whose IPs keep changing). For example, if cluster=2, then 24.65.199.34 will be treated as the same IP as 24.65.86.12.
    • If you are using the whois functionality, you may specify the whois server to query. For example: showlog.php?whoserver=whois.ripe.net.
  • These parameters may be combined, e.g. showlog.php?date=2/23&me=1&cluster=3
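
The cluster setting described above boils down to comparing only the leading segments of two IP addresses. Here is a minimal sketch of that idea, written in Python rather than the tool's PHP. The function names cluster_key and same_visitor are hypothetical and are not taken from the script itself.

```python
def cluster_key(ip: str, cluster: int = 4) -> str:
    """Return the first `cluster` dot-separated segments of an IPv4 address."""
    if cluster not in (1, 2, 3, 4):
        raise ValueError("cluster must be 1, 2, 3 or 4")
    return ".".join(ip.split(".")[:cluster])


def same_visitor(ip_a: str, ip_b: str, cluster: int = 4) -> bool:
    """Treat two addresses as the same visitor if their leading segments agree."""
    return cluster_key(ip_a, cluster) == cluster_key(ip_b, cluster)


# With cluster=2 only "24.65" is compared, so these two count as one visitor.
print(same_visitor("24.65.199.34", "24.65.86.12", cluster=2))
# With cluster=3 the third segment differs, so they are kept apart.
print(same_visitor("24.65.199.34", "24.65.86.12", cluster=3))
```

A lower cluster value groups more aggressively, so it also risks lumping unrelated visitors from the same provider under one heading.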

About the output

The tool will generate a report detailing the activity at your website. The following points are worth noting:
  • The first request by a "new" IP address (subject to the clustering mentioned above) will generate a heading, showing the IP, the alias (if defined in the $ips array), and the useragent string of the requester. This string shows which browser and operating system are being used by the person making the request.
  • If the whois or host (reverse DNS lookup) function is activated (see the bottom of this page), the IP address will be a link that pops up the DNS/whois query for that IP. Both functions can be turned on simultaneously.
  • If the request was accompanied by a valid "referred by" entry (not "-" and not from an address within your own website), this will be shown as the URL below the heading. The URL is a link - you can click to see where your visitor came from.
  • Subsequent requests by the same IP are shown under the same heading. The script incorporates some buffering, so that simultaneous accesses by several people are handled properly. Depending on the traffic at your site, you may need to adjust the $bufsize and $timeout parameters to get this working just right for you. The default settings work nicely for a moderate-traffic site, especially if common files such as small gifs are filtered out.
  • Requests with any status code besides 200 and 304 are treated as errors and shown in bold red. Common ones are 301/302/303, which indicate that the request was redirected (check for spelling errors in your links), and of course 404, indicating a request for a page that could not be found. Scanning your logs for error entries is a great way to detect small errors in your web design that may have slipped your attention.
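
Two of the rules above are simple enough to sketch in code: which status codes get flagged, and when a "referred by" entry is worth showing. This is only an illustration in Python, not the script's actual PHP. The names is_flagged_status and referrer_to_show are hypothetical, and treating subdomains of your own host as "within your own website" is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def is_flagged_status(status: int) -> bool:
    """Anything besides 200 (OK) and 304 (Not Modified) is highlighted."""
    return status not in (200, 304)


def referrer_to_show(referrer: str, own_host: str) -> Optional[str]:
    """Return the referrer URL if it is worth displaying, otherwise None."""
    if not referrer or referrer == "-":
        return None  # no referrer was logged
    host = urlparse(referrer).hostname
    if host is None:
        return None  # not a full URL, so there is nothing to link to
    host, own = host.lower(), own_host.lower()
    if host == own or host.endswith("." + own):
        return None  # the visitor came from another page on your own site
    return referrer


print(is_flagged_status(404))
print(referrer_to_show("http://www.google.com/search?q=plants", "robsplants.com"))
print(referrer_to_show("http://www.robsplants.com/index.php", "robsplants.com"))
```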

The zip file download contains a template logconfig.ini file, which defines a few of the parameters that modify the script output. The file is divided into sections, with headings in square brackets. For now, the following sections are recognized:

  • [IPs]: This section defines known (blocks of) IP addresses, for which you want the alias to appear in the log (the numerical IP address will still be shown as well). The alias could be the internet provider to which the IP is assigned, or it could be the name of the person/organization who owns the IP. Note that many IP addresses float, so that a person may visit using one IP address one day, and another the next day - or even several IP addresses during the same visit. Each entry in this section consists of a line with the IP address (block) definition followed by the alias you wish to assign. You may specify IPs or IP blocks as:
    • Full four-segment IP address (only matched by identical IP address)
    • Two- or three-segment IP address (matched by IP addresses starting with the same segment values)
    • IP address in which the last segment is a range, e.g. 12.55.128.128-255 or 65.174-180 (matched by IP addresses whose segments are within the specified range)
  • [Excludes]: This section defines common files whose accesses you do not want included in the log. For example, excluding small gifs or other graphics that appear on many of your pages will unclutter your logs and make them easier to peruse. Each line in this section simply defines an exclude string. If the exclude string appears in any part of the request string, that particular request will be left out. Wildcards are not supported - use ".gif" rather than "*.gif".
  • [LogLocation]: This section defines where the log viewer goes to look for your log files. You need to define the directory and the file naming convention. For now, conventions are defined for today's log, logs for previous days in the current week, and logs for previous weeks in the current year. The comments in the configuration file will help you decide how to set these location parameters. The naming conventions in the configuration file below reflect those of my 1and1.com server account.
  • [Settings]: This section allows you to override the default settings for a number of parameters that govern the behavior and output of the log viewer. See the documentation in the example below for more details. A few that may not be immediately obvious:
    • The myips parameter sets the number of IP addresses associated with yourself. Accesses by these IPs will not appear in the log. For example, if you set myips to 2, the first two IPs defined in the [IPs] section will be skipped. Don't use this feature if your IP address floats in a common block of addresses, since you may wind up skipping visits by others as well.
    • The useragent_truncate parameter is the number of characters of the user agent string to display in the log. Some user agent strings are ridiculously long, and clutter up the log. Set this parameter to 0 to see the whole thing.
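
The matching rules in the sections above can be sketched in a few lines. The code below is an illustration in Python, not the tool's PHP, and every function name in it is hypothetical. It assumes the [IPs] entries have already been read into a list of (pattern, alias) pairs in file order, and that the first matching entry wins.

```python
def ip_matches(pattern: str, ip: str) -> bool:
    """Match an IP against a full address, a leading block, or a range pattern."""
    pat_parts = pattern.split(".")
    ip_parts = ip.split(".")
    if len(pat_parts) > len(ip_parts):
        return False
    # Only the segments named in the pattern are compared, so "24.65"
    # matches any address that starts with 24.65.
    for pat, seg in zip(pat_parts, ip_parts):
        if "-" in pat:
            low, high = pat.split("-", 1)
            if not int(low) <= int(seg) <= int(high):
                return False
        elif pat != seg:
            return False
    return True


def lookup_alias(ip, entries, myips=0):
    """Return (alias, is_mine); the first `myips` entries count as your own."""
    for index, (pattern, alias) in enumerate(entries):
        if ip_matches(pattern, ip):
            return alias, index < myips
    return None, False


def is_excluded(request: str, excludes) -> bool:
    """Plain substring test, which is why '.gif' works and '*.gif' does not."""
    return any(text in request for text in excludes)


def truncate_useragent(useragent: str, limit: int) -> str:
    """A limit of 0 means: show the whole user agent string."""
    return useragent if limit <= 0 else useragent[:limit]


# The aliases here are made up for the example.
entries = [("12.55.128.128-255", "Home"), ("65.174-180", "Some ISP")]
print(lookup_alias("12.55.128.200", entries, myips=1))
print(lookup_alias("65.177.3.4", entries, myips=1))
print(is_excluded("GET /images/dot.gif HTTP/1.1", [".gif"]))
```

The sketch also shows why myips is risky with a broad block: every address that falls inside one of the first myips patterns is treated as yours, whoever is actually using it.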

Two additional scripts in the zip file allow you to query whois or reverse DNS lookup information for the IP addresses that visited your site, and to add IP definition entries to logconfig.ini. To use the whois function, you need to have some additional PHP (PEAR) components installed. If you want to do without this feature for now, just turn whois and host off in logconfig.ini.


Last modified: September 13, 2004