Rob's access log tool
In the textbox below is a PHP script I wrote to view the access logs for my
web server in a readable format. It works great for me (moderate-traffic site),
and maybe it will work for you too. Instructions for using the tool are listed
on this page. The tool is strictly for viewing, and doesn't attempt any kind of statistics.
Please feel free to play around with this, and let me know if it works for you!
rob AT robsplants.com.
Click here to download a zip file with the access log tool
Mini-documentation
Installation
- Download the zip file by clicking the link above.
- Open the zip file, and copy the four files contained within to a folder in
your www structure. It's best to put
it in a directory that requires user authentication, to prevent access by
people who just stumble into it.
- Note: all four files should be in the same folder.
- Adjust the parameters in the configuration file to suit your needs.
Instructions on how to use the file are listed below. You'll
especially need to set the file naming convention and directory so that it can
find your logs.
Using the access log tool
- In your browser, just open the showlog.php file.
- A request without a querystring (GET parameters) will retrieve the log for the current day.
- The following querystring parameters will modify how the tool works:
- showlog.php?date=<date> returns results for the specified date
- By default, the log will not include accesses by IPs you've defined as your
own. To see your own accesses, use showlog.php?me=1
- You can override the skipping of common requests (see logconfig.ini below)
by using showlog.php?all=1
- Use showlog.php?find=<search term> to display only entries that
have the search term as part of the requested file. This setting overrides the
[Excludes] settings and all=1 parameter. Examples: find=accesslog.php
would retrieve all requests for the page you are reading; find=POST would find
all POST requests (e.g., form submittals).
- Use showlog.php?cluster=[1|2|3|4] to override the $ipcluster setting, which governs
how many segments of the IP address are used to determine whether subsequent
requests are by the same person (to address those with ever-changing IPs). For
example, if cluster=2, then 24.65.199.34 will be treated as the same IP as 24.65.86.12.
- If you are using the whois functionality, you may specify the whois server
to query. For example: showlog.php?whoserver=whois.ripe.net.
- These parameters may be combined, e.g. showlog.php?date=2/23&me=1&cluster=3
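The clustering rule described above can be sketched in a few lines. This is an illustration in Python, not the tool's PHP code; the function names cluster_key and same_visitor are made up for this example, and it assumes well-formed dotted IPv4 addresses.

```python
def cluster_key(ip: str, cluster: int = 4) -> str:
    """Return the first `cluster` dot-separated segments of an IPv4 address."""
    if cluster not in (1, 2, 3, 4):
        raise ValueError("cluster must be 1, 2, 3, or 4")
    return ".".join(ip.split(".")[:cluster])


def same_visitor(ip_a: str, ip_b: str, cluster: int = 4) -> bool:
    """Treat two addresses as one visitor if their leading segments match."""
    return cluster_key(ip_a, cluster) == cluster_key(ip_b, cluster)


# The example from the text: with cluster=2 only "24.65" is compared.
print(same_visitor("24.65.199.34", "24.65.86.12", cluster=2))  # True
print(same_visitor("24.65.199.34", "24.65.86.12", cluster=3))  # False
```

A lower cluster value groups more requests under one heading, at the cost of occasionally merging unrelated visitors who share a provider.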
About the output
The tool will generate a report detailing the activity at your website. The
following points are worth noting:
- The first request by a "new" IP address (subject to the clustering
mentioned above) will generate a heading, showing the IP, the alias (if
defined in the $ips array), and the useragent string of the requester. This
string shows which browser and operating system are being used by the person
making the request.
- If the whois or host (reverse DNS lookup) function is activated (see the
bottom of this page), the IP address will be a link that pops up the DNS/whois
query for that IP. Both functions can be turned on simultaneously.
- If the request was accompanied by a valid "referred by" entry (not "-"
and not from an address within your own website), this will be shown as the
URL below the heading. The URL is a link - you can click to see where your
visitor came from.
- Subsequent requests by the same IP are shown
under the same heading. The script incorporates some buffering, so that
simultaneous accesses by several people are handled properly. Depending on
the traffic at your site, you may need to adjust the $bufsize and $timeout
parameters to get this working just right for you. The default settings
work nicely for a moderate-traffic site, especially if common files such as
small gifs are filtered out.
- Requests with any status code other than 200 and 304 are shown in
bold red. Common ones are 301/302/303, which indicate that
the request was redirected (check for spelling errors in your links), and of course
404, indicating a request for a page that could not be found. Scanning your
logs for error entries is a great way to detect small errors in your web design
that may have slipped your attention.
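Two of the rules above (which requests get highlighted, and which referrers are shown) can be sketched as follows. This is a Python illustration under stated assumptions, not the script's actual PHP logic: the names is_flagged and is_external_referrer are hypothetical, www.example.com stands in for your own site, and the host comparison is a plain string match (so example.com and www.example.com would count as different sites).

```python
from urllib.parse import urlparse

# Status codes the text says are NOT highlighted.
OK_STATUSES = {200, 304}


def is_flagged(status: int) -> bool:
    """True if the request would be shown in bold red."""
    return status not in OK_STATUSES


def is_external_referrer(referrer: str, own_host: str) -> bool:
    """True if the referrer is present and points outside your own site."""
    if referrer in ("", "-"):
        return False
    host = urlparse(referrer).hostname  # None if the value is not a full URL
    return host is not None and host != own_host.lower()


print(is_flagged(404))  # True
print(is_flagged(304))  # False
print(is_external_referrer("http://www.google.com/search?q=ferns", "www.example.com"))  # True
print(is_external_referrer("http://www.example.com/index.html", "www.example.com"))     # False
```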
The zip file download contains a template logconfig.ini file, which defines a few of
the parameters that modify the script output. The file is divided into
sections, with headings in square brackets. For now, the following sections
are recognized:
- [IPs]: This section defines known (blocks of) IP addresses, for which you
want the alias to appear in the log (the numerical IP address will still be
shown as well). The alias could be the internet provider to which the IP is
assigned, or it could be the name of the person/organization who owns the IP.
Note that many IP addresses float, so that a person may visit using one IP
address one day, and another the next day - or even several IP addresses during
the same visit. Each entry in this section consists of a line with the
IP address (block) definition followed by the alias you wish to assign. You may
specify IPs or IP blocks as:
- Full four-segment IP address (only matched by identical IP address)
- Two- or three-segment IP address (matched by IP addresses starting with the same segment values)
- IP address in which the last segment is a range, e.g. 12.55.128.128-255 or 65.174-180
(matched by IP addresses whose segments are within the specified range)
- [Excludes]: This section defines common files, the accesses to which you
wish not to include in the log. For example, excluding small gifs or other graphics that
appear on many of your pages will unclutter your logs and make them easier
to peruse. Each line in this section simply defines an exclude string. If the
exclude string appears in any part of the request string, that particular
request will be left out. Wildcards are not supported - use ".gif" rather
than "*.gif".
- [LogLocation]: This section defines where the log viewer goes to look for
your log files. You need to define the directory and the file naming convention.
For now, conventions are defined for today's log, logs for previous days in
the current week, and logs for previous weeks in the current year. The
comments in the configuration file will help you decide how to set these
location parameters. The naming conventions in the configuration file below
reflect those of my 1and1.com server account.
- [Settings]: This section allows you to override the default settings for a
number of parameters that govern the behavior and output of the log viewer.
See the documentation in the example below for more details. A few that
may not be immediately obvious:
- The myips parameter sets the number of IP addresses associated with
yourself. Accesses by these IPs will not appear in the log. For example, if you
set myips to 2, the first two IPs defined in the [IPs] section will be skipped.
Don't use this feature if your IP address floats in a common
block of addresses, since you may wind up skipping visits by others as well.
- The useragent_truncate parameter is the number of characters of the
user agent string to display in the log. Some user agent strings are
ridiculously long, and clutter up the log. Set this parameter to 0 to see the
whole thing.
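The three ways of specifying an IP or IP block in the [IPs] section, as described above, can be sketched like this. It is a Python illustration rather than the tool's PHP code. The function names and the sample entries (including the alias strings) are invented for the example, and it assumes well-formed dotted IPv4 addresses. As a simplification, it accepts a range in any segment, not only the last one given.

```python
def matches_block(ip: str, spec: str) -> bool:
    """Check an IPv4 address against a full, prefix, or range definition."""
    ip_parts = ip.split(".")
    spec_parts = spec.split(".")
    if len(spec_parts) > len(ip_parts):
        return False
    # zip stops at the shorter list, so a 2- or 3-segment spec is a prefix match.
    for ip_seg, spec_seg in zip(ip_parts, spec_parts):
        if "-" in spec_seg:
            low, high = (int(x) for x in spec_seg.split("-", 1))
            if not low <= int(ip_seg) <= high:
                return False
        elif ip_seg != spec_seg:
            return False
    return True


def find_alias(ip, entries):
    """Return the alias of the first matching (spec, alias) entry, or None."""
    for spec, alias in entries:
        if matches_block(ip, spec):
            return alias
    return None


# Hypothetical entries; the block definitions reuse the examples from the text.
entries = [
    ("12.55.128.128-255", "Example ISP"),
    ("65.174-180", "Example University"),
    ("24.65", "Example Cable"),
]

print(find_alias("12.55.128.200", entries))  # Example ISP
print(find_alias("65.176.3.4", entries))     # Example University
print(find_alias("24.65.199.34", entries))   # Example Cable
print(find_alias("10.0.0.1", entries))       # None
```

Because the first matching entry wins in this sketch, narrower definitions would need to be listed before broader ones; whether the real script behaves the same way is not stated here.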
Two additional scripts in the zip file allow you to query whois or
reverse DNS lookup information for the IP addresses that visited your site,
and add IP definition entries to logconfig.ini. To use the whois function,
you need to have some additional PHP (PEAR) components installed. If you
want to do without this feature for now, just turn whois and
host off in logconfig.ini.
Last modified:
September 13, 2004