WORLD WIDE WEB SITE VISITOR STUDIES TECHNIQUES:

USING SERVER LOG FILE DATA

By

Randy Michael Russell

russellr@msu.edu
517-432-0711

A DOCTORAL DISSERTATION

Department of Counseling, Educational Psychology and Special Education
Michigan State University

1998

 

Committee:

W. Patrick Dickson, Chairman

Carrie Heeter

Richard McLeod

Leighton Price

 

ABSTRACT

 

The World Wide Web has grown at a phenomenal rate. Much effort has been devoted to creating Web sites, including ones intended for educational use. Studies of the effectiveness of such materials have not, however, kept pace with site development. Educators need tools to evaluate the effectiveness and influence of Web sites. Site developers need techniques to apply to formative evaluations of sites still under construction. Such techniques must allow researchers to produce results quickly, since the rapid evolution of the Web could render the findings of many traditional approaches to educational research obsolete before they are disseminated. Such methods of gathering formative feedback should also be straightforward enough to appeal to the many site developers who do not view themselves primarily as educational researchers.

The present study built upon methods used in museum visitor studies. Museum visitor studies researchers often use the time visitors spend viewing displays as a proxy indicator of the amount such visitors likely learned from those displays. Similarly, educational researchers have found correlations between students' "time on task" and learning outcomes. It would be useful to be able to measure "time on page" or "site visit durations" for visitors to Web sites. Such data could form the basis for determining whether correlations between Web site viewing times and learning exist.

This study used file request records stored in a Web server's log file as a source of data for studying site visitor behaviors and trends. Such data are automatically recorded for all file requests by the Web server software, and are thus very simple to collect. These data were analyzed and displayed using inexpensive and easy-to-use server log analysis software, standard spreadsheet and graphing programs, and common database filtering and sorting techniques. Reports showing long-term trends in page view and visitor counts for an entire site were created. Distributions of page views by time, site sections, network addresses, and other categories for a selected "typical" week were examined. Finally, detailed records of visit "paths" through the site and of visit durations for a smaller group of site visitors during that case study week were analyzed.
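The kind of analysis described above can be illustrated with a minimal sketch. It is not the software used in this study. It assumes that the server writes records in the Common Log Format, that each network address corresponds to a single visitor, and that a gap of more than 30 minutes between requests marks the start of a new visit. The function names and the timeout value are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed record layout (Common Log Format):
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed inactivity cutoff separating two visits from the same address.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, path) for one log record, or None if malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), when, match.group("path")


def group_visits(lines):
    """Group requests by network address, then split them into visits."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            host, when, path = record
            by_host[host].append((when, path))

    visits = []
    for host, requests in by_host.items():
        requests.sort()
        current = [requests[0]]
        for request in requests[1:]:
            # A long silence is treated as the end of the previous visit.
            if request[0] - current[-1][0] > SESSION_TIMEOUT:
                visits.append((host, current))
                current = [request]
            else:
                current.append(request)
        visits.append((host, current))
    return visits


def summarize(visit):
    """Return (host, path through the site, apparent duration) for one visit."""
    host, requests = visit
    path = [page for _, page in requests]
    # Time from first to last logged request; time spent on the final page
    # is not recorded, so this is a lower bound on the real duration.
    duration = requests[-1][0] - requests[0][0]
    return host, path, duration
```

Passing the lines of a log file to group_visits and then calling summarize on each result yields one visit "path" and one apparent duration per visit. A visit consisting of a single request has an apparent duration of zero, which already hints at the limits of this data source.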

Server log data were found to be inadequate for accurately monitoring visit durations, largely because of gaps in the data record caused by caching of pages by visitors' browsers. Attempts to test correlations between "time on page" and learning outcomes should seek other means to monitor visit durations. Many of the methods employed in this study are, however, suitable for establishing broad-brush overviews of site usage trends, and supply useful data with minimal resource expenditures. The basic research techniques used here are scalable; evaluators can dig deeper into the data to uncover greater detail in a flexible, adaptable way. These methods can produce results in a short time, which makes them better suited to the rapidly evolving Web than many traditional approaches to educational research. The methods used in this study are simple enough to be adopted by developers who are not primarily researchers. They provide information that developers can use to fine-tune ongoing site development, and lead to insights that might not be evident without such a formal approach to the study of a site's impact.
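The effect of browser caching on measured visit durations can be shown with a small hypothetical simulation. The sketch assumes a deliberately simplified caching model in which a page is requested from the server only the first time a visitor views it, and every later view is served from the browser's cache. Real browsers behave in more varied ways. The page names and times are invented for illustration and are not data from this study.

```python
from datetime import datetime, timedelta


def logged_requests(true_views):
    """Return the views that reach the server under the simplified cache model."""
    seen = set()
    logged = []
    for when, page in true_views:
        if page not in seen:  # first view: the browser must ask the server
            seen.add(page)
            logged.append((when, page))
        # repeat views are served from cache and leave no log record
    return logged


def span(views):
    """Time between the first and last view in a sequence."""
    return views[-1][0] - views[0][0]


# Hypothetical visit: the visitor returns to pages already seen.
start = datetime(1998, 10, 10, 13, 0, 0)
true_views = [
    (start, "/index.html"),
    (start + timedelta(minutes=2), "/unit1.html"),
    (start + timedelta(minutes=6), "/index.html"),  # revisit, served from cache
    (start + timedelta(minutes=7), "/unit1.html"),  # revisit, served from cache
]

actual = span(true_views)                     # 7 minutes of observed browsing
apparent = span(logged_requests(true_views))  # 2 minutes visible in the log
```

In this example the log accounts for two minutes of a visit that lasted at least seven, and neither figure includes the time spent on the final page. Gaps of this kind are why the log record alone cannot support accurate duration measurements.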