LumberJack 

project goal

LumberJack is a prototype service for analyzing Web usage logs. It is designed to be a push-button analysis system that is both more automated and more accurate than past systems. Web usage mining enables a new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have already been used in business applications. We have developed an automated method that directly infers the major groupings of user traffic on a Web site [Heer01] by using multiple data features in a clustering analysis. In an extensive, systematic evaluation of this approach, we found that certain clustering schemes we developed can achieve categorization accuracies as high as 99% [Heer02b].

 description


The Web has become part of the fabric of our society, and accordingly we have an increasing need to understand the activities and goals of Web users. We can improve nearly every aspect of the user experience on a Web site by understanding users’ goals and the composition of the site’s traffic.
Traditional methods of analyzing user activity, such as user surveys, are labor-intensive and intrusive when employed daily, too slow for on-the-fly Web personalization, and inaccurate because survey responses are inconsistent. What is needed instead is an automated means of directly mining Web server logs for groupings of significant user activities.

Web Mining techniques build user profiles by combining users’ navigation paths with other data features, such as page viewing time, hyperlink structure, and page content [Heer01, Srivastava00]. While the specific techniques vary [Shahabi97, Fu99, Banerjee01, Heer01], the end goal is the same: to create groupings of user sessions that accurately categorize the sessions according to the users’ information needs.
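As a rough illustration of combining data features, the sketch below builds one profile per session by summing the content vectors of the pages visited, optionally weighted by viewing time, and then groups the profiles with a simple k-means-style procedure under cosine similarity. It is a minimal sketch under stated assumptions, not the LumberJack implementation: the page vectors in PAGE_TERMS, the log-scaled time weighting, the seeding rule, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical page content vectors (term -> weight); illustrative data only.
PAGE_TERMS = {
    "/products/printers": {"printer": 1.0, "price": 0.5},
    "/products/ink": {"ink": 1.0, "printer": 0.4, "price": 0.3},
    "/support/drivers": {"driver": 1.0, "download": 0.8},
    "/support/faq": {"driver": 0.3, "help": 1.0},
}

def session_profile(session, use_view_time=True):
    """Sum the content vectors of visited pages, optionally weighted by viewing time."""
    profile = defaultdict(float)
    for page, seconds in session:
        # Assumption: log-scale the viewing time so long idle periods do not dominate.
        weight = math.log1p(seconds) if use_view_time else 1.0
        for term, value in PAGE_TERMS.get(page, {}).items():
            profile[term] += weight * value
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(profiles):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for p in profiles:
        for t, v in p.items():
            total[t] += v
    return {t: v / len(profiles) for t, v in total.items()}

def cluster(profiles, k, iterations=10):
    """K-means-style clustering under cosine similarity with deterministic seeding."""
    # Seed with the first profile, then add the profile least similar to the seeds so far.
    centers = [profiles[0]]
    while len(centers) < k:
        centers.append(min(profiles, key=lambda p: max(cosine(p, s) for s in centers)))
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign each session to its most similar center, then recompute the centers.
        labels = [max(range(k), key=lambda c: cosine(p, centers[c])) for p in profiles]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            new_centers.append(centroid(members) if members else centers[c])
        centers = new_centers
    return labels

# Each session is a list of (page, seconds viewed) pairs.
sessions = [
    [("/products/printers", 30), ("/products/ink", 45)],
    [("/support/drivers", 60), ("/support/faq", 20)],
    [("/products/ink", 15)],
    [("/support/faq", 90), ("/support/drivers", 10)],
]
profiles = [session_profile(s) for s in sessions]
labels = cluster(profiles, k=2)
print(labels)
```

Passing use_view_time=False drops the viewing-time feature from the profile, which is one simple way to switch a data feature on or off for a given analysis.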

There are two major issues with existing approaches. First, most approaches examine only one or two combinations of data features when clustering user sessions. What is needed is an approach that allows any of the data features to be used, as the situation dictates. LumberJack solves this problem using the method described in the figure below:

Second, in the literature, each technique is validated on a different Web site, which makes it extremely difficult to compare results. We have no basis on which to choose one data feature over another. Worse, since only user traces are used, there is no way of knowing a priori what the true user information need is for each session, and so no way of knowing whether the algorithms performed correctly and clustered the sessions into appropriate groupings. The evaluation method is described in the figure below:
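To make "categorization accuracy" concrete, the sketch below scores a clustering against known task labels. It assumes that the true task of each session is available, for example because tasks were assigned in a controlled study, and it uses a majority-vote measure often called cluster purity. This is an illustrative assumption rather than the evaluation procedure from the figure; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def categorization_accuracy(cluster_labels, true_tasks):
    """Fraction of sessions whose own task matches the majority task of their cluster."""
    if len(cluster_labels) != len(true_tasks):
        raise ValueError("cluster_labels and true_tasks must have the same length")
    if not true_tasks:
        raise ValueError("at least one session is required")
    # Collect the true tasks of the sessions that landed in each cluster.
    tasks_by_cluster = defaultdict(list)
    for cluster_id, task in zip(cluster_labels, true_tasks):
        tasks_by_cluster[cluster_id].append(task)
    # Sessions that agree with their cluster's most common task count as correct.
    correct = sum(
        Counter(tasks).most_common(1)[0][1] for tasks in tasks_by_cluster.values()
    )
    return correct / len(true_tasks)

# Hypothetical data: six sessions, three clusters, known tasks per session.
cluster_labels = [0, 0, 0, 1, 1, 2]
true_tasks = ["buy", "buy", "support", "support", "support", "jobs"]
accuracy = categorization_accuracy(cluster_labels, true_tasks)
print(f"{accuracy:.3f}")
```

A measure like this rewards clusterings with many small clusters, so it is most meaningful when the number of clusters is fixed in advance or compared across schemes with the same cluster count.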


The LumberJack project addresses both of these issues, and we have applied LumberJack in real-world case studies. LumberJack integrates all of this information into a single report that analysts can use in the field to gain an accurate understanding of the overall traffic patterns at a site.

 publications

Ed H. Chi, Adam S. Rosien, Jeffrey Heer. LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition. In Proc. ACM-SIGKDD Workshop on Web Mining for Usage Patterns and User Profiles (WebKDD 2002), pp. --. ACM Press, July 2002. Edmonton, Canada.

Jeffrey Heer, Ed H. Chi. Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, pp. 243--250. ACM Press, April 2002. Minneapolis, MN.

Jeffrey Heer, Ed H. Chi. Mining the Structure of User Activity using Cluster Stability. In Proceedings of the Workshop on Web Analytics, Second SIAM Conference on Data Mining. April, 2002. Arlington, VA.

Jeffrey Heer, Ed H. Chi. Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In Proceedings of the Workshop on Web Mining, SIAM Conference on Data Mining, pp. 51--58. April 7th, 2001. Chicago, IL.



 LumberJack in the news

Understanding user interaction, key to improving Web info retrieval. 11/02 Fall COMDEX 2002 Magazine
Forrester Report, Building A Better Automotive Web Site. 11/02 Forrester Report

 related projects

  
 BloodHound
 ScentTrails
 Information Scent
 IUNIS
 WUFIS
 Webology

  people


  Ed Chi,

 commercialization

  LumberJack is available for licensing. Please contact: