| LumberJack |
|||
|
project goal
LumberJack is a prototype service for analyzing Web usage logs. It is designed as a push-button analysis system that is both more automated and more accurate than past systems. Web usage mining enables new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have already been used in business applications. We have developed an automated method that directly infers the major groupings of user traffic on a Web site [Heer01] by using multiple data features in a clustering analysis. In an extensive, systematic evaluation of this approach, we found that certain clustering schemes we developed achieve categorization accuracies as high as 99% [Heer02b]. |
|||
| description | ||||
|
Web mining techniques build user profiles by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content [Heer01, Srivastava00]. While the specific techniques vary [Shahabi97, Fu99, Banerjee01, Heer01], the end goal is the same: to create groupings of user sessions that accurately categorize the sessions according to the users' information needs. There are two major issues with existing approaches. First, most approaches examine only one or two combinations of data features when clustering user sessions. What is needed is an approach that allows any of the data features to be used, as the situation dictates. LumberJack solves this problem using the method described in the figure below:
Second, in the literature, each technique's validation is conducted on a different Web site, which makes the results extremely difficult to compare and leaves no basis for choosing one data feature over another. What's worse, since only user traces are used, there is no way of knowing a priori what the true information need is for each user session. As a result, there has been no way of knowing whether the algorithms performed correctly and clustered the sessions into appropriate groupings. Our evaluation method is described in the figure below:
The LumberJack project addresses both of these issues, and we have applied LumberJack in real-world case studies. LumberJack integrates all of this information into a single report that analysts can use in the field to gain an accurate understanding of the overall traffic patterns at a site. |
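
To make the two ideas in the description concrete, the following is a minimal sketch and not LumberJack's actual implementation. It assumes each session is described by one vector per data feature (the modality names `content` and `time`, the weights, and the toy sessions are all hypothetical), combines whichever modalities are given a weight, clusters the result with a simple cosine-similarity k-means chosen here only as a stand-in, and scores the clustering against labels that are assumed to be known in advance. As noted above, such labels are not available from user traces alone, so the scoring step only applies where the true information needs have been obtained some other way.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine_modalities(session, weights):
    """Concatenate the weighted, normalized vectors of the chosen modalities."""
    combined = []
    for name, weight in weights.items():
        combined.extend(weight * x for x in l2_normalize(session[name]))
    return combined


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=10):
    """Stand-in clustering: k-means using cosine similarity."""
    # Deterministic farthest-first seeding: start from the first vector, then
    # repeatedly add the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every session to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def categorization_accuracy(assignment, true_labels):
    """Fraction of sessions that carry the majority label of their cluster."""
    correct = 0
    for c in set(assignment):
        labels = [t for a, t in zip(assignment, true_labels) if a == c]
        correct += Counter(labels).most_common(1)[0][1]
    return correct / len(true_labels)


# Hypothetical toy data: per-page content counts and viewing times per session.
sessions = [
    {"content": [3, 0, 1], "time": [40, 0, 5]},
    {"content": [2, 0, 0], "time": [30, 0, 0]},
    {"content": [0, 4, 1], "time": [0, 60, 10]},
    {"content": [0, 3, 0], "time": [0, 20, 0]},
]
true_labels = ["buy", "buy", "support", "support"]  # assumed known in advance
weights = {"content": 1.0, "time": 0.5}  # choose modalities as the situation dictates

vectors = [combine_modalities(s, weights) for s in sessions]
clusters = kmeans(vectors, k=2)
accuracy = categorization_accuracy(clusters, true_labels)
print(clusters, accuracy)
```

Changing the `weights` dictionary is the only step needed to add, drop, or re-balance a data feature, which is the flexibility the first issue above calls for. Scoring every configuration against the same labeled sessions gives a common basis for comparing them, which is what the second issue asks for.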
||||
| publications | ||||
|
Ed H. Chi, Adam S. Rosien, Jeffrey Heer. LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition. In Proc. ACM-SIGKDD Workshop on Web Mining for Usage Patterns and User Profiles (WebKDD 2002), pp. --. ACM Press, July 2002. Edmonton, Canada.

Jeffrey Heer, Ed H. Chi. Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, pp. 243--250. ACM Press, April 2002. Minneapolis, MN.

Jeffrey Heer, Ed H. Chi. Mining the Structure of User Activity using Cluster Stability. In Proceedings of the Workshop on Web Analytics, Second SIAM Conference on Data Mining. April 2002. Arlington, VA.

Jeffrey Heer, Ed H. Chi. Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In Proceedings of the Workshop on Web Mining, SIAM Conference on Data Mining, pp. 51--58. April 7th, 2001. Chicago, IL. |
||||
|
|
||||
| related projects | ||||
|
|
||||
| people | ||||
| commercialization | ||||
|
LumberJack is available for licensing. Please contact: |
||||