Abstracts for Royer, Christiaan

Modeling Information Scent: A Comparison of LSA, PMI-IR, and GLSA Similarity Measures on Common Test and Corpora
In this paper we describe a comparison among three systems that estimate semantic similarity between words: Latent Semantic Analysis [6], Pointwise Mutual Information [17], and Generalized Latent Semantic Analysis [8]. We compare all these techniques on a single corpus (TASA) and, for PMI and GLSA, we also report performance on a different web-based corpus. The evaluation is carried out through two kinds of tests: (1) synonymy tests, and (2) comparison with human word similarity judgments.
Budiu, R., Royer, C. and Pirolli, P. (2006).
CHI 2006. [PDF]
Document representation with Generalized Latent Semantic Analysis
Methods for dimensionality reduction, notably LSI, have been successfully applied to the information retrieval task and document classification on small document collections. Since they involve a computation of the eigenvalue or singular value decomposition of a document-term matrix, their use for large real world applications is somewhat limited. In addition, the information about the term similarity that these methods can use is inferred only from the current document collection. We present an algorithm that computes a low dimensional vector space representation of documents using point-wise mutual information as a term similarity measure. Point-wise term similarity can be computed using any additional resources, such as the Web. Our method uses the term by term matrix and can therefore be applied to large document collections. Experimental results show a considerable performance improvement on the information retrieval tasks.
Matveeva, I., Farahat, A. and Royer, C. (2005).
Conference on Research and Development in Information Retrieval (SIGIR 2005). [PDF]
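The pipeline outlined in the abstract above (point-wise mutual information as term similarity, a term-by-term matrix, and a low-dimensional document representation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes co-occurrence counts are already available, clips negative PMI values to zero, uses a dense eigendecomposition, and builds document vectors as count-weighted sums of term vectors. The function names `pmi_matrix`, `term_vectors`, and `document_vectors` and the toy data are hypothetical.

```python
import numpy as np

def pmi_matrix(cooc):
    """Term-by-term PMI from a symmetric co-occurrence count matrix.

    Assumption: negative and undefined PMI values are set to 0
    (positive PMI); the abstract does not specify this.
    """
    cooc = np.asarray(cooc, dtype=float)
    p_joint = cooc / cooc.sum()            # joint probability estimates
    p_term = p_joint.sum(axis=1)           # marginal probability per term
    expected = np.outer(p_term, p_term)    # joint probability under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / expected)
    pmi[~np.isfinite(pmi)] = 0.0           # pairs that never co-occur
    return np.maximum(pmi, 0.0)

def term_vectors(similarity, k):
    """Top-k eigenvectors of the similarity matrix, scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(similarity)   # symmetric input
    order = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def document_vectors(doc_term, terms):
    """Assumed combination rule: count-weighted sum of term vectors."""
    return np.asarray(doc_term, dtype=float) @ terms

# Hypothetical toy data: 4 terms, 2 documents.
cooc = [[0, 4, 0, 0],
        [4, 0, 1, 0],
        [0, 1, 0, 3],
        [0, 0, 3, 0]]
doc_term = [[2, 1, 0, 0],
            [0, 0, 1, 2]]

similarity = pmi_matrix(cooc)
terms = term_vectors(similarity, k=2)
docs = document_vectors(doc_term, terms)
```

Because the decomposition is applied to the term-by-term matrix, its cost depends on vocabulary size rather than on the number of documents, which is the property the abstract points to for large collections.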
Document representation with Generalized Latent Semantic Analysis
Methods for dimensionality reduction, notably LSA, have been successfully applied to the information retrieval task and document classification. Recently, corpus-based association measures such as point-wise mutual information have been found to outperform LSA on a variety of tasks. We have developed an algorithmic framework that computes a low-dimensional vector space representation of documents combining different measures of association with different dimensionality reduction techniques. Experimental results show a competitive performance on the synonymy and text classification tasks.
Matveeva, I., Farahat, A. and Royer, C. (2005).
16th European Conference on Machine Learning. [PDF]
Terms and document representation with generalized latent semantic analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by recent success of co-occurrence based measures of semantic similarity obtained from very large corpora. Our experiments demonstrate that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test. We also show that term-based document representation improves performance on the document classification test.
Royer, C., Matveeva, I. and Farahat, A. (2005).
EMNLP. [PDF]
Term representation with generalized latent semantic analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by recent success of co-occurrence based measures of semantic similarity obtained from very large corpora. Our experiments demonstrate that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test.
Farahat, A., Matveeva, I. and Royer, C. (2005).
RANLP (Recent Advances in Natural Language Processing). [PDF]
GLSA Server @ PARC
No Abstract Available
Royer, C., Farahat, A., Pirolli, P. and Budiu, R. (2005).
Twelfth Annual ACT-R Workshop. [PPT]
Log-based Longitudinal Study Finds Window Thrashing
Although large displays are becoming more cost effective, most user interfaces are optimized for a single monitor of modest size even though many traditional workspaces such as desks and workbenches are much larger and some studies have found benefits from large displays. This paper explores whether a single monitor is sufficient for information work using standard software. A log-based longitudinal field study finds that most of the time a single monitor allows skilled information analysts to have a reasonable pattern of window activity. However, a novel visualization of the data shows that windows typically fill the monitor and the pattern is occasionally interrupted by window thrashing, the rapid manipulation of windows caused by limited display resource. Given these findings, we identify some common tasks that justify the development and the expense of wideband visual interfaces that are optimized for larger displays.
Mackinlay, J. D. and Royer, C. (2004).
. [PDF]
Wideband Visual Interfaces: Sensemaking on Multiple Monitors
Although vendors have made multiple-monitor systems for many years, our interfaces have been stuck in a 30-year old windows paradigm focused on displays much smaller than the desktops we use when working with paper. Advances in flat panel displays and graphics cards now enable affordable personal computers with 6-8 monitors and may someday eliminate seams. This paper argues that vendors should be developing wideband visual interfaces that are designed for displays that fill the human visual field. We describe a longitudinal field study of window activity that found that windows almost always filled a typical single monitor display and that subjects occasionally struggled with window thrashing when they needed to work with two or more windows at the same time. Vendors need not wait for affordable seamless wideband displays before addressing these findings. We have implemented several novel user interface techniques for creating seam-aware applications that target wideband displays based on multiple monitors.
Mackinlay, J. D., Heer, J. and Royer, C. (2003).
Technical Report. [PDF]
The Bloodhound Project: Automating Discovery of Web Usability Issues using the InfoScent™ Simulator
According to usability experts, the top user issue for Web sites is difficult navigation. We have been developing automated usability tools for several years, and here we describe a prototype service called InfoScent™ Bloodhound Simulator, a push-button navigation analysis system, which automatically analyzes the information cues on a Web site to produce a usability report. We further build upon previous algorithms to create a method called Information Scent Absorption Rate, which measures the navigability of a site by computing the probability of users reaching the desired destinations on the site. Lastly, we present a user study involving 244 subjects over 1385 user sessions that shows how Bloodhound correlates with real users surfing for information on four Web sites. The hope is that, by using a simulation of user surfing behavior, we can reduce the need for human labor during usability testing, thus dramatically lowering testing costs and ultimately improving user experience. The Bloodhound Project is unique in that we apply a concrete HCI theory directly to a real-world problem. The lack of empirically validated HCI theoretical models has plagued the development of our field, and this is a step toward that direction.
Chi, E. H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., Robles, E., Dalal, B., Chen, J. and Cousins, S. (2003).
CHI 2003, Fort Lauderdale, FL. [PDF]