Amplifying accuracy through style consistency

Prateek Sarkar, Thomas Breuel

Abstract

Style-conscious, style-specific, and adaptive classification methods have reduced character recognition errors by 20-50% in laboratory experiments. The boost in performance is achieved by exploiting the knowledge that the patterns share a common style. Style-specific methods are applicable when huge volumes of similar documents are to be recognized: a few documents can be used for training or tuning classifiers, which can then be applied to the remaining documents. Document image decoding models can be trained and applied over a wide range of image degradations to achieve error rates of 1% or lower. Hierarchical mixture models capture strong style dependence of patterns across classes. They can be trained automatically from data without style labels, and they are especially useful for classifying patterns that are too few to allow parameter tuning or adaptation. Optimal classification of long fields is now practicable due to a new algorithm. A generalization to continuous styles is achieved by a hierarchical Bayesian approach that learns hyperpriors representing a distribution of styles. Adaptive methods cluster large samples of isogenous patterns into equivalence groups, and then assign class labels to these groups. Novel similarity measures improve classification performance.

Several methods of exploiting style consistency for better recognition have been proposed and tested at PARC. Government grants will enable the continuation of this research as well as the building of tools that can be used outside a research environment.
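
As an illustration of why a shared style helps, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical toy model with two classes, two discrete styles, and one-dimensional Gaussian features; every name and number in it is invented for the example. It compares labeling each pattern alone (marginalizing over style) with labeling a whole field jointly under a single shared style. The joint search here is a brute-force enumeration over label sequences, which is exponential in field length and is not the new algorithm for long fields mentioned in the abstract.

```python
import itertools
import math

# Hypothetical toy model: MEANS[style][label] is the mean of a 1-D Gaussian feature.
# A feature near 2.0 is ambiguous: it looks like "B" in style0 and like "A" in style1.
MEANS = {
    "style0": {"A": 0.0, "B": 2.0},
    "style1": {"A": 2.0, "B": 4.0},
}
STYLE_PRIOR = {"style0": 0.5, "style1": 0.5}
CLASS_PRIOR = {"A": 0.5, "B": 0.5}
SIGMA = 0.5  # assumed common standard deviation


def gaussian_pdf(x, mean, sigma=SIGMA):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def classify_singlet(x):
    """Label one pattern in isolation, marginalizing over the unknown style."""
    def score(label):
        return CLASS_PRIOR[label] * sum(
            STYLE_PRIOR[s] * gaussian_pdf(x, MEANS[s][label]) for s in MEANS
        )
    return max(CLASS_PRIOR, key=score)


def classify_field(field):
    """Label all patterns of a field jointly, assuming they share one style."""
    def score(labels):
        total = 0.0
        for style, p_style in STYLE_PRIOR.items():
            likelihood = p_style
            for x, label in zip(field, labels):
                likelihood *= CLASS_PRIOR[label] * gaussian_pdf(x, MEANS[style][label])
            total += likelihood
        return total
    # Brute force over every label sequence (fine only for short fields).
    candidates = itertools.product(CLASS_PRIOR, repeat=len(field))
    return max(candidates, key=score)


field = [3.9, 2.1]  # generated, in this toy story, by ("B", "A") in style1
singlet_labels = tuple(classify_singlet(x) for x in field)
field_labels = classify_field(field)
print("singlet:", singlet_labels)
print("style-conscious:", field_labels)
```

In this toy field the first pattern is clear evidence for style1, which in turn resolves the ambiguous second pattern; the singlet classifier has no access to that evidence.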

BibTeX entry

@inproceedings{sarkar:sdiut2003
,author = "Prateek Sarkar and Thomas Breuel"
,title = "Amplifying accuracy through style consistency"
,booktitle = "Proceedings of 2003 Symposium on Document Image Understanding Technology"
,address = "Greenbelt, Maryland, USA"
,month = "April"
,year = "2003"
,pages = "245-252"
,http = {http://www.parc.xerox.com/istl/members/psarkar/PUBLICATIONS/SDIUT2003/download2.html}
}
Prateek Sarkar
Last modified: Wed Jan 28 15:19:21 PST 2004