ON COMPUTER VISION AND PATTERN RECOGNITION 2001
Rapid Object Detection using a Boosted Cascade of Simple Features
Paul Viola
viola@merl.com
Mitsubishi Electric Research Labs
201 Broadway, 8th FL
Cambridge, MA 02139

Michael Jones
mjones@crl.dec.com
Compaq CRL
One Cambridge Center
Cambridge, MA 02142

... detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.

The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al. [10], our detection system does not work directly with image intensities. Like these authors we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [6]. Within any image subwindow the total number of Haar-like features is very large, far larger than the
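The integral image idea introduced above can be illustrated with a minimal sketch. It is not the implementation used in our system. It assumes a grey scale image stored as a list of rows of integers, and it pads the table with one extra zero row and column so that no boundary checks are needed. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the sign convention of the two-rectangle feature (left half minus right half) are hypothetical choices made for illustration only.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels img[r][c] with r < y and c < x (zero-padded; an assumption)."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # cumulative sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            # entry directly above plus the running row sum: one pass,
            # a few operations per pixel
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y),
    width w and height h, using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical horizontal two-rectangle feature: sum of the left
    half minus sum of the right half. w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Usage on a small 3 x 4 example image
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 4, 3))  # 33 - 45 = -12
```

Because `rect_sum` needs only four lookups regardless of `w` and `h`, the cost of evaluating such a feature does not depend on its scale or location, which is the property the text relies on.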
References

[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.
[2] Anonymous. Anonymous. In Anonymous, 2000.
[16] K. Sung and T. Poggio. Example-based learning for view-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 39–51, 1998.
[17] J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo. Modeling visual-attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507–545, October 1995.
[18] Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.