Random Forests-based Active Appearance Models
Gabriele Fanelli, Matthias Dantone, Luc Van Gool
Computer Vision Laboratory, ETH Zurich,
Sternwartstrasse 7, 8092 Zurich, Switzerland
{fanelli/dantone/vangool}@vision.ee.ethz.ch
Abstract— Many desirable applications dealing with automatic face analysis rely on robust facial feature localization. While extensive research has been carried out on standard 2D imagery, recent technological advances have made the acquisition of 3D data both accurate and affordable, opening the way to more accurate and robust algorithms. We present a model-based approach to real-time face alignment, fitting a 3D model to depth and intensity images of unseen expressive faces. We use random regression forests to drive the fitting in an Active Appearance Model framework. We thoroughly evaluate the proposed approach on publicly available datasets and show how adding the depth channel boosts the robustness and accuracy of the algorithm.
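To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a regression forest can drive iterative AAM parameter updates: a forest trained to map local appearance features to shape-parameter corrections is applied repeatedly from the mean shape. Here scikit-learn's RandomForestRegressor stands in for the paper's forests, and extract_patch_features, the training data, and the loop length are hypothetical placeholders.

```python
# Minimal sketch of forest-driven AAM fitting: a random regression
# forest predicts a shape-parameter update from features sampled
# around the current shape estimate. All data below are stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def extract_patch_features(image, shape_params):
    """Hypothetical feature extractor: in a real system this would
    sample intensity/depth patches at the landmark positions implied
    by shape_params; here it returns deterministic dummy features."""
    seed = abs(int(shape_params.sum() * 1e3)) % (2**32)
    return np.random.default_rng(seed).standard_normal(64)

# Training: pairs of (features at a perturbed shape, parameter
# correction that would undo the perturbation). Dummy data here.
n_train, n_feats, n_params = 500, 64, 10
X = np.random.randn(n_train, n_feats)
y = np.random.randn(n_train, n_params)
forest = RandomForestRegressor(n_estimators=50).fit(X, y)

# Fitting: iteratively refine the shape parameters on a new image.
image = None                      # placeholder for an RGB-D input
params = np.zeros(n_params)       # start from the mean shape
for _ in range(5):
    feats = extract_patch_features(image, params)
    params += forest.predict(feats[None, :])[0]  # predicted step
```

In this cascade-style formulation the forest replaces the analytic gradient step of classical AAM fitting; whether the updates are additive in parameter space, as assumed above, is a design choice of the sketch rather than a claim about the paper.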
I. INTRODUCTION
Future human-computer interfaces will likely use vision to understand the user’s movements and commands. In gaming, systems like Microsoft Kinect can already track body movements; however, facial movements also carry a great deal of important information in human-human communication and future smart interfaces should be able to interpret nods, facial expressions, or recognize the identity of the user. A necessary step for virtually all applications based on automatic face analysis is the localization of key features like eyes, nose, mouth, eyebrows, and face contours.
Most work in the literature has so far focused on standard images and videos, which still present unsolved challenges like changing illumination conditions or the lack of texture in some areas of the face.
Today, both accurate [1] and affordable depth-sensing devices (like MS Kinect) are available, providing a valuable new cue for overcoming the above problems.