Our approach proceeds from sparse to dense landmarking steps using a set of pose (yaw) and expression specific global shape and local texture models trained to best account for the variations in facial shape and texture manifested in real-world images. Histogram of Oriented Gradients (HOG) features are used to model local texture, and the Real AdaBoost framework is used to construct local texture classifiers capable of distinguishing the texture around correctly localized landmarks from that around incorrectly localized landmarks.


In the sparse landmark detection step, only a few key landmarks (referred to as seed landmarks), such as the centers of the eyes, the corners of the mouth, and the tip of the nose, are searched for in a sliding window based search. The sizes and locations of the windows can be easily modified to ensure tolerance to initialization from a face detection step. In addition to this, the use of local texture classifiers allows us to greedily retain the top scoring candidates from a response map. The process is repeated for each pose-specific model.
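The greedy retention of top scoring candidates can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the local texture classifier has already produced a 2D response map of scores, and the function name `top_k_candidates`, the window layout `(top, left, height, width)`, and the default `k` are all hypothetical choices made for this example.

```python
import numpy as np

def top_k_candidates(response, window, k=3):
    """Return the k best (row, col, score) positions inside a search window.

    response: 2D array of classifier scores, one per pixel position (assumed input).
    window:   (top, left, height, width) search region; a hypothetical layout.
    """
    top, left, height, width = window
    top, left = max(top, 0), max(left, 0)
    patch = response[top:top + height, left:left + width]
    if patch.size == 0:
        return []
    k = min(k, patch.size)
    flat = patch.ravel()
    # argpartition selects the k largest scores without sorting the whole patch
    idx = np.argpartition(flat, -k)[-k:]
    # order the selected candidates from best to worst
    idx = idx[np.argsort(flat[idx])[::-1]]
    rows, cols = np.unravel_index(idx, patch.shape)
    return [(int(r) + top, int(c) + left, float(flat[i]))
            for r, c, i in zip(rows, cols, idx)]
```

Enlarging or shifting `window` is one way to model the tolerance to face detector initialization mentioned above; running the function once per seed landmark and per pose-specific model would yield the candidate sets used in the next step.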


Our next step involves aligning a pose-specific canonical mean shape with candidates for two seed landmarks using a similarity transformation T. This is repeated for all combinations of candidates to generate a large set of shapes. These shapes are scored using local texture based results, and the single highest scoring densely aligned shape is picked for each yaw range.
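Two point correspondences determine a 2D similarity transformation (scale, rotation, translation) exactly, which is why two seed landmarks suffice for this alignment. The sketch below illustrates the idea under stated assumptions: shapes are stored as N x 2 arrays of (x, y) coordinates, points are treated as complex numbers so that T(z) = a*z + t, and the function names are hypothetical. Scoring of the resulting shapes is not shown.

```python
import itertools
import numpy as np

def similarity_from_two_points(src, dst):
    """Solve T(z) = a*z + t so that the two src points map onto the two dst points.

    src, dst: arrays of shape (2, 2) holding two (x, y) points each.
    Returns complex a (scale and rotation) and complex t (translation).
    """
    p = src[:, 0] + 1j * src[:, 1]
    q = dst[:, 0] + 1j * dst[:, 1]
    if p[0] == p[1]:
        raise ValueError("source seed landmarks must be distinct")
    a = (q[1] - q[0]) / (p[1] - p[0])
    t = q[0] - a * p[0]
    return a, t

def apply_similarity(shape, a, t):
    """Apply T(z) = a*z + t to every landmark of an (N, 2) shape."""
    z = a * (shape[:, 0] + 1j * shape[:, 1]) + t
    return np.column_stack([z.real, z.imag])

def initial_shapes(mean_shape, seed_idx, candidates_a, candidates_b):
    """Align the mean shape to every pairing of candidates for two seed landmarks."""
    src = mean_shape[list(seed_idx)]
    shapes = []
    for ca, cb in itertools.product(candidates_a, candidates_b):
        dst = np.array([ca, cb], dtype=float)
        a, t = similarity_from_two_points(src, dst)
        shapes.append(apply_similarity(mean_shape, a, t))
    return shapes
```

With m candidates for one seed landmark and n for the other, this produces m*n dense initial shapes per pose-specific model, each of which would then be scored by the local texture classifiers.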


The final step refines the locations of the landmarks in the highest scoring pose-specific shapes using a search strategy similar to that employed by Active Shape Models (ASMs). The landmarks are moved to the highest scoring (most confident) locations, based on the results obtained by the local texture classifiers. However, after this step, we propose the use of a novel shape regularization approach (shape here meaning the global shape, which deals with the dynamics of the landmarks as a group) that sets up this task as an l1-regularized least squares problem. The deformed shape s_def is regularized to produce a shape s_reg: a set of coefficients b is computed that best represents s_def while being constrained by a dictionary of training shapes D in an l1 framework, and these coefficients are then used to weight the shapes in the dictionary.
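For reference, an l1-regularized least squares problem of this kind is commonly written in the form below. This is the standard textbook form rather than a formulation quoted from the paper: the regularization weight \lambda and the factor 1/2 are assumptions, and D is taken to hold one vectorized training shape per column.

```latex
\hat{b} = \arg\min_{b} \; \tfrac{1}{2}\,\lVert s_{\mathrm{def}} - D\,b \rVert_2^2 + \lambda\,\lVert b \rVert_1,
\qquad
s_{\mathrm{reg}} = D\,\hat{b}
```

The l1 penalty drives most entries of b to zero, so s_reg is built from a small number of training shapes.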

Figure: Determination of seed landmark candidates
Figure: Aligning pose-specific canonical shapes with seed landmark candidate combinations to obtain dense initial shapes
Figure: Highest scoring pose-specific initial dense shapes that need to be refined
Figure: Shape refinement process
Figure: Sample qualitative fitting results produced by our approach on real-world images from the challenging ibug dataset
Robust Facial Landmark Localization
Towards a Unified Framework for Pose, Expression, and Occlusion Tolerant Automatic Facial Alignment
To Appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
author={K. Seshadri and M. Savvides},
title={Towards a {U}nified {F}ramework for {P}ose, {E}xpression, and {O}cclusion {T}olerant {A}utomatic {F}acial {A}lignment},
journal={To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},

The automatic localization of facial landmarks, also referred to as facial landmarking or facial alignment, is a key pre-processing step for tasks such as facial recognition, the generation of 3D facial models, expression analysis, gender and ethnicity classification, age estimation, and a variety of other facial analytic tasks. Progress in all these areas of research has made it imperative to develop accurate facial alignment algorithms that generalize well enough to handle simultaneous variations in pose, illumination, and expression, as well as high levels of facial occlusion, in real-world images.


We propose a facial alignment algorithm that is not only tolerant to the joint presence of real-world variations and degradations, but also provides feedback (misalignment/occlusion labels for the detected landmarks) that could be of use to subsequent stages in a facial analysis pipeline.

The l1-based shape regularization avoids the generation of implausible facial shapes and results in higher landmark localization accuracies than those obtained using prior shape models. In addition, only the inlier (confidently localized) landmarks participate in the regularization process, so outlier (potentially occluded or misaligned) landmarks do not sway the results. This allows us to deal with high levels of facial occlusion and to hallucinate the final coordinates of the occluded or misaligned landmarks in accordance with typical facial shape, using constraints derived from training set exemplars.
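The inlier-only regularization can be illustrated with a small sketch. It is not the authors' solver: it assumes shapes are vectorized as interleaved coordinates (x1, y1, x2, y2, ...), that the dictionary D stores one training shape per column, and it uses ISTA (iterative soft-thresholding), a generic method for l1-regularized least squares. The names `regularize_shape`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def regularize_shape(s_def, D, inlier_landmarks, lam=1e-3, n_iter=500):
    """Fit sparse coefficients on inlier landmarks only, then rebuild the full shape.

    s_def:            (2N,) deformed shape, assumed interleaved as x1, y1, x2, y2, ...
    D:                (2N, M) dictionary, one training shape per column (assumed layout).
    inlier_landmarks: (N,) boolean array, True for confidently localized landmarks.
    Returns (s_reg, b), where s_reg = D @ b covers all landmarks, including outliers.
    """
    mask = np.repeat(np.asarray(inlier_landmarks, dtype=bool), 2)
    D_in, s_in = D[mask], s_def[mask]
    # Lipschitz constant of the gradient: squared spectral norm of D_in
    lipschitz = np.linalg.norm(D_in, 2) ** 2
    if lipschitz == 0:
        raise ValueError("no usable inlier rows in the dictionary")
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D_in.T @ (D_in @ b - s_in)  # gradient of 0.5 * ||D_in b - s_in||^2
        b = soft_threshold(b - grad / lipschitz, lam / lipschitz)
    # Outlier coordinates are filled in from the dictionary via the same coefficients
    return D @ b, b
```

Because the coefficients are estimated from inlier rows alone and then applied to the full dictionary, the outlier landmarks receive coordinates consistent with the training shapes instead of their unreliable detected positions.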

Our approach is thoroughly evaluated and demonstrates higher landmark localization accuracies and more graceful degradation than several state-of-the-art methods on challenging real-world datasets. A detailed description of these results can be found in the previously mentioned paper.