paraFaceTest: an ensemble of regression tree-based facial features extraction for efficient facial paralysis classification

Background: Facial paralysis (FP) is a neuromotor dysfunction that causes the loss of voluntary muscle movement on one side of the human face. As the face is the basic means of social interaction and emotional expression among humans, afflicted individuals can often become introverted and may develop psychological distress, which can be even more severe than the physical disability. This paper addresses the problem of objective facial paralysis evaluation.

Methods: We present a novel approach for objective facial paralysis evaluation and classification, which is crucial for deciding the medical treatment scheme. For FP classification in particular, we propose a method based on an ensemble of regression trees to efficiently extract facial salient points and detect iris or sclera boundaries. We also employ a second-degree polynomial (parabolic function) to improve Daugman's algorithm for detecting occluded iris boundaries, thereby allowing us to efficiently obtain the iris area. The symmetry score of each face is measured by calculating the ratio of the iris areas and of the distances between key points on the two sides of the face. We build a model employing a hybrid classifier that discriminates healthy from unhealthy subjects and performs FP classification.

Results: Objective analysis was conducted to evaluate the performance of the proposed method. As we explore the effect of data augmentation using publicly available datasets of facial expressions, experiments reveal that the proposed approach is efficient.

Conclusions: Extraction of iris and facial salient points based on an ensemble of regression trees, together with our hybrid classifier (classification tree plus regularized logistic regression), provides an improved way of addressing the FP classification problem. It addresses the common limiting factor introduced in the previous works, i.e.
having a greater sensitivity to subjects exposed to peculiar facial images, whereby improper identification of the initial evolving curve for facial feature segmentation results in inaccurate facial feature extraction. Leveraging an ensemble of regression trees provides accurate salient point extraction, which is crucial for revealing the significant difference between the healthy and the palsy side when performing different facial expressions.

Keywords: Facial paralysis classification, Facial paralysis objective evaluation, Ensemble of regression trees, Salient point detection, Iris detection, Facial paralysis evaluation framework

*Correspondence: kangj@korea.ac.kr. Department of Computer Science and Engineering, Korea University, Seoul, South Korea. Full list of author information is available at the end of the article.

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Barbosa et al. BMC Medical Imaging (2019) 19:30

Background

Facial paralysis (FP) or facial nerve palsy is a neuromotor dysfunction that causes the loss of voluntary muscle movement on one side of the human face. As a result, the person loses the ability to mimic facial expressions. FP is not an uncommon medical condition: around one in every sixty people worldwide can be affected by facial paralysis [1]. Facial paralysis often causes patients to become introverted and eventually suffer from social and psychological distress, which can be even more severe than the physical disability [2]. It is usually encountered in clinical practice and can be classified into peripheral and central facial palsy [3, 4]. These two categories differ according to the behavior of the upper part of the face. Peripheral facial palsy is a nerve disturbance in the pons of the brainstem, which affects the upper, middle and lower facial muscles of one side of the face. Central facial palsy, on the other hand, is a nerve dysfunction in the cortical areas whereby the forehead and eyes are spared, but the lower half of one side of the face is affected [3–5]. This scenario has triggered the interest of researchers and clinicians in this field and, consequently, has led to the development of objective grading of facial functions and of methods for monitoring the effect of medical, rehabilitation or surgical treatment.

Many computer-aided analysis systems have been introduced to measure the dysfunction of one part of the face and the level of severity, but much less attention has been given to facial paralysis type as a classification problem. Moreover, no method has yet proven efficient enough to be universally accepted [3]. Classifying each case of facial paralysis as central or peripheral plays a critical role in helping physicians decide on the most appropriate treatment scheme [4]. Image processing has been applied in existing objective facial paralysis assessment, but the processing methods used are mostly labor-intensive or suffer from sensitivity to extrinsic facial asymmetry caused by orientation, illumination and shadows. As such, creating a clinically usable and reliable method is still very challenging. Wachtman et al. [6, 7] measured facial paralysis by examining facial asymmetry on static images. Their methods register asymmetry even for healthy subjects due to sensitivity to orientation, illumination and shadows [8]. Other previous works [9–11] were also introduced, but most of them rely solely on finding salient points on the human face using standard edge detection tools (e.g. Canny, Sobel, SUSAN) for image segmentation. The Canny edge detection algorithm may yield inaccurate results, as it is influenced by connected edge points: it compares adjacent pixels along the gradient direction to determine whether the current pixel is a local maximum. Improper segmentation also results in improper generation of key points. Moreover, it may be difficult to find and detect the exact points when the same algorithm is applied to elderly patients.

Dong et al. [11] proposed the use of salient points for estimating the degree of facial paralysis. A method was proposed to detect the salient points that serve as the basis for estimating the degree of paralysis. As the salient points may include some points unnecessary for describing facial features, edge detection was used to discard them. K-means clustering is then applied to classify the salient points into six categories: two eyebrows, two eyes, nose, and mouth. About 14 key points are found in the six facial regions, representing the points that may be affected when performing facial expressions. However, this technique falls short when applied to elderly patients, for whom the exact points can be difficult to obtain [12].

Another method estimates the degree of facial paralysis by comparing multiple regions of the human face. In [13], Liu et al. compare the two sides of the face and compute four ratios, which are then used to represent the severity of the paralysis. Nevertheless, this method suffers from the influence of uneven illumination [8].

In our previous work [4], a technique that generates closed contours for separating the outer boundaries of an object from the background using the Localized Active Contour (LAC) model was employed for feature extraction, which reasonably reduced these drawbacks. However, one limiting factor of that method is its greater sensitivity to subjects exposed to peculiar images where facial features suffer from various occlusions (e.g. eyeglasses, eyebrows occluded by hair, wrinkles, excessive beard, etc.). Moreover, improper setting or identification of parameters, such as the radius of the initial evolving curve (e.g. the minimum bounding box of the eyes, eyebrows, etc.), may lead to improper feature segmentation, which may in turn give inaccurate detection of key points, as revealed in Fig. 1. Although this limitation was addressed in [4] by applying a window kernel (generated based on Otsu's method [14]) that is run through the binary image (e.g. detected eyes, eyebrows, lip, etc.), Otsu's method has the drawback of assuming uniform illumination. Additionally, it does not use any object structure or spatial coherence, which may sometimes result in inaccurate generation of kernel parameters and, in turn, improper segmentation.

Fig. 1 Pre-processing results. a eye image with some uneven illumination, b-c extracted eye contour by LAC model when parameters of initial evolving curves are not properly identified

In order to address these drawbacks, we present a novel approach based on the ensemble of regression trees (ERT) model. We leverage the ERT model to extract salient points as the basis for preparing the features, or independent variables. Our target was to keep a natural framework that finds 68 key points on the edges of facial features (e.g. eyes, eyebrows, mouth, etc.) and regresses the location of the facial landmarks from a sparse subset of intensity values extracted from an input image. As the ERT model provides a simple and intuitive way of performing this task and of generating landmarks that separate the boundaries of each significant facial feature from the background, without requiring the set-up of parameters for initial evolving curves as required by the previous approach [4], we find this technique appropriate for our problem. Furthermore, empirical results [15] reveal that the ERT model has the appealing quality of performing shape-invariant feature selection while minimizing the same loss function during training and test time, which significantly lessens the time complexity of feature extraction.

In this study, we make three significant contributions. First, we introduce a more robust approach for efficient facial paralysis evaluation and classification, which is crucial for determining the appropriate medical treatment scheme. Second, we provide an efficient feature extraction method based on an ensemble of regression trees in a computationally efficient way. Finally, we study in depth the effect of computing facial landmark symmetry features on the quality of facial paralysis classification.

Methods

The study performs objective evaluation of facial paralysis, particularly facial classification and grading, in a computationally efficient way. We capture facial images (i.e. still photos) of the patients with a front-view face and with reasonable illumination, so that each side of the face receives a roughly similar amount of lighting. The photo-taking procedure starts with the patient performing the 'at rest' face position, followed by three voluntary facial movements: raising of the eyebrows, screwing up of the nose, and showing of the teeth or smiling.

The framework of the proposed objective facial paralysis evaluation system is presented in Fig. 2. Facial images of a patient, taken while the patient is requested to perform some facial expressions, are stored in the image database. In the rest of this section, we describe the details of the individual components of facial landmark detection and how we perform evaluation and classification of facial paralysis. To begin the process, raw images are retrieved from the image database. Dimension alignment and image resizing are then performed, followed by pre-processing of the images to enhance contrast and remove undesirable image noise.

Fig. 2 Framework of the proposed objective facial paralysis evaluation

Facial Image Acquisition and Pre-processing

The existing literature shows that facial data acquisition methods can be classified into two categories, depending on the processing methods used and on whether images or videos serve as the database. As image-based systems have the appealing advantages of ease of use and cost effectiveness [16], we utilize still images as inputs to our classification model.

Face Normalization

Facial feature extraction can be complicated by changes in face appearance caused by illumination and pose variations. Normalizing the acquired facial images prior to objective FP assessment may significantly reduce these complications.
It is worth noting, however, that while face normalization may be a good approach in connection with methods for objective facial paralysis assessment, it is optional, so long as the extracted feature parameters are normalized prior to the classification task.

Facial Feature Extraction

Deformation Extraction

Facial feature deformations are characterized by changes of texture and shape that lead to high spatial gradients, which are good indicators for tracking facial actions [17]. In turn, these are good indicators of facial asymmetry and may be analyzed either in the image domain or in the spatial frequency domain, computed by high-pass gradient or Gabor wavelet-based filters. Ngo et al. [18] utilized this method by combining it with local binary patterns and claimed that it performs well for the task of quantitative assessment of facial paralysis. Deformation extraction approaches can be holistic or local. Holistic image-based approaches process the face as a whole, while local methods focus on facial features or areas that are prone to change and are able to detect subtle changes in small areas [17]. In this work, we extract the facial salient points and the iris area for geometric-based and region-based feature generation, respectively.

Salient points detection

Feature extraction starts with the preprocessing of the input image and facial region detection. We find frontal human faces in an image and estimate their pose. The estimated pose takes the form of 68 facial landmarks that lead to the detection of the key points of the mouth, eyebrows and eyes. The face detector is made using the classic Histogram of Oriented Gradients (HOG) feature [19] combined with a linear classifier, an image pyramid, and a sliding window detection scheme. We utilize the pose estimator employed in [15]. In the context of computer vision, a sliding window is a rectangular region of fixed width and height that slides across an image, as described in the subsequent section. Normally, for each window, we take the window region and apply an image classifier to check whether the window contains an object (i.e. the face, as our region of interest). Combined with image pyramids, image classifiers can be created that recognize objects even if their scales and locations in the image vary. These techniques, in their simplicity, play an absolutely critical role in object detection and image classification.

Features are extracted from the different facial expressions that the patient performs, which include: (a) at rest; (b) raising of eyebrows; (c) frowning or screwing up of the nose; and (d) smiling or showing of teeth. In this study, we consider geometric and region-based features as inputs for modelling a classifier. To extract the geometric-based features, the salient points of the facial features (e.g. eyes, eyebrows and lips) are detected using an ensemble of regression trees [15]. The goal is to find the 68 key points on the edges of facial features, as shown in Fig. 3a. However, for generating the geometric-based symmetry features of each image, we are interested in the following points: inner canthus (IC), mouth angle (MA), infra-orbital (IO), upper eyelids (UE), supra-orbital (SO), nose tip (NT) and nostrils (see Fig. 3b). To detect the salient points of the facial image, we leverage the dlib library [15], which can detect the points of each facial feature. These points are inputs for calculating the distances between the key points identified in Fig. 4 and the corresponding distance ratio for the two sides of the face. Additionally, we extract region-based features, which involves detecting the iris or sclera boundaries by leveraging Daugman's integro-differential operator [20].
Fig. 3 Facial landmarks or key points. a 68 key points, and b salient points for each facial feature utilized in this study

Fig. 4 Facial expressions used in the study. a at rest; b raising or lifting of eyebrows; c screwing-up of nose; and d smiling or showing of teeth

Histogram of Oriented Gradients (HOG)

In the field of computer vision and image processing, the Histogram of Oriented Gradients (HOG) is a feature descriptor used for object or face detection. It is an algorithm that takes an image and outputs feature vectors, or feature descriptors. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical fingerprint that can differentiate one feature from another. Ideally, this information is invariant under image transformation, so even if the image is transformed in some way, we can still find the feature again. HOG uses locally normalized histograms of gradient orientations, similar to the Scale-Invariant Feature Transform (SIFT) descriptor, in a dense overlapping grid, which gives very good results in face detection. It is similar to SIFT, except that HOG feature descriptors are computed on a dense grid of uniformly spaced cells and use overlapping local contrast normalization for improved performance [19, 21].

The HOG descriptor algorithm is implemented as follows:

1. Partition the image into small connected regions called cells; for each cell, calculate the histogram of gradient directions for the pixels within the cell.
2. According to the gradient orientation, discretize each cell into angular bins; each pixel of the cell contributes its weighted gradient to the corresponding angular bin. Adjacent groups of cells are considered as spatial regions referred to as blocks; the grouping of cells into a block is the basis for grouping and normalizing the histograms. A normalized group of histograms represents the block histogram, and the set of these block histograms represents the descriptor.

The following basic configuration parameters are required for computing the HOG descriptor:

1. Masks to compute derivatives and gradients.
2. The geometry for splitting an image into cells and grouping cells into a block.
3. The block overlap.
4. The normalization parameters.

The recommended values for the HOG parameters include: (a) a 1-D centered derivative mask [-1, 0, +1]; (b) a detection window size of 64x128; (c) a cell size of 8x8; and (d) a block size of 16x16 (2x2 cells) [21].

Ensemble of Regression Trees

An ensemble of regression trees is a predictive model composed of a weighted combination of multiple regression trees. Generally, combining multiple regression trees increases the predictive performance; it is the collection of regression trees that makes a bigger and better regression tree [15]. The core of each regression function r is the set of tree-based regressors fit to the residual targets during the gradient boosting algorithm.

Shape invariant split tests

Following the approach of Kazemi et al. [15], at each split node in the regression tree we make a decision by thresholding the difference between the intensities of two pixels. The pixels used in the test are at positions i and k when defined in the coordinate system of the mean shape. For facial images having arbitrary shapes, we intend to index points that have the same position relative to their shape as i and k have to the mean shape. To accomplish this, the image can be warped to the mean shape based on the current shape estimate before extracting the features. Warping only the locations of the points is much more efficient than warping the whole image, since we utilize only a very sparse representation of the image. The details are as follows.

Let v_i be the index of the facial landmark in the mean shape that is nearest to i, and define its offset from i as

    δY_i = i − Ȳ_{v_i}    (1)
Then, for a shape S_j defined in image I_j, the position in I_j that is qualitatively similar to i in the mean shape is given by

    i′ = Y_{j,v_i} + (1/s_j) R_j^T δY_i    (2)

where s_j and R_j are the scale and rotation matrix of the similarity transform which transforms S_j to S̄, the mean shape. The scale and rotation are found by minimizing

    Σ_m || Ȳ_m − (s_j R_j Y_{j,m} + t_j) ||²    (3)

the sum of squared distances between the mean shape's facial landmark points, the Ȳ_m's, and those of the warped shape; k′ is defined similarly. Formally, each split is a decision involving three parameters θ = (τ, i, k) and is applied to each training and test example as

    h(I_{π_j}, S_j^{(t)}, θ) = 1 if I_{π_j}(i′) − I_{π_j}(k′) > τ, and 0 otherwise    (4)

where i′ and k′ are defined using the scale and rotation matrix which best warps S_j^{(t)} to S̄ according to equation (2).

In practice, the assignments and local translations are identified during the training phase. At test time, computing the similarity transform, the most computationally expensive part of the process, is done only once at every level of the cascade.

This method starts with the following:

1. A training set of labeled facial landmarks on images. These images are manually labeled, specifying the (x, y)-coordinates of the regions surrounding each facial structure.
2. Priors, or more specifically, the probability on the distance between pairs of input pixels.

Given this training data, an ensemble of regression trees is trained to estimate the positions of the facial landmarks directly from the pixel intensities; that is, no feature extraction takes place. The final result is a facial landmark detector that can be utilized to efficiently detect the salient points of an image. Fig. 5 presents sample results of our proposed approach for facial feature detection based on the ensemble of regression trees (ERT) model when applied to the JAFFE dataset [22, 23].

Fig. 5 Sample results of our proposed approach for facial feature extraction based on the Ensemble of Regression Trees (ERT) model as applied to the JAFFE dataset [22]. Facial landmarks (i.e. 68 key points) are detected from a single image, based on the ERT algorithm

Region-based Feature Extraction

Iris Detection

A person who has a symptom of facial paralysis is likely to have an asymmetric distance between the infra-orbital (IO) and mouth angle (MA) points on the two sides of the face while performing facial movements such as smiling and frowning. They may also have an unbalanced exposure of the iris when performing different voluntary muscle movements (e.g. screwing up the nose, showing the teeth or smiling, raising the eyebrows with both eyes directed upward) [4]. We utilize the points (e.g. upper eyelid (UE), infra-orbital (IO), inner canthus (IC) and outer canthus (OC)) detected by the ensemble of regression trees algorithm, and from there we generate the parameters of the eye region as inputs to Daugman's integro-differential operator [20] to detect the iris or sclera boundaries. However, as some eye images have eyelid occlusions, an optimization of Daugman's algorithm was performed to ensure proper detection of the iris boundaries. In our previous work, a LAC-based method was employed to optimize Daugman's algorithm and perform a subtraction method. However, the LAC model is quite tedious, as it may result in improper segmentation if the initial evolving curves and iterations are not properly defined. Moreover, it has a greater sensitivity to the eyes of older patients due to the presence of excessive wrinkles.

In this paper, we implement a curve-fitting scheme to ensure proper detection of the iris or sclera boundaries, thereby providing better results in detecting an occluded iris. By definition, a parabola is a plane curve that is mirror-symmetrical and approximately U-shaped. To fit a parabolic profile through the data, a second-degree polynomial is used, defined as y = ax² + bx + c. This exactly fits a simple curve to three constraints, i.e. a point, angle, or curvature. More often, the angle and curvature constraints are added to the end of a curve; in this case, they are called end conditions. To fit a curve through data, we first obtain the coefficients of the fitting polynomial (e.g. b and c for the line p(x) = bx + c). To evaluate the fit, values of x (e.g. given by xp) must be chosen; for example, the curve may be plotted for x in [0, 7] in steps of Δx = 0.1. The generated coefficients are then used to generate the y values of the polynomial fit at the desired values of x given by xp. This means that a vector (denoted yp) can be generated, which contains the parabolic fit to the data evaluated at the x-values xp.

In what follows, we describe our approach for detecting the iris or sclera boundaries:

1. Detect the eye region (i.e. create a rectangular bounding box from the detected UE, IO, IC and OC points, as illustrated in Fig. 3b).
2. Detect the upper eyelid edge using a gray threshold level (e.g. threshold <= 0.9).
3. Convert the generated grayscale image to binary.
4. Find the significant points of the upper eyelid: traverse each column and find the first row whose pixel value is 1 and whose location variance (i.e. with respect to the row addresses of the last 4 pixels) is minimal within the threshold. We call this vector A.
5. Implement curve fitting through the generated data points using a parabolic profile (i.e. a second-degree polynomial). We refer to this as A′.
6. Detect the iris using Daugman's integro-differential operator and convert the result to binary form. We call this vector B.
7. Find the intersections of the two vectors A′ and B, and take all pixels of vector B below the parabolic curve A′.
8. Utilize the points of the lower eyelid detected by the ERT model; we call this vector C.
9. Finally, find the region of the detected iris within the intersection of vectors B and C.

A closer look at Figs. 6 and 7 reveals interesting results of this approach.

Facial Paralysis Measurement

Symmetry Measurement by Iris and Key points

In this paper, the symmetry of the two sides of the face is measured using ratios obtained from computing the iris area (generated while the subject raises the eyebrows with both eyes directed upward, and while screwing up the nose or frowning) and from the distances between identified point pairs on each side of the face while the subject is asked to perform the different facial expressions (e.g. at rest, raising of eyebrows, screwing up of nose, and showing of teeth or smiling). Table 1 summarizes the salient points used as the basis for extracting features such as the iris area ratio and the distance ratio between the two sides of the face. With 'at rest' and 'raising of eyebrows', we calculate the distances between two point pairs: infra-orbital (IO) and supra-orbital (SO); and nose tip (NT) and SO. With the 'smile' expression, we get the distances between the point pairs: inner canthus (IC) and mouth angle (MA); IO and MA; and NT and MA. Lastly, for the 'frown' expression, we get the distances between the point pairs: NT and MA; and NT and the nostrils.
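Steps 4–7 of the iris-detection procedure above lend themselves to a short NumPy sketch. This is an illustrative reconstruction under our own toy data, not the authors' implementation: the eyelid points and the binary iris mask are made up, and `np.polyfit` stands in for whatever fitting routine the paper used.

```python
import numpy as np

# Hypothetical upper-eyelid edge points (vector A): x = column index,
# y = row index of the first lit pixel found in each column (step 4).
x = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
y = np.array([5.0, 3.2, 2.1, 1.6, 1.7, 2.3, 3.4, 5.1])

# Step 5: fit a parabolic profile (second-degree polynomial) through A.
a, b, c = np.polyfit(x, y, 2)          # y ~ a*x^2 + b*x + c
xp = np.arange(0, 7.001, 0.1)          # evaluation grid, delta x = 0.1
yp = np.polyval([a, b, c], xp)         # A': parabolic fit evaluated at xp

# Step 7 (sketch): keep only the pixels of the iris mask (vector B) that lie
# below the fitted eyelid curve A'. Rows grow downward in image coordinates,
# so "below the curve" means row index greater than the curve value.
h, w = 8, 8
iris = np.ones((h, w), dtype=bool)     # toy binary iris mask (vector B)
rows = np.arange(h)[:, None]
curve = np.polyval([a, b, c], np.arange(w))[None, :]
below_curve = rows > curve
iris_visible = iris & below_curve      # iris pixels not occluded by the eyelid
```

The same masking idea, applied a second time with the lower-eyelid points (vector C in step 8), yields the final visible-iris region of step 9.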
(a) converted gray scale image; (b) upper edge of the eye with gray thresh level of 0.9; (c) equivalent binary form of the detected upper eyelid; (d) data points of the upper eyelid; (e)-(f) upper eyelid as a result of nd employing parabolic function (2 degree polynomial); (g) result of Daugman’s Integro-Differential operator iris detection; (h)-(n) eyelid detection results using our optimized Daugman’s algorithm; and (o) final results of detecting iris boundaries with superior or upper eyelid occlusion the computed distance ratio of both sides of the face are features for symmetry measurement by comparing the considered as the symmetrical features of each subject. two facial images, the ‘at rest position’ and ‘raising of Computed distances include: P P , P P , eyebrows’ as shown in Fig. 4aand 4b. We compute the 20 R_IO 25 L_IO P P and P P (see Fig. 4aand 4b); P P , P P , distances a and b (Fig. 4a) where a and b are the dis- 20 31 25 31 31 33 31 35 1 1 1 1 P P and P P (see Figure 4c); and P P , P P , tance from P and P and P and P of the right 31 49 31 55 40 49 43 55 20 R_IO 20 31 P P , P P , P P and P P (see Figure 4d). eye, respectively. We then compute the ratio of a and b . 31 49 31 55 49 R_IO 55 L_IO 1 1 Additionally, we calculate the area of the extracted iris. Similarly, for the second image (Fig. 4b), we get a and b 2 2 This is followed by the computation of ratio between two as well as the ratio. Finally, we compute the difference of sides. We find the expression below: these two ratio (i.e. difference between a /b and a /b ) 1 1 2 2 and denote it as right_Movement. D D R L dRatio = (5) The same procedure is applied to the two images otherwise for finding the ratio difference for the left eye (i.e. where dRatio is the ratio of the computed distance D difference between y /x and y /x ) and denote it as 3 3 4 4 and D of the specified key points of each half of the face. left_Movement. 
The rate of movement can be com- We also consider the capability of the patients to raise puted by finding the ratio between right_Movement and left_Movement. Intuitively, the difference of these two the eyebrows (i.e. rate of movement) as one important Fig. 7 Some more results of extracted iris from the UBIRIS images [24]. a Original image; b Converted gray scale; c upper edge of the eye with gray thresh level of 0.9; d equivalent binary form of the detected upper eyelid; e data points of the upper eyelid; f-g upper eyelid as a result of employing nd parabolic function (2 degree polynomial); h result of Daugman’s Integro-Differential operator iris detection; i-n results eyelids detection with our optimized Daugman’s algorithm; and o final results of segmented iris occluded by upper and lower eyelids Barbosa et al. BMC Medical Imaging (2019) 19:30 Page 9 of 14 Table 1 List of facial expressions and the corresponding prior to employing machine learning (ML) method. This landmarks used for feature extraction is based on empirical results [4, 11, 13], which show that Facial expression SO IO IC MA NT N normal subjects are likely to have an average facial mea- at Rest  surement ratio closer to 1.0 and central palsy patients are likely to have a distance ratio from Supra-orbital (SO) to Infra-orbital (IO) approaching to 1.0. Similarly, iris Raising of eyebrows exposure ratio usually results to values nearly close to 1. Hence, we find hybrid classifier (rule-based + ML) Smile appropriate in our work. This process is presented in Fig. 8. If rule number 1 is satisfied, the algorithm continues to move to the case path (i.e. for the second task), making a test if Frown rule number 2 is also satisfied; otherwise, it performs a machine learning task, such as RF, RLR, SVM, and NB. The rules are generated after fitting the training set to the DT model. 
Facial Palsy Type Classification

Classification of facial paralysis type involves two phases: (1) discrimination of healthy from unhealthy subjects; and (2) the facial palsy classification proper. In this context, we model the mapping of the symmetry features (described in the previous subsection) into each phase as a binomial classification problem. As such, we employ two classifiers to be trained: one for healthy/unhealthy discrimination (0-healthy, 1-unhealthy) and another for facial palsy type classification (0-peripheral palsy (PP), 1-central palsy (CP)).

For example, rule 1 may have conditions like: if f_x < 0.95 and f_y < 0.95 (where f_x and f_y are two of the predictors, the mean result of all parameters and the IO_MA ratio, respectively, based on Table 1), then the subject is most likely to be diagnosed with facial paralysis and can therefore proceed to rule no. 2 (i.e. to predict the FP type); otherwise, a machine learning task is performed. If that classifier returns 0, the algorithm exits the entire process, as this signifies that the subject is classified as normal/healthy; else it moves to the case path to test rule number 2 for facial palsy type classification. If rule number 2 is satisfied, the system outputs 1 (central palsy); else the feature set is fed to another classifier, which can yield either 0 or 1 (i.e. 0-PP, 1-CP). As with rule number 1, rule number 2 is also generated by the DT model.
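The two-stage rule-plus-ML flow described above can be sketched as follows. The 0.95 thresholds come from the worked examples in the text; the feature names and the fallback classifiers are stand-in stubs (a deployed version would plug in the trained RF/RLR/SVM/NB models).

```python
def hybrid_classify(features, ml_screen, ml_palsy_type):
    """Rule-based screening first; fall back to ML when a rule fails.
    features: dict of symmetry predictors; ml_*: callables returning 0/1."""
    # Rule 1 (example thresholds from the text): mean of all parameters
    # and IO_MA ratio both below 0.95 -> facial paralysis.
    if features["mean_all"] < 0.95 and features["io_ma"] < 0.95:
        has_palsy = 1
    else:
        has_palsy = ml_screen(features)      # ML model decides instead
    if has_palsy == 0:
        return "healthy"                     # exit the whole process
    # Rule 2: SO_IO ratio and iris-area ratio both above 0.95 -> CP.
    if features["so_io"] > 0.95 and features["iris_area"] > 0.95:
        return "central palsy"
    return "central palsy" if ml_palsy_type(features) == 1 else "peripheral palsy"

# Stub classifiers standing in for the trained models.
def always_healthy(f):
    return 0

def always_pp(f):
    return 0

result = hybrid_classify(
    {"mean_all": 0.80, "io_ma": 0.70, "so_io": 0.98, "iris_area": 0.97},
    always_healthy, always_pp)               # both rules fire -> central palsy
```

The ML fallbacks only run when a rule is inconclusive, which is what makes the hybrid cheaper than a pure ML cascade on clear-cut cases.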
For each classifier, we consider Random Forest (RF), Regularized Logistic Regression (RLR), Support Vector Machine (SVM), Decision Tree (DT), naïve Bayes (NB), and a hybrid classifier as appropriate classification methods, as they have been successfully applied to pattern recognition and classification on datasets of realistic size [4, 8, 13, 22].

For example, rule 2 may set conditions like: if f_a > 0.95 and f_b > 0.95 (where f_a and f_b are two of the predictors, the SO_IO ratio and the iris area ratio, respectively), then the subject is most likely to be diagnosed with central palsy (CP); otherwise, the feature set is fed to another classifier, which can return either 0 or 1 (i.e. 0-PP, 1-CP).

Fig. 8 Hybrid Classifier

Results

In our experiments, we used 440 facial images taken from 110 subjects (50 patients and 60 healthy subjects). Of the 50 unhealthy subjects, 40 have peripheral palsy (PP) and 10 have central palsy (CP). We used 70% of the dataset as the training set and 30% as the test set. For example, in discriminating healthy from unhealthy subjects, we used 77 subjects (35 patients plus 42 normal subjects) as the training set and 33 subjects (15 patients plus 18 normal) as the test set, while in the FP type classification problem we used 35 unhealthy cases (28 PP and 7 CP) as the training set and 15 (12 PP and 3 CP) as the test set.

Each subject was asked to perform four facial movements. During image pre-processing, resolutions were converted to 960 x 720 pixels. The facial palsy type of each subject was pre-labeled based on the clinicians' evaluation, which was used during the training stage. This was followed by feature extraction, i.e. the calculation of the area of the extracted iris and of the distances between the identified points, as presented in Fig. 4. Overall, we utilize 11 features to train the classifier. The samples for the healthy/unhealthy classifier were categorized into two labels: 0-healthy, 1-unhealthy. Similarly, samples of unhealthy subjects were classified into two labels: 0-central palsy and 1-peripheral palsy. It can be noted that healthy subjects have very minimal asymmetry between the two sides of the face, resulting in a ratio that approaches 1.

Facial paralysis type classification

Regularized logistic regression (RLR), Support Vector Machine (SVM), random forest (RF), naïve Bayes (NB), and classification tree (DT) were also utilized for comparison with our hybrid classifiers. Given that the dataset is not very large, we adopt a k-fold cross-validation test scheme in forming a hybrid model. The procedure involves two steps, rule extraction and hybrid model formation, as applied in our previous work [4].

Step 1: rule extraction.
If we have a dataset D = ((x_1, y_1), ..., (x_n, y_n)), we hold out 30% of D as a test set T = ((x_1, y_1), ..., (x_t, y_t)), leaving 70% as our new dataset D'. We adopt a k-fold cross-validation scheme over D', with k = 10. For example, if N = 80 samples, each fold has 8 samples. In each iteration, we leave one fold out as the validation set and use the remaining 9 folds as the training set to learn a model (e.g. rule extraction). Since we have 10 folds, we repeat this process 10 times. We extract rules by fitting the training set to a DT model.

Step 2: hybrid model formation.
In this step, we form a hybrid model by combining the rules generated in each fold and the ML classifier. Each model is tested over the validation set using different parameters (e.g. lambda for logistic regression and gamma for SVM). For example, to form the first hybrid model, we combine the rule extracted from the first fold with a regularized logistic regression model (i.e. rule + RLR) and test its performance over the validation set (the left-out fold) for each of the 10 parameters. Therefore, for each fold, we generate 10 performance measures. We repeat this procedure for the succeeding folds; performing the steps k times (with k = 10, since we use 10-fold cross-validation) gives 100 performance measures. We then average the performance across all folds. This yields 10 average performance measures (one per parameter), each of which corresponds to one specific hybrid model. We then choose the best hybrid model, i.e. the model with the lambda that minimizes errors. We retrain the selected model on all of D', test it over the hidden test set T (i.e. the 30% of dataset D), and obtain the performance of the hybrid model.

Table 2 Comparison of the performance of different classifiers for facial palsy classification

Classifier   Sensitivity (%)   Specificity (%)
RLR          85.9              97.7
RF           92.3              95.0
SVM          72.5              94.8
DT           90.2              94.0
NB           79.9              95.4
hDT_RLR      97.5              94.9
hDT_RF       94.3              95.4
hDT_SVM      96.9              90.0
hDT_NB       96.9              90.9
hLR_RF       92.2              94.1
hLR_SVM      85.9              94.8
hLR_DT       92.5              93.7
hLR_NB       85.9              95.4
hRF_RLR      96.0              92.3
hRF_SVM      97.1              90.5
hRF_DT       95.4              93.1
hRF_NB       96.3              90.2
hSVM_LR      85.9              94.8
hSVM_RF      88.0              94.2
hSVM_DT      92.5              93.4
hSVM_NB      83.0              91.7
hNB_LR       85.9              95.4
hNB_RF       87.9              96.0
hNB_DT       95.9              90.9
hNB_SVM      83.0              91.6

We evaluate the classifiers by their average performance over 20 repetitions of k-fold cross-validation with k = 10.
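The bookkeeping of Step 2 (10 folds x 10 candidate parameters = 100 measures, averaged into one score per parameter) can be sketched as follows. The `evaluate` callable is a stand-in for training a rule + RLR model on nine folds and scoring it on the left-out fold; the lambda grid and toy scoring function are hypothetical.

```python
def select_best_lambda(folds, lambdas, evaluate):
    """K-fold model selection as in Step 2: score every (fold, lambda)
    pair, average per lambda, and return the best lambda."""
    scores = {lam: [] for lam in lambdas}
    for held_out in range(len(folds)):
        train = [f for i, f in enumerate(folds) if i != held_out]
        for lam in lambdas:                       # 10 measures per fold
            scores[lam].append(evaluate(train, folds[held_out], lam))
    # One average performance measure per candidate parameter.
    averages = {lam: sum(v) / len(v) for lam, v in scores.items()}
    return max(averages, key=averages.get), averages

# Toy setup: 10 folds of 8 samples (N = 80) and 10 candidate lambdas.
folds = [list(range(i * 8, i * 8 + 8)) for i in range(10)]
lambdas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]

def fake_eval(train, val, lam):
    # Stand-in scorer that simply prefers lambda = 0.1.
    return 1.0 - abs(lam - 0.1)

best, averages = select_best_lambda(folds, lambdas, fake_eval)
```

The winning model would then be retrained on all of D' and scored once on the held-out 30% test set T.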
We repeat this process to evaluate the other hybrid models (e.g. rule-based + RF, rule-based + RLR, etc.) and finally choose the hybrid model that performs best. The hybrid classifiers, RF, SVM, RLR, DT, and NB were tested, and experiments reveal that the hybrid classifier rule-based + RLR (hDT_RLR) outperformed the other classifiers in discriminating healthy from unhealthy (i.e. with paralysis) subjects. Similarly, for the facial palsy classification task (PP-peripheral and CP-central palsy), the hDT_RLR hybrid classifier is superior among the classifiers used in the experiments.

Table 2 presents a comparison of the average performance of our hybrid classifiers, RF, RLR, SVM, DT, and NB based on our proposed approach. For FP type classification, our hDT_RLR hybrid classifier achieves a sensitivity 5.2% higher than RLR, RF, SVM, DT, and NB (see Table 2). Other hybrid classifiers also show good results comparable with hDT_RLR. However, in the context of our study, we are more concerned with designing a classifier that yields stable results on the sensitivity performance measure without necessarily sacrificing the specificity, or the fall-out (probability of false alarm). Hence, we employ hDT_RLR.

To illustrate the diagnostic ability of our classifier, we create a receiver operating characteristic (ROC) curve, a graphical plot obtained by varying the discrimination threshold. The ROC curve plots the true positive rate (TPR) on the y-axis versus the false positive rate (FPR) on the x-axis for every possible classification threshold. In machine learning, the true-positive rate, also known as sensitivity or recall, measures how often the classifier predicts positive when the actual classification is positive (i.e. not healthy). The false-positive rate, also known as the fall-out or probability of false alarm, measures how often the classifier incorrectly predicts positive when the actual classification is negative (i.e. healthy). The ROC curve is therefore the sensitivity as a function of the fall-out.

Figures 9 and 10 present comparisons of the area under the ROC curve (AUC) of our hybrid hDT_RLR classifier for healthy/unhealthy discrimination and for FP type classification (central or peripheral), respectively, using three different feature extraction methods: (a) the Localized Active Contour-based method for key point feature extraction (LAC-KPFE); (b) the Localized Active Contour-based method for geometric and region feature extraction (LAC-GRFE); and (c) the Ensemble of Regression Tree-based method for geometric and region feature extraction (ERT-GRFE).

Fig. 9 Comparison of the ROC curves of our classifiers using different feature extraction methods (ERT-GRFE, LAC-GRFE and LAC-KPFE) for healthy and not-healthy classification

Fig. 10 Comparison of the ROC curves of our classifiers using different feature extraction methods (ERT-GRFE, LAC-GRFE and LAC-KPFE) for facial palsy classification

Our proposed approach, ERT-GRFE with the hybrid classifier hDT_RLR, achieves an AUC 2.7%-4.9% higher than the other two methods in discriminating healthy from unhealthy (i.e. with paralysis) subjects, as shown in Fig. 9. Similarly, for palsy type classification, central palsy (CP) versus peripheral palsy (PP), ERT-GRFE plus the hDT_RLR hybrid classifier outperformed the two feature extraction methods LAC-GRFE and LAC-KPFE used in the experiments by at least 2.5%-7.7%, as shown in Fig. 10. Experiments reveal that our method yields more stable results.

Tables 3 and 4 present a comparison of the performance of the three methods for discriminating healthy from unhealthy subjects and for classifying facial palsy type, respectively. Each approach differs according to the features applied and the corresponding extraction method: (a) LAC-KPFE; (b) LAC-GRFE; and (c) ERT-GRFE, as listed above.

Table 3 shows that in discriminating healthy from unhealthy subjects, our proposed approach outperforms the method that uses key or salient point-based features with the LAC model, with improvements of 9% in sensitivity and 4.1% in specificity. Similarly, experiments show that our approach outperforms the previous method [4] that used geometric and region-based features (GRFE) with the LAC model, with improvements of 3.1% and 5.9% in sensitivity and specificity, respectively.

Table 3 Comparison of the performance of the three methods for healthy and unhealthy discrimination

             LAC-based KPFE   LAC-based GRFE   ERT-based GRFE (our approach)
Sensitivity  89.12%           95.01%           98.12%
Specificity  90.01%           88.12%           94.06%
AUC          93.40%           95.56%           98.34%

On the other hand, Table 4 reveals that for central and peripheral palsy classification, our proposed ERT-based GRFE is considerably better than the previous approach that solely used key point-based features, with improvements of around 12% and 9% in the sensitivity and specificity performance measures, respectively. Furthermore, experiments reveal that our proposed ERT-based GRFE approach yields better performance, particularly in sensitivity and specificity, with improvements of 6.4% and 6.8%, respectively. Thus, our proposed approach is superior among the three methods.

Table 4 Comparison of the performance of the three methods for facial palsy classification

             LAC-based KPFE   LAC-based GRFE   ERT-based GRFE (our approach)
Sensitivity  85.15%           91.09%           97.48%
Specificity  85.12%           88.10%           94.91%
AUC          89.81%           95.01%           97.48%

Discussion

Empirical results [15] reveal that the ensemble of regression trees (ERT) model has an appealing quality: it performs shape-invariant feature selection while minimizing the same loss function at training and test time, which significantly lessens the time complexity of extracting features. True enough, our experiments reveal that the ERT-based method for geometric and region feature extraction (ERT-GRFE) works well.
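The TPR/FPR sweep behind the ROC comparisons in Figs. 9 and 10 can be reproduced with a short score-ranking routine; the labels and scores below are made up for illustration (this simple version does not group tied scores).

```python
def roc_points(labels, scores):
    """ROC curve: (FPR, TPR) points obtained by sweeping the decision
    threshold from high to low. labels: 1 = positive (not healthy)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels) or 1
    neg = (len(labels) - sum(labels)) or 1
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1          # another true positive admitted
        else:
            fp += 1          # another false alarm admitted
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

labels = [1, 1, 1, 0, 0, 0]               # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]   # a perfectly separating classifier
area = auc(roc_points(labels, scores))
```

For the perfectly separating scores above the curve rises to TPR = 1 before any false positive occurs, so the area is maximal.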
Furthermore, our proposed approach of combining iris segmentation and ERT-based key point detection for feature extraction provides better discrimination of central and peripheral palsy, most especially in the 'raising of eyebrows' and 'screwing of nose' movements. It captures changes of structure on the edges of the eye, i.e., the significant difference between the normal side and the palsy side for some facial movements (e.g. eyebrow lifting, nose screwing, and showing of teeth). Also, features based on the combination of iris and key points obtained with the ensemble of regression trees technique can model the typical changes in the eye region. A closer look at the performance of our classifier, as shown in Tables 3 and 4, reveals interesting statistics in terms of the specific abilities of the three methods. Our method proves to make a significant contribution in discriminating central from peripheral palsy patients and healthy from facial palsy subjects. The combination of iris segmentation and the ERT-based key point approach is more suitable for this operation.

Conclusion

In this study, we present a novel approach to address the FP classification problem in facial images. Salient point and iris detection based on an ensemble of regression trees are employed to extract the key features. The Regularized Logistic Regression (RLR) plus Classification Tree (CT) classifier provides an efficient quantitative assessment of facial paralysis. Our facial palsy type classifier provides a tool essential to physicians for deciding the medical treatment scheme with which to begin the patient's rehabilitation process. Furthermore, our approach has several merits that are essential for real application:

• geometric and iris region features based on the ensemble of regression trees and optimized Daugman's theory (using a parabolic function, i.e. a 2nd-degree polynomial), respectively, allow efficient identification of the asymmetry of the human face, revealing the significant difference between the normal and afflicted sides, which the localized active contour model fails to track, especially for peculiar images (e.g. wrinkles, excessive mustache, occlusions, etc.);
• the ERT model has a very appealing quality of reducing errors in each iteration, which can be very useful in extracting boundaries of the facial features from the background;
• the ERT model does not require proper or perfect identification of initial evolving curves to ensure accurate facial feature extraction; and
• our method significantly lessens the time complexity of extracting features without sacrificing accuracy, making it more suitable for real application.

Abbreviations
1D: 1-Dimensional; AUC: Area under ROC curve; CP: Central palsy; CT: Classification tree; DT: Decision tree; ERT: Ensemble of regression trees; ERT-GRFE: Ensemble of regression tree-based method for geometric and region features extraction; FP: Facial paralysis; FPR: False positive rate; HOG: Histogram of oriented gradients; hDT_RLR: Hybrid decision tree and regularized logistic regression; IC: Inner canthus; IO: Infra-orbital; IRB: Institutional review board; LAC: Localized active contour; LAC-GRFE: Localized active contour-based method for geometric and region features extraction; LAC-KPFE: Localized active contour-based method for key points extraction; MA: Mouth angle; ML: Machine learning; NB: Naïve Bayes; NT: Nose tip; OC: Outer canthus; PP: Peripheral palsy; RF: Random forest; RLR: Regularized logistic regression; ROC: Receiver operating characteristic curve; ROI: Region of interest; SO: Supra-orbital; SVM: Support vector machine; TPR: True positive rate; UE: Upper eyelids

Acknowledgments
There are no acknowledgments.

Funding
This research was supported by the National Research Foundation of Korea (NRF-2017M3C4A7065887) and a National IT Industry Promotion Agency grant funded by the Ministry of Science and ICT and the Ministry of Health and Welfare (No. C1202-18-1001, Development Project of The Precision Medicine Hospital Information System (P-HIS)); a scholarship was granted by the Korean Government Scholarship Program - NIIED, Ministry of Education, South Korea.

Availability of data and materials
The datasets generated and/or analysed during the current study are not available due to patient privacy.

Authors' contributions
Conceived and designed the methodology and analyzed the methods: JB, JK. Performed experiments and programming: JB. Collected images and performed manual evaluation: WS. All authors wrote the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate
This study was approved by the Institutional Review Board (IRB) of Korea University, Guro Hospital (reference number MD14041-002). The board permitted not taking informed consent due to the retrospective design of this study.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
Department of Computer Science and Engineering, Korea University, Seoul, South Korea. IT Department, University of Science and Technology of Southern Philippines, Cagayan de Oro, Philippines. Department of Neurology and Stroke Center, Samsung Medical Center, Seoul, South Korea. Department of Digital Health, SAIHST, Sungkyunkwan University School of Medicine, Seoul, South Korea.

Received: 3 December 2018 Accepted: 2 April 2019

References
1. Baugh R, Ishii G, Schwartz SR, Drumheller CM, Burkholder R, Deckard N, Dawson C, Driscoll C, Boyd Gillespie M. Clinical practice guideline: Bell's palsy. Otolaryngol Head Neck Surg. 2013;149:1–27.
2. Peitersen E. Bell's palsy: the spontaneous course of 2,500 peripheral facial nerve palsies of different etiologies. Acta Otolaryngol. 2002;122(7):4–30.
3. Kanerva M. Peripheral facial palsy: grading, etiology, and Melkersson–Rosenthal syndrome. PhD thesis. Finland: University of Helsinki; 2008.
4. Barbosa J, Lee K, Lee S, Lodhi B, Cho J, Seo W, Kang J. Efficient quantitative assessment of facial paralysis using iris segmentation and active contour-based key points detection with hybrid classifier. BMC Med Imaging. 2016;16:23–40.
5. May M. Anatomy for the clinician. In: May M, Schaitkin B, editors. The Facial Nerve, May's Second Edition. 2nd ed. New York: Thieme; 2000. p. 19–56.
6. Wachtman GS, Liu Y, Zhao T, Cohn J, Schmidt K, Henkelmann TC, VanSwearingen JM, Manders EK. Measurement of asymmetry in persons with facial paralysis. In: Combined Annual Conference of the Robert H. Ivy and Ohio Valley Societies of Plastic and Reconstructive Surgeons; 2002.
7. Liu Y, Schmidt K, Cohn J, Mitra S. Facial asymmetry quantification for expression invariant human identification. Comput Vis Image Underst. 2003;91:138–59.
8. He S, Soraghan J, O'Reilly B, Xing D. Quantitative analysis of facial paralysis using local binary patterns in biomedical videos. IEEE Trans Biomed Eng. 2009;56:1864–70.
9. Wang S, Li H, Qi F, Zhao Y. Objective facial paralysis grading based on p-face and eigenflow. Med Biol Eng Comput. 2004;42:598–603.
10. Anguraj K, Padma S. Analysis of facial paralysis disease using image processing technique. Int J Comput Appl. 2012;54:1–4.
11. Dong J, Ma L, Li W, Wang S, Liu L, Lin Y, Jian M. An approach for quantitative evaluation of the degree of facial paralysis based on salient point detection. In: International Symposium on Intelligent Information Technology Application Workshops. IEEE; 2008.
12. Liu L, Cheng G, Dong J, Qu H. Evaluation of facial paralysis degree based on regions. In: Third International Conference on Knowledge Discovery and Data Mining. Washington, DC: IEEE Computer Society; 2010. p. 514–7.
13. Liu L, Cheng G, Dong J, Qu H. Evaluation of facial paralysis degree based on regions. In: Proceedings of the 2010 Third International Conference on Knowledge Discovery and Data Mining. Washington, DC: IEEE Computer Society; 2010. p. 514–7.
14. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9:62–6.
15. Kazemi V, Sullivan J. One millisecond face alignment with an ensemble of regression trees. In: Proc IEEE Conf Comput Vis Pattern Recognit. IEEE; 2014. p. 1867–74.
16. Samsudin W, Sundaraj K. Image processing on facial paralysis for facial rehabilitation system: a review. In: IEEE International Conference on Control System, Computing and Engineering. Malaysia: IEEE; 2012. p. 259–63.
17. Fasel B, Luettin J. Automatic facial expression analysis: a survey. Pattern Recognit. 2003;36:259–75.
18. Ngo T, Seo M, Chen Y, Matsushiro N. Quantitative assessment of facial paralysis using local binary patterns and Gabor filters. In: Proceedings of the 5th International Symposium on Information and Communication Technology (SoICT). New York: ACM; 2014. p. 155–61.
19. Déniz O, Bueno J, Salido F, De la Torre F. Face recognition using histograms of oriented gradients. Pattern Recognit Lett. 2011;32:1598–603.
20. Daugman J. How iris recognition works. IEEE Trans Circuits Syst Video Technol. 2004;14:21–30.
21. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proc IEEE Conf Computer Vision and Pattern Recognition. IEEE; 2005.
22. Lyons M, Budynek J, Akamatsu S. Automatic classification of single facial images. IEEE Trans Pattern Anal Mach Intell. 1999;21(12):1357–62.
23. Lyons M, Akamatsu S, Kamachi M, Gyoba J. Coding facial expressions with Gabor wavelets. In: Third IEEE Conference on Face and Gesture Recognition. IEEE; 1998. p. 200–5.
24. Proenca H, Alexandre L. UBIRIS: a noisy iris image database. In: Proceedings of ICIAP 2005 - International Conference on Image Analysis and Processing. Springer; 2005. p. 970–7.

paraFaceTest: an ensemble of regression tree-based facial features extraction for efficient facial paralysis classification

BMC Medical Imaging, Volume 19 (1) – Apr 25, 2019


Publisher: Springer Journals
Copyright: Copyright © 2019 by The Author(s)
Subject: Medicine & Public Health; Imaging / Radiology
eISSN: 1471-2342
DOI: 10.1186/s12880-019-0330-8

Abstract

Background: Facial paralysis (FP) is a neuromotor dysfunction that causes loss of voluntary muscle movement in one side of the human face. As the face is the basic means of social interaction and emotional expression among humans, afflicted individuals can often be introverted and may develop psychological distress, which can be even more severe than the physical disability. This paper addresses the problem of objective facial paralysis evaluation.

Methods: We present a novel approach for objective facial paralysis evaluation and classification, which is crucial for deciding the medical treatment scheme. For FP classification, in particular, we propose a method based on an ensemble of regression trees to efficiently extract facial salient points and detect iris or sclera boundaries. We also employ a 2nd-degree polynomial (parabolic function) to improve Daugman's algorithm for detecting occluded iris boundaries, thereby allowing us to efficiently obtain the area of the iris. The symmetry score of each face is measured by calculating the ratios of both the iris areas and the distances between the key points on both sides of the face. We build a model by employing a hybrid classifier that discriminates healthy from unhealthy subjects and performs FP classification.

Results: An objective analysis was conducted to evaluate the performance of the proposed method. As we explore the effect of data augmentation using publicly available datasets of facial expressions, experiments reveal that the proposed approach is efficient.

Conclusions: Extraction of iris and facial salient points from images based on an ensemble of regression trees, along with our hybrid classifier (classification tree plus regularized logistic regression), provides an improved way of addressing the FP classification problem. It addresses the common limiting factor introduced in previous works, i.e.
having a greater sensitivity to subjects exposed to peculiar facial images, whereby improper identification of the initial evolving curve for facial feature segmentation results in inaccurate facial feature extraction. Leveraging an ensemble of regression trees provides accurate salient point extraction, which is crucial for revealing the significant difference between the healthy and the palsy side when performing different facial expressions.

Keywords: Facial paralysis classification, Facial paralysis objective evaluation, Ensemble of regression trees, Salient point detection, Iris detection, Facial paralysis evaluation framework

Background

Facial paralysis (FP) or facial nerve palsy is a neuromotor dysfunction that causes loss of voluntary muscle movement in one side of the human face. As a result, this leads to the loss of the person's ability to mimic facial expressions. FP is not an uncommon medical condition: for every sixty people around the world, one can be affected by facial paralysis [1]. Facial paralysis often causes patients to be introverted and eventually suffer from social and psychological distress, which can be even more severe than the physical disability [2]. It is usually encountered in clinical practice and can be classified into peripheral and central facial palsy [3, 4]. These two categories differ from each other according to the behavior of the upper part of the human face.

*Correspondence: kangj@korea.ac.kr
Department of Computer Science and Engineering, Korea University, Seoul, South Korea. Full list of author information is available at the end of the article.

© The Author(s).
2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Peripheral facial palsy is a nerve disturbance in the pons of the brainstem, which affects the upper, middle and lower facial muscles of one side of the face. On the other hand, central facial palsy is a nerve dysfunction in the cortical areas whereby the forehead and eyes are spared, but the lower half of one side of the face is affected, unlike in peripheral FP [3–5]. Such a scenario has triggered the interest of researchers and clinicians in this field and, consequently, led them to the development of objective grading of facial functions and of methods for monitoring the effect of medical, rehabilitation or surgical treatment.

Many computer-aided analysis systems have been introduced to measure the dysfunction of one part of the face and the level of severity, but not much work addresses facial paralysis type as a classification problem. Moreover, a method efficient enough to be universally accepted is still underway [3]. Classification of each case of facial paralysis into central or peripheral plays a critical role in helping physicians decide on the most appropriate treatment scheme [4]. Image processing has been applied in existing objective facial paralysis assessment, but the processing methods used are mostly labor-intensive, if not sensitive to the extrinsic facial asymmetry caused by orientation, illumination and shadows. As such, creating a clinically usable and reliable method is still very challenging.

Wachtman et al. [6, 7] measured facial paralysis by examining facial asymmetry on static images. Their methods are prone to facial asymmetry even for healthy subjects due to sensitivity to orientation, illumination and shadows [8]. Other previous works [9–11] were also introduced, but most of them are solely based on finding salient points on the human face with the use of standard edge detection tools (e.g. Canny, Sobel, SUSAN) for image segmentation. The Canny edge detection algorithm may yield inaccurate results as it is influenced by connected edge points, because it compares the adjacent pixels in the gradient direction to determine whether the current pixel is a local maximum. Improper segmentation will also result in improper generation of key points. Moreover, it may be difficult to find and detect the exact points when the same algorithm is applied to elderly patients.

Dong et al. [11] proposed the use of salient points for estimating the degree of facial paralysis. A method was proposed to detect the salient points that serve as the basis for estimating the degree of paralysis. As the salient points may include some unnecessary points for describing facial features, edge detection was used to discard such points. Then k-means clustering is applied to classify the salient points into six categories: two eyebrows, two eyes, nose, and mouth. About 14 key points are found in the six facial regions, representing the points that may be affected when performing some facial expressions. However, this technique falls short when applied to elderly patients, for whom exact points can be difficult to obtain [12].

Another method estimates the degree of facial paralysis by comparing multiple regions of the human face. In [13], Liu et al. compare the two sides of the face and compute four ratios, which are consequently used to represent the severity of the paralysis. Nevertheless, this method suffers from the influence of uneven illumination [8].

In our previous work [4], a technique that generates closed contours for separating the outer boundaries of an object from the background using the Localized Active Contour (LAC) model was employed for feature extraction, which reasonably reduced these drawbacks. However, one limiting factor of that method is its greater sensitivity to subjects exposed to peculiar images where facial features suffer from different occlusions (e.g. eyeglasses, eyebrows occluded with hair, wrinkles, excessive beard, etc.). Moreover, improper setting or identification of parameters, such as the radius of the initial evolving curve (e.g. the minimum bounding box of the eyes, eyebrows, etc.), may lead to improper feature segmentation, which may in turn give inaccurate detection of key points, as revealed in Fig. 1. Although this limitation was addressed in [4] by applying a window kernel (generated based on Otsu's method [14]) that is run through the binary image (e.g. detected eyes, eyebrows, lips, etc.), Otsu's method has the drawback of assuming uniform illumination. Additionally, it does not use any object structure or spatial coherence, which may sometimes result in inaccurate generation of kernel parameters and, in turn, improper segmentation.

Fig. 1 Pre-processing results. a eye image with some uneven illumination, b-c extracted eye contour by LAC model when parameters of initial evolving curves are not properly identified
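For reference, Otsu's method [14], mentioned above as the basis of the window kernel in [4], picks the gray-level threshold that maximizes the between-class variance of the histogram. A compact sketch on a hypothetical 8-bin histogram:

```python
def otsu_threshold(histogram):
    """Return the bin index maximizing between-class variance
    (Otsu's criterion) for a gray-level histogram."""
    total = sum(histogram)
    sum_all = sum(i * h for i, h in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t, h in enumerate(histogram):
        w0 += h                                  # weight of the dark class
        if w0 == 0 or w0 == total:
            continue                             # one class is empty
        sum0 += t * h
        mu0 = sum0 / w0                          # dark-class mean
        mu1 = (sum_all - sum0) / (total - w0)    # bright-class mean
        var_between = w0 * (total - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy histogram: dark pixels in bins 0-2, bright pixels in bins 6-7.
hist = [10, 25, 10, 0, 0, 0, 20, 35]
threshold = otsu_threshold(hist)
```

Because the criterion uses only the histogram, the threshold ignores spatial structure, which is exactly the drawback noted above under uneven illumination.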
and with reasonable illumination of lights so that each In order to address these drawbacks, we present a side of the face achieves roughly similar amount of light- novel approach based on ensemble of regression trees ing. The photo taking procedure starts with the patient (ERT) model. We leverage ERT model to extract salient performing the’at rest’faceposition, followedbythe points as basis for preparing the features or independent three voluntary facial movements that include raising of variables. Our target was to keep a natural framework eyebrows, screwing-up of nose, and showing of teeth or that finds 68 key points on edges of facial features (e.g. smiling. eyes, eyebrows, mouth, etc.); and regresses the loca- Theframework of theproposedobjective facial paraly- tion of the facial landmarks from a sparse subset of sisevaluationsystemispresented in Fig. 2.Facialimages intensity values extracted from an input image. As ERT of a patient, which are taken while being requested to model provides a simple and intuitive way in perform- perform some facial expressions, are stored in the image ing this task and in generating landmarks that separates database. In the rest of this section, we describe the boundaries of each significant facial feature from the details of the form of individual components of the facial background, without requiring setting-up of parameters landmark detection and how we perform evaluation for initial evolving curves as required by the previ- and classification of facial paralysis. To begin the pro- ous approach [4], we find this technique appropriate cess, raw images from the image database are extracted to address our problem. Furthermore, empirical results or retrieved. 
Dimension alignment and image resizing [15] reveal that ERT model does have an appealing qual- are then performed, followed by the pre-processing of ity that performs shape invariant feature selection while images to enhance contrast and remove undesirable minimizing the same loss function during training and image noise. test time, which significantly lessens time complexity of extracting features. Facial Image Acquisition and Pre-processing In this study, we make three significant contributions. Existing literatures show that facial data acquisition First, we introduce a more robust approach for efficient methods can be classified into two categories depending evaluation of facial paralysis and classification, which is on the processing methods they used and on the usage of crucial for determining the appropriate medical treat- images or videos as their databases. As image-based sys- ment scheme. Second, we provide an efficient method in tems do have an appealing advantage of ease of use and feature extraction based on ensemble of regression trees cost effectiveness [16], we utilize still images as inputs to in a computationally efficient way; and finally, we study our classification model. in depth, the effect of computing the facial landmarks symmetry features on the quality of facial paralysis clas- Face Normalization sifications. Facial feature extractions can be complicated as the face appearance changes, which is caused by illumination and Methods pose variations. Normalizing the acquired facial images The study performs objective evaluation of facial paral- prior to objective FP assessment may significantly reduce ysis particularly the facial classification and grading in a these complications. It is worth note taking, however, computationally efficient way. We capture facial images that while face normalization may be a good approach in Fig. 2 Framework of the proposed objective facial paralysis evaluation Barbosa et al. 
connection with the methods for objective facial paralysis assessment, it is optional, so long as the extracted feature parameters are normalized prior to the classification task.

Facial Feature Extraction

Deformation Extraction
Facial feature deformation can be characterized by changes of texture and shape that lead to high spatial gradients, which are good indicators for tracking facial actions [17]. In turn, these are good indicators of facial asymmetry and may be analyzed either in the image or the spatial frequency domain, computed by high-pass gradient or Gabor wavelet-based filters. Ngo et al. [18] utilized this method by combining it with local binary patterns and claimed that it performs well for the task of quantitative assessment of facial paralysis. Deformation extraction approaches can be holistic or local. Holistic image-based approaches are those where the face is processed as a whole, while local methods focus on facial features or areas that are prone to change and are able to detect subtle changes in small areas [17]. In this work, we extract the facial salient points and the iris area for geometric-based and region-based feature generation, respectively.

Salient points detection
Feature extraction starts from the preprocessing of the input image and facial region detection. We find frontal human faces in an image and estimate their pose. The estimated pose takes the form of 68 facial landmarks, which lead to the detection of the key points of the mouth, eyebrows and eyes. The face detector is made using the classic Histogram of Oriented Gradients (HOG) feature [19] combined with a linear classifier, an image pyramid, and a sliding window detection scheme. We utilize the pose estimator employed in [15]. In the context of computer vision, a sliding window is a rectangular region of fixed width and height that slides across an image, as described in the subsequent section. Normally, for each window, we take the window region and apply an image classifier to check whether the window contains an object (i.e. the face as our region of interest). Combined with image pyramids, image classifiers can be created that recognize objects even if their scales and locations in the image vary. These techniques, in their simplicity, play an absolutely critical role in object detection and image classification.

Features are extracted from the different facial expressions that the patient performs, which include: (a) at rest; (b) raising of eyebrows; (c) frowning or screwing-up of nose; and (d) smiling or showing of teeth. In this study, we consider geometric and region-based features as our inputs for modelling a classifier. To extract the geometric-based features, the salient points of the facial features (e.g. eyes, eyebrows and lip) are detected using the concept of an ensemble of regression trees [15]. The goal is to find the 68 key points on the edges of facial features, as shown in Fig. 3a. However, for generating the geometric-based symmetry features of each image, we are interested in the following points: inner canthus (IC), mouth angle (MA), infra-orbital (IO), upper eyelids (UE), supra-orbital (SO), nose tip (NT) and nostrils (see Fig. 3b). In order to detect the salient points of the facial image, we leverage the dlib library [15], which has the capability to detect the points of each facial feature. These points are inputs for calculating the distances of the key points identified in Fig. 4 and the corresponding distance ratios for both sides of the face. Additionally, we also extract region-based features, which involve detection of the iris or sclera boundaries by leveraging Daugman's Integro-Differential Operator [20].

Fig. 3 Facial landmarks or key points. a 68 key points, and b salient points for each facial feature utilized in this study

Fig. 4 Facial expressions used in the study. a at rest; b raising or lifting of eyebrows; c screwing-up of nose; and d smiling or showing of teeth

Histogram of Oriented Gradients (HOG)
In the field of computer vision and image processing, the Histogram of Oriented Gradients (HOG) is a feature descriptor used for object or face detection. It is an algorithm which takes an image and outputs feature vectors or feature descriptors. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical fingerprint which can differentiate one feature from another. Ideally, this information is invariant under image transformation; hence, even if the image is transformed in some way, we can still find the feature again. HOG uses locally normalized histograms of gradient orientation features similar to the Scale Invariant Feature Transform (SIFT) descriptor in a dense overlapping grid, which gives very good results in face detection. It is similar to SIFT, except that HOG feature descriptors are computed on a dense grid of uniformly spaced cells and use overlapping local contrast normalization for improved performance [19, 21].

Implementation of the HOG descriptor algorithm is as follows:

1. Partition the image into small connected regions called cells; then, for each cell, calculate the gradient direction histogram for the pixels within the cell.
2. According to the gradient orientation, discretize each cell into angular bins. Each pixel of the cell contributes a weighted gradient to its corresponding angular bin. Adjacent groups of cells are considered as spatial regions, referred to as blocks. The grouping of cells into a block is then the basis for histogram grouping and normalization. The normalized group of histograms represents the block histogram, and the set of these block histograms represents the descriptor.

The following basic configuration parameters are required for computation of the HOG descriptor:

1. Masks to compute derivatives and gradients.
2. Geometry of splitting an image into cells and grouping cells into a block.
3. Block overlapping.
4. Normalizing parameters.

The recommended values for the HOG parameters include: (a) 1D centered derivative mask [-1, 0, +1]; (b) detection window size of 64x128; (c) cell size of 8x8; and (d) block size of 16x16 (2x2 cells) [21].

Ensemble of Regression Trees
An ensemble of regression trees is a predictive model composed of a weighted combination of multiple regression trees. Generally, combining multiple regression trees increases the predictive performance; it is the collection of regression trees that makes a bigger and better regression tree [15]. The core of each regression function r is the set of tree-based regressors fit to the residual targets during the gradient boosting algorithm.
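The HOG steps outlined above (cell-level orientation histograms followed by block normalization) can be sketched in a few lines of numpy. This is a minimal illustration of the idea, not dlib's actual implementation; function names and the toy ramp image are ours.

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Steps 1-2 of the HOG pipeline: per-cell histograms of gradient
    orientation, each pixel voting with its gradient magnitude."""
    # 1D centered derivative mask [-1, 0, +1], as recommended in the text.
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
    gy[1:-1, :] = image[2:, :] - image[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = image.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = (ang / (180.0 / bins)).astype(int) % bins
    for r in range(ch * cell):
        for c in range(cw * cell):
            hist[r // cell, c // cell, bin_idx[r, c]] += mag[r, c]
    return hist

def normalize_blocks(hist, eps=1e-6):
    """Group 2x2 cells into overlapping blocks and L2-normalize; the
    concatenated block histograms form the descriptor."""
    ch, cw, bins = hist.shape
    blocks = []
    for r in range(ch - 1):
        for c in range(cw - 1):
            v = hist[r:r + 2, c:c + 2].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)

img = np.tile(np.linspace(0, 255, 32), (32, 1))    # toy horizontal ramp
desc = normalize_blocks(hog_cell_histograms(img))
```

With an 8x8 cell on a 32x32 image this gives 4x4 cells, hence 3x3 overlapping 2x2 blocks of 9-bin histograms, i.e. a 324-dimensional descriptor.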
Shape invariant split tests
Based on the approach used by Kazemi et al. [15], we make a decision based on thresholding the difference between the intensities of two pixels at each split node in the regression tree. The pixels used in the test are at positions i and k when defined in the coordinate system of the mean shape. For facial images having arbitrary shapes, we intend to index the points that have the same position relative to their shape as i and k have to the mean shape. To accomplish this, the image can be warped to the mean shape based on the current shape estimate before extracting the features. Warping only the locations of the points is more efficient than warping the whole image, since we utilize only a very sparse representation of the image. In what follows, the details are precisely shown.

We let v_i be the index of the facial landmark in the mean shape that is nearest to i, and define its offset from i as:

δY_i = i − Y_{v_i}    (1)

Then, for a shape S_j defined in image I_j, the position in I_j that is qualitatively similar to i in the mean shape image is given by

i′ = Y_{j,v_i} + (1/s_j) R_j^T δY_i    (2)

where s_j and R_j are the scale and rotation matrix of the similarity transform which transforms S_j to S̄, the mean shape. The scale and rotation are found to minimize

Σ_{m=1}^{n} ‖ Y_m − (s_j R_j Y_{j,m} + t_j) ‖²    (3)

the sum of squares between the mean shape's facial landmark points, the Y_m's, and those of the warped shape; k′ is similarly defined. Formally, each split is a decision involving 3 parameters θ = (τ, i, k) and is applied to each training and test example as

h(I_{π_j}, S_j^{(t)}, θ) = 1 if I_{π_j}(i′) − I_{π_j}(k′) > τ, and 0 otherwise    (4)

where i′ and k′ are defined using the scale and rotation matrix which best warp S_j^{(t)} to S̄ according to Eq. (2).
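A minimal sketch of Eqs. (1), (2) and (4) follows: warp the two test pixels into the image frame via the nearest mean-shape landmark, then threshold their intensity difference. The landmark data, the identity similarity transform and all names here are illustrative only.

```python
import numpy as np

def warp_point(p, mean_shape, shape_j, s_j, R_j):
    """Eqs. (1)-(2): find the mean-shape landmark v nearest to p, take the
    offset delta, and map it into image j's coordinate frame."""
    v = int(np.argmin(np.linalg.norm(mean_shape - p, axis=1)))
    delta = p - mean_shape[v]                              # Eq. (1)
    return shape_j[v] + (1.0 / s_j) * R_j.T @ delta        # Eq. (2)

def split_test(image, p_i, p_k, tau, mean_shape, shape_j, s_j, R_j):
    """Eq. (4): threshold the intensity difference of the two warped pixels."""
    i2 = warp_point(p_i, mean_shape, shape_j, s_j, R_j).astype(int)
    k2 = warp_point(p_k, mean_shape, shape_j, s_j, R_j).astype(int)
    return 1 if image[i2[1], i2[0]] - image[k2[1], k2[0]] > tau else 0

# Toy data: two landmarks and an identity similarity transform, so the
# current shape estimate coincides with the mean shape.
mean_shape = np.array([[2.0, 2.0], [6.0, 6.0]])
shape_j = mean_shape.copy()
img = np.arange(100, dtype=float).reshape(10, 10)
out = split_test(img, np.array([6.0, 6.0]), np.array([2.0, 2.0]), tau=10.0,
                 mean_shape=mean_shape, shape_j=shape_j, s_j=1.0, R_j=np.eye(2))
```

With the identity transform the warped positions equal the input positions, so the split simply compares the two raw pixel intensities against τ.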
In practice, during the training phase, the assignments and local translations are identified. At test time, computing the similarity transform, the most computationally expensive part of the process, is done only once at every level of the cascade.

This method starts with the following:

1. A training set of labeled facial landmarks on images. These images are manually labeled, specifying the (x, y)-coordinates of regions surrounding each facial structure.
2. Priors, or more specifically, the probability on the distance between pairs of input pixels.

Given this training data, an ensemble of regression trees is trained to estimate the positions of the facial landmarks directly from the pixel intensities; that is, no feature extraction is taking place. The final result is a detector of facial landmarks that can be utilized to efficiently detect the salient points of an image. Fig. 5 presents sample results of our proposed approach for facial feature detection based on the ensemble of regression trees (ERT) model when applied to the JAFFE dataset [22, 23].

Fig. 5 Sample results of our proposed approach for facial feature extraction based on Ensemble of Regression Trees (ERT) model as applied in JAFFE dataset [22]. Facial landmarks (i.e. 68 key points) are detected from a single image, based on ERT algorithm

Region-based Feature Extraction

Iris Detection
A person who has a symptom of facial paralysis is likely to have an asymmetric distance between the infra-orbital (IO) and mouth angle (MA) of the two sides of the face while performing facial movements such as smile and frown. They may also have an unbalanced exposure of the iris when performing different voluntary muscle movements (e.g. screwing of nose, showing of teeth or smiling, raising of eyebrows with both eyes directed upward) [4]. We utilize the points (e.g. upper eyelid (UE), infra-orbital (IO), inner canthus (IC) and outer canthus (OC)) detected by the ensemble of regression trees algorithm, and from there we generate the parameters of the eye region as inputs to Daugman's Integro-Differential operator [20] to detect the iris or sclera boundaries. However, as some eye images do have eyelid occlusions, an optimization of Daugman's algorithm was performed to ensure proper iris boundary detection. In our previous work, a LAC-based method was employed to optimize Daugman's algorithm and perform a subtraction method. However, the LAC model is quite tedious, as it may result in improper segmentation if the initial evolving curves and iterations are not properly defined. Moreover, it has a greater sensitivity to the eyes of old patients due to the presence of excessive wrinkles.

In this paper, we implement a curve fitting scheme to ensure proper detection of the iris or sclera boundaries, thereby providing better results in detecting an occluded iris. By definition, a parabola is a plane curve; it is mirror-symmetrical and approximately U-shaped. To fit a parabolic profile through the data, a second degree polynomial is used, defined as y = ax² + bx + c. This will exactly fit a simple curve to three constraints, which include a point, angle, or curvature. More often, the angle and curvature constraints are added to the end of a curve, in which case they are called end conditions. To fit a curve through the data, we first get the coefficients b and c of the line p(x) = bx + c. To evaluate this line, values of x (given by xp) must be chosen; for example, the curve will be plotted for x in [0, 7] in steps of Δx = 0.1. The generated coefficients are then used to generate the y values of the polynomial fit at the desired values of x given by xp. This means that a vector (denoted yp) can be generated, which contains the parabolic fit to the data evaluated at the x-values xp.

In what follows, we describe our approach for detecting the iris or sclera boundaries:

1. Detect the eye region (i.e. create a rectangular bounding box from the detected UE, IO, IC, OC as illustrated in Fig. 3b).
2. Detect the upper eyelid edge using a gray threshold level (e.g. threshold <= 0.9).
3. Convert the generated grayscale image to binary.
4. Find the significant points of the upper eyelid: traverse each column and find the first row whose pixel value = 1 and whose location variance (i.e. row address of the last 4 pixels) is minimal within the threshold. We call it vector A.
5. Implement curve fitting through the generated data points using a parabolic profile (i.e. second degree polynomial). We refer to this as A'.
6. Detect the iris using Daugman's integro-differential operator and convert it to binary form. We call it vector B.
7. Find the intersections of the two vectors A' and B and take all pixels of vector B below the parabolic curve A'.
8. Utilize the points of the lower eyelid detected by the ERT model; we call it vector C.
9. Finally, find the region of the detected iris within the intersection of vectors B and C.
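The parabolic fit of step 5 can be sketched with numpy's polynomial fitting: fit y = ax² + bx + c through the eyelid data points, then evaluate the fit on a dense grid (the vectors xp and yp above). The eyelid data points here are made up for illustration.

```python
import numpy as np

# Hypothetical upper-eyelid data points (vector A): column index x, row y.
x = np.linspace(0.0, 7.0, 8)
y = 0.5 * x**2 - 3.5 * x + 10.0        # noiseless parabola for illustration

# Fit the 2nd-degree polynomial y = a*x^2 + b*x + c through the data.
a, b, c = np.polyfit(x, y, 2)

# Evaluate the fit for x in [0, 7] in steps of 0.1 (vectors xp and yp,
# i.e. the parabolic curve A').
xp = np.linspace(0.0, 7.0, 71)
yp = np.polyval([a, b, c], xp)
```

Because the sample points lie exactly on a parabola, the recovered coefficients match (a, b, c) = (0.5, -3.5, 10) up to floating-point error.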
A closer look at Figs. 6 and 7 reveals interesting results of this approach.

Fig. 6 Extracted iris using our proposed approach based on Daugman's algorithm. a converted gray scale image; b upper edge of the eye with gray thresh level of 0.9; c equivalent binary form of the detected upper eyelid; d data points of the upper eyelid; e-f upper eyelid as a result of employing parabolic function (2nd degree polynomial); g result of Daugman's Integro-Differential operator iris detection; h-n eyelid detection results using our optimized Daugman's algorithm; and o final results of detecting iris boundaries with superior or upper eyelid occlusion

Facial Paralysis Measurement

Symmetry Measurement by Iris and Key points
In this paper, the symmetry of the two sides of the face is measured using ratios obtained from computing the iris area (generated while the subject performs raising of eyebrows with both eyes directed upward, and screwing of nose or frown) and the distances between the identified key points on each side of the face while the subject is asked to perform the different facial expressions (at rest, raising of eyebrows, screwing of nose, and showing of teeth or smile). Table 1 shows the summary of the salient points used as the basis for extracting features such as the iris area ratio as well as the distance ratios between the two sides of the face.

With 'at rest' and 'raising of eyebrows', we calculate the distance between two points: infra-orbital (IO) and supra-orbital (SO), and nose tip (NT) and SO. With the 'smile' expression, we get the distances between the identified point pairs: inner canthus (IC) and mouth angle (MA), IO and MA, and NT and MA. Lastly, for the 'frown' expression, we get the distance between the point pairs NT and MA, and NT and nostrils. Consequently, the computed distance ratios of the two sides of the face are considered as the symmetry features of each subject. Computed distances include: P20-P_R_IO, P25-P_L_IO, P20-P31 and P25-P31 (see Fig. 4a and 4b); P31-P33, P31-P35, P31-P49 and P31-P55 (see Fig. 4c); and P40-P49, P43-P55, P31-P49, P31-P55, P49-P_R_IO and P55-P_L_IO (see Fig. 4d). Additionally, we calculate the area of the extracted iris, followed by the computation of the ratio between the two sides. We use the expression below:

dRatio = D_R / D_L    (5)

where dRatio is the ratio of the computed distances D_R and D_L of the specified key points on each half of the face.

We also consider the capability of the patients to raise the eyebrows (i.e. rate of movement) as one important feature for symmetry measurement, obtained by comparing the two facial images, the 'at rest' position and 'raising of eyebrows', as shown in Fig. 4a and 4b. We compute the distances a1 and b1 (Fig. 4a), where a1 and b1 are the distances from P20 to P_R_IO and from P20 to P31 of the right eye, respectively. We then compute the ratio of a1 and b1. Similarly, for the second image (Fig. 4b), we get a2 and b2 as well as their ratio. Finally, we compute the difference of these two ratios (i.e. the difference between a1/b1 and a2/b2) and denote it as right_Movement.

The same procedure is applied to the two images for finding the ratio difference for the left eye (i.e. the difference between y3/x3 and y4/x4), which we denote as left_Movement.
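The symmetry features just described reduce to a few ratios and differences. A minimal sketch, with hypothetical pixel distances in place of real measurements:

```python
def d_ratio(d_right, d_left):
    """Eq. (5): ratio of corresponding key-point distances on the two
    face halves; values near 1.0 indicate a symmetric face."""
    return d_right / d_left

def movement(ratio_rest, ratio_raised):
    """Difference of the two per-image ratios (a1/b1 vs a2/b2)."""
    return ratio_rest - ratio_raised

# Hypothetical measurements (pixels): right eye, at rest vs eyebrows raised.
a1, b1 = 30.0, 60.0
a2, b2 = 45.0, 60.0
right_movement = movement(a1 / b1, a2 / b2)

# Same procedure for the left eye.
y3, x3 = 30.0, 60.0
y4, x4 = 44.0, 60.0
left_movement = movement(y3 / x3, y4 / x4)

# Rate of movement: ratio of the two sides, near 1.0 for healthy subjects.
rate_of_movement = right_movement / left_movement
```

With the toy numbers above, the two sides move almost equally, so the rate of movement comes out close to 1, as the text expects for a healthy subject.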
The rate of movement can then be computed by finding the ratio between right_Movement and left_Movement. Intuitively, the difference of these two ratio values is likely to be higher for normal subjects (usually approaching 1, which signifies the ability to raise both eyebrows) than for FP patients.

Fig. 7 Some more results of extracted iris from the UBIRIS images [24]. a Original image; b converted gray scale; c upper edge of the eye with gray thresh level of 0.9; d equivalent binary form of the detected upper eyelid; e data points of the upper eyelid; f-g upper eyelid as a result of employing parabolic function (2nd degree polynomial); h result of Daugman's Integro-Differential operator iris detection; i-n results of eyelid detection with our optimized Daugman's algorithm; and o final results of segmented iris occluded by upper and lower eyelids

Table 1 List of facial expressions and the corresponding landmarks used for feature extraction

Facial expression      SO  IO  IC  MA  NT  N
at Rest                x   x           x
Raising of eyebrows    x   x           x
Smile                      x   x   x   x
Frown                              x   x   x

Facial Palsy Type Classification
Classification of facial paralysis type involves two phases: (1) discrimination of healthy from unhealthy subjects; and (2) proper facial palsy classification. In this context, we model the mapping of the symmetry features (as described in the previous subsection) into each phase as a binomial classification problem. As such, we employ two classifiers to be trained: one for healthy and unhealthy discrimination (0-healthy, 1-unhealthy), and another for facial palsy type classification (0-peripheral palsy (PP), 1-central palsy (CP)). For each classifier, we consider Random Forest (RF), Regularized Logistic Regression (RLR), Support Vector Machine (SVM), Decision Tree (DT), naïve Bayes (NB) and a hybrid classifier as appropriate classification methods, as they have been successfully applied to pattern recognition and classification on datasets of realistic size [4, 8, 13, 22].

With the hybrid classifier, we apply a rule-based approach prior to employing the machine learning (ML) method. This is based on empirical results [4, 11, 13], which show that normal subjects are likely to have an average facial measurement ratio close to 1.0, and central palsy patients are likely to have a distance ratio from supra-orbital (SO) to infra-orbital (IO) approaching 1.0. Similarly, the iris exposure ratio usually yields values close to 1. Hence, we find a hybrid classifier (rule-based + ML) appropriate in our work. This process is presented in Fig. 8. If rule number 1 is satisfied, the algorithm moves to the case path (i.e. for the second task), testing whether rule number 2 is also satisfied; otherwise, it performs a machine learning task, using e.g. RF, RLR, SVM, or NB. The rules are generated after fitting the training set to the DT model. For example, rule 1 may have conditions like: if f_x < 0.95 and f_y < 0.95 (where f_x and f_y are two of the predictors, the mean result of all parameters and the IO_MA ratio, respectively, based on Table 1), then the subject is most likely to be diagnosed with facial paralysis and therefore proceeds to rule no. 2 (i.e. to predict the FP type); otherwise, a machine learning task is performed. If the classifier returns 0, the algorithm exits from the entire process, as this signifies that the subject is classified as normal/healthy; else it moves to the case path to perform a test on rule number 2 for facial palsy type classification.

If rule number 2 is satisfied, the system gives 1 (i.e. 0-PP; 1-CP); else the feature set is fed to another classifier, which can yield either 0 or 1. As with rule number 1, rule number 2 is also generated by the DT model. For example, rule 2 may set conditions like: if f_a > 0.95 and f_b > 0.95 (where f_a and f_b are two of the predictors, the SO_IO ratio and the iris area ratio, respectively), then the subject is most likely to be diagnosed as having central palsy (CP); otherwise, the feature set is fed to another classifier, which could return either 0 or 1 (i.e. 0-PP; 1-CP).

Fig. 8 Hybrid Classifier

Results
In our experiments, we used 440 facial images taken from 110 subjects (50 patients and 60 healthy subjects). Of the 50 unhealthy subjects, 40 have peripheral palsy (PP) and 10 have central palsy (CP). We used 70% of the dataset as the training set and 30% as the test set. For example, in discriminating healthy from unhealthy subjects, we used 77 subjects (35 patients plus 42 normal subjects) as the training set and 33 subjects (15 patients plus 18 normal) as the test set.
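The rule-then-ML cascade of Fig. 8 can be sketched as a short function. The feature names, the 0.95 thresholds (taken from the rule examples above) and the stub fallback models are illustrative, not the paper's exact rules.

```python
def hybrid_classify(features, ml_healthy, ml_palsy_type):
    """Rule-based checks first, ML fallback otherwise (cf. Fig. 8).

    features: dict of symmetry predictors (names here are illustrative).
    ml_healthy / ml_palsy_type: fallback classifiers returning 0 or 1.
    Returns (is_unhealthy, palsy_type_or_None), with 0-PP and 1-CP.
    """
    # Rule 1 (healthy vs unhealthy): ratios well below 1 suggest paralysis.
    if features["mean_ratio"] < 0.95 and features["io_ma_ratio"] < 0.95:
        unhealthy = 1
    else:
        unhealthy = ml_healthy(features)      # fall back to the ML model
    if unhealthy == 0:
        return 0, None                        # classified as healthy: exit
    # Rule 2 (palsy type): SO-IO and iris-area ratios near 1 suggest CP.
    if features["so_io_ratio"] > 0.95 and features["iris_area_ratio"] > 0.95:
        return 1, 1                           # central palsy
    return 1, ml_palsy_type(features)         # fall back to the ML model

# Stub fallback models, for illustration only.
always_healthy = lambda f: 0
always_pp = lambda f: 0

case = {"mean_ratio": 0.80, "io_ma_ratio": 0.78,
        "so_io_ratio": 0.99, "iris_area_ratio": 0.97}
result = hybrid_classify(case, always_healthy, always_pp)
```

For the toy case above, rule 1 fires (both ratios are well below 0.95) and rule 2 then labels the subject as central palsy without ever calling the ML fallbacks.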
In the FP type classification problem, we used 35 unhealthy cases (28 PP and 7 CP) as our training set and 15 (12 PP and 3 CP) as our test set.

Each subject was asked to perform 4 facial movements. During image pre-processing, resolutions were converted to 960 x 720 pixels. The facial palsy type of each subject was pre-labeled based on the clinicians' evaluation, which was used during the training stage. This was followed by feature extraction, i.e. the calculation of the area of the extracted iris and the distances between the identified points as presented in Fig. 4. Overall, we utilize 11 features to train the classifiers. The samples for the healthy and unhealthy classifier were categorized into two labels: 0-healthy, 1-unhealthy. Similarly, samples for unhealthy subjects were classified into two labels: 0-central palsy and 1-peripheral palsy. It can be noted that healthy subjects have very minimal asymmetry between the two sides of the face, resulting in a ratio that approaches 1.

Facial paralysis type classification
Regularized logistic regression (RLR), Support Vector Machine (SVM), random forest (RF), naïve Bayes (NB), and classification tree (DT) were also utilized for comparison with our hybrid classifiers. Given that the dataset is not very large, we adopt the k-fold cross-validation test scheme in forming a hybrid model. The procedure involves 2 steps, rule extraction and hybrid model formation, as applied in our previous work [4].

Step 1: rule extraction.
If we have a dataset D = ((x_1, y_1), ..., (x_n, y_n)), we hold out 30% of D and use it as a test set T = ((x_1, y_1), ..., (x_t, y_t)), leaving 70% as our new dataset D'. We adopt the k-fold cross-validation test scheme over D', with k = 10. For example, if N = 80 samples, each fold has 8 samples. In each iteration, we leave one fold out as our validation set and utilize the remaining 9 folds as our training set to learn a model (e.g. rule extraction). Since we have 10 folds, we do this for 10 repetitions. We extract rules by fitting the training set to a DT model.

Step 2: hybrid model formation.
In this step, we form a hybrid model by combining the rules generated in each fold and the ML classifier. Each model is tested on the validation set using different parameters (e.g. lambda for logistic regression and gamma for SVM). For example, to form the first hybrid model, we combine the rule extracted from the first fold and a regularized logistic regression model (i.e. rule + RLR) and test its performance over the validation set (the left-out fold) for each of the 10 parameters. Therefore, for each fold, we generate 10 performance measures. We repeat this procedure for the succeeding folds; performing the steps k times (with k = 10, since we are using 10-fold cross-validation) gives us 100 performance measures. We calculate the average performance across all folds. This yields 10 average performance measures (one for each parameter), each of which corresponds to one specific hybrid model. Then we choose the best hybrid model, i.e. the model with the lambda that minimizes errors. We retrain the selected model on all of D', test it on the hidden test set T = ((x_1, y_1), ..., (x_t, y_t)), i.e. 30% of the dataset D, and get the performance of the hybrid model.

Table 2 Comparison of the performance of different classifiers for facial palsy classification

Classifier   Sensitivity (%)   Specificity (%)
RLR          85.9              97.7
RF           92.3              95.0
SVM          72.5              94.8
DT           90.2              94.0
NB           79.9              95.4
hDT_RLR      97.5              94.9
hDT_RF       94.3              95.4
hDT_SVM      96.9              90.0
hDT_NB       96.9              90.9
hLR_RF       92.2              94.1
hLR_SVM      85.9              94.8
hLR_DT       92.5              93.7
hLR_NB       85.9              95.4
hRF_RLR      96.0              92.3
hRF_SVM      97.1              90.5
hRF_DT       95.4              93.1
hRF_NB       96.3              90.2
hSVM_LR      85.9              94.8
hSVM_RF      88.0              94.2
hSVM_DT      92.5              93.4
hSVM_NB      83.0              91.7
hNB_LR       85.9              95.4
hNB_RF       87.9              96.0
hNB_DT       95.9              90.9
hNB_SVM      83.0              91.6

We evaluate the classifiers by their average performance over 20 repetitions of k-fold cross-validation using k = 10.
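The two-step selection above (per fold: extract a rule, then score rule + classifier over a parameter grid; pick the parameter with the best k-fold average) can be sketched schematically. The rule-fitting and scoring functions below are toy stand-ins so the sketch stays self-contained; they are not the paper's DT rules or RLR model.

```python
import numpy as np

def select_hybrid_model(samples, labels, lambdas, fit_rule, score, k=10):
    """For each of the k folds, learn a rule on the training folds and score
    the rule+classifier on the left-out fold for every candidate parameter
    (k * len(lambdas) scores); return the parameter with the best k-fold
    average score, plus the averages themselves."""
    folds = np.array_split(np.arange(len(samples)), k)
    scores = np.zeros((k, len(lambdas)))
    for i, val_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(len(samples)), val_idx)
        rule = fit_rule(samples[train_idx], labels[train_idx])
        for j, lam in enumerate(lambdas):
            scores[i, j] = score(rule, lam, samples[val_idx], labels[val_idx])
    mean_per_lambda = scores.mean(axis=0)     # k-fold average per parameter
    return lambdas[int(np.argmax(mean_per_lambda))], mean_per_lambda

# Toy stand-ins: the "rule" is a mean threshold, and the score is accuracy
# with a penalty that peaks at lambda = 0.1, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=80)
y = (X > 0).astype(int)
fit_rule = lambda xs, ys: xs.mean()
score = lambda rule, lam, xs, ys: np.mean((xs > rule).astype(int) == ys) - abs(lam - 0.1)
best_lambda, avg = select_hybrid_model(X, y, np.linspace(0.0, 1.0, 10), fit_rule, score)
```

With N = 80 and k = 10, each fold holds 8 validation samples, and the grid of 10 parameters yields the 10 x 10 = 100 per-fold scores described in the text before averaging.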
We repeat this process for the evaluation of the other hybrid models (e.g. rule-based + RF, rule-based + RLR, etc.) and finally choose the hybrid model that performs best. The hybrid classifiers and RF, SVM, RLR, DT and NB were tested, and experiments reveal that the hybrid classifier rule-based + RLR (hDT_RLR) outperformed the other classifiers in discriminating healthy from unhealthy (i.e. with paralysis) subjects. Similarly, for the classification task of facial palsy (PP-peripheral and CP-central palsy), the hDT_RLR hybrid classifier is superior among the classifiers used in the experiments.

Table 2 presents a comparison of the average performance of our hybrid classifiers, RF, RLR, SVM, DT and NB based on our proposed approach. For FP type classification, our hDT_RLR hybrid classifier achieves a sensitivity 5.2% higher than RLR, RF, SVM, DT and NB (see Table 2). Other hybrid classifiers also show good results comparable with hDT_RLR. However, in the context of our study, we are more concerned with designing a classifier that yields stable results on the sensitivity performance measure without necessarily sacrificing the specificity or the fall-out (probability of false alarm). Hence, we employ hDT_RLR.

To illustrate the diagnostic ability of our classifier, we create a receiver operating characteristic (ROC) curve, a graphical plot obtained while varying the discrimination threshold. The ROC curve is a plot of the true positive rate (TPR) on the y-axis versus the false positive rate (FPR) on the x-axis for every possible classification threshold.

In machine learning, the true-positive rate, also known as sensitivity or recall, answers the question of how often the classifier predicts positive when the actual classification is positive (i.e. not healthy). On the other hand, the false-positive rate, also known as the fall-out or probability of false alarm, answers the question of how often the classifier incorrectly predicts positive when the actual classification is negative (i.e. healthy).

The ROC curve is therefore the sensitivity as a function of fall-out. Figures 9 and 10 present the comparison of the area under the ROC curve (AUC) of our hybrid hDT_RLR classifier for healthy and unhealthy discrimination and for FP type classification (central or peripheral), respectively, using three different feature extraction methods: (a) Localized Active Contour-based method for key points feature extraction (LAC-KPFE); (b) Localized Active Contour-based method for geometric and region features extraction (LAC-GRFE); and (c) Ensemble of Regression Tree-based method for geometric and region features extraction (ERT-GRFE).

Fig. 9 Comparison of the ROC curve of our classifiers using different feature extraction methods (ERT-GRFE, LAC-GRFE and LAC-KPFE) for Healthy and Not Healthy classification

Fig. 10 Comparison of the ROC curve of our classifiers using different feature extraction methods (ERT-GRFE, LAC-GRFE and LAC-KPFE) for Facial Palsy classification

Our proposed approach, ERT-GRFE with the hybrid classifier hDT_RLR, achieves an AUC 2.7%-4.9% higher than the other two methods in discriminating healthy from unhealthy (i.e. with paralysis) subjects, as shown in Fig. 9. Similarly, for the palsy type classification into central palsy (CP) and peripheral palsy (PP), ERT-GRFE plus the hDT_RLR hybrid classifier outperformed the two feature extraction methods LAC-GRFE and LAC-KPFE used in the experiments by at least 2.5%-7.7%, as shown in Fig. 10. Experiments reveal that our method yields more stable results.

Tables 3 and 4 present a comparison of the performance of the three methods for discriminating healthy from unhealthy subjects and for classifying facial palsy type, respectively. Each approach differs according to the features applied and the corresponding methods used for extracting such features, which include: (a) the Localized Active Contour-based method for key points feature extraction (LAC-KPFE); (b) the Localized Active Contour-based method for geometric and region-based features extraction (LAC-GRFE); and (c) the Ensemble of Regression Tree-based method for geometric and region-based feature extraction (ERT-GRFE).

Table 3 shows that in discriminating healthy from unhealthy subjects, our proposed approach outperforms the method that uses key or salient points-based features with the LAC model, in terms of sensitivity and specificity, with improvements of 9% and 4.1%, respectively. Similarly, experiments show that our approach outperforms the previous method [4] that used geometric and region-based features (GRFE) with the LAC model in terms of sensitivity and specificity, with improvements of 3.1% and 5.9%, respectively.
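The TPR/FPR definitions above yield one ROC point per threshold. A minimal sketch, with made-up scores and labels (1 = not healthy):

```python
import numpy as np

def roc_points(scores, labels):
    """One (FPR, TPR) point per possible threshold:
    TPR = TP / (TP + FN)  (sensitivity / recall),
    FPR = FP / (FP + TN)  (fall-out)."""
    pts = []
    for t in sorted(set(scores), reverse=True):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return pts

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
pts = roc_points(scores, labels)
```

Sweeping the threshold from high to low traces the curve from the conservative corner (low FPR, low TPR) toward (1, 1); the AUC values reported in Tables 3 and 4 summarize such curves.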
On the other hand, Table 4 reveals that for central and peripheral palsy classification, our proposed ERT-based GRFE is far better than the previous approach that solely used key point-based features, with improvements of around 12% and 9% in the sensitivity and specificity performance measures, respectively. Furthermore, experiments reveal that our proposed ERT-based GRFE approach yields better performance, particularly in sensitivity and specificity, with improvements of 6.4% and 6.8%, respectively. Thus, our proposed approach is superior among the three methods, and experiments reveal that our method yields more stable results.

Tables 3 and 4 present a comparison of the performance of the three methods for discriminating healthy from unhealthy subjects and for classifying facial palsy type, respectively. Each approach differs according to the features applied and the corresponding methods used for extracting such features: (a) Localized Active Contour-based method for key points feature extraction (LAC-KPFE); (b) Localized Active Contour-based method for geometric and region-based features extraction (LAC-GRFE); and (c) Ensemble of Regression Tree-based method for geometric and region-based feature extraction (ERT-GRFE).

Table 3 Comparison of the performance of the three methods for healthy and unhealthy discrimination

              LAC-based KPFE   LAC-based GRFE   ERT-based GRFE (our approach)
Sensitivity   89.12%           95.01%           98.12%
Specificity   90.01%           88.12%           94.06%
AUC           93.40%           95.56%           98.34%

Table 4 Comparison of the performance of the three methods for facial palsy classification

              LAC-based KPFE   LAC-based GRFE   ERT-based GRFE (our approach)
Sensitivity   85.15%           91.09%           97.48%
Specificity   85.12%           88.10%           94.91%
AUC           89.81%           95.01%           97.48%

Discussion
Empirical results [15] reveal that the ensemble of regression trees (ERT) model has an appealing quality: it performs shape-invariant feature selection while minimizing the same loss function during training and at test time, which significantly lessens the time complexity of extracting features. True enough, our experiments reveal that the ERT-based method for geometric and region features extraction (ERT-GRFE) works well. Our hybrid classifier provides an efficient quantitative assessment of the facial paralysis, and our facial palsy type classifier provides a tool essential to physicians for deciding the medical treatment scheme to begin the patient's rehabilitation process. Furthermore, our approach has several merits that are very essential to real application:

• geometric and iris region features based on the ensemble of regression trees and on optimized Daugman's theory (using a parabolic function, i.e. a 2nd degree polynomial), respectively, allow efficient identification of the asymmetry of the human face, revealing the significant difference between the normal and the afflicted side, which the localized active contour model fails to track, especially for peculiar images (e.g. wrinkles, excessive mustache, occlusions, etc.);
• the ERT model has a very appealing quality of reducing errors in each iteration, which can be very useful in extracting the boundaries of the facial features from the background;
• the ERT model does not require proper or perfect identification of initial evolving curves to ensure accurate facial feature extraction; and
• our method significantly lessens the time complexity of extracting features without sacrificing the level of accuracy, making it more suitable for real application.
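The symmetry scoring idea used in this work (ratios of the iris areas and of the key-point distances on the two sides of the face) can be sketched in a few lines. The landmark coordinates, iris areas, and the averaging of the two ratios below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch: a facial symmetry score from left/right measurements.
# All coordinates and areas below are hypothetical.
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def symmetry_ratio(left, right):
    """Ratio in (0, 1]; 1.0 means the two sides measure the same."""
    return min(left, right) / max(left, right)

# Hypothetical key points (x, y): outer canthus (OC) and mouth angle (MA).
oc_left, ma_left = (30.0, 40.0), (38.0, 90.0)
oc_right, ma_right = (70.0, 40.0), (62.0, 90.0)

# Hypothetical iris areas (in pixels) from the occlusion-corrected boundary.
iris_area_left, iris_area_right = 310.0, 280.0

eye_mouth_sym = symmetry_ratio(dist(oc_left, ma_left), dist(oc_right, ma_right))
iris_sym = symmetry_ratio(iris_area_left, iris_area_right)
score = (eye_mouth_sym + iris_sym) / 2.0  # simple average of the two ratios
```

A score close to 1.0 indicates a symmetric face, while an asymmetric (afflicted) face pulls the ratio down; combining several such key-point pairs and the iris areas yields the feature vector fed to the classifier.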
Furthermore, our proposed approach of combining iris segmentation and ERT-based key point detection for feature extraction provides better discrimination of central and peripheral palsy, most especially in the 'raising of eyebrows' and 'screwing of nose' movements. It shows the changes of structure on the edges of the eye, i.e., the significant difference between the normal side and the palsy side for some facial movements (e.g. eyebrow lifting, nose screwing, and showing of teeth). Also, features based on the combination of iris and key points obtained with the ensemble of regression trees technique can model the typical changes in the eye region. A closer look at the performance of our classifier, as shown in Tables 3 and 4, reveals interesting statistics in terms of the specific abilities of the three methods. Our method proves to have a significant contribution in discriminating central from peripheral palsy patients and healthy from facial palsy subjects. The combination of iris segmentation and the ERT-based key point approach is more suitable for this operation.

Conclusion
In this study, we present a novel approach to address the FP classification problem in facial images. Salient point and iris detection based on an ensemble of regression trees is employed to extract the key features. Regularized Logistic Regression (RLR) combined with a Classification Tree (CT) serves as the hybrid classifier.

Abbreviations
1D: 1 dimensional; AUC: Area under ROC curve; CP: Central palsy; CT: Classification tree; DT: Decision tree; ERT: Ensemble of regression trees; ERT-GRFE: Ensemble of Regression Tree-based method for geometric and region features extraction; FP: Facial paralysis; FPR: False positive rate; HOG: Histogram of Oriented Gradients; hDT_RLR: Hybrid Decision tree and Regularized Logistic Regression; IC: Inner canthus; IO: Infra-orbital; IRB: Institution Review Board; LAC: Localized Active Contour; LAC-GRFE: Localized Active Contour-based method for geometric and region features extraction; LAC-KPFE: Localized Active Contour-based method for key points extraction; MA: Mouth angle; ML: Machine learning; NB: Naive Bayes; NT: Nose tip; OC: Outer canthus; PP: Peripheral palsy; RF: Random Forest; RLR: Regularized Logistic Regression; ROC: Receiver operating characteristic curve; ROI: Region of interest; SO: Supra-orbital; SVM: Support Vector Machine; TPR: True positive rate; UE: Upper eyelids

Acknowledgments
There are no acknowledgments.

Funding
This research was supported by the National Research Foundation of Korea (NRF-2017M3C4A7065887) and by a National IT Industry Promotion Agency grant funded by the Ministry of Science and ICT and the Ministry of Health and Welfare (No. C1202-18-1001, Development Project of The Precision Medicine Hospital Information System (P-HIS)); the scholarship was granted by the Korean Government Scholarship Program - NIIED, Ministry of Education, South Korea.

Availability of data and materials
The datasets generated and/or analysed during the current study are not available due to patient privacy.

Authors' contributions
Conceived and designed the methodology and analyzed the methods: JB, JK. Performed experiments and programming: JB. Collected images and performed manual evaluation: WS. All authors wrote the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate
This study was approved by the Institution Review Board (IRB) of Korea University, Guro Hospital (with reference number MD14041-002). The board permitted not taking an informed consent due to the retrospective design of this study.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
Department of Computer Science and Engineering, Korea University, Seoul, South Korea. IT Department, University of Science and Technology of Southern Philippines, Cagayan de Oro, Philippines. Department of Neurology and Stroke Center, Samsung Medical Center, Seoul, South Korea. Sungkyunkwan University School of Medicine, Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, South Korea.

Received: 3 December 2018; Accepted: 2 April 2019

References
1. Baugh R, Ishii G, Schwartz SR, Drumheller CM, Burkholder R, Deckard N, Dawson C, Driscoll C, Boyd Gillespie M. Clinical practice guideline: Bell's palsy. Otolaryngol Head Neck Surg. 2013;149:1–27.
2. Peitersen E. Bell's palsy: the spontaneous course of 2,500 peripheral facial nerve palsies of different etiologies. Acta Otolaryngol. 2002;122(7):4–30.
3. Kanerva M. Peripheral facial palsy: grading, etiology, and Melkersson–Rosenthal syndrome. PhD thesis. Finland: University of Helsinki; 2008.
4. Barbosa J, Lee K, Lee S, Lodhi B, Cho J, Seo W, Kang J. Efficient quantitative assessment of facial paralysis using iris segmentation and active contour-based key points detection with hybrid classifier. BMC Med Imaging. 2016;16:23–40.
5. May M. Anatomy for the clinician. In: May M, Schaitkin B, editors. The Facial Nerve, May's Second Edition. 2nd ed. New York: Thieme; 2000. p. 19–56.
6. Wachtman GS, Liu Y, Zhao T, Cohn J, Schmidt K, Henkelmann TC, VanSwearingen JM, Manders EK. Measurement of asymmetry in persons with facial paralysis. In: Combined Annual Conference of the Robert H. Ivy and Ohio Valley Societies of Plastic and Reconstructive Surgeons; 2002.
7. Liu Y, Schmidt K, Cohn J, Mitra S. Facial asymmetry quantification for expression invariant human identification. Comput Vis Image Underst. 2003;91:138–59.
8. He S, Soraghan J, O'Reilly B, Xing D. Quantitative analysis of facial paralysis using local binary patterns in biomedical videos. IEEE Trans Biomed Eng. 2009;56:1864–70.
9. Wang S, Li H, Qi F, Zhao Y. Objective facial paralysis grading based on pface and eigenflow. Med Biol Eng Comput. 2004;42:598–603.
10. Anguraj K, Padma S. Analysis of facial paralysis disease using image processing technique. Int J Comput Appl (0975–8887). 2012;54:1–4.
11. Dong J, Ma L, Li W, Wang S, Liu L, Lin Y, Jian M. An approach for quantitative evaluation of the degree of facial paralysis based on salient point detection. In: International Symposium on Intelligent Information Technology Application Workshops. IEEE; 2008.
12. Liu L, Cheng G, Dong J, Qu H. Evaluation of facial paralysis degree based on regions. In: Third International Conference on Knowledge Discovery and Data Mining. Washington, DC: IEEE Computer Society; 2010. p. 514–7.
13. Liu L, Cheng G, Dong J, Qu H. Evaluation of facial paralysis degree based on regions. In: Proceedings of the 2010 Third International Conference on Knowledge Discovery and Data Mining. Washington, DC: IEEE Computer Society; 2010. p. 514–7.
14. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9:62–6.
15. Kazemi V, Sullivan J. One millisecond face alignment with an ensemble of regression trees. In: Proc IEEE Conf Comput Vis Pattern Recog. IEEE; 2014. p. 1867–74.
16. Samsudin W, Sundaraj K. Image processing on facial paralysis for facial rehabilitation system: A review. In: IEEE International Conference on Control System, Computing and Engineering. Malaysia: IEEE; 2012. p. 259–63.
17. Fasel B, Luettin J. Automatic facial expression analysis: A survey. Pattern Recog. 2003;36:259–75.
18. Ngo T, Seo M, Chen Y, Matsushiro N. Quantitative assessment of facial paralysis using local binary patterns and Gabor filters. In: Proceedings of the 5th International Symposium on Information and Communication Technology (SoICT). New York: ACM; 2014. p. 155–61.
19. Déniz O, Bueno J, Salido F, De la Torre F. Face recognition using histograms of oriented gradients. Pattern Recog Lett. 2011;32:1598–603.
20. Daugman J. How iris recognition works. IEEE Trans Circuits Syst Video Technol. 2004;14:21–30.
21. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proc IEEE Conf Computer Vision and Pattern Recognition. IEEE; 2005.
22. Lyons M, Budynek J, Akamatsu S. Automatic classification of single facial images. IEEE Trans Pattern Anal Mach Intell. 1999;21(12):57–62.
23. Lyons M, Akamatsu S, Kamachi M, Gyoba J. Coding facial expressions with Gabor wavelets. In: Third IEEE Conf Face and Gesture Recognition. IEEE; 1998. p. 200–5.
24. Proenca H, Alexandre L. UBIRIS: A noisy iris image database. In: Proceedings of ICIAP 2005 - International Conference on Image Analysis and Processing. Springer; 2005. p. 970–7.

Published: Apr 25, 2019