Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

A method for improving semantic segmentation using thermographic images in infants

A method for improving semantic segmentation using thermographic images in infants Background: Regulation of temperature is clinically important in the care of neonates because it has a significant impact on prognosis. Although probes that make contact with the skin are widely used to monitor temperature and provide spot central and peripheral temperature information, they do not provide details of the temperature distribu- tion around the body. Although it is possible to obtain detailed temperature distributions using multiple probes, this is not clinically practical. Thermographic techniques have been reported for measurement of temperature distribution in infants. However, as these methods require manual selection of the regions of interest (ROIs), they are not suitable for introduction into clinical settings in hospitals. Here, we describe a method for segmentation of thermal images that enables continuous quantitative contactless monitoring of the temperature distribution over the whole body of neonates. Methods: The semantic segmentation method, U-Net, was applied to thermal images of infants. The optimal com- bination of Weight Normalization, Group Normalization, and Flexible Rectified Linear Unit (FReLU) was evaluated. U-Net Generative Adversarial Network (U-Net GAN) was applied to thermal images, and a Self-Attention (SA) module was finally applied to U-Net GAN (U-Net GAN + SA) to improve precision. The semantic segmentation performance of these methods was evaluated. Results: The optimal semantic segmentation performance was obtained with application of FReLU and Group Normalization to U-Net, showing accuracy of 92.9% and Mean Intersection over Union (mIoU) of 64.5%. U-Net GAN improved the performance, yielding accuracy of 93.3% and mIoU of 66.9%, and U-Net GAN + SA showed further improvement with accuracy of 93.5% and mIoU of 70.4%. Conclusions: FReLU and Group Normalization are appropriate semantic segmentation methods for application to neonatal thermal images. U-Net GAN and U-Net GAN + SA significantly improved the mIoU of segmentation. Keywords: Thermography, Semantic segmentation, Infants, Temperature Background Neonatal body temperature is known to have a signifi - cant effect on prognosis [1–5], and body temperature is inversely correlated with mortality in infants [1, 2, 4]. As temperature management is clinically important in *Correspondence: hidetsugu.asano@atomed.co.jp Technical Department, Atom Medical Corporation, 2-2-1, Dojo, neonatal care, a number of organizations, including the Sakura-ku, Saitama city, Saitama 338-0835, Japan World Health Organization (WHO), have proposed Full list of author information is available at the end of the article © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Asano et al. BMC Medical Imaging (2022) 22:1 Page 2 of 13 guidelines for neonatal temperature management [6–9]. image processing methods, such as edge extraction and However, there is still a lack of evidence regarding the ellipse fitting, for automatic ROI extraction in thermal optimal body temperature for infants [8]. Karlsson et  al. images of adults. However, these methods aim to exclude [10] investigated the differences in temperature of the other regions from the ROI, and are unable to segment head, body, arms, legs, and feet of healthy infants, and the human body into regions. Abbas et al. [27] proposed reported that differences in skin temperature at different a method for tracking analysis points using temporally sites can be used for diagnosis of infants [10–15]. Knobel continuous thermal images of infants, which allowed et al. [15] measured body temperature using thermistors analysis of the temporal variability of the analysis points. attached to the abdomen and feet of very low birth weight However, it was still necessary to set the analysis points (VLBW) infants, and reported its relation to peripheral manually in their method. vasoconstriction. These reports suggest the importance Deep Learning may be applicable to address the dis- of temperature control and detailed regional temperature advantages of these methods. There has been significant measurement in infants. However, these studies used progress in research on semantic segmentation, especially contact-type probes, which are associated with a num- in the field of automatic driving [28–30]. The application ber of issues that lead to inaccuracy of measurements, of semantic segmentation to thermal images of infants including probe position, fixation method, contact with would allow detailed analysis of global information. Ron- the skin, and the inability to measure the temperature neberger et  al. [31] proposed U-Net as a segmentation distribution over the whole body. Therefore, a number method for cellular images. U-Net has been used for seg- of recent studies used infrared thermography, a non- mentation of biomedical images, and has been applied contact, continuous thermal imaging technique that uses in a number of studies because of its stability and high infrared light emitted from objects in accordance with performance. Antink et  al. [32] proposed a method for heat, which is assumed to be the surface temperature in segmenting the body parts of neonates from RGB images. neonates [16–21]. At present, contact-type probes are In addition, there have been a number of studies on auto- used for continuous temperature measurement, but their matic classification of organs on magnetic resonance use is associated with hygiene risks and they can damage imaging (MRI) and computed tomography (CT) images the fragile skin of infants. However, there is increasing [33–35]. Deep Learning has also been applied to thermal interest in the application of neonatal thermography as images for medical applications. Lyra et  al. [36] applied it can reduce these risks. Medical adhesive-related skin Yolov4 [37] to thermal images for automatic extraction injuries (MARSI) are a known clinical problem, which of patients and medical staff and calculation of vital signs is particularly important in neonatal care, and the risk from the detected regions. Kwasniewska et  al. [38] per- of such injuries must be reduced [22–24]. Knobel et  al. formed image resolution enhancement of thermal images [16] examined the differences in temperature distribu - to increase the accuracy of estimation of vital signs from tion between the chest and abdomen due to necrotizing thermal images. Moreover, Ekici et al. [39] applied Deep enterocolitis (NEC) in VLBW infants, and reported that Learning to detect breast cancer in thermal images. How- children with NEC had significantly lower abdominal ever, the application of Deep Learning to thermal images temperatures compared to healthy infants. Using ther- in neonates has not been investigated in sufficient detail. mal imaging, Knobel et  al. [17] also demonstrated that Generative Adversarial Network (GAN) is a Deep the temperature of the feet was higher than that of the Learning method that has been under development in abdomen within the first 12  h of life in VLBW infants. recent years. GAN is a learning method proposed by Abbas et  al. [18] developed a detailed measurement Goodfellow et  al. [40] in which a Generator network model to accurately measure body temperature in infants that generates images and a Discriminator network that based on thermal images, and Ussat et al. [19] proposed a determines whether an input image is a natural or gen- non-contact method for measurement of respiratory rate erated image compete with each other. There have been based on the temperature difference of inhaled air. a number of reports of the application of GAN in image Therefore, there have been a number of studies on the style transformation, etc. [41, 42]. It has been applied utility of thermography for monitoring the body tem- in a number of fields, including Semantic segmenta - perature of infants. However, it was necessary to set the tion, where the loss function is difficult to define. Self- region of interest (ROI) manually for each analysis, pre- Attention (SA) [43] is a method that has had a significant venting continuous evaluation and therefore the evalua- impact on improving the performance of Deep Learning. tion was not strictly quantitative. There has been marked progress in the development of To address this issue, there have been a number of Deep Learning in the field of natural language process - studies regarding automated processing of ROIs by com- ing, and high-performance networks using the Attention puter. Duarte et  al. [25] and Rodriguez et  al. [26] used mechanism have been proposed [44, 45]. SA is a method A sano et al. BMC Medical Imaging (2022) 22:1 Page 3 of 13 that applies these techniques to image processing, ena- various variations in size, position, etc., were captured bling more complex analysis by learning and assigning for 66–140 h in each case, for a total of 1032 h. Figure  1 meaning to relationships between pixels, such as between shows an example of a thermal image obtained using this words in a sentence. In conventional convolutional net- system. works, local variations in an image are extracted and A total of 400 images were selected at random from weighted to achieve detection. SA takes into account the the thermographic images, excluding those taken dur- relations between the intensities of the pixel values in ing treatment or nursing care by medical staff, and the weighting, making it possible to express changes in the ground truth was generated manually. The pixels of the importance of pixel values. thermal images were divided into five classes, i.e., head, For continuous quantitative analysis of thermal images, body, arms, legs, and “other.” The cervical region was semantic segmentation can be applied for automatic ROI defined as the head, and the shoulder region was defined setting in infants. In this study, we propose a suitable as part of the arm region. In addition, diapers, probes, method for semantic segmentation of thermal images tubes, respiratory masks, and hair in the images were in infants. An accurate semantic segmentation method strictly excluded as non-skin areas. The definition of would enable detailed analysis of the temperature of each ground truth was made by a skilled neonatologist, who region of an infant’s entire body surface. This will ena - also checked the generated ground truth, as shown in ble early detection of diseases, such as sepsis and NEC, Fig.  2. Subsequent training and testing were conducted which are currently difficult to detect. Early detection of using the generated ground truth. these diseases will lead to better prognosis and to new The network structure was based on U-Net for ther - standards of care. Considering the extension to disease mal image segmentation, and we applied the Convolu- prediction using Deep Learning, we investigated meth- tion–Batch Normalization–Rectified Linear Unit (ReLU) ods of segmentation with the maximum possible accu- (CBR) structure used in ResNet [46]. As U-Net is often racy and detail. The methods and their performance were the first choice for semantic segmentation of medical evaluated using thermal images acquired in a clinical images, it was also used in this study as the base archi- setting. tecture and was shown to be suitable for analyzing ther- mal images of infants. The detailed network structure is Methods shown in Table  2. The total network was a 22-stage fully Twelve preterm infants without congenital or underly- convolutional network. A number of functions have been ing diseases, born at Nagasaki Harbor Medical Center proposed to improve the performance of networks, but (NHMC) and requiring incubator support, were included most have been evaluated only on RGB images, and there in this study. The characteristics of the patients are have been no reports of evaluation of thermal images. shown in Table  1. The median ± standard deviation Therefore, Weight Normalization [47], Group Normali - (SD) of the gestational age of the infants included in the zation [48], and Flexible Rectified Linear Unit (FReLU) study was 34 ± 2.8 weeks, birth weight was 2053 ± 712  g, [49], which have already been evaluated on images, were mean age at the start of imaging was 0 + 0.8  days, and applied to compare their accuracy on thermal images. male:female ratio was 7:5. This study was approved by the Weight Normalization was replaced by convolution, Ethics Committee of Nagasaki Harbor Medical Center Group Normalization by Batch Normalization, and (Approval No. NIRB No. R02-006). The research was car - FReLU by ReLU, and all combinations were evaluated. ried out in accordance with the Declaration of Helsinki. Preliminary experiments were conducted at 2-, 4-, 5-, 8-, A thermography camera was installed on the upper and tenfold at the image level, and the experiment was part of the incubator at the side closest to the feet of the assumed to be conducted at fourfold, where accuracy infant. Data with a resolution of 320 × 256 were acquired began to drop. With fourfold cross-validation, the clas- at 1 fps using a thermal camera (FLIR A35; FLIR, Mid- sification accuracy of segmentation and Mean Intersec - dletown, NY, USA). Thermographic images with tion over Union (mIoU) were used as evaluation metrics. Cross Entropy Loss was used as the loss function. No pre-training was performed. Table.1 Participant characteristics Furthermore, based on the network with the high- est accuracy in the above comparison, GAN and SA Characteristic (n = 12) Median ± SD were applied to extend the network, and the accuracy Gestational week at delivery 34 ± 2.8 was evaluated again. Here, we extended U-Net GAN Birth weight (kg) 2053 ± 712 [50] proposed by Schonfeld et  al., an image generation Age (days) 0 ± 0.8 method that uses U-Net as a Discriminator, and applied Sex (male) 7 (58%) it to neonatal thermography. This method optimizes Asano et al. BMC Medical Imaging (2022) 22:1 Page 4 of 13 Fig. 1 Thermographic images. Many variations in thermal images were obtained with different sizes and positions of the infants: blue, 28 °C; red, 40 °C not only the entire image, but also each pixel, resulting for one image. The decoder output has the same image in images with fewer errors than traditional GAN. The size as the input and classifies real/fake on a pixel-by- segmentation system using U-Net GAN is shown in pixel basis. Fig.  3, where x represents the correct data for segmen- In addition to U-Net GAN, SA was used to improve per- tation and T represents the input thermal image. The formance. Unlike RGB images, thermal images represent output of the generator that performs the segmentation single-channel data of temperature only, and the relation- of the thermal image T is denoted by G(T ) . The Dis - ships between the temperatures are important for the criminator has Encoder and Decoder sections, and its analysis. Therefore, application of the SA module to the output consists of D (x) , which predicts the Real/Fake network will make it possible to evaluate not only the spa- enc classification of the whole image, and D (x) , which tial relations but also the appearance patterns of heat and dec predicts the Real/Fake classification of each pixel. feature intensities, which will enable more detailed analy- The network with the highest accuracy in the experi - sis. The structure of the network with incorporation of the ments described above is used as the Generator of SA module into U-Net GAN (U-Net GAN + SA) is shown U-Net GAN. Here, we conducted preliminary experi- in Table  2. The number of channels remains unchanged, ments, and the Discriminator network was made although the depth of the network is increased due to the with four layers of CBR blocks and half the number bottleneck structure. The loss function of the Discrimina - of channels. Using U-Net GAN, segmentation results tor, L , was calculated using Eq. 1: were constrained to be similar to the manually gener- cons L = L + L + L D D D enc D (1) dec ated ground truth, while preserving accuracy and sup- dec pressing overfitting. The detailed network structure of cons where L ,L , and L are the Encoder Loss, D D enc dec D dec U-Net GAN Discriminator is shown in Table  3. The Decoder Loss, and Consistency Loss of the Discrimina- encoder output of the Discriminator is average-pool- tor, respectively, and are expressed in Eqs. 2–4: ing of the most downscaled image data of U-net, and the full connect is used to identify the real/fake binary L =−E [logD (x)] − E [log (1 − D (G(T )))] D x enc T enc enc value. Therefore, the encoder output is one data output (2) A sano et al. BMC Medical Imaging (2022) 22:1 Page 5 of 13 Fig. 2 Examples of thermal images and ground truth. The head is shown in red, the body in yellow, the arms in green, the legs in blue, and the other regions in black � � � �   � � log 1 − D G T ) ( ( )] log [D (x)] dec i,j i,j dec i,j i,j   (3) L =−E − E dec x T width ∗ height width ∗ height cons 2 (4) L = ||D (mix(x, G(T ),M) − mix(D (x), D (G(T )),M)|| dec dec dec dec to correctly predict the Real/Fake classification of each where mix(x , x ,M) is the CutMix function [51], which 1 2 pixel. Consistency Loss also improves the stability of the mixes x and x according to the mask M , and width and 1 2 Discriminator’s prediction by placing constraints on the height are the width and height of the image, respec- CutMix of D (x) and D (G(T )) and the prediction dec dec tively. The loss is given by L to correctly predict the enc results of the CutMix of x and G(T ) to be the same. The Real/Fake classification of the whole image, and by L dec loss function, L , of the generator is also shown in Eq. 5: log [D (G(T ))] CrossEntropy(x, G(T )) dec i,j i,j i,j (5) L =−E log D (G(T )) + +· G T enc width*height width ∗ height Asano et al. BMC Medical Imaging (2022) 22:1 Page 6 of 13 Table.2 Detailed network configuration of U-Net, U-Net GAN The first term represents the loss of the Discriminator Generator, and U-Net GAN + SA Generator and constrains segmentation to be similar to the ground truth. CrossEntropy(x , x ) represents the Cross Entropy 1 2 Layers Output size U-Net U-Net GAN + SA Loss, and  is a variable that balances the first and second Input 320 × 256 × 1 terms; in this paper,  = 0.1. Convolution 320 × 256 × 16 3 × 3, 16 d 3 × 3, 16 d As in the previous experiment, fourfold cross-valida- Downscale 160 × 128 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d tion was performed to evaluate U-Net GAN and U-Net 3 × 3, 32 d, CBR 7 × 7, 32 d, SA GAN + SA. In addition to classification accuracy and 1 × 1, 32 d mIoU, a Confusion Matrix including U-Net was used as Downscale 80 × 6464 5 × 5, 64 d, CBR 1 × 1, 64 d 3 × 3, 64 d, CBR 7 × 7, 64 d, SA an evaluation metric. 1 × 1, 64 d For training, a PC with an AMD Ryzen 7 3700X CPU, Downscale 40 × 32 × 128 5 × 5, 128 d, CBR 1 × 1, 128 d 64  GB of memory, and a GeForce RTX 3090 GPGPU 3 × 3, 128 d, CBR 7 × 7, 128 d, SA running Windows 10 was used. We used Python 3.7 as 1 × 1, 128 d the programming language and Pytorch 1.1 was used as Downscale 20 × 16 × 256 5 × 5, 256 d, CBR 1 × 1, 256 d a deep learning package. The optimal values of learning 3 × 3, 256 d, CBR 7 × 7, 256 d, SA 1 × 1, 256 d parameters (i.e., network depth, number of channels per Downscale 10 × 8 × 512 5 × 5, 512 d, CBR 1 × 1, 512 d layer, batch size, learning rate) were determined through 3 × 3, 512 d, CBR 7 × 7, 512 d, SA a preliminary experiment. The number of training epochs 1 × 1, 512 d was determined before the model began overfitting. The Upscale 20 × 16 × 256 5 × 5, 256 d, CBR 1 × 1, 256 d parameters used for training are shown in Table  4. For 3 × 3, 256 d, CBR 7 × 7, 256 d, SA 1 × 1, 256 d Augmentation, we performed a vertical flip of the image Upscale 40 × 32 × 128 5 × 5, 128 d, CBR 1 × 1, 128 d and added random noise to each pixel. AMSGrad [52] 3 × 3, 128 d, CBR 7 × 7, 128 d, SA was used as the optimizer. 1 × 1, 128 d Statistical analyses were conducted to compare the Upscale 80 × 64 × 64 5 × 5, 64 d, CBR 1 × 1, 64 d accuracy between the methods. The Steel–Dwass test 3 × 3, 64 d, CBR 7 × 7, 64 d, SA 1 × 1, 64 d was applied as a nonparametric multiple comparison Upscale 160 × 128 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d test. All analyses were performed using JMP 15 statisti- 3 × 3, 32 d, CBR 7 × 7, 32 d, SA cal software. For a detailed evaluation of segmentation 1 × 1, 32 d performance, the Hausdorff distance and IoU for each Upscale 320 × 256 × 16 5 × 5, 16 d, CBR 1 × . 1, 16 d region were calculated. 3 × 3, 16 d, CBR 7 × 7, 16 d, SA 1 × 1, 16 d Convolution 320 × 256 × 1 3 × 3, 1 d 3 × 3, 1 d Loss Real/Fake Input Generator Prediction Discriminator Real/Fake per pixel Grand Truth Fig. 3 Network diagram of U-Net GAN A sano et al. BMC Medical Imaging (2022) 22:1 Page 7 of 13 Table.3 Detailed network configuration of U-Net GAN U-Net showed very high segmentation accuracy with discriminator and U-Net GAN + SA discriminator a validation accuracy of 91.3% (SD 0.04%) and mIoU of 57.8% (SD 0.15%). FReLU showed improvements of 0.6% Layers Output size U-Net U-Net GAN + SA (SD 0.04%) in accuracy and 3.1% (SD 0.16%) in mIoU, Input 320 × 256 × 1 while Group Normalization showed improvements Convolution 320 × 256 × 8 3 × 3, 8 d 3 × 3, 8 d of 0.9% (SD 0.04%) in accuracy and 4.4% (SD 0.14%) in Downscale 160 × 128 × 16 5 × 5, 16 d, CBR 1 × 1, 16 d mIoU. However, Normalized Convolution decreased the 3 × 3, 16 d, CBR 7 × 7, 16 d, SA accuracy by 0.2% (SD 0.05%), but improved the mIoU by 1 × 1, 16 d 3.1% (SD 0.15%). The best results were obtained with the Downscale 80 × 64 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d 3 × 3, 32 d, CBR 77, 32 d, SA combined application of FReLU and Group Normaliza- 1 × 1, 32 d tion showing 92.9% (SD 0.04%) accuracy and mIoU of Downscale 40 × 32 × 64 5 × 5, 64 d, CBR 1 × 1, 64 d 64.5% (SD 0.15%). 3 × 3, 64 d, CBR 7 × 7, 64 d, SA U-Net GAN and U-Net GAN + SA showed validation 1 × 1, 64 d accuracy of 93.3% (SD 0.03%) and 93.5% (SD 0.04%), rep- Encoder out 5 ReLU ReLU resenting improvements of 0.7% and 0.9%, respectively, ( D (x)) Average Pooling Average Pooling enc Linear, 5d Linear, 5 d and mIoU of 66.9% (SD 0.13%) and 70.4% (SD 0.13%), Upscale 80 × 64 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d representing improvements of 2.4% and 5.9%, respec- 3 × 3, 32 d, CBR 7 × 7, 32 d, SA tively, compared to the best results of U-Net (Table  6). 1 × 1, 32 d Finally, the confusion matrices for U-Net, U-Net GAN, Upscale 160 × 128 × 16 5 × 5, 16 d, CBR 1 × 1, 16 d and U-Net GAN + SA are shown in Fig.  4. For each net- 3 × 3, 16 d, CBR 7 × 7, 16 d, SA 1 × 1, 16 d work, the accuracy was 82%, 82%, and 87% for head, 82%, Upscale 0 × 256 × 8 5 × 5, 8 d, CBR 1 × 1, 8 d 87%, and 88% for body, 66%, 72%, and 68% for arms, 33, 8 d, CBR 7 × 7, 8 d, SA 86%, 85%, and 81% for legs, and 94%, 97%, and 96% for 1 × 1, 8 d other, respectively. The results of the Steel–Dwass test Convolution 320 × 256 × 2 3 × 3, 2 d 3 × 3, 2 d are shown in Table  7. Significant differences were found ( D (x)) dec between several methods. The results of the Hausdor ff distance and IoU for each region are shown in Tables  8 and 9, respectively. Table.4 Parameters used for training Parameter Net U-Net GAN U-Net GAN + SA Learning rate 0.01 0.01 (generator) 0.01 (generator) Discussion 1e−4 (discriminator) 1e−4 (discriminator) All of the methods examined here showed highly accu- Batch size 75 30 12 rate classification performance. FReLU and Group Nor - Epoch 200 100 100 malization improved the classification accuracy and mIoU of U-Net, which was considered to be due to the improved representativeness of the network. Group Nor- malization shows that normalization within the channels Results of the network is more effective than Batch Normaliza - The accuracy of segmentation using U-Net was evalu - tion in this problem. This was because the input data ated and the results are shown in Table  5. Even standard Table.5 Segmentation performance using U-Net with and without normalized convolution, FReLU, and group normalization Normalized FReLU Group normalization Accuracy (%) SD (%) mIoU (%) SD (%) convolution 91.3 0.04 57.8 0.15 ✓ 91.1 0.05 60.9 0.15 ✓ 91.9 0.04 60.9 0.16 ✓ 92.2 0.04 62.2 0.14 ✓ ✓ 91.4 0.05 60.7 0.15 ✓ ✓ 92.4 0.04 63.8 0.13 ✓ ✓ 92.9 0.04 64.5 0.15 ✓ ✓ ✓ 92.4 0.04 62.9 0.15 Asano et al. BMC Medical Imaging (2022) 22:1 Page 8 of 13 Table.6 Segmentation performance of U-Net, U-Net GAN, and improvement. There was no significant difference U-Net GAN + SA between FReLU and U-Net GAN + SA, thus confirming the effectiveness of FReLU. U-Net GAN + SA showed Network Accuracy (%) SD (%) mIoU (%) SD (%) significant differences in many cases compared to the U-Net 92.9 0.04 64.5 0.15 other methods, confirming that it is a powerful method. U-Net GAN 93.3 0.03 66.9 0.13 However, there were no significant differences between U-Net GAN + SA 93.5 0.03 70.4 0.13 the four sets of results: FReLU with Group Normaliza- tion, FReLU with Group Normalization and Normal- ized Convolution, U-Net GAN, and U-Net GAN + SA . This suggests that the performance improvement may be consisted only of temperature information with similar approaching its limit. backgrounds, so there were many regions with similar Similar results were obtained with Hausdorff distance. values, and Batch Normalization may have the effect of FReLU with Group Normalization, U-Net GAN, and over-averaging the data. On the other hand, Normal- U-Net GAN + SA performed better than the other meth- ized Convolution showed a decrease in accuracy but an ods in almost all regions, and the SD was also lower. In improvement in mIoU. Depending on the location of the all methods, the Hausdorff distance was larger for the thermal imaging camera and the view angle, the “other” arms and legs than for the head and body. In IoU, Other region had 13–23 times more pixels than the “infant” was the highest in all methods, which may have been due region. Thus, Normalized Convolution may decrease the to the lower temperature in the Other region compared number of missed skin regions, but increase the percent- to the neonate, thus making segmentation easier. U-Net age of false positive identification of other regions as skin GAN + SA showed better results for infant region seg- regions. The application of U-Net with FReLU and Group mentation. SA was also effective in Semantic Segmenta - Normalization showed 1.6% better accuracy and 6.7% tion of thermal images. better mIoU than ReLU and Batch Normalization. These U-Net GAN is optimized by combining multiple loss results confirmed that the combined use of these tools functions. The Discriminator classifies the manually gen - resulted in significant improvements, especially in mIoU. erated ground truth and the results of U-Net segmenta- Using the network with FReLU and Group Normali- tion, and in addition to the conventional GAN evaluation zation applied to U-Net as a baseline, U-Net GAN and on a per-image basis, it also evaluates and feeds back U-Net GAN + SA were confirmed to show beneficial the results on a per-pixel basis. This yields not only effects. higher performance than normal U-Net, but is also visu- Compared to the accuracy of U-Net of 92.9%, U-Net ally closer to the manually obtained ground truth. The GAN showed a 0.4% improvement in accuracy and 2.4% accuracy was further improved in U-Net GAN + SA by improvement in mIoU, and U-Net GAN + SA improved changing the Convolution to SA. SA, which strictly eval- accuracy by 0.6% and mIoU by 5.9%. uates the relationship between pixels, was considered to The results of the Steel–Dwass test showed signifi - be effective as temperature images have lower value vari - cant differences between several methods. In particu - ation and dimensionality compared to RGB images. The lar, FReLU alone showed a significant performance Fig. 4 Confusion matrices of U-Net, U-Net GAN, and U-Net GAN + SA A sano et al. BMC Medical Imaging (2022) 22:1 Page 9 of 13 Table.7 Significant differences between the proposed methods Normalized Convolution FReLU Group Normalization ** *** * ** *** * ****** **** ** * U-Net GAN U-Net GAN+SA *p < 0.01, **p < 0.05 Table.8 Hausdorff distance for each region Head Body Arm Leg Other All (w/o other) Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD UNet 34.6 25.3 38.1 29.5 59.2 37.7 43.4 40.0 26.7 9.3 43.9 34.8 Normalized convolution 33.0 23.6 31.5 26.2 55.7 35.6 46.5 50.7 26.5 8.9 41.5 36.4 FReLU 31.2 21.3 31.2 22.6 58.4 38.0 42.8 40.0 25.5 9.1 40.8 33.3 Group normalization 30.3 18.2 30.1 21.9 63.4 40.0 50.7 48.6 25.8 9.5 43.4 36.9 Normalized convolution FReLU 27.8 18.9 31.1 25.6 57.2 37.1 47.9 47.5 25.5 9.2 40.8 35.7 Normalized convolution group 30.4 21.3 30.0 20.6 64.9 34.1 52.9 50.0 26.7 9.9 44.3 36.3 normalization FReLU group normalization 27.5 17.8 25.2 20.2 48.7 32.8 38.6 38.3 25.4 8.8 34.9 29.8 ALL 26.8 17.2 28.5 22.4 53.9 36.9 48.5 49.9 25.3 9.9 39.1 35.5 U-Net GAN 27.4 19.7 26.7 22.7 49.3 34.1 39.0 42.7 24.5 9.1 35.5 32.1 U-Net GAN + SA 27.1 17.7 26.7 22.7 46.3 32.6 41.4 42.1 23.7 9.4 35.2 31.1 Normalized Convolution FReLU Group Normalization U-Net GAN U-Net GAN+SA Asano et al. BMC Medical Imaging (2022) 22:1 Page 10 of 13 Table.9 IoU for each region Head Body Arm Leg Other All IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) UNet 50.8 0.16 52.1 0.16 41.6 0.17 53.5 0.23 91.1 0.03 57.8 0.15 Normalized convolution 57.5 0.14 48.5 0.15 47.4 0.15 59.8 0.23 91.5 0.03 60.9 0.14 FReLU 54.8 0.17 56.6 0.16 44.3 0.18 57.1 0.26 91.8 0.03 60.9 0.16 Group normalization 56.4 0.16 60.1 0.15 43.1 0.16 58.6 0.24 93.0 0.03 62.2 0.15 Normalized convolution FReLU 55.1 0.14 57.3 0.14 41.4 0.13 57.6 0.21 92.2 0.03 60.7 0.13 Normalized convolution group 58.0 0.15 61.2 0.16 47.3 0.17 60.2 0.24 92.4 0.03 63.8 0.15 normalization FReLU group normalization 59.2 0.16 62.3 0.15 47.7 0.17 61.4 0.23 92.0 0.03 64.5 0.15 ALL 58.2 0.15 59.3 0.15 47.9 0.16 58.0 0.25 91.3 0.03 62.9 0.15 U-Net GAN 61.5 0.14 64.3 0.14 49.1 0.15 66.4 0.2 93.4 0.03 66.9 0.13 U-Net GAN + SA 64.8 0.14 67.9 0.14 57.9 0.14 67.6 0.2 93.6 0.02 70.4 0.13 Fig. 5 Examples of the differences in segmentation results between U-Net, U-Net GAN, and U-Net GAN + SA. a Input. b Ground truth. c U-Net. d U-Net GAN. e U-Net GAN + SA temperature image, ground truth, and images obtained showed high accuracy, but the features differed between by segmentation using U-Net, U-Net GAN, and U-Net methods. U-Net segmented the images with smooth GAN + SA are shown in Fig. 5. The results of all methods boundaries. On the other hand, it misdetected thin A sano et al. BMC Medical Imaging (2022) 22:1 Page 11 of 13 regions, such as cables on the body surface, resulting in the body temperature of infants and analyzing various finely over-segmented regions. U-Net GAN yielded a diseases. smoother segmentation shape and unnatural segmenta- Further studies are required to evaluate the accuracy tion was prevented, and U-Net GAN + SA successfully of measuring the body temperature of infants using our excluded fine non-skin areas, such as cables and the method. The segmentation accuracy was evaluated, but shapes near the boundaries of the segmented areas fol- the impact of this accuracy on the temperature measure- lowed the edges of the temperature information. These ment is not yet clear. Furthermore, large amounts of clin- results were attributed to the strict evaluation of temper- ical data will be collected and analyzed using the results ature relationships by SA, resulting in detailed semantics. obtained with this method to study the ability to predict The confusion matrix shown in Fig.  4 indicated that diseases and other conditions. In this process, the accu- the detection accuracy of each region differed between racy required for segmentation will be clarified. It will be methods. U-Net GAN + SA showed 5% higher detection necessary to examine these issues through clinical appli- accuracy for the head than the other methods. For the cation in future studies. body, U-Net GAN and U-Net GAN + SA showed 5%–6% higher accuracy than U-Net. For the arms, U-Net GAN Conclusion was 4–6% more accurate than the other methods, and for A U-Net-based network was confirmed to be able the legs, U-Net was 1–5% more accurate than the other to segment the skin area on thermographic ther- methods. U-Net GAN showed 1–3% higher accuracy for mal images of infants with high accuracy. FReLU and the other regions than the other methods. The features of Group Normalization were confirmed to be effec- the resulting segmented images differed according to the tive for thermal image segmentation. GAN was also method used, although the numerical differences were shown to improve the segmentation accuracy, and SA small. U-Net GAN + SA predicted the skin region of the achieved fine segmentation even on thermal images infant as “other” less frequently than the other methods, with few features. These tools contributed to the which was due to the strict evaluation of pixel-by-pixel improvement of mIoU, and U-Net GAN + SA showed temperature relationships by SA. The accuracy of U-Net a significant performance improvement over standard GAN + SA was higher for the head and body compared U-Net. to the other methods, while it showed lower accuracy for the arm and leg regions due to an increase in the number Abbreviations of cases where they were incorrectly detected as other CBR structure: Convolution–batch normalization–ReLU structure; CT: Com- skin regions. This was because the arms and legs have puted tomography; FReLU: Flexible rectified linear unit; GAN: Generative more variations in shape and positional relationships adversarial network; MARSI: Medical adhesive-related skin injuries; mIoU: Mean intersection over union; MRI: Magnetic resonance imaging; NHMC: Nagasaki than the head and body, and strictly evaluating the pixel- harbor medical center; NEC: Necrotizing enterocolitis; ReLU: Rectified linear by-pixel relationships leads to incorrect predictions. unit; ROI: Region of interest; SA: Self-attention; SD: Standard deviation; U-Net Therefore, additional training data and further augmen - GAN: U-Net generative adversarial network; U-Net GAN + SA: Self-attention module in U-Net GAN; VLBW: Very low birth weight; WHO: World Health tation are considered necessary for U-Net GAN + SA to Organization. detect arms and legs more accurately. U-Net and U-Net GAN tended to have slightly lower accuracy than U-Net Acknowledgements None. GAN + SA. However, SA requires a great deal of process- ing and large amounts of memory, so it is important to Authors’ contributions consider the device to be used and select the optimal HA analyzed and interpreted the data and contributed significantly to the preparation of the manuscript. EH designed the study, performed the experi- method to be applied. In medical applications, it is not ments, contributed to interpretation of the data, and revised the manuscript. necessary to evaluate the temperature of areas other than HH performed data analysis and data preparation, and contributed to the skin, and therefore U-Net GAN + SA is considered to interpretation of the data. KH performed experiments and contributed to interpretation of the data. YA and MO participated in the design of the experi- be effective. However, further improvements are needed ments. AU and TH contributed to interpretation of the data and revised the for regions where the shape and positional relation- manuscript. All authors have read and approved the final manuscript. ships may vary, such as the arms and legs, as the system Funding showed degradation of performance in such areas. Not applicable. The application of this method in clinical settings will enable continuous monitoring of temperature in Availability of data and materials The datasets generated and analyzed during the present study are not each region of the body. Further studies are required to publicly available due to participant privacy, but are available from the cor- confirm the effectiveness of this method in managing responding author on reasonable request. Asano et al. BMC Medical Imaging (2022) 22:1 Page 12 of 13 14. Lubkowska A, Szymański S, Chudecka M. Surface body temperature Declarations of full-term healthy newborns immediately after birth-pilot study. Int J Environ Res Public Health. 2019;16:1312. Ethics approval and consent to participate 15. Knobel RB, Holditch-Davis D, Schwartz TA, Wimmer JE Jr. Extremely low This study was approved by the Ethics Committee of the Nagasaki Harbor birth weight preterm infants lack vasomotor response in relationship to Medical Center, and the research was conducted (Approval No. NIRB No. cold body temperatures at birth. J Perinatol. 2009;29:814–21. R02-006). The research was carried out in accordance with the Declaration of 16. Knobel RB, Guenther BD, Rice HE. Thermoregulation and thermography Helsinki. Written informed consent was obtained from the parent or guardian in neonatal physiology and disease. Biol Res Nurs. 2011;13:274–82. of all participants. 17. Knobel-Dail RB, Holditch-Davis D, Sloane R, Guenther BD, Katz LM. Body temperature in premature infants during the first week of life: exploration Consent for publication using infrared thermal imaging. J Therm Biol. 2017;69:118–23. The parents of all participants consented to publication of the data in 18. Abbas AK, Heimann K, Blazek V, Orlikowsky T, Leonhardt S. Neonatal infra- anonymized form. red thermography imaging: analysis of heat flux during different clinical scenarios. Infrared Phys Technol. 2012;55:538–48. Competing interests 19. Ussat M, Vogtmann C, Gebauer C, Pulzer F, Thome U, Knüpfer M. The role Hidetsugu Asano, Hayato Hayashi, Yuto Asayama, and Masaaki Oohashi are of elevated central-peripheral temperature difference in early detection salaried employees of Atom Medical Corporation. of late-onset sepsis in preterm infants. Early Hum Dev. 2015;91:677–81. 20. Simpson RC, McEvoy HC, Machin G, Howell K, Naeem M, Plassmann Author details P, et al. In-field-of-view thermal image calibration system for medical Technical Department, Atom Medical Corporation, 2-2-1, Dojo, Sakura-ku, thermography applications. Int J Thermophys. 2008;29:1123–30. Saitama city, Saitama 338-0835, Japan. Department of Neonatology, Nagasaki 21. Topalidou A, Ali N, Sekulic S, Downe S. Thermal imaging applications in Harbor Medical Center, 6-39, Shinchi-machi, Nagasaki City, Nagasaki 850-8555, neonatal care: a scoping review. BMC Pregnancy Childbirth. 2019;19:381. Japan. Department of Neonatology, Kagoshima City Hospital, 37-1 Uea- 22. Lund CH, Nonato LB, Kuller JM, Franck LS, Cullander C, Durand DJ. rata-cho, Kagoshima City, Kagoshima 890-8760, Japan. Department of Clinical Disruption of barrier function in neonatal skin associated with adhesive Engineering, Nagasaki Harbor Medical Center, 6-39, Shinchi-machi, Nagasaki removal. J Pediatr. 1997;131(3):367–72. City, Nagasaki 850-8555, Japan. Department of Comprehensive Community 23. Dollison EJ, Beckstrand J. Adhesive tape vs pectin-based barrier use in Care Education, Nagasaki University Graduate School of Biomedical Sciences, preterm infants. Neonatal Netw. 1995;14(4):35–9. 1-14, Bunkyo-machi, Nagasaki City, Nagasaki 852-8521, Japan. Mobile Com- 24. Lund C, Kuller JM, Tobin C, Lefrak L, Franck LS. Evaluation of a pectin- puting Laboratory, Graduate School of Information Science and Technology, based barrier under tape to protect neonatal skin. J Obstet Gynecol Osaka University, 1-5, Yamadaoka, Suita, Osaka 565-0871, Japan. Neonatal Nurs. 1986;15(1):39–44. 25. Duarte A, Carrão L, Espanha M, Viana T, Freitas D, Bártolo P, et al. Segmen- Received: 1 June 2021 Accepted: 23 December 2021 tation algorithms for thermal images. Procedia Technol. 2014;16:1560–9. 26. Rodriguez-Lozano FJ, León-García F, Ruiz de Adana M, Palomares JM, Oli- vares J. Non-invasive forehead segmentation in thermographic imaging. Sensors (Basel). 2019;9:4096. 27. Abbas AK, Leonhardt S. Intelligent neonatal monitoring based on a References virtual thermal sensor. BMC Med Imaging. 2014;14:9. 1. Silverman WA, Fertig JW, Berger AP. The influence of the thermal environ- 28. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 ment upon the survival of newly born premature infants. Pediatrics. IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 1958;22:876–86. 2. Vohra S, Frent G, Campbell V, Abbott M, Whyte R. Eec ff t of polyethylene 29. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N. BiSeNet: bilateral segmenta- occlusive skin wrapping on heat loss in very low birth weight infants at tion network for real-time semantic segmentation. In: Computer vision— delivery: a randomized trial. J Pediatr. 1999;134:547–51. ECCV 2018. Cham: Springer; 2018. p. 334–49. 3. O’Reilly JN. Heated carrier for transporting premature babies. Br Med J. 30. Zhu Y, Sapra K, Reda FA, Shih KJ, Newsam S, Tao A, et al. Improving 1945;2:731. semantic segmentation via video propagation and label relaxation. In: 4. Laptook AR, Bell EF, Shankaran S, Boghossian NS, Wyckoff MH, Kandefer 2019 IEEE/CVF conference on computer vision and pattern recognition S, et al. Admission temperature and associated mortality and morbidity (CVPR). IEEE; 2019. among moderately and extremely preterm infants. J Pediatr. 2018;192:53- 31. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for 59.e2. biomedical image segmentation. In: Lecture notes in computer science. 5. Asakura H. Fetal and neonatal thermoregulation. J Nippon Med Sch. Cham: Springer International Publishing; 2015. p. 234–41. 2004;71:360–70. 32. Antink CH, Ferreira JCM, Paul M, Lyra S, Heimann K, Karthik S, et al. Fast 6. Smith J, Alcock G, Usher K. Temperature measurement in the preterm and body part segmentation and tracking of neonatal video data using deep term neonate: a review of the literature. Neonatal Netw. 2013;32:16–25. learning. Med Biol Eng Comput. 2020;58(12):3049–61. 7. World Health Organization ( WHO). Thermal protection of the newborn: a 33. Zhou X, Ito T, Takayama R, Wang S, Hara T, Fujita H. Three-dimensional practical guide. Geneva: World Health Organization; 1997. CT image segmentation by combining 2D fully convolutional network 8. Perez A, van der Meer F, Singer D. Target body temperature in very low with 3D majority voting. In: Deep learning and data labeling for medical birth weight infants: clinical consensus in place of scientific evidence. applications. Cham: Springer; 2016. p. 111–20. Front Pediatr. 2019;7:227. 34. Ait Skourt B, El Hassani A, Majda A. Lung CT image segmentation using 9. Waldron S, MacKinnon R. Neonatal thermoregulation. Infant. deep neural networks. Procedia Comput Sci. 2018;127:109–13. 2007;3:101–4. 35. Myronenko A. 3D MRI brain tumor segmentation using autoencoder 10. Karlsson H, Hänel SE, Nilsson K, Olegård R. Measurement of skin tem- regularization. In: Brainlesion: glioma, multiple sclerosis, stroke and trau- perature and heat flow from skin in term newborn babies. Acta Paediatr. matic brain injuries. Cham: Springer; 2019. p. 311–20. 1995;84:605–12. 36. Lyra S, Mayer L, Ou L, Chen D, Timms P, Tay A, et al. A deep learning-based 11. Bensouda B, Mandel R, Mejri A, Lachapelle J, St-Hilaire M, Ali N. Tempera- camera approach for vital sign monitoring using thermography images ture probe placement during preterm infant resuscitation: a randomised for ICU patients. Sensors (Basel). 2021;21(4):1495. trial. Neonatology. 2018;113:27–32. 37. Bochkovskiy A, Wang C-Y, Liao H-YM. YOLOv4: optimal speed and accu- 12. Lyon AJ, Pikaar ME, Badger P, McIntosh N. Temperature control in very racy of object detection [Internet]. arXiv [cs.CV ]. 2020. Available from: low birthweight infants during first five days of life. Arch Dis Child Fetal http:// arxiv. org/ abs/ 2004. 10934. Neonatal Ed. 1997;76:F47-50. 38. Kwasniewska A, Ruminski J, Szankin M. Improving accuracy of contactless 13. Lantz B, Ottosson C. Using axillary temperature to approximate rectal respiratory rate estimation by enhancing thermal sequences with deep temperature in newborns. Acta Paediatr. 2015;104:766–70. neural networks. Appl Sci (Basel). 2019;9(20):4405. A sano et al. BMC Medical Imaging (2022) 22:1 Page 13 of 13 39. Ekici S, Jawzal H. Breast cancer diagnosis using thermography and convo- lutional neural networks. Med Hypotheses. 2020;137(109542):109542. 40. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks [Internet]. arXiv [stat.ML]. 2014. Available from: http:// arxiv. org/ abs/ 1406. 2661. 41. Isola P, Zhu J-Y, Zhou T, Efros AA. Image-to-image translation with condi- tional adversarial networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2017. 42. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV ). IEEE; 2017. 43. Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE; 2020. 44. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need [Internet]. arXiv [cs.CL]. 2017. Available from: http:// arxiv. org/ abs/ 1706. 03762. 45. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding [Internet]. arXiv [cs.CL]. 2018. Available from: http:// arxiv. org/ abs/ 1810. 04805. 46. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2016. 47. Salimans T, Kingma DP. Weight normalization: a simple reparameteriza- tion to accelerate training of deep neural networks [Internet]. arXiv [cs. LG]. 2016. Available from: http:// arxiv. org/ abs/ 1602. 07868. 48. Wu Y, He K. Group normalization. In: Computer vision—ECCV 2018. Cham: Springer; 2018. p. 3–19. 49. Ma N, Zhang X, Sun J. Funnel activation for visual recognition. In: Com- puter vision—ECCV 2020. Cham: Springer; 2020. p. 351–68. 50. Schonfeld E, Schiele B, Khoreva A. A U-net based discriminator for gen- erative adversarial networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE; 2020. 51. Yun S, Han D, Chun S, Oh SJ, Yoo Y, Choe J. CutMix: regularization strategy to train strong classifiers with localizable features. In: 2019 IEEE/CVF international conference on computer vision (ICCV ). IEEE; 2019. 52. Reddi SJ, Kale S, Kumar S. On the convergence of Adam and beyond [Internet]. arXiv [cs.LG]. 2019. Available from: http:// arxiv. org/ abs/ 1904. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in pub- lished maps and institutional affiliations. Re Read ady y to to submit y submit your our re researc search h ? Choose BMC and benefit fr ? Choose BMC and benefit from om: : fast, convenient online submission thorough peer review by experienced researchers in your field rapid publication on acceptance support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations maximum visibility for your research: over 100M website views per year At BMC, research is always in progress. Learn more biomedcentral.com/submissions http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Medical Imaging Springer Journals

A method for improving semantic segmentation using thermographic images in infants

Loading next page...
 
/lp/springer-journals/a-method-for-improving-semantic-segmentation-using-thermographic-vB458se0Tc
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
1471-2342
DOI
10.1186/s12880-021-00730-0
Publisher site
See Article on Publisher Site

Abstract

Background: Regulation of temperature is clinically important in the care of neonates because it has a significant impact on prognosis. Although probes that make contact with the skin are widely used to monitor temperature and provide spot central and peripheral temperature information, they do not provide details of the temperature distribu- tion around the body. Although it is possible to obtain detailed temperature distributions using multiple probes, this is not clinically practical. Thermographic techniques have been reported for measurement of temperature distribution in infants. However, as these methods require manual selection of the regions of interest (ROIs), they are not suitable for introduction into clinical settings in hospitals. Here, we describe a method for segmentation of thermal images that enables continuous quantitative contactless monitoring of the temperature distribution over the whole body of neonates. Methods: The semantic segmentation method, U-Net, was applied to thermal images of infants. The optimal com- bination of Weight Normalization, Group Normalization, and Flexible Rectified Linear Unit (FReLU) was evaluated. U-Net Generative Adversarial Network (U-Net GAN) was applied to thermal images, and a Self-Attention (SA) module was finally applied to U-Net GAN (U-Net GAN + SA) to improve precision. The semantic segmentation performance of these methods was evaluated. Results: The optimal semantic segmentation performance was obtained with application of FReLU and Group Normalization to U-Net, showing accuracy of 92.9% and Mean Intersection over Union (mIoU) of 64.5%. U-Net GAN improved the performance, yielding accuracy of 93.3% and mIoU of 66.9%, and U-Net GAN + SA showed further improvement with accuracy of 93.5% and mIoU of 70.4%. Conclusions: FReLU and Group Normalization are appropriate semantic segmentation methods for application to neonatal thermal images. U-Net GAN and U-Net GAN + SA significantly improved the mIoU of segmentation. Keywords: Thermography, Semantic segmentation, Infants, Temperature Background Neonatal body temperature is known to have a signifi - cant effect on prognosis [1–5], and body temperature is inversely correlated with mortality in infants [1, 2, 4]. As temperature management is clinically important in *Correspondence: hidetsugu.asano@atomed.co.jp Technical Department, Atom Medical Corporation, 2-2-1, Dojo, neonatal care, a number of organizations, including the Sakura-ku, Saitama city, Saitama 338-0835, Japan World Health Organization (WHO), have proposed Full list of author information is available at the end of the article © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Asano et al. BMC Medical Imaging (2022) 22:1 Page 2 of 13 guidelines for neonatal temperature management [6–9]. image processing methods, such as edge extraction and However, there is still a lack of evidence regarding the ellipse fitting, for automatic ROI extraction in thermal optimal body temperature for infants [8]. Karlsson et  al. images of adults. However, these methods aim to exclude [10] investigated the differences in temperature of the other regions from the ROI, and are unable to segment head, body, arms, legs, and feet of healthy infants, and the human body into regions. Abbas et al. [27] proposed reported that differences in skin temperature at different a method for tracking analysis points using temporally sites can be used for diagnosis of infants [10–15]. Knobel continuous thermal images of infants, which allowed et al. [15] measured body temperature using thermistors analysis of the temporal variability of the analysis points. attached to the abdomen and feet of very low birth weight However, it was still necessary to set the analysis points (VLBW) infants, and reported its relation to peripheral manually in their method. vasoconstriction. These reports suggest the importance Deep Learning may be applicable to address the dis- of temperature control and detailed regional temperature advantages of these methods. There has been significant measurement in infants. However, these studies used progress in research on semantic segmentation, especially contact-type probes, which are associated with a num- in the field of automatic driving [28–30]. The application ber of issues that lead to inaccuracy of measurements, of semantic segmentation to thermal images of infants including probe position, fixation method, contact with would allow detailed analysis of global information. Ron- the skin, and the inability to measure the temperature neberger et  al. [31] proposed U-Net as a segmentation distribution over the whole body. Therefore, a number method for cellular images. U-Net has been used for seg- of recent studies used infrared thermography, a non- mentation of biomedical images, and has been applied contact, continuous thermal imaging technique that uses in a number of studies because of its stability and high infrared light emitted from objects in accordance with performance. Antink et  al. [32] proposed a method for heat, which is assumed to be the surface temperature in segmenting the body parts of neonates from RGB images. neonates [16–21]. At present, contact-type probes are In addition, there have been a number of studies on auto- used for continuous temperature measurement, but their matic classification of organs on magnetic resonance use is associated with hygiene risks and they can damage imaging (MRI) and computed tomography (CT) images the fragile skin of infants. However, there is increasing [33–35]. Deep Learning has also been applied to thermal interest in the application of neonatal thermography as images for medical applications. Lyra et  al. [36] applied it can reduce these risks. Medical adhesive-related skin Yolov4 [37] to thermal images for automatic extraction injuries (MARSI) are a known clinical problem, which of patients and medical staff and calculation of vital signs is particularly important in neonatal care, and the risk from the detected regions. Kwasniewska et  al. [38] per- of such injuries must be reduced [22–24]. Knobel et  al. formed image resolution enhancement of thermal images [16] examined the differences in temperature distribu - to increase the accuracy of estimation of vital signs from tion between the chest and abdomen due to necrotizing thermal images. Moreover, Ekici et al. [39] applied Deep enterocolitis (NEC) in VLBW infants, and reported that Learning to detect breast cancer in thermal images. How- children with NEC had significantly lower abdominal ever, the application of Deep Learning to thermal images temperatures compared to healthy infants. Using ther- in neonates has not been investigated in sufficient detail. mal imaging, Knobel et  al. [17] also demonstrated that Generative Adversarial Network (GAN) is a Deep the temperature of the feet was higher than that of the Learning method that has been under development in abdomen within the first 12  h of life in VLBW infants. recent years. GAN is a learning method proposed by Abbas et  al. [18] developed a detailed measurement Goodfellow et  al. [40] in which a Generator network model to accurately measure body temperature in infants that generates images and a Discriminator network that based on thermal images, and Ussat et al. [19] proposed a determines whether an input image is a natural or gen- non-contact method for measurement of respiratory rate erated image compete with each other. There have been based on the temperature difference of inhaled air. a number of reports of the application of GAN in image Therefore, there have been a number of studies on the style transformation, etc. [41, 42]. It has been applied utility of thermography for monitoring the body tem- in a number of fields, including Semantic segmenta - perature of infants. However, it was necessary to set the tion, where the loss function is difficult to define. Self- region of interest (ROI) manually for each analysis, pre- Attention (SA) [43] is a method that has had a significant venting continuous evaluation and therefore the evalua- impact on improving the performance of Deep Learning. tion was not strictly quantitative. There has been marked progress in the development of To address this issue, there have been a number of Deep Learning in the field of natural language process - studies regarding automated processing of ROIs by com- ing, and high-performance networks using the Attention puter. Duarte et  al. [25] and Rodriguez et  al. [26] used mechanism have been proposed [44, 45]. SA is a method A sano et al. BMC Medical Imaging (2022) 22:1 Page 3 of 13 that applies these techniques to image processing, ena- various variations in size, position, etc., were captured bling more complex analysis by learning and assigning for 66–140 h in each case, for a total of 1032 h. Figure  1 meaning to relationships between pixels, such as between shows an example of a thermal image obtained using this words in a sentence. In conventional convolutional net- system. works, local variations in an image are extracted and A total of 400 images were selected at random from weighted to achieve detection. SA takes into account the the thermographic images, excluding those taken dur- relations between the intensities of the pixel values in ing treatment or nursing care by medical staff, and the weighting, making it possible to express changes in the ground truth was generated manually. The pixels of the importance of pixel values. thermal images were divided into five classes, i.e., head, For continuous quantitative analysis of thermal images, body, arms, legs, and “other.” The cervical region was semantic segmentation can be applied for automatic ROI defined as the head, and the shoulder region was defined setting in infants. In this study, we propose a suitable as part of the arm region. In addition, diapers, probes, method for semantic segmentation of thermal images tubes, respiratory masks, and hair in the images were in infants. An accurate semantic segmentation method strictly excluded as non-skin areas. The definition of would enable detailed analysis of the temperature of each ground truth was made by a skilled neonatologist, who region of an infant’s entire body surface. This will ena - also checked the generated ground truth, as shown in ble early detection of diseases, such as sepsis and NEC, Fig.  2. Subsequent training and testing were conducted which are currently difficult to detect. Early detection of using the generated ground truth. these diseases will lead to better prognosis and to new The network structure was based on U-Net for ther - standards of care. Considering the extension to disease mal image segmentation, and we applied the Convolu- prediction using Deep Learning, we investigated meth- tion–Batch Normalization–Rectified Linear Unit (ReLU) ods of segmentation with the maximum possible accu- (CBR) structure used in ResNet [46]. As U-Net is often racy and detail. The methods and their performance were the first choice for semantic segmentation of medical evaluated using thermal images acquired in a clinical images, it was also used in this study as the base archi- setting. tecture and was shown to be suitable for analyzing ther- mal images of infants. The detailed network structure is Methods shown in Table  2. The total network was a 22-stage fully Twelve preterm infants without congenital or underly- convolutional network. A number of functions have been ing diseases, born at Nagasaki Harbor Medical Center proposed to improve the performance of networks, but (NHMC) and requiring incubator support, were included most have been evaluated only on RGB images, and there in this study. The characteristics of the patients are have been no reports of evaluation of thermal images. shown in Table  1. The median ± standard deviation Therefore, Weight Normalization [47], Group Normali - (SD) of the gestational age of the infants included in the zation [48], and Flexible Rectified Linear Unit (FReLU) study was 34 ± 2.8 weeks, birth weight was 2053 ± 712  g, [49], which have already been evaluated on images, were mean age at the start of imaging was 0 + 0.8  days, and applied to compare their accuracy on thermal images. male:female ratio was 7:5. This study was approved by the Weight Normalization was replaced by convolution, Ethics Committee of Nagasaki Harbor Medical Center Group Normalization by Batch Normalization, and (Approval No. NIRB No. R02-006). The research was car - FReLU by ReLU, and all combinations were evaluated. ried out in accordance with the Declaration of Helsinki. Preliminary experiments were conducted at 2-, 4-, 5-, 8-, A thermography camera was installed on the upper and tenfold at the image level, and the experiment was part of the incubator at the side closest to the feet of the assumed to be conducted at fourfold, where accuracy infant. Data with a resolution of 320 × 256 were acquired began to drop. With fourfold cross-validation, the clas- at 1 fps using a thermal camera (FLIR A35; FLIR, Mid- sification accuracy of segmentation and Mean Intersec - dletown, NY, USA). Thermographic images with tion over Union (mIoU) were used as evaluation metrics. Cross Entropy Loss was used as the loss function. No pre-training was performed. Table.1 Participant characteristics Furthermore, based on the network with the high- est accuracy in the above comparison, GAN and SA Characteristic (n = 12) Median ± SD were applied to extend the network, and the accuracy Gestational week at delivery 34 ± 2.8 was evaluated again. Here, we extended U-Net GAN Birth weight (kg) 2053 ± 712 [50] proposed by Schonfeld et  al., an image generation Age (days) 0 ± 0.8 method that uses U-Net as a Discriminator, and applied Sex (male) 7 (58%) it to neonatal thermography. This method optimizes Asano et al. BMC Medical Imaging (2022) 22:1 Page 4 of 13 Fig. 1 Thermographic images. Many variations in thermal images were obtained with different sizes and positions of the infants: blue, 28 °C; red, 40 °C not only the entire image, but also each pixel, resulting for one image. The decoder output has the same image in images with fewer errors than traditional GAN. The size as the input and classifies real/fake on a pixel-by- segmentation system using U-Net GAN is shown in pixel basis. Fig.  3, where x represents the correct data for segmen- In addition to U-Net GAN, SA was used to improve per- tation and T represents the input thermal image. The formance. Unlike RGB images, thermal images represent output of the generator that performs the segmentation single-channel data of temperature only, and the relation- of the thermal image T is denoted by G(T ) . The Dis - ships between the temperatures are important for the criminator has Encoder and Decoder sections, and its analysis. Therefore, application of the SA module to the output consists of D (x) , which predicts the Real/Fake network will make it possible to evaluate not only the spa- enc classification of the whole image, and D (x) , which tial relations but also the appearance patterns of heat and dec predicts the Real/Fake classification of each pixel. feature intensities, which will enable more detailed analy- The network with the highest accuracy in the experi - sis. The structure of the network with incorporation of the ments described above is used as the Generator of SA module into U-Net GAN (U-Net GAN + SA) is shown U-Net GAN. Here, we conducted preliminary experi- in Table  2. The number of channels remains unchanged, ments, and the Discriminator network was made although the depth of the network is increased due to the with four layers of CBR blocks and half the number bottleneck structure. The loss function of the Discrimina - of channels. Using U-Net GAN, segmentation results tor, L , was calculated using Eq. 1: were constrained to be similar to the manually gener- cons L = L + L + L D D D enc D (1) dec ated ground truth, while preserving accuracy and sup- dec pressing overfitting. The detailed network structure of cons where L ,L , and L are the Encoder Loss, D D enc dec D dec U-Net GAN Discriminator is shown in Table  3. The Decoder Loss, and Consistency Loss of the Discrimina- encoder output of the Discriminator is average-pool- tor, respectively, and are expressed in Eqs. 2–4: ing of the most downscaled image data of U-net, and the full connect is used to identify the real/fake binary L =−E [logD (x)] − E [log (1 − D (G(T )))] D x enc T enc enc value. Therefore, the encoder output is one data output (2) A sano et al. BMC Medical Imaging (2022) 22:1 Page 5 of 13 Fig. 2 Examples of thermal images and ground truth. The head is shown in red, the body in yellow, the arms in green, the legs in blue, and the other regions in black � � � �   � � log 1 − D G T ) ( ( )] log [D (x)] dec i,j i,j dec i,j i,j   (3) L =−E − E dec x T width ∗ height width ∗ height cons 2 (4) L = ||D (mix(x, G(T ),M) − mix(D (x), D (G(T )),M)|| dec dec dec dec to correctly predict the Real/Fake classification of each where mix(x , x ,M) is the CutMix function [51], which 1 2 pixel. Consistency Loss also improves the stability of the mixes x and x according to the mask M , and width and 1 2 Discriminator’s prediction by placing constraints on the height are the width and height of the image, respec- CutMix of D (x) and D (G(T )) and the prediction dec dec tively. The loss is given by L to correctly predict the enc results of the CutMix of x and G(T ) to be the same. The Real/Fake classification of the whole image, and by L dec loss function, L , of the generator is also shown in Eq. 5: log [D (G(T ))] CrossEntropy(x, G(T )) dec i,j i,j i,j (5) L =−E log D (G(T )) + +· G T enc width*height width ∗ height Asano et al. BMC Medical Imaging (2022) 22:1 Page 6 of 13 Table.2 Detailed network configuration of U-Net, U-Net GAN The first term represents the loss of the Discriminator Generator, and U-Net GAN + SA Generator and constrains segmentation to be similar to the ground truth. CrossEntropy(x , x ) represents the Cross Entropy 1 2 Layers Output size U-Net U-Net GAN + SA Loss, and  is a variable that balances the first and second Input 320 × 256 × 1 terms; in this paper,  = 0.1. Convolution 320 × 256 × 16 3 × 3, 16 d 3 × 3, 16 d As in the previous experiment, fourfold cross-valida- Downscale 160 × 128 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d tion was performed to evaluate U-Net GAN and U-Net 3 × 3, 32 d, CBR 7 × 7, 32 d, SA GAN + SA. In addition to classification accuracy and 1 × 1, 32 d mIoU, a Confusion Matrix including U-Net was used as Downscale 80 × 6464 5 × 5, 64 d, CBR 1 × 1, 64 d 3 × 3, 64 d, CBR 7 × 7, 64 d, SA an evaluation metric. 1 × 1, 64 d For training, a PC with an AMD Ryzen 7 3700X CPU, Downscale 40 × 32 × 128 5 × 5, 128 d, CBR 1 × 1, 128 d 64  GB of memory, and a GeForce RTX 3090 GPGPU 3 × 3, 128 d, CBR 7 × 7, 128 d, SA running Windows 10 was used. We used Python 3.7 as 1 × 1, 128 d the programming language and Pytorch 1.1 was used as Downscale 20 × 16 × 256 5 × 5, 256 d, CBR 1 × 1, 256 d a deep learning package. The optimal values of learning 3 × 3, 256 d, CBR 7 × 7, 256 d, SA 1 × 1, 256 d parameters (i.e., network depth, number of channels per Downscale 10 × 8 × 512 5 × 5, 512 d, CBR 1 × 1, 512 d layer, batch size, learning rate) were determined through 3 × 3, 512 d, CBR 7 × 7, 512 d, SA a preliminary experiment. The number of training epochs 1 × 1, 512 d was determined before the model began overfitting. The Upscale 20 × 16 × 256 5 × 5, 256 d, CBR 1 × 1, 256 d parameters used for training are shown in Table  4. For 3 × 3, 256 d, CBR 7 × 7, 256 d, SA 1 × 1, 256 d Augmentation, we performed a vertical flip of the image Upscale 40 × 32 × 128 5 × 5, 128 d, CBR 1 × 1, 128 d and added random noise to each pixel. AMSGrad [52] 3 × 3, 128 d, CBR 7 × 7, 128 d, SA was used as the optimizer. 1 × 1, 128 d Statistical analyses were conducted to compare the Upscale 80 × 64 × 64 5 × 5, 64 d, CBR 1 × 1, 64 d accuracy between the methods. The Steel–Dwass test 3 × 3, 64 d, CBR 7 × 7, 64 d, SA 1 × 1, 64 d was applied as a nonparametric multiple comparison Upscale 160 × 128 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d test. All analyses were performed using JMP 15 statisti- 3 × 3, 32 d, CBR 7 × 7, 32 d, SA cal software. For a detailed evaluation of segmentation 1 × 1, 32 d performance, the Hausdorff distance and IoU for each Upscale 320 × 256 × 16 5 × 5, 16 d, CBR 1 × . 1, 16 d region were calculated. 3 × 3, 16 d, CBR 7 × 7, 16 d, SA 1 × 1, 16 d Convolution 320 × 256 × 1 3 × 3, 1 d 3 × 3, 1 d Loss Real/Fake Input Generator Prediction Discriminator Real/Fake per pixel Grand Truth Fig. 3 Network diagram of U-Net GAN A sano et al. BMC Medical Imaging (2022) 22:1 Page 7 of 13 Table.3 Detailed network configuration of U-Net GAN U-Net showed very high segmentation accuracy with discriminator and U-Net GAN + SA discriminator a validation accuracy of 91.3% (SD 0.04%) and mIoU of 57.8% (SD 0.15%). FReLU showed improvements of 0.6% Layers Output size U-Net U-Net GAN + SA (SD 0.04%) in accuracy and 3.1% (SD 0.16%) in mIoU, Input 320 × 256 × 1 while Group Normalization showed improvements Convolution 320 × 256 × 8 3 × 3, 8 d 3 × 3, 8 d of 0.9% (SD 0.04%) in accuracy and 4.4% (SD 0.14%) in Downscale 160 × 128 × 16 5 × 5, 16 d, CBR 1 × 1, 16 d mIoU. However, Normalized Convolution decreased the 3 × 3, 16 d, CBR 7 × 7, 16 d, SA accuracy by 0.2% (SD 0.05%), but improved the mIoU by 1 × 1, 16 d 3.1% (SD 0.15%). The best results were obtained with the Downscale 80 × 64 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d 3 × 3, 32 d, CBR 77, 32 d, SA combined application of FReLU and Group Normaliza- 1 × 1, 32 d tion showing 92.9% (SD 0.04%) accuracy and mIoU of Downscale 40 × 32 × 64 5 × 5, 64 d, CBR 1 × 1, 64 d 64.5% (SD 0.15%). 3 × 3, 64 d, CBR 7 × 7, 64 d, SA U-Net GAN and U-Net GAN + SA showed validation 1 × 1, 64 d accuracy of 93.3% (SD 0.03%) and 93.5% (SD 0.04%), rep- Encoder out 5 ReLU ReLU resenting improvements of 0.7% and 0.9%, respectively, ( D (x)) Average Pooling Average Pooling enc Linear, 5d Linear, 5 d and mIoU of 66.9% (SD 0.13%) and 70.4% (SD 0.13%), Upscale 80 × 64 × 32 5 × 5, 32 d, CBR 1 × 1, 32 d representing improvements of 2.4% and 5.9%, respec- 3 × 3, 32 d, CBR 7 × 7, 32 d, SA tively, compared to the best results of U-Net (Table  6). 1 × 1, 32 d Finally, the confusion matrices for U-Net, U-Net GAN, Upscale 160 × 128 × 16 5 × 5, 16 d, CBR 1 × 1, 16 d and U-Net GAN + SA are shown in Fig.  4. For each net- 3 × 3, 16 d, CBR 7 × 7, 16 d, SA 1 × 1, 16 d work, the accuracy was 82%, 82%, and 87% for head, 82%, Upscale 0 × 256 × 8 5 × 5, 8 d, CBR 1 × 1, 8 d 87%, and 88% for body, 66%, 72%, and 68% for arms, 33, 8 d, CBR 7 × 7, 8 d, SA 86%, 85%, and 81% for legs, and 94%, 97%, and 96% for 1 × 1, 8 d other, respectively. The results of the Steel–Dwass test Convolution 320 × 256 × 2 3 × 3, 2 d 3 × 3, 2 d are shown in Table  7. Significant differences were found ( D (x)) dec between several methods. The results of the Hausdor ff distance and IoU for each region are shown in Tables  8 and 9, respectively. Table.4 Parameters used for training Parameter Net U-Net GAN U-Net GAN + SA Learning rate 0.01 0.01 (generator) 0.01 (generator) Discussion 1e−4 (discriminator) 1e−4 (discriminator) All of the methods examined here showed highly accu- Batch size 75 30 12 rate classification performance. FReLU and Group Nor - Epoch 200 100 100 malization improved the classification accuracy and mIoU of U-Net, which was considered to be due to the improved representativeness of the network. Group Nor- malization shows that normalization within the channels Results of the network is more effective than Batch Normaliza - The accuracy of segmentation using U-Net was evalu - tion in this problem. This was because the input data ated and the results are shown in Table  5. Even standard Table.5 Segmentation performance using U-Net with and without normalized convolution, FReLU, and group normalization Normalized FReLU Group normalization Accuracy (%) SD (%) mIoU (%) SD (%) convolution 91.3 0.04 57.8 0.15 ✓ 91.1 0.05 60.9 0.15 ✓ 91.9 0.04 60.9 0.16 ✓ 92.2 0.04 62.2 0.14 ✓ ✓ 91.4 0.05 60.7 0.15 ✓ ✓ 92.4 0.04 63.8 0.13 ✓ ✓ 92.9 0.04 64.5 0.15 ✓ ✓ ✓ 92.4 0.04 62.9 0.15 Asano et al. BMC Medical Imaging (2022) 22:1 Page 8 of 13 Table.6 Segmentation performance of U-Net, U-Net GAN, and improvement. There was no significant difference U-Net GAN + SA between FReLU and U-Net GAN + SA, thus confirming the effectiveness of FReLU. U-Net GAN + SA showed Network Accuracy (%) SD (%) mIoU (%) SD (%) significant differences in many cases compared to the U-Net 92.9 0.04 64.5 0.15 other methods, confirming that it is a powerful method. U-Net GAN 93.3 0.03 66.9 0.13 However, there were no significant differences between U-Net GAN + SA 93.5 0.03 70.4 0.13 the four sets of results: FReLU with Group Normaliza- tion, FReLU with Group Normalization and Normal- ized Convolution, U-Net GAN, and U-Net GAN + SA . This suggests that the performance improvement may be consisted only of temperature information with similar approaching its limit. backgrounds, so there were many regions with similar Similar results were obtained with Hausdorff distance. values, and Batch Normalization may have the effect of FReLU with Group Normalization, U-Net GAN, and over-averaging the data. On the other hand, Normal- U-Net GAN + SA performed better than the other meth- ized Convolution showed a decrease in accuracy but an ods in almost all regions, and the SD was also lower. In improvement in mIoU. Depending on the location of the all methods, the Hausdorff distance was larger for the thermal imaging camera and the view angle, the “other” arms and legs than for the head and body. In IoU, Other region had 13–23 times more pixels than the “infant” was the highest in all methods, which may have been due region. Thus, Normalized Convolution may decrease the to the lower temperature in the Other region compared number of missed skin regions, but increase the percent- to the neonate, thus making segmentation easier. U-Net age of false positive identification of other regions as skin GAN + SA showed better results for infant region seg- regions. The application of U-Net with FReLU and Group mentation. SA was also effective in Semantic Segmenta - Normalization showed 1.6% better accuracy and 6.7% tion of thermal images. better mIoU than ReLU and Batch Normalization. These U-Net GAN is optimized by combining multiple loss results confirmed that the combined use of these tools functions. The Discriminator classifies the manually gen - resulted in significant improvements, especially in mIoU. erated ground truth and the results of U-Net segmenta- Using the network with FReLU and Group Normali- tion, and in addition to the conventional GAN evaluation zation applied to U-Net as a baseline, U-Net GAN and on a per-image basis, it also evaluates and feeds back U-Net GAN + SA were confirmed to show beneficial the results on a per-pixel basis. This yields not only effects. higher performance than normal U-Net, but is also visu- Compared to the accuracy of U-Net of 92.9%, U-Net ally closer to the manually obtained ground truth. The GAN showed a 0.4% improvement in accuracy and 2.4% accuracy was further improved in U-Net GAN + SA by improvement in mIoU, and U-Net GAN + SA improved changing the Convolution to SA. SA, which strictly eval- accuracy by 0.6% and mIoU by 5.9%. uates the relationship between pixels, was considered to The results of the Steel–Dwass test showed signifi - be effective as temperature images have lower value vari - cant differences between several methods. In particu - ation and dimensionality compared to RGB images. The lar, FReLU alone showed a significant performance Fig. 4 Confusion matrices of U-Net, U-Net GAN, and U-Net GAN + SA A sano et al. BMC Medical Imaging (2022) 22:1 Page 9 of 13 Table.7 Significant differences between the proposed methods Normalized Convolution FReLU Group Normalization ** *** * ** *** * ****** **** ** * U-Net GAN U-Net GAN+SA *p < 0.01, **p < 0.05 Table.8 Hausdorff distance for each region Head Body Arm Leg Other All (w/o other) Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD UNet 34.6 25.3 38.1 29.5 59.2 37.7 43.4 40.0 26.7 9.3 43.9 34.8 Normalized convolution 33.0 23.6 31.5 26.2 55.7 35.6 46.5 50.7 26.5 8.9 41.5 36.4 FReLU 31.2 21.3 31.2 22.6 58.4 38.0 42.8 40.0 25.5 9.1 40.8 33.3 Group normalization 30.3 18.2 30.1 21.9 63.4 40.0 50.7 48.6 25.8 9.5 43.4 36.9 Normalized convolution FReLU 27.8 18.9 31.1 25.6 57.2 37.1 47.9 47.5 25.5 9.2 40.8 35.7 Normalized convolution group 30.4 21.3 30.0 20.6 64.9 34.1 52.9 50.0 26.7 9.9 44.3 36.3 normalization FReLU group normalization 27.5 17.8 25.2 20.2 48.7 32.8 38.6 38.3 25.4 8.8 34.9 29.8 ALL 26.8 17.2 28.5 22.4 53.9 36.9 48.5 49.9 25.3 9.9 39.1 35.5 U-Net GAN 27.4 19.7 26.7 22.7 49.3 34.1 39.0 42.7 24.5 9.1 35.5 32.1 U-Net GAN + SA 27.1 17.7 26.7 22.7 46.3 32.6 41.4 42.1 23.7 9.4 35.2 31.1 Normalized Convolution FReLU Group Normalization U-Net GAN U-Net GAN+SA Asano et al. BMC Medical Imaging (2022) 22:1 Page 10 of 13 Table.9 IoU for each region Head Body Arm Leg Other All IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) IoU (%) SD (%) UNet 50.8 0.16 52.1 0.16 41.6 0.17 53.5 0.23 91.1 0.03 57.8 0.15 Normalized convolution 57.5 0.14 48.5 0.15 47.4 0.15 59.8 0.23 91.5 0.03 60.9 0.14 FReLU 54.8 0.17 56.6 0.16 44.3 0.18 57.1 0.26 91.8 0.03 60.9 0.16 Group normalization 56.4 0.16 60.1 0.15 43.1 0.16 58.6 0.24 93.0 0.03 62.2 0.15 Normalized convolution FReLU 55.1 0.14 57.3 0.14 41.4 0.13 57.6 0.21 92.2 0.03 60.7 0.13 Normalized convolution group 58.0 0.15 61.2 0.16 47.3 0.17 60.2 0.24 92.4 0.03 63.8 0.15 normalization FReLU group normalization 59.2 0.16 62.3 0.15 47.7 0.17 61.4 0.23 92.0 0.03 64.5 0.15 ALL 58.2 0.15 59.3 0.15 47.9 0.16 58.0 0.25 91.3 0.03 62.9 0.15 U-Net GAN 61.5 0.14 64.3 0.14 49.1 0.15 66.4 0.2 93.4 0.03 66.9 0.13 U-Net GAN + SA 64.8 0.14 67.9 0.14 57.9 0.14 67.6 0.2 93.6 0.02 70.4 0.13 Fig. 5 Examples of the differences in segmentation results between U-Net, U-Net GAN, and U-Net GAN + SA. a Input. b Ground truth. c U-Net. d U-Net GAN. e U-Net GAN + SA temperature image, ground truth, and images obtained showed high accuracy, but the features differed between by segmentation using U-Net, U-Net GAN, and U-Net methods. U-Net segmented the images with smooth GAN + SA are shown in Fig. 5. The results of all methods boundaries. On the other hand, it misdetected thin A sano et al. BMC Medical Imaging (2022) 22:1 Page 11 of 13 regions, such as cables on the body surface, resulting in the body temperature of infants and analyzing various finely over-segmented regions. U-Net GAN yielded a diseases. smoother segmentation shape and unnatural segmenta- Further studies are required to evaluate the accuracy tion was prevented, and U-Net GAN + SA successfully of measuring the body temperature of infants using our excluded fine non-skin areas, such as cables and the method. The segmentation accuracy was evaluated, but shapes near the boundaries of the segmented areas fol- the impact of this accuracy on the temperature measure- lowed the edges of the temperature information. These ment is not yet clear. Furthermore, large amounts of clin- results were attributed to the strict evaluation of temper- ical data will be collected and analyzed using the results ature relationships by SA, resulting in detailed semantics. obtained with this method to study the ability to predict The confusion matrix shown in Fig.  4 indicated that diseases and other conditions. In this process, the accu- the detection accuracy of each region differed between racy required for segmentation will be clarified. It will be methods. U-Net GAN + SA showed 5% higher detection necessary to examine these issues through clinical appli- accuracy for the head than the other methods. For the cation in future studies. body, U-Net GAN and U-Net GAN + SA showed 5%–6% higher accuracy than U-Net. For the arms, U-Net GAN Conclusion was 4–6% more accurate than the other methods, and for A U-Net-based network was confirmed to be able the legs, U-Net was 1–5% more accurate than the other to segment the skin area on thermographic ther- methods. U-Net GAN showed 1–3% higher accuracy for mal images of infants with high accuracy. FReLU and the other regions than the other methods. The features of Group Normalization were confirmed to be effec- the resulting segmented images differed according to the tive for thermal image segmentation. GAN was also method used, although the numerical differences were shown to improve the segmentation accuracy, and SA small. U-Net GAN + SA predicted the skin region of the achieved fine segmentation even on thermal images infant as “other” less frequently than the other methods, with few features. These tools contributed to the which was due to the strict evaluation of pixel-by-pixel improvement of mIoU, and U-Net GAN + SA showed temperature relationships by SA. The accuracy of U-Net a significant performance improvement over standard GAN + SA was higher for the head and body compared U-Net. to the other methods, while it showed lower accuracy for the arm and leg regions due to an increase in the number Abbreviations of cases where they were incorrectly detected as other CBR structure: Convolution–batch normalization–ReLU structure; CT: Com- skin regions. This was because the arms and legs have puted tomography; FReLU: Flexible rectified linear unit; GAN: Generative more variations in shape and positional relationships adversarial network; MARSI: Medical adhesive-related skin injuries; mIoU: Mean intersection over union; MRI: Magnetic resonance imaging; NHMC: Nagasaki than the head and body, and strictly evaluating the pixel- harbor medical center; NEC: Necrotizing enterocolitis; ReLU: Rectified linear by-pixel relationships leads to incorrect predictions. unit; ROI: Region of interest; SA: Self-attention; SD: Standard deviation; U-Net Therefore, additional training data and further augmen - GAN: U-Net generative adversarial network; U-Net GAN + SA: Self-attention module in U-Net GAN; VLBW: Very low birth weight; WHO: World Health tation are considered necessary for U-Net GAN + SA to Organization. detect arms and legs more accurately. U-Net and U-Net GAN tended to have slightly lower accuracy than U-Net Acknowledgements None. GAN + SA. However, SA requires a great deal of process- ing and large amounts of memory, so it is important to Authors’ contributions consider the device to be used and select the optimal HA analyzed and interpreted the data and contributed significantly to the preparation of the manuscript. EH designed the study, performed the experi- method to be applied. In medical applications, it is not ments, contributed to interpretation of the data, and revised the manuscript. necessary to evaluate the temperature of areas other than HH performed data analysis and data preparation, and contributed to the skin, and therefore U-Net GAN + SA is considered to interpretation of the data. KH performed experiments and contributed to interpretation of the data. YA and MO participated in the design of the experi- be effective. However, further improvements are needed ments. AU and TH contributed to interpretation of the data and revised the for regions where the shape and positional relation- manuscript. All authors have read and approved the final manuscript. ships may vary, such as the arms and legs, as the system Funding showed degradation of performance in such areas. Not applicable. The application of this method in clinical settings will enable continuous monitoring of temperature in Availability of data and materials The datasets generated and analyzed during the present study are not each region of the body. Further studies are required to publicly available due to participant privacy, but are available from the cor- confirm the effectiveness of this method in managing responding author on reasonable request. Asano et al. BMC Medical Imaging (2022) 22:1 Page 12 of 13 14. Lubkowska A, Szymański S, Chudecka M. Surface body temperature Declarations of full-term healthy newborns immediately after birth-pilot study. Int J Environ Res Public Health. 2019;16:1312. Ethics approval and consent to participate 15. Knobel RB, Holditch-Davis D, Schwartz TA, Wimmer JE Jr. Extremely low This study was approved by the Ethics Committee of the Nagasaki Harbor birth weight preterm infants lack vasomotor response in relationship to Medical Center, and the research was conducted (Approval No. NIRB No. cold body temperatures at birth. J Perinatol. 2009;29:814–21. R02-006). The research was carried out in accordance with the Declaration of 16. Knobel RB, Guenther BD, Rice HE. Thermoregulation and thermography Helsinki. Written informed consent was obtained from the parent or guardian in neonatal physiology and disease. Biol Res Nurs. 2011;13:274–82. of all participants. 17. Knobel-Dail RB, Holditch-Davis D, Sloane R, Guenther BD, Katz LM. Body temperature in premature infants during the first week of life: exploration Consent for publication using infrared thermal imaging. J Therm Biol. 2017;69:118–23. The parents of all participants consented to publication of the data in 18. Abbas AK, Heimann K, Blazek V, Orlikowsky T, Leonhardt S. Neonatal infra- anonymized form. red thermography imaging: analysis of heat flux during different clinical scenarios. Infrared Phys Technol. 2012;55:538–48. Competing interests 19. Ussat M, Vogtmann C, Gebauer C, Pulzer F, Thome U, Knüpfer M. The role Hidetsugu Asano, Hayato Hayashi, Yuto Asayama, and Masaaki Oohashi are of elevated central-peripheral temperature difference in early detection salaried employees of Atom Medical Corporation. of late-onset sepsis in preterm infants. Early Hum Dev. 2015;91:677–81. 20. Simpson RC, McEvoy HC, Machin G, Howell K, Naeem M, Plassmann Author details P, et al. In-field-of-view thermal image calibration system for medical Technical Department, Atom Medical Corporation, 2-2-1, Dojo, Sakura-ku, thermography applications. Int J Thermophys. 2008;29:1123–30. Saitama city, Saitama 338-0835, Japan. Department of Neonatology, Nagasaki 21. Topalidou A, Ali N, Sekulic S, Downe S. Thermal imaging applications in Harbor Medical Center, 6-39, Shinchi-machi, Nagasaki City, Nagasaki 850-8555, neonatal care: a scoping review. BMC Pregnancy Childbirth. 2019;19:381. Japan. Department of Neonatology, Kagoshima City Hospital, 37-1 Uea- 22. Lund CH, Nonato LB, Kuller JM, Franck LS, Cullander C, Durand DJ. rata-cho, Kagoshima City, Kagoshima 890-8760, Japan. Department of Clinical Disruption of barrier function in neonatal skin associated with adhesive Engineering, Nagasaki Harbor Medical Center, 6-39, Shinchi-machi, Nagasaki removal. J Pediatr. 1997;131(3):367–72. City, Nagasaki 850-8555, Japan. Department of Comprehensive Community 23. Dollison EJ, Beckstrand J. Adhesive tape vs pectin-based barrier use in Care Education, Nagasaki University Graduate School of Biomedical Sciences, preterm infants. Neonatal Netw. 1995;14(4):35–9. 1-14, Bunkyo-machi, Nagasaki City, Nagasaki 852-8521, Japan. Mobile Com- 24. Lund C, Kuller JM, Tobin C, Lefrak L, Franck LS. Evaluation of a pectin- puting Laboratory, Graduate School of Information Science and Technology, based barrier under tape to protect neonatal skin. J Obstet Gynecol Osaka University, 1-5, Yamadaoka, Suita, Osaka 565-0871, Japan. Neonatal Nurs. 1986;15(1):39–44. 25. Duarte A, Carrão L, Espanha M, Viana T, Freitas D, Bártolo P, et al. Segmen- Received: 1 June 2021 Accepted: 23 December 2021 tation algorithms for thermal images. Procedia Technol. 2014;16:1560–9. 26. Rodriguez-Lozano FJ, León-García F, Ruiz de Adana M, Palomares JM, Oli- vares J. Non-invasive forehead segmentation in thermographic imaging. Sensors (Basel). 2019;9:4096. 27. Abbas AK, Leonhardt S. Intelligent neonatal monitoring based on a References virtual thermal sensor. BMC Med Imaging. 2014;14:9. 1. Silverman WA, Fertig JW, Berger AP. The influence of the thermal environ- 28. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 ment upon the survival of newly born premature infants. Pediatrics. IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 1958;22:876–86. 2. Vohra S, Frent G, Campbell V, Abbott M, Whyte R. Eec ff t of polyethylene 29. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N. BiSeNet: bilateral segmenta- occlusive skin wrapping on heat loss in very low birth weight infants at tion network for real-time semantic segmentation. In: Computer vision— delivery: a randomized trial. J Pediatr. 1999;134:547–51. ECCV 2018. Cham: Springer; 2018. p. 334–49. 3. O’Reilly JN. Heated carrier for transporting premature babies. Br Med J. 30. Zhu Y, Sapra K, Reda FA, Shih KJ, Newsam S, Tao A, et al. Improving 1945;2:731. semantic segmentation via video propagation and label relaxation. In: 4. Laptook AR, Bell EF, Shankaran S, Boghossian NS, Wyckoff MH, Kandefer 2019 IEEE/CVF conference on computer vision and pattern recognition S, et al. Admission temperature and associated mortality and morbidity (CVPR). IEEE; 2019. among moderately and extremely preterm infants. J Pediatr. 2018;192:53- 31. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for 59.e2. biomedical image segmentation. In: Lecture notes in computer science. 5. Asakura H. Fetal and neonatal thermoregulation. J Nippon Med Sch. Cham: Springer International Publishing; 2015. p. 234–41. 2004;71:360–70. 32. Antink CH, Ferreira JCM, Paul M, Lyra S, Heimann K, Karthik S, et al. Fast 6. Smith J, Alcock G, Usher K. Temperature measurement in the preterm and body part segmentation and tracking of neonatal video data using deep term neonate: a review of the literature. Neonatal Netw. 2013;32:16–25. learning. Med Biol Eng Comput. 2020;58(12):3049–61. 7. World Health Organization ( WHO). Thermal protection of the newborn: a 33. Zhou X, Ito T, Takayama R, Wang S, Hara T, Fujita H. Three-dimensional practical guide. Geneva: World Health Organization; 1997. CT image segmentation by combining 2D fully convolutional network 8. Perez A, van der Meer F, Singer D. Target body temperature in very low with 3D majority voting. In: Deep learning and data labeling for medical birth weight infants: clinical consensus in place of scientific evidence. applications. Cham: Springer; 2016. p. 111–20. Front Pediatr. 2019;7:227. 34. Ait Skourt B, El Hassani A, Majda A. Lung CT image segmentation using 9. Waldron S, MacKinnon R. Neonatal thermoregulation. Infant. deep neural networks. Procedia Comput Sci. 2018;127:109–13. 2007;3:101–4. 35. Myronenko A. 3D MRI brain tumor segmentation using autoencoder 10. Karlsson H, Hänel SE, Nilsson K, Olegård R. Measurement of skin tem- regularization. In: Brainlesion: glioma, multiple sclerosis, stroke and trau- perature and heat flow from skin in term newborn babies. Acta Paediatr. matic brain injuries. Cham: Springer; 2019. p. 311–20. 1995;84:605–12. 36. Lyra S, Mayer L, Ou L, Chen D, Timms P, Tay A, et al. A deep learning-based 11. Bensouda B, Mandel R, Mejri A, Lachapelle J, St-Hilaire M, Ali N. Tempera- camera approach for vital sign monitoring using thermography images ture probe placement during preterm infant resuscitation: a randomised for ICU patients. Sensors (Basel). 2021;21(4):1495. trial. Neonatology. 2018;113:27–32. 37. Bochkovskiy A, Wang C-Y, Liao H-YM. YOLOv4: optimal speed and accu- 12. Lyon AJ, Pikaar ME, Badger P, McIntosh N. Temperature control in very racy of object detection [Internet]. arXiv [cs.CV ]. 2020. Available from: low birthweight infants during first five days of life. Arch Dis Child Fetal http:// arxiv. org/ abs/ 2004. 10934. Neonatal Ed. 1997;76:F47-50. 38. Kwasniewska A, Ruminski J, Szankin M. Improving accuracy of contactless 13. Lantz B, Ottosson C. Using axillary temperature to approximate rectal respiratory rate estimation by enhancing thermal sequences with deep temperature in newborns. Acta Paediatr. 2015;104:766–70. neural networks. Appl Sci (Basel). 2019;9(20):4405. A sano et al. BMC Medical Imaging (2022) 22:1 Page 13 of 13 39. Ekici S, Jawzal H. Breast cancer diagnosis using thermography and convo- lutional neural networks. Med Hypotheses. 2020;137(109542):109542. 40. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks [Internet]. arXiv [stat.ML]. 2014. Available from: http:// arxiv. org/ abs/ 1406. 2661. 41. Isola P, Zhu J-Y, Zhou T, Efros AA. Image-to-image translation with condi- tional adversarial networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2017. 42. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV ). IEEE; 2017. 43. Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE; 2020. 44. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need [Internet]. arXiv [cs.CL]. 2017. Available from: http:// arxiv. org/ abs/ 1706. 03762. 45. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding [Internet]. arXiv [cs.CL]. 2018. Available from: http:// arxiv. org/ abs/ 1810. 04805. 46. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2016. 47. Salimans T, Kingma DP. Weight normalization: a simple reparameteriza- tion to accelerate training of deep neural networks [Internet]. arXiv [cs. LG]. 2016. Available from: http:// arxiv. org/ abs/ 1602. 07868. 48. Wu Y, He K. Group normalization. In: Computer vision—ECCV 2018. Cham: Springer; 2018. p. 3–19. 49. Ma N, Zhang X, Sun J. Funnel activation for visual recognition. In: Com- puter vision—ECCV 2020. Cham: Springer; 2020. p. 351–68. 50. Schonfeld E, Schiele B, Khoreva A. A U-net based discriminator for gen- erative adversarial networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE; 2020. 51. Yun S, Han D, Chun S, Oh SJ, Yoo Y, Choe J. CutMix: regularization strategy to train strong classifiers with localizable features. In: 2019 IEEE/CVF international conference on computer vision (ICCV ). IEEE; 2019. 52. Reddi SJ, Kale S, Kumar S. On the convergence of Adam and beyond [Internet]. arXiv [cs.LG]. 2019. Available from: http:// arxiv. org/ abs/ 1904. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in pub- lished maps and institutional affiliations. Re Read ady y to to submit y submit your our re researc search h ? Choose BMC and benefit fr ? Choose BMC and benefit from om: : fast, convenient online submission thorough peer review by experienced researchers in your field rapid publication on acceptance support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations maximum visibility for your research: over 100M website views per year At BMC, research is always in progress. Learn more biomedcentral.com/submissions

Journal

BMC Medical ImagingSpringer Journals

Published: Jan 3, 2022

Keywords: Thermography; Semantic segmentation; Infants; Temperature

References