3D AGSE-VNet: an automatic brain tumor MRI data segmentation framework

Background: Glioma is the most common malignant brain tumor, with a high morbidity rate and a mortality rate of more than three percent, which seriously endangers human health. The main method of acquiring brain tumor images in the clinic is MRI. Segmentation of brain tumor regions from multi-modal MRI scans is helpful for treatment inspection, post-diagnosis monitoring, and evaluation of patient outcomes. However, the common practice in clinical brain tumor segmentation is still manual segmentation, which is time-consuming and varies widely between operators; a consistent and accurate automatic segmentation method is therefore urgently needed. With the continuous development of deep learning, researchers have designed many automatic segmentation algorithms, but some problems remain: (1) most segmentation research stays on the 2D plane, which reduces the accuracy of 3D image feature extraction to a certain extent; (2) MRI images have gray-scale offset fields that make it difficult to delineate contours accurately.

Methods: To meet the above challenges, we propose an automatic brain tumor MRI data segmentation framework called AGSE-VNet. In our study, the Squeeze and Excite (SE) module is added to each encoder and the Attention Guide Filter (AG) module is added to each decoder, using the channel relationships to automatically enhance the useful information in the channels and suppress the useless information, and using the attention mechanism to guide the edge information and remove the influence of irrelevant information such as noise.

Results: We used the BraTS 2020 challenge online verification tool to evaluate our approach. The Dice scores of the enhancing tumor, whole tumor, and tumor core are 0.68, 0.85, and 0.70, respectively.

Conclusion: Although MRI images have different intensities, AGSE-VNet is not affected by the size of the tumor and can accurately extract the features of the three regions; it has achieved impressive results and can contribute to the clinical diagnosis and treatment of brain tumor patients.

Keywords: Brain tumor, Magnetic resonance imaging, VNet, Automatic segmentation, Deep learning

*Correspondence: g.yang@imperial.ac.uk; dmia_lab@zcmu.edu.cn. Xi Guan and Guang Yang have contributed equally to this work. Full list of author information is available at the end of the article. © The Author(s) 2021.

Introduction
Glioma is one of the common types of primary brain tumors, accounting for about 50% of intracranial tumors [1]. According to the WHO classification criteria, gliomas can be divided into four grades according to different symptoms, of which I and II are low-grade gliomas (LGG) and III and IV are high-grade gliomas (HGG) [2]. Glioma has a high mortality rate; it can appear in any part of the brain and in people of any age, with various histological subregions and varying degrees of invasiveness [3]. Therefore, it has attracted widespread attention in the medical field. Because glioblastoma (GBM) cells are immersed in the healthy brain parenchyma and infiltrate the surrounding tissues, they can grow and spread rapidly near the protein fibers, and the deterioration process
is very rapid. Therefore, early diagnosis and treatment are essential.

At present, the methods of acquiring brain tumor images in clinical practice are mainly computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) [4]. Among them, MRI has become the preferred medical imaging method for brain diagnosis and treatment planning, because it provides images with high soft-tissue contrast and high spatial resolution [5] and therefore represents well the anatomical structure of the cranial nerve soft tissue and the appearance of the lesion. At the same time, MRI can obtain multiple sequences of brain tumor information in different spaces through one scan: T1-weighted (T1), T1-weighted contrast-enhanced (T1-CE), T2-weighted (T2), and fluid attenuation inversion recovery (FLAIR) [6, 7]. However, manually segmenting tumors from MRI images requires professional prior knowledge, is time-consuming and labor-intensive, is prone to errors, and depends heavily on the doctor's experience. Therefore, the development of an accurate, reliable, and fully automatic brain tumor segmentation algorithm has strong clinical significance.

With the development of computer vision and pattern recognition, convolutional neural networks have been applied to many challenging tasks; for example, classification, segmentation, and target detection capabilities have been greatly improved. In addition, deep learning technology shows great potential in medical image processing. So far, plenty of research studies on medical image segmentation have been developed in both academia and industry. VNet [8] has good segmentation performance on single-modal images, but it still has some shortcomings for multi-modal segmentation. In this article, inspired by the integration of the "Project and Excite" (PE) module into the 3D U-Net proposed by Anne-Marie et al. [9], we propose an automatic brain tumor MRI data segmentation framework called 3D AGSE-VNet. The network structure is shown in Fig. 1. The main contributions of this paper are: (1) we propose a combined segmentation model based on VNet, integrating the SE module and the AG module; (2) using volume input, three-dimensional convolution is used to process the MRI images; (3) we obtain excellent segmentation results with potential for clinical application.

Fig. 1 The overall architecture of the proposed 3D AGSE-VNet

Related works
Traditional machine learning
At present, in clinical medicine, replacing tedious manual or semi-automatic segmentation with fully automatic segmentation methods is a goal that experts and scholars have long pursued; it has also been a focus and key technology of medical imaging research in recent years.
Traditional image-processing brain tumor segmentation algorithms use threshold-based, region-based, and boundary-based segmentation methods. Image segmentation based on a threshold is one of the simplest and most traditional methods in image processing. Tustison et al. proposed a two-stage segmentation framework based on Random Forest-derived probabilities, using the output of the first classifier to improve the segmentation result of the second classifier [10]. Stadlbauer et al. [11] proposed using the normal distribution of the data to obtain the threshold; according to the intensity change of each region, an adaptive threshold segmentation method was proposed to separate the foreground from the background. However, this method has strong limitations, and segmentation fails when multiple tissue structures overlap. Amiri et al. [12] proposed a multi-layer structure in which structured Random Forests and Bayesian networks are embedded to learn tumor features better, but inputting a large number of features can easily lead to the curse of dimensionality and waste plenty of time. Balafar [13] uses a seed region growing algorithm to process brain MRI images according to the threshold T and the generation of PD images, and then uses a Markov logic algorithm to process them further to improve segmentation performance.

Deep learning
In recent years, convolutional neural networks have become the most popular method in image classification, and they are widely used in medical image analysis. Sérgio Pereira et al. [14] proposed an automatic positioning method based on a Convolutional Neural Network (CNN) that explores 3 × 3 kernels, using small kernels to design a deeper architecture and intensity normalization as a preprocessing step, trained and validated on the BraTS 2015 dataset. In the article by Hao et al., a fully automatic brain tumor segmentation method based on the U-Net deep convolutional network was proposed and evaluated on the BraTS 2015 dataset; cross-validation shows that it can effectively obtain promising segmentations [15]. Wang et al. proposed a cascade network: the first step segments the whole tumor, the second step segments the tumor core using the obtained bounding box, and the enhancing tumor core is then segmented according to the bounding box of the tumor core segmentation result; anisotropic and dilated convolutions are used, combined with multi-view fusion methods to reduce false positives [16]. Andriy Myronenko proposed a 3D MRI tumor subregion segmentation semantic network based on an encoder-decoder structure, which uses an autoencoder branch to reconstruct images, and won first place in the 2018 BraTS Challenge [17]. Feng Xue proposed an ensemble 3D U-Net brain tumor segmentation method in which six networks with different patch sizes and loss weights are used for the encoder and decoder, and training improved various performances [18]. In 2019, Nabil Ibtehaz et al. developed MultiResUNet, a novel architecture based on U-Net that extends residual connections and proposes the residual path (ResPath); it was verified on the ISIC and BraTS datasets with good segmentation performance [19]. Xu et al. proposed progressive sequential causal GANs to synthesize high-quality LGE-equivalent images and accurately segment all tissues related to the diagnosis, obtaining highly accurate diagnostic indicators in a real clinical environment [20]. Zhou et al. proposed an effective 3D residual neural network for brain tumor segmentation, using the computationally efficient 3D ShuffleNetV2 as an encoder and introducing a decoder with residual blocks to achieve high-efficiency segmentation [21]. Saman et al. proposed an active contour model driven by optimized energy functionals for MR brain tumor segmentation with intensity inhomogeneity correction, and a method to identify and segment brain tumor slices in MRI images [22]. Liu et al. studied a deep learning model based on learnable group convolution and deep supervision, replacing the convolutions in the feature extraction stage with learnable group convolutions; tests on the BraTS 2018 dataset show that the segmentation effect on the tumor core area is excellent, surpassing the winning method NVDLMED [23]. In addition, CNNs have also been widely used in other medical image analysis tasks. For example, Yurttakal et al.
used a convolutional neural network method for laryngeal histopathological image segmentation, which is of great help for the early detection, monitoring, and treatment of laryngeal cancer and for rapid and accurate tumor segmentation [24].

Our work
Although many experts and scholars have proposed a variety of deep learning network structures that have achieved good results in the field of brain tumor segmentation, the inherent anisotropy of glial brain tumors means that MRI images show a high degree of non-uniformity and irregular shapes [25]. Secondly, deep learning segmentation methods require large-scale annotated data, while brain tumor datasets are generally small and complex, and their inherent high heterogeneity causes intra-class differences between the sub-regions of the tumor area as well as differences between tumor and non-tumor areas [26]; these problems all affect the accuracy of brain tumor segmentation.

In this article, to meet the above challenges, we use a combined model, integrating the "Squeeze and Excite" (SE) module and the "Attention Guide Filter" (AG) module into the VNet model for image segmentation of 3D MRI glioma brain tumors; it is an end-to-end network structure. We input data into the model in the form of volume input and use three-dimensional convolution to process the MRI images. When the image is compressed along the successive encoder blocks, the resolution is halved and the number of channels increases. After the image is convolved, the squeeze-and-excitation module is applied: the importance of each feature channel is obtained automatically through learning, and according to this importance, useful features are promoted and features less useful for the current task are suppressed. Each decoder receives the features of the corresponding downsampling stage and decompresses the image; in the upsampling, the AG module is integrated, where the Attention Block is used to eliminate the influence of noise and irrelevant background, and Guided Image Filtering is used to guide image features and structural information (edge information). It is worth mentioning that the idea of skip connections is used in the model to avoid the vanishing of gradients. Besides, we use the Categorical_Dice loss function as the optimization function of the model, which effectively solves the problem of pixel imbalance.

Methodology
Method summary
Our task is to segment multiple sequences of 3D MRI brain tumor images. In order to obtain good segmentation performance, we propose a new network structure called AGSE-VNet, which integrates the SE (Squeeze-and-Excitation) module [27] and the AG (Attention Guided Filter) module [28] into the network structure, allowing the network to use global information to selectively enhance useful feature channels and suppress useless ones, cleverly resolving the interdependence of feature maps, effectively suppressing the background information of the image, and enhancing the accuracy of model segmentation. In the next section, we introduce the network structure of AGSE-VNet in detail.

We tested the performance of this model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset and compared it with the results of other teams participating in the challenge. The results show that our model has a good segmentation effect and the potential for clinical trials. The innovations of this article are: (1) clever use of channel relationships, using global information to enhance useful information in the channels and suppress useless information; (2) an attention mechanism is added, and the network structure is full of skip connections, so the information extracted by downsampling can be quickly captured to enhance the performance of the model; (3) the Categorical_Dice loss function is used to solve the problem of imbalance between foreground and background voxels.
Squeeze-and-excitation blocks
Figure 2 is a schematic diagram of the SE module, which mainly includes the Squeeze module and the Excitation module. The core of the module is to adaptively recalibrate the channel-wise feature responses by explicitly modeling the interdependence between channels. $F_{tr}$ in the figure is a standard convolution operation, as shown in formula (1): the input is $X \in \mathbb{R}^{Z' \times W' \times H' \times C'}$, where $Z'$ is the depth, $H'$ is the height, $W'$ is the width, and $C'$ is the number of channels; the output is $U \in \mathbb{R}^{Z \times W \times H \times C}$; $v_c$ is a three-dimensional spatial convolution, and $v_c^s$ means that each channel acts on the corresponding channel feature:

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s \qquad (1)$$

Fig. 2 SE network module diagram

$F_{sq}(\cdot)$ is the squeeze operation. As shown in formula (2), the feature $U$ first passes through the Squeeze operation, which compresses the features along the spatial dimensions and aggregates the feature maps into a feature descriptor: each three-dimensional feature channel becomes a real number that responds to the global distribution on that feature channel, so that, to a certain extent, this real number approximates a global receptive field. This operation transforms an input of $Z \times H \times W \times C$ into a $1 \times 1 \times 1 \times C$ output:

$$z_c = F_{sq}(u_c) = \frac{1}{Z \times H \times W} \sum_{i=1}^{Z} \sum_{j=1}^{W} \sum_{k=1}^{H} u_c(i, j, k) \qquad (2)$$

As shown in Eq. (3), in order to limit the complexity and improve the generalization of the model, two fully connected layers are used as a parameterized gating mechanism. In Eq. (3), $W_1 z$ represents a fully connected layer operation; the dimension of $W_1$ is $\frac{C}{m} \times C$, where $m$ is a scaling parameter whose aim is to reduce the number of channels and thus the amount of computation; in this article we set $m = 4$ empirically. The output then passes through a ReLU layer with its dimension unchanged and is multiplied with $W_2$, another fully connected layer operation whose dimension is $C \times \frac{C}{m}$; finally, the Sigmoid function yields the parameter $s$:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z)) \qquad (3)$$

where $\delta$ is the ReLU operation, $W_1 \in \mathbb{R}^{\frac{C}{m} \times C}$, and $W_2 \in \mathbb{R}^{C \times \frac{C}{m}}$. Finally, the resulting $1 \times 1 \times C$ sequence of real numbers recalibrates $U$, and the final output is obtained by formula (4):

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \qquad (4)$$

Among them, $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$, and $F_{scale}(u_c, s_c)$ refers to the channel-wise multiplication between the feature map $u_c$ and the scalar $s_c$.
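As a concrete illustration of formulas (1)-(4), the squeeze-and-excitation recalibration can be written as a small 3D module. The following PyTorch sketch is our own illustrative re-implementation, not the authors' released code; the class name `SEBlock3D` and the exact layer layout are assumptions, and the convolution $F_{tr}$ of formula (1) is assumed to happen outside the block.

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Illustrative 3D squeeze-and-excitation block (formulas (1)-(4)).

    Channel descriptors are obtained by global average pooling over the
    Z x H x W volume (squeeze), passed through a two-layer gating MLP
    (excitation), and used to rescale each channel of the input.
    """

    def __init__(self, channels: int, reduction: int = 4):
        # reduction corresponds to the scaling parameter m = 4 in the paper
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)              # squeeze, formula (2)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1
            nn.ReLU(inplace=True),                       # delta (ReLU)
            nn.Linear(channels // reduction, channels),  # W2
            nn.Sigmoid(),                                # sigma, formula (3)
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _, _ = u.shape
        z = self.pool(u).view(n, c)           # z_c in formula (2)
        s = self.fc(z).view(n, c, 1, 1, 1)    # s in formula (3)
        return u * s                          # s_c * u_c, formula (4)

# quick shape check: (batch, C, Z, H, W) in, same shape out
x = torch.randn(1, 32, 16, 32, 32)
print(SEBlock3D(32)(x).shape)  # torch.Size([1, 32, 16, 32, 32])
```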
Attention guided filter blocks
The Attention Guided Filter (AG) module combines the attention block and guided image filtering. The attention guided filter filters the low-resolution feature maps and the high-resolution feature maps to recover spatial information and merge structural information from feature maps of different resolutions. Figure 3 is a schematic diagram of the Attention Block, where $O$ and $I$ are the inputs of the attention guided filter and $T$ is the attention map obtained by the calculation. The Attention Block is extremely critical in this method: it effectively reduces the influence of the background on the foreground, highlighting the foreground and suppressing the background. For the given feature maps $O$ and $I$, a convolution with a $1 \times 1 \times 1$ kernel performs a linear transformation on each; the two transformed feature maps are combined by element-wise addition followed by a ReLU layer; a further $1 \times 1 \times 1$ convolution applies another linear transformation; and a sigmoid is finally used to activate the final attention feature map $T$.

Fig. 3 Attention Block schematic diagram

Figure 4 is a schematic diagram of the AG module. The inputs are the guidance feature map $I$ and the filtered feature map $O$, and the output is the high-resolution feature map $\hat{O}$, which is the product of the joint action of $I$ and $O$, as shown in formula (5):

$$\hat{O}_i = \sum_{j \in w_i} W_{ij}(I) \, O_j \qquad (5)$$

Fig. 4 Attention Guided Filter Blocks structure diagram

Different from the guided filtering proposed by Kaiming He [29], the attention feature map $T$ is generated from the filtered feature map $O$ through the Attention Block module. First, the guidance feature map $I$ is downsampled to obtain a low-resolution feature map $I_l$ of the same size as the feature map $O$. Then the reconstruction error between $O$ and $I_l$ is minimized to obtain the coefficients $A_l$ and $B_l$ of the attention guided filter. After that, the coefficients $A_h$ and $B_h$ are obtained by upsampling $A_l$ and $B_l$, and finally the high-resolution feature map $\hat{O}$ generated by the attention filter is obtained. The attention filter essentially operates on square windows $w_k$ of radius $r$ around each position $k$. In our study, we set $r = 16$ and $\varepsilon = 0.1$ empirically based on the final segmentation performance. $(a_k, b_k)$ are the constant coefficients of window $w_k$, obtained by minimizing the reconstruction error with ridge regression with a regularization term, as shown in formula (6):

$$\min_{a_k, b_k} E(a_k, b_k) := \sum_{i \in w_k} \left( T_i^2 \, (a_k I_{l,i} + b_k - O_i)^2 + \varepsilon a_k^2 \right) \qquad (6)$$

where $T_i$ is the attention weight at position $i$ and $\varepsilon$ is the regularization parameter. The calculation of $(a_k, b_k)$ is shown in formula (7):

$$a_k = \frac{\frac{1}{|w|} \sum_{i \in w_k} I_{l,i} O_i - \mu_k \bar{O}_k}{\sigma_k^2 + \varepsilon}, \qquad b_k = \bar{O}_k - a_k \mu_k \qquad (7)$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of the pixels of window $w_k$ in the image $I_l$, $|w|$ is the number of pixels in the window, and $\bar{O}_k = \frac{1}{|w|} \sum_{i \in w_k} O_i$ is the mean of the image $O$ to be filtered in the window $w_k$. Since a non-edge pixel is covered by multiple windows, the output at such a pixel averages the values of all windows containing it, as shown in formula (8):

$$O_{l,i} = \frac{1}{|w|} \sum_{k:\, i \in w_k} (a_k I_{l,i} + b_k), \qquad O_l = A_l * I_l + B_l \qquad (8)$$

$A_h$ and $B_h$ are then obtained through upsampling, and the final output is $\hat{O} = A_h * I + B_h$.
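The Attention Block of Fig. 3 can be sketched as follows. This is a hedged re-implementation based only on the description above ($1 \times 1 \times 1$ linear transformations, element-wise addition, ReLU, another $1 \times 1 \times 1$ convolution, and a sigmoid); the class and parameter names (`AttentionBlock3D`, `channels_mid`) are our own choices, and the guided-filter fit of formulas (6)-(8), which consumes the map $T$, is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBlock3D(nn.Module):
    """Illustrative attention block (Fig. 3):
    T = sigmoid(conv(relu(conv(O) + conv(I))))."""

    def __init__(self, channels_o: int, channels_i: int, channels_mid: int):
        super().__init__()
        self.proj_o = nn.Conv3d(channels_o, channels_mid, kernel_size=1)
        self.proj_i = nn.Conv3d(channels_i, channels_mid, kernel_size=1)
        self.psi = nn.Conv3d(channels_mid, 1, kernel_size=1)

    def forward(self, o: torch.Tensor, i: torch.Tensor) -> torch.Tensor:
        # if the guidance map I is on a finer grid than O, resample it first
        if i.shape[2:] != o.shape[2:]:
            i = F.interpolate(i, size=o.shape[2:], mode="trilinear",
                              align_corners=False)
        # element-wise fusion, ReLU, 1x1x1 projection, sigmoid -> attention map T
        t = torch.sigmoid(self.psi(F.relu(self.proj_o(o) + self.proj_i(i))))
        return t  # T is then used to weight the guided-filter fit in (6)
```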
Downsampling
In Fig. 1 we provide a schematic diagram of AGSE-VNet. The network structure is divided into an encoder and a decoder, as shown in Fig. 5: Fig. 5a is the encoder, where the coding area performs the compression path, and Fig. 5b is the decoder, where the decoding area performs decompression. Downsampling is composed of four encoder blocks, each of which includes 2-3 convolution layers, a squeeze-and-excitation layer, and a downsampling layer; the processing of the SE module is shown on the right side of Fig. 5a. Feature extraction is performed by convolution with a stride of 2, as shown in formulas (9) and (10):

$$i' = i + (s - 1)(i - 1) \qquad (9)$$

$$o = \frac{i + 2p - k}{s} + 1 \qquad (10)$$

where $i$ is the input size, $i'$ is the size after filling, $s$ is the stride, $p$ is the padding size, $k$ is the convolution kernel size, and $o$ is the output size.

When the image is compressed along the successive encoder blocks, its resolution is halved and the number of channels is doubled; this is achieved by convolution over $3 \times 3 \times 3$ voxels with a stride of 2. After the convolution operation, the squeeze-and-excitation module is applied, which ingeniously resolves the relationships between the channels and improves the effective information transmission in the channels. It is worth mentioning that all convolutional layers adopt normalization and dropout processing, and the ReLU activation function is applied at various positions in the network structure. Besides, a skip connection method is also used in the model to avoid the vanishing of gradients as the network structure deepens.

Fig. 5 The architecture of encoder block and decoder block with AGSE-VNet
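Formula (10) can be checked numerically. The sketch below assumes the $3 \times 3 \times 3$ kernel with stride 2 mentioned above and a padding of 1; the padding value is our assumption, chosen so that the resolution is exactly halved as the text states.

```python
def conv_output_size(i: int, k: int = 3, s: int = 2, p: int = 1) -> int:
    """Formula (10): o = (i + 2p - k) / s + 1, with floor division
    for the odd remainder, as in standard convolution arithmetic."""
    return (i + 2 * p - k) // s + 1

# one encoder block halves each axis of a 128 x 128 x 64 patch
print(conv_output_size(128))  # 64
print(conv_output_size(64))   # 32
```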
Upsampling
After downsampling, we introduce the AG module to solve the problem of restoring spatial information and fusing structural information from low-resolution feature maps into high-resolution feature maps. The AG module is similar to the SE module in that it enhances the features without changing the dimensions of input and output. Therefore, we replace the splicing module in the VNet model with the AG module and integrate it into the decoder; the structure diagram is shown in Fig. 5. Each decoder block includes an upsampling layer, an AG module, and three convolution layers; the processing flow of the AG module is shown in the box on the right side of Fig. 5b. The decoder decompresses the image. In the upsampling, this article uses deconvolution with a stride of 2 to fill in the image feature information. The deconvolution is shown in formula (11):

$$o = s(i - 1) + 2p - k + 2 \qquad (11)$$

Each decoder block receives the features of the corresponding downsampling stage. The convolution kernel used in the last layer of the network structure keeps the number of output channels consistent with the number of categories. Finally, the channel values are converted into probability outputs through the sigmoid function, and the voxels are converted into the brain tumor gangrene area. The idea of skip connections is adopted in each decoder block. The feature maps after processing by the encoder and the decoder are shown in Fig. 6, where Fig. 6a is a feature map processed by the encoder and Fig. 6b is a feature map processed by the decoder.

Fig. 6 Feature map processed by encoder and decoder

Skip connection
To further compensate for the information lost in the downsampling of the encoder, concatenation is used between the encoder and the decoder of the network to fuse the feature maps of the corresponding positions in the two processes. In particular, the method presented in this article uses the AG (Attention Guided Filter) module instead of concatenation, so that the decoder obtains more high-resolution information during upsampling, the detailed information in the original image can be restored more completely, and the segmentation accuracy can be improved.

We introduce adjacent-layer feature reconstruction and cross-layer feature reconstruction in the network. The cross-layer feature reconstruction module is based on the encoder-decoder structure: as the network deepens, the receptive field of the corresponding feature map becomes larger and larger, but the retained detailed information becomes less and less. Based on the symmetric encoder-decoder structure, the splicing layer concatenates, along the channel dimension, the feature maps extracted by downsampling in the encoder with the new features obtained by upsampling in the decoder; retaining more important feature information is conducive to a better segmentation effect. Adjacent-layer feature reconstruction establishes a branch between each pair of adjacent convolutional layers with feature maps of the same size, that is, the splicing layer convolves the feature maps obtained from the previous layer and the next layer together; matching the channel size achieves the purpose of maximizing the use of the feature information of all previous layers.

Loss function
At present, the segmentation of medical images faces the problem of imbalance between the foreground and background regions, and we face the same challenge in our task. Therefore, we choose the Categorical_Dice loss function as the optimization function of our model and address this problem by adjusting the weight of each prediction category: we set the weight of gangrene, edema, and enhancing tumor to 1 and the weight of the background to 0.1. The Categorical_Dice loss function is shown in formula (12):

$$\text{Dice}(P, G) = \frac{2|P \cap G|}{|P| + |G|} \qquad (12)$$

Among them, $G$ is the mask, i.e. the one-hot-encoded ground truth, $G \in [\text{None}, 64, 128, 128, 4]$, and $P$ represents the predicted value, i.e. the probability obtained after the softmax calculation, $P \in [\text{None}, 64, 128, 128, 4]$. The partial differentiation in formula (13) gives the gradient with respect to the predicted $j$-th voxel, where $N$ is the number of voxels, $p_i \in P$ and $g_i \in G$:

$$\frac{\partial D}{\partial p_j} = 2 \cdot \frac{g_j \left( \sum_i^N p_i^2 + \sum_i^N g_i^2 \right) - 2 p_j \left( \sum_i^N p_i g_i \right)}{\left( \sum_i^N p_i^2 + \sum_i^N g_i^2 \right)^2} \qquad (13)$$

The weight distribution of the loss function over the classes is shown in formula (14), with weight values [0.1, 1.0, 1.0, 1.0]:

$$\text{Loss} = -\text{Dice}(P, G) \times \text{weight} \qquad (14)$$
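A minimal sketch of the Categorical_Dice loss of formulas (12) and (14) is shown below, assuming the channel-last tensors of shape (N, 64, 128, 128, 4) stated above; the per-class reduction (summing the weighted Dice terms) and the smoothing constant are our assumptions, not details given in the paper.

```python
import torch

def categorical_dice_loss(p: torch.Tensor, g: torch.Tensor,
                          weight=(0.1, 1.0, 1.0, 1.0)) -> torch.Tensor:
    """Illustrative Categorical_Dice loss, formulas (12) and (14).

    p: softmax probabilities, shape (N, 64, 128, 128, 4)
    g: one-hot ground truth,  shape (N, 64, 128, 128, 4)
    Class order (background, gangrene, edema, enhancing tumor) matches
    the stated weights [0.1, 1.0, 1.0, 1.0].
    """
    w = torch.tensor(weight, dtype=p.dtype, device=p.device)
    dims = (0, 1, 2, 3)                        # sum over batch and voxels
    intersect = (p * g).sum(dim=dims)          # |P ∩ G| per class
    denom = p.sum(dim=dims) + g.sum(dim=dims)  # |P| + |G| per class
    dice = 2.0 * intersect / denom.clamp(min=1e-6)
    return -(dice * w).sum()                   # Loss = -Dice x weight, (14)

# usage: p = torch.softmax(logits, dim=-1); loss = categorical_dice_loss(p, g)
```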
Materials
Dataset
In this research, we use the dataset of the BraTS 2020 challenge to train and test our model [30, 31]. The dataset contains two types of cases, namely low-grade glioma (LGG) and glioblastoma (HGG); each case has four modal images: T1-weighted (T1), T1-weighted contrast-enhanced (T1-CE), T2-weighted (T2), and fluid attenuation inversion recovery (FLAIR). The mask of the brain tumor includes the gangrene area, the edema area, and the enhancement area. Our task is to segment the three nested sub-regions formed by these labels: the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC). There are 369 cases in the training set and 125 cases in the validation set. The masks corresponding to the validation cases are not used for training; their function is mainly to evaluate the model after training.

Design detail
In deep learning, the setting of hyperparameters is essential and determines the performance of the model, but the initial values are often set by experience. In the training of the AGSE-VNet model, the initial learning rate is set to 0.0001, the dropout is set to 0.5, and the number of training steps is about 350,000, after which the learning rate is adjusted to 0.00003. The learning rate is halved every time the dataset is traversed, and the data are shuffled to enhance the robustness and generalization ability of the model. The experiments were conducted on TensorFlow 1.13.1, and the runtime platform was an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz with 32 GB RAM and an Nvidia GeForce RTX 2080, under 64-bit Windows 10. The development platform is PyCharm, and the Python version is 3.6.9.

Pre-processing
Since our dataset has four modalities (T1, T1-CE, T2, and FLAIR), there is a problem of different contrasts, which may cause the gradient to disappear during the training process, so we standardize the images: the image data are normalized by subtracting the mean and dividing by the standard deviation, calculated as follows:

$$X' = \frac{X - \mu}{\sigma} \qquad (15)$$

where $\mu$ denotes the mean of the image, $\sigma$ denotes the standard deviation, $X$ denotes the image matrix, and $X'$ is the normalized image matrix.

After normalization, we merge the images of the four modalities with the same contrast to form a three-dimensional image with four channels; the label volume has a single channel whose pixels contain four different values, where 0 is the normal tissue area, 1 is the gangrene area, 2 is the edema area, and 3 is the enhanced tumor area. Then the image and mask are divided into multiple blocks with the patch operation; each case generates 175 images with a size of 128 × 128 × 64. Finally, they are saved in NumPy .npy format (https://numpy.org/doc/stable/reference/). The preprocessed image is shown in Fig. 7.

Fig. 7 Preprocessed result
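The normalization of formula (15) and the four-channel merging can be sketched in NumPy as follows; the small epsilon and the channel-last stacking order are our assumptions, and the extraction of 128 × 128 × 64 patches is omitted.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Z-score normalization, formula (15): (X - mu) / sigma."""
    return (x - x.mean()) / (x.std() + 1e-8)

def preprocess_case(t1: np.ndarray, t1ce: np.ndarray,
                    t2: np.ndarray, flair: np.ndarray) -> np.ndarray:
    """Normalize each modality and stack into one four-channel volume."""
    channels = [normalize(m) for m in (t1, t1ce, t2, flair)]
    return np.stack(channels, axis=-1)  # shape (Z, H, W, 4)

# usage, assuming the four modality volumes are already loaded:
# volume = preprocess_case(t1, t1ce, t2, flair)
# np.save("case_0001.npy", volume)     # saved in NumPy .npy format
```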
Evaluation metrics
We use the Dice coefficient, specificity, sensitivity, and Hausdorff95 distance to measure the performance of our model. The Dice coefficient is calculated as:

$$\text{Dice} = \frac{2TP}{FN + FP + 2TP} \qquad (16)$$

where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively. Specificity evaluates the numbers of true negatives and false positives and is used to measure the model's ability to predict the background area; it is defined as:

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (17)$$

where $TN$ is the number of true negatives. Sensitivity evaluates the numbers of true positives and false negatives and is used to measure the sensitivity of the model to the segmented regions; it is defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (18)$$

The Hausdorff95 distance measures the distance between the surface of the real area and that of the predicted area and is more sensitive to the segmented boundary; it is defined as:

$$\text{Haus95}(T, P) = \max \left\{ \sup_{t \in T} \inf_{p \in P} d(t, p), \; \sup_{p \in P} \inf_{t \in T} d(t, p) \right\} \qquad (19)$$

where $\inf$ denotes the infimum, $\sup$ denotes the supremum, and $t$ and $p$ denote points on the surface $T$ of the ground-truth area and the surface $P$ of the predicted area, respectively; $d(\cdot, \cdot)$ calculates the distance between the points $t$ and $p$.
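The voxel-count metrics of formulas (16)-(18) reduce to confusion-matrix counts over binary masks of each sub-region; a minimal NumPy sketch is shown below. The Hausdorff95 distance of formula (19) is omitted here, as it is usually computed with dedicated surface-distance tooling.

```python
import numpy as np

def confusion_counts(pred: np.ndarray, truth: np.ndarray):
    """TP/FP/TN/FN for binary masks of one tumor sub-region (ET, WT, or TC)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, tn, fn

def dice(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return 2 * tp / (fn + fp + 2 * tp)  # formula (16)

def specificity(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return tn / (tn + fp)               # formula (17)

def sensitivity(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return tp / (tp + fn)               # formula (18)
```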
Figures 8 and 9 are the scatter plots the scatter diagram on the left side, it can be seen that and box plots of the four evaluation indicators of the the data are all clustered at a higher position, indicating training set and test set, reflecting the distribution char - that our model is the background area has a high level of acteristics of the results. It can be seen from the box plot prediction, which can effectively alleviate the problem of that there are fewer outliers of various indicators and imbalance between foreground pixels and background. minimal fluctuation of results. The horizontal line in the We randomly selected several slices from the training box plot represents the median of this set of data. It can set and compared the actual situation with the results be observed that the three indicators of Dice, Sensitiv- predicted by our model, as shown in Fig.  10a, the first ity, and Specificity are at a higher level, which shows that line is the original image, the second line is the label, and the segmentation effect of our proposed model is located the third line is the tumor sub-region predicted by our in a higher area. In the results of the four indicators, the model. At the same time, we also selected two of them sensitivity results are all concentrated at a higher level. It to display in Fig. 10b. Among them, the green area is the Fig. 8 A collection of scatter plots and box plots of four indicators in the training set Guan et al. BMC Medical Imaging (2022) 22:6 Page 12 of 18 Fig. 9 A collection of scatter plots and box plots of four indicators in the validation set whole tumor (WT), the red area is the tumor core (TC), display, as shown in Fig.  11a. Similarly, in Fig.  11b, we and the area combining yellow and red represents the also show the three-dimensional image of the segmen- enhancing tumor(ET). We show the 3D image of the seg- tation result and annotate the accuracy value of the ET mentation result in the last two columns. From the com- region, as can be seen from the figure, our model has parison of the segmentation results, we can find that our a good segmentation effect for MRI images of differ - model has a good effect on brain tumor segmentation, ent intensities, and can accurately segment tumor sub- especially the whole tumor (WT) region segmentation regions, which has a certain potential in brain tumor effect is excellent. However, the segmentation prediction image segmentation. of the tumor core (TC) is slightly biased, which may not In our research, we proposed the AGSE-VNet model be suitable for extraction due to the small features of the to segment 3D MRI brain tumor images and obtained tumor core. better segmentation results on the BraTS 2020 dataset. After the training is completed, we randomly selected In order to further verify the effect of our segmentation, several segmentation slices in the validation set for compare our experimental method with the methods Guan  et al. BMC Medical Imaging (2022) 22:6 Page 13 of 18 Fig. 10 Display of segmentation results in the training set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering proposed by other outstanding teams participating in the our model performs well in the whole tumor (WT) competition. The results of other teams are available on region and obtains relatively excellent results, indicating the official website of the BraTS Multimodal Brain Tumor that the method we proposed has a certain potential in Segmentation Challenge 2020 (https:// www. cbica. upenn. 
segmentation. edu/ BraTS 20/ lboar dTrai ning. html). The comparison results of the training set are shown in Table  2, and the Discussion comparison results of the verification set are shown in The method proposed in this paper cleverly solves the Table  3. From the results in the table, we can find that problem of interdependence between channels, and Guan et al. BMC Medical Imaging (2022) 22:6 Page 14 of 18 Fig. 11 Display of segmentation results in the validation set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering autonomously extracts effective features from channels and segmented, and the segmentation effect obtained to suppress useless feature channels. After the features has a good performance. This is beneficial to radiologists extracted by the encoder, low-resolution feature maps and oncologists, who can quickly predict the condition and high-resolution feature maps are filtered through of the tumor and assist in the treatment of the patient. the Attention module, to recover spatial information and Comparing the results in Tables 2 and 3, we find that our fusion structural information from feature maps of dif- model performs well in the whole tumor (WT) area, but ferent resolutions, our method is not affected by the size does not perform well in the enhancing tumor (ET) and and location of the tumor. For MRI images of different the tumor core (TC) areas, this may be because the tar- intensities, the tumor area can be automatically identi- get in the ET area is small and the feature is fuzzy and fied, and the tumor sub-regions can be feature extracted difficult to extract. At the same time, we compare our Guan  et al. BMC Medical Imaging (2022) 22:6 Page 15 of 18 Table 2 The results of various indicators in the training set Team Dice Sensitivity Specificity Hausdorff95 ET WT TC ET WT TC ET WT TC ET WT TC Proposed 0.70 0.85 0.77 0.72 0.83 0.74 0.99 0.99 0.99 35.70 8.96 17.40 mpstanford 0.60 0.78 0.72 0.56 0.80 0.75 0.99 0.99 0.99 35.95 17.68 17.21 agussa 0.67 0.87 0.79 0.69 0.87 0.82 0.99 0.99 0.99 39.25 15.75 17.05 ovgu_seg 0.65 0.81 0.75 0.72 0.78 0.76 0.99 0.99 0.99 34.79 9.50 8.93 AI-Strollers 0.59 0.73 0.61 0.52 0.73 0.64 0.99 0.97 0.98 38.87 20.81 24.22 uran 0.48 0.79 0.64 0.45 0.74 0.61 0.99 0.99 0.99 37.92 7.72 14.07 CBICA 0.54 0.78 0.57 0.64 0.82 0.53 0.99 0.99 0.99 20.00 46.30 39.60 unet3d-sz 0.69 0.81 0.75 0.77 0.93 0.83 0.99 0.96 0.98 37.71 19.57 18.36 iris 0.76 0.88 0.81 0.78 0.90 0.83 0.99 0.99 0.99 32.30 18.07 14.70 VuongHN 0.74 0.81 0.82 0.84 0.98 0.84 0.95 0.93 0.99 21.97 12.32 8.72 Table 3 The results of various indicators in the validation set Team Dice Sensitivity Specificity Hausdorff95 ET WT TC ET WT TC ET WT TC ET WT TC Proposed 0.68 0.85 0.69 0.68 0.83 0.65 0.99 0.99 0.99 47.40 8.44 31.60 mpstanford 0.49 0.72 0.62 0.49 0.81 0.69 0.99 0.99 0.99 61.89 26.00 28.02 agussa 0.59 0.83 0.69 0.60 0.87 0.71 0.99 0.99 .0.99 56.58 23.23 29.59 ovgu_seg 0.60 0.79 0.68 0.66 0.79 0.67 0.99 0.99 0.99 54.07 12.05 19.10 AI-Strollers 0.58 0.74 0.61 0.52 0.77 0.62 0.99 0.99 0.99 47.23 24.03 31.54 uran 0.75 0.88 0.76 0.77 0.85 0.71 0.99 0.99 0.99 36.42 6.62 19.30 CBICA 0.63 0.82 0.67 0.76 0.78 0.75 0.99 0.99 0.99 9.60 10.70 28.20 unet3d-sz 0.70 0.84 0.72 0.71 0.87 0.79 0.99 0.99 0.99 42.09 10.48 12.32 iris 0.68 0.86 0.73 0.67 0.90 0.70 0.99 0.99 0.99 44.13 23.87 20.02 VuongHN 0.79 0.90 0.83 0.80 0.89 0.80 0.99 0.99 0.99 21.43 6.74 7.05 Table 4 Comparison of our proposed AGSE-VNet model with the method proposed by Zhao et al., a new 
method with some classic algorithms for brain tumor segmentation; the results are shown in Table 4.

Table 4 Comparison of our proposed AGSE-VNet model with classic methods

Method         | Dice_ET | Dice_WT | Dice_TC | Dataset
Proposed       | 0.67    | 0.85    | 0.69    | BraTS 2020
Zhou et al.    | 0.65    | 0.87    | 0.75    | BraTS 2018
Zhao et al.    | 0.62    | 0.84    | 0.73    | BraTS 2016
Pereira et al. | 0.65    | 0.78    | 0.75    | BraTS 2015

In the 2018 BraTS Challenge, Zhou et al. [32] proposed a lightweight one-step multi-task segmentation model; by learning the shared parameters of joint features and the composition features of distinguishing task-specific parameters, the imbalance factors among tumor types are effectively alleviated, uncertain information is suppressed, and the segmentation result is improved. In the method proposed by Zhao et al., a new segmentation framework was developed that uses a fully convolutional neural network to assign labels to the image pixel by pixel and optimizes the output of the FCNNs with a recurrent neural network constructed from conditional random fields; this method was verified on the BraTS 2016 dataset and achieved a good segmentation effect. Pereira et al. proposed an automatic positioning method using convolutional neural networks, which achieved good results on the BraTS 2015 dataset. Analyzing Table 4, we find that our model has certain advantages in segmentation, though there are still differences in TC region accuracy and the model has limitations. In future work we will propose solutions to this situation, such as how to further segment the region of interest after our model has extracted it, in order to improve the accuracy of the enhancing tumor (ET) and tumor core (TC) areas so that more characteristic information can be captured. Besides, the algorithms proposed in many top methods each have areas of excellent performance; how to combine the advantages of these algorithms and integrate them into our model is the focus of our future work. In clinical treatment, this would help experts understand the patient's current situation more quickly and accurately, saving experts' time and realizing a leap in the quality of automatic medical segmentation.

In addition, in order to verify the robustness of our model against noise interference, we added Gaussian noise in the frequency domain (k-space) of the testing data to simulate realistic noise contamination. The comparison results are shown in Fig. 12. From the noisy and noise-free segmentation results, we found that the segmentation results of our AGSE-VNet model for the three regions are not much different. These results demonstrate that our model has a significant advantage in generalization when noise is present.

Fig. 12 Comparison of segmentation results without noise and noise added
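For reference, the k-space noise contamination described above can be simulated with a few lines of NumPy; the complex-Gaussian noise model and the `sigma` parameter are our assumptions, not the exact protocol used for Fig. 12.

```python
import numpy as np

def add_kspace_noise(volume: np.ndarray, sigma: float) -> np.ndarray:
    """Add Gaussian noise in the frequency domain (k-space).

    The volume is Fourier-transformed, perturbed with complex Gaussian
    noise of standard deviation sigma, and transformed back; the
    magnitude is returned as the noise-contaminated image.
    """
    kspace = np.fft.fftn(volume)
    noise = (np.random.normal(0.0, sigma, volume.shape)
             + 1j * np.random.normal(0.0, sigma, volume.shape))
    return np.abs(np.fft.ifftn(kspace + noise))
```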
tion of the enhancing tumor (ET) and the tumor core (TC) regions. It may be because the features of these Conclusion two regions are small and difficult to extract. How to All in all, we have implemented a good method to improve the accuracy of these two regions is our future segment 3D MRI brain tumor images, this method work direction. can automatically segment the three regions of the The automatic segmentation of brain tumors in the enhancing tumor (ET), the whole tumor (WT), and medical field has been a long-term research problem. the tumor core (TC) of the brain tumors. We con- How to design an automatic segmentation algorithm ducted experiments on the BraTS 2020 data set and with short time and high accuracy, and then form a com- got good results. The AGSE-VNet model is improved plete system is the current direction of a large number of based on VNet. There are five encoder blocks and four researchers. Therefore, we must continue to optimize our decoder blocks. Each encoder block has an extrusion segmentation model to achieve a qualitative leap in the and excitation block, and each decoder has an Atten- field of automatic segmentation. tion Guild Filter block. Such a design can be embed- ded in our model without affecting the size mismatch of the network structure under the condition that the Fig. 12 Comparison of segmentation results without noise and noise added Guan  et al. BMC Medical Imaging (2022) 22:6 Page 17 of 18 Authors’ contributions for brain tumor segmentation using multi-parametric MRI. NeuroImage Clin. XG, GY, and XL conceived and designed the study, contributed to data 2016;12(2):753–64. analysis, contributed to data interpretation, and contributed to the writing of 8. Milletari F, Navab N, Ahmadi S. V-Net: fully convolutional neural networks the report. XG, GY, JY, WY, XX, WJ, and XL contributed to the literature search. for volumetric medical image segmentation. In: 2016 fourth international JY and WY contributed to data collection. XG, GY, XX, WJ, and XL performed conference on 3D vision (3DV ). IEEE. 2016. data curation and contributed to the tables and figures. All authors read and 9. Rickmann A, Roy A, Sarasua I, Navab N, Wachinger C. `Project & Excite’ mod- approved the final manuscript. ules for segmentation of volumetric medical scans. Image Video Processing. Funding 10. Tustison N, Shrinidhi K, Wintermark M, Durst CR, Kandel BM, Gee JC, This work is funded in part by the National Natural Science Foundation Grossman MC, Avants BB. Optimal symmetric multimodal templates and of China (Grants No. 62072413, 61602419), in part by the Natural Science concatenated random forests for supervised brain tumor segmentation Foundation of Zhejiang Province of China (Grant No. LY16F010008), in part (simplified) with ANTsR. Neuroinformatics. 2015;13(2):209–25. by Medical and Health Science and Technology Plan of Zhejiang Province of 11. Rose S, Crozier S, Bourgeat P, Dowson N, Salvado O, Raniga P, Pannek K, Coul- China (Grant No. 2019RC224), in part by the Teacher Professional Develop- thard A, Fay M, Thomas P. Improved delineation of brain tumour margins ment Project of Domestic Visiting Scholar in Colleges and Universities of using whole-brain track-density mapping. In: Ismrm-esmrmb joint meeting: Zhejiang Province of China (Grants No.2020-19, 2020-20), in part by the UK clinical needs & technological solutions. International Society of Magnetic Research and Innovation Future Leaders Fellowship (MR/V023799/1), and Resonance in Medicine. 2009. 
also supported in part by the AI for Health Imaging Award 'CHAIMELEON: Accelerating the Lab to Market Transition of AI Tools for Cancer Management' [H2020-SC1-FA-DTS-2019-1 952172].

Availability of data and materials
The datasets analysed during the current study are available from BraTS 2020: https://www.med.upenn.edu/cbica/brats2020/data.html.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
All authors contributed to the article and approved the submitted version.

Competing interests
The authors declare no competing interests.

Author details
School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou 310053, China. Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK. National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK. First Affiliated Hospital, Gannan Medical University, Ganzhou 341000, China. College of Life Science, Zhejiang Chinese Medical University, Hangzhou 310053, China.

Received: 16 January 2021   Accepted: 26 July 2021

References
1. Nie J, Xue Z, Liu T, Young GS, Setayesh K, Lei G. Automated brain tumor segmentation using spatial accuracy-weighted hidden Markov random field. Comput Med Imaging Graph. 2009;33(6):431–41.
2.
Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629. 2018.
3. Essadike A, Ouabida E, Bouzid A. Brain tumor segmentation with Vander Lugt correlator based active contour. Comput Methods Programs Biomed. 2018;60:103–17.
4. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18–31.
5. Akkus Z, Galimzianova A, Hoogi A, Daniel R. Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imaging. 2017;30(4):449–59.
6. Hussain S, Anwar S, Majid M. Segmentation of glioma tumors in brain using deep convolutional neural network. Neurocomputing. 2017;282.
7. Sauwen N, Acou M, Cauter S, Sima DM, Veraart J, Maes F, Himmelreich U, Achten E, Van Huffel S. Comparison of unsupervised classification methods for brain tumor segmentation using multi-parametric MRI. NeuroImage Clin. 2016;12(2):753–64.
8. Milletari F, Navab N, Ahmadi S. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). IEEE; 2016.
9. Rickmann A, Roy A, Sarasua I, Navab N, Wachinger C. 'Project & Excite' modules for segmentation of volumetric medical scans. Image Video Processing.
10. Tustison N, Shrinidhi K, Wintermark M, Durst CR, Kandel BM, Gee JC, Grossman MC, Avants BB. Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics. 2015;13(2):209–25.
11. Rose S, Crozier S, Bourgeat P, Dowson N, Salvado O, Raniga P, Pannek K, Coulthard A, Fay M, Thomas P. Improved delineation of brain tumour margins using whole-brain track-density mapping. In: ISMRM-ESMRMB joint meeting: clinical needs & technological solutions. International Society for Magnetic Resonance in Medicine; 2009.
12. Amiri S, Mahjoub MA, Rekik I. Bayesian network and structured random forest cooperative deep learning for automatic multi-label brain tumor segmentation. In: 10th international conference on agents and artificial intelligence; 2018.
13. Balafar M. Fuzzy c-mean based brain MRI segmentation algorithms. Artif Intell Rev. 2014;41(3):441–9.
14. Pereira S, Pinto A, Alves V. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35(5):1240–51.
15. Hao D, Yang G, Liu F, Mo Y, Guo Y. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In: Annual conference on medical image understanding and analysis. Springer, Cham.
16. Wang G, Li W, Ourselin S, Vercauteren T. Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. Comput Vis Pattern Recognit. 2017;12(5).
17. Myronenko A. 3D MRI brain tumor segmentation using autoencoder regularization. Berlin: Springer; 2018.
18. Xue F, Nicholas T, Meyer C. Brain tumor segmentation using an ensemble of 3D U-nets and overall survival prediction using radiomic features. Comput Vis Pattern Recognit. 2018;279–88.
19. Nabil Ibtehaz M, Rahman S. MultiResUNet: rethinking the U-net architecture for multimodal biomedical image segmentation. Comput Vis Pattern Recognit. 2019;121.
20. Xu C, Xu L, Ohorodnyk P, Roth M, Li M. Contrast agent-free synthesis and segmentation of ischemic heart disease images using progressive sequential causal GANs. Med Image Anal. 2020;101668.
21. Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X. ERV-Net: an efficient 3D residual neural network for brain tumor segmentation. Expert Syst Appl. 2021;170.
22. Saman S, Narayanan S. Active contour model driven by optimized energy functionals for MR brain tumor segmentation with intensity inhomogeneity correction. Multimedia Tools Appl. 2021;80(4):21925–54.
23. Liu H, Li Q, Wang L. A deep-learning model with learnable group convolution and deep supervision for brain tumor segmentation. Math Probl Eng. 2021;3:1–11.
24. Yurttakal A, Erbay H. Segmentation of larynx histopathology images via convolutional neural networks. In: Intelligent and fuzzy techniques: smart and innovative solutions. 2021. p. 949–54.
25. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal. 2018;43:98–111.
26. Sturm D, Pfister S, Dtw J. Pediatric gliomas: current concepts on diagnosis, biology, and clinical management. J Clin Oncol. 2017;35(21):2370.
27. Hu J, Li S, Albanie S, Sun G. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell. 2017;99.
28. Zhang S, Fu H, Yan Y, Zhang Y, Wu Q, Tan M, Xu Y. Attention guided network for retinal image segmentation. In: Medical image computing and computer assisted intervention—MICCAI 2019. 2019.
29. He K, Sun J, Tang X. Guided image filtering. Lect Notes Comput Sci. 2013;35(6):1397–409.
30. Menze B, Jakab A, Bauer S, Jayashree KC, Keyvan F, Justin K. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024.
31. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nat Sci Data. 2017;4:170117. https://doi.org/10.1038/sdata.2017.117.
32. Zhou C, Ding C, Wang X, Lu Z, Tao D. One-pass multi-task networks with cross-task guided attention for brain tumor segmentation. Computer Vision and Pattern Recognition. 2019.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

3D AGSE-VNet: an automatic brain tumor MRI data segmentation framework

3D AGSE-VNet: an automatic brain tumor MRI data segmentation framework

Background: Glioma is the most common brain malignant tumor, with a high morbidity rate and a mortality rate of more than three percent, which seriously endangers human health. The main method of acquiring brain tumors in the clinic is MRI. Segmentation of brain tumor regions from multi-modal MRI scan images is helpful for treatment inspection, post-diagnosis monitoring, and effect evaluation of patients. However, the common operation in clinical brain tumor segmentation is still manual segmentation, lead to its time-consuming and large performance difference between different operators, a consistent and accurate automatic segmentation method is urgently needed. With the continuous development of deep learning, researchers have designed many automatic segmentation algorithms; however, there are still some problems: (1) The research of segmentation algorithm mostly stays on the 2D plane, this will reduce the accuracy of 3D image feature extraction to a certain extent. (2) MRI images have gray-scale offset fields that make it difficult to divide the contours accurately. Methods: To meet the above challenges, we propose an automatic brain tumor MRI data segmentation framework which is called AGSE-VNet. In our study, the Squeeze and Excite (SE) module is added to each encoder, the Atten- tion Guide Filter (AG) module is added to each decoder, using the channel relationship to automatically enhance the useful information in the channel to suppress the useless information, and use the attention mechanism to guide the edge information and remove the influence of irrelevant information such as noise. Results: We used the BraTS2020 challenge online verification tool to evaluate our approach. The focus of verification is that the Dice scores of the whole tumor, tumor core and enhanced tumor are 0.68, 0.85 and 0.70, respectively. Conclusion: Although MRI images have different intensities, AGSE-VNet is not affected by the size of the tumor, and can more accurately extract the features of the three regions, it has achieved impressive results and made outstand- ing contributions to the clinical diagnosis and treatment of brain tumor patients. Keywords: Brain tumor, Magnetic resonance imaging, VNet, Automatic segmentation, Deep learning gliomas can be divided into four grades according to dif- Introduction ferent symptoms, of which I and II are low-grade gliomas Glioma is one of the common types of primary brain (LGG), III and IV are high-grade gliomas (HGG) [2]. Due tumors, accounting for about 50% of intracranial tumors to the high mortality rate of glioma, it can appear in any [1]. According to the WHO classification criteria, part of the brain and people of any age, with various his- tological subregions and varying degrees of invasiveness *Correspondence: g.yang@imperial.ac.uk; dmia_lab@zcmu.edu.cn [3]. Therefore, it has attracted widespread attention in Xi Guan and Guang Yang have contributed equally to this work School of Medical Technology and Information Engineering, Zhejiang the medical field. Because glioblastoma (GBM) cells are Chinese Medical University, Hangzhou 310053, China immersed in the healthy brain parenchyma and infiltrate Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 the surrounding tissues, they can grow and spread rap- 6NP, UK Full list of author information is available at the end of the article idly near the protein fibers, and the deterioration process © The Author(s) 2021. 
is very rapid. Therefore, early diagnosis and treatment are essential.

At present, the methods of acquiring brain tumor images in clinical practice are mainly computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) [4]. Among them, MRI has become the preferred medical imaging method for brain diagnosis and treatment planning because it provides images with high soft-tissue contrast and high spatial resolution [5]; it therefore represents the anatomical structure of cranial nerve soft tissue and the lesion well. At the same time, MRI can obtain multiple sequences of brain tumor information in different spaces through one scan: T1-weighted (T1), T1-weighted contrast-enhanced (T1-CE), T2-weighted (T2), and fluid attenuation inversion recovery (FLAIR) [6, 7]. However, manually segmenting tumors from MRI images requires professional prior knowledge, is time-consuming and labor-intensive, is prone to errors, and depends heavily on the doctor's experience. Therefore, the development of an accurate, reliable, and fully automatic brain tumor segmentation algorithm has strong clinical significance.

With the development of computer vision and pattern recognition, convolutional neural networks have been applied to many challenging tasks, and classification, segmentation, and target detection capabilities have been greatly improved. Deep learning also shows great potential in medical image processing, and so far plenty of research studies on medical image segmentation have been carried out in both academia and industry. VNet [8] has good segmentation performance on single-modal images, but it still has shortcomings for multi-modal segmentation. In this article, inspired by the integration of the "Project and Excite" (PE) module into 3D U-Net proposed by Anne-Marie et al. [9], we propose an automatic brain tumor MRI data segmentation framework called 3D AGSE-VNet. The network structure is shown in Fig. 1. The main contributions of this paper are: (1) a combined segmentation model based on VNet that integrates the SE module and the AG module; (2) volumetric input, with three-dimensional convolutions used to process the MRI images; (3) excellent segmentation results with potential for clinical application.

Fig. 1 The overall architecture of the proposed 3D AGSE-VNet

Related works

Traditional machine learning

Replacing tedious manual or semi-automatic segmentation with fully automatic segmentation methods has long been a goal pursued by experts and scholars in clinical medicine, and it has been a focus and key technology of medical image research in recent years.
Traditional image-processing algorithms for brain tumor segmentation use threshold-based, region-based, and boundary-based methods. Threshold-based segmentation is one of the simplest and most traditional methods in image processing. Tustison et al. proposed a two-stage segmentation framework based on Random Forest-derived probabilities, using the output of the first classifier to improve the segmentation result of the second classifier [10]. Stadlbauer et al. [11] proposed using the normal distribution of the data to obtain the threshold: according to the intensity change of each region, an adaptive threshold segmentation method separates the foreground from the background. However, this method is highly limited, and segmentation fails when multiple tissue structures overlap. Amiri et al. [12] proposed a multi-layer structure in which structured Random Forests and Bayesian networks are embedded to learn tumor features better, but inputting a large number of features can easily lead to the curse of dimensionality and waste plenty of time. The method in [13] uses a seed region growing algorithm to process brain MRI images according to a threshold T and generated PD images, and then applies a Markov logic algorithm to further improve segmentation performance.
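To make the threshold-based family concrete, here is a minimal NumPy sketch (the toy volume and the threshold value are illustrative stand-ins, not taken from any of the cited methods):

```python
import numpy as np

def threshold_segment(image: np.ndarray, t: float) -> np.ndarray:
    """Binary foreground mask: 1 where intensity exceeds the threshold t."""
    return (image > t).astype(np.uint8)

# Toy volume: a bright "lesion" embedded in a noisy dark background.
rng = np.random.default_rng(0)
img = 0.05 * rng.standard_normal((64, 64, 64))
img[20:30, 20:30, 20:30] += 1.0

mask = threshold_segment(img, t=0.5)
print(mask.sum())  # roughly the 1000 lesion voxels
```

Adaptive variants such as [11] recompute `t` per region from intensity statistics rather than fixing it globally, which is exactly where the overlap failure described above arises.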
Deep learning

In recent years, convolutional neural networks have become the most popular method in image classification, and they are widely used in medical image analysis. Sérgio Pereira et al. [14] proposed an automatic positioning method based on a Convolutional Neural Network (CNN) that explores 3 × 3 kernels, using small kernels to design a deeper architecture and intensity normalization as a preprocessing step, trained and validated on the BraTS 2015 dataset. In the article by Hao et al., a fully automatic brain tumor segmentation method based on the U-Net deep convolutional network was proposed and evaluated on the BraTS 2015 dataset; cross-validation shows that it can effectively obtain promising segmentations [15]. Wang et al. proposed a cascade network: the first step segments the whole tumor, and the second step segments the tumor core using the obtained bounding box and then segments the enhancing tumor core according to the bounding box of the tumor core segmentation result, using anisotropic and dilated convolutions combined with multi-view fusion to reduce false positives [16]. Andriy Myronenko proposed a semantic segmentation network for 3D MRI tumor subregions based on an encoder-decoder structure, which uses an autoencoder branch to reconstruct images, and won first place in the 2018 BraTS Challenge [17]. Feng Xue proposed an ensemble 3D U-Net brain tumor segmentation method in which six encoder-decoder networks with different patch sizes and loss weights are trained, improving various performances [18]. In 2019, Nabil Ibtehaz et al. developed MultiResUNet, a novel U-Net-based architecture that extends residual connections with a proposed residual path (respath) and shows good segmentation performance on the ISIC and BraTS datasets [19]. Xu et al. proposed progressive sequential causality to synthesize high-quality LGE-equivalent images and accurately segment all tissues related to diagnosis, obtaining highly accurate diagnostic indicators in a real clinical environment [20]. Zhou et al. proposed ERV-Net, an effective 3D residual neural network for brain tumor segmentation that uses the computationally efficient 3D ShuffleNetV2 as an encoder and introduces a decoder with residual blocks for high-efficiency segmentation [21]. Saman et al. proposed an active contour model driven by an optimized energy function for MR brain tumor segmentation with intensity inhomogeneity correction, together with a method to identify and segment brain tumor slices in MRI images [22]. Liu et al. studied a deep learning model based on learnable group convolution and deep supervision, replacing the convolutions in the feature extraction stage with learnable group convolutions; tests on the BraTS 2018 dataset show that the segmentation of the tumor core area is excellent, surpassing the winning method NVDLMED [23]. In addition, CNNs have been widely used in other medical image analysis tasks; for example, Yurttakal et al. used a convolutional neural network for laryngeal histopathology image segmentation, which greatly helps the early detection, monitoring, and treatment of laryngeal cancer through rapid and accurate tumor segmentation [24].

Our work

Many experts and scholars have proposed a variety of deep learning network structures and achieved good results in the field of brain tumor segmentation. However, because brain gliomas are inherently anisotropic, MRI images show a high degree of non-uniformity and irregular shapes [25]. Secondly, deep learning segmentation methods require large-scale annotated data, while brain tumor datasets are generally small and complex; their inherent high heterogeneity causes intra-class differences between the sub-regions of the tumor area and inter-class differences between tumor and non-tumor areas [26]. These problems all affect the accuracy of brain tumor segmentation.

In this article, to meet the above challenges, we use a combined model that integrates the "Squeeze and Excite" (SE) module and the "Attention Guide Filter" (AG) module into the VNet model for segmentation of 3D MRI glioma brain tumors; it is an end-to-end network structure. We feed data into the model as volumetric input and use three-dimensional convolutions to process the MRI images. As the image is compressed along the successive encoder blocks, the resolution is halved and the number of channels increases. After each convolution, the squeeze-and-excitation module is applied, and the importance of each feature channel is obtained automatically through learning.
Useful features are then promoted according to this importance, and features that are less useful for the current task are suppressed. Each decoder receives the features of the corresponding downsampling stage and decompresses the image. In the upsampling, the AG module is integrated: the Attention Block eliminates the influence of noise and irrelevant background, and Guided Image Filtering propagates image features and structural information (edge information). It is worth mentioning that skip connections are used in the model to avoid vanishing gradients. Besides, we use the Categorical_Dice loss function as the optimization function of the model, which effectively alleviates the problem of pixel imbalance.

We tested the performance of this model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset and compared it with the results of other teams participating in the challenge. The results show that our model has a good segmentation effect and has the potential for clinical trials. The innovations of this article are: (1) clever use of channel relationships, using global information to enhance the useful information in each channel and suppress the useless information; (2) an attention mechanism, together with a network structure full of skip connections, so that the information extracted by downsampling can be quickly captured to enhance the performance of the model; (3) the Categorical_Dice loss function, which addresses the imbalance between foreground and background voxels.

Methodology

Method summary

Our task is to segment multiple sequences of 3D MRI brain tumor images. In order to obtain good segmentation performance, we propose a new network structure called AGSE-VNet, in which the SE (Squeeze-and-Excitation) module [27] and the AG (Attention Guided Filter) module [28] are integrated into the network structure. This allows the network to use global information to selectively enhance useful feature channels and suppress useless ones, cleverly handling the interdependence of feature maps, effectively suppressing the background information of the image, and improving the accuracy of model segmentation. In the next section, we introduce the network structure of AGSE-VNet in detail.

Squeeze-and-excitation blocks

Figure 2 is a schematic diagram of the SE module, which mainly includes the Squeeze module and the Excitation module. The core of the module is to adaptively recalibrate the feature response of each channel by explicitly modeling the interdependence between channels. $F_{tr}$ in the figure is a standard convolution operation, as shown in formula (1), with input $X \in \mathbb{R}^{Z' \times W' \times H' \times C'}$, where $Z$ is the depth, $H$ the height, $W$ the width, and $C$ the number of channels, and output $U \in \mathbb{R}^{Z \times W \times H \times C}$; $\mathbf{v}^s$ is a three-dimensional spatial convolution, and $\mathbf{v}_c^s$ means that each kernel acts on the corresponding channel feature:

$$U_c = \mathbf{v}_c * X = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s \quad (1)$$

Fig. 2 SE network module diagram

$F_{sq}(\cdot)$ is the squeeze operation. As shown in formula (2), the feature $U$ first passes through the Squeeze operation, which compresses the features along the spatial dimensions and aggregates each feature map into a channel descriptor. Each three-dimensional feature channel becomes a real number that responds to the global distribution over that feature channel; to a certain extent, this real number approximates a global receptive field. This operation transforms the input into a $1 \times 1 \times C$ output:

$$z_c = F_{sq}(U_c) = \frac{1}{Z \times H \times W} \sum_{i=1}^{Z} \sum_{j=1}^{W} \sum_{k=1}^{H} u_c(k, i, j) \quad (2)$$

As shown in Eq. (3), to limit the complexity and aid the generalization of the model, two fully connected layers are used as a parameterized gating mechanism. In Eq. (3), $W_1 z$ represents a fully connected layer operation; the dimension of $W_1$ is $\frac{C}{m} \times C$, where $m$ is a scaling parameter that reduces the number of channels and thus the amount of computation (we set $m = 4$ empirically). After a ReLU layer, whose output dimension is unchanged, the result is multiplied by $W_2$, also a fully connected layer operation, with $W_2$ of dimension $C \times \frac{C}{m}$; finally, the Sigmoid function yields the parameter $s$:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma\left(W_2\, \delta(W_1 z)\right) \quad (3)$$

where $\delta$ is the ReLU operation, $W_1 \in \mathbb{R}^{\frac{C}{m} \times C}$, and $W_2 \in \mathbb{R}^{C \times \frac{C}{m}}$. Finally, the resulting $1 \times 1 \times C$ sequence of real numbers recalibrates $U$, and the final output is obtained by formula (4):

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \quad (4)$$

where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ refers to the channel-wise multiplication between the feature map $u_c$ and the scalar $s_c$.
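As an illustration of Eqs. (1)-(4), here is a minimal NumPy sketch, not the authors' implementation: the weights `w1` and `w2` are random stand-ins for learned parameters, and the convolution $F_{tr}$ is assumed to have already produced the feature map `u`:

```python
import numpy as np

def se_block_3d(u: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """u: feature map of shape (Z, H, W, C); w1: (C//m, C); w2: (C, C//m)."""
    # Squeeze (Eq. 2): global average pooling over Z, H, W -> one scalar per channel.
    z = u.mean(axis=(0, 1, 2))                                   # shape (C,)
    # Excitation (Eq. 3): FC -> ReLU -> FC -> sigmoid gives per-channel gates s.
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))    # shape (C,)
    # Scale (Eq. 4): recalibrate each channel of U by its gate s_c.
    return u * s                                                 # broadcasts over (Z, H, W, C)

rng = np.random.default_rng(0)
C, m = 16, 4
u = rng.standard_normal((8, 32, 32, C))
w1 = 0.1 * rng.standard_normal((C // m, C))
w2 = 0.1 * rng.standard_normal((C, C // m))
print(se_block_3d(u, w1, w2).shape)  # (8, 32, 32, 16)
```

The dimensionality bottleneck `C//m` is exactly the complexity control described for Eq. (3).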
Attention guided filter blocks

The Attention Guided Filter (AG) module combines an attention block with guided image filtering. The Attention Guided Filter filters the low-resolution and high-resolution feature maps to recover spatial information and merge structural information from feature maps of different resolutions. Figure 3 is a schematic diagram of the Attention Block, where $O$ and $I$ are the inputs of the attention guided filter and an attention map is obtained by the computation. The Attention Block is extremely critical in this method: it effectively suppresses the influence of the background on the foreground, highlighting the foreground and reducing the background. For the given feature maps $O$ and $I$, a $1 \times 1 \times 1$ convolution performs a linear transformation on each; the two transformed feature maps are combined by element-wise addition followed by a ReLU layer; another $1 \times 1 \times 1$ convolution applies a further linear transformation; and a sigmoid activation produces the final attention feature map $T$.

Fig. 3 Attention Block schematic diagram

Figure 4 is a schematic diagram of the AG module. The inputs are the guided feature map $(I)$ and the filtered feature map $(O)$, and the output is the high-resolution feature map $\hat{O}$, the product of the joint action of $I$ and $O$, as shown in formula (5):

$$\hat{O}_i = \sum_{j \in w_i} W_{ij}(I) \cdot O_j \quad (5)$$

Fig. 4 Attention Guided Filter Blocks structure diagram

Different from the guided filtering proposed by Kaiming He [29], the attention feature map $T$ is generated from the filtered feature map $(O)$ through the Attention Block module. First, the guided feature map $I$ is down-sampled to obtain a low-resolution feature map $I_l$ of the same size as the feature map $O$. The reconstruction error between $O$ and $I_l$ is then minimized to obtain the coefficients $A_l$ and $B_l$ of the attention guided filter. After that, $A_l$ and $B_l$ are up-sampled to the coefficients $A_h$ and $B_h$, which finally produce the high-resolution feature map $\hat{O}$ generated by the attention filter. Here the attention filter essentially operates on a specific window $w_k$ of radius $r$: the attention guided filter constructs a square window $w_k$ with radius $r$ at each position $k$. In our study, we set $r = 16$ and $\varepsilon = 0.1$ empirically, based on the final segmentation performance. $(a_k, b_k)$ is the unique constant coefficient pair, obtained as shown in formula (6), where ridge regression with a regularization term is used to minimize the reconstruction error:

$$\min_{a_k, b_k} E(a_k, b_k) := \sum_{i \in w_k} \left( T_i^2 \left( a_k I_{l,i} + b_k - O_i \right)^2 + \varepsilon a_k^2 \right) \quad (6)$$

where $T_i$ is the attention weight at position $i$ and $\varepsilon$ is the regularization parameter. The calculation of $(a_k, b_k)$ is shown in formula (7):

$$a_k = \frac{\frac{1}{|w|} \sum_{i \in w_k} I_{l,i} O_i - \mu_k \bar{O}_k}{\sigma_k^2 + \varepsilon}, \qquad b_k = \bar{O}_k - a_k \mu_k \quad (7)$$

where $\mu_k$ is the mean over window $w_k$ of the guidance image, $\sigma_k^2$ is its variance over $w_k$, $|w|$ is the number of pixels in the window, and $\bar{O}_k = \frac{1}{|w|} \sum_{i \in w_k} O_i$ is the mean of the image $O$ to be filtered within window $w_k$. Since a pixel in a non-edge area is covered by multiple windows, the average over all windows containing that pixel is taken, as shown in formula (8):

$$\bar{O}_i = \frac{1}{|w|} \sum_{k:\, i \in w_k} \left( a_k I_{l,i} + b_k \right) = A_l * I_l + B_l \quad (8)$$

$A_h$ and $B_h$ are then obtained through upsampling, and the final output is $\hat{O} = A_h * I + B_h$.
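A minimal 2D NumPy/SciPy sketch of the filtering in Eqs. (6)-(8) follows, shown with uniform attention weights ($T \equiv 1$, which reduces the attention guided filter to the plain guided filter of He et al. [29]); the real module works on 3D feature maps and obtains $T$ from the Attention Block:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def box_mean(x: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window at every position."""
    return uniform_filter(x, size=2 * r + 1, mode="reflect")

def guided_filter(I: np.ndarray, O: np.ndarray, r: int = 16, eps: float = 0.1) -> np.ndarray:
    """Guided filtering of O by guidance I (Eqs. 6-8 with T = 1)."""
    mu_I, mu_O = box_mean(I, r), box_mean(O, r)
    var_I = box_mean(I * I, r) - mu_I * mu_I        # sigma_k^2 in Eq. (7)
    cov_IO = box_mean(I * O, r) - mu_I * mu_O       # numerator of a_k in Eq. (7)
    a = cov_IO / (var_I + eps)                      # a_k
    b = mu_O - a * mu_I                             # b_k
    A, B = box_mean(a, r), box_mean(b, r)           # average over covering windows, Eq. (8)
    return A * I + B                                # output = A * I + B

rng = np.random.default_rng(0)
I = rng.random((128, 128))
O = I + 0.1 * rng.standard_normal(I.shape)          # noisy map to be filtered
print(guided_filter(I, O).shape)                    # (128, 128)
```

Folding $T$ back in simply turns each window average into a $T_i^2$-weighted regression, so edge structure from $I$ survives while low-attention background is smoothed away.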
Downsampling

In Fig. 1, we provide a schematic diagram of AGSE-VNet. The network structure is divided into an encoder and a decoder, as shown in Fig. 5, where Fig. 5a is the encoder, whose coding area performs the compression path, and Fig. 5b is the decoder, whose decoding area performs decompression. Downsampling is composed of four encoder blocks, each of which includes 2-3 convolution layers, a squeeze-and-excitation layer, and a downsampling layer; the processing flow of the SE module is shown on the right side of Fig. 5a. Feature extraction is performed by convolution with a stride of 2, as formulas (9) and (10) show:

$$i' = i + (s - 1)(i - 1) \quad (9)$$

$$o = \frac{i + 2p - k}{s} + 1 \quad (10)$$

where $i$ is the input size, $i'$ is the output size after filling, $s$ is the stride, $p$ is the padding size, $k$ is the convolution kernel size, and $o$ is the output size.

Fig. 5 The architecture of encoder block and decoder block with AGSE-VNet

When the image is compressed along the successive encoder blocks, its resolution is halved and the number of channels is doubled. This is achieved by a $3 \times 3 \times 3$ voxel convolution with a stride of 2. After the convolution operation, the squeeze-and-excitation module is applied, which ingeniously resolves the relationships between the channels and improves the transmission of useful information within them. It is worth mentioning that all convolutional layers use normalization and dropout, and the ReLU activation function is applied at various positions in the network structure. Besides, a skip connection method is also used in the model to avoid vanishing gradients as the network structure deepens.
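The size bookkeeping of Eqs. (9)-(10), and of Eq. (11) in the next subsection, can be checked with a few lines of Python; the transposed-convolution helper below uses the standard textbook formula $o = s(i-1) + k - 2p$, given here for comparison:

```python
def conv_out(i: int, k: int = 3, s: int = 2, p: int = 1) -> int:
    """Convolution output size, Eq. (10): o = (i + 2p - k) / s + 1."""
    return (i + 2 * p - k) // s + 1

def deconv_out(i: int, k: int = 2, s: int = 2, p: int = 0) -> int:
    """Standard transposed-convolution output size: o = s*(i - 1) + k - 2p."""
    return s * (i - 1) + k - 2 * p

# Each stride-2 3x3x3 encoder convolution halves the resolution...
print([conv_out(n) for n in (128, 64, 32, 16)])    # [64, 32, 16, 8]
# ...and each stride-2 deconvolution in the decoder doubles it back.
print([deconv_out(n) for n in (8, 16, 32, 64)])    # [16, 32, 64, 128]
```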
Upsampling

After downsampling, we introduce the AG module to restore spatial information and fuse structural information from low-resolution feature maps into high-resolution feature maps. The AG module is similar to the SE module in that it enhances features without changing the input and output dimensions. Therefore, we replace the splicing (concatenation) module in the VNet model with the AG module and integrate it into the decoder; the structure diagram is shown in Fig. 5. Each decoder block includes an upsampling layer, an AG module, and three convolution layers; the processing flow of the AG module is shown in the box on the right side of Fig. 5b. The decoder decompresses the image. In the upsampling, we use deconvolution with a stride of 2 to fill in the image feature information; the deconvolution output size is shown in formula (11):

$$o = s(i - 1) + 2p - k + 2 \quad (11)$$

Each decoder block receives the features of the corresponding downsampling stage. The convolution kernel used in the last layer of the network keeps the number of output channels consistent with the number of categories. Finally, the channel values are converted into probability outputs through the sigmoid function, mapping each voxel to a brain tumor sub-region. The idea of skip connections is adopted in each decoder block. Feature maps after processing by the encoder and decoder are shown in Fig. 6, where Fig. 6a is a feature map processed by the encoder and Fig. 6b is a feature map processed by the decoder.

Fig. 6 Feature map processed by encoder and decoder

Skip connection

To further make up for the information lost in the downsampling of the encoder, concatenation is used between the encoder and decoder of the network to fuse the feature maps at corresponding positions in the two paths. In particular, the method in this article uses the AG (Attention Guided Filter) module instead of plain concatenation, so that the decoder can obtain more high-resolution information during upsampling, restore the detailed information of the original image more completely, and improve segmentation accuracy.

We introduce adjacent-layer feature reconstruction and cross-layer feature reconstruction in the network. The cross-layer feature reconstruction module is based on the encoder-decoder structure. As the network deepens, the receptive field of the corresponding feature map becomes larger and larger, but less and less detailed information is retained. Based on the symmetric encoder-decoder structure, a splicing layer concatenates, along the channel dimension, the feature maps extracted by downsampling in the encoder with the new features obtained by upsampling in the decoder; retaining more important feature information helps achieve a better segmentation effect. Adjacent-layer feature reconstruction establishes a branch between each pair of adjacent convolutional layers with feature maps of the same size, that is, a splicing layer convolves the feature maps obtained from the previous and the next layer; matching the channel size maximizes the use of feature information from all previous layers.
Loss function

At present, the segmentation of medical images faces the problem of imbalance between foreground and background regions, and we face the same challenge in our task. Therefore, we choose the Categorical_Dice loss function as the optimization function of our model and address this problem by adjusting the weight of each predicted category. We set the weight of gangrene, edema, and enhancing tumor to 1, and the weight of the background to 0.1. The Categorical_Dice loss is based on the Dice coefficient shown in formula (12):

$$\mathrm{Dice}(P, G) = \frac{2 \, |P \cap G|}{|P| + |G|} \quad (12)$$

Here $G$ is the mask, i.e. the one-hot encoded ground truth with $G \in [\mathrm{None}, 64, 128, 128, 4]$, and $P$ represents the predicted value, the probability obtained after the softmax computation, with $P \in [\mathrm{None}, 64, 128, 128, 4]$. The partial derivative in formula (13) gives the gradient with respect to the predicted $j$-th voxel, where $N$ is the number of voxels, $p_i \in P$ and $g_i \in G$:

$$\frac{\partial D}{\partial p_j} = 2 \left[ \frac{g_j \sum_{i}^{N} \left( p_i^2 + g_i^2 \right) - 2 p_j \sum_{i}^{N} p_i g_i}{\left( \sum_{i}^{N} \left( p_i^2 + g_i^2 \right) \right)^2} \right] \quad (13)$$

The weight distribution of the loss function over the classes is shown in formula (14), with weight values $[0.1, 1.0, 1.0, 1.0]$:

$$\mathrm{Loss} = -\mathrm{Dice}(P, G) \times \mathrm{weight} \quad (14)$$
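A minimal NumPy sketch of the weighted Dice loss of Eqs. (12)-(14): the class weights follow the paper's [0.1, 1.0, 1.0, 1.0], while the small smoothing constant is our own addition to avoid division by zero, not something stated in the paper:

```python
import numpy as np

def categorical_dice_loss(p: np.ndarray, g: np.ndarray,
                          weights=(0.1, 1.0, 1.0, 1.0),
                          smooth: float = 1e-6) -> float:
    """p: softmax probabilities, g: one-hot ground truth, both (Z, H, W, C)."""
    loss = 0.0
    for c, w in enumerate(weights):
        pc, gc = p[..., c].ravel(), g[..., c].ravel()
        # Soft Dice per class, then Eq. (14): negative Dice scaled by the class weight.
        dice = (2.0 * (pc * gc).sum() + smooth) / (pc.sum() + gc.sum() + smooth)
        loss += -dice * w
    return loss

rng = np.random.default_rng(0)
g = np.eye(4)[rng.integers(0, 4, size=(8, 16, 16))]          # one-hot mask (8, 16, 16, 4)
logits = rng.standard_normal((8, 16, 16, 4))
p = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)   # softmax
print(categorical_dice_loss(p, g))
```

Down-weighting the background class to 0.1 keeps the abundant background voxels from dominating the gradient of Eq. (13).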
Materials

Dataset

In this research, we use the dataset of the BraTS 2020 challenge to train and test our model [30, 31]. The dataset contains two types of cases, low-grade glioma (LGG) and glioblastoma (HGG), and each case has four modal images: T1-weighted (T1), T1-weighted contrast-enhanced (T1-CE), T2-weighted (T2), and fluid attenuation inversion recovery (FLAIR). The brain tumor mask includes the gangrene area, the edema area, and the enhancement area. Our task is to segment the three nested sub-regions formed by these labels: the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC). There are 369 cases in the training set and 125 cases in the validation set. The masks corresponding to the validation cases are not used for training; they are used only to evaluate the model after training.

Design detail

In deep learning, the setting of hyperparameters is essential and determines the performance of the model, but in practice the initial values are often set by experience. In the training of the AGSE-VNet model, the initial learning rate is set to 0.0001, the dropout rate to 0.5, and the number of training steps to about 350,000, after which the learning rate is adjusted to 0.00003. The learning rate is halved every time the dataset is traversed, and the data is shuffled to enhance the robustness and generalization ability of the model. The experiments were conducted on TensorFlow 1.13.1; the runtime platform was an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz, 32 GB RAM, an Nvidia GeForce RTX 2080, and 64-bit Windows 10. The development platform is PyCharm, and the Python version is 3.6.9.

Pre-processing

Since our dataset has four modalities (T1, T1-CE, T2, and FLAIR) with different contrasts, which may cause the gradient to vanish during training, we standardize each image: the image data is normalized by subtracting the mean and dividing by the standard deviation, calculated as follows:

$$\tilde{X} = \frac{X - \mu}{\sigma} \quad (15)$$

where $\mu$ denotes the mean of the image, $\sigma$ the standard deviation, $X$ the image matrix, and $\tilde{X}$ the normalized image matrix.

After normalization, we merge the four normalized modality images of a case into one three-dimensional image with four channels. The label volume has the same spatial size, and its voxel values contain four different values: 0 is normal tissue, 1 is the gangrene area, 2 is the edema area, and 3 is the enhancing tumor area. The image and mask are then divided into multiple blocks in a patch operation; each case generates 175 patches of size 128 × 128 × 64. Finally, the patches are saved in NumPy .npy format (https://numpy.org/doc/stable/reference/). The preprocessed image is shown in Fig. 7.

Fig. 7 Preprocessed result
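A sketch of this pre-processing pipeline, Eq. (15) plus the four-modality merge (array shapes are illustrative of BraTS-style volumes; the 175-patch extraction per case is omitted):

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Eq. (15): subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

# Four modalities per case (random stand-ins for loaded T1/T1-CE/T2/FLAIR volumes).
rng = np.random.default_rng(0)
modalities = [rng.random((240, 240, 155)) for _ in ("t1", "t1ce", "t2", "flair")]

# Normalize each modality separately, then stack into one 4-channel volume.
volume = np.stack([zscore(m) for m in modalities], axis=-1)   # (240, 240, 155, 4)
np.save("case_0001.npy", volume)  # saved in NumPy .npy format, as in the paper
print(volume.shape)
```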
Evaluation metrics

We use the Dice coefficient, specificity, sensitivity, and the Hausdorff95 distance to measure the performance of our model. The Dice coefficient is calculated as:

$$\mathrm{Dice} = \frac{2TP}{FN + FP + 2TP} \quad (16)$$

where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively. Specificity evaluates the numbers of true negatives and false positives and measures the model's ability to predict the background area; it is defined as:

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (17)$$

where $TN$ is the number of true negatives. Sensitivity evaluates the numbers of true positives and false negatives and measures the model's sensitivity to the segmented regions; it is defined as:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (18)$$

The Hausdorff95 distance measures the distance between the surface of the real region and that of the predicted region and is more sensitive to the segmentation boundary; it is defined as:

$$\mathrm{Haus95}(T, P) = \max \left\{ \sup_{t \in T} \inf_{p \in P} d(t, p),\ \sup_{p \in P} \inf_{t \in T} d(t, p) \right\} \quad (19)$$

where $\inf$ denotes the infimum and $\sup$ the supremum, $t$ and $p$ denote points on the surface $T$ of the ground-truth region and the surface $P$ of the predicted region, and $d(\cdot, \cdot)$ calculates the distance between the points.
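The voxel-wise metrics of Eqs. (16)-(18), and a Hausdorff distance in the spirit of Eq. (19), can be computed as below; note that SciPy's `directed_hausdorff` gives the plain maximum surface distance, whereas the BraTS tool reports the 95th-percentile variant:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_sens_spec(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: binary masks of equal shape. Returns Eqs. (16)-(18)."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    dice = 2 * tp / (fn + fp + 2 * tp)           # Eq. (16)
    sensitivity = tp / (tp + fn)                 # Eq. (18)
    specificity = tn / (tn + fp)                 # Eq. (17)
    return dice, sensitivity, specificity

def hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance between mask point sets (cf. Eq. 19)."""
    p, t = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])

rng = np.random.default_rng(0)
gt = (rng.random((32, 32, 32)) > 0.7).astype(np.uint8)
pred = gt.copy()
pred[0, 0, 0] ^= 1                               # nearly perfect prediction
print(dice_sens_spec(pred, gt), hausdorff(pred, gt))
```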
Results and discussions

Results on the AGSE-VNet model

Our data includes a training set of 369 cases and a test set of 125 cases. The tumor mask includes the gangrene, edema, enhancement, and background areas, with labels 1, 2, 4, and 0, respectively. These labels are merged into three nested sub-regions, namely the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC); for these sub-regions we use the four indicators of sensitivity, specificity, Dice coefficient, and Hausdorff95 distance to measure model performance. We trained and verified on the BraTS 2020 dataset, and the average indices obtained are shown in Table 1. From Table 1, we observe that the model segments the WT region better: the Dice and Sensitivity on the training set and the validation set are 0.846, 0.849, 0.825, and 0.833, respectively, significantly better than for the other regions.

Table 1 Quantitative evaluation on the training set and validation set

             Dice              Sensitivity       Specificity       Hausdorff95
             ET    WT    TC    ET    WT    TC    ET    WT    TC    ET     WT     TC
Training     0.70  0.85  0.77  0.72  0.83  0.74  0.99  0.99  0.99  35.70  8.96   17.40
Validation   0.68  0.85  0.69  0.68  0.83  0.65  0.99  0.99  0.99  47.40  8.44   31.60

On this basis, we conduct a statistical analysis of the experimental results. Figures 8 and 9 show scatter plots and box plots of the four evaluation indicators on the training set and test set, reflecting the distribution characteristics of the results. The box plots show few outliers for the various indicators and minimal fluctuation of the results; the horizontal line in each box plot represents the median of that set of data. The three indicators Dice, Sensitivity, and Specificity sit at a high level, showing that the segmentation performance of our proposed model lies in a high range. Among the four indicators, the sensitivity results are all concentrated at a high level with a small fluctuation range. The scatter plots on the left show the data clustered at high values, indicating that our model predicts the background area at a high level, which effectively alleviates the imbalance between foreground and background pixels.

Fig. 8 A collection of scatter plots and box plots of four indicators in the training set

Fig. 9 A collection of scatter plots and box plots of four indicators in the validation set

We randomly selected several slices from the training set and compared the ground truth with the results predicted by our model, as shown in Fig. 10a: the first row is the original image, the second row the label, and the third row the tumor sub-regions predicted by our model. We also selected two cases to display in Fig. 10b, where the green area is the whole tumor (WT), the red area is the tumor core (TC), and the yellow and red areas together represent the enhancing tumor (ET); the last two columns show 3D renderings of the segmentation results. From this comparison we find that our model segments brain tumors well; the whole tumor (WT) region in particular is segmented excellently. However, the prediction of the tumor core (TC) is slightly biased, possibly because the small features of the tumor core are hard to extract.

Fig. 10 Display of segmentation results in the training set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering

After training, we also randomly selected several segmentation slices from the validation set for display, as shown in Fig. 11a. Similarly, in Fig. 11b we show three-dimensional renderings of the segmentation results annotated with the accuracy of the ET region. As the figures show, our model segments MRI images of different intensities well and can accurately segment the tumor sub-regions, showing a certain potential for brain tumor image segmentation.

Fig. 11 Display of segmentation results in the validation set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering

In our research, we proposed the AGSE-VNet model to segment 3D MRI brain tumor images and obtained good segmentation results on the BraTS 2020 dataset. To further verify the segmentation effect, we compare our method with the methods proposed by other outstanding teams participating in the competition; the results of other teams are available on the official website of the BraTS Multimodal Brain Tumor Segmentation Challenge 2020 (https://www.cbica.upenn.edu/BraTS20/lboardTraining.html). The comparison results on the training set are shown in Table 2, and those on the validation set in Table 3. From the results in the tables, we find that our model performs well in the whole tumor (WT) region and obtains relatively excellent results, indicating that the proposed method has a certain potential in segmentation.

Table 2 The results of various indicators in the training set

              Dice              Sensitivity       Specificity       Hausdorff95
Team          ET    WT    TC    ET    WT    TC    ET    WT    TC    ET     WT     TC
Proposed      0.70  0.85  0.77  0.72  0.83  0.74  0.99  0.99  0.99  35.70  8.96   17.40
mpstanford    0.60  0.78  0.72  0.56  0.80  0.75  0.99  0.99  0.99  35.95  17.68  17.21
agussa        0.67  0.87  0.79  0.69  0.87  0.82  0.99  0.99  0.99  39.25  15.75  17.05
ovgu_seg      0.65  0.81  0.75  0.72  0.78  0.76  0.99  0.99  0.99  34.79  9.50   8.93
AI-Strollers  0.59  0.73  0.61  0.52  0.73  0.64  0.99  0.97  0.98  38.87  20.81  24.22
uran          0.48  0.79  0.64  0.45  0.74  0.61  0.99  0.99  0.99  37.92  7.72   14.07
CBICA         0.54  0.78  0.57  0.64  0.82  0.53  0.99  0.99  0.99  20.00  46.30  39.60
unet3d-sz     0.69  0.81  0.75  0.77  0.93  0.83  0.99  0.96  0.98  37.71  19.57  18.36
iris          0.76  0.88  0.81  0.78  0.90  0.83  0.99  0.99  0.99  32.30  18.07  14.70
VuongHN       0.74  0.81  0.82  0.84  0.98  0.84  0.95  0.93  0.99  21.97  12.32  8.72

Table 3 The results of various indicators in the validation set

              Dice              Sensitivity       Specificity       Hausdorff95
Team          ET    WT    TC    ET    WT    TC    ET    WT    TC    ET     WT     TC
Proposed      0.68  0.85  0.69  0.68  0.83  0.65  0.99  0.99  0.99  47.40  8.44   31.60
mpstanford    0.49  0.72  0.62  0.49  0.81  0.69  0.99  0.99  0.99  61.89  26.00  28.02
agussa        0.59  0.83  0.69  0.60  0.87  0.71  0.99  0.99  0.99  56.58  23.23  29.59
ovgu_seg      0.60  0.79  0.68  0.66  0.79  0.67  0.99  0.99  0.99  54.07  12.05  19.10
AI-Strollers  0.58  0.74  0.61  0.52  0.77  0.62  0.99  0.99  0.99  47.23  24.03  31.54
uran          0.75  0.88  0.76  0.77  0.85  0.71  0.99  0.99  0.99  36.42  6.62   19.30
CBICA         0.63  0.82  0.67  0.76  0.78  0.75  0.99  0.99  0.99  9.60   10.70  28.20
unet3d-sz     0.70  0.84  0.72  0.71  0.87  0.79  0.99  0.99  0.99  42.09  10.48  12.32
iris          0.68  0.86  0.73  0.67  0.90  0.70  0.99  0.99  0.99  44.13  23.87  20.02
VuongHN       0.79  0.90  0.83  0.80  0.89  0.80  0.99  0.99  0.99  21.43  6.74   7.05
Discussion

The method proposed in this paper cleverly handles the interdependence between channels, autonomously extracting effective features from the channels while suppressing useless feature channels. After the features are extracted by the encoder, low-resolution and high-resolution feature maps are filtered through the Attention module to recover spatial information and fuse structural information from feature maps of different resolutions; our method is not affected by the size or location of the tumor. For MRI images of different intensities, the tumor area can be identified automatically, and the tumor sub-regions can be feature-extracted and segmented with good performance. This benefits radiologists and oncologists, who can quickly assess the state of the tumor and assist in the treatment of the patient. Comparing the results in Tables 2 and 3, we find that our model performs well in the whole tumor (WT) area but less well in the enhancing tumor (ET) and tumor core (TC) areas; this may be because the targets in the ET area are small and their features fuzzy and difficult to extract. We also compare our method with some classic brain tumor segmentation algorithms; the results are shown in Table 4. In the 2018 BraTS Challenge, Zhou et al. [32] proposed a lightweight one-step multi-task segmentation model that, by learning shared parameters of joint features and the distinguishing task-specific composition features, effectively alleviates the imbalance between tumor types, suppresses uncertain information, and improves the segmentation result. In the method proposed by Zhao et al., a new segmentation framework was developed that uses a fully convolutional neural network to assign labels to the image pixel by pixel and optimizes the output of the FCNNs with a recurrent neural network constructed as a conditional random field; this method was verified on the BraTS 2016 dataset with a good segmentation effect. Pereira et al. proposed an automatic positioning method for convolutional neural networks, which achieved good results on the BraTS 2015 dataset.

Table 4 Comparison of our proposed AGSE-VNet model with classic methods

Method          Dice_ET  Dice_WT  Dice_TC  Dataset
Proposed        0.67     0.85     0.69     BraTS 2020
Zhou et al.     0.65     0.87     0.75     BraTS 2018
Zhao et al.     0.62     0.84     0.73     BraTS 2016
Pereira et al.  0.65     0.78     0.75     BraTS 2015

Analyzing Table 4, we find that our model has certain advantages in segmentation, although there are still differences in TC-region accuracy and the model has limitations. In future work we will propose solutions for this situation, such as further segmenting the region of interest after our model has extracted it, so that more characteristic information can be captured and the accuracy of the enhancing tumor (ET) and tumor core (TC) areas improved. Besides, the algorithms proposed in many top methods each have areas of excellent performance; how to combine the advantages of these algorithms and integrate them into our model is the focus of our future work. In clinical treatment, this would help experts understand the patient's situation more quickly and accurately, save experts' time, and realize a leap in the quality of automatic medical segmentation.

In addition, to verify the robustness of our model against noise interference, we added Gaussian noise in the frequency domain (k-space) of the testing data to simulate realistic noise contamination. The comparison results are shown in Fig. 12. From the segmentation results with and without noise, we find that the segmentation results of our AGSE-VNet model for the three regions are not much different. These results demonstrate that our model retains a significant advantage in generalization when noise is present.

Fig. 12 Comparison of segmentation results without noise and noise added
Conclusion

In summary, we have implemented a good method to segment 3D MRI brain tumor images that can automatically segment the three regions of the brain tumor: the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC). We conducted experiments on the BraTS 2020 dataset and obtained good results. The AGSE-VNet model is an improvement on VNet: there are five encoder blocks and four decoder blocks, each encoder block has a squeeze-and-excitation block, and each decoder has an Attention Guided Filter block. Such a design can be embedded in our model without any size mismatch in the network structure, since the input and output dimensions are unchanged. After the SE module processing, the network learns global information and selects the useful information in the enhanced channels, and then uses the attention mechanism of the Attention Guided Filter block to quickly capture dependencies and enhance the performance of the model. Secondly, we introduced a new loss function, Categorical_Dice, setting different weights for the different masks (0.1 for the background area and 1 for the tumor areas of interest), ingeniously solving the problem of voxel imbalance between foreground and background. We evaluated this approach with the online verification tool of the BraTS Challenge website and found that our model still differs from the top methods in the segmentation of the enhancing tumor (ET) and tumor core (TC) regions, possibly because the features of these two regions are small and difficult to extract; how to improve the accuracy of these two regions is our future work direction.

The automatic segmentation of brain tumors in the medical field is a long-term research problem. How to design an automatic segmentation algorithm that is fast and highly accurate, and then form a complete system, is the current direction of a large number of researchers. Therefore, we must continue to optimize our segmentation model to achieve a qualitative leap in the field of automatic segmentation.

Authors' contributions

XG, GY, and XL conceived and designed the study, contributed to data analysis, contributed to data interpretation, and contributed to the writing of the report. XG, GY, JY, WY, XX, WJ, and XL contributed to the literature search. JY and WY contributed to data collection. XG, GY, XX, WJ, and XL performed data curation and contributed to the tables and figures. All authors read and approved the final manuscript.

Funding

This work is funded in part by the National Natural Science Foundation of China (Grants No. 62072413, 61602419), in part by the Natural Science Foundation of Zhejiang Province of China (Grant No. LY16F010008), in part by the Medical and Health Science and Technology Plan of Zhejiang Province of China (Grant No. 2019RC224), in part by the Teacher Professional Development Project of Domestic Visiting Scholars in Colleges and Universities of Zhejiang Province of China (Grants No. 2020-19, 2020-20), in part by the UK Research and Innovation Future Leaders Fellowship (MR/V023799/1), and
also supported in part by the AI for Health Imaging Award 'CHAIMELEON: Accelerating the Lab to Market Transition of AI Tools for Cancer Management' [H2020-SC1-FA-DTS-2019-1 952172].

Availability of data and materials

The datasets analysed during the current study are available from BraTS 2020: https://www.med.upenn.edu/cbica/brats2020/data.html.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors contributed to the article and approved the submitted version.

Competing interests

The authors declare no competing interests.

Author details

School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou 310053, China. Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK. National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK. First Affiliated Hospital, Gannan Medical University, Ganzhou 341000, China. College of Life Science, Zhejiang Chinese Medical University, Hangzhou 310053, China.

Received: 16 January 2021 Accepted: 26 July 2021
References
1. Nie J, Xue Z, Liu T, Young GS, Setayesh K, Lei G. Automated brain tumor segmentation using spatial accuracy-weighted hidden Markov random field. Comput Med Imaging Graph. 2009;33(6):431-41.
2. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629. 2018.
3. Essadike A, Ouabida E, Bouzid A. Brain tumor segmentation with Vander Lugt correlator based active contour. Comput Methods Programs Biomed. 2018;60:103-17.
4. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18-31.
5. Akkus Z, Galimzianova A, Hoogi A, Daniel R. Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imaging. 2017;30(4):449-59.
6. Hussain S, Anwar S, Majid M. Segmentation of glioma tumors in brain using deep convolutional neural network. Neurocomputing. 2017;282.
7. Sauwen N, Acou M, Cauter S, Sima DM, Veraart J, Maes F, Himmelreich U, Achten E, Van Huffel S. Comparison of unsupervised classification methods for brain tumor segmentation using multi-parametric MRI. NeuroImage Clin. 2016;12(2):753-64.
8. Milletari F, Navab N, Ahmadi S. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). IEEE. 2016.
9. Rickmann A, Roy A, Sarasua I, Navab N, Wachinger C. 'Project & Excite' modules for segmentation of volumetric medical scans. Image Video Processing.
10. Tustison N, Shrinidhi K, Wintermark M, Durst CR, Kandel BM, Gee JC, Grossman MC, Avants BB. Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics. 2015;13(2):209-25.
11. Rose S, Crozier S, Bourgeat P, Dowson N, Salvado O, Raniga P, Pannek K, Coulthard A, Fay M, Thomas P. Improved delineation of brain tumour margins using whole-brain track-density mapping. In: ISMRM-ESMRMB joint meeting: clinical needs & technological solutions. International Society for Magnetic Resonance in Medicine. 2009.
12. Amiri S, Mahjoub MA, Rekik I. Bayesian network and structured random forest cooperative deep learning for automatic multi-label brain tumor segmentation. In: 10th international conference on agents and artificial intelligence. 2018.
13. Balafar M. Fuzzy c-mean based brain MRI segmentation algorithms. Artif Intell Rev. 2014;41(3):441-9.
14. Pereira S, Pinto A, Alves V. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35(5):1240-51.
15. Hao D, Yang G, Liu F, Mo Y, Guo Y. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In: Annual conference on medical image understanding and analysis. Springer, Cham.
16. Wang G, Li W, Ourselin S, Vercauteren T. Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. Comput Vis Pattern Recognit. 2017;12(5).
17. Myronenko A. 3D MRI brain tumor segmentation using autoencoder regularization. Berlin: Springer; 2018.
18. Xue F, Nicholas T, Meyer C. Brain tumor segmentation using an ensemble of 3D U-nets and overall survival prediction using radiomic features. Comput Vis Pattern Recognit. 2018;279-288.
19. Ibtehaz N, Rahman MS. MultiResUNet: rethinking the U-net architecture for multimodal biomedical image segmentation. Comput Vis Pattern Recognit. 2019;121.
20. Xu C, Xu L, Ohorodnyk P, Roth M, Li M. Contrast agent-free synthesis and segmentation of ischemic heart disease images using progressive sequential causal GANs. Med Image Anal. 2020;101668.
21. Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X. ERV-Net: an efficient 3D residual neural network for brain tumor segmentation. Expert Syst Appl. 2021;170.
22. Saman S, Narayanan S. Active contour model driven by optimized energy functionals for MR brain tumor segmentation with intensity inhomogeneity correction. Multimedia Tools Appl. 2021;80(4):21925-54.
23. Liu H, Li Q, Wang L. A deep-learning model with learnable group convolution and deep supervision for brain tumor segmentation. Math Probl Eng. 2021;3:1-11.
24. Yurttakal A, Erbay H. Segmentation of larynx histopathology images via convolutional neural networks. In: Intelligent and fuzzy techniques: smart and innovative solutions. 2021;949-954.
25. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal. 2018;43:98-111.
26. Sturm D, Pfister S, Dtw J. Pediatric gliomas: current concepts on diagnosis, biology, and clinical management. J Clin Oncol. 2017;35(21):2370.
27. Hu J, Li S, Albanie S, Sun G. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell. 2017;99.
28. Zhang S, Fu H, Yan Y, Zhang Y, Wu Q, Tan M, Xu Y. Attention guided network for retinal image segmentation. In: Medical image computing and computer assisted intervention - MICCAI 2019. 2019.
29. He K, Sun J, Tang X. Guided image filtering. Lect Notes Comput Sci. 2013;35(6):1397-409.
30. Menze B, Jakab A, Bauer S, Jayashree KC, Keyvan F, Justin K. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993-2024.
31. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nat Sci Data. 2017;4:170117. https://doi.org/10.1038/sdata.2017.117.
32. Zhou C, Ding C, Wang X, Lu Z, Tao D. One-pass multi-task networks with cross-task guided attention for brain tumor segmentation. Computer Vision and Pattern Recognition. 2019.
Loading next page...
 
/lp/springer-journals/3d-agse-vnet-an-automatic-brain-tumor-mri-data-segmentation-framework-UH0kU20iKK
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
eISSN
1471-2342
DOI
10.1186/s12880-021-00728-8
Publisher site
See Article on Publisher Site

Abstract

Background: Glioma is the most common brain malignant tumor, with a high morbidity rate and a mortality rate of more than three percent, which seriously endangers human health. The main method of acquiring brain tumors in the clinic is MRI. Segmentation of brain tumor regions from multi-modal MRI scan images is helpful for treatment inspection, post-diagnosis monitoring, and effect evaluation of patients. However, the common operation in clinical brain tumor segmentation is still manual segmentation, lead to its time-consuming and large performance difference between different operators, a consistent and accurate automatic segmentation method is urgently needed. With the continuous development of deep learning, researchers have designed many automatic segmentation algorithms; however, there are still some problems: (1) The research of segmentation algorithm mostly stays on the 2D plane, this will reduce the accuracy of 3D image feature extraction to a certain extent. (2) MRI images have gray-scale offset fields that make it difficult to divide the contours accurately. Methods: To meet the above challenges, we propose an automatic brain tumor MRI data segmentation framework which is called AGSE-VNet. In our study, the Squeeze and Excite (SE) module is added to each encoder, the Atten- tion Guide Filter (AG) module is added to each decoder, using the channel relationship to automatically enhance the useful information in the channel to suppress the useless information, and use the attention mechanism to guide the edge information and remove the influence of irrelevant information such as noise. Results: We used the BraTS2020 challenge online verification tool to evaluate our approach. The focus of verification is that the Dice scores of the whole tumor, tumor core and enhanced tumor are 0.68, 0.85 and 0.70, respectively. Conclusion: Although MRI images have different intensities, AGSE-VNet is not affected by the size of the tumor, and can more accurately extract the features of the three regions, it has achieved impressive results and made outstand- ing contributions to the clinical diagnosis and treatment of brain tumor patients. Keywords: Brain tumor, Magnetic resonance imaging, VNet, Automatic segmentation, Deep learning gliomas can be divided into four grades according to dif- Introduction ferent symptoms, of which I and II are low-grade gliomas Glioma is one of the common types of primary brain (LGG), III and IV are high-grade gliomas (HGG) [2]. Due tumors, accounting for about 50% of intracranial tumors to the high mortality rate of glioma, it can appear in any [1]. According to the WHO classification criteria, part of the brain and people of any age, with various his- tological subregions and varying degrees of invasiveness *Correspondence: g.yang@imperial.ac.uk; dmia_lab@zcmu.edu.cn [3]. Therefore, it has attracted widespread attention in Xi Guan and Guang Yang have contributed equally to this work School of Medical Technology and Information Engineering, Zhejiang the medical field. Because glioblastoma (GBM) cells are Chinese Medical University, Hangzhou 310053, China immersed in the healthy brain parenchyma and infiltrate Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 the surrounding tissues, they can grow and spread rap- 6NP, UK Full list of author information is available at the end of the article idly near the protein fibers, and the deterioration process © The Author(s) 2021. 
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Guan et al. BMC Medical Imaging (2022) 22:6 Page 2 of 18 is very rapid. Therefore, early diagnosis and treatment are on medical image segmentation have been developed in essential. both academia and industry. VNet [8] has good segmen- At present, the methods of acquiring brain tumors in tation performance in single-modal images, but there are clinical practice are mainly computed tomography (CT), still some shortcomings for multi-modal segmentation. positron emission tomography (PET), and magnetic reso- In this article, inspired by the integration of the “Project nance imaging (MRI) [4]. Among them, MRI has become and Excite” (PE) module into the 3D U-net proposed by the preferred medical imaging method for brain diagno- Anne-Marie et  al. [9], we proposed an automatic brain sis and treatment planning. Because it provides images tumor MRI Data segmentation framework, which is with high-contrast soft tissue and high spatial resolu- called 3D AGSE-VNet. The network structure is shown tion [5], it is a good representation of the anatomical in Fig.  1. The main contributions of this paper are: (1) structure of the cranial nerve soft tissue and the image Propose a combined segmentation model based on VNet, of the lesion. At the same time, MRI images can obtain integrating SE module and AG module. (2) Using volume multiple sequence information of brain tumors in differ - input, three-dimensional convolution is used to process ent spaces through one scan. This information includes MRI images. (3) Get excellent segmentation results, have four sequences of T1 weighting (T1), T1-weighted con- the potential clinical application. trast-enhanced (T1-CE), and T2 weighting (T2), fluid attenuation inversion recovery (FLAIR) [6, 7]. However, Related works manually segmenting tumors from MRI images requires Traditional machine learning professional prior knowledge, which is time-consuming At present, in clinical medicine, it is the goal that experts and labor-intensive, and is prone to errors, which is very and scholars have been pursuing to use fully automatic dependent on the doctor’s experience. Therefore, the segmentation methods to replace tedious manual seg- development of an accurate, reliable, and fully automatic mentation or semi-automatic segmentation. It is also the brain tumor segmentation algorithm has strong clinical focus and key technology of medical impact research in significance. recent years. 
With the development of computer vision and pattern recognition, convolutional neural networks have been applied to many challenging tasks, and classification, segmentation, and target detection capabilities have improved greatly. In addition, deep learning technology has shown great potential in medical image processing. So far, plenty of research studies on medical image segmentation have been carried out in both academia and industry. VNet [8] has good segmentation performance on single-modal images, but still has shortcomings for multi-modal segmentation. In this article, inspired by the integration of the "Project & Excite" (PE) module into 3D U-Net proposed by Anne-Marie et al. [9], we propose an automatic brain tumor MRI data segmentation framework called 3D AGSE-VNet. The network structure is shown in Fig. 1. The main contributions of this paper are: (1) a combined segmentation model based on VNet that integrates the SE module and the AG module; (2) volume input, with three-dimensional convolutions used to process the MRI images; (3) excellent segmentation results with potential for clinical application.

Fig. 1 The overall architecture of the proposed 3D AGSE-VNet

Related works

Traditional machine learning

At present, in clinical medicine, replacing tedious manual or semi-automatic segmentation with fully automatic segmentation methods is a goal that experts and scholars have long pursued; it is also a focus and key technology of medical image research in recent years. Traditional image processing algorithms for brain tumor segmentation use threshold-based, region-based, and boundary-based methods. Image segmentation based on thresholds is one of the simplest and most traditional methods in image processing. Tustison et al. proposed a two-stage segmentation framework based on Random Forest-derived probabilities, using the output of the first classifier to improve the segmentation result of the second classifier [10]. Stadlbauer et al. [11] proposed using the normal distribution of the data to obtain a threshold and, according to the intensity change of each region, an adaptive threshold segmentation method to separate the foreground from the background; however, this method is highly limited, and segmentation fails when multiple anatomical structures overlap. Amiri et al. [12] proposed a multi-layer structure in which structural Random Forests and Bayesian networks are embedded to learn tumor features better, but inputting a large number of features can easily lead to the curse of dimensionality and waste plenty of time. The work in [13] uses a seed region growing algorithm to process brain MRI images according to a threshold T and the generation of PD images, and then applies a Markov logic algorithm to further improve segmentation performance.

Deep learning

In recent years, convolutional neural networks have become the most popular method in image classification, and they are widely used in medical image analysis. Sérgio Pereira et al. [14] proposed an automatic positioning method based on a Convolutional Neural Network (CNN) that explores 3 × 3 kernels, using small kernels to design a deeper architecture and intensity normalization as a preprocessing step, trained and validated on the BraTS 2015 dataset. In the article by Hao et al., a fully automatic brain tumor segmentation method based on the U-Net deep convolutional network was proposed and evaluated on the BraTS 2015 dataset; cross-validation showed that it can effectively obtain promising segmentations [15]. Wang et al. proposed a cascade network: the first step segments the whole tumor, the second step segments the tumor core using the bounding box obtained, and the enhancing tumor is then segmented according to the bounding box of the tumor core result; anisotropic and dilated convolutions are combined with multi-view fusion to reduce false positives [16]. Andriy Myronenko proposed a semantic network for 3D MRI tumor subregion segmentation based on an encoder-decoder structure, which uses an autoencoder branch to reconstruct images, and won first place in the 2018 BraTS Challenge [17]. Feng Xue proposed an ensemble 3D U-Net brain tumor segmentation method in which six networks with different patch sizes and loss weights are used for the encoder and decoder, improving performance through ensemble modelling [18]. In 2019, Ibtehaz et al. developed MultiResUNet, a novel architecture based on U-Net that extends residual connections with the proposed residual path (ResPath); it showed good segmentation performance on the ISIC and BraTS datasets [19]. Xu et al. proposed progressive sequential causal GANs to synthesize high-quality images and accurately segment all tissues relevant to diagnosis, obtaining highly accurate diagnostic indicators in a real clinical environment [20]. Zhou et al. proposed an effective 3D residual neural network for brain tumor segmentation, using the computationally efficient 3D ShuffleNetV2 as an encoder and introducing a decoder with residual blocks for high-efficiency segmentation [21]. Saman et al. proposed an active contour model driven by optimized energy functionals for MR brain tumor segmentation with intensity inhomogeneity correction, together with a method to identify and segment brain tumor slices in MRI images [22]. Liu et al. studied a deep learning model based on learnable group convolution and deep supervision, replacing the convolutions of the feature extraction stage with learnable group convolutions; tests on the BraTS 2018 dataset show that its segmentation of the tumor core region is excellent, surpassing the winning method NVDLMED [23]. In addition, CNNs have also been widely used in other medical image analysis tasks; for example, Yurttakal et al. used a convolutional neural network for laryngeal histopathological image segmentation, which is of great help to the early detection, monitoring, and treatment of laryngeal cancer through rapid and accurate tumor segmentation [24].

Our work

Many experts and scholars have proposed a variety of deep learning network structures and achieved good results in the field of brain tumor segmentation. However, due to the inherent anisotropy of glial brain tumors, MRI images show a high degree of non-uniformity and irregular shapes [25]. Secondly, deep learning segmentation methods require large-scale annotated data, while brain tumor data are generally small and complex; their inherent high heterogeneity causes intra-class differences between the sub-regions of the tumor area and inter-class differences between tumor and non-tumor areas [26]. These problems all affect the accuracy of brain tumor segmentation.

In this article, to meet the above challenges, we use a combined model that integrates the "Squeeze and Excite" (SE) module and the "Attention Guide Filter" (AG) module into the VNet model for segmentation of 3D MRI glioma brain tumors; it is an end-to-end network structure. We feed the data into the model as volume input and use three-dimensional convolutions to process the MRI images. As the image is compressed along successive encoder blocks, the resolution is halved and the number of channels increases. After each convolution, the squeeze-and-excitation module is applied, and the importance of each feature channel is obtained automatically through learning; useful features are then promoted according to this importance, and features that are less useful for the current task are suppressed. Each decoder receives the features of the corresponding downsampling stage and decompresses the image. In the upsampling, the AG module is integrated: the Attention Block eliminates the influence of noise and irrelevant background, and Guided Image Filtering guides the image features and structural (edge) information. It is worth mentioning that skip connections are used in the model to avoid vanishing gradients. Besides, we use the Categorical_Dice loss function as the optimization function of the model, which effectively solves the problem of pixel imbalance.

We tested the performance of this model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset and compared it with the results of other teams participating in the challenge. The results show that our model has a good segmentation effect and the potential for clinical trials. The innovations of this article are: (1) clever use of channel relationships, using global information to enhance useful information in each channel and suppress useless information; (2) an attention mechanism together with abundant skip connections, so that the information extracted during downsampling is quickly captured and the performance of the model enhanced; (3) the Categorical_Dice loss function, which solves the imbalance between foreground and background voxels.
Methodology

Method summary

Our task is to segment multiple sequences of 3D MRI brain tumor images. In order to obtain good segmentation performance, we propose a new network structure called AGSE-VNet, in which the SE (Squeeze-and-Excitation) module [27] and the AG (Attention Guided Filter) module [28] are integrated into the network structure, allowing the network to use global information to selectively enhance useful feature channels and suppress useless ones. This cleverly resolves the mutual dependence of feature maps, effectively suppresses the background information of the image, and improves the segmentation accuracy of the model. In the next section, we introduce the network structure of AGSE-VNet in detail.

Squeeze-and-excitation blocks

Figure 2 is a schematic diagram of the SE module, which mainly includes the Squeeze module and the Excitation module. The core of the module is to adaptively recalibrate the characteristic response of each channel by explicitly modeling the interdependence between channels. $F_{tr}$ in the figure is a standard convolution operation, as shown in formula (1): the input is $X \in \mathbb{R}^{Z' \times W' \times H' \times C'}$, where $Z$ is the depth, $H$ the height, $W$ the width, and $C$ the number of channels; the output is $U \in \mathbb{R}^{Z \times W \times H \times C}$; $v_c^s$ is a three-dimensional spatial convolution in which each channel of the kernel acts on the corresponding channel of the input:

$U_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s$  (1)

$F_{sq}(\cdot)$ is the squeeze operation, shown in formula (2). The feature $U$ first passes through the Squeeze operation, which compresses the features along the spatial dimensions and aggregates each feature map into a per-channel descriptor. Each three-dimensional feature channel becomes a real number that responds to the global distribution on that channel; to a certain extent, this real number approximates a global receptive field. This operation transforms an input of $Z \times H \times W \times C$ into a $1 \times 1 \times 1 \times C$ output:

$z_c = F_{sq}(U_c) = \frac{1}{Z \times H \times W} \sum_{k=1}^{Z} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(k, i, j)$  (2)

As shown in formula (3), to limit the complexity of the model and aid generalization, two fully connected layers are used as a parameterized gating mechanism. In formula (3), $W_1 z$ represents a fully connected layer operation with $W_1 \in \mathbb{R}^{\frac{C}{m} \times C}$, where $m$ is a scaling parameter whose purpose is to reduce the number of channels and thus the amount of computation; we set $m = 4$ empirically. After a ReLU layer $\delta$, which leaves the dimension unchanged, the result is multiplied with $W_2 \in \mathbb{R}^{C \times \frac{C}{m}}$, another fully connected layer operation, and finally the Sigmoid function $\sigma$ yields the parameter $s$:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \delta(W_1 z))$  (3)

Finally, the resulting $1 \times 1 \times C$ sequence of real numbers recalibrates $U$ channel by channel to give the final output, as in formula (4):

$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$  (4)

where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ refers to the channel-wise multiplication between the feature map $u_c \in \mathbb{R}^{W \times H}$ and the scalar $s_c$.

Fig. 2 SE network module diagram
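To make the squeeze-and-excitation computation of formulas (1)-(4) concrete, the following is a minimal sketch of a 3D SE block. The paper's own implementation is in TensorFlow 1.13; this PyTorch version and its names (SEBlock3D, m) are illustrative assumptions, not the authors' code.

```python
# Hypothetical PyTorch sketch of a 3D squeeze-and-excitation block
# (the paper's own implementation uses TensorFlow 1.13).
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Squeeze (formula (2)): global average pooling over Z*H*W.
    Excitation (formula (3)): FC -> ReLU -> FC -> Sigmoid with reduction m.
    Scale (formula (4)): channel-wise multiplication of the input."""

    def __init__(self, channels: int, m: int = 4):  # m = 4 as in the paper
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool3d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // m),  # W1: C -> C/m
            nn.ReLU(inplace=True),
            nn.Linear(channels // m, channels),  # W2: C/m -> C
            nn.Sigmoid(),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c = u.shape[:2]
        z = self.squeeze(u).view(n, c)          # z_c: one real number per channel
        s = self.excite(z).view(n, c, 1, 1, 1)  # s_c: channel weights in (0, 1)
        return u * s                            # x_c = s_c * u_c

# Example: recalibrate a 16-channel feature map of size 64 x 128 x 128.
# y = SEBlock3D(16)(torch.randn(1, 16, 64, 128, 128))  # same shape as input
```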
Attention guided filter blocks

The Attention Guided Filter (AG) module combines an attention block with guided image filtering. The attention guided filter filters the low-resolution and high-resolution feature maps to recover spatial information and merge structural information from feature maps of different resolutions. Figure 3 is a schematic diagram of the Attention Block, where $O$ and $I$ are the inputs of the attention guided filter from which the attention map is calculated. The Attention Block is extremely critical in this method: it effectively counteracts the influence of the background on the foreground, highlighting the foreground and attenuating the background. For the given feature maps $O$ and $I$, a convolution with $1 \times 1 \times 1$ kernels performs a linear transformation; the two transformed feature maps are then combined through element-wise addition and a ReLU layer, a second $1 \times 1 \times 1$ convolution applies another linear transformation, and a sigmoid activation finally produces the attention feature map $T$.

Fig. 3 Attention Block schematic diagram

Figure 4 is a schematic diagram of the AG module. The inputs are the guidance feature map ($I$) and the filtering feature map ($O$), and the output is the high-resolution feature map $\hat{O}$, the product of the joint action of $I$ and $O$, as shown in formula (5):

$\hat{O}_i = \sum_{j \in w_i} W_{ij}(I) \cdot O_j$  (5)

Different from the guided filtering proposed by Kaiming He [29], the attention feature map $T$ is generated from the filtering feature map ($O$) through the Attention Block module. First, the guidance feature map $I$ is down-sampled to obtain a low-resolution feature map $I_l$ of the same size as $O$. Then the reconstruction error between $O$ and $I_l$ is minimized to obtain the coefficients $A_l$ and $B_l$ of the attention guided filter. After that, $A_l$ and $B_l$ are up-sampled into the coefficients $A_h$ and $B_h$, which finally produce the high-resolution output of the attention filter, $\hat{O} = A_h * I + B_h$. The attention guided filter is essentially built on a square window $w_k$ of radius $r$ constructed at each position $k$; in our study, we set $r = 16$ and $\varepsilon = 0.1$ empirically based on the final segmentation performance. $(a_k, b_k)$ are the constant coefficients of window $w_k$, obtained by minimizing the reconstruction error with a ridge regression containing a regularization term, as shown in formula (6):

$\min_{a_k, b_k} E(a_k, b_k) := \sum_{i \in w_k} \left[ T_i^2 (a_k I_i + b_k - O_i)^2 + \varepsilon a_k^2 \right]$  (6)

where $T_i$ is the attention weight at position $i$ and $\varepsilon$ is the regularization parameter. The solution $(a_k, b_k)$ is shown in formula (7):

$a_k = \dfrac{\frac{1}{|w|} \sum_{i \in w_k} I_i O_i - \mu_k \bar{O}_k}{\sigma_k^2 + \varepsilon}, \qquad b_k = \bar{O}_k - a_k \mu_k$  (7)

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of the image $I$ over the window $w_k$, $|w|$ is the number of pixels in the window, and $\bar{O}_k = \frac{1}{|w|} \sum_{i \in w_k} O_i$ is the mean of the filtering image $O$ over the window $w_k$. Since a non-edge pixel is covered by multiple windows, the output at such a pixel is the average over all windows containing it, as shown in formula (8):

$O_{l,i} = \frac{1}{|w|} \sum_{k:\, i \in w_k} (a_k I_{l,i} + b_k) = A_l * I_l + B_l$  (8)

$A_h$ and $B_h$ are then obtained through upsampling, and the final output is $\hat{O} = A_h * I + B_h$.

Fig. 4 Attention Guided Filter Blocks structure diagram
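The following is a minimal sketch of the Attention Block of Fig. 3 only; the guided-filter coefficient computation of formulas (6)-(8) is omitted. It is a PyTorch illustration under our own naming (AttentionBlock3D, inter), not the authors' implementation.

```python
# Hypothetical sketch of the Attention Block (Fig. 3): two 1x1x1 linear
# transforms, element-wise addition, ReLU, a second 1x1x1 transform, sigmoid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBlock3D(nn.Module):
    def __init__(self, ch_i: int, ch_o: int, inter: int):
        super().__init__()
        self.wi = nn.Conv3d(ch_i, inter, kernel_size=1)   # linear transform of I
        self.wo = nn.Conv3d(ch_o, inter, kernel_size=1)   # linear transform of O
        self.psi = nn.Conv3d(inter, 1, kernel_size=1)     # second 1x1x1 transform

    def forward(self, i: torch.Tensor, o: torch.Tensor) -> torch.Tensor:
        # Down-sample the guidance map I to O's spatial size, as in the text.
        i_low = F.interpolate(i, size=o.shape[2:], mode="trilinear",
                              align_corners=False)
        # Add, ReLU, transform again, then sigmoid to obtain the attention map T.
        t = torch.sigmoid(self.psi(F.relu(self.wi(i_low) + self.wo(o))))
        return t  # T highlights the foreground and suppresses the background
```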
Downsampling

In Fig. 1, we provide a schematic diagram of AGSE-VNet. The network structure is divided into an encoder and a decoder, as shown in Fig. 5: Fig. 5a is the encoder, whose coding area performs the compression path, and Fig. 5b is the decoder, whose decoding area performs decompression. Downsampling is composed of four encoder blocks, each of which includes two to three convolutional layers, a squeeze-and-excitation layer, and a downsampling layer; the processing of the SE module is shown on the right side of Fig. 5a. Feature extraction is performed by convolution with a stride of 2, as described by formulas (9) and (10):

$i' = i + (s - 1)(i - 1)$  (9)

$o = \frac{i + 2p - k}{s} + 1$  (10)

where $i$ is the input size, $i'$ is the input size after filling, $s$ is the stride, $p$ is the padding size, $k$ is the convolution kernel size, and $o$ is the output size.

When the image is compressed along the encoder blocks, its resolution is halved and the number of channels is doubled; this is achieved by convolution of $3 \times 3 \times 3$ voxels with a stride of 2. After the convolution operation, the squeeze-and-excitation module is applied, which ingeniously resolves the relationships between channels and improves the transmission of effective information within them. It is worth mentioning that all convolutional layers use normalization and dropout, and the ReLU activation function is applied at various positions in the network structure. Besides, a skip connection method is also used in the model to avoid the gradient vanishing as the network structure deepens.

Fig. 5 The architecture of encoder block and decoder block with AGSE-VNet
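As a quick check of formula (10), the helper below computes the output size of a strided convolution; the function name is ours, and integer floor division is assumed.

```python
# Small helper mirroring formula (10); names follow the text
# (i input size, k kernel size, s stride, p padding).
def conv_output_size(i: int, k: int, s: int, p: int) -> int:
    """Output spatial size of a convolution, formula (10), with floor division."""
    return (i + 2 * p - k) // s + 1

# A 3x3x3 kernel with stride 2 and padding 1 halves a 128-voxel axis,
# matching the halving of resolution described for the encoder blocks.
assert conv_output_size(128, k=3, s=2, p=1) == 64
```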
Besides, a jump con- tion kernel used in the last layer of the network struc- nection method is also used in the model to avoid the ture keeps the number of output channels consistent disappearance of the gradient as the network structure with the number of categories. Finally, the channel value deepens. is converted into a probability value output through the sigmoid function, and the voxel is converted into a brain tumor gangrene area. The idea of skip connection Upsampling is adopted in each decoder block. The feature map after After downsampling the model, we introduced the AG processing by the encoder and decoder is shown in Fig. 6, module to solve the problem of restoring spatial infor- where Fig.  6a is a feature map processed by the encoder, mation and fusing structural information from low-res- and Fig. 6b is a feature map processed by the decoder. olution feature maps to high-resolution feature maps. The AG module is similar to the SE module. Based on not changing the dimensions of input and output, the Skip connection features are enhanced. Therefore, we replace the splic - To further make up for the information lost in the down- ing module in the VNet model with the AG module and sampling of the encoder, concat is used between the integrate it into the decoder. The structure diagram is encoder and decoder of the network to fuse the feature shown in Fig.  5. Each decoder block includes an upsam- maps of the corresponding positions in the two pro- pling layer, an AG module, and three layers of convolu- cesses. In particular, the method extracted in this article tion, the processing flow of the AG module is shown in uses the AG (Attention Guided Filter Blocks) module the box on the right side of Fig. 5b. The decoder decom - instead of concat, so that the decoder can obtain infor- presses the image. In the up-sampling, this article uses mation during upsampling. With more high-resolution Guan et al. BMC Medical Imaging (2022) 22:6 Page 8 of 18 Fig. 6 Feature map processed by encoder and decoder information, the detailed information in the original convolutional layers with the same size feature map, image can be restored more perfectly, and the segmenta- that is, use the splicing layer to convolve the feature tion accuracy can be improved. map obtained through the convolution of the previ- We introduced adjacent layer feature reconstruc- ous layer and the next layer. Obtaining the channel size tion and cross-layer feature reconstruction in the net- achieves the purpose of maximizing the use of feature work. The cross-layer feature reconstruction module is information in all previous layers. based on the encoder-decoder structure. In the process of network communication, as the network continues to deepen, the acceptance domain of the correspond- Loss function ing feature map will become larger and larger, but the At present, the segmentation of medical images faces retained detailed information will become less and the problem of the imbalance between the foreground less. Based on the encoder-decoder symmetric struc- and the background regions. We also face such chal- ture, the splicing layer is used to splice the feature lenges in our tasks. Therefore, we choose the Categori - maps extracted from the down-sampling in the encoder cal_Dice loss function as the optimization function of process and the new features obtained from the up- our model. Heavy to solve this problem by adjusting sampling in the decoder process to perform channel- the weight of each forecast category. 
We set the weight dimensional splicing. Retaining more important feature of gangrene, edema, and enhanced tumor to 1, and the information is conducive to achieving a better seg- weight of background to 0.1. The Categorical_Dice loss mentation effect. Adjacent layer feature reconstruction function is shown in formula (12): is to establish a branch between each pair of adjacent Guan  et al. BMC Medical Imaging (2022) 22:6 Page 9 of 18 2|P ∩ G| The experimental environment was conducted on Ten - Dice(P, G) = (12) sorflow 1.13.1, and the runtime platform processor was |P| + |G| Intel (R) Core (TM) i7-9750H CPU @ 2.60  GHz, 32  GB Among them, G is Mask, the ground truth encoded by RAM, Nvidia GeForce RTX 2080, 64-bit Windows 10. one-hot, G ∈ [None, 64, 128, 128, 4] and P represents the The development software platform is PyCharm, and the predicted value, which is the probability result obtained python version is 3.6.9. after softmax calculation, P ∈ [None, 64, 128, 128, 4] . The partial differential calculation of formula (13) is per - Pre‑processing formed to obtain the gradient value relative to the pre- Since our data set has four modalities, T1, T1-CE, T2, dicted j-th voxel, where N stands for voxel, p ∈ P and and FLAIR, there is a problem of different contrast, g ∈ G. which may cause the gradient to disappear during the   � � � � � � � training process, so we use standardization to process the N N N 2 2 g p + g − 2p p g j j i i ∂D i i i i i   image, from the image pixel. The image data is normal - = 2  � �  � � ∂p ized by subtracting the average value and dividing by the j N N 2 2 p + g i i i i standard deviation. Calculated as follows: (13) X − μ The weight distribution of the loss function of each X = (15) node is shown in formula (14), and the weight value is [0.1, 1.0, 1.0, 1.0]. where μ donates the mean of the image, σ donates stand- ard deviation, X donates the image matrix, X is the nor- Loss =−Dice(P, G) × weight (14) malized image matrix. After normalization, we merge the images of the four modalities with the same contrast to form a three-dimen- Materials sional image with four channels. The original image size Dataset is, and the combined image size becomes. The size of the In this research, we use the dataset of the BraTS 2020 label is, and its pixel value contains 4 different values. challenge to train and test our model [30, 31]. The data Channel 0 is the normal tissue area, 1 is the gangrene set contains two types, namely low-grade glioma (LGG) area, 2 is the edema area, and 3 is the enhanced tumor and glioblastoma (HGG), each category has four modal area. Then, divide the image and mask into multiple images: T1 weighting (T1), T1-weighted contrast- blocks and perform the patch operation. Each case gen- enhanced (T1-CE), and T2 weighting (T2), fluid attenu - erates 175 images with a size of 128 × 128 × 64. Finally, ation inversion recovery (FLAIR). The mask of the brain save it in the corresponding folder in NumPy.npy format tumor includes the gangrene area, edema area, and (https:// numpy. org/ doc/ stable/ refer ence/). The preproc - enhancement area. Our task is to segment the three sub- essed image is shown in Fig. 7. regions formed by nesting tags, which are enhancement tumor (ET), whole tumor (WT), and tumor core (TC). Evaluation metrics There are 369 cases in the training set and 125 cases We use the dice coefficient, specificity, sensitivity, and in the validation set. 
Evaluation metrics

We use the Dice coefficient, specificity, sensitivity, and Hausdorff95 distance to measure the performance of our model. The Dice coefficient is calculated as:

$Dice = \frac{2TP}{FN + FP + 2TP}$  (16)

where $TP$, $FP$, and $FN$ are the numbers of true positive, false positive, and false negative voxels, respectively. Specificity evaluates the numbers of true negatives and false positives; it measures the model's ability to predict the background area and is defined as:

$Specificity = \frac{TN}{TN + FP}$  (17)

where $TN$ is the number of true negatives. Sensitivity evaluates the numbers of true positives and false negatives; it measures the sensitivity of the model to the segmented regions and is defined as:

$Sensitivity = \frac{TP}{TP + FN}$  (18)

The Hausdorff95 distance measures the distance between the surface of the real region and that of the predicted region and is more sensitive to the segmentation boundary; it is defined as:

$Haus95(T, P) = \max \left\{ \sup_{t \in T} \inf_{p \in P} d(t, p), \; \sup_{p \in P} \inf_{t \in T} d(t, p) \right\}$  (19)

where $\inf$ denotes the infimum and $\sup$ the supremum, $t$ and $p$ denote points on the surface $T$ of the ground-truth region and the surface $P$ of the predicted region, and $d(\cdot, \cdot)$ calculates the distance between the points $t$ and $p$.
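The voxel-count forms of formulas (16)-(18) translate directly into code. The sketch below assumes binary masks for a single tumor sub-region; the Hausdorff95 distance of formula (19) is omitted since it requires surface-distance computation.

```python
# Voxel-count implementations of formulas (16)-(18) for one binary sub-region.
import numpy as np

def confusion_counts(pred: np.ndarray, truth: np.ndarray):
    """pred and truth are binary masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(pred & truth))    # true positives
    fp = int(np.sum(pred & ~truth))   # false positives
    fn = int(np.sum(~pred & truth))   # false negatives
    tn = int(np.sum(~pred & ~truth))  # true negatives
    return tp, fp, fn, tn

def dice(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (fn + fp + 2 * tp)  # formula (16)

def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)               # formula (17)

def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)               # formula (18)
```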
Results and discussions

Results on AGSE-VNet model

Our data includes a training set and a test set: the training set contains 369 cases and the test set contains 125 cases. The tumor mask includes the gangrene area, the edema area, the enhancement area, and the background, whose labels correspond to 1, 2, 4, and 0, respectively. These labels are merged into three nested sub-regions, namely the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC); for these sub-regions we use the four indicators of sensitivity, specificity, Dice coefficient, and Hausdorff95 distance to measure the performance of the model. We used the BraTS 2020 dataset for training and validation, and the average indices obtained are shown in Table 1. From Table 1, we observe that the model has a better segmentation effect on the WT region: the Dice and Sensitivity values of the training set and the validation set are 0.846, 0.849 and 0.825, 0.833, respectively, significantly better than those of the other regions.

Table 1 Quantitative evaluation on the training set and validation set

             Dice               Sensitivity        Specificity        Hausdorff95
             ET    WT    TC     ET    WT    TC     ET    WT    TC     ET     WT    TC
Training     0.70  0.85  0.77   0.72  0.83  0.74   0.99  0.99  0.99   35.70  8.96  17.40
Validation   0.68  0.85  0.69   0.68  0.83  0.65   0.99  0.99  0.99   47.40  8.44  31.60

On this basis, we conduct a statistical analysis of the experimental results. Figures 8 and 9 are the scatter plots and box plots of the four evaluation indicators for the training set and the validation set, reflecting the distribution characteristics of the results. It can be seen from the box plots that there are few outliers for the various indicators and minimal fluctuation in the results; the horizontal line in each box represents the median of that set of data. The three indicators of Dice, Sensitivity, and Specificity all lie at high levels, which shows that the segmentation performance of our proposed model sits in a high range, and the sensitivity results in particular are concentrated at a high level with a small fluctuation range. Observing the scatter plots, the data are clustered at high values, indicating that our model predicts the background area at a high level and can effectively alleviate the imbalance between foreground and background pixels.

Fig. 8 A collection of scatter plots and box plots of four indicators in the training set

Fig. 9 A collection of scatter plots and box plots of four indicators in the validation set

We randomly selected several slices from the training set and compared the ground truth with the results predicted by our model, as shown in Fig. 10a: the first row is the original image, the second row is the label, and the third row is the tumor sub-regions predicted by our model. At the same time, we selected two of them to display in Fig. 10b, where the green area is the whole tumor (WT), the red area is the tumor core (TC), and the yellow and red areas combined represent the enhancing tumor (ET). We show the 3D rendering of the segmentation results in the last two columns. From the comparison of the segmentation results, we find that our model works well on brain tumor segmentation; in particular, the segmentation of the whole tumor (WT) region is excellent. However, the segmentation prediction of the tumor core (TC) is slightly biased, possibly because the small features of the tumor core are not easily extracted.

Fig. 10 Display of segmentation results in the training set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering
segmentation. edu/ BraTS 20/ lboar dTrai ning. html). The comparison results of the training set are shown in Table  2, and the Discussion comparison results of the verification set are shown in The method proposed in this paper cleverly solves the Table  3. From the results in the table, we can find that problem of interdependence between channels, and Guan et al. BMC Medical Imaging (2022) 22:6 Page 14 of 18 Fig. 11 Display of segmentation results in the validation set. a Example segmentation results in 2D. b Example segmentation results with 3D rendering autonomously extracts effective features from channels and segmented, and the segmentation effect obtained to suppress useless feature channels. After the features has a good performance. This is beneficial to radiologists extracted by the encoder, low-resolution feature maps and oncologists, who can quickly predict the condition and high-resolution feature maps are filtered through of the tumor and assist in the treatment of the patient. the Attention module, to recover spatial information and Comparing the results in Tables 2 and 3, we find that our fusion structural information from feature maps of dif- model performs well in the whole tumor (WT) area, but ferent resolutions, our method is not affected by the size does not perform well in the enhancing tumor (ET) and and location of the tumor. For MRI images of different the tumor core (TC) areas, this may be because the tar- intensities, the tumor area can be automatically identi- get in the ET area is small and the feature is fuzzy and fied, and the tumor sub-regions can be feature extracted difficult to extract. At the same time, we compare our Guan  et al. BMC Medical Imaging (2022) 22:6 Page 15 of 18 Table 2 The results of various indicators in the training set Team Dice Sensitivity Specificity Hausdorff95 ET WT TC ET WT TC ET WT TC ET WT TC Proposed 0.70 0.85 0.77 0.72 0.83 0.74 0.99 0.99 0.99 35.70 8.96 17.40 mpstanford 0.60 0.78 0.72 0.56 0.80 0.75 0.99 0.99 0.99 35.95 17.68 17.21 agussa 0.67 0.87 0.79 0.69 0.87 0.82 0.99 0.99 0.99 39.25 15.75 17.05 ovgu_seg 0.65 0.81 0.75 0.72 0.78 0.76 0.99 0.99 0.99 34.79 9.50 8.93 AI-Strollers 0.59 0.73 0.61 0.52 0.73 0.64 0.99 0.97 0.98 38.87 20.81 24.22 uran 0.48 0.79 0.64 0.45 0.74 0.61 0.99 0.99 0.99 37.92 7.72 14.07 CBICA 0.54 0.78 0.57 0.64 0.82 0.53 0.99 0.99 0.99 20.00 46.30 39.60 unet3d-sz 0.69 0.81 0.75 0.77 0.93 0.83 0.99 0.96 0.98 37.71 19.57 18.36 iris 0.76 0.88 0.81 0.78 0.90 0.83 0.99 0.99 0.99 32.30 18.07 14.70 VuongHN 0.74 0.81 0.82 0.84 0.98 0.84 0.95 0.93 0.99 21.97 12.32 8.72 Table 3 The results of various indicators in the validation set Team Dice Sensitivity Specificity Hausdorff95 ET WT TC ET WT TC ET WT TC ET WT TC Proposed 0.68 0.85 0.69 0.68 0.83 0.65 0.99 0.99 0.99 47.40 8.44 31.60 mpstanford 0.49 0.72 0.62 0.49 0.81 0.69 0.99 0.99 0.99 61.89 26.00 28.02 agussa 0.59 0.83 0.69 0.60 0.87 0.71 0.99 0.99 .0.99 56.58 23.23 29.59 ovgu_seg 0.60 0.79 0.68 0.66 0.79 0.67 0.99 0.99 0.99 54.07 12.05 19.10 AI-Strollers 0.58 0.74 0.61 0.52 0.77 0.62 0.99 0.99 0.99 47.23 24.03 31.54 uran 0.75 0.88 0.76 0.77 0.85 0.71 0.99 0.99 0.99 36.42 6.62 19.30 CBICA 0.63 0.82 0.67 0.76 0.78 0.75 0.99 0.99 0.99 9.60 10.70 28.20 unet3d-sz 0.70 0.84 0.72 0.71 0.87 0.79 0.99 0.99 0.99 42.09 10.48 12.32 iris 0.68 0.86 0.73 0.67 0.90 0.70 0.99 0.99 0.99 44.13 23.87 20.02 VuongHN 0.79 0.90 0.83 0.80 0.89 0.80 0.99 0.99 0.99 21.43 6.74 7.05 Table 4 Comparison of our proposed AGSE-VNet model with the method proposed by Zhao et al., a new 
segmentation classic methods framework was developed, using a fully convolutional neural network to assign different labels to the image Method Dice_ET Dice_WT Dice_TC Dataset in pixel units, optimize the output results of FCNNs by Proposed 0.67 0.85 0.69 BraTs 2020 using the recurrent neural network constructed by the Zhou et al 0.65 0.87 0.75 BraTs 2018 conditional random place, this method was verified on Zhao et al 0.62 0.84 0.73 BraTs 2016 the BraTS 2016 dataset and got a good segmentation Pereira et al 0.65 0.78 0.75 BraTs 2015 effect. Pereira et  al. proposed an automatic position - ing method for convolutional neural networks, which achieved good results in the BraTS 2015 dataset. Analyzing Table 4, we found that our model has certain method with some classic algorithms for brain tumor advantages in segmentation, there are still differences segmentation. The results are shown in Table  4. In the in TC regional accuracy, and the model has limitations. BraTS Challenge, 2018, zhou et  al. [32] and others pro- In future work, we will propose solutions to this situa- posed a lightweight one-step multi-task segmentation tion, such as how to further segment the region of inter- model, by learning the shared parameters of joint fea- est after our model has extracted it, in order to improve tures and the composition features of distinguishing the accuracy of the enhancing tumor (ET) and the tumor specific task parameters, the imbalance factors of tumor core (TC) areas, more characteristic information can be types are effectively alleviated, uncertain information is captured. Besides, the algorithms proposed in many top suppressed, and the segmentation result is improved. In methods have their areas of excellent performance. How Guan et al. BMC Medical Imaging (2022) 22:6 Page 16 of 18 we combine the advantages of these algorithms and inte- input ratio and output ratio are unchanged. After the grate them into our model is the focus of our future work. SE module processes the model, the network learns the In clinical treatment, it helps experts to understand the global information and selects the useful information patient’s current situation more quickly and accurately, in the enhancement channel, and then uses the atten- saving experts time, and realizing a leap in the quality of tion mechanism of the Attention Guild Filter block to automatic medical segmentation. quickly capture its dependencies and enhance the per- In addition, in order to verify the robustness of our formance of the model. Secondly, we also introduced model to resist noise interference, we have now added a new loss function Categorical_Dice, set different Gaussian noises in the frequency domain (k-space) of the weights for unused masks, set the weight of the back- testing data to simulate realistic noise contaminations. ground area to 0.1, and set the tumor area of interest The comparison results are shown in Fig.  12. From the to 1, Ingeniously solve the problem of the voxel imbal- noisy and no-noise segmentation results, we have found ance between the foreground and the background. Our that the segmentation results of our AGSE-VNet model online verification tool on the BraTS Challenge website for the three regions are not much different. These results evaluated this approach. It is found that our model is can demonstrate that our model has a significant advan - still different from the top methods for the segmenta - tage in generalization when noises are present. 
Conclusion

All in all, we have implemented a good method to segment 3D MRI brain tumor images; this method can automatically segment the three regions of the enhancing tumor (ET), the whole tumor (WT), and the tumor core (TC). We conducted experiments on the BraTS 2020 dataset and obtained good results. The AGSE-VNet model is improved based on VNet: there are five encoder blocks and four decoder blocks, each encoder block has a squeeze-and-excitation block, and each decoder has an Attention Guided Filter block. Such a design can be embedded in our model without causing size mismatches in the network structure, since the input and output ratios remain unchanged. After the SE module processes the features, the network learns the global information and selects the useful information in the enhanced channels, and then uses the attention mechanism of the Attention Guided Filter block to quickly capture dependencies and enhance the performance of the model. Secondly, we also introduced a new loss function, Categorical_Dice, setting different weights for the different masks: the weight of the background area is set to 0.1 and that of the tumor areas of interest to 1, ingeniously solving the problem of voxel imbalance between the foreground and the background. We evaluated this approach with the online verification tool of the BraTS Challenge website and found that our model still differs from the top methods in the segmentation of the enhancing tumor (ET) and tumor core (TC) regions, which may be because the features of these two regions are small and difficult to extract; how to improve the accuracy of these two regions is our future work direction.

The automatic segmentation of brain tumors in the medical field has been a long-term research problem. How to design an automatic segmentation algorithm with short runtime and high accuracy, and then form a complete system, is the current direction of a large number of researchers; therefore, we must continue to optimize our segmentation model to achieve a qualitative leap in the field of automatic segmentation.

Authors' contributions

XG, GY, and XL conceived and designed the study, contributed to data analysis, contributed to data interpretation, and contributed to the writing of the report. XG, GY, JY, WY, XX, WJ, and XL contributed to the literature search. JY and WY contributed to data collection. XG, GY, XX, WJ, and XL performed data curation and contributed to the tables and figures. All authors read and approved the final manuscript.

Funding

This work is funded in part by the National Natural Science Foundation of China (Grants No. 62072413, 61602419), in part by the Natural Science Foundation of Zhejiang Province of China (Grant No. LY16F010008), in part by the Medical and Health Science and Technology Plan of Zhejiang Province of China (Grant No. 2019RC224), in part by the Teacher Professional Development Project of Domestic Visiting Scholars in Colleges and Universities of Zhejiang Province of China (Grants No. 2020-19, 2020-20), in part by the UK Research and Innovation Future Leaders Fellowship (MR/V023799/1), and
also supported in part by the AI for Health Imaging Award 'CHAIMELEON: Accelerating the Lab to Market Transition of AI Tools for Cancer Management' [H2020-SC1-FA-DTS-2019-1 952172].

Availability of data and materials

The datasets analysed during the current study are available in BraTS 2020: https://www.med.upenn.edu/cbica/brats2020/data.html.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
All authors contributed to the article and approved the submitted version.

Competing interests
The authors declare no competing interests.

Author details
1 School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou 310053, China. 2 Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK. 3 National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK. 4 First Affiliated Hospital, Gannan Medical University, Ganzhou 341000, China. 5 College of Life Science, Zhejiang Chinese Medical University, Hangzhou 310053, China.

Received: 16 January 2021   Accepted: 26 July 2021

References
1. Nie J, Xue Z, Liu T, Young GS, Setayesh K, Lei G. Automated brain tumor segmentation using spatial accuracy-weighted hidden Markov random field. Comput Med Imaging Graph. 2009;33(6):431–41.
2. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629. 2018.
3. Essadike A, Ouabida E, Bouzid A. Brain tumor segmentation with Vander Lugt correlator based active contour. Comput Methods Programs Biomed. 2018;60:103–17.
4. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18–31.
5. Akkus Z, Galimzianova A, Hoogi A, Daniel R. Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imaging. 2017;30(4):449–59.
6. Hussain S, Anwar S, Majid M. Segmentation of glioma tumors in brain using deep convolutional neural network. Neurocomputing. 2017;282.
7. Sauwen N, Acou M, Cauter S, Sima DM, Veraart J, Maes F, Himmelreich U, Achten E, Van Huffel S. Comparison of unsupervised classification methods for brain tumor segmentation using multi-parametric MRI. NeuroImage Clin. 2016;12(2):753–64.
8. Milletari F, Navab N, Ahmadi S. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). IEEE; 2016.
9. Rickmann A, Roy A, Sarasua I, Navab N, Wachinger C. 'Project & Excite' modules for segmentation of volumetric medical scans. Image Video Processing.
10. Tustison N, Shrinidhi K, Wintermark M, Durst CR, Kandel BM, Gee JC, Grossman MC, Avants BB. Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics. 2015;13(2):209–25.
11. Rose S, Crozier S, Bourgeat P, Dowson N, Salvado O, Raniga P, Pannek K, Coulthard A, Fay M, Thomas P. Improved delineation of brain tumour margins using whole-brain track-density mapping. In: ISMRM-ESMRMB joint meeting: clinical needs & technological solutions. International Society for Magnetic Resonance in Medicine; 2009.
12. Amiri S, Mahjoub MA, Rekik I. Bayesian network and structured random forest cooperative deep learning for automatic multi-label brain tumor segmentation. In: 10th international conference on agents and artificial intelligence; 2018.
13. Balafar M. Fuzzy c-mean based brain MRI segmentation algorithms. Artif Intell Rev. 2014;41(3):441–9.
14. Pereira S, Pinto A, Alves V. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35(5):1240–51.
15. Hao D, Yang G, Liu F, Mo Y, Guo Y. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In: Annual conference on medical image understanding and analysis. Springer, Cham.
16. Wang G, Li W, Ourselin S, Vercauteren T. Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. Comput Vis Pattern Recognit. 2017;12(5).
17. Myronenko A. 3D MRI brain tumor segmentation using autoencoder regularization. Berlin: Springer; 2018.
18. Xue F, Nicholas T, Meyer C. Brain tumor segmentation using an ensemble of 3D U-nets and overall survival prediction using radiomic features. Comput Vis Pattern Recognit. 2018;279–288.
19. Ibtehaz N, Rahman MS. MultiResUNet: rethinking the U-net architecture for multimodal biomedical image segmentation. Comput Vis Pattern Recognit. 2019;121.
20. Xu C, Xu L, Ohorodnyk P, Roth M, Li M. Contrast agent-free synthesis and segmentation of ischemic heart disease images using progressive sequential causal GANs. Med Image Anal. 2020;101668.
21. Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X. ERV-Net: an efficient 3D residual neural network for brain tumor segmentation. Expert Syst Appl. 2021;170.
22. Saman S, Narayanan S. Active contour model driven by optimized energy functionals for MR brain tumor segmentation with intensity inhomogeneity correction. Multimedia Tools Appl. 2021;80(4):21925–54.
23. Liu H, Li Q, Wang L. A deep-learning model with learnable group convolution and deep supervision for brain tumor segmentation. Math Probl Eng. 2021;3:1–11.
24. Yurttakal A, Erbay H. Segmentation of larynx histopathology images via convolutional neural networks. In: Intelligent and fuzzy techniques: smart and innovative solutions. 2021;949–954.
25. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal. 2018;43:98–111.
26. Sturm D, Pfister S, Dtw J. Pediatric gliomas: current concepts on diagnosis, biology, and clinical management. J Clin Oncol. 2017;35(21):2370.
27. Hu J, Li S, Albanie S, Sun G. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell. 2017;99.
28. Zhang S, Fu H, Yan Y, Zhang Y, Wu Q, Tan M, Xu Y. Attention guided network for retinal image segmentation. In: Medical image computing and computer assisted intervention (MICCAI 2019); 2019.
29. He K, Sun J, Tang X. Guided image filtering. Lect Notes Comput Sci. 2013;35(6):1397–409.
30. Menze B, Jakab A, Bauer S, Jayashree KC, Keyvan F, Justin K. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024.
31. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nat Sci Data. 2017;4:170117. https://doi.org/10.1038/sdata.2017.117.
32. Zhou C, Ding C, Wang X, Lu Z, Tao D. One-pass multi-task networks with cross-task guided attention for brain tumor segmentation. Computer Vision and Pattern Recognition. 2019.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
