MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

Yang Liu; Lu Meng; Jianping Zhong

doi:10.1155/2021/6675259

MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

Liu, Yang;Meng, Lu;Zhong, Jianping 2021-01-31 00:00:00 Hindawi Journal of Healthcare Engineering Volume 2021, Article ID 6675259, 11 pages https://doi.org/10.1155/2021/6675259 Research Article MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis 1 2 2 Yang Liu, Lu Meng , and Jianping Zhong Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110000, China College of Information Science and Engineering, Northeastern University, Shenyang 110000, China Correspondence should be addressed to Lu Meng; menglu1982@gmail.com Received 8 December 2020; Revised 10 January 2021; Accepted 20 January 2021; Published 31 January 2021 Academic Editor: Jialin Peng Copyright © 2021 Yang Liu et al. (is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For deep learning, the size of the dataset greatly aﬀects the ﬁnal training eﬀect. However, in the ﬁeld of computer-aided diagnosis, medical image datasets are often limited and even scarce. We aim to synthesize medical images and enlarge the size of the medical image dataset. In the present study, we synthesized the liver CT images with a tumor based on the mask attention generative adversarial network (MAGAN). We masked the pixels of the liver tumor in the image as the attention map. And both the original image and attention map were loaded into the generator network to obtain the synthesized images. (en, the original images, the attention map, and the synthesized images were all loaded into the discriminator network to determine if the synthesized images were real or fake. Finally, we can use the generator network to synthesize liver CT images with a tumor. (e experiments showed that our method outperformed the other state-of-the-art methods and can achieve a mean peak signal-to-noise ratio (PSNR) of 64.72 dB. All these results indicated that our method can synthesize liver CT images with a tumor and build a large medical image dataset, which may facilitate the progress of medical image analysis and computer-aided diagnosis. An earlier version of our study has been presented as a preprint in the following link: https://www.researchsquare.com/article/rs-41685/v1. colleagues synthesized the images of skin lesions with GAN 1. Introduction [4], which enlarged the skin image dataset and improved the Medical image analysis and processing is the core of performance of lesion segmentation. For liver CT images, computer-aided diagnosis, which has been greatly prompted GAN was mainly used for expanding the dataset of the liver by deep learning. And the training of deep learning can be lesion [5] or image denoising [6], but the focus of GAN was extensively inﬂuenced by the size of the dataset; that is, the only on the liver lesion, not on the whole liver CT images. more datasets can be obtained, the better the performance For brain images [7], there are many image modules, such as the trained deep learning model can achieve. However, in CT images, magnetic resonance (MR) images, and positron the ﬁeld of computer-aided diagnosis, the medical image is emission tomography (PET), and diﬀerent modules have very limited and even scarce, due to the privacy of patients, diﬀerent image acquisition methods and diﬀerent inﬂuences on human brains. Dong Nie and colleagues used GAN to the expense of medical image acquisition, and so on. (erefore, synthesized medical images can be seen as the synthesize 7T images from 3T MR images [8] because 7T only feasible way to solve this problem, and generative magnetic resonance (MR) images were very rare due to the adversarial networks (GAN) [1, 2] provide us a powerful tool expensive image acquisition costs and the side eﬀects of high to realize it. magnetic ﬁeld strength. Moreover, some studies proposed to GAN was ﬁrstly proposed by Goodfellow and colleagues train a GAN to generate CT images from MR images to avoid in 2014 and was widely used in various ﬁelds, such as image the radiation from the CT image acquisition [9, 10]. For processing, natural language processing, and even medical retinal images, the image resolutions were generally smaller image synthesis [3]. For skin lesion images, Baur and than 100 × 100, and the image contents were only limited to 2 Journal of Healthcare Engineering single color background and vessels. Based on the charac- success of the proposed algorithm. In the procedure of image teristics, some studies [11] used GAN to synthesize the whole synthesis, the liver tumor was the saliency map in the whole retinal image to enlarge the retinal image dataset, but the liver CT image, which meant that all the pixels of the liver method cannot be generalized to other medical image tumors were masked by the attention map. (e original modules with bigger image resolution and more organs, image and the attention map were paired together and called such as liver CT image or brain MR image. “pairing A.” (en, the original image and the attention map Above all, all these medical image synthesis methods can were loaded into the generator network to obtain a syn- be categorized into three types: (1) transformation of dif- thesized image, and the attention map and the synthesized ferent modules, such as from CT images to MR images, (2) image were paired together and called “pairing B.” Next, transformation between the diﬀerent parameter of image pairing A and pairing B were both loaded into the dis- acquisition, such as from 3T MR images to 7T MR images, criminator network to determine if the synthesized image and (3) image synthesis of the small resolution, such as skin was real or fake. (e generator network and the discrimi- and retinal images. Although there were many existing nator network were trained with adversarial learning so that methods, medical image synthesis is far from clinical ap- both of them can become more and more powerful. After plications, since there are still some shortcoming. training, the generator network can ﬁll the pixels of the attention map with similar gray values, texture, and shape of liver tumors, to synthesize liver CT images with tumors. 1.1. Image Resolution. Many current medical image syn- More details of our model can be obtained from Sections thesis methods can only synthesize images with low reso- 2.1∼2.3. lution, which were lower than 128 × 128. However, most of the medical images in the clinical application were high image resolution, such as 512 × 512 CT images and 512 × 512 2.1. Attention Model. All liver CT images used in our MR images. method were from a public liver CT dataset, Liver Tumor Segmentation (LiTS) [13, 14], which was from the MICCAI 1.2. Lesions or Tumors. (e current existing medical image 2017 competition. In the LiTS dataset, the pixel distance was synthesis methods cannot synthesize images with abnor- from 0.55 mm to 1.0 mm, the slice spacing was from malities, such as liver lesions and liver tumors. As we know, 0.45 mm to 6.0 mm, and the image resolution was 512 × 512. the size and variety of the training dataset are essential to the LiTS consisted of 131 enhanced CT image sequences, and all performance of deep learning methods. During the training the tumors in the liver CT images were manually labeled by of medical images’ classiﬁcation and analysis, it was essential radiologists. We aimed to synthesize liver CT images with to have both normal images and abnormal images to create tumors, and the synthesized materials were from two as- an eﬀective data set, but the medical images with abnor- pects, liver CT images from healthy controls and liver tumor malities were relatively rare due to the hospital policy, pa- CT images from patients. Moreover, the liver tumor was the tients’ privacy, and so on. (erefore, synthesizing medical most salient region for clinicians and was also the most images with abnormalities can enlarge the dataset of deep diﬃcult part of the whole synthesis procedure. (erefore, learning methods and upgrade the performance. according to the tumor labels from the LiTS dataset, the To solve the shortcomings, we proposed a novel image image values of all the corresponding pixels in the tumors synthesis model for normal liver CT images and liver CT were changed to 4096, which meant “white color,” and images with tumors based on mask attention generative represented as an attention map in our model. Based on the adversarial network (MAGAN). Using this model, we can attention mechanism, the original image and the attention build a liver CT image dataset consisting of thousands of map were transformed into feature maps A and B by using synthesized 512 × 512 slices; furthermore, it also can facili- 1 × 1 convolution, respectively, and then all these feature tate the progress of computer-aided diagnosis and the maps were concatenated by using matrix multiplication, training of deep learning models. shown in Figure 2: (e main contributions of our work are as follows: (1) we combined GAN with attention mechanism and proposed a S � A B . (1) i,j i j novel MAGAN model and (2) we proposed an eﬀective method of enlarging the existing medical image dataset. (en, we performed softmax on the concatenated feature maps S to calculate the distribution of attention D on the i,j i,j 2. Materials and Methods ith position of the jth synthetic region: In the present study, we synthesized liver CT images with tumors based on the mask attention generative adversarial exp􏼐s 􏼑 i,j D � . (2) network (MAGAN) model [12], whose framework is shown i,j 􏽐 exp􏼐s 􏼑 i�1 i,j in Figure 1. Firstly, all the pixels of liver tumors in the original image were labeled by the white color and used as (erefore, the liver tumor mask images were used as the attention map. According to the attention mechanism, attention maps to eﬃciently ﬁnd the liver tumors’ internal liver tumors were the highlighted relevant features of the CT and external characteristics of the images. images, and the attention map was also the key part of the Journal of Healthcare Engineering 3 Generator Synthesized image network Original image Discriminator Pairing B network Real/fake? Attention map Pairing A Original image Figure 1: (e framework of our model: ⊗ represents matrix multiplication. 1 × 1 Original image Feature maps A Softmax Matrix multiplication Distribution of attention value 1 × 1 Attention map Feature maps B Figure 2: (e framework of the attention model. 2.2. Generator Network. (e structure of our generator dilated convolution operator. (e feature maps from both of network is shown in Figure 3, which consisted of two the two contracting paths were ﬁrstly loaded as input to the contracting paths and an expansive path, showing the attention model, whose framework is shown in Figure 2, and U-shape architecture [15]. (e input of these two con- then the distribution of attention value was transferred via tracting paths was the original image and attention map, residual connections. In the expansive path, the spatial in- respectively; both of them consisted of nine blocks, and each formation and the feature information were combined through a sequence of upconvolutions layer, BN layer, ReLu block was composed of the ReLu layer, convolutional layer, and batch normalization (BN) layer. layer, and residual connections with high-resolution features In the contracting path, the image resolution was re- from the attention model. Residual connections played duced but the feature information was increased. To over- important roles in MAGAN, which were used to bypass the come the drawback of a regular convolution operator, whose nonlinear transformation, accelerate the training speed, and receptive ﬁeld was small, we used a dilated convolution upgrade the performance of our model in the training of the operator [16] in the ﬁrst four layers of the contracting path, deep CNN. so that we can capture image features from a larger scale. 512 × 512 original image and attention map were loaded And we used a regular convolution operator in the other ﬁve as inputs into the generator network, and the image reso- layers of the contracting path because the sizes of the images lution was reduced by half while passing each block in the were already smaller than 32 × 32, which cannot support a contracting path. After nine blocks in the contracting path, 4 Journal of Healthcare Engineering 512 × 512 × 1 512 × 512 × 1 512 × 512 × 1 Regular conv. layer Dilated conv. layer Input: Output: Input: ReLu layer original image synthesized image attention image BN layer ReLu layer Tanh layer Upsample layer 256 × 256 × 64 256 × 256 × 64 256 × 256 × 64 A = attention model 128 × 128 × 128 U 128 × 128 × 128 D 128 × 128 × 128 U 64 × 64 × 256 64 × 64 × 256 D 64 × 64 × 256 D 32 × 32 × 512 32 × 32 × 512 = D 32 × 32 × 512 D 16 × 16 × 512 R 16 × 16 × 512 16 × 16 × 512 8 × 8 × 512 U R 8 × 8 × 512 8 × 8 × 512 = 4 × 4 × 512 R 4 × 4 × 512 R 4 × 4 × 512 2 × 2 × 512 U R 2 × 2 × 512 2 × 2 × 512 U R 1 × 1 × 1024 1 × 1 × 1024 R 1 × 1 × 1024 U = Figure 3: (e framework of our generator network. the input images became 1 × 1 with 1024 feature maps. (en, blocks, and each block was composed of a convolutional these feature maps were upconvolved in the expansive path, layer, ReLu layer, BN layer, or sigmoid layer. and the size of the image increased one time while passing (e inputs of the discriminator network were two each block in the expansive path. After nine blocks in the pairings, which were pairing A (original image, attention expansive path, the image was restored as a 512 × 512 res- map) and pairing B (synthesized image, attention map). olution image. In the generator network, the whitened re- Inspired by PatchGAN [12], all the 512 × 512 resolution gions in the liver CT images can be transformed into tumor images were divided into 900 patches, whose size was regions. (e loss function of our generator network is shown 142 × 142. After going through six blocks of the discrimi- as the following formula: nator network, the sizes of output probabilities maps were 30 × 30, which indicated each pixel in the output proba- L (G) � Ε 􏼂‖r − G(v)‖ 􏼃, (3) adv v,r∼p (v,r) 1 data bilities maps corresponded to one patch of the input images. (e mean value of all the pixels in the probabilities maps can where r denotes the real image, v denotes the concatenated be recognized as the result of the discriminator network. image, and G(v) denotes the synthesized image calculated by (e loss function of our discriminator network is shown the generator network. as the following formula: 2.3. Discriminator Network. (e structure of our discrimi- nator network is shown in Figure 4, which consisted of six L (D) � E 􏼂log D(v, r) 􏼃 + E 􏼂log1 − D(v, G(v, r)) 􏼁􏼃, (4) adv v,r∼p (v,r) real v∼p (v) fake data data where r denotes the real image, v denotes the attention map, training of deep learning algorithms, such as liver tumor G(v, r) denotes the synthesized image calculated by the segmentation or classiﬁcation. To enlarge the LiTS, we chose generator network, and D(v, r) denotes the discrimination 4555 2D slices with tumors from 131 sequences of liver CT probability calculated by the discriminator network. images. (en, all the images were normalized by using the (e total loss function of our GAN is shown as the following formula: following formula: value − mean original (6) value � , normalized L � arg min max λ L (G) + λ L (D), 1 adv 2 adv (5) std G D where value and value represent the original original normalized where λ and λ are coeﬃcients. 1 2 and normalized image pixels value, respectively. Mean in- dicate the mean value of image pixels, and std indicate the standard deviation of image pixels. Moreover, we specially 3. Results cut the tumor regions from the liver CT images and built a In our experiments, we used LiTS as our image dataset of liver tumor dataset; then, we augmented the tumor dataset liver CT images with tumors, which consisted of only 131 by ﬂipping, rotating, and scaling the original tumor region so sequences. (e size of LiTS was not big enough for the that we can create a liver tumor dataset of 50000 slices from Journal of Healthcare Engineering 5 512 × 512 × 1 512 × 512 × 1 512 × 512 × 1 Original image Attention map Synthesized image Conv. layer ReLu layer Pairing A Pairing B BN layer Sigmoid layer Probabilities maps 512 × 512 × 6 Concatenation 30 × 30 × 1 256 × 256 × 64 31 × 31 × 1024 32 × 32 × 512 128 × 128 × 128 64 × 64 × 256 64 × 64 × 256 Figure 4: (e framework of our discriminator network. the original 4555 slices, which were used as the mask at- with a learning rate of 0.0002. It costs about ten hours for the tention map in our method. whole procedure of the training. (e hardware and software conﬁguration of our ex- As shown in Figures 5(a)–5(d), we can ﬁnd that, as the periments are shown in Table 1. (e quantitative evaluation number of iterations increased, the performance of the metric used in our experiments was the peak signal to noise synthesized CT liver images became better and better. After ratio (PSNR). (ere were four sections in our experiments, the ﬁrst iteration of training (in Figure 5(a)), the perfor- including training of our model, quantitative comparison mance of the synthesized image from the generator network between our method and other state-of-the-art methods, was terrible; for example, most pixels were black and the Turing test for the synthesized images by radiologists, and contour was blurring, intense chessboard eﬀect. All these the evaluation of the synthetic dataset for the medical image bad performances indicated that the training had just segmentation. started, and more iterations were needed. After ten iterations (in Figure 5(b)), the whole image was more clear, the contour was more vivid, but the chessboard eﬀect still 3.1. Training of Our Model. (e conﬁgurations of hyper- existed. After one hundred iterations (in Figure 5(c)), the parameters in our model during the training are shown in performance of the synthesized image was much better and Table 2. (e proposed MAGAN network was implemented closer to the real image, more details can be visualized, by Python 2.7 and TensorFlow 1.1 and trained on an human organs were vivid, the chessboard eﬀect was weaker NVIDIA GeForce GTX 1080 GPU using Adam optimizer but still existed, and whitened regions were not ﬁlled with 6 Journal of Healthcare Engineering Table 1: Hardware and software conﬁguration of our experiments. networks. And we found that the network without residual connections can provide a PSNR of 55.23, while the Item Conﬁguration MAGAN with residual connections can provide a PSNR of Operating system Ubuntu 16.04 64.72, which indicated the eﬀectiveness of the residual GPU NVIDIA GeForce GTX 1080 connections in our network. (e running time of the pro- CPU Intel Core i5-7500 @3.4 GHz posed method was 0.087 seconds per frame. Software toolkit Python 2.7; TensorFlow 1.1; MATLAB 2016b Besides, we can also manually or automatically “add” Disk 500 GB tumor regions on the healthy liver CT images using our liver GPU memory 8 GB System memory 16 GB tumor dataset of 50000 slices, to create a diseased liver CT image, shown in Figure 10. (e healthy liver CT images were in the ﬁrst row. In the second row, manually change the pixel Table 2: Hyperparameters of our model. values of two regions to white color, which meant that these two regions were the selected tumor regions. Using our Parameter Value method, the results of the synthesized images are shown in Initial learning rate 0.0002 the third row. All these results showed that our method can Adam momentum 0.5 intelligently create liver CT images with tumors based on the λ in formula (5) 100 healthy liver CT images, and the synthesized diseased images λ in formula (5) 1 were almost identical to the real ones. Exponential decay 0.99 Batch_size 1 Epoch 10 3.2. Quantitative Comparison. In this section, we quantita- Dropout 0.5 tively compared our method with other seven state-of-the-art Frequency of saving loss value 100 medical synthesis methods using the same dataset as ours: (1) Frequency of saving model 500 atlas-based method [17]; (2) sparse representation (SR) based method; (3) structured random forest with ACM (SRF+) [18]; (4) manipulable object synthesis (MOS) [19]; (5) deep con- tumor texture. After one thousand iterations (in volutional adversarial networks (DCAN) method [8]; (6) Figure 5(d)), the chessboard eﬀect disappeared, all details of multiconditional GAN(MC-GAN) [20]; and (7) mask em- liver CT were restored, and it was hard to tell the diﬀerences bedding in conditional GAN (ME-cGAN) [21]. (e ﬁrst four between synthesized image and real image. methods were implemented by our group, and the source (e loss function of the generator network, discrimi- codes of DCAN, MOS, and ME-cGAN were downloaded nator network, and total network during the training is from GitHub (http://www.github.com/ginobilinie/ shown in Figures 6–8, respectively, and we can conclude that medSynthesis, http://www.github.com/HYOJINPARK/ the loss functions decreased as the number of iterations MC_GAN, and http://www.github.com/johnryh/ increased and became steady after about 10000 iterations, Face_Embedding_GAN). (e results of the quantitative which indicated that our model performed well during the comparison are shown in Table 3, which indicate that our training. method outperformed the other seven approaches and Results of the synthesized image are shown in Figure 9: beneﬁted from attention mechanism, dilated convolution three liver tumor images with tumor masks were in the ﬁrst operator, and residual connections. row, which was used as inputs of our model, and we can obtain the synthesized images in the second row. We compared the synthesized images and the real images and 3.3. Turing Test. To further verify the eﬀectiveness of our calculated the diﬀerences between them. (e color image of method, we did the Turing test. Two experienced radiologists the diﬀerences is shown in the fourth row. All these results from Shengjing Hospital of China Medical University were showed that our method can synthesize liver CT images with asked to classify one hundred liver CT images into two types: tumors, and the synthesized images were almost identical to real image or synthesized image. (e radiologists were not the real images. aware of the answer to each image before the Turing test. (e To test the impact of the dilated convolution operators in one hundred liver CT images consisted of ﬁfty real CT the MAGAN, we replaced the dilated convolution operators images and ﬁfty synthesized images. (e results of the Turing with the regular convolution operators in the contracting test are shown in Table 4: radiologist number 1 made correct path of the generator network and quantitatively compared judgments for 74% real image slices and 64% synthesized the PSNR of these two GAN networks. And we found that image slices and radiologist number 2 made correct judg- the network with regular convolution operators can provide ments for 84% real image slices and 12% synthesized image a PSNR of 59.66, while the MAGAN with dilated convo- slices. (e radiologists made correct judgments for most of lution operators can provide a PSNR of 64.72, which in- the real images and may be psychologically inﬂuenced by the dicated the eﬀectiveness of the dilated convolution operators existence of a synthesized image, so they made some errors in our network. about the real images. Furthermore, the radiologists made To test the impact of the residual connections in the diﬃcult judgments for the synthesized images and cannot MAGAN, we removed the residual connections and tell the obvious diﬀerences between the real images and the quantitatively compared the PSNR of these two GAN synthesized images. And according to radiologist #1, his Journal of Healthcare Engineering 7 (a) (b) (c) (d) Figure 5: Synthesized image during the training of the proposed model: (a) after one iteration of training, (b) after ten iterations of training, (c) after one hundred iterations of training, (d) after one thousand iterations of training. 0.045 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Step ×10 Figure 6: (e loss function of the generator network during the training. Training loss 8 Journal of Healthcare Engineering 1.6 1.5 1.4 1.3 1.2 1.1 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Step ×10 Figure 7: (e loss function of the discriminator network during the training. 5.5 4.5 3.5 2.5 1.5 0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ×10 Step Figure 8: (e loss function of the total network during the training. most reliable evidence of telling the diﬀerence was the color and the FCN model trained by a new dataset can provide a of the tumor region was a little darker than the real ones, Dice value of 0.658 for the tumor segmentation. (e result which was also the improvement we needed to do in the indicated that the synthesized liver CT images obtained by future. All these results of the Turing test indicated that our the proposed method can eﬀectively enlarge the original method can synthesize liver CT images with a tumor, which dataset, and as the number of images in the dataset in- creased, the performance of the training of the deep learning were almost identical to the real ones. model can become better, which resulted in the higher Dice value for the liver tumor segmentation. 3.4. Evaluation of Synthetic Dataset for Medical Image Segmentation. To evaluate the eﬀectiveness of the synthetic 4. Discussion dataset in the training of deep learning models, we used a fully connected network (FCN) [15] to perform the tumor In the present study, we combined the attention mechanism segmentation task in the liver CT images and trained the and GAN model and proposed a novel CT image synthesis FCN model using the LiTS dataset (images from 131 sub- algorithm, which was MAGAN. As far as we know, the jects) and the new dataset obtained by our method (images existing medical image synthesis methods mainly focused on from 131 real subjects and 865 synthetic subjects). And we the transformation of diﬀerent modules or transformation used the Dice Index to quantitatively evaluate the perfor- between the diﬀerent parameter of image acquisition, and mance of the segmentation results from the two trained FCN our study was the ﬁrst research of synthesizing the liver CT models. (e FCN model trained by the LiTS dataset can images with tumors in high resolution and enlarging the size provide a Dice value of 0.611 for the tumor segmentation, of the medical image dataset. Training loss Training loss Journal of Healthcare Engineering 9 Liver CT images with tumors Synthesized images Real images Differences between synthesized and real images –400 –300 –200 –100 0 100 200 300 400 Figure 9: Results of the synthesized images and the comparison between the synthesized images and real images. (e pixel values of the fourth rows are weak and low because the diﬀerences between the real images and synthesized images were very small. Suppose that we had a dataset of chest CT images with method outperforms the others, and the main reasons were lung nodules, whose size was one hundred. While we used the attention map, which mainly focused on the regions of interest in the medical images, such as liver tumors or lung this dataset for the training of deep learning, we may ﬁnd that the trained model was not good enough due to the small nodules. size of the dataset. Under these circumstances, the proposed During the Turing test, two experienced radiologists MAGAN can be used to synthesize thousands of chest CT cannot clearly distinguish the synthesized liver CT images images with lung nodules based on the original one hundred and the real liver CT images. We used the judgments of images. (is kind of similar requirements from clinical experts as the golden standard, and we may conclude that researches and deep learning studies is very common. And the synthesized liver CT images with tumors can be used as the proposed method can meet the requirements. the real ones, and the size of the training dataset of medical From the quantitative comparison between the proposed images can be enlarged from one hundred to thousands. (e method and the other seven state-of-the-art medical image bigger the medical image dataset is, the better the training synthesized methods, we can conclude that the proposed performance can be. 10 Journal of Healthcare Engineering Healthy liver CT images Whitened regions were added in the liver Results of synthesized images Figure 10: Adding tumor regions on the healthy liver CT images and synthesizing diseased liver CT images using our method. Table 3: (e quantitative comparison between our method and seven other approaches. Method Atlas [17] SR SRF+ [18] MOS [19] DCAN [8] MC-GAN [20] ME-cGAN [21] Our method Mean PSNR(dB) 45.15 49.77 55.30 60.11 58.26 59.29 61.35 64.72 Table 4: (e Turing test of our method. Real image (50 slices) Synthesized image (50 slices) Be judged as real Be judged as synthesized Be judged as real Be judged as synthesized images images images images Radiologist number 37 13 18 32 Radiologist number 42 8 44 6 these results meant that, using our method, we can build a 5. Conclusions huge medical image dataset to facilitate the diagnosis of In the present study, we proposed a method of synthesizing computer-aided diagnosis and the training of deep learning. liver CT images with tumors based on mask attention generative adversarial networks. (e experimental results Data Availability showed that our method outperformed the other seven widely used approaches and can achieve 64.72 db mean Liver CT images used in our method were from a public liver PSNR, and the Turing test indicated that even the experi- CT dataset, which is Liver Tumor Segmentation (LiTS), and enced radiologists cannot tell the diﬀerences between the the data can be obtained from https://academictorrents. synthesized images from our method and the real ones. All com/details/27772adef6f563a1ecc0ae19a528b956e6c803ce. Journal of Healthcare Engineering 11 Lecture Notes in Computer Science, Springer, vol. 9351, Conflicts of Interest pp. 234–241, Berlin, Germany, 2015. [16] F. Yu and V. Koltun, “Multi-scale context aggregation by (e authors declare that they have no conﬂicts of interest. dilated convolutions,” in Proceedings of the International Conference of Learning Representations (ICLR), Princeton, NJ, Acknowledgments USA, May 2016. [17] T. Vercauteren, “Diﬀeomorphic demons: eﬃcient Non- (is research was funded by the National key Research and parametric Image Registration,” Neuroimage, vol. 45, Development Project (2018YFB2003200), National Natural no. S61–S72, p. 1, 2009. Science Foundation of China (61973058), and Fundamental [18] T. Huynh, Y. Gao, J. Kang et al., “Estimating ct image from Research Funds for the Central Universities (N2004020). MRI data using structured random forest and auto-context model,” IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 174–183, 2016. References [19] S. Liu, E. Gibson, S. Grbic, Z. Xu, A. A. Arnaud et al., “Decompose to manipulate: manipulable object synthesis in [1] Y. Wang, L. Zhou, B. Yu et al., “3D auto-context-based locality 3D medical images with structured image decomposition,” adaptive multi-modality GANs for PET synthesis,” IEEE 2019, https://arxiv.org/abs/1812.01737. Transactions on Medical Imaging, vol. 38, no. 6, pp. 1328– [20] H. Park, Y.J. Yoo, and N. Kwak, “MC-GAN: Multi- 1339, 2019. conditional generative adversarial network for image syn- [2] Y. Wang, B. Yu, L. Wang et al., “3D conditional generative thesis,” 2018, https://arxiv.org/abs/1805.01123. adversarial networks for high-quality PET image estimation at [21] Y. Ren, Z. Zhu, Y. Li, and J. Lo, “Mask embedding in con- low dose,” Neuroimage, vol. 174, pp. 550–562, 2018. ditional GAN for guided synthesis of high resolution images,” [3] http://www.vlfeat.org/matconvnet/pretrained. 2019, https://arxiv.org/abs/1907.01710. [4] C. Baur, S. Albarqouni, and N. Navab, “Generating highly realistic images of skin lesions with GANs,” Lecture Notes in Computer Science, Springer, Berlin, Germany, pp. 260–267, [5] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classiﬁcation,” Neurocomputing, vol. 321, pp. 321–331, 2018. [6] Q. Yang, P. Yan, Y. Zhang et al., “Low-dose CT image denoising using a generative adversarial network with was- serstein distance and perceptual loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018. [7] L. Sun, J. Wang, Y. Huang, X. Ding, H. Greenspan, and J. Paisley, “An adversarial learning approach to medical image synthesis for lesion detection,” 2019, https://arxiv.org/abs/ 1810.10850. [8] D. Nie, R. Trullo, J. Lian et al., “Medical image synthesis with deep convolutional adversarial networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 12, pp. 2720–2730, [9] J. M. Wolterink, A. M. Dinkla, M. H. F. Savenije, P. R. Seevinck, C. A. T. Berg, and I. Iˇ sgum, “Deep MR to CT synthesis using unpaired data,” in Proceedings of the 2017 International Workshop on Simulation and Synthesis in Medical Imaging, Quebec City, Canada, 2017. [10] C.-B. Jin, H. Kim, M. Liu et al., “Deep CT to MR synthesis using paired and unpaired data,” Sensors, vol. 19, no. 10, p. 2361, 2019. [11] P. Costa, A. Galdran, M. I. Meyer et al., “End-to-end adversarial retinal image synthesis,” IEEE Transactions on Medical Imaging, vol. 37, no. 3, pp. 781–791, 2018. [12] P. Isola, J. Zhu, and T. Zhou, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976, Honolulu, HI, USA, July 2017. [13] https://competitions.codalab.org/competitions/17094#learn_ the_details-overview. [14] P. Bilic, P. F. Christ, E. Vorontsov, and G. Chlebbus, “(e liver tumor segmentation benchmark (LiTS),” 2018, https://arxiv. org/abs/1901.04056. [15] O. Ronneberger, P. Fischer, T. Brox, and U-Net, “U-net: convolutional networks for biomedical image segmentation,” http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Healthcare Engineering Hindawi Publishing Corporation http://www.deepdyve.com/lp/hindawi-publishing-corporation/magan-mask-attention-generative-adversarial-network-for-liver-tumor-ct-iaeb8Z1Q0i

Loading next page...

References (21)

Patrick Bilic, P. Christ, Eugene Vorontsov, G. Chlebus, Hao Chen, Q. Dou, Chi-Wing Fu, Xiao Han, P. Heng, J. Hesser, S. Kadoury, Tomasz Konopczynski, Miao Le, Chunming Li, X. Li, Jana Lipková, J. Lowengrub, H. Meine, J. Moltz, C. Pal, M. Piraud, Xiaojuan Qi, Jin Qi, M. Rempfler, Karsten Roth, A. Schenk, A. Sekuboyina, Ping Zhou, Christian Hülsemeyer, M. Beetz, Florian Ettlinger, Felix Grün, Georgios Kaissis, F. Lohöfer, R. Braren, J. Holch, Felix Hofmann, W. Sommer, V. Heinemann, C. Jacobs, G. Mamani, B. Ginneken, G. Chartrand, A. Tang, M. Drozdzal, Avi Ben-Cohen, E. Klang, M. Amitai, E. Konen, H. Greenspan, Johan Moreau, A. Hostettler, L. Soler, R. Vivanti, Adi Szeskin, N. Lev-Cohain, J. Sosna, Leo Joskowicz, Bjoern Menze, ZENGMING SHEN (2019)
The Liver Tumor Segmentation Benchmark (LiTS)
Medical image analysis, 84
P. Costa, A. Galdran, Maria Meyer, M. Niemeijer, M. Abràmoff, A. Mendonça, A. Campilho (2018)
End-to-End Adversarial Retinal Image Synthesis
IEEE Transactions on Medical Imaging, 37
T. Huynh, Yaozong Gao, Jiayin Kang, Li Wang, Pei Zhang, J. Lian, D. Shen (2016)
Estimating CT Image From MRI Data Using Structured Random Forest and Auto-Context Model
IEEE Transactions on Medical Imaging, 35
Yinhao Ren, Zhe Zhu, Yingzhou Li, J. Lo (2019)
Mask Embedding in conditional GAN for Guided Synthesis of High Resolution Images
ArXiv, abs/1907.01710
Yan Wang, Luping Zhou, Biting Yu, Lei Wang, C. Zu, D. Lalush, Weili Lin, Xi Wu, Jiliu Zhou, D. Shen (2019)
3D Auto-Context-Based Locality Adaptive Multi-Modality GANs for PET Synthesis
IEEE Transactions on Medical Imaging, 38
Cheng-Bin Jin, Wonmo Jung, Seongsu Joo, Eunsik Park, Young Ahn, I. Han, Jae-Il Lee, X. Cui (2018)
Deep CT to MR Synthesis using Paired and Unpaired Data
ArXiv, abs/1805.10790
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei Efros (2016)
Image-to-Image Translation with Conditional Adversarial Networks
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
O. Ronneberger, P. Fischer, T. Brox (2015)
U-Net: Convolutional Networks for Biomedical Image Segmentation
ArXiv, abs/1505.04597
Hyojin Park, Y. Yoo, Nojun Kwak (2018)
MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis
ArXiv, abs/1805.01123
Qingsong Yang, Pingkun Yan, Yanbo Zhang, Hengyong Yu, Yongyi Shi, X. Mou, M. Kalra, Yi Zhang, Ling Sun, Ge Wang (2017)
Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss
IEEE Transactions on Medical Imaging, 37
Liyan Sun, Jiexiang Wang, Yue Huang, Xinghao Ding, H. Greenspan, J. Paisley (2018)
An Adversarial Learning Approach to Medical Image Synthesis for Lesion Detection
IEEE Journal of Biomedical and Health Informatics, 24
F. Yu, V. Koltun (2015)
Multi-Scale Context Aggregation by Dilated Convolutions
CoRR, abs/1511.07122
(https://competitions.codalab.org/competitions/17094#learn_the_details-overview)
https://competitions.codalab.org/competitions/17094#learn_the_details-overview
https://competitions.codalab.org/competitions/17094#learn_the_details-overview, https://competitions.codalab.org/competitions/17094#learn_the_details-overview
J. Wolterink, A. Dinkla, M. Savenije, P. Seevinck, C. Berg, I. Išgum (2017)
Deep MR to CT Synthesis Using Unpaired Data
ArXiv, abs/1708.01155
Siqi Liu, E. Gibson, Sasa Grbic, Zhoubing Xu, A. Setio, J. Yang, B. Georgescu, D. Comaniciu (2018)
Decompose to manipulate: Manipulable Object Synthesis in 3D Medical Images with Structured Image Decomposition
ArXiv, abs/1812.01737
Dong Nie, Roger Trullo, J. Lian, Li Wang, C. Petitjean, S. Ruan, Qian Wang, D. Shen (2018)
Medical Image Synthesis with Deep Convolutional Adversarial Networks
IEEE Transactions on Biomedical Engineering, 65
Maayan Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan (2018)
GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification
Neurocomputing, 321
(http://www.vlfeat.org/matconvnet/pretrained)
http://www.vlfeat.org/matconvnet/pretrained
http://www.vlfeat.org/matconvnet/pretrained, http://www.vlfeat.org/matconvnet/pretrained
Christoph Baur, Shadi Albarqouni, Nassir Navab (2018)
Generating Highly Realistic Images of Skin Lesions with GANs
Yan Wang, Biting Yu, Lei Wang, C. Zu, D. Lalush, Weili Lin, Xi Wu, Jiliu Zhou, D. Shen, Luping Zhou (2018)
3D conditional generative adversarial networks for high-quality PET image estimation at low dose
NeuroImage, 174
T. Vercauteren, X. Pennec, A. Perchant, N. Ayache (2009)
Diffeomorphic demons: Efficient non-parametric image registration
NeuroImage, 45

Publisher: Hindawi Publishing Corporation
Copyright: Copyright © 2021 Yang Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN: 2040-2295
eISSN: 2040-2309
DOI: 10.1155/2021/6675259
Publisher site: See Article on Publisher Site

Abstract

Hindawi Journal of Healthcare Engineering Volume 2021, Article ID 6675259, 11 pages https://doi.org/10.1155/2021/6675259 Research Article MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis 1 2 2 Yang Liu, Lu Meng , and Jianping Zhong Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110000, China College of Information Science and Engineering, Northeastern University, Shenyang 110000, China Correspondence should be addressed to Lu Meng; menglu1982@gmail.com Received 8 December 2020; Revised 10 January 2021; Accepted 20 January 2021; Published 31 January 2021 Academic Editor: Jialin Peng Copyright © 2021 Yang Liu et al. (is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For deep learning, the size of the dataset greatly aﬀects the ﬁnal training eﬀect. However, in the ﬁeld of computer-aided diagnosis, medical image datasets are often limited and even scarce. We aim to synthesize medical images and enlarge the size of the medical image dataset. In the present study, we synthesized the liver CT images with a tumor based on the mask attention generative adversarial network (MAGAN). We masked the pixels of the liver tumor in the image as the attention map. And both the original image and attention map were loaded into the generator network to obtain the synthesized images. (en, the original images, the attention map, and the synthesized images were all loaded into the discriminator network to determine if the synthesized images were real or fake. Finally, we can use the generator network to synthesize liver CT images with a tumor. (e experiments showed that our method outperformed the other state-of-the-art methods and can achieve a mean peak signal-to-noise ratio (PSNR) of 64.72 dB. All these results indicated that our method can synthesize liver CT images with a tumor and build a large medical image dataset, which may facilitate the progress of medical image analysis and computer-aided diagnosis. An earlier version of our study has been presented as a preprint in the following link: https://www.researchsquare.com/article/rs-41685/v1. colleagues synthesized the images of skin lesions with GAN 1. Introduction [4], which enlarged the skin image dataset and improved the Medical image analysis and processing is the core of performance of lesion segmentation. For liver CT images, computer-aided diagnosis, which has been greatly prompted GAN was mainly used for expanding the dataset of the liver by deep learning. And the training of deep learning can be lesion [5] or image denoising [6], but the focus of GAN was extensively inﬂuenced by the size of the dataset; that is, the only on the liver lesion, not on the whole liver CT images. more datasets can be obtained, the better the performance For brain images [7], there are many image modules, such as the trained deep learning model can achieve. However, in CT images, magnetic resonance (MR) images, and positron the ﬁeld of computer-aided diagnosis, the medical image is emission tomography (PET), and diﬀerent modules have very limited and even scarce, due to the privacy of patients, diﬀerent image acquisition methods and diﬀerent inﬂuences on human brains. Dong Nie and colleagues used GAN to the expense of medical image acquisition, and so on. (erefore, synthesized medical images can be seen as the synthesize 7T images from 3T MR images [8] because 7T only feasible way to solve this problem, and generative magnetic resonance (MR) images were very rare due to the adversarial networks (GAN) [1, 2] provide us a powerful tool expensive image acquisition costs and the side eﬀects of high to realize it. magnetic ﬁeld strength. Moreover, some studies proposed to GAN was ﬁrstly proposed by Goodfellow and colleagues train a GAN to generate CT images from MR images to avoid in 2014 and was widely used in various ﬁelds, such as image the radiation from the CT image acquisition [9, 10]. For processing, natural language processing, and even medical retinal images, the image resolutions were generally smaller image synthesis [3]. For skin lesion images, Baur and than 100 × 100, and the image contents were only limited to 2 Journal of Healthcare Engineering single color background and vessels. Based on the charac- success of the proposed algorithm. In the procedure of image teristics, some studies [11] used GAN to synthesize the whole synthesis, the liver tumor was the saliency map in the whole retinal image to enlarge the retinal image dataset, but the liver CT image, which meant that all the pixels of the liver method cannot be generalized to other medical image tumors were masked by the attention map. (e original modules with bigger image resolution and more organs, image and the attention map were paired together and called such as liver CT image or brain MR image. “pairing A.” (en, the original image and the attention map Above all, all these medical image synthesis methods can were loaded into the generator network to obtain a syn- be categorized into three types: (1) transformation of dif- thesized image, and the attention map and the synthesized ferent modules, such as from CT images to MR images, (2) image were paired together and called “pairing B.” Next, transformation between the diﬀerent parameter of image pairing A and pairing B were both loaded into the dis- acquisition, such as from 3T MR images to 7T MR images, criminator network to determine if the synthesized image and (3) image synthesis of the small resolution, such as skin was real or fake. (e generator network and the discrimi- and retinal images. Although there were many existing nator network were trained with adversarial learning so that methods, medical image synthesis is far from clinical ap- both of them can become more and more powerful. After plications, since there are still some shortcoming. training, the generator network can ﬁll the pixels of the attention map with similar gray values, texture, and shape of liver tumors, to synthesize liver CT images with tumors. 1.1. Image Resolution. Many current medical image syn- More details of our model can be obtained from Sections thesis methods can only synthesize images with low reso- 2.1∼2.3. lution, which were lower than 128 × 128. However, most of the medical images in the clinical application were high image resolution, such as 512 × 512 CT images and 512 × 512 2.1. Attention Model. All liver CT images used in our MR images. method were from a public liver CT dataset, Liver Tumor Segmentation (LiTS) [13, 14], which was from the MICCAI 1.2. Lesions or Tumors. (e current existing medical image 2017 competition. In the LiTS dataset, the pixel distance was synthesis methods cannot synthesize images with abnor- from 0.55 mm to 1.0 mm, the slice spacing was from malities, such as liver lesions and liver tumors. As we know, 0.45 mm to 6.0 mm, and the image resolution was 512 × 512. the size and variety of the training dataset are essential to the LiTS consisted of 131 enhanced CT image sequences, and all performance of deep learning methods. During the training the tumors in the liver CT images were manually labeled by of medical images’ classiﬁcation and analysis, it was essential radiologists. We aimed to synthesize liver CT images with to have both normal images and abnormal images to create tumors, and the synthesized materials were from two as- an eﬀective data set, but the medical images with abnor- pects, liver CT images from healthy controls and liver tumor malities were relatively rare due to the hospital policy, pa- CT images from patients. Moreover, the liver tumor was the tients’ privacy, and so on. (erefore, synthesizing medical most salient region for clinicians and was also the most images with abnormalities can enlarge the dataset of deep diﬃcult part of the whole synthesis procedure. (erefore, learning methods and upgrade the performance. according to the tumor labels from the LiTS dataset, the To solve the shortcomings, we proposed a novel image image values of all the corresponding pixels in the tumors synthesis model for normal liver CT images and liver CT were changed to 4096, which meant “white color,” and images with tumors based on mask attention generative represented as an attention map in our model. Based on the adversarial network (MAGAN). Using this model, we can attention mechanism, the original image and the attention build a liver CT image dataset consisting of thousands of map were transformed into feature maps A and B by using synthesized 512 × 512 slices; furthermore, it also can facili- 1 × 1 convolution, respectively, and then all these feature tate the progress of computer-aided diagnosis and the maps were concatenated by using matrix multiplication, training of deep learning models. shown in Figure 2: (e main contributions of our work are as follows: (1) we combined GAN with attention mechanism and proposed a S � A B . (1) i,j i j novel MAGAN model and (2) we proposed an eﬀective method of enlarging the existing medical image dataset. (en, we performed softmax on the concatenated feature maps S to calculate the distribution of attention D on the i,j i,j 2. Materials and Methods ith position of the jth synthetic region: In the present study, we synthesized liver CT images with tumors based on the mask attention generative adversarial exp􏼐s 􏼑 i,j D � . (2) network (MAGAN) model [12], whose framework is shown i,j 􏽐 exp􏼐s 􏼑 i�1 i,j in Figure 1. Firstly, all the pixels of liver tumors in the original image were labeled by the white color and used as (erefore, the liver tumor mask images were used as the attention map. According to the attention mechanism, attention maps to eﬃciently ﬁnd the liver tumors’ internal liver tumors were the highlighted relevant features of the CT and external characteristics of the images. images, and the attention map was also the key part of the Journal of Healthcare Engineering 3 Generator Synthesized image network Original image Discriminator Pairing B network Real/fake? Attention map Pairing A Original image Figure 1: (e framework of our model: ⊗ represents matrix multiplication. 1 × 1 Original image Feature maps A Softmax Matrix multiplication Distribution of attention value 1 × 1 Attention map Feature maps B Figure 2: (e framework of the attention model. 2.2. Generator Network. (e structure of our generator dilated convolution operator. (e feature maps from both of network is shown in Figure 3, which consisted of two the two contracting paths were ﬁrstly loaded as input to the contracting paths and an expansive path, showing the attention model, whose framework is shown in Figure 2, and U-shape architecture [15]. (e input of these two con- then the distribution of attention value was transferred via tracting paths was the original image and attention map, residual connections. In the expansive path, the spatial in- respectively; both of them consisted of nine blocks, and each formation and the feature information were combined through a sequence of upconvolutions layer, BN layer, ReLu block was composed of the ReLu layer, convolutional layer, and batch normalization (BN) layer. layer, and residual connections with high-resolution features In the contracting path, the image resolution was re- from the attention model. Residual connections played duced but the feature information was increased. To over- important roles in MAGAN, which were used to bypass the come the drawback of a regular convolution operator, whose nonlinear transformation, accelerate the training speed, and receptive ﬁeld was small, we used a dilated convolution upgrade the performance of our model in the training of the operator [16] in the ﬁrst four layers of the contracting path, deep CNN. so that we can capture image features from a larger scale. 512 × 512 original image and attention map were loaded And we used a regular convolution operator in the other ﬁve as inputs into the generator network, and the image reso- layers of the contracting path because the sizes of the images lution was reduced by half while passing each block in the were already smaller than 32 × 32, which cannot support a contracting path. After nine blocks in the contracting path, 4 Journal of Healthcare Engineering 512 × 512 × 1 512 × 512 × 1 512 × 512 × 1 Regular conv. layer Dilated conv. layer Input: Output: Input: ReLu layer original image synthesized image attention image BN layer ReLu layer Tanh layer Upsample layer 256 × 256 × 64 256 × 256 × 64 256 × 256 × 64 A = attention model 128 × 128 × 128 U 128 × 128 × 128 D 128 × 128 × 128 U 64 × 64 × 256 64 × 64 × 256 D 64 × 64 × 256 D 32 × 32 × 512 32 × 32 × 512 = D 32 × 32 × 512 D 16 × 16 × 512 R 16 × 16 × 512 16 × 16 × 512 8 × 8 × 512 U R 8 × 8 × 512 8 × 8 × 512 = 4 × 4 × 512 R 4 × 4 × 512 R 4 × 4 × 512 2 × 2 × 512 U R 2 × 2 × 512 2 × 2 × 512 U R 1 × 1 × 1024 1 × 1 × 1024 R 1 × 1 × 1024 U = Figure 3: (e framework of our generator network. the input images became 1 × 1 with 1024 feature maps. (en, blocks, and each block was composed of a convolutional these feature maps were upconvolved in the expansive path, layer, ReLu layer, BN layer, or sigmoid layer. and the size of the image increased one time while passing (e inputs of the discriminator network were two each block in the expansive path. After nine blocks in the pairings, which were pairing A (original image, attention expansive path, the image was restored as a 512 × 512 res- map) and pairing B (synthesized image, attention map). olution image. In the generator network, the whitened re- Inspired by PatchGAN [12], all the 512 × 512 resolution gions in the liver CT images can be transformed into tumor images were divided into 900 patches, whose size was regions. (e loss function of our generator network is shown 142 × 142. After going through six blocks of the discrimi- as the following formula: nator network, the sizes of output probabilities maps were 30 × 30, which indicated each pixel in the output proba- L (G) � Ε 􏼂‖r − G(v)‖ 􏼃, (3) adv v,r∼p (v,r) 1 data bilities maps corresponded to one patch of the input images. (e mean value of all the pixels in the probabilities maps can where r denotes the real image, v denotes the concatenated be recognized as the result of the discriminator network. image, and G(v) denotes the synthesized image calculated by (e loss function of our discriminator network is shown the generator network. as the following formula: 2.3. Discriminator Network. (e structure of our discrimi- nator network is shown in Figure 4, which consisted of six L (D) � E 􏼂log D(v, r) 􏼃 + E 􏼂log1 − D(v, G(v, r)) 􏼁􏼃, (4) adv v,r∼p (v,r) real v∼p (v) fake data data where r denotes the real image, v denotes the attention map, training of deep learning algorithms, such as liver tumor G(v, r) denotes the synthesized image calculated by the segmentation or classiﬁcation. To enlarge the LiTS, we chose generator network, and D(v, r) denotes the discrimination 4555 2D slices with tumors from 131 sequences of liver CT probability calculated by the discriminator network. images. (en, all the images were normalized by using the (e total loss function of our GAN is shown as the following formula: following formula: value − mean original (6) value � , normalized L � arg min max λ L (G) + λ L (D), 1 adv 2 adv (5) std G D where value and value represent the original original normalized where λ and λ are coeﬃcients. 1 2 and normalized image pixels value, respectively. Mean in- dicate the mean value of image pixels, and std indicate the standard deviation of image pixels. Moreover, we specially 3. Results cut the tumor regions from the liver CT images and built a In our experiments, we used LiTS as our image dataset of liver tumor dataset; then, we augmented the tumor dataset liver CT images with tumors, which consisted of only 131 by ﬂipping, rotating, and scaling the original tumor region so sequences. (e size of LiTS was not big enough for the that we can create a liver tumor dataset of 50000 slices from Journal of Healthcare Engineering 5 512 × 512 × 1 512 × 512 × 1 512 × 512 × 1 Original image Attention map Synthesized image Conv. layer ReLu layer Pairing A Pairing B BN layer Sigmoid layer Probabilities maps 512 × 512 × 6 Concatenation 30 × 30 × 1 256 × 256 × 64 31 × 31 × 1024 32 × 32 × 512 128 × 128 × 128 64 × 64 × 256 64 × 64 × 256 Figure 4: (e framework of our discriminator network. the original 4555 slices, which were used as the mask at- with a learning rate of 0.0002. It costs about ten hours for the tention map in our method. whole procedure of the training. (e hardware and software conﬁguration of our ex- As shown in Figures 5(a)–5(d), we can ﬁnd that, as the periments are shown in Table 1. (e quantitative evaluation number of iterations increased, the performance of the metric used in our experiments was the peak signal to noise synthesized CT liver images became better and better. After ratio (PSNR). (ere were four sections in our experiments, the ﬁrst iteration of training (in Figure 5(a)), the perfor- including training of our model, quantitative comparison mance of the synthesized image from the generator network between our method and other state-of-the-art methods, was terrible; for example, most pixels were black and the Turing test for the synthesized images by radiologists, and contour was blurring, intense chessboard eﬀect. All these the evaluation of the synthetic dataset for the medical image bad performances indicated that the training had just segmentation. started, and more iterations were needed. After ten iterations (in Figure 5(b)), the whole image was more clear, the contour was more vivid, but the chessboard eﬀect still 3.1. Training of Our Model. (e conﬁgurations of hyper- existed. After one hundred iterations (in Figure 5(c)), the parameters in our model during the training are shown in performance of the synthesized image was much better and Table 2. (e proposed MAGAN network was implemented closer to the real image, more details can be visualized, by Python 2.7 and TensorFlow 1.1 and trained on an human organs were vivid, the chessboard eﬀect was weaker NVIDIA GeForce GTX 1080 GPU using Adam optimizer but still existed, and whitened regions were not ﬁlled with 6 Journal of Healthcare Engineering Table 1: Hardware and software conﬁguration of our experiments. networks. And we found that the network without residual connections can provide a PSNR of 55.23, while the Item Conﬁguration MAGAN with residual connections can provide a PSNR of Operating system Ubuntu 16.04 64.72, which indicated the eﬀectiveness of the residual GPU NVIDIA GeForce GTX 1080 connections in our network. (e running time of the pro- CPU Intel Core i5-7500 @3.4 GHz posed method was 0.087 seconds per frame. Software toolkit Python 2.7; TensorFlow 1.1; MATLAB 2016b Besides, we can also manually or automatically “add” Disk 500 GB tumor regions on the healthy liver CT images using our liver GPU memory 8 GB System memory 16 GB tumor dataset of 50000 slices, to create a diseased liver CT image, shown in Figure 10. (e healthy liver CT images were in the ﬁrst row. In the second row, manually change the pixel Table 2: Hyperparameters of our model. values of two regions to white color, which meant that these two regions were the selected tumor regions. Using our Parameter Value method, the results of the synthesized images are shown in Initial learning rate 0.0002 the third row. All these results showed that our method can Adam momentum 0.5 intelligently create liver CT images with tumors based on the λ in formula (5) 100 healthy liver CT images, and the synthesized diseased images λ in formula (5) 1 were almost identical to the real ones. Exponential decay 0.99 Batch_size 1 Epoch 10 3.2. Quantitative Comparison. In this section, we quantita- Dropout 0.5 tively compared our method with other seven state-of-the-art Frequency of saving loss value 100 medical synthesis methods using the same dataset as ours: (1) Frequency of saving model 500 atlas-based method [17]; (2) sparse representation (SR) based method; (3) structured random forest with ACM (SRF+) [18]; (4) manipulable object synthesis (MOS) [19]; (5) deep con- tumor texture. After one thousand iterations (in volutional adversarial networks (DCAN) method [8]; (6) Figure 5(d)), the chessboard eﬀect disappeared, all details of multiconditional GAN(MC-GAN) [20]; and (7) mask em- liver CT were restored, and it was hard to tell the diﬀerences bedding in conditional GAN (ME-cGAN) [21]. (e ﬁrst four between synthesized image and real image. methods were implemented by our group, and the source (e loss function of the generator network, discrimi- codes of DCAN, MOS, and ME-cGAN were downloaded nator network, and total network during the training is from GitHub (http://www.github.com/ginobilinie/ shown in Figures 6–8, respectively, and we can conclude that medSynthesis, http://www.github.com/HYOJINPARK/ the loss functions decreased as the number of iterations MC_GAN, and http://www.github.com/johnryh/ increased and became steady after about 10000 iterations, Face_Embedding_GAN). (e results of the quantitative which indicated that our model performed well during the comparison are shown in Table 3, which indicate that our training. method outperformed the other seven approaches and Results of the synthesized image are shown in Figure 9: beneﬁted from attention mechanism, dilated convolution three liver tumor images with tumor masks were in the ﬁrst operator, and residual connections. row, which was used as inputs of our model, and we can obtain the synthesized images in the second row. We compared the synthesized images and the real images and 3.3. Turing Test. To further verify the eﬀectiveness of our calculated the diﬀerences between them. (e color image of method, we did the Turing test. Two experienced radiologists the diﬀerences is shown in the fourth row. All these results from Shengjing Hospital of China Medical University were showed that our method can synthesize liver CT images with asked to classify one hundred liver CT images into two types: tumors, and the synthesized images were almost identical to real image or synthesized image. (e radiologists were not the real images. aware of the answer to each image before the Turing test. (e To test the impact of the dilated convolution operators in one hundred liver CT images consisted of ﬁfty real CT the MAGAN, we replaced the dilated convolution operators images and ﬁfty synthesized images. (e results of the Turing with the regular convolution operators in the contracting test are shown in Table 4: radiologist number 1 made correct path of the generator network and quantitatively compared judgments for 74% real image slices and 64% synthesized the PSNR of these two GAN networks. And we found that image slices and radiologist number 2 made correct judg- the network with regular convolution operators can provide ments for 84% real image slices and 12% synthesized image a PSNR of 59.66, while the MAGAN with dilated convo- slices. (e radiologists made correct judgments for most of lution operators can provide a PSNR of 64.72, which in- the real images and may be psychologically inﬂuenced by the dicated the eﬀectiveness of the dilated convolution operators existence of a synthesized image, so they made some errors in our network. about the real images. Furthermore, the radiologists made To test the impact of the residual connections in the diﬃcult judgments for the synthesized images and cannot MAGAN, we removed the residual connections and tell the obvious diﬀerences between the real images and the quantitatively compared the PSNR of these two GAN synthesized images. And according to radiologist #1, his Journal of Healthcare Engineering 7 (a) (b) (c) (d) Figure 5: Synthesized image during the training of the proposed model: (a) after one iteration of training, (b) after ten iterations of training, (c) after one hundred iterations of training, (d) after one thousand iterations of training. 0.045 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Step ×10 Figure 6: (e loss function of the generator network during the training. Training loss 8 Journal of Healthcare Engineering 1.6 1.5 1.4 1.3 1.2 1.1 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Step ×10 Figure 7: (e loss function of the discriminator network during the training. 5.5 4.5 3.5 2.5 1.5 0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ×10 Step Figure 8: (e loss function of the total network during the training. most reliable evidence of telling the diﬀerence was the color and the FCN model trained by a new dataset can provide a of the tumor region was a little darker than the real ones, Dice value of 0.658 for the tumor segmentation. (e result which was also the improvement we needed to do in the indicated that the synthesized liver CT images obtained by future. All these results of the Turing test indicated that our the proposed method can eﬀectively enlarge the original method can synthesize liver CT images with a tumor, which dataset, and as the number of images in the dataset in- creased, the performance of the training of the deep learning were almost identical to the real ones. model can become better, which resulted in the higher Dice value for the liver tumor segmentation. 3.4. Evaluation of Synthetic Dataset for Medical Image Segmentation. To evaluate the eﬀectiveness of the synthetic 4. Discussion dataset in the training of deep learning models, we used a fully connected network (FCN) [15] to perform the tumor In the present study, we combined the attention mechanism segmentation task in the liver CT images and trained the and GAN model and proposed a novel CT image synthesis FCN model using the LiTS dataset (images from 131 sub- algorithm, which was MAGAN. As far as we know, the jects) and the new dataset obtained by our method (images existing medical image synthesis methods mainly focused on from 131 real subjects and 865 synthetic subjects). And we the transformation of diﬀerent modules or transformation used the Dice Index to quantitatively evaluate the perfor- between the diﬀerent parameter of image acquisition, and mance of the segmentation results from the two trained FCN our study was the ﬁrst research of synthesizing the liver CT models. (e FCN model trained by the LiTS dataset can images with tumors in high resolution and enlarging the size provide a Dice value of 0.611 for the tumor segmentation, of the medical image dataset. Training loss Training loss Journal of Healthcare Engineering 9 Liver CT images with tumors Synthesized images Real images Differences between synthesized and real images –400 –300 –200 –100 0 100 200 300 400 Figure 9: Results of the synthesized images and the comparison between the synthesized images and real images. (e pixel values of the fourth rows are weak and low because the diﬀerences between the real images and synthesized images were very small. Suppose that we had a dataset of chest CT images with method outperforms the others, and the main reasons were lung nodules, whose size was one hundred. While we used the attention map, which mainly focused on the regions of interest in the medical images, such as liver tumors or lung this dataset for the training of deep learning, we may ﬁnd that the trained model was not good enough due to the small nodules. size of the dataset. Under these circumstances, the proposed During the Turing test, two experienced radiologists MAGAN can be used to synthesize thousands of chest CT cannot clearly distinguish the synthesized liver CT images images with lung nodules based on the original one hundred and the real liver CT images. We used the judgments of images. (is kind of similar requirements from clinical experts as the golden standard, and we may conclude that researches and deep learning studies is very common. And the synthesized liver CT images with tumors can be used as the proposed method can meet the requirements. the real ones, and the size of the training dataset of medical From the quantitative comparison between the proposed images can be enlarged from one hundred to thousands. (e method and the other seven state-of-the-art medical image bigger the medical image dataset is, the better the training synthesized methods, we can conclude that the proposed performance can be. 10 Journal of Healthcare Engineering Healthy liver CT images Whitened regions were added in the liver Results of synthesized images Figure 10: Adding tumor regions on the healthy liver CT images and synthesizing diseased liver CT images using our method. Table 3: (e quantitative comparison between our method and seven other approaches. Method Atlas [17] SR SRF+ [18] MOS [19] DCAN [8] MC-GAN [20] ME-cGAN [21] Our method Mean PSNR(dB) 45.15 49.77 55.30 60.11 58.26 59.29 61.35 64.72 Table 4: (e Turing test of our method. Real image (50 slices) Synthesized image (50 slices) Be judged as real Be judged as synthesized Be judged as real Be judged as synthesized images images images images Radiologist number 37 13 18 32 Radiologist number 42 8 44 6 these results meant that, using our method, we can build a 5. Conclusions huge medical image dataset to facilitate the diagnosis of In the present study, we proposed a method of synthesizing computer-aided diagnosis and the training of deep learning. liver CT images with tumors based on mask attention generative adversarial networks. (e experimental results Data Availability showed that our method outperformed the other seven widely used approaches and can achieve 64.72 db mean Liver CT images used in our method were from a public liver PSNR, and the Turing test indicated that even the experi- CT dataset, which is Liver Tumor Segmentation (LiTS), and enced radiologists cannot tell the diﬀerences between the the data can be obtained from https://academictorrents. synthesized images from our method and the real ones. All com/details/27772adef6f563a1ecc0ae19a528b956e6c803ce. Journal of Healthcare Engineering 11 Lecture Notes in Computer Science, Springer, vol. 9351, Conflicts of Interest pp. 234–241, Berlin, Germany, 2015. [16] F. Yu and V. Koltun, “Multi-scale context aggregation by (e authors declare that they have no conﬂicts of interest. dilated convolutions,” in Proceedings of the International Conference of Learning Representations (ICLR), Princeton, NJ, Acknowledgments USA, May 2016. [17] T. Vercauteren, “Diﬀeomorphic demons: eﬃcient Non- (is research was funded by the National key Research and parametric Image Registration,” Neuroimage, vol. 45, Development Project (2018YFB2003200), National Natural no. S61–S72, p. 1, 2009. Science Foundation of China (61973058), and Fundamental [18] T. Huynh, Y. Gao, J. Kang et al., “Estimating ct image from Research Funds for the Central Universities (N2004020). MRI data using structured random forest and auto-context model,” IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 174–183, 2016. References [19] S. Liu, E. Gibson, S. Grbic, Z. Xu, A. A. Arnaud et al., “Decompose to manipulate: manipulable object synthesis in [1] Y. Wang, L. Zhou, B. Yu et al., “3D auto-context-based locality 3D medical images with structured image decomposition,” adaptive multi-modality GANs for PET synthesis,” IEEE 2019, https://arxiv.org/abs/1812.01737. Transactions on Medical Imaging, vol. 38, no. 6, pp. 1328– [20] H. Park, Y.J. Yoo, and N. Kwak, “MC-GAN: Multi- 1339, 2019. conditional generative adversarial network for image syn- [2] Y. Wang, B. Yu, L. Wang et al., “3D conditional generative thesis,” 2018, https://arxiv.org/abs/1805.01123. adversarial networks for high-quality PET image estimation at [21] Y. Ren, Z. Zhu, Y. Li, and J. Lo, “Mask embedding in con- low dose,” Neuroimage, vol. 174, pp. 550–562, 2018. ditional GAN for guided synthesis of high resolution images,” [3] http://www.vlfeat.org/matconvnet/pretrained. 2019, https://arxiv.org/abs/1907.01710. [4] C. Baur, S. Albarqouni, and N. Navab, “Generating highly realistic images of skin lesions with GANs,” Lecture Notes in Computer Science, Springer, Berlin, Germany, pp. 260–267, [5] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classiﬁcation,” Neurocomputing, vol. 321, pp. 321–331, 2018. [6] Q. Yang, P. Yan, Y. Zhang et al., “Low-dose CT image denoising using a generative adversarial network with was- serstein distance and perceptual loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018. [7] L. Sun, J. Wang, Y. Huang, X. Ding, H. Greenspan, and J. Paisley, “An adversarial learning approach to medical image synthesis for lesion detection,” 2019, https://arxiv.org/abs/ 1810.10850. [8] D. Nie, R. Trullo, J. Lian et al., “Medical image synthesis with deep convolutional adversarial networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 12, pp. 2720–2730, [9] J. M. Wolterink, A. M. Dinkla, M. H. F. Savenije, P. R. Seevinck, C. A. T. Berg, and I. Iˇ sgum, “Deep MR to CT synthesis using unpaired data,” in Proceedings of the 2017 International Workshop on Simulation and Synthesis in Medical Imaging, Quebec City, Canada, 2017. [10] C.-B. Jin, H. Kim, M. Liu et al., “Deep CT to MR synthesis using paired and unpaired data,” Sensors, vol. 19, no. 10, p. 2361, 2019. [11] P. Costa, A. Galdran, M. I. Meyer et al., “End-to-end adversarial retinal image synthesis,” IEEE Transactions on Medical Imaging, vol. 37, no. 3, pp. 781–791, 2018. [12] P. Isola, J. Zhu, and T. Zhou, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976, Honolulu, HI, USA, July 2017. [13] https://competitions.codalab.org/competitions/17094#learn_ the_details-overview. [14] P. Bilic, P. F. Christ, E. Vorontsov, and G. Chlebbus, “(e liver tumor segmentation benchmark (LiTS),” 2018, https://arxiv. org/abs/1901.04056. [15] O. Ronneberger, P. Fischer, T. Brox, and U-Net, “U-net: convolutional networks for biomedical image segmentation,”

Journal

Journal of Healthcare Engineering – Hindawi Publishing Corporation

Published: Jan 31, 2021

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

MAGAN: Mask Attention Generative Adversarial Network for Liver Tumor CT Image Synthesis

References (21)

Abstract

Journal

Recommended Articles

There are no references for this article.

Our policy towards the use of cookies