GANs for generation of synthetic ultrasound images from small datasets

DE GRUYTER
Current Directions in Biomedical Engineering 2022;8(1): 17-20

Lennart Maack*, Lennart Holstein, and Alexander Schlaefer

https://doi.org/10.1515/cdbme-2022-0005

*Corresponding author: Lennart Maack, Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Am Schwarzenberg-Campus 3, Hamburg, Germany, e-mail: lennart.maack@tuhh.de
Lennart Holstein, Alexander Schlaefer, Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany

Open Access. © 2022 The Author(s), published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License.

Abstract: The task of medical image classification is increasingly supported by algorithms. Deep learning methods like convolutional neural networks (CNNs) show superior performance in medical image analysis but need a high-quality training dataset with a large number of annotated samples. Particularly in the medical domain, such datasets are rarely available due to data privacy or the lack of data sharing practices among institutes. Generative adversarial networks (GANs) are able to generate high-quality synthetic images. This work investigates the capabilities of different state-of-the-art GAN architectures in generating realistic breast ultrasound images if only a small amount of training data is available. In a second step, these synthetic images are used to augment the real ultrasound image dataset utilized for training CNNs. The training of both GANs and CNNs is conducted with systematically reduced dataset sizes. The GAN architectures are capable of generating realistic ultrasound images. GANs using data augmentation techniques outperform the baseline StyleGAN2 with respect to the Fréchet Inception distance by up to 64.2%. CNN models trained with additional synthetic data outperform the baseline CNN model using only real data for training by up to 15.3% with respect to the F1 score, especially for datasets containing fewer than 100 images. In conclusion, GANs can successfully be used to generate synthetic ultrasound images of high quality and diversity, improve the classification performance of CNNs and thus provide a benefit to computer-aided diagnostics.

Keywords: deep learning, medical image analysis, image classification, ultrasound imaging, generative adversarial networks (GANs), synthetic image generation, small datasets

1 Introduction

Ultrasound imaging is among the most cost-effective and portable modalities to acquire medical images today, making it one of the most important tools for diagnosing various diseases such as breast cancer [1]. The acquired images contain information that must be comprehensively analysed by medical experts in a short time. A typical application of medical image analysis is the classification of diseases in ultrasound images. Through the support of different algorithms, additional information is provided to the physician. This increases the chances of accurately identifying incidental findings in an automated manner and can lead to an improved clinical workflow.

Deep learning methods such as convolutional neural networks (CNNs) in particular have gained significant importance due to their superior performance in medical image analysis compared to many explicit algorithms [2]. To be successful, CNNs need a high-quality training dataset with a large number of annotated samples, which are particularly scarce in the medical field. To artificially enlarge the training dataset, typical data augmentation techniques are used [3]. These techniques are limited in creating completely new patterns in the dataset since they use a finite set of known invariances that are easy to invoke [4]. Generative adversarial networks (GANs) have shown significant results in the generation of realistic images and can be used to extend the training dataset with synthetic images [5]. In the ultrasound image domain, GANs have been used to generate images to extend a training dataset, which led to an improved classification performance of fetal brain anomalies [6]. However, GANs require sufficient amounts of training data to be able to synthesise realistic images. The influence of the amount of training data on GAN performance has not been considered in the previous work.

In this work, we analyse the performance of GANs in the case of smaller available medical ultrasound datasets. For this purpose, we systematically reduce the number of images used for training state-of-the-art GANs. Furthermore, the GANs' performance and the influence of the corresponding generated synthetic images on the performance of CNNs are evaluated.

2 Methods and Materials

2.1 Dataset

The Breast Ultrasound dataset (BUS) consists of 780 grayscale images with an average image size of 500 × 500 pixels, collected among 600 female patients aged between 25 and 75 years [7]. The images are categorized into three classes: normal, benign and malignant. Before the systematic reduction of the BUS dataset is applied, we split the dataset into a test set with 280 images and a training set of 500 images, with the same distribution of classes across the different splits. All images are resized to 256 × 256 and normalized.

2.2 GAN architectures and evaluation

The baseline architecture to generate synthetic ultrasound images is StyleGAN2 [8]. The generator architecture is visualized in Figure 1 and utilizes two main components. The first component is the noise mapping network 𝑓, which consists of eight fully connected layers. It takes a noise vector 𝑧 as input and maps it into an intermediate noise vector 𝑤 to get a more disentangled latent space. The second component is the so-called style block. It passes the vector 𝑤 through a learned affine transformation 𝐴 and converts it to a parameter that scales the initial convolutional weights 𝑤 of the input feature maps and controls the style details of the generated image. After this modulation, a demodulation step is applied to remove the effect of scaling from the statistics of the convolution output feature map. Additionally, a bias 𝑏 and a random noise tensor 𝐵 are inserted after each style block.

Fig. 1: StyleGAN2 architecture with its two main components, the mapping network and the style block, explained in the main text.

To minimize the discriminator's chance of overfitting and prevent the leaking of augmentations to the generated images, we use adaptive discriminator augmentation (ADA) as the first GAN method in this work [9]. ADA implements an adaptive part that dynamically tunes the augmentation strength during the training using the overfitting heuristic 𝑟. The augmentations used for training consist of pixel blitting and general geometric transformations. All augmentations are invertible and differentiable. The second GAN method used in this work is Differentiable Augmentation (DiffAug), another strategy to circumvent the "leaking" problem by updating the generator with transformed samples using the generator loss $L_G$ [10]:

$$L_G = \mathbb{E}_{z \sim p(z)}\left[ f_G\left( -D\left( T\left( G(z) \right) \right) \right) \right]. \quad (1)$$

The following augmentations are used: translation within [-1/8, 1/8] of the image size with zero padding, as well as cutout. For all GAN experiments in this work, we adopt the implementation details of StyleGAN2 that achieved state-of-the-art results for the LSUN datasets.

For evaluating the different GAN methods, we use the Fréchet Inception Distance (FID) as the main metric [11]. The FID measures the distance, in terms of mean and covariance matrix, between two data distributions that are calculated from the image features of the real and synthetic images, respectively. To further assess the quality of the synthetic images, the mean structural similarity index (SSIM) as well as the Jensen-Shannon distance (JS distance) between the gray value distributions of the real images from the test set and the generated images are calculated.

2.3 Classification evaluation

Synthetic images generated by the different GAN methods are used to improve the classification performance of CNNs with the EfficientNet-B2 architecture [12]. Our baseline model is pretrained on ImageNet and finetuned with the real dataset only. Other setups to compare with the baseline consist of pretrained models that are finetuned on combined real and synthetic data. The amounts of added synthetic data are 50%, 100% and 200% relative to the real image dataset size. In our experiments, pretrained models with 100% additional synthetic data achieved the best results. Therefore, only these results are presented in section 3.2. All CNN models are trained using stratified k-fold cross validation and evaluated with the F1-score, the harmonic mean of precision and recall. For our multiclass problem, the F1-score is micro-averaged, i.e. computed by globally counting the total true positives, false negatives and false positives over the three classes.

3 Results

3.1 Image Generation

Figure 2 shows the FID scores of the different GAN models for each dataset size used for training. The FID scores of the baseline range from 140.6 ± 6.9 for the models trained with 500 images to 219.8 ± 27.62 for the models trained with 50 images. Over the same dataset sizes, the FID scores range from 87.97 ± 3.28 to 166.98 ± 32.23 for the ADA models and from 95 ± 4.78 to 133.80 ± 7.23 for the DiffAug models. ADA and DiffAug outperform the baseline in terms of FID score. The SSIM scores and the JS distance values are displayed in Table 1.

Fig. 2: Mean and standard deviation of the FID of the different GAN models (StyleGAN2, StyleGAN2 ADA, StyleGAN2 DiffAugment) for each dataset size used for training (500, 250, 100, 50). Low FID values indicate better GAN performance.

Tab. 1: JS distance and SSIM of synthetic images generated by the GAN models trained with different numbers of training images.

| # images | GAN model         | JS (↓) | SSIM (↑)     |
| 500      | StyleGAN2         | 0.123  | 0.175 ± 0.05 |
| 500      | StyleGAN2 ADA     | 0.096  | 0.156 ± 0.05 |
| 500      | StyleGAN2 DiffAug | 0.1    | 0.16 ± 0.05  |
| 250      | StyleGAN2         | 0.133  | 0.151 ± 0.04 |
| 250      | StyleGAN2 ADA     | 0.091  | 0.155 ± 0.04 |
| 250      | StyleGAN2 DiffAug | 0.1    | 0.163 ± 0.05 |
| 100      | StyleGAN2         | 0.163  | 0.147 ± 0.04 |
| 100      | StyleGAN2 ADA     | 0.1    | 0.158 ± 0.05 |
| 100      | StyleGAN2 DiffAug | 0.097  | 0.15 ± 0.04  |
| 50       | StyleGAN2         | 0.184  | 0.135 ± 0.03 |
| 50       | StyleGAN2 ADA     | 0.097  | 0.138 ± 0.04 |
| 50       | StyleGAN2 DiffAug | 0.1    | 0.15 ± 0.04  |

Sample images of each GAN method trained with different dataset sizes are displayed in Figure 3. Whereas ADA and DiffAug generate images with high fidelity, even with only 50 training images available, the baseline GAN synthesises images with an increasingly blurry and wavy pattern the fewer training images are available. Furthermore, the baseline generates images with lower diversity in comparison to ADA and DiffAug when trained with smaller dataset sizes.

Fig. 3: Sample images from the BUS dataset and synthetic images generated by the different GANs StyleGAN2, ADA, DiffAug (numbers above the column indicate the number of images used for training). The class of the image is displayed in the lower right corner.

3.2 Image Classification

Table 2 shows the classification results in terms of the F1 score of the CNN models trained with different dataset sizes. The CNN models that use extra synthetic images for training outperform the baseline model for all dataset sizes in terms of the F1 score. There is no indication that synthetic images generated by a specific GAN result in a larger improvement of classification performance. To check whether the F1 scores of the baseline model and the models trained with an extended dataset differ significantly, a pairwise t-test is conducted. A p-value below 0.05 indicates a significant difference.

Tab. 2: Classification results on BUS for CNN models with the respective number of images and type of additional synthetic data generated by different GANs used for training. The p-value compares each model with the corresponding model trained without synthetic data.

| # images | Extra synthetic data | F1 score [%] | p-value |
| 500      | No synthetic data    | 81.14 ± 5.69 | /       |
| 500      | StyleGAN2            | 87.14 ± 1.95 | 0.084   |
| 500      | StyleGAN2 ADA        | 83.71 ± 3.49 | 0.478   |
| 500      | StyleGAN2 DiffAug    | 85.57 ± 2.12 | 0.253   |
| 250      | No synthetic data    | 75.86 ± 3.19 | /       |
| 250      | StyleGAN2            | 79.29 ± 2.92 | 0.06    |
| 250      | StyleGAN2 ADA        | 80.36 ± 2.52 | < 0.01  |
| 250      | StyleGAN2 DiffAug    | 80.86 ± 4.31 | 0.078   |
| 100      | No synthetic data    | 62.79 ± 5.41 | /       |
| 100      | StyleGAN2            | 71.51 ± 6.51 | < 0.01  |
| 100      | StyleGAN2 ADA        | 74.14 ± 4.76 | 0.02    |
| 100      | StyleGAN2 DiffAug    | 72.79 ± 3.85 | < 0.01  |
| 50       | No synthetic data    | 59.36 ± 5.22 | /       |
| 50       | StyleGAN2            | 69.64 ± 2.42 | < 0.01  |
| 50       | StyleGAN2 ADA        | 66.71 ± 4.81 | 0.03    |
| 50       | StyleGAN2 DiffAug    | 67.07 ± 3.44 | < 0.01  |

4 Discussion

The examined GAN models are able to generate realistic ultrasound images that show high-quality details and reproduce the typical speckle pattern in ultrasound images. Our qualitative assessment of artefacts and mode collapse detectable in the synthetic images generated by the different GAN models correlates with the corresponding FID scores. The JS distance metric indicates the same trend as the FID score for all models, whereas the SSIM metric shows slightly different results. The FID scores of all models worsen (i.e. increase) when trained with 100 images or fewer, which might be due to the discriminator's overfitting and the lack of useful information fed back to the generator. StyleGAN2 is not able to generate synthetic images with the same quality as ADA or DiffAug. Especially for smaller training dataset sizes, the baseline generates synthetic ultrasound images with less fidelity and diversity compared to the two augmentation methods. The use of data augmentation in GANs leads to the generation of diverse and high-quality synthetic samples even with only 50 training images available.

All CNN models trained with additional synthetic data outperform the baseline in terms of mean F1-score. Especially for dataset sizes 100 and 50, the improvements with synthetic images are statistically significant. However, the higher quality and diversity in the synthetic datasets generated by ADA and DiffAug do not increase the classification performance of the CNNs in comparison to synthetic datasets generated by the baseline GAN. A possible reason for this may be that the synthetic images with lower quality still contain useful new features to successfully enhance the training dataset. Another reason might be that the particular augmentations applied in ADA and DiffAug do not add much value to the feature space needed to improve medical ultrasound image classification.

5 Conclusion

In this work, we investigate the capabilities of GANs for generating high-quality synthetic breast ultrasound images from small datasets. We show that data augmentation techniques such as ADA and DiffAug in particular improve the image quality and diversity when only small datasets are available. Synthetic ultrasound images can improve the performance of CNNs used for classification. However, our results also indicate that higher visual quality of synthetic data does not directly correlate with added value for training CNNs.

Author Statement

Research funding: The authors state no funding involved. Conflict of interest: Authors state no conflict of interest. Informed consent / Ethical approval: Not applicable, since a publicly available dataset was used.

References

[1] Cheng, H.D., Shan, J., Ju, W., Guo, Y., Zhang, L.: Automated breast cancer detection and classification using ultrasound images: A survey. Pattern Recognition 43(1), 299–317 (2010)
[2] Domingues, I., Pereira, G., Martins, P. et al.: Using deep learning techniques in medical imaging: a systematic review of applications on CT and PET. AI Rev 53, 4093–4160 (2020)
[3] Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J Big Data 6:50 (2019)
[4] Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. American Economic Journal: Applied Economics (2018)
[5] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Bengio, Y. et al.: Generative adversarial networks. Advances in Neural Information Processing Systems (NeurIPS) 27 (2014)
[6] Montero, A., Bonet-Carne, E., Burgos-Artizzu, X.P.: Generative Adversarial Networks to Improve Fetal Brain Fine-Grained Plane Classification. Sensors 21, 7975 (2021)
[7] Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data in Brief 28, 104863 (2020)
[8] Karras, T., Laine, S., Aittala, M., Aila, T. et al.: Analyzing and improving the image quality of StyleGAN. CVPR pp. 8110–8119 (2020)
[9] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. NeurIPS 33 (2020)
[10] Zhao, S., Liu, Z., Lin, J., Zhu, J.Y., Han, S.: Differentiable augmentation for data-efficient GAN training. NeurIPS 33 (2020)
[11] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS (2017)
[12] Tan, M., Le, Q.V.: EfficientNet: Rethinking model scaling for convolutional neural networks. ICML (2019)
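Section 2.3 states that the F1-score is micro-averaged by globally counting true positives, false negatives and false positives over the three classes. A minimal sketch of that pooling in plain Python is given here for illustration; the function name `micro_f1` and the toy labels are assumptions and are not taken from the authors' code.

```python
# Class names follow the three BUS categories named in the paper.
CLASSES = ("normal", "benign", "malignant")


def micro_f1(y_true, y_pred, classes=CLASSES):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1  # predicted c, and it was c
            elif p == c and t != c:
                fp += 1  # predicted c, but it was another class
            elif t == c and p != c:
                fn += 1  # it was c, but another class was predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels): 4 of 6 predictions are correct.
y_true = ["normal", "benign", "benign", "malignant", "malignant", "malignant"]
y_pred = ["normal", "benign", "malignant", "malignant", "malignant", "benign"]
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

When every image carries exactly one label, each misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 coincides with plain accuracy, as the toy example shows.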

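Section 2.2 uses the Jensen-Shannon distance between the gray value distributions of real and generated images as a secondary quality measure. The sketch below shows one way such a distance could be computed in plain Python. It assumes integer gray values, a fixed-bin histogram and base-2 logarithms (so the distance lies in [0, 1]); the paper does not state these choices, and the helper names `gray_histogram` and `js_distance` are hypothetical.

```python
import math


def gray_histogram(pixels, bins=256):
    """Normalised histogram of integer gray values in [0, bins)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for v in pixels:
        counts[v] += 1
    total = len(pixels)
    return [c / total for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence; bins with zero mass in x contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


# Toy example with made-up pixel values from two tiny "images".
real = gray_histogram([0, 0, 10, 255])
fake = gray_histogram([0, 10, 10, 255])
print(js_distance(real, fake))
```

Identical distributions give a distance of 0 and distributions with no overlapping gray values give 1, which makes values such as those in Table 1 easier to interpret on a fixed scale, provided the same logarithm base was used.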
Publisher
de Gruyter
Copyright
© 2022 by Walter de Gruyter Berlin/Boston
eISSN
2364-5504
DOI
10.1515/cdbme-2022-0005

Abstract

DE GRUYTER Current Directions in Biomedical Engineering 2022;8(1): 17-20 Lennart Maack*, Lennart Holstein, and Alexander Schlaefer GANs for generation of synthetic ultrasound images from small datasets https://doi.org/10.1515/cdbme-2022-0005 one of the most important tools for diagnosing various diseases such as breast cancer [1]. The acquired images contain infor- Abstract: The task of medical image classification is increas- mation that must be comprehensively analysed by medical ex- ingly supported by algorithms. Deep learning methods like perts in a short time. A typical application of medical image convolutional neural networks (CNNs) show superior perfor- analysis is the classification of diseases in ultrasound images. mance in medical image analysis but need a high-quality train- Through the support of different algorithms, additional infor- ing dataset with a large number of annotated samples. Partic- mation is provided to the physician. This increases the chances ularly in the medical domain, the availability of such datasets of accurately identifying incidental findings in an automated is rare due to data privacy or the lack of data sharing practices manner and can lead to an improved clinical workflow. among institutes. Generative adversarial networks (GANs) are Especially deep learning methods like convolutional neu- able to generate high quality synthetic images. This work in- ral networks (CNNs) gained significant importance due to vestigates the capabilities of different state-of-the-art GAN ar- their superior performance in medical image analysis com- chitectures in generating realistic breast ultrasound images if pared to many explicit algorithms [2]. To be successful, CNNs only a small amount of training data is available. 
In a second need a high-quality training dataset with a large number of step, these synthetic images are used to augment the real ul- annotated samples, which are particularly scarce in the med- trasound image dataset utilized for training CNNs. The train- ical field. To artificially enlarge the training dataset, typical ing of both GANs and CNNs is conducted with systemati- data augmentation techniques are used [3]. These techniques cally reduced dataset sizes. The GAN architectures are ca- are limited in creating completely new patterns in the dataset pable of generating realistic ultrasound images. GANs using since they use a finite set of known invariances that are easy to data augmentation techniques outperform the baseline Style- invoke [4]. Generative adversarial networks (GANs) showed GAN2 with respect to the Fréchet Inception distance by up significant results in the generation of realistic images and can to 64.2%. CNN models trained with additional synthetic data be used to extend the training dataset with synthetic images outperform the baseline CNN model using only real data for [5]. In the ultrasound image domain, GANs have been used training by up to 15.3% with respect to the F1 score, espe- to generate images to extend a training dataset, which lead to cially for datasets containing less than 100 images. As a con- an improved classification performance of fetal brain anoma- clusion, GANs can successfully be used to generate synthetic lies [6]. However, GANs require sufficient amounts of training ultrasound images of high quality and diversity, improve clas- data to be able to synthesise realistic images. The influence of sification performance of CNNs and thus provide a benefit to the amount of training data on GAN performance has not been computer-aided diagnostics. considered in the previous work. 
Keywords: deep learning, medical image analysis, image In this work, we analyse the performance of GANs in the classification, ultrasound imaging, generative adversarial net- case of smaller available medical ultrasound datasets. For this works (GANs), synthetic image generation, small datasets purpose, we systematically reduce the amount of images used for training state-of-the art GANs. Furthermore the GANs’ performance and the influence of the corresponding generated 1 Introduction synthetic images on the performance of CNNs are evaluated. Ultrasound imaging is among the most cost-effective and portable modalities to acquire medical images today, making it 2 Methods and Materials *Corresponding author: Lennart Maack, Institute of Medical Technology and Intelligent Systems, Hamburg University of 2.1 Dataset Technology, Am Schwarzenberg-Campus 3, Hamburg, Germany, e-mail: lennart.maack@tuhh.de The Breast Ultrasound dataset (BUS) consists of 780 grayscale Lennart Holstein, Alexander Schlaefer, Institute of Medical images with an average image size of 500 × 500 pixels, col- Technology and Intelligent Systems, Hamburg University of Tech- lected among 600 female patients aged between 25 and 75 nology, Hamburg, Germany Open Access. © 2022 The Author(s), published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License. 17 Maack et al., GANs for synthetic medical image generation w c 2 1 Latent z ∈ Z is Differentiable Augmentation (DiffAug), another strategy to style circumvent the "leaking" problem by updating the generator block Normalize Mod with transformed samples using the generator loss 𝐿 [10]: Mapping 𝐺 Demod Conv 3x3 network f 𝐿 = E [𝑓 (−𝐷 (𝑇(𝐺(z))))]. (1) 𝐺 𝐺 FC 𝑧 𝑝(z) b + B FC The following augmentations are used: translation within [- FC style FC block 1/8, 1/8] of the image size and padded with zeros, as well as FC A Mod cutout. 
For all GAN experiments in this work, we adapt the Upsample FC implementation details of StyleGAN2 that achieved state-of- Demod Conv 3x3 FC the-art results for the LSUN datasets. FC + B For evaluating the different GAN methods, we use the Fréchet Inception Distance (FID) as the main metric [11]. The w ∈ W ... FID is determined by measuring the distance in terms of mean and covariance matrix between two data distributions that are Fig. 1: StyleGAN2 architecture with its two main components mapping network and style block explained in the main text. calculated from the image features of the real and synthetic im- ages, respectively. In order to further assess the quality of the synthetic images, the mean structural similarity index (SSIM), years [7]. The images are categorized into the three different as well as the Jensen-Shannon distance (JS distance) between classes: normal, benign and malignant. Before a systematic re- the gray value distributions of the real images from the testset duction of the BUS dataset can be applied, we split the dataset and the generated images are calculated. into a test set with 280 images and a training set of 500 im- ages, with the same distribution of classes across the different splits. All images are resized to 256 × 256 and normalized. 2.3 Classification evaluation Synthetic images generated by the different GAN methods are 2.2 GAN architectures and evaluation used to improve the classification performance of CNNs with EfficientNetb2 architecture [12]. Our baseline model is pre- The baseline architecture to generate synthetic ultrasound im- trained on ImageNet and finetuned with the real dataset only. ages is StyleGAN2 [8]. The generator architecture is visual- Other setups to compare with the baseline consist of pretrained ized in Figure 1 and utilizes two main components. The first models that are finetuned on combined real and synthetic data. 
component is the noise mapping network 𝑓 which consists of The amounts of added synthetic data are 50%, 100% and 200% eight fully connected layers and takes in a noise vector 𝑧 as relative to the real image dataset size. During the experiments, input and maps it into an intermediate noise vector 𝑤 to get a we show that pretrained models with 100% additional syn- more disentangled latent space. The second component is the thetic data achieve the best results. Therefore, only these re- so called style block. It takes the vector 𝑤 through a learned sults are presented in section 3.2. All CNN models are trained affine transformation 𝐴 and converts it to a parameter that using stratified k-fold cross validation and evaluated with the scales the initial convolutional weights 𝑤 of the input feature F1-score, a harmonic mean of the precision and recall. For our maps and controls the style details of the generated image. Af- multiclass problem, the F1-score is micro-averaged, i.e. glob- ter this modulation, a demodulation step to remove the effect ally counting the total true positives, false negatives and false of scaling from the statistics of the convolution output feature positives over the three classes. map is applied. Additionally, bias 𝑏 and a random noise tensor 𝐵 are inserted after each style block. To minimize the discriminator’s chance of overfitting and prevent the leaking of augmentations to the generated images, 3 Results we use adaptive discriminator augmentation (ADA) as the first GAN method in this work [9]. ADA implements an adaptive 3.1 Image Generation part that dynamically tunes the augmentation strength during the training using the overfitting heuristic 𝑟 . The augmenta- Figure 2 shows the FID scores of the different GAN models tions used for training consist of pixel blitting and general for each dataset size used for training. The FID scores of the geometric transformations. 
All augmentations are invertible baseline range from 140.6 ± 6.9 for the models trained with and differentiable. The second GAN method used in this work 500 images to 219.8 ± 27.62 for the models trained with 50 18 Maack et al., GANs for synthetic medical image generation Tab. 2: Classification results on BUS for CNN models with the StyleGAN2 respective number of images and type of additional synthetic data StyleGAN2 ADA StyleGAN2 DiffAugment generated by different GANs used for training. # images Extra synthetic data F1 score [%] p-value No synthetic data 81.14± 5.69 / StyleGAN2 87.14± 1.95 0.084 StyleGAN2 ADA 83.71± 3.49 0.478 StyleGAN2 DiffAug 85.57± 2.12 0.253 No synthetic data 75.86± 3.19 / StyleGAN2 79.29± 2.92 0.06 500 250 100 50 Dataset size for Training StyleGAN2 ADA 80.36± 2.52 < 0.01 StyleGAN2 DiffAug 80.86± 4.31 0.078 No synthetic data 62.79± 5.41 / Fig. 2: FID’s mean and std. of the different GAN models for each StyleGAN2 71.51± 6.51 < 0.01 dataset size. Low FID values indicate better GAN performance. StyleGAN2 ADA 74.14± 4.76 0.02 StyleGAN2 DiffAug 72.79± 3.85 < 0.01 Tab. 1: JS distance and SSIM of synthetic images generated by No synthetic data 59.36± 5.22 / the GAN models trained with different number of training images. StyleGAN2 69.64± 2.42 < 0.01 StyleGAN2 ADA 66.71± 4.81 0.03 # images GAN model JS (↓) SSIM (↑) StyleGAN2 DiffAug 67.07± 3.44 < 0.01 StyleGAN2 0.123 0.175± 0.05 500 StyleGAN2 ADA 0.096 0.156± 0.05 StyleGAN2 DiffAug 0.1 0.16± 0.05 F1 score. There is no indication, that synthetic images gener- StyleGAN2 0.133 0.151± 0.04 ated by a specific GAN result in an increased improvement of 250 StyleGAN2 ADA 0.091 0.155± 0.04 classification performance. To check if the F1 scores between StyleGAN2 DiffAug 0.1 0.163± 0.05 the baseline model and the models trained with an extended StyleGAN2 0.163 0.147± 0.04 100 StyleGAN2 ADA 0.1 0.158± 0.05 dataset differ significantly, the pairwise t-test is conducted. 
A p-value below 0.05 indicates a significant difference.

Dataset size   Model               JS distance   SSIM
               StyleGAN2 DiffAug   0.097         0.15 ± 0.04
50             StyleGAN2           0.184         0.135 ± 0.03
50             StyleGAN2 ADA       0.097         0.138 ± 0.04
50             StyleGAN2 DiffAug   0.1           0.15 ± 0.04

images. For the ADA and DiffAug models, the FID scores range from 87.97 ± 3.28 and 95 ± 4.78 to 166.98 ± 32.23 and 133.80 ± 7.23, respectively. ADA and DiffAug outperform the baseline in terms of FID score. The SSIM scores and the JS distance values are listed in Table 1.

Sample images of each GAN method trained with different dataset sizes are displayed in Figure 3. Whereas ADA and DiffAug generate images with high fidelity even with only 50 training images available, the baseline GAN synthesises images with an increasingly blurry and wavy pattern as fewer training images are available. Furthermore, the baseline generates images with lower diversity than ADA and DiffAug when trained with smaller dataset sizes.

[Figure 3: columns real, 500, 250, 100, 50; rows StyleGAN2, ADA, DiffAugment]
Fig. 3: Sample images from the BUS dataset and synthetic images generated by the different GANs StyleGAN2, ADA, DiffAug (numbers above the columns indicate the number of images used for training). The class of the image is displayed in the lower right corner.

3.2 Image Classification

Table 2 shows the classification results in terms of the F1 score of the CNN models trained with different dataset sizes. The CNN models that use extra synthetic images for training outperform the baseline model for all dataset sizes in terms of the F1 score.

4 Discussion

The examined GAN models are able to generate realistic ultrasound images that show high-quality details and reproduce the typical speckle pattern in ultrasound images. Our qualitative assessment of artefacts and mode collapse detectable in the synthetic images generated by the different GAN models correlates with the corresponding FID scores. The JS distance metric indicates the same trend as the FID score for all models, whereas the SSIM metric shows slightly different results. The FID scores of all models deteriorate (that is, increase) when trained with 100 images or fewer, which might be due to the discriminator's overfitting and the lack of useful information fed back to the generator. StyleGAN2 is not able to generate synthetic images with the same quality as ADA or DiffAug. Especially for smaller training dataset sizes, the baseline generates synthetic ultrasound images with less fidelity and diversity than the two augmentation methods. The use of data augmentation in GANs leads to the generation of diverse and high-quality synthetic samples even with only 50 training images available.

All CNN models trained with additional synthetic data outperform the baseline in terms of mean F1 score. Especially for dataset sizes 100 and 50, the improvements with synthetic images are statistically significant. However, the higher quality and diversity in the synthetic datasets generated by ADA and DiffAug do not increase the classification performance of the CNNs in comparison to synthetic datasets generated by the baseline GAN. A possible reason for this may be that the synthetic images with lower quality still contain useful new features to successfully enhance the training dataset. Another reason might be that the particular augmentations applied in ADA and DiffAug do not add much value to the feature space needed to improve medical ultrasound image classification.

5 Conclusion

In this work, we investigate the capabilities of GANs for generating high-quality synthetic breast ultrasound images from small datasets. We show that data augmentation techniques such as ADA and DiffAug in particular improve the image quality and diversity when only small datasets are available. Synthetic ultrasound images can improve the performance of CNNs used for classification. However, our results also indicate that higher visual quality of synthetic data does not directly correlate with added value for training CNNs.

Author Statement

Research funding: The authors state no funding involved.
Conflict of interest: Authors state no conflict of interest.
Informed consent / Ethical approval: Not applicable, since a publicly available dataset was used.

References

[1] Cheng, H.D., Shan, J., Ju, W., Guo, Y., Zhang, L.: Automated breast cancer detection and classification using ultrasound images: A survey. Pattern Recognition 43(1), 299–317 (2010)
[2] Domingues, I., Pereira, G., Martins, P. et al.: Using deep learning techniques in medical imaging: a systematic review of applications on CT and PET. Artificial Intelligence Review 53, 4093–4160 (2020)
[3] Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J Big Data 6:50 (2019)
[4] Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2018)
[5] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Bengio, Y. et al.: Generative adversarial networks. Advances in Neural Information Processing Systems (NeurIPS) 27 (2014)
[6] Montero, A., Bonet-Carne, E., Burgos-Artizzu, X.P.: Generative adversarial networks to improve fetal brain fine-grained plane classification. Sensors 21, 7975 (2021)
[7] Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data in Brief 28, 104863 (2020)
[8] Karras, T., Laine, S., Aittala, M., Aila, T. et al.: Analyzing and improving the image quality of StyleGAN. CVPR, pp. 8110–8119 (2020)
[9] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. NeurIPS 33 (2020)
[10] Zhao, S., Liu, Z., Lin, J., Zhu, J.Y., Han, S.: Differentiable augmentation for data-efficient GAN training. NeurIPS 33 (2020)
[11] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS (2017)
[12] Tan, M., Le, Q.V.: EfficientNet: Rethinking model scaling for convolutional neural networks. ICML (2019)
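As a supplementary illustration of the FID metric used in the evaluation, the following is a minimal sketch of the Fréchet distance between two Gaussian-fitted feature sets, as defined by Heusel et al. [11]. It is not the evaluation code of this work. The function name `frechet_distance` and the random stand-in features are assumptions made for illustration only; an actual FID computation would use Inception-v3 activations of the real and synthetic ultrasound images.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Hypothetical stand-in features (assumption): random numbers replace the
# Inception-v3 activations that a real FID evaluation would use.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
fake_feats = real_feats + 2.0  # same covariance, mean shifted by 2 per dimension

fd_same = frechet_distance(real_feats, real_feats)
fd_shifted = frechet_distance(real_feats, fake_feats)
print(f"identical sets: {fd_same:.4f}, shifted set: {fd_shifted:.4f}")
```

Lower values indicate that the two feature distributions are closer, which is why a rising FID for smaller training sets corresponds to lower image quality. In this toy setup the covariance terms cancel, so the distance for the shifted set reduces to the squared mean difference.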

Journal

Current Directions in Biomedical Engineering, de Gruyter

Published: Jul 1, 2022

Keywords: deep learning; medical image analysis; image classification; ultrasound imaging; generative adversarial networks (GANs); synthetic image generation; small datasets
