SpeckleGAN: a generative adversarial network with an adaptive speckle layer to augment limited training data for ultrasound image processing

International Journal of Computer Assisted Radiology and Surgery (2020) 15:1427–1436
DOI: 10.1007/s11548-020-02203-1

Corresponding author: Lennart Bargsten (lennart.bargsten@tuhh.de), Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany

Abstract

Purpose: In the field of medical image analysis, deep learning methods gained huge attention over the last years. This can be explained by their often improved performance compared to classic explicit algorithms. In order to work well, they need large amounts of annotated data for supervised learning, but these are often not available in the case of medical image data. One way to overcome this limitation is to generate synthetic training data, e.g., by performing simulations to artificially augment the dataset. However, simulations require domain knowledge and are limited by the complexity of the underlying physical model. Another method to perform data augmentation is the generation of images by means of neural networks.

Methods: We developed a new algorithm for the generation of synthetic medical images exhibiting speckle noise via generative adversarial networks (GANs). The key ingredient is a speckle layer, which can be incorporated into a neural network in order to add realistic and domain-dependent speckle. We call the resulting GAN architecture SpeckleGAN.

Results: We compared our new approach to an equivalent GAN without speckle layer. SpeckleGAN was able to generate ultrasound images with very crisp speckle patterns in contrast to the baseline GAN, even for small datasets of 50 images. SpeckleGAN outperformed the baseline GAN by up to 165% with respect to the Fréchet Inception distance. For artery layer and lumen segmentation, a performance improvement of up to 4% was obtained for small datasets when these were augmented with images by SpeckleGAN.

Conclusion: SpeckleGAN facilitates the generation of realistic synthetic ultrasound images to augment small training sets for deep learning based image processing. Its application is not restricted to ultrasound images but could be used for every imaging methodology that produces images with speckle, such as optical coherence tomography or radar.

Keywords: Deep learning · Synthetic image generation · Theory-guided neural networks · Speckle noise · Small datasets · Image segmentation

Introduction

Cardiovascular diseases like atherosclerosis are the leading cause of death globally [12]. A common methodology for assessing the severity and progress of plaque building in coronary arteries is intravascular ultrasound (IVUS), as it provides information regarding the vessel wall and the composition of plaques.

In recent years, finding diagnoses has been more and more supported by algorithms which provide additional information to the physician. In particular, powerful deep learning methods gained significant importance due to their superior performance compared to many explicit algorithms. Typical applications are detection and classification of diseases or segmentation of different tissues.

A drawback is the need of large annotated training datasets in order to get useful results. Annotations are usually made by trained experts to ensure high quality. This naturally leads to a lack of high-quality data. To overcome these limitations, data augmentation methods are commonly used [18].
In addition to applying random transformations to the data samples (which do not alter their labels), the generation of artificial training data is a possible way to enlarge the training set. One way to generate synthetic data is to run simulations. These often rely on rather simple models or require in-depth domain knowledge, leading to results which are either of low quality or quite time consuming to obtain.

Another promising method to generate artificial data is training generative adversarial networks (GANs) [6]. Nevertheless, GANs also need sufficient amounts of training data to reach satisfactory performance. Often they are trained with more than 10,000 images or even 100,000 images when dealing with rather diverse datasets [15]. To reduce the amount of needed data, theory-guided operations or modules may be integrated into the neural network architecture [11]. These arise from theoretical considerations or physical models which can replace parts of the network. In this way, the model capacity which would otherwise be used to learn these physical concepts is free to learn other features. In addition, theory-based network modules serve to regularize the training process and can thus lead to improved performance.

We designed such a theory-guided network module to add speckle noise to network feature maps and integrated it into a GAN architecture, which we call SpeckleGAN. This enables us to generate realistic IVUS images with very few training examples, while keeping the overall network architecture simple. Furthermore, the size of the resulting speckles can vary within a single image and is learned during the training process. Finally, we show how we can improve IVUS image segmentation performance by pre-training a neural network with synthetic images generated by SpeckleGAN if only very limited data are available. Our method thus enables the training of high-capacity neural networks with few data while simultaneously preventing overfitting.

Material and methods

Speckle layer

Speckle is an interference phenomenon in imaging systems and occurs if the mean distance between scatterers is smaller than the resolution cell defined by the imaging methodology [2]. The size of the resolution cell is determined mainly by the wavelength of the carrier (or excitation) signal. Another condition for the development of speckle is the presence of independent random phases of the scattered waves at the point of observation, usually generated by surface roughness (optics) or inhomogeneous volumes like tissue (ultrasound). Interference of these signals leads to characteristic speckle patterns.

The algorithm for the speckle layer resembles the one found in the appendix of [8] and is based on the principles of Fourier optics explained in [7]. In Fourier optics, one takes advantage of the fact that, under certain simplifications, the propagation and diffraction of wave signals can be expressed as Fourier transformations. Although the process of speckle formation differs in ultrasound systems, the resulting effect on the gray values is similar, and we illustrate the approach in the context of a simple optical system.

The algorithm is based on an imaging system consisting of an illuminated rough object and a converging lens (see Fig. 1b). The propagation and focusing of the wave signal emitted by the object can be represented by two consecutive Fourier transformations. This is possible if some approximations are applied to the following general form of the diffraction integral. It describes how wave signals are diffracted at apertures and is defined as

U(x_0, y_0) = \frac{1}{j\lambda} \iint_{\Sigma} \cos(\vec{n}, \vec{r}_{01}) \, U(x_1, y_1) \, \frac{\exp(jkr)}{r} \, dx_1 \, dy_1.    (1)

Here, U(x_0, y_0) denotes the field amplitude in the plane of observation, U(x_1, y_1) the field amplitude in the aperture plane and \Sigma the aperture. The vector \vec{n} represents the normal of the aperture plane, k is the wave number, \vec{r}_{01} the vector between a point on the aperture plane and another point on the plane of observation, and r its norm. See Fig. 1a for a corresponding sketch. Further details regarding the derivation of the formula and its application to the imaging system of Fig. 1b can be found in [7].

Fig. 1 a: Sketch showing diffraction at an aperture; variable naming corresponds to Eq. 1. b: Sketch of a simple imaging system with a rough object and a converging lens. Due to the roughness, the object's signal exhibits a spatial distribution of random phases, which leads to speckle patterns in the focal plane of the lens
The speckle layer imitates the optical system of Fig. 1b and can be described by the following equation:

I_{sp}(x, y) = \mathcal{F}^{-1}\left[ \mathcal{F}\left[ I(x, y) \cdot e^{j\phi(x, y)} \right] \cdot \mathrm{rect}_d(x, y) \right],    (2)

where I(x, y) and I_{sp}(x, y) denote the source and speckled image, respectively. \mathcal{F} represents the Fourier transformation and \mathrm{rect}_d(x, y) the rectangular window function with edge length d. For the sake of simplicity, we did not use a circular window function as indicated by the lens in Fig. 1. On the one hand, we did not observe any difference in the visual appearance of the resulting speckle; on the other hand, the calculation of a circular mask function is computationally more expensive, because the distance between every pixel and the image center has to be calculated in every training step. Equation 2 can be interpreted as a low-pass filter of the source image which is multiplied pixel-wise with random phases, and it is thus equivalent to

I_{sp}(x, y) = \left[ I(x, y) \cdot e^{j\phi(x, y)} \right] * \mathcal{F}^{-1}\left[ \mathrm{rect}_d(x, y) \right]    (3)

I_{sp}(x, y) = \left[ I(x, y) \cdot e^{j\phi(x, y)} \right] * \mathrm{sinc}_d(x, y).    (4)

Here, * is the convolution operator and \mathrm{sinc}_d(x, y) the sinc function with scale d. The edge length d of the rectangular window function defines the mean size of the resulting speckles and can be learned during training of the neural network. Smaller windows lead to larger speckle patches. We note that the runtime complexity of a convolution operation scales with n^2, while the fast Fourier transform (FFT) scales with n \cdot \log(n). It is thus computationally more efficient to implement Eq. 2. In order to generate the typical speckle patterns for centric IVUS views, coordinate transforms from polar to Cartesian coordinates and vice versa were added to the pipeline. An exemplary speckle transformation process is depicted in Fig. 2.

Fig. 2 Exemplary speckle transformation of a test image with subsequent coordinate transformation in order to get warped speckles typical for IVUS images
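To make Eq. 2 concrete, the following is a minimal PyTorch sketch of the speckle transformation. It is our illustration rather than the authors' code: the soft, sigmoid-shaped frequency window (used here to keep the learnable edge length d differentiable), the magnitude taken at the end, and all names are assumptions, and the polar/Cartesian warping for IVUS geometry is omitted.

import math
import torch
import torch.nn as nn

class SpeckleLayer(nn.Module):
    """Sketch of Eq. 2: I_sp = F^-1[ F[ I * exp(j*phi) ] * rect_d ]."""

    def __init__(self, init_d: float = 32.0):
        super().__init__()
        # Edge length d of the rectangular frequency window; learnable.
        self.d = nn.Parameter(torch.tensor(init_d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Random phases phi(x, y), redrawn in every forward pass.
        phi = 2.0 * math.pi * torch.rand_like(x)
        field = x * torch.polar(torch.ones_like(x), phi)  # I * exp(j*phi)
        spectrum = torch.fft.fftshift(torch.fft.fft2(field), dim=(-2, -1))
        # Soft rectangular low-pass window of edge length d; the sigmoid edge
        # keeps the window differentiable with respect to d (assumption).
        fy = torch.arange(h, device=x.device) - h / 2.0
        fx = torch.arange(w, device=x.device) - w / 2.0
        window = (torch.sigmoid(self.d / 2 - fy.abs())[:, None]
                  * torch.sigmoid(self.d / 2 - fx.abs())[None, :])
        out = torch.fft.ifft2(torch.fft.ifftshift(spectrum * window, dim=(-2, -1)))
        # Magnitude of the filtered complex field yields the speckled image.
        return out.abs()

Consistent with the text above, a smaller d passes fewer spatial frequencies and therefore yields larger speckle patches; for example, SpeckleLayer()(torch.rand(1, 8, 256, 256)) speckles a batch of eight 256 × 256 feature maps.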
SpeckleGAN architecture

To generate IVUS images with defined geometry regarding the artery lumen and the intima/media layers, a segmentation mask has to be used as a conditional input. A promising way to process the segmentation masks is spatially-adaptive normalization (SPADE) for semantic image synthesis [15]. SPADE layers transform segmentation masks (here, encoded as images with integer pixel values from {0, 1, 2}, where each value corresponds to a tissue class) into feature maps \gamma and \beta by feeding them through two convolutional layers, respectively. The segmentation masks are resized before feeding them into SPADE in order to have the same size as the feature maps which should be normalized. Pixel values x^{in}_{n,c,h,w} of the input feature maps to be normalized are transformed as follows:

x^{out}_{n,c,h,w} = \gamma_{c,h,w} \cdot \frac{x^{in}_{n,c,h,w} - \mu_c}{\sigma_c} + \beta_{c,h,w},

where the multi-index (n, c, h, w) refers to (sample in batch, channel, height, width). The parameters \mu_c and \sigma_c denote the channel-wise mean and standard deviation of x^{in}_{:,c,:,:}, respectively. A colon indexes the whole tensor dimension.
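As an illustration of such a SPADE layer, here is a minimal sketch following the normalization equation above. The one-hot mask input, the 3 × 3 kernels and the ReLU are assumptions; hidden = 64 reflects the statement below that the first convolutions in all SPADE layers have 64 output channels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially-adaptive normalization: parameter-free batch normalization
    followed by a pixel-wise affine transform predicted from the mask."""

    def __init__(self, channels: int, n_classes: int = 3, hidden: int = 64):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)  # (x - mu_c) / sigma_c
        self.shared = nn.Sequential(nn.Conv2d(n_classes, hidden, 3, padding=1),
                                    nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Resize the (one-hot encoded) segmentation mask to the feature map size.
        mask = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
        h = self.shared(mask)
        return self.to_gamma(h) * self.bn(x) + self.to_beta(h)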
Figure 3 gives an overview of the overall GAN architecture. Generator and discriminator consist of multiple residual blocks [9]. In the generator, SPADE [15] layers are used to condition the generated image on a given segmentation mask. The first convolutions in all SPADE layers have 64 output channels. Batch normalization precedes the affine transformation by SPADE and is also used in the discriminator. Upscaling in the generator is performed by nearest neighbor interpolation, while downscaling in the discriminator is performed by convolutions with a stride of 2. The generator is seeded with a 128-dimensional random vector sampled from a standard multivariate Gaussian distribution. Spectral normalization [14] was applied to the generator and the discriminator.

Fig. 3 Sketch of the architecture of SpeckleGAN. Numbers in round brackets depict the respective numbers of output channels; exceptions are Upsample (number depicts the scaling factor) and Reshape (number depicts the output's channel and spatial dimensions). The generator maps the random vector through Linear(4096) and Reshape(256, 4, 4), followed by six Upsample(2) + ResBlock stages with 256, 128, 64, 32, 16 and 8 channels, the Speckle Layer(32), a final ResBlock(8), a 4x4-Conv(1) and Tanh. The discriminator concatenates input image and segmentation map and applies a 4x4-Conv(8), residual blocks with 8 up to 256 channels, global sum pooling, a linear layer and a sigmoid output

The speckle layer follows the penultimate residual block of the generator. Here, the feature maps have already reached the output image size. Inserting the speckle layer into a deeper part of the network led to poor results. One reason could be that the feature maps in deeper layers have not yet reached the original image size. The speckle layer adds speckle noise with 4 different speckle sizes to each input feature map. This means that 8 input feature maps are transformed into 32 output feature maps, whereby 4 feature maps each exhibit the same morphology but with different speckle sizes. These hyperparameters were found by grid search and stayed the same for all experiments. The input feature maps of the speckle layer are also used to compute channel attention coefficients by applying global sum pooling and two linear layers. The output feature maps of the speckle layer are weighted with these coefficients to filter out unimportant combinations of feature maps and speckle sizes. A spatial attention approach led to massive checkerboard artifacts and was therefore discarded. The resulting synthetic IVUS images have a size of 256 × 256 pixels.

Dataset

The underlying IVUS dataset was provided by Balocco et al. [1] and consists of 435 IVUS images captured with a 20 MHz phased array transducer, together with corresponding annotated contours marking the lumen border and the media–adventitia interface. The dataset comprises images with calcified and non-calcified plaque as well as bifurcations, side branches and shadow artifacts. The annotated contours were transformed into segmentation masks containing three different classes (lumen, intima/media and adventitia/background). Figure 4 shows an example image with the corresponding segmentation mask.

Fig. 4 Example image and corresponding mask from the clinical dataset. The image on the right-hand side shows an overlay of image and mask

Fréchet Inception distance

The Fréchet Inception distance (FID) [10] measures the distance between the generated image data distribution and the real image data distribution by combining mean values and covariance matrices of network activations arising from feeding both image sets into an Inception-v3 model [19], which was pre-trained on the ImageNet dataset [4]. Typically, activations of the penultimate network layer are used to calculate the FID score:

\mathrm{FID} = \| \mu_1 - \mu_2 \|_2^2 + \mathrm{Tr}\left( C_1 + C_2 - 2 (C_1 C_2)^{1/2} \right).    (5)

Here, \mu_1 and \mu_2 are the mean vectors and C_1 and C_2 the corresponding covariance matrices. Small FID scores, and thus small distances between the image data distributions, indicate visual similarity of the image sets as well as diversity of the generated image set, meaning that mode collapse was prevented. It has not been proven so far that low FID scores imply high image quality when applied to medical images. However, recent works indicate a correlation between FID score and realism of generated medical images [13, 21].
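Given the activation statistics, Eq. 5 is straightforward to evaluate. The sketch below assumes that the mean vectors and covariance matrices have already been computed from penultimate-layer Inception-v3 activations of the two image sets:

import numpy as np
from scipy import linalg

def fid(mu1: np.ndarray, C1: np.ndarray, mu2: np.ndarray, C2: np.ndarray) -> float:
    """Fréchet Inception distance (Eq. 5) from activation statistics."""
    covmean = linalg.sqrtm(C1 @ C2)  # matrix square root of C1 * C2
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts caused by numerics
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(C1 + C2 - 2.0 * covmean))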
Training

We used the non-saturating GAN loss functions proposed in [6]:

L_D(x, z, c) = - \mathbb{E}_{x \sim p_{data}} \left[ \log D(x) \right] - \mathbb{E}_{z \sim p_z} \left[ \log(1 - D(G(z))) \right],    (6)

L_G(z, c) = - \mathbb{E}_{z \sim p_z} \left[ \log D(G(z)) \right],    (7)

where L_D and L_G denote the loss functions for discriminator and generator, respectively. Furthermore, x denotes a real image drawn from the data distribution p_{data}, whereas c denotes a condition; in this work, c is a segmentation mask. The random vector z is the input of the generator and is drawn from a standard multivariate Gaussian distribution p_z. Finally, D and G are the discriminator and generator functions, respectively.

For defining a baseline GAN, the speckle layer was replaced with an identity mapping (cyan-colored box in the generator sketch of Fig. 3). Everything else remained the same. SpeckleGAN and the baseline GAN were trained with 435, 200, 100 and 50 training examples, respectively. The validation during training was done by calculating the FID score between 435 generated images and the whole dataset of 435 real images to make all cases comparable (see "Segmentation evaluation" section for notes regarding overfitting). The GANs were conditioned with the segmentation masks of the dataset to generate synthetic images. This ensures that validation is not affected by artery morphologies but focuses on textures.

For every combination of model and number of training examples, the best learning rate and learning rate decay scheme was grid-searched individually. In summary, the initial learning rates ranged between 1e−3 and 3e−4 and were decreased to 1e−4 or 3e−5 in two steps every few hundred epochs. For optimization we used Adam with \beta_1 = 0.5, \beta_2 = 0.999 and \epsilon = 1e−8. During training, data augmentation was performed by random rotations as well as horizontal and vertical flips. The edge lengths of the square filter windows defining the speckle sizes in the speckle layer were initialized with values ranging from 28 to 48 pixels.
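Expressed in code, the losses of Eqs. 6 and 7 reduce to a few lines. This is a generic sketch in which the conditioning on the mask c is left implicit; a numerically safer implementation would operate on logits with binary cross-entropy:

import torch

def d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Eq. 6: -E[log D(x)] - E[log(1 - D(G(z)))], with D outputting probabilities.
    return -(torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean())

def g_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Eq. 7 (non-saturating): -E[log D(G(z))].
    return -torch.log(d_fake).mean()

# Optimizer settings as stated above (the initial learning rate was grid-searched):
# torch.optim.Adam(model.parameters(), lr=3e-4, betas=(0.5, 0.999), eps=1e-8)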
GAN evaluation

The final evaluation was done by calculating the FID score between 1000 generated images and all 435 real images. For generating the synthetic images, the generators were conditioned with artificial segmentation masks produced by superimposing randomly rotated and disturbed ellipses imitating the artery lumen and the intima/media layers. This approach simulates the way GANs would be used in practice, namely to augment the dataset they were trained with. As explained in the "Fréchet Inception distance" section, the FID score does not completely ensure reliability when used to evaluate the realism of medical image sets. In order to further assess the quality of the synthetic images, we calculated two more metrics: the Jensen–Shannon divergence between the gray value distributions of the different segmentation classes in ground-truth and synthetic images, and the structural similarity (SSIM) index between corresponding ground-truth and synthetic images.

Segmentation evaluation

The generated IVUS images were used to improve the segmentation performance of neural networks with U-Net architecture [16]. The networks consisted of residual blocks [9] in the down- and upsampling paths. In each of the three downsampling blocks, the spatial sizes of the feature maps were halved while the numbers of feature maps were doubled up to 256. The upsampling blocks operated vice versa. The input image dimensions were 256 × 256 and the batch size was 10.

To show that the use of synthetic IVUS data generated by SpeckleGAN improves segmentation performance when dealing with small datasets, we went through two scenarios:

1. 50 examples available for training a segmentation network,
2. 100 examples available for training a segmentation network.

To get representative performance statistics for the segmentation, we used the remaining examples from the whole dataset (385 for scenario 1 and 335 for scenario 2) as a test set. We used the training sets to train SpeckleGANs and baseline GANs for data augmentation (we did not use the GANs from the "GAN evaluation" section). Because of the small datasets in both scenarios, we used the whole training set as a reference set for monitoring the FID score during training and for finally choosing the model which is used to generate the synthetic images for segmentation pre-training. This means that the GANs will tend to overfit on the training set. However, when dealing with extremely small datasets, another split would reduce the amount of data too much to get useful results. Furthermore, it is not well studied so far how overfitting measured via FID scores quantitatively affects GAN performance. In the paper which introduces the FID score [10], the authors also use the training set as a reference set for calculating the FID score.

The best performing GANs each generated 1000 IVUS images by using synthetic segmentation masks as conditional inputs (compare the "GAN evaluation" section). The segmentation networks were then pre-trained with the synthetic IVUS data and fine-tuned with the real training data. We used the Dice coefficient and the modified Hausdorff distance [5] to measure the segmentation performance via fivefold cross-validation. The modified Hausdorff distance allows meaningful evaluation of edge alignment for pixel mask-based segmentation results, because it is less sensitive to outliers. The final results were calculated by means of the remaining test sets.
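The modified Hausdorff distance [5] replaces the maximum in the directed distance by a mean over the point set, which is what makes it less sensitive to outliers. A small sketch operating on boundary pixel coordinates (how the contours are extracted from the predicted masks is not specified in the text and is assumed here):

import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Modified Hausdorff distance between point sets a (N, 2) and b (M, 2):
    the larger of the two directed mean minimum distances."""
    d = cdist(a, b)  # pairwise Euclidean distances
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())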
Results

Generation of synthetic IVUS images

The chart in Fig. 5 shows the FID scores of image sets generated by SpeckleGAN and the baseline GAN for different numbers of training samples. The image sets generated by SpeckleGAN result in FID scores ranging from 134.0 for 50 training images to 113.4 for 435 training images. The baseline GAN, on the other hand, only reaches values between 354.9 and 166.6. The performance improvements of SpeckleGAN in terms of FID scores for the different numbers of training examples are as follows:

– 50 GAN training examples: 165%
– 100 GAN training examples: 58%
– 200 GAN training examples: 30%
– 435 GAN training examples: 47%

Fig. 5 Comparison of SpeckleGAN and the baseline GAN by means of the resulting FID scores for different numbers of GAN training examples (50, 100, 200 and 435). A lower value indicates better GAN performance

Table 1 shows the GAN performances in terms of Jensen–Shannon divergence and SSIM calculated between synthetic and real images. The results are broken down by the number of GAN training examples.

Table 1 Jensen–Shannon divergences and structural similarity (SSIM) indices comparing synthetic and ground-truth (g-t) image sets

# Train samples    Datasets            Jensen–Shannon divergence [10^-3]            SSIM
                                       Lumen     Intima/media    Adventitia
50                 SpeckleGAN/g-t      33.2      25.8            7.2                0.431 ± 0.028
50                 Baseline GAN/g-t    289.9     21.1            55.1               0.240 ± 0.029
100                SpeckleGAN/g-t      7.1       8.1             8.4                0.434 ± 0.030
100                Baseline GAN/g-t    28.6      12.0            4.1                0.445 ± 0.027
200                SpeckleGAN/g-t      4.5       8.7             9.3                0.440 ± 0.029
200                Baseline GAN/g-t    172.2     11.4            14.5               0.301 ± 0.027
435                SpeckleGAN/g-t      3.3       3.7             6.2                0.443 ± 0.027
435                Baseline GAN/g-t    186.0     10.5            11.7               0.324 ± 0.025

Low Jensen–Shannon divergence indicates similar gray value distributions, whereas high SSIM values indicate similar image appearance.

Figure 6 gives an overview of generated IVUS images for varying numbers of training examples. In all cases, SpeckleGAN generates visually more appealing images than the baseline GAN. The quality of SpeckleGAN images decreases only slightly with fewer training examples, whereas the quality of images generated by the baseline GAN decreases strongly.

Fig. 6 Comparison of IVUS images generated by SpeckleGAN and the baseline GAN for different numbers of GAN training examples (435, 200, 100 and 50). All images were generated with the same conditional segmentation mask
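The class-wise Jensen–Shannon divergences of Table 1 compare gray value histograms. A sketch of such a computation (bin count, value range and the use of SciPy are our assumptions, as the text does not state implementation details):

import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(real_values: np.ndarray, synth_values: np.ndarray,
                  bins: int = 256) -> float:
    """Jensen-Shannon divergence between the gray value histograms of one
    tissue class, e.g. all lumen pixels of real vs. synthetic images."""
    p, _ = np.histogram(real_values, bins=bins, range=(0, 255))
    q, _ = np.histogram(synth_values, bins=bins, range=(0, 255))
    # SciPy returns the JS distance, i.e. the square root of the divergence,
    # and normalizes the histograms internally.
    return jensenshannon(p, q) ** 2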
IVUS segmentation

Table 2 shows the segmentation results of both scenarios described in the "Segmentation evaluation" section, with and without pre-training on synthetic images generated by SpeckleGAN and the baseline GAN. The upper table presents the Dice coefficients, whereas the lower table presents the modified Hausdorff distances. We performed t-tests in a pairwise fashion to check if the means differ significantly. We note that p-value correction for multi-hypothesis tests must not be applied in this setting, because we do not perform multiple tests on the same dataset, nor do we test one and the same hypothesis on several datasets. The corresponding p-values are depicted in the four rightmost columns. A low value (typically p < 0.05) indicates a significant difference in the calculated mean values of the underlying segmentation metrics.

Table 2 Comparison of Dice coefficients (upper table) and modified Hausdorff distances (lower table) as a function of the number of training examples

Dice coefficient (%)                                         p-values vs. Baseline GAN    p-values vs. no pre-train
# Samples    Model           Intima/media     Lumen           In/Me       Lumen           In/Me       Lumen
50           SpeckleGAN      83.18 ± 0.23     93.44 ± 0.42    < 0.001     0.122           < 0.001     < 0.001
50           Baseline GAN    81.51 ± —        92.87 ± 1.18    —           —               < 0.001     < 0.001
50           No pre-train    80.02 ± 0.67     91.74 ± 1.50    —           —               —           —
100          SpeckleGAN      86.02 ± 0.44     95.70 ± 0.15    0.002       < 0.001         < 0.001     < 0.001
100          Baseline GAN    85.18 ± 0.46     94.95 ± 0.21    —           —               0.07        0.495
100          No pre-train    84.79 ± 0.22     94.83 ± 0.19    —           —               —           —

Mod. Hausdorff dist. [px]                                    p-values vs. Baseline GAN    p-values vs. no pre-train
# Samples    Model           Intima/media     Lumen           In/Me       Lumen           In/Me       Lumen
50           SpeckleGAN      1.88 ± 0.14      3.07 ± 0.40     0.058       1.000           < 0.001     0.011
50           Baseline GAN    2.07 ± 0.30      3.04 ± 1.45     —           —               < 0.001     0.008
50           No pre-train    2.54 ± 0.51      3.79 ± 2.09     —           —               —           —
100          SpeckleGAN      0.79 ± 0.06      0.37 ± 0.11     0.020       < 0.001         0.001       < 0.001
100          Baseline GAN    0.90 ± 0.09      0.92 ± 0.36     —           —               0.292       0.071
100          No pre-train    0.95 ± 0.06      1.13 ± 0.39     —           —               —           —

The four columns on the right show p-values calculated by pairwise t-tests against the baseline GAN and against training without pre-training.
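For illustration, the significance check for one such pairwise comparison could look as follows. The fold-wise scores are placeholders rather than values from the paper, and whether a paired or unpaired test was used is not stated; a paired test over the five cross-validation folds is assumed:

from scipy import stats

# Hypothetical Dice scores of the five cross-validation folds for two models.
dice_specklegan = [0.832, 0.834, 0.830, 0.833, 0.831]
dice_baseline = [0.815, 0.818, 0.813, 0.816, 0.814]

t, p = stats.ttest_rel(dice_specklegan, dice_baseline)  # paired t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 indicates a significant difference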
Discussion

Generation of synthetic IVUS images

Keeping in mind that the FID score measures the structural similarity of two image sets and their respective diversity, Fig. 5 clearly shows that the baseline GAN fails to generate IVUS image sets with sufficient quality and diversity if the number of training examples decreases. SpeckleGAN, on the other hand, hardly suffers from a reduced number of training samples. The images in Fig. 6 show that SpeckleGAN outperforms the baseline GAN for all numbers of training examples. The visual appearance suffers only slightly from the reduced number of training examples. The baseline GAN generates IVUS images with very blurry and wavy patterns which do not resemble real speckle. For 100 training images, these are smeared out completely and checkerboard artifacts are visible. The baseline GAN fails completely when trained with 50 training samples.

The evaluations by means of the Jensen–Shannon divergence and the SSIM depicted in Table 1 also show the superiority of SpeckleGAN, apart from minor exceptions. It is interesting to see that these exceptions occur in cases where the visual appearance clearly favors the SpeckleGAN results (compare Fig. 6). This can be explained by considering that both metrics do not take into account all image characteristics which are important for IVUS images. For example, a low Jensen–Shannon divergence can be achieved even if the synthetic images do not show speckle at all, because the two-dimensional arrangement of the gray values does not affect gray value histograms. SSIM, on the other hand, compares luminance, contrast and structure (i.e., the correlation) of two images. In the case of IVUS images, luminance and contrast are reliable measures, whereas correlation does not necessarily imply similarity because of speckle noise: two almost identical images that differ only by a slight shift of the speckle patches can have zero (or even negative) correlation. This also holds for other classic similarity measures such as peak signal to noise ratio (PSNR) or mean squared error (MSE), so these have not been used here.

The authors of [20] used 2075 images of the same clinical IVUS dataset (without segmentation masks) for training a two-stage GAN in order to generate synthetic images. Our approach results in Jensen–Shannon divergences which are one order of magnitude below the values achieved in [20], even for only 100 training examples. In particular, the values obtained for the adventitia layer are far superior, which shows that our approach results in speckle patches leading to gray value distributions resembling the real ones very closely. This could be due to the ability of our algorithm to produce speckles of various sizes within a single image. But also the baseline GAN performs better than the approach in [20] regarding the intima/media and adventitia layers when trained with 100 or more samples.

GANs often suffer from mode collapse [17]. This means that only a few or even only a single mode of the data distribution can be generated, which reduces the variety of the samples drastically. SpeckleGAN has the advantage that mode collapse can only affect the morphology (or background) of the image and not the speckle patterns, because these are randomly generated by the speckle layer.

IVUS segmentation

It has been demonstrated (see Table 2) that pre-training improves the mean Dice coefficient and the mean modified Hausdorff distance regardless of whether the synthetic images were generated by SpeckleGAN or by the baseline GAN. But the improvements due to the baseline GAN are only statistically significant for 50 training examples, not for 100 training examples. In nearly all cases, pre-training with synthetic images from SpeckleGAN leads to better mean segmentation performance than pre-training with images from the baseline GAN. However, the improvement is not statistically significant in three cases of 50 training examples: for the Dice coefficient of the lumen as well as for the modified Hausdorff distance of both intima/media and lumen. It can be seen that pre-training with low-quality images from the baseline GAN also improves the resulting Dice coefficients. This indicates that valuable information is present even in the morphology of blurred images.

The evaluation of the Jensen–Shannon divergence in Table 1 and the comparison with [20] show that the structure of the adventitia in particular benefits from SpeckleGAN. However, its appearance is only of minor importance for the segmentation of lumen and intima/media. The baseline GAN achieves much worse Jensen–Shannon divergences for the lumen. Nevertheless, for 50 training examples the lumen segmentation performance is equivalent or even better when pre-training with images of the baseline GAN. This leads to the conclusion that realistic speckle does not play an important role for segmentation of the lumen when dealing with 20 MHz IVUS images. Comparing [1, 3, 22] with the results of scenario 2, it can be seen that our approach nearly reached state-of-the-art performance, although our training set was smaller and no special care was taken regarding the optimization of the segmentation network used in this work (see "Segmentation evaluation" section).

Conclusion

SpeckleGAN improves the quality and diversity of generated IVUS images compared to a baseline GAN model without a speckle layer. It generates visually appealing images with defined morphology (conditioned by segmentation masks) even when trained with extremely small datasets of 50 images. SpeckleGAN offers a wide range of possible applications. First of all, it is not limited to generating IVUS images. It could be applied to ultrasound images in general and to other imaging modalities that produce images with speckle, such as optical coherence tomography or radar. As seen in the previous section, realistic speckle patterns have only a minor impact on the performance when it comes to segmentation of lumen and intima/media layers in IVUS. Classification, detection or tracking tasks which rely heavily on speckle patterns could benefit much more from realistic speckles generated with SpeckleGAN when tackled with data-driven algorithms.

Acknowledgments Open Access funding provided by Projekt DEAL.

Funding This work was partially funded by the European Regional Development Fund (ERDF), by the Hamburgische Investitions- und Förderbank (IFB) and by the Free and Hanseatic City of Hamburg.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent Not applicable.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Balocco S, Gatta C, Ciompi F, Wahle A, Radeva P, Carlier S, Unal G, Sanidas E, Mauri J, Carillo X, Kovarnik T, Wang CW, Chen HC, Exarchos TP, Fotiadis DI, Destrempes F, Cloutier G, Pujol O, Alberti M, Mendizabal-Ruiz EG, Rivera M, Aksoy T, Downe RW, Kakadiaris IA (2014) Standardized evaluation methodology and reference database for evaluating IVUS image segmentation. Comput Med Imaging Graph 38(2):70–90
2. Burckhardt C (1978) Speckle in ultrasound B-mode scans. IEEE Trans Son Ultrason 25(1):1–6
3. China D, Mitra P, Sheet D (2017) Segmentation of lumen and external elastic laminae in intravascular ultrasound images using ultrasonic backscattering physics initialized multiscale random walks. In: Computer vision, graphics, and image processing. Springer, New York, pp 393–403
4. Deng J, Dong W, Socher R, Li L, Li K, Li F-F (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
5. Dubuisson MP, Jain A (1994) A modified Hausdorff distance for object matching. In: Proceedings of the 12th international conference on pattern recognition, pp 566–568
6. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27. Curran Associates Inc, pp 2672–2680
7. Goodman JW (1968) Introduction to Fourier optics. McGraw-Hill, New York
8. Goodman JW (2007) Speckle phenomena in optics: theory and applications. Roberts and Company Publishers, Englewood
9. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
10. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in neural information processing systems, vol 30. Curran Associates Inc, pp 6626–6637
11. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, Shekhar S, Samatova N, Kumar V (2017) Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng 29(10):2318–2331
12. Mendis S, Puska P, Norrving B, World Health Organization, World Heart Federation, World Stroke Organization (2011) Global atlas on cardiovascular disease prevention and control. World Health Organization, Geneva
13. Middel L, Palm C, Erdt M (2019) Synthesis of medical images using GANs. In: First international workshop, UNSURE 2019, and 8th international workshop, CLIP 2019, held in conjunction with MICCAI 2019, pp 125–134
14. Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. In: 6th international conference on learning representations (ICLR 2018)
15. Park T, Liu MY, Wang TC, Zhu JY (2019) Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE conference on computer vision and pattern recognition
16. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention (MICCAI). Springer, Berlin, vol 9351, pp 234–241
17. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X, Chen X (2016) Improved techniques for training GANs. In: Advances in neural information processing systems, vol 29. Curran Associates Inc, pp 2234–2242
18. Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6:50
19. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
20. Tom F, Sheet D (2018) Simulating patho-realistic ultrasound images using deep generative networks with adversarial learning. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pp 1174–1177
21. Uzunova H, Ehrhardt J, Jacob F, Frydrychowicz A, Handels H (2019) Multi-scale GANs for memory-efficient generation of high resolution medical images. In: Medical image computing and computer assisted intervention (MICCAI). Springer, New York, pp 112–120
22. Yang J, Faraji M, Basu A (2019) Robust segmentation of arterial walls in intravascular ultrasound images using dual path U-Net. Ultrasonics 96:24–33


Abstract

Purpose In the field of medical image analysis, deep learning methods gained huge attention over the last years. This can be explained by their often improved performance compared to classic explicit algorithms. In order to work well, they need large amounts of annotated data for supervised learning, but these are often not available in the case of medical image data. One way to overcome this limitation is to generate synthetic training data, e.g., by performing simulations to artificially augment the dataset. However, simulations require domain knowledge and are limited by the complexity of the underlying physical model. Another method to perform data augmentation is the generation of images by means of neural networks. Methods We developed a new algorithm for generation of synthetic medical images exhibiting speckle noise via generative adversarial networks (GANs). Key ingredient is a speckle layer, which can be incorporated into a neural network in order to add realistic and domain-dependent speckle. We call the resulting GAN architecture SpeckleGAN. Results We compared our new approach to an equivalent GAN without speckle layer. SpeckleGAN was able to generate ultrasound images with very crisp speckle patterns in contrast to the baseline GAN, even for small datasets of 50 images. SpeckleGAN outperformed the baseline GAN by up to 165 % with respect to the Fréchet Inception distance. For artery layer and lumen segmentation, a performance improvement of up to 4 % was obtained for small datasets, when these were augmented with images by SpeckleGAN. Conclusion SpeckleGAN facilitates the generation of realistic synthetic ultrasound images to augment small training sets for deep learning based image processing. Its application is not restricted to ultrasound images but could be used for every imaging methodology that produces images with speckle such as optical coherence tomography or radar. Keywords Deep learning · Synthetic image generation · Theory-guided neural networks · Speckle noise · Small datasets · Image segmentation Introduction In recent years, finding diagnoses has been more and more supported by algorithms which provide additional Cardiovascular diseases like atherosclerosis are the leading information to the physician. In particular, powerful deep cause of death globally [12]. A common methodology for learning methods gained significant importance due to their assessing the severity and progress of plaque building in superior performance compared to many explicit algorithms. coronary arteries is intravascular ultrasound (IVUS) as it Typical applications are detection and classification of dis- provides information regarding the vessel wall and the com- eases or segmentation of different tissues. position of plaques. A drawback is the need of large annotated training data- sets in order to get useful results. Annotations are usually made by trained experts to ensure high quality. This natu- rally leads to a lack of high-quality data. To overcome these * Lennart Bargsten limitations, data augmentation methods are commonly used lennart.bargsten@tuhh.de [18]. In addition to applying random transformations to the 1 data samples (which do not alter their labels), the genera- Institute of Medical Technology and Intelligent Systems, tion of artificial training data is a possible way to enlarge Hamburg University of Technology, Hamburg, Germany Vol.:(0123456789) 1 3 1428 International Journal of Computer Assisted Radiology and Surgery (2020) 15:1427–1436 the training set. 
One way to generate synthetic data is to run Material and methods simulations. These often rely on rather simple models or require in-depth domain knowledge leading to results which Speckle layer are either of low quality or quite time consuming. Another promising method to generate artificial data is Speckle is an interference phenomenon in imaging systems by training generative adversarial networks (GANs) [6]. and occurs if the mean distance between scatterers is smaller Nevertheless, GANs also need sufficient amounts of train - than the resolution cell defined by the imaging methodology ing data to reach satisfactory performances. Often they [2]. The size of the resolution cell is determined mainly by are trained with more than 10,000 images or even 100,000 the wavelength of the carrier (or excitation) signal. Another images when dealing with rather diverse datasets [15]. To condition for the developing of speckle is the presence of reduce the amount of needed data, theory-guided opera- independent random phases of the scattered waves at the tions or modules may be integrated into the neural network point of observation, usually generated by surface roughness architecture [11]. These arise from theoretical considerations (optics) or inhomogeneous volumes like tissue (ultrasound). or physical models which can replace parts of the network. Interference of these signals leads to characteristic speckle In this way, the amount of model capacity which would be patterns. used to learn these physical concepts is free to learn other The algorithm for the speckle layer resembles the one features. In addition, theory-based network modules serve to found in the appendix of [8] and is based on the principles of regularize the training process and can thus lead to improved Fourier optics explained in [7]. In Fourier optics, one takes performance. advantage of the fact that under certain simplifications the We designed such a theory-guided network module to propagation and diffraction of wave signals can be expressed add speckle noise to network feature maps and integrated it as Fourier transformations. Although the process of speckle into a GAN architecture, which we called SpeckleGAN. This formation differs in ultrasound systems, the resulting effect enables us to generate realistic IVUS images with very few on the gray values is similar and we illustrate the approach training examples, while keeping the overall network archi- in the context of a simple optical system. tecture simple. Furthermore, the size of resulting speckles The algorithm is based on an imaging system comprised can vary for a single image and is learned during the training of an illuminated rough object and a converging lens (see process. Finally, we show how we can improve IVUS image Fig. 1b). The propagation and focusing of the wave signal segmentation performance by means of pre-training a neu- emitted by the object can be represented by two consecutive ral network with synthetic images by SpeckleGAN if only Fourier transformations. This is possible if some approxima- very limited data are available. Our method thus enables the tions are applied to the following general form of the diffrac- training of high-capacity neural networks with few data by tion integral. It describes how wave signals are diffracted at simultaneously prevent overfitting. apertures and is defined as Fig. 1 a: Sketch showing diffraction at an aperture. 
Variable naming signal exhibits a spatial distribution of random phases which leads to corresponds to Eq.  1. b: Sketch of a simple imaging system with a speckle patterns in the focal plane of the lens rough object and a converging lens. Due to the roughness, the object’s 1 3 International Journal of Computer Assisted Radiology and Surgery (2020) 15:1427–1436 1429 implement Eq. 2. In order to generate the typical speckle exp (jkr ) U(x , y )= cos (,  ) U(x , y ) dx dy . 0 0 01 1 1 1 1 patterns for centric IVUS views, coordinate transforms from j r polar to Cartesian coordinates and vice-versa were added to (1) the pipeline. An exemplary speckle transformation process Here, U(x , y ) denotes the field amplitude in the plane of 0 0 is depicted in Fig. 2. observation, U(x , y ) the field amplitude in the aperture 1 1 plane and Σ the aperture. The vector  represents the normal SpeckleGAN architecture of the aperture plane, k is the wave number,  the vector between a point on the aperture plane and another point on To generate IVUS images with defined geometry regarding the plane of observation and r its norm. See Fig. 1a for a the artery lumen and the intima/media layers, a segmenta- corresponding sketch. Further details regarding the deriva- tion mask has to be used as a conditional input. A promising tion of the formula and its application to the imaging system way to process the segmentation masks is by using spatially- of Fig. 1b can be found in [7]. adaptive normalization (SPADE) for semantic image syn- The speckle layer imitates the optical system of Fig. 1b and thesis [15]. SPADE layers transform segmentation masks can be described by the following equation: (here, encoded as images with integer pixel values from {0, −1 j (x,y) I (x, y)= F F I(x, y) ⋅ e ⋅ rect (x, y) , (2) 1, 2}, where each value corresponds to a tissue class) into sp d feature maps  and  by feeding them through two convo- where I(x, y) and I (x, y) denote the source and speckled lutional layers, respectively. The segmentation masks are sp image, respectively. F represents the Fourier transforma- resized before feeding them into SPADE in order to have the tion and rect (x, y) the rectangular window function with same size as the feature maps which should be normalized. in edge length d. For the sake of simplicity we did not use a Pixel values x of input feature maps to be normalized n,c,h,w circular window function indicated by the lens in Fig. 1. On are transformed as follows: the one hand, we did not observe any difference in the visual in x − appearance of the resulting speckle, on the other hand the n,c,h,w out x =  +  , c,h,w c,h,w n,c,h,w calculation of a circular mask function is computationally more expensive, because the distance between every pixel where the multi-index (n, c, h, w) refers to (sample in batch, to the image center has to be calculated in every training channel, height, width). The parameters  and  denote the c c step. Equation 2 can be interpreted as a low-pass l fi ter of the in channel-wise mean and standard deviation of x , respec- source image which is multiplied pixel-wise with random ∶,c,∶,∶ tively. A colon indexes the whole tensor dimension. phases and is thus equivalent to Figure  3 gives an overview of the overall GAN archi- j (x,y) −1 tecture. Generator and discriminator consist of multiple I (x, y)= I(x, y) ⋅ e ∗ F rect (x, y) (3) sp d residual blocks [9]. 
In the generator, SPADE [15] layers are used to condition the generated image to a given segmenta- j (x,y) I (x, y)= I(x, y) ⋅ e ∗ sinc (x, y). tion mask. The first convolutions in all SPADE layers have (4) sp d 64 output channels. Batch normalization precedes the affine Here, ∗ is the convolution operator and sinc (x, y) the sinc- transformation by SPADE and is also used in the discrimi- function with scale d. The edge length d of the rectangu- nator. Upscaling in the generator is performed by nearest lar window function defines the mean size of the resulting neighbor interpolation, while downscaling in the discrimi- speckles and can be learned during training of the neural nator is performed by convolutions with a stride of 2. The network. Smaller windows lead to larger speckle patches. We generator is seeded with a 128-dimensional random vector note that the runtime complexity of a convolution operation sampled from a standard multivariate Gaussian distribution. scales with n while the fast Fourier transform (FFT) scales Spectral normalization [14] was applied to the generator and with n ⋅ log(n) . It is thus computationally more efficient to the discriminator. Fig. 2 Exemplary speckle transformation of a test image with subsequent coordinate transformation in order to get warped speckles typical for IVUS images 1 3 1430 International Journal of Computer Assisted Radiology and Surgery (2020) 15:1427–1436 Generator Segmentation Map Residual Block (Generator) z~N(0,1) Segmentation Map In Input Feature Maps put Feature Maps Discriminator Linear(4096) Input Image Segmentation Map SPADE 4x4-Conv Concatenate Reshape(256, 4, 4) 1x1-Conv SPADE / ReLU 4x4-Conv(8) Upsample(2), ResBlock(256) 4x4-Conv SPADE / ReLU BatchNorm / ReLU Upsample(2), ResBlock(128) ResBlock(8) with 2-Conv Upsample(2), ResBlock(64) SPADE / ReLU ResBlock(16) with 2-Conv Upsample(2), ResBlock(32) Output Feature Maps ResBlock(32) with 2-Conv Upsample(2), ResBlock(16) ResBlock(64) with 2-Conv Upsample(2), ResBlock(8) ResBlock(128) with 2-Conv Residual Block(Discriminator) Glob. Sum Pooling Input Feature Maps Input Feature Maps ResBlock(256) with 2-Conv Speckle Linear(512) Layer(32) ResBlock(256) 4x4-Conv Linear(32) Glob. Sum Pooling BN / LReLU 1x1-Conv Linear(1) 4x4-Conv BN / LReLU SPADE Sigmoid ResBlock(8) Output BN / LReLU 4x4-Conv(1) Output Feature Maps Tanh Output Image Fig. 3 Sketch of the architecture of SpeckleGAN. Numbers in round brackets depict the respective numbers of output channels. Exceptions are Upsample (number depicts the scaling factor) and Reshape (number depicts the output’s channel and spatial dimensions) The speckle layer follows the penultimate residual layer were found by grid-search and stayed the same for all of the generator. Here, the feature maps already reached experiments. The input feature maps of the speckle layer the output image size. Inserting the speckle layer into a are also used to compute channel attention coefficients by deeper part of the network led to poor results. One rea- applying global sum pooling and two linear layers. The son could be that the feature maps in deeper layers have output feature maps of the speckle layer are weighted with not yet reached the original image size. The speckle layer these coefficients to filter out unimportant combinations of adds speckle noise with 4 different speckle sizes to all input feature maps and speckle sizes. A spatial attention input feature maps, respectively. 
This means that 8 input approach led to massive checkerboard artifacts and was feature maps are transformed to 32 output feature maps, therefore discarded. The resulting synthetic IVUS images whereby 4 feature maps each exhibit the same morphology have a size of 256 × 256 pixels. but with different speckle sizes. These hyperparameters 1 3 International Journal of Computer Assisted Radiology and Surgery (2020) 15:1427–1436 1431 Dataset Training The underlying IVUS dataset was provided by Balocco We used the non-saturating GAN loss functions proposed et al. [1] and consists of 435 IVUS images captured with in [6]: a 20 MHz phased array transducer together with corre- L (, , )=−  log(D()) −  log(1 − D(G())) , D ∼p ∼p data sponding annotated contours marking the lumen border (6) and the media–adventicia interface. The dataset com- prises images with calcified and non-calcified plaque as L (, )=−  log(D(G())) , (7) G ∼p well as bifurcations, side branches and shadow artifacts. The annotated contours were transformed into segmen- where L and L denote the loss functions for discrimina- D G tor and generator, respectively. Furthermore,  denotes a tation masks containing three different classes (lumen, intima/media and adventicia/background). Figure 4 shows real image drawn from the data distribution p , whereas data denotes a condition. In this work,  is a segmentation mask. an example image with the corresponding segmentation mask. The random number  is the input of the generator and is drawn from a standard multivariate Gaussian distribution p . Finally, D and G are the discriminator and generator function, respectively. Fréchet Inception distance For defining a baseline GAN, the speckle layer was replaced with an identity mapping (cyan-colored box in the The Fréchet Inception distance (FID) [10] measures the distance between the generated image data distribution generator sketch of Fig. 3). Everything else remained the same. SpeckleGAN and the baseline GAN were trained with and the real image data distribution by combining mean values and covariance matrices of network activations 435, 200, 100 and 50 training examples, respectively. The validation during training was done by means of calculating arising from feeding both image sets into an Inception-v3 model [19], which was pre-trained on the ImageNet data- the FID score between 435 generated images and the whole dataset of 435 real images to make all cases comparable set [4]. Typically, activations of the penultimate network layer are used to calculate the FID score: (see “Segmentation evaluation” section for notes regarding overfitting). The GANs were conditioned with the segmenta- 2 1∕2 FID = ‖ −  ‖ + Tr(C + C − 2(C C ) ). (5) 1 2 1 2 1 2 2 tion masks of the dataset to generate synthetic images. This ensures that validation is not affected by artery morpholo - Here,  and  are the mean vectors and C and C the cor- 1 2 1 2 gies, but focuses on textures. responding covariance matrices. Small FID scores and thus For every combination of model and number of train- small distances between the image data distributions indi- ing examples, the best learning rate and learning rate decay cate visual similarity of the image sets as well as diversity scheme was grid searched individually. In summary, the of the generated image set meaning that mode collapse was initial learning rates ranged between 1e−3 and 3e−4 and prevented. 
Training

We used the non-saturating GAN loss functions proposed in [6]:

$L_D(x, y, z) = -\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \log D(x, y) \right] - \mathbb{E}_{z \sim p_z}\left[ \log\left( 1 - D(G(z, y), y) \right) \right], \quad (6)$

$L_G(y, z) = -\mathbb{E}_{z \sim p_z}\left[ \log D(G(z, y), y) \right], \quad (7)$

where $L_D$ and $L_G$ denote the loss functions for the discriminator and the generator, respectively. Furthermore, $x$ denotes a real image drawn from the data distribution $p_{\mathrm{data}}$, whereas $y$ denotes a condition; in this work, $y$ is a segmentation mask. The random vector $z$ is the input of the generator and is drawn from a standard multivariate Gaussian distribution $p_z$. Finally, $D$ and $G$ are the discriminator and generator functions, respectively.

For defining a baseline GAN, the speckle layer was replaced with an identity mapping (cyan-colored box in the generator sketch of Fig. 3). Everything else remained the same. SpeckleGAN and the baseline GAN were each trained with 435, 200, 100 and 50 training examples. Validation during training was done by calculating the FID score between 435 generated images and the whole dataset of 435 real images to make all cases comparable (see the "Segmentation evaluation" section for notes regarding overfitting). The GANs were conditioned with the segmentation masks of the dataset to generate synthetic images. This ensures that validation is not affected by artery morphologies, but focuses on textures.

For every combination of model and number of training examples, the best learning rate and learning rate decay scheme were grid searched individually. In summary, the initial learning rates ranged between 1e−3 and 3e−4 and were decreased to 1e−4 or 3e−5 in two steps every few hundred epochs. For optimization, we used Adam with β₁ = 0.5, β₂ = 0.999 and ε = 1e−8. During training, data augmentation was performed by random rotations as well as horizontal and vertical flips. The edge lengths of the square filter windows defining the speckle sizes in the speckle layer were initialized with values ranging from 28 to 48 pixels.
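A single conditional training step implementing Eqs. (6) and (7) might look as follows. The interfaces G(z, mask) and D(img, mask), the discriminator's sigmoid output of shape (batch, 1) and the optimizer handling are assumptions based on the architecture sketch:

import torch
import torch.nn.functional as F

def train_step(G, D, opt_d, opt_g, real_img, mask, z_dim=128):
    b = real_img.size(0)
    ones = torch.ones(b, 1, device=real_img.device)
    zeros = torch.zeros(b, 1, device=real_img.device)
    z = torch.randn(b, z_dim, device=real_img.device)

    # Discriminator update: -log D(x, y) - log(1 - D(G(z, y), y))
    fake_img = G(z, mask).detach()  # do not backpropagate into G here
    loss_d = (F.binary_cross_entropy(D(real_img, mask), ones)
              + F.binary_cross_entropy(D(fake_img, mask), zeros))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Non-saturating generator update: -log D(G(z, y), y)
    loss_g = F.binary_cross_entropy(D(G(z, mask), mask), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()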
GAN evaluation

The final evaluation was done by calculating the FID score between 1000 generated images and all 435 real images. For generating the synthetic images, the generators were conditioned with artificial segmentation masks produced by superimposing randomly rotated and disturbed ellipses imitating the artery lumen and the intima/media layer. This approach simulates the way GANs would be used in practice, namely to augment the dataset they were trained with. As explained in the "Fréchet Inception distance" section, the FID score does not completely ensure reliability when used to evaluate the realism of medical image sets. In order to further assess the quality of the synthetic images, we calculated two more metrics: the Jensen–Shannon divergence between the gray value distributions of the different segmentation classes in ground-truth and synthetic images, and the structural similarity (SSIM) index between corresponding ground-truth and synthetic images.
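The paper does not specify the mask synthesis in detail; the sketch below is one plausible reading of "randomly rotated and disturbed ellipses", with all radii and noise scales chosen by us for illustration:

import numpy as np

def synthetic_mask(size=256, rng=None):
    """Classes: 0 = adventitia/background, 1 = intima/media, 2 = lumen."""
    rng = rng if rng is not None else np.random.default_rng()
    y, x = np.mgrid[:size, :size] - size / 2
    theta = rng.uniform(0, np.pi)  # random rotation of both ellipses
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    mask = np.zeros((size, size), dtype=np.uint8)
    # Outer ellipse (intima/media) first, inner ellipse (lumen) on top.
    for cls, (a, b) in ((1, rng.uniform(60, 100, 2)), (2, rng.uniform(25, 50, 2))):
        r = (xr / a) ** 2 + (yr / b) ** 2
        r += rng.normal(0, 0.05, r.shape)  # "disturb" the ellipse boundary
        mask[r <= 1.0] = cls
    return mask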
Segmentation evaluation

The generated IVUS images were used to improve the segmentation performance of neural networks with U-Net architecture [16]. The networks consisted of residual blocks [9] in the down- and upsampling paths. In each of the three downsampling blocks, the spatial sizes of the feature maps were halved, while the numbers of feature maps were doubled up to 256. The upsampling blocks operated vice versa. The input image dimensions were 256 × 256 and the batch size was 10.

To show that the use of synthetic IVUS data by SpeckleGAN improves segmentation performance when dealing with small datasets, we went through two scenarios:

1. 50 examples available for training a segmentation network,
2. 100 examples available for training a segmentation network.

To get representative performance statistics for the segmentation, we used the remaining examples from the whole dataset (385 for scenario 1 and 335 for scenario 2) as a test set. We used the training sets to train SpeckleGANs and baseline GANs for data augmentation (we did not use the GANs from the "GAN evaluation" section). Because of the small datasets in both scenarios, we used the whole training set as a reference set for monitoring the FID score during training and for finally choosing the model which is used to generate the synthetic images for segmentation pre-training. This means that the GANs will tend to overfit on the training set. However, when dealing with extremely small datasets, another split would reduce the amount of data too much to get useful results. Furthermore, it is not well studied so far how overfitting with respect to the FID score quantitatively affects GAN performance. In the paper which introduces the FID score [10], the authors also use the training set as a reference set for calculating the FID score.

The best performing GANs each generated 1000 IVUS images using synthetic segmentation masks as conditional inputs (compare the "GAN evaluation" section). The segmentation networks were then pre-trained with the synthetic IVUS data and fine-tuned with the real training data. We used the Dice coefficient and the modified Hausdorff distance [5] to measure the segmentation performance via fivefold cross-validation. The modified Hausdorff distance allows a meaningful evaluation of edge alignment for pixel mask-based segmentation results, because it is less sensitive to outliers. The final results were calculated on the remaining test sets.
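The modified Hausdorff distance of [5] replaces the maxima of the directed nearest-neighbor distances by their means, which is what makes it robust to outliers. A sketch operating on boundary pixel coordinates:

import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """a, b: (n, 2) and (m, 2) arrays of boundary pixel coordinates."""
    d = cdist(a, b)                    # pairwise Euclidean distances, shape (n, m)
    return max(d.min(axis=1).mean(),   # mean nearest distance from a to b
               d.min(axis=0).mean())   # mean nearest distance from b to a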
Results

Generation of synthetic IVUS images

The chart in Fig. 5 shows the FID scores of image sets generated by SpeckleGAN and the baseline GAN for different numbers of training samples. The image sets generated by SpeckleGAN result in FID scores ranging from 134.0 for 50 training images to 113.4 for 435 training images. The baseline GAN, on the other hand, only reaches values ranging from 354.9 down to 166.6. The performance improvements of SpeckleGAN in terms of FID scores for the different numbers of training examples are as follows:

– 50 GAN training examples: 165%
– 100 GAN training examples: 58%
– 200 GAN training examples: 30%
– 435 GAN training examples: 47%

Fig. 5 Comparison of SpeckleGAN and the baseline GAN by means of resulting FID scores for different numbers of GAN training examples. Lower values indicate better GAN performance (SpeckleGAN: 134.0, 132.7, 134.1, 113.4; baseline GAN: 354.9, 210.1, 174.3, 166.6 for 50, 100, 200 and 435 training examples)

Table 1 shows the GAN performances in terms of Jensen–Shannon divergence and SSIM calculated between synthetic and real images. The results are broken down by the number of GAN training examples.

Table 1 Jensen–Shannon divergences and structural similarity (SSIM) indices comparing synthetic and ground-truth (g-t) image sets

# Train samples  Datasets          Jensen–Shannon divergence [10^-3]        SSIM
                                   Lumen    Intima/media    Adventitia
50 samples       SpeckleGAN/g-t    33.2     25.8            7.2             0.431 ± 0.028
                 Baseline GAN/g-t  289.9    21.1            55.1            0.240 ± 0.029
100 samples      SpeckleGAN/g-t    7.1      8.1             8.4             0.434 ± 0.030
                 Baseline GAN/g-t  28.6     12.0            4.1             0.445 ± 0.027
200 samples      SpeckleGAN/g-t    4.5      8.7             9.3             0.440 ± 0.029
                 Baseline GAN/g-t  172.2    11.4            14.5            0.301 ± 0.027
435 samples      SpeckleGAN/g-t    3.3      3.7             6.2             0.443 ± 0.027
                 Baseline GAN/g-t  186.0    10.5            11.7            0.324 ± 0.025

Low Jensen–Shannon divergence indicates similar gray value distributions, whereas high SSIM values indicate similar image appearance.

Figure 6 gives an overview of generated IVUS images for varying numbers of training examples. In all cases, SpeckleGAN generates visually more appealing images than the baseline GAN. The quality of SpeckleGAN images decreases only slightly with fewer training examples, whereas the quality of images generated by the baseline GAN decreases strongly.

Fig. 6 Comparison of IVUS images generated by SpeckleGAN and the baseline GAN for different numbers of GAN training examples. All images were acquired with the same conditional segmentation mask

IVUS segmentation

Table 2 shows the segmentation results of both scenarios described in the "Segmentation evaluation" section with and without pre-training by means of synthetic images generated by SpeckleGAN and the baseline GAN. The upper table presents the Dice coefficients, whereas the lower table presents the modified Hausdorff distances. We performed t tests in a pairwise fashion to check whether the means differ significantly. We note that p value correction for multi-hypothesis tests must not be applied in this setting, because we neither perform multiple tests on the same dataset nor test one and the same hypothesis on several datasets. The corresponding p values are depicted in the four rightmost columns. A low value (typically p < 0.05) indicates a significant difference in the calculated mean values of the underlying segmentation metrics.

Table 2 Comparison of Dice coefficients (upper table) and modified Hausdorff distances (lower table) as a function of the number of training examples

# Samples  Model         Dice coefficient (%)            p values vs. baseline GAN  p values vs. no pre-train
                         Intima/media   Lumen            In/Me     Lum              In/Me     Lum
50         SpeckleGAN    83.18 ± 0.23   93.44 ± 0.42     < 0.001   0.122            < 0.001   < 0.001
           Baseline GAN  81.51 ± –      92.87 ± 1.18     ∗         ∗                < 0.001   < 0.001
           No pre-train  80.02 ± 0.67   91.74 ± 1.50     ∗         ∗                ∗         ∗
100        SpeckleGAN    86.02 ± 0.44   95.70 ± 0.15     0.002     < 0.001          < 0.001   < 0.001
           Baseline GAN  85.18 ± 0.46   94.95 ± 0.21     ∗         ∗                0.07      0.495
           No pre-train  84.79 ± 0.22   94.83 ± 0.19     ∗         ∗                ∗         ∗

# Samples  Model         Mod. Hausdorff dist. [px]       p values vs. baseline GAN  p values vs. no pre-train
                         Intima/media   Lumen            In/Me     Lum              In/Me     Lum
50         SpeckleGAN    1.88 ± 0.14    3.07 ± 0.40      0.058     1.000            < 0.001   0.011
           Baseline GAN  2.07 ± 0.30    3.04 ± 1.45      ∗         ∗                < 0.001   0.008
           No pre-train  2.54 ± 0.51    3.79 ± 2.09      ∗         ∗                ∗         ∗
100        SpeckleGAN    0.79 ± 0.06    0.37 ± 0.11      0.020     < 0.001          0.001     < 0.001
           Baseline GAN  0.90 ± 0.09    0.92 ± 0.36      ∗         ∗                0.292     0.071
           No pre-train  0.95 ± 0.06    1.13 ± 0.39      ∗         ∗                ∗         ∗

The four columns on the right show p values calculated by pairwise t tests. If p values are smaller than 0.05, they are printed in bold.
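For reference, the per-class Jensen–Shannon divergences reported in Table 1 can be computed from class-masked gray value histograms along the following lines; the bin count and value range are assumptions, and SciPy returns the Jensen–Shannon distance, hence the squaring:

import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence_per_class(real, fake, mask, cls, bins=256):
    """real, fake: (H, W) gray value images; mask: (H, W) class label map."""
    p, _ = np.histogram(real[mask == cls], bins=bins, range=(0, 255))
    q, _ = np.histogram(fake[mask == cls], bins=bins, range=(0, 255))
    return jensenshannon(p, q, base=2) ** 2  # squared distance = divergence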
Discussion

Generation of synthetic IVUS images

Keeping in mind that the FID score measures the structural similarity of two image sets and their respective diversity, Fig. 5 clearly shows that the baseline GAN fails to generate IVUS image sets with sufficient quality and diversity if the number of training examples decreases. SpeckleGAN, on the other hand, hardly suffers from a reduced number of training samples. The images in Fig. 6 show that SpeckleGAN outperforms the baseline GAN for all numbers of training examples. The visual appearance suffers only slightly from the reduced number of training examples. The baseline GAN generates IVUS images with very blurry and wavy patterns which do not resemble real speckle. For 100 training images, these are smeared out completely and checkerboard artifacts are visible. The baseline GAN fails completely when trained with 50 training samples.

The evaluations by means of the Jensen–Shannon divergence and the SSIM depicted in Table 1 also show the superiority of SpeckleGAN apart from minor exceptions. It is interesting to see that these exceptions occur in cases where the visual appearance clearly favors the SpeckleGAN results (compare Fig. 6). This can be explained by considering that both metrics do not take into account all image characteristics which are important for IVUS images. For example, a low Jensen–Shannon divergence can be achieved even if the synthetic images do not show speckle at all, because the two-dimensional arrangement of the gray values does not affect gray value histograms. SSIM, on the other hand, compares luminance, contrast and structure (i.e., the correlation) of two images. In the case of IVUS images, luminance and contrast are reliable measures, whereas correlation does not necessarily imply similarity because of speckle noise. Two almost identical images that differ only by a slight shift of the speckle patches can have zero (or even negative) correlation. This also holds for other classic similarity measures such as the peak signal-to-noise ratio (PSNR) or the mean squared error (MSE), so these have not been used here.

The authors of [20] used 2075 images of the same clinical IVUS dataset (without segmentation masks) for training a two-stage GAN in order to generate synthetic images. Our approach results in Jensen–Shannon divergences which are one order of magnitude below the values achieved in [20], even for only 100 training examples. In particular, the values obtained for the adventitia layer are far superior, which shows that our approach results in speckle patches leading to gray value distributions that resemble the real ones very closely. This could be due to the ability of our algorithm to produce speckles of various sizes within a single image. But also the baseline GAN performs better than the approach in [20] regarding the intima/media and adventitia layers when trained with 100 or more samples.

GANs often suffer from mode collapse [17]. This means that only a few or even only a single mode of the data distribution can be generated, which reduces the variety of the samples drastically. SpeckleGAN has the advantage that mode collapse can only affect the morphology (or background) of the image and not the speckle patterns, because these are randomly generated by the speckle layer.

IVUS segmentation

It has been demonstrated (see Table 2) that pre-training improves the mean Dice coefficient and the mean modified Hausdorff distance regardless of whether the synthetic images were generated by SpeckleGAN or by the baseline GAN. But the improvements due to the baseline GAN are only statistically significant for 50 training examples, not for 100 training examples. In nearly all cases, pre-training with synthetic images of SpeckleGAN leads to better mean segmentation performances than pre-training with images from the baseline GAN. However, the improvement is not statistically significant in three cases of 50 training examples: for the Dice coefficient of the lumen as well as for the modified Hausdorff distance of both intima/media and lumen. It can be seen that pre-training with low quality images from the baseline GAN also improves the resulting Dice coefficients. This indicates that valuable information is present even in the morphology of blurred images.

The evaluation of the Jensen–Shannon divergence in Table 1 and the comparison with [20] show that the structure of the adventitia in particular benefits from SpeckleGAN. However, its appearance is only of minor importance for the segmentation of lumen and intima/media. The baseline GAN achieves much worse Jensen–Shannon divergences for the lumen. Nevertheless, for 50 training examples the lumen segmentation performance is equivalent or even better when pre-training with images of the baseline GAN. This leads to the conclusion that realistic speckle does not play an important role for the segmentation of the lumen when dealing with 20 MHz IVUS images. Comparing [1, 3, 22] with the results of scenario 2, it can be seen that our approach nearly reached state-of-the-art performance, although our training set was smaller and no special care was taken regarding the optimization of the segmentation network used in this work (see the "Segmentation evaluation" section).

Conclusion

SpeckleGAN improves the quality and diversity of generated IVUS images compared to a baseline GAN model without a speckle layer. It generates visually appealing images with defined morphology (conditioned by segmentation masks) even when trained with extremely small datasets of 50 images. SpeckleGAN offers a wide range of possible applications. First of all, it is not limited to generating IVUS images. It could be applied to ultrasound images in general and to other imaging modalities that produce images with speckle, such as optical coherence tomography or radar. As seen in the previous section, realistic speckle patterns have only a minor impact on the performance when it comes to the segmentation of lumen and intima/media layers in IVUS. Classification, detection or tracking tasks which rely heavily on speckle patterns could benefit much more from realistic speckles generated with SpeckleGAN when tackled with data-driven algorithms.

Acknowledgments Open Access funding provided by Projekt DEAL.

Funding This work was partially funded by the European Regional Development Fund (ERDF), by the Hamburgische Investitions- und Förderbank (IFB) and by the Free and Hanseatic City of Hamburg.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent Not applicable.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Balocco S, Gatta C, Ciompi F, Wahle A, Radeva P, Carlier S, Unal G, Sanidas E, Mauri J, Carillo X, Kovarnik T, Wang CW, Chen HC, Exarchos TP, Fotiadis DI, Destrempes F, Cloutier G, Pujol O, Alberti M, Mendizabal-Ruiz EG, Rivera M, Aksoy T, Downe RW, Kakadiaris IA (2014) Standardized evaluation methodology and reference database for evaluating IVUS image segmentation. Comput Med Imaging Graph 38(2):70–90
2. Burckhardt C (1978) Speckle in ultrasound B-mode scans. IEEE Trans Son Ultrason 25(1):1–6
3. China D, Mitra P, Sheet D (2017) Segmentation of lumen and external elastic laminae in intravascular ultrasound images using ultrasonic backscattering physics initialized multiscale random walks. In: Computer vision, graphics, and image processing. Springer, New York, pp 393–403
4. Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
5. Dubuisson MP, Jain A (1994) A modified Hausdorff distance for object matching. In: Proceedings of the 12th international conference on pattern recognition, pp 566–568
6. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27. Curran Associates Inc, pp 2672–2680
7. Goodman JW (1968) Introduction to Fourier optics. McGraw-Hill, New York
8. Goodman JW (2007) Speckle phenomena in optics: theory and applications. Roberts and Company Publishers, Englewood
9. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
10. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in neural information processing systems, vol 30. Curran Associates Inc, pp 6626–6637
11. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, Shekhar S, Samatova N, Kumar V (2017) Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng 29(10):2318–2331
12. Mendis S, Puska P, Norrving B, World Health Organization, World Heart Federation, World Stroke Organization (2011) Global atlas on cardiovascular disease prevention and control. World Health Organization, Geneva
13. Middel L, Palm C, Erdt M (2019) Synthesis of medical images using GANs. In: First international workshop, UNSURE 2019, and 8th international workshop, CLIP 2019, held in conjunction with MICCAI 2019, pp 125–134
14. Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. In: 6th international conference on learning representations (ICLR 2018)
15. Park T, Liu MY, Wang TC, Zhu JY (2019) Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE conference on computer vision and pattern recognition
16. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention (MICCAI), vol 9351. Springer, Berlin, pp 234–241
17. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. In: Advances in neural information processing systems, vol 29. Curran Associates Inc, pp 2234–2242
18. Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6:50
19. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
20. Tom F, Sheet D (2018) Simulating patho-realistic ultrasound images using deep generative networks with adversarial learning. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pp 1174–1177
21. Uzunova H, Ehrhardt J, Jacob F, Frydrychowicz A, Handels H (2019) Multi-scale GANs for memory-efficient generation of high resolution medical images. In: Medical image computing and computer assisted intervention (MICCAI). Springer, New York, pp 112–120
22. Yang J, Faraji M, Basu A (2019) Robust segmentation of arterial walls in intravascular ultrasound images using dual path U-Net. Ultrasonics 96:24–33

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
