Automated detection and segmentation of thoracic lymph nodes from CT using 3D foveal fully convolutional neural networks

Background: In oncology, the correct determination of nodal metastatic disease is essential for patient management, as patient treatment and prognosis are closely linked to the stage of the disease. The aim of the study was to develop a tool for automatic 3D detection and segmentation of lymph nodes (LNs) in computed tomography (CT) scans of the thorax using a fully convolutional neural network based on 3D foveal patches.

Methods: The training dataset was collected from the Computed Tomography Lymph Nodes Collection of the Cancer Imaging Archive, containing 89 contrast-enhanced CT scans of the thorax. A total of 4275 LNs were segmented semi-automatically by a radiologist, assessing the entire 3D volume of the LNs. Using these data, a fully convolutional neural network based on 3D foveal patches was trained with fourfold cross-validation. Testing was performed on an unseen dataset containing 15 contrast-enhanced CT scans of patients who were referred upon suspicion or for staging of bronchial carcinoma.

Results: The algorithm achieved a good overall performance, with a total detection rate for enlarged LNs of 76.9% during fourfold cross-validation in the training dataset (10.3 false positives per volume) and of 69.9% in the unseen testing dataset. In the training dataset, a better detection rate was observed for enlarged LNs than for smaller LNs: the detection rates for LNs with a short-axis diameter (SAD) ≥ 20 mm and an SAD of 5–10 mm were 91.6% and 62.2%, respectively (p < 0.001). The best detection rates were obtained for LNs located in Level 4R (83.6%) and Level 7 (80.4%).
Conclusions: The proposed 3D deep learning approach achieves an overall good performance in the automatic detection and segmentation of thoracic LNs and shows reasonable generalizability, yielding the potential to facilitate detection during routine clinical work and to enable radiomics research without observer‑bias. Keywords: Deep learning, Artificial intelligence, Lymph nodes, Computed tomography, Staging Background The correct determination of nodal metastatic disease is imperative for patient management in oncology, since the patients’ treatment and prognosis are inherently linked to the stage of disease [1]. For nodal disease stag- *Correspondence: andra.iuga@uk‑koeln.de ing of solid tumors, unidimensional measurements of Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, Kerpener Str. 62, lymph node (LN) short-axis diameters (SAD) are rou- 50937 Cologne, Germany tinely performed during tumor staging and re-staging Full list of author information is available at the end of the article © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. 
The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Iuga et al. BMC Med Imaging (2021) 21:69 Page 2 of 12 imaging examinations and evaluated according to differ - clinical practice, AI has been lately used in digital pathol- ent standardized diagnostic criteria such as the Response ogy, in imaging of the brain for the detection of metasta- Evaluation Criteria in Solid Tumors (RECIST) [2]. For sis and in imaging of the chest, for the early detection of lymphomas, a different set of standardized diagnos - breast carcinoma [17, 18]. tic criteria such as the Response Evaluation Criteria in Regarding LNs a wide range of 2D approaches [21, 22] Lymphoma (RECIL) [3] or the Lugano Criteria [4] have have been proposed so far for detection and segmenta- been suggested, using bi- instead of unidimensional LN tion, where LNs were segmented using unidimensional measurements. measurements consisting of the determination of the Although it is commonly accepted that larger LNs have SAD of the target lesions. However, a unidimensional a higher probability of being malignant as compared to approach can underestimate the size as well as the smaller LNs, previous work has shown that enlargement growth of LNs, especially when considering enlarged of LNs alone is not the most reliable predictive factor LNs. Consequently, correct segmentation of LNs consid- for malignancy with only 62% sensitivity and specific - ering the whole volume of the lesion is of ultimate impor- ity being demonstrated for predicting LN metastasis in tance for proper diagnosis and follow-up. patients with non-small cell lung cancer when using the u Th s, the aim of the study was to develop a tool for proposed 10  mm cut-off [5]. 
Consequently, small LNs automatic 3D LN detection and segmentation in com- potentially harboring micrometastases should be taken puted tomography (CT) scans using a fully convolutional into consideration for improved diagnostic accuracy neural network based on 3D foveal patches. during disease staging [6–8]. Unfortunately, no imaging technique (including, e.g., functional techniques such as Methods diffusion-weighted magnetic resonance imaging) so far Description of the training and validation dataset has been demonstrated to be capable of reliably detecting For the training and validation dataset, images were LN micrometastases [9–11]. obtained from the CT Lymph Nodes Collection of the Radiomics is a promising novel strategy for predicting Cancer Imaging Archive [22]. The dataset can be accessed LN dignity from images. Radiomic models thereby are and downloaded at https:// wiki. cance rimag ingar chive. built using e.g., machine learning algorithms based on a net/ displ ay/ Public/ CT+ Lymph+ Nodes. The dataset was large set of quantitative features, which are mathemati- made available to allow for a direct comparison to other cally or statistically derived from medical images [12–14]. detection methods in order to advance the state of the For extraction of radiomic features and detection of LN art and to encourage development and improvement of macro- as well as micrometastases, a reliable and correct computer-aided detection methods. The dataset con - detection and whole-volume segmentation of small as tained contrast-enhanced CT images of 90 patients from well as large LNs is needed. 
Manual or even semi-auto- different scanners with an in-slice resolution between mated segmentation of LNs is extremely time-consuming 0.63 and 0.98 mm and a slice thickness ranging from 1 to and LN detection strongly depends on the radiologist’s 5 mm (88 CT scans with a slice thickness of 1 or 1.5 mm experience, thus currently hampering the translation of and 2 CT scans with a slice thickness of 5  mm). To the a radiomics-based decision support to clinical routine. best of our knowledge, there is no information available Consequently, fully automated approaches are urgently regarding patients’ disease or further demographic infor- needed for a fast and robust detection and segmentation mation. The included CT scans showed normal-sized of LNs. thoracic LNs (SAD < 10  mm) as well as lymphadenopa- Recent developments in deep learning (DL) have thy (SAD ≥ 10 mm). The datasets included also CT scans shown promising results in areas relying on imaging data, containing mediastinal bulky disease and bulky axillary especially in radiology [15, 16] and cancer imaging [17, lymphadenopathy. In order to allow better comparison 18]. While requiring little human input, DL algorithms to clinical routine with usually heterogeneous datasets, significantly outperform existing detection and segmen - these were not excluded from network training. One case tation methods [19], thus offering automated quantifica - was excluded from our study since it did not contain the tion and selection of the most robust features, including complete scan of the thorax. a proper 3D assessment of lesions. Moreover, previous For this dataset Institutional Review Board approval work showed that 3D DL architectures were successful was not required because it is a publicly available dataset. in learning high-level tumor appearance features out- performing 2D models [20]. 
In cancer imaging, the use Description of the testing dataset of artificial intelligence (AI) has shown a great utility not Further, a second unseen dataset was collected for only in the (semi)automatic tumor detection, but also independent testing. Similar to the training and in tumor characterization, and treatment follow-up. In validation dataset, the testing dataset consisted of Iuga  et al. BMC Med Imaging (2021) 21:69 Page 3 of 12 contrast-enhanced CT scans (n = 15). The patients (8 clinical experience and focus in oncological imaging. The male, 7 female; mean age 68 ± 16.6  years) were referred training and validation dataset consisted of 4275 LNs, upon suspicion or for staging of bronchial carcinoma with an average of 48 LNs per patient. When consider- from March 2016 to November 2017 (Table 1). All exam- ing the location of the LNs the 4275 LNs included 2272 inations were performed on a 128-slice PET/CT-system axillary and 2003 mediastinal/hilar LNs. The LNs had an (Siemens Biograph mCT Flow 128 Edge, Siemens Medi- SAD of 1.3–67.6  mm. A total number of 814 enlarged cal). Patients were scanned supine in cranio-caudal direc- (SAD > 10 mm) LNs was segmented. The segmentation of tion during inspirational breath-hold after intravenous the LNs took approximately between 45 and 120 min per injection of 120  ml contrast medium (Accupaque 350, dataset. LNs with an SAD < 5 mm that had been mistak- GE Healthcare) with an injection rate of 2.5  ml/s and a enly annotated (n = 690) were not included in the evalu- delay of 60  s. The following scan parameters were used: ation. Figure  1 shows the process of data collection and collimation 128 × 0.6  mm, rotation time 0.5  s, pitch 0.6. LN segmentation. All axial images were reconstructed with a slice thickness For data evaluation segmented LNs were divided into of 2  mm. 
Similar to the training and validation dataset, 3 groups based on their SAD: 5–10  mm (2523 LNs); the testing dataset included CT scans, that showed both 10–20 mm (954 LNs); and > 20 mm (107 LNs). normal-sized thoracic LNs and lymphadenopathy. Furthermore, based on their localization all segmented Ethical approval was waived due to the retrospective LNs were divided into axillary (right, left) and mediasti- design of the study based on preexisting images (Eth- nal (including hilar LNs). The mediastinal LNs were fur - ics Committee of the Faculty of Medicine, University of ther divided in 11 groups depending on their location Cologne, reference number 19-1390/ 07.08.2019). corresponding levels (levels 1–11), based on the Moun- tain–Dresler modification of the American Thoracic Lymph node segmentation Society LN map [23]. The side (right respectively left) was Training and validation dataset considered for level 1, level 2, level 4, level 10 and level A radiologist (blinded; more than 4  years of experience 11. in thoracic imaging) segmented all LNs of the train- ing and validation dataset with an SAD of at least 5 mm Testing dataset in the mediastinal, hilar and axillary regions using the In the testing dataset, a total of 113 LNs were segmented, semi-automatic 3D Multi-Modal Tumor Tracking tool with an average of 7.5 LNs per patient. The segmented of a commercially available software platform (Intel- LNs included both axillary and mediastinal/hilar LNs. liSpace Portal, Version 11.0, Philips Healthcare). In case In this dataset, nevertheless, because of time constraints of unclear LNs or findings CT images were discussed only LNs with an SAD > 10 mm were segmented. 
with an experienced radiologist with more than 15 years Network architecture A 3D fully convolutional neural network (u-net) was Table 1 Demographic details (age and sex) for all patients trained on the training dataset, which obtains as input included in the test dataset the original 3D images and the corresponding label Age Sex masks of the segmented LNs. The output of the net - work was a probability map, showing the probability of 1 33 Female each voxel belonging to a mediastinal or axillary LN. This 2 64 Female probability map was assessed with a fixed threshold of 3 79 Male 0.4 to obtain the final segmentation result. The threshold 4 69 Female value was optimized on the training images to yield the 5 79 Male best Dice value over all training samples. Finally, a con- 6 63 Male nected component analysis is applied to obtain the indi- 7 75 Male vidual predicted LNs. 8 68 Male The segmentation network was trained on the 3D 9 74 Female images. The used network architecture, named foveal 10 47 Male neural network (f-net) [24] is inspired by the human eye 11 33 Female and the distribution of the photoreceptor cells, which 12 69 Female have the highest resolution at the fovea centralis. A f-net 13 72 Male architecture has been used because this architecture 14 33 Female combines information of different resolution levels. On 15 67 Male the one hand, LNs were analyzed in high resolution to Iuga et al. BMC Med Imaging (2021) 21:69 Page 4 of 12 Fig. 1 Flow‑ chart showing data in‑ and exclusion together with segmentation for network training. From a total number of 90 contrast ‑ enhanced CT scans contained in the publicly available dataset 1 CT scan was excluded because it did not contain the complete scan of the thorax. Further, a total of 690 LNs were excluded because of an SAD < 5 mm. CT scans containing mediastinal bulky disease and bulky axillary lymphadenopathy were not excluded. 
Top left image—exemplary segmentation of bulky axillary lymphadenopathy; top right image—exemplary segmentation of normal axillary LNs; Bottom left image—exemplary segmentation of bulky mediastinal lymphadenopathy; Bottom right image—exemplary segmentation of enlarged LNs. Considerable differences in image quality of the different CT scans was noted as exemplarily shown in the bottom right image. CT computed tomography, LNs lymph nodes, SAD short‑axis diameter enable feature learning (texture, shape and size). On the feature extraction pathways is equivalent to the number other hand, neighboring anatomy was analyzed in low of resolution levels in the network. Each feature extrac- resolution. tion pathway comprises three successive blocks of valid As previously mentioned, the network considers convolution with a kernel size of 3, batch-normaliza- image patches at multiple resolution scales in order to tion, and rectified linear activation function, so called arrive at the final prediction, combining local informa - convolutional layer (Conv-L), batch normalization layer tion gained from high resolutions with context from (BN-L), and ReLU layer (ReLU-L) (CBR) blocks. The lower resolutions. Unlike u-nets [25], which receive a outputs of the feature extraction levels are combined single scale input image and create the coarser resolu- in a feature integration pathway through an additional tion scales by downsampling within the network, f-net CBR block followed by upsampling of the lower reso- directly receives the input as a multiscale pyramid of lution outputs. Finally, a channel-wise softmax layer is image patches. Here, an architecture with four resolu- applied to acquire pseudo class probabilities for the LN tion levels was used. Accordingly, each input sample labels. 
In addition, f-net was chosen because its archi- to the network consisted of four image patches at the tecture requires less memory and runtime compared to same position but downscaled for the lower resolution u-net [25]. Figure  2 shows an overview of the network levels. The input to each resolution level is processed architecture. in a feature extraction pathway. Thus, the number of Iuga  et al. BMC Med Imaging (2021) 21:69 Page 5 of 12 Fig. 2 Sketch of the network architecture. A 3D fully convolutional foveal neural network was trained. The network architecture is inspired by the human eye and the distribution of the photoreceptor cells, which have the highest resolution at the fovea centralis. The network consists of several blocks of convolutional layers, batch normalization and the rectified linear activation function (CBR), which extract features at different resolution levels. CBR blocks are followed by upsampling layers (CBRU) to match the resolution of the other levels Training and validation setup patches was applied on-the-fly with a maximal scaling Training was performed using Microsoft Cognitive factor of 1.1 and a maximal rotation of 7°. The usage of Toolkit CNTK with a Python interface (Hardware: stronger augmentation with regard to rotation and scal- 2.40  GHz processor with 2 × NVIDIA GTX 1080ti with ing showed a decline in performance and was therefore 11  GB graphics memory). The images were pre-pro - abandoned. Individual LNs were not manipulated during cessed by resampling them to a fixed isotropic sampling data augmentation and therefore the total number of LNs grid with a spacing of 1.5  mm, this increases the speed remained unchanged. Test-time augmentation has not of the network training and deployment while preserv- been performed. ing sufficient image detail. Used matrix size was standard The cross-entropy function was chosen for the optimi - (512 × 512) and data pixel size was 1 mm isotropic. 
zation of the network since it showed good performance To enhance the soft-tissue contrast of the LNs, only on many tasks [12, 13, 26].The network was trained for the gray-value window 750/70 Hounsfield Units (HU) 1000 epochs with a minibatch size of 8 and the AdaDelta was considered and gray-values outside this range were optimizer. clipped to the upper or lower limit. This gray-value win - The models were trained using fourfold cross-valida - dow was determined automatically on the training data tion on the training data with the dataset being randomly by computing the mean and standard-deviation of all split into four groups (i.e., training was performed on 3 of voxels labeled as LN and their direct neighborhood. No the groups while the remaining group was used for vali- further pre-processing was performed. dation.). The validation was used to explore performance Training was performed based on patches (hereby, it of the network architecture and training setup with was ensured that at least 30% of the patches contain LN regard to number of resolution levels in the network, voxels), which were drawn randomly from the images. As optimizer, augmentation and patch sampling strategy. In data augmentation, random scaling and rotation of the the following we present the results of the best training Iuga et al. BMC Med Imaging (2021) 21:69 Page 6 of 12 experiment. A full ablation study is beyond the scope of of the results, bootstrapping analysis was performed this paper and will be addressed in future work. (with replacement using 100% of the sample size with the number of simulations N = 10.000). Testing setup Finally, the model trained on the complete training data- Results set was tested on the previously unseen, in-house derived Calculation of the LN probability maps took about 24  s testing dataset. per dataset on a graphics processing unit (GPU), while training took 120–180 min. 
Evaluation criteria Bootstrap analysis was performed and confirmed the The performance of the network is assessed by looking at robustness of the results. The empirical distribution of the individual LNs. For the ground-truth the single nodes the detection rate showed a standard deviation of 1.7%. are available from the annotation process. For the pre- dicted LNs, a connected component analysis of the pre- Network performance: validation dataset dicted segmentation mask is performed. Segmentation accuracy One performance metric is the detection rate, which is Overall, a mean Dice value of 0.75 and 0.48 is achieved the number of detected LNs divided by the total number on the training and validation dataset. True positive rate of LNs. A LN is thereby counted as detected if there was and positive predictive value account to 0.76 and 0.75 on at least one voxel overlap with the segmentation mask the training and 0.45 and 0.62 on the validation data. The predicted by the network. Dice value for the mediastinal LN accounts to 0.44 and to The second performance metric is the number of false 0.55 for the axillary LN with a smaller gap between train- positives (FP) per volume. Here, a connected component ing and validation, therefore showing less overfitting. in the predicted segmentation mask without overlap to a More details can be seen in Fig. 3. ground truth is counted as FP. This rather loose criterion was chosen instead of Lymph node detection rate according to lymph node size stricter measures, e.g., larger overlap thresholds between The overall detection rate for all LNs with an SAD > 5 mm ground truth and predicted segmentation, as one par- using the trained network was 66.5% with 10.3 FPs per ticular challenge in LN assessment is that differentiation volume on average. Exemplary images of detected and of individual nodes is often not possible when adjacent missed LNs compared to the ground truth segmentations nodes merge into clusters due to pathology. 
Obviously, it are shown in Fig.  4. The highest detection rate could be can occur that a ground truth segmentation is ’detected’ observed when looking only at LNs with an SAD > 20 mm, by multiple predicted segmentations and similarly that a while detection rate was only good to moderate when predicted segmentation overlaps with multiple ground- considering smaller LNs (SAD > 20  mm vs. SAD truth segmentations. This criterion appears to be current 10–20  mm: 91.6% vs. 75.3%, p < 0.001; SAD > 20  mm vs. state-of-the-art and has been used in previous work [22]. SAD 5–10  mm: 91.6% vs. 62.2%, p < 0.001; Fig.  5). Look- In addition, the segmentation quality is assessed on a ing only at the subgroup of clinically relevant enlarged voxel level per image for the detected LNs. To this end, LNs (defined by an SAD > 10  mm), a total detection rate all missed LNs are removed from the ground-truth mask of 76.9% was obtained with a significantly higher detec - and all FP are removed from the predicted segmentation tion rate for LNs with an SAD > 10  mm as compared to mask. From the resulting masks Dice, true-positive rate LNs with an SAD < 10  mm (76.9% vs. 62.1%, p < 0.001; and positive predictive value are computed. Fig. 5). Statistical analysis Lymph node detection rate according to lymph node location Statistical analysis was performed in the open-source A better overall detection rate was obtained for the axil- statistics package R version 3.3.1 for Windows (R: A lan- lary LNs compared to mediastinal LNs (70.0% vs. 62.3%, guage and environment for statistical computing, R Core p < 0.001; Fig.  5). A better detection could be observed Team, R Foundation for Statistical Computing. ISBN when looking only at LNs with an SAD > 20  mm, while 3-900051-07-0, 2019, URL http://R- proje ct. org/). 
After detection rate was only good to moderate when con- assessing normal distribution of the data, a two-sided sidering smaller LNs, both for axillary and mediasti- unpaired t-test was applied to determine the differences nal LNs; axillary LNs with an SAD > 20  mm versus SAD in means of the detection rates considering both size and 10–20  mm: 90.5% versus 74.9%, p < 0.001; SAD > 20  mm location of the LNs. Statistical significance was defined versus SAD 5–10  mm: 90.5% versus 47.2%, p < 0.001; as p ≤ 0.05. To get an impression of the variability of the Fig.  5); mediastinal LNs with an SAD > 20  mm ver- observed detection rates and to confirm the robustness sus SAD 10–20  mm: 92.3% versus 75.7%, p < 0.001; Iuga  et al. BMC Med Imaging (2021) 21:69 Page 7 of 12 Fig. 3 Overview of dice (a), positive predictive value (b) and true positive rate (c) training and validation data for mediastinal and axillary lymph nodes Fig. 4 Examples of ground‑truth and predicted segmentations. a Optimal LN segmentation, b segmentation of a LN bulk, c purple—missed LNs; red—true positive, detected LN which was initially not segmented by the radiologist (short‑axis diameter < 5 mm); d red—false positive segmentation (vessels detected as LN). LN lymph node Fig. 5 Overview of the validation detection rates depending on the short‑axis diameter of the segmented LNs: 5–10 mm (2523 LNs), 10–20 mm (954 LNs), and > 20 mm (107 LNs). a Overall detection rates of both axillary, mediastinal and hilar LNs; b detection rates of axillary LNs, c detection rates of mediastinal and hilar LNs. LNs lymph nodes Iuga et al. BMC Med Imaging (2021) 21:69 Page 8 of 12 SAD > 20 mm versus SAD 5–10 mm: 92.3% versus 33.8%, Discussion p < 0.001; Fig.  5). 
Looking only at the subgroup of clini- The aim of the study was to develop a 3D DL algorithm cally relevant enlarged LNs (defined by an SAD > 10 mm), for robust LN detection and segmentation in contrast- a slightly better detection rate was shown for LNs of the enhanced CT scans of the thorax. The main findings can mediastinal region compared to the axillary (77.8% vs. be summarized as follows: (1) The algorithm achieved 76.0%, p < 0.05). a good overall performance with an overall validation Based on the labelling of the mediastinal LNs a fur- detection rate of 70% for LNs with an SAD over 5  mm. ther analysis was performed to establish detection rates (2) Reasonable generalizability was achieved with a simi- at different levels (Fig.  6). The best detection rates were lar detection rate for enlarged LNs (SAD > 10  mm) in obtained for LNs located in Level 4R (83.6%), and Level the fourfold cross-validation dataset compared to the 7 (80.4%), while the lowest detection rate was recorded unseen testing dataset of 76.9% and 69.9%, respectively. for LNs located in Level 8 (25.9%). A better detection (3) A better validation detection rate was observed for rate was shown for LNs > 10 mm for all levels. For exam- enlarged LNs compared to smaller LNs (enlarged LNs ple, level 2 R (right) showed a detection rate of 96.5% for showed a detection rate of 76.9%; the detection rate for LNs > 10 mm versus 63.5% for LNs < 10 mm. For level 7, a LNs with an SAD ≥ 20 mm and SAD 0–5 mm was 91.6% total detection rate of 93.3% was shown for LNs > 10 mm and 40.8%, respectively). (4). Regarding different LN loca - versus 72.0% for LNs < 10 mm. The detection rate was sta - tions, the best validation detection rates were obtained tistically significant different for different levels (Table 2). for LNs located in Level 4R (right mediastinal), Level 7 (mediastinal subcarinal), and Level 10 R (right hilar) of Network performance: testing dataset 83.6%, 80.4% and 74.6%, respectively. 
(5) Segmentation On our in-house dataset, which was unseen during accuracy shows a promising Dice value of 0.48. Segmen- training, a detection rate of 69.9% was achieved for the tation accuracy is superior in the axillary region with less enlarged LNs (SAD > 10  mm). This result compares well overfitting. This is probably due to the stronger homoge - to the 76.9% achieved on the validation data set. It shows neity of the data compared to the mediastinal LNs. the generalization capabilities of our network which Although a few DL approaches have been proposed is able to cope with the domain shift when applied to for mediastinal LNs [21, 22, 26], there is still only a images with a different pathology (bronchial cancer in very limited number of publications available. A study the testing data, unclear cancer in the training and vali- similar to this work using the same evaluation criteria, dation data). employs a 3D u-net with additional organ segmentation Fig. 6 Overview of the validation detection rates of the mediastinal and hilar lymph nodes according to the localization of the lymph nodes. R right, L left Iuga  et al. 
BMC Med Imaging (2021) 21:69 Page 9 of 12 Table 2 Comparison matrix for p values of systematic differences in validation detection rates between lymph node levels p VALUES 1L 1R 2L 2R 3A 3P 4L 4R 5 6 7 8 9 10L 10R 11L 11R 1L 0.29 0.02 < 0.001 0.08 0.4 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.96 0.32 0.02 < 0.001 0.02 0.04 1R 0.29 0.34 < 0.001 0.75 0.76 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.21 0.92 0.17 < 0.001 0.11 0.15 2L 0.02 0.34 < 0.001 0.28 0.14 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.5 0.45 < 0.001 0.26 0.32 2R < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.83 0.01 0.75 0.97 0.08 < 0.001 0.01 0.16 0.5 0.57 0.69 3A 0.08 0.75 0.28 < 0.001 0.42 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.01 0.9 0.15 < 0.001 0.11 0.17 3P 0.4 0.76 0.14 < 0.001 0.42 < 0.001 < 0.001 < 0.001 < 0.001 0 0.29 0.72 0.08 < 0.001 0.06 0.1 4L < 0.001 < 0.001 < 0.001 0.83 < 0.001 < 0.001 < 0.001 0.9 0.89 0.1 < 0.001 0.01 0.12 0.59 0.51 0.64 4R < 0.001 < 0.001 < 0.001 0.01 < 0.001 < 0.001 < 0.001 0.01 0.02 0.51 < 0.001 < 0.001 0.01 0.12 0.12 0.22 5 < 0.001 < 0.001 < 0.001 0.75 < 0.001 < 0.001 0.9 0.01 0.81 0.13 < 0.001 0.01 0.11 0.68 0.48 0.61 6 < 0.001 < 0.001 < 0.001 0.97 < 0.001 < 0.001 0.89 0.02 0.81 0.13 < 0.001 0.01 0.16 0.56 0.57 0.68 7 < 0.001 < 0.001 < 0.001 0.08 < 0.001 < 0.001 0.1 0.51 0.13 0.13 < 0.001 < 0.001 0.02 0.39 0.19 0.31 8 0.96 0.21 < 0.001 < 0.001 0.01 0.29 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.26 0.01 < 0.001 0.01 0.04 9 0.32 0.92 0.5 0.01 0.9 0.72 0.01 < 0.001 0.01 0.01 < 0.001 0.26 0.26 < 0.001 0.16 0.2 10L 0.02 0.17 0.45 0.16 0.15 0.08 0.12 0.01 0.11 0.16 0.02 0.01 0.26 0.08 0.64 0.64 10R < 0.001 < 0.001 < 0.001 0.5 < 0.001 < 0.001 0.59 0.12 0.68 0.56 0.39 0 < 0.001 0.08 0.38 0.51 11L 0.02 0.11 0.26 0.57 0.11 0.06 0.51 0.12 0.48 0.57 0.19 0.01 0.16 0.64 0.38 0.95 11R 0.04 0.15 0.32 0.69 0.17 0.1 0.64 0.22 0.61 0.68 0.31 0.04 0.2 0.64 0.51 0.95 R right, L left. Note that statistical significant p values are marked in bold. Iuga et al. 
masks as input for mediastinal LN segmentation [26]. A detection rate of 95.5% is reported on a different dataset including considerably fewer cases, thus impeding comparison to this study. The current approach did not rely on explicit shape modeling nor did it incorporate segmentation of neighboring organs. In addition, both axillary and mediastinal regions were simultaneously addressed, thereby providing a complete assessment of the thoracic region. Moreover, in contrast to other publications, a total of 3585 LNs have been used for the training dataset only.

Previous studies using the same public dataset reported detection rates of 78% [21], 84% [22] and up to 88% [27] with 6 FPs per scan. In those studies, only the center of the LN was detected, and a detection was counted as correct if the detected landmark was within a distance of 15 mm from the ground truth landmark annotation. These detection rates are in good agreement with the validation results of this study, while the current approach simultaneously provides a 3D segmentation of the LNs. Therefore, the algorithm can ensure a correct whole-volume segmentation of small as well as large LNs, necessary for the extraction of radiomic features in future approaches. Further, the whole-volume assessment of the network should potentially facilitate future work considering automated determination of total tumor load at diagnosis and in treatment response evaluation.

In contrast to previous studies, where only cross-validation (sixfold [21, 27] and threefold [22]) was performed, additional testing has been performed on a completely independent, previously unseen dataset in addition to the fourfold cross-validation, in order to assess the generalizability of the trained network. Testing showed a similar detection rate compared to the initial fourfold cross-validation dataset, thus achieving a reasonable generalizability and facilitating LN detection during routine clinical work.

This work considered both axillary and mediastinal LNs using a single convolutional neural network, showing good validation results while addressing two different anatomical regions and therefore offering a complete analysis of the entire thorax with only one network.

Another way to potentially improve the detection rate is by increasing the amount of training data. Multiple, stronger data augmentation strategies, which have not been explored in the present study, have been proposed to improve vision tasks for images [28, 29].

CT scans containing bulky axillary or mediastinal lymphadenopathy have not been excluded. Even if the delimitation and segmentation of individual LNs forming the lesions was challenging, consecutively influencing the overall detection rate negatively, these CT scans ensure a heterogeneous dataset.

The analysis by location showed considerable differences for the different LN levels. For example, for Level 4 LNs a validation detection rate of 85.0% was achieved for those localized on the right side, whereas only 71.0% was achieved for those on the left side. A possible explanation could be the considerable difference in the number of annotated LNs: 262 LNs in level 4R and only 160 LNs in level 4L. A similar difference could be observed for Level 10 (75.0% with 71 annotated LNs for the right side versus 55.0% with only 29 annotated LNs for the left side). Additionally, worse contrast to surrounding tissue on the left versus right side might be another reason for the differences in detection rates.

Proper LN classification and labelling is needed in order to develop future approaches in the characterization of malignant LNs, for example when considering radiomics. Moreover, other features regarding the morphology of the thoracic LNs in addition to size (for example shape or homogeneity) should also be considered in future work. The current work considers just the thoracic LNs. Future work will address the extension to the abdominal region.

The main limitation of this study was the fact that the datasets were segmented by only one radiologist. However, this radiologist was well trained in detection and segmentation of LNs in chest CTs (more than 4 years of experience) and unclear LNs were discussed with an experienced radiologist (more than 15 years of experience). We assumed to have a homogeneous dataset of the more than 4000 manually segmented LNs with optimized inter-rater variability. Nevertheless, in this study the inter-rater effect of independent segmentation datasets for training of the network has not been evaluated. This was beyond the purpose of this study and has to be investigated in a subsequent trial.

Another limitation of the study is the limited number of annotated LNs. Adding more annotations to the training dataset could most probably ensure a better detection rate, especially for the mediastinal LNs located in levels for which the analyzed dataset had just few representatives.

Finally, another limitation of the study is the limited number of data augmentation strategies that has been applied, since multiple and stronger strategies could also potentially improve the detection rate.

Conclusions

In conclusion, based on extensive and rigorous annotations, the proposed 3D DL approach achieved a good performance in the automatic detection and segmentation especially of enlarged LNs. In contrast to other work, both the axillary and mediastinal regions have been simultaneously addressed and thus a complete assessment of the thoracic region is provided. Our approach could be considered for further research regarding quantitative features of LNs to improve and accelerate diagnosis. Extension to other regions should be considered in the future.

Abbreviations

AI: Artificial intelligence; BN-L: Batch normalization layer; CBR: Convolutional layer (Conv-L), batch normalization layer (BN-L) and ReLU layer (ReLU-L); CBRU: Convolutional layer (Conv-L), batch normalization layer (BN-L) and ReLU layer (ReLU-L), followed by upsampling layers; CNTK: Microsoft Cognitive Toolkit; Conv-L: Convolutional layer; CT: Computed tomography; DL: Deep learning; f-net: Foveal neural network; FP: False positive; GPU: Graphics processing unit; HU: Hounsfield units; LN: Lymph node; RECIL: Response evaluation criteria in lymphoma; RECIST: Response evaluation criteria in solid tumors; ReLU-L: Rectified linear units layer; SAD: Short-axis diameter; u-net: Fully convolutional networks.

Acknowledgements

The authors would like to thank Martin Balthasar for assistance with the statistical analysis and Jasmin Holz for assistance with data curation.

Authors' contributions

A-II: Conceptualization, writing (original draft preparation), investigation, validation, visualization, writing (reviewing and editing). HC: Conceptualization, writing (original draft preparation), methodology, software, validation, visualization, writing (reviewing and editing). AJH: Conceptualization, supervision. TB: Methodology, software. TK: Conceptualization, supervision, methodology. DM: Conceptualization, supervision. TP: Supervision, writing (reviewing and editing). BB: Supervision, writing (original draft preparation), writing (reviewing and editing). MP: Conceptualization, supervision, writing (reviewing and editing). All authors read and approved the final manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was partially supported by a Philips Clinical Research Fellowship. The work is part of the SPP initiative of the Deutsche Forschungsgemeinschaft (DFG).

Availability of data and materials

The training and validation dataset supporting the conclusions of this article is available at https://wiki.cancerimagingarchive.net/display/Public/CT+Lymph+Nodes or through request to the corresponding author. The testing dataset supporting the conclusions of this article can be accessed through request to the corresponding author.

Declarations

Ethics approval and consent to participate

For the training dataset Institutional Review Board approval was not required because it is a publicly available dataset. For the validation dataset ethical approval was waived due to the retrospective design of the study based on preexisting images (ethics committee of the Faculty of Medicine, University of Cologne, reference number 19-1390/ 07.08.2019).

Consent for publication

Not applicable.

Competing interests

DM received speaker's honoraria from Philips Healthcare. A-II received institutional research support from Philips Healthcare for research. HC, TB and TK are employees from Philips Research for technical deployment of the AI algorithms. All other authors were independent researchers and guarantee the correctness of the data and results.

Author details

Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, Kerpener Str. 62, 50937 Cologne, Germany. Philips Research, Röntgenstraße 24, 22335 Hamburg, Germany. Institute of Diagnostic and Interventional Radiology, University Hospital Zürich, Zürich, Switzerland.

Received: 4 October 2020 Accepted: 2 April 2021

References

1. Walker CM, Chung JH, Abbott GF, Little BP, El-Sherief AH, Shepard JAO, et al. Mediastinal lymph node staging: from noninvasive to surgical. AJR. 2012;199:W54–64.
2. Schwartz LH, Bogaerts J, Ford R, Shankar L, Therasse P, Gwyther S, et al. Evaluation of lymph nodes with RECIST 1.1. Eur J Cancer. 2009;45:261–7.
3. Younes A, Hilden P, Coiffier B, Hagenbeek A, Salles G, Wilson W, et al. International Working Group consensus response evaluation criteria in lymphoma (RECIL). Ann Oncol. 2017;2017:1436–47.
4. Cheson BD. Staging and response assessment in lymphomas: the new Lugano classification. Chin Clin Oncol. 2015;4:1–9.
5. De Langen AJ, Raijmakers P, Riphagen I, Paul MA, Hoekstra OS. The size of mediastinal lymph nodes and its relation with metastatic involvement: a meta-analysis. Eur J Cardio-Thorac Chirurgie. 2006;29:26–9.
6. Sloothaak DAM, van der Linden RLA, van de Velde CJH, Bemelman WA, Lips DJ, van der Linden JC, et al. Prognostic implications of occult nodal tumour cells in stage I and II colon cancer: the correlation between micrometastasis and disease recurrence. Eur J Surg Oncol. 2017;43:1456–62.
7. Choi SB, Han HJ, Park P, Kim WB, Song TJ, Choi SY. Systematic review of the clinical significance of lymph node micrometastases of pancreatic adenocarcinoma following surgical resection. Pancreatology. 2017;17:342–9.
8. Leong SPL, Tseng WW. Micrometastatic cancer cells in lymph nodes, bone marrow, and blood: clinical significance and biologic implications. CA Cancer J Clin. 2014;64:195–206.
9. Dappa E, Elger T, Hasenburg A, Düber C, Battista MJ, Hötker AM. The value of advanced MRI techniques in the assessment of cervical cancer: a review. Insights Imaging. 2017;8:471–81.
10. Shen G, Zhou H, Jia Z, Deng H. Diagnostic performance of diffusion-weighted MRI for detection of pelvic metastatic lymph nodes in patients with cervical cancer: a systematic review and meta-analysis. Br J Radiol. 2015. https://doi.org/10.1259/bjr.20150063.
11. Otero-García MM, Mesa-Álvarez A, Nikolic O, Blanco-Lobato P, Basta-Nikolic M, de Llano-Ortega RM, et al. Role of MRI in staging and follow-up of endometrial and cervical cancer: pitfalls and mimickers. Insights Imaging. 2019;10:19.
12. Chen L, Zhou Z, Sher D, Zhang Q, Shah J, Pham NL, et al. Combining many-objective radiomics and 3D convolutional neural network through evidential reasoning to predict lymph node metastasis in head and neck cancer. Phys Med Biol. 2019. https://doi.org/10.1088/1361-6560/ab083a.
13. Spuhler KD, Ding J, Liu C, Sun J, Serrano-Sosa M, Moriarty M, et al. Task-based assessment of a convolutional neural network for segmenting breast lesions for radiomic analysis. Magn Reson Med. 2019;82:786–95.
14. Ji GW, Zhu FP, Zhang YD, Liu XS, Wu FY, Wang K, et al. A radiomics approach to predict lymph node metastasis and clinical outcome of intrahepatic cholangiocarcinoma. Eur Radiol. 2019;29:3725–35.
15. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
16. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15:20170387.
17. Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. 2019;69:127–57.
18. Zhou SK, Greenspan H, Davatzikos C, Duncan JS, van Ginneken B, Madabhushi A, et al. A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. 2020; arXiv:2008.09104.
19. Kooi T, Litjens G, van Ginneken B, Gubern-Mérida A, Sánchez CI, Mann R, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal. 2017;35:303–12.
20. Nie D, Zhang H, Adeli E, Liu L, Shen D. 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In: International conference on medical image computing and computer-assisted intervention, vol. 9901; 2016. p. 212–20.
21. Seff A, Lu L, Cherry KM, Roth HR, Liu J, Wang S, et al. 2D view aggregation for lymph node detection using a shallow hierarchy of linear classifiers. In: International conference on medical image computing and computer-assisted intervention, vol. 17; 2014. p. 544–52.
22. Roth HR, Lu L, Seff A, Cherry KM, Hoffman J, Wang S, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: International conference on medical image computing and computer-assisted intervention; 2014. p. 520–527.
23. Mountain CF, Dresler CM. Regional lymph node classification for lung cancer staging. Chest. 1997;111:1718–23.
24. Brosch T, Saalbach A. Foveal fully convolutional nets for multi-organ segmentation. Medical Imaging. 2018. https://doi.org/10.1117/12.22935
25. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation; 2015. arXiv:1505.04597.
26. Oda H, Bhatia KK, Roth HR, Oda M, Kitasaka T, Iwano S, et al. Dense volumetric detection and segmentation of mediastinal lymph nodes in chest CT images. SPIE Med Imaging. 2018. https://doi.org/10.1117/12.2287066.
27. Seff A, Lu L, Barbu A, Roth H, Shin HC, Summers RM. Leveraging mid-level semantic boundary cues for automated lymph node detection. In: International conference on medical image computing and computer-assisted intervention. 2015. p. 53–61.
28. Razavian AS, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: An astounding baseline for recognition. In: IEEE Computer Society conference on computer vision and pattern recognition workshops. 2014. p. 512–9.
29. Hussain Z, Gimenez F, Yi D, Rubin D. Differential data augmentation techniques for medical imaging classification tasks. In: AMIA annual symposium proceedings. 2017. p. 979–84.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

BMC Medical Imaging, Springer Journals

Automated detection and segmentation of thoracic lymph nodes from CT using 3D foveal fully convolutional neural networks


References (41)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
eISSN
1471-2342
DOI
10.1186/s12880-021-00599-z

Abstract

Background: In oncology, the correct determination of nodal metastatic disease is essential for patient management, as patient treatment and prognosis are closely linked to the stage of the disease. The aim of the study was to develop a tool for automatic 3D detection and segmentation of lymph nodes (LNs) in computed tomography (CT) scans of the thorax using a fully convolutional neural network based on 3D foveal patches.

Methods: The training dataset was collected from the Computed Tomography Lymph Nodes Collection of the Cancer Imaging Archive, containing 89 contrast-enhanced CT scans of the thorax. A total number of 4275 LNs was segmented semi-automatically by a radiologist, assessing the entire 3D volume of the LNs. Using this data, a fully convolutional neural network based on 3D foveal patches was trained with fourfold cross-validation. Testing was performed on an unseen dataset containing 15 contrast-enhanced CT scans of patients who were referred upon suspicion or for staging of bronchial carcinoma.

Results: The algorithm achieved a good overall performance with a total detection rate of 76.9% for enlarged LNs during fourfold cross-validation in the training dataset with 10.3 false-positives per volume and of 69.9% in the unseen testing dataset. In the training dataset a better detection rate was observed for enlarged LNs compared to smaller LNs, the detection rate for LNs with a short-axis diameter (SAD) ≥ 20 mm and SAD 5–10 mm being 91.6% and 62.2% (p < 0.001), respectively. Best detection rates were obtained for LNs located in Level 4R (83.6%) and Level 7 (80.4%).

Conclusions: The proposed 3D deep learning approach achieves an overall good performance in the automatic detection and segmentation of thoracic LNs and shows reasonable generalizability, yielding the potential to facilitate detection during routine clinical work and to enable radiomics research without observer-bias.
Keywords: Deep learning, Artificial intelligence, Lymph nodes, Computed tomography, Staging Background The correct determination of nodal metastatic disease is imperative for patient management in oncology, since the patients’ treatment and prognosis are inherently linked to the stage of disease [1]. For nodal disease stag- *Correspondence: andra.iuga@uk‑koeln.de ing of solid tumors, unidimensional measurements of Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, Kerpener Str. 62, lymph node (LN) short-axis diameters (SAD) are rou- 50937 Cologne, Germany tinely performed during tumor staging and re-staging Full list of author information is available at the end of the article © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Iuga et al. 
BMC Med Imaging (2021) 21:69 Page 2 of 12 imaging examinations and evaluated according to differ - clinical practice, AI has been lately used in digital pathol- ent standardized diagnostic criteria such as the Response ogy, in imaging of the brain for the detection of metasta- Evaluation Criteria in Solid Tumors (RECIST) [2]. For sis and in imaging of the chest, for the early detection of lymphomas, a different set of standardized diagnos - breast carcinoma [17, 18]. tic criteria such as the Response Evaluation Criteria in Regarding LNs a wide range of 2D approaches [21, 22] Lymphoma (RECIL) [3] or the Lugano Criteria [4] have have been proposed so far for detection and segmenta- been suggested, using bi- instead of unidimensional LN tion, where LNs were segmented using unidimensional measurements. measurements consisting of the determination of the Although it is commonly accepted that larger LNs have SAD of the target lesions. However, a unidimensional a higher probability of being malignant as compared to approach can underestimate the size as well as the smaller LNs, previous work has shown that enlargement growth of LNs, especially when considering enlarged of LNs alone is not the most reliable predictive factor LNs. Consequently, correct segmentation of LNs consid- for malignancy with only 62% sensitivity and specific - ering the whole volume of the lesion is of ultimate impor- ity being demonstrated for predicting LN metastasis in tance for proper diagnosis and follow-up. patients with non-small cell lung cancer when using the u Th s, the aim of the study was to develop a tool for proposed 10  mm cut-off [5]. Consequently, small LNs automatic 3D LN detection and segmentation in com- potentially harboring micrometastases should be taken puted tomography (CT) scans using a fully convolutional into consideration for improved diagnostic accuracy neural network based on 3D foveal patches. during disease staging [6–8]. 
Unfortunately, no imaging technique (including, e.g., functional techniques such as Methods diffusion-weighted magnetic resonance imaging) so far Description of the training and validation dataset has been demonstrated to be capable of reliably detecting For the training and validation dataset, images were LN micrometastases [9–11]. obtained from the CT Lymph Nodes Collection of the Radiomics is a promising novel strategy for predicting Cancer Imaging Archive [22]. The dataset can be accessed LN dignity from images. Radiomic models thereby are and downloaded at https:// wiki. cance rimag ingar chive. built using e.g., machine learning algorithms based on a net/ displ ay/ Public/ CT+ Lymph+ Nodes. The dataset was large set of quantitative features, which are mathemati- made available to allow for a direct comparison to other cally or statistically derived from medical images [12–14]. detection methods in order to advance the state of the For extraction of radiomic features and detection of LN art and to encourage development and improvement of macro- as well as micrometastases, a reliable and correct computer-aided detection methods. The dataset con - detection and whole-volume segmentation of small as tained contrast-enhanced CT images of 90 patients from well as large LNs is needed. Manual or even semi-auto- different scanners with an in-slice resolution between mated segmentation of LNs is extremely time-consuming 0.63 and 0.98 mm and a slice thickness ranging from 1 to and LN detection strongly depends on the radiologist’s 5 mm (88 CT scans with a slice thickness of 1 or 1.5 mm experience, thus currently hampering the translation of and 2 CT scans with a slice thickness of 5  mm). To the a radiomics-based decision support to clinical routine. 
best of our knowledge, there is no information available Consequently, fully automated approaches are urgently regarding patients’ disease or further demographic infor- needed for a fast and robust detection and segmentation mation. The included CT scans showed normal-sized of LNs. thoracic LNs (SAD < 10  mm) as well as lymphadenopa- Recent developments in deep learning (DL) have thy (SAD ≥ 10 mm). The datasets included also CT scans shown promising results in areas relying on imaging data, containing mediastinal bulky disease and bulky axillary especially in radiology [15, 16] and cancer imaging [17, lymphadenopathy. In order to allow better comparison 18]. While requiring little human input, DL algorithms to clinical routine with usually heterogeneous datasets, significantly outperform existing detection and segmen - these were not excluded from network training. One case tation methods [19], thus offering automated quantifica - was excluded from our study since it did not contain the tion and selection of the most robust features, including complete scan of the thorax. a proper 3D assessment of lesions. Moreover, previous For this dataset Institutional Review Board approval work showed that 3D DL architectures were successful was not required because it is a publicly available dataset. in learning high-level tumor appearance features out- performing 2D models [20]. In cancer imaging, the use Description of the testing dataset of artificial intelligence (AI) has shown a great utility not Further, a second unseen dataset was collected for only in the (semi)automatic tumor detection, but also independent testing. Similar to the training and in tumor characterization, and treatment follow-up. In validation dataset, the testing dataset consisted of Iuga  et al. BMC Med Imaging (2021) 21:69 Page 3 of 12 contrast-enhanced CT scans (n = 15). The patients (8 clinical experience and focus in oncological imaging. 
The male, 7 female; mean age 68 ± 16.6  years) were referred training and validation dataset consisted of 4275 LNs, upon suspicion or for staging of bronchial carcinoma with an average of 48 LNs per patient. When consider- from March 2016 to November 2017 (Table 1). All exam- ing the location of the LNs the 4275 LNs included 2272 inations were performed on a 128-slice PET/CT-system axillary and 2003 mediastinal/hilar LNs. The LNs had an (Siemens Biograph mCT Flow 128 Edge, Siemens Medi- SAD of 1.3–67.6  mm. A total number of 814 enlarged cal). Patients were scanned supine in cranio-caudal direc- (SAD > 10 mm) LNs was segmented. The segmentation of tion during inspirational breath-hold after intravenous the LNs took approximately between 45 and 120 min per injection of 120  ml contrast medium (Accupaque 350, dataset. LNs with an SAD < 5 mm that had been mistak- GE Healthcare) with an injection rate of 2.5  ml/s and a enly annotated (n = 690) were not included in the evalu- delay of 60  s. The following scan parameters were used: ation. Figure  1 shows the process of data collection and collimation 128 × 0.6  mm, rotation time 0.5  s, pitch 0.6. LN segmentation. All axial images were reconstructed with a slice thickness For data evaluation segmented LNs were divided into of 2  mm. Similar to the training and validation dataset, 3 groups based on their SAD: 5–10  mm (2523 LNs); the testing dataset included CT scans, that showed both 10–20 mm (954 LNs); and > 20 mm (107 LNs). normal-sized thoracic LNs and lymphadenopathy. Furthermore, based on their localization all segmented Ethical approval was waived due to the retrospective LNs were divided into axillary (right, left) and mediasti- design of the study based on preexisting images (Eth- nal (including hilar LNs). The mediastinal LNs were fur - ics Committee of the Faculty of Medicine, University of ther divided in 11 groups depending on their location Cologne, reference number 19-1390/ 07.08.2019). 
corresponding levels (levels 1–11), based on the Moun- tain–Dresler modification of the American Thoracic Lymph node segmentation Society LN map [23]. The side (right respectively left) was Training and validation dataset considered for level 1, level 2, level 4, level 10 and level A radiologist (blinded; more than 4  years of experience 11. in thoracic imaging) segmented all LNs of the train- ing and validation dataset with an SAD of at least 5 mm Testing dataset in the mediastinal, hilar and axillary regions using the In the testing dataset, a total of 113 LNs were segmented, semi-automatic 3D Multi-Modal Tumor Tracking tool with an average of 7.5 LNs per patient. The segmented of a commercially available software platform (Intel- LNs included both axillary and mediastinal/hilar LNs. liSpace Portal, Version 11.0, Philips Healthcare). In case In this dataset, nevertheless, because of time constraints of unclear LNs or findings CT images were discussed only LNs with an SAD > 10 mm were segmented. with an experienced radiologist with more than 15 years Network architecture A 3D fully convolutional neural network (u-net) was Table 1 Demographic details (age and sex) for all patients trained on the training dataset, which obtains as input included in the test dataset the original 3D images and the corresponding label Age Sex masks of the segmented LNs. The output of the net - work was a probability map, showing the probability of 1 33 Female each voxel belonging to a mediastinal or axillary LN. This 2 64 Female probability map was assessed with a fixed threshold of 3 79 Male 0.4 to obtain the final segmentation result. The threshold 4 69 Female value was optimized on the training images to yield the 5 79 Male best Dice value over all training samples. Finally, a con- 6 63 Male nected component analysis is applied to obtain the indi- 7 75 Male vidual predicted LNs. 8 68 Male The segmentation network was trained on the 3D 9 74 Female images. 
The used network architecture, named foveal 10 47 Male neural network (f-net) [24] is inspired by the human eye 11 33 Female and the distribution of the photoreceptor cells, which 12 69 Female have the highest resolution at the fovea centralis. A f-net 13 72 Male architecture has been used because this architecture 14 33 Female combines information of different resolution levels. On 15 67 Male the one hand, LNs were analyzed in high resolution to Iuga et al. BMC Med Imaging (2021) 21:69 Page 4 of 12 Fig. 1 Flow‑ chart showing data in‑ and exclusion together with segmentation for network training. From a total number of 90 contrast ‑ enhanced CT scans contained in the publicly available dataset 1 CT scan was excluded because it did not contain the complete scan of the thorax. Further, a total of 690 LNs were excluded because of an SAD < 5 mm. CT scans containing mediastinal bulky disease and bulky axillary lymphadenopathy were not excluded. Top left image—exemplary segmentation of bulky axillary lymphadenopathy; top right image—exemplary segmentation of normal axillary LNs; Bottom left image—exemplary segmentation of bulky mediastinal lymphadenopathy; Bottom right image—exemplary segmentation of enlarged LNs. Considerable differences in image quality of the different CT scans was noted as exemplarily shown in the bottom right image. CT computed tomography, LNs lymph nodes, SAD short‑axis diameter enable feature learning (texture, shape and size). On the feature extraction pathways is equivalent to the number other hand, neighboring anatomy was analyzed in low of resolution levels in the network. Each feature extrac- resolution. 
tion pathway comprises three successive blocks of valid As previously mentioned, the network considers convolution with a kernel size of 3, batch-normaliza- image patches at multiple resolution scales in order to tion, and rectified linear activation function, so called arrive at the final prediction, combining local informa - convolutional layer (Conv-L), batch normalization layer tion gained from high resolutions with context from (BN-L), and ReLU layer (ReLU-L) (CBR) blocks. The lower resolutions. Unlike u-nets [25], which receive a outputs of the feature extraction levels are combined single scale input image and create the coarser resolu- in a feature integration pathway through an additional tion scales by downsampling within the network, f-net CBR block followed by upsampling of the lower reso- directly receives the input as a multiscale pyramid of lution outputs. Finally, a channel-wise softmax layer is image patches. Here, an architecture with four resolu- applied to acquire pseudo class probabilities for the LN tion levels was used. Accordingly, each input sample labels. In addition, f-net was chosen because its archi- to the network consisted of four image patches at the tecture requires less memory and runtime compared to same position but downscaled for the lower resolution u-net [25]. Figure  2 shows an overview of the network levels. The input to each resolution level is processed architecture. in a feature extraction pathway. Thus, the number of Iuga  et al. BMC Med Imaging (2021) 21:69 Page 5 of 12 Fig. 2 Sketch of the network architecture. A 3D fully convolutional foveal neural network was trained. The network architecture is inspired by the human eye and the distribution of the photoreceptor cells, which have the highest resolution at the fovea centralis. The network consists of several blocks of convolutional layers, batch normalization and the rectified linear activation function (CBR), which extract features at different resolution levels. 
CBR blocks are followed by upsampling layers (CBRU) to match the resolution of the other levels.

Training and validation setup

Training was performed using Microsoft Cognitive Toolkit (CNTK) with a Python interface (hardware: 2.40 GHz processor with 2 × NVIDIA GTX 1080ti with 11 GB graphics memory). The images were pre-processed by resampling them to a fixed isotropic sampling grid with a spacing of 1.5 mm; this increases the speed of network training and deployment while preserving sufficient image detail. The matrix size was standard (512 × 512) and the data pixel size was 1 mm isotropic.

To enhance the soft-tissue contrast of the LNs, only the gray-value window 750/70 Hounsfield Units (HU) was considered, and gray values outside this range were clipped to the upper or lower limit. This gray-value window was determined automatically on the training data by computing the mean and standard deviation of all voxels labeled as LN and their direct neighborhood. No further pre-processing was performed.

Training was performed on patches drawn randomly from the images, ensuring that at least 30% of the patches contained LN voxels. As data augmentation, random scaling and rotation of the patches was applied on-the-fly with a maximal scaling factor of 1.1 and a maximal rotation of 7°. Stronger augmentation with regard to rotation and scaling led to a decline in performance and was therefore abandoned. Individual LNs were not manipulated during data augmentation and therefore the total number of LNs remained unchanged. Test-time augmentation was not performed.

The cross-entropy function was chosen for the optimization of the network since it showed good performance on many tasks [12, 13, 26]. The network was trained for 1000 epochs with a minibatch size of 8 and the AdaDelta optimizer.

The models were trained using fourfold cross-validation on the training data, with the dataset being randomly split into four groups (i.e., training was performed on three of the groups while the remaining group was used for validation). The validation was used to explore the performance of the network architecture and training setup with regard to the number of resolution levels in the network, optimizer, augmentation and patch sampling strategy. In the following we present the results of the best training experiment. A full ablation study is beyond the scope of this paper and will be addressed in future work.

Testing setup

Finally, the model trained on the complete training dataset was tested on the previously unseen, in-house derived testing dataset.

Evaluation criteria

The performance of the network is assessed by looking at the individual LNs. For the ground truth, the single nodes are available from the annotation process. For the predicted LNs, a connected component analysis of the predicted segmentation mask is performed.

One performance metric is the detection rate, which is the number of detected LNs divided by the total number of LNs. An LN is thereby counted as detected if there was at least one voxel overlap with the segmentation mask predicted by the network.

The second performance metric is the number of false positives (FP) per volume. Here, a connected component in the predicted segmentation mask without overlap to a ground truth is counted as FP.

This rather loose criterion was chosen instead of stricter measures, e.g., larger overlap thresholds between ground truth and predicted segmentation, as one particular challenge in LN assessment is that differentiation of individual nodes is often not possible when adjacent nodes merge into clusters due to pathology. Obviously, it can occur that a ground-truth segmentation is 'detected' by multiple predicted segmentations and similarly that a predicted segmentation overlaps with multiple ground-truth segmentations. This criterion appears to be the current state of the art and has been used in previous work [22].

In addition, the segmentation quality is assessed on a voxel level per image for the detected LNs. To this end, all missed LNs are removed from the ground-truth mask and all FP are removed from the predicted segmentation mask. From the resulting masks, Dice, true-positive rate and positive predictive value are computed.

Statistical analysis

Statistical analysis was performed in the open-source statistics package R version 3.3.1 for Windows (R: A language and environment for statistical computing, R Core Team, R Foundation for Statistical Computing. ISBN 3-900051-07-0, 2019, URL http://R-project.org/). After assessing normal distribution of the data, a two-sided unpaired t-test was applied to determine the differences in means of the detection rates considering both size and location of the LNs. Statistical significance was defined as p ≤ 0.05. To get an impression of the variability of the observed detection rates and to confirm the robustness of the results, bootstrapping analysis was performed (with replacement using 100% of the sample size with the number of simulations N = 10,000).

Results

Calculation of the LN probability maps took about 24 s per dataset on a graphics processing unit (GPU), while training took 120–180 min.

Bootstrap analysis was performed and confirmed the robustness of the results. The empirical distribution of the detection rate showed a standard deviation of 1.7%.

Network performance: validation dataset

Segmentation accuracy

Overall, mean Dice values of 0.75 and 0.48 were achieved on the training and validation datasets, respectively. True positive rate and positive predictive value amounted to 0.76 and 0.75 on the training data and to 0.45 and 0.62 on the validation data. The Dice value amounted to 0.44 for the mediastinal LNs and to 0.55 for the axillary LNs, the latter with a smaller gap between training and validation and therefore showing less overfitting. More details can be seen in Fig. 3.

Fig. 3 Overview of Dice (a), positive predictive value (b) and true positive rate (c) on training and validation data for mediastinal and axillary lymph nodes

Lymph node detection rate according to lymph node size

The overall detection rate for all LNs with an SAD > 5 mm using the trained network was 66.5% with 10.3 FPs per volume on average. Exemplary images of detected and missed LNs compared to the ground-truth segmentations are shown in Fig. 4. The highest detection rate could be observed when looking only at LNs with an SAD > 20 mm, while the detection rate was only good to moderate when considering smaller LNs (SAD > 20 mm vs. SAD 10–20 mm: 91.6% vs. 75.3%, p < 0.001; SAD > 20 mm vs. SAD 5–10 mm: 91.6% vs. 62.2%, p < 0.001; Fig. 5). Looking only at the subgroup of clinically relevant enlarged LNs (defined by an SAD > 10 mm), a total detection rate of 76.9% was obtained, with a significantly higher detection rate for LNs with an SAD > 10 mm as compared to LNs with an SAD < 10 mm (76.9% vs. 62.1%, p < 0.001; Fig. 5).

Fig. 4 Examples of ground-truth and predicted segmentations. a Optimal LN segmentation, b segmentation of a LN bulk, c purple: missed LNs; red: true positive, detected LN which was initially not segmented by the radiologist (short-axis diameter < 5 mm); d red: false positive segmentation (vessels detected as LN). LN: lymph node

Fig. 5 Overview of the validation detection rates depending on the short-axis diameter of the segmented LNs: 5–10 mm (2523 LNs), 10–20 mm (954 LNs), and > 20 mm (107 LNs). a Overall detection rates of axillary, mediastinal and hilar LNs; b detection rates of axillary LNs; c detection rates of mediastinal and hilar LNs. LNs: lymph nodes

Lymph node detection rate according to lymph node location

A better overall detection rate was obtained for the axillary LNs compared to mediastinal LNs (70.0% vs. 62.3%, p < 0.001; Fig. 5). A better detection rate could be observed when looking only at LNs with an SAD > 20 mm, while the detection rate was only good to moderate when considering smaller LNs, both for axillary and mediastinal LNs (axillary LNs with an SAD > 20 mm versus SAD 10–20 mm: 90.5% versus 74.9%, p < 0.001; SAD > 20 mm versus SAD 5–10 mm: 90.5% versus 47.2%, p < 0.001; mediastinal LNs with an SAD > 20 mm versus SAD 10–20 mm: 92.3% versus 75.7%, p < 0.001; SAD > 20 mm versus SAD 5–10 mm: 92.3% versus 33.8%, p < 0.001; Fig. 5). Looking only at the subgroup of clinically relevant enlarged LNs (defined by an SAD > 10 mm), a slightly better detection rate was shown for LNs of the mediastinal region compared to the axillary region (77.8% vs. 76.0%, p < 0.05).

Based on the labelling of the mediastinal LNs, a further analysis was performed to establish detection rates at different levels (Fig. 6). The best detection rates were obtained for LNs located in Level 4R (83.6%) and Level 7 (80.4%), while the lowest detection rate was recorded for LNs located in Level 8 (25.9%). A better detection rate was shown for LNs > 10 mm for all levels. For example, Level 2R (right) showed a detection rate of 96.5% for LNs > 10 mm versus 63.5% for LNs < 10 mm. For Level 7, a total detection rate of 93.3% was shown for LNs > 10 mm versus 72.0% for LNs < 10 mm. The detection rate differed statistically significantly between levels (Table 2).

Fig. 6 Overview of the validation detection rates of the mediastinal and hilar lymph nodes according to the localization of the lymph nodes. R: right, L: left

Table 2 Comparison matrix for p values of systematic differences in validation detection rates between lymph node levels

p values  1L      1R      2L      2R      3A      3P      4L      4R      5       6       7       8       9       10L     10R     11L     11R
1L        -       0.29    0.02    <0.001  0.08    0.4     <0.001  <0.001  <0.001  <0.001  <0.001  0.96    0.32    0.02    <0.001  0.02    0.04
1R        0.29    -       0.34    <0.001  0.75    0.76    <0.001  <0.001  <0.001  <0.001  <0.001  0.21    0.92    0.17    <0.001  0.11    0.15
2L        0.02    0.34    -       <0.001  0.28    0.14    <0.001  <0.001  <0.001  <0.001  <0.001  <0.001  0.5     0.45    <0.001  0.26    0.32
2R        <0.001  <0.001  <0.001  -       <0.001  <0.001  0.83    0.01    0.75    0.97    0.08    <0.001  0.01    0.16    0.5     0.57    0.69
3A        0.08    0.75    0.28    <0.001  -       0.42    <0.001  <0.001  <0.001  <0.001  <0.001  0.01    0.9     0.15    <0.001  0.11    0.17
3P        0.4     0.76    0.14    <0.001  0.42    -       <0.001  <0.001  <0.001  <0.001  0       0.29    0.72    0.08    <0.001  0.06    0.1
4L        <0.001  <0.001  <0.001  0.83    <0.001  <0.001  -       <0.001  0.9     0.89    0.1     <0.001  0.01    0.12    0.59    0.51    0.64
4R        <0.001  <0.001  <0.001  0.01    <0.001  <0.001  <0.001  -       0.01    0.02    0.51    <0.001  <0.001  0.01    0.12    0.12    0.22
5         <0.001  <0.001  <0.001  0.75    <0.001  <0.001  0.9     0.01    -       0.81    0.13    <0.001  0.01    0.11    0.68    0.48    0.61
6         <0.001  <0.001  <0.001  0.97    <0.001  <0.001  0.89    0.02    0.81    -       0.13    <0.001  0.01    0.16    0.56    0.57    0.68
7         <0.001  <0.001  <0.001  0.08    <0.001  <0.001  0.1     0.51    0.13    0.13    -       <0.001  <0.001  0.02    0.39    0.19    0.31
8         0.96    0.21    <0.001  <0.001  0.01    0.29    <0.001  <0.001  <0.001  <0.001  <0.001  -       0.26    0.01    <0.001  0.01    0.04
9         0.32    0.92    0.5     0.01    0.9     0.72    0.01    <0.001  0.01    0.01    <0.001  0.26    -       0.26    <0.001  0.16    0.2
10L       0.02    0.17    0.45    0.16    0.15    0.08    0.12    0.01    0.11    0.16    0.02    0.01    0.26    -       0.08    0.64    0.64
10R       <0.001  <0.001  <0.001  0.5     <0.001  <0.001  0.59    0.12    0.68    0.56    0.39    0       <0.001  0.08    -       0.38    0.51
11L       0.02    0.11    0.26    0.57    0.11    0.06    0.51    0.12    0.48    0.57    0.19    0.01    0.16    0.64    0.38    -       0.95
11R       0.04    0.15    0.32    0.69    0.17    0.1     0.64    0.22    0.61    0.68    0.31    0.04    0.2     0.64    0.51    0.95    -

R: right, L: left. In the published table, statistically significant p values are marked in bold.

Network performance: testing dataset

On our in-house dataset, which was unseen during training, a detection rate of 69.9% was achieved for the enlarged LNs (SAD > 10 mm). This result compares well to the 76.9% achieved on the validation dataset. It shows the generalization capabilities of our network, which is able to cope with the domain shift when applied to images with a different pathology (bronchial cancer in the testing data, unclear cancer in the training and validation data).

Discussion

The aim of the study was to develop a 3D DL algorithm for robust LN detection and segmentation in contrast-enhanced CT scans of the thorax. The main findings can be summarized as follows: (1) The algorithm achieved a good overall performance with an overall validation detection rate of 70% for LNs with an SAD over 5 mm. (2) Reasonable generalizability was achieved with a similar detection rate for enlarged LNs (SAD > 10 mm) in the fourfold cross-validation dataset compared to the unseen testing dataset of 76.9% and 69.9%, respectively. (3) A better validation detection rate was observed for enlarged LNs compared to smaller LNs (enlarged LNs showed a detection rate of 76.9%; the detection rate for LNs with an SAD ≥ 20 mm and SAD 0–5 mm was 91.6% and 40.8%, respectively). (4) Regarding different LN locations, the best validation detection rates were obtained for LNs located in Level 4R (right mediastinal), Level 7 (mediastinal subcarinal), and Level 10R (right hilar) of 83.6%, 80.4% and 74.6%, respectively. (5) Segmentation accuracy shows a promising Dice value of 0.48. Segmentation accuracy is superior in the axillary region with less overfitting. This is probably due to the stronger homogeneity of the data compared to the mediastinal LNs.

Although a few DL approaches have been proposed for mediastinal LNs [21, 22, 26], there is still only a very limited number of publications available. A study similar to this work using the same evaluation criteria employs a 3D u-net with additional organ segmentation masks as input for mediastinal LN segmentation [26]. A detection rate of 95.5% is reported on a different dataset including considerably fewer cases, thus impeding comparison to this study. The current approach did not rely on explicit shape modeling nor did it incorporate segmentation of neighboring organs. In addition, both axillary and mediastinal regions were simultaneously addressed, thereby providing a complete assessment of the thoracic region. Moreover, in contrast to other publications, a total of 3585 LNs have been used for the training dataset only.

Previous studies using the same public dataset reported detection rates of 78% [21], 84% [22] and up to 88% [27] with 6 FPs per scan. In those studies, only the center of the LN was detected, and a detection was counted as correct if the detected landmark was within a distance of 15 mm from the ground-truth landmark annotation. These detection rates are in good agreement with the validation results of this study, while the current approach simultaneously provides a 3D segmentation of the LNs. Therefore, the algorithm can ensure a correct whole-volume segmentation of small as well as large LNs, necessary for the extraction of radiomic features in future approaches. Further, the whole-volume assessment of the network should potentially facilitate future work considering automated determination of total tumor load at diagnosis and in treatment response evaluation.

In contrast to previous studies, where only cross-validation (sixfold [21, 27] and threefold [22]) was performed, additional testing has been performed on a completely independent, previously unseen dataset in addition to the fourfold cross-validation, in order to assess the generalizability of the trained network. Testing showed a similar detection rate compared to the initial fourfold cross-validation dataset, thus achieving a reasonable generalizability and facilitating LN detection during routine clinical work.

This work considered both axillary and mediastinal LNs using a single convolutional neural network, showing good validation results while addressing two different anatomical regions and therefore offering a complete analysis of the entire thorax with only one network.

Another way to potentially improve the detection rate is by increasing the amount of training data. Multiple, stronger data augmentation strategies, which have not been explored in the present study, have been proposed to improve vision tasks for images [28, 29].

CT scans containing bulky axillary or mediastinal lymphadenopathy have not been excluded. Even if the delimitation and segmentation of individual LNs forming the lesions was challenging, consequently influencing the overall detection rate negatively, these CT scans ensure a heterogeneous dataset.

The analysis by location showed considerable differences for the different LN levels. For example, for Level 4 LNs a validation detection rate of 85.0% was achieved for those localized on the right side, whereas only 71.0% was achieved for those on the left side. A possible explanation could be the considerable difference in the number of annotated LNs (262 LNs in Level 4R and only 160 LNs in Level 4L). A similar difference could be observed for Level 10 (75.0% with 71 annotated LNs for the right side versus 55.0% with only 29 annotated LNs for the left side). Additionally, worse contrast to surrounding tissue on the left versus the right side might be another reason for the differences in detection rates.

Proper LN classification and labelling is needed in order to develop future approaches in the characterization of malignant LNs, for example when considering radiomics. Moreover, other features regarding the morphology of the thoracic LNs in addition to size (for example shape or homogeneity) should also be considered in future work. The current work considers just the thoracic LNs. Future work will address the extension to the abdominal region.

The main limitation of this study was the fact that the datasets were segmented by only one radiologist. However, this radiologist was well trained in detection and segmentation of LNs in chest CTs (more than 4 years of experience) and unclear LNs were discussed with an experienced radiologist (more than 15 years of experience). We assumed that the more than 4,000 manually segmented LNs form a homogeneous dataset with optimized inter-rater variability. Nevertheless, in this study the inter-rater effect of independent segmentation datasets for training of the network has not been evaluated. This was beyond the purpose of this study and has to be investigated in a subsequent trial.

Another limitation of the study is the limited number of annotated LNs. Adding more annotations to the training dataset could most probably ensure a better detection rate, especially for the mediastinal LNs located in levels for which the analyzed dataset had just few representatives.

Finally, another limitation of the study is the limited number of data augmentation strategies that has been applied, since multiple and stronger strategies could also potentially improve the detection rate.

Conclusions

In conclusion, based on extensive and rigorous annotations, the proposed 3D DL approach achieved a good performance in the automatic detection and segmentation especially of enlarged LNs. In contrast to other work, both the axillary and mediastinal regions have been simultaneously addressed and thus a complete assessment of the thoracic region is provided. Our approach could be considered for further research regarding quantitative features of LNs to improve and accelerate diagnosis. Extension to other regions should be considered in the future.

Abbreviations

AI: Artificial intelligence; BN-L: Batch normalization layer; CBR: Convolutional layer (Conv-L), batch normalization layer (BN-L) and ReLU layer (ReLU-L); CBRU: Convolutional layer (Conv-L), batch normalization layer (BN-L) and ReLU layer (ReLU-L), followed by upsampling layers; CNTK: Microsoft Cognitive Toolkit; Conv-L: Convolutional layer; CT: Computed tomography; DL: Deep learning; f-net: Foveal neural network; FP: False positive; GPU: Graphics processing unit; HU: Hounsfield units; LN: Lymph node; RECIL: Response evaluation criteria in lymphoma; RECIST: Response evaluation criteria in solid tumors; ReLU-L: Rectified linear units layer; SAD: Short-axis diameter; u-net: Fully convolutional networks.

Acknowledgements

The authors would like to thank Martin Balthasar for assistance with the statistical analysis and Jasmin Holz for assistance with data curation.

Authors' contributions

A-II: Conceptualization, writing – original draft preparation, investigation, validation, visualization, writing – reviewing and editing. HC: Conceptualization, writing – original draft preparation, methodology, software, validation, visualization, writing – reviewing and editing. AJH: Conceptualization, supervision. TB: Methodology, software. TK: Conceptualization, supervision, methodology. DM: Conceptualization, supervision. TP: Supervision, writing – reviewing and editing. BB: Supervision, writing – original draft preparation, writing – reviewing and editing. MP: Conceptualization, supervision, writing – reviewing and editing. All authors read and approved the final manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was partially supported by a Philips Clinical Research Fellowship. The work is part of the SPP initiative of the Deutsche Forschungsgemeinschaft (DFG).

Availability of data and materials

The training and validation dataset supporting the conclusions of this article is available at https://wiki.cancerimagingarchive.net/display/Public/CT+Lymph+Nodes or through request to the corresponding author. The testing dataset supporting the conclusions of this article can be accessed through request to the corresponding author.

Declarations

Ethics approval and consent to participate

For the training dataset Institutional Review Board approval was not required because it is a publicly available dataset. For the validation dataset ethical approval was waived due to the retrospective design of the study based on preexisting images (ethics committee of the Faculty of Medicine, University of Cologne, reference number 19-1390/07.08.2019).

Consent for publication

Not applicable.

Competing interests

DM received speaker's honoraria from Philips Healthcare. A-II received institutional research support from Philips Healthcare for research. HC, TB and TK are employees from Philips Research for technical deployment of the AI algorithms. All other authors were independent researchers and guarantee the correctness of the data and results.

Author details

Institute of Diagnostic and Interventional Radiology, Medical Faculty and University Hospital Cologne, University of Cologne, Kerpener Str. 62, 50937 Cologne, Germany. Philips Research, Röntgenstraße 24, 22335 Hamburg, Germany. Institute of Diagnostic and Interventional Radiology, University Hospital Zürich, Zürich, Switzerland.

Received: 4 October 2020 Accepted: 2 April 2021

References

1. Walker CM, Chung JH, Abbott GF, Little BP, El-Sherief AH, Shepard JAO, et al. Mediastinal lymph node staging: from noninvasive to surgical. AJR. 2012;199:W54–64.
2. Schwartz LH, Bogaerts J, Ford R, Shankar L, Therasse P, Gwyther S, et al. Evaluation of lymph nodes with RECIST 1.1. Eur J Cancer. 2009;45:261–7.
3. Younes A, Hilden P, Coiffier B, Hagenbeek A, Salles G, Wilson W, et al. International Working Group consensus response evaluation criteria in lymphoma (RECIL). Ann Oncol. 2017;2017:1436–47.
4. Cheson BD. Staging and response assessment in lymphomas: the new Lugano classification. Chin Clin Oncol. 2015;4:1–9.
5. De Langen AJ, Raijmakers P, Riphagen I, Paul MA, Hoekstra OS. The size of mediastinal lymph nodes and its relation with metastatic involvement: a meta-analysis. Eur J Cardio-Thorac Chirurgie. 2006;29:26–9.
6. Sloothaak DAM, van der Linden RLA, van de Velde CJH, Bemelman WA, Lips DJ, van der Linden JC, et al. Prognostic implications of occult nodal tumour cells in stage I and II colon cancer: the correlation between micrometastasis and disease recurrence. Eur J Surg Oncol. 2017;43:1456–62.
7. Choi SB, Han HJ, Park P, Kim WB, Song TJ, Choi SY. Systematic review of the clinical significance of lymph node micrometastases of pancreatic adenocarcinoma following surgical resection. Pancreatology. 2017;17:342–9.
8. Leong SPL, Tseng WW. Micrometastatic cancer cells in lymph nodes, bone marrow, and blood: clinical significance and biologic implications. CA Cancer J Clin. 2014;64:195–206.
9. Dappa E, Elger T, Hasenburg A, Düber C, Battista MJ, Hötker AM. The value of advanced MRI techniques in the assessment of cervical cancer: a review. Insights Imaging. 2017;8:471–81.
10. Shen G, Zhou H, Jia Z, Deng H. Diagnostic performance of diffusion-weighted MRI for detection of pelvic metastatic lymph nodes in patients with cervical cancer: a systematic review and meta-analysis. Br J Radiol. 2015. https://doi.org/10.1259/bjr.20150063.
11. Otero-García MM, Mesa-Álvarez A, Nikolic O, Blanco-Lobato P, Basta-Nikolic M, de Llano-Ortega RM, et al. Role of MRI in staging and follow-up of endometrial and cervical cancer: pitfalls and mimickers. Insights Imaging. 2019;10:19.
12. Chen L, Zhou Z, Sher D, Zhang Q, Shah J, Pham NL, et al. Combining many-objective radiomics and 3D convolutional neural network through evidential reasoning to predict lymph node metastasis in head and neck cancer. Phys Med Biol. 2019. https://doi.org/10.1088/1361-6560/ab083a.
13. Spuhler KD, Ding J, Liu C, Sun J, Serrano-Sosa M, Moriarty M, et al. Task-based assessment of a convolutional neural network for segmenting breast lesions for radiomic analysis. Magn Reson Med. 2019;82:786–95.
14. Ji GW, Zhu FP, Zhang YD, Liu XS, Wu FY, Wang K, et al. A radiomics approach to predict lymph node metastasis and clinical outcome of intrahepatic cholangiocarcinoma. Eur Radiol. 2019;29:3725–35.
15. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
16. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15:20170387.
17. Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. 2019;69:127–57.
18. Zhou SK, Greenspan H, Davatzikos C, Duncan JS, van Ginneken B, Madabhushi A, et al. A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. 2020; arXiv:2008.09104.
19. Kooi T, Litjens G, van Ginneken B, Gubern-Mérida A, Sánchez CI, Mann R, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal. 2017;35:303–12.
20. Nie D, Zhang H, Adeli E, Liu L, Shen D. 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In: International conference on medical image computing and computer-assisted intervention, vol. 9901; 2016. p. 212–20.
21. Seff A, Lu L, Cherry KM, Roth HR, Liu J, Wang S, et al. 2D view aggregation for lymph node detection using a shallow hierarchy of linear classifiers. In: International conference on medical image computing and computer-assisted intervention, vol. 17; 2014. p. 544–52.
22. Roth HR, Lu L, Seff A, Cherry KM, Hoffman J, Wang S, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: International conference on medical image computing and computer-assisted intervention; 2014. p. 520–527.
23. Mountain CF, Dresler CM. Regional lymph node classification for lung cancer staging. Chest. 1997;111:1718–23.
24. Brosch T, Saalbach A. Foveal fully convolutional nets for multi-organ segmentation. Medical Imaging. 2018. https://doi.org/10.1117/12.22935
25. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation; 2015. arXiv:1505.04597.
26. Oda H, Bhatia KK, Roth HR, Oda M, Kitasaka T, Iwano S, et al. Dense volumetric detection and segmentation of mediastinal lymph nodes in chest CT images. SPIE Med Imaging. 2018. https://doi.org/10.1117/12.2287066.
27. Seff A, Lu L, Barbu A, Roth H, Shin HC, Summers RM. Leveraging mid-level semantic boundary cues for automated lymph node detection. In: International conference on medical image computing and computer-assisted intervention. 2015. p. 53–61.
28. Razavian AS, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: An astounding baseline for recognition. In: IEEE Computer Society conference on computer vision and pattern recognition workshops. 2014. p. 512–9.
29. Hussain Z, Gimenez F, Yi D, Rubin D. Differential data augmentation techniques for medical imaging classification tasks. In: AMIA annual symposium proceedings. 2017. p. 979–84.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal

BMC Medical Imaging, Springer Journals

Published: Apr 13, 2021
