Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Automatic segmentation of head and neck primary tumors on MRI using a multi-view CNN

Automatic segmentation of head and neck primary tumors on MRI using a multi-view CNN Background : Accurate segmentation of head and neck squamous cell cancer (HNSCC) is important for radiotherapy treatment planning. Manual segmentation of these tumors is time-consuming and vulnerable to inconsistencies between experts, especially in the complex head and neck region. The aim of this study is to introduce and evaluate an automatic segmentation pipeline for HNSCC using a multi-view CNN (MV-CNN). Methods: The dataset included 220 patients with primary HNSCC and availability of T1-weighted, STIR and optionally contrast-enhanced T1-weighted MR images together with a manual reference segmentation of the primary tumor by an expert. A T1-weighted standard space of the head and neck region was created to register all MRI sequences to. An MV-CNN was trained with these three MRI sequences and evaluated in terms of volumetric and spatial performance in a cross-validation by measuring intra-class correlation (ICC) and dice similarity score (DSC), respectively. Results: The average manual segmented primary tumor volume was 11.8±6.70 cm with a median [IQR] of 13.9 3 3 [3.22-15.9] cm . The tumor volume measured by MV-CNN was 22.8±21.1 cm with a median [IQR] of 16.0 [8.24-31.1] cm . Compared to the manual segmentations, the MV-CNN scored an average ICC of 0.64±0.06 and a DSC of 0.49± 0.19. Improved segmentation performance was observed with increasing primary tumor volume: the smallest tumor 3 3 volume group (<3 cm ) scored a DSC of 0.26±0.16 and the largest group (>15 cm ) a DSC of 0.63±0.11 (p<0.001). The automated segmentation tended to overestimate compared to the manual reference, both around the actual primary tumor and in false positively classified healthy structures and pathologically enlarged lymph nodes. Conclusion: An automatic segmentation pipeline was evaluated for primary HNSCC on MRI. The MV-CNN produced reasonable segmentation results, especially on large tumors, but overestimation decreased overall performance. In further research, the focus should be on decreasing false positives and make it valuable in treatment planning. Keywords: Head and neck squamous cell cancer, MRI, Multi-view convolutional neural network, Registration, Segmentation * Correspondence: m.steenwijk@amsterdamumc.nl Department of Anatomy and Neurosciences, Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, Amsterdam, The Netherlands De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands Full list of author information is available at the end of the article © The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Schouten et al. Cancer Imaging (2022) 22:8 Page 2 of 9 Background approach that is able to handle patient movement and a Head and neck squamous cell cancer (HNSCC) accounts variety of anatomical tumor locations within the head for approximately 3% of cancers world-wide [1]. Head and neck region. Segmentation quality was evaluated and neck cancer is typically associated with heavy use of both in terms of volumetric and spatial performance. alcohol or tobacco, however in recent years the human papillomavirus emerged as a third risk factor in oropha- Materials and methods ryngeal cancers [2]. Treatment selection is based on the Study population best tradeoff between cure rate and quality of life, and For this retrospective study, data from two previous consists of surgery, chemotherapy and radiotherapy or a studies were combined [17–19]. Cases were included combination thereof, depending on e.g. the disease stage using the following criteria: (1) primary tumor located in [3]. Conservative treatment using concurrent chemo- the oral cavity, oropharynx or hypopharynx; (2) availabil- therapy and radiotherapy is increasingly applied to pa- ity of T1-weighted, short-T1 inversion recovery (STIR) tients with advanced stage HNSCC, with locoregional and optionally contrast-enhanced T1-weighted (T1gad) control and organ preservation as the main treatment images together with a manual reference segmentation goals. of the primary tumor on the T1-weighted scans before Accurate primary tumor delineation is a crucial step in therapy; (3) at least a T2 primary tumor according to the radiotherapy planning and is performed manually or TNM classification (7th -edition) [20]; Following these semi-automatically by radiation oncologists [4]. This criteria, we included 220 cases of which the demographi- process is often time consuming and inconsistencies cal, clinical and radiological details are shown in Table 1. between experts can have significant influence on pre- Reference segmentations of primary tumor tissue were cision of the treatment [5, 6]. Automatic segmenta- constructed manually on the T1-weighted scans within tion of HNSCC using deep learning is currently the scope of the previous studies. Experts were allowed mostly investigated with computed tomography (CT) to view STIR and optionally the T1gad while delineating [7, 8], fluorodeoxyglucose-positron emission tomog- tumor. The dataset is not publically available. raphy ( F-FDG-PET) or combined PET/CT [9, 10] as an input for delineation of tumors or surrounding MR imaging organs-at-risk. However, in head and neck cancer, Multiple MRI scanners (Siemens, Philips and GE) were MRI is the preferred imaging modality to detect local used for acquisition of the data used in this work. Proto- tumor extent because of its superior soft-tissue con- cols differed between these scanners and between the trast without adverse radiation effects [11]. A few two studies. A 2D STIR (TR/TE/TI=4198-9930/8-134/ studies with a limited number of patients used single 150-225ms) of the whole head and neck region was per- center MRI data that was obtained within a standard- formed after which the area of interest was scanned with ized research protocol to automatically segment a 2D T1-weighted (TR/TE=350-800/10-18ms) sequence HNSCC [12, 13]. With dice similarity scores (DSC) before and after injection of contrast. In a subset of the between 0.30 and 0.40, their performance is not com- data, this area of interest was also scanned with an add- parable with the segmentation performance when itional STIR. Due to the use of different scanners and using PET/CT (DSC above 0.70) and should improve acquisition protocols, reconstruction resolution varied to make it useful for the clinic. Ideally an HNSCC between cases. The (contrast-enhanced) T1-weighted segmentation method produces results independent of images typically had voxel dimensions of 0.4 × 0.4 × anatomical HNSCC location, MRI scanner vendor or 4.4mm, whereas STIR images had dimensions of 0.4 × acquisition protocol (2D or 3D). Deep learning 0.4 × 7.7mm. methods are also being developed constantly to im- prove performance on medical datasets. Multi-view Preprocessing convolutional neural networks (MV-CNNs) have been Tools from the FMRIB Software Library (FSL; http://fsl. successfully used in a variety of segmentation tasks fmrib.ox.ac.uk) were used to align all subjects and MRI on medical datasets, where three identical networks sequences into an isotropic 1-mm head and neck stand- are trained simultaneously each on a different 2D ard space (224 × 192 × 117 pixels), which is common plane so that information is included from three practice in brain lesion studies [21, 22]. Construction of planes without the computational complexity of 3D the head and neck standard space was done by merging patches [14–16]. the T1-weighted images in the dataset as follows. The The aim of this work is to introduce and evaluate a T1-weighted image of one subject was selected as a glo- pipeline for automatic segmentation of the primary bal reference and interpolated to 1-mm isotropic space. tumor in HNSCC on MRI with MV-CNN. To achieve This selection was done by visual assessment of all im- this, we developed a registration and segmentation ages by an experienced radiologist, selecting a case that Schouten et al. Cancer Imaging (2022) 22:8 Page 3 of 9 Table 1 Demographical, clinical and MRI characteristics of the weighted image of each participant to head and neck subjects included in this study standard space. First, FSL FLIRT was used to obtain an Total cases 220 initial affine transformation matrix between subject space and the head and neck standard space. In the sec- Demographical data ond step FSL FLIRT was initialized by the rigid compo- Age (yrs) 61.9±9.3 nent of the transformation matrix of the first step and Gender M: 148 (67%) weighting was applied to filter out the background. The Tumor locations STIR and T1gad of individual subjects were then also Oral cavity 52 (24%) co-registered with the corresponding T1-weighted im- Oropharynx 151 (69%) ages using the same two-step approach, and the trans- formation matrices were concatenated to transform all Hypopharynx 17 (7%) images to head and neck standard space (spline Tumor classification* interpolation). T2 78 (35%) T3 45 (21%) Network structure T4 97 (44%) A multi-view convolutional neural network (MV-CNN) Lymph node classification** architecture was implemented with three equal branches, modelling the axial, coronal and sagittal 2D N0 78 (35%) plane [14]. Together, these branches combine the infor- N1 42 (19%) mation of three views with reduced computational com- N2 97 (44%) plexity compared with 3D patches [23]. A visualization N3 3 (1%) of the network is displayed in Fig. 1. The input of each MRI Sequences branch is a 32 × 32 patch with three channels represent- T1 220 (100%) ing the three MRI sequences: T1-weighted, T1gad and STIR. When a sequence was not available in a subject, T1gad 213 (97%) the channel was zeroed. Each branch consists of a con- STIR 220 (100%) volutional (3 × 3 kernel, batch normalization, ReLu acti- MR vendors vation), max pooling (2 × 2 kernel), dropout (25%) and GE 104 (47%) dense layer. The output of each branch is concatenated Philips 95 (43%) before passing through two dense layers (ReLu and soft- Siemens 21 (9%) max activation, respectively) to get an output of size * Tumor classification was defined according to the TNM criteria (7th edition): two, representing non-tumor and tumor. To include In general, T2 = the tumor is between 2 and 4 cm in the greatest dimension; larger-scale contextual information, a pyramid structure T3 = the tumor is larger than 4 cm in the greatest dimension or invading was implemented where two inputs (scale 0; 32 × 32 × 3 surrounding structures; T4 = the tumor invades other (critical) tissues. ** Lymph node classification was defined according to the TNM criteria (7th and scale 1; 64 × 64 × 3) are included for each of the edition): N0 = no regional lymph-node metastases; N1 = metastases to one or three views [24]. The latter was first downsampled to more ipsilateral lymph nodes with the greatest dimension smaller than 6 cm; N2 = metastases to contralateral or bilateral lymph nodes with the greatest 32 × 32 × 3 to fit in the network. dimension smaller than 6 cm; N3 = metastases to one or more lymph nodes with the greatest dimension larger than 6 cm. Abbreviations: T1gad = Training procedure contrast-enhanced T1-weighted; STIR = short-T1 inversion recovery The model was trained on a NVIDIA GeForce GTX had a relatively small primary tumor, an average bended 1080 TI graphics processor unit (GPU) with tensorflow- neck and the full FOV occupied. The T1-weighted im- gpu version 1.9.0, CUDA version 10.1 and Python 3.6.7. ages of all other subjects were first rigidly (FSL FLIRT, Because approximately 2% of the MRI image voxels default scheme, spline interpolation) and then non- belonged to tumor tissue, we reduced the class imbal- linearly (FSL FNIRT, default scheme, spline ance by randomly sampling 50% of all tumor voxels and interpolation) registered to the global reference in order 1% of all healthy tissue voxels for training. Both the 32 × to form a stack of co-registered T1-weighted head and 32 and 64 × 64 patch were created around the same se- neck images. The head and neck standard space was lected pixel and the channels in each patch were vari- then constructed by taking the voxel-wise non-zero ance normalized to include intensities from four average of the stack of T1-weighted images and only in- standard deviations from the mean. Only voxels repre- cluding the voxels that were in at least 30% of the regis- senting healthy of tumor tissue were used for training or tered T1-weighted images. testing. To prevent border effects, an extra border of 32 Then, again, a two-step approach was used to rigidly zeros was padded around the full image in all three di- register (FSL FLIRT, spline interpolation) the T1- mensions. Five-fold cross-validation was used to evaluate Schouten et al. Cancer Imaging (2022) 22:8 Page 4 of 9 Fig. 1 The MV-CNN architecture used in the current study. On the left side, the schematic overview with the pyramid structure (scale 0 and scale 1). Each branch of the MV-CNN has the same structure, which is shown on the right, consisting of convolutional (with batch normalization (BN) and ReLu activation), max pooling, dropout and dense layers. The outputs of the branches are concatenated and with two dense layers reduces to the output of size two, representing non-tumor and tumor performance. Manual quality control ensured the [IQR] of 16.0 [8.24-31.1]. Average volumetric perform- consistency of the distribution of demographical and ance of MV-CNN was ICC=0.64±0.06 and average medical characteristics distributions between the folds. spatial performance was DSC=0.49±0.19 (Table 2). Fig- Each model and fold was trained in 25 epochs using a ure 2 illustrates four typical segmentation results, of batch size of 512, Adam optimizer [25] and softdice loss which 2 A and 2B show a good and reasonable result. function. The softdice loss coefficient was calculated Figure 2C and 2D illustrate the effect on the spatial per- over all voxels within a batch. Using an initial learning formance of the MV-CNN of false positive classifica- rate of 0.001, the learning rate was lowered by 20% after tions, both in healthy tissue structures (Fig. 2C) as well every fifth epoch. as in pathologically enlarged lymph nodes (Fig. 2D). MV-CNN showed a structural overestimation of the pre- Evaluation dicted tumor volume (Fig. 3A). Although misclassifica- To obtain full segmentation of the test images, the trained tions of the automatic segmentation often occurred in network was applied to all voxels within the mask. After pathologically enlarged lymph nodes, there was no dif- interference the intra-class correlation coefficient (ICC; ference found in model performance between cases from single measure and absolute agreement [26]) and the Dice different lymph node subgroups in the TNM classifica- Similarity Coefficient (DSC) were calculated to evaluate tion (Table 2). There is also no difference in the per- volumetric and spatial performance. DSC was also com- formance between the T3 and T4 subgroups in the pared between subgroups based on tumor classification TNM classification, only the T2 subgroup scored a lower and location. Because the T-stage in the TNM- spatial performance compared with the other subgroups classification both includes information on tumor size and (both p<0.001). invasiveness of the tumor into surrounding tissues, we created four similar sized groups based on tumor volume Tumor size dependency to compare DSC between these subgroups. Due to the fact that the TNM classification is not only based on tumor size but also on invasiveness in sur- Statistical analysis rounding structures, the spatial performance was mea- Statistical differences in spatial performance were sured additionally between four groups of tumor assessed using Python SciPy package comparing sub- volumes: patients with tumor volume <3 cm (V1), tu- groups in tumor classification, location and volume mors between 3 and 7 cm (V2), tumors between 7 and (Wilcoxon rank-sum test). P-values < 0.05 were consid- 3 3 15 cm (V3) and tumors >15 cm (V4). It was found that ered statistically significant. the spatial performance of the MV-CNN increased sig- nificantly with larger tumor volumes (V3 vs. V4 p=0.037, Results all others p<0.001), which is also visible in Fig. 3B where The average reference volume was 11.8±6.70 cm with a these volume groups are plotted against the DSC scores. median [IQR] of 13.9 [3.22-15.9] cm . The average MV- The smallest tumor volume group scored a DSC of CNN tumor volume was 22.8±21.1cm with a median Schouten et al. Cancer Imaging (2022) 22:8 Page 5 of 9 Table 2 Performance results in ICC and DSC (mean±standard the effect of distortion correction in apparent diffusion deviation) by the MV-CNN for all test cases and DSCs per coefficient (ADC) measurements on the segmentation subgroup based on tumor classification, location, volume and performance of CNNs. In their study, they used a 3D lymph node classification for the five-fold cross-validation DeepMedic network structure to segment HNSCC in 18 N MV-CNN patients. They found no significant performance differ- INTRA-CLASS CORRELATION (ICC) ence with or without this distortion correction and re- ceived an average DSC of 0.40. They also emphasized All 220 0.64±0.06 the impact of the complexity of the head and neck re- DICE SIMILARITY SCORE (DSC) gion and the large variety of sizes, shapes and locations All 220 0.49±0.19 in HNSCC on the performance of an automatic segmen- Tumor classification tation algorithm. In a more recent study of Bielak et al. T2 78 0.39±0.21 [13], only HNSCCs were included with a shortest diam- T3 45 0.53±0.17 eter of at least 2 cm to investigate the segmentation per- formance. Even though these tumors would classify as T4 97 0.55±0.15 larger tumors in our study and full MRI protocols with Tumor location seven MRI sequences were used as input for the 3D Oral cavity 52 0.38±0.19 DeepMedic CNN (compared with three sequences in Oropharynx 151 0.51±0.18 this study), they scored a lower mean DSC of 0.30. Hypopharynx 17 0.57±0.11 In other previous studies, MRI-based segmentation Tumor volume proved to be able to segment nasopharyngeal carcinoma in the head and neck region with good performance with V<= 3 cm 51 0.26±0.16 mean DSCs around 0.80 [27, 28]. However, this type of 3< V <=7 cm 62 0.47±0.12 cancer is not considered to be biologically related to 7< V <=15cm 50 0.59±0.11 HNSCC [29] and always arises from the nasopharynx V > 15 cm 57 0.63±0.11 epithelium [30], which makes anatomical localization Lymph node classification easier than HNSCCs that in our case were located in N0 78 0.47±0.18 various anatomical locations within the hypopharynx, oropharynx or oral cavity. N1 42 0.53±0.20 Segmentation of HNSCC with both PET- and CT- N2/3 100 0.48±0.19 scans have been done more frequently in literature and with better results than with MRI. Guo et al. [9] showed 0.26±0.16 and the largest tumor volume group a DSC of a DSC of 0.31 using only CT scans, but reached a DSC 0.63±0.11. of 0.71 when CT was combined with PET images. Other The oral cavity tumors showed a lower spatial perform- PET/CT studies also show high segmentation perfor- ance than the other two tumor locations (oropharynx and mances with DSCs above 0.70 [10, 31–33]. Although the hypopharynx; both p<0.001). This could be explained by results in this study seem to be an improvement to pre- the lower tumor volumes of included oral cavity tumors vious HNSCC segmentation results in MRI, there is still (6.4±8.3 cm ) in comparison to oropharyngeal (p<0.001) a gap between the performance of MRI and PET/CT and hypopharyngeal (p<0.001) tumors (12.9±15.0 and that should be overcome first to make MRI as suitable 18.6±13.3 cm , respectively) in our study group. for automatic segmentation of HNSCC as PET/CT. Be- sides further research in segmentation methods with Discussion only MRI input data, the use of data obtained from an In this study, we developed and evaluated a pipeline for integrated PET/MRI system might help in bridging the automatic primary tumor segmentation in HNSCC using gap in the future. three conventional MRI sequences obtained from our The relatively low mean ICC and DSC in this study multivendor MRI database. The proposed MV-CNN were mainly driven by false positives, both in the tumor produces segmentations with reasonable volumetric and border and in pathologically enlarged lymph nodes. De- spatial performances (ICC=0.64±0.06 and DSC=0.49± pending on the tissue within the head and neck region, 0.19 respectively). the transition between tumor and healthy tissue can vary Only a limited number of studies are available on pri- significantly. The registration of the images to the stand- mary tumor segmentation in the head and neck area ard space also resulted in loss of data when the voxel solely using MRI data as input. Although the goal of Bie- size changes to 1mm in all directions, which can also lak et al. [12] was different from this study, the segmen- have an effect on the clearness of the border around the tation results can be compared to ours. They evaluated tumor. Apart from this, there were also frequently false Schouten et al. Cancer Imaging (2022) 22:8 Page 6 of 9 Fig. 2 Segmentation results with the three MRI sequences, in red the manual segmentation and in green the network segmentation of the MV- CNN on T1-weighted images. The whole image DSC scores are given per example. On top a good (A) and reasonable (B) result is shown in an oral cavity (tongue) and floor of the mouth tumor, respectively. False positives in the predicted segmentation were found often. In (C) the oropharyngeal (tonsillar fossa) tumor was located on the right side, with false positive classifications on the contralateral side in healthy tissue. In (D) a large oropharyngeal cancer (base of tongue) was adequately segmented, however the network also had false positive segmentations in the bilateral lymphadenopathy (yellow arrows) positively classified structures present in healthy tissues The performance of the segmentation network also as well as in pathologically enlarged lymph nodes. depended on size of the primary tumor. Since T-stage Lymph node metastases occur frequently in HNSCC and in the TNM classification is not based solely on are sometimes even larger in volume compared with the tumor size, but also takes into account tumor inva- primary tumor. That the network classifies these en- sion into critical anatomical structures, we added the larged nodes as tumor tissue can be understood since analysis of network performance in categories of the normal anatomy is sometimes significantly distorted tumor volume. Larger tumor volumes showed signifi- due to the local mass effect. It can be hypothesized that cantly higher DSCs compared with the low volume an integrated network, which is trained on both primary tumors. There is only little information on tumor vol- tumors and lymph node metastases might show a better umes of included HNSCCs used in previously pub- spatial and volumetric performance compared with net- lished studies. Besides the fact that smaller tumors works only based on the primary tumor. Therefore, in- are also harder to manually segment for experts, an- cluding manual reference segmentations of the other explanation of this lower spatial performance pathologic lymph nodes could potentially further in- could be that the DSC of a small object is also more crease the segmentation accuracy of the primary tumor. easily affected by false positives because of the smaller Another solution might be to manually draw a cube true positive area. To be able to really compare per- around the tumor in which the network accurately seg- formances between studies, information on the in- ments the tumor to reduce misclassifications in healthy cluded tumor volumes should be provided in tissue. published studies. Schouten et al. Cancer Imaging (2022) 22:8 Page 7 of 9 Fig. 3 A The reference tumor volume plotted against the predicted tumor volume that shows systematic overestimation of the tumor. B For the four volume groups, the spatial performance in DSC of the MV-CNN is shown. The spatial performance increases when the tumor volume. V1 = 3 3 3 tumor volumes below 3 cm ; V2 = tumor volumes between 3 and 7 cm ; V3 = tumor volumes between 7 and 15 cm ; V4 = tumor volumes above 15 cm Head movement (turning or tilting of the head), swal- even removing the preprocessing method to make the lowing and metal artifacts are important causes of MR network more robust for new datasets. image artifacts in the head and neck area. Besides ham- The manual reference segmentation was drawn on the pering the quality of the images, movement will also T1-weighted scan, so it is therefore still possible that the cause a variable appearance of normal anatomy. To im- manual segmentation did not align perfectly with the prove network training, we applied registration to the in- other MRI sequences which could also influence the dividual MRI images. By first creating a standard space automatic segmentation performance. Improvements neck by non-linear registration of T1 images, the success can also be made on the manual segmentations of the rate of the linear registration of each case to this stand- tumor. Because our study included data from two previ- ard space was increased and similarity in the orientation ous studies, the manual segmentation has been done by of the scans was created for the network to train with. two different experts. The exact border between tumor Linear registration was chosen for the actual registration and healthy tissue is sometimes difficult to appreciate in of the cases, to not alter the anatomical structures in HNSCC, causing a substantial disagreement on the de- each subject. Because of the linear registration, differ- lineation of the tumor between raters [5]. Adding more ences between cases due to swallowing or breathing structural (T2-weighted) and functional (i.e. diffusion were not corrected for and could still affect the segmen- weighted imaging and 3D ultrafast dynamic contrast en- tation performance. The STIR and T1gad scans were hanced) MRI sequences could potentially improve man- also not perfectly aligned to the T1-weighted scan due ual and automated tumor delineations [34, 35], where it to the linear registration. The registration could also be would also be interesting to evaluate the added value of improved when patients would lay in a mask during the each MRI sequence to the overall performance of a net- MRI scan, which is often the case in radiotherapy plan- work. Furthermore, the used conventional MRI se- ning (PET-)CT scans that also have been used for man- quences all consisted of 2D images, which were ual segmentation in the cited PET/CT studies [9, 31]. interpolated to make it usable for our 2.5D approach. Although the segmentations were evaluated in the stand- The increased availability of functional MRI sequences ard space in this study, they could be brought back to and 3D MRI sequences for high-resolution diagnostic the original image dimensions of the MRI scans by ap- imaging in HNSCC can further increase automatic plying the inverse matrices of the registration process. tumor segmentation in the future. Because data was Further research would be useful to further optimize, or used from two previous studies and was obtained by Schouten et al. Cancer Imaging (2022) 22:8 Page 8 of 9 scanners from three different MRI vendors, the data was Received: 2 August 2021 Accepted: 31 December 2021 intrinsically heterogeneous. Although this makes it harder to train the network it can be hypothesized that the result will be more robust and better suited for data References 1. Bray F, et al. Global cancer statistics 2018: GLOBOCAN estimates of acquired in a clinical setting. incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. Conclusions 2. Marur S, et al. HPV-associated head and neck cancer: a virus-related cancer epidemic. The lancet oncology. 2010;11(8):781–9. We investigated an automatic segmentation pipeline for 3. Chow LQ. Head and Neck Cancer. N Engl J Med. 2020;382(1):60–72. primary HNSCC based on MRI data. After registration 4. Njeh C. Tumor delineation: The weakest link in the search for accuracy in of the MRI sequences using a head and neck standard radiotherapy. Journal of medical physics/Association of Medical Physicists of India. 2008;33(4):136. space, MV-CNN produced reasonable volumetric and 5. Vinod SK, et al. Uncertainties in volume delineation in radiation oncology: a spatial performances, especially in large tumors, but to systematic review and recommendations for future studies. Radiother be able to use the automatic segmentations only on MRI Oncol. 2016;121(2):169–79. 6. Verbakel WF, et al. Targeted intervention to improve the quality of head data in treatment planning, the performance has to in- and neck radiation therapy treatment planning in the Netherlands: short crease. This could be achieved by reducing the number and long-term impact. International Journal of Radiation Oncology* of false positives in the predicted segmentation. Biology* Physics. 2019;105(3):514–24. 7. Nikolov S, et al., Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv preprint arXiv:1809.04430, 2018. Abbreviations 8. Ibragimov B, Xing L. Segmentation of organs-at‐risks in head and neck CT HNSCC: Head and neck squamous cell cancer; CNN: Convolutional neural images using convolutional neural networks. Medical physics. 2017;44(2): network; MV-CNN: Multi-view convolutional neural network; ICC: Intra-class 547–57. correlation; DSC: Dice similarity score; MRI: Magnetic resonance imaging; CT: Computed tomography; PET: Positron emission tomography; STIR: Short- 9. Guo Z, et al. Gross tumor volume segmentation for head and neck cancer T1 inversion recovery; T1gad: Contrast-enhanced T1-weighted imaging; radiotherapy using deep dense multi-modality network. Phys Med Biol. TR: Repetition time; TE: Echo time; FOV: Field of view; GPU: Graphics 2019;64(20):205015. processor unit 10. Huang B, et al., Fully automated delineation of gross tumor volume for head and neck cancer on PET-CT using deep learning: a dual-center study. Contrast media & molecular imaging, 2018. 2018. Acknowledgements 11. Chung NN, et al. Impact of magnetic resonance imaging versus CT on Not applicable. nasopharyngeal carcinoma: primary tumor target delineation for radiotherapy. Head Neck: Journal for the Sciences Specialties of the Head Authors’ contributions Neck. 2004;26(3):241–6. The data was provided by RM, SM and RL. JS combined the datasets and 12. Bielak L, et al. Automatic Tumor Segmentation With a Convolutional Neural trained the network with support of MS and PdG. JS, MS and PdG wrote the Network in Multiparametric MRI: Influence of Distortion Correction. report with feedback from all authors. All authors approved the final version Tomography. 2019;5(3):292. of the manuscript. 13. Bielak L, et al. Convolutional neural networks for head and neck tumor segmentation on 7-channel multiparametric MRI: a leave-one-out analysis. Funding Radiat Oncol. 2020;15(1):1–9. This work is financially supported by a grant from the Amsterdam UMC, 14. Birenbaum A, Greenspan H. Multi-view longitudinal CNN for multiple Cancer Center Amsterdam (CCA 2017-5-40). sclerosis lesion segmentation. Eng Appl Artif Intell. 2017;65:111–8. 15. Steenwijk M, et al. Multi-view convolutional neural networks using batch normalization outperform human raters during automatic white matter Availability of data and materials lesion segmentation. in MULTIPLE SCLEROSIS JOURNAL. 2017. SAGE The datasets analyzed during the current study are not publicly available. PUBLICATIONS LTD 1 OLIVERS YARD, 55 CITY ROAD, LONDON EC1Y 1SP, ENGLAND. Declarations 16. Roth HR, et al. A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations. in Ethics approval and consent to participate International conference on medical image computing and computer- The Dutch Medical Research Involving Human Subjects Act (WMO) does not assisted intervention. 2014. Springer. apply to this study and therefore informed consent was waived by the 17. Mes SW, et al. Outcome prediction of head and neck squamous cell Medical Ethics Review Committee at Amsterdam UMC. carcinoma by MRI radiomic signatures. European Radiology. 2020;30(11): 6311–6321. Consent for publication 18. Martens RM, et al. Predictive value of quantitative diffusion-weighted Not applicable. imaging and 18-F-FDG-PET in head and neck squamous cell carcinoma treated by (chemo) radiotherapy. Eur J Radiol. 2019;113:39–50. Competing interests 19. Martens RM, et al. Multiparametric functional MRI and 18 F-FDG-PET for The authors declare that they have no competing interests. survival prediction in patients with head and neck squamous cell carcinoma treated with (chemo) radiation. Eur Radiol. 2021;31(2):616–28. Author details 20. Sobin LH, Gospodarowicz MK, Wittekind C. TNM classification of malignant Department of Radiology and Nuclear Medicine, Cancer Center Amsterdam, tumours. John Wiley & Sons; 2011. Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, 21. Liew S-L, et al. A large, open source dataset of stroke anatomical brain Amsterdam, The Netherlands. Department of Anatomy and Neurosciences, images and manual lesion segmentations. Scientific data. 2018;5(1):1–11. Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, 22. Suntrup-Krueger S, et al. The impact of lesion location on dysphagia Amsterdam, The Netherlands. Department of Otolaryngology – Head and incidence, pattern and complications in acute stroke. Part 2: Oropharyngeal Neck Surgery, Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan residue, swallow and cough response, and pneumonia. European journal of 1117, Amsterdam, The Netherlands. De Boelelaan 1108, 1081 HZ neurology. 2017;24(6):867–74. Amsterdam, The Netherlands. Schouten et al. Cancer Imaging (2022) 22:8 Page 9 of 9 23. Hesamian MH, et al. Deep learning techniques for medical image segmentation: achievements and challenges. J Digit Imaging. 2019;32(4): 582–96. 24. Ding P, et al. Pyramid context learning for object detection. Journal of Supercomputing. 2020;76(12);1–14. 25. Kingma DP, Ba J, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 26. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine. 2016; 15(2):155–63. 27. Lin L, et al. Deep learning for automated contouring of primary tumor volumes by MRI for nasopharyngeal carcinoma. Radiology. 2019;291(3):677–86. 28. Wang Y, et al. Automatic tumor segmentation with deep convolutional neural networks for radiotherapy applications. Neural Process Lett. 2018; 48(3):1323–34. 29. Bruce JP, et al. Nasopharyngeal cancer: molecular landscape. J Clin Oncol. 2015;33(29):3346–55. 30. Chua ML, et al. Nasopharyngeal carcinoma. The Lancet. 2016;387(10022): 1012–24. 31. Yang J, et al. A multimodality segmentation framework for automatic target delineation in head and neck radiotherapy. Medical physics. 2015;42(9): 5310–20. 32. Berthon B, et al. Head and neck target delineation using a novel PET automatic segmentation algorithm. Radiother Oncol. 2017;122(2):242–7. 33. Stefano A, et al. An enhanced random walk algorithm for delineation of head and neck cancers in PET studies. Med Biol Eng Comput. 2017;55(6): 897–908. 34. Cardoso M, et al. Evaluating diffusion-weighted magnetic resonance imaging for target volume delineation in head and neck radiotherapy. J Med Imaging Radiat Oncol. 2019;63(3):399–407. 35. Martens RM, et al. The Additional Value of Ultrafast DCE-MRI to DWI-MRI and 18F-FDG-PET to Detect Occult Primary Head and Neck Squamous Cell Carcinoma. Cancers. 2020;12(10):2826. Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Cancer Imaging Springer Journals

Automatic segmentation of head and neck primary tumors on MRI using a multi-view CNN

Loading next page...
 
/lp/springer-journals/automatic-segmentation-of-head-and-neck-primary-tumors-on-mri-using-a-00RJVSDHDs

References (50)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
1470-7330
DOI
10.1186/s40644-022-00445-7
Publisher site
See Article on Publisher Site

Abstract

Background : Accurate segmentation of head and neck squamous cell cancer (HNSCC) is important for radiotherapy treatment planning. Manual segmentation of these tumors is time-consuming and vulnerable to inconsistencies between experts, especially in the complex head and neck region. The aim of this study is to introduce and evaluate an automatic segmentation pipeline for HNSCC using a multi-view CNN (MV-CNN). Methods: The dataset included 220 patients with primary HNSCC and availability of T1-weighted, STIR and optionally contrast-enhanced T1-weighted MR images together with a manual reference segmentation of the primary tumor by an expert. A T1-weighted standard space of the head and neck region was created to register all MRI sequences to. An MV-CNN was trained with these three MRI sequences and evaluated in terms of volumetric and spatial performance in a cross-validation by measuring intra-class correlation (ICC) and dice similarity score (DSC), respectively. Results: The average manual segmented primary tumor volume was 11.8±6.70 cm with a median [IQR] of 13.9 3 3 [3.22-15.9] cm . The tumor volume measured by MV-CNN was 22.8±21.1 cm with a median [IQR] of 16.0 [8.24-31.1] cm . Compared to the manual segmentations, the MV-CNN scored an average ICC of 0.64±0.06 and a DSC of 0.49± 0.19. Improved segmentation performance was observed with increasing primary tumor volume: the smallest tumor 3 3 volume group (<3 cm ) scored a DSC of 0.26±0.16 and the largest group (>15 cm ) a DSC of 0.63±0.11 (p<0.001). The automated segmentation tended to overestimate compared to the manual reference, both around the actual primary tumor and in false positively classified healthy structures and pathologically enlarged lymph nodes. Conclusion: An automatic segmentation pipeline was evaluated for primary HNSCC on MRI. The MV-CNN produced reasonable segmentation results, especially on large tumors, but overestimation decreased overall performance. In further research, the focus should be on decreasing false positives and make it valuable in treatment planning. Keywords: Head and neck squamous cell cancer, MRI, Multi-view convolutional neural network, Registration, Segmentation * Correspondence: m.steenwijk@amsterdamumc.nl Department of Anatomy and Neurosciences, Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, Amsterdam, The Netherlands De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands Full list of author information is available at the end of the article © The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Schouten et al. Cancer Imaging (2022) 22:8 Page 2 of 9 Background approach that is able to handle patient movement and a Head and neck squamous cell cancer (HNSCC) accounts variety of anatomical tumor locations within the head for approximately 3% of cancers world-wide [1]. Head and neck region. Segmentation quality was evaluated and neck cancer is typically associated with heavy use of both in terms of volumetric and spatial performance. alcohol or tobacco, however in recent years the human papillomavirus emerged as a third risk factor in oropha- Materials and methods ryngeal cancers [2]. Treatment selection is based on the Study population best tradeoff between cure rate and quality of life, and For this retrospective study, data from two previous consists of surgery, chemotherapy and radiotherapy or a studies were combined [17–19]. Cases were included combination thereof, depending on e.g. the disease stage using the following criteria: (1) primary tumor located in [3]. Conservative treatment using concurrent chemo- the oral cavity, oropharynx or hypopharynx; (2) availabil- therapy and radiotherapy is increasingly applied to pa- ity of T1-weighted, short-T1 inversion recovery (STIR) tients with advanced stage HNSCC, with locoregional and optionally contrast-enhanced T1-weighted (T1gad) control and organ preservation as the main treatment images together with a manual reference segmentation goals. of the primary tumor on the T1-weighted scans before Accurate primary tumor delineation is a crucial step in therapy; (3) at least a T2 primary tumor according to the radiotherapy planning and is performed manually or TNM classification (7th -edition) [20]; Following these semi-automatically by radiation oncologists [4]. This criteria, we included 220 cases of which the demographi- process is often time consuming and inconsistencies cal, clinical and radiological details are shown in Table 1. between experts can have significant influence on pre- Reference segmentations of primary tumor tissue were cision of the treatment [5, 6]. Automatic segmenta- constructed manually on the T1-weighted scans within tion of HNSCC using deep learning is currently the scope of the previous studies. Experts were allowed mostly investigated with computed tomography (CT) to view STIR and optionally the T1gad while delineating [7, 8], fluorodeoxyglucose-positron emission tomog- tumor. The dataset is not publically available. raphy ( F-FDG-PET) or combined PET/CT [9, 10] as an input for delineation of tumors or surrounding MR imaging organs-at-risk. However, in head and neck cancer, Multiple MRI scanners (Siemens, Philips and GE) were MRI is the preferred imaging modality to detect local used for acquisition of the data used in this work. Proto- tumor extent because of its superior soft-tissue con- cols differed between these scanners and between the trast without adverse radiation effects [11]. A few two studies. A 2D STIR (TR/TE/TI=4198-9930/8-134/ studies with a limited number of patients used single 150-225ms) of the whole head and neck region was per- center MRI data that was obtained within a standard- formed after which the area of interest was scanned with ized research protocol to automatically segment a 2D T1-weighted (TR/TE=350-800/10-18ms) sequence HNSCC [12, 13]. With dice similarity scores (DSC) before and after injection of contrast. In a subset of the between 0.30 and 0.40, their performance is not com- data, this area of interest was also scanned with an add- parable with the segmentation performance when itional STIR. Due to the use of different scanners and using PET/CT (DSC above 0.70) and should improve acquisition protocols, reconstruction resolution varied to make it useful for the clinic. Ideally an HNSCC between cases. The (contrast-enhanced) T1-weighted segmentation method produces results independent of images typically had voxel dimensions of 0.4 × 0.4 × anatomical HNSCC location, MRI scanner vendor or 4.4mm, whereas STIR images had dimensions of 0.4 × acquisition protocol (2D or 3D). Deep learning 0.4 × 7.7mm. methods are also being developed constantly to im- prove performance on medical datasets. Multi-view Preprocessing convolutional neural networks (MV-CNNs) have been Tools from the FMRIB Software Library (FSL; http://fsl. successfully used in a variety of segmentation tasks fmrib.ox.ac.uk) were used to align all subjects and MRI on medical datasets, where three identical networks sequences into an isotropic 1-mm head and neck stand- are trained simultaneously each on a different 2D ard space (224 × 192 × 117 pixels), which is common plane so that information is included from three practice in brain lesion studies [21, 22]. Construction of planes without the computational complexity of 3D the head and neck standard space was done by merging patches [14–16]. the T1-weighted images in the dataset as follows. The The aim of this work is to introduce and evaluate a T1-weighted image of one subject was selected as a glo- pipeline for automatic segmentation of the primary bal reference and interpolated to 1-mm isotropic space. tumor in HNSCC on MRI with MV-CNN. To achieve This selection was done by visual assessment of all im- this, we developed a registration and segmentation ages by an experienced radiologist, selecting a case that Schouten et al. Cancer Imaging (2022) 22:8 Page 3 of 9 Table 1 Demographical, clinical and MRI characteristics of the weighted image of each participant to head and neck subjects included in this study standard space. First, FSL FLIRT was used to obtain an Total cases 220 initial affine transformation matrix between subject space and the head and neck standard space. In the sec- Demographical data ond step FSL FLIRT was initialized by the rigid compo- Age (yrs) 61.9±9.3 nent of the transformation matrix of the first step and Gender M: 148 (67%) weighting was applied to filter out the background. The Tumor locations STIR and T1gad of individual subjects were then also Oral cavity 52 (24%) co-registered with the corresponding T1-weighted im- Oropharynx 151 (69%) ages using the same two-step approach, and the trans- formation matrices were concatenated to transform all Hypopharynx 17 (7%) images to head and neck standard space (spline Tumor classification* interpolation). T2 78 (35%) T3 45 (21%) Network structure T4 97 (44%) A multi-view convolutional neural network (MV-CNN) Lymph node classification** architecture was implemented with three equal branches, modelling the axial, coronal and sagittal 2D N0 78 (35%) plane [14]. Together, these branches combine the infor- N1 42 (19%) mation of three views with reduced computational com- N2 97 (44%) plexity compared with 3D patches [23]. A visualization N3 3 (1%) of the network is displayed in Fig. 1. The input of each MRI Sequences branch is a 32 × 32 patch with three channels represent- T1 220 (100%) ing the three MRI sequences: T1-weighted, T1gad and STIR. When a sequence was not available in a subject, T1gad 213 (97%) the channel was zeroed. Each branch consists of a con- STIR 220 (100%) volutional (3 × 3 kernel, batch normalization, ReLu acti- MR vendors vation), max pooling (2 × 2 kernel), dropout (25%) and GE 104 (47%) dense layer. The output of each branch is concatenated Philips 95 (43%) before passing through two dense layers (ReLu and soft- Siemens 21 (9%) max activation, respectively) to get an output of size * Tumor classification was defined according to the TNM criteria (7th edition): two, representing non-tumor and tumor. To include In general, T2 = the tumor is between 2 and 4 cm in the greatest dimension; larger-scale contextual information, a pyramid structure T3 = the tumor is larger than 4 cm in the greatest dimension or invading was implemented where two inputs (scale 0; 32 × 32 × 3 surrounding structures; T4 = the tumor invades other (critical) tissues. ** Lymph node classification was defined according to the TNM criteria (7th and scale 1; 64 × 64 × 3) are included for each of the edition): N0 = no regional lymph-node metastases; N1 = metastases to one or three views [24]. The latter was first downsampled to more ipsilateral lymph nodes with the greatest dimension smaller than 6 cm; N2 = metastases to contralateral or bilateral lymph nodes with the greatest 32 × 32 × 3 to fit in the network. dimension smaller than 6 cm; N3 = metastases to one or more lymph nodes with the greatest dimension larger than 6 cm. Abbreviations: T1gad = Training procedure contrast-enhanced T1-weighted; STIR = short-T1 inversion recovery The model was trained on a NVIDIA GeForce GTX had a relatively small primary tumor, an average bended 1080 TI graphics processor unit (GPU) with tensorflow- neck and the full FOV occupied. The T1-weighted im- gpu version 1.9.0, CUDA version 10.1 and Python 3.6.7. ages of all other subjects were first rigidly (FSL FLIRT, Because approximately 2% of the MRI image voxels default scheme, spline interpolation) and then non- belonged to tumor tissue, we reduced the class imbal- linearly (FSL FNIRT, default scheme, spline ance by randomly sampling 50% of all tumor voxels and interpolation) registered to the global reference in order 1% of all healthy tissue voxels for training. Both the 32 × to form a stack of co-registered T1-weighted head and 32 and 64 × 64 patch were created around the same se- neck images. The head and neck standard space was lected pixel and the channels in each patch were vari- then constructed by taking the voxel-wise non-zero ance normalized to include intensities from four average of the stack of T1-weighted images and only in- standard deviations from the mean. Only voxels repre- cluding the voxels that were in at least 30% of the regis- senting healthy of tumor tissue were used for training or tered T1-weighted images. testing. To prevent border effects, an extra border of 32 Then, again, a two-step approach was used to rigidly zeros was padded around the full image in all three di- register (FSL FLIRT, spline interpolation) the T1- mensions. Five-fold cross-validation was used to evaluate Schouten et al. Cancer Imaging (2022) 22:8 Page 4 of 9 Fig. 1 The MV-CNN architecture used in the current study. On the left side, the schematic overview with the pyramid structure (scale 0 and scale 1). Each branch of the MV-CNN has the same structure, which is shown on the right, consisting of convolutional (with batch normalization (BN) and ReLu activation), max pooling, dropout and dense layers. The outputs of the branches are concatenated and with two dense layers reduces to the output of size two, representing non-tumor and tumor performance. Manual quality control ensured the [IQR] of 16.0 [8.24-31.1]. Average volumetric perform- consistency of the distribution of demographical and ance of MV-CNN was ICC=0.64±0.06 and average medical characteristics distributions between the folds. spatial performance was DSC=0.49±0.19 (Table 2). Fig- Each model and fold was trained in 25 epochs using a ure 2 illustrates four typical segmentation results, of batch size of 512, Adam optimizer [25] and softdice loss which 2 A and 2B show a good and reasonable result. function. The softdice loss coefficient was calculated Figure 2C and 2D illustrate the effect on the spatial per- over all voxels within a batch. Using an initial learning formance of the MV-CNN of false positive classifica- rate of 0.001, the learning rate was lowered by 20% after tions, both in healthy tissue structures (Fig. 2C) as well every fifth epoch. as in pathologically enlarged lymph nodes (Fig. 2D). MV-CNN showed a structural overestimation of the pre- Evaluation dicted tumor volume (Fig. 3A). Although misclassifica- To obtain full segmentation of the test images, the trained tions of the automatic segmentation often occurred in network was applied to all voxels within the mask. After pathologically enlarged lymph nodes, there was no dif- interference the intra-class correlation coefficient (ICC; ference found in model performance between cases from single measure and absolute agreement [26]) and the Dice different lymph node subgroups in the TNM classifica- Similarity Coefficient (DSC) were calculated to evaluate tion (Table 2). There is also no difference in the per- volumetric and spatial performance. DSC was also com- formance between the T3 and T4 subgroups in the pared between subgroups based on tumor classification TNM classification, only the T2 subgroup scored a lower and location. Because the T-stage in the TNM- spatial performance compared with the other subgroups classification both includes information on tumor size and (both p<0.001). invasiveness of the tumor into surrounding tissues, we created four similar sized groups based on tumor volume Tumor size dependency to compare DSC between these subgroups. Due to the fact that the TNM classification is not only based on tumor size but also on invasiveness in sur- Statistical analysis rounding structures, the spatial performance was mea- Statistical differences in spatial performance were sured additionally between four groups of tumor assessed using Python SciPy package comparing sub- volumes: patients with tumor volume <3 cm (V1), tu- groups in tumor classification, location and volume mors between 3 and 7 cm (V2), tumors between 7 and (Wilcoxon rank-sum test). P-values < 0.05 were consid- 3 3 15 cm (V3) and tumors >15 cm (V4). It was found that ered statistically significant. the spatial performance of the MV-CNN increased sig- nificantly with larger tumor volumes (V3 vs. V4 p=0.037, Results all others p<0.001), which is also visible in Fig. 3B where The average reference volume was 11.8±6.70 cm with a these volume groups are plotted against the DSC scores. median [IQR] of 13.9 [3.22-15.9] cm . The average MV- The smallest tumor volume group scored a DSC of CNN tumor volume was 22.8±21.1cm with a median Schouten et al. Cancer Imaging (2022) 22:8 Page 5 of 9 Table 2 Performance results in ICC and DSC (mean±standard the effect of distortion correction in apparent diffusion deviation) by the MV-CNN for all test cases and DSCs per coefficient (ADC) measurements on the segmentation subgroup based on tumor classification, location, volume and performance of CNNs. In their study, they used a 3D lymph node classification for the five-fold cross-validation DeepMedic network structure to segment HNSCC in 18 N MV-CNN patients. They found no significant performance differ- INTRA-CLASS CORRELATION (ICC) ence with or without this distortion correction and re- ceived an average DSC of 0.40. They also emphasized All 220 0.64±0.06 the impact of the complexity of the head and neck re- DICE SIMILARITY SCORE (DSC) gion and the large variety of sizes, shapes and locations All 220 0.49±0.19 in HNSCC on the performance of an automatic segmen- Tumor classification tation algorithm. In a more recent study of Bielak et al. T2 78 0.39±0.21 [13], only HNSCCs were included with a shortest diam- T3 45 0.53±0.17 eter of at least 2 cm to investigate the segmentation per- formance. Even though these tumors would classify as T4 97 0.55±0.15 larger tumors in our study and full MRI protocols with Tumor location seven MRI sequences were used as input for the 3D Oral cavity 52 0.38±0.19 DeepMedic CNN (compared with three sequences in Oropharynx 151 0.51±0.18 this study), they scored a lower mean DSC of 0.30. Hypopharynx 17 0.57±0.11 In other previous studies, MRI-based segmentation Tumor volume proved to be able to segment nasopharyngeal carcinoma in the head and neck region with good performance with V<= 3 cm 51 0.26±0.16 mean DSCs around 0.80 [27, 28]. However, this type of 3< V <=7 cm 62 0.47±0.12 cancer is not considered to be biologically related to 7< V <=15cm 50 0.59±0.11 HNSCC [29] and always arises from the nasopharynx V > 15 cm 57 0.63±0.11 epithelium [30], which makes anatomical localization Lymph node classification easier than HNSCCs that in our case were located in N0 78 0.47±0.18 various anatomical locations within the hypopharynx, oropharynx or oral cavity. N1 42 0.53±0.20 Segmentation of HNSCC with both PET- and CT- N2/3 100 0.48±0.19 scans have been done more frequently in literature and with better results than with MRI. Guo et al. [9] showed 0.26±0.16 and the largest tumor volume group a DSC of a DSC of 0.31 using only CT scans, but reached a DSC 0.63±0.11. of 0.71 when CT was combined with PET images. Other The oral cavity tumors showed a lower spatial perform- PET/CT studies also show high segmentation perfor- ance than the other two tumor locations (oropharynx and mances with DSCs above 0.70 [10, 31–33]. Although the hypopharynx; both p<0.001). This could be explained by results in this study seem to be an improvement to pre- the lower tumor volumes of included oral cavity tumors vious HNSCC segmentation results in MRI, there is still (6.4±8.3 cm ) in comparison to oropharyngeal (p<0.001) a gap between the performance of MRI and PET/CT and hypopharyngeal (p<0.001) tumors (12.9±15.0 and that should be overcome first to make MRI as suitable 18.6±13.3 cm , respectively) in our study group. for automatic segmentation of HNSCC as PET/CT. Be- sides further research in segmentation methods with Discussion only MRI input data, the use of data obtained from an In this study, we developed and evaluated a pipeline for integrated PET/MRI system might help in bridging the automatic primary tumor segmentation in HNSCC using gap in the future. three conventional MRI sequences obtained from our The relatively low mean ICC and DSC in this study multivendor MRI database. The proposed MV-CNN were mainly driven by false positives, both in the tumor produces segmentations with reasonable volumetric and border and in pathologically enlarged lymph nodes. De- spatial performances (ICC=0.64±0.06 and DSC=0.49± pending on the tissue within the head and neck region, 0.19 respectively). the transition between tumor and healthy tissue can vary Only a limited number of studies are available on pri- significantly. The registration of the images to the stand- mary tumor segmentation in the head and neck area ard space also resulted in loss of data when the voxel solely using MRI data as input. Although the goal of Bie- size changes to 1mm in all directions, which can also lak et al. [12] was different from this study, the segmen- have an effect on the clearness of the border around the tation results can be compared to ours. They evaluated tumor. Apart from this, there were also frequently false Schouten et al. Cancer Imaging (2022) 22:8 Page 6 of 9 Fig. 2 Segmentation results with the three MRI sequences, in red the manual segmentation and in green the network segmentation of the MV- CNN on T1-weighted images. The whole image DSC scores are given per example. On top a good (A) and reasonable (B) result is shown in an oral cavity (tongue) and floor of the mouth tumor, respectively. False positives in the predicted segmentation were found often. In (C) the oropharyngeal (tonsillar fossa) tumor was located on the right side, with false positive classifications on the contralateral side in healthy tissue. In (D) a large oropharyngeal cancer (base of tongue) was adequately segmented, however the network also had false positive segmentations in the bilateral lymphadenopathy (yellow arrows) positively classified structures present in healthy tissues The performance of the segmentation network also as well as in pathologically enlarged lymph nodes. depended on size of the primary tumor. Since T-stage Lymph node metastases occur frequently in HNSCC and in the TNM classification is not based solely on are sometimes even larger in volume compared with the tumor size, but also takes into account tumor inva- primary tumor. That the network classifies these en- sion into critical anatomical structures, we added the larged nodes as tumor tissue can be understood since analysis of network performance in categories of the normal anatomy is sometimes significantly distorted tumor volume. Larger tumor volumes showed signifi- due to the local mass effect. It can be hypothesized that cantly higher DSCs compared with the low volume an integrated network, which is trained on both primary tumors. There is only little information on tumor vol- tumors and lymph node metastases might show a better umes of included HNSCCs used in previously pub- spatial and volumetric performance compared with net- lished studies. Besides the fact that smaller tumors works only based on the primary tumor. Therefore, in- are also harder to manually segment for experts, an- cluding manual reference segmentations of the other explanation of this lower spatial performance pathologic lymph nodes could potentially further in- could be that the DSC of a small object is also more crease the segmentation accuracy of the primary tumor. easily affected by false positives because of the smaller Another solution might be to manually draw a cube true positive area. To be able to really compare per- around the tumor in which the network accurately seg- formances between studies, information on the in- ments the tumor to reduce misclassifications in healthy cluded tumor volumes should be provided in tissue. published studies. Schouten et al. Cancer Imaging (2022) 22:8 Page 7 of 9 Fig. 3 A The reference tumor volume plotted against the predicted tumor volume that shows systematic overestimation of the tumor. B For the four volume groups, the spatial performance in DSC of the MV-CNN is shown. The spatial performance increases when the tumor volume. V1 = 3 3 3 tumor volumes below 3 cm ; V2 = tumor volumes between 3 and 7 cm ; V3 = tumor volumes between 7 and 15 cm ; V4 = tumor volumes above 15 cm Head movement (turning or tilting of the head), swal- even removing the preprocessing method to make the lowing and metal artifacts are important causes of MR network more robust for new datasets. image artifacts in the head and neck area. Besides ham- The manual reference segmentation was drawn on the pering the quality of the images, movement will also T1-weighted scan, so it is therefore still possible that the cause a variable appearance of normal anatomy. To im- manual segmentation did not align perfectly with the prove network training, we applied registration to the in- other MRI sequences which could also influence the dividual MRI images. By first creating a standard space automatic segmentation performance. Improvements neck by non-linear registration of T1 images, the success can also be made on the manual segmentations of the rate of the linear registration of each case to this stand- tumor. Because our study included data from two previ- ard space was increased and similarity in the orientation ous studies, the manual segmentation has been done by of the scans was created for the network to train with. two different experts. The exact border between tumor Linear registration was chosen for the actual registration and healthy tissue is sometimes difficult to appreciate in of the cases, to not alter the anatomical structures in HNSCC, causing a substantial disagreement on the de- each subject. Because of the linear registration, differ- lineation of the tumor between raters [5]. Adding more ences between cases due to swallowing or breathing structural (T2-weighted) and functional (i.e. diffusion were not corrected for and could still affect the segmen- weighted imaging and 3D ultrafast dynamic contrast en- tation performance. The STIR and T1gad scans were hanced) MRI sequences could potentially improve man- also not perfectly aligned to the T1-weighted scan due ual and automated tumor delineations [34, 35], where it to the linear registration. The registration could also be would also be interesting to evaluate the added value of improved when patients would lay in a mask during the each MRI sequence to the overall performance of a net- MRI scan, which is often the case in radiotherapy plan- work. Furthermore, the used conventional MRI se- ning (PET-)CT scans that also have been used for man- quences all consisted of 2D images, which were ual segmentation in the cited PET/CT studies [9, 31]. interpolated to make it usable for our 2.5D approach. Although the segmentations were evaluated in the stand- The increased availability of functional MRI sequences ard space in this study, they could be brought back to and 3D MRI sequences for high-resolution diagnostic the original image dimensions of the MRI scans by ap- imaging in HNSCC can further increase automatic plying the inverse matrices of the registration process. tumor segmentation in the future. Because data was Further research would be useful to further optimize, or used from two previous studies and was obtained by Schouten et al. Cancer Imaging (2022) 22:8 Page 8 of 9 scanners from three different MRI vendors, the data was Received: 2 August 2021 Accepted: 31 December 2021 intrinsically heterogeneous. Although this makes it harder to train the network it can be hypothesized that the result will be more robust and better suited for data References 1. Bray F, et al. Global cancer statistics 2018: GLOBOCAN estimates of acquired in a clinical setting. incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. Conclusions 2. Marur S, et al. HPV-associated head and neck cancer: a virus-related cancer epidemic. The lancet oncology. 2010;11(8):781–9. We investigated an automatic segmentation pipeline for 3. Chow LQ. Head and Neck Cancer. N Engl J Med. 2020;382(1):60–72. primary HNSCC based on MRI data. After registration 4. Njeh C. Tumor delineation: The weakest link in the search for accuracy in of the MRI sequences using a head and neck standard radiotherapy. Journal of medical physics/Association of Medical Physicists of India. 2008;33(4):136. space, MV-CNN produced reasonable volumetric and 5. Vinod SK, et al. Uncertainties in volume delineation in radiation oncology: a spatial performances, especially in large tumors, but to systematic review and recommendations for future studies. Radiother be able to use the automatic segmentations only on MRI Oncol. 2016;121(2):169–79. 6. Verbakel WF, et al. Targeted intervention to improve the quality of head data in treatment planning, the performance has to in- and neck radiation therapy treatment planning in the Netherlands: short crease. This could be achieved by reducing the number and long-term impact. International Journal of Radiation Oncology* of false positives in the predicted segmentation. Biology* Physics. 2019;105(3):514–24. 7. Nikolov S, et al., Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv preprint arXiv:1809.04430, 2018. Abbreviations 8. Ibragimov B, Xing L. Segmentation of organs-at‐risks in head and neck CT HNSCC: Head and neck squamous cell cancer; CNN: Convolutional neural images using convolutional neural networks. Medical physics. 2017;44(2): network; MV-CNN: Multi-view convolutional neural network; ICC: Intra-class 547–57. correlation; DSC: Dice similarity score; MRI: Magnetic resonance imaging; CT: Computed tomography; PET: Positron emission tomography; STIR: Short- 9. Guo Z, et al. Gross tumor volume segmentation for head and neck cancer T1 inversion recovery; T1gad: Contrast-enhanced T1-weighted imaging; radiotherapy using deep dense multi-modality network. Phys Med Biol. TR: Repetition time; TE: Echo time; FOV: Field of view; GPU: Graphics 2019;64(20):205015. processor unit 10. Huang B, et al., Fully automated delineation of gross tumor volume for head and neck cancer on PET-CT using deep learning: a dual-center study. Contrast media & molecular imaging, 2018. 2018. Acknowledgements 11. Chung NN, et al. Impact of magnetic resonance imaging versus CT on Not applicable. nasopharyngeal carcinoma: primary tumor target delineation for radiotherapy. Head Neck: Journal for the Sciences Specialties of the Head Authors’ contributions Neck. 2004;26(3):241–6. The data was provided by RM, SM and RL. JS combined the datasets and 12. Bielak L, et al. Automatic Tumor Segmentation With a Convolutional Neural trained the network with support of MS and PdG. JS, MS and PdG wrote the Network in Multiparametric MRI: Influence of Distortion Correction. report with feedback from all authors. All authors approved the final version Tomography. 2019;5(3):292. of the manuscript. 13. Bielak L, et al. Convolutional neural networks for head and neck tumor segmentation on 7-channel multiparametric MRI: a leave-one-out analysis. Funding Radiat Oncol. 2020;15(1):1–9. This work is financially supported by a grant from the Amsterdam UMC, 14. Birenbaum A, Greenspan H. Multi-view longitudinal CNN for multiple Cancer Center Amsterdam (CCA 2017-5-40). sclerosis lesion segmentation. Eng Appl Artif Intell. 2017;65:111–8. 15. Steenwijk M, et al. Multi-view convolutional neural networks using batch normalization outperform human raters during automatic white matter Availability of data and materials lesion segmentation. in MULTIPLE SCLEROSIS JOURNAL. 2017. SAGE The datasets analyzed during the current study are not publicly available. PUBLICATIONS LTD 1 OLIVERS YARD, 55 CITY ROAD, LONDON EC1Y 1SP, ENGLAND. Declarations 16. Roth HR, et al. A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations. in Ethics approval and consent to participate International conference on medical image computing and computer- The Dutch Medical Research Involving Human Subjects Act (WMO) does not assisted intervention. 2014. Springer. apply to this study and therefore informed consent was waived by the 17. Mes SW, et al. Outcome prediction of head and neck squamous cell Medical Ethics Review Committee at Amsterdam UMC. carcinoma by MRI radiomic signatures. European Radiology. 2020;30(11): 6311–6321. Consent for publication 18. Martens RM, et al. Predictive value of quantitative diffusion-weighted Not applicable. imaging and 18-F-FDG-PET in head and neck squamous cell carcinoma treated by (chemo) radiotherapy. Eur J Radiol. 2019;113:39–50. Competing interests 19. Martens RM, et al. Multiparametric functional MRI and 18 F-FDG-PET for The authors declare that they have no competing interests. survival prediction in patients with head and neck squamous cell carcinoma treated with (chemo) radiation. Eur Radiol. 2021;31(2):616–28. Author details 20. Sobin LH, Gospodarowicz MK, Wittekind C. TNM classification of malignant Department of Radiology and Nuclear Medicine, Cancer Center Amsterdam, tumours. John Wiley & Sons; 2011. Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, 21. Liew S-L, et al. A large, open source dataset of stroke anatomical brain Amsterdam, The Netherlands. Department of Anatomy and Neurosciences, images and manual lesion segmentations. Scientific data. 2018;5(1):1–11. Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, 22. Suntrup-Krueger S, et al. The impact of lesion location on dysphagia Amsterdam, The Netherlands. Department of Otolaryngology – Head and incidence, pattern and complications in acute stroke. Part 2: Oropharyngeal Neck Surgery, Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan residue, swallow and cough response, and pneumonia. European journal of 1117, Amsterdam, The Netherlands. De Boelelaan 1108, 1081 HZ neurology. 2017;24(6):867–74. Amsterdam, The Netherlands. Schouten et al. Cancer Imaging (2022) 22:8 Page 9 of 9 23. Hesamian MH, et al. Deep learning techniques for medical image segmentation: achievements and challenges. J Digit Imaging. 2019;32(4): 582–96. 24. Ding P, et al. Pyramid context learning for object detection. Journal of Supercomputing. 2020;76(12);1–14. 25. Kingma DP, Ba J, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 26. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine. 2016; 15(2):155–63. 27. Lin L, et al. Deep learning for automated contouring of primary tumor volumes by MRI for nasopharyngeal carcinoma. Radiology. 2019;291(3):677–86. 28. Wang Y, et al. Automatic tumor segmentation with deep convolutional neural networks for radiotherapy applications. Neural Process Lett. 2018; 48(3):1323–34. 29. Bruce JP, et al. Nasopharyngeal cancer: molecular landscape. J Clin Oncol. 2015;33(29):3346–55. 30. Chua ML, et al. Nasopharyngeal carcinoma. The Lancet. 2016;387(10022): 1012–24. 31. Yang J, et al. A multimodality segmentation framework for automatic target delineation in head and neck radiotherapy. Medical physics. 2015;42(9): 5310–20. 32. Berthon B, et al. Head and neck target delineation using a novel PET automatic segmentation algorithm. Radiother Oncol. 2017;122(2):242–7. 33. Stefano A, et al. An enhanced random walk algorithm for delineation of head and neck cancers in PET studies. Med Biol Eng Comput. 2017;55(6): 897–908. 34. Cardoso M, et al. Evaluating diffusion-weighted magnetic resonance imaging for target volume delineation in head and neck radiotherapy. J Med Imaging Radiat Oncol. 2019;63(3):399–407. 35. Martens RM, et al. The Additional Value of Ultrafast DCE-MRI to DWI-MRI and 18F-FDG-PET to Detect Occult Primary Head and Neck Squamous Cell Carcinoma. Cancers. 2020;12(10):2826. Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal

Cancer ImagingSpringer Journals

Published: Jan 15, 2022

Keywords: Head and neck squamous cell cancer; MRI; Multi-view convolutional neural network; Registration; Segmentation

There are no references for this article.