

Towards Automated Brain Aneurysm Detection in TOF-MRA: Open Data, Weak Labels, and Anatomical Knowledge

Brain aneurysm detection in Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) has undergone drastic improvements with the advent of Deep Learning (DL). However, the performance of supervised DL models heavily relies on the quantity of labeled samples, which are extremely costly to obtain. Here, we present a DL model for aneurysm detection that overcomes this issue with “weak” labels: oversized annotations which are considerably faster to create. Our weak labels were four times faster to generate than their voxel-wise counterparts. In addition, our model leverages prior anatomical knowledge by focusing only on plausible locations for aneurysm occurrence. We first train and evaluate our model through cross-validation on an in-house TOF-MRA dataset comprising 284 subjects (170 females / 127 healthy controls / 157 patients with 198 aneurysms). On this dataset, our best model achieved a sensitivity of 83%, with a False Positive (FP) rate of 0.8 per patient. To assess model generalizability, we then participated in a challenge for aneurysm detection with TOF-MRA data (93 patients, 20 controls, 125 aneurysms). On the public challenge, sensitivity was 68% (FP rate = 2.5), ranking 4th/18 on the open leaderboard. We found no significant difference in sensitivity between aneurysm risk-of-rupture groups (p = 0.75), locations (p = 0.72), or sizes (p = 0.15). Data, code and model weights are released under permissive licenses. We demonstrate that weak labels and anatomical knowledge can alleviate the necessity for prohibitively expensive voxel-wise annotations.
Keywords Model robustness · Weak annotation · Domain knowledge · Deep learning · Magnetic resonance angiography · Aneurysm detection

* Tommaso Di Noto
tommaso.di-noto@chuv.ch
Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
Center for Psychiatric Neuroscience, Department of Psychiatry, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
Center for Biomedical Imaging, CIBM, Lausanne, Switzerland

Introduction

Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) is a non-invasive and non-contrast imaging technique sensitive to the blood flow in brain vessels. TOF-MRA has found widespread clinical application to identify Unruptured Intracranial Aneurysms (UIAs). These are small (typical diameter ≅ 5 mm) abnormal focal dilatations in cerebral arteries (Chen et al., 2018). If untreated, UIAs can rupture and lead to subarachnoid hemorrhages, which have a mortality rate of 40% and usually cause severe disability for patients (Frösen et al., 2012).

Manually assessing a TOF-MRA is a costly process. Radiologists detect aneurysms by selectively scrolling through the TOF-MRA volumes in different planes. For instance, they first check the most recurrent aneurysm locations in the axial plane. Then, the sagittal view provides better views of areas like the basilar trunk. Afterwards, the coronal view can be used for areas like the anterior cerebral arteries or the Sylvian segments. In addition, Maximum Intensity Projection (MIP) images can be used to search for stenoses, or to confirm potential aneurysms that were spotted.

The workload of radiologists is steadily increasing (Rao et al., 2021), and the detection of UIAs is a meticulous and non-trivial task (Nakao et al., 2018). For these reasons, the development of automated algorithms that help clinicians detect aneurysms with high sensitivity is an active line of research. It holds the promise of improving care while reducing radiologists' assessment times.

Before the popularization of Deep Learning (DL), Arimura et al. (2004) detected aneurysms by means of image filtering. Later, Yang et al. (2011) used candidate points of interest in the brain arteries to locate aneurysms. Starting from 2016, there was a shift towards DL algorithms, which have now become the de facto standard for UIA detection. Table 1 illustrates several recent studies that use DL for UIA detection.

Despite their success, these DL approaches are still constrained by a major bottleneck common to several medical applications: the lack of large, labeled datasets. This is mainly due to two factors. First, the creation of voxel-wise labels for medical images is tedious and time-consuming for radiologists (Razzak et al., 2018). Second, none of the TOF-MRA studies to date made their dataset publicly available (Joo et al., 2020; Nakao et al., 2018; Sichtermann et al., 2019; Stember et al., 2019; Ueda et al., 2019). This hampers reproducibility and the multi-site analyses that are paramount for building robust DL architectures. The scarcity of openly available datasets such as the TOF-MRA challenge dataset (Timmins et al., 2021) also hinders comparisons across models. Of all reviewed studies of Table 1, only Baumgartner et al. (2021) evaluated their models on the challenge dataset.

In this work, we develop a fully automated DL network for UIA detection. We propose to mitigate the data availability bottleneck as follows: we explore the use of “weak” labels (Abousamra et al., 2020; Ezhov et al., 2018; Ke et al., 2020). These can be coarse or oversized annotations that are less precise, but considerably faster for medical experts to create. In addition, we release our annotated in-house dataset to the community. To the best of our knowledge, this will be the largest openly available TOF-MRA aneurysm dataset to date.

Furthermore, we constrain the DL analysis only to the areas of the brain where aneurysm occurrence is plausible. This anatomically-informed approach aims at simulating the selective analysis that radiologists perform on TOF-MRA scans. Then, we assess multi-site robustness by evaluating our algorithm on the external TOF-MRA challenge dataset (Timmins et al., 2021). Last, since every aneurysm can have a different prognosis, we investigate how the performance of our model changes with respect to aneurysm risk-of-rupture groups (defined in the “Aneurysm Annotation, Size, Location and Risk Groups for In-house Dataset” section), location and size.

Table 1 Summary of papers that use deep learning models to tackle automated brain aneurysm detection/segmentation
Paper | Modality | Task(s) | N. Sub | N. Aneurysms | DL Model | Model input | Voxel-wise labels | Use anatomical information | Multi-Site
(Ueda et al., 2019) | MRA | Detection | 1271 | 1477 | ResNet | 2D patches | Not specified | No | Yes
(Joo et al., 2020) | MRA | Detection | 744 | 761 | 3D ResNet | 3D patches | Yes | Yes | Yes
(Nakao et al., 2018) | MRA | Detection | 450 | 508 | CNN | 2D MIP patches | Yes | Yes | No
(Stember et al., 2019) | MRA | Detection | 302 | 336 | RCNN | 2D MIP patches | Yes | No | No
(Baumgartner et al., 2021) | MRA | Detection | 254 | N/A | nnDetection | 3D patches | Yes | No | No
(Sichtermann et al., 2019) | MRA | Detection (via segmentation) | 85 | 115 | DeepMedic | 3D patches | Yes | Yes | No
(Shi et al., 2020) | CTA | Detection + Segmentation | 1177 | 1099 | 3D UNET | 3D patches | Yes | Yes | Yes
(Yang et al., 2020) | CTA | Detection | 1068 | 1337 | ResNet | 3D patches | Not specified | No | Yes
(Park et al., 2019) | CTA | Segmentation + CAD assessment | 662 | 358 | HeadXNet | 3D patches | Yes | Yes | No
(Dai et al., 2020) | CTA | Detection | 311 | 352 | RCNN | 2D NP images | Not specified | No | Yes
(Liu et al., 2021) | DSA | Detection + Segmentation | 451 | 485 | 3D UNET | 3D DSA volumes | Yes | Yes | No
(Duan et al., 2019) | DSA | Detection | 281 | 261 | 2D CNN | 2D DSA images | Bounding Boxes | Yes | No
(Hainc et al., 2020) | DSA | Detection | 240 | 187 | 2D CNN | 2D DSA images | ROI circle | No | No
Use anatomical information: whether the method uses some sort of anatomical prior knowledge during training, patch sampling or inference (more details in Online Resources – Section A). MRA Magnetic Resonance Angiography, CTA Computed Tomography Angiography, DSA Digital Subtraction Angiography, N number, Sub subjects

Materials and Methods

In-house Dataset

This study was approved by the regional ethics committee; written informed consent was waived. In this retrospective work, we included consecutive patients who underwent TOF-MRA between 2010 and 2015 and whose radiological reports were available. Patients with ruptured/treated aneurysms or with other vascular pathologies were excluded. Totally thrombosed aneurysms and infundibula (dilatations of the origin of an artery) were likewise excluded.

In total, we retrieved 284 TOF-MRA subjects: 157 had one (or more) UIAs, while 127 did not present any. Table 2 illustrates the main demographic information of our study group. A 3D gradient recalled echo sequence with Partial Fourier technique was used for all subjects (acquisition parameters are reported in Online Resources – Table 1). A subset of 214 subjects from this study was also used in (Di Noto et al., 2020). That prior article dealt with patch-wise classification, whereas here we address patient-wise aneurysm detection.

The dataset was anonymized and organized according to the Brain Imaging Data Structure (BIDS) standard (Gorgolewski, 2008). It is available on OpenNeuro (Markiewicz et al., 2021) at https://openneuro.org/datasets/ds003949 under the CC0 license.

Aneurysm Annotation, Size, Location and Risk Groups for In-house Dataset

Aneurysms were annotated by one radiologist with 2 years of experience in neuroimaging. They were then double-checked by a senior neuroradiologist with over 15 years of experience to exclude potential false positives or false negatives. Two annotation schemes were followed:

1. Weak labels: for most subjects (246/284), the radiologist used the Multi-image Analysis GUI (Mango) software (v. 4.0.1) to create the aforementioned weak labels. These correspond to spheres that enclose the whole aneurysm, regardless of its shape. In other words, our radiologist chose the size of each sphere manually on a case-by-case basis, ensuring that the whole aneurysm was always entirely enclosed within the sphere. A visual example of one weak label is provided in Fig. 1.
2. Voxel-wise labels: for the remaining subjects (38/284), the radiologist used ITK-SNAP (v. 3.6.0) (Yushkevich et al., 2006) to create voxel-wise labels, drawn slice by slice while scrolling in the axial plane. No specific selection criterion was used to select these 38 subjects, which were consecutive to the 246 of the first group.

The overall number of aneurysms included in the study is 198 (178 saccular, 20 fusiform). Table 3 shows their locations and sizes grouped according to the PHASES score (Greving et al., 2014), a clinical score used to assess the 5-year risk of rupture of aneurysms. The PHASES size categories lead to a very skewed distribution: for example, the category d < 7 mm contains 91% of the aneurysms. We nevertheless kept this grouping since it is the one used in the clinic.

In addition, for post-hoc analysis and stratification purposes, we divided the aneurysms into two groups based on their risk of rupture: low-risk and medium-risk. Aneurysms in the low-risk group are monitored over time but do not require any intervention. Aneurysms in the medium-risk group, instead, can be considered for treatment.

For each aneurysm, we computed a partial PHASES score that only considered size, location and the patient's age. Population, hypertension and earlier aneurysmal hemorrhage were neglected, since this information was not available for all patients. Aneurysms with a partial PHASES score ≤ 4 were assigned to the low-risk group, and those with a partial score > 4 to the medium-risk group. Our senior neuroradiologist reviewed each aneurysm to assess whether the partial PHASES score was reasonable.

Table 2 Demographics of the study sample
 | Patients | Controls | Test, p value | Whole Sample
N | 157 | 127 | / | 284
Age (y) | 56 ± 14 | 46 ± 17 | t = –4.3, p = 7.6 × 10⁻⁷ | 51 ± 16
Sex | 53 M, 104 F | 61 M, 66 F | χ² = 5.9, p = 0.01 | 114 M, 170 F
# UIA | 198 | 0 | / | 198
Patients = subjects with aneurysm(s). Controls = subjects without aneurysms.
Age calculated in years and presented as mean ± standard deviation. Two-sided t-test to compare age between patients and controls. Chi-squared test to compare sex counts between patients and controls. N number of samples, M males, F females, UIA Unruptured Intracranial Aneurysms

Fig. 1 TOF-MRA orthogonal views of a 62-year-old female patient. Red areas correspond to our spherical weak labels. Top-left: axial plane; top-right: 3D posterior reconstruction of the cerebral arteries; bottom-left: sagittal plane; bottom-right: coronal plane

Fusiform aneurysms were excluded from the risk analysis since the PHASES score was built for saccular aneurysms. Similarly, extracranial carotid artery aneurysms were excluded since they do not bleed into the subarachnoid space. This resulted in 141 low-risk and 23 medium-risk aneurysms. A table summarizing aneurysm shape, size, location, associated PHASES score and risk group is provided as Supplementary Material.

Table 3 Locations and sizes of aneurysms according to the PHASES score for the in-house dataset
 | Count | %
Location | |
ICA | 59 | 29.8 (59/198)
MCA | 57 | 28.8 (57/198)
ACA/Pcom/Posterior | 82 | 41.4 (82/198)
Size | |
d < 7 mm | 180 | 91.0 (180/198)
7–9.9 mm | 7 | 3.5 (7/198)
10–19.9 mm | 10 | 5.0 (10/198)
d ≥ 20 mm | 1 | 0.5 (1/198)
ICA Internal Carotid Artery, MCA Middle Cerebral Artery, ACA Anterior Cerebral Arteries, Pcom Posterior communicating artery, Posterior posterior circulation, d maximum diameter

Data Processing

Several preprocessing steps were carried out for each subject:
1. Skull-stripping with the FSL Brain Extraction Tool (v. 6.0.1) (Smith, 2002).
2. N4 bias field correction with SimpleITK (v. 1.2.0) (Tustison et al., 2010).
3. Resampling of all volumes to the median voxel spacing, again with SimpleITK. This normalizes nonuniform voxel sizes (Isensee et al., 2021).
4. Co-registration of a probabilistic vessel atlas, built from multi-center MRA datasets (Mouches & Forkert, 2014), to each patient's TOF-MRA using ANTS (v. 2.3.1) (Avants et al., 2014) (details in Online Resources – Section B).

The atlas was used both during training and inference (see the “Use of Anatomical Information” section).

Network, Cross-Validation, Metrics and Statistics

Network
The deep learning model used in this study is a custom 3D UNET, inspired by the original work (Özgün et al., 2016). In the decoding branch, we used upsampling layers rather than transposed convolutions, since they led to faster model convergence. Figure 2 illustrates the structure of our network.

Input and layers. The network takes 3D TOF-MRA patches as input. We set the side of the input patches to 64x64x64 voxels to include even the largest aneurysms. All patches were Z-score normalized, as is common practice (Bengio et al., 2016). A kernel size of 3x3x3 was used in all convolutional layers, with padding and stride = 1. We applied the ReLU activation function to all layers, except for the last layer, which is followed by a sigmoid function.

Optimization. To fit the model, we used the Adam optimization algorithm (Kingma & Ba, 2015) with adaptive learning rate (initial learning rate = 0.0001). We trained the model for 100 epochs with the Combo loss function (Taghanaki et al., 2019), setting α = β = 0.5. This function combines Dice and cross-entropy, and has proven to be effective for imbalanced segmentation tasks. We used Xavier initialization (Glorot & Bengio, 2010) for all layers. Biases were initialized to 0, and a batch size of 8 was chosen. Batch normalization (Ioffe & Szegedy, 2015) was used to prevent overfitting.

Hyperparameters. The number of convolutional filters, the batch size, the value of α (and therefore β = 1 − α) and the learning rate were chosen with the Optuna algorithm (Akiba et al., 2019). The search used an internal validation set: 20% of the training cases of external cross-validation fold 1 (see below for cross-validation details).

Implementation. The network has 855,111 trainable parameters in total. Training and evaluation were performed with Tensorflow 2.4.0 on a GeForce RTX 2080TI GPU with 11 GB of SDRAM.

Fig. 2 Proposed variant of the 3D UNET. The input corresponds to a 64x64x64-voxel TOF-MRA patch. The output is a probabilistic patch with the same size as the input, where each voxel corresponds to the probability of belonging to either the foreground (i.e., aneurysm) or the background. Conv convolutional, Max_pool max pooling, BatchNorm batch normalization

Cross-validation
To evaluate detection performance, we conducted a fivefold cross-validation on the 246 subjects with weak labels. At each cross-validation split, 80% (≈197/246) of the subjects were used for training the network. The remaining 20% (≈49/246) were used to compute patient-wise results (i.e. for inference). This division occurs 5 times (once per fold), each time with a different 80%-20% split, so all 246 patients are ultimately used for evaluation.

At each split, the 38 patients with voxel-wise labels were always added to the training set. This increases the effect size of label quality in further analyses (see experiments in the “Use of Weak Labels” section). To avoid over-optimistic results, we ensured that the sessions of patients with multiple sessions were not split between training and test set. To make results comparable across experiments, we always used the same cross-validation split. We released this split for reproducibility at https://github.com/connectomicslab/Aneurysm_Detection.
1 3 Neuroinformatics In all experiments on the in-house dataset, we always pre- the changes in detection performances with respect to trained our network on the whole ADAM training dataset aneurysm risk-of-rupture groups, location and size. (Timmins et al., 2021) and then fine-tuned it on the in-house training data. To validate the effectiveness of pre-training on ADAM, we performed ablation experiments of domain Use of Weak Labels adaptation across the two datasets (in-house and ADAM). As these experiments are beyond the main focus of the man- The goal of this experiment was to answer the following uscript, we added them in the Online Resources – Section F. questions: 1) how much faster is the creation of weak labels When performing pre-training on the ADAM dataset, we with respect to the creation of voxel-wise labels? 2) what is applied both anatomically-informed expedients described the impact of using weak labels in terms of detection perfor- below in “Use of Anatomical Information” section. mances when comparing to voxel-wise labels? To answer the first question, we selected a subset of 14 Metrics and Statistics In line with the ADAM challenge patients (mean aneurysm size (s.d.) = 5.2 (1.0) mm), and (presented in “ Participation to the ADAM Challenge” sec- we assessed the time difference between the two annotation tion), we used sensitivity and false positive (FP) rate as schemes (i.e. all 14 patients were annotated first with weak detection metrics. A detection was considered correct if the labels, and then with voxel-wise labels). These 14 patients center-of-mass of the predicted aneurysm was located within were chosen randomly among the 284 TOF-MRA subjects, the maximum aneurysm size of the ground truth mask. In but we ensured that the mean aneurysm size was representa- addition, we computed the Free-response Receiver Operat- tive of the whole cohort. 
ing Characteristic (FROC) curve (Chakraborty & Berbaum, To answer the second question, we used the 38 subjects 2004). To compare different model configurations, we used with voxel-wise labels and for these patients we artificially a two-sided Wilcoxon signed-rank test of the areas under the generated corresponding weak spherical labels (‘weakened’ FROC curves across test subjects, as similarly performed labels, details in Online Resources – Section C). Then, to in (Ward et al., 1999). To compare the performances of a evaluate the influence of annotation quality (weakened vs. configuration with respect to aneurysm rupture risk, location voxel-wise) in terms of detection performances, we con- and size we performed several Chi-squared tests (McHugh, ducted 3 experiments in which we used an increasing num- 2012). The statistical tests were performed using SciPy ber of patients with voxel-wise labels: (i) all 38 patients (v.1.4.1), setting a significance threshold α = 0.05. with weakened labels (Model 1, Table  4), (ii) 19 patients with weakened labels and 19 with voxel-wise labels (Model Experiments 2, Table 4), and (iii) all 38 patients with voxel-wise labels (Model 3, Table 4). Results related to the use of weak labels In this section, we will present the four experiments are presented in “Weak Labels Allow Fourfold Annotation that we conducted: in “Use of Weak Labels” section, we Speedup Without Degrading Performances” section. 
investigate the use of weak labels in terms of difference in annotation time and in detection performances, when com- Use of Anatomical Information paring to voxel-wise labels; in “Use of Anatomical Infor- mation” section, we present our anatomically-informed Because the task of aneurysm detection is extremely spa- approach for tackling UIA detection; in “Participation to tially constrained, we exploit the prior information that the ADAM Challenge” section, we describe the participa- aneurysms a) must occur in vessels, and b) tend to occur in tion to the ADAM challenge to investigate the generali- specific locations of the vasculature. To include this ana- zation of our model; in “Performances With Respect to tomical knowledge, one of our radiologists pinpointed in Risk-of-rupture, Location and Size” section, we analyze the vessel atlas (described in “Aneurysm Annotation, Size, Table 4 Average detection results on the in-house dataset across test folds when changing the ratio of voxel-wise/weakened labels. Sensitivity values are reported as mean and 95% Wilson confidence interval inside parentheses Model Configu- Anatomically-informed Anatomically-informed Labels of 38 added subs Avg. Sensitivity (CI) Avg. 
FP rate ration patch selection sliding window Model 1 Yes Yes 38 weakened 95/127 = 75% (65%, 80%) 1.3 Model 2 Yes Yes 19 weakened, 19 voxel-wise 99/127 = 78% (68%, 82%) 0.9 Model 3 Yes Yes 38 voxel-wise 101/127 = 80% (72%, 85%) 1.2 Bold values represent the best performances Avg average, FP false positive, CI confidence interval, voxel-wise labels drawn slice by slice on the axial plane,  weakened voxel-wise labels that are artificially converted to weak spherical labels, subs subjects 1 3 Neuroinformatics Location and Risk Groups for In-house Dataset” section) the Validation To validate the effectiveness of our two ana- location of 20 landmark points where aneurysm occurrence tomically-informed expedients (patch sampling and slid- is most frequent (list in Online Resources – Table 2). These ing window), we first evaluated an anatomically-agnostic points were chosen according to the literature (Brown & baseline where none of the two expedients is used and Broderick, 2014) and were co-registered to the TOF-MRA the 38 added subjects have weakened labels (Model 4, space of each subject, as illustrated in Fig. 3. Table 5). Second, we evaluated the same anatomically- agnostic baseline (none of the two expedients used) but Training We apply an anatomically-informed selection of with the 38 subjects having voxel-wise labels (Model training patches to sample both negative (without aneu- 5, Table 5). Third, we tested one model where only the rysms) and positive (with aneurysms) samples. Specifically, anatomically-informed patch sampling is carried out 8 positive patches per aneurysm were randomly extracted (Model 6, Table  5). Last, we computed performances in a non-centered fashion. Then, we extracted 50 negative when only the anatomically-informed sliding window is patches per TOF-MRA volume. Out of these, 20 were cen- performed (Model 7, Table 5). 
Results related to the use tered in correspondence with the landmark points, 20 were of anatomical information are shown in “Anatomically- patches containing vessels (details in Online Resources – informed Sliding Window Increases Detection Perfor- Section D), and 10 were extracted randomly. Overall, this mances” section. sampling strategy allows us to extract most of the negative patches (i.e., all but the random ones) which are comparable Participation to the ADAM Challenge to the positive ones in terms of average intensity. To mitigate class imbalance, we applied data augmentations on positive To evaluate model performances in data coming from patches: namely, rotations (90°, 180°, 270°), flipping (hori- a different institution, we participated to the Aneurysm zontal, vertical), contrast adjustment, gamma correction, and Detection And segMentation (ADAM) challenge (http:// addition of gaussian noise. adam. isi. uu. nl/) (Timmins et  al., 2021). The ADAM training dataset is composed of 113 TOF-MRA (93 Inference The patient-wise evaluation was performed fol- patients with UIAs, 20 controls). The total number of lowing the sliding window approach (details in Online UIAs is 125 and the voxel-wise annotations were drawn Resources – Section E). We exploited again the prior ana- in the axial plane by two radiologists. Instead, the unre- tomical information described above by retaining only the leased test dataset is made of 141 cases (117 patients, patches which are both within a minimum distance from the 26 controls) and it is solely used by the organizers to landmark points and fulfill specific intensity criteria (details compute patient-wise results. To improve detection per- in Online Resources – Section D). The rationale behind this formances on the ADAM test set, we pre-trained our choice was to only focus on patches located in the main network on the whole in-house dataset and then fine- cerebral arteries, as shown in Fig. 4. 
Two post-processing tuned it on the ADAM training dataset. Results related steps were adopted: first, we kept a maximum of 5 candidate to our model submission to the ADAM challenge are aneurysms per patient (only the 5 most probable). Second, presented in “The Proposed Model Ranked At the Top we applied test-time augmentation to increase sensitivity. of the ADAM Challenge” section. Fig. 3 left: 20 landmark points Probabilistic vessel atlas TOF-MRA volume (in red) located in specific positions of the cerebral arteries (white segmentation) in MNI space. right: same landmark points co-registered to the TOF-MRA space of a 21-year- old, female subject without aneurysms MNI spaceSubject space 1 3 Neuroinformatics Fig. 4 TOF-MRA orthogonal views of a 62-year-old female subject after brain extrac- tion: blue patches are the ones which are retained in the anatomically-informed sliding- window approach. (top-right): 3D schematic representation of sliding-window approach; out of all the patches in the volume (white patches), we only retain those located in the proximity of the main brain arteries (blue ones) Performances with Respect to Riskof ‑ ‑rupture, Location and Size addition, we also explored how performances would vary with respect to aneurysm location and size. Although the Each aneurysm has a different prognosis and, depend- latter analysis is less relevant from a clinical perspective, it is still interesting from a methodological point of view ing on its risk-of-rupture group (defined in “Aneurysm Annotation, Size, Location and Risk Groups for In-house and it is also frequent in the literature. Results related to the detection performances with respect to aneurysm Dataset” section), it will be either monitored over time (low risk) or considered for treatment (medium risk). 
risk-of-rupture groups, location and size are described in “Detection Performances Across Rupture Risk, Location, Therefore, we investigated how detection performances would vary with respect to the risk-of-rupture groups. In and Size” section. Table 5 Average detection results on the in-house dataset across test folds when applying none, or one of the two anatomically-informed expedi- ents. Sensitivity values are reported as mean and 95% Wilson confidence interval inside parentheses Model Anatomically-informed Anatomically-informed Labels of 38 added subs Avg. Sensitivity (CI) Avg. FP rate Configuration patch selection sliding window Model 4 No No 38 weakened 83/127 = 65% (55%, 71%) 4.6 Model 5 No No 38 voxel-wise 95/127 = 74% (63%, 78%) 4.5 Model 6 Yes No 38 voxel-wise 61/127 = 48% (38%, 55%) 4.8 Model 7 No Yes 38 voxel-wise 106/127 = 83% (75%, 88%) 0.8 Bold values represent the best performances Avg average, FP false positive, CI confidence interval, voxel-wise labels drawn slice by slice on the axial plane, weakened voxel-wise labels that are artificially converted to weak spherical labels, subs subjects 1 3 Neuroinformatics an effective expedient to increase detection results. In fact, Results sensitivity is increased and the average FP rate is drastically reduced. In addition, we compared Model 5 and Model 6 Weak Labels Allow Fourfold Annotation Speedup and we saw that Model 5 significantly outperforms Model Without Degrading Performances −6 p = 8 × 10 6 (W = 202.0, ). This finding shows that the anatomically-informed patch sampling is detrimental for When measuring the time needed to create weak vs. voxel- detection performances when the sliding window is anatom- wise annotations on the 14 subjects described in “Use of ically-agnostic. 
Last, when comparing Model 3 and Model Weak Labels” section, we noticed a significant difference 7 we found no significant difference (W = 81.5, p = 0.24 ): (two-sided Wilcoxon signed-rank test – annotation tim- this result indicates that the anatomically-informed patch ings, W = 0, p = 0.001): creating weak annotations (aver- sampling is not detrimental when we are also applying the age 23 s ± 6 per aneurysm) resulted to be approximately 4 anatomically-informed sliding window. times faster than creating voxel-wise annotations (average To provide a visual interpretation of our network predic- 93 s ± 25). A more detailed stratification of the timings with tions, we show in Fig. 5 one correctly identified aneurysm respect to location and size is provided in Supplementary (true positive), one small, missed aneurysm (false negative) Figs. 1 and 2. and one false positive prediction. Also, in Fig. 6 we report Subsequently, to investigate the effect that voxel-wise the FROC curves corresponding to Model 5, Model 6, and labels can have for detection performances with respect to Model 7. This figure reflects the statistical tests: Model 7 weak labels, we conducted several experiments where an (green curve) outperforms the anatomically-agnostic Model increasing ratio of voxel-wise/weakened labels was used 5 (red curve) at all operating points. Similarly, Model 5 (red for the 38 patients described in “Use of Weak Labels” sec- curve) significantly outperforms Model 6 (blue curve), tion. Table 4 shows detection performances when the ratio confirming the effectiveness of the anatomically-informed is gradually increased. sliding window and the ineffectiveness of the anatomically- The configuration with all voxel-wise labels (Model 3) informed patch sampling. had higher sensitivity with respect to the other two con- figurations with weakened labels (Model 1 and Model 2). 
However, this difference was not significant (two-sided Wilcoxon signed-rank test on the areas under the FROC curves: W = 14.0, p = 0.054 when comparing to Model 1, and W = 685.5, p = 0.977 when comparing to Model 2).

Anatomically-informed Sliding Window Increases Detection Performances

In Table 5, we report detection results when adopting zero, one, or both of the anatomically-informed expedients presented in the "Use of Anatomical Information" section. In the anatomically-agnostic baseline with the 38 subjects having weakened labels (Model 4), negative patch sampling is random and all non-zero patches of the TOF-MRA volumes are retained in the sliding-window approach, thus disregarding any anatomical constraint. Similarly, row 2 (Model 5) shows detection results when using neither the anatomically-informed patch sampling nor the anatomically-informed sliding window, but this time with the 38 subjects having voxel-wise labels. Row 3 (Model 6) illustrates detection performance when only the anatomically-informed patch sampling is applied, while the sliding window is still anatomically-agnostic. Conversely, row 4 (Model 7) shows the inverse scenario (i.e., random negative patch sampling, but an anatomically-informed sliding window). Model 7 significantly outperformed Model 5 (W = 74.5, p = 2 × 10⁻⁶), thus indicating that the anatomically-informed sliding window is an effective expedient.

The Proposed Model Ranked At the Top of the ADAM Challenge

Table 6 illustrates detection results on the ADAM test dataset. Our algorithm ranked 4th out of 18 teams for detection and 4th out of 15 for segmentation (with the highest volumetric similarity). Interested readers can check the methods proposed by other teams on the challenge website (https://adam.isi.uu.nl/) and in the challenge paper (Timmins et al., 2021).

Detection Performances Across Rupture Risk, Location, and Size

Supplementary Fig. 3 illustrates the performance achieved by one of our top-performing models (Model 3, Table 4) stratified according to the two risk groups presented in the "Aneurysm Annotation, Size, Location and Risk Groups for In-house Dataset" section. For the low-risk group, our model reaches a mean sensitivity of 80%, while for the medium-risk group it reaches a mean sensitivity of 73%. The difference was not significant (χ² = 0.09, DoF = 1, p = 0.75). In Supplementary Figs. 4 and 5, we also report the model sensitivity stratified according to aneurysm location and PHASES size category, respectively. No significant difference was found across locations (χ² = 0.64, DoF = 2, p = 0.72) or sizes (χ² = 0.92, DoF = 2, p = 0.15, excluding n = 1 huge aneurysm with s > 20 mm). Regarding aneurysm size, we conducted a further stratification, since most of the aneurysms fell in the smallest group (< 7 mm). Thus, we divided this group into three subgroups, namely s ≤ 3 mm, 3 < s ≤ 5 mm, and 5 < s < 7 mm. Detection results with this more granular stratification are shown in Supplementary Fig. 6. The model sensitivity was significantly lower for tiny aneurysms (≤ 3 mm) than for the other two subgroups (χ² = 27.57, DoF = 2, p on the order of 10⁻⁶).

Fig. 5 Qualitative analysis of predictions and errors. The heatmap generated by the network ranges from 0 (low probability, yellow color) to 1 (high probability, red color). (a) True positive prediction in the anterior communicating artery. (b) False negative (i.e., missed aneurysm) in the internal carotid artery; the ground truth label mask is shown in blue. (c) False positive prediction in the internal carotid artery.

Discussion

This work shows that competitive results can be obtained in automated aneurysm detection for TOF-MRA data even with rapid data annotation. To this end, we proposed a fully automated deep learning algorithm that is trained using weak labels and exploits prior anatomical knowledge.
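The weak labels referred to above are spheres that fully enclose each aneurysm, and the "weakened" labels of the 38 voxel-wise subjects were derived from voxel-wise masks (how exactly is not described in this part of the text). The following Python sketch, assuming NumPy and a (z, y, x) array layout, shows one way to build such a sphere in physical units and one hypothetical recipe for weakening a voxel-wise mask (centroid plus farthest-voxel radius). The helper names `sphere_mask` and `weaken_label` and the `margin_mm` parameter are illustrative, not taken from the authors' code.

```python
import numpy as np


def sphere_mask(shape, center_vox, radius_mm, spacing_mm):
    """Boolean sphere of physical radius `radius_mm` centred at `center_vox`.

    `spacing_mm` is the voxel size along each axis, so the sphere stays round
    in millimetres even when voxels are anisotropic.
    """
    grids = np.ogrid[tuple(slice(0, n) for n in shape)]  # broadcastable index grids
    dist_sq = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, center_vox, spacing_mm))
    # Small tolerance guards against floating-point round-off on the boundary.
    return dist_sq <= radius_mm ** 2 + 1e-9


def weaken_label(voxel_mask, spacing_mm, margin_mm=0.0):
    """Replace a voxel-wise aneurysm mask by an enclosing sphere (hypothetical recipe).

    Centre = centroid of the labelled voxels; radius = distance to the farthest
    labelled voxel plus an optional margin, so the whole lesion stays inside.
    """
    coords = np.argwhere(voxel_mask)                  # (n_voxels, 3) indices
    center = coords.mean(axis=0)                      # centroid in voxel units
    offsets_mm = (coords - center) * np.asarray(spacing_mm, dtype=float)
    radius = np.sqrt((offsets_mm ** 2).sum(axis=1)).max() + margin_mm
    return sphere_mask(voxel_mask.shape, center, radius, spacing_mm)
```

With `margin_mm=0` the sphere is the tightest one centred on the centroid; a positive margin produces the deliberately oversized annotation that makes weak labels quick to draw by hand.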
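To make the contrast between the anatomically-agnostic and anatomically-informed sliding windows in the Results concrete, here is a minimal NumPy sketch. The agnostic mode keeps every patch with at least one non-zero voxel, as described for Models 4 and 5; the informed mode additionally requires the patch centre to lie inside a boolean `plausible_mask`. The centre-in-mask rule, the cubic patch shape and the function name are assumptions for illustration; the actual criterion is the one defined in the "Use of Anatomical Information" section.

```python
import numpy as np


def sliding_window_corners(volume, patch, stride, plausible_mask=None):
    """Return the (z, y, x) corner of every cubic patch kept for inference.

    Anatomically-agnostic mode (plausible_mask is None): keep every patch that
    contains at least one non-zero voxel, i.e. skip pure background.
    Anatomically-informed mode (hypothetical criterion): additionally require
    the patch centre to fall inside a boolean mask of plausible locations.
    """
    corners = []
    nz, ny, nx = volume.shape
    half = patch // 2
    for z in range(0, nz - patch + 1, stride):
        for y in range(0, ny - patch + 1, stride):
            for x in range(0, nx - patch + 1, stride):
                if not volume[z:z + patch, y:y + patch, x:x + patch].any():
                    continue  # empty (e.g. skull-stripped background) patch
                if plausible_mask is not None and not plausible_mask[z + half, y + half, x + half]:
                    continue  # patch centre outside the plausibility mask
                corners.append((z, y, x))
    return corners
```

How `plausible_mask` is obtained is left open here; thresholding a co-registered vessel atlas or dilating landmark points are two hypothetical options. Fewer candidate patches means fewer chances to fire in anatomically implausible regions, which is in line with the lower FP rate the text reports for the anatomically-informed sliding window.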
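The stratified comparisons in the Results (sensitivity across risk groups, locations and size bins) are presumably χ² tests of independence on counts of detected versus missed aneurysms, with DoF equal to the number of groups minus one, and the shaded bands of the mean FROC curves are 95% Wilson intervals. A dependency-free Python sketch of both follows. It assumes Pearson's statistic without continuity correction, which the text does not specify, and the contingency table in the note below is invented for illustration.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_j x^{(2j-1)/2} / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi2_independence(table):
    """Pearson chi-squared test of independence, no continuity correction.

    `table[i][j]` counts aneurysms of group i that were detected (j = 0) or
    missed (j = 1); every row and column total must be non-zero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof, chi2_sf(stat, dof)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width
```

For example, `chi2_independence([[8, 2], [4, 6]])` returns χ² ≈ 3.33 with one degree of freedom and p ≈ 0.068, and `chi2_sf(27.57, 2)` evaluates to roughly 1.0 × 10⁻⁶, the same order as the tiny-aneurysm comparison above. `wilson_interval(8, 10)` gives about (0.49, 0.94); unlike the plain normal approximation, the Wilson interval never leaves [0, 1], which matters for small strata.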
Fig. 6 Mean Free-response Receiver Operating Characteristic (FROC) curves across the five test folds of the cross-validation. Shaded areas represent the 95% Wilson confidence interval. The three curves correspond to Model 5, Model 6, and Model 7. Anatomically-agnostic model: neither of the two anatomically-informed expedients is used. Anat: anatomically-informed.
[Figure: "Mean FROC curves"; x-axis: number of allowed FPs per patient; y-axis: sensitivity. Legend: Model 5: anatomically-agnostic + 38 voxel-wise; Model 6: only anat. patch sampling + 38 voxel-wise; Model 7: only anat. sliding window + 38 voxel-wise]

Table 6 Detection results on the ADAM dataset. Our team (unil-chuv3) ranked 4th out of 18 participating groups on the open leaderboard

Ranking   Team            Sens   Avg. FP rate
1         abc             68%    0.40
2         xlim            70%    4.03
3         mibaumgartner   67%    0.13
4         unil-chuv3      68%    2.50
5         joker           63%    0.16
...
18        ibbm            2%     0.01

Sens sensitivity, FP false positive

Despite being less accurate, weak labels are drastically faster for medical experts to create, reducing annotation time fourfold. Although the configuration with all voxel-wise labels (Model 3, Table 4) had higher sensitivity, we found no statistical difference when comparing it with the configurations with some (Model 2) or all weakened labels (Model 1). This finding indicates that weak labels are sufficient to obtain satisfactory detection results on our in-house dataset. For larger datasets (e.g., thousands of patients), the weak annotation proposed in this work is a scalable solution that can significantly alleviate the annotation bottleneck in medical ML applications.

In addition to the use of weak labels, our model leverages the underlying anatomy of the brain vasculature (i.e., we "anatomically informed" our network) in order to simulate the radiologists' exploration of TOF-MRA scans. First, most of the negative patches (i.e., patches without aneurysms) extracted during training either contained a vessel or were located in correspondence with the aneurysm landmark points. Second, we limited the sliding-window approach to regions of the brain that are plausible for aneurysm occurrence. These constraints reflect the radiologists' behavior, in the sense that only regions containing vessels, or at higher risk for aneurysms, are scanned, while the rest of the brain is neglected. The experiments in the "Anatomically-informed Sliding Window Increases Detection Performances" section showed that the anatomically-informed sliding window is an effective expedient, since it increases sensitivity while reducing the average FP rate. Conversely, the anatomically-informed patch sampling had a negligible effect when combined with the anatomically-informed sliding window (Model 3 vs. Model 7), and was even detrimental when the sliding window was anatomically-agnostic (Model 5 vs. Model 6). We hypothesize that applying only the anatomically-informed patch sampling leads to a domain-shift issue: specifically, the model is trained using intensity-matched patches, but is then tested with any patch in the brain (because there is no anatomically-informed sliding window). We think this difference between training and test domains is what causes the performance decrease in the comparison Model 5 vs. Model 6. Nevertheless, the anatomically-informed sliding window expedient suggests that injecting prior anatomical knowledge into the pipeline can improve detection performance. We believe this general principle is also applicable to other pathologies with sparse spatial extent.

The state of the art for automated brain aneurysm detection in TOF-MRA has been rapidly advancing in the last five years, especially due to the advent of deep learning algorithms. However, further multi-site validation is needed before these algorithms can be safely applied in routine clinical practice. Although Joo et al. (2020) and Ueda et al. (2019) did publish results obtained from multiple institutions, neither released its dataset publicly, which makes comparisons between algorithms unfeasible. Comparisons between methods are further hindered by the use of non-standardized evaluation metrics (e.g., FROC, lesion-wise sensitivity, or subject-wise specificity) or by the fact that not all related studies include both patients (subjects with aneurysms) and controls (subjects without aneurysms). By openly releasing our dataset, we aim to bridge the data availability gap and foster reproducibility in the medical imaging community. The combination of our in-house dataset and the ADAM dataset will allow researchers to assess the realistic robustness of proposed algorithms on heterogeneous data generated from different scanners, acquisition protocols and study populations. In addition, it could help increase detection performance, which is still too far from being clinically useful: even the team with the highest sensitivity on the ADAM test set (team xlim) only reaches 70% (i.e., 30% of aneurysms are still not detected), with 4 FPs per case.

In a separate analysis, we also computed the sensitivity of our model with respect to the PHASES risk of rupture, location, and size. No significant differences were found across any of the three stratifications, indicating that our model is robust to different aneurysm types. However, when stratifying aneurysm sizes into finer subgroups, we noticed that sensitivity for extremely tiny aneurysms (≤ 3 mm) was significantly lower, which confirms a known trend (Timmins et al., 2021).

Our work has several limitations. First, even combining our in-house dataset with the ADAM dataset, the number of subjects is still limited when compared to some related TOF-MRA (Joo et al., 2020; Ueda et al., 2019) or Computed Tomography Angiography (Park et al., 2019; Shi et al., 2020; Yang et al., 2020) studies. Second, we acknowledge that the number of patients for whom we compared the different annotation schemes (i.e., weak vs. voxel-wise) is limited (N = 38); it is possible that statistically significant performance differences would emerge with a larger sample size. Third, we have to further increase detection performance if we plan to deploy our model as a second reader for radiologists, especially to detect tiny aneurysms, which are more frequently overlooked (Keedy, 2006).

In future work, we aim to enlarge the TOF-MRA dataset and experiment with new variants of the 3D encoder–decoder UNET. For instance, we might consider a multi-scale approach with patches of larger (or smaller) scales. Alternatively, we are considering combining our anatomically-driven approach with the nnU-Net model (Isensee et al., 2021), which has proven to be effective not only for aneurysm detection (it was adopted by 2 of the top-performing teams in the ADAM challenge), but also for several other segmentation tasks. We believe this combination holds potential to boost detection performance. Also, the ablation study performed in Online Resources – Section F showed that pre-training on the ADAM dataset did not improve detection results. Therefore, future work should investigate a different transfer learning approach to better leverage knowledge acquired from the ADAM dataset. Last, we plan to conduct further error analyses to identify common patterns in both false positive and false negative cases.

In conclusion, our study presented an anatomically-informed 3D UNET that tackles brain aneurysm detection across different sites. The combination of time-saving weak labels and prior anatomical knowledge allowed us to build a robust deep learning model. We believe our approach and dataset (both openly available) can foster the development of clinically applicable automated systems for this task.

Information Sharing Statement – Data Availability

Our open-access dataset is available on OpenNeuro under the CC0 license at https://openneuro.org/datasets/ds003949. The ADAM dataset can be downloaded from the challenge website https://adam.isi.uu.nl/data/ after signing a confidentiality agreement. The code used for this study is available at https://github.com/connectomicslab/Aneurysm_Detection under the Apache-2.0 license, together with the configuration files to replicate all the experiments and the weights of the trained model for users who simply want to perform inference.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s12021-022-09597-0.

Acknowledgements We would like to thank the organizing team of the ADAM challenge for their great effort and availability.

Funding Open access funding provided by University of Lausanne.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Abousamra, S., Fassler, D., Hou, L., Zhang, Y., Gupta, R., Kurc, T., Escobar-Hoyos, L. F., Samaras, D., Knudson, B., Shroyer, K., Saltz, J., & Chen, C. (2020). Weakly-supervised deep stain decomposition for multiplex IHC images. Proceedings - International Symposium on Biomedical Imaging, 481–485. https://doi.org/10.1109/ISBI45749.2020.9098652
Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3292500.3330701
Arimura, H., Li, Q., Korogi, Y., Hirai, T., & Abe, H. (2004). Automated computerized scheme for detection of unruptured intracranial aneurysms in three-dimensional magnetic resonance angiography. Academic Radiology. https://doi.org/10.1016/j.acra.2004.07.011
Avants, B. B., Tustison, N., & Johnson, H. (2014). Advanced Normalization Tools (ANTS). Insight J, 2(365), 1–35. https://brianavants.wordpress.com/2012/04/13/updated-ants-compile-instructions-april-12-2012/. Accessed January 2021.
Baumgartner, M., Jäger, P. F., Isensee, F., & Maier-Hein, K. H. (2021). nnDetection: a self-configuring method for medical object detection. MICCAI. https://github.com/MIC-DKFZ/nnDetection. Accessed July 2021.
Bengio, Y., Goodfellow, I., & Courville, A. (2016). Deep learning. MIT Press, 29(7553).
Brown, R. D., & Broderick, J. P. (2014). Unruptured intracranial aneurysms: Epidemiology, natural history, management options, and familial screening. The Lancet Neurology, 13(4), 393–404. https://doi.org/10.1016/S1474-4422(14)70015-8
Chakraborty, D. P., & Berbaum, K. S. (2004). Observer studies involving detection and localization: Modeling, analysis, and validation. Medical Physics, 31(8), 2313–2330. https://doi.org/10.1118/1.1769352
Chen, X., Liu, Y., Tong, H., Dong, Y., Ma, D., Xu, L., & Yang, C. (2018). Meta-analysis of computed tomography angiography versus magnetic resonance angiography for intracranial aneurysm. Medicine (United States), 97(20). https://doi.org/10.1097/MD.0000000000010771
Dai, X., Huang, L., Qian, Y., Xia, S., Chong, W., Liu, J., Di Ieva, A., Hou, X., & Ou, C. (2020). Deep learning for automated cerebral aneurysm detection on computed tomography images. International Journal of Computer Assisted Radiology and Surgery, 15(4), 715–723. https://doi.org/10.1007/s11548-020-02121-2
Di Noto, T., Marie, G., Tourbier, S., Alemán-Gómez, Y., Saliou, G., Cuadra, M. B., Hagmann, P., & Richiardi, J. (2020). An anatomically-informed 3D CNN for brain aneurysm classification with weak labels. Machine Learning in Clinical Neuroimaging and Radiogenomics in Neuro-Oncology. http://arxiv.org/abs/2012.08645. Accessed January 2021.
Duan, H., Huang, Y., Liu, L., Dai, H., Chen, L., & Zhou, L. (2019). Automatic detection on intracranial aneurysm from digital subtraction angiography with cascade convolutional neural networks. BioMedical Engineering Online, 18(1). https://doi.org/10.1186/s12938-019-0726-2
Ezhov, M., Zakirov, A., & Gusarev, M. (2019). Coarse-to-fine volumetric segmentation of teeth in cone-beam CT. IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019).
Frösen, J., Tulamo, R., Paetau, A., Laaksamo, E., Korja, M., Laakso, A., Niemelä, M., & Hernesniemi, J. (2012). Saccular intracranial aneurysm: Pathology and mechanisms. Acta Neuropathologica, 123(6), 773–786. https://doi.org/10.1007/s00401-011-0939-3
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research, 9, 249–256.
Gorgolewski, K. J. (2008). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data. https://doi.org/10.1007/978-1-4020-6754-9_1720
Greving, J. P., Wermer, M. J. H., Brown, R. D., Morita, A., Juvela, S., Yonekura, M., Ishibashi, T., Torner, J. C., Nakayama, T., Rinkel, G. J. E., & Algra, A. (2014). Development of the PHASES score for prediction of risk of rupture of intracranial aneurysms: A pooled analysis of six prospective cohort studies. The Lancet Neurology, 13(1), 59–66. https://doi.org/10.1016/S1474-4422(13)70263-1
Hainc, N., Mannil, M., Anagnostakou, V., Alkadhi, H., Blüthgen, C., Wacht, L., Bink, A., Husain, S., Kulcsár, Z., & Winklhofer, S. (2020). Deep learning based detection of intracranial aneurysms on digital subtraction angiography: A feasibility study. Neuroradiology Journal, 33(4), 311–317. https://doi.org/10.1177/1971400920937647
Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning. PMLR, 2015.
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203–211. https://doi.org/10.1038/s41592-020-01008-z
Joo, B., Ahn, S. S., Yoon, P. H., Bae, S., Sohn, B., Lee, Y. E., Bae, J. H., Park, M. S., Choi, H. S., & Lee, S. K. (2020). A deep learning algorithm may automate intracranial aneurysm detection on MR angiography with high diagnostic performance. European Radiology, 30(11), 5785–5793. https://doi.org/10.1007/s00330-020-06966-8
Ke, R., Bugeau, A., Papadakis, N., Schuetz, P., & Schönlieb, C.-B. (2020). Learning to segment microscopy images with lazy labels. ArXiv. https://doi.org/10.1007/978-3-030-66415-2_27
Keedy, A. (2006). An overview of intracranial aneurysms. McGill Journal of Medicine: MJM, 9(2).
Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 1–15.
Liu, X., Feng, J., Wu, Z., Neo, Z., Zhu, C., Zhang, P., Wang, Y., Jiang, Y., Mitsouras, D., & Li, Y. (2021). Deep neural network-based detection and segmentation of intracranial aneurysms on 3D rotational DSA. Interventional Neuroradiology. https://doi.org/10.1177/15910199211000956
Markiewicz, C. J., Gorgolewski, K. J., Feingold, F., Blair, R., Halchenko, Y. O., Miller, E., Hardcastle, N., Wexler, J., Esteban, O., Goncalves, M., Jwa, A., & Poldrack, R. A. (2021). OpenNeuro: An open resource for sharing of neuroimaging data. BioRxiv. https://doi.org/10.1101/2021.06.28.450168
McHugh, M. L. (2012). The chi-square test of independence. Biochemia Medica, 23(2), 143–149. https://doi.org/10.11613/BM.2013.018
Mouches, P., & Forkert, N. D. (2014). A statistical atlas of cerebral arteries generated using multi-center MRA datasets from healthy subjects. Scientific Data, 6(1), 1–8. https://doi.org/10.1038/s41597-019-0034-5
Nakao, T., Hanaoka, S., Nomura, Y., Sato, I., Nemoto, M., Miki, S., Maeda, E., Yoshikawa, T., Hayashi, N., & Abe, O. (2018). Deep neural network-based computer-assisted detection of cerebral aneurysms in MR angiography. Journal of Magnetic Resonance Imaging, 47(4), 948–953. https://doi.org/10.1002/jmri.25842
Özgün, Ç., Abdulkadir, A., Lienkamp, S., Brox, T., & Ronneberg, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. ArXiv. https://doi.org/10.1007/978-3-319-46723-8
Park, A., Chute, C., Rajpurkar, P., Lou, J., Ball, R. L., Shpanskaya, K., Jabarkheel, R., Kim, L. H., McKenna, E., Tseng, J., Ni, J., Wishah, F., Wittber, F., Hong, D. S., Wilson, T. J., Halabi, S., Basu, S., Patel, B. N., Lungren, M. P., & Yeom, K. W. (2019). Deep learning-assisted diagnosis of cerebral aneurysms using the HeadXNet model. JAMA Network Open, 2(6), e195600. https://doi.org/10.1001/jamanetworkopen.2019.5600
Rao, B., Zohrabian, V., Cedeno, P., Saha, A., Pahade, J., & Davis, M. A. (2021). Utility of artificial intelligence tool as a prospective radiology peer reviewer – detection of unreported intracranial hemorrhage. Academic Radiology, 28(1), 85–93. https://doi.org/10.1016/j.acra.2020.01.035
Razzak, M. I., Naz, S., & Zaib, A. (2018). Deep learning for medical image processing: Overview, challenges and the future. Lecture Notes in Computational Vision and Biomechanics, 26, 323–350. https://doi.org/10.1007/978-3-319-65981-7_12
Shi, Z., Miao, C., Schoepf, U. J., Savage, R. H., Dargis, D. M., Pan, C., Chai, X., Li, X. L., Xia, S., Zhang, X., Gu, Y., Zhang, Y., Hu, B., Xu, W., Zhou, C., Luo, S., Wang, H., Mao, L., Liang, K., & Zhang, L. J. (2020). A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nature Communications. https://doi.org/10.1038/s41467-020-19527-w
Sichtermann, T., Faron, A., Sijben, R., Teichert, N., Freiherr, J., & Wiesmann, M. (2019). Deep learning–based detection of intracranial aneurysms in 3D TOF-MRA. American Journal of Neuroradiology. https://doi.org/10.3174/ajnr.A5911
Smith, S. M. (2002). Fast robust automated brain extraction. Human Brain Mapping, 17(3), 143–155. https://doi.org/10.1002/hbm.10062
Stember, J. N., Chang, P., Stember, D. M., Liu, M., Grinband, J., Filippi, C. G., Meyers, P., & Jambawalikar, S. (2019). Convolutional neural networks for the detection and measurement of cerebral aneurysms on magnetic resonance angiography. Journal of Digital Imaging, 32(5), 808–815. https://doi.org/10.1007/s10278-018-0162-z
Taghanaki, S. A., Zheng, Y., Kevin Zhou, S., Georgescu, B., Sharma, P., Xu, D., Comaniciu, D., & Hamarneh, G. (2019). Combo loss: Handling input and output imbalance in multi-organ segmentation. Computerized Medical Imaging and Graphics, 75, 24–33. https://doi.org/10.1016/j.compmedimag.2019.04.005
Timmins, K. M., van der Schaaf, I. C., Bennink, E., Ruigrok, Y. M., An, X., Baumgartner, M., Bourdon, P., De Feo, R., Noto, T., Di, Dubost, F., Fava-Sanches, A., Feng, X., Giroud, C., Group, I., Hu, M., Jaeger, P. F., Kaiponen, J., Klimont, M., Li, Y., & Kuijf, H. J. (2021). Comparing methods of detecting and segmenting unruptured intracranial aneurysms on TOF-MRAS: The ADAM challenge. NeuroImage, 238, 118216. https://doi.org/10.1016/j.neuroimage.2021.118216
Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., & Gee, J. C. (2010). N4ITK: Improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6), 1310–1320. https://doi.org/10.1109/TMI.2010.2046908
Ueda, D., Doishita, S., & Choppin, A. (2019). Deep learning for MR angiography: automated detection of cerebral aneurysms. Radiology. https://doi.org/10.1148/radiol.2018180901
Ward, J., Naik, K. S., Guthrie, F. J. A., Wilson, D., & Robinson, P. J. (1999). Hepatic lesion detection: comparison of MR imaging after the administration of superparamagnetic iron oxide with dual-phase CT by using alternative-free response receiver operating characteristic analysis. Radiology. https://doi.org/10.1148/radiology.210.2.r99fe05459
Yang, J., Xie, M., Hu, C., Alwalid, O., Xu, Y., Liu, J., Jin, T., Li, C., Tu, D., Liu, X., Zhang, C., Li, C., & Long, X. (2020). Deep learning for detecting cerebral aneurysms with CT angiography. Radiology, 298(1), 155–163. https://doi.org/10.1148/RADIOL.2020192154
Yang, X., Blezek, D. J., Cheng, L. T. E., Ryan, W. J., Kallmes, D. F., & Erickson, B. J. (2011). Computer-aided detection of intracranial aneurysms in MR angiography. Journal of Digital Imaging, 24(1), 86–95. https://doi.org/10.1007/s10278-009-9254-0
Yushkevich, P. A., Piven, J., Hazlett, H. C., Smith, R. G., Ho, S., Gee, J. C., & Gerig, G. (2006). User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage, 31(3), 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

Publisher: Springer Journals
Copyright © The Author(s) 2022
ISSN: 1539-2791
eISSN: 1559-0089
DOI: 10.1007/s12021-022-09597-0

Abstract

Brain aneurysm detection in Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) has undergone drastic improve- ments with the advent of Deep Learning (DL). However, performances of supervised DL models heavily rely on the quantity of labeled samples, which are extremely costly to obtain. Here, we present a DL model for aneurysm detection that over- comes the issue with “weak” labels: oversized annotations which are considerably faster to create. Our weak labels resulted to be four times faster to generate than their voxel-wise counterparts. In addition, our model leverages prior anatomical knowledge by focusing only on plausible locations for aneurysm occurrence. We first train and evaluate our model through cross-validation on an in-house TOF-MRA dataset comprising 284 subjects (170 females / 127 healthy controls / 157 patients with 198 aneurysms). On this dataset, our best model achieved a sensitivity of 83%, with False Positive (FP) rate of 0.8 per patient. To assess model generalizability, we then participated in a challenge for aneurysm detection with TOF-MRA data (93 patients, 20 controls, 125 aneurysms). On the public challenge, sensitivity was 68% (FP rate = 2.5), ranking 4th/18 on the open leaderboard. We found no significant difference in sensitivity between aneurysm risk-of-rupture groups (p = 0.75), locations (p = 0.72), or sizes (p = 0.15). Data, code and model weights are released under permissive licenses. We demonstrate that weak labels and anatomical knowledge can alleviate the necessity for prohibitively expensive voxel-wise annotations. Keywords Model robustness · Weak annotation · Domain knowledge · Deep learning · Magnetic resonance angiography · Aneurysm detection Introduction and lead to subarachnoid hemorrhages which have a mor- tality rate of 40% and usually cause severe disability for Time-Of-Flight Magnetic Resonance Angiography (TOF- patients (Frösen et al., 2012). 
MRA) is a non-invasive and non-contrast imaging technique Manually assessing a TOF-MRA is a costly process: radi- sensitive to the blood flow in brain vessels. TOF-MRA has ologists detect aneurysms by selectively scrolling through found widespread clinical application to identify Unrup- the TOF-MRA volumes in different planes—for instance, tured Intracranial Aneurysms (UIAs) which are small (typi- they check in the axial plane the most recurrent locations cal diameter ≅ 5 mm) abnormal focal dilatations in cerebral where aneurysms can occur. Then, the sagittal view permits arteries (Chen et al., 2018). If untreated, UIAs can rupture better views of areas like the basilar trunk; afterwards, the coronal view can be used for areas like the anterior cerebral arteries or the Sylvian segments. In addition, Maximum * Tommaso Di Noto Intensity Projection (MIP) images can be used to search tommaso.di-noto@chuv.ch for stenoses, or to confirm potential aneurysms that were Department of Radiology, Lausanne University Hospital spotted. and University of Lausanne, Lausanne, Switzerland Considering that the workload of radiologists is steadily Center for Psychiatric Neuroscience, Department increasing (Rao et al., 2021) and the detection of UIAs is of Psychiatry, Lausanne University Hospital and University a meticulous and non-trivial task (Nakao et al., 2018), the of Lausanne, Lausanne, Switzerland development of automated algorithms that aid clinicians in Center for Biomedical Imaging, CIBM, Lausanne, detecting aneurysms with high sensitivity is an active line of Switzerland Vol.:(0123456789) 1 3 Neuroinformatics research which holds the promise of improving care while hinders comparisons across models. Of all reviewed studies reducing radiologists’ assessment times. of Table 1, only (Baumgartner et al., 2021) evaluated their Before the popularization of Deep Learning (DL), models on the challenge dataset. 
(Arimura et  al., 2004) detected aneurysms by means of In this work, we develop a fully automated DL network image filtering, and later, (Yang et al., 2011) used candidate for UIA detection and propose to mitigate the data avail- points of interest in the brain arteries to locate aneurysms. ability bottleneck as follows: we explore the use of “weak” Then, starting from 2016, there was a shift towards DL labels (Abousamra et al., 2020; Ezhov et al., 2018; Ke et al., algorithms, which have now become the de facto standard 2020). These can be coarse or oversized annotations that for UIA detection. Table 1 illustrates several recent studies are less precise, but considerably faster to create for medi- that use DL for UIA detection. Despite their success, these cal experts. In addition, we release our annotated in-house DL approaches are still constrained by a major bottleneck dataset to the community. To the best of our knowledge, this common to several medical applications: the lack of large, will be the largest openly available TOF-MRA aneurysm labeled datasets. This is mainly due to two factors: first, the dataset to date. creation of voxel-wise labels for medical images is tedious Furthermore, we constrain the DL analysis only to the and time-consuming for radiologists (Razzak et al., 2018); areas of the brain where aneurysm occurrence is plausible. second, none of the TOF-MRA studies to date made their This anatomically-informed approach aims at simulating dataset publicly available (Joo et al., 2020; Nakao et al., the selective analysis that radiologists perform on the TOF- 2018; Sichtermann et al., 2019; Stember et al., 2019; Ueda MRA scans. Then, we assess multi-site robustness by evalu- et al., 2019). This hampers reproducibility and multi-site ating our algorithm on the external TOF-MRA challenge analyses that are paramount for building robust DL archi- dataset (Timmins et al., 2021). Last, since every aneurysm tectures. 
The lack of openly available data, such as the TOF-MRA challenge dataset (Timmins et al., 2021), also

can have a different prognosis, we investigate how the performances of our model change with respect to aneurysm risk-of-rupture groups (defined in “Aneurysm Annotation, Size, Location and Risk Groups for In-house Dataset” section), location and size.

Table 1 Summary of papers that use deep learning models to tackle automated brain aneurysm detection/segmentation
Paper | Modality | Task(s) | N. Sub | N. Aneurysms | DL Model | Model input | Voxel-wise labels | Use anatomical information | Multi-Site
(Ueda et al., 2019) | MRA | Detection | 1271 | 1477 | ResNet | 2D patches | Not specified | No | Yes
(Joo et al., 2020) | MRA | Detection | 744 | 761 | 3D ResNet | 3D patches | Yes | Yes | Yes
(Nakao et al., 2018) | MRA | Detection | 450 | 508 | CNN | 2D MIP patches | Yes | Yes | No
(Stember et al., 2019) | MRA | Detection | 302 | 336 | RCNN | 2D MIP patches | Yes | No | No
(Baumgartner et al., 2021) | MRA | Detection | 254 | N/A | nnDetection | 3D patches | Yes | No | No
(Sichtermann et al., 2019) | MRA | Detection (via segmentation) | 85 | 115 | DeepMedic | 3D patches | Yes | Yes | No
(Shi et al., 2020) | CTA | Detection + Segmentation | 1177 | 1099 | 3D UNET | 3D patches | Yes | Yes | Yes
(Yang et al., 2020) | CTA | Detection | 1068 | 1337 | ResNet | 3D patches | Not specified | No | Yes
(Park et al., 2019) | CTA | Segmentation + CAD assessment | 662 | 358 | HeadXNet | 3D patches | Yes | Yes | No
(Dai et al., 2020) | CTA | Detection | 311 | 352 | RCNN | 2D NP images | Not specified | No | Yes
(Liu et al., 2021) | DSA | Detection + Segmentation | 451 | 485 | 3D UNET | 3D DSA volumes | Yes | Yes | No
(Duan et al., 2019) | DSA | Detection | 281 | 261 | 2D CNN | 2D DSA images | Bounding Boxes | Yes | No
(Hainc et al., 2020) | DSA | Detection | 240 | 187 | 2D CNN | 2D DSA images | ROI circle | No | No
Use anatomical information: whether the method uses some sort of anatomical prior knowledge during training, patch sampling or inference (more details in Online Resources – Section A). MRA Magnetic Resonance Angiography, CTA Computed Tomography Angiography, DSA Digital Subtraction Angiography, N number, Sub subjects

Neuroinformatics

Materials and Methods

In-house Dataset

This study was approved by the regional ethics committee; written informed consent was waived. In this retrospective work, we included consecutive patients who underwent TOF-MRA between 2010 and 2015 and for whom the corresponding radiological reports were available. Patients with ruptured or treated aneurysms, or with other vascular pathologies, were excluded. Totally thrombosed aneurysms and infundibula (dilatations of the origin of an artery) were likewise excluded. In total, we retrieved 284 TOF-MRA subjects: 157 had one or more UIAs, while 127 had none. Table 2 reports the main demographic information for our study group. A 3D gradient recalled echo sequence with Partial Fourier technique was used for all subjects (acquisition parameters are reported in Online Resources – Table 1). 214 subjects of this study were also used in (Di Noto et al., 2020). That article dealt with patch-wise classification, whereas here we address patient-wise aneurysm detection. The dataset was anonymized and organized according to the Brain Imaging Data Structure (BIDS) standard (Gorgolewski, 2008). It is available on OpenNeuro (Markiewicz et al., 2021) at https://openneuro.org/datasets/ds003949 under the CC0 license.

Aneurysm Annotation, Size, Location and Risk Groups for In-house Dataset

Aneurysms were annotated by one radiologist with 2 years of experience in neuroimaging and double-checked by a senior neuroradiologist with over 15 years of experience, to exclude potential false positives or false negatives. Two annotation schemes were followed:

1. Weak labels: for most subjects (246/284), the radiologist used the Multi-image Analysis GUI (Mango) software (v. 4.0.1) to create the weak labels mentioned above. These correspond to spheres that enclose the whole aneurysm, regardless of its shape. In other words, our radiologist chose the size of each sphere manually, on a case-by-case basis, ensuring that the whole aneurysm was always entirely enclosed within the sphere. A visual example of one weak label is provided in Fig. 1.
2. Voxel-wise labels: for the remaining subjects (38/284), the radiologist used ITK-SNAP (v. 3.6.0) (Yushkevich et al., 2006) to draw voxel-wise labels slice by slice, scrolling in the axial plane. No specific selection criterion was used for these 38 subjects, which were consecutive to the 246 of the first group.

The overall number of aneurysms included in the study is 198 (178 saccular, 20 fusiform). Table 3 shows their locations and sizes grouped according to the PHASES score (Greving et al., 2014), a clinical score used to assess the 5-year risk of rupture of aneurysms. Although the PHASES size categories lead to a very skewed distribution (e.g., the category d ≤ 7 mm contains 91% of the aneurysms), we decided to keep this grouping since it is the one used in the clinic.

In addition, for post-hoc analysis and stratification purposes, we divided the aneurysms into two groups based on their risk of rupture: low-risk and medium-risk. Aneurysms in the low-risk group are monitored over time but do not require any intervention, whereas aneurysms in the medium-risk group can be considered for treatment. For each aneurysm, we computed a partial PHASES score that only considered size, location, and the patient's age, thus neglecting population, hypertension, and earlier aneurysmal hemorrhage, since this information was not available for all patients. Aneurysms with a partial PHASES score ≤ 4 were assigned to the low-risk group, and those with a partial score > 4 to the medium-risk group. Our senior neuroradiologist reviewed each aneurysm to assess whether its partial PHASES score was reasonable.
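To make the geometry of a spherical weak label concrete, the following is a minimal sketch, assuming NumPy, of how a voxel-wise aneurysm mask could be converted into a sphere that encloses it (similar in spirit to the "weakened" labels used later in the experiments). The helper name `weaken_label`, the centre-of-mass placement and the optional margin are illustrative assumptions, not the authors' procedure: our radiologist sized the spheres manually, and the exact conversion used for the weakened labels is described in Online Resources – Section C.

```python
import numpy as np


def weaken_label(mask, spacing=(1.0, 1.0, 1.0), margin_mm=0.0):
    """Return a binary sphere that encloses every foreground voxel of `mask`.

    Hypothetical helper (not the authors' code): the sphere is centred on the
    mask's centre of mass, and its radius is the largest centre-to-voxel
    distance in millimetres plus an optional safety margin.
    """
    coords = np.argwhere(mask > 0)  # (N, 3) indices of annotated voxels
    if coords.size == 0:
        return np.zeros(mask.shape, dtype=np.uint8)

    spacing = np.asarray(spacing, dtype=float)
    center = coords.mean(axis=0)  # centre of mass, in voxel units

    # Radius (mm) = distance from the centre to the farthest annotated voxel
    offsets_mm = (coords - center) * spacing
    radius_mm = np.sqrt((offsets_mm ** 2).sum(axis=1).max()) + margin_mm

    # Squared physical distance of every voxel from the centre, via broadcasting
    i, j, k = np.ogrid[: mask.shape[0], : mask.shape[1], : mask.shape[2]]
    dist_sq = (((i - center[0]) * spacing[0]) ** 2
               + ((j - center[1]) * spacing[1]) ** 2
               + ((k - center[2]) * spacing[2]) ** 2)

    # Small tolerance so the farthest voxel is not lost to floating-point rounding
    return (dist_sq <= radius_mm ** 2 + 1e-6).astype(np.uint8)


if __name__ == "__main__":
    lesion = np.zeros((20, 20, 20), dtype=np.uint8)
    lesion[8:12, 9:11, 7:13] = 1  # toy box-shaped "aneurysm"
    weak = weaken_label(lesion, spacing=(0.5, 0.5, 0.8))
    assert weak[lesion > 0].all()  # every annotated voxel lies inside the sphere
    print(int(lesion.sum()), int(weak.sum()))
```

Because the sphere must contain the farthest annotated voxel, elongated shapes (e.g., fusiform aneurysms) produce spheres much larger than the lesion itself, which is exactly why such labels are "oversized" compared to voxel-wise masks.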
Table 2 Demographics of the Patients Controls Test, p value Whole Sample study sample N 157 127 / 284 –7 Age (y) 56 ± 14 46 ± 17 t = –4, 3, p = 7.6 × 10 51 ± 16 Sex 53 M, 104F 61 M, 66F χ  = 5.9 p = 0.01 114 M, 170F # UIA 198 0 / 198 Patients = subjects with aneurysm(s). Controls = subjects without aneurysms. Age calculated in years and presented as mean ± standard deviation. Two-sided t-test to compare age between patients and controls. Chi-squared test to compare sex counts between patients and controls N number of samples, M males, F females, UIA Unruptured Intracranial Aneurysms 1 3 Neuroinformatics Fig. 1 TOF-MRA orthogonal views of a 62-year-old female patient. Red areas correspond to our spherical weak labels. Top- left: axial plane; top-right: 3D posterior reconstruction of the cerebral arteries; bottom-left: sagittal plane; bottom-right: coronal plane Fusiform aneurysms were excluded from the risk analy- Data Processing sis since the PHASES score was built for saccular aneu- rysms. Similarly, extracranial carotid artery aneurysms were Several preprocessing steps were carried out for each excluded since they do not bleed in the subarachnoid space. subject. First, we performed skull-stripping with the FSL This resulted in 141 low-risk and 23 medium-risk aneu- Brain Extraction Tool (v. 6.0.1) (Smith, 2002). Second, we rysms. A table summarizing aneurysm shape, size, location, applied N4 bias field correction with SimpleITK (v. 1.2.0) associated PHASES score and risk groups is provided as (Tustison et al., 2010). Third, we resampled all volumes Supplementary Material. to a median voxel spacing, again with SimpleITK. This effectively normalizes nonuniform voxel sizes (Isensee et  al., 2021). Last, a probabilistic vessel atlas built from multi-center MRA datasets (Mouches & Forkert, 2014) was Table 3 Locations and sizes of aneurysms according to the PHASES co-registered to each patient’s TOF-MRA using ANTS (v. 
score for the in-house dataset 2.3.1) (Avants et al., 2014) (details in Online Resources Count % – Section B). The atlas was used both during training and Location ICA 59 29.8 (59/198) inference (see “Use of Anatomical Information” section). MCA 57 28.8 (57/198) ACA/Pcom/Posterior 82 41.4 (82/198) Network, Cross‑Validation, Metrics and Statistics Size    180 91.0 (180/198) d ≤ 7 mm 7 − 9, 9 mm    7 3.5 (7/198) Network The deep learning model used in this study is a 10 − 19, 9 mm   10 5.0 (10/198) custom 3D UNET, inspired by the original work (Özgün d ≥ 20 mm   1 0.5 (1/198) et  al., 2016). We used upsample layers in the decoding branch rather than transpose convolutions since these led to ICA Internal Carotid Artery, MCA Middle Cerebral Artery, ACA Ante- faster model convergence. Figure 2 illustrates the structure of rior Cerebral Arteries, Pcom Posterior communicating artery, Poste- our network. We used 3D TOF-MRA patches as input to our rior posterior circulation, d maximum diameter 1 3 Neuroinformatics TOF-MRA 3 3 3 3 3 26x64 9x64 18x64 9x64 9x32 9x64 3 3 3 3 3 26x16 61x32 26x32 52x32 26x32 26x32 61x8 3 3 3 61x16 74x16 122x16 61x16 3x3x3 3x3x3 61x16 BatchNorm conv max_pool 3x3x3 upsample concatenate 3D conv 74x8 Fig. 2 Proposed variant of the 3D UNET. The input corresponds to sponds to the probability of either belonging to foreground (i.e., aneu- a 64x64x64 voxels TOF-MRA patch. The output is a probabilistic rysm) or background. Conv convolutional, Max_pool max pooling, patch with the same size of the input, but where each voxel corre- BatchNorm batch normalization network. We set the side of the input patches to 64x64x64 is 855,111. Training and evaluation were performed with voxels to include even the largest aneurysms. All patches Tensorflow 2.4.0 and a GeForce RTX 2080TI GPU with were Z-score normalized, as is common practice (Bengio 11 GB of SDRAM. et al., 2016). A kernel size of 3x3x3 was used in all con- volutional layers, with padding and stride = 1. 
We applied Cross‑validation To evaluate detection performances, the ReLU activation function for all layers, except for the we conducted a fivefold cross-validation on the 246 sub - last layer which is followed by a sigmoid function. To fit the jects with weak labels. At each cross-validation split, 80% model, the Adam optimization algorithm (Kingma & Ba, (≈197/246) of the subjects are used for training the net- 2015) was applied with adaptive learning rate (initial learn- work, while the remaining 20% (≈49/246) of the subjects ing rate = 0.0001). We trained the model for 100 epochs, are used to compute patient-wise results (i.e. for inference). and we adopted the Combo loss function (Taghanaki et al., This division occurs 5 times (as the number of folds) and 2019) with α = β = 0.5. This function combines Dice and every time a different 80%-20% split is created, meaning Cross-entropy, and has proven to be effective for imbalanced that all 246 patients are ultimately used for evaluation. At segmentation tasks. We used Xavier initialization (Glorot & each cross-validation split, the 38 patients with voxel-wise Bengio, 2010) for all layers. Biases were initialized to 0 and labels were always added to the training set to increase the a batch size of 8 was chosen. Batch normalization (Ioffe & effect size of label quality in further analyses (see experi- Szegedy, 2015) was used to prevent overfitting. The num- ments in “Use of Weak Labels”). To avoid over-optimistic ber of convolutional filters, the batch size, the value of α results, we ensured that patients with multiple sessions were (and therefore β = 1 − α) and the learning rate were chosen not split between training and test set. 
In order to make using the Optuna algorithm (Akiba et al., 2019) on an inter- results comparable across experiments, we always used the nal validation set (20% of training cases of external cross- same cross-validation split and we released this split for validation fold 1, see below for cross-validation details). reproducibility on https:// git hub. com/ conne ct omi cslab/ The total number of trainable parameters in our network Aneur ysm_ Detec tion. 1 3 Neuroinformatics In all experiments on the in-house dataset, we always pre- the changes in detection performances with respect to trained our network on the whole ADAM training dataset aneurysm risk-of-rupture groups, location and size. (Timmins et al., 2021) and then fine-tuned it on the in-house training data. To validate the effectiveness of pre-training on ADAM, we performed ablation experiments of domain Use of Weak Labels adaptation across the two datasets (in-house and ADAM). As these experiments are beyond the main focus of the man- The goal of this experiment was to answer the following uscript, we added them in the Online Resources – Section F. questions: 1) how much faster is the creation of weak labels When performing pre-training on the ADAM dataset, we with respect to the creation of voxel-wise labels? 2) what is applied both anatomically-informed expedients described the impact of using weak labels in terms of detection perfor- below in “Use of Anatomical Information” section. mances when comparing to voxel-wise labels? To answer the first question, we selected a subset of 14 Metrics and Statistics In line with the ADAM challenge patients (mean aneurysm size (s.d.) = 5.2 (1.0) mm), and (presented in “ Participation to the ADAM Challenge” sec- we assessed the time difference between the two annotation tion), we used sensitivity and false positive (FP) rate as schemes (i.e. all 14 patients were annotated first with weak detection metrics. 
A detection was considered correct if the labels, and then with voxel-wise labels). These 14 patients center-of-mass of the predicted aneurysm was located within were chosen randomly among the 284 TOF-MRA subjects, the maximum aneurysm size of the ground truth mask. In but we ensured that the mean aneurysm size was representa- addition, we computed the Free-response Receiver Operat- tive of the whole cohort. ing Characteristic (FROC) curve (Chakraborty & Berbaum, To answer the second question, we used the 38 subjects 2004). To compare different model configurations, we used with voxel-wise labels and for these patients we artificially a two-sided Wilcoxon signed-rank test of the areas under the generated corresponding weak spherical labels (‘weakened’ FROC curves across test subjects, as similarly performed labels, details in Online Resources – Section C). Then, to in (Ward et al., 1999). To compare the performances of a evaluate the influence of annotation quality (weakened vs. configuration with respect to aneurysm rupture risk, location voxel-wise) in terms of detection performances, we con- and size we performed several Chi-squared tests (McHugh, ducted 3 experiments in which we used an increasing num- 2012). The statistical tests were performed using SciPy ber of patients with voxel-wise labels: (i) all 38 patients (v.1.4.1), setting a significance threshold α = 0.05. with weakened labels (Model 1, Table  4), (ii) 19 patients with weakened labels and 19 with voxel-wise labels (Model Experiments 2, Table 4), and (iii) all 38 patients with voxel-wise labels (Model 3, Table 4). Results related to the use of weak labels In this section, we will present the four experiments are presented in “Weak Labels Allow Fourfold Annotation that we conducted: in “Use of Weak Labels” section, we Speedup Without Degrading Performances” section. 
investigate the use of weak labels in terms of difference in annotation time and in detection performances, when com- Use of Anatomical Information paring to voxel-wise labels; in “Use of Anatomical Infor- mation” section, we present our anatomically-informed Because the task of aneurysm detection is extremely spa- approach for tackling UIA detection; in “Participation to tially constrained, we exploit the prior information that the ADAM Challenge” section, we describe the participa- aneurysms a) must occur in vessels, and b) tend to occur in tion to the ADAM challenge to investigate the generali- specific locations of the vasculature. To include this ana- zation of our model; in “Performances With Respect to tomical knowledge, one of our radiologists pinpointed in Risk-of-rupture, Location and Size” section, we analyze the vessel atlas (described in “Aneurysm Annotation, Size, Table 4 Average detection results on the in-house dataset across test folds when changing the ratio of voxel-wise/weakened labels. Sensitivity values are reported as mean and 95% Wilson confidence interval inside parentheses Model Configu- Anatomically-informed Anatomically-informed Labels of 38 added subs Avg. Sensitivity (CI) Avg. 
FP rate ration patch selection sliding window Model 1 Yes Yes 38 weakened 95/127 = 75% (65%, 80%) 1.3 Model 2 Yes Yes 19 weakened, 19 voxel-wise 99/127 = 78% (68%, 82%) 0.9 Model 3 Yes Yes 38 voxel-wise 101/127 = 80% (72%, 85%) 1.2 Bold values represent the best performances Avg average, FP false positive, CI confidence interval, voxel-wise labels drawn slice by slice on the axial plane,  weakened voxel-wise labels that are artificially converted to weak spherical labels, subs subjects 1 3 Neuroinformatics Location and Risk Groups for In-house Dataset” section) the Validation To validate the effectiveness of our two ana- location of 20 landmark points where aneurysm occurrence tomically-informed expedients (patch sampling and slid- is most frequent (list in Online Resources – Table 2). These ing window), we first evaluated an anatomically-agnostic points were chosen according to the literature (Brown & baseline where none of the two expedients is used and Broderick, 2014) and were co-registered to the TOF-MRA the 38 added subjects have weakened labels (Model 4, space of each subject, as illustrated in Fig. 3. Table 5). Second, we evaluated the same anatomically- agnostic baseline (none of the two expedients used) but Training We apply an anatomically-informed selection of with the 38 subjects having voxel-wise labels (Model training patches to sample both negative (without aneu- 5, Table 5). Third, we tested one model where only the rysms) and positive (with aneurysms) samples. Specifically, anatomically-informed patch sampling is carried out 8 positive patches per aneurysm were randomly extracted (Model 6, Table  5). Last, we computed performances in a non-centered fashion. Then, we extracted 50 negative when only the anatomically-informed sliding window is patches per TOF-MRA volume. Out of these, 20 were cen- performed (Model 7, Table 5). 
Results related to the use tered in correspondence with the landmark points, 20 were of anatomical information are shown in “Anatomically- patches containing vessels (details in Online Resources – informed Sliding Window Increases Detection Perfor- Section D), and 10 were extracted randomly. Overall, this mances” section. sampling strategy allows us to extract most of the negative patches (i.e., all but the random ones) which are comparable Participation to the ADAM Challenge to the positive ones in terms of average intensity. To mitigate class imbalance, we applied data augmentations on positive To evaluate model performances in data coming from patches: namely, rotations (90°, 180°, 270°), flipping (hori- a different institution, we participated to the Aneurysm zontal, vertical), contrast adjustment, gamma correction, and Detection And segMentation (ADAM) challenge (http:// addition of gaussian noise. adam. isi. uu. nl/) (Timmins et  al., 2021). The ADAM training dataset is composed of 113 TOF-MRA (93 Inference The patient-wise evaluation was performed fol- patients with UIAs, 20 controls). The total number of lowing the sliding window approach (details in Online UIAs is 125 and the voxel-wise annotations were drawn Resources – Section E). We exploited again the prior ana- in the axial plane by two radiologists. Instead, the unre- tomical information described above by retaining only the leased test dataset is made of 141 cases (117 patients, patches which are both within a minimum distance from the 26 controls) and it is solely used by the organizers to landmark points and fulfill specific intensity criteria (details compute patient-wise results. To improve detection per- in Online Resources – Section D). The rationale behind this formances on the ADAM test set, we pre-trained our choice was to only focus on patches located in the main network on the whole in-house dataset and then fine- cerebral arteries, as shown in Fig. 4. 
Two post-processing tuned it on the ADAM training dataset. Results related steps were adopted: first, we kept a maximum of 5 candidate to our model submission to the ADAM challenge are aneurysms per patient (only the 5 most probable). Second, presented in “The Proposed Model Ranked At the Top we applied test-time augmentation to increase sensitivity. of the ADAM Challenge” section. Fig. 3 left: 20 landmark points Probabilistic vessel atlas TOF-MRA volume (in red) located in specific positions of the cerebral arteries (white segmentation) in MNI space. right: same landmark points co-registered to the TOF-MRA space of a 21-year- old, female subject without aneurysms MNI spaceSubject space 1 3 Neuroinformatics Fig. 4 TOF-MRA orthogonal views of a 62-year-old female subject after brain extrac- tion: blue patches are the ones which are retained in the anatomically-informed sliding- window approach. (top-right): 3D schematic representation of sliding-window approach; out of all the patches in the volume (white patches), we only retain those located in the proximity of the main brain arteries (blue ones) Performances with Respect to Riskof ‑ ‑rupture, Location and Size addition, we also explored how performances would vary with respect to aneurysm location and size. Although the Each aneurysm has a different prognosis and, depend- latter analysis is less relevant from a clinical perspective, it is still interesting from a methodological point of view ing on its risk-of-rupture group (defined in “Aneurysm Annotation, Size, Location and Risk Groups for In-house and it is also frequent in the literature. Results related to the detection performances with respect to aneurysm Dataset” section), it will be either monitored over time (low risk) or considered for treatment (medium risk). 
risk-of-rupture groups, location and size are described in “Detection Performances Across Rupture Risk, Location, Therefore, we investigated how detection performances would vary with respect to the risk-of-rupture groups. In and Size” section. Table 5 Average detection results on the in-house dataset across test folds when applying none, or one of the two anatomically-informed expedi- ents. Sensitivity values are reported as mean and 95% Wilson confidence interval inside parentheses Model Anatomically-informed Anatomically-informed Labels of 38 added subs Avg. Sensitivity (CI) Avg. FP rate Configuration patch selection sliding window Model 4 No No 38 weakened 83/127 = 65% (55%, 71%) 4.6 Model 5 No No 38 voxel-wise 95/127 = 74% (63%, 78%) 4.5 Model 6 Yes No 38 voxel-wise 61/127 = 48% (38%, 55%) 4.8 Model 7 No Yes 38 voxel-wise 106/127 = 83% (75%, 88%) 0.8 Bold values represent the best performances Avg average, FP false positive, CI confidence interval, voxel-wise labels drawn slice by slice on the axial plane, weakened voxel-wise labels that are artificially converted to weak spherical labels, subs subjects 1 3 Neuroinformatics an effective expedient to increase detection results. In fact, Results sensitivity is increased and the average FP rate is drastically reduced. In addition, we compared Model 5 and Model 6 Weak Labels Allow Fourfold Annotation Speedup and we saw that Model 5 significantly outperforms Model Without Degrading Performances −6 p = 8 × 10 6 (W = 202.0, ). This finding shows that the anatomically-informed patch sampling is detrimental for When measuring the time needed to create weak vs. voxel- detection performances when the sliding window is anatom- wise annotations on the 14 subjects described in “Use of ically-agnostic. 
Last, when comparing Model 3 and Model Weak Labels” section, we noticed a significant difference 7 we found no significant difference (W = 81.5, p = 0.24 ): (two-sided Wilcoxon signed-rank test – annotation tim- this result indicates that the anatomically-informed patch ings, W = 0, p = 0.001): creating weak annotations (aver- sampling is not detrimental when we are also applying the age 23 s ± 6 per aneurysm) resulted to be approximately 4 anatomically-informed sliding window. times faster than creating voxel-wise annotations (average To provide a visual interpretation of our network predic- 93 s ± 25). A more detailed stratification of the timings with tions, we show in Fig. 5 one correctly identified aneurysm respect to location and size is provided in Supplementary (true positive), one small, missed aneurysm (false negative) Figs. 1 and 2. and one false positive prediction. Also, in Fig. 6 we report Subsequently, to investigate the effect that voxel-wise the FROC curves corresponding to Model 5, Model 6, and labels can have for detection performances with respect to Model 7. This figure reflects the statistical tests: Model 7 weak labels, we conducted several experiments where an (green curve) outperforms the anatomically-agnostic Model increasing ratio of voxel-wise/weakened labels was used 5 (red curve) at all operating points. Similarly, Model 5 (red for the 38 patients described in “Use of Weak Labels” sec- curve) significantly outperforms Model 6 (blue curve), tion. Table 4 shows detection performances when the ratio confirming the effectiveness of the anatomically-informed is gradually increased. sliding window and the ineffectiveness of the anatomically- The configuration with all voxel-wise labels (Model 3) informed patch sampling. had higher sensitivity with respect to the other two con- figurations with weakened labels (Model 1 and Model 2). 
The Proposed Model Ranked At the Top of the ADAM However, this difference was not significant (two-sided Challenge Wilcoxon signed-rank test on the areas under the FROC curves, W = 14.0, p = 0.054 when comparing to Model 1 and Table 6 illustrates detection results on the ADAM test data- W = 685.5, p = 0.977 when comparing to Model 2). set. Our algorithm ranked in 4th/18 position for detection and in 4th/15 position for segmentation (with highest volu- Anatomically‑informed Sliding Window Increases metric similarity). Interested readers can check the methods Detection Performances proposed by other teams on the challenge website (https:// adam. isi. uu. nl/) and in the paper (Timmins et al., 2021). In Table  5, we report detection results when adopting zero, one, or both anatomically-informed expedients pre- Detection Performances Across Rupture Risk, sented in “Use of Anatomical Information” section. In the Location, and Size anatomically-agnostic baseline with the 38 subjects having weakened labels (Model 4), the negative patch sampling Supplementary Fig. 3 illustrates performances achieved by is random and all non-zero patches of the TOF-MRA vol- one of our top-performing models (Model 3, Table 4) strati- umes are retained in the sliding window approach, thus fied according to the two risk groups presented in “ Aneu- disregarding any anatomical constrain. Similarly, row 2 rysm Annotation, Size, Location and Risk Groups for In- (Model 5) shows detection results when using neither the house Dataset” section. For the low-risk group, our model anatomically-informed patch sampling nor the anatomically- reaches a mean sensitivity of 80%, while for the medium-risk informed sliding window, but this time with the 38 subjects group it reaches a mean sensitivity of 73%. The difference having voxel-wise labels. Row 3 (Model 6) illustrates detec- was not significant (  = 0.09 , DoF = 1, p = 0.75). In Sup- tion performances when only the anatomically-informed plementary Figs. 
4 and 5, we also report the model sensi- patch sampling is applied, but the sliding window is still tivity stratified according to aneurysm location and size of anatomically-agnostic. Instead, row 4 (Model 7) shows the the PHASES score, respectively. No significant difference inverse scenario (i.e. random negative patch sampling, but was found across different locations (  = 0.64, DoF = 2, anatomically-informed sliding window). Model 7 statisti- −6 2 p = 0.72) or sizes (  = 0.92, DoF = 2, p = 0.15, excluding cally outperformed Model 5 (W = 74.5, p = 2 × 10 ), t hus n = 1 huge aneurysm with 20 mm). Regarding aneurysm indicating that the anatomically-informed sliding window is s > 1 3 Neuroinformatics Fig. 5 Qualitative analysis of predictions and errors. The heatmap aneurysm) in the internal carotid artery. The ground truth label mask generated by the network ranges from 0 (low probability, yellow is shown in blue. c False positive prediction in the internal carotid color) to 1 (high probability, red color) (a) True positive prediction artery in the anterior communicating artery. b  False negative (i.e., missed size, we conducted a further stratification of performances Discussion since most of the aneurysms lied in the group (< 7 mm). Thus, we divided this group into subgroups, namely ≤ 3, This work shows that competitive results can be obtained 3 < s ≤ 5, and 5 < s < 7. Detection results with this more in automated aneurysm detection for TOF-MRA data granular stratification are shown in Supplementary Fig. 6. even with rapid data annotation. To this end, we pro- The model sensitivity was significantly lower for the tiny posed a fully-automated, deep learning algorithm that is aneurysms (≤ 3) with respect to the other two subgroups (  trained using weak labels and exploits prior anatomical −6 = 27.57, DoF = 2, p = ). knowledge. Fig. 
6 Mean Free-response Receiver Operating Charac- Mean FROC curves teristic (FROC) curves across the five test folds of the cross-validation. Shaded areas represent the 95% Wilson confidence interval. The three models correspond to Model 5, Model 6, and Model 7. Anatomically-agnostic model: none of the two anatomically-informed expe- dients are used. Anat: Anatomically-Informed Model 5: anatomically-agnostic + 38 voxelwise Model 6: only anat. patch sampling + 38 voxelwise Model 7: only anat. sliding wind. + 38 voxelwise Nb. allowed FP per patient 1 3 Sensitivity Neuroinformatics Table 6 Detection results on the ADAM dataset. Our team (in bold) no anatomically-informed sliding window). We think this ranked in 4th position in the open leaderboard out of 18 participating difference between training and test domain is what causes groups the decrease in performances in the comparison Model 5 Detection vs. Model 6. Nevertheless, the anatomically-informed sliding window Ranking Team Sens Avg. FP rate expedient suggests that injecting prior anatomical knowl- 1 abc 68% 0.40 edge in the pipeline can improve detection performances. 2 xlim 70% 4.03 We believe this general principle is also applicable to other 3 mibaumgartner 67% 0.13 pathologies with sparse spatial extent. 4 unil-chuv3 68% 2.50 The state-of-the-art for automated brain aneurysm detec- 5 joker 63% 0.16 tion in TOF-MRA has been rapidly advancing in the last five years, especially due to the advent of deep learning 18 ibbm 2% 0.01 algorithms. However, further multi-site validation is needed before safely applying these algorithms during routine clini- Sens sensitivity, FP false positive cal practice. Although (Joo et al., 2020; Ueda et al., 2019) did publish results obtained from multiple institutions, Despite being less accurate, weak labels are drastically none of them released their dataset publicly which makes faster to create for medical experts reducing fourfold the comparisons between algorithms unfeasible. 
The compari- annotation time. Although the configuration with all voxel- sons between methods are further hindered by the use of wise labels (Model 3, Table 4) had higher sensitivity, we non-standardized evaluation metrics (e.g. FROC/lesion- found no statistical difference when comparing with the wise sensitivity/subject-wise specificity) or by the fact that configurations with some (Model 2) or all weakened labels not all related studies include both patients (subjects with (Model 1). This finding indicates that weak labels are suffi- aneurysms) and controls (subjects without aneurysms). By cient to obtain satisfactory detection results on our in- house openly releasing our dataset, we aim to bridge the data avail- dataset. If reasoning in terms of larger datasets (e.g., thou- ability gap and foster reproducibility in the medical imaging sands of patients), the weak annotation proposed in this work community. The combination of our in-house dataset and is a scalable solution which can significantly alleviate the the ADAM dataset will allow researchers to assess the real- annotation bottleneck in medical ML applications. istic robustness of proposed algorithms on heterogeneous In addition to the use of weak labels, our model leverages data generated from different scanners, acquisition protocols the underlying anatomy of the brain vasculature (i.e., we and study population. In addition, it could help increasing “anatomically-informed” our network) in order to simulate detection performances which are still too far from being the radiologists’ exploration of the TOF-MRA scans. First, clinically useful, considering that even the team with highest most of the negative patches (i.e. patches without aneu- sensitivity on the ADAM test set (team xlim) only reaches rysms) extracted during training either contained a vessel or a value of 70% (i.e., 30% of aneurysms still not detected), were located in correspondence with the aneurysm landmark with 4 FPs per case. points. 
Second, we limited the sliding window approach only In a separate analysis, we also computed the sensitiv- to regions of the brain that are plausible for aneurysm occur- ity of our model with respect to the PHASES score risk of rence. These constraints reflect the radiologists’ behavior in rupture, location, and size. No significant differences were the sense that only regions containing vessels, or at higher found across the three groups indicating that our model is risk for aneurysms are scanned, while the rest of the brain robust to different aneurysm types. However, when strati - is neglected. The experiments in “Anatomically-informed fying the aneurysm sizes into finer subgroups, we noticed Sliding Window Increases Detection Performances” section that sensitivity for extremely tiny aneurysms (≤ 3 mm) was showed that the anatomically-informed sliding window is an significantly lower, which confirms a known trend (Timmins eec ff tive expedient since it increases sensitivity, while reduc - et al., 2021). ing the average FP rate. Instead, the anatomically-informed Our work has several limitations. First, even combining patch sampling proved to be negligible when combined our in-house dataset with the ADAM dataset, the number with the anatomically-informed sliding-window (Model of subjects is still limited when compared to some related 3 vs. Model 7), or even detrimental when the sliding win- TOF-MRA (Joo et al., 2020; Ueda et al., 2019) or Computed dow was anatomically-agnostic (Model 5 vs. Model 6). We Tomography Angiography (Park et  al., 2019; Shi et  al., hypothesize that applying only the anatomically-informed 2020; Yang et al., 2020) studies. Second, we acknowledge patch sampling leads to a domain shift issue: specifically, that the number of patients for whom we compared the dif- the model is trained using intensity-matched patches, but ferent annotations schemes (i.e., weak vs. 
voxel-wise) is then is tested with any patch in the brain (because there is limited (N = 38); it is possible that statistically significant 1 3 Neuroinformatics as you give appropriate credit to the original author(s) and the source, performance die ff rences could be found with a larger sample provide a link to the Creative Commons licence, and indicate if changes size. Third, we have to further increase detection perfor- were made. The images or other third party material in this article are mances if we plan to deploy our model as a second reader included in the article's Creative Commons licence, unless indicated for radiologists, especially to detect tiny aneurysms which otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not are more frequently overlooked (Keedy, 2006). permitted by statutory regulation or exceeds the permitted use, you will In future works, we aim at enlarging the TOF-MRA dataset need to obtain permission directly from the copyright holder. To view a and experiment new variants of the 3D encoding–decoding copy of this licence, visit http://cr eativ ecommons. or g/licen ses/ b y/4.0/ . UNET. For instance, we might consider a multi-scale approach with patches of larger (or smaller) scales. Alternatively, we are considering combining our anatomically-driven approach with References the novel nnUnet model (Isensee et al., 2021) which has proven to be effective not only for aneurysm detection (it was adopted Abousamra, S., Fassler, D., Hou, L., Zhang, Y., Gupta, R., Kurc, T., by 2 of the top-performing teams in the ADAM challenge), Escobar-Hoyos, L. F., Samaras, D., Knudson, B., Shroyer, K., Saltz, J., & Chen, C. (2020). Weakly-supervised deep stain decom- but also for several other segmentation tasks. We believe this position for multiplex IHC images. Proceedings - International combination holds potential to boost detection performances. 
Also, the ablation study in Online Resources – Section F showed that pre-training on the ADAM dataset did not improve detection results. Future work should therefore investigate a different transfer learning approach to better leverage the knowledge contained in the ADAM dataset. Last, we plan to conduct further error analyses to identify common patterns among both false positive and false negative cases.

In conclusion, our study presented an anatomically-informed 3D UNET that tackles brain aneurysm detection across different sites. The combination of time-saving weak labels and anatomical prior knowledge allowed us to build a robust deep learning model. We believe our approach and dataset (both openly available) can foster the development of clinically applicable automated systems for aneurysm detection.
Information Sharing Statement - Data Availability

Our open-access dataset is available on OpenNeuro under the CC0 license at https://openneuro.org/datasets/ds003949. The ADAM dataset can be downloaded from the challenge website https://adam.isi.uu.nl/data/ after signing a confidentiality agreement. The code used for this study is available at https://github.com/connectomicslab/Aneurysm_Detection under the Apache-2.0 license. The repository also includes the configuration files to replicate all the experiments and the weights of the trained model, for users who simply want to perform inference.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s12021-022-09597-0.

Acknowledgements We would like to thank the organizing team of the ADAM challenge for their great effort and availability.

Funding Open access funding provided by University of Lausanne.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Abousamra, S., Fassler, D., Hou, L., Zhang, Y., Gupta, R., Kurc, T., Escobar-Hoyos, L. F., Samaras, D., Knudson, B., Shroyer, K., Saltz, J., & Chen, C. (2020). Weakly-supervised deep stain decomposition for multiplex IHC images. Proceedings - International Symposium on Biomedical Imaging, 481–485. https://doi.org/10.1109/ISBI45749.2020.9098652
Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3292500.3330701
Arimura, H., Li, Q., Korogi, Y., Hirai, T., & Abe, H. (2004). Automated computerized scheme for detection of unruptured intracranial aneurysms in three-dimensional magnetic resonance angiography. Academic Radiology. https://doi.org/10.1016/j.acra.2004.07.011
Avants, B. B., Tustison, N., & Johnson, H. (2014). Advanced Normalization Tools (ANTS). Insight J, 2(365), 1–35. https://brianavants.wordpress.com/2012/04/13/updated-ants-compile-instructions-april-12-2012/. Accessed January 2021.
Baumgartner, M., Jäger, P. F., Isensee, F., & Maier-Hein, K. H. (2021). nnDetection: a self-configuring method for medical object detection. MICCAI. https://github.com/MIC-DKFZ/nnDetection. Accessed July 2021.
Bengio, Y., Goodfellow, I., & Courville, A. (2016). Deep learning. MIT Press, 29(7553).
Brown, R. D., & Broderick, J. P. (2014). Unruptured intracranial aneurysms: Epidemiology, natural history, management options, and familial screening. The Lancet Neurology, 13(4), 393–404. https://doi.org/10.1016/S1474-4422(14)70015-8
Chakraborty, D. P., & Berbaum, K. S. (2004). Observer studies involving detection and localization: Modeling, analysis, and validation. Medical Physics, 31(8), 2313–2330. https://doi.org/10.1118/1.1769352
Chen, X., Liu, Y., Tong, H., Dong, Y., Ma, D., Xu, L., & Yang, C. (2018). Meta-analysis of computed tomography angiography versus magnetic resonance angiography for intracranial aneurysm. Medicine (United States), 97(20). https://doi.org/10.1097/MD.0000000000010771
Dai, X., Huang, L., Qian, Y., Xia, S., Chong, W., Liu, J., Di Ieva, A., Hou, X., & Ou, C. (2020). Deep learning for automated cerebral aneurysm detection on computed tomography images. International Journal of Computer Assisted Radiology and Surgery, 15(4), 715–723. https://doi.org/10.1007/s11548-020-02121-2
Di Noto, T., Marie, G., Tourbier, S., Alemán-Gómez, Y., Saliou, G., Cuadra, M. B., Hagmann, P., & Richiardi, J. (2020). An anatomically-informed 3D CNN for brain aneurysm classification with weak labels. Machine Learning in Clinical Neuroimaging and Radiogenomics in Neuro-Oncology. http://arxiv.org/abs/2012.08645. Accessed January 2021.
Duan, H., Huang, Y., Liu, L., Dai, H., Chen, L., & Zhou, L. (2019). Automatic detection on intracranial aneurysm from digital subtraction angiography with cascade convolutional neural networks. BioMedical Engineering Online, 18(1). https://doi.org/10.1186/s12938-019-0726-2
Ezhov, M., Zakirov, A., & Gusarev, M. (2019). Coarse-to-fine volumetric segmentation of teeth in cone-beam CT. IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019).
Frösen, J., Tulamo, R., Paetau, A., Laaksamo, E., Korja, M., Laakso, A., Niemelä, M., & Hernesniemi, J. (2012). Saccular intracranial aneurysm: Pathology and mechanisms. Acta Neuropathologica, 123(6), 773–786. https://doi.org/10.1007/s00401-011-0939-3
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research, 9, 249–256.
Gorgolewski, K. J. (2008). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data. https://doi.org/10.1007/978-1-4020-6754-9_1720
Greving, J. P., Wermer, M. J. H., Brown, R. D., Morita, A., Juvela, S., Yonekura, M., Ishibashi, T., Torner, J. C., Nakayama, T., Rinkel, G. J. E., & Algra, A. (2014). Development of the PHASES score for prediction of risk of rupture of intracranial aneurysms: A pooled analysis of six prospective cohort studies. The Lancet Neurology, 13(1), 59–66. https://doi.org/10.1016/S1474-4422(13)70263-1
Hainc, N., Mannil, M., Anagnostakou, V., Alkadhi, H., Blüthgen, C., Wacht, L., Bink, A., Husain, S., Kulcsár, Z., & Winklhofer, S. (2020). Deep learning based detection of intracranial aneurysms on digital subtraction angiography: A feasibility study. Neuroradiology Journal, 33(4), 311–317. https://doi.org/10.1177/1971400920937647
Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning. PMLR, 2015.
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203–211. https://doi.org/10.1038/s41592-020-01008-z
Joo, B., Ahn, S. S., Yoon, P. H., Bae, S., Sohn, B., Lee, Y. E., Bae, J. H., Park, M. S., Choi, H. S., & Lee, S. K. (2020). A deep learning algorithm may automate intracranial aneurysm detection on MR angiography with high diagnostic performance. European Radiology, 30(11), 5785–5793. https://doi.org/10.1007/s00330-020-06966-8
Ke, R., Bugeau, A., Papadakis, N., Schuetz, P., & Schönlieb, C.-B. (2020). Learning to segment microscopy images with lazy labels. ArXiv. https://doi.org/10.1007/978-3-030-66415-2_27
Keedy, A. (2006). An overview of intracranial aneurysms. McGill Journal of Medicine: MJM, 9(2).
Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 1–15.
Liu, X., Feng, J., Wu, Z., Neo, Z., Zhu, C., Zhang, P., Wang, Y., Jiang, Y., Mitsouras, D., & Li, Y. (2021). Deep neural network-based detection and segmentation of intracranial aneurysms on 3D rotational DSA. Interventional Neuroradiology. https://doi.org/10.1177/15910199211000956
Markiewicz, C. J., Gorgolewski, K. J., Feingold, F., Blair, R., Halchenko, Y. O., Miller, E., Hardcastle, N., Wexler, J., Esteban, O., Goncalves, M., Jwa, A., & Poldrack, R. A. (2021). OpenNeuro: An open resource for sharing of neuroimaging data. BioRxiv. https://doi.org/10.1101/2021.06.28.450168
McHugh, M. L. (2012). The chi-square test of independence. Biochemia Medica, 23(2), 143–149. https://doi.org/10.11613/BM.2013.018
Mouches, P., & Forkert, N. D. (2014). A statistical atlas of cerebral arteries generated using multi-center MRA datasets from healthy subjects. Scientific Data, 6(1), 1–8. https://doi.org/10.1038/s41597-019-0034-5
Nakao, T., Hanaoka, S., Nomura, Y., Sato, I., Nemoto, M., Miki, S., Maeda, E., Yoshikawa, T., Hayashi, N., & Abe, O. (2018). Deep neural network-based computer-assisted detection of cerebral aneurysms in MR angiography. Journal of Magnetic Resonance Imaging, 47(4), 948–953. https://doi.org/10.1002/jmri.25842
Özgün, Ç., Abdulkadir, A., Lienkamp, S., Brox, T., & Ronneberg, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. ArXiv. https://doi.org/10.1007/978-3-319-46723-8
Park, A., Chute, C., Rajpurkar, P., Lou, J., Ball, R. L., Shpanskaya, K., Jabarkheel, R., Kim, L. H., McKenna, E., Tseng, J., Ni, J., Wishah, F., Wittber, F., Hong, D. S., Wilson, T. J., Halabi, S., Basu, S., Patel, B. N., Lungren, M. P., & Yeom, K. W. (2019). Deep learning-assisted diagnosis of cerebral aneurysms using the HeadXNet model. JAMA Network Open, 2(6), e195600. https://doi.org/10.1001/jamanetworkopen.2019.5600
Rao, B., Zohrabian, V., Cedeno, P., Saha, A., Pahade, J., & Davis, M. A. (2021). Utility of artificial intelligence tool as a prospective radiology peer reviewer: Detection of unreported intracranial hemorrhage. Academic Radiology, 28(1), 85–93. https://doi.org/10.1016/j.acra.2020.01.035
Razzak, M. I., Naz, S., & Zaib, A. (2018). Deep learning for medical image processing: Overview, challenges and the future. Lecture Notes in Computational Vision and Biomechanics, 26, 323–350. https://doi.org/10.1007/978-3-319-65981-7_12
Shi, Z., Miao, C., Schoepf, U. J., Savage, R. H., Dargis, D. M., Pan, C., Chai, X., Li, X. L., Xia, S., Zhang, X., Gu, Y., Zhang, Y., Hu, B., Xu, W., Zhou, C., Luo, S., Wang, H., Mao, L., Liang, K., & Zhang, L. J. (2020). A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nature Communications. https://doi.org/10.1038/s41467-020-19527-w
Sichtermann, T., Faron, A., Sijben, R., Teichert, N., Freiherr, J., & Wiesmann, M. (2019). Deep learning-based detection of intracranial aneurysms in 3D TOF-MRA. American Journal of Neuroradiology. https://doi.org/10.3174/ajnr.A5911
Smith, S. M. (2002). Fast robust automated brain extraction. Human Brain Mapping, 17(3), 143–155. https://doi.org/10.1002/hbm.10062
Stember, J. N., Chang, P., Stember, D. M., Liu, M., Grinband, J., Filippi, C. G., Meyers, P., & Jambawalikar, S. (2019). Convolutional neural networks for the detection and measurement of cerebral aneurysms on magnetic resonance angiography. Journal of Digital Imaging, 32(5), 808–815. https://doi.org/10.1007/s10278-018-0162-z
Taghanaki, S. A., Zheng, Y., Kevin Zhou, S., Georgescu, B., Sharma, P., Xu, D., Comaniciu, D., & Hamarneh, G. (2019). Combo loss: Handling input and output imbalance in multi-organ segmentation. Computerized Medical Imaging and Graphics, 75, 24–33. https://doi.org/10.1016/j.compmedimag.2019.04.005
Timmins, K. M., van der Schaaf, I. C., Bennink, E., Ruigrok, Y. M., An, X., Baumgartner, M., Bourdon, P., De Feo, R., Noto, T., Di, Dubost, F., Fava-Sanches, A., Feng, X., Giroud, C., Group, I., Hu, M., Jaeger, P. F., Kaiponen, J., Klimont, M., Li, Y., & Kuijf, H. J. (2021). Comparing methods of detecting and segmenting unruptured intracranial aneurysms on TOF-MRAS: The ADAM challenge. NeuroImage, 238, 118216. https://doi.org/10.1016/j.neuroimage.2021.118216
Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., & Gee, J. C. (2010). N4ITK: Improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6), 1310–1320. https://doi.org/10.1109/TMI.2010.2046908
Ueda, D., Doishita, S., & Choppin, A. (2019). Deep learning for MR angiography: automated detection of cerebral aneurysms. Radiology. https://doi.org/10.1148/radiol.2018180901
Ward, J., Naik, K. S., Guthrie, F. J. A., Wilson, D., & Robinson, P. J. (1999). Hepatic lesion detection: comparison of MR imaging after the administration of superparamagnetic iron oxide with dual-phase CT by using alternative-free response receiver operating characteristic analysis. Radiology. https://doi.org/10.1148/radiology.210.2.r99fe05459
Yang, J., Xie, M., Hu, C., Alwalid, O., Xu, Y., Liu, J., Jin, T., Li, C., Tu, D., Liu, X., Zhang, C., Li, C., & Long, X. (2020). Deep learning for detecting cerebral aneurysms with CT angiography. Radiology, 298(1), 155–163. https://doi.org/10.1148/RADIOL.2020192154
Yang, X., Blezek, D. J., Cheng, L. T. E., Ryan, W. J., Kallmes, D. F., & Erickson, B. J. (2011). Computer-aided detection of intracranial aneurysms in MR angiography. Journal of Digital Imaging, 24(1), 86–95. https://doi.org/10.1007/s10278-009-9254-0
Yushkevich, P. A., Piven, J., Hazlett, H. C., Smith, R. G., Ho, S., Gee, J. C., & Gerig, G. (2006). User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage, 31(3), 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal: Neuroinformatics (Springer Journals)

Published: Aug 18, 2022

