DeepDicomSort: An Automatic Sorting Algorithm for Brain Magnetic Resonance Imaging Data

Abstract  With the increasing size of datasets used in medical imaging research, the need for automated data curation is arising. One important data curation task is the structured organization of a dataset for preserving integrity and ensuring reusability. Therefore, we investigated whether this data organization step can be automated. To this end, we designed a convolutional neural network (CNN) that automatically recognizes eight different brain magnetic resonance imaging (MRI) scan types based on visual appearance. Thus, our method is unaffected by inconsistent or missing scan metadata. It can recognize pre-contrast T1-weighted (T1w), post-contrast T1-weighted (T1wC), T2-weighted (T2w), proton density-weighted (PDw), T2-weighted fluid-attenuated inversion recovery (T2w-FLAIR), diffusion-weighted imaging (DWI), and perfusion-weighted dynamic susceptibility contrast (PWI-DSC) scans, as well as derived maps (e.g. apparent diffusion coefficient and cerebral blood flow). In a first experiment, we used scans of subjects with brain tumors: 11065 scans of 719 subjects for training, and 2369 scans of 192 subjects for testing. The CNN achieved an overall accuracy of 98.7%. In a second experiment, we trained the CNN on all 13434 scans from the first experiment and tested it on 7227 scans of 1318 Alzheimer's subjects. Here, the CNN achieved an overall accuracy of 98.5%. In conclusion, our method can accurately predict scan type, and can quickly and automatically sort a brain MRI dataset virtually without the need for manual verification. In this way, our method can assist with properly organizing a dataset, which maximizes the shareability and integrity of the data.

Keywords  DICOM · Brain imaging · Machine learning · Magnetic resonance imaging · BIDS · Data curation

The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a Group/Institutional Author. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Correspondence: Sebastian R. van der Voort, s.vandervoort@erasmusmc.nl. Extended author information is available on the last page of the article.

Introduction

With the rising popularity of machine learning, deep learning, and automatic pipelines in the medical imaging field, the demand for large datasets is increasing. To satisfy this hunger for data, the amount of imaging data collected at healthcare institutes keeps growing, as is the amount of data that is shared in public repositories (Greenspan et al. 2016; Lundervold and Lundervold 2019). However, this increase in available data also means that proper data curation, the management of data throughout its life cycle, is needed to keep the data manageable and workable (Prevedello et al. 2019; van Ooijen 2019). One essential data curation step is organizing a dataset such that it can easily be used and reused. Properly organizing the dataset maximizes the shareability and preserves the full integrity of the dataset, ensuring repeatability of an experiment and reuse of the dataset in other experiments.
Unfortunately, the organization of medical imaging data is not standardized, and the format in which a dataset is provided often differs between sources (Lambin et al. 2017; van Ooijen 2019). Efforts such as the brain imaging data structure (BIDS) (Gorgolewski et al. 2016) propose a standardized data structure, to which some public data repositories adhere (e.g. OpenNeuro, Gorgolewski et al. 2017; ABIDE, Martino et al. 2017; and OASIS, LaMontagne et al. 2018). However, other repositories do not conform to this standard (e.g. The Cancer Imaging Archive (TCIA), Clark et al. 2013; the Alzheimer's Disease Neuroimaging Initiative (ADNI); and PPMI, Marek et al. 2018). Furthermore, similar to some prospectively collected research data, retrospectively collected data from clinical practice usually does not follow a standardized format either (van Ooijen 2019). Thus, the challenge of structuring a dataset, either into a BIDS compliant dataset or a different format, remains.

When using a medical imaging dataset in a research project, one needs to select the scan types that are relevant to the research question (Montagnon et al. 2020; Lambin et al. 2017). Thus, it is essential to identify the scan type of each scan when sorting a medical imaging dataset. Different data sources do not use consistent naming conventions in the metadata of a scan (e.g. the series description), which complicates the automatic identification of the scan type (van Ooijen 2019; Wang et al. 2011). Moreover, in some cases this metadata is not consistently stored (e.g. contrast administration; Hirsch et al. 2015) and might even be partially or entirely missing, as can be the case for anonymized data (Moore et al. 2015). As a result, the sorting is frequently done manually, by looking at each scan and labeling it according to the perceived scan type. This manual labeling can be a very time-consuming task, which hampers scientific progress; thus, it is highly desirable to automate this step of the data curation pipeline. Similar arguments concerning the complexity of medical imaging data and the importance of data structuring also motivated the creation of the BIDS standard (Gorgolewski et al. 2016).

Previous research has focused on modality recognition (Dimitrovski et al. 2015; Yu et al. 2015; Arias et al. 2016), as well as on distinguishing different modalities of (MRI) scans (Srinivas and Mohan 2014; Remedios et al. 2018). Only one of these studies (Remedios et al. 2018) considered the prediction of the scan type of MRI scans, predicting four scan types: pre-contrast T1-weighted (T1w), post-contrast T1-weighted (T1wC), fluid-attenuated inversion recovery (FLAIR), and T2-weighted (T2w) scans. However, with the increasing popularity of multi-parametric MRI in machine learning algorithms and automatic pipelines (Li et al. 2017; Akkus et al. 2017; Nie et al. 2016; Pereira et al. 2015), the need to recognize more scan types is arising.

In this research, we propose a method, called DeepDicomSort, that recognizes eight different scan types of brain MRI scans and facilitates sorting into a structured format. DeepDicomSort is a pipeline consisting of a pre-processing step to prepare scans as inputs for a convolutional neural network (CNN), a scan type recognition step using a CNN, and a post-processing step to sort the identified scan types into a structured format. Our method identifies T1w, T1wC, T2w, proton density-weighted (PDw), T2-weighted fluid-attenuated inversion recovery (T2w-FLAIR), diffusion-weighted imaging (DWI) (including trace/isotropic images), and perfusion-weighted dynamic susceptibility contrast (PWI-DSC) scans, as well as diffusion-weighted and perfusion-weighted derived maps (including, for example, apparent diffusion coefficient (ADC), fractional anisotropy, and relative cerebral blood flow). Once the scan types have been identified, DeepDicomSort can organize the dataset into a structured, user-defined layout or turn the dataset into a BIDS compliant dataset. We made all our source code, including the code for the pre-processing and post-processing, and our pre-trained models publicly available at https://github.com/Svdvoort/DeepDicomSort, to facilitate reuse by the community.

Materials & Methods

Terminology

Since the exact meaning of specific terms can differ depending on one's background, we have provided an overview of the terminology as it is used in this paper in Table 1. We have tried to adhere to the terminology used by BIDS as much as possible, and have provided the equivalent BIDS terminology in Table 1 as well. We differ from the BIDS terminology regarding two terms: scan and scan type. Scan type is referred to as modality in BIDS, but to avoid confusion with the more common use of modality to indicate different types of equipment (e.g. MRI and computed tomography (CT)), we instead use scan type. Scan is used instead of "data acquisition" or "run" as used in BIDS, to be more in line with common terminology and to avoid confusion with other types of data acquisition.
We define a structured dataset as a dataset in which all the data for the different subjects and scans is provided in the same way, for example a folder structure with a folder for each subject, session, and scan, using a consistent naming format for the different folders and scan types. A standardized dataset is a dataset in which the data has been structured according to a specific, public standard, for example BIDS.

Table 1  Overview of the terminology used in this paper, the corresponding BIDS terminology, and the meaning of each term

Term        BIDS term              Meaning
Modality    Modality               Type of technique used to acquire a scan (e.g. MRI, CT)
Subject     Subject                A person participating in a study
Site        Site                   Institute at which a scan of the subject has been acquired
Session     Session                A single visit of a subject to a site in which one or more scans have been acquired
Scan        Data acquisition/run   A single 3D image that has been acquired of a subject in a session
Slice       N/A                    A single 2D cross-section that has been extracted from a scan
Scan type   Modality               Specific visual appearance category of a scan (e.g. T1w, T2w)
Sample      N/A                    A single input for the CNN
Class       N/A                    An output category of the CNN
DICOM       DICOM                  A data format used to store medical imaging data. In addition to the imaging data, DICOM files can also store metadata about the scanner equipment, the specific imaging protocol, and clinical information.
NIfTI       NIfTI                  A data format used to store (neuro) medical imaging data.

Data

An extensive collection of data from multiple different sources was used to construct our method and evaluate its performance. We used MRI scans of subjects with brain tumors, as well as scans of subjects without brain tumors. To ensure sufficient heterogeneity in our dataset, we included scans from multiple different sources, and we only excluded scans if their scan type did not fall into one of the eight categories that we aimed to predict with our method. Thus, no scans were excluded based on other criteria such as low image quality, the occurrence of imaging artifacts, scanner settings, or disease state of the subject.

Brain Tumor Dataset
Our method was initially developed and subsequently tested on brain MRI scans of subjects with brain tumors. Scans of subjects with brain tumors were used because the brain tumor imaging protocols used to acquire these scans usually span a wide array of scan types, including pre-contrast and post-contrast scans. The brain tumor dataset consisted of a train set and an independent test set, which in total included data from 11 different sources. The subjects were distributed among the brain tumor train set and brain tumor test set before starting any experiments, and the data was divided such that the distribution of the scan types was similar in the train set and the test set. We chose to put all subjects that originated from the same dataset in either the train set or the test set to test the generalizability of our algorithm. Thus, all scans of a subject were either all in the brain tumor train set or all in the brain tumor test set, and no data leak could take place, precluding an overly optimistic estimation of the performance of our method. In this way, a good performance of our method on the test set could not be the result of the algorithm having learned features that are specific to a particular site or scanner.

The brain tumor train set contained 11065 scans of 1347 different sessions from 719 subjects. These scans were included from the Brain-Tumor-Progression (Schmainda and Prah 2018), Ivy Glioblastoma Atlas Project (Ivy GAP) (Shah et al. 2016), LGG-1p19qDeletion (Erickson et al. 2016; Akkus et al. 2017), TCGA-GBM (Scarpace et al. 2016), and TCGA-LGG (Pedano et al. 2016) collections from TCIA (Clark et al. 2013). Two datasets from The Norwegian National Advisory Unit for Ultrasound and Image Guided Therapy (USIGT) (Fyllingen et al. 2016; Xiao et al. 2017) were also included in the brain tumor train set. In total, the data originated from 17 different sites, and the scans were acquired on at least 29 different scanner models from 4 different vendors (GE, Hitachi, Philips, and Siemens).

The brain tumor test set contained 2369 scans of 302 different sessions from 192 subjects. These scans were included from the Brain Images of Tumors for Evaluation (BITE) dataset (Mercier et al. 2012), as well as the Clinical Proteomic Tumor Analysis Consortium Glioblastoma Multiforme (CPTAC-GBM) (National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) 2018), Repository of Molecular Brain Neoplasia Data (REMBRANDT) (Scarpace et al. 2015), and Reference Image Database to Evaluate Therapy Response: Neuro MRI (RIDER Neuro MRI) (Barboriak 2015) collections from TCIA. In total, the data originated from 8 different sites, and the scans were acquired on at least 15 different scanner models from 4 different vendors (GE, Philips, Siemens, and Toshiba).

For some scans, the scanner type was not available in the DICOM tags (DICOM tag (0008,1090)); thus, the variation in the number of scanners could be even larger. All subjects included in the brain tumor dataset had a (pre-operative or post-operative) brain tumor. The scans in the datasets were manually sorted, and T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, and derived images were identified. The different types of derived images were combined into a single category, as the derivation of these images is often inconsistent among scanners and vendors, and thus these images need to be rederived from the raw data (e.g. the original DWI or PWI-DSC scan).

The details of the brain tumor train set and brain tumor test set are presented in Table 2. An example of the eight scan types for a single subject from the brain tumor test set can be seen in Fig. 1.
Table 2  Overview of data in the brain tumor dataset. The number of scans for each scan type and for the different spatial orientations (axial, coronal, sagittal, and 3D) is specified.

              Brain tumor train set                 Brain tumor test set
Scan type     Ax     Cor    Sag    3D     Total     Ax     Cor    Sag    3D     Total
T1w           580    14     872    454    1920      206    2      202    26     436
T1wC          964    526    298    1040   2828      208    133    97     172    610
T2w           1151   411    23     31     1616      232    46     16     1      295
PDw           413    40     0      0      453       145    36     0      0      181
T2w-FLAIR     991    39     4      50     1084      221    3      0      32     256
DWI           1359   0      0      0      1359      347    0      0      0      347
PWI-DSC       669    0      0      0      669       87     0      0      0      87
Derived       1136   0      0      0      1136      157    0      0      0      157
Total         7263   1030   1197   1575   11065     1603   220    315    231    2369

Fig. 1  Examples of the different scan types for a single subject from the brain tumor test set: (a) T1w, (b) T1wC, (c) T2w, (d) PDw, (e) T2w-FLAIR, (f) DWI, (g) PWI-DSC, (h) Derived (ADC).

ADNI Dataset

In order to evaluate the results of the algorithm on non-tumor brain imaging, we used the ADNI dataset (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer's disease. For up-to-date information, see adni-info.org.

We used the baseline and screening data of 1318 subjects, resulting in 7227 scans. These scans originated from 67 different sites and were acquired on 23 different scanner models from 3 different vendors (GE, Philips, and Siemens). Details of the ADNI dataset are presented in Table 3. Since no contrast is administered to subjects in the ADNI study, there are no T1wC or PWI-DSC scans in this dataset. The ADNI dataset does include arterial spin labeling perfusion-weighted imaging (PWI-ASL); however, since our algorithm was not designed to recognize these scans, they were excluded. The derived maps from these PWI-ASL scans were included, since the derived category encompasses all diffusion- and perfusion-derived imaging. These PWI-ASL derived maps explain the 47 3D scans in Table 3.

Table 3  Overview of data in the ADNI dataset. The number of scans for each scan type and for the different spatial orientations (axial, coronal, sagittal, and 3D) is specified.

Scan type     Ax     Cor    Sag    3D     Total
T1w           0      0      276    2380   2656
T1wC          0      0      0      0      0
T2w           1725   488    5      0      2218
PDw           1069   0      0      0      1069
T2w-FLAIR     1      0      3      488    492
DWI           558    0      2      0      560
PWI-DSC       0      0      0      0      0
Derived       183    0      2      47     232
Total         3536   488    288    2915   7227

DeepDicomSort

The pipeline of our proposed method, DeepDicomSort, consists of three phases:

1. Pre-processing: prepare the scans as input for the CNN
2. Scan type prediction: obtain the predicted scan type using the CNN
3. Post-processing: use the predictions to sort the dataset

By passing a dataset through this pipeline, it can be turned into a BIDS compliant dataset, or it can be structured according to a user-defined layout. If one chooses to create a BIDS compliant dataset, the scans are stored as NIfTI files; if a user-defined structure is used, the scans are stored as DICOM files. An overview of the DeepDicomSort pipeline is presented in Fig. 2.

Fig. 2  Overview of the DeepDicomSort pipeline. Scans are first converted from DICOM to NIfTI format and pre-processed. During the pre-processing, the scan is split into 25 individual slices, which are then classified as one of eight scan types by the CNN. The predictions of the individual slices are combined in a majority vote, and the predicted scan type of each scan is used to structure the dataset. DeepDicomSort can structure either the original DICOM files or the NIfTI files; in the latter case, the dataset is turned into a BIDS compliant dataset.
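To make the three phases concrete, the following minimal sketch shows how they could be chained in Python. The helper functions (convert_and_preprocess, predict_scan_type, sort_dataset) are hypothetical placeholders used for illustration only; they are not the actual functions exposed by the DeepDicomSort code base.

    # Hypothetical orchestration of the three DeepDicomSort phases.
    # All helper functions below are illustrative placeholders.

    def run_pipeline(input_folder, output_folder, model, make_bids=True):
        # Phase 1: pre-processing (DICOM-to-NIfTI conversion, slice extraction).
        scans = convert_and_preprocess(input_folder)      # {scan_id: 25 slices}

        # Phase 2: scan type prediction with the CNN, one label per scan.
        predictions = {scan_id: predict_scan_type(model, slices)
                       for scan_id, slices in scans.items()}

        # Phase 3: post-processing, sorting into BIDS (NIfTI) or a
        # user-defined DICOM folder structure.
        sort_dataset(predictions, output_folder, bids=make_bids)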
Pre-Processing

As a first pre-processing step, all DICOM files were converted to NIfTI format using dcm2niix (Li et al. 2016), as this simplifies the further processing of the scans. This step was skipped for the USIGT and BITE datasets, as these were already provided in NIfTI format (no DICOM files were available).

In the next step, the number of dimensions of each scan was automatically determined. Although most scans were 3-dimensional, some scans happened to be 4-dimensional. This was the case for some DWI scans, which consisted of multiple b-values and potentially b-vectors, and for some PWI-DSC scans, which contained multiple time points. If a scan was 4-dimensional, the first (3D) element of the sequence was extracted and was subsequently used instead of the full 4-dimensional scan. This extraction was done to make sure that the CNN would also recognize scan types that generally contain repeats in situations where this was not the case. For example, this could be the case when the different b-values of a DWI scan were stored as multiple, separate (3D) scans instead of a single (4D) scan. Since the information that a scan is 4-dimensional can aid the algorithm in recognizing the scan type, a "4D" label was attached to each scan. This 4D label was set to 1 if the scan was 4-dimensional, and to 0 if it was not.

All scans were then reoriented to match the orientation of a common template using FSL's reorient2std (Jenkinson et al. 2012). After this step, the scans were resampled to 256 × 256 × 25 voxels, using cubic b-spline interpolation, while maintaining the original field of view. All of these resampled (3D) scans were split into (2D) slices, resulting in 25 individual slices of 256 × 256 voxels. The slice extraction was then followed by an intensity scaling of each slice: the intensity was scaled such that the minimum intensity was 0 and the maximum intensity was 1, to compensate for intensity differences between slices. These pre-processed slices were then used as input samples for the CNN. No data augmentation was used, as the large number of scans and different data sources that were used to train the algorithm already ensured sufficient natural variation in the samples, obviating the need for additional augmentation.

After applying these pre-processing steps, the brain tumor train set consisted of 276625 samples, the brain tumor test set consisted of 59225 samples, and the ADNI dataset consisted of 180675 samples.
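As an illustration of the steps described above, the sketch below mimics the main operations with nibabel, SciPy, and NumPy. It is a simplified stand-in, not the actual DeepDicomSort pre-processing code (which additionally reorients scans with FSL's reorient2std), and the function name is ours.

    import nibabel as nib
    import numpy as np
    from scipy.ndimage import zoom

    def preprocess_scan(nifti_path, target_shape=(256, 256, 25)):
        """Return 25 intensity-scaled 2D slices for one scan (simplified sketch)."""
        image = nib.load(nifti_path)
        data = np.asanyarray(image.dataobj).astype(np.float32)

        # 4D scans (e.g. multi-b-value DWI, PWI-DSC time series): keep the
        # first 3D volume and remember that the scan was 4-dimensional.
        is_4d = 1 if data.ndim == 4 else 0
        if data.ndim == 4:
            data = data[..., 0]

        # Resample to 256 x 256 x 25 voxels with cubic (order 3) interpolation.
        # (The real pipeline first reorients the scan with FSL.)
        factors = [t / s for t, s in zip(target_shape, data.shape)]
        data = zoom(data, factors, order=3)

        # Split into 25 slices and min-max scale each slice to [0, 1].
        slices = []
        for k in range(data.shape[2]):
            slc = data[:, :, k]
            rng = slc.max() - slc.min()
            slc = (slc - slc.min()) / rng if rng > 0 else np.zeros_like(slc)
            slices.append(slc)
        return np.stack(slices), is_4d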
Network

A CNN was used to classify the samples into one of eight different classes: T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, or derived. The architecture of the CNN is shown in Fig. 3. This architecture was inspired by the VGG network (Simonyan and Zisserman 2015).

Fig. 3  The architecture of the CNN. The convolutional blocks consisted of N 2D convolutional filters followed by batch normalization and a parametric rectified linear unit. The output size of the convolutional blocks and pooling layers is specified.

The network was implemented using TensorFlow 1.12.3 (Abadi et al. 2016). The cross-entropy between the predicted and ground truth labels was used as a loss function. Weights were initialized using Glorot uniform initialization (Glorot and Bengio 2010). We used Adam as an optimizer (Kingma and Ba 2015), which started with a learning rate of 0.001, β1 = 0.9, and β2 = 0.999, as these were proposed as reasonable default values (Kingma and Ba 2015). The learning rate was automatically adjusted based on the training loss: if the training loss did not decrease during 3 epochs, the learning rate was decreased by a factor 10, with a minimum learning rate of 1·10⁻⁷. The network could train for a maximum of 100 epochs, and the network automatically stopped training when the loss did not decrease during 6 epochs. We used a batch size of 32. We arrived at this CNN design and these settings by testing multiple different options and selecting the best performing one. Details about the optimization of the settings are presented in Section "Experiments", Fig. 4, and Appendix A.

During the training of the network, all slices were inputted to the CNN as individual samples, and no information about the (possible) relation between different slices was provided. After training the network, the scan type of a scan was predicted by passing all 25 slices of the scan through the CNN and then combining these individual slice predictions using a majority vote.
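The training configuration described above maps onto standard Keras components; the sketch below reflects those settings under that assumption. The two-block model is only an illustrative stand-in for the architecture of Fig. 3 (the filter counts and depth shown are not the published ones), the original implementation used TensorFlow 1.12 rather than the Keras 2.x style shown here, and train_slices and train_labels are placeholders for the pre-processed samples.

    import numpy as np
    import tensorflow as tf

    def conv_block(x, n_filters):
        # Convolutional block as described in the caption of Fig. 3:
        # N 2D convolutional filters, batch normalization, then a PReLU.
        x = tf.keras.layers.Conv2D(n_filters, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        return tf.keras.layers.PReLU(shared_axes=[1, 2])(x)

    # Illustrative stand-in for the network of Fig. 3 (not the published depth).
    inputs = tf.keras.Input(shape=(256, 256, 1))
    x = conv_block(inputs, 16)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = conv_block(x, 32)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(8, activation="softmax",
                                    kernel_initializer="glorot_uniform")(x)
    model = tf.keras.Model(inputs, outputs)

    # Adam with the stated defaults and a cross-entropy loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,
                                                     beta_1=0.9, beta_2=0.999),
                  loss="categorical_crossentropy", metrics=["accuracy"])

    callbacks = [
        # Divide the learning rate by 10 when the training loss has not
        # decreased for 3 epochs, down to a minimum of 1e-7.
        tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.1,
                                             patience=3, min_lr=1e-7),
        # Stop training when the loss has not decreased for 6 epochs.
        tf.keras.callbacks.EarlyStopping(monitor="loss", patience=6),
    ]
    # model.fit(train_slices, train_labels, batch_size=32, epochs=100,
    #           callbacks=callbacks)

    def predict_scan_type(model, scan_slices):
        # Majority vote over the slice-level predictions of one scan.
        slice_classes = np.argmax(model.predict(scan_slices), axis=1)
        return np.bincount(slice_classes, minlength=8).argmax()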
Post-Processing

Once the scan type of each scan is predicted, these predictions can then be used in (optional) post-processing steps to automatically structure the dataset. We provide two options for the structured format:

– Sort the original DICOM files; this can be done in a user-defined folder structure.
– Sort the NIfTI files; in this case the BIDS format is used.

During the post-processing, the spatial orientation of the scan (axial, coronal, sagittal, or 3D) is also determined based on the direction cosines (DICOM tag (0020,0037)), which can be used to define the structured layout when choosing to sort the DICOM files.

HeuDiConv

HeuDiConv (https://github.com/nipy/heudiconv) is a heuristic-centric DICOM converter, which uses information from the DICOM tags, along with a user-defined heuristic file, to organize an unstructured DICOM dataset into a structured layout. HeuDiConv is currently one of the most widespread, publicly available methods that can structure an unsorted DICOM dataset. Therefore, we used HeuDiConv as a benchmark, so we could compare our method, which is based on the visual appearance of a scan, with a method that is based on the metadata of a scan.

Before HeuDiConv can be used to sort a dataset, one first needs to define the heuristic file, which is essentially a translation table between the metadata of a scan and its scan type. This heuristic file is based on scan metadata that is extracted from the DICOM tags. Available metadata includes image type, study description, series description, repetition time, echo time, size of the scan along 4 dimensions, protocol name, and sequence name. HeuDiConv also determines whether a scan is motion-corrected or is a derived image, based on specific keywords being present in the image type DICOM tag; these characteristics can also be used in the heuristic file. Although more scan metadata can be used to define the heuristic, such as subject gender and referring physician, we considered this metadata irrelevant for our purpose of scan type prediction. In addition, this kind of metadata was often missing due to anonymization.
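For readers unfamiliar with HeuDiConv, the snippet below illustrates the general shape of such a heuristic file: an infotodict function that inspects the per-series metadata (series description, echo time, and so on) and assigns each series to a key. The rules shown here are deliberately simplistic examples in the spirit of our initial series-description heuristic, not the full heuristic used in this paper, and the exact seqinfo field names should be checked against the HeuDiConv version in use.

    # Example HeuDiConv heuristic file (illustrative only; not the heuristic
    # evaluated in this paper). HeuDiConv calls infotodict() with one
    # seqinfo entry per DICOM series.

    def create_key(template, outtype=("nii.gz",), annotation_classes=None):
        return template, outtype, annotation_classes

    def infotodict(seqinfo):
        t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
        t2w = create_key("sub-{subject}/anat/sub-{subject}_T2w")
        flair = create_key("sub-{subject}/anat/sub-{subject}_FLAIR")

        info = {t1w: [], t2w: [], flair: []}
        for s in seqinfo:
            desc = s.series_description.lower()
            # Simplistic text rules, in the spirit of our initial heuristic.
            if "flair" in desc:
                info[flair].append(s.series_id)
            elif "t1" in desc:
                info[t1w].append(s.series_id)
            elif "t2" in desc and s.TE > 80:   # echo time as a tie-breaker (assumed threshold)
                info[t2w].append(s.series_id)
        return info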
In this experiment, we trained the maps were generated for all slices of the scans of the CNN using the whole brain tumor dataset (a combination of example subject shown in Fig. 1, based on the trained all the data in the brain tumor train set and brain tumor test model from Experiment I. Additional saliency maps were set) and then evaluated the performance of the model on the generated for 20 samples of each scan type that were ADNI dataset. No model parameter selection was done in randomly selected from the test sets of Experiment I and this experiment, instead the optimal model parameters that Experiment II. The saliency maps for the samples from were obtained from Experiment I were used. Thus, apart Experiment I were generated using the CNN trained in from training the CNN on a larger dataset, the methods used Experiment I, and for the samples from Experiment II the in Experiment I and Experiment II were the same. Figure 5 CNN trained in Experiment II was used. By generating shows an overview of the training and testing steps and the saliency maps for multiple samples, we could show the data used in Experiment II. In this experiment, no T1wC and behavior of our algorithm for different scan appearances. PWI-DSC scans were present in the test set, however in a Some of these samples contained tumors, contained imaging real-world setting one may not know a priori whether these artifacts or had a low image quality. Thus, these saliency scan types were present or absent. Thus, we still allowed maps also showed the robustness of our algorithm to the model to predict the scan type as one of these classes to unusual scan appearance. To gain some insight into the mirror this realistic setting. behavior of each convolutional layer we determined the To evaluate the performance of our algorithm, we feature maps of each convolutional layer. We calculated the calculated the overall accuracy and the per-class accuracy feature maps for the T1w slice shown in Fig. 1 by passing of the classification. The overall accuracy was defined as it through the network and determining the output of each the number of correctly predicted scans divided by the total filter after each convolutional layer. Fig. 5 Overview of Experiment II. In this experiment the brain tumor dataset was used to train the algorithm, and the trained model was then evaluated on the ADNI dataset Neuroinform (2021) 19:159–184 167 Table 4 DICOM tag numbers and descriptions of the DICOM tags Comparison with HeuDiConv extracted for the HeuDiConv heuristic We compared the performance of HeuDiConv and DeepDi- Tag description Tag number comSort using the data from Experiment I, since the data Image type 0008,0008 72 unique instances in Experiment II did not include all scan types. When using Study description 0008,1030 435 unique instances HeuDiConv, only the scans which were available in DICOM Series description 0008,103E 1215 unique instances format could be processed. This meant that the scans from the USIGT dataset were removed from the brain tumor train Repetition time 0018,0080 Mean ± std: 3912 ± 4078 set, and the scans from the BITE dataset were removed Echo time 0018,0081 Mean ± std: 52.11 ± 48.9 from the brain tumor test set, as these were not available Number of rows in image 0028,0010 Range: 128 - 1152 in DICOM format. 
Comparison with HeuDiConv

We compared the performance of HeuDiConv and DeepDicomSort using the data from Experiment I, since the data in Experiment II did not include all scan types. When using HeuDiConv, only the scans which were available in DICOM format could be processed. This meant that the scans from the USIGT dataset were removed from the brain tumor train set, and the scans from the BITE dataset were removed from the brain tumor test set, as these were not available in DICOM format. Thus, 86 scans (43 T1wC and 43 T2w-FLAIR) were removed from the brain tumor train set and 27 scans (all T1wC) were removed from the brain tumor test set, reducing the train set to 10979 scans and the test set to 2342 scans.

To construct our heuristic, we first extracted all the relevant DICOM tags from the scans in the brain tumor train set (see Table 4). Table 4 also shows the number of unique occurrences for the text-based tags and the distribution of the numerical tags in the brain tumor train set. An iterative approach was followed to construct the heuristic, where rules were added or adjusted until the performance of HeuDiConv on the brain tumor train set could no longer be increased (see Fig. 6). Our initial heuristic was a simple one, based solely on certain text being present in the series description. For example, if the text "T1" was present in the series description, it was considered a T1w scan.

To compare the performance of HeuDiConv with the performance of DeepDicomSort, the overall accuracy and per-class accuracy of the scan type predictions obtained from HeuDiConv were calculated.

Table 4  DICOM tag numbers and descriptions of the DICOM tags extracted for the HeuDiConv heuristic. For text-based tags the number of unique instances is shown, and for numerical tags the distribution is shown, based on the scans in the brain tumor train set.

Tag description              Tag number   Values in the brain tumor train set
Image type                   0008,0008    72 unique instances
Study description            0008,1030    435 unique instances
Series description           0008,103E    1215 unique instances
Repetition time (ms)         0018,0080    Mean ± std: 3912 ± 4078
Echo time (ms)               0018,0081    Mean ± std: 52.11 ± 48.9
Number of rows in image      0028,0010    Range: 128 - 1152
Number of columns in image   0028,0011    Range: 128 - 1152

Fig. 6  Overview of the HeuDiConv experiment. In this experiment the scans from the brain tumor train set that were available in DICOM format were used to construct the heuristic file. HeuDiConv used this heuristic file to predict the scan type of the scans from the brain tumor test set that were available in DICOM format.

Results

Experiment I - Evaluation on Brain Tumor Dataset

The results from Experiment I (evaluation on the brain tumor test set, containing scans of subjects with brain tumors) are reported in Table 5. The network was trained for 96 epochs. In this experiment our method achieved an overall accuracy of 98.7%.

The highest per-class accuracy was achieved for the PDw and PWI-DSC scans (100.0% for both), whereas the T2w-FLAIR scans had the lowest accuracy (93.0%). The confusion matrices show that most of the incorrectly predicted T2w-FLAIR scans were classified as T1w scans (see Appendix B). Appendix C shows the performance of our method on a per-slice basis, before the majority vote has taken place to determine the scan class, which shows that the per-slice accuracy is lower than the per-scan accuracy. This is not surprising, since there are slices in a scan from which it is almost impossible to determine the scan type, even for a human (for example, the most superior and inferior slices).

Table 5  Overall accuracy and per-class accuracy achieved by DeepDicomSort in Experiment I and Experiment II

              Experiment I    Experiment II
Overall       0.987           0.985
T1w           0.993           1.000
T1wC          0.997           N/A
T2w           0.990           0.965
PDw           1.000           0.998
T2w-FLAIR     0.930           0.951
DWI           0.991           0.995
PWI-DSC       1.000           N/A
Derived       0.994           0.983

Experiment II - Evaluation on ADNI Dataset

The results from Experiment II (evaluation on the ADNI dataset, containing scans of subjects without brain tumors) are reported in Table 5. Just like in Experiment I, the network was trained for 96 epochs. In this experiment our method achieved an overall accuracy of 98.5%. It took approximately 22 hours to train the network of this experiment using an Nvidia Titan V GPU with 12 GB memory.

The highest per-class accuracy was achieved for the T1w scans (100.0%), whereas the T2w-FLAIR (95.1%) and T2w (96.5%) scans had the lowest accuracies. Most of the incorrectly predicted T2w scans were predicted as T1wC or PDw scans. Furthermore, although no T1wC and PWI-DSC scans were present in the test set used in this experiment, our method incorrectly classified 40 scans as T1wC (mainly T2w scans) and 3 scans as PWI-DSC (all DWI scans). The full confusion matrix can be found in Appendix B.
Focus of the Network

Figure 7 shows the saliency maps for the different scan types, for the same slices as in Fig. 1. For most scan types, the CNN seemed to focus on the ventricles, the cerebral spinal fluid (CSF) around the skull, the nose, and the eyes. For the PDw slice, the CNN did not have a specific focus on the ventricles and did not seem to have a particular focus inside the brain. The DWI and derived slices also showed some focus outside of the skull, probably because of the artifacts outside of the skull that these scan types often feature (as can be seen in Fig. 7h). We have created saliency maps for all 25 slices of the scans shown in Fig. 1, which are shown in Appendix E. For most other slices, the focus of the CNN was the same as for the slices from Fig. 7. Furthermore, the presence of a tumor did not disturb the prediction, as also evidenced by the high accuracy achieved in Experiment I. Only on the most superior and inferior slices did the CNN struggle, probably due to the fact that the brain was barely visible on those slices.

Fig. 7  Saliency maps of the scan types, generated by the CNN evaluated on the same slices as in Fig. 1. This CNN was the model obtained in Experiment I: (a) T1w, (b) T1wC, (c) T2w, (d) PDw, (e) T2w-FLAIR, (f) DWI, (g) PWI-DSC, (h) Derived (ADC).

Additional saliency maps for randomly selected samples from the test sets of Experiment I and Experiment II are shown in Appendix F. These examples show that our method is robust to heterogeneity in the visual appearance of the scans, as well as to the presence of tumors, the presence of imaging artifacts, and poor image quality. This is demonstrated by the fact that the CNN focused on the same brain structures for almost all of the slices and correctly predicted the scan type even for slices with poor imaging quality or artifacts. The feature maps of all convolutional layers are shown in Appendix G. For the shallow convolutional layers, some filters seemed to detect the skull without looking at the brain tissue, whereas other layers seemed to focus more on specific brain structures such as the CSF. Interpreting the deeper convolutional layers gets harder, as the feature maps of those layers have a lower resolution.
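A plain (non-guided) saliency map of this kind can be computed with a single gradient pass; the sketch below uses tf.GradientTape and is only a simplified illustration of the approach. The maps reported here additionally use guided backpropagation, which modifies the backward pass of the ReLU units and is not shown.

    import numpy as np
    import tensorflow as tf

    def saliency_map(model, slice_2d, class_index):
        """Gradient of one class score w.r.t. the input pixels (plain saliency)."""
        x = tf.convert_to_tensor(slice_2d[np.newaxis, ..., np.newaxis],
                                 dtype=tf.float32)
        with tf.GradientTape() as tape:
            tape.watch(x)
            class_score = model(x)[0, class_index]
        grad = tape.gradient(class_score, x)[0, ..., 0]
        # Keep only positive contributions, matching the rectified maps
        # described in the text.
        return tf.nn.relu(grad).numpy()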
HeuDiConv Predictive Performance

The top-level rules of the derived heuristic for HeuDiConv were mainly based on the series description, with additional lower-level rules based on the echo time, the image type, and the derived status of the scan. The overall accuracy obtained within the brain tumor train set after several iterations of improving the heuristic was 91.0%. The overall accuracy in the brain tumor test set was 72.0%. The results for each class can be found in Table 6, along with a comparison to the accuracy of the CNN evaluated on the brain tumor test set. For the evaluation of the CNN's performance, we included the same scans as present in the test set for HeuDiConv (i.e. those which were available in DICOM format). Although a slightly different dataset was used for this test set, the results of the CNN in Tables 5 and 6 appear to be the same. This can be explained by the fact that only T1wC scans were removed from the test set, so for all other classes the accuracy remained the same. Furthermore, due to the large number of scans, the difference is only visible at more decimals; e.g. the overall accuracy in Table 5 was 98.73%, whereas in Table 6 it was 98.72%. These results show that DeepDicomSort outperformed HeuDiConv both in terms of the overall accuracy and in terms of the per-class accuracy for all classes. Appendix D compares the time required to sort the datasets using either DeepDicomSort, HeuDiConv, or manual sorting, which shows that DeepDicomSort is more than twice as fast as the other two methods.

Table 6  Accuracy of HeuDiConv on the brain tumor test set. Results of DeepDicomSort on this test set are also given, where the scans which were not available in DICOM format were excluded from the test set.

              HeuDiConv    DeepDicomSort
Overall       0.720        0.987
T1w           0.963        0.993
T1wC          0.447        0.997
T2w           0.930        0.990
PDw           0.077        1.000
T2w-FLAIR     0.684        0.930
DWI           0.887        0.991
PWI-DSC       0.600        1.000
Derived       0.948        0.994

Discussion

Our results show that it is possible to use a CNN to automatically identify the scan type of brain MRI scans and to use this to sort a large, heterogeneous dataset. Because of the high accuracy of our method, it can be used virtually without manual verification. The CNN performed well both for scans with and without the presence of a tumor. The performance of our method generalizes well across scans from different sites, scanners, subjects, and scan protocols. Our method was also able to correctly predict the scan type of scans that had poor imaging quality or contained imaging artifacts, as can be seen in Appendix F.1. The CNN focused mainly on the ventricles, areas close to the skull, and the CSF at the edges of the brain. There was also some focus on the gray matter and white matter, although these structures seemed less relevant for the decision making of the CNN. It makes sense that the CNN focuses on the CSF, both in the ventricles and at the edges of the brain, because its visual appearance is very characteristic of the scan type. Although the CNN also focused on the eyes and nose, we do not expect this to disrupt the prediction when these structures are absent (e.g. in defaced scans): there were a lot of slices in which the eyes and nose were not present, such as the most inferiorly and superiorly located slices, for which the CNN predicted the scan type correctly.

Data sorting is just one step of the data curation pipeline, and in recent years more research on the automation of other data curation tasks has been carried out. Some examples include automatic scan quality checking (Esteban et al. 2017), motion artifact correction (Tamada et al. 2020), and missing scan type imputation from the present scan types (Lee et al. 2019). However, to automate other data curation steps the dataset first needs to follow a structured format, making our tool a crucial first step in the overall pipeline. The increasing data complexity, both in volume and in the number of different types of data, not only shows a need for a proper data curation pipeline, but also shows the need for a standardized data structure for scans and their associated metadata (van Erp et al. 2011; Gorgolewski et al. 2016; Lambin et al. 2017). The widespread adoption of a common, standardized data structure would be favorable over the use of our tool or similar tools. Unfortunately, both in research and in clinical practice, it is currently not commonplace to provide datasets in a standardized format, thus making our tool a valuable addition to the data curation pipeline. Even if a standardized data structure were to be widely adopted, our tool would remain valuable as a quality assessment tool.
Although the accuracy of our method is high overall, our method predicted the incorrect scan type in some cases. For example, in Experiment I the CNN mainly misclassified T2w-FLAIR scans. Almost all of these misclassified T2w-FLAIR scans originated from the RIDER Neuro MRI dataset. Comparing a T2w-FLAIR scan from the RIDER dataset with a T2w-FLAIR scan from the train set used in Experiment I shows a big difference in visual appearance, see Fig. 8a and b. These figures show that the white matter and gray matter appear very different on the two scans, even though they have the same scan type, which probably confused the network. In Experiment II, the per-class accuracy was also low for the T2w scans. Almost all of the misclassified T2w scans were hippocampus scans, an example of which can be seen in Fig. 8c. The misclassification of these scans can be explained by their limited field of view. Since the CNN did not see any such scans in the training set, as all scans in the training set covered the full brain, it is not surprising that our method failed in these cases. The saliency maps in Fig. 8 show that the CNN had difficulty focusing on the relevant parts of the slice. For example, for the T2w-FLAIR slices in Figs. 7e and 8d it can be seen that the CNN focused mainly on the ventricles, whereas in Fig. 8e there was more focus on the edge of the brain, similar to the T1w slice in Fig. 7a. Although we did not achieve a perfect prediction accuracy, it is unlikely that any scan sorting method ever will, due to the large heterogeneity in scan appearance and scan metadata. While not perfect, our method does have a very high performance overall, and the comparison with manual sorting shows that it considerably reduces the time required to sort a dataset.

Fig. 8  Examples of scans our method misclassified (b and c) and a correctly classified scan (a) for comparison, along with their saliency maps. The T2w-FLAIR scan in (b) is probably misclassified because its appearance is very different from the T2w-FLAIR scans that were in the train dataset. The T2w scan in (c) is probably misclassified because it has a very limited field of view. (a) T2w-FLAIR scan from the Ivy GAP collection; this scan type was correctly predicted. (b) T2w-FLAIR scan from the RIDER Neuro MRI collection; this scan type was misclassified as a T1w. (c) T2w scan from the ADNI dataset; this scan type was misclassified as a T1wC. (d) Saliency map of the correctly classified scan from (a). (e) Saliency map of the misclassified scan from (b). (f) Saliency map of the misclassified scan from (c).

The CNN was trained and evaluated using ground truth labels, which were obtained by manually going through the dataset and annotating each scan according to the perceived scan type. It is possible that the scan type was incorrectly annotated for some of the scans. To limit this possibility, we took a second look at scans where there was a mismatch between the prediction from DeepDicomSort and the ground truth label, both for the train datasets and the test datasets. We corrected the ground truth label for scans that were incorrectly annotated, and these corrected labels were used for the experiments presented in this paper. The labels of around 0.1% of the scans in the dataset were corrected in this way. Although it is possible that there were still some incorrectly annotated scans, based on these findings we expect this fraction to be very small.

We chose a CNN as the basis of our method because we wanted to minimize the number of pre-processing steps. Using more traditional machine learning approaches, such as a support vector machine or random forest, would require the extraction of relevant features from each scan. This would complicate our method, as we would first have to hand-craft these features and add a pre-processing step in which we extract these features from the scan. Furthermore, the extraction of these features would likely require a brain mask to prevent the features from being influenced too much by the background. The creation of this brain mask would add a pre-processing step and could be a potential source of error. Instead, by using a CNN, no features had to be defined, as the CNN automatically learns the relevant features. The CNN also does not require a brain mask, as it has learned to ignore the background and focus on the brain itself, as shown by the saliency maps.
We opted for a 2D CNN instead of a 3D CNN because this allowed us to extract a larger region of the scan to be used as an input for the CNN. By using a 2D CNN, this region could encompass a full slice of the brain, enabling the CNN to learn features that capture the relative differences in appearance of the various tissue types (white matter, gray matter, CSF, bone, skin, etc.), which are characteristic of the scan type. Furthermore, because a 2D CNN typically requires less memory than a 3D CNN (Prasoon et al. 2013), it requires less computational power (making our method accessible to a broader audience), and it also requires less time to train and evaluate (Li et al. 2014).

Our method achieved a better overall accuracy and per-class accuracy than HeuDiConv. The results obtained using HeuDiConv show the difficulty of creating a method based on DICOM tags that generalizes well to other datasets. Even within one dataset, it can be difficult to create a heuristic that correctly maps the scan metadata to the scan type; for example, Table 4 shows that 1215 different series descriptions are used just for the eight scan types considered in this research. HeuDiConv has particular difficulty in identifying scans that have similar metadata but different scan types. For example, this is reflected in the results for the T1w and T1wC scans: these scans usually have similar scan settings and series descriptions, making it hard to determine whether a scan was obtained pre- or post-contrast administration. The same difficulty plays a role for T2w and PDw scans, which are often acquired at the same time in a combined imaging sequence and thus have the same series description. In our timing results (Appendix D), it was faster to sort the dataset by hand than to use HeuDiConv. This was caused by HeuDiConv often misclassifying T2w-FLAIR and T1wC scans as a different scan type, so a lot of manual time was needed to correct these mistakes.

A method that, similar to ours, classifies the scan type based on the visual appearance of the scan was proposed by Remedios et al. (2018), called Φ-net. Their method can identify T1w, T1wC, T2w, and pre-contrast and post-contrast FLAIR scans. Remedios et al. do this using a cascaded CNN approach, where a first CNN is used to classify a scan as T1-weighted, T2-weighted, or FLAIR. Two other CNNs are then used to classify a scan as pre-contrast or post-contrast: one CNN for the T1-weighted scans and one CNN for the FLAIR scans. Φ-net achieved an overall accuracy of 97.6%, which is lower than our overall accuracy of 98.7% (Experiment I) and 98.5% (Experiment II). Since Remedios et al. did not make their trained model publicly available, it was not possible to directly compare performances on the same dataset. Remedios et al. tested their method on 1281 scans, which came from 4 different sites and 5 different scanner models. Their dataset was thus considerably smaller and less heterogeneous than our test set. Furthermore, our method can identify more scan types and does so using only a single CNN instead of three.

A limitation of our method is that it can only classify a scan as one of the eight scan types for which it was trained. Thus, when it is presented with an unknown scan type (e.g. PWI-ASL or dynamic contrast-enhanced perfusion-weighted imaging), our method will (wrongly) predict it as one of the other classes. In future work, this limitation could be addressed in two ways. The first option would be to adapt the network to either recognize more scan types or to replace one of the existing classes by a different one. This can be done using a transfer learning approach, by fine-tuning the weights obtained in this research on additional data (Tajbakhsh et al. 2016). Since we did not have enough data for other scan types, we limited the CNN to the eight classes for which we did have enough data. A second option would be to extend our method to allow out-of-distribution detection (DeVries and Taylor 2018). In this methodology, the network could not only predict the scan type of a scan, but could also indicate whether a scan belongs to an unknown scan type. This requires a significant change to the model architecture, which we considered outside the scope of this research for now.
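As an illustration of the first option, a fine-tuning setup in Keras could look like the sketch below. The file name, the number of new classes, the frozen layers, and the learning rate are all assumptions made for the example rather than a recommended recipe, and "pretrained_model.h5" is a hypothetical path rather than one of the released models.

    import tensorflow as tf

    # Load previously trained weights and swap the 8-class output layer for a
    # new head covering additional scan types (illustrative sketch only).
    base = tf.keras.models.load_model("pretrained_model.h5")   # hypothetical path
    features = base.layers[-2].output          # representation before the softmax
    new_output = tf.keras.layers.Dense(10, activation="softmax",
                                       name="new_scan_types")(features)
    finetuned = tf.keras.Model(base.input, new_output)

    # Optionally freeze the earlier layers and fine-tune with a small learning rate.
    for layer in finetuned.layers[:-1]:
        layer.trainable = False
    finetuned.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                      loss="categorical_crossentropy", metrics=["accuracy"])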
Another limitation is the use of reorient2std from FSL, which means that (this part of) the code cannot be used in a commercial setting. Commercially allowed alternatives exist, such as the 'reorient image' function from ANTs (http://stnava.github.io/ANTs/); however, these have not been tested as part of the DeepDicomSort pipeline.

A promising future direction could be to predict the metadata of a scan based on its visual appearance. For example, one could predict the sequence that has been used to acquire a scan (e.g. MPRAGE or MP2RAGE in the case of a T1w scan), or reconstruct the acquisition settings of a scan (e.g. the spin echo time). In this research, we did not consider these types of predictions because we first wanted to focus on the dataset organization; however, we think that our method can provide a basis for these types of predictions.

Conclusion

We developed an algorithm that can recognize T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, and derived brain MRI scans with high accuracy, outperforming the currently available methods.
We have made our code and trained models publicly available under an Apache 2.0 license. Using the code and the trained models, one can run the DeepDicomSort pipeline and structure a dataset either according to the BIDS standard or according to a self-defined layout. We think that scan type recognition is an essential step in any data curation pipeline used in medical imaging. With this method, and by making our code and trained models available, we can automate this step in the pipeline and make working with large, heterogeneous datasets easier, faster, and more accessible.

Information Sharing Statement

Code and trained models for the algorithms constructed in this paper are publicly available on GitHub under an Apache 2.0 license at https://github.com/Svdvoort/DeepDicomSort. Part of the pre-processing code depends on FSL. Since FSL is only licensed for non-commercial use, (this part of) the code cannot be used in a commercial setting. All data used in this research is publicly available. The Cancer Imaging Archive collections mentioned are all publicly available at cancerimagingarchive.net (RRID:SCR_008927). The datasets from the Norwegian National Advisory Unit for Ultrasound and Image-Guided Therapy are publicly available at sintef.no/projectweb/usigt-en/data. The BITE collection is publicly available at nist.mni.mcgill.ca/?page_id=672. The Alzheimer's Disease Neuroimaging Initiative (RRID:SCR_003007) data is available at adni.loni.usc.edu, after submitting an application which must be approved by the ADNI Data Sharing and Publications Committee.

Acknowledgements  Sebastian van der Voort acknowledges funding by the Dutch Cancer Society (KWF project number EMCR 2015-7859). We would like to thank Nvidia for providing the GPUs used in this research. This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. Data used in this publication were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC). Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Compliance with Ethical Standards

Conflict of interests  The authors declare that they have no conflict of interest.

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: Model Parameter Selection
To determine the optimal model parameters (i.e. the CNN architecture, pre-processing settings, and optimizer settings) of the CNN used in the DeepDicomSort pipeline, we evaluated the performance of different model parameters on the brain tumor train set, the train set from Experiment I. Before carrying out the experiments, the brain tumor train set was partitioned into a train set and a validation set: 85% of the scans was used as a train set and 15% of the scans was used as a validation set. Only one such split was made, since training and validating the network for multiple splits would be too time-consuming. During the splitting, all slices of a scan were either all in the train set or all in the validation set, to prevent data leakage between the train set and validation set.

We compared five different CNN architectures: the architecture proposed in this paper, AlexNet (Krizhevsky et al. 2012), ResNet18 (He et al. 2016), DenseNet121 (Huang et al. 2017), and VGG19 (Simonyan and Zisserman 2015). For all networks, the same pre-processing approach as described in Section "Pre-Processing" was used, with the optimizer settings as described in Section "Network". The only difference was that the learning rate reduction was based on the validation loss instead of the training loss.
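The scan-level 85/15 split described above can be reproduced with a grouped splitter, so that the 25 slices of one scan never end up on both sides of the split. The sketch below uses scikit-learn's GroupShuffleSplit on toy arrays; the variable names and toy data are ours, for illustration only.

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)
    slices = rng.random((100, 256, 256))        # toy stand-in for real slices
    labels = rng.integers(0, 8, size=100)       # toy class labels
    scan_ids = np.repeat(np.arange(4), 25)      # 4 scans x 25 slices each

    # All slices of a scan share the same group, so a grouped split keeps
    # them together and prevents slice-level leakage.
    splitter = GroupShuffleSplit(n_splits=1, train_size=0.85, random_state=42)
    train_idx, val_idx = next(splitter.split(slices, labels, groups=scan_ids))
    train_slices, val_slices = slices[train_idx], slices[val_idx]
    train_labels, val_labels = labels[train_idx], labels[val_idx]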
We compared five different CNN architectures: the architecture proposed in this paper, AlexNet (Krizhevsky et al. 2012), ResNet18 (He et al. 2016), DenseNet121 (Huang et al. 2017), and VGG19 (Simonyan and Zisserman 2015). For all networks, the same pre-processing approach as described in Section "Pre-Processing" was used, with the optimizer settings as described in Section "Network". The only difference was that the learning rate reduction was based on the validation loss instead of the training loss. For the VGG19 model, the initial learning rate was lowered to 0.0001, as the model would otherwise get stuck in a poor minimum early in the training stage. Different pre-processing settings (e.g. different normalization settings) and model settings (e.g. learning rate) were also tested; however, here we only show the effect of the different architectures, using the same pre-processing settings for all models, both to make a fair comparison and because these pre-processing settings gave the best results.

The learning curves for the different models are shown in Fig. 9. The learning curve for the AlexNet model (Fig. 9b) shows that this model is the only one that was not capable of properly learning the task at hand, probably due to the low number of weights that can be optimized in this model. Except for the AlexNet model, all the other models were able to properly learn, and the final validation accuracy was roughly the same for all models. The DenseNet model achieved the highest validation accuracy of 98%; a full overview of the performance of the different CNN architectures can be found in Table 7.

Table 7  Overall training accuracy, overall validation accuracy, and the time it took to train each network for the different network architectures tested in the model parameter selection

                   Train   Validation   Training time (h)
Our model          0.999   0.971        14.8
2D AlexNet         0.255   0.255        11.0
2D DenseNet121     1.000   0.980        21.8
2D ResNet18        1.000   0.973        26.4
2D VGG19           1.000   0.948        34.6
3D DenseNet121     1.000   0.868        22.9
3D ResNet18        1.000   0.832        27.7

A train/validation split of the brain tumor train set was used to determine the performance.

These results show that multiple models work for the problem at hand. Ultimately, we chose to employ our proposed architecture because it is less computationally intensive than the other models. Not only does our model train faster (as shown in Table 7), it also requires less time to predict the scan type of new scans and requires less (GPU) memory. Selecting the least computationally intensive model allows a wider adoption of our tool.

We also trained two 3D models to compare their performance with the 2D models. For the 3D models, most of the pre-processing steps were kept the same, apart from the slice extraction: instead of extracting 25 slices, 3D patches with a size of 90 x 90 x 15 voxels were extracted. A maximum of 10 patches per scan was extracted, in such a way that the patches covered as much of the (geometrical) center of the scan as possible, to ensure that they contained brain and not just background. We trained a 3D ResNet18 and a 3D DenseNet121; the learning curves can be seen in Fig. 9f and g. These 3D architectures achieved a lower validation accuracy than their 2D counterparts: 0.87 versus 0.98 for the DenseNet model and 0.83 versus 0.97 for the ResNet model. These results justified our choice for a 2D model, which not only achieved a higher accuracy but was also less computationally intensive.

Fig. 9  Learning curves of the different network architectures tested in the model parameter selection

Appendix B: Confusion Matrices

The confusion matrices for Experiment I (Table 8) and Experiment II (Table 9) show the relation between the ground truth scan type and the predicted scan type.

Table 8  Confusion matrix of results from Experiment I

Ground truth   Predicted
               T1w   T1wC   T2w   PDw   T2w-FLAIR   DWI   PWI-DSC   Derived
T1w            433   2      0     1     0           0     0         0
T1wC           2     608    0     0     0           0     0         0
T2w            1     2      292   0     0           0     0         0
PDw            0     0      0     181   0           0     0         0
T2w-FLAIR      18    0      0     0     238         0     0         0
DWI            0     0      0     0     0           344   2         1
PWI-DSC        0     0      0     0     0           0     87        0
Derived        0     0      1     0     0           0     0         156

Table 9  Confusion matrix of results from Experiment II

Ground truth   Predicted
               T1w    T1wC   T2w    PDw    T2w-FLAIR   DWI   PWI-DSC   Derived
T1w            2655   1      0      0      0           0     0         0
T1wC           0      0      0      0      0           0     0         0
T2w            0      34     2140   44     0           0     0         0
PDw            0      0      2      1067   0           0     0         0
T2w-FLAIR      6      5      0      1      468         12    0         0
DWI            0      0      0      0      0           557   3         0
PWI-DSC        0      0      0      0      0           0     0         0
Derived        0      0      1      0      0           3     0         228

Appendix C: Predictive Performance on Per-Slice Basis

Table 10 shows the accuracy of the CNNs from Experiment I and Experiment II on a per-slice basis instead of on a per-scan basis. These results are obtained by comparing the predicted class of a slice directly with the ground truth class of that slice, before the individual slice predictions are combined by a majority vote to obtain the scan type.
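The per-scan decision contrasted with the per-slice numbers above can be sketched as a simple majority vote over the per-slice outputs. The function and variable names below are illustrative only, not the paper's implementation.

```python
from collections import Counter

SCAN_TYPES = ["T1w", "T1wC", "T2w", "PDw", "T2w-FLAIR", "DWI", "PWI-DSC", "Derived"]

def predict_scan_type(slice_probabilities):
    """Combine per-slice class probabilities into one scan-level label via majority vote."""
    per_slice_labels = [max(range(len(p)), key=p.__getitem__) for p in slice_probabilities]
    most_common_class, _ = Counter(per_slice_labels).most_common(1)[0]
    return SCAN_TYPES[most_common_class]

# Example: 25 slices, 8 classes, with most slices voting for class 4 (T2w-FLAIR).
fake_probs = [[0.05] * 8 for _ in range(25)]
for i, p in enumerate(fake_probs):
    p[4 if i < 20 else 0] = 0.6
print(predict_scan_type(fake_probs))  # T2w-FLAIR
```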
Table 10  Overall accuracy and per-class accuracy achieved by DeepDicomSort in Experiment I and Experiment II on a per-slice basis

             Experiment I   Experiment II
Overall      0.934          0.851
T1w          0.942          0.814
T1wC         0.940          N/A
T2w          0.926          0.894
PDw          0.905          0.914
T2w-FLAIR    0.879          0.592
DWI          0.985          0.943
PWI-DSC      0.925          N/A
Derived      0.990          0.908

Appendix D: Time Comparison Between DeepDicomSort, HeuDiConv and Manual Sorting

We estimated the potential time that can be saved by using DeepDicomSort to sort a dataset instead of doing so by hand or using HeuDiConv. We did so by assuming the hypothetical situation where one has an automated tool that requires the T1wC and T2w-FLAIR scans as inputs, and we compared the time needed to find the T1wC and T2w-FLAIR scans for all subjects and sessions in the brain tumor test set. The manual sorting was simulated by iterating over all scans in a session in random order until either the T1wC and T2w-FLAIR scans were found or until there were no more scans to check. The sorting of the dataset using HeuDiConv or DeepDicomSort was simulated by first iterating over all scans that were predicted as a T1wC or T2w-FLAIR by these methods, and checking whether that prediction was correct. If the predicted scan type was incorrect, the same approach as for the manual sorting was followed to find the correct scans. We assumed that, on average, a human requires 25 seconds per scan to visually identify the correct scan type. By multiplying this time per scan with the total number of scans that were iterated over, we obtained an estimate for the total time taken by each method to find the T1wC and T2w-FLAIR scans. We used the brain tumor test set to evaluate the timing results, since HeuDiConv was only optimized for the brain tumor dataset.

D.1 Results

The time required to identify the T1wC and T2w-FLAIR scan for each session in the brain tumor test set by hand was estimated to be 29.0 hours. The estimated time required to check and correct the automated scan type recognition by HeuDiConv was 35.7 hours, which excludes the time required to construct the heuristic. If the automated scan type recognition was done by DeepDicomSort instead, we estimated that 12.3 hours of manual time were required. The time required to run the DeepDicomSort pipeline on the dataset was 61.5 minutes, using an Intel Xeon Processor E5-2690 v3 for pre-processing and post-processing and an Nvidia Tesla K40m GPU to classify the samples using the CNN.

If the scans identified by DeepDicomSort were used without a manual check, in which case the total sorting time was only 61.5 minutes, 527 scans would have been correctly identified. Four scans were incorrectly identified as a T1wC or T2w-FLAIR scan, for one session the T1wC would not have been found, and for 8 sessions the T2w-FLAIR would not have been found.

It should be noted that with the automated methods (DeepDicomSort and HeuDiConv), one gets a fully sorted dataset, whereas the sorting by hand still requires the sorting of the scans that were not yet identified.
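A minimal sketch of the manual-search part of this simulation, under the stated assumption of 25 seconds of visual inspection per scan; the session list and function are illustrative only.

```python
import random

SECONDS_PER_SCAN = 25  # assumed average time to visually identify one scan type

def manual_search_time(session_scan_types, wanted=("T1wC", "T2w-FLAIR")):
    """Simulate checking scans in random order until all wanted scan types are found."""
    remaining = set(wanted)
    checked = 0
    for scan_type in random.sample(session_scan_types, k=len(session_scan_types)):
        checked += 1
        remaining.discard(scan_type)
        if not remaining:
            break
    return checked * SECONDS_PER_SCAN

random.seed(0)
session = ["T1w", "T2w", "DWI", "T1wC", "PDw", "T2w-FLAIR"]
print(manual_search_time(session), "seconds for this session")
```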
Appendix E: Saliency Map for Full Scan

Figures 10 and 11 show the saliency maps for all 25 slices from the scans of the example subject from Fig. 1. The CNN seems to focus on the same features as in Fig. 7, mostly on the ventricles and on the CSF at the edges of the brain. In the superior slices of the scan, it can be seen that the presence of a tumor does not disrupt the CNN: although it looks at the edge of the tumor, it does not put a lot of focus on the tumor itself. For the most superior slices of the T1w, T1wC and T2w scans, it can be seen that, once the brain is no longer present in the slice, the CNN loses its focus and seems to look randomly throughout the slice.

Fig. 10  Saliency maps for slices 1 through 13 of the subject from Fig. 1
Fig. 11  Saliency maps for slices 14 through 25 of the subject from Fig. 1

Appendix F: Saliency Maps for Additional Examples

F.1 Random Samples from the Brain Tumor Test Set

To show the robustness of our method to differences in scan appearance, as well as to imaging artifacts, we randomly selected 20 slices of each scan type from the brain tumor test set. All of these slices were then passed through the CNN, and we determined the saliency maps along with the predicted class of each slice. This is the prediction based on the slice itself, and thus before the majority vote. The saliency maps and predicted scan types are shown in Figs. 12 and 13. We have highlighted slices that contain imaging artifacts (†), have a poor image quality (†), and subjects with a large head tilt (†). These saliency maps show that the CNN is quite robust to the presence of a tumor, the presence of imaging artifacts, or poor image quality; in most cases the CNN still predicts the correct scan type.
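The saliency maps discussed in Appendices E and F follow the gradient-based approach of Simonyan et al. (2014): the gradient of the predicted class score with respect to the input slice. The TensorFlow sketch below illustrates the idea; the toy model only makes the example runnable and does not reproduce the DeepDicomSort architecture or its input size.

```python
import numpy as np
import tensorflow as tf

def saliency_map(model, slice_2d):
    """Gradient-based saliency (Simonyan et al. 2014): |d(top class score)/d(input)|."""
    x = tf.convert_to_tensor(slice_2d[np.newaxis, ..., np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        scores = model(x, training=False)
        top_score = tf.reduce_max(scores, axis=-1)
    grads = tape.gradient(top_score, x)
    return tf.abs(grads)[0, ..., 0].numpy()

# Toy stand-in network so the sketch runs; it is not the DeepDicomSort CNN.
toy_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(8, activation="softmax"),
])
example_slice = np.random.rand(64, 64).astype(np.float32)
print(saliency_map(toy_model, example_slice).shape)  # (64, 64)
```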
Fig. 12  Saliency maps and predicted scan type of randomly drawn samples from the brain tumor test set
Fig. 13  Saliency maps and predicted scan type of randomly drawn samples from the brain tumor test set

F.2 Random Samples from ADNI Dataset

The same approach as in Appendix F.1 has been applied to show the saliency maps for random samples of the ADNI dataset. In this case, the saliency maps were derived using the trained model from Experiment II instead of Experiment I. Once again, the saliency maps and the predicted scan types are shown in Figs. 14 and 15. We have highlighted slices that contain imaging artifacts, including hippocampus scans with a limited field of view (†), have a poor image quality (†), and subjects with a large head tilt (†).

Fig. 14  Saliency maps and predicted scan type of randomly drawn samples from the ADNI dataset
Fig. 15  Saliency maps and predicted scan type of randomly drawn samples from the ADNI dataset

F.3 Robustness Against Bright Noise

To test the effect of potential bright spots in a scan, we performed an experiment where random bright spots were introduced in the slices from Fig. 1. Within each slice, 0.5% of the voxels were randomly chosen, and the intensity of these voxels was set to the maximum intensity of the slice. We then determined the saliency maps for these slices and the predicted scan type; the results are shown in Fig. 16.

These results show that our method is quite robust against bright spots in a scan. Only for the T1w and PWI-DSC scans were there slices that were misclassified. In the case of the T1w slice, two out of five slices were predicted to be T1wC. This is most likely caused by the CNN having learned that a T1w and a T1wC scan have a similar appearance in general, but that the T1wC scan has brighter spots. In two cases the PWI-DSC slice was misclassified as a DWI, probably because the CNN interpreted the random bright spots outside the skull as imaging artifacts, which often show up in DWI scans and less so in PWI-DSC scans. Although the CNN misclassified the T1w and PWI-DSC slices in some cases, when bright spots were introduced on all 25 slices of the T1w and PWI-DSC scans (randomly for each slice) and the scans were then passed through the network, the CNN still predicted the correct scan type after the majority vote.

Fig. 16  Saliency maps and predicted scan types of the slices derived from Fig. 1 after randomly setting some pixels to the maximum intensity. Every time, the slice with the added noise is shown, followed by the saliency map and predicted scan type for the same slice in the row below
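The bright-spot perturbation of Appendix F.3 (0.5% of the pixels of a slice set to the slice maximum) can be sketched as follows; the array shape and random seed are arbitrary choices made only for this example.

```python
import numpy as np

def add_bright_spots(slice_2d, fraction=0.005, rng=None):
    """Set a random fraction of the pixels (0.5% in Appendix F.3) to the slice maximum."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = slice_2d.copy()
    n_spots = max(1, int(round(fraction * noisy.size)))
    flat_idx = rng.choice(noisy.size, size=n_spots, replace=False)
    noisy.flat[flat_idx] = noisy.max()
    return noisy

rng = np.random.default_rng(0)
slice_2d = rng.random((128, 128), dtype=np.float32)
noisy_slice = add_bright_spots(slice_2d, rng=rng)
print(int(round(0.005 * slice_2d.size)), "pixels were set to the slice maximum")
```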
Appendix G: Feature Map Visualizations

Figures 17 through 22 show the feature maps of all filters of each convolutional layer for the T1w slice shown in Fig. 1. It can be seen that some filters mainly identify the skull (for example, filter 1 from convolutional layer 1), whereas other filters seem to focus on a specific structure (for example, filter 4 from convolutional layer 1, which seems to identify gray matter).

Figs. 17–22  Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layers 1 through 6, respectively. The slice is the same as the one shown in Fig. 1
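Feature maps such as those in Figs. 17 through 22 can be extracted by building a second model that outputs the activations of an intermediate layer. The sketch below uses a toy network and an assumed layer name ("conv_1"); it is not the DeepDicomSort model itself.

```python
import numpy as np
import tensorflow as tf

def feature_maps(model, slice_2d, layer_name):
    """Return the activations of one convolutional layer for a single input slice."""
    probe = tf.keras.Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
    x = slice_2d[np.newaxis, ..., np.newaxis].astype(np.float32)
    return probe.predict(x, verbose=0)[0]  # shape: (height, width, n_filters)

# Toy stand-in network; layer names are assumptions made for this example.
toy_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu", name="conv_1"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu", name="conv_2"),
])
maps = feature_maps(toy_model, np.random.rand(64, 64), "conv_1")
print(maps.shape)  # (64, 64, 32)
```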
References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283). USENIX Association. https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi
Akkus, Z., Ali, I., Sedlář, J., Agrawal, J.P., Parney, I.F., Giannini, C., et al. (2017). Predicting deletion of chromosomal arms 1p/19q in low-grade gliomas from MR images using machine intelligence. Journal of Digital Imaging, 30(4), 469–476. https://doi.org/10.1007/s10278-017-9984-3
Arias, J., Martínez-Gómez, J., Gámez, J.A., de Herrera, A.G.S., Müller, H. (2016). Medical image modality classification using discrete Bayesian networks. Computer Vision and Image Understanding, 151, 61–71. https://doi.org/10.1016/j.cviu.2016.04.002
Barboriak, D. (2015). Data from RIDER NEURO MRI. https://doi.org/10.7937/K9/TCIA.2015.VOSN3HN1
Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., et al. (2013). The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7
DeVries, T., & Taylor, G.W. (2018). Learning confidence for out-of-distribution detection in neural networks. arXiv:1802.04865
Dimitrovski, I., Kocev, D., Kitanovski, I., Loskovska, S., Džeroski, S. (2015). Improved medical image modality classification using a combination of visual and textual features. Computerized Medical Imaging and Graphics, 39, 14–26. https://doi.org/10.1016/j.compmedimag.2014.06.005
Erickson, B., Akkus, Z., Sedlar, J., Korfiatis, P. (2016). Data from LGG-1p19qDeletion. https://doi.org/10.7937/K9/TCIA.2017.dwehtz9v
Esteban, O., Birman, D., Schaer, M., Koyejo, O.O., Poldrack, R.A., Gorgolewski, K.J. (2017). MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE, 12(9), 1–21. https://doi.org/10.1371/journal.pone.0184661
Fyllingen, E.H., Stensjøen, A.L., Berntsen, E.M., Solheim, O., Reinertsen, I. (2016). Glioblastoma segmentation: comparison of three different software packages. PLOS ONE, 11(10), e0164891. https://doi.org/10.1371/journal.pone.0164891
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Teh, Y.W., & Titterington, M. (Eds.), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research (Vol. 9, pp. 249–256). PMLR. http://proceedings.mlr.press/v9/glorot10a.html
Gorgolewski, K.J., Auer, T., Calhoun, V.D., Craddock, R.C., Das, S., Duff, E.P., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1), 160044. https://doi.org/10.1038/sdata.2016.44
Gorgolewski, K., Esteban, O., Schaefer, G., Wandell, B., Poldrack, R. (2017). OpenNeuro - a free online platform for sharing and analysis of neuroimaging data. F1000Research, 6, 1055. https://doi.org/10.7490/f1000research.1114354.1
Greenspan, H., van Ginneken, B., Summers, R.M. (2016). Deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5), 1153–1159. https://doi.org/10.1109/TMI.2016.2553401
He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
Hirsch, J.D., Siegel, E.L., Balasubramanian, S., Wang, K.C. (2015). We built this house; it's time to move in: leveraging existing DICOM structure to more completely utilize readily available detailed contrast administration information. Journal of Digital Imaging, 28(4), 407–411. https://doi.org/10.1007/s10278-015-9771-y
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q. (2017). Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2261–2269). https://doi.org/10.1109/CVPR.2017.243
Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M. (2012). FSL. NeuroImage, 62(2), 782–790. https://doi.org/10.1016/j.neuroimage.2011.09.015
Kingma, D.P., & Ba, J. (2015). Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR, Conference Track Proceedings. arXiv:1412.6980
Krizhevsky, A., Sutskever, I., Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q. (Eds.), Communications of the ACM (Vol. 25, pp. 1097–1105). Curran Associates, Inc. https://doi.org/10.1145/3065386
Lambin, P., Leijenaar, R.T., Deist, T.M., Peerlings, J., de Jong, E.E., van Timmeren, J., et al. (2017). Radiomics: the bridge between medical imaging and personalized medicine. Nature Reviews Clinical Oncology, 14(12), 749–762. https://doi.org/10.1038/nrclinonc.2017.141
LaMontagne, P.J., Keefe, S., Lauren, W., Xiong, C., Grant, E.A., Moulder, K.L., et al. (2018). OASIS-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer's disease. Alzheimer's & Dementia, 14(7), P1097. https://doi.org/10.1016/j.jalz.2018.06.1439
Lee, D., Kim, J., Moon, W.J., Ye, J.C. (2019). CollaGAN: collaborative GAN for missing image data imputation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2482–2491). https://doi.org/10.1109/CVPR.2019.00259
Li, R., Zhang, W., Suk, H.I., Wang, L., Li, J., Shen, D., et al. (2014). Deep learning based imaging data completion for improved brain disease diagnosis. In Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2014 (pp. 305–312). Springer International Publishing. https://doi.org/10.1007/978-3-319-10443-0_39
Li, X., Morgan, P.S., Ashburner, J., Smith, J., Rorden, C. (2016). The first step for neuroimaging data analysis: DICOM to NIfTI conversion. Journal of Neuroscience Methods, 264, 47–56. https://doi.org/10.1016/j.jneumeth.2016.03.001
Li, Z., Wang, Y., Yu, J., Guo, Y., Cao, W. (2017). Deep learning based radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. Scientific Reports, 7(1), 5467. https://doi.org/10.1038/s41598-017-05848-2
Lundervold, A.S., & Lundervold, A. (2019). An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik, 29(2), 102–127. https://doi.org/10.1016/j.zemedi.2018.11.002
Marek, K., Chowdhury, S., Siderowf, A., Lasch, S., Coffey, C.S., Caspell-Garcia, C., et al., the Parkinson's Progression Markers Initiative (2018). The Parkinson's Progression Markers Initiative (PPMI) – establishing a PD biomarker cohort. Annals of Clinical and Translational Neurology, 5(12), 1460–1477. https://doi.org/10.1002/acn3.644
Martino, A.D., O'Connor, D., Chen, B., Alaerts, K., Anderson, J.S., Assaf, M., et al. (2017). Enhancing studies of the connectome in autism using the autism brain imaging data exchange II. Scientific Data, 4(1), 170010. https://doi.org/10.1038/sdata.2017.10
Mercier, L., Del Maestro, R.F., Petrecca, K., Araujo, D., Haegelen, C., Collins, D.L. (2012). Online database of clinical MR and ultrasound images of brain tumors. Medical Physics, 39(6 Part 1), 3253–3261. https://doi.org/10.1118/1.4709600
Montagnon, E., Cerny, M., Cadrin-Chênevert, A., Hamilton, V., Derennes, T., Ilinca, A., et al. (2020). Deep learning workflow in radiology: a primer. Insights into Imaging, 11(1), 22. https://doi.org/10.1186/s13244-019-0832-5
Moore, S.M., Maffitt, D.R., Smith, K.E., Kirby, J.S., Clark, K.W., Freymann, J.B., et al. (2015). De-identification of medical images with retention of scientific research value. RadioGraphics, 35(3), 727–735. https://doi.org/10.1148/rg.2015140244
National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) (2018). Radiology data from the Clinical Proteomic Tumor Analysis Consortium Glioblastoma Multiforme CPTAC-GBM collection. https://doi.org/10.7937/k9/tcia.2018.3rje41q1
Nie, D., Zhang, H., Adeli, E., Liu, L., Shen, D. (2016). 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016 (pp. 212–220). Springer. https://doi.org/10.1007/978-3-319-46723-8_25
Pedano, N., Flanders, A.E., Scarpace, L., Mikkelsen, T., Eschbacher, J.M., Hermes, B., et al. (2016). Radiology data from The Cancer Genome Atlas Low Grade Glioma [TCGA-LGG] collection. https://doi.org/10.7937/K9/TCIA.2016.L4LTD3TK
Pereira, S., Pinto, A., Alves, V., Silva, C.A. (2015). Deep convolutional neural networks for the segmentation of gliomas in multi-sequence MRI. In Crimi, A., Menze, B., Maier, O., Reyes, M., Handels, H. (Eds.), Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes in Computer Science (Vol. 9556, pp. 131–143). Springer. https://doi.org/10.1007/978-3-319-30858-6_12
Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M. (2013). Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (Eds.), Advanced Information Systems Engineering (pp. 246–253). Springer, Berlin. https://doi.org/10.1007/978-3-642-40763-5_31
Prevedello, L.M., Halabi, S.S., Shih, G., Wu, C.C., Kohli, M.D., Chokshi, F.H., et al. (2019). Challenges related to artificial intelligence research in medical imaging and the importance of image analysis competitions. Radiology: Artificial Intelligence, 1(1), e180031. https://doi.org/10.1148/ryai.2019180031
Remedios, S., Roy, S., Pham, D.L., Butman, J.A. (2018). Classifying magnetic resonance image modalities with convolutional neural networks. In Petrick, N., & Mori, K. (Eds.), Medical Imaging 2018: Computer-Aided Diagnosis (Vol. 10575, pp. 558–563). SPIE. https://doi.org/10.1117/12.2293943
Scarpace, L., Flanders, A.E., Jain, R., Mikkelsen, T., Andrews, D.W. (2015). Data from REMBRANDT. https://doi.org/10.7937/K9/TCIA.2015.588OZUZB
Scarpace, L., Mikkelsen, T., Cha, S., Rao, S., Tekchandani, S., Gutman, D., et al. (2016). Radiology data from The Cancer Genome Atlas Glioblastoma Multiforme [TCGA-GBM] collection. https://doi.org/10.7937/K9/TCIA.2016.RNYFUYE9
Schmainda, K., & Prah, M. (2018). Data from Brain-Tumor-Progression. https://doi.org/10.7937/K9/TCIA.2018.15quzvnb
Shah, N., Feng, X., Lankerovich, M., Puchalski, R.B., Keogh, B. (2016). Data from Ivy GAP. https://doi.org/10.7937/K9/TCIA.2016.XLwaN6nL
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Bengio, Y., & LeCun, Y. (Eds.), International Conference on Learning Representations, ICLR, Conference Track Proceedings. https://dblp.org/rec/html/journals/corr/SimonyanZ14a
Simonyan, K., Vedaldi, A., Zisserman, A. (2014). Deep inside convolutional networks: visualising image classification models and saliency maps. In Bengio, Y., & LeCun, Y. (Eds.), International Conference on Learning Representations, ICLR, Workshop Track Proceedings. http://arxiv.org/abs/1312.6034
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A. (2015). Striving for simplicity: the all convolutional net. In International Conference on Learning Representations, ICLR, Workshop Track Proceedings. http://arxiv.org/abs/1412.6806
Srinivas, M., & Mohan, C.K. (2014). Medical images modality classification using multi-scale dictionary learning. In 2014 19th International Conference on Digital Signal Processing (pp. 621–625). https://doi.org/10.1109/ICDSP.2014.6900739
Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., et al. (2016). Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Transactions on Medical Imaging, 35(5), 1299–1312. https://doi.org/10.1109/TMI.2016.2535302
Tamada, D., Kromrey, M.L., Ichikawa, S., Onishi, H., Motosugi, U. (2020). Motion artifact reduction using a convolutional neural network for dynamic contrast enhanced MR imaging of the liver. Magnetic Resonance in Medical Sciences, 19(1), 64–76. https://doi.org/10.2463/mrms.mp.2018-0156
van Erp, T.G.M., Chervenak, A.L., Kesselman, C., D'Arcy, M., Sobell, J., Keator, D., et al. (2011). Infrastructure for sharing standardized clinical brain scans across hospitals. In 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) (pp. 1026–1028). https://doi.org/10.1109/BIBMW.2011.6112547
van Ooijen, P.M.A. (2019). Quality and curation of medical images and data (pp. 247–255). Springer International Publishing. https://doi.org/10.1007/978-3-319-94878-2_17
Wang, S., Pavlicek, W., Roberts, C.C., Langer, S.G., Zhang, M., Hu, M., et al. (2011). An automated DICOM database capable of arbitrary data mining (including radiation dose indicators) for quality monitoring. Journal of Digital Imaging, 24(2), 223–233. https://doi.org/10.1007/s10278-010-9329-y
Xiao, Y., Fortin, M., Unsgård, G., Rivaz, H., Reinertsen, I. (2017). REtroSpective evaluation of Cerebral Tumors (RESECT): a clinical database of pre-operative MRI and intra-operative ultrasound in low-grade glioma surgeries. Medical Physics, 44(7), 3875–3882. https://doi.org/10.1002/mp.12268
Yu, Y., Lin, H., Yu, Q., Meng, J., Zhao, Z., Li, Y., et al. (2015). Modality classification for medical images using multiple deep convolutional neural networks. Journal of Computational Information Systems, 11(15), 5403–5413.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Affiliations

Sebastian R. van der Voort (1) · Marion Smits (2) · Stefan Klein (1) · for the Alzheimer's Disease Neuroimaging Initiative

(1) Biomedical Imaging Group Rotterdam, Departments of Radiology and Nuclear Medicine and Medical Informatics, Erasmus MC - University Medical Centre Rotterdam, Rotterdam, The Netherlands
(2) Department of Radiology and Nuclear Medicine, Erasmus MC - University Medical Centre Rotterdam, Rotterdam, The Netherlands
With the increasing size of datasets used in medical imaging research, the need for automated data curation is arising. One important data curation task is the structured organization of a dataset for preserving integrity and ensuring reusability. Therefore, we investigated whether this data organization step can be automated. To this end, we designed a convolutional neural network (CNN) that automatically recognizes eight different brain magnetic resonance imaging (MRI) scan types based on visual appearance. Thus, our method is unaffected by inconsistent or missing scan metadata. It can recognize pre- contrast T1-weighted (T1w), post-contrast T1-weighted (T1wC), T2-weighted (T2w), proton density-weighted (PDw) and derived maps (e.g. apparent diffusion coefficient and cerebral blood flow). In a first experiment, we used scans of subjects with brain tumors: 11065 scans of 719 subjects for training, and 2369 scans of 192 subjects for testing. The CNN achieved an overall accuracy of 98.7%. In a second experiment, we trained the CNN on all 13434 scans from the first experiment and tested it on 7227 scans of 1318 Alzheimer’s subjects. Here, the CNN achieved an overall accuracy of 98.5%. In conclusion, our method can accurately predict scan type, and can quickly and automatically sort a brain MRI dataset virtually without the need for manual verification. In this way, our method can assist with properly organizing a dataset, which maximizes the shareability and integrity of the data. Keywords DICOM · Brain imaging · Machine learning · Magnetic resonance imaging · BIDS · Data curation Introduction that is shared in public repositories (Greenspan et al. 2016; Lundervold and Lundervold 2019). However, this increase With the rising popularity of machine learning, deep in available data also means that proper data curation, the learning, and automatic pipelines in the medical imaging management of data throughout its life cycle, is needed to field, the demand for large datasets is increasing. To satisfy keep the data manageable and workable (Prevedello et al. this hunger for data, the amount of imaging data collected at 2019; van Ooijen 2019). One essential data curation step healthcare institutes keeps growing, as is the amount of data is organizing a dataset such that it can easily be used and reused. Properly organizing the dataset maximizes the shareability and preserves the full integrity of the dataset, Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a ensuring repeatability of an experiment and reuse of the Group/Institutional Author. Data used in preparation of this article were obtained from the dataset in other experiments. Alzheimer’s Disease Neuroimaging Initiative (ADNI) database Unfortunately, the organization of medical imaging data (adni.loni.usc.edu). As such, the investigators within the ADNI is not standardized, and the format in which a dataset is pro- contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing vided often differs between sources (Lambin et al. 2017; of this report. A complete listing of ADNI investigators can van Ooijen 2019). Efforts such as the brain imaging data be found at: http://adni.loni.ucla.edu/wpcontent/uploads/how to structure (BIDS) (Gorgolewski et al. 2016) propose a stan- apply/ADNI Acknowledgement List.pdf dardized data structure, to which some public data repos- Sebastian R. van der Voort itories adhere (e.g. OpenNeuro Gorgolewski et al. 
However, other repositories do not conform to this standard (e.g. The Cancer Imaging Archive (TCIA) (Clark et al. 2013), the Alzheimer's Disease Neuroimaging Initiative (ADNI), and PPMI (Marek et al. 2018)). Furthermore, similar to some prospectively collected research data, retrospectively collected data from clinical practice usually does not follow a standardized format either (van Ooijen 2019). Thus, the challenge of structuring a dataset, either into a BIDS compliant dataset or a different format, remains.

When using a medical imaging dataset in a research project, one needs to select the scan types that are relevant to the research question (Montagnon et al. 2020; Lambin et al. 2017). Thus, it is essential to identify the scan type of each scan when sorting a medical imaging dataset. Different data sources do not use consistent naming conventions in the metadata of a scan (e.g. the series description), which complicates the automatic identification of the scan type (van Ooijen 2019; Wang et al. 2011). Moreover, in some cases this metadata is not consistently stored (e.g. contrast administration; Hirsch et al. 2015) and might even be partially or entirely missing, as can be the case for anonymized data (Moore et al. 2015). As a result, the sorting is frequently done manually, by looking at each scan and labeling it according to the perceived scan type. This manual labeling can be a very time-consuming task, which hampers scientific progress; thus, it is highly desirable to automate this step of the data curation pipeline. Similar arguments concerning the complexity of medical imaging data and the importance of data structuring also motivated the creation of the BIDS standard (Gorgolewski et al. 2016).

Previous research has focused on modality recognition (Dimitrovski et al. 2015; Yu et al. 2015; Arias et al. 2016), as well as on distinguishing different MRI scan types (Srinivas and Mohan 2014; Remedios et al. 2018). Only one of these studies (Remedios et al. 2018) considered the prediction of the scan type of MRI scans, and it predicted four scan types, namely pre-contrast T1-weighted (T1w), post-contrast T1-weighted (T1wC), fluid-attenuated inversion recovery (FLAIR) and T2-weighted (T2w) scans. However, with the increasing popularity of multi-parametric MRI in machine learning algorithms and automatic pipelines (Li et al. 2017; Akkus et al. 2017; Nie et al. 2016; Pereira et al. 2015), the need to recognize more scan types is arising.

In this research, we propose a method, called DeepDicomSort, that recognizes eight different scan types of brain MRI scans, and facilitates sorting into a structured format. DeepDicomSort is a pipeline consisting of a pre-processing step to prepare scans as inputs for a convolutional neural network (CNN), a scan type recognition step using a CNN, and a post-processing step to sort the identified scan types into a structured format. Our method identifies T1w, T1wC, T2w, proton density-weighted (PDw), T2-weighted fluid-attenuated inversion recovery (T2w-FLAIR), diffusion-weighted imaging (DWI) (including trace/isotropic images), perfusion-weighted dynamic susceptibility contrast (PWI-DSC) scans, and diffusion-weighted and perfusion-weighted derived maps (including, for example, apparent diffusion coefficient (ADC), fractional anisotropy, and relative cerebral blood flow). Once the scan types have been identified, DeepDicomSort can organize the dataset into a structured, user-defined layout or turn the dataset into a BIDS compliant dataset. We made all our source code, including code for the pre-processing and post-processing, and pre-trained models publicly available (https://github.com/Svdvoort/DeepDicomSort), to facilitate reuse by the community.

Materials & Methods

Terminology

Since the exact meaning of specific terms can differ depending on one's background, we have provided an overview of the terminology as it is used in this paper in Table 1. We have tried to adhere to the terminology used by BIDS as much as possible, and have provided the equivalent BIDS terminology in Table 1 as well. We differ from the BIDS terminology regarding two terms: scan and scan type. Scan type is referred to as modality in BIDS, but to avoid confusion with the more common use of modality to indicate different types of equipment (e.g. MRI and computed tomography (CT)), we instead use scan type. Scan is used instead of "data acquisition" or "run" as used in BIDS, to be more in line with common terminology and to avoid confusion with other types of data acquisition. We define a structured dataset as a dataset where all the data for the different subjects and scans is provided in the same way, for example a folder structure with a folder for each subject, session and scan, with a consistent naming format for the different folders and scan types. A standardized dataset is a dataset where the data has been structured according to a specific, public standard, for example BIDS.

Table 1 Overview of terminology used in this paper, the corresponding BIDS terminology and meaning of each term

Term | BIDS term | Meaning
Modality | Modality | Type of technique used to acquire a scan (e.g. MRI, CT)
Subject | Subject | A person participating in a study
Site | Site | Institute at which a scan of the subject has been acquired
Session | Session | A single visit of a subject to a site in which one or more scans have been acquired
Scan | Data acquisition/run | A single 3D image that has been acquired of a subject in a session
Slice | N/A | A single 2D cross-section that has been extracted from a scan
Scan type | Modality | Specific visual appearance category of a scan (e.g. T1w, T2w)
Sample | N/A | A single input for the CNN
Class | N/A | An output category of the CNN
DICOM | DICOM | A data format used to store medical imaging data. In addition to the imaging data, DICOM files can also store metadata about the scanner equipment, the specific imaging protocol and clinical information.
NIfTI | NIfTI | A data format used to store (neuro) medical imaging data.

Data

An extensive collection of data from multiple different sources was used to construct our method and evaluate its performance. We used MRI scans of subjects with brain tumors, as well as scans of subjects without brain tumors. To ensure sufficient heterogeneity in our dataset, we included scans from multiple different sources, and we only excluded scans if their scan type did not fall into one of the eight categories that we aimed to predict with our method. Thus, no scans were excluded based on other criteria such as low image quality, the occurrence of imaging artifacts, scanner settings, or disease state of the subject.
Brain Tumor Dataset

Our method was initially developed and subsequently tested on brain MRI scans of subjects with brain tumors. Scans of subjects with brain tumors were used because the brain tumor imaging protocols used to acquire these scans usually span a wide array of scan types, including pre-contrast and post-contrast scans. The brain tumor dataset consisted of a train set and an independent test set, which in total included data from 11 different sources. The subjects were distributed among the brain tumor train set and brain tumor test set before starting any experiments, and the data was divided such that the distribution of the scan types was similar in the train set and the test set. We chose to put all subjects that originated from the same dataset in either the train set or the test set to test the generalizability of our algorithm. Thus, all scans of a subject were either all in the brain tumor train set or all in the brain tumor test set, and no data leak could take place, precluding an overly optimistic estimation of the performance of our method. In this way, a good performance of our method on the test set could not be the result of the algorithm having learned features that are specific to a particular site or scanner.

The brain tumor train set contained 11065 scans of 1347 different sessions from 719 subjects. These scans were included from the Brain-Tumor-Progression (Schmainda and Prah 2018), Ivy Glioblastoma Atlas Project (Ivy GAP) (Shah et al. 2016), LGG-1p19qDeletion (Erickson et al. 2016; Akkus et al. 2017), TCGA-GBM (Scarpace et al. 2016) and TCGA-LGG (Pedano et al. 2016) collections from TCIA (Clark et al. 2013). Two datasets from The Norwegian National Advisory Unit for Ultrasound and Image Guided Therapy (USIGT) (Fyllingen et al. 2016; Xiao et al. 2017) were also included in the brain tumor train set. In total, the data originated from 17 different sites, and the scans were acquired on at least 29 different scanner models from 4 different vendors (GE, Hitachi, Philips, and Siemens).

The brain tumor test set contained 2369 scans of 302 different sessions from 192 subjects. These scans were included from the brain images of tumors for evaluation (BITE) dataset (Mercier et al. 2012), as well as the Clinical Proteomic Tumor Analysis Consortium Glioblastoma Multiforme (CPTAC-GBM) (National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) 2018), Repository of Molecular Brain Neoplasia Data (REMBRANDT) (Scarpace et al. 2015), and Reference Image Database to Evaluate Therapy Response: Neuro MRI (RIDER Neuro MRI) (Barboriak 2015) collections from the TCIA. In total, the data originated from 8 different sites, and the scans were acquired on at least 15 different scanner models from 4 different vendors (GE, Philips, Siemens, and Toshiba). For some scans, the scanner type was not available in the DICOM tags (DICOM tag (0008, 1090)); thus, the variation in the number of scanners could be even larger.

All subjects included in the brain tumor dataset had a (pre-operative or post-operative) brain tumor. The scans in the datasets were manually sorted, and T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, and derived images were identified. The different types of derived images were combined into a single category, as the derivation of these images is often inconsistent among scanners and vendors, and thus these images need to be rederived from the raw data (e.g. the original DWI or PWI-DSC scan).
The details of the brain tumor train set and brain tumor test set are presented in Table 2. An example of the eight scan types for a single subject from the brain tumor test set can be seen in Fig. 1.

Table 2 Overview of data in the brain tumor dataset. The number of scans for each scan type and the different spatial orientations (axial, coronal, sagittal and 3D) are specified.

Scan type | Brain tumor train set (Ax / Cor / Sag / 3D / Total) | Brain tumor test set (Ax / Cor / Sag / 3D / Total)
T1w | 580 / 14 / 872 / 454 / 1920 | 206 / 2 / 202 / 26 / 436
T1wC | 964 / 526 / 298 / 1040 / 2828 | 208 / 133 / 97 / 172 / 610
T2w | 1151 / 411 / 23 / 31 / 1616 | 232 / 46 / 16 / 1 / 295
PDw | 413 / 40 / 0 / 0 / 453 | 145 / 36 / 0 / 0 / 181
T2w-FLAIR | 991 / 39 / 4 / 50 / 1084 | 221 / 3 / 0 / 32 / 256
DWI | 1359 / 0 / 0 / 0 / 1359 | 347 / 0 / 0 / 0 / 347
PWI-DSC | 669 / 0 / 0 / 0 / 669 | 87 / 0 / 0 / 0 / 87
Derived | 1136 / 0 / 0 / 0 / 1136 | 157 / 0 / 0 / 0 / 157
Total | 7263 / 1030 / 1197 / 1575 / 11065 | 1603 / 220 / 315 / 231 / 2369

Fig. 1 Examples of the different scan types for a single subject from the brain tumor test set: (a) T1w, (b) T1wC, (c) T2w, (d) PDw, (e) T2w-FLAIR, (f) DWI, (g) PWI-DSC, (h) Derived (ADC).

ADNI Dataset

In order to evaluate the results of the algorithm on non-tumor brain imaging, we used the ADNI dataset (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer's disease. For up-to-date information, see adni-info.org.

We used the baseline and screening data of 1318 subjects, resulting in 7227 scans. These scans originated from 67 different sites and were acquired on 23 different scanner models from 3 different vendors (GE, Philips, and Siemens). Details of the ADNI dataset are presented in Table 3. Since no contrast is administered to subjects in the ADNI study, there are no T1wC or PWI-DSC scans in this dataset. The ADNI dataset does include arterial spin labeling perfusion-weighted imaging (PWI-ASL); however, since our algorithm was not designed to recognize these scans, they were excluded. The derived maps from these PWI-ASL scans were included, since the derived category encompasses all diffusion and perfusion derived imaging. These PWI-ASL derived maps explain the 47 3D scans in Table 3.

Table 3 Overview of data in the ADNI dataset. The number of scans for each scan type and the different spatial orientations (axial, coronal, sagittal and 3D) are specified.

Scan type | Ax | Cor | Sag | 3D | Total
T1w | 0 | 0 | 276 | 2380 | 2656
T1wC | 0 | 0 | 0 | 0 | 0
T2w | 1725 | 488 | 5 | 0 | 2218
PDw | 1069 | 0 | 0 | 0 | 1069
T2w-FLAIR | 1 | 0 | 3 | 488 | 492
DWI | 558 | 0 | 2 | 0 | 560
PWI-DSC | 0 | 0 | 0 | 0 | 0
Derived | 183 | 0 | 2 | 47 | 232
Total | 3536 | 488 | 288 | 2915 | 7227

DeepDicomSort

The pipeline of our proposed method, DeepDicomSort, consisted of three phases:

1. Pre-processing: prepare the scans as an input for the CNN
2. Scan type prediction: obtain the predicted scan type using the CNN
3. Post-processing: use the predictions to sort the dataset

By passing a dataset through this pipeline, it can be turned into a BIDS compliant dataset, or it can be structured according to a user-defined layout. If one chooses to create a BIDS compliant dataset, the scans are stored as NIfTI files; if a user-defined structure is used, the scans are stored as DICOM files. An overview of the DeepDicomSort pipeline is presented in Fig. 2.

Fig. 2 Overview of the DeepDicomSort pipeline. Scans are first converted from DICOM to NIfTI format and pre-processed. During the pre-processing the scan is split into 25 individual slices, which are then classified as one of eight scan types by the CNN. The predictions of the individual slices are combined in a majority vote, and the predicted scan type of each scan is used to structure the dataset. DeepDicomSort can structure either the original DICOM files or the NIfTI files; in the latter case the dataset is turned into a BIDS compliant dataset.
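To make the three phases concrete, the sketch below shows a minimal driver loop in the spirit of the pipeline in Fig. 2. It is an illustration only: the function names (`preprocess_scan`, `predict_slices`) and the flat scan-type folder layout are hypothetical stand-ins, not the actual DeepDicomSort code or its BIDS output.

```python
from pathlib import Path
import shutil

SCAN_TYPES = ["T1w", "T1wC", "T2w", "PDw", "T2w-FLAIR", "DWI", "PWI-DSC", "Derived"]

def sort_dataset(input_dir, output_dir, preprocess_scan, predict_slices):
    """Hypothetical driver mirroring the three DeepDicomSort phases.

    `preprocess_scan` turns one NIfTI scan into its 25 input slices (phase 1),
    `predict_slices` returns one predicted class index per slice (phase 2),
    and the majority vote plus file placement below corresponds to phase 3.
    """
    for scan_path in sorted(Path(input_dir).rglob("*.nii.gz")):
        slices = preprocess_scan(scan_path)                        # phase 1: pre-processing
        votes = list(predict_slices(slices))                       # phase 2: per-slice CNN prediction
        scan_type = SCAN_TYPES[max(set(votes), key=votes.count)]   # majority vote over 25 slices
        target_dir = Path(output_dir) / scan_type                  # phase 3: user-defined layout
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(scan_path, target_dir / scan_path.name)
```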
Pre-Processing

As a first pre-processing step, all DICOM files were converted to NIfTI format using dcm2niix (Li et al. 2016), as this simplifies the further processing of the scans. This step was skipped for the USIGT and BITE datasets, as these were already provided in NIfTI format (no DICOM files were available).

In the next step, the number of dimensions of each scan was automatically determined. Although most scans were 3-dimensional, some scans happened to be 4-dimensional. This was the case for some DWI scans, which consisted of multiple b-values and potentially b-vectors, and for some PWI-DSC scans, which contained multiple time points. If a scan was 4-dimensional, the first (3D) element of the sequence was extracted and was subsequently used instead of the full 4-dimensional scan. This extraction was done to make sure that the CNN would also recognize scan types that generally contain repeats in situations where this was not the case; for example, when the different b-values of a DWI scan were stored as multiple, separate (3D) scans instead of a single (4D) scan. Since the information that a scan is 4-dimensional can aid the algorithm in recognizing the scan type, a "4D" label was attached to each scan. This 4D label was set to 1 if the scan was 4-dimensional, and to 0 if it was not.

All scans were then reoriented to match the orientation of a common template using FSL's reorient2std (Jenkinson et al. 2012). After this step, the scans were resampled to 256 × 256 × 25 voxels, using cubic b-spline interpolation, while maintaining the original field of view. All of these resampled (3D) scans were split into (2D) slices, resulting in 25 individual slices of 256 × 256 voxels. The slice extraction was then followed by an intensity scaling of each slice: the intensity was scaled such that the minimum intensity was 0 and the maximum intensity was 1, to compensate for intensity differences between slices. These pre-processed slices were then used as input samples for the CNN. No data augmentation was used, as the large number of scans and different data sources that were used to train the algorithm already ensured sufficient natural variation in the samples, obviating the need for additional augmentation.

After applying these pre-processing steps, the brain tumor train set consisted of 276625 samples, the brain tumor test set consisted of 59225 samples, and the ADNI dataset consisted of 180675 samples.
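As an illustration of these steps, the sketch below approximates the per-scan pre-processing with nibabel, SciPy and NumPy. It is a simplified stand-in: the released pipeline uses dcm2niix, FSL's reorient2std and cubic b-spline resampling, whereas here the reorientation step is omitted and `scipy.ndimage.zoom` is used for the resampling.

```python
import nibabel as nib
import numpy as np
from scipy.ndimage import zoom

TARGET_SHAPE = (256, 256, 25)

def preprocess_scan(nifti_path):
    """Return 25 min-max scaled slices of 256 x 256 voxels plus the '4D' label."""
    data = np.asarray(nib.load(str(nifti_path)).dataobj, dtype=np.float32)
    is_4d = int(data.ndim == 4)            # '4D' label fed to the CNN alongside the slices
    if data.ndim == 4:                     # e.g. multi b-value DWI or multi time-point PWI-DSC
        data = data[..., 0]                # keep only the first 3D element
    factors = [t / s for t, s in zip(TARGET_SHAPE, data.shape)]
    data = zoom(data, factors, order=3)    # resample to 256 x 256 x 25 voxels
    slices = []
    for k in range(TARGET_SHAPE[2]):       # split the volume into 25 individual slices
        sl = data[:, :, k]
        lo, hi = float(sl.min()), float(sl.max())
        sl = (sl - lo) / (hi - lo) if hi > lo else np.zeros_like(sl)
        slices.append(sl)                  # intensities scaled to [0, 1] per slice
    return np.stack(slices), is_4d
```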
Network

A CNN was used to classify the samples into one of eight different classes: T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, or derived. The architecture of the CNN is shown in Fig. 3. This architecture was inspired by the VGG network (Simonyan and Zisserman 2015).

Fig. 3 The architecture of the CNN. The convolutional blocks consisted of N 2D convolutional filters followed by batch normalization and a parametric rectified linear unit. The output size of the convolutional blocks and pooling layers is specified.

The network was implemented using TensorFlow 1.12.3 (Abadi et al. 2016). The cross-entropy between the predicted and ground truth labels was used as a loss function. Weights were initialized using Glorot uniform initialization (Glorot and Bengio 2010). We used Adam as an optimizer (Kingma and Ba 2015), which started with a learning rate of 0.001, β1 = 0.9, and β2 = 0.999, as these were proposed as reasonable default values (Kingma and Ba 2015). The learning rate was automatically adjusted based on the training loss: if the training loss did not decrease during 3 epochs, the learning rate was decreased by a factor 10, with a minimum learning rate of 1 · 10⁻⁷. The network could train for a maximum of 100 epochs, and the network automatically stopped training when the loss did not decrease during 6 epochs. We used a batch size of 32. We arrived at this CNN design and these settings by testing multiple different options and selecting the best performing one. Details about the optimization of the settings are presented in Section "Experiments", Fig. 4, and Appendix A.

During the training of the network, all slices were inputted to the CNN as individual samples, and no information about the (possible) relation between different slices was provided. After training the network, the scan type of a scan was predicted by passing all 25 slices of the scan through the CNN and then combining these individual slice predictions using a majority vote.
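Expressed in present-day tf.keras, the optimizer and scheduling choices above look roughly as follows. This is an approximation for illustration: the original implementation used TensorFlow 1.12.3 and the custom VGG-inspired architecture of Fig. 3, which is not reproduced here (`model` is assumed to be any Keras model with an eight-class softmax output).

```python
import numpy as np
import tensorflow as tf

def compile_and_train(model, train_dataset):
    """Training settings as described in the text (batches of 32 slices assumed)."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    callbacks = [
        # divide the learning rate by 10 when the training loss stalls for 3 epochs
        tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.1,
                                             patience=3, min_lr=1e-7),
        # stop training when the loss has not decreased for 6 epochs
        tf.keras.callbacks.EarlyStopping(monitor="loss", patience=6),
    ]
    return model.fit(train_dataset, epochs=100, callbacks=callbacks)

def majority_vote(slice_probabilities):
    """Combine the 25 per-slice softmax outputs of one scan into a scan-level class."""
    slice_classes = np.argmax(slice_probabilities, axis=1)
    return int(np.bincount(slice_classes, minlength=slice_probabilities.shape[1]).argmax())
```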
Post-Processing

Once the scan type of each scan is predicted, these predictions can be used in (optional) post-processing steps to automatically structure the dataset. We provide two options for the structured format:

– Sort the original DICOM files; this can be done in a user-defined folder structure.
– Sort the NIfTI files; in this case the BIDS format is used.

During the post-processing, the spatial orientation of the scan (axial, coronal, sagittal, or 3D) is also determined based on the direction cosines (DICOM tag (0020, 0037)), which can be used to define the structured layout when choosing to sort the DICOM files.

HeuDiConv

HeuDiConv (https://github.com/nipy/heudiconv) is a heuristic-centric DICOM converter, which uses information from the DICOM tags, along with a user-defined heuristic file, to organize an unstructured DICOM dataset into a structured layout. HeuDiConv is currently one of the most widespread, publicly available methods that can structure an unsorted DICOM dataset. Therefore, we used HeuDiConv as a benchmark, so we could compare our method, which is based on the visual appearance of a scan, with a method that is based on the metadata of a scan.

Before HeuDiConv can be used to sort a dataset, one first needs to define the heuristic file, which is essentially a translation table between the metadata of a scan and its scan type. This heuristic file is based on scan metadata that is extracted from the DICOM tags. Available metadata includes image type, study description, series description, repetition time, echo time, size of the scan along 4 dimensions, protocol name, and sequence name. HeuDiConv also determines whether a scan is motion-corrected or is a derived image, based on specific keywords being present in the image type DICOM tag. These characteristics can also be used in the heuristic file. Although more scan metadata can be used to define the heuristic, such as subject gender and referring physician, we considered this metadata irrelevant for our purpose of scan type prediction. In addition, this kind of metadata was often missing due to anonymization.
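For reference, a HeuDiConv heuristic file is a small Python module that maps series metadata to output keys. The sketch below is purely illustrative of this mechanism: the keywords are examples and do not reproduce the heuristic constructed for this study, and the exact `seqinfo` field names should be checked against the HeuDiConv version in use.

```python
# Illustrative heudiconv heuristic file (heuristic.py); the rules are examples only.
def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    return template, outtype, annotation_classes

t1w = create_key("sub-{subject}/anat/sub-{subject}_run-{item:02d}_T1w")
t2w = create_key("sub-{subject}/anat/sub-{subject}_run-{item:02d}_T2w")
flair = create_key("sub-{subject}/anat/sub-{subject}_run-{item:02d}_FLAIR")

def infotodict(seqinfo):
    """Map DICOM series metadata to scan types based on simple text rules."""
    info = {t1w: [], t2w: [], flair: []}
    for s in seqinfo:
        desc = s.series_description.lower()
        if "flair" in desc:
            info[flair].append(s.series_id)
        elif "t1" in desc:                 # e.g. the initial 'T1' rule mentioned in the text
            info[t1w].append(s.series_id)
        elif "t2" in desc:
            info[t2w].append(s.series_id)
    return info
```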
Experiments

Evaluation of DeepDicomSort

We performed two experiments in which we constructed and evaluated our method, to show the generalizability among different datasets:

– Experiment I: Algorithm trained on the brain tumor train set and tested on the brain tumor test set
– Experiment II: Algorithm trained on the brain tumor dataset (brain tumor train set and brain tumor test set), and tested on the ADNI dataset

In Experiment I we developed the algorithm and tried different CNN architectures, pre-processing settings, and optimizer settings, collectively referred to as the model parameters, using a train/validation split of the brain tumor train set. We then selected the best performing model parameters and trained a CNN using the whole brain tumor train set. Once the model was trained, its performance was evaluated on the brain tumor test set. In Experiment I, the brain tumor test set was only used to evaluate the results and was left untouched during the development and training of the algorithm. Figure 4 shows an overview of the model parameter selection, training and testing steps, and the data used in Experiment I. More details about the selection of the optimal model parameters and the results of other model parameters can be found in Appendix A.

Fig. 4 Overview of Experiment I. In this experiment, the brain tumor train set was used to obtain the optimal model parameters and to train the algorithm. The trained model was then evaluated on the brain tumor test set.

In Experiment II we used the ADNI dataset as a test set to see if our method also generalizes to scans in which no brain tumor was present. In this experiment, we trained the CNN using the whole brain tumor dataset (a combination of all the data in the brain tumor train set and brain tumor test set) and then evaluated the performance of the model on the ADNI dataset. No model parameter selection was done in this experiment; instead, the optimal model parameters that were obtained from Experiment I were used. Thus, apart from training the CNN on a larger dataset, the methods used in Experiment I and Experiment II were the same. Figure 5 shows an overview of the training and testing steps and the data used in Experiment II. In this experiment, no T1wC and PWI-DSC scans were present in the test set; however, in a real-world setting one may not know a priori whether these scan types were present or absent. Thus, we still allowed the model to predict the scan type as one of these classes to mirror this realistic setting.

Fig. 5 Overview of Experiment II. In this experiment the brain tumor dataset was used to train the algorithm, and the trained model was then evaluated on the ADNI dataset.

To evaluate the performance of our algorithm, we calculated the overall accuracy and the per-class accuracy of the classification. The overall accuracy was defined as the number of correctly predicted scans divided by the total number of scans. The per-class accuracy was defined as the number of correctly predicted scans of a specific scan type divided by the total number of scans of that scan type. We also computed the confusion matrices, which show the relationship between the ground truth and predicted class.

To visualize which parts of the slice contributed most to the prediction of the CNN, we generated saliency maps (Simonyan et al. 2014). Saliency maps were generated by calculating the gradient of a specific class with respect to each input pixel, thus giving a measure of the contribution of each pixel. To obtain sharper maps, we used guided backpropagation (Springenberg et al. 2015) and applied a rectified linear activation to the obtained maps. Saliency maps were generated for all slices of the scans of the example subject shown in Fig. 1, based on the trained model from Experiment I. Additional saliency maps were generated for 20 samples of each scan type that were randomly selected from the test sets of Experiment I and Experiment II. The saliency maps for the samples from Experiment I were generated using the CNN trained in Experiment I, and for the samples from Experiment II the CNN trained in Experiment II was used. By generating saliency maps for multiple samples, we could show the behavior of our algorithm for different scan appearances. Some of these samples contained tumors, contained imaging artifacts or had a low image quality; thus, these saliency maps also showed the robustness of our algorithm to unusual scan appearance. To gain some insight into the behavior of each convolutional layer, we determined the feature maps of each convolutional layer. We calculated the feature maps for the T1w slice shown in Fig. 1 by passing it through the network and determining the output of each filter after each convolutional layer.
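The sketch below computes a plain gradient saliency map for one slice with tf.keras. It is a simplified stand-in for the guided backpropagation used in the paper, which additionally overrides the ReLU backward pass (not shown here); a single-image-input model is assumed.

```python
import numpy as np
import tensorflow as tf

def gradient_saliency(model, slice_2d, class_index):
    """Gradient of one class score w.r.t. the input pixels (plain saliency map)."""
    x = tf.convert_to_tensor(slice_2d[np.newaxis, ..., np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        score = model(x, training=False)[0, class_index]   # score of the class of interest
    grad = tape.gradient(score, x)[0, ..., 0].numpy()
    return np.maximum(grad, 0.0)   # rectified, as done for the final maps in the paper
```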
Comparison with HeuDiConv

We compared HeuDiConv and DeepDicomSort on the data from Experiment I. Since HeuDiConv can only process scans that are available in DICOM format, the scans that were only provided as NIfTI files (the USIGT and BITE datasets) could not be included in this comparison. Thus, 86 scans (43 T1wC and 43 T2w-FLAIR) were removed from the brain tumor train set and 27 scans (all T1wC) were removed from the brain tumor test set, reducing the train set to 10979 scans and the test set to 2342 scans.

To construct our heuristic, we first extracted all the relevant DICOM tags from the scans in the brain tumor train set, see Table 4. Table 4 also shows the number of unique occurrences for text-based tags and the distribution of the numerical tags in the brain tumor train set. An iterative approach was followed to construct the heuristic, where rules were added or adjusted until the performance of HeuDiConv on the brain tumor train set could no longer be increased, see Fig. 6. Our initial heuristic was a simple one, based solely on certain text being present in the series description. For example, if the text "T1" was present in the series description, it was considered a T1w scan.

Table 4 DICOM tag numbers and descriptions of the DICOM tags extracted for the HeuDiConv heuristic. For text-based tags the number of unique instances is shown, and for numerical-based tags the distribution is shown, based on the scans in the brain tumor train set.

Tag description | Tag number | Distribution / unique instances
Number of columns in image | 0028,0011 | Range: 128 - 1152

To compare the performance of HeuDiConv with the performance of DeepDicomSort, the overall accuracy and per-class accuracy of the scan type predictions obtained from HeuDiConv were calculated.
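Collecting the tag summary of Table 4 can be done with pydicom along the following lines; the helper below is an illustrative sketch, not the script used for the paper.

```python
from collections import Counter
from pathlib import Path
import pydicom

TEXT_TAGS = {"ImageType": (0x0008, 0x0008), "StudyDescription": (0x0008, 0x1030),
             "SeriesDescription": (0x0008, 0x103E)}
NUMERIC_TAGS = {"RepetitionTime": (0x0018, 0x0080), "EchoTime": (0x0018, 0x0081),
                "Rows": (0x0028, 0x0010), "Columns": (0x0028, 0x0011)}

def summarize_tags(dicom_dir):
    """Count unique values of text tags and collect numeric tag values,
    similar to the summary reported in Table 4."""
    text_counts = {name: Counter() for name in TEXT_TAGS}
    numeric_values = {name: [] for name in NUMERIC_TAGS}
    for path in Path(dicom_dir).rglob("*.dcm"):
        ds = pydicom.dcmread(str(path), stop_before_pixels=True)
        for name, tag in TEXT_TAGS.items():
            if tag in ds:
                text_counts[name][str(ds[tag].value)] += 1
        for name, tag in NUMERIC_TAGS.items():
            if tag in ds:
                numeric_values[name].append(float(ds[tag].value))
    return text_counts, numeric_values
```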
Results

Experiment I - Evaluation on Brain Tumor Dataset

The results from Experiment I (evaluation on the brain tumor test set, containing scans of subjects with brain tumors) are reported in Table 5. The network was trained for 96 epochs. In this experiment our method achieved an overall accuracy of 98.7%.

The highest per-class accuracy was achieved for the PDw and PWI-DSC scans (100.0% for both), whereas the T2w-FLAIR scans had the lowest accuracy (93.0%). The confusion matrices show that most of the incorrectly predicted T2w-FLAIR scans were classified as T1w scans (see Appendix B). Appendix C shows the performance of our method on a per-slice basis, before the majority vote has taken place to determine the scan class, which shows that the per-slice accuracy is lower than the per-scan accuracy. This is not surprising, since there are slices in a scan from which it is almost impossible to determine the scan type, even for a human (for example, the most superior and inferior slices).

Table 5 Overall accuracy and per-class accuracy achieved by DeepDicomSort in Experiment I and Experiment II

 | Experiment I | Experiment II
Overall | 0.987 | 0.985
T1w | 0.993 | 1.000
T1wC | 0.997 | N/A
T2w | 0.990 | 0.965
PDw | 1.000 | 0.998
T2w-FLAIR | 0.930 | 0.951
DWI | 0.991 | 0.995
PWI-DSC | 1.000 | N/A
Derived | 0.994 | 0.983

Experiment II - Evaluation on ADNI Dataset

The results from Experiment II (evaluation on the ADNI dataset, containing scans of subjects without brain tumors) are reported in Table 5. Just like in Experiment I, the network was trained for 96 epochs. In this experiment our method achieved an overall accuracy of 98.5%. It took approximately 22 hours to train the network of this experiment using an Nvidia Titan V GPU with 12 GB memory.

The highest per-class accuracy was achieved for the T1w scans (100.0%), whereas the T2w scans had the lowest accuracy (95.1%). Most of the incorrectly predicted T2w scans were predicted as T1wC or PDw scans. Furthermore, although no T1wC and PWI-DSC scans were present in the test set used in this experiment, our method incorrectly classified 40 scans as T1wC (mainly T2w scans) and 3 scans as PWI-DSC (all DWI scans). The full confusion matrix can be found in Appendix B.

Focus of the Network

Figure 7 shows the saliency maps for the different scan types, for the same slices as in Fig. 1. For most scan types, the CNN seemed to focus on the ventricles, the cerebrospinal fluid (CSF) around the skull, the nose, and the eyes. For the PDw slice, the CNN did not have a specific focus on the ventricles and did not seem to have a particular focus inside the brain. The DWI and derived slices also showed some focus outside of the skull, probably because of the artifacts outside of the skull that these scan types often feature (as can be seen in Fig. 7h).

Fig. 7 Saliency maps of the scan types, generated by the CNN evaluated on the same slices as in Fig. 1. This CNN was the model obtained in Experiment I: (a) T1w, (b) T1wC, (c) T2w, (d) PDw, (e) T2w-FLAIR, (f) DWI, (g) PWI-DSC, (h) Derived (ADC).

We have created saliency maps for all 25 slices of the scans shown in Fig. 1, which are shown in Appendix E. For most other slices the focus of the CNN was the same as for the slices from Fig. 7. Furthermore, the presence of a tumor did not disturb the prediction, as also evidenced by the high accuracy achieved in Experiment I. Only on the most superior and inferior slices did the CNN struggle, probably due to the fact that the brain was barely visible on those slices.

Additional saliency maps for randomly selected samples from the test sets of Experiment I and Experiment II are shown in Appendix F. These examples show that our method is robust to heterogeneity in the visual appearance of the scans, as well as to the presence of tumors, the presence of imaging artifacts, and poor image quality. This is demonstrated by the fact that the CNN focused on the same brain structures for almost all of the slices and correctly predicted the scan type even for slices with poor imaging quality or artifacts. The feature maps of all convolutional layers are shown in Appendix G. For the shallow convolutional layers, some filters seemed to detect the skull without looking at the brain tissue, whereas other layers seemed to focus more on specific brain structures such as the CSF. Interpreting the deeper convolutional layers gets harder, as the feature maps of those layers have a lower resolution.

HeuDiConv Predictive Performance

The top-level rules of the derived heuristic for HeuDiConv were mainly based on the series description, with additional lower-level rules based on the echo time, image type, and the derived status of the scan. The overall accuracy obtained within the brain tumor train set after several iterations of improving the heuristic was 91.0%. The overall accuracy in the brain tumor test set was 72.0%. The results for each class can be found in Table 6, along with a comparison to the accuracy of the CNN evaluated on the brain tumor test set. For the evaluation of the CNN's performance, we included the same scans as present in the test set for HeuDiConv (i.e. those which were available in DICOM format). Although a slightly different dataset was used for this test set, the results of the CNN in Tables 5 and 6 appear to be the same. This can be explained by the fact that only T1wC scans were removed from the test set; thus, for all other classes the accuracy remained the same. Furthermore, due to the large number of scans, the difference is only visible at more decimals, e.g. the overall accuracy in Table 5 was 98.73% whereas in Table 6 it was 98.72%.

Fig. 6 Overview of the HeuDiConv experiment. In this experiment the scans from the brain tumor train set that were available in DICOM format were used to construct the heuristic file. HeuDiConv used this heuristic file to predict the scan type of the scans from the brain tumor test set which were available in DICOM format.

Table 6 Accuracy of HeuDiConv on the brain tumor test set. Results of DeepDicomSort on this test set are also given, where the scans which were not available in the DICOM format were excluded from the test set.

 | HeuDiConv | DeepDicomSort
Overall | 0.720 | 0.987
T1w | 0.963 | 0.993
T1wC | 0.447 | 0.997
T2w | 0.930 | 0.990
PDw | 0.077 | 1.000
T2w-FLAIR | 0.684 | 0.930
DWI | 0.887 | 0.991
PWI-DSC | 0.600 | 1.000
Derived | 0.948 | 0.994

These results show that DeepDicomSort outperformed HeuDiConv both in terms of the overall accuracy and the per-class accuracy for all classes. Appendix D compares the time required to sort the datasets using either DeepDicomSort, HeuDiConv, or by hand, which shows that DeepDicomSort is more than twice as fast as the other two methods.
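For completeness, the accuracies and confusion matrices reported above follow directly from the scan-level predictions; a minimal NumPy sketch of these definitions is given below (class indices 0-7 correspond to the eight scan types).

```python
import numpy as np

def evaluate(y_true, y_pred, n_classes=8):
    """Overall accuracy, per-class accuracy and confusion matrix as defined in the text."""
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        confusion[t, p] += 1                       # rows: ground truth, columns: prediction
    overall = np.trace(confusion) / confusion.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = np.diag(confusion) / confusion.sum(axis=1)   # NaN for absent classes (N/A)
    return overall, per_class, confusion
```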
Discussion

Our results show that it is possible to use a CNN to automatically identify the scan type of brain MRI scans and use this to sort a large, heterogeneous dataset. Because of the high accuracy of our method, it can be used virtually without manual verification. The CNN performed well both for scans with and without the presence of a tumor. The performance of our method generalizes well across scans from different sites, scanners, subjects, and scan protocols. Our method was also able to correctly predict the scan type of scans that had poor imaging quality or contained imaging artifacts, as can be seen in Appendix F.1. The CNN focused mainly on the ventricles, areas close to the skull, and the CSF at the edges of the brain. There was also some focus on the gray matter and white matter, although these structures seemed less relevant for the decision making of the CNN. It makes sense that the CNN focuses on the CSF, both in the ventricles and at the edges of the brain, because their visual appearance is very characteristic of the scan type. Although the CNN also focused on the eyes and nose, we do not expect this to disrupt the prediction when these structures are absent (e.g. in defaced scans): there were a lot of slices in which the eyes and nose were not present, such as the most inferiorly and superiorly located slices, for which the CNN predicted the scan type correctly.

Data sorting is just one step of the data curation pipeline, and in recent years more research on the automation of other data curation tasks has been carried out. Some examples include automatic scan quality checking (Esteban et al. 2017), motion artifact correction (Tamada et al. 2020), and missing scan type imputation from the present scan types (Lee et al. 2019). However, to automate other data curation steps the dataset first needs to follow a structured format, making our tool a crucial first step in the overall pipeline. The increasing data complexity, both in volume and in the number of different types of data, not only shows a need for a proper data curation pipeline, but also shows the need for a standardized data structure for scans and their associated metadata (van Erp et al. 2011; Gorgolewski et al. 2016; Lambin et al. 2017). The widespread adoption of a common, standardized data structure would be favorable over the use of our tool or similar tools. Unfortunately, both in research and in clinical practice, it is currently not commonplace to provide datasets in a standardized format, thus making our tool a valuable addition to the data curation pipeline. Even if a standardized data structure were to be widely adopted, our tool would remain valuable as a quality assessment tool.

Although the accuracy of our method is high overall, our method predicted the incorrect scan type in some cases. For example, in Experiment I the CNN mainly misclassified T2w-FLAIR scans. Almost all of these misclassified T2w-FLAIR scans originated from the RIDER Neuro MRI dataset. Comparing a T2w-FLAIR scan from the RIDER dataset with a T2w-FLAIR scan from the train set used in Experiment I shows a big difference in visual appearance, see Fig. 8a and b. These figures show that the white matter and gray matter appear very different on the two scans, even though they have the same scan type, which probably confused the network. In Experiment II the per-class accuracy was the lowest for the T2w scans. Almost all of the misclassified T2w scans were hippocampus scans, an example of which can be seen in Fig. 8c. The misclassification of these scans can be explained by their limited field of view. Since the CNN did not see any such scans in the training set, as all scans in the training set covered the full brain, it is not surprising that our method failed in these cases. The saliency maps in Fig. 8 show that the CNN had difficulty focusing on the relevant parts of the slice. For example, for the T2w-FLAIR slices in Figs. 7e and 8d it can be seen that the CNN focused mainly on the ventricles, whereas in Fig. 8e there was more focus on the edge of the brain, similar to the T1w slice in Fig. 7a. Although we did not achieve a perfect prediction accuracy, it is unlikely that any scan sorting method ever will, due to the large heterogeneity in scan appearance and scan metadata. While not perfect, our method does have a very high performance overall, and the comparison with manual sorting shows that it considerably reduces the time required to sort a dataset.

Fig. 8 Examples of scans our method misclassified (b and c) and a correctly classified scan (a) as comparison, along with their saliency maps (d-f). (a) T2w-FLAIR scan from the Ivy GAP collection; this scan type was correctly predicted. (b) T2w-FLAIR scan from the RIDER Neuro MRI collection; this scan type was misclassified as a T1w, probably because its appearance is very different from the T2w-FLAIR scans that were in the train dataset. (c) T2w scan from the ADNI dataset; this scan type was misclassified as a T1wC, probably because it has a very limited field of view. (d) Saliency map of the correctly classified scan from (a). (e) Saliency map of the misclassified scan from (b). (f) Saliency map of the misclassified scan from (c).

The CNN was trained and evaluated by using the ground truth labels, which were obtained by manually going through the dataset and annotating each scan according to the perceived scan type. It is possible that the scan type was incorrectly annotated for some of the scans. To limit this possibility, we took a second look at scans where there was a mismatch between the prediction from DeepDicomSort and the ground truth label, both for the train datasets and the test datasets. We corrected the ground truth label for scans that were incorrectly annotated, and these corrected labels were used for the experiments presented in this paper. The labels of around 0.1% of the scans in the dataset were corrected in this way. Although it is possible that there were still some incorrectly annotated scans, based on these findings we expect this fraction to be very small.

We chose a CNN as the basis of our method because we wanted to minimize the number of pre-processing steps. Using more traditional machine learning approaches, such as a support vector machine or random forest, would require the extraction of relevant features from each scan. This would complicate our method, as we would first have to hand-craft these features and add a pre-processing step in which we extract these features from the scan. Furthermore, the extraction of these features would likely require a brain mask to prevent the features from being influenced too much by the background. The creation of this brain mask would add a pre-processing step, and could be a potential source of error. Instead, by using a CNN, no features had to be defined, as the CNN automatically learns the relevant features. The CNN also does not require a brain mask, as it has learned to ignore the background and focus on the brain itself, as shown by the saliency maps.

We opted for a 2D CNN instead of a 3D CNN, because this allowed us to extract a larger region of the scan to be used as an input for the CNN. By using a 2D CNN, this region could encompass a full slice of the brain, enabling the CNN to learn features that capture the relative differences in appearance of the various tissue types (white matter, gray matter, CSF, bone, skin, etc.), which are characteristic of the scan type. Furthermore, because a 2D CNN typically requires less memory than a 3D CNN (Prasoon et al. 2013), it requires less computational power (making our method accessible to a broader audience), and also requires less time to train and evaluate (Li et al. 2014).

Our method achieved a better overall accuracy and per-class accuracy than HeuDiConv. The results obtained using HeuDiConv show the difficulty of creating a method based on DICOM tags that generalizes well to other datasets. Even within one dataset, it can be difficult to create a heuristic that correctly maps the scan metadata to the scan type; for example, Table 4 shows that 1215 different series descriptions are used just for the eight scan types considered in this research. HeuDiConv has particular difficulty in identifying scans that have similar metadata but different scan types. For example, this is reflected in the results for the T1w and T1wC scans. These scans usually have similar scan settings and series descriptions, making it hard to determine whether a scan is obtained pre- or post-contrast administration. The same difficulty plays a role for T2w and PDw scans, which are often acquired at the same time in a combined imaging sequence and thus have the same series description. In our timing results (Appendix D), it was faster to sort the dataset by hand than to use HeuDiConv. This was caused by HeuDiConv often misclassifying T2w-FLAIR and T1wC scans as a different scan type, and thus a lot of manual time was needed to correct these mistakes.

A method that, similar to ours, classifies the scan type based on the visual appearance of the scan was proposed by Remedios et al. (2018). Their method can identify T1w, T1wC, T2w, and pre-contrast and post-contrast FLAIR scans. Remedios et al. do this using a cascaded CNN approach, where a first CNN is used to classify a scan as T1-weighted, T2-weighted, or FLAIR. Two other CNNs are then used to classify a scan as pre-contrast or post-contrast, one CNN for the T1-weighted scans and one CNN for the FLAIR scans. Their method achieved an overall accuracy of 97.6%, which is lower than our overall accuracy of 98.7% (Experiment I) and 98.5% (Experiment II). Since Remedios et al. did not make their trained model publicly available, it was not possible to directly compare performances on the same dataset. Remedios et al. tested their method on 1281 scans, which came from 4 different sites and 5 different scanner models. Their dataset was thus considerably smaller and less heterogeneous than our test set. Furthermore, our method can identify more scan types and does so using only a single CNN instead of three.

A limitation of our method is that it can only classify a scan as one of the eight scan types for which it was trained. Thus, when it is presented with an unknown scan type (e.g. PWI-ASL or dynamic contrast-enhanced perfusion-weighted imaging), our method will (wrongly) predict it as one of the other classes. In future work, this limitation could be addressed in two ways. The first option would be to adapt the network to either recognize more scan types or to replace one of the existing classes by a different one. This can be done using a transfer learning approach, by fine-tuning the weights obtained in this research on additional data (Tajbakhsh et al. 2016). Since we did not have enough data for other scan types, we limited the CNN to the eight classes for which we did have enough data. A second option would be to extend our method to allow out-of-distribution detection (DeVries and Taylor 2018). In this methodology, the network could not only predict the scan type of a scan but could also indicate if a scan belongs to an unknown scan type. This requires a significant change to the model architecture, which we considered outside the scope of this research for now.

Another limitation is the use of reorient2std from FSL, which means that (this part of) the code cannot be used in a commercial setting. Commercially allowed alternatives exist, such as the 'reorient image' function from ANTs (http://stnava.github.io/ANTs/), however these have not been tested as part of the DeepDicomSort pipeline.

A promising future direction could be to predict the metadata of a scan based on its visual appearance. For example, one could predict the sequence that has been used to acquire a scan (e.g. MPRAGE or MP2RAGE in the case of a T1w scan), or reconstruct the acquisition settings of a scan (e.g. the spin echo time). In this research, we did not consider these types of predictions because we first wanted to focus on the dataset organization; however, we think that our method can provide a basis for these types of predictions.

Conclusion

We developed an algorithm that can recognize T1w, T1wC, T2w, PDw, T2w-FLAIR, DWI, PWI-DSC, and derived brain MRI scans with high accuracy, outperforming the currently available methods.
We have made our code and trained models publicly available under an Apache 2.0 license. Using the code and the trained models, one can run the DeepDicomSort pipeline and structure a dataset either according to the BIDS standard or a self-defined layout. We think that scan type recognition is an essential step in any data curation pipeline used in medical imaging. With this method, and by making our code and trained models available, we can automate this step in the pipeline and make working with large, heterogeneous datasets easier, faster, and more accessible.

Information Sharing Statement

Code and trained models for the algorithms constructed in this paper are publicly available on GitHub under an Apache 2.0 license at https://github.com/Svdvoort/DeepDicomSort. Part of the pre-processing code depends on FSL. Since FSL is only licensed for non-commercial use, (this part of) the code cannot be used in a commercial setting.

All data used in this research is publicly available. The Cancer Imaging Archive collections mentioned are all publicly available at cancerimagingarchive.net (RRID:SCR_008927). The datasets from the Norwegian National Advisory Unit for Ultrasound and Image-Guided Therapy are publicly available at sintef.no/projectweb/usigt-en/data. The BITE collection is publicly available at nist.mni.mcgill.ca/?page_id=672. The Alzheimer's Disease Neuroimaging Initiative (RRID:SCR_003007) data is available at adni.loni.usc.edu, after submitting an application which must be approved by the ADNI Data Sharing and Publications Committee.

Acknowledgements Sebastian van der Voort acknowledges funding by the Dutch Cancer Society (KWF project number EMCR 2015-7859). We would like to thank Nvidia for providing the GPUs used in this research. This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. Data used in this publication were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC). Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Compliance with Ethical Standards

Conflict of interests The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A: Model Parameter Selection

To determine the optimal model parameters (i.e. the CNN architecture, pre-processing settings and optimizer settings) of the CNN used in the DeepDicomSort pipeline, we evaluated the performance of different model parameters on the brain tumor train set, the train set from Experiment I. Before carrying out the experiments, the brain tumor train set was partitioned into a train set and a validation set: 85% of the scans was used as a train set and 15% of the scans was used as a validation set. Only one such split was made, since training and validating the network for multiple splits would be too time-consuming. During the splitting, all slices of a scan were either all in the train set or all in the validation set, to prevent data leakage between the train set and validation set.

We compared five different CNN architectures: the architecture proposed in this paper, AlexNet (Krizhevsky et al. 2012), ResNet18 (He et al. 2016), DenseNet121 (Huang et al. 2017) and VGG19 (Simonyan and Zisserman 2015). For all networks, the same pre-processing approach as described in Section "Pre-Processing" was used, with the optimizer settings as described in Section "Network". The only difference was that the learning rate reduction was based on the validation loss instead of the training loss.
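A scan-level grouped split and validation-monitored callbacks of the kind described above could be set up as in the sketch below; this is an illustration using scikit-learn and tf.keras, not the original selection code, and the early-stopping criterion is assumed to mirror the learning-rate schedule.

```python
import tensorflow as tf
from sklearn.model_selection import GroupShuffleSplit

def split_by_scan(sample_labels, scan_ids, seed=0):
    """85/15 split in which all 25 slices of a scan stay on the same side,
    preventing leakage between the train and validation sets."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=seed)
    train_idx, val_idx = next(splitter.split(sample_labels, sample_labels, groups=scan_ids))
    return train_idx, val_idx

# For the architecture comparison, the schedule monitored the validation loss:
validation_callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         patience=3, min_lr=1e-7),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=6),
]
```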
For the VGG19 model, the initial learning rate was lowered to 0.0001, as the model would otherwise get stuck in a poor minimum early in the training stage. Different pre-processing settings (e.g. different normalization settings) and model settings (e.g. learning rate) were tested. However, here we show only the effect of the different architectures, using the same pre-processing settings for all models to make a fair comparison and because we obtained the best results using these pre-processing settings.

The learning curves for the different models are shown in Fig. 9. The learning curve for the AlexNet model (Fig. 9b) shows that this model is the only one that was not capable of properly training for the task at hand, probably due to the low number of weights that can be optimized in this model. Except for the AlexNet model, all the other models were able to properly learn, and the final validation accuracy was roughly the same for all models. The DenseNet model achieved the highest validation accuracy of 98%; a full overview of the performance of the different CNN architectures can be found in Table 7.

Fig. 9 Learning curves of the different network architectures tested in the model parameter selection.

Table 7 Overall training accuracy, overall validation accuracy, and the time it took to train each network for the different network architectures tested in the model parameter selection. A train/validation split of the brain tumor train set was used to determine the performance.

Architecture | Train | Validation | Training time (h)
Our model | 0.999 | 0.971 | 14.8
2D AlexNet | 0.255 | 0.255 | 11.0
2D DenseNet121 | 1.000 | 0.980 | 21.8
2D ResNet18 | 1.000 | 0.973 | 26.4
2D VGG19 | 1.000 | 0.948 | 34.6
3D DenseNet121 | 1.000 | 0.868 | 22.9
3D ResNet18 | 1.000 | 0.832 | 27.7

These results show that multiple models work for the problem at hand. Ultimately, we chose to employ our proposed architecture because it is less computationally intensive than the other models. Not only does our model train faster (shown in Table 7), it also requires less time to predict the scan type of new scans, and requires less (GPU) memory. Selecting the least computationally intensive model allows a wider adoption of our tool.

We also trained two 3D models to compare their performance with the 2D models. In the case of the 3D models, most of the pre-processing steps were kept the same, apart from the slice extraction. Instead of extracting 25 slices, 3D patches with a size of 90 × 90 × 15 voxels were extracted. A maximum of 10 patches per scan were extracted, in such a way that they covered as much of the (geometrical) center of the scan as possible, to ensure that the patches contained brain and not just background. We trained a 3D ResNet18 and a 3D DenseNet121; the learning curves can be seen in Fig. 9f and g. These 3D architectures achieved a lower validation accuracy than their 2D counterparts: 0.87 versus 0.98 for the DenseNet model and 0.83 versus 0.97 for the ResNet model. These results justified our choice for a 2D model, which not only achieved a higher accuracy but was also less computationally intensive.
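For the 3D experiments, patch extraction around the scan centre could look like the following sketch; the placement strategy shown here (stepping through the central slab with x/y fixed on the volume centre) is an assumption for illustration, as the exact patch positioning was not specified beyond covering the geometric centre.

```python
import numpy as np

def extract_central_patches(volume, patch_shape=(90, 90, 15), max_patches=10):
    """Illustrative extraction of up to `max_patches` patches of 90 x 90 x 15 voxels
    around the geometric centre of a (resampled) scan volume."""
    px, py, pz = patch_shape
    cx, cy, cz = (s // 2 for s in volume.shape)
    x0 = max(0, min(cx - px // 2, volume.shape[0] - px))   # keep x/y centred on the volume
    y0 = max(0, min(cy - py // 2, volume.shape[1] - py))
    half_span = (max_patches // 2) * pz
    z_starts = range(max(0, cz - half_span), volume.shape[2] - pz + 1, pz)
    patches = []
    for z0 in z_starts:
        if len(patches) == max_patches:
            break
        patches.append(volume[x0:x0 + px, y0:y0 + py, z0:z0 + pz])
    return np.stack(patches) if patches else np.empty((0, *patch_shape))
```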
Appendix B: Confusion Matrices

The confusion matrices for Experiment I (Table 8) and Experiment II (Table 9) show the relation between the ground truth scan type and the predicted scan type.

Table 8 Confusion matrix of results from Experiment I (rows: ground truth, columns: predicted)

Ground truth | T1w | T1wC | T2w | PDw | T2w-FLAIR | DWI | PWI-DSC | Derived
T1w | 433 | 2 | 0 | 1 | 0 | 0 | 0 | 0
T1wC | 2 | 608 | 0 | 0 | 0 | 0 | 0 | 0
T2w | 1 | 2 | 292 | 0 | 0 | 0 | 0 | 0
PDw | 0 | 0 | 0 | 181 | 0 | 0 | 0 | 0
T2w-FLAIR | 18 | 0 | 0 | 0 | 238 | 0 | 0 | 0
DWI | 0 | 0 | 0 | 0 | 0 | 344 | 2 | 1
PWI-DSC | 0 | 0 | 0 | 0 | 0 | 0 | 87 | 0
Derived | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 156

Table 9 Confusion matrix of results from Experiment II (rows: ground truth, columns: predicted)

Ground truth | T1w | T1wC | T2w | PDw | T2w-FLAIR | DWI | PWI-DSC | Derived
T1w | 2655 | 1 | 0 | 0 | 0 | 0 | 0 | 0
T1wC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
T2w | 0 | 34 | 2140 | 44 | 0 | 0 | 0 | 0
PDw | 0 | 0 | 2 | 1067 | 0 | 0 | 0 | 0
T2w-FLAIR | 6 | 5 | 0 | 1 | 468 | 12 | 0 | 0
DWI | 0 | 0 | 0 | 0 | 0 | 557 | 3 | 0
PWI-DSC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Derived | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 228

Appendix C: Predictive Performance on a Per-Slice Basis

Table 10 shows the accuracy of the CNNs from Experiment I and Experiment II on a per-slice basis instead of on a per-scan basis. These results are obtained by comparing the predicted class of a slice directly with the ground truth class of that slice, before the individual slice predictions are combined by a majority vote to obtain the scan type.

Table 10 Overall accuracy and per-class accuracy achieved by DeepDicomSort in Experiment I and Experiment II on a per-slice basis

 | Experiment I | Experiment II
Overall | 0.934 | 0.851
T1w | 0.942 | 0.814
T1wC | 0.940 | N/A
T2w | 0.926 | 0.894
PDw | 0.905 | 0.914
T2w-FLAIR | 0.879 | 0.592
DWI | 0.985 | 0.943
PWI-DSC | 0.925 | N/A
Derived | 0.990 | 0.908

Appendix D: Time Comparison Between DeepDicomSort, HeuDiConv and Manual Sorting

We estimated the potential time that can be saved by using DeepDicomSort to sort a dataset instead of doing so by hand or using HeuDiConv. We did so by assuming the hypothetical situation where one has an automated tool that requires the T1wC and T2w-FLAIR scans as inputs, and we compared the time needed to find the T1wC and T2w-FLAIR scans for all subjects and sessions in the brain tumor test set. The manual sorting was simulated by iterating over all scans in a session in random order until either the T1wC and T2w-FLAIR scans were found or until there were no more scans to check. The sorting of the dataset using HeuDiConv or DeepDicomSort was simulated by first iterating over all scans that were predicted as a T1wC or T2w-FLAIR by these methods, and checking whether that prediction was correct. If the predicted scan type was incorrect, the same approach as for the manual sorting was followed to find the correct scans. We assumed that, on average, a human requires 25 seconds per scan to visually identify the correct scan type. By multiplying this time per scan with the total number of scans that were iterated over, we obtained an estimate for the total time taken by each method to find the T1wC and T2w-FLAIR scans. We used the brain tumor test set to evaluate the timing results, since HeuDiConv was only optimized for the brain tumor dataset.
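The timing estimate can be reproduced in a few lines; the sketch below implements the described simulation for a single session, with the 25 seconds per visually checked scan as the assumed constant.

```python
import random

SECONDS_PER_SCAN = 25  # assumed time for a human to visually identify one scan type

def session_search_seconds(true_types, predicted_types=None, targets=("T1wC", "T2w-FLAIR")):
    """Estimated manual time to find the target scan types within one session.

    `true_types` holds the ground-truth scan type of every scan in the session;
    `predicted_types` (optional) holds the scan type predicted by an automated
    method, used to decide which scans a human checks first.
    """
    order = list(range(len(true_types)))
    random.shuffle(order)
    if predicted_types is not None:
        # scans predicted as one of the target types are checked first
        order.sort(key=lambda i: predicted_types[i] not in targets)
    missing, checked = set(targets), 0
    for i in order:
        if not missing:
            break
        checked += 1
        missing.discard(true_types[i])
    return checked * SECONDS_PER_SCAN
```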
Appendix E: Saliency Map for Full Scan

Figures 10 and 11 show the saliency maps for all 25 slices from the scans of the example subject from Fig. 1. The CNN seems to focus on the same features as in Fig. 7, mostly on the ventricles and on the CSF at the edges of the brain. In the superior slices of the scan, it can be seen that the presence of a tumor does not disrupt the CNN: although it looks at the edge of the tumor, it does not put a lot of focus on the tumor itself. For the most superior slices of the T1w, T1wC and T2w scans, it can be seen that once the brain is no longer present in the slice, the CNN loses its focus and seems to look randomly throughout the slice.

Fig. 10 Saliency maps for slices 1 through 13 of the subject from Fig. 1

Fig. 11 Saliency maps for slices 14 through 25 of the subject from Fig. 1

Appendix F: Saliency Maps for Additional Examples

F.1 Random Samples from the Brain Tumor Test Set

To show the robustness of our method to differences in scan appearance, as well as to imaging artifacts, we randomly selected 20 slices of each scan type from the brain tumor test set. All of these slices were then passed through the CNN, and we determined the saliency maps along with the predicted class of each slice. This is the prediction based on the slice itself, and thus before the majority vote. The saliency maps and predicted scan types are shown in Figs. 12 and 13. We have highlighted slices that contain imaging artifacts (†), have a poor image quality (†), and subjects with a large head tilt (†). These saliency maps show that the CNN is quite robust to the presence of a tumor, the presence of imaging artifacts, or poor image quality; in most cases the CNN still predicts the correct scan type.
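The saliency maps shown in these appendices are gradient-based: the score of the predicted class is differentiated with respect to the input slice, and the magnitude of that gradient indicates which voxels influence the prediction most. The sketch below shows how such a map could be computed with TensorFlow; the model variable, input shape and preprocessing are placeholders rather than the exact DeepDicomSort code.

```python
import numpy as np
import tensorflow as tf

def saliency_map(model, slice_2d):
    """Gradient magnitude of the winning class score with respect to the input slice."""
    # Assumes a single-channel 2D input; adjust the shape to match the trained network.
    x = tf.convert_to_tensor(slice_2d[np.newaxis, ..., np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        scores = model(x, training=False)       # shape (1, n_classes)
        top_score = tf.reduce_max(scores[0])    # score of the predicted class
    grad = tape.gradient(top_score, x)
    predicted_class = int(tf.argmax(scores[0]))
    return np.abs(grad.numpy()[0, ..., 0]), predicted_class

# Hypothetical usage, assuming 'cnn' is the trained Keras model and 'slice_2d' a preprocessed slice:
# saliency, predicted = saliency_map(cnn, slice_2d)
```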
Fig. 12 Saliency maps and predicted scan type of randomly drawn samples from the brain tumor test set

F.2 Random Samples from the ADNI Dataset

The same approach as in Appendix F.1 has been applied to show the saliency maps for random samples from the ADNI dataset. In this case, the saliency maps were derived using the trained model from Experiment II instead of Experiment I. Once again, the saliency maps and the predicted scan types are shown in Figs. 14 and 15. We have highlighted slices that contain imaging artifacts, including hippocampus scans with a limited field of view (†), have a poor image quality (†), and subjects with a large head tilt (†).
Fig. 13 Saliency maps and predicted scan type of randomly drawn samples from the brain tumor test set
Fig. 14 Saliency maps and predicted scan type of randomly drawn samples from the ADNI dataset

F.3 Robustness Against Bright Noise

To test the effect of potential bright spots in a scan, we performed an experiment in which random bright spots were introduced in the slices from Fig. 1. Within each slice, 0.5% of the voxels were randomly chosen, and the intensity of these voxels was set to the maximum intensity of the slice. We then determined the saliency maps for these slices and the predicted scan type; the results are shown in Fig. 16.
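The perturbation used in this experiment can be reproduced with a few lines of NumPy. The sketch below is an illustration of the described procedure (0.5% of voxels set to the slice maximum), not the authors' exact code; the function name and random generator are our own choices.

```python
import numpy as np

def add_bright_spots(slice_2d, fraction=0.005, rng=None):
    """Set a random fraction of the voxels in a 2D slice to the slice's maximum intensity."""
    rng = rng or np.random.default_rng()
    noisy = slice_2d.copy()
    n_spots = max(1, int(round(fraction * noisy.size)))
    idx = rng.choice(noisy.size, size=n_spots, replace=False)
    noisy.flat[idx] = noisy.max()
    return noisy

# Hypothetical usage: perturb a preprocessed slice and re-run the classifier on it.
# noisy_slice = add_bright_spots(slice_2d)
```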
These results show that our method is quite robust against bright spots in a scan. Only for the T1w and PWI-DSC scans were there slices that were misclassified. In the case of the T1w slice, two out of five slices were predicted to be T1wC. This is most likely caused by the CNN having learned that a T1w and a T1wC scan have a similar appearance in general, but that the T1wC scan has brighter spots. In two cases the PWI-DSC slice was misclassified as a DWI. This is probably caused by the CNN interpreting the random bright spots outside the skull as imaging artifacts, which often show up in DWI scans and less so in PWI-DSC scans. Although the CNN misclassified the T1w and PWI-DSC slices in some cases, when bright spots were introduced on all 25 slices of the T1w and PWI-DSC scans (randomly for each slice) and the scans were then passed through the network, the CNN still predicted the correct scan type after the majority vote.

Fig. 15 Saliency maps and predicted scan type of randomly drawn samples from the ADNI dataset

Fig. 16 Saliency maps and predicted scan types of the slices derived from Fig. 1 after randomly setting some pixels to the maximum intensity. Each time, the slice with the added noise is shown, followed by the saliency map and predicted scan type for the same slice in the row below

Appendix G: Feature Map Visualizations

Figures 17 through 22 show the feature maps of all filters of each convolutional layer for the T1w slice shown in Fig. 1. It can be seen that some filters mainly identify the skull (for example, filter 1 from convolutional layer 1), whereas other filters seem to focus on specific structures (for example, filter 4 from convolutional layer 1, which seems to identify gray matter).
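The feature maps in Figs. 17 through 22 are the activations of the intermediate convolutional layers for a single input slice. A possible way to extract them with Keras is sketched below; the layer names and the model variable are placeholders, as the actual layer naming in the trained network may differ.

```python
import tensorflow as tf

def feature_maps(model, slice_batch, layer_names):
    """Activations of the requested convolutional layers for one preprocessed slice batch."""
    outputs = [model.get_layer(name).output for name in layer_names]
    probe = tf.keras.Model(inputs=model.input, outputs=outputs)
    return probe(slice_batch, training=False)    # one activation tensor per requested layer

# Hypothetical usage, assuming 'cnn' is the trained Keras model and 'batch' holds one slice:
# conv1_maps = feature_maps(cnn, batch, ["conv2d"])[0]   # shape (1, H, W, n_filters)
```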
Fig. 17 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 1. The slice is the same as the one shown in Fig. 1
Fig. 18 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 2. The slice is the same as the one shown in Fig. 1

Fig. 19 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 3. The slice is the same as the one shown in Fig. 1
Fig. 20 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 4. The slice is the same as the one shown in Fig. 1

Fig. 21 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 5. The slice is the same as the one shown in Fig. 1
Fig. 22 Feature map visualizations of the trained CNN from Experiment I. These visualizations were obtained by passing a T1w slice through the network and showing the results directly after convolutional layer 6. The slice is the same as the one shown in Fig. 1
Affiliations

Sebastian R. van der Voort · Marion Smits · Stefan Klein · for the Alzheimer's Disease Neuroimaging Initiative

Sebastian R. van der Voort and Stefan Klein: Biomedical Imaging Group Rotterdam, Departments of Radiology and Nuclear Medicine and Medical Informatics, Erasmus MC - University Medical Centre Rotterdam, Rotterdam, The Netherlands

Marion Smits: Department of Radiology and Nuclear Medicine, Erasmus MC - University Medical Centre Rotterdam, Rotterdam, The Netherlands
