Using 3D CNN for classification of Parkinson’s disease from resting-state fMRI data

Parkinson’s disease is a chronic and progressive movement disorder caused by the degeneration of dopamine-producing neurons in the substantia nigra of the brain. Currently, there is no specific diagnostic test available for Parkinson’s disease, and physicians rely on symptoms and medical history for diagnosis. In this study, a 3D-CNN deep learning model is proposed for detecting Parkinson’s disease using 4D-fMRI data. The data is preprocessed using independent component analysis (ICA) and dual regression processes through MELODIC in FSL, which results in a sequence of 30 3D spatial maps, each with its unique time course. A reference network, referred to as an atlas, is then applied using the fslcc command in FSL to map the 3D spatial maps. Fourteen resting-state networks (RSNs) are identified successfully, while the remaining maps are rejected as noise or artifacts. The detected RSNs or 3D spatial maps are fed into the 3D-CNN model, which is trained with a 10-fold cross-validation method. The proposed model has an accuracy of 86.07% on average.


Introduction
Parkinson's disease (PD) is a neurodegenerative disorder characterized by the degeneration of dopamine producing neurons in the substantia nigra (SN) [1][2][3].PD may affect anybody, regardless of age and gender; however, it is more prevalent in males compared to women.Although it chiefly affects the elderly, but it can affect younger people as well.The symptoms of PD can be controlled with medicine and deep brain stimulation at the moment, but there is no known treatment for the condition [4,5].
Since there is no conclusive test for PD, diagnosis might be difficult.Clinical signs such as bradykinesia, resting tremor, muscular rigidity, and postural instability are used by doctors to make diagnoses, although these symptoms can only be detected once they have appeared in a patient [5][6][7].However, prior to the onset of clinical symptoms, individuals may experience other symptoms including olfactory dysfunction [8][9][10], constipation [11,12], sleep disturbances [13,14], and mood disorders [15,16].Since PD is dormant until motor or clinical symptoms appear, these early signs offer a chance for an early diagnosis.Radiographic data can be used to determine those who could be more susceptible to developing PD.
Many studies have suggested that functional connectivity differences can identify people with PD from others who are healthy.Particularly, differences in functional connectivity have been observed in the supramarginal gyrus, visual, and sensorimotor areas of the cerebral cortex [17].Another study found reduced functional connectivity in the visual networks, right middle frontal gyrus, and angular gyrus in PD patients [18].Decreased functional connectivity was also observed in the default mode network (DMN), executive control network (ECN), salience network (SN), precuneus network (PCUN), and sensorimotor network in PD patients compared to healthy individuals [19].Higher connectivity within cerebral nodes was observed in PD patients in a different study [20].
In a study by [21], functional connectivity was investigated in both healthy individuals and those diagnosed with Parkinson's disease (PD).PD patients were categorized into two groups: those with mild cognitive impairment (MCI) and those with normal cognition.Among patients with MCI, more pronounced metabolic and functional connectivity abnormalities were observed in frontoparietal connections when compared to both healthy controls and patients with normal cognition.
Another study by [22] also explored differences in functional connectivity networks among PD patients.They found that patients with posterior cortical deficits exhibited pronounced functional connectivity within the basal ganglia when compared to patients with frontostriatal deficits.Conversely, patients with frontostriatal deficits showed reduced functional connectivity in various other networks, including the visual, default mode network (DMN), salience network (SN), and others, in comparison to both healthy controls and patients with normal cognition.
Machine learning (ML) and deep learning (DL) models have been trained on fMRI data to predict PD.One study proposed a low-dimensional representation of restingstate fMRI data as a potential biomarker for PD identification using DL, achieving a classification accuracy of 96.4% [23].Another study employed pre-trained neural networks (VGG19, VGG16, and Inception v3) for PD classification in fMRI data, achieving accuracies of 91.5%, 88.5%, and 89.5%, respectively [24].In other studies, feature selection and support vector machines (SVM) were used for PD classification based on fMRI data [25,26].Radiomic features extracted from the Brainnetome 246 atlas achieved a discriminative accuracy of 78.07%using SVM.Additionally, a long short-term memory (LSTM) network characterized PD stages with an accuracy of 71.63% based on fMRI data [27].Signal processing techniques, such as short-time Fourier transform and SVM, were employed for feature extraction and classification of fMRI data in another study [28].
The aim of this work is to utilize independent component analysis (ICA) and dual regression to identify 3D spatial maps for PD classification.Additionally, a 3D-CNN model is developed for PD classification.

Methods
The use of 4D-fMRI data for PD prediction is the main goal of this work.While previous works have employed fMRI data for PD prediction, few studies have specifically utilized resting-state networks (RSNs) for PD classification.Therefore, our aim is to explore the potential of RSNs in PD classification.
We will use Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC) to do independent component analysis (ICA), which will help us accomplish this.This technique enables the identification of pertinent RSNs for PD classification by allowing us to breakdown the 4D fMRI data into subject-specific spatial and temporal components.By comparing the obtained networks to previously developed reference networks, we will be able to differentiate between RSNs that contribute to PD classification and those that might be noise or unrelated to the disease.The 3D CNN designed to categorize PD patients will be fed with the identified RSN components.
ICA encompasses three types: spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA, all of which have relevance in fMRI applications.ICA is a method that aims to separate data into groups that are as independent as possible [29].In fMRI, the data consists of both signals that are of interest to us (e.g., task-related and functionrelated signals) and signals that are considered noise (e.g., physiological, respiratory, and motion-related signals, as well as scanner-related configuration) [30].In ICA models, these noise components manifest as separate components, while the task-related and function-related components are of interest for our analysis [29,31,32].

Dataset
For this study, we utilized the Tao Wu dataset [33], which includes age-matched T1 structural images and resting-state fMRI scans of the subjects.All scans were acquired using a Siemens MAGNETOM Trio 3 T scanner.The structural images had a repetition time of 1100 ms, echo time of 3.39 ms, and voxel size of 1 × 1 × 1 mm.Resting-state scans were obtained using an echo sequence with a repetition time of 2 s, echo time of 40 ms, flip angle of 90°, and voxel size of 4 × 4 × 5 mm.
The dataset utilized in this study comprises T1 and fMRI scans obtained from a total of 40 subjects, consisting of 20 control individuals and 20 individuals diagnosed with Parkinson's disease.It is important to note that all PD subjects were at a medium severity level of the disease during the data collection [34].
Each resting-state session had a duration of 8 min.For each subject, there is one T1 MR image and one fMR image.All scans from the 40 subjects were included in the study after manual and visual inspection.It was determined that all scans were in good condition and of high quality, so no scans were excluded.

Preprocessing
In this study, FSL version 6.0.5 was utilized for the preprocessing of the data.The structural T1 images were first aligned with the MNI152 template, a subsequently skull stripped using the Brain Extraction Tool (BET) algorithm available in FSL.The BET algorithm is employed in FSL for brain extraction and skull stripping, which detects and removes non-brain tissue from structural MRI scans, resulting in a final image that contains only brain tissue.This process enhances the accuracy of subsequent image analysis methods applied to brain imaging data.
For the preprocessing of the fMRI images, several steps were performed.Motion correction was applied to correct for any motion artifacts present in the data.Slice timing correction was also performed to account for the differences in acquisition timing of individual slices.Additionally, a Gaussian filter with a full width at half maximum (FWHM) of 8 mm was applied to smooth the fMRI images.Finally, a high-pass filter was used to remove low-frequency noise from the data.
All preprocessed fMRI images were registered to their corresponding pre-processed structural T1 images of the same subjects.Subsequently, these images were registered to the standard MNI152 space and, finally, resampled to 4 mm.All the preprocessing steps for the structural and fMR image have been diagrammatically represented in Fig. 1.
The process of aligning images from distinct geometric spaces to a single one is known as registration.Registration comprises three major components.To begin, a spatial geometric change must be provided.Translation, rotation, scaling, and shearing in the x, y, and z dimensions are the 3D transformation parameters.Rigid-body transformation involves three translations and three rotations consisting of six degrees of freedom Fig. 1 Representing the flow of the preprocessing steps for the fMRI and its corresponding T1 structural image of the same subject (DOF), whereas affine transformation involves three scaling and three shearing variables in addition to the rigid-body parameters, consisting of 12 DOF [35].Generally, the problem of image registration may be summarized as follows: D is the distance measure, S represents the smoother, and C represents the explicit con- straints.The regularization parameter α can be used to modify the smoothness of the displacement in relation to the similarity of the images [36].
Slice timing correction is performed to correct the time differences at which each slice was acquired.During an MRI scan, people are not advised to move their heads.However, inevitable head motions occur, and the data becomes damaged with motionrelated abnormalities.As a result, head motion correction should be applied to fMRI data.Motion correction is achieved by using a rigid-body transformation to register all volumes to a reference volume.The reference volume might be any volume, although it is usually the first or middle volume of the whole data set that is chosen [35].
The weighted average across surrounding voxels is calculated using an 8-mm full width at half maximum (FWHM) Gaussian filter to accomplish spatial smoothing.Smoothing is used to lower the noise in image data.However, it can lead to the diminishing of the signal intensity.A 3D Gaussian function with the standard deviation of σ can be repre- sented by the following [37]: MELODIC version 3.15 was used for performing the multi-session temporal concatenation.Multi-session temporal concatenation is a method used to conduct Group ICA.Group ICA helps to identify resting-state networks (RSNs) that are shared among all individuals in the group.This analysis allows us to uncover whole-brain networks that exhibit similar patterns of functional connectivity.
To further investigate the individual contributions to these group-level RSNs, we utilized a technique known as dual regression [38].Dual regression enables us to determine the subject-specific spatial maps and time courses for each RSN identified at the group level.
Although FSL has the capability to estimate the dimensionality automatically, we followed the approach taken in previous similar studies by setting the number of independent components to 30 [39][40][41][42].In addition, we applied variance normalization to the time courses and set a threshold of 0.5 for the independent component (IC) maps.This allowed us to effectively extract relevant information from the data and obtain a set of 30 independent components/spatial maps.
The set of spatial maps from the group average analysis was used to generate subjectspecific versions of the spatial maps, and associated time series, using dual regression [43,44].First, for each subject, the group-average set of spatial maps is regressed (as spatial regressors in a multiple regression) into the subject's 4D space-time dataset.This (1) results in a set of subject-specific time series, one per group-level spatial map.Next, those time series are regressed (as temporal regressors, again in a multiple regression) into the same 4D dataset, resulting in a set of subject-specific spatial maps, one per group-level spatial map.We then tested for using FSL's randomize permutation-testing tool.
To make it clear, ICA decomposes the fMRI data into spatially independent components, which may include components that represent some genuine neural activity, some physiological noise such as breathing or any other similar brain signals, and it may also include some motion-related artifacts.As we have performed Group ICA, these components represent common patterns of brain activity across all subjects, including both the controls and the subjects affected by PD.Each of these components represents a spatial map and an associated time series but averaged across all the subjects.
Dual regression is used to identify the subject-specific contributions to the Group ICA.It produces subject-specific spatial maps and time courses for each group-level component, allowing for comparisons across the subjects.Dual regression is a two-stage process; in stage 1, group-level spatial maps are regressed against each subjects 4D-fMRI data.It results in subject-specific time courses.Stage 2 uses the time courses from stage 1 to generate subject-specific spatial maps.In essence, dual regression allows us to measure the individual expression of group-level brain networks identified by ICA.
Out of the 30 components obtained from the independent component analysis (ICA), not all of them necessarily correspond to resting-state networks (RSNs).It is possible for ICA to identify components that may be noise or artifacts rather than meaningful networks.To determine the RSNs among the obtained components, we employed the fslcc command in FSL, comparing them against previously known reference networks described in [45].
Through this comparison, we identified a total of 14 components that demonstrated significant correlation with the known RSNs in the reference network.These 14 components represented the RSNs of interest and were utilized as input for the training of a 3D CNN.On the other hand, the remaining 16 components that could not be identified as RSNs were considered to mostly represent noise or other artifacts and were discarded from further analysis.All the identified RSNs have been listed in Table 1.
Figure 2 displays the images of all 30 components obtained from the ICA.In Table 1, we provide the names of the identified RSNs corresponding to the images they represent.The networks are labelled according to their respective numbers in Fig. 1, which are displayed in the top left corner of each spatial map.

Model development and experimental setup
For the classification of the disease, a 3D convolutional neural network (3D-CNN) was employed.The model was trained using Google Colab.The architecture of the 3D CNN comprised three convolutional layers, three max pooling layers, three dropout layers, two dense layers, one flatten layer, and one classification layer.Additionally, a batch normalization layer was utilized after the final max pooling layer.The rectified linear unit (ReLU) activation function was applied to each layer of the neural network, except for the classification layer, which used the sigmoid activation function.L2 regularization was incorporated into all the convolutional and dense layers of the model.Figure 3 depicts the diagrammatic representation of the 3D CNN, while Table 2 provides a detailed description of the architecture of each layer, including the feature maps, strides, kernel size, and other parameters.
During training, a learning rate of 0.0001 was set, and a batch size of 9 was used.The model was trained for a total of 13 epochs, and k-fold cross validation with k = 10 was employed.In k-fold cross-validation, the entire dataset is randomly divided into smaller subsets of nearly equal sizes.The complete dataset is then used to train and test the model, and the cross-validation estimate of accuracy is calculated by dividing the overall number of accurate classifications by the total number Fig. 2 Representing all the ICs/RSNs that were obtained after performing the ICA.Each RSN is represented by a number at its top left corner.All the RSNs that could be identified are described in Table 1 of data points in the dataset [46].At the beginning of the process, 10% of the dataset was set aside for testing purposes and was not used in any other part of the training process except for evaluating the performance of the 3D CNN. Figure 4 represents the entire flow of the methodology including the preprocessing, generation of the ICs, and training and evaluation of the 3D-CNN.measure of the model's ability to generalize well to new data.Sensitivity, also known as the true-positive rate, is defined as the ratio of true positives (TP) predicted by the model to the total number of positive instances.It provides insights into the model's effectiveness in identifying positive cases.On the other hand, specificity, referred to as the true-negative rate, is defined as the ratio of true negatives (TN) predicted by the model to the total number of true negatives.It indicates the model's ability to accurately identify negative cases.
In a binary classification problem, each data point can result in four types of predictions: TP, TN, FP (false positive), or FN (false negative).These metrics can be defined as follows: In this study, independent component analysis (ICA) was utilized to reduce the dimensionality of the 4D fMRI data, which is inherently high dimensional.The fMRI data was transformed into 30 independent components (ICs) by selecting the components with the highest variance.This reduction was achieved through the MELODIC group concatenation technique.Subsequently, the ICs were further decomposed into their respective 3D spatial maps and corresponding time series using dual regression.
The resulting spatial maps were employed as features for a 3D convolutional neural network that was trained to classify Parkinson's disease.The model exhibited an average accuracy of 86.07%, average sensitivity of 87.4%, and average specificity of 85.4% on the testing data when evaluated through a 10-fold cross-validation approach.
This approach demonstrated promising results in utilizing fMRI data for PD classification, achieving a high average accuracy rate.

Conclusions
To the best of our knowledge, this study represents the first instance of utilizing independent component analysis (ICA) within the MELODIC framework to reduce the dimensionality of 4D-fMRI data.Additionally, the separation of time series and spatial maps using ICA in MELODIC has been performed for the first time in the context of building classification models for predicting PD.
The model was trained on a balanced dataset comprising age-matched subjects with an equal representation of men and women.Although PD primarily affects older individuals, the inclusion of young people cannot be disregarded.The model exhibited accurate predictions for PD; however, there is room for improvement in terms of testing accuracy.
It is important to note that parameters for MRI scans may vary between labs and manufacturers, as each follows its own protocols.In this dataset, all samples were acquired from the same lab and manufacturer.Consequently, the deep learning model may have Furthermore, there are differences in total intracranial volumes between men and women.Previous studies have shown that models trained exclusively on brain scans of either men or women tend to have better predictive capabilities compared to models trained on mixed-gender data.

Fig. 3
Fig. 3 Representation of the 3D CNN used in this for the classification of PD learned certain features specific to the acquisition protocol employed by that particular lab.

Table 1
List of spatial maps that were identified by comparing the obtained RSNs to a known reference network

Table 2
Describing in detail the architecture of the 3D CNN

Layer Feature map Stride Kernel Dropout rate Activation Pool size Regularization (l2 = 0.05)
et al.Journal of Engineering and Applied Science (2023) 70:89