Leveraging Lightweight Pretrained Model for Brain Tumour Detection

This study presents an analysis of two deep learning models deployed for brain tumour detection: the lightweight pretrained MobileNetV2 and a novel hybrid model that combines the lightweight MobileNetV2 with VGG16. The aim is to investigate the performance and efficiency of these models in terms of accuracy and training time. The hybrid model integrates the strengths of both architectures, leveraging the depth-wise separable convolutions of MobileNetV2 and the deeper feature-extraction capabilities of VGG16. Through experimentation and evaluation on a publicly available benchmark brain tumour dataset, the results demonstrate that the hybrid model achieves superior training and testing accuracies of 99% and 98%, respectively, compared to the standalone MobileNetV2 model, even at fewer epochs. This novel fusion model presents a promising approach for enhancing brain tumour detection systems, offering improved accuracy with reduced training time and computational resources.


Introduction
Since the beginning of image processing, medical imaging has been a leading and unquestionably one of the most important fields chosen by researchers. In today's intensely competitive society, maintaining one's health is crucial for survival. Within health discourse, cancer is the most perilous and life-threatening problem. The most lethal and concerning malignancies for both children and adults include those of the brain, bladder, colorectal region, blood (leukaemia), breast, kidney, lung, and prostate [1]. Leukaemia, brain tumours, and lymphomas are the three most prevalent cancers in children.
The study presents the concept of automatically segmenting brain tumours using MRI in order to view the anatomy of the brain. MRI scans are used throughout the entire study; CT scans are less preferable for diagnosis in comparison to MRI scans. In most cases, MRI uses both radio waves and a magnetic field to form the MR image. In the field of identifying brain tumours, a plethora of algorithms have been developed [2]. They might, however, be subject to some limitations in terms of extraction and detection.
Brain tumour segmentation is the most important and difficult challenge in converting the current manual method into an automated one. Segmentation is the most important job in identifying a tumour [3]. Due to various difficulties and anomalies, segmentation is regarded as the most critical step in medical image processing. Additionally, the majority of brain MRI scans contain noise, deviation, and other artefacts. Consequently, the precise segmentation of brain MRI images has grown into a laborious operation.
The remainder of the paper is organized as follows: the problem statement is discussed in Section II. Section III elucidates the corpus of relevant research undertaken in this domain. The proposed methodology and theoretical arguments are discussed in Section IV. The results are highlighted in Section V using a number of performance metrics. The general discussion and conclusion are drawn in Sections VI and VII.

Problem Statement
A combination of imaging methods, histological examination, and clinical evaluation is often employed to ascertain the grade and level of a brain tumour. The first phase is acquiring the patient's medical history and performing a comprehensive physical examination.
Neuroimaging: Brain tumours may often be seen and located using imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI). These techniques offer thorough details regarding the shape, location, and features of the tumour, which may be used to establish its grade and level. Finally, a biopsy is performed to confirm the tumour's grade and level definitively.
The available options for treatment can be influenced by factors such as the type, size, and location of the tumour. The treatment focus may vary between providing relief from symptoms and aiming for a cure. There is a wide range of brain tumour types, with approximately 120 different variations, many of which can be treated. Advancements in medicine are extending the lives of numerous individuals and improving their overall quality of life. The process of normal cell growth involves the replacement of old or damaged cells in a controlled manner, allowing healthy cells to thrive. However, tumour cells exhibit uncontrolled proliferation for reasons that are not yet fully understood. A primary tumour can be either benign or malignant. Benign tumours grow in a contained manner and do not invade nearby brain tissue. The main goal is to automate such processes in order to achieve better outcomes with minimal resources.

Related Literature
Binary classifiers have been used in earlier research to distinguish between benign and tumorous images. Histogram equalization, the discrete wavelet transform, and feed-forward ANNs can be employed. Ullah et al. [4] suggested a hybrid system for the categorization of brain MRI.
It is essential to extract features at both levels, i.e., low and high. The Fisher vector (FV) was used by Cheng et al. [5] to extract brain tumour features. The statistical characteristics retrieved from SIFT and BoW (bag of words) are high-level features that are formed on a local scale without taking spatial information into account.
A hybrid energy-efficient technique was developed by Rajan and Sundar [6]. Their suggested approach has seven extensive phases and a purported 98% accuracy. Their suggested model's key flaw is its lengthy computation time caused by the employment of various approaches.
Deep learning techniques have been widely employed for brain MRI categorization over the last ten years [7,8]. The deep learning approach does not require manual labour (handcrafting). One of the main challenges in MR imaging categorization is closing the semantic divide between the raw visual data acquired by the imaging machine and the intricate visual knowledge acquired by human assessors. Convolutional neural networks are well-known deep learning approaches for image data, utilised to gather the pertinent features for classification in order to close this semantic gap.
Previous research studies [9,10] have utilized CNN architectures to classify brain tumours. These CNN models employ convolution and pooling operations to extract features from scans. The primary objective was to identify the most effective deep learning architecture for precisely categorizing MRI images. Francisco et al. [11] deployed a multipath architecture, specifically designed for automatic segmentation of brain tumours, including gliomas, meningiomas, and pituitary tumours. They utilized a publicly available dataset consisting of T1-weighted, contrast-enhanced MRI scans to evaluate their proposed model, achieving an impressive accuracy of 97.3%. However, it should be noted that their training process is resource-intensive in terms of computational costs.
A paradigm to categorise brain tumours in several stages was proposed by Preethi and Aishwarya [12]. They used the GLCM to create the feature matrix, and OFPA was used to further reduce the derived features. Using the chosen features, a neural network achieved 92% accuracy.
An extreme learning machine with local receptive fields (ELM-LRF) for brain tumour identification was proposed by Ari and Hanbay [13]. In this method, the tumour image is given to a CNN model in order to extract features. The pooled features are then fed to the hidden layer of the ELM classifier. On their dataset, the suggested method has a 97.18% accuracy rate.
To summarize, the aforementioned research papers indicate that deep learning techniques achieve higher accuracy in classifying brain MRI scans compared to traditional machine learning methods. However, for deep learning models to surpass conventional ML approaches, they require a substantial volume of training data. Recent studies clearly demonstrate the integration of deep learning approaches into expert systems, intelligent systems, and medical image analysis. Moreover, it is crucial to acknowledge the limitations of the methodologies discussed above when dealing with brain tumour classification and segmentation.

Proposed Model
The detailed explanation of the proposed model, considering various numbers of layers and pre-trained models, is provided in the subsequent sections. Figure 1 shows the basic architecture of our model, which is common to both MobileNetV2 and the proposed concatenated model of MobileNetV2 and VGG16. The proposed model is founded on deep learning principles, employing distinct hyperparameters for training while optimizing these parameters through a loss function and the Adam optimizer. A loss function in machine learning is a means of evaluating the efficacy of a given algorithm in modelling the provided data. Over time, the model progressively learns to minimize prediction errors with the assistance of an optimization function. Various loss functions are available and frequently utilized for binary classification tasks, where the model is trained to predict the likelihood of each class. The binary cross-entropy loss calculates the average loss for the entire batch by comparing the predicted probabilities with the actual labels. The Adam optimizer is employed due to its effective amalgamation of adaptive learning rates, momentum, and gradient optimization techniques. Its utilization accelerates the training process, facilitates improved convergence, and enhances the overall performance of neural networks. The workflow concludes by evaluating the trained model on the test dataset to obtain the final performance metrics, making predictions on new, unseen brain tumour images, performing any necessary post-processing or analysis on the predictions, and iterating to refine the model.
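For illustration, the batch-averaged binary cross-entropy described above can be sketched in plain NumPy (a minimal stand-in for the framework's built-in loss; the clipping constant `eps` is our own addition to keep the logarithms finite, not part of the paper's setup):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over a batch of predicted probabilities."""
    # Clip predictions away from 0 and 1 so log() never receives zero.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # The batch loss is the mean of the per-sample losses.
    return float(np.mean(per_sample))

# Confident predictions on the correct class yield a small loss.
loss = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
```

Minimising this quantity with Adam is what drives the predicted probabilities toward the true labels during training.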

Dataset
This study makes use of the Kaggle Brain Tumour Detection dataset [14]. The collection provides 1500 positive samples and 1500 negative samples, for a total of 3000 samples. For the purposes of this study, the dataset was divided into training, testing, and validation subsets, adhering to a proportional allocation ratio of 80:10:10, respectively. Another, smaller Kaggle dataset [15], containing 98 no-tumour and 155 tumour images, is used for comparing the results of the fusion model.
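The 80:10:10 allocation can be sketched as a shuffled index split (the function name and fixed seed are illustrative, not the authors' exact procedure):

```python
import numpy as np

def split_dataset(images, labels, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle and split arrays into train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))        # random order over all samples
    n_train = int(train_frac * len(idx))
    n_val = int(val_frac * len(idx))
    tr = idx[:n_train]                        # first 80% -> training
    va = idx[n_train:n_train + n_val]         # next 10% -> validation
    te = idx[n_train + n_val:]                # remaining ~10% -> test
    return (images[tr], labels[tr]), (images[va], labels[va]), (images[te], labels[te])
```

On the 3000-sample dataset this yields 2400 training, 300 validation, and 300 test images.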

Image Preprocessing, Data Distribution, and Augmentation
This phase is common to both models. These preprocessing steps assume paramount significance when employing pre-trained models, guaranteeing compliance between the input data and the model's specific requirements, including consistent dimensions, normalized pixel values, and an appropriate format. Adhering to these protocols ensures that the images are suitably prepared for subsequent inference or training with the pretrained MobileNetV2 model, thereby enhancing the accuracy of predictions and facilitating effective learning. The phase includes fetching the images from the dataset, maintaining the aspect ratio while scaling to a certain width, converting the files to NumPy arrays, and normalising the pixel values by dividing by 255.0. By ensuring that pixel values are scaled to a uniform range of [0, 1], the normalisation process promotes uniformity within the input data. Figures 2 and 3 show no-tumour and tumour images after preprocessing, respectively. The split of the data into sets improves the consistency and dependability of the machine learning model. In our study, the data were split as follows: 80% training, 10% test, and the remaining 10% validation. This process facilitates model construction, optimisation, and evaluation, leading to more accurate and dependable predictions on unobserved data.
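The per-image preparation (resize to the target size, convert to a NumPy array, divide by 255.0) can be sketched without any imaging library; nearest-neighbour index sampling stands in here for the interpolation that PIL or OpenCV would normally perform:

```python
import numpy as np

def preprocess_image(image, target_size=(224, 224)):
    """Resize an image array and normalise pixel values to [0, 1]."""
    h, w = image.shape[:2]
    # Nearest-neighbour sampling: pick the source row/column closest to
    # each target position (a stand-in for proper interpolation).
    rows = np.arange(target_size[0]) * h // target_size[0]
    cols = np.arange(target_size[1]) * w // target_size[1]
    resized = image[rows][:, cols]
    # Dividing by 255.0 maps 8-bit pixel values into the uniform range [0, 1].
    return resized.astype(np.float32) / 255.0
```

Every image, regardless of its original resolution, then arrives at the network as a (224, 224, 3) float array with values in [0, 1].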
Augmentation techniques, such as zooming, shifting, and rescaling, are employed to modify the training data, enhancing the model's stability and capacity to generalize to unseen examples. Moreover, the pixel values of the images are normalized by scaling them within the range of 0 to 1. These preprocessing steps, which involve data augmentation and normalization, contribute to the overall improvement of the model's performance and its ability to handle diverse datasets.
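The zooming, shifting, and rescaling described above can be sketched with Keras's `ImageDataGenerator`; the specific numeric ranges below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling, zooming, and shifting as described in the text; the numeric
# ranges here are placeholder values, not the authors' configuration.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255.0,      # normalise pixel values to [0, 1]
    zoom_range=0.2,           # random zoom in/out by up to 20%
    width_shift_range=0.1,    # random horizontal shift by up to 10%
    height_shift_range=0.1,   # random vertical shift by up to 10%
)

# Draw one augmented batch from a toy set of random "images".
images = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")
labels = np.zeros(8)
batch_x, batch_y = next(augmenter.flow(images, labels, batch_size=4))
```

Because each epoch sees freshly perturbed copies of the training images, the model is less likely to memorise pixel-level detail and generalises better to unseen scans.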

Model Creation
MobileNetV2 and VGG16 serve as the proposed model's two base models. Each base model uses convolutional layers to extract features from the input image, which is 224x224 pixels with three channels. To minimise the spatial dimensions, global average pooling layers are then applied to the outputs of the base models.
Pre-trained MobileNetV2 Model - MobileNetV2 is a convolutional neural network architecture created with the goal of achieving high accuracy while being computationally light and efficient. It has been effectively trained on numerous datasets, including the ImageNet dataset, which has over a million labelled images across thousands of classes. Pre-trained ImageNet weights are loaded into the MobileNetV2 model. The input images are scaled to fit the (224, 224) target size. To keep the previously trained weights and stop them from changing during training, the base model is frozen. To minimise the spatial dimensions, a global average pooling layer is implemented; it receives the output of the base model as input.
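A minimal Keras sketch of this standalone model follows. The sigmoid head reflects the paper's binary tumour/no-tumour objective; `weights=None` is used here only to keep the sketch runnable offline, whereas the paper loads ImageNet weights (`weights="imagenet"`):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mobilenet_classifier(input_shape=(224, 224, 3), weights=None):
    """Frozen MobileNetV2 base + global average pooling + sigmoid head."""
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False                    # freeze the pre-trained layers
    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)    # collapse spatial dimensions
    outputs = layers.Dense(1, activation="sigmoid")(x)  # tumour / no tumour
    return Model(inputs, outputs)
```

The model would then be compiled with binary cross-entropy and the Adam optimizer, matching the training setup described earlier.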
Hybrid Model - The model architecture consists of two base models, MobileNetV2 and VGG16, concatenated with a fully connected layer and an output layer, as shown in Figure 4. The MobileNetV2 and VGG16 models were loaded with pre-trained weights from the ImageNet dataset. The input images were resized to a target size of (224, 224). The base models were frozen, ensuring that their pre-trained weights were not updated during training. Global average pooling layers were added after each base model to reduce spatial dimensions. The outputs of the base models were then concatenated to merge the features. A dense layer comprising 128 units and utilizing the ReLU activation function was incorporated to capture more intricate and abstract representations. Subsequently, a concluding dense layer employing the sigmoid activation function was introduced to facilitate binary classification. By fusing the outputs of both models, the fusion model benefits from the complementary nature of the features extracted by MobileNetV2 and VGG16. This combination can improve the model's ability to capture both low-level and high-level features, leading to a more comprehensive representation of the input images. The flow of the proposed models is illustrated in Figure 5.
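The fusion architecture just described can be sketched in Keras as follows. As before, `weights=None` merely keeps the sketch offline-friendly; the paper uses ImageNet weights for both bases:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fusion_model(input_shape=(224, 224, 3), weights=None):
    """Concatenated MobileNetV2 + VGG16 features -> Dense(128, ReLU) -> sigmoid."""
    mnet = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape)
    vgg = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape)
    mnet.trainable = False                    # freeze both pre-trained bases
    vgg.trainable = False
    inputs = layers.Input(shape=input_shape)
    # Each base gets its own global average pooling before fusion.
    a = layers.GlobalAveragePooling2D()(mnet(inputs, training=False))
    b = layers.GlobalAveragePooling2D()(vgg(inputs, training=False))
    x = layers.Concatenate()([a, b])             # merge the two feature vectors
    x = layers.Dense(128, activation="relu")(x)  # abstract joint representation
    outputs = layers.Dense(1, activation="sigmoid")(x)  # binary classification
    return Model(inputs, outputs)
```

MobileNetV2 contributes a 1280-dimensional pooled feature vector and VGG16 a 512-dimensional one, so the concatenated representation feeding the 128-unit dense layer has 1792 dimensions.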

Result Evaluation and Analysis
The MobileNetV2 model achieved testing accuracy of 85.00% after 10 epochs, 89.99% after 30 epochs, and 97.00% after 50 epochs. The accuracy steadily improved with each epoch, indicating that the model learned and generalized better over time. The highest testing accuracy of 97.00% was achieved after 50 epochs, suggesting that the model had converged and performed well on unseen data.
The MobileNetV2 model achieved a training accuracy of 94.41% after 10 epochs, 97.92% after 30 epochs, and 99.00% after 50 epochs. The training accuracy increased with each epoch, indicating that the model was able to fit the training data better over time. The highest training accuracy of 99.00% was achieved after 50 epochs, suggesting that the model had learned the training data well. The MobileNetV2 model achieved a validation accuracy of 83.99% after 10 epochs, 88.33% after 30 epochs, and 97.00% after 50 epochs.
The validation accuracy followed a similar trend to the testing accuracy, indicating that the model was able to generalize well to unseen validation data. The highest validation accuracy of 97.00% was achieved after 50 epochs, suggesting that the model performed well on both the training and validation data. All the results for the model at different epochs are summarized in Table 1. Overall, the MobileNetV2 model showed consistent improvement in accuracy as the number of epochs increased. It achieved high accuracy levels on both the testing and validation datasets, indicating its effectiveness in classifying brain MRI data. It is important to monitor overfitting and other factors, such as the size of the dataset, as well. Model accuracy and loss variations can be seen in Figure 6.

For the fusion model, the training accuracy is 97.52% with a dataset size of 253 and 99.4% with a dataset size of 3000 after 10 epochs. The fusion model achieved a testing accuracy of 92.59% with a dataset of 253 and 98% with a dataset size of 3000 after 10 epochs. It achieved a validation accuracy of 83.33% with a dataset of 253 and 95.99% with a dataset size of 3000 after 10 epochs. Overall, at 20 epochs, the model achieved a high training accuracy of 99.87%, indicating that it learned the training data well. The testing accuracy of 96.66% suggests that the model generalizes reasonably well to unseen data. Finally, the validation accuracy of 97.00% indicates that the model performs well on a separate validation dataset, further validating its effectiveness. All the results for the fusion model at different epochs are summarized in Table 2. Model accuracy and loss variations at epoch 20 can be seen in Figure 7.

Discussion
For the diagnosis and categorization of MRI brain tumours, radiologists have traditionally utilised a method based on human inspection, which relies on the radiologists' knowledge of the various image components. Because manually processing large-scale datasets requires a great deal of time, operator-assisted classification methods lack reproducibility and reliability when dealing with vast amounts of data. Computer-aided diagnosis tools that can process large volumes of data effectively are needed to solve such issues.
Handcrafted features predicated on high-level and low-level attributes are frequently derived through conventional feature extraction methods in machine learning. This presents a noteworthy quandary when applied to tumour analysis employing machine learning techniques. The precise localization, morphology, dimensions, and contour of a tumour region within an MR image bear a profound correlation to the discriminative features and salient information associated with any given brain tumour. Substantial divergences in configuration, size, and intensity are observable across brain tumours. Consequently, manually devised features based on conventional machine learning approaches may not offer a pragmatic means of capturing the informational intricacy conveyed by these intensity variations. Recent strides in computer-aided medical endeavours have been made feasible by embracing deep learning-powered automatic feature extraction and classification methodologies. The principal challenge in MR image classification and recognition lies in reconciling the discrepancy between the high-level data perceptible to a human evaluator and the low-level visual data captured by an MRI machine. By obviating the necessity for human-crafted features, this classification approach efficaciously extracts the most salient features that encapsulate both low-level and high-level information representation. Convolutional Neural Network (CNN) deep learning models adeptly extract consequential features using a hierarchically structured learning strategy, evincing the superiority of deep learning models in delivering optimal outcomes. In the early layers, these models apprehend rudimentary structural elements such as edges and shapes, while the final layers encode abstract interpretations of specific characteristics.
Furthermore, transfer learning has garnered significant traction owing to its capacity to surmount data constraints, accelerate training procedures, enhance performance outcomes, enable domain adaptation, and offer pragmatic and versatile solutions. This technique has emerged as a valuable asset for researchers and practitioners across diverse domains, empowering them to attain cutting-edge outcomes despite resource and time limitations. In the specific context of evaluating and categorizing brain tumours, the fusion of feature layers from both heavy-weight and lightweight pre-trained models has exhibited exceptional performance.

Conclusion
In essence, the presented research entails an investigation within the field of brain tumour classification, employing transfer learning and deep convolutional neural network (CNN) architectures. We deployed two architectures, the standalone lightweight MobileNetV2 and a fusion of MobileNetV2 and VGG16, on an MRI image dataset from Kaggle to detect tumours, and compared the two. To evaluate the models, testing, training, and validation accuracies were calculated. The fusion model attained a testing accuracy of 98.00%, the highest among all experiments. A larger dataset increases the likelihood that the model will encounter the variety of variances and patterns present in the data, which can improve generalisation. Considering the required computational resources and complexity, MobileNetV2 is also a good option and can be further explored with fine-tuning, using different datasets, for multi-class classification to determine the type of tumour.
Although two models for brain tumour detection were examined in this work, there is still more to be researched. In the future, we would like to investigate alternative vital and time-efficient deep neural network topologies for the detection and classification of brain tumours.