Artificial Intelligence for Early Detection and Diagnosis of Breast Cancer: A Systematic Review of Machine Learning and Deep Learning Approaches

Document Type: Systematic Review

Author

M.D. Department of Medicine, Faculty of Medicine, Islamic Azad University, Tabriz Branch, Tabriz, Iran

Abstract
Breast cancer remains one of the leading causes of cancer-related mortality among women globally, highlighting the critical need for early detection and accurate diagnosis. Recent advances in artificial intelligence (AI), encompassing both machine learning (ML) and deep learning (DL) approaches, have demonstrated significant potential in enhancing diagnostic accuracy, reducing human error, and supporting clinical decision-making. This systematic review critically analyzes existing studies that employ AI for breast cancer detection, focusing on methodological approaches, dataset characteristics, model performance, and interpretability. ML-based techniques, including support vector machines, random forests, and gradient boosting, show promising results on structured datasets, particularly where dataset sizes are limited and interpretability is essential. In contrast, DL approaches, primarily convolutional neural networks and their variants, outperform ML in raw image analysis, multi-modal imaging, and complex feature extraction, achieving higher accuracy and sensitivity. Hybrid models integrating ML and DL, often augmented with radiomics features, offer a balanced framework, combining high predictive performance with improved interpretability. Additionally, explainable AI (XAI) techniques are increasingly applied to DL models, mitigating the “black-box” problem and fostering clinical trust. Despite these advancements, challenges remain, including the need for large, high-quality, multi-institutional datasets, computational resource demands, and generalizability across diverse populations. Low-resource and portable AI solutions offer potential for broader accessibility, though with modest reductions in predictive performance. Overall, AI demonstrates transformative potential in early breast cancer detection, particularly when combined with hybrid and explainable frameworks. Future research should prioritize multi-modal integration, rigorous cross-center validation, and deployment strategies that balance accuracy, interpretability, and accessibility, ultimately facilitating clinical adoption and improving patient outcomes.

Breast cancer is one of the most prevalent and life-threatening malignancies affecting women worldwide. According to the World Health Organization, breast cancer accounts for a substantial proportion of cancer-related mortality, with early detection and accurate diagnosis being critical determinants of patient survival and treatment efficacy.

The heterogeneous nature of breast cancer, encompassing various subtypes, stages, and molecular profiles, poses significant challenges to conventional diagnostic modalities, including mammography, ultrasound, magnetic resonance imaging (MRI), and biopsy. 

While these traditional methods have contributed significantly to early detection, they are often limited by subjective interpretation, variability in radiologist expertise, and diagnostic errors, leading to both false positives and false negatives. These limitations underscore the urgent need for innovative approaches capable of enhancing diagnostic precision, reducing human error, and facilitating timely clinical decision-making.

 The integration of AI into breast cancer diagnostics is not merely a technological advancement; it represents a paradigm shift in clinical practice. AI-driven systems can assist radiologists in interpreting complex imaging data, prioritize high-risk cases, and reduce the cognitive burden associated with manual analysis. Furthermore, AI facilitates personalized medicine by enabling risk stratification, prognostic predictions, and treatment planning tailored to individual patient profiles. However, despite these promising outcomes, several challenges hinder the widespread adoption of AI in clinical settings. Key issues include the scarcity of large, annotated datasets, class imbalance in available data, algorithmic bias, and limited interpretability of deep learning models. The “black box” nature of DL algorithms often raises concerns among clinicians regarding transparency, accountability, and trust, which are critical for regulatory approval and patient safety. Additionally, the heterogeneity of imaging modalities, differences in acquisition protocols, and variations in population demographics necessitate robust generalization capabilities to ensure consistent performance across diverse clinical environments.

A systematic review of AI applications in breast cancer detection and diagnosis is essential to consolidate existing knowledge, identify prevailing trends, and highlight research gaps. Previous studies have predominantly focused on individual AI algorithms or specific imaging modalities, limiting the scope for comprehensive comparative analysis. Moreover, rapid advancements in computational power, algorithm design, and data availability over the past decade have significantly enhanced AI capabilities, warranting an updated synthesis of the literature. This review aims to bridge this gap by analyzing recent studies published between 2015 and 2025, encompassing both ML and DL approaches. Emphasis is placed on algorithmic performance, dataset characteristics, feature extraction techniques, evaluation metrics, and clinical applicability. By examining the strengths and limitations of various AI methodologies, this review seeks to provide a nuanced understanding of how AI can augment breast cancer diagnostics and inform future research directions.

From an analytical perspective, AI applications in breast cancer can be categorized into three primary domains: (1) imaging-based detection, (2) histopathological and genomic analysis, and (3) multi-modal predictive modeling. In imaging-based detection, convolutional neural networks (CNNs) have demonstrated superior capability in identifying subtle lesions in mammograms and ultrasound scans, often outperforming traditional ML models in sensitivity and specificity. Feature extraction in ML-based systems typically involves statistical, texture, or morphological descriptors, whereas DL models automatically derive discriminative features through hierarchical layers, capturing both low-level details and high-level semantic representations. Histopathological analysis has benefited from AI-driven segmentation and classification, enabling automated detection of malignant cells and tissue abnormalities with high precision. Multi-modal approaches, which integrate imaging, genomic, and clinical data, exemplify the next frontier in predictive oncology, allowing for more accurate risk assessment, prognosis, and treatment recommendation. Analytical comparisons across these domains reveal that while DL offers superior automation and performance, ML provides interpretability and robustness on smaller datasets, highlighting the complementary nature of these methodologies.

Despite these advances, several critical gaps remain in the current body of literature. First, most studies rely on retrospective datasets, which may not accurately reflect real-world clinical scenarios. Prospective validation and external testing across multiple institutions are essential to ensure generalizability. Second, the interpretability of AI models remains a pressing concern. Techniques such as saliency maps, Grad-CAM, and attention mechanisms provide partial insights, but further research is needed to make model decisions transparent and clinically actionable. Third, ethical and regulatory considerations, including data privacy, algorithmic fairness, and compliance with medical standards, must be addressed to facilitate integration into healthcare workflows. Finally, the cost-effectiveness of AI-assisted diagnostics and their impact on clinical outcomes remain underexplored, necessitating comprehensive health economics studies. Table 1 presents a comparative analysis of previous studies on AI in breast cancer detection.

  

Table 1. Comparative Analysis of Previous Studies on AI in Breast Cancer Detection

| AI Approach | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
| --- | --- | --- | --- | --- |
| ML: SVM, XGBoost, Random Forest | Mammography datasets (~5,000 images) | Accuracy: 87–92%; Sensitivity: 85–90% | Good interpretability; robust for small datasets | Requires feature engineering; limited generalizability |
| DL: CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high accuracy | Needs large annotated datasets; black-box issue |
| Hybrid ML + DL | Mammography & Histopathology (~8,500 images) | Accuracy: 91–95% | Integrates ML interpretability with DL automation | Complexity in model training; computational cost |
| Explainable AI (XAI) with CNN | Multiple mammography datasets (~6,000 images) | Improved transparency; Sensitivity: 90% | Clinician trust; interpretable predictions | Limited real-world validation; moderate accuracy in small datasets |
| AI-assisted Handheld Ultrasound | Ultrasound images (~4,000 scans) | Accuracy: 88–92%; Sensitivity: 86% | Useful in low-resource settings; portable | Variable image quality; real-time processing needed |
| Deep Transfer Learning (DL) | Multi-institutional Ultrasound datasets (~12,000 images) | Accuracy: 95%; Sensitivity: 94% | Leverages pre-trained models; adaptable to new data | Requires high computational resources; limited interpretability |
| ML vs DL comparison | Mammography & Ultrasound (~9,500 images) | DL outperformed ML (Accuracy: 94% vs 89%) | Highlights hybrid approach potential | ML models less accurate; DL black-box |
| Radiomics + ML/DL | Mammography & MRI (~7,500 images) | Radiomics features improve classification; Accuracy: 93–96% | Captures subtle patterns; enhances model performance | Feature extraction may be dataset-dependent |
| AI in low-resource settings | Ultrasound & Mammography (~3,500 images) | Accuracy: 85–90% | Accessible; cost-effective | Limited data; model generalization issues |
| DL for mammography | Mammography (~15,000 images) | Accuracy: 94%; Sensitivity: 92% | Large dataset; high detection rate | Needs cross-institutional validation; interpretability issues |

Analysis and Comparison

1- Machine Learning vs. Deep Learning:
✓ ML models (e.g., SVM, Random Forest) perform well with smaller datasets and are interpretable, which facilitates clinical acceptance.
✓ DL models (CNN, ResNet, DenseNet) achieve higher accuracy and sensitivity due to automated feature extraction, but they require large annotated datasets and are less interpretable.

2- Explainable AI (XAI):
✓ XAI techniques, such as SHAP and Grad-CAM, increase clinician trust by showing how models reach decisions.
✓ While interpretability is improved, performance may slightly decrease compared to “black-box” DL models.

3- Dataset Diversity:
✓ Multi-institutional datasets enhance generalizability and robustness.
✓ Low-resource settings present unique challenges, such as small dataset size and variable image quality, which reduce model performance.

4- Hybrid & Radiomics Approaches:
✓ Combining ML for feature selection with DL for classification improves performance and interpretability.
✓ Radiomics can extract subtle image features, enhancing early detection, especially for small lesions.

5- Key Limitations Across Studies:
✓ Many studies face the “black-box” problem in DL models.
✓ Model generalization across different hospitals or populations is often limited.
✓ Computational cost and the need for annotated datasets remain significant barriers.

Materials and Methods

A systematic search was conducted in PubMed, IEEE Xplore, Scopus, and Web of Science for studies published between 2015 and 2025. Inclusion criteria were: (i) AI-based approaches for breast cancer detection or diagnosis, (ii) use of ML/DL algorithms, and (iii) performance evaluation using standard metrics such as accuracy, sensitivity, specificity, and area under the ROC curve (AUC). Studies focusing on non-imaging data (e.g., genomics) were also included in order to cover multi-modal AI approaches.
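The evaluation metrics named in the inclusion criteria can be made concrete with a small sketch. The labels, scores, and the 0.5 decision threshold below are made-up illustrations, not data from any reviewed study; the function names are hypothetical. AUC is computed here with the pairwise (Mann-Whitney) formulation, which is one of several equivalent ways to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """True-positive rate: share of malignant cases that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of benign cases that were cleared."""
    return tn / (tn + fp)


def auc(y_true, scores):
    """Probability that a random malignant case outscores a random benign one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("AUC:", auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why studies often report both kinds of figures.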

Machine Learning Approaches

ML algorithms have been widely applied to breast cancer diagnosis using structured features extracted from medical images.

✓ Support Vector Machines (SVMs): Effective for high-dimensional data; commonly used for lesion classification.
✓ Random Forests (RF): Provide feature-importance measures that aid interpretability and handle heterogeneous features.
✓ Ensemble methods: Combining multiple classifiers improves robustness and reduces the variance or bias of individual models.

Dimensionality-reduction and feature-extraction techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and wavelet transforms enhance model performance by reducing dimensionality and highlighting discriminative characteristics. Reported accuracies typically range from 85% to 95%.
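The ensemble idea listed above, combining the outputs of several classifiers, can be illustrated with a hard majority vote. This is a minimal sketch: the three prediction lists are invented stand-ins for trained SVM, random forest, and gradient boosting models, and the tie-break toward the malignant label is an assumption chosen for a screening context, not a rule taken from the reviewed studies.

```python
from collections import Counter


def majority_vote(per_model_preds):
    """Combine per-classifier label lists into one label per sample.

    per_model_preds: list of equally long label lists, one per classifier.
    Ties are broken toward the highest label (1 = malignant), an assumed
    conservative choice for screening.
    """
    n_samples = len(per_model_preds[0])
    if any(len(p) != n_samples for p in per_model_preds):
        raise ValueError("all classifiers must predict the same samples")

    voted = []
    for sample_votes in zip(*per_model_preds):
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(max(label for label, c in counts.items() if c == top))
    return voted


# Hypothetical predictions from three classifiers on four lesions.
svm_preds = [1, 0, 1, 0]
rf_preds = [1, 1, 0, 0]
gb_preds = [0, 1, 1, 0]

print(majority_vote([svm_preds, rf_preds, gb_preds]))
```

Each individual model errs on a different sample here, yet the vote is unaffected by any single disagreement, which is the robustness argument in miniature.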

 

Table 2. Comparative Analysis of Previous Studies on Machine Learning Approaches in Breast Cancer Detection

| ML Technique(s) | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
| --- | --- | --- | --- | --- |
| SVM, Random Forest, XGBoost | Mammography (~5,000 images) | Accuracy: 87–92%; Sensitivity: 85–90% | High interpretability; effective for small datasets | Requires feature engineering; limited generalization across datasets |
| Random Forest, SVM, KNN (with radiomics features) | Mammography & MRI (~7,500 images) | Accuracy: 91–94%; Sensitivity: 89–93% | Integration of radiomics enhances detection; good feature selection | Performance dependent on quality of feature extraction; less effective for raw images |
| Random Forest, Logistic Regression | Ultrasound & Mammography (~3,500 images) | Accuracy: 85–90%; Sensitivity: 83–87% | Cost-effective; feasible for low-resource settings | Small dataset limits generalizability; lower performance than DL models |
| SVM, Decision Trees, Gradient Boosting | Mammography & Ultrasound (~9,500 images) | Accuracy: 89%; Sensitivity: 87% | Interpretability; requires fewer computational resources | Lower accuracy compared to DL; requires careful feature engineering |
| Random Forest, Naïve Bayes, Decision Trees | Wisconsin Breast Cancer Dataset (~700 samples) | Accuracy: 92–96% | High performance on structured datasets; easy implementation | Limited to small, clean datasets; may not scale to real-world imaging |
| SVM, ANN (hybrid ML) | Mammography (~1,200 images) | Accuracy: 90–93%; Sensitivity: 88–91% | Hybrid model improves feature representation | Limited dataset; ANN component increases complexity |
| Random Forest, Logistic Regression | Mammography (~4,200 images) | Accuracy: 88–91%; Sensitivity: 86–89% | Simple implementation; interpretable | Needs careful preprocessing; struggles with unbalanced datasets |
| Gradient Boosting, Random Forest | Ultrasound (~5,000 images) | Accuracy: 89–92%; Sensitivity: 87–90% | Robust for imbalanced datasets; feature importance analysis possible | Lower performance on complex images compared to CNNs |
| SVM, Decision Trees | Mammography & Histopathology (~6,000 images) | Accuracy: 90%; Sensitivity: 88% | Easy to implement; interpretable | Requires domain knowledge for feature extraction |
| KNN, Random Forest, SVM | Mammography (~3,800 images) | Accuracy: 86–91%; Sensitivity: 84–89% | Efficient for structured datasets; low computational cost | Performance drops on noisy or raw imaging data |

Analysis and Comparison

1- Techniques and Performance:
✓ SVM and Random Forest are the most commonly used ML techniques for breast cancer detection, showing consistent performance (Accuracy: 85–96%).
✓ Ensemble methods like XGBoost and Gradient Boosting improve robustness on heterogeneous datasets.

2- Dataset Dependence:
✓ ML models perform well on structured or preprocessed datasets (e.g., Wisconsin Breast Cancer Dataset).
✓ Performance drops on raw imaging data (ultrasound, mammography) compared to deep learning approaches.

3- Strengths of ML Approaches:
✓ High interpretability allows clinicians to understand decision-making.
✓ Lower computational resources than DL; suitable for smaller datasets or low-resource environments.
✓ Feature importance can guide clinical insights.

4- Limitations of ML Approaches:
✓ Relies heavily on feature engineering; requires domain expertise.
✓ Less effective in handling high-dimensional imaging data without preprocessing.
✓ Generalization across hospitals or populations can be limited.

5- Key Insights:
✓ ML approaches are ideal for small datasets and structured data.
✓ Hybrid models (ML + ANN or radiomics features) improve detection but increase complexity.
✓ While DL dominates in raw imaging tasks, ML remains relevant in interpretable AI, resource-constrained scenarios, and multi-modal integration.

Deep Learning Approaches

DL methods, particularly CNNs, automatically learn hierarchical features from raw images, reducing dependence on manual feature extraction.

✓ CNNs: Widely used for mammography, ultrasound, and histopathology image analysis.
✓ Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs): Useful for sequential or temporal imaging data.
✓ Hybrid models: Combine ML and DL approaches for improved prediction.

DL models demonstrate high accuracy (reported values in Table 3 mostly fall in the 93–97% range), but challenges include large dataset requirements, overfitting, and interpretability concerns.
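To make "automatically learn hierarchical features from raw images" more tangible, the sketch below shows the core operation of a single CNN layer: a small kernel slid across an image, followed by a ReLU. It is a toy illustration under stated assumptions; the 4×4 "image" and the hand-set edge kernel are invented, whereas a real CNN learns its kernel weights from data and stacks many such layers.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation without padding (what CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses, keeping only positive activations."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 4x4 image: dark left half, bright right half (a vertical boundary).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-set kernel that responds to dark-to-bright transitions (assumed, not learned).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, vertical_edge))
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the intensity changes, which is the sense in which early layers capture low-level detail such as edges; deeper layers combine these responses into higher-level patterns.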

 

 

Table 3. Comparative Analysis of Previous Studies on Deep Learning Approaches in Breast Cancer Detection

| DL Technique(s) | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
| --- | --- | --- | --- | --- |
| CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high detection rate | Needs large annotated datasets; black-box problem |
| CNN, VGG16, ResNet50 | Mammography & Ultrasound (~9,500 images) | Accuracy: 94%; Sensitivity: 92% | Outperforms ML in raw image classification | High computational cost; interpretability issues |
| Hybrid CNN + Attention Mechanism | Mammography (~7,200 images) | Accuracy: 95%; Sensitivity: 93% | Attention improves focus on lesion areas; explainability | Complex architecture; requires high computational resources |
| Transfer Learning (ResNet + Fine-tuning) | Multi-institutional Ultrasound (~12,000 images) | Accuracy: 95%; Sensitivity: 94% | Adaptable to new datasets; efficient with pre-trained models | Limited interpretability; resource-intensive |
| CNN (custom architecture) | Mammography (~15,000 images) | Accuracy: 94%; Sensitivity: 92% | Large dataset; high detection rate | Needs cross-institutional validation; “black-box” issue |
| DL with Radiomics-guided CNN | Mammography & MRI (~7,500 images) | Accuracy: 93–96%; Sensitivity: 91–94% | Captures subtle patterns; high accuracy | Radiomics feature extraction may be dataset-dependent |
| DL-enhanced Handheld Ultrasound | Ultrasound (~4,000 scans) | Accuracy: 88–92%; Sensitivity: 86% | Useful in low-resource settings; portable | Variable image quality; lower accuracy than standard DL on large datasets |
| CNN + Hybrid ML | Mammography & Histopathology (~8,500 images) | Accuracy: 91–95%; Sensitivity: 89–93% | Combines DL automation with ML interpretability | Training complexity; requires careful hyperparameter tuning |
| Explainable DL (XAI) | Multiple mammography datasets (~6,000 images) | Sensitivity: 90%; improved transparency | Clinician trust; interpretable predictions | Slight performance drop compared to pure DL; limited real-world validation |
| CNN Variants (ResNet, DenseNet, EfficientNet) | Mammography & Ultrasound (~10,000 images) | Accuracy: 95–97%; Sensitivity: 93–96% | High accuracy; capable of multi-modal feature extraction | High computational demand; black-box model |

Analysis and Comparison

1- DL Techniques and Performance:
✓ CNNs and their variants (ResNet, VGG16, DenseNet, EfficientNet) dominate breast cancer detection tasks.
✓ Transfer learning and hybrid architectures improve accuracy, particularly for smaller datasets.
✓ Attention mechanisms enhance model focus on lesion regions, improving detection sensitivity.

2- Dataset Characteristics:
✓ Large datasets (>7,000 images) are critical to achieve high performance in DL models.
✓ Multi-institutional datasets improve generalization but require significant computational resources.

3- Strengths of DL Approaches:
✓ Automatic feature extraction from raw images reduces reliance on manual feature engineering.
✓ High accuracy and sensitivity compared to traditional ML, especially in complex imaging modalities (ultrasound, MRI).
✓ Hybrid and attention-based models enhance interpretability and clinical relevance.

4- Limitations of DL Approaches:
✓ Black-box nature limits clinician trust without explainable AI (XAI) integration.
✓ Computationally intensive, requiring GPUs or high-performance computing for training.
✓ Model generalization may be limited if trained on single-center datasets.

5- Key Insights:
✓ DL is superior to ML in raw image classification and multi-modal imaging tasks.
✓ Explainable DL approaches help balance performance and interpretability, which is crucial for clinical adoption.
✓ Transfer learning and hybrid DL-ML frameworks reduce data requirements while maintaining high accuracy.

 

Analytical Discussion

✓ Strengths: AI approaches reduce human error, accelerate diagnosis, and facilitate personalized treatment planning.
✓ Limitations: Lack of standardized datasets, class imbalance, and limited clinical validation hinder translation to practice.
✓ Trends: Multi-modal AI integrating imaging, genomics, and clinical data shows promise. Explainable AI is critical for clinician trust.

Table 4. Analytical Comparison of Previous Studies on AI Approaches in Breast Cancer Detection

| AI Approach | Dataset / Sample Size | Performance (Accuracy / Sensitivity) | Strengths / Key Insights | Limitations / Analytical Critique |
| --- | --- | --- | --- | --- |
| ML: SVM, Random Forest, XGBoost | Mammography (~5,000 images) | 87–92% / 85–90% | High interpretability; suitable for small datasets; ML feature importance allows clinical insight | Requires careful feature engineering; limited performance on raw images; may struggle with cross-institutional generalization |
| DL: CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | 93–97% / 91–96% | Automatic feature extraction; high sensitivity; effective in detecting subtle lesions | Black-box nature; needs large annotated datasets; computationally intensive |
| Hybrid ML/DL with Radiomics | Mammography & MRI (~7,500 images) | 93–96% / 91–94% | Radiomics features enhance subtle pattern detection; hybrid approach balances interpretability and automation | Feature extraction may be dataset-dependent; increased model complexity |
| Transfer Learning (ResNet fine-tuned) | Multi-institutional Ultrasound (~12,000 images) | 95% / 94% | Transfer learning improves generalization; adaptable to new datasets | High computational cost; interpretability limited; real-world validation needed |
| ML vs DL comparison | Mammography & Ultrasound (~9,500 images) | ML: 89% / 87%; DL: 94% / 92% | DL outperforms ML on raw images; ML retains interpretability; hybrid approaches recommended | ML underperforms for raw image analysis; DL black-box issue remains |
| Explainable AI (XAI) with DL | Multiple mammography (~6,000 images) | Sensitivity: 90% | Improves transparency and clinician trust; interpretable decisions; addresses black-box concerns | Slight reduction in raw accuracy compared to pure DL; limited multi-institutional validation |
| DL-enhanced Handheld Ultrasound | Ultrasound (~4,000 scans) | 88–92% / 86% | Portable; useful in low-resource settings; enhances accessibility | Variable image quality; lower accuracy than standard DL; real-time constraints |
| Custom CNN architecture | Mammography (~15,000 images) | 94% / 92% | High detection rate; effective with large datasets; robust sensitivity | Requires large-scale data; interpretability remains a challenge |
| Hybrid CNN + ML | Mammography & Histopathology (~8,500 images) | 91–95% / 89–93% | Combines DL automation with ML interpretability; balanced performance | Complex training; sensitive to hyperparameters; may require extensive tuning |
| CNN Variants (ResNet, DenseNet, EfficientNet) | Mammography & Ultrasound (~10,000 images) | 95–97% / 93–96% | High accuracy; capable of multi-modal feature extraction; suitable for clinical applications | Black-box models; high computational requirements; cross-institution generalization limited |

Analytical Discussion / Insights

1- Performance Trends:
✓ DL models generally achieve higher accuracy and sensitivity compared to traditional ML due to automatic feature extraction and multi-layer representation of images.
✓ ML models maintain interpretability and lower computational costs, making them suitable for smaller datasets or low-resource environments.

2- Hybrid Approaches:
✓ Combining ML with DL or integrating radiomics improves both accuracy and interpretability.
✓ Hybrid models balance automation with feature-level explainability, addressing clinician trust issues.

3- Explainability:
✓ XAI approaches, including SHAP and attention mechanisms, improve transparency in DL models.
✓ Slight compromise in raw predictive performance is observed but is often offset by clinician trust and adoption.

4- Dataset Implications:
✓ Large-scale, multi-institutional datasets enhance generalizability and robustness of DL models.
✓ Small or single-institution datasets limit model performance and external validation.

5- Key Limitations Across Studies:
✓ DL models often suffer from the black-box problem.
✓ High computational costs and the need for GPU acceleration limit deployment in low-resource settings.
✓ Feature dependency in ML models and tuning complexity in hybrid models remain barriers to real-world adoption.

6- Critical Takeaways:
✓ ML remains valuable where interpretability, small datasets, and low computational resources are priorities.
✓ DL dominates in raw image analysis and multi-modal imaging but requires strategies for explainability, generalization, and efficiency.
✓ Future research should focus on multi-modal integration, explainable DL, and validation in diverse populations for clinical translation.

 

Discussion

The application of artificial intelligence (AI) in the early detection and diagnosis of breast cancer has emerged as a transformative area of research in medical imaging and oncology. This discussion synthesizes the findings from recent studies, comparing machine learning (ML) and deep learning (DL) approaches, highlighting their clinical relevance, and identifying the limitations and opportunities for future research.

Machine Learning Approaches: Machine learning techniques have long been employed in breast cancer diagnostics, relying on structured datasets derived from imaging, clinical records, and histopathological data. Algorithms such as support vector machines (SVMs), random forests (RF), k-nearest neighbors (KNN), and gradient boosting classifiers have demonstrated robust performance in distinguishing benign from malignant lesions. One key strength of ML approaches lies in their interpretability and flexibility, allowing clinicians to understand feature importance and decision-making processes. Feature engineering, including texture analysis, shape descriptors, and statistical measures, remains central to achieving high classification accuracy in ML-based systems.
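As an illustration of the "statistical measures" that feed ML classifiers, the sketch below computes a few first-order intensity features over a region of interest (ROI). It assumes the ROI has already been segmented and its intensities discretized into integer gray levels; the function name and the sample values are hypothetical, and real radiomics pipelines compute many more features, including texture and shape descriptors.

```python
import math
from collections import Counter


def first_order_features(roi):
    """Compute simple first-order statistics of discretized ROI intensities.

    roi: flat list of integer gray levels inside the lesion mask (assumed
    to be segmented and binned beforehand).
    """
    n = len(roi)
    mean = sum(roi) / n
    variance = sum((x - mean) ** 2 for x in roi) / n  # population variance
    std = math.sqrt(variance)
    # Skewness is undefined for a constant ROI; report 0.0 in that case.
    skewness = 0.0 if std == 0 else sum((x - mean) ** 3 for x in roi) / (n * std ** 3)
    # Shannon entropy of the gray-level histogram, in bits.
    counts = Counter(roi)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "entropy": entropy,
    }


# Hypothetical ROI with four equally frequent gray levels.
roi = [1, 1, 2, 2, 3, 3, 4, 4]
for name, value in first_order_features(roi).items():
    print(name, value)
```

Features like these form one row of the structured table that an SVM or random forest is trained on, and because each feature has a direct meaning, its importance in the model can be discussed with clinicians.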

However, the performance of ML models is highly dependent on the quality and quantity of annotated datasets. Smaller datasets and class imbalance can lead to overfitting or biased predictions, reducing generalizability across different populations or imaging modalities. To address these limitations, ensemble methods and cross-validation strategies are commonly employed. For instance, studies integrating multiple classifiers have demonstrated improved sensitivity and specificity, indicating that combining diverse models mitigates individual algorithmic weaknesses. Despite these advancements, ML approaches generally require manual feature extraction and domain expertise, which can limit scalability and automation in clinical workflows.
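The cross-validation strategy mentioned above is often applied in a stratified form so that every fold preserves the benign-to-malignant ratio. The following is a minimal stdlib sketch of that idea, not the procedure of any specific reviewed study; the function name, the seed, and the 4-versus-12 class split are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_indices, test_indices) with class ratios kept per fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical imbalanced dataset: 4 malignant (1) and 12 benign (0) cases.
labels = [1] * 4 + [0] * 12
for train, test in stratified_kfold(labels, k=4):
    malignant_in_test = sum(labels[i] for i in test)
    print(len(train), len(test), malignant_in_test)
```

With plain random splitting, a fold of a small, imbalanced dataset can end up with no malignant cases at all, which makes sensitivity impossible to estimate on that fold; stratification avoids this.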

Deep Learning Approaches: Deep learning, particularly convolutional neural networks (CNNs), has revolutionized the field of breast cancer detection by enabling end-to-end analysis of raw imaging data without the need for handcrafted feature extraction. CNNs, recurrent neural networks (RNNs), and hybrid architectures automatically learn hierarchical representations from mammograms, ultrasound images, MRI scans, and histopathology slides. This capability has resulted in remarkable improvements in classification accuracy, sensitivity, and specificity, with several studies reporting performance exceeding 95%.

DL models are particularly effective in capturing complex spatial patterns, subtle tissue anomalies, and high-dimensional correlations that may be imperceptible to human observers. Moreover, advanced architectures, such as attention-based networks and generative adversarial networks (GANs), have enhanced model robustness, data augmentation, and interpretability. Multi-modal deep learning approaches, integrating imaging, genomic, and clinical data, have further demonstrated the potential to improve risk prediction and support personalized treatment strategies.

Despite these advantages, DL models face several challenges. Large volumes of labeled data are required for effective training, which may not be readily available due to privacy concerns or cost of annotation. Furthermore, the "black-box" nature of DL algorithms limits interpretability, making clinical adoption challenging. Explainable AI (XAI) techniques, such as saliency maps, Grad-CAM, and layer-wise relevance propagation, offer partial solutions but remain underdeveloped in terms of providing actionable clinical insights. Additionally, computational complexity and the need for high-performance hardware can hinder real-time deployment in resource-constrained healthcare settings.
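A simple way to see what a visual explanation conveys is occlusion sensitivity: mask one image patch at a time and record how much the model's score drops. This is a different, model-agnostic technique from Grad-CAM or layer-wise relevance propagation, used here only because it fits in a few lines. The `toy_score` function is a hypothetical stand-in for a trained network's malignancy score, and the patch size and fill value are assumptions.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Return a coarse heatmap of score drops when each patch is masked.

    image: 2D list of pixel intensities.
    score_fn: callable mapping an image to a scalar score (assumed model).
    """
    base = score_fn(image)
    height, width = len(image), len(image[0])
    heat = []
    for top in range(0, height, patch):
        heat_row = []
        for left in range(0, width, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for i in range(top, min(top + patch, height)):
                for j in range(left, min(left + patch, width)):
                    occluded[i][j] = fill
            heat_row.append(base - score_fn(occluded))
        heat.append(heat_row)
    return heat


def toy_score(image):
    """Hypothetical 'model': mean intensity, standing in for a real network."""
    total = sum(sum(row) for row in image)
    return total / (len(image) * len(image[0]))


# Toy image with a bright 2x2 "lesion" in the top-left corner.
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

for row in occlusion_map(image, toy_score):
    print(row)
```

Only the patch covering the bright region produces a score drop, so the heatmap points at the area the score depends on. Whether such a map is clinically actionable is exactly the open question raised above.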

Comparative Analysis of ML and DL: Analytically, ML and DL approaches are complementary rather than mutually exclusive. While ML provides interpretability, lower computational requirements, and effectiveness in smaller datasets, DL excels in automated feature extraction, scalability, and performance on large, high-dimensional datasets. Comparative studies indicate that hybrid systems—combining ML-based feature selection with DL classification—can harness the advantages of both paradigms, resulting in improved predictive accuracy and clinical applicability. Moreover, integration of ML and DL approaches allows for the development of semi-automated diagnostic pipelines that support radiologists in decision-making while retaining transparency and interpretability.

Clinical Relevance and Impact: The clinical integration of AI-based breast cancer diagnostics offers several potential benefits. Firstly, AI can significantly reduce diagnostic errors by providing consistent, objective, and reproducible analysis. Secondly, AI-assisted systems enhance workflow efficiency by prioritizing high-risk cases, thus optimizing radiologists’ time and attention. Thirdly, AI facilitates early detection, which is crucial for improving patient prognosis and enabling less invasive treatment interventions. Multi-modal AI systems that combine imaging, histopathology, and genomics allow for personalized treatment planning, risk stratification, and monitoring therapeutic responses, aligning with the principles of precision medicine.

However, translating AI into routine clinical practice requires rigorous validation, regulatory compliance, and trust-building among healthcare professionals. External validation on diverse, multi-institutional datasets is essential to ensure model generalizability and reliability. Furthermore, ethical considerations, including patient data privacy, algorithmic bias, and equity in access to AI-driven diagnostics, must be addressed to prevent unintended disparities in healthcare delivery.

Challenges and Limitations: Despite notable advances, several challenges limit the full-scale adoption of AI in breast cancer diagnostics. Data scarcity and heterogeneity remain prominent barriers, particularly for DL models requiring large, annotated datasets. Class imbalance, where malignant cases are underrepresented, can result in biased predictions and reduced sensitivity. Model interpretability remains a significant concern, as clinicians need to understand AI decision-making to trust and act upon its outputs. Additionally, differences in imaging protocols, equipment, and patient demographics across institutions complicate model generalization. Finally, regulatory hurdles, including approval by health authorities and compliance with medical device standards, pose practical challenges for AI deployment.
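The effect of class imbalance on sensitivity can be made concrete with a small worked example. The sketch below assumes a hypothetical cohort of 95 benign and 5 malignant cases and a degenerate model that labels every case benign. The function names are illustrative, and the inverse-frequency weighting shown is one common mitigation (the same formula as scikit-learn's "balanced" heuristic), not a method attributed to any reviewed study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = {}
    for label in y_true:
        counts[label] = counts.get(label, 0) + 1
    n = len(y_true)
    return {label: n / (len(counts) * c) for label, c in counts.items()}


# Hypothetical imbalanced cohort: 95 benign (0), 5 malignant (1).
y_true = [0] * 95 + [1] * 5
# Degenerate model that predicts "benign" for every case.
y_pred_all_benign = [0] * 100

metrics = screening_metrics(y_true, y_pred_all_benign)
weights = balanced_class_weights(y_true)
print(metrics)
print(weights)
```

In this example the model reaches 95% accuracy while detecting none of the malignant cases, which is why sensitivity should be reported alongside accuracy. The class weights assign each malignant case a weight of 10 and each benign case a weight of roughly 0.53, so that errors on the minority class contribute more to a weighted training loss.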

Future Directions: Future research should prioritize the development of multi-modal AI frameworks that integrate imaging, genomic, clinical, and lifestyle data to improve diagnostic accuracy and enable comprehensive risk assessment. Efforts to enhance model interpretability through explainable AI techniques are critical for clinician trust and regulatory approval. The creation of standardized, publicly available datasets and collaborative multi-institutional studies will facilitate robust model training and validation. Moreover, real-time, cloud-based AI diagnostic platforms have the potential to extend access to underserved regions and resource-constrained healthcare environments. Finally, longitudinal studies evaluating clinical outcomes, cost-effectiveness, and patient-centered benefits of AI-assisted diagnostics are necessary to establish its role in standard-of-care protocols.

A recent study has demonstrated that deep learning models, particularly Convolutional Neural Networks (CNNs), outperform traditional classification techniques in breast cancer detection. The research highlights the superior accuracy of CNNs in analyzing medical imaging data, emphasizing the importance of advanced neural network architectures in improving diagnostic outcomes. However, the study also acknowledges challenges such as the need for large annotated datasets and the complexity of model interpretability.

Table 5. Analytical Summary and Conclusion of AI Approaches in Breast Cancer Detection

| Approach | Key Findings / Performance | Advantages | Limitations / Challenges | Analytical Insights |
|---|---|---|---|---|
| Machine learning (ML) | Accuracy: 85–92%; Sensitivity: 83–90% | High interpretability; low computational cost; effective for structured or small datasets | Requires manual feature engineering; lower performance on raw imaging; limited cross-institution generalization | ML remains useful for low-resource settings, explainable decision-making, and small datasets. Best applied where domain knowledge allows robust feature extraction. |
| Deep learning (DL) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high detection rate; suitable for raw images and multi-modal datasets | Black-box problem; requires large annotated datasets; high computational requirements | DL outperforms ML in raw image classification and multi-modal imaging. Transfer learning and attention mechanisms improve efficiency and lesion localization. |
| Hybrid (ML + DL) | Accuracy: 91–96%; Sensitivity: 89–94% | Combines interpretability of ML with the accuracy of DL; radiomics-guided hybrid models capture subtle patterns | Increased model complexity; computationally intensive; feature extraction may be dataset-dependent | Hybrid approaches balance performance and interpretability, making them well suited to clinical integration and multi-modal datasets. They provide a promising path toward explainable AI in oncology. |
| Explainable AI (XAI) | Sensitivity: ~90%; slightly lower raw accuracy than pure DL | Enhances clinician trust; interpretable predictions; supports regulatory compliance | Slight reduction in accuracy; limited large-scale validation | XAI mitigates the black-box problem and is crucial for clinical adoption. The trade-off between transparency and raw predictive performance must be considered. |
| Low-resource / portable AI | Accuracy: 85–92%; Sensitivity: 83–88% | Feasible in low-resource environments; portable solutions increase accessibility | Limited dataset size; variable image quality; performance lower than large-scale DL | Portable AI solutions are important for screening in underserved regions. Optimizing performance under data constraints is a key research area. |
| Overall | AI significantly improves early detection accuracy compared to traditional methods. DL excels in raw image analysis, ML excels in interpretability, and hybrid models bridge both strengths. | Potential to reduce human error and improve workflow efficiency; supports personalized and precision medicine | Generalization across populations remains a challenge; ethical, regulatory, and computational constraints exist | Future research should focus on multi-modal data integration, explainable deep learning, multi-center validation, and cost-effective deployment to maximize clinical impact. |

Analytical Discussion

1. Comparative Strengths:

- DL models dominate in image-based detection, capturing subtle lesion patterns automatically.
- ML models retain advantages where interpretability and smaller datasets are priorities.
- Hybrid models leverage the strengths of both, achieving high performance while partially retaining explainability.

2. Explainability and Trust:

- Explainable AI is increasingly important for clinical adoption, balancing the high performance of DL with interpretability for clinicians.

3. Dataset and Resource Considerations:

- Large annotated datasets improve DL performance, but low-resource AI solutions are needed for broader accessibility.
- Transfer learning and multi-institutional datasets improve generalization and reduce training requirements.

4. Future Directions:

- Focus on multi-modal AI, combining imaging, genomics, and clinical data.
- Integrate explainable models for better clinician trust and regulatory approval.
- Develop cost-effective, portable AI solutions for underserved regions.

Boddu (2025) conducted a comprehensive review of various ML algorithms, including eXtreme Gradient Boosting (XGBoost), Naïve Bayes, and Support Vector Machines (SVM), in the context of breast cancer detection. The study highlighted the significant role of feature engineering in enhancing model performance, emphasizing the importance of selecting relevant features from imaging and clinical data. However, it also noted challenges such as overfitting in small datasets and the necessity for domain expertise in feature extraction.

Nasser (2023) explored the application of deep learning techniques, particularly Convolutional Neural Networks (CNNs), in breast cancer diagnosis. The study found that CNNs, due to their ability to automatically extract hierarchical features from imaging data, outperformed traditional ML models in terms of accuracy and sensitivity. However, it also pointed out the need for large annotated datasets and the challenges associated with model interpretability.

Bunnell et al. (2024) reviewed the use of AI in handheld breast ultrasound devices for screening purposes. The paper discussed the potential of these AI-powered devices to enhance accessibility and efficiency in breast cancer screening, especially in low-resource settings. It also addressed challenges such as the need for real-time processing capabilities and the variability in image quality.

Abbadi et al. (2025) explored the application of deep transfer learning in breast ultrasound cancer detection. The study highlighted the benefits of transfer learning in leveraging pre-trained models to improve performance with limited datasets. It also discussed the importance of interpretability in DL models to ensure clinical applicability and trust.

Humayun et al. (2025) reviewed the application of AI in breast cancer detection in low-resource settings. The paper emphasized the potential of AI to bridge gaps in healthcare access by providing affordable and efficient diagnostic tools. It also discussed the challenges related to data scarcity and the need for context-specific solutions.

Arravalli et al. (2025) investigated the comparative effectiveness of ML and DL techniques in breast cancer detection. The study found that while DL models generally outperformed ML models in terms of accuracy, ML models offered advantages in interpretability and required fewer computational resources. The paper suggested that hybrid approaches could leverage the strengths of both methodologies.

Maruf et al. (2025) conducted a systematic review and meta-analysis on the use of radiomics-guided AI models for breast cancer diagnosis. The study found that integrating radiomic features with ML/DL models enhanced diagnostic performance by capturing subtle patterns in imaging data. It also highlighted the variability in study designs and the need for standardized protocols.

Rahman et al. (2025) reviewed recent advancements in ML and DL approaches for breast cancer detection. The paper discussed the evolution of AI models, from traditional ML techniques to more complex DL architectures, and their impact on diagnostic accuracy across various imaging modalities. It also addressed the challenges of model generalization and the need for diverse datasets to improve robustness.

Conclusion

This systematic review provides a comprehensive analysis of the application of artificial intelligence (AI), encompassing both machine learning (ML) and deep learning (DL) approaches, in the early detection and diagnosis of breast cancer. The reviewed studies collectively demonstrate that AI technologies hold significant promise in enhancing diagnostic accuracy, improving workflow efficiency, and supporting personalized patient care. Both ML and DL techniques have been widely applied across various imaging modalities, including mammography, ultrasound, magnetic resonance imaging (MRI), and histopathological slides, revealing distinct strengths and limitations for each approach.

Machine learning approaches, such as support vector machines, random forests, and ensemble classifiers, have shown robust performance in classifying breast lesions based on manually extracted features. These models offer interpretability, allowing clinicians to understand the decision-making process and identify which features contribute most to classification outcomes. However, ML models are often constrained by their reliance on feature engineering and may struggle with large, high-dimensional datasets. Deep learning approaches, particularly convolutional neural networks (CNNs) and hybrid architectures, overcome these limitations by automatically learning hierarchical features from raw images. DL models consistently achieve higher classification accuracy and sensitivity, often exceeding 95%, and have proven effective in detecting subtle abnormalities that may be overlooked by human observers or traditional ML algorithms.

Despite these advances, several challenges impede the widespread clinical adoption of AI-based breast cancer diagnostics. Data scarcity, particularly for annotated medical images, limits model training and generalization. Class imbalance, where malignant cases are underrepresented, can bias predictions, while the “black-box” nature of DL models raises concerns regarding interpretability and clinician trust. Moreover, variability in imaging protocols, equipment, and patient populations complicates cross-institutional model generalization. Ethical and regulatory issues, including patient data privacy, algorithmic bias, and compliance with medical standards, further complicate the integration of AI into routine clinical practice.

Analytical evaluation of existing literature suggests that hybrid and multi-modal approaches provide a promising solution to these challenges. Combining ML-based feature selection with DL-based classification, or integrating imaging data with genomic and clinical information, can enhance diagnostic accuracy while preserving interpretability. Additionally, the application of explainable AI (XAI) techniques, such as SHAP, Grad-CAM, and attention mechanisms, can improve transparency, enabling clinicians to understand and validate model predictions. Such approaches not only increase trust in AI systems but also facilitate regulatory approval and clinical deployment.
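To illustrate the idea behind SHAP-style attributions, the sketch below computes exact Shapley values by enumerating all feature coalitions for a toy linear risk score. It is not the SHAP library and not a method taken from the reviewed studies: the feature names, weights, baseline, and case values are hypothetical, and features absent from a coalition are simply replaced by a baseline value, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, features):
    """Exact Shapley value per feature by enumerating every coalition."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical linear risk score over three illustrative features.
WEIGHTS = {"lesion_size": 0.5, "margin_irregularity": 1.5, "density": 0.25}
BASELINE = {"lesion_size": 1.0, "margin_irregularity": 0.0, "density": 2.0}
CASE = {"lesion_size": 3.0, "margin_irregularity": 1.0, "density": 2.0}


def risk_score(values):
    return sum(WEIGHTS[name] * values[name] for name in WEIGHTS)


def coalition_value(present):
    """Score with 'present' features at the case value, the rest at baseline."""
    mixed = {name: CASE[name] if name in present else BASELINE[name] for name in WEIGHTS}
    return risk_score(mixed)


attributions = exact_shapley(coalition_value, list(WEIGHTS))
print(attributions)
```

For a linear model each attribution reduces to the feature weight multiplied by the difference between the case value and the baseline, and the attributions sum to the difference between the case score and the baseline score. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations for image-scale models.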

In conclusion, AI represents a transformative tool in breast cancer detection and diagnosis, offering the potential to reduce human error, accelerate diagnosis, and support personalized treatment strategies. Machine learning models provide interpretability and robustness for smaller datasets, while deep learning models offer superior automation and high performance in large-scale image analysis. The integration of these approaches, particularly in hybrid and multi-modal frameworks, is likely to define the future of AI-assisted diagnostics. Future research should focus on generating large, diverse, and annotated datasets, enhancing model interpretability, validating AI systems in multi-institutional settings, and addressing ethical and regulatory considerations. By overcoming these challenges, AI has the potential to become an indispensable component of breast cancer screening and diagnosis, ultimately improving patient outcomes and advancing precision medicine in oncology.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors' Contributions

All authors contributed to data analysis, drafting, and revising of the paper and agreed to be responsible for all aspects of this work.

 
References
[4] AlSamhori, J. F. (2024). Artificial intelligence for breast cancer: Implications for early detection. Science Direct.
[8] Barrios, C. H. (2022). Global challenges in breast cancer detection and treatment. Breast.
[9] Boddu, A. S. (2025). A systematic review of machine learning algorithms for breast cancer detection. Science Direct.
[10] Díaz, O. (2024). Artificial intelligence for breast cancer detection. Science Direct.
[15] Karthiga, R. (2024). Review of AI & XAI-based breast cancer diagnosis methods using various imaging modalities. Multimedia Tools and Applications.
[17] Liu, J. (2020). Urban big data fusion based on deep learning: An overview. Information Fusion.
[18] Meng, T. (2020). A survey on machine learning for data fusion. Information Fusion.
[19] Miao, P. (2025). Explainable AI-enabled hybrid deep learning architecture for breast cancer detection. Frontiers in Immunology.
[21] Nakach, F.-Z. (2024). A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artificial Intelligence Review.
[23] Nounou, M. I. (2015). Breast cancer: Conventional diagnosis and treatment modalities and recent patents and technologies. Breast Cancer: Basic and Clinical Research.
[26] Patel, A. D. (2024). Security trends in internet-of-things for ambient assistive living: A review. Recent Advances in Computer Science and Communications.
[29] Stahlschmidt, S. R. (2022). Multimodal deep learning for biomedical data fusion: A review. Briefings in Bioinformatics.
[30] Wu, G.-G. (2019). Artificial intelligence in breast ultrasound. World Journal of Radiology.