Document Type: Systematic Review
Author
M.D. Department of Medicine, Faculty of Medicine, Islamic Azad University, Tabriz Branch, Tabriz, Iran
Breast cancer is one of the most prevalent and life-threatening malignancies affecting women worldwide. According to the World Health Organization, breast cancer accounts for a substantial proportion of cancer-related mortality, with early detection and accurate diagnosis being critical determinants of patient survival and treatment efficacy.
The heterogeneous nature of breast cancer, encompassing various subtypes, stages, and molecular profiles, poses significant challenges to conventional diagnostic modalities, including mammography, ultrasound, magnetic resonance imaging (MRI), and biopsy. While these traditional methods have contributed significantly to early detection, they are often limited by subjective interpretation, variability in radiologist expertise, and diagnostic errors, leading to both false positives and false negatives. These limitations underscore the urgent need for innovative approaches capable of enhancing diagnostic precision, reducing human error, and facilitating timely clinical decision-making.
The integration of AI into breast cancer diagnostics is not merely a technological advancement; it represents a paradigm shift in clinical practice. AI-driven systems can assist radiologists in interpreting complex imaging data, prioritize high-risk cases, and reduce the cognitive burden associated with manual analysis. Furthermore, AI facilitates personalized medicine by enabling risk stratification, prognostic predictions, and treatment planning tailored to individual patient profiles. However, despite these promising outcomes, several challenges hinder the widespread adoption of AI in clinical settings. Key issues include the scarcity of large, annotated datasets, class imbalance in available data, algorithmic bias, and limited interpretability of deep learning models. The “black box” nature of DL algorithms often raises concerns among clinicians regarding transparency, accountability, and trust, which are critical for regulatory approval and patient safety. Additionally, the heterogeneity of imaging modalities, differences in acquisition protocols, and variations in population demographics necessitate robust generalization capabilities to ensure consistent performance across diverse clinical environments.
A systematic review of AI applications in breast cancer detection and diagnosis is essential to consolidate existing knowledge, identify prevailing trends, and highlight research gaps. Previous studies have predominantly focused on individual AI algorithms or specific imaging modalities, limiting the scope for comprehensive comparative analysis. Moreover, rapid advancements in computational power, algorithm design, and data availability over the past decade have significantly enhanced AI capabilities, warranting an updated synthesis of the literature. This review aims to bridge this gap by analyzing recent studies published between 2015 and 2025, encompassing both ML and DL approaches. Emphasis is placed on algorithmic performance, dataset characteristics, feature extraction techniques, evaluation metrics, and clinical applicability. By examining the strengths and limitations of various AI methodologies, this review seeks to provide a nuanced understanding of how AI can augment breast cancer diagnostics and inform future research directions.
From an analytical perspective, AI applications in breast cancer can be categorized into three primary domains: (1) imaging-based detection, (2) histopathological and genomic analysis, and (3) multi-modal predictive modeling. In imaging-based detection, CNNs have demonstrated superior capability in identifying subtle lesions in mammograms and ultrasound scans, often outperforming traditional ML models in sensitivity and specificity. Feature extraction in ML-based systems typically involves statistical, texture, or morphological descriptors, whereas DL models automatically derive discriminative features through hierarchical layers, capturing both low-level details and high-level semantic representations. Histopathological analysis has benefited from AI-driven segmentation and classification, enabling automated detection of malignant cells and tissue abnormalities with high precision. Multi-modal approaches, which integrate imaging, genomic, and clinical data, exemplify the next frontier in predictive oncology, allowing for more accurate risk assessment, prognosis, and treatment recommendation. Analytical comparisons across these domains reveal that while DL offers superior automation and performance, ML provides interpretability and robustness in smaller datasets, highlighting the complementary nature of these methodologies.
Despite these advances, several critical gaps remain in the current body of literature. First, most studies rely on retrospective datasets, which may not accurately reflect real-world clinical scenarios. Prospective validation and external testing across multiple institutions are essential to ensure generalizability. Second, the interpretability of AI models remains a pressing concern. Techniques such as saliency maps, Grad-CAM, and attention mechanisms provide partial insights, but further research is needed to make model decisions transparent and clinically actionable. Third, ethical and regulatory considerations, including data privacy, algorithmic fairness, and compliance with medical standards, must be addressed to facilitate integration into healthcare workflows. Finally, the cost-effectiveness and impact on clinical outcomes of AI-assisted diagnostics remain underexplored, necessitating comprehensive health economics studies. Table (1) shows the comparative analysis of previous studies on AI in breast cancer detection.
Table 1. Comparative Analysis of Previous Studies on AI in Breast Cancer Detection
| AI Approach | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
|---|---|---|---|---|
| ML: SVM, XGBoost, Random Forest | Mammography datasets (~5,000 images) | Accuracy: 87–92%; Sensitivity: 85–90% | Good interpretability; robust for small datasets | Requires feature engineering; limited generalizability |
| DL: CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high accuracy | Needs large annotated datasets; black-box issue |
| Hybrid ML + DL | Mammography & Histopathology (~8,500 images) | Accuracy: 91–95% | Integrates ML interpretability with DL automation | Complexity in model training; computational cost |
| Explainable AI (XAI) with CNN | Multiple mammography datasets (~6,000 images) | Improved transparency; Sensitivity: 90% | Clinician trust; interpretable predictions | Limited real-world validation; moderate accuracy in small datasets |
| AI-assisted Handheld Ultrasound | Ultrasound images (~4,000 scans) | Accuracy: 88–92%; Sensitivity: 86% | Useful in low-resource settings; portable | Variable image quality; real-time processing needed |
| Deep Transfer Learning (DL) | Multi-institutional Ultrasound datasets (~12,000 images) | Accuracy: 95%; Sensitivity: 94% | Leverages pre-trained models; adaptable to new data | Requires high computational resources; interpretability |
| ML vs DL comparison | Mammography & Ultrasound (~9,500 images) | DL outperformed ML (Accuracy: 94% vs 89%) | Highlights hybrid approach potential | ML models less accurate; DL black-box |
| Radiomics + ML/DL | Mammography & MRI (~7,500 images) | Radiomics features improve classification; Accuracy: 93–96% | Captures subtle patterns; enhances model performance | Feature extraction may be dataset-dependent |
| AI in low-resource settings | Ultrasound & Mammography (~3,500 images) | Accuracy: 85–90% | Accessible; cost-effective | Limited data; model generalization issues |
| DL for mammography | Mammography (~15,000 images) | Accuracy: 94%; Sensitivity: 92% | Large dataset; high detection rate | Needs cross-institutional validation; interpretability issues |
Analysis and Comparison
1- Machine Learning vs. Deep Learning:
✓ ML models (e.g., SVM, Random Forest) perform well with smaller datasets and are interpretable, which facilitates clinical acceptance.
✓ DL models (CNN, ResNet, DenseNet) achieve higher accuracy and sensitivity due to automated feature extraction, but they require large annotated datasets and are less interpretable.
2- Explainable AI (XAI):
✓ XAI techniques, such as SHAP and Grad-CAM, increase clinician trust by showing how models reach decisions.
✓ While interpretability is improved, performance may slightly decrease compared to “black-box” DL models.
3- Dataset Diversity:
✓ Multi-institutional datasets enhance generalizability and robustness.
✓ Low-resource settings present unique challenges, such as small dataset size and variable image quality, which reduce model performance.
4- Hybrid & Radiomics Approaches:
✓ Combining ML for feature selection with DL for classification improves performance and interpretability.
✓ Radiomics can extract subtle image features, enhancing early detection, especially for small lesions.
5- Key Limitations Across Studies:
✓ Many studies face the “black-box” problem in DL models.
✓ Model generalization across different hospitals or populations is often limited.
✓ Computational cost and the need for annotated datasets remain significant barriers.
Materials and Methods
A systematic search was conducted in PubMed, IEEE Xplore, Scopus, and Web of Science for studies published between 2015 and 2025. Inclusion criteria were: (i) AI-based approaches for breast cancer detection or diagnosis, (ii) use of ML/DL algorithms, and (iii) performance evaluation using standard metrics such as accuracy, sensitivity, specificity, and area under the ROC curve (AUC). Studies focusing on non-imaging data (e.g., genomics) were also included to highlight multi-modal AI approaches.
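The evaluation metrics named in the inclusion criteria can be made concrete with a short sketch. The function names and the counts below are illustrative assumptions, not data from any reviewed study. AUC is computed here in its pairwise form (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), which is only practical for small examples.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts: 90 malignant cases detected, 10 missed,
# 850 benign cases correctly cleared, 50 false alarms.
metrics = confusion_metrics(tp=90, fp=50, tn=850, fn=10)
```

With these made-up counts, accuracy is 0.94 and sensitivity is 0.90, which shows that the two figures answer different questions and should be reported together.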
Machine Learning Approaches
ML algorithms have been widely applied to breast cancer diagnosis using structured features extracted from medical images.
✓ Support Vector Machines (SVMs): Effective for high-dimensional data; commonly used for lesion classification.
✓ Random Forests (RF): Provide interpretability and handle feature heterogeneity.
✓ Ensemble methods: Combining multiple classifiers improves robustness and reduces bias.
Dimensionality-reduction and feature-extraction techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and wavelet transforms enhance model performance by reducing dimensionality and highlighting discriminative characteristics. Reported accuracies typically range from 85% to 95%.
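The idea behind ensemble methods in the list above can be sketched with simple majority voting, one of several ways to combine classifiers. The three prediction lists are hypothetical stand-ins for the outputs of an SVM, a random forest, and a KNN model; no real models are trained here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per classifier.
    With an odd number of binary classifiers there are no ties.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs (1 = malignant, 0 = benign) for five lesions.
truth = [1, 0, 1, 1, 0]
svm_pred = [1, 0, 1, 0, 0]  # wrong on the fourth lesion
rf_pred = [1, 1, 0, 1, 0]   # wrong on the second and third lesions
knn_pred = [0, 0, 1, 1, 0]  # wrong on the first lesion

ensemble = majority_vote([svm_pred, rf_pred, knn_pred])
```

In this toy case every individual classifier makes at least one error, yet the vote recovers the correct label for all five lesions because the errors fall on different samples. When classifiers make correlated errors, voting helps much less.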
Table 2. Comparative Analysis of Previous Studies on Machine Learning Approaches in Breast Cancer Detection
| ML Technique(s) | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
|---|---|---|---|---|
| SVM, Random Forest, XGBoost | Mammography (~5,000 images) | Accuracy: 87–92%; Sensitivity: 85–90% | High interpretability; effective for small datasets | Requires feature engineering; limited generalization across datasets |
| Random Forest, SVM, KNN (with radiomics features) | Mammography & MRI (~7,500 images) | Accuracy: 91–94%; Sensitivity: 89–93% | Integration of radiomics enhances detection; good feature selection | Performance dependent on quality of feature extraction; less effective for raw images |
| Random Forest, Logistic Regression | Ultrasound & Mammography (~3,500 images) | Accuracy: 85–90%; Sensitivity: 83–87% | Cost-effective; feasible for low-resource settings | Small dataset limits generalizability; lower performance than DL models |
| SVM, Decision Trees, Gradient Boosting | Mammography & Ultrasound (~9,500 images) | Accuracy: 89%; Sensitivity: 87% | Interpretability; requires fewer computational resources | Lower accuracy compared to DL; requires careful feature engineering |
| Random Forest, Naïve Bayes, Decision Trees | Wisconsin Breast Cancer Dataset (~700 samples) | Accuracy: 92–96% | High performance on structured datasets; easy implementation | Limited to small, clean datasets; may not scale to real-world imaging |
| SVM, ANN (hybrid ML) | Mammography (~1,200 images) | Accuracy: 90–93%; Sensitivity: 88–91% | Hybrid model improves feature representation | Limited dataset; ANN component increases complexity |
| Random Forest, Logistic Regression | Mammography (~4,200 images) | Accuracy: 88–91%; Sensitivity: 86–89% | Simple implementation; interpretable | Needs careful preprocessing; struggles with unbalanced datasets |
| Gradient Boosting, Random Forest | Ultrasound (~5,000 images) | Accuracy: 89–92%; Sensitivity: 87–90% | Robust for imbalanced datasets; feature importance analysis possible | Lower performance on complex images compared to CNNs |
| SVM, Decision Trees | Mammography & Histopathology (~6,000 images) | Accuracy: 90%; Sensitivity: 88% | Easy to implement; interpretable | Requires domain knowledge for feature extraction |
| KNN, Random Forest, SVM | Mammography (~3,800 images) | Accuracy: 86–91%; Sensitivity: 84–89% | Efficient for structured datasets; low computational cost | Performance drops on noisy or raw imaging data |
Analysis and Comparison
1- Techniques and Performance:
✓ SVM and Random Forest are the most commonly used ML techniques for breast cancer detection, showing consistent performance (Accuracy: 85–96%).
✓ Ensemble methods like XGBoost and Gradient Boosting improve robustness on heterogeneous datasets.
2- Dataset Dependence:
✓ ML models perform well on structured or preprocessed datasets (e.g., Wisconsin Breast Cancer Dataset).
✓ Performance drops on raw imaging data (ultrasound, mammography) compared to deep learning approaches.
3- Strengths of ML Approaches:
✓ High interpretability allows clinicians to understand decision-making.
✓ Lower computational resources than DL; suitable for smaller datasets or low-resource environments.
✓ Feature importance can guide clinical insights.
4- Limitations of ML Approaches:
✓ Relies heavily on feature engineering; requires domain expertise.
✓ Less effective in handling high-dimensional imaging data without preprocessing.
✓ Generalization across hospitals or populations can be limited.
5- Key Insights:
✓ ML approaches are ideal for small datasets and structured data.
✓ Hybrid models (ML + ANN or radiomics features) improve detection but increase complexity.
✓ While DL dominates in raw imaging tasks, ML remains relevant in interpretable AI, resource-constrained scenarios, and multi-modal integration.
Deep Learning Approaches
DL methods, particularly CNNs, automatically learn hierarchical features from raw images, reducing dependence on manual feature extraction.
✓ CNNs: Widely used for mammography, ultrasound, and histopathology image analysis.
✓ RNNs and LSTMs: Useful for sequential or temporal imaging data.
✓ Hybrid models: Combining ML and DL approaches for improved prediction.
DL models demonstrate high accuracy (often >95%), but challenges include large dataset requirements, overfitting, and interpretability concerns.
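To illustrate what feature learning from raw images builds on, the following sketch applies one convolution step, the basic operation inside a CNN layer, to a toy image. The kernel here is set by hand to respond to vertical edges; in a trained network the kernel weights would be learned from data. The image and kernel values are invented for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most DL frameworks, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses, as a ReLU activation does."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 4x4 "image": dark background (0) with a bright region (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-set kernel that responds to dark-to-bright transitions from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d_valid(image, kernel))
```

The resulting 3x3 feature map is non-zero only in the middle column, where the boundary between the dark and bright regions lies. Stacking many such learned filters is what lets deeper layers respond to progressively more complex structures.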
Table 3. Comparative Analysis of Previous Studies on Deep Learning Approaches in Breast Cancer Detection
| DL Technique(s) | Dataset / Sample Size | Key Findings / Performance | Strengths | Limitations / Challenges |
|---|---|---|---|---|
| CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high detection rate | Needs large annotated datasets; black-box problem |
| CNN, VGG16, ResNet50 | Mammography & Ultrasound (~9,500 images) | Accuracy: 94%; Sensitivity: 92% | Outperforms ML in raw image classification | High computational cost; interpretability issues |
| Hybrid CNN + Attention Mechanism | Mammography (~7,200 images) | Accuracy: 95%; Sensitivity: 93% | Attention improves focus on lesion areas; explainability | Complex architecture; requires high computational resources |
| Transfer Learning (ResNet + Fine-tuning) | Multi-institutional Ultrasound (~12,000 images) | Accuracy: 95%; Sensitivity: 94% | Adaptable to new datasets; efficient with pre-trained models | Limited interpretability; resource-intensive |
| CNN (custom architecture) | Mammography (~15,000 images) | Accuracy: 94%; Sensitivity: 92% | Large dataset; high detection rate | Needs cross-institutional validation; “black-box” issue |
| DL with Radiomics-guided CNN | Mammography & MRI (~7,500 images) | Accuracy: 93–96%; Sensitivity: 91–94% | Captures subtle patterns; high accuracy | Radiomics feature extraction may be dataset-dependent |
| DL-enhanced Handheld Ultrasound | Ultrasound (~4,000 scans) | Accuracy: 88–92%; Sensitivity: 86% | Useful in low-resource settings; portable | Variable image quality; lower accuracy than standard DL on large datasets |
| CNN + Hybrid ML | Mammography & Histopathology (~8,500 images) | Accuracy: 91–95%; Sensitivity: 89–93% | Combines DL automation with ML interpretability | Training complexity; requires careful hyperparameter tuning |
| Explainable DL (XAI) | Multiple mammography datasets (~6,000 images) | Sensitivity: 90%; improved transparency | Clinician trust; interpretable predictions | Slight performance drop compared to pure DL; limited real-world validation |
| CNN Variants (ResNet, DenseNet, EfficientNet) | Mammography & Ultrasound (~10,000 images) | Accuracy: 95–97%; Sensitivity: 93–96% | High accuracy; capable of multi-modal feature extraction | High computational demand; black-box model |
Analysis and Comparison
1- DL Techniques and Performance:
✓ CNNs and their variants (ResNet, VGG16, DenseNet, EfficientNet) dominate breast cancer detection tasks.
✓ Transfer learning and hybrid architectures improve accuracy, particularly for smaller datasets.
✓ Attention mechanisms enhance model focus on lesion regions, improving detection sensitivity.
2- Dataset Characteristics:
✓ Large datasets (>7,000 images) are critical to achieve high performance in DL models.
✓ Multi-institutional datasets improve generalization but require significant computational resources.
3- Strengths of DL Approaches:
✓ Automatic feature extraction from raw images reduces reliance on manual feature engineering.
✓ High accuracy and sensitivity compared to traditional ML, especially in complex imaging modalities (ultrasound, MRI).
✓ Hybrid and attention-based models enhance interpretability and clinical relevance.
4- Limitations of DL Approaches:
✓ Black-box nature limits clinician trust without explainable AI (XAI) integration.
✓ Computationally intensive, requiring GPUs or high-performance computing for training.
✓ Model generalization may be limited if trained on single-center datasets.
5- Key Insights:
✓ DL is superior to ML in raw image classification and multi-modal imaging tasks.
✓ Explainable DL approaches help balance performance and interpretability, which is crucial for clinical adoption.
✓ Transfer learning and hybrid DL-ML frameworks reduce data requirements while maintaining high accuracy.
Analytical Discussion
✓ Strengths: AI approaches reduce human error, accelerate diagnosis, and facilitate personalized treatment planning.
✓ Limitations: Lack of standardized datasets, class imbalance, and limited clinical validation hinder translation to practice.
✓ Trends: Multi-modal AI integrating imaging, genomics, and clinical data shows promise. Explainable AI is critical for clinician trust.
Table 4. Analytical Comparison of Previous Studies on AI Approaches in Breast Cancer Detection
| AI Approach | Dataset / Sample Size | Performance (Accuracy / Sensitivity) | Strengths / Key Insights | Limitations / Analytical Critique |
|---|---|---|---|---|
| ML: SVM, Random Forest, XGBoost | Mammography (~5,000 images) | 87–92% / 85–90% | High interpretability; suitable for small datasets; ML feature importance allows clinical insight | Requires careful feature engineering; limited performance on raw images; may struggle with cross-institutional generalization |
| DL: CNN, ResNet, DenseNet | Ultrasound & Mammography (~10,000 images) | 93–97% / 91–96% | Automatic feature extraction; high sensitivity; effective in detecting subtle lesions | Black-box nature; needs large annotated datasets; computationally intensive |
| Hybrid ML/DL with Radiomics | Mammography & MRI (~7,500 images) | 93–96% / 91–94% | Radiomics features enhance subtle pattern detection; hybrid approach balances interpretability and automation | Feature extraction may be dataset-dependent; increased model complexity |
| Transfer Learning (ResNet fine-tuned) | Multi-institutional Ultrasound (~12,000 images) | 95% / 94% | Transfer learning improves generalization; adaptable to new datasets | High computational cost; interpretability limited; real-world validation needed |
| ML vs DL comparison | Mammography & Ultrasound (~9,500 images) | ML: 89% / 87%; DL: 94% / 92% | DL outperforms ML on raw images; ML retains interpretability; hybrid approaches recommended | ML underperforms for raw image analysis; DL black-box issue remains |
| Explainable AI (XAI) with DL | Multiple mammography (~6,000 images) | Sensitivity: 90% | Improves transparency and clinician trust; interpretable decisions; addresses black-box concerns | Slight reduction in raw accuracy compared to pure DL; limited multi-institutional validation |
| DL-enhanced Handheld Ultrasound | Ultrasound (~4,000 scans) | 88–92% / 86% | Portable; useful in low-resource settings; enhances accessibility | Variable image quality; lower accuracy than standard DL; real-time constraints |
| Custom CNN architecture | Mammography (~15,000 images) | 94% / 92% | High detection rate; effective with large datasets; robust sensitivity | Requires large-scale data; interpretability remains a challenge |
| Hybrid CNN + ML | Mammography & Histopathology (~8,500 images) | 91–95% / 89–93% | Combines DL automation with ML interpretability; balanced performance | Complex training; sensitive to hyperparameters; may require extensive tuning |
| CNN Variants (ResNet, DenseNet, EfficientNet) | Mammography & Ultrasound (~10,000 images) | 95–97% / 93–96% | High accuracy; capable of multi-modal feature extraction; suitable for clinical applications | Black-box models; high computational requirements; cross-institution generalization limited |
Analytical Discussion / Insights
1- Performance Trends:
✓ DL models generally achieve higher accuracy and sensitivity compared to traditional ML due to automatic feature extraction and multi-layer representation of images.
✓ ML models maintain interpretability and lower computational costs, making them suitable for smaller datasets or low-resource environments.
2- Hybrid Approaches:
✓ Combining ML with DL or integrating radiomics improves both accuracy and interpretability.
✓ Hybrid models balance automation with feature-level explainability, addressing clinician trust issues.
3- Explainability:
✓ XAI approaches, including SHAP and attention mechanisms, improve transparency in DL models.
✓ Slight compromise in raw predictive performance is observed but is often offset by clinician trust and adoption.
4- Dataset Implications:
✓ Large-scale, multi-institutional datasets enhance generalizability and robustness of DL models.
✓ Small or single-institution datasets limit model performance and external validation.
5- Key Limitations Across Studies:
✓ DL models often suffer from the black-box problem.
✓ High computational costs and the need for GPU acceleration limit deployment in low-resource settings.
✓ Feature dependency in ML models and tuning complexity in hybrid models remain barriers to real-world adoption.
6- Critical Takeaways:
✓ ML remains valuable where interpretability, small datasets, and low computational resources are priorities.
✓ DL dominates in raw image analysis and multi-modal imaging but requires strategies for explainability, generalization, and efficiency.
✓ Future research should focus on multi-modal integration, explainable DL, and validation in diverse populations for clinical translation.
Discussion
The application of artificial intelligence (AI) in the early detection and diagnosis of breast cancer has emerged as a transformative area of research in medical imaging and oncology. This discussion synthesizes the findings from recent studies, comparing machine learning (ML) and deep learning (DL) approaches, highlighting their clinical relevance, and identifying the limitations and opportunities for future research.
Machine Learning Approaches: Machine learning techniques have long been employed in breast cancer diagnostics, relying on structured datasets derived from imaging, clinical records, and histopathological data. Algorithms such as support vector machines (SVMs), random forests (RF), k-nearest neighbors (KNN), and gradient boosting classifiers have demonstrated robust performance in distinguishing benign from malignant lesions. One key strength of ML approaches lies in their interpretability and flexibility, allowing clinicians to understand feature importance and decision-making processes. Feature engineering, including texture analysis, shape descriptors, and statistical measures, remains central to achieving high classification accuracy in ML-based systems.
However, the performance of ML models is highly dependent on the quality and quantity of annotated datasets. Smaller datasets and class imbalance can lead to overfitting or biased predictions, reducing generalizability across different populations or imaging modalities. To address these limitations, ensemble methods and cross-validation strategies are commonly employed. For instance, studies integrating multiple classifiers have demonstrated improved sensitivity and specificity, indicating that combining diverse models mitigates individual algorithmic weaknesses. Despite these advancements, ML approaches generally require manual feature extraction and domain expertise, which can limit scalability and automation in clinical workflows.
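The cross-validation strategy mentioned above is often applied in a stratified form, so that each fold keeps roughly the same benign-to-malignant ratio as the full dataset. The following is a minimal sketch of that idea; the function name and the label counts are hypothetical, and no shuffling is performed, although in practice indices would normally be shuffled with a fixed seed first.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full dataset.

    Indices of each class are dealt round-robin into k folds.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 20 benign (0) and 5 malignant (1) cases.
labels = [0] * 20 + [1] * 5
splits = list(stratified_k_fold(labels, k=5))
```

With these labels each of the five test folds contains four benign cases and one malignant case, so sensitivity can be estimated on every fold. A plain sequential split of the same data would leave some folds with no malignant cases at all.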
Deep Learning Approaches: Deep learning, particularly convolutional neural networks (CNNs), has revolutionized the field of breast cancer detection by enabling end-to-end analysis of raw imaging data without the need for handcrafted feature extraction. CNNs, recurrent neural networks (RNNs), and hybrid architectures automatically learn hierarchical representations from mammograms, ultrasound images, MRI scans, and histopathology slides. This capability has resulted in remarkable improvements in classification accuracy, sensitivity, and specificity, with several studies reporting performance exceeding 95%.
DL models are particularly effective in capturing complex spatial patterns, subtle tissue anomalies, and high-dimensional correlations that may be imperceptible to human observers. Moreover, advanced architectures, such as attention-based networks and generative adversarial networks (GANs), have enhanced model robustness, data augmentation, and interpretability. Multi-modal deep learning approaches, integrating imaging, genomic, and clinical data, have further demonstrated the potential to improve risk prediction and support personalized treatment strategies.
Despite these advantages, DL models face several challenges. Large volumes of labeled data are required for effective training, which may not be readily available due to privacy concerns or cost of annotation. Furthermore, the "black-box" nature of DL algorithms limits interpretability, making clinical adoption challenging. Explainable AI (XAI) techniques, such as saliency maps, Grad-CAM, and layer-wise relevance propagation, offer partial solutions but remain underdeveloped in terms of providing actionable clinical insights. Additionally, computational complexity and the need for high-performance hardware can hinder real-time deployment in resource-constrained healthcare settings.
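The intuition behind saliency-style explanations can be shown with occlusion sensitivity, a simpler, model-agnostic relative of the techniques named above (it is not Grad-CAM or layer-wise relevance propagation). Each pixel is masked in turn and the drop in the model's score is recorded. The scoring function below is a hypothetical stand-in for a trained network.

```python
def occlusion_map(image, score_fn, baseline=0):
    """Return, per pixel, the score drop when that pixel is replaced by `baseline`.

    Larger values mark pixels that the model's output depends on more strongly.
    """
    reference = score_fn(image)
    heatmap = []
    for i, row in enumerate(image):
        heat_row = []
        for j, _ in enumerate(row):
            occluded = [r[:] for r in image]  # copy so the input stays unchanged
            occluded[i][j] = baseline
            heat_row.append(reference - score_fn(occluded))
        heatmap.append(heat_row)
    return heatmap


def toy_score(img):
    """Hypothetical stand-in model: responds only to the two central pixels."""
    return img[1][1] + img[1][2]


image = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
heatmap = occlusion_map(image, toy_score)
```

The heatmap is non-zero only at the two central pixels, which are the only inputs the toy model uses. For a real classifier, a clinician would check whether the highlighted region coincides with the lesion.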
Comparative Analysis of ML and DL: Analytically, ML and DL approaches are complementary rather than mutually exclusive. While ML provides interpretability, lower computational requirements, and effectiveness in smaller datasets, DL excels in automated feature extraction, scalability, and performance on large, high-dimensional datasets. Comparative studies indicate that hybrid systems—combining ML-based feature selection with DL classification—can harness the advantages of both paradigms, resulting in improved predictive accuracy and clinical applicability. Moreover, integration of ML and DL approaches allows for the development of semi-automated diagnostic pipelines that support radiologists in decision-making while retaining transparency and interpretability.
Clinical Relevance and Impact: The clinical integration of AI-based breast cancer diagnostics offers several potential benefits. Firstly, AI can significantly reduce diagnostic errors by providing consistent, objective, and reproducible analysis. Secondly, AI-assisted systems enhance workflow efficiency by prioritizing high-risk cases, thus optimizing radiologists’ time and attention. Thirdly, AI facilitates early detection, which is crucial for improving patient prognosis and enabling less invasive treatment interventions. Multi-modal AI systems that combine imaging, histopathology, and genomics allow for personalized treatment planning, risk stratification, and monitoring therapeutic responses, aligning with the principles of precision medicine.
However, translating AI into routine clinical practice requires rigorous validation, regulatory compliance, and trust-building among healthcare professionals. External validation on diverse, multi-institutional datasets is essential to ensure model generalizability and reliability. Furthermore, ethical considerations, including patient data privacy, algorithmic bias, and equity in access to AI-driven diagnostics, must be addressed to prevent unintended disparities in healthcare delivery.
Challenges and Limitations: Despite notable advances, several challenges limit the full-scale adoption of AI in breast cancer diagnostics. Data scarcity and heterogeneity remain prominent barriers, particularly for DL models requiring large, annotated datasets. Class imbalance, where malignant cases are underrepresented, can result in biased predictions and reduced sensitivity. Model interpretability remains a significant concern, as clinicians need to understand AI decision-making to trust and act upon its outputs. Additionally, differences in imaging protocols, equipment, and patient demographics across institutions complicate model generalization. Finally, regulatory hurdles, including approval by health authorities and compliance with medical device standards, pose practical challenges for AI deployment.
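The effect of class imbalance described above can be illustrated with a small sketch. It shows why accuracy alone is misleading when malignant cases are rare, and computes inverse-frequency class weights, one common mitigation. The weighting formula n_samples / (n_classes * class_count) is a widely used heuristic assumed here for illustration, and the cohort sizes are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical screening cohort: 950 benign (0) and 50 malignant (1) cases.
labels = [0] * 950 + [1] * 50
weights = balanced_class_weights(labels)

# A trivial model that always predicts "benign" looks accurate but detects nothing.
always_benign = [0] * len(labels)
accuracy = sum(p == y for p, y in zip(always_benign, labels)) / len(labels)
sensitivity = (
    sum(p == 1 and y == 1 for p, y in zip(always_benign, labels)) / labels.count(1)
)
```

Here the trivial model reaches 95% accuracy with 0% sensitivity, while the computed weights make each malignant case count 10.0 against roughly 0.53 for each benign case in a weighted loss.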
Future Directions: Future research should prioritize the development of multi-modal AI frameworks that integrate imaging, genomic, clinical, and lifestyle data to improve diagnostic accuracy and enable comprehensive risk assessment. Efforts to enhance model interpretability through explainable AI techniques are critical for clinician trust and regulatory approval. The creation of standardized, publicly available datasets and collaborative multi-institutional studies will facilitate robust model training and validation. Moreover, real-time, cloud-based AI diagnostic platforms have the potential to extend access to underserved regions and resource-constrained healthcare environments. Finally, longitudinal studies evaluating clinical outcomes, cost-effectiveness, and patient-centered benefits of AI-assisted diagnostics are necessary to establish its role in standard-of-care protocols.
A recent study has demonstrated that deep learning models, particularly Convolutional Neural Networks (CNNs), outperform traditional classification techniques in breast cancer detection. The research highlights the superior accuracy of CNNs in analyzing medical imaging data, emphasizing the importance of advanced neural network architectures in improving diagnostic outcomes. However, the study also acknowledges challenges such as the need for large annotated datasets and the complexity of model interpretability.
Table 5. Analytical Summary and Conclusion of AI Approaches in Breast Cancer Detection
| Approach | Key Findings / Performance | Advantages | Limitations / Challenges | Analytical Insights |
|---|---|---|---|---|
| Machine learning (ML) | Accuracy: 85–92%; Sensitivity: 83–90% | High interpretability; low computational cost; effective for structured or small datasets | Requires manual feature engineering; lower performance on raw imaging; limited cross-institution generalization | ML remains useful for low-resource settings, explainable decision-making, and small datasets. Best applied where domain knowledge allows robust feature extraction. |
| Deep learning (DL) | Accuracy: 93–97%; Sensitivity: 91–96% | Automatic feature extraction; high detection rate; suitable for raw images and multi-modal datasets | Black-box problem; requires large annotated datasets; high computational requirements | DL outperforms ML in raw image classification and multi-modal imaging. Transfer learning and attention mechanisms improve efficiency and lesion localization. |
| Hybrid (ML + DL) | Accuracy: 91–96%; Sensitivity: 89–94% | Combines interpretability of ML with the accuracy of DL; radiomics-guided hybrid models capture subtle patterns | Increased model complexity; computationally intensive; feature extraction may be dataset-dependent | Hybrid approaches balance performance and interpretability, ideal for clinical integration and multi-modal datasets. Provides a promising path toward explainable AI in oncology. |
| Explainable AI (XAI) | Sensitivity: ~90%; slightly lower raw accuracy than pure DL | Enhances clinician trust; interpretable predictions; supports regulatory compliance | Slight reduction in accuracy; limited large-scale validation | XAI mitigates the black-box problem and is crucial for clinical adoption. Trade-off between transparency and raw predictive performance must be considered. |
| Low-resource / portable AI | Accuracy: 85–92%; Sensitivity: 83–88% | Feasible in low-resource environments; portable solutions increase accessibility | Limited dataset size; variable image quality; performance lower than large-scale DL | Portable AI solutions are important for screening in underserved regions. Optimizing performance under data constraints is a key research area. |

- AI significantly improves early detection accuracy compared to traditional methods.
- Potential to reduce human error and improve workflow efficiency.
- Generalization across populations remains a challenge.
Future research should focus on multi-modal data integration, explainable deep learning, multi-center validation, and cost-effective deployment to maximize clinical impact.
Analytical Discussion
1- Comparative Strengths:
- DL models dominate in image-based detection, capturing subtle lesion patterns automatically.
- ML models retain advantages where interpretability and smaller datasets are priorities.
- Hybrid models leverage the strengths of both, achieving high performance while partially retaining explainability.
2- Explainability & Trust:
- Explainable AI is increasingly important for clinical adoption, balancing the high performance of DL with interpretability for clinicians.
3- Dataset and Resource Considerations:
- Large annotated datasets improve DL performance, but low-resource AI solutions are needed for broader accessibility.
- Transfer learning and multi-institutional datasets improve generalization and reduce training requirements.
4- Future Directions:
- Focus on multi-modal AI, combining imaging, genomics, and clinical data.
- Integrate explainable models for better clinician trust and regulatory approval.
- Develop cost-effective, portable AI solutions for underserved regions.
Boddu (2025) conducted a comprehensive review of various ML algorithms, including eXtreme Gradient Boosting (XGBoost), Naïve Bayes, and Support Vector Machines (SVM), in the context of breast cancer detection. The study highlighted the significant role of feature engineering in enhancing model performance, emphasizing the importance of selecting relevant features from imaging and clinical data. However, it also noted challenges such as overfitting in small datasets and the necessity for domain expertise in feature extraction.
Nasser (2023) explored the application of deep learning techniques, particularly Convolutional Neural Networks (CNNs), in breast cancer diagnosis. The study found that CNNs, owing to their ability to automatically extract hierarchical features from imaging data, outperformed traditional ML models in terms of accuracy and sensitivity. However, it also pointed out the need for large annotated datasets and the challenges associated with model interpretability.
Bunnell et al. (2024) reviewed the use of AI in handheld breast ultrasound devices for screening purposes. The paper discussed the potential of these AI-powered devices to enhance accessibility and efficiency in breast cancer screening, especially in low-resource settings. It also addressed challenges such as the need for real-time processing capabilities and the variability in image quality.
Abbadi et al. (2025) explored the application of deep transfer learning in breast ultrasound cancer detection. The study highlighted the benefits of transfer learning in leveraging pre-trained models to improve performance with limited datasets. It also discussed the importance of interpretability in DL models to ensure clinical applicability and trust.
Humayun et al. (2025) reviewed the application of AI in breast cancer detection in low-resource settings. The paper emphasized the potential of AI to bridge gaps in healthcare access by providing affordable and efficient diagnostic tools. It also discussed the challenges related to data scarcity and the need for context-specific solutions.
Arravalli et al. (2025) investigated the comparative effectiveness of ML and DL techniques in breast cancer detection. The study found that while DL models generally outperformed ML models in terms of accuracy, ML models offered advantages in interpretability and required fewer computational resources. The paper suggested that hybrid approaches could leverage the strengths of both methodologies.
Maruf et al. (2025) conducted a systematic review and meta-analysis on the use of radiomics-guided AI models for breast cancer diagnosis. The study found that integrating radiomic features with ML/DL models enhanced diagnostic performance by capturing subtle patterns in imaging data. It also highlighted the variability in study designs and the need for standardized protocols.
Rahman et al. (2025) reviewed recent advancements in ML and DL approaches for breast cancer detection. The paper discussed the evolution of AI models, from traditional ML techniques to more complex DL architectures, and their impact on diagnostic accuracy across various imaging modalities. It also addressed the challenges of model generalization and the need for diverse datasets to improve robustness.
Conclusion
This systematic review provides a comprehensive analysis of the application of artificial intelligence (AI), encompassing both machine learning (ML) and deep learning (DL) approaches, in the early detection and diagnosis of breast cancer. The reviewed studies collectively demonstrate that AI technologies hold significant promise in enhancing diagnostic accuracy, improving workflow efficiency, and supporting personalized patient care. Both ML and DL techniques have been widely applied across various imaging modalities, including mammography, ultrasound, magnetic resonance imaging (MRI), and histopathological slides, revealing distinct strengths and limitations for each approach.
Machine learning approaches, such as support vector machines, random forests, and ensemble classifiers, have shown robust performance in classifying breast lesions based on manually extracted features. These models offer interpretability, allowing clinicians to understand the decision-making process and identify which features contribute most to classification outcomes. However, ML models are often constrained by their reliance on feature engineering and may struggle with large, high-dimensional datasets. Deep learning approaches, particularly convolutional neural networks (CNNs) and hybrid architectures, overcome these limitations by automatically learning hierarchical features from raw images. In the studies summarized in Table 5, DL models generally achieve higher classification accuracy and sensitivity than ML models (93–97% and 91–96%, respectively, versus 85–92% and 83–90%), and have proven effective in detecting subtle abnormalities that may be overlooked by human observers or traditional ML algorithms.
Despite these advances, several challenges impede the widespread clinical adoption of AI-based breast cancer diagnostics. Data scarcity, particularly for annotated medical images, limits model training and generalization. Class imbalance, where malignant cases are underrepresented, can bias predictions, while the “black-box” nature of DL models raises concerns regarding interpretability and clinician trust. Moreover, variability in imaging protocols, equipment, and patient populations complicates cross-institutional model generalization. Ethical and regulatory issues, including patient data privacy, algorithmic bias, and compliance with medical standards, further complicate the integration of AI into routine clinical practice.
Analytical evaluation of existing literature suggests that hybrid and multi-modal approaches provide a promising solution to these challenges. Combining ML-based feature selection with DL-based classification, or integrating imaging data with genomic and clinical information, can enhance diagnostic accuracy while preserving interpretability. Additionally, the application of explainable AI (XAI) techniques, such as SHAP, Grad-CAM, and attention mechanisms, can improve transparency, enabling clinicians to understand and validate model predictions. Such approaches not only increase trust in AI systems but also facilitate regulatory approval and clinical deployment.
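As a rough illustration of how a saliency-style explanation is produced, the sketch below implements occlusion sensitivity, a simpler model-agnostic technique than SHAP or Grad-CAM: each image patch is masked in turn and the drop in the predicted score is recorded. The toy logistic model, the 4×4 image, the weights, and the bias are all hypothetical stand-ins and do not represent a method used by any reviewed study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(image, weights, bias):
    """Hypothetical stand-in classifier: logistic score over weighted pixels."""
    z = bias + sum(
        w * p
        for w_row, p_row in zip(weights, image)
        for w, p in zip(w_row, p_row)
    )
    return sigmoid(z)


def occlusion_map(image, predict, patch=2, baseline=0.0):
    """For each patch x patch block, record the score drop when it is masked."""
    rows, cols = len(image), len(image[0])
    full_score = predict(image)
    heat = []
    for r in range(0, rows, patch):
        heat_row = []
        for c in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for i in range(r, min(r + patch, rows)):
                for j in range(c, min(c + patch, cols)):
                    occluded[i][j] = baseline
            heat_row.append(full_score - predict(occluded))
        heat.append(heat_row)
    return heat


# Hypothetical 4x4 "image" with a bright region in the lower-right corner.
image = [
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]
weights = [[1.0] * 4 for _ in range(4)]  # assumed uniform weights
bias = -2.0                              # assumed bias

heat = occlusion_map(image, lambda img: toy_model(img, weights, bias))

# Locate the patch whose removal lowers the score the most.
hottest = max(
    ((r, c) for r in range(len(heat)) for c in range(len(heat[0]))),
    key=lambda rc: heat[rc[0]][rc[1]],
)
print("most influential patch (row, col):", hottest)
```

With these assumed inputs, the lower-right patch is flagged as most influential, which matches where the bright region was placed. A clinician-facing tool would overlay such a map on the original image so that the highlighted region can be checked against the visible lesion.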
In conclusion, AI represents a transformative tool in breast cancer detection and diagnosis, offering the potential to reduce human error, accelerate diagnosis, and support personalized treatment strategies. Machine learning models provide interpretability and robustness for smaller datasets, while deep learning models offer superior automation and high performance in large-scale image analysis. The integration of these approaches, particularly in hybrid and multi-modal frameworks, is likely to define the future of AI-assisted diagnostics. Future research should focus on generating large, diverse, and annotated datasets, enhancing model interpretability, validating AI systems in multi-institutional settings, and addressing ethical and regulatory considerations. By overcoming these challenges, AI has the potential to become an indispensable component of breast cancer screening and diagnosis, ultimately improving patient outcomes and advancing precision medicine in oncology.
Disclosure Statement
No potential conflict of interest was reported by the authors.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Authors' Contributions
All authors contributed to data analysis, drafting, and revising of the paper and agreed to be responsible for all aspects of this work.