Performance Validation of CT Radiomics for Endometrial Tumor Classification: A Comprehensive Analysis for Translational Research

Elijah Foster Nov 26, 2025



Abstract

This article provides a comprehensive analysis of the performance validation of CT radiomics for endometrial tumor classification, addressing a critical need in gynecological oncology. While MRI remains the gold standard for local staging, recent advancements demonstrate CT radiomics offers a highly accurate, accessible alternative for differentiating malignant from benign endometrial lesions. We explore the foundational principles of radiomics feature extraction from CT images, detail optimal machine learning methodologies for model development, address key technical challenges in clinical implementation, and present rigorous multi-center validation results. For researchers and drug development professionals, this synthesis highlights CT radiomics as a promising non-invasive tool for precision medicine, with potential applications in prognostic prediction and therapeutic monitoring that warrant further investigation in larger prospective trials.

The Emerging Role of CT Radiomics in Endometrial Cancer Management

Endometrial cancer (EC) is the most common gynecologic malignancy in high-income countries and a growing health burden, with incidence rising globally due to factors including increasing obesity rates and population aging [1] [2]. For 2023, an estimated 66,200 new cases and 13,030 related deaths were projected in the United States alone [3]. The disease also shows marked disparities: Black women in the United States are reported to be twice as likely to be diagnosed with and die from endometrial cancer as women of other races [4]. The five-year survival rate exceeds 95% when the disease is detected early but falls to approximately 15% once it has metastasized, underscoring the importance of early and accurate detection [1].

Traditional diagnostic pathways for endometrial cancer present significant challenges. While transvaginal ultrasound (TVUS) serves as an initial screening tool, its moderate specificity (61%-86%) often necessitates invasive follow-up procedures [1]. The current diagnostic gold standard involves endometrial biopsy via dilation and curettage (D&C), which is invasive, painful, and inconclusive in up to 30% of cases due to insufficient tissue sampling, frequently requiring repeat procedures and delaying diagnosis [5] [6]. These limitations highlight an urgent clinical imperative for developing accurate, accessible, and non-invasive diagnostic alternatives that can improve patient experience, reduce procedural risks, and facilitate earlier detection.

Comparative Analysis of Non-Invasive Diagnostic Modalities

Emerging Liquid Biopsy and Molecular Tests

The development of non-invasive liquid-based cytology tests represents a significant advancement in endometrial cancer detection. Recently launched tests like EdenDx utilize novel molecular approaches, detecting hypermethylation of the CDO1 and CELF4 genes strongly associated with endometrial cancer [5]. This test uses an endocervical sample collected during a routine pelvic exam with a cervical brush or broom, preserved in a ThinPrep vial, and delivers results within three to seven days. Validation studies demonstrated impressive performance with 97.8% specificity and 85.3% sensitivity, including detection of high-grade cancers [5].

Another promising approach is the WomEC test, which targets proteins in the uterine fluid fraction of biopsy samples. Using mass spectrometry to assess over 100 proteins in uterine fluid samples from 358 patients, researchers identified three protein biomarkers that correctly identified 99% of women with endometrial cancer while ruling it out in 97% of those not requiring further testing [6]. This method enhances diagnostic accuracy from standard biopsies without additional invasive procedures, potentially reducing unnecessary follow-up interventions.

Advanced Imaging and AI-Assisted Classification

Table 1: Performance Comparison of Imaging Modalities in Endometrial Cancer Detection

| Imaging Modality | Technology Approach | Classification Task | Performance Metrics | Study Details |
| --- | --- | --- | --- | --- |
| CT Radiomics | Random Forest ML model with 1132 radiomic features | Malignant vs. benign endometrial tumors | AUROC: 0.96; Sensitivity: 100%; Specificity: 92.31% | 83 patients from two centers [3] |
| Multiparametric MRI | Clinical-radiomics DL model with ResNet-50 | Molecular subtype classification | Macro-average AUC: 0.79 (internal validation); AUC: 0.74 (external validation) | 526 patients across three institutions [7] |
| MRI Deep Transfer Learning | ResNet50, ResNet101, DenseNet121 with fusion strategies | Prediction of tumor aggressiveness | AUC: 0.950 (test cohort); AUC: 0.972 (training cohort) | 207 patients with pathologically confirmed EC [2] |
| Hybrid MRI/CT AI | ResNet50 + Vision Transformer (ViT) | Benign, malignant, and normal classification | MRI Accuracy: 90.24%; CT Accuracy: 86.99% | 300 patients from KAUH dataset [8] [9] |
| AI-Enhanced Ultrasound | YOLOv8 deep learning model | Endometrial cancer detection in postmenopausal women | AUC: 0.858 (testing set); AUC: 0.811 (validation set) | 877 patients in primary care settings [10] |

Nuclear Medicine and Radiotheranostic Approaches

Emerging research in nuclear medicine has identified promising biomarkers for endometrial cancer theranostics. Investigations of human epidermal growth factor receptor 2 (HER2), mucin-16 (MUC16), and CD24 as potential radiotheranostic targets have yielded encouraging results [4]. Immunofluorescent staining revealed that endometrial cancer cells and tissue samples express elevated levels of these biomarkers compared with healthy controls.

The development of 89Zr-labeled radioimmunoconjugates ([89Zr]Zr-DFO-trastuzumab for HER2, [89Zr]Zr-DFO-AR9.6 for MUC16, and [89Zr]Zr-DFO-ATG-031 for CD24) demonstrated high radiochemical conversion (>95%), purity (>95%), and specific activity (4-5 mCi/mg) [4]. In vivo performance evaluation in murine models revealed [89Zr]Zr-DFO-ATG-031 (targeting CD24) provided the highest tumor uptake (>30 %ID/g) and tumor-to-background contrast, while [89Zr]Zr-DFO-trastuzumab (targeting HER2) produced moderate yet promising results [4].

Experimental Protocols and Methodologies

CT Radiomics Workflow for Endometrial Tumor Classification

Table 2: Key Research Reagent Solutions for CT Radiomics

| Research Tool | Specifications/Parameters | Primary Function in Research |
| --- | --- | --- |
| Pyradiomics Python Package | Version 3.0.1, IBSI-compliant | Extraction of high-throughput radiomic features from medical images |
| Pre-surgical CT Scans | Standardized acquisition protocols across centers | Source imaging data for radiomic analysis and model development |
| Random Forest Algorithm | Ensemble learning method with multiple decision trees | Optimal ML modeling for EC classification based on comparative analysis |
| SHAP (SHapley Additive exPlanations) | Game theory-based approach | Model interpretability and feature importance analysis |
| ITK-SNAP Software | Version 3.8.0 | Manual segmentation of regions of interest (ROIs) for feature extraction |

The CT radiomics analysis followed a structured workflow [3]. For patient selection, 83 patients from two centers (46 with malignant and 37 with benign tumors) were divided into training (n=59) and testing (n=24) sets. Regions of interest (ROIs) were segmented manually on pre-surgical CT scans using ITK-SNAP. The Pyradiomics package was then used to derive 1,132 radiomic features from each ROI.

For model development, six explainable machine learning algorithms were implemented and compared: Logistic Regression, K-Nearest Neighbors, Support Vector Classifier, XGBoost, Random Forest, and TabPFNv2. The Random Forest model emerged as optimal, with its effectiveness attributed to the ensemble nature that minimizes overfitting by constructing decision trees with different data and feature subsets. Model validation included fivefold cross-validation, with performance evaluation based on sensitivity, specificity, accuracy, precision, F1 score, AUROC, and AUPRC. Finally, interpretability analysis employed SHAP to identify the most important radiomic features and decision curve analysis to assess clinical utility.
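Decision curve analysis, mentioned above as the measure of clinical utility, rests on the net benefit at a chosen risk threshold. The following is a minimal sketch of that calculation, not the study's implementation; the labels and predicted probabilities are hypothetical values chosen only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical data: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7, 0.55, 0.1]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).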

Figure: CT radiomics workflow. Patient cohort (83 patients) → CT image acquisition → ROI segmentation (ITK-SNAP) → feature extraction (1132 features via Pyradiomics) → machine learning modeling (6 algorithms compared; Random Forest selected) → model validation (5-fold cross-validation) → performance evaluation (AUROC, sensitivity, specificity) → clinical interpretation (SHAP analysis, decision curves).

MRI-Based Molecular Subtyping Protocol

The molecular subtyping study utilized a comprehensive approach integrating clinical and radiomic data [7]. The patient cohort included 526 patients from three institutions with confirmed EC who underwent surgery, MRI, and molecular pathology between January 2020 and March 2024. Patients were divided into training, internal validation, and external validation cohorts.

MRI acquisition included multiple sequences: axial T1WI, axial/sagittal/coronal fat-saturation T2-weighted imaging, diffusion-weighted imaging, and dynamic contrast-enhanced T1-weighted imaging. For radiomics feature extraction, 386 handcrafted features were extracted from each MR sequence using Pyradiomics, while MoCo-v2 was employed for contrastive self-supervised learning to extract 2,048 deep learning features per patient. The selected features were then fed into 12 machine learning methods, with recursive feature elimination applied to identify the most predictive features.

The clinical-radiomics DL model outperformed both the clinical-only and radiomics-only models, achieving a macro-average AUC of 0.79 (vs. 0.69 and 0.73, respectively) in internal validation and 0.74 (vs. 0.67 and 0.69, respectively) in external validation [7]. The MRI features exhibited particularly strong diagnostic performance for the POLEmut and p53abn molecular subtypes.
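For readers unfamiliar with the macro-average AUC reported here, the sketch below shows one common way to compute it: a one-vs-rest AUC per subtype, averaged with equal weight. The patient labels and predicted probabilities are hypothetical, and the study's own averaging procedure may differ in detail.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_average_auc(y_true, prob_rows, classes):
    """One-vs-rest AUC for each class, then the unweighted mean across classes."""
    per_class = {}
    for k, name in enumerate(classes):
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in prob_rows]
        per_class[name] = binary_auc(labels, scores)
    return sum(per_class.values()) / len(per_class), per_class


classes = ["POLEmut", "MMRd", "NSMP", "p53abn"]
# Hypothetical patients: true subtype and predicted class probabilities.
y_true = ["POLEmut", "MMRd", "NSMP", "p53abn", "POLEmut", "MMRd", "NSMP", "p53abn"]
prob_rows = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.1, 0.1, 0.2, 0.6],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],  # an MMRd case scored as NSMP
    [0.2, 0.2, 0.5, 0.1],
    [0.1, 0.2, 0.2, 0.5],
]

macro, per_class = macro_average_auc(y_true, prob_rows, classes)
print(per_class)
print(round(macro, 3))
```

Because every class contributes equally, a rare subtype with poor discrimination lowers the macro average as much as a common one would.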

Discussion: Integration into Clinical Practice

The accumulating evidence supporting non-invasive diagnostic approaches for endometrial cancer highlights their potential to transform clinical practice. The complementary strengths of different modalities suggest potential integrated implementation pathways. Liquid biopsy tests like EdenDx and WomEC offer minimal invasiveness suitable for primary care settings and initial screening [5] [6]. CT radiomics provides widely accessible imaging-based classification with high accuracy (AUROC: 0.96) [3], while MRI-based approaches deliver superior soft tissue characterization for molecular subtyping and surgical planning [7] [8].

For real-world implementation, several considerations emerge. Infrastructure requirements vary significantly across modalities, with CT being most widely accessible in diverse healthcare settings. Interpretability remains crucial for clinical adoption, with SHAP analysis and feature mapping providing transparent decision support for radiomics models [3]. Regulatory validation needs include standardization of imaging protocols, reproducibility across institutions, and demonstration of clinical utility through impact on patient outcomes.

Future research directions should focus on multimodal integration, combining the strengths of liquid biomarkers, radiomics, and clinical data for improved diagnostic accuracy. Prospective validation across diverse populations and healthcare settings is essential to establish generalizability. Furthermore, cost-effectiveness analyses will be critical for guiding implementation, particularly in resource-limited settings where endometrial cancer disparities are most pronounced.

The development and validation of non-invasive diagnostic tools for endometrial cancer address a critical clinical imperative driven by rising incidence rates, disparities in outcomes, and limitations of current diagnostic approaches. The emerging technologies reviewed—including liquid biopsy tests, CT and MRI radiomics, AI-enhanced ultrasound, and radiotheranostic agents—demonstrate promising performance characteristics that could significantly improve early detection, molecular classification, and personalized treatment planning.

The experimental data compiled in this analysis suggest that these non-invasive approaches can achieve diagnostic accuracy comparable to, and in some reports exceeding, conventional methods while reducing patient discomfort, procedural risks, and accessibility barriers. As validation studies continue to mature, the integration of these technologies into standardized diagnostic pathways holds potential to transform endometrial cancer care, ultimately improving survival rates and reducing disparities in outcomes across diverse patient populations.

Radiomics is a rapidly advancing field in medical imaging that converts standard-of-care CT images into high-dimensional, mineable data. This process extracts sub-visual quantitative features that can reveal tumor characteristics imperceptible to the human eye, offering a powerful method for capturing intratumoral heterogeneity and advancing personalized medicine [3]. In endometrial cancer (EC) management, where CT scans are widely used for initial diagnosis and staging due to their faster acquisition times and broader availability compared to MRI, CT radiomics provides a critical tool for non-invasive tumor characterization [8] [3]. The transformation from pixel values to predictive models follows a structured pipeline encompassing image acquisition, segmentation, feature extraction, and model development, creating a bridge between radiology and precision oncology.

Experimental Protocols: From Image Acquisition to Validation

Image Acquisition and Preprocessing

The foundational step in radiomics analysis involves acquiring standardized CT images. In a two-center study focusing on endometrial cancer, pre-surgical CT scans were obtained from patients following specific protocols. The scans were typically performed using various CT scanners from major manufacturers (e.g., GE Healthcare, Philips, Siemens), with tube voltage settings of 100-140 kV and slice thickness ranging from 1-5 mm [11]. For contrast-enhanced studies, the venous phase is particularly valuable for endometrial cancer staging as it better highlights parenchymal characteristics and the contrast between tumor and myometrium [11]. Prior to feature extraction, crucial preprocessing steps include image normalization and resampling (often to 1×1×1 mm resolution) to minimize inter-scanner variability and ensure feature comparability across different datasets [12].
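The two preprocessing steps named above can be sketched in a few lines. This is an illustrative, dependency-free version under stated assumptions: the tiny volume and its 5 × 1 × 1 mm spacing are hypothetical, nearest-neighbour interpolation is used only to keep the code short (production pipelines typically rely on an imaging library and smoother interpolators), and z-score scaling is one of several possible normalizations.

```python
import statistics


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of volume[z][y][x] to a new voxel spacing (mm)."""
    old_shape = (len(volume), len(volume[0]), len(volume[0][0]))
    new_shape = tuple(
        max(1, round(n * s / t)) for n, s, t in zip(old_shape, spacing, new_spacing)
    )

    def source_index(i, axis):
        # Centre of new voxel i, expressed in old-voxel index coordinates.
        pos = (i + 0.5) * new_spacing[axis] / spacing[axis] - 0.5
        return min(old_shape[axis] - 1, max(0, round(pos)))

    return [
        [
            [
                volume[source_index(z, 0)][source_index(y, 1)][source_index(x, 2)]
                for x in range(new_shape[2])
            ]
            for y in range(new_shape[1])
        ]
        for z in range(new_shape[0])
    ]


def zscore_normalize(volume):
    """Scale intensities to zero mean and unit variance over the whole volume."""
    flat = [v for sl in volume for row in sl for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat) or 1.0  # guard against a constant volume
    return [[[(v - mean) / std for v in row] for row in sl] for sl in volume]


# Hypothetical 2-slice volume (HU values), 5 mm slices with 1 mm in-plane spacing.
volume = [[[-50, 40], [60, 80]], [[30, 45], [70, 90]]]
resampled = resample_nearest(volume, spacing=(5.0, 1.0, 1.0))
normalized = zscore_normalize(resampled)
print(len(resampled), len(resampled[0]), len(resampled[0][0]))
```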

Tumor Segmentation and Feature Extraction

Tumor segmentation defines the region of interest (ROI) from which radiomic features are extracted. This process involves manually delineating the complete uterine volume slice-by-slice on axial views, excluding surrounding intestinal and vascular structures [11]. Specialized software like 3D Slicer or MIM software is typically used for this precise contouring [11] [12]. To ensure reproducibility, inter-observer agreement is quantified using metrics like the Dice Similarity Coefficient (DSC), with values ≥0.8 indicating satisfactory concordance between different radiologists' segmentations [11].
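The Dice Similarity Coefficient used for this agreement check is straightforward to compute. The sketch below works on flattened binary masks; the two example masks are hypothetical and stand in for two readers' segmentations of the same scan.

```python
def dice_coefficient(mask_a, mask_b):
    """DSC = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical flattened masks from two readers (1 = voxel inside the ROI).
reader_1 = [0, 1, 1, 1, 1, 0, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 1, 0, 0]

dsc = dice_coefficient(reader_1, reader_2)
print(dsc, "satisfactory" if dsc >= 0.8 else "below the 0.8 threshold")
```

In this toy case the readers share three of four voxels each, so the coefficient falls just short of the 0.8 concordance threshold.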

Following segmentation, high-throughput feature extraction is performed using platforms such as PyRadiomics [13] [3]. A single CT scan can yield over 1,100 radiomic features [3] [14], which can be categorized as:

  • First-order statistics: Describing the distribution of voxel intensities within the ROI (e.g., entropy, kurtosis, skewness)
  • Shape-based features: Quantifying three-dimensional geometric characteristics
  • Texture features: Capturing intra-tumoral heterogeneity patterns through matrices like Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrix (GLRLM) [3]
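To make the first-order category concrete, here is a small sketch that computes entropy, skewness, and kurtosis from a list of ROI intensities. It is not PyRadiomics code: the fixed bin width of 25 and the use of non-excess kurtosis (3 for a normal distribution) are assumptions made for this illustration, and the intensity values are hypothetical.

```python
import math
from collections import Counter


def first_order(values, bin_width=25):
    """Mean, skewness, kurtosis, and histogram entropy of ROI intensities."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = Counter(math.floor(v / bin_width) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "entropy": entropy,
    }


# Hypothetical voxel intensities (HU) inside a segmented ROI.
roi_values = [10, 20, 30, 40, 60, 80, 120, 200]
features = first_order(roi_values)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```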

Table: Categories of Radiomic Features Extracted from CT Images

| Feature Category | Description | Example Features |
| --- | --- | --- |
| First-Order Statistics | Distribution of voxel intensities | Energy, Entropy, Kurtosis, Skewness |
| 3D Shape-based | Geometric characteristics of the tumor | Volume, Surface Area, Sphericity |
| Texture Features | Patterns and relationships of voxels | GLCM, GLRLM, GLSZM features |
| Transformed Features | Features from filtered images | Wavelet, LoG-filtered features |
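As an illustration of how a texture matrix such as the GLCM is built, the sketch below counts co-occurring gray levels for one pixel offset on a tiny 2D image and derives two summary features. It is a simplified 2D example with hypothetical, already-quantized images; radiomics toolkits typically work in 3D and aggregate over several offsets.

```python
def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Normalized gray-level co-occurrence matrix for one (dy, dx) offset."""
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                i, j = image[y][x], image[yy][xx]
                counts[i][j] += 1.0
                if symmetric:
                    counts[j][i] += 1.0  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Large when neighbouring voxels differ strongly in gray level."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def glcm_joint_energy(p):
    """Large when a few gray-level pairs dominate (homogeneous texture)."""
    return sum(v * v for row in p for v in row)


# Hypothetical quantized images with two gray levels.
uniform = [[1, 1], [1, 1]]
checkerboard = [[0, 1], [1, 0]]

for name, img in (("uniform", uniform), ("checkerboard", checkerboard)):
    p = glcm(img, levels=2)
    print(name, glcm_contrast(p), glcm_joint_energy(p))
```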

Feature Selection and Model Development

The high dimensionality of radiomic datasets (often thousands of features relative to limited patient samples) necessitates robust feature selection to prevent model overfitting. Dimensionality reduction typically employs a two-stage approach: first using least absolute shrinkage and selection operator (LASSO) regression, followed by the minimum redundancy maximum relevance (mRMR) algorithm to identify the most discriminative radiomic signature [12]. This process typically reduces the feature set to a manageable number (e.g., 8-20 optimal features) that show significant differences between malignant and benign tumors [3] [12].
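The redundancy-filtering idea behind mRMR can be sketched as a greedy ranking. The version below is deliberately simplified: it uses absolute Pearson correlation as a stand-in for mutual information, it does not include the LASSO stage, and the feature names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_rank(features, target, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: 1 = malignant, 0 = benign; three candidate features.
target = [0, 0, 0, 1, 1, 1]
features = {
    "f_a": [1, 2, 3, 7, 8, 9],            # strongly related to the label
    "f_a_copy": [2, 4, 6, 14, 16, 19],    # near-duplicate of f_a
    "f_b": [5, 1, 4, 6, 3, 8],            # weaker but less redundant
}

print(mrmr_rank(features, target, k=2))
```

The near-duplicate feature is passed over in the second round even though it is individually more relevant than f_b, which is the behaviour that motivates redundancy-aware selection.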

Multiple machine learning algorithms are then trained and validated on the selected feature sets. Common approaches include:

  • Random Forest: An ensemble method that constructs multiple decision trees
  • Support Vector Machine: Finds optimal hyperplanes to separate classes
  • Gradient Boosting: Builds sequential models that learn from previous errors
  • Logistic Regression: Provides a linear modeling approach [3] [12]

Model performance is evaluated using metrics such as area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy in independent testing sets [13] [3].
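The threshold-dependent metrics listed above all derive from the four cells of a confusion matrix. The following sketch computes them from binary predictions; the labels and predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, precision, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # malignant cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # benign cases correctly cleared
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical test-set labels (1 = malignant) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```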

Figure: Radiomics pipeline. CT image acquisition (scanner protocols) → tumor segmentation (ROI delineation) → feature extraction (PyRadiomics) → feature selection (LASSO/mRMR) → model development (ML algorithms) → validation (performance metrics).

Performance Comparison: CT Radiomics for Endometrial Tumor Classification

Diagnostic Performance Across Methodologies

Recent studies have demonstrated the potential of CT radiomics for differentiating malignant from benign endometrial tumors. A 2025 two-center study developed an explainable machine learning model using CT radiomics features from 83 patients. Among six modeling strategies compared, the Random Forest model performed best, achieving a training AUROC of 1.00 and a testing AUROC of 0.96, with 100% sensitivity and 92.31% specificity in the test set (n=24) [3]. This performance indicates strong diagnostic capability, although the small test set calls for confirmation in larger cohorts.
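To give a sense of how much uncertainty a 24-patient test set carries, the sketch below computes 95% Wilson score intervals for the reported sensitivity and specificity. The underlying counts (11 of 11 malignant and 12 of 13 benign cases classified correctly) are an assumption: they are one split of 24 patients consistent with the reported 100% and 92.31%, but they are not stated in the text.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Assumed counts, inferred from the reported percentages (not given in the source).
sens_low, sens_high = wilson_interval(11, 11)   # sensitivity 11/11 = 100%
spec_low, spec_high = wilson_interval(12, 13)   # specificity 12/13 = 92.31%

print(f"sensitivity 95% CI: {sens_low:.3f} to {sens_high:.3f}")
print(f"specificity 95% CI: {spec_low:.3f} to {spec_high:.3f}")
```

Under these assumed counts the intervals are wide, which is why point estimates from small test sets are usually reported together with confidence intervals.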

Another study, which applied a hybrid deep learning approach (combining ResNet50 and Vision Transformer) to endometrial image sets from 300 patients, reported an accuracy of 86.99% for CT-based classification of endometrial cancer, compared with 90.24% for MRI-based classification [8]. This suggests that while MRI may offer superior soft tissue contrast, CT-based classification remains competitive, particularly considering the wider availability and faster acquisition times of CT scanners.

Table: Comparative Performance of CT Radiomics in Endometrial Tumor Classification

| Study | Patients | Best Model | AUROC | Sensitivity | Specificity | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Zhang et al. (2025) [3] | 83 | Random Forest | 0.96 | 100% | 92.31% | - |
| Comparative DL Study (2025) [8] | 300 | ResNet50-ViT Hybrid | - | - | - | 86.99% |
| Pulmonary GGN Study (2025) [12] | 392 | Gradient Boosting | 0.929 | 85.1% | 84.9% | 85.0% |

Comparison with Other Imaging Modalities

While CT radiomics demonstrates strong performance, understanding its relative strengths compared to other modalities is crucial for clinical implementation. In a direct comparison, MRI-based classification achieved slightly higher accuracy than CT-based classification (90.24% vs. 86.99%), which is attributed to MRI's superior soft tissue contrast [8]. However, CT maintains important practical advantages, including broader availability, faster acquisition times, and lower cost, making CT radiomics more accessible for widespread clinical implementation, particularly in resource-limited settings [3].

For predicting recurrence, CT radiomics has also shown potential. A 2023 pilot study developed machine learning models using radiomic features from pre-surgical CT scans to predict endometrial cancer recurrence, achieving AUCs of 0.86-0.90 in the test set [11]. Patients classified as high-risk by these models exhibited significantly worse disease-free survival (p < 0.001), supporting the prognostic value of CT radiomics [11].

The Scientist's Toolkit: Essential Research Reagents and Solutions

Successful implementation of CT radiomics requires specialized tools and platforms throughout the analytical pipeline. The table below details key research reagents and their functions in radiomics research.

Table: Essential Research Reagents and Computational Tools for CT Radiomics

| Tool Category | Specific Solution | Function in Radiomics Pipeline |
| --- | --- | --- |
| Image Segmentation | 3D Slicer, ITK-SNAP, MIM Software | Manual/semi-automatic delineation of tumor volumes |
| Feature Extraction | PyRadiomics, RaCaT | High-throughput extraction of radiomic features |
| Feature Selection | LASSO, mRMR, Recursive Feature Elimination | Dimensionality reduction and identification of optimal features |
| Machine Learning | Scikit-learn, XGBoost, TensorFlow | Model development and training |
| Model Interpretation | SHAP, LIME | Explainable AI for feature importance and model decisions |

Technical Validation and Explainability

Model Validation Strategies

Robust validation is essential for clinical translation of radiomics models. Beyond standard training-test splits, rigorous approaches include cross-validation (e.g., 5-fold or 10-fold) and, most importantly, external validation on completely independent datasets from different institutions [3] [15]. The 2025 two-center study exemplified this approach by training on data from one institution and testing on another, demonstrating generalizability across different patient populations and scanning protocols [3]. Additionally, the Society of Nuclear Medicine and Molecular Imaging AI Task Force emphasizes external validation using data unseen during model development as a critical step for validating models [15].
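The cross-validation step mentioned above can be sketched as a stratified k-fold split, which keeps the malignant-to-benign ratio roughly constant in every fold. This is a minimal, dependency-free illustration rather than the procedure used in any cited study; the class counts (46 malignant, 37 benign) are borrowed from the two-center cohort purely as example input.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs with balanced class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    folds = [[] for _ in range(k)]
    slot = 0  # running counter so fold sizes differ by at most one
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for idx in members:
            folds[slot % k].append(idx)
            slot += 1

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# Example input: 46 malignant (1) and 37 benign (0) cases.
labels = [1] * 46 + [0] * 37
splits = stratified_kfold(labels, k=5)

for fold_no, (train, test) in enumerate(splits, start=1):
    malignant = sum(labels[i] for i in test)
    print(f"fold {fold_no}: train={len(train)} test={len(test)} malignant_in_test={malignant}")
```

Note that cross-validation of this kind estimates performance within one dataset; it does not replace testing on data from an independent institution.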

Explainable AI in Radiomics

The "black box" nature of complex machine learning models presents a significant barrier to clinical adoption. Explainable AI techniques address this challenge by providing transparency into model decisions. SHAP (SHapley Additive exPlanations) analysis identifies the most important radiomic features driving predictions and illustrates their direction of effect [3]. In endometrial cancer classification, texture features (particularly from wavelet-transformed images) frequently emerge as top predictors, comprising approximately 60% of the most important features in Random Forest models [3]. Feature mapping visualization further enhances interpretability by graphically representing how these mathematical features manifest spatially within tumors, allowing clinicians to develop intuitive understanding of model reasoning [3].
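The idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. The sketch below enumerates every feature ordering, which is feasible only for a handful of features; the SHAP library uses far more efficient algorithms. The scoring function, feature values, and all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start with every feature "absent"
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i to its observed value
            value = model(current)
            phi[i] += value - previous
            previous = value
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical risk score with one interaction between features 0 and 2."""
    f0, f1, f2 = features
    return 0.5 * f0 + 0.25 * f1 + 0.25 * f0 * f2


x = [1.0, 2.0, 4.0]          # hypothetical feature values for one patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the model output for this patient and the baseline output, and the interaction term is shared equally between the two features involved.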

Figure: Validation and explainability of a radiomics model. The model is assessed through performance metrics (AUROC, calibration, decision curve) and explainable AI (SHAP analysis, feature maps, model trust); explainable AI in turn supports clinical application (risk stratification, clinical adoption).

The transformation of standard CT images into mineable radiomics data represents a paradigm shift in medical image analysis, particularly for endometrial tumor classification. Reported CT radiomics models, with testing AUROCs reaching 0.96 for malignancy detection, show strong diagnostic capability [3]. Future developments will likely focus on multi-center validation across larger populations, integration of clinical and molecular data with radiomic features [7], and implementation of standardized reporting guidelines to improve reproducibility [15]. As these tools become more refined and accessible, CT radiomics could become a valuable auxiliary tool for precise endometrial cancer diagnosis, ultimately supporting personalized treatment decisions and improving patient outcomes.

The assessment of endometrial cancer, a leading gynecologic malignancy, relies heavily on imaging for accurate diagnosis, staging, and treatment planning. Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are two cornerstone modalities for this task, each with distinct strengths and technological trajectories. Within the context of validating the performance of CT radiomics for endometrial tumor classification, this guide provides an objective, data-driven comparison of these modalities. We examine their fundamental diagnostic performance, explore advanced applications in AI-driven classification and molecular subtyping, and detail the experimental protocols that underpin these innovations, offering researchers a clear view of the current technological landscape.

Diagnostic Performance and Clinical Applications

CT and MRI serve complementary roles in the clinical management of endometrial cancer. Their utility varies significantly depending on the specific diagnostic task, from initial detection to detailed local staging.

MRI is lauded for its superior soft tissue contrast resolution, making it the preferred modality for evaluating local disease extent. It is the gold standard for assessing key prognostic factors such as the depth of myometrial invasion (MI) and cervical stromal invasion [8] [16]. A 2024 diagnostic performance study confirmed that while CT demonstrated high sensitivity for cervical invasion, MRI maintained higher specificity for evaluating myometrial invasion, which is critical for surgical planning [16].

CT, on the other hand, is more widely available, has faster acquisition times, and is superior for evaluating distant metastatic spread [8] [3]. It is particularly valuable in assessing extrauterine spread, lymphadenopathy, and metastatic disease beyond the pelvis [8]. Recent advances, such as Dual-Energy CT (DECT), have improved its local diagnostic capabilities. DECT generates virtual monoenergetic images (VMI) that significantly enhance iodine contrast, leading to a higher contrast-to-noise ratio (CNR) and improved sensitivity for detecting myometrial invasion compared to conventional CT [16].

Table 1: Comparison of Key Diagnostic Applications for CT and MRI in Endometrial Cancer

| Diagnostic Task | MRI Performance & Advantages | CT Performance & Advantages |
| --- | --- | --- |
| Myometrial Invasion | High specificity; gold standard for local staging [16]. | Improved sensitivity with DECT VMI; useful when MRI is contraindicated [16]. |
| Cervical Invasion | High diagnostic accuracy [16]. | C-CT showed greater sensitivity and AUC than MRI in one study [16]. |
| Lymph Node Metastasis | Limited by size criteria; DW-MRI shows variable results [17]. | More effective for advanced disease; visual assessment of PET/CT has low false-positive rate [8] [17]. |
| Distant Metastasis | Limited field of view for full-body staging. | Recommended for distant staging; fast whole-body coverage [8] [3]. |
| Patient Factors | Less favorable for claustrophobic patients or those with metal implants [8] [16]. | Broader accessibility; faster acquisition; less susceptible to motion artifacts [8] [3]. |

Quantitative Performance in AI-Driven Classification

Artificial intelligence (AI), particularly deep learning and radiomics, is revolutionizing endometrial tumor assessment by extracting sub-visual, quantitative data from images. Performance metrics from recent studies demonstrate the high accuracy of both CT and MRI when paired with AI models.

A 2025 study directly compared CT and MRI using a hybrid deep learning model (ViTNet) for classifying endometrial cases as benign, malignant, or normal. The model achieved an accuracy of 90.24% with MRI images and 86.99% with CT images, indicating that both modalities are effective, with MRI holding a slight performance advantage [8] [9].

Another 2025 study focused exclusively on CT radiomics for differentiating malignant from benign endometrial tumors. Using a Random Forest model, researchers reported an Area Under the Receiver Operating Characteristic Curve (AUROC) of 1.00 in the training set and 0.96 in the testing set, with 100% sensitivity and 92.31% specificity, indicating CT's high diagnostic potential when enhanced with machine learning [3].

Table 2: Quantitative Performance of AI Models Using CT and MRI

| Study Focus | Imaging Modality | AI Model Used | Key Performance Metrics |
| --- | --- | --- | --- |
| Tumor Classification [8] [9] | MRI | Hybrid ResNet50-ViT (ViTNet) | Accuracy: 90.24% |
| Tumor Classification [8] [9] | CT | Hybrid ResNet50-ViT (ViTNet) | Accuracy: 86.99% |
| Malignant vs. Benign Differentiation [3] | CT | Random Forest | Testing AUROC: 0.96, Sensitivity: 100%, Specificity: 92.31% |
| Molecular Subtype Classification [7] | MRI | Clinical-Radiomics Deep Learning Model | Macro-Average AUC: 0.79 (Internal Validation) |
| Treatment Response Assessment [18] | Contrast-Enhanced MRI | Integrated Model with Biomarkers | AUC: 0.864, Sensitivity: 78.3%, Specificity: 86.3% |
| Treatment Response Assessment [18] | CT | Integrated Model with Biomarkers | AUC: 0.854, Sensitivity: 81.2%, Specificity: 83.4% |

Advanced Applications: Molecular Subtyping and Treatment Response

Beyond basic classification, imaging radiomics is increasingly used to predict molecular subtypes and treatment response, enabling non-invasive personalized medicine.

The 2023 FIGO staging guidelines formally incorporated molecular subtypes due to their profound prognostic value [7]. While traditional MRI cannot reliably distinguish these subtypes, MRI-based clinical-radiomics deep learning models have shown significant promise. A 2025 multicenter study developed such a model to classify the four TCGA subtypes—POLEmut, MMRd, NSMP, and p53abn—achieving a macro-average AUC of 0.79, outperforming models based on clinical or radiomics data alone [7]. Another study confirmed the value of intratumoral and peritumoral radiomic features from multiparametric MRI for this task [19].

For evaluating treatment response in recurrent endometrial cancer, both Contrast-Enhanced MRI (CE-MRI) and CT demonstrate high effectiveness. A 2025 retrospective study found CE-MRI had a slightly higher AUC (0.864) and specificity (86.3%) compared to CT (AUC 0.854, specificity 83.4%). Furthermore, integrating imaging findings with biomarker data (e.g., ER, PR, CA125) improved the AUC to 0.889, highlighting the power of combined models [18].

Experimental Protocols and Methodologies

The advancement of CT radiomics relies on standardized, transparent experimental protocols. The following workflow details the key steps, as used in a two-center study that developed an explainable machine learning model for differentiating endometrial tumors [3].

Figure: CT radiomics workflow. Patient Cohort Selection (n=83, two centers) → Manual ROI Segmentation on Pre-surgical CT Scans → Radiomic Feature Extraction (1,132 features via PyRadiomics) → Data Split (Training Set n=59, Testing Set n=24) → Model Training & Validation (six ML algorithms tested) → Optimal Model Selection (Random Forest) → Model Performance Evaluation (AUROC, Sensitivity, Specificity) → Explainability Analysis (SHAP, Feature Map Visualization) → Clinical Utility Assessment (Decision Curve Analysis)

Key Phases of the CT Radiomics Workflow:

  • Cohort Formation and Imaging: Studies typically involve a retrospective, multi-center design to ensure robust and generalizable results. For example, a foundational study included 83 patients from two centers, with pre-surgical CT scans acquired using standardized protocols [3].
  • Tumor Segmentation and Feature Extraction: The region of interest (ROI) encompassing the entire tumor is manually delineated on each CT slice by experienced radiologists. Subsequently, a high-throughput extraction of radiomic features is performed using open-source software like PyRadiomics. These features quantify tumor intensity, shape, texture, and wavelet patterns [3] [20].
  • Model Development and Validation: The dataset is split into training and testing sets. Multiple machine learning algorithms (e.g., Random Forest, Support Vector Machine) are trained and evaluated. The model with the best performance on the independent testing set is selected as the final model. The Random Forest algorithm has been particularly effective in this domain [3] [21].
  • Explainability and Clinical Translation: To overcome the "black box" nature of AI, explainability techniques like SHAP (Shapley Additive Explanations) are employed. SHAP identifies the most important radiomic features driving the model's predictions, providing clinicians with transparent and interpretable insights. Finally, decision curve analysis is used to evaluate the model's net benefit over traditional clinical strategies [3] [20].
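The decision curve analysis mentioned in the final phase rests on the net benefit statistic, which weighs true positives against false positives at a chosen threshold probability. The following is a minimal sketch of that calculation using the standard net benefit formula; the labels and predicted probabilities are toy values, not data from the cited study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at or above `threshold`."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = malignant, 0 = benign, with model-predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}")
```

A full decision curve repeats this calculation over a range of thresholds and plots the model against the treat-all and treat-none (net benefit of zero) strategies.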

The Scientist's Toolkit: Essential Research Reagents and Solutions

The following table details key materials and computational tools essential for conducting rigorous radiomics research in endometrial cancer.

Table 3: Key Research Reagents and Computational Tools

| Item Name | Function / Application | Specific Examples / Notes |
| --- | --- | --- |
| PyRadiomics | Open-source Python package for the extraction of radiomic features from medical images. | IBSI-compliant; used to extract 1,132 features from CT scans in [3]. |
| ITK-SNAP | Software application for manual, semi-automatic, and automatic segmentation of medical images. | Used for delineating regions of interest (ROIs) along tumor borders [7]. |
| SHAP (Shapley Additive Explanations) | A game theory-based method to interpret the output of any machine learning model. | Provides explainability by identifying the most influential radiomic features for a prediction [3] [20]. |
| Random Forest Classifier | An ensemble machine learning algorithm used for classification and regression. | Consistently a top-performing model in radiomics studies for its accuracy and robustness [3] [21]. |
| Preoperative Serological Markers | Blood-based biomarkers used to build combined diagnostic and prognostic models. | HE4 and CA125 are pivotal predictors in machine learning models for risk stratification [21]. |

The comparative landscape of CT and MRI for endometrial tumor assessment is nuanced. MRI remains the superior modality for local staging due to its unmatched soft tissue resolution. However, CT, particularly when enhanced with radiomics and AI, demonstrates competitive and highly accurate performance for tumor classification and differentiation. The validation of CT radiomics is well-supported by rigorous experimental protocols that prioritize explainability and clinical utility. For researchers, the choice between CT and MRI should be guided by the specific clinical question—local invasion or holistic staging—while recognizing that AI integration is rapidly closing the performance gap and expanding the non-invasive profiling of endometrial cancer.

Radiomics represents a paradigm shift in medical image analysis, converting standard-of-care digital images into mineable, high-dimensional data [22]. The core hypothesis driving radiomics research is that biomedical images contain information that reflects underlying pathophysiology, which can be revealed through quantitative analysis to capture intra- and intertumoral heterogeneity [22]. This approach is particularly valuable in oncology, where tumor heterogeneity—the inherent diversity within a tumor encompassing genetic, phenotypic, and microenvironmental variations—represents a major challenge for treatment and is a known cause of therapeutic failure and resistance emergence [22] [23]. By extracting numerous quantitative features from tomographic images, radiomics provides a non-invasive method to quantitatively measure this heterogeneity, offering spatially and temporally resolved in vivo biomarkers of tumor biology that can inform clinical decision-making [22].

The biological basis of radiomics rests on its capacity to reveal characteristics of the tumor microenvironment (TME) and intra-tumoral heterogeneity that are imperceptible to visual assessment alone [24] [23]. These radiomic features, derived from first-order statistics, shape, and texture analyses, provide distinct information on tumor phenotype and microenvironment that complements clinical reports, laboratory tests, and genomic assays [22]. When correlated with genomic data in radiogenomic analyses, radiomic features can suggest gene expression or mutation status and provide additional, independent information that may increase diagnostic, prognostic, and predictive power [22]. This review synthesizes current evidence validating the biological basis of radiomics features across cancer types, with particular emphasis on their relationship with tumor heterogeneity and microenvironment.

Experimental Protocols and Methodological Frameworks

Standardized Radiomics Analysis Pipeline

The process of radiomics involves discrete steps, each with specific methodological considerations [22]. The following workflow represents the standardized approach used across multiple studies:

  • Image Acquisition and Preprocessing: Studies utilized contrast-enhanced CT scans acquired with standardized protocols. For example, in renal cell carcinoma research, CT scans were performed with tube voltage of 120 kV, tube current 250 mAs, and slice thickness of 5 mm [24]. Image preprocessing included resampling to standardized voxel spacing (typically 1×1×1 mm) and gray-level discretization to normalize intensity values across scanners [25].

  • Tumor Segmentation: Manual segmentation of regions of interest (ROIs) was consistently performed slice-by-slice by experienced radiologists using specialized software (3D Slicer or ITK-SNAP) [25] [7]. The "Level Tracing" function was often employed for boundary delineation, with exclusion of non-tumor tissues. To ensure reproducibility, intra- and inter-observer reliability was assessed using intraclass correlation coefficients (ICCs), with features having ICCs < 0.8 typically excluded [25].

  • Feature Extraction: High-throughput feature extraction was performed using standardized platforms, primarily PyRadiomics in Python [13] [23]. The number of extracted features varied by study, ranging from 851 in NSCLC research to 3,566 in deep learning radiomics analysis of renal cell carcinoma [24] [23]. Feature classes consistently included first-order statistics, shape-based features, and texture features (GLCM, GLRLM, GLSZM) [23].

  • Feature Selection and Model Building: Robust feature selection pipelines involved multiple steps: (1) removal of features with poor reproducibility (ICC < 0.8); (2) elimination of highly correlated features (Spearman's |ρ| > 0.8); (3) univariate analysis to identify significant features; and (4) regularized regression techniques like LASSO for final feature selection [25]. Machine learning models including Random Forest, Cox regression, and deep learning approaches were then built using selected features [13] [24].
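The gray-level discretization step named in the preprocessing bullet can be sketched as follows. This uses the fixed-bin-width scheme documented by PyRadiomics, where each intensity is mapped to a 1-based bin index relative to the minimum intensity in the ROI. The bin width of 25 and the Hounsfield unit values are illustrative assumptions; the studies summarized here may have used other settings.

```python
import math


def discretize_fixed_bin_width(values, bin_width=25):
    """Map intensities to 1-based gray-level bins of constant width."""
    lowest_bin = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in values]


# Toy Hounsfield unit values sampled from a hypothetical soft-tissue ROI.
hu_values = [-12, 3, 24, 25, 49, 51, 80]
bins = discretize_fixed_bin_width(hu_values, bin_width=25)
print(bins)  # [1, 2, 2, 3, 3, 4, 5]
```

A fixed bin width keeps the intensity range represented by each gray level constant across patients, so texture matrices built from different scans are defined on a comparable scale.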

Figure: Radiomics analysis pipeline, grouped into Computational Analysis and Biological Interpretation stages. Medical Imaging (CT/MRI) → Tumor Segmentation → Feature Extraction → Feature Selection → Model Development → Biological Validation → Clinical Application

Multi-Omics Integration for Biological Validation

To establish the biological basis of radiomics, researchers have implemented sophisticated multi-omics validation frameworks:

  • Genomic Correlation Analysis: Studies integrated radiomics features with genomic data from sources like The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) between radiomics risk groups were identified using thresholds of adjusted p-value < 0.05 and |log2FC| > 2 [25] [23].

  • Pathway Enrichment Analysis: Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed on DEGs to identify biological pathways enriched in different radiomics risk groups [25] [23]. Significance thresholds were typically set at p < 0.05.

  • Tumor Microenvironment Deconvolution: Immune cell infiltration patterns were analyzed using transcriptomic data and tools like ESTIMATE to calculate immune scores [23]. This allowed researchers to correlate radiomics features with specific immune cell populations in the TME.

  • Survival Analysis Integration: Clinical outcomes including overall survival (OS) and disease-free survival (DFS) were correlated with radiomics risk groups using Kaplan-Meier analysis and Cox regression models [24] [25].
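The DEG thresholds in the genomic correlation step (adjusted p-value < 0.05 and |log2FC| > 2) can be applied as in the sketch below. The Benjamini-Hochberg adjustment is an assumption, since the source does not state which multiple-testing correction was used, and the gene names and statistics are invented placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, log2fc, alpha=0.05, min_abs_lfc=2.0):
    """Keep genes passing both the adjusted p-value and fold-change cutoffs."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q < alpha and abs(lfc) > min_abs_lfc]


# Placeholder genes comparing high-risk vs low-risk radiomics groups.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pvalues = [0.001, 0.02, 0.03, 0.5]
log2fc = [3.1, -2.4, 1.2, 4.0]

degs = select_degs(genes, pvalues, log2fc)
print(degs)  # ['GENE_A', 'GENE_B']
```

GENE_C fails on effect size and GENE_D fails on significance, which shows why both cutoffs are applied together before pathway enrichment.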

Comparative Analysis of Radiomics Validation Across Cancers

Performance Metrics and Biological Correlations

Table 1: Comparative Performance of Radiomics Models Across Cancer Types

| Cancer Type | Imaging Modality | Sample Size | Key Biological Correlations | Model Performance (AUC/Validation) |
| --- | --- | --- | --- | --- |
| Clear Cell Renal Cell Carcinoma [24] [25] | Contrast-enhanced CT | 512 patients + external validation | Tumor heterogeneity, microenvironment, cell cycle regulation, DNA repair, platinum resistance | 1-year OS: 0.879, 3-year OS: 0.854, 5-year OS: 0.831 |
| Endometrial Cancer [13] [3] | CT | 83 patients (two-center) | Texture features associated with malignancy (60% texture, 40% first-order) | Training AUROC: 1.00, Testing AUROC: 0.96 |
| Non-Small Cell Lung Cancer [23] | Contrast-enhanced CT | 334 patients + external validation | Hypoxia, TNFA-NF-κB signaling, inflammatory response, angiogenesis, immune cell infiltration | Significant prognostic stratification (p<0.05) |
| Endometrial Cancer Molecular Subtyping [7] | MRI | 526 patients (multicenter) | POLEmut, NSMP, p53abn molecular subtypes | Macro-average AUC: 0.79 (internal), 0.74 (external) |

Biological Pathway Associations

Table 2: Radiomics Associations with Tumor Biology Pathways and Microenvironment

| Radiomics Risk Group | Enriched Biological Pathways | Tumor Microenvironment Characteristics | Clinical Prognostic Correlation |
| --- | --- | --- | --- |
| High-Risk ccRCC [24] [25] | Cell cycle regulation, DNA repair, platinum resistance | Reduced immune scores, decreased naive B cells, impaired immune activity | Shorter overall survival, increased recurrence |
| High-Risk NSCLC [23] | Hypoxia, TNFA-NF-κB signaling, inflammatory response, angiogenesis | Significantly reduced immune scores, decreased proportions of naive B cells | Stronger inflammatory responses, aggressive phenotypes, poorer outcomes |
| Malignant Endometrial Tumors [13] [3] | Features associated with tumor heterogeneity and aggressiveness | Greater heterogeneity across multiple feature domains, complex internal patterns | Higher net benefit for clinical decision-making per DCA |
| p53abn EC Molecular Subtype [7] | p53-related pathways, aggressive tumor biology | Distinct microenvironment patterns by molecular subtype | Less favorable prognosis, requiring aggressive treatment |

The Biological Basis of Radiomics: Mechanisms and Evidence

Revealing Tumor Heterogeneity through Imaging Features

Radiomics quantifies tumor heterogeneity by analyzing spatial variations in pixel intensities within medical images. The biological basis for this approach rests on the premise that genetic and phenotypic heterogeneity within tumors manifests as measurable heterogeneity in medical images [22]. In clear cell renal cell carcinoma, radiomics features successfully captured intra-tumoral heterogeneity that correlated with genomic heterogeneity observed in sequencing data [24]. Specifically, tumors classified as high-risk by radiomics exhibited greater genomic instability and more aggressive molecular profiles, demonstrating that non-invasive imaging can reflect the underlying biological diversity of tumors [24].

The connection between radiomics and tumor heterogeneity is further strengthened by studies showing that texture features—mathematical representations of the spatial distribution of image intensities—correlate with histopathological measures of cellular heterogeneity [22]. In endometrial cancer, malignant tumors exhibited significantly greater heterogeneity across multiple feature domains compared to benign tumors, with more complex internal patterns and irregular intensity distributions visible on feature maps [3]. These visual patterns provide intuitive representations of the complex mathematical features that drive classification models and reflect the underlying biological heterogeneity [3].
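To show how a texture feature turns spatial intensity variation into a number, the sketch below builds a gray-level co-occurrence matrix (GLCM) for horizontally adjacent pixels and computes the contrast feature on two tiny toy images. It is a simplified single-direction, 2D illustration with 0-based gray levels, not the full multi-angle implementation used by radiomics packages.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1   # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probability weighted by squared level gap."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Toy images with two gray levels (0 and 1).
homogeneous = [[1, 1, 1, 1] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

contrast_homogeneous = glcm_contrast(glcm(homogeneous, levels=2))
contrast_checkerboard = glcm_contrast(glcm(checkerboard, levels=2))
print(contrast_homogeneous, contrast_checkerboard)  # 0.0 1.0
```

The uniform image scores zero because neighbouring pixels never differ, while the checkerboard scores the maximum possible for two gray levels, mirroring the idea that heterogeneous lesions yield higher texture values.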

Mapping the Tumor Microenvironment through Radiomics

Radiomics features provide unique insights into the tumor microenvironment (TME), particularly regarding immune cell infiltration and stromal composition. In NSCLC, radiomics risk stratification revealed significant differences in immune microenvironment profiles, with high-risk patients showing significantly reduced immune scores and decreased proportions of naive B cells, indicating impaired immune activity [23]. This correlation between radiomics features and immune landscape suggests that imaging can non-invasively assess the immune contexture of tumors, which has important implications for immunotherapy response prediction.

The biological basis of radiomics-TME relationships is further elucidated through pathway analysis. In NSCLC, gene set enrichment analysis revealed significant enrichment of tumor invasion and proliferation-related pathways—including hypoxia, TNFA-NF-κB signaling, inflammatory response, and angiogenesis—in the high-risk group defined by radiomics [23]. Similarly, in renal cell carcinoma, the genomic landscape of different radiomics score groups showed significant variations in the heterogeneity of tumor cells and tumor microenvironment [24] [25]. These findings establish a direct link between radiomics features and the biological processes shaping the TME.

Figure: Relationships between radiomics features, biological processes, and clinical manifestations. Radiomics Features reflect Tumor Heterogeneity and map the Tumor Microenvironment; Tumor Heterogeneity drives Molecular Pathways and impacts Clinical Outcomes; the Tumor Microenvironment influences Molecular Pathways; Molecular Pathways determine Clinical Outcomes.

Molecular Pathway Associations Validated Through Multi-Omics

The most compelling evidence for the biological basis of radiomics comes from integrated multi-omics studies that directly correlate imaging features with molecular pathways. In clear cell renal cell carcinoma, differential gene expression analysis between radiomics risk groups identified marked disparities in cell cycle regulation, DNA repair, and platinum resistance pathways [25]. These molecular differences provide a mechanistic explanation for the observed variations in clinical outcomes between radiomics-defined risk groups.

Similarly, in endometrial cancer molecular subtyping, deep learning radiomics models based on MRI demonstrated significant associations with specific molecular subtypes (POLEmut, NSMP, p53abn) [7]. The clinical-radiomics DL model outperformed both clinical models and radiomics DL models alone, achieving macro-average AUCs of 0.79 in internal validation and 0.74 in external validation [7]. This successful classification of molecular subtypes based on imaging features provides strong evidence that radiomics captures fundamental biological characteristics of tumors.

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 3: Essential Research Resources for Radiomics-Biology Correlation Studies

| Resource Category | Specific Tools/Solutions | Application in Radiomics Research | Key Features |
| --- | --- | --- | --- |
| Image Analysis Software | 3D Slicer, ITK-SNAP | Tumor segmentation, ROI delineation | Open-source, standardized segmentation, handles DICOM format |
| Radiomics Extraction Platforms | PyRadiomics (Python) | High-throughput feature extraction | IBSI-compliant, standardized feature definitions, multiple image filters |
| Machine Learning Frameworks | Scikit-learn, TensorFlow, PyTorch | Model development, deep learning | Comprehensive algorithms, neural network architectures, cross-validation |
| Genomic Data Resources | TCGA, TCIA | Multi-omics integration, biological validation | Paired imaging-genomics data, clinical outcomes, molecular profiling |
| Statistical Analysis Tools | R, Python (SciPy, Pandas) | Feature selection, statistical validation | Multiple testing correction, survival analysis, data visualization |
| Bioinformatics Databases | STRING, GO, KEGG | Pathway enrichment analysis | Protein-protein interactions, biological pathway maps, functional annotation |

The accumulating evidence from multiple cancer types consistently demonstrates that radiomics features have a robust biological basis, reflecting underlying tumor heterogeneity and microenvironment characteristics. Through multi-omics validation approaches, researchers have established direct correlations between radiomics features and specific molecular pathways, immune microenvironment composition, and genomic heterogeneity patterns. The reproducible performance of radiomics models across different cancer types and imaging modalities further strengthens the validity of these biological connections.

For researchers and drug development professionals, these findings position radiomics as a valuable tool for non-invasive assessment of tumor biology, with applications in patient stratification, treatment response prediction, and biomarker development. The biological validation of radiomics features moves the field beyond correlative black-box models toward mechanistically informed imaging biomarkers that can provide genuine insights into tumor biology and therapeutic vulnerabilities. Future research should focus on standardizing radiomics pipelines across institutions and further elucidating the specific biological mechanisms that give rise to distinctive radiomics features, ultimately enhancing their clinical utility in precision oncology.

Building Effective CT Radiomics Pipelines: From Feature Extraction to Model Development

Image Acquisition Protocols and Quality Assurance for CT Radiomics

Radiomics analysis translates medical images into mineable, high-dimensional quantitative data by extracting numerous features that can serve as actionable biomarkers for disease diagnosis, prognosis, and treatment prediction [26]. In the specific context of endometrial tumor classification research, the performance validation of CT radiomics is critically dependent on two foundational pillars: standardized image acquisition protocols and rigorous quality assurance (QA). Variations in CT acquisition parameters introduce significant inter-scanner variability that can compromise feature reproducibility and model generalizability [27]. Consequently, implementing robust QA processes is not merely a technical formality but an essential prerequisite for generating reliable, translatable radiomic signatures that accurately reflect underlying tumor biology rather than scanner-specific artifacts.

The radiomics pipeline encompasses several sequential stages: image acquisition, preprocessing, segmentation, feature extraction, and analysis [26]. Errors or inconsistencies introduced at the acquisition stage propagate through subsequent stages, potentially invalidating conclusions. For endometrial tumor research, where the goal is often to distinguish subtle textural patterns associated with tumor subtypes or grades, maintaining geometric and dosimetric accuracy through protocol compliance is paramount [28]. This guide systematically compares QA approaches and acquisition strategies, providing researchers with the experimental frameworks necessary to optimize CT radiomics for gynecological oncology applications.

Comparative Analysis of Quality Assurance Frameworks

Clinical vs. Preclinical QA Requirements

Quality assurance protocols for CT radiomics must be tailored to the research context, with distinct but overlapping requirements for clinical and preclinical imaging systems. The American College of Radiology (ACR) establishes comprehensive QA guidelines for clinical CT scanners, mandating a continuous program supervised by a qualified medical physicist (QMP) [29]. This includes initial acceptance testing, annual performance evaluations, and frequent technologist-led constancy checks. Key performance metrics monitored in clinical systems include CT number accuracy, radiation beam width, spatial resolution, low-contrast performance, dosimetry, and artifact evaluation [29].

In preclinical research, while formal accreditation programs are less established, the fundamental principles of QA remain equally critical. Preclinical CT systems used for radiomics, including cone-beam CT (CBCT) and micro-CT (µCT), require rigorous characterization of acquisition parameters similar to their clinical counterparts [27]. However, the smaller scale, different energy ranges, and specialized applications of preclinical systems necessitate modified approaches. For example, the higher spatial resolution requirements for small animal imaging place greater emphasis on verifying geometric accuracy and minimizing partial volume effects that could distort radiomic feature extraction.

Table 1: Quality Assurance Frequency and Responsibility Comparison

| Task | Clinical CT (ACR Guidelines) | Preclinical CT (Research Best Practices) |
| --- | --- | --- |
| Overall Supervision | Qualified Medical Physicist (QMP) | Principal Investigator/Designated Physicist |
| Acceptance Testing | Upon installation before patient use | Upon installation before experimental use |
| Annual Performance Survey | Required (within 14-month interval) | Recommended at least annually |
| Daily Constancy Checks | Water CT number, uniformity, artifacts | Phantom scans for signal-to-noise, uniformity |
| Key Performance Metrics | CTDIvol, spatial resolution, low-contrast detectability, CT number accuracy | Spatial resolution, noise, uniformity, geometric accuracy |

Automated Protocol Compliance Systems

Beyond traditional QA, automated systems have emerged to verify imaging parameter compliance specifically for radiotherapy and radiomics applications. The ImageCompliance system exemplifies this approach—an automated, GUI-based script that verifies correct CT and MRI parameters against predefined commissioned protocols directly within the treatment planning workflow [28]. This system utilizes a multi-tier warning classification ("Fail," "Physics Review," "Warning") for parameter deviations based on their potential impact on dosimetric or geometric accuracy.

For radiomics applications, parameters with direct implications for feature stability, such as tube voltage (kVp), slice thickness, and reconstruction kernel, are typically designated with "Fail" status if outside tolerance, immediately halting the workflow until resolved [28]. This is critical because variations in tube voltage significantly affect CT numbers and subsequent texture analysis [28] [27]. Meanwhile, dose-related parameters like CTDIvol may trigger less severe "Physics Review" alerts, as they primarily affect radiation dose rather than image texture characteristics fundamental to radiomics.

Standardized Image Acquisition Parameters for Radiomics

Critical Acquisition Parameters and Their Impact on Feature Stability

Radiomic feature reproducibility is highly sensitive to specific CT acquisition parameters. Understanding and controlling these variables is fundamental to any endometrial tumor classification study. The following parameters have demonstrated significant effects on feature reliability and must be carefully standardized:

  • Tube Voltage (kVp): Directly influences photon energy spectrum and tissue contrast, causing substantial variation in CT numbers and texture features. Studies show kVp variations can significantly impact dose calculation accuracy and radiomic feature values [28] [27]. It should be fixed per protocol with a tolerance of exact equivalence (=) to the reference value [28].
  • Slice Thickness: Affects spatial resolution and partial volume effects. Thicker slices can obscure fine-texture details and reduce radiomic feature sensitivity. For stereotactic applications, ≤1.0 mm is often required, while 2.0–2.5 mm may be acceptable for other body applications [28]. It should be maintained at or below (≤) the reference value [28].
  • Reconstruction Kernel/Algorithm: Determines image sharpness and noise texture. Sharp kernels enhance edges but increase noise, while smooth kernels reduce noise but blur fine structures. Changing kernels alters texture feature values substantially. The convolution kernel must exactly match (=) the protocol-specific reference [28].
  • Tube Current (mA) and CTDIvol: Primarily affect image noise rather than CT number accuracy. While moderately variable mA may be acceptable for some applications, extreme variations can impact low-contrast detectability and feature stability. Tolerance should be within the 95% confidence interval of historical data [28].
  • Reconstruction Diameter/Field of View: Impacts pixel size and spatial resolution. Should be fixed to protocol-specific values, typically with a tolerance of ≤ the reference value to prevent unintended resampling [28].
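The tolerance rules above, combined with the tiered warnings described for automated compliance checking, can be expressed as a small rule table. The sketch below is a hypothetical illustration and not the ImageCompliance implementation: the reference values, the historical tube-current interval, and the use of a plain dictionary in place of a parsed DICOM header are all assumptions. The keys follow standard DICOM attribute keywords.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    comparison: str     # "eq" (exact match), "le" (at or below), or "range"
    reference: object   # reference value, or (low, high) for "range"
    tier: str           # warning tier raised when the rule is violated


# Hypothetical commissioned protocol; values are illustrative only.
RULES = {
    "KVP": Rule("eq", 120, "Fail"),
    "SliceThickness": Rule("le", 2.5, "Fail"),
    "ConvolutionKernel": Rule("eq", "STANDARD", "Fail"),
    # Assumed 95% interval derived from historical tube-current data.
    "XRayTubeCurrent": Rule("range", (200, 300), "Physics Review"),
}


def check_protocol(header, rules=RULES):
    """Return (parameter, tier, observed value) for every violated rule."""
    findings = []
    for name, rule in rules.items():
        value = header.get(name)
        if value is None:
            findings.append((name, rule.tier, "missing"))
            continue
        if rule.comparison == "eq":
            ok = value == rule.reference
        elif rule.comparison == "le":
            ok = value <= rule.reference
        else:
            low, high = rule.reference
            ok = low <= value <= high
        if not ok:
            findings.append((name, rule.tier, value))
    return findings


# Toy header values standing in for attributes read from a CT series.
header = {"KVP": 100, "SliceThickness": 2.0,
          "ConvolutionKernel": "STANDARD", "XRayTubeCurrent": 350}
findings = check_protocol(header)
print(findings)
```

In this example the tube voltage mismatch raises a "Fail" and the out-of-range tube current raises a "Physics Review", so the two deviations can be handled with different urgency.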

Table 2: CT Acquisition Parameter Tolerances for Radiomics Protocol Compliance

| Parameter | DICOM Tag | Impact on Radiomics | Recommended Tolerance | Warning Tier |
| --- | --- | --- | --- | --- |
| Tube Voltage (kVp) | (0018,0060) | High - Affects CT numbers & texture | = (Exact match) | Fail |
| Slice Thickness | (0018,0050) | High - Affects spatial resolution | ≤ (Less than or equal) | Fail |
| Reconstruction Kernel | (0018,1210) | High - Alters noise texture | = (Exact match) | Fail |
| Tube Current | (0018,1151) | Medium - Affects noise patterns | Within 95% CI of historical data | Physics Review |
| CTDIvol | (0018,9345) | Low-Medium - Indirect via noise | Within 95% CI of historical data | Physics Review |
| Reconstruction Diameter | (0018,1100) | Medium - Affects pixel size | ≤ (Less than or equal) | Fail |
| Gantry Tilt | (0018,1120) | High - Causes geometric distortion | = 0 | Fail |

Cross-Platform Comparison of Radiomics Feature Reliability

Different CT scanner platforms exhibit distinct radiomic feature reliability profiles, necessitating platform-specific validation. A comparative analysis of preclinical CBCT and µCT systems revealed that first-order statistics and Gray Level Co-occurrence Matrix (GLCM) features were the most stable across different scanners, segmentation volumes, and imaging energies [27]. This finding has direct relevance for clinical endometrial tumor studies, suggesting these feature classes may provide more reproducible biomarkers across multi-center validation studies.

The same study established an inverse relationship between tissue density and feature reliability, with the highest number of reliable features found in lung tissue and the lowest in bone [27]. For endometrial tumor research, this suggests that textural analysis of uterine tissue (soft tissue density) may demonstrate intermediate reliability, underscoring the need for rigorous feature selection protocols. Furthermore, voxel size harmonization through resampling significantly increased the number of comparable features between different scanners, indicating this preprocessing step is essential for multi-institutional radiomics research [27].

Experimental Protocols for Radiomics Quality Assurance

Phantom-Based Validation Methodology

Phantom experiments are fundamental for establishing the technical validation of radiomics features before clinical application. The following protocol, adapted from comparative preclinical studies, provides a framework for assessing feature reliability across CT platforms:

Materials and Equipment:

  • Anatomically realistic phantom with tissue-equivalent inserts (e.g., national physical laboratory phantom with density inserts for soft tissue, lung, and bone equivalents) [27]
  • CT scanners to be compared (e.g., different manufacturers or models)
  • Data analysis workstation with radiomics software (e.g., PyRadiomics)

Scanning Protocol:

  • Perform scan-rescan analysis on each scanner by acquiring two consecutive scans of the phantom without repositioning.
  • Acquire images at multiple clinically relevant energy levels (e.g., 80, 100, 120, 140 kVp for clinical CT; 40 and 60 kVp for preclinical systems) [27].
  • Maintain consistent other parameters (mA, rotation time, slice thickness) across energies where possible.
  • Reconstruct images using standard and sharp kernels routinely used in clinical practice.

Feature Reliability Assessment:

  • Segment spherical or cylindrical volumes of interest (VOIs) within different density inserts using consistent brush sizes (e.g., 44 mm³, 92 mm³, 238 mm³) [27].
  • Extract radiomic features using standardized software (e.g., PyRadiomics) with fixed bin width (e.g., 25) [27].
  • Calculate intraclass correlation coefficient (ICC) for each feature between scan and rescan using two-way mixed-effects models with absolute agreement [27].
  • Classify features with ICC > 0.8 as "reliable" based on established test-retest thresholds [27].
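The scan-rescan reliability filter described above can be sketched with a two-way, absolute-agreement ICC. The single-measurement form, ICC(A,1), is assumed here because the source does not say whether single or average measures were used, and the feature names and values are invented for illustration.

```python
def icc_a1(data):
    """Two-way, absolute-agreement, single-measurement ICC.

    `data` holds one row per VOI and one column per scan (scan, rescan).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # between-VOI mean square
    msc = ss_cols / (k - 1)                  # between-scan mean square
    mse = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy scan/rescan values for two hypothetical features across four VOIs.
scan_rescan = {
    "stable_feature": [[10.0, 10.2], [12.0, 11.9], [15.0, 15.1], [11.0, 11.0]],
    "unstable_feature": [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]],
}

reliable = [name for name, values in scan_rescan.items()
            if icc_a1(values) > 0.8]
print(reliable)  # ['stable_feature']
```

Only features that pass this threshold would be carried forward, which is how a scanner-specific reliable feature set can be assembled before any patient data are analyzed.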

This methodology allows researchers to establish a scanner-specific reliable radiomics signature, filtering out unstable features before applying them to endometrial tumor data.

Clinical Translation and Cross-Validation Protocol

Validating that radiomic features identified in preclinical models translate to clinical applications is crucial. The following protocol outlines an approach for cross-species and cross-scanner validation:

Tumor Model and Imaging:

  • Establish orthotopic tumor models (e.g., in rodent brains for neurological cancers or potentially in appropriate sites for gynecological research) [30].
  • Perform longitudinal contrast-enhanced CT imaging at predetermined intervals throughout tumor development.
  • Acquire clinical CT scans from patient cohorts with appropriate pathology (e.g., endometrial cancer).

Feature Extraction and Selection:

  • Delineate whole-tumor volumes of interest (VOIs) using semi-automated or manual methods in consistent software platforms (e.g., ITK-SNAP, 3D Slicer) [30].
  • Extract comprehensive radiomic feature sets (e.g., 800+ features including first-order, GLCM, GLRLM, GLSZM, GLDM, NGTDM) using PyRadiomics [30].
  • Apply wavelet filters to generate additional feature versions from transformed images.
  • Remove inter-correlated features (Spearman correlation > 0.85) and apply feature selection algorithms (recursive feature elimination or Boruta algorithm) [30].
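
The redundancy filter in the last step can be sketched as below. This is an illustrative greedy filter, assuming continuous feature values without ties (ranks are computed with a double argsort, which does not average tied ranks). The function names and synthetic data are hypothetical, and the cited study may have resolved correlated pairs differently.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman correlation between columns of X (assumes no tied values)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def drop_correlated(X, names, threshold=0.85):
    """Greedily keep a feature only if |rho| <= threshold with all kept ones."""
    rho = np.abs(spearman_matrix(X))
    kept = []
    for j in range(X.shape[1]):
        if all(rho[j, i] <= threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic example: f1 is a monotonic transform of f0, f2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=60)
c = rng.normal(size=60)
X = np.column_stack([a, a ** 3, c])
kept_names = drop_correlated(X, ["f0", "f1", "f2"])
```

The greedy order matters: the first feature of a correlated pair is kept, so ordering columns by a relevance score beforehand is a common refinement.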

Cross-Validation Analysis:

  • Identify features that significantly differentiate tumor from normal tissue in preclinical models.
  • Test these feature candidates in clinical datasets, analyzing distribution differences between tumor and normal regions in patient scans [30].
  • Validate conservation of feature trends (e.g., consistently increased or decreased in tumor tissue across species and scanner platforms).
  • Establish predictive models using conserved features and evaluate performance with area under the receiver operating characteristic curve (AUC) [30].
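
A minimal sketch of the trend-conservation and AUC steps above, for a single feature. The AUC is computed as the probability that a tumor value exceeds a normal-tissue value (ties count one half). The helper names and the numbers are hypothetical, and "conserved" is taken here to mean that the feature lies on the same side of AUC = 0.5 in both datasets.

```python
import numpy as np

def auc_pairwise(tumor, normal):
    """AUC of one feature for separating tumor from normal regions."""
    t = np.asarray(tumor, dtype=float)[:, None]
    n = np.asarray(normal, dtype=float)[None, :]
    return float(np.mean((t > n) + 0.5 * (t == n)))

def trend_conserved(pre_tumor, pre_normal, clin_tumor, clin_normal):
    """True if the feature moves in the same direction in both datasets."""
    a_pre = auc_pairwise(pre_tumor, pre_normal)
    a_clin = auc_pairwise(clin_tumor, clin_normal)
    return bool((a_pre - 0.5) * (a_clin - 0.5) > 0)

# Hypothetical feature values: elevated in tumor in both settings.
conserved = trend_conserved([3.0, 4.0], [1.0, 2.0], [10.0, 12.0], [5.0, 11.0])
```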

This translational protocol provides a robust framework for ensuring that radiomic signatures discovered in controlled experimental settings maintain diagnostic value in clinical endometrial tumor classification.

Essential Research Toolkit for CT Radiomics

Implementing a standardized radiomics workflow requires specific software tools and physical resources. The following toolkit outlines essential components for conducting validated CT radiomics research for endometrial tumor classification:

Table 3: Essential Research Toolkit for CT Radiomics Quality Assurance

| Tool Category | Specific Tool/Resource | Function in Radiomics Pipeline |
| --- | --- | --- |
| Quality Assurance Phantoms | ACR CT Accreditation Phantom | Verification of CT number accuracy, uniformity, slice thickness, low-contrast resolution [29] |
| Quality Assurance Phantoms | Anatomically Realistic Phantom (e.g., NPL Mouse Phantom) | Assessment of radiomic feature reliability across tissue densities [27] |
| Dosimetry Equipment | CT Sensor + CTDI Phantom (e.g., RaySafe X2) | Measurement of radiation dose metrics (CTDIvol, DLP) for protocol compliance [31] |
| Segmentation Software | ITK-SNAP (http://www.itksnap.org) | Manual and semi-automated delineation of tumor VOIs [27] [30] |
| Segmentation Software | 3D Slicer (http://www.slicer.org) | Multi-modal image analysis and segmentation [26] |
| Feature Extraction | PyRadiomics (Python package) | Standardized extraction of radiomic features from medical images [26] [30] |
| Protocol Compliance | ImageCompliance or similar script | Automated verification of DICOM parameter compliance with commissioned protocols [28] |
| Statistical Analysis | RStudio with irr package | Calculation of intraclass correlation coefficients for feature reliability [27] |

Workflow Diagram for QA-Aware Radiomics Research

The following diagram illustrates a comprehensive quality assurance-aware workflow for CT radiomics research, integrating the protocols and comparisons discussed throughout this guide:

[Workflow diagram, spanning three phases (Image Acquisition, Radiomics Analysis, Validation & Translation): Study Design for Endometrial Tumor Classification → Define Standardized Acquisition Protocol → Perform Quality Assurance (CT Number Accuracy, Uniformity, Geometric Accuracy) → Acquire Patient/Phantom CT Scans → Automated Protocol Compliance Check → Image Preprocessing (Voxel Size Harmonization, Intensity Normalization) → Tumor Volume Segmentation (Manual/Semi-automated) → Radiomic Feature Extraction Using PyRadiomics → Phantom Validation of Feature Reliability (ICC > 0.8) → Feature Selection & Model Building → Clinical Validation in Patient Cohort → Results Reporting with QA Documentation]

Diagram Title: Comprehensive QA Workflow for CT Radiomics Research

Validating CT radiomics performance for endometrial tumor classification demands meticulous attention to image acquisition protocols and quality assurance practices. The comparative data presented in this guide demonstrate that variations in key acquisition parameters, particularly tube voltage, slice thickness, and reconstruction kernel, significantly impact radiomic feature stability. Implementation of automated protocol compliance systems, such as the ImageCompliance framework with its multi-tier warning structure, provides a robust mechanism for ensuring parameter consistency across imaging sessions and platforms.

Furthermore, phantom-based validation protocols establish essential ground truth for distinguishing biologically relevant radiomic signatures from scanner-induced artifacts. The experimental methodologies outlined, including cross-platform reliability assessment and clinical translation frameworks, provide researchers with practical tools for strengthening their radiomics workflows. As the field advances toward clinical implementation, adherence to these standardized QA practices will be paramount for developing reliable, validated CT radiomics models for endometrial tumor classification that can genuinely impact patient care through improved diagnostic accuracy and treatment personalization.

Robust Tumor Segmentation and Feature Extraction Methodologies

Robust tumor segmentation and radiomic feature extraction are foundational to developing reliable, non-invasive diagnostic and prognostic tools for oncology research. In endometrial cancer classification, these methodologies enable the quantitative analysis of tumor phenotypic characteristics from medical images, which can be correlated with molecular subtypes and clinical outcomes [32] [3]. The critical challenge lies in ensuring that these quantitative features remain robust to variations in image acquisition protocols, segmentation methodologies, and inter-observer delineation differences. Without such robustness, radiomic models may fail to generalize across institutions and patient populations, limiting their clinical utility [33] [34].

This guide provides a comparative analysis of current segmentation methodologies and feature extraction protocols, with specific attention to their application in CT radiomics for endometrial tumor classification. We evaluate performance through standardized metrics and experimental data, providing researchers with evidence-based recommendations for implementing robust radiomics workflows.

Comparative Analysis of Segmentation Methodologies

Performance Metrics for Segmentation Robustness

The evaluation of segmentation methodologies relies on quantitative metrics that assess both geometric accuracy and clinical utility:

  • Dice Similarity Coefficient (DSC): Measures spatial overlap between segmented volumes, with values >0.7 generally indicating clinically acceptable agreement [35] [36].
  • Intraclass Correlation Coefficient (ICC): Quantifies feature reproducibility across multiple segmentations, with ICC >0.75 considered robust and >0.90 indicating excellent reliability [33] [34].
  • Lin's Concordance Correlation Coefficient (CCC): Evaluates agreement between feature values extracted from different imaging protocols, with CCC >0.75 indicating protocol robustness [33].
  • Hausdorff Distance (HD95): Measures boundary agreement, with lower values indicating superior contour precision [35].

Comparison of Segmentation Approaches

Table 1: Comparative Performance of Tumor Segmentation Methodologies

| Methodology | Representative Implementation | Dice Score (Median) | ICC Range | Key Advantages | Primary Limitations |
| --- | --- | --- | --- | --- | --- |
| Manual Segmentation | Slice-by-slice delineation by experts | 0.73–0.80 [35] | 0.77±0.17 [34] | Considered reference standard; direct clinical translation | Time-consuming; high inter-observer variability (ICC=0.77) [34] |
| Semi-automatic Segmentation | 3D-Slicer GrowCut algorithm [33] [34] | 0.75–0.85 [34] | 0.85±0.15 [34] | Reduced inter-observer variability; faster than manual | Requires initial manual input; algorithm parameter sensitivity |
| Deep Learning (Supervised) | 3D U-Net (iSeg) [35] | 0.70–0.73 [35] | 0.82–0.90 [36] | Fully automated; rapid processing; matches human performance | Requires large annotated datasets for training |
| Deep Learning (RNN) | 3D Recurrent Neural Network [36] | 0.803 [36] | 0.84–0.90 [36] | Superior contour accuracy; excellent feature stability | Computational complexity; longer training times |
| Traditional Image Processing | Weighted Fuzzy C-Means (WFCM) [36] | 0.576 [36] | 0.65–0.75 [36] | No training data required; computationally efficient | Lower accuracy for heterogeneous tumors |

Table 2: Downstream Diagnostic Performance of Segmentation Methods in Lung Nodule Classification

| Segmentation Method | Benign vs. Malignant Classification (AUC) | Adenocarcinoma Infiltration (AUC) | Nodule Density Classification (Kappa) |
| --- | --- | --- | --- |
| RNN (Deep Learning) | 0.840 ± 0.01 [36] | 0.946 [36] | 0.729 [36] |
| Senior Radiologist (S1) | 0.824 ± 0.015 [36] | 0.924 [36] | 0.698 [36] |
| UNET (Deep Learning) | 0.801 ± 0.012 [36] | 0.912 [36] | 0.681 [36] |
| Junior Radiologist (R1) | 0.792 ± 0.011 [36] | 0.901 [36] | 0.665 [36] |

Contextual Performance for Endometrial Cancer

While comprehensive segmentation comparisons for endometrial cancer specifically are limited in the available literature, evidence suggests that semi-automated and deep learning approaches offer particular advantages for gynecological applications. In endometrial cancer research, semi-automatic segmentation using 3D-Slicer has demonstrated significantly higher feature reproducibility (ICC = 0.85±0.15) compared to manual delineation (ICC = 0.77±0.17) [34]. For MRI-based endometrial tumor segmentation, deep learning methods have shown promising results, though CT-specific segmentation algorithms for endometrial cancer remain an area of active development [7] [3].

Robust Radiomic Feature Extraction

Assessment of Feature Robustness

Radiomic feature robustness is essential for developing reliable classification models. A recent comprehensive study on CT radiomics for non-small cell lung cancer identified that only 21 out of 106 features demonstrated robustness to both segmentation variations and acquisition protocol differences [33]. These robust features showed superior predictive performance in recurrence prediction compared to non-robust features, highlighting the importance of rigorous feature selection [33].

Table 3: Radiomic Feature Robustness Across Segmentation and Acquisition Variations

| Feature Category | Robustness to Segmentation (ICC >0.75) | Robustness to Protocol (CCC >0.75) | Representative Robust Features |
| --- | --- | --- | --- |
| First-Order Statistics | 68% of features [33] [34] | 45% of features [33] | Energy, Entropy, 90th Percentile [33] [3] |
| Texture Features | 72% of features [33] [34] | 52% of features [33] | GLCM Contrast, GLCM SumEntropy [33] |
| Shape Features | 62% of features [33] | 38% of features [33] | Surface Area, Sphericity, Maximum 3D Diameter [33] |
| Wavelet-Transformed | 85% of features [33] [3] | 65% of features [33] | Wavelet-HHH firstorder 90Percentile [3] |

Feature Stability Across Methodologies

The stability of radiomic features varies significantly across segmentation methodologies. Studies demonstrate that intensity statistics and textural features exhibit significantly higher reproducibility (p = 0.0006 and p = 0.009, respectively) when extracted from semi-automated segmentations compared to manual delineations [34]. For endometrial cancer classification using CT radiomics, approximately 90% of the most discriminative features originate from transformed images (particularly wavelet and LoG filtering), indicating these feature classes may offer enhanced robustness for tumor characterization [3].

Experimental Protocols for Method Validation

Protocol for Assessing Segmentation Robustness

A standardized protocol for evaluating segmentation robustness involves multiple independent annotators and assessment cycles:

  • Multi-annotator Design: Engage at least two independent annotators with relevant expertise (e.g., radiation oncologists for CT images) [33].
  • Segmentation Execution: Each annotator performs segmentation using both manual and semi-automatic/automatic methods [36] [34].
  • Dice Coefficient Calculation: Compute spatial overlap between segmentations from different annotators and methods [33].
  • Feature Extraction: Extract radiomic features from each segmentation using standardized software (e.g., PyRadiomics) [33] [3].
  • ICC Calculation: Determine inter-observer and inter-method reliability for each feature [33].
  • Performance Validation: Evaluate downstream diagnostic performance using segmented features in classification tasks [36].
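
The Dice calculation in the protocol above can be sketched as follows for two binary masks. The mask shapes and the convention for two empty masks are assumptions made for illustration.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)

# Two hypothetical annotators: 4x4x4 cubes offset by one voxel along axis 0.
annotator_1 = np.zeros((10, 10, 10), dtype=bool)
annotator_2 = np.zeros((10, 10, 10), dtype=bool)
annotator_1[2:6, 2:6, 2:6] = True
annotator_2[3:7, 2:6, 2:6] = True
dsc = dice(annotator_1, annotator_2)  # overlap of 48 voxels out of 64 each
```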

[Workflow diagram: Start Segmentation Robustness Assessment → Select Multiple Independent Annotators → Apply Multiple Segmentation Methods (Manual & Automated) → Calculate Dice Similarity Coefficient (DSC) → Extract Radiomic Features Using PyRadiomics → Perform ICC Analysis for Feature Robustness → Validate Diagnostic Performance in Classification Tasks]

Segmentation Robustness Assessment Workflow

Protocol for Evaluating Feature Robustness Across Acquisition Protocols

For assessing feature stability across different imaging protocols:

  • Multi-protocol Imaging: Acquire images using different scanner parameters (e.g., high-dose vs. low-dose CT) for the same patients [33].
  • Segmentation Consistency: Apply consistent segmentation across protocol variants [33].
  • Feature Extraction: Extract identical feature sets from all protocol variants [33].
  • Concordance Analysis: Calculate Lin's Concordance Correlation Coefficient (CCC) between feature values from different protocols [33].
  • Robust Feature Identification: Select features with CCC >0.75 for further analysis [33].
  • Predictive Modeling: Build and validate models using only robust features [33].
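
The concordance step above can be sketched as below, using the standard definition of Lin's CCC with population (1/n) moments. The feature names and values are hypothetical.

```python
import numpy as np

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))

def protocol_robust(features_a, features_b, threshold=0.75):
    """Names of features whose CCC between two protocols exceeds the threshold."""
    return [name for name in features_a
            if lin_ccc(features_a[name], features_b[name]) > threshold]

# Hypothetical per-patient values under two acquisition protocols.
high_dose = {"f_stable": [1.0, 2.0, 3.0, 4.0], "f_shifted": [1.0, 2.0, 3.0, 4.0]}
low_dose = {"f_stable": [1.1, 1.9, 3.0, 4.1], "f_shifted": [2.0, 3.0, 4.0, 5.0]}
robust = protocol_robust(high_dose, low_dose)
```

Unlike Pearson correlation, the CCC penalises a systematic offset between protocols, which is why the perfectly correlated but shifted feature is rejected here.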

Table 4: Essential Tools for Robust Radiomics Research

| Tool Category | Specific Solution | Primary Function | Application Context |
| --- | --- | --- | --- |
| Segmentation Software | 3D Slicer (GrowCut algorithm) [33] [34] | Semi-automatic volumetric segmentation | Reduces inter-observer variability in tumor contouring |
| Segmentation Software | ITK-SNAP [7] | Manual and semi-automatic segmentation | Multi-organ segmentation with active contour method |
| Deep Learning Framework | 3D U-Net [35] [36] | Fully automated segmentation | High-throughput processing in large datasets |
| Deep Learning Framework | 3D RNN [36] | Automated segmentation with iterative refinement | Superior boundary definition for heterogeneous tumors |
| Feature Extraction | PyRadiomics (v3.0.1+) [33] [36] [3] | Standardized radiomic feature extraction | IBSI-compliant feature quantification |
| Validation Metrics | Dice Similarity Coefficient [33] [35] | Spatial overlap assessment | Segmentation accuracy quantification |
| Validation Metrics | Intraclass Correlation Coefficient [33] [34] | Feature reproducibility measurement | Robust feature identification |
| Validation Metrics | Lin's Concordance Correlation Coefficient [33] | Protocol agreement evaluation | Multi-scanner/protocol feature stability |

Based on comparative performance data, we recommend researchers consider the following evidence-based approaches for endometrial tumor classification research:

  • For maximum feature robustness: Implement semi-automated segmentation using 3D-Slicer, which demonstrates significantly higher ICC values (0.85±0.15) compared to manual delineation (0.77±0.17) [34].

  • For high-throughput studies: Employ deep learning approaches (particularly 3D RNN) which achieve superior Dice scores (0.803) and maintain diagnostic performance in downstream classification tasks [36].

  • For feature selection: Prioritize wavelet-transformed and texture features, which demonstrate the highest robustness to both segmentation and protocol variations [33] [3].

  • For model generalizability: Validate all radiomic features for robustness (ICC >0.75, CCC >0.75) before inclusion in predictive models, as only approximately 20% of features may meet both criteria [33].

The integration of robust segmentation methodologies with careful feature selection represents the most promising path forward for developing clinically applicable radiomic models for endometrial cancer classification. As these technologies continue to evolve, adherence to standardized validation protocols will be essential for ensuring reproducible and clinically meaningful research outcomes.

In the evolving field of computational oncology, the selection of an optimal machine learning (ML) algorithm is paramount for developing robust diagnostic and prognostic models. Within the specific research context of performance validation for CT radiomics in endometrial tumor classification, numerous algorithms are being evaluated. Among them, Random Forest (RF) consistently demonstrates superior performance across multiple studies. This guide provides an objective, data-driven comparison of RF against other prevalent ML algorithms, drawing on recent experimental evidence to inform researchers, scientists, and drug development professionals.

Performance Comparison in Endometrial Cancer Classification

Recent studies directly comparing multiple machine learning algorithms for classifying endometrial cancer using CT radiomics have consistently ranked Random Forest among the top performers.

Table 1: Comparative Performance of ML Algorithms in Differentiating Benign and Malignant Endometrial Tumors via CT Radiomics (Two-Center Study, n=83) [3]

| Machine Learning Model | Testing AUROC | Testing Sensitivity | Testing Specificity | Training AUROC |
| --- | --- | --- | --- | --- |
| Random Forest | 0.96 | 100% | 92.31% | 1.00 |
| XGBoost | 0.93 | 92.31% | 92.31% | 0.99 |
| Support Vector Classifier | 0.91 | 84.62% | 92.31% | 0.97 |
| K-Nearest Neighbors | 0.90 | 84.62% | 92.31% | 0.95 |
| Logistic Regression | 0.88 | 76.92% | 92.31% | 0.92 |
| TabPFNv2 | 0.88 | 69.23% | 92.31% | 0.89 |

A separate study utilizing serological markers and clinical variables from 562 patients to diagnose and stage endometrial cancer further confirmed the dominance of ensemble methods. The Random Forest classifier achieved a predictive accuracy of 0.94 and an AUC of 0.81, outperforming other models like Support Vector Machine, Logistic Regression, and Neural Networks [21].

Beyond classification, RF's utility extends to critical prognostic tasks. Research on predicting endometrial cancer recurrence from pre-surgical CT scans found that a Random Forest model (RFsrc) achieved an AUC of 0.90 on the test set, demonstrating high accuracy for survival analysis [11] [37].

Key Experimental Protocols and Methodologies

The superior performance of Random Forest is validated through rigorous and reproducible experimental designs. The following workflow is typical in studies comparing ML algorithms for radiomics tasks [11] [3] [37].

[Workflow diagram: Patient Cohort (Contrast-Enhanced CT Scans) → Image Preprocessing (bias field correction, normalization) → Tumor Segmentation (manual/semi-automatic ROI contouring) → Radiomic Feature Extraction (using PyRadiomics) → Feature Selection & Processing (univariate analysis, RFE) → Model Training & Validation (multiple ML algorithms; training/test split, k-fold CV) → Performance Evaluation (AUC, sensitivity, specificity) → Model Interpretation (SHAP analysis, feature maps)]

Data Acquisition and Preprocessing

In the cited two-center study [3], pre-surgical contrast-enhanced CT (CE-CT) scans from 83 patients with endometrial tumors (46 malignant, 37 benign) were used. The venous phase of CE-CT was specifically utilized as it provides better parenchymal characterization and tumor-myometrium contrast. Key preprocessing steps included ensuring images were reconstructed with a soft tissue algorithm and aligned to the body axis, while excluding scans with significant artifacts.

Tumor Segmentation and Feature Extraction

The region of interest (ROI), encompassing the entire uterus, was manually segmented slice-by-slice on the CT scans by radiologists blinded to pathological results [3] [37]. This step is critical, and inter-physician reproducibility is often assessed using the Dice Similarity Coefficient (DSC), with a value ≥ 0.8 indicating excellent agreement [37].

Following segmentation, a large volume of quantitative data was extracted from each ROI. The open-source PyRadiomics library in Python was used to extract 1,132 radiomic features across various classes, including First-Order Statistics, Shape-based, and Texture features (e.g., Gray Level Co-occurrence Matrix - GLCM) [3].

Feature Selection and Model Training

To avoid overfitting, feature selection was performed. Common methods included:

  • Univariate Analysis to filter out non-significant features [37].
  • Recursive Feature Elimination (RFE) [7].
  • Joint Mutual Information (JMI) and Joint Mutual Information Maximization (JMIM), which were identified in a large-scale comparison as among the top-performing feature selection algorithms for radiomics [38].

Models were typically trained on a subset of the data (e.g., 70% or a 6:4 training-test split) and validated on a held-out test set, often employing 10-fold cross-validation on the training set to tune hyperparameters [3] [37].
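
A minimal sketch of this training scheme with scikit-learn, assuming a 70/30 stratified split and 10-fold cross-validation on the training portion only. The data are synthetic stand-ins for a radiomic feature matrix, and the hyperparameter grid is hypothetical rather than the configuration used in the cited studies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for a (patients x radiomic features) matrix.
X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)

# Hold out 30% of patients; the test set is never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Tune hyperparameters with 10-fold CV on the training set only.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [3, None]},  # hypothetical grid
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
```

Any feature selection step would need to sit inside the cross-validation loop (for example in a scikit-learn `Pipeline`) to avoid leaking test-fold information into the selection.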

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 2: Key Research Reagents and Computational Tools for CT Radiomics

| Item/Solution | Function in Research | Specific Examples / Notes |
| --- | --- | --- |
| PyRadiomics Library | Open-source platform for standardized extraction of radiomic features from medical images. | Essential for reproducibility; compliant with the Image Biomarker Standardisation Initiative (IBSI) [11] [3]. |
| Contrast-Enhanced CT (CE-CT) Scans | Provides the primary imaging data for analysis. | The venous phase is particularly valuable for endometrial cancer staging [37]. |
| Segmentation Software | Used to delineate Regions of Interest (ROIs) around tumors. | MIM software [37], ITK-SNAP [7]. |
| Scikit-learn Library | Python library providing efficient implementations of machine learning algorithms. | Includes Random Forest, SVM, Logistic Regression, etc. [38]. |
| SHAP (SHapley Additive exPlanations) | A game-theoretic method to interpret model output and explain feature importance. | Critical for creating explainable AI models in a clinical context [3]. |

Understanding Random Forest Architecture and Advantages

The exceptional performance of Random Forest can be attributed to its underlying ensemble architecture, which effectively mitigates common problems like overfitting that plague simpler models.

[Architecture diagram: Training Dataset → Create Multiple Bootstrap Samples → Build Decision Trees (Tree 1, Tree 2, ..., Tree N, each using a Random Feature Subset per Split) → Aggregate Predictions (Majority Vote / Averaging) → Final Prediction]

As illustrated, RF operates by constructing a "forest" of many decision trees during training. Each tree is built on a random subset of the data (bootstrap sample), and at each split in a tree, a random subset of features is considered. This dual randomness introduces de-correlation between the trees, making the model more robust. For classification, the final output is determined by majority voting across all trees in the forest [21].
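
The three ingredients named above (bootstrap sampling, a random feature subset per split, majority voting) can be sketched in isolation as follows. This is not a full Random Forest; tree growing is omitted, and the square-root subset size is a common default assumed here for illustration.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return rng.integers(0, n_samples, size=n_samples)

def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) candidate features for one split."""
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)

def majority_vote(tree_predictions):
    """Combine 0/1 predictions of shape (n_trees, n_samples).

    Exact ties are assigned to class 0 in this sketch.
    """
    votes = np.asarray(tree_predictions)
    return (votes.mean(axis=0) > 0.5).astype(int)

rng = np.random.default_rng(0)
rows = bootstrap_indices(10, rng)
cols = random_feature_subset(16, rng)
# Hypothetical predictions from three trees for three patients.
final = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```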

The key advantages of this architecture include:

  • Robustness to Overfitting: The ensemble nature, leveraging the Law of Large Numbers, effectively averages out noise and reduces variance [3].
  • Handling High-Dimensional Data: RF efficiently manages datasets with a large number of features (like radiomics) without the need for extensive dimensionality reduction prior to modeling [3].
  • Feature Importance Output: The model natively provides estimates of which features contribute most to the predictions, offering valuable biological or clinical insights [21].

Empirical evidence from recent, high-quality studies solidifies Random Forest's position as a top-performing algorithm for classification tasks within CT radiomics research for endometrial cancer. Its consistent excellence, explainability, and inherent resistance to overfitting make it an excellent benchmark and a powerful tool for researchers developing diagnostic and prognostic models. While other algorithms like XGBoost also show strong performance, RF's combination of high accuracy, relative simplicity, and interpretability often makes it the algorithm of choice for validating radiomic pipelines in oncological research.

The radiomics score (Rad-score) represents a transformative quantitative tool in oncologic imaging, enabling objective diagnostic and prognostic predictions by integrating selected imaging features into a single, validated metric. Framed within the broader validation of CT radiomics for endometrial tumor classification, this guide provides a comparative analysis of Rad-score development methodologies across imaging modalities. We present detailed experimental protocols, performance data, and essential research tools, offering researchers a comprehensive framework for implementing Rad-score in gynecologic oncology and beyond. The standardized approach to converting medical images into mineable data through Rad-score demonstrates significant potential for advancing personalized medicine by capturing intratumoral heterogeneity imperceptible to human visual assessment.

The Rad-score represents a calculated value derived from quantitative imaging features that serves as a non-invasive biomarker for various clinical endpoints in oncology [39]. This score computationally integrates multiple radiomics features into a single metric that correlates with pathological conditions, treatment response, and survival outcomes. In the specific context of endometrial cancer (EC), Rad-score development has shown remarkable progress across multiple imaging modalities, including CT, MRI, and ultrasound, providing clinicians with valuable tools for preoperative assessment and prognostic prediction [39] [13] [40].

The fundamental principle underlying Rad-score involves converting digital medical images into high-dimensional, mineable data through automated or semi-automated feature extraction algorithms [39] [3]. These radiomic features comprehensively describe tumor phenotypes by quantifying intensity statistics, textural patterns, and morphological characteristics that may reflect underlying pathological processes [41] [3]. The integration of these selected features into a unified Rad-score provides a powerful approach for capturing intratumoral heterogeneity, thereby facilitating more precise diagnostic classifications and prognostic stratification [39] [7].

Within endometrial cancer research, Rad-score applications have expanded to include differentiation of malignant and benign tumors [13] [3], prediction of histological grade [42], assessment of molecular subtypes [7], and evaluation of disease-free survival [39]. This technological advancement addresses critical clinical challenges in endometrial cancer management, particularly the need for accurate preoperative characterization to guide surgical planning and adjuvant therapy selection [40] [42]. The development of robust Rad-score models represents a significant step toward personalized medicine in gynecologic oncology.

Experimental Protocols for Rad-score Development

Study Population and Image Acquisition

The initial phase of Rad-score development requires careful patient selection and standardized image acquisition. Studies typically employ retrospective designs with pathologically confirmed cases, divided into training and validation cohorts to ensure model generalizability [39] [13] [3]. For endometrial tumor classification, sample sizes have ranged from 83 to 526 patients across multiple institutions [13] [3] [7]. Inclusion criteria commonly encompass histologically verified diagnoses, availability of preoperative imaging within specified timeframes before surgery, and complete clinical-pathological data [39] [40]. Exclusion criteria typically address image artifacts, previous treatments that might alter tumor characteristics, and coexisting malignancies [40] [7].

Image acquisition protocols vary by modality but must be consistent within studies. For CT-based Rad-score development, standard abdominal-pelvic protocols are employed without contrast optimization specifically for radiomics [13] [3]. MRI protocols typically include T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI) with apparent diffusion coefficient (ADC) maps, and dynamic contrast-enhanced (DCE) sequences [40] [7] [42]. Ultrasound studies utilize standard gynecological imaging protocols with consistent transducer frequencies and image settings [39]. Critical to this phase is the documentation of acquisition parameters including slice thickness, reconstruction algorithms, magnetic field strength (for MRI), and tube voltage/current (for CT) to ensure reproducibility [40] [42].

Tumor Segmentation and Feature Extraction

Tumor segmentation, a critical step in the radiomics workflow, involves delineating regions of interest (ROIs) around the entire tumor volume. This process is typically performed manually by experienced radiologists using specialized software such as ITK-SNAP or 3D Slicer [39] [40] [7]. To ensure reproducibility, multiple radiologists often segment the same subset of cases, with intraclass correlation coefficients (ICCs) calculated to assess inter-observer agreement [40] [7]. Features with ICC values >0.75–0.80 are generally considered sufficiently reproducible for subsequent analysis [40] [7].

Feature extraction transforms the segmented ROI into quantitative data using standardized computational algorithms. Platforms such as PyRadiomics (Python) or Artificial Intelligence Kit (A.K., GE Healthcare) are commonly employed to extract hundreds to thousands of radiomic features [39] [13] [3]. These features encompass several categories: first-order statistics describing voxel intensity distributions (histogram-based features); second- and higher-order textures quantifying spatial patterns (Gray-Level Co-occurrence Matrix [GLCM], Gray-Level Run-Length Matrix [GLRLM], etc.); and shape-based features characterizing geometric properties [41] [3]. Additional features may be derived from transformed images using filters like wavelet, Laplacian of Gaussian (LoG), or Fourier transformations to capture multi-scale texture information [3].
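
To make the first-order category concrete, the sketch below computes three histogram-based features from the voxel intensities inside an ROI, using a fixed bin width for discretisation. It is a simplified illustration loosely modelled on common definitions, not a reproduction of any extraction platform; the function name and sample values are hypothetical.

```python
import numpy as np

def first_order(voxels, bin_width=25):
    """Energy, entropy, and 90th percentile of ROI voxel intensities."""
    x = np.asarray(voxels, dtype=float).ravel()
    # Fixed-bin-width discretisation: bin edges fall on multiples of bin_width.
    bins = np.floor(x / bin_width) - np.floor(x.min() / bin_width)
    counts = np.bincount(bins.astype(int))
    p = counts[counts > 0] / x.size  # probabilities of the occupied bins
    return {
        "energy": float(np.sum(x ** 2)),
        "entropy": float(-np.sum(p * np.log2(p))),
        "p90": float(np.percentile(x, 90)),
    }

# Hypothetical ROI with two equally populated intensity bins.
features = first_order([0, 0, 30, 30])
```

A homogeneous ROI yields zero entropy, while spreading voxels across more bins raises it, which is the sense in which entropy reflects intensity heterogeneity.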

Feature Selection and Rad-score Calculation

Feature selection represents a crucial step to avoid overfitting and identify the most predictive features for Rad-score construction. This process typically involves multiple stages, beginning with the removal of non-reproducible features (ICC <0.75–0.80) and redundant features (highly correlated feature pairs) [40] [7]. Subsequently, univariate analysis identifies features significantly associated with the clinical endpoint, followed by more sophisticated multivariate selection techniques [39] [41].

The least absolute shrinkage and selection operator (LASSO) regression has emerged as the predominant method for Rad-score development, particularly for high-dimensional data [39] [41] [40]. LASSO applies a penalty term that shrinks coefficients of less important features to zero, effectively performing feature selection while constructing the predictive model [39]. The optimal penalty parameter (λ) is typically determined through ten-fold cross-validation to maximize predictive performance [39] [40]. Alternative feature selection methods include recursive feature elimination (RFE) [7] and maximum relevance minimum redundancy (mRMR) [40].

The final Rad-score is calculated as a linear combination of the selected features weighted by their respective coefficients from the LASSO regression model [39]. The mathematical formula follows: Rad-score = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ, where β₀ represents the intercept, βᵢ denotes the coefficient for each selected feature, and Xᵢ represents the normalized value of each feature [39] [41]. This calculated score serves as a composite radiomics signature that can be combined with clinical parameters in subsequent predictive models [39] [42].
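As a worked instance of the formula, the sketch below evaluates a Rad-score for one patient. The intercept, coefficients, feature names, and normalized feature values are hypothetical and are not taken from any cited study.

```python
def rad_score(intercept, coefficients, feature_values):
    """Rad-score = intercept + sum of coefficient * normalized feature value."""
    return intercept + sum(
        coefficients[name] * feature_values[name] for name in coefficients
    )

# Hypothetical LASSO output: only features with non-zero coefficients remain.
intercept = -0.4
coefficients = {"wavelet_glcm_contrast": 0.8, "log_firstorder_entropy": -0.5}

# Hypothetical normalized feature values for a single patient.
patient = {"wavelet_glcm_contrast": 1.2, "log_firstorder_entropy": 0.6}

score = rad_score(intercept, coefficients, patient)
print(round(score, 2))
```

Features must be normalized with the parameters estimated on the training cohort before the coefficients are applied to new patients.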

Table 1: Key Steps in Rad-score Development Workflow

Development Phase | Key Procedures | Common Tools & Methods | Quality Control Measures
Study Population | Patient selection; Cohort division | Retrospective design; Training/validation splits | Inclusion/exclusion criteria; Multi-center validation
Image Acquisition | Standardized imaging protocols | CT, MRI, or ultrasound systems | Parameter documentation; Phantom studies
Tumor Segmentation | Manual/automatic ROI delineation | ITK-SNAP; 3D Slicer | Multi-reader ICC analysis; ROI consistency checks
Feature Extraction | High-throughput feature calculation | PyRadiomics; A.K. software | IBSI compliance; Feature stability assessment
Feature Selection | Dimensionality reduction | LASSO; mRMR; RFE | Cross-validation; Correlation analysis
Rad-score Calculation | Linear combination of features | Regression models | Coefficient validation; Score distribution analysis

Model Validation and Clinical Implementation

Validation represents the final critical phase in Rad-score development, assessing the model's performance and generalizability. The most common approach involves internal validation using bootstrap resampling or cross-validation techniques applied to the training cohort [39] [3]. More robust validation employs temporal, geographic, or multi-center external validation cohorts to evaluate model transportability across different populations and imaging protocols [40] [7]. For endometrial cancer applications, studies have demonstrated successful validation across 2-3 independent institutions [40] [7].

Model performance is quantitatively evaluated using discrimination metrics, particularly the area under the receiver operating characteristic curve (AUC) for classification tasks [39] [13] [3] and Harrell's C-index for survival predictions [39]. Additional evaluation includes calibration assessed through calibration curves or Hosmer-Lemeshow tests [39] [42], and clinical utility evaluated via decision curve analysis (DCA) [13] [40] [3]. The Rad-score is typically integrated with clinical parameters through nomograms [39] [42] or combined prediction models [7] to enhance clinical applicability.
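The AUC has a direct probabilistic reading: it is the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. A minimal sketch of that definition, using hypothetical labels and scores:

```python
def auc(labels, scores):
    """AUC as the proportion of positive/negative pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical data: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_auc = auc(labels, scores)
print(round(model_auc, 3))
```

The pairwise loop is quadratic in cohort size, which is acceptable for the cohort sizes typical of radiomics studies; rank-based formulas give the same value more efficiently.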

For implementation, patients are often stratified into high-risk and low-risk groups based on optimal Rad-score cutoffs determined from the training cohort using methods like maximally selected rank statistics or receiver operating characteristic (ROC) analysis [39]. This binary stratification facilitates clinical decision-making, such as identifying candidates for more aggressive treatments or intensified follow-up protocols [39] [42].
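One common ROC-based criterion for choosing such a cutoff is the Youden index (sensitivity + specificity − 1). The sketch below assumes that criterion; the cited studies may have used a different rule, and the labels and scores are hypothetical.

```python
def youden_cutoff(labels, scores):
    """Return the cutoff (score >= cutoff is positive) maximizing the Youden index."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cutoff)
        tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def stratify(score, cutoff):
    """Assign a risk group from a Rad-score and a training-derived cutoff."""
    return "high-risk" if score >= cutoff else "low-risk"

# Hypothetical training cohort: 1 = event, 0 = no event.
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2]
cutoff, j_index = youden_cutoff(labels, scores)
print(cutoff, j_index, stratify(0.82, cutoff))
```

The cutoff should be fixed on the training cohort and applied unchanged to validation cohorts; re-optimizing it on validation data inflates apparent performance.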

Comparative Performance of Rad-score Across Imaging Modalities

CT-based Rad-score for Endometrial Tumor Classification

Computed tomography has emerged as a valuable modality for Rad-score development in endometrial cancer, despite traditionally being considered less optimal than MRI for soft tissue characterization. A recent two-center study developed a CT-based Rad-score using explainable machine learning for differentiating malignant and benign endometrial tumors [13] [3] [14]. The investigation extracted 1,132 radiomic features from preoperative CT scans of 83 patients (46 malignant, 37 benign) and implemented six machine learning algorithms to determine the optimal predictive model [13] [3]. The Random Forest model emerged as superior, achieving a perfect training AUC of 1.00 and maintaining an impressive testing AUC of 0.96 [13] [3]. The sensitivity and specificity in the testing set reached 100% and 92.31%, respectively, demonstrating the strong discriminatory power of CT-based Rad-score for endometrial tumor classification [3].

The clinical utility of this CT-based approach was further validated through decision curve analysis, which indicated a higher net benefit compared to "treat all" or "treat none" strategies across most probability thresholds [3]. SHapley Additive exPlanations (SHAP) analysis identified the most influential radiomic features, revealing that 60% represented texture features while 40% were first-order statistical features [3]. Notably, 90% of these top-performing features originated from transformed images, particularly wavelet and Laplacian of Gaussian (LoG) filters, highlighting the importance of image transformation for enhancing feature expression in CT radiomics [3].

MRI-based Rad-score for Prognostic Prediction

Magnetic resonance imaging has been used more extensively than CT for Rad-score development in endometrial cancer, leveraging its superior soft tissue contrast for various prognostic applications. Ultrasound has also been applied to prognostic prediction: for disease-free survival (DFS) in endometrial cancer patients, an ultrasound-based Rad-score incorporating nine selected features achieved AUCs of 0.823 and 0.792 in training and validation cohorts, respectively [39]. When this Rad-score was combined with clinical parameters in a nomogram, the AUCs rose to 0.893 and 0.885 in training and validation cohorts [39]. Patients stratified into high-risk groups based on Rad-score showed significantly worse DFS, confirming the prognostic value of the radiomics approach [39].

For histological grading of endometrial carcinoma, an MRI-based Rad-score integrated with clinical indicators (CA125 and BMI) demonstrated exceptional performance in discriminating low-grade (G1-G2) from high-grade (G3) tumors [42]. The combined model achieved AUCs of 0.925 and 0.915 in training and test cohorts, respectively, significantly outperforming models based on ADC values alone (AUC 0.715 and 0.621) [42]. Similarly, for classifying molecular subtypes in endometrial cancer, a clinical-radiomics deep learning model based on MRI showed macro-average AUCs of 0.79 and 0.74 in internal and external validation cohorts, respectively [7]. This model exhibited particularly strong performance for POLEmut (AUC 0.79) and p53abn (AUC 0.78) subtypes [7].

Table 2: Performance Comparison of Rad-score Across Imaging Modalities and Clinical Applications

Imaging Modality | Clinical Application | Sample Size | Selected Features | Performance (AUC) | Reference
CT | Benign vs. malignant endometrial tumor classification | 83 patients (2 centers) | 20 features (60% texture, 40% first-order) | Training: 1.00; Testing: 0.96 | [13] [3]
Ultrasound | Disease-free survival prediction in endometrial cancer | 175 patients | 9 features from 1,130 extracted | Training: 0.823; Validation: 0.792 | [39]
Ultrasound | Disease-free survival prediction (combined nomogram) | 175 patients | Rad-score + clinical parameters | Training: 0.893; Validation: 0.885 | [39]
MRI | Histological grade discrimination (G1-G2 vs. G3) | 358 patients | Rad-score + CA125 + BMI | Training: 0.925; Testing: 0.915 | [42]
MRI | Molecular subtype classification | 526 patients (3 centers) | Clinical-radiomics DL model | Internal: 0.79; External: 0.74 | [7]
MRI | Benign vs. malignant endometrial lesion classification | 139 patients (2 centers) | 15 selected features | Training: 0.90; Testing: 0.85 | [40]

Cross-Cancer Validation of Rad-score Applications

The development and validation of Rad-score extend beyond endometrial cancer, demonstrating its broad applicability across oncology. In lung cancer diagnosis, a CT-based Rad-score developed for differentiating benign and malignant pulmonary nodules showed AUCs of 0.895 and 0.808 in training and validation cohorts, respectively [41]. When combined with clinical factors, the composite model performance improved further to AUCs of 0.927 and 0.854 [41]. Similarly, for assessing liver fibrosis in metabolic dysfunction-associated steatotic liver disease (MASLD), an MRI-based Rad-score incorporating 12 selected features demonstrated strong diagnostic performance with AUCs of 0.90 and 0.89 in training and testing sets, respectively [43].

These cross-cancer applications highlight several consistent advantages of the Rad-score approach. First, the integration of Rad-score with clinical parameters consistently enhances predictive performance compared to either alone [39] [41] [42]. Second, the stratification of patients into risk groups based on Rad-score cutoffs effectively identifies subgroups with significantly different clinical outcomes [39]. Third, the methodology for Rad-score development shows remarkable consistency across different cancer types and imaging modalities, supporting its standardization and broader implementation in oncologic imaging [39] [41] [43].

Visualization of Rad-score Development Workflow

The following diagram illustrates the standardized workflow for Rad-score development, from image acquisition through clinical application:

[Figure: workflow diagram. Patient Population & Image Acquisition → Tumor Segmentation & ROI Definition (manual ROI delineation in ITK-SNAP / 3D Slicer; inter-observer ICC analysis) → Feature Extraction (high-throughput calculation with PyRadiomics / A.K. software; first-order, texture, and shape features) → Feature Selection & Rad-score Calculation (LASSO regression; feature coefficients; Rad-score = β₀ + ΣβᵢXᵢ) → Model Validation (internal and external validation; AUC, calibration, DCA) → Clinical Application (nomogram development; risk stratification; clinical decision support)]

Rad-score Development and Validation Workflow

Table 3: Essential Research Tools for Rad-score Development

Tool Category | Specific Tool/Platform | Primary Function | Application Example
Image Analysis Software | ITK-SNAP | Manual tumor segmentation and ROI definition | Delineating endometrial tumors on CT/MRI [39] [40] [7]
Radiomics Feature Extraction | PyRadiomics (Python) | Standardized extraction of radiomic features | Calculating 1,130+ features from ROIs [13] [3] [7]
Radiomics Feature Extraction | Artificial Intelligence Kit (A.K.) | Commercial radiomics analysis platform | Feature extraction in ultrasound-based studies [39]
Statistical Analysis | R Software with "pROC", "rms" packages | Statistical analysis and model development | LASSO regression; ROC analysis [39]
Feature Selection | LASSO Regression | Dimensionality reduction and feature selection | Identifying most predictive features from thousands [39] [41] [40]
Model Validation | Cross-validation (10-fold) | Internal validation of model performance | Optimizing λ parameter in LASSO [39] [40]
Performance Evaluation | Receiver Operating Characteristic (ROC) Analysis | Quantifying model discrimination capability | Calculating AUC for diagnostic performance [39] [13] [3]
Clinical Utility Assessment | Decision Curve Analysis (DCA) | Evaluating clinical value of prediction models | Assessing net benefit across threshold probabilities [13] [40] [3]

The development of Rad-score through integration of selected radiomic features represents a standardized, reproducible methodology for enhancing diagnostic and prognostic predictions in endometrial cancer and beyond. The comparative analysis presented in this guide demonstrates consistently strong performance across multiple imaging modalities, with CT-based approaches achieving exceptional discrimination (AUC 0.96) for tumor classification [13] [3], MRI-based models successfully predicting histological grade (AUC 0.915) [42] and molecular subtypes (AUC 0.79) [7], and ultrasound-based methods effectively stratifying disease-free survival risk [39]. The standardized workflow encompassing image acquisition, tumor segmentation, feature extraction, selection via LASSO regression, and multi-tier validation provides researchers with a robust framework for Rad-score implementation.

The integration of Rad-score with clinical parameters consistently enhances predictive performance compared to either approach alone, supporting the development of combined nomograms for clinical decision support [39] [42]. The essential research tools outlined in this guide, particularly open-source platforms like PyRadiomics and ITK-SNAP, make Rad-score development accessible to the research community while promoting methodological standardization. As radiomics continues to evolve, the Rad-score methodology will likely play an increasingly important role in advancing precision medicine by transforming routine medical images into powerful quantitative biomarkers for personalized cancer care.

Performance Comparison of Machine Learning Models in Endometrial Tumor Classification

Table 1: Diagnostic Performance of Different Machine Learning Models in a Two-Center CT Radiomics Study

Machine Learning Model | Training AUROC | Testing AUROC | Testing Sensitivity | Testing Specificity | Key Strengths
Random Forest | 1.00 | 0.96 | 100% | 92.31% | Superior overall performance; resists overfitting [3]
XGBoost | Not Reported | >0.88 | Not Reported | Not Reported | High performance [3]
Support Vector Classifier | Not Reported | >0.88 | Not Reported | Not Reported | Competitive performance [3]
K-Nearest Neighbors | Not Reported | >0.88 | Not Reported | Not Reported | Competitive performance [3]
Logistic Regression | Not Reported | >0.88 | Not Reported | Not Reported | Competitive performance [3]
TabPFNv2 | Not Reported | >0.88 | Not Reported | Not Reported | Novel tabular foundation model [3]

The comparative analysis of six machine learning algorithms revealed that the Random Forest model was the optimal choice for classifying endometrial tumors using CT radiomics features. It demonstrated perfect discriminative ability on the training data and maintained exceptional performance on the independent testing set, underscoring its robustness and clinical applicability [3].

Table 2: Performance of Radiomics Models Across Multiple Clinical Prediction Tasks in Endometrial Cancer

Prediction Task | Imaging Modality | Model Type | Performance (AUC) | Key Contributors (via SHAP)
Molecular Subtype Classification [7] | MRI | Clinical-Radiomics DL Model | 0.79 (Macro-average) | Integrated model outperformed clinical or radiomics-only models
Deep Myometrial Invasion [44] | Multiparametric MRI | Radiomics with CNN | 0.960 | Model extracted features related to biological characteristics
Lymph-Vascular Space Invasion [44] | Multiparametric MRI | Radiomics with CNN | 0.924 | Model extracted features related to biological characteristics
Histologic Grade [44] | Multiparametric MRI | Radiomics with CNN | 0.937 | Model extracted features related to biological characteristics
Microsatellite Instability [45] | Multiparametric MRI | Hybrid Radiomics (HMRadSum) | 0.945 | Combined quantitative and deep learning features

Experimental Protocols and Workflow

The following diagram illustrates the standard experimental workflow for developing an explainable AI model in radiomics, from data collection to clinical application.

[Figure: workflow diagram. Data Collection & ROI Segmentation → Feature Extraction (PyRadiomics) → Model Training & Validation → SHAP Analysis → Clinical Interpretation & Application]

Data Collection and Preprocessing

The foundational two-center study included 83 patients with endometrial tumors (46 malignant, 37 benign). The data were split into a training set (n=59) and a testing set (n=24) so that performance could be assessed on held-out cases [3]. Regions of interest were manually segmented on pre-surgical CT scans, a critical step for subsequent feature extraction [3].

Feature Extraction and Selection

Using the PyRadiomics platform (version 3.0.1), researchers extracted 1,132 radiomic features from each CT scan [3]. The SHAP analysis revealed that among the top 20 most important features, 60% were texture features while 40% were first-order statistical features [3]. Notably, 90% of these significant features originated from transformed images (e.g., wavelet and LoG filtering), indicating that image transformations enhance the expression of critical texture information [3].

Model Development and Explainability Implementation

Six machine learning algorithms were implemented and compared: Logistic Regression, K-Nearest Neighbors, Support Vector Classifier, XGBoost, Random Forest, and TabPFNv2 [3]. The optimal Random Forest model was then interpreted using SHAP (SHapley Additive exPlanations) analysis, which quantified the contribution of each feature to the model's predictions [3]. This provided both local explanations for individual cases and global feature importance rankings.
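The idea behind Shapley-value attribution can be illustrated by brute-force enumeration on a tiny model. This sketch makes two loud simplifications: the "model" is a hypothetical three-feature formula rather than a Random Forest, and absent features are replaced by a single baseline value, whereas SHAP implementations average over background data and use faster model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature subsets (small n only)."""
    names = list(instance)
    n = len(names)

    def value(present):
        # Features outside `present` are set to their baseline value.
        x = {k: (instance[k] if k in present else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

# Hypothetical model with an interaction between texture and shape.
def predict(x):
    return 0.1 + 0.4 * x["texture"] + 0.2 * x["first_order"] + 0.3 * x["texture"] * x["shape"]

instance = {"texture": 1.0, "first_order": 1.0, "shape": 1.0}
baseline = {"texture": 0.0, "first_order": 0.0, "shape": 0.0}
phi = shapley_values(predict, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for this case and the baseline prediction, which is the additivity property that makes local explanations consistent with the model output. The interaction term is split equally between the two features involved.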

Research Reagent Solutions

Table 3: Essential Research Tools for Explainable AI in Radiomics

Tool Name | Category | Primary Function | Application Example
PyRadiomics (v3.0.1) | Feature Extraction | IBSI-compliant extraction of radiomic features from medical images | Extracted 1,132 features from CT scans for endometrial tumor classification [3]
SHAP | Model Interpretation | Explains output of machine learning models using Shapley values | Identified top 20 radiomic features and their contribution to Random Forest predictions [3]
ITK-SNAP | Image Segmentation | Semi-automatic segmentation of medical images in 2D and 3D | Manual delineation of regions of interest from pre-surgical CT scans [3] [7]
RadShap | Specialized Interpretation | Explains predictions of multi-ROI radiomic models | Highlighted contribution of individual regions in multi-lesion analysis [46]
LIFEx | Image Processing | Quantification of texture features from medical images | Used for delineating all tumor foci in PET scans for radiomic analysis [46]

SHAP Analysis and Feature Importance

The following diagram illustrates how SHAP analysis decomposes a model's prediction to highlight the contribution of individual features.

[Figure: illustrative SHAP decomposition. Starting from the base value (average prediction), a texture feature (+35%) and a shape feature (+22%) contribute positively and a first-order feature (-11%) contributes negatively to the model prediction of a malignant tumor.]

In the foundational CT radiomics study, SHAP analysis provided critical insights into the model's decision-making process. All selected radiomic features showed statistically significant associations with endometrial cancer classification (p < 0.05) [3]. The analysis revealed that texture features and features derived from transformed images were particularly important for accurate differentiation between malignant and benign tumors [3].

Similar approaches have been successfully applied to other medical AI applications. For instance, in a study predicting biological characteristics of endometrial cancer, SHAP analysis helped identify which imaging features were most predictive of myometrial invasion, lymph-vascular space invasion, histologic grade, and estrogen receptor status [44]. This capability to explain "why" a model makes a particular prediction is crucial for building clinical trust and facilitating adoption.

Decision curve analysis conducted alongside SHAP interpretations demonstrated that the explainable Random Forest model provided higher net benefit compared to the "treat all" or "treat none" strategies across a range of risk thresholds [3]. This provides compelling evidence for the clinical utility of explainable AI in identifying high-risk cases and reducing unnecessary interventions.
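Net benefit at a threshold probability pt is computed as TP/n − (FP/n) × pt/(1 − pt), with "treat none" fixed at zero. The sketch below applies that standard definition to hypothetical labels and predicted probabilities; it does not reproduce the cited study's curves.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of intervening when predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probabilities) if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probabilities) if p >= threshold and l == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Net benefit of intervening in every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical cohort: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.35]

nb_model = net_benefit(labels, probabilities, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
nb_none = 0.0
print(nb_model, nb_all, nb_none)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its curve lies above both reference strategies.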

Addressing Technical Challenges and Optimizing Model Performance

The field of radiomics offers significant potential for non-invasive cancer diagnosis and prognosis, including for endometrial cancer. However, the translation of radiomic signatures into clinical practice is hampered by challenges in reproducibility, particularly the variability introduced by human observers during image segmentation. Inter-observer reproducibility, which measures the agreement between different raters outlining the same region of interest, is a critical validation step that ensures radiomic features are robust and reliable. A key statistical tool for quantifying this reliability is the Intraclass Correlation Coefficient (ICC). Proper application and reporting of the ICC are fundamental to establishing confidence in radiomic measurements and ensuring that subsequent models are built upon a foundation of robust data [47] [48].

Understanding the Intraclass Correlation Coefficient (ICC)

ICC Fundamentals and Interpretation

The Intraclass Correlation Coefficient is a descriptive statistic used to measure how strongly units in the same group resemble each other. In the context of radiomics, it quantifies the reliability of measurements made by different raters assessing the same subject [49]. Unlike other correlation measures, the ICC operates on data structured as groups and is calculated from mean squares derived from analysis of variance. It represents the ratio of true variance (between subjects) to the total variance (true variance plus error variance) [47] [50].

The ICC typically takes a value between 0 and 1 (sample estimates can occasionally fall slightly below 0), and its interpretation follows general guidelines:

  • Values less than 0.5: Indicative of poor reliability
  • Values between 0.5 and 0.75: Indicative of moderate reliability
  • Values between 0.75 and 0.9: Indicative of good reliability
  • Values greater than 0.90: Indicative of excellent reliability [47] [51]

It is important to note that these are guidelines, and the acceptability of an ICC score also depends on established values in similar research literature [50].

Selecting the Appropriate ICC Form

A critical consideration is that multiple forms of ICC exist, each with distinct assumptions and interpretations. The appropriate form is selected based on three key parameters [47] [50]:

  • Model: This depends on whether the same set of raters evaluates all subjects and whether raters are considered random samples from a larger population (generalizable) or are the only raters of interest (fixed effects).
  • Type: This specifies whether the reliability is for measurements from a single rater or the mean of multiple raters.
  • Definition: This determines whether the analysis should measure absolute agreement (including systematic errors) or consistency (where systematic errors are canceled out) between raters.

Table 1: Guide to Selecting the Appropriate ICC Form Based on Research Design

Selection Factor | Options | Appropriate Use Case
Statistical Model | One-way Random Effects | Each subject is rated by a different, randomly selected set of raters [47].
Statistical Model | Two-way Random Effects | Raters are randomly selected from a larger population; results are generalizable [47].
Statistical Model | Two-way Mixed Effects | Raters are the only ones of interest; results are not generalizable [47].
Type of Measure | Single Rater | Reliability applies to a context where a single rater's measurement will be used [47] [50].
Type of Measure | Average of Multiple Raters | Reliability applies when the average score of multiple raters will be used [47] [50].
Definition of Relationship | Absolute Agreement | Accounts for both correlation and agreement in the raters' scores; sensitive to systematic bias [47] [50].
Definition of Relationship | Consistency | Assesses if raters are consistent in their scoring pattern relative to each other; less sensitive to systematic bias [47] [50].

The selection can be guided by answering four key questions about the research design, as shown in the workflow below.

ICC form selection workflow:

  • Q1 (model): Do the same raters evaluate all subjects? If no, use the One-Way Random model. If yes, continue to Q2.
  • Q2 (model): Are the raters a random sample from a larger population? If yes, use the Two-Way Random model. If no, use the Two-Way Mixed model.
  • Q3 (type): Is reliability needed for a single rater or for the mean of raters? Single rater: Single Measures. Mean of k raters: Average Measures.
  • Q4 (definition): Is absolute agreement or consistency required? Choose Absolute Agreement or Consistency accordingly.
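For the two-way, single-rater case, the absolute-agreement and consistency forms differ only in whether between-rater variance enters the denominator. The sketch below computes both from two-way ANOVA mean squares; the function name and the ratings matrix are illustrative assumptions, and in practice a statistics package would also report confidence intervals.

```python
def icc_two_way_single(ratings):
    """Return (absolute-agreement ICC, consistency ICC) for single-rater reliability.

    `ratings` is a list of rows, one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater offsets count as disagreement.
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    # Consistency: systematic rater offsets are ignored.
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency

# Illustrative values of one feature: 6 subjects (rows) x 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_agreement, icc_consistency = icc_two_way_single(ratings)
print(round(icc_agreement, 3), round(icc_consistency, 3))
```

In this example the raters rank subjects similarly but differ systematically in level, so the consistency form is much higher than the absolute-agreement form. This is why the chosen form should always be reported alongside the ICC value.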

Experimental Protocols for Assessing Inter-observer Reproducibility

Standardized Workflow for Reproducibility Analysis

A robust experimental protocol is essential for a valid assessment of inter-observer reproducibility. The workflow below outlines the key steps involved, from initial data collection to the final calculation of the ICC, integrating best practices from recent radiomics studies [52] [40].

1. Image Acquisition & Cohort Selection
  • Use multi-center/scanner data if possible
  • Document CT parameters (kVp, mAs, slice thickness)
  • Ensure consistent image pre-processing

2. Multi-rater Segmentation
  • Multiple trained raters (≥3 recommended)
  • Manual or semi-automatic segmentation
  • Blind to clinical/pathological data

3. Feature Extraction
  • Use standardized software (e.g., PyRadiomics)
  • Extract shape, first-order, and texture features
  • Adhere to IBSI guidelines

4. Feature Stability Assessment
  • Identify robust vs. poorly reproducible features
  • Exclude features with ICC < 0.75-0.80
  • Focus model building on stable features

5. Intraclass Correlation (ICC) Calculation
  • Select correct ICC form (see Table 1)
  • Report ICC estimate and 95% confidence interval
  • Use software: Pingouin, R, SPSS

Key Methodological Considerations

Segmentation and Rater Variability: Manual segmentation by multiple experts is a common source of variability. Studies have shown that more than three raters are often needed to capture the full distribution of plausible segmentations [52]. To ensure consistency, researchers should assess inter-rater agreement using metrics like the Dice Similarity Coefficient (DSC) and the Mean Distance to Agreement (MDA). A DSC ≥ 0.8 and MDA ≤ 3 mm are commonly used thresholds to verify acceptable agreement among physicians [11].
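The Dice Similarity Coefficient is twice the overlap of two segmentations divided by the sum of their sizes. A minimal sketch, assuming each mask is represented as a set of voxel coordinates and using two hypothetical 2D contours offset by one voxel:

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient for two masks given as sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical 10x10 ROIs from two raters, shifted by one voxel along x.
rater_1 = {(x, y) for x in range(0, 10) for y in range(10)}
rater_2 = {(x, y) for x in range(1, 11) for y in range(10)}

dsc = dice(rater_1, rater_2)
print(dsc, dsc >= 0.8)
```

Because DSC is dominated by volume overlap, it is usually reported together with a boundary-distance measure such as the MDA mentioned above.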

Feature Robustness Screening: A crucial step is to screen radiomic features for their robustness before model building. A widely adopted method is to calculate the ICC for each feature across the multiple segmentations and retain only those features that demonstrate high reproducibility. A common threshold is ICC ≥ 0.75 or 0.80 for a feature to be considered sufficiently robust [52] [40]. For instance, one study extracted 396 distinct MRI radiomic features and retained only those with an ICC > 0.75 for subsequent model development, ensuring the analysis was based on stable and reproducible signals [40].

Quantitative Data on Radiomics Feature Reproducibility

Reproducibility Across Tumor Types and Segmentation Methods

Empirical data from multiple studies provides critical insight into which radiomic features are consistently robust against inter-observer variability. The following table synthesizes findings from key investigations that systematically evaluated feature reproducibility.

Table 2: Reproducibility of Radiomics Features Under Inter-observer Variability

Study & Dataset | Robust Features (High ICC) | Non-robust Features (Low ICC) | Key Findings on Reproducibility
Haarburger et al. [52]; Lung (LIDC), Kidney (KiTS), Liver (LiTS) CT | First-Order Statistics: Energy; Total Energy; Minimum Intensity | Texture Features: Large Dependence Emphasis (from GLDM); Large Area Low Gray Level Emphasis (from GLSZM) | Identified consistent subsets of robust features across 3 datasets and 25 automated segmentations. Demonstrated that some features are consistently unstable and prone to poor reproducibility.
Radiomics & Endometrial Cancer (MRI) [40] | Features with ICC > 0.75: Select Histogram features; Certain GLCM and GLRLM features | Features with ICC < 0.75: Excluded from model construction | Used ICC > 0.75 as a filter for feature selection in a predictive model for classifying endometrial lesions. The final model showed high diagnostic performance (AUC=0.90 in training, 0.85 in test).
Deep Learning Harmonization (Abdominal CT) [53] | Post-Harmonization: Vessel features (increased from 14% to 69% reproducible); Spleen, Kidney, Muscle, Liver features | Pre-Harmonization: Majority of features showed low reproducibility | Deep learning-based image harmonization significantly improved feature reproducibility. Reproducible features increased from 18% to 65% after harmonization (patient-based analysis).

Impact of Technical Factors on Reproducibility

The reproducibility of radiomic features is sensitive to several technical factors beyond human segmentation. A review of CT radiomics highlighted that parameters affecting image noise—such as kilovoltage (kVp), tube current (mAs), slice thickness, and reconstruction algorithm—can significantly impact feature stability [48]. This underscores the necessity of standardizing imaging protocols or developing harmonization techniques to improve robustness. Furthermore, features can be categorized based on their stability in relation to these parameters, providing a guideline for feature selection in clinical studies.

Table 3: Key Research Reagent Solutions for Radiomics Reproducibility Studies

Tool Category | Specific Tool / Resource | Function and Application
Radiomics Feature Extraction | PyRadiomics (Python) [52] [11] | Open-source library for the extraction of a large panel of standardized radiomic features from medical images.
Statistical Analysis & ICC Calculation | Pingouin (Python) [50]; R Statistical Software [51]; SPSS [47] | Statistical packages that provide functions for calculating various forms of ICC, along with confidence intervals.
Image Segmentation | ITK-SNAP [40]; MIM Software [11] | Software platforms used for manual, semi-automatic, or automatic delineation of regions of interest (ROIs).
Standardization Initiative | Image Biomarker Standardization Initiative (IBSI) [32] [48] | An international consortium providing standardized guidelines for radiomic feature extraction and calculation to improve reproducibility.
Deep Learning Harmonization | Probabilistic U-Net / PHiSeg [52] | Deep learning algorithms used to generate multiple plausible segmentations or to harmonize images from different scanners, improving feature robustness.

Ensuring the robustness of radiomic features through rigorous validation of inter-observer reproducibility is a non-negotiable step in developing reliable models for endometrial tumor classification. The Intraclass Correlation Coefficient serves as a fundamental metric for this validation. The process requires a meticulous approach: careful selection of the ICC form based on the experimental design, implementation of a standardized workflow involving multiple raters, and systematic screening of features to retain only those with high reproducibility (typically ICC ≥ 0.75). Empirical evidence shows that while a significant proportion of radiomic features are sensitive to inter-observer variability, a consistent subset of robust features exists. Building models upon these stable features, while accounting for technical factors like CT acquisition parameters, is the most viable path toward developing generalizable and clinically applicable radiomic signatures for endometrial cancer and beyond.

In the field of computed tomography (CT) radiomics for endometrial tumor classification, the high dimensionality of feature data poses a significant challenge for developing robust machine learning models. The number of extracted radiomic features often vastly exceeds the number of patient samples, creating a perfect environment for overfitting, where models perform well on training data but fail to generalize to new data. This article provides a comprehensive comparison of strategies to mitigate this risk, focusing on feature selection techniques and cross-validation protocols, with specific application to endometrial cancer (EC) research. We present experimental data from recent studies to guide researchers, scientists, and drug development professionals in implementing validated approaches for performance validation.

Understanding the Challenge: Feature Dimensionality in Radiomics

Radiomics analysis typically extracts hundreds to thousands of quantitative features from medical images, transforming them into mineable data for cancer detection, diagnosis, and treatment response prediction [54]. In endometrial cancer research, one study extracted 1,132 radiomic features from pre-surgical CT scans of 83 EC patients to differentiate malignant from benign conditions [3]. This high feature count relative to the small sample size creates the "curse of dimensionality," necessitating robust dimensionality reduction techniques to prevent model overfitting.

The challenge extends beyond mere feature count to the inherent redundancy and correlation among radiomic features. Without proper feature selection, machine learning models may learn noise and dataset-specific variations rather than biologically meaningful patterns, compromising their clinical utility and generalizability.
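One simple way to reduce this redundancy before any model-based selection is a pairwise correlation filter. The sketch below is an illustration only: the 0.9 threshold, the use of Pearson correlation, and the function name `drop_correlated` are assumptions rather than choices reported in the cited studies, and it assumes constant features have already been removed.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is <= threshold.

    X has shape (n_patients, n_features); names lists the column names.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Toy matrix: feature "b" is an exact linear function of "a".
X = np.array([
    [1.0, 3.0, 2.0],
    [2.0, 5.0, -1.0],
    [3.0, 7.0, 3.0],
    [4.0, 9.0, 0.0],
    [5.0, 11.0, 1.0],
    [6.0, 13.0, -2.0],
])
print(drop_correlated(X, ["a", "b", "c"]))
```

Because the filter is greedy, the result depends on column order; ranking features by a relevance score first is a common refinement.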

Comparative Analysis of Feature Selection Strategies

Feature selection methods are primarily used in radiomics to eliminate redundant features and identify clinically relevant ones, thereby enhancing model interpretability and performance [55]. The table below summarizes the performance of various feature selection methods across multiple radiomics studies.

Table 1: Performance Comparison of Feature Selection Methods in Radiomics

| Feature Selection Method | Study Context | Performance Metrics | Key Findings |
| --- | --- | --- | --- |
| LASSO Regression | Gynecologic cancer HT prediction [56] | AUC: 0.927 (combined model) | Effectively selected features from clinical and MRI-radiomics data; superior to CT-only models |
| Random Forest Feature Importance | Gynecologic cancer HT prediction [56] | AUC: 0.927 (combined model) | Complemented LASSO in selecting predictive bone marrow radiomic features |
| mRMR + LASSO | Lung adenocarcinoma classification [57] | AUC: 0.929 (GradientBoosting model) | Identified 8 discriminative radiomics features; effective two-stage selection |
| Hybrid Feature Selection | Breast cancer NAC response [54] | Accuracy: 0.88 | Matrix rank theorem removed redundancy; genetic algorithm optimized feature set |
| Extremely Randomized Trees (ET) | Multi-study benchmark [55] | Highest average AUC rank (8.0) | Top performer across 50 radiomic datasets; robust feature importance ranking |
| Non-Negative Matrix Factorization | Multi-study benchmark [55] | Best projection method (rank: 9.8) | Occasionally outperformed selection methods on individual datasets |

Feature Selection vs. Feature Projection

While feature selection chooses a subset of original features, feature projection methods create new features by recombining original ones. A comprehensive benchmarking study on 50 radiomic datasets revealed that feature selection methods, particularly Extremely Randomized Trees (ET) and LASSO, achieved the highest average performance [55]. Although projection methods like Non-Negative Matrix Factorization (NMF) occasionally outperformed selection methods on individual datasets, the study concluded that selection methods should remain the primary approach in radiomics [55].

Cross-Validation: A Critical Companion Technique

Cross-validation (CV) provides a robust framework for estimating model performance and tuning hyperparameters while mitigating overfitting. It is particularly crucial in radiomics studies with limited sample sizes.

Standard Protocols and Applications

Nested cross-validation represents the gold standard approach, featuring an inner loop for model selection and hyperparameter tuning and an outer loop for performance estimation. A large benchmarking study implemented a "rigorous nested cross-validation strategy" with 10 repeats of 5-fold CV to evaluate models across 50 datasets [55].
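A minimal sketch of nested cross-validation with scikit-learn follows. It is not the benchmark's code: the synthetic data, the univariate filter (`SelectKBest`), the parameter grid, and the helper name `nested_cv_auc` are all assumptions chosen for illustration, and it runs a single repeat rather than ten. The point it shows is that scaling and feature selection sit inside the pipeline, so they are refit on each training fold and never see the held-out fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def nested_cv_auc(X, y, seed=0):
    """Return the outer-fold AUCs of a nested 5x5-fold cross-validation."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Hypothetical grid: number of retained features and regularization strength.
    param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0]}
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    # Inner loop: hyperparameter tuning. Outer loop: performance estimation.
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=inner)
    return cross_val_score(search, X, y, scoring="roc_auc", cv=outer)

# Synthetic stand-in for a radiomics matrix: few patients, many features.
X, y = make_classification(n_samples=80, n_features=200, n_informative=5,
                           random_state=0)
scores = nested_cv_auc(X, y)
print(f"outer-fold AUC: mean={scores.mean():.3f}, sd={scores.std():.3f}")
```

Repeating the outer loop with different seeds and averaging, as in the repeated design described above, reduces the dependence of the estimate on one particular split.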

In endometrial cancer research, one study employed 5-fold cross-validation to evaluate a random forest model for diagnosing EC, which achieved a perfect AUROC of 1.00 during training and maintained 0.96 during testing [3]. Similarly, another study on predicting hematologic toxicity in gynecologic cancer patients used 5-fold CV with AUC as the primary metric for feature selection and hyperparameter tuning [56].

Independent Validation

Beyond cross-validation, independent external validation represents the strongest evidence of model generalizability. A study on microsatellite instability (MSI) prediction in endometrial cancer demonstrated this approach, where models trained on 222 patients from one center were tested on 70 patients from a second center, achieving an AUC of 0.938 [58]. This multi-center validation approach provides confidence that the model can generalize beyond the training data distribution.

Experimental Protocols in Practice

Detailed Workflow for Radiomics Analysis

The following diagram illustrates a standardized radiomics workflow integrating both feature selection and cross-validation, compiled from multiple studies:

[Diagram: patient cohort and CT image acquisition → tumor segmentation (manual/ROI delineation) → radiomic feature extraction (1,100+ features typical) → feature selection (LASSO regression, random forest importance, mRMR, hybrid methods) → model development → cross-validation (5-fold typical), nested CV (optimal), or external validation (multi-center) → validated predictive model]

Diagram 1: Radiomics Analysis Workflow

Implementation Protocols

LASSO Regression Protocol:

  • Applied logistic regression with L1 penalty to promote sparsity
  • Tuned via cross-validation to determine optimal regularization parameter [56]
  • Combined with Mann-Whitney U test for preliminary filtering in some studies [58]
  • Effectively handled high-dimensional data (e.g., 2,675 CT radiomic features) [56]
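The LASSO protocol above can be sketched as an L1-penalized logistic regression whose regularization strength is chosen by cross-validation. This is an illustration under stated assumptions, not the cited studies' implementation: the synthetic data, the candidate `Cs` values, and the helper name `l1_selected_features` are hypothetical, and the optional Mann-Whitney pre-filter is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def l1_selected_features(X, y, feature_names):
    """Fit L1-penalized logistic regression (C tuned by 5-fold CV on AUC)
    and return the names of features with non-zero coefficients."""
    model = make_pipeline(
        StandardScaler(),  # L1 penalties are scale-sensitive
        LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0], penalty="l1",
                             solver="liblinear", cv=5, scoring="roc_auc"),
    )
    model.fit(X, y)
    coefs = model[-1].coef_.ravel()
    return [name for name, c in zip(feature_names, coefs) if c != 0.0]

# Synthetic example: only the first two of 50 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
names = [f"f{i}" for i in range(X.shape[1])]
selected = l1_selected_features(X, y, names)
print(len(selected), "features kept")
```

When the selected features feed a downstream classifier, this step should be run inside each training fold rather than once on the full dataset, otherwise the performance estimate is optimistically biased.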

Random Forest Feature Importance Protocol:

  • Utilized mean decrease in Gini impurity for feature ranking [56]
  • Often combined with other methods (e.g., top 30 features selected) [56]
  • Particularly effective for capturing complexity and diversity of radiomic data [3]

Hybrid Feature Selection Protocol:

  • Implemented two-phase approach: filter-based strategy followed by wrapper method
  • Used matrix rank theorem to remove dependent and redundant features
  • Applied genetic algorithm coupled with SVM classifier to determine optimal features [54]

The Scientist's Toolkit: Essential Research Reagents

Table 2: Key Research Reagent Solutions for CT Radiomics in Endometrial Cancer

| Tool/Category | Specific Examples | Function/Application | Evidence |
| --- | --- | --- | --- |
| Radiomics Software | PyRadiomics, 3D Slicer | Standardized feature extraction from medical images | [56] [3] [58] |
| Feature Selection Algorithms | LASSO, Random Forest, mRMR, Genetic Algorithms | Dimensionality reduction; identifies most predictive features | [56] [55] [54] |
| Machine Learning Classifiers | Random Forest, SVM, XGBoost, Logistic Regression | Model development for classification tasks | [56] [3] [58] |
| Validation Frameworks | 5-fold CV, Nested CV, External Validation | Performance estimation; mitigates overfitting | [56] [55] [3] |
| Image Preprocessing Tools | N4 bias field correction, Z-score normalization, Resampling | Standardizes images before feature extraction | [58] |

The successful application of CT radiomics for endometrial tumor classification requires meticulous attention to overfitting mitigation through strategic feature selection and rigorous validation. Based on current evidence, feature selection methods, particularly LASSO and tree-based importance ranking, outperform feature projection approaches on average, although projection methods occasionally do better on individual datasets [55]. Furthermore, random forest classifiers have performed strongly in endometrial cancer applications, handling the complexity of radiomic data while limiting overfitting [3].

The integration of robust feature selection with comprehensive cross-validation, preferably including external multi-center validation, represents the gold standard for developing clinically applicable models. As radiomics continues to evolve toward clinical implementation, these methodological considerations will be paramount for ensuring that predictive models generalize beyond their training cohorts to deliver meaningful impact in endometrial cancer diagnosis and treatment.

Radiomics, the high-throughput extraction of quantitative features from medical images, has emerged as a powerful tool for tumor characterization, prognosis prediction, and treatment response assessment [59] [60]. In endometrial cancer research, radiomics shows particular promise for non-invasive molecular subtyping, differentiation of malignant and benign tumors, and prediction of critical biomarkers such as microsatellite instability [7] [3] [61]. However, the clinical translation of radiomics faces a significant barrier: the variability of quantitative features introduced by differences in computed tomography (CT) scanners, acquisition parameters, and reconstruction protocols across multiple centers [59] [62] [60].

This variability, often termed "batch effects" or "scanner effects," compromises the reproducibility and generalizability of radiomic models [60]. Features sensitive to acquisition parameters may exhibit greater variability due to technical factors than actual biological differences, potentially leading to spurious findings and models that fail to validate on external datasets [59] [62]. Consequently, harmonization techniques have become an essential preprocessing step in multi-center radiomics studies to ensure that observed differences reflect true biological signals rather than technical artifacts [62] [60] [63].

Fundamentals of Harmonization Techniques

Harmonization methods can be broadly categorized into image-based and feature-based approaches. Image-based methods operate directly on the raw images, aiming to standardize their technical characteristics, while feature-based methods adjust the extracted feature values to remove unwanted technical variability [59] [64].

Image-based harmonization includes techniques such as generative adversarial networks (GANs) and resolution harmonization methodologies. GANs learn complex transformations to translate images from one scanner protocol to another, potentially improving visual consistency [59]. Resolution harmonization methods leverage multiple reconstructions from the same acquisition to estimate and enhance spatial resolution while controlling noise [64]. These methods typically require access to raw images and can be computationally intensive.

Feature-based harmonization, exemplified by ComBat and its derivatives, operates on already-extracted radiomic features [62]. These methods model and remove batch effects statistically, requiring only the feature values and corresponding batch information. They are computationally efficient and have demonstrated effectiveness in multi-center studies [60] [63].

Table: Comparison of Harmonization Technique Categories

| Approach | Method | Operation Level | Data Requirements | Key Advantages |
| --- | --- | --- | --- | --- |
| Image-based | GANs | Image | Raw images | Potentially improves visual quality |
| Image-based | Resolution Harmonization | Image | Multiple reconstructions | Reduces spatial resolution variability |
| Feature-based | ComBat | Features | Feature matrix | Computationally efficient, widely validated |
| Feature-based | CovBat | Features | Feature matrix | Extends ComBat by correcting covariance |

Deep Dive into Key Harmonization Methods

ComBat Harmonization: Theory and Implementation

ComBat, originally developed for genomics, has become one of the most widely used harmonization methods in radiomics [62]. The method employs an empirical Bayes framework to estimate and remove additive and multiplicative batch effects that systematically affect feature measurements across different scanners or protocols [62] [60].

The ComBat model assumes that each measured feature value \( y_{ij} \) for batch \( i \) and measurement \( j \) can be expressed as:

\[ y_{ij} = \alpha + \gamma_i + \delta_i \epsilon_{ij} \]

where \( \alpha \) represents the overall feature mean, \( \gamma_i \) is the additive batch effect, \( \delta_i \) is the multiplicative batch effect, and \( \epsilon_{ij} \) is the error term [62]. After estimating these parameters, ComBat applies the following correction:

\[ y_{ij}^{\text{harmonized}} = \frac{y_{ij} - \hat{\alpha} - \hat{\gamma}_i}{\hat{\delta}_i} + \hat{\alpha} \]

This transformation effectively aligns the mean and variance of feature distributions across different batches [62].
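The location-scale correction described above can be illustrated with a deliberately simplified sketch. It is not full ComBat: it omits the empirical Bayes shrinkage of the batch parameters and any biological covariates, and it estimates the overall mean as the plain grand mean. The function name `location_scale_harmonize` and the toy values are hypothetical. Each batch needs at least two samples with non-zero variance.

```python
import numpy as np

def location_scale_harmonize(features, batch):
    """Align per-batch mean and spread of each feature (no empirical Bayes).

    features: array of shape (n_samples, n_features)
    batch:    array of shape (n_samples,) with a scanner/protocol label
    """
    x = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    alpha = x.mean(axis=0)  # overall feature mean (alpha-hat)

    # Pooled within-batch standard deviation, used as the common error scale.
    resid = x.copy()
    for b in labels:
        mask = batch == b
        resid[mask] -= x[mask].mean(axis=0)
    sigma = np.sqrt((resid ** 2).sum(axis=0) / (x.shape[0] - labels.size))

    out = np.empty_like(x)
    for b in labels:
        mask = batch == b
        gamma = x[mask].mean(axis=0) - alpha          # additive effect
        delta = x[mask].std(axis=0, ddof=1) / sigma   # multiplicative effect
        out[mask] = (x[mask] - alpha - gamma) / delta + alpha
    return out

# Toy example: one feature measured on two scanners with different offset and scale.
values = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [14.0], [18.0], [22.0]])
scanner = np.array(["A"] * 4 + ["B"] * 4)
harmonized = location_scale_harmonize(values, scanner)
print(harmonized.ravel().round(2))
```

After the correction both scanners share the same mean and standard deviation for the feature. In practice, established ComBat implementations add shrinkage and covariate preservation, which matter when batches are small or when batch composition is confounded with biology.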

A critical consideration in ComBat implementation is that the transformation is data-driven and specific to the input population and tissue type [62]. For instance, the appropriate transformation for SUVmax in liver tissue differed significantly from that for breast tumors in one study, highlighting the importance of tissue-specific harmonization [62]. ComBat can be applied either to align distributions to a virtual reference site or to a specific chosen reference site, with the latter often preferred for interpretability [62].

Advanced Methods: CovBat and Image-based Approaches

CovBat represents an advancement of ComBat that additionally corrects for covariance differences between batches [60]. In a multi-center, multi-device study comparing eight CT models from four manufacturers, CovBat demonstrated superior performance to ComBat in preserving biological signals while removing technical variability, particularly for features with high covariance dependence [60].

Image-based harmonization methods offer an alternative approach. Generative Adversarial Networks (GANs) can learn complex mappings between different scanner domains, potentially improving both quantitative feature stability and visual image quality [59]. However, one phantom study found that while GANs qualitatively enhanced image harmonization, they provided inferior statistical improvements in feature stability compared to ComBat, actually reducing classification performance in some cases [59].

Resolution harmonization methodologies specifically address spatial resolution variability by modeling the computed tomography process and employing deconvolution techniques [64]. These methods leverage multiple reconstructions from the same acquisition to estimate spatially variant point spread functions and generate images with enhanced resolution and reduced spatial variability [64].

Experimental Comparison of Harmonization Performance

Quantitative Performance Metrics

Recent studies have provided comprehensive quantitative comparisons of harmonization techniques across various metrics including reproducibility, feature stability, and classification performance.

Table: Experimental Performance Comparison of Harmonization Methods

| Study | Method | Reproducibility Improvement | Stability Change | Classification AUC Impact |
| --- | --- | --- | --- | --- |
| CT Phantom Study [59] | ComBat | +31.58% | +5.24% | +15.19% |
| CT Phantom Study [59] | GAN | +8.00% | -4.33% | -2.56% |
| Multi-center CT Study [60] | ComBat | Significant improvement | 43.8% features stable | Enhanced model performance |
| Multi-center CT Study [60] | CovBat | Superior to ComBat | 53.1% features stable | Best model performance |
| NSCLC Clinical Study [63] | ComBat | 76% to 0% protocol-dependent features | N/A | N/A |

In a dedicated phantom study evaluating harmonization for liver lesion classification, ComBat demonstrated substantial improvements across all metrics, while GAN-based harmonization showed mixed results—improving reproducibility but reducing stability and classification performance [59]. This highlights that qualitative image improvements do not necessarily translate to better quantitative feature performance.

A multi-center, multi-device study further revealed that CovBat outperformed ComBat, with 53.1% of features achieving stability after CovBat harmonization compared to 43.8% with ComBat [60]. The study also found that feature categories responded differently to harmonization, with first-order and texture features showing varying sensitivity to batch effects [60].

Impact on Machine Learning Model Performance

The ultimate test of harmonization effectiveness lies in its impact on downstream machine learning tasks. In endometrial cancer research, one study developed a CT radiomics-based model for differentiating malignant and benign tumors that achieved an AUC of 0.96 using Random Forest classifiers [3]. While this study did not explicitly report harmonization methods, it underscores the potential of well-curated multi-center radiomics.

Harmonization becomes particularly crucial for predicting molecular subtypes in endometrial cancer, where subtle radiographic patterns may correlate with genomic profiles [7] [61]. One study successfully predicted microsatellite instability and high tumor mutation burden from contrast-enhanced CT in endometrial cancers with AUCs of 0.78 and 0.87, respectively [61], demonstrating the potential of radiomics as a non-invasive biomarker, an application that would depend heavily on effective multi-center harmonization.

Practical Implementation Framework

Workflow for Multi-center Harmonization

The following diagram illustrates a comprehensive workflow for implementing harmonization in multi-center radiomics studies:

[Diagram: Multi-center Radiomics Harmonization Workflow. Stages: 1. Data Collection, 2. Feature Extraction, 3. Harmonization, 4. Validation. Flow: multi-center CT image acquisition and protocol documentation → tumor segmentation → radiomic feature extraction → IBSI standard compliance → batch effect assessment → method selection (ComBat/CovBat/image-based) → apply harmonization → quality metrics evaluation → biological signal preservation → model performance validation]

Table: Essential Tools for Radiomics Harmonization Research

| Tool Name | Type | Function | Implementation |
| --- | --- | --- | --- |
| ComBat Harmonization | Software Package | Removes batch effects from feature data | R (sva package), Python |
| CovBat | Software Package | Extends ComBat with covariance correction | R |
| PyRadiomics | Feature Extraction | Standardized radiomic feature extraction | Python |
| LIFEx | Feature Extraction | IBSI-compliant feature extraction | Standalone software |
| ITK-SNAP | Image Processing | Manual segmentation of regions of interest | Standalone software |
| EARL Phantom | Quality Assurance | Standardized phantom for cross-scanner validation | Physical phantom |

Application to Endometrial Cancer Research

In endometrial cancer radiomics, effective harmonization enables more reliable differentiation of molecular subtypes such as POLE-mutated, MMR-deficient, p53-abnormal, and no specific molecular profile (NSMP) tumors from CT imaging [7]. One multicenter study developed a clinical-radiomics deep learning model that achieved macro-average AUCs of 0.79 in internal validation and 0.74 in external validation for classifying these subtypes [7], demonstrating the potential of properly harmonized multi-center data.

The choice of harmonization technique should be guided by specific research objectives. For endometrial cancer studies focusing on predictive model development across multiple institutions, ComBat or CovBat provide practical feature-based solutions [60] [63]. When working with images from scanners with substantially different resolution characteristics, image-based harmonization may be necessary as a preprocessing step [64].

Methodological quality assessment using tools like RQS (Radiomics Quality Score) and METRICS (METhodological RadiomICs Score) has shown that endometrial cancer radiomics research generally exhibits good methodological quality with recent improvements, though standardization remains essential for clinical translation [65].

Harmonization techniques are indispensable for robust multi-center radiomics research in endometrial cancer. The experimental evidence strongly supports ComBat and its advanced variant CovBat as effective methods for reducing scanner-related variability while preserving biological signals. The choice between feature-based and image-based harmonization should be guided by specific research needs, data availability, and computational resources.

Future developments in harmonization will likely include deep learning approaches that more effectively separate biological from technical variability, standardized phantom-based calibration protocols, and adaptive methods that continuously learn from incoming multi-center data. As endometrial cancer radiomics progresses toward clinical application, rigorous harmonization will remain essential for developing reliable, generalizable models that can genuinely impact patient care.

Endometrial cancer (EC) remains the most prevalent gynecologic malignancy in high-income countries, with rising incidence and mortality rates driving the need for more efficient and accurate diagnostic workflows [8] [11]. The integration of artificial intelligence (AI) and radiomics into clinical practice presents unprecedented opportunities to enhance diagnostic accuracy while optimizing resource utilization. Radiomics—the high-throughput extraction of quantitative features from medical images—leverages routine imaging data to uncover tumor characteristics imperceptible to the human eye, creating new pathways for non-invasive diagnosis and prognosis [66] [3]. This transformation is particularly relevant in computed tomography (CT) radiomics, where automation can standardize interpretation while expert oversight ensures nuanced clinical application.

The fundamental challenge in modern healthcare workflows lies in balancing technological automation with human expertise. As the Office of the National Coordinator for Health Information Technology (ONC) emphasizes, effective automation must "add value, not burden," requiring thoughtful integration that complements rather than replaces clinical judgment [67]. This balance is critical in endometrial tumor classification, where diagnostic decisions directly impact surgical planning and treatment strategies. This article examines the current landscape of CT radiomics for endometrial tumor classification, evaluating performance metrics across multiple studies while providing a framework for implementing these technologies within clinically validated workflows that appropriately balance automation with expert oversight.

Performance Comparison of CT Radiomics Models for Endometrial Tumor Classification

Diagnostic Performance Across Multiple Studies

Recent research demonstrates significant advances in CT radiomics for endometrial cancer, with several studies reporting strong performance metrics for various classification tasks. The table below summarizes key findings from multiple investigations, highlighting the efficacy of different machine learning approaches:

Table 1: Performance Comparison of CT Radiomics Models in Endometrial Cancer Classification

| Study Focus | Best Performing Model | Sample Size | Performance Metrics | Reference |
| --- | --- | --- | --- | --- |
| Benign vs. Malignant Classification | Random Forest | 83 patients (46 malignant, 37 benign) | Testing AUROC: 0.96, Sensitivity: 100%, Specificity: 92.31% | [3] |
| Endometrial Cancer Detection | ViTNet (Hybrid ResNet50 + Vision Transformer) | 300 patients (22-85 years) | Accuracy: 86.99% for CT images | [8] |
| Recurrence Prediction | Multiple Models (LASSO-Cox, CoxBoost, Random Forest) | 81 EC cases | Test set AUC: 0.86-0.90, Sensitivity: 0.89-1.00, Specificity: 0.73-0.90 | [11] |

The performance variations across studies reflect differences in sample characteristics, imaging protocols, and model architectures. Notably, the Random Forest model demonstrated exceptional performance in differentiating malignant from benign endometrial tumors, achieving perfect sensitivity while maintaining high specificity [3]. This balanced performance profile is particularly valuable in clinical settings where both false negatives and false positives carry significant consequences.

Comparison with MRI-Based Approaches

While CT radiomics shows promising results, magnetic resonance imaging (MRI) continues to offer advantages in certain applications due to its superior soft tissue contrast. A comparative deep learning study evaluating both modalities reported MRI accuracy of 90.24% versus CT accuracy of 86.99% in endometrial cancer detection [8]. This performance differential reflects MRI's enhanced capability for visualizing anatomical details relevant to endometrial characterization. However, CT remains widely used for initial staging due to faster acquisition times, broader availability, and utility in assessing extrauterine spread and lymphadenopathy [8] [11]. These practical considerations make CT radiomics particularly valuable for centers where MRI access is limited or for patients with contraindications to MRI.

Experimental Protocols and Methodologies

Image Acquisition and Preprocessing Standards

Consistent imaging protocols are fundamental to reproducible radiomics analysis. Across multiple studies, contrast-enhanced CT (CE-CT) scans, particularly the venous phase, have been prioritized for endometrial cancer assessment due to superior parenchymal characterization and enhanced tumor-myometrium contrast [11]. Typical acquisition parameters include slice thickness of 1 to 5 mm, with tube current (23-605 mA) and tube voltage (100-140 kV) varying across scanner platforms [11].

Critical to the preprocessing pipeline is image standardization, which typically includes bias field correction, pixel min-max normalization, and histogram equalization to mitigate scanner-specific variations [7]. These steps ensure feature extraction consistency across multi-center studies, enhancing model generalizability. The reliance on venous phase imaging reflects its clinical utility in highlighting parenchymal characteristics and contrast dynamics essential for endometrial tumor characterization [11].
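As a small illustration of one of these steps, the sketch below applies min-max normalization to a CT intensity array. It is a generic example, not the cited study's pipeline: the function name `min_max_normalize` and the intensity window used in the usage lines are hypothetical, and bias field correction and histogram equalization are not shown.

```python
import numpy as np

def min_max_normalize(image, lower=None, upper=None):
    """Rescale intensities to [0, 1].

    If `lower`/`upper` are given, they define a fixed intensity window
    (values outside it are clipped); otherwise the image's own min/max
    are used.
    """
    img = np.asarray(image, dtype=float)
    lo = img.min() if lower is None else float(lower)
    hi = img.max() if upper is None else float(upper)
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

# Hypothetical soft-tissue window of -100 to 300 HU.
voxels = np.array([-500.0, -100.0, 0.0, 100.0, 300.0, 900.0])
print(min_max_normalize(voxels, lower=-100, upper=300))
```

A fixed window keeps the mapping identical across scanners and patients, whereas per-image min/max makes the result depend on outlier voxels such as metal or air.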

Tumor Segmentation and Feature Extraction

Segmentation methodologies represent a critical junction between automated and expert-driven processes in radiomics workflows. The predominant approach involves manual or semi-automatic slice-by-slice contouring of uterine volumes by experienced radiologists or gynecology experts [11] [7]. This labor-intensive process highlights the need for expert oversight while presenting an opportunity for future automation.

Table 2: Essential Research Reagent Solutions for CT Radiomics in Endometrial Cancer

| Tool Category | Specific Solutions | Primary Function | Application Context |
| --- | --- | --- | --- |
| Segmentation Software | MIM Software (v.7.1.4), ITK-SNAP (v.3.8.0) | Manual/semi-automatic tumor contouring | Volume of interest (VOI) definition for feature extraction |
| Feature Extraction Platforms | PyRadiomics (Python library) | High-throughput radiomic feature calculation | Standardized extraction of 100+ features from medical images |
| Machine Learning Frameworks | Scikit-learn, XGBoost, Random Forest, TabPFN | Model development and validation | Building predictive classifiers for tumor classification |
| Visualization & Analysis | SHAP (SHapley Additive exPlanations) | Model interpretability and feature importance | Explaining model predictions and identifying key radiomic features |

Feature extraction typically employs standardized platforms like PyRadiomics, with studies reporting extraction of 107-1132 radiomic features spanning first-order statistics, shape-based features, and texture features from matrices including Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), and Gray Level Size Zone Matrix (GLSZM) [11] [3]. The transition from manual segmentation to feature extraction represents a natural automation boundary where computational methods excel without compromising clinical validity.

Model Development and Validation Frameworks

Rigorous validation methodologies are essential for clinical translation of radiomics models. The predominant approach employs k-fold cross-validation (typically 5- or 10-fold) with separate training and testing cohorts, sometimes supplemented by external validation across multiple institutions [11] [7] [3]. Data augmentation and balancing techniques address class imbalance issues, while multiple machine learning algorithms (including Random Forest, XGBoost, Support Vector Machines, and neural networks) are compared to identify optimal performance [11] [3].

The increasing emphasis on explainable AI in medical applications has promoted the integration of SHAP (SHapley Additive exPlanations) analysis, which identifies feature importance and enhances clinical interpretability [3]. This focus on model transparency represents a key aspect of expert oversight, enabling clinicians to understand the basis for automated classifications and identify potential failure modes.

[Figure: CT image acquisition → expert tumor segmentation → automated feature extraction → feature selection and engineering → model training → performance validation → clinical implementation → expert review and oversight → back to CT image acquisition; nodes are grouped into automated components and expert oversight components]

Figure 1: Clinical Integration Workflow for CT Radiomics. This diagram illustrates the interplay between automated processes and expert oversight in endometrial tumor classification.

Workflow Integration: Balancing Automation and Expertise

Strategic Implementation Frameworks

Successful integration of CT radiomics into clinical practice requires thoughtful consideration of which workflow components benefit from full automation versus those requiring expert oversight. The ONC identifies six priorities for healthcare workflow automation, including enabling discovery of redundant tasks, ensuring clinician readiness, and leveraging interoperable health data [67]. These principles directly apply to endometrial tumor classification, where automation excels in quantitative feature analysis while human expertise remains crucial for contextual interpretation.

Critical considerations for balanced implementation include:

  • Segmentation Verification: While automated segmentation algorithms show promise, expert radiologist review ensures accuracy in tumor boundary delineation, with studies utilizing Dice Similarity Coefficients (DSC ≥ 0.8) and Mean Distance to Agreement (MDA ≤ 3 mm) for quality control [11].

  • Model Interpretation Oversight: Even high-performance algorithms require clinician review for complex cases, with SHAP analysis and feature visualization providing decision support rather than replacement [3].

  • Workflow Integration: Automated systems must interface seamlessly with existing electronic health record systems and imaging platforms to avoid disruptive changes to clinical routines [67] [68].
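
The DSC quality-control check mentioned in the list above can be sketched in a few lines. This is a minimal illustration, assuming both segmentations are available as flattened binary masks of equal length; the function name, the toy masks, and the use of 0.8 as a review trigger are illustrative choices, not the procedure of any cited study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as agreement in this sketch
    return 2.0 * intersection / (size_a + size_b)


# Toy flattened masks (hypothetical values, not patient data)
automated = [0, 1, 1, 1, 1, 0, 0, 0]
expert = [0, 0, 1, 1, 1, 1, 0, 0]

dsc = dice_coefficient(automated, expert)
needs_review = dsc < 0.8  # assumed threshold, mirroring the DSC >= 0.8 criterion
print(f"DSC = {dsc:.2f}, flag for expert review: {needs_review}")
```

In practice the masks would come from volumetric label maps, and the Mean Distance to Agreement would need surface-distance computations that are beyond this sketch.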

Validation and Quality Assurance Protocols

Robust validation frameworks are essential for maintaining diagnostic accuracy in automated systems. Multi-center validation remains the gold standard, with studies demonstrating performance maintenance across different patient populations and scanner types [7]. Ongoing monitoring protocols should include:

  • Continuous Performance Metrics Tracking: Monitoring sensitivity, specificity, and AUC metrics across patient subgroups identifies performance drift or bias [11] [3].

  • Regular Audit Cycles: Scheduled reviews of discordant cases between model predictions and final clinical diagnoses inform model refinement [68].

  • Reference Standard Adherence: Histopathological confirmation maintains diagnostic accuracy, with automated systems positioned as decision support rather than replacement for gold-standard diagnostics [3].
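
A subgroup monitoring check of the kind listed above might look like the following sketch. It assumes each audited case is recorded as a (subgroup, predicted label, histopathology label) tuple; the subgroup names, the baseline values, and the 0.10 tolerance are hypothetical and would need to be set by the local review committee.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Sensitivity and specificity per subgroup from (group, predicted, truth) tuples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, truth in records:
        if truth == 1:
            counts[group]["tp" if predicted == 1 else "fn"] += 1
        else:
            counts[group]["fp" if predicted == 1 else "tn"] += 1
    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        report[group] = {
            "n": positives + negatives,
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return report


def flag_drift(report, baseline, tolerance=0.10):
    """List (group, metric, value) entries that fall below baseline minus tolerance."""
    flagged = []
    for group, metrics in report.items():
        for name in ("sensitivity", "specificity"):
            value = metrics[name]
            if value is not None and value < baseline[name] - tolerance:
                flagged.append((group, name, value))
    return flagged


# Hypothetical audit log: label 1 = malignant, 0 = benign
records = [
    ("scanner_A", 1, 1), ("scanner_A", 1, 1), ("scanner_A", 0, 0), ("scanner_A", 0, 0),
    ("scanner_B", 1, 1), ("scanner_B", 0, 1), ("scanner_B", 0, 0), ("scanner_B", 1, 0),
]
report = subgroup_metrics(records)
flagged = flag_drift(report, baseline={"sensitivity": 0.90, "specificity": 0.90})
print(flagged)
```

With real audit volumes, each flagged value should be read alongside its subgroup size, since small subgroups produce unstable estimates.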

[Diagram: Model Development → Internal Validation → External Validation → Clinical Deployment → Performance Monitoring. Performance Monitoring feeds both the Expert Review Committee and Audit Trail Maintenance; Audit Trail Maintenance also feeds the Expert Review Committee, which drives Model Refinement and a return to Model Development. Nodes are grouped into Automated Processes and Expert Oversight.]

Figure 2: Performance Validation Framework for CT Radiomics Models. This diagram illustrates the continuous validation cycle integrating automated monitoring with expert committee review.

CT radiomics for endometrial tumor classification is a promising advance in gynecologic oncology, with recent studies reporting high diagnostic accuracy, useful recurrence prediction, and reliable benign-malignant differentiation [8] [11] [3]. The reported performance metrics rival traditional assessment methods while offering a non-invasive alternative to more expensive or less available imaging modalities. However, the clinical value of these technologies depends not only on algorithmic performance but also on workflow integration that balances automation with expert oversight.

Endometrial cancer diagnosis is likely to incorporate increasingly sophisticated AI tools, but their successful implementation requires maintaining clinical expertise at critical decision points. This balanced approach helps ensure that technological advances enhance rather than disrupt the diagnostic process, with the aim of improving patient outcomes through more accurate, efficient, and accessible endometrial tumor classification.

Multi-center Validation and Comparative Performance Assessment

The preoperative differentiation between benign and malignant endometrial tumors is a critical challenge in gynecologic oncology, directly influencing surgical planning and patient management. Radiomics, the high-throughput extraction of quantitative features from medical images, has emerged as a powerful tool for developing objective diagnostic models [69]. By converting medical images into mineable data, radiomics can reveal tumor heterogeneity and characteristics that may not be discernible to the human eye [40]. Within this field, the Area Under the Receiver Operating Characteristic Curve (AUC or AUROC) has become a paramount metric for evaluating model performance, particularly in testing cohorts, where it demonstrates a model's generalizability and true diagnostic potential. This guide provides a comparative analysis of experimental strategies that have achieved high AUC values in the testing phase for endometrial tumor classification, focusing on CT radiomics and its alternatives.
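
To make the AUC concrete: it equals the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are toy values, not data from any cited study.

```python
def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy testing cohort: 1 = malignant, 0 = benign (hypothetical scores)
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in cohort size, which is fine for the small testing sets discussed here; library implementations use a rank-based shortcut that gives the same value.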

Performance Comparison of High-AUC Models

The diagnostic performance of a model is ultimately proven by its performance on an independent testing cohort, which assesses its ability to generalize to new, unseen data. The following tables summarize the key performance metrics, including AUC, of several recent studies on endometrial tumor classification.

Table 1: Comparative Performance of Radiomics Models in Testing Cohorts

Study (Year) | Imaging Modality | Model/Algorithm | Testing Cohort AUC (or other metric where noted) | Sensitivity | Specificity | Primary Diagnostic Task
Random Forest Model (2025) [3] | CT | Random Forest | 0.96 | 100% | 92.31% | Differentiate malignant vs. benign endometrial tumors
ViTNet Model (2025) [8] | MRI | Hybrid ResNet50 + Vision Transformer | 0.90 (Accuracy) | - | - | Classify endometrial cases as benign, malignant, or normal
ViTNet Model (2025) [8] | CT | Hybrid ResNet50 + Vision Transformer | 0.87 (Accuracy) | - | - | Classify endometrial cases as benign, malignant, or normal
Radiomics Nomogram (2023) [40] | MRI | Logistic Regression-based Nomogram | 0.86 | - | - | Differentiate benign vs. malignant endometrial lesions
Ultrasound CNN Model (2025) [70] | Ultrasound | Pre-trained Inception-V3 CNN | 0.91 (Mean) | - | - | Predict depth of myometrial invasion (Binary)

Table 2: Advanced Model Performance on Complex Classification Tasks

Study (Year) | Imaging Modality | Model Type | Testing AUC (Macro-Average) | Classification Task
Clinical-Radiomics DL Model (2025) [7] | MRI | Clinical-Radiomics Deep Learning | 0.74 (External Validation) | Molecular subtyping (POLEmut, MMRd, NSMP, p53abn)
All-Combined Model (2025) [71] | MRI | Integrated Clinical, Radiomics & DL | 0.88 (External Testing) | Differentiate Uterine Serous Carcinoma from Endometrioid Carcinoma

Detailed Experimental Protocols for High-Performance Models

CT Radiomics Model with Random Forest (AUC: 0.96)

This study developed an explainable machine learning model using pre-surgical CT scans to precisely diagnose malignancy in endometrial cancer patients [3].

  • Patient Cohort: The research was a two-center study involving 83 EC patients (46 malignant, 37 benign). Data was split into a training set (n=59) and a testing set (n=24) [3].
  • Image Segmentation & Feature Extraction: Regions of interest (ROIs) were manually segmented from the pre-surgical CT scans. A high-throughput extraction of 1,132 radiomic features was then performed using the PyRadiomics platform, capturing information on intensity, texture, and shape [3].
  • Feature Selection & Model Training: Six different explainable machine learning algorithms were implemented and compared. The Random Forest model emerged as the optimal choice. To enhance interpretability, SHAP (SHapley Additive exPlanations) analysis was employed to identify the most important radiomic features driving the predictions, all of which were significantly associated with EC (p < 0.05) [3].
  • Validation & Evaluation: The model was rigorously evaluated on the independent testing set. Beyond AUC, performance was assessed using sensitivity, specificity, precision, F1 score, and area under the precision-recall curve (AUPRC). Decision curve analysis (DCA) was also conducted, confirming the model's clinical utility by demonstrating a higher net benefit compared to "treat all" or "treat none" strategies [3].
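
Because the testing set holds only 24 patients, point estimates such as 100% sensitivity carry wide uncertainty. The sketch below computes Wilson score intervals to illustrate this. The counts are an assumption inferred from the reported percentages (92.31% equals 12/13, which leaves 11 malignant cases in a 24-patient testing set); the source does not state them, and the study itself may report different intervals.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Assumed counts (inferred, not reported): 11/11 malignant detected, 12/13 benign correct
sens_low, sens_high = wilson_interval(11, 11)
spec_low, spec_high = wilson_interval(12, 13)
print(f"Sensitivity 100%: interval {sens_low:.2f} to {sens_high:.2f}")
print(f"Specificity 92.31%: interval {spec_low:.2f} to {spec_high:.2f}")
```

Under these assumed counts, the lower bounds sit well below the headline figures, which is one reason larger external cohorts are needed before clinical use.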

Multi-Modality Deep Learning Model (MRI Accuracy: 0.90, CT Accuracy: 0.87)

This research directly compared the diagnostic performance of MRI and CT using a sophisticated deep-learning approach [8].

  • Datasets: The study introduced two new datasets from King Abdullah University Hospital: the KAUH Endometrial Cancer MRI dataset (KAUH-ECMD) and the KAUH Endometrial Cancer CT dataset (KAUH-ECCTD), collected from 300 patients [8].
  • Model Architecture: A hybrid deep learning model, termed ViTNet, was proposed. This model combined the architectural strengths of ResNet50, a powerful Convolutional Neural Network (CNN), with a Vision Transformer (ViT), which leverages self-attention mechanisms to capture global contextual information in images [8].
  • Training & Classification: The model was trained to classify endometrial images into three categories: benign, malignant, and normal. The superior soft-tissue contrast of MRI is a likely contributor to its higher accuracy compared to CT [8].
  • Validation: The model's performance was validated on the respective MRI and CT datasets, and it was also compared against a set of pre-trained models to benchmark its performance [8].

MRI-Based Radiomics Model for Molecular Subtyping (AUC: 0.74)

Accurate classification of molecular subtypes is crucial for prognostic risk assessment in endometrial cancer. This study developed a model to address this complex task [7].

  • Patient Cohort & Molecular Data: This multicenter retrospective study included 526 EC patients from three institutions. Molecular pathological diagnosis (POLEmut, MMRd, NSMP, p53abn) following the TCGA classification served as the reference standard [7].
  • Multi-Feature Extraction: The study extracted 386 handcrafted radiomics features from each MR sequence (T2WI, DWI, DCE-T1WI). Additionally, a contrastive self-supervised learning method (MoCo-v2) was employed using a pre-trained ResNet-50 network to extract 2,048 deep learning features per patient, capturing complex patterns not defined by handcrafted features [7].
  • Feature Selection & Model Fusion: After feature selection, the retained features were used to train 12 machine learning methods. The final clinical-radiomics DL model combined selected clinical data, handcrafted radiomics features, and deep learning features [7].
  • Multi-Center Validation: The model was validated on both internal and external validation cohorts. Performance held up moderately across institutions (macro-average AUC 0.74 in external validation), supporting its generalizability for distinguishing EC molecular subtypes [7].
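
For a four-class task like this one, a macro-average AUC is commonly obtained by computing a one-vs-rest AUC per subtype and averaging them with equal weight. The sketch below shows that calculation; whether the cited study used exactly this averaging scheme is an assumption, and the predicted probabilities are toy values.

```python
def binary_auc(labels, scores):
    """Pairwise AUC for 0/1 labels; ties count as half a win."""
    positives = [s for t, s in zip(labels, scores) if t == 1]
    negatives = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(y_true, prob_rows, classes):
    """One-vs-rest AUC per class and their unweighted (macro) mean."""
    per_class = {}
    for j, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[j] for row in prob_rows]
        per_class[cls] = binary_auc(labels, scores)
    return sum(per_class.values()) / len(per_class), per_class


classes = ["POLEmut", "MMRd", "NSMP", "p53abn"]
y_true = ["POLEmut", "MMRd", "NSMP", "p53abn", "NSMP", "MMRd"]
# Hypothetical predicted probabilities, one row per patient, columns follow `classes`
prob_rows = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.3, 0.5, 0.1],
]
macro_auc, per_class_auc = macro_ovr_auc(y_true, prob_rows, classes)
print(per_class_auc, round(macro_auc, 3))
```

Macro-averaging weights rare and common subtypes equally, so it is sensitive to the class imbalance that the study lists as a limitation.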

Workflow of a Radiomics Study for Endometrial Cancer

The following diagram illustrates the standardized, end-to-end pipeline for developing and validating a radiomics model, from initial data collection to clinical application.

[Diagram: Patient Cohort & Imaging Data (MRI/CT) → 1. Image Acquisition & Pre-processing → 2. Tumor Segmentation (Manual/Semi-automatic) → 3. High-Throughput Feature Extraction → 4. Feature Selection & Engineering → 5. Model Training (e.g., Random Forest, CNN) → 6. Model Validation (Internal/External Test Cohort) → Clinical Application: Diagnosis, Prognosis, Treatment Planning. Clinical & Molecular Data are integrated at step 4; step 4 yields the Radiomics Signature / Rad-Score used in step 5, and step 5 yields the Trained & Validated Predictive Model assessed in step 6.]

The Scientist's Toolkit: Essential Research Reagents & Solutions

The following table details key software, platforms, and methodological tools that are foundational to conducting radiomics research in endometrial cancer.

Table 3: Key Research Reagents and Computational Tools

Tool/Solution | Category | Primary Function | Example Use in Research
PyRadiomics [3] [7] | Feature Extraction | Standardized, open-source platform for high-throughput extraction of radiomic features from medical images. | Used to extract 1,132 features from CT scans [3] and 386 handcrafted features from MRI [7].
ITK-SNAP [7] [40] | Image Segmentation | Interactive software for manual, semi-automatic, and automatic segmentation of regions of interest (ROIs). | Used by radiologists to manually delineate tumor borders on MRI slices [7] [40].
Tree-based Models (e.g., Random Forest) [3] | Machine Learning | Ensemble learning method for classification and regression; robust against overfitting and handles high-dimensional data well. | Identified as the optimal model, achieving an AUC of 0.96 for CT-based diagnosis [3].
Deep Learning Frameworks (e.g., ResNet, ViT) [8] [7] | Deep Learning | Pre-trained or custom neural networks for automatic feature learning and complex pattern recognition from images. | A hybrid ResNet50+Vision Transformer model was used for classification [8]. MoCo-v2 with ResNet-50 extracted DL features [7].
SHAP (SHapley Additive exPlanations) [3] | Model Interpretation | Explains the output of any machine learning model, identifying which features contributed most to a prediction. | Implemented to provide explainability and identify the most important radiomic features in the Random Forest model [3].

A high AUC in an independent testing cohort is the benchmark for a robust and generalizable radiomics model. The evidence indicates that CT radiomics can perform well for the binary classification of malignant and benign endometrial tumors, with the Random Forest model achieving an AUC of 0.96 on a 24-patient testing set [3]. However, MRI maintains a performance advantage in more complex diagnostic scenarios, such as molecular subtyping and histological classification, due to its superior soft-tissue resolution [8] [7] [71]. Across modalities, the strongest results share a rigorous methodology: multi-center data sourcing, fusion of handcrafted and deep-learning features, robust external validation, and the application of explainable AI techniques to build trust and insight into model predictions. Future research should focus on the prospective validation of these models and their integration into clinical workflows to personalize the management of endometrial cancer.

The clinical application of radiomics models hinges on their generalizability—the ability to maintain diagnostic performance when applied to new patient data from institutions not involved in the initial model development. Multi-center validation represents the most robust methodological standard for assessing this critical characteristic, providing essential evidence for the potential real-world clinical utility of quantitative imaging biomarkers [72] [7].

Within endometrial cancer (EC) research, the transition from single-center proof-of-concept studies to rigorously validated multi-center models marks a significant advancement toward clinical implementation. This guide systematically compares the generalizability of recently developed radiomics models for EC classification, with particular focus on the emerging evidence for CT-based approaches alongside the more established MRI-based methodologies.

Performance Comparison of Multi-Center Radiomics Models

Table 1: Performance Metrics of Multi-center Radiomics Models for Endometrial Cancer Classification

Primary Task | Imaging Modality | Centers / Sample Size | Best Performing Algorithm | Internal Validation AUC | External Validation AUC | Key Clinical Application
Malignant vs. Benign Differentiation [3] | CT | 2 / 83 patients | Random Forest | 1.00 (training set) | 0.96 (held-out testing set) | Pre-surgical characterization of endometrial tumors
Molecular Subtype Classification [7] | MRI | 3 / 526 patients | Clinical-Radiomics DL Model | 0.79 (Macro-average) | 0.74 (Macro-average) | Preoperative assessment of molecular subtypes (POLEmut, MMRd, NSMP, p53abn)
HER2 Status Prediction [73] | MRI | 3 / 492 patients | SVM-based Fusion Model | 0.914 | 0.809-0.865 | Identifying candidates for HER2-targeted therapies
Recurrence Risk Prediction [74] [75] | MRI | 1 / 184 patients | Logistic Regression Nomogram | Not explicitly reported | No external cohort; stable performance reported via 10-fold cross-validation | Stratifying patients for personalized adjuvant therapy and monitoring
USC vs. EEC Subtype Differentiation [72] | MRI | 4 / 210 patients | Combined Clinical-Radiological-DL Model | 0.957 | 0.880 | Tailoring surgical extent and adjuvant treatment plans

Table 2: Model Generalizability and Validation Evidence

Primary Task | Validation Strategy | Additional Validation | Explainability / Clinical Integration | Reported Limitations
Malignant vs. Benign Differentiation [3] | Train-Test Split (Two-center) | 5-fold Cross-Validation, Calibration Curves, Decision Curve Analysis | SHAP analysis for feature importance; Feature maps for visualization | Relatively small sample size; Single CT acquisition protocol
Molecular Subtype Classification [7] | Training, Internal, and External Validation Cohorts | 10-fold Cross-Validation | CLEAR checklist and METRICS tool for standardized reporting | Class imbalance between molecular subtypes
HER2 Status Prediction [73] | Training, Internal, and Two External Validation Cohorts | DeLong's Test, Calibration Curves, Decision Curve Analysis | Fusion nomogram combining Rad-score and clinical predictors | Model performance varies across different validation cohorts
Recurrence Risk Prediction [74] [75] | Train-Test Split (7:3) with 10-fold Nested Cross-Validation | Clinical Impact Curve Analysis | SHAP analysis; Nomogram integrating clinical factors and Rad-score | Single-center design for model development
USC vs. EEC Subtype Differentiation [72] | Training, Internal-Test, and External-Test Cohorts | Decision Curve Analysis for clinical utility | Combined model integrating clinical, radiological, radiomics, and DL features | DL model did not show statistically significant improvement over clinical-radiological model

Analysis of Experimental Protocols

Common Methodological Framework for Radiomics Model Development

The generalizability of radiomics models depends on a standardized development pipeline, illustrated below and consistently implemented across the studies analyzed.

[Diagram: Multi-Center Data Collection → Image Preprocessing (Resampling, Normalization) → Tumor Segmentation (Manual/ROI Delineation) → High-Dimensional Feature Extraction (Shape, Texture, Intensity) → Feature Selection (ICC, LASSO, mRMR) → Model Training with Multiple Algorithms → Multi-Center Validation (Internal & External Testing) → Clinical Integration & Explainability.]

Key Methodological Elements for Generalizability

Data Collection and Image Acquisition

Most studies used multi-center retrospective designs with clearly defined inclusion/exclusion criteria; the recurrence-risk model was developed at a single center [74] [75]. The CT-based endometrial tumor differentiation study recruited patients from two independent centers (n=83), with data split into training (n=59) and testing (n=24) sets [3]. MRI-based studies generally involved larger sample sizes (n=184-526) from one to four centers [72] [7] [73] [74]. Crucially, all studies reported no statistically significant differences in clinical characteristics between the training and validation cohorts, establishing a foundation for fair performance comparison.

Image Preprocessing and Tumor Segmentation

Standardized preprocessing protocols were consistently applied across studies to minimize inter-scanner variability. This typically included image resampling to isotropic voxels (e.g., 1mm³) and intensity normalization [76] [77]. Tumor segmentation was performed manually by experienced radiologists using specialized software (ITK-SNAP, 3D Slicer, or proprietary platforms). To ensure feature reproducibility, most studies calculated intraclass correlation coefficients (ICCs), retaining only features with ICC >0.80 [7] [74].
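
The ICC-based reproducibility filter can be sketched as follows. This example uses the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); which ICC variant each study used is not stated here, so that choice is an assumption, as are the feature names and values.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per lesion and one column per reader."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two lesions and two readers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC undefined: no variance in ratings")
    return (msr - mse) / denominator


# Hypothetical feature values from two readers segmenting the same five lesions
features = {
    "stable_feature": [[1.0, 1.1], [2.0, 2.0], [3.0, 3.1], [4.0, 3.9], [5.0, 5.0]],
    "unstable_feature": [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 3.0]],
}
kept = [name for name, table in features.items() if icc_2_1(table) > 0.80]
print(kept)
```

Features that fail the threshold are dropped before any correlation filtering or LASSO step, so unstable measurements never reach model training.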

Feature Extraction and Selection

Feature extraction employed standardized platforms, primarily PyRadiomics, yielding 386-1688 radiomics features per region of interest. Feature selection implemented rigorous multi-step pipelines:

  • Reproducibility Filtering: ICC analysis to remove unstable features [74]
  • Redundancy Reduction: Spearman's correlation or minimum Redundancy Maximum Relevance (mRMR) to eliminate highly correlated features [74] [75]
  • Predictive Feature Selection: Least Absolute Shrinkage and Selection Operator (LASSO) or recursive feature elimination to identify the most informative features [72] [77] [73]

Model Development and Validation Strategies

Studies employed diverse machine learning algorithms to identify optimal performance. The CT-based model for malignant/benign differentiation tested six algorithms, finding Random Forest most effective [3]. For recurrence prediction, nine classifiers were evaluated, with Logistic Regression performing best [74] [75]. Multi-center validation consistently demonstrated that combined models (integrating radiomics with clinical features) outperformed radiomics-only or clinical-only models [7] [73].

The Scientist's Toolkit: Essential Research Reagents and Platforms

Table 3: Key Research Reagents and Platforms for Radiomics Studies

Category | Specific Tool / Platform | Primary Function in Research | Example Implementation
Image Analysis Software | ITK-SNAP (v3.8.0) | Manual tumor segmentation and ROI delineation | Used for 3D tumor contouring across multiple sequences [7]
Radiomics Feature Extraction | PyRadiomics (v3.0.1) | Standardized extraction of high-dimensional imaging features | IBSI-compliant feature extraction; 386+ features per ROI [3] [7]
AI Research Platforms | uAI Research Portal (United Imaging) | End-to-end radiomics analysis from segmentation to modeling | Used for CT-based pneumonia radiomics studies [76]
AI Research Platforms | DARWIN Intelligent Research Platform | Integrated workflow for AI-powered medical imaging research | Used for peritumoral feature extraction in recurrence prediction [74] [75]
Deep Learning Frameworks | MoCo-v2 with ResNet-50 | Self-supervised deep learning feature extraction from images | Extracted 2048 DL features from 4 MR images per patient [7]
Statistical Analysis | R (v4.1.0) & Python (v3.9.6) | Statistical analysis and machine learning implementation | Primary platforms for feature selection and model development [73]

Multi-center validation provides the most rigorous assessment of radiomics model generalizability. In the studies summarized here, external (or held-out testing) AUC values were roughly 0.04-0.11 lower than internal performance. The emerging evidence for CT-based endometrial tumor classification, while from a smaller sample size, shows strong performance (AUC 0.96 on a 24-patient testing set) [3], suggesting CT radiomics may offer a viable alternative to MRI-based approaches, particularly in settings where CT availability exceeds MRI access.
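
The internal-to-external gaps can be recomputed directly from the AUC values reported in Table 1 of this comparison. The sketch below is simple arithmetic on those published figures; note that for the CT study the "internal" value is the training AUROC and the "external" value is the held-out testing AUROC, so that row is not a true external validation.

```python
# (internal AUC, [external or testing AUCs]) as reported in Table 1
reported_auc = {
    "Malignant vs. benign (CT) [3]": (1.00, [0.96]),
    "Molecular subtype (MRI) [7]": (0.79, [0.74]),
    "HER2 status (MRI) [73]": (0.914, [0.809, 0.865]),
    "USC vs. EEC (MRI) [72]": (0.957, [0.880]),
}

gaps = {
    task: [round(internal - external, 3) for external in externals]
    for task, (internal, externals) in reported_auc.items()
}
all_gaps = [gap for values in gaps.values() for gap in values]
smallest_gap, largest_gap = min(all_gaps), max(all_gaps)

for task, values in gaps.items():
    print(f"{task}: AUC drop {values}")
print(f"Range of drops: {smallest_gap} to {largest_gap}")
```

The recurrence-risk study is omitted because it reports no internal or external AUC pair.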

Methodologically, successful generalizability correlates with robust preprocessing, multi-step feature selection, and integration of clinical and radiomics data in combined models. Future research directions should prioritize prospective multi-center designs, standardization of imaging protocols across institutions, and development of more sophisticated harmonization techniques to further enhance model generalizability across diverse clinical environments.

Radiomics, the high-throughput extraction of quantitative features from medical images, has emerged as a powerful tool for tumor characterization, transforming medical imaging into mineable data [3] [78]. In gynecologic oncology, particularly for endometrial cancer (EC), accurate preoperative diagnosis is crucial for surgical planning and prognostic stratification [78] [7]. While computed tomography (CT) and magnetic resonance imaging (MRI) are both established modalities in clinical practice, their comparative performance in radiomics analysis presents a critical research focus. This guide objectively compares the diagnostic capabilities of CT and MRI radiomics, framing the analysis within performance validation for endometrial tumor classification research.

Performance Comparison: CT vs. MRI Radiomics

Direct Performance Metrics in Endometrial Cancer

A direct comparative study of CT and MRI using a hybrid deep learning model (ResNet50 and Vision Transformer) on endometrial images demonstrated a clear performance differential [8]. The results are summarized in the table below.

Table 1: Direct Comparison of CT and MRI Deep Learning Models for Endometrial Cancer Classification

Imaging Modality | Deep Learning Model | Diagnostic Accuracy | Data Source
MRI | ResNet50 + Vision Transformer (ViTNet) | 90.24% | KAUH-ECMD (300 patients) [8]
CT | ResNet50 + Vision Transformer (ViTNet) | 86.99% | KAUH-ECCTD (300 patients) [8]

This study concluded that the MRI-based approach demonstrated superior diagnostic performance for detecting endometrial cancer compared to the CT-based classification [8].

Performance Across Various Cancer Types and Clinical Tasks

Reports from other cancers and clinical tasks, from diagnosis to molecular subtyping, point in the same direction, although only some of them compare MRI and CT head-to-head on the same patients.

Table 2: Comparative Performance of CT and MRI Radiomics Across Various Cancers and Tasks

Cancer Type | Clinical Task | Best Performing Model (Modality) | Key Performance Metric | Reference
Intrahepatic Cholangiocarcinoma (iCCA) | Diagnosing iCCA within primary liver cancer (PLC) | CT-MRI Fused Model | AUC: 0.937 (test cohort) | [79]
Intrahepatic Cholangiocarcinoma (iCCA) | Diagnosing iCCA within primary liver cancer (PLC) | MRI DL Radiomics-Radiological (DLRR_MRI) | AUC: 0.923 | [79]
Intrahepatic Cholangiocarcinoma (iCCA) | Diagnosing iCCA within primary liver cancer (PLC) | CT DL Radiomics-Radiological (DLRR_CT) | AUC: 0.880 | [79]
Endometrial Cancer | Molecular Subtype Classification | Clinical-Radiomics DL Model (MRI) | Macro-average AUC: 0.79 (internal validation) | [7]
Endometrial Cancer | Differentiating Malignant vs. Benign Tumors | Random Forest (CT Radiomics) | AUROC: 0.96 (testing set) | [3]
Nasopharyngeal Carcinoma | Predicting Response to Induction Chemotherapy | MRI-based Support Vector Machine | Specificity: 80.7%, Accuracy: 73.2% | [80]
Bladder Cancer | Radiogenomic Staging | MRI with RNA Sequencing | Superior staging accuracy vs. CT, especially for ≥pT3 | [81]

Experimental Protocols and Methodologies

CT Radiomics Protocol for Endometrial Cancer

A two-center study developed an explainable machine learning model for differentiating malignant and benign endometrial tumors from pre-surgical CT scans [3]. The protocol is detailed below:

  • Patient Cohort: 83 EC patients (46 malignant, 37 benign) split into training (n=59) and testing (n=24) sets.
  • Image Acquisition: Pre-surgical CT scans.
  • Tumor Segmentation: Manual delineation of Regions of Interest (ROIs).
  • Feature Extraction: 1132 radiomic features extracted using PyRadiomics.
  • Feature Analysis: SHAP (SHapley Additive exPlanations) analysis identified the top 20 features, 60% of which were texture features and 40% were first-order statistical features. Notably, 90% of the top features originated from transformed images (e.g., wavelet and Laplacian of Gaussian filters) [3].
  • Modeling: Six machine learning algorithms were implemented, including Logistic Regression, K-Nearest Neighbors, Support Vector Classifier, XGBoost, Random Forest, and TabPFNv2.
  • Validation: The Random Forest model emerged as the optimal choice, achieving an AUROC of 1.00 in training and 0.96 in testing, with 100% sensitivity and 92.31% specificity [3].

MRI Radiomics Protocol for Endometrial Cancer Molecular Subtyping

A multicenter study developed a clinical-radiomics deep learning model based on MRI to classify EC into the four molecular subtypes defined by The Cancer Genome Atlas (TCGA): POLEmut, MMRd, NSMP, and p53abn [7]. The workflow is as follows:

  • Patient Cohort: 526 patients across three institutions, divided into training, internal, and external validation sets.
  • Image Acquisition: Multiparametric MRI including T2WI, DWI (b=800/1000 s/mm²), and DCE-T1WI.
  • Tumor Segmentation: Manual segmentation using ITK-SNAP software along tumor borders on all sequences.
  • Feature Extraction:
    • Handcrafted Radiomics: 386 features per MR sequence extracted using the IBSI-compliant Pyradiomics package.
    • Deep Learning Features: A pretrained ResNet-50 network with MoCo-v2 (contrastive self-supervised learning) was used to extract 2048 DL features.
  • Feature Selection and Modeling: Integrated feature selection combined clinical, handcrafted radiomics, and DL features. The model was built using 12 machine learning algorithms.
  • Validation: The clinical-radiomics DL model demonstrated robust performance across multiple centers, outperforming models that used clinical or radiomics features alone [7].

[Diagram: Patient Cohort (n=526) → Multiparametric MRI Scan → Manual Tumor Segmentation → Feature Extraction, which branches into Handcrafted Radiomics (386 features/sequence) and Deep Learning Features (ResNet-50 + MoCo-v2) → Feature Fusion & Selection → Machine Learning (12 Algorithms) → Multi-Center Validation → Molecular Subtype Classification.]

Figure 1: MRI Radiomics Workflow for Endometrial Cancer Molecular Subtyping. This diagram illustrates the comprehensive pipeline from image acquisition to multi-center validation, highlighting the integration of handcrafted and deep learning features [7].

Comparative Radiomics Analysis Protocol

A study on intrahepatic cholangiocarcinoma (iCCA) provides a robust template for a head-to-head comparison of CT and MRI radiomics [79]:

  • Patient Cohort: 178 patients with pathologically confirmed primary liver cancer.
  • Image Acquisition and Segmentation: Patients underwent both CT and MRI. ROIs were delineated on nine sequences: non-contrast, arterial, and venous phases of CT; and T1WI, T2WI, DWI, arterial, venous, and delayed phases of MRI.
  • Model Construction: Three model types were built for each modality, giving six models in total: radiomics-only (DLRS), radiological-features-only (R), and a combined model (DLRR).
  • Cross-Modal Fusion: A fused model combining the best CT and MRI models (DLRR_CT and DLRR_MRI) was developed using multivariate logistic regression.
  • Evaluation: Models were compared using ROC curves, calibration curves, and decision curve analysis (DCA). The MRI-based models showed better predictive performance than the CT-based models, and the CT-MRI fused model yielded the highest AUC of 0.937 in the test cohort [79].

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Key Research Reagent Solutions for Radiomics Analysis

Item Name | Function/Application | Specific Examples from Literature
Image Analysis Software | Manual segmentation of tumors and regions of interest. | ITK-SNAP [79] [7], 3D Slicer [78]
Radiomics Feature Extraction Platforms | Standardized extraction of quantitative features from medical images. | PyRadiomics (IBSI-compliant) [3] [7]
Deep Learning Frameworks | Extraction of deep learning features and model training. | ResNet-50 [79] [7], ResNet50-ViT Hybrid [8]
Machine Learning Algorithms | Building predictive models from extracted features. | Random Forest [3] [82], Support Vector Machine (SVM) [80], LASSO regression [79] [78]
Molecular Pathology Kits | Ground truth validation for molecular subtyping studies. | BPTMplus Panel (for POLE, TP53, MMR genes) [7], FFPE DNA kits [7]

The consolidated evidence from multiple studies indicates that MRI generally holds an advantage over CT in radiomics analyses due to its superior soft tissue contrast, which enables more detailed characterization of tumor heterogeneity [79] [8] [7]. This is particularly evident in tasks like molecular subtyping of endometrial cancer and soft tissue characterization. However, CT radiomics remains a highly valuable and performant tool, especially for specific clinical questions like benign-malignant classification of endometrial tumors, and offers practical benefits of wider availability, faster scan times, and lower cost [3] [8]. The most promising future direction lies in the development of cross-modal models that strategically fuse data from both CT and MRI, leveraging the strengths of each modality to achieve diagnostic performance superior to either alone [79]. For researchers validating CT radiomics performance in endometrial tumor classification, these findings affirm its robust capability while highlighting MRI as the benchmark for advanced characterization tasks.

This guide provides an objective comparison of Decision Curve Analysis (DCA) as a method for assessing the clinical utility of prediction models, contextualized within performance validation frameworks for CT radiomics in endometrial tumor classification research.

Decision Curve Analysis (DCA) is a quantitative method that evaluates the clinical value of diagnostic and prognostic models by integrating clinical consequences and patient preferences into assessment metrics [83]. Unlike traditional statistical measures such as area under the receiver operating characteristic curve (AUC), sensitivity, or specificity, DCA directly accounts for the relative clinical impact of different types of classification errors [84]. Conventional metrics measure a model's discriminatory accuracy but fail to incorporate clinical considerations, such as whether it is worse to miss a cancer (false negative) than to perform an unnecessary biopsy (false positive) [85]. DCA addresses this limitation by introducing the concept of net benefit, which quantifies the trade-off between benefits (true positives) and harms (false positives) across a range of clinically reasonable threshold probabilities [83] [86].

The core output of DCA is a decision curve that plots net benefit against threshold probabilities, enabling researchers to compare multiple models or strategies and determine which provides the greatest clinical utility across different preference scenarios [86]. This methodology has been recommended by major medical journals and the TRIPOD guidelines for developing prediction models, representing a significant advancement toward transparent, clinically-informed decision-making [83].

Methodological Framework of DCA

Core Components and Calculations

The mathematical foundation of DCA rests on three key components: net benefit, threshold probability, and the exchange rate between false positives and true positives.

Net Benefit is calculated using the formula [83] [84]:

$$\text{Net Benefit} = \frac{\text{True Positives}}{n} - \frac{\text{False Positives}}{n} \times \frac{P_t}{1-P_t}$$

where:

  • $n$ is the total number of patients
  • $P_t$ is the threshold probability

This net benefit calculation can be interpreted as the number of true positive classifications per patient, after adjusting for the harm of false positives weighted by the exchange rate derived from the threshold probability [83]. A model with higher net benefit across a range of threshold probabilities is considered clinically superior.
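The net benefit formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in any cited study; the function name `net_benefit` and the toy cohort are assumptions introduced here.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    y_true: 1 for disease (e.g., malignant), 0 for no disease (e.g., benign)
    y_prob: predicted probabilities from any model
    threshold: threshold probability P_t, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    exchange_rate = threshold / (1.0 - threshold)  # P_t / (1 - P_t)
    return true_pos / n - (false_pos / n) * exchange_rate


# Hypothetical toy cohort (not study data): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7]

# At P_t = 0.20: 4 true positives, 3 false positives, n = 8,
# so net benefit = 4/8 - (3/8) * 0.25 = 0.40625.
nb_20 = net_benefit(y_true, y_prob, 0.20)
print(f"Net benefit at P_t=0.20: {nb_20:.5f}")
```

Evaluating the same function over a grid of thresholds yields the points of a decision curve.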

Threshold Probability ($P_t$) represents the minimum probability of disease at which a clinician would recommend intervention [86] [84]. This probability reflects clinical preferences about the relative harm of missing a disease versus undertaking unnecessary treatment. For example, in endometrial cancer assessment, a clinician who is most concerned about missing a cancer might set a threshold probability of 5% (accepting up to 19 false positives for each true positive), while a clinician who is more concerned about overtreatment might set a threshold of 50% (accepting only one false positive per true positive) [83].
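The false-positive-to-true-positive ratios quoted for the 5% and 50% thresholds follow directly from the exchange rate $P_t/(1-P_t)$ in the net benefit formula. As a short worked calculation:

$$\left.\frac{P_t}{1-P_t}\right|_{P_t=0.05} = \frac{0.05}{0.95} = \frac{1}{19}, \qquad \left.\frac{P_t}{1-P_t}\right|_{P_t=0.50} = \frac{0.50}{0.50} = 1$$

At a 5% threshold each false positive is weighted as 1/19 of a true positive, so 19 false positives offset one true positive; at a 50% threshold the two are weighted equally.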

Reference Strategies for Comparison

DCA typically includes two fundamental reference strategies for benchmarking model performance [86]:

  • Treat All: The net benefit of intervening on all patients, calculated as $\text{prevalence} - (1 - \text{prevalence}) \times \frac{P_t}{1-P_t}$
  • Treat None: The net benefit of intervening on no patients, which is always zero

A clinically useful prediction model should demonstrate higher net benefit than both these default strategies across a range of clinically reasonable threshold probabilities.
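The two reference strategies can be computed without any model output. The sketch below assumes a hypothetical prevalence of 40%; the function name `treat_all_net_benefit` is introduced here for illustration only.

```python
def treat_all_net_benefit(prevalence, threshold):
    """Net benefit of intervening on every patient at threshold probability P_t."""
    exchange_rate = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * exchange_rate


TREAT_NONE_NET_BENEFIT = 0.0  # no interventions, so no true or false positives

prevalence = 0.40  # hypothetical outcome prevalence, not taken from any study
thresholds = [0.10, 0.20, 0.40, 0.60]
curve = {pt: treat_all_net_benefit(prevalence, pt) for pt in thresholds}

for pt, nb in curve.items():
    print(f"P_t={pt:.2f}  treat-all={nb:+.3f}  treat-none={TREAT_NONE_NET_BENEFIT:+.3f}")
```

The treat-all curve crosses zero where the threshold equals the prevalence, so "treat all" only beats "treat none" at thresholds below the prevalence. A model has to beat whichever default is higher at each threshold.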

Experimental Workflow for DCA

The following diagram illustrates the standard experimental workflow for implementing Decision Curve Analysis in clinical prediction model research:

[Figure: DCA workflow. Develop prediction model → collect validation data (n patients, outcome prevalence) → obtain individual predicted probabilities → define threshold probability range (typically 1%-99%) → calculate net benefit at each threshold using the net benefit formula → compare net benefit across the new model, treat all, treat none, and existing models → plot decision curves → interpret clinical utility across the preference spectrum.]

DCA Implementation in CT Radiomics Research

Experimental Protocols from Endometrial Tumor Classification

Recent research on CT radiomics for endometrial tumor classification provides exemplary protocols for implementing DCA in medical imaging research. A 2025 two-center study developed an explainable machine learning model for differentiating malignant and benign endometrial tumors using CT radiomics features [3] [14]. The experimental methodology encompassed the following key stages:

Patient Cohort and Data Splitting: The study included 83 patients with endometrial tumors (46 malignant, 37 benign) from two medical centers. Data were split into a training set (n=59) and a testing set (n=24) so that performance could be assessed on held-out cases [3].

Image Acquisition and Segmentation: Pre-surgical CT scans were obtained for all patients. Regions of interest (ROIs) were manually segmented from these scans by experienced radiologists to define tumor boundaries for feature extraction [3].

Radiomic Feature Extraction: Using PyRadiomics, researchers extracted 1,132 radiomic features from each segmented tumor volume. These features captured quantitative information about tumor intensity, texture, and morphology that are not perceptible to the human eye [3].

Machine Learning Modeling: Six explainable machine learning algorithms were implemented and compared: Logistic Regression, K-Nearest Neighbors, Support Vector Classifier, XGBoost, Random Forest, and TabPFNv2. The Random Forest model emerged as optimal, achieving a testing AUC of 0.96 [3].

DCA Implementation: The clinical utility of the final model was evaluated using DCA, which compared the net benefit of the radiomics model against the default "treat all" and "treat none" strategies across a range of threshold probabilities [3].

Comparative Performance Data

The following table summarizes quantitative performance data from the endometrial tumor classification study, illustrating how DCA complements traditional discrimination metrics:

Table 1: Performance Metrics of CT Radiomics Model for Endometrial Tumor Classification

| Metric | Training Performance | Testing Performance | Clinical Interpretation |
|---|---|---|---|
| AUC | 1.00 | 0.96 | Excellent discrimination between malignant and benign tumors |
| Sensitivity | 100% | 100% | Identifies all malignant cases in test set |
| Specificity | 95.83% | 92.31% | Minimizes false positives |
| Net Benefit | Superior to "All" and "None" strategies | Superior to "All" and "None" strategies | Clinical utility across preference thresholds |
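For readers who want to reproduce the discrimination metrics in this table on their own data, the sketch below computes sensitivity, specificity, and AUC in plain Python. The labels and scores are hypothetical toy values, not the study's data, and the helper names are assumptions introduced here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_rank(y_true, y_score):
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.95, 0.80, 0.70, 0.55, 0.60, 0.30, 0.20, 0.10]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # classify at a 0.5 cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_rank(y_true, y_score)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs. Neither weighs the clinical cost of errors, which is the gap net benefit fills.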

Beyond this specific application, DCA has demonstrated value across various cancer domains. In renal tumor classification, a 2025 study evaluating the TabPFN algorithm reported that DCA provided crucial insights into clinical utility that complemented traditional AUC metrics (0.935-1.000 training, 0.800-0.946 validation) [87]. Similarly, in gastric cancer research, DCA has been implemented to validate multimodal CT radiomics models predicting PD-1 inhibitor efficacy, with models achieving AUCs of 0.76-0.94 across internal and external validation cohorts [88].

Comparative Analysis of DCA Against Alternative Metrics

Advantages Over Traditional Performance Measures

DCA addresses critical limitations of conventional model assessment metrics by directly incorporating clinical consequences and patient preferences. The table below provides a structured comparison:

Table 2: DCA Versus Traditional Model Assessment Metrics

| Assessment Method | Key Strengths | Key Limitations | Clinical Relevance |
|---|---|---|---|
| Decision Curve Analysis | Incorporates clinical consequences and patient preferences; Directly comparable across strategies; Intuitive net benefit interpretation | Requires understanding of threshold probability concept; Does not replace discrimination metrics | High - directly informs clinical decision-making |
| AUC-ROC | Comprehensive discrimination assessment; Threshold-independent; Widely understood | Does not incorporate clinical consequences; Limited intuitive interpretation | Moderate - indicates accuracy but not clinical value |
| Sensitivity/Specificity | Intuitive clinical interpretation; Simple calculation | Single-threshold evaluation; Trade-off between metrics not quantified | Moderate - familiar but incomplete for clinical utility |
| Calibration Measures | Assesses probability accuracy; Important for risk prediction | Does not incorporate clinical utility of predictions; Multiple metrics needed | Moderate - necessary but insufficient alone |

Interpretation Guidelines for Decision Curves

The practical value of DCA emerges through correct interpretation of decision curves. A simple, step-by-step approach facilitates understanding [86]:

  • Identify the Highest Curve: At each threshold probability, the model or strategy with the highest net benefit (the topmost curve) provides the greatest clinical utility at that preference level.

  • Determine the Preference Range: Establish the spectrum of threshold probabilities where the new model outperforms reference strategies. In endometrial cancer applications, this typically focuses on the clinically plausible range (e.g., 5%-50%) rather than the entire theoretical range [83].

  • Assess Clinical Superiority: A model is recommended for clinical use if it demonstrates higher net benefit than both "treat all" and "treat none" strategies across threshold probabilities that reflect realistic clinical preferences.

In the endometrial tumor classification study, DCA demonstrated that the radiomics model provided higher net benefit than both the "treat all" and "treat none" strategies across most threshold probabilities, supporting its potential clinical adoption for identifying high-risk cases and reducing unnecessary interventions [3].
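The interpretation steps above amount to finding the thresholds at which a model's curve lies above both default strategies. The sketch below shows that comparison on hypothetical net benefit values; the numbers and the function name `useful_thresholds` are illustrative assumptions, not results from the cited study.

```python
def useful_thresholds(thresholds, nb_model, nb_treat_all, nb_treat_none=0.0):
    """Return the thresholds at which the model's net benefit exceeds both
    the treat-all and treat-none strategies."""
    return [
        pt
        for pt, model_nb, all_nb in zip(thresholds, nb_model, nb_treat_all)
        if model_nb > all_nb and model_nb > nb_treat_none
    ]


# Hypothetical decision-curve values, evaluated on a coarse threshold grid.
thresholds = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]
nb_model = [0.52, 0.51, 0.48, 0.45, 0.38, -0.02]
nb_treat_all = [0.53, 0.50, 0.44, 0.36, 0.10, -0.50]

useful = useful_thresholds(thresholds, nb_model, nb_treat_all)
print("Model adds value at thresholds:", useful)
```

In this toy example the model adds value between 10% and 50%. At 5%, treating everyone is marginally better, and at 70% the model's net benefit falls below zero, so treating no one is preferable.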

Essential Research Reagents and Computational Tools

Successful implementation of DCA requires specific methodological tools and computational resources. The following table details essential "research reagents" for conducting robust DCA in CT radiomics research:

Table 3: Essential Research Reagents for DCA Implementation

| Tool Category | Specific Solutions | Function in DCA Research |
|---|---|---|
| Statistical Software | R Statistical Environment with 'decisioncurve' package [85]; Python with custom DCA functions [84] | Calculate net benefit; Generate decision curves; Statistical comparisons |
| Radiomics Platforms | PyRadiomics (Python) [3]; 3D Slicer with PyRadiomics integration [87] | Standardized feature extraction from medical images; IBSI-compliant feature definitions |
| Image Analysis Tools | ITK-SNAP [89]; 3D Slicer [87] | Manual/automated tumor segmentation; ROI definition for feature extraction |
| Machine Learning Libraries | Scikit-learn (Python) [3]; caret (R) [84] | Develop and validate prediction models; Hyperparameter tuning |
| Validation Frameworks | TRIPOD guidelines [83]; CLEAR checklist [89] | Standardized reporting; Methodological rigor assessment |

Technical implementation of DCA has been greatly facilitated by dedicated code resources. The MSKCC Decision Curve Analysis website (https://mskcc-epi-bio.github.io/decisioncurveanalysis/) provides comprehensive code for R, Stata, SAS, and Python, covering binary outcomes, time-to-event data, multivariable analysis, and advanced applications [85]. For R users, custom functions like ntbft() enable calculation of different net benefit types (treated, untreated, overall) and the ADAPT index, with options for cross-validation and bootstrap correction [84] [90].

Decision Curve Analysis represents a paradigm shift in prediction model assessment by directly quantifying clinical utility through net benefit. In the specific context of CT radiomics for endometrial tumor classification, DCA provides crucial evidence beyond traditional discrimination metrics, demonstrating that radiomics models offer superior clinical value compared to default strategies across realistic preference thresholds. The methodology's ability to incorporate the consequences of clinical decisions and patient preferences makes it an indispensable component of comprehensive model validation. As radiomics continues to advance cancer diagnostics, DCA will play an increasingly vital role in translating technical accuracy into meaningful clinical impact.

Radiomics, the high-throughput extraction of quantitative features from medical images, is emerging as a powerful non-invasive tool for tumor profiling. In endometrial cancer (EC), research increasingly demonstrates its capacity to predict clinical prognosis and uncover underlying molecular characteristics. This guide objectively compares the performance of CT-based radiomics against other imaging modalities and molecular techniques in prognostic profiling. We summarize validated experimental data, detail key methodological protocols, and explore the burgeoning connection between radiomic features and established EC molecular subtypes, providing a performance validation framework for researchers and drug development professionals.

Endometrial cancer (EC) is the most common gynecological malignancy in developed countries, with a globally rising incidence [91] [92] [93]. While early-stage disease often has a favorable prognosis, advanced or recurrent EC is associated with significantly diminished outcomes; the 5-year overall survival for stages IVA and IVB is a mere 17% and 15%, respectively [91]. This stark contrast highlights the critical need for accurate prognostic tools to guide personalized, aggressive treatment for high-risk patients while avoiding overtreatment for those with low-risk disease.

Traditional prognostication relies on a combination of clinical factors, histology, and surgical staging. However, even the widely used International Federation of Gynecology and Obstetrics (FIGO) staging system has a prognostic accuracy of only about 60-70% [94]. The advent of molecular classification has revolutionized EC management. The classification system, comprising POLE-mutated (POLEmut), mismatch repair-deficient (MMRd), p53 abnormal (p53abn), and no specific molecular profile (NSMP) subtypes, provides superior risk stratification and now guides treatment decisions, including the use of immunotherapy [95] [93]. A key limitation, however, is that these molecular data are typically acquired post-operatively.

Radiomics addresses this gap by leveraging standard-of-care preoperative imaging (e.g., CT, MRI) to extract sub-visual features that reflect tumor heterogeneity. When combined with machine learning, radiomic models can predict recurrence risk, survival, and potentially molecular subtypes non-invasively, offering a powerful complement to existing diagnostic pathways.

Performance Comparison of Radiomics Modalities and Models

Different imaging modalities and analytic approaches yield varying levels of prognostic performance. The tables below compare the quantitative performance of various radiomics models.

Table 1: Performance of CT-based Radiomics Models for Predicting Endometrial Cancer Recurrence [11]

| Machine Learning Model | Training Set AUC | Test Set AUC | Sensitivity | Specificity | Prognostic Value |
|---|---|---|---|---|---|
| LASSO-Cox | 0.92 | 0.90 | 0.89-1.00 | 0.73-0.90 | Patients with high-risk prediction had significantly worse disease-free survival (p < 0.001) |
| CoxBoost | 0.93 | 0.86 | 0.89-1.00 | 0.73-0.90 | Patients with high-risk prediction had significantly worse disease-free survival (p < 0.001) |
| Random Forest (RFsrc) | 0.92 | 0.88 | 0.89-1.00 | 0.73-0.90 | Patients with high-risk prediction had significantly worse disease-free survival (p < 0.001) |

Table 2: Performance of MRI-based Radiomics and Integrated Models in Endometrial Cancer

| Model Type / Focus | Imaging Modality | Performance (AUROC) | Key Findings |
|---|---|---|---|
| Deep Myometrial Invasion (DMI) Classifier [91] | Multiparametric MRI | Average AUROC: 0.83 | Performance comparable to experienced radiologists. |
| Radiomics Nomogram for DMI [91] | MRI | AUROC: 0.871 - 0.883 | Improved radiologists' diagnostic accuracy from ~80% to over 90%. |
| Overall Survival Prediction [94] | T2-weighted MRI (Tumor & Peritumoral) | Validation Set AUCs: 0.862 (1-yr), 0.885 (3-yr), 0.870 (5-yr) | Model based on XGBoost; features from tumor and peritumor showed complementarity. |
| Radiogenomic Clustering [92] | Multi-sequence MRI | N/A | Unsupervised clustering of radiomic features identified patient clusters with significantly different disease-specific survival (p < 0.001). |

Table 3: Radiomics Performance in Differential Diagnosis and Molecular Subtyping

| Diagnostic Task | Imaging Modality | Performance | Reference / Notes |
|---|---|---|---|
| Distinguishing IA-stage EC from benign lesions [91] | MRI (T2WI, DWI, ADC, LCE-T1WI) | Accuracy: 0.802; Average AUROC: 0.854 (internal & external validation) | Bi et al. |
| Differentiating EC from benign endometrial polyps [91] | MRI (ADC, T2WI, DWI) | Average AUROC: 0.983 | Chen, Wang et al. |
| Differentiating Type I from Type II EC [91] | MRI | AUROC: 0.93 (training), 0.91 (testing) | Multicenter study (n=875) |
| Predicting Histological Grading [91] | MRI | Average AUROC: 0.64 - 0.77 | Underperformed tumor size measurement (AUROC=0.86) |

Key Insights from Performance Data

  • CT vs. MRI Utility: While MRI is the gold standard for local staging in EC due to its superior soft-tissue contrast, the performance of CT-based radiomics is notable. CT is more widely available and faster, and its capability to predict recurrence risk with high AUCs (0.86-0.90) [11] makes it a valuable prognostic tool, particularly for assessing distant recurrence.
  • The Peritumoral Region: Models incorporating radiomic features from a 5 mm region surrounding the tumor have demonstrated prognostic value complementary to that of intratumoral features alone, suggesting the peritumoral microenvironment contains critical biological information [94].
  • Beyond the Tumor: Radiogenomics: The integration of radiomic data with genomic data has proven powerful. One study derived an 11-gene prognostic signature (including HSPA5, GATA3, and FLT1) from MRI-based radiomic risk groups. This signature was independently validated and found to be enriched in the aggressive p53abn molecular subtype, providing a biological rationale for the radiomic patterns [92] [94].

Experimental Protocols in Radiomics Research

The workflow for developing a radiomics model is methodical and involves several critical steps to ensure robustness and clinical applicability. The following diagram illustrates a generalized radiomics workflow for prognostic profiling in endometrial cancer, integrating features from both CT and MRI pathways:

[Figure: generalized radiomics workflow. Medical image acquisition (CT, e.g., CE-CT; MRI, e.g., T2WI, DWI, DCE) → tumor segmentation (manual by a radiologist, or semi-/fully automated ML-based) → feature extraction (shape, first-order, texture) → feature processing (ICC analysis, LASSO) → model building and validation (ML algorithms such as CoxBoost and XGBoost; internal/external validation) → prognostic output (recurrence risk, survival prediction, molecular correlates).]

Detailed Methodological Breakdown

Image Acquisition and Preprocessing

Protocols vary by modality but must be meticulously documented to ensure reproducibility.

  • CT Protocols: Studies often use venous-phase Contrast-Enhanced CT (CE-CT) for EC staging, as it better highlights parenchymal characteristics and tumor-myometrium contrast [11]. Parameters are variable but typically include: kVp (100-140), slice thickness (1-5 mm), and administration of iodinated contrast agent.
  • MRI Protocols: Multiparametric MRI is common. Key sequences include T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI) with apparent diffusion coefficient (ADC) maps, and dynamic contrast-enhanced (DCE) MRI. Standardization across different scanner manufacturers is a key challenge [91] [94].

Tumor Segmentation

This is a critical step where the region of interest (ROI) is defined.

  • Methods: Can be manual (by radiologists), semi-automated, or fully automated using machine learning algorithms [11] [92]. Manual segmentation is considered the gold standard but is time-consuming and subject to inter-observer variability.
  • Validation: Reproducibility is assessed using metrics like the Dice Similarity Coefficient (DSC) and Mean Distance to Agreement (MDA). Studies often require a DSC ≥ 0.8 and MDA ≤ 3 mm to confirm contouring consistency [11].
  • Volume of Interest (VOI): The ROI is often expanded to include the peritumoral region (e.g., a 5 mm margin) to capture microenvironmental features [94].
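The contouring-consistency check based on the Dice Similarity Coefficient can be sketched as follows. For simplicity this example assumes each segmentation mask has been flattened to a sequence of 0/1 voxel labels; the two reader masks are hypothetical, and the Mean Distance to Agreement is not implemented here because it requires voxel spacing and surface extraction.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks given as
    equal-length flat sequences of 0/1 voxel labels."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # both masks empty; treated as full agreement in this sketch
    return 2.0 * overlap / total


# Hypothetical contours of the same lesion drawn by two readers.
reader_1 = [0, 1, 1, 1, 1, 0, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 1, 0, 0]

dsc = dice_coefficient(reader_1, reader_2)  # 2 * 3 / (4 + 4) = 0.75
passes = dsc >= 0.8                         # acceptance criterion quoted above
print(f"DSC={dsc:.2f}, meets DSC >= 0.8: {passes}")
```

In this toy case the contours would fail the 0.8 criterion and would need to be reviewed or redrawn before feature extraction.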

Feature Extraction and Selection

This is the core of the radiomics process.

  • Extraction Tools: The open-source PyRadiomics library in Python is a widely used standard for extracting a large set of features from the segmented VOI [91] [11] [94].
  • Feature Classes: Hundreds of features are extracted, including:
    • First-Order Statistics: Describing voxel intensity distributions (e.g., energy, entropy).
    • Shape-based Features: 3D descriptors of tumor geometry.
    • Texture Features: Quantifying intra-tumoral heterogeneity using matrices like Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrix (GLRLM) [11].
  • Feature Selection: To avoid model overfitting, robust feature reduction is essential. Techniques include:
    • Intraclass Correlation Coefficient (ICC): To exclude features with poor inter-observer reproducibility (e.g., ICC < 0.75) [94].
    • Least Absolute Shrinkage and Selection Operator (LASSO) Regression: A common method for selecting the most predictive features from a high-dimensional dataset [11] [94].
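The ICC-based reproducibility filter can be illustrated with a small pure-Python sketch. The source does not state which ICC form the cited studies used, so the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is an assumption here, as are the feature names and values.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per patient; each row holds the same feature
    measured from each reader's segmentation.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical feature values for five patients, each segmented by two readers.
features = {
    "stable_feature": [[10, 10], [12, 12], [15, 15], [20, 20], [25, 25]],
    "unstable_feature": [[10, 20], [12, 5], [15, 25], [20, 8], [25, 14]],
}

icc_values = {name: icc_2_1(vals) for name, vals in features.items()}
kept = [name for name, icc in icc_values.items() if icc >= 0.75]
print(icc_values, kept)
```

Only features that survive this filter would then be passed to a sparse selector such as LASSO.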

Model Building and Validation

  • Machine Learning Algorithms: A variety of algorithms are employed, including Cox proportional-hazards models (LASSO-Cox, CoxBoost), Random Survival Forests (RFsrc), and XGBoost [11] [94].
  • Validation: Rigorous validation is paramount.
    • Internal Validation: Often performed using k-fold cross-validation (e.g., 10-fold) [11].
    • External Validation: The model is tested on completely independent datasets from different institutions to prove generalizability [94]. This is a key step for establishing clinical utility.
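The internal validation step can be sketched as a k-fold index generator in plain Python. The cohort size of 100 and the function name `k_fold_indices` are assumptions made for illustration; in practice a library routine (for example from scikit-learn) would typically be used, often with stratification by outcome.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]    # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical cohort of 100 patients, 10-fold cross-validation.
splits = list(k_fold_indices(100, k=10, seed=0))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

Feature selection and any scaling should be refit inside each training fold. Selecting features on the full dataset before splitting leaks information from the held-out fold and inflates the cross-validated estimate.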

Biological Pathways and Radiogenomic Integration

The "black box" nature of radiomics is being elucidated through studies that link imaging features to specific biological pathways. The following diagram illustrates a proposed biological pathway linking radiomic features to tumor angiogenesis, a key mechanism in cancer progression:

[Figure: proposed pathway linking a radiomic risk signature to tumor angiogenesis. Radiomic risk signature → (gene expression analysis) upregulation of angiogenesis pathways → FLT1 (VEGFR1) gene → increased tumor angiogenesis and blood supply, which is reflected in functional imaging correlates (IVIM-DWI parameters f and D*; DCE-MRI parameters Ktrans and Ve) and is associated with an aggressive tumor phenotype and poor survival (validated in TCGA).]

Connecting Radiomics to Molecular Subtypes and the Microenvironment

Radiogenomic studies have successfully linked non-invasive imaging to critical molecular and cellular processes:

  • Association with p53abn Subtype: Unsupervised clustering of MRI-based radiomic features has identified patient groups with significantly different survival outcomes. These high-risk radiomic clusters are strongly enriched for the aggressive p53abn molecular subtype and exhibit gene expression patterns consistent with loss of hormone receptors and poor prognosis [92].
  • Tumor Angiogenesis: One study identified FLT1 (VEGFR1), a key gene in vascular endothelial growth factor signaling, as a central player in the biological mechanism underlying a prognostic radiomic signature. This was functionally validated using IVIM-DWI and DCE-MRI, techniques that quantitatively assess tissue microvascular structure and perfusion. The results confirmed that high-risk radiomic models are associated with elevated tumor angiogenesis and blood supply [94].
  • Tumor Immune Microenvironment (TIME): The molecular subtypes of EC have distinct immunogenicities. For instance, MMRd tumors have a high mutational burden and are more responsive to immunotherapy [93]. Although this remains an active area of research, radiomics has the potential to characterize the TIME non-invasively and to predict response to immunotherapies.

The Scientist's Toolkit: Essential Research Reagents and Solutions

The following table details key reagents, software, and data resources essential for conducting radiomics research in endometrial cancer.

Table 4: Key Research Reagents and Solutions for Radiomics Studies

| Item Name | Type | Primary Function / Application | Example Sources / Software |
|---|---|---|---|
| PyRadiomics | Open-Source Software Library | Standardized extraction of radiomic features from medical images. | Python [91] [11] |
| 3D Slicer | Open-Source Software Platform | Visualization, interaction, and segmentation of medical image data (e.g., manual tumor contouring). | https://www.slicer.org/ [94] |
| MIM Software | Commercial Software | Medical image analysis, including semi-automated contouring of volumes of interest (VOIs). | MIM Software Inc. [11] |
| The Cancer Genome Atlas (TCGA)-UCEC | Genomic & Clinical Database | Public repository of genomic, transcriptomic, and clinical data for validation of radiogenomic findings. | National Cancer Institute [92] [94] |
| CPTAC-UCEC | Proteomic & Clinical Database | Public repository of proteomic and clinical data for deeper mechanistic validation. | National Cancer Institute [94] |
| IVIM-DWI & DCE-MRI Analysis Software | Proprietary Software Platforms | Quantitative assessment of microvascular perfusion and permeability, validating angiogenesis-related radiomic features. | GE AW workstation, GE Omni Kinetic [94] |
| Cell Line Models (p53abn, MMRd, etc.) | Biological Reagents | In vitro and in vivo validation of molecular subtype-specific biological mechanisms and vulnerabilities. | CCLE, DepMap [96] |

Conclusion

CT radiomics represents a high-performance approach for endometrial tumor classification, with a recent two-center study of 83 patients reporting a test-set AUC of 0.96 [3]. The integration of explainable machine learning, particularly Random Forest algorithms, provides both predictive power and clinical interpretability through feature importance analysis. While MRI maintains advantages for local staging, CT radiomics offers a widely accessible alternative with strong performance in malignancy differentiation. Future work should focus on prospective validation in larger cohorts, standardization of imaging protocols across institutions, integration with molecular profiling for comprehensive tumor characterization, and exploration of the role of radiomics in predicting treatment response and survival outcomes. For translational researchers and drug developers, CT radiomics presents a promising non-invasive biomarker platform for precision oncology applications in endometrial cancer.

References