Under Review

SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation

University of Texas at San Antonio
89.34% Dice on ACDC · 84.42% Dice on FIVES · 10% real data required · 90% synthetic data
SRA-Seg Overview

Figure: Overview of the proposed SRA-Seg method. The framework integrates synthetic images generated by StyleGAN2-ADA with real labeled data through soft-mix augmentation, EMA-based pseudo-labeling, and similarity-alignment loss using frozen DINOv2 embeddings.

Abstract

Synthetic data is an appealing alternative to extensive expert-annotated data for medical image segmentation, yet despite its visual realism it consistently fails to improve segmentation performance. This is because synthetic and real medical images occupy different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss that uses frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries of traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft-segmentation losses that respect the uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching methods that use real unlabeled data.

Key Contributions

  • A framework explicitly designed to bridge the synthetic-to-real domain gap for semi-supervised medical image segmentation
  • A similarity-alignment (SA) loss using frozen DINOv2 embeddings to pull synthetic features toward real counterparts
  • Soft edge blending for smooth anatomical transitions, replacing hard copy-paste boundaries

Method

Synthetic Data Generation

High-fidelity synthetic images are generated with StyleGAN2-ADA trained on the limited real labeled data (5% or 10% of the training set).
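
As a rough illustration, the snippet below samples a batch from a trained generator using the helpers from the NVlabs stylegan2-ada-pytorch repository; the checkpoint path, batch size, and truncation value are hypothetical, not taken from the paper.

```python
import torch
import dnnlib   # from the NVlabs stylegan2-ada-pytorch repo
import legacy   # from the NVlabs stylegan2-ada-pytorch repo

device = torch.device('cuda')
# Hypothetical checkpoint trained on the 10% labeled split.
with dnnlib.util.open_url('checkpoints/acdc_10pct.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)  # frozen generator

z = torch.randn(16, G.z_dim, device=device)      # latent batch
c = torch.zeros(16, G.c_dim, device=device)      # unconditional labels
with torch.no_grad():
    imgs = G(z, c, truncation_psi=0.7, noise_mode='const')  # NCHW in [-1, 1]
```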

Soft-Mix Augmentation

Bidirectional patch exchange with soft edge blending creates smooth anatomical transitions, avoiding sharp boundaries.
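
A minimal sketch of this step, assuming PyTorch tensors; the function name, box coordinates, and blur strength are our own illustrations, not the paper's code.

```python
import torch
import torchvision.transforms.functional as TF

def soft_mix(x_a, x_b, y_a, y_b, box, blur_sigma=4.0):
    """x_*: (N, C, H, W) images; y_*: (N, K, H, W) one-hot or soft masks."""
    n, _, h, w = x_a.shape
    m = torch.zeros(n, 1, h, w, device=x_a.device)
    t, l, bh, bw = box                        # top, left, height, width of patch
    m[:, :, t:t + bh, l:l + bw] = 1.0
    k = int(4 * blur_sigma) | 1               # odd Gaussian kernel size
    m = TF.gaussian_blur(m, kernel_size=k, sigma=blur_sigma)  # soft edges
    x_mix = m * x_b + (1.0 - m) * x_a         # bidirectional: swap a/b for the
    y_mix = m * y_b + (1.0 - m) * y_a         # other mixing direction
    return x_mix, y_mix                       # y_mix is continuous, not one-hot
```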

Similarity-Alignment Loss

SA loss using frozen DINOv2 embeddings pulls synthetic features toward their nearest real counterparts in semantic space.
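
One plausible reading of this loss, sketched below: a frozen DINOv2 backbone (ViT-S/14 via torch.hub is an assumption; the page does not state the variant) pairs each synthetic image with its nearest real image, and the student's synthetic features are pulled toward the paired real features. Grayscale slices would need channel replication, and inputs must have sides divisible by 14.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 feature extractor (model size is an assumption).
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
for p in dino.parameters():
    p.requires_grad_(False)

def sa_loss(feat_syn, feat_real, x_syn, x_real):
    """feat_*: student features (N, D'); x_*: images (N, 3, H, W)."""
    with torch.no_grad():
        e_syn = F.normalize(dino(x_syn), dim=-1)    # (N, D) CLS embeddings
        e_real = F.normalize(dino(x_real), dim=-1)
        idx = (e_syn @ e_real.t()).argmax(dim=1)    # nearest real per synthetic
    f_syn = F.normalize(feat_syn.flatten(1), dim=-1)
    f_real = F.normalize(feat_real.flatten(1), dim=-1).detach()
    # Pull each synthetic feature toward its DINOv2-nearest real feature.
    return (1.0 - (f_syn * f_real[idx]).sum(dim=-1)).mean()
```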

Soft-Segmentation Loss

Soft Dice and Cross-Entropy losses are computed directly on continuous probability maps, respecting the uncertainty in mixed regions.
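
A condensed training-step sketch under the same assumptions, combining the EMA teacher pseudo-labeling described in the abstract with soft Dice and cross-entropy on continuous targets; all names are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1.0 - alpha)

def soft_dice(p, q, eps=1e-6):
    """p: predicted probs, q: soft targets, both (N, K, H, W)."""
    inter = (p * q).sum(dim=(2, 3))
    denom = p.sum(dim=(2, 3)) + q.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def soft_ce(logits, q):
    # Cross-entropy against continuous (soft) targets.
    return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Per step (sketch, reusing soft_mix from above):
#   pseudo = teacher(x_syn).softmax(dim=1)           # soft pseudo-labels
#   x_mix, y_mix = soft_mix(x_lab, x_syn, y_lab, pseudo, box)
#   logits = student(x_mix)
#   loss = soft_ce(logits, y_mix) + soft_dice(logits.softmax(1), y_mix)
#   ema_update(teacher, student)
```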

Synthetic Data Samples

Figure: Synthetic data generated by StyleGAN2-ADA for the ACDC (top) and FIVES (bottom) datasets. Left to right: the first three are original images, the next three were generated using 5% real data, and the last three using 10% real data.

Domain Gap Visualization

Figure: KDE plots showing the domain mismatch between labeled (green) and unlabeled (blue) data: (a) UNet + Real Unlabeled, (b) UNet + Synthetic, (c) BCP + Synthetic, (d) SRA-Seg (Ours). Our method (d) effectively aligns the two distributions.

Soft Edge Blending

Figure: Soft Edge Blending component of SRA-Seg. The labeled and unlabeled images, along with their corresponding segmentation masks, are mixed through cropping and soft mask blending to reduce sharp edges.

Results

ACDC Dataset (Cardiac MRI)

| Method | Labeled (real) | Unlabeled (real) | Synthetic | DICE ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 136 | 0 | 0 | 79.41 | 68.11 | 9.35 | 2.70 |
| BCP (CVPR'23) | 136 (10%) | 1176 (90%) | 0 | 88.84 | 80.62 | 3.98 | 1.17 |
| BCP (CVPR'23) | 136 (10%) | 0 | 1176 (90%) | 87.46 | 78.53 | 5.30 | 1.62 |
| CrossMatch (JBHI'25) | 136 (10%) | 0 | 1176 (90%) | 85.26 | 76.28 | 3.72 | 1.00 |
| ABD(BCP) (CVPR'24) | 136 (10%) | 0 | 1176 (90%) | 87.03 | 77.88 | 3.19 | 0.85 |
| DiffRect (MICCAI'24) | 136 (10%) | 0 | 1176 (90%) | 88.14 | 79.65 | 5.72 | 1.60 |
| CGS (TMI'25) | 136 (10%) | 0 | 1176 (90%) | 87.76 | 78.95 | 3.82 | 1.31 |
| SRA-Seg (Ours) | 136 (10%) | 0 | 1176 (90%) | 89.34 | 81.24 | 3.03 | 1.14 |

FIVES Dataset (Fundus Images)

| Method | Labeled (real) | Unlabeled (real) | Synthetic | DICE ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 56 | 0 | 0 | 59.36 | 43.71 | 13.69 | 2.46 |
| BCP (CVPR'23) | 56 (10%) | 504 (90%) | 0 | 81.87 | 69.35 | 1.85 | 0.15 |
| BCP (CVPR'23) | 56 (10%) | 0 | 504 (90%) | 83.86 | 72.25 | 1.51 | 0.17 |
| CrossMatch (JBHI'25) | 56 (10%) | 0 | 504 (90%) | 60.68 | 43.73 | 3.69 | 0.02 |
| DiffRect (MICCAI'24) | 56 (10%) | 0 | 504 (90%) | 84.22 | 72.79 | 1.41 | 0.16 |
| CGS (TMI'25) | 56 (10%) | 0 | 504 (90%) | 83.55 | 71.80 | 1.59 | 0.17 |
| SRA-Seg (Ours) | 56 (10%) | 0 | 504 (90%) | 84.42 | 73.08 | 1.34 | 0.16 |

Qualitative Results

ACDC Dataset (10% labeled, 90% synthetic). Incorrectly segmented pixels are highlighted in red.

Columns (left to right): Image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg, GT.

FIVES Dataset (10% labeled, 90% synthetic). Incorrectly segmented pixels are highlighted in red.

Columns (left to right): Image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg, GT.

Dice Score Comparison

Figure: Comparison of real-image usage (bar height) and resulting Dice scores (red markers) for BCP versus SRA-Seg on ACDC and FIVES datasets. SRA-Seg achieves higher Dice with only 10% real data.

FID Scores

Figure: FID scores comparing synthetic and real images across the 5% and 10% data splits. Lower FID indicates better synthetic data quality; FIVES achieves consistently lower FID scores.

Ablation Study

Each component of SRA-Seg contributes to the final performance (✓ = component enabled):

| Soft-Mix | Soft-Loss | SA-Loss | ACDC DICE ↑ | FIVES DICE ↑ |
|---|---|---|---|---|
|  |  |  | 87.46 | 83.86 |
| ✓ |  |  | 88.16 | 84.08 |
| ✓ | ✓ |  | 88.96 | 84.16 |
| ✓ | ✓ | ✓ | 89.33 | 84.42 |

BibTeX

@article{aranya2025sraseg,
  title={SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation},
  author={Aranya, OFM Riaz Rahman and Desai, Kevin},
  journal={arXiv preprint},
  year={2025}
}

Acknowledgments

This work builds upon BCP and utilizes StyleGAN2-ADA for synthetic data generation and DINOv2 for feature extraction.