Under Review

SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation

University of Texas at San Antonio
89.34% Dice on ACDC · 84.42% Dice on FIVES · 10% real data required · 90% synthetic data
SRA-Seg Overview

Figure: Overview of the proposed SRA-Seg method. The framework integrates synthetic images generated by StyleGAN2-ADA with real labeled data through soft-mix augmentation, EMA-based pseudo-labeling, and similarity-alignment loss using frozen DINOv2 embeddings.

Abstract

Synthetic data is an appealing alternative to extensive expert-annotated data for medical image segmentation, yet despite its visual realism it consistently fails to improve segmentation performance. This is because synthetic and real medical images occupy different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss that uses frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries of traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft-segmentation losses that respect the uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching methods that use real unlabeled data.

Key Contributions

  • A framework explicitly designed to bridge the synthetic-to-real domain gap for semi-supervised medical image segmentation
  • A similarity-alignment (SA) loss using frozen DINOv2 embeddings to pull synthetic features toward real counterparts
  • Soft edge blending for smooth anatomical transitions, replacing hard copy-paste boundaries

Method

Synthetic Data Generation

High-fidelity synthetic images are generated with StyleGAN2-ADA trained on the limited real labeled data (5% or 10% of the training set).
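
As a rough illustration, the snippet below samples a batch from a trained generator using the helpers from the NVlabs stylegan2-ada-pytorch repository; the checkpoint path, batch size, and truncation value are hypothetical, not taken from the paper.

```python
import torch
import dnnlib   # from the NVlabs stylegan2-ada-pytorch repo
import legacy   # from the NVlabs stylegan2-ada-pytorch repo

device = torch.device('cuda')
# Hypothetical checkpoint trained on the 10% labeled split.
with dnnlib.util.open_url('checkpoints/acdc_10pct.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)  # frozen generator

z = torch.randn(16, G.z_dim, device=device)      # latent batch
c = torch.zeros(16, G.c_dim, device=device)      # unconditional labels
with torch.no_grad():
    imgs = G(z, c, truncation_psi=0.7, noise_mode='const')  # NCHW in [-1, 1]
```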

Soft-Mix Augmentation

Bidirectional patch exchange with soft edge blending creates smooth anatomical transitions, avoiding sharp boundaries.
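
A minimal sketch of this step, assuming PyTorch tensors; the function name, box coordinates, and blur strength are our own illustrations, not the paper's code.

```python
import torch
import torchvision.transforms.functional as TF

def soft_mix(x_a, x_b, y_a, y_b, box, blur_sigma=4.0):
    """x_*: (N, C, H, W) images; y_*: (N, K, H, W) one-hot or soft masks."""
    n, _, h, w = x_a.shape
    m = torch.zeros(n, 1, h, w, device=x_a.device)
    t, l, bh, bw = box                        # top, left, height, width of patch
    m[:, :, t:t + bh, l:l + bw] = 1.0
    k = int(4 * blur_sigma) | 1               # odd Gaussian kernel size
    m = TF.gaussian_blur(m, kernel_size=k, sigma=blur_sigma)  # soft edges
    x_mix = m * x_b + (1.0 - m) * x_a         # bidirectional: swap a/b for the
    y_mix = m * y_b + (1.0 - m) * y_a         # other mixing direction
    return x_mix, y_mix                       # y_mix is continuous, not one-hot
```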

Similarity-Alignment Loss

SA loss using frozen DINOv2 embeddings pulls synthetic features toward their nearest real counterparts in semantic space.
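
One plausible reading of this loss, sketched below: a frozen DINOv2 backbone (ViT-S/14 via torch.hub is an assumption; the page does not state the variant) pairs each synthetic image with its nearest real image, and the student's synthetic features are pulled toward the paired real features. Grayscale slices would need channel replication, and inputs must have sides divisible by 14.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 feature extractor (model size is an assumption).
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
for p in dino.parameters():
    p.requires_grad_(False)

def sa_loss(feat_syn, feat_real, x_syn, x_real):
    """feat_*: student features (N, D'); x_*: images (N, 3, H, W)."""
    with torch.no_grad():
        e_syn = F.normalize(dino(x_syn), dim=-1)    # (N, D) CLS embeddings
        e_real = F.normalize(dino(x_real), dim=-1)
        idx = (e_syn @ e_real.t()).argmax(dim=1)    # nearest real per synthetic
    f_syn = F.normalize(feat_syn.flatten(1), dim=-1)
    f_real = F.normalize(feat_real.flatten(1), dim=-1).detach()
    # Pull each synthetic feature toward its DINOv2-nearest real feature.
    return (1.0 - (f_syn * f_real[idx]).sum(dim=-1)).mean()
```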

Soft-Segmentation Loss

Soft Dice and Cross-Entropy losses are computed directly on continuous probability maps, respecting the uncertainty in mixed regions.
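
A condensed training-step sketch under the same assumptions, combining the EMA teacher pseudo-labeling described in the abstract with soft Dice and cross-entropy on continuous targets; all names are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1.0 - alpha)

def soft_dice(p, q, eps=1e-6):
    """p: predicted probs, q: soft targets, both (N, K, H, W)."""
    inter = (p * q).sum(dim=(2, 3))
    denom = p.sum(dim=(2, 3)) + q.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def soft_ce(logits, q):
    # Cross-entropy against continuous (soft) targets.
    return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Per step (sketch, reusing soft_mix from above):
#   pseudo = teacher(x_syn).softmax(dim=1)           # soft pseudo-labels
#   x_mix, y_mix = soft_mix(x_lab, x_syn, y_lab, pseudo, box)
#   logits = student(x_mix)
#   loss = soft_ce(logits, y_mix) + soft_dice(logits.softmax(1), y_mix)
#   ema_update(teacher, student)
```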

Synthetic Data Samples

Figure: Synthetic data generated by StyleGAN2-ADA for the ACDC (top) and FIVES (bottom) datasets. Left to right: the first three are original images, the next three were generated using 5% real data, and the last three using 10% real data.

Domain Gap Visualization

Figure: KDE plots showing the domain mismatch between labeled (green) and unlabeled (blue) data: (a) UNet + Real Unlabeled, (b) UNet + Synthetic, (c) BCP + Synthetic, (d) SRA-Seg (Ours). Our method (d) effectively aligns the two distributions.

Soft Edge Blending

Figure: Soft Edge Blending component of SRA-Seg. The labeled and unlabeled images, along with their corresponding segmentation masks, are mixed through cropping and soft mask blending to reduce sharp edges.

Results

ACDC Dataset (Cardiac MRI)

| Method | Labeled (real) | Unlabeled (real) | Synthetic | DICE ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 136 | 0 | 0 | 79.41 | 68.11 | 9.35 | 2.70 |
| BCP (CVPR'23) | 136 (10%) | 1176 (90%) | 0 | 88.84 | 80.62 | 3.98 | 1.17 |
| BCP (CVPR'23) | 136 (10%) | 0 | 1176 (90%) | 87.46 | 78.53 | 5.30 | 1.62 |
| CrossMatch (JBHI'25) | 136 (10%) | 0 | 1176 (90%) | 85.26 | 76.28 | 3.72 | 1.00 |
| ABD(BCP) (CVPR'24) | 136 (10%) | 0 | 1176 (90%) | 87.03 | 77.88 | 3.19 | 0.85 |
| DiffRect (MICCAI'24) | 136 (10%) | 0 | 1176 (90%) | 88.14 | 79.65 | 5.72 | 1.60 |
| CGS (TMI'25) | 136 (10%) | 0 | 1176 (90%) | 87.76 | 78.95 | 3.82 | 1.31 |
| SRA-Seg (Ours) | 136 (10%) | 0 | 1176 (90%) | 89.34 | 81.24 | 3.03 | 1.14 |

FIVES Dataset (Fundus Images)

| Method | Labeled (real) | Unlabeled (real) | Synthetic | DICE ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 56 | 0 | 0 | 59.36 | 43.71 | 13.69 | 2.46 |
| BCP (CVPR'23) | 56 (10%) | 504 (90%) | 0 | 81.87 | 69.35 | 1.85 | 0.15 |
| BCP (CVPR'23) | 56 (10%) | 0 | 504 (90%) | 83.86 | 72.25 | 1.51 | 0.17 |
| CrossMatch (JBHI'25) | 56 (10%) | 0 | 504 (90%) | 60.68 | 43.73 | 3.69 | 0.02 |
| DiffRect (MICCAI'24) | 56 (10%) | 0 | 504 (90%) | 84.22 | 72.79 | 1.41 | 0.16 |
| CGS (TMI'25) | 56 (10%) | 0 | 504 (90%) | 83.55 | 71.80 | 1.59 | 0.17 |
| SRA-Seg (Ours) | 56 (10%) | 0 | 504 (90%) | 84.42 | 73.08 | 1.34 | 0.16 |

Qualitative Results

ACDC Dataset (10% labeled, 90% synthetic). Incorrectly segmented pixels are highlighted in red.

Columns (left to right): Image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg, GT.

FIVES Dataset (10% labeled, 90% synthetic). Incorrectly segmented pixels are highlighted in red.

Columns (left to right): Image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg, GT.

Dice Score Comparison

Figure: Comparison of real-image usage (bar height) and resulting Dice scores (red markers) for BCP versus SRA-Seg on ACDC and FIVES datasets. SRA-Seg achieves higher Dice with only 10% real data.

FID Scores

Figure: FID scores comparing synthetic and real images across the 5% and 10% data splits. Lower FID indicates better synthetic data quality; FIVES achieves consistently lower FID scores.

Ablation Study

Each component of SRA-Seg contributes to the final performance (✓ = component enabled):

| Soft-Mix | Soft-Loss | SA-Loss | ACDC DICE ↑ | FIVES DICE ↑ |
|---|---|---|---|---|
|  |  |  | 87.46 | 83.86 |
| ✓ |  |  | 88.16 | 84.08 |
| ✓ | ✓ |  | 88.96 | 84.16 |
| ✓ | ✓ | ✓ | 89.33 | 84.42 |

BibTeX

@article{aranya2025sraseg,
  title={SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation},
  author={Aranya, OFM Riaz Rahman and Desai, Kevin},
  journal={arXiv preprint},
  year={2025}
}

Acknowledgments

This work builds upon BCP and utilizes StyleGAN2-ADA for synthetic data generation and DINOv2 for feature extraction.