Synthetic data is an appealing alternative to extensive expert-annotated data for medical image segmentation, yet it consistently fails to improve segmentation performance despite its visual realism. The reason is that synthetic and real medical images occupy different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss that uses frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries of traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft segmentation losses that respect uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching the performance of methods trained with real unlabeled data.
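The training loop pairs a student network with an EMA teacher that pseudo-labels the synthetic images. A minimal sketch of that step, assuming a standard EMA decay and per-pixel softmax pseudo-labels (both illustrative defaults, not the paper's confirmed settings):

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track an exponential moving average of the student's;
    # alpha=0.99 is a common default, not necessarily the paper's value.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

@torch.no_grad()
def pseudo_label(teacher, synthetic_batch):
    # Soft pseudo-labels for synthetic images: the teacher's per-pixel class
    # probabilities, kept continuous rather than argmax-ed.
    return torch.softmax(teacher(synthetic_batch), dim=1)
```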
- High-fidelity synthetic images are generated with StyleGAN2-ADA trained on the limited real labeled data (5% or 10%).
- Bidirectional patch exchange with soft edge blending creates smooth anatomical transitions, avoiding sharp copy-paste boundaries (sketched after the blending figure below).
- The SA loss uses frozen DINOv2 embeddings to pull synthetic features toward their nearest real counterparts in semantic space (see the sketch after this list).
- Soft Dice and cross-entropy losses are computed directly on continuous probability maps, respecting uncertainty in mixed regions (see the sketch after the ablation table).
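A minimal sketch of the SA loss, assuming DINOv2 ViT-S/14 as the frozen extractor and a cosine pull between the student encoder's features for each synthetic image and those of its DINOv2-nearest real image. The variant, the feature pooling, and the distance are assumptions for illustration, not the released implementation; `encoder` stands for the student's feature extractor.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 used purely as a semantic feature extractor; the ViT-S/14
# variant is an assumption (the paper does not pin the variant here).
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
for p in dino.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def nearest_real_index(syn_imgs, real_imgs):
    # DINOv2 expects 3-channel inputs with H, W divisible by 14;
    # grayscale slices would need channel replication beforehand.
    z_s = F.normalize(dino(syn_imgs), dim=1)    # (Bs, D) CLS embeddings
    z_r = F.normalize(dino(real_imgs), dim=1)   # (Br, D)
    return (z_s @ z_r.t()).argmax(dim=1)        # closest real per synthetic

def sa_loss(encoder, syn_imgs, real_imgs):
    # Pull the student's synthetic features toward the encoder features of
    # the DINOv2-nearest real image (one plausible reading of the SA loss).
    idx = nearest_real_index(syn_imgs, real_imgs)
    f_s = F.normalize(encoder(syn_imgs).flatten(1), dim=1)
    with torch.no_grad():
        f_r = F.normalize(encoder(real_imgs[idx]).flatten(1), dim=1)
    return (1.0 - (f_s * f_r).sum(dim=1)).mean()   # mean cosine distance
```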
Figure: Synthetic data generated by StyleGAN2-ADA for the ACDC (top) and FIVES (bottom) datasets. Left to right: the first three are original images, the next three are generated using 5% real data, and the last three using 10% real data.
Figure: KDE plots showing the domain mismatch between labeled (green) and unlabeled (blue) data for (a) UNet + Real Unlabeled, (b) UNet + Synthetic, (c) BCP + Synthetic, and (d) SRA-Seg (Ours). Our method (d) effectively aligns the two distributions.
Figure: Soft Edge Blending component of SRA-Seg. The labeled and unlabeled images, as well as the corresponding segmentation masks, are mixed through cropping and soft mask blending to reduce sharp edges.
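A minimal sketch of one plausible blending step, assuming a central crop and a box-filter blur of the paste mask (the paper's crop placement and smoothing kernel are not specified here); applying it in both directions yields the bidirectional exchange.

```python
import torch
import torch.nn.functional as F

def soft_edge_blend(img_a, img_b, lbl_a, lbl_b, crop_frac=0.5, k=9):
    # img_*: (B, C, H, W) images; lbl_*: (B, K, H, W) one-hot masks.
    B, _, H, W = img_a.shape
    h, w = int(H * crop_frac), int(W * crop_frac)
    top, left = (H - h) // 2, (W - w) // 2

    # Hard paste mask: 1 inside the exchanged central patch.
    m = torch.zeros(1, 1, H, W, device=img_a.device)
    m[..., top:top + h, left:left + w] = 1.0

    # Box-blur the mask so the patch border becomes a gradual transition
    # instead of a hard copy-paste edge.
    box = torch.ones(1, 1, k, k, device=img_a.device) / (k * k)
    m = F.conv2d(m, box, padding=k // 2).clamp(0.0, 1.0)

    mixed = m * img_b + (1.0 - m) * img_a
    soft_lbl = m * lbl_b + (1.0 - m) * lbl_a   # continuous labels in [0, 1]
    return mixed, soft_lbl
```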
Table: Quantitative results on the ACDC dataset (10% labeled, 90% synthetic).

| Method | Labeled | Real Unlabeled | Synthetic | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 136 (10%) | 0 | 0 | 79.41 | 68.11 | 9.35 | 2.70 |
| BCP (CVPR'23) | 136 (10%) | 1176 (90%) | 0 | 88.84 | 80.62 | 3.98 | 1.17 |
| BCP (CVPR'23) | 136 (10%) | 0 | 1176 (90%) | 87.46 | 78.53 | 5.30 | 1.62 |
| CrossMatch (JBHI'25) | 136 (10%) | 0 | 1176 (90%) | 85.26 | 76.28 | 3.72 | 1.00 |
| ABD(BCP) (CVPR'24) | 136 (10%) | 0 | 1176 (90%) | 87.03 | 77.88 | 3.19 | 0.85 |
| DiffRect (MICCAI'24) | 136 (10%) | 0 | 1176 (90%) | 88.14 | 79.65 | 5.72 | 1.60 |
| CGS (TMI'25) | 136 (10%) | 0 | 1176 (90%) | 87.76 | 78.95 | 3.82 | 1.31 |
| SRA-Seg (Ours) | 136 (10%) | 0 | 1176 (90%) | 89.34 | 81.24 | 3.03 | 1.14 |

Table: Quantitative results on the FIVES dataset (10% labeled, 90% synthetic).

| Method | Labeled | Real Unlabeled | Synthetic | Dice ↑ | Jaccard ↑ | 95HD ↓ | ASD ↓ |
|---|---|---|---|---|---|---|---|
| UNet | 56 (10%) | 0 | 0 | 59.36 | 43.71 | 13.69 | 2.46 |
| BCP (CVPR'23) | 56 (10%) | 504 (90%) | 0 | 81.87 | 69.35 | 1.85 | 0.15 |
| BCP (CVPR'23) | 56 (10%) | 0 | 504 (90%) | 83.86 | 72.25 | 1.51 | 0.17 |
| CrossMatch (JBHI'25) | 56 (10%) | 0 | 504 (90%) | 60.68 | 43.73 | 3.69 | 0.02 |
| DiffRect (MICCAI'24) | 56 (10%) | 0 | 504 (90%) | 84.22 | 72.79 | 1.41 | 0.16 |
| CGS (TMI'25) | 56 (10%) | 0 | 504 (90%) | 83.55 | 71.80 | 1.59 | 0.17 |
| SRA-Seg (Ours) | 56 (10%) | 0 | 504 (90%) | 84.42 | 73.08 | 1.34 | 0.16 |

Figure: Qualitative comparison on the ACDC dataset (10% labeled, 90% synthetic). Columns, left to right: input image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg (Ours), and ground truth (GT). Incorrectly segmented pixels are highlighted in red.
Figure: Qualitative comparison on the FIVES dataset (10% labeled, 90% synthetic). Columns, left to right: input image, UNet, BCP, CrossMatch, ABD, DiffRect, CGS, SRA-Seg (Ours), and ground truth (GT). Incorrectly segmented pixels are highlighted in red.
Figure: Comparison of real-image usage (bar height) and resulting Dice scores (red markers) for BCP versus SRA-Seg on ACDC and FIVES datasets. SRA-Seg achieves higher Dice with only 10% real data.
Figure: FID scores comparing synthetic and real images across 5% and 10% data splits. Lower FID indicates better synthetic data quality. FIVES achieves consistently lower FID scores.
Each component of SRA-Seg contributes to the final performance:
| Soft-Mix | Soft-Loss | SA-Loss | ACDC Dice ↑ | FIVES Dice ↑ |
|---|---|---|---|---|
| ✗ | ✗ | ✗ | 87.46 | 83.86 |
| ✓ | ✗ | ✗ | 88.16 | 84.08 |
| ✓ | ✓ | ✗ | 88.96 | 84.16 |
| ✓ | ✓ | ✓ | 89.33 | 84.42 |
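For concreteness, the Soft-Loss component can be read as Dice and cross-entropy evaluated against the continuous labels that soft edge blending produces; a minimal sketch under that assumption (the exact loss forms are not copied from the paper):

```python
import torch

def soft_dice_loss(probs, soft_targets, eps=1e-6):
    # probs:        (B, K, H, W) softmax outputs of the student.
    # soft_targets: (B, K, H, W) continuous labels from soft edge blending;
    #               values in [0, 1] rather than one-hot, so uncertainty in
    #               blended transition regions enters the loss directly.
    dims = (0, 2, 3)
    inter = (probs * soft_targets).sum(dims)
    denom = probs.sum(dims) + soft_targets.sum(dims)
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def soft_ce_loss(probs, soft_targets, eps=1e-8):
    # Cross-entropy against continuous targets (standard soft-label CE,
    # computed here from probabilities rather than logits).
    return -(soft_targets * probs.clamp_min(eps).log()).sum(dim=1).mean()
```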
@article{aranya2025sraseg,
title={SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation},
author={Aranya, OFM Riaz Rahman and Desai, Kevin},
journal={arXiv preprint},
year={2025}
}
This work builds upon BCP and utilizes StyleGAN2-ADA for synthetic data generation and DINOv2 for feature extraction.