Multi-Modal Image Segmentation for UAV and Remote Sensing
Satellite and UAV imagery are widely used for road and land-cover segmentation. Satellite data provide global coverage with diverse spectral channels, while UAV imagery offers very high spatial resolution; additional modalities such as multispectral bands, LiDAR/DSM, and radar contribute complementary information. Recent foundation models such as AnySat (CVPR 2025) can unify heterogeneous inputs across resolutions and modalities, but they do not explicitly analyze the relative importance of each modality or channel, which limits their interpretability for sensor selection.
This work aims to develop a multi-modal segmentation framework that fuses satellite and UAV data while learning and quantifying modality and channel importance. The expected outcome is improved segmentation accuracy and interpretable insights into which data sources contribute most to segmentation tasks.
Tasks to be performed by the student will include:
· Present a literature review of image segmentation methods in satellite and UAV remote sensing, with emphasis on multi-modal fusion strategies.
· Study preprocessing and co-registration techniques for multi-modal datasets (e.g., optical RGB, multispectral, LiDAR/DSM, radar).
· Develop segmentation models that incorporate multi-modal and multi-scale feature fusion (early fusion, feature-level fusion with attention, late fusion).
· Design attention-based or gating mechanisms to estimate modality and channel importance in segmentation tasks (a minimal gated-fusion sketch follows this list).
· Evaluate segmentation performance on benchmark datasets (e.g., ISPRS Potsdam/Vaihingen, UAVid, SpaceNet+DSM) using metrics such as mIoU and F1-score (an mIoU sketch follows this list).
· Conduct ablation studies and visualizations (e.g., attention maps, modality-drop experiments) to analyze and interpret modality and channel contributions (a modality-drop sketch also follows).
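As a concrete starting point for the fusion and gating tasks above, the sketch below shows one possible feature-level fusion module with softmax-gated per-modality weights that can be read out as importance scores. It is a minimal illustration under assumed names and sizes (the class GatedModalityFusion and the fused_channels width are choices made here for clarity), not a prescribed design:

```python
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    """Feature-level fusion with learned, inspectable modality gates.

    A minimal sketch: each modality's feature map is projected to a shared
    width, a softmax over learnable logits gives per-modality weights that
    sum to one, and the weighted sum is the fused representation. The
    weights double as coarse modality-importance scores.
    """

    def __init__(self, in_channels, fused_channels=64):
        super().__init__()
        self.projs = nn.ModuleList(
            [nn.Conv2d(c, fused_channels, kernel_size=1) for c in in_channels]
        )
        # One learnable logit per modality; softmax keeps weights comparable.
        self.modality_logits = nn.Parameter(torch.zeros(len(in_channels)))

    def forward(self, feats):
        weights = torch.softmax(self.modality_logits, dim=0)
        projected = [proj(f) for proj, f in zip(self.projs, feats)]
        return sum(w * f for w, f in zip(weights, projected))

    def importance(self):
        """Current per-modality weights, readable after training."""
        return torch.softmax(self.modality_logits, dim=0).detach()


# Hypothetical usage: co-registered RGB, multispectral, and DSM features.
fusion = GatedModalityFusion(in_channels=[3, 5, 1])
rgb = torch.randn(2, 3, 64, 64)
ms = torch.randn(2, 5, 64, 64)
dsm = torch.randn(2, 1, 64, 64)
fused = fusion([rgb, ms, dsm])   # -> shape (2, 64, 64, 64)
print(fusion.importance())       # uniform before training, e.g. [1/3, 1/3, 1/3]
```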
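For the evaluation task, mIoU can be computed from a confusion matrix accumulated over the test set. The sketch below is one common formulation, assuming flat integer label maps and that labels outside the valid class range are ignored:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from integer label arrays.

    A minimal sketch: build a confusion matrix, then average per-class
    IoU = TP / (TP + FP + FN), skipping classes absent from both maps.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = (target >= 0) & (target < num_classes)
    cm = np.bincount(
        num_classes * target[valid] + pred[valid],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    iou = tp / np.maximum(denom, 1)
    return float(iou[denom > 0].mean())
```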
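The modality-drop analysis can be organized as a loop that zeroes one input modality at a time and records the score drop against the full-input baseline. The sketch below rests on assumptions made here for illustration: a hypothetical model that consumes a dict of modality tensors, batches that carry a "label" tensor, and a caller-supplied score_fn such as the mIoU above:

```python
import torch

@torch.no_grad()
def modality_drop_scores(model, loader, modalities, score_fn):
    """Importance via modality-drop: zero one input modality at a time
    and record the drop in a segmentation score such as mIoU.

    A minimal sketch, assuming each batch is a dict with one tensor per
    modality plus a "label" tensor, and score_fn(preds, labels) -> float.
    """
    model.eval()

    def run(drop=None):
        preds, labels = [], []
        for batch in loader:
            inputs = {m: batch[m].clone() for m in modalities}
            if drop is not None:
                inputs[drop].zero_()          # blank the dropped modality
            preds.append(model(inputs).argmax(dim=1))
            labels.append(batch["label"])
        return score_fn(torch.cat(preds), torch.cat(labels))

    baseline = run()
    results = {"baseline": baseline}
    for m in modalities:
        results[m] = baseline - run(drop=m)   # larger drop = more important
    return results
```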
Note: The applicant will use publicly available datasets and will receive technical support from SZTAKI.
Supervisor at the department: Dr. Chang Liu, Assistant Professor.
External supervisor: Prof. Tamás Szirányi, HUN-REN SZTAKI.