Multi-Modal Surface Representation Learning for UAV and Satellite Imagery
Description:
In complex environments, relying on a single data source is often insufficient for accurate road or target extraction. This research integrates UAV imagery, satellite embeddings (AlphaEarth Foundations), LiDAR, and climate data for multi-modal fusion. The goal is to explore surface representation learning methods that combine global and local information, improving road detection accuracy while supporting disaster monitoring and environmental assessment.
Objectives:
Multi-Modal Feature Fusion:
Fuse high-resolution UAV imagery, satellite embeddings (AlphaEarth Foundations), and complementary environmental data (LiDAR, climate).
Use Transformer/Attention architectures to jointly model global and local features (see the cross-attention fusion sketch after this list).
Spatio-Temporal Modeling and Dynamic Prediction:
Integrate temporal sequence information to predict road changes or disaster impact (see the temporal modeling sketch after this list).
Enable dynamic road or target detection.
Visualization and Application:
Build an online visualization tool for real-time display and analysis of multi-modal data.
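For illustration, the following is a minimal PyTorch sketch of the cross-attention fusion idea: local UAV patch tokens (from any backbone) attend to coarse satellite embedding vectors such as AlphaEarth as global context. The module name `CrossModalFusion` and all dimensions are illustrative assumptions, not a fixed design.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse local UAV tokens with global satellite-embedding tokens via cross-attention."""
    def __init__(self, uav_dim=256, sat_dim=64, d_model=256, num_heads=8):
        super().__init__()
        self.uav_proj = nn.Linear(uav_dim, d_model)   # project UAV backbone features
        self.sat_proj = nn.Linear(sat_dim, d_model)   # project satellite embedding vectors
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, uav_tokens, sat_tokens):
        # uav_tokens: (B, N_uav, uav_dim) local patch features from a UAV tile
        # sat_tokens: (B, N_sat, sat_dim) coarse embeddings covering the same footprint
        q = self.uav_proj(uav_tokens)
        kv = self.sat_proj(sat_tokens)
        attended, _ = self.cross_attn(query=q, key=kv, value=kv)  # local queries attend to global context
        x = self.norm(q + attended)
        return x + self.ffn(x)

# Toy usage with random tensors standing in for real features.
fusion = CrossModalFusion()
uav = torch.randn(2, 1024, 256)   # e.g. a 32x32 patch grid from a UAV tile
sat = torch.randn(2, 64, 64)      # e.g. an 8x8 grid of 64-d satellite embeddings
print(fusion(uav, sat).shape)     # torch.Size([2, 1024, 256])
```

The fused tokens can then feed a road-segmentation or target-detection head.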
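A comparable hedged sketch of the temporal modeling step, assuming one fused feature vector per acquisition date is available; `TemporalChangeModel`, the learned time embedding, and the single-score head are assumptions chosen only to show the sequence-encoding pattern.

```python
import torch
import torch.nn as nn

class TemporalChangeModel(nn.Module):
    """Encode a sequence of per-date fused features and predict a change score."""
    def __init__(self, d_model=256, num_heads=8, num_layers=2, max_steps=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.time_embed = nn.Embedding(max_steps, d_model)  # simple learned time encoding
        self.head = nn.Linear(d_model, 1)                   # e.g. road-change / damage score

    def forward(self, seq):
        # seq: (B, T, d_model), one fused feature vector per acquisition date
        t = torch.arange(seq.size(1), device=seq.device)
        h = self.encoder(seq + self.time_embed(t))          # self-attention across dates
        return self.head(h[:, -1])                          # predict from the most recent state

model = TemporalChangeModel()
seq = torch.randn(4, 6, 256)      # 4 tiles, 6 acquisition dates, 256-d fused features
print(model(seq).shape)           # torch.Size([4, 1])
```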
Innovation:
Combines AlphaEarth embeddings as global context with high-resolution UAV imagery to learn novel multi-modal surface representations (see the alignment sketch below).
Applicable to environmental monitoring, flood warning, and disaster response scenarios.
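As a concrete note on combining the two scales, the sketch below assumes a coarse embedding raster (here 64 channels on a 32x32 grid, an assumption about the product layout) that is already co-registered with a UAV tile; it simply upsamples the embeddings to the UAV pixel grid and stacks them as extra channels before any fusion model.

```python
import torch
import torch.nn.functional as F

# Assume both rasters are already georeferenced and cropped to the same footprint.
uav_tile = torch.randn(1, 3, 2048, 2048)   # RGB UAV tile at native resolution
sat_embed = torch.randn(1, 64, 32, 32)     # coarse embedding raster (64 channels assumed)

# Upsample the coarse embeddings to the UAV pixel grid and stack them as extra channels.
sat_up = F.interpolate(sat_embed, size=uav_tile.shape[-2:], mode="bilinear", align_corners=False)
stacked = torch.cat([uav_tile, sat_up], dim=1)
print(stacked.shape)              # torch.Size([1, 67, 2048, 2048])
```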
Expected Contributions:
Improved road and target detection accuracy in complex environments.
A multi-modal solution for Earth observation and disaster monitoring.
Potential for publication in high-impact remote sensing or computer vision journals/conferences.
Requirements and Technical Skills:
Proficiency in Python and experience with TensorFlow/PyTorch.
Familiarity with Transformer/Attention architectures and multi-modal fusion methods.
Basic experience in remote sensing data processing and visualization.