Preprocessing cloud-contaminated satellite data: Balancing samples across cloud levels for super-resolution
Keywords:
balanced sampling, preprocessing, multi-sensor, data diversity, cloud-aware
Abstract
Satellite imagery is used in data-driven applications such as environmental monitoring, agriculture, and disaster management. However, multispectral imagery suffers from cloud contamination that reduces quality and consistency, whereas radar data are unaffected by cloud but have different signal characteristics. This study introduces a reproducible preprocessing pipeline for super-resolution that quantifies cloud levels and applies sampling strategies to balance representation across low, medium, and high cloud contamination. Because super-resolution models are typically trained on clean multispectral imagery yet must operate on cloud-contaminated inputs, cloud-level imbalance can bias learning and destabilise optimisation. The pipeline was tested on the SEN12MS-CR dataset, which includes multispectral Sentinel-2 imagery and radar Sentinel-1 data, with image degradation applied to simulate low-resolution inputs for super-resolution tasks. Weighted sampling balanced cloud categories, preserved spectral diversity, and improved repeat-level stability, while random sampling produced pronounced imbalance and variability. This study extends existing preprocessing research by quantifying how cloud-level imbalance influences entropy, variance, and repeat-level stability, factors that are rarely examined together in a super-resolution workflow. By ensuring controlled cloud-level representation and preserving diversity, the proposed approach improves dataset representativeness and strengthens the robustness of super-resolution applications.
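The weighted sampling idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cloud-fraction thresholds (0.3 and 0.7), and the choice of inverse-frequency weights with sampling with replacement are all assumptions made for illustration. Each sample is assigned to a low, medium, or high cloud category, and then drawn with probability inversely proportional to the size of its category, so that each non-empty category is expected to contribute an equal share.

```python
import numpy as np

def cloud_category(cloud_fraction, low_max=0.3, high_min=0.7):
    """Map cloud fractions in [0, 1] to 0 (low), 1 (medium), 2 (high).

    The thresholds are hypothetical; the abstract does not state them.
    """
    cf = np.asarray(cloud_fraction, dtype=float)
    # digitize: cf < low_max -> 0, low_max <= cf < high_min -> 1, else 2
    return np.digitize(cf, [low_max, high_min])

def balanced_weights(categories, n_categories=3):
    """Inverse-frequency sampling probabilities, normalised to sum to 1."""
    counts = np.bincount(categories, minlength=n_categories)
    # Empty categories get weight 0; the maximum() guards the division.
    per_category = np.where(counts > 0, 1.0 / np.maximum(counts, 1), 0.0)
    weights = per_category[categories]
    return weights / weights.sum()

def weighted_sample(cloud_fraction, n_samples, seed=0):
    """Draw sample indices (with replacement, an assumption) so that
    each non-empty cloud category has equal expected representation."""
    categories = cloud_category(cloud_fraction)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(categories), size=n_samples, replace=True,
                     p=balanced_weights(categories))
    return idx, categories[idx]
```

Fixing the seed makes a single draw reproducible; repeating the draw over several seeds and comparing the per-category shares is one simple way to inspect repeat-level variability against uniform random sampling.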
Copyright (c) 2025 International Research Journal on Innovations in Engineering, Science and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.