To help Ukraine, Berkeley AI researchers provide machine learning methods and pretrained models that let analysts use RGB and SAR imagery interchangeably

It has become difficult to extract actionable insights by manually processing the hundreds of terabytes of data downlinked from satellites to data centers.

Synthetic Aperture Radar (SAR) imagery is a form of active remote sensing in which a satellite sends pulses of microwave radar down to the Earth’s surface. These signals reflect off the Earth and any objects on it and return to the satellite. A SAR image is formed by processing these pulses in time and space, with each pixel representing the superposition of several radar scatterers. Because the satellite actively generates its own illumination, radar waves penetrate clouds and can image the Earth’s surface even at night.

SAR images, however, exhibit effects that can be counterintuitive and are poorly matched to modern computer vision systems. Three common effects are polarization, layover, and multipath.

  • Radar pulses are transmitted and received in horizontal or vertical polarization, and surfaces return each polarization differently, so SAR channels have no direct analogue in RGB imagery.
  • The layover effect occurs when radar beams reach the top of a tall structure before reaching its base. This makes the top of the object appear to fold over and overlap its bottom.
  • When radar waves reflect off objects on the ground and bounce several times before returning to the SAR sensor, the result is multipath. Multipath effects cause the same scene element to appear at several displaced positions in the final image.

Existing computer vision approaches based on typical RGB images are not designed to account for these effects. Current techniques can be applied to SAR imagery, but with lower performance and systematic errors that only a SAR-specific approach can address.

During the current invasion of Ukraine, satellite imagery is a key source of intelligence. Many types of satellite imagery cannot observe the ground in Ukraine because of heavy cloud cover, and attacks frequently occur at night. Cloud-piercing Synthetic Aperture Radar (SAR) imagery is available, but it requires expertise to interpret. Imagery analysts are forced to rely on manual analysis, which is time-consuming and error-prone. Automating this procedure would allow real-time analysis, but current computer vision approaches based on RGB images do not sufficiently account for the phenomenology of SAR.

To overcome these issues, the Berkeley AI Research team developed an initial set of algorithms and models that learn robust representations for RGB, SAR, and co-registered RGB+SAR images. The researchers used the publicly available BigEarthNet-MM dataset and data from Capella’s Open Data, which includes both RGB and SAR imagery. With these models, imagery analysts can now interchangeably use RGB, SAR, or co-registered RGB+SAR imagery for downstream tasks such as image classification, semantic segmentation, object detection, and change detection.
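The interchangeability described here can be pictured as a shared pretrained encoder sitting behind per-modality input projections, with a small task head on top. The following numpy sketch reduces every component to a single linear map purely for illustration; all names, dimensions, and the 19-class output (matching BigEarthNet's 19-class nomenclature) are assumptions, not the released models' actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, N_CLASSES = 64, 19  # illustrative sizes; 19 classes as in BigEarthNet

# Stand-ins for pretrained pieces: per-modality input projections
# (RGB: 3 channels, SAR: 2 polarizations, RGB+SAR: 5) and one shared encoder.
input_proj = {"rgb": rng.normal(size=(3, EMBED_DIM)),
              "sar": rng.normal(size=(2, EMBED_DIM)),
              "rgb+sar": rng.normal(size=(5, EMBED_DIM))}
encoder = rng.normal(size=(EMBED_DIM, EMBED_DIM))

# A single task head trained downstream, shared across all modalities.
head = rng.normal(size=(EMBED_DIM, N_CLASSES))

def classify(pixels: np.ndarray, modality: str) -> np.ndarray:
    """Project into the shared space, encode, mean-pool, and score classes."""
    tokens = np.tanh(pixels @ input_proj[modality] @ encoder)
    return tokens.mean(axis=0) @ head

# The same encoder and head accept RGB, SAR, or co-registered RGB+SAR input.
for modality, channels in [("rgb", 3), ("sar", 2), ("rgb+sar", 5)]:
    logits = classify(rng.normal(size=(16, channels)), modality)
    print(modality, logits.shape)  # always (19,)
```

The point of the sketch is only the routing: because every modality lands in the same embedding space, one downstream head serves all three inputs.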

The researchers note that the Vision Transformer (ViT) is a particularly good architecture for representation learning with SAR, as it removes the scale- and shift-invariance inductive biases built into convolutional neural networks.

MAERS is the most effective of their methods for learning representations on co-registered RGB, SAR, and RGB+SAR. It builds on the Masked Autoencoder (MAE): the network takes a masked version of the input, learns to encode it, and then learns to decode that encoding to reconstruct the unmasked input. Unlike many contrastive learning approaches, MAE does not require particular augmentation invariances in the data that may be erroneous for SAR features. Instead, it relies solely on reconstructing the original input, whether RGB, SAR, or co-registered RGB+SAR.
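The masking step that MAE is built around can be shown in a few lines. This is a minimal numpy sketch of the idea, not the paper's implementation; the patch grid, mask ratio, and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(patches: np.ndarray, mask_ratio: float = 0.75):
    """Randomly hide a fraction of patches, MAE-style.

    Returns the visible patches plus index arrays so a decoder could
    later be trained to reconstruct the hidden ones.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]
    return patches[keep_idx], keep_idx, mask_idx

# 196 patches (a 14x14 grid), each flattened to a 32-dim vector.
patches = rng.normal(size=(196, 32))
visible, keep_idx, mask_idx = mask_patches(patches)

# The encoder only ever sees the visible 25%; the training signal is
# the reconstruction error on the masked 75%.
print(visible.shape)   # (49, 32)
print(mask_idx.shape)  # (147,)
```

Because the objective is pure reconstruction, nothing here assumes the kind of augmentation invariances (flips, color jitter) that can be wrong for SAR.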

MAERS improves MAE by:

  1. Learning independent RGB, SAR, and RGB+SAR input projection layers
  2. Encoding the output of these projection layers with a shared ViT
  3. Using independent output projection layers to decode them into RGB, SAR, or RGB+SAR channels

The encoder can accept RGB, SAR, or RGB+SAR as input, and the learned input projection layers and shared ViT encoder can then be transferred to downstream tasks like object detection or change detection.
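The three steps above can be sketched end to end. In this numpy sketch each stage is a plain linear map; the real models use patch embeddings and a ViT trunk, so this only illustrates the modality routing, and all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
CH = {"rgb": 3, "sar": 2, "rgb+sar": 5}  # channel counts per modality
D = 64                                   # shared embedding width (illustrative)

# 1. Independent input projections, one per modality.
enc_proj = {m: rng.normal(size=(c, D)) * 0.1 for m, c in CH.items()}
# 2. One shared encoder (stand-in for the shared ViT).
shared = rng.normal(size=(D, D)) * 0.1
# 3. Independent output projections decoding back to each modality's channels.
dec_proj = {m: rng.normal(size=(D, c)) * 0.1 for m, c in CH.items()}

def reconstruct(pixels: np.ndarray, modality: str) -> np.ndarray:
    """Encode through the shared trunk, decode back to the input channels."""
    latent = np.tanh(pixels @ enc_proj[modality] @ shared)
    return latent @ dec_proj[modality]

# Any modality rides the same shared trunk and round-trips to its own shape.
for m, c in CH.items():
    x = rng.normal(size=(16, c))
    x_hat = reconstruct(x, m)
    print(m, x_hat.shape)  # matches the input shape for each modality
```

Training would minimize reconstruction error on the masked patches; for transfer, the decoder projections are discarded and only the input projections plus the shared trunk move to the downstream task.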

The team states that content-based image retrieval, classification, segmentation, and detection can all benefit from representation learning for RGB, SAR, and co-registered modalities. They evaluate their method on two well-established benchmarks:

  1. Multi-label classification on the BigEarthNet-MM dataset
  2. Semantic segmentation on the VHR EO and SAR SpaceNet 6 dataset

Their results suggest that fine-tuned MAERS beats the best RGB+SAR results reported in the BigEarthNet-MM study, demonstrating that adapting the MAE architecture for multi-modal representation learning yields state-of-the-art results.


They also used transfer learning for semantic segmentation of building footprints, a prerequisite for building damage assessment. This would help imagery analysts gauge the extent of destruction in Ukraine.

They used the SpaceNet 6 dataset as an open, public benchmark to demonstrate the effectiveness of the learned representations for detecting building footprints with Capella Space’s VHR SAR. Compared to training the same RGB+SAR architecture from scratch or fine-tuning it from ImageNet weights, the pretrained MAERS model improves performance by 13 points.

This research demonstrates that MAERS can learn strong RGB+SAR representations that allow practitioners to perform downstream tasks using EO or SAR images interchangeably.

The researchers intend to continue this work with more comprehensive experiments and benchmarks. They will help humanitarian partners use these models to detect changes in residential and other civilian areas, allowing better monitoring of war crimes in Ukraine.


James G. Williams