PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

CVPR 2026 Findings

Gabriele Magrini ·  Federico Becattini ·  Niccolò Biondi ·  Pietro Pala
Figure (teaser): PEPR trains an RGB encoder to predict event-derived latent representations, transferring domain robustness to a model that requires only RGB at test time.
TL;DR
1 sentence: Train with event cameras as a privileged signal, discard them at test time, and get a more domain-robust RGB model without any extra sensors at inference.
~80 words: Domain shift is a key obstacle for visual perception models. PEPR exploits event cameras as privileged information: available only during training, they provide domain-invariant supervision through a predictor module, with the RGB encoder learning to predict event-derived latent representations rather than aligning features directly. This transfers robustness without forcing dense RGB features to match sparse event outputs. At test time, only the standard RGB model is used, with no additional sensors or inference modules required.
Abstract: Deep neural networks for visual perception are highly susceptible to domain shift, limiting their deployment under conditions that differ from the training data. Event cameras offer a compelling complement to RGB sensors because their output is more domain-invariant. However, directly aligning RGB and event features is difficult: RGB streams are semantically dense but domain-dependent, while event streams are domain-invariant but sparse. We propose PEPR, a cross-modal learning framework that uses event cameras exclusively during training as privileged information. Rather than directly aligning RGB and event features, PEPR trains the RGB encoder to predict event-derived representations through a dedicated predictor module. This transfers domain robustness from the event stream to the RGB encoder while preserving semantic detail. At inference, the event branch is discarded, leaving an RGB-only model with improved domain generalization across semantic segmentation and object detection benchmarks.
Key Ideas
1. Prediction over Alignment

Instead of forcing dense RGB features to directly match sparse event outputs, PEPR trains the RGB encoder to predict event-derived latent targets via a lightweight predictor module.

2. Events as Privileged Information

Event cameras act as a training-only supervisory signal. They provide domain-invariant cues during training and are completely discarded after training — no paired data needed at deployment.

3. RGB-only Deployment

At test time, PEPR runs with the standard RGB model — no event camera, no additional sensors, no extra inference modules. The robustness is baked into the encoder weights.

Method

During training, PEPR combines four components: an RGB encoder (backbone), a task prediction head (segmentation or detection), a privileged event encoder, and a predictor module that maps RGB latents to event latent targets. The total loss combines the standard task loss with the prediction loss between RGB-predicted and event-derived representations. After training, the event encoder and predictor are discarded.

Key insight: Predicting event latents forces the RGB encoder to learn representations that are predictive of domain-invariant event features — without requiring the two modalities to share the same feature space.
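
The loss combination described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear predictor, the mean-squared-error prediction loss, and the weighting factor `lam` are all hypothetical choices, since the page does not specify the predictor architecture, the distance function, or how the two terms are balanced.

```python
import numpy as np


def prediction_loss(rgb_latents, event_latents, predictor_weights):
    """Mean squared error between predicted and event-derived latents.

    Assumption: the predictor is a single linear map; the real module
    may be any small network.
    """
    predicted = rgb_latents @ predictor_weights  # map RGB latents into event latent space
    return float(np.mean((predicted - event_latents) ** 2))


def total_loss(task_loss, rgb_latents, event_latents, predictor_weights, lam=1.0):
    """Task loss plus a weighted prediction loss (lam is a hypothetical weight)."""
    return float(task_loss) + lam * prediction_loss(
        rgb_latents, event_latents, predictor_weights
    )


# Toy usage: 2 patches, 3-dim RGB latents, 4-dim event latents.
rgb = np.ones((2, 3))
events = np.ones((2, 4))
weights = np.zeros((3, 4))  # an untrained predictor that outputs zeros
loss = total_loss(task_loss=0.5, rgb_latents=rgb, event_latents=events,
                  predictor_weights=weights, lam=2.0)
```

Note that the RGB and event latents have different dimensionalities in the toy usage: the predictor bridges the two spaces, so the encoders never need to share a feature space. At deployment, only the RGB encoder and task head would be kept.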
Patch Selection Mechanism

A core challenge in cross-modal predictive learning is that event cameras produce sparse outputs: most of the spatial grid carries no signal at any given moment. Supervising every RGB patch against an empty event target would flood the predictor with uninformative gradients and destabilize training.

PEPR addresses this with a patch selection mechanism: only the spatial patches where the event stream is active (i.e., where events actually fired) are selected as prediction targets for the RGB encoder. Concretely, the event representation is divided into non-overlapping patches and those with sufficient event density are retained. The predictor is then supervised on the corresponding RGB patches only at those locations, concentrating the supervision signal where the event modality is informative. This selective supervision makes the training loss meaningful and prevents the RGB encoder from being pulled toward trivial or noisy targets.
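
As a rough illustration of density-based patch selection, the sketch below splits a 2D event map into non-overlapping patches and keeps those whose fraction of active pixels reaches a threshold. The function name `select_active_patches`, the density definition (share of non-zero pixels), and the `min_density` threshold are assumptions made for this example; the page does not state how event density is measured or what cutoff is used.

```python
import numpy as np


def select_active_patches(event_map, patch_size, min_density):
    """Return a boolean mask over the patch grid (True = patch is kept).

    Assumption: density is the fraction of pixels in a patch that
    recorded at least one event.
    """
    height, width = event_map.shape
    if height % patch_size or width % patch_size:
        raise ValueError("event_map dimensions must be divisible by patch_size")
    grid_h, grid_w = height // patch_size, width // patch_size
    # Reshape so axes 1 and 3 index pixels inside each non-overlapping patch.
    patches = event_map.reshape(grid_h, patch_size, grid_w, patch_size)
    density = (patches != 0).mean(axis=(1, 3))
    return density >= min_density


# Toy usage: a 4x4 event map where only the top-left 2x2 patch saw events.
event_map = np.zeros((4, 4))
event_map[0, 0] = 1
event_map[0, 1] = 3
mask = select_active_patches(event_map, patch_size=2, min_density=0.25)
active_indices = np.flatnonzero(mask)  # flat indices of patches to supervise
```

The resulting mask (or the flat indices) could then be used to restrict the prediction loss to active locations, so that empty event patches contribute no gradient.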

Figure (patch selection): Only spatially active event patches (highlighted) are used as prediction targets, focusing supervision where the event signal is informative.
Results

PEPR is evaluated on three benchmarks spanning semantic segmentation and object detection under domain shift, using the FRED, DSEC, Hard-DSEC-DET, Cityscapes, and Cityscapes Adverse datasets. A selection of key results is shown below; many additional experiments and ablations are reported in the full CVPR 2026 Findings paper.

Figure (segmentation): Semantic segmentation results under domain shift.
Figure (FRED): Results on the FRED benchmark.
Figure (Hard-DSEC-DET): Object detection results on the Hard-DSEC-DET benchmark.
Citation
@inproceedings{magrini2026pepr,
  title     = {PEPR: Privileged Event-based Predictive Regularization for Domain Generalization},
  author    = {Magrini, Gabriele and Becattini, Federico and Biondi, Niccolò and Pala, Pietro},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  note      = {Findings},
  year      = {2026},
  arxiv     = {2602.04583}
}