Coded Aperture Compressive Temporal Imaging (CACTI)

CACTI captures multiple video frames in a single camera exposure by modulating the scene with a shifting binary mask during the integration period. Each temporal frame sees a different mask pattern, and the detector integrates all modulated frames into a single 2D measurement. The forward model is y = sum_t (M_t * x_t) + n, where x_t is the scene frame at time t, M_t is the mask at time t, * denotes element-wise multiplication, and n is additive noise. Typical compression ratios are 8 to 48 frames per snapshot. Reconstruction exploits temporal correlation via iterative methods (GAP-TV, PnP-FFDNet) or deep networks (STFormer, EfficientSCI).
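The forward model can be sketched in a few lines of NumPy. This is a minimal simulation, not the benchmark's own code: the function name `cacti_forward`, the `noise_sigma` parameter, and the independent random masks are assumptions made for illustration.

```python
import numpy as np

def cacti_forward(frames, masks, noise_sigma=0.0, rng=None):
    """Simulate one snapshot: y = sum_t M_t * x_t + n.

    frames, masks: arrays of shape (B, H, W).
    noise_sigma: std of additive Gaussian noise (illustrative parameter).
    """
    frames = np.asarray(frames, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (B, H, W)")
    # Element-wise modulation, then integration over the temporal axis
    y = np.sum(masks * frames, axis=0)
    if noise_sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_sigma, size=y.shape)
    return y

rng = np.random.default_rng(0)
B, H, W = 8, 64, 64                                   # 8 frames per snapshot
frames = rng.random((B, H, W))                        # stand-in video in [0, 1)
masks = (rng.random((B, H, W)) > 0.5).astype(float)   # random binary masks
y = cacti_forward(frames, masks, noise_sigma=0.01, rng=rng)
print(y.shape)  # (64, 64)
```

The B input frames collapse into one 2D array, which is why reconstruction needs both the masks and a prior on the video.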

Forward Model

Coded Aperture Temporal

Noise Model

Gaussian

Default Solver

GAP-TV

Sensor

CMOS

Forward-Model Signal Chain

Each primitive represents a physical operation in the measurement process. Arrows show signal flow left to right.

M(m_t): Temporal Mask → Σ_t: Temporal Sum → D(g, η₄): Detector

Spec Notation

M(m_t) → Σ_t → D(g, η₄)

Benchmark Variants & Leaderboards

CACTI

Coded Aperture Compressive Temporal Imaging

Standard Leaderboard (Top 10)

| Rank | Method | Score | PSNR (dB) | SSIM | Trust | Source |
|---|---|---|---|---|---|---|
| 1 | HiSViT-9 | 0.876 | 38.24 | 0.978 | ✓ Certified | HiSViT (ECCV 2024) |
| 2 | EfficientSCI | 0.867 | 37.71 | 0.976 | ✓ Certified | EfficientSCI (CVPR 2023) |
| 3 | ELP-Unfolding | 0.826 | 35.54 | 0.968 | ✓ Certified | ELP-Unfolding (2022) |
| 4 | RevSCI | 0.786 | 33.49 | 0.956 | ✓ Certified | RevSCI (TPAMI 2022) |
| 5 | BIRNAT | 0.715 | 30.26 | 0.921 | ✓ Certified | BIRNAT (TPAMI 2021) |
| 6 | GAP-TV | 0.630 | 26.02 | 0.892 | ✓ Certified | GAP-TV (Signal Processing 2016) |

Mismatch Parameters (6)

| Name | Symbol | Description | Nominal | Perturbed |
|---|---|---|---|---|
| mask_dx | Δx | Mask lateral shift (pixels) | 0 | 0.5 |
| mask_dy | Δy | Mask vertical shift (pixels) | 0 | 0.3 |
| mask_theta | θ | Mask rotation (rad) | 0 | 0.1 |
| clock_offset | Δt | Clock synchronization offset | 0 | 0.05 |
| duty_cycle | d | Shutter duty cycle | 1.0 | 0.95 |
| gain | g | Detector gain multiplier | 1.0 | 1.02 |
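
To get a feel for how the perturbed values above change a measurement, the sketch below compares a nominal snapshot with one taken under a shifted mask, altered gain, and reduced duty cycle. Several things here are assumptions for illustration: the sub-pixel shift is circular and done with the Fourier shift theorem, gain and duty cycle are modeled as simple scalar multipliers, and mask rotation and clock offset are left out. The helper names `subpixel_shift` and `snapshot` are hypothetical.

```python
import numpy as np

def subpixel_shift(img, dy, dx):
    """Circularly shift a 2D array by (dy, dx) pixels via the Fourier shift theorem."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(img) * phase).real

def snapshot(frames, masks, gain=1.0, duty_cycle=1.0):
    """Coded sum; gain and duty cycle are assumed to scale the signal linearly."""
    return gain * duty_cycle * np.sum(masks * frames, axis=0)

rng = np.random.default_rng(0)
B, H, W = 8, 64, 64
frames = rng.random((B, H, W))
masks = (rng.random((B, H, W)) > 0.5).astype(float)

# Nominal measurement: no shift, gain = 1.0, duty cycle = 1.0
y_nominal = snapshot(frames, masks)

# Perturbed measurement: mask_dx = 0.5, mask_dy = 0.3, gain = 1.02, duty_cycle = 0.95
masks_pert = np.stack([subpixel_shift(m, dy=0.3, dx=0.5) for m in masks])
y_pert = snapshot(frames, masks_pert, gain=1.02, duty_cycle=0.95)

rel_err = np.linalg.norm(y_pert - y_nominal) / np.linalg.norm(y_nominal)
print(f"relative measurement mismatch: {rel_err:.3f}")
```

Reconstructing `y_pert` with the nominal masks would expose how much of the quality loss comes from each parameter; that step is omitted here.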

Reconstruction Triad Diagnostics

The three diagnostic gates (G1, G2, G3) characterize how reconstruction quality degrades under different error sources. Each bar shows the relative attribution.

G1: Forward Model Accuracy. How well does the mathematical model match reality?

Model: coded aperture temporal. Mismatch modes: mask shift error, motion blur within frame, mask diffraction, nonuniform illumination

G2: Noise Characterization. Is the noise model correctly specified?

Noise: Gaussian. Typical SNR: 20.0–40.0 dB

G3: Calibration Quality. Are instrument parameters accurately measured?

Requires: mask patterns, mask shift calibration, dark frame, temporal alignment

Modality Deep Dive

Principle

Coded Aperture Compressive Temporal Imaging (CACTI) compresses multiple high-speed video frames into a single sensor exposure by modulating the scene with a dynamic coded aperture (shifting mask) during the integration time. The sensor accumulates a coded sum of B consecutive frames, and computational algorithms recover all B frames from the single compressed measurement using video sparsity priors.

How to Build the System

Build a relay optical system with a physical translating mask or use a DMD as the coded aperture at an intermediate image plane. The mask shifts by one pixel per sub-frame interval during the camera integration time, effectively encoding B temporal frames. Use a standard camera at normal frame rate (e.g., 30 fps) to capture the compressed measurement. Calibrate the mask pattern and its motion precisely.
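
The one-pixel-per-sub-frame mask motion can be emulated in simulation by sliding a fixed window over a slightly larger base pattern. The sketch below assumes vertical translation and a random binary base mask; the function name `shifting_masks` is hypothetical, and a real system would use the calibrated mask instead.

```python
import numpy as np

def shifting_masks(base_mask, num_frames, height):
    """Return (B, H, W) masks: an H-row window sliding down base_mask by one row per sub-frame."""
    if base_mask.shape[0] < height + num_frames - 1:
        raise ValueError("base_mask needs at least height + num_frames - 1 rows")
    return np.stack([base_mask[t:t + height, :] for t in range(num_frames)])

rng = np.random.default_rng(0)
B, H, W = 8, 256, 256
# Base pattern is B - 1 rows taller than the sensor so every shift stays in bounds
base = (rng.random((H + B - 1, W)) > 0.5).astype(np.uint8)
masks = shifting_masks(base, B, H)
print(masks.shape)  # (8, 256, 256)
```

Cropping from a larger pattern avoids the wrap-around that a circular shift would introduce at the sensor edge.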

Common Reconstruction Algorithms

  • GAP-TV (Generalized Alternating Projection with Total Variation)
  • DeSCI (Decompress Snapshot Compressive Imaging, low-rank prior via weighted nuclear norm minimization)
  • PnP-FFDNet (Plug-and-Play with FFDNet denoiser)
  • Deep unfolding: ELP-Unfolding
  • End-to-end trained networks: BIRNAT, RevSCI, STFormer, EfficientSCI
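
As an illustration of the first entry, here is a minimal GAP-TV style sketch in NumPy. It alternates a projection onto the measurement constraint with a per-frame 2D total-variation denoiser (a Chambolle-type dual iteration). It is a simplified, non-accelerated variant written for this page: the iteration counts, `tv_weight`, and all function names are assumptions, and published implementations differ in details.

```python
import numpy as np

def grad(u):
    """Forward differences with zero gradient at the far boundary."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]
    d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]
    d[:, 1:] -= py[:, :-1]
    return d

def tv_denoise(img, weight, n_iter=10, tau=0.125):
    """Chambolle-style dual iteration for 2D TV denoising."""
    px = np.zeros_like(img)
    py = np.zeros_like(img)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - img / weight)
        norm = 1.0 + tau * np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / norm
        py = (py + tau * gy) / norm
    return img - weight * div(px, py)

def forward(x, masks):
    """CACTI forward operator: (B, H, W) video to (H, W) snapshot."""
    return np.sum(masks * x, axis=0)

def gap_project(x, y, masks, phi_sum):
    """Euclidean projection of x onto the set {x : forward(x) = y}."""
    return x + masks * ((y - forward(x, masks)) / phi_sum)

def gap_tv(y, masks, n_iter=30, tv_weight=0.1):
    phi_sum = np.sum(masks**2, axis=0)
    phi_sum[phi_sum == 0] = 1.0          # avoid dividing by zero where no mask is open
    x = masks * (y / phi_sum)            # simple back-projection as the starting point
    for _ in range(n_iter):
        x = gap_project(x, y, masks, phi_sum)
        x = np.stack([tv_denoise(f, tv_weight) for f in x])
    return x

rng = np.random.default_rng(0)
B, H, W = 4, 32, 32
video = np.zeros((B, H, W))
for t in range(B):
    video[t, 10:20, 8 + t:18 + t] = 1.0  # bright square moving one pixel per frame
masks = (rng.random((B, H, W)) > 0.5).astype(float)
y = forward(video, masks)                # noise-free snapshot
recon = gap_tv(y, masks)
print(recon.shape)  # (4, 32, 32)
```

Swapping `tv_denoise` for a learned denoiser gives the plug-and-play family (for example PnP-FFDNet), while unrolling a fixed number of these iterations with trainable components is the idea behind deep unfolding.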

Common Mistakes

  • Mask calibration error causing temporal frame misalignment in reconstruction
  • Compression ratio too high (too many sub-frames per snapshot) for the scene motion
  • Motion blur within individual sub-frame intervals when scene moves fast
  • Non-uniform mask illumination creating brightness gradients in recovered frames
  • Choosing masks with poor conditioning (high mutual coherence between rows)

How to Avoid Mistakes

  • Calibrate mask position precisely using a static known pattern before experiments
  • Limit the compression ratio: keep B at or below roughly 8 to 10 for complex natural scenes; B of 24 to 48 is feasible only for simpler scenes
  • Ensure sub-frame exposure is short enough that intra-frame motion is negligible
  • Flatfield-correct the mask modulation using a uniform target calibration
  • Simulate reconstruction quality with candidate mask patterns before hardware fabrication
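
The sub-frame exposure check above reduces to simple arithmetic. The sketch below assumes each sub-frame lasts 1 / (camera frame rate × B) seconds, scaled by the shutter duty cycle, and that scene motion is described by an image-plane speed in pixels per second; the function names are hypothetical.

```python
def subframe_exposure_s(camera_fps, frames_per_snapshot, duty_cycle=1.0):
    """Duration of one sub-frame in seconds (assumes equal-length sub-frames)."""
    return duty_cycle / (camera_fps * frames_per_snapshot)

def motion_blur_px(speed_px_per_s, camera_fps, frames_per_snapshot, duty_cycle=1.0):
    """Distance an object travels on the sensor during one sub-frame."""
    return speed_px_per_s * subframe_exposure_s(camera_fps, frames_per_snapshot, duty_cycle)

# Example: 30 fps camera, B = 8, object moving at 240 px/s on the sensor
blur = motion_blur_px(240.0, camera_fps=30, frames_per_snapshot=8)
print(f"{blur:.2f} px per sub-frame")  # 1.00 px per sub-frame
```

If the result is well above one pixel, the frame-wise model no longer holds within a sub-frame, and either B or the camera frame rate needs to go up.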

Forward-Model Mismatch Cases

  • The widefield fallback processes a single 2D (64,64) frame, but CACTI compresses B temporal frames into a single 2D coded snapshot using a shifting binary mask — the temporal dimension (64,64,B) is entirely lost
  • Without the time-varying coded exposure pattern, individual video frames cannot be separated from the compressed measurement — temporal super-resolution from the fallback is impossible

How to Correct the Mismatch

  • Use the CACTI operator that applies frame-wise binary masks and sums the coded frames, y = sum_t (M_t * x_t) for t = 1..B, compressing B frames into one measurement
  • Reconstruct the video sequence with PnP-SCI (plug-and-play with FastDVDnet), ELP-Unfolding, or GAP-TV, all of which model the temporal compression and recover B frames from the single snapshot
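
When replacing a fallback operator, a quick sanity check is to confirm that the operator maps (B, 64, 64) to (64, 64) and that its adjoint is consistent with it. The sketch below is one hedged way to do that; `cacti_A` and `cacti_At` are hypothetical names, and the random masks stand in for calibrated ones.

```python
import numpy as np

def cacti_A(x, masks):
    """Forward operator: (B, H, W) video to (H, W) coded snapshot."""
    return np.sum(masks * x, axis=0)

def cacti_At(y, masks):
    """Adjoint operator: (H, W) snapshot back to (B, H, W) by re-applying each mask."""
    return masks * y[None, :, :]

rng = np.random.default_rng(0)
B, H, W = 8, 64, 64
masks = (rng.random((B, H, W)) > 0.5).astype(float)
x = rng.standard_normal((B, H, W))
y = rng.standard_normal((H, W))

# Dot-product test: <A x, y> should equal <x, A^T y> up to rounding
lhs = np.vdot(cacti_A(x, masks), y)
rhs = np.vdot(x, cacti_At(y, masks))
print(cacti_A(x, masks).shape, cacti_At(y, masks).shape)  # (64, 64) (8, 64, 64)
print(np.isclose(lhs, rhs))  # True
```

Iterative solvers such as GAP-TV and plug-and-play methods rely on this forward/adjoint pair, so a failed dot-product test usually points to a shape or indexing error in the operator.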

Experimental Setup

Instrument

Custom CACTI system (Duke / USTC prototype)

Coded Aperture

shifting binary mask on lithographic substrate

Frames Per Snapshot

8

Spatial Resolution

256x256

Compression Ratio

8

Equivalent FPS

1200

Detector

FLIR Point Grey Grasshopper3 CMOS

Reconstruction

GAP-TV / PnP-FFDNet / STFormer

Signal Chain Diagram

Experimental setup diagram for Coded Aperture Compressive Temporal Imaging (CACTI)

Key References

  • Llull et al., 'Coded aperture compressive temporal imaging', Optics Express 21, 10526 (2013)
  • Yuan et al., 'Generalized alternating projection based total variation minimization (GAP-TV)', IEEE ICIP 2016
  • Wang et al., 'Spatial-Temporal Transformer for Video Snapshot Compressive Imaging (STFormer)', ECCV 2022

Canonical Datasets

  • Kobe, Runner, Drop, Traffic (grayscale SCI benchmarks)
  • DAVIS 2017 (adapted for SCI simulation)
