Coded Aperture Compressive Temporal Imaging (CACTI)
CACTI captures multiple video frames in a single camera exposure by modulating the scene with a shifting binary mask during the integration period. Each temporal frame sees a different mask pattern, and the detector integrates all modulated frames into a single 2D measurement. The forward model is y = sum_t M_t * x_t + n, where x_t is the scene frame at time t, M_t is the mask at time t, * denotes element-wise multiplication, and n is measurement noise. Typical compression ratios are 8 to 48 frames per snapshot. Reconstruction exploits temporal correlation via GAP-TV, PnP-FFDNet, deep unfolding networks (e.g., ELP-Unfolding), or end-to-end networks (STFormer, EfficientSCI).
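Written out per pixel, the forward model above can be restated as follows. The pixel indices i, j, the frame count B, and the matrix Φ are notation introduced here for illustration; the vectorized form assumes each frame and mask is flattened into a vector.

```latex
\[
  y(i,j) = \sum_{t=1}^{B} M_t(i,j)\, x_t(i,j) + n(i,j)
\]
\[
  \mathbf{y} = \Phi \mathbf{x} + \mathbf{n}, \qquad
  \Phi = \bigl[\operatorname{diag}(\mathbf{m}_1), \ldots, \operatorname{diag}(\mathbf{m}_B)\bigr], \qquad
  \Phi\Phi^{\top} = \operatorname{diag}\Bigl(\sum_{t=1}^{B} \mathbf{m}_t \odot \mathbf{m}_t\Bigr)
\]
```

Because ΦΦᵀ is diagonal, projecting an estimate onto the set of measurement-consistent videos reduces to a per-pixel division, which is what keeps projection-based solvers cheap per iteration.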
Tags: Coded Aperture Temporal, Gaussian, GAP-TV, CMOS
Forward-Model Signal Chain
Each primitive represents a physical operation in the measurement process. Arrows show signal flow left to right.
M(m_t) → Σ_t → D(g, η₄)
Benchmark Variants & Leaderboards
Standard Leaderboard (Top 10)
| # | Method | Score | PSNR (dB) | SSIM | Trust | Source |
|---|---|---|---|---|---|---|
| 1 | HiSViT-9 | 0.876 | 38.24 | 0.978 | ✓ Certified | HiSViT (ECCV 2024) |
| 2 | EfficientSCI | 0.867 | 37.71 | 0.976 | ✓ Certified | EfficientSCI (CVPR 2023) |
| 3 | ELP-Unfolding | 0.826 | 35.54 | 0.968 | ✓ Certified | ELP-Unfolding (2022) |
| 4 | RevSCI | 0.786 | 33.49 | 0.956 | ✓ Certified | RevSCI (TPAMI 2022) |
| 5 | BIRNAT | 0.715 | 30.26 | 0.921 | ✓ Certified | BIRNAT (TPAMI 2021) |
| 6 | GAP-TV | 0.630 | 26.02 | 0.892 | ✓ Certified | GAP-TV (IEEE ICIP 2016) |
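For readers who want to compare their own reconstructions against the PSNR column, a minimal sketch of the metric follows. The leaderboard's exact protocol is not stated here, so the peak value of 1.0 and the per-frame averaging in `video_psnr` are assumptions, and both function names are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, peak]."""
    mse = np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def video_psnr(reference, estimate, peak=1.0):
    """Average of per-frame PSNR over a (B, H, W) video (assumed protocol)."""
    return float(np.mean([psnr(r, e, peak) for r, e in zip(reference, estimate)]))
```

Averaging per-frame PSNR and computing PSNR over the whole video cube generally give different numbers, so the convention should be fixed before comparing against published results.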
Mismatch Parameters (6)
| Name | Symbol | Description | Nominal | Perturbed |
|---|---|---|---|---|
| mask_dx | Δx | Mask lateral shift (pixels) | 0 | 0.5 |
| mask_dy | Δy | Mask vertical shift (pixels) | 0 | 0.3 |
| mask_theta | θ | Mask rotation (rad) | 0 | 0.1 |
| clock_offset | Δt | Clock synchronization offset | 0 | 0.05 |
| duty_cycle | d | Shutter duty cycle | 1.0 | 0.95 |
| gain | g | Detector gain multiplier | 1.0 | 1.02 |
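To get a feel for how the perturbed values in the table change the measurement, the following sketch applies only the mask shift (Δx, Δy) and the gain g. It assumes band-limited (Fourier) interpolation with circular boundaries for the sub-pixel shift, which is a simplification of a real optical misalignment; rotation, clock offset, and duty cycle are not modeled. The function names are hypothetical.

```python
import numpy as np

def subpixel_shift(image, dy, dx):
    """Circularly shift a 2D image by (dy, dx) pixels using the Fourier shift theorem."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(image) * phase).real

def perturbed_measurement(video, masks, dx=0.5, dy=0.3, gain=1.02):
    """Snapshot formed with shifted masks and a detector gain error.

    video, masks: arrays of shape (B, H, W). Defaults are the 'Perturbed'
    column values for mask_dx, mask_dy, and gain.
    """
    shifted = np.stack([subpixel_shift(m, dy, dx) for m in masks])
    return gain * np.sum(shifted * video, axis=0)
```

Comparing `perturbed_measurement(video, masks)` with the nominal snapshot `np.sum(masks * video, axis=0)` gives a rough measure of how much measurement error a given miscalibration introduces before any reconstruction is attempted.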
Reconstruction Triad Diagnostics
The three diagnostic gates (G1, G2, G3) characterize how reconstruction quality degrades under different error sources. Each bar shows the relative attribution.
Model: coded aperture temporal. Mismatch modes: mask shift error, motion blur within frame, mask diffraction, nonuniform illumination
Noise: Gaussian. Typical SNR: 20.0–40.0 dB
Requires: mask patterns, mask shift calibration, dark frame, temporal alignment
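When simulating within the SNR range quoted above, additive Gaussian noise can be scaled to a target SNR as sketched below. The SNR definition used (mean signal power divided by noise variance, in dB) is an assumption, since the page does not define it, and `add_gaussian_noise` is a hypothetical helper name.

```python
import numpy as np

def add_gaussian_noise(measurement, snr_db, rng):
    """Add zero-mean Gaussian noise so that mean signal power / noise variance matches snr_db."""
    signal_power = np.mean(measurement ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=measurement.shape)
    return measurement + noise
```

For example, `add_gaussian_noise(y, 30.0, np.random.default_rng(0))` produces a noisy snapshot near the middle of the 20 to 40 dB range.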
Modality Deep Dive
Principle
Coded Aperture Compressive Temporal Imaging (CACTI) compresses multiple high-speed video frames into a single sensor exposure by modulating the scene with a dynamic coded aperture (shifting mask) during the integration time. The sensor accumulates a coded sum of B consecutive frames, and computational algorithms recover all B frames from the single compressed measurement using video sparsity priors.
How to Build the System
Build a relay optical system with a physical translating mask or use a DMD as the coded aperture at an intermediate image plane. The mask shifts by one pixel per sub-frame interval during the camera integration time, effectively encoding B temporal frames. Use a standard camera at normal frame rate (e.g., 30 fps) to capture the compressed measurement. Calibrate the mask pattern and its motion precisely.
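The one-pixel-per-sub-frame translation described above can be simulated as a window sliding over a larger random binary pattern. This is a minimal sketch: the vertical shift direction, the 50% fill ratio, and the function name `shifting_masks` are assumptions made for illustration, not properties of a specific prototype.

```python
import numpy as np

def shifting_masks(height, width, num_frames, seed=0):
    """Return (num_frames, height, width) binary masks from one translating pattern."""
    rng = np.random.default_rng(seed)
    # The physical mask is taller than the field of view so it can translate across it.
    base = (rng.random((height + num_frames - 1, width)) < 0.5).astype(np.float64)
    # Sub-frame t sees the base pattern translated by t pixels along the vertical axis.
    return np.stack([base[t:t + height, :] for t in range(num_frames)])

masks = shifting_masks(64, 64, 8)  # B = 8 sub-frames on a 64x64 sensor region
```

Consecutive masks generated this way are strongly correlated (each is the previous one moved by one row), which mirrors the hardware constraint of a single translating element; a DMD can instead display independent patterns per sub-frame.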
Common Reconstruction Algorithms
- GAP-TV (Generalized Alternating Projection with Total Variation)
- DeSCI (Decompress Snapshot Compressive Imaging, GMM prior)
- PnP-FFDNet (Plug-and-Play with FFDNet denoiser)
- Deep unfolding: ELP-Unfolding
- E2E-trained networks: BIRNAT, RevSCI, EfficientSCI, STFormer, CST (the last two transformer-based)
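Several of the iterative methods listed above share the same skeleton: a Euclidean projection onto the set of videos consistent with the snapshot, alternated with a denoising step. The sketch below shows that generalized alternating projection loop with the denoiser passed in by the caller. It is not a full GAP-TV implementation; a total-variation or learned denoiser would have to be supplied as `denoise`, and the function name and defaults are hypothetical.

```python
import numpy as np

def gap_reconstruct(measurement, masks, denoise, num_iters=20):
    """Alternate measurement-consistency projection with a caller-supplied denoiser.

    measurement: (H, W) snapshot; masks: (B, H, W); denoise: callable mapping
    a (B, H, W) video to a (B, H, W) video.
    """
    mask_energy = np.sum(masks ** 2, axis=0)
    # Pixels never opened by any mask carry no information; avoid dividing by zero there.
    safe_energy = np.where(mask_energy > 0, mask_energy, 1.0)
    video = masks * measurement[None, :, :]  # initialize with the adjoint of the snapshot
    for _ in range(num_iters):
        residual = measurement - np.sum(masks * video, axis=0)
        # Per-pixel projection: spread the residual back through the masks.
        video = video + masks * (residual / safe_energy)[None, :, :]
        video = denoise(video)
    return video
```

Passing `denoise=lambda v: v` yields the minimum-norm measurement-consistent solution, which is useful as a sanity check; substituting a TV denoiser (scikit-image's `denoise_tv_chambolle` is one option) moves the loop toward GAP-TV, and substituting a deep denoiser gives a plug-and-play variant.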
Common Mistakes
- Mask calibration error causing temporal frame misalignment in reconstruction
- Compression ratio too high (too many sub-frames per snapshot) for the scene motion
- Motion blur within individual sub-frame intervals when scene moves fast
- Non-uniform mask illumination creating brightness gradients in recovered frames
- Choosing masks with poor conditioning (high mutual coherence between rows)
How to Avoid Mistakes
- Calibrate mask position precisely using a static known pattern before experiments
- Limit compression ratio (B ≤ 8-10 for complex natural scenes; B ≤ 24-48 for simpler scenes)
- Ensure sub-frame exposure is short enough that intra-frame motion is negligible
- Flat-field correct the mask modulation using a calibration image of a uniform target
- Simulate reconstruction quality with candidate mask patterns before hardware fabrication
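The flat-field step in the list above can be sketched as a standard dark-subtract-and-divide correction. This assumes a dark frame and a uniform-target (flat) frame captured with the same exposure settings as the image being corrected; the function name and the `eps` floor are illustrative choices.

```python
import numpy as np

def flat_field_correct(raw, flat, dark, eps=1e-6):
    """Remove fixed offset and illumination non-uniformity from a captured image.

    raw:  image to correct (a snapshot or a captured mask calibration image)
    flat: image of a uniform target
    dark: image with the light path blocked
    """
    gain = flat - dark
    # Normalize so the correction preserves the overall brightness level.
    gain_norm = gain / gain.mean()
    return (raw - dark) / np.maximum(gain_norm, eps)
```

Applying the same correction to the calibrated mask images and to the snapshots keeps the forward model and the data on a consistent intensity scale.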
Forward-Model Mismatch Cases
- The widefield fallback processes a single 2D (64,64) frame, whereas CACTI compresses B temporal frames into a single 2D coded snapshot using a shifting binary mask. The fallback therefore discards the temporal dimension of the (64,64,B) video cube entirely
- Without the time-varying coded exposure pattern, individual video frames cannot be separated from the compressed measurement, so temporal super-resolution from the fallback is impossible
How to Correct the Mismatch
- Use the CACTI operator, which applies frame-wise binary masks and sums the coded frames: y = sum_b(M_b * x_b) for b = 1..B, compressing B frames into one measurement
- Reconstruct the video sequence using PnP-SCI (plug-and-play with FastDVDnet), ELP-Unfolding, or GAP-TV, each of which models the temporal compression and recovers B frames from the single snapshot
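A minimal NumPy sketch of the CACTI operator and its adjoint is given below, assuming the video and masks are stored as arrays of shape (B, H, W). The function names are hypothetical; the adjoint is included because iterative solvers need it alongside the forward operator.

```python
import numpy as np

def cacti_forward(video, masks):
    """y = sum_b(M_b * x_b): mask each frame element-wise, then sum over time."""
    return np.sum(masks * video, axis=0)

def cacti_adjoint(measurement, masks):
    """Adjoint of cacti_forward: copy the snapshot into every frame, weighted by its mask."""
    return masks * measurement[None, :, :]

rng = np.random.default_rng(0)
masks = (rng.random((8, 64, 64)) < 0.5).astype(np.float64)  # B = 8 binary masks
video = rng.random((8, 64, 64))                             # stand-in for a real scene
snapshot = cacti_forward(video, masks)                      # single (64, 64) measurement
```

A quick way to validate any implementation of this pair is the dot-product test: for random x and y, `np.sum(cacti_forward(x, masks) * y)` should equal `np.sum(x * cacti_adjoint(y, masks))` up to floating-point error.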
Experimental Setup
Custom CACTI system (Duke / USTC prototype)
shifting binary mask on lithographic substrate
8
256x256
8
1200
FLIR Point Grey Grasshopper3 CMOS
GAP-TV / PnP-FFDNet / STFormer
Key References
- Llull et al., 'Coded aperture compressive temporal imaging', Optics Express 21, 10526 (2013)
- Yuan, 'Generalized alternating projection based total variation minimization for compressive sensing' (GAP-TV), IEEE ICIP 2016
- Wang et al., 'Spatial-Temporal Transformer for Video Snapshot Compressive Imaging (STFormer)', ECCV 2022
Canonical Datasets
- Kobe, Runner, Drop, Traffic (grayscale SCI benchmarks)
- DAVIS 2017 (adapted for SCI simulation)