CACTI

Coded Aperture Compressive Temporal Imaging


Standard reconstruction benchmark — forward model perfectly known, no calibration needed. Score = 0.5 × clip((PSNR−15)/30, 0, 1) + 0.5 × SSIM

#	Method	Score	PSNR (dB)	SSIM	Trust	Source
🥇	HiSViT-9	0.876	38.24	0.978	✓ Certified	HiSViT (ECCV 2024)
🥈	EfficientSCI	0.867	37.71	0.976	✓ Certified	EfficientSCI (CVPR 2023)
🥉	ELP-Unfolding	0.826	35.54	0.968	✓ Certified	ELP-Unfolding (2022)
4	RevSCI	0.786	33.49	0.956	✓ Certified	RevSCI (TPAMI 2022)
5	BIRNAT	0.715	30.26	0.921	✓ Certified	BIRNAT (TPAMI 2021)
6	GAP-TV	0.630	26.02	0.892	✓ Certified	GAP-TV (Signal Processing 2016)

Dataset: 6-scene simulation, T=8, Cr=8
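The standard score can be reproduced from the PSNR and SSIM columns. The following is a minimal sketch of the stated formula; the function name `standard_score` is illustrative and not part of any benchmark API.

```python
def standard_score(psnr_db: float, ssim: float) -> float:
    """Score = 0.5 * clip((PSNR - 15) / 30, 0, 1) + 0.5 * SSIM."""
    psnr_norm = min(max((psnr_db - 15.0) / 30.0, 0.0), 1.0)
    return 0.5 * psnr_norm + 0.5 * ssim

# (PSNR in dB, SSIM) pairs copied from two leaderboard rows.
rows = {"HiSViT-9": (38.24, 0.978), "GAP-TV": (26.02, 0.892)}
for name, (psnr_db, ssim) in rows.items():
    print(f"{name}: {standard_score(psnr_db, ssim):.3f}")
# HiSViT-9: 0.876
# GAP-TV: 0.630
```

The clip means a PSNR at or below 15 dB contributes nothing and a PSNR at or above 45 dB contributes the full 0.5.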

Blind Reconstruction Challenge — forward model has unknown mismatch, must calibrate from data. Score = 0.4 × PSNR_norm + 0.4 × SSIM + 0.2 × (1 − ‖y − Ĥx̂‖/‖y‖)

#	Method	Overall Score	Public Score	Public PSNR / SSIM	Dev PSNR / SSIM	Hidden PSNR / SSIM	Trust	Source
🥇	ELP-Unfolding + blind cal	0.191	0.572	23.06 dB / 0.672	—	—	✓ Certified	InverseNet Scenario IV (blind cal, improved hybrid grid search)
🥈	EfficientSCI + blind cal	0.189	0.567	22.65 dB / 0.676	—	—	✓ Certified	InverseNet Scenario IV (blind cal, improved hybrid grid search)
🥉	HiSViT-9 + blind cal	0.188	0.565	22.58 dB / 0.673	—	—	✓ Certified	InverseNet Scenario IV (blind cal, improved hybrid grid search)
4	GAP-TV + blind cal	0.174	0.522	21.79 dB / 0.590	—	—	✓ Certified	InverseNet Scenario IV (blind cal, improved hybrid grid search)
5	PnP-FFDNet + blind cal	0.149	0.446	17.37 dB / 0.542	—	—	✓ Certified	InverseNet Scenario IV (blind cal, improved hybrid grid search)

Complete score requires all 3 tiers (Public + Dev + Hidden).
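The page does not state how the three tier scores combine into the overall score. As an observation only, the listed overall scores are numerically consistent with a plain average over the three tiers in which an unevaluated tier counts as zero. The sketch below checks that reading against the table; `overall_score` is a hypothetical helper, not a documented rule.

```python
def overall_score(public, dev=None, hidden=None):
    """ASSUMED aggregation: mean of the three tier scores, missing tiers count as 0."""
    tiers = [public, dev, hidden]
    return sum(0.0 if s is None else s for s in tiers) / len(tiers)

# (public score, listed overall score) pairs copied from the leaderboard.
listed = [(0.572, 0.191), (0.567, 0.189), (0.565, 0.188), (0.522, 0.174), (0.446, 0.149)]
for public, overall in listed:
    print(f"{public:.3f} -> {overall_score(public):.3f} (listed {overall:.3f})")
```

If this reading is right, a method evaluated on the Public tier alone can reach at most one third of the maximum overall score.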


Scoring: 0.4 × PSNR_norm + 0.4 × SSIM + 0.2 × (1 − ‖y − Ĥx̂‖/‖y‖) PSNR 40% · SSIM 40% · Consistency 20%
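The consistency term rewards reconstructions that re-project onto the measurement under the corrected operator. The sketch below computes it with NumPy and combines the three terms. It assumes PSNR_norm is the same clip((PSNR − 15)/30, 0, 1) used by the standard score, which this page does not state explicitly; the function names are illustrative.

```python
import numpy as np

def consistency(y, y_pred):
    """1 - ||y - y_pred|| / ||y||, where y_pred is the corrected operator applied to x_hat."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.linalg.norm(y - y_pred) / np.linalg.norm(y)

def blind_score(psnr_db, ssim, cons):
    # ASSUMPTION: PSNR_norm = clip((PSNR - 15) / 30, 0, 1), as in the standard score.
    psnr_norm = min(max((psnr_db - 15.0) / 30.0, 0.0), 1.0)
    return 0.4 * psnr_norm + 0.4 * ssim + 0.2 * cons

y = np.ones((4, 4))
y_pred = 0.9 * y                       # a prediction that is 10% too low everywhere
cons = consistency(y, y_pred)          # relative residual 0.1, so consistency 0.9
print(round(blind_score(23.06, 0.672, cons), 3))
```

Under the same assumption, the listed Public score of 0.572 for ELP-Unfolding + blind cal (23.06 dB, SSIM 0.672) would imply a consistency of roughly 0.98.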

Public 6 scenes

Full-access development tier with all data visible.

What you get & how to use

What you get: Measurements (y), ideal forward operator (H), spec ranges, ground truth (x_true), and true mismatch spec.

How to use: Load HDF5 → compare reconstruction vs x_true → check consistency → iterate.

What to submit: Reconstructed signals (x_hat) and corrected spec as HDF5.

Public Leaderboard

#	Method	Score	PSNR	SSIM
1	ELP-Unfolding + blind cal	0.572	23.06	0.672
2	EfficientSCI + blind cal	0.567	22.65	0.676
3	HiSViT-9 + blind cal	0.565	22.58	0.673
4	GAP-TV + blind cal	0.522	21.79	0.590
5	PnP-FFDNet + blind cal	0.446	17.37	0.542

Spec Ranges (7 parameters)

Parameter	Min	Max	Unit
mask_dx	0.2	0.8	px
mask_dy	0.1	0.5	px
mask_rotation	-0.05	0.25	deg
mask_blur	-0.25	0.25	px
clock_offset	-0.05	0.15	frames
gain_drift	0.97	1.07
offset_drift	-0.018	0.022
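The spec ranges bound the search space for blind calibration. As a minimal sketch, the ranges above can be turned into a coarse grid of candidate specs, each of which would then be scored (for example by measurement consistency). This illustrates a plain exhaustive grid only; it is not the "improved hybrid grid search" named in the leaderboard entries, whose details are not given here.

```python
from itertools import product

# Public-tier spec ranges (min, max) copied from the table above.
SPEC_RANGES = {
    "mask_dx": (0.2, 0.8),
    "mask_dy": (0.1, 0.5),
    "mask_rotation": (-0.05, 0.25),
    "mask_blur": (-0.25, 0.25),
    "clock_offset": (-0.05, 0.15),
    "gain_drift": (0.97, 1.07),
    "offset_drift": (-0.018, 0.022),
}

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def coarse_grid(ranges, points_per_param=3):
    """Yield one candidate spec (a dict) per grid node."""
    names = list(ranges)
    axes = [linspace(lo, hi, points_per_param) for lo, hi in ranges.values()]
    for values in product(*axes):
        yield dict(zip(names, values))

candidates = list(coarse_grid(SPEC_RANGES))
print(len(candidates))  # 3**7 = 2187 candidates
```

The grid grows exponentially with the number of parameters, so in practice one would likely search a few parameters at a time or refine around the best coarse candidate.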


Dev 6 scenes

Blind evaluation tier — no ground truth available.

What you get & how to use

What you get: Measurements (y), ideal forward operator (H), and spec ranges only.

How to use: Apply your pipeline from the Public tier. Use consistency as self-check.

What to submit: Reconstructed signals and corrected spec. Scored server-side.

Dev Leaderboard

#	Method	Score	PSNR	SSIM
1	ELP-Unfolding + blind cal	0.000	0.0	0.0
2	EfficientSCI + blind cal	0.000	0.0	0.0
3	HiSViT-9 + blind cal	0.000	0.0	0.0
4	GAP-TV + blind cal	0.000	0.0	0.0
5	PnP-FFDNet + blind cal	0.000	0.0	0.0

Spec Ranges (7 parameters)

Parameter	Min	Max	Unit
mask_dx	0.05	0.65	px
mask_dy	0.0	0.4	px
mask_rotation	-0.07	0.23	deg
mask_blur	-0.15	0.35	px
clock_offset	-0.13	0.07	frames
gain_drift	0.93	1.03
offset_drift	-0.03	0.01


Hidden 6 scenes

Fully blind server-side evaluation — no data download.

What you get & how to use

What you get: No data downloadable. Algorithm runs server-side on hidden measurements.

How to use: Package algorithm as Docker container / Python script. Submit via link.

What to submit: Containerized algorithm accepting y + H, outputting x_hat + corrected spec.

Hidden Leaderboard

#	Method	Score	PSNR	SSIM
1	ELP-Unfolding + blind cal	0.000	0.0	0.0
2	EfficientSCI + blind cal	0.000	0.0	0.0
3	HiSViT-9 + blind cal	0.000	0.0	0.0
4	GAP-TV + blind cal	0.000	0.0	0.0
5	PnP-FFDNet + blind cal	0.000	0.0	0.0

Spec Ranges (7 parameters)

Parameter	Min	Max	Unit
mask_dx	0.35	0.95	px
mask_dy	0.2	0.6	px
mask_rotation	0.07	0.37	deg
mask_blur	0.1	0.6	px
clock_offset	-0.02	0.18	frames
gain_drift	0.99	1.09
offset_drift	-0.005	0.035


Blind Reconstruction Challenge

Challenge

Given measurements with unknown mismatch and spec ranges (not exact params), reconstruct the original signal. A method must be evaluated on all three tiers for a complete score. Scored on a composite metric: 0.4 × PSNR_norm + 0.4 × SSIM + 0.2 × (1 − ‖y − Ĥx̂‖/‖y‖).

Input

Measurements y, ideal forward model H, spec ranges

Output

Reconstructed signal x̂

About the Imaging Modality

CACTI captures multiple video frames in a single camera exposure by modulating the scene with a shifting binary mask during the integration period. Each temporal frame sees a different mask pattern, and the detector integrates all modulated frames into a single 2D measurement. The forward model is y = Σ_t M_t ⊙ x_t + n, where x_t is the scene at sub-frame t, M_t is the mask at that time, ⊙ is element-wise multiplication, and n is measurement noise. Typical compression ratios are 8 to 48 frames per snapshot. Reconstruction exploits temporal correlation via iterative methods (GAP-TV), plug-and-play priors (PnP-FFDNet), or learned networks (STFormer, EfficientSCI).
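The forward model can be written in a few lines of NumPy. This is a sketch under simple assumptions: random binary masks, a random stand-in video, and optional additive Gaussian noise. The function name `cacti_forward` and the 64 × 64 frame size are illustrative.

```python
import numpy as np

def cacti_forward(x, masks, noise_sigma=0.0, rng=None):
    """Sum of mask-modulated frames.

    x, masks: arrays of shape (frames, height, width).
    Returns a single (height, width) snapshot.
    """
    y = np.sum(masks * x, axis=0)
    if noise_sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + noise_sigma * rng.standard_normal(y.shape)
    return y

rng = np.random.default_rng(0)
frames, height, width = 8, 64, 64            # 8 frames matches the benchmark's T = 8
x = rng.random((frames, height, width))      # stand-in video with values in [0, 1)
masks = (rng.random((frames, height, width)) < 0.5).astype(float)
y = cacti_forward(x, masks)
print(y.shape)  # (64, 64)
```

Eight frames of 64 × 64 pixels collapse into one 64 × 64 measurement, which is why reconstruction needs a prior on the video.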

Principle

Coded Aperture Compressive Temporal Imaging (CACTI) compresses multiple high-speed video frames into a single sensor exposure by modulating the scene with a dynamic coded aperture (shifting mask) during the integration time. The sensor accumulates a coded sum of B consecutive frames, and computational algorithms recover all B frames from the single compressed measurement using video sparsity priors.

How to Build the System

Build a relay optical system with a physical translating mask or use a DMD as the coded aperture at an intermediate image plane. The mask shifts by one pixel per sub-frame interval during the camera integration time, effectively encoding B temporal frames. Use a standard camera at normal frame rate (e.g., 30 fps) to capture the compressed measurement. Calibrate the mask pattern and its motion precisely.
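In simulation, the one-pixel-per-sub-frame translation can be mimicked by shifting a single base pattern. The sketch below uses `np.roll`, which wraps around at the border. A physical mask does not wrap (it would typically be larger than the field of view), so treat the circular shift as a simplifying assumption; `shifting_masks` is an illustrative name.

```python
import numpy as np

def shifting_masks(base_mask, num_frames, shift_px=1, axis=0):
    """Stack of num_frames masks, each shifted shift_px further along axis.

    Uses a circular shift (np.roll) as a stand-in for physical translation.
    """
    return np.stack([np.roll(base_mask, t * shift_px, axis=axis)
                     for t in range(num_frames)])

rng = np.random.default_rng(0)
base = (rng.random((64, 64)) < 0.5).astype(float)   # random binary base pattern
masks = shifting_masks(base, num_frames=8)
print(masks.shape)  # (8, 64, 64)
```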

Common Reconstruction Algorithms

GAP-TV (Generalized Alternating Projection with Total Variation)
DeSCI (Decompress Snapshot Compressive Imaging, low-rank prior via weighted nuclear norm minimization)
PnP-FFDNet (Plug-and-Play with FFDNet denoiser)
Deep unfolding: ELP-Unfolding
End-to-end trained networks: BIRNAT, RevSCI, EfficientSCI, and the transformer-based STFormer and CST
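Several of the methods above share the generalized alternating projection (GAP) skeleton: project the estimate onto the set of videos consistent with the measurement, then denoise. Because the CACTI operator sums masked frames, ΦΦᵀ is diagonal with entries Σ_t M_t², so the projection is an element-wise division. The sketch below uses a 3 × 3 box filter purely as a placeholder denoiser. It is not total variation, FFDNet, or any published implementation.

```python
import numpy as np

def forward(x, masks):
    """CACTI operator: sum of masked frames."""
    return np.sum(masks * x, axis=0)

def box_denoise(x):
    """Placeholder denoiser: 3x3 spatial box filter per frame (NOT total variation)."""
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(x, (dy, dx), axis=(1, 2))
    return out / 9.0

def gap_reconstruct(y, masks, num_iters=20, denoise=box_denoise):
    phi_phi_t = np.sum(masks ** 2, axis=0)   # diagonal of Phi Phi^T
    phi_phi_t[phi_phi_t == 0] = 1.0          # avoid dividing by zero at unsampled pixels
    v = masks * (y / phi_phi_t)              # initial estimate: normalized back-projection
    for _ in range(num_iters):
        residual = (y - forward(v, masks)) / phi_phi_t
        x = v + masks * residual             # projection onto {x : Phi x = y}
        v = denoise(x)                       # prior step
    return x                                 # last measurement-consistent estimate

rng = np.random.default_rng(0)
x_true = rng.random((8, 32, 32))
masks = (rng.random((8, 32, 32)) < 0.5).astype(float)
y = forward(x_true, masks)
x_hat = gap_reconstruct(y, masks, num_iters=5)
print(np.allclose(forward(x_hat, masks), y))
```

Swapping `box_denoise` for a TV or learned denoiser changes the prior without touching the projection step, which is the design idea behind the plug-and-play variants.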

Common Mistakes

Mask calibration error causing temporal frame misalignment in reconstruction
Compression ratio too high (too many sub-frames per snapshot) for the scene motion
Motion blur within individual sub-frame intervals when scene moves fast
Non-uniform mask illumination creating brightness gradients in recovered frames
Choosing masks with poor conditioning (high mutual coherence between columns of the sensing matrix)

How to Avoid Mistakes

Calibrate mask position precisely using a static known pattern before experiments
Limit compression ratio (B ≤ 8-10 for complex natural scenes; B ≤ 24-48 for simpler scenes)
Ensure sub-frame exposure is short enough that intra-frame motion is negligible
Flatfield-correct the mask modulation using a uniform target calibration
Simulate reconstruction quality with candidate mask patterns before hardware fabrication

Forward-Model Mismatch Cases

The widefield fallback processes a single 2D (64, 64) frame, whereas CACTI compresses B temporal frames of shape (64, 64, B) into a single 2D coded snapshot using a shifting binary mask, so the fallback loses the temporal dimension entirely
Without the time-varying coded exposure pattern, individual video frames cannot be separated from the compressed measurement, so temporal super-resolution from the fallback is impossible

How to Correct the Mismatch

Use the CACTI operator that applies frame-wise binary masks and sums the coded frames: y = Σ_b M_b ⊙ x_b, compressing B frames into one measurement
Reconstruct the video sequence using PnP-SCI (plug-and-play with FastDVDnet), ELP-Unfolding, or GAP-TV, all of which model the temporal compression and recover B frames from the single snapshot

Experimental Setup — Signal Chain

Experimental setup diagram for Coded Aperture Compressive Temporal Imaging (CACTI)

Reconstruction Gallery — 4 Scenes × 3 Scenarios

Method: CPU_baseline | Mismatch: nominal (nominal=True, perturbed=False)

Scenarios: I Ideal · II Mismatch (uncorrected) · III Oracle correction

[Image gallery: for each of the 4 scenes and each scenario, three panels show Ground Truth, Measurement (perturbed in scenario III), and Reconstruction. The PSNR and SSIM of every reconstruction are listed in the per-scene PSNR breakdown.]

Mean PSNR Across All Scenes

I (Ideal): 29.06 dB

II (Mismatch): 22.85 dB

III (Oracle): 28.80 dB

Degradation (I → II): -6.2 dB · Recovery (II → III): +6.0 dB

Per-scene PSNR breakdown (4 scenes)

Scene	I PSNR (dB)	I SSIM	II PSNR (dB)	II SSIM	III PSNR (dB)	III SSIM
scene_00	28.84	0.894	19.63	0.630	28.66	0.888
scene_01	20.88	0.703	16.78	0.487	20.84	0.700
scene_02	30.69	0.915	24.83	0.847	30.01	0.908
scene_03	35.81	0.974	30.15	0.950	35.68	0.973
Mean	29.06	0.872	22.85	0.729	28.80	0.868
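The degradation and recovery figures are differences between the scenario means. The short check below uses the mean PSNR values rounded to four decimals; the variable names are illustrative.

```python
# Mean PSNR per scenario in dB, rounded to four decimals from the reported means.
mean_psnr = {"ideal": 29.0552, "mismatch": 22.8459, "oracle": 28.7988}

degradation = mean_psnr["mismatch"] - mean_psnr["ideal"]   # cost of ignoring the mismatch
recovery = mean_psnr["oracle"] - mean_psnr["mismatch"]     # gain from oracle correction
residual_gap = mean_psnr["oracle"] - mean_psnr["ideal"]    # what oracle correction leaves behind

print(f"Degradation: {degradation:+.1f} dB, Recovery: {recovery:+.1f} dB")
# Degradation: -6.2 dB, Recovery: +6.0 dB
print(f"Residual gap: {residual_gap:+.2f} dB")
```

Oracle correction recovers nearly all of the loss; the remaining gap to the ideal scenario is about a quarter of a dB.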

Experimental Setup

Instrument: Custom CACTI system (Duke / USTC prototype)

Coded Aperture: shifting binary mask on lithographic substrate

Frames Per Snapshot: 8

Spatial Resolution: 256 × 256

Compression Ratio: 8

Equivalent frame rate: 1200 fps

Detector: FLIR Point Grey Grasshopper3 CMOS

Reconstruction: GAP-TV / PnP-FFDNet / STFormer

Key References

Llull et al., 'Coded aperture compressive temporal imaging', Optics Express 21, 10526 (2013)
Yuan, 'Generalized alternating projection based total variation minimization for compressive sensing' (GAP-TV), IEEE ICIP 2016
Wang et al., 'Spatial-Temporal Transformer for Video Snapshot Compressive Imaging' (STFormer), IEEE TPAMI 2022

Canonical Datasets

Kobe, Runner, Drop, Traffic (grayscale SCI benchmarks)
DAVIS 2017 (adapted for SCI simulation)

Spec DAG — Forward Model Pipeline

M(m_t) → Σ_t → D(g, η₄)

M Temporal Mask (m_t)

Σ Temporal Sum (t)

D Detector (g, η₄)

Mismatch Parameters

Symbol	Parameter	Description	Nominal	Perturbed
Δx	mask_dx	Mask lateral shift (pixels)	0	0.5
Δy	mask_dy	Mask vertical shift (pixels)	0	0.3
θ	mask_theta	Mask rotation (rad)	0	0.1
Δt	clock_offset	Clock synchronization offset	0	0.05
d	duty_cycle	Shutter duty cycle	1.0	0.95
g	gain	Detector gain multiplier	1.0	1.02
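To see how such perturbations affect the measurement, the sketch below applies two of the listed parameters, the sub-pixel mask shift (Δx = 0.5, Δy = 0.3) and the gain (g = 1.02), and compares the result with what the nominal operator predicts. Rotation, clock offset, and duty cycle are left out. The bilinear shift with periodic boundaries is an illustrative simplification, not the benchmark's mismatch model.

```python
import numpy as np

def subpixel_shift(mask, dx, dy):
    """Shift a 2D mask by (dx, dy) pixels, 0 <= dx, dy < 1.

    Bilinear interpolation with periodic boundaries (a simplifying assumption).
    """
    right = np.roll(mask, 1, axis=1)             # mask moved one pixel along x
    down = np.roll(mask, 1, axis=0)              # mask moved one pixel along y
    diag = np.roll(mask, (1, 1), axis=(0, 1))    # moved one pixel along both
    return ((1 - dx) * (1 - dy) * mask + dx * (1 - dy) * right
            + (1 - dx) * dy * down + dx * dy * diag)

rng = np.random.default_rng(0)
x = rng.random((8, 32, 32))                              # stand-in video
base = (rng.random((32, 32)) < 0.5).astype(float)
nominal = np.stack([np.roll(base, t, axis=0) for t in range(8)])
perturbed = np.stack([subpixel_shift(m, dx=0.5, dy=0.3) for m in nominal])

gain = 1.02
y = gain * np.sum(perturbed * x, axis=0)                 # measurement from the perturbed system
y_nominal = np.sum(nominal * x, axis=0)                  # what the nominal operator predicts
mismatch_ratio = np.linalg.norm(y - y_nominal) / np.linalg.norm(y)
print(f"relative mismatch: {mismatch_ratio:.3f}")
```

A nonzero ratio here means that even the true video fails the consistency check under the nominal operator, which is why calibration has to precede or accompany reconstruction.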

Credits System

Platform Profit Pool: 40% (revenue allocated to benchmark rewards)

Winner Share: 30% (top algorithm receives from pool)

Min Withdrawal: $100 (minimum payout threshold)

Spec Primitives Reference (11 primitives)

P Propagation

Free-space or medium propagation kernel (Fresnel, Rayleigh-Sommerfeld).

M Mask / Modulation

Spatial or spatio-temporal amplitude modulation (coded aperture, SLM pattern).

Π Projection

Geometric projection operator (Radon transform, fan-beam, cone-beam).

F Fourier Sampling

Sampling in the Fourier / k-space domain (MRI, ptychography).

C Convolution

Shift-invariant convolution with a point-spread function (PSF).

Σ Summation / Integration

Summation along a physical dimension (spectral, temporal, angular).

D Detector

Sensor readout with gain g and noise model η (Gaussian, Poisson, mixed).

S Structured Illumination

Patterned illumination (block, Hadamard, random) applied to the scene.

W Wavelength Dispersion

Spectral dispersion element (prism, grating) with shift α and aperture a.

R Rotation / Motion

Sample or gantry rotation (CT, electron tomography).

Λ Wavelength Selection

Spectral filter or monochromator selecting a wavelength band.