Physics World Model — Modality Catalog
Four imaging modalities with descriptions, experimental setups, and reconstruction guidance.
Coded Aperture Compressive Temporal Imaging (CACTI)
Description
CACTI captures multiple video frames in a single camera exposure by modulating the scene with a shifting binary mask during the integration period. Each temporal frame sees a different mask pattern, and the detector integrates all modulated frames into a single 2D measurement. The forward model is y = sum_t M_t * x_t + n where M_t is the mask at time t. Typical compression ratios are 8-48 frames per snapshot. Reconstruction exploits temporal correlation via GAP-TV, PnP-FFDNet, or deep unfolding networks (STFormer, EfficientSCI).
Principle
Coded Aperture Compressive Temporal Imaging (CACTI) compresses multiple high-speed video frames into a single sensor exposure by modulating the scene with a dynamic coded aperture (shifting mask) during the integration time. The sensor accumulates a coded sum of B consecutive frames, and computational algorithms recover all B frames from the single compressed measurement using video sparsity priors.
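The coded-sum forward model above can be sketched in a few lines of numpy; the function names (`cacti_forward`, `shifting_masks`) are illustrative, not from any particular codebase, and the one-pixel-per-sub-frame shift follows the translating-mask description below:

```python
import numpy as np

def cacti_forward(x, masks, noise_sigma=0.0, rng=None):
    """Coded sum y = sum_b M_b * x_b + n over a B-frame video block.

    x:     (H, W, B) video frames
    masks: (H, W, B) binary masks, one per sub-frame
    """
    y = np.sum(masks * x, axis=-1)
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        y = y + rng.normal(0.0, noise_sigma, y.shape)
    return y

def shifting_masks(base_mask, B):
    """Translating-mask pattern: one base mask rolled by one pixel per sub-frame."""
    return np.stack([np.roll(base_mask, b, axis=1) for b in range(B)], axis=-1)
```

Note that the detector output has the same (H, W) shape as a single frame: the temporal axis is folded into the code, not into extra pixels.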
How to Build the System
Build a relay optical system with a physical translating mask or use a DMD as the coded aperture at an intermediate image plane. The mask shifts by one pixel per sub-frame interval during the camera integration time, effectively encoding B temporal frames. Use a standard camera at normal frame rate (e.g., 30 fps) to capture the compressed measurement. Calibrate the mask pattern and its motion precisely.
Common Reconstruction Algorithms
- GAP-TV (Generalized Alternating Projection with Total Variation)
- DeSCI (Decompress Snapshot Compressive Imaging, GMM prior)
- PnP-FFDNet (Plug-and-Play with FFDNet denoiser)
- Deep unfolding: BIRNAT, RevSCI, EfficientSCI
- E2E-trained networks: STFormer, CST (transformer-based)
Common Mistakes
- Mask calibration error causing temporal frame misalignment in reconstruction
- Compression ratio too high (too many sub-frames per snapshot) for the scene motion
- Motion blur within individual sub-frame intervals when scene moves fast
- Non-uniform mask illumination creating brightness gradients in recovered frames
- Choosing masks with poor conditioning (high mutual coherence between rows)
How to Avoid Mistakes
- Calibrate mask position precisely using a static known pattern before experiments
- Limit compression ratio (B ≤ 8-10 for complex natural scenes; B ≤ 24-48 for simpler scenes)
- Ensure sub-frame exposure is short enough that intra-frame motion is negligible
- Flatfield-correct the mask modulation using a uniform target calibration
- Simulate reconstruction quality with candidate mask patterns before hardware fabrication
Forward-Model Mismatch Cases
- The widefield fallback processes a single 2D (64,64) frame, but CACTI compresses B temporal frames into a single 2D coded snapshot using a shifting binary mask — the temporal dimension (64,64,B) is entirely lost
- Without the time-varying coded exposure pattern, individual video frames cannot be separated from the compressed measurement — temporal super-resolution from the fallback is impossible
How to Correct the Mismatch
- Use the CACTI operator that applies frame-wise binary masks and sums the coded frames: y = sum_b(M_b * x_b), compressing B frames into one measurement
- Reconstruct the video sequence using PnP-SCI (plug-and-play with FastDVDnet), ELP-Unfolding, or GAP-TV that model the temporal compression and recover B frames from the single snapshot
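The GAP-style recovery mentioned above can be sketched as follows; this is a simplified sketch assuming binary masks, with the plug-and-play denoiser left as an identity placeholder where a TV or FastDVDnet prior step would plug in:

```python
import numpy as np

def gap_cacti(y, masks, n_iter=50, denoise=lambda v: v):
    """Generalized Alternating Projection for CACTI (sketch).

    y:     (H, W) coded snapshot
    masks: (H, W, B) binary masks
    denoise: pluggable prior step (TV, learned denoiser, ...); identity here.
    """
    Phi_sum = np.sum(masks ** 2, axis=-1)           # diagonal of Phi Phi^T
    Phi_sum = np.where(Phi_sum == 0, 1.0, Phi_sum)  # guard uncovered pixels
    x = masks * (y / Phi_sum)[..., None]            # back-projection init
    for _ in range(n_iter):
        resid = y - np.sum(masks * x, axis=-1)          # data residual
        x = x + masks * (resid / Phi_sum)[..., None]    # Euclidean projection
        x = denoise(x)                                  # prior step (PnP)
    return x
```

Because Phi Phi^T is diagonal for frame-wise masks, the projection step is a cheap per-pixel division rather than a linear solve.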
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Llull et al., 'Coded aperture compressive temporal imaging', Optics Express 21, 10526-10545 (2013)
- Yuan, 'Generalized alternating projection based total variation minimization for compressive sensing (GAP-TV)', IEEE ICIP 2016
- Wang et al., 'Spatial-Temporal Transformer for Video Snapshot Compressive Imaging (STFormer)', ECCV 2022
Canonical Datasets
- Kobe, Runner, Drop, Traffic (grayscale SCI benchmarks)
- DAVIS 2017 (adapted for SCI simulation)
Coded Aperture Snapshot Spectral Imaging (CASSI)
Description
CASSI captures a 3D hyperspectral data cube (2 spatial + 1 spectral dimension) in a single 2D camera exposure. The scene is modulated by a binary coded aperture mask, spectrally dispersed by a prism, and integrated onto a 2D detector. The forward model is y = H*x + n where H encodes both coded-aperture modulation and spectral-dispersion shift. Compression ratios equal the number of spectral bands (e.g. 28:1). Reconstruction exploits spectral correlation via GAP-TV, MST, or CST.
Principle
Coded Aperture Snapshot Spectral Imaging (CASSI) captures a full 3-D spectral datacube (x, y, λ) in a single 2-D snapshot by encoding the scene with a binary coded aperture and spectrally dispersing it with a prism onto the detector. Different spectral channels are shifted and superimposed on the sensor, creating a compressed measurement. Computational algorithms recover the full datacube from this single measurement using sparsity priors.
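A minimal numpy sketch of this single-disperser forward model, assuming a one-pixel-per-band dispersion shift (the `step` parameter and function name are illustrative):

```python
import numpy as np

def cassi_forward(cube, mask, step=1):
    """SD-CASSI sketch: code each band, shift by dispersion, sum on detector.

    cube: (H, W, L) spectral datacube
    mask: (H, W) binary coded aperture
    The detector widens to W + (L-1)*step to hold the dispersed bands.
    """
    H, W, L = cube.shape
    y = np.zeros((H, W + (L - 1) * step))
    for l in range(L):
        coded = mask * cube[:, :, l]            # coded-aperture modulation
        y[:, l * step : l * step + W] += coded  # prism shift + integration
    return y
```

Each detector column thus superimposes several coded spectral channels, which is exactly the structure the reconstruction has to invert.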
How to Build the System
Build an optical relay with an objective lens, place a binary coded aperture (lithographic chrome-on-glass mask or DMD) at an intermediate image plane, then disperse with an Amici or double-Amici prism, and re-image onto a high-resolution detector (2048 × 2048 pixels or larger). Precisely calibrate the spectral dispersion curve (nm/pixel). The coded aperture pattern should have ~50 % transmittance and good conditioning.
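The dispersion calibration above can be sketched as a low-order polynomial fit of line centroids measured with monochromatic sources; the wavelength and pixel values below are hypothetical placeholders, not measured data:

```python
import numpy as np

# Hypothetical calibration data: centroid column of each laser line on the sensor.
wavelengths_nm = np.array([450.0, 488.0, 532.0, 590.0, 633.0])
pixel_columns  = np.array([102.4, 141.8, 187.5, 247.6, 292.1])

# Quadratic wavelength -> pixel map; prism dispersion is mildly nonlinear.
coeffs = np.polyfit(wavelengths_nm, pixel_columns, deg=2)
pixel_of = np.poly1d(coeffs)

# Local dispersion in pixels/nm is the derivative of the fitted map
# (take the reciprocal for the nm/pixel convention used above).
dispersion = np.polyder(pixel_of)
```

In practice one checks the fit residuals against the expected centroiding accuracy before accepting the calibration.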
Common Reconstruction Algorithms
- TwIST (Two-step Iterative Shrinkage/Thresholding)
- GAP-TV (Generalized Alternating Projection with Total Variation)
- ADMM with sparsity in DCT or wavelet domain
- Deep unfolding networks (DGSMP, TSA-Net, BIRNAT)
- Plug-and-Play ADMM with learned denoisers
Common Mistakes
- Poor spectral calibration causing wavelength assignment errors across the datacube
- Coded aperture not precisely at the image plane, blurring the code modulation
- Insufficient detector resolution relative to the number of spectral bands
- Ignoring optical aberrations in the dispersive relay that vary with wavelength
- Using a random mask without checking its sensing matrix condition number
How to Avoid Mistakes
- Calibrate spectral mapping with monochromatic sources at known wavelengths
- Mount coded aperture on a precision z-stage and focus to maximize modulation contrast
- Ensure the detector width covers the dispersed extent (spatial width + (number of bands − 1) × per-band shift) so every coded spectral channel lands on the sensor
- Design the relay optics for uniform imaging quality across the spectral range
- Optimize or simulate the mask pattern for low coherence (good RIP) before fabrication
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) grayscale image, but CASSI compresses a 3D spectral datacube (64,64,L wavelengths) into a single 2D coded snapshot via a binary mask and dispersive prism — the spectral dimension is entirely absent
- Without the coded aperture mask and spectral dispersion, the measurement does not encode wavelength-dependent information — spectral unmixing or hyperspectral reconstruction from the fallback output is impossible
How to Correct the Mismatch
- Use the CASSI operator that applies the binary coded aperture mask followed by spectral dispersion (prism/grating shift), producing a 2D coded measurement that encodes the full 3D spectral datacube
- Reconstruct the (x,y,lambda) datacube using compressive sensing (TwIST, GAP-TV) or deep unfolding networks (TSA-Net, MST) that exploit the spatio-spectral structure encoded by the CASSI forward model
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Wagadarikar et al., 'Single disperser design for coded aperture snapshot spectral imaging', Applied Optics 47, B44-B51 (2008)
- Cai et al., 'Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction (MST)', CVPR 2022
Canonical Datasets
- CAVE (Columbia, 32 scenes, 512x512x31)
- KAIST (30 scenes, 2704x3376x28)
- ARAD_1K (1000 hyperspectral images)
Generic Compressive Matrix Sensing
Description
Generic compressive sensing framework where the measurement process is modelled as y = A*x + n with A being an explicit M x N sensing matrix (M < N). This covers any linear inverse problem including random Gaussian, Bernoulli, or structured sensing matrices. The compressed sensing theory of Candes, Romberg, and Tao guarantees exact recovery when x is sparse and A satisfies the restricted isometry property (RIP). Reconstruction uses standard proximal algorithms (FISTA, ADMM) with sparsity-promoting regularizers (L1, TV, wavelet).
Principle
Generic matrix sensing models the forward process as y = Ax + n, where A is an arbitrary measurement matrix (not necessarily structured like a convolution or Radon transform). This is the most general compressive sensing framework, applicable to random projections, coded apertures, and any linear dimensionality reduction scheme. The key requirement is that A satisfies the Restricted Isometry Property (RIP) for successful sparse recovery.
How to Build the System
Implementation depends on the physical sensing modality. For optical random projections, use a DMD or scattering medium to implement pseudo-random measurement vectors. Calibrate the measurement matrix A by measuring the system response to a complete basis set (e.g., Hadamard patterns). Store A as a dense or structured matrix. Ensure the measurement SNR is adequate for the desired reconstruction quality.
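The basis-probing calibration described above can be sketched as follows, using canonical basis vectors for clarity (in practice Hadamard patterns spread light over many pixels and give better SNR per measurement; the function name is illustrative):

```python
import numpy as np

def calibrate_sensing_matrix(measure, n):
    """Recover A (M x N) by probing a black-box system y = A @ x.

    'measure' stands in for the physical system (or a simulator): it takes
    an input vector of length n and returns the measurement vector.
    """
    # Probing with e_j returns the j-th column of A.
    cols = [measure(np.eye(n)[:, j]) for j in range(n)]
    return np.stack(cols, axis=1)
```

The same loop works with any invertible probe basis B: measure A @ B column by column, then right-multiply by B^-1.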
Common Reconstruction Algorithms
- ISTA / FISTA (Iterative Shrinkage-Thresholding Algorithm)
- Basis pursuit (L1 minimization via linear programming)
- AMP (Approximate Message Passing)
- ADMM with various regularizers (TV, wavelet sparsity, low-rank)
- Learned ISTA (LISTA) and other deep unfolding networks
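A minimal FISTA sketch for the L1-regularized problem min_x 0.5||Ax - y||^2 + lam*||x||_1; the step size uses the spectral norm of A, and the parameter choices in the usage below are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, y, lam, n_iter=200):
    """FISTA for min_x 0.5||Ax - y||^2 + lam*||x||_1 (sketch)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)                       # gradient of data term
        x_new = soft_threshold(z - grad / L, lam / L)  # proximal step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2       # momentum schedule
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Dropping the momentum variables (z, t) reduces this to plain ISTA, at the cost of slower convergence.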
Common Mistakes
- Measurement matrix does not satisfy RIP (too coherent or poorly conditioned)
- Mismatch between calibrated A and actual system behavior (model error)
- Not accounting for measurement noise level when setting regularization strength
- Using an insufficiently sparse signal model for the reconstruction
- Ignoring quantization effects of the detector in the measurement model
How to Avoid Mistakes
- Verify the condition number and coherence of A; use random or optimized designs
- Re-calibrate A periodically to account for system drift
- Set regularization parameter proportional to noise level (e.g., via cross-validation)
- Validate sparsity assumption on representative signals before deploying CS
- Include quantization noise in the forward model or use dithering techniques
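The coherence check recommended above takes only a few lines (a sketch; for very large N, compute the Gram matrix blockwise instead of all at once):

```python
import numpy as np

def mutual_coherence(A):
    """Largest normalized inner product between distinct columns of A.

    Lower is better: incoherent designs (near 0) support sparser recovery,
    while values near 1 indicate nearly duplicate measurement directions.
    """
    G = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)                       # ignore self-products
    return gram.max()
```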
Forward-Model Mismatch Cases
- The widefield fallback applies a Gaussian blur (shape-preserving convolution), but the correct compressed sensing operator applies a random measurement matrix y = Phi*x that projects the image into a lower-dimensional space
- Gaussian blur preserves spatial locality and image structure, whereas the random measurement matrix scrambles all spatial information — the fallback measurements contain no compressed-sensing-compatible encoding
How to Correct the Mismatch
- Use the correct compressed sensing operator with the measurement matrix Phi (Gaussian random, partial Fourier, or structured random), producing y = Phi * vec(x)
- Reconstruct using L1/TV-regularized optimization (ISTA, ADMM) or learned proximal operators designed for the specific measurement matrix structure
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Candes et al., 'Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information', IEEE TIT 52, 489-509 (2006)
- Donoho, 'Compressed sensing', IEEE TIT 52, 1289-1306 (2006)
Canonical Datasets
- Set11 / BSD68 (simulation benchmarks)
Single-Pixel Camera
Description
The single-pixel camera reconstructs a 2D image from scalar intensity measurements acquired by a photodiode after spatially modulating the scene with known patterns on a DMD. Each measurement y_i is the inner product of the scene with a pattern, giving y = Phi*x + n. Compressed sensing theory guarantees recovery from M << N measurements if the scene is sparse. The single detector can operate at wavelengths where array detectors are unavailable (SWIR, THz). Reconstruction uses FISTA with L1/TV penalties or Plug-and-Play methods.
Principle
A single-pixel camera uses a spatial light modulator (DMD) to project a sequence of binary or grayscale patterns onto the scene. Each pattern multiplies the scene, and a single bucket detector (photodiode or PMT) measures the total light for each pattern, producing one scalar measurement per pattern. Compressive sensing recovers the image from far fewer measurements than Nyquist by exploiting sparsity in a transform domain.
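A numpy sketch of this measurement model using Sylvester Hadamard patterns; note that a real DMD displays binary (0/1) patterns, so the ±1 patterns below are usually realized as complementary pattern pairs whose readings are subtracted:

```python
import numpy as np

def hadamard(n):
    """Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def spc_measure(scene, patterns):
    """One bucket-detector reading per pattern: y_i = <phi_i, x>."""
    return patterns @ scene.ravel()
```

With the complete Hadamard set, the image is recovered exactly via the inverse transform (H^T y / n); compressive operation keeps only M << n rows and hands recovery to a sparse solver.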
How to Build the System
Place a DMD (e.g., Texas Instruments DLP LightCrafter) at the image plane of a relay lens. Focus the scene onto the DMD. After the DMD, collect all reflected light onto a single photodetector (avalanche photodiode for low light, or silicon photodiode for visible). Display Hadamard, random, or optimized patterns at 10-22 kHz DMD rate. Synchronize pattern display with detector readout.
Common Reconstruction Algorithms
- Basis pursuit / L1 minimization (LASSO)
- Orthogonal matching pursuit (OMP)
- Total-variation minimization (TV-CS)
- TVAL3 (TV with augmented Lagrangian and alternating direction)
- Deep compressive sensing networks (ReconNet, CSNet)
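Of the algorithms above, OMP is the simplest to sketch; the following is a reference implementation for the noiseless case with known sparsity k (the helper name `omp` is illustrative):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedy k-sparse recovery (sketch).

    Phi: (M, N) sensing matrix with (approximately) unit-norm columns
    y:   (M,) measurement vector
    """
    resid = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ resid)))   # most correlated atom
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        resid = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```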
Common Mistakes
- Pattern-detector timing mismatch causing wrong measurement-to-pattern association
- DMD diffraction effects not accounted for at oblique illumination angles
- Insufficient measurements for the scene complexity (under-sampling ratio too aggressive)
- Analog-to-digital converter resolution too low for the dynamic range of measurements
- Not calibrating detector linearity and dark current drift during long acquisitions
How to Avoid Mistakes
- Hardware-trigger the detector acquisition from the DMD synchronization signal
- Calibrate the effective pattern at the sample plane (not just the DMD command pattern)
- Start with 25-50 % measurement ratio for natural scenes; reduce only if sparsity allows
- Use 16-bit or higher ADC; verify linearity with a calibrated light source
- Measure dark frames periodically and subtract; maintain stable detector temperature
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) image, but the single-pixel camera acquires a 1D vector of M scalar measurements (M << N, where N = 64 × 64 pixels) via structured illumination patterns and a single photodetector — output shape (M,) vs (64,64)
- Each SPC measurement is an inner product of the scene with a known pattern (y_i = <phi_i, x>), capturing compressed information — the widefield blur produces all N pixels with no compression, making compressive reconstruction algorithms incompatible
How to Correct the Mismatch
- Use the SPC operator that applies the sensing matrix Phi (Hadamard, random, or learned patterns): y = Phi * x, where y has far fewer entries than the image has pixels
- Reconstruct using compressive sensing algorithms (ISTA-Net, basis pursuit, total variation) that exploit sparsity to recover the N-pixel image from M << N measurements
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Duarte et al., 'Single-pixel imaging via compressive sampling', IEEE Signal Processing Magazine 25, 83-91 (2008)
- Edgar et al., 'Principles and prospects for single-pixel imaging', Nature Photonics 13, 13-20 (2019)
Canonical Datasets
- Set11 (11 standard test images)
- BSD68 (Martin et al., ICCV 2001)