Physics World Model — Modality Catalog
Four imaging modalities with descriptions, experimental setups, and reconstruction guidance.
Coded Aperture Compressive Temporal Imaging (CACTI)
Description
CACTI captures multiple video frames in a single camera exposure by modulating the scene with a shifting binary mask during the integration period. Each temporal frame sees a different mask pattern, and the detector integrates all modulated frames into a single 2D measurement. The forward model is y = sum_t M_t * x_t + n where M_t is the mask at time t. Typical compression ratios are 8-48 frames per snapshot. Reconstruction exploits temporal correlation via GAP-TV, PnP-FFDNet, or deep unfolding networks (STFormer, EfficientSCI).
Principle
Coded Aperture Compressive Temporal Imaging (CACTI) compresses multiple high-speed video frames into a single sensor exposure by modulating the scene with a dynamic coded aperture (shifting mask) during the integration time. The sensor accumulates a coded sum of B consecutive frames, and computational algorithms recover all B frames from the single compressed measurement using video sparsity priors.
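The coded-sum forward model above can be sketched in a few lines of numpy; the function names (`cacti_forward`, `shifting_masks`) are illustrative, not from any particular codebase, and the one-pixel-per-sub-frame shift follows the translating-mask description below:

```python
import numpy as np

def cacti_forward(x, masks, noise_sigma=0.0, rng=None):
    """Coded sum y = sum_b M_b * x_b + n over a B-frame video block.

    x:     (H, W, B) video frames
    masks: (H, W, B) binary masks, one per sub-frame
    """
    y = np.sum(masks * x, axis=-1)
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        y = y + rng.normal(0.0, noise_sigma, y.shape)
    return y

def shifting_masks(base_mask, B):
    """Translating-mask pattern: one base mask rolled by one pixel per sub-frame."""
    return np.stack([np.roll(base_mask, b, axis=1) for b in range(B)], axis=-1)
```

Note that the detector output has the same (H, W) shape as a single frame: the temporal axis is folded into the code, not into extra pixels.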
How to Build the System
Build a relay optical system with a physical translating mask or use a DMD as the coded aperture at an intermediate image plane. The mask shifts by one pixel per sub-frame interval during the camera integration time, effectively encoding B temporal frames. Use a standard camera at normal frame rate (e.g., 30 fps) to capture the compressed measurement. Calibrate the mask pattern and its motion precisely.
Common Reconstruction Algorithms
- GAP-TV (Generalized Alternating Projection with Total Variation)
- DeSCI (Decompress Snapshot Compressive Imaging, GMM prior)
- PnP-FFDNet (Plug-and-Play with FFDNet denoiser)
- Deep unfolding: BIRNAT, RevSCI, EfficientSCI
- E2E-trained networks: STFormer, CST (transformer-based)
Common Mistakes
- Mask calibration error causing temporal frame misalignment in reconstruction
- Compression ratio too high (too many sub-frames per snapshot) for the scene motion
- Motion blur within individual sub-frame intervals when scene moves fast
- Non-uniform mask illumination creating brightness gradients in recovered frames
- Choosing masks with poor conditioning (high mutual coherence between rows)
How to Avoid Mistakes
- Calibrate mask position precisely using a static known pattern before experiments
- Limit compression ratio (B ≤ 8-10 for complex natural scenes; B ≤ 24-48 for simpler scenes)
- Ensure sub-frame exposure is short enough that intra-frame motion is negligible
- Flatfield-correct the mask modulation using a uniform target calibration
- Simulate reconstruction quality with candidate mask patterns before hardware fabrication
Forward-Model Mismatch Cases
- The widefield fallback processes a single 2D (64,64) frame, but CACTI compresses B temporal frames into a single 2D coded snapshot using a shifting binary mask — the temporal dimension (64,64,B) is entirely lost
- Without the time-varying coded exposure pattern, individual video frames cannot be separated from the compressed measurement — temporal super-resolution from the fallback is impossible
How to Correct the Mismatch
- Use the CACTI operator that applies frame-wise binary masks and sums the coded frames: y = sum_b(M_b * x_b), compressing B frames into one measurement
- Reconstruct the video sequence using PnP-SCI (plug-and-play with FastDVDnet), ELP-Unfolding, or GAP-TV that model the temporal compression and recover B frames from the single snapshot
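The GAP-style recovery mentioned above can be sketched as follows; this is a simplified sketch assuming binary masks, with the plug-and-play denoiser left as an identity placeholder where a TV or FastDVDnet prior step would plug in:

```python
import numpy as np

def gap_cacti(y, masks, n_iter=50, denoise=lambda v: v):
    """Generalized Alternating Projection for CACTI (sketch).

    y:     (H, W) coded snapshot
    masks: (H, W, B) binary masks
    denoise: pluggable prior step (TV, learned denoiser, ...); identity here.
    """
    Phi_sum = np.sum(masks ** 2, axis=-1)           # diagonal of Phi Phi^T
    Phi_sum = np.where(Phi_sum == 0, 1.0, Phi_sum)  # guard uncovered pixels
    x = masks * (y / Phi_sum)[..., None]            # back-projection init
    for _ in range(n_iter):
        resid = y - np.sum(masks * x, axis=-1)          # data residual
        x = x + masks * (resid / Phi_sum)[..., None]    # Euclidean projection
        x = denoise(x)                                  # prior step (PnP)
    return x
```

Because Phi Phi^T is diagonal for frame-wise masks, the projection step is a cheap per-pixel division rather than a linear solve.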
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Llull et al., 'Coded aperture compressive temporal imaging', Optics Express 21, 10526-10545 (2013)
- Yuan, 'Generalized alternating projection based total variation minimization for compressive sensing (GAP-TV)', IEEE ICIP 2016
- Wang et al., 'Spatial-Temporal Transformer for Video Snapshot Compressive Imaging (STFormer)', ECCV 2022
Canonical Datasets
- Kobe, Runner, Drop, Traffic (grayscale SCI benchmarks)
- DAVIS 2017 (adapted for SCI simulation)
Coded Aperture Snapshot Spectral Imaging (CASSI)
Description
CASSI captures a 3D hyperspectral data cube (2 spatial + 1 spectral dimension) in a single 2D camera exposure. The scene is modulated by a binary coded aperture mask, spectrally dispersed by a prism, and integrated onto a 2D detector. The forward model is y = H*x + n where H encodes both coded-aperture modulation and spectral-dispersion shift. Compression ratios equal the number of spectral bands (e.g. 28:1). Reconstruction exploits spectral correlation via GAP-TV, MST, or CST.
Principle
Coded Aperture Snapshot Spectral Imaging (CASSI) captures a full 3-D spectral datacube (x, y, λ) in a single 2-D snapshot by encoding the scene with a binary coded aperture and spectrally dispersing it with a prism onto the detector. Different spectral channels are shifted and superimposed on the sensor, creating a compressed measurement. Computational algorithms recover the full datacube from this single measurement using sparsity priors.
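A minimal numpy sketch of this single-disperser forward model, assuming a one-pixel-per-band dispersion shift (the `step` parameter and function name are illustrative):

```python
import numpy as np

def cassi_forward(cube, mask, step=1):
    """SD-CASSI sketch: code each band, shift by dispersion, sum on detector.

    cube: (H, W, L) spectral datacube
    mask: (H, W) binary coded aperture
    The detector widens to W + (L-1)*step to hold the dispersed bands.
    """
    H, W, L = cube.shape
    y = np.zeros((H, W + (L - 1) * step))
    for l in range(L):
        coded = mask * cube[:, :, l]            # coded-aperture modulation
        y[:, l * step : l * step + W] += coded  # prism shift + integration
    return y
```

Each detector column thus superimposes several coded spectral channels, which is exactly the structure the reconstruction has to invert.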
How to Build the System
Build an optical relay with an objective lens, place a binary coded aperture (lithographic chrome-on-glass mask or DMD) at an intermediate image plane, then disperse with an Amici or double-Amici prism, and re-image onto a high-resolution detector (2048 × 2048 pixels or larger). Precisely calibrate the spectral dispersion curve (nm/pixel). The coded aperture pattern should have ~50 % transmittance and good conditioning.
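The dispersion calibration above can be sketched as a low-order polynomial fit of line centroids measured with monochromatic sources; the wavelength and pixel values below are hypothetical placeholders, not measured data:

```python
import numpy as np

# Hypothetical calibration data: centroid column of each laser line on the sensor.
wavelengths_nm = np.array([450.0, 488.0, 532.0, 590.0, 633.0])
pixel_columns  = np.array([102.4, 141.8, 187.5, 247.6, 292.1])

# Quadratic wavelength -> pixel map; prism dispersion is mildly nonlinear.
coeffs = np.polyfit(wavelengths_nm, pixel_columns, deg=2)
pixel_of = np.poly1d(coeffs)

# Local dispersion in pixels/nm is the derivative of the fitted map
# (take the reciprocal for the nm/pixel convention used above).
dispersion = np.polyder(pixel_of)
```

In practice one checks the fit residuals against the expected centroiding accuracy before accepting the calibration.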
Common Reconstruction Algorithms
- TwIST (Two-step Iterative Shrinkage/Thresholding)
- GAP-TV (Generalized Alternating Projection with Total Variation)
- ADMM with sparsity in DCT or wavelet domain
- Deep unfolding networks (DGSMP, TSA-Net, BIRNAT)
- Plug-and-Play ADMM with learned denoisers
Common Mistakes
- Poor spectral calibration causing wavelength assignment errors across the datacube
- Coded aperture not precisely at the image plane, blurring the code modulation
- Insufficient detector resolution relative to the number of spectral bands
- Ignoring optical aberrations in the dispersive relay that vary with wavelength
- Using a random mask without checking its sensing matrix condition number
How to Avoid Mistakes
- Calibrate spectral mapping with monochromatic sources at known wavelengths
- Mount coded aperture on a precision z-stage and focus to maximize modulation contrast
- Ensure the detector width covers the dispersed extent (spatial width + (number of bands − 1) × per-band shift) so every coded spectral channel lands on the sensor
- Design the relay optics for uniform imaging quality across the spectral range
- Optimize or simulate the mask pattern for low coherence (good RIP) before fabrication
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) grayscale image, but CASSI compresses a 3D spectral datacube (64,64,L wavelengths) into a single 2D coded snapshot via a binary mask and dispersive prism — the spectral dimension is entirely absent
- Without the coded aperture mask and spectral dispersion, the measurement does not encode wavelength-dependent information — spectral unmixing or hyperspectral reconstruction from the fallback output is impossible
How to Correct the Mismatch
- Use the CASSI operator that applies the binary coded aperture mask followed by spectral dispersion (prism/grating shift), producing a 2D coded measurement that encodes the full 3D spectral datacube
- Reconstruct the (x,y,lambda) datacube using compressive sensing (TwIST, GAP-TV) or deep unfolding networks (TSA-Net, MST) that exploit the spatio-spectral structure encoded by the CASSI forward model
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Wagadarikar et al., 'Single disperser design for coded aperture snapshot spectral imaging', Applied Optics 47, B44-B51 (2008)
- Cai et al., 'Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction (MST)', CVPR 2022
Canonical Datasets
- CAVE (Columbia, 32 scenes, 512x512x31)
- KAIST (30 scenes, 2704x3376x28)
- ARAD_1K (1000 hyperspectral images)
Generic Compressive Matrix Sensing
Description
Generic compressive sensing framework where the measurement process is modelled as y = A*x + n with A being an explicit M x N sensing matrix (M < N). This covers any linear inverse problem including random Gaussian, Bernoulli, or structured sensing matrices. The compressed sensing theory of Candes, Romberg, and Tao guarantees exact recovery when x is sparse and A satisfies the restricted isometry property (RIP). Reconstruction uses standard proximal algorithms (FISTA, ADMM) with sparsity-promoting regularizers (L1, TV, wavelet).
Principle
Generic matrix sensing models the forward process as y = Ax + n, where A is an arbitrary measurement matrix (not necessarily structured like a convolution or Radon transform). This is the most general compressive sensing framework, applicable to random projections, coded apertures, and any linear dimensionality reduction scheme. The key requirement is that A satisfies the Restricted Isometry Property (RIP) for successful sparse recovery.
How to Build the System
Implementation depends on the physical sensing modality. For optical random projections, use a DMD or scattering medium to implement pseudo-random measurement vectors. Calibrate the measurement matrix A by measuring the system response to a complete basis set (e.g., Hadamard patterns). Store A as a dense or structured matrix. Ensure the measurement SNR is adequate for the desired reconstruction quality.
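The basis-probing calibration described above can be sketched as follows, using canonical basis vectors for clarity (in practice Hadamard patterns spread light over many pixels and give better SNR per measurement; the function name is illustrative):

```python
import numpy as np

def calibrate_sensing_matrix(measure, n):
    """Recover A (M x N) by probing a black-box system y = A @ x.

    'measure' stands in for the physical system (or a simulator): it takes
    an input vector of length n and returns the measurement vector.
    """
    # Probing with e_j returns the j-th column of A.
    cols = [measure(np.eye(n)[:, j]) for j in range(n)]
    return np.stack(cols, axis=1)
```

The same loop works with any invertible probe basis B: measure A @ B column by column, then right-multiply by B^-1.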
Common Reconstruction Algorithms
- ISTA / FISTA (Iterative Shrinkage-Thresholding Algorithm)
- Basis pursuit (L1 minimization via linear programming)
- AMP (Approximate Message Passing)
- ADMM with various regularizers (TV, wavelet sparsity, low-rank)
- Learned ISTA (LISTA) and other deep unfolding networks
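A minimal FISTA sketch for the L1-regularized problem min_x 0.5||Ax - y||^2 + lam*||x||_1; the step size uses the spectral norm of A, and the parameter choices in the usage below are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, y, lam, n_iter=200):
    """FISTA for min_x 0.5||Ax - y||^2 + lam*||x||_1 (sketch)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)                       # gradient of data term
        x_new = soft_threshold(z - grad / L, lam / L)  # proximal step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2       # momentum schedule
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Dropping the momentum variables (z, t) reduces this to plain ISTA, at the cost of slower convergence.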
Common Mistakes
- Measurement matrix does not satisfy RIP (too coherent or poorly conditioned)
- Mismatch between calibrated A and actual system behavior (model error)
- Not accounting for measurement noise level when setting regularization strength
- Using an insufficiently sparse signal model for the reconstruction
- Ignoring quantization effects of the detector in the measurement model
How to Avoid Mistakes
- Verify the condition number and coherence of A; use random or optimized designs
- Re-calibrate A periodically to account for system drift
- Set regularization parameter proportional to noise level (e.g., via cross-validation)
- Validate sparsity assumption on representative signals before deploying CS
- Include quantization noise in the forward model or use dithering techniques
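The coherence check recommended above takes only a few lines (a sketch; for very large N, compute the Gram matrix blockwise instead of all at once):

```python
import numpy as np

def mutual_coherence(A):
    """Largest normalized inner product between distinct columns of A.

    Lower is better: incoherent designs (near 0) support sparser recovery,
    while values near 1 indicate nearly duplicate measurement directions.
    """
    G = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)                       # ignore self-products
    return gram.max()
```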
Forward-Model Mismatch Cases
- The widefield fallback applies a Gaussian blur (shape-preserving convolution), but the correct compressed sensing operator applies a random measurement matrix y = Phi*x that projects the image into a lower-dimensional space
- Gaussian blur preserves spatial locality and image structure, whereas the random measurement matrix scrambles all spatial information — the fallback measurements contain no compressed-sensing-compatible encoding
How to Correct the Mismatch
- Use the correct compressed sensing operator with the measurement matrix Phi (Gaussian random, partial Fourier, or structured random), producing y = Phi * vec(x)
- Reconstruct using L1/TV-regularized optimization (ISTA, ADMM) or learned proximal operators designed for the specific measurement matrix structure
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Candes et al., 'Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information', IEEE TIT 52, 489-509 (2006)
- Donoho, 'Compressed sensing', IEEE TIT 52, 1289-1306 (2006)
Canonical Datasets
- Set11 / BSD68 (simulation benchmarks)
Single-Pixel Camera
Description
The single-pixel camera reconstructs a 2D image from scalar intensity measurements acquired by a photodiode after spatially modulating the scene with known patterns on a DMD. Each measurement y_i is the inner product of the scene with a pattern, giving y = Phi*x + n. Compressed sensing theory guarantees recovery from M << N measurements if the scene is sparse. The single detector can operate at wavelengths where array detectors are unavailable (SWIR, THz). Reconstruction uses FISTA with L1/TV penalties or Plug-and-Play methods.
Principle
A single-pixel camera uses a spatial light modulator (DMD) to project a sequence of binary or grayscale patterns onto the scene. Each pattern multiplies the scene, and a single bucket detector (photodiode or PMT) measures the total light for each pattern, producing one scalar measurement per pattern. Compressive sensing recovers the image from far fewer measurements than Nyquist by exploiting sparsity in a transform domain.
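A numpy sketch of this measurement model using Sylvester Hadamard patterns; note that a real DMD displays binary (0/1) patterns, so the ±1 patterns below are usually realized as complementary pattern pairs whose readings are subtracted:

```python
import numpy as np

def hadamard(n):
    """Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def spc_measure(scene, patterns):
    """One bucket-detector reading per pattern: y_i = <phi_i, x>."""
    return patterns @ scene.ravel()
```

With the complete Hadamard set, the image is recovered exactly via the inverse transform (H^T y / n); compressive operation keeps only M << n rows and hands recovery to a sparse solver.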
How to Build the System
Place a DMD (e.g., Texas Instruments DLP LightCrafter) at the image plane of a relay lens. Focus the scene onto the DMD. After the DMD, collect all reflected light onto a single photodetector (avalanche photodiode for low light, or silicon photodiode for visible). Display Hadamard, random, or optimized patterns at 10-22 kHz DMD rate. Synchronize pattern display with detector readout.
Common Reconstruction Algorithms
- Basis pursuit / L1 minimization (LASSO)
- Orthogonal matching pursuit (OMP)
- Total-variation minimization (TV-CS)
- TVAL3 (TV with augmented Lagrangian and alternating direction)
- Deep compressive sensing networks (ReconNet, CSNet)
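Of the algorithms above, OMP is the simplest to sketch; the following is a reference implementation for the noiseless case with known sparsity k (the helper name `omp` is illustrative):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedy k-sparse recovery (sketch).

    Phi: (M, N) sensing matrix with (approximately) unit-norm columns
    y:   (M,) measurement vector
    """
    resid = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ resid)))   # most correlated atom
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        resid = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```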
Common Mistakes
- Pattern-detector timing mismatch causing wrong measurement-to-pattern association
- DMD diffraction effects not accounted for at oblique illumination angles
- Insufficient measurements for the scene complexity (under-sampling ratio too aggressive)
- Analog-to-digital converter resolution too low for the dynamic range of measurements
- Not calibrating detector linearity and dark current drift during long acquisitions
How to Avoid Mistakes
- Hardware-trigger the detector acquisition from the DMD synchronization signal
- Calibrate the effective pattern at the sample plane (not just the DMD command pattern)
- Start with 25-50 % measurement ratio for natural scenes; reduce only if sparsity allows
- Use 16-bit or higher ADC; verify linearity with a calibrated light source
- Measure dark frames periodically and subtract; maintain stable detector temperature
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) image, but the single-pixel camera acquires a 1D vector of M scalar measurements (M << N, where N = 64 × 64 pixels) via structured illumination patterns and a single photodetector — output shape (M,) vs (64,64)
- Each SPC measurement is an inner product of the scene with a known pattern (y_i = <phi_i, x>), capturing compressed information — the widefield blur produces all N pixels with no compression, making compressive reconstruction algorithms incompatible
How to Correct the Mismatch
- Use the SPC operator that applies the sensing matrix Phi (Hadamard, random, or learned patterns): y = Phi * x, where y has far fewer entries than the image has pixels
- Reconstruct using compressive sensing algorithms (ISTA-Net, basis pursuit, total variation) that exploit sparsity to recover the N-pixel image from M << N measurements
Experimental Setup — Signal Chain
Experimental Setup — Details
Benchmark Variants
Key References
- Duarte et al., 'Single-pixel imaging via compressive sampling', IEEE Signal Processing Magazine 25, 83-91 (2008)
- Edgar et al., 'Principles and prospects for single-pixel imaging', Nature Photonics 13, 13-20 (2019)
Canonical Datasets
- Set11 (11 standard test images)
- BSD68 (Martin et al., ICCV 2001)