Physics World Model — Modality Catalog
3 imaging modalities with descriptions, experimental setups, and reconstruction guidance.
Integral Photography
Description
Integral photography (IP), originally proposed by Lippmann in 1908, captures a light field using a fly-eye lens array (matrix of small lenses) where each lenslet records a small elemental image from a slightly different perspective. The array of elemental images encodes 3D scene information, enabling computational refocusing, depth estimation, and autostereoscopic 3D display. Compared to microlens-based plenoptic cameras, IP typically uses larger lenslets with correspondingly more pixels per lens. Reconstruction includes depth-from-correspondence between elemental images and 3D focal stack computation.
Principle
Integral photography (also known as integral imaging) uses a 2-D array of elemental lenses to capture multi-perspective views of a 3-D scene simultaneously. Each elemental lens records a small perspective image, and the full set encodes the 4-D light field. Computational reconstruction produces 3-D images that can be viewed from different angles or refocused without glasses.
How to Build the System
Place a 2-D microlens or lenslet array (pitch 0.5-1 mm, ~50-200 elements per side) at one focal length from a high-resolution sensor. Each lenslet forms a separate elemental image. For display: show the integral image on a high-resolution display with a matched output lenslet array. Calibrate lenslet grid alignment, individual lens focal lengths, and vignetting correction. Use telecentric imaging for uniform magnification.
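The numbers above can be sanity-checked before committing to hardware. A minimal sketch, assuming illustrative values for the sensor pixel pitch, sensor width, and lenslet focal length (none of these are prescribed by this entry):

```python
# Hypothetical design check for an integral-photography capture rig.
# All numeric values are illustrative assumptions, not recommendations.
sensor_pixel_pitch_um = 3.45   # assumed sensor pixel pitch
sensor_width_px = 8192         # assumed sensor width in pixels
lenslet_pitch_um = 500.0       # 0.5 mm lenslet pitch (low end of the range above)
lenslet_focal_mm = 3.0         # assumed lenslet focal length

pixels_per_lenslet = lenslet_pitch_um / sensor_pixel_pitch_um
lenslets_across = sensor_width_px / pixels_per_lenslet
lenslet_f_number = lenslet_focal_mm * 1000.0 / lenslet_pitch_um

print(f"pixels per elemental image (1-D): {pixels_per_lenslet:.0f}")
print(f"lenslets across the sensor:       {lenslets_across:.0f}")
print(f"lenslet f-number:                 f/{lenslet_f_number:.1f}")
```

With these assumed numbers each elemental image spans roughly 145 pixels and about 56 lenslets fit across the sensor, consistent with the "larger lenslets, more pixels per lens" characterization in the description.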
Common Reconstruction Algorithms
- Computational refocusing via pixel rearrangement and summation (see the sketch after this list)
- Depth estimation from elemental image disparity analysis
- 3-D scene reconstruction from integral images
- Super-resolution integral imaging (combining multiple shifted captures)
- Deep-learning integral image reconstruction and view synthesis
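A minimal sketch of refocusing by pixel rearrangement and summation, assuming the elemental images have already been cropped out of the raw capture into a regular (rows, cols, height, width) array; real pipelines add per-lenslet calibration and sub-pixel shifts:

```python
import numpy as np

def refocus_integral(elemental, shift_per_lens):
    """Shift-and-sum refocusing over a grid of elemental images.

    elemental: (n_rows, n_cols, h, w) array of elemental images cut out of
        the raw integral photograph.
    shift_per_lens: per-lenslet disparity in pixels; sweeping this value
        sweeps the synthetic focal plane through the scene.
    """
    n_rows, n_cols, h, w = elemental.shape
    cy, cx = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    accum = np.zeros((h, w), dtype=np.float64)
    for i in range(n_rows):
        for j in range(n_cols):
            dy = int(round((i - cy) * shift_per_lens))
            dx = int(round((j - cx) * shift_per_lens))
            accum += np.roll(elemental[i, j], (dy, dx), axis=(0, 1))
    return accum / (n_rows * n_cols)
```

Calling the function with a range of shift_per_lens values produces a focal stack: objects at the depth matching a given shift appear sharp while the rest blur out.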
Common Mistakes
- Lenslet array not properly aligned with the sensor pixel grid
- Insufficient number of elemental lenses for the desired depth range
- Crosstalk between adjacent elemental images due to lens aberrations
- Not correcting for vignetting variations across the lenslet array
- Pseudoscopic (depth-reversed) images if reconstruction is not properly handled
How to Avoid Mistakes
- Align lenslet array to sensor with precision jigs and verify with calibration patterns
- Design lenslet pitch and focal length for the required depth-of-field
- Use high-quality molded lenslets and baffles to minimize crosstalk
- Apply per-lenslet calibration including vignetting and distortion correction
- Use computational depth inversion to correct pseudoscopic effects
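For the last point, one widely used depth-inversion fix is to rotate every elemental image by 180 degrees about its own centre, which converts a pseudoscopic (depth-reversed) reconstruction into an orthoscopic one. A sketch under the same (rows, cols, h, w) layout as the refocusing example above:

```python
import numpy as np

def correct_pseudoscopic(elemental):
    """Rotate each elemental image by 180 degrees (flip both image axes).

    elemental: (n_rows, n_cols, h, w) array of elemental images. Flipping
    each elemental image about its own centre reverses the reconstructed
    depth ordering, correcting the pseudoscopic effect.
    """
    return np.ascontiguousarray(elemental[:, :, ::-1, ::-1])
```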
Forward-Model Mismatch Cases
- The widefield fallback produces a single-perspective blurred image, but integral imaging captures multiple sub-aperture views through a lenslet array — each elemental image sees the scene from a slightly different angle
- Without the lenslet-array angular encoding, depth information (parallax between views) is lost — computational refocusing and 3D reconstruction from the fallback output are impossible
How to Correct the Mismatch
- Use the integral imaging operator that models the lenslet array: each microlens captures a different angular perspective, encoding the 4D light field on the 2D sensor
- Reconstruct depth maps via disparity estimation between elemental images, and perform computational refocusing using pixel rearrangement and summation across sub-aperture views
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Lippmann, G., 'La photographie intégrale', C. R. Acad. Sci. Paris 146, 446 (1908)
- Park et al., 'Recent progress in 3D imaging systems', J. Opt. Soc. Am. A 26, 2538 (2009)
Canonical Datasets
- ETRI integral imaging test set
- Middlebury multi-view stereo (adapted)
Light Field Imaging
Description
Light field imaging captures the full 4D radiance function L(x,y,u,v) describing both spatial position (x,y) and angular direction (u,v) of light rays. A microlens array placed before the sensor captures multiple sub-aperture views simultaneously, enabling post-capture refocusing, depth estimation, and perspective shifts. Each microlens images the objective's exit pupil, trading spatial resolution for angular resolution. The 4D light field can be processed with shift-and-sum for refocusing, disparity estimation for depth, or epipolar-plane image (EPI) analysis. Primary challenges include the inherent spatial-angular resolution tradeoff and microlens aberrations.
Principle
Light-field imaging captures both the spatial position and direction of light rays in a scene, recording the 4-D light field in the two-plane parameterization L(u,v,s,t), where (u,v) parameterize the aperture (angular) plane and (s,t) the spatial (image) plane; this is equivalent to the L(x,y,u,v) form used in the description above. This enables computational refocusing, depth estimation, and novel viewpoint synthesis from a single capture. A microlens array placed in front of the sensor trades spatial resolution for angular resolution.
How to Build the System
Place a microlens array (MLA) at the image plane of the main lens, one microlens focal length in front of the image sensor. Each microlens then records the angular distribution of light arriving at its spatial position (Lytro-style plenoptic camera). Alternative: use a camera array (e.g., 4×4 or 8×8 synchronized cameras) for higher angular and spatial resolution. Calibrate MLA alignment, microlens pitch, and main lens parameters.
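A minimal decoding sketch, assuming an idealized capture in which the lenslet grid is square, exactly aligned with the pixel grid, and covers an integer number of pixels per microlens; real Lytro-style decoders also estimate grid rotation, offset, and hexagonal packing from a white calibration image:

```python
import numpy as np

def extract_subapertures(raw, ml_px):
    """Re-sort a raw plenoptic image into sub-aperture views.

    raw: 2-D raw sensor image (grayscale, demosaicked).
    ml_px: number of pixels behind each microlens along one axis.
    Returns a 4-D light field lf[u, v, y, x], where lf[u, v] is the
    sub-aperture view through position (u, v) of the main-lens aperture.
    """
    h, w = raw.shape
    ny, nx = h // ml_px, w // ml_px
    raw = raw[:ny * ml_px, :nx * ml_px]
    # (ny, ml_px, nx, ml_px) -> (ml_px, ml_px, ny, nx)
    return raw.reshape(ny, ml_px, nx, ml_px).transpose(1, 3, 0, 2)
```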
Common Reconstruction Algorithms
- Shift-and-sum refocusing (synthetic aperture)
- Depth estimation from disparity between sub-aperture images (see the sketch after this list)
- Fourier slice theorem for light-field refocusing
- Light-field super-resolution (recovering spatial resolution lost to MLA)
- Deep-learning view synthesis (light field reconstruction from sparse views)
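A brute-force sketch of disparity estimation between two horizontally adjacent sub-aperture views, using sum-of-absolute-differences block matching; practical methods use all views, sub-pixel refinement, and regularization to handle textureless regions:

```python
import numpy as np

def disparity_sad(left, right, max_disp=16, patch=7):
    """Block-matching disparity between two adjacent sub-aperture views.

    left, right: 2-D float images from neighbouring angular positions.
    Returns an integer disparity map; depth is proportional to the
    sub-aperture baseline divided by the disparity.
    """
    h, w = left.shape
    half = patch // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(ref - right[y - half:y + half + 1,
                                        x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)
    return disp
```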
Common Mistakes
- Microlens array misaligned with sensor pixels, causing vignetting and crosstalk
- Insufficient angular samples for accurate depth estimation in textureless regions
- Not calibrating MLA-to-sensor alignment, producing decoding artifacts
- Misjudging the spatial-angular resolution trade-off inherent in the plenoptic design
- Ignoring diffraction effects at the microlens apertures
How to Avoid Mistakes
- Precisely align MLA to sensor with sub-pixel accuracy; use calibration targets
- Increase camera array density or use coded-aperture techniques for more angular samples
- Calibrate using a white image and point-source images for precise microlens grid mapping
- Design the system with the desired spatial-angular trade-off explicitly computed
- Use microlens diameters larger than the diffraction limit (> 10× wavelength)
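The last two points can be checked numerically. A minimal sketch with assumed example values for wavelength, microlens pitch, focal length, and pixel pitch (none prescribed by this entry):

```python
# Illustrative diffraction check for a plenoptic design; all numbers are
# assumed example values, not recommendations.
wavelength_um = 0.55        # green light
ml_pitch_um = 100.0         # microlens pitch (aperture diameter)
ml_focal_um = 400.0         # microlens focal length
pixel_pitch_um = 5.0        # sensor pixel pitch

f_number = ml_focal_um / ml_pitch_um
airy_diameter_um = 2.44 * wavelength_um * f_number  # diffraction-limited spot

print(f"microlens f-number:         f/{f_number:.1f}")
print(f"Airy spot diameter:         {airy_diameter_um:.2f} um")
print(f"pixels per microlens (1-D): {ml_pitch_um / pixel_pitch_um:.0f}")
print(f"spot size / pixel pitch:    {airy_diameter_um / pixel_pitch_um:.2f}")
```

With these assumed numbers the diffraction-limited spot is about one pixel across, so the 20 angular samples behind each microlens are not washed out by diffraction; shrinking the microlens or pixel pitch much further would start to blur adjacent angular samples together.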
Forward-Model Mismatch Cases
- The widefield fallback produces a single (64,64) image, but a light field camera captures both spatial and angular information via a microlens array — the output encodes multiple sub-aperture views for computational refocusing
- Without the angular dimension (directions of light rays), depth estimation from parallax and computational refocusing are impossible — the widefield model captures only a single perspective
How to Correct the Mismatch
- Use the light field operator that models the microlens array: each microlens captures light from different angular directions, producing an (x, y, u, v) 4D light field on the 2D sensor
- Reconstruct depth maps from sub-aperture disparity, perform computational refocusing via shift-and-sum, or apply light-field super-resolution to trade angular for spatial resolution
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Levoy & Hanrahan, 'Light field rendering', SIGGRAPH 1996
- Ng et al., 'Light field photography with a hand-held plenoptic camera', Stanford Tech Report CTSR 2005-02
Canonical Datasets
- HCI 4D Light Field Benchmark
- Stanford Lego Gantry Archive
- INRIA Lytro Light Field Dataset
Panorama Multi-Focus Fusion
Description
Multi-focus panoramic fusion combines images captured at different focal planes and/or different spatial positions to produce an all-in-focus image with extended depth of field and wide field of view. Focus stacking selects the sharpest regions from each focal plane using local contrast measures, then blends them via Laplacian pyramid fusion or wavelet-based methods. Panoramic stitching aligns overlapping images using feature matching (SIFT/SURF) and blends seams. Primary challenges include parallax at scene edges and focus measure ambiguity in low-texture regions.
Principle
Panoramic multi-focus fusion captures multiple images of the same wide scene at different focal distances and combines them to produce a single all-in-focus panorama with extended depth of field. Image stitching aligns overlapping frames using feature matching and homography estimation, while focus fusion selects the sharpest pixels from each focal plane.
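A minimal sketch of the sharpest-pixel selection step, using the local mean of the squared Laplacian response as the focus measure; it assumes the focus stack is already registered and omits the pyramid blending used in practice to hide selection seams:

```python
import numpy as np
from scipy import ndimage

def fuse_focus_stack(stack, window=9):
    """Per-pixel sharpest-slice selection over a registered focus stack.

    stack: (n_slices, h, w) array of grayscale frames at different focus
        distances. The focus measure is the squared Laplacian response
        averaged over a local window; each output pixel comes from the
        slice where that measure is largest.
    """
    lap = np.stack([ndimage.laplace(s.astype(np.float64)) for s in stack])
    focus = ndimage.uniform_filter(lap ** 2, size=(1, window, window))
    best = np.argmax(focus, axis=0)          # (h, w) index of sharpest slice
    return np.take_along_axis(stack, best[None], axis=0)[0]
```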
How to Build the System
Mount the camera on a motorized panoramic head (nodal-point rotation). For each pan/tilt position, capture a focus stack (3-10 images at different focus distances). Use a medium aperture (f/5.6-f/8) for each frame. Stitch overlapping views (roughly 30% horizontal overlap) and fuse the focus stack within each view tile. Calibrate the panoramic head so it rotates about the lens entrance pupil to minimize parallax.
Common Reconstruction Algorithms
- Laplacian pyramid focus fusion (weighted blending by local contrast)
- SIFT/SURF feature matching + RANSAC homography estimation (see the sketch after this list)
- Multi-band blending (Burt-Adelson) for seamless stitching
- Exposure fusion (Mertens et al.) for HDR panoramas
- Deep-learning focus stacking (DFDF, DeepFocus)
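A minimal pairwise-alignment sketch for the feature-matching step, using OpenCV's SIFT detector, Lowe's ratio test, and RANSAC homography estimation; it assumes OpenCV >= 4.4 (where SIFT lives in the main module) and leaves out global bundle adjustment and blending:

```python
import cv2
import numpy as np

def pairwise_homography(img_a, img_b, ratio=0.75):
    """Estimate the homography mapping img_a onto img_b.

    img_a, img_b: overlapping grayscale tiles (uint8).
    Returns the 3x3 homography and the RANSAC inlier mask.
    """
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, inliers
```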
Common Mistakes
- Parallax errors from rotation not centered on the lens entrance pupil
- Ghosting from moving objects between sequential captures
- Color inconsistency between overlapping tiles due to auto-exposure variation
- Incomplete focus coverage leaving blurry regions in the final panorama
- Stitching artifacts at seam lines visible in the final output
How to Avoid Mistakes
- Use a calibrated panoramic head; verify no-parallax point for the specific lens
- Mask out or blend moving objects; capture quickly or use simultaneous multi-camera rigs
- Lock exposure, white balance, and focus (manual mode) across all tiles
- Plan focus distances to cover the entire depth range of the scene
- Use multi-band blending and choose seam lines in textureless regions
Forward-Model Mismatch Cases
- The widefield fallback applies Gaussian blur to a single image, but panoramic imaging involves geometric projection (cylindrical, spherical, or equirectangular) of the scene onto a wide field of view — the projection geometry is absent
- Panorama multi-focus fusion requires modeling focus variation across the wide FOV and stitching multiple exposures — the widefield single-frame model cannot capture the spatially varying focus or overlap regions
How to Correct the Mismatch
- Use the panorama operator that models the geometric projection (cylindrical or spherical warping) and focus-dependent blur across the wide field of view (a warping sketch follows this list)
- Reconstruct using image stitching with homography estimation, exposure fusion, and spatially varying deblurring that account for the correct projection geometry
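A minimal sketch of the cylindrical pre-warp mentioned in the first point, assuming a pinhole camera with focal length f_px given in pixels and nearest-neighbour resampling; production pipelines use calibrated intrinsics and proper interpolation:

```python
import numpy as np

def cylindrical_warp(img, f_px):
    """Project a pinhole image onto a cylinder of radius f_px.

    img: (h, w) or (h, w, c) image; f_px: focal length in pixels.
    After this warp, a panorama captured by rotating about the entrance
    pupil can be stitched with (nearly) pure horizontal translations.
    """
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    theta = (xs - cx) / f_px                   # angle on the cylinder
    height = (ys - cy) / f_px                  # height on the cylinder
    x_src = f_px * np.tan(theta) + cx          # back-project to image plane
    y_src = f_px * height / np.cos(theta) + cy
    out = np.zeros_like(img)
    valid = (x_src >= 0) & (x_src < w) & (y_src >= 0) & (y_src < h)
    out[valid] = img[y_src[valid].astype(int), x_src[valid].astype(int)]
    return out
```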
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Burt & Adelson, 'The Laplacian Pyramid as a Compact Image Code', IEEE Trans. Commun. 31, 532-540 (1983)
Canonical Datasets
- Lytro multi-focus test set