Physics World Model — Modality Catalog
5 imaging modalities with descriptions, experimental setups, and reconstruction guidance.
LiDAR Scanner
Description
LiDAR (Light Detection and Ranging) measures distances by emitting laser pulses and timing the round-trip to the reflecting surface. Automotive LiDAR systems use rotating multi-beam scanners (e.g., Velodyne HDL-64E) or solid-state flash LiDAR to acquire 3D point clouds at 10-20 Hz. The forward model is simple time-of-flight: d = c*t/2. The resulting sparse point cloud requires densification, ground segmentation, and object detection. Primary challenges include sparse sampling, intensity variation with surface reflectivity, and rain/fog attenuation.
Principle
Light Detection and Ranging (LiDAR) measures distances by emitting laser pulses (905 nm or 1550 nm) and timing their return after reflection from the scene (time-of-flight: d = c·t/2). A scanning mechanism (rotating mirror, MEMS, or optical phased array) sweeps the beam to build a 3-D point cloud of the environment. Resolution depends on the beam divergence, scanning density, and pulse timing precision.
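The time-of-flight relation above can be written down directly. A minimal Python sketch (the function name and the 667 ns example are illustrative, not tied to any particular sensor):

```python
# Range from pulse round-trip time: d = c * t / 2.
C = 299_792_458.0  # speed of light in vacuum, m/s

def range_from_tof(round_trip_s: float) -> float:
    """Convert a measured round-trip time (seconds) to range (meters)."""
    return C * round_trip_s / 2.0

# A ~667 ns round trip corresponds to roughly 100 m.
print(range_from_tof(667e-9))  # ~99.98 m
```

The factor of 2 accounts for the pulse traveling to the surface and back; timing precision therefore maps directly to range precision (1 ns of jitter is about 15 cm of range error).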
How to Build the System
Select a LiDAR sensor appropriate for the application: mechanical spinning (Velodyne VLP-16/128 or Ouster OS series for autonomous vehicles), solid-state or hybrid (Livox), or airborne (Leica ALS80 for terrain mapping). Mount rigidly and combine with an IMU and GNSS for georeferencing. Calibrate intrinsic parameters (beam angles, timing offsets, intensity response) and extrinsics (relative to the vehicle coordinate frame). Configure return processing (first, last, or full waveform) to suit the application.
Common Reconstruction Algorithms
- Point cloud registration (ICP, NDT for multi-scan alignment)
- Ground filtering and classification (progressive morphological filter)
- SLAM (Simultaneous Localization and Mapping) with LiDAR
- Object detection and segmentation (PointNet, PointPillars)
- Surface reconstruction from point clouds (Poisson, ball-pivoting)
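Of these, rigid alignment is the easiest to illustrate compactly. A sketch of one ICP-style alignment step via the Kabsch/SVD solution, assuming correspondences are already known (full ICP alternates nearest-neighbor matching with this step; `kabsch_align` is a hypothetical name):

```python
import numpy as np

def kabsch_align(src, dst):
    """One ICP-style step with known correspondences: find the rigid
    transform (R, t) minimizing ||R @ src_i + t - dst_i|| via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Recover a known 90-degree yaw rotation and translation from paired points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
moved = pts @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = kabsch_align(pts, moved)
```

The reflection guard matters in practice: without it, degenerate or noisy correspondences can return an improper rotation (det = -1) that mirrors the cloud.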
Common Mistakes
- Multi-echo / multi-path reflections causing ghost points
- Motion distortion in the point cloud from vehicle movement during one scan rotation
- Incorrect calibration causing misalignment between LiDAR and camera data
- Rain, fog, or dust causing false returns and reduced range
- Near-range blind zone where the receiver is not sensitive to returns
How to Avoid Mistakes
- Filter ghost points using intensity thresholds and multi-return analysis
- Apply ego-motion compensation using IMU data to deskew each scan
- Perform target-based or targetless calibration between LiDAR and other sensors
- Prefer 1550 nm sources for outdoor applications (higher eye-safe power limits permit longer range), while accounting for stronger water absorption at that wavelength
- Account for minimum range specification; fuse with short-range sensors if needed
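The ego-motion deskew step can be illustrated under a deliberately simplified constant-velocity, translation-only model (real pipelines also integrate IMU rotation); the function name and the numbers are hypothetical:

```python
import numpy as np

def deskew_constant_velocity(points, timestamps, velocity, scan_end_time):
    """Re-express each point in the end-of-scan sensor frame, assuming the
    sensor translates at constant velocity and does not rotate.
    A static point measured dt seconds before the scan end appears
    velocity*dt closer once the sensor has moved toward it.
    points: (N,3) m, timestamps: (N,) s, velocity: (3,) m/s."""
    dt = scan_end_time - np.asarray(timestamps)
    return np.asarray(points) - dt[:, None] * np.asarray(velocity)[None, :]

# Vehicle at 10 m/s along +x: a point 5 m ahead captured 0.05 s before
# the scan end is ~4.5 m ahead in the end-of-scan frame.
out = deskew_constant_velocity([[5.0, 0.0, 0.0]], [0.05],
                               [10.0, 0.0, 0.0], 0.1)
```

Without this correction, a 0.1 s scan rotation at highway speed smears every object by several meters along the direction of travel.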
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) image, but LiDAR produces a 1D or 3D point cloud of range measurements (r_i = c*t_i/2) — the output is a set of (x,y,z) points, not a blurred image
- LiDAR measures distance by timing laser pulse round-trips, with angular scanning determining direction — the widefield spatial blur has no connection to time-of-flight distance measurement or angular scanning geometry
How to Correct the Mismatch
- Use the LiDAR operator that models pulsed laser emission, scene reflection (surface albedo and geometry), and time-of-flight detection: range = c*delta_t/2 for each beam direction
- Process the point cloud using registration (ICP), ground classification, or object detection algorithms that operate on the correct 3D range measurement format
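A minimal sketch of this corrected forward model: round-trip times plus calibrated beam directions map to Cartesian points via r_i = c*t_i/2 and spherical-to-Cartesian conversion. The axis convention (x forward, z up) and the beam angles are illustrative:

```python
import numpy as np

C = 299_792_458.0  # m/s

def tof_to_points(round_trip_s, azimuth_rad, elevation_rad):
    """Ranges r_i = c*t_i/2 along beam directions (azimuth, elevation)
    -> (N,3) Cartesian point cloud, x forward, z up."""
    r = C * np.asarray(round_trip_s) / 2.0
    az, el = np.asarray(azimuth_rad), np.asarray(elevation_rad)
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)

# One beam straight ahead: a ~66.7 ns round trip puts a point ~10 m out.
pts = tof_to_points([66.7e-9], [0.0], [0.0])
```

This is the format downstream registration, ground classification, and detection algorithms expect: a set of (x,y,z) samples, not a dense image grid.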
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Geiger et al., 'Are we ready for autonomous driving? The KITTI vision benchmark suite', CVPR 2012
Canonical Datasets
- KITTI 3D object detection
- nuScenes (1000 driving scenes)
- Waymo Open Dataset
Structured-Light Depth Camera
Description
Structured-light depth cameras project a known pattern (IR dot pattern, fringe, or binary code) onto the scene and infer depth from the pattern deformation observed by a camera offset from the projector. For coded structured light (e.g., Kinect v1), depth is computed via triangulation from the correspondence between projected and observed pattern features. For phase-shifting methods, multiple fringe patterns encode depth as the local phase. Primary challenges include occlusion in the projector-camera baseline, ambient light interference, and depth discontinuity errors.
Principle
Structured-light depth sensing projects a known pattern (stripes, dots, coded binary patterns) onto the scene and observes the pattern deformation with a camera from a different viewpoint. The displacement (disparity) of each pattern element between projected and observed positions encodes the surface depth via triangulation. Dense depth maps are obtained by identifying pattern correspondences across the scene.
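The triangulation relation can be written down directly. In the sketch below the focal length and baseline are illustrative, loosely Kinect-v1-like values, not official specifications:

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Rectified projector-camera triangulation: Z = f * B / disparity,
    with focal length in pixels and baseline in meters."""
    return focal_px * baseline_m / disparity_px

# f = 580 px, B = 0.075 m: a 43.5 px disparity maps to ~1.0 m depth;
# halving the disparity doubles the depth.
z_near = depth_from_disparity(43.5, 580.0, 0.075)   # ~1.0 m
z_far = depth_from_disparity(21.75, 580.0, 0.075)   # ~2.0 m
```

The inverse relation also explains why depth precision degrades quadratically with distance: a fixed sub-pixel matching error corresponds to a depth error that grows as Z^2/(f*B).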
How to Build the System
Arrange a projector (DLP or laser dot projector) and camera with a known baseline separation (5-25 cm) and convergent geometry. Calibrate the projector-camera system (intrinsics and extrinsics) using a planar calibration target. For temporal coding (Gray code), project multiple patterns sequentially. For spatial coding (single-shot, e.g., the Apple Face ID dot projector), use a diffractive optical element to generate a unique dot pattern.
Common Reconstruction Algorithms
- Gray code + phase shifting (sequential multi-pattern decoding)
- Single-shot coded pattern matching (speckle or pseudo-random dot decoding)
- Phase unwrapping for sinusoidal fringe projection
- Stereo matching applied to textured scenes (active stereo)
- Deep-learning depth estimation from structured-light patterns
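The Gray-code decoding step in the list above reduces to a standard bit operation once each camera pixel's observed pattern bits have been thresholded into an integer (the thresholding itself is omitted here; `gray_to_binary` is an illustrative name):

```python
def gray_to_binary(g: int) -> int:
    """Decode a Gray-coded projector column index back to binary.
    In sequential Gray-code structured light, each camera pixel observes
    one bit per projected pattern; the decoded integer identifies the
    projector column, giving the correspondence used for triangulation."""
    b = g
    while g:
        g >>= 1
        b ^= g
    return b

# Gray code 0b110 decodes to column index 4 (binary 0b100).
print(gray_to_binary(0b110))  # 4
```

Gray codes are preferred over plain binary patterns because adjacent columns differ in exactly one bit, so a single mis-thresholded bit near a stripe boundary causes at most a one-column correspondence error.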
Common Mistakes
- Ambient light washing out the projected pattern, losing depth information
- Specular (shiny) surfaces reflecting the projector into the camera, causing erroneous depth
- Occlusion zones where the projector illuminates but the camera cannot see (shadowed regions)
- Insufficient projector resolution limiting the achievable depth precision
- Color/reflectance variations in the scene altering perceived pattern intensity
How to Avoid Mistakes
- Use NIR projector + camera with ambient-light rejection filter
- Apply polarization filtering, or spray shiny surfaces with a matte coating for calibration scans
- Add a second camera or projector to reduce occlusion zones
- Use high-resolution projectors (1080p+) and fine patterns for sub-mm precision
- Use binary or phase-shifting patterns that are robust to reflectance variations
Forward-Model Mismatch Cases
- The widefield fallback applies spatial blur, but structured-light depth sensing projects known patterns and measures their deformation via triangulation — the depth-encoding pattern correspondence between projector and camera is absent
- Structured light extracts depth from disparity between projected and observed pattern positions (d = f*B/disparity) — the widefield blur produces no disparity information and cannot encode surface depth
How to Correct the Mismatch
- Use the structured-light operator that models pattern projection (Gray code, sinusoidal fringe, or speckle) and camera observation from a different viewpoint: depth is encoded in pattern deformation due to surface geometry
- Extract depth maps using pattern decoding (Gray code → correspondence → triangulation) or phase unwrapping (sinusoidal fringe → depth) with calibrated projector-camera geometry
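For the sinusoidal-fringe route, the wrapped-phase extraction can be sketched as a standard four-step phase-shift calculation; the offset and modulation values below are synthetic, and the result still needs unwrapping plus phase-to-depth calibration:

```python
import numpy as np

def phase_from_four_step(I0, I90, I180, I270):
    """Wrapped phase from four fringe images shifted by 90 degrees:
    I_k = A + B*cos(phi + k*pi/2)  =>  phi = atan2(I270 - I90, I0 - I180).
    The offset A and modulation B cancel out of the ratio."""
    return np.arctan2(I270 - I90, I0 - I180)

# Synthetic pixel with true phase 0.7 rad, offset A = 2, modulation B = 1.
phi_true, A, B = 0.7, 2.0, 1.0
imgs = [A + B * np.cos(phi_true + k * np.pi / 2) for k in range(4)]
est = phase_from_four_step(*imgs)  # ~0.7 rad
```

Because offset and modulation cancel, this estimator is robust to the reflectance variations listed under common mistakes, which is why phase-shifting methods tolerate textured scenes better than single-shot intensity coding.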
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Geng, 'Structured-light 3D surface imaging: a tutorial', Advances in Optics and Photonics 3, 128-160 (2011)
Canonical Datasets
- Middlebury stereo benchmark
- ETH3D multi-view stereo benchmark
Time-of-Flight Depth Camera
Description
ToF cameras measure per-pixel depth by emitting modulated near-infrared light and measuring the phase delay of the reflected signal relative to the emitted signal. In amplitude-modulated continuous-wave (AMCW) ToF, the phase offset phi = 2*pi*f*2d/c encodes the round-trip distance 2d. Multiple modulation frequencies resolve depth ambiguity. Primary degradations include multi-path interference (MPI), motion blur, and systematic errors at depth discontinuities (flying pixels).
Principle
A Time-of-Flight depth camera measures the round-trip time of modulated light (typically near-infrared LEDs at 850 nm) reflected from the scene. The sensor measures the phase shift between emitted and received modulated signals at each pixel, which is proportional to the target distance: d = c·Δφ/(4π·f_mod). Typical modulation frequencies are 20-100 MHz, providing depth ranges of 0.5-10 meters with mm-cm precision.
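The phase-to-depth relation and the wrap-around limit it implies can be sketched directly (20 MHz is just an illustrative modulation frequency; function names are hypothetical):

```python
import math

C = 299_792_458.0  # m/s

def tof_depth(phase_rad: float, f_mod_hz: float) -> float:
    """AMCW ToF depth: d = c * dphi / (4 * pi * f_mod)."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

def unambiguous_range(f_mod_hz: float) -> float:
    """Depth at which the measured phase wraps past 2*pi: c / (2 * f_mod)."""
    return C / (2.0 * f_mod_hz)

# At 20 MHz modulation the unambiguous range is ~7.49 m; a phase shift
# of pi corresponds to half of it (~3.75 m).
print(unambiguous_range(20e6), tof_depth(math.pi, 20e6))
```

The trade-off is visible in the formulas: raising f_mod improves depth resolution (more phase change per meter) but shrinks the unambiguous range, which is exactly why multi-frequency operation is used.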
How to Build the System
Use an integrated ToF camera module (e.g., Microsoft Azure Kinect DK, PMD CamBoard pico, Texas Instruments OPT8241). The module contains the NIR light source, modulation driver, and ToF sensor with per-pixel demodulation circuits. Mount rigidly and calibrate intrinsic parameters (lens distortion, depth offset) and phase-to-depth nonlinearities. For multi-camera setups, synchronize or frequency-multiplex to avoid interference.
Common Reconstruction Algorithms
- Four-phase demodulation for distance extraction
- Multi-frequency unwrapping for extended unambiguous range
- Flying-pixel filtering (mixed pixels at depth discontinuities)
- Multi-path interference correction
- Deep-learning depth denoising and completion
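The four-phase demodulation step can be sketched per pixel. The sampling convention below (correlation samples A_k at 0/90/180/270 degrees of the modulation period) is one common choice; actual sensors differ in sign and ordering conventions, and the synthetic numbers are illustrative:

```python
import numpy as np

C = 299_792_458.0  # m/s

def demodulate_four_phase(A0, A1, A2, A3, f_mod_hz):
    """Per-pixel AMCW demodulation from four correlation samples:
      phase     = atan2(A3 - A1, A0 - A2), wrapped into [0, 2*pi)
      amplitude = 0.5 * sqrt((A3 - A1)^2 + (A0 - A2)^2)
      depth     = c * phase / (4 * pi * f_mod)
    Amplitude doubles as a per-pixel confidence measure."""
    I = np.asarray(A0) - np.asarray(A2)
    Q = np.asarray(A3) - np.asarray(A1)
    phase = np.mod(np.arctan2(Q, I), 2.0 * np.pi)
    amplitude = 0.5 * np.hypot(Q, I)
    depth = C * phase / (4.0 * np.pi * f_mod_hz)
    return depth, amplitude, phase

# Synthetic pixel: offset 10, modulation amplitude 3, true phase 1.0 rad.
A = [10.0 + 3.0 * np.cos(1.0 + k * np.pi / 2) for k in range(4)]
depth, amp, phase = demodulate_four_phase(*A, f_mod_hz=20e6)
```

Note that the ambient-light offset cancels in the differences A0-A2 and A3-A1, which is what gives ToF sensors some built-in rejection of unmodulated background illumination.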
Common Mistakes
- Multi-path interference causing systematic depth errors in concave scenes
- Flying pixels at depth edges producing incorrect intermediate depth values
- Phase wrapping ambiguity when objects exceed the unambiguous range
- Interference from ambient NIR light (sunlight) degrading outdoor performance
- Systematic depth errors from non-ideal sensor response not calibrated out
How to Avoid Mistakes
- Use multi-path correction algorithms or multi-frequency modulation
- Apply flying-pixel detection and removal based on amplitude and neighbor consistency
- Use dual-frequency operation to extend the unambiguous range
- Use narrow-band optical filter and higher modulation power for outdoor use
- Perform per-pixel depth calibration with a known flat reference at multiple distances
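The dual-frequency idea can be sketched in the noiseless case: the beat of two wrapped phase measurements behaves like a measurement at the difference frequency, whose unambiguous range is much longer, and it selects the wrap count for the fine measurement. The 100/80 MHz pair is illustrative, and real implementations must make the wrap-count rounding robust to noise:

```python
import numpy as np

C = 299_792_458.0  # m/s

def dual_frequency_depth(phi1, phi2, f1, f2):
    """Extend the unambiguous range with two modulation frequencies.
    mod(phi1 - phi2, 2*pi) acts as a phase measurement at f1 - f2,
    giving a coarse depth that fixes the wrap count n of the fine
    measurement at f1."""
    phi_beat = np.mod(phi1 - phi2, 2.0 * np.pi)
    d_coarse = C * phi_beat / (4.0 * np.pi * (f1 - f2))
    r1 = C / (2.0 * f1)                       # fine unambiguous range
    d_fine = C * phi1 / (4.0 * np.pi * f1)
    n = np.round((d_coarse - d_fine) / r1)    # integer wrap count
    return d_fine + n * r1

# True depth 5 m exceeds the 100 MHz range (~1.5 m) but not the
# 20 MHz beat range (~7.5 m), so the pair recovers it.
d_true = 5.0
phi1 = np.mod(4.0 * np.pi * 100e6 * d_true / C, 2.0 * np.pi)
phi2 = np.mod(4.0 * np.pi * 80e6 * d_true / C, 2.0 * np.pi)
d = dual_frequency_depth(phi1, phi2, 100e6, 80e6)
```

The design choice mirrors the text above: the high frequency supplies precision, the synthetic beat frequency supplies range, and neither alone would suffice.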
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D intensity image, but ToF cameras measure depth via phase shift of modulated near-infrared light — the distance information (d = c*dphi/(4*pi*f_mod)) is entirely absent from the blurred image
- ToF measurement involves demodulation of the reflected modulated signal at each pixel, producing amplitude, phase, and confidence maps — the widefield intensity-only blur cannot produce depth or distinguish multi-path interference
How to Correct the Mismatch
- Use the ToF camera operator that models modulated illumination and per-pixel demodulation: four-phase sampling extracts the phase shift proportional to target distance at each pixel
- Apply phase-to-depth conversion, multi-path correction, and flying-pixel filtering using the correct modulation frequency, amplitude, and phase measurement model
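One of these post-processing steps, flying-pixel removal, can be sketched with a simple neighbor-consistency heuristic. The threshold is illustrative, and production filters additionally weight by amplitude/confidence:

```python
import numpy as np

def flag_flying_pixels(depth, max_jump=0.1):
    """Flag depth pixels that differ from all four neighbors by more than
    max_jump meters -- a simple heuristic for isolated flying pixels at
    depth discontinuities. Returns a boolean mask of the same shape."""
    d = np.asarray(depth, dtype=float)
    pad = np.pad(d, 1, mode="edge")           # replicate edges
    diffs = np.stack([
        np.abs(d - pad[:-2, 1:-1]),           # neighbor above
        np.abs(d - pad[2:, 1:-1]),            # neighbor below
        np.abs(d - pad[1:-1, :-2]),           # neighbor left
        np.abs(d - pad[1:-1, 2:]),            # neighbor right
    ])
    return np.all(diffs > max_jump, axis=0)

# A lone 2 m reading in a flat 1 m plane is flagged; its neighbors,
# which each still agree with most of the plane, are not.
dmap = np.ones((5, 5))
dmap[2, 2] = 2.0
mask = flag_flying_pixels(dmap)
```

Requiring disagreement with all neighbors (rather than any) is what distinguishes an isolated mixed pixel from a legitimate depth edge, where pixels agree with at least one side of the discontinuity.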
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Hansard et al., 'Time-of-Flight Cameras: Principles, Methods and Applications', Springer (2013)
Canonical Datasets
- NYU Depth V2 (Silberman et al.)
- KITTI depth benchmark (adapted)