Physics World Model — Modality Catalog
5 imaging modalities with descriptions, experimental setups, and reconstruction guidance.
LiDAR Scanner
Description
LiDAR (Light Detection and Ranging) measures distances by emitting laser pulses and timing the round-trip to the reflecting surface. Automotive LiDAR systems use rotating multi-beam scanners (e.g., Velodyne HDL-64E) or solid-state flash LiDAR to acquire 3D point clouds at 10-20 Hz. The forward model is simple time-of-flight: d = c*t/2. The resulting sparse point cloud requires densification, ground segmentation, and object detection. Primary challenges include sparse sampling, intensity variation with surface reflectivity, and rain/fog attenuation.
Principle
Light Detection and Ranging (LiDAR) measures distances by emitting laser pulses (905 nm or 1550 nm) and timing their return after reflection from the scene (time-of-flight: d = c·t/2). A scanning mechanism (rotating mirror, MEMS, or optical phased array) sweeps the beam to build a 3-D point cloud of the environment. Resolution depends on the beam divergence, scanning density, and pulse timing precision.
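The time-of-flight relation above can be written down directly. A minimal Python sketch (the function name and the 667 ns example are illustrative, not tied to any particular sensor):

```python
# Range from pulse round-trip time: d = c * t / 2.
C = 299_792_458.0  # speed of light in vacuum, m/s

def range_from_tof(round_trip_s: float) -> float:
    """Convert a measured round-trip time (seconds) to range (meters)."""
    return C * round_trip_s / 2.0

# A ~667 ns round trip corresponds to roughly 100 m.
print(range_from_tof(667e-9))  # ~99.98 m
```

The factor of 2 accounts for the pulse traveling to the surface and back; timing precision therefore maps directly to range precision (1 ns of jitter is about 15 cm of range error).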
How to Build the System
Select a LiDAR sensor appropriate for the application: mechanical spinning (Velodyne VLP-16/128 or Ouster OS series for autonomous vehicles), solid-state or hybrid (Livox), or airborne (Leica ALS80 for terrain mapping). Mount rigidly and combine with an IMU and GNSS for georeferencing. Calibrate intrinsic parameters (beam angles, timing offsets, intensity response) and extrinsics (relative to the vehicle coordinate frame). Configure return processing (first, last, or full waveform) to suit the application.
Common Reconstruction Algorithms
- Point cloud registration (ICP, NDT for multi-scan alignment)
- Ground filtering and classification (progressive morphological filter)
- SLAM (Simultaneous Localization and Mapping) with LiDAR
- Object detection and segmentation (PointNet, PointPillars)
- Surface reconstruction from point clouds (Poisson, ball-pivoting)
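Of these, rigid alignment is the easiest to illustrate compactly. A sketch of one ICP-style alignment step via the Kabsch/SVD solution, assuming correspondences are already known (full ICP alternates nearest-neighbor matching with this step; `kabsch_align` is a hypothetical name):

```python
import numpy as np

def kabsch_align(src, dst):
    """One ICP-style step with known correspondences: find the rigid
    transform (R, t) minimizing ||R @ src_i + t - dst_i|| via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Recover a known 90-degree yaw rotation and translation from paired points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
moved = pts @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = kabsch_align(pts, moved)
```

The reflection guard matters in practice: without it, degenerate or noisy correspondences can return an improper rotation (det = -1) that mirrors the cloud.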
Common Mistakes
- Multi-echo / multi-path reflections causing ghost points
- Motion distortion in the point cloud from vehicle movement during one scan rotation
- Incorrect calibration causing misalignment between LiDAR and camera data
- Rain, fog, or dust causing false returns and reduced range
- Near-range blind zone where the receiver is not sensitive to returns
How to Avoid Mistakes
- Filter ghost points using intensity thresholds and multi-return analysis
- Apply ego-motion compensation using IMU data to deskew each scan
- Perform target-based or targetless calibration between LiDAR and other sensors
- Prefer 1550 nm sources for outdoor applications (higher eye-safe power limits permit longer range), while accounting for stronger water absorption at that wavelength
- Account for minimum range specification; fuse with short-range sensors if needed
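The ego-motion deskew step can be illustrated under a deliberately simplified constant-velocity, translation-only model (real pipelines also integrate IMU rotation); the function name and the numbers are hypothetical:

```python
import numpy as np

def deskew_constant_velocity(points, timestamps, velocity, scan_end_time):
    """Re-express each point in the end-of-scan sensor frame, assuming the
    sensor translates at constant velocity and does not rotate.
    A static point measured dt seconds before the scan end appears
    velocity*dt closer once the sensor has moved toward it.
    points: (N,3) m, timestamps: (N,) s, velocity: (3,) m/s."""
    dt = scan_end_time - np.asarray(timestamps)
    return np.asarray(points) - dt[:, None] * np.asarray(velocity)[None, :]

# Vehicle at 10 m/s along +x: a point 5 m ahead captured 0.05 s before
# the scan end is ~4.5 m ahead in the end-of-scan frame.
out = deskew_constant_velocity([[5.0, 0.0, 0.0]], [0.05],
                               [10.0, 0.0, 0.0], 0.1)
```

Without this correction, a 0.1 s scan rotation at highway speed smears every object by several meters along the direction of travel.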
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D (64,64) image, but LiDAR produces a 1D or 3D point cloud of range measurements (r_i = c*t_i/2) — the output is a set of (x,y,z) points, not a blurred image
- LiDAR measures distance by timing laser pulse round-trips, with angular scanning determining direction — the widefield spatial blur has no connection to time-of-flight distance measurement or angular scanning geometry
How to Correct the Mismatch
- Use the LiDAR operator that models pulsed laser emission, scene reflection (surface albedo and geometry), and time-of-flight detection: range = c*delta_t/2 for each beam direction
- Process the point cloud using registration (ICP), ground classification, or object detection algorithms that operate on the correct 3D range measurement format
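A minimal sketch of this corrected forward model: round-trip times plus calibrated beam directions map to Cartesian points via r_i = c*t_i/2 and spherical-to-Cartesian conversion. The axis convention (x forward, z up) and the beam angles are illustrative:

```python
import numpy as np

C = 299_792_458.0  # m/s

def tof_to_points(round_trip_s, azimuth_rad, elevation_rad):
    """Ranges r_i = c*t_i/2 along beam directions (azimuth, elevation)
    -> (N,3) Cartesian point cloud, x forward, z up."""
    r = C * np.asarray(round_trip_s) / 2.0
    az, el = np.asarray(azimuth_rad), np.asarray(elevation_rad)
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)

# One beam straight ahead: a ~66.7 ns round trip puts a point ~10 m out.
pts = tof_to_points([66.7e-9], [0.0], [0.0])
```

This is the format downstream registration, ground classification, and detection algorithms expect: a set of (x,y,z) samples, not a dense image grid.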
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Geiger et al., 'Are we ready for autonomous driving? The KITTI vision benchmark suite', CVPR 2012
Canonical Datasets
- KITTI 3D object detection
- nuScenes (1000 driving scenes)
- Waymo Open Dataset
Structured-Light Depth Camera
Description
Structured-light depth cameras project a known pattern (IR dot pattern, fringe, or binary code) onto the scene and infer depth from the pattern deformation observed by a camera offset from the projector. For coded structured light (e.g., Kinect v1), depth is computed via triangulation from the correspondence between projected and observed pattern features. For phase-shifting methods, multiple fringe patterns encode depth as the local phase. Primary challenges include occlusion in the projector-camera baseline, ambient light interference, and depth discontinuity errors.
Principle
Structured-light depth sensing projects a known pattern (stripes, dots, coded binary patterns) onto the scene and observes the pattern deformation with a camera from a different viewpoint. The displacement (disparity) of each pattern element between projected and observed positions encodes the surface depth via triangulation. Dense depth maps are obtained by identifying pattern correspondences across the scene.
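The triangulation relation can be written down directly. In the sketch below the focal length and baseline are illustrative, loosely Kinect-v1-like values, not official specifications:

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Rectified projector-camera triangulation: Z = f * B / disparity,
    with focal length in pixels and baseline in meters."""
    return focal_px * baseline_m / disparity_px

# f = 580 px, B = 0.075 m: a 43.5 px disparity maps to ~1.0 m depth;
# halving the disparity doubles the depth.
z_near = depth_from_disparity(43.5, 580.0, 0.075)   # ~1.0 m
z_far = depth_from_disparity(21.75, 580.0, 0.075)   # ~2.0 m
```

The inverse relation also explains why depth precision degrades quadratically with distance: a fixed sub-pixel matching error corresponds to a depth error that grows as Z^2/(f*B).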
How to Build the System
Arrange a projector (DLP or laser dot projector) and camera with a known baseline separation (5-25 cm) and convergent geometry. Calibrate the projector-camera system (intrinsics and extrinsics) using a planar calibration target. For temporal coding (Gray code), project multiple patterns sequentially. For spatial coding (single-shot, e.g., the Apple Face ID dot projector), use a diffractive optical element to generate a unique dot pattern.
Common Reconstruction Algorithms
- Gray code + phase shifting (sequential multi-pattern decoding)
- Single-shot coded pattern matching (speckle or pseudo-random dot decoding)
- Phase unwrapping for sinusoidal fringe projection
- Stereo matching applied to textured scenes (active stereo)
- Deep-learning depth estimation from structured-light patterns
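The Gray-code decoding step in the list above reduces to a standard bit operation once each camera pixel's observed pattern bits have been thresholded into an integer (the thresholding itself is omitted here; `gray_to_binary` is an illustrative name):

```python
def gray_to_binary(g: int) -> int:
    """Decode a Gray-coded projector column index back to binary.
    In sequential Gray-code structured light, each camera pixel observes
    one bit per projected pattern; the decoded integer identifies the
    projector column, giving the correspondence used for triangulation."""
    b = g
    while g:
        g >>= 1
        b ^= g
    return b

# Gray code 0b110 decodes to column index 4 (binary 0b100).
print(gray_to_binary(0b110))  # 4
```

Gray codes are preferred over plain binary patterns because adjacent columns differ in exactly one bit, so a single mis-thresholded bit near a stripe boundary causes at most a one-column correspondence error.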
Common Mistakes
- Ambient light washing out the projected pattern, losing depth information
- Specular (shiny) surfaces reflecting the projector into the camera, causing erroneous depth
- Occlusion zones where the projector illuminates but the camera cannot see (shadowed regions)
- Insufficient projector resolution limiting the achievable depth precision
- Color/reflectance variations in the scene altering perceived pattern intensity
How to Avoid Mistakes
- Use NIR projector + camera with ambient-light rejection filter
- Apply polarization filtering, or spray shiny surfaces with a matte coating for calibration scans
- Add a second camera or projector to reduce occlusion zones
- Use high-resolution projectors (1080p+) and fine patterns for sub-mm precision
- Use binary or phase-shifting patterns that are robust to reflectance variations
Forward-Model Mismatch Cases
- The widefield fallback applies spatial blur, but structured-light depth sensing projects known patterns and measures their deformation via triangulation — the depth-encoding pattern correspondence between projector and camera is absent
- Structured light extracts depth from disparity between projected and observed pattern positions (d = f*B/disparity) — the widefield blur produces no disparity information and cannot encode surface depth
How to Correct the Mismatch
- Use the structured-light operator that models pattern projection (Gray code, sinusoidal fringe, or speckle) and camera observation from a different viewpoint: depth is encoded in pattern deformation due to surface geometry
- Extract depth maps using pattern decoding (Gray code → correspondence → triangulation) or phase unwrapping (sinusoidal fringe → depth) with calibrated projector-camera geometry
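For the sinusoidal-fringe route, the wrapped-phase extraction can be sketched as a standard four-step phase-shift calculation; the offset and modulation values below are synthetic, and the result still needs unwrapping plus phase-to-depth calibration:

```python
import numpy as np

def phase_from_four_step(I0, I90, I180, I270):
    """Wrapped phase from four fringe images shifted by 90 degrees:
    I_k = A + B*cos(phi + k*pi/2)  =>  phi = atan2(I270 - I90, I0 - I180).
    The offset A and modulation B cancel out of the ratio."""
    return np.arctan2(I270 - I90, I0 - I180)

# Synthetic pixel with true phase 0.7 rad, offset A = 2, modulation B = 1.
phi_true, A, B = 0.7, 2.0, 1.0
imgs = [A + B * np.cos(phi_true + k * np.pi / 2) for k in range(4)]
est = phase_from_four_step(*imgs)  # ~0.7 rad
```

Because offset and modulation cancel, this estimator is robust to the reflectance variations listed under common mistakes, which is why phase-shifting methods tolerate textured scenes better than single-shot intensity coding.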
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Geng, 'Structured-light 3D surface imaging: a tutorial', Advances in Optics and Photonics 3, 128-160 (2011)
Canonical Datasets
- Middlebury stereo benchmark
- ETH3D multi-view stereo benchmark
Time-of-Flight Depth Camera
Description
ToF cameras measure per-pixel depth by emitting modulated near-infrared light and measuring the phase delay of the reflected signal relative to the emitted signal. In amplitude-modulated continuous-wave (AMCW) ToF, the phase offset phi = 2*pi*f*2d/c encodes the round-trip distance 2d. Multiple modulation frequencies resolve depth ambiguity. Primary degradations include multi-path interference (MPI), motion blur, and systematic errors at depth discontinuities (flying pixels).
Principle
A Time-of-Flight depth camera measures the round-trip time of modulated light (typically near-infrared LEDs at 850 nm) reflected from the scene. The sensor measures the phase shift between emitted and received modulated signals at each pixel, which is proportional to the target distance: d = c·Δφ/(4π·f_mod). Typical modulation frequencies are 20-100 MHz, providing depth ranges of 0.5-10 meters with mm-cm precision.
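The phase-to-depth relation and the wrap-around limit it implies can be sketched directly (20 MHz is just an illustrative modulation frequency; function names are hypothetical):

```python
import math

C = 299_792_458.0  # m/s

def tof_depth(phase_rad: float, f_mod_hz: float) -> float:
    """AMCW ToF depth: d = c * dphi / (4 * pi * f_mod)."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

def unambiguous_range(f_mod_hz: float) -> float:
    """Depth at which the measured phase wraps past 2*pi: c / (2 * f_mod)."""
    return C / (2.0 * f_mod_hz)

# At 20 MHz modulation the unambiguous range is ~7.49 m; a phase shift
# of pi corresponds to half of it (~3.75 m).
print(unambiguous_range(20e6), tof_depth(math.pi, 20e6))
```

The trade-off is visible in the formulas: raising f_mod improves depth resolution (more phase change per meter) but shrinks the unambiguous range, which is exactly why multi-frequency operation is used.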
How to Build the System
Use an integrated ToF camera module (e.g., Microsoft Azure Kinect DK, PMD CamBoard pico, Texas Instruments OPT8241). The module contains the NIR light source, modulation driver, and ToF sensor with per-pixel demodulation circuits. Mount rigidly and calibrate intrinsic parameters (lens distortion, depth offset) and phase-to-depth nonlinearities. For multi-camera setups, synchronize or frequency-multiplex to avoid interference.
Common Reconstruction Algorithms
- Four-phase demodulation for distance extraction
- Multi-frequency unwrapping for extended unambiguous range
- Flying-pixel filtering (mixed pixels at depth discontinuities)
- Multi-path interference correction
- Deep-learning depth denoising and completion
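The four-phase demodulation step can be sketched per pixel. The sampling convention below (correlation samples A_k at 0/90/180/270 degrees of the modulation period) is one common choice; actual sensors differ in sign and ordering conventions, and the synthetic numbers are illustrative:

```python
import numpy as np

C = 299_792_458.0  # m/s

def demodulate_four_phase(A0, A1, A2, A3, f_mod_hz):
    """Per-pixel AMCW demodulation from four correlation samples:
      phase     = atan2(A3 - A1, A0 - A2), wrapped into [0, 2*pi)
      amplitude = 0.5 * sqrt((A3 - A1)^2 + (A0 - A2)^2)
      depth     = c * phase / (4 * pi * f_mod)
    Amplitude doubles as a per-pixel confidence measure."""
    I = np.asarray(A0) - np.asarray(A2)
    Q = np.asarray(A3) - np.asarray(A1)
    phase = np.mod(np.arctan2(Q, I), 2.0 * np.pi)
    amplitude = 0.5 * np.hypot(Q, I)
    depth = C * phase / (4.0 * np.pi * f_mod_hz)
    return depth, amplitude, phase

# Synthetic pixel: offset 10, modulation amplitude 3, true phase 1.0 rad.
A = [10.0 + 3.0 * np.cos(1.0 + k * np.pi / 2) for k in range(4)]
depth, amp, phase = demodulate_four_phase(*A, f_mod_hz=20e6)
```

Note that the ambient-light offset cancels in the differences A0-A2 and A3-A1, which is what gives ToF sensors some built-in rejection of unmodulated background illumination.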
Common Mistakes
- Multi-path interference causing systematic depth errors in concave scenes
- Flying pixels at depth edges producing incorrect intermediate depth values
- Phase wrapping ambiguity when objects exceed the unambiguous range
- Interference from ambient NIR light (sunlight) degrading outdoor performance
- Systematic depth errors from non-ideal sensor response not calibrated out
How to Avoid Mistakes
- Use multi-path correction algorithms or multi-frequency modulation
- Apply flying-pixel detection and removal based on amplitude and neighbor consistency
- Use dual-frequency operation to extend the unambiguous range
- Use narrow-band optical filter and higher modulation power for outdoor use
- Perform per-pixel depth calibration with a known flat reference at multiple distances
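The dual-frequency idea can be sketched in the noiseless case: the beat of two wrapped phase measurements behaves like a measurement at the difference frequency, whose unambiguous range is much longer, and it selects the wrap count for the fine measurement. The 100/80 MHz pair is illustrative, and real implementations must make the wrap-count rounding robust to noise:

```python
import numpy as np

C = 299_792_458.0  # m/s

def dual_frequency_depth(phi1, phi2, f1, f2):
    """Extend the unambiguous range with two modulation frequencies.
    mod(phi1 - phi2, 2*pi) acts as a phase measurement at f1 - f2,
    giving a coarse depth that fixes the wrap count n of the fine
    measurement at f1."""
    phi_beat = np.mod(phi1 - phi2, 2.0 * np.pi)
    d_coarse = C * phi_beat / (4.0 * np.pi * (f1 - f2))
    r1 = C / (2.0 * f1)                       # fine unambiguous range
    d_fine = C * phi1 / (4.0 * np.pi * f1)
    n = np.round((d_coarse - d_fine) / r1)    # integer wrap count
    return d_fine + n * r1

# True depth 5 m exceeds the 100 MHz range (~1.5 m) but not the
# 20 MHz beat range (~7.5 m), so the pair recovers it.
d_true = 5.0
phi1 = np.mod(4.0 * np.pi * 100e6 * d_true / C, 2.0 * np.pi)
phi2 = np.mod(4.0 * np.pi * 80e6 * d_true / C, 2.0 * np.pi)
d = dual_frequency_depth(phi1, phi2, 100e6, 80e6)
```

The design choice mirrors the text above: the high frequency supplies precision, the synthetic beat frequency supplies range, and neither alone would suffice.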
Forward-Model Mismatch Cases
- The widefield fallback produces a 2D intensity image, but ToF cameras measure depth via phase shift of modulated near-infrared light — the distance information (d = c*dphi/(4*pi*f_mod)) is entirely absent from the blurred image
- ToF measurement involves demodulation of the reflected modulated signal at each pixel, producing amplitude, phase, and confidence maps — the widefield intensity-only blur cannot produce depth or distinguish multi-path interference
How to Correct the Mismatch
- Use the ToF camera operator that models modulated illumination and per-pixel demodulation: four-phase sampling extracts the phase shift proportional to target distance at each pixel
- Apply phase-to-depth conversion, multi-path correction, and flying-pixel filtering using the correct modulation frequency, amplitude, and phase measurement model
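One of these post-processing steps, flying-pixel removal, can be sketched with a simple neighbor-consistency heuristic. The threshold is illustrative, and production filters additionally weight by amplitude/confidence:

```python
import numpy as np

def flag_flying_pixels(depth, max_jump=0.1):
    """Flag depth pixels that differ from all four neighbors by more than
    max_jump meters -- a simple heuristic for isolated flying pixels at
    depth discontinuities. Returns a boolean mask of the same shape."""
    d = np.asarray(depth, dtype=float)
    pad = np.pad(d, 1, mode="edge")           # replicate edges
    diffs = np.stack([
        np.abs(d - pad[:-2, 1:-1]),           # neighbor above
        np.abs(d - pad[2:, 1:-1]),            # neighbor below
        np.abs(d - pad[1:-1, :-2]),           # neighbor left
        np.abs(d - pad[1:-1, 2:]),            # neighbor right
    ])
    return np.all(diffs > max_jump, axis=0)

# A lone 2 m reading in a flat 1 m plane is flagged; its neighbors,
# which each still agree with most of the plane, are not.
dmap = np.ones((5, 5))
dmap[2, 2] = 2.0
mask = flag_flying_pixels(dmap)
```

Requiring disagreement with all neighbors (rather than any) is what distinguishes an isolated mixed pixel from a legitimate depth edge, where pixels agree with at least one side of the discontinuity.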
Experimental Setup — Signal Chain
Experimental Setup — Details
Key References
- Hansard et al., 'Time-of-Flight Cameras: Principles, Methods and Applications', Springer (2013)
Canonical Datasets
- NYU Depth V2 (Silberman et al.)
- KITTI depth benchmark (adapted)