Physics World Model — Modality Catalog
Two imaging modalities, each with a description, experimental setup, and reconstruction guidance.
3D Gaussian Splatting
Description
3D Gaussian splatting represents scenes as a collection of learnable 3D Gaussian primitives, each parameterized by position, covariance (anisotropic 3D extent), opacity, and spherical harmonic color coefficients. Rendering rasterizes the Gaussians by projecting them to 2D screen space, sorting by depth, and alpha-compositing with a tile-based differentiable rasterizer. Training optimizes Gaussian parameters via gradient descent with adaptive density control (splitting, cloning, pruning). This achieves real-time (30+ fps) rendering at quality comparable to NeRF, from SfM point cloud initialization (COLMAP).
Principle
3-D Gaussian Splatting represents a scene as a set of anisotropic 3-D Gaussians, each with position, covariance, opacity, and spherical-harmonic color coefficients. Novel views are rendered by projecting (splatting) these Gaussians onto the image plane and alpha-compositing them in depth order. Unlike NeRF, rendering is rasterization-based and reaches real-time frame rates (tens to 100+ fps, depending on scene and resolution) with high visual quality.
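The depth-ordered alpha compositing above can be sketched for a single pixel. The depths, opacities, and colors here are hypothetical per-Gaussian values at that pixel, not output of a real rasterizer:

```python
import numpy as np

# Toy front-to-back alpha compositing for one pixel.
# Each splatted Gaussian contributes an opacity and an RGB color;
# these are illustrative values, not a real projection.
depths = np.array([2.0, 0.5, 1.2])            # camera-space depth of each Gaussian
alphas = np.array([0.7, 0.3, 0.5])            # opacity after 2D Gaussian falloff
colors = np.array([[1.0, 0.0, 0.0],           # red
                   [0.0, 1.0, 0.0],           # green
                   [0.0, 0.0, 1.0]])          # blue

order = np.argsort(depths)                    # sort front (near) to back (far)
pixel = np.zeros(3)
transmittance = 1.0
for i in order:
    pixel += transmittance * alphas[i] * colors[i]
    transmittance *= 1.0 - alphas[i]          # light blocked by this Gaussian

# 'pixel' is the composited color; 'transmittance' is what reaches the background
```

Note how the nearest Gaussian (depth 0.5) dominates: everything behind it is attenuated by its opacity before contributing.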
How to Build the System
Start with the same multi-view image dataset as NeRF (50-200 posed images via COLMAP). Initialize 3-D Gaussians from the SfM point cloud. Train by differentiable rasterization: project Gaussians to each training view, compute photometric loss (L1 + SSIM), and optimize positions, covariances, colors, and opacities via Adam. Adaptive densification (splitting/cloning Gaussians) and pruning runs periodically during training. Training takes ~15-30 minutes on a modern GPU.
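The photometric objective above can be sketched as a weighted L1 + D-SSIM term, with λ = 0.2 as in the original paper. The SSIM here is a simplified single-window version; real implementations use an 11×11 Gaussian-windowed SSIM:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified whole-image SSIM (real 3DGS code uses a windowed SSIM)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def photometric_loss(rendered, target, lam=0.2):
    """(1 - lam) * L1  +  lam * (1 - SSIM), as in Kerbl et al. 2023."""
    l1 = np.abs(rendered - target).mean()
    return (1 - lam) * l1 + lam * (1 - ssim_global(rendered, target))

rng = np.random.default_rng(0)
target = rng.random((64, 64))
perfect = photometric_loss(target, target)       # identical images -> loss ~ 0
noisy = photometric_loss(target + 0.1, target)   # biased render -> nonzero loss
```

In practice this loss is backpropagated through the differentiable rasterizer to every Gaussian parameter.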
Common Reconstruction Algorithms
- 3D Gaussian Splatting (original, Kerbl et al. 2023)
- Mip-Splatting (anti-aliased multi-scale Gaussian splatting)
- SuGaR (Surface-Aligned Gaussian Splatting for mesh extraction)
- Dynamic 3D Gaussians (for dynamic scenes / video)
- Compact-3DGS (compressed Gaussian representations)
Common Mistakes
- Insufficient initial SfM points causing sparse reconstruction
- Too few training views creating holes or floater artifacts in novel views
- Excessive Gaussian count (millions) consuming too much GPU memory
- Not using adaptive densification, leaving under-reconstructed regions
- Ignoring exposure variation between training images
How to Avoid Mistakes
- Use dense SfM initialization; increase COLMAP matching thoroughness if sparse
- Capture more views, especially in regions that are under-represented
- Apply periodic pruning of low-opacity Gaussians to control memory
- Enable adaptive densification and set proper gradient thresholds for splitting
- Apply per-image exposure compensation or normalize images before training
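The densification and pruning logic above can be sketched as boolean-mask bookkeeping over per-Gaussian statistics. The thresholds here are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
opacity = rng.random(n)            # per-Gaussian opacity
grad = rng.random(n) * 0.001       # accumulated view-space positional gradient
scale = rng.random(n) * 0.1        # largest axis of each Gaussian's extent

OPACITY_MIN = 0.005                # prune nearly transparent Gaussians
GRAD_THRESH = 0.0002               # densify where reconstruction error is high
SCALE_SPLIT = 0.05                 # large Gaussians are split, small ones cloned

prune = opacity < OPACITY_MIN
dense = grad > GRAD_THRESH
split = dense & (scale > SCALE_SPLIT)   # over-reconstruction: split into two
clone = dense & (scale <= SCALE_SPLIT)  # under-reconstruction: clone and offset

# After one control step the Gaussian count becomes:
new_count = n - prune.sum() + split.sum() + clone.sum()
```

Running this periodically keeps memory bounded (pruning) while filling under-reconstructed regions (splitting/cloning).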
Forward-Model Mismatch Cases
- The widefield fallback processes a single 2D (64,64) image, but Gaussian splatting renders multi-view images from a set of 3D Gaussian primitives — output shape (n_views, H, W) encodes view-dependent appearance
- Gaussian splatting is a nonlinear rendering process (alpha-compositing of projected 3D Gaussians sorted by depth) — the widefield linear blur cannot model 3D-to-2D projection, depth ordering, or view-dependent effects
How to Correct the Mismatch
- Use the Gaussian splatting operator that projects 3D Gaussian primitives onto each camera plane via differentiable rasterization with alpha compositing
- Optimize Gaussian parameters (position, covariance, opacity, color SH coefficients) to minimize rendering loss across training views using the correct splatting forward model
Key References
- Kerbl et al., '3D Gaussian Splatting for Real-Time Radiance Field Rendering', SIGGRAPH 2023
Canonical Datasets
- Mip-NeRF 360 (9 scenes)
- Tanks & Temples (Knapitsch et al.)
- Deep Blending (Hedman et al.)
Neural Radiance Fields (NeRF)
Description
Neural radiance fields (NeRF) represent a 3D scene as a continuous volumetric function F(x,y,z,θ,φ) → (RGB, σ) parameterized by a multi-layer perceptron that maps 5D coordinates (position + viewing direction) to color and volume density. Novel views are synthesized by marching camera rays through the volume and integrating color weighted by transmittance using quadrature. Training optimizes the MLP weights to minimize photometric loss between rendered and observed images. Primary challenges include slow training/rendering, view-dependent effects, and the need for accurate camera poses (from COLMAP).
Principle
Neural Radiance Fields (NeRF) represent a 3-D scene as a continuous volumetric function F(x,y,z,θ,φ) → (RGB, σ) parameterized by a multi-layer perceptron (MLP). The network maps 3-D position and viewing direction to color and volume density. Novel views are synthesized by differentiable volume rendering along camera rays, and the network is trained by minimizing photometric loss against a set of posed 2-D images.
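In practice the rendering integral is discretized as C ≈ Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ with Tᵢ = exp(−Σⱼ₍ⱼ₎₍₎ σⱼ δⱼ) over samples j < i. A minimal sketch for one ray, with hypothetical uniform density and constant color:

```python
import numpy as np

# Quadrature for the NeRF volume-rendering integral along one ray.
t = np.linspace(0.0, 4.0, 65)                    # sample depths along the ray
delta = np.diff(t)                               # spacing between samples
sigma = np.ones(64) * 0.5                        # hypothetical uniform density
color = np.tile([0.2, 0.6, 0.9], (64, 1))        # hypothetical constant color

alpha = 1.0 - np.exp(-sigma * delta)             # per-sample opacity
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # T_i
weights = trans * alpha                          # contribution of each sample
pixel = (weights[:, None] * color).sum(axis=0)   # rendered pixel color
opacity = weights.sum()                          # 1 - background transmittance
```

With total optical depth σ·(t_far − t_near) = 2, the accumulated opacity equals 1 − e⁻², and the pixel is that fraction of the sample color, as the integral predicts.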
How to Build the System
Capture 50-200 images of a scene from diverse viewpoints using a calibrated camera (known intrinsics) or estimate camera poses with COLMAP structure-from-motion. Images should cover the scene uniformly. Train a NeRF MLP (typically 8 layers, 256 units, with positional encoding of input coordinates) on a GPU (≥12 GB VRAM). Training takes 12-48 hours on a single V100. Use mip-NeRF, Instant-NGP, or TensoRF for faster convergence.
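The positional encoding mentioned above maps each coordinate p to γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^{L−1}πp), cos(2^{L−1}πp)); NeRF uses L = 10 for positions and L = 4 for viewing directions. A minimal sketch:

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """gamma(p): per-coordinate Fourier features, as in Mildenhall et al. 2020.
    p: array of shape (..., d); returns shape (..., 2 * num_freqs * d)."""
    p = np.asarray(p, dtype=float)
    freqs = 2.0 ** np.arange(num_freqs) * np.pi          # 2^k * pi
    scaled = p[..., None] * freqs                        # (..., d, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)

xyz = np.array([[0.1, -0.3, 0.7]])                       # one 3D sample point
features = positional_encoding(xyz, num_freqs=10)        # shape (1, 60)
```

Without this encoding the MLP's spectral bias suppresses high-frequency detail, which is why it is applied to the raw 5D inputs before the first layer.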
Common Reconstruction Algorithms
- Vanilla NeRF (MLP + positional encoding)
- Instant-NGP (multi-resolution hash encoding; trains in minutes)
- mip-NeRF (anti-aliased cone tracing)
- Nerfacto (nerfstudio default combining multiple improvements)
- TensoRF (tensor factorization for compact radiance fields)
Common Mistakes
- Insufficient camera pose accuracy (SfM failure) causing blurry results
- Too few input views or views clustered in a narrow angular range
- Training only at one scale without mip-NeRF, causing aliasing at novel distances
- Floater artifacts in empty space from insufficient regularization
- Very slow training and rendering with vanilla NeRF (hours to train, seconds per frame)
How to Avoid Mistakes
- Verify COLMAP pose estimation quality; add more images if registration fails
- Capture views uniformly around the scene; include close-up and distant views
- Use mip-NeRF or multi-scale training for scale consistency
- Add distortion loss or density regularization to eliminate floater artifacts
- Use Instant-NGP or 3D Gaussian Splatting for real-time rendering requirements
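One common floater regularizer from the list above is the distortion loss of mip-NeRF 360, which penalizes rendering weight that is smeared along a ray instead of concentrated at a surface. A sketch, assuming normalized interval edges s and per-interval weights w (my reading of the published formula; consult the paper's reference code for the exact form):

```python
import numpy as np

def distortion_loss(s, w):
    """Distortion regularizer from mip-NeRF 360 (Barron et al. 2022):
    s: (n+1,) normalized sample-interval edges; w: (n,) rendering weights."""
    mid = 0.5 * (s[:-1] + s[1:])                     # interval midpoints
    intra = np.sum(w**2 * (s[1:] - s[:-1])) / 3.0    # spread within intervals
    inter = np.sum(w[:, None] * w[None, :] * np.abs(mid[:, None] - mid[None, :]))
    return inter + intra

s = np.linspace(0.0, 1.0, 9)                         # 8 equal intervals
concentrated = np.array([0, 0, 0, 1.0, 0, 0, 0, 0])  # all weight at one surface
spread = np.ones(8) / 8                              # weight smeared along the ray
```

A ray whose weight collapses onto a single interval scores much lower than one whose weight is spread out, so minimizing this loss pushes density toward surfaces and suppresses floaters.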
Forward-Model Mismatch Cases
- The widefield fallback processes a single 2D (64,64) image, but NeRF renders multiple views of a 3D scene from a volumetric radiance field — output shape (n_views, H, W) represents images from different camera poses
- NeRF is fundamentally nonlinear (volume rendering integral C(r) = ∫ T(t) σ(t) c(t) dt along each ray) — the widefield linear blur cannot model view-dependent appearance, occlusion, or 3D geometry
How to Correct the Mismatch
- Use the NeRF operator that performs differentiable volume rendering: for each pixel, cast a ray through the volumetric density/color field and integrate transmittance-weighted radiance
- Optimize the 3D radiance field (MLP or voxel grid) to minimize photometric loss across all training views using the correct volume rendering equation as the forward model
Key References
- Mildenhall et al., 'NeRF: Representing scenes as neural radiance fields for view synthesis', ECCV 2020
- Müller et al., 'Instant Neural Graphics Primitives with a Multiresolution Hash Encoding', SIGGRAPH 2022
Canonical Datasets
- NeRF Blender Synthetic (8 scenes)
- LLFF (8 forward-facing scenes)
- Mip-NeRF 360 (9 unbounded scenes)