Physics World Model
The smallest trustable protocol by which scientific objects become executable, comparable, certifiable, and reusable.
PWM as a Dyson Swarm
A Dyson Swarm works only if the sun is stable. PWM becomes a swarm not by building many gravity wells, but by hardening one small, trust-centered kernel and letting everything else orbit it.
The sun is the protocol. The orbits are the domains, datasets, methods, instruments, events, and communities that dock to it. If the sun drifts, every collector drifts. If the sun holds, collectors can be added, replaced, or removed without system-level damage.
The Deep Sun: 6 Core Objects
These objects define the universal trust protocol. They are domain-neutral, semver-locked, and change only through an RFC process.
CoreSpec
Universal six-tuple: (Ω, E, B, I, O, ε). Any solver accepting the same tuple can consume it.
Judge
Universal trust kernel. Enforces R1-R4 run-gates. Validates S1-S4 scientific conditions.
OperatorGraph
Canonical typed graph of any scientific computation. Domain-neutral IR.
RunBundle
Immutable audit record. SHA-256 hashes, provenance, seeds — everything to reproduce.
Certificate
Machine-readable trust verdict. Issued after all gates pass. Carries trust tier.
Registry
Versioned catalog of primitives, profiles, datasets, methods. Append-only per version.
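As a rough illustration, the first and last of these objects can be sketched as Python dataclasses. The field names and types here are assumptions for readability, not the normative schemas.

```python
from dataclasses import dataclass

# Hypothetical shapes for two of the six core objects; field names
# are illustrative assumptions, not the normative PWM schemas.

@dataclass(frozen=True)
class CoreSpec:
    domain: dict       # Ω — spatial extent, mesh, material properties
    equations: dict    # E — forward model, physics, noise model
    boundary: dict     # B — environmental constraints
    initial: dict      # I — calibration, prior knowledge
    observables: dict  # O — detector geometry, measurement space
    tolerance: float   # ε — acceptance threshold

@dataclass(frozen=True)
class Certificate:
    run_bundle_sha256: str  # hash of the immutable audit record
    gates_passed: tuple     # e.g. ("R1", "R2", "R3", "R4")
    trust_tier: str         # e.g. "Certified"
```

Freezing both dataclasses mirrors the protocol's immutability: once a RunBundle is hashed and a Certificate issued, neither should be mutable in place.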
How Papers Enter the Benchmark
Every benchmark entry starts as a ClaimCard — a structured record of a scientific result. Claims go through a review process before appearing on the leaderboard.
Review
A Claim Curator reviews: Is the paper real? Are the metrics correct? Click "Approve" or "Reject".
Reproduce
A Benchmark Reviewer independently runs the algorithm. If results match, click "Mark Reproduced".
Certify
The Judge verifies all R1-R4 gates pass. A reviewer co-signs. The result reaches Certified tier.
Trust Tiers: How Results Get Certified
Auto-scaffolded → Confirmed → Reproduced → Certified
Every benchmark result starts at Draft and progresses through independent reproduction and Judge verification. No single person can certify a result alone.
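The ladder and its two-person rule can be sketched as a small promotion function. The exact ordering of the early tiers (including where Draft sits relative to Auto-scaffolded) is an assumption based on the tier names on this page.

```python
# Illustrative tier ladder; the ordering of the early tiers is an
# assumption, not the normative PWM promotion policy.
TIERS = ["Draft", "Auto-scaffolded", "Confirmed", "Reproduced", "Certified"]

def promote(tier: str, actor: str, previous_actors: set) -> str:
    """Advance a result one tier, refusing a repeat actor —
    no single person can certify a result alone."""
    if actor in previous_actors:
        raise PermissionError("independent review required")
    i = TIERS.index(tier)
    if i + 1 >= len(TIERS):
        return tier  # already at the top tier
    previous_actors.add(actor)
    return TIERS[i + 1]
```

Each promotion records who acted, so a curator who approved a claim cannot also mark it reproduced.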
How to Use PWM
For Researchers
Evaluate your imaging algorithm against 172 modalities with reproducible benchmarks.
1. Try SpecLab — generate a spec, run reconstruction, see PSNR/SSIM
2. Browse Benchmarks — compare your method against the leaderboard
3. Compete — submit to the challenge and get a trust-tier badge
For Developers
Integrate PWM into your CI/CD pipeline or build a new solver plugin.
1. Install: pip install -e packages/pwm_core/
2. Evaluate: pwm evaluate --modality ct --emit-certificate
3. CI: Add the pwm-benchmark GitHub Action to your repo
For Contributors
Join the Dyson Swarm — 8 roles from dataset steward to red-team contributor.
1. Sign up and apply for a role
2. Review claims, reproduce results, or probe for vulnerabilities
3. Earn badges and build your contributor profile
By the Numbers
172 Imaging Modalities
886 Cataloged Algorithms
13 Golden Bundles (Certified)
98 Automated Tests
The Protocol
Semantic Kernel
Makes problems expressible and executable.
- CoreSpec (Ω, E, B, I, O, ε)
- PrimitiveRegistry (general/v1 + domain)
- OperatorGraph (typed DAG)
Trust Kernel
Makes results trustable and comparable.
- Judge (S1-S4 + R1-R4)
- RunBundle (immutable audit)
- Certificate (trust verdict)
Extension Layer
Domains dock without rewriting the kernel.
- DomainProfile (imaging, CT QC, ...)
- PrimitiveDialect (domain bridge)
- DomainDiagnosticReport (Triad, ...)
CoreSpec: The Universal Language of Science
Every scientific problem in PWM is expressed as a canonical six-tuple. This is the sun's language — any solver that accepts the same tuple can consume any problem.
Ω Domain
Spatial extent, mesh, material properties
CT: image voxels · Combustion: spatial mesh
E Equations
Forward model, physics, noise model
CT: Radon transform · MRI: Fourier encoding
B Boundary Conditions
Environmental constraints
CT: scanner geometry · QC: clinical thresholds
I Initial Conditions
Calibration, prior knowledge
CT: calibration data · QC: commissioning baseline
O Observables
Detector geometry, measurement space
CT: sinogram · MRI: k-space · QC: metrics
ε Tolerance
Acceptance threshold
CT: PSNR ≥ 30 dB · QC: ±4 HU accuracy
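Putting the CT examples above together, an illustrative six-tuple instance might look like the dictionary below. Every value is a placeholder drawn from the examples, not a validated spec.

```python
# An illustrative CT instance of the six-tuple; values are placeholders
# drawn from the CT examples above, not a validated CoreSpec.
ct_spec = {
    "omega":   {"grid": (512, 512), "unit": "voxel"},      # Ω: image voxels
    "E":       {"forward_model": "radon"},                  # E: Radon transform
    "B":       {"geometry": "fan_beam"},                    # B: scanner geometry
    "I":       {"calibration": "air_scan"},                 # I: calibration data
    "O":       {"space": "sinogram"},                       # O: sinogram
    "epsilon": {"metric": "psnr", "min_db": 30.0},          # ε: PSNR ≥ 30 dB
}
```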
The 4-Scenario Evaluation Protocol
Every imaging algorithm is tested under four conditions that decompose performance into recoverability, noise sensitivity, and model mismatch.
Ideal
Perfect model, no mismatch
Measures intrinsic sampling limit
Assumed
Mismatch blindly applied
Measures operator mismatch penalty
Corrected
Calibration applied
Measures calibration effectiveness
Oracle
Partial oracle reconstruction
Gap to theoretical best
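One plausible way to read the decomposition is as simple differences between scenario scores. The formulas are an interpretation of the protocol, not its normative definitions, and the PSNR numbers are made up.

```python
# Decomposing performance from the four scenarios (PSNR in dB).
# Interpretation and numbers are illustrative, not normative.
psnr = {"ideal": 38.0, "assumed": 29.5, "corrected": 34.0, "oracle": 40.0}

mismatch_penalty = psnr["ideal"] - psnr["assumed"]      # cost of blind mismatch
calibration_gain = psnr["corrected"] - psnr["assumed"]  # recovered by calibration
gap_to_oracle    = psnr["oracle"] - psnr["corrected"]   # remaining headroom
```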
How a New Science Joins the Swarm
Any scientific domain can join PWM through a three-stage transformation ladder. Not every science enters at Stage C immediately — many start as explicit objects and progress over time.
Make it Explicit
Map the domain to CoreSpec (Ω, E, B, I, O, ε). Write a DomainProfile. The science becomes legible.
Exit: spec parses, profile registered
Make it Executable
Compile to OperatorGraph + ComputePlan. One method runs end-to-end and emits a RunBundle.
Exit: one RunBundle produced
Make it Certifiable
Pass the Judge, emit a Certificate with universal trust + domain diagnostics. Frozen thresholds for Certified.
Exit: golden RunBundle at Certified tier
The Contributor Economy
A Dyson Swarm works when every collector has identity and reward. PWM defines 8 contributor roles with badges and credit surfaces.
📋 Claim Curator · Review paper claims
🔍 Benchmark Reviewer · Reproduce results
🔬 Modality Maintainer · Curate leaderboards
🗃 Dataset Steward · Manage data quality
⚡ Method Integrator · Adapt algorithms
🔒 Judge-Rule Author · Propose gates via RFC
🛠 Red-Team · Probe for vulnerabilities
📡 Instrument Contributor · Provide calibration data
Beyond Imaging: Cross-Domain Science
PWM starts with computational imaging, but the protocol is domain-neutral. Any science that can be expressed as (Ω, E, B, I, O, ε) can dock to the swarm.
🧬 Imaging · 172 modalities · Trusted
🔴 CT QC · Quality control · Certified-compatible
🔥 Combustion · Reactive flow · Experimental
🌎 Remote Sensing · Satellite / aerial · Experimental
⚛ Particle Physics · Detector simulation · Experimental
Two Gate Families
R1-R4: Operational Run-Gates
Checked on every run. The machinery that makes the trust ratchet work.
Spec Completeness
All CoreSpec fields bound and type-valid
Reproducibility
Seeds, versions, hashes recorded
Metric Integrity
SHA-256 hashes match stored artifacts
Budget Compliance
Runtime within declared budget
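A minimal sketch of the R3 (Metric Integrity) check, assuming a RunBundle manifest that maps artifact names to their SHA-256 hex digests; the manifest layout is an assumption.

```python
import hashlib
import pathlib

def check_metric_integrity(bundle_dir: str, manifest: dict) -> bool:
    """R3 sketch: recompute SHA-256 of each stored artifact and compare
    against the digest recorded in the RunBundle manifest.
    The manifest layout here is an illustrative assumption."""
    for name, expected in manifest["artifacts"].items():
        data = pathlib.Path(bundle_dir, name).read_bytes()
        if hashlib.sha256(data).hexdigest() != expected:
            return False
    return True
```

Because the check recomputes hashes from the bytes on disk, any post-hoc edit to a stored artifact fails the gate.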
S1-S4: Scientific Validity
Verified at design time and by post-execution audit. Not runtime checks.
Finite Specifiability
Problem admits a finite, type-valid description
Hadamard Stability
Well-posed: existence, uniqueness, continuity
Approximability
Solution admits convergent discretization
Certifiability
Computable error bounds exist
A run that passes R1-R4 on a problem satisfying S1-S4 is the definition of a Certified result.
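That definition reduces to a conjunction over the two gate families:

```python
def is_certified(s_conditions: dict, r_gates: dict) -> bool:
    """Certified = the problem satisfies S1-S4 and the run passes R1-R4."""
    return all(s_conditions[k] for k in ("S1", "S2", "S3", "S4")) \
       and all(r_gates[k] for k in ("R1", "R2", "R3", "R4"))
```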
Open-Core Model
The protocol is free. The convenience is paid.
Open Kernel
Everything needed to run the full evaluation locally with identical trust guarantees.
- CoreSpec + OperatorGraph + RunBundle
- Certificate schema + Judge (R1-R4, S1-S4)
- 172 modality specs + 886 algorithms
- CLI: pwm evaluate
Hosted Platform
Managed cloud with leaderboards, SpecLab, and contributor tools.
- AI-powered SpecLab reconstruction
- Blind challenge evaluation
- Trust-tier promotion workflow
- Contributor profiles + badges
Research
Papers and experiments that drive the protocol forward.
- Universal Simulation paper
- InverseNet (calibration)
- CT QC Copilot
- Finite Primitive Theorem