Physics World Model

The smallest trustable protocol by which scientific objects become executable, comparable, certifiable, and reusable.

PWM as a Dyson Swarm

A Dyson Swarm works only if the sun is stable. PWM becomes a swarm not by building many gravity wells, but by hardening one small, trust-centered kernel and letting everything else orbit it.

The sun is the protocol. The orbits are the domains, datasets, methods, instruments, events, and communities that dock to it. If the sun drifts, every collector drifts. If the sun holds, collectors can be added, replaced, or removed without system-level damage.

Deep Sun: CoreSpec, Judge, Certificate

Orbits: 172 Modalities, 886 Algorithms, Datasets, Instruments, Claims, Events, Methods, Community

The Deep Sun: 6 Core Objects

These objects define the universal trust protocol. They are domain-neutral, semver-locked, and change only through an RFC process.

CS

CoreSpec

Universal six-tuple: (Ω, E, B, I, O, ε). Any solver accepting the same tuple can consume it.

J

Judge

Universal trust kernel. Enforces R1-R4 run-gates. Validates S1-S4 scientific conditions.

OG

OperatorGraph

Canonical typed graph of any scientific computation. Domain-neutral IR.

RB

RunBundle

Immutable audit record. SHA-256 hashes, provenance, seeds — everything to reproduce.

C

Certificate

Machine-readable trust verdict. Issued after all gates pass. Carries trust tier.

R

Registry

Versioned catalog of primitives, profiles, datasets, methods. Append-only per version.
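
To make "immutable audit record" concrete, here is a minimal sketch of a record that stores SHA-256 digests, provenance, and a seed and then refuses modification. The names `RunBundleSketch` and `make_bundle` and all field names are hypothetical, chosen for illustration; the actual RunBundle schema is not described on this page.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class RunBundleSketch:
    """Hypothetical stand-in for a RunBundle; field names are assumptions."""
    seed: int
    provenance: Mapping[str, str]       # e.g. solver name and version
    artifact_hashes: Mapping[str, str]  # artifact name -> SHA-256 hex digest


def make_bundle(seed: int, provenance: dict, artifacts: dict) -> RunBundleSketch:
    """Hash every artifact and freeze the result into a read-only record."""
    hashes = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    return RunBundleSketch(
        seed=seed,
        provenance=MappingProxyType(dict(provenance)),
        artifact_hashes=MappingProxyType(hashes),
    )


bundle = make_bundle(
    seed=0,
    provenance={"solver": "demo-solver", "version": "0.0.0"},  # made-up values
    artifacts={"reconstruction": b"example bytes"},
)
```

Freezing the dataclass and wrapping the dictionaries in `MappingProxyType` means that any attempt to alter the record raises an error instead of silently changing the audit trail.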

How Papers Enter the Benchmark

Every benchmark entry starts as a ClaimCard — a structured record of a scientific result. Claims go through a review process before appearing on the leaderboard.

1

Scaffold

Create a ClaimCard from a paper. Click "+ Scaffold new ClaimCard" on the Claims page.

2

Review

A Claim Curator checks that the paper is real and that the metrics are correct, then clicks "Approve" or "Reject".

3

Reproduce

A Benchmark Reviewer independently runs the algorithm. If results match, click "Mark Reproduced".

4

Certify

The Judge verifies that all R1-R4 gates pass. A reviewer co-signs, and the result reaches the Certified tier.

Trust Tiers: How Results Get Certified

Draft: Auto-scaffolded

Author: Confirmed

Repro: Reproduced

Cert: Certified

Every benchmark result starts at Draft and progresses through independent reproduction and Judge verification. No single person can certify a result alone.
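
The tier progression can be pictured as a one-way ratchet. The sketch below is an assumption-laden illustration, not the platform's workflow code: the `Tier` and `Result` names are hypothetical, and "no single person can certify alone" is modeled simply as requiring at least two distinct signers by the time a result reaches Certified.

```python
from enum import IntEnum


class Tier(IntEnum):
    DRAFT = 0   # auto-scaffolded
    AUTHOR = 1  # confirmed
    REPRO = 2   # reproduced
    CERT = 3    # certified


class Result:
    """Hypothetical benchmark result that moves up one tier per sign-off."""

    def __init__(self) -> None:
        self.tier = Tier.DRAFT
        self.signers: list = []

    def promote(self, signer: str) -> Tier:
        if self.tier is Tier.CERT:
            raise ValueError("already at the highest tier")
        next_tier = Tier(self.tier + 1)
        # Assumed rule: certification needs at least two distinct people
        # across the whole promotion history.
        if next_tier is Tier.CERT and len(set(self.signers) | {signer}) < 2:
            raise PermissionError("certification needs a second, distinct signer")
        self.signers.append(signer)
        self.tier = next_tier
        return next_tier
```

With this rule, a result promoted twice by the same person stalls at Repro until someone else signs the final step.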

How to Use PWM

For Researchers

Evaluate your imaging algorithm against 172 modalities with reproducible benchmarks.

1. Try SpecLab — generate a spec, run reconstruction, see PSNR/SSIM

2. Browse Benchmarks — compare your method against the leaderboard

3. Compete — submit to the challenge and get a trust-tier badge

For Developers

Integrate PWM into your CI/CD pipeline or build a new solver plugin.

1. Install: pip install -e packages/pwm_core/

2. Evaluate: pwm evaluate --modality ct --emit-certificate

3. CI: Add the pwm-benchmark GitHub Action to your repo
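
For pipelines that drive the CLI from a script, the evaluate step can be wrapped as in the sketch below. It uses only the command and flags shown above; the helper names `build_evaluate_command` and `run_evaluate` are hypothetical, and the exit-code handling is an assumption about how a CI job might want to react.

```python
import shutil
import subprocess
from typing import List, Optional


def build_evaluate_command(modality: str) -> List[str]:
    """Assemble the argv for the evaluate command shown in step 2."""
    return ["pwm", "evaluate", "--modality", modality, "--emit-certificate"]


def run_evaluate(modality: str) -> Optional[int]:
    """Run the command if `pwm` is on PATH.

    Returns the process exit code, or None when the CLI is not installed.
    """
    if shutil.which("pwm") is None:
        return None
    completed = subprocess.run(build_evaluate_command(modality), check=False)
    return completed.returncode
```

Passing the arguments as a list, rather than a shell string, avoids quoting problems if a modality name ever contains unusual characters.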

For Contributors

Join the Dyson Swarm — 8 roles from dataset steward to red-team contributor.

1. Sign up and apply for a role

2. Review claims, reproduce results, or probe for vulnerabilities

3. Earn badges and build your contributor profile

By the Numbers

172

Imaging Modalities

886

Cataloged Algorithms

13

Golden Bundles (Certified)

98

Automated Tests

The Protocol

Semantic Kernel

Makes problems expressible and executable.

  • CoreSpec (Ω, E, B, I, O, ε)
  • PrimitiveRegistry (general/v1 + domain)
  • OperatorGraph (typed DAG)
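
A "typed DAG" can be illustrated with a small sketch: each operator declares an input and output type, edges are rejected when the types do not line up, and a topological sort yields an execution order. The `Op` class, the type labels, and `check_and_order` are hypothetical; the real OperatorGraph IR is not specified on this page.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical operator node with a declared input and output type."""
    name: str
    in_type: Optional[str]  # None marks a source node with no input
    out_type: str


def check_and_order(ops: List[Op], edges: List[Tuple[str, str]]) -> List[str]:
    """Type-check each (src, dst) edge, then return a valid execution order.

    Raises TypeError on a type mismatch and graphlib.CycleError on a cycle.
    """
    by_name = {op.name: op for op in ops}
    sorter = TopologicalSorter({name: set() for name in by_name})
    for src, dst in edges:
        if by_name[src].out_type != by_name[dst].in_type:
            raise TypeError(f"{src} emits {by_name[src].out_type!r}, "
                            f"but {dst} expects {by_name[dst].in_type!r}")
        sorter.add(dst, src)  # dst depends on src
    return list(sorter.static_order())


# Illustrative CT-style chain; the node names are made up.
ops = [
    Op("phantom", None, "image"),
    Op("radon", "image", "sinogram"),
    Op("noise", "sinogram", "sinogram"),
]
order = check_and_order(ops, [("phantom", "radon"), ("radon", "noise")])
```

Checking types at graph-construction time means a mismatched pipeline fails before any computation runs.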

Trust Kernel

Makes results trustable and comparable.

  • Judge (S1-S4 + R1-R4)
  • RunBundle (immutable audit)
  • Certificate (trust verdict)

Extension Layer

Domains dock without rewriting the kernel.

  • DomainProfile (imaging, CT QC, ...)
  • PrimitiveDialect (domain bridge)
  • DomainDiagnosticReport (Triad, ...)

CoreSpec: The Universal Language of Science

Every scientific problem in PWM is expressed as a canonical six-tuple. This is the sun's language — any solver that accepts the same tuple can consume any problem.

Ω Domain

Spatial extent, mesh, material properties

CT: image voxels · Combustion: spatial mesh

E Equations

Forward model, physics, noise model

CT: Radon transform · MRI: Fourier encoding

B Boundary Conditions

Environmental constraints

CT: scanner geometry · QC: clinical thresholds

I Initial Conditions

Calibration, prior knowledge

CT: calibration data · QC: commissioning baseline

O Observables

Detector geometry, measurement space

CT: sinogram · MRI: k-space · QC: metrics

ε Tolerance

Acceptance threshold

CT: PSNR ≥ 30 dB · QC: ±4 HU accuracy
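
One way to picture the six-tuple in code is a frozen record plus a structural solver interface. This is only a sketch: `CoreSpecSketch`, `Solver`, and `DescribeSolver` are hypothetical names, the fields are left untyped because the real schema is not given here, and the CT values are the examples listed above written as plain strings.

```python
from dataclasses import astuple, dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class CoreSpecSketch:
    """Hypothetical container for the six-tuple; not the PWM schema."""
    omega: Any        # Ω: domain
    equations: Any    # E: forward model, physics, noise model
    boundary: Any     # B: boundary conditions
    initial: Any      # I: initial conditions
    observables: Any  # O: measurement space
    epsilon: Any      # ε: acceptance threshold


class Solver(Protocol):
    """Anything with this method can consume a spec (structural typing)."""
    def solve(self, spec: CoreSpecSketch) -> Any: ...


class DescribeSolver:
    """Toy solver that only reports which problem it was handed."""
    def solve(self, spec: CoreSpecSketch) -> str:
        return f"{spec.equations} -> {spec.observables}"


def run(solver: Solver, spec: CoreSpecSketch) -> Any:
    """Dispatch a spec to any object that satisfies the Solver protocol."""
    return solver.solve(spec)


ct_spec = CoreSpecSketch(
    omega="image voxels",
    equations="Radon transform",
    boundary="scanner geometry",
    initial="calibration data",
    observables="sinogram",
    epsilon="PSNR >= 30 dB",
)
```

Because `Solver` is a `Protocol`, a solver does not need to inherit from anything; matching the method signature is enough, which mirrors the idea that any solver accepting the tuple can consume the problem.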

The 4-Scenario Evaluation Protocol

Every imaging algorithm is tested under four conditions that decompose performance into recoverability, noise sensitivity, and model mismatch.

I

Ideal

Perfect model, no mismatch

Measures intrinsic sampling limit

II

Assumed

Mismatch blindly applied

Measures operator mismatch penalty

III

Corrected

Calibration applied

Measures calibration effectiveness

IV

Oracle

Partial oracle reconstruction

Gap to theoretical best
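
A possible way to read the four scenarios together is as differences between their scores. The sketch below assumes a single quality score per scenario (PSNR in dB here) and assumes the three quantities are simple subtractions; neither the formulas nor the function name `decompose` come from the protocol itself, and the input numbers are invented for illustration.

```python
from typing import Dict


def decompose(psnr_db: Dict[str, float]) -> Dict[str, float]:
    """Turn per-scenario scores into three differences (assumed definitions)."""
    return {
        # How much is lost when the mismatched operator is used as-is.
        "mismatch_penalty": psnr_db["I"] - psnr_db["II"],
        # How much of that loss calibration wins back.
        "calibration_gain": psnr_db["III"] - psnr_db["II"],
        # What remains between the corrected run and the oracle run.
        "gap_to_oracle": psnr_db["IV"] - psnr_db["III"],
    }


# Invented scores, not measured results.
report = decompose({"I": 34.0, "II": 25.0, "III": 31.0, "IV": 32.5})
```

For these made-up inputs the report reads as a 9.0 dB mismatch penalty, a 6.0 dB calibration gain, and a 1.5 dB gap to the oracle.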

How a New Science Joins the Swarm

Any scientific domain can join PWM through a three-stage transformation ladder. Not every science enters at Stage C immediately — many start as explicit objects and progress over time.

A

Make it Explicit

Map the domain to CoreSpec (Ω, E, B, I, O, ε). Write a DomainProfile. The science becomes legible.

Exit: spec parses, profile registered

B

Make it Executable

Compile to OperatorGraph + ComputePlan. One method runs end-to-end and emits a RunBundle.

Exit: one RunBundle produced

C

Make it Certifiable

Pass the Judge and emit a Certificate that carries the universal trust verdict plus domain diagnostics. Thresholds are frozen for the Certified tier.

Exit: golden RunBundle at Certified tier
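
The exit criteria above can be read as a simple cumulative check. The following sketch is illustrative only: the function name `ladder_stage` and its boolean and count inputs are hypothetical stand-ins for whatever evidence the platform actually records.

```python
def ladder_stage(spec_parses: bool, profile_registered: bool,
                 run_bundles: int, golden_certified: bool) -> str:
    """Return the highest stage whose exit criterion, and all earlier ones, hold."""
    # Stage A exit: spec parses, profile registered.
    if not (spec_parses and profile_registered):
        return "none"
    # Stage B exit: at least one RunBundle produced.
    if run_bundles < 1:
        return "A"
    # Stage C exit: a golden RunBundle at Certified tier.
    if not golden_certified:
        return "B"
    return "C"
```

A domain with a registered profile and three RunBundles but no golden bundle would therefore sit at Stage B.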

Beyond Imaging: Cross-Domain Science

PWM starts with computational imaging but the protocol is domain-neutral. Any science that can be expressed as (Ω, E, B, I, O, ε) can dock to the swarm.

🧬

Imaging

172 modalities

Trusted

🔴

CT QC

Quality control

Certified-compatible

🔥

Combustion

Reactive flow

Experimental

🌎

Remote Sensing

Satellite / aerial

Experimental

Particle Physics

Detector simulation

Experimental

Two Gate Families

R1-R4: Operational Run-Gates

Checked on every run. The machinery that makes the trust ratchet work.

R1

Spec Completeness

All CoreSpec fields bound and type-valid

R2

Reproducibility

Seeds, versions, hashes recorded

R3

Metric Integrity

SHA-256 hashes match stored artifacts

R4

Budget Compliance

Runtime within declared budget

S1-S4: Scientific Validity

Verified at design time and by post-execution audit. Not runtime checks.

S1

Finite Specifiability

Problem admits a finite, type-valid description

S2

Hadamard Stability

Well-posed: existence, uniqueness, continuous dependence on data

S3

Approximability

Solution admits convergent discretization

S4

Certifiability

Computable error bounds exist

A Certified result is, by definition, a run that passes R1-R4 on a problem that satisfies S1-S4.
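
The four run-gates lend themselves to a short executable sketch. Everything here is an assumption made for illustration: the `RunRecord` fields, the key names in `CORE_FIELDS`, and the `run_gates` function are hypothetical, and R1 checks only that fields are bound (type validity is left out). The S1-S4 conditions are not modeled, since they are not runtime checks.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

CORE_FIELDS = ("omega", "E", "B", "I", "O", "epsilon")  # assumed key names


@dataclass
class RunRecord:
    """Hypothetical run record; not the PWM RunBundle schema."""
    spec: dict
    seed: Optional[int]
    versions: dict
    stored_hashes: dict  # artifact name -> recorded SHA-256 hex digest
    artifacts: dict      # artifact name -> bytes as they exist now
    runtime_s: float
    budget_s: float


def run_gates(run: RunRecord) -> Dict[str, bool]:
    """Evaluate simplified versions of the R1-R4 gates."""
    digests = {name: hashlib.sha256(blob).hexdigest()
               for name, blob in run.artifacts.items()}
    return {
        # R1: every CoreSpec field is bound.
        "R1": all(run.spec.get(key) is not None for key in CORE_FIELDS),
        # R2: seed, versions, and hashes were recorded.
        "R2": run.seed is not None and bool(run.versions) and bool(run.stored_hashes),
        # R3: recomputed digests match the recorded ones exactly.
        "R3": digests == run.stored_hashes,
        # R4: runtime stayed within the declared budget.
        "R4": run.runtime_s <= run.budget_s,
    }
```

Returning a per-gate dictionary rather than a single boolean keeps the failure reason visible, which is useful when a run passes three gates and misses one.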

Open-Core Model

The protocol is free. The convenience is paid.

Open Kernel

Everything needed to run the full evaluation locally with identical trust guarantees.

  • CoreSpec + OperatorGraph + RunBundle
  • Certificate schema + Judge (R1-R4, S1-S4)
  • 172 modality specs + 886 algorithms
  • CLI: pwm evaluate

Hosted Platform

Managed cloud with leaderboards, SpecLab, and contributor tools.

  • AI-powered SpecLab reconstruction
  • Blind challenge evaluation
  • Trust-tier promotion workflow
  • Contributor profiles + badges

Research

Papers and experiments that drive the protocol forward.

  • Universal Simulation paper
  • InverseNet (calibration)
  • CT QC Copilot
  • Finite Primitive Theorem
"PWM becomes a Dyson swarm by keeping the deep sun small, the semantic kernel versioned, the trust kernel authoritative, and every new domain docked through profiles rather than kernel rewrites."