Faster zkVM proofs via
WebAssembly and crush

Leo Alt

leo@powdrlabs.com

tl;dr: 1.5x faster proofs than RISC-V

Benchmark                Proof Time      Trace Cells
Keccak (25k iters)       1.58x faster    1.62x fewer
Reth (block 24171384)    1.45x faster    1.56x fewer
  • No precompiles used in either case
  • Also: Go support (Geth/Keeper proven end-to-end)

The RISC-V problem: register spilling

  • RISC-V has 32 registers
  • Compilers must spill registers to RAM constantly
  • In hardware: RAM is cheap, so this is fine
  • In zkVMs: memory is more expensive
  • Registers are actually implemented in memory
    (same address space in SP1, different in OpenVM)

Register spilling in practice

Keccak permutation in RISC-V assembly (snippet)

STOREW rd=28,  rs1=8,   imm=40   -- spill to stack
STOREW rd=84,  rs1=8,   imm=24   -- spill to stack
LOADW  rd=92,  rs1=8,   imm=108  -- reload
XOR    rd=48,  rs1=92,  rs2=28   -- actual work
STOREW rd=100, rs1=8,   imm=76   -- spill again
STOREW rd=104, rs1=8,   imm=52   -- spill again
XOR    rd=52,  rs1=104, rs2=68   -- actual work

Most instructions are loads/stores, not computation!
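Counting mnemonics in the snippet above makes the imbalance concrete: five of the seven instructions only move data, and just two compute.

```python
from collections import Counter

# Instruction mnemonics from the RISC-V snippet above.
snippet = ["STOREW", "STOREW", "LOADW", "XOR", "STOREW", "STOREW", "XOR"]

counts = Counter(snippet)
mem_ops = counts["STOREW"] + counts["LOADW"]   # spills and reloads
alu_ops = counts["XOR"]                        # actual computation

print(f"memory ops: {mem_ops}, ALU ops: {alu_ops}")  # memory ops: 5, ALU ops: 2
```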

Same code in crush

Keccak permutation in crush assembly (snippet)

XOR_64 r54, r29, r31
XOR_64 r54, r54, r27
XOR_64 r54, r54, r25
XOR_64 r60, r54, r23
SLL_64 r54, r60, 1
SRL_64 r58, r60, 63
OR_64  r56, r54, r58
  • Pure computation, no spilling
  • Registers r3..r84 used directly: as many as the frame needs
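The SLL_64 / SRL_64 / OR_64 tail of the snippet is the standard shift-based encoding of a 64-bit rotate-left by 1, which Keccak's theta step needs. In Python:

```python
MASK64 = (1 << 64) - 1

def rotl1(x: int) -> int:
    """64-bit rotate-left by 1, as SLL_64 + SRL_64 + OR_64 compute it."""
    return ((x << 1) & MASK64) | (x >> 63)

assert rotl1(0x8000_0000_0000_0000) == 1   # top bit wraps around to bit 0
assert rotl1(0x0123_4567_89AB_CDEF) == 0x0246_8ACF_1357_9BDE
```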

Why not just specialize RISC-V?

LLVM-based approaches

  1. Build a zkVM-aware LLVM backend (e.g. Valida)
    -> monumental effort
  2. Hack LLVM's RISC-V backend for infinite registers
    -> very invasive change on a large codebase

What about WebAssembly?

WebAssembly already gives us most of what we want:

  • Portable program format
  • Standardized runtimes
  • Preserves control flow
  • Preserves local accesses (locals + stack)
  • Bonus: 64-bit instructions

crush

Lightweight compiler from WebAssembly to a zk-friendly ISA

  • Start from WASM, not LLVM
  • Flatten WASM stack + locals into infinite registers
  • All register accesses relative to a frame pointer
  • Frame sizes known at compile time
  • Well-defined compiler passes (DAG-based pipeline)
  • Also has a write-once-memory backend
Pipeline: WASM -> BlockTree -> DAG -> Optimized DAG -> Flat Assembly
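A toy sketch of the flattening idea (hypothetical, and far simpler than the actual crush passes): track the WASM operand-stack depth at compile time and map each local and each stack slot to a fixed frame-relative register.

```python
# Toy flattening of WASM stack code into frame-relative registers.
# Locals occupy r0..r(n-1); operand-stack slot i maps to r(n+i).
# Hypothetical sketch only -- the real pipeline goes through a DAG.

def flatten(wasm_ops, n_locals):
    out = []
    depth = 0  # compile-time operand-stack depth
    for op, *args in wasm_ops:
        if op == "local.get":
            out.append(f"MOV r{n_locals + depth}, r{args[0]}")
            depth += 1
        elif op == "i64.add":
            a, b = n_locals + depth - 2, n_locals + depth - 1
            out.append(f"ADD_64 r{a}, r{a}, r{b}")
            depth -= 1
    return out

code = flatten([("local.get", 0), ("local.get", 1), ("i64.add",)], n_locals=2)
print(code)  # ['MOV r2, r0', 'MOV r3, r1', 'ADD_64 r2, r2, r3']
```

Because stack depths are statically known in valid WASM, every slot gets a fixed register and the frame size is known at compile time, exactly as the bullets above require.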

crush ISA overview

Core instructions are extremely similar to RISC-V

Category          Instructions
Arithmetic        ADD, SUB, AND, OR, XOR (32 & 64-bit)
Multiply/Divide   MUL, DIV, REM
Shifts            SLL, SRL, SRA
Memory            LOADW, STOREW
Comparison        LT, LE, GT, GE, EQ, NEQ
Constants         CONST32
Control flow      JUMP, JUMP_IF, CALL, RET

crush vs RISC-V

Key differences

  • Frame pointer (FP) register: all accesses are FP-relative
  • FP changes with the callstack (CALL, RET)
  • Infinite registers per frame
  • Read-write registers with liveness-based register allocation
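A minimal model of the FP-relative scheme (assumed semantics, not the actual OpenVM extension): every register access indexes memory at fp + r, and CALL bumps fp past the caller's statically known frame size, so the callee gets fresh registers without spilling.

```python
# Minimal model of frame-pointer-relative registers.
# Assumed semantics for illustration, not the real implementation.
mem = {}   # registers live in memory, addressed relative to fp
fp = 0

def write(r, v):
    mem[fp + r] = v

def read(r):
    return mem[fp + r]

write(3, 111)             # caller writes its r3
caller_frame_size = 10    # known at compile time

fp += caller_frame_size   # CALL: callee's r0 starts past the caller's frame
write(3, 222)             # callee's r3 is a different memory cell
fp -= caller_frame_size   # RET: restore the caller's frame

print(read(3))  # 111 -- the caller's r3 was never clobbered
```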

powdr-wasm & OpenVM extension

Implements crush as an OpenVM extension

Re-used RISC-V chips

Minimal changes, mainly FP handling

  • reuse base_alu (add, sub, and, or, xor)
  • reuse mul, divrem
  • reuse shifts
  • reuse loadstore
  • reuse comparisons

New chips

  • new call
  • new jump
  • new const

GPU tracegen also re-uses RISC-V code heavily

Keccak results

25,000 Keccak iterations - powdr-wasm vs OpenVM RISC-V

Proof Time:   1.58x faster  (76.7s vs 120.8s)
Trace Cells:  1.62x fewer   (32.5B vs 52.7B)
Segments:     1.71x fewer   (45 vs 77)

GPU: NVIDIA 4090, no precompiles

Where do the savings come from? Keccak

Per-chip breakdown: LoadStore dominates

Chip        crush    RISC-V   Savings
LoadStore   2.4B     15.0B    6.16x
ALU         20.1B    28.2B    1.41x

LoadStore savings = 62% of total cell reduction
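The 62% figure follows directly from the numbers above: LoadStore alone saved 12.6B of the 20.2B total cell reduction.

```python
# Keccak trace-cell reduction, figures from the tables above (in billions).
total_saved     = 52.7 - 32.5   # overall: RISC-V minus crush
loadstore_saved = 15.0 - 2.4    # LoadStore chip alone

share = loadstore_saved / total_saved
print(f"{share:.0%}")  # 62%
```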

Reth results

Reth verifying Ethereum mainnet block 24171384

Proof Time:   1.45x faster  (239.3s vs 348.1s)
Trace Cells:  1.56x fewer   (106.9B vs 166.5B)
Segments:     1.50x fewer   (141 vs 211)

Where do the savings come from? Reth

Same pattern: LoadStore is the dominant win

Chip        crush    RISC-V   Savings
LoadStore   13.9B    63.0B    4.53x
ALU         46.6B    55.4B    1.19x

LoadStore savings = 82% of total cell reduction

Even on a full Ethereum execution client, eliminating register spilling yields massive prover savings.

Go support / Keeper (Geth)

WASM is a universal compilation target

  • Go compiles to WASM via the wasip1 (WASI) target
  • Keeper = Geth-based guest for Ethereum block verification
Proof Time:   827.6s
Trace Cells:  352.0B
Segments:     467

Hoodi block 1151683, no precompiles

Implemented WASI syscalls

args_sizes_get, args_get, environ_sizes_get, environ_get, fd_write, fd_read, fd_close, fd_fdstat_get, fd_fdstat_set_flags, fd_prestat_get, clock_time_get, random_get, proc_exit, poll_oneoff, fd_seek, fd_sync, sched_yield, path_open

Autoprecompiles

  • Currently being integrated
  • WASM preserves high-level information that RISC-V loses:
    • Register lifetime information
    • Callstack control flow structure
  • This info can propagate all the way to autoprecompile candidates

Future work

  • Autoprecompiles (in progress)
  • zkVM circuit optimizations
  • crush compiler optimizations
  • crush formal verification
  • OpenVM v2 support
  • Hints (non-deterministic advice to reduce guest cycle counts)
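The hint pattern is "verify, don't compute": the host supplies a candidate answer as untrusted non-deterministic advice, and the guest only checks it, which costs far fewer cycles than computing it. A hypothetical sketch (read_hint is an assumed API, not powdr's):

```python
import math

def read_hint(n):
    # Hypothetical host-side oracle: in a real zkVM the prover
    # injects this value as untrusted non-deterministic advice.
    return math.isqrt(n)

def guest_isqrt(n):
    r = read_hint(n)                    # untrusted advice from the host
    assert r * r <= n < (r + 1) ** 2    # cheap check replaces the search
    return r

print(guest_isqrt(10**18))  # 1000000000
```

The soundness burden moves entirely into the check: a malicious prover can supply any r, but only the correct root passes the assertion inside the proven trace.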

WASM + custom ISA/compiler =
clear advantages over RISC-V for zkVMs

  • 1.5x faster proofs from eliminating register spilling alone
  • LoadStore chip savings account for 60-80% of the improvement
  • Universal language support (Rust, Go, Swift, ...)
  • Further gains expected from autoprecompiles and compiler optimizations

github.com/powdr-labs/powdr-wasm
github.com/powdr-labs/crush