Faster zkVM proofs via
WebAssembly and crush

Leo Alt

leo@powdrlabs.com

tl;dr: 1.5x faster proofs than RISC-V

Benchmark                Proof Time      Trace Cells
Keccak (25k iters)       1.58x faster    1.62x fewer
Reth (block 24171384)    1.45x faster    1.56x fewer
  • No precompiles used in either case
  • Also: Go support (Geth/Keeper proven end-to-end)

The RISC-V problem: register spilling

  • RISC-V has 32 registers
  • Compilers must spill registers to RAM constantly
  • In hardware: RAM is cheap, so this is fine
  • In zkVMs: memory is more expensive
  • Registers are actually implemented in memory
    (same address space in SP1, different in OpenVM)

Register spilling in practice

Keccak permutation in RISC-V assembly (snippet)

STOREW rd=28,  rs1=8,   imm=40   -- spill to stack
STOREW rd=84,  rs1=8,   imm=24   -- spill to stack
LOADW  rd=92,  rs1=8,   imm=108  -- reload
XOR    rd=48,  rs1=92,  rs2=28   -- actual work
STOREW rd=100, rs1=8,   imm=76   -- spill again
STOREW rd=104, rs1=8,   imm=52   -- spill again
XOR    rd=52,  rs1=104, rs2=68   -- actual work

Most instructions are loads/stores, not computation!
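Counting mnemonics in the snippet above makes the imbalance concrete: five of the seven instructions only move data, and just two compute.

```python
from collections import Counter

# Instruction mnemonics from the RISC-V snippet above.
snippet = ["STOREW", "STOREW", "LOADW", "XOR", "STOREW", "STOREW", "XOR"]

counts = Counter(snippet)
mem_ops = counts["STOREW"] + counts["LOADW"]   # spills and reloads
alu_ops = counts["XOR"]                        # actual computation

print(f"memory ops: {mem_ops}, ALU ops: {alu_ops}")  # memory ops: 5, ALU ops: 2
```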

Same code in crush

Keccak permutation in crush assembly (snippet)

XOR_64 r54, r29, r31
XOR_64 r54, r54, r27
XOR_64 r54, r54, r25
XOR_64 r60, r54, r23
SLL_64 r54, r60, 1
SRL_64 r58, r60, 63
OR_64  r56, r54, r58
  • Pure computation, no spilling
  • Registers r3..r84 used directly: as many as the frame needs
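The SLL_64 / SRL_64 / OR_64 tail of the snippet is the standard shift-based encoding of a 64-bit rotate-left by 1, which Keccak's theta step needs. In Python:

```python
MASK64 = (1 << 64) - 1

def rotl1(x: int) -> int:
    """64-bit rotate-left by 1, as SLL_64 + SRL_64 + OR_64 compute it."""
    return ((x << 1) & MASK64) | (x >> 63)

assert rotl1(0x8000_0000_0000_0000) == 1   # top bit wraps around to bit 0
assert rotl1(0x0123_4567_89AB_CDEF) == 0x0246_8ACF_1357_9BDE
```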

Why not just specialize RISC-V?

LLVM-based approaches

  1. Build a zkVM-aware LLVM backend (e.g. Valida)
    -> monumental effort
  2. Hack LLVM's RISC-V backend for infinite registers
    -> very invasive change on a large codebase

What about WebAssembly?

WebAssembly already gives us most of what we want:

  • Portable program format
  • Standardized runtimes
  • Preserves control flow
  • Preserves local accesses (locals + stack)
  • Bonus: 64-bit instructions

crush

Lightweight compiler from WebAssembly to a zk-friendly ISA

  • Start from WASM, not LLVM
  • Flatten WASM stack + locals into infinite registers
  • All register accesses relative to a frame pointer
  • Frame sizes known at compile time
  • Well-defined compiler passes (DAG-based pipeline)
  • Also has a write-once-memory backend
Pipeline: WASM -> BlockTree -> DAG -> Optimized DAG -> Flat Assembly
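A toy sketch of the flattening idea (hypothetical, and far simpler than the actual crush passes): track the WASM operand-stack depth at compile time and map each local and each stack slot to a fixed frame-relative register.

```python
# Toy flattening of WASM stack code into frame-relative registers.
# Locals occupy r0..r(n-1); operand-stack slot i maps to r(n+i).
# Hypothetical sketch only -- the real pipeline goes through a DAG.

def flatten(wasm_ops, n_locals):
    out = []
    depth = 0  # compile-time operand-stack depth
    for op, *args in wasm_ops:
        if op == "local.get":
            out.append(f"MOV r{n_locals + depth}, r{args[0]}")
            depth += 1
        elif op == "i64.add":
            a, b = n_locals + depth - 2, n_locals + depth - 1
            out.append(f"ADD_64 r{a}, r{a}, r{b}")
            depth -= 1
    return out

code = flatten([("local.get", 0), ("local.get", 1), ("i64.add",)], n_locals=2)
print(code)  # ['MOV r2, r0', 'MOV r3, r1', 'ADD_64 r2, r2, r3']
```

Because stack depths are statically known in valid WASM, every slot gets a fixed register and the frame size is known at compile time, exactly as the bullets above require.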

crush ISA overview

Core instructions are extremely similar to RISC-V

Category          Instructions
Arithmetic        ADD, SUB, AND, OR, XOR (32 & 64-bit)
Multiply/Divide   MUL, DIV, REM
Shifts            SLL, SRL, SRA
Memory            LOADW, STOREW
Comparison        LT, LE, GT, GE, EQ, NEQ
Constants         CONST32
Control flow      JUMP, JUMP_IF, CALL, RET

crush vs RISC-V

Key differences

  • Frame pointer (FP) register: all accesses are FP-relative
  • FP changes with the callstack (CALL, RET)
  • Infinite registers per frame
  • Read-write registers with liveness-based register allocation
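A minimal model of the FP-relative scheme (assumed semantics, not the actual OpenVM extension): every register access indexes memory at fp + r, and CALL bumps fp past the caller's statically known frame size, so the callee gets fresh registers without spilling.

```python
# Minimal model of frame-pointer-relative registers.
# Assumed semantics for illustration, not the real implementation.
mem = {}   # registers live in memory, addressed relative to fp
fp = 0

def write(r, v):
    mem[fp + r] = v

def read(r):
    return mem[fp + r]

write(3, 111)             # caller writes its r3
caller_frame_size = 10    # known at compile time

fp += caller_frame_size   # CALL: callee's r0 starts past the caller's frame
write(3, 222)             # callee's r3 is a different memory cell
fp -= caller_frame_size   # RET: restore the caller's frame

print(read(3))  # 111 -- the caller's r3 was never clobbered
```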

powdr-wasm & OpenVM extension

Implements crush as an OpenVM extension

Re-used RISC-V chips

Minimal changes, mainly FP handling

  • reuse base_alu (add, sub, and, or, xor)
  • reuse mul, divrem
  • reuse shifts
  • reuse loadstore
  • reuse comparisons

New chips

  • new call
  • new jump
  • new const

GPU tracegen also re-uses RISC-V code heavily

Keccak results

25,000 Keccak iterations - powdr-wasm vs OpenVM RISC-V

Proof Time:   1.58x faster  (76.7s vs 120.8s)
Trace Cells:  1.62x fewer   (32.5B vs 52.7B)
Segments:     1.71x fewer   (45 vs 77)

GPU: NVIDIA 4090, no precompiles

Where do the savings come from? Keccak

Per-chip breakdown: LoadStore dominates

Chip        crush    RISC-V   Savings
LoadStore   2.4B     15.0B    6.16x
ALU         20.1B    28.2B    1.41x

LoadStore savings = 62% of total cell reduction
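The 62% figure follows directly from the numbers above: LoadStore alone saved 12.6B of the 20.2B total cell reduction.

```python
# Keccak trace-cell reduction, figures from the tables above (in billions).
total_saved     = 52.7 - 32.5   # overall: RISC-V minus crush
loadstore_saved = 15.0 - 2.4    # LoadStore chip alone

share = loadstore_saved / total_saved
print(f"{share:.0%}")  # 62%
```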

Reth results

Reth verifying Ethereum mainnet block 24171384

Proof Time:   1.45x faster  (239.3s vs 348.1s)
Trace Cells:  1.56x fewer   (106.9B vs 166.5B)
Segments:     1.50x fewer   (141 vs 211)

Where do the savings come from? Reth

Same pattern: LoadStore is the dominant win

Chip        crush    RISC-V   Savings
LoadStore   13.9B    63.0B    4.53x
ALU         46.6B    55.4B    1.19x

LoadStore savings = 82% of total cell reduction

Even on a full Ethereum execution client, eliminating register spilling yields massive prover savings.

Go support / Keeper (Geth)

WASM is a universal compilation target

  • Go compiles to WASM via the wasip1 (WASI) target
  • Keeper = Geth-based guest for Ethereum block verification
Proof Time:   827.6s
Trace Cells:  352.0B
Segments:     467

Hoodi block 1151683, no precompiles

Implemented WASI syscalls

args_sizes_get, args_get, environ_sizes_get, environ_get, fd_write, fd_read, fd_close, fd_fdstat_get, fd_fdstat_set_flags, fd_prestat_get, clock_time_get, random_get, proc_exit, poll_oneoff, fd_seek, fd_sync, sched_yield, path_open

Autoprecompiles

  • Currently being integrated
  • WASM preserves high-level information that RISC-V loses:
    • Register lifetime information
    • Callstack control flow structure
  • This info can propagate all the way to autoprecompile candidates

Future work

  • Autoprecompiles (in progress)
  • zkVM circuit optimizations
  • crush compiler optimizations
  • crush formal verification
  • OpenVM v2 support
  • Hints (non-deterministic advice to reduce guest cycle counts)
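The hint pattern is "verify, don't compute": the host supplies a candidate answer as untrusted non-deterministic advice, and the guest only checks it, which costs far fewer cycles than computing it. A hypothetical sketch (read_hint is an assumed API, not powdr's):

```python
import math

def read_hint(n):
    # Hypothetical host-side oracle: in a real zkVM the prover
    # injects this value as untrusted non-deterministic advice.
    return math.isqrt(n)

def guest_isqrt(n):
    r = read_hint(n)                    # untrusted advice from the host
    assert r * r <= n < (r + 1) ** 2    # cheap check replaces the search
    return r

print(guest_isqrt(10**18))  # 1000000000
```

The soundness burden moves entirely into the check: a malicious prover can supply any r, but only the correct root passes the assertion inside the proven trace.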

WASM + custom ISA/compiler =
clear advantages over RISC-V for zkVMs

  • 1.5x faster proofs from eliminating register spilling alone
  • LoadStore chip savings account for 60-80% of the improvement
  • Universal language support (Rust, Go, Swift, ...)
  • Further gains expected from autoprecompiles and compiler optimizations

github.com/powdr-labs/powdr-wasm
github.com/powdr-labs/crush