Faster zkVM proofs via WebAssembly and crush
Leo Alt
leo@powdrlabs.com
tl;dr: 1.5x faster proofs than RISC-V
| Benchmark | Proof Time | Trace Cells |
|---|---|---|
| Keccak (25k iters) | 1.58x faster | 1.62x fewer |
| Reth (block 24171384) | 1.45x faster | 1.56x fewer |
- No precompiles used in either case
- Also: Go support (Geth/Keeper proven end-to-end)
The RISC-V problem: register spilling
- RISC-V has 32 registers
- Compilers must spill registers to RAM constantly
- In hardware: RAM is cheap, so this is fine
- In zkVMs: memory is more expensive
- Registers are themselves implemented in memory
(in the same address space as RAM in SP1; in a separate address space in OpenVM)
Register spilling in practice
Keccak permutation in RISC-V assembly (snippet)
STOREW rd=28, rs1=8, imm=40
STOREW rd=84, rs1=8, imm=24
LOADW rd=92, rs1=8, imm=108
XOR rd=48, rs1=92, rs2=28
STOREW rd=100, rs1=8, imm=76
STOREW rd=104, rs1=8, imm=52
XOR rd=52, rs1=104, rs2=68
Most instructions are loads/stores, not computation!
Same code in crush
Keccak permutation in crush assembly (snippet)
XOR_64 r54, r29, r31
XOR_64 r54, r54, r27
XOR_64 r54, r54, r25
XOR_64 r60, r54, r23
SLL_64 r54, r60, 1
SRL_64 r58, r60, 63
OR_64 r56, r54, r58
- Pure computation, no spilling
- Registers r3..r84 used directly: as many as the frame needs
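The crush snippet above is the Keccak theta step: an XOR chain over five lanes, then a rotate-left-by-1 built from SLL_64 / SRL_64 / OR_64. A minimal sketch in Rust (helper names are mine, not from the crush or Keccak codebases):

```rust
// The SLL_64 / SRL_64 / OR_64 triple in the crush snippet is exactly
// a 64-bit rotate-left.
fn rotl64(v: u64, n: u32) -> u64 {
    (v << n) | (v >> (64 - n))
}

fn main() {
    // Column parity C[x]: the XOR_64 chain over five lanes.
    let lanes = [0x1u64, 0x2, 0x4, 0x8, 0x10];
    let c = lanes.iter().fold(0u64, |acc, &l| acc ^ l);
    // Theta then computes D[x] = C[x-1] ^ rotl(C[x+1], 1);
    // the rotate is the shift/or pattern above.
    let d = c ^ rotl64(c, 1);
    println!("{c:#x} {d:#x}");
}
```

Every intermediate lives in its own register, so nothing ever round-trips through RAM.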
Why not just specialize RISC-V?
LLVM-based approaches
- Build a zkVM-aware LLVM backend (e.g. Valida)
-> monumental effort
- Hack LLVM's RISC-V backend for infinite registers
-> very invasive change on a large codebase
What about WebAssembly?
WebAssembly already gives us most of what we want:
- Portable program format
- Standardized runtimes
- Preserves control flow
- Preserves local accesses (locals + stack)
- Bonus: 64-bit instructions
crush
Lightweight compiler from WebAssembly to a zk-friendly ISA
- Start from WASM, not LLVM
- Flatten WASM stack + locals into infinite registers
- All register accesses relative to a frame pointer
- Frame sizes known at compile time
- Well-defined compiler passes (DAG-based pipeline)
- Also has a write-once-memory backend
WASM → BlockTree → DAG → Optimized DAG → Flat Assembly
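To make the flattening idea concrete, here is a toy sketch of lowering a WASM-style operand stack and locals into registers (an assumed scheme: COPY_64 is a hypothetical instruction, and the real crush passes run over a DAG, not a linear op list):

```rust
enum Op { LocalGet(usize), LocalSet(usize), I64Add }

fn flatten(ops: &[Op], n_locals: usize) -> Vec<String> {
    let mut stack: Vec<usize> = Vec::new(); // operand-stack slots become registers
    let mut next = n_locals;                // fresh registers after the locals
    let mut out = Vec::new();
    for op in ops {
        match op {
            // Locals are already registers: a local.get is just a stack entry.
            Op::LocalGet(i) => stack.push(*i),
            Op::LocalSet(i) => {
                let s = stack.pop().unwrap();
                out.push(format!("COPY_64 r{i}, r{s}"));
            }
            Op::I64Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                let d = next;
                next += 1;
                out.push(format!("ADD_64 r{d}, r{a}, r{b}"));
                stack.push(d);
            }
        }
    }
    out
}

fn main() {
    // local.get 0; local.get 1; i64.add; local.set 2
    let asm = flatten(
        &[Op::LocalGet(0), Op::LocalGet(1), Op::I64Add, Op::LocalSet(2)],
        3,
    );
    println!("{asm:?}");
}
```

Because frame sizes are known at compile time, `next` never needs to spill: the frame simply grows to fit.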
crush ISA overview
Core instructions are extremely similar to RISC-V
| Category | Instructions |
|---|---|
| Arithmetic | ADD, SUB, AND, OR, XOR (32 & 64-bit) |
| Multiply/Divide | MUL, DIV, REM |
| Shifts | SLL, SRL, SRA |
| Memory | LOADW, STOREW |
| Comparison | LT, LE, GT, GE, EQ, NEQ |
| Constants | CONST32 |
| Control flow | JUMP, JUMP_IF, CALL, RET |
crush vs RISC-V
Key differences
- Frame pointer (FP) register: all accesses are FP-relative
- FP advances on CALL and is restored on RET
- Infinite registers per frame
- Read-write registers with liveness-based register allocation
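A minimal sketch of the FP-relative access model (assumed semantics for illustration, not the actual powdr-wasm implementation):

```rust
struct Vm {
    regs: Vec<u64>, // conceptually unbounded register file
    fp: usize,      // frame pointer; every access is regs[fp + r]
}

impl Vm {
    fn get(&self, r: usize) -> u64 {
        self.regs.get(self.fp + r).copied().unwrap_or(0)
    }
    fn set(&mut self, r: usize, v: u64) {
        let i = self.fp + r;
        if i >= self.regs.len() {
            self.regs.resize(i + 1, 0);
        }
        self.regs[i] = v;
    }
    // CALL advances FP past the caller's frame (size known at compile time);
    // RET restores it. No registers are saved or spilled.
    fn call(&mut self, caller_frame: usize) { self.fp += caller_frame; }
    fn ret(&mut self, caller_frame: usize) { self.fp -= caller_frame; }
}

fn main() {
    let mut vm = Vm { regs: Vec::new(), fp: 0 };
    vm.set(3, 7);  // caller writes its r3
    vm.call(16);   // callee's r3 is a different physical slot
    vm.set(3, 99);
    vm.ret(16);
    println!("{}", vm.get(3)); // caller's r3 is untouched
}
```

Since each frame sees a fresh register window, calling conventions need no caller/callee-saved registers at all.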
powdr-wasm & OpenVM extension
Implements crush as an OpenVM extension
Re-used RISC-V chips
Minimal changes, mainly FP handling
- reuse base_alu (add, sub, and, or, xor)
- reuse mul, divrem
- reuse shifts
- reuse loadstore
- reuse comparisons
New chips
- new call
- new jump
- new const
GPU tracegen also re-uses RISC-V code heavily
Keccak results
25,000 Keccak iterations - powdr-wasm vs OpenVM RISC-V
Proof Time
↓1.58x
76.7s vs 120.8s
Trace Cells
↓1.62x
32.5B vs 52.7B
GPU: NVIDIA RTX 4090, no precompiles
Where do the savings come from? Keccak
Per-chip breakdown: LoadStore dominates
LoadStore savings = 62% of total cell reduction
Reth results
Reth verifying Ethereum mainnet block 24171384
Proof Time
↓1.45x
239.3s vs 348.1s
Trace Cells
↓1.56x
106.9B vs 166.5B
Segments
↓1.50x
141 vs 211
Where do the savings come from? Reth
Same pattern: LoadStore is the dominant win
LoadStore savings = 82% of total cell reduction
Even on a full Ethereum execution client, eliminating register spilling yields massive prover savings.
Go support / Keeper (Geth)
WASM is a universal compilation target
- Go compiles to WASM targeting WASI
- Keeper = Geth-based guest for Ethereum block verification
Hoodi block 1151683, no precompiles
Implemented WASI syscalls
args_sizes_get, args_get, environ_sizes_get, environ_get,
fd_write, fd_read, fd_close, fd_fdstat_get,
fd_fdstat_set_flags, fd_prestat_get, clock_time_get, random_get,
proc_exit, poll_oneoff, fd_seek, fd_sync,
sched_yield, path_open
Autoprecompiles
- Currently being integrated
- WASM preserves high-level information that RISC-V loses:
- Register lifetime information
- Callstack control flow structure
- This info can propagate all the way to autoprecompile candidates
Future work
- Autoprecompiles (in progress)
- zkVM circuit optimizations
- crush compiler optimizations
- crush formal verification
- OpenVM v2 support
- Hints (non-deterministic advice to reduce guest cycle counts)
WASM + custom ISA/compiler =
clear advantages over RISC-V for zkVMs
- 1.5x faster proofs from eliminating register spilling alone
- LoadStore chip savings account for 60-80% of the improvement
- Universal language support (Rust, Go, Swift, ...)
- Further gains expected from autoprecompiles and compiler optimizations
github.com/powdr-labs/powdr-wasm
github.com/powdr-labs/crush