Skip to Content
⚠️ This documentation is AI-generated, for personal use only, and is not supported or endorsed by Google.
GuidesParsing Structured Bytes in Circuit

Parsing Structured Bytes in Circuit

Signed documents — TPM2 TPMS_ATTEST structures, TDX TDQUOTE bodies, CBOR-encoded mdoc credentials, JWT payloads — almost always have internal structure: tagged fields, length prefixes, nested records. Parsing them inside an arithmetic circuit is harder than parsing them in C because the circuit’s gate count is fixed at compile time. There is no while (more_bytes) { ... }; every loop is unrolled to a compile-time bound. This page covers the three primitives Longfellow provides for structured-byte parsing and the pattern for composing them.

The core challenge

A fixed-topology circuit cannot have a runtime-variable number of gates, but it can have gates whose inputs are selected at witness time. The game is:

  1. The prover witnesses the offsets and lengths of every field.
  2. The circuit has gates for the maximum possible field count / length, wired to witness-selected byte ranges.
  3. “Unused” iterations get gated to no-op by a witness flag, without reducing the gate count.

Primitive 1: byte-range selection (Routing::shift)

circuits/logic/routing.h provides the general-purpose “extract a fixed-length window at a variable offset” primitive. Given:

  • an n-byte input array A,
  • a log n-bit witness amount,
  • a fixed window length k,

Routing<Logic>::shift(logn, amount, k, B, n, A, default, unroll) writes B[i] = A[i + \text{amount}] for i \in [0, k) — or default when the index is past the end of A. It runs in \log n stages, each of which is a single bit-controlled mux over the entire array. Circuit cost is O(n \log n) gates; depth is O(\log n) with the unroll parameter controlling the depth/gate-count trade-off (unroll = 3 is the typical value used in practice).

Use this whenever the circuit needs to pick a contiguous range whose start is not known at compile time.

Primitive 2: fixed-length comparison (Memcmp)

circuits/logic/memcmp.h compares two n-byte sequences bit-by-bit.

  • Memcmp<Logic>::lt(n, A, B) — true iff A < B lexicographically.
  • Memcmp<Logic>::leq(n, A, B) — same, ≤.

Both arrange the bytes MSB-first into 8n-bit vectors and dispatch to Logic::lt / Logic::leq. The cost is O(n) gates and O(\log n) depth.

The three recurring uses:

  • Literal equality check. Compare an extracted range against a compile-time byte literal: is byte 10..42 the string "2026-04-26"? Memcmp::leq twice ( and ) gives ==.
  • Range check. Prove a timestamp is ≥ public now: Memcmp::leq(8, now_bytes, ts_bytes).
  • Length-prefix validation. Prove a declared length field equals the actual field length — see below.

Primitive 3: bounded loops gated by the witness

Fixed-topology circuits never truly “loop.” Instead, they unroll the maximum iteration count at compile time and let the witness tell the circuit which iterations actually ran. The pattern from circuits/mdoc/mdoc_hash.h:

for (size_t k = 0; k < kMaxAttributes; ++k) { // Witness-supplied: offset_k, length_k, active_k (1-bit flag). auto range = r_.shift(logn, offset_k_bits, kMaxAttrBytes, attr_bytes, doc_len, doc, zero_byte, /*unroll=*/3); // Parse range as a CBOR-tagged field (fixed-shape: tag + length + value). assert_cbor_shape(range, length_k); // Gate the assertion on active_k: inactive iterations become tautologies. lc.assert_implies(active_k, predicate_holds_on(range)); }

Two details matter:

  • Unused iterations stay in the circuit. They burn gate count; they do not break the topology. The only way to reduce cost is to pick a tighter kMaxAttributes.
  • The witness decides active_k. It is not a public input. The verifier never learns how many attributes were actually parsed.

Length-prefix validation

Length prefixes are the hardest part: you want to prove “the declared length field equals the actual number of bytes consumed by this field,” without revealing the length. The pattern threads Routing::shift and assert_sum from circuits/logic/bit_adder.h together:

  1. The prover witnesses offset_k (start) and length_k (length) for each field.
  2. The circuit asserts offset_{k+1} = offset_k + length_k for each k — an additive constraint the verifier checks in the base field, forcing the offsets to describe a contiguous partition of the document.
  3. The circuit extracts the length byte from the document via Routing::shift at offset_k, decodes the tag bits, and asserts the decoded length equals length_k — the tag says “I am length_k bytes long” and the partition constraint says “the next field starts length_k bytes from here.”

The upshot: you never need to prove length equality directly. Instead you prove that the offsets are consistent, and the length check falls out for free.

Format-agnostic, not format-free

Nothing on this page is CBOR-specific — the primitives are the same for TLV, TLV-with-bit-packing, or custom binary schemas. What changes between formats is:

  • The tag decoder: different formats use different bit patterns in the tag byte. Write a gadget that takes the tag byte and emits the field count, element type, and inline-vs-out-of-line flag. This is the piece of circuits/mdoc/ that does not generalise.
  • The composition: CBOR has maps with unordered fields, TLV is strictly ordered. Unordered fields need a witness permutation plus an equality check that every field is visited exactly once — again, see mdoc_hash.h for one worked realisation.

Pitfalls to watch for

  • Forgetting to gate inactive iterations. Without the active_k gate, a tampered witness could satisfy inactive iterations with garbage and you would never notice. Always assert_implies(active_k, ...).
  • Undersizing kMaxAttributes. A document with more fields than the compile-time bound is unprovable. Pick a comfortable ceiling; it costs gates, not security.
  • Using Routing::shift inside a per-byte inner loop. shift is O(n \log n) per call; invoking it inside a byte-level loop explodes the gate count. Hoist the shift out and index into its output.
Last updated on