Parsing Structured Bytes in Circuit
Signed documents — TPM2 TPMS_ATTEST structures, TDX TDQUOTE bodies, CBOR-encoded mdoc credentials, JWT payloads — almost always have internal structure: tagged fields, length prefixes, nested records. Parsing them inside an arithmetic circuit is harder than parsing them in C because the circuit’s gate count is fixed at compile time. There is no while (more_bytes) { ... }; every loop is unrolled to a compile-time bound. This page covers the three primitives Longfellow provides for structured-byte parsing and the pattern for composing them.
The core challenge
A fixed-topology circuit cannot have a runtime-variable number of gates, but it can have gates whose inputs are selected at witness time. The game is:
- The prover witnesses the offsets and lengths of every field.
- The circuit has gates for the maximum possible field count / length, wired to witness-selected byte ranges.
- “Unused” iterations get gated to no-op by a witness flag, without reducing the gate count.
Primitive 1: byte-range selection (Routing::shift)
circuits/logic/routing.h provides the general-purpose “extract a fixed-length window at a variable offset” primitive. Given:
- an
n-byte input arrayA, - a
log n-bit witnessamount, - a fixed window length
k,
Routing<Logic>::shift(logn, amount, k, B, n, A, default, unroll) writes B[i] = A[i + \text{amount}] for i \in [0, k) — or default when the index is past the end of A. It runs in \log n stages, each of which is a single bit-controlled mux over the entire array. Circuit cost is O(n \log n) gates; depth is O(\log n) with the unroll parameter controlling the depth/gate-count trade-off (unroll = 3 is the typical value used in practice).
Use this whenever the circuit needs to pick a contiguous range whose start is not known at compile time.
Primitive 2: fixed-length comparison (Memcmp)
circuits/logic/memcmp.h compares two n-byte sequences bit-by-bit.
Memcmp<Logic>::lt(n, A, B)— true iffA < Blexicographically.Memcmp<Logic>::leq(n, A, B)— same, ≤.
Both arrange the bytes MSB-first into 8n-bit vectors and dispatch to Logic::lt / Logic::leq. The cost is O(n) gates and O(\log n) depth.
The three recurring uses:
- Literal equality check. Compare an extracted range against a compile-time byte literal: is byte 10..42 the string
"2026-04-26"?Memcmp::leqtwice (≤and≥) gives==. - Range check. Prove a timestamp is ≥ public
now:Memcmp::leq(8, now_bytes, ts_bytes). - Length-prefix validation. Prove a declared length field equals the actual field length — see below.
Primitive 3: bounded loops gated by the witness
Fixed-topology circuits never truly “loop.” Instead, they unroll the maximum iteration count at compile time and let the witness tell the circuit which iterations actually ran. The pattern from circuits/mdoc/mdoc_hash.h:
for (size_t k = 0; k < kMaxAttributes; ++k) {
// Witness-supplied: offset_k, length_k, active_k (1-bit flag).
auto range = r_.shift(logn, offset_k_bits, kMaxAttrBytes,
attr_bytes, doc_len, doc, zero_byte, /*unroll=*/3);
// Parse range as a CBOR-tagged field (fixed-shape: tag + length + value).
assert_cbor_shape(range, length_k);
// Gate the assertion on active_k: inactive iterations become tautologies.
lc.assert_implies(active_k, predicate_holds_on(range));
}Two details matter:
- Unused iterations stay in the circuit. They burn gate count; they do not break the topology. The only way to reduce cost is to pick a tighter
kMaxAttributes. - The witness decides
active_k. It is not a public input. The verifier never learns how many attributes were actually parsed.
Length-prefix validation
Length prefixes are the hardest part: you want to prove “the declared length field equals the actual number of bytes consumed by this field,” without revealing the length. The pattern threads Routing::shift and assert_sum from circuits/logic/bit_adder.h together:
- The prover witnesses
offset_k(start) andlength_k(length) for each field. - The circuit asserts
offset_{k+1} = offset_k + length_kfor eachk— an additive constraint the verifier checks in the base field, forcing the offsets to describe a contiguous partition of the document. - The circuit extracts the length byte from the document via
Routing::shiftatoffset_k, decodes the tag bits, and asserts the decoded length equalslength_k— the tag says “I amlength_kbytes long” and the partition constraint says “the next field startslength_kbytes from here.”
The upshot: you never need to prove length equality directly. Instead you prove that the offsets are consistent, and the length check falls out for free.
Format-agnostic, not format-free
Nothing on this page is CBOR-specific — the primitives are the same for TLV, TLV-with-bit-packing, or custom binary schemas. What changes between formats is:
- The tag decoder: different formats use different bit patterns in the tag byte. Write a gadget that takes the tag byte and emits the field count, element type, and inline-vs-out-of-line flag. This is the piece of
circuits/mdoc/that does not generalise. - The composition: CBOR has maps with unordered fields, TLV is strictly ordered. Unordered fields need a witness permutation plus an equality check that every field is visited exactly once — again, see
mdoc_hash.hfor one worked realisation.
Pitfalls to watch for
- Forgetting to gate inactive iterations. Without the
active_kgate, a tampered witness could satisfy inactive iterations with garbage and you would never notice. Alwaysassert_implies(active_k, ...). - Undersizing
kMaxAttributes. A document with more fields than the compile-time bound is unprovable. Pick a comfortable ceiling; it costs gates, not security. - Using
Routing::shiftinside a per-byte inner loop.shiftisO(n \log n)per call; invoking it inside a byte-level loop explodes the gate count. Hoist the shift out and index into its output.
Related
- Verifying a Signed Document — where the predicate fits into the attestation pipeline.
- Circuits / Logic (Reference) — the primitive gate library.
- mdoc Case Study — a full CBOR parser built on these primitives.