SHA-256
FlatSHA256Circuit is Longfellow’s in-circuit SHA-256 gadget. “Flat” means the 64-round compression function is fully unrolled in parallel rather than folded behind a loop: the prover supplies each round’s intermediate message-schedule word w, and the rolling state values a and e, as witness, and the circuit just verifies per-round arithmetic consistency. In an attestation pipeline you use it to bind the bytes of a TPM2 quote or TDX report to a digest inside the proof, then feed that digest into the ECDSA gadget.
When to reach for it
- Proving that a specific attestation-bearing message hashes to a known digest (TPM2 quote, TDX report, signed credential body).
- Composing with the ECDSA gadget: hash-then-sign is the dominant attestation shape, and
FlatSHA256Circuit’s digest output feeds directly in. - Verifying a commitment to a pre-hashed blob without revealing the blob — the circuit ties message bytes to a public digest without publishing anything else.
Design overview
The gadget lives in lib/circuits/sha/flatsha256_circuit.h and is parameterised as FlatSHA256Circuit<Logic, BitPlucker>. You hand it a Logic<Field, Backend> instance and a BitPlucker<Logic, kShaPluckerSize> encoder. It works over either a prime field (Fp256Base, for the standard ZK path paired with P-256 ECDSA) or a binary field (GF2_128, for the Ligero-over-GF path); pick whichever matches the surrounding circuit — there is no conversion cost to keeping everything in one field.
The single most important thing to internalise is witness duality. The gadget does not compute SHA-256 in-circuit. Outside the proof, the prover runs a plain-C SHA-256 via FlatSHA256Witness::transform_and_witness_block(...), which emits every per-round intermediate (outw[48], oute[64], outa[64]) plus the next chaining state h1[8]. The circuit then checks that each round’s e and a agree with the standard FIPS 180-4 recurrence given the previous state and the claimed w. This is what makes “flat” cheap in a Longfellow sumcheck: there are no nested loops and no long sequential dependency chain, just 64 independent local assertions per block.
Padding is also host-side. The helper FlatSHA256Witness::transform_and_witness_message(...) does the standard SHA-256 padding — append 0x80, zero-pad so the length lands at 56 mod 64, then append the 64-bit big-endian bit length — and writes the padded bytes into the input buffer. The circuit does not pad; instead, assert_zero_padding enforces that any blocks past the active count nb are all-zero, so a prover cannot smuggle extra data into the tail.
The kShaPluckerSize = 2 packing knob (from flatsha256_io.h) trades input-wire count for depth: two bits of each 32-bit witness word share a field element, which roughly halves the number of input wires the sumcheck has to commit to, at the cost of a deeper unpack in the circuit. It is a wires-vs-depth tradeoff, not a free optimisation; default to 2 and only tune if you are depth-bound.
API surface
Single-block verification
void assert_transform_block(const v32 in[16], const v32 H0[8],
const v32 outw[48], const v32 oute[64],
const v32 outa[64], const v32 H1[8]) const;Verifies one 512-bit block transform from chaining state H0 to H1, given the per-round witness arrays. There are also two packed overloads that accept packed_v32 witness arrays produced by the BitPlucker.
Multi-block message
void assert_message(size_t max, const v8& nb, const v8 in[/* 64*max */],
const BlockWitness bw[/* max */]) const;
void assert_message_hash(size_t max, const v8& nb, const v8 in[/* 64*max */],
const v256& target, const BlockWitness bw[/* max */]) const;max is a compile-time upper bound on the number of blocks (set it to whatever covers your largest anticipated attestation); nb is the actual block count, which can be a witnessed value up to max. The first overload chains block transforms and enforces zero-padding past nb; assert_message_hash additionally asserts equality of the final chaining state with a public target digest.
Witness
There are two BlockWitness structs — one in-circuit and one host-side — and they are intentionally different types.
FlatSHA256Circuit::BlockWitness (in flatsha256_circuit.h) holds packed field elements. Each 32-bit word is stored as a packed_v32 (an std::array<EltW, kNv32Elts>), where kNv32Elts = ceil(32 / LOGN) groups of LOGN bits share one field-element wire:
struct BlockWitness { // FlatSHA256Circuit::BlockWitness
packed_v32 outw[48];
packed_v32 oute[64];
packed_v32 outa[64];
packed_v32 h1[8];
void input(const Logic& lc); // pulls packed wires from the private tape
};FlatSHA256Witness::BlockWitness (in flatsha256_witness.h) is the host-side counterpart. It holds raw uint32_t values that the prover computes outside the circuit and then encodes into packed_v32 for the circuit:
struct BlockWitness { // FlatSHA256Witness::BlockWitness
uint32_t outw[48];
uint32_t oute[64];
uint32_t outa[64];
uint32_t h1[8];
};Builders: FlatSHA256Witness::transform_and_witness_block(in, H0, outw, oute, outa, H1) for a single block, and FlatSHA256Witness::transform_and_witness_message(n, msg, max, numb, in, bw) for a full message — the latter is what you usually call, because it does the padding and populates the per-block witness array in one shot.
Cost
Per-block numbers reported by flatsha256_circuit_test.cc (see the comment block on FlatSHA256Circuit):
- Unpacked single-block: depth 7, ~38K wires, ~6.7K input wires.
- Packed with
kShaPluckerSize = 2: depth 9, ~66K wires, ~3.6K input wires.
Packing nearly halves the input-wire count at the cost of a deeper circuit. A TPM2 quote or TDX report is typically 200-400 bytes (4-7 SHA-256 blocks once padded), so multiply accordingly when sizing the full hash portion of your proof.
Example
using EvalBackend = EvaluationBackend<Field>;
using Logic = Logic<Field, EvalBackend>;
using FlatSha = FlatSHA256Circuit<Logic, BitPlucker<Logic, kShaPluckerSize>>;
const EvalBackend ebk(F);
const Logic L(&ebk, F);
const FlatSha FSHA(L);
uint32_t in[16], H0[8], outw[48], oute[64], outa[64], H1[8];
FlatSHA256Witness::transform_and_witness_block(in, H0, outw, oute, outa, H1);
// Lift each uint32_t into a v32 via L.vbit32(...) (elided for brevity),
// then assert the block transform:
FSHA.assert_transform_block(vin.data(), vH0.data(), voutw.data(),
voute.data(), vouta.data(), vH1.data());See flatsha256_circuit_test.cc for the full lifting boilerplate and for the packed and multi-block variants.
See it used
flatsha256_circuit_test.cc— correctness tests plus the block-size benchmarks that produced the cost numbers above.flatsha256_witness.cc— the host-side witness builder and padding implementation.- FIPS 180-4 — the SHA-256 specification itself; the round function and
Sigma/sigmaterminology come straight from section 4.1.2.
Related
- Logic — the bit, word, and vector primitives the gadget is built on.
- ECDSA — the usual downstream consumer: hash here, verify signature there.
- Verifying a Signed Document — end-to-end walkthrough of the hash-then-verify pattern for an attestation payload.