SHA-256

FlatSHA256Circuit is Longfellow’s in-circuit SHA-256 gadget. “Flat” means the 64-round compression function is fully unrolled in parallel rather than folded behind a loop: the prover supplies each round’s intermediate message-schedule word w, and the rolling state values a and e, as witness, and the circuit just verifies per-round arithmetic consistency. In an attestation pipeline you use it to bind the bytes of a TPM2 quote or TDX report to a digest inside the proof, then feed that digest into the ECDSA gadget.

When to reach for it

Proving that a specific attestation-bearing message hashes to a known digest (TPM2 quote, TDX report, signed credential body).
Composing with the ECDSA gadget: hash-then-sign is the dominant attestation shape, and FlatSHA256Circuit’s digest output feeds directly in.
Verifying a commitment to a pre-hashed blob without revealing the blob — the circuit ties message bytes to a public digest without publishing anything else.

Design overview

The gadget lives in lib/circuits/sha/flatsha256_circuit.h and is parameterised as FlatSHA256Circuit<Logic, BitPlucker>. You hand it a Logic<Field, Backend> instance and a BitPlucker<Logic, kShaPluckerSize> encoder. It works over either a prime field (Fp256Base, for the standard ZK path paired with P-256 ECDSA) or a binary field (GF2_128, for the Ligero-over-GF path). Pick whichever matches the surrounding circuit: keeping everything in one field avoids any conversion cost.

The single most important thing to internalise is witness duality. The gadget does not compute SHA-256 in-circuit. Outside the proof, the prover runs a plain-C SHA-256 via FlatSHA256Witness::transform_and_witness_block(...), which emits every per-round intermediate (outw[48], oute[64], outa[64]) plus the next chaining state h1[8]. The circuit then checks that each round’s e and a agree with the standard FIPS 180-4 recurrence given the previous state and the claimed w. This is what makes “flat” cheap in a Longfellow sumcheck: there are no nested loops and no long sequential dependency chain, just 64 independent local assertions per block.
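To make the witness split concrete, here is a minimal plain-C++ sketch of the idea. It is not Longfellow's code: the names Witness, make_witness, and check_witness are hypothetical, and only the array shapes (outw[48], oute[64], outa[64], h1[8]) are taken from the description above. make_witness plays the host role and records every intermediate; check_witness replays the rounds as local equality checks, each of which looks only at a few neighbouring witness values.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kK[64] = {  // FIPS 180-4 round constants
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

inline uint32_t rotr(uint32_t x, int n) {
  assert(n > 0 && n < 32);  // keeps both shifts well-defined
  return (x >> n) | (x << (32 - n));
}
inline uint32_t Sigma0(uint32_t x) { return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22); }
inline uint32_t Sigma1(uint32_t x) { return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25); }
inline uint32_t sigma0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
inline uint32_t sigma1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }
inline uint32_t Ch(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (~x & z); }
inline uint32_t Maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

struct Witness {  // hypothetical; mirrors the array shapes described above
  uint32_t outw[48], oute[64], outa[64], h1[8];
};

// Host side: run one compression and record every per-round intermediate.
inline Witness make_witness(const uint32_t in[16], const uint32_t h0[8]) {
  Witness wt{};
  uint32_t w[64];
  for (int t = 0; t < 16; ++t) w[t] = in[t];
  for (int t = 16; t < 64; ++t) {
    w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
    wt.outw[t - 16] = w[t];
  }
  uint32_t a = h0[0], b = h0[1], c = h0[2], d = h0[3];
  uint32_t e = h0[4], f = h0[5], g = h0[6], h = h0[7];
  for (int t = 0; t < 64; ++t) {
    const uint32_t t1 = h + Sigma1(e) + Ch(e, f, g) + kK[t] + w[t];
    const uint32_t t2 = Sigma0(a) + Maj(a, b, c);
    h = g; g = f; f = e; e = d + t1;
    d = c; c = b; b = a; a = t1 + t2;
    wt.oute[t] = e;
    wt.outa[t] = a;
  }
  const uint32_t fin[8] = {a, b, c, d, e, f, g, h};
  for (int i = 0; i < 8; ++i) wt.h1[i] = h0[i] + fin[i];
  return wt;
}

// "Circuit" side: no running state, only local checks against the witness.
// b, c, d are older values of a, and f, g, h are older values of e, so each
// round reads A(t-1..t-4) and E(t-1..t-4); negative indices fall back to h0.
inline bool check_witness(const uint32_t in[16], const uint32_t h0[8],
                          const Witness& wt) {
  auto W = [&](int t) { return t < 16 ? in[t] : wt.outw[t - 16]; };
  auto A = [&](int t) { return t < 0 ? h0[-t - 1] : wt.outa[t]; };
  auto E = [&](int t) { return t < 0 ? h0[3 - t] : wt.oute[t]; };
  bool ok = true;
  for (int t = 16; t < 64; ++t)
    ok &= W(t) == sigma1(W(t - 2)) + W(t - 7) + sigma0(W(t - 15)) + W(t - 16);
  for (int t = 0; t < 64; ++t) {
    const uint32_t t1 =
        E(t - 4) + Sigma1(E(t - 1)) + Ch(E(t - 1), E(t - 2), E(t - 3)) + kK[t] + W(t);
    const uint32_t t2 = Sigma0(A(t - 1)) + Maj(A(t - 1), A(t - 2), A(t - 3));
    ok &= E(t) == A(t - 4) + t1;
    ok &= A(t) == t1 + t2;
  }
  for (int i = 0; i < 4; ++i) {
    ok &= wt.h1[i] == h0[i] + A(63 - i);
    ok &= wt.h1[i + 4] == h0[i + 4] + E(63 - i);
  }
  return ok;
}
```

Because every check reads only witness values, none of them has to wait for an earlier round to be evaluated. Changing any single recorded value makes at least one equality fail.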

Padding is also host-side. The helper FlatSHA256Witness::transform_and_witness_message(...) does the standard SHA-256 padding — append 0x80, zero-pad so the length lands at 56 mod 64, then append the 64-bit big-endian bit length — and writes the padded bytes into the input buffer. The circuit does not pad; instead, assert_zero_padding enforces that any blocks past the active count nb are all-zero, so a prover cannot smuggle extra data into the tail.
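The padding rule can be sketched in a few lines of plain C++. This is an illustration of the standard FIPS 180-4 rule under assumed names, not the body of transform_and_witness_message: pad_sha256 and its signature are hypothetical. It fills a buffer of 64 * max_blocks bytes so that everything past the active blocks stays zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns 64 * max_blocks bytes holding the padded
// message followed by all-zero blocks, and stores the active block count
// in *nb.
std::vector<uint8_t> pad_sha256(const std::vector<uint8_t>& msg,
                                size_t max_blocks, size_t* nb) {
  const size_t n = msg.size();
  *nb = (n + 9 + 63) / 64;    // 1 byte of 0x80 plus 8 bytes of length
  assert(*nb <= max_blocks);  // the message must fit the chosen bound
  std::vector<uint8_t> in(64 * max_blocks, 0);  // unused tail stays zero
  for (size_t i = 0; i < n; ++i) in[i] = msg[i];
  in[n] = 0x80;
  const uint64_t bits = static_cast<uint64_t>(n) * 8;
  const size_t end = 64 * *nb;  // one past the last active byte
  for (int i = 0; i < 8; ++i)   // 64-bit big-endian bit length
    in[end - 1 - i] = static_cast<uint8_t>(bits >> (8 * i));
  return in;
}
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the 0x80 byte and the 8-byte length no longer fit.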

The kShaPluckerSize = 2 packing knob (from flatsha256_io.h) trades input-wire count for depth: two bits of each 32-bit witness word share a field element, which roughly halves the number of input wires the sumcheck has to commit to, at the cost of a deeper unpack in the circuit. It is a wires-vs-depth tradeoff, not a free optimisation; default to 2 and only tune if you are depth-bound.
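The wire-count arithmetic behind the packing knob can be illustrated with a plain integer model. This sketch only shows the chunking: in the real gadget each chunk is encoded as a field element by the BitPlucker and the bits are recovered in-circuit, which is where the extra depth comes from. The names pack_word and unpack_word, the constant kLogN, and the least-significant-chunk-first ordering are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr size_t kLogN = 2;  // stand-in for kShaPluckerSize
constexpr size_t kElts = (32 + kLogN - 1) / kLogN;  // ceil(32 / kLogN) = 16

// Split a 32-bit word into kElts chunks of kLogN bits each,
// least-significant chunk first (ordering is an assumption).
std::array<uint32_t, kElts> pack_word(uint32_t x) {
  std::array<uint32_t, kElts> out{};
  for (size_t i = 0; i < kElts; ++i)
    out[i] = (x >> (kLogN * i)) & ((1u << kLogN) - 1);
  return out;
}

// Inverse of pack_word: reassemble the word from its chunks.
uint32_t unpack_word(const std::array<uint32_t, kElts>& p) {
  uint32_t x = 0;
  for (size_t i = 0; i < kElts; ++i) x |= p[i] << (kLogN * i);
  return x;
}
```

With two bits per chunk a word needs 16 inputs instead of 32, which matches the "roughly halves" figure for input wires.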

API surface

Single-block verification


void assert_transform_block(const v32 in[16], const v32 H0[8],
                            const v32 outw[48], const v32 oute[64],
                            const v32 outa[64], const v32 H1[8]) const;

Verifies one 512-bit block transform from chaining state H0 to H1, given the per-round witness arrays. There are also two packed overloads that accept packed_v32 witness arrays produced by the BitPlucker.

Multi-block message


void assert_message(size_t max, const v8& nb, const v8 in[/* 64*max */],
                    const BlockWitness bw[/* max */]) const;
void assert_message_hash(size_t max, const v8& nb, const v8 in[/* 64*max */],
                         const v256& target, const BlockWitness bw[/* max */]) const;

max is an upper bound on the number of blocks, fixed when the circuit is compiled (it is an ordinary size_t argument, not a template parameter); set it to whatever covers your largest anticipated attestation. nb is the actual block count, a witnessed value of at most max. assert_message chains the block transforms and enforces zero-padding past nb; assert_message_hash additionally asserts that the final chaining state equals a public target digest.
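As a plain-C++ model of the zero-padding rule, the following sketch states the property being enforced. It is only a model: the circuit expresses this as wire constraints rather than a loop with early exits, and the function name tail_is_zero is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the tail check: true iff nb is within the bound and
// every byte belonging to blocks nb .. max-1 is zero.
bool tail_is_zero(size_t max, size_t nb, const std::vector<uint8_t>& in) {
  if (nb > max || in.size() != 64 * max) return false;
  for (size_t i = 64 * nb; i < in.size(); ++i)
    if (in[i] != 0) return false;
  return true;
}
```

With max = 2 and nb = 1, any non-zero byte at index 64 or later is rejected, so the unused second block cannot carry hidden data.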

Witness

There are two BlockWitness structs — one in-circuit and one host-side — and they are intentionally different types.

FlatSHA256Circuit::BlockWitness (in flatsha256_circuit.h) holds packed field elements. Each 32-bit word is stored as a packed_v32 (a std::array<EltW, kNv32Elts>), where kNv32Elts = ceil(32 / LOGN): each group of LOGN bits shares one field-element wire.


struct BlockWitness {       // FlatSHA256Circuit::BlockWitness
  packed_v32 outw[48];
  packed_v32 oute[64];
  packed_v32 outa[64];
  packed_v32 h1[8];
 
  void input(const Logic& lc);   // pulls packed wires from the private tape
};

FlatSHA256Witness::BlockWitness (in flatsha256_witness.h) is the host-side counterpart. It holds raw uint32_t values that the prover computes outside the circuit and then encodes into packed_v32 for the circuit:


struct BlockWitness {       // FlatSHA256Witness::BlockWitness
  uint32_t outw[48];
  uint32_t oute[64];
  uint32_t outa[64];
  uint32_t h1[8];
};

Builders: FlatSHA256Witness::transform_and_witness_block(in, H0, outw, oute, outa, H1) for a single block, and FlatSHA256Witness::transform_and_witness_message(n, msg, max, numb, in, bw) for a full message — the latter is what you usually call, because it does the padding and populates the per-block witness array in one shot.

Cost

Per-block numbers reported by flatsha256_circuit_test.cc (see the comment block on FlatSHA256Circuit):

Unpacked single-block: depth 7, ~38K wires, ~6.7K input wires.
Packed with kShaPluckerSize = 2: depth 9, ~66K wires, ~3.6K input wires.

Packing nearly halves the input-wire count, but it adds two layers of depth and raises the total wire count from ~38K to ~66K. A TPM2 quote or TDX report is typically 200-400 bytes (4-7 SHA-256 blocks once padded), so multiply accordingly when sizing the full hash portion of your proof.
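For a quick back-of-the-envelope estimate, the block count and the multiplication can be written down directly. This is a rough sizing sketch that assumes cost scales linearly with the number of blocks; the function names are hypothetical and the per-block constants are simply the packed figures quoted above.

```cpp
#include <cstddef>

// Number of 64-byte blocks after SHA-256 padding
// (message + one 0x80 byte + 8-byte length, rounded up).
constexpr size_t sha256_blocks(size_t msg_bytes) {
  return (msg_bytes + 9 + 63) / 64;
}

// Approximate per-block figures for the packed variant, from the list above.
constexpr size_t kWiresPerBlock = 66000;      // ~66K wires
constexpr size_t kInputWiresPerBlock = 3600;  // ~3.6K input wires

// Linear estimates; pass the largest message the circuit must cover,
// since the circuit is sized for max blocks rather than the actual count.
constexpr size_t estimated_wires(size_t max_msg_bytes) {
  return sha256_blocks(max_msg_bytes) * kWiresPerBlock;
}
constexpr size_t estimated_input_wires(size_t max_msg_bytes) {
  return sha256_blocks(max_msg_bytes) * kInputWiresPerBlock;
}

static_assert(sha256_blocks(200) == 4 && sha256_blocks(400) == 7,
              "matches the 4-7 block range for 200-400 byte messages");
```

For a 400-byte bound this gives 7 blocks, or on the order of 460K wires and 25K input wires for the hash portion alone.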

Example


using EvalBackend = EvaluationBackend<Field>;
using Logic = Logic<Field, EvalBackend>;
using FlatSha = FlatSHA256Circuit<Logic, BitPlucker<Logic, kShaPluckerSize>>;
 
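// F is an instance of the chosen Field, constructed elsewhere.
const EvalBackend ebk(F);
const Logic L(&ebk, F);
const FlatSha FSHA(L);

// `in` (the 16 message words of the block) and `H0` (the incoming chaining
// state) are inputs and must be filled in before the call; the remaining
// arrays are outputs written by the witness builder.
uint32_t in[16] = {/* message words */}, H0[8] = {/* chaining state */};
uint32_t outw[48], oute[64], outa[64], H1[8];
FlatSHA256Witness::transform_and_witness_block(in, H0, outw, oute, outa, H1);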
 
// Lift each uint32_t into a v32 via L.vbit32(...) (elided for brevity),
// then assert the block transform:
FSHA.assert_transform_block(vin.data(), vH0.data(), voutw.data(),
                            voute.data(), vouta.data(), vH1.data());

See flatsha256_circuit_test.cc for the full lifting boilerplate and for the packed and multi-block variants.

See it used

flatsha256_circuit_test.cc — correctness tests plus the block-size benchmarks that produced the cost numbers above.
flatsha256_witness.cc — the host-side witness builder and padding implementation.
FIPS 180-4 — the SHA-256 specification itself; the Ch, Maj, and Sigma/sigma functions are defined in section 4.1.2, and the per-block hash computation that uses them is in section 6.2.2.

Logic — the bit, word, and vector primitives the gadget is built on.
ECDSA — the usual downstream consumer: hash here, verify signature there.
Verifying a Signed Document — end-to-end walkthrough of the hash-then-verify pattern for an attestation payload.