Reed-Solomon

The algebra/reed_solomon.h encoder is the error-correcting code Longfellow commits over in Ligero. Given n evaluations of a degree-< n polynomial at the points \{0, 1, \ldots, n-1\}, the encoder produces m - n additional evaluations at \{n, n+1, \ldots, m-1\} — extending the message into a codeword whose minimum distance bounds how far a cheating prover could stray from a legitimate polynomial. Ligero’s soundness comes almost entirely from this object’s distance, so the rate parameter you pick here is the lever that sets the rate parameter in Ligero.

When to reach for it

You are picking rateinv for ZkProver/ZkVerifier and want to know what it means at the code level.
You are writing a custom commitment or low-degree test and need a systematic RS encoder with reusable setup.
You are sizing proofs and need the rate/distance arithmetic that drives per-query soundness.

Design overview

Reed-Solomon encoding in Longfellow is interpolation by extension: the encoder computes the polynomial p of degree < n that matches the input evaluations p(0), \ldots, p(n-1), then evaluates that same polynomial at new points n, n+1, \ldots, m-1. Rather than computing coefficients explicitly, the code uses a Knuth-style identity that expresses p(k) for k \geq n as a linear convolution of the input evaluations with a precomputed kernel — FFTConvolution from convolution.h does the actual work.

Concretely, for d = n - 1 and k \geq n:

$p(k) = (-1)^{d} \cdot (k - d) \cdot \binom{k}{d} \cdot \sum_{j=0}^{d} \frac{(-1)^{j}}{k - j} \binom{d}{j} \cdot p(j).$

ReedSolomon<Field, ConvolutionFactory> precomputes the (-1)^j \binom{d}{j} kernel and the leading factors (-1)^d (k - d) \binom{k}{d} once; encoding at a set of new points reduces to a single convolution and a pointwise scale. The ConvolutionFactory abstraction lets the caller choose between FFTConvolution, CRTConvolution, or the extension-field real-FFT variant without recompiling the encoder. Setup is O(m) field operations plus one O(m \log m) FFT-of-kernel; per-codeword encoding is O(m \log m).

Systematic or not? The output of interpolate() is laid out as two arrays: the original n message evaluations unchanged, and the m - n extended evaluations appended. From a coding-theory standpoint this is a systematic encoding of the polynomial at \{0, \ldots, m-1\} — the first n positions recover the message by inspection, and the Ligero prover relies on that layout when splitting each row into the message block and the “extension” block.

Reed-Solomon over extension fields

ReedSolomonExtension6 handles the case where elements live in a degree-6 extension of the base field but the evaluation points and the code’s distance analysis live in the base field. It treats a codeword over Fp24_6 as six parallel codewords over Fp24, each encoded independently by ReedSolomon<Fp24, ...>. This keeps the distance properties of the base-field code while allowing messages over the extension.

Parameter semantics and soundness (level-3)

Rate and distance. A Reed-Solomon code with message length n and block length m has rate R = n / m and minimum distance d = m - n + 1, meeting the Singleton bound with equality. Longfellow’s Ligero integration parameterises in terms of the inverse rate rateinv = 1/R, so rateinv = 4 means a message of length BLOCK is extended to a codeword of length BLOCK_ENC \approx 4 \cdot \mathrm{BLOCK}, and the minimum distance is BLOCK_ENC - BLOCK + 1 \approx 3 \cdot \mathrm{BLOCK}. The relative distance \delta = d / m \approx 1 - 1/\mathrm{rateinv} is what matters for proximity tests.

Per-query detection. Suppose a cheating Ligero prover commits to a row that is not a codeword but is close to one — within distance e < d/2 of some codeword, where the decoder would snap the row to the wrong message. The prover wins a query if that query lands in one of the m - e positions where the committed row agrees with the decoded codeword — meaning the verifier fails to detect the tampering at that column. For a prover who is at least d/2 away from every codeword, any single random column catches the fraud with probability at least \delta \approx 1 - 1/\mathrm{rateinv}. With nreq independent column queries, the undetected-fraud probability falls to roughly (1 - \delta)^{\mathrm{nreq}}. That is the arithmetic behind the Ligero soundness numbers: rateinv = 4, nreq = 128 lands near 86+ bits of statistical security; rateinv = 7, nreq = 132 lands near 109 bits. (See the Ligero page for why the achieved security is below the naive nreq \cdot \log_2(\mathrm{rateinv}) bound — interleaved-test and low-degree-test slack cost roughly a factor of three.)

Why raising rateinv pays off quickly. Each extra unit of rateinv shrinks the per-column undetected-fraud probability geometrically (1/\mathrm{rateinv} instead of 1/(\mathrm{rateinv} - 1)), while proof size grows only linearly in the number of opened columns. That is why moving from rateinv = 4 to rateinv = 7 with only a modest bump in nreq buys roughly +23 bits of security while keeping proof sizes comparable.

API surface

Construction


template <class Field, class ConvolutionFactory>
class ReedSolomon {
 public:
  ReedSolomon(size_t n, size_t m, const Field& F, const ConvolutionFactory& factory);
  // ...
};

n = message length (number of input evaluations at \{0, \ldots, n-1\}).
m = codeword length (extended output evaluations at \{0, \ldots, m-1\}).
F = the field instance (must outlive the ReedSolomon object).
factory = factory that supplies an FFTConvolution (or CRT variant) for the internal kernel.
Setup cost dominates single-use scenarios; reuse the same ReedSolomon across many encodings.

Encoding


void interpolate(Elt y[/* m */]) const;

The array y is in-place: the caller fills y[0..n-1] with the n input evaluations before calling, and interpolate writes the m - n extended evaluations into y[n..m-1].
Deterministic: no randomness is consumed.
Thread-safe on a shared ReedSolomon as long as each thread supplies its own y buffer.

Extension-field variant


class ReedSolomonExtension6 {
 public:
  ReedSolomonExtension6(size_t n, size_t m, const BaseField& f);
  void interpolate(ExtElt y[/* m */]) const;
};

Treats the extension-field codeword as six parallel base-field codewords.

See it used

longfellow-zk/lib/algebra/reed_solomon_test.cc — reed_solomon_test.cc
longfellow-zk/lib/ligero/ligero_prover.h — the main production consumer.
longfellow-zk/lib/ligero/ligero_param.h — picks the rateinv and nreq constants that feed this encoder’s output into column queries.

Ligero — the commitment that this code’s distance properties protect.
FFT & Convolution — the transforms the encoder stands on.
Interpolation — univariate basis conversions used at the sumcheck layer.