FFT & Convolution

Longfellow’s FFT family provides the polynomial arithmetic that Reed-Solomon encoding, Ligero commitments, and sumcheck-layer interpolation all stand on. The module ships a cache-oblivious radix-2/4 FFT over any field with roots of unity, a real-FFT variant for fields where elements factor into a real+imaginary pair, a Nussbaumer negacyclic convolution for small sizes, and two convolution wrappers that pre-FFT one of the operands for reuse.

When to reach for it

You are building a new polynomial routine (interpolation, multiplication, evaluation over a coset) and need the transform primitive directly.
You are picking a field for a new statement and need to check whether it has a root of unity of the order you want.
You are tuning a proof-system instantiation and want to know where the polynomial work is — this is it.

Design overview

The module has three conceptual layers:

FFT. FFT<Field> is a cache-oblivious Cooley-Tukey transform with variable radix. Given a size N, choose_radix() factors N = R · S · R with S ≤ 16384 and picks the split that minimises recursion depth; below kBasecase = 16384 the iterative bit-reversed basecase runs in place. The forward transform is fftf(A, n, omega, omega_order, F), the inverse is fftb(A, n, omega_j, j, F); both take the primitive root omega passed by const-ref, together with its order, so the caller can pick a sub-order in a larger-order field.

Real FFT. RFFT<FieldExt> operates on a quadratic extension FieldExt = Fp2<Base> but stores output in half-complex format, exploiting F[n-j] = \overline{F[j]}. It runs at radix-4 with specialized basecases (r2hcI_2, r2hcI_4, hc2hcf_4, hc2rI_4) that avoid multiplies by the fourth root of unity. Used when the input is real (imaginary part zero) and the FFT is part of a longer real-valued pipeline, which saves roughly half the work compared to the complex transform.

Nussbaumer. For small sizes n ≤ 64 (set by kNussbaumerSmall), negacyclic and linear convolutions fall back to a subtractive Karatsuba recursion. For larger n, Nussbaumer<Field> lifts the 1-D convolution to a 2-D problem of size M × R, runs FFTs on both dimensions, and recursively handles the small-size negacyclic residue. Exposes negacyclic, linear, and middle products.

Convolution wrappers. FFTConvolution<Field> precomputes FFT(y) and the 1/N normalization, so later calls to convolution(x) are “FFT x → pointwise multiply → inverse FFT”. FFTExtConvolution<Field, FieldExt> does the same with the real-FFT path. CRTConvolution<CRT, Field> decomposes elements into several small primes, runs independent FFTs over each prime, and reassembles via CRT — useful when the working field’s limb count makes a single large FFT expensive.

API surface

`FFT<Field>` — in-place cache-oblivious FFT


template <class Field>
struct FFT {
  using Elt = typename Field::Elt;
  static void fftf(Elt A[], size_t n, const Elt& omega, uint64_t omega_order,
                   const Field& F);
  static void fftb(Elt A[], size_t n, const Elt& omega_j, uint64_t j,
                   const Field& F);
};

n must be a power of two.
omega must be a primitive omega_order-th root of unity in Field; the field must supply one — Fp128 does for very high orders, Fp256 through FFT-friendly subgroups via CRT.
Both calls are in-place; allocate A large enough for n elements.

`RFFT<FieldExt>` — real-input FFT in half-complex form


template <class FieldExt>
struct RFFT {
  using RElt = typename FieldExt::BaseField::Elt;
  using CElt = typename FieldExt::Elt;
  static void r2hc(RElt A[], size_t n, CElt omega, uint64_t order,
                   const FieldExt& C);
  static void hc2r(RElt A[], size_t n, CElt omega, uint64_t order,
                   const FieldExt& C);
};

Output of r2hc stores re(F[j]) at index 2j when 2j ≤ n and im(F[n-j]) otherwise; hc2r is the exact inverse.

`Nussbaumer<Field>` — negacyclic, linear, and middle products


template <class Field>
struct Nussbaumer {
  using Elt = typename Field::Elt;
  static void negacyclic(size_t n, Elt z[], Elt x[], Elt y[], const Field& F);
  static void linear    (size_t n, Elt z[], Elt x[], Elt y[], const Field& F);
  static void middle    (size_t n, Elt z[], Elt x[], Elt y[], const Field& F);
};

linear output has size 2n; negacyclic and middle outputs have size n.

Convolution wrappers


template <class Field>
class FFTConvolution {
 public:
  // Pre-FFTs y; omega must be a primitive omega_order-th root of unity.
  FFTConvolution(size_t n, size_t m, const Field& f, const Elt omega,
                 uint64_t omega_order, const Elt y[/*m*/]);
  // z[k] = sum_{i=0}^{n-1} x[i] y[k-i], first m entries.
  void convolution(const Elt x[/*n*/], Elt z[/*m*/]) const;
};
 
template <class Field, class FieldExt>
class FFTExtConvolution { /* same shape, half-complex internal */ };
 
template <class CRT, class Field>
class CRTConvolution { /* multi-prime decomposition */ };

All three expose a matching Factory class for the lazy-initialization pattern reed_solomon and ligero use — construction is the expensive step (precomputing the FFT of y and the scale), subsequent calls are cheap.

`LCH14<Field>` — additive FFT over binary fields (`gf2k/lch14.h`)

LCH14 implements the Lin-Chung-Han 2014 additive FFT algorithm for binary fields (those with Field::kCharacteristicTwo). It precomputes normalized basis values ŵ_i(β_j) at construction time and uses them to compute twiddle factors on the fly.


template <class Field>
class LCH14 {
 public:
  explicit LCH14(const Field& F);  // precomputes ŵ_i(β_j) table
 
  // Forward additive FFT of length 2^l starting at coset offset `coset`.
  void FFT(size_t l, size_t coset, Elt B[/* 1 << l */]) const;
 
  // Inverse additive FFT of length 2^l starting at coset offset `coset`.
  void IFFT(size_t l, size_t coset, Elt B[/* 1 << l */]) const;
 
  // Bidirectional (truncated) FFT: given k "time" evaluations and 2^l - k
  // "frequency" evaluations, fills in the complementary set.
  void BidirectionalFFT(size_t l, size_t k, Elt B[/* 1 << l */]) const;
};

The constructor takes only the field reference; there is no log_n parameter. The maximum supported l is Field::kSubFieldBits.
All three transform methods take l (log₂ of the transform length) and either a coset offset (FFT, IFFT) or k (the split point between known time and frequency samples, BidirectionalFFT).
Method names are uppercase (FFT, IFFT, BidirectionalFFT).

Roots of unity (`twiddle.h`)

Twiddle<Field> caches powers of the primitive root of unity across transforms. Each FFT invocation passes its omega in, but for long-running code paths the cache in Twiddle avoids recomputing the same omega^k values. You rarely construct a Twiddle directly; the FFT and convolution constructors build one as a private member.

Complexity at a glance

Routine	Cost
`FFT::fftf` / `fftb`	`O(N \log N)` field operations
`RFFT::r2hc` / `hc2r`	`O(N \log N)`, ~half the constant of complex FFT
`Nussbaumer::negacyclic` (large `n`)	`O(N \log N)` via lifted FFT
`Nussbaumer::negacyclic` (small `n`)	`O(N^{\log_2 3})` Karatsuba
`FFTConvolution::convolution`	`O(N \log N)` per call; setup amortised
`CRTConvolution::convolution`	`O(k · N \log N)` where `k` is prime count

The polynomial interpolation (interpolation.h) and Reed-Solomon encoder (reed_solomon.h) both call into this module rather than implementing their own transforms.

See it used

longfellow-zk/lib/algebra/fft_test.cc — fft_test.cc
longfellow-zk/lib/algebra/rfft_test.cc — rfft_test.cc
longfellow-zk/lib/algebra/nussbaumer_test.cc — nussbaumer_test.cc
longfellow-zk/lib/algebra/reed_solomon.h — the main consumer.

Reed-Solomon — layered on top of the convolution wrappers.
Interpolation — reuses the same transforms for Lagrange ↔ monomial conversion.
Prime Fields — picks the Field the transform runs over.

FFT & Convolution

When to reach for it

Design overview

API surface

FFT<Field> — in-place cache-oblivious FFT

RFFT<FieldExt> — real-input FFT in half-complex form

Nussbaumer<Field> — negacyclic, linear, and middle products