DCT Acceleration

Overview

Spectral methods (FFT) are the gold standard for solving the Diffusion Equation in SCFT due to their high accuracy and efficiency compared to Finite Difference methods. However, standard FFTs operate on the full complex grid, which can be redundant for systems with crystallographic symmetry.

Polyorder.jl introduces a specialized DCT-I (Discrete Cosine Transform) Acceleration Mode for systems exhibiting mirror symmetry (e.g., Block Copolymers in Pmmm supergroups). This mode:

  1. Reduces Memory: Operates on the octant grid ($N/8$), reducing solver buffer size by ~87.5%.
  2. Improves Speed: DCT-I is computationally cheaper than R2C FFT for real symmetric data. Benchmarks show a ~2.8x speedup on CPU and ~3.3x speedup on GPU (CUDA).
  3. Maintains Precision: Mathematically equivalent to FFT for symmetric fields to machine precision ($\sim 10^{-15}$).

Mathematical Basis

For a real field $w(x)$ defined on $[0, L)$ with periodic boundary conditions, if the field exhibits mirror symmetry about the center: $w(x) = w(L-x)$ Then its Fourier coefficients are real and even. The full Fourier Transform can be exactly mapped to a Type-I Discrete Cosine Transform (DCT-I) on the first half of the domain $[0, L/2]$.

  • FFT Domain: $N$ points $\{0, 1, \dots, N-1\}$
  • DCT Domain: $N/2 + 1$ points $\{0, 1, \dots, N/2\}$ (Octant in 3D)

In 3D, for Space Group 47 (Pmmm) and its supergroups (including Cubic SG 221 $Pm\bar{3}m$, SG 229 $Im\bar{3}m$, etc.), the fields possess mirror planes perpendicular to all three axes. This allows us to solve the diffusion equation entirely within the independent octant using 3D DCTs.

Architecture

OctAuxiliaryField

The core of the implementation is the OctAuxiliaryField type, which represents a field defined only on the symmetry-reduced octant.

struct OctAuxiliaryField{...} <: AbstractAuxiliaryField
    data::Array{T, 3}  # Size: (Nx/2+1, Ny/2+1, Nz/2+1)
    fft::DCTPlan       # Optimized DCT-I plan
    ifft::IDCTPlan     # Optimized IDCT-I plan
end

Hybrid Solver Pipeline

The DCT solver integrates seamlessly with Polyorder.jl's existing memory optimizations (Checkpointing and Symmetric Storage):

  1. Storage: Propagators are stored in SymmetricStorage (highly compressed stars).
  2. Expansion: Data is expanded from Stars $\to$ Octant directly (skipping the full grid).
  3. Time Stepping: The OSF (Operator Splitting) solver advances the propagator using DCTs on the octant grid.
  4. Compression: Resulting octant data is compressed back Octant $\to$ Stars.
  5. Checkpoint Compatibility: If using Checkpointing (e.g., :minimal), the octant propagator is expanded to the full grid before being stored in the checkpoint buffer. This ensures compatibility with analysis tools that expect full-grid data, while keeping the expensive solver steps in the optimized octant domain.

This "Hybrid" approach combines the massive storage efficiency of Symmetric Storage (~96x) with the computational speed and buffer efficiency of DCT (~8x).

Integration with Performance Profiles

DCT acceleration is orthogonal to the Memory Profiles but requires symmetry to be enabled.

  • Default Behavior: Profiles that disable symmetry (:fast, :checkpoint) will NOT use DCT by default.
  • Manual Override: You can enable DCT with these profiles by passing symmetrize=true.
ProfileDCT Compatible?Behavior
:fast❌ No (Default)Uses pure FFT (no symmetry).
:checkpoint❌ No (Default)Uses pure FFT (no symmetry).
:fast + sym✅ YesBecomes equivalent to :balanced + DCT.
:checkpoint + sym✅ YesEffective :minimal. Checkpointing + Symmetric Storage + DCT.
:balanced✅ YesStandard configuration. Uses Symmetric Storage + DCT Solver.
:compact✅ YesFurther reduces memory by not precomputing physics.
:minimal✅ YesUltimate Scaling. Checkpointing + Symmetric Storage + DCT.

Verification

The DCT implementation has been rigorously verified against the standard FFT solver:

  • Free Energy: Matches to machine precision ($|\Delta F| < 10^{-14}$).
  • Density: Pointwise difference $< 10^{-15}$.
  • Performance: Validated on both CPU and GPU (CUDA).

Constraints

To enable DCT mode, the system must satisfy:

  1. Orthogonal Lattice: Cubic, Tetragonal, Orthorhombic. (No Triclinic/Monoclinic).
  2. Even Grid Dimensions: $N_x, N_y, N_z$ must be even.
  3. Pmmm Supergroup (Restricted): Only space groups with pure mirror planes perpendicular to all axes are supported (no glide planes).
    • Orthorhombic: 47 (Pmmm), 65 (Cmmm), 69 (Fmmm), 71 (Immm)
    • Tetragonal: 123 (P4/mmm), 131 (P4₂/mmc), 139 (I4/mmm)
    • Cubic: 221 (Pm-3m), 223 (Pm-3n), 225 (Fm-3m), 226 (Fm-3c), 229 (Im-3m)

Other "mmm" groups (e.g., Gyroid SG 230 $Ia\bar{3}d$) typically involve glide planes (reflection + shift) which break the simple $f(x) = f(N-x)$ symmetry required for standard DCT-I. Only SG 230 is notably absent due to its glide nature, but SG 229 serves as a valid Cubic alternative for many phases.