# GPU Acceleration
Polyorder.jl provides seamless GPU acceleration through a device-agnostic design powered by AcceleratedKernels.jl and KernelAbstractions.jl: the same core SCFT algorithms run on both CPU and GPU without code duplication, simply by changing the underlying array type.
## Prerequisites
To use GPU acceleration, you need:
- A CUDA-capable NVIDIA GPU.
- Properly installed NVIDIA drivers.
- CUDA.jl installed in your Julia environment.
## Installation
First, ensure CUDA.jl is installed. In the Julia REPL:
```julia
using Pkg
Pkg.add("CUDA")
```

Polyorder.jl automatically detects CUDA.jl and AcceleratedKernels.jl to enable GPU functionality.
## Example: Running SCFT on GPU
To run on a GPU, you simply initialize the `AuxiliaryField` with a `CuArray` (from CUDA.jl). Polyorder.jl detects the GPU array and automatically uses AcceleratedKernels.jl to dispatch operations to the device.
```julia
using Polyorder
using CUDA
using Random
# 1. Define the polymer system (exact same code for CPU and GPU)
system = AB_system(χN=20.0)
# 2. Define the simulation cell (lattice)
lattice = BravaisLattice(UnitCell(Tetragonal(), 4.0, 1.6))
# --- GPU Initialization ---
# Create an initial GPU array (e.g., zeros) with the desired grid size
gpu_data = CUDA.zeros(Float64, 48, 48, 16)
# Create the AuxiliaryField backed by the GPU array
w_gpu = AuxiliaryField(gpu_data, lattice; specie=:A)
# Create the SCFT solver
scft_gpu = NoncyclicChainSCFT(system, w_gpu;
    init=:randn,
    rng=Xoshiro(1234)
)
# Run the solver
# All heavy computations (propagators, FFTs, field updates) happen on the GPU
Polyorder.solve!(scft_gpu)
```
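After the solve, GPU-resident data can be copied back to host memory for analysis or plotting with `Array`. The snippet below assumes the field's raw array is reachable via the `.data` property used above:

```julia
# Copy the converged field data from the GPU back to a regular Julia Array
w_host = Array(w_gpu.data)
```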
## How It Works

Polyorder.jl uses a device-agnostic approach:
- Field Storage: The type of array used for the fields (`w`, `ϕ`) determines the execution backend. Using a `CuArray` triggers GPU execution.
- Kernel Dispatch: AcceleratedKernels.jl and KernelAbstractions.jl are used to write kernels that compile for both CPU and GPU (see the sketch after this list).
- FFTs: The `AbstractFFTs` interface automatically uses `CUDA.CUFFT` when acting on `CuArray`s.
- Linear Algebra: Updaters like Anderson Mixing use generic linear algebra that works seamlessly on GPU arrays.
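To illustrate the kernel-dispatch point, here is a minimal KernelAbstractions.jl kernel written in the same device-agnostic style; it is an illustrative sketch, not Polyorder.jl's internal code. The same function runs on a plain `Array` or a `CuArray`, because the backend is inferred from the array itself:

```julia
using KernelAbstractions

# y .= a .* x .+ y, written once for every backend
@kernel function saxpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

function saxpy!(y, a, x)
    backend = get_backend(y)              # CPU() for Array, CUDABackend() for CuArray
    kernel! = saxpy_kernel!(backend)
    kernel!(y, a, x; ndrange=length(y))   # launch one work-item per element
    KernelAbstractions.synchronize(backend)
    return y
end

# Works identically for y = rand(1024) on the CPU or y = CUDA.rand(1024) on the GPU
```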
## Performance Tips
### Precision (Float64 vs Float32)
SCFT simulations typically use `Float64` (double precision) for accuracy, but most consumer GPUs are optimized for `Float32` (single precision).
- Default: `AuxiliaryField(..., init=:randn)` creates `Float64` arrays. `CuArray(...)` preserves the element type.
- Faster Option: If single precision is sufficient for your problem, convert the field to `Float32` before creating the `AuxiliaryField`:

```julia
# `w_temp` is an existing field; copy its data to the GPU in single precision
w_gpu_data_f32 = CuArray{Float32}(w_temp.data)
w_gpu_f32 = AuxiliaryField(w_gpu_data_f32, lattice)
```

### Grid Sizes
FFT performance on GPUs is highly sensitive to grid dimensions. Sizes that are powers of 2 (e.g., 32, 64, 128) or products of small primes are significantly faster.
Use `best_N_fft` to find an optimal grid size:

```julia
N = best_N_fft(desired_size; pow2=true)
```
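If you want to see what such a helper computes, Julia's built-in `nextprod` and `nextpow` give comparable FFT-friendly sizes (plain Base Julia, independent of Polyorder.jl's `best_N_fft`):

```julia
# Smallest size ≥ 45 that factors entirely into 2s, 3s, 5s, and 7s
N_composite = nextprod([2, 3, 5, 7], 45)   # 45 = 3·3·5

# Smallest power of two ≥ 45 – usually the fastest choice for CUFFT
N_pow2 = nextpow(2, 45)                    # 64
```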
## Troubleshooting

### Out of Memory (OOM)
SCFT stores multiple copies of the propagators as well as history arrays used for convergence acceleration. If you run out of GPU memory:
- Reduce the grid size.
- Use `Float32` precision.
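Before shrinking the grid, it can help to see how much device memory is actually available and how large a single field is. The queries below are standard CUDA.jl calls; the grid size is just the one from the example above:

```julia
using CUDA

# Free vs. total device memory, in GiB
free_gib  = CUDA.available_memory() / 2^30
total_gib = CUDA.total_memory() / 2^30
println("GPU memory: $(round(free_gib, digits=2)) GiB free of $(round(total_gib, digits=2)) GiB")

# Footprint of one Float64 field on a 48×48×16 grid; propagators and
# mixing history multiply this by a problem-dependent factor
field_mib = sizeof(Float64) * 48 * 48 * 16 / 2^20
println("One field ≈ $(round(field_mib, digits=2)) MiB")
```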