Low Memory Mode

Polyorder.jl provides a Low Memory Mode utilizing Checkpointing and Shared Caches to significantly reduce memory usage during SCFT simulations. This enables running high-resolution 3D simulations (e.g., $128^3$ or larger) on memory-constrained hardware (e.g., GPUs with 11-24 GB VRAM) that would otherwise exceed memory limits.

Note: This mode trades compute time for memory. In exchange for a typical 50-80% reduction in memory usage, each SCFT iteration takes roughly twice as long (~2x slowdown) because propagators are recomputed.

Prerequisites

Low Memory Mode is most useful in one of the following scenarios:

Large-scale 3D simulations: Grid sizes of $128^3$ or larger.
Memory-constrained hardware: Running on consumer/gaming GPUs or workstations with limited RAM.
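
As a rough way to judge whether a run falls into these scenarios, the sketch below estimates the memory needed to hold full propagator histories. It is a back-of-the-envelope calculation, not Polyorder.jl's internal accounting: the function name propagator_history_bytes, the 100 contour steps, and the count of two propagators (forward and backward) are illustrative assumptions.

```julia
# Back-of-the-envelope estimate of propagator history memory.
# All names and numbers here are illustrative assumptions, not Polyorder.jl API.

# Bytes needed to store `n_propagators` full histories, each holding `n_steps`
# snapshots of a grid with dimensions `grid` and element type `T`.
function propagator_history_bytes(grid::NTuple{N,Int}, n_steps::Int;
                                  T::Type=Float64, n_propagators::Int=2) where {N}
    return prod(grid) * sizeof(T) * n_steps * n_propagators
end

grid = (128, 128, 128)
n_steps = 100  # hypothetical number of contour steps

full64 = propagator_history_bytes(grid, n_steps)
full32 = propagator_history_bytes(grid, n_steps; T=Float32)

println("Float64: ", round(full64 / 2^30; digits=2), " GiB")
println("Float32: ", round(full32 / 2^30; digits=2), " GiB")
```

With these assumed numbers, two Float64 histories already take about 3.1 GiB. Systems with more blocks, more contour steps, or larger grids scale this figure proportionally.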

Usage

Basic Usage

To enable Low Memory Mode, simply set the low_memory=true keyword argument when creating your NoncyclicChainSCFT solver. This automatically enables both checkpointing and shared cache optimizations with auto-tuned parameters.

using Polyorder

# 1. Define typical system and fields
system = ...
w = ...
ds = ...

# 2. Enable Low Memory Mode
scft = NoncyclicChainSCFT(system, w, ds;
    low_memory=true,    # <--- Enables checkpointing and shared caches
    init=:randn
)

# 3. Run as usual
Polyorder.solve!(scft)

Advanced Configuration

You can customize the checkpointing behavior using the mde_options NamedTuple.

scft = NoncyclicChainSCFT(system, w, ds; 
    low_memory=true,
    mde_options=(; 
        shared_cache=true, # Use shared cache pool (recommended: true)
        k=0                # Checkpoint interval (0 = auto-tune)
    )
)

shared_cache: If true (default), multiple propagators share a single recomputation buffer. This saves significantly more memory but requires that propagators are not accessed simultaneously (which is true for standard SCFT).
k: The checkpointing interval. k=0 (default) calculates the optimal $k$ analytically to minimize memory usage. You can manually set a positive integer $k \ge 1$ if needed (e.g., k=10 stores a checkpoint every 10th step).
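
To make the meaning of k concrete, the sketch below lists which contour steps would be kept as checkpoints for a given interval and counts the snapshots held in memory at once. The helper names checkpoint_steps and snapshots_in_memory are hypothetical, and the indexing convention (steps numbered 0 to Ns, with step 0 always stored) is an assumption; Polyorder.jl's actual layout and auto-tuning formula may differ.

```julia
# Hypothetical helpers illustrating a fixed checkpoint interval k.
# Assumption: contour steps are numbered 0:Ns and step 0 is always stored.

# Steps whose propagator values are kept as checkpoints.
checkpoint_steps(Ns::Int, k::Int) = collect(0:k:Ns)

# Snapshots resident at once: all checkpoints plus a buffer for the
# k - 1 intermediate steps of the segment being recomputed.
snapshots_in_memory(Ns::Int, k::Int) = length(0:k:Ns) + (k - 1)

Ns = 100  # illustrative number of contour steps
for k in (1, 5, 10, 20)
    println("k = ", k, ": ", snapshots_in_memory(Ns, k), " snapshots")
end
println(checkpoint_steps(Ns, 10))
```

Under these assumptions, k=10 keeps 20 snapshots for Ns=100, compared with 101 when every step is stored (k=1). Both smaller and larger intervals (k=5 or k=20) need 25.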

CPU vs GPU Support

Low Memory Mode is fully device-agnostic:

CPU: Reduces RAM usage, allowing you to run massive grids or multiple concurrent jobs on a workstation.
GPU: Critical for running large 3D simulations on limited VRAM.

Example: GPU + Low Memory

Combine GPU arrays with low_memory=true to maximize grid size on your graphics card.

using CUDA, Polyorder

# 1. Set up system and lattice
# create lattice (lat), polymer system (sys), and ds here...

# 2. Create GPU-backed field
w_gpu = AuxiliaryField(CUDA.zeros(Float64, 128, 128, 128), lat)

# 3. Enable low memory checkpointing
scft_gpu = NoncyclicChainSCFT(sys, w_gpu, ds; 
    low_memory=true,  # Critical for 128^3 on consumer GPUs
    init=:randn
)

Polyorder.solve!(scft_gpu)

How It Works

This mode utilizes optimal checkpointing to reduce the storage complexity of propagator history from $\mathcal{O}(N_s)$ to $\mathcal{O}(\sqrt{N_s})$.
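
The square-root scaling follows from a simple counting argument. The following is a sketch, assuming a single uniform checkpoint interval $k$ and one recomputation buffer that holds the steps between two adjacent checkpoints; the exact accounting in Polyorder.jl may differ. Here $M(k)$ denotes the number of propagator snapshots held in memory.

```latex
M(k) \approx \frac{N_s}{k} + k, \qquad
\frac{dM}{dk} = -\frac{N_s}{k^2} + 1 = 0
\;\Rightarrow\; k^\ast = \sqrt{N_s}, \qquad
M(k^\ast) \approx 2\sqrt{N_s}.
```

For example, $N_s = 100$ gives $k^\ast = 10$ and roughly 20 stored snapshots instead of 100.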

Checkpointing: Instead of storing the full propagator history ($N_s$ steps), we store "checkpoints" at intervals of $k$.
Recomputation: Intermediate steps between checkpoints are recomputed on-the-fly when needed during density calculation. This effectively means solving the modified diffusion equations (MDEs) twice per iteration, leading to a ~2x increase in compute time.
Shared Cache: A pool of temporary buffers is used for recomputation, shared across different propagators (e.g., forward/backward).
Optimized Implementation: We use pre-allocated buffers and in-place broadcasting (.=) for recomputation, ensuring that the allocation overhead per iteration is negligible (< 4%).
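
The steps above can be illustrated with a toy model. The sketch below is not Polyorder.jl code: it replaces the MDE step with a simple elementwise update, and all function names are hypothetical. It stores every k-th step during a forward sweep, then regenerates each segment into one preallocated buffer using in-place broadcasting, and checks the result against a run that visits every step directly.

```julia
# Toy illustration of checkpointing with recomputation (not Polyorder.jl code).
# The stand-in "propagator" advances by q[s+1] = a .* q[s]; a real MDE step
# would take the place of `step!`.

# Advance one step in place: dst .= a .* src
step!(dst, src, a) = (dst .= a .* src; dst)

# Forward sweep: keep only every k-th step (steps are numbered 0:Ns).
function forward_checkpoints(q0, a, Ns, k)
    checkpoints = Dict{Int,Vector{Float64}}(0 => copy(q0))
    q = copy(q0)
    for s in 1:Ns
        step!(q, q, a)
        s % k == 0 && (checkpoints[s] = copy(q))
    end
    return checkpoints
end

# Regenerate the steps after checkpoint `s0` into preallocated buffers;
# buffer[j] ends up holding step s0 + j. Returns how many steps were filled.
function recompute_segment!(buffer, checkpoints, a, s0, Ns)
    n = min(length(buffer), Ns - s0)
    prev = checkpoints[s0]
    for j in 1:n
        step!(buffer[j], prev, a)
        prev = buffer[j]
    end
    return n
end

# Sum all steps using only the checkpoints plus one shared buffer.
function accumulate_checkpointed(q0, a, Ns, k)
    checkpoints = forward_checkpoints(q0, a, Ns, k)
    buffer = [similar(q0) for _ in 1:(k - 1)]  # allocated once, then reused
    total = zero(q0)
    for s0 in sort!(collect(keys(checkpoints)))
        total .+= checkpoints[s0]
        n = recompute_segment!(buffer, checkpoints, a, s0, Ns)
        for j in 1:n
            total .+= buffer[j]
        end
    end
    return total
end

# Reference: the same sum, visiting every step directly.
function accumulate_full(q0, a, Ns)
    q = copy(q0)
    total = copy(q0)
    for _ in 1:Ns
        step!(q, q, a)
        total .+= q
    end
    return total
end

q0 = [1.0, 2.0, 3.0]
a = [0.9, 0.8, 0.7]
println(accumulate_checkpointed(q0, a, 25, 5) ≈ accumulate_full(q0, a, 25))
```

Every step is computed once in the forward sweep and non-checkpoint steps are computed a second time during recomputation, which is where the roughly doubled cost comes from. The buffer is reused for every segment, so memory stays at the number of checkpoints plus k - 1 snapshots.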

Garbage Collection: Run GC.gc(); CUDA.reclaim() between solves to free up cached GPU memory.
Fresh Session: Run a single large simulation in a fresh Julia session to ensure maximum contiguous memory availability.
Precision: Use Float32 instead of Float64 for your fields. This reduces memory usage by another 50% at the cost of some numerical precision.

w_gpu = AuxiliaryField(CUDA.zeros(Float32, 128, 128, 128), lat)