Model Architecture¶

This page describes the internal architecture of AIMNet2 models for ML researchers and developers. For model selection and usage, see the Model Selection Guide.

Overview¶

AIMNet2 is an atom-centered neural network potential. The forward pass proceeds through these stages:

AEV Computation -- Expand interatomic distances and directions into a symmetry-adapted descriptor (Atomic Environment Vectors).
Multi-Pass Charge Equilibration -- Iteratively refine per-atom charges using ConvSV convolutions and MLPs, enforcing total charge conservation.
Output Pipeline -- Map final atomic features to per-atom energies, apply self-atomic energy shifts, sum to molecular energy, and add long-range corrections (Coulomb, DFT-D3).

coordinates ──> AEV (AEVSV) ──> [MLP_0] ──> [MLP_1] ──> ... ──> [MLP_final]
   |                 |              |            |                     |
   |            g_sv (radial +   initial     refine               AIM vector
   |             directional)   charges      charges
   |                                                                  |
   |                                                           Output + AtomicShift
   |                                                                  |
   |                                                             AtomicSum
   |                                                                  |
   +--------- LRCoulomb (external) -----> molecular energy <--- DFTD3 (external)

AEV Descriptors (AEVSV)¶

The Atomic Environment Vector module (AEVSV in aimnet/modules/aev.py) encodes the local geometry around each atom as a fixed-length descriptor.

Radial Basis¶

Interatomic distances are expanded over a set of shifted Gaussian basis functions:

\[ g_s(r_{ij}) = \exp\bigl(-\eta\,(r_{ij} - r_s)^2\bigr) \cdot f_c(r_{ij}) \]

Default parameters:

Parameter	Value	Description
`rc_s`	5.0 A	Cutoff radius
`nshifts_s`	16	Number of Gaussian shifts
`rmin`	0.8 A	Minimum distance for shifts
`eta`	Auto	Width, computed as `1 / ((rc - rmin) / nshifts)^2`

The shifts are placed at equal intervals between rmin and rc_s. A cosine cutoff envelope f_c(r) smoothly brings the basis functions to zero at the cutoff radius.

Scalar + Vector Decomposition¶

AIMNet2 decomposes the AEV into scalar (radial) and vector (directional) components. For each neighbor j of atom i:

Scalar component: The Gaussian-expanded distance weighted by the cutoff envelope. Shape: (nshifts,) per neighbor.
Vector component: The scalar component multiplied by the unit direction vector r_ij / |r_ij|. Shape: (nshifts, 3) per neighbor.

These are concatenated into a single tensor g_sv of shape (num_neighbors, nshifts, 4) -- one scalar channel plus three vector channels.

Why Scalar + Vector?

Separating scalar and vector parts allows the network to learn both distance-dependent features (radial symmetry functions) and orientation- dependent features (angular information) within a unified framework, without requiring explicit angular symmetry functions.

ConvSV Convolution¶

The ConvSV module (aimnet/modules/aev.py) is the core convolution operation that combines local geometry (from the AEV) with atomic features. It appears twice in AIMNet2: once for atomic features (conv_a) and once for charge features (conv_q).

Mechanism¶

Given atomic features a (shape (N, nchannel) or (N, nchannel, nshifts) for 2D features) and the AEV descriptor g_sv (shape (N, num_neighbors, nshifts, 4)):

Gather neighbor features: Collect features a_j for all neighbors of each atom using the neighbor list (nbmat).
Contract features with geometry via einsum:

For 1D features:

avf_sv[i, a, g, d] = sum_j  a_j[i, m, a] * g_sv[i, m, g, d]

For 2D features (d2features=True):

avf_sv[i, a, g, d] = sum_j  a_j[i, m, a, g] * g_sv[i, m, g, d]

Split scalar and vector parts:
avf_s -- Scalar part (d=0), flattened to (nchannel * nshifts_s,).
avf_v -- Vector part (d=1,2,3), processed through learned linear combinations via the agh parameter tensor, then squared and summed over the spatial dimension to produce rotationally invariant features. Output: (nchannel * ncomb_v,).
Concatenate: The scalar and vector outputs are concatenated into a single feature vector of size nchannel * nshifts_s + nchannel * ncomb_v.

GPU Acceleration

When 2D features are used on CUDA, ConvSV dispatches to a custom Warp kernel (conv_sv_2d_sp) for sparse gather-and-contract, avoiding the memory overhead of materializing the full neighbor feature tensor.

Multi-Pass Charge Equilibration¶

The core of AIMNet2 is an iterative charge equilibration loop. The model runs N MLP passes (typically 3), each refining the atomic charges while updating atomic features.

Pass Structure¶

Pass 0 (initialization):

Input: ConvSV(a) -- convolution of initial atomic embeddings with geometry.
Output: Initial charges q, charge flexibility f, and feature update delta_a.
Charges are set directly (not added to previous values).

Passes 1 to N-2 (refinement):

Input: ConvSV(a) + ConvSV(q) -- convolution of both atomic and charge features.
Output: Charge correction delta_q, updated flexibility f, feature update delta_a.
Charges are updated as q = q + delta_q.

Pass N-1 (final):

Input: Same as refinement passes.
Output: AIM vector (Atomic Interaction Model) -- the final per-atom representation passed to the output pipeline.

Charge Conservation (NSE)¶

After each pass, charges are redistributed to enforce total charge conservation using the NSE (Neutral Spin Equilibrated) scheme:

\[ q_i^{\text{final}} = q_i^{\text{raw}} + f_i \cdot \frac{Q_{\text{total}} - \sum_j q_j^{\text{raw}}}{\sum_j f_j + \epsilon} \]

where:

q_i^raw is the unconstrained per-atom charge from the MLP.
f_i is the per-atom charge flexibility (always positive, via squaring).
Q_total is the target total charge.
The ratio redistributes the charge deficit proportionally to each atom's flexibility.

This ensures exact charge conservation at every pass without constraining the network outputs directly.

Open-Shell Extension (num_charge_channels=2)¶

For the NSE model (aimnet2nse), charges have two channels: alpha-spin and beta-spin. The model sets num_charge_channels=2.

Preprocessing: The total charge Q and multiplicity M are converted to two-channel targets:

Q_alpha = 0.5 * (Q + (M - 1))
Q_beta  = 0.5 * (Q - (M - 1))

During equilibration: Both channels are equilibrated independently using the same NSE formula, maintaining conservation for each spin channel.

Postprocessing: The two channels are combined:

charges = q_alpha + q_beta (total charge per atom)
spin_charges = q_alpha - q_beta (spin density per atom)

Output Pipeline¶

After the multi-pass loop produces the AIM vector, the output pipeline converts it to physical observables:

1. Output MLP¶

An Output module applies a final MLP to the AIM vector to produce per-atom raw energies:

aim_vector (per atom) --[MLP]--> per-atom energy (scalar)

2. AtomicShift (Self-Atomic Energies)¶

AtomicShift adds element-specific energy offsets (SAE values) to the per-atom energies. These are stored as a learnable nn.Embedding indexed by atomic number:

e_atom = e_raw + SAE[atomic_number]

The SAE values are stored in float64 precision to avoid numerical issues when computing energy differences between large molecules.

3. AtomicSum¶

AtomicSum sums per-atom energies within each molecule to produce molecular energies:

E_mol = sum_i  e_atom_i    (for atoms i in molecule)

4. Long-Range Corrections¶

Two external modules add long-range physics that the short-range NN cannot capture:

LRCoulomb -- Electrostatic energy from predicted charges. Three methods:

Method	Use Case	Neighbor List
`simple`	Non-periodic, small systems	All pairs
`dsf`	Periodic systems	Finite cutoff (default 15 A)
`ewald`	Periodic, high accuracy	Computed from accuracy target

The Coulomb module uses the charges predicted by the equilibration loop and subtracts the short-range Coulomb component (already learned by the NN) to avoid double-counting.

DFTD3 -- Grimme's DFT-D3 dispersion correction with BJ damping. Uses reference C6 coefficients and coordination-number-dependent interpolation. Applied with a smoothed cutoff at 15 A (default).

Data Flow Summary¶

The complete data flow through the model:

Input: coord, numbers, charge [, mult]
  |
  v
AEV (AEVSV): coord + nbmat --> g_sv (N, nnb, nshifts, 4)
  |
  v
Embedding: numbers --> a (N, nfeature [, nshifts])
  |
  v
Pass 0: ConvSV(a, g_sv) --> MLP --> q_initial, f, delta_a
  |                                    |
  |                               NSE(Q, q, f)
  v
Pass 1: ConvSV(a, g_sv) + ConvSV(q, g_sv) --> MLP --> delta_q, f, delta_a
  |                                                      |
  |                                                 NSE(Q, q+dq, f)
  v
Pass 2 (final): ConvSV(a, g_sv) + ConvSV(q, g_sv) --> MLP --> aim_vector
  |
  v
Output MLP: aim --> e_atom_raw
  |
  v
AtomicShift: e_atom_raw + SAE[Z] --> e_atom
  |
  v
AtomicSum: sum(e_atom) --> E_mol
  |
  v
LRCoulomb(charges) + DFTD3(coord) --> E_total

Glossary¶

Term	Full Name	Description
AEV	Atomic Environment Vector	Fixed-length descriptor encoding the local geometry around each atom via Gaussian radial basis functions and directional components.
NSE	Neutral Spin Equilibrated	Charge redistribution scheme that enforces total charge (and optionally spin) conservation by distributing the deficit proportionally to per-atom flexibility values.
SAE	Self-Atomic Energy	Element-specific energy offset (a.k.a. atomic reference energy). Stored as a learnable embedding and added to per-atom NN outputs before summation.
DSF	Damped Shifted Force	Electrostatic method for periodic systems. Applies a damping function and shift to the Coulomb potential at a finite cutoff, avoiding the need for Ewald summation.
SRCoulomb	Short-Range Coulomb	The portion of Coulomb interaction within the NN cutoff radius (5.0 A). When using external Coulomb methods (DSF, Ewald), this component is subtracted to avoid double-counting with the NN.
nb_threshold	Neighbor Threshold	Atom count threshold (default 120) that controls whether molecules are processed in dense batched mode (small systems) or sparse flattened mode (large systems).
ConvSV	Scalar-Vector Convolution	Custom convolution combining neighbor atomic features with AEV geometry descriptors via einsum, producing rotationally invariant output through vector squaring.
AIM	Atomic Interaction Model	The final per-atom feature vector produced by the last MLP pass, which encodes all information needed to predict atomic properties.
D3TS	D3 with Tkatchenko-Scheffler combination rule	Embedded dispersion variant using TS mixing rules for C6 coefficients instead of standard D3 interpolation.
BJ damping	Becke-Johnson damping	Damping function for DFT-D3 that uses element-pair-specific cutoff radii to avoid divergence at short distances.

Key Source Files¶

File	Contents
`aimnet/models/aimnet2.py`	`AIMNet2` model class with multi-pass forward loop
`aimnet/models/base.py`	`AIMNet2Base` class, input preprocessing, `load_model()`
`aimnet/modules/aev.py`	`AEVSV` (AEV computation) and `ConvSV` (convolution)
`aimnet/modules/core.py`	`Output`, `AtomicShift`, `AtomicSum`, `MLP`, `Embedding`
`aimnet/modules/lr.py`	`LRCoulomb`, `SRCoulomb`, `DFTD3`, `D3TS`
`aimnet/ops.py`	`nse()` charge equilibration function
`aimnet/calculators/calculator.py`	`AIMNet2Calculator` inference wrapper