Data
Dataset handling and data loading utilities.
DataGroup
DataGroup(data=None, keys=None, shard=None)
Dict-like container for data arrays with consistent shape.
Args:
data (str | Dict[str, np.ndarray] | None): The data to be used in the dataset.
It can be a string representing the path to the data file in NPZ format or a dictionary where
keys are strings and values are numpy arrays.
keys (List[str] | None): A list of keys to be used from the data dictionary.
shard (Tuple[int, int] | None): A tuple representing the shard index and total shards.
Source code in aimnet/data/sgdataset.py
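The sketch below illustrates how the keys and shard arguments could act on a dictionary of arrays. It does not call aimnet: select_and_shard is a hypothetical helper, the key names are made up, and plain lists stand in for numpy arrays (len and strided slicing behave the same way on both). It assumes that a shard (index, total) keeps every total-th row starting at index, and that "consistent shape" means all arrays share the same leading length. Neither assumption is confirmed by the text above.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def select_and_shard(
    data: Dict[str, Sequence],
    keys: Optional[List[str]] = None,
    shard: Optional[Tuple[int, int]] = None,
) -> Dict[str, Sequence]:
    """Hypothetical helper, not part of aimnet: mimic the keys/shard arguments."""
    # Keep only the requested keys (all keys when keys is None).
    selected = {k: data[k] for k in (keys if keys is not None else data)}
    # Assumption: shard=(index, total) keeps rows index, index+total, ...
    if shard is not None:
        index, total = shard
        selected = {k: v[index::total] for k, v in selected.items()}
    # Assumption: "consistent shape" means equal leading length for every array.
    lengths = {len(v) for v in selected.values()}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent leading dimensions: {sorted(lengths)}")
    return selected


# Illustrative data with four rows; the key names are invented for this sketch.
data = {
    "coord": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "energy": [-1.0, -2.0, -3.0, -4.0],
    "charge": [0, 0, 1, -1],
}
part = select_and_shard(data, keys=["coord", "energy"], shard=(1, 2))
```

Here part holds only the coord and energy entries, restricted to rows 1 and 3 of the original data.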
cv_split(cv=5, seed=None)
Return a list of cv tuples, each containing a train and a val DataGroup.
Source code in aimnet/data/sgdataset.py
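As a rough illustration of what a cv-fold split with a seed typically involves, the sketch below partitions row indices into cv folds and pairs each fold (validation) with the remaining rows (training). It is a generic k-fold sketch, not the aimnet implementation: kfold_indices is a hypothetical helper, and the shuffling and fold assignment are assumptions.

```python
import random
from typing import List, Optional, Tuple


def kfold_indices(
    n: int, cv: int = 5, seed: Optional[int] = None
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: return cv (train_idx, val_idx) pairs over range(n)."""
    idx = list(range(n))
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(idx)
    # Fold i takes every cv-th shuffled index, so fold sizes differ by at most one.
    folds = [idx[i::cv] for i in range(cv)]
    splits = []
    for i, val in enumerate(folds):
        # Training rows are all rows that are not in the current validation fold.
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((sorted(train), sorted(val)))
    return splits


splits = kfold_indices(10, cv=5, seed=0)
```

Each row index appears in exactly one validation fold, and the train and val index lists of any one split are disjoint.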
sample(idx, keys=None)
Return a new DataGroup with the data indexed by idx.
Source code in aimnet/data/sgdataset.py
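The idea of indexing every array with the same idx can be sketched as follows. take_rows is a hypothetical helper and not the aimnet method. It uses plain lists in place of numpy arrays, and it assumes that keys=None means all keys are kept.

```python
from typing import Dict, List, Optional, Sequence


def take_rows(
    data: Dict[str, Sequence],
    idx: Sequence[int],
    keys: Optional[List[str]] = None,
) -> Dict[str, list]:
    """Hypothetical helper: pick the same rows from every selected array."""
    # Assumption: keys=None selects every key in the container.
    chosen = list(data) if keys is None else keys
    # The same index list is applied to each array, so rows stay aligned.
    return {k: [data[k][i] for i in idx] for k in chosen}


# Illustrative data; the key names are invented for this sketch.
group = {"energy": [-1.0, -2.0, -3.0, -4.0], "charge": [0, 0, 1, -1]}
subset = take_rows(group, idx=[3, 0], keys=["energy"])
```

Because one index list drives every array, row i of each entry in the result still describes the same sample.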
SizeGroupedDataset
SizeGroupedDataset(data=None, keys=None, shard=None)
Source code in aimnet/data/sgdataset.py
SizeGroupedSampler
SizeGroupedSampler(ds, batch_size, batch_mode='molecules', shuffle=False, batches_per_epoch=-1, seed=None)
Source code in aimnet/data/sgdataset.py