TrainingArguments
Arguments for GRADIEND training (HF Trainer–style, single training_args class).
Pass to Trainer at construction: Trainer(model=..., training_args=TrainingArguments(...)). Used directly by the core training loop.
activation_decoder
class-attribute
instance-attribute
Decoder activation name (e.g. 'id', 'tanh'). None = model default ('id').
activation_encoder
class-attribute
instance-attribute
Encoder activation name (e.g. 'tanh', 'gelu', 'relu'). None = model default ('tanh').
add_identity_for_other_classes
class-attribute
instance-attribute
If True, add identity (factual==alternative) examples for classes not in the target classes used for training.
bias_decoder
class-attribute
instance-attribute
Whether the decoder linear layer uses a bias term. None = model default (True).
convergent_mean_by_class_threshold
class-attribute
instance-attribute
Optional additional convergence criterion: minimum absolute mean encoded value (e.g. abs_mean_by_type['training']). When None, only convergent_score_threshold is used. Set to e.g. 0.5 to require strong separation in addition to correlation.
convergent_metric
class-attribute
instance-attribute
Metric for convergence: "correlation" or "loss". Defaults to "correlation" unless supervised_decoder is True.
convergent_score_threshold
class-attribute
instance-attribute
Threshold for convergence. Defaults to 0.6 for correlation; required for loss.
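The three convergence settings above can be read as one combined check. The following is a minimal sketch of that reading, not the library's implementation: `has_converged` is a hypothetical helper, and the comparison directions (absolute correlation at or above the threshold, loss at or below it) are assumptions.

```python
from typing import Optional


def has_converged(
    metric: str,
    score: float,
    score_threshold: Optional[float] = None,
    abs_mean_by_class: Optional[float] = None,
    mean_by_class_threshold: Optional[float] = None,
) -> bool:
    """Hypothetical convergence check combining the documented criteria."""
    if metric == "correlation":
        # Documented default threshold for correlation is 0.6.
        threshold = 0.6 if score_threshold is None else score_threshold
        # Assumption: the absolute correlation must reach the threshold.
        ok = abs(score) >= threshold
    elif metric == "loss":
        # The docs state that a threshold is required for the loss metric.
        if score_threshold is None:
            raise ValueError("convergent_score_threshold is required for loss")
        # Assumption: a run converges once the loss is at or below the threshold.
        ok = score <= score_threshold
    else:
        raise ValueError(f"unknown convergent_metric: {metric!r}")

    # Optional second criterion: also require strong class separation.
    if mean_by_class_threshold is not None:
        ok = (
            ok
            and abs_mean_by_class is not None
            and abs_mean_by_class >= mean_by_class_threshold
        )
    return ok
```

With `mean_by_class_threshold=0.5`, a run with correlation 0.7 but an absolute mean encoded value of 0.3 would not count as converged under this sketch.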
criterion
class-attribute
instance-attribute
Loss function; None = MSELoss().
decoder_eval_feature_factors
class-attribute
instance-attribute
Feature factors for decoder grid search. None = derive from trainer target classes.
decoder_eval_lrs
class-attribute
instance-attribute
Learning rates for decoder grid search. None = DecoderEvaluator defaults ([1e-2, 1e-3, 1e-4, 1e-5]).
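To illustrate how decoder_eval_feature_factors and decoder_eval_lrs might combine, here is a small sketch that enumerates a grid. It assumes the grid is the Cartesian product of both lists; `decoder_grid` and the example feature factors are hypothetical, and only the default learning rates are taken from the description above.

```python
from itertools import product

# Default learning rates as stated in the documentation above.
DEFAULT_LRS = [1e-2, 1e-3, 1e-4, 1e-5]


def decoder_grid(feature_factors, lrs=None):
    """Hypothetical helper: list every (feature_factor, lr) combination."""
    lrs = DEFAULT_LRS if lrs is None else lrs
    return [
        {"feature_factor": factor, "lr": lr}
        for factor, lr in product(feature_factors, lrs)
    ]


# Example feature factors are placeholders, not library defaults.
grid = decoder_grid([-1.0, 1.0])
```

Under this assumption, two feature factors and the four default learning rates give eight configurations to evaluate.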
decoder_eval_max_size_neutral
class-attribute
instance-attribute
Max samples for decoder neutral evaluation data (also LMS text cap). None = use default behavior.
decoder_eval_max_size_training_like
class-attribute
instance-attribute
Max samples for decoder training-like evaluation data. None = use default behavior.
delete_models
class-attribute
instance-attribute
If True, delete intermediate model files (e.g. .bin) at the end of the run. Useful for saving disk space when you only need the metrics and not the model itself. The model directory itself is not deleted, because it may contain other files (e.g. config, pre/post-prune results).
do_eval
class-attribute
instance-attribute
Whether to run evaluation during training.
encoder_decoder_same_device
class-attribute
instance-attribute
If True, place encoder and decoder on the same GPU (cuda:0), giving the base model the rest. Useful for large base models with pre-pruning: encoder+decoder are small and can share GPU 0; base model can use cuda:1 (2 GPUs) or cuda:2 (3+ GPUs). If False (default), encoder and decoder are split across cuda:0 and cuda:1 when 2+ GPUs are available.
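The placement rules above can be summarized as a small lookup. This is a sketch only: `plan_devices` is a hypothetical function, the behavior with fewer than two GPUs is an assumption, and the base-model device in the default (split) case is left as None because it is not stated here.

```python
from typing import Dict, Optional


def plan_devices(
    num_gpus: int, encoder_decoder_same_device: bool = False
) -> Dict[str, Optional[str]]:
    """Hypothetical device plan following the documented placement rules."""
    if num_gpus < 2:
        # Assumption: with fewer than two GPUs everything shares one device.
        device = "cuda:0" if num_gpus == 1 else "cpu"
        return {"encoder": device, "decoder": device, "base_model": device}

    if encoder_decoder_same_device:
        # Encoder and decoder share cuda:0; the base model gets cuda:1 with
        # exactly two GPUs and cuda:2 with three or more.
        base = "cuda:1" if num_gpus == 2 else "cuda:2"
        return {"encoder": "cuda:0", "decoder": "cuda:0", "base_model": base}

    # Default: encoder and decoder are split across cuda:0 and cuda:1.
    # Base-model placement for this case is not documented, so it stays None.
    return {"encoder": "cuda:0", "decoder": "cuda:1", "base_model": None}
```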
encoder_eval_balance
class-attribute
instance-attribute
If True, balance encoder evaluation data per feature_class_id. If False, use natural class distribution.
encoder_eval_max_size
class-attribute
instance-attribute
Max samples for encoder evaluation outside training (e.g. analysis, manual evaluate_encoder). None = use all available.
encoder_eval_train_max_size
class-attribute
instance-attribute
Max samples for encoder evaluation during training (fast estimate; per-feature_class when available). None = use encoder_eval_max_size.
eval_steps
class-attribute
instance-attribute
Run evaluation every eval_steps steps (when eval_strategy == 'steps').
eval_strategy
class-attribute
instance-attribute
When to run evaluation: 'steps' (every eval_steps) or 'no'.
evaluate_fn
class-attribute
instance-attribute
Custom evaluation function; None = default (encoder correlation on eval data).
experiment_dir
class-attribute
instance-attribute
Root directory for this experiment. When set, default paths use subpaths under it (model, encoded_values, etc.). One experiment dir holds one model. When Trainer.run_id is set, it is used as a subdirectory under experiment_dir.
keep_seed_runs
class-attribute
instance-attribute
If True, keep all per-seed model directories; otherwise delete model files and keep only metrics.
latent_dim
class-attribute
instance-attribute
GRADIEND latent dimension (number of features). None = model default (1).
max_steps
class-attribute
instance-attribute
If > 0, total number of steps; overrides num_train_epochs. -1 = use epochs.
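The precedence between max_steps and num_train_epochs can be sketched as follows. `total_training_steps` and `steps_per_epoch` are hypothetical names used for illustration, and rounding up for fractional epochs is an assumption.

```python
import math


def total_training_steps(
    max_steps: int, num_train_epochs: float, steps_per_epoch: int
) -> int:
    """Hypothetical helper: resolve the number of optimization steps."""
    if max_steps > 0:
        # A positive max_steps overrides the epoch-based count.
        return max_steps
    # max_steps == -1 (or any non-positive value): fall back to epochs.
    return math.ceil(num_train_epochs * steps_per_epoch)
```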
min_convergent_seeds
class-attribute
instance-attribute
Stop once this many seeds have converged. None = run max_seeds. 0 is invalid.
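A multi-seed loop with early stopping could look roughly like the sketch below. It assumes seeds are derived as seed + i (as described for the seed argument) and that `max_seeds` bounds the loop; `run_seeds` and `train_one` are hypothetical names, not library functions.

```python
from typing import Callable, List, Optional


def run_seeds(
    base_seed: int,
    max_seeds: int,
    min_convergent_seeds: Optional[int],
    train_one: Callable[[int], bool],
) -> List[int]:
    """Hypothetical loop: train per seed, stop once enough seeds converged.

    train_one(seed) is assumed to return True when that run converged.
    """
    if min_convergent_seeds == 0:
        # The docs state that 0 is invalid; None means "run all max_seeds".
        raise ValueError("min_convergent_seeds=0 is invalid; use None instead")

    converged: List[int] = []
    for i in range(max_seeds):
        seed = base_seed + i  # per-run seed derived from the base seed
        if train_one(seed):
            converged.append(seed)
        if min_convergent_seeds is not None and len(converged) >= min_convergent_seeds:
            break  # enough convergent seeds, skip the remaining runs
    return converged
```

For example, with `min_convergent_seeds=2` the loop ends as soon as the second convergent seed is found, which saves the cost of the remaining runs.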
model_use_cache
class-attribute
instance-attribute
When False (default), pass use_cache=False to decoder model forward during training (KV cache disabled). Use True only for inference/generation. Decoder-only MLM head training respects this via train_decoder_only_mlm_head.
normalize_gradiend
class-attribute
instance-attribute
Whether to normalize GRADIEND encodings during training so that the first target class is encoded as +1 and the second as -1. Recommended because it makes runs easier to compare.
output_dir
class-attribute
instance-attribute
Directory to save the trained model. If None and experiment_dir is set, uses experiment_dir/model (or experiment_dir/run_id/model when Trainer.run_id is set). Otherwise must be set explicitly.
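The fallback order for output_dir can be written out as a short function. This is a sketch of the rule as described, with a hypothetical name `resolve_output_dir`; POSIX-style paths are used only to keep the example platform independent.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_output_dir(
    output_dir: Optional[str],
    experiment_dir: Optional[str],
    run_id: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented output_dir fallback."""
    if output_dir is not None:
        return output_dir  # an explicit value always wins
    if experiment_dir is None:
        raise ValueError("set output_dir explicitly when experiment_dir is None")
    root = PurePosixPath(experiment_dir)
    if run_id is not None:
        root = root / run_id  # experiment_dir/run_id/model
    return str(root / "model")
```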
params
class-attribute
instance-attribute
If set, only these parameter names or wildcards are included in the GRADIEND param map when building from a base model. None = include all backbone parameters (default). Enables future params selection processes.
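As an illustration of name and wildcard selection, the sketch below filters parameter names with shell-style patterns. The exact wildcard syntax the library accepts is not stated here, so `fnmatch`-style matching is an assumption, and `select_params` and the parameter names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_params(
    names: Iterable[str], patterns: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical filter: keep names matching any pattern (None keeps all)."""
    if patterns is None:
        return list(names)
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Placeholder parameter names for illustration only.
names = [
    "encoder.layer.0.attention.weight",
    "encoder.layer.0.output.bias",
    "embeddings.word.weight",
]
selected = select_params(names, ["encoder.layer.0.*"])
```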
post_prune_config
class-attribute
instance-attribute
If set, post-prune is run automatically after training. The pruned model is kept in memory for subsequent evaluation. No disk save unless you save explicitly.
pre_prune_config
class-attribute
instance-attribute
If set, pre-prune is run automatically before training. The pruned model is kept in memory; training then uses it. No disk save unless you save explicitly.
save_only_best
class-attribute
instance-attribute
If True, keep only the best checkpoint (by evaluation correlation).
save_steps
class-attribute
instance-attribute
Save a checkpoint every save_steps steps when save_strategy == 'steps'.
save_strategy
class-attribute
instance-attribute
'best' (default): keep only best checkpoint by correlation. 'steps': also save periodic checkpoints every save_steps. 'no': no checkpointing.
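The three save_strategy values can be read as a per-step decision. The sketch below encodes that reading with a hypothetical `checkpoint_actions` helper; the action labels are made up for illustration, and `is_best` is assumed to come from the evaluation correlation.

```python
from typing import List, Optional


def checkpoint_actions(
    step: int,
    save_strategy: str = "best",
    save_steps: Optional[int] = None,
    is_best: bool = False,
) -> List[str]:
    """Hypothetical helper: which checkpoints to write at this step."""
    if save_strategy == "no":
        return []  # no checkpointing at all
    if save_strategy not in ("best", "steps"):
        raise ValueError(f"unknown save_strategy: {save_strategy!r}")

    actions: List[str] = []
    if is_best:
        # Both 'best' and 'steps' keep the best checkpoint by correlation.
        actions.append("save_best")
    if save_strategy == "steps" and save_steps and step % save_steps == 0:
        # 'steps' additionally writes periodic checkpoints.
        actions.append("save_periodic")
    return actions
```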
seed
class-attribute
instance-attribute
Random seed for reproducible runs (default 0). The Trainer sets PyTorch/numpy/Python RNG, CUDA determinism, and CUBLAS/OMP env vars; data pipelines use this as random_state. Also the base for multi-seed runs (seed+i). Pass seed=None for non-deterministic runs. If results still vary, call set_seed(42) at the very start of your script or set env CUBLAS_WORKSPACE_CONFIG=:4096:8 and OMP_NUM_THREADS=1 before starting Python.
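If results still vary, a manual seeding routine along the following lines may help. It is a sketch, not the library's set_seed: `seed_everything` is a hypothetical name, NumPy and PyTorch are seeded only if they are installed, and setting the environment variables from inside the script is an assumption that works best when done before those libraries are imported (exporting them in the shell, as described above, is the safer route).

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Hypothetical seeding helper; call it at the very start of a script."""
    # Environment variables named in the documentation above. setdefault keeps
    # any value that was already exported in the shell.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    os.environ.setdefault("OMP_NUM_THREADS", "1")

    random.seed(seed)  # Python RNG

    try:
        import numpy as np

        np.random.seed(seed)  # NumPy global RNG
    except ImportError:
        pass

    try:
        import torch

        torch.manual_seed(seed)  # seeds CPU and CUDA generators
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass
```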
seed_runs_dir
class-attribute
instance-attribute
Directory for per-seed runs. Defaults to experiment_dir/seeds when experiment_dir is set.
seed_selection_eval_max_size
class-attribute
instance-attribute
Max samples for encoder evaluation when selecting the best seed. None = use encoder_eval_max_size.
source
class-attribute
instance-attribute
Source for GRADIEND input: 'factual', 'alternative', or 'diff'.
supervised_decoder
class-attribute
instance-attribute
If True, train only the GRADIEND decoder: decoder(labels) vs target gradients (MSE). Baseline mode. Cannot be True together with supervised_encoder.
supervised_encoder
class-attribute
instance-attribute
If True, train only the GRADIEND encoder: encode(source) vs labels (MSE). Baseline mode.
target
class-attribute
instance-attribute
Target for GRADIEND output: 'factual', 'alternative', or 'diff'.
torch_dtype
class-attribute
instance-attribute
dtype for model; None = torch.float32.
train_batch_size
class-attribute
instance-attribute
Batch size for training (single-device).
train_max_size
class-attribute
instance-attribute
If set, cap training samples per feature_class_id (downsampling).
Note: Balancing is handled automatically by the dataset scheduler via oversampling (cycling through balance groups). This parameter primarily reduces total dataset size for memory/performance. None = use all data.
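Capping per feature_class_id can be pictured as the downsampling step below. This is a sketch under assumptions: samples are represented as dicts with a "feature_class_id" key, sampling is uniform without replacement, and `cap_per_class` is a hypothetical helper rather than a library function.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def cap_per_class(
    samples: List[Dict], train_max_size: Optional[int] = None, seed: int = 0
) -> List[Dict]:
    """Hypothetical downsampling: keep at most train_max_size per class."""
    if train_max_size is None:
        return list(samples)  # None = use all data

    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample["feature_class_id"]].append(sample)

    rng = random.Random(seed)  # local RNG so the result is reproducible
    capped: List[Dict] = []
    for class_id in sorted(by_class):
        group = by_class[class_id]
        if len(group) > train_max_size:
            group = rng.sample(group, train_max_size)
        capped.extend(group)
    return capped
```

Classes that are already below the cap are left untouched here, which matches the note that balancing itself is handled elsewhere by oversampling.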
trust_remote_code
class-attribute
instance-attribute
If True, pass trust_remote_code=True when loading models/tokenizers from Hugging Face (e.g. for EuroBERT).
use_cache
class-attribute
instance-attribute
If True, skip when output path exists (training: model dir; encoder: CSV; etc.). Use False to recompute/retrain.
use_cached_gradients
class-attribute
instance-attribute
Whether to use cached gradients if available. Cached gradients speed up training and evaluation, but can consume a large amount of memory (in-memory cache) and/or disk space (cached files).
__post_init__
Source code in gradiend/trainer/core/arguments.py
__str__
Source code in gradiend/trainer/core/arguments.py
from_dict
classmethod
Create from dict (e.g. loaded from JSON). Canonical keys only.
Source code in gradiend/trainer/core/arguments.py
get
to_dict
Dict for serialization (excludes callables and nn.Module). Canonical keys only.
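The round trip between to_dict and from_dict can be illustrated with a stand-in dataclass. `MiniArgs` is hypothetical and its defaults are placeholders; skipping callables follows the description above, while rejecting unknown keys is one possible reading of "canonical keys only" and is an assumption.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass
class MiniArgs:
    """Hypothetical stand-in with three fields, not the real TrainingArguments."""

    seed: Optional[int] = 0
    source: str = "factual"  # placeholder default for illustration
    evaluate_fn: Optional[Callable] = None

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if callable(value):
                continue  # callables are not serializable, so they are dropped
            out[f.name] = value
        return out

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniArgs":
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            # Assumption: only canonical field names are accepted.
            raise KeyError(f"unknown keys: {sorted(unknown)}")
        return cls(**data)


# Serialize to JSON and load it back; the callable is lost on purpose.
args = MiniArgs(seed=3, evaluate_fn=len)
restored = MiniArgs.from_dict(json.loads(json.dumps(args.to_dict())))
```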