ModelWithGradiend
Bases: Module, ABC
Abstract base class that combines a base model (neural network) with a GRADIEND model.
The GRADIEND model holds encoder/decoder weights. This adapter:

- interprets GRADIEND IO (source/target),
- defines how gradients are created (create_gradients),
- provides encode() and rewrite_base_model(),
- persists adapter-level config next to the GRADIEND checkpoint.
Important refactor invariant: self.gradiend.param_map is a dict mapping each base parameter name to a param-spec:

    {"shape": tuple[int, ...], "repr": "all" | "mask" | "indices", ("mask": BoolTensor), ("indices": LongTensor)}

Construction-time normalization happens here (in the adapter), since shapes come from base_model.
Subclasses must implement:

- create_gradients(...)
- _save_model(...)
- _load_model(...)
- _create_gradiend(...)
Source code in gradiend/model/model_with_gradiend.py
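As an illustration of the param-spec invariant, a minimal pure-Python sketch of the three `repr` variants (toy parameter names and shapes; the real implementation stores torch tensors for `mask`/`indices`):

```python
import math

# Hypothetical param_map in the normalized spec-dict format described above.
# "all"     -> every element of the parameter feeds the GRADIEND input space
# "mask"    -> only elements where the boolean mask is True
# "indices" -> only the listed flat indices
param_map = {
    "layer.0.weight": {"shape": (4, 3), "repr": "all"},
    "layer.0.bias":   {"shape": (4,),   "repr": "mask",
                       "mask": [True, False, True, True]},
    "head.weight":    {"shape": (2, 3), "repr": "indices",
                       "indices": [0, 4, 5]},
}

def spec_size(spec):
    """Number of GRADIEND input dimensions contributed by one parameter."""
    if spec["repr"] == "all":
        return math.prod(spec["shape"])
    if spec["repr"] == "mask":
        return sum(spec["mask"])
    return len(spec["indices"])

input_dim = sum(spec_size(s) for s in param_map.values())
print(input_dim)  # 4*3 + 3 + 3 = 18
```

The total of the per-parameter contributions gives the GRADIEND input dimension.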
__len__
__str__
Source code in gradiend/model/model_with_gradiend.py
_create_gradiend
classmethod
Create a new ParamMappedGradiendModel when loading a path that is not a GRADIEND checkpoint.
Uses the modality-agnostic build_gradiend_from_base_model (backbone vs. head split). When pre_prune_config is set, uses lazy_init=True so that encoder/decoder are built only after pruning. Subclasses may override for custom behavior.
Source code in gradiend/model/model_with_gradiend.py
_ensure_gradiend_param_map_spec
Validate gradiend.param_map is in spec-dict format.
param_map must be a dict of per-parameter specs including "shape" and "repr".
Source code in gradiend/model/model_with_gradiend.py
_get_device_config
classmethod
Optional hook: return device_encoder, device_decoder, device_base_model (or similar) for loading. Default uses resolve_device_config_for_model; subclasses may override _resolve_device_config.
Source code in gradiend/model/model_with_gradiend.py
_load_model
abstractmethod
classmethod
Subclass hook to load the base model and modality-specific components.
When base_model_id is set, load_directory is a GRADIEND checkpoint directory and the base model should be loaded from base_model_id (gradiend_kwargs may contain, e.g., a tokenizer path). When base_model_id is None, load_directory is the base model path/name, and the base model should be loaded from it.
Returns:

| Type | Description |
|---|---|
| `tuple` | (base_model, *extra) where extra are modality-specific training_args for the subclass constructor (e.g. tokenizer for text). |
Source code in gradiend/model/model_with_gradiend.py
_post_init_from_pretrained
Optional hook called after from_pretrained builds the instance (e.g. to freeze base model layers). Subclasses may override; default no-op.
_resolve_device_config
classmethod
Hook for resolving device placement for base/encoder/decoder.
Source code in gradiend/model/model_with_gradiend.py
_save_model
abstractmethod
Subclass hook to persist base-model artifacts.
Implementations typically save the base model, and any modality-specific files needed to restore the model at load time (e.g., tokenizer).
Source code in gradiend/model/model_with_gradiend.py
cpu
create_gradients
abstractmethod
Create GRADIEND input gradients for a modality-specific example.
Expected to run the base model forward/backward and return either:

- a 1D tensor in GRADIEND input space, or
- a dict of per-parameter gradient tensors compatible with the GRADIEND param_map.
Subclasses decide how to build inputs, labels, and loss for their modality.
Source code in gradiend/model/model_with_gradiend.py
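The contract can be sketched without a real model: a hypothetical subclass returns a per-parameter dict whose keys and element counts line up with param_map, which the adapter can flatten into the 1D GRADIEND input space (toy numbers; no actual forward/backward pass):

```python
# Toy param_map: two parameters, "all" representation.
param_map = {
    "w": {"shape": (2, 2), "repr": "all"},
    "b": {"shape": (2,),   "repr": "all"},
}

def create_gradients(example):
    """Hypothetical modality-specific implementation; the real method runs
    the base model forward/backward on `example`. Here we fake the values."""
    return {
        "w": [[0.1, -0.2], [0.3, 0.0]],  # matches shape (2, 2)
        "b": [0.5, -0.5],                # matches shape (2,)
    }

def flatten_to_input_space(grads, param_map):
    """Flatten per-parameter gradients into the 1D GRADIEND input space,
    following param_map key order (row-major for 2D parameters)."""
    flat = []
    for name, spec in param_map.items():
        g = grads[name]
        if isinstance(g[0], list):  # 2D -> row-major flatten
            g = [x for row in g for x in row]
        flat.extend(g)
    return flat

vec = flatten_to_input_space(create_gradients("some example"), param_map)
print(len(vec))  # 6 = 4 + 2
```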
cuda
Move base_model and gradiend to CUDA. device may be None (default CUDA device), an int (cuda:N), or a str/torch.device.
Source code in gradiend/model/model_with_gradiend.py
encode
Encode input to latent space.
Supports:

- raw modality input (e.g. str) -> create_gradients -> encode
- an already-created gradient tensor in GRADIEND input space
Source code in gradiend/model/model_with_gradiend.py
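A minimal sketch of that dispatch, with a toy linear encoder and hypothetical stand-in names:

```python
ENCODER_W = [0.5, -1.0, 0.25]  # toy encoder weights over a 3-dim input space

def create_gradients(text):
    """Stand-in for the real modality-specific gradient creation."""
    return [1.0, 2.0, 4.0]

def encode(x):
    """Raw modality input (str) -> create_gradients -> encoder;
    an already-created gradient vector is encoded directly."""
    if isinstance(x, str):
        x = create_gradients(x)
    return sum(w * v for w, v in zip(ENCODER_W, x))  # toy 1-dim latent

# Both call forms reach the same latent value.
assert encode("hello") == encode([1.0, 2.0, 4.0])
print(encode("hello"))  # 0.5*1 - 1.0*2 + 0.25*4 = -0.5
```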
from_pretrained
classmethod
Load a ModelWithGradiend from a directory (GRADIEND checkpoint) or create new from base model path.
Common logic: normalize the path and try ParamMappedGradiendModel.from_pretrained. Then either load the base model via _load_model(..., base_model_id=..., gradiend_kwargs=...), or use _load_model followed by _create_gradiend; load gradiend_context.json (source/target), instantiate cls(...), and restore feature_class_encoding_direction. Modality-specific loading lives in _get_device_config, _load_model, and _create_gradiend.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| load_directory | `Any` | Directory path or model identifier to load from. | required |
| require_gradiend_model | `bool` | If True, load_directory must be a GRADIEND checkpoint. Raises FileNotFoundError (or ValueError) if not found. | `False` |
| feature_definition | `Optional[Any]` | Optional FeatureLearningDefinition instance. When provided, its pair and classes attributes are used to set feature_class_encoding_direction on the model. | `None` |
| **kwargs | `Any` | Additional arguments passed to _load_model and _create_gradiend. | `{}` |
Source code in gradiend/model/model_with_gradiend.py
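The branching described above can be sketched as plain control flow (hypothetical helper, not the real API; return values name the hooks each path delegates to):

```python
def resolve_load_plan(is_gradiend_checkpoint, require_gradiend_model=False):
    """Sketch of from_pretrained's two paths: an existing GRADIEND checkpoint
    restores the stored gradiend and loads the base model via
    _load_model(base_model_id=...); any other path loads the base model
    directly and builds a fresh gradiend via _create_gradiend."""
    if is_gradiend_checkpoint:
        return ["ParamMappedGradiendModel.from_pretrained",
                "_load_model(base_model_id=...)"]
    if require_gradiend_model:
        raise FileNotFoundError("not a GRADIEND checkpoint")
    return ["_load_model(path)", "_create_gradiend(base_model)"]

print(resolve_load_plan(True))
print(resolve_load_plan(False))
```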
get_enhancer_mask
Source code in gradiend/model/model_with_gradiend.py
get_topk_weights
Return top-k base-global indices by importance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| part | `str` | Importance source passed to get_weight_importance. Options: "encoder-weight", "decoder-weight", "decoder-bias", "decoder-sum". | `'decoder-weight'` |
| topk | `int` | Number of indices to return (clipped to input_dim), or a proportion in (0, 1]. | `1000` |
|
Returns:

| Type | Description |
|---|---|
| `List[int]` | List of base-global input indices (length k) sorted by descending importance. A base-global index is the index in the flattened input space of the base model parameters, not the GRADIEND input space, so that differently pruned GRADIEND models remain comparable. |
Source code in gradiend/model/model_with_gradiend.py
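The index selection can be sketched in pure Python with toy importance values (the proportion-rounding choice here is an assumption, not necessarily the library's):

```python
def topk_indices(importance, topk):
    """Return indices sorted by descending importance. A float topk in (0, 1]
    is interpreted as a proportion of the input dimensions; an int is clipped
    to the number of dimensions."""
    n = len(importance)
    if isinstance(topk, float):
        k = max(1, round(topk * n))  # rounding is an assumption
    else:
        k = min(topk, n)             # clip to input_dim
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order[:k]

imp = [0.1, 0.9, 0.4, 0.7]
print(topk_indices(imp, 2))    # [1, 3]
print(topk_indices(imp, 0.5))  # proportion: also [1, 3]
print(topk_indices(imp, 100))  # clipped: [1, 3, 2, 0]
```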
get_weight_importance
Return per-input-dimension importance from GRADIEND weights.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| part | `str` | Which component to use for importance aggregation: "encoder-weight" (L1 over encoder weight columns), "decoder-weight" (L1 over decoder weight rows), "decoder-bias" (absolute decoder bias), "decoder-sum" (absolute(sum(weight_row) + bias)). | `'decoder-weight'` |
Returns:

| Type | Description |
|---|---|
| `torch.Tensor` | 1D CPU float tensor of length input_dim, where higher means more influential according to the chosen aggregation. |
Source code in gradiend/model/model_with_gradiend.py
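The four aggregations can be sketched on a toy GRADIEND with a 1-dim latent, using nested lists in place of tensors (the real method returns a 1D CPU float tensor):

```python
# Toy GRADIEND with input_dim = 3 and a 1-dim latent.
enc_w = [[0.2, -0.5, 0.1]]      # encoder weight, shape (latent, input_dim)
dec_w = [[0.4], [-0.3], [0.0]]  # decoder weight, shape (input_dim, latent)
dec_b = [0.1, -0.2, 0.3]        # decoder bias, length input_dim

def importance(part):
    if part == "encoder-weight":  # L1 over encoder weight columns
        return [sum(abs(row[i]) for row in enc_w)
                for i in range(len(enc_w[0]))]
    if part == "decoder-weight":  # L1 over decoder weight rows
        return [sum(abs(x) for x in row) for row in dec_w]
    if part == "decoder-bias":    # absolute decoder bias
        return [abs(b) for b in dec_b]
    if part == "decoder-sum":     # |sum(weight_row) + bias|
        return [abs(sum(row) + b) for row, b in zip(dec_w, dec_b)]
    raise ValueError(part)

print(importance("decoder-weight"))  # [0.4, 0.3, 0.0]
print(importance("decoder-sum"))     # [0.5, 0.5, 0.3]
```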
invert_encoding
Invert encoder direction by flipping encoder/decoder signs.
This preserves reconstruction while flipping the sign of the latent feature. Set update_direction=True only for manual/user-driven flips.
Source code in gradiend/model/model_with_gradiend.py
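The sign-flip argument can be checked on a toy linear encoder/decoder (no bias terms in this sketch; the real model handles biases consistently):

```python
enc_w = [0.5, -1.0]  # toy encoder weights (1-dim latent)
dec_w = [0.8, 0.2]   # toy decoder weights

def encode(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(h, w):
    return [wi * h for wi in w]

x = [2.0, 0.5]
h = encode(x, enc_w)
# Flip both encoder and decoder signs: the latent negates,
# but the reconstruction is unchanged.
h_flipped = encode(x, [-w for w in enc_w])
assert h_flipped == -h
assert decode(h, dec_w) == decode(h_flipped, [-w for w in dec_w])
```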
parameters
Return GRADIEND parameters (adapter exposes GRADIEND weights as trainable parameters).
prune_gradiend
prune_gradiend(*, topk=None, threshold=None, mask=None, part='decoder-weight', importance=None, inplace=True, return_mask=False)

Prune the GRADIEND input space by selecting important input dimensions and physically reducing gradiend.input_dim. Converts gradiend.param_map from list to dict internally; the pruned gradiend will have dict(param -> bool mask).

Selection is applied in fixed order: mask -> threshold -> topk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| topk | `Optional[float]` | int (absolute count) or float in (0, 1] (relative fraction of the remaining dims). | `None` |
| threshold | `Optional[float]` | Keep dims with importance >= threshold (importance from get_weight_importance(part) or the importance arg). | `None` |
| mask | `Optional[Tensor]` | Optional bool mask of shape (gradiend.input_dim,) in the current GRADIEND input space. | `None` |
| part | `str` | 'encoder-weight' \| 'decoder-weight' \| 'decoder-bias' \| 'decoder-sum' (delegated to get_weight_importance when importance is None). | `'decoder-weight'` |
| importance | `Optional[Tensor]` | Optional 1D tensor (e.g. from a pre-prune gradient mean); when provided, used instead of get_weight_importance(part). | `None` |
| inplace | `bool` | If True, mutate self; otherwise return a deepcopy with the pruned gradiend. | `True` |
| return_mask | `bool` | If True, also return the final combined mask (in the original input space). | `False` |
Returns:

| Type | Description |
|---|---|
| `Union['ModelWithGradiend', Tuple['ModelWithGradiend', Tensor]]` | model (self or copy), or (model, combined_mask) if return_mask=True. |
Source code in gradiend/model/model_with_gradiend.py
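The fixed mask -> threshold -> topk order can be sketched as boolean-mask composition in pure Python (the real code operates on torch bool tensors; proportion rounding is an assumption):

```python
def combine_selection(importance, mask=None, threshold=None, topk=None):
    """Apply mask -> threshold -> topk, each step narrowing the survivors."""
    n = len(importance)
    keep = list(mask) if mask is not None else [True] * n
    if threshold is not None:
        keep = [k and importance[i] >= threshold for i, k in enumerate(keep)]
    if topk is not None:
        alive = [i for i, k in enumerate(keep) if k]
        if isinstance(topk, float):  # relative fraction of remaining dims
            topk = max(1, round(topk * len(alive)))
        top = set(sorted(alive, key=lambda i: importance[i],
                         reverse=True)[:topk])
        keep = [k and i in top for i, k in enumerate(keep)]
    return keep

imp = [0.9, 0.1, 0.6, 0.8]
# mask drops dim 3, threshold drops dim 1, topk keeps the single best survivor.
print(combine_selection(imp, mask=[True, True, True, False],
                        threshold=0.5, topk=1))  # [True, False, False, False]
```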
pruned_length
rewrite_base_model
Rewrite the base model by applying GRADIEND-derived updates.
General form: `base_model + learning_rate * enhancer(part, feature_factor)`, where enhancer(part, feature_factor) is defined by the selected part below.
part:

- 'decoder': uses decoder(feature_factor)
- 'decoder-weight': uses the decoder weight vector
- 'decoder-bias': uses the decoder bias vector
- 'decoder-sum': uses the decoder (weight + bias) vector
- 'encoder-weight': uses the encoder weights
Source code in gradiend/model/model_with_gradiend.py
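The update rule can be sketched elementwise on toy vectors; `enhancer` here is a hypothetical stand-in for whichever decoder/encoder-derived vector the selected part produces:

```python
def rewrite(base_params, enhancer, learning_rate):
    """base_model + learning_rate * enhancer(part, feature_factor),
    applied per input dimension."""
    return [p + learning_rate * e for p, e in zip(base_params, enhancer)]

base = [1.0, -2.0, 0.5]
enhancer = [0.4, -0.3, 0.0]  # e.g. the decoder weight vector ('decoder-weight')
print(rewrite(base, enhancer, 0.5))  # [1.2, -2.15, 0.5]
```

A feature_factor of opposite sign would push the parameters in the opposite direction along the same learned axis.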
save_pretrained
Save base model artifacts + GRADIEND metadata and weights.
Writes:

- gradiend_context.json (source/target and optional feature_class_encoding_direction)
- subclass hook _save_model(...)
- gradiend.save_pretrained(...)
Source code in gradiend/model/model_with_gradiend.py
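The adapter-level metadata write can be sketched with the standard library (the filename comes from the docstring above; the exact JSON key names are an assumption):

```python
import json
import os
import tempfile

def save_gradiend_context(save_directory, source, target, direction=None):
    """Write gradiend_context.json next to the GRADIEND checkpoint.
    The real method additionally calls the _save_model(...) hook and
    gradiend.save_pretrained(...)."""
    context = {"source": source, "target": target}
    if direction is not None:
        # hypothetical key name for feature_class_encoding_direction
        context["feature_class_encoding_direction"] = direction
    path = os.path.join(save_directory, "gradiend_context.json")
    with open(path, "w") as f:
        json.dump(context, f, indent=2)
    return path

with tempfile.TemporaryDirectory() as d:
    p = save_gradiend_context(d, source="A", target="B", direction=1)
    with open(p) as f:
        print(json.load(f)["feature_class_encoding_direction"])  # 1
```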
set_feature_class_encoding_direction
Set feature_class_encoding_direction from configuration (class_labels). Set-once.
Direction is taken directly from class_labels: +1, -1, or 0 (neutral).
Source code in gradiend/model/model_with_gradiend.py
to
Move base_model and gradiend to the given device. Accepts str or torch.device.
Source code in gradiend/model/model_with_gradiend.py
unpruned_length
with_original_base_model
Return a copy of this ModelWithGradiend with base_model replaced by new_base.
Used when the base model has a specialized head for training but evaluation should use the original underlying model. The gradiend and other attributes (name_or_path, tokenizer, etc.) are preserved.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new_base | `Module` | The original/base model to use for evaluation. | required |
Returns:

| Type | Description |
|---|---|
| `'ModelWithGradiend'` | A new ModelWithGradiend instance with base_model=new_base, the same gradiend, and param_lookup recomputed from new_base. |
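The copy-and-swap can be sketched with a toy class holding only the relevant attributes (strings stand in for real modules; the real method also recomputes param_lookup from the new base):

```python
import copy

class ToyModelWithGradiend:
    """Toy stand-in: only the attributes relevant to the copy-and-swap."""
    def __init__(self, base_model, gradiend, name_or_path):
        self.base_model = base_model
        self.gradiend = gradiend
        self.name_or_path = name_or_path

    def with_original_base_model(self, new_base):
        clone = copy.copy(self)      # shallow copy keeps gradiend and config
        clone.base_model = new_base  # real code also recomputes param_lookup
        return clone

m = ToyModelWithGradiend("head_model", "gradiend_weights", "demo")
m2 = m.with_original_base_model("plain_model")
# The original instance is untouched; only the copy swaps its base model.
print(m2.base_model, m2.gradiend, m.base_model)
```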