Evaluator
High-level evaluation coordinator bound to a trainer.
This class owns an `EncoderEvaluator` and a `DecoderEvaluator` and exposes convenience methods that pass the trainer through to them. It can also hold a `Visualizer` instance for evaluation-related plots.
Subclasses can override evaluation or plotting methods to customize caching, metrics, or visualization behavior.
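The subclassing hook described above can be sketched as follows. Note that `Evaluator` here is a minimal stand-in for the gradiend class, not its real implementation; only the method name `evaluate_encoder` follows the documented API, and the stub return values and the `abs_correlation` metric are illustrative assumptions.

```python
from typing import Any, Dict


class Evaluator:
    """Minimal stand-in base class mirroring the documented interface."""

    def evaluate_encoder(self, **kwargs) -> Dict[str, Any]:
        # The real class runs encoder evaluation; here we return a fixed stub.
        return {"correlation": 0.9, "n_samples": 100}


class MyEvaluator(Evaluator):
    """Override an evaluation method to customize metrics."""

    def evaluate_encoder(self, **kwargs) -> Dict[str, Any]:
        result = super().evaluate_encoder(**kwargs)
        # Custom post-processing: attach an extra, hypothetical derived metric.
        result["abs_correlation"] = abs(result["correlation"])
        return result
```

The same pattern applies to the plotting methods: call `super()`, then adjust or replace the result.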
Source code in gradiend/evaluator/evaluator.py
_visualizer_class
instance-attribute
`_visualizer_class = visualizer_class if visualizer_class is not None else _default_visualizer_class()`
_delegate_to_visualizer
_get_visualizer
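A plausible sketch of the delegation helper behind the `plot_*` methods: if a Visualizer is configured, forward the call; otherwise raise `NotImplementedError`, as documented for the plotting methods below. The class bodies and the `_visualizer` attribute name are illustrative assumptions, not gradiend's actual code.

```python
class Visualizer:
    """Stand-in visualizer with one plotting method."""

    def plot_encoder_distributions(self, **kwargs):
        return "figure"  # stand-in for a matplotlib/seaborn figure


class Evaluator:
    def __init__(self, visualizer=None):
        self._visualizer = visualizer

    def _delegate_to_visualizer(self, method_name, **kwargs):
        # No visualizer configured: the caller must override in a subclass.
        if self._visualizer is None:
            raise NotImplementedError(
                f"No Visualizer configured; override {method_name!r} in a subclass."
            )
        return getattr(self._visualizer, method_name)(**kwargs)

    def plot_encoder_distributions(self, **kwargs):
        return self._delegate_to_visualizer("plot_encoder_distributions", **kwargs)
```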
evaluate
Run encoder and decoder evaluation and return a combined result.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kwargs_encoder` | `dict` | Optional dict of keyword arguments forwarded to `evaluate_encoder`. | `None` |
| `kwargs_decoder` | `dict` | Optional dict of keyword arguments forwarded to `evaluate_decoder`. | `None` |
| `**kwargs` | `Any` | Extra kwargs applied to both encoder and decoder evaluations (e.g., shared eval data settings). | `{}` |
Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Combined dict containing both the encoder and decoder evaluation results. |
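A sketch of how the kwargs of `evaluate` plausibly combine (an assumption, not the actual gradiend implementation): shared `**kwargs` apply to both sides, while `kwargs_encoder` / `kwargs_decoder` add side-specific overrides.

```python
def merge_eval_kwargs(kwargs_encoder=None, kwargs_decoder=None, **kwargs):
    # Shared kwargs first, then side-specific dicts override on conflict.
    encoder_kwargs = {**kwargs, **(kwargs_encoder or {})}
    decoder_kwargs = {**kwargs, **(kwargs_decoder or {})}
    return encoder_kwargs, decoder_kwargs


enc, dec = merge_eval_kwargs(
    kwargs_encoder={"split": "test"},  # forwarded only to evaluate_encoder
    use_cache=True,                    # shared setting applied to both sides
)
```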
evaluate_decoder
`evaluate_decoder(model_with_gradiend=None, feature_factors=None, lrs=None, use_cache=None, max_size_training_like=None, max_size_neutral=None, eval_batch_size=None, training_like_df=None, neutral_df=None, selector=None, summary_extractor=None, summary_metrics=None, target_class=None, increase_target_probabilities=True, plot=False, show=None)`
Run decoder grid evaluation and return summary + grid for one direction (strengthen or weaken).
Only the dataset and feature-factor combinations required for the requested direction are computed. Use `increase_target_probabilities=True` (the default) for strengthen, `False` for weaken.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_with_gradiend` | `Any` | Optional `ModelWithGradiend` (or path) to evaluate. If `None`, the trainer's model is used. | `None` |
| `feature_factors` | `Optional[list]` | Optional list of feature factors to test. If `None`, derived from direction and target classes. | `None` |
| `lrs` | `Optional[list]` | Optional list of learning rates to test. If `None`, defaults are used. | `None` |
| `use_cache` | `Optional[bool]` | If `True`, cached decoder grid results are reused when available under the trainer's `experiment_dir`. If `None`, defaults come from trainer training args. | `None` |
| `max_size_training_like` | `Optional[int]` | Maximum size for generated training-like eval data. | `None` |
| `max_size_neutral` | `Optional[int]` | Maximum size for generated neutral eval data (and LMS text cap). | `None` |
| `eval_batch_size` | `Optional[int]` | Common eval batch size used for LMS. | `None` |
| `training_like_df` | `Optional[Any]` | Optional explicit training-like DataFrame. | `None` |
| `neutral_df` | `Optional[Any]` | Optional explicit neutral DataFrame. | `None` |
| `selector` | `Optional[Any]` | Optional `SelectionPolicy` for choosing the best candidate per metric (e.g. `LMSThresholdPolicy`). | `None` |
| `summary_extractor` | `Optional[Any]` | Optional `callable(results) -> (candidates, ctx)`. Use to add derived metrics (e.g. `bpi`, `fpi`, `mpi`) to candidates; then pass `summary_metrics`. | `None` |
| `summary_metrics` | `Optional[Any]` | Optional list of metric names to summarize (e.g. `["bpi", "fpi", "mpi"]`). | `None` |
| `target_class` | `Optional[Any]` | If set (str or list of str), evaluate only for this target class (or classes). Restricts feature factors and datasets for efficiency. When `None`, evaluates for all target classes. | `None` |
| `increase_target_probabilities` | `bool` | If `True` (default), compute strengthen summaries only (keys e.g. `"3SG"`). If `False`, compute weaken summaries only (keys e.g. `"3SG_weaken"`). Only required combinations are evaluated. | `True` |
| `plot` | `bool` | If `True`, after selection run any missing dataset evaluations for plotting, update the cache, then plot. | `False` |
| `show` | `Optional[bool]` | If `True`, display the plot; if `False`, only save. When `None` and `plot=True`, defaults to `True`. | `None` |
Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Flat dict: for strengthen, keys like `result['3SG']`; for weaken, keys like `result['3SG_weaken']`. Each entry has `value`, `feature_factor`, `learning_rate`, `id`, `strengthen`, `lms`, `base_lms`, plus a `'grid'` key. When `plot=True`, also `'plot_paths'` and `'plot_path'`. |
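An illustrative shape of the flat result dict returned for the strengthen direction, using made-up numbers. The top-level key and the per-entry fields follow the documented structure; the concrete values and the `id` string format are assumptions for illustration only.

```python
result = {
    "3SG": {                      # one entry per target class
        "value": 0.42,
        "feature_factor": 1.0,
        "learning_rate": 1e-4,
        "id": "3SG-example-id",   # hypothetical identifier format
        "strengthen": True,
        "lms": 0.91,
        "base_lms": 0.93,
    },
    "grid": {},                   # full grid of evaluated combinations
}

# Pick the selected candidate for a target class and check the LMS cost.
best = result["3SG"]
lms_drop = best["base_lms"] - best["lms"]
```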
evaluate_encoder
`evaluate_encoder(encoder_df=None, eval_data=None, use_cache=None, split=None, max_size=None, **kwargs)`
Run encoder evaluation and return encoding/correlation metrics.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `encoder_df` | `Optional[Union[Any, Dict[str, Any]]]` | Optional DataFrame or dict with an `"encoder_df"` key. If provided, skips encoding and computes metrics from this data. Use `evaluate_encoder(return_df=True)` to get such a dict. | `None` |
| `eval_data` | `Any` | Optional pre-computed `GradientTrainingDataset`. If `None` and `encoder_df` is `None`, the trainer creates eval data via `create_eval_data`. | `None` |
| `use_cache` | `Optional[bool]` | If `True`, reuse the cached JSON result under `experiment_dir` when available. If `None`, defaults come from trainer training args. | `None` |
| `split` | `Optional[str]` | Dataset split for eval data creation. Default: `"test"`. | `None` |
| `max_size` | `Optional[int]` | Maximum samples per variant for eval data creation. | `None` |
| `**kwargs` | `Any` | Forwarded to `create_eval_data` when `encoder_df` and `eval_data` are `None`. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dict with keys: `correlation`, `mean_by_class`, `mean_by_type`, `n_samples`, `all_data`, `training_only`, `target_classes_only`, `boundaries`; optionally `neutral_mean_by_type`, `mean_by_feature_class`, `label_value_to_class_name`. |
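A minimal sketch of the idea behind the `correlation` metric: the Pearson correlation between encoded values and class labels. This mirrors the concept only; gradiend's exact computation and the toy sample values below are not taken from the source.

```python
from statistics import fmean, pstdev


def pearson(xs, ys):
    """Population Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


encoded = [0.9, 0.8, -0.7, -0.95]  # encoder outputs per sample (toy values)
labels = [1, 1, -1, -1]            # class labels per sample (toy values)
corr = pearson(encoded, labels)    # close to 1: encoder separates the classes
```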
plot_encoder_distributions
Plot encoder distributions (typically a violin plot).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Forwarded to the Visualizer implementation. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | Whatever the Visualizer returns (often a matplotlib/seaborn figure). |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If no Visualizer is configured and this method is not overridden in a subclass. |
plot_encoder_scatter
Plot interactive encoder scatter (Plotly: jitter x, encoded y, colored by label).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Forwarded to the Visualizer implementation (`encoder_df`, `show`, etc.). | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | Plotly `Figure`, or `None` if Plotly is not installed. |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If no Visualizer is configured. |
plot_probability_shifts
Plot decoder probability shifts vs learning rate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Forwarded to the Visualizer implementation (`decoder_results`, `class_ids`, `use_cache`, etc.). | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | Path to the saved plot file, or an empty string. |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If no Visualizer is configured. |
plot_training_convergence
Plot training convergence (means by class/feature_class and correlation).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Forwarded to the Visualizer implementation. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | Whatever the Visualizer returns (often a matplotlib/seaborn figure). |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If no Visualizer is configured and this method is not overridden in a subclass. |