Tutorial: Detailed workflow (overview)
This page is the map of the detailed workflow. After Start here, the full pipeline has five steps: data, training, intra-model evaluation, model rewrite, and inter-model evaluation. Each step has its own tutorial where the real detail lives.

- Tutorial: Feature Selection and Data Generation — Build training and neutral data (e.g. with spaCy and morphology). Output feeds into the trainer.
- Tutorial: GRADIEND Training — Experiment layout, pruning, multi-seed, convergence plot, and how to configure a real run.
- Tutorial: Intra-Model Evaluation — Encoder and decoder evaluation, including decoder config selection under LMS constraints.
- Tutorial: Model Rewrite — Apply decoder-selected rewrites to produce changed checkpoints.
- Tutorial: Inter-Model Evaluation — Comparing multiple runs (top-k overlap, heatmap).
Use this page to see how the pieces connect in one run; use the part tutorials when you need to understand or customize a step.
Optional dependencies by step
- Data generation: spaCy-based filtering requires pip install gradiend[data] — see Data generation.
- Training & evaluation plots: Convergence plots, encoder/decoder plots, heatmaps and Venn diagrams require pip install gradiend[plot] — see Training, Evaluation (intra-model), Evaluation (inter-model).
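Before starting a long run, it can help to check whether the optional dependencies are importable. The following is a minimal sketch using only the standard library. The mapping from extra to module is an assumption: spaCy is the only dependency named on this page, matplotlib is a guess for the plotting extra, and ASSUMED_EXTRA_MODULES and missing_modules are illustrative names, not part of gradiend.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to a module it provides. Only spaCy is
# named on this page; matplotlib is a guess for the plotting extra.
ASSUMED_EXTRA_MODULES = {"data": ["spacy"], "plot": ["matplotlib"]}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


for extra, modules in ASSUMED_EXTRA_MODULES.items():
    missing = missing_modules(modules)
    if missing:
        print(f"gradiend[{extra}]: missing {missing}; try: pip install gradiend[{extra}]")
    else:
        print(f"gradiend[{extra}]: ok")
```

The check only tests importability; it does not verify versions or that a spaCy language model such as de_core_news_sm is installed.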
Data: precomputed or generate yourself
You can either use precomputed data (e.g. a Hugging Face dataset) or generate the data yourself from raw text, as described in Tutorial: Data generation. The trainer accepts both; see Data handling for all formats.
Option A: Precomputed data (e.g. Hugging Face)
Pass a dataset id as data and, for neutral evaluation, as eval_neutral_data:
# args is a TrainingArguments instance; see "One run, end to end" for a full definition.
trainer = TextPredictionTrainer(
    model="bert-base-german-cased",
    run_id="masc_nom_fem_nom",
    data="aieng-lab/de-gender-case-articles",
    target_classes=["masc_nom", "fem_nom"],
    masked_col="masked",
    split_col="split",
    eval_neutral_data="aieng-lab/wortschatz-leipzig-de-grammar-neutral",
    args=args,
)
Option B: Generate data (e.g. German gender–case, 12 classes)
For the full German definite singular article paradigm there are 12 gender–case cells (3 genders × 4 cases). You define one TextFilterConfig per cell and pass them to TextPredictionDataCreator. Here we show five cells; the rest follow the same pattern (see Data generation for syncretism and the full list).
from gradiend.data import TextFilterConfig, TextPreprocessConfig, TextPredictionDataCreator

feature_targets = [
    TextFilterConfig(targets=["der"], spacy_tags={"pos": "DET", "Case": "Nom", "Gender": "Masc", "Number": "Sing"}, id="masc_nom"),
    TextFilterConfig(targets=["die"], spacy_tags={"pos": "DET", "Case": "Nom", "Gender": "Fem", "Number": "Sing"}, id="fem_nom"),
    TextFilterConfig(targets=["den"], spacy_tags={"pos": "DET", "Case": "Acc", "Gender": "Masc", "Number": "Sing"}, id="masc_acc"),
    TextFilterConfig(targets=["der"], spacy_tags={"pos": "DET", "Case": "Dat", "Gender": "Fem", "Number": "Sing"}, id="fem_dat"),
    TextFilterConfig(targets=["des"], spacy_tags={"pos": "DET", "Case": "Gen", "Gender": "Neut", "Number": "Sing"}, id="neut_gen"),
    # ... e.g. masc_dat, masc_gen, fem_acc, fem_gen, neut_nom, neut_acc, neut_dat for all 12
]

creator = TextPredictionDataCreator(
    base_data="wikipedia",  # load data from Hugging Face (requires the datasets library)
    hf_config="20231101.de",  # German Wikipedia
    # base_data="path/to/texts.csv",  # instead of HF data, use a local CSV file with a 'text' column
    # base_data=["Sentence 1", "Sentence 2"],  # or provide a Python list of strings
    text_column="text",
    preprocess=TextPreprocessConfig(split_to_sentences=True, min_chars=50, max_chars=500),
    spacy_model="de_core_news_sm",
    feature_targets=feature_targets,
)

training = creator.generate_training_data(max_size_per_class=5000, format="per_class")
neutral = creator.generate_neutral_data(
    additional_excluded_words=["der", "die", "das", "den", "dem", "des"],
    max_size=5000,
)
Then pass training and neutral to the trainer as data=training, eval_neutral_data=neutral, and set target_classes to the pair you want for this run (e.g. ["masc_nom", "fem_nom"]).
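To make the 12-cell paradigm and its syncretism concrete, the sketch below lists the standard German definite singular articles per gender–case cell in plain Python, groups the cells by surface form, and enumerates the candidate target_classes pairs. It does not use gradiend. PARADIGM, cell_id, cells_by_form and run_id_for are illustrative names, and joining the two class ids with an underscore to form a run_id is only an assumption modelled on the masc_nom_fem_nom example.

```python
from collections import defaultdict
from itertools import combinations

# Standard German definite singular articles, keyed by (gender, case).
PARADIGM = {
    ("masc", "nom"): "der", ("masc", "acc"): "den",
    ("masc", "dat"): "dem", ("masc", "gen"): "des",
    ("fem", "nom"): "die", ("fem", "acc"): "die",
    ("fem", "dat"): "der", ("fem", "gen"): "der",
    ("neut", "nom"): "das", ("neut", "acc"): "das",
    ("neut", "dat"): "dem", ("neut", "gen"): "des",
}


def cell_id(gender, case):
    """Build a class id in the style used on this page, e.g. 'masc_nom'."""
    return f"{gender}_{case}"


def cells_by_form(paradigm):
    """Group cell ids by article; groups with more than one cell are syncretic."""
    groups = defaultdict(list)
    for (gender, case), article in paradigm.items():
        groups[article].append(cell_id(gender, case))
    return dict(groups)


def run_id_for(pair):
    """Hypothetical naming convention: join the two class ids with '_'."""
    return "_".join(pair)


class_ids = [cell_id(gender, case) for gender, case in PARADIGM]
pairs = list(combinations(class_ids, 2))
forms = cells_by_form(PARADIGM)

print(len(class_ids), "cells,", len(forms), "distinct articles,", len(pairs), "possible pairs")
print("der ->", forms["der"])
print(run_id_for(("masc_nom", "fem_nom")))
```

Because "der" covers masc_nom, fem_dat and fem_gen, the surface form alone cannot identify a cell, which is why each filter also needs its spaCy tags. The pair list includes pairs whose two cells share an article; see Data generation for how syncretism is handled.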
One run, end to end
The following is one run using precomputed data (Option A). To use generated data instead, replace data and eval_neutral_data with the training and neutral outputs of Option B; the training and evaluation steps are the same.
from gradiend import TextPredictionTrainer, TrainingArguments
from gradiend.trainer import PrePruneConfig, PostPruneConfig

# --- Data: set from Option A (precomputed) or Option B (generated) ---
data = "aieng-lab/de-gender-case-articles"  # Option A; or use training from Option B
eval_neutral_data = "aieng-lab/wortschatz-leipzig-de-grammar-neutral"  # Option A; or neutral from Option B

# --- Training args and trainer (see Tutorial: Training) ---
args = TrainingArguments(
    experiment_dir="runs/gender_de_detailed",
    train_batch_size=8,
    max_steps=500,
    eval_steps=100,
    learning_rate=5e-5,
    pre_prune_config=PrePruneConfig(n_samples=16, topk=0.01, source="diff"),
    post_prune_config=PostPruneConfig(topk=0.05, part="decoder-weight"),
    use_cache=True,
    add_identity_for_other_classes=True,
)
trainer = TextPredictionTrainer(
    model="bert-base-german-cased",
    run_id="masc_nom_fem_nom",
    data=data,  # or training from Option B
    eval_neutral_data=eval_neutral_data,  # or neutral from Option B
    target_classes=["masc_nom", "fem_nom"],
    masked_col="masked",
    split_col="split",
    args=args,
)

# --- Train and evaluate (see Tutorial: Evaluation) ---
trainer.train()
trainer.plot_training_convergence()
enc_eval = trainer.evaluate_encoder(max_size=100, return_df=True, plot=True)
dec_results = trainer.evaluate_decoder()

# --- Rewrite (see Tutorial: Model rewrite) ---
changed_model = trainer.rewrite_base_model(decoder_results=dec_results, target_class="masc_nom")
For why each option matters and what to change when, follow the part tutorials: Data generation → Training → Evaluation (intra-model) → Model rewrite → Evaluation (inter-model).
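Inter-model evaluation compares runs by top-k overlap. As a rough illustration of that idea only, and not of gradiend's implementation, the sketch below takes two hypothetical per-parameter importance vectors, selects the k entries with the largest absolute value in each, and reports the fraction they share. The function names, the ranking by absolute value, and the normalisation by k are all assumptions.

```python
def top_k_indices(scores, k):
    """Return the set of indices of the k entries with the largest absolute value."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(ranked[:k])


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the top-k indices shared by two score vectors (assumed metric)."""
    if k <= 0:
        raise ValueError("k must be positive")
    shared = top_k_indices(scores_a, k) & top_k_indices(scores_b, k)
    return len(shared) / k


# Made-up importance scores for two runs over the same six parameters.
run_a = [0.9, -0.1, 0.05, -0.8, 0.3, 0.02]
run_b = [0.7, 0.6, -0.01, -0.9, 0.1, 0.0]

print(top_k_overlap(run_a, run_b, k=2))  # both runs rank indices 0 and 3 highest
print(top_k_overlap(run_a, run_b, k=3))  # the third-ranked index differs
```

Computing this value for every pair of runs yields a symmetric matrix, which is the kind of data a heatmap of run similarity would display.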
Next steps
- Data generation — Syncretism, spaCy tags, one filter per gender–case cell.
- Training — experiment_dir and run_id, source/target, pre- and post-pruning, multi-seed, convergence plot.
- Evaluation (intra-model) — Encoder vs decoder, selection policies, caching.
- Model rewrite — rewrite_base_model, strengthen/weaken, and saving changed models.
- Evaluation (inter-model) — Top-k overlap and heatmap for comparing runs.
- Data handling — All supported data formats and column names.
- Decoder-only models — Causal LMs and optional MLM head.