TextPreprocessConfig
Configuration for preprocessing text: sentence splitting and length/character filters.
Used by `gradiend.data.text.prediction.creator.TextPredictionDataCreator` and
`preprocess_texts`. When `preprocess` is `None` (e.g. in the data creator),
no preprocessing is applied and texts are used as-is.
Attributes:
| Name | Type | Description |
|---|---|---|
| `split_to_sentences` | `bool` | If |
| `min_chars` | `Optional[int]` | Drop segments (sentences or whole texts) with strictly fewer than this many characters. |
| `max_chars` | `Optional[int]` | Drop segments with strictly more than this many characters. |
| `exclude_chars` | `Optional[str]` | Drop segments that contain any character in this string (e.g. |
| `custom_filter` | `Optional[Callable[[str], bool]]` | Optional callable |
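
The length and character filters described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `FilterSketch`, `keep_segment`, and `filter_segments` are hypothetical names introduced here, and the sketch covers only `min_chars`, `max_chars`, and `exclude_chars` as documented. Sentence splitting and `custom_filter` are left out because their exact behavior is not specified in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FilterSketch:
    """Hypothetical stand-in for illustration; NOT the library's TextPreprocessConfig."""
    min_chars: Optional[int] = None
    max_chars: Optional[int] = None
    exclude_chars: Optional[str] = None


def keep_segment(segment: str, cfg: FilterSketch) -> bool:
    """Return True if the segment passes every configured filter."""
    # Drop segments with strictly fewer than min_chars characters.
    if cfg.min_chars is not None and len(segment) < cfg.min_chars:
        return False
    # Drop segments with strictly more than max_chars characters.
    if cfg.max_chars is not None and len(segment) > cfg.max_chars:
        return False
    # Drop segments containing any character listed in exclude_chars.
    if cfg.exclude_chars and any(ch in cfg.exclude_chars for ch in segment):
        return False
    return True


def filter_segments(segments: List[str], cfg: FilterSketch) -> List[str]:
    """Keep only the segments that pass the filters, preserving order."""
    return [s for s in segments if keep_segment(s, cfg)]


cfg = FilterSketch(min_chars=5, max_chars=20, exclude_chars="@#")
segments = [
    "Hi",                                # 2 chars: below min_chars
    "Hello",                             # exactly 5 chars: kept
    "A sentence that is far too long.",  # above max_chars
    "mail@example.com",                  # contains an excluded character
    "Exactly twenty chars",              # exactly 20 chars: kept
]
kept = filter_segments(segments, cfg)
print(kept)
```

Because both bounds are strict, a segment whose length equals `min_chars` or `max_chars` is kept, and a filter left as `None` is simply skipped.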