
hoeffding_tree

HoeffdingTree(inputs, seed, model_handler, specs_handler)

Train a Hoeffding Tree on synthetic samples.

The samples are labeled with predictions from another model: the Flowcean model, queried through model_handler.
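A Hoeffding Tree learns incrementally from a stream and decides when a leaf has seen enough samples to split by using the Hoeffding bound: with probability 1 - delta, the true mean of a variable with range R lies within epsilon = sqrt(R^2 * ln(1/delta) / (2n)) of the mean observed over n samples. A leaf is split when the best attribute's merit beats the runner-up's by more than epsilon. The minimal sketch below only illustrates that rule; it is not River's or Flowcean's code, and the delta, n, and gain values are arbitrary illustrations.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of its observed mean
    over `n` independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range**2 * math.log(1.0 / delta) / (2.0 * n))


# Information gain for a two-class problem lies in [0, 1], so R = 1.
eps_small = hoeffding_bound(value_range=1.0, delta=1e-7, n=200)
eps_large = hoeffding_bound(value_range=1.0, delta=1e-7, n=2000)

# Illustrative merits of the best and second-best split attributes at a leaf.
best_gain, second_gain = 0.30, 0.05

# Split only if the observed difference exceeds the bound.
print(best_gain - second_gain > eps_small)
```

More samples shrink epsilon (eps_large < eps_small), so the tree commits to splits only once the evidence for the best attribute is statistically strong enough.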

Attributes:

datamodel (DataModel): Object used to generate synthetic training inputs based on the original dataset.

samples (list): Original training inputs transformed to River-compatible format with predictions.

nominal_attributes (list): List of indices for nominal features.

Methods:

train_tree(): Trains a Hoeffding Tree and returns the trained model.

Initializes the HoeffdingTree trainer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | DataFrame | Original training dataset, including the target column (expected as the last column, which is dropped before sample generation). | required |
| seed | int | Random seed for reproducible synthetic sample generation. | required |
| model_handler | ModelHandler | Object used to generate predictions from the Flowcean model. | required |
| specs_handler | SystemSpecsHandler | Object containing feature specifications and metadata. | required |
Source code in src/flowcean/testing/generator/ddtig/domain/model_analyser/mut/hoeffding_tree.py
def __init__(
    self,
    inputs: pl.DataFrame,
    seed: int,
    model_handler: ModelHandler,
    specs_handler: SystemSpecsHandler,
) -> None:
    """Initializes the HoeffdingTree trainer.

    Args:
        inputs: Original training dataset including target column.
        seed: Random seed for reproducible synthetic sample generation.
        model_handler: Object used to generate predictions from
            the Flowcean model.
        specs_handler: Object containing feature specifications
            and metadata.
    """
    # Remove target column to isolate input features
    inputs = inputs.drop(inputs.columns[-1])
    self.datamodel = DataModel(inputs, seed, model_handler, specs_handler)

    # Generate River-compatible samples using original data
    self.samples = self.datamodel.generate_dataset(original_data=True)
    self.nominal_attributes = specs_handler.get_nominal_features()
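River models consume one sample at a time as a pair (x, y), where x is a dict mapping feature names to values. The sketch below shows, under stated assumptions, what "River-compatible samples with predictions" could look like: the last column is treated as the target and dropped (mirroring inputs.columns[-1] above), and each remaining feature dict is labeled by a model prediction rather than the original target. The helper to_river_samples, the predict callable, and the plain list-of-dicts rows (used instead of a Polars DataFrame to stay self-contained) are all hypothetical; DataModel.generate_dataset may work differently.

```python
from typing import Any, Callable

Row = dict[str, Any]


def to_river_samples(
    rows: list[Row],
    columns: list[str],
    predict: Callable[[Row], Any],
) -> list[tuple[Row, Any]]:
    """Hypothetical helper: drop the last (target) column and label each
    remaining feature dict with the prediction of the model under test."""
    feature_columns = columns[:-1]  # assumes the target is the last column
    samples = []
    for row in rows:
        x = {name: row[name] for name in feature_columns}
        samples.append((x, predict(x)))  # label = model prediction, not ground truth
    return samples


# Usage with a toy stand-in for the Flowcean model's predictions.
columns = ["speed", "load", "target"]
rows = [
    {"speed": 1.0, "load": 3.0, "target": 0},
    {"speed": 4.0, "load": 0.5, "target": 1},
]


def predict(x: Row) -> int:
    return int(x["speed"] > 2.0)


samples = to_river_samples(rows, columns, predict)
print(samples)
```

Labeling with predictions instead of ground truth makes the tree a surrogate of the model under test rather than of the original data.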

train_tree(performance_threshold, sample_limit, n_predictions, *, classification, **kwargs)

Train a Hoeffding Tree using synthetic samples.

Training continues until the performance criteria are met or the sample limit is reached.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| performance_threshold | float | Minimum performance required to finalize the model. | required |
| sample_limit | int | Maximum number of samples to use during training. | required |
| n_predictions | int | Number of consecutive correct predictions required to stop training. | required |
| classification | bool | Keyword-only. True for a classification task, False for regression. | required |
| **kwargs | Any | Additional hyperparameters for the Hoeffding Tree model. | {} |

Returns:

| Type | Description |
| --- | --- |
| HoeffdingTreeRegressor \| HoeffdingTreeClassifier | Trained Hoeffding Tree model. |

Source code in src/flowcean/testing/generator/ddtig/domain/model_analyser/mut/hoeffding_tree.py
def train_tree(
    self,
    performance_threshold: float,
    sample_limit: int,
    n_predictions: int,
    *,
    classification: bool,
    **kwargs: Any,
) -> HoeffdingTreeRegressor | HoeffdingTreeClassifier:
    """Train a Hoeffding Tree using synthetic samples.

    Continue until performance criteria are met.

    Args:
        performance_threshold: Minimum performance required to
            finalize the model.
        sample_limit: Maximum number of samples to use during training.
        n_predictions: Number of consecutive correct predictions
            required to stop training.
        classification: Indicates whether the task is
            classification or regression.
        **kwargs: Additional hyperparameters for the Hoeffding Tree model.

    Returns:
        Trained Hoeffding Tree model.
    """
    metric, model = self._create_model_and_metric(
        classification=classification,
        **kwargs,
    )

    # Pre-train
    for x, y in self.samples:
        y_true = self._normalize_target(y, classification=classification)
        model.learn_one(x, y_true)

    self._run_training_loop(
        model=model,
        metric=metric,
        performance_threshold=performance_threshold,
        n_predictions=n_predictions,
        sample_limit=sample_limit,
        classification=classification,
    )

    logger.info("Hoeffding Tree training completed successfully.")
    return model
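The body of _run_training_loop is not shown here, so the following is only a sketch of one plausible reading of the documented stopping rule, written as a prequential (test-then-train) loop: each sample is first predicted, then learned; a streak counter tracks consecutive correct predictions; training stops once the streak reaches n_predictions and running accuracy reaches performance_threshold, or once sample_limit samples have been consumed. Whether Flowcean combines the conditions this way, how it measures "correct" for regression, and the names run_training_loop and MajorityClassModel (a toy stand-in exposing River's predict_one/learn_one method names) are all assumptions.

```python
from collections import Counter
from typing import Any, Iterable


class MajorityClassModel:
    """Toy stand-in for an incremental classifier; it exposes the same
    predict_one/learn_one method names that River models use."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict_one(self, x: dict[str, Any]) -> Any:
        # Predict the most frequent label seen so far (None before any training).
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x: dict[str, Any], y: Any) -> None:
        self.counts[y] += 1


def run_training_loop(
    model: MajorityClassModel,
    stream: Iterable[tuple[dict[str, Any], Any]],
    performance_threshold: float,
    n_predictions: int,
    sample_limit: int,
) -> int:
    """Hypothetical test-then-train loop. Returns the number of samples used."""
    correct = 0  # total correct predictions, for running accuracy
    streak = 0  # consecutive correct predictions
    seen = 0
    for seen, (x, y) in enumerate(stream, start=1):
        hit = model.predict_one(x) == y  # test on the sample first ...
        model.learn_one(x, y)  # ... then train on it
        correct += int(hit)
        streak = streak + 1 if hit else 0
        accuracy = correct / seen
        if streak >= n_predictions and accuracy >= performance_threshold:
            break  # performance criteria met
        if seen >= sample_limit:
            break  # sample budget exhausted
    return seen


# Usage: a synthetic stream whose label is always 1.
stream = (({"t": float(i)}, 1) for i in range(100))
used = run_training_loop(MajorityClassModel(), stream, 0.75, 3, 100)
print(used)  # 4
```

Testing before training keeps each prediction an honest estimate on an unseen sample, which is why the streak and accuracy can serve as stopping signals.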