sklearn

`Accuracy(features=None)`

Bases: SelectMixin, LazyMixin, Metric

Accuracy classification score.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`

Source code in src/flowcean/sklearn/metrics/classification.py

def __init__(
    self,
    features: list[str] | None = None,
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
    """
    super().__init__(features=features)

`ClassificationReport(features=None)`

Bases: SelectMixin, LazyMixin, Metric

Build a text report showing the main classification metrics.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`

Source code in src/flowcean/sklearn/metrics/classification.py

def __init__(
    self,
    features: list[str] | None = None,
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
    """
    super().__init__(features=features)
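The listing above shows only the constructors. As a minimal sketch, assuming the `Accuracy` and `ClassificationReport` metrics delegate to scikit-learn's `accuracy_score` and `classification_report` (the delegation itself is not shown on this page), the underlying calls look roughly like this. The label lists are made-up illustration data.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground-truth and predicted labels for a binary task.
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# Fraction of labels that match exactly (5 of 6 here).
accuracy = accuracy_score(y_true, y_pred)

# Text table with precision, recall, f1-score and support per class.
report = classification_report(y_true, y_pred)
print(report)
```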

`FBetaScore(*, beta=1.0, features=None)`

Bases: SelectMixin, LazyMixin, Metric

F-beta score.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`beta`	`float`	The beta parameter.	`1.0`
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`

Source code in src/flowcean/sklearn/metrics/classification.py

def __init__(
    self,
    *,
    beta: float = 1.0,
    features: list[str] | None = None,
) -> None:
    """Initialize metric.

    Args:
        beta: The beta parameter.
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
    """
    super().__init__(features=features)
    self.beta = beta
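To illustrate what the `beta` parameter controls, the following sketch calls scikit-learn's `fbeta_score` directly. It assumes `FBetaScore` forwards `beta` to that function, which this page does not confirm. The labels are made-up illustration data.

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

# Hypothetical labels: 3 true positives, 0 false positives, 1 false negative.
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

# beta < 1 weights precision more heavily, beta > 1 weights recall more heavily.
f_half = fbeta_score(y_true, y_pred, beta=0.5)
f_one = fbeta_score(y_true, y_pred, beta=1.0)
f_two = fbeta_score(y_true, y_pred, beta=2.0)

# With beta=1.0 the score coincides with the F1 score.
assert abs(f_one - f1_score(y_true, y_pred)) < 1e-12
```

Because precision is higher than recall in this data, the score drops as `beta` grows and more weight shifts to recall.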

`PrecisionScore(features=None)`

Bases: SelectMixin, LazyMixin, Metric

Precision classification score.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`

Source code in src/flowcean/sklearn/metrics/classification.py

def __init__(
    self,
    features: list[str] | None = None,
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
    """
    super().__init__(features=features)

`Recall(features=None)`

Bases: SelectMixin, LazyMixin, Metric

Recall classification score.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`

Source code in src/flowcean/sklearn/metrics/classification.py

def __init__(
    self,
    features: list[str] | None = None,
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
    """
    super().__init__(features=features)

`MaxError(feature=None)`

Bases: SelectMixin, LazyMixin, Metric

Max error regression loss.

As defined by scikit-learn.

Initialize MaxError metric.

Parameters:

Name	Type	Description	Default
`feature`	`str \| None`	The feature to calculate the metric for. If None, the metric expects a single feature in the data.	`None`

Source code in src/flowcean/sklearn/metrics/regression.py

def __init__(self, feature: str | None = None) -> None:
    """Initialize MaxError metric.

    Args:
        feature: The feature to calculate the metric for. If None, the
            metric expects a single feature in the data.
    """
    features = [feature] if feature is not None else None
    super().__init__(features=features)
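`MaxError` takes a single `feature` rather than a list, which matches scikit-learn's `max_error` accepting only one output. A minimal sketch of the underlying computation, assuming the metric delegates to `max_error` (the values are made-up illustration data):

```python
import numpy as np
from sklearn.metrics import max_error

# Hypothetical single-output targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 9.0])

# The largest absolute residual over all samples.
worst_case = max_error(y_true, y_pred)

# Equivalent manual computation for comparison.
manual = float(np.max(np.abs(y_true - y_pred)))
```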

`MeanAbsoluteError(features=None, multioutput='raw_values')`

Bases: SelectMixin, LazyMixin, MultiOutputMixin, Metric

Mean absolute error (MAE) regression loss.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`
`multioutput`	`Literal['raw_values', 'uniform_average']`	Defines how to aggregate multiple output values. See scikit-learn documentation for details.	`'raw_values'`

Source code in src/flowcean/sklearn/metrics/regression.py

def __init__(
    self,
    features: list[str] | None = None,
    multioutput: Literal[
        "raw_values",
        "uniform_average",
    ] = "raw_values",
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
        multioutput: Defines how to aggregate multiple output values.
            See [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html)
            for details.
    """
    super().__init__(features=features, multioutput=multioutput)
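The two `multioutput` options differ only in how per-output errors are combined. The sketch below uses scikit-learn's `mean_absolute_error` directly, under the assumption that the flowcean metric passes `multioutput` through unchanged. The data is made up for illustration.

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical targets and predictions with two output columns.
y_true = [[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]]
y_pred = [[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]]

# "raw_values": one error per output column, returned as an array.
per_output = mean_absolute_error(y_true, y_pred, multioutput="raw_values")

# "uniform_average": the per-column errors averaged with equal weight.
averaged = mean_absolute_error(y_true, y_pred, multioutput="uniform_average")
```

Here the first column has a mean absolute error of 0.5 and the second of 1.0, so the uniform average is 0.75.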

`MeanAbsolutePercentageError(features=None, multioutput='raw_values')`

Bases: SelectMixin, LazyMixin, MultiOutputMixin, Metric

Mean absolute percentage error (MAPE) regression loss.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`
`multioutput`	`Literal['raw_values', 'uniform_average']`	Defines how to aggregate multiple output values. See scikit-learn documentation for details.	`'raw_values'`

Source code in src/flowcean/sklearn/metrics/regression.py

def __init__(
    self,
    features: list[str] | None = None,
    multioutput: Literal[
        "raw_values",
        "uniform_average",
    ] = "raw_values",
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
        multioutput: Defines how to aggregate multiple output values.
            See [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_percentage_error.html)
            for details.
    """
    super().__init__(features=features, multioutput=multioutput)

`MeanSquaredError(features=None, multioutput='raw_values')`

Bases: SelectMixin, LazyMixin, MultiOutputMixin, Metric

Mean squared error (MSE) regression loss.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`
`multioutput`	`Literal['raw_values', 'uniform_average']`	Defines how to aggregate multiple output values. See scikit-learn documentation for details.	`'raw_values'`

Source code in src/flowcean/sklearn/metrics/regression.py

def __init__(
    self,
    features: list[str] | None = None,
    multioutput: Literal[
        "raw_values",
        "uniform_average",
    ] = "raw_values",
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
        multioutput: Defines how to aggregate multiple output values.
            See [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html)
            for details.
    """
    super().__init__(features=features, multioutput=multioutput)

`R2Score(features=None, multioutput='raw_values')`

Bases: SelectMixin, LazyMixin, MultiOutputMixin, Metric

R^2 (coefficient of determination) regression score.

As defined by scikit-learn.

Initialize metric.

Parameters:

Name	Type	Description	Default
`features`	`list[str] \| None`	The features to calculate the metric for. If None, the metric uses all features in the data.	`None`
`multioutput`	`Literal['raw_values', 'uniform_average']`	Defines how to aggregate multiple output values. See scikit-learn documentation for details.	`'raw_values'`

Source code in src/flowcean/sklearn/metrics/regression.py

def __init__(
    self,
    features: list[str] | None = None,
    multioutput: Literal[
        "raw_values",
        "uniform_average",
    ] = "raw_values",
) -> None:
    """Initialize metric.

    Args:
        features: The features to calculate the metric for. If None, the
            metric uses all features in the data.
        multioutput: Defines how to aggregate multiple output values.
            See [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html)
            for details.
    """
    super().__init__(features=features, multioutput=multioutput)
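As a reminder of what the coefficient of determination measures, this sketch compares scikit-learn's `r2_score` with the textbook formula, one minus the ratio of the residual sum of squares to the total sum of squares. It assumes `R2Score` delegates to `r2_score`. The data is made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical single-output targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: error of the model.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: error of always predicting the mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

manual_r2 = 1.0 - ss_res / ss_tot
library_r2 = r2_score(y_true, y_pred)
```

A score of 1.0 means perfect predictions, and a score of 0.0 means the model does no better than predicting the mean of the targets.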

`SciKitClassifierModel(estimator, *, output_names, threshold=0.5, name=None)`

Bases: SciKitModel

A SciKit model for classifiers with probability predictions.

Supports threshold-based predictions via the `threshold` attribute and exposes class probabilities via `predict_proba`. The estimator must implement `predict_proba`.

Initialize the classifier model.

Parameters:

Name	Type	Description	Default
`estimator`	`SupportsPredict`	The scikit-learn classifier (must support `predict_proba`).	required
`output_names`	`Iterable[str]`	The names of the output columns.	required
`threshold`	`float`	Decision threshold for the positive class (default: 0.5).	`0.5`
`name`	`str \| None`	The name of the model.	`None`

Source code in src/flowcean/sklearn/model.py

def __init__(
    self,
    estimator: SupportsPredict,
    *,
    output_names: Iterable[str],
    threshold: float = 0.5,
    name: str | None = None,
) -> None:
    """Initialize the classifier model.

    Args:
        estimator: The scikit-learn classifier (must support
            ``predict_proba``).
        output_names: The names of the output columns.
        threshold: Decision threshold for the positive class
            (default: 0.5).
        name: The name of the model.
    """
    super().__init__(estimator, output_names=output_names, name=name)
    self.threshold = threshold
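The prediction logic that consumes `threshold` is not shown on this page. The following is a sketch of the general technique with plain scikit-learn, not flowcean's implementation. The helper `apply_threshold` and the choice of `>=` for the comparison are assumptions, and the data is made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature binary classification data.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Column 1 of predict_proba holds the probability of the positive class.
positive_proba = clf.predict_proba(X)[:, 1]


def apply_threshold(proba: np.ndarray, threshold: float) -> np.ndarray:
    """Label a sample positive when its probability reaches the threshold."""
    return (proba >= threshold).astype(int)


default_labels = apply_threshold(positive_proba, 0.5)
strict_labels = apply_threshold(positive_proba, 0.9)
```

Raising the threshold can only keep or reduce the number of positive predictions, which trades recall for precision.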

`predict_proba(input_features)`

Predict class probabilities, applying preprocessing transforms.

Parameters:

Name	Type	Description	Default
`input_features`	`DataFrame \| LazyFrame`	The inputs for which to predict probabilities.	required

Returns:

Type	Description
`LazyFrame`	The predicted probabilities for the positive class.

Source code in src/flowcean/sklearn/model.py

def predict_proba(
    self,
    input_features: pl.DataFrame | pl.LazyFrame,
) -> pl.LazyFrame:
    """Predict class probabilities, applying preprocessing transforms.

    Args:
        input_features: The inputs for which to predict probabilities.

    Returns:
        The predicted probabilities for the positive class.
    """
    input_features = self.preprocess(input_features)
    return self._predict_proba(input_features)

`SciKitModel(estimator, *, output_names, name=None)`

Bases: Model

A model that wraps a scikit-learn estimator.

Initialize the model.

Parameters:

Name	Type	Description	Default
`estimator`	`SupportsPredict`	The scikit-learn estimator.	required
`output_names`	`Iterable[str]`	The names of the output columns.	required
`name`	`str \| None`	The name of the model.	`None`

Source code in src/flowcean/sklearn/model.py

def __init__(
    self,
    estimator: SupportsPredict,
    *,
    output_names: Iterable[str],
    name: str | None = None,
) -> None:
    """Initialize the model.

    Args:
        estimator: The scikit-learn estimator.
        output_names: The names of the output columns.
        name: The name of the model.
    """
    super().__init__()
    if name is None:
        name = estimator.__class__.__name__
    self._name = name
    self.estimator = estimator
    self.output_names = list(output_names)

`RandomForestRegressorLearner(n_estimators=100, *, criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None, monotonic_cst=None, callbacks=None)`

Bases: SupervisedLearner

Wrapper class for sklearn's RandomForestRegressor.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

Initialize the random forest learner.

Parameters:

Name	Type	Description	Default
`n_estimators`	`int`	Number of trees in the forest.	`100`
`criterion`	`str`	Function to measure the quality of a split.	`'squared_error'`
`max_depth`	`int \| None`	Maximum depth of the tree.	`None`
`min_samples_split`	`int`	Minimum number of samples required to split an internal node.	`2`
`min_samples_leaf`	`int`	Minimum number of samples required to be at a leaf node.	`1`
`min_weight_fraction_leaf`	`float`	Minimum weighted fraction of the sum total of weights required to be at a leaf node.	`0.0`
`max_features`	`float`	Number of features to consider when looking for the best split. scikit-learn interprets a float as a fraction of the available features.	`1.0`
`max_leaf_nodes`	`int \| None`	Grow trees with max_leaf_nodes in best-first fashion.	`None`
`min_impurity_decrease`	`float`	A node will be split if this split induces a decrease of the impurity greater than or equal to this value.	`0.0`
`bootstrap`	`bool`	Whether bootstrap samples are used when building trees.	`True`
`oob_score`	`bool`	Whether to use out-of-bag samples to estimate the R^2 on unseen data.	`False`
`n_jobs`	`int \| None`	Number of jobs to run in parallel.	`None`
`random_state`	`int \| None`	Controls the randomness of the estimator.	`None`
`verbose`	`int`	Controls the verbosity when fitting and predicting.	`0`
`warm_start`	`bool`	When set to True, reuse the solution of the previous call to fit.	`False`
`ccp_alpha`	`float`	Complexity parameter used for Minimal Cost-Complexity Pruning.	`0.0`
`max_samples`	`int \| float \| None`	If bootstrap is True, the number of samples to draw from X to train each base estimator.	`None`
`monotonic_cst`	`NDArray \| None`	Monotonicity constraints.	`None`
`callbacks`	`list[LearnerCallback] \| LearnerCallback \| None`	Optional callbacks for progress feedback. Use `None` for silent learning.	`None`

Source code in src/flowcean/sklearn/random_forest.py

def __init__(
    self,
    n_estimators: int = 100,
    *,
    criterion: str = "squared_error",
    max_depth: int | None = None,
    min_samples_split: int = 2,
    min_samples_leaf: int = 1,
    min_weight_fraction_leaf: float = 0.0,
    max_features: float = 1.0,
    max_leaf_nodes: int | None = None,
    min_impurity_decrease: float = 0.0,
    bootstrap: bool = True,
    oob_score: bool = False,
    n_jobs: int | None = None,
    random_state: int | None = None,
    verbose: int = 0,
    warm_start: bool = False,
    ccp_alpha: float = 0.0,
    max_samples: int | float | None = None,  # noqa: PYI041
    monotonic_cst: NDArray | None = None,
    callbacks: list[LearnerCallback] | LearnerCallback | None = None,
) -> None:
    """Initialize the random forest learner.

    Reference: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

    Args:
        n_estimators: Number of trees in the forest.
        criterion: Function to measure the quality of a split.
        max_depth: Maximum depth of the tree.
        min_samples_split: Minimum number of samples required to split
            an internal node.
        min_samples_leaf: Minimum number of samples required to be at
            a leaf node.
        min_weight_fraction_leaf: Minimum weighted fraction of the sum
            total of weights required to be at a leaf node.
        max_features: Number of features to consider when looking for
            the best split.
        max_leaf_nodes: Grow trees with max_leaf_nodes in best-first
            fashion.
        min_impurity_decrease: A node will be split if this split
            induces a decrease of the impurity greater than or equal
            to this value.
        bootstrap: Whether bootstrap samples are used when building trees.
        oob_score: Whether to use out-of-bag samples to estimate the R^2
            on unseen data.
        n_jobs: Number of jobs to run in parallel.
        random_state: Controls the randomness of the estimator.
        verbose: Controls the verbosity when fitting and predicting.
        warm_start: When set to True, reuse the solution of the previous
            call to fit.
        ccp_alpha: Complexity parameter used for Minimal Cost-Complexity
            Pruning.
        max_samples: If bootstrap is True, the number of samples to draw
            from X to train each base estimator.
        monotonic_cst: Monotonicity constraints.
        callbacks: Optional callbacks for progress feedback. Use `None`
            for silent learning.
    """
    self.regressor = RandomForestRegressor(
        n_estimators=n_estimators,
        criterion=criterion,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        min_weight_fraction_leaf=min_weight_fraction_leaf,
        max_features=max_features,
        max_leaf_nodes=max_leaf_nodes,
        min_impurity_decrease=min_impurity_decrease,
        bootstrap=bootstrap,
        oob_score=oob_score,
        n_jobs=n_jobs,
        # Compare against None so that an explicit seed of 0 is respected.
        random_state=random_state if random_state is not None else get_seed(),
        verbose=verbose,
        warm_start=warm_start,
        ccp_alpha=ccp_alpha,
        max_samples=max_samples,
        monotonic_cst=monotonic_cst,
    )
    self.callback_manager = create_callback_manager(callbacks)

`learn(inputs, outputs)`

Fit the random forest regressor on the given inputs and outputs.

Source code in src/flowcean/sklearn/random_forest.py

@override
def learn(
    self,
    inputs: pl.LazyFrame,
    outputs: pl.LazyFrame,
) -> Model:
    """Fit the random forest regressor on the given inputs and outputs."""
    dfs = pl.collect_all([inputs, outputs])
    collected_inputs = dfs[0]
    collected_outputs = dfs[1]

    # Notify callbacks that learning is starting
    context = {
        "n_estimators": self.regressor.n_estimators,
        "n_samples": len(collected_inputs),
        "n_features": len(collected_inputs.columns),
    }
    self.callback_manager.on_learning_start(self, context)

    try:
        # Fit the model (flatten outputs to 1D if single column)
        outputs_array = collected_outputs.to_numpy()
        if outputs_array.shape[1] == 1:
            outputs_array = outputs_array.ravel()
        self.regressor.fit(collected_inputs, outputs_array)
        logger.info("Using Random Forest Regressor")

        # Create the model
        model = SciKitModel(
            self.regressor,
            output_names=outputs.collect_schema().names(),
        )

        # Notify callbacks that learning is complete
        self.callback_manager.on_learning_end(self, model)
    except Exception as e:
        # Notify callbacks of the error
        self.callback_manager.on_learning_error(self, e)
        raise
    else:
        return model
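The flattening step in `learn` exists because scikit-learn's forest estimators expect a 1-D target for single-output regression and typically warn when given a column vector. A minimal sketch of that step using NumPy arrays instead of polars frames (the data and the small `n_estimators` value are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
inputs = rng.normal(size=(20, 3))

# A single output column arrives as shape (n_samples, 1).
single_output = inputs[:, :1] * 2.0
# Two output columns arrive as shape (n_samples, 2) and stay 2-D.
multi_output = inputs[:, :2] * 2.0


def prepare_targets(outputs_array: np.ndarray) -> np.ndarray:
    """Flatten to 1-D only when there is exactly one output column."""
    if outputs_array.shape[1] == 1:
        return outputs_array.ravel()
    return outputs_array


single_model = RandomForestRegressor(n_estimators=10, random_state=0)
single_model.fit(inputs, prepare_targets(single_output))
single_predictions = single_model.predict(inputs)

multi_model = RandomForestRegressor(n_estimators=10, random_state=0)
multi_model.fit(inputs, prepare_targets(multi_output))
multi_predictions = multi_model.predict(inputs)
```

The shape of the predictions follows the shape of the targets: 1-D for the single-output case and 2-D for the multi-output case.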

`RegressionTree(*, dot_graph_export_path=None, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, monotonic_cst=None, callbacks=None)`

Bases: SupervisedLearner

Wrapper class for sklearn's DecisionTreeRegressor.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html

Initialize the regression tree learner.

Parameters:

Name	Type	Description	Default
`dot_graph_export_path`	`None \| str`	Path to export the decision tree graph in Graphviz DOT format.	`None`
`criterion`	`str`	Function to measure the quality of a split.	`'squared_error'`
`splitter`	`str`	Strategy used to choose the split at each node.	`'best'`
`max_depth`	`int \| None`	Maximum depth of the tree.	`None`
`min_samples_split`	`int`	Minimum number of samples required to split an internal node.	`2`
`min_samples_leaf`	`int`	Minimum number of samples required to be at a leaf node.	`1`
`min_weight_fraction_leaf`	`float`	Minimum weighted fraction of the sum total of weights required to be at a leaf node.	`0.0`
`max_features`	`float \| None`	Number of features to consider when looking for the best split. scikit-learn interprets a float as a fraction of the available features.	`None`
`random_state`	`int \| None`	Controls the randomness of the estimator.	`None`
`max_leaf_nodes`	`int \| None`	Grow a tree with max_leaf_nodes in best-first fashion.	`None`
`min_impurity_decrease`	`float`	A node will be split if this split induces a decrease of the impurity greater than or equal to this value.	`0.0`
`ccp_alpha`	`float`	Complexity parameter used for Minimal Cost-Complexity Pruning.	`0.0`
`monotonic_cst`	`NDArray \| None`	Monotonicity constraints.	`None`
`callbacks`	`list[LearnerCallback] \| LearnerCallback \| None`	Optional callbacks for progress feedback. Use `None` for silent learning.	`None`

Source code in src/flowcean/sklearn/regression_tree.py

def __init__(
    self,
    *,
    dot_graph_export_path: None | str = None,
    criterion: str = "squared_error",
    splitter: str = "best",
    max_depth: int | None = None,
    min_samples_split: int = 2,
    min_samples_leaf: int = 1,
    min_weight_fraction_leaf: float = 0.0,
    max_features: float | None = None,
    random_state: int | None = None,
    max_leaf_nodes: int | None = None,
    min_impurity_decrease: float = 0.0,
    ccp_alpha: float = 0.0,
    monotonic_cst: NDArray | None = None,
    callbacks: list[LearnerCallback] | LearnerCallback | None = None,
) -> None:
    """Initialize the regression tree learner.

    Reference: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html

    Args:
        dot_graph_export_path: Path to export the decision tree graph
            in Graphviz DOT format.
        criterion: Function to measure the quality of a split.
        splitter: Strategy used to choose the split at each node.
        max_depth: Maximum depth of the tree.
        min_samples_split: Minimum number of samples required to split
            an internal node.
        min_samples_leaf: Minimum number of samples required to be at
            a leaf node.
        min_weight_fraction_leaf: Minimum weighted fraction of the sum
            total of weights required to be at a leaf node.
        max_features: Number of features to consider when looking for
            the best split.
        random_state: Controls the randomness of the estimator.
        max_leaf_nodes: Grow a tree with max_leaf_nodes in best-first
            fashion.
        min_impurity_decrease: A node will be split if this split
            induces a decrease of the impurity greater than or equal
            to this value.
        ccp_alpha: Complexity parameter used for Minimal Cost-Complexity
            Pruning.
        monotonic_cst: Monotonicity constraints.
        callbacks: Optional callbacks for progress feedback. Use `None`
            for silent learning.
    """
    self.regressor = DecisionTreeRegressor(
        criterion=criterion,
        splitter=splitter,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        min_weight_fraction_leaf=min_weight_fraction_leaf,
        max_features=max_features,
        max_leaf_nodes=max_leaf_nodes,
        min_impurity_decrease=min_impurity_decrease,
        # Compare against None so that an explicit seed of 0 is respected.
        random_state=random_state if random_state is not None else get_seed(),
        ccp_alpha=ccp_alpha,
        monotonic_cst=monotonic_cst,
    )
    self.dot_graph_export_path = dot_graph_export_path
    self.callback_manager = create_callback_manager(callbacks)
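How `dot_graph_export_path` is used is not shown in the constructor. One plausible approach, sketched here as an assumption rather than flowcean's actual code, is scikit-learn's `export_graphviz`, which renders a fitted tree as Graphviz DOT text. The training data and feature name are made up.

```python
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Hypothetical one-feature regression data.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# With out_file=None the DOT source is returned as a string
# instead of being written to disk.
dot_source = export_graphviz(tree, out_file=None, feature_names=["x"])
print(dot_source)
```

Passing a file path as `out_file` writes the same text to disk, where it can be rendered with the Graphviz `dot` command.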