transform

This module provides abstractions for transforms.

Transforms are reusable, composable operations that modify data in preparation for machine learning or analysis. Transforms unify pre-processing, feature engineering, and augmentation under a single protocol-based interface.

Nomenclature

Transforms are operations that modify data, such as normalization, dimensionality reduction, and data augmentation. They prepare data for machine learning tasks and can improve model performance.

We use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.

Flowcean provides a flexible and unified interface for applying transforms to data. The framework lets you combine these transformation steps as needed.

Using Transforms

Here's a basic example:

from flowcean.polars import Select, Standardize
from flowcean.core import Lambda

# Define a simple pipeline
transform = (
    Select(features=["x", "y"])
    | Standardize()
    | Lambda(lambda df: df.with_columns(z=df["x"] * df["y"]))
)

# Fit and apply (`dataset` is assumed to be data that was loaded beforehand)
transform.fit(dataset)
transformed = transform(dataset)

# Invert (if supported)
restored = transform.inverse()(transformed)

`Transform`

Bases: Named, Protocol

Base protocol for all transforms in Flowcean.

A transform is a reusable operation that modifies data. Examples include preprocessing (e.g., standardization), feature engineering (e.g., feature selection, PCA), and augmentation (e.g., noise injection, synthetic features).

Transforms are composable via the | operator, allowing complex transformation pipelines to be expressed in a clean and functional style:

Example

>>> transform = Select(features=["x"]) | Standardize()
>>> transformed = transform(dataset)

`apply(data)` `abstractmethod`

Apply the transform to data.

Parameters:

Name	Type	Description	Default
`data`	`Data`	The data to transform.	required

Returns:

Type	Description
`Data`	The transformed data.

Source code in src/flowcean/core/transform.py

@abstractmethod
def apply(self, data: Data) -> Data:
    """Apply the transform to data.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """
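
As an illustration of the contract above, the following standalone sketch shows what implementing `apply` might look like. `SketchTransform` and `Scale` are hypothetical names invented for this example, and plain lists of floats stand in for `Data`; this is not Flowcean's own code.

```python
from abc import abstractmethod
from typing import Protocol, final


class SketchTransform(Protocol):
    """Hypothetical stand-in for the Transform protocol (not Flowcean's class)."""

    @abstractmethod
    def apply(self, data: list[float]) -> list[float]:
        """Return a transformed copy of the data."""

    @final
    def __call__(self, data: list[float]) -> list[float]:
        # Calling the transform simply delegates to apply().
        return self.apply(data)


class Scale(SketchTransform):
    """Hypothetical transform that multiplies every value by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, data: list[float]) -> list[float]:
        return [value * self.factor for value in data]


double = Scale(2.0)
doubled = double([1.0, 2.0, 3.0])  # uses __call__, which forwards to apply()
print(doubled)
```

Only `apply` has to be written by the implementer; the call syntax comes for free from the protocol.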

`__call__(data)`

Apply the transform to data.

Equivalent to self.apply(data).

Parameters:

Name	Type	Description	Default
`data`	`Data`	The data to transform.	required

Returns:

Type	Description
`Data`	The transformed data.

Source code in src/flowcean/core/transform.py

@final
def __call__(self, data: Data) -> Data:
    """Apply the transform to data.

    Equivalent to ``self.apply(data)``.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """
    return self.apply(data)

`chain(other)`

Chain this transform with other.

This can be used to chain multiple transforms together. Chained transforms are applied left-to-right:

Example

chained = TransformA().chain(TransformB())
chained(data)  # Equivalent to TransformB()(TransformA()(data))

Parameters:

Name	Type	Description	Default
`other`	`Transform`	The transform to chain.	required

Returns:

Type	Description
`Transform`	A new chained transform.

Source code in src/flowcean/core/transform.py

def chain(
    self,
    other: Transform,
) -> Transform:
    """Chain this transform with ``other``.

    This can be used to chain multiple transforms together. Chained
    transforms are applied left-to-right:

    Example:
        ```python
        chained = TransformA().chain(TransformB())
        chained(data)  # Equivalent to TransformB()(TransformA()(data))
        ```

    Args:
        other: The transform to chain.

    Returns:
        A new chained transform.
    """
    return ChainedTransforms(self, other)
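
The left-to-right order can be illustrated with a small standalone sketch. `Chain`, `add_one`, and `double` are hypothetical names, and plain functions on lists stand in for transforms; how `ChainedTransforms` applies its steps internally is not shown on this page, so the loop below is an assumption.

```python
from collections.abc import Callable

Step = Callable[[list[float]], list[float]]


class Chain:
    """Hypothetical left-to-right composition (not Flowcean's ChainedTransforms)."""

    def __init__(self, *steps: Step) -> None:
        self.steps = steps

    def __call__(self, data: list[float]) -> list[float]:
        # Feed the output of each step into the next one, in order.
        for step in self.steps:
            data = step(data)
        return data

    def __or__(self, other: Step) -> "Chain":
        # a | b builds a new chain that runs a first, then b.
        return Chain(*self.steps, other)


def add_one(data: list[float]) -> list[float]:
    return [value + 1.0 for value in data]


def double(data: list[float]) -> list[float]:
    return [value * 2.0 for value in data]


chained = Chain(add_one) | double
chained_result = chained([1.0, 2.0])
nested_result = double(add_one([1.0, 2.0]))  # same order written as nested calls
print(chained_result, nested_result)
```

The leftmost step is the innermost call when the same pipeline is written as nested function applications.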

`__or__(other)`

Shorthand for chaining transforms.

Example

chained = TransformA() | TransformB()

Parameters:

Name	Type	Description	Default
`other`	`Transform`	The transform to chain.	required

Returns:

Type	Description
`Transform`	A new chained transform.

Source code in src/flowcean/core/transform.py

def __or__(
    self,
    other: Transform,
) -> Transform:
    """Shorthand for chaining transforms.

    Example:
        ```python
        chained = TransformA() | TransformB()
        ```

    Args:
        other: The transform to chain.

    Returns:
        A new chained transform.
    """
    return self.chain(other)

`fit(data)`

Fit the transform to data.

Many transforms (e.g., scaling, PCA) require statistics from the dataset before they can be applied. The default implementation is a no-op. fit() is meant to be idempotent: calling it multiple times on the same data should have the same effect as calling it once.

Parameters:

Name	Type	Description	Default
`data`	`Data`	The data to fit to.	required

Returns:

Type	Description
`Self`	The transform itself, so that calls can be chained.

Source code in src/flowcean/core/transform.py

def fit(self, data: Data) -> Self:
    """Fit the transform to data.

    Many transforms (e.g. scaling, PCA) require statistics from the dataset
    before applying. Default implementation is a no-op.
    This is meant to be idempotent, i.e., calling ``fit()`` multiple times
    should have the same effect as calling it once.

    Args:
        data: The data to fit to.
    """
    _ = data
    return self
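
To make the idempotency requirement concrete, here is a minimal standalone sketch of a fitted transform. `StandardizeSketch` is a hypothetical class operating on plain lists; it is not Flowcean's `Standardize`, whose implementation is not shown on this page.

```python
from statistics import fmean, pstdev


class StandardizeSketch:
    """Hypothetical fitted transform (not Flowcean's Standardize)."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, data: list[float]) -> "StandardizeSketch":
        # Overwrite (rather than accumulate) the statistics, so fitting twice
        # on the same data leaves the transform in the same state.
        self.mean = fmean(data)
        self.std = pstdev(data) or 1.0  # guard against constant data
        return self

    def apply(self, data: list[float]) -> list[float]:
        return [(value - self.mean) / self.std for value in data]


values = [2.0, 4.0, 6.0]
scaler = StandardizeSketch().fit(values)  # fit() returns self, so calls chain
once = scaler.apply(values)
twice = scaler.fit(values).apply(values)  # refitting on the same data changes nothing
print(once == twice)
```

Because `fit` replaces its stored statistics instead of adding to them, repeated calls are harmless.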

`fit_incremental(data)`

Incrementally fit the transform to streaming/batched data.

The default implementation is a no-op.

Parameters:

Name	Type	Description	Default
`data`	`Data`	The data to fit to.	required

Returns:

Type	Description
`Self`	The transform itself, so that calls can be chained.

Source code in src/flowcean/core/transform.py

def fit_incremental(self, data: Data) -> Self:
    """Incrementally fit the transform to streaming/batched data.

    Default implementation is a no-op.

    Args:
        data: The data to fit to.
    """
    _ = data
    return self
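
One way an incremental fit could work is a running-statistics update, sketched below with plain lists. `RunningMeanCenter` is a hypothetical class and the update rule is an assumption for illustration; this page does not show how any Flowcean transform implements `fit_incremental`.

```python
class RunningMeanCenter:
    """Hypothetical transform that centers data using a running mean."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def fit_incremental(self, batch: list[float]) -> "RunningMeanCenter":
        # Update the mean one value at a time; earlier batches are not needed again.
        for value in batch:
            self.count += 1
            self.mean += (value - self.mean) / self.count
        return self

    def apply(self, data: list[float]) -> list[float]:
        return [value - self.mean for value in data]


center = RunningMeanCenter()
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0]):
    center.fit_incremental(batch)

print(center.mean, center.count)
```

In this sketch every call adds information, so feeding the same batch twice would count it twice; that differs from the idempotent behavior expected of `fit`.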

`Invertible`

Bases: Protocol

Protocol for transforms that support inversion.

An invertible transform can undo its effect via inverse().

Example

>>> scaler = Standardize().fit(data)
>>> restored = scaler.inverse()(scaler(data))

`inverse()` `abstractmethod`

Return a new transform that inverts this one.

Returns:

Type	Description
`Transform`	The inverse of the transform.

Source code in src/flowcean/core/transform.py

@abstractmethod
def inverse(self) -> Transform:
    """Return a new transform that inverts this one.

    Returns:
        The inverse of the transform.
    """
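
A minimal standalone sketch of the inversion contract follows. `Shift` is a hypothetical transform on plain lists, chosen because its inverse is exact; it is not part of Flowcean.

```python
class Shift:
    """Hypothetical invertible transform that adds a constant offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, data: list[float]) -> list[float]:
        return [value + self.offset for value in data]

    def inverse(self) -> "Shift":
        # Return a new transform instead of mutating this one.
        return Shift(-self.offset)


shift = Shift(3.0)
original = [1.0, 2.0]
restored = shift.inverse()(shift(original))
print(restored)
```

Returning a new object keeps the original transform usable after `inverse()` has been called.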

`ChainedTransforms(*transforms)`

Bases: Invertible, Transform

A composition of multiple transforms applied sequentially.

Chained transforms are applied left-to-right. Useful for building preprocessing pipelines.

Initialize the chained transforms.

Parameters:

Name	Type	Description	Default
`transforms`	`Transform`	The transforms to chain.	`()`

Source code in src/flowcean/core/transform.py

def __init__(
    self,
    *transforms: Transform,
) -> None:
    """Initialize the chained transforms.

    Args:
        transforms: The transforms to chain.
    """
    self.transforms = transforms
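
Since a chain is listed as `Invertible`, it may help to recall that a composition is undone by applying the inverses of its steps in reverse order. The sketch below illustrates that rule with hypothetical `Shift` and `Scale` steps and helper functions `run` and `invert`; whether `ChainedTransforms.inverse()` is implemented this way is an assumption, as its source is not shown here.

```python
class Shift:
    """Hypothetical step that adds a constant offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, data: list[float]) -> list[float]:
        return [value + self.offset for value in data]

    def inverse(self) -> "Shift":
        return Shift(-self.offset)


class Scale:
    """Hypothetical step that multiplies by a non-zero factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, data: list[float]) -> list[float]:
        return [value * self.factor for value in data]

    def inverse(self) -> "Scale":
        return Scale(1.0 / self.factor)


def run(steps, data: list[float]) -> list[float]:
    # Apply the steps left-to-right.
    for step in steps:
        data = step(data)
    return data


def invert(steps):
    # Undo the last step first: invert each step and reverse the order.
    return tuple(step.inverse() for step in reversed(steps))


pipeline = (Shift(1.0), Scale(4.0))
forward = run(pipeline, [1.0, 3.0])
back = run(invert(pipeline), forward)
print(forward, back)
```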

`Identity()`

Bases: Invertible, Transform

A no-op transform that returns data unchanged.

Often used as a placeholder or default transform.

Initialize the identity transform.

Source code in src/flowcean/core/transform.py

def __init__(self) -> None:
    """Initialize the identity transform."""
    super().__init__()
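
The "placeholder or default" use can be sketched as follows. `IdentitySketch` and `preprocess` are hypothetical names for illustration and are not part of Flowcean.

```python
class IdentitySketch:
    """Hypothetical no-op transform (not Flowcean's Identity)."""

    def __call__(self, data: list[float]) -> list[float]:
        return data

    def inverse(self) -> "IdentitySketch":
        # Doing nothing is its own inverse.
        return self


def preprocess(data: list[float], transform=None) -> list[float]:
    # Fall back to the no-op so the rest of the function needs no special case.
    if transform is None:
        transform = IdentitySketch()
    return transform(data)


unchanged = preprocess([1.0, 2.0])
halved = preprocess([1.0, 2.0], lambda data: [value / 2.0 for value in data])
print(unchanged, halved)
```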

`Lambda(func, *, inverse_func=None)`

Bases: Transform, Invertible

A transform wrapping a function.

Useful for quick one-off transformations without creating a dedicated class.

Example

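>>> to_float = Lambda(lambda df: df.cast(pl.Float64))
>>> # An inverse must not recompute statistics from already-transformed data,
>>> # so this example uses a fixed, known scale factor.
>>> scaled = Lambda(
...     lambda df: df.with_columns(pl.all() * 2.0),
...     inverse_func=lambda df: df.with_columns(pl.all() / 2.0),
... )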

Initialize the lambda transform.

Parameters:

Name	Type	Description	Default
`func`	`Callable[[Data], Data]`	Function that transforms data.	required
`inverse_func`	`Callable[[Data], Data] \| None`	Optional function that inverts `func`.	`None`

Source code in src/flowcean/core/transform.py

def __init__(
    self,
    func: Callable[[Data], Data],
    *,
    inverse_func: Callable[[Data], Data] | None = None,
) -> None:
    """Initialize the lambda transform.

    Args:
        func: Function that transforms data.
        inverse_func: Optional function that inverts ``func``.
    """
    self.func = func
    self.inverse_func = inverse_func
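
The role of the optional `inverse_func` can be sketched with a standalone class. `LambdaSketch` is hypothetical and works on plain lists; in particular, raising `NotImplementedError` when no inverse function was given is an assumption, since this page does not show how Flowcean's `Lambda` handles that case.

```python
from collections.abc import Callable
from typing import Optional

Func = Callable[[list[float]], list[float]]


class LambdaSketch:
    """Hypothetical function-wrapping transform (not Flowcean's Lambda)."""

    def __init__(self, func: Func, *, inverse_func: Optional[Func] = None) -> None:
        self.func = func
        self.inverse_func = inverse_func

    def __call__(self, data: list[float]) -> list[float]:
        return self.func(data)

    def inverse(self) -> "LambdaSketch":
        if self.inverse_func is None:
            # Assumed behavior: without an inverse function there is nothing to return.
            raise NotImplementedError("no inverse_func was provided")
        # Swap the two functions so the inverse can itself be inverted.
        return LambdaSketch(self.inverse_func, inverse_func=self.func)


halve = LambdaSketch(
    lambda data: [value / 2.0 for value in data],
    inverse_func=lambda data: [value * 2.0 for value in data],
)
round_trip = halve.inverse()(halve([2.0, 8.0]))
print(round_trip)
```

Note that the inverse function here relies on a fixed factor rather than on statistics computed from its input, so the round trip reproduces the original values.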
