
transform

This module provides abstractions for transforms.

Transforms are reusable, composable operations that modify data in preparation for machine learning or analysis. Transforms unify pre-processing, feature engineering, and augmentation under a single protocol-based interface.

Nomenclature

Transforms are a set of operations that modify data. They can include operations such as data normalization, dimensionality reduction, data augmentation, and much more. These transformations are essential for preparing data for machine learning tasks and improving model performance.

We use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.

flowcean provides a flexible and unified interface for applying transforms to data, and the framework lets you combine these transformation steps as needed.

Using Transforms

Here's a basic example:

import polars as pl
from flowcean.polars import Select, Standardize
from flowcean.core import Lambda

# Define a simple pipeline; `dataset` is assumed to contain columns "x" and "y"
transform = (
    Select(features=["x", "y"])
    | Standardize()
    | Lambda(lambda df: df.with_columns(z=pl.col("x") * pl.col("y")))
)

# Fit and apply
transform.fit(dataset)
transformed = transform(dataset)

# Invert (only if the chained steps support it; a Lambda needs an inverse_func)
restored = transform.inverse()(transformed)

Transform

Bases: Named, Protocol

Base protocol for all transforms in Flowcean.

A transform is a reusable operation that modifies data. Examples include preprocessing (e.g., standardization), feature engineering (e.g., feature selection, PCA), and augmentation (e.g., noise injection, synthetic features).

Transforms are composable via the | operator, allowing complex transformation pipelines to be expressed in a clean and functional style:

Example
>>> transform = Select(features=["x"]) | Standardize()
>>> transformed = transform(dataset)

apply(data) abstractmethod

Apply the transform to data.

Parameters:

data (Data): The data to transform. Required.

Returns:

Data: The transformed data.

Source code in src/flowcean/core/transform.py
@abstractmethod
def apply(self, data: Data) -> Data:
    """Apply the transform to data.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """

__call__(data)

Apply the transform to data.

Equivalent to self.apply(data).

Parameters:

data (Data): The data to transform. Required.

Returns:

Data: The transformed data.

Source code in src/flowcean/core/transform.py
@final
def __call__(self, data: Data) -> Data:
    """Apply the transform to data.

    Equivalent to ``self.apply(data)``.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """
    return self.apply(data)
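Because __call__ only delegates to apply(), a transform behaves like an ordinary function. The sketch below assumes a simplified abc.ABC-based stand-in (not flowcean's class) and a hypothetical Square transform; note that typing.final is only checked by static type checkers and does not block overriding at runtime.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import final


class Transform(ABC):
    """Simplified stand-in for the Transform protocol; not flowcean's class."""

    @abstractmethod
    def apply(self, data: list[float]) -> list[float]:
        """Return the transformed data."""

    @final  # flagged by type checkers if overridden; not enforced at runtime
    def __call__(self, data: list[float]) -> list[float]:
        # Calling the transform simply delegates to apply().
        return self.apply(data)


class Square(Transform):
    """Hypothetical transform that squares every value."""

    def apply(self, data: list[float]) -> list[float]:
        return [x * x for x in data]


square = Square()
batches = [[1.0, 2.0], [3.0]]

# A transform is an ordinary callable, so it works anywhere a function is expected.
results = list(map(square, batches))
print(results)  # [[1.0, 4.0], [9.0]]
```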

chain(other)

Chain this transform with other.

This can be used to chain multiple transforms together. Chained transforms are applied left-to-right:

Example
chained = TransformA().chain(TransformB())
chained(data)  # Equivalent to TransformB()(TransformA()(data))

Parameters:

other (Transform): The transform to chain. Required.

Returns:

Transform: A new chained transform.

Source code in src/flowcean/core/transform.py
def chain(
    self,
    other: Transform,
) -> Transform:
    """Chain this transform with ``other``.

    This can be used to chain multiple transforms together. Chained
    transforms are applied left-to-right:

    Example:
        ```python
        chained = TransformA().chain(TransformB())
        chained(data)  # Equivalent to TransformB(TransformA(data))
        ```

    Args:
        other: The transforms to chain.

    Returns:
        A new chained transform.
    """
    return ChainedTransforms(self, other)
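Left-to-right order matters whenever the steps do not commute. This sketch uses a minimal stand-in Transform with a chain() method modeled on the source above, plus hypothetical AddOne and Double transforms, to show that the two orderings give different results.

```python
from __future__ import annotations


class Transform:
    """Minimal stand-in with a chain() method; not flowcean's class."""

    def apply(self, data: list[float]) -> list[float]:
        raise NotImplementedError

    def __call__(self, data: list[float]) -> list[float]:
        return self.apply(data)

    def chain(self, other: Transform) -> Transform:
        return Chained(self, other)


class Chained(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[float]) -> list[float]:
        # Left-to-right: the output of each step feeds the next one.
        for transform in self.transforms:
            data = transform(data)
        return data


class AddOne(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [x + 1 for x in data]


class Double(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [x * 2 for x in data]


add_then_double = AddOne().chain(Double())
double_then_add = Double().chain(AddOne())
print(add_then_double([3.0]))  # [8.0]
print(double_then_add([3.0]))  # [7.0]
```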

__or__(other)

Shorthand for chaining transforms.

Example
chained = TransformA() | TransformB()

Parameters:

other (Transform): The transform to chain. Required.

Returns:

Transform: A new chained transform.

Source code in src/flowcean/core/transform.py
def __or__(
    self,
    other: Transform,
) -> Transform:
    """Shorthand for chaining transforms.

    Example:
        ```python
        chained = TransformA() | TransformB()
        ```

    Args:
        other: The transform to chain.

    Returns:
        A new Chain transform.
    """
    return self.chain(other)
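Python's | operator is left-associative, so a | b | c is parsed as (a | b) | c, and with a chain() that wraps its two operands, as in the source above, this builds nested chains. A rough sketch, assuming minimal stand-in classes rather than flowcean's, checks that the nesting does not change the result:

```python
from __future__ import annotations


class Transform:
    """Minimal stand-in with chain() and __or__; not flowcean's class."""

    def apply(self, data: list[float]) -> list[float]:
        raise NotImplementedError

    def __call__(self, data: list[float]) -> list[float]:
        return self.apply(data)

    def chain(self, other: Transform) -> Transform:
        return Chained(self, other)

    def __or__(self, other: Transform) -> Transform:
        # `a | b` is just sugar for `a.chain(b)`.
        return self.chain(other)


class Chained(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform(data)
        return data


class Add(Transform):
    def __init__(self, amount: float) -> None:
        self.amount = amount

    def apply(self, data: list[float]) -> list[float]:
        return [x + self.amount for x in data]


class Scale(Transform):
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, data: list[float]) -> list[float]:
        return [x * self.factor for x in data]


a, b, c = Add(1.0), Scale(3.0), Add(-2.0)
left = a | b | c  # parsed as (a | b) | c
right = a | (b | c)

print(type(left.transforms[0]).__name__)  # Chained (the nested a | b)
print(left([1.0]), right([1.0]))  # [4.0] [4.0]
```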

fit(data)

Fit the transform to data.

Many transforms (e.g., scaling, PCA) need statistics computed from the dataset before they can be applied. The default implementation is a no-op that returns the transform itself. Fitting is meant to be idempotent: calling fit() multiple times on the same data should have the same effect as calling it once.

Parameters:

data (Data): The data to fit to. Required.

Returns:

Self: The transform itself (the default implementation returns self).

Source code in src/flowcean/core/transform.py
def fit(self, data: Data) -> Self:
    """Fit the transform to data.

    Many transforms (e.g. scaling, PCA) require statistics from the dataset
    before applying. Default implementation is a no-op.
    This is meant to be idempotent, i.e., calling ``fit()`` multiple times
    should have the same effect as calling it once.

    Args:
        data: The data to fit to.
    """
    _ = data
    return self
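The fit-then-apply split and the idempotence requirement can be illustrated with a hypothetical StandardizeSketch (not flowcean's Standardize). It recomputes its statistics from scratch on every fit() call and returns self, mirroring the default implementation; plain lists stand in for Data.

```python
from __future__ import annotations

import statistics


class StandardizeSketch:
    """Hypothetical scaler illustrating fit-then-apply; not flowcean's Standardize."""

    def __init__(self) -> None:
        self.mean: float | None = None
        self.std: float | None = None

    def fit(self, data: list[float]) -> StandardizeSketch:
        # Statistics are recomputed from scratch, so fitting twice on the
        # same data leaves the transform in the same state (idempotent).
        self.mean = statistics.fmean(data)
        self.std = statistics.pstdev(data)
        return self  # returning self allows StandardizeSketch().fit(data)

    def __call__(self, data: list[float]) -> list[float]:
        if self.mean is None or self.std is None:
            raise RuntimeError("fit() must be called before applying")
        return [(x - self.mean) / self.std for x in data]


data = [1.0, 3.0]
scaler = StandardizeSketch().fit(data)
first = (scaler.mean, scaler.std)
scaler.fit(data)  # refitting on the same data changes nothing
print(scaler(data))  # [-1.0, 1.0]
```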

fit_incremental(data)

Incrementally fit the transform to streaming/batched data.

The default implementation is a no-op that returns the transform itself.

Parameters:

data (Data): The data to fit to. Required.

Returns:

Self: The transform itself (the default implementation returns self).

Source code in src/flowcean/core/transform.py
def fit_incremental(self, data: Data) -> Self:
    """Incrementally fit the transform to streaming/batched data.

    Default implementation is a no-op.

    Args:
        data: The data to fit to.
    """
    _ = data
    return self
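One common way to fit statistics incrementally is Welford's online algorithm for the running mean and variance. The sketch below is a hypothetical RunningStandardize built on it, not flowcean's implementation, and assumes plain lists of floats as batches.

```python
from __future__ import annotations

import math


class RunningStandardize:
    """Hypothetical incrementally fitted scaler using Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def fit_incremental(self, batch: list[float]) -> RunningStandardize:
        # Update count, mean, and M2 one value at a time without storing the data.
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)
        return self

    @property
    def std(self) -> float:
        # Population standard deviation of all values seen so far.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

    def __call__(self, data: list[float]) -> list[float]:
        return [(x - self.mean) / self.std for x in data]


scaler = RunningStandardize()
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0]):
    scaler.fit_incremental(batch)

print(scaler.mean, scaler.std)  # 3.0 1.4142135623730951
```

In this sketch each call accumulates state, so feeding the same batch twice counts it twice; that differs from the idempotent fit() described earlier.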

Invertible

Bases: Protocol

Protocol for transforms that support inversion.

An invertible transform can undo its effect via inverse().

Example
>>> scaler = Standardize().fit(data)
>>> restored = scaler.inverse()(scaler(data))

inverse() abstractmethod

Return a new transform that inverts this one.

Returns:

Transform: The inverse of the transform.

Source code in src/flowcean/core/transform.py
@abstractmethod
def inverse(self) -> Transform:
    """Return a new transform that inverts this one.

    Returns:
        The inverse of the transform.
    """

ChainedTransforms(*transforms)

Bases: Invertible, Transform

A composition of multiple transforms applied sequentially.

Chained transforms are applied left-to-right. Useful for building preprocessing pipelines.

Initialize the chained transforms.

Parameters:

*transforms (Transform): The transforms to chain. Defaults to ().

Source code in src/flowcean/core/transform.py
def __init__(
    self,
    *transforms: Transform,
) -> None:
    """Initialize the chained transforms.

    Args:
        transforms: The transforms to chain.
    """
    self.transforms = transforms
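A chain's inverse must undo the steps in reverse order, because the inverse of g(f(x)) is f^-1(g^-1(y)). The sketch below is a hypothetical ChainSketch; that fit() fits each step on the output of the preceding steps is an assumption about how such a chain can work, not a description of flowcean's ChainedTransforms. Shift and Scale are illustrative helpers.

```python
from __future__ import annotations


class Shift:
    """Hypothetical invertible step: adds a constant."""

    def __init__(self, amount: float) -> None:
        self.amount = amount

    def fit(self, data: list[float]) -> Shift:
        return self  # nothing to learn

    def __call__(self, data: list[float]) -> list[float]:
        return [x + self.amount for x in data]

    def inverse(self) -> Shift:
        return Shift(-self.amount)


class Scale:
    """Hypothetical invertible step: multiplies by a non-zero constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def fit(self, data: list[float]) -> Scale:
        return self  # nothing to learn

    def __call__(self, data: list[float]) -> list[float]:
        return [x * self.factor for x in data]

    def inverse(self) -> Scale:
        return Scale(1 / self.factor)


class ChainSketch:
    """Hypothetical re-implementation of a chained transform."""

    def __init__(self, *transforms) -> None:
        self.transforms = transforms

    def fit(self, data: list[float]) -> ChainSketch:
        # Assumption: each step is fitted on the output of the steps before it.
        for transform in self.transforms:
            transform.fit(data)
            data = transform(data)
        return self

    def __call__(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform(data)
        return data

    def inverse(self) -> ChainSketch:
        # (g after f)^-1 = f^-1 after g^-1: undo the last step first.
        return ChainSketch(*(t.inverse() for t in reversed(self.transforms)))


chain = ChainSketch(Shift(1.0), Scale(2.0)).fit([1.0, 2.0])
print(chain([1.0, 2.0]))  # [4.0, 6.0]
print(chain.inverse()(chain([1.0, 2.0])))  # [1.0, 2.0]

# Inverting the steps without reversing their order does not round-trip.
wrong_order = ChainSketch(Shift(-1.0), Scale(0.5))
print(wrong_order([4.0, 6.0]))  # [1.5, 2.5]
```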

Identity()

Bases: Invertible, Transform

A no-op transform that returns data unchanged.

Often used as a placeholder or default transform.

Initialize the identity transform.

Source code in src/flowcean/core/transform.py
def __init__(self) -> None:
    """Initialize the identity transform."""
    super().__init__()
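An identity transform is the neutral element of composition, which is what makes it a convenient default. In this sketch, Identity is a stand-in class (not flowcean's) and run_pipeline is a hypothetical helper that falls back to it when no preprocessing step is given; plain functions play the role of transforms.

```python
from __future__ import annotations

from typing import Callable, List, Optional


class Identity:
    """Stand-in for an identity transform; not flowcean's class."""

    def __call__(self, data: List[float]) -> List[float]:
        return data  # data passes through unchanged

    def inverse(self) -> Identity:
        return self  # the identity is its own inverse


def negate(data: List[float]) -> List[float]:
    return [-x for x in data]


def run_pipeline(
    data: List[float],
    preprocess: Optional[Callable[[List[float]], List[float]]] = None,
) -> List[float]:
    # Hypothetical helper: falling back to Identity avoids special-casing
    # "no preprocessing requested".
    step = preprocess if preprocess is not None else Identity()
    return step(data)


print(run_pipeline([1.0, 2.0]))  # [1.0, 2.0]
print(run_pipeline([1.0, 2.0], negate))  # [-1.0, -2.0]
```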

Lambda(func, *, inverse_func=None)

Bases: Transform, Invertible

A transform wrapping a function.

Useful for quick one-off transformations without creating a dedicated class.

Example
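>>> import polars as pl
>>> to_float = Lambda(lambda df: df.cast(pl.Float64))
>>> mean, std = 10.0, 2.0  # statistics computed beforehand, not from the input
>>> normalized = Lambda(
...     lambda df: df.select((pl.all() - mean) / std),
...     inverse_func=lambda df: df.select(pl.all() * std + mean),
... )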

Initialize the lambda transform.

Parameters:

func (Callable[[Data], Data]): Function that transforms data. Required.

inverse_func (Callable[[Data], Data] | None): Optional function that inverts func. Defaults to None.

Source code in src/flowcean/core/transform.py
def __init__(
    self,
    func: Callable[[Data], Data],
    *,
    inverse_func: Callable[[Data], Data] | None = None,
) -> None:
    """Initialize the lambda transform.

    Args:
        func: Function that transforms data.
        inverse_func: Optional function that inverts ``func``.
    """
    self.func = func
    self.inverse_func = inverse_func
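The optional inverse_func suggests a simple pattern for inversion: swap the two functions. The sketch below is a hypothetical LambdaSketch with the same keyword-only inverse_func parameter; raising NotImplementedError when no inverse is available is an assumption, since this page does not document flowcean's behavior in that case.

```python
from __future__ import annotations

from typing import Callable, List, Optional

Func = Callable[[List[float]], List[float]]


class LambdaSketch:
    """Hypothetical re-implementation of a function-wrapping transform."""

    def __init__(self, func: Func, *, inverse_func: Optional[Func] = None) -> None:
        self.func = func
        self.inverse_func = inverse_func

    def __call__(self, data: List[float]) -> List[float]:
        return self.func(data)

    def inverse(self) -> LambdaSketch:
        if self.inverse_func is None:
            # Assumption: flowcean's actual error type is not documented here.
            raise NotImplementedError("no inverse_func was provided")
        # Swapping the two functions yields the inverse; its inverse is the original.
        return LambdaSketch(self.inverse_func, inverse_func=self.func)


to_fahrenheit = LambdaSketch(
    lambda temps: [t * 9 / 5 + 32 for t in temps],
    inverse_func=lambda temps: [(t - 32) * 5 / 9 for t in temps],
)
print(to_fahrenheit([0.0, 100.0]))  # [32.0, 212.0]
print(to_fahrenheit.inverse()([32.0, 212.0]))  # [0.0, 100.0]
```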