transform
This module provides abstractions for transforms.
Transforms are reusable, composable operations that modify data in preparation for machine learning or analysis. Transforms unify pre-processing, feature engineering, and augmentation under a single protocol-based interface.
Nomenclature
Transforms are a set of operations that modify data. They can include operations such as data normalization, dimensionality reduction, data augmentation, and much more. These transformations are essential for preparing data for machine learning tasks and improving model performance.
We use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.
flowcean provides a flexible and unified interface to apply transforms to data. The framework allows these transformation steps to be combined as needed.
Using Transforms
Here's a basic example:
```python
import polars as pl

from flowcean.core import Lambda
from flowcean.polars import Select, Standardize

# `dataset` is assumed to be a polars (Lazy)Frame with columns "x" and "y".

# Define a simple pipeline
transform = (
    Select(features=["x", "y"])
    | Standardize()
    | Lambda(lambda df: df.with_columns(z=pl.col("x") * pl.col("y")))
)

# Fit and apply
transform.fit(dataset)
transformed = transform(dataset)

# Invert (only if every step in the pipeline supports inversion)
restored = transform.inverse()(transformed)
```
Transform
Bases: Named, Protocol
Base protocol for all transforms in Flowcean.
A transform is a reusable operation that modifies data. Examples include preprocessing (e.g., standardization), feature engineering (e.g., feature selection, PCA), or augmentation (e.g., noise injection, synthetic features).
Transforms are composable via the | operator, allowing complex transformation pipelines to be expressed in a clean and functional style:
Example
>>> transform = Select(features=["x"]) | Standardize()
>>> transformed = transform(dataset)
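To make the composition semantics concrete, here is a minimal, self-contained sketch of the protocol's shape. It is not Flowcean's implementation: it uses plain Python lists as data, an `abc.ABC` base class instead of `Named, Protocol`, and the hypothetical names `SketchTransform`, `_Chain`, `AddOne`, and `Double`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchTransform(ABC):
    """Hypothetical stand-in for the Transform protocol (lists as data)."""

    @abstractmethod
    def apply(self, data: list[float]) -> list[float]:
        """Apply the transform to data."""

    def __call__(self, data: list[float]) -> list[float]:
        # Calling a transform is equivalent to self.apply(data).
        return self.apply(data)

    def fit(self, data: list[float]) -> None:
        # Default: stateless transforms have nothing to learn.
        return None

    def __or__(self, other: SketchTransform) -> SketchTransform:
        # `a | b` builds a pipeline that runs `a` first, then `b`.
        return _Chain(self, other)


class _Chain(SketchTransform):
    def __init__(self, *transforms: SketchTransform) -> None:
        self.transforms = transforms

    def apply(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform(data)
        return data


class AddOne(SketchTransform):
    def apply(self, data: list[float]) -> list[float]:
        return [x + 1 for x in data]


class Double(SketchTransform):
    def apply(self, data: list[float]) -> list[float]:
        return [2 * x for x in data]


pipeline = AddOne() | Double()
print(pipeline([1.0, 2.0]))  # [4.0, 6.0]
```

Because `__or__` returns another transform, a pipeline can itself be chained further with `|`.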
apply(data)
abstractmethod
Apply the transform to data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Data | The data to transform. | required |
Returns:
| Type | Description |
|---|---|
| Data | The transformed data. |
Source code in src/flowcean/core/transform.py (lines 85-94)
__call__(data)
Apply the transform to data.
Equivalent to self.apply(data).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Data | The data to transform. | required |
Returns:
| Type | Description |
|---|---|
| Data | The transformed data. |
Source code in src/flowcean/core/transform.py (lines 96-108)
chain(other)
Chain this transform with other.
This can be used to chain multiple transforms together. Chained transforms are applied left-to-right:
Example
a, b = TransformA(), TransformB()
chained = a.chain(b)
chained(data)  # Equivalent to b(a(data))
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Transform | The transform to chain. | required |
Returns:
| Type | Description |
|---|---|
| Transform | A new chained transform. |
Source code in src/flowcean/core/transform.py (lines 110-131)
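The left-to-right order can be checked with plain functions standing in for transforms. The helper `chain` and the toy functions `add_one` and `double` below are hypothetical; they only mirror the documented semantics that the first transform runs before the second.

```python
from typing import Callable

Data = list[float]
Fn = Callable[[Data], Data]


def chain(first: Fn, second: Fn) -> Fn:
    """Hypothetical helper: compose two transforms left-to-right."""

    def chained(data: Data) -> Data:
        # `first` runs before `second`, i.e. second(first(data)).
        return second(first(data))

    return chained


def add_one(data: Data) -> Data:
    return [x + 1 for x in data]


def double(data: Data) -> Data:
    return [2 * x for x in data]


print(chain(add_one, double)([1.0]))  # [4.0]
print(chain(double, add_one)([1.0]))  # [3.0]
```

The two results differ, so the order of chaining matters; grouping, however, does not, because nested chains apply the same steps in the same order.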
__or__(other)
Shorthand for chaining transforms.
Example
chained = TransformA() | TransformB()
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Transform | The transform to chain. | required |
Returns:
| Type | Description |
|---|---|
| Transform | A new chained transform. |
Source code in src/flowcean/core/transform.py (lines 133-150)
fit(data)
Fit the transform to data.
Many transforms (e.g., scaling, PCA) require statistics computed from the dataset before they can be applied. The default implementation is a no-op.
This method is meant to be idempotent, i.e., calling fit() multiple times on the same data should have the same effect as calling it once.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Data | The data to fit to. | required |
Source code in src/flowcean/core/transform.py (lines 152-164)
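One way to satisfy the idempotency requirement is to recompute all statistics from scratch on every call instead of accumulating them. The class `MinimalStandardize` below is a hypothetical stand-in (plain lists, population standard deviation), not `flowcean.polars.Standardize`.

```python
from __future__ import annotations

import statistics


class MinimalStandardize:
    """Hypothetical fittable transform; not flowcean.polars.Standardize."""

    def __init__(self) -> None:
        self.mean: float | None = None
        self.std: float | None = None

    def fit(self, data: list[float]) -> None:
        # Overwrite (never accumulate) the statistics, so fitting the same
        # data twice leaves the transform in exactly the same state.
        self.mean = statistics.fmean(data)
        self.std = statistics.pstdev(data)

    def __call__(self, data: list[float]) -> list[float]:
        if self.mean is None or self.std is None:
            raise RuntimeError("fit() must be called before applying")
        return [(x - self.mean) / self.std for x in data]


scaler = MinimalStandardize()
data = [1.0, 2.0, 3.0]
scaler.fit(data)
first_state = (scaler.mean, scaler.std)
scaler.fit(data)  # refitting on the same data changes nothing
second_state = (scaler.mean, scaler.std)
```

A `fit` that instead added to running totals would drift with every call, which is exactly what the idempotency rule excludes; incremental updates belong in `fit_incremental`.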
fit_incremental(data)
Incrementally fit the transform to streaming/batched data.
Default implementation is a no-op.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Data | The data to fit to. | required |
Source code in src/flowcean/core/transform.py (lines 166-175)
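For streaming data, statistics have to be updated batch by batch without revisiting earlier data. The sketch below uses Welford's online algorithm for the running mean and variance; this choice of algorithm is an assumption for illustration, and the class `RunningStandardize` is hypothetical.

```python
import math


class RunningStandardize:
    """Hypothetical streaming scaler using Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def fit_incremental(self, batch: list[float]) -> None:
        # Fold each new value into the running statistics; earlier
        # batches never need to be revisited or stored.
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation of all values seen so far.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

    def __call__(self, data: list[float]) -> list[float]:
        return [(x - self.mean) / self.std for x in data]


scaler = RunningStandardize()
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0]):
    scaler.fit_incremental(batch)
print(scaler.mean, round(scaler.std, 6))  # 3.0 1.414214
```

Splitting the same values into different batches yields the same statistics, up to floating-point rounding, as feeding them all at once.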
Invertible
Bases: Protocol
Protocol for transforms that support inversion.
An invertible transform can undo its effect via inverse().
Example
>>> scaler = Standardize()
>>> scaler.fit(data)
>>> restored = scaler.inverse()(scaler(data))
inverse()
abstractmethod
Return a new transform that inverts this one.
Returns:
| Type | Description |
|---|---|
| Transform | The inverse of the transform. |
Source code in src/flowcean/core/transform.py (lines 191-197)
ChainedTransforms(*transforms)
Bases: Invertible, Transform
A composition of multiple transforms applied sequentially.
Chained transforms are applied left-to-right. Useful for building preprocessing pipelines.
Initialize the chained transforms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transforms | Transform | The transforms to chain. | () |
Source code in src/flowcean/core/transform.py (lines 209-218)
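The following sketch shows one plausible way a chain can be fitted, applied, and inverted. Two behaviors are assumptions rather than documented facts: each step is fitted on the output of the steps before it, and the inverse applies the individual inverses in reverse order (undoing "f, then g" means undoing g first, then f). `Affine` and `Chained` are hypothetical names.

```python
from __future__ import annotations


class Affine:
    """Hypothetical invertible transform: x -> a * x + b (a != 0)."""

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def fit(self, data: list[float]) -> None:
        pass  # fixed coefficients, nothing to learn

    def __call__(self, data: list[float]) -> list[float]:
        return [self.a * x + self.b for x in data]

    def inverse(self) -> Affine:
        # Solve y = a * x + b for x: x = y / a - b / a.
        return Affine(1 / self.a, -self.b / self.a)


class Chained:
    """Hypothetical sequential composition of Affine-like transforms."""

    def __init__(self, *transforms: Affine) -> None:
        self.transforms = transforms

    def fit(self, data: list[float]) -> None:
        # Assumption: each step is fitted on the output of the previous steps.
        for transform in self.transforms:
            transform.fit(data)
            data = transform(data)

    def __call__(self, data: list[float]) -> list[float]:
        # Apply the steps left-to-right.
        for transform in self.transforms:
            data = transform(data)
        return data

    def inverse(self) -> Chained:
        # Undo the steps in reverse order.
        return Chained(*(t.inverse() for t in reversed(self.transforms)))


pipeline = Chained(Affine(2.0, 1.0), Affine(1.0, -3.0))
pipeline.fit([1.0, 2.0])
encoded = pipeline([1.0, 2.0])  # [0.0, 2.0]
decoded = pipeline.inverse()(encoded)  # [1.0, 2.0]
```

Inverting the steps in the original order instead would yield [2.5, 3.5] here, which is why the reversal matters.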
Identity()
Bases: Invertible, Transform
A no-op transform that returns data unchanged.
Often used as a placeholder or default transform.
Initialize the identity transform.
Source code in src/flowcean/core/transform.py (lines 268-270)
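An identity transform is the natural starting value when composing an arbitrary, possibly empty, list of transforms, because composing it with any transform leaves that transform unchanged. The function-based helpers `identity` and `compose` below are hypothetical and only illustrate this placeholder role.

```python
from functools import reduce
from typing import Callable

Data = list[float]
TransformFn = Callable[[Data], Data]


def identity(data: Data) -> Data:
    """Return the data unchanged (a no-op transform)."""
    return data


def compose(*transforms: TransformFn) -> TransformFn:
    """Chain transforms left-to-right; with no arguments, return identity."""

    def step(acc: TransformFn, nxt: TransformFn) -> TransformFn:
        # Run everything accumulated so far, then `nxt`.
        return lambda data: nxt(acc(data))

    return reduce(step, transforms, identity)


def negate(data: Data) -> Data:
    return [-x for x in data]


print(compose()([1.0, 2.0]))  # [1.0, 2.0]
print(compose(negate, negate)([1.0, 2.0]))  # [1.0, 2.0]
```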
Lambda(func, *, inverse_func=None)
Bases: Transform, Invertible
A transform wrapping a function.
Useful for quick one-off transformations without creating a dedicated class.
Example
>>> to_float = Lambda(lambda df: df.cast(pl.Float64))
>>> mean, std = 10.0, 2.0  # known statistics (illustrative values)
>>> normalized = Lambda(
...     lambda df: df.select((pl.all() - mean) / std),
...     inverse_func=lambda df: df.select(pl.all() * std + mean),
... )
Initialize the lambda transform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable[[Data], Data] | Function that transforms data. | required |
| inverse_func | Callable[[Data], Data] \| None | Optional function that inverts func. | None |
Source code in src/flowcean/core/transform.py (lines 307-320)
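A minimal sketch of how a function-wrapping transform can support inversion: `inverse()` swaps the two functions. What Flowcean's `Lambda` does when `inverse_func` is `None` is not stated here; raising `NotImplementedError` is an assumption of this sketch, as are the class name `LambdaSketch` and the example `celsius_to_kelvin`.

```python
from __future__ import annotations

from typing import Callable, Optional

Data = list[float]


class LambdaSketch:
    """Hypothetical sketch of a transform that wraps plain functions."""

    def __init__(
        self,
        func: Callable[[Data], Data],
        *,
        inverse_func: Optional[Callable[[Data], Data]] = None,
    ) -> None:
        self.func = func
        self.inverse_func = inverse_func

    def __call__(self, data: Data) -> Data:
        return self.func(data)

    def inverse(self) -> LambdaSketch:
        # Assumption: with no inverse_func there is nothing to invert with,
        # so this sketch raises instead of guessing.
        if self.inverse_func is None:
            raise NotImplementedError("no inverse_func was provided")
        # Swapping the functions makes the inverse invertible in turn.
        return LambdaSketch(self.inverse_func, inverse_func=self.func)


celsius_to_kelvin = LambdaSketch(
    lambda d: [x + 273.15 for x in d],
    inverse_func=lambda d: [x - 273.15 for x in d],
)
kelvin = celsius_to_kelvin([0.0, 100.0])
celsius = celsius_to_kelvin.inverse()(kelvin)
```

Keeping `inverse_func` keyword-only, as in the documented signature, makes it hard to pass the inverse by accident in place of another positional argument.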