transform
This module provides abstractions for transforms.
Transforms are reusable, composable operations that modify data in preparation for machine learning or analysis. Transforms unify pre-processing, feature engineering, and augmentation under a single protocol-based interface.
Nomenclature
Transforms are a set of operations that modify data. They can include operations such as data normalization, dimensionality reduction, data augmentation, and much more. These transformations are essential for preparing data for machine learning tasks and improving model performance.
We use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.
Flowcean provides a flexible, unified interface for applying transforms to data. The framework lets you combine these transformation steps as needed.
Using Transforms
Here's a basic example:
from flowcean.polars import Select, Standardize
from flowcean.core import Lambda
# Define a simple pipeline
transform = (
    Select(features=["x", "y"])
    | Standardize()
    | Lambda(lambda df: df.with_columns(z=df["x"] * df["y"]))
)
# Fit and apply
transform.fit(dataset)
transformed = transform(dataset)
# Invert (if supported)
restored = transform.inverse()(transformed)
Transform
Bases: Named, Protocol
Base protocol for all transforms in Flowcean.
A transform is a reusable operation that modifies data. Examples include preprocessing (e.g., standardization), feature engineering (e.g., feature selection, PCA), or augmentation (noise injection, synthetic features).
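To illustrate what a protocol-based interface means in Python, here is a minimal sketch that uses plain lists instead of Flowcean data. `SupportsApply`, `Scale`, and `run` are hypothetical names chosen for this illustration; they are not part of Flowcean, and whether Flowcean's own transforms subclass the protocol explicitly is not shown on this page.

```python
from __future__ import annotations

from typing import Protocol


class SupportsApply(Protocol):
    """Hypothetical stand-in for a transform protocol (not Flowcean's code)."""

    def apply(self, data: list[float]) -> list[float]:
        ...


class Scale:
    """Multiplies every value by a constant; it inherits from nothing."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, data: list[float]) -> list[float]:
        return [value * self.factor for value in data]

    def __call__(self, data: list[float]) -> list[float]:
        # Calling the object simply forwards to apply().
        return self.apply(data)


def run(transform: SupportsApply, data: list[float]) -> list[float]:
    # Any object with a matching apply() satisfies the protocol structurally.
    return transform.apply(data)


doubled = run(Scale(2.0), [1.0, 2.0, 3.0])
```

A type checker accepts `Scale` wherever `SupportsApply` is expected because the method signatures match, which is what lets very different operations share one interface.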
Transforms are composable via the | operator, allowing complex transformation pipelines to be expressed in a clean and functional style:
Example
>>> transform = Select(features=["x"]) | Standardize()
>>> transformed = transform(dataset)
apply(data)
abstractmethod
Apply the transform to data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | The data to transform. | required |
Returns:
Type | Description |
---|---|
Data | The transformed data. |
Source code in src/flowcean/core/transform.py
__call__(data)
Apply the transform to data.
Equivalent to self.apply(data).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | The data to transform. | required |
Returns:
Type | Description |
---|---|
Data | The transformed data. |
Source code in src/flowcean/core/transform.py
chain(other)
Chain this transform with other.
This can be used to chain multiple transforms together. Chained transforms are applied left-to-right:
Example
chained = TransformA().chain(TransformB())
chained(data)  # Equivalent to TransformB()(TransformA()(data))
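The left-to-right order matters whenever the steps do not commute. The following sketch shows this with plain functions on floats; the `chain` helper here is a hypothetical stand-in written for this illustration, not Flowcean's `chain` method.

```python
from __future__ import annotations

from typing import Callable

Step = Callable[[float], float]


def chain(first: Step, second: Step) -> Step:
    """Return a step that runs `first`, then feeds its result to `second`."""

    def chained(value: float) -> float:
        return second(first(value))

    return chained


def add_one(value: float) -> float:
    return value + 1.0


def double(value: float) -> float:
    return value * 2.0


add_then_double = chain(add_one, double)  # computes (x + 1) * 2
double_then_add = chain(double, add_one)  # computes (x * 2) + 1

result_a = add_then_double(3.0)
result_b = double_then_add(3.0)
```

Swapping the two arguments changes the result, so the order in which transforms are chained is part of the pipeline's definition.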
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Transform | The transform to chain. | required |
Returns:
Type | Description |
---|---|
Transform | A new chained transform. |
Source code in src/flowcean/core/transform.py
__or__(other)
Shorthand for chaining transforms.
Example
chained = TransformA() | TransformB()
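As a rough sketch of how such a shorthand can be built, the hypothetical `Step` class below overloads `|` by delegating to its own `chain` method. It works on strings and is only an illustration of the operator mechanics; it makes no claim about Flowcean's internals.

```python
from __future__ import annotations


class Step:
    """Hypothetical pipeline step; `|` builds a new step and mutates nothing."""

    def __init__(self, name: str, func) -> None:
        self.name = name
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def chain(self, other: "Step") -> "Step":
        # The left operand runs first, the right operand second.
        return Step(f"{self.name} | {other.name}", lambda v: other(self(v)))

    def __or__(self, other: "Step") -> "Step":
        # The operator is only shorthand for chain().
        return self.chain(other)


strip = Step("strip", str.strip)
upper = Step("upper", str.upper)
exclaim = Step("exclaim", lambda s: s + "!")

# Python parses this as (strip | upper) | exclaim.
pipeline = strip | upper | exclaim
output = pipeline("  hello ")
```

Because `|` is left-associative, a pipeline with several stages is built up pairwise while still running its stages in the written order.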
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Transform | The transform to chain. | required |
Returns:
Type | Description |
---|---|
Transform | A new Chain transform. |
Source code in src/flowcean/core/transform.py
fit(data)
Fit the transform to data.
Many transforms (e.g. scaling, PCA) require statistics from the dataset before applying. Default implementation is a no-op.
This is meant to be idempotent, i.e., calling fit() multiple times should have the same effect as calling it once.
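One way to obtain an idempotent fit is to recompute all statistics from scratch on every call rather than accumulating them. The sketch below illustrates this with a hypothetical `SketchStandardize` class on plain lists; returning `self` from `fit` is an assumption made here so that calls can be chained, and it is not confirmed by the parameter documentation.

```python
from __future__ import annotations

from statistics import fmean, pstdev


class SketchStandardize:
    """Hypothetical scaler; fit() overwrites state instead of accumulating."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, data: list[float]) -> "SketchStandardize":
        # Statistics are recomputed from scratch, so repeated calls on the
        # same data leave the transform in the same state (idempotent).
        self.mean = fmean(data)
        self.std = pstdev(data) or 1.0  # fall back to 1.0 for constant data
        return self  # assumption: returning self allows chained calls

    def apply(self, data: list[float]) -> list[float]:
        return [(value - self.mean) / self.std for value in data]


values = [2.0, 4.0, 6.0]
scaler = SketchStandardize()

scaler.fit(values)
state_once = (scaler.mean, scaler.std)

scaler.fit(values)
state_twice = (scaler.mean, scaler.std)

scaled = scaler.apply(values)
```

A design that added each call's data to a running total would break this property, which is why incremental fitting is kept in a separate method.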
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | The data to fit to. | required |
Source code in src/flowcean/core/transform.py
fit_incremental(data)
Incrementally fit the transform to streaming/batched data.
Default implementation is a no-op.
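For a transform that does override this method, incremental fitting typically means updating a running statistic batch by batch. The following sketch uses a hypothetical `RunningMeanCenter` class with a standard incremental mean update; it is an illustration on plain lists, not a Flowcean transform.

```python
from __future__ import annotations


class RunningMeanCenter:
    """Hypothetical transform that learns a mean from successive batches."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def fit_incremental(self, batch: list[float]) -> None:
        # Update the running mean one value at a time; earlier batches
        # do not need to be kept in memory.
        for value in batch:
            self.count += 1
            self.mean += (value - self.mean) / self.count

    def apply(self, data: list[float]) -> list[float]:
        return [value - self.mean for value in data]


center = RunningMeanCenter()
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0]):
    center.fit_incremental(batch)

centered = center.apply([3.5, 4.5])
```

Unlike an idempotent `fit`, each call here changes the state on purpose, so feeding the same batch twice counts it twice.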
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Data | The data to fit to. | required |
Source code in src/flowcean/core/transform.py
Invertible
Bases: Protocol
Protocol for transforms that support inversion.
An invertible transform can undo its effect via inverse().
Example
>>> scaler = Standardize().fit(data)
>>> restored = scaler.inverse()(scaler(data))
inverse()
abstractmethod
Return a new transform that inverts this one.
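A minimal sketch of this contract, using a hypothetical `Shift` class on plain lists: `inverse()` builds and returns a separate object, so the original transform remains usable afterwards.

```python
from __future__ import annotations


class Shift:
    """Hypothetical invertible transform that adds a constant offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, data: list[float]) -> list[float]:
        return [value + self.offset for value in data]

    def inverse(self) -> "Shift":
        # A new object is returned; the original transform is left untouched.
        return Shift(-self.offset)


shift = Shift(10.0)
original = [1.0, 2.0, 3.0]
round_trip = shift.inverse()(shift(original))
```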
Returns:
Type | Description |
---|---|
Transform | The inverse of the transform. |
Source code in src/flowcean/core/transform.py
ChainedTransforms(*transforms)
Bases: Invertible, Transform
A composition of multiple transforms applied sequentially.
Chained transforms are applied left-to-right. Useful for building preprocessing pipelines.
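Mathematically, a composition is undone by inverting each step and applying the inverses in reverse order. The sketch below demonstrates that rule with hypothetical `Shift`, `Scale`, and `SketchChain` classes on floats; that Flowcean's ChainedTransforms inverts in exactly this way is an assumption, not something this page states.

```python
from __future__ import annotations


class Shift:
    """Hypothetical step that adds a constant."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, value: float) -> float:
        return value + self.offset

    def inverse(self) -> "Shift":
        return Shift(-self.offset)


class Scale:
    """Hypothetical step that multiplies by a non-zero constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, value: float) -> float:
        return value * self.factor

    def inverse(self) -> "Scale":
        return Scale(1.0 / self.factor)


class SketchChain:
    """Hypothetical sequential composition of invertible steps."""

    def __init__(self, *transforms) -> None:
        self.transforms = transforms

    def __call__(self, value: float) -> float:
        for transform in self.transforms:  # applied left-to-right
            value = transform(value)
        return value

    def inverse(self) -> "SketchChain":
        # Undo the last step first: invert each step and reverse the order.
        return SketchChain(*(t.inverse() for t in reversed(self.transforms)))


pipeline = SketchChain(Shift(1.0), Scale(4.0))
forward = pipeline(2.0)  # (2 + 1) * 4
restored = pipeline.inverse()(forward)
```

Inverting the steps without reversing their order would compute `(12 - 1) / 4` here, which does not recover the input.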
Initialize the chained transforms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transforms | Transform | The transforms to chain. | () |
Source code in src/flowcean/core/transform.py
Identity()
Bases: Invertible, Transform
A no-op transform that returns data unchanged.
Often used as a placeholder or default transform.
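The placeholder role can be sketched as follows: a function takes an optional preprocessing step and falls back to an identity when none is given. `identity` and `prepare` are hypothetical helpers on plain lists, written only to illustrate the pattern.

```python
from __future__ import annotations

from typing import Callable, Optional

Step = Callable[[list], list]


def identity(data: list) -> list:
    """Return the data unchanged."""
    return data


def prepare(data: list, preprocess: Optional[Step] = None) -> list:
    # Falling back to the identity avoids a separate "no transform" code path.
    step = preprocess if preprocess is not None else identity
    return step(data)


raw = [3, 1, 2]
untouched = prepare(raw)
ordered = prepare(raw, sorted)
```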
Initialize the identity transform.
Source code in src/flowcean/core/transform.py
Lambda(func, *, inverse_func=None)
Bases: Transform, Invertible
A transform wrapping a function.
Useful for quick one-off transformations without creating a dedicated class.
Example
>>> to_float = Lambda(lambda df: df.cast(pl.Float64))
>>> mean, std = 2.0, 0.5  # statistics fixed up front so the inverse reuses them
>>> normalized = Lambda(
...     lambda df: (df - mean) / std,
...     inverse_func=lambda df: df * std + mean,
... )
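The wrapping idea can be sketched with a hypothetical `SketchLambda` class that operates on floats. Raising `NotImplementedError` when no inverse function was supplied is an assumption made for this illustration; the page does not say how the real Lambda behaves in that case.

```python
from __future__ import annotations

from typing import Callable, Optional

Func = Callable[[float], float]


class SketchLambda:
    """Hypothetical function wrapper with an optional inverse."""

    def __init__(self, func: Func, *, inverse_func: Optional[Func] = None) -> None:
        self.func = func
        self.inverse_func = inverse_func

    def __call__(self, value: float) -> float:
        return self.func(value)

    def inverse(self) -> "SketchLambda":
        if self.inverse_func is None:
            # Assumption: without an inverse function there is nothing to return.
            raise NotImplementedError("no inverse_func was provided")
        # Swap the two functions so that inverting twice gives the original back.
        return SketchLambda(self.inverse_func, inverse_func=self.func)


to_fahrenheit = SketchLambda(
    lambda c: c * 9 / 5 + 32,
    inverse_func=lambda f: (f - 32) * 5 / 9,
)
boiling = to_fahrenheit(100.0)
back = to_fahrenheit.inverse()(boiling)
```

The inverse function has to be written against the same fixed parameters as the forward function; recomputing statistics from the already transformed data would not undo the original operation.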
Initialize the lambda transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[Data], Data] | Function that transforms data. | required |
inverse_func | Callable[[Data], Data] \| None | Optional function that inverts func. | None |
Source code in src/flowcean/core/transform.py