transform
This module provides base classes for transforms.
Pre-processing of data, feature engineering, and data augmentation are fundamental processes in machine learning. AGenC generalizes these processes under the term transforms. This page introduces the concept of transforms and demonstrates how to use them within AGenC.
Nomenclature
Transforms are operations that modify data, such as normalization, dimensionality reduction, or data augmentation. They prepare data for machine learning tasks and can improve model performance.
In AGenC, we use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.
AGenC provides a flexible and unified interface for applying transforms to data. The framework allows you to combine these transformation steps as needed.
Using Transforms
Here's a basic example:
import polars as pl
from flowcean.transforms import Select, Standardize
# Load the dataset (transforms operate on polars LazyFrames)
dataset: pl.LazyFrame = ...
# Define transforms by chaining a selection and a standardization
transforms = Select(features=["reference", "temperature"]) | Standardize(
    mean={
        "reference": 0.0,
        "temperature": 0.0,
    },
    std={
        "reference": 1.0,
        "temperature": 1.0,
    },
)
# Apply the transforms to data
transformed_data = transforms(dataset)
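Standardization usually refers to the z-score transform x' = (x - mean) / std, applied column by column. Assuming `Standardize` follows that definition, the example above (mean 0.0, std 1.0) leaves both columns numerically unchanged. The hypothetical helper `standardize_column` below shows the arithmetic on a plain Python list instead of a polars frame, so it runs without flowcean or polars installed.

```python
def standardize_column(values: list[float], mean: float, std: float) -> list[float]:
    """Return the z-scores (x - mean) / std of a column (hypothetical helper)."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return [(x - mean) / std for x in values]


temperature = [20.0, 22.5, 25.0]

# With mean=0.0 and std=1.0, as in the example above, values pass through unchanged.
print(standardize_column(temperature, mean=0.0, std=1.0))   # [20.0, 22.5, 25.0]

# With statistics that match the data, the column is centered and scaled.
print(standardize_column(temperature, mean=22.5, std=2.5))  # [-1.0, 0.0, 1.0]
```

Choosing mean and std from the training data is what makes standardization useful. Fixed values, as in the example, are mostly a placeholder.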
Transform
Bases: ABC
Base class for all transforms.
apply(data)
abstractmethod
Apply the transform to data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | LazyFrame | The data to transform. | required |
Returns:
Type | Description |
---|---|
LazyFrame | The transformed data. |
Source code in src/flowcean/core/transform.py
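A concrete transform subclasses `Transform` and implements the abstract `apply` method. The sketch below mirrors the documented interface with a minimal stand-in `Transform` ABC so it runs without flowcean or polars. Rows are plain dictionaries instead of a `LazyFrame`, and `DropColumns` is a hypothetical example transform, not part of flowcean.

```python
from abc import ABC, abstractmethod

Rows = list[dict[str, float]]  # stand-in for polars.LazyFrame in this sketch


class Transform(ABC):
    """Minimal stand-in for flowcean's Transform base class."""

    @abstractmethod
    def apply(self, data: Rows) -> Rows:
        """Apply the transform to data."""


class DropColumns(Transform):
    """Hypothetical transform that removes the given columns from every row."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = set(columns)

    def apply(self, data: Rows) -> Rows:
        # Build new rows so the input data is left untouched.
        return [{k: v for k, v in row.items() if k not in self.columns} for row in data]


rows = [{"reference": 1.0, "temperature": 20.0, "noise": 0.3}]
print(DropColumns(["noise"]).apply(rows))  # [{'reference': 1.0, 'temperature': 20.0}]
```

Because `apply` is abstract, instantiating a subclass that does not implement it raises a `TypeError`.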
__call__(data)
Apply the transform to data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | LazyFrame | The data to transform. | required |
Returns:
Type | Description |
---|---|
LazyFrame | The transformed data. |
Source code in src/flowcean/core/transform.py
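`__call__` lets a transform be used like a function, so `transform(data)` and `transform.apply(data)` produce the same result. The following is a minimal sketch of that delegation, assuming `__call__` simply forwards to `apply` (the two methods share the same documented description). The stand-in `Transform` class and the hypothetical `Scale` transform operate on plain lists instead of a `LazyFrame`.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Minimal stand-in for flowcean's Transform base class."""

    @abstractmethod
    def apply(self, data: list[float]) -> list[float]:
        """Apply the transform to data."""

    def __call__(self, data: list[float]) -> list[float]:
        # Calling the transform is shorthand for apply().
        return self.apply(data)


class Scale(Transform):
    """Hypothetical transform that multiplies every value by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, data: list[float]) -> list[float]:
        return [x * self.factor for x in data]


scale = Scale(2.0)
print(scale([1.0, 2.0, 3.0]))        # [2.0, 4.0, 6.0]
print(scale.apply([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
```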
chain(other)
Chain this transform with another transform.
The result is itself a transform, so calls can be repeated to chain multiple transforms together. Chained transforms are applied from left to right.
Example
chained_transform = TransformA().chain(TransformB())
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Transform | The transform to chain. | required |
Returns:
Type | Description |
---|---|
Transform | A new chained transform. |
Source code in src/flowcean/core/transform.py
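Left-to-right application means that the output of the first transform becomes the input of the second. The sketch below illustrates this with minimal stand-ins for `Transform`, `chain` and `ChainedTransforms` on plain lists. The `AddOne` and `Double` transforms are hypothetical and only serve the illustration. The two orders give different results, so the order of chaining matters.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Minimal stand-in for flowcean's Transform base class."""

    @abstractmethod
    def apply(self, data: list[int]) -> list[int]: ...

    def __call__(self, data: list[int]) -> list[int]:
        return self.apply(data)

    def chain(self, other: "Transform") -> "Transform":
        # Return a new transform; neither operand is modified.
        return ChainedTransforms(self, other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[int]) -> list[int]:
        # Feed each transform's output into the next one, left to right.
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class AddOne(Transform):
    def apply(self, data: list[int]) -> list[int]:
        return [x + 1 for x in data]


class Double(Transform):
    def apply(self, data: list[int]) -> list[int]:
        return [x * 2 for x in data]


print(AddOne().chain(Double())([1, 2]))  # (x + 1) * 2 -> [4, 6]
print(Double().chain(AddOne())([1, 2]))  # x * 2 + 1   -> [3, 5]
```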
__or__(other)
Shorthand for chaining transforms.
Example
chained_transform = TransformA() | TransformB()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Transform | The transform to chain. | required |
Returns:
Type | Description |
---|---|
Transform | A new chained transform. |
Source code in src/flowcean/core/transform.py
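Because `|` returns another transform, it can be repeated to build longer pipelines such as `a | b | c`. Python evaluates `|` from left to right, so this is parsed as `(a | b) | c`. The following is a minimal sketch assuming `__or__` simply delegates to `chain`. The stand-in classes are not flowcean's implementation, and `Add` and `Negate` are hypothetical transforms.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Minimal stand-in for flowcean's Transform base class."""

    @abstractmethod
    def apply(self, data: list[int]) -> list[int]: ...

    def __call__(self, data: list[int]) -> list[int]:
        return self.apply(data)

    def chain(self, other: "Transform") -> "Transform":
        return ChainedTransforms(self, other)

    def __or__(self, other: "Transform") -> "Transform":
        # `a | b` is shorthand for `a.chain(b)`.
        return self.chain(other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[int]) -> list[int]:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Add(Transform):
    """Hypothetical transform that adds a constant to every value."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def apply(self, data: list[int]) -> list[int]:
        return [x + self.amount for x in data]


class Negate(Transform):
    """Hypothetical transform that flips the sign of every value."""

    def apply(self, data: list[int]) -> list[int]:
        return [-x for x in data]


# Parsed as (Add(1) | Negate()) | Add(10): add 1, negate, then add 10.
pipeline = Add(1) | Negate() | Add(10)
print(pipeline([0, 5]))  # [9, 4]
```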
FitOnce
Bases: ABC
A mixin for transforms that need to be fitted to data once.
fit(data)
abstractmethod
Fit to the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | LazyFrame | The data to fit to. | required |
Source code in src/flowcean/core/transform.py
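A transform that learns its parameters from data, such as standardization with statistics estimated from the dataset, would combine `Transform` with the `FitOnce` mixin. `fit` is called once on the training data, and `apply` then reuses the stored parameters. The sketch below uses minimal stand-ins for both base classes and plain lists instead of a `LazyFrame`. `FittedStandardize` is a hypothetical name, not a flowcean class.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev
from typing import Optional


class Transform(ABC):
    """Minimal stand-in for flowcean's Transform base class."""

    @abstractmethod
    def apply(self, data: list[float]) -> list[float]: ...


class FitOnce(ABC):
    """Minimal stand-in for flowcean's FitOnce mixin."""

    @abstractmethod
    def fit(self, data: list[float]) -> None: ...


class FittedStandardize(Transform, FitOnce):
    """Hypothetical transform that estimates mean and std once, then standardizes."""

    def __init__(self) -> None:
        self.mean: Optional[float] = None
        self.std: Optional[float] = None

    def fit(self, data: list[float]) -> None:
        # Population statistics, computed a single time from the fitting data.
        self.mean = mean(data)
        self.std = pstdev(data)

    def apply(self, data: list[float]) -> list[float]:
        if self.mean is None or self.std is None:
            raise RuntimeError("call fit() before apply()")
        return [(x - self.mean) / self.std for x in data]


transform = FittedStandardize()
transform.fit([2.0, 4.0, 6.0])  # mean 4.0, population std about 1.633
print(transform.apply([4.0]))   # [0.0]
```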
FitIncremetally
Bases: ABC
A mixin for transforms that need to be fitted to data incrementally.
fit_incremental(data)
abstractmethod
Fit to the data incrementally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | LazyFrame | The data to fit to. | required |
Source code in src/flowcean/core/transform.py
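When data arrives in batches, `fit_incremental` can be called once per batch to update running statistics instead of recomputing them from scratch. The following is a minimal sketch assuming a running mean and variance maintained with Welford's online algorithm. The stand-in mixin keeps the documented name `FitIncremetally` and works on plain lists. `RunningStatistics` is hypothetical and omits the `Transform` base for brevity: a transform would then use `mean` and `std` inside `apply`.

```python
import math
from abc import ABC, abstractmethod


class FitIncremetally(ABC):
    """Minimal stand-in for flowcean's FitIncremetally mixin (name as documented)."""

    @abstractmethod
    def fit_incremental(self, data: list[float]) -> None: ...


class RunningStatistics(FitIncremetally):
    """Hypothetical fitter whose mean and std are updated batch by batch."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def fit_incremental(self, data: list[float]) -> None:
        # Welford's online update, one value at a time.
        for x in data:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation of all values seen so far.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


stats = RunningStatistics()
stats.fit_incremental([2.0, 4.0])  # first batch
stats.fit_incremental([6.0])       # second batch
# Matches fitting [2.0, 4.0, 6.0] in one batch (up to floating-point rounding).
print(stats.count, stats.mean, stats.std)
```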
ChainedTransforms(*transforms)
Bases: Transform, FitOnce, FitIncremetally
A transform that is a chain of other transforms.
Initialize the chained transforms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transforms | Transform | The transforms to chain. | () |
Source code in src/flowcean/core/transform.py
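Because `ChainedTransforms` also inherits from `FitOnce`, a chain can itself be fitted. The documentation does not say how the chain forwards `fit` to its members. One plausible strategy, assumed here, is to fit each fittable member in order and pass the transformed data on to the next member, so that later steps are fitted on already-transformed data. The sketch uses minimal stand-ins on plain lists, and `Shift` and `CenterOnce` are hypothetical transforms.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    @abstractmethod
    def apply(self, data: list[float]) -> list[float]: ...


class FitOnce(ABC):
    @abstractmethod
    def fit(self, data: list[float]) -> None: ...


class ChainedTransforms(Transform, FitOnce):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def fit(self, data: list[float]) -> None:
        # Assumed strategy: fit each member on the output of the members before it.
        for transform in self.transforms:
            if isinstance(transform, FitOnce):
                transform.fit(data)
            data = transform.apply(data)

    def apply(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Shift(Transform):
    """Hypothetical fixed transform: add a constant."""

    def __init__(self, amount: float) -> None:
        self.amount = amount

    def apply(self, data: list[float]) -> list[float]:
        return [x + self.amount for x in data]


class CenterOnce(Transform, FitOnce):
    """Hypothetical fitted transform: subtract the mean seen during fit()."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, data: list[float]) -> None:
        self.mean = sum(data) / len(data)

    def apply(self, data: list[float]) -> list[float]:
        return [x - self.mean for x in data]


center = CenterOnce()
chain = ChainedTransforms(Shift(10.0), center)
chain.fit([1.0, 2.0, 3.0])
print(center.mean)         # 12.0, the mean of the shifted data
print(chain.apply([2.0]))  # [0.0]
```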
Identity()
Bases: Transform
A transform that does nothing.
Initialize the identity transform.
Source code in src/flowcean/core/transform.py
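An identity transform returns its input unchanged. This makes it a neutral element for chaining (`Identity() | t` behaves like `t`) and a convenient default wherever a transform is required but no processing is wanted. The following is a minimal sketch with stand-in classes on plain lists. Whether flowcean's `Identity` returns the same object or a copy is not documented, and this sketch returns the input as-is. `Square` and `build_pipeline` are hypothetical.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    @abstractmethod
    def apply(self, data: list[int]) -> list[int]: ...

    def __call__(self, data: list[int]) -> list[int]:
        return self.apply(data)

    def __or__(self, other: "Transform") -> "Transform":
        return ChainedTransforms(self, other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[int]) -> list[int]:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Identity(Transform):
    """A transform that does nothing: the input is returned unchanged."""

    def apply(self, data: list[int]) -> list[int]:
        return data


class Square(Transform):
    """Hypothetical example transform."""

    def apply(self, data: list[int]) -> list[int]:
        return [x * x for x in data]


def build_pipeline(square: bool) -> Transform:
    # Hypothetical helper: Identity serves as a no-op default step.
    return Square() if square else Identity()


print((Identity() | Square())([1, 2, 3]))      # [1, 4, 9], same as Square() alone
print(build_pipeline(square=False)([1, 2, 3]))  # [1, 2, 3]
```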