
transform

This module provides base classes for transforms.

Data pre-processing, feature engineering, and data augmentation are fundamental processes in machine learning. AGenC generalizes these processes under the term transforms. This page guides you through the concept of transforms and demonstrates how to use them within AGenC.

Nomenclature

Transforms are operations that modify data. Examples include data normalization, dimensionality reduction, and data augmentation. These operations prepare data for machine learning tasks and can improve model performance.

In AGenC, we use the generalized term transform for all types of pre-processing of data, feature engineering, and data augmentation, as they all involve the same fundamental concept of transforming data to obtain a modified dataset.

AGenC provides a flexible and unified interface for applying transforms to data, and lets you combine transformation steps as needed.

Using Transforms

Here's a basic example:

import polars as pl
from flowcean.transforms import Select, Standardize

# Load the dataset (transforms operate on polars LazyFrames)
dataset: pl.LazyFrame = ...

# Define transforms by chaining a selection and a standardization
transforms = Select(features=["reference", "temperature"]) | Standardize(
    mean={
        "reference": 0.0,
        "temperature": 0.0,
    },
    std={
        "reference": 1.0,
        "temperature": 1.0,
    },
)

# Apply the transforms to data
transformed_data = transforms(dataset)
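To make the arithmetic of this example concrete, here is a dependency-free sketch of what the two steps compute. A table is modeled as a hypothetical dict of column lists instead of a polars LazyFrame, the helpers `select` and `standardize` are illustrative rather than flowcean API, and standardization is assumed to map each value x to (x - mean) / std. Under that assumption, the mean of 0.0 and standard deviation of 1.0 used above leave the values unchanged, so the sketch uses non-trivial statistics to make the effect visible.

```python
# Hypothetical stand-ins for Select and Standardize; a table is a dict of columns.
Table = dict[str, list[float]]


def select(data: Table, features: list[str]) -> Table:
    """Keep only the listed columns, in the given order."""
    return {name: data[name] for name in features}


def standardize(data: Table, mean: dict[str, float], std: dict[str, float]) -> Table:
    """Map each value x in column c to (x - mean[c]) / std[c] (assumed formula)."""
    return {
        name: [(x - mean[name]) / std[name] for x in column]
        for name, column in data.items()
    }


dataset: Table = {
    "reference": [1.0, 2.0, 3.0],
    "temperature": [20.0, 22.0, 24.0],
    "humidity": [0.4, 0.5, 0.6],
}

# Selection runs first, then standardization, mirroring Select(...) | Standardize(...).
selected = select(dataset, ["reference", "temperature"])
transformed = standardize(
    selected,
    mean={"reference": 2.0, "temperature": 22.0},
    std={"reference": 1.0, "temperature": 2.0},
)
print(transformed)
# {'reference': [-1.0, 0.0, 1.0], 'temperature': [-1.0, 0.0, 1.0]}
```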

Transform

Bases: ABC

Base class for all transforms.

apply(data) abstractmethod

Apply the transform to data.

Parameters:

Name  Type       Description             Default
data  LazyFrame  The data to transform.  required

Returns:

Type       Description
LazyFrame  The transformed data.

Source code in src/flowcean/core/transform.py
@abstractmethod
def apply(self, data: pl.LazyFrame) -> pl.LazyFrame:
    """Apply the transform to data.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """

__call__(data)

Apply the transform to data.

Parameters:

Name  Type       Description             Default
data  LazyFrame  The data to transform.  required

Returns:

Type       Description
LazyFrame  The transformed data.

Source code in src/flowcean/core/transform.py
def __call__(self, data: pl.LazyFrame) -> pl.LazyFrame:
    """Apply the transform to data.

    Args:
        data: The data to transform.

    Returns:
        The transformed data.
    """
    return self.apply(data)

chain(other)

Chain this transform with another transform.

This can be used to chain multiple transforms together. Chained transforms are applied from left to right.

Example
chained_transform = TransformA().chain(TransformB())

Parameters:

Name   Type       Description              Default
other  Transform  The transform to chain.  required

Returns:

Type       Description
Transform  A new ChainedTransforms instance.

Source code in src/flowcean/core/transform.py
def chain(
    self,
    other: Transform,
) -> Transform:
    """Chain this transform with other transforms.

    This can be used to chain multiple transforms together.
    Chained transforms are applied left to right.

    Example:
        ```python
        chained_transform = TransformA().chain(TransformB())
        ```

    Args:
        other: The transforms to chain.

    Returns:
        A new Chain transform.
    """
    return ChainedTransforms(self, other)
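The returned `ChainedTransforms` applies its members in the order they were given. Here is a minimal stand-alone sketch of that behavior, assuming its `apply` simply threads the data through each member in turn (the source of `ChainedTransforms.apply` is not shown on this page). `AddOne` and `Double` are hypothetical, and the data is a plain list of numbers for brevity.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    @abstractmethod
    def apply(self, data: list[float]) -> list[float]: ...

    def __call__(self, data: list[float]) -> list[float]:
        return self.apply(data)

    def chain(self, other: "Transform") -> "Transform":
        return ChainedTransforms(self, other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[float]) -> list[float]:
        # Assumed behavior: feed each transform's output into the next, left to right.
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class AddOne(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [x + 1 for x in data]


class Double(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [x * 2 for x in data]


print(AddOne().chain(Double())([1.0, 2.0]))  # [4.0, 6.0], i.e. (x + 1) * 2
print(Double().chain(AddOne())([1.0, 2.0]))  # [3.0, 5.0], i.e. x * 2 + 1
```

Swapping the operands changes the result, which is why the order of a chain matters.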

__or__(other)

Shorthand for chaining transforms.

Example
chained_transform = TransformA() | TransformB()

Parameters:

Name   Type       Description              Default
other  Transform  The transform to chain.  required

Returns:

Type       Description
Transform  A new ChainedTransforms instance.

Source code in src/flowcean/core/transform.py
def __or__(
    self,
    other: Transform,
) -> Transform:
    """Shorthand for chaining transforms.

    Example:
        ```python
        chained_transform = TransformA() | TransformB()
        ```

    Args:
        other: The transform to chain.

    Returns:
        A new Chain transform.
    """
    return self.chain(other)
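Python evaluates `|` from left to right, so a longer expression such as `A | B | C` is parsed as `(A | B) | C`. The overall order of application is still left to right. Here is a self-contained sketch, assuming chains nest rather than flatten (this page does not say which the library does). `Append` is hypothetical, and the data is a plain string so the order is visible.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    @abstractmethod
    def apply(self, data: str) -> str: ...

    def __call__(self, data: str) -> str:
        return self.apply(data)

    def chain(self, other: "Transform") -> "Transform":
        return ChainedTransforms(self, other)

    def __or__(self, other: "Transform") -> "Transform":
        # `a | b` is shorthand for `a.chain(b)`.
        return self.chain(other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: str) -> str:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Append(Transform):
    """Hypothetical transform that appends its tag to the data."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def apply(self, data: str) -> str:
        return data + self.tag


pipeline = Append("A") | Append("B") | Append("C")
# `|` is left-associative, so this is (A | B) | C: a chain nested inside a chain.
print(type(pipeline.transforms[0]).__name__)  # ChainedTransforms
print(pipeline(""))  # ABC
```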

FitOnce

Bases: ABC

A mixin for transforms that need to be fitted to data once.

fit(data) abstractmethod

Fit to the data.

Parameters:

Name  Type       Description          Default
data  LazyFrame  The data to fit to.  required

Source code in src/flowcean/core/transform.py
@abstractmethod
def fit(self, data: pl.LazyFrame) -> None:
    """Fit to the data.

    Args:
        data: The data to fit to.
    """

FitIncremetally

Bases: ABC

A mixin for transforms that need to be fitted to data incrementally.

fit_incremental(data) abstractmethod

Fit to the data incrementally.

Parameters:

Name  Type       Description          Default
data  LazyFrame  The data to fit to.  required

Source code in src/flowcean/core/transform.py
@abstractmethod
def fit_incremental(self, data: pl.LazyFrame) -> None:
    """Fit to the data incrementally.

    Args:
        data: The data to fit to.
    """

ChainedTransforms(*transforms)

Bases: Transform, FitOnce, FitIncremetally

A transform that is a chain of other transforms.

Initialize the chained transforms.

Parameters:

Name        Type       Description               Default
transforms  Transform  The transforms to chain.  ()

Source code in src/flowcean/core/transform.py
def __init__(
    self,
    *transforms: Transform,
) -> None:
    """Initialize the chained transforms.

    Args:
        transforms: The transforms to chain.
    """
    self.transforms = transforms
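`ChainedTransforms` also lists `FitOnce` among its bases, so fitting a chain requires a decision about what data each member sees. A common convention, used for example by scikit-learn's `Pipeline`, is to fit each member on the output of the members before it. The sketch below assumes that convention. This page does not show the library's actual `fit` body, and `Double` and `Center` are hypothetical transforms.

```python
import statistics
from abc import ABC, abstractmethod


class Transform(ABC):
    @abstractmethod
    def apply(self, data: list[float]) -> list[float]: ...


class FitOnce(ABC):
    @abstractmethod
    def fit(self, data: list[float]) -> None: ...


class ChainedTransforms(Transform, FitOnce):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def fit(self, data: list[float]) -> None:
        # Assumed convention: fit each fittable member on the data as transformed
        # by the members before it, then pass the transformed data on.
        for transform in self.transforms:
            if isinstance(transform, FitOnce):
                transform.fit(data)
            data = transform.apply(data)

    def apply(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Double(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [2 * x for x in data]


class Center(Transform, FitOnce):
    """Hypothetical transform that subtracts the mean learned in fit()."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, data: list[float]) -> None:
        self.mean = statistics.fmean(data)

    def apply(self, data: list[float]) -> list[float]:
        return [x - self.mean for x in data]


center = Center()
chain = ChainedTransforms(Double(), center)
chain.fit([1.0, 2.0, 3.0])
print(center.mean)  # 4.0 (mean of the doubled data, not of the raw input)
print(chain.apply([1.0, 2.0, 3.0]))  # [-2.0, 0.0, 2.0]
```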

Identity()

Bases: Transform

A transform that does nothing.

Initialize the identity transform.

Source code in src/flowcean/core/transform.py
def __init__(self) -> None:
    """Initialize the identity transform."""
    super().__init__()
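An identity transform is useful as a neutral starting point: chaining it with any transform `t` gives the same result as `t` alone. The sketch below uses this property to fold a possibly empty list of steps into one pipeline with `functools.reduce`. It mirrors the base class locally, and `AddOne` and `build_pipeline` are hypothetical helpers rather than flowcean API.

```python
from abc import ABC, abstractmethod
from functools import reduce


class Transform(ABC):
    @abstractmethod
    def apply(self, data: list[float]) -> list[float]: ...

    def __call__(self, data: list[float]) -> list[float]:
        return self.apply(data)

    def __or__(self, other: "Transform") -> "Transform":
        return ChainedTransforms(self, other)


class ChainedTransforms(Transform):
    def __init__(self, *transforms: Transform) -> None:
        self.transforms = transforms

    def apply(self, data: list[float]) -> list[float]:
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class Identity(Transform):
    def apply(self, data: list[float]) -> list[float]:
        # Return the input unchanged.
        return data


class AddOne(Transform):
    def apply(self, data: list[float]) -> list[float]:
        return [x + 1 for x in data]


def build_pipeline(steps: list[Transform]) -> Transform:
    # Identity is the starting value, so an empty list still yields a valid transform.
    return reduce(lambda acc, step: acc | step, steps, Identity())


print(build_pipeline([])([1.0, 2.0]))                    # [1.0, 2.0]
print(build_pipeline([AddOne(), AddOne()])([1.0, 2.0]))  # [3.0, 4.0]
```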