one_cold
OneCold(feature_categories, *, check_for_missing_categories=False)
Bases: Transform
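Transforms integer features into a set of binary one-cold features. For every category, a binary column is created that is 0 where the feature equals that category and 1 otherwise, i.e. the inverse of a one-hot encoding. The original integer features are dropped and are not part of the resulting dataframe.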
As an example, consider the following data:
feature |
---|
0 |
1 |
2 |
1 |
5 |
When the one-cold transformation is applied, the result is as follows:
feature_0 | feature_1 | feature_2 | feature_5 |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 1 |
1 | 0 | 1 | 1 |
1 | 1 | 1 | 0 |
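The encoding rule behind the tables above can be sketched in plain Python. This is a minimal illustration, not flowcean's implementation: it assumes a frame represented as a dict mapping column names to lists, and the helper `one_cold_encode` is hypothetical. Each category `c` of `feature` becomes a column `feature_c` that holds 0 where the value equals `c` and 1 otherwise, and the original column is dropped.

```python
from typing import Any


def one_cold_encode(
    columns: dict[str, list[Any]],
    feature_categories: dict[str, list[Any]],
) -> dict[str, list[Any]]:
    """Hypothetical helper: one-cold encode the given features of a column dict."""
    # Keep every column that is not being encoded; encoded ones are dropped.
    result: dict[str, list[Any]] = {
        name: values
        for name, values in columns.items()
        if name not in feature_categories
    }
    for feature, categories in feature_categories.items():
        values = columns[feature]
        for category in categories:
            # 0 marks the matching category ("cold"), 1 marks every other one.
            result[f"{feature}_{category}"] = [
                0 if value == category else 1 for value in values
            ]
    return result


encoded = one_cold_encode({"feature": [0, 1, 2, 1, 5]}, {"feature": [0, 1, 2, 5]})
for name, column in encoded.items():
    print(name, column)  # one line per one-cold column, matching the table
```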
In the default configuration, values that do not belong to any category are ignored: all of their one-cold columns are set to one. If, however, you want to enforce that each data entry belongs to a category, set the check_for_missing_categories flag to True when constructing a OneCold transform. In that case, if a value is found that does not belong to any category, a NoMatchingCategoryError is raised. This check has a performance cost and slows down the transform.
If you want to enable this check, create the transform as follows:
```python
from flowcean.transforms.one_cold import OneCold

transform = OneCold(
    feature_categories={
        "feature": [0, 1, 2, 5],
    },
    check_for_missing_categories=True,
)
```
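The difference between the default mode and the strict check can be sketched in plain Python. This is an illustration under stated assumptions, not flowcean code: `NoMatchingCategoryError` is defined locally as a stand-in for flowcean's exception of the same name, and `encode_column` is a hypothetical helper.

```python
from typing import Any


class NoMatchingCategoryError(Exception):
    """Local stand-in for flowcean's NoMatchingCategoryError."""


def encode_column(
    feature: str,
    values: list[Any],
    categories: list[Any],
    *,
    check_for_missing_categories: bool = False,
) -> dict[str, list[int]]:
    """Hypothetical helper: one-cold encode a single column."""
    if check_for_missing_categories:
        # Every value has to be inspected before encoding, which is what
        # makes the strict mode slower.
        unknown = [value for value in values if value not in categories]
        if unknown:
            raise NoMatchingCategoryError(
                f"values {unknown!r} of {feature!r} match no category"
            )
    return {
        f"{feature}_{category}": [0 if value == category else 1 for value in values]
        for category in categories
    }


# Default mode: the unknown value 7 silently gets a row of all ones.
print(encode_column("feature", [0, 7], [0, 1]))  # {'feature_0': [0, 1], 'feature_1': [1, 1]}

# Strict mode: the same input is rejected.
try:
    encode_column("feature", [0, 7], [0, 1], check_for_missing_categories=True)
except NoMatchingCategoryError as error:
    print("rejected:", error)
```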
Initializes the One-Cold transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_categories | dict[str, list[Any]] | Dictionary mapping each feature name to the list of categorical values to encode for it. | required |
check_for_missing_categories | bool | If set to True, a check is performed to verify that every value belongs to a category. If a value is found that does not belong to any category, a NoMatchingCategoryError is raised. To perform this check, the dataframe must be materialised, which can reduce performance. Therefore, it defaults to False. | False |
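The note about materialisation can be illustrated with a rough analogy in plain Python, where generators play the role of a lazy frame. This is not how flowcean or its LazyFrame backend works internally; the functions `lazy_encode`, `checked_encode`, and `source` are hypothetical, and `ValueError` stands in for `NoMatchingCategoryError`. The point is only that validating every value forces the whole input to be read before any result can be returned, whereas the unchecked path defers all work.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def lazy_encode(values: Iterable[Any], categories: list[Any]) -> Iterator[list[int]]:
    """Yield one-cold rows on demand; no value is read until iteration starts."""
    for value in values:
        yield [0 if value == category else 1 for category in categories]


def checked_encode(values: Iterable[Any], categories: list[Any]) -> Iterator[list[int]]:
    """Read every value up front so each one can be validated before encoding."""
    materialised = list(values)  # forces the whole input to be evaluated
    for value in materialised:
        if value not in categories:
            # ValueError stands in for flowcean's NoMatchingCategoryError here.
            raise ValueError(f"{value!r} matches no category")
    return lazy_encode(materialised, categories)


evaluated: list[Any] = []


def source() -> Iterator[int]:
    """Toy data source that records which values have been read."""
    for value in [0, 1, 2, 1, 5]:
        evaluated.append(value)
        yield value


lazy_rows = lazy_encode(source(), [0, 1, 2, 5])
print(len(evaluated))  # 0: the lazy pipeline has not read anything yet

checked_rows = checked_encode(source(), [0, 1, 2, 5])
print(len(evaluated))  # 5: the check had to read the entire input
```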
Source code in src/flowcean/transforms/one_cold.py
apply(data)
Transform data with this one-cold transformation and return the resulting dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | LazyFrame | The data to transform. | required |
Returns:
Type | Description |
---|---|
LazyFrame | The transformed data. |
Source code in src/flowcean/transforms/one_cold.py
from_dataframe(data, features, *, check_for_missing_categories=False)
classmethod
Creates a new one-cold transformation based on sample data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | A dataframe containing sample data for determining the categories of the transform. | required |
features | Iterable[str] | Names of the features for which the one-cold transformation will determine the categories. | required |
check_for_missing_categories | bool | If set to True, a check is performed to verify that every value belongs to a category. If a value is found that does not belong to any category, a NoMatchingCategoryError is raised. To perform this check, the dataframe must be materialised, which can reduce performance. Therefore, it defaults to False. | False |
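`from_dataframe` derives the categories from sample data instead of requiring them to be listed by hand. A minimal plain-Python sketch of that idea follows, assuming the sample frame is a dict of column lists. `infer_categories` is a hypothetical name, and collecting distinct values in order of first appearance is an assumption, since the ordering flowcean uses is not documented here.

```python
from collections.abc import Iterable
from typing import Any


def infer_categories(
    data: dict[str, list[Any]],
    features: Iterable[str],
) -> dict[str, list[Any]]:
    """Hypothetical helper: collect the distinct values of each requested feature."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {feature: list(dict.fromkeys(data[feature])) for feature in features}


sample = {"feature": [0, 1, 2, 1, 5], "other": [3.0, 4.0, 5.0, 6.0, 7.0]}
feature_categories = infer_categories(sample, ["feature"])
print(feature_categories)  # {'feature': [0, 1, 2, 5]}
```

The result has the same shape as the `feature_categories` argument of the constructor, so in this sketch it could be passed on directly.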
Source code in src/flowcean/transforms/one_cold.py