one_hot
OneHot(feature_categories, *, check_for_missing_categories=False)
Bases: Transform
Transforms integer features into a set of binary one-hot features. The original integer features are dropped and are not part of the resulting data frame.
As an example, consider the following data:
feature |
---|
0 |
1 |
2 |
1 |
5 |
When the one-hot transformation is applied, the result is as follows:
feature_0 | feature_1 | feature_2 | feature_5 |
---|---|---|---|
1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 |
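The mapping between the two tables above can be sketched in plain Python. This is only an illustration of the encoding rule, not the library's implementation: the helper `one_hot` and the list-of-dicts row format are assumptions made for this example, and the real transform operates on a dataframe.

```python
def one_hot(rows, feature, categories):
    """Replace `feature` in each row with one binary column per category."""
    encoded = []
    for row in rows:
        # Copy every column except the encoded one; the original feature is dropped.
        new_row = {key: value for key, value in row.items() if key != feature}
        for category in categories:
            # Column names follow the `<feature>_<category>` pattern from the table.
            new_row[f"{feature}_{category}"] = 1 if row[feature] == category else 0
        encoded.append(new_row)
    return encoded


# Same input as the example table: a single column named "feature".
rows = [{"feature": value} for value in (0, 1, 2, 1, 5)]
encoded = one_hot(rows, "feature", [0, 1, 2, 5])

for row in encoded:
    print(row)
```

Each output row contains exactly one `1`, in the column whose suffix equals the original value.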
Initializes the One-Hot transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_categories | dict[str, list[Any]] | Dictionary mapping each feature name to the list of categorical values to encode for it. | required |
check_for_missing_categories | bool | If set to true, every value is checked against the categories. A value that does not belong to any category raises a NoMatchingCategoryError. The check requires the dataframe to be materialised, which may reduce performance, so it defaults to false. | False |
Source code in src/flowcean/transforms/one_hot.py
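The effect of `check_for_missing_categories` can be illustrated with a small standalone sketch. The `NoMatchingCategoryError` class below is a local stand-in for the error named in the parameter table, and `encode_value` is a hypothetical helper. The behaviour with the check disabled (an unknown value becomes an all-zero row) is an assumption, since the description only states what happens when the check is enabled.

```python
class NoMatchingCategoryError(Exception):
    """Local stand-in for the error named in the parameter description."""


def encode_value(value, categories, *, check_for_missing_categories=False):
    """Return the one-hot flags for a single value."""
    flags = [1 if value == category else 0 for category in categories]
    # With the check enabled, a value that matches no category is an error.
    if check_for_missing_categories and not any(flags):
        raise NoMatchingCategoryError(
            f"{value!r} matches none of the categories {categories!r}"
        )
    return flags


categories = [0, 1, 2, 5]

# Assumption: without the check, the unknown value 3 silently yields all zeros.
unchecked = encode_value(3, categories)

# With the check, the same value is reported instead.
try:
    encode_value(3, categories, check_for_missing_categories=True)
    raised = False
except NoMatchingCategoryError:
    raised = True
```

The check has to inspect every value, which is why the description ties it to materialising the dataframe.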
apply(data)
Transform data with this one-hot transformation and return the resulting dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | The data to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed data. |
Source code in src/flowcean/transforms/one_hot.py
from_dataframe(data, features, *, check_for_missing_categories=False)
classmethod
Creates a new one-hot transformation based on sample data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | A dataframe containing sample data for determining the categories of the transform. | required |
features | Iterable[str] | Names of the features for which the one-hot transformation will determine the categories. | required |
check_for_missing_categories | bool | If set to true, every value is checked against the categories. A value that does not belong to any category raises a NoMatchingCategoryError. The check requires the dataframe to be materialised, which may reduce performance, so it defaults to false. | False |
Source code in src/flowcean/transforms/one_hot.py
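As a rough sketch of what "determining the categories" from sample data could look like, the snippet below collects the distinct values of each requested feature. The helper `categories_from_samples` and the list-of-dicts format are assumptions made for this example. The order of the resulting categories (first appearance) is also an assumption and may differ from what `from_dataframe` produces.

```python
def categories_from_samples(rows, features):
    """Collect the distinct values of each listed feature from sample rows."""
    feature_categories = {}
    for feature in features:
        seen = []
        for row in rows:
            # Keep each value once, in order of first appearance.
            if row[feature] not in seen:
                seen.append(row[feature])
        feature_categories[feature] = seen
    return feature_categories


samples = [
    {"feature": 0, "other": 7},
    {"feature": 1, "other": 7},
    {"feature": 2, "other": 8},
    {"feature": 1, "other": 9},
    {"feature": 5, "other": 9},
]

# Only the features that are explicitly listed receive categories.
feature_categories = categories_from_samples(samples, ["feature"])
print(feature_categories)
```

The result has the same shape as the `feature_categories` argument of the constructor: a dictionary from feature name to a list of values.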