dataframe
DataFrame(data, *, data_hash=None)
Bases: OfflineEnvironment
A dataset environment.
This environment represents static tabular datasets.
Attributes:
Name | Type | Description |
---|---|---|
data | LazyFrame | The data to represent. |
Initialize the dataset environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame \| LazyFrame | The data to represent. | required |
data_hash | bytes \| None | The hash of the data. If None, it is computed from the dataframe, which is potentially slow and expensive. | None |
Source code in src/flowcean/polars/environments/dataframe.py
from_csv(path, separator=',')
classmethod
Load a dataset from a CSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | Path to the CSV file. | required |
separator | str | Value separator. Defaults to ",". | ',' |
from_json(path)
classmethod
Load a dataset from a JSON file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | Path to the JSON file. | required |
from_parquet(path)
classmethod
Load a dataset from a Parquet file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | Path to the Parquet file. | required |
from_yaml(path)
classmethod
Load a dataset from a YAML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | Path to the YAML file. | required |
from_uri(uri)
classmethod
Load a dataset from a URI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uri | str | The URI to load the dataset from. | required |
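Which URI schemes and file types from_uri accepts is not listed here. The sketch below is a hypothetical, stdlib-only illustration of how such a loader could dispatch, assuming that only the file: scheme is valid and that the file suffix selects one of the from_csv, from_json, from_parquet, or from_yaml constructors. It raises local stand-ins for the two exception types documented on this page; resolve_loader and LOADERS are invented names, not flowcean API.

```python
from pathlib import Path
from urllib.parse import urlparse


class InvalidUriSchemeError(Exception):
    """Local stand-in mirroring the documented InvalidUriSchemeError(scheme)."""

    def __init__(self, scheme: str) -> None:
        super().__init__(f"invalid URI scheme: {scheme!r}")


class UnsupportedFileTypeError(Exception):
    """Local stand-in mirroring the documented UnsupportedFileTypeError(suffix)."""

    def __init__(self, suffix: str) -> None:
        super().__init__(f"unsupported file type: {suffix!r}")


# Assumed mapping from file suffix to the matching DataFrame constructor.
LOADERS = {
    ".csv": "from_csv",
    ".json": "from_json",
    ".parquet": "from_parquet",
    ".yaml": "from_yaml",
    ".yml": "from_yaml",
}


def resolve_loader(uri: str) -> tuple[str, Path]:
    """Return the constructor name and local path that a URI dispatches to."""
    parsed = urlparse(uri)
    # Assumption: only local files addressed with the file: scheme are valid.
    if parsed.scheme != "file":
        raise InvalidUriSchemeError(parsed.scheme)
    path = Path(parsed.path)
    suffix = path.suffix.lower()
    if suffix not in LOADERS:
        raise UnsupportedFileTypeError(suffix)
    return LOADERS[suffix], path
```

Checking the scheme before the suffix means a remote URI is rejected as an invalid scheme even when its file extension would otherwise be supported.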
__len__()
Return the number of samples in the dataset.
InvalidUriSchemeError(scheme)
Bases: Exception
Exception raised when a URI scheme is invalid.
Initialize the InvalidUriSchemeError.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scheme | str | Invalid URI scheme. | required |
UnsupportedFileTypeError(suffix)
Bases: Exception
Exception raised when a file type is not supported.
Initialize the UnsupportedFileTypeError.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
suffix | str | File type suffix. | required |
collect(environment, n=None, *, progress_bar=True)
Collect data from an environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment | Iterable[LazyFrame] \| Collection[LazyFrame] | The environment to collect data from. | required |
n | int \| None | Number of samples to collect. If None, all samples are collected. | None |
progress_bar | bool \| dict[str, Any] | Whether to show a progress bar. If a dictionary is provided, it will be passed to the progress bar. | True |
Returns:
Type | Description |
---|---|
DataFrame | The collected dataset. |