
data_model

DataModel(data, seed, model_handler, specs_handler)

Generate synthetic samples from the training data distribution.

Attributes:

Name Type Description
data pl.DataFrame

Original training data used in the Flowcean model.

col_names list

Names of the columns in the training data.

n_features int

Number of features in the dataset.

model_handler ModelHandler

ModelHandler object used to produce predictions.

int_features list

List of indices for features of type int.

Methods:

generate_dataset() Generate random samples based on data distribution, or use original data.

Initializes the DataModel.

Parameters:

Name Type Description Default
data DataFrame

Original training data used in the Flowcean model.

required
seed int

Random seed for reproducibility.

required
model_handler ModelHandler

ModelHandler object used to produce predictions.

required
specs_handler SystemSpecsHandler

SystemSpecsHandler object storing system specifications.

required
Source code in src/flowcean/testing/generator/ddtig/domain/model_analyser/mut/data_model.py
def __init__(
    self,
    data: pl.DataFrame,
    seed: int,
    model_handler: ModelHandler,
    specs_handler: SystemSpecsHandler,
) -> None:
    """Initializes the DataModel.

    Args:
        data: Original training data used in the Flowcean model.
        seed: Random seed for reproducibility.
        model_handler: ModelHandler object used to produce predictions.
        specs_handler: SystemSpecsHandler object storing
            system specifications.
    """
    self.data = data
    self.seed = seed
    self.col_names = data.columns
    self.model_handler = model_handler

    self.n_features = specs_handler.get_n_features()
    self.int_features = specs_handler.get_int_features()

generate_dataset(*, original_data=False, n_samples=0)

Generates a dataset of inputs and corresponding model predictions.

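If original_data is True, the inputs are the original training data. Otherwise, synthetic inputs are generated using kernel density estimation (KDE). In both cases, the outputs are the model's predictions for those inputs.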
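The private `_generate_samples` helper is not shown on this page, so the following is only a minimal sketch of how sampling from a Gaussian KDE can work, not Flowcean's implementation. The function name `sample_gaussian_kde`, the row-of-lists data layout, and the normal-reference bandwidth rule are all assumptions made for illustration. Drawing from a Gaussian KDE amounts to picking a training row at random and adding Gaussian noise scaled by the bandwidth; integer features are rounded afterwards.

```python
import random
import statistics


def sample_gaussian_kde(rows, n_samples, int_features=(), seed=0):
    """Draw n_samples rows from a Gaussian KDE fitted to `rows`.

    Hypothetical helper for illustration only.
    rows: list of equal-length lists of numbers, at least two rows.
    int_features: column indices whose sampled values are rounded to int.
    """
    rng = random.Random(seed)  # seeded generator for reproducibility
    n_rows = len(rows)
    n_cols = len(rows[0])

    # Per-column bandwidth from the normal-reference rule of thumb
    # (an assumed choice; other bandwidth rules are equally valid).
    bandwidths = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        sigma = statistics.stdev(column)
        bandwidths.append(1.06 * sigma * n_rows ** (-1 / 5))

    samples = []
    for _ in range(n_samples):
        # A KDE is a mixture of kernels centred on the data points:
        # choose one centre uniformly, then perturb it with kernel noise.
        centre = rng.choice(rows)
        sample = [rng.gauss(centre[j], bandwidths[j]) for j in range(n_cols)]
        for j in int_features:
            sample[j] = round(sample[j])
        samples.append(sample)
    return samples


training_rows = [[0.50, 0.40, 0], [0.60, 0.45, 1], [0.55, 0.42, 0]]
synthetic = sample_gaussian_kde(training_rows, 5, int_features=[2], seed=1)
```

Reusing the same seed reproduces the same samples, which mirrors the role of the `seed` parameter of `DataModel`.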

Parameters:

Name Type Description Default
original_data bool

Whether to use original training data or generate synthetic samples.

False
n_samples int

Number of synthetic samples to generate. Only used when original_data is False.

0

Returns:

Type Description
list

List of tuples containing input dictionaries and model outputs.

Example (n_samples = 1):

[({'Length': 0.5093, 'Diameter': 0.3886, 'Height': 0.1106, 'M': 0}, 8.6006)]
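As a small illustration of working with this return format, the sketch below unpacks a list of `(inputs, output)` tuples into separate sequences. The `dataset` literal simply reuses the documented example value; in practice it would come from `generate_dataset()`. The variable names are chosen for illustration only.

```python
# Example value copied from the documented return format.
dataset = [
    ({"Length": 0.5093, "Diameter": 0.3886, "Height": 0.1106, "M": 0}, 8.6006),
]

# Split the list of (inputs, output) tuples into two parallel tuples.
inputs, outputs = zip(*dataset)

# Feature names are the keys of any input dictionary.
feature_names = list(inputs[0])

# Iterate over the samples pairwise.
for sample, prediction in dataset:
    print(sample["Length"], prediction)
```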

Source code in src/flowcean/testing/generator/ddtig/domain/model_analyser/mut/data_model.py
def generate_dataset(
    self,
    *,
    original_data: bool = False,
    n_samples: int = 0,
) -> list:
    """Generates a dataset of inputs and corresponding model predictions.

    If original_data is True, uses the original training data.
    Otherwise, generates synthetic samples using KDE.

    Args:
        original_data: Whether to use original training
            data or generate synthetic samples.
        n_samples: Number of synthetic samples to generate.

    Returns:
        List of tuples containing input dictionaries and model outputs.
        Example (n_samples = 1):
        [({'Length': 0.5093, 'Diameter': 0.3886,
        'Height': 0.1106, 'M': 0}, 8.6006)]
    """
    training_inputs = (
        self.data
        if original_data
        else self._generate_samples(n_samples, self.int_features)
    )
    training_outputs = self.model_handler.get_model_prediction(
        training_inputs,
    ).collect()
    samples_input_lst = training_inputs.to_dicts()
    samples_output_lst = pl.Series(
        training_outputs.select(training_outputs.columns[0]),
    ).to_list()
    return [
        (inputs, output)
        for inputs, output in zip(
            samples_input_lst,
            samples_output_lst,
            strict=False,
        )
    ]