pyMAISE.preprocessing.SplitSequence

class pyMAISE.preprocessing.SplitSequence(input_steps, output_steps, output_position, **kwargs)[source]

Bases: object

Splits time series data into rolling windows. Using rolling windows, 2D time series data with dimensions (time steps, features) is split according to the features listed in sequence_inputs and sequence_outputs, the window widths, and the output position. This results in a 3D input data set and a 2D or 3D output data set.
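
As a rough illustration of the 2D case, the following NumPy sketch builds the input and output windows by hand. It is not pyMAISE's implementation: rolling_split_2d is a hypothetical helper, it operates on a plain array rather than an xarray.DataArray, and the sample-count formula is an assumption chosen to agree with the shapes shown in the Examples section.

```python
import numpy as np


def rolling_split_2d(data, input_steps, output_steps, output_position):
    """Hypothetical helper (not part of pyMAISE): window a (time steps, features) array."""
    n_steps = data.shape[0]
    # Offset of the first output step from the start of its input window.
    out_start = input_steps - 1 + output_position
    # Assumption: keep only windows whose output fits entirely inside the data.
    n_samples = n_steps - out_start - output_steps + 1
    inputs = np.stack([data[i : i + input_steps] for i in range(n_samples)])
    outputs = np.stack(
        [data[i + out_start : i + out_start + output_steps] for i in range(n_samples)]
    )
    if output_steps == 1:
        # Drop the time steps dimension, giving (samples, labels).
        outputs = outputs[:, 0, :]
    return inputs, outputs


data = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features
x, y = rolling_split_2d(data, input_steps=4, output_steps=1, output_position=1)
print(x.shape)  # (6, 4, 2)
print(y.shape)  # (6, 2)
```

With 10 time steps, a four-step input window, and a one-step output placed directly after it, six windows fit; this mirrors how 1,600,000 time steps give 1,599,996 samples in the LOCA example.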

If the data set is 3D, rolling windows are applied to the sequences specified in sequence_inputs and sequence_outputs, resulting in a 4D array. The features and windows (the last two dimensions) are then combined to create a 3D data set. Features without rolling windows are specified in const_inputs and const_outputs and are concatenated to give 3D input and output xarray.DataArray objects.
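
The 3D input path described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the library's code: rolling_inputs_3d is a hypothetical name, it assumes output_steps=1 and output_position=1, and both the flattening order of the last two dimensions and the placement of the constant features before the windowed ones are guesses.

```python
import numpy as np


def rolling_inputs_3d(data, input_steps, seq_idx, const_idx):
    """Hypothetical helper (not part of pyMAISE): window a (series, time steps, features) array."""
    n_series, n_steps, _ = data.shape
    # Assumes output_steps=1 and output_position=1, so one step is reserved per window.
    n_windows = n_steps - input_steps
    seq = data[:, :, seq_idx]
    # 4D array: (series, windows, window steps, sequence features).
    windows = np.stack(
        [seq[:, i : i + input_steps, :] for i in range(n_windows)], axis=1
    )
    # Combine the last two dimensions into one feature axis.
    flat = windows.reshape(n_series, n_windows, -1)
    # Assumption: constant features are taken at the first step of each window.
    const = data[:, :n_windows, const_idx]
    return np.concatenate([const, flat], axis=-1)


data3 = np.arange(3 * 10 * 6).reshape(3, 10, 6)  # 3 series, 10 steps, 6 features
inputs3 = rolling_inputs_3d(data3, input_steps=4, seq_idx=[4, 5], const_idx=[0, 1, 2, 3])
print(inputs3.shape)  # (3, 6, 12)
```

The feature count follows the same arithmetic as the 3D LOCA example: constant features plus sequence features times window width (here 4 + 2 × 4 = 12; in the LOCA example 40 + 4 × 4 = 56).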

Parameters:
  • input_steps (int) – The window size or number of time steps for each input sample.

  • output_steps (int) – The window size or number of time steps for each output sample.

  • output_position (int) – The position to start the output window relative to the position of the final time step in the input window. If the last time step in the input window is at index five and output_position=1, then the output window begins at index six.

  • sequence_inputs (None or array/list of int or str) – Corresponds to the features (last dimension of data) taken for inputs. If None then the entire data set is used for inputs.

  • sequence_outputs (None or array/list of int or str) – Corresponds to the labels (last dimension of data) that are taken for outputs. If None then the entire data set is used for outputs.

  • const_inputs (None or array/list of int or str) – The features concatenated to the input windows that are not used in rolling windows. This is only used when data is 3D.

  • const_outputs (None or array/list of int or str) – The labels concatenated to the output windows that are not used in rolling windows. This is only used when data is 3D.
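
The index arithmetic behind input_steps, output_steps, and output_position can be made concrete with a few lines of plain Python. The function window_indices is hypothetical and exists only for this illustration; it follows the description of output_position given above.

```python
def window_indices(start, input_steps, output_steps, output_position):
    """Hypothetical helper: time step indices covered by one input/output window pair."""
    input_idx = list(range(start, start + input_steps))
    # The output window starts output_position steps after the last input step.
    out_start = input_idx[-1] + output_position
    output_idx = list(range(out_start, out_start + output_steps))
    return input_idx, output_idx


# Input window ending at index five, output_position=1: output begins at index six.
print(window_indices(start=2, input_steps=4, output_steps=1, output_position=1))
# ([2, 3, 4, 5], [6])

# A larger output_position leaves a gap between the input and output windows.
print(window_indices(start=2, input_steps=4, output_steps=2, output_position=3))
# ([2, 3, 4, 5], [8, 9])
```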

Examples

Using the 2D LOCA data set, we demonstrate rolling windows on the perturbed data.

>>> from pyMAISE import datasets, preprocessing
>>> _, perturbed = datasets.load_loca(stack_series=True)
>>> perturbed.shape
(1600000, 44)
>>> sequence_outputs = [
...     "Pellet Cladding Temperature",
...     "Core Pressure",
...     "Water Level",
...     "Break Flow Rate",
... ]
>>> split_sequences = preprocessing.SplitSequence(
...     4,
...     1,
...     1,
...     sequence_outputs=sequence_outputs,
... )
>>> perturbed_input, perturbed_output = split_sequences.split(perturbed)
>>> perturbed_input.shape
(1599996, 4, 44)
>>> perturbed_output.shape
(1599996, 4)

Alternatively, we can use the 3D perturbed LOCA data, specify the last four (sequential) features as both inputs and outputs, and add the first 40 time-independent features as constant inputs.

>>> from pyMAISE import datasets, preprocessing
>>> _, perturbed = datasets.load_loca(stack_series=False)
>>> perturbed.shape
(4000, 400, 44)
>>> split_sequences = preprocessing.SplitSequence(
...     input_steps=4,
...     output_steps=1,
...     output_position=1,
...     sequence_inputs=range(-4, 0),
...     sequence_outputs=range(-4, 0),
...     const_inputs=range(40),
... )
>>> perturbed_input, perturbed_output = split_sequences.split(perturbed)
>>> perturbed_input.shape
(4000, 396, 56)
>>> perturbed_output.shape
(4000, 396, 4)

__init__(input_steps, output_steps, output_position, **kwargs)[source]

Methods

split(data) – Run rolling windows.

split(data)[source]

Run rolling windows.

Parameters:

data (xarray.DataArray) – A data set that includes both input and output sequence data. The data can be either 2D or 3D.

Returns:

  • split_input (xarray.DataArray) – The 3D data set of input data with dimensions (samples, time steps, features).

  • split_output (xarray.DataArray) – The 3D or 2D data set of output data with either dimensions (samples, time steps, labels) or (samples, labels). If output_steps=1 then the time steps dimension is removed.