pyMAISE.preprocessing.SplitSequence
- class pyMAISE.preprocessing.SplitSequence(input_steps, output_steps, output_position, **kwargs)
Bases: object
Split sequence function for rolling windows of time series data. Using a rolling window, 2D time series data of dimensions (time steps, features) is split according to the features defined in sequence_inputs and sequence_outputs, the window widths, and the positional information. This results in a 3D input data set and a 2D or 3D output data set.
If the data set is 3D, then rolling windows are applied to the sequences specified in sequence_inputs and sequence_outputs, resulting in a 4D array. The features and windows (the last two dimensions) are combined to create a 3D data set. Features without rolling windows are specified in const_inputs and const_outputs and are concatenated to get 3D input and output xarray.DataArray objects.
- Parameters:
input_steps (int) – The window size or number of time steps for each input sample.
output_steps (int) – The window size or number of time steps for each output sample.
output_position (int) – The position to start the output window relative to the position of the final time step in the input window. If the last time step in the input window is at index five and output_position=1, then the output window begins at index six.
sequence_inputs (None or array/list of int or str) – Corresponds to the features (last dimension of data) taken for inputs. If None, then the entire data set is used for inputs.
sequence_outputs (None or array/list of int or str) – Corresponds to the labels (last dimension of data) that are taken for outputs. If None, then the entire data set is used for outputs.
const_inputs (None or array/list of int or str) – The features concatenated to the input windows that are not used in rolling windows. This is only used when data is 3D.
const_outputs (None or array/list of int or str) – The labels concatenated to the output windows that are not used in rolling windows. This is only used when data is 3D.
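The interaction of input_steps, output_steps, and output_position can be sketched with plain index arithmetic. The helper below, window_indices, is a hypothetical illustration written for this page and is not part of pyMAISE; it assumes a non-negative output_position and a stride of one time step between consecutive windows, which matches the sample counts in the examples but is not confirmed as the library's internal logic.

```python
def window_indices(n_steps, input_steps, output_steps, output_position):
    """Return (input_indices, output_indices) for each rolling window.

    Hypothetical helper for illustration only; not a pyMAISE function.
    """
    windows = []
    for start in range(n_steps - input_steps + 1):
        last_input = start + input_steps - 1
        # The output window starts output_position steps after the
        # final time step of the input window.
        out_start = last_input + output_position
        out_stop = out_start + output_steps
        if out_start < 0 or out_stop > n_steps:
            continue  # window would fall outside the series
        windows.append(
            (list(range(start, last_input + 1)), list(range(out_start, out_stop)))
        )
    return windows


windows = window_indices(n_steps=10, input_steps=4, output_steps=1, output_position=1)
print(len(windows))  # 6
print(windows[2])    # ([2, 3, 4, 5], [6])
```

The third window ends its input at index five and starts its output at index six, which is the case described for output_position=1. Under these assumptions the number of samples is n_steps - input_steps - output_position - output_steps + 2.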
Examples
Using the 2D LOCA data set, we demonstrate rolling windows on the perturbed data.
>>> from pyMAISE import datasets, preprocessing
>>> _, perturbed = datasets.load_loca(stack_series=True)
>>> perturbed.shape
(1600000, 44)
>>> sequence_outputs = [
...     "Pellet Cladding Temperature",
...     "Core Pressure",
...     "Water Level",
...     "Break Flow Rate"
... ]
>>> split_sequences = preprocessing.SplitSequence(
...     4, 1, 1, sequence_outputs=sequence_outputs
... )
>>> perturbed_input, perturbed_output = split_sequences.split(perturbed)
>>> perturbed_input.shape
(1599996, 4, 44)
>>> perturbed_output.shape
(1599996, 4)
Alternatively, we can use the 3D perturbed LOCA data, specify the four sequential features as inputs and outputs and then add the time-independent features.
>>> from pyMAISE import datasets, preprocessing
>>> _, perturbed = datasets.load_loca(stack_series=False)
>>> perturbed.shape
(4000, 400, 44)
>>> split_sequences = preprocessing.SplitSequence(
...     input_steps=4,
...     output_steps=1,
...     output_position=1,
...     sequence_inputs=range(-4, 0),
...     sequence_outputs=range(-4, 0),
...     const_inputs=range(40),
... )
>>> perturbed_input, perturbed_output = split_sequences.split(perturbed)
>>> perturbed_input.shape
(4000, 396, 56)
>>> perturbed_output.shape
(4000, 396, 4)
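The shapes in the 3D example can be reproduced with a short piece of arithmetic. The function below, expected_3d_shapes, is a hypothetical helper and not a pyMAISE API; it assumes that each windowed feature contributes one column per time step in its window and that the constant features are appended once per sample, which is inferred from the shapes shown above rather than from the library source.

```python
def expected_3d_shapes(data_shape, input_steps, output_steps, output_position,
                       n_seq_in, n_seq_out, n_const_in=0, n_const_out=0):
    """Estimate the input and output shapes for 3D data.

    Hypothetical helper for illustration only; not a pyMAISE function.
    """
    n_series, n_steps, _ = data_shape
    # Number of rolling windows that fit in each series.
    n_windows = n_steps - input_steps - output_position - output_steps + 2
    # Windowed features are flattened with their window, then the
    # constant (non-windowed) features are appended.
    input_features = n_seq_in * input_steps + n_const_in
    output_features = n_seq_out * output_steps + n_const_out
    return (
        (n_series, n_windows, input_features),
        (n_series, n_windows, output_features),
    )


in_shape, out_shape = expected_3d_shapes(
    (4000, 400, 44), input_steps=4, output_steps=1, output_position=1,
    n_seq_in=4, n_seq_out=4, n_const_in=40,
)
print(in_shape)   # (4000, 396, 56)
print(out_shape)  # (4000, 396, 4)
```

Here 396 comes from 400 - 4 - 1 - 1 + 2, and 56 comes from four sequence features over four time steps plus the 40 constant features.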
Methods
split(data) – Run rolling windows.
- split(data)
Run rolling windows.
- Parameters:
data (xarray.DataArray) – A data set that includes both input and output sequence data. This data can be either 2D or 3D.
- Returns:
split_input (xarray.DataArray) – The 3D data set of input data with dimensions (samples, time steps, features).
split_output (xarray.DataArray) – The 3D or 2D data set of output data with dimensions (samples, time steps, labels) or (samples, labels). If output_steps=1, then the time steps dimension is removed.