sdf_xarray.SDFPreprocess

class sdf_xarray.SDFPreprocess(data_vars=None)

Bases: object

Preprocess SDF files for xarray, ensuring matching job IDs and setting the time dimension.

This class is used as a ‘preprocess’ function within xr.open_mfdataset. It performs three main duties on each individual file’s Dataset:

  1. Checks for a matching job ID across all files to ensure dataset consistency.

  2. Filters the Dataset to keep only the variables specified in data_vars and their required coordinates.

  3. Expands dimensions to include a single ‘time’ coordinate, preparing the Dataset for concatenation.
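The three duties above can be sketched as a small stateful callable. This is an illustration only, not the library's source: the class name SketchPreprocess and the attribute names jobid1 and time are assumptions made for the example.

```python
import xarray as xr


class SketchPreprocess:
    """Illustrative stand-in for a stateful preprocess callable."""

    def __init__(self, data_vars=None):
        self.data_vars = data_vars
        self.job_id = None  # remembered from the first file seen

    def __call__(self, ds: xr.Dataset) -> xr.Dataset:
        # 1. Every file must come from the same job.
        file_job_id = ds.attrs["jobid1"]  # assumed attribute name
        if self.job_id is None:
            self.job_id = file_job_id
        elif self.job_id != file_job_id:
            raise ValueError(
                f"Mismatching job ids (got {file_job_id}, expected {self.job_id})"
            )

        time = ds.attrs["time"]  # assumed attribute name

        # 2. Keep only the requested variables that this file actually contains.
        if self.data_vars is not None:
            present = [name for name in self.data_vars if name in ds.data_vars]
            ds = ds[present]

        # 3. Add a length-one 'time' dimension so files can be concatenated.
        return ds.expand_dims(time=[time])
```

Using a class rather than a plain function lets the callable remember the first job ID it saw and compare every later file against it.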

EPOCH can output variables at different intervals, so some SDF files may not contain the requested variable. We combine this data into one dataset by concatenating across the time dimension.

The combination is performed using join="outer" (in the calling open_mfdataset function), meaning that the final combined dataset will contain the variable across the entire time span, with NaNs filling the time steps where the variable was absent in the individual file.
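The NaN-filling behaviour can be seen on two tiny hand-built Datasets. This sketch uses xr.merge with join="outer" as a stand-in for the alignment step; it is not necessarily the code path open_mfdataset takes, and the variable names Ex and Ey are placeholders.

```python
import xarray as xr

# Stand-ins for two preprocessed files; only the first one contains "Ey".
step0 = xr.Dataset(
    {
        "Ex": (("time", "x"), [[1.0, 2.0]]),
        "Ey": (("time", "x"), [[3.0, 4.0]]),
    },
    coords={"time": [0.0]},
)
step1 = xr.Dataset(
    {"Ex": (("time", "x"), [[5.0, 6.0]])},
    coords={"time": [1.0]},
)

# An outer join keeps the union of the time coordinates; any variable
# missing at a given time step is filled with NaN.
combined = xr.merge([step0, step1], join="outer", compat="no_conflicts")

# "Ex" is defined at both steps, "Ey" is NaN at time=1.0.
ey_missing = bool(combined["Ey"].sel(time=1.0).isnull().all())
```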

With large SDF files, filtering each file before concatenation reduces memory consumption compared with loading all variables from all files first.

Parameters:

data_vars (list[str] | None) – A list of data variables to load. If not specified, all variables are loaded.
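A minimal usage sketch, assuming a directory of SDF files from a single run. The helper name open_filtered_run, the glob pattern and the variable name are hypothetical, and a real call may need further open_mfdataset arguments depending on the data.

```python
def open_filtered_run(pattern, data_vars=None):
    """Open every SDF file matching `pattern` as one Dataset along time."""
    # Imported here so the sketch can be defined without the packages installed.
    import xarray as xr
    from sdf_xarray import SDFPreprocess

    return xr.open_mfdataset(
        pattern,
        preprocess=SDFPreprocess(data_vars=data_vars),
        join="outer",  # NaN-fill time steps where a variable was not written
    )


# Hypothetical call; the path and variable name are placeholders.
# ds = open_filtered_run("run_dir/*.sdf", data_vars=["Electric_Field_Ex"])
```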

__init__(data_vars=None)
Parameters:

data_vars (list[str] | None)

Methods

__init__([data_vars])