Examples
Plain time series
Here the files of each variable are all in the same folder. Only the timestamp differs from one file to the next:
Data
├── SSH
│   ├── SSH_20070101.nc
│   ├── SSH_20070109.nc
│   └── ...
└── SST
    ├── A_2007001_2007008.L3m_8D_sst.nc
    ├── A_2007008_2007016.L3m_8D_sst.nc
    └── ...
We will scan for SST files:
from xarray_regex import FileFinder, library
root = 'Data/SST'
pregex = 'A_%(Y)%(j)_%(Y)%(j:discard)%(suffix)'
finder = FileFinder(root, pregex, suffix=r'\.L3m_8D_sst\.nc')
files = finder.get_files()
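To make the pattern more concrete, here is a rough standard-library approximation of what the pre-regex above is expected to match. It assumes that `%(Y)` stands for a four-digit year and `%(j)` for a three-digit day of year, as in `strftime`. The regular expression and the `parse_start_date` helper are hand-written for illustration and are not part of xarray_regex.

```python
import re
from datetime import datetime

# Hand-written approximation of the pre-regex (an assumption, not produced by the library)
PATTERN = re.compile(r'A_(\d{4})(\d{3})_(\d{4})(\d{3})\.L3m_8D_sst\.nc')


def parse_start_date(filename):
    """Return the start date encoded in an SST filename, or None if it does not match."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        return None
    # Only the first year / day-of-year pair is used; the second pair is ignored
    year, day_of_year = m.group(1), m.group(2)
    return datetime.strptime(year + day_of_year, '%Y%j')


print(parse_start_date('A_2007008_2007016.L3m_8D_sst.nc'))  # 2007-01-08 00:00:00
```

Files that do not follow the pattern, such as `SSH_20070101.nc`, return `None` in this sketch.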
We would like to open all these files using Xarray. However, the files lack a defined ‘time’ dimension along which to concatenate them. To make it work, we can use the ‘preprocess’ argument of xarray.open_mfdataset:
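import pandas as pd
import xarray as xr

def preprocess(ds, filename, finder):
    # Retrieve the date from the matches found in the filename
    matches = finder.get_matches(filename)
    date = library.get_date(matches)
    # Add a length-one 'time' coordinate holding that date
    ds = ds.assign_coords(time=pd.to_datetime([date]))
    return ds

ds = xr.open_mfdataset(files,
                       preprocess=finder.get_func_process_filename(preprocess))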
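`xarray.open_mfdataset` calls `preprocess` with the dataset as its only argument, so the three-argument function above needs an adapter that supplies the filename and the finder. The sketch below shows one way such an adapter could be written. It is not the implementation of `get_func_process_filename`; `make_preprocess` is a hypothetical name, and the sketch assumes the file path is available in `ds.encoding['source']`. A stand-in object replaces the dataset so the example runs without xarray.

```python
from types import SimpleNamespace


def make_preprocess(func, finder):
    """Wrap func(ds, filename, finder) into a callable taking only ds."""
    def wrapper(ds):
        # Assumption: the dataset records the path it was opened from
        filename = ds.encoding['source']
        return func(ds, filename, finder)
    return wrapper


# Stand-in for a dataset, for illustration only
fake_ds = SimpleNamespace(
    encoding={'source': 'Data/SST/A_2007001_2007008.L3m_8D_sst.nc'})

seen = []

def record(ds, filename, finder):
    # Toy replacement for a real preprocess function: remember the filename
    seen.append(filename)
    return ds

wrapped = make_preprocess(record, finder=None)
result = wrapped(fake_ds)
print(seen)
```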
Nested files
We can scan both variables at the same time and retrieve the files as a nested list. We assume the filenames for both variables are structured in the same way. The group names in the pre-regex define which matchers are grouped together:
pregex = r'%(variable:char)/%(variable:char)_%(time:Y)%(time:m)%(time:d)\.nc'
We can now group the files by variable or time:
>>> finder.get_files(relative=True, nested=['variable'])
[['SSH_20070101.nc',
  'SSH_20070109.nc',
  ...],
 ['SST_20070101.nc',
  'SST_20070109.nc',
  ...]]
>>> finder.get_files(relative=True, nested=['time'])
[['SSH_20070101.nc', 'SST_20070101.nc'],
 ['SSH_20070109.nc', 'SST_20070109.nc'],
 ...]
This works for any number of groups in any order.
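The nesting can be pictured as repeated grouping on the value matched for each group. The following is a minimal standard-library sketch of that idea, not the library's implementation. The `entries` list and the `nest` helper are hypothetical, and the match values are written by hand.

```python
from itertools import groupby

# Hypothetical (filename, matched group values) pairs
entries = [
    ('SSH_20070101.nc', {'variable': 'SSH', 'time': '20070101'}),
    ('SSH_20070109.nc', {'variable': 'SSH', 'time': '20070109'}),
    ('SST_20070101.nc', {'variable': 'SST', 'time': '20070101'}),
    ('SST_20070109.nc', {'variable': 'SST', 'time': '20070109'}),
]


def nest(entries, groups):
    """Nest filenames by each group name in turn, outermost group first."""
    if not groups:
        return [name for name, _ in entries]

    def key(entry):
        return entry[1][groups[0]]

    # groupby only merges adjacent items, so sort on the same key first
    ordered = sorted(entries, key=key)
    return [nest(list(chunk), groups[1:])
            for _, chunk in groupby(ordered, key=key)]


print(nest(entries, ['variable']))
print(nest(entries, ['time']))
```

Passing several group names, for instance `nest(entries, ['variable', 'time'])`, adds one level of nesting per name in the order given.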