pymc.backends.zarr.ZarrTrace
- class pymc.backends.zarr.ZarrTrace(store=None, synchronizer=None, compressor=UNSET, draws_per_chunk=1, include_transformed=False)
Object that stores and enables access to MCMC draws stored in zarr.hierarchy.Group objects.

This class creates a zarr hierarchy to represent the sampling information, which is intended to mimic arviz.InferenceData. The hierarchy looks like this:

root
|--> constant_data
|--> observed_data
|--> posterior
|--> unconstrained_posterior
|--> sample_stats
|--> warmup_posterior
|--> warmup_unconstrained_posterior
|--> warmup_sample_stats
|--> _sampling_state

The root group is created when the ZarrTrace object is initialized. The rest of the groups are created once init_trace() is called, with a few exceptions: unconstrained_posterior is only created if include_transformed = True, and the groups prefixed with warmup_ are created only after calling split_warmup_groups().

Since ZarrTrace objects are intended to be as close to arviz.InferenceData objects as possible, the groups store the dimension and coordinate information following the xarray zarr standard.

- Parameters:
- store
  zarr.storage.BaseStore | collections.abc.MutableMapping | None
  The store object where the zarr groups and arrays will be stored and read from. Any zarr compatible storage object works. Keep in mind that if None is provided, a zarr.storage.MemoryStore will be used, which means that information won’t be visible to other processes and won’t persist after the ZarrTrace life-cycle ends. If you want persistent storage, use one of the disk-backed zarr storage options, e.g. DirectoryStore or ZipStore.
- synchronizer
  zarr.sync.Synchronizer | None
  The synchronizer to use for the underlying zarr arrays.
- compressor
  numcodecs.abc.Codec | None | pymc.util.UNSET
  The compressor to use for the underlying zarr arrays. If None, no compressor is used. If UNSET, zarr’s default compressor is used.
- draws_per_chunk
  int
  The number of draws that make up a chunk in the variable’s posterior array. Each variable’s array shape is set to (n_chains, n_draws, *rv_shape), but the chunks are set to (1, draws_per_chunk, *rv_shape). This means that each chain will have its own chunk to read or write to, so concurrent writes from different chains do not interfere with each other, and that multiple draws can belong to the same chunk. The variable’s core dimensions, however, will never be split across different chunks.
- include_transformed
  bool
  If True, the transformed, unconstrained value variables are included in the storage group.
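The relationship between array shape and chunk shape described for draws_per_chunk can be illustrated with a small pure-Python sketch. The helper name chunk_layout is hypothetical and is not part of PyMC or zarr; it only restates the shapes given above and counts the resulting chunks, assuming the last block of draws may be partially filled.

```python
from math import ceil


def chunk_layout(n_chains, n_draws, rv_shape, draws_per_chunk=1):
    """Hypothetical helper: shapes and chunk count for one variable's posterior array."""
    if draws_per_chunk < 1:
        raise ValueError("draws_per_chunk must be a positive integer")
    # Full array: one axis for chains, one for draws, then the variable's core dims.
    array_shape = (n_chains, n_draws, *rv_shape)
    # Each chunk holds a single chain, a block of draws, and the whole core shape.
    chunk_shape = (1, draws_per_chunk, *rv_shape)
    # Core dimensions are never split, so only chains and draw blocks multiply.
    n_chunks = n_chains * ceil(n_draws / draws_per_chunk)
    return array_shape, chunk_shape, n_chunks


shape, chunks, n_chunks = chunk_layout(
    n_chains=4, n_draws=1000, rv_shape=(3, 2), draws_per_chunk=100
)
print(shape)     # (4, 1000, 3, 2)
print(chunks)    # (1, 100, 3, 2)
print(n_chunks)  # 40
```

Because no chunk ever contains draws from two chains, separate processes writing different chains touch disjoint chunks. A larger draws_per_chunk means fewer, bigger chunks per chain.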
Notes
ZarrTrace objects represent the storage information. If the underlying store persists on disk or over the network (e.g. with a zarr.storage.FSStore), multiple processes will be able to concurrently access the same storage and read or write to it.

The intended division of labour is for ZarrTrace to handle the creation and management of the zarr group and storage objects and arrays, and for individual ZarrChain objects to handle recording MCMC samples to the trace. This division was chosen to stay close to the way the existing pymc.backends.base.MultiTrace and pymc.backends.ndarray.NDArray work with the existing samplers.

One extra feature of ZarrTrace is that it enables direct access to any array’s metadata. ZarrTrace takes advantage of this to tag arrays as deterministic or freeRV depending on what kind of variable they were in the defining model.

Methods
- ZarrTrace.__init__([store, synchronizer, ...])
- ZarrTrace.create_group(name, data_dict)
- ZarrTrace.init_group_with_empty(group, ...)
- ZarrTrace.init_sampling_state_group(tune, chains)
- ZarrTrace.init_trace(chains, draws, tune, step)
  Initialize the trace groups and arrays.
- ZarrTrace.split_warmup(group_name[, ...])
  Split the arrays of a group into the warmup and regular groups.
- ZarrTrace.split_warmup_groups()
  Split the warmup and standard groups.
- ZarrTrace.to_inferencedata([save_warmup])
  Convert ZarrTrace to InferenceData.

Attributes
- constant_data
- observed_data
- posterior
- sample_stats
- sampling_time
- tuning_steps
- unconstrained_posterior
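The group-creation rules stated in the class description can be summarized in a short sketch. The function expected_groups is hypothetical and does not call PyMC or zarr; it only encodes which child groups of the root the description says should exist once init_trace() has run. Treating warmup_unconstrained_posterior as dependent on include_transformed is an assumption, since the description does not say so explicitly.

```python
def expected_groups(include_transformed=False, warmup_split=False):
    """Hypothetical helper: names of the groups expected under the root group.

    Assumes init_trace() has already been called on the trace.
    """
    groups = [
        "constant_data",
        "observed_data",
        "posterior",
        "sample_stats",
        "_sampling_state",
    ]
    if include_transformed:
        # Only created when include_transformed = True.
        groups.append("unconstrained_posterior")
    if warmup_split:
        # The warmup_ groups appear only after split_warmup_groups() is called.
        groups += ["warmup_posterior", "warmup_sample_stats"]
        if include_transformed:
            # Assumption: the warmup counterpart follows the same flag.
            groups.append("warmup_unconstrained_posterior")
    return groups


print(expected_groups())
print(expected_groups(include_transformed=True, warmup_split=True))
```

With both flags set, the result contains the nine groups shown in the hierarchy diagram.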