PYME Data Model
The ImageStack object
Our base image class is PYME.IO.image.ImageStack. This wraps both image data and image metadata, and provides
functions for loading and saving images in a number of formats. The ImageStack object has two key attributes:
ImageStack.data, which provides access to the image data, and ImageStack.metadata [1], which holds the metadata.
The ImageStack.data attribute is an instance of a DataSource. DataSources implement lazy loading logic which lets us
load data rapidly without pulling the entire contents of a file into memory. This is accomplished by addressing the
file on a frame-by-frame basis. Low-level data source modules (which can be thought of as input drivers or adapters)
can be found in PYME.IO.DataSources. Each of these data sources implements 3 core methods, the most important of which
is getSlice(), which returns a single 2D frame.
End user code should not generally use these methods directly. Instead, DataSources present a custom __getitem__
method [2], which allows them to be sliced as though they were numpy arrays. For example, for an ImageStack called
image, the nth slice of a 3D image can be retrieved with image.data[:, :, n], a profile along z (or t) at a given
(x, y) position with image.data[x, y, :], and a 3D sub-image with image.data[x0:x1, y0:y1, z0:z1].
In each case, slicing a DataSource returns a new numpy array, built on the fly by concatenating the frames obtained
from repeated calls to getSlice(). Because each 2D frame is sliced before concatenation, axial line profiles or ROIs
can be extracted without ever having the full image file in memory.
DataSources also present a .shape attribute which is very similar to the .shape attribute of numpy arrays, with the
major difference that trailing dimensions of size 1 are appended to the end of the shape. It is perfectly OK to index
a data source outside its true dimensionality: for example, a 3D data source can be given a fourth (colour) index of 0,
which selects the single colour channel it contains.
This extra-dimensional indexing is there to allow consistent handling of data regardless of the number of colour channels. i.e. it will always be possible to access the 0th colour channel, even if the data is only a single channel. Similarly, it is always possible to access the 0th slice along the 3rd dimension, even if the underlying data is only 2D.
With the way colour channels were implemented in older versions of the code, it was not possible to slice the 4th dimension (indexing was OK). You can now slice along the 4th dimension, but it is possible that some parts of the code have not caught up.
We currently use a 4D model, where the first and second dimensions are x and y, the third dimension is either z or t and the 4th dimension is the colour channel.
The 4D data model means that there is currently no support for 3D time series, and that no distinction is made between time series and 3D stacks when processing. Up until this point, this has not been a major limitation, but it would be nicer if we had a consistent 5D data model. Transitioning to a 5D model is on the roadmap, but I have not yet decided if this will be a backwards compatible change.
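The lazy slicing and shape padding described above can be illustrated with a small, self-contained sketch. ToyDataSource is a hypothetical class, not PYME's actual DataSource implementation: its getSlice() synthesises each frame instead of reading it from a file, and its 4D (x, y, z/t, colour) __getitem__ is written from scratch on the assumption of a single colour channel.

```python
import numpy as np


class ToyDataSource:
    """Hypothetical lazy data source using the 4D (x, y, z/t, colour) model.

    Not PYME code: getSlice() synthesises each 2D frame on demand, standing in
    for reading a single frame from a file.
    """

    def __init__(self, frame_shape, n_frames):
        self._frame_shape = tuple(frame_shape)
        self._n_frames = n_frames
        self.frames_loaded = 0  # counts getSlice() calls to show laziness

    def getSlice(self, ind):
        # Stand-in for loading frame `ind`; every pixel holds the frame index.
        self.frames_loaded += 1
        return np.full(self._frame_shape, ind, dtype=float)

    def getNumSlices(self):
        return self._n_frames

    @property
    def shape(self):
        # Like ndarray.shape, but padded with a size-1 colour dimension.
        return self._frame_shape + (self._n_frames, 1)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # Missing trailing indices select everything, as in numpy.
        # (More than 4 indices is not supported in this sketch.)
        xk, yk, zk, ck = key + (slice(None),) * (4 - len(key))
        c_sel = range(1)[ck]  # single colour channel: index 0 or a slice
        z_sel = range(self._n_frames)[zk]
        if isinstance(z_sel, int):
            out = np.asarray(self.getSlice(z_sel)[xk, yk])
        else:
            # Each 2D frame is sliced *before* stacking, so only the
            # requested pixels are ever kept.
            out = np.stack([self.getSlice(i)[xk, yk] for i in z_sel], axis=-1)
        if isinstance(c_sel, range):
            # A colour slice keeps the colour axis (length 0 or 1 here).
            out = np.repeat(np.expand_dims(out, -1), len(c_sel), axis=-1)
        return out


ds = ToyDataSource((64, 64), n_frames=100)
print(ds.shape)                      # (64, 64, 100, 1)
frame = ds[:, :, 5, 0]               # one 2D frame: colour index 0 always exists
profile = ds[10, 20, :, 0]           # z/t profile at (x, y) = (10, 20)
roi = ds[0:8, 0:8, 10:20, 0]         # 3D sub-image
print(frame.shape, profile.shape, roi.shape)  # (64, 64) (100,) (8, 8, 10)
print(ds.frames_loaded)              # 111
```

The profile request calls getSlice() once per frame but keeps only one pixel from each, which is what keeps memory use low when the frames really do come from a file.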
The second principal data type in PYME is tabular data. In its loosest form this consists of columns which are
accessible by name (behaving a little like a dictionary). The canonical form of tabular data is a class derived from
PYME.IO.tabular.TabularBase [4]. In some parts of the code, however, you will find numpy record arrays,
pandas data frames, or even dictionaries standing in as tabular data.
The 4 key requirements for tabular data are:
It should be indexable by column name like a dictionary
Each column should be returned as a one-dimensional numpy array [3]
Each column should have the same length
It should implement a .keys() function which returns a list of the column names [5]
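A quick way to check whether an object meets these four requirements is sketched below. is_tabular() and RecArrayWrapper are hypothetical helpers written for illustration, not part of PYME; the wrapper simply adds the .keys() method that numpy record arrays lack.

```python
import numpy as np


def is_tabular(obj):
    """Hypothetical helper (not part of PYME): check the four requirements."""
    try:
        names = list(obj.keys())  # .keys() must list the column names
    except AttributeError:
        return False
    lengths = set()
    for name in names:
        col = obj[name]  # columns must be indexable by name
        if not (isinstance(col, np.ndarray) and col.ndim == 1):
            return False  # each column must be a 1D numpy array
        lengths.add(len(col))
    return len(lengths) <= 1  # all columns must have the same length


class RecArrayWrapper:
    """Hypothetical minimal wrapper adding .keys() to a numpy record array."""

    def __init__(self, rec):
        self._rec = rec

    def keys(self):
        return list(self._rec.dtype.names)

    def __getitem__(self, name):
        return self._rec[name]


points = {"x": np.array([1.0, 2.0, 3.0]), "y": np.array([4.0, 5.0, 6.0])}
rec = np.rec.fromarrays([points["x"], points["y"]], names="x,y")
print(is_tabular(points))                 # True
print(is_tabular(rec))                    # False: no .keys() method
print(is_tabular(RecArrayWrapper(rec)))   # True
```

A pandas DataFrame fails this strict check, because indexing a column returns a pandas Series rather than a numpy array.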
If these requirements are met, tabular data can be processed by filters and mappings (defined in …), and a processing
pipeline can be built up by cascading filters. One example of a tabular processing pipeline is the one used in VisGUI.
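The idea of cascading filters can be sketched with a hypothetical RangeFilter class; it is not one of PYME's own filter classes, whose names and signatures are not covered here. Because the filter itself satisfies the tabular requirements (column indexing plus .keys()), the output of one filter can be fed straight into another.

```python
import numpy as np


class RangeFilter:
    """Hypothetical filter: exposes only rows where lo <= source[column] < hi.

    The filter is itself tabular (keys() plus column indexing), so filters
    can be cascaded to build a pipeline.
    """

    def __init__(self, source, column, lo, hi):
        self.source = source
        self.column = column
        self.lo, self.hi = lo, hi

    def _mask(self):
        values = self.source[self.column]
        return (values >= self.lo) & (values < self.hi)

    def keys(self):
        return list(self.source.keys())

    def __getitem__(self, name):
        # Recomputed on each access, so changes upstream propagate.
        return self.source[name][self._mask()]


locs = {
    "x": np.array([10.0, 250.0, 400.0, 90.0]),
    "y": np.array([5.0, 60.0, 20.0, 80.0]),
    "sig": np.array([110.0, 95.0, 300.0, 120.0]),
}
# Cascade: filter on x, then filter the filtered output on sig.
step1 = RangeFilter(locs, "x", 0, 300)
step2 = RangeFilter(step1, "sig", 100, 200)
print(step2["x"])  # [10. 90.]
```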
Saving support is not currently built into TabularBase, and the most consistent (and easiest) way of saving tabular
data is probably to call its .toDataFrame() method and then use the pandas I/O functions, e.g. DataFrame.to_csv() or
DataFrame.to_hdf().
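A minimal sketch of the pandas route, assuming that .toDataFrame() returns an ordinary pandas DataFrame. To keep the example runnable without PYME, the DataFrame is built directly from a dict of numpy arrays, and an in-memory buffer stands in for a file.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for tabular_data.toDataFrame(): a DataFrame built from a dict of
# equal-length 1D numpy arrays.
df = pd.DataFrame({"x": np.array([1.0, 2.0, 3.0]), "y": np.array([4.0, 5.0, 6.0])})

# Write with a pandas I/O function. Passing a filename such as
# "localisations.csv" instead of the buffer would write to disk.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Read it back to check the round trip.
buf.seek(0)
restored = pd.read_csv(buf)
print(restored.equals(df))  # True
```

CSV is used here because it needs no extra dependencies; DataFrame.to_hdf() requires the PyTables package to be installed.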
[1] The metadata is also accessible through a shortcut, ImageStack.mdh, which is used in most existing code. New code
should use the more descriptive ImageStack.metadata.
[2] Inherited from a common base class.
[3] This is not strictly true if using pandas data frames (indexing by column returns a pandas Series rather than a
numpy array). In most cases these are sufficiently similar to numpy arrays that you can get away with it, but caution
is advised. TODO: write a TabularBase-derived wrapper for data frames.
[4] This was previously
[5] numpy recarrays do not implement a keys() method and should normally be wrapped in an instance