PYME.IO.PZFFormat module

Defines a ‘wire’ format for transmitting or saving image frame data, optionally with Huffman compression and/or sqrt quantization. The combination of quantization and Huffman coding allows compression ratios of 6- to 10-fold on typical microscopy data. Compression and quantization require the pyme-compress companion library, which provides optimized C code for both operations. If pyme-compress is compiled and installed on an AVX-capable processor, a throughput in excess of 800 MB/s can be achieved.

Most users will just want the dumps() and loads() functions.

PYME.IO.PZFFormat.ChunkedHuffmanCompress(data, quantization=None)
PYME.IO.PZFFormat.ChunkedHuffmanCompress_o(data)
PYME.IO.PZFFormat.ChunkedHuffmanDecompress(datastring)
PYME.IO.PZFFormat.dumps(data, sequenceID=0, frameNum=0, frameTimestamp=0, compression=0, quantization=0, quantizationOffset=0, quantizationScale=1)

Dump an image frame (supplied as a numpy array) into a string in PZF format.

Parameters
data: ndarray

The frame as a 2D (or optionally 3D) numpy array

sequenceID: int

A unique identifier for the sequence to which this frame belongs. This lets us connect the frame with its metadata even if they end up in different directories, etc.

frameNum: int

The position of this frame within the sequence

frameTimestamp: float

A timestamp for the frame (if provided by the camera)

compression: int (enum)

Compression method to use. One of PZFFormat.DATA_COMP_RAW, PZFFormat.DATA_COMP_HUFFCODE, or PZFFormat.DATA_COMP_HUFFCODE_CHUNKS, where raw stores the data with no compression, huffcode uses Huffman coding, and huffcode chunks breaks the data into chunks first, with each chunk being encoded by a separate thread.

quantization: int (enum)

Whether or not the data is quantized before saving. One of DATA_QUANT_NONE or DATA_QUANT_SQRT. If DATA_QUANT_SQRT is selected, then the data is quantized as follows prior to compression:

\[data_{quant} = \frac{\sqrt{data - quantizationOffset}}{quantizationScale}\]
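The quantization formula above can be illustrated with a short numpy sketch. This is not the pyme-compress implementation (which is optimized C code); the function names `sqrt_quantize` and `sqrt_dequantize`, the rounding, the clipping of negative values, and the 8-bit output type are all assumptions made for illustration.

```python
import numpy as np

def sqrt_quantize(data, offset=0.0, scale=1.0):
    """Apply data_quant = sqrt(data - offset) / scale (hypothetical helper).

    Clipping, rounding and the uint8 output are assumptions of this sketch,
    not documented behaviour of PZFFormat.
    """
    shifted = np.clip(data.astype('float32') - offset, 0, None)
    quantized = np.round(np.sqrt(shifted) / scale)
    return np.clip(quantized, 0, 255).astype('uint8')

def sqrt_dequantize(quantized, offset=0.0, scale=1.0):
    """Invert the formula: data ~= (data_quant * scale)**2 + offset."""
    return (quantized.astype('float32') * scale) ** 2 + offset

# Pixel values sitting on a camera offset of 100 counts (made-up numbers)
data = np.array([100, 104, 116, 200, 1000], dtype='uint16')
quantized = sqrt_quantize(data, offset=100.0, scale=1.0)
restored = sqrt_dequantize(quantized, offset=100.0, scale=1.0)
```

The values chosen here are perfect squares above the offset, so they survive the round trip exactly. In general the rounding step discards information, with the absolute error growing roughly as the square root of the signal.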
PYME.IO.PZFFormat.header_dtype_v3 = [('ID', 'S2'), ('Version', 'u1'), ('DataFormat', 'u1'), ('DataCompression', 'u1'), ('DataQuantization', 'u1'), ('DimOrder', 'S1'), ('RESERVED0', 'S1'), ('SequenceID', 'i8'), ('FrameNum', 'u4'), ('Width', 'u4'), ('Height', 'u4'), ('Depth', 'u4'), ('FrameTimestamp', 'u8'), ('QuantOffset', 'f4'), ('QuantScale', 'f4'), ('DataOffset', 'u4'), ('RESERVED1', 'S12')]

numpy dtype used to define the file header struct.

Most of the entries should be fairly self-explanatory, with the following deserving a bit more explanation:

ID

a 2-character string that we can test to see if the file type is consistent

Version

the version of this format the file uses

DataFormat

what the data type of individual pixels is

DataCompression

whether the data is compressed, and which algorithm is used

SequenceID

A unique identifier for the sequence to which this frame belongs. The most important property of this number is that it is unique to each sequence. A reasonable method of generation would be to use a unix-format integer timestamp for the first dword, and a random integer for the second. A hash of the first n image pixels could also be used.

FrameNum

The position of this frame within the sequence

FrameTimestamp

Space to save camera derived frame timestamps, if available

Depth

As envisaged, the format is expected to contain individual 2D frames, with multiple frames being pulled together in a higher level container to construct a sequence or stack. Depth is included just because it doesn’t take a significant amount of extra space, but gives us flexibility for the future.
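To make the header layout concrete, the sketch below rebuilds the dtype listed above with plain numpy, fills in one header record, and parses it back from raw bytes. It does not call PZFFormat itself. The `make_sequence_id` helper is hypothetical and follows the timestamp-plus-random suggestion for SequenceID; which half holds the timestamp, the placeholder ID `b'XX'`, the version number 3, and setting DataOffset to the header size are all assumptions, not values taken from the library.

```python
import random
import time

import numpy as np

# Same field list as header_dtype_v3 above, rebuilt here so the sketch is standalone
header_dtype_v3 = np.dtype([
    ('ID', 'S2'), ('Version', 'u1'), ('DataFormat', 'u1'),
    ('DataCompression', 'u1'), ('DataQuantization', 'u1'),
    ('DimOrder', 'S1'), ('RESERVED0', 'S1'), ('SequenceID', 'i8'),
    ('FrameNum', 'u4'), ('Width', 'u4'), ('Height', 'u4'), ('Depth', 'u4'),
    ('FrameTimestamp', 'u8'), ('QuantOffset', 'f4'), ('QuantScale', 'f4'),
    ('DataOffset', 'u4'), ('RESERVED1', 'S12'),
])

def make_sequence_id():
    """Hypothetical helper: Unix timestamp in one dword, random bits in the other.

    The timestamp is masked to 31 bits so the result fits the signed 'i8' field.
    """
    timestamp = int(time.time()) & 0x7FFFFFFF
    return (timestamp << 32) | random.getrandbits(32)

seq_id = make_sequence_id()

header = np.zeros(1, dtype=header_dtype_v3)
header['ID'] = b'XX'        # placeholder, not the real PZF magic string
header['Version'] = 3       # assumed from the dtype name
header['SequenceID'] = seq_id
header['FrameNum'] = 7
header['Width'] = 512
header['Height'] = 256
header['Depth'] = 1
header['DataOffset'] = header_dtype_v3.itemsize  # assumes data follows the header directly

# Serialize, then read the fixed-size struct back from the start of the buffer
raw = header.tobytes()
parsed = np.frombuffer(raw[:header_dtype_v3.itemsize], dtype=header_dtype_v3)[0]
```

Because the dtype is packed (no alignment padding), the fields add up to a fixed 64-byte header, which is what makes it cheap to inspect a frame's metadata without decoding the pixel data.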

PYME.IO.PZFFormat.load_header(datastring)
PYME.IO.PZFFormat.loads(datastring)

Loads image data from a string in PZF format.

Parameters
datastring: string / bytes

The encoded data

Returns
data: ndarray

The image data as a numpy array

header: recarray

The image header, as a numpy record array with the header_dtype dtype.
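A minimal round-trip sketch using dumps() and loads() with the signatures documented above might look as follows. It assumes PYME is installed and importable as `PYME.IO.PZFFormat`, and relies on the default arguments (no compression, no quantization) so that pyme-compress should not be needed. The `roundtrip` helper and the frame contents are made up for illustration, and the exact shape of the returned array is not asserted here.

```python
import numpy as np

try:
    from PYME.IO import PZFFormat  # requires PYME to be installed
except ImportError:
    PZFFormat = None  # lets the sketch be read and imported without PYME

def roundtrip(frame, sequence_id=1, frame_num=0):
    """Hypothetical helper: encode a frame with dumps(), decode it with loads()."""
    encoded = PZFFormat.dumps(frame, sequenceID=sequence_id, frameNum=frame_num)
    decoded, header = PZFFormat.loads(encoded)
    return encoded, decoded, header

# A small synthetic 2D frame of 16-bit pixels
frame = np.arange(64 * 32, dtype='uint16').reshape(64, 32)

if PZFFormat is not None:
    encoded, decoded, header = roundtrip(frame)
    print(len(encoded), decoded.shape, header['FrameNum'])
```

To enable compression, pass one of the DATA_COMP_* constants as the compression argument of dumps(); this requires pyme-compress.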