Notation

When coding with dclab, you should be aware of the following definitions and design principles.

Events

An event comprises all data recorded for the detection of one object (e.g. cell or bead) in an RT-DC measurement.

Features

A feature is a measurement parameter of an RT-DC measurement. For instance, the feature “index” enumerates all recorded events, the feature “deform” contains the deformation values of all events. There are scalar features, i.e. features that assign a single number to an event, and non-scalar features, such as “image” and “contour”. All features in a dataset are exposed as read-only to the user. The following features are supported by dclab:

Scalar features

scalar features	description [units]
area_cvx	Convex area [px]
area_msd	Measured area [px]
area_ratio	Porosity (convex to measured area ratio)
area_um	Area [µm²]
area_um_raw	Area [µm²] of raw contour
aspect	Aspect ratio of bounding box
basinmap0	Basin mapping 0
basinmap1	Basin mapping 1
basinmap2	Basin mapping 2
basinmap3	Basin mapping 3
basinmap4	Basin mapping 4
basinmap5	Basin mapping 5
basinmap6	Basin mapping 6
basinmap7	Basin mapping 7
basinmap8	Basin mapping 8
basinmap9	Basin mapping 9
bg_med	Median frame background brightness [a.u.]
bright_avg	Brightness average [a.u.]
bright_bc_avg	Brightness average (bgc) [a.u.]
bright_bc_sd	Brightness SD (bgc) [a.u.]
bright_perc_10	10th Percentile of brightness (bgc)
bright_perc_90	90th Percentile of brightness (bgc)
bright_sd	Brightness SD [a.u.]
circ	Circularity
deform	Deformation
deform_raw	Deformation of raw contour
eccentr_prnc	Eccentricity of raw contour
emodulus	Young’s modulus [kPa]
fl1_area	FL-1 area of peak [a.u.]
fl1_dist	FL-1 distance between two first peaks [µs]
fl1_max	FL-1 maximum [a.u.]
fl1_max_ctc	FL-1 maximum, crosstalk-corrected [a.u.]
fl1_npeaks	FL-1 number of peaks
fl1_pos	FL-1 position of peak [µs]
fl1_width	FL-1 width [µs]
fl2_area	FL-2 area of peak [a.u.]
fl2_dist	FL-2 distance between two first peaks [µs]
fl2_max	FL-2 maximum [a.u.]
fl2_max_ctc	FL-2 maximum, crosstalk-corrected [a.u.]
fl2_npeaks	FL-2 number of peaks
fl2_pos	FL-2 position of peak [µs]
fl2_width	FL-2 width [µs]
fl3_area	FL-3 area of peak [a.u.]
fl3_dist	FL-3 distance between two first peaks [µs]
fl3_max	FL-3 maximum [a.u.]
fl3_max_ctc	FL-3 maximum, crosstalk-corrected [a.u.]
fl3_npeaks	FL-3 number of peaks
fl3_pos	FL-3 position of peak [µs]
fl3_width	FL-3 width [µs]
flow_rate	Flow rate [µL/s]
frame	Video frame number
g_force	Gravitational force in multiples of g
index	Index (Dataset)
index_online	Index (Online)
inert_ratio_cvx	Inertia ratio of convex contour
inert_ratio_prnc	Principal inertia ratio of raw contour
inert_ratio_raw	Inertia ratio of raw contour
ml_class	Most probable ML class
nevents	Number of events in the same image
pc1	Principal component 1
pc2	Principal component 2
per_ratio	Inverse Convexity (raw to convex perimeter ratio)
per_um_raw	Perimeter [µm] of raw contour
pos_x	Position along channel axis [µm]
pos_y	Position lateral in channel [µm]
pressure	Pressure [mPa]
qpi_dm_avg	Dry mass (average) [pg]
qpi_dm_sd	Dry mass (SD) [pg]
qpi_focus	Computed focus distance [µm]
qpi_pha_int	Integrated phase [rad]
qpi_ri_avg	Refractive index (average)
qpi_ri_sd	Refractive index (SD)
size_x	Bounding box size x [µm]
size_y	Bounding box size y [µm]
sym_x	Symmetry ratio left-right
sym_y	Symmetry ratio top-bottom
temp	Chip temperature [°C]
temp_amb	Ambient temperature [°C]
tex_asm_avg	Texture angular second moment (avg)
tex_asm_ptp	Texture angular second moment (ptp)
tex_con_avg	Texture contrast (avg)
tex_con_ptp	Texture contrast (ptp)
tex_cor_avg	Texture correlation (avg)
tex_cor_ptp	Texture correlation (ptp)
tex_den_avg	Texture difference entropy (avg)
tex_den_ptp	Texture difference entropy (ptp)
tex_ent_avg	Texture entropy (avg)
tex_ent_ptp	Texture entropy (ptp)
tex_f12_avg	Texture First measure of correlation (avg)
tex_f12_ptp	Texture First measure of correlation (ptp)
tex_f13_avg	Texture Second measure of correlation (avg)
tex_f13_ptp	Texture Second measure of correlation (ptp)
tex_idm_avg	Texture inverse difference moment (avg)
tex_idm_ptp	Texture inverse difference moment (ptp)
tex_sen_avg	Texture sum entropy (avg)
tex_sen_ptp	Texture sum entropy (ptp)
tex_sva_avg	Texture sum variance (avg)
tex_sva_ptp	Texture sum variance (ptp)
tex_var_avg	Texture variance (avg)
tex_var_ptp	Texture variance (ptp)
tilt	Absolute tilt of raw contour
time	Time [s]
userdef0	User-defined 0
userdef1	User-defined 1
userdef2	User-defined 2
userdef3	User-defined 3
userdef4	User-defined 4
userdef5	User-defined 5
userdef6	User-defined 6
userdef7	User-defined 7
userdef8	User-defined 8
userdef9	User-defined 9
volume	Volume [µm³]

In addition to these scalar features, it is possible to define a large number of features dedicated to machine learning, the “ml_score_???” features. Each “?” can be a digit or a lower-case letter of the alphabet, e.g. “ml_score_rbc” or “ml_score_3a3”. If “ml_score_???” features are defined, then the ancillary “ml_class” feature, which identifies the most probable class for each event, becomes available.
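
The naming rule above can be checked with a regular expression. The following is a minimal sketch that does not use dclab; the helper names `is_ml_score_name` and `most_probable_class` are hypothetical, and the alphabetical ordering of classes is an assumption of this sketch, not a statement about how dclab computes “ml_class”.

```python
import re

# "ml_score_" followed by exactly three characters, each a digit or a
# lower-case letter, as described in the text above.
ML_SCORE_PATTERN = re.compile(r"ml_score_[a-z0-9]{3}")


def is_ml_score_name(name):
    """Return True if `name` follows the "ml_score_???" convention."""
    return ML_SCORE_PATTERN.fullmatch(name) is not None


def most_probable_class(scores):
    """Return, per event, the index of the score feature with the highest value.

    `scores` maps feature names to per-event score lists. The features are
    ordered alphabetically here (an assumption of this sketch).
    """
    names = sorted(scores)
    n_events = len(scores[names[0]])
    return [
        max(range(len(names)), key=lambda j: scores[names[j]][i])
        for i in range(n_events)
    ]


# Made-up scores for three events and two classes
example_scores = {
    "ml_score_rbc": [0.9, 0.2, 0.4],
    "ml_score_wbc": [0.1, 0.8, 0.6],
}
classes = most_probable_class(example_scores)
```

Here, `classes` holds one class index per event, where index 0 refers to “ml_score_rbc” and index 1 to “ml_score_wbc”.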

Non-scalar features

non-scalar features	description [units]
contour	Event contour
image	Gray scale event image
image_bg	Gray scale event background image
mask	Binary mask labeling the event in the image
qpi_amp	Hologram amplitude image
qpi_oah	Off-axis hologram
qpi_oah_bg	Off-axis hologram background
qpi_pha	Hologram phase image [rad]
trace	Dictionary of fluorescence traces

Examples

deformation vs. area plot

import matplotlib.pyplot as plt
import dclab
ds = dclab.new_dataset("data/example.rtdc")
ax = plt.subplot(111)
ax.plot(ds["area_um"], ds["deform"], "o", alpha=.2)
ax.set_xlabel(dclab.dfn.get_feature_label("area_um"))
ax.set_ylabel(dclab.dfn.get_feature_label("deform"))
plt.show()


event image plot

import matplotlib.pyplot as plt
import dclab
ds = dclab.new_dataset("data/example_video.rtdc")
ax1 = plt.subplot(211, title="image")
ax2 = plt.subplot(212, title="mask")
# show the image and the corresponding mask of the event at index 6
ax1.imshow(ds["image"][6], cmap="gray")
ax2.imshow(ds["mask"][6])
plt.show()


Ancillary features

Not all features available in dclab are recorded online during the acquisition of the experimental dataset. Some of the features are computed offline by dclab, such as “volume”, “emodulus”, or scores from imported machine learning models (“ml_score_xxx”). These ancillary features are computed on-the-fly and are made available seamlessly through the same interface.
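
The idea of computing features on the fly behind a single interface can be illustrated with a toy class. This is a sketch only and not dclab's implementation; the class `ToyDataset` is hypothetical. It uses “area_ratio”, which the table above describes as the convex to measured area ratio, as the example of a derived feature.

```python
class ToyDataset:
    """Toy container exposing stored and derived features via `ds[name]`."""

    def __init__(self, stored):
        self._stored = dict(stored)
        self._cache = {}
        # Derived features and the functions that compute them on demand
        self._ancillary = {"area_ratio": self._compute_area_ratio}

    def _compute_area_ratio(self):
        # Convex area divided by measured area, event by event
        return [
            cvx / msd
            for cvx, msd in zip(self._stored["area_cvx"], self._stored["area_msd"])
        ]

    def __getitem__(self, name):
        if name in self._stored:
            return self._stored[name]
        if name in self._ancillary:
            # Compute once on first access, then reuse the cached result
            if name not in self._cache:
                self._cache[name] = self._ancillary[name]()
            return self._cache[name]
        raise KeyError(name)


toy = ToyDataset({"area_cvx": [100.0, 125.0], "area_msd": [100.0, 100.0]})
ratio = toy["area_ratio"]  # computed on first access
```

The caller does not need to know whether a feature was stored or derived; both are retrieved with the same square-bracket access.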

Filters

A filter can be used to gate events using features. There are min/max filters and 2D polygon filters. The following table defines the main filtering parameters:

filtering	parsed	description [units]
enable filters	`{f}`	Enable filtering
hierarchy parent	`str`	Hierarchy parent of the dataset
limit events	`{f}`	Upper limit for number of filtered events
polygon filters	`{f}`	Polygon filter indices
remove invalid events	`{f}`	Remove events with inf/nan values
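
Conceptually, a 2D polygon filter keeps the events whose coordinates in a two-feature plane lie inside a polygon. The following is a minimal ray-casting sketch of that idea in plain Python. It is not dclab's implementation; the function name `point_in_polygon` and the example vertices are hypothetical.

```python
def point_in_polygon(x, y, vertices):
    """Return True if the point (x, y) lies inside the polygon.

    `vertices` is a list of (x, y) tuples. A horizontal ray is cast from
    the point to the right, and the number of crossed edges is counted:
    an odd count means the point is inside.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical gate in the (area_um, deform) plane
gate = [(50.0, 0.0), (150.0, 0.0), (150.0, 0.05), (50.0, 0.05)]
events = [(100.0, 0.02), (100.0, 0.10), (200.0, 0.02)]
mask = [point_in_polygon(a, d, gate) for a, d in events]
```

The resulting boolean list can be used in the same way as a min/max gate: events with `True` are kept.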

Min/max filters are also defined in the filters section:

filtering	explanation
area_um min	Exclude events with area [µm²] below this value
area_um max	Exclude events with area [µm²] above this value
aspect max	Exclude events with an aspect ratio above this value
…	…
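
To illustrate how keys of the form “feature min” and “feature max” translate into a gate, here is a small sketch in plain Python that builds a boolean mask from such a mapping. It is an illustration under simplifying assumptions (features stored as lists, other keys ignored) and not dclab's implementation; the function name `minmax_mask` is hypothetical.

```python
def minmax_mask(features, filtering):
    """Return a per-event list of booleans (True means the event is kept).

    `features` maps feature names to per-event value lists, `filtering`
    maps keys such as "area_um min" or "deform max" to limits.
    """
    n_events = len(next(iter(features.values())))
    mask = [True] * n_events
    for key, limit in filtering.items():
        # Split "area_um min" into the feature name and the bound type
        name, _, bound = key.rpartition(" ")
        if bound not in ("min", "max") or name not in features:
            continue  # not a min/max key, e.g. "enable filters"
        for i, value in enumerate(features[name]):
            if bound == "min" and value < limit:
                mask[i] = False
            elif bound == "max" and value > limit:
                mask[i] = False
    return mask


# Made-up data for three events
example_features = {
    "area_um": [30.0, 80.0, 200.0],
    "deform": [0.02, 0.05, 0.2],
}
example_filtering = {
    "enable filters": True,
    "area_um min": 50,
    "area_um max": 150,
    "deform max": 0.1,
}
kept = minmax_mask(example_features, example_filtering)
```

Only the second event passes all three limits in this example.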

Examples

excluding events with large deformation

import matplotlib.pyplot as plt
import dclab
ds = dclab.new_dataset("data/example.rtdc")

ds.config["filtering"]["deform min"] = 0
ds.config["filtering"]["deform max"] = .1
ds.apply_filter()
dif = ds.filter.all

f, axes = plt.subplots(1, 2, sharex=True, sharey=True)
axes[0].plot(ds["area_um"], ds["bright_avg"], "o", alpha=.2)
axes[0].set_title("unfiltered")
axes[1].plot(ds["area_um"][dif], ds["bright_avg"][dif], "o", alpha=.2)
axes[1].set_title("Deformation <= 0.1")

for ax in axes:
    ax.set_xlabel(dclab.dfn.get_feature_label("area_um"))
    ax.set_ylabel(dclab.dfn.get_feature_label("bright_avg"))

plt.tight_layout()
plt.show()


excluding random events

This is useful if you need to have a (sub-)dataset of a specified size. The downsampling is reproducible (the same points are excluded).

import matplotlib.pyplot as plt
import dclab
ds = dclab.new_dataset("data/example.rtdc")
ds.config["filtering"]["limit events"] = 4000
ds.apply_filter()
fid = ds.filter.all

ax = plt.subplot(111)
ax.plot(ds["area_um"][fid], ds["deform"][fid], "o", alpha=.2)
ax.set_xlabel(dclab.dfn.get_feature_label("area_um"))
ax.set_ylabel(dclab.dfn.get_feature_label("deform"))
plt.show()
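
Reproducible downsampling generally relies on a fixed random seed. The sketch below shows that principle with the standard library only. The seed value, the sampling strategy, and the function name `downsample_indices` are assumptions made for illustration and do not describe dclab's actual algorithm.

```python
import random


def downsample_indices(n_events, limit, seed=47):
    """Return sorted indices of at most `limit` events out of `n_events`.

    Using a fixed seed means the same events are selected on every call.
    """
    if limit >= n_events:
        # Nothing to remove
        return list(range(n_events))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_events), limit))


first = downsample_indices(10000, 4000)
second = downsample_indices(10000, 4000)
```

Both calls select the same 4000 indices because the random number generator starts from the same state each time.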


Experiment metadata

Every RT-DC measurement has metadata consisting of key-value pairs. The following are supported:

experiment	parsed	description [units]
date	`str`	Date of measurement (‘YYYY-MM-DD’)
event count	`{f}`	Number of recorded events
run identifier	`str`	Unique measurement identifier
run index	`{f}`	Index of measurement run
sample	`str`	Measured sample or user-defined reference
time	`str`	Start time of measurement (‘HH:MM:SS[.S]’)

fluorescence	parsed	description [units]
baseline 1 offset	`{f}`	Baseline offset channel 1
baseline 2 offset	`{f}`	Baseline offset channel 2
baseline 3 offset	`{f}`	Baseline offset channel 3
bit depth	`{f}`	Trace bit depth
channel 1 name	`str`	FL1 description
channel 2 name	`str`	FL2 description
channel 3 name	`str`	FL3 description
channel count	`{f}`	Number of active channels
channels installed	`{f}`	Number of available channels
laser 1 lambda	`float`	Laser 1 wavelength [nm]
laser 1 power	`float`	Laser 1 output power [%]
laser 2 lambda	`float`	Laser 2 wavelength [nm]
laser 2 power	`float`	Laser 2 output power [%]
laser 3 lambda	`float`	Laser 3 wavelength [nm]
laser 3 power	`float`	Laser 3 output power [%]
laser count	`{f}`	Number of active lasers
lasers installed	`{f}`	Number of available lasers
sample rate	`{f}`	Trace sample rate [Hz]
samples per event	`{f}`	Samples per event
signal max	`float`	Upper voltage detection limit [V]
signal min	`float`	Lower voltage detection limit [V]
trace median	`{f}`	Rolling median filter size for traces

fmt_tdms	parsed	description [units]
video frame offset	`{f}`	Missing events at beginning of video

imaging	parsed	description [units]
flash device	`str`	Light source device type
flash duration	`float`	Light source flash duration [µs]
frame rate	`float`	Imaging frame rate [Hz]
pixel size	`float`	Pixel size [µm]
roi position x	`{f}`	Image x coordinate on sensor [px]
roi position y	`{f}`	Image y coordinate on sensor [px]
roi size x	`{f}`	Image width [px]
roi size y	`{f}`	Image height [px]

online_contour	parsed	description [units]
bg empty	`{f}`	Background correction from empty frames only
bin area min	`{f}`	Minimum pixel area of binary image event
bin kernel	`{f}`	Disk size for binary closing of mask image
bin threshold	`{f}`	Threshold for mask from bg-corrected image
image blur	`{f}`	Odd sigma for Gaussian blur (21x21 kernel)
no absdiff	`{f}`	Do not use OpenCV ‘absdiff’ for bg-correction

online_filter	parsed	description [units]
target duration	`float`	Target measurement duration [min]
target event count	`{f}`	Target event count for online gating

pipeline	parsed	description [units]
dcnum background	`str`	Background ID
dcnum data	`str`	Data ID
dcnum feature	`str`	Feature extractor ID
dcnum gate	`str`	Gating ID
dcnum generation	`str`	Generation ID
dcnum hash	`str`	Hash
dcnum segmenter	`str`	Segmenter ID
dcnum yield	`{f}`	Event yield

qpi	parsed	description [units]
amp border loc	`str`	Border location specifier for amplitude
amp border px	`{f}`	Width of border for amplitude [pix]
amp fit offset	`str`	Amplitude offset correction
amp fit profile	`str`	Amplitude profile correction
bg method	`str`	Background computation method
filter name	`str`	Fourier filter used
filter size	`float`	Fourier filter size [1/pix]
focus interval	`{f}`	Focus interval to search [µm]
focus kernel	`str`	Propagation kernel
focus metric	`str`	Metric used to calculate focus
focus minimizer	`str`	Minimizer used to calculate focus
focus padding	`{f}`	Level of padding for refocus
invert phase	`{f}`	Invert the phase data
medium index	`float`	Refractive index of medium
padding	`{f}`	Level of padding
pha border loc	`str`	Border location specifier for phase
pha border px	`{f}`	Width of border for phase [pix]
pha fit offset	`str`	Phase offset correction
pha fit profile	`str`	Phase profile correction
pixel size proc	`float`	QPI pixel size [µm]
pixel size raw	`float`	Hologram pixel size [µm]
scale to filter	`{f}`	Scale QPI data to filter size
sideband freq	`{f}`	Sideband coordinates [1/pix]
software version	`str`	Software version(s)
subtract mean	`{f}`	Subtract mean before processing
wavelength	`float`	Imaging wavelength [nm]

setup	parsed	description [units]
channel width	`float`	Width of microfluidic channel [µm]
chip identifier	`{f}`	Unique identifier of the chip used
chip region	`{f}`	Imaged chip region (channel or reservoir)
flow rate	`float`	Flow rate in channel [µL/s]
flow rate sample	`float`	Sample flow rate [µL/s]
flow rate sheath	`float`	Sheath flow rate [µL/s]
identifier	`str`	Unique setup identifier
medium	`str`	Medium used
module composition	`str`	Comma-separated list of modules used
software version	`str`	Acquisition software with version
temperature	`float`	Mean chip temperature [°C]

Example: date and time of a measurement

In [1]: import dclab

In [2]: ds = dclab.new_dataset("data/example.rtdc")

In [3]: ds.config["experiment"]["date"], ds.config["experiment"]["time"]
Out[3]: ('2017-07-16', '19:01:36')
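
Since date and time are stored as strings in the formats listed in the table above (‘YYYY-MM-DD’ and ‘HH:MM:SS[.S]’), they can be combined into a single `datetime` object with the standard library. This is a minimal sketch; the helper name `parse_measurement_start` is hypothetical.

```python
from datetime import datetime


def parse_measurement_start(date, time):
    """Combine the "date" and "time" metadata strings into a datetime."""
    fmt = "%Y-%m-%d %H:%M:%S"
    if "." in time:
        # The fractional part of the seconds is optional
        fmt += ".%f"
    return datetime.strptime(f"{date} {time}", fmt)


# Values taken from the example output above
start = parse_measurement_start("2017-07-16", "19:01:36")
```

With a loaded dataset, the two arguments would be the values of `ds.config["experiment"]["date"]` and `ds.config["experiment"]["time"]`.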

Analysis metadata

In addition to the inherent metadata defined during data acquisition, dclab also supports metadata that are relevant for certain data analysis pipelines, such as Young’s modulus computation or fluorescence crosstalk correction.

calculation	parsed	description [units]
crosstalk fl12	`float`	Fluorescence crosstalk, channel 1 to 2
crosstalk fl13	`float`	Fluorescence crosstalk, channel 1 to 3
crosstalk fl21	`float`	Fluorescence crosstalk, channel 2 to 1
crosstalk fl23	`float`	Fluorescence crosstalk, channel 2 to 3
crosstalk fl31	`float`	Fluorescence crosstalk, channel 3 to 1
crosstalk fl32	`float`	Fluorescence crosstalk, channel 3 to 2
emodulus lut	`str`	Look-up table identifier
emodulus medium	`str`	Medium used (e.g. ‘0.49% MC-PBS’)
emodulus temperature	`float`	Chip temperature [°C]
emodulus viscosity	`float`	Viscosity [Pa*s] if ‘medium’ unknown
emodulus viscosity model	`str`	Viscosity model for known media
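
To show what the crosstalk keys express, the following sketch corrects two fluorescence channels under the assumption of a linear spill-over model, in which “crosstalk fl12” is the fraction of the channel 1 signal that appears in channel 2 and “crosstalk fl21” the fraction of channel 2 that appears in channel 1. The model, the function name `correct_two_channels`, and the numbers are assumptions for illustration; this is not dclab's implementation.

```python
def correct_two_channels(fl1_measured, fl2_measured, ct12, ct21):
    """Recover the true signals of two channels from the measured ones.

    Assumed linear model:
        fl1_measured = fl1_true + ct21 * fl2_true
        fl2_measured = fl2_true + ct12 * fl1_true
    Solving this 2x2 system yields the expressions below.
    """
    det = 1.0 - ct12 * ct21
    fl1_true = (fl1_measured - ct21 * fl2_measured) / det
    fl2_true = (fl2_measured - ct12 * fl1_measured) / det
    return fl1_true, fl2_true


# Made-up true signals of 100 and 50 with 10 % and 20 % spill-over
fl1_meas = 100.0 + 0.2 * 50.0   # 110.0
fl2_meas = 50.0 + 0.1 * 100.0   # 60.0
fl1_corr, fl2_corr = correct_two_channels(fl1_meas, fl2_meas, ct12=0.1, ct21=0.2)
```

For three channels, the same idea applies with a 3x3 system that uses all six crosstalk values.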

User-defined metadata

In addition to the registered metadata keys listed above, you may also define custom metadata in the “user” section. This section will be saved alongside the other metadata when a dataset is exported as an .rtdc (HDF5) file.

Note

It is recommended to use the following data types for the value of each key: str, bool, float and int. Other data types may not render nicely in ShapeOut2 or DCOR.

To edit the “user” section in dclab, simply modify the config property of a loaded dataset. The changes made are not written to the underlying file.

Example: Setting custom “user” metadata in dclab

In [4]: import dclab

In [5]: ds = dclab.new_dataset("data/example.rtdc")

In [6]: my_metadata = {"inlet": True, "n_channels": 4}

In [7]: ds.config["user"] = my_metadata

In [8]: other_metadata = {"outlet": False, "RBC": True}

# we can also add metadata with the `update` method
In [9]: ds.config["user"].update(other_metadata)

# or
In [10]: ds.config.update({"user": other_metadata})

In [11]: print(ds.config["user"])
{'inlet': True, 'n_channels': 4, 'outlet': False, 'RBC': True}

# we can clear the "user" section like so:
In [12]: ds.config["user"].clear()
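
Following the note on recommended data types, a small check can flag values that may not render nicely before they are stored. This is a hypothetical helper written in plain Python; it is not part of dclab.

```python
# Data types recommended in the note above
RECOMMENDED_TYPES = (str, bool, float, int)


def find_unrecommended(metadata):
    """Return {key: type name} for all values that are not str, bool, float or int."""
    return {
        key: type(value).__name__
        for key, value in metadata.items()
        if not isinstance(value, RECOMMENDED_TYPES)
    }


good = find_unrecommended({"inlet": True, "n_channels": 4})
bad = find_unrecommended({"project": "strangelove", "roi": [10, 20]})
```

An empty result means that all values use one of the recommended types.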

If you are implementing a custom data acquisition pipeline, you may alternatively add user-defined metadata permanently to an .rtdc file in a post-measurement step, as shown below.

Example: Setting custom “user” metadata permanently

import h5py
# open in append mode ("a"); h5py >= 3.0 opens files read-only by default
with h5py.File("/path/to/your/dataset.rtdc", "a") as h5:
    h5.attrs["user:inlet"] = True
    h5.attrs["user:n_channels"] = 4
    h5.attrs["user:outlet"] = False
    h5.attrs["user:RBC"] = True
    h5.attrs["user:project"] = "strangelove"
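
The attribute names in the example above follow a “section:key” pattern. The sketch below converts between a nested metadata dictionary and such flat attribute names in plain Python. The helper names `to_attrs` and `from_attrs` are hypothetical and only illustrate the naming pattern.

```python
def to_attrs(section, metadata):
    """Flatten one metadata section into "section:key" attribute names."""
    return {f"{section}:{key}": value for key, value in metadata.items()}


def from_attrs(attrs):
    """Group flat "section:key" attributes into a nested dictionary."""
    config = {}
    for name, value in attrs.items():
        section, sep, key = name.partition(":")
        if not sep:
            continue  # not a "section:key" attribute
        config.setdefault(section, {})[key] = value
    return config


flat = to_attrs("user", {"inlet": True, "n_channels": 4})
nested = from_attrs(flat)
```

With h5py, each entry of `flat` could be assigned to `h5.attrs` of a file opened in a writable mode.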

User-defined metadata can also be used with user-defined plugin features. This allows you to design plugin features which utilize your pipeline-specific metadata.

Basins

Since dclab 0.51.0, you can define so-called basins in .rtdc files. Basins are files or remote locations that contain additional features that are not part of the file you opened initially.

For instance, you might want to compute additional features for a measurement without editing the original file data/example.rtdc, while still having access to the features of the original file when working with the new file test.rtdc.

In [13]: import dclab

# Create the smaller file with the basin defined.
In [14]: with dclab.new_dataset("data/example.rtdc") as dso, dclab.RTDCWriter("test.rtdc", mode="reset") as hw:
   ....:    # copy metadata
   ....:    meta = dict(dso.config)
   ....:    meta.pop("filtering")
   ....:    hw.store_metadata(meta)
   ....:    # store a feature from the original dataset
   ....:    hw.store_feature("deform", dso["deform"])
   ....:    # store a user-defined feature
   ....:    hw.store_feature("userdef1", 2.5*dso["deform"])
   ....:    # store the basin information
   ....:    hw.store_basin(basin_name="mytest",
   ....:                   basin_type="file",
   ....:                   basin_format="hdf5",
   ....:                   basin_locs=["data/example.rtdc"])
   ....: 

In [15]: ds2 = dclab.new_dataset("test.rtdc")

# the basin in "test.rtdc" gives you access to features stored in "data/example.rtdc"
In [16]: print(ds2.features)
['area_cvx', 'area_msd', 'area_ratio', 'area_um', 'aspect', 'bright_avg', 'bright_sd', 'circ', 'circ_times_area', 'deform', 'frame', 'index', 'inert_ratio_cvx', 'inert_ratio_raw', 'nevents', 'pos_x', 'pos_y', 'size_x', 'size_y', 'time', 'userdef1']
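
The lookup behavior shown above can be pictured as a fallback: a feature that is missing in the opened file is looked up in its basins. The following toy class sketches this idea in plain Python. It is not dclab's implementation, and the class name `ToyBasinDataset` is hypothetical.

```python
class ToyBasinDataset:
    """Toy dataset that falls back to basin datasets for missing features."""

    def __init__(self, features, basins=()):
        self._features = dict(features)
        self._basins = list(basins)

    @property
    def features(self):
        """Sorted names of all features, including those from basins."""
        names = set(self._features)
        for basin in self._basins:
            names.update(basin.features)
        return sorted(names)

    def __getitem__(self, name):
        # Features stored in this dataset take precedence
        if name in self._features:
            return self._features[name]
        # Otherwise, ask each basin in turn
        for basin in self._basins:
            if name in basin.features:
                return basin[name]
        raise KeyError(name)


original = ToyBasinDataset({"area_um": [40.0, 60.0], "deform": [0.01, 0.03]})
derived = ToyBasinDataset(
    {"deform": [0.01, 0.03], "userdef1": [0.025, 0.075]},
    basins=[original],
)
available = derived.features
```

The derived dataset stores only “deform” and “userdef1”, yet “area_um” remains accessible through the basin.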

For more information, please take a look at the documentation of Basin and its subclasses.