RT-DC datasets

Understanding the RT-DC dataset classes is an important prerequisite for working with dclab. They are all derived from RTDCBase, which gives access to features through a dictionary-like interface, facilitates data export and filtering, and comes with several convenience methods that are useful for data visualization. RT-DC datasets can be based on a data file format (RTDC_TDMS and RTDC_HDF5), created from user-defined dictionaries (RTDC_Dict), or derived from other RT-DC datasets (RTDC_Hierarchy).

Basic usage

The convenience function dclab.new_dataset() determines the data file format (tdms or hdf5) and returns an instance of the corresponding derived class.

In [1]: import dclab

In [2]: ds = dclab.new_dataset("data/example.rtdc")

In [3]: ds.__class__.__name__
Out[3]: 'RTDC_HDF5'

Working with other data

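It is also possible to load other data into dclab from a dictionary that maps feature names to arrays. The example below assumes that NumPy has been imported beforehand (import numpy as np).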

In [4]: data = dict(deform=np.random.rand(100),
   ...:             area_um=np.random.rand(100))

In [5]: ds_dict = dclab.new_dataset(data)

In [6]: ds_dict.__class__.__name__
Out[6]: 'RTDC_Dict'

Creating hierarchies

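When applying filtering operations, it is sometimes helpful to use hierarchies to keep track of the individual filtering steps. Passing an existing dataset to dclab.new_dataset() creates a hierarchy child that contains only the events that pass the filters of its parent.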

In [7]: child = dclab.new_dataset(ds)

In [8]: grandchild = dclab.new_dataset(child)

In [9]: ds.config["filtering"]["deform max"] = .15

In [10]: child.config["filtering"]["area_um max"] = 80

In [11]: grandchild.apply_filter()

In [12]: len(ds), len(child), len(grandchild)
Out[12]: (5000, 4937, 4782)

In [13]: ds.filter.all.sum(), child.filter.all.sum(), grandchild.filter.all.sum()
Out[13]: (4937, 4782, 4782)

Note that calling grandchild.apply_filter() automatically calls child.apply_filter() and ds.apply_filter(). Also note that, as expected, the size of each hierarchy child is identical to the sum of the boolean filtering array of its hierarchy parent.
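The relationship between a parent's filter array and the size of its child can be illustrated without dclab. The following is a minimal pure-Python sketch, not dclab's implementation; the helper make_child and all event values are hypothetical and chosen only for illustration.

```python
# Pure-Python illustration of hierarchy filtering; this is NOT dclab code.
# The helper `make_child` and the event values are hypothetical.

def make_child(events, mask):
    """Return only the events whose mask entry is True."""
    return [ev for ev, keep in zip(events, mask) if keep]

# Root dataset: each event is a dict of feature values.
root = [
    {"deform": 0.05, "area_um": 60.0},
    {"deform": 0.20, "area_um": 70.0},  # removed by the root filter
    {"deform": 0.10, "area_um": 95.0},  # removed by the child filter
    {"deform": 0.12, "area_um": 40.0},
]

# Filter on the root (comparable to "deform max" = 0.15).
root_mask = [ev["deform"] <= 0.15 for ev in root]
child = make_child(root, root_mask)

# Filter on the child (comparable to "area_um max" = 80).
child_mask = [ev["area_um"] <= 80 for ev in child]
grandchild = make_child(child, child_mask)

sizes = (len(root), len(child), len(grandchild))
passed = (sum(root_mask), sum(child_mask))
print(sizes, passed)
```

In this sketch, each child holds exactly as many events as there are True entries in the mask of its parent, which mirrors the relation between Out[12] and Out[13] above.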

Scripting goodies

Here are a few useful functionalities for scripting with dclab.

# unique identifier of the RTDCBase instance (not reproducible)
In [14]: ds.identifier
Out[14]: 'mm-hdf5_1ba1619'

# reproducible hash of the dataset
In [15]: ds.hash
Out[15]: '8ff19f702a236cbf91e13667e144e722'

# dataset format
In [16]: ds.format
Out[16]: 'hdf5'

# available features
In [17]: ds.features

# test feature availability (success)
In [18]: "area_um" in ds
Out[18]: True

# test feature availability (failure)
In [19]: "image" in ds
Out[19]: False

# accessing a feature and computing its mean
In [20]: ds["area_um"].mean()
Out[20]: 49.728645

# accessing the measurement configuration
In [21]: ds.config.keys()
Out[21]: dict_keys(['filtering', 'experiment', 'imaging', 'online_contour', 'setup'])

In [22]: ds.config["experiment"]
{'date': '2017-07-16',
 'event count': 5000,
 'run index': 1,
 'sample': 'docs-data',
 'time': '19:01:36'}

# determine the identifier of the hierarchy parent
In [23]: child.config["filtering"]["hierarchy parent"]
Out[23]: 'mm-hdf5_1ba1619'
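In scripts, the availability test and the feature access shown above are often combined. The following sketch relies only on the two operations demonstrated here (feature in ds and ds[feature]); the helper feature_mean is hypothetical and not part of dclab, and a plain dictionary with made-up values stands in for a dataset.

```python
from statistics import fmean


def feature_mean(dataset, feature, default=None):
    """Return the mean of `feature`, or `default` if it is not available.

    Hypothetical helper: it only assumes that `dataset` supports
    `feature in dataset` and `dataset[feature]`.
    """
    if feature not in dataset:
        return default
    return fmean(dataset[feature])


# Stand-in for a dataset: a plain dict with made-up values.
fake_ds = {"area_um": [40.0, 50.0, 60.0], "deform": [0.01, 0.02, 0.03]}

area_mean = feature_mean(fake_ds, "area_um")   # feature is present
image_mean = feature_mean(fake_ds, "image")    # feature is missing
print(area_mean, image_mean)
```

Returning a default instead of raising an exception is a design choice that suits batch scripts, where some measurements may lack optional features such as image.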


Statistics

The statistics module comes with a predefined set of methods to compute simple feature statistics.

In [24]: import dclab

In [25]: ds = dclab.new_dataset("data/example.rtdc")

In [26]: stats = dclab.statistics.get_statistics(ds,
   ....:                                         features=["deform", "aspect"],
   ....:                                         methods=["Mode", "Mean", "SD"])

In [27]: dict(zip(*stats))
{'Mode Deformation': 0.016635261,
 'Mean Deformation': 0.0287258,
 'SD Deformation': 0.028740086,
 'Mode Aspect ratio of bounding box': 1.1091422,
 'Mean Aspect ratio of bounding box': 1.2719607,
 'SD Aspect ratio of bounding box': 0.25233853}
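The expression dict(zip(*stats)) suggests that get_statistics returns a pair of parallel sequences, one with the names and one with the values. The sketch below illustrates this unpacking idiom under that assumption; the tuple stats_pair is a stand-in whose values are copied from the output above, not an actual return value of dclab.

```python
# Stand-in for the assumed return value: (names, values) as parallel lists.
names = ["Mode Deformation", "Mean Deformation", "SD Deformation"]
values = [0.016635261, 0.0287258, 0.028740086]
stats_pair = (names, values)

# zip(*stats_pair) is zip(names, values): it pairs names[i] with values[i].
summary = dict(zip(*stats_pair))

for name, value in summary.items():
    print(f"{name:20s} {value:.4f}")
```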

Note that the statistics take into account the applied filters:

In [28]: ds.config["filtering"]["deform max"] = .1

In [29]: ds.apply_filter()

In [30]: stats2 = dclab.statistics.get_statistics(ds,
   ....:                                          features=["deform", "aspect"],
   ....:                                          methods=["Mode", "Mean", "SD"])

In [31]: dict(zip(*stats2))
{'Mode Deformation': 0.017006295,
 'Mean Deformation': 0.02476519,
 'SD Deformation': 0.015638638,
 'Mode Aspect ratio of bounding box': 1.1232222,
 'Mean Aspect ratio of bounding box': 1.2407207,
 'SD Aspect ratio of bounding box': 0.15993708}
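The effect of a filter on the statistics can be reproduced in miniature with the standard library. This is a sketch with made-up values, not dclab code; using the population standard deviation (pstdev) is an assumption made for illustration and may differ from how dclab computes "SD".

```python
from statistics import fmean, pstdev

deform = [0.01, 0.02, 0.03, 0.30]  # made-up values with one outlier
deform_max = 0.1                   # comparable to the "deform max" setting

# Boolean filter array and the events that pass it.
keep = [d <= deform_max for d in deform]
filtered = [d for d, k in zip(deform, keep) if k]

mean_all, sd_all = fmean(deform), pstdev(deform)
mean_flt, sd_flt = fmean(filtered), pstdev(filtered)

print(f"all:      mean={mean_all:.4f} sd={sd_all:.4f}")
print(f"filtered: mean={mean_flt:.4f} sd={sd_flt:.4f}")
```

As in the dclab output above, removing the events with high deformation lowers both the mean and the spread.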

These are the available statistics methods:

In [32]: dclab.statistics.Statistics.available_methods.keys()
Out[32]: dict_keys(['Mean', 'Median', 'Mode', 'SD', 'Events', '%-gated', 'Flow rate'])


Export

The RTDCBase class has the attribute RTDCBase.export, which allows exporting event data to several data file formats. See the export documentation for more information.

In [33]: ds.export.tsv(path="export_example.tsv",
   ....:               features=["area_um", "deform"],
   ....:               filtered=True,
   ....:               override=True)

In [34]: ds.export.hdf5(path="export_example.rtdc",
   ....:                features=["area_um", "aspect", "deform"],
   ....:                filtered=True,
   ....:                override=True)

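Note that data exported as HDF5 files can be loaded with dclab. Because the export above used filtered=True, the exported file contains only the events that passed the filter, so the previously computed statistics are reproduced without applying any filters.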

In [35]: ds2 = dclab.new_dataset("export_example.rtdc")

In [36]: ds2["deform"].mean()
Out[36]: 0.02476519