RT-DC datasets

Knowing and understanding the RT-DC dataset classes is an important prerequisite for working with dclab. They are all derived from RTDCBase, which gives access to features through a dictionary-like interface, facilitates data export and filtering, and comes with several convenience methods that are useful for data visualization. RT-DC datasets can be based on a data file format (RTDC_TDMS and RTDC_HDF5), created from user-defined dictionaries (RTDC_Dict), or derived from other RT-DC datasets (RTDC_Hierarchy).

Loading data from disk

The convenience function dclab.new_dataset() takes care of determining the data file format (tdms or hdf5) and returns an instance of the corresponding derived class.

In [1]: import dclab

In [2]: ds = dclab.new_dataset("data/example.rtdc")

In [3]: ds.__class__.__name__
Out[3]: 'RTDC_HDF5'
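
As a rough illustration of this kind of dispatch, the following sketch maps a file suffix to the name of a dataset class. It is not dclab's implementation: the helper guess_dataset_class and the mapping SUFFIX_TO_CLASS are hypothetical, and the assumption that the format is decided by the file suffix alone (".rtdc" for hdf5, ".tdms" for tdms) is made only for this example.

```python
import pathlib

# Hypothetical suffix-to-class mapping, assumed for illustration only.
SUFFIX_TO_CLASS = {
    ".rtdc": "RTDC_HDF5",
    ".tdms": "RTDC_TDMS",
}


def guess_dataset_class(path):
    """Return the name of the dataset class assumed to handle `path`."""
    suffix = pathlib.Path(path).suffix.lower()
    try:
        return SUFFIX_TO_CLASS[suffix]
    except KeyError:
        # Unknown suffixes are rejected instead of guessed.
        raise ValueError(f"Unsupported file format: '{suffix}'") from None


print(guess_dataset_class("data/example.rtdc"))  # RTDC_HDF5
```

In real scripts, dclab.new_dataset() should be called directly; the sketch only makes explicit why the caller does not need to name the class.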

Working with other data

Data that are not stored in one of these file formats can be loaded into dclab from a dictionary that maps feature names to arrays.

In [4]: import numpy as np
   ...: data = dict(deform=np.random.rand(100),
   ...:             area_um=np.random.rand(100))
   ...: 

In [5]: ds_dict = dclab.new_dataset(data)

In [6]: ds_dict.__class__.__name__
Out[6]: 'RTDC_Dict'

Creating hierarchies

When applying filtering operations, it is sometimes helpful to use hierarchies to keep track of the individual filtering steps. A hierarchy child contains only those events that pass the filters of its hierarchy parent.

In [7]: child = dclab.new_dataset(ds)

In [8]: grandchild = dclab.new_dataset(child)

In [9]: ds.config["filtering"]["deform max"] = .15

In [10]: child.config["filtering"]["area_um max"] = 80

In [11]: grandchild.apply_filter()

In [12]: len(ds), len(child), len(grandchild)
Out[12]: (5000, 4937, 4782)

In [13]: ds.filter.all.sum(), child.filter.all.sum(), grandchild.filter.all.sum()
Out[13]: (4937, 4782, 4782)

Note that calling grandchild.apply_filter() automatically calls child.apply_filter() and ds.apply_filter(). Also note that, as expected, the size of each hierarchy child is identical to the sum of the boolean filtering array of its hierarchy parent.
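
The relationship between a hierarchy parent and its child can be made explicit with a toy model in plain Python. This is only a sketch of the concept, not dclab's implementation: the class ToyDataset, its attribute max_value (a single upper-limit filter), and filter_all (a stand-in for the boolean filtering array) are hypothetical names introduced for this example.

```python
class ToyDataset:
    """Toy stand-in for an RT-DC dataset (not dclab's implementation)."""

    def __init__(self, values=None, parent=None):
        self.parent = parent
        self._values = list(values) if values is not None else []
        self.max_value = None  # hypothetical upper-limit filter
        self.filter_all = []   # one boolean per event
        self.apply_filter()

    def _events(self):
        # A root dataset owns its events; a child sees only the events
        # that pass the filter of its parent.
        if self.parent is None:
            return self._values
        parent = self.parent
        return [v for v, keep in zip(parent._events(), parent.filter_all)
                if keep]

    def apply_filter(self):
        # Update the ancestors first so that this dataset sees the
        # current state of its parent's filter.
        if self.parent is not None:
            self.parent.apply_filter()
        self.filter_all = [self.max_value is None or v <= self.max_value
                           for v in self._events()]

    def __len__(self):
        return len(self._events())


ds = ToyDataset(values=range(10))
child = ToyDataset(parent=ds)
grandchild = ToyDataset(parent=child)

ds.max_value = 7     # keeps the values 0..7
child.max_value = 4  # keeps the values 0..4
grandchild.apply_filter()

print(len(ds), len(child), len(grandchild))  # 10 8 5
print(sum(ds.filter_all), sum(child.filter_all),
      sum(grandchild.filter_all))  # 8 5 5
```

As in the session above, the length of each child equals the number of True entries in the filter of its parent, and a single apply_filter() call on the youngest dataset is enough to refresh the whole chain.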

Scripting goodies

Here are a few attributes and operations that are useful when scripting with dclab.

# unique identifier of the RTDCBase instance (not reproducible)
In [14]: ds.identifier
Out[14]: 'mm-hdf5_d7013ad'

# reproducible hash of the dataset
In [15]: ds.hash
Out[15]: '8ff19f702a236cbf91e13667e144e722'

# dataset format
In [16]: ds.format
Out[16]: 'hdf5'

# available features
In [17]: ds.features
Out[17]: 
['area_cvx',
 'area_msd',
 'area_ratio',
 'area_um',
 'aspect',
 'bright_avg',
 'bright_sd',
 'circ',
 'deform',
 'frame',
 'index',
 'inert_ratio_cvx',
 'inert_ratio_raw',
 'nevents',
 'pos_x',
 'pos_y',
 'size_x',
 'size_y',
 'time']

# test feature availability (success)
In [18]: "area_um" in ds
Out[18]: True

# test feature availability (failure)
In [19]: "image" in ds
Out[19]: False

# accessing a feature and computing its mean
In [20]: ds["area_um"].mean()
Out[20]: 49.728645

# accessing the measurement configuration
In [21]: ds.config.keys()
Out[21]: dict_keys(['online_contour', 'experiment', 'imaging', 'filtering', 'setup'])

In [22]: ds.config["experiment"]
Out[22]: 
{'date': '2017-07-16',
 'event count': 5000,
 'run index': 1,
 'sample': 'docs-data',
 'time': '19:01:36'}

# determine the identifier of the hierarchy parent
In [23]: child.config["filtering"]["hierarchy parent"]
Out[23]: 'mm-hdf5_d7013ad'