RT-DC datasets
Knowing and understanding the RT-DC dataset classes is an important
prerequisite when working with dclab. They are all derived from RTDCBase,
which gives access to features through a dictionary-like interface,
facilitates data export and filtering, and comes with several convenience
methods that are useful for data visualization.
RT-DC datasets can be based on a data file format (RTDC_TDMS and RTDC_HDF5),
created from user-defined dictionaries (RTDC_Dict), or derived from other
RT-DC datasets (RTDC_Hierarchy).
Basic usage
The convenience function dclab.new_dataset() determines the data file format
(tdms or hdf5) and returns an instance of the corresponding derived class.
In [1]: import dclab
In [2]: ds = dclab.new_dataset("data/example.rtdc")
In [3]: ds.__class__.__name__
Out[3]: 'RTDC_HDF5'
Working with other data
It is also possible to load other data into dclab from a dictionary.
# `np` refers to NumPy (`import numpy as np`)
In [4]: data = dict(deform=np.random.rand(100),
...: area_um=np.random.rand(100))
...:
In [5]: ds_dict = dclab.new_dataset(data)
In [6]: ds_dict.__class__.__name__
Out[6]: 'RTDC_Dict'
Using filters
Filters are used to mask unwanted events, such as debris or doublets, in a dataset.
# Restrict the deformation to a maximum of 0.15
In [7]: ds.config["filtering"]["deform max"] = .15
# Manually excluding events using array indices is also possible:
# `ds.filter.manual` is a 1D boolean array of size `len(ds)`
# where `False` values mean that the events are excluded.
In [8]: ds.filter.manual[[0, 400, 345, 1000]] = False
In [9]: ds.apply_filter()
# The boolean array `ds.filter.all` represents the applied filter
# and can be used for indexing.
In [10]: ds["deform"].mean(), ds["deform"][ds.filter.all].mean()
Out[10]: (0.0287258, 0.026486598)
Note that ds.apply_filter() must be called; otherwise ds.filter.all will not be updated.
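The way several filters end up in one boolean array can be illustrated with plain NumPy. The following is only a toy sketch, not dclab's implementation: the feature values are made up, the variable names `box`, `manual`, and `combined` are hypothetical, and treating the upper limit as inclusive is an assumption.

```python
import numpy as np

# Made-up deformation values standing in for ds["deform"].
deform = np.array([0.05, 0.20, 0.10, 0.30, 0.12])

# Box filter: keep events whose deformation does not exceed 0.15
# (whether the limit itself is included is an assumption here).
box = deform <= 0.15

# Manual filter: every event starts out included (True);
# individual events are excluded by setting their index to False.
manual = np.ones(deform.size, dtype=bool)
manual[[0]] = False

# An event is kept only if every individual filter keeps it.
combined = box & manual

print(combined.tolist())        # [False, False, True, False, True]
print(deform[combined].size)    # 2
```

Indexing a feature with the combined array returns only the events that passed, which is why the filtered and unfiltered means in the session above differ.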
Creating hierarchies
When applying filtering operations, it is sometimes helpful to use hierarchies for keeping track of the individual filtering steps.
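Conceptually, each level of a hierarchy contains only the events that pass the filter of the level above it. The following toy sketch mimics this idea with plain NumPy arrays. It is not how dclab implements hierarchies; the values and variable names are made up for illustration.

```python
import numpy as np

# Made-up feature values for six events.
area_um = np.array([40.0, 95.0, 60.0, 120.0, 75.0, 85.0])
deform = np.array([0.05, 0.02, 0.20, 0.01, 0.10, 0.03])

# Top level: filter on area.
parent_mask = area_um <= 80

# The child only sees events that passed the parent's filter.
child_deform = deform[parent_mask]

# Second level: filter the child on deformation.
child_mask = child_deform <= 0.15

# The grandchild only sees events that passed the child's filter.
grandchild_deform = child_deform[child_mask]

print(len(deform), len(child_deform), len(grandchild_deform))  # 6 3 2
```

In this sketch the length of each level equals the number of `True` entries in the mask of the level above it, so every filtering step remains visible as its own array.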
In [11]: child = dclab.new_dataset(ds)
In [12]: child.config["filtering"]["area_um max"] = 80
In [13]: grandchild = dclab.new_dataset(child)
In [14]: grandchild.apply_filter()
In [15]: len(ds), len(child), len(grandchild)
Out[15]: (5000, 4933, 4778)
In [16]: ds.filter.all.sum(), child.filter.all.sum(), grandchild.filter.all.sum()