A feature is a measurement parameter of an RT-DC measurement. For
instance, the feature “index” enumerates all recorded events, and the
feature “deform” contains the deformation values of all events.
There are scalar features, i.e. features that assign a single number
to an event, and non-scalar features, such as “image” and “contour”.
All features in a dataset are exposed as read-only to the user.
The following features are supported by dclab:
In addition to these scalar features, it is possible to define
a large number of features dedicated to machine-learning, the
“ml_score_???” features: The “?” can be a digit or a lower-case
letter of the alphabet, e.g. “ml_score_rbc” or “ml_score_3a3”.
If “ml_score_???” features are defined, then the ancillary
“ml_class” feature, which identifies the most-probable feature
for each event, becomes available.
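The naming rule above can be checked with a short regular expression. The following is a minimal sketch, not dclab's own validation code: the helper names `is_ml_score_name` and `most_probable` are hypothetical, and "most probable" is assumed to mean "highest score". How dclab encodes the result in “ml_class” is not shown here.

```python
import re

# Assumed pattern: "ml_score_" followed by exactly three characters,
# each a digit or a lower-case letter, as described in the text.
ML_SCORE_PATTERN = re.compile(r"ml_score_[a-z0-9]{3}")


def is_ml_score_name(name):
    """Return True if `name` follows the "ml_score_???" convention."""
    return ML_SCORE_PATTERN.fullmatch(name) is not None


def most_probable(scores):
    """Return the ml_score feature name with the highest score.

    `scores` maps feature names to the scores of a single event
    (hypothetical helper, not part of dclab).
    """
    valid = {k: v for k, v in scores.items() if is_ml_score_name(k)}
    return max(valid, key=valid.get)


# Made-up scores for one event
event_scores = {"ml_score_rbc": 0.2, "ml_score_3a3": 0.7}
best = most_probable(event_scores)
```

Names with upper-case letters or with fewer or more than three trailing characters are rejected by this check.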
Not all features available in dclab are recorded online during the
acquisition of the experimental dataset. Some of the features are
computed offline by dclab, such as “volume”, “emodulus”, or
scores from imported machine learning models (“ml_score_xxx”). These
ancillary features are computed on-the-fly and are made available
seamlessly through the same interface.
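The idea of computing a feature on first access, while exposing it through the same interface as recorded features, can be illustrated with a toy class. This is only a conceptual sketch: `LazyFeatures` and `compute_deform` are hypothetical names and do not reflect how dclab is implemented. The example assumes the usual RT-DC definition of deformation as one minus circularity.

```python
class LazyFeatures:
    """Toy stand-in for a dataset (hypothetical, not dclab's class)."""

    def __init__(self, recorded, recipes):
        self._recorded = dict(recorded)  # features stored in the file
        self._recipes = dict(recipes)    # name -> function(dataset)
        self._cache = {}                 # ancillary features computed so far

    def __contains__(self, name):
        return name in self._recorded or name in self._recipes

    def __getitem__(self, name):
        if name in self._recorded:
            return self._recorded[name]
        if name not in self._cache:
            # Compute the ancillary feature on first access only
            self._cache[name] = self._recipes[name](self)
        return self._cache[name]


def compute_deform(ds):
    # Assumption: deformation is defined as 1 - circularity
    return [1 - c for c in ds["circ"]]


# Made-up circularity values (not real measurement data)
ds = LazyFeatures(recorded={"circ": [1.0, 0.75, 0.5]},
                  recipes={"deform": compute_deform})
deform = ds["deform"]
```

The caller uses `ds["deform"]` in the same way as `ds["circ"]` and does not need to know which of the two was recorded and which was computed.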
A filter can be used to gate events using features. There are
min/max filters and 2D polygon filters.
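To make the two filter types concrete, the sketch below gates a handful of made-up events with a min/max filter on “deform” and a polygon filter in the (“area_um”, “deform”) plane. It is written in plain Python for illustration and is not dclab's filtering code; the function names are hypothetical and the polygon test uses the standard ray-casting method.

```python
def minmax_mask(values, vmin, vmax):
    """True for events whose value lies within [vmin, vmax]."""
    return [vmin <= v <= vmax for v in values]


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside `polygon`?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up example events (not real measurement data)
area_um = [50.0, 120.0, 200.0, 160.0]
deform = [0.02, 0.10, 0.30, 0.05]

box = minmax_mask(deform, 0.01, 0.15)
polygon = [(40.0, 0.0), (150.0, 0.0), (150.0, 0.2), (40.0, 0.2)]
poly = [point_in_polygon(a, d, polygon) for a, d in zip(area_um, deform)]

# An event passes the gate only if every filter accepts it
keep = [b and p for b, p in zip(box, poly)]
```

The last event passes the min/max filter but lies outside the polygon, so it is excluded from the combined gate.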
The following table defines the main filtering parameters:
In addition to inherent (defined during data acquisition) metadata,
dclab also supports additional metadata that are relevant for certain
data analysis pipelines, such as Young’s modulus computation or
fluorescence crosstalk correction.
In addition to the registered metadata keys listed above,
you may also define custom metadata in the “user” section.
This section will be saved alongside the other metadata when
a dataset is exported as an .rtdc (HDF5) file.
Note
It is recommended to use the following data types for the value of
each key: str, bool, float and int. Other data types may
not render nicely in ShapeOut2 or DCOR.
To edit the “user” section in dclab, simply modify the config
property of a loaded dataset. The changes made are not written
to the underlying file.
Example: Setting custom “user” metadata in dclab
In [4]: import dclab

In [5]: ds = dclab.new_dataset("data/example.rtdc")

In [6]: my_metadata = {"inlet": True, "n_channels": 4}

In [7]: ds.config["user"] = my_metadata

In [8]: other_metadata = {"outlet": False, "RBC": True}

# we can also add metadata with the `update` method
In [9]: ds.config["user"].update(other_metadata)

# or
In [10]: ds.config.update({"user": other_metadata})

In [11]: print(ds.config["user"])
{'inlet': True, 'n_channels': 4, 'outlet': False, 'RBC': True}

# we can clear the "user" section like so:
In [12]: ds.config["user"].clear()
If you are implementing a custom data acquisition pipeline, you may
alternatively add user-defined metadata permanently to an .rtdc file
in a post-measurement step.
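One possible post-measurement step is sketched below with h5py. It assumes that h5py is installed and that metadata in an .rtdc (HDF5) file are stored as root attributes named "section:key", so that a "user" entry becomes "user:<key>". The helper `add_user_metadata` and the file name `demo.rtdc` are hypothetical; the demonstration writes to a throw-away file rather than to a real measurement.

```python
import os
import tempfile

import h5py  # third-party dependency, assumed to be installed


def add_user_metadata(path, metadata):
    """Write each key of `metadata` as a "user:<key>" root attribute."""
    # Mode "a" opens an existing file for reading and writing
    # (and creates the file if it does not exist yet).
    with h5py.File(path, "a") as h5:
        for key, value in metadata.items():
            h5.attrs[f"user:{key}"] = value


# Demonstration on a throw-away HDF5 file (stand-in for a real .rtdc file)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.rtdc")
add_user_metadata(demo_path, {"inlet": True, "n_channels": 4})

# Read the attributes back to verify that they were stored
with h5py.File(demo_path, "r") as h5:
    stored = {key: h5.attrs[key] for key in h5.attrs}
```

Unlike changes made via the config property of a loaded dataset, attributes written this way are stored in the file itself.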
User-defined metadata can also be used with user-defined
plugin features. This allows you
to design plugin features which utilize your pipeline-specific metadata.
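As an illustration, the computation part of such a plugin feature might look like the sketch below: a function that receives the dataset, reads a value from the “user” section and returns a dictionary with the computed feature. The feature name "deform_per_channel", the metadata key "n_channels" and the `FakeDataset` stand-in are hypothetical. Registering the function as a plugin feature is not shown; see the plugin feature documentation for that step.

```python
def compute_deform_per_channel(rtdc_ds):
    """Divide "deform" by the user-defined number of channels.

    Both the feature name "deform_per_channel" and the metadata key
    "n_channels" are hypothetical and only serve as an illustration.
    """
    n_channels = rtdc_ds.config["user"]["n_channels"]
    # A list comprehension keeps the sketch free of dependencies;
    # it iterates over plain lists and arrays alike.
    values = [d / n_channels for d in rtdc_ds["deform"]]
    return {"deform_per_channel": values}


class FakeDataset(dict):
    """Minimal stand-in for a loaded dataset, used for testing only."""

    def __init__(self, features, config):
        super().__init__(features)
        self.config = config


# Made-up values (not real measurement data)
fake = FakeDataset(features={"deform": [0.04, 0.08]},
                   config={"user": {"n_channels": 4}})
result = compute_deform_per_channel(fake)
```

Because the function only relies on item access and the config property, it can be tested with a stand-in object before it is used with real datasets.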
Since dclab 0.51.0, you can define so-called basins in .rtdc files.
Basins are files or remote locations that contain additional
features that are not part of the file you opened
initially.
For instance, you might want to compute additional features for a
measurement without editing the original file data/example.rtdc,
while still having access to the features of the original file when
working with the new file test.rtdc. The following example stores a
basin in test.rtdc that points to the original file:
In [13]: import dclab

# Create the smaller file with the basin defined.
In [14]: with dclab.new_dataset("data/example.rtdc") as dso, dclab.RTDCWriter("test.rtdc", mode="reset") as hw:
   ....:     # copy metadata
   ....:     meta = dict(dso.config)
   ....:     meta.pop("filtering")
   ....:     hw.store_metadata(meta)
   ....:     # store a feature from the original dataset
   ....:     hw.store_feature("deform", dso["deform"])
   ....:     # store a user-defined feature
   ....:     hw.store_feature("userdef1", 2.5*dso["deform"])
   ....:     # store the basin information
   ....:     hw.store_basin(basin_name="mytest",
   ....:                    basin_type="file",
   ....:                    basin_format="hdf5",
   ....:                    basin_locs=["data/example.rtdc"])
   ....:

In [15]: ds2 = dclab.new_dataset("test.rtdc")

# the basin in "test.rtdc" gives you access to features stored in "data/example.rtdc"
In [16]: print(ds2.features)
['area_cvx', 'area_msd', 'area_ratio', 'area_um', 'aspect', 'bright_avg', 'bright_sd', 'circ', 'circ_times_area', 'deform', 'frame', 'index', 'inert_ratio_cvx', 'inert_ratio_raw', 'nevents', 'pos_x', 'pos_y', 'size_x', 'size_y', 'time', 'userdef1']
For more information, please take a look at the documentation of Basin
and its subclasses.