User-defined plugin features

For specialized applications, the features defined internally in dclab might not be enough to describe certain aspects of your data. Plugin features allow you to define a recipe for computing a new feature. This new feature is then available automatically for every dataset loaded in dclab.

Note

The advantages of plugin features over temporary features are that plugin features are reproducible, shareable, versionable, and generally more transparent. You should only use temporary features if absolutely necessary.

Using plugin feature recipes

If a colleague sent you a plugin feature recipe (a .py file), you just have to load it in dclab to use it.

In [1]: import dclab

In [2]: import numpy as np

# load a plugin feature (makes `circ_times_area` available)
In [3]: dclab.load_plugin_feature("data/example_plugin.py")
Out[3]: [<PlugInFeature 'circ_times_area' (id 70254...) with priority 0 at 0x7f83d85a5460>]

# load some data
In [4]: ds = dclab.new_dataset("data/example.rtdc")

# access the new feature
In [5]: circ_times_area = ds["circ_times_area"]

# do some filtering
In [6]: ds.config["filtering"]["circ_times_area min"] = 23

In [7]: ds.config["filtering"]["circ_times_area max"] = 29

In [8]: ds.apply_filter()

In [9]: print("Removed {} out of {} events!".format(np.sum(~ds.filter.all), len(ds)))
Removed 4828 out of 5000 events!

Please also have a look at the plugin usage example.

Auto-loading multiple plugin feature recipes

If you have several plugins and would like to load them all at once, you can do the following at the beginning of your scripts:

import pathlib

import dclab

for plugin_path in pathlib.Path("my_plugin_directory").rglob("*.py"):
    dclab.load_plugin_feature(plugin_path)
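If you want a reproducible loading order or a record of what was registered, the loop above can be wrapped in a small helper. The following is only a sketch and not part of dclab: the function name `load_all_plugins` and its `loader` argument are hypothetical. The only dclab call it relies on is `dclab.load_plugin_feature`, which returns a list of plugin feature instances (see the session further above).

```python
import pathlib


def load_all_plugins(directory, loader=None):
    """Load every ``*.py`` recipe below `directory` in sorted order.

    `loader` is a hypothetical hook (useful for testing); by default,
    `dclab.load_plugin_feature` is used.
    """
    if loader is None:
        # imported lazily so that the helper can be tried without dclab
        import dclab
        loader = dclab.load_plugin_feature
    loaded = []
    for plugin_path in sorted(pathlib.Path(directory).rglob("*.py")):
        # each call returns a list of plugin feature instances
        loaded.extend(loader(plugin_path))
    return loaded
```

Calling `load_all_plugins("my_plugin_directory")` at the top of a script would then replace the explicit loop.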

Writing a plugin feature recipe

A plugin feature recipe is defined in a Python script (e.g. my_dclab_plugin.py) and contains a function and an info dictionary. The function computes the desired feature (or several features), while the dictionary holds the function itself and any additional (meta-)information about the computed feature (or features). Both “method” (the function) and “feature names” must be included in the info dictionary. Note that many of the items in the dictionary must be lists, even if the recipe defines only a single feature. If a recipe defines multiple features, they must all be computed by one function and described by one info dictionary. Below are three examples of creating and using plugin features.

Note

Plugin features are based on ancillary features (code reference).

Simple plugin feature recipe

In this basic example, the function compute_my_feature() defines the basic feature “circ_times_area”.

def compute_my_feature(rtdc_ds):
    """Compute circularity times area"""
    circ_times_area = rtdc_ds["circ"] * rtdc_ds["area_um"]
    return {"circ_times_area": circ_times_area}


info = {
    "method": compute_my_feature,
    "description": "Compute area times circularity",
    "feature names": ["circ_times_area"],
    "features required": ["circ", "area_um"],
    "version": "0.1.0",
}

Advanced plugin feature recipe

In this example, the function compute_some_new_features() defines two basic features: “circ_per_area” and “circ_times_area”. Notice that both features are computed in one function and that there is only one info dictionary:

"""Exemplary plugin feature

You can import the features defined in this file into dclab
with ``dclab.load_plugin_feature("/path/to/plugin_example.py")``.
"""


def compute_some_new_features(rtdc_ds):
    """The function that does the heavy-lifting"""
    circ_per_area = rtdc_ds["circ"] / rtdc_ds["area_um"]
    circ_times_area = rtdc_ds["circ"] * rtdc_ds["area_um"]
    # returns a dictionary-like object
    return {"circ_per_area": circ_per_area, "circ_times_area": circ_times_area}


info = {
    "method": compute_some_new_features,
    "description": "This plugin will compute some features",
    "long description": "Even longer description that "
                        "can span multiple lines",
    "feature names": ["circ_per_area", "circ_times_area"],
    "feature labels": ["Circularity per Area", "Circularity times Area"],
    "features required": ["circ", "area_um"],
    "config required": [],
    "method check required": lambda x: True,
    "scalar feature": [True, True],
    "version": "0.1.0",
}

This example shows all possible keys of the info dictionary (not all of them are required). Some of the keys are passed on as keyword arguments to the AncillaryFeature class under a different name:

  • features required corresponds to req_features

  • config required corresponds to req_config

  • method check required corresponds to req_func

The scalar feature key takes a list of boolean values, one per feature, that states whether the respective feature is scalar (defaults to True).
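Because it is easy to forget that an item must be a list, it can help to check an info dictionary before sharing a recipe. The following sketch is not part of dclab, and dclab may perform its own validation when loading a recipe. The names `check_info`, `LIST_KEYS`, and `PER_FEATURE_KEYS` are hypothetical, and the set of list-valued keys is an assumption inferred from the examples in this section.

```python
# Assumption: these keys hold lists, as in the examples of this section
LIST_KEYS = ["feature names", "feature labels", "features required",
             "config required", "scalar feature"]
# Assumption: these lists need exactly one entry per feature name
PER_FEATURE_KEYS = ["feature labels", "scalar feature"]


def check_info(info):
    """Return a list of problems found in a plugin `info` dictionary."""
    problems = []
    # "method" and "feature names" must always be present
    for key in ("method", "feature names"):
        if key not in info:
            problems.append(f"missing mandatory key '{key}'")
    if "method" in info and not callable(info["method"]):
        problems.append("'method' must be callable")
    for key in LIST_KEYS:
        if key in info and not isinstance(info[key], list):
            problems.append(f"'{key}' must be a list")
    names = info.get("feature names")
    if isinstance(names, list):
        for key in PER_FEATURE_KEYS:
            value = info.get(key)
            if isinstance(value, list) and len(value) != len(names):
                problems.append(f"'{key}' needs one entry per feature")
    return problems
```

An empty return value means that none of these particular checks failed; it does not guarantee that dclab accepts the recipe.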

Plugin feature recipe with user-defined metadata

In this example, the function compute_area_exponent() defines the basic feature area_exp, which is calculated using user-defined metadata.

def compute_area_exponent(rtdc_ds):
    """Compute area^exp depending on the given user-defined metadata"""
    area_exp = rtdc_ds["area_um"] ** rtdc_ds.config["user"]["exp"]
    return {"area_exp": area_exp}


info = {
    "method": compute_area_exponent,
    "description": "Compute area to the power of exp",
    "feature names": ["area_exp"],
    "features required": ["area_um"],
    "config required": [["user", ["exp"]]],
    "version": "0.1.0",
}

The above plugin uses the “exp” key in the “user” configuration section to set the exponent value (notice the "config required" key in the info dict). Therefore, the feature area_exp is only available when rtdc_ds.config["user"]["exp"] is set.

In [10]: import dclab

In [11]: dclab.load_plugin_feature("data/example_plugin_metadata.py")
Out[11]: [<PlugInFeature 'area_exp' (id 5f03f...) with priority 0 at 0x7f83d852a4f0>]

In [12]: ds = dclab.new_dataset("data/example.rtdc")

# The plugin feature is not yet available, because "user:exp" is missing
In [13]: "area_exp" in ds
Out[13]: False

# Set user-defined metadata
In [14]: my_metadata = {"inlet": True, "n_channels": 4, "exp": 3}

In [15]: ds.config["user"] = my_metadata

# The plugin feature is now available
In [16]: "area_exp" in ds
Out[16]: True

# Now the plugin feature can be accessed like any regular feature
In [17]: area_exp = ds["area_exp"]
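The session above shows that the feature only becomes available once every key listed under "config required" is present. One way to picture this rule is the following sketch, which uses plain dictionaries in place of the dclab configuration object. It is an illustration of the idea and not dclab's implementation; the function name `config_available` is hypothetical.

```python
def config_available(config, req_config):
    """Return True if every required key exists in its config section.

    `req_config` uses the layout of the "config required" item,
    e.g. ``[["user", ["exp"]]]``.
    """
    for section, keys in req_config:
        section_dict = config.get(section, {})
        for key in keys:
            if key not in section_dict:
                return False
    return True


req_config = [["user", ["exp"]]]
# no user-defined metadata yet, so the requirement is not met
before = config_available({"user": {}}, req_config)
# after setting the metadata, the requirement is met
after = config_available(
    {"user": {"inlet": True, "n_channels": 4, "exp": 3}}, req_config)
```

Here, `before` is False and `after` is True, mirroring the two membership tests in the session.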

Reloading plugin features stored in data files

It is also possible to store plugin features within datasets on disk. This may be useful if your plugin feature is slow to compute and you don’t want to recompute it each time you open your dataset. The process for storing plugin feature data is similar to that described for temporary features. If you would like to access those feature data at a later point in time, you still have to load the plugin feature recipe first:

dclab.load_plugin_feature("/path/to/plugin.py")
ds = dclab.new_dataset("/path/to/data_with_new_plugin_feature.rtdc")
circ_per_area = ds["circ_per_area"]

Loading the plugin after instantiating the dataset works as well:

ds = dclab.new_dataset("/path/to/data_with_new_plugin_feature.rtdc")
dclab.load_plugin_feature("/path/to/plugin.py")
circ_per_area = ds["circ_per_area"]

Note

After storing and reloading, this feature is now an innate feature. You could in principle also access it by registering it as a temporary feature (e.g. if you don’t have the recipe lying around).
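The storing step itself is not shown in this section. It could look like the following sketch, which assumes that the dataset's `export.hdf5` method accepts the keyword arguments `path` and `features`; please verify this against the dclab export documentation. The helper name `export_with_features` is hypothetical.

```python
def export_with_features(ds, path, features):
    """Write the given `features` of dataset `ds` to a new file at `path`."""
    missing = [feat for feat in features if feat not in ds]
    if missing:
        # e.g. the plugin recipe was not loaded or metadata are not set
        raise ValueError(f"Features not available: {missing}")
    # Assumption: `ds.export.hdf5` takes `path` and `features`
    ds.export.hdf5(path=path, features=features)
```

With a recipe loaded, a call such as `export_with_features(ds, "/path/to/data_with_new_plugin_feature.rtdc", ["area_um", "circ", "circ_per_area"])` would write the plugin feature alongside the features it was computed from.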

See the code reference on plugin features for more information.