Code reference

Module-level methods

dclab.new_dataset(data, identifier=None, **kwargs)

Initialize a new RT-DC dataset

Parameters
  • data

    can be one of the following:

    • dict

    • .tdms file

    • .rtdc file

    • instance of an RTDCBase subclass (will create a hierarchy child)

    • DCOR resource URL

  • identifier (str) – A unique identifier for this dataset. If set to None an identifier is generated.

  • kwargs (dict) – Additional parameters passed to the RTDCBase subclass

Returns

dataset – A new dataset instance

Return type

subclass of dclab.rtdc_dataset.RTDCBase
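The choice of dataset class based on the type of data can be pictured with the following stdlib-only sketch. It is not dclab's code: the function guess_dataset_kind, the stand-in class FakeBase and the returned class-name strings are hypothetical, and treating strings that start with "http://" or "https://" as DCOR resource URLs is an assumption (bare DCOR resource identifiers are not handled).

```python
from pathlib import Path


class FakeBase:
    """Hypothetical stand-in for an instance of an RTDCBase subclass."""


def guess_dataset_kind(data):
    """Return the (hypothetical) name of the class that would wrap `data`."""
    if isinstance(data, dict):
        return "RTDC_Dict"
    if isinstance(data, FakeBase):
        # an existing dataset becomes the parent of a hierarchy child
        return "RTDC_Hierarchy"
    if isinstance(data, str) and data.startswith(("http://", "https://")):
        # assumption: URLs are treated as DCOR resources
        return "RTDC_DCOR"
    suffix = Path(data).suffix.lower()
    if suffix == ".tdms":
        return "RTDC_TDMS"
    if suffix == ".rtdc":
        return "RTDC_HDF5"
    raise ValueError(f"Unknown data type: {data!r}")
```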

Global definitions

These definitions are used throughout the dclab/Shape-In/Shape-Out ecosystem.

Configuration

Valid configuration sections and keys are described in: Analysis metadata and Experiment metadata.

dclab.definitions.CFG_ANALYSIS

All configuration keywords editable by the user

dclab.definitions.CFG_METADATA

All read-only configuration keywords for a measurement

dclab.definitions.config_funcs

dict of dicts containing functions to convert input data

dclab.definitions.config_keys

dict with section as keys and config parameter names as values

dclab.definitions.config_types

dict of dicts containing the type of section parameters

Features

Features are discussed in more detail in: Features.

dclab.definitions.feature_exists(name, scalar_only=False)

Return True if name is a valid feature name

This function not only checks whether name is in feature_names, but also validates against the machine learning scores ml_score_??? (where ? can be a digit or a lower-case letter in the English alphabet).

Parameters
  • name (str) – name of a feature

  • scalar_only (bool) – Specify whether the check should only search in scalar features

Returns

valid – True if name is a valid feature, False otherwise.

Return type

bool

See also

scalar_feature_exists

Wraps feature_exists with scalar_only=True
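The check against ml_score_??? names can be expressed with a regular expression. In this minimal sketch the feature sets are hypothetical subsets rather than dclab's registered lists, and treating ML scores as scalar features is an assumption.

```python
import re

# Hypothetical subsets of registered feature names (not dclab's lists)
SCALAR_FEATURES = {"area_um", "deform", "bright_avg"}
NON_SCALAR_FEATURES = {"image", "mask", "contour"}

# "ml_score_" followed by exactly three digits or lower-case letters
ML_SCORE_PATTERN = re.compile(r"ml_score_[a-z0-9]{3}")


def feature_exists_sketch(name, scalar_only=False):
    """Return True if `name` is a known feature or a valid ML score name."""
    known = set(SCALAR_FEATURES)
    if not scalar_only:
        known |= NON_SCALAR_FEATURES
    if name in known:
        return True
    # assumption: ML scores are scalar, so they are valid in both modes
    return ML_SCORE_PATTERN.fullmatch(name) is not None
```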

dclab.definitions.get_feature_label(name, rtdc_ds=None)

Return the label corresponding to a feature name

This function not only checks feature_name2label, but also supports registered ml_score_??? features.

Parameters

name (str) – name of a feature

Returns

label – feature label corresponding to the feature name

Return type

str

Notes

TODO: extract feature label from ancillary information when an rtdc_ds is given.

dclab.definitions.scalar_feature_exists(name)

Convenience method wrapping feature_exists(…, scalar_only=True)

dclab.definitions.FEATURES_NON_SCALAR

list of non-scalar features

dclab.definitions.feature_names

list of feature names

dclab.definitions.feature_labels

list of feature labels (same order as feature_names)

dclab.definitions.feature_name2label

dict for converting feature names to labels

dclab.definitions.scalar_feature_names

list of scalar feature names

Parse functions

dclab.parse_funcs.f2dfloatarray(value)

two-dimensional float array

dclab.parse_funcs.fbool(value)

boolean

dclab.parse_funcs.fint(value)

integer

dclab.parse_funcs.fintlist(alist)

A list of integers

dclab.parse_funcs.lcstr(astr)

lower-case string

dclab.parse_funcs.func_types = {<function f2dfloatarray>: <class 'numpy.ndarray'>, <function fbool>: (<class 'bool'>, <class 'numpy.bool_'>), <function fint>: <class 'numbers.Integral'>, <function fintlist>: <class 'list'>, <class 'float'>: <class 'numbers.Number'>, <function lcstr>: <class 'str'>}

maps functions to their expected output types
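A table like func_types can be used to verify that each parser returns the expected type. The parsers below are hypothetical re-implementations (named *_sketch) written for illustration; the input formats they accept, such as "[1, 2, 3]" strings for integer lists, are assumptions and may differ from dclab's parse functions.

```python
import numbers


def fbool_sketch(value):
    """Convert "true"/"false" strings (any case) or numbers to bool."""
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"Cannot parse {value!r} as bool")
    return bool(value)


def fint_sketch(value):
    """Convert to int, rejecting values with a fractional part."""
    fvalue = float(value)
    if not fvalue.is_integer():
        raise ValueError(f"{value!r} is not an integer")
    return int(fvalue)


def fintlist_sketch(alist):
    """Convert a string like "[1, 2, 3]" or an iterable to a list of ints."""
    if isinstance(alist, str):
        alist = [item for item in alist.strip("[]").split(",") if item.strip()]
    return [fint_sketch(item) for item in alist]


def lcstr_sketch(astr):
    """Return a lower-case string."""
    return str(astr).lower()


# Map each parser to the type its output must have (cf. func_types)
FUNC_TYPES_SKETCH = {
    fbool_sketch: bool,
    fint_sketch: numbers.Integral,
    fintlist_sketch: list,
    lcstr_sketch: str,
}


def parse_checked(func, value):
    """Parse `value` and verify the result against FUNC_TYPES_SKETCH."""
    result = func(value)
    if not isinstance(result, FUNC_TYPES_SKETCH[func]):
        raise TypeError(f"{func.__name__} returned {type(result)}")
    return result
```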

RT-DC dataset manipulation

Base class

class dclab.rtdc_dataset.RTDCBase(identifier=None)

RT-DC measurement base class

Notes

Besides the filter arrays for each data feature, there is a manual boolean filter array RTDCBase.filter.manual that can be edited by the user; a value of False means that the event is excluded from all computations.

apply_filter(force=None)

Compute the filters for the dataset

get_downsampled_scatter(xax='area_um', yax='deform', downsample=0, xscale='linear', yscale='linear', remove_invalid=False, ret_mask=False)

Downsampling by removing points at dense locations

Parameters
  • xax (str) – Identifier for x axis (e.g. “area_um”, “aspect”, “deform”)

  • yax (str) – Identifier for y axis

  • downsample (int) –

    Number of points to draw in the down-sampled plot. This number is either

    • >=1: exactly downsample to this number by randomly adding or removing points

    • 0 : do not perform downsampling

  • xscale (str) – If set to “log”, take the logarithm of the x-values before performing downsampling. This is useful when data are displayed on a log-scale. Defaults to “linear”.

  • yscale (str) – See xscale.

  • remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than downsample due to infinite or nan values (e.g. due to logarithmic scaling).

  • ret_mask (bool) – If set to True, returns a boolean array of length len(self) where True values identify the filtered data.

Returns

  • xnew, ynew (1d ndarrays of length N) – Filtered data; N is either identical to downsample or smaller (if remove_invalid==True)

  • mask (1d boolean array of length len(RTDCBase)) – Array for identifying the downsampled data points

get_kde_contour(xax='area_um', yax='deform', xacc=None, yacc=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')

Evaluate the kernel density estimate for contour plots

Parameters
  • xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)

  • yax (str) – Identifier for Y axis

  • xacc (float) – Contour accuracy in x direction

  • yacc (float) – Contour accuracy in y direction

  • kde_type (str) – The KDE method to use

  • kde_kwargs (dict) – Additional keyword arguments to the KDE method

  • xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.

  • yscale (str) – See xscale.

Returns

X, Y, Z – The kernel density Z evaluated on a rectangular grid (X,Y).

Return type

coordinates

get_kde_scatter(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')

Evaluate the kernel density estimate for scatter plots

Parameters
  • xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)

  • yax (str) – Identifier for Y axis

  • positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the points that are set in self.filter.all.

  • kde_type (str) – The KDE method to use

  • kde_kwargs (dict) – Additional keyword arguments to the KDE method

  • xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.

  • yscale (str) – See xscale.

Returns

density – The kernel density evaluated for the filtered data points.

Return type

1d ndarray

static get_kde_spacing(a, scale='linear', method=<function bin_width_doane>, method_kw=None, feat='undefined', ret_scaled=False)

Convenience function for computing the contour spacing

Parameters
  • a (ndarray) – feature data

  • scale (str) – how the data should be scaled (“log” or “linear”)

  • method (callable) – KDE method to use (see kde_methods submodule)

  • method_kw (dict) – keyword arguments to method

  • feat (str) – feature name for debugging

  • ret_scaled (bool) – whether or not to return the scaled array of a
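As an illustration of the default method, here is a stdlib sketch of Doane's rule, assuming that bin_width_doane follows the textbook formula: the number of bins is k = 1 + log2(n) + log2(1 + |g1|/σ_g1) with sample skewness g1 and σ_g1 = sqrt(6(n - 2)/((n + 1)(n + 3))), and the bin width is the data range divided by k. The function name is hypothetical, and the scale and ret_scaled handling of get_kde_spacing is omitted.

```python
import math


def bin_width_doane_sketch(a):
    """Return a bin width from Doane's rule, ignoring nan and inf values."""
    data = [x for x in a if math.isfinite(x)]
    n = len(data)
    if n < 3:
        raise ValueError("Doane's rule needs at least three finite values")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("All values are identical")
    m3 = sum((x - mean) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    n_bins = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return (max(data) - min(data)) / n_bins
```

For symmetric data the skewness term vanishes, so eight evenly spaced values from 1 to 8 give k = 4 bins and a width of 7/4.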

polygon_filter_add(filt)

Associate a Polygon Filter with this instance

Parameters

filt (int or instance of PolygonFilter) – The polygon filter to add

polygon_filter_rm(filt)

Remove a polygon filter from this instance

Parameters

filt (int or instance of PolygonFilter) – The polygon filter to remove

reset_filter()

Reset the current filter

config

Configuration of the measurement

export

Export functionalities; instance of dclab.rtdc_dataset.export.Export.

property features

All available features

property features_innate

All features excluding ancillary or temporary features

property features_loaded

All features that have been computed

This includes ancillary features and temporary features.

Notes

Features that are computationally cheap to compute are always included. They are defined in dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID.

property features_scalar

All scalar features available

filter

Filtering functionalities; instance of dclab.rtdc_dataset.filter.Filter.

format

Dataset format (derived from class name)

abstract property hash

Reproducible dataset hash (defined by derived classes)

property identifier

Unique (unreproducible) identifier

logs

Dictionary of log files. Each log file is a list of strings (one string per line).

path

Path or DCOR identifier of the dataset (set to “none” for RTDC_Dict)

title

Title of the measurement

DCOR (online) format

class dclab.rtdc_dataset.RTDC_DCOR(url, use_ssl=None, host='dcor.mpl.mpg.de', api_key='', *args, **kwargs)

Wrap around the DCOR API

Parameters
  • url (str) –

    Full URL or resource identifier; valid values are

  • use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).

  • host (str) – The host machine (used if the host is not given in url)

  • api_key (str) – API key to access private resources

  • *args – Arguments for RTDCBase

  • **kwargs – Keyword arguments for RTDCBase

path

Full URL to the DCOR resource

Type

str

static get_full_url(url, use_ssl, host)

Return the full URL to a DCOR resource

Parameters
  • url (str) –

    Full URL or resource identifier; valid values are

  • use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).

  • host (str) – Use this host if it is not specified in url

property hash

Hash value based on file name and content

class dclab.rtdc_dataset.fmt_dcor.APIHandler(url, api_key='')

Handles the DCOR API with caching for simple queries

classmethod add_api_key(api_key)

Add an API Key to the base class

When accessing the DCOR API, all available API Keys are used to access a resource (trial and error).

api_keys = []

DCOR API Keys in the current session

cache_queries = ['metadata', 'size', 'feature_list', 'valid']

these are cached to minimize network usage

Dictionary format

class dclab.rtdc_dataset.RTDC_Dict(ddict, *args, **kwargs)

Dictionary-based RT-DC dataset

Parameters
  • ddict (dict) –

    Dictionary with features as keys (valid features like “area_cvx”, “deform”, “image” are defined by dclab.definitions.feature_exists) with which the class will be instantiated. The configuration is set to the default configuration of dclab.

    Changed in version 0.27.0: Scalar features are automatically converted to arrays.

  • *args – Arguments for RTDCBase

  • **kwargs – Keyword arguments for RTDCBase

HDF5 (.rtdc) format

class dclab.rtdc_dataset.RTDC_HDF5(h5path, *args, **kwargs)

HDF5 file format for RT-DC measurements

Parameters
  • h5path (str or pathlib.Path) – Path to a ‘.rtdc’ measurement file.

  • *args – Arguments for RTDCBase

  • **kwargs – Keyword arguments for RTDCBase

path

Path to the experimental HDF5 (.rtdc) file

Type

pathlib.Path

static can_open(h5path)

Check whether a given file is in the .rtdc file format

static parse_config(h5path)

Parse the RT-DC configuration of an hdf5 file

property hash

Hash value based on file name and content

dclab.rtdc_dataset.fmt_hdf5.MIN_DCLAB_EXPORT_VERSION = '0.3.3.dev2'

rtdc files exported with dclab prior to this version are not supported

Hierarchy format

class dclab.rtdc_dataset.RTDC_Hierarchy(hparent, apply_filter=True, *args, **kwargs)

Hierarchy dataset (filtered from RTDCBase)

A few words on hierarchies: The idea is that a subclass of RTDCBase can use the filtered data of another subclass of RTDCBase and interpret these data as unfiltered events. This comes in handy e.g. when the percentages of different subpopulations need to be distinguished without the noise in the original data.

Children in hierarchies always update their data according to the filtered event data from their parent when apply_filter is called. This makes it easier to save and load hierarchy children with e.g. Shape-Out and it makes the handling of hierarchies more intuitive (when the parent changes, the child changes as well).

Parameters
  • hparent (instance of RTDCBase) – The hierarchy parent

  • apply_filter (bool) – Whether to apply the filter during instantiation; If set to False, apply_filter must be called manually.

  • *args – Arguments for RTDCBase

  • **kwargs – Keyword arguments for RTDCBase

hparent

Hierarchy parent of this instance

Type

RTDCBase
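The parent/child relationship described above can be sketched with plain Python classes. ParentSketch and HierarchyChildSketch are hypothetical stand-ins for RTDCBase and RTDC_Hierarchy; the point is that the child's unfiltered events are the parent's filtered events, and that the child only picks up parent changes when apply_filter is called again.

```python
class ParentSketch:
    """Hypothetical parent: feature data plus a boolean filter array."""

    def __init__(self, features):
        self.features = features
        n_events = len(next(iter(features.values())))
        self.filter_all = [True] * n_events


class HierarchyChildSketch:
    """Hypothetical child that only sees events passing the parent filter."""

    def __init__(self, hparent, apply_filter=True):
        self.hparent = hparent
        self.features = {}
        if apply_filter:
            self.apply_filter()

    def apply_filter(self):
        # The parent's filtered events become the child's unfiltered events
        mask = self.hparent.filter_all
        self.features = {
            name: [value for value, keep in zip(values, mask) if keep]
            for name, values in self.hparent.features.items()
        }

    def __len__(self):
        return len(next(iter(self.features.values()), []))
```

Changing the parent's filter leaves the child untouched until apply_filter is called on the child.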

TDMS format

class dclab.rtdc_dataset.RTDC_TDMS(tdms_path, *args, **kwargs)

TDMS file format for RT-DC measurements

Parameters
  • tdms_path (str or pathlib.Path) – Path to a ‘.tdms’ measurement file.

  • *args – Arguments for RTDCBase

  • **kwargs – Keyword arguments for RTDCBase

path

Path to the experimental dataset (main .tdms file)

Type

pathlib.Path

dclab.rtdc_dataset.fmt_tdms.get_project_name_from_path(path, append_mx=False)

Get the project name from a path.

For a path “/home/peter/hans/HLC12398/online/M1_13.tdms”, a path “/home/peter/hans/HLC12398/online/data/M1_13.tdms”, or either path without the “.tdms” file, this will always return “HLC12398”.

Parameters
  • path (str) – path to tdms file

  • append_mx (bool) – append measurement number, e.g. “M1”
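The path logic in the examples above can be reproduced with pathlib. This sketch (project_name_sketch, a hypothetical name) assumes the fixed "online" and optional "data" folder layout shown in the examples and POSIX-style paths; the space used to join the measurement number when append_mx=True is also an assumption.

```python
from pathlib import PurePosixPath


def project_name_sketch(path, append_mx=False):
    """Return e.g. "HLC12398" for ".../HLC12398/online/[data/]M1_13.tdms"."""
    p = PurePosixPath(path)
    mx = None
    if p.suffix == ".tdms":
        mx = p.name.split("_")[0]  # "M1" from "M1_13.tdms"
        p = p.parent
    # Skip the sub-directories "data" and "online" if present
    if p.name == "data":
        p = p.parent
    if p.name == "online":
        p = p.parent
    name = p.name
    if append_mx and mx:
        name = f"{name} {mx}"  # the separator is an assumption
    return name
```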

dclab.rtdc_dataset.fmt_tdms.get_tdms_files(directory)

Recursively find projects based on ‘.tdms’ file endings

Searches the directory recursively and returns a sorted list of all found ‘.tdms’ project files, except fluorescence data trace files, which end with _traces.tdms.
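A recursive search with the same exclusion rule can be sketched with pathlib.Path.rglob; get_tdms_files_sketch is a hypothetical name, and dclab's own implementation may differ, e.g. in how it sorts the result.

```python
from pathlib import Path


def get_tdms_files_sketch(directory):
    """Sorted list of .tdms files below `directory`, excluding trace files."""
    return sorted(
        path for path in Path(directory).rglob("*.tdms")
        if not path.name.endswith("_traces.tdms")
    )
```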

Ancillaries

Computation of ancillary features

Ancillary features are computed on-the-fly in dclab if the required data are available. The features are registered here and are computed when RTDCBase.__getitem__ is called with the respective feature name. When RTDCBase.__contains__ is called with the feature name, the feature is not computed; only its prerequisites are evaluated:

In [1]: import dclab

In [2]: ds = dclab.new_dataset("data/example.rtdc")

In [3]: ds.config["calculation"]["emodulus lut"] = "LE-2D-FEM-19"

In [4]: ds.config["calculation"]["emodulus medium"] = "CellCarrier"

In [5]: ds.config["calculation"]["emodulus temperature"] = 23.0

In [6]: "emodulus" in ds  # nothing is computed
Out[6]: True

In [7]: ds["emodulus"] # now data is computed and cached
Out[7]: 
array([1.23006241, 1.08662317,        nan, ...,        nan,        nan,
       0.75430855])

Once the data has been computed, RTDCBase caches it in the _ancillaries property dict together with a hash that is computed with AncillaryFeature.hash. The hash is computed from the feature data req_features and the configuration metadata req_config.
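The caching idea described above can be sketched without dclab: the result is stored under a hash of the required feature data and configuration values, so it is recomputed only when one of those inputs changes. The class name, the use of MD5 over a JSON dump, and the "factor" configuration key in the usage example are assumptions for illustration; AncillaryFeature.hash may hash its inputs differently.

```python
import hashlib
import json


class AncillaryCacheSketch:
    """Hypothetical on-the-fly feature cache keyed by a hash of the inputs."""

    def __init__(self, req_features, req_config, method):
        self.req_features = req_features  # e.g. ["area_um"]
        self.req_config = req_config      # e.g. {"calculation": ["factor"]}
        self.method = method              # callable(features, config) -> list
        self._cache = {}                  # input hash -> computed data
        self.n_computations = 0

    def hash(self, features, config):
        """Hash only the required feature data and configuration values."""
        payload = {
            "features": {name: features[name] for name in self.req_features},
            "config": {sec: {key: config[sec][key] for key in keys}
                       for sec, keys in self.req_config.items()},
        }
        text = json.dumps(payload, sort_keys=True)
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def get(self, features, config):
        """Return cached data, recomputing only if the inputs changed."""
        key = self.hash(features, config)
        if key not in self._cache:
            self._cache[key] = self.method(features, config)
            self.n_computations += 1
        return self._cache[key]
```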

exception dclab.rtdc_dataset.feat_anc_core.ancillary_feature.BadFeatureSizeWarning

class dclab.rtdc_dataset.feat_anc_core.ancillary_feature.AncillaryFeature(feature_name, method, req_config=None, req_features=None, req_func=<function AncillaryFeature.<lambda>>, priority=0, data=None, identifier=None)

A data feature that is computed from existing data

Parameters
  • feature_name (str) – The name of the ancillary feature, e.g. “emodulus”.

  • method (callable) – The method that computes the feature. This method takes an instance of RTDCBase as argument.

  • req_config (list) – Required configuration parameters to compute the feature, e.g. [“calculation”, [“emodulus lut”, “emodulus viscosity”]]

  • req_features (list) – Required existing features in the dataset, e.g. [“area_cvx”, “deform”]

  • req_func (callable) –

    A function that takes an instance of RTDCBase as an argument and checks whether any other necessary criteria are met. By default, this is a lambda function that returns True. The function should return False if the necessary criteria are not met. This function may also return a hashable object (via dclab.util.objstr()) instead of True, if the criteria are subject to change. In this case, the return value is used for identifying the cached ancillary feature.

    Changed in version 0.27.0: Support non-boolean return values for caching purposes.

  • priority (int) – The priority of the feature; if there are multiple AncillaryFeature defined for the same feature_name, then the priority of the features defines which feature returns True in self.is_available. A higher value means a higher priority.

  • data (object) – Any other data relevant for the feature (e.g. the ML model for computing ‘ml_score_xxx’ features)

  • identifier (None or str) – A unique identifier (e.g. MD5 hash) of the ancillary feature. For PluginFeatures or ML features, this should be computed at least from the input file and the feature name.

Notes

req_config and req_features are used to test whether the feature can be computed in self.is_available.

static available_features(rtdc_ds)

Determine available features for an RT-DC dataset

Parameters

rtdc_ds (instance of RTDCBase) – The dataset to check availability for

Returns

features – Dictionary with feature names as keys and instances of AncillaryFeature as values.

Return type

dict

static check_data_size(rtdc_ds, data_dict)

Check that the feature data have the correct size and resize them if they do not.

Parameters
  • rtdc_ds (instance of RTDCBase) – The dataset from which the features are computed

  • data_dict (dict) – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (to be resized) as values.

Returns

data_dict – Dictionary with feature_name as keys and the correctly resized data features as values.

Return type

dict

compute(rtdc_ds)

Compute the feature with self.method. All ancillary features that share the same method will also be populated automatically.

Parameters

rtdc_ds (instance of RTDCBase) – The dataset to compute the feature for

Returns

data_dict – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (read-only) as values.

Return type

dict

static get_instances(feature_name)

Return all instances that compute feature_name

hash(rtdc_ds)

Used for identifying an ancillary computation

The data columns and the used configuration keys/values are hashed.

is_available(rtdc_ds, verbose=False)

Check whether the feature is available

Parameters

rtdc_ds (instance of RTDCBase) – The dataset to check availability for

Returns

available – True if the feature can be computed with compute

Return type

bool

Notes

This method returns False for a feature if there is a feature defined with the same name but with higher priority (even if the feature would be available otherwise).

feature_names = ['time', 'index', 'area_ratio', 'area_um', 'aspect', 'deform', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'fl1_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'fl1_max_ctc', 'fl2_max_ctc', 'fl1_max_ctc', 'fl3_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'contour', 'bright_avg', 'bright_sd', 'inert_ratio_cvx', 'inert_ratio_prnc', 'inert_ratio_raw', 'tilt', 'volume', 'ml_class', 'circ_times_area', 'area_exp']

All feature names registered

features

All ancillary features registered

Plugin features

New in version 0.34.0.

exception dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PluginImportError

class dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PlugInFeature(feature_name, info, plugin_path=None)

A user-defined plugin feature

Parameters
  • feature_name (str) – name of a feature that matches that defined in info

  • info (dict) –

    Full plugin recipe (for all features) as given in the info dictionary in the plugin file. At least the following keys must be specified:

    • “method”: callable function

    • “feature names”: list of feature names

  • plugin_path (str or Path, optional) – path which was used to load the PlugInFeature with load_plugin_feature().

Notes

PlugInFeature inherits from AncillaryFeature. Please read the advanced section on plugin features in the dclab docs.

feature_name

Plugin feature name

plugin_feature_info

Dictionary containing all information relevant for this particular plugin feature instance

plugin_path

Path to the original plugin file

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.import_plugin_feature_script(plugin_path)

Find the user-defined recipe and return the info dictionary

Parameters

plugin_path (str or Path) – pathname to a valid dclab plugin script

Returns

info – dictionary with the information required to instantiate one (or multiple) PlugInFeature

Return type

dict

Raises

PluginImportError – If the plugin cannot be found

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.load_plugin_feature(plugin_path)

Find and load PlugInFeature(s) from a user-defined recipe

Parameters

plugin_path (str or Path) – pathname to a valid dclab plugin Python script

Returns

plugin_list – list of PlugInFeature instances loaded from plugin_path

Return type

list of PlugInFeature

Raises

ValueError – If the “feature names” entry of the script’s info dictionary is not a list

See also

import_plugin_feature_script

function that imports the plugin script

PlugInFeature

class handling the plugin feature information

dclab.rtdc_dataset.feat_temp.register_temporary_feature

alternative method for creating user-defined features
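A plugin script defines a compute method and an info dictionary. The following hypothetical recipe uses only the two keys documented above as required ("method" and "feature names"); the feature name circ_per_area and its formula are made up, and real plugin recipes may need further keys. The check_plugin_info helper mirrors the ValueError documented for load_plugin_feature, but it is not dclab's validation code.

```python
def compute_features(rtdc_ds):
    """Compute the hypothetical feature "circ_per_area" for each event."""
    circ_per_area = [c / a for c, a in zip(rtdc_ds["circ"], rtdc_ds["area_um"])]
    return {"circ_per_area": circ_per_area}


info = {
    "method": compute_features,
    "feature names": ["circ_per_area"],
}


def check_plugin_info(info):
    """Validate the minimal recipe keys (a sketch, not dclab's validation)."""
    if not callable(info.get("method")):
        raise ValueError("'method' must be a callable")
    if not isinstance(info.get("feature names"), list):
        raise ValueError("'feature names' must be a list")
    return info["feature names"]
```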

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_all_plugin_features()

Convenience function for removing all PlugInFeature instances

See also

remove_plugin_feature

remove a single PlugInFeature instance

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_plugin_feature(plugin_instance)

Convenience function for removing a PlugInFeature instance

Parameters

plugin_instance (PlugInFeature) – The PlugInFeature instance to be removed from dclab

Raises

TypeError – If the plugin_instance is not a PlugInFeature instance

Temporary features

New in version 0.33.0.

dclab.rtdc_dataset.feat_temp.deregister_all()

Deregisters all temporary features

dclab.rtdc_dataset.feat_temp.deregister_temporary_feature(feature)

Convenience function for deregistering a temporary feature

This method is mostly used during testing. It does not remove the actual feature data from any dataset; the data will stay in memory but is not accessible anymore through the public methods of the RTDCBase user interface.

dclab.rtdc_dataset.feat_temp.register_temporary_feature(feature, label=None, is_scalar=True)

Register a new temporary feature

Temporary features are custom features that can be defined ad hoc by the user. Temporary features are helpful when the integral features are not enough, e.g. for prototyping, testing, or collating with other data. Temporary features allow you to leverage the full functionality of RTDCBase with your custom features (no need to go for a custom pandas.DataFrame).

Parameters
  • feature (str) – Feature name; allowed characters are lower-case letters, digits, and underscores

  • label (str) – Feature label used e.g. for plotting

  • is_scalar (bool) – Whether or not the feature is a scalar feature

dclab.rtdc_dataset.feat_temp.set_temporary_feature(rtdc_ds, feature, data)

Set temporary feature data for a dataset

Parameters
  • rtdc_ds (dclab.RTDCBase) – Dataset for which to set the feature. Note that temporary features cannot be set for hierarchy children and that the length of the feature data must match the number of events in rtdc_ds.

  • feature (str) – Feature name

  • data (np.ndarray) – The data
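The naming rule of register_temporary_feature and the length check of set_temporary_feature can be sketched with a plain dictionary acting as the dataset. register_sketch, set_sketch and the module-level registry are hypothetical; the restriction on hierarchy children is not modelled.

```python
import re

# Hypothetical registry: feature name -> (label, is_scalar)
_TEMP_FEATURES = {}


def register_sketch(feature, label=None, is_scalar=True):
    """Register a feature name made of lower-case letters, digits and "_"."""
    if not re.fullmatch(r"[a-z0-9_]+", feature):
        raise ValueError(f"Invalid feature name: {feature!r}")
    _TEMP_FEATURES[feature] = (label or feature, is_scalar)


def set_sketch(dataset, feature, data):
    """Store `data` in a dict-of-lists dataset after basic checks."""
    if feature not in _TEMP_FEATURES:
        raise ValueError(f"Feature {feature!r} is not registered")
    # Number of events, taken from the first existing feature column
    n_events = len(next(iter(dataset.values())))
    if len(data) != n_events:
        raise ValueError(f"Expected {n_events} values, got {len(data)}")
    dataset[feature] = list(data)
```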

Config

class dclab.rtdc_dataset.config.Configuration(files=None, cfg=None, disable_checks=False)

Configuration class for RT-DC datasets

This class has a dictionary-like interface to access and set configuration values, e.g.

from dclab.rtdc_dataset.config import load_from_file

cfg = load_from_file("/path/to/config.txt")
# access the channel width
cfg["setup"]["channel width"]
# modify the channel width
cfg["setup"]["channel width"] = 30

Parameters
  • files (list of files) – The config files with which to initialize the configuration

  • cfg (dict-like) – The dictionary with which to initialize the configuration

  • disable_checks (bool) – Set this to True if you want to avoid checking against section and key names defined in dclab.definitions using verify_section_key(). This avoids excess warning messages when loading data from configuration files not generated by dclab.

copy()

Return copy of current configuration

get(key, other)

Analogous to dict.get: return the value for key if it exists, otherwise other

New in version 0.29.1.

keys()

Return the configuration keys (sections)

save(filename)

Save the configuration to a file

tojson()

Convert the configuration to a JSON string

Note that the data type of some configuration options will likely be lost.

tostring(sections=None)

Convert the configuration to its string representation

The optional argument sections allows exporting only specific sections of the configuration, e.g. sections=dclab.dfn.CFG_METADATA will only export configuration data from the original measurement and no filtering data.

update(newcfg)

Update current config with a dictionary

dclab.rtdc_dataset.config.load_from_file(cfg_file)

Load the configuration from a file

Parameters

cfg_file (str) – Path to configuration file

Returns

cfg – Dictionary with configuration parameters

Return type

ConfigurationDict

Export

exception dclab.rtdc_dataset.export.LimitingExportSizeWarning

class dclab.rtdc_dataset.export.Export(rtdc_ds)

Export functionalities for RT-DC datasets

avi(path, filtered=True, override=False)

Exports filtered event images to an avi file

Parameters
  • path (str) – Path to a .avi file. The ending .avi is added automatically.

  • filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.

  • override (bool) – If set to True, an existing file path will be overwritten. If set to False, raises OSError if path exists.

Notes

Raises OSError if the current dataset does not contain image data

fcs(path, features, meta_data=None, filtered=True, override=False)

Export the data of an RT-DC dataset to an .fcs file

Parameters
  • path (str) – Path to an .fcs file. The ending .fcs is added automatically.

  • features (list of str) – The features in the resulting .fcs file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.

  • meta_data (dict) – User-defined, optional key-value pairs that are stored in the primary TEXT segment of the FCS file; the version of dclab is stored there by default

  • filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.

  • override (bool) – If set to True, an existing file path will be overwritten. If set to False, raises OSError if path exists.

Notes

Due to incompatibility with the .fcs file format, events with NaN-valued features are not exported.

hdf5(path, features, filtered=True, override=False, compression='gzip')

Export the data of the current instance to an HDF5 file

Parameters
  • path (str) – Path to an .rtdc file. The ending .rtdc is added automatically.

  • features (list of str) – The features in the resulting .rtdc file. These are strings that are defined by dclab.definitions.feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “image”.

  • filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.

  • override (bool) – If set to True, an existing file path will be overwritten. If set to False, raises OSError if path exists.

  • compression (str or None) – Compression method for e.g. “contour”, “image”, and “trace” data as well as logs; one of [None, “lzf”, “gzip”, “szip”].

tsv(path, features, meta_data=None, filtered=True, override=False)

Export the data of the current instance to a .tsv file

Parameters
  • path (str) – Path to a .tsv file. The ending .tsv is added automatically.

  • features (list of str) – The features in the resulting .tsv file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.

  • meta_data (dict) – User-defined, optional key-value pairs that are stored at the beginning of the .tsv file, one key-value pair per line, each line starting with a hash (#). The version of dclab is stored there by default.

  • filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.

  • override (bool) – If set to True, an existing file path will be overwritten. If set to False, raises OSError if path exists.
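The layout of a .tsv export with meta data can be sketched as follows. write_tsv_sketch is hypothetical, and the exact "# key: value" header syntax and the number formatting are assumptions; dclab's actual output may differ.

```python
def write_tsv_sketch(fobj, features, data, meta_data=None):
    """Write "# key: value" lines, a header row, then one row per event."""
    for key, value in (meta_data or {}).items():
        fobj.write(f"# {key}: {value}\n")
    fobj.write("\t".join(features) + "\n")
    # zip the feature columns into per-event rows
    for row in zip(*(data[name] for name in features)):
        fobj.write("\t".join(repr(float(v)) for v in row) + "\n")
```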

Filter

class dclab.rtdc_dataset.filter.Filter(rtdc_ds)

Boolean filter arrays for RT-DC measurements

Parameters

rtdc_ds (instance of RTDCBase) – The RT-DC dataset the filter applies to

reset()

Reset all filters

update(rtdc_ds, force=[])

Update the filters according to rtdc_ds.config[“filtering”]

Parameters
  • rtdc_ds (dclab.rtdc_dataset.core.RTDCBase) – The measurement to which the filter is applied

  • force (list) – A list of feature names that must be refiltered with min/max values.

Notes

This function is called when ds.apply_filter is called.
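The min/max part of the filtering can be sketched as below: an event passes only if the manual filter and every configured range accept it. update_filter_sketch is hypothetical, the "<feature> min"/"<feature> max" key names are an assumption about the "filtering" section, and polygon filters, the force argument and caching are omitted.

```python
def update_filter_sketch(features, filtering, manual):
    """Combine min/max range filters with a manual boolean filter."""
    n_events = len(manual)
    keep = list(manual)  # start from the user-editable manual filter
    for name, values in features.items():
        fmin = filtering.get(f"{name} min")
        fmax = filtering.get(f"{name} max")
        for i in range(n_events):
            if fmin is not None and values[i] < fmin:
                keep[i] = False
            if fmax is not None and values[i] > fmax:
                keep[i] = False
    return keep
```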

Low-level functionalities

downsampling

Content-based downsampling of ndarrays

dclab.downsampling.downsample_rand(a, samples, remove_invalid=False, ret_idx=False)

Downsampling by randomly removing points

Parameters
  • a (1d ndarray) – The input array to downsample

  • samples (int) – The desired number of samples

  • remove_invalid (bool) – Remove nan and inf values before downsampling

  • ret_idx (bool) – Also return a boolean array that corresponds to the downsampled indices in a.

Returns

  • dsa (1d ndarray of size samples) – The pseudo-randomly downsampled array a

  • idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa
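A minimal sketch of random downsampling with the documented remove_invalid and ret_idx options, using only the standard library. The function name and the seed parameter (which makes the pseudo-random choice reproducible) are hypothetical additions, not part of dclab's signature.

```python
import math
import random


def downsample_rand_sketch(a, samples, remove_invalid=False, ret_idx=False,
                           seed=47):
    """Randomly keep `samples` elements of `a`, preserving their order."""
    candidates = [i for i, x in enumerate(a)
                  if not remove_invalid or math.isfinite(x)]
    rng = random.Random(seed)  # fixed seed: same input -> same selection
    if samples < len(candidates):
        chosen = set(rng.sample(candidates, samples))
    else:
        chosen = set(candidates)
    idx = [i in chosen for i in range(len(a))]
    dsa = [x for x, keep in zip(a, idx) if keep]
    return (dsa, idx) if ret_idx else dsa
```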

dclab.downsampling.norm(a, ref1, ref2)

Normalize a with the min/max values of ref1, using only those elements of ref1 where both ref1 and ref2 are neither nan nor inf

dclab.downsampling.valid(a, b)

Check whether a and b are not inf or nan
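Assuming norm performs a standard min-max scaling, the two helpers can be sketched on plain lists as follows (valid_sketch and norm_sketch are hypothetical names).

```python
import math


def valid_sketch(a, b):
    """Element-wise True where both a[i] and b[i] are finite."""
    return [math.isfinite(x) and math.isfinite(y) for x, y in zip(a, b)]


def norm_sketch(a, ref1, ref2):
    """Scale `a` with the min/max of the valid elements of ref1."""
    ok = valid_sketch(ref1, ref2)
    ref = [x for x, keep in zip(ref1, ok) if keep]
    rmin, rmax = min(ref), max(ref)
    return [(x - rmin) / (rmax - rmin) for x in a]
```

Here the value 100.0 in ref1 is ignored because the matching entry of ref2 is inf.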

features

image-based

dclab.features.contour.get_contour(mask)

Compute the image contour from a mask

The contour is computed in a very inefficient way using scikit-image and a conversion of float coordinates to pixel coordinates.

Parameters

mask (binary ndarray of shape (M,N) or (K,M,N)) – The mask outlining the pixel positions of the event. If a 3d array is given, then K indexes the individual contours.

Returns

cont – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

Return type

ndarray or list of K ndarrays of shape (J,2)

dclab.features.bright.get_bright(mask, image, ret_data='avg,sd')

Compute avg and/or std of the event brightness

The event brightness is defined by the gray-scale values of the image data within the event mask area.

Parameters
  • mask (ndarray or list of ndarrays of shape (M,N) and dtype bool) – The mask values, True where the event is located in image.

  • image (ndarray or list of ndarrays of shape (M,N)) – A 2D array that holds the image in form of grayscale values of an event.

  • ret_data (str) – A comma-separated list of metrics to compute: “avg” computes the average and “sd” computes the standard deviation. Selected metrics are returned in alphabetical order.

Returns

  • bright_avg (float or ndarray of size N) – Average image data within the contour

  • bright_std (float or ndarray of size N) – Standard deviation of image data within the contour
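For a single event, the brightness statistics reduce to the mean and standard deviation of the image pixels inside the mask. The following is a minimal sketch for one 2D image. The name `get_bright_sketch` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np


def get_bright_sketch(mask, image, ret_data="avg,sd"):
    """Brightness statistics of one event (hypothetical helper, 2D input only)."""
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    # np.std uses ddof=0 by default (assumption about the intended estimator)
    funcs = {"avg": np.mean, "sd": np.std}
    # Metrics are returned in alphabetical order, as documented above
    keys = sorted(k.strip() for k in ret_data.split(","))
    results = [funcs[k](values) for k in keys]
    return results[0] if len(results) == 1 else tuple(results)


image = np.arange(16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # selects the pixel values 5, 6, 9, 10
avg, sd = get_bright_sketch(mask, image)
```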

dclab.features.inert_ratio.get_inert_ratio_cvx(cont)[source]

Compute the inertia ratio of the convex hull of a contour

The inertia ratio is computed from the central second-order moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).

Parameters

cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

Returns

inert_ratio_cvx – The inertia ratio of the contour’s convex hull

Return type

float or ndarray of size N

Notes

The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.

See also

get_inert_ratio_raw

Compute inertia ratio of a raw contour

dclab.features.inert_ratio.get_inert_ratio_raw(cont)[source]

Compute the inertia ratio of a contour

The inertia ratio is computed from the central second-order moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).

Parameters

cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

Returns

inert_ratio_raw – The inertia ratio of the contour

Return type

float or ndarray of size N

Notes

The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.

See also

get_inert_ratio_cvx

Compute inertia ratio of the convex hull of a contour
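The raw inertia ratio can be sketched with the polygon (Green's theorem) moment formulas that OpenCV uses for contour moments. The name `inert_ratio_raw_sketch` is hypothetical. The sketch handles a single contour only, does not compute a convex hull, and is not dclab's own code.

```python
import numpy as np


def inert_ratio_raw_sketch(cont):
    """sqrt(mu20/mu02) of a closed polygon (hypothetical helper)."""
    cont = np.asarray(cont, dtype=float)
    x, y = cont[:, 0], cont[:, 1]
    # Previous vertex of each vertex; the polygon is closed implicitly
    xp, yp = np.roll(x, 1), np.roll(y, 1)
    dxy = xp * y - x * yp
    # Raw moments of the enclosed area (Green's theorem)
    m00 = np.sum(dxy) / 2
    m10 = np.sum(dxy * (xp + x)) / 6
    m01 = np.sum(dxy * (yp + y)) / 6
    m20 = np.sum(dxy * (xp * xp + xp * x + x * x)) / 12
    m02 = np.sum(dxy * (yp * yp + yp * y + y * y)) / 12
    # Central second-order moments; a clockwise contour flips the sign of
    # both mu20 and mu02, which cancels in the ratio
    mu20 = m20 - m10 ** 2 / m00
    mu02 = m02 - m01 ** 2 / m00
    return np.sqrt(mu20 / mu02)


# A 4 x 2 rectangle: mu20 = 4**3 * 2 / 12 and mu02 = 2**3 * 4 / 12
rect = np.array([[0, 0], [4, 0], [4, 2], [0, 2]])
ratio = inert_ratio_raw_sketch(rect)
```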

dclab.features.volume.get_volume(cont, pos_x, pos_y, pix, fix_orientation=False)[source]

Calculate the volume of a polygon revolved around an axis

The volume estimation assumes rotational symmetry.

Parameters
  • cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event [px] e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

  • pos_x (float or ndarray of length N) – The x coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_x

  • pos_y (float or ndarray of length N) – The y coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_y

  • pix (float) – The detector pixel size in µm, e.g. obtained using mm.config[“imaging”][“pixel size”]

  • fix_orientation (bool) – If set to True, make sure that the orientation of the contour is counter-clockwise in the r-z plane (see vol_revolve()). This is False by default, because (1) Shape-In always stores the contours in the correct orientation and (2) there may be events with high porosity where “fixing” the orientation makes things worse and a negative volume is returned.

Returns

volume – volume in µm³

Return type

float or ndarray

Notes

The volume is computed by fully rotating the upper and the lower half of the contour separately. The average of the two resulting volumes is then used.

The volume is computed radially from the center position given by (pos_x, pos_y). For sufficiently smooth contours, such as densely sampled ellipses, the center position does not play an important role. For contours that are given on a coarse grid, as is the case for RT-DC, the center position must be given.

dclab.features.volume.counter_clockwise(cx, cy)[source]

Put contour coordinates into counter-clockwise order

Parameters
  • cx (1d ndarray) – The x-coordinates of the contour

  • cy (1d ndarray) – The y-coordinates of the contour

Returns

cx_cc, cy_cc – The x- and y-coordinates of the contour in counter-clockwise orientation.

Return type

two 1d ndarrays

Notes

The contour must be centered around (0, 0).
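One way to enforce counter-clockwise order is to check the sign of the polygon's signed area (shoelace formula) and reverse the points if it is negative. This is a swapped-in technique chosen for illustration. The requirement above that the contour be centered around (0, 0) suggests that dclab may use a different (e.g. angle-based) approach. The name `counter_clockwise_sketch` is hypothetical.

```python
import numpy as np


def counter_clockwise_sketch(cx, cy):
    """Return contour coordinates in counter-clockwise order (hypothetical helper)."""
    cx = np.asarray(cx, dtype=float)
    cy = np.asarray(cy, dtype=float)
    # Shoelace formula: the signed area is positive for counter-clockwise order
    signed_area = 0.5 * np.sum(cx * np.roll(cy, -1) - np.roll(cx, -1) * cy)
    if signed_area < 0:
        # Reversing the point order flips the orientation
        return cx[::-1], cy[::-1]
    return cx, cy


# A clockwise square centered around (0, 0)
cx_cc, cy_cc = counter_clockwise_sketch([-1, -1, 1, 1], [-1, 1, 1, -1])
```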

dclab.features.volume.vol_revolve(r, z, point_scale=1.0)[source]

Calculate the volume of a polygon revolved around the Z-axis

This implementation yields the same results as the volRevolve Matlab function by Geoff Olynyk (from 2012-05-03) https://de.mathworks.com/matlabcentral/fileexchange/36525-volrevolve.

The difference is that here the volume is computed with a (much more approachable) implementation based on the volume of a truncated cone (https://de.wikipedia.org/wiki/Kegelstumpf).

\[V = \frac{h \cdot \pi}{3} \cdot (R^2 + R \cdot r + r^2)\]

where \(h\) is the height of the truncated cone and \(r\) and \(R\) are its smaller and larger radii.

Each line segment of the contour corresponds to one truncated cone. If the z-step is positive (counter-clockwise contour), the truncated cone volume is added to the total volume. If the z-step is negative (e.g. an inclusion), the truncated cone volume is subtracted from the total volume.

Changed in version 0.37.0: In previous versions, the volume was overestimated by 2 µm³ on average.

Parameters
  • r (1d np.ndarray) – radial coordinates (perpendicular to the z axis)

  • z (1d np.ndarray) – coordinate along the axis of rotation

  • point_scale (float) – point size in your preferred units; the volume is multiplied by a factor of point_scale**3.

Notes

The coordinates must be given in counter-clockwise order, otherwise the volume will be negative.
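The truncated-cone summation described above can be sketched as follows. The name `vol_revolve_sketch` is hypothetical. The sketch assumes that the polygon is implicitly closed (last point connected to the first) and uses the signed z-step so that inclusions are subtracted automatically. Because the cone formula is symmetric in \(r\) and \(R\), the two radii do not need to be sorted.

```python
import numpy as np


def vol_revolve_sketch(r, z, point_scale=1.0):
    """Volume of a closed r-z polygon revolved around the z-axis (hypothetical helper)."""
    r = np.asarray(r, dtype=float)
    z = np.asarray(z, dtype=float)
    # End points of each segment; np.roll closes the polygon (assumption)
    r2, z2 = np.roll(r, -1), np.roll(z, -1)
    # Signed height: positive z-steps add a truncated cone, negative ones remove it
    dz = z2 - z
    cones = np.pi / 3 * dz * (r * r + r * r2 + r2 * r2)
    return np.sum(cones) * point_scale ** 3


# Counter-clockwise rectangle r in [0, 1], z in [0, 2]: a cylinder with V = 2*pi
v_cyl = vol_revolve_sketch([0, 1, 1, 0], [0, 0, 2, 2])
# Counter-clockwise triangle: a cone with radius 1 and height 1, V = pi/3
v_cone = vol_revolve_sketch([0, 1, 0], [0, 0, 1])
```

Reversing the point order of either polygon yields the negated volume, consistent with the note above.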

emodulus

Computation of apparent Young’s modulus for RT-DC measurements

exception dclab.features.emodulus.KnowWhatYouAreDoingWarning[source]
exception dclab.features.emodulus.YoungsModulusLookupTableExceededWarning[source]
dclab.features.emodulus.extrapolate_emodulus(lut, datax, deform, emod, deform_norm, deform_thresh=0.05, inplace=True)[source]

Use spline interpolation to fill in nan-values

When points (datax, deform) are outside the convex hull of the lut, scipy.interpolate.griddata() returns nan values.

With this function, some of these nan-values are extrapolated using scipy.interpolate.SmoothBivariateSpline. The supported extrapolation values are currently limited to those where the deformation is above 0.05.

A warning is issued because extrapolation is not recommended.

Parameters
  • lut (ndarray of shape (N, 3)) – The normalized LUT (make sure to apply normalize() first!); the first axis enumerates the points, the second axis enumerates datax, deform, and emodulus

  • datax (ndarray of size N) – The normalized x data (corresponding to lut[:, 0])

  • deform (ndarray of size N) – The normalized deform (corresponding to lut[:, 1])

  • emod (ndarray of size N) – The emodulus (corresponding to lut[:, 2]). If emod does not contain nan values, there is nothing to do.

  • deform_norm (float) – The normalization value used to normalize lut[:, 1] and deform.

  • deform_thresh (float) – Not the entire LUT is used for bivariate spline interpolation. Only the points where lut[:, 1] > deform_thresh/deform_norm are used. This is necessary, because for small deformations, the LUT has an extreme slope that kills any meaningful spline interpolation.

  • inplace (bool) – If True (default), replaces nan values in emod in-place. If False, emod is not modified.

dclab.features.emodulus.get_emodulus(area_um=None, deform=None, volume=None, medium='CellCarrier', channel_width=20.0, flow_rate=0.16, px_um=0.34, temperature=23.0, lut_data='FEM-2Daxis', extrapolate=False, copy=True)[source]

Compute apparent Young’s modulus using a look-up table

Parameters
  • area_um (float or ndarray) – Apparent (2D image) area [µm²] of the event(s)

  • deform (float or ndarray) – Deformation (1-circularity) of the event(s)

  • volume (float or ndarray) –

    Apparent volume of the event(s). The arguments volume and area_um are mutually exclusive; only one of them may be given.

    New in version 0.25.0.

  • medium (str or float) – The medium to compute the viscosity for. If a string is given, the viscosity is computed for that medium. If a float is given, this value is used as the viscosity in mPa*s (note that temperature must be set to None in this case).

  • channel_width (float) – The channel width [µm]

  • flow_rate (float) – Flow rate [µL/s]

  • px_um (float) – The detector pixel size [µm] used for pixelation correction. Set to zero to disable.

  • temperature (float, ndarray, or None) – Temperature [°C] of the event(s)

  • lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) –

    The LUT data to use. If it is a key in INTERNAL_LUTS, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, meta data) is possible. The LUT meta data is used to check whether the given features (e.g. area_um and deform) are valid interpolation choices.

    New in version 0.25.0.

  • extrapolate (bool) – Perform extrapolation using extrapolate_emodulus(). This is discouraged!

  • copy (bool) – Copy input arrays. If set to False, input arrays are overwritten.

Returns

elasticity – Apparent Young’s modulus in kPa

Return type

float or ndarray

Notes

  • The look-up table used was computed with finite element methods according to [MMM+17] and complemented with analytical isoelastics from [MOG+15]. The original simulation results are available on figshare [WMM+20].

  • The computation of the Young’s modulus takes into account a correction for the viscosity (medium, channel width, flow rate, and temperature) [MOG+15] and a correction for pixelation for the deformation which were derived from a (pixelated) image [Her17].

  • Note that while deformation is pixelation-corrected, area_um and volume are scaled to match the LUT data. This is convenient, because the order in which pixelation correction and scale conversion are applied does not matter.

  • By using external LUTs, it is possible to interpolate on the volume-deformation plane. This feature was added in version 0.25.0.

See also

dclab.features.emodulus.viscosity.get_viscosity

compute viscosity for known media

dclab.features.emodulus.normalize(data, dmax)[source]

Perform normalization in-place for interpolation

Note that scipy.interpolate.griddata() has a rescale option which rescales the data onto the unit cube. For some reason this does not work well with LUT data, so we just normalize it by dividing by the maximum value.

dclab.features.emodulus.INACCURATE_SPLINE_EXTRAPOLATION = False

Set this to True to globally enable spline extrapolation when the area_um/deform data are outside of a LUT. This is discouraged, and a KnowWhatYouAreDoingWarning is issued.

dclab.features.emodulus.load.get_lut_path(path_or_id)[source]

Find the path to a LUT

Parameters

path_or_id (str or pathlib.Path) – Identifier of a LUT. This can be either an existing path (checked first) or an internal identifier (see INTERNAL_LUTS).

dclab.features.emodulus.load.load_lut(lut_data='LE-2D-FEM-19')[source]

Load LUT data from disk

Parameters

lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) – The LUT data to use. If it is a key in INTERNAL_LUTS, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, meta data) is possible.

Returns

  • lut (np.ndarray of shape (N, 3)) – The LUT data for interpolation

  • meta (dict) – The LUT metadata

Notes

If lut_data is a tuple of (lut, meta), then nothing is actually done (this is implemented for user convenience).

dclab.features.emodulus.load.load_mtext(path)[source]

Load column-based data from text files with metadata

This file format is used for isoelasticity lines and look-up table data in dclab.

The text file is loaded with numpy.loadtxt. The metadata are stored as a JSON string between the “BEGIN METADATA” and “END METADATA” tags. The last comment (#) line before the actual data defines the tab-separated feature names, with units in square brackets. For instance:

    # [...]
    #
    # BEGIN METADATA
    # {
    #   "authors": "A. Mietke, C. Herold, J. Guck",
    #   "channel_width": 20.0,
    #   "channel_width_unit": "um",
    #   "date": "2018-01-30",
    #   "dimensionality": "2Daxis",
    #   "flow_rate": 0.04,
    #   "flow_rate_unit": "uL/s",
    #   "fluid_viscosity": 15.0,
    #   "fluid_viscosity_unit": "mPa s",
    #   "identifier": "LE-2D-ana-18",
    #   "method": "analytical",
    #   "model": "linear elastic",
    #   "publication": "https://doi.org/10.1016/j.bpj.2015.09.006",
    #   "software": "custom Matlab code",
    #   "summary": "2D-axis-symmetric analytical solution"
    # }
    # END METADATA
    #
    # [...]
    #
    # area_um [um^2]  deform  emodulus [kPa]
    3.75331e+00  5.14496e-03  9.30000e-01
    4.90368e+00  6.72683e-03  9.30000e-01
    6.05279e+00  8.30946e-03  9.30000e-01
    7.20064e+00  9.89298e-03  9.30000e-01
    [...]
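A parser for this format can be sketched with the standard library and numpy.loadtxt. The name `load_mtext_sketch` is hypothetical. The sketch assumes that the header consists only of "#" lines and that the first non-comment line starts the data. It is not dclab's implementation.

```python
import io
import json

import numpy as np


def load_mtext_sketch(text):
    """Parse the text format described above (hypothetical helper).

    Returns (data, meta, columns).
    """
    meta_lines, header, in_meta = [], None, False
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the first non-comment line starts the data section
        content = line[1:].strip()
        if content == "BEGIN METADATA":
            in_meta = True
        elif content == "END METADATA":
            in_meta = False
        elif in_meta:
            meta_lines.append(content)
        elif content:
            header = content  # keep the last non-empty comment line
    meta = json.loads("\n".join(meta_lines))
    columns = header.split("\t")
    # numpy.loadtxt skips "#" comment lines by default
    data = np.loadtxt(io.StringIO(text), ndmin=2)
    return data, meta, columns


example = (
    "# BEGIN METADATA\n"
    "# {\n"
    '#   "identifier": "LE-2D-ana-18",\n'
    '#   "channel_width": 20.0\n'
    "# }\n"
    "# END METADATA\n"
    "#\n"
    "# area_um [um^2]\tdeform\temodulus [kPa]\n"
    "3.75331e+00\t5.14496e-03\t9.30000e-01\n"
    "4.90368e+00\t6.72683e-03\t9.30000e-01\n"
)
data, meta, columns = load_mtext_sketch(example)
```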

dclab.features.emodulus.load.register_lut(path, identifier=None)[source]

Register an external LUT file in dclab

This will add it to EXTERNAL_LUTS, which is required for emodulus computation as an ancillary feature.

Parameters
  • path (str or pathlib.Path) – Path to the external LUT file

  • identifier (str or None) – The identifier is used for ancillary emodulus computation via the [calculation]: “emodulus lut” key. It is also used as the key in EXTERNAL_LUTS during registration. If None (default), the identifier given in the JSON metadata of path is used.

dclab.features.emodulus.load.EXTERNAL_LUTS = {}

Dictionary of look-up tables that the user added via register_lut().

dclab.features.emodulus.load.INTERNAL_LUTS = {'LE-2D-FEM-19': 'emodulus_lut_LE-2D-FEM-19.txt'}

Dictionary of look-up tables shipped with dclab.

Pixelation correction definitions

dclab.features.emodulus.pxcorr.corr_deform_with_area_um(area_um, px_um=0.34)[source]

Deformation correction for area_um-deform data

The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.

The correction formula is described in [Her17].

Parameters
  • area_um (float or ndarray) – Apparent (2D image) area in µm² of the event(s)

  • px_um (float) – The detector pixel size in µm.

Returns

deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta

Return type

float or ndarray

dclab.features.emodulus.pxcorr.corr_deform_with_volume(volume, px_um=0.34)[source]

Deformation correction for volume-deform data

The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.

The correction is derived in scripts/pixelation_correction.py.

Parameters
  • volume (float or ndarray) – The “volume” feature (rotation of raw contour) [µm³]

  • px_um (float) – The detector pixel size in µm.

Returns

deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta

Return type

float or ndarray

dclab.features.emodulus.pxcorr.get_pixelation_delta(feat_corr, feat_absc, data_absc, px_um=0.34)[source]

Convenience function for obtaining pixelation correction

Parameters
  • feat_corr (str) – Feature for which to compute the pixelation correction (e.g. “deform”)

  • feat_absc (str) – Feature with which to compute the correction (e.g. “area_um”)

  • data_absc (ndarray or float) – Corresponding data for feat_absc

  • px_um (float) – Detector pixel size [µm]

dclab.features.emodulus.pxcorr.get_pixelation_delta_pair(feat1, feat2, data1, data2, px_um=0.34)[source]

Convenience function that returns pixelation correction pair

Scale conversion applicable to a linear elastic model

dclab.features.emodulus.scale_linear.convert(area_um, deform, channel_width_in, channel_width_out, emodulus=None, flow_rate_in=None, flow_rate_out=None, viscosity_in=None, viscosity_out=None, inplace=False)[source]

Convert area-deformation-emodulus triplet

The conversion formula is described in [MOG+15].

Parameters
  • area_um (ndarray) – Convex cell area [µm²]

  • deform (ndarray) – Deformation

  • channel_width_in (float) – Original channel width [µm]

  • channel_width_out (float) – Target channel width [µm]

  • emodulus (ndarray) – Young’s Modulus [kPa]

  • flow_rate_in (float) – Original flow rate [µL/s]

  • flow_rate_out (float) – Target flow rate [µL/s]

  • viscosity_in (float) – Original viscosity [mPa*s]

  • viscosity_out (float or ndarray) – Target viscosity [mPa*s]; this can be an array

  • inplace (bool) – If True, override input arrays with corrected data

Returns

  • area_um_corr (ndarray) – Corrected cell area [µm²]

  • deform_corr (ndarray) – Deformation (a copy if inplace is False)

  • emodulus_corr (ndarray) – Corrected emodulus [kPa]; only returned if emodulus is given.

Notes

If only area_um, deform, channel_width_in, and channel_width_out are given, then only the area is corrected and returned together with the original deform. If all other arguments are given as well (i.e. not None), the emodulus is also corrected and returned.

dclab.features.emodulus.scale_linear.scale_area_um(area_um, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]

Perform scale conversion for area_um (linear elastic model)

The area scales with the characteristic length “channel radius” L according to (L’/L)².

The conversion formula is described in [MOG+15].

Parameters
  • area_um (ndarray) – Convex area [µm²]

  • channel_width_in (float) – Original channel width [µm]

  • channel_width_out (float) – Target channel width [µm]

  • inplace (bool) – If True, override input arrays with corrected data

  • kwargs – not used

Returns

area_um_corr – Scaled area [µm²]

Return type

ndarray
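The area scaling rule (L’/L)² can be sketched as follows. The sketch assumes that the ratio of channel widths equals the ratio of channel radii, so the widths can be used directly. The name `scale_area_um_sketch` is hypothetical. It always returns a new array, whereas the documented function also supports in-place operation.

```python
import numpy as np


def scale_area_um_sketch(area_um, channel_width_in, channel_width_out):
    """Scale area by (L'/L)**2 (hypothetical helper, returns a new array)."""
    factor = (channel_width_out / channel_width_in) ** 2
    return np.asarray(area_um, dtype=float) * factor


# Doubling the channel width quadruples the area
area_corr = scale_area_um_sketch([10.0, 25.0], 20.0, 40.0)
```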

dclab.features.emodulus.scale_linear.scale_emodulus(emodulus, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]

Perform scale conversion for emodulus (linear elastic model)

The conversion formula is described in [MOG+15].

Parameters
  • emodulus (ndarray) – Young’s Modulus [kPa]

  • channel_width_in (float) – Original channel width [µm]

  • channel_width_out (float) – Target channel width [µm]

  • flow_rate_in (float) – Original flow rate [µL/s]

  • flow_rate_out (float) – Target flow rate [µL/s]

  • viscosity_in (float) – Original viscosity [mPa*s]

  • viscosity_out (float or ndarray) – Target viscosity [mPa*s]; this can be an array

  • inplace (bool) – If True, override input arrays with corrected data

Returns

emodulus_corr – Scaled emodulus [kPa]

Return type

ndarray

dclab.features.emodulus.scale_linear.scale_feature(feat, data, inplace=False, **scale_kw)[source]

Convenience function for scale conversions (linear elastic model)

This method wraps around all the other scale_* methods and also supports deform/circ.

Parameters
  • feat (str) – Valid scalar feature name

  • data (float or ndarray) – Feature data

  • inplace (bool) – If True, override input arrays with corrected data

  • **scale_kw – Scale keyword arguments for the wrapped methods

dclab.features.emodulus.scale_linear.scale_volume(volume, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]

Perform scale conversion for volume (linear elastic model)

The volume scales with the characteristic length “channel radius” L according to (L’/L)³.

Parameters
  • volume (ndarray) – Volume [µm³]

  • channel_width_in (float) – Original channel width [µm]

  • channel_width_out (float) – Target channel width [µm]

  • inplace (bool) – If True, override input arrays with corrected data

  • kwargs – not used

Returns

volume_corr – Scaled volume [µm³]

Return type

ndarray
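The length-based scaling rules of this module can be summarized as a power of the ratio L’/L: 2 for area_um, 3 for volume, and 0 for the dimensionless deform and circ (convert() returns deform unchanged). The following hypothetical dispatcher illustrates this in the spirit of scale_feature. It deliberately leaves out emodulus, whose conversion also depends on flow rate and viscosity, and it is not dclab's code.

```python
import numpy as np

# Exponent of the length ratio L'/L per feature (illustrative assumption
# based on the scaling rules documented above)
SCALE_EXPONENTS = {"area_um": 2, "volume": 3, "deform": 0, "circ": 0}


def scale_feature_sketch(feat, data, channel_width_in, channel_width_out):
    """Hypothetical scale conversion dispatcher (not dclab's code)."""
    if feat not in SCALE_EXPONENTS:
        raise KeyError(f"No scale conversion defined for {feat!r} in this sketch")
    factor = (channel_width_out / channel_width_in) ** SCALE_EXPONENTS[feat]
    return np.asarray(data, dtype=float) * factor


volume_corr = scale_feature_sketch("volume", [1.0, 2.0], 20.0, 40.0)
deform_corr = scale_feature_sketch("deform", [0.05], 20.0, 40.0)
```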

Viscosity computation for various media

exception dclab.features.emodulus.viscosity.TemperatureOutOfRangeWarning[source]
dclab.features.emodulus.viscosity.get_viscosity(medium='CellCarrier', channel_width=20.0, flow_rate=0.16, temperature=23.0)[source]

Returns the viscosity for RT-DC-specific media

Media that are not pure (e.g. ketchup or polymer solutions) often exhibit a non-linear relationship between shear rate (determined by the velocity profile) and shear stress (determined by pressure differences). If the shear stress grows sub-linearly with the shear rate, i.e. with a slope of less than one in log-log space, the medium is called shear-thinning. Its viscosity is then no longer constant (as it is e.g. for water) but decreases at higher flow rates, following a power law. Christoph Herold characterized shear thinning for the CellCarrier media [Her17]. The resulting formulae for computing the viscosities of these media at different channel widths, flow rates, and temperatures are implemented here.

Parameters
  • medium (str) – The medium to compute the viscosity for; valid values are defined in KNOWN_MEDIA.

  • channel_width (float) – The channel width in µm

  • flow_rate (float) – Flow rate in µL/s

  • temperature (float or ndarray) – Temperature in °C

Returns

viscosity – Viscosity in mPa*s

Return type

float or ndarray

Notes

  • CellCarrier and CellCarrier B media are optimized for RT-DC measurements.

  • Values for the viscosity of water are computed using equation (15) from [KSW78].

  • A TemperatureOutOfRangeWarning is issued if the input temperature range exceeds the temperature ranges given by [Her17] and [KSW78].

dclab.features.emodulus.viscosity.KNOWN_MEDIA = ['CellCarrier', 'CellCarrierB', 'water']

Media for which computation of viscosity is defined

fluorescence

dclab.features.fl_crosstalk.correct_crosstalk(fl1, fl2, fl3, fl_channel, ct21=0, ct31=0, ct12=0, ct32=0, ct13=0, ct23=0)[source]

Perform crosstalk correction

Parameters
  • fli (int, float, or np.ndarray) – Measured fluorescence signals

  • fl_channel (int (1, 2, or 3)) – The channel number for which the crosstalk-corrected signal should be computed

  • cij (float) – Spill (crosstalk or bleed-through) from channel i to channel j. This spill is computed from the fluorescence signal of e.g. single-stained positive control cells. It is defined by the ratio of the fluorescence signals of the two channels, i.e. cij = flj / fli.

See also

get_compensation_matrix

compute the inverse crosstalk matrix

Notes

If there are only two channels (e.g. fl1 and fl2), then the crosstalk to and from the other channel (ct31, ct32, ct13, ct23) should be set to zero.

dclab.features.fl_crosstalk.get_compensation_matrix(ct21, ct31, ct12, ct32, ct13, ct23)[source]

Compute crosstalk inversion matrix

The spillover matrix is

| c11 c12 c13 |
| c21 c22 c23 |
| c31 c32 c33 |

The diagonal elements are set to 1, i.e.

c11 = c22 = c33 = 1

Parameters

cij (float) – Spill from channel i to channel j

Returns

inv – Compensation matrix (inverted spillover matrix)

Return type

np.ndarray
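The spillover matrix and its inverse can be sketched with NumPy. The sketch assumes the model measured_j = Σ_i cij · true_i, which is consistent with the definition cij = flj / fli for a cell stained only in channel i. The names `spillover_matrix_sketch` and `correct_crosstalk_sketch` are hypothetical, and dclab may arrange the computation differently.

```python
import numpy as np


def spillover_matrix_sketch(ct21=0, ct31=0, ct12=0, ct32=0, ct13=0, ct23=0):
    """Spillover matrix with cij in row i, column j and ones on the diagonal."""
    return np.array([[1.0, ct12, ct13],
                     [ct21, 1.0, ct23],
                     [ct31, ct32, 1.0]])


def correct_crosstalk_sketch(fl1, fl2, fl3, fl_channel, **ct):
    """Hypothetical crosstalk correction (not dclab's code).

    Assumed model: measured = C.T @ true, with C the spillover matrix.
    """
    comp = np.linalg.inv(spillover_matrix_sketch(**ct))  # compensation matrix
    measured = np.array([fl1, fl2, fl3], dtype=float)  # shape (3,) or (3, N)
    true = comp.T @ measured  # inv(C.T) equals inv(C).T
    return true[fl_channel - 1]


true = np.array([100.0, 50.0, 10.0])
ct = {"ct21": 0.1, "ct12": 0.2, "ct32": 0.05}
measured = spillover_matrix_sketch(**ct).T @ true
fl1_corr = correct_crosstalk_sketch(*measured, 1, **ct)
```

Mixing known signals with the spillover matrix and then correcting them is a convenient round-trip check: the corrected values should reproduce the original signals.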

isoelastics

Isoelastics management

exception dclab.isoelastics.IsoelasticsEmodulusMeaninglessWarning[source]
class dclab.isoelastics.AutoRecursiveDict[source]
class dclab.isoelastics.Isoelastics(paths=None)[source]

Isoelasticity line management

Parameters
  • paths (list of pathlib.Path or list of str) – list of paths to files containing isoelasticity lines (see e.g. ISOFILES)

Changed in version 0.24.0: The isoelasticity lines of the analytical model [MOG+15] and the linear-elastic numerical model [MMM+17] were recomputed with an equidistant spacing. The metadata section of the text file format was restructured.

add(isoel, col1, col2, channel_width, flow_rate, viscosity, method=None, lut_identifier=None)[source]

Add isoelastics

Parameters
  • isoel (list of ndarrays) – Each list item represents one isoelastic line stored as an array of shape (N,3). The last column contains the emodulus data.

  • col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])

  • col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])

  • channel_width (float) – Channel width in µm

  • flow_rate (float) – Flow rate through the channel in µL/s

  • viscosity (float) – Viscosity of the medium in mPa*s

  • method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0; please use lut_identifier instead.

  • lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.

Notes

The following isoelastics are automatically added for user convenience:

  • isoelastics with col1 and col2 interchanged

  • isoelastics for circularity if deformation was given
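The two automatic additions listed above can be illustrated with a small sketch: swapping the first two columns, and deriving circularity via deform = 1 - circ. The name `expand_isoelastics_sketch` and the dictionary layout keyed by column pairs are hypothetical and are not how the Isoelastics class stores its data internally.

```python
import numpy as np


def expand_isoelastics_sketch(isoel, col1, col2):
    """Derive the extra isoelastics listed in the Notes (hypothetical helper).

    Returns a dict mapping (col1, col2) to a list of (N, 3) arrays.
    """
    out = {(col1, col2): [np.array(line, dtype=float) for line in isoel]}
    # Interchange the first two columns; the emodulus column stays last
    out[(col2, col1)] = [line[:, [1, 0, 2]] for line in out[(col1, col2)]]
    # Circularity from deformation, using deform = 1 - circ
    for (c1, c2), lines in list(out.items()):
        if "deform" not in (c1, c2):
            continue
        icol = (c1, c2).index("deform")
        circ_lines = []
        for line in lines:
            line = line.copy()
            line[:, icol] = 1 - line[:, icol]
            circ_lines.append(line)
        key = tuple("circ" if c == "deform" else c for c in (c1, c2))
        out[key] = circ_lines
    return out


isoel = [np.array([[10.0, 0.1, 1.0], [20.0, 0.05, 1.0]])]
expanded = expand_isoelastics_sketch(isoel, "area_um", "deform")
```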

static add_px_err(isoel, col1, col2, px_um, inplace=False)[source]

Undo pixelation correction

Since isoelasticity lines are usually computed directly from the simulation data (e.g. the contour data are not discretized on a grid but are extracted from FEM simulations), they are not affected by pixelation effects as described in [Her17].

If the isoelasticity lines are displayed alongside experimental data (which are affected by pixelation effects), then the lines must be “un”-corrected, i.e. the pixelation error must be added to the lines to match the experimental data.

Parameters
  • isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.

  • col1 (str) – Define the first two columns of each isoelasticity line.

  • col2 (str) – Define the first two columns of each isoelasticity line.

  • px_um (float) – Pixel size [µm]

  • inplace (bool) – If True, do not create a copy of the data in isoel

static convert(isoel, col1, col2, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]

Perform isoelastics scale conversion

Parameters
  • isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.

  • col1 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]

  • col2 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]

  • channel_width_in (float) – Original channel width [µm]

  • channel_width_out (float) – Target channel width [µm]

  • flow_rate_in (float) – Original flow rate [µL/s]

  • flow_rate_out (float) – Target flow rate [µL/s]

  • viscosity_in (float) – Original viscosity [mPa*s]

  • viscosity_out (float) – Target viscosity [mPa*s]

  • inplace (bool) – If True, do not create a copy of the data in isoel

Returns

isoel_scale – The scale-converted isoelasticity lines.

Return type

list of 2d ndarrays of shape (N, 3)

Notes

If only the positions of the isoelastics are of interest and not the value of the elastic modulus, then it is sufficient to supply values for the channel width and set the values for flow rate and viscosity to a constant (e.g. 1).

See also

dclab.features.emodulus.scale_linear.scale_feature

scale conversion method used

get(col1, col2, channel_width, method=None, lut_identifier=None, flow_rate=None, viscosity=None, add_px_err=False, px_um=None)[source]

Get isoelastics

Parameters
  • col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])

  • col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])

  • channel_width (float) – Channel width in µm

  • method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0; please use lut_identifier instead.

  • lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.

  • flow_rate (float or None) – Flow rate through the channel in µL/s. If set to None, the flow rate of the imported data will be used (only do this if you do not need the correct values for elastic moduli).

  • viscosity (float or None) – Viscosity of the medium in mPa*s. If set to None, the viscosity of the imported data will be used (only do this if you do not need the correct values for elastic moduli).

  • add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py

  • px_um (float) – Pixel size [µm], used for pixelation error computation

See also

dclab.features.emodulus.scale_linear.scale_feature

scale conversion method used

dclab.features.emodulus.pxcorr.get_pixelation_delta

pixelation correction (applied to the feature data)

get_with_rtdcbase(col1, col2, dataset, method=None, lut_identifier=None, viscosity=None, add_px_err=False)[source]

Convenience method that extracts the metadata from RTDCBase

Parameters
  • col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])

  • col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])

  • method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0; please use lut_identifier instead.

  • lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.

  • dataset (dclab.rtdc_dataset.RTDCBase) – The dataset from which to obtain the metadata.

  • viscosity (float, None, or False) – Viscosity of the medium in mPa*s. If set to None, the viscosity is computed from the metadata (medium, flow rate, channel width, temperature) in the [setup] config section. If this is not possible, the viscosity of the imported data is used and a warning is issued.

  • add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py

load_data(path)[source]

Load isoelastics from a text file

Parameters

path (str or pathlib.Path) – Path to an isoelasticity lines text file

dclab.isoelastics.check_lut_identifier(lut_identifier, method)[source]

Transitional function that can be removed once method is removed

dclab.isoelastics.get_available_identifiers()[source]

Return a list of available LUT identifiers

dclab.isoelastics.get_default()[source]

Return default isoelasticity lines

dclab.isoelastics.ISOFILES = ['/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/0.39.1/lib/python3.8/site-packages/dclab/isoelastics/isoel-linear-2Daxis-analyt-area_um-deform.txt', '/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/0.39.1/lib/python3.8/site-packages/dclab/isoelastics/isoel-linear-2Daxis-FEM-area_um-deform.txt', '/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/0.39.1/lib/python3.8/site-packages/dclab/isoelastics/isoel-linear-2Daxis-FEM-volume-deform.txt']

List of isoelasticity lines in dclab

kde_contours

dclab.kde_contours.find_contours_level(density, x, y, level, closed=False)[source]

Find iso-valued density contours for a given level value

Parameters
  • density (2d ndarray of shape (M, N)) – Kernel density estimate (KDE) for which to compute the contours

  • x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density

  • y (2d ndarray of shape (M, N) or 1d ndarray of size N) – Y-values corresponding to density

  • level (float between 0 and 1) – Value along which to find contours in density relative to its maximum

  • closed (bool) – Whether to close contours at the KDE support boundaries

Returns

contours – Contours found for the given level value

Return type

list of ndarrays of shape (P, 2)

See also

skimage.measure.find_contours

Contour finding algorithm used

dclab.kde_contours.get_quantile_levels(density, x, y, xp, yp, q, normalize=True)[source]

Compute density levels for given quantiles by interpolation

For a given 2D density, compute the density levels at which the resulting contours contain the fraction 1-q of all data points. E.g. for a measurement of 1000 events, all contours at the level corresponding to a quantile of q=0.95 (95th percentile) contain 50 events (5%).

Parameters
  • density (2d ndarray of shape (M, N)) – Kernel density estimate for which to compute the contours

  • x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density

  • y (2d ndarray of shape (M, N) or 1d ndarray of size N) – Y-values corresponding to density

  • xp (1d ndarray of size D) – Event x-data from which to compute the quantile

  • yp (1d ndarray of size D) – Event y-data from which to compute the quantile

  • q (array_like or float between 0 and 1) – Quantile along which to find contours in density relative to its maximum

  • normalize (bool) – Whether output levels should be normalized to the maximum of density

Returns

level – Contour level(s) corresponding to the given quantile

Return type

np.ndarray or float

Notes

Events with NaN values in xp or yp are ignored.
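The relationship between q and the enclosed fraction of events can be illustrated with a minimal NumPy sketch. It is not dclab's implementation: the helper quantile_level is hypothetical, and the sketch assumes that the density has already been evaluated at each event position (dclab obtains these values by interpolating density at (xp, yp)). The level is the q-th percentile of those per-event densities, so the fraction 1-q of events lies above it.

```python
import numpy as np


def quantile_level(density_at_events, q, density_max=None):
    """Density level(s) whose contours enclose the fraction 1-q of events.

    density_at_events: density evaluated at each event position (1d array)
    q: quantile(s) between 0 and 1
    density_max: if given, the level is normalized to this maximum
    """
    dp = np.asarray(density_at_events, dtype=float)
    if density_max is not None:
        dp = dp / density_max
    # NaN densities (invalid events) are ignored by nanpercentile
    return np.nanpercentile(dp, np.asarray(q) * 100)


# 1000 synthetic events drawn from a standard 2D normal distribution
rng = np.random.default_rng(42)
xp, yp = rng.normal(size=(2, 1000))
# Exact (unnormalized) Gaussian density at the event positions
dp = np.exp(-0.5 * (xp**2 + yp**2))

level = quantile_level(dp, 0.95)
inside = np.sum(dp > level)  # events in the high-density region
print(inside)  # 50, i.e. 5 % of the 1000 events
```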

kde_methods

Kernel Density Estimation methods

dclab.kde_methods.bin_num_doane(a)[source]

Compute number of bins based on Doane’s formula

Notes

If the bin width cannot be determined, then a bin number of 5 is returned.

See also

bin_width_doane

method used to compute the bin width

dclab.kde_methods.bin_width_doane(a)[source]

Compute contour spacing based on Doane’s formula

References

Notes

Doane’s formula is actually designed for histograms. This function is kept here for backwards-compatibility reasons. It is highly recommended to use bin_width_percentile() instead.
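Doane's formula estimates the number of classes k from the sample size n and the sample skewness g1 as k = 1 + log2(n) + log2(1 + |g1|/σ_g1), with σ_g1 = sqrt(6(n - 2)/((n + 1)(n + 3))). The NumPy sketch below turns k into a bin width (data range divided by k) and a bin number, including the fallback of 5 bins mentioned above. The function names are hypothetical and dclab's handling of edge cases may differ.

```python
import numpy as np


def doane_bin_width(a):
    """Bin width from Doane's formula (NaN and inf values are ignored)."""
    data = np.asarray(a, dtype=float)
    data = data[np.isfinite(data)]
    n = data.size
    # Biased (population) sample skewness g1
    dev = data - data.mean()
    g1 = np.mean(dev**3) / np.mean(dev**2) ** 1.5
    sigma_g1 = np.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + np.log2(n) + np.log2(1 + abs(g1) / sigma_g1)
    return (data.max() - data.min()) / k


def doane_bin_num(a, fallback=5):
    """Number of bins; returns `fallback` if no bin width can be computed."""
    data = np.asarray(a, dtype=float)
    data = data[np.isfinite(data)]
    with np.errstate(all="ignore"):
        width = doane_bin_width(data)
    if not np.isfinite(width) or width == 0:
        return fallback
    return int(round((data.max() - data.min()) / width))


print(doane_bin_width([1, 2, 3, 4, 5]))  # 4 / (1 + log2(5)), approx. 1.204
print(doane_bin_num([1, 2, 3, 4, 5]))    # 3
print(doane_bin_num([7, 7, 7]))          # 5 (zero spread, fallback)
```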

dclab.kde_methods.bin_width_percentile(a)[source]

Compute contour spacing based on data percentiles

The 10th and the 90th percentiles of the input data are computed. The spacing is then the difference between these two percentiles divided by 23.

Notes

The Freedman–Diaconis rule uses the interquartile range and divides it by the cube root of len(a). This does not work well for RT-DC data, because len(a) is huge and the resulting spacing becomes very small. Here, only the 10th and 90th percentiles are used, with a fixed normalization.
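A NumPy sketch of the percentile rule described above. The function name is hypothetical, and dropping non-finite values first is an assumption added for robustness. Unlike the Freedman–Diaconis width, the result does not shrink when the same data are simply repeated many times, as the last line shows.

```python
import numpy as np


def percentile_bin_width(a):
    """Contour spacing: (90th - 10th percentile) / 23, ignoring NaN and inf."""
    data = np.asarray(a, dtype=float)
    data = data[np.isfinite(data)]
    p10, p90 = np.percentile(data, [10, 90])
    return (p90 - p10) / 23


a = np.arange(101.0)  # p10 = 10, p90 = 90
print(percentile_bin_width(a))               # 80 / 23, approx. 3.478
print(percentile_bin_width(np.tile(a, 10)))  # same value for 10x the events
```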

dclab.kde_methods.get_bad_vals(x, y)[source]
dclab.kde_methods.ignore_nan_inf(kde_method)[source]

Decorator that ignores NaN and inf values in the input data

Invalid positions in the resulting density are set to nan.
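The decorator pattern can be sketched as follows. This is an illustrative re-implementation, not the code shipped with dclab: bad_values is a hypothetical helper analogous to get_bad_vals, and the wrapped method is assumed to accept xout=None and yout=None. Invalid events are dropped before the wrapped KDE method is called, and the density at invalid output positions is set to NaN.

```python
import functools

import numpy as np


def bad_values(x, y):
    """Boolean mask that is True where x or y is NaN or inf."""
    return ~(np.isfinite(x) & np.isfinite(y))


def ignore_nan_inf_sketch(kde_method):
    @functools.wraps(kde_method)
    def wrapper(events_x, events_y, xout=None, yout=None, *args, **kwargs):
        events_x = np.asarray(events_x, dtype=float)
        events_y = np.asarray(events_y, dtype=float)
        bad_in = bad_values(events_x, events_y)
        if xout is None:
            # Density is evaluated at the (valid) event positions
            density = np.zeros(events_x.shape)
            bad_out = bad_in
            xo = yo = None
        else:
            xout = np.asarray(xout, dtype=float)
            yout = np.asarray(yout, dtype=float)
            density = np.zeros(xout.shape)
            bad_out = bad_values(xout, yout)
            xo, yo = xout[~bad_out], yout[~bad_out]
        # Only valid events are passed on to the wrapped KDE method
        density[~bad_out] = kde_method(events_x[~bad_in], events_y[~bad_in],
                                       xo, yo, *args, **kwargs)
        density[bad_out] = np.nan
        return density
    return wrapper


@ignore_nan_inf_sketch
def kde_ones(events_x, events_y, xout=None, yout=None):
    """Toy KDE method (similar to kde_none): ones at every requested position."""
    x = events_x if xout is None else xout
    return np.ones(x.shape)


result = kde_ones(np.array([1.0, np.nan, 3.0]), np.array([1.0, 2.0, np.inf]))
print(result)  # 1 for the valid event, NaN for the two invalid ones
```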

dclab.kde_methods.kde_gauss(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]

Gaussian Kernel Density Estimation

Parameters
  • events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • xout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

  • yout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

Returns

density – The KDE for the points in (xout, yout)

Return type

ndarray, same shape as xout

See also

scipy.stats.gaussian_kde

Notes

This is a wrapped version that ignores nan and inf values.
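To illustrate what a Gaussian KDE computes, here is a self-contained NumPy sketch. It is deliberately simpler than scipy.stats.gaussian_kde, which uses the full data covariance: it assumes an axis-aligned Gaussian kernel whose per-axis bandwidth is Scott's factor n^(-1/6) times the standard deviation of that axis. The function name is hypothetical.

```python
import numpy as np


def gauss_kde_sketch(events_x, events_y, xout=None, yout=None):
    """Axis-aligned Gaussian KDE evaluated at (xout, yout)."""
    x = np.asarray(events_x, dtype=float).ravel()
    y = np.asarray(events_y, dtype=float).ravel()
    if xout is None:
        xout, yout = x, y
    xo = np.asarray(xout, dtype=float)
    yo = np.asarray(yout, dtype=float)
    # Scott's rule in two dimensions: factor = n ** (-1 / (d + 4))
    factor = x.size ** (-1.0 / 6.0)
    hx = factor * x.std(ddof=1)
    hy = factor * y.std(ddof=1)
    # Pairwise scaled distances between output points and events
    dx = (xo.ravel()[:, None] - x[None, :]) / hx
    dy = (yo.ravel()[:, None] - y[None, :]) / hy
    kernel = np.exp(-0.5 * (dx**2 + dy**2)) / (2 * np.pi * hx * hy)
    # Average the kernels of all events; output has the shape of xout
    return kernel.mean(axis=1).reshape(xo.shape)


rng = np.random.default_rng(0)
ex, ey = rng.normal(size=(2, 500))
gx, gy = np.meshgrid(np.linspace(-6, 6, 61), np.linspace(-6, 6, 61))
density = gauss_kde_sketch(ex, ey, gx, gy)
# A probability density integrates to approximately 1 (grid spacing 0.2)
print(density.shape, density.sum() * 0.2 * 0.2)
```

The pairwise approach needs memory proportional to the number of output points times the number of events, so it only suits small examples like this one.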

dclab.kde_methods.kde_histogram(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]

Histogram-based Kernel Density Estimation

Parameters
  • events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • xout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

  • yout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

  • bins (tuple (binsx, binsy)) – The number of bins to use for the histogram.

Returns

density – The KDE for the points in (xout, yout)

Return type

ndarray, same shape as xout

See also

numpy.histogram2d, scipy.interpolate.RectBivariateSpline

Notes

This is a wrapped version that ignores nan and inf values.
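The histogram-based idea can be sketched with numpy.histogram2d. This is a simplified, hypothetical version: it looks up the histogram bin that contains each output point, without any interpolation between bins, and assigns zero density outside the histogram range.

```python
import numpy as np


def histogram_kde_sketch(events_x, events_y, xout=None, yout=None, bins=(10, 10)):
    """Histogram density looked up at (xout, yout); bins = (binsx, binsy)."""
    x = np.asarray(events_x, dtype=float).ravel()
    y = np.asarray(events_y, dtype=float).ravel()
    if xout is None:
        xout, yout = x, y
    xo = np.asarray(xout, dtype=float).ravel()
    yo = np.asarray(yout, dtype=float).ravel()
    # Normalized 2D histogram: sum(hist * bin_area) == 1
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, density=True)
    # Index of the bin containing each output point (last bin is closed)
    ix = np.clip(np.searchsorted(xedges, xo, side="right") - 1, 0, bins[0] - 1)
    iy = np.clip(np.searchsorted(yedges, yo, side="right") - 1, 0, bins[1] - 1)
    density = hist[ix, iy]
    # Output points outside the histogram range get zero density
    outside = ((xo < xedges[0]) | (xo > xedges[-1])
               | (yo < yedges[0]) | (yo > yedges[-1]))
    density[outside] = 0.0
    return density.reshape(np.shape(xout))


ex = np.array([0.0, 0.0, 1.0, 1.0])
ey = np.array([0.0, 1.0, 0.0, 1.0])
print(histogram_kde_sketch(ex, ey, bins=(2, 2)))              # [1. 1. 1. 1.]
print(histogram_kde_sketch(ex, ey, [5.0], [5.0], bins=(2, 2)))  # [0.]
```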

dclab.kde_methods.kde_multivariate(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]

Multivariate Kernel Density Estimation

Parameters
  • events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • bw (tuple (bwx, bwy) or None) – The bandwidth for kernel density estimation.

  • xout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

  • yout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

Returns

density – The KDE for the points in (xout, yout)

Return type

ndarray, same shape as xout

See also

statsmodels.nonparametric.kernel_density.KDEMultivariate

Notes

This is a wrapped version that ignores nan and inf values.

dclab.kde_methods.kde_none(events_x, events_y, xout=None, yout=None)[source]

No Kernel Density Estimation

Parameters
  • events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.

  • xout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

  • yout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

Returns

density – The KDE for the points in (xout, yout)

Return type

ndarray, same shape as xout

Notes

This convenience method always returns ones, in the shape that the other methods in this module produce.

polygon_filter

exception dclab.polygon_filter.FilterIdExistsWarning[source]
exception dclab.polygon_filter.PolygonFilterError[source]
class dclab.polygon_filter.PolygonFilter(axes=None, points=None, inverted=False, name=None, filename=None, fileid=0, unique_id=None)[source]

An object for filtering RT-DC data based on a polygonal area

Parameters
  • axes (tuple of str) – The axes/features on which the polygon is defined. The first axis is the x-axis. Example: (“area_um”, “deform”).

  • points (array-like object of shape (N,2)) – The N coordinates (x,y) of the polygon. The order of the points matters.

  • inverted (bool) – Invert the polygon filter. This parameter is overridden if filename is given.

  • name (str) – A name for the polygon (optional).

  • filename (str) – A path to a .poly file as created by this class’s save method. If filename is given, all other parameters are ignored.

  • fileid (int) – Which filter to import from the file (starting at 0).

  • unique_id (int) – An integer defining the unique id of the new instance.

Notes

The minimal arguments to this class are either filename OR (axes, points). If filename is set, all parameters are taken from the given .poly file.

static clear_all_filters()[source]

Remove all filters and reset instance counter

copy(invert=False)[source]

Return a copy of the current instance

Parameters

invert (bool) – The copy will be inverted w.r.t. the original

filter(datax, datay)[source]

Filter a set of datax and datay according to self.points

static get_instance_from_id(unique_id)[source]

Get an instance of the PolygonFilter using a unique id

static import_all(path)[source]

Import all polygons from a .poly file.

Returns a list of the imported polygon filters

static instace_exists(unique_id)[source]

Determine whether an instance with this unique id exists

static point_in_poly(p, poly)[source]

Determine whether a point is within a polygon area

Uses the ray casting algorithm.

Parameters
  • p (tuple of floats) – Coordinates of the point

  • poly (array_like of shape (N, 2)) – Polygon (PolygonFilter.points)

Returns

inside – True if the point is inside.

Return type

bool

Notes

If p lies on a side of the polygon, it is defined as

  • “inside” if it is on the lower or left side

  • “outside” if it is on the top or right side

Changed in version 0.24.1: The new version uses the cython implementation from scikit-image. In the old version, the inside/outside definition was the other way around. To avoid having to modify upstream code, the scikit-image convention was adopted.
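The ray casting test can be sketched in pure Python: a horizontal ray is cast from the point towards positive x, and the polygon edges it crosses are counted; an odd count means inside. This is a textbook version for illustration with a hypothetical function name, not the scikit-image code that dclab uses. For the unit square below, its half-open comparisons happen to reproduce the edge convention described in the Notes (lower/left edges inside, top/right edges outside).

```python
def point_in_poly_sketch(p, poly):
    """Even-odd (ray casting) test for point p = (x, y) and polygon vertices poly."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_poly_sketch((0.5, 0.5), square))  # True
print(point_in_poly_sketch((1.5, 0.5), square))  # False
```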

static remove(unique_id)[source]

Remove a polygon filter from PolygonFilter.instances

save(polyfile, ret_fobj=False)[source]

Save all data to a text file (data are appended if the file exists).

polyfile can be either a path to a file or a file object that was opened in write mode (“w”). By passing a file object, multiple instances of this class can write their data to the same file.

If ret_fobj is True, the file object is not closed but returned.

static save_all(polyfile)[source]

Save all polygon filters

static unique_id_exists(pid)[source]

Whether or not a filter with this unique id exists

property hash

Hash of axes, points, and inverted

instances – list of all registered PolygonFilter instances (class attribute)
property points
dclab.polygon_filter.get_polygon_filter_names()[source]

Get the names of all polygon filters in the order of creation

statistics

Statistics computation for RT-DC dataset instances

exception dclab.statistics.BadMethodWarning[source]
class dclab.statistics.Statistics(name, method, req_feature=False)[source]

A helper class for computing statistics

All statistical methods are registered in the dictionary Statistics.available_methods.

get_feature(ds, feat)[source]

Return filtered feature data

The features are filtered according to the user-defined filters, using the information in ds.filter.all. In addition, all nan and inf values are purged.

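Parameters
  • ds (dclab.rtdc_dataset.RTDCBase) – The dataset from which to retrieve the feature

  • feat (str) – Name of the feature
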
available_methods – dict mapping the method names '%-gated', 'Events', 'Flow rate', 'Mean', 'Median', 'Mode', and 'SD' to Statistics instances
dclab.statistics.flow_rate(ds)[source]

Return the flow rate of an RT-DC dataset

dclab.statistics.get_statistics(ds, methods=None, features=None)[source]

Compute statistics for an RT-DC dataset

Parameters
  • ds (dclab.rtdc_dataset.RTDCBase) – The dataset for which to compute the statistics.

  • methods (list of str or None) – The methods with which to compute the statistics. The list of available methods is given by dclab.statistics.Statistics.available_methods.keys(). If set to None, statistics for all methods are computed.

  • features (list of str or None) – Valid feature names are those accepted by dclab.definitions.feature_exists. If set to None, statistics for all available scalar features are computed.

Returns

  • header (list of str) – The header (feature + method names) of the computed statistics.

  • values (list of float) – The computed statistics.

dclab.statistics.mode(data)[source]

Compute an intelligent value for the mode

The most common value in experimental data is not very useful if there are many digits after the decimal point. This method addresses this issue by rounding to a bin size that is determined by the Freedman–Diaconis rule.

Parameters

data (1d ndarray) – The data for which the mode should be computed.

Returns

mode – The mode computed with the Freedman-Diaconis rule.

Return type

float
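A NumPy sketch of this idea, assuming the Freedman–Diaconis bin width 2·IQR/n^(1/3) and returning the center of the most populated bin. The function name is hypothetical, and dclab's exact rounding may differ from this sketch.

```python
import numpy as np


def fd_mode(data):
    """Mode estimate: center of the most populated Freedman-Diaconis bin."""
    data = np.asarray(data, dtype=float)
    q25, q75 = np.percentile(data, [25, 75])
    bin_size = 2 * (q75 - q25) / data.size ** (1 / 3)
    if bin_size == 0:
        return np.nan  # bin size undefined (e.g. constant data)
    # Assign each value to a bin and count the bin occupancies
    idx = np.floor(data / bin_size).astype(int)
    values, counts = np.unique(idx, return_counts=True)
    # Center of the most populated bin
    return (values[np.argmax(counts)] + 0.5) * bin_size


rng = np.random.default_rng(1)
m = fd_mode(rng.normal(loc=5, scale=1, size=10_000))
print(m)  # close to 5
```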

Writing RT-DC files

class dclab.rtdc_dataset.writer.RTDCWriter(path_or_h5file, mode='append', compression='gzip')[source]

RT-DC data writer class

Parameters
  • path_or_h5file (str or pathlib.Path or h5py.Group) – Path to an HDF5 file or an HDF5 file opened in write mode

  • mode (str) –

    Defines how the data are stored:

    • “append”: append new feature data to existing h5py Datasets

    • “replace”: replace existing h5py Datasets with new features (used for ancillary feature storage)

    • “reset”: do not keep any previous data

  • compression (str or None) – Compression method used for data storage; one of [None, “lzf”, “gzip”, “szip”].

rectify_metadata()[source]

Autocomplete the metadata of the RT-DC measurement

The following configuration keys are updated:

  • experiment:event count

  • fluorescence:samples per event

  • imaging:roi size x (if image or mask is given)

  • imaging:roi size y (if image or mask is given)

The following configuration keys are added if not present:

  • fluorescence:channel count

store_feature(feat, data)[source]

Write feature data

Parameters
  • feat (str) – feature name

  • data (np.ndarray or list or dict) – feature data

store_log(name, lines)[source]

Write log data

Parameters
  • name (str) – name of the log entry

  • lines (list of str or str) – the text lines of the log

store_metadata(meta)[source]

Store RT-DC metadata

Parameters

meta (dict-like) –

The metadata to store. Each key is the name of a metadata section whose data are given as a dictionary, e.g.:

meta = {"imaging": {"exposure time": 20,
                    "flash duration": 2,
                    # ...
                    },
        "setup": {"channel width": 20,
                  "chip region": "channel",
                  # ...
                  },
        # ...
        }

Only sections and keys registered in dclab are allowed, and their values are converted to the predefined dtype. Only sections from the dclab.definitions.CFG_METADATA dictionary are stored. If you have custom metadata, you can use the “user” section.

version_brand(old_version=None, write_attribute=True)[source]

Perform version branding

Append “ | dclab X.Y.Z” to the “setup:software version” attribute.

Parameters
  • old_version (str or None) – By default, the version string is taken from the HDF5 file. If set to a string, then this version is used instead.

  • write_attribute (bool) – If True (default), write the version string to the “setup:software version” attribute

write_image_grayscale(group, name, data, is_boolean)[source]

Write grayscale image data to an HDF5 dataset

This function wraps RTDCWriter.write_ndarray() and adds image attributes to the HDF5 file so HDFView can display the images properly.

Parameters
  • group (h5py.Group) – parent group

  • name (str) – name of the dataset containing the image data

  • data (np.ndarray or list of np.ndarray) – image data

  • is_boolean (bool) – whether or not the input data is of boolean nature (e.g. mask data) - if so, data are converted to uint8

write_ndarray(group, name, data, dtype=None)[source]

Write n-dimensional array data to an HDF5 dataset

It is assumed that the shape of the array data is correct, i.e. that the shape of data is (number_events, feat_shape_1, …, feat_shape_n).

Parameters
  • group (h5py.Group) – parent group

  • name (str) – name of the dataset containing the data

  • data (np.ndarray) – data

  • dtype (dtype) – the dtype to use for storing the data (defaults to data.dtype)

write_ragged(group, name, data)[source]

Write ragged data (i.e. list of arrays of different lengths)

Ragged array data (e.g. contour data) are stored in a separate group and each entry becomes an HDF5 dataset.

Parameters
  • group (h5py.Group) – parent group

  • name (str) – name of the group in which the data are stored

  • data (list of np.ndarray) – the data in a list

write_text(group, name, lines)[source]

Write text to an HDF5 dataset

Text data are written as a fixed-length string dataset.

Parameters
  • group (h5py.Group) – parent group

  • name (str) – name of the dataset containing the text

  • lines (list of str or str) – the text, line by line

dclab.rtdc_dataset.writer.CHUNK_SIZE = 100

Chunk size for storing HDF5 data

R and lme4

exception dclab.lme4.rlibs.VersionError[source]
class dclab.lme4.rlibs.MockRPackage(exception)[source]
dclab.lme4.rlibs.import_r_submodules()[source]
exception dclab.lme4.rsetup.RNotFoundError[source]
class dclab.lme4.rsetup.AutoRConsole[source]

Helper class for catching R console output

By default, this console always returns “yes” when asked a question. If you need something different, you can subclass and override the consoleread function. The console stream is recorded in self.stream.

close()[source]

Remove the rpy2 monkeypatches

consoleread(prompt)[source]

Read user input, returns “yes” by default

consolewrite_print(s)[source]
consolewrite_warnerror(s)[source]
get_prints()[source]
get_warnerrors()[source]
write_to_stream(topic, s)[source]
lock = False
perform_lock = True
dclab.lme4.rsetup.check_r()[source]

Make sure R is installed and R_HOME is set

dclab.lme4.rsetup.get_r_path()[source]

Get the path of the R executable/binary from rpy2

dclab.lme4.rsetup.get_r_version()[source]
dclab.lme4.rsetup.has_lme4()[source]

Return True if the lme4 package is installed

dclab.lme4.rsetup.has_r()[source]

Return True if R is available

dclab.lme4.rsetup.import_lme4()[source]
dclab.lme4.rsetup.install_lme4()[source]

Install the lme4 package (if not already installed)

The packages are installed to the user data directory given in lib_path.

dclab.lme4.rsetup.set_r_path(r_path)[source]

Set the path of the R executable/binary for rpy2

R lme4 wrapper

exception dclab.lme4.wrapr.Lme4InstallWarning[source]
class dclab.lme4.wrapr.Rlme4(model='lmer', feature='deform')[source]

Perform an R-lme4 analysis with RT-DC data

Parameters
  • model (str) –

    One of:

    • “lmer”: linear mixed model using lme4’s lmer

    • “glmer+loglink”: generalized linear mixed model using lme4’s glmer with a log-link function via the family=Gamma(link='log') keyword.

  • feature (str) – dclab feature for which to compute the model

add_dataset(ds, group, repetition)[source]

Add a dataset to the analysis list

Parameters
  • ds (RTDCBase) – Dataset

  • group (str) – The group the measurement belongs to (“control” or “treatment”)

  • repetition (int) – Repetition of the measurement

Notes

  • For each repetition, there must be a “treatment” and a “control” group.

  • If you would like to perform a differential feature analysis, then you need to pass at least a reservoir and a channel dataset (with the same parameters for group and repetition).

check_data()[source]

Perform sanity checks on self.data

fit(model=None, feature=None)[source]

Perform (generalized) linear mixed-effects model fit

The response variable is modeled using two linear mixed effect models:

Both models are compared in R using “anova” (from the R-package “stats” [Eve92]), which performs a likelihood ratio test to obtain the p-value for the significance of the fixed effect (treatment).

If the input datasets contain data from the “reservoir” region, then the analysis is performed for the differential feature.

Parameters
  • model (str (optional)) –

    One of:

    • “lmer”: linear mixed model using lme4’s lmer

    • “glmer+loglink”: generalized linear mixed model using lme4’s glmer with an additional log-link function via family=Gamma(link='log') [BMBW15]

  • feature (str (optional)) – dclab feature for which to compute the model

Returns

results – Dictionary with the results of the fitting process:

  • “anova p-value”: Anova likelihood ratio test (significance)

  • “feature”: name of the feature used for the analysis self.feature

  • “fixed effects intercept”: Mean of self.feature for all controls; in the case of the “glmer+loglink” model, the intercept is already backtransformed from log space.

  • “fixed effects treatment”: The fixed effect size between the mean of the controls and the mean of the treatments relative to “fixed effects intercept”; in the case of the “glmer+loglink” model, the fixed effect is already backtransformed from log space.

  • “fixed effects repetitions”: The effects (intercept and treatment) for each repetition. The first axis defines intercept/treatment; the second axis enumerates the repetitions; thus the shape is (2, number of repetitions) and np.mean(results["fixed effects repetitions"], axis=1) is equivalent to the tuple (results["fixed effects intercept"], results["fixed effects treatment"]) for the “lmer” model. This does not hold for the “glmer+loglink” model, because of the non-linear inverse transform back from log space.

  • “is differential”: Boolean indicating whether or not the analysis was performed for the differential (bootstrapped and subtracted reservoir from channel data) feature

  • “model”: model name used for the analysis self.model

  • “model converged”: boolean indicating whether the model converged

  • “r anova”: Anova model (exposed from R)

  • “r model summary”: Summary of the model (exposed from R)

  • “r model coefficients”: Model coefficient table (exposed from R)

  • “r stderr”: errors and warnings from R

  • “r stdout”: standard output from R

Return type

dict

get_differential_dataset()[source]

Return the differential dataset for channel/reservoir data

The most common use case is differential deformation. The idea is that you cannot directly measure the difference in deformation between channel and reservoir, because you never measure the same object in both the reservoir and the channel; you usually just have two distributions. Comparing distributions is possible via bootstrapping. Instead of running the lme4 analysis with the channel deformation data, it is then run with the differential deformation (the bootstrapped reservoir deformation distribution subtracted from the bootstrapped channel deformation distribution).

get_feature_data(group, repetition, region='channel')[source]

Return array containing feature data

Parameters
  • group (str) – Measurement group (“control” or “treatment”)

  • repetition (int) – Measurement repetition

  • region (str) – Either “channel” or “reservoir”

Returns

fdata – Feature data (NaNs and infs removed)

Return type

1d ndarray

is_differential()[source]

Return True if the differential feature is computed for analysis

This effectively just checks the regions of the datasets and returns True if any one of the regions is “reservoir”.

See also

get_differential_dataset

for an explanation

set_options(model=None, feature=None)[source]

Set analysis options

data

list of [RTDCBase, column, repetition, chip_region]

feature

dclab feature for which to perform the analysis

model

modeling method to use (e.g. “lmer”)

r_func_model

model function

r_func_nullmodel

null model function

dclab.lme4.wrapr.bootstrapped_median_distributions(a, b, bs_iter=1000, rs=117)[source]

Compute the bootstrapped distributions for two arrays.

Parameters
  • a (1d ndarray of length N) – Input data

  • b (1d ndarray of length N) – Input data

  • bs_iter (int) – Number of bootstrapping iterations to perform (output size).

  • rs (int) – Random state seed for random number generator

Returns

median_dist_a, median_dist_b – Bootstrap distribution of medians for a and b.

Return type

1d arrays of length bs_iter

Notes

From a programming point of view, it would have been better to implement this method for just one input array (to avoid redundant code). However, for historical reasons (testing and comparability to Shape-Out 1), bootstrapping is done interleaved for the two arrays.
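The interleaved bootstrapping can be sketched with NumPy. This is an illustration, not dclab's code: the function name is hypothetical, NumPy's legacy RandomState is used here only as one way to obtain a reproducible stream of indices, and the input data are synthetic. The last lines show how a differential distribution (channel minus reservoir), as described for get_differential_dataset, could be formed from the two outputs.

```python
import numpy as np


def bootstrap_medians(a, b, bs_iter=1000, rs=117):
    """Bootstrap median distributions for a and b, drawing indices interleaved."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    prng = np.random.RandomState(rs)  # seeded for reproducibility
    med_a = np.zeros(bs_iter)
    med_b = np.zeros(bs_iter)
    for i in range(bs_iter):
        # Resample with replacement: first a, then b, within one iteration
        med_a[i] = np.median(a[prng.randint(0, a.size, a.size)])
        med_b[i] = np.median(b[prng.randint(0, b.size, b.size)])
    return med_a, med_b


rng = np.random.default_rng(7)
channel = rng.normal(0.02, 0.005, 2000)    # synthetic channel deformation
reservoir = rng.normal(0.01, 0.005, 2000)  # synthetic reservoir deformation
med_ch, med_res = bootstrap_medians(channel, reservoir, bs_iter=200)
differential = med_ch - med_res  # bootstrapped differential distribution
print(differential.mean())  # approximately 0.01
```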

Machine learning

New in version 0.38.0.

class dclab.rtdc_dataset.feat_anc_ml.ml_feature.MachineLearningFeature(feature_name, dc_model, modc_path=None)[source]

A user-defined machine-learning feature

Parameters

Notes

MachineLearningFeature inherits from AncillaryFeature.

dclab.rtdc_dataset.feat_anc_ml.ml_feature.load_ml_feature(modc_path)[source]

Find and load MachineLearningFeature(s) from a .modc file

Parameters

modc_path (str or Path) – pathname to a .modc file

Returns

ml_list – list of MachineLearningFeature instances loaded from modc_path

Return type

list of MachineLearningFeature

See also

MachineLearningFeature

class handling the plugin feature information

dclab.rtdc_dataset.feat_anc_ml.ml_feature.remove_all_ml_features()[source]

Convenience function for removing all MachineLearningFeature instances

See also

remove_ml_feature

remove a single MachineLearningFeature instance

dclab.rtdc_dataset.feat_anc_ml.ml_feature.remove_ml_feature(ml_instance)[source]

Convenience function for removing a MachineLearningFeature instance

Parameters

ml_instance (MachineLearningFeature) – The MachineLearningFeature instance to be removed from dclab

Raises

TypeError – If the ml_instance is not a MachineLearningFeature instance

dclab.rtdc_dataset.feat_anc_ml.ml_libs.import_or_mock_package(name, min_version)[source]

Reading and writing trained machine learning models for dclab

exception dclab.rtdc_dataset.feat_anc_ml.modc.ModelFormatExportFailedWarning[source]
dclab.rtdc_dataset.feat_anc_ml.modc.export_model(path, model, enforce_formats=None)[source]

Export an ML model to all possible formats

The model must be exportable with at least one method listed by BaseModel.all_formats().

Parameters
  • path (str or pathlib.Path) – Directory where the model is stored to. For each supported format, a new subdirectory or file is created.

  • model (An instance of an ML model, NOT dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel) – Trained model instance

  • enforce_formats (list of str) – Enforced file formats for export. If the export for one of these file formats fails, a ValueError is raised.

dclab.rtdc_dataset.feat_anc_ml.modc.hash_path(path)[source]

Create a SHA256 hash of a file or all files in a directory

The files are sorted before hashing for reproducibility.
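A standard-library sketch of hashing a file or directory reproducibly. It is not dclab's implementation: the function name is hypothetical, and including each file's relative path in the hash (so that renaming a file changes the digest) is a design assumption of this sketch.

```python
import hashlib
import pathlib
import tempfile


def hash_path_sketch(path):
    """SHA256 hex digest of a file or of all files below a directory."""
    path = pathlib.Path(path)
    hasher = hashlib.sha256()
    if path.is_dir():
        base = path
        # Sort so that the digest does not depend on file system order
        files = sorted(p for p in path.rglob("*") if p.is_file())
    else:
        base = path.parent
        files = [path]
    for fp in files:
        hasher.update(fp.relative_to(base).as_posix().encode("utf-8"))
        with fp.open("rb") as fd:
            # Read in chunks to keep memory usage low for large model files
            for chunk in iter(lambda: fd.read(65536), b""):
                hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp1, tempfile.TemporaryDirectory() as tmp2:
    d1, d2 = pathlib.Path(tmp1), pathlib.Path(tmp2)
    (d1 / "a.txt").write_text("alpha")
    (d1 / "b.txt").write_text("beta")
    # Same files, created in the opposite order
    (d2 / "b.txt").write_text("beta")
    (d2 / "a.txt").write_text("alpha")
    same = hash_path_sketch(d1) == hash_path_sketch(d2)
    (d2 / "a.txt").write_text("gamma")
    changed = hash_path_sketch(d1) != hash_path_sketch(d2)
    single = hash_path_sketch(d1 / "a.txt")

print(same, changed)  # True True
```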

dclab.rtdc_dataset.feat_anc_ml.modc.load_modc(path, from_format=None)[source]

Load models from a .modc file for inference

Parameters
  • path (str or path-like) – Path to a .modc file

  • from_format (str) – If set to None, the first available format in BaseModel.all_formats() is used. If set to a key in BaseModel.all_formats(), then this format will take precedence and an error will be raised if loading with this format fails.

Returns

model – Models that can be used for inference via model.predict

Return type

list of dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel

dclab.rtdc_dataset.feat_anc_ml.modc.save_modc(path, dc_models)[source]

Save ML models to a .modc file

Parameters
  • path (str, pathlib.Path) – Output .modc path

  • dc_models (dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel or list thereof) – Models to save

Returns

meta – Dictionary written to index.json in the .modc file

Return type

dict

class dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel(bare_model, inputs, outputs, info=None)[source]
Parameters
  • bare_model – Underlying ML model

  • inputs (list of str) – List of model input features, e.g. ["deform", "area_um"]

  • outputs (list of str) – List of output features the model provides in that order, e.g. ["ml_score_rbc", "ml_score_rt1", "ml_score_tfe"]

  • info (dict) – Dictionary with model metadata

static all_formats()[source]

Dict of dictionaries containing all model formats in dclab

Returns

fmt_dict – All file formats with names as keys. Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).

Return type

dict

See also

supported_formats

class-specific file formats

get_dataset_features(ds, dtype=<class 'numpy.float32'>)[source]

Return the dataset features used for inference

Parameters
  • ds (dclab.rtdc_dataset.RTDCBase) – Dataset from which to retrieve the feature data

  • dtype (dtype) – All features are cast to this dtype

Returns

fdata – 2D array of shape (len(ds), len(self.inputs)); i.e. to access the array containing the first feature, for all events, you would do fdata[:, 0].

Return type

2d ndarray
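The layout of the returned matrix can be illustrated with a plain dictionary standing in for a dataset (an assumption of this sketch; the real method takes an RTDCBase instance, and feature_matrix is a hypothetical name). Each input feature becomes one column, in the order of self.inputs.

```python
import numpy as np


def feature_matrix(ds, inputs, dtype=np.float32):
    """Stack the features named in `inputs` column-wise into a 2D array."""
    return np.column_stack([np.asarray(ds[feat], dtype=dtype) for feat in inputs])


ds = {"deform": [0.01, 0.02, 0.03], "area_um": [50.0, 60.0, 70.0]}
fdata = feature_matrix(ds, ["deform", "area_um"])
print(fdata.shape)  # (3, 2): (number of events, number of inputs)
print(fdata[:, 0])  # all events of the first input feature ("deform")
```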

abstract static load_bare_model(path)[source]

Load an implementation-specific model from a file

This will set the self.model attribute. Make sure that the other attributes are set properly as well.

abstract predict(ds)[source]

Return the probabilities of self.outputs for ds

Parameters

ds (dclab.rtdc_dataset.RTDCBase) – Dataset to apply the model to

Returns

ofdict – Output feature dictionary with features as keys and 1d ndarrays as values.

Return type

dict

Notes

This function calls BaseModel.get_dataset_features() to obtain the input feature matrix.

abstract static save_bare_model(path, bare_model, save_format=None)[source]

Save an implementation-specific model to a file

Parameters
  • path (str or path-like) – Path to store model to

  • bare_model (object) – The implementation-specific bare model

  • save_format (str) – Must be in supported_formats

abstract static supported_formats()[source]

List of dictionaries containing model formats

Returns

fmts – Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).

Return type

list

Notes

The return value is automatically added to the return value of BaseModel.all_formats().

tensorflow helper functions for RT-DC data

dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.assemble_tf_dataset_scalars(dc_data, feature_inputs, labels=None, split=0.0, shuffle=True, batch_size=32, dtype=<class 'numpy.float32'>)[source]

Assemble a tensorflow.data.Dataset for scalar features

Scalar feature data are loaded directly into memory.

Parameters
  • dc_data (list of pathlib.Path, str, or dclab.rtdc_dataset.RTDCBase) – List of source datasets (can be anything dclab.new_dataset() accepts).

  • feature_inputs (list of str) – List of scalar feature names to extract from dc_data.

  • labels (list) – Labels (e.g. an integer that classifies each element of dc_data) used for training. Defaults to None (no labels).

  • split (float) – If set to zero, only one dataset is returned; if set to a float between 0 and 1, a train and a test dataset are returned. In that case, set shuffle=True.

  • shuffle (bool) – If True (default), shuffle the dataset (A hard-coded seed is used for reproducibility).

  • batch_size (int) – Batch size for training. The function tf.data.Dataset.batch is called with batch_size as its argument.

  • dtype (numpy.dtype) – Desired dtype of the output data

Returns

train [,test] – Dataset that can be used for training with tensorflow

Return type

tensorflow.data.Dataset

dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.get_dataset_event_feature(dc_data, feature, tf_dataset_indices=None, dc_data_indices=None, split_index=0, split=0.0, shuffle=True)[source]

Return RT-DC features for tensorflow Dataset indices

The functions assemble_tf_dataset_* return a tensorflow.data.Dataset instance with all input data shuffled (or split). This function retrieves features using the Dataset indices, given the same parameters (dc_data, split, shuffle).

Parameters
  • dc_data (list of pathlib.Path, str, or dclab.rtdc_dataset.RTDCBase) – List of source datasets (Must match the path list used to create the tf.data.Dataset).

  • feature (str) – Name of the feature to retrieve

  • tf_dataset_indices (list-like) – tf.data.Dataset indices corresponding to the events of interest. If None, all indices are used.

  • dc_data_indices (list of int) – Indices of the items in dc_data for which the features should be returned; all other items are ignored.

  • split_index (int) – The split index; 0 for the first part, 1 for the second part.

  • split (float) – Splitting fraction (must match the value used to create the tf.data.Dataset)

  • shuffle (bool) – Shuffling (must match the value used to create the tf.data.Dataset)

Returns

data – Feature list with elements corresponding to the events given by tf_dataset_indices.

Return type

list

dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.shuffle_array(arr, seed=42)[source]

Shuffle a numpy array in-place reproducibly with a fixed seed

The shuffled array is also returned.
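Why a fixed seed matters: shuffling an index array of the same length with the same seed reproduces the permutation, which is what allows shuffled positions to be mapped back to the original events (as get_dataset_event_feature requires). The sketch below uses NumPy's RandomState.shuffle; the function name is hypothetical, and whether dclab uses the same generator is an assumption not stated in this reference.

```python
import numpy as np


def shuffle_array_sketch(arr, seed=42):
    """Shuffle `arr` in place with a fixed seed and return it."""
    np.random.RandomState(seed).shuffle(arr)
    return arr


events = np.arange(100, 110)  # stand-in for per-event data
shuffled = shuffle_array_sketch(events.copy())
perm = shuffle_array_sketch(np.arange(events.size))  # same seed, same length
# perm[i] is the original position of the element now at position i
ok = np.array_equal(shuffled, events[perm])
print(ok)  # True
```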

class dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_model.TensorflowModel(bare_model, inputs, outputs, info=None)[source]

Handle tensorflow models

Parameters
  • bare_model – Underlying ML model

  • inputs (list of str) – List of model input features, e.g. ["deform", "area_um"]

  • outputs (list of str) – List of output features the model provides in that order, e.g. ["ml_score_rbc", "ml_score_rt1", "ml_score_tfe"]

  • info (dict) – Dictionary with model metadata

has_sigmoid_activation(layer_config=None)[source]

Return True if final layer has “sigmoid” activation function

has_softmax_layer(layer_config=None)[source]

Return True if final layer is a Softmax layer

static load_bare_model(path)[source]

Load a tensorflow model

predict(ds, batch_size=32)[source]

Return the probabilities of self.outputs for ds

Parameters
Returns

ofdict – Output feature dictionary with features as keys and 1d ndarrays as values.

Return type

dict

Notes

Before prediction, this method ensures that the outputs of the model are probabilities. If the final layer is one-dimensional and does not have a sigmoid activation, a sigmoid activation layer, tf.keras.layers.Activation("sigmoid"), is added (binary classification). If the final layer has more dimensions and is not a tf.keras.layers.Softmax() layer, a softmax layer is added.
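The effect of this conversion can be illustrated with NumPy on raw model outputs (logits). This sketch only mirrors the rule stated above, under the assumption that the outputs are raw logits; the function name is hypothetical, and the actual method modifies the tensorflow model by appending tf.keras layers rather than post-processing arrays.

```python
import numpy as np


def to_probabilities(logits):
    """Map raw outputs of shape (N,), (N, 1) or (N, K) to probabilities."""
    logits = np.asarray(logits, dtype=float)
    if logits.ndim == 1 or logits.shape[-1] == 1:
        # One output unit: sigmoid for binary classification
        return 1.0 / (1.0 + np.exp(-logits))
    # Several output units: softmax along the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


print(to_probabilities([0.0]))                    # [0.5]
print(to_probabilities([[2.0, 1.0, 0.1]]).sum())  # 1.0 (up to rounding)
```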

static save_bare_model(path, bare_model, save_format='tensorflow-SavedModel')[source]

Save a tensorflow model

static supported_formats()[source]

List of dictionaries containing model formats

Returns

fmts – Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).

Return type

list

Notes

The return value is automatically added to the return value of BaseModel.all_formats().