Code reference

Module-level methods

dclab.new_dataset(data, identifier=None, **kwargs)[source]

Initialize a new RT-DC dataset

Parameters:

data –
can be one of the following:
- dict
- .tdms file
- .rtdc file
- subclass of RTDCBase (will create a hierarchy child)
- DCOR resource URL
- URL to file in S3-compatible object store
identifier (str) – A unique identifier for this dataset. If set to None an identifier is generated.
kwargs – Additional parameters passed to the RTDCBase subclass

Returns:

dataset – A new dataset instance

Return type:

subclass of dclab.rtdc_dataset.RTDCBase

Global definitions

These definitions are used throughout the dclab/Shape-In/DCscope ecosystem.

Metadata

Valid configuration sections and keys are described in: Analysis metadata and Experiment metadata. You should use the following methods instead of accessing the static metadata constants.

dclab.definitions.config_key_exists(section, key)[source]: Return True if the configuration key exists

dclab.definitions.get_config_value_descr(section, key)[source]

Return the description of a config value

Returns key if not defined anywhere

dclab.definitions.get_config_value_func(section, key)[source]: Return configuration type converter function

dclab.definitions.get_config_value_type(section, key)[source]

Return the expected type of a config value

Returns None if no type is defined

These constants are also available in the dclab.definitions module.

dclab.definitions.meta_const.CFG_ANALYSIS: All configuration keywords editable by the user

dclab.definitions.meta_const.CFG_METADATA: All read-only configuration keywords for a measurement

dclab.definitions.meta_const.config_keys: dict with section as keys and config parameter names as values

Metadata parsers

dclab.definitions.meta_parse.f1dfloatduple(value)[source]: Tuple of two floats (duple)

dclab.definitions.meta_parse.f2dfloatarray(value)[source]: numpy floating point array

dclab.definitions.meta_parse.fbool(value)[source]: boolean

dclab.definitions.meta_parse.fboolorfloat(value)[source]: Bool or float

dclab.definitions.meta_parse.fint(value)[source]: integer

dclab.definitions.meta_parse.fintlist(alist)[source]: A list of integers

dclab.definitions.meta_parse.lcstr(astr)[source]: lower-case string

dclab.definitions.meta_parse.func_types = {<class 'float'>: <class 'numbers.Number'>, <function f1dfloatduple>: (<class 'tuple'>, <class 'numpy.ndarray'>), <function f2dfloatarray>: <class 'numpy.ndarray'>, <function fbool>: (<class 'bool'>, <class 'numpy.bool'>), <function fboolorfloat>: (<class 'bool'>, <class 'numpy.bool'>, <class 'float'>), <function fint>: <class 'numbers.Integral'>, <function fintlist>: <class 'list'>, <function lcstr>: <class 'str'>}: maps functions to their expected output types

Features

Features are discussed in more detail in Features.

dclab.definitions.check_feature_shape(name, data)[source]

Check if (non)-scalar feature matches with its data’s dimensionality

Parameters:

name (str) – name of the feature
data (array-like) – data whose dimensionality will be checked

Raises:

ValueError – If the data’s shape does not match its scalar description

dclab.definitions.feature_exists(name, scalar_only=False)[source]

Return True if name is a valid feature name

This function not only checks whether name is in feature_names, but also validates against the machine learning scores ml_score_??? (where ? can be a digit or a lower-case letter in the English alphabet).

Parameters:

name (str) – name of a feature
scalar_only (bool) – Specify whether the check should only search in scalar features

Returns:

valid – True if name is a valid feature, False otherwise.

Return type:

bool

See also

scalar_feature_exists: Wraps feature_exists with scalar_only=True
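The ml_score_??? naming rule described above can be written as a regular expression. The following is a minimal sketch based only on that description, not dclab's implementation; the helper name looks_like_ml_score and the constant ML_SCORE_PATTERN are hypothetical.

```python
import re

# Three trailing characters, each a digit or a lower-case ASCII letter.
# Pattern derived from the prose description; hypothetical, not dclab code.
ML_SCORE_PATTERN = re.compile(r"ml_score_[a-z0-9]{3}")


def looks_like_ml_score(name):
    """Return True if `name` follows the ml_score_??? naming scheme."""
    return ML_SCORE_PATTERN.fullmatch(name) is not None
```

In practice, feature_exists should be preferred, since it also checks the regular feature names.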

dclab.definitions.get_feature_label(name, rtdc_ds=None, with_unit=True)[source]

Return the label corresponding to a feature name

This function not only checks feature_name2label, but also supports registered ml_score_??? features.

Parameters:

name (str) – name of a feature
with_unit (bool) – set to False to remove units in square brackets

Returns:

label – feature label corresponding to the feature name

Return type:

str

Notes

TODO: extract feature label from ancillary information when an rtdc_ds is given.

dclab.definitions.scalar_feature_exists(name)[source]: Convenience method wrapping feature_exists(…, scalar_only=True)

These constants are also available in the dclab.definitions module.

dclab.definitions.feat_const.FEATURES_NON_SCALAR: list of non-scalar features

dclab.definitions.feat_const.feature_names: list of feature names

dclab.definitions.feat_const.feature_labels: list of feature labels (same order as feature_names)

dclab.definitions.feat_const.feature_name2label: dict for converting feature names to labels

dclab.definitions.feat_const.scalar_feature_names: list of scalar feature names

RT-DC dataset manipulation

Base class

class dclab.rtdc_dataset.RTDCBase(identifier=None, enable_basins=True)[source]

RT-DC measurement base class

Notes

Besides the filter arrays for each data feature, there is a manual boolean filter array RTDCBase.filter.manual that can be edited by the user - a boolean value of False means that the event is excluded from all computations.
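To illustrate how per-feature filter arrays and the manual filter array could interact, here is a toy sketch using plain Python lists. It assumes that the combined filter is the element-wise logical AND of all boolean arrays; the function combine_filters and the example data are hypothetical and do not use dclab itself.

```python
def combine_filters(feature_filters, manual):
    """Element-wise AND of all per-feature filters and the manual filter."""
    combined = list(manual)
    for mask in feature_filters.values():
        combined = [keep and m for keep, m in zip(combined, mask)]
    return combined


# Hypothetical per-feature filters for four events
feature_filters = {
    "area_um": [True, True, False, True],
    "deform": [True, False, True, True],
}
# Manual filter: the last event is excluded by hand
manual = [True, True, True, False]

filter_all = combine_filters(feature_filters, manual)
# Only the first event passes every filter.
```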

static get_kde_spacing(a, scale='linear', method=<function bin_width_doane>, method_kw=None, feat='undefined', ret_scaled=False)[source]

Convenience function for computing the contour spacing

Parameters:

a (ndarray) – feature data
scale (str) – how the data should be scaled (“log” or “linear”)
method (callable) – KDE method to use (see kde_methods submodule)
method_kw (dict) – keyword arguments to method
feat (str) – feature name for debugging
ret_scaled (bool) – whether to return the scaled array of a

apply_filter(force=None)[source]: Compute the filters for the dataset

basins_get_dicts()[source]: Return the list of dictionaries describing the dataset’s basins

basins_retrieve()[source]

Load all basins available

Added in version 0.54.0.

In dclab 0.51.0, we introduced basins, a simple way of combining HDF5-based datasets (including the HDF5_S3 format). The idea is to be able to store parts of the dataset (e.g. images) in a separate file that could then be located someplace else (e.g. an S3 object store).

If an RT-DC file has “basins” defined, then these are sought out and made available via the features_basin property.

Changed in version 0.57.5: “file”-type basins are only available for subclasses that set the _local_basins_allowed attribute to True.

Changed in version 0.71.5: If the instance was created with enable_basins=False, then only internal basins are returned. The enable_basins check was previously done in the logic of the basins property.

close()[source]

Close any open files or connections, including basins

If implemented in a subclass, the subclass must call this method via super, otherwise basins are not closed. The subclass is responsible for closing its specific file handles.

get_downsampled_scatter(xax='area_um', yax='deform', downsample=0, xscale='linear', yscale='linear', remove_invalid=False, ret_mask=False)[source]

Downsampling by removing points at dense locations

Parameters:

xax (str) – Identifier for x axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for y axis
downsample (int) –
Number of points to draw in the down-sampled plot. This number is either
- >=1: exactly downsample to this number by randomly adding
  or removing points
- 0 : do not perform downsampling
xscale (str) – If set to “log”, take the logarithm of the x-values before performing downsampling. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.
remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than downsample due to infinite or nan values (e.g. due to logarithmic scaling).
ret_mask (bool) – If set to True, returns a boolean array of length len(self) where True values identify the filtered data.

Returns:

xnew, ynew (1d ndarray of length N) – Filtered data; N is either identical to downsample or smaller (if remove_invalid==True)
mask (1d boolean array of length len(RTDCBase)) – Array for identifying the downsampled data points

get_kde_contour(xax='area_um', yax='deform', xacc=None, yacc=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')[source]

Evaluate the kernel density estimate for contour plots

Parameters:

xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for Y axis
xacc (float) – Contour accuracy in x direction
yacc (float) – Contour accuracy in y direction
kde_type (str) – The KDE method to use
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.

Returns:

X, Y, Z – The kernel density Z evaluated on a rectangular grid (X,Y).

Return type:

coordinates

get_kde_scatter(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')[source]

Evaluate the kernel density estimate for scatter plots

Parameters:

xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for Y axis
positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the points that are set in self.filter.all.
kde_type (str) – The KDE method to use, see kde_methods.methods
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.

Returns:

density – The kernel density evaluated for the filtered data points.

Return type:

1d ndarray

get_measurement_identifier()[source]

Return a unique measurement identifier

Return the [experiment]:”run identifier” configuration value, if it exists. Otherwise, return the MD5 sum computed from the measurement time, date, and setup identifier.

Returns None if no identifier could be found or computed.

Added in version 0.51.0.
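The fallback order described above (run identifier first, then an MD5 sum, then None) can be sketched as follows. This is a hedged illustration, not dclab's implementation: the function name, the use of a nested dict for the configuration, the key names other than "run identifier", and the way the three values are joined before hashing are all assumptions.

```python
import hashlib


def measurement_identifier_sketch(config):
    """Return a measurement identifier following the documented fallbacks.

    `config` is a plain nested dict standing in for the dataset
    configuration (assumption for this sketch).
    """
    experiment = config.get("experiment", {})
    setup = config.get("setup", {})
    # 1. Prefer the explicit run identifier
    if "run identifier" in experiment:
        return experiment["run identifier"]
    # 2. Otherwise hash time, date, and setup identifier
    parts = [experiment.get("time"),
             experiment.get("date"),
             setup.get("identifier")]
    if any(part is None for part in parts):
        # 3. Nothing to compute the identifier from
        return None
    # The join scheme is an assumption; dclab may combine values differently.
    joined = "_".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```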

ignore_basins(basin_identifiers)[source]

Ignore these basin identifiers when looking for features

This is used to avoid circular basin dependencies.

polygon_filter_add(filt)[source]

Associate a Polygon Filter with this instance

Parameters:

filt (int or instance of PolygonFilter) – The polygon filter to add

polygon_filter_rm(filt)[source]

Remove a polygon filter from this instance

Parameters:

filt (int or instance of PolygonFilter) – The polygon filter to remove

reset_filter()[source]: Reset the current filter

property basins

Basins with upstream features from internal/external locations

If the instance was created with enable_basins=False, then only internal basins are returned.

config: Configuration of the measurement

export: Export functionalities; instance of dclab.rtdc_dataset.export.Export.

property features: All available features

property features_ancillary

All available ancillary features

This includes all ancillary features, excluding the features that are already in self.features_innate. This means that there may be overlap between features_ancillary and e.g. self.features_basin.

Added in version 0.58.0.

property features_basin: All features accessed via upstream basins from other locations

property features_innate

All features excluding ancillary, basin, or temporary features

Internal basin features are included since version 0.71.4.

property features_loaded

All features that have been computed

This includes ancillary features and temporary features.

Notes

Ancillary features that are computationally cheap to compute are always included. They are defined in dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID.

property features_local

All features that are, with certainty, really fast to access

Local features is a slimmed down version of features_loaded. Nothing needs to be computed, not even rapid features (dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID). And features from remote sources that have not been downloaded already are excluded. Ancillary and temporary features that are available are included.

property features_scalar: All scalar features available

property filter: Filtering functionalities; instance of Filter

format: Dataset format (derived from class name)

abstract property hash: str: Reproducible dataset hash (defined by derived classes)

property identifier: Unique (unreproducible) identifier

logs: Dictionary of log files. Each log file is a list of strings (one string per line).

path: Path or DCOR identifier of the dataset (set to “none” for RTDC_Dict)

tables: Dictionary of tables. Each table is an indexable compound numpy array.

title: Title of the measurement

HDF5 (.rtdc) format

class dclab.rtdc_dataset.RTDC_HDF5(h5path, h5kwargs=None, *args, **kwargs)[source]

HDF5 file format for RT-DC measurements

Parameters:

h5path (str or pathlib.Path or file-like object) – Path to an ‘.rtdc’ measurement file or a file-like object
h5kwargs (dict) – Additional keyword arguments given to h5py.File
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase

static basin_get_dicts_from_h5file(h5file)[source]: Return list of dicts for all basins defined in h5file

static can_open(h5path)[source]: Check whether a given file is in the .rtdc file format

static parse_config(h5path)[source]

Parse the RT-DC configuration of an HDF5 file

h5path may be a h5py.File object or an actual path

basins_get_dicts()[source]: Return list of dicts for all basins defined in self.h5file

close()[source]: Close the underlying HDF5 file

property hash: Hash value based on file name and content

path: Path to the measurement HDF5 (.rtdc) file

class dclab.rtdc_dataset.fmt_hdf5.basin.HDF5Basin(*args, **kwargs)[source]

Parameters:

location (str) – Location of the basin, this can be a path or a URL, depending on the implementation of the subclass
name (str) – Human-readable name of the basin
description (str) – Lengthy description of the basin
features (list of str) – List of features this basin provides; This list is enforced, even if the basin actually contains more features.
referrer_identifier (str) – A measurement identifier against which to check the basin. If the basin mapping is “same”, then this must match the identifier of the basin exactly, otherwise it must start with the basin identifier (e.g. “basin-id_referrer-sub-id”). If this is set to None (default), there is no certainty that the downstream dataset is from the same measurement.
basin_identifier (str) – A measurement identifier that must match the basin exactly. In contrast to referrer_identifier, the basin identifier is the identifier of the basin file. If basin_identifier is specified, the identifier of the basin must be identical to it.
mapping (str) – Which type of mapping to use. This can be either “same” when the event list of the basin is identical to that of the dataset defining the basin, or one of the “basinmap” features (e.g. “basinmap1”) in cases where the dataset consists of a subset of the events of the basin dataset. In the latter case, the feature defined by mapping must be present in the dataset and consist of integer-valued indices (starting at 0) for the basin dataset.
mapping_referrer (dict-like) – Dict-like object from which “basinmap” features can be obtained in situations where mapping != “same”. This can be a simple dictionary of numpy arrays or e.g. an instance of RTDCBase.
ignored_basins (list of str) – List of basins to ignore in subsequent basin instantiations
key (str) – Unique key to identify this basin; normally computed from a JSON dump of the basin definition. A random string is used if None is specified.
perishable (bool or PerishableRecord) – If this is not False, then it must be a PerishableRecord that holds the information about the expiration time, and that comes with a method refresh to extend the lifetime of the basin.
kwargs – Additional keyword arguments passed to the load_dataset method of the Basin subclass.
Version change: the mapping keyword argument was added to support basins with a superset of events.

is_available()[source]: Return True if the basin is available

dclab.rtdc_dataset.fmt_hdf5.MIN_DCLAB_EXPORT_VERSION = '0.3.3.dev2'

DCOR (online) format

class dclab.rtdc_dataset.RTDC_DCOR(url, host='dcor.mpl.mpg.de', api_key='', use_ssl=None, cert_path=None, dcserv_api_version=2, *args, **kwargs)[source]

Wrap around the DCOR API

Parameters:

url (str) –
Full URL or resource identifier; valid values are
- https://dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f661-4920-be79-5ff4e85915d5
- dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f661-4920-be79-5ff4e85915d5
- b1404eb5-f661-4920-be79-5ff4e85915d5
host (str) – The default host machine used if the host is not given in url
api_key (str) – API key to access private resources
use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
cert_path (pathlib.Path) – The (optional) path to a server CA bundle; this should only be necessary for DCOR instances in the intranet with a custom CA or for certificate pinning.
dcserv_api_version (int) – Version of the dcserv API to use. In version 0.13.2 of ckanext-dc_serve, version 2 was introduced which entails serving an S3-basin-only dataset.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase

static get_full_url(url, use_ssl, host=None)[source]

Return the full URL to a DCOR resource

Parameters:

url (str) –
Full URL or resource identifier; valid values are
- https://dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6-df12-4299-aa2e-089e390aafd5
- dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6-df12-4299-aa2e-089e390aafd5
- caab96f6-df12-4299-aa2e-089e390aafd5
use_ssl (bool or None) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
host (str) – Use this host if it is not specified in url
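The three accepted input forms and the use_ssl behavior can be sketched as below. This is an illustration under assumptions, not the library's code: the function name full_url_sketch is hypothetical, and the rule for detecting a bare resource identifier (no slash in the string) is a simplification.

```python
def full_url_sketch(url, use_ssl=None, host="dcor.mpl.mpg.de"):
    """Expand a URL or resource identifier to a full dcserv URL."""
    # Decide on the scheme: None keeps "http://" if it was given
    if use_ssl is None:
        scheme = "http" if url.startswith("http://") else "https"
    elif use_ssl:
        scheme = "https"
    else:
        scheme = "http"
    # Strip any scheme that is already present
    if "://" in url:
        url = url.split("://", 1)[1]
    # A bare resource identifier contains no slash (simplifying assumption)
    if "/" not in url:
        url = f"{host}/api/3/action/dcserv?id={url}"
    return f"{scheme}://{url}"
```

With this sketch, all three forms listed above expand to the same https URL when use_ssl is left at None.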

basins_get_dicts()[source]

Return list of dicts for all basins defined on DCOR

The return value of this method is cached for 10 minutes (cache time defined in the cache_basin_dict_time [s] property).
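A time-limited cache of this kind can be sketched with the standard library. The class below is a generic illustration, not dclab's implementation; the names TimedCache, producer, lifetime, and clock are hypothetical, and 600 seconds stands in for the 10 minutes mentioned above.

```python
import time


class TimedCache:
    """Cache the result of `producer` for `lifetime` seconds."""

    def __init__(self, producer, lifetime=600, clock=time.monotonic):
        self.producer = producer
        self.lifetime = lifetime
        self.clock = clock  # injectable for testing
        self._value = None
        self._stamp = None

    def get(self):
        now = self.clock()
        # Refresh on first use or when the cached value has expired
        if self._stamp is None or now - self._stamp >= self.lifetime:
            self._value = self.producer()
            self._stamp = now
        return self._value
```

Passing the clock as an argument keeps the expiry logic testable without waiting for real time to pass.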

basins_retrieve()[source]: Same as superclass, but add perishable information

property hash: Hash value based on file name and content

path: Full URL to the DCOR resource

class dclab.rtdc_dataset.fmt_dcor.api.APIHandler(url, api_key='', cert_path=None, dcserv_api_version=2)[source]

Handles the DCOR API with caching for simple queries

Parameters:

url (str) – URL to DCOR API
api_key (str) – DCOR API token
cert_path (pathlib.Path) – the path to the server’s CA bundle; by default this will use the default certificates (which depend on where you obtained certifi/requests)

classmethod add_api_key(api_key)[source]

Add an API Key/Token to the base class

When accessing the DCOR API, all available API Keys/Tokens are used to access a resource (trial and error).

get(query, feat=None, trace=None, event=None, timeout=None, retries=5)[source]

Fetch information from DCOR

Parameters:

query (str) – API route
feat (str) – DEPRECATED (use basins instead), adds f”&feature={feat}” to query
trace (str) – DEPRECATED (use basins instead), adds f”&trace={trace}” to query
event (str) – DEPRECATED (use basins instead), adds f”&event={event}” to query
timeout (float) – Request timeout
retries (int) – Number of retries to fetch the request. For every retry, the timeout is increased by two seconds.
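The retry behavior described for retries (each retry adds two seconds to the timeout) might look like the following sketch. It is not the APIHandler code: the function get_with_retries and the fetch callable are hypothetical, and it assumes that retries counts the total number of attempts and that failures surface as ConnectionError.

```python
def get_with_retries(fetch, timeout=1.0, retries=5):
    """Call `fetch(timeout)` until it succeeds or attempts run out.

    The timeout grows by two seconds with every further attempt.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(timeout + 2 * attempt)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError(f"gave up after {retries} attempts") from last_error
```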

api_key: DCOR API Token

api_keys = []: DCOR API Keys/Tokens in the current session

cache_queries = ['metadata', 'size', 'feature_list', 'valid']: These are cached to minimize network usage. Note that we are not caching basins, since they may contain expiring URLs.

dcserv_api_version: ckanext-dc_serve dcserv API version

session: create a session

url: DCOR API URL

verify: keyword argument to requests.request()

HTTP (online) file format

class dclab.rtdc_dataset.fmt_http.HTTPBasin(*args, **kwargs)[source]

Parameters:

location (str) – Location of the basin, this can be a path or a URL, depending on the implementation of the subclass
name (str) – Human-readable name of the basin
description (str) – Lengthy description of the basin
features (list of str) – List of features this basin provides; This list is enforced, even if the basin actually contains more features.
referrer_identifier (str) – A measurement identifier against which to check the basin. If the basin mapping is “same”, then this must match the identifier of the basin exactly, otherwise it must start with the basin identifier (e.g. “basin-id_referrer-sub-id”). If this is set to None (default), there is no certainty that the downstream dataset is from the same measurement.
basin_identifier (str) – A measurement identifier that must match the basin exactly. In contrast to referrer_identifier, the basin identifier is the identifier of the basin file. If basin_identifier is specified, the identifier of the basin must be identical to it.
mapping (str) – Which type of mapping to use. This can be either “same” when the event list of the basin is identical to that of the dataset defining the basin, or one of the “basinmap” features (e.g. “basinmap1”) in cases where the dataset consists of a subset of the events of the basin dataset. In the latter case, the feature defined by mapping must be present in the dataset and consist of integer-valued indices (starting at 0) for the basin dataset.
mapping_referrer (dict-like) – Dict-like object from which “basinmap” features can be obtained in situations where mapping != “same”. This can be a simple dictionary of numpy arrays or e.g. an instance of RTDCBase.
ignored_basins (list of str) – List of basins to ignore in subsequent basin instantiations
key (str) – Unique key to identify this basin; normally computed from a JSON dump of the basin definition. A random string is used if None is specified.
perishable (bool or PerishableRecord) – If this is not False, then it must be a PerishableRecord that holds the information about the expiration time, and that comes with a method refresh to extend the lifetime of the basin.
kwargs – Additional keyword arguments passed to the load_dataset method of the Basin subclass.
Version change: the mapping keyword argument was added to support basins with a superset of events.

is_available()[source]

Check for requests and object availability

Caching policy: Once this method returns True, it will always return True.
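The stated caching policy (a positive result is remembered, a negative one is checked again) can be sketched in a few lines. The class name AvailabilityCheck and the probe callable are hypothetical; this is a generic illustration and not the basin implementation.

```python
class AvailabilityCheck:
    """Remember a positive availability result, re-check negative ones."""

    def __init__(self, probe):
        self.probe = probe  # callable returning True or False
        self._available = False

    def is_available(self):
        if not self._available:
            # Only probe while the resource has not been seen as available
            self._available = bool(self.probe())
        return self._available
```

A consequence of this policy is that a resource that disappears after the first successful check is still reported as available.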

class dclab.rtdc_dataset.fmt_http.RTDC_HTTP(url, *args, **kwargs)[source]

Access RT-DC measurements via HTTP

This class allows you to open .rtdc files accessible via an HTTP URL, for instance files on an S3 object storage or figshare download links.

This is essentially just a wrapper around RTDC_HDF5 with HTTPFile passing a file object to h5py.

Parameters:

url (str) – Full URL to an HDF5 file
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase

Notes

This format requires random access to the remote file: only parts of the file are downloaded, not the entire file. The web server must therefore support range requests.

close()[source]: Close the underlying HDF5 file

property hash: Hash value based on file name and content

path: URL to the file

S3 (online) file format

class dclab.rtdc_dataset.fmt_s3.RTDC_S3(url, endpoint_url=None, access_key_id=None, secret_access_key=None, use_ssl=True, *args, **kwargs)[source]

Access RT-DC measurements in an S3-compatible object store

This is essentially just a wrapper around RTDC_HDF5 with boto3 and HTTPFile passing a file object to h5py.

Parameters:

url (str) – URL to an object in an S3 instance; this can be either a full URL (including the endpoint), or just bucket/key
access_key_id (str) – S3 access identifier
secret_access_key (str) – Secret S3 access key
use_ssl (bool) – Whether to enforce SSL (defaults to True)
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
endpoint_url (str)

close()[source]: Close the underlying HDF5 file

property hash: Hash value based on file name and content

path: URL the object on S3

class dclab.rtdc_dataset.fmt_s3.S3Basin(*args, **kwargs)[source]

Parameters:

location (str) – Location of the basin, this can be a path or a URL, depending on the implementation of the subclass
name (str) – Human-readable name of the basin
description (str) – Lengthy description of the basin
features (list of str) – List of features this basin provides; This list is enforced, even if the basin actually contains more features.
referrer_identifier (str) – A measurement identifier against which to check the basin. If the basin mapping is “same”, then this must match the identifier of the basin exactly, otherwise it must start with the basin identifier (e.g. “basin-id_referrer-sub-id”). If this is set to None (default), there is no certainty that the downstream dataset is from the same measurement.
basin_identifier (str) – A measurement identifier that must match the basin exactly. In contrast to referrer_identifier, the basin identifier is the identifier of the basin file. If basin_identifier is specified, the identifier of the basin must be identical to it.
mapping (str) – Which type of mapping to use. This can be either “same” when the event list of the basin is identical to that of the dataset defining the basin, or one of the “basinmap” features (e.g. “basinmap1”) in cases where the dataset consists of a subset of the events of the basin dataset. In the latter case, the feature defined by mapping must be present in the dataset and consist of integer-valued indices (starting at 0) for the basin dataset.
mapping_referrer (dict-like) – Dict-like object from which “basinmap” features can be obtained in situations where mapping != “same”. This can be a simple dictionary of numpy arrays or e.g. an instance of RTDCBase.
ignored_basins (list of str) – List of basins to ignore in subsequent basin instantiations
key (str) – Unique key to identify this basin; normally computed from a JSON dump of the basin definition. A random string is used if None is specified.
perishable (bool or PerishableRecord) – If this is not False, then it must be a PerishableRecord that holds the information about the expiration time, and that comes with a method refresh to extend the lifetime of the basin.
kwargs – Additional keyword arguments passed to the load_dataset method of the Basin subclass.
Version change: the mapping keyword argument was added to support basins with a superset of events.

is_available()[source]

Check for boto3 and object availability

Caching policy: Once this method returns True, it will always return True.

class dclab.rtdc_dataset.fmt_s3.S3File(object_path, endpoint_url, access_key_id='', secret_access_key='', use_ssl=True, verify_ssl=True)[source]

Monkeypatched HTTPFile to support authenticated access to S3

Parameters:

object_path (str) – bucket/key path to object in the object store
endpoint_url (str) – the explicit endpoint URL for accessing the object store
access_key_id (str) – S3 access key
secret_access_key (str) – secret S3 key matching access_key_id
use_ssl (bool) – use SSL to connect to the endpoint, only disabled for testing
verify_ssl (bool) – make sure the SSL certificate is sound, only used for testing

close()[source]

Close the file

This closes the requests session and then calls close on the super class.

download_range(start, stop)[source]

Download bytes given by the range (start, stop)

stop is not inclusive (In the HTTP range request it normally is).
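Because the last byte position in an HTTP Range header is inclusive, an exclusive stop has to be reduced by one when the header is built. A minimal sketch of that conversion follows; the helper name range_header is hypothetical and not part of S3File.

```python
def range_header(start, stop):
    """Build an HTTP Range header for the half-open interval [start, stop)."""
    if stop <= start:
        raise ValueError("stop must be greater than start")
    # HTTP byte ranges include the last position, hence stop - 1
    return {"Range": f"bytes={start}-{stop - 1}"}
```

For example, requesting the first ten bytes with start=0 and stop=10 yields the header value "bytes=0-9".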

dclab.rtdc_dataset.fmt_s3.get_endpoint_url(url)[source]

Given a URL of an S3 object, return the endpoint URL

Return None if no endpoint URL can be extracted (e.g. because just bucket_name/object_path was passed).

dclab.rtdc_dataset.fmt_s3.get_object_path(url)[source]

Given a URL of an S3 object, return the bucket_name/object_path part

Return object paths always without leading slash /.

dclab.rtdc_dataset.fmt_s3.is_s3_object_available(url, access_key_id=None, secret_access_key=None)[source]

Check whether an S3 object is available

Parameters:

url (str) – full URL to the object
access_key_id (str) – S3 access identifier
secret_access_key (str) – Secret S3 access key

dclab.rtdc_dataset.fmt_s3.is_s3_url(string)[source]: Check whether string is a valid S3 URL using regexp

dclab.rtdc_dataset.fmt_s3.REGEXP_S3_URL = re.compile('^(https?:\\/\\/)([a-z0-9-\\.]*)(\\:[0-9]*)?\\/.+\\/.+'): Regular expression for matching an S3 object URL
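The URL helpers above can be approximated with the standard library. In this sketch, the regular expression is copied from the REGEXP_S3_URL entry, while the three function names and the use of urllib.parse are assumptions made for illustration; the actual dclab functions may behave differently in edge cases.

```python
import re
from urllib.parse import urlsplit

# Pattern copied from the REGEXP_S3_URL entry above
REGEXP_S3_URL = re.compile(r"^(https?:\/\/)([a-z0-9-\.]*)(\:[0-9]*)?\/.+\/.+")


def is_s3_url_sketch(string):
    """Return True if `string` matches the S3 URL pattern."""
    return REGEXP_S3_URL.match(string) is not None


def endpoint_url_sketch(url):
    """Return scheme and host (with port), or None for bucket/key input."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}"


def object_path_sketch(url):
    """Return the bucket_name/object_path part without a leading slash."""
    return urlsplit(url).path.lstrip("/")
```

Note that the pattern requires at least two path components (bucket and key), so a URL pointing to a file directly below the host does not match.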

Dictionary format

class dclab.rtdc_dataset.RTDC_Dict(ddict, *args, **kwargs)[source]

Dictionary-based RT-DC dataset

Parameters:

ddict (dict) –
Dictionary with features as keys (valid features like “area_cvx”, “deform”, “image” are defined by dclab.definitions.feature_exists) with which the class will be instantiated. The configuration is set to the default configuration of dclab.

Changed in version 0.27.0: Scalar features are automatically converted to arrays.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase

property hash: Reproducible dataset hash (defined by derived classes)

Hierarchy format

class dclab.rtdc_dataset.RTDC_Hierarchy(hparent, apply_filter=True, *args, **kwargs)[source]

Hierarchy dataset (filtered from RTDCBase)

A few words on hierarchies: The idea is that a subclass of RTDCBase can use the filtered data of another subclass of RTDCBase and interpret these data as unfiltered events. This comes in handy e.g. when the percentages of different subpopulations need to be distinguished without the noise in the original data.

Children in hierarchies always update their data according to the filtered event data from their parent when apply_filter is called. This makes it easier to save and load hierarchy children with e.g. DCscope and it makes the handling of hierarchies more intuitive (when the parent changes, the child changes as well).

Parameters:

hparent (instance of RTDCBase) – The hierarchy parent
apply_filter (bool) – Whether to apply the filter during instantiation; If set to False, apply_filter must be called manually.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
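The parent/child relationship described above can be modeled with a toy example. The classes ToyDataset and ToyChild below are hypothetical stand-ins written for this illustration; they do not use dclab and simplify the behavior (for instance, the child's own mask is reset on every update).

```python
class ToyDataset:
    """Minimal dataset: a list of events plus a boolean filter mask."""

    def __init__(self, events):
        self.events = list(events)
        self.mask = [True] * len(self.events)

    def filtered(self):
        return [e for e, keep in zip(self.events, self.mask) if keep]


class ToyChild(ToyDataset):
    """Treats the parent's filtered events as its own unfiltered events."""

    def __init__(self, parent):
        self.parent = parent
        super().__init__(parent.filtered())

    def apply_filter(self):
        # Re-read the parent's filtered events (own mask is reset here)
        self.events = self.parent.filtered()
        self.mask = [True] * len(self.events)


parent = ToyDataset([10, 20, 30, 40])
parent.mask = [True, False, True, True]
child = ToyChild(parent)  # child holds the events 10, 30, 40
parent.mask = [True, False, False, True]
child.apply_filter()  # child follows the parent's new filter
```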

hparent

Hierarchy parent of this instance

Type: RTDCBase

apply_filter(*args, **kwargs)[source]: Overridden apply_filter to perform tasks for hierarchy child

get_root_parent()[source]: Return the root parent of this dataset

rejuvenate()[source]

Redraw the hierarchy tree, updating config and features

You should call this function whenever you change something in the hierarchy parent(s), be it filters or metadata for computing ancillary features.

property basins

Basins with upstream features from internal/external locations

If the instance was created with enable_basins=False, then only internal basins are returned.

property features: All available features

property features_ancillary

All available ancillary features

This includes all ancillary features, excluding the features that are already in self.features_innate. This means that there may be overlap between features_ancillary and e.g. self.features_basin.

Added in version 0.58.0.

property features_basin: All features accessed via upstream basins from other locations

property features_innate

All features excluding ancillary, basin, or temporary features

Internal basin features are included since version 0.71.4.

property features_loaded

All features that have been computed

This includes ancillary features and temporary features.

Notes

Ancillary features that are cheap to compute are always included. They are defined in dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID.

property features_local

All features that are, with certainty, really fast to access

features_local is a slimmed-down version of features_loaded. Nothing needs to be computed, not even rapid features (dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID), and features from remote sources that have not already been downloaded are excluded. Ancillary and temporary features that are available are included.

property features_scalar: All scalar features available

property hash: Hashes of a hierarchy child changes if the parent changes

property logs

property tables

TDMS format

class dclab.rtdc_dataset.RTDC_TDMS(tdms_path, *args, **kwargs)[source]

TDMS file format for RT-DC measurements

Parameters:

tdms_path (str or pathlib.Path) – Path to a ‘.tdms’ measurement file.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase

path

Path to the experimental dataset (main .tdms file)

Type:: pathlib.Path

dclab.rtdc_dataset.fmt_tdms.get_project_name_from_path(path, append_mx=False)[source]

Get the project name from a path.

For a path “/home/peter/hans/HLC12398/online/M1_13.tdms” or “/home/peter/hans/HLC12398/online/data/M1_13.tdms”, with or without the “.tdms” file, this always returns “HLC12398”.

Parameters:

path (str or pathlib.Path) – path to tdms file
append_mx (bool) – append measurement number, e.g. “M1”

dclab.rtdc_dataset.fmt_tdms.get_tdms_files(directory)[source]

Recursively find projects based on ‘.tdms’ file endings

Searches the directory recursively and returns a sorted list of all ‘.tdms’ project files found, except fluorescence trace files, which end with _traces.tdms.
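The search rule can be sketched with the standard library alone. This is an illustration of the documented behavior, not the code of get_tdms_files; the function name find_tdms_files and the file names in the demonstration are made up.

```python
import pathlib
import tempfile


def find_tdms_files(directory):
    """Return sorted .tdms paths below `directory`, skipping trace files."""
    directory = pathlib.Path(directory)
    return sorted(
        p for p in directory.rglob("*.tdms")
        if not p.name.endswith("_traces.tdms")
    )


# Small demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "online").mkdir()
    for name in ["M1_13.tdms", "M1_13_traces.tdms", "M2_5.tdms"]:
        (root / "online" / name).touch()
    found_names = [p.name for p in find_tdms_files(root)]
```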

Basin features

With basins, you can create analysis pipelines that result in output files which, when opened in dclab, can access features stored in the input file (without having to write those features to the output file).

exception dclab.rtdc_dataset.feat_basin.BasinFeatureMissingWarning[source]: Used when a basin feature is defined but not stored

exception dclab.rtdc_dataset.feat_basin.BasinIdentifierMismatchError[source]: Used when the identifier of a basin does not match the definition

exception dclab.rtdc_dataset.feat_basin.BasinNotAvailableError[source]: Used to identify situations where the basin data is not available

exception dclab.rtdc_dataset.feat_basin.BasinmapFeatureMissingError[source]: Used when one of the basinmap features is not defined

exception dclab.rtdc_dataset.feat_basin.CyclicBasinDependencyFoundWarning[source]: Used when a basin is defined in one of its sub-basins

exception dclab.rtdc_dataset.feat_basin.IgnoringPerishableBasinTTL[source]: Used when refreshing a basin does not support TTL

class dclab.rtdc_dataset.feat_basin.Basin(location, name=None, description=None, features=None, referrer_identifier=None, basin_identifier=None, mapping='same', mapping_referrer=None, ignored_basins=None, key=None, perishable=False, **kwargs)[source]

A basin represents data from an external source

The external data must be a valid RT-DC dataset; subclasses should ensure that the corresponding API is available.

Parameters:

location (str) – Location of the basin, this can be a path or a URL, depending on the implementation of the subclass
name (str) – Human-readable name of the basin
description (str) – Lengthy description of the basin
features (list of str) – List of features this basin provides; This list is enforced, even if the basin actually contains more features.
referrer_identifier (str) – A measurement identifier against which to check the basin. If the basin mapping is “same”, then this must match the identifier of the basin exactly, otherwise it must start with the basin identifier (e.g. “basin-id_referrer-sub-id”). If this is set to None (default), there is no certainty that the downstream dataset is from the same measurement.
basin_identifier (str) – A measurement identifier that must match the basin exactly. In contrast to referrer_identifier, the basin identifier is the identifier of the basin file. If basin_identifier is specified, the identifier of the basin must be identical to it.
mapping (str) – Which type of mapping to use. This can be either “same” when the event list of the basin is identical to that of the dataset defining the basin, or one of the “basinmap” features (e.g. “basinmap1”) in cases where the dataset consists of a subset of the events of the basin dataset. In the latter case, the feature defined by mapping must be present in the dataset and consist of integer-valued indices (starting at 0) for the basin dataset.
mapping_referrer (dict-like) – Dict-like object from which “basinmap” features can be obtained in situations where mapping != “same”. This can be a simple dictionary of numpy arrays or e.g. an instance of RTDCBase.
ignored_basins (list of str) – List of basins to ignore in subsequent basin instantiations
key (str) – Unique key to identify this basin; normally computed from a JSON dump of the basin definition. A random string is used if None is specified.
perishable (bool or PerishableRecord) – If this is not False, then it must be a PerishableRecord that holds the information about the expiration time, and that comes with a method refresh to extend the lifetime of the basin.
kwargs – Additional keyword arguments passed to the load_dataset method of the Basin subclass.
Version change: the mapping keyword argument was added to support basins with a superset of events.
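The identifier rule described for referrer_identifier can be written down as a small check. This is a sketch of the documented rule only; the helper identifier_matches is hypothetical, and treating None as "passes" is an assumption that reflects the statement that no certainty is available in that case.

```python
def identifier_matches(referrer_identifier, basin_identifier, mapping="same"):
    """Hypothetical helper mirroring the documented identifier rule."""
    if referrer_identifier is None:
        # Nothing to check against; no certainty about the measurement
        return True
    if mapping == "same":
        return referrer_identifier == basin_identifier
    # Mapped basins: the referrer identifier extends the basin identifier
    return referrer_identifier.startswith(basin_identifier)


check_same = identifier_matches("basin-id", "basin-id")
check_same_bad = identifier_matches("basin-id_referrer-sub-id", "basin-id")
check_mapped = identifier_matches(
    "basin-id_referrer-sub-id", "basin-id", mapping="basinmap1")
```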

as_dict()[source]

Return basin kwargs for RTDCWriter.store_basin()

Note that each subclass of RTDCBase has its own implementation of RTDCBase.basins_get_dicts() which returns a list of basin dictionaries that are used to instantiate the basins in RTDCBase.basins_enable(). This method here is only intended for usage with RTDCWriter.store_basin().

close()[source]: Close any open file handles or connections

get_feature_data(feat)[source]: Return an object representing feature data of the basin

get_measurement_identifier()[source]: Return the identifier of the basin dataset

abstractmethod is_available()[source]: Return True if the basin is available

load_dataset(location, **kwargs)[source]

Return an instance of RTDCBase for this basin

If the basin mapping (self.mapping) is not the same as the referencing dataset

abstract property basin_format: Basin format (RTDCBase subclass), e.g. “hdf5” or “s3”

abstract property basin_type: Storage type to use (e.g. “file” or “remote”)

property basinmap: Contains the indexing array in case of a mapped basin

description: lengthy description of the basin

property ds: The RTDCBase instance represented by the basin

property features: Features made available by the basin

ignored_basins: ignored basins

kwargs: additional keyword arguments passed to the basin

location: location of the basin (e.g. path or URL)

mapping: Event mapping strategy. If this is “same”, it means that the referring dataset and the basin dataset have identical event indices. If mapping is e.g. basinmap1 then the mapping of the indices from the basin to the referring dataset is defined in self.basinmap (copied during initialization of this class from the array in the key basinmap1 from the dict-like object mapping_referrer).

name: user-defined name of the basin

referrer_identifier: measurement identifier of the referencing dataset

class dclab.rtdc_dataset.feat_basin.BasinAvailabilityChecker(basin, *args, **kwargs)[source]

Helper thread for checking basin availability in the background

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is a list or tuple of arguments for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.

run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

class dclab.rtdc_dataset.feat_basin.BasinProxyFeature(feat_obj, basinmap)[source]: Wrap around a feature object, mapping it upon data access
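Mapping "upon data access" means that the index translation happens lazily, when an event is requested. The toy class below illustrates the idea with plain lists; it is not the BasinProxyFeature implementation, and ToyProxyFeature as well as the sample values are hypothetical.

```python
class ToyProxyFeature:
    """Hypothetical wrapper that maps indices on access, not up front."""

    def __init__(self, feat_obj, basinmap):
        self.feat_obj = feat_obj      # feature data of the basin dataset
        self.basinmap = basinmap      # basin index for each referrer event

    def __len__(self):
        return len(self.basinmap)

    def __getitem__(self, index):
        # Translate the referrer index into the basin index, then read
        return self.feat_obj[self.basinmap[index]]


basin_deform = [0.01, 0.02, 0.03, 0.04, 0.05]
basinmap1 = [0, 2, 4]  # the referrer holds events 0, 2 and 4 of the basin
proxy = ToyProxyFeature(basin_deform, basinmap1)
mapped = [proxy[i] for i in range(len(proxy))]
```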

class dclab.rtdc_dataset.feat_basin.InternalH5DatasetBasin(*args, **kwargs)[source]

Parameters:

location (str) – Location of the basin, this can be a path or a URL, depending on the implementation of the subclass
name (str) – Human-readable name of the basin
description (str) – Lengthy description of the basin
features (list of str) – List of features this basin provides; This list is enforced, even if the basin actually contains more features.
referrer_identifier (str) – A measurement identifier against which to check the basin. If the basin mapping is “same”, then this must match the identifier of the basin exactly, otherwise it must start with the basin identifier (e.g. “basin-id_referrer-sub-id”). If this is set to None (default), there is no certainty that the downstream dataset is from the same measurement.
basin_identifier (str) – A measurement identifier that must match the basin exactly. In contrast to referrer_identifier, the basin identifier is the identifier of the basin file. If basin_identifier is specified, the identifier of the basin must be identical to it.
mapping (str) – Which type of mapping to use. This can be either “same” when the event list of the basin is identical to that of the dataset defining the basin, or one of the “basinmap” features (e.g. “basinmap1”) in cases where the dataset consists of a subset of the events of the basin dataset. In the latter case, the feature defined by mapping must be present in the dataset and consist of integer-valued indices (starting at 0) for the basin dataset.
mapping_referrer (dict-like) – Dict-like object from which “basinmap” features can be obtained in situations where mapping != “same”. This can be a simple dictionary of numpy arrays or e.g. an instance of RTDCBase.
ignored_basins (list of str) – List of basins to ignore in subsequent basin instantiations
key (str) – Unique key to identify this basin; normally computed from a JSON dump of the basin definition. A random string is used if None is specified.
perishable (bool or PerishableRecord) – If this is not False, then it must be a PerishableRecord that holds the information about the expiration time, and that comes with a method refresh to extend the lifetime of the basin.
kwargs – Additional keyword arguments passed to the load_dataset method of the Basin subclass.
Version change: the mapping keyword argument was added to support basins with a superset of events.

is_available()[source]: Return True if the basin is available

verify_basin(*args, **kwargs)[source]: It’s not necessary to verify internal basins

class dclab.rtdc_dataset.feat_basin.PerishableRecord(basin, expiration_func=None, expiration_kwargs=None, refresh_func=None, refresh_kwargs=None)[source]

A class containing information about perishable basins

Perishable basins are basins that may stop working after e.g. a specific amount of time (e.g. presigned S3 URLs). With the PerishableRecord, these basins may be “refreshed” (made available again).

Parameters:

basin (Basin) – Instance of the perishable basin
expiration_func (callable) – A function that determines whether the basin has perished. It must accept basin as the first argument. Calling this function should be fast, as it is called every time a feature is accessed. Note that if you are implementing this in the time domain, then you should use time.time() (TSE), because you need an absolute time measure. time.monotonic() for instance does not count up when the system goes to sleep. However, keep in mind that if a remote machine dictates the expiration time, then that remote machine should also transmit the creation time (in case there are time offsets).
expiration_kwargs (dict) – Additional kwargs for expiration_func.
refresh_func (callable) – The function used to refresh the basin. It must accept basin as the first argument.
refresh_kwargs (dict) – Additional kwargs for refresh_func
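A time-based expiration function along the lines described for expiration_func could look as follows. This is a sketch under assumptions: the function name has_expired and the keyword time_expires are invented for this example, and a bare object stands in for the Basin instance.

```python
import time


def has_expired(basin, time_expires):
    """Hypothetical expiration_func: compare against absolute wall time."""
    # `basin` is unused here; it is only accepted as the first argument
    return time.time() > time_expires


toy_basin = object()  # placeholder for a Basin instance
expiration_kwargs = {"time_expires": time.time() + 3600}  # valid for 1 hour
still_valid = not has_expired(toy_basin, **expiration_kwargs)
already_gone = has_expired(toy_basin, time_expires=time.time() - 1)
```

The check is a single comparison, which keeps it cheap enough to run on every feature access, and it uses time.time() so that time spent in system sleep still counts towards expiration.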

perished()[source]

Determine whether the basin has perished

Returns:: state – True means the basin has perished, False means the basin has not perished, and None means we don’t know
Return type:: bool or None

refresh(extend_by=None)[source]

Extend the lifetime of the associated perishable basin

Parameters:: extend_by (float) – Custom argument for extending the life of the basin. Normally, this would be a lifetime.
Returns:: basin – Dictionary for instantiating a new basin
Return type:: dict | None

dclab.rtdc_dataset.feat_basin.basin_priority_sorted_key(bdict)[source]

Yield a sorting value for a given basin that can be used with sorted

Basins are normally stored in random order in a dataset. This method brings them into correct order, prioritizing:

type: “file” over “remote”
format: “HTTP” over “S3” over “dcor”
mapping: “same” over anything else

Parameters:: bdict (Dict)
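One way to turn the three criteria into a sort key is a tuple of ranks. The sketch below is not the dclab implementation: the dictionary keys "type", "format" and "mapping", the lower-case format names, and the lexicographic combination (type first, then format, then mapping) are assumptions made for illustration.

```python
def toy_priority_key(bdict):
    """Hypothetical sort key: smaller tuples sort first (higher priority)."""
    type_rank = {"file": 0, "remote": 1}
    format_rank = {"http": 0, "s3": 1, "dcor": 2}
    return (
        type_rank.get(bdict.get("type"), len(type_rank)),
        format_rank.get(str(bdict.get("format")).lower(), len(format_rank)),
        0 if bdict.get("mapping", "same") == "same" else 1,
    )


basins = [
    {"name": "c", "type": "remote", "format": "dcor", "mapping": "same"},
    {"name": "b", "type": "remote", "format": "http", "mapping": "basinmap1"},
    {"name": "a", "type": "file", "format": "hdf5", "mapping": "same"},
    {"name": "d", "type": "remote", "format": "http", "mapping": "same"},
]
ordered = [b["name"] for b in sorted(basins, key=toy_priority_key)]
```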

Ancillaries

Computation of ancillary features

Ancillary features are computed on-the-fly in dclab if the required data are available. The features are registered here and are computed when RTDCBase.__getitem__ is called with the respective feature name. When RTDCBase.__contains__ is called with the feature name, then the feature is not yet computed, but the prerequisites are evaluated:

In [1]: import dclab

In [2]: ds = dclab.new_dataset("data/example.rtdc")

In [3]: ds.config["calculation"]["emodulus lut"] = "LE-2D-FEM-19"

In [4]: ds.config["calculation"]["emodulus medium"] = "CellCarrier"

In [5]: ds.config["calculation"]["emodulus temperature"] = 23.0

In [6]: ds.config["calculation"]["emodulus viscosity model"] = 'buyukurganci-2022'

In [7]: "emodulus" in ds  # nothing is computed yet
Out[7]: True

In [8]: ds["emodulus"]  # now data are computed and cached
Out[8]: 
array([1.11112189, 0.98155247,        nan, ...,        nan,        nan,
       0.68137091], shape=(5000,))

Once the data has been computed, RTDCBase caches it in the _ancillaries property dict together with a hash that is computed with AncillaryFeature.hash. The hash is computed from the feature data req_features and the configuration metadata req_config.
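The caching idea (recompute only when the required features or configuration change) can be sketched with hashlib. This is a toy model, not the code behind _ancillaries or AncillaryFeature.hash; toy_hash, get_cached and the scaling "feature" are hypothetical.

```python
import hashlib


def toy_hash(features, config):
    """Hypothetical hash over required feature data and config values."""
    hasher = hashlib.md5()
    for name in sorted(features):
        hasher.update(repr((name, features[name])).encode("utf-8"))
    for key in sorted(config):
        hasher.update(repr((key, config[key])).encode("utf-8"))
    return hasher.hexdigest()


cache = {}   # feature name -> (hash, data)
calls = []   # records every actual computation


def get_cached(name, features, config, method):
    """Recompute only when the hash of the inputs has changed."""
    current = toy_hash(features, config)
    if name not in cache or cache[name][0] != current:
        calls.append(name)
        cache[name] = (current, method(features, config))
    return cache[name][1]


def toy_method(features, config):
    # Placeholder computation; a real feature would do actual physics here
    return [d * config["emodulus temperature"] for d in features["deform"]]


feats = {"deform": [0.1, 0.2]}
cfg = {"emodulus temperature": 23.0}
get_cached("toy", feats, cfg, toy_method)
get_cached("toy", feats, cfg, toy_method)   # served from the cache
cfg["emodulus temperature"] = 24.0
get_cached("toy", feats, cfg, toy_method)   # config changed, so recomputed
```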

exception dclab.rtdc_dataset.feat_anc_core.ancillary_feature.BadFeatureSizeWarning[source]

class dclab.rtdc_dataset.feat_anc_core.ancillary_feature.AncillaryFeature(feature_name, method, req_config=None, req_features=None, req_func=<function AncillaryFeature.<lambda>>, priority=0, data=None, identifier=None)[source]

A data feature that is computed from existing data

Parameters:

feature_name (str) – The name of the ancillary feature, e.g. “emodulus”.
method (callable) – The method that computes the feature. This method takes an instance of RTDCBase as argument.
req_config (list) – Required configuration parameters to compute the feature, e.g. [“calculation”, [“emodulus lut”, “emodulus viscosity”]]
req_features (list) – Required existing features in the dataset, e.g. [“area_cvx”, “deform”]
req_func (callable) –
A function that takes an instance of RTDCBase as an argument and checks whether any other necessary criteria are met. By default, this is a lambda function that returns True. The function should return False if the necessary criteria are not met. This function may also return a hashable object (via dclab.util.objstr()) instead of True, if the criteria are subject to change. In this case, the return value is used for identifying the cached ancillary feature.

Changed in version 0.27.0: Support non-boolean return values for caching purposes.
priority (int) – The priority of the feature; if there are multiple AncillaryFeature defined for the same feature_name, then the priority of the features defines which feature returns True in self.is_available. A higher value means a higher priority.
data (object or BaseModel) – Any other data relevant for the feature (e.g. the ML model for computing ‘ml_score_xxx’ features)
identifier (None or str) – A unique identifier (e.g. MD5 hash) of the ancillary feature. For PluginFeatures or ML features, this should be computed at least from the input file and the feature name.

Notes

req_config and req_features are used to test whether the feature can be computed in self.is_available.

static available_features(rtdc_ds)[source]

Determine available features for an RT-DC dataset

Parameters:: rtdc_ds (instance of RTDCBase) – The dataset to check availability for
Returns:: features – Dictionary with feature names as keys and instances of AncillaryFeature as values.
Return type:: dict

static check_data_size(rtdc_ds, data_dict)[source]

Check the feature data is the correct size. If it isn’t, resize it.

Parameters:

rtdc_ds (instance of RTDCBase) – The dataset from which the features are computed
data_dict (dict) – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (to be resized) as values.

Returns:

data_dict – Dictionary with feature_name as keys and the correctly resized data features as values.

Return type:

dict

static get_instances(feature_name)[source]: Return all instances that compute feature_name

compute(rtdc_ds)[source]

Compute the feature with self.method. All ancillary features that share the same method will also be populated automatically.

Parameters:: rtdc_ds (instance of RTDCBase) – The dataset to compute the feature for
Returns:: data_dict – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (read-only) as values.
Return type:: dict

hash(rtdc_ds)[source]

Used for identifying an ancillary computation

The required features, the used configuration keys/values, and the return value of the requirement function are hashed.

is_available(rtdc_ds, verbose=False)[source]

Check whether the feature is available

Parameters:: rtdc_ds (instance of RTDCBase) – The dataset to check availability for
Returns:: available – True, if feature can be computed with compute
Return type:: bool

Notes

This method returns False for a feature if there is a feature defined with the same name but with higher priority (even if the feature would be available otherwise).
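The priority rule can be illustrated with a toy registry. This sketch is not the AncillaryFeature code; ToyFeature and its computable flag are hypothetical, and it assumes that only a higher-priority feature that can itself be computed takes precedence.

```python
class ToyFeature:
    """Hypothetical stand-in for a registered ancillary feature."""

    def __init__(self, feature_name, priority, computable):
        self.feature_name = feature_name
        self.priority = priority
        self.computable = computable  # stands in for the req_* checks


def is_available(feature, registry):
    """False if not computable or if a higher-priority twin is computable."""
    if not feature.computable:
        return False
    return not any(
        other.feature_name == feature.feature_name
        and other.priority > feature.priority
        and other.computable
        for other in registry
    )


registry = [
    ToyFeature("emodulus", 5, False),
    ToyFeature("emodulus", 1, True),
    ToyFeature("emodulus", 4, True),
]
availability = [is_available(f, registry) for f in registry]
```

Only the priority-4 entry reports as available here: the priority-5 entry cannot be computed, and the priority-1 entry is shadowed.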

feature_names = ['time', 'index', 'area_ratio', 'area_um', 'aspect', 'deform', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'fl1_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'fl1_max_ctc', 'fl2_max_ctc', 'fl1_max_ctc', 'fl3_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'contour', 'bright_avg', 'bright_sd', 'bright_bc_avg', 'bright_bc_sd', 'bright_perc_10', 'bright_perc_90', 'inert_ratio_cvx', 'inert_ratio_prnc', 'inert_ratio_raw', 'tilt', 'volume', 'ml_class', 'circ_times_area', 'area_exp']: All feature names registered

features = [<AncillaryFeature 'time' (no ID) with priority 0>, <AncillaryFeature 'index' (no ID) with priority 0>, <AncillaryFeature 'area_ratio' (no ID) with priority 0>, <AncillaryFeature 'area_um' (no ID) with priority 0>, <AncillaryFeature 'aspect' (no ID) with priority 0>, <AncillaryFeature 'deform' (no ID) with priority 0>, <AncillaryFeature 'emodulus' (no ID) with priority 5>, <AncillaryFeature 'emodulus' (no ID) with priority 1>, <AncillaryFeature 'emodulus' (no ID) with priority 4>, <AncillaryFeature 'emodulus' (no ID) with priority 0>, <AncillaryFeature 'emodulus' (no ID) with priority 2>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'contour' (no ID) with priority 0>, <AncillaryFeature 'bright_avg' (no ID) with priority 0>, <AncillaryFeature 'bright_sd' (no ID) with priority 0>, <AncillaryFeature 'bright_bc_avg' (no ID) with priority 0>, <AncillaryFeature 'bright_bc_sd' (no ID) with priority 0>, <AncillaryFeature 'bright_perc_10' (no ID) with priority 0>, <AncillaryFeature 'bright_perc_90' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_cvx' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_prnc' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_raw' (no ID) with priority 0>, <AncillaryFeature 'tilt' (no ID) with priority 0>, <AncillaryFeature 'volume' (no ID) with priority 0>, <AncillaryFeature 'ml_class' (no ID) with priority 0>, <PlugInFeature 'circ_times_area' (id 70254...) with priority 0>, <PlugInFeature 'area_exp' (id 5f03f...) with priority 0>]: All ancillary features registered

Plugin features

Added in version 0.34.0.

exception dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PluginImportError[source]

class dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PlugInFeature(feature_name, info, plugin_path=None)[source]

A user-defined plugin feature

Parameters:

feature_name (str) – name of a feature that matches that defined in info
info (dict) –
Full plugin recipe (for all features) as given in the info dictionary in the plugin file. At least the following keys must be specified:
- “method”: callable function computing the plugin feature values (takes an instance of dclab.rtdc_dataset.core.RTDCBase as argument)
- “feature names”: list of plugin feature names provided by the plugin
The following keys are optional:
- “description”: short (one-line) description of the plugin
- “long description”: long description of the plugin
- “feature labels”: feature labels used e.g. for plotting
- “feature shapes”: list of tuples for each feature indicating the shape (this is required only for non-scalar features; for scalar features simply set this to None or (1,)).
- “scalar feature”: list of boolean values indicating whether the features are scalar
- “config required”: configuration keys required to compute the plugin features (see the req_config parameter for AncillaryFeature)
- “features required”: list of feature names required to compute the plugin features (see the req_features parameter for AncillaryFeature)
- “method check required”: additional method that checks whether the features can be computed (see the req_func parameter for AncillaryFeature)
- “version”: version of this plugin (please use semantic versioning)
plugin_path (str or pathlib.Path, optional) – path which was used to load the PlugInFeature with load_plugin_feature().
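A plugin recipe built from the keys listed above might look like the following sketch. The description text, label and version string are made up, and the convention that the method returns a dictionary keyed by feature name is an assumption based on the data_dict returned by AncillaryFeature.compute. A plain dict stands in for the RTDCBase instance so that the example runs without a measurement file.

```python
def compute_circ_times_area(rtdc_ds):
    """Compute the plugin feature from two required features."""
    data = [c * a for c, a in zip(rtdc_ds["circ"], rtdc_ds["area_um"])]
    # Assumed convention: return a dict keyed by plugin feature name
    return {"circ_times_area": data}


info = {
    "method": compute_circ_times_area,
    "description": "Product of circularity and area",
    "feature names": ["circ_times_area"],
    "feature labels": ["Circularity times Area"],
    "features required": ["circ", "area_um"],
    "scalar feature": [True],
    "version": "0.1.0",
}

# Quick check with a plain dict standing in for an RTDCBase instance
toy_ds = {"circ": [0.5, 1.0], "area_um": [100.0, 50.0]}
result = info["method"](toy_ds)
```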

Notes

PlugInFeature inherits from AncillaryFeature. Please read the advanced section on plugin features in the dclab docs.

feature_name: Plugin feature name

plugin_feature_info: Dictionary containing all information relevant for this particular plugin feature instance

plugin_path: Path to the original plugin file

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.import_plugin_feature_script(plugin_path)[source]

Import the user-defined recipe and return the info dictionary

Parameters:: plugin_path (str or Path) – pathname to a valid dclab plugin script
Returns:: info – Dictionary with the information required to instantiate one (or multiple) PlugInFeature.
Return type:: dict
Raises:: PluginImportError – If the plugin cannot be found

Notes

One recipe may define multiple plugin features.

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.load_plugin_feature(plugin_path)[source]

Find and load PlugInFeature(s) from a user-defined recipe

Parameters:: plugin_path (str or Path) – pathname to a valid dclab plugin Python script
Returns:: plugin_list – list of PlugInFeature instances loaded from plugin_path
Return type:: list of PlugInFeature
Raises:: ValueError – If the “feature names” entry of the script's info dictionary is not a list

Notes

One recipe may define multiple plugin features.

See also

import_plugin_feature_script: function that imports the plugin script
PlugInFeature: class handling the plugin feature information
dclab.rtdc_dataset.feat_temp.register_temporary_feature: alternative method for creating user-defined features

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_all_plugin_features()[source]

Convenience function for removing all PlugInFeature instances

See also

remove_plugin_feature: remove a single PlugInFeature instance

dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_plugin_feature(plugin_instance)[source]

Convenience function for removing a PlugInFeature instance

Parameters:: plugin_instance (PlugInFeature) – The PlugInFeature instance to be removed from dclab
Raises:: TypeError – If the plugin_instance is not a PlugInFeature instance

Temporary features

Added in version 0.33.0.

dclab.rtdc_dataset.feat_temp.deregister_all()[source]: Deregisters all temporary features

dclab.rtdc_dataset.feat_temp.deregister_temporary_feature(feature)[source]

Convenience function for deregistering a temporary feature

This method is mostly used during testing. It does not remove the actual feature data from any dataset; the data will stay in memory but is not accessible anymore through the public methods of the RTDCBase user interface.

Parameters:: feature (str)

dclab.rtdc_dataset.feat_temp.register_temporary_feature(feature, label=None, is_scalar=True)[source]

Register a new temporary feature

Temporary features are custom features that can be defined ad hoc by the user. Temporary features are helpful when the integral features are not enough, e.g. for prototyping, testing, or collating with other data. Temporary features allow you to leverage the full functionality of RTDCBase with your custom features (no need to go for a custom pandas.DataFrame).

Parameters:

feature (str) – Feature name; allowed characters are lower-case letters, digits, and underscores
label (str) – Feature label used e.g. for plotting
is_scalar (bool) – Whether or not the feature is a scalar feature
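The character rule for feature names can be checked up front with a regular expression. This is a sketch of the documented character set only; is_valid_temporary_name is a hypothetical helper, and dclab may apply further checks (for example against already registered names).

```python
import re

# Lower-case letters, digits, and underscores only
_VALID_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_temporary_name(feature):
    """Hypothetical check for the documented character set."""
    return _VALID_NAME.fullmatch(feature) is not None


checks = {name: is_valid_temporary_name(name)
          for name in ["my_feature_2", "MyFeature", "my-feature", ""]}
```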

dclab.rtdc_dataset.feat_temp.set_temporary_feature(rtdc_ds, feature, data)[source]

Set temporary feature data for a dataset

Parameters:

rtdc_ds (dclab.RTDCBase) – Dataset for which to set the feature. Note that the length of the feature data must match the number of events in rtdc_ds. If the dataset is a hierarchy child, the data will also be set in the parent dataset, but only for those events that are part of the child. For all events in the parent dataset that are not part of the child dataset, the temporary feature is set to np.nan.
feature (str) – Feature name
data (np.ndarray) – The data
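The propagation from a hierarchy child to its parent can be pictured as scattering the child values onto the parent's filtered events. The sketch below uses plain lists and is not dclab's implementation; spread_to_parent and the boolean parent_filter are hypothetical.

```python
import math


def spread_to_parent(child_data, parent_filter):
    """Place child values at the parent's filtered events, NaN elsewhere."""
    child_iter = iter(child_data)
    return [next(child_iter) if keep else math.nan for keep in parent_filter]


parent_filter = [True, False, True, False, True]  # child holds 3 events
parent_data = spread_to_parent([1.5, 2.5, 3.5], parent_filter)
```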

Config

class dclab.rtdc_dataset.config.Configuration(files=None, cfg=None, disable_checks=False)[source]

Configuration class for RT-DC datasets

This class has a dictionary-like interface to access and set configuration values, e.g.

from dclab.rtdc_dataset.config import load_from_file

cfg = load_from_file("/path/to/config.txt")
# access the channel width
cfg["setup"]["channel width"]
# modify the channel width
cfg["setup"]["channel width"] = 30

Parameters:

files (list of files) – The config files with which to initialize the configuration
cfg (dict-like) – The dictionary with which to initialize the configuration
disable_checks (bool) – Set this to True if you want to avoid checking against section and key names defined in dclab.definitions using verify_section_key(). This avoids excess warning messages when loading data from configuration files not generated by dclab.

copy()[source]: Return copy of current configuration

get(key, other)[source]: Return the value for key, or other if the key does not exist (like dict.get)

Added in version 0.29.1.

keys()[source]: Return the configuration keys (sections)

save(filename)[source]: Save the configuration to a file

tojson()[source]

Convert the configuration to a JSON string

Note that the data type of some configuration options will likely be lost.

tostring(sections=None)[source]

Convert the configuration to its string representation

The optional argument sections allows exporting only specific sections of the configuration, e.g. sections=dclab.dfn.CFG_METADATA exports only configuration data from the original measurement and no filtering data.

update(newcfg)[source]: Update current config with a dictionary

dclab.rtdc_dataset.config.load_from_file(cfg_file)[source]

Load the configuration from a file

Parameters:: cfg_file (str) – Path to configuration file
Returns:: cfg – Dictionary with configuration parameters
Return type:: ConfigurationDict

Export

exception dclab.rtdc_dataset.export.LimitingExportSizeWarning[source]

class dclab.rtdc_dataset.export.Export(rtdc_ds)[source]

Export functionalities for RT-DC datasets

avi(path, filtered=True, override=False, pixel_format='yuv420p', codec='rawvideo', codec_options=None, progress_callback=None)[source]

Exports filtered event images to a video file

Parameters:

path (str) – Path to a video file. The container (.avi, .mkv, …) is deduced from the file suffix.
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
pixel_format (str) – Which pixel format to give to ffmpeg.
codec (str) – Codec name; e.g. “rawvideo” or “libx264”
codec_options (dict[str, str]) – Additional arguments to give to the codec using ffmpeg, e.g. {‘preset’: ‘slow’, ‘crf’: ‘0’} for “libx264” codec.
progress_callback (callable) – Function that takes at least two arguments: float between 0 and 1 for monitoring progress and a string describing what is being done.
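A callback with the described signature can be as small as the one below. The exporter toy_export is a hypothetical stand-in that only demonstrates when and how such a callback might be invoked; it does not write any video data, and the message text is made up.

```python
messages = []


def progress_callback(progress, message, *args):
    """Accepts at least a float in [0, 1] and a description string."""
    messages.append((progress, message))


def toy_export(n_frames, progress_callback=None):
    """Hypothetical exporter that reports progress after every frame."""
    for ii in range(n_frames):
        # ... a real exporter would write frame ii here ...
        if progress_callback is not None:
            progress_callback((ii + 1) / n_frames, "exporting frames")


toy_export(4, progress_callback=progress_callback)
```

Accepting *args keeps the callback compatible with the "at least two arguments" wording, in case more positional arguments are passed.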

Notes

Raises OSError if current dataset does not contain image data

fcs(path, features, meta_data=None, filtered=True, override=False, progress_callback=None)[source]

Export the data of an RT-DC dataset to an .fcs file

Parameters:

path (str) – Path to an .fcs file. The ending .fcs is added automatically.
features (list of str) – The features in the resulting .fcs file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
meta_data (dict) – User-defined, optional key-value pairs that are stored in the primary TEXT segment of the FCS file; the version of dclab is stored there by default
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
progress_callback (callable) – Function that takes at least two arguments: float between 0 and 1 for monitoring progress and a string describing what is being done.

Notes

Due to incompatibility with the .fcs file format, all events with NaN-valued features are not exported.
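The effect of this rule on the exported event count can be sketched with plain lists. The helper drop_nan_events is hypothetical and only illustrates the documented behavior (an event is dropped as soon as any of its exported features is NaN); it is not the export code itself.

```python
import math


def drop_nan_events(columns):
    """Keep only the events where no feature value is NaN."""
    names = list(columns)
    n_events = len(columns[names[0]])
    keep = [
        not any(math.isnan(columns[name][ii]) for name in names)
        for ii in range(n_events)
    ]
    return {
        name: [v for v, k in zip(columns[name], keep) if k]
        for name in names
    }


columns = {
    "deform": [0.01, 0.02, 0.03],
    "emodulus": [1.1, math.nan, 0.7],
}
exported = drop_nan_events(columns)
```

A consequence is that the number of exported events depends on the chosen feature list: adding a feature with many NaN values (such as an emodulus outside the lookup table) shrinks the output.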

hdf5(path, features=None, filtered=True, logs=False, tables=False, basins=False, allow_contour=False, meta_prefix='src_', override=False, compression_kwargs=None, compression='deprecated', skip_checks=False, progress_callback=None)[source]

Export the data of the current instance to an HDF5 file

Parameters:

path (str) – Path to an .rtdc file. The ending .rtdc is added automatically.
features (list of str) – The features in the resulting .rtdc file. These are strings that are defined by dclab.definitions.feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “image”. Defaults to self.rtdc_ds.features_innate.
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
logs (bool) – Whether to store the logs of the original file in the output file, with their names prefixed by meta_prefix.
tables (bool) – Whether to store the tables of the original file in the output file, with their names prefixed by meta_prefix.
basins (bool) – Whether to export basins. If filtering is disabled, basins are copied directly to the output file. If filtering is enabled, then mapped basins are exported.
allow_contour (bool) – Whether to allow exporting the “contour” feature. Writing this feature to an HDF5 file is extremely inefficient, because it cannot be represented by an ND array and thus must be stored in a group, each contour stored in a separate dataset. The contour can easily be computed via the mask, so actually storing the contour should be avoided. If “contour” is in features, it will only be written to the output file if allow_contour=True.
meta_prefix (str) – Prefix for log and table names in the exported file
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
compression_kwargs (dict) – Dictionary with the keys “compression” and “compression_opts” which are passed to h5py.File.create_dataset(). The default is Zstandard compression with compression level 5, i.e. hdf5plugin.Zstd(clevel=5).
compression (str or None) –
Compression method used for data storage; one of [None, “lzf”, “gzip”, “szip”].

Deprecated since version 0.43.0: Use compression_kwargs instead.
skip_checks (bool) – Disable checking whether all features have the same length.
progress_callback (callable) – Function that takes at least two arguments: float between 0 and 1 for monitoring progress and a string describing what is being done.
Changed in version 0.58.0: The basins keyword argument was added, and it is now possible to pass an empty list to features. This combination results in a very small file consisting of metadata and a mapped basin referring to the original dataset.
Changed in version 0.71.8: The relative path to the original file is stored in the basin definition as well. Previously, renaming folders containing basin files and exported files located in different subdirectories broke the basin localization.

tsv(path, features, meta_data=None, filtered=True, override=False, progress_callback=None)[source]

Export the data of the current instance to a .tsv file

Parameters:

path (str) – Path to a .tsv file. The ending .tsv is added automatically.
features (list of str) – The features in the resulting .tsv file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
meta_data (dict) – User-defined, optional key-value pairs that are stored at the beginning of the tsv file - one key-value pair is stored per line which starts with a hash. The version of dclab is stored there by default.
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
progress_callback (callable) – Function that takes at least two arguments: float between 0 and 1 for monitoring progress and a string describing what is being done.
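
The progress_callback parameter of the export methods above is described the same way each time: a callable taking a float between 0 and 1 and a descriptive string. A minimal sketch of a compatible callable could look as follows. The names make_progress_logger and history are illustrative and not part of dclab, and the loop at the end only simulates how an exporter might invoke the callback.

```python
def make_progress_logger(history):
    """Return a callback that accepts (progress, message, *extra)."""
    def callback(progress, message, *extra):
        # progress is documented as a float between 0 and 1
        history.append((float(progress), str(message)))
        print(f"{progress:6.1%}  {message}")
    return callback


history = []
cb = make_progress_logger(history)

# Simulated calls; a real exporter would call cb itself when it is
# passed as progress_callback=cb.
for step, text in [(0.0, "starting"), (0.5, "writing features"), (1.0, "done")]:
    cb(step, text)
```

Accepting *extra keeps the callable usable if more than two arguments are ever passed, which matches the "at least two arguments" wording.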

Filter

class dclab.rtdc_dataset.filter.Filter(rtdc_ds)[source]

Boolean filter arrays for RT-DC measurements

Parameters:: rtdc_ds (instance of RTDCBase) – The RT-DC dataset the filter applies to

reset()[source]: Reset all filters

update(rtdc_ds, force=None)[source]

Update the filters according to rtdc_ds.config[“filtering”]

Parameters:

rtdc_ds (dclab.rtdc_dataset.core.RTDCBase) – The measurement to which the filter is applied
force (list) – A list of feature names that must be refiltered with min/max values.

Notes

This function is called when ds.apply_filter is called.

property all

All filters combined (see Filter.update())

Use this property to filter the features of dclab.rtdc_dataset.RTDCBase instances

property box: All box filters

property invalid: Invalid (nan/inf) events

property polygon: Polygon filters

Low-level functionalities

downsampling

Content-based downsampling of ndarrays

dclab.downsampling.downsample_grid(a, b, samples, remove_invalid=False, ret_idx=False)

Content-based downsampling for faster visualization

The arrays a and b make up a 2D scatter plot with high and low density values. This method takes out points at indices with high density.

Parameters:

a (1d ndarrays) – The input arrays to downsample
b (1d ndarrays) – The input arrays to downsample
samples (int) – The desired number of samples
remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than samples due to infinite or nan values.
ret_idx (bool) – Also return a boolean array that corresponds to the downsampled indices in a and b.

Returns:

dsa, dsb (1d ndarrays of shape (samples,)) – The arrays a and b downsampled by evenly selecting points and pseudo-randomly adding or removing points to match samples.
idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa

dclab.downsampling.downsample_rand(a, samples, remove_invalid=False, ret_idx=False)

Downsampling by randomly removing points

Parameters:

a (1d ndarray) – The input array to downsample
samples (int) – The desired number of samples
remove_invalid (bool) – Remove nan and inf values before downsampling
ret_idx (bool) – Also return a boolean array that corresponds to the downsampled indices in a.

Returns:

dsa (1d ndarray of size samples) – The pseudo-randomly downsampled array a
idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa
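
The relationship between dsa and idx can be illustrated with a standard-library sketch. This is not the dclab implementation: it uses plain lists instead of ndarrays, the function name downsample_rand_sketch and the seed parameter are assumptions made for this example, and invalid values are not handled.

```python
import random


def downsample_rand_sketch(a, samples, seed=47):
    """Randomly keep `samples` items of `a`; return (dsa, idx)."""
    samples = min(samples, len(a))
    rng = random.Random(seed)  # fixed seed gives a reproducible selection
    keep = set(rng.sample(range(len(a)), samples))
    # boolean mask with the same length as the input
    idx = [i in keep for i in range(len(a))]
    # list equivalent of the ndarray expression a[idx]
    dsa = [v for v, k in zip(a, idx) if k]
    return dsa, idx


a = [0.5 * i for i in range(20)]
dsa, idx = downsample_rand_sketch(a, samples=5)
```

Because the mask preserves the original ordering, the downsampled values stay in the same order as in a.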

dclab.downsampling.norm(a): Normalize a with its min/max values

dclab.downsampling.populate_grid(x_discrete, y_discrete, keepd, toproc)

dclab.downsampling.valid(a, b): Check whether a and b are not inf or nan

features

image-based

dclab.features.contour.get_contour(mask)[source]

Compute the image contour from a mask

The contour is computed in a very inefficient way using scikit-image and a conversion of float coordinates to pixel coordinates.

Parameters:: mask (binary ndarray of shape (M,N) or (K,M,N)) – The mask outlining the pixel positions of the event. If a 3d array is given, then K indexes the individual contours.
Returns:: cont – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
Return type:: ndarray or list of K ndarrays of shape (J,2)

dclab.features.bright.get_bright(mask, image, ret_data='avg,sd')[source]

Compute avg and/or std of the event brightness

The event brightness is defined by the gray-scale values of the image data within the event mask area.

Parameters:

mask (ndarray or list of ndarrays of shape (M,N) and dtype bool) – The mask values, True where the event is located in image.
image (ndarray or list of ndarrays of shape (M,N)) – A 2D array that holds the image in form of grayscale values of an event.
ret_data (str) – A comma-separated list of metrics to compute:
- “avg”: compute the average
- “sd”: compute the standard deviation
Selected metrics are returned in alphabetical order.

Returns:

bright_avg (float or ndarray of size N) – Average image data within the contour
bright_std (float or ndarray of size N) – Standard deviation of image data within the contour

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]] | tuple[float, float] | tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], ndarray[tuple[Any, …], dtype[_ScalarT]]]
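
As an illustration of the definition above (gray values inside the event mask), the following standard-library sketch computes both metrics for a single event given as nested lists. The function name bright_sketch is hypothetical, and the use of the population standard deviation is an assumption of this example.

```python
from statistics import fmean, pstdev


def bright_sketch(mask, image):
    """Return (avg, sd) of the gray values where mask is True."""
    values = [px for mrow, irow in zip(mask, image)
              for m, px in zip(mrow, irow) if m]
    # population standard deviation (assumption, see lead-in)
    return fmean(values), pstdev(values)


image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[True, True, False],
        [False, False, False]]
avg, sd = bright_sketch(mask, image)
```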

dclab.features.inert_ratio.get_inert_ratio_cvx(cont)[source]

Compute the inertia ratio of the convex hull of a contour

The inertia ratio is computed from the second-order central moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).

Parameters:

cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

Returns:

inert_ratio_cvx (float or ndarray of size N) – The inertia ratio of the contour’s convex hull
Changed in version 0.48.2: For long channels, an integer overflow could occur in previous versions, leading to invalid or nan values. See https://github.com/DC-analysis/dclab/issues/212

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]

Notes

The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.

See also

get_inert_ratio_raw: Compute inertia ratio of a raw contour

dclab.features.inert_ratio.get_inert_ratio_raw(cont)[source]

Compute the inertia ratio of a contour

The inertia ratio is computed from the second-order central moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).

Parameters:

cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.

Returns:

inert_ratio_raw (float or ndarray of size N) – The inertia ratio of the contour
Changed in version 0.48.2: For long channels, an integer overflow could occur in previous versions, leading to invalid or nan values. See https://github.com/DC-analysis/dclab/issues/212

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]

Notes

The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.
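
The expression sqrt(mu20/mu02) can be made concrete with a small sketch that treats the contour as a closed polygon and computes its moments with the standard Green's-theorem formulas in floating point. The function name inert_ratio_sketch is hypothetical, and no claim is made that this matches the dclab or OpenCV code line by line.

```python
from math import sqrt


def inert_ratio_sketch(cont):
    """Inertia ratio sqrt(mu20/mu02) of a closed polygon [(x, y), ...]."""
    m00 = m10 = m01 = m20 = m02 = 0.0
    n = len(cont)
    for i in range(n):
        x0, y0 = cont[i]
        x1, y1 = cont[(i + 1) % n]  # wrap around to close the polygon
        a = x0 * y1 - x1 * y0       # twice the signed triangle area
        m00 += a / 2
        m10 += a * (x0 + x1) / 6
        m01 += a * (y0 + y1) / 6
        m20 += a * (x0 * x0 + x0 * x1 + x1 * x1) / 12
        m02 += a * (y0 * y0 + y0 * y1 + y1 * y1) / 12
    # shift the raw moments to the centroid (central moments)
    mu20 = m20 - m10 * m10 / m00
    mu02 = m02 - m01 * m01 / m00
    return sqrt(mu20 / mu02)


# A rectangle of width 4 and height 2: analytically mu20/mu02 = (4/2)**2
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
ratio = inert_ratio_sketch(rect)
```

For a w × h rectangle the ratio is w/h, so shapes elongated along x give values above 1 and shapes elongated along y give values below 1.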

See also

get_inert_ratio_cvx: Compute inertia ratio of the convex hull of a contour

dclab.features.volume.get_volume(cont, pos_x, pos_y, pix, fix_orientation=False)[source]

Calculate the volume of a polygon revolved around an axis

The volume estimation assumes rotational symmetry.

Parameters:

cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event [px] e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
pos_x (float or ndarray of length N) – The x coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_x
pos_y (float or ndarray of length N) – The y coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_y
pix (float) – The detector pixel size in µm. e.g. obtained using: mm.config[“imaging”][“pixel size”]
fix_orientation (bool) – If set to True, make sure that the orientation of the contour is counter-clockwise in the r-z plane (see vol_revolve()). This is False by default, because (1) Shape-In always stores the contours in the correct orientation and (2) there may be events with high porosity where “fixing” the orientation makes things worse and a negative volume is returned.

Returns:

Volume in µm³

Return type:

volume

Notes

The computation of the volume is based on a full rotation of the upper and the lower halves of the contour from which the average is then used.

The volume is computed radially from the center position given by (pos_x, pos_y). For sufficiently smooth contours, such as densely sampled ellipses, the center position does not play an important role. For contours that are given on a coarse grid, as is the case for RT-DC, the center position must be given.

References

https://de.wikipedia.org/wiki/Kegelstumpf#Formeln
Yields identical results to the Matlab script by Geoff Olynyk: https://de.mathworks.com/matlabcentral/fileexchange/36525-volrevolve

dclab.features.volume.counter_clockwise(cx, cy)[source]

Put contour coordinates into counter-clockwise order

Parameters:

cx (1d ndarrays) – The x- and y-coordinates of the contour
cy (1d ndarrays) – The x- and y-coordinates of the contour

Returns:

The x- and y-coordinates of the contour in counter-clockwise orientation.

Return type:

cx_cc, cy_cc

Notes

The contour must be centered around (0, 0).
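
One common way to enforce a counter-clockwise orientation is to check the sign of the shoelace (signed) area and reverse the point order if it is negative. The sketch below shows that technique with plain lists. It is not necessarily how dclab implements counter_clockwise(), and the function name counter_clockwise_sketch is hypothetical.

```python
def counter_clockwise_sketch(cx, cy):
    """Return (cx, cy) reordered so the polygon runs counter-clockwise."""
    n = len(cx)
    # shoelace formula: positive signed area means counter-clockwise
    area2 = sum(cx[i] * cy[(i + 1) % n] - cx[(i + 1) % n] * cy[i]
                for i in range(n))
    if area2 < 0:
        return cx[::-1], cy[::-1]
    return cx, cy


# clockwise square centered at (0, 0)
cx = [-1, -1, 1, 1]
cy = [-1, 1, 1, -1]
cx_cc, cy_cc = counter_clockwise_sketch(cx, cy)
```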

dclab.features.volume.vol_revolve(r, z, point_scale=1.0)[source]

Calculate the volume of a polygon revolved around the Z-axis

This implementation yields the same results as the volRevolve Matlab function by Geoff Olynyk (from 2012-05-03) https://de.mathworks.com/matlabcentral/fileexchange/36525-volrevolve.

The difference is that the volume is computed with a much more approachable implementation based on the volume of a truncated cone (https://de.wikipedia.org/wiki/Kegelstumpf).

\[V = \frac{h \cdot \pi}{3} \cdot (R^2 + R \cdot r + r^2)\]

Where \(h\) is the height of the truncated cone and \(r\) and \(R\) are its smaller and larger radii.

Each line segment of the contour resembles one truncated cone. If the z-step is positive (counter-clockwise contour), then the truncated cone volume is added to the total volume. If the z-step is negative (e.g. inclusion), then the truncated cone volume is removed from the total volume.

Changed in version 0.37.0: The volume in previous versions was overestimated by 2 µm³ on average.

Parameters:

r (1d ndarray) – radial coordinates (perpendicular to the z axis)
z (1d ndarray) – coordinate along the axis of rotation
point_scale (float) – point size in your preferred units; The volume is multiplied by a factor of point_scale**3.

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]

Notes

The coordinates must be given in counter-clockwise order, otherwise the volume will be negative.
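
The truncated-cone summation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the dclab code: it uses plain lists, closes the contour by connecting the last point to the first, and treats r as the horizontal and z as the vertical axis when speaking of counter-clockwise order. The function name vol_revolve_sketch is hypothetical.

```python
from math import pi


def vol_revolve_sketch(r, z, point_scale=1.0):
    """Sum signed truncated-cone volumes along a closed (r, z) contour."""
    vol = 0.0
    n = len(r)
    for i in range(n):
        j = (i + 1) % n   # next point, closing the contour
        h = z[j] - z[i]   # signed z-step of this segment
        vol += h * pi / 3 * (r[i] ** 2 + r[i] * r[j] + r[j] ** 2)
    return vol * point_scale ** 3


# Rectangle in the r-z plane, counter-clockwise: a cylinder with R=2, H=5
r = [0.0, 2.0, 2.0, 0.0]
z = [0.0, 0.0, 5.0, 5.0]
v_ccw = vol_revolve_sketch(r, z)
v_cw = vol_revolve_sketch(r[::-1], z[::-1])
```

For this contour only the segment at the outer radius has a non-zero z-step, so v_ccw equals the cylinder volume π·R²·H, and reversing the point order flips the sign, in line with the note on counter-clockwise order.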

emodulus

Computation of apparent Young’s modulus for RT-DC measurements

exception dclab.features.emodulus.KnowWhatYouAreDoingWarning[source]

exception dclab.features.emodulus.YoungsModulusLookupTableExceededWarning[source]

dclab.features.emodulus.extrapolate_emodulus(lut, datax, deform, emod, deform_norm, deform_thresh=0.05, inplace=True)[source]

Use spline interpolation to fill in nan-values

When points (datax, deform) are outside the convex hull of the lut, then scipy.interpolate.griddata() returns nan-values.

With this function, some of these nan-values are extrapolated using scipy.interpolate.SmoothBivariateSpline. The supported extrapolation values are currently limited to those where the deformation is above 0.05.

A warning will be issued, because this is not really recommended.

Parameters:

lut (ndarray of shape (N, 3)) – The normalized (!! see normalize()) LUT (first axis is points, second axis enumerates datax, deform, and emodulus)
datax (ndarray of size N) – The normalized x data (corresponding to lut[:, 0])
deform (ndarray of size N) – The normalized deform (corresponding to lut[:, 1])
emod (ndarray of size N) – The emodulus (corresponding to lut[:, 2]); If emod does not contain nan-values, there is nothing to do here.
deform_norm (float) – The normalization value used to normalize lut[:, 1] and deform.
deform_thresh (float) – Not the entire LUT is used for bivariate spline interpolation. Only the points where lut[:, 1] > deform_thresh/deform_norm are used. This is necessary, because for small deformations, the LUT has an extreme slope that kills any meaningful spline interpolation.
inplace (bool) – If True (default), replaces nan values in emod in-place. If False, emod is not modified.

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.get_emodulus(deform, area_um=None, volume=None, medium='0.49% MC-PBS', channel_width=20.0, flow_rate=0.16, px_um=0.34, temperature=23.0, lut_data='LE-2D-FEM-19', visc_model='herold-2017-fallback', extrapolate=False, copy=True)[source]

Compute apparent Young’s modulus using a look-up table

Parameters:

deform (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Deformation (1-circularity) of the event(s)
area_um (float | ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – Apparent (2D image) area [µm²] of the event(s)
volume (float | ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) –
Apparent volume of the event(s). It is not possible to define volume and area_um at the same time (makes no sense).

Added in version 0.25.0.
medium (float | str) – The medium to compute the viscosity for. If a string is given, the viscosity is computed. If a float is given, this value is used as the viscosity in mPa*s (Note that temperature and visc_model must be set to None in this case).
channel_width (float) – The channel width [µm]
flow_rate (float) – Flow rate [µL/s]
px_um (float) – The detector pixel size [µm] used for pixelation correction. Set to zero to disable.
temperature (float | ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – Temperature [°C] of the event(s)
lut_data (str, path, or tuple of (np.ndarray of shape (N, 3), dict)) –
The LUT data to use. If it is a built-in identifier, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, metadata) is possible. The LUT metadata is used to check whether the given features (e.g. area_um and deform) are valid interpolation choices.

Added in version 0.25.0.
visc_model (Literal['herold-2017', 'herold-2017-fallback', 'buyukurganci-2022', 'kestin-1978', None]) – The viscosity model to use, see dclab.features.emodulus.viscosity.get_viscosity()
extrapolate (bool) – Perform extrapolation using extrapolate_emodulus(). This is discouraged!
copy (bool) – Copy input arrays. If set to False, input arrays are overridden.

Returns:

elasticity – Apparent Young’s modulus in kPa

Return type:

float or ndarray

Notes

The look-up table used was computed with finite element methods according to [MMM+17] and complemented with analytical isoelastics from [MOG+15]. The original simulation results are available on figshare [WMM+20].
The computation of the Young’s modulus takes into account a correction for the viscosity (medium, channel width, flow rate, and temperature) [MOG+15] and a pixelation correction for the deformation, which was derived from a (pixelated) image [Her17].
Note that while deformation is pixelation-corrected, area_um and volume are scaled to match the LUT data. This is somewhat fortunate, because we don’t have to worry about the order of applying pixelation correction and scale conversion.
By using external LUTs, it is possible to interpolate on the volume-deformation plane. This feature was added in version 0.25.0.

See also

dclab.features.emodulus.viscosity.get_viscosity: compute viscosity for known media

dclab.features.emodulus.normalize(data, dmax)[source]

Perform normalization in-place for interpolation

Note that scipy.interpolate.griddata() has a rescale option which rescales the data onto the unit cube. For some reason this does not work well with LUT data, so we just normalize it by dividing by the maximum value.

Parameters:

data (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dmax (float)

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.INACCURATE_SPLINE_EXTRAPOLATION = False: Set this to True to globally enable spline extrapolation when the area_um/deform data are outside the LUT. This is discouraged and a KnowWhatYouAreDoingWarning warning will be issued.

dclab.features.emodulus.load.get_internal_lut_names_dict()[source]

Return a dict of internal LUT names

Return type:: dict

dclab.features.emodulus.load.get_lut_path(path_or_id)[source]

Find the path to a LUT

path_or_id: Identifier of a LUT. This can be either an existing path (checked first), or an internal identifier (see get_internal_lut_names_dict()).

Parameters:: path_or_id (str | Path)
Return type:: Path

dclab.features.emodulus.load.load_lut(lut_data='LE-2D-FEM-19')[source]

Load LUT data from disk

Parameters:

lut_data (pathlib.Path, str, or tuple of (np.ndarray of shape (N, 3), dict)) – The LUT data to use. If it is in get_internal_lut_names_dict(), then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, metadata) is possible.

Returns:

lut (np.ndarray of shape (N, 3)) – The LUT data for interpolation
meta (dict) – The LUT metadata

Return type:

tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], dict]

Notes

If lut_data is a tuple of (lut, meta), then nothing is actually done (this is implemented for user convenience).

dclab.features.emodulus.load.load_mtext(path)[source]

Load column-based data from text files with metadata

This file format is used for isoelasticity lines and look-up table data in dclab.

The text file is loaded with numpy.loadtxt. The metadata are stored as a json string between the “BEGIN METADATA” and the “END METADATA” tags. The last comment (#) line before the actual data defines the features with units in square brackets and tab-separated. For instance:

    # […]
    #
    # BEGIN METADATA
    # {
    #   "authors": "A. Mietke, C. Herold, J. Guck",
    #   "channel_width": 20.0,
    #   "channel_width_unit": "um",
    #   "date": "2018-01-30",
    #   "dimensionality": "2Daxis",
    #   "flow_rate": 0.04,
    #   "flow_rate_unit": "uL/s",
    #   "fluid_viscosity": 15.0,
    #   "fluid_viscosity_unit": "mPa s",
    #   "identifier": "LE-2D-ana-18",
    #   "method": "analytical",
    #   "model": "linear elastic",
    #   "publication": "https://doi.org/10.1016/j.bpj.2015.09.006",
    #   "software": "custom Matlab code",
    #   "summary": "2D-axis-symmetric analytical solution"
    # }
    # END METADATA
    #
    # […]
    #
    # area_um [um^2]    deform    emodulus [kPa]
    3.75331e+00    5.14496e-03    9.30000e-01
    4.90368e+00    6.72683e-03    9.30000e-01
    6.05279e+00    8.30946e-03    9.30000e-01
    7.20064e+00    9.89298e-03    9.30000e-01
    […]
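
To make the file layout more tangible, here is a standard-library sketch that parses such content from a string. It is not the dclab implementation (which is documented to use numpy.loadtxt). The function name parse_mtext_sketch is hypothetical, and the sample contains only a subset of the metadata shown above.

```python
import json


def parse_mtext_sketch(text):
    """Return (meta, header, rows) parsed from mtext-style content."""
    meta_lines, header, rows = [], None, []
    in_meta = False
    for line in text.splitlines():
        if line.startswith("#"):
            content = line[1:].strip()
            if content == "BEGIN METADATA":
                in_meta = True
            elif content == "END METADATA":
                in_meta = False
            elif in_meta:
                meta_lines.append(content)
            elif content:
                # the last non-empty comment line holds the column names
                header = content.split("\t")
        elif line.strip():
            rows.append([float(v) for v in line.split()])
    meta = json.loads("\n".join(meta_lines))
    return meta, header, rows


sample = "\n".join([
    "# BEGIN METADATA",
    "# {",
    '#   "identifier": "LE-2D-ana-18",',
    '#   "channel_width": 20.0',
    "# }",
    "# END METADATA",
    "#",
    "# area_um [um^2]\tdeform\temodulus [kPa]",
    "3.75331e+00\t5.14496e-03\t9.30000e-01",
    "4.90368e+00\t6.72683e-03\t9.30000e-01",
])
meta, header, rows = parse_mtext_sketch(sample)
```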

Parameters:: path (str | Path)
Return type:: tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], dict]

dclab.features.emodulus.load.register_lut(path, identifier=None)[source]

Register an external LUT file in dclab

This will add it to EXTERNAL_LUTS, which is required for emodulus computation as an ancillary feature.

Parameters:

path (str or pathlib.Path) – Path to the external LUT file
identifier (str or None) – The identifier is used for ancillary emodulus computation via the [calculation]: “emodulus lut” key. It is also used as the key in EXTERNAL_LUTS during registration. If not specified (default), the identifier given as JSON metadata in path is used.

Return type:

None

dclab.features.emodulus.load.EXTERNAL_LUTS = {}: Dictionary of look-up tables that the user added via register_lut().

Pixelation correction definitions

dclab.features.emodulus.pxcorr.corr_deform_with_area_um(area_um, px_um=0.34)[source]

Deformation correction for area_um-deform data

The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.

The correction formula is described in [Her17].

Parameters:

area_um (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Area of the event(s) in µm²
px_um (float) – The detector pixel size in µm.

Returns:

Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta

Return type:

deform_delta

dclab.features.emodulus.pxcorr.corr_deform_with_volume(volume, px_um=0.34)[source]

Deformation correction for volume-deform data

The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.

The correction is derived in scripts/pixelation_correction.py.

Parameters:

volume (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – The “volume” feature (rotation of raw contour) [µm³]
px_um (float) – The detector pixel size in µm.

Returns:

Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta

Return type:

deform_delta

dclab.features.emodulus.pxcorr.get_pixelation_delta(feat_corr, feat_absc, data_absc, px_um=0.34)[source]

Convenience function for obtaining pixelation correction

Parameters:

feat_corr (str) – Feature for which to compute the pixelation correction (e.g. “deform”)
feat_absc (str) – Feature with which to compute the correction (e.g. “area_um”);
data_absc (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Corresponding data for feat_absc
px_um (float) – Detector pixel size [µm]

Returns:

For details see corr_deform_with_area_um() and corr_deform_with_volume().

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.pxcorr.get_pixelation_delta_pair(feat1, feat2, data1, data2, px_um=0.34)[source]

Convenience function that returns pixelation correction pair

Parameters:

feat1 (str)
feat2 (str)
data1 (float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
data2 (float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
px_um (float)

Return type:

tuple[float, float] | tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], ndarray[tuple[Any, …], dtype[_ScalarT]]]

Scale conversion applicable to a linear elastic model

dclab.features.emodulus.scale_linear.convert(area_um, deform, channel_width_in, channel_width_out, emodulus=None, flow_rate_in=None, flow_rate_out=None, viscosity_in=None, viscosity_out=None, inplace=False)[source]

Convert area-deformation-emodulus triplet

The conversion formula is described in [MOG+15].

Parameters:

area_um (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Convex cell area [µm²]
deform (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Deformation
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
emodulus (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Young’s Modulus [kPa]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Target viscosity [mPa*s]; This can be an array
inplace (bool) – If True, override input arrays with corrected data

Returns:

area_um_corr – Corrected cell area [µm²]
deform_corr – Deformation (a copy if inplace is False)
emodulus_corr – Corrected emodulus [kPa]; only returned if emodulus is given.

Return type:

tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], ndarray[tuple[Any, …], dtype[_ScalarT]]] | tuple[ndarray[tuple[Any, …], dtype[_ScalarT]], ndarray[tuple[Any, …], dtype[_ScalarT]], ndarray[tuple[Any, …], dtype[_ScalarT]]]

Notes

If only area_um, deform, channel_width_in and channel_width_out are given, then only the area is corrected and returned together with the original deform. If all other arguments are not set to None, the emodulus is corrected and returned as well.

dclab.features.emodulus.scale_linear.scale_area_um(area_um, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]

Perform scale conversion for area_um (linear elastic model)

The area scales with the characteristic length “channel radius” L according to (L’/L)².

The conversion formula is described in [MOG+15].

Parameters:

area_um (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Convex area [µm²]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
inplace (bool) – If True, override input arrays with corrected data
kwargs – not used

Returns:

Scaled area [µm²]

Return type:

area_um_corr

dclab.features.emodulus.scale_linear.scale_emodulus(emodulus, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]

Perform scale conversion for emodulus (linear elastic model)

The conversion formula is described in [MOG+15].

Parameters:

emodulus (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Young’s Modulus [kPa]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Target viscosity [mPa*s]; This can be an array
inplace (bool) – If True, override input arrays with corrected data

Returns:

Scaled emodulus [kPa]

Return type:

emodulus_corr

dclab.features.emodulus.scale_linear.scale_feature(feat, data, inplace=False, **scale_kw)[source]

Convenience function for scale conversions (linear elastic model)

This method wraps around all the other scale_* methods and also supports deform/circ.

Parameters:

feat (str) – Valid scalar feature name
data (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Feature data
inplace (bool) – If True, override input arrays with corrected data
**scale_kw – Scale keyword arguments for the wrapped methods

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.scale_linear.scale_volume(volume, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]

Perform scale conversion for volume (linear elastic model)

The volume scales with the characteristic length “channel radius” L according to (L’/L)³.

Parameters:

volume (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Volume [µm³]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
inplace (bool) – If True, override input arrays with corrected data
kwargs – not used

Returns:

Scaled volume [µm³]

Return type:

volume_corr
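
The scaling rules quoted for scale_area_um() and scale_volume(), (L’/L)² and (L’/L)³, can be checked with a short sketch. It assumes that the ratio L’/L equals the ratio of the target to the original channel width, and the function names are hypothetical stand-ins for the documented functions.

```python
def scale_area_sketch(area_um, channel_width_in, channel_width_out):
    """Scale an area [µm²] with the square of the length ratio."""
    return area_um * (channel_width_out / channel_width_in) ** 2


def scale_volume_sketch(volume, channel_width_in, channel_width_out):
    """Scale a volume [µm³] with the cube of the length ratio."""
    return volume * (channel_width_out / channel_width_in) ** 3


# Convert from a 20 µm channel to a 30 µm channel (length ratio 1.5)
area_30 = scale_area_sketch(100.0, 20.0, 30.0)
vol_30 = scale_volume_sketch(1000.0, 20.0, 30.0)
```

With a length ratio of 1.5, the area grows by a factor of 2.25 and the volume by a factor of 3.375.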

Viscosity computation for various media

exception dclab.features.emodulus.viscosity.TemperatureOutOfRangeWarning[source]

dclab.features.emodulus.viscosity.check_temperature(model, temperature, tmin, tmax)[source]

Raise a TemperatureOutOfRangeWarning if applicable

Raises:

TemperatureOutOfRangeWarning – If the given temperature is out of the given range.

Parameters:

model (str)
temperature (float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
tmin (float)
tmax (float)

Return type:

None

dclab.features.emodulus.viscosity.get_viscosity(medium='0.49% MC-PBS', channel_width=20.0, flow_rate=0.16, temperature=23.0, model='herold-2017-fallback')[source]

Returns the viscosity for RT-DC-specific media

Media that are not pure (e.g. ketchup or polymer solutions) often exhibit a non-linear relationship between shear rate (determined by the velocity profile) and shear stress (determined by pressure differences). If the shear stress grows non-linearly with the shear rate resulting in a slope in log-log space that is less than one, then we are talking about shear thinning. The viscosity is not a constant anymore (as it is e.g. for water). At higher flow rates, the viscosity becomes smaller, following a power law. Christoph Herold characterized shear thinning for the CellCarrier media [Her17]. The resulting formulae for computing the viscosities of these media at different channel widths, flow rates, and temperatures, are implemented here.

Parameters:

medium (str) – The medium to compute the viscosity for; Valid values are defined in KNOWN_MEDIA.
channel_width (float) – The channel width in µm
flow_rate (float) – Flow rate in µL/s
temperature (float | ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Temperature in °C
model (Literal['herold-2017', 'herold-2017-fallback', 'buyukurganci-2022', 'kestin-1978']) – The model name to use for computing the medium viscosity. For water, this value is ignored, as there is only the ‘kestin-1978’ model [KSW78]. For MC-PBS media, there are the ‘herold-2017’ model [Her17] and the ‘buyukurganci-2022’ model [BBN+23].

Returns:

Viscosity in mPa*s

Return type:

viscosity

Notes

CellCarrier (0.49% MC-PBS) and CellCarrier B (0.59% MC-PBS) are media designed for RT-DC experiments.
A TemperatureOutOfRangeWarning is issued if the input temperature range exceeds the temperature ranges of the models.
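
The shear thinning described above can be illustrated with a generic power-law fluid, whose apparent viscosity is K * shear_rate**(n - 1) with a flow index n below one. This is only a sketch: the function name and the values of K and n are invented for illustration and are not the fitted parameters from [Her17] or [BBN+23].

```python
def power_law_viscosity(shear_rate, consistency, flow_index):
    """Apparent viscosity of a power-law fluid: K * shear_rate**(n - 1)."""
    return consistency * shear_rate ** (flow_index - 1)


# Illustrative, made-up parameters (not fitted values from the literature).
K = 0.2  # consistency index
n = 0.6  # flow behavior index; n < 1 means shear thinning

eta_slow = power_law_viscosity(1_000.0, K, n)
eta_fast = power_law_viscosity(10_000.0, K, n)

# A tenfold higher shear rate lowers the viscosity by the factor 10**(n - 1).
print(eta_slow > eta_fast)
```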

dclab.features.emodulus.viscosity.get_viscosity_mc_pbs_buyukurganci_2022(medium='0.49% MC-PBS', channel_width=20.0, flow_rate=0.16, temperature=23.0)[source]

Compute viscosity of MC-PBS according to [BBN+23]

This viscosity model was derived in [BBN+23] and adapted for RT-DC in [RB23].

Parameters:

medium (Literal['0.49% MC-PBS', '0.59% MC-PBS', '0.83% MC-PBS'])
channel_width (float)
flow_rate (float)
temperature (float)

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.viscosity.get_viscosity_mc_pbs_herold_2017(medium='0.49% MC-PBS', channel_width=20.0, flow_rate=0.16, temperature=23.0)[source]

Compute viscosity of MC-PBS according to [Her17]

Note that all the factors in equation 5.2 in [Her17] compute to 8, which is essentially what is implemented in shear_rate_square_channel():

\[1.1856 \cdot 6 \cdot \frac{2}{3} \cdot \frac{1}{0.5928} = 8\]

Parameters:

medium (Literal['0.49% MC-PBS', '0.59% MC-PBS'])
channel_width (float)
flow_rate (float)
temperature (float)

Return type:

float | ndarray[tuple[Any, …], dtype[_ScalarT]]
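
The product of factors quoted above for equation 5.2 in [Her17] can be checked numerically. The snippet only verifies the arithmetic and does not reproduce the viscosity model itself.

```python
import math

# Product of the factors quoted from equation 5.2 in [Her17].
factor = 1.1856 * 6 * (2 / 3) * (1 / 0.5928)

# Compare with a tolerance, since the product is computed in floating point.
print(math.isclose(factor, 8.0))
```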

dclab.features.emodulus.viscosity.get_viscosity_water_kestin_1978(temperature=23.0)[source]

Compute the viscosity of water according to [KSW78]

Parameters:: temperature (float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
Return type:: float | ndarray[tuple[Any, …], dtype[_ScalarT]]

dclab.features.emodulus.viscosity.shear_rate_square_channel(flow_rate, channel_width, flow_index)[source]

Returns the wall shear rate of a power-law liquid in a square channel.

Parameters:

flow_rate (float) – Flow rate in µL/s
channel_width (float) – The channel width in µm
flow_index (float) – Flow behavior index aka the power law exponent of the shear thinning

Returns:

Shear rate in 1/s.

Return type:

shear_rate

dclab.features.emodulus.viscosity.ALIAS_MEDIA = {'0.49% MC-PBS': '0.49% MC-PBS', '0.49% mc-pbs': '0.49% MC-PBS', '0.5% MC-PBS': '0.49% MC-PBS', '0.5% mc-pbs': '0.49% MC-PBS', '0.50% MC-PBS': '0.49% MC-PBS', '0.50% mc-pbs': '0.49% MC-PBS', '0.59% MC-PBS': '0.59% MC-PBS', '0.59% mc-pbs': '0.59% MC-PBS', '0.6% MC-PBS': '0.59% MC-PBS', '0.6% mc-pbs': '0.59% MC-PBS', '0.60% MC-PBS': '0.59% MC-PBS', '0.60% mc-pbs': '0.59% MC-PBS', '0.8% MC-PBS': '0.83% MC-PBS', '0.8% mc-pbs': '0.83% MC-PBS', '0.80% MC-PBS': '0.83% MC-PBS', '0.80% mc-pbs': '0.83% MC-PBS', '0.83% MC-PBS': '0.83% MC-PBS', '0.83% mc-pbs': '0.83% MC-PBS', 'CellCarrier': '0.49% MC-PBS', 'CellCarrier B': '0.59% MC-PBS', 'CellCarrierB': '0.59% MC-PBS', 'cellcarrier': '0.49% MC-PBS', 'cellcarrier b': '0.59% MC-PBS', 'cellcarrierb': '0.59% MC-PBS', 'water': 'water'}: Many media names are actually shorthand for one medium

dclab.features.emodulus.viscosity.KNOWN_MEDIA = ['0.49% MC-PBS', '0.49% mc-pbs', '0.5% MC-PBS', '0.5% mc-pbs', '0.50% MC-PBS', '0.50% mc-pbs', '0.59% MC-PBS', '0.59% mc-pbs', '0.6% MC-PBS', '0.6% mc-pbs', '0.60% MC-PBS', '0.60% mc-pbs', '0.8% MC-PBS', '0.8% mc-pbs', '0.80% MC-PBS', '0.80% mc-pbs', '0.83% MC-PBS', '0.83% mc-pbs', 'CellCarrier', 'CellCarrier B', 'CellCarrierB', 'cellcarrier', 'cellcarrier b', 'cellcarrierb', 'water']: Media for which computation of viscosity is defined (has duplicate entries)

dclab.features.emodulus.viscosity.SAME_MEDIA = {'0.49% MC-PBS': ['0.49% MC-PBS', '0.5% MC-PBS', '0.50% MC-PBS', 'CellCarrier'], '0.59% MC-PBS': ['0.59% MC-PBS', '0.6% MC-PBS', '0.60% MC-PBS', 'CellCarrier B', 'CellCarrierB'], '0.83% MC-PBS': ['0.83% MC-PBS', '0.8% MC-PBS', '0.80% MC-PBS'], 'water': ['water']}: Dictionary with different names for one medium
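
The three constants above are consistent with one another. The sketch below shows one way the alias mapping and the list of known media could be derived from SAME_MEDIA by adding a lower-case variant of every name. This is an assumption inferred from the listed values, not necessarily how dclab builds them.

```python
SAME_MEDIA = {
    "0.49% MC-PBS": ["0.49% MC-PBS", "0.5% MC-PBS", "0.50% MC-PBS",
                     "CellCarrier"],
    "0.59% MC-PBS": ["0.59% MC-PBS", "0.6% MC-PBS", "0.60% MC-PBS",
                     "CellCarrier B", "CellCarrierB"],
    "0.83% MC-PBS": ["0.83% MC-PBS", "0.8% MC-PBS", "0.80% MC-PBS"],
    "water": ["water"],
}

# Map every spelling (and its lower-case variant) to the canonical name.
alias_media = {}
for canonical, names in SAME_MEDIA.items():
    for name in names:
        alias_media[name] = canonical
        alias_media[name.lower()] = canonical

# Sorted list of all accepted spellings.
known_media = sorted(alias_media)

print(alias_media["cellcarrier b"])
print(len(known_media))
```

Resolving user input through such a mapping first means that downstream code only has to deal with the four canonical medium names.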

fluorescence

dclab.features.fl_crosstalk.correct_crosstalk(fl1, fl2, fl3, fl_channel, ct21=0, ct31=0, ct12=0, ct32=0, ct13=0, ct23=0)[source]

Perform crosstalk correction

Parameters:

fli – Measured fluorescence signals
fl_channel (int) – The channel number (1, 2, or 3) for which the crosstalk-corrected signal should be computed
cij – Spill (crosstalk or bleed-through) from channel i to channel j. This spill is computed from the fluorescence signal of e.g. single-stained positive control cells; it is defined by the ratio of the fluorescence signals of the two channels, i.e. cij = flj / fli.
fl1 (int | float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
fl2 (int | float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
fl3 (int | float | ndarray[tuple[Any, ...], dtype[_ScalarT]])
ct21 (float)
ct31 (float)
ct12 (float)
ct32 (float)
ct13 (float)
ct23 (float)

Return type:

ndarray[tuple[Any, …], dtype[_ScalarT]]

See also

get_compensation_matrix: compute the inverse crosstalk matrix

Notes

If there are only two channels (e.g. fl1 and fl2), then the crosstalk to and from the other channel (ct31, ct32, ct13, ct23) should be set to zero.
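
For the two-channel case mentioned above, the correction can be written out by hand. The sketch assumes a linear spill model in which each measured signal is the true signal plus the spill from the other channel. The function name and the numbers are invented for illustration, and this is not dclab's implementation.

```python
def correct_two_channels(fl1, fl2, ct12, ct21):
    """Recover crosstalk-free signals for two channels.

    Assumed model (ctij is the spill from channel i to channel j):
        fl1 = true1 + ct21 * true2
        fl2 = true2 + ct12 * true1
    """
    # Determinant of the 2x2 spill matrix with unit diagonal.
    det = 1.0 - ct12 * ct21
    true1 = (fl1 - ct21 * fl2) / det
    true2 = (fl2 - ct12 * fl1) / det
    return true1, true2


# Forward-simulate a measurement with made-up values, then undo the spill.
true1, true2 = 100.0, 40.0
ct12, ct21 = 0.1, 0.05
fl1 = true1 + ct21 * true2
fl2 = true2 + ct12 * true1

rec1, rec2 = correct_two_channels(fl1, fl2, ct12, ct21)
print(rec1, rec2)
```

With three channels the same idea applies, except that a 3x3 matrix is inverted instead of solving the two equations directly.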

dclab.features.fl_crosstalk.get_compensation_matrix(ct21, ct31, ct12, ct32, ct13, ct23)[source]

Compute crosstalk inversion matrix

The spillover matrix is

| c11 c12 c13 |
| c21 c22 c23 |
| c31 c32 c33 |

The diagonal elements are set to 1, i.e.

c11 = c22 = c33 = 1

Parameters:

cij – Spill from channel i to channel j
ct21 (float)
ct31 (float)
ct12 (float)
ct32 (float)
ct13 (float)
ct23 (float)

Returns:

Compensation matrix (inverted spillover matrix)

Return type:

inv

isoelastics

Isoelastics management

exception dclab.isoelastics.IsoelasticsEmodulusMeaninglessWarning[source]

class dclab.isoelastics.AutoRecursiveDict(dict=None, /, **kwargs)[source]

class dclab.isoelastics.Isoelastics(paths=None)[source]

Isoelasticity line management

Parameters:

paths (list of pathlib.Path or list of str) – list of paths to files containing isoelasticity lines
Changed in version 0.24.0: The isoelasticity lines of the analytical model [MOG+15] and the linear-elastic numerical model [MMM+17] were recomputed with an equidistant spacing. The metadata section of the text file format was restructured.

static add_px_err(isoel, col1, col2, px_um, inplace=False)[source]

Undo pixelation correction

Since isoelasticity lines are usually computed directly from the simulation data (e.g. the contour data are not discretized on a grid but are extracted from FEM simulations), they are not affected by pixelation effects as described in [Her17].

If the isoelasticity lines are displayed alongside experimental data (which are affected by pixelation effects), then the lines must be “un”-corrected, i.e. the pixelation error must be added to the lines to match the experimental data.

Parameters:

isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
col1 (str) – Define the first two columns of each isoelasticity line.
col2 (str) – Define the first two columns of each isoelasticity line.
px_um (float) – Pixel size [µm]
inplace (bool) – If True, do not create a copy of the data in isoel

static convert(isoel, col1, col2, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]

Perform isoelastics scale conversion

Parameters:

isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
col1 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]
col2 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float) – Target viscosity [mPa*s]
inplace (bool) – If True, do not create a copy of the data in isoel

Returns:

isoel_scale – The scale-converted isoelasticity lines.

Return type:

list of 2d ndarrays of shape (N, 3)

Notes

If only the positions of the isoelastics are of interest and not the value of the elastic modulus, then it is sufficient to supply values for the channel width and set the values for flow rate and viscosity to a constant (e.g. 1).

See also

dclab.features.emodulus.scale_linear.scale_feature: scale conversion method used
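
The note above, that only the channel width matters for the positions of the isoelastics, can be made concrete with a small sketch. The scaling laws used here are assumptions for illustration: area scaling with the square of the channel width ratio, and elastic modulus scaling linearly with flow rate and viscosity and with the inverse cube of the channel width. The function names are hypothetical; consult scale_feature for the actual conversion.

```python
def scale_area(area_um, channel_width_in, channel_width_out):
    """Assumed scaling: area grows with the square of the channel width."""
    return area_um * (channel_width_out / channel_width_in) ** 2


def scale_emodulus(emodulus, channel_width_in, channel_width_out,
                   flow_rate_in, flow_rate_out,
                   viscosity_in, viscosity_out):
    """Assumed scaling: linear in flow rate and viscosity, inverse cubic
    in channel width."""
    return (emodulus
            * (flow_rate_out / flow_rate_in)
            * (viscosity_out / viscosity_in)
            * (channel_width_in / channel_width_out) ** 3)


# Convert one made-up isoelastic point from a 20 µm to a 30 µm channel.
area_out = scale_area(100.0, 20.0, 30.0)
# With flow rate and viscosity held constant, only the width ratio matters.
emod_out = scale_emodulus(1.0, 20.0, 30.0, 1.0, 1.0, 1.0, 1.0)

print(area_out, emod_out)
```

Under these assumptions the area does not depend on flow rate or viscosity, which is why constant placeholder values are sufficient when only the line positions are needed.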

add(isoel, col1, col2, channel_width, flow_rate, viscosity, method=None, lut_identifier=None)[source]

Add isoelastics

Parameters:

isoel (list of ndarrays) – Each list item represents one isoelastic line stored as an array of shape (N,3). The last column contains the emodulus data.
col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
channel_width (float) – Channel width in µm
flow_rate (float) – Flow rate through the channel in µL/s
viscosity (float) – Viscosity of the medium in mPa*s
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.

Notes

The following isoelastics are automatically added for user convenience:

isoelastics with col1 and col2 interchanged
isoelastics for circularity if deformation was given
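
The two automatic additions listed above can be sketched on plain Python lists. The helper names and the sample numbers are invented for illustration, and the conversion assumes the relation circ = 1 - deform. The actual method operates on arrays of shape (N, 3).

```python
def swap_columns(line):
    """Return the line with col1 and col2 interchanged; emodulus stays last."""
    return [(c2, c1, emod) for (c1, c2, emod) in line]


def deform_to_circ(line, deform_col):
    """Replace deformation with circularity, assuming circ = 1 - deform."""
    converted = []
    for row in line:
        row = list(row)
        row[deform_col] = 1.0 - row[deform_col]
        converted.append(tuple(row))
    return converted


# One short isoelastic line with columns (area_um, deform, emodulus);
# the numbers are invented for illustration.
line = [(50.0, 0.02, 1.1), (100.0, 0.05, 1.1)]

swapped = swap_columns(line)
as_circ = deform_to_circ(line, deform_col=1)
print(swapped[0], as_circ[1])
```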

get(col1, col2, channel_width, method=None, lut_identifier=None, flow_rate=None, viscosity=None, add_px_err=False, px_um=None)[source]

Get isoelastics

Parameters:

col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
channel_width (float) – Channel width in µm
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.
flow_rate (float or None) – Flow rate through the channel in µL/s. If set to None, the flow rate of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
viscosity (float or None) – Viscosity of the medium in mPa*s. If set to None, the viscosity of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py
px_um (float) – Pixel size [µm], used for pixelation error computation

See also

dclab.features.emodulus.scale_linear.scale_feature: scale conversion method used
dclab.features.emodulus.pxcorr.get_pixelation_delta: pixelation correction (applied to the feature data)

get_with_rtdcbase(col1, col2, dataset, method=None, lut_identifier=None, viscosity=None, add_px_err=False)[source]

Convenience method that extracts the metadata from RTDCBase

Parameters:

col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.
dataset (dclab.rtdc_dataset.RTDCBase) – The dataset from which to obtain the metadata.
viscosity (float, None, or False) – Viscosity of the medium in mPa*s. If set to None, the viscosity is computed from the meta data (medium, flow rate, channel width, temperature) in the [setup] config section. If this is not possible, the flow rate of the imported data is used and a warning will be issued.
add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py

load_data(path)[source]

Load isoelastics from a text file

Parameters:: path (str or pathlib.Path) – Path to an isoelasticity lines text file

dclab.isoelastics.check_lut_identifier(lut_identifier, method)[source]: Transitional function that can be removed once method is removed

dclab.isoelastics.get_available_files()[source]: Return list of available isoelasticity line files in dclab

dclab.isoelastics.get_available_identifiers()[source]: Return a list of available LUT identifiers

dclab.isoelastics.get_default()[source]: Return default isoelasticity lines

Kernel Density Estimators (KDEs)

class dclab.kde.KernelDensityEstimator(rtdc_ds)[source]

static apply_scale(a, scale, feat)[source]

Helper function for transforming an array to log-scale

Parameters:

a (np.ndarray) – Input array
scale (str) – If set to “log”, take the logarithm of a; if set to “linear” return a unchanged.
feat (str) – Feature name (required for debugging)

Returns:

b – The scaled array

Return type:

np.ndarray

Notes

If the scale is not “linear”, then a new array is returned. All warnings are suppressed when computing np.log(a), as a may have negative or nan values.

static check_feat_kde_applicability(xax, yax)[source]

Return True when it makes sense to compute KDE data

Parameters:

xax (str)
yax (str)

Return type:

bool

static estimate_spacing(a, method, scale='linear', method_kw=None, feat='undefined', ret_scaled=False)[source]

Helper function for guessing the spacing/accuracy for KDE plots

Parameters:

a (ndarray) – feature data
scale (str) – how the data should be scaled (“log” or “linear”)
method (callable) – Binning method to use
method_kw (dict) – keyword arguments to method
feat (str) – feature name for debugging
ret_scaled (bool) – whether to return the scaled array of a

get_at(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear', xacc=None, yacc=None)[source]

Fast evaluation of the KDE via linear interpolation

The KDE is computed via linear interpolation from the output of KernelDensityEstimator.get_raster().

Parameters:

xax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the points that are set in self.rtdc_ds.filter.all.
kde_type (str) – The KDE method to use, see kde_methods.methods
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the axes values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – If set to “log”, take the logarithm of the axes values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
xacc (float) – KDE grid spacing in x and y direction; if set to None, will use bin_width_percentile()
yacc (float) – KDE grid spacing in x and y direction; if set to None, will use bin_width_percentile()

Returns:

density – The kernel density evaluated for the filtered events.

Return type:

1d ndarray

get_contour_lines(quantiles=None, xax='area_um', yax='deform', kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear', xacc=None, yacc=None, ret_levels=False)[source]

Compute contour lines for a given kernel density estimate.

Parameters:

quantiles (list or array of floats) – KDE Quantiles for which contour levels are computed. The values must be between 0 and 1. If set to None, use [0.5, 0.95] as default.
xax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
kde_type (str) – The KDE method to use
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the axis-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – If set to “log”, take the logarithm of the axis-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
xacc (float) – KDE grid spacing in x and y direction; Affects contour shape. If set to None, will use find_smooth_contour_spacing()
yacc (float) – KDE grid spacing in x and y direction; Affects contour shape. If set to None, will use find_smooth_contour_spacing()
ret_levels (bool) – If set to True, return the levels of the contours (default: False)

Returns:

contour_lines (list of lists (of lists)) – For every number in quantiles, this list contains a list of corresponding contour lines. Each contour line is a 2D array of shape (N, 2), where N is the number of points in the contour line.
levels (list of floats) – The density levels corresponding to each number in quantiles. Only returned if ret_levels is set to True.

get_raster(xax='area_um', yax='deform', kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear', xacc=None, yacc=None)[source]

Evaluate the kernel density estimate on a grid

Parameters:

xax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
kde_type (str) – The KDE method to use
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the axes values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – If set to “log”, take the logarithm of the axes values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
xacc (float) – Grid spacing size for the axes; If set to None, will use bin_width_percentile()
yacc (float) – Grid spacing size for the axes; If set to None, will use bin_width_percentile()

Returns:

X, Y, Z – The kernel density Z evaluated on a rectangular grid (X,Y).

Return type:

coordinates

get_scatter(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')[source]

Evaluate the KDE at specific positions

The KDE is evaluated with the kde_type function for every point, which is significantly slower than using interpolation via KernelDensityEstimator.get_at().

Parameters:

xax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for X- and Y-axis (e.g. “area_um”, “aspect”, “deform”)
positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the points that are set in self.rtdc_ds.filter.all.
kde_type (str) – The KDE method to use, see kde_methods.methods
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the axis-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – If set to “log”, take the logarithm of the axis-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.

Returns:

density – The kernel density evaluated for the filtered data points.

Return type:

1d ndarray

Kernel Density Estimation methods

dclab.kde.methods.kde_none(events_x, events_y, xout=None, yout=None)[source]

No Kernel Density Estimation

Parameters:

events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
xout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.
yout (ndarray) – The coordinates at which the KDE should be computed. If set to None, input coordinates are used.

Returns:

density – The KDE for the points in (xout, yout)

Return type:

ndarray, same shape as xout

Notes

This method is a convenience method that always returns ones in the shape that the other methods in this module produce.

dclab.kde.contours.find_contours_level(density, x, y, level, closed=False)[source]

Find iso-valued density contours for a given level value

Parameters:

density (2d ndarray of shape (M, N)) – Kernel density estimate (KDE) for which to compute the contours
x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
level (float between 0 and 1) – Value along which to find contours in density relative to its maximum
closed (bool) – Whether to close contours at the KDE support boundaries

Returns:

contours – Contours found for the given level value

Return type:

list of ndarrays of shape (P, 2)

See also

skimage.measure.find_contours: Contour finding algorithm used

dclab.kde.contours.get_quantile_levels(density, x, y, xp, yp, q, normalize=True)[source]

Compute density levels for given quantiles by interpolation

For a given 2D density, compute the density levels at which the resulting contours contain the fraction 1-q of all data points. E.g. for a measurement of 1000 events, all contours at the level corresponding to a quantile of q=0.95 (95th percentile) contain 50 events (5%).

Parameters:

density (2d ndarray of shape (M, N)) – Kernel density estimate for which to compute the contours
x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
xp (1d ndarray of size D) – Event x-data from which to compute the quantile
yp (1d ndarray of size D) – Event y-data from which to compute the quantile
q (array_like or float between 0 and 1) – Quantile along which to find contours in density relative to its maximum
normalize (bool) – Whether output levels should be normalized to the maximum of density

Returns:

level – Contours level(s) corresponding to the given quantile

Return type:

np.ndarray or float

Notes

NaN-valued events in xp and yp are ignored.
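
The relation between a quantile q and the fraction 1-q of enclosed events can be checked with a simplified sketch. It assumes that a density value is already known for every event and picks the level as a plain order statistic. The actual function interpolates the gridded density at the event positions, which is not reproduced here. The function name and the data are invented for illustration.

```python
import math


def quantile_level(event_densities, q):
    """Density level below which roughly a fraction q of the events lies.

    NaN entries are dropped first. Events with a density at or above the
    returned level make up roughly the fraction 1 - q of all valid events.
    """
    valid = sorted(d for d in event_densities if not math.isnan(d))
    index = min(round(q * len(valid)), len(valid) - 1)
    return valid[index]


# 1000 events with distinct, made-up density values plus one NaN event.
densities = [float(i) for i in range(1000)] + [float("nan")]
level = quantile_level(densities, q=0.95)

# Count the events at or above the level (NaN never compares as >=).
n_inside = sum(1 for d in densities if d >= level)
print(level, n_inside)
```

For q=0.95 the 50 events with the highest density remain, which matches the 5% stated in the description.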

dclab.kde.smooth_contour.compute_contour_opening_angles(contour, xrange, yrange, xscale, yscale)[source]

For each point of the contour, compute the opening angle

This takes the visible plot area into account.

dclab.kde.smooth_contour.find_smooth_contour_spacing(ds_list, xax, yax, xrange, yrange, quantiles, xscale='linear', yscale='linear', kde_type='histogram', kde_kwargs=None, max_iter=15, abort_event=None)[source]

Determine contour spacing values for visually pleasing contours

The algorithm reduces the “kinks” in contours.

Parameters:

ds_list – list of RTDCBase instances for which smooth contours should be found
xax (str) – X- and Y-axis of the contour plot
yax (str) – X- and Y-axis of the contour plot
xrange (list[float] | tuple[float, float]) – Plotting range of the contour plot
yrange (list[float] | tuple[float, float]) – Plotting range of the contour plot
quantiles (list[float] | tuple[float]) – Data quantiles for which contour lines should be computed; only the highest quantile is used for smoothing
xscale (str) – “linear” or “log” scale of the contour plot axes
yscale (str) – “linear” or “log” scale of the contour plot axes
kde_type (str) – Kernel density estimate method to use
kde_kwargs (dict | None) – Custom arguments for the kernel density estimate method
max_iter (int) – Maximum number of iterations to perform before returning
abort_event (Event | None) – Optional event for prematurely stopping the iteration

Returns:

Dictionary containing the iteration result:

total iterations: iterations performed

success: whether the smoothing succeeded

reason: reason for success or failure

corners found: whether a contour touched the plot boundary

spacing x: contour spacing along x

spacing y: contour spacing along y

Return type:

result

polygon_filter

exception dclab.polygon_filter.FilterIdExistsWarning[source]

exception dclab.polygon_filter.PolygonFilterError[source]

class dclab.polygon_filter.PolygonFilter(axes=None, points=None, inverted=False, name=None, filename=None, fileid=0, unique_id=None)[source]

An object for filtering RT-DC data based on a polygonal area

Parameters:

axes (tuple of str or list of str) – The axes/features on which the polygon is defined. The first axis is the x-axis. Example: (“area_um”, “deform”).
points (array-like object of shape (N,2)) – The N coordinates (x,y) of the polygon. The exact order is important.
inverted (bool) – Invert the polygon filter. This parameter is overridden if filename is given.
name (str) – A name for the polygon (optional).
filename (str) – A path to a .poly file as created by this class’s save method. If filename is given, all other parameters are ignored.
fileid (int) – Which filter to import from the file (starting at 0).
unique_id (int) – An integer defining the unique id of the new instance.

Notes

The minimal arguments to this class are either filename OR (axes, points). If filename is set, all parameters are taken from the given .poly file.

static clear_all_filters()[source]: Remove all filters and reset instance counter

static get_instance_from_id(unique_id)[source]: Get an instance of the PolygonFilter using a unique id

static import_all(path)[source]

Import all polygons from a .poly file.

Returns a list of the imported polygon filters

static instace_exists(unique_id)[source]: Determine whether an instance with this unique id exists

static point_in_poly(p, poly)[source]

Determine whether a point is within a polygon area

Uses the ray casting algorithm.

Parameters:

p (tuple of floats) – Coordinates of the point
poly (array_like of shape (N, 2)) – Polygon (PolygonFilter.points)

Returns:

inside – True, if point is inside.

Return type:

bool

Notes

If p lies on a side of the polygon, it is defined as

“inside” if it is on the lower or left
“outside” if it is on the top or right

Changed in version 0.24.1: The new version uses the cython implementation from scikit-image. In the old version, the inside/outside definition was the other way around. In favor of not having to modify upstream code, the scikit-image version was adapted.
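
For readers unfamiliar with ray casting, here is a minimal pure-Python sketch of the idea: count how often a horizontal ray starting at the point crosses the polygon edges; an odd count means the point is inside. It is not the scikit-image implementation mentioned above, the function name is hypothetical, and its handling of points that lie exactly on an edge is not guaranteed to match the definition given in the notes.

```python
def point_in_polygon(p, poly):
    """Return True if point p is inside the polygon (ray casting)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can cross.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge intersects that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# Unit square as a simple test polygon.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(point_in_polygon((0.5, 0.5), square))
print(point_in_polygon((1.5, 0.5), square))
```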

static remove(unique_id)[source]: Remove a polygon filter from PolygonFilter.instances

static save_all(polyfile)[source]: Save all polygon filters

static unique_id_exists(pid)[source]: Whether or not a filter with this unique id exists

copy(invert=False)[source]

Return a copy of the current instance

Parameters:: invert (bool) – The copy will be inverted w.r.t. the original

filter(datax, datay)[source]: Filter a set of datax and datay according to self.points

save(polyfile, ret_fobj=False)[source]

Save all data to a text file (appends data if file exists).

Polyfile can be either a path to a file or a file object that was opened with the write “w” parameter. By using the file object, multiple instances of this class can write their data.

If ret_fobj is True, then the file object will not be closed; it is returned instead.

property hash: Hash of axes, points, and inverted

instances = [<dclab.polygon_filter.PolygonFilter object>]

property points

dclab.polygon_filter.get_polygon_filter_names()[source]: Get the names of all polygon filters in the order of creation

statistics

Statistics computation for RT-DC dataset instances

exception dclab.statistics.BadMethodWarning[source]

class dclab.statistics.Statistics(name, method, req_feature=False)[source]

A helper class for computing statistics

All statistical methods are registered in the dictionary Statistics.available_methods.

get_feature(ds, feat)[source]

Return filtered feature data

The features are filtered according to the user-defined filters, using the information in ds.filter.all. In addition, all nan and inf values are purged.

Parameters:

ds (dclab.rtdc_dataset.RTDCBase) – The dataset containing the feature
feat (str) – The name of the feature; must be a scalar feature
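
The two filtering steps described above, applying the boolean event filter and then purging nan and inf values, can be sketched without dclab as follows. The function name is hypothetical and the mask merely stands in for ds.filter.all.

```python
import math


def filtered_finite(values, keep_mask):
    """Keep values whose mask entry is True and that are finite numbers."""
    return [v for v, keep in zip(values, keep_mask)
            if keep and math.isfinite(v)]


values = [1.0, float("nan"), 3.0, float("inf"), 5.0]
keep_mask = [True, True, False, True, True]  # stands in for ds.filter.all

result = filtered_finite(values, keep_mask)
print(result)
```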

available_methods = {'%-gated': <dclab.statistics.Statistics object>, '10th Percentile': <dclab.statistics.Statistics object>, '25th Percentile': <dclab.statistics.Statistics object>, '75th Percentile': <dclab.statistics.Statistics object>, '90th Percentile': <dclab.statistics.Statistics object>, 'Events': <dclab.statistics.Statistics object>, 'Flow rate': <dclab.statistics.Statistics object>, 'Mean': <dclab.statistics.Statistics object>, 'Median': <dclab.statistics.Statistics object>, 'Mode': <dclab.statistics.Statistics object>, 'SD': <dclab.statistics.Statistics object>}

dclab.statistics.flow_rate(ds)[source]: Return the flow rate of an RT-DC dataset

dclab.statistics.get_statistics(ds, methods=None, features=None, ret_dict=False)[source]

Compute statistics for an RT-DC dataset

Parameters:

ds (dclab.rtdc_dataset.RTDCBase) – The dataset for which to compute the statistics.
methods (list of str or None) – The methods with which to compute the statistics. The list of available methods is given by available_methods.keys(). If set to None, statistics for all methods are computed.
features (list of str) – Feature name identifiers are defined by dclab.definitions.feature_exists(). If set to None, statistics for all scalar features available are computed.
ret_dict (bool) – Instead of returning (header, values), return a dictionary with headers as keys.

Returns:

header (list of str) – The header (feature + method names) of the computed statistics.
values (list of float) – The computed statistics.

dclab.statistics.mode(data)[source]

Compute an intelligent value for the mode

The most common value in experimental data is not very useful if there are many digits after the decimal point. This method addresses the issue by rounding to a bin size that is determined by the Freedman–Diaconis rule.

Parameters:: data (1d ndarray) – The data for which the mode should be computed.
Returns:: mode – The mode computed with the Freedman-Diaconis rule.
Return type:: float
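
The following sketch illustrates the general idea of a binned mode. It uses the Freedman–Diaconis bin width, 2 * IQR / n**(1/3), assigns each value to a bin, and reports the center of the most populated bin. The function name, the choice of bin center as the result, and the sample data are assumptions for illustration; dclab's implementation may differ in its details.

```python
import math
import statistics
from collections import Counter


def binned_mode(data):
    """Estimate the mode by binning with the Freedman-Diaconis width."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    width = 2 * (q3 - q1) / len(data) ** (1 / 3)
    if width == 0:
        # Degenerate spread: fall back to the plain most common value.
        return Counter(data).most_common(1)[0][0]
    lowest = min(data)
    # Assign each value to a bin and pick the most populated bin.
    bins = Counter(math.floor((x - lowest) / width) for x in data)
    best_bin = bins.most_common(1)[0][0]
    # Report the center of that bin.
    return lowest + (best_bin + 0.5) * width


# Made-up data clustered around 1.0 with a few outliers.
data = [0.2, 0.98, 0.99, 1.00, 1.01, 1.02, 1.03, 2.5, 3.7, 5.1]
print(binned_mode(data))
```

No value occurs twice in this sample, so a plain "most common value" would be arbitrary, whereas the binned estimate lands near the cluster around 1.0.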

HDF5 manipulation

Helper methods for copying .rtdc data

dclab.rtdc_dataset.copier.basin_definition_copy(src_h5file, dst_h5file, features_iter)[source]

Copy basin definitions src_h5file[“basins”] to the new file

Normally, we would just use h5ds_copy() to copy basins from one dataset to another. However, if we are e.g. only copying scalar features, and there are non-scalar features in the internal basin, then we must rewrite the basin definition of the internal basin.

The features_iter list of features defines which features are relevant for the internal basin.

To copy internal basins, use internal_basin_events_copy().

dclab.rtdc_dataset.copier.get_size(h5_obj)[source]

Recursively return the size of an HDF5 object (group or dataset)

Returns:: the size in bytes
Return type:: size
Parameters:: h5_obj (Group | Dataset | list | tuple | None)

dclab.rtdc_dataset.copier.h5ds_copy(src_loc, src_name, dst_loc, dst_name=None, ensure_compression=True, recursive=True, bytes_written=None)[source]

Copy an HDF5 Dataset from one group to another

Parameters:

src_loc (h5py.Group) – The source location
src_name (str) – Name of the dataset in src_loc
dst_loc (h5py.Group) – The destination location
dst_name (str) – The name of the destination dataset, defaults to src_name
ensure_compression (bool) – Whether to make sure that the data are compressed. If disabled, all data from the source are copied as-is without being compressed.
recursive (bool) – Whether to recurse into HDF5 Groups (this is required e.g. for copying the “trace” feature)
bytes_written (mp.Value) – A shared multiprocessing.Value instance to which the number of bytes written is added during the copying process; Use this if you would like to track the progress.

Returns:

dst – The dataset dst_loc[dst_name]

Return type:

h5py.Dataset

Raises:

ValueError: – If the named source is not a h5py.Dataset

dclab.rtdc_dataset.copier.internal_basin_events_copy(src_h5file, dst_h5file, features)[source]

Copy internal basin data from the input to the output file

The basin dictionaries are read and only the basinmap features that are required are copied to the output file.

Parameters:

src_h5file (Group)
dst_h5file (Group)
features (list[str])

Return type:

tuple[list[str], int]

dclab.rtdc_dataset.copier.is_properly_compressed(h5obj)[source]

Check whether an HDF5 object is properly compressed

The compression check only returns True if the input file was compressed with the Zstandard compression using compression level 5 or higher.

dclab.rtdc_dataset.copier.rtdc_copy(src_h5file, dst_h5file, features='all', include_basins=True, include_logs=True, include_tables=True, meta_prefix='', bytes_total=None, bytes_written=None)[source]

Create a compressed copy of an RT-DC file

Parameters:

src_h5file (h5py.Group) – Input HDF5 file
dst_h5file (h5py.Group) – Output HDF5 file
features (list of strings or one of ['all', 'scalar', 'none']) – If this is a list then it specifies the features that are copied from src_h5file to dst_h5file. Alternatively, you may specify ‘all’ (copy all features), ‘scalar’ (copy only scalar features), or ‘none’ (don’t copy any features).
include_basins (bool) – Copy the basin information from src_h5file to dst_h5file.
include_logs (bool) – Copy the logs from src_h5file to dst_h5file.
include_tables (bool) – Copy the tables from src_h5file to dst_h5file.
meta_prefix (str) – Add this prefix to the name of the logs and tables in dst_h5file.
bytes_total (Synchronized[int] | None) – If specified, will be set to the estimated total size in bytes (uncompressed) that will be written to the new file. The basin definitions are not included due to their variable size. Logs are also not included, because the line length may vary.
bytes_written (Synchronized[int] | None) – Number of bytes written to the output file during the copying process

Writing RT-DC files

exception dclab.rtdc_dataset.writer.StoringPerishableBasinWarning[source]

class dclab.rtdc_dataset.writer.RTDCWriter(path_or_h5file, mode='append', compression_kwargs=None, compression='deprecated')[source]

RT-DC data writer class

Parameters:

path_or_h5file (str or pathlib.Path or h5py.Group) – Path to an HDF5 file or an HDF5 file opened in write mode
mode (str) –
Defines how the data are stored:
- ”append”: append new feature data to existing h5py Datasets
- ”replace”: replace existing h5py Datasets with new features (used for ancillary feature storage)
- ”reset”: do not keep any previous data
compression_kwargs (dict-like) – Dictionary with the keys “compression” and “compression_opts” which are passed to h5py.H5File.create_dataset(). The default is Zstandard compression with compression level 5, i.e. hdf5plugin.Zstd(clevel=5). To disable compression, use {“compression”: None}.
compression (str or None) –
Compression method used for data storage; one of [None, “lzf”, “gzip”, “szip”].

Deprecated since version 0.43.0: Use compression_kwargs instead.

static get_best_nd_chunks(item_shape, item_dtype=<class 'numpy.float64'>)[source]

Return best chunks for HDF5 datasets

Chunking has performance implications. It’s recommended to keep the total size of dataset chunks between 10 KiB and 1 MiB. This number defines the maximum chunk size as well as half the maximum cache size for each dataset.
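
As a rough illustration of the size constraint, the following sketch computes how many events fit into a chunk of at most 1 MiB. The helper names events_per_chunk and chunk_shape are hypothetical, and the actual logic of get_best_nd_chunks may differ.

```python
import math

# 1 MiB, the value documented for dclab.rtdc_dataset.writer.CHUNK_SIZE_BYTES
CHUNK_SIZE_BYTES = 1024 ** 2


def events_per_chunk(item_shape, itemsize):
    """Hypothetical helper: number of events that fit into one chunk"""
    # Size of a single event in bytes; math.prod(()) is 1 for scalars
    event_bytes = math.prod(item_shape) * itemsize
    # Always store at least one event per chunk
    return max(1, CHUNK_SIZE_BYTES // event_bytes)


def chunk_shape(item_shape, itemsize):
    """Hypothetical helper: chunk shape with events along the first axis"""
    return (events_per_chunk(item_shape, itemsize),) + tuple(item_shape)


# uint8 image feature with 80x250 pixels per event
print(chunk_shape((80, 250), itemsize=1))
# scalar float64 feature
print(chunk_shape((), itemsize=8))
```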

close()[source]: Close the underlying HDF5 file if a path was given during init

rectify_metadata()[source]

Autocomplete the metadata of the RT-DC measurement

The following configuration keys are updated:

experiment:event count
fluorescence:samples per event
imaging:roi size x (if image or mask is given)
imaging:roi size y (if image or mask is given)

The following configuration keys are added if not present:

fluorescence:channel count

store_basin(basin_name, basin_type, basin_format, basin_locs, basin_descr=None, basin_feats=None, basin_map=None, basin_id=None, internal_data=None, verify=True, perishable=False)[source]

Write basin information

Parameters:

basin_name (str) – basin name; Names do not have to be unique.
basin_type (str) – basin type (file or remote); Files are paths accessible by the operating system (including e.g. network shares) whereas remote locations normally require an active internet connection.
basin_format (str) – The basin format must match the format property of an RTDCBase subclass (e.g. “hdf5” or “dcor”)
basin_locs (list) – location of the basin as a string or (optionally) a pathlib.Path
basin_descr (str) – optional string describing the basin
basin_feats (list of str) – list of features this basin provides; You may use this to restrict access to features for a specific basin.
basin_map (np.ndarray or tuple of (str, np.ndarray)) – If this is an integer numpy array, it defines the mapping of event indices from the basin dataset to the referring dataset (the dataset being written to disk). Normally, the basinmap feature used for storing the mapping information is inferred from the currently defined basinmap features. However, if you are incepting basins, then this might not be sufficient, and you have to specify explicitly which basinmap feature to use. In such a case, you may specify a tuple (feature_name, mapping_array) where feature_name is the explicit mapping name, e.g. “basinmap3”.
basin_id (str) – Identifier of the basin. This is the string returned by RTDCBase.get_measurement_identifier(). This is a unique string that identifies the data within a basin. If not specified and verify=True, this value is automatically taken from the basin file.
internal_data (dict or instance of h5py.Group) – A dictionary or an h5py.Group containing the basin data. The data are copied to the “basin_events” group, if internal_data is not an h5py.Group in the current HDF5 file. This must be specified when storing internal basins, and it must not be specified for any other basin type.
verify (bool) – Whether to verify the basin before storing it; You might want to set this to False if you would like to write a basin that is e.g. temporarily not available
perishable (bool) – Whether the basin is perishable. If this is True, then a warning will be issued, because perishable basins may not be accessed (e.g. time-based URL for private S3 data).

Returns:

basin_hash – hash of the basin which serves as the name of the HDF5 dataset stored in the output file

Added in version 0.58.0.

Return type:

str
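
The role of basin_map can be pictured with plain lists. This is a conceptual sketch only; it assumes that event i of the referring dataset corresponds to event basin_map[i] of the basin, and the variable names are made up for illustration.

```python
# Feature data as stored in the (larger) basin dataset
basin_deform = [0.01, 0.02, 0.03, 0.04, 0.05]

# Assumption: event i of the referring dataset is event basin_map[i]
# of the basin dataset
basin_map = [1, 3, 4]

# Feature data as seen from the referring dataset
referring_deform = [basin_deform[idx] for idx in basin_map]
print(referring_deform)
```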

store_feature(feat, data, shape=None)[source]

Write feature data

Parameters:

feat (str) – feature name
data (np.ndarray or list or dict) – feature data
shape (tuple of int) – For non-scalar features, this is the shape of the feature for one event (e.g. (90, 250) for an “image”). Usually, you do not have to specify this value, but you do need it in case of plugin features that don’t have the “feature shape” set or in case of temporary features. If you don’t specify it, then the shape is guessed based on the data you provide and a UserWarning will be issued.

store_log(name, lines)[source]

Write log data

Parameters:

name (str) – name of the log entry
lines (list of str or str) – the text lines of the log

store_metadata(meta)[source]

Store RT-DC metadata

Parameters:

meta (dict-like) –

The metadata to store. Each key depicts a metadata section name whose data is given as a dictionary, e.g.:

meta = {"imaging": {"exposure time": 20,
                    "flash duration": 2,
                    ...
                    },
        "setup": {"channel width": 20,
                  "chip region": "channel",
                  ...
                  },
        ...
        }

Only section key names and key values therein registered in dclab are allowed and are converted to the pre-defined dtype. Only sections from the dclab.definitions.CFG_METADATA dictionary are stored. If you have custom metadata, you can use the “user” section.
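
Configuration keys are written as “section:key” elsewhere in this reference (e.g. “experiment:event count”). The sketch below shows how a nested metadata dictionary maps onto that notation; the helper flatten_meta is hypothetical and not part of dclab.

```python
def flatten_meta(meta):
    """Hypothetical helper: map nested metadata to "section:key" names"""
    return {
        f"{section}:{key}": value
        for section, items in meta.items()
        for key, value in items.items()
    }


meta = {
    "imaging": {"exposure time": 20, "flash duration": 2},
    "setup": {"channel width": 20, "chip region": "channel"},
}

for name, value in flatten_meta(meta).items():
    print(name, "=", value)
```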

store_table(name, cmp_array, h5_attrs=None)[source]

Store a compound array table

Tables are semi-metadata. They may contain information collected during a measurement (but with a lower temporal resolution) or other tabular data relevant for a dataset. Tables have named columns. Therefore, they can be represented as a numpy recarray, and they should be stored as such in an HDF5 file (compound dataset).

Parameters:

name (str) – Name of the table
cmp_array (np.recarray, h5py.Dataset, np.ndarray, or dict) – If a np.recarray or h5py.Dataset are provided, then they are written as-is to the file. If a dictionary is provided, then the dictionary is converted into a numpy recarray. If a numpy array is provided, then the array is written as a raw table (no column names) to the file.
h5_attrs (dict, optional) – Attributes to store alongside the corresponding HDF5 dataset

version_brand(old_version=None, write_attribute=True)[source]

Perform version branding

Append a “ | dclab X.Y.Z” to the “setup:software version” attribute.

Parameters:

old_version (str or None) – By default, the version string is taken from the HDF5 file. If set to a string, then this version is used instead.
write_attribute (bool) – If True (default), write the version string to the “setup:software version” attribute
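
A minimal sketch of the branding step follows. The helper brand_version is hypothetical; in particular, skipping the brand when the string already ends with it is an assumption and not confirmed by this reference.

```python
def brand_version(old_version, dclab_version):
    """Hypothetical helper: append " | dclab X.Y.Z" to a version string"""
    brand = f"dclab {dclab_version}"
    if not old_version:
        # No previous software version recorded
        return brand
    if old_version.endswith(brand):
        # Assumption: do not append the same brand twice in a row
        return old_version
    return f"{old_version} | {brand}"


print(brand_version("Shape-In 2.0.5", "0.58.0"))
```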

write_image_float32(group, name, data)[source]

Write 32bit floating point image array

This function wraps RTDCWriter.write_ndarray() and adds image attributes to the HDF5 file so HDFView can display the images properly.

Parameters:

group (h5py.Group) – parent group
name (str) – name of the dataset containing the image data
data (np.ndarray or list of np.ndarray) – image data

write_image_grayscale(group, name, data, is_boolean)[source]

Write grayscale image data to an HDF5 dataset

This function wraps RTDCWriter.write_ndarray() and adds image attributes to the HDF5 file so HDFView can display the images properly.

Parameters:

group (h5py.Group) – parent group
name (str) – name of the dataset containing the image data
data (np.ndarray or list of np.ndarray) – image data
is_boolean (bool) – whether the input data is of boolean nature (e.g. mask data) - if so, data are converted to uint8

write_ndarray(group, name, data, dtype=None)[source]

Write n-dimensional array data to an HDF5 dataset

It is assumed that the shape of the array data is correct, i.e. that the shape of data is (number_events, feat_shape_1, …, feat_shape_n).

Parameters:

group (h5py.Group) – parent group
name (str) – name of the dataset containing the array data
data (np.ndarray) – data
dtype (dtype) – the dtype to use for storing the data (defaults to data.dtype)

write_ragged(group, name, data)[source]

Write ragged data (i.e. list of arrays of different lengths)

Ragged array data (e.g. contour data) are stored in a separate group and each entry becomes an HDF5 dataset.

Parameters:

group (h5py.Group) – parent group
name (str) – name of the group containing the ragged data
data (list of np.ndarray or np.ndarray) – the data in a list

write_text(group, name, lines)[source]

Write text to an HDF5 dataset

Text data are written as a fixed-length string dataset.

Parameters:

group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
lines (list of str or str) – the text, line by line

dclab.rtdc_dataset.writer.CHUNK_SIZE = 100: DEPRECATED (use CHUNK_SIZE_BYTES instead)

dclab.rtdc_dataset.writer.CHUNK_SIZE_BYTES = 1048576: Chunk size in bytes for storing HDF5 datasets

dclab.rtdc_dataset.writer.FEATURES_UINT32 = ['fl1_max', 'fl1_npeaks', 'fl2_max', 'fl2_npeaks', 'fl3_max', 'fl3_npeaks', 'index', 'ml_class', 'nevents']: features that should be written to the output file as uint32 values

dclab.rtdc_dataset.writer.FEATURES_UINT64 = ['frame']: features that should be written to the output file as uint64 values

Command-line interface methods

command line interface

dclab.cli.compress(path_in=None, path_out=None, force=False, check_suffix=True, ret_path=False)[source]

Create a new dataset with all features losslessly compressed

Parameters:

path_in (str or pathlib.Path) – file to compress
path_out (str or pathlib.Path) – output file path
force (bool) – DEPRECATED
check_suffix (bool) – check suffixes for input and output files
ret_path (bool) – whether to return the output path

Returns:

path_out – output path (with possibly corrected suffix)

Return type:

pathlib.Path (optional)

dclab.cli.condense(path_in=None, path_out=None, ancillaries=None, store_ancillary_features=True, store_basin_features=True, check_suffix=True, ret_path=False)[source]

Create a new dataset with all available scalar-only features

Besides the innate scalar features, this also includes all fast-to-compute ancillary and all basin features (features_loaded).

Parameters:

path_in (str or pathlib.Path) – file to condense
path_out (str or pathlib.Path) – output file path
ancillaries (bool) – DEPRECATED, use store_ancillary_features instead
store_ancillary_features (bool) – compute and store ancillary features in the output file
store_basin_features (bool) – copy basin features from the input path to the output file; Note that the basin information (including any internal basin dataset) is always copied over to the new dataset.
check_suffix (bool) – check suffixes for input and output files
ret_path (bool) – whether to return the output path

Returns:

path_out – output path (with possibly corrected suffix)

Return type:

pathlib.Path (optional)

dclab.cli.condense_dataset(ds, h5_cond, ancillaries=None, store_ancillary_features=True, store_basin_features=True, warnings_list=None)[source]

Condense a dataset using low-level HDF5 methods

For ancillary and basin features, high-level dclab methods are used.

Parameters:

ds (RTDCBase) – dataset from which to export
h5_cond (File) – HDF5 file to which data are written
ancillaries (bool) – DEPRECATED, use store_ancillary_features instead
store_ancillary_features (bool) – compute and store ancillary features in the output file
store_basin_features (bool) – copy basin features from the input path to the output file; Note that the basin information (including any internal basin dataset) is always copied over to the new dataset.
warnings_list (List[str] | None) – List of warnings that should be written to the output file

dclab.cli.get_command_log(paths, custom_dict=None)[source]

Return a json dump of system parameters

Parameters:

paths (list of pathlib.Path or str) – paths of related measurement files; up to 5MB of each of them is md5-hashed and included in the “files” key
custom_dict (dict) – additional user-defined entries; must contain simple Python objects (json.dumps must still work)
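
Hashing only the beginning of a file can be sketched with the standard library as shown below. The helper hash_file_head and the interpretation of “5MB” as 5 * 1024**2 bytes are assumptions; this is not the code used by get_command_log.

```python
import hashlib
import pathlib
import tempfile


def hash_file_head(path, max_bytes=5 * 1024 ** 2, blocksize=65536):
    """Hypothetical helper: MD5 hex digest of the first `max_bytes` of a file"""
    hasher = hashlib.md5()
    remaining = max_bytes
    with open(path, "rb") as fd:
        while remaining > 0:
            block = fd.read(min(blocksize, remaining))
            if not block:
                # End of file reached before max_bytes
                break
            hasher.update(block)
            remaining -= len(block)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tdir:
    path = pathlib.Path(tdir) / "example.rtdc"
    # Placeholder content, not a valid measurement file
    path.write_bytes(b"not a real measurement" * 10)
    digest_full = hash_file_head(path)
    digest_head = hash_file_head(path, max_bytes=8)

print(digest_full, digest_head)
```

Limiting the number of hashed bytes keeps the cost of logging constant for large measurement files, at the price of not detecting changes beyond the hashed region.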

dclab.cli.get_job_info()[source]

Return dictionary with current job information

Returns:: info – Job information including details about time, system, python version, and libraries used.
Return type:: dict of dicts

dclab.cli.join(paths_in=None, path_out=None, metadata=None, ret_path=False)[source]

Join multiple RT-DC measurements into a single .rtdc file

Parameters:

paths_in (list of string or pathlib.Path) – input paths to join
path_out (str or pathlib.Path) – output path
metadata (dict) – optional metadata dictionary (configuration dict) to store in the output file
ret_path (bool) – whether to return the output path

Returns:

path_out – output path (with corrected path suffix if applicable)

Return type:

pathlib.Path (optional)

Notes

The first input file defines the metadata written to the output file. Only features that are present in all input files are written to the output file.
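
The feature selection described in the notes amounts to a set intersection. The feature sets below are made up for illustration.

```python
# Hypothetical feature sets of three input files
features_per_file = [
    {"area_um", "deform", "image", "time"},
    {"area_um", "deform", "time"},
    {"area_um", "deform", "image", "time", "fl1_max"},
]

# Only features present in every input file end up in the joined file
common_features = sorted(set.intersection(*features_per_file))
print(common_features)
```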

dclab.cli.repack(path_in=None, path_out=None, strip_basins=False, strip_logs=False, check_suffix=True, ret_path=False)[source]

Repack/recreate an .rtdc file, optionally stripping the logs

Parameters:

path_in (str or pathlib.Path) – file to repack
path_out (str or pathlib.Path) – output file path
strip_basins (bool) – do not write basin information to the output file
strip_logs (bool) – do not write logs to the output file
check_suffix (bool) – check suffixes for input and output files
ret_path (bool) – whether to return the output path

Returns:

path_out – output path (with possibly corrected suffix)

Return type:

pathlib.Path

dclab.cli.split(path_in=None, path_out=None, split_events=10000, skip_initial_empty_image=True, skip_final_empty_image=True, ret_out_paths=False, verbose=False)[source]

Split a measurement file

Parameters:

path_in (str or pathlib.Path) – path of input measurement file
path_out (str or pathlib.Path) – path to output directory (optional)
split_events (int) – maximum number of events in each output file
skip_initial_empty_image (bool) – remove the first event of the dataset if the image is zero
skip_final_empty_image (bool) – remove the final event of the dataset if the image is zero
ret_out_paths (bool) – if True, return the list of output file paths
verbose (bool) – if True, print messages to stdout

Returns:

[out_paths] – List of generated files (only if ret_out_paths is specified)

Return type:

list of pathlib.Path
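
How split_events limits the size of each output file can be sketched as follows. The helper split_ranges is hypothetical and ignores the optional removal of empty first or final images.

```python
def split_ranges(n_events, split_events=10000):
    """Hypothetical helper: (start, stop) event index ranges per output file"""
    return [
        (start, min(start + split_events, n_events))
        for start in range(0, n_events, split_events)
    ]


# A measurement with 25000 events yields three files
print(split_ranges(25000))
```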

dclab.cli.tdms2rtdc(path_tdms=None, path_rtdc=None, compute_features=False, skip_initial_empty_image=True, skip_final_empty_image=True, verbose=False)[source]

Convert .tdms datasets to the hdf5-based .rtdc file format

Parameters:

path_tdms (str or pathlib.Path) – Path to input .tdms file
path_rtdc (str or pathlib.Path) – Path to output .rtdc file
compute_features (bool) – If True, compute all ancillary features and store them in the output file
skip_initial_empty_image (bool) – In old versions of Shape-In, the first image was sometimes not stored in the resulting .avi file. In dclab, such images are represented as zero-valued images. If True (default), this first image is not included in the resulting .rtdc file.
skip_final_empty_image (bool) – In other versions of Shape-In, the final image is sometimes also not stored in the .avi file. If True (default), this final image is not included in the resulting .rtdc file.
verbose (bool) – If True, print messages to stdout

dclab.cli.verify_dataset(path_in=None)[source]: Perform checks on experimental datasets

R and lme4

exception dclab.lme4.rsetup.CommandFailedError[source]: Used when run_command encounters an error

exception dclab.lme4.rsetup.RNotFoundError[source]

dclab.lme4.rsetup.get_r_path()[source]: Return the path of the R executable

dclab.lme4.rsetup.get_r_script_path()[source]: Return the path to the Rscript executable

dclab.lme4.rsetup.get_r_version()[source]: Return the full R version string

dclab.lme4.rsetup.has_lme4()[source]: Return True if the lme4 package is installed

dclab.lme4.rsetup.has_r()[source]: Return True if R is available

dclab.lme4.rsetup.require_lme4()[source]

Install the lme4 package (if not already installed)

Besides lme4, this also installs nloptr and statmod. The packages are installed to the user data directory given in lib_path from the http://cran.rstudio.org mirror.

dclab.lme4.rsetup.require_r()[source]: Make sure R is installed and R HOME is set

dclab.lme4.rsetup.run_command(cmd, **kwargs)[source]: Run a command via subprocess

dclab.lme4.rsetup.set_r_lib_path(r_lib_path)[source]: Add given directory to the R_LIBS_USER environment variable

dclab.lme4.rsetup.set_r_path(r_path)[source]: Set the path of the R executable/binary

R lme4 wrapper

class dclab.lme4.wrapr.Rlme4(model='lmer', feature='deform')[source]

Perform an R-lme4 analysis with RT-DC data

Parameters:

model (str) –
One of:
- ”lmer”: linear mixed model using lme4’s lmer
- ”glmer+loglink”: generalized linear mixed model using lme4’s glmer with an additional log-link function via the family=Gamma(link='log') keyword.
feature (str) – dclab feature for which to compute the model

add_dataset(ds, group, repetition)[source]

Add a dataset to the analysis list

Parameters:

ds (RTDCBase) – Dataset
group (str) – The group the measurement belongs to (“control” or “treatment”)
repetition (int) – Repetition of the measurement

Notes

For each repetition, there must be a “treatment” (1) and a “control” (0) group.
If you would like to perform a differential feature analysis, then you need to pass at least a reservoir and a channel dataset (with same parameters for group and repetition).

check_data()[source]: Perform sanity checks on self.data

fit(model=None, feature=None)[source]

Perform (generalized) linear mixed-effects model fit

The response variable is modeled using two linear mixed effect models:

model: “feature ~ group + (1 + group | repetition)” (random intercept + random slope model)
the null model: “feature ~ (1 + group | repetition)” (without the fixed effect introduced by the “treatment” group).

Both models are compared in R using “anova” (from the R-package “stats” [Eve92]) which performs a likelihood ratio test to obtain the p-value for the significance of the fixed effect (treatment).

If the input datasets contain data from the “reservoir” region, then the analysis is performed for the differential feature.

Parameters:

model (str (optional)) –
One of:
- ”lmer”: linear mixed model using lme4’s lmer
- ”glmer+loglink”: generalized linear mixed model using lme4’s glmer with an additional log-link function via family=Gamma(link='log') [BMBW15]
feature (str (optional)) – dclab feature for which to compute the model

Returns:

results – Dictionary with the results of the fitting process:

”anova p-value”: Anova likelihood ratio test (significance)
”feature”: name of the feature used for the analysis self.feature
”fixed effects intercept”: Mean of self.feature for all controls; In the case of the “glmer+loglink” model, the intercept is already back transformed from log space.
”fixed effects treatment”: The fixed effect size between the mean of the controls and the mean of the treatments relative to “fixed effects intercept”; In the case of the “glmer+loglink” model, the fixed effect is already back transformed from log space.
”fixed effects repetitions”: The effects (intercept and treatment) for each repetition. The first axis defines intercept/treatment; the second axis enumerates the repetitions; thus the shape is (2, number of repetitions) and np.mean(results["fixed effects repetitions"], axis=1) is equivalent to the tuple (results["fixed effects intercept"], results["fixed effects treatment"]) for the “lmer” model. This does not hold for the “glmer+loglink” model, because of the non-linear inverse transform back from log space.
”is differential”: Boolean indicating whether or not the analysis was performed for the differential (bootstrapped and subtracted reservoir from channel data) feature
”model”: model name used for the analysis self.model
”model converged”: boolean indicating whether the model converged
”r model summary”: Summary of the model
”r model coefficients”: Model coefficient table
”r script”: the R script used
”r output”: full output of the R script

Return type:

dict

get_differential_dataset()[source]

Return the differential dataset for channel/reservoir data

The most famous use case is differential deformation. The idea is that you cannot tell what the difference in deformation from channel to reservoir is, because you never measure the same object in the reservoir and the channel. You usually just have two distributions. Comparing distributions is possible via bootstrapping. Then, instead of running the lme4 analysis with the channel deformation data, it is run with the differential deformation (subtraction of the bootstrapped deformation distributions for channel and reservoir).

get_feature_data(group, repetition, region='channel')[source]

Return array containing feature data

Parameters:

group (str) – Measurement group (“control” or “treatment”)
repetition (int) – Measurement repetition
region (str) – Either “channel” or “reservoir”

Returns:

fdata – Feature data (Nans and Infs removed)

Return type:

1d ndarray

is_differential()[source]

Return True if the differential feature is computed for analysis

This effectively just checks the regions of the datasets and returns True if any one of the regions is “reservoir”.

See also

get_differential_dataset: for an explanation

parse_result(result)[source]

set_options(model=None, feature=None)[source]: Set analysis options

data: list of [RTDCBase, column, repetition, chip_region]

feature: dclab feature for which to perform the analysis

model: modeling method to use (e.g. “lmer”)

dclab.lme4.wrapr.arr2str(a)[source]: Convert an array to a string

dclab.lme4.wrapr.bootstrapped_median_distributions(a, b, bs_iter=1000, rs=117)[source]

Compute the bootstrapped distributions for two arrays.

Parameters:

a (1d ndarray of length N) – Input data
b (1d ndarray of length N) – Input data
bs_iter (int) – Number of bootstrapping iterations to perform (output size).
rs (int) – Random state seed for random number generator

Returns:

median_dist_a, median_dist_b – Bootstrap distribution of medians for a and b.

Return type:

1d arrays of length bs_iter

Notes

From a programmatic point of view, it would have been better to implement this method for just one input array (because of redundant code). However, due to historical reasons (testing and comparability to Shape-Out 1), bootstrapping is done interleaved for the two arrays.
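
The interleaved bootstrapping and the subsequent subtraction for a differential feature can be sketched with the standard library. This is not the dclab implementation (which operates on numpy arrays); the helper name, the random number generator, and the example values are assumptions, so the numbers will differ from what dclab returns.

```python
import random
import statistics


def bootstrapped_medians(a, b, bs_iter=1000, seed=117):
    """Hypothetical helper: bootstrap distributions of the medians of a and b"""
    rng = random.Random(seed)
    median_dist_a = []
    median_dist_b = []
    for _ in range(bs_iter):
        # Resample with replacement, alternating between the two inputs
        sample_a = rng.choices(a, k=len(a))
        median_dist_a.append(statistics.median(sample_a))
        sample_b = rng.choices(b, k=len(b))
        median_dist_b.append(statistics.median(sample_b))
    return median_dist_a, median_dist_b


# Made-up deformation values for one channel and one reservoir measurement
channel = [0.030, 0.032, 0.035, 0.031, 0.040, 0.033]
reservoir = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010]

med_ch, med_res = bootstrapped_medians(channel, reservoir, bs_iter=200)
# Differential feature: channel minus reservoir, one value per iteration
differential = [c - r for c, r in zip(med_ch, med_res)]
print(statistics.median(differential))
```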

Caching

class dclab.cached.caches.umbrella_cache(topic='general', bypass_memory_store=False, custom_handlers=None, evaluation_time_threshold=0.01)[source]

A hybrid disk and in-memory cache decorator compatible with numpy

This hybrid cache hashes the input parameters to a function to compute the cache key. If a persistent disk store is configured, data are saved to disk in a background thread (see StoreKeeper).

The umbrella_cache decorator can be used safely with regular functions and methods defined in classes.

Parameters:

topic (str) – A general topic (alphanumeric characters and dashes allowed) under which the cached values are stored.
bypass_memory_store (bool) – Set this to true to disable storing cached values in memory. If no disk store (see StoreKeeper) is defined, then data are never cached.
custom_handlers (dict[type[Any], Callable] | None) – Custom handlers for hashing objects not handled by update_hash(). custom_handlers is a dictionary where a key is a class (or optionally the name of a class as a string) and a value is a callable that translates a class instance to something that can be processed by update_hash().
evaluation_time_threshold (float) – If the time the method takes for computation is shorter than the threshold value, then it will not be cached.

Notes

The behavior of umbrella_cache is defined by the global StoreKeeper instance. The StoreKeeper is a background thread that manages data in memory and on disk.

If you are using other decorators with this decorator, please make sure to apply the umbrella_cache first (first line before method definition). This wrapper uses name, doc, and filename of the method to identify it. If another wrapper does not implement a unique __doc__ and is applied to multiple methods, then umbrella_cache might return values of the wrong method.
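
The general idea of a hash-keyed cache with an evaluation time threshold can be sketched as follows. The decorator simple_cache is a hypothetical, memory-only stand-in; hashing the repr of the arguments is an assumption made for brevity and does not reflect how update_hash treats numpy arrays or custom handlers.

```python
import functools
import hashlib
import time


def simple_cache(evaluation_time_threshold=0.01):
    """Hypothetical stand-in for a hash-keyed in-memory cache decorator"""
    def decorator(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build the cache key from the function name and its arguments
            hasher = hashlib.md5()
            hasher.update(func.__qualname__.encode("utf-8"))
            hasher.update(repr(args).encode("utf-8"))
            hasher.update(repr(sorted(kwargs.items())).encode("utf-8"))
            key = hasher.hexdigest()
            if key in store:
                return store[key]
            t0 = time.perf_counter()
            value = func(*args, **kwargs)
            # Only keep results that took long enough to compute
            if time.perf_counter() - t0 >= evaluation_time_threshold:
                store[key] = value
            return value

        return wrapper
    return decorator


calls = []


@simple_cache(evaluation_time_threshold=0)
def square(x):
    calls.append(x)
    return x * x


square(3)
square(3)  # served from the cache, `square` is not evaluated again
print(len(calls))
```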

dclab.cached.caches.compute_hash_for_cache(func, args, kwargs, custom_handlers=None)[source]

Compute the hash for caching the function return value

Parameters:

func (Callable)
args (list | tuple)
kwargs (dict)
custom_handlers (dict[type[Any], Callable] | None)

Return type:

str

dclab.cached.caches.update_hash(the_hash, arg, custom_handlers=None)[source]

Update a hashing object with a Python object

The argument can be a numpy array, a string, or a list/tuple of objects that are convertible to strings.

Parameters:: custom_handlers (dict[type[Any], Callable] | None)
Return type:: None

class dclab.cached.disk_store.DiskStore(path=None)[source]

Disk store for persisting data to disk

The disk store is thread- and process-safe.

assert_disk_store_path()[source]

clear()[source]

remove_old_files(max_bytes)[source]: Remove old files, honoring max_bytes total size

remove_stale_locks(max_age_seconds=3600)[source]: Remove stale locks

set_path(path)[source]

value_read(key, file_meta)[source]

value_write(key, value)[source]

value_write_json(key, json_data)[source]

value_write_mixed(key, value)[source]

version = '1.0'

class dclab.cached.disk_store.DiskStoreJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Custom JSONEncoder

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(o)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)

class dclab.cached.disk_store.LockFile(path)[source]

Create a lock file at the specified path

Use the property acquire to check whether the lock has been acquired.

class dclab.cached.memory_store.MemoryStore[source]

A dictionary-based in-memory store

clear()[source]: Clear the memory store

items()[source]: Return all key-value pairs in the memory store

pop(key)[source]: Remove and return a value from the memory store

remove_least_used_keys(n=1)[source]: Remove n least-used keys from the memory store

class dclab.cached.store_keeper.StoreKeeper(*args, **kwargs)[source]

Background thread for managing in-memory and persistent caches

The StoreKeeper class allows you to modify the caching behavior of all methods decorated with umbrella_cache.

To access the global instance, use the StoreKeeper.get_instance() method.

classmethod get_instance()[source]: Return the global StoreKeeper instance

clear()[source]: Clear memory and disk store

close()[source]: Stop the background thread

perform_tasks()[source]

Perform memory and disk management tasks

This method is called from within the run method and should not be called manually.

run()[source]: Enter the main loop

set_disk_store_path(path)[source]

Set the path where to store persistent cache data

If this method is not called, then the disk store is disabled.

set_disk_store_size_bytes(disk_store_size_bytes)[source]

Set maximum size of the disk store in bytes

This number limits the size the disk store is allowed to occupy on disk. Due to metadata overhead, the actual size is slightly larger.

Since the disk store is tidied up in the background at fixed intervals, the store size might temporarily exceed the set value.

set_interval(interval)[source]: Set the interval in seconds at which perform_tasks is called

set_memory_store_size(memory_store_size)[source]

Set the allowed number of values in the in-memory cache

Since the memory store is tidied up in the background at fixed intervals, the store size might temporarily exceed the set value.

disk_store: global persistent disk store

disk_store_size_bytes: maximum size of the disk store

event_exit: exit event is set when close is called

event_run: run event can be clear`ed to temporarily prevent `perform_tasks

interval: housekeeping interval [s]

memory_store: global volatile memory store

memory_store_size: maximum number of keys in the memory store