Code reference
Module-level methods
- dclab.new_dataset(data, identifier=None, **kwargs)[source]
Initialize a new RT-DC dataset
- Parameters
data –
can be one of the following:
dict
.tdms file
.rtdc file
subclass of RTDCBase (will create a hierarchy child)
DCOR resource URL
URL to file in S3-compatible object store
identifier (str) – A unique identifier for this dataset. If set to None an identifier is generated.
kwargs (dict) – Additional parameters passed to the RTDCBase subclass
- Returns
dataset – A new dataset instance
- Return type
subclass of dclab.rtdc_dataset.RTDCBase
Global definitions
These definitions are used throughout the dclab/Shape-In/Shape-Out ecosystem.
Metadata
Valid configuration sections and keys are described in: Analysis metadata and Experiment metadata. You should use the following methods instead of accessing the static metadata constants.
- dclab.definitions.config_key_exists(section, key)[source]
Return True if the configuration key exists
- dclab.definitions.get_config_value_descr(section, key)[source]
Return the description of a config value
Returns key if not defined anywhere
- dclab.definitions.get_config_value_func(section, key)[source]
Return configuration type converter function
- dclab.definitions.get_config_value_type(section, key)[source]
Return the expected type of a config value
Returns None if no type is defined
These constants are also available in the dclab.definitions module.
- dclab.definitions.meta_const.CFG_ANALYSIS
All configuration keywords editable by the user
- dclab.definitions.meta_const.CFG_METADATA
All read-only configuration keywords for a measurement
- dclab.definitions.meta_const.config_keys
dict with section as keys and config parameter names as values
Metadata parsers
- dclab.definitions.meta_parse.func_types = {<function f1dfloatduple>: (<class 'tuple'>, <class 'numpy.ndarray'>), <function f2dfloatarray>: <class 'numpy.ndarray'>, <function fbool>: (<class 'bool'>, <class 'numpy.bool_'>), <function fboolorfloat>: (<class 'bool'>, <class 'numpy.bool_'>, <class 'float'>), <function fint>: <class 'numbers.Integral'>, <function fintlist>: <class 'list'>, <class 'float'>: <class 'numbers.Number'>, <function lcstr>: <class 'str'>}
maps functions to their expected output types
Features
Features are discussed in more detail in Features.
- dclab.definitions.check_feature_shape(name, data)[source]
Check if (non)-scalar feature matches with its data’s dimensionality
- Parameters
name (str) – name of the feature
data (array-like) – data whose dimensionality will be checked
- Raises
ValueError – If the data’s shape does not match its scalar description
- dclab.definitions.feature_exists(name, scalar_only=False)[source]
Return True if name is a valid feature name
This function not only checks whether name is in feature_names, but also validates against the machine learning scores ml_score_??? (where ? can be a digit or a lower-case letter in the English alphabet).
- Parameters
- Returns
valid – True if name is a valid feature, False otherwise.
- Return type
See also
scalar_feature_exists
Wraps feature_exists with scalar_only=True
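The naming rule for machine-learning scores described above can be sketched with a regular expression. This is only an illustration, not dclab's implementation: KNOWN_FEATURES and is_valid_feature_name are hypothetical stand-ins for feature_names and feature_exists.

```python
import re

# Hypothetical stand-in for dclab.definitions.feature_names (assumption)
KNOWN_FEATURES = {"area_um", "deform", "image"}

# "ml_score_" followed by exactly three digits or lower-case ASCII letters
ML_SCORE_PATTERN = re.compile(r"ml_score_[a-z0-9]{3}")


def is_valid_feature_name(name):
    """Return True for a known feature name or a well-formed ml_score_???"""
    if name in KNOWN_FEATURES:
        return True
    return ML_SCORE_PATTERN.fullmatch(name) is not None
```

With this sketch, "ml_score_a1z" is accepted, while "ml_score_ABC" (upper case) and "ml_score_abcd" (four characters) are rejected.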
- dclab.definitions.get_feature_label(name, rtdc_ds=None, with_unit=True)[source]
Return the label corresponding to a feature name
This function not only checks feature_name2label, but also supports registered ml_score_??? features.
- Parameters
- Returns
label – feature label corresponding to the feature name
- Return type
Notes
TODO: extract feature label from ancillary information when an rtdc_ds is given.
- dclab.definitions.scalar_feature_exists(name)[source]
Convenience method wrapping feature_exists(…, scalar_only=True)
These constants are also available in the dclab.definitions module.
- dclab.definitions.feat_const.FEATURES_NON_SCALAR
list of non-scalar features
- dclab.definitions.feat_const.feature_names
list of feature names
- dclab.definitions.feat_const.feature_labels
list of feature labels (same order as feature_names)
- dclab.definitions.feat_const.feature_name2label
dict for converting feature names to labels
- dclab.definitions.feat_const.scalar_feature_names
list of scalar feature names
RT-DC dataset manipulation
Base class
- class dclab.rtdc_dataset.RTDCBase(identifier=None, enable_basins=True)[source]
RT-DC measurement base class
Notes
Besides the filter arrays for each data feature, there is a manual boolean filter array RTDCBase.filter.manual that can be edited by the user; a boolean value of False means that the event is excluded from all computations.
- basins_enable()[source]
Load all basins defined in the
New in version 0.51.0.
In dclab 0.51.0, we introduced basins, a simple way of combining HDF5-based datasets (including the HDF5_S3 format). The idea is to be able to store parts of the dataset (e.g. images) in a separate file that could then be located someplace else (e.g. an S3 object store).
If an RT-DC file has “basins” defined, then these are sought out and made available via the features_basin property.
- get_downsampled_scatter(xax='area_um', yax='deform', downsample=0, xscale='linear', yscale='linear', remove_invalid=False, ret_mask=False)[source]
Downsampling by removing points at dense locations
- Parameters
xax (str) – Identifier for x axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for y axis
downsample (int) –
Number of points to draw in the down-sampled plot. This number is either
>=1: exactly downsample to this number by randomly adding or removing points
0: do not perform downsampling
xscale (str) – If set to “log”, take the logarithm of the x-values before performing downsampling. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.
remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than downsample due to infinite or nan values (e.g. due to logarithmic scaling).
ret_mask (bool) – If set to True, returns a boolean array of length len(self) where True values identify the filtered data.
- Returns
xnew, ynew (1d ndarray of length N) – Filtered data; N is either identical to downsample or smaller (if remove_invalid==True)
mask (1d boolean array of length len(RTDCBase)) – Array for identifying the downsampled data points
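The interplay of downsample, remove_invalid, and the returned mask can be sketched as follows. Note that this sketch uses plain uniform random subsampling, not the density-based point removal that dclab performs; downsample_sketch and its seed parameter are hypothetical.

```python
import math
import random


def downsample_sketch(x, y, downsample=0, xscale="linear", yscale="linear",
                      remove_invalid=False, seed=42):
    """Return (xnew, ynew, mask) using uniform random subsampling"""

    def is_valid(value, scale):
        # nan/inf are always invalid; on a log scale, so are values <= 0
        if not math.isfinite(value):
            return False
        return not (scale == "log" and value <= 0)

    mask = [True] * len(x)
    if remove_invalid:
        mask = [is_valid(xi, xscale) and is_valid(yi, yscale)
                for xi, yi in zip(x, y)]
    candidates = [ii for ii, keep in enumerate(mask) if keep]
    if downsample >= 1 and len(candidates) > downsample:
        # Randomly pick exactly `downsample` of the remaining events
        chosen = set(random.Random(seed).sample(candidates, downsample))
        mask = [ii in chosen for ii in range(len(x))]
    xnew = [xi for xi, keep in zip(x, mask) if keep]
    ynew = [yi for yi, keep in zip(y, mask) if keep]
    return xnew, ynew, mask
```

If remove_invalid is set and fewer valid events remain than requested, the sketch returns fewer than downsample points, which mirrors the behavior described for the remove_invalid parameter.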
- get_kde_contour(xax='area_um', yax='deform', xacc=None, yacc=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')[source]
Evaluate the kernel density estimate for contour plots
- Parameters
xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for Y axis
xacc (float) – Contour accuracy in x direction
yacc (float) – Contour accuracy in y direction
kde_type (str) – The KDE method to use
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.
- Returns
X, Y, Z – The kernel density Z evaluated on a rectangular grid (X,Y).
- Return type
coordinates
- get_kde_scatter(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs=None, xscale='linear', yscale='linear')[source]
Evaluate the kernel density estimate for scatter plots
- Parameters
xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
yax (str) – Identifier for Y axis
positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the points that are set in self.filter.all.
kde_type (str) – The KDE method to use, see kde_methods.methods
kde_kwargs (dict) – Additional keyword arguments to the KDE method
xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are displayed on a log-scale. Defaults to “linear”.
yscale (str) – See xscale.
- Returns
density – The kernel density evaluated for the filtered data points.
- Return type
1d ndarray
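To illustrate what a histogram-based density at given positions means, here is a simplified sketch. It is not the “histogram” method shipped with dclab, which may differ in binning and interpolation; histogram_density_sketch and its bin_x/bin_y parameters are hypothetical.

```python
import math
from collections import Counter


def histogram_density_sketch(x, y, positions, bin_x, bin_y):
    """Estimate the density at `positions` from a 2D histogram of (x, y)"""

    def bin_index(xv, yv):
        return math.floor(xv / bin_x), math.floor(yv / bin_y)

    # Count how many events fall into each rectangular bin
    counts = Counter(bin_index(xv, yv) for xv, yv in zip(x, y))
    # Normalize the counts so that the density integrates to one
    norm = len(x) * bin_x * bin_y
    px, py = positions
    return [counts[bin_index(xv, yv)] / norm for xv, yv in zip(px, py)]
```

Positions that fall into an empty bin get a density of zero in this sketch.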
- static get_kde_spacing(a, scale='linear', method=<function bin_width_doane>, method_kw=None, feat='undefined', ret_scaled=False)[source]
Convenience function for computing the contour spacing
- Parameters
a (ndarray) – feature data
scale (str) – how the data should be scaled (“log” or “linear”)
method (callable) – KDE method to use (see kde_methods submodule)
method_kw (dict) – keyword arguments to method
feat (str) – feature name for debugging
ret_scaled (bool) – whether or not to return the scaled array of a
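The default method in the signature above is bin_width_doane. Assuming it follows Doane's rule for the number of histogram bins, the computation can be sketched in pure Python as follows; bin_width_doane_sketch is a hypothetical name and the handling of invalid values is an assumption.

```python
import math


def bin_width_doane_sketch(a):
    """Bin width from Doane's formula, ignoring nan and inf values

    Requires at least three finite, non-identical values.
    """
    data = [v for v in a if math.isfinite(v)]
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    g1 = m3 / m2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    # Doane's estimate for the number of bins
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return (max(data) - min(data)) / k
```

For symmetric data the skewness term vanishes and the rule reduces to Sturges' formula, k = 1 + log2(n).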
- get_measurement_identifier()[source]
Return a unique measurement identifier
Return the [experiment]:”run index” configuration feat, if it exists. Otherwise, return the MD5 sum computed from the measurement time, date, and setup identifier.
Returns None if no identifier could be found or computed.
New in version 0.51.0.
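The fallback logic described above can be sketched like this. The way the three values are joined before hashing, and the requirement that all three are present, are assumptions made for illustration; measurement_identifier_sketch is a hypothetical function operating on a plain nested dict.

```python
import hashlib


def measurement_identifier_sketch(config):
    """Return the run index, an MD5 fallback, or None"""
    experiment = config.get("experiment", {})
    setup = config.get("setup", {})
    if "run index" in experiment:
        return experiment["run index"]
    parts = [experiment.get("time"), experiment.get("date"),
             setup.get("identifier")]
    if any(part is None for part in parts):
        return None
    # Assumption: join the three values with underscores before hashing
    return hashlib.md5("_".join(parts).encode("utf-8")).hexdigest()
```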
- polygon_filter_add(filt)[source]
Associate a Polygon Filter with this instance
- Parameters
filt (int or instance of PolygonFilter) – The polygon filter to add
- polygon_filter_rm(filt)[source]
Remove a polygon filter from this instance
- Parameters
filt (int or instance of PolygonFilter) – The polygon filter to remove
- config
Configuration of the measurement
- export
Export functionalities; instance of dclab.rtdc_dataset.export.Export.
- property features
All available features
- property features_basin
All features accessed via upstream basins from other locations
- property features_innate
All features excluding ancillary or temporary features
- property features_loaded
All features that have been computed
This includes ancillary features and temporary features.
Notes
Features that are computationally cheap to compute are always included. They are defined in dclab.rtdc_dataset.feat_anc_core.FEATURES_RAPID.
- property features_scalar
All scalar features available
- filter
Filtering functionalities; instance of dclab.rtdc_dataset.filter.Filter.
- format
Dataset format (derived from class name)
- abstract property hash
Reproducible dataset hash (defined by derived classes)
- property identifier
Unique (unreproducible) identifier
- logs
Dictionary of log files. Each log file is a list of strings (one string per line).
- tables
Dictionary of tables. Each table is an indexable compound numpy array.
- title
Title of the measurement
DCOR (online) format
- class dclab.rtdc_dataset.RTDC_DCOR(url, host='dcor.mpl.mpg.de', api_key='', use_ssl=None, cert_path=None, *args, **kwargs)[source]
Wrap around the DCOR API
- Parameters
url (str) –
Full URL or resource identifier; valid values are
https://dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f661-4920-be79-5ff4e85915d5
dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f661-4920-be79-5ff4e85915d5
b1404eb5-f661-4920-be79-5ff4e85915d5
host (str) – The host machine (used if the host is not given in url)
api_key (str) – API key to access private resources
use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
cert_path (pathlib.Path) – The (optional) path to a server CA bundle; this should only be necessary for DCOR instances in the intranet with a custom CA or for certificate pinning.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
- static get_full_url(url, use_ssl, host)[source]
Return the full URL to a DCOR resource
- Parameters
url (str) –
Full URL or resource identifier; valid values are
https://dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6-df12-4299-aa2e-089e390aafd5
dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6-df12-4299-aa2e-089e390aafd5
caab96f6-df12-4299-aa2e-089e390aafd5
use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
host (str) – Use this host if it is not specified in url
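How the three accepted forms of url map to one full URL can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not the code of get_full_url; full_url_sketch is a hypothetical function, and the default host is taken from the RTDC_DCOR signature.

```python
def full_url_sketch(url, use_ssl=None, host="dcor.mpl.mpg.de"):
    """Expand a resource identifier or partial URL to a full URL"""
    if use_ssl is None:
        # Do not force SSL if the caller explicitly asked for http://
        scheme = "http" if url.startswith("http://") else "https"
    else:
        scheme = "https" if use_ssl else "http"
    if "://" in url:
        url = url.split("://", 1)[1]  # drop the original scheme
    if "/" not in url:
        # A bare resource identifier: prepend host and API route
        url = f"{host}/api/3/action/dcserv?id={url}"
    return f"{scheme}://{url}"
```

All three example values listed for url resolve to the same https:// address in this sketch.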
- property hash
Hash value based on file name and content
- class dclab.rtdc_dataset.fmt_dcor.APIHandler(url, api_key='', cert_path=None)[source]
Handles the DCOR API with caching for simple queries
- Parameters
url (str) – URL to DCOR API
api_key (str) – DCOR API token
cert_path (pathlib.Path) – the path to the server’s CA bundle; by default this will use the default certificates (which depend on where you obtained certifi/requests from)
- classmethod add_api_key(api_key)[source]
Add an API Key/Token to the base class
When accessing the DCOR API, all available API Keys/Tokens are used to access a resource (trial and error).
- api_key
DCOR API Token
- api_keys = []
DCOR API Keys/Tokens in the current session
- cache_queries = ['metadata', 'size', 'feature_list', 'valid']
these are cached to minimize network usage
- url
DCOR API URL
- verify
keyword argument to requests.request()
Dictionary format
- class dclab.rtdc_dataset.RTDC_Dict(ddict, *args, **kwargs)[source]
Dictionary-based RT-DC dataset
- Parameters
ddict (dict) –
Dictionary with features as keys (valid features like “area_cvx”, “deform”, “image” are defined by dclab.definitions.feature_exists) with which the class will be instantiated. The configuration is set to the default configuration of dclab.
Changed in version 0.27.0: Scalar features are automatically converted to arrays.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
HDF5 (.rtdc) format
- class dclab.rtdc_dataset.RTDC_HDF5(h5path: str | pathlib.Path | BinaryIO, h5kwargs: Dict[str, Any] = None, *args, **kwargs)[source]
HDF5 file format for RT-DC measurements
- Parameters
h5path (str or pathlib.Path or file-like object) – Path to an ‘.rtdc’ measurement file or a file-like object
h5kwargs (dict) – Additional keyword arguments given to h5py.File
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
- path
Path to the experimental HDF5 (.rtdc) file
- Type
- property hash
Hash value based on file name and content
- dclab.rtdc_dataset.fmt_hdf5.MIN_DCLAB_EXPORT_VERSION = '0.3.3.dev2'
Hierarchy format
- class dclab.rtdc_dataset.RTDC_Hierarchy(hparent, apply_filter=True, *args, **kwargs)[source]
Hierarchy dataset (filtered from RTDCBase)
A few words on hierarchies: The idea is that a subclass of RTDCBase can use the filtered data of another subclass of RTDCBase and interpret these data as unfiltered events. This comes in handy e.g. when the percentages of different subpopulations need to be distinguished without the noise in the original data.
Children in hierarchies always update their data according to the filtered event data from their parent when apply_filter is called. This makes it easier to save and load hierarchy children with e.g. Shape-Out and it makes the handling of hierarchies more intuitive (when the parent changes, the child changes as well).
- Parameters
hparent (instance of RTDCBase) – The hierarchy parent
apply_filter (bool) – Whether to apply the filter during instantiation; if set to False, apply_filter must be called manually.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
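The parent/child relationship described above can be sketched with two small classes. ParentSketch and ChildSketch are hypothetical stand-ins that hold a single feature; they only illustrate the idea that a child re-reads the parent's filtered events when apply_filter is called.

```python
class ParentSketch:
    """Hypothetical stand-in for an RTDCBase dataset with a filter"""

    def __init__(self, deform):
        self.deform = list(deform)
        self.filter_all = [True] * len(self.deform)


class ChildSketch:
    """Treats the parent's filtered events as its own unfiltered events"""

    def __init__(self, hparent, apply_filter=True):
        self.hparent = hparent
        self.deform = []
        if apply_filter:
            self.apply_filter()

    def apply_filter(self):
        # Re-read the parent's filter and update the child's event data
        parent = self.hparent
        self.deform = [value for value, keep
                       in zip(parent.deform, parent.filter_all) if keep]
```

When the parent's filter changes, calling apply_filter on the child again brings the child's events up to date.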
S3 file format
- class dclab.rtdc_dataset.RTDC_S3(url: str, secret_id: str = '', secret_key: str = '', *args, **kwargs)[source]
Access RT-DC measurements in an S3-compatible object store
This is essentially just a wrapper around RTDC_HDF5 with s3fs passing a file object to h5py.
- Parameters
- path
Path to the experimental HDF5 (.rtdc) file
- Type
TDMS format
- class dclab.rtdc_dataset.RTDC_TDMS(tdms_path, *args, **kwargs)[source]
TDMS file format for RT-DC measurements
- Parameters
tdms_path (str or pathlib.Path) – Path to a ‘.tdms’ measurement file.
*args – Arguments for RTDCBase
**kwargs – Keyword arguments for RTDCBase
- path
Path to the experimental dataset (main .tdms file)
- Type
- dclab.rtdc_dataset.fmt_tdms.get_project_name_from_path(path, append_mx=False)[source]
Get the project name from a path.
For a path “/home/peter/hans/HLC12398/online/M1_13.tdms” or “/home/peter/hans/HLC12398/online/data/M1_13.tdms” (with or without the “.tdms” file), this will always return “HLC12398”.
- Parameters
path (str or pathlib.Path) – path to tdms file
append_mx (bool) – append measurement number, e.g. “M1”
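The example paths above suggest that the project name is the directory directly above a folder named “online”. Under that assumption (the actual function may recognize other layouts as well), the lookup can be sketched as follows; project_name_sketch is a hypothetical function and does not implement append_mx.

```python
from pathlib import PurePosixPath


def project_name_sketch(path):
    """Return the name of the directory above the "online" folder"""
    parts = PurePosixPath(path).parts
    if "online" in parts:
        index = parts.index("online")
        if index > 0:
            return parts[index - 1]
    # No "online" folder found: the project name is unknown
    return None
```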
Basin features
A basin represents data from an external source.
The external data must be a valid RT-DC dataset; subclasses should ensure that the corresponding API is available.
- dclab.rtdc_dataset.feat_basin.Basin.basin_type
Storage type to use (e.g. “file” or “remote”)
- dclab.rtdc_dataset.feat_basin.Basin.features
Features made available by the basin
Ancillaries
Computation of ancillary features
Ancillary features are computed on-the-fly in dclab if the required data are available. The features are registered here and are computed when RTDCBase.__getitem__ is called with the respective feature name. When RTDCBase.__contains__ is called with the feature name, then the feature is not yet computed, but the prerequisites are evaluated:
In [1]: import dclab
In [2]: ds = dclab.new_dataset("data/example.rtdc")
In [3]: ds.config["calculation"]["emodulus lut"] = "LE-2D-FEM-19"
In [4]: ds.config["calculation"]["emodulus medium"] = "CellCarrier"
In [5]: ds.config["calculation"]["emodulus temperature"] = 23.0
In [6]: ds.config["calculation"]["emodulus viscosity model"] = 'buyukurganci-2022'
In [7]: "emodulus" in ds # nothing is computed yet
Out[7]: True
In [8]: ds["emodulus"] # now data are computed and cached
Out[8]:
array([1.11112189, 0.98155247, nan, ..., nan, nan,
0.68137091])
Once the data has been computed, RTDCBase caches it in the _ancillaries property dict together with a hash that is computed with AncillaryFeature.hash. The hash is computed from the feature data req_features and the configuration metadata req_config.
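The caching scheme can be sketched as follows: a hash over the required feature data and configuration values decides whether cached data may be reused. DatasetSketch and get_ancillary are hypothetical, and the structure of req_config (a list of [section, [keys]] pairs) is an assumption for this illustration.

```python
import hashlib


class DatasetSketch:
    """Hypothetical dataset that caches computed ancillary features"""

    def __init__(self, features, config):
        self.features = features    # e.g. {"area_um": [...]}
        self.config = config        # e.g. {"calculation": {...}}
        self._ancillaries = {}      # feature name -> (hash, data)
        self.n_computed = 0         # counts actual computations

    def _hash(self, req_features, req_config):
        hasher = hashlib.md5()
        for name in req_features:
            hasher.update(repr((name, self.features[name])).encode())
        for section, keys in req_config:
            for key in keys:
                value = self.config[section][key]
                hasher.update(repr((section, key, value)).encode())
        return hasher.hexdigest()

    def get_ancillary(self, name, method, req_features, req_config):
        new_hash = self._hash(req_features, req_config)
        cached = self._ancillaries.get(name)
        if cached is not None and cached[0] == new_hash:
            return cached[1]        # inputs unchanged: reuse cached data
        data = method(self)
        self.n_computed += 1
        self._ancillaries[name] = (new_hash, data)
        return data
```

Changing a required configuration value changes the hash, so the next access recomputes the feature instead of returning stale data.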
- class dclab.rtdc_dataset.feat_anc_core.ancillary_feature.AncillaryFeature(feature_name, method, req_config=None, req_features=None, req_func=<function AncillaryFeature.<lambda>>, priority=0, data=None, identifier=None)[source]
A data feature that is computed from existing data
- Parameters
feature_name (str) – The name of the ancillary feature, e.g. “emodulus”.
method (callable) – The method that computes the feature. This method takes an instance of RTDCBase as argument.
req_config (list) – Required configuration parameters to compute the feature, e.g. [“calculation”, [“emodulus lut”, “emodulus viscosity”]]
req_features (list) – Required existing features in the dataset, e.g. [“area_cvx”, “deform”]
req_func (callable) –
A function that takes an instance of RTDCBase as an argument and checks whether any other necessary criteria are met. By default, this is a lambda function that returns True. The function should return False if the necessary criteria are not met. This function may also return a hashable object (via dclab.util.objstr()) instead of True, if the criteria are subject to change. In this case, the return value is used for identifying the cached ancillary feature.
Changed in version 0.27.0: Support non-boolean return values for caching purposes.
priority (int) – The priority of the feature; if there are multiple AncillaryFeature defined for the same feature_name, then the priority of the features defines which feature returns True in self.is_available. A higher value means a higher priority.
data (object or BaseModel) – Any other data relevant for the feature (e.g. the ML model for computing ‘ml_score_xxx’ features)
identifier (None or str) – A unique identifier (e.g. MD5 hash) of the ancillary feature. For PluginFeatures or ML features, this should be computed at least from the input file and the feature name.
Notes
req_config and req_features are used to test whether the feature can be computed in self.is_available.
- static available_features(rtdc_ds)[source]
Determine available features for an RT-DC dataset
- Parameters
rtdc_ds (instance of RTDCBase) – The dataset to check availability for
- Returns
features – Dictionary with feature names as keys and instances of AncillaryFeature as values.
- Return type
- static check_data_size(rtdc_ds, data_dict)[source]
Check the feature data is the correct size. If it isn’t, resize it.
- Parameters
rtdc_ds (instance of RTDCBase) – The dataset from which the features are computed
data_dict (dict) – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (to be resized) as values.
- Returns
data_dict – Dictionary with feature_name as keys and the correctly resized data features as values.
- Return type
- compute(rtdc_ds)[source]
Compute the feature with self.method. All ancillary features that share the same method will also be populated automatically.
- Parameters
rtdc_ds (instance of RTDCBase) – The dataset to compute the feature for
- Returns
data_dict – Dictionary with AncillaryFeature.feature_name as keys and the computed data features (read-only) as values.
- Return type
- hash(rtdc_ds)[source]
Used for identifying an ancillary computation
The required features, the used configuration keys/values, and the return value of the requirement function are hashed.
- is_available(rtdc_ds, verbose=False)[source]
Check whether the feature is available
- Parameters
rtdc_ds (instance of RTDCBase) – The dataset to check availability for
- Returns
available – True, if feature can be computed with compute
- Return type
Notes
This method returns False for a feature if there is a feature defined with the same name but with higher priority (even if the feature would be available otherwise).
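The priority rule can be sketched as follows. FeatureSketch is a hypothetical stand-in for AncillaryFeature that only checks required features; the sketch assumes that a higher-priority feature takes precedence only if it can itself be computed.

```python
class FeatureSketch:
    """Hypothetical stand-in for AncillaryFeature"""
    registry = []

    def __init__(self, feature_name, req_features, priority=0):
        self.feature_name = feature_name
        self.req_features = req_features
        self.priority = priority
        FeatureSketch.registry.append(self)

    def can_compute(self, available):
        return all(name in available for name in self.req_features)

    def is_available(self, available):
        if not self.can_compute(available):
            return False
        # A computable competitor with higher priority takes precedence
        for other in FeatureSketch.registry:
            if (other is not self
                    and other.feature_name == self.feature_name
                    and other.priority > self.priority
                    and other.can_compute(available)):
                return False
        return True
```

This explains why the registry above can hold several entries named 'emodulus' with different priorities: for a given dataset, at most one of them reports itself as available.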
- feature_names = ['time', 'index', 'area_ratio', 'area_um', 'aspect', 'deform', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'fl1_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'fl1_max_ctc', 'fl2_max_ctc', 'fl1_max_ctc', 'fl3_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'contour', 'bright_avg', 'bright_sd', 'bright_bc_avg', 'bright_bc_sd', 'bright_perc_10', 'bright_perc_90', 'inert_ratio_cvx', 'inert_ratio_prnc', 'inert_ratio_raw', 'tilt', 'volume', 'ml_class', 'circ_times_area', 'area_exp']
All feature names registered
- features = [<AncillaryFeature 'time' (no ID) with priority 0>, <AncillaryFeature 'index' (no ID) with priority 0>, <AncillaryFeature 'area_ratio' (no ID) with priority 0>, <AncillaryFeature 'area_um' (no ID) with priority 0>, <AncillaryFeature 'aspect' (no ID) with priority 0>, <AncillaryFeature 'deform' (no ID) with priority 0>, <AncillaryFeature 'emodulus' (no ID) with priority 5>, <AncillaryFeature 'emodulus' (no ID) with priority 1>, <AncillaryFeature 'emodulus' (no ID) with priority 4>, <AncillaryFeature 'emodulus' (no ID) with priority 0>, <AncillaryFeature 'emodulus' (no ID) with priority 2>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 1>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl1_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl2_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'fl3_max_ctc' (no ID) with priority 0>, <AncillaryFeature 'contour' (no ID) with priority 0>, <AncillaryFeature 'bright_avg' (no ID) with priority 0>, <AncillaryFeature 'bright_sd' (no ID) with priority 0>, <AncillaryFeature 'bright_bc_avg' (no ID) with priority 0>, <AncillaryFeature 'bright_bc_sd' (no ID) with priority 0>, <AncillaryFeature 'bright_perc_10' (no ID) with priority 0>, <AncillaryFeature 'bright_perc_90' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_cvx' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_prnc' (no ID) with priority 0>, <AncillaryFeature 'inert_ratio_raw' (no ID) with priority 0>, <AncillaryFeature 'tilt' (no ID) with priority 0>, <AncillaryFeature 'volume' (no ID) with priority 0>, <AncillaryFeature 'ml_class' (no ID) with priority 0>, <PlugInFeature 'circ_times_area' (id 70254...) with priority 0>, <PlugInFeature 'area_exp' (id 5f03f...) with priority 0>]
All ancillary features registered
Plugin features
New in version 0.34.0.
- class dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PlugInFeature(feature_name: str, info: dict, plugin_path: Optional[Union[str, pathlib.Path]] = None)[source]
A user-defined plugin feature
- Parameters
feature_name (str) – name of a feature that matches that defined in info
info (dict) –
Full plugin recipe (for all features) as given in the info dictionary in the plugin file. At least the following keys must be specified:
“method”: callable function computing the plugin feature values (takes an instance of dclab.rtdc_dataset.core.RTDCBase as argument)
“feature names”: list of plugin feature names provided by the plugin
The following keys are optional:
“description”: short (one-line) description of the plugin
“long description”: long description of the plugin
“feature labels”: feature labels used e.g. for plotting
“feature shapes”: list of tuples for each feature indicating the shape (this is required only for non-scalar features; for scalar features simply set this to None or (1,))
“scalar feature”: list of boolean values indicating whether the features are scalar
“config required”: configuration keys required to compute the plugin features (see the req_config parameter for AncillaryFeature)
“features required”: list of feature names required to compute the plugin features (see the req_features parameter for AncillaryFeature)
“method check required”: additional method that checks whether the features can be computed (see the req_func parameter for AncillaryFeature)
“version”: version of this plugin (please use semantic versioning)
plugin_path (str or pathlib.Path, optional) – path which was used to load the PlugInFeature with load_plugin_feature().
Notes
PlugInFeature inherits from AncillaryFeature. Please read the advanced section on plugin features in the dclab docs.
- feature_name
Plugin feature name
- plugin_feature_info
Dictionary containing all information relevant for this particular plugin feature instance
- plugin_path
Path to the original plugin file
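A plugin file built from the info keys listed above might look like the following sketch. The function name compute_my_features, the second feature name, and the labels are hypothetical, and the assumption that the method returns a dictionary keyed by feature name should be checked against the plugin section of the dclab docs. Plain lists are used to keep the sketch self-contained; real feature data would be numpy arrays.

```python
def compute_my_features(rtdc_ds):
    """Compute two example features from "circ" and "area_um" data"""
    circ = rtdc_ds["circ"]
    area = rtdc_ds["area_um"]
    # Assumption: return one entry per name listed in "feature names"
    return {
        "circ_per_area": [c / a for c, a in zip(circ, area)],
        "circ_times_area": [c * a for c, a in zip(circ, area)],
    }


info = {
    "method": compute_my_features,
    "description": "Example features derived from circularity and area",
    "feature names": ["circ_per_area", "circ_times_area"],
    "feature labels": ["Circularity per area", "Circularity times area"],
    "features required": ["circ", "area_um"],
    "version": "0.1.0",
}
```

Only “method” and “feature names” are mandatory; the remaining keys in this sketch are optional.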
- dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.import_plugin_feature_script(plugin_path: str | pathlib.Path) → dict[source]
Import the user-defined recipe and return the info dictionary
- Parameters
plugin_path (str or Path) – pathname to a valid dclab plugin script
- Returns
info – Dictionary with the information required to instantiate one (or multiple) PlugInFeature.
- Return type
- Raises
PluginImportError – If the plugin cannot be found
Notes
One recipe may define multiple plugin features.
- dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.load_plugin_feature(plugin_path: str | pathlib.Path) → List[dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PlugInFeature][source]
Find and load PlugInFeature(s) from a user-defined recipe
- Parameters
plugin_path (str or Path) – pathname to a valid dclab plugin Python script
- Returns
plugin_list – list of PlugInFeature instances loaded from plugin_path
- Return type
list of PlugInFeature
- Raises
ValueError – If the “feature names” entry of the script’s info dictionary is not a list
Notes
One recipe may define multiple plugin features.
See also
import_plugin_feature_script
function that imports the plugin script
PlugInFeature
class handling the plugin feature information
dclab.rtdc_dataset.feat_temp.register_temporary_feature
alternative method for creating user-defined features
- dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_all_plugin_features()[source]
Convenience function for removing all PlugInFeature instances
See also
remove_plugin_feature
remove a single PlugInFeature instance
- dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.remove_plugin_feature(plugin_instance: dclab.rtdc_dataset.feat_anc_plugin.plugin_feature.PlugInFeature)[source]
Convenience function for removing a PlugInFeature instance
- Parameters
plugin_instance (PlugInFeature) – The PlugInFeature instance to be removed from dclab
- Raises
TypeError – If the plugin_instance is not a PlugInFeature instance
Temporary features
New in version 0.33.0.
- dclab.rtdc_dataset.feat_temp.deregister_temporary_feature(feature: str)[source]
Convenience function for deregistering a temporary feature
This method is mostly used during testing. It does not remove the actual feature data from any dataset; the data will stay in memory but is not accessible anymore through the public methods of the RTDCBase user interface.
- dclab.rtdc_dataset.feat_temp.register_temporary_feature(feature: str, label: Optional[str] = None, is_scalar: bool = True)[source]
Register a new temporary feature
Temporary features are custom features that can be defined ad hoc by the user. Temporary features are helpful when the integral features are not enough, e.g. for prototyping, testing, or collating with other data. Temporary features allow you to leverage the full functionality of RTDCBase with your custom features (no need to go for a custom pandas.DataFrame).
- dclab.rtdc_dataset.feat_temp.set_temporary_feature(rtdc_ds: dclab.rtdc_dataset.core.RTDCBase, feature: str, data: numpy.ndarray)[source]
Set temporary feature data for a dataset
- Parameters
rtdc_ds (dclab.RTDCBase) – Dataset for which to set the feature. Note that the length of the feature data must match the number of events in rtdc_ds. If the dataset is a hierarchy child, the data will also be set in the parent dataset, but only for those events that are part of the child. For all events in the parent dataset that are not part of the child dataset, the temporary feature is set to np.nan.
feature (str) – Feature name
data (np.ndarray) – The data
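The behavior for hierarchy children described for rtdc_ds can be sketched as follows: the child's values are written to the matching parent events, and all other parent events are set to nan. spread_to_parent_sketch and its arguments are hypothetical and use plain lists instead of numpy arrays.

```python
import math


def spread_to_parent_sketch(parent_size, child_indices, child_data):
    """Return parent-sized data: child values where known, nan elsewhere"""
    if len(child_indices) != len(child_data):
        raise ValueError("data length must match the number of child events")
    parent_data = [math.nan] * parent_size
    for index, value in zip(child_indices, child_data):
        parent_data[index] = value
    return parent_data
```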
Config
- class dclab.rtdc_dataset.config.Configuration(files=None, cfg=None, disable_checks=False)[source]
Configuration class for RT-DC datasets
This class has a dictionary-like interface to access and set configuration values, e.g.
cfg = load_from_file("/path/to/config.txt")
# access the channel width
cfg["setup"]["channel width"]
# modify the channel width
cfg["setup"]["channel width"] = 30
- Parameters
files (list of files) – The config files with which to initialize the configuration
cfg (dict-like) – The dictionary with which to initialize the configuration
disable_checks (bool) – Set this to True if you want to avoid checking against section and key names defined in dclab.definitions using verify_section_key(). This avoids excess warning messages when loading data from configuration files not generated by dclab.
- tojson()[source]
Convert the configuration to a JSON string
Note that the data type of some configuration options will likely be lost.
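Why data types can get lost is easy to see with Python's standard json module, assuming that tojson relies on standard JSON encoding. The section and key names below are hypothetical.

```python
import json

# Hypothetical configuration values: a tuple and a float
config = {"example": {"pair": (250, 80), "width": 30.0}}

restored = json.loads(json.dumps(config))

# JSON has no tuple type, so the tuple comes back as a list
print(type(config["example"]["pair"]).__name__,
      "->", type(restored["example"]["pair"]).__name__)
```

The float survives the round trip, but code that expects a tuple has to convert the list back explicitly.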
- tostring(sections=None)[source]
Convert the configuration to its string representation
The optional argument sections allows exporting only specific sections of the configuration, e.g. sections=dclab.dfn.CFG_METADATA will only export configuration data from the original measurement and no filtering data.
Export
- class dclab.rtdc_dataset.export.Export(rtdc_ds)[source]
Export functionalities for RT-DC datasets
- avi(path, filtered=True, override=False)[source]
Exports filtered event images to an avi file
- Parameters
Notes
Raises OSError if current dataset does not contain image data
- fcs(path, features, meta_data=None, filtered=True, override=False)[source]
Export the data of an RT-DC dataset to an .fcs file
- Parameters
path (str) – Path to an .fcs file. The ending .fcs is added automatically.
features (list of str) – The features in the resulting .fcs file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
meta_data (dict) – User-defined, optional key-value pairs that are stored in the primary TEXT segment of the FCS file; the version of dclab is stored there by default
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
Notes
Due to incompatibility with the .fcs file format, all events with NaN-valued features are not exported.
- hdf5(path, features=None, filtered=True, logs=False, tables=False, meta_prefix='src_', override=False, compression_kwargs=None, compression='deprecated', skip_checks=False)[source]
Export the data of the current instance to an HDF5 file
- Parameters
path (str) – Path to an .rtdc file. The ending .rtdc is added automatically.
features (list of str) – The features in the resulting .rtdc file. These are strings that are defined by dclab.definitions.feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “image”. Defaults to self.rtdc_ds.features_innate.
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
logs (bool) – Whether to store the logs of the original file prefixed with source_ to the output file.
tables (bool) – Whether to store the tables of the original file prefixed with source_ to the output file.
meta_prefix (str) – Prefix for log and table names in the exported file
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
compression_kwargs (dict) – Dictionary with the keys “compression” and “compression_opts” which are passed to h5py.H5File.create_dataset(). The default is Zstandard compression with the lowest compression level hdf5plugin.Zstd(clevel=1).
compression (str or None) – Compression method used for data storage; one of [None, “lzf”, “gzip”, “szip”].
Deprecated since version 0.43.0: Use compression_kwargs instead.
skip_checks (bool) – Disable checking whether all features have the same length.
- tsv(path, features, meta_data=None, filtered=True, override=False)[source]
Export the data of the current instance to a .tsv file
- Parameters
path (str) – Path to a .tsv file. The ending .tsv is added automatically.
features (list of str) – The features in the resulting .tsv file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
meta_data (dict) – User-defined, optional key-value pairs that are stored at the beginning of the tsv file - one key-value pair is stored per line which starts with a hash. The version of dclab is stored there by default.
filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
override (bool) – If set to True, an existing file path will be overridden. If set to False, raises OSError if path exists.
Filter
- class dclab.rtdc_dataset.filter.Filter(rtdc_ds)[source]
Boolean filter arrays for RT-DC measurements
- Parameters
rtdc_ds (instance of RTDCBase) – The RT-DC dataset the filter applies to
- update(rtdc_ds, force=None)[source]
Update the filters according to rtdc_ds.config[“filtering”]
- Parameters
rtdc_ds (dclab.rtdc_dataset.core.RTDCBase) – The measurement to which the filter is applied
force (list) – A list of feature names that must be refiltered with min/max values.
Notes
This function is called when ds.apply_filter is called.
- property all
All filters combined (see Filter.update())
Use this property to filter the features of dclab.rtdc_dataset.RTDCBase instances
- property box
All box filters
- property invalid
Invalid (nan/inf) events
- property polygon
Polygon filters
Low-level functionalities
downsampling
Content-based downsampling of ndarrays
- dclab.downsampling.downsample_grid(a, b, samples, remove_invalid=False, ret_idx=False)[source]
Content-based downsampling for faster visualization
The arrays a and b make up a 2D scatter plot with high and low density values. This method takes out points at indices with high density.
- Parameters
a (1d ndarrays) – The input arrays to downsample
b (1d ndarrays) – The input arrays to downsample
samples (int) – The desired number of samples
remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than samples due to infinite or nan values.
ret_idx (bool) – Also return a boolean array that corresponds to the downsampled indices in a and b.
- Returns
dsa, dsb (1d ndarrays of shape (samples,)) – The arrays a and b downsampled by evenly selecting points and pseudo-randomly adding or removing points to match samples.
idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa
- dclab.downsampling.downsample_rand(a, samples, remove_invalid=False, ret_idx=False)[source]
Downsampling by randomly removing points
- Parameters
- Returns
dsa (1d ndarray of size samples) – The pseudo-randomly downsampled array a
idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa
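The relationship between dsa and idx described above can be illustrated with a minimal sketch. It uses plain Python lists and the standard library instead of ndarrays; the helper name downsample_rand_sketch and its seed argument are hypothetical and are not part of dclab.

```python
import random


def downsample_rand_sketch(a, samples, seed=47):
    """Pseudo-randomly keep `samples` entries of `a` (hypothetical helper)."""
    n = len(a)
    if samples >= n:
        # Nothing to remove: keep every point
        idx = [True] * n
    else:
        rng = random.Random(seed)  # fixed seed gives a reproducible selection
        keep = set(rng.sample(range(n), samples))
        idx = [i in keep for i in range(n)]
    # Applying the boolean mask to `a` yields the downsampled data
    dsa = [value for value, flag in zip(a, idx) if flag]
    return dsa, idx


a = list(range(100))
dsa, idx = downsample_rand_sketch(a, samples=10)
```

The mask has the same length as a, so it can be reused to select the matching events from any other feature of the same dataset.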
features
image-based
- dclab.features.contour.get_contour(mask)[source]
Compute the image contour from a mask
The contour is computed in a very inefficient way using scikit-image and a conversion of float coordinates to pixel coordinates.
- Parameters
mask (binary ndarray of shape (M,N) or (K,M,N)) – The mask outlining the pixel positions of the event. If a 3d array is given, then K indexes the individual contours.
- Returns
cont – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
- Return type
ndarray or list of K ndarrays of shape (J,2)
- dclab.features.bright.get_bright(mask, image, ret_data='avg,sd')[source]
Compute avg and/or std of the event brightness
The event brightness is defined by the gray-scale values of the image data within the event mask area.
- Parameters
mask (ndarray or list of ndarrays of shape (M,N) and dtype bool) – The mask values, True where the event is located in image.
image (ndarray or list of ndarrays of shape (M,N)) – A 2D array that holds the image in form of grayscale values of an event.
ret_data (str) – A comma-separated list of metrics to compute:
“avg”: compute the average
“sd”: compute the standard deviation
Selected metrics are returned in alphabetical order.
- Returns
bright_avg (float or ndarray of size N) – Average image data within the contour
bright_std (float or ndarray of size N) – Standard deviation of image data within the contour
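As a rough illustration of the definition above (gray-scale values inside the event mask), the following sketch computes both metrics for a single event using nested lists. The function name brightness_sketch is hypothetical, and the use of the population standard deviation is an assumption rather than something stated here.

```python
import statistics


def brightness_sketch(mask, image):
    """Average and standard deviation of image values where mask is True."""
    values = [
        pixel
        for mask_row, image_row in zip(mask, image)
        for inside, pixel in zip(mask_row, image_row)
        if inside
    ]
    avg = statistics.fmean(values)
    # Assumption: population standard deviation (divides by the pixel count)
    sd = statistics.pstdev(values)
    return avg, sd


image = [[1, 2], [3, 4]]
mask = [[True, False], [True, True]]
bright_avg, bright_sd = brightness_sketch(mask, image)
```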
- dclab.features.inert_ratio.get_inert_ratio_cvx(cont)[source]
Compute the inertia ratio of the convex hull of a contour
The inertia ratio is computed from the central second order of moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).
- Parameters
cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
- Returns
inert_ratio_cvx (float or ndarray of size N) – The inertia ratio of the contour’s convex hull
Changed in version 0.48.2: For long channels, an integer overflow could occur in previous versions, leading to invalid or nan values. See https://github.com/DC-analysis/dclab/issues/212
Notes
The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.
See also
get_inert_ratio_raw
Compute inertia ratio of a raw contour
References
- dclab.features.inert_ratio.get_inert_ratio_raw(cont)[source]
Compute the inertia ratio of a contour
The inertia ratio is computed from the central second order of moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).
- Parameters
cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
- Returns
inert_ratio_raw (float or ndarray of size N) – The inertia ratio of the contour
Changed in version 0.48.2: For long channels, an integer overflow could occur in previous versions, leading to invalid or nan values. See https://github.com/DC-analysis/dclab/issues/212
Notes
The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.
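To make the formula sqrt(mu20/mu02) concrete, here is a minimal sketch that evaluates the central second-order moments of a closed polygon with the textbook Green's theorem sums. It is not the dclab or OpenCV implementation (which work on pixel contours with their own numerical details); the name inert_ratio_sketch is hypothetical.

```python
import math


def inert_ratio_sketch(cont):
    """Inertia ratio sqrt(mu20/mu02) of a closed polygon.

    cont is a sequence of (x, y) vertices; the last vertex is
    implicitly connected to the first one.
    """
    area = m10 = m01 = m20 = m02 = 0.0
    n = len(cont)
    for i in range(n):
        x0, y0 = cont[i]
        x1, y1 = cont[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area += cross / 2
        m10 += (x0 + x1) * cross / 6
        m01 += (y0 + y1) * cross / 6
        m20 += (x0 * x0 + x0 * x1 + x1 * x1) * cross / 12
        m02 += (y0 * y0 + y0 * y1 + y1 * y1) * cross / 12
    # Shift the raw moments to the centroid to obtain central moments
    mu20 = m20 - m10 * m10 / area
    mu02 = m02 - m01 * m01 / area
    return math.sqrt(mu20 / mu02)


# A rectangle that is twice as long in x as in y
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
ratio = inert_ratio_sketch(rect)
```

For this rectangle the ratio is approximately 2, reflecting the elongation along x; swapping the x- and y-coordinates gives approximately 0.5.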
See also
get_inert_ratio_cvx
Compute inertia ratio of the convex hull of a contour
References
- dclab.features.volume.get_volume(cont, pos_x, pos_y, pix, fix_orientation=False)[source]
Calculate the volume of a polygon revolved around an axis
The volume estimation assumes rotational symmetry.
- Parameters
cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event [px] e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
pos_x (float or ndarray of length N) – The x coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_x
pos_y (float or ndarray of length N) – The y coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_y
pix (float) – The detector pixel size in µm. e.g. obtained using: mm.config[“imaging”][“pixel size”]
fix_orientation (bool) – If set to True, make sure that the orientation of the contour is counter-clockwise in the r-z plane (see vol_revolve()). This is False by default, because (1) Shape-In always stores the contours in the correct orientation and (2) there may be events with high porosity where “fixing” the orientation makes things worse and a negative volume is returned.
- Returns
volume – volume in um^3
- Return type
float or ndarray
Notes
The computation of the volume is based on a full rotation of the upper and the lower halves of the contour from which the average is then used.
The volume is computed radially from the center position given by (pos_x, pos_y). For sufficiently smooth contours, such as densely sampled ellipses, the center position does not play an important role. For contours that are given on a coarse grid, as is the case for RT-DC, the center position must be given.
References
Yields identical results to the Matlab script by Geoff Olynyk (https://de.mathworks.com/matlabcentral/fileexchange/36525-volrevolve)
- dclab.features.volume.counter_clockwise(cx, cy)[source]
Put contour coordinates into counter-clockwise order
- Parameters
cx (1d ndarrays) – The x- and y-coordinates of the contour
cy (1d ndarrays) – The x- and y-coordinates of the contour
- Returns
The x- and y-coordinates of the contour in counter-clockwise orientation.
- Return type
cx_cc, cy_cc
Notes
The contour must be centered around (0, 0).
- dclab.features.volume.vol_revolve(r, z, point_scale=1.0)[source]
Calculate the volume of a polygon revolved around the Z-axis
This implementation yields the same results as the volRevolve Matlab function by Geoff Olynyk (from 2012-05-03) https://de.mathworks.com/matlabcentral/fileexchange/36525-volrevolve.
The difference here is that the volume is computed using (a much more approachable) implementation using the volume of a truncated cone (https://de.wikipedia.org/wiki/Kegelstumpf).
\[V = \frac{h \cdot \pi}{3} \cdot (R^2 + R \cdot r + r^2)\]
where \(h\) is the height of the cone and \(r\) and \(R\) are the smaller and larger radii of the truncated cone.
Each line segment of the contour resembles one truncated cone. If the z-step is positive (counter-clockwise contour), then the truncated cone volume is added to the total volume. If the z-step is negative (e.g. inclusion), then the truncated cone volume is removed from the total volume.
Changed in version 0.37.0: The volume in previous versions was overestimated by on average 2µm³.
- Parameters
r (1d np.ndarray) – radial coordinates (perpendicular to the z axis)
z (1d np.ndarray) – coordinate along the axis of rotation
point_scale (float) – point size in your preferred units; The volume is multiplied by a factor of point_scale**3.
Notes
The coordinates must be given in counter-clockwise order, otherwise the volume will be negative.
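The signed sum of truncated-cone volumes described above can be sketched in a few lines. This is an illustration of the technique and not the dclab source; the name vol_revolve_sketch is hypothetical, and the sketch assumes that the last contour point is implicitly connected to the first.

```python
import math


def vol_revolve_sketch(r, z, point_scale=1.0):
    """Sum signed truncated-cone volumes along a closed r-z contour."""
    total = 0.0
    n = len(r)
    for i in range(n):
        r0, z0 = r[i], z[i]
        r1, z1 = r[(i + 1) % n], z[(i + 1) % n]
        dz = z1 - z0  # signed height of the truncated cone
        # V = h * pi / 3 * (R^2 + R*r + r^2); a negative dz subtracts volume
        total += math.pi * dz / 3 * (r0 * r0 + r0 * r1 + r1 * r1)
    return total * point_scale ** 3


# Counter-clockwise rectangle in the r-z plane: a cylinder (radius 2, height 3)
r = [0, 2, 2, 0]
z = [0, 0, 3, 3]
volume = vol_revolve_sketch(r, z)
```

Reversing the point order (clockwise contour) flips the sign of the result, which matches the note on orientation above.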
emodulus
Computation of apparent Young’s modulus for RT-DC measurements
- dclab.features.emodulus.extrapolate_emodulus(lut, datax, deform, emod, deform_norm, deform_thresh=0.05, inplace=True)[source]
Use spline interpolation to fill in nan-values
When points (datax, deform) are outside the convex hull of the lut, then scipy.interpolate.griddata() returns nan-values.
With this function, some of these nan-values are extrapolated using scipy.interpolate.SmoothBivariateSpline. The supported extrapolation values are currently limited to those where the deformation is above 0.05.
A warning will be issued, because this is not really recommended.
- Parameters
lut (ndarray of shape (N, 3)) – The normalized (!! see normalize()) LUT (first axis is points, second axis enumerates datax, deform, and emodulus)
datax (ndarray of size N) – The normalized x data (corresponding to lut[:, 0])
deform (ndarray of size N) – The normalized deform (corresponding to lut[:, 1])
emod (ndarray of size N) – The emodulus (corresponding to lut[:, 2]); If emod does not contain nan-values, there is nothing to do here.
deform_norm (float) – The normalization value used to normalize lut[:, 1] and deform.
deform_thresh (float) – Not the entire LUT is used for bivariate spline interpolation. Only the points where lut[:, 1] > deform_thresh/deform_norm are used. This is necessary, because for small deformations, the LUT has an extreme slope that kills any meaningful spline interpolation.
inplace (bool) – If True (default), replaces nan values in emod in-place. If False, emod is not modified.
- dclab.features.emodulus.get_emodulus(deform: float | np.array, area_um: float | np.array | None = None, volume: float | np.array | None = None, medium: float | str = '0.49% MC-PBS', channel_width: float = 20.0, flow_rate: float = 0.16, px_um: float = 0.34, temperature: float | np.ndarray | None = 23.0, lut_data: str | pathlib.Path | np.ndarray = 'LE-2D-FEM-19', visc_model: Literal['herold-2017', 'herold-2017-fallback', 'buyukurganci-2022', 'kestin-1978', None] = 'herold-2017-fallback', extrapolate: bool = False, copy: bool = True)[source]
Compute apparent Young’s modulus using a look-up table
- Parameters
area_um (float or ndarray) – Apparent (2D image) area [µm²] of the event(s)
deform (float or ndarray) – Deformation (1-circularity) of the event(s)
volume (float or ndarray) –
Apparent volume of the event(s). It is not possible to define volume and area_um at the same time (makes no sense).
New in version 0.25.0.
medium (str or float) – The medium to compute the viscosity for. If a string is given, the viscosity is computed. If a float is given, this value is used as the viscosity in mPa*s (Note that temperature and visc_model must be set to None in this case).
channel_width (float) – The channel width [µm]
flow_rate (float) – Flow rate [µL/s]
px_um (float) – The detector pixel size [µm] used for pixelation correction. Set to zero to disable.
temperature (float, ndarray, or None) – Temperature [°C] of the event(s)
lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) –
The LUT data to use. If it is a key in INTERNAL_LUTS, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, metadata) is possible. The LUT metadata is used to check whether the given features (e.g. area_um and deform) are valid interpolation choices.
New in version 0.25.0.
visc_model (str) – The viscosity model to use, see dclab.features.emodulus.viscosity.get_viscosity()
extrapolate (bool) – Perform extrapolation using extrapolate_emodulus(). This is discouraged!
copy (bool) – Copy input arrays. If set to false, input arrays are overridden.
- Returns
elasticity – Apparent Young’s modulus in kPa
- Return type
float or ndarray
Notes
The look-up table used was computed with finite element methods according to [MMM+17] and complemented with analytical isoelastics from [MOG+15]. The original simulation results are available on figshare [WMM+20].
The computation of the Young’s modulus takes into account a correction for the viscosity (medium, channel width, flow rate, and temperature) [MOG+15] and a correction for pixelation for the deformation which were derived from a (pixelated) image [Her17].
Note that while deformation is pixelation-corrected, area_um and volume are scaled to match the LUT data. This is somewhat fortunate, because we don’t have to worry about the order of applying pixelation correction and scale conversion.
By using external LUTs, it is possible to interpolate on the volume-deformation plane. This feature was added in version 0.25.0.
See also
dclab.features.emodulus.viscosity.get_viscosity
compute viscosity for known media
- dclab.features.emodulus.normalize(data, dmax)[source]
Perform normalization in-place for interpolation
Note that scipy.interpolate.griddata() has a rescale option which rescales the data onto the unit cube. For some reason this does not work well with LUT data, so we just normalize it by dividing by the maximum value.
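The in-place division by the maximum value can be pictured with the following sketch on a plain list. The name normalize_sketch is hypothetical; dclab operates on ndarrays.

```python
def normalize_sketch(data, dmax):
    """Divide every entry of `data` by `dmax`, modifying `data` in place."""
    data[:] = [value / dmax for value in data]
    return data


deform = [1.0, 2.0, 4.0]
result = normalize_sketch(deform, dmax=4.0)
```

Because the slice assignment modifies the original list, result and deform refer to the same object after the call.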
- dclab.features.emodulus.INACCURATE_SPLINE_EXTRAPOLATION = False
Set this to True to globally enable spline extrapolation when the area_um/deform data are outside the LUT. This is discouraged and a KnowWhatYouAreDoingWarning warning will be issued.
- dclab.features.emodulus.load.get_lut_path(path_or_id)[source]
Find the path to a LUT
- path_or_id: str or pathlib.Path
Identifier of a LUT. This can be either an existing path (checked first), or an internal identifier (see INTERNAL_LUTS).
- dclab.features.emodulus.load.load_lut(lut_data: str | pathlib.Path | numpy.ndarray = 'LE-2D-FEM-19')[source]
Load LUT data from disk
- Parameters
lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) – The LUT data to use. If it is a key in INTERNAL_LUTS, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, meta data) is possible.
- Returns
lut (np.ndarray of shape (N, 3)) – The LUT data for interpolation
meta (dict) – The LUT metadata
Notes
If lut_data is a tuple of (lut, meta), then nothing is actually done (this is implemented for user convenience).
- dclab.features.emodulus.load.load_mtext(path)[source]
Load column-based data from text files with metadata
This file format is used for isoelasticity lines and look-up table data in dclab.
The text file is loaded with numpy.loadtxt. The metadata are stored as a json string between the “BEGIN METADATA” and the “END METADATA” tags. The last comment (#) line before the actual data defines the features with units in square brackets and tab-separated. For instance:
# […]
#
# BEGIN METADATA
# {
#   "authors": "A. Mietke, C. Herold, J. Guck",
#   "channel_width": 20.0,
#   "channel_width_unit": "um",
#   "date": "2018-01-30",
#   "dimensionality": "2Daxis",
#   "flow_rate": 0.04,
#   "flow_rate_unit": "uL/s",
#   "fluid_viscosity": 15.0,
#   "fluid_viscosity_unit": "mPa s",
#   "identifier": "LE-2D-ana-18",
#   "method": "analytical",
#   "model": "linear elastic",
#   "publication": "https://doi.org/10.1016/j.bpj.2015.09.006",
#   "software": "custom Matlab code",
#   "summary": "2D-axis-symmetric analytical solution"
# }
# END METADATA
#
# […]
#
# area_um [um^2]  deform       emodulus [kPa]
3.75331e+00       5.14496e-03  9.30000e-01
4.90368e+00       6.72683e-03  9.30000e-01
6.05279e+00       8.30946e-03  9.30000e-01
7.20064e+00       9.89298e-03  9.30000e-01
[…]
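The file layout described above can be parsed with a few lines of standard-library Python. The sketch below is only an illustration of the format: dclab itself loads the data with numpy.loadtxt, and the function name parse_mtext_sketch as well as the shortened sample text are hypothetical.

```python
import json


def parse_mtext_sketch(text):
    """Split an mtext-style string into metadata, feature names, and rows."""
    meta_lines = []
    in_meta = False
    last_comment = ""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            content = stripped[1:].strip()
            if content == "BEGIN METADATA":
                in_meta = True
            elif content == "END METADATA":
                in_meta = False
            elif in_meta:
                meta_lines.append(content)
            elif content:
                # Remember the last non-empty comment before the data
                last_comment = content
        else:
            rows.append([float(value) for value in stripped.split()])
    meta = json.loads("\n".join(meta_lines))
    # Feature names are tab-separated; units in square brackets are dropped
    features = [col.split("[")[0].strip() for col in last_comment.split("\t")]
    return meta, features, rows


sample = "\n".join([
    "# BEGIN METADATA",
    "# {",
    '#   "identifier": "LE-2D-ana-18",',
    '#   "channel_width": 20.0',
    "# }",
    "# END METADATA",
    "#",
    "# area_um [um^2]\tdeform\temodulus [kPa]",
    "3.75331e+00\t5.14496e-03\t9.30000e-01",
    "4.90368e+00\t6.72683e-03\t9.30000e-01",
])
meta, features, rows = parse_mtext_sketch(sample)
```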
- dclab.features.emodulus.load.register_lut(path, identifier=None)[source]
Register an external LUT file in dclab
This will add it to EXTERNAL_LUTS, which is required for emodulus computation as an ancillary feature.
- Parameters
path (str or pathlib.Path) – Path to the external LUT file
identifier (str or None) – The identifier is used for ancillary emodulus computation via the [calculation]: “emodulus lut” key. It is also used as the key in EXTERNAL_LUTS during registration. If not specified (default), then the identifier given as JSON metadata in path is used.
- dclab.features.emodulus.load.EXTERNAL_LUTS = {}
Dictionary of look-up tables that the user added via register_lut().
- dclab.features.emodulus.load.INTERNAL_LUTS = {'HE-2D-FEM-22': 'lut_HE-2D-FEM-22.txt', 'HE-3D-FEM-22': 'lut_HE-3D-FEM-22.txt', 'LE-2D-FEM-19': 'lut_LE-2D-FEM-19.txt'}
Dictionary of look-up tables shipped with dclab.
Pixelation correction definitions
- dclab.features.emodulus.pxcorr.corr_deform_with_area_um(area_um, px_um=0.34)[source]
Deformation correction for area_um-deform data
The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.
The correction formula is described in [Her17].
- Parameters
- Returns
deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta
- Return type
float or ndarray
- dclab.features.emodulus.pxcorr.corr_deform_with_volume(volume, px_um=0.34)[source]
Deformation correction for volume-deform data
The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.
The correction is derived in scripts/pixelation_correction.py.
- Parameters
- Returns
deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta
- Return type
float or ndarray
- dclab.features.emodulus.pxcorr.get_pixelation_delta(feat_corr, feat_absc, data_absc, px_um=0.34)[source]
Convenience function for obtaining pixelation correction
- dclab.features.emodulus.pxcorr.get_pixelation_delta_pair(feat1, feat2, data1, data2, px_um=0.34)[source]
Convenience function that returns pixelation correction pair
Scale conversion applicable to a linear elastic model
- dclab.features.emodulus.scale_linear.convert(area_um, deform, channel_width_in, channel_width_out, emodulus=None, flow_rate_in=None, flow_rate_out=None, viscosity_in=None, viscosity_out=None, inplace=False)[source]
Convert an area-deformation-emodulus triplet
The conversion formula is described in [MOG+15].
- Parameters
area_um (ndarray) – Convex cell area [µm²]
deform (ndarray) – Deformation
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
emodulus (ndarray) – Young’s Modulus [kPa]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float or ndarray) – Target viscosity [mPa*s]; This can be an array
inplace (bool) – If True, override input arrays with corrected data
- Returns
area_um_corr (ndarray) – Corrected cell area [µm²]
deform_corr (ndarray) – Deformation (a copy if inplace is False)
emodulus_corr (ndarray) – Corrected emodulus [kPa]; only returned if emodulus is given.
Notes
If only area_um, deform, channel_width_in and channel_width_out are given, then only the area is corrected and returned together with the original deform. If all other arguments are not set to None, the emodulus is corrected and returned as well.
- dclab.features.emodulus.scale_linear.scale_area_um(area_um, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]
Perform scale conversion for area_um (linear elastic model)
The area scales with the characteristic length “channel radius” L according to (L’/L)².
The conversion formula is described in [MOG+15].
- Parameters
- Returns
area_um_corr – Scaled area [µm²]
- Return type
ndarray
- dclab.features.emodulus.scale_linear.scale_emodulus(emodulus, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]
Perform scale conversion for emodulus (linear elastic model)
The conversion formula is described in [MOG+15].
- Parameters
emodulus (ndarray) – Young’s Modulus [kPa]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float or ndarray) – Target viscosity [mPa*s]; This can be an array
inplace (bool) – If True, override input arrays with corrected data
- Returns
emodulus_corr – Scaled emodulus [kPa]
- Return type
ndarray
- dclab.features.emodulus.scale_linear.scale_feature(feat, data, inplace=False, **scale_kw)[source]
Convenience function for scale conversions (linear elastic model)
This method wraps around all the other scale_* methods and also supports deform/circ.
- dclab.features.emodulus.scale_linear.scale_volume(volume, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]
Perform scale conversion for volume (linear elastic model)
The volume scales with the characteristic length “channel radius” L according to (L’/L)³.
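The two scaling laws stated for scale_area_um and scale_volume, (L’/L)² and (L’/L)³, can be written down directly. The sketch below assumes that the ratio of channel widths equals the ratio of the characteristic lengths; the function names are hypothetical and the emodulus scaling is deliberately left out.

```python
def scale_area_sketch(area_um, channel_width_in, channel_width_out):
    """Scale an area with the squared ratio of the channel widths."""
    return area_um * (channel_width_out / channel_width_in) ** 2


def scale_volume_sketch(volume, channel_width_in, channel_width_out):
    """Scale a volume with the cubed ratio of the channel widths."""
    return volume * (channel_width_out / channel_width_in) ** 3


# Convert values from a 20 µm channel to a 30 µm channel
area_30 = scale_area_sketch(100.0, 20.0, 30.0)
volume_30 = scale_volume_sketch(1000.0, 20.0, 30.0)
```

With a width ratio of 1.5, the area grows by a factor of 2.25 and the volume by a factor of 3.375.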
Viscosity computation for various media
- dclab.features.emodulus.viscosity.check_temperature(model: str, temperature: float | np.array, tmin: float, tmax: float)[source]
Raise a TemperatureOutOfRangeWarning if applicable
- dclab.features.emodulus.viscosity.get_viscosity(medium: str = '0.49% MC-PBS', channel_width: float = 20.0, flow_rate: float = 0.16, temperature: float | numpy.ndarray = 23.0, model: Literal['herold-2017', 'herold-2017-fallback', 'buyukurganci-2022', 'kestin-1978'] = 'herold-2017-fallback')[source]
Returns the viscosity for RT-DC-specific media
Media that are not pure (e.g. ketchup or polymer solutions) often exhibit a non-linear relationship between shear rate (determined by the velocity profile) and shear stress (determined by pressure differences). If the shear stress grows non-linearly with the shear rate resulting in a slope in log-log space that is less than one, then we are talking about shear thinning. The viscosity is not a constant anymore (as it is e.g. for water). At higher flow rates, the viscosity becomes smaller, following a power law. Christoph Herold characterized shear thinning for the CellCarrier media [Her17]. The resulting formulae for computing the viscosities of these media at different channel widths, flow rates, and temperatures, are implemented here.
- Parameters
medium (str) – The medium to compute the viscosity for; Valid values are defined in KNOWN_MEDIA.
channel_width (float) – The channel width in µm
flow_rate (float) – Flow rate in µL/s
temperature (float or ndarray) – Temperature in °C
model (str) – The model name to use for computing the medium viscosity. For water, this value is ignored, as there is only the ‘kestin-1978’ model [KSW78]. For MC-PBS media, there are the ‘herold-2017’ model [Her17] and the ‘buyukurganci-2022’ model [BBN+23].
- Returns
viscosity – Viscosity in mPa*s
- Return type
float or ndarray
Notes
CellCarrier (0.49% MC-PBS) and CellCarrier B (0.59% MC-PBS) are media designed for RT-DC experiments.
A TemperatureOutOfRangeWarning is issued if the input temperature range exceeds the temperature ranges of the models.
- dclab.features.emodulus.viscosity.get_viscosity_mc_pbs_buyukurganci_2022(medium: Literal['0.49% MC-PBS', '0.59% MC-PBS', '0.83% MC-PBS'] = '0.49% MC-PBS', channel_width: float = 20.0, flow_rate: float = 0.16, temperature: float = 23.0)[source]
Compute viscosity of MC-PBS according to [BBN+23]
This viscosity model was derived in [BBN+23] and adapted for RT-DC in [RB23].
- dclab.features.emodulus.viscosity.get_viscosity_mc_pbs_herold_2017(medium: Literal['0.49% MC-PBS', '0.59% MC-PBS'] = '0.49% MC-PBS', channel_width: float = 20.0, flow_rate: float = 0.16, temperature: float = 23.0)[source]
Compute viscosity of MC-PBS according to [Her17]
Note that all the factors in equation 5.2 in [Her17] compute to 8, which is essentially what is implemented in shear_rate_square_channel():
\[1.1856 \cdot 6 \cdot \frac{2}{3} \cdot \frac{1}{0.5928} = 8\]
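The arithmetic in the equation above is easy to verify numerically; the variable name factor is only used for this illustration.

```python
# Product of the prefactors quoted from equation 5.2 in [Her17]
factor = 1.1856 * 6 * (2 / 3) * (1 / 0.5928)

# Allow for floating-point rounding when comparing against 8
is_eight = abs(factor - 8) < 1e-9
```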
- dclab.features.emodulus.viscosity.get_viscosity_water_kestin_1978(temperature: float = 23.0)[source]
Compute the viscosity of water according to [KSW78]
- dclab.features.emodulus.viscosity.shear_rate_square_channel(flow_rate, channel_width, flow_index)[source]
Return the wall shear rate of a power-law liquid in a square channel.
- dclab.features.emodulus.viscosity.ALIAS_MEDIA = {'0.49% MC-PBS': '0.49% MC-PBS', '0.49% mc-pbs': '0.49% MC-PBS', '0.5% MC-PBS': '0.49% MC-PBS', '0.5% mc-pbs': '0.49% MC-PBS', '0.50% MC-PBS': '0.49% MC-PBS', '0.50% mc-pbs': '0.49% MC-PBS', '0.59% MC-PBS': '0.59% MC-PBS', '0.59% mc-pbs': '0.59% MC-PBS', '0.6% MC-PBS': '0.59% MC-PBS', '0.6% mc-pbs': '0.59% MC-PBS', '0.60% MC-PBS': '0.59% MC-PBS', '0.60% mc-pbs': '0.59% MC-PBS', '0.8% MC-PBS': '0.83% MC-PBS', '0.8% mc-pbs': '0.83% MC-PBS', '0.80% MC-PBS': '0.83% MC-PBS', '0.80% mc-pbs': '0.83% MC-PBS', '0.83% MC-PBS': '0.83% MC-PBS', '0.83% mc-pbs': '0.83% MC-PBS', 'CellCarrier': '0.49% MC-PBS', 'CellCarrier B': '0.59% MC-PBS', 'CellCarrierB': '0.59% MC-PBS', 'cellcarrier': '0.49% MC-PBS', 'cellcarrier b': '0.59% MC-PBS', 'cellcarrierb': '0.59% MC-PBS', 'water': 'water'}
Many media names are actually shorthand for one medium
- dclab.features.emodulus.viscosity.KNOWN_MEDIA = ['0.49% MC-PBS', '0.49% mc-pbs', '0.5% MC-PBS', '0.5% mc-pbs', '0.50% MC-PBS', '0.50% mc-pbs', '0.59% MC-PBS', '0.59% mc-pbs', '0.6% MC-PBS', '0.6% mc-pbs', '0.60% MC-PBS', '0.60% mc-pbs', '0.8% MC-PBS', '0.8% mc-pbs', '0.80% MC-PBS', '0.80% mc-pbs', '0.83% MC-PBS', '0.83% mc-pbs', 'CellCarrier', 'CellCarrier B', 'CellCarrierB', 'cellcarrier', 'cellcarrier b', 'cellcarrierb', 'water']
Media for which computation of viscosity is defined (has duplicate entries)
- dclab.features.emodulus.viscosity.SAME_MEDIA = {'0.49% MC-PBS': ['0.49% MC-PBS', '0.5% MC-PBS', '0.50% MC-PBS', 'CellCarrier'], '0.59% MC-PBS': ['0.59% MC-PBS', '0.6% MC-PBS', '0.60% MC-PBS', 'CellCarrier B', 'CellCarrierB'], '0.83% MC-PBS': ['0.83% MC-PBS', '0.8% MC-PBS', '0.80% MC-PBS'], 'water': ['water']}
Dictionary with different names for one medium
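The three constants above are consistent with one another: every name in SAME_MEDIA and its lower-case variant appears as a key in ALIAS_MEDIA, and KNOWN_MEDIA lists those keys in sorted order. The sketch below rebuilds the two derived containers from SAME_MEDIA; whether dclab constructs them in exactly this way is an assumption.

```python
SAME_MEDIA = {
    "0.49% MC-PBS": ["0.49% MC-PBS", "0.5% MC-PBS", "0.50% MC-PBS",
                     "CellCarrier"],
    "0.59% MC-PBS": ["0.59% MC-PBS", "0.6% MC-PBS", "0.60% MC-PBS",
                     "CellCarrier B", "CellCarrierB"],
    "0.83% MC-PBS": ["0.83% MC-PBS", "0.8% MC-PBS", "0.80% MC-PBS"],
    "water": ["water"],
}

# Map every spelling (and its lower-case variant) to the canonical name
alias_media = {}
for canonical, names in SAME_MEDIA.items():
    for name in names:
        alias_media[name] = canonical
        alias_media[name.lower()] = canonical

# All accepted spellings in sorted order
known_media = sorted(alias_media)
```

A lookup such as alias_media["cellcarrier b"] then resolves a user-supplied spelling to the canonical medium name "0.59% MC-PBS".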
fluorescence
- dclab.features.fl_crosstalk.correct_crosstalk(fl1, fl2, fl3, fl_channel, ct21=0, ct31=0, ct12=0, ct32=0, ct13=0, ct23=0)[source]
Perform crosstalk correction
- Parameters
fli (int, float, or np.ndarray) – Measured fluorescence signals
fl_channel (int (1, 2, or 3)) – The channel number for which the crosstalk-corrected signal should be computed
cij (float) – Spill (crosstalk or bleed-through) from channel i to channel j. This spill is computed from the fluorescence signal of e.g. single-stained positive control cells. It is defined by the ratio of the fluorescence signals of the two channels, i.e. cij = flj / fli.
See also
get_compensation_matrix
compute the inverse crosstalk matrix
Notes
If there are only two channels (e.g. fl1 and fl2), then the crosstalk to and from the other channel (ct31, ct32, ct13, ct23) should be set to zero.
- dclab.features.fl_crosstalk.get_compensation_matrix(ct21, ct31, ct12, ct32, ct13, ct23)[source]
Compute crosstalk inversion matrix
The spillover matrix is
| c11 c12 c13 |
| c21 c22 c23 |
| c31 c32 c33 |
The diagonal elements are set to 1, i.e. c11 = c22 = c33 = 1
- Parameters
cij (float) – Spill from channel i to channel j
- Returns
inv – Compensation matrix (inverted spillover matrix)
- Return type
np.ndarray
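For intuition, the inversion can be worked out by hand for two channels. The sketch below assumes a linear spill model in which the measured signals are fl1 = t1 + ct21·t2 and fl2 = t2 + ct12·t1, where t1 and t2 are the unknown true signals and ctij is the spill from channel i to channel j. The function name compensate_two_channels is hypothetical and the sketch does not reproduce the three-channel matrix layout used by dclab.

```python
def compensate_two_channels(fl1, fl2, ct12=0.0, ct21=0.0):
    """Invert the 2x2 spill model and return the corrected signals."""
    det = 1.0 - ct12 * ct21  # determinant of the 2x2 spillover matrix
    fl1_corr = (fl1 - ct21 * fl2) / det
    fl2_corr = (fl2 - ct12 * fl1) / det
    return fl1_corr, fl2_corr


# Simulate measured signals from known true signals, then undo the spill
true1, true2 = 100.0, 50.0
ct12, ct21 = 0.1, 0.2
measured1 = true1 + ct21 * true2
measured2 = true2 + ct12 * true1
corr1, corr2 = compensate_two_channels(measured1, measured2, ct12, ct21)
```

With all crosstalk values set to zero, the function returns the measured signals unchanged.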
isoelastics
Isoelastics management
- class dclab.isoelastics.Isoelastics(paths=None)[source]
Isoelasticity line management
- Parameters
paths (list of pathlib.Path or list of str) – list of paths to files containing isoelasticity lines (see e.g. ISOFILES)
Changed in version 0.24.0: The isoelasticity lines of the analytical model [MOG+15] and the linear-elastic numerical model [MMM+17] were recomputed with an equidistant spacing. The metadata section of the text file format was restructured.
- add(isoel, col1, col2, channel_width, flow_rate, viscosity, method=None, lut_identifier=None)[source]
Add isoelastics
- Parameters
isoel (list of ndarrays) – Each list item represents one isoelastic line stored as an array of shape (N,3). The last column contains the emodulus data.
col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
channel_width (float) – Channel width in µm
flow_rate (float) – Flow rate through the channel in µL/s
viscosity (float) – Viscosity of the medium in mPa*s
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.
Notes
The following isoelastics are automatically added for user convenience:
isoelastics with col1 and col2 interchanged
isoelastics for circularity if deformation was given
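A short sketch of the two derived variants mentioned in the notes above, using NumPy. The array values are hypothetical, and the code only illustrates the column handling (it relies on the relation circ = 1 - deform); it is not taken from dclab.

```python
import numpy as np

# One hypothetical isoelasticity line with columns (area_um, deform, emodulus)
line = np.array([[50.0, 0.01, 1.1],
                 [100.0, 0.02, 1.1],
                 [150.0, 0.04, 1.1]])

# Variant with col1 and col2 interchanged: (deform, area_um, emodulus)
swapped = line[:, [1, 0, 2]]

# Variant for circularity, using circ = 1 - deform
circ_line = line.copy()
circ_line[:, 1] = 1.0 - line[:, 1]
```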
- static add_px_err(isoel, col1, col2, px_um, inplace=False)[source]
Undo pixelation correction
Since isoelasticity lines are usually computed directly from the simulation data (e.g. the contour data are not discretized on a grid but are extracted from FEM simulations), they are not affected by pixelation effects as described in [Her17].
If the isoelasticity lines are displayed alongside experimental data (which are affected by pixelation effects), then the lines must be “un”-corrected, i.e. the pixelation error must be added to the lines to match the experimental data.
- Parameters
isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
col1 (str) – Define the first two columns of each isoelasticity line.
col2 (str) – Define the first two columns of each isoelasticity line.
px_um (float) – Pixel size [µm]
inplace (bool) – If True, do not create a copy of the data in isoel
- static convert(isoel, col1, col2, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]
Perform isoelastics scale conversion
- Parameters
isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
col1 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]
col2 (str) – Define the first two columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]
channel_width_in (float) – Original channel width [µm]
channel_width_out (float) – Target channel width [µm]
flow_rate_in (float) – Original flow rate [µL/s]
flow_rate_out (float) – Target flow rate [µL/s]
viscosity_in (float) – Original viscosity [mPa*s]
viscosity_out (float) – Target viscosity [mPa*s]
inplace (bool) – If True, do not create a copy of the data in isoel
- Returns
isoel_scale – The scale-converted isoelasticity lines.
- Return type
list of 2d ndarrays of shape (N, 3)
Notes
If only the positions of the isoelastics are of interest and not the value of the elastic modulus, then it is sufficient to supply values for the channel width and set the values for flow rate and viscosity to a constant (e.g. 1).
See also
dclab.features.emodulus.scale_linear.scale_feature
scale conversion method used
- get(col1, col2, channel_width, method=None, lut_identifier=None, flow_rate=None, viscosity=None, add_px_err=False, px_um=None)[source]
Get isoelastics
- Parameters
col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
channel_width (float) – Channel width in µm
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.
flow_rate (float or None) – Flow rate through the channel in µL/s. If set to None, the flow rate of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
viscosity (float or None) – Viscosity of the medium in mPa*s. If set to None, the viscosity of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py
px_um (float) – Pixel size [µm], used for pixelation error computation
See also
dclab.features.emodulus.scale_linear.scale_feature
scale conversion method used
dclab.features.emodulus.pxcorr.get_pixelation_delta
pixelation correction (applied to the feature data)
- get_with_rtdcbase(col1, col2, dataset, method=None, lut_identifier=None, viscosity=None, add_px_err=False)[source]
Convenience method that extracts the metadata from RTDCBase
- Parameters
col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
method (str) – The method used to compute the isoelastics. DEPRECATED since 0.32.0. Please use lut_identifier instead.
lut_identifier (str) – Look-up table identifier used to identify which isoelasticity lines to show. The function get_available_identifiers() returns a list of available identifiers.
dataset (dclab.rtdc_dataset.RTDCBase) – The dataset from which to obtain the metadata.
viscosity (float, None, or False) – Viscosity of the medium in mPa*s. If set to None, the viscosity is computed from the meta data (medium, flow rate, channel width, temperature) in the [setup] config section. If this is not possible, the flow rate of the imported data is used and a warning will be issued.
add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py
- load_data(path)[source]
Load isoelastics from a text file
- Parameters
path (str or pathlib.Path) – Path to an isoelasticity lines text file
- dclab.isoelastics.check_lut_identifier(lut_identifier, method)[source]
Transitional function that can be removed once method is removed
- dclab.isoelastics.ISOFILES = [PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_HE-2D-FEM-22-area_um-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_HE-2D-FEM-22-volume-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_HE-3D-FEM-22-area_um-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_HE-3D-FEM-22-volume-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_LE-2D-FEM-19-area_um-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_LE-2D-FEM-19-volume-deform.txt'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/dclab/envs/stable/lib/python3.11/site-packages/dclab/isoelastics/iso_LE-2D-ana-18-area_um-deform.txt')]
List of isoelasticity lines in dclab
kde_contours
- dclab.kde_contours.find_contours_level(density, x, y, level, closed=False)[source]
Find iso-valued density contours for a given level value
- Parameters
density (2d ndarray of shape (M, N)) – Kernel density estimate (KDE) for which to compute the contours
x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
level (float between 0 and 1) – Value along which to find contours in density relative to its maximum
closed (bool) – Whether to close contours at the KDE support boundaries
- Returns
contours – Contours found for the given level value
- Return type
list of ndarrays of shape (P, 2)
See also
skimage.measure.find_contours
Contour finding algorithm used
- dclab.kde_contours.get_quantile_levels(density, x, y, xp, yp, q, normalize=True)[source]
Compute density levels for given quantiles by interpolation
For a given 2D density, compute the density levels at which the resulting contours contain the fraction 1-q of all data points. E.g. for a measurement of 1000 events, all contours at the level corresponding to a quantile of q=0.95 (95th percentile) contain 50 events (5%).
- Parameters
density (2d ndarray of shape (M, N)) – Kernel density estimate for which to compute the contours
x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
xp (1d ndarray of size D) – Event x-data from which to compute the quantile
yp (1d ndarray of size D) – Event y-data from which to compute the quantile
q (array_like or float between 0 and 1) – Quantile along which to find contours in density relative to its maximum
normalize (bool) – Whether output levels should be normalized to the maximum of density
- Returns
level – Contours level(s) corresponding to the given quantile
- Return type
np.ndarray or float
Notes
NaN-valued events in xp and yp are ignored.
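The relation between q and the number of enclosed events can be illustrated with NumPy. This sketch skips the interpolation step and assumes that the density has already been evaluated at the event positions; the density values are synthetic and chosen so that the counting is easy to follow.

```python
import numpy as np

# Hypothetical density values, already evaluated at the positions of
# 1000 events (every event has a distinct density in this toy example)
density_at_events = np.arange(1000, dtype=float)

q = 0.95
# Density level below which a fraction q of the events lie (NaNs ignored)
level = np.nanpercentile(density_at_events, 100 * q)

# Events at or above this level are the ones enclosed by the contour
num_inside = int(np.sum(density_at_events >= level))
```

In this toy example, num_inside is 50, i.e. 5% of the 1000 events, which matches the example given in the description above.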
kde_methods
Kernel Density Estimation methods
- dclab.kde_methods.bin_num_doane(a)[source]
Compute number of bins based on Doane’s formula
Notes
If the bin width cannot be determined, then a bin number of 5 is returned.
See also
bin_width_doane
method used to compute the bin width
- dclab.kde_methods.bin_width_doane(a)[source]
Compute contour spacing based on Doane’s formula
References
https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width
https://stats.stackexchange.com/questions/55134/doanes-formula-for-histogram-binning
Notes
Doane’s formula is actually designed for histograms. This function is kept here for backwards-compatibility reasons. It is highly recommended to use
bin_width_percentile()
instead.
- dclab.kde_methods.bin_width_percentile(a)[source]
Compute contour spacing based on data percentiles
The 10th and the 90th percentile of the input data are taken. The spacing then computes to the difference between those two percentiles divided by 23.
Notes
The Freedman–Diaconis rule uses the interquartile range and normalizes to the third root of len(a). Such things do not work very well for RT-DC data, because len(a) is huge. Here we use just the top and bottom 10th percentiles with a fixed normalization.
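The rule described above is simple enough to sketch directly with NumPy. The function name is hypothetical, and the sketch leaves out any input validation that the library function may perform.

```python
import numpy as np


def bin_width_percentile_sketch(a):
    """Spacing from the 10th and 90th percentiles, divided by 23."""
    p10, p90 = np.percentile(np.asarray(a, dtype=float), [10, 90])
    return (p90 - p10) / 23


# For the integers 0..100, the percentiles are 10 and 90
spacing = bin_width_percentile_sketch(np.arange(101))
```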
- dclab.kde_methods.ignore_nan_inf(kde_method)[source]
Ignores nans and infs from the input data
Invalid positions in the resulting density are set to nan.
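A simplified sketch of such a decorator, assuming a KDE function that takes only the two event arrays and returns one density value per event. The real decorator also has to deal with xout and yout; the names below are hypothetical.

```python
import functools

import numpy as np


def ignore_nan_inf_sketch(kde_method):
    """Call kde_method on finite events only; other positions become nan."""
    @functools.wraps(kde_method)
    def wrapper(events_x, events_y):
        x = np.asarray(events_x, dtype=float).ravel()
        y = np.asarray(events_y, dtype=float).ravel()
        # An event is valid only if both coordinates are finite
        valid = np.isfinite(x) & np.isfinite(y)
        density = np.full(x.shape, np.nan)
        if valid.any():
            density[valid] = kde_method(x[valid], y[valid])
        return density
    return wrapper


@ignore_nan_inf_sketch
def kde_ones(events_x, events_y):
    """Toy KDE that returns a density of one for every event."""
    return np.ones_like(events_x, dtype=float)


density = kde_ones([1.0, np.nan, 3.0, 4.0], [0.1, 0.2, np.inf, 0.4])
```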
- dclab.kde_methods.kde_gauss(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]
Gaussian Kernel Density Estimation
- Parameters
events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
xout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
yout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
- Returns
density – The KDE for the points in (xout, yout)
- Return type
ndarray, same shape as xout
Notes
This is a wrapped version that ignores nan and inf values.
- dclab.kde_methods.kde_histogram(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]
Histogram-based Kernel Density Estimation
- Parameters
events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
xout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
yout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
bins (tuple (binsx, binsy)) – The number of bins to use for the histogram.
- Returns
density – The KDE for the points in (xout, yout)
- Return type
ndarray, same shape as xout
Notes
This is a wrapped version that ignores nan and inf values.
- dclab.kde_methods.kde_multivariate(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]
Multivariate Kernel Density Estimation
- Parameters
events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
bw (tuple (bwx, bwy) or None) – The bandwidth for kernel density estimation.
xout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
yout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
- Returns
density – The KDE for the points in (xout, yout)
- Return type
ndarray, same shape as xout
Notes
This is a wrapped version that ignores nan and inf values.
- dclab.kde_methods.kde_none(events_x, events_y, xout=None, yout=None)[source]
No Kernel Density Estimation
- Parameters
events_x (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
events_y (1D ndarray) – The input points for kernel density estimation. Input is flattened automatically.
xout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
yout (ndarray) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
- Returns
density – The KDE for the points in (xout, yout)
- Return type
ndarray, same shape as xout
Notes
This method is a convenience method that always returns ones in the shape that the other methods in this module produce.
polygon_filter
- class dclab.polygon_filter.PolygonFilter(axes=None, points=None, inverted=False, name=None, filename=None, fileid=0, unique_id=None)[source]
An object for filtering RTDC data based on a polygonal area
- Parameters
axes (tuple of str or list of str) – The axes/features on which the polygon is defined. The first axis is the x-axis. Example: (“area_um”, “deform”).
points (array-like object of shape (N,2)) – The N coordinates (x,y) of the polygon. The exact order is important.
inverted (bool) – Invert the polygon filter. This parameter is overridden if filename is given.
name (str) – A name for the polygon (optional).
filename (str) – A path to a .poly file as created by this class's save method. If filename is given, all other parameters are ignored.
fileid (int) – Which filter to import from the file (starting at 0).
unique_id (int) – An integer defining the unique id of the new instance.
Notes
The minimal arguments to this class are either filename OR (axes, points). If filename is set, all parameters are taken from the given .poly file.
- copy(invert=False)[source]
Return a copy of the current instance
- Parameters
invert (bool) – The copy will be inverted w.r.t. the original
- static get_instance_from_id(unique_id)[source]
Get an instance of the PolygonFilter using a unique id
- static import_all(path)[source]
Import all polygons from a .poly file.
Returns a list of the imported polygon filters
- static point_in_poly(p, poly)[source]
Determine whether a point is within a polygon area
Uses the ray casting algorithm.
- Parameters
p (tuple of floats) – Coordinates of the point
poly (array_like of shape (N, 2)) – Polygon (PolygonFilter.points)
- Returns
inside – True, if point is inside.
- Return type
bool
Notes
If p lies on a side of the polygon, it is defined as
“inside” if it is on the lower or left
“outside” if it is on the top or right
Changed in version 0.24.1: The new version uses the cython implementation from scikit-image. In the old version, the inside/outside definition was the other way around. In favor of not having to modify upstream code, the scikit-image version was adapted.
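For readers unfamiliar with ray casting, here is a plain-Python sketch of the algorithm. It is not the scikit-image implementation mentioned above, and it makes no attempt to reproduce the documented convention for points that lie exactly on an edge; the function name is hypothetical.

```python
def point_in_poly_sketch(p, poly):
    """Ray casting: count crossings of a horizontal ray starting at p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of p toggles the state
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
center_inside = point_in_poly_sketch((1, 1), square)
far_inside = point_in_poly_sketch((3, 1), square)
```

An odd number of crossings means the point is inside the polygon, an even number means it is outside.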
- save(polyfile, ret_fobj=False)[source]
Save all data to a text file (appends data if file exists).
Polyfile can be either a path to a file or a file object that was opened with the write “w” parameter. By using the file object, multiple instances of this class can write their data.
If ret_fobj is True, then the file object will not be closed and will be returned.
- property hash
Hash of axes, points, and inverted
- instances = [<dclab.polygon_filter.PolygonFilter object>]
- property points
statistics
Statistics computation for RT-DC dataset instances
- class dclab.statistics.Statistics(name, method, req_feature=False)[source]
A helper class for computing statistics
All statistical methods are registered in the dictionary Statistics.available_methods.
- get_feature(ds, feat)[source]
Return filtered feature data
The features are filtered according to the user-defined filters, using the information in ds.filter.all. In addition, all nan and inf values are purged.
- Parameters
ds (dclab.rtdc_dataset.RTDCBase) – The dataset containing the feature
feat (str) – The name of the feature; must be a scalar feature
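The two filtering steps described above can be sketched with NumPy as follows. The arrays are hypothetical stand-ins for a scalar feature and for the boolean array ds.filter.all; no dclab objects are used.

```python
import numpy as np

# Hypothetical scalar feature data and boolean filter array
feature = np.array([10.0, np.nan, 12.0, np.inf, 14.0, 16.0])
filter_all = np.array([True, True, False, True, True, True])

# Step 1: keep only the events that pass the user-defined filters
selected = feature[filter_all]

# Step 2: purge nan and inf values
clean = selected[np.isfinite(selected)]
```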
- available_methods = {'%-gated': <dclab.statistics.Statistics object>, 'Events': <dclab.statistics.Statistics object>, 'Flow rate': <dclab.statistics.Statistics object>, 'Mean': <dclab.statistics.Statistics object>, 'Median': <dclab.statistics.Statistics object>, 'Mode': <dclab.statistics.Statistics object>, 'SD': <dclab.statistics.Statistics object>}
- dclab.statistics.get_statistics(ds, methods=None, features=None)[source]
Compute statistics for an RT-DC dataset
- Parameters
ds (dclab.rtdc_dataset.RTDCBase) – The dataset for which to compute the statistics.
methods (list of str or None) – The methods with which to compute the statistics. The list of available methods is given with dclab.statistics.Statistics.available_methods.keys(). If set to None, statistics for all methods are computed.
features (list of str) – Feature name identifiers are defined by dclab.definitions.feature_exists. If set to None, statistics for all scalar features available are computed.
- Returns
header (list of str) – The header (feature + method names) of the computed statistics.
values (list of float) – The computed statistics.
- dclab.statistics.mode(data)[source]
Compute an intelligent value for the mode
The most common value in experimental data is not very useful if there are a lot of digits after the decimal point. This method approaches this issue by rounding to a bin size that is determined by the Freedman–Diaconis rule.
- Parameters
data (1d ndarray) – The data for which the mode should be computed.
- Returns
mode – The mode computed with the Freedman-Diaconis rule.
- Return type
float
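A rough sketch of the idea, assuming the common form of the Freedman–Diaconis rule (bin width = 2 · IQR / n^(1/3)). The way values are assigned to bins here is a simplification, so the result may differ from what dclab.statistics.mode returns; the function name and sample data are hypothetical.

```python
import numpy as np


def mode_sketch(data):
    """Most common value after rounding to a Freedman-Diaconis bin size."""
    data = np.asarray(data, dtype=float)
    data = data[np.isfinite(data)]
    q25, q75 = np.percentile(data, [25, 75])
    bin_width = 2 * (q75 - q25) / data.size ** (1 / 3)
    if bin_width == 0:
        # Degenerate case (assumption of this sketch): fall back to median
        return float(np.median(data))
    # Round every value to the nearest multiple of the bin width
    rounded = np.round(data / bin_width)
    values, counts = np.unique(rounded, return_counts=True)
    # Return the most frequent rounded value in the original units
    return float(values[np.argmax(counts)] * bin_width)


sample = [1.0, 1.9, 2.0, 2.05, 2.1, 2.2, 3.0, 4.0]
result = mode_sketch(sample)
```

No two values in the sample are identical, yet the sketch returns a value close to 2, where the data are most densely clustered.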
HDF5 manipulation
Helper methods for copying .rtdc data
- dclab.rtdc_dataset.copier.h5ds_copy(src_loc, src_name, dst_loc, dst_name=None, ensure_compression=True, recursive=True)[source]
Copy an HDF5 Dataset from one group to another
- Parameters
src_loc (h5py.H5Group) – The source location
src_name (str) – Name of the dataset in src_loc
dst_loc (h5py.H5Group) – The destination location
dst_name (str) – The name of the destination dataset, defaults to src_name
ensure_compression (bool) – Whether to make sure that the data are compressed. If disabled, then all data from the source will be just copied and not compressed.
recursive (bool) – Whether to recurse into HDF5 Groups (this is required e.g. for copying the “trace” feature)
- Returns
dst – The dataset dst_loc[dst_name]
- Return type
h5py.Dataset
- Raises
ValueError – If the named source is not a h5py.Dataset
- dclab.rtdc_dataset.copier.is_properly_compressed(h5obj)[source]
Check whether an HDF5 object is properly compressed
The compression check only returns True if the input file was compressed with the Zstandard compression using compression level 5 or higher.
- dclab.rtdc_dataset.copier.rtdc_copy(src_h5file: h5py._hl.group.Group, dst_h5file: h5py._hl.group.Group, features: Literal['all', 'scalar', 'none'] = 'all', include_logs: bool = True, include_tables: bool = True, meta_prefix: str = '')[source]
Create a compressed copy of an RT-DC file
Tools for linking HDF5 datasets across files
- exception dclab.rtdc_dataset.linker.ExternalDataForbiddenError[source]
Raised when a dataset contains external data
External data are a security risk, because they could be used to access data that are not supposed to be accessed. This is especially critical when the data are accessed within a web server process (e.g. in DCOR).
- dclab.rtdc_dataset.linker.assert_no_external(h5)[source]
Raise ExternalDataForbiddenError if h5 refers to external data
- dclab.rtdc_dataset.linker.check_external(h5)[source]
Check recursively, whether an h5py object contains external data
External data includes binary data in external files, virtual datasets, and external links.
Returns a tuple of either
(True, path_ext) if the object contains external data
(False, None) if this is not the case
where path_ext is the path to the group or dataset in h5.
New in version 0.51.0.
- dclab.rtdc_dataset.linker.combine_h5files(paths: list, external: Literal['follow', 'raise'] = 'follow') BinaryIO [source]
Create an in-memory file that combines multiple .rtdc files
The .rtdc files must have the same number of events. The in-memory file is populated with the “events” data from paths according to the order that paths are given in. Metadata, including logs, basins, and tables are only taken from the first path.
New in version 0.51.0.
- Parameters
paths (list of str or pathlib.Path) – Paths of the input .rtdc files. The first input file is always used as a source for the metadata. The other files only complement the features.
external (str) – Defines how external (links, binary, virtual) data in paths should be handled. The default is to “follow” external datasets or links to external data. In a zero-trust context, you can set this to “raise” which will cause an
ExternalDataForbiddenError
exception when external data are encountered.
- Returns
fd – seekable, file-like object representing an HDF5 file opened in binary mode; this can be passed to h5py.File
- Return type
BinaryIO
Writing RT-DC files
- class dclab.rtdc_dataset.writer.RTDCWriter(path_or_h5file: str | pathlib.Path | h5py._hl.files.File, mode: Literal['append', 'replace', 'reset'] = 'append', compression_kwargs: Union[Dict, collections.abc.Mapping] = None, compression: str = 'deprecated')[source]
RT-DC data writer class
- Parameters
path_or_h5file (str or pathlib.Path or h5py.Group) – Path to an HDF5 file or an HDF5 file opened in write mode
mode (str) –
Defines how the data are stored:
”append”: append new feature data to existing h5py Datasets
”replace”: replace existing h5py Datasets with new features (used for ancillary feature storage)
”reset”: do not keep any previous data
compression_kwargs (dict-like) – Dictionary with the keys “compression” and “compression_opts” which are passed to h5py.H5File.create_dataset(). The default is Zstandard compression with the lowest compression level hdf5plugin.Zstd(clevel=1). To disable compression, use {“compression”: None}.
compression (str or None) – Compression method used for data storage; one of [None, “lzf”, “gzip”, “szip”].
Deprecated since version 0.43.0: Use compression_kwargs instead.
- rectify_metadata()[source]
Autocomplete the metadata of the RT-DC measurement
The following configuration keys are updated:
experiment:event count
fluorescence:samples per event
imaging:roi size x (if image or mask is given)
imaging:roi size y (if image or mask is given)
The following configuration keys are added if not present:
fluorescence:channel count
- store_feature(feat, data, shape=None)[source]
Write feature data
- Parameters
feat (str) – feature name
shape (tuple of int) – For non-scalar features, this is the shape of the feature for one event (e.g. (90, 250) for an “image”). Usually, you do not have to specify this value, but you do need it in case of plugin features that don’t have the “feature shape” set or in case of temporary features. If you don’t specify it, then the shape is guessed based on the data you provide and a UserWarning will be issued.
- store_metadata(meta)[source]
Store RT-DC metadata
- Parameters
meta (dict-like) –
The metadata to store. Each key depicts a metadata section name whose data is given as a dictionary, e.g.:
meta = {"imaging": {"exposure time": 20,
                    "flash duration": 2,
                    ...
                    },
        "setup": {"channel width": 20,
                  "chip region": "channel",
                  ...
                  },
        ...
        }
Only section key names and key values therein registered in dclab are allowed and are converted to the pre-defined dtype. Only sections from the
dclab.definitions.CFG_METADATA
dictionary are stored. If you have custom metadata, you can use the “user” section.
- store_table(name, cmp_array)[source]
Store a compound array table
Tables are semi-metadata. They may contain information collected during a measurement (but with a lower temporal resolution) or other tabular data relevant for a dataset. Tables have named columns. Therefore, they can be represented as a numpy recarray, and they should be stored as such in an HDF5 file (compound dataset).
- Parameters
name (str) – Name of the table
cmp_array (np.recarray, h5py.Dataset, or dict) – If a np.recarray or h5py.Dataset are provided, then they are written as-is to the file. If a dictionary is provided, then the dictionary is converted into a numpy recarray.
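One way to build a record array from a dictionary of columns with NumPy is sketched below. The column names and values are hypothetical, and this is not necessarily the conversion that store_table performs internally.

```python
import numpy as np

# Hypothetical table with two named columns of equal length
table = {
    "time": np.array([0.0, 1.0, 2.0]),
    "temperature": np.array([22.1, 22.3, 22.2]),
}

# Convert the dictionary into a record array with named columns
cmp_array = np.rec.fromarrays(list(table.values()),
                              names=list(table.keys()))

# Columns can be accessed by name, rows by index
print(cmp_array.dtype.names)
print(cmp_array["temperature"][1])
```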
- version_brand(old_version=None, write_attribute=True)[source]
Perform version branding
Append a ” | dclab X.Y.Z” to the “setup:software version” attribute.
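The string handling behind version branding can be sketched in a few lines. The function name, the version strings, and the check against double branding are assumptions made for this illustration; they are not taken from the dclab source.

```python
def version_brand_sketch(old_version, dclab_version):
    """Append ' | dclab X.Y.Z' to a software version string."""
    brand = f"dclab {dclab_version}"
    if not old_version:
        # Nothing to append to: the brand is the whole version string
        return brand
    if old_version.endswith(brand):
        # Assumption of this sketch: do not append the same brand twice
        return old_version
    return f"{old_version} | {brand}"


# Hypothetical version strings
branded = version_brand_sketch("ShapeIn 2.0.5", "0.50.0")
print(branded)
```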
- write_image_float32(group, name, data)[source]
Write 32bit floating point image array
This function wraps RTDCWriter.write_ndarray() and adds image attributes to the HDF5 file so HDFView can display the images properly.
- Parameters
group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
data (np.ndarray or list of np.ndarray) – image data
- write_image_grayscale(group, name, data, is_boolean)[source]
Write grayscale image data to an HDF5 dataset
This function wraps RTDCWriter.write_ndarray() and adds image attributes to the HDF5 file so HDFView can display the images properly.
- Parameters
group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
data (np.ndarray or list of np.ndarray) – image data
is_boolean (bool) – whether the input data is of boolean nature (e.g. mask data) - if so, data are converted to uint8
- write_ndarray(group, name, data, dtype=None)[source]
Write n-dimensional array data to an HDF5 dataset
It is assumed that the shape of the array data is correct, i.e. that the shape of data is (number_events, feat_shape_1, …, feat_shape_n).
- Parameters
group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
data (np.ndarray) – data
dtype (dtype) – the dtype to use for storing the data (defaults to data.dtype)
- write_ragged(group, name, data)[source]
Write ragged data (i.e. list of arrays of different lengths)
Ragged array data (e.g. contour data) are stored in a separate group and each entry becomes an HDF5 dataset.
- Parameters
group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
data (list of np.ndarray or np.ndarray) – the data in a list
- write_text(group, name, lines)[source]
Write text to an HDF5 dataset
Text data are written as a fixed-length string dataset.
- Parameters
group (h5py.Group) – parent group
name (str) – name of the dataset containing the text
lines (list of str or str) – the text, line by line
- dclab.rtdc_dataset.writer.CHUNK_SIZE = 100
Chunk size for storing HDF5 data
Command-line interface methods
command line interface
- dclab.cli.compress(path_out=None, path_in=None, force=False, check_suffix=True)[source]
Create a new dataset with all features compressed losslessly
- dclab.cli.condense(path_out=None, path_in=None, ancillaries=True, check_suffix=True)[source]
Create a new dataset with all (ancillary) scalar-only features
- dclab.cli.get_job_info()[source]
Return dictionary with current job information
- Returns
info – Job information including details about time, system, python version, and libraries used.
- Return type
dict of dicts
- dclab.cli.join(path_out=None, paths_in=None, metadata=None)[source]
Join multiple RT-DC measurements into a single .rtdc file
- dclab.cli.repack(path_in=None, path_out=None, strip_logs=False, check_suffix=True)[source]
Repack/recreate an .rtdc file, optionally stripping the logs
- dclab.cli.split(path_in=None, path_out=None, split_events=10000, skip_initial_empty_image=True, skip_final_empty_image=True, ret_out_paths=False, verbose=False)[source]
Split a measurement file
- Parameters
path_in (str or pathlib.Path) – Path of input measurement file
path_out (str or pathlib.Path) – Path to output directory (optional)
split_events (int) – Maximum number of events in each output file
skip_initial_empty_image (bool) – Remove the first event of the dataset if the image is zero.
skip_final_empty_image (bool) – Remove the final event of the dataset if the image is zero.
ret_out_paths (bool) – If True, return the list of output file paths.
verbose (bool) – If True, print messages to stdout
- Returns
[out_paths] – List of generated files (only if ret_out_paths is specified)
- Return type
list of pathlib.Path
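The effect of split_events can be illustrated with the index arithmetic alone. This sketch only computes event index ranges for the output files; it does not read or write any data, it ignores the handling of empty images, and the function name is hypothetical.

```python
def split_ranges(num_events, split_events=10000):
    """Return (start, stop) index pairs with at most split_events each."""
    return [(start, min(start + split_events, num_events))
            for start in range(0, num_events, split_events)]


# A measurement with 25000 events yields three output ranges
ranges = split_ranges(25000)
print(ranges)
```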
- dclab.cli.tdms2rtdc(path_tdms=None, path_rtdc=None, compute_features=False, skip_initial_empty_image=True, skip_final_empty_image=True, verbose=False)[source]
Convert .tdms datasets to the hdf5-based .rtdc file format
- Parameters
path_tdms (str or pathlib.Path) – Path to input .tdms file
path_rtdc (str or pathlib.Path) – Path to output .rtdc file
compute_features (bool) – If True, compute all ancillary features and store them in the output file
skip_initial_empty_image (bool) – In old versions of Shape-In, the first image was sometimes not stored in the resulting .avi file. In dclab, such images are represented as zero-valued images. If True (default), this first image is not included in the resulting .rtdc file.
skip_final_empty_image (bool) – In other versions of Shape-In, the final image is sometimes also not stored in the .avi file. If True (default), this final image is not included in the resulting .rtdc file.
verbose (bool) – If True, print messages to stdout
R and lme4
- dclab.lme4.rlibs.RPY2_MIN_VERSION = '2.9.4'
Minimum rpy2 version
- dclab.lme4.rlibs.R_MIN_VERSION = '3.6.0'
Minimum R version. This is actually a dependency for rpy2, because the API changed then (ffi.error: symbol ‘R_tryCatchError’ not found in library).
- class dclab.lme4.rsetup.AutoRConsole[source]
Helper class for catching R console output
By default, this console always returns “yes” when asked a question. If you need something different, you can subclass and override the consoleread function. The console stream is recorded in self.stream.
- lock = False
- perform_lock = True
- dclab.lme4.rsetup.install_lme4()[source]
Install the lme4 package (if not already installed)
The packages are installed to the user data directory given in lib_path.
R lme4 wrapper
- class dclab.lme4.wrapr.Rlme4(model='lmer', feature='deform')[source]
Perform an R-lme4 analysis with RT-DC data
- Parameters
- add_dataset(ds, group, repetition)[source]
Add a dataset to the analysis list
- Parameters
Notes
For each repetition, there must be a “treatment” and a “control” group.
If you would like to perform a differential feature analysis, then you need to pass at least a reservoir and a channel dataset (with same parameters for group and repetition).
- fit(model=None, feature=None)[source]
Perform (generalized) linear mixed-effects model fit
The response variable is modeled using two linear mixed effect models:
model Rlme4.r_func_model (random intercept + random slope model)
the null model Rlme4.r_func_nullmodel (without the fixed effect introduced by the “treatment” group).
Both models are compared in R using “anova” (from the R-package “stats” [Eve92]), which performs a likelihood ratio test to obtain the p-value for the significance of the fixed effect (treatment).
If the input datasets contain data from the “reservoir” region, then the analysis is performed for the differential feature.
- Parameters
- Returns
results – Dictionary with the results of the fitting process:
“anova p-value”: Anova likelihood ratio test (significance)
“feature”: name of the feature used for the analysis (self.feature)
“fixed effects intercept”: Mean of self.feature for all controls; in the case of the “glmer+loglink” model, the intercept is already back-transformed from log space.
“fixed effects treatment”: The fixed effect size between the mean of the controls and the mean of the treatments relative to “fixed effects intercept”; in the case of the “glmer+loglink” model, the fixed effect is already back-transformed from log space.
“fixed effects repetitions”: The effects (intercept and treatment) for each repetition. The first axis defines intercept/treatment; the second axis enumerates the repetitions; thus the shape is (2, number of repetitions) and np.mean(results["fixed effects repetitions"], axis=1) is equivalent to the tuple (results["fixed effects intercept"], results["fixed effects treatment"]) for the “lmer” model. This does not hold for the “glmer+loglink” model, because of the non-linear inverse transform back from log space.
“is differential”: Boolean indicating whether or not the analysis was performed for the differential (bootstrapped and subtracted reservoir from channel data) feature
“model”: model name used for the analysis (self.model)
“model converged”: boolean indicating whether the model converged
“r anova”: Anova model (exposed from R)
“r model summary”: Summary of the model (exposed from R)
“r model coefficients”: Model coefficient table (exposed from R)
“r stderr”: errors and warnings from R
“r stdout”: standard output from R
- Return type
dict
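The remark that the mean over repetitions only matches the overall fixed effects for the “lmer” model comes down to the fact that averaging and exponentiating do not commute. The sketch below illustrates that general point with made-up numbers; it does not reproduce dclab's or lme4's computation.

```python
import math

# Illustrative per-repetition intercepts (made-up numbers, not real results)
intercepts = [0.010, 0.014, 0.018]

# Linear case: the plain mean of the per-repetition values
linear_mean = sum(intercepts) / len(intercepts)

# Log-link case: values live in log space and are back-transformed with exp
log_values = [math.log(v) for v in intercepts]
mean_then_exp = math.exp(sum(log_values) / len(log_values))  # geometric mean
exp_then_mean = sum(math.exp(v) for v in log_values) / len(log_values)

# The two orders of operation disagree, so the equivalence breaks down
difference = exp_then_mean - mean_then_exp
```

Averaging in log space and then transforming back yields the geometric mean, which is smaller than the arithmetic mean whenever the values differ.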
- get_differential_dataset()[source]
Return the differential dataset for channel/reservoir data
The most famous use case is differential deformation. The idea is that you cannot tell what the difference in deformation from channel to reservoir is, because you never measure the same object in the reservoir and the channel. You usually just have two distributions. Comparing distributions is possible via bootstrapping. And then, instead of running the lme4 analysis with the channel deformation data, it is run with the differential deformation (subtraction of the bootstrapped deformation distributions for channel and reservoir).
- is_differential()[source]
Return True if the differential feature is computed for analysis
This effectively just checks the regions of the datasets and returns True if any one of the regions is “reservoir”.
See also
get_differential_dataset
for an explanation
- data
list of [RTDCBase, column, repetition, chip_region]
- feature
dclab feature for which to perform the analysis
- model
modeling method to use (e.g. “lmer”)
- r_func_model
model function
- r_func_nullmodel
null model function
- dclab.lme4.wrapr.bootstrapped_median_distributions(a, b, bs_iter=1000, rs=117)[source]
Compute the bootstrapped distributions for two arrays.
- Parameters
- Returns
median_dist_a, median_dist_b – Bootstrap distribution of medians for a and b.
- Return type
1d arrays of length bs_iter
Notes
From a programming point of view, it would have been better to implement this method for just one input array (to avoid redundant code). However, for historical reasons (testing and comparability with Shape-Out 1), bootstrapping is interleaved for the two arrays.
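The interleaved bootstrapping mentioned in the notes can be sketched with the standard library. This is an illustration under stated assumptions, not dclab's implementation: it uses Python's random module instead of a NumPy random state, so its numbers will not match dclab's output, and the parameter name seed (defaulting to 117, like rs) is chosen here for illustration.

```python
import random
import statistics


def bootstrapped_medians_sketch(a, b, bs_iter=1000, seed=117):
    """Interleaved bootstrap of medians for two samples (illustrative only)."""
    rng = random.Random(seed)
    median_dist_a = []
    median_dist_b = []
    for _ in range(bs_iter):
        # Resample with replacement, alternating between a and b so that
        # both samples draw from one shared random stream
        resample_a = rng.choices(a, k=len(a))
        median_dist_a.append(statistics.median(resample_a))
        resample_b = rng.choices(b, k=len(b))
        median_dist_b.append(statistics.median(resample_b))
    return median_dist_a, median_dist_b


# Made-up example values for a channel and a reservoir measurement
channel = [0.021, 0.025, 0.030, 0.028, 0.024]
reservoir = [0.004, 0.006, 0.005, 0.007, 0.003]
dist_a, dist_b = bootstrapped_medians_sketch(channel, reservoir, bs_iter=200)
# Differential distribution: elementwise subtraction of the two distributions
differential = [ma - mb for ma, mb in zip(dist_a, dist_b)]
```

Because both samples share one random stream, changing the order of the draws would change both output distributions, which is presumably why the interleaving is kept for comparability.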
Machine learning
New in version 0.38.0.
- class dclab.rtdc_dataset.feat_anc_ml.ml_feature.MachineLearningFeature(feature_name, dc_model, modc_path=None)[source]
A user-defined machine-learning feature
- Parameters
feature_name (str) – name of the ML feature score (starts with ml_score_)
dc_model (dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel) – ML model to register
modc_path (str or Path) – path to the original .modc file (if applicable)
Notes
MachineLearningFeature inherits from AncillaryFeature.
- dclab.rtdc_dataset.feat_anc_ml.ml_feature.load_ml_feature(modc_path)[source]
Find and load MachineLearningFeature(s) from a .modc file
- Parameters
modc_path (str or Path) – pathname to a .modc file
- Returns
ml_list – list of MachineLearningFeature instances loaded from modc_path
- Return type
list of MachineLearningFeature
See also
MachineLearningFeature
class handling the plugin feature information
- dclab.rtdc_dataset.feat_anc_ml.ml_feature.remove_all_ml_features()[source]
Convenience function for removing all MachineLearningFeature instances
See also
remove_ml_feature
remove a single MachineLearningFeature instance
- dclab.rtdc_dataset.feat_anc_ml.ml_feature.remove_ml_feature(ml_instance)[source]
Convenience function for removing a MachineLearningFeature instance
- Parameters
ml_instance (MachineLearningFeature) – The MachineLearningFeature instance to be removed from dclab
- Raises
TypeError – If the ml_instance is not a MachineLearningFeature instance
Reading and writing trained machine learning models for dclab
- dclab.rtdc_dataset.feat_anc_ml.modc.export_model(path, model, enforce_formats=None)[source]
Export an ML model to all possible formats
The model must be exportable with at least one method listed by BaseModel.all_formats().
- Parameters
path (str or pathlib.Path) – Directory in which the model is stored. For each supported model, a new subdirectory or file is created.
model (An instance of an ML model, NOT dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel) – Trained model instance
enforce_formats (list of str) – Enforced file formats for export. If the export for one of these file formats fails, a ValueError is raised.
- dclab.rtdc_dataset.feat_anc_ml.modc.hash_path(path)[source]
Create a SHA256 hash of a file or all files in a directory
The files are sorted before hashing for reproducibility.
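Hashing a directory reproducibly requires a fixed file order, as stated above. The following sketch shows one way to do this with hashlib; whether dclab recurses into subdirectories or includes file names in the hash is not stated here, so both choices below (recursive search, contents only) are assumptions.

```python
import hashlib
import pathlib
import tempfile


def hash_path_sketch(path):
    """SHA256 of a file, or of all files below a directory in sorted order."""
    path = pathlib.Path(path)
    if path.is_dir():
        # Sorting makes the result independent of the file system's ordering
        files = sorted(p for p in path.rglob("*") if p.is_file())
    else:
        files = [path]
    hasher = hashlib.sha256()
    for file_path in files:
        with file_path.open("rb") as fd:
            # Read in chunks so that large files do not need to fit in memory
            for chunk in iter(lambda: fd.read(65536), b""):
                hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "b.bin").write_bytes(b"second")
    (root / "a.bin").write_bytes(b"first")
    dir_hash = hash_path_sketch(root)
    file_hash = hash_path_sketch(root / "a.bin")
```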
- dclab.rtdc_dataset.feat_anc_ml.modc.load_modc(path, from_format=None)[source]
Load models from a .modc file for inference
- Parameters
- Returns
model – Models that can be used for inference via model.predict
- Return type
list of dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel
- dclab.rtdc_dataset.feat_anc_ml.modc.save_modc(path, dc_models)[source]
Save ML models to a .modc file
- Parameters
path (str, pathlib.Path) – Output .modc path
dc_models (list of/or dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel) – Models to save
- Returns
meta – Dictionary written to index.json in the .modc file
- Return type
dict
- class dclab.rtdc_dataset.feat_anc_ml.ml_model.BaseModel(bare_model, inputs, outputs, info=None)[source]
- Parameters
bare_model – Underlying ML model
inputs (list of str) – List of model input features, e.g. ["deform", "area_um"]
outputs (list of str) – List of output features the model provides in that order, e.g. ["ml_score_rbc", "ml_score_rt1", "ml_score_tfe"]
info (dict) – Dictionary with model metadata
- static all_formats()[source]
Dict of dictionaries containing all model formats in dclab
- Returns
fmt_dict – All file formats with names as keys. Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).
- Return type
dict
See also
supported_formats
class-specific file formats
- get_dataset_features(ds, dtype=<class 'numpy.float32'>)[source]
Return the dataset features used for inference
- Parameters
ds (dclab.rtdc_dataset.RTDCBase) – Dataset from which to retrieve the feature data
dtype (dtype) – All features are cast to this dtype
- Returns
fdata – 2D array of shape (len(ds), len(self.inputs)); i.e. to access the array containing the first feature, for all events, you would do fdata[:, 0].
- Return type
2d ndarray
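The layout of the returned feature matrix (one row per event, one column per input feature) can be illustrated with plain lists. This is a sketch with made-up values and a hypothetical helper; the real method returns a NumPy array and reads the data from the dataset.

```python
def assemble_feature_rows(feature_data, inputs):
    """Arrange per-feature lists into rows of shape (events, features)."""
    columns = [feature_data[name] for name in inputs]
    # zip(*columns) turns a list of columns into a sequence of rows
    return [list(row) for row in zip(*columns)]


# Made-up feature values for three events
feature_data = {"deform": [0.01, 0.02, 0.03], "area_um": [50.0, 60.0, 70.0]}
inputs = ["deform", "area_um"]
fdata = assemble_feature_rows(feature_data, inputs)

# List analog of fdata[:, 0]: the first input feature for all events
first_feature = [row[0] for row in fdata]
```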
- abstract static load_bare_model(path)[source]
Load an implementation-specific model from a file
This will set the self.model attribute. Make sure that the other attributes are set properly as well.
- abstract predict(ds)[source]
Return the probabilities of self.outputs for ds
- Parameters
ds (dclab.rtdc_dataset.RTDCBase) – Dataset to apply the model to
- Returns
ofdict – Output feature dictionary with features as keys and 1d ndarrays as values.
- Return type
dict
Notes
This function calls BaseModel.get_dataset_features() to obtain the input feature matrix.
- abstract static save_bare_model(path, bare_model, save_format=None)[source]
Save an implementation-specific model to a file
- abstract static supported_formats()[source]
List of dictionaries containing model formats
- Returns
fmts – Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).
- Return type
list of dicts
Notes
The return value is automatically added to the return value of BaseModel.all_formats().
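How the per-class format lists could end up in one dictionary keyed by format name is sketched below. The classes are hypothetical stand-ins, the collection via __subclasses__() is an assumed mechanism, and the format entry uses placeholder values; only the keys “name”, “suffix”, and “requires” come from the description above.

```python
class SketchBaseModel:
    """Hypothetical stand-in, not dclab's BaseModel."""

    @staticmethod
    def supported_formats():
        return []

    @staticmethod
    def all_formats():
        # Collect the formats of all direct subclasses, keyed by format name
        fmt_dict = {}
        for cls in SketchBaseModel.__subclasses__():
            for fmt in cls.supported_formats():
                fmt_dict[fmt["name"]] = fmt
        return fmt_dict


class SketchModel(SketchBaseModel):
    @staticmethod
    def supported_formats():
        # Placeholder values; real format names and suffixes are not shown here
        return [{"name": "example-format",
                 "suffix": ".example",
                 "requires": ["example_package"]}]


formats = SketchBaseModel.all_formats()
```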
tensorflow helper functions for RT-DC data
- dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.assemble_tf_dataset_scalars(dc_data, feature_inputs, labels=None, split=0.0, shuffle=True, batch_size=32, dtype=<class 'numpy.float32'>)[source]
Assemble a tensorflow.data.Dataset for scalar features
Scalar feature data are loaded directly into memory.
- Parameters
dc_data (list of pathlib.Path, str, or dclab.rtdc_dataset.RTDCBase) – List of source datasets (can be anything dclab.new_dataset() accepts).
feature_inputs (list of str) – List of scalar feature names to extract from dc_data.
labels (list) – Labels (e.g. an integer that classifies each element of dc_data) used for training. Defaults to None (no labels).
split (float) – If set to zero, only one dataset is returned; If set to a float between 0 and 1, a train and test dataset is returned. Please set shuffle=True.
shuffle (bool) – If True (default), shuffle the dataset (A hard-coded seed is used for reproducibility).
batch_size (int) – Batch size for training. The function tf.data.Dataset.batch is called with batch_size as its argument.
dtype (numpy.dtype) – Desired dtype of the output data
- Returns
train [,test] – Dataset that can be used for training with tensorflow
- Return type
tensorflow.data.Dataset
- dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.get_dataset_event_feature(dc_data, feature, tf_dataset_indices=None, dc_data_indices=None, split_index=0, split=0.0, shuffle=True)[source]
Return RT-DC features for tensorflow Dataset indices
The functions assemble_tf_dataset_* return a tensorflow.data.Dataset instance with all input data shuffled (or split). This function retrieves features using the Dataset indices, given the same parameters (paths, split, shuffle).
- Parameters
dc_data (list of pathlib.Path, str, or dclab.rtdc_dataset.RTDCBase) – List of source datasets (Must match the path list used to create the tf.data.Dataset).
feature (str) – Name of the feature to retrieve
tf_dataset_indices (list-like) – tf.data.Dataset indices corresponding to the events of interest. If None, all indices are used.
dc_data_indices (list of int) – List with indices that correspond to the only items in dc_data for which the features should be returned.
split_index (int) – The split index; 0 for the first part, 1 for the second part.
split (float) – Splitting fraction (Must match the value used to create the tf.data.Dataset)
shuffle (bool) – Shuffling (Must match the value used to create the tf.data.Dataset)
- Returns
data – Feature list with elements corresponding to the events given by tf_dataset_indices.
- Return type
- dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_dataset.shuffle_array(arr, seed=42)[source]
Shuffle a numpy array in-place reproducibly with a fixed seed
The shuffled array is also returned.
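The behavior described here (in-place shuffle, fixed seed, shuffled object returned) can be sketched for plain lists with the standard library. This is a list-based analog under the stated assumptions and not dclab's NumPy-based implementation, so the resulting order will differ from dclab's.

```python
import random


def shuffle_list_sketch(items, seed=42):
    """Shuffle a list in-place with a fixed seed and return it."""
    rng = random.Random(seed)  # private generator, global state untouched
    rng.shuffle(items)
    return items


first = list(range(10))
second = list(range(10))
returned = shuffle_list_sketch(first)
shuffle_list_sketch(second)
```

Since the same seed always produces the same permutation, indices of a shuffled dataset can be mapped back to the original events as long as the same shuffle setting is used.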
- class dclab.rtdc_dataset.feat_anc_ml.hook_tensorflow.tf_model.TensorflowModel(bare_model, inputs, outputs, info=None)[source]
Handle tensorflow models
- Parameters
bare_model – Underlying ML model
inputs (list of str) – List of model input features, e.g. ["deform", "area_um"]
outputs (list of str) – List of output features the model provides in that order, e.g. ["ml_score_rbc", "ml_score_rt1", "ml_score_tfe"]
info (dict) – Dictionary with model metadata
- has_sigmoid_activation(layer_config=None)[source]
Return True if final layer has “sigmoid” activation function
- predict(ds, batch_size=32)[source]
Return the probabilities of self.outputs for ds
- Parameters
ds (dclab.rtdc_dataset.RTDCBase) – Dataset to apply the model to
batch_size (int) – Batch size for inference with tensorflow
- Returns
ofdict – Output feature dictionary with features as keys and 1d ndarrays as values.
- Return type
dict
Notes
Before prediction, this method ensures that the outputs of the model are converted to probabilities. If the final layer is one-dimensional and does not have a sigmoid activation, then a sigmoid activation layer tf.keras.layers.Activation("sigmoid") is added (binary classification). If the final layer has more dimensions and is not a tf.keras.layers.Softmax() layer, then a softmax layer is added.
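What the two added layers compute can be written down without tensorflow. The functions below are a plain-Python sketch of the sigmoid and softmax transforms for illustration; they are not the Keras layers themselves.

```python
import math


def sigmoid(logit):
    """Map a single logit to a probability (binary classification)."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Map a list of logits to probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


binary_probability = sigmoid(0.0)
class_probabilities = softmax([2.0, 1.0, 0.1])
```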
- static supported_formats()[source]
List of dictionaries containing model formats
- Returns
fmts – Each item contains the keys “name” (format name), “suffix” (saved file suffix), “requires” (Python dependencies).
- Return type
Notes
The return value is automatically added to the return value of BaseModel.all_formats().