Code reference¶
module-level methods¶
-
dclab.
new_dataset
(data, identifier=None, **kwargs)[source]¶ Initialize a new RT-DC dataset
Parameters: - data –
can be one of the following:
- dict
- .tdms file
- .rtdc file
- subclass of RTDCBase (will create a hierarchy child)
- DCOR resource URL
- identifier (str) – A unique identifier for this dataset. If set to None an identifier is generated.
- kwargs (dict) – Additional parameters passed to the RTDCBase subclass
Returns: dataset – A new dataset instance
Return type: subclass of
dclab.rtdc_dataset.RTDCBase
- data –
global definitions¶
These definitionas are used throughout the dclab/Shape-In/Shape-Out ecosystem.
configuration¶
Valid configuration sections and keys are described in: Analysis metadata and Experiment metadata.
-
dclab.dfn.
CFG_ANALYSIS
¶ User-editable configuration for data analysis.
-
dclab.dfn.
CFG_METADATA
¶ Measurement-specific metadata.
-
dclab.dfn.
config_funcs
¶ Dictionary of dictionaries containing functions to convert input data to the predefined data type
-
dclab.dfn.
config_keys
¶ Dictionary with sections as keys and configuration parameter names as values
-
dclab.dfn.
config_types
¶ Dictionary of dictionaries containing the data type of each configuration parameter
features¶
Features are discussed in more detail in: Features.
-
dclab.dfn.
feature_exists
(name, scalar_only=False)¶ Return True if name is a valid feature name
This function not only checks whether name is in
feature_names
, but also validates against the machine learning scores ml_score_??? (where ? can be a digit or a lower-case letter in the English alphabet).
-
dclab.dfn.
get_feature_label
(name, rtdc_ds=None)¶ Return the label corresponding to a feature name
This function not only checks
feature_name2label
, but also supports the ml_score_??? features.TODO: If an RTDCBase object is given, then the feature label can be extracted from ancillary information.
-
dclab.dfn.
scalar_feature_exists
(name)¶ Convenience method wrapping feature_exists(…, scalar_only=True)
-
dclab.dfn.
FEATURES_NON_SCALAR
¶ Non-scalar features
-
dclab.dfn.
feature_names
¶ List of valid feature names
-
dclab.dfn.
feature_labels
¶ List of human-readable labels for each valid feature
-
dclab.dfn.
feature_name2label
¶ Dictionary that maps feature names to feature labels
-
dclab.dfn.
scalar_feature_names
¶ List of valid scalar feature names
RT-DC dataset manipulation¶
Base class¶
-
class
dclab.rtdc_dataset.
RTDCBase
(identifier=None)[source]¶ RT-DC measurement base class
Notes
Besides the filter arrays for each data feature, there is a manual boolean filter array
RTDCBase.filter.manual
that can be edited by the user - a boolean value ofFalse
means that the event is excluded from all computations.-
get_downsampled_scatter
(xax='area_um', yax='deform', downsample=0, xscale='linear', yscale='linear', remove_invalid=False, ret_mask=False)[source]¶ Downsampling by removing points at dense locations
Parameters: - xax (str) – Identifier for x axis (e.g. “area_um”, “aspect”, “deform”)
- yax (str) – Identifier for y axis
- downsample (int) –
Number of points to draw in the down-sampled plot. This number is either
- >=1: exactly downsample to this number by randomly adding
- or removing points
- 0 : do not perform downsampling
- xscale (str) – If set to “log”, take the logarithm of the x-values before performing downsampling. This is useful when data are are displayed on a log-scale. Defaults to “linear”.
- yscale (str) – See xscale.
- remove_invalid (bool) – Remove nan and inf values before downsampling; if set to True, the actual number of samples returned might be smaller than downsample due to infinite or nan values (e.g. due to logarithmic scaling).
- ret_mask (bool) – If set to True, returns a boolean array of length len(self) where True values identify the filtered data.
Returns: - xnew, xnew (1d ndarray of lenght N) – Filtered data; N is either identical to downsample or smaller (if remove_invalid==True)
- mask (1d boolean array of length len(RTDCBase)) – Array for identifying the downsampled data points
-
get_kde_contour
(xax='area_um', yax='deform', xacc=None, yacc=None, kde_type='histogram', kde_kwargs={}, xscale='linear', yscale='linear')[source]¶ Evaluate the kernel density estimate for contour plots
Parameters: - xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
- yax (str) – Identifier for Y axis
- xacc (float) – Contour accuracy in x direction
- yacc (float) – Contour accuracy in y direction
- kde_type (str) – The KDE method to use
- kde_kwargs (dict) – Additional keyword arguments to the KDE method
- xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are are displayed on a log-scale. Defaults to “linear”.
- yscale (str) – See xscale.
Returns: X, Y, Z – The kernel density Z evaluated on a rectangular grid (X,Y).
Return type: coordinates
-
get_kde_scatter
(xax='area_um', yax='deform', positions=None, kde_type='histogram', kde_kwargs={}, xscale='linear', yscale='linear')[source]¶ Evaluate the kernel density estimate for scatter plots
Parameters: - xax (str) – Identifier for X axis (e.g. “area_um”, “aspect”, “deform”)
- yax (str) – Identifier for Y axis
- positions (list of two 1d ndarrays or ndarray of shape (2, N)) – The positions where the KDE will be computed. Note that the KDE estimate is computed from the the points that are set in self.filter.all.
- kde_type (str) – The KDE method to use
- kde_kwargs (dict) – Additional keyword arguments to the KDE method
- xscale (str) – If set to “log”, take the logarithm of the x-values before computing the KDE. This is useful when data are are displayed on a log-scale. Defaults to “linear”.
- yscale (str) – See xscale.
Returns: density – The kernel density evaluated for the filtered data points.
Return type: 1d ndarray
-
static
get_kde_spacing
(a, scale='linear', method=<function bin_width_doane>, method_kw={}, feat='undefined', ret_scaled=False)[source]¶ Convenience function for computing the contour spacing
Parameters: - a (ndarray) – feature data
- scale (str) – how the data should be scaled (“log” or “linear”)
- method (callable) – KDE method to use (see kde_methods submodule)
- method_kw (dict) – keyword arguments to method
- feat (str) – feature name for debugging
- ret_scale (bol) – whether or not to return the scaled array of a
-
polygon_filter_add
(filt)[source]¶ Associate a Polygon Filter with this instance
Parameters: filt (int or instance of PolygonFilter) – The polygon filter to add
-
polygon_filter_rm
(filt)[source]¶ Remove a polygon filter from this instance
Parameters: filt (int or instance of PolygonFilter) – The polygon filter to remove
-
config
= None¶ Configuration of the measurement
-
export
= None¶ Export functionalities; instance of
dclab.rtdc_dataset.export.Export
.
-
features
¶ All available features
-
features_innate
¶ All features excluding ancillary features
-
features_loaded
¶ All features including ancillaries that have been computed
Notes
Features that are computationally cheap to compute are always included. They are defined in
dclab.rtdc_dataset.ancillaries.FEATURES_RAPID
.
-
features_scalar
¶ All scalar features available
-
filter
= None¶ Filtering functionalities; instance of
dclab.rtdc_dataset.filter.Filter
.
-
format
= None¶ Dataset format (derived from class name)
-
hash
¶ Reproducible dataset hash (defined by derived classes)
-
identifier
¶ Unique (unreproducible) identifier
-
logs
= None¶ Dictionary of log files. Each log file is a list of strings (one string per line).
-
title
= None¶ Title of the measurement
-
DCOR (online) format¶
-
class
dclab.rtdc_dataset.
RTDC_DCOR
(url, use_ssl=None, host='dcor.mpl.mpg.de', api_key='', *args, **kwargs)[source]¶ Wrap around the DCOR API
Parameters: - url (str) –
Full URL or resource identifier; valid values are
- https://dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f661-4920-be79-5ff4e85915d5
- dcor.mpl.mpg.de/api/3/action/dcserv?id=b1404eb5-f 661-4920-be79-5ff4e85915d5
- b1404eb5-f661-4920-be79-5ff4e85915d5
- use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
- host (str) – The host machine (used if the host is not given in url)
- api_key (str) – API key to access private resources
- *args – Arguments for RTDCBase
- **kwargs – Keyword arguments for RTDCBase
-
static
get_full_url
(url, use_ssl, host)[source]¶ Return the full URL to a DCOR resource
Parameters: - url (str) –
Full URL or resource identifier; valid values are
- https://dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6- df12-4299-aa2e-089e390aafd5’
- dcor.mpl.mpg.de/api/3/action/dcserv?id=caab96f6-df12- 4299-aa2e-089e390aafd5
- caab96f6-df12-4299-aa2e-089e390aafd5
- use_ssl (bool) – Set this to False to disable SSL (should only be used for testing). Defaults to None (does not force SSL if the URL starts with “http://”).
- host (str) – Use this host if it is not specified in url
- url (str) –
-
hash
¶ Hash value based on file name and content
- url (str) –
-
class
dclab.rtdc_dataset.fmt_dcor.
APIHandler
(url, api_key='')[source]¶ Handles the DCOR api with caching for simple queries
-
classmethod
add_api_key
(api_key)[source]¶ Add an API Key to the base class
When accessing the DCOR API, all available API Keys are used to access a resource (trial and error).
-
api_keys
= []¶ DCOR API Keys in the current session
-
cache_queries
= ['metadata', 'size', 'feature_list', 'valid']¶ these are cached to minimize network usage
-
classmethod
Dictionary format¶
-
class
dclab.rtdc_dataset.
RTDC_Dict
(ddict, *args, **kwargs)[source]¶ Dictionary-based RT-DC dataset
Parameters: - ddict (dict) –
Dictionary with features as keys (valid features like “area_cvx”, “deform”, “image” are defined by dclab.definitions.feature_exists) with which the class will be instantiated. The configuration is set to the default configuration of dclab.
Changed in version 0.27.0: Scalar features are automatically converted to arrays.
- *args – Arguments for RTDCBase
- **kwargs – Keyword arguments for RTDCBase
- ddict (dict) –
HDF5 (.rtdc) format¶
-
class
dclab.rtdc_dataset.
RTDC_HDF5
(h5path, *args, **kwargs)[source]¶ HDF5 file format for RT-DC measurements
Parameters: - h5path (str or pathlib.Path) – Path to a ‘.tdms’ measurement file.
- *args – Arguments for RTDCBase
- **kwargs – Keyword arguments for RTDCBase
-
path
¶ Path to the experimental HDF5 (.rtdc) file
Type: pathlib.Path
-
hash
¶ Hash value based on file name and content
-
dclab.rtdc_dataset.fmt_hdf5.
MIN_DCLAB_EXPORT_VERSION
= '0.3.3.dev2'¶ rtdc files exported with dclab prior to this version are not supported
Hierarchy format¶
-
class
dclab.rtdc_dataset.
RTDC_Hierarchy
(hparent, apply_filter=True, *args, **kwargs)[source]¶ Hierarchy dataset (filtered from RTDCBase)
A few words on hierarchies: The idea is that a subclass of RTDCBase can use the filtered data of another subclass of RTDCBase and interpret these data as unfiltered events. This comes in handy e.g. when the percentage of different subpopulations need to be distinguished without the noise in the original data.
Children in hierarchies always update their data according to the filtered event data from their parent when apply_filter is called. This makes it easier to save and load hierarchy children with e.g. Shape-Out and it makes the handling of hierarchies more intuitive (when the parent changes, the child changes as well).
Parameters: - hparent (instance of RTDCBase) – The hierarchy parent
- apply_filter (bool) – Whether to apply the filter durint instantiation; If set to False, apply_filter must be called manually.
- *args – Arguments for RTDCBase
- **kwargs – Keyword arguments for RTDCBase
TDMS format¶
-
class
dclab.rtdc_dataset.
RTDC_TDMS
(tdms_path, *args, **kwargs)[source]¶ TDMS file format for RT-DC measurements
Parameters: - tdms_path (str or pathlib.Path) – Path to a ‘.tdms’ measurement file.
- *args – Arguments for RTDCBase
- **kwargs – Keyword arguments for RTDCBase
-
path
¶ Path to the experimental dataset (main .tdms file)
Type: pathlib.Path
-
dclab.rtdc_dataset.fmt_tdms.
get_project_name_from_path
(path, append_mx=False)[source]¶ Get the project name from a path.
For a path “/home/peter/hans/HLC12398/online/M1_13.tdms” or For a path “/home/peter/hans/HLC12398/online/data/M1_13.tdms” or without the “.tdms” file, this will return always “HLC12398”.
Parameters:
Ancillaries¶
Computation of ancillary features
Ancillary features are computed on-the-fly in dclab if the required data are available. The features are registered here and are computed when RTDCBase.__getitem__ is called with the respective feature name. When RTDCBase.__contains__ is called with the feature name, then the feature is not yet computed, but the prerequisites are evaluated:
In [1]: import dclab
In [2]: ds = dclab.new_dataset("data/example.rtdc")
In [3]: ds.config["calculation"]["emodulus medium"] = "CellCarrier"
In [4]: ds.config["calculation"]["emodulus model"] = "elastic sphere"
In [5]: ds.config["calculation"]["emodulus temperature"] = 23.0
In [6]: "emodulus" in ds # nothing is computed
Out[6]: True
In [7]: ds["emodulus"] # now data is computed and cached
Out[7]:
array([1.23006241, 1.08662317, nan, ..., nan, nan,
0.75430855])
Once the data has been computed, RTDCBase caches it in the _ancillaries property dict together with a hash that is computed with AncillaryFeature.hash. The hash is computed from the feature data req_features and the configuration metadata req_config.
-
class
dclab.rtdc_dataset.ancillaries.ancillary_feature.
AncillaryFeature
(feature_name, method, req_config=[], req_features=[], req_func=<function AncillaryFeature.<lambda>>, priority=0)[source]¶ A data feature that is computed from existing data
Parameters: - feature_name (str) – The name of the ancillary feature, e.g. “emodulus”.
- method (callable) – The method that computes the feature. This method takes an instance of RTDCBase as argument.
- req_config (list) – Required configuration parameters to compute the feature, e.g. [“calculation”, [“emodulus model”, “emodulus viscosity”]]
- req_features (list) – Required existing features in the dataset, e.g. [“area_cvx”, “deform”]
- req_func (callable) –
A function that takes an instance of RTDCBase as an argument and checks whether any other necessary criteria are met. By default, this is a lambda function that returns True. The function should return False if the necessary criteria are not met. This function may also return a hashable object (via
dclab.util.objstr()
) instead of True, if the criteria are subject to change. In this case, the return value is used for identifying the cached ancillary feature.Changed in version 0.27.0: Support non-boolean return values for caching purposes.
- priority (int) – The priority of the feature; if there are multiple AncillaryFeature defined for the same feature_name, then the priority of the features defines which feature returns True in self.is_available. A higher value means a higher priority.
Notes
req_config and req_features are used to test whether the feature can be computed in self.is_available.
-
static
available_features
(rtdc_ds)[source]¶ Determine available features for an RT-DC dataset
Parameters: rtdc_ds (instance of RTDCBase) – The dataset to check availability for Returns: features – Dictionary with feature names as keys and instances of AncillaryFeature as values. Return type: dict
-
compute
(rtdc_ds)[source]¶ Compute the feature with self.method
Parameters: rtdc_ds (instance of RTDCBase) – The dataset to compute the feature for Returns: feature – The computed data feature (read-only). Return type: array- or list-like
-
hash
(rtdc_ds)[source]¶ Used for identifying an ancillary computation
The data columns and the used configuration keys/values are hashed.
-
is_available
(rtdc_ds, verbose=False)[source]¶ Check whether the feature is available
Parameters: rtdc_ds (instance of RTDCBase) – The dataset to check availability for Returns: available – True, if feature can be computed with compute Return type: bool Notes
This method returns False for a feature if there is a feature defined with the same name but with higher priority (even if the feature would be available otherwise).
-
feature_names
= ['time', 'index', 'area_ratio', 'area_um', 'aspect', 'deform', 'emodulus', 'emodulus', 'emodulus', 'emodulus', 'fl1_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'fl1_max_ctc', 'fl2_max_ctc', 'fl1_max_ctc', 'fl3_max_ctc', 'fl2_max_ctc', 'fl3_max_ctc', 'contour', 'bright_avg', 'bright_sd', 'inert_ratio_cvx', 'inert_ratio_prnc', 'inert_ratio_raw', 'tilt', 'volume', 'ml_class']¶ All feature names registered
-
features
= [<AncillaryFeature 'time' (priority 0)>, <AncillaryFeature 'index' (priority 0)>, <AncillaryFeature 'area_ratio' (priority 0)>, <AncillaryFeature 'area_um' (priority 0)>, <AncillaryFeature 'aspect' (priority 0)>, <AncillaryFeature 'deform' (priority 0)>, <AncillaryFeature 'emodulus' (priority 3)>, <AncillaryFeature 'emodulus' (priority 2)>, <AncillaryFeature 'emodulus' (priority 1)>, <AncillaryFeature 'emodulus' (priority 0)>, <AncillaryFeature 'fl1_max_ctc' (priority 1)>, <AncillaryFeature 'fl2_max_ctc' (priority 1)>, <AncillaryFeature 'fl3_max_ctc' (priority 1)>, <AncillaryFeature 'fl1_max_ctc' (priority 0)>, <AncillaryFeature 'fl2_max_ctc' (priority 0)>, <AncillaryFeature 'fl1_max_ctc' (priority 0)>, <AncillaryFeature 'fl3_max_ctc' (priority 0)>, <AncillaryFeature 'fl2_max_ctc' (priority 0)>, <AncillaryFeature 'fl3_max_ctc' (priority 0)>, <AncillaryFeature 'contour' (priority 0)>, <AncillaryFeature 'bright_avg' (priority 0)>, <AncillaryFeature 'bright_sd' (priority 0)>, <AncillaryFeature 'inert_ratio_cvx' (priority 0)>, <AncillaryFeature 'inert_ratio_prnc' (priority 0)>, <AncillaryFeature 'inert_ratio_raw' (priority 0)>, <AncillaryFeature 'tilt' (priority 0)>, <AncillaryFeature 'volume' (priority 0)>, <AncillaryFeature 'ml_class' (priority 0)>]¶ All ancillary features registered
config¶
-
class
dclab.rtdc_dataset.config.
Configuration
(files=[], cfg={}, disable_checks=False)[source]¶ Configuration class for RT-DC datasets
This class has a dictionary-like interface to access and set configuration values, e.g.
cfg = load_from_file("/path/to/config.txt") # access the channel width cfg["setup"]["channel width"] # modify the channel width cfg["setup"]["channel width"] = 30
Parameters: - files (list of files) – The config files with which to initialize the configuration
- cfg (dict-like) – The dictionary with which to initialize the configuration
- disable_checks (bool) – Set this to True if you want to avoid checking against
section and key names defined in dclab.definitions
using
verify_section_key()
. This avoids excess warning messages when loading data from configuration files not generated by dclab.
-
tostring
(sections=None)[source]¶ Convert the configuration to its string representation
The optional argument sections allows to export only specific sections of the configuration, i.e. sections=dclab.dfn.CFG_METADATA will only export configuration data from the original measurement and no filtering data.
export¶
-
class
dclab.rtdc_dataset.export.
Export
(rtdc_ds)[source]¶ Export functionalities for RT-DC datasets
-
avi
(path, filtered=True, override=False)[source]¶ Exports filtered event images to an avi file
Parameters: Notes
Raises OSError if current dataset does not contain image data
-
fcs
(path, features, meta_data={}, filtered=True, override=False)[source]¶ Export the data of an RT-DC dataset to an .fcs file
Parameters: - mm (instance of dclab.RTDCBase) – The dataset that will be exported.
- path (str) – Path to an .fcs file. The ending .fcs is added automatically.
- features (list of str) – The features in the resulting .fcs file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
- meta_data (dict) – User-defined, optional key-value pairs that are stored in the primary TEXT segment of the FCS file; the version of dclab is stored there by default
- filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
- override (bool) – If set to True, an existing file
path
will be overridden. If set to False, raises OSError ifpath
exists.
Notes
Due to incompatibility with the .fcs file format, all events with NaN-valued features are not exported.
-
hdf5
(path, features, filtered=True, override=False, compression='gzip')[source]¶ Export the data of the current instance to an HDF5 file
Parameters: - path (str) – Path to an .rtdc file. The ending .rtdc is added automatically.
- features (list of str) – The features in the resulting .rtdc file. These are strings that are defined by dclab.definitions.feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “image”.
- filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
- override (bool) – If set to True, an existing file
path
will be overridden. If set to False, raises OSError ifpath
exists. - compression (str or None) – Compression method for “contour”, “image”, and “trace” data as well as logs; one of [None, “lzf”, “gzip”, “szip”].
-
tsv
(path, features, meta_data={'dclab version': '0.27.9'}, filtered=True, override=False)[source]¶ Export the data of the current instance to a .tsv file
Parameters: - path (str) – Path to a .tsv file. The ending .tsv is added automatically.
- features (list of str) – The features in the resulting .tsv file. These are strings that are defined by dclab.definitions.scalar_feature_exists, e.g. “area_cvx”, “deform”, “frame”, “fl1_max”, “aspect”.
- meta_data (dict) – User-defined, optional key-value pairs that are stored at the beginning of the tsv file - one key-value pair is stored per line which starts with a hash. The version of dclab is stored there by default.
- filtered (bool) – If set to True, only the filtered data (index in ds.filter.all) are used.
- override (bool) – If set to True, an existing file
path
will be overridden. If set to False, raises OSError ifpath
exists.
-
filter¶
-
class
dclab.rtdc_dataset.filter.
Filter
(rtdc_ds)[source]¶ Boolean filter arrays for RT-DC measurements
Parameters: rtdc_ds (instance of RTDCBase) – The RT-DC dataset the filter applies to -
update
(rtdc_ds, force=[])[source]¶ Update the filters according to rtdc_ds.config[“filtering”]
Parameters: - rtdc_ds (dclab.rtdc_dataset.core.RTDCBase) – The measurement to which the filter is applied
- force (list) – A list of feature names that must be refiltered with min/max values.
Notes
This function is called when
ds.apply_filter
is called.
-
low-level functionalities¶
downsampling¶
Content-based downsampling of ndarrays
-
dclab.downsampling.
downsample_rand
(a, samples, remove_invalid=False, ret_idx=False)[source]¶ Downsampling by randomly removing points
Parameters: Returns: - dsa (1d ndarray of size samples) – The pseudo-randomly downsampled array a
- idx (1d boolean array with same shape as a) – Only returned if ret_idx is True. A boolean array such that a[idx] == dsa
features¶
image-based¶
-
dclab.features.contour.
get_contour
(mask)[source]¶ Compute the image contour from a mask
The contour is computed in a very inefficient way using scikit-image and a conversion of float coordinates to pixel coordinates.
Parameters: mask (binary ndarray of shape (M,N) or (K,M,N)) – The mask outlining the pixel positions of the event. If a 3d array is given, then K indexes the individual contours. Returns: cont – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour. Return type: ndarray or list of K ndarrays of shape (J,2)
-
dclab.features.bright.
get_bright
(mask, image, ret_data='avg, sd')[source]¶ Compute avg and/or std of the event brightness
The event brightness is defined by the gray-scale values of the image data within the event mask area.
Parameters: - mask (ndarray or list of ndarrays of shape (M,N) and dtype bool) – The mask values, True where the event is located in image.
- image (ndarray or list of ndarrays of shape (M,N)) – A 2D array that holds the image in form of grayscale values of an event.
- ret_data (str) – A comma-separated list of metrices to compute - “avg”: compute the average - “sd”: compute the standard deviation Selected metrics are returned in alphabetical order.
Returns: - bright_avg (float or ndarray of size N) – Average image data within the contour
- bright_std (float or ndarray of size N) – Standard deviation of image data within the contour
-
dclab.features.inert_ratio.
get_inert_ratio_cvx
(cont)[source]¶ Compute the inertia ratio of the convex hull of a contour
The inertia ratio is computed from the central second order of moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).
Parameters: cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour. Returns: inert_ratio_cvx – The inertia ratio of the contour’s convex hull Return type: float or ndarray of size N Notes
The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.
See also
get_inert_ratio_raw()
- Compute inertia ratio of a raw contour
References
-
dclab.features.inert_ratio.
get_inert_ratio_raw
(cont)[source]¶ Compute the inertia ratio of a contour
The inertia ratio is computed from the central second order of moments along x (mu20) and y (mu02) via sqrt(mu20/mu02).
Parameters: cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event (in pixels) e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour. Returns: inert_ratio_raw – The inertia ratio of the contour Return type: float or ndarray of size N Notes
The contour moments mu20 and mu02 are computed the same way they are computed in OpenCV’s moments.cpp.
See also
get_inert_ratio_cvx()
- Compute inertia ratio of the convex hull of a contour
References
-
dclab.features.volume.
get_volume
(cont, pos_x, pos_y, pix)[source]¶ Calculate the volume of a polygon revolved around an axis
The volume estimation assumes rotational symmetry. Green`s theorem and the Gaussian divergence theorem allow to formulate the volume as a line integral.
Parameters: - cont (ndarray or list of ndarrays of shape (N,2)) – A 2D array that holds the contour of an event [px] e.g. obtained using mm.contour where mm is an instance of RTDCBase. The first and second columns of cont correspond to the x- and y-coordinates of the contour.
- pos_x (float or ndarray of length N) – The x coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_x
- pos_y (float or ndarray of length N) – The y coordinate(s) of the centroid of the event(s) [µm] e.g. obtained using mm.pos_y
- px_um (float) – The detector pixel size in µm. e.g. obtained using: mm.config[“image”][“pix size”]
Returns: volume – volume in um^3
Return type: float or ndarray
Notes
The computation of the volume is based on a full rotation of the upper and the lower halves of the contour from which the average is then used.
The volume is computed radially from the the center position given by (pos_x, pos_y). For sufficiently smooth contours, such as densely sampled ellipses, the center position does not play an important role. For contours that are given on a coarse grid, as is the case for RT-DC, the center position must be given.
References
- Halpern et al. [Halpern2002], chapter 5, Section 5.4
- This is a translation from a Matlab script by Geoff Olynyk.
emodulus¶
Computation of apparent Young’s modulus for RT-DC measurements
-
dclab.features.emodulus.
extrapolate_emodulus
(lut, datax, deform, emod, deform_norm, deform_thresh=0.05, inplace=True)[source]¶ Use spline interpolation to fill in nan-values
When points (datax, deform) are outside the convex hull of the lut, then
scipy.interpolate.griddata()
returns nan-valules.With this function, some of these nan-values are extrapolated using
scipy.interpolate.SmoothBivariateSpline
. The supported extrapolation values are currently limited to those where the deformation is above 0.05.A warning will be issued, because this is not really recommended.
Parameters: - lut (ndarray of shape (N, 3)) – The normalized (!! see
normalize()
) LUT (first axis is points, second axis enumerates datax, deform, and emodulus) - datax (ndarray of size N) – The normalized x data (corresponding to lut[:, 0])
- deform (ndarray of size N) – The normalized deform (corresponding to lut[:, 1])
- emod (ndarray of size N) – The emodulus (corresponding to lut[:, 2]); If emod does not contain nan-values, there is nothing to do here.
- deform_norm (float) – The normalization value used to normalize lut[:, 1] and deform.
- deform_thresh (float) – Not the entire LUT is used for bivariate spline interpolation. Only the points where lut[:, 1] > deform_thresh/deform_norm are used. This is necessary, because for small deformations, the LUT has an extreme slope that kills any meaningful spline interpolation.
- lut (ndarray of shape (N, 3)) – The normalized (!! see
-
dclab.features.emodulus.
get_emodulus
(area_um=None, deform=None, volume=None, medium='CellCarrier', channel_width=20.0, flow_rate=0.16, px_um=0.34, temperature=23.0, lut_data='FEM-2Daxis', extrapolate=False, copy=True)[source]¶ Compute apparent Young’s modulus using a look-up table
Parameters: - area_um (float or ndarray) – Apparent (2D image) area [µm²] of the event(s)
- deform (float or ndarray) – Deformation (1-circularity) of the event(s)
- volume (float or ndarray) –
Apparent volume of the event(s). It is not possible to define volume and area_um at the same time (makes no sense).
New in version 0.25.0.
- medium (str or float) – The medium to compute the viscosity for. If a string is given, the viscosity is computed. If a float is given, this value is used as the viscosity in mPa*s (Note that temperature must be set to None in this case).
- channel_width (float) – The channel width [µm]
- flow_rate (float) – Flow rate [µL/s]
- px_um (float) – The detector pixel size [µm] used for pixelation correction. Set to zero to disable.
- temperature (float, ndarray, or None) – Temperature [°C] of the event(s)
- lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) –
The LUT data to use. If it is a key in
INTERNAL_LUTS
, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, meta data) is possible. The LUT meta data is used to check whether the given features (e.g. area_um and deform) are valid interpolation choices.New in version 0.25.0.
- extrapolate (bool) – Perform extrapolation using
extrapolate_emodulus()
. This is discouraged! - copy (bool) – Copy input arrays. If set to false, input arrays are overridden.
Returns: elasticity – Apparent Young’s modulus in kPa
Return type: float or ndarray
Notes
- The look-up table used was computed with finite elements methods according to [Mokbel2017] and complemented with analytical isoelastics from [Mietke2015]. The original simulation results are available on figshare [FigshareWittwer2020].
- The computation of the Young’s modulus takes into account a correction for the viscosity (medium, channel width, flow rate, and temperature) [Mietke2015] and a correction for pixelation for the deformation which were derived from a (pixelated) image [Herold2017].
- Note that while deformation is pixelation-corrected, area_um and volume are scaled to match the LUT data. This is somewhat fortunate, because we don’t have to worry about the order of applying pixelation correction and scale conversion.
- By using external LUTs, it is possible to interpolate on the volume-deformation plane. This feature was added in version 0.25.0.
See also
dclab.features.emodulus.viscosity.get_viscosity()
- compute viscosity for known media
-
dclab.features.emodulus.
load_lut
(lut_data='FEM-2Daxis')[source]¶ Load LUT data from disk
Parameters: lut_data (path, str, or tuple of (np.ndarray of shape (N, 3), dict)) – The LUT data to use. If it is a key in INTERNAL_LUTS
, then the respective LUT will be used. Otherwise, a path to a file on disk or a tuple (LUT array, meta data) is possible.Returns: - lut (np.ndarray of shape (N, 3)) – The LUT data for interpolation
- meta (dict) – The LUT metadata
Notes
If lut_data is a tuple of (lut, meta), then nothing is actually done (this is implemented for user convenience).
-
dclab.features.emodulus.
load_mtext
(path)[source]¶ Load colunm-based data from text files with metadata
This file format is used for isoelasticity lines and look-up table data in dclab.
The text file is loaded with numpy.loadtxt. The metadata are stored as a json task between the “BEGIN METADATA” and the “END METADATA” tags. The last comment (#) line before the actual data defines the features with units in square brackets and tab-separated. For instance:
# […] # # BEGIN METADATA # { # “authors”: “A. Mietke, C. Herold, J. Guck”, # “channel_width”: 20.0, # “channel_width_unit”: “um”, # “date”: “2018-01-30”, # “dimensionality”: “2Daxis”, # “flow_rate”: 0.04, # “flow_rate_unit”: “uL/s”, # “fluid_viscosity”: 15.0, # “fluid_viscosity_unit”: “mPa s”, # “method”: “analytical”, # “model”: “linear elastic”, # “publication”: “https://doi.org/10.1016/j.bpj.2015.09.006”, # “software”: “custom Matlab code”, # “summary”: “2D-axis-symmetric analytical solution” # } # END METADATA # # […] # # area_um [um^2] deform emodulus [kPa] 3.75331e+00 5.14496e-03 9.30000e-01 4.90368e+00 6.72683e-03 9.30000e-01 6.05279e+00 8.30946e-03 9.30000e-01 7.20064e+00 9.89298e-03 9.30000e-01 […]
-
dclab.features.emodulus.
normalize
(data, dmax)[source]¶ Perform normalization in-place for interpolation
Note that
scipy.interpolate.griddata()
has a rescale option which rescales the data onto the unit cube. For some reason this does not work well with LUT data, so we just normalize it by dividing by the maximum value.
-
dclab.features.emodulus.
INACCURATE_SPLINE_EXTRAPOLATION
= False¶ Set this to True to globally enable spline extrapolation when the area_um/deform data are outside of a LUT. This is discouraged and a
KnowWhatYouAreDoingWarning
warning will be issued.
-
dclab.features.emodulus.
INTERNAL_LUTS
= {'FEM-2Daxis': 'emodulus_lut.txt'}¶ Dictionary of look-up tables shipped with dclab.
Pixelation correction definitions
-
dclab.features.emodulus.pxcorr.
corr_deform_with_area_um
(area_um, px_um=0.34)[source]¶ Deformation correction for area_um-deform data
The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.
The correction formula is described in [Herold2017].
Parameters: Returns: deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta
Return type: float or ndarray
-
dclab.features.emodulus.pxcorr.
corr_deform_with_volume
(volume, px_um=0.34)[source]¶ Deformation correction for volume-deform data
The contour in RT-DC measurements is computed on a pixelated grid. Due to sampling problems, the measured deformation is overestimated and must be corrected.
The correction is derived in scripts/pixelation_correction.py.
Parameters: Returns: deform_delta – Error of the deformation of the event(s) that must be subtracted from deform. deform_corr = deform - deform_delta
Return type: float or ndarray
-
dclab.features.emodulus.pxcorr.
get_pixelation_delta
(feat_corr, feat_absc, data_absc, px_um=0.34)[source]¶ Convenience function for obtaining pixelation correction
Parameters:
-
dclab.features.emodulus.pxcorr.
get_pixelation_delta_pair
(feat1, feat2, data1, data2, px_um=0.34)[source]¶ Convenience function that returns pixelation correction pair
Scale conversion applicable to a linear elastic model
-
dclab.features.emodulus.scale_linear.
convert
(area_um, deform, channel_width_in, channel_width_out, emodulus=None, flow_rate_in=None, flow_rate_out=None, viscosity_in=None, viscosity_out=None, inplace=False)[source]¶ convert area-deformation-emodulus triplet
The conversion formula is described in [Mietke2015].
Parameters: - area_um (ndarray) – Convex cell area [µm²]
- deform (ndarray) – Deformation
- channel_width_in (float) – Original channel width [µm]
- channel_width_out (float) – Target channel width [µm]
- emodulus (ndarray) – Young’s Modulus [kPa]
- flow_rate_in (float) – Original flow rate [µL/s]
- flow_rate_out (float) – Target flow rate [µL/s]
- viscosity_in (float) – Original viscosity [mPa*s]
- viscosity_out (float or ndarray) – Target viscosity [mPa*s]; This can be an array
- inplace (bool) – If True, override input arrays with corrected data
Returns: - area_um_corr (ndarray) – Corrected cell area [µm²]
- deform_corr (ndarray) – Deformation (a copy if inplace is False)
- emodulus_corr (ndarray) – Corrected emodulus [kPa]; only returned if emodulus is given.
Notes
If only area_um, deform, channel_width_in and channel_width_out are given, then only the area is corrected and returned together with the original deform. If all other arguments are not set to None, the emodulus is corrected and returned as well.
-
dclab.features.emodulus.scale_linear.
scale_area_um
(area_um, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]¶ Perform scale conversion for area_um (linear elastic model)
The area scales with the characteristic length “channel radius” L according to (L’/L)².
The conversion formula is described in [Mietke2015].
Parameters: Returns: area_um_corr – Scaled area [µm²]
Return type: ndarray
-
dclab.features.emodulus.scale_linear.
scale_emodulus
(emodulus, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]¶ Perform scale conversion for area_um (linear elastic model)
The conversion formula is described in [Mietke2015].
Parameters: - emodulus (ndarray) – Young’s Modulus [kPa]
- channel_width_in (float) – Original channel width [µm]
- channel_width_out (float) – Target channel width [µm]
- flow_rate_in (float) – Original flow rate [µL/s]
- flow_rate_out (float) – Target flow rate [µL/s]
- viscosity_in (float) – Original viscosity [mPa*s]
- viscosity_out (float or ndarray) – Target viscosity [mPa*s]; This can be an array
- inplace (bool) – If True, override input arrays with corrected data
Returns: emodulus_corr – Scaled emodulus [kPa]
Return type: ndarray
-
dclab.features.emodulus.scale_linear.
scale_feature
(feat, data, inplace=False, **scale_kw)[source]¶ Convenience function for scale conversions (linear elastic model)
This method wraps around all the other scale_* methods and also supports deform/circ.
Parameters:
-
dclab.features.emodulus.scale_linear.
scale_volume
(volume, channel_width_in, channel_width_out, inplace=False, **kwargs)[source]¶ Perform scale conversion for volume (linear elastic model)
The volume scales with the characteristic length “channel radius” L according to (L’/L)³.
Parameters: Returns: volume_corr – Scaled volume [µm³]
Return type: ndarray
Viscosity computation for various media
-
dclab.features.emodulus.viscosity.
get_viscosity
(medium='CellCarrier', channel_width=20.0, flow_rate=0.16, temperature=23.0)[source]¶ Returns the viscosity for RT-DC-specific media
Media that are not pure (e.g. ketchup or polymer solutions) often exhibit a non-linear relationship between shear rate (determined by the velocity profile) and shear stress (determined by pressure differences). If the shear stress grows non-linearly with the shear rate resulting in a slope in log-log space that is less than one, then we are talking about shear thinning. The viscosity is not a constant anymore (as it is e.g. for water). At higher flow rates, the viscosity becomes smaller, following a power law. Christoph Herold characterized shear thinning for the CellCarrier media [Herold2017]. The resulting formulae for computing the viscosities of these media at different channel widths, flow rates, and temperatures, are implemented here.
Parameters: - medium (str) – The medium to compute the viscosity for; Valid values
are defined in
KNOWN_MEDIA
. - channel_width (float) – The channel width in µm
- flow_rate (float) – Flow rate in µL/s
- temperature (float or ndarray) – Temperature in °C
Returns: viscosity – Viscosity in mPa*s
Return type: float or ndarray
Notes
- CellCarrier and CellCarrier B media are optimized for RT-DC measurements.
- Values for the viscosity of water are computed using equation (15) from [Kestin_1978].
- A
TemperatureOutOfRangeWarning
is issued if the input temperature range exceeds the temperature ranges given by [Herold2017] and [Kestin_1978].
- medium (str) – The medium to compute the viscosity for; Valid values
are defined in
-
dclab.features.emodulus.viscosity.
KNOWN_MEDIA
= ['CellCarrier', 'CellCarrierB', 'water']¶ Media for which computation of viscosity is defined
fluorescence¶
-
dclab.features.fl_crosstalk.
correct_crosstalk
(fl1, fl2, fl3, fl_channel, ct21=0, ct31=0, ct12=0, ct32=0, ct13=0, ct23=0)[source]¶ Perform crosstalk correction
Parameters: - fli (int, float, or np.ndarray) – Measured fluorescence signals
- fl_channel (int (1, 2, or 3)) – The channel number for which the crosstalk-corrected signal should be computed
- cij (float) – Spill (crosstalk or bleed-through) from channel i to channel j This spill is computed from the fluorescence signal of e.g. single-stained positive control cells; It is defined by the ratio of the fluorescence signals of the two channels, i.e cij = flj / fli.
See also
get_compensation_matrix()
- compute the inverse crosstalk matrix
Notes
If there are only two channels (e.g. fl1 and fl2), then the crosstalk to and from the other channel (ct31, ct32, ct13, ct23) should be set to zero.
-
dclab.features.fl_crosstalk.
get_compensation_matrix
(ct21, ct31, ct12, ct32, ct13, ct23)[source]¶ Compute crosstalk inversion matrix
The spillover matrix is
| c11 c12 c13 || c21 c22 c23 || c31 c32 c33 |The diagonal elements are set to 1, i.e.
ct11 = c22 = c33 = 1
Parameters: cij (float) – Spill from channel i to channel j Returns: inv – Compensation matrix (inverted spillover matrix) Return type: np.ndarray
isoelastics¶
Isoelastics management
-
class
dclab.isoelastics.
Isoelastics
(paths=[])[source]¶ Isoelasticity line management
Changed in version 0.24.0: The isoelasticity lines of the analytical model [Mietke2015] and the linear-elastic numerical model [Mokbel2017] were recomputed with an equidistant spacing. The metadata section of the text file format was restructured.
-
add
(isoel, col1, col2, channel_width, flow_rate, viscosity, method)[source]¶ Add isoelastics
Parameters: - isoel (list of ndarrays) – Each list item resembles one isoelastic line stored as an array of shape (N,3). The last column contains the emodulus data.
- col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
- col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
- channel_width (float) – Channel width in µm
- flow_rate (float) – Flow rate through the channel in µL/s
- viscosity (float) – Viscosity of the medium in mPa*s
- method (str) – The method used to compute the isoelastics (must be one of VALID_METHODS).
Notes
The following isoelastics are automatically added for user convenience:
- isoelastics with col1 and col2 interchanged
- isoelastics for circularity if deformation was given
-
static
add_px_err
(isoel, col1, col2, px_um, inplace=False)[source]¶ Undo pixelation correction
Since isoelasticity lines are usually computed directly from the simulation data (e.g. the contour data are not discretized on a grid but are extracted from FEM simulations), they are not affected by pixelation effects as described in [Herold2017].
If the isoelasticity lines are displayed alongside experimental data (which are affected by pixelation effects), then the lines must be “un”-corrected, i.e. the pixelation error must be added to the lines to match the experimental data.
Parameters: - isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
- col2 (col1,) – Define the fist two columns of each isoelasticity line.
- px_um (float) – Pixel size [µm]
-
static
convert
(isoel, col1, col2, channel_width_in, channel_width_out, flow_rate_in, flow_rate_out, viscosity_in, viscosity_out, inplace=False)[source]¶ Perform isoelastics scale conversion
Parameters: - isoel (list of 2d ndarrays of shape (N, 3)) – Each item in the list corresponds to one isoelasticity line. The first column is defined by col1, the second by col2, and the third column is the emodulus.
- col2 (col1,) – Define the fist to columns of each isoelasticity line. One of [“area_um”, “circ”, “deform”]
- channel_width_in (float) – Original channel width [µm]
- channel_width_out (float) – Target channel width [µm]
- flow_rate_in (float) – Original flow rate [µL/s]
- flow_rate_out (float) – Target flow rate [µL/s]
- viscosity_in (float) – Original viscosity [mPa*s]
- viscosity_out (float) – Target viscosity [mPa*s]
Returns: isoel_scale – The scale-converted isoelasticity lines.
Return type: list of 2d ndarrays of shape (N, 3)
Notes
If only the positions of the isoelastics are of interest and not the value of the elastic modulus, then it is sufficient to supply values for the channel width and set the values for flow rate and viscosity to a constant (e.g. 1).
See also
dclab.features.emodulus.scale_linear.scale_feature()
- scale conversion method used
-
get
(col1, col2, method, channel_width, flow_rate=None, viscosity=None, add_px_err=False, px_um=None)[source]¶ Get isoelastics
Parameters: - col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
- col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
- method (str) – The method used to compute the isoelastics (must be one of VALID_METHODS).
- channel_width (float) – Channel width in µm
- flow_rate (float or None) – Flow rate through the channel in µL/s. If set to None, the flow rate of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
- viscosity (float or None) – Viscosity of the medium in mPa*s. If set to None, the flow rate of the imported data will be used (only do this if you do not need the correct values for elastic moduli).
- add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py
- px_um (float) – Pixel size [µm], used for pixelation error computation
See also
dclab.features.emodulus.scale_linear.scale_feature()
- scale conversion method used
dclab.features.emodulus.pxcorr.get_pixelation_delta()
- pixelation correction (applied to the feature data)
-
get_with_rtdcbase
(col1, col2, method, dataset, viscosity=None, add_px_err=False)[source]¶ Convenience method that extracts the metadata from RTDCBase
Parameters: - col1 (str) – Name of the first feature of all isoelastics (e.g. isoel[0][:,0])
- col2 (str) – Name of the second feature of all isoelastics (e.g. isoel[0][:,1])
- method (str) – The method used to compute the isoelastics (must be one of VALID_METHODS).
- dataset (dclab.rtdc_dataset.RTDCBase) – The dataset from which to obtain the metadata.
- viscosity (float, None, or False) – Viscosity of the medium in mPa*s. If set to None, the viscosity is computed from the meta data (medium, flow rate, channel width, temperature) in the [setup] config section. If this is not possible, the flow rate of the imported data is used and a warning will be issued.
- add_px_err (bool) – If True, add pixelation errors according to C. Herold (2017), https://arxiv.org/abs/1704.00572 and scripts/pixelation_correction.py
-
kde_contours¶
-
dclab.kde_contours.
find_contours_level
(density, x, y, level, closed=False)[source]¶ Find iso-valued density contours for a given level value
Parameters: - density (2d ndarray of shape (M, N)) – Kernel density estimate for which to compute the contours
- x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
- y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
- level (float between 0 and 1) – Value along which to find contours in density relative to its maximum
Returns: contours – Contours found for the given level value
Return type: list of ndarrays of shape (P, 2)
See also
skimage.measure.find_contours()
- Contour finding algorithm used
-
dclab.kde_contours.
get_quantile_levels
(density, x, y, xp, yp, q, normalize=True)[source]¶ Compute density levels for given quantiles by interpolation
For a given 2D density, compute the density levels at which the resulting contours contain the fraction 1-q of all data points. E.g. for a measurement of 1000 events, all contours at the level corresponding to a quantile of q=0.95 (95th percentile) contain 50 events (5%).
Parameters: - density (2d ndarray of shape (M, N)) – Kernel density estimate for which to compute the contours
- x (2d ndarray of shape (M, N) or 1d ndarray of size M) – X-values corresponding to density
- y (2d ndarray of shape (M, N) or 1d ndarray of size M) – Y-values corresponding to density
- xp (1d ndarray of size D) – Event x-data from which to compute the quantile
- yp (1d ndarray of size D) – Event y-data from which to compute the quantile
- q (array_like or float between 0 and 1) – Quantile along which to find contours in density relative to its maximum
- normalize (bool) – Whether output levels should be normalized to the maximum of density
Returns: level – Contours level(s) corresponding to the given quantile
Return type: np.ndarray or float
Notes
NaN-values events in xp and yp are ignored.
kde_methods¶
Kernel Density Estimation methods
-
dclab.kde_methods.
bin_num_doane
(a)[source]¶ Compute number of bins based on Doane’s formula
Notes
If the bin width cannot be determined, then a bin number of 5 is returned.
See also
bin_width_doane()
- method used to compute the bin width
-
dclab.kde_methods.
bin_width_doane
(a)[source]¶ Compute contour spacing based on Doane’s formula
References
- https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width
- https://stats.stackexchange.com/questions/55134/doanes-formula-for-histogram-binning
Notes
Doane’s formula is actually designed for histograms. This function is kept here for backwards-compatibility reasons. It is highly recommended to use
bin_width_percentile()
instead.
-
dclab.kde_methods.
bin_width_percentile
(a)[source]¶ Compute contour spacing based on data percentiles
The 10th and the 90th percentile of the input data are taken. The spacing then computes to the difference between those two percentiles divided by 23.
Notes
The Freedman–Diaconis rule uses the interquartile range and normalizes to the third root of len(a). Such things do not work very well for RT-DC data, because len(a) is huge. Here we use just the top and bottom 10th percentiles with a fixed normalization.
-
dclab.kde_methods.
ignore_nan_inf
(kde_method)[source]¶ Ignores nans and infs from the input data
Invalid positions in the resulting density are set to nan.
-
dclab.kde_methods.
kde_gauss
(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]¶ Gaussian Kernel Density Estimation
Parameters: - events_y (events_x,) – The input points for kernel density estimation. Input is flattened automatically.
- yout (xout,) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
Returns: density – The KDE for the points in (xout, yout)
Return type: ndarray, same shape as xout
See also
scipy.stats.gaussian_kde
Notes
This is a wrapped version that ignores nan and inf values.
-
dclab.kde_methods.
kde_histogram
(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]¶ Histogram-based Kernel Density Estimation
Parameters: - events_y (events_x,) – The input points for kernel density estimation. Input is flattened automatically.
- yout (xout,) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
- bins (tuple (binsx, binsy)) – The number of bins to use for the histogram.
Returns: density – The KDE for the points in (xout, yout)
Return type: ndarray, same shape as xout
See also
numpy.histogram2d scipy.interpolate.RectBivariateSpline
Notes
This is a wrapped version that ignores nan and inf values.
-
dclab.kde_methods.
kde_multivariate
(events_x, events_y, xout=None, yout=None, *args, **kwargs)[source]¶ Multivariate Kernel Density Estimation
Parameters: Returns: density – The KDE for the points in (xout, yout)
Return type: ndarray, same shape as xout
See also
statsmodels.nonparametric.kernel_density.KDEMultivariate
Notes
This is a wrapped version that ignores nan and inf values.
-
dclab.kde_methods.
kde_none
(events_x, events_y, xout=None, yout=None)[source]¶ No Kernel Density Estimation
Parameters: - events_y (events_x,) – The input points for kernel density estimation. Input is flattened automatically.
- yout (xout,) – The coordinates at which the KDE should be computed. If set to none, input coordinates are used.
Returns: density – The KDE for the points in (xout, yout)
Return type: ndarray, same shape as xout
Notes
This method is a convenience method that always returns ones in the shape that the other methods in this module produce.
polygon_filter¶
-
class
dclab.polygon_filter.
PolygonFilter
(axes=None, points=None, inverted=False, name=None, filename=None, fileid=0, unique_id=None)[source]¶ An object for filtering RTDC data based on a polygonial area
Parameters: - axes (tuple of str) – The axes/features on which the polygon is defined. The first axis is the x-axis. Example: (“area_um”, “deform”).
- points (array-like object of shape (N,2)) – The N coordinates (x,y) of the polygon. The exact order is important.
- inverted (bool) – Invert the polygon filter. This parameter is overridden if filename is given.
- name (str) – A name for the polygon (optional).
- filename (str) – A path to a .poly file as create by this classes’ save method. If filename is given, all other parameters are ignored.
- fileid (int) – Which filter to import from the file (starting at 0).
- unique_id (int) – An integer defining the unique id of the new instance.
Notes
The minimal arguments to this class are either filename OR (axes, points). If filename is set, all parameters are taken from the given .poly file.
-
copy
(invert=False)[source]¶ Return a copy of the current instance
Parameters: invert (bool) – The copy will be inverted w.r.t. the original
-
static
get_instance_from_id
(unique_id)[source]¶ Get an instance of the PolygonFilter using a unique id
-
static
import_all
(path)[source]¶ Import all polygons from a .poly file.
Returns a list of the imported polygon filters
-
static
point_in_poly
(p, poly)[source]¶ Determine whether a point is within a polygon area
Uses the ray casting algorithm.
Parameters: - p (tuple of floats) – Coordinates of the point
- poly (array_like of shape (N, 2)) – Polygon (PolygonFilter.points)
Returns: inside – True, if point is inside.
Return type: Notes
If p lies on a side of the polygon, it is defined as
- “inside” if it is on the lower or left
- “outside” if it is on the top or right
Changed in version 0.24.1: The new version uses the cython implementation from scikit-image. In the old version, the inside/outside definition was the other way around. In favor of not having to modify upstram code, the scikit-image version was adapted.
-
save
(polyfile, ret_fobj=False)[source]¶ Save all data to a text file (appends data if file exists).
Polyfile can be either a path to a file or a file object that was opened with the write “w” parameter. By using the file object, multiple instances of this class can write their data.
If ret_fobj is True, then the file object will not be closed and returned.
-
hash
¶ Hash of axes, points, and inverted
-
instances
= [<dclab.polygon_filter.PolygonFilter object>]¶
-
points
¶
statistics¶
Statistics computation for RT-DC dataset instances
-
class
dclab.statistics.
Statistics
(name, method, req_feature=False)[source]¶ A helper class for computing statistics
All statistical methods are registered in the dictionary Statistics.available_methods.
-
get_feature
(ds, feat)[source]¶ Return filtered feature data
The features are filtered according to the user-defined filters, using the information in ds.filter.all. In addition, all nan and inf values are purged.
Parameters: - ds (dclab.rtdc_dataset.RTDCBase) – The dataset containing the feature
- feat (str) – The name of the feature; must be a scalar feature
-
available_methods
= {'%-gated': <dclab.statistics.Statistics object>, 'Events': <dclab.statistics.Statistics object>, 'Flow rate': <dclab.statistics.Statistics object>, 'Mean': <dclab.statistics.Statistics object>, 'Median': <dclab.statistics.Statistics object>, 'Mode': <dclab.statistics.Statistics object>, 'SD': <dclab.statistics.Statistics object>}¶
-
-
dclab.statistics.
get_statistics
(ds, methods=None, features=None)[source]¶ Compute statistics for an RT-DC dataset
Parameters: - ds (dclab.rtdc_dataset.RTDCBase) – The dataset for which to compute the statistics.
- methods (list of str or None) – The methods wih which to compute the statistics. The list of available methods is given with dclab.statistics.Statistics.available_methods.keys() If set to None, statistics for all methods are computed.
- features (list of str) – Feature name identifiers are defined by dclab.definitions.feature_exists. If set to None, statistics for all scalar features available are computed.
Returns: - header (list of str) – The header (feature + method names) of the computed statistics.
- values (list of float) – The computed statistics.
-
dclab.statistics.
mode
(data)[source]¶ Compute an intelligent value for the mode
The most common value in experimental is not very useful if there are a lot of digits after the comma. This method approaches this issue by rounding to bin size that is determined by the Freedman–Diaconis rule.
Parameters: data (1d ndarray) – The data for which the mode should be computed. Returns: mode – The mode computed with the Freedman-Diaconis rule. Return type: float