singlet.dataset.feature_selection

class singlet.dataset.feature_selection.FeatureSelection(dataset)

Bases: singlet.dataset.plugins.Plugin

Select features (e.g. genes) of single-cell data for downstream analysis.

expressed(n_samples, exp_min, inplace=False)

Select features that are expressed in at least some samples.

Parameters:
  • n_samples (int) – Minimum number of samples the features should be expressed in.
  • exp_min (float) – Minimum level of expression of the features.
  • inplace (bool) – Whether to change the feature list in place.
Returns:

pd.Index of selected features if not inplace, else None.
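As a rough illustration of the selection rule, here is a minimal pandas sketch. It assumes a counts table with features as rows and samples as columns, and it assumes that "expressed" means a value of at least exp_min (the library may use a strict inequality). expressed_features is a hypothetical helper, not part of singlet.

```python
import pandas as pd


def expressed_features(counts: pd.DataFrame, n_samples: int, exp_min: float) -> pd.Index:
    """Hypothetical helper: features with expression >= exp_min in >= n_samples samples."""
    # For each feature (row), count the samples (columns) that reach exp_min
    n_expressing = (counts >= exp_min).sum(axis=1)
    return counts.index[n_expressing.to_numpy() >= n_samples]


counts = pd.DataFrame(
    {"s1": [0, 5, 10], "s2": [0, 5, 0], "s3": [1, 6, 0]},
    index=["g1", "g2", "g3"],
)
print(list(expressed_features(counts, n_samples=2, exp_min=5)))  # ['g2']
```

Here g3 reaches the threshold in only one sample, so it is dropped unless n_samples is lowered to 1.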

gate_features_from_statistics(features='mapped', x='mean', y='cv', **kwargs)

Select features for downstream analysis with a gate.

Usage: left-click to set the vertices of a polygon, double-left-click to close the shape, and right-click to reset the plot.

Parameters:
  • features (list or string) – List of features to plot. The string ‘mapped’ means all features excluding spike-ins and ‘other’ features; ‘all’ means all features, including spike-ins and ‘other’ features.
  • x (string) – Statistic to plot on the x axis.
  • y (string) – Statistic to plot on the y axis.
  • **kwargs – Keyword arguments passed to the plot function.
Returns:

pd.Index of features within the gate.
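Once the polygon is closed, gating reduces to a point-in-polygon test on per-feature statistics. The sketch below skips the interactive plotting and assumes the polygon vertices are already known. feature_stats, points_in_polygon and gate_features are hypothetical helpers, and the mean/CV definitions (sample standard deviation, ddof=1) are assumptions rather than singlet's exact statistics.

```python
import numpy as np
import pandas as pd


def feature_stats(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: per-feature mean and coefficient of variation (features as rows)."""
    mean = counts.mean(axis=1)
    # Sample standard deviation (pandas default ddof=1); NaN for all-zero features
    cv = counts.std(axis=1) / mean
    return pd.DataFrame({"mean": mean, "cv": cv})


def points_in_polygon(x, y, vertices) -> np.ndarray:
    """Even-odd (ray casting) test; vertices is a list of (x, y) pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through each point?
        crosses = (y1 > y) != (y2 > y)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        # Toggle points whose rightward ray crosses this edge
        inside ^= crosses & (x < x_cross)
    return inside


def gate_features(stats: pd.DataFrame, vertices, x="mean", y="cv") -> pd.Index:
    """Hypothetical: features whose (x, y) statistics fall inside the polygon."""
    mask = points_in_polygon(stats[x].to_numpy(), stats[y].to_numpy(), vertices)
    return stats.index[mask]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
stats = pd.DataFrame({"mean": [5.0, 15.0, 5.0], "cv": [5.0, 5.0, -1.0]}, index=["a", "b", "c"])
print(list(gate_features(stats, square)))  # ['a']
```

The even-odd rule counts how many polygon edges a horizontal ray from each point crosses; an odd count means the point is inside. Features with a NaN statistic never cross an edge and therefore fall outside the gate.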

overdispersed_strata(bins=10, n_features_per_stratum=50, inplace=False)

Select overdispersed features in strata of increasing expression.

Parameters:
  • bins (int or list) – Bin edges determining the strata. If an int, the expression range is split into this many equally spaced bins between the minimal and maximal expression.
  • n_features_per_stratum (int) – Number of features to select per stratum.
  • inplace (bool) – Whether to change the feature list in place.
Returns:

pd.Index of selected features if not inplace, else None.

Note that fewer features than expected may be selected if some strata have no dispersion (e.g. only dropouts). For this reason, it is recommended to restrict the counts to expressed features (e.g. with expressed) before calling this function.
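The stratified selection can be sketched as follows, assuming that overdispersion is measured by the coefficient of variation, that strata are defined on mean expression, and that features without dispersion are skipped. These are assumptions; singlet's own dispersion measure may differ. overdispersed_strata_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def overdispersed_strata_sketch(counts: pd.DataFrame, bins=10, n_features_per_stratum=50) -> pd.Index:
    """Hypothetical: highest-CV features within strata of mean expression."""
    mean = counts.mean(axis=1)
    cv = counts.std(axis=1) / mean  # NaN for all-zero features (dropouts)
    if np.isscalar(bins):
        # `bins` equally spaced strata between minimal and maximal mean expression
        edges = np.linspace(mean.min(), mean.max(), int(bins) + 1)
    else:
        edges = np.asarray(bins, dtype=float)
    # Stratum index per feature; features beyond the outer edges land in
    # the first or last stratum
    strata = np.searchsorted(edges[1:-1], mean.to_numpy(), side="right")
    selected = []
    for s in range(len(edges) - 1):
        cv_s = cv[strata == s].dropna()
        cv_s = cv_s[cv_s > 0]  # skip features without dispersion
        selected.extend(cv_s.nlargest(n_features_per_stratum).index)
    return pd.Index(selected)


counts = pd.DataFrame(
    [[1, 1, 1], [0, 1, 2], [9, 10, 11], [5, 10, 15]],
    index=["g1", "g2", "g3", "g4"],
)
print(list(overdispersed_strata_sketch(counts, bins=2, n_features_per_stratum=1)))  # ['g2', 'g4']
```

Asking for 5 features per stratum here returns only 3, because g1 has no dispersion. This is the shortfall described in the note above.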

sam(k=None, distance='correlation', *args, **kwargs)

Calculate feature weights via self-assembling manifolds (SAM).

Parameters:
  • k (int or None) – The number of nearest neighbors for each sample.
  • distance (str) – The distance metric (e.g. ‘correlation’).
  • *args, **kwargs – Arguments passed to SAM.run.

Returns:

SAM instance whose feature weights are stored in SAM.output_vars[‘gene_weights’].

See also: https://github.com/atarashansky/self-assembling-manifold
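The following is a simplified, single-pass illustration of the idea behind SAM's feature weights, loosely following the published method. Each sample's expression is averaged over its k nearest neighbors (here under correlation distance), and each feature is then weighted by the dispersion (variance / mean) of that smoothed expression, normalized to [0, 1]. This is not SAM itself: SAM.run iterates to convergence, works on a reduced representation of weighted expression and has its own defaults, so use it for real analyses. sam_like_weights is a hypothetical name.

```python
import numpy as np


def sam_like_weights(X, k):
    """Hypothetical one-pass SAM-style weighting.

    X is a features x samples array; returns one weight in [0, 1] per feature.
    """
    X = np.asarray(X, dtype=float)
    # Correlation distance between samples (columns): 1 - Pearson r
    dist = 1.0 - np.corrcoef(X, rowvar=False)
    # Indices of each sample's k nearest neighbors (the sample itself included)
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # kNN-averaged expression: column j is the mean over sample j's neighbors
    X_avg = X[:, nbrs].mean(axis=2)
    # Per-feature dispersion (variance / mean) of the smoothed expression
    mean = X_avg.mean(axis=1)
    var = X_avg.var(axis=1)
    disp = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    # Normalize so the most dispersed feature gets weight 1
    top = disp.max()
    return disp / top if top > 0 else disp


# Two groups of 5 samples: features 0 and 1 mark the groups, feature 2 is flat
X = np.array([[10] * 5 + [1] * 5, [1] * 5 + [10] * 5, [5] * 10])
print(sam_like_weights(X, k=5))  # ≈ [1, 1, 0]
```

Features that vary between neighborhoods get high weights, while a feature that is flat across all samples gets weight 0.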

unique(inplace=False)

Select features with unique IDs.

Parameters:
  • inplace (bool) – Whether to change the feature list in place.
Returns:

pd.Index of selected features if not inplace, else None.
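The description does not say whether a repeated ID is dropped entirely or kept once. The minimal pandas sketch below assumes the first reading and drops every copy of a repeated ID. If the library instead keeps the first occurrence, pd.Index.drop_duplicates() would be the analogue. unique_features is a hypothetical helper.

```python
import pandas as pd


def unique_features(features: pd.Index) -> pd.Index:
    """Hypothetical: keep only IDs that occur exactly once."""
    # keep=False marks every copy of a repeated ID as a duplicate
    return features[~features.duplicated(keep=False)]


print(list(unique_features(pd.Index(["a", "b", "a", "c"]))))  # ['b', 'c']
```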