singlet.dataset.cluster

class singlet.dataset.cluster.Cluster(dataset)[source]

Bases: singlet.dataset.plugins.Plugin

Cluster samples, features, and phenotypes

affinity_propagation(axis, phenotypes=(), metric='correlation', log_features=False)[source]

Affinity/label/message propagation.

Parameters:
  • axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
  • phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
  • metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the distance (squared); otherwise it must be a string accepted by scipy.spatial.distance.pdist.
  • log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
Returns:

dict with the linkage, distance matrix, and ordering.
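
The metric and log_features parameters describe how distances are prepared before clustering. The following is a minimal sketch of that preprocessing using plain NumPy, pandas and SciPy; it is not singlet's implementation. The function name distance_matrix, the pseudocount of 0.1, the base-10 log, and the layout of the toy table (features as rows, samples as columns) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Toy counts table (assumed layout: rows are features, columns are samples).
counts = pd.DataFrame(
    [[10, 12, 0, 1],
     [0, 1, 20, 18],
     [5, 4, 6, 5]],
    index=['geneA', 'geneB', 'geneC'],
    columns=['s1', 's2', 's3', 's4'],
    dtype=float,
)


def distance_matrix(counts, axis='samples', metric='correlation',
                    log_features=False, pseudocount=0.1):
    """Hypothetical helper: pairwise distances between samples or features."""
    data = counts.copy()
    if log_features:
        # Add a pseudocount so that zero counts do not give -inf
        data = np.log10(data + pseudocount)
    # pdist compares rows, so transpose when the samples are being clustered
    if axis == 'samples':
        data = data.T
    elif axis != 'features':
        raise ValueError("axis must be 'samples' or 'features'")
    condensed = pdist(data.values, metric=metric)
    # Expand the condensed vector into a square, labelled matrix
    return pd.DataFrame(squareform(condensed),
                        index=data.index, columns=data.index)


dist = distance_matrix(counts, axis='samples', log_features=True)
```

Here dist is a symmetric 4 x 4 table indexed by sample name, with zeros on the diagonal.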

dbscan(axis, phenotypes=(), **kwargs)[source]

Density-Based Spatial Clustering of Applications with Noise.

Parameters:
  • axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
  • phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
  • log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
  • **kwargs – arguments passed to sklearn.cluster.DBSCAN.
Returns:

pd.Series with the labels of the clusters.
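
Since the keyword arguments are forwarded to sklearn.cluster.DBSCAN, the underlying behaviour can be sketched with scikit-learn alone. This is an illustration under assumptions, not singlet's code: the helper dbscan_labels and the toy data are hypothetical, and eps and min_samples are ordinary DBSCAN parameters.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Toy data (rows are the items to cluster): two tight groups and one outlier.
points = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [20.0, 20.0]],
    index=['s1', 's2', 's3', 's4', 's5', 's6', 's7'],
    columns=['f1', 'f2'],
)


def dbscan_labels(data, **kwargs):
    """Hypothetical helper: run DBSCAN and return labels as a pd.Series."""
    model = DBSCAN(**kwargs)
    labels = model.fit_predict(data.values)
    return pd.Series(labels, index=data.index, name='dbscan')


labels = dbscan_labels(points, eps=0.5, min_samples=2)
```

DBSCAN assigns the label -1 to points it treats as noise, so the isolated item s7 ends up outside both clusters.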

hierarchical(axis, phenotypes=(), metric='correlation', method='average', log_features=False, optimal_ordering=False)[source]

Hierarchical clustering.

Parameters:
  • axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
  • phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
  • metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the distance (squared); otherwise it must be a string accepted by scipy.spatial.distance.pdist.
  • method (string) – Clustering method. Must be a string accepted by scipy.cluster.hierarchy.linkage.
  • log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
  • optimal_ordering (bool) – Whether to reorder the linkage so that the distance between neighbouring leaves is as short as possible. This may take longer than the clustering itself.
Returns:

dict with the linkage, distance matrix, and ordering.
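
The roles of metric, method and optimal_ordering can be illustrated directly with SciPy. This is a sketch under assumptions rather than singlet's implementation: the helper hierarchical_sketch, the toy data, and the dictionary keys 'linkage', 'distance' and 'leaves' are hypothetical and may differ from what the method actually returns.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist, squareform

# Toy data (rows are the items to cluster):
# three increasing profiles followed by three decreasing ones.
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.5],
    [1.0, 2.0, 3.5, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 4.0, 2.5, 1.0],
    [4.0, 3.5, 2.0, 1.0],
])


def hierarchical_sketch(data, metric='correlation', method='average',
                        optimal_ordering=False):
    """Hypothetical helper: linkage, square distance matrix, leaf order."""
    condensed = pdist(data, metric=metric)
    Z = linkage(condensed, method=method, optimal_ordering=optimal_ordering)
    return {
        'linkage': Z,                       # (n - 1) x 4 linkage matrix
        'distance': squareform(condensed),  # n x n distance matrix
        'leaves': leaves_list(Z),           # row order along the dendrogram
    }


result = hierarchical_sketch(data, optimal_ordering=True)
```

With correlation distance the increasing and decreasing profiles form two well-separated groups, so each group occupies a contiguous block in result['leaves'].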

kmeans(n_clusters, axis, phenotypes=(), random_state=0)[source]

K-Means clustering.

Parameters:
  • n_clusters (int) – The number of clusters to form.
  • axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
  • phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
  • log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
  • random_state (int) – Set to the same int for deterministic results.
Returns:

pd.Series with the labels of the clusters.
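
The effect of random_state can be shown with scikit-learn's KMeans. This is a sketch assuming a scikit-learn backend, which the text above does not confirm; the helper kmeans_labels, the toy data and the n_init value are hypothetical choices.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data (rows are the items to cluster): two well-separated groups.
points = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
    index=['s1', 's2', 's3', 's4', 's5', 's6'],
    columns=['f1', 'f2'],
)


def kmeans_labels(data, n_clusters, random_state=0):
    """Hypothetical helper: run K-Means and return labels as a pd.Series."""
    model = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    labels = model.fit_predict(data.values)
    return pd.Series(labels, index=data.index, name='kmeans')


first = kmeans_labels(points, n_clusters=2, random_state=0)
second = kmeans_labels(points, n_clusters=2, random_state=0)
```

Because both calls use the same random_state, first and second contain identical labels. The numbering of the clusters is arbitrary, so downstream code should compare cluster membership rather than rely on specific label values.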