singlet.dataset.cluster

class singlet.dataset.cluster.Cluster(dataset)

Bases: singlet.dataset.plugins.Plugin

Cluster samples, features, and phenotypes.

affinity_propagation(axis, phenotypes=(), metric='correlation', log_features=False)

Affinity/label/message propagation.

Parameters:

axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the distance (squared). Otherwise it must be a string accepted by scipy.spatial.distance.pdist.
log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.

Returns:

dict with the linkage, distance matrix, and ordering.
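
As a rough illustration of the technique, the sketch below runs affinity propagation on a precomputed correlation distance using scipy and scikit-learn directly. It is not singlet's implementation: the count matrix, the pseudocount of 0.1, and the choice to negate distances to obtain similarities are all assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AffinityPropagation

# Hypothetical count matrix: 6 samples (rows) x 5 features (columns).
counts = np.array([
    [90, 80, 5, 3, 1],
    [85, 95, 4, 6, 2],
    [99, 70, 8, 2, 3],
    [2, 4, 70, 90, 85],
    [5, 1, 95, 80, 70],
    [3, 6, 80, 75, 99],
], dtype=float)

# Log transform with an assumed pseudocount (mirrors log_features=True).
logged = np.log10(counts + 0.1)

# Pairwise correlation distance between samples, as a square matrix.
distance = squareform(pdist(logged, metric='correlation'))

# Affinity propagation works on similarities, so negate the distances.
model = AffinityPropagation(affinity='precomputed', random_state=0)
labels = model.fit_predict(-distance)
```

Here `labels` holds one cluster index per sample; the number of clusters is chosen by the algorithm rather than set in advance.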

dbscan(axis, phenotypes=(), **kwargs)

Density-Based Spatial Clustering of Applications with Noise.

Parameters:

axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
**kwargs – arguments passed to sklearn.cluster.DBSCAN.

Returns:

pd.Series with the labels of the clusters.
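
The following sketch shows how keyword arguments such as `eps` and `min_samples` reach sklearn.cluster.DBSCAN and how the result can be wrapped in a pd.Series. It calls scikit-learn directly on a made-up data frame; the sample names and parameter values are assumptions for illustration, not singlet defaults.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical data: two tight groups of samples plus one outlier.
data = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [20.0, 20.0]],
    index=['s1', 's2', 's3', 's4', 's5', 's6', 's7'],
)

# Keyword arguments like these would be forwarded to DBSCAN unchanged.
kwargs = {'eps': 0.5, 'min_samples': 2}
model = DBSCAN(**kwargs).fit(data.values)

# One label per sample; DBSCAN marks noise points with -1.
labels = pd.Series(model.labels_, index=data.index)
```

In this toy data, `s7` has no neighbour within `eps`, so it is labelled -1 (noise) rather than assigned to a cluster.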

hierarchical(axis, phenotypes=(), metric='correlation', method='average', log_features=False, optimal_ordering=False)

Hierarchical clustering.

Parameters:

axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the distance (squared). Otherwise it must be a string accepted by scipy.spatial.distance.pdist.
method (string) – Clustering method. Must be a string accepted by scipy.cluster.hierarchy.linkage.
log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
optimal_ordering (bool) – Whether to re-sort the linkage so that neighbouring leaves are as close to each other as possible. This may take longer than the clustering itself.

Returns:

dict with the linkage, distance matrix, and ordering.
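
To make the roles of metric, method, and optimal_ordering concrete, here is a minimal sketch using scipy alone. It is an approximation of the workflow described above, not singlet's code: the count matrix, the pseudocount of 0.1, and the dictionary keys are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical count matrix: 6 samples (rows) x 5 features (columns).
counts = np.array([
    [90, 80, 5, 3, 1],
    [85, 95, 4, 6, 2],
    [99, 70, 8, 2, 3],
    [2, 4, 70, 90, 85],
    [5, 1, 95, 80, 70],
    [3, 6, 80, 75, 99],
], dtype=float)

# Log transform with an assumed pseudocount (mirrors log_features=True).
logged = np.log10(counts + 0.1)

# Condensed pairwise distances using a metric name accepted by pdist.
condensed = pdist(logged, metric='correlation')

# Average linkage; optimal_ordering reorders leaves to minimise the
# distance between successive leaves, at extra computational cost.
Z = linkage(condensed, method='average', optimal_ordering=True)

# Hypothetical result layout: linkage, square distance matrix, leaf order.
result = {
    'linkage': Z,
    'distance': squareform(condensed),
    'leaves': leaves_list(Z),
}
```

The leaf order in `result['leaves']` is the row order one would use to draw a dendrogram or a clustered heatmap.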

kmeans(n_clusters, axis, phenotypes=(), random_state=0)

K-Means clustering.

Parameters:

n_clusters (int) – The number of clusters to form.
axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
random_state (int) – Seed for the random number generator. Use the same int across runs to get deterministic results.

Returns:

pd.Series with the labels of the clusters.
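
The effect of random_state can be checked with a short sketch that calls sklearn.cluster.KMeans directly. The data frame, the helper name `cluster_samples`, and the explicit `n_init=10` are assumptions for the example and are not taken from singlet.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of three samples each.
data = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
    index=['s1', 's2', 's3', 's4', 's5', 's6'],
)


def cluster_samples(n_clusters, random_state):
    """Run K-Means and return one cluster label per sample."""
    model = KMeans(
        n_clusters=n_clusters,
        random_state=random_state,
        n_init=10,
    ).fit(data.values)
    return pd.Series(model.labels_, index=data.index)


# The same seed gives identical labels on repeated runs.
first = cluster_samples(n_clusters=2, random_state=0)
second = cluster_samples(n_clusters=2, random_state=0)
```

Cluster numbers themselves are arbitrary, so comparisons between runs with different seeds should look at which samples group together rather than at the label values.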
