singlet.dataset.cluster
class singlet.dataset.cluster.Cluster(dataset)
Bases: singlet.dataset.plugins.Plugin

Cluster samples, features, and phenotypes.
-
affinity_propagation(axis, phenotypes=(), metric='correlation', log_features=False)
Affinity/label/message propagation.
Parameters:
- axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
- phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
- metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the (squared) distance matrix; otherwise it must be a metric name accepted by scipy.spatial.distance.pdist.
- log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
Returns: dict with the linkage, distance matrix, and ordering.
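For intuition, the following sketch clusters the samples of a small count table by affinity propagation, using scikit-learn's AffinityPropagation on negated pairwise distances. It is a hypothetical illustration, not singlet's implementation: the function name affinity_propagation_sketch, the features × samples layout of the input (mirroring Dataset.counts), the 0.1 pseudocount and the choice of scikit-learn are all assumptions, and it returns only a pd.Series of labels rather than the dict described above.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AffinityPropagation


def affinity_propagation_sketch(counts, axis="samples", metric="correlation",
                                log_features=False):
    """Hypothetical illustration, not singlet's implementation.

    Assumes `counts` is a features x samples DataFrame, like Dataset.counts.
    """
    data = counts
    if log_features:
        # Assumed pseudocount of 0.1 before the log10 transform.
        data = np.log10(data + 0.1)
    # pdist treats rows as observations: transpose to cluster samples.
    obs = data.T if axis == "samples" else data
    dist = squareform(pdist(obs.values, metric=metric))
    # Affinity propagation works on similarities, so negate the distances.
    model = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = model.fit_predict(-dist)
    return pd.Series(labels, index=obs.index)


# Toy table: samples s1-s3 express f1/f2, samples s4-s6 express f3/f4.
counts = pd.DataFrame(
    [[10, 12, 11, 0, 1, 0],
     [9, 11, 10, 1, 0, 1],
     [0, 1, 0, 10, 12, 11],
     [1, 0, 1, 9, 11, 10]],
    index=["f1", "f2", "f3", "f4"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6"],
)
labels = affinity_propagation_sketch(counts, axis="samples")
print(labels)
```

Affinity propagation picks the number of clusters itself; in scikit-learn this is steered by the preference parameter, which defaults to the median of the similarity matrix.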
-
dbscan(axis, phenotypes=(), **kwargs)
Density-Based Spatial Clustering of Applications with Noise.
Parameters:
- axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
- phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
- log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
- **kwargs – Additional keyword arguments passed to sklearn.cluster.DBSCAN.
Returns: pd.Series with the labels of the clusters.
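The sketch below shows the general idea: the objects to cluster are put on the rows and any extra keyword arguments (here eps and min_samples) are forwarded to sklearn.cluster.DBSCAN, as the **kwargs parameter describes. The function name dbscan_sketch and the features × samples input layout are assumptions made for illustration; this is not singlet's code.

```python
import pandas as pd
from sklearn.cluster import DBSCAN


def dbscan_sketch(counts, axis="samples", **kwargs):
    """Hypothetical illustration of DBSCAN on a features x samples table."""
    # DBSCAN clusters rows, so put the objects to cluster on the rows.
    obs = counts.T if axis == "samples" else counts
    # Keyword arguments such as eps and min_samples go straight to DBSCAN.
    labels = DBSCAN(**kwargs).fit_predict(obs.values)
    # DBSCAN marks points outside every dense region with the label -1.
    return pd.Series(labels, index=obs.index)


# Two tight groups of samples (s1-s3, s4-s6) plus one isolated sample, s7.
counts = pd.DataFrame(
    [[10, 12, 11, 0, 1, 0, 50],
     [9, 11, 10, 1, 0, 1, 0],
     [0, 1, 0, 10, 12, 11, 50],
     [1, 0, 1, 9, 11, 10, 0]],
    index=["f1", "f2", "f3", "f4"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6", "s7"],
)
labels = dbscan_sketch(counts, axis="samples", eps=5, min_samples=2)
print(labels.value_counts())
```

Because noise points get the label -1, labels.value_counts() gives a quick summary of both cluster sizes and the number of unassigned objects.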
-
hierarchical(axis, phenotypes=(), metric='correlation', method='average', log_features=False, optimal_ordering=False)
Hierarchical clustering.
Parameters:
- axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
- phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
- metric (string or matrix) – Metric used to calculate the distance matrix. If a matrix is passed, it is used directly as the (squared) distance matrix; otherwise it must be a metric name accepted by scipy.spatial.distance.pdist.
- method (string) – Clustering method. Must be a string accepted by scipy.cluster.hierarchy.linkage.
- log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
- optimal_ordering (bool) – Whether to reorder the linkage so that the distance between successive leaves is minimal. This can take longer than the clustering itself.
Returns: dict with the linkage, distance matrix, and ordering.
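A minimal sketch of the same pipeline with SciPy: pairwise distances from pdist, a linkage from scipy.cluster.hierarchy.linkage (which accepts method and optimal_ordering), and the dendrogram leaf order from leaves_list. The function name hierarchical_sketch, the features × samples input layout and the dictionary keys are hypothetical; singlet's actual return dict may use different keys.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist, squareform


def hierarchical_sketch(counts, axis="samples", metric="correlation",
                        method="average", optimal_ordering=False):
    """Hypothetical illustration; the dict keys below are assumptions."""
    obs = counts.T if axis == "samples" else counts
    # Condensed distance vector, the input format scipy's linkage() expects.
    dvec = pdist(obs.values, metric=metric)
    z = linkage(dvec, method=method, optimal_ordering=optimal_ordering)
    # Dendrogram leaf order, mapped back to sample (or feature) names.
    order = obs.index[leaves_list(z)]
    return {
        "linkage": z,
        "distance": pd.DataFrame(squareform(dvec), index=obs.index,
                                 columns=obs.index),
        "leaves": list(order),
    }


counts = pd.DataFrame(
    [[10, 12, 11, 0, 1, 0],
     [9, 11, 10, 1, 0, 1],
     [0, 1, 0, 10, 12, 11],
     [1, 0, 1, 9, 11, 10]],
    index=["f1", "f2", "f3", "f4"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6"],
)
result = hierarchical_sketch(counts, axis="samples", optimal_ordering=True)
print(result["leaves"])
```

The leaf order is what one would typically use to sort the rows or columns of a heatmap; optimal_ordering=True only changes this order, not which objects end up in the same subtree.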
-
kmeans(n_clusters, axis, phenotypes=(), random_state=0)
K-Means clustering.
Parameters:
- n_clusters (int) – Number of clusters to form.
- axis (string) – It must be ‘samples’ or ‘features’. The Dataset.counts matrix is used and either samples or features are clustered.
- phenotypes (iterable of strings) – Phenotypes to add to the features for joint clustering.
- log_features (bool) – Whether to add pseudocounts and take a log of the feature counts before calculating distances.
- random_state (int) – Seed for the random number generator; pass the same integer to get deterministic results.
Returns: pd.Series with the labels of the clusters.
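As an illustration of how n_clusters and random_state interact, here is a sketch built on sklearn.cluster.KMeans. The function name kmeans_sketch and the features × samples input layout are assumptions, and n_init is set explicitly only because its default differs between scikit-learn versions; this is not singlet's implementation.

```python
import pandas as pd
from sklearn.cluster import KMeans


def kmeans_sketch(counts, n_clusters, axis="samples", random_state=0):
    """Hypothetical illustration of k-means on a features x samples table."""
    obs = counts.T if axis == "samples" else counts
    # A fixed random_state makes the centroid initialisation reproducible.
    model = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    labels = model.fit_predict(obs.values)
    return pd.Series(labels, index=obs.index)


counts = pd.DataFrame(
    [[10, 12, 11, 0, 1, 0],
     [9, 11, 10, 1, 0, 1],
     [0, 1, 0, 10, 12, 11],
     [1, 0, 1, 9, 11, 10]],
    index=["f1", "f2", "f3", "f4"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6"],
)
labels = kmeans_sketch(counts, n_clusters=2, axis="samples")
print(labels)
```

Unlike affinity propagation and DBSCAN, k-means needs the number of clusters up front and assigns every object to some cluster; there is no noise label.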
-