singlet.dataset
-
class singlet.dataset.Dataset(counts_table=None, samplesheet=None, featuresheet=None)
Bases: object
Collection of cells, with feature counts and metadata.
-
compare(other, features='mapped', phenotypes=(), method='kolmogorov-smirnov')
Statistically compare with another Dataset.
Parameters:
- other (Dataset) – The Dataset to compare with.
- features (list, string, or None) – Features to compare. The string 'total' means all features, including spike-ins and other; 'mapped' means all features excluding spike-ins and other; 'spikeins' means only spike-ins; and 'other' means only the 'other' features. If an empty list or None, features are not compared (useful for a purely phenotypic comparison).
- phenotypes (list of strings) – Phenotypes to compare.
- method (string or function) – Statistical test to use for the comparison. If a string, it must be either 'kolmogorov-smirnov' or 'mann-whitney'. If a function, it must accept two arrays as arguments (one for each dataset, running over the samples) and return a P-value for the comparison.
Returns: A pandas.DataFrame containing the P-values of the comparisons for all features and phenotypes.
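The contract for a callable method (two arrays in, one P-value out) can be sketched as follows. This is only an illustration: the function name permutation_pvalue, the difference-in-means statistic, and the number of permutations are assumptions made here, not part of singlet.

```python
import random


def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Hypothetical helper: accepts two array-likes (one per dataset,
    running over the samples) and returns a P-value.
    """
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    observed = abs(sum(x) / len(x) - sum(y) / len(y))

    pooled = x + y
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_x = pooled[:len(x)]
        perm_y = pooled[len(x):]
        diff = abs(sum(perm_x) / len(perm_x) - sum(perm_y) / len(perm_y))
        if diff >= observed:
            n_extreme += 1

    # Add-one correction keeps the estimate strictly positive
    return (n_extreme + 1) / (n_permutations + 1)
```

Under these assumptions, such a function could be passed as method=permutation_pvalue when calling compare on two Dataset instances; the extra keyword arguments have defaults, so it can still be called with just the two arrays.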
-
counts
Matrix of gene expression counts.
Rows are features, columns are samples.
Note: If you set this matrix to one with features that are not in the featuresheet or samples that are not in the samplesheet, those tables will be reset to empty.
-
featuremetadatanames
pandas.Index of feature metadata column names.
-
featurenames
pandas.Index of feature names.
-
featuresheet
Matrix of feature metadata.
Rows are features, columns are metadata (e.g. Gene Ontologies).
-
n_features
Number of features.
-
n_samples
Number of samples.
-
query_features_by_counts(expression, inplace=False, local_dict=None)
Select features based on their expression.
Parameters:
- expression (string) – An expression compatible with pandas.DataFrame.query.
- inplace (bool) – Whether to change the Dataset in place or return a new one.
- local_dict (dict) – A dictionary of local variables, useful if your expression refers to Python variables with the @var syntax. By far the most common usage of this argument is to set local_dict=locals().
Returns: None if inplace is True, otherwise a new Dataset.
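To illustrate the expression syntax and the role of local_dict, here is a minimal sketch that calls pandas.DataFrame.query directly on a small features-by-samples table. It assumes the expression is evaluated against a table whose columns are the samples. The table, the sample and feature names, and the threshold variable are invented for illustration, and singlet itself is not used.

```python
import pandas as pd

# Toy counts table: rows are features, columns are samples (made-up data)
counts = pd.DataFrame(
    {"cell_1": [0, 12, 150], "cell_2": [3, 0, 90], "cell_3": [1, 25, 200]},
    index=["GeneA", "GeneB", "GeneC"],
)

threshold = 10

# "@threshold" refers to a Python variable rather than a column;
# local_dict tells pandas where to look that variable up.
selected = counts.query(
    "cell_1 >= @threshold and cell_3 >= @threshold",
    local_dict={"threshold": threshold},
)

print(list(selected.index))  # ['GeneB', 'GeneC']
```

Passing local_dict=locals() instead of an explicit dictionary exposes every variable in the calling scope to the expression, which is why it is the usual shorthand.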
-
query_features_by_metadata(expression, inplace=False, local_dict=None)
Select features based on metadata.
Parameters:
- expression (string) – An expression compatible with pandas.DataFrame.query.
- inplace (bool) – Whether to change the Dataset in place or return a new one.
- local_dict (dict) – A dictionary of local variables, useful if your expression refers to Python variables with the @var syntax. By far the most common usage of this argument is to set local_dict=locals().
Returns: None if inplace is True, otherwise a new Dataset.
-
query_samples_by_counts(expression, inplace=False, local_dict=None)
Select samples based on gene expression.
Parameters:
- expression (string) – An expression compatible with pandas.DataFrame.query.
- inplace (bool) – Whether to change the Dataset in place or return a new one.
- local_dict (dict) – A dictionary of local variables, useful if your expression refers to Python variables with the @var syntax. By far the most common usage of this argument is to set local_dict=locals().
Returns: None if inplace is True, otherwise a new Dataset.
-
query_samples_by_metadata(expression, inplace=False, local_dict=None)
Select samples based on metadata.
Parameters:
- expression (string) – An expression compatible with pandas.DataFrame.query.
- inplace (bool) – Whether to change the Dataset in place or return a new one.
- local_dict (dict) – A dictionary of local variables, useful if your expression refers to Python variables with the @var syntax. By far the most common usage of this argument is to set local_dict=locals().
Returns: None if inplace is True, otherwise a new Dataset.
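A metadata expression typically mixes string and numeric conditions. The sketch below shows such an expression evaluated with pandas.DataFrame.query on a toy samples-by-metadata table. The column names tissue and n_reads and all values are assumptions made for illustration; the example uses plain pandas rather than a Dataset.

```python
import pandas as pd

# Toy samplesheet: rows are samples, columns are metadata (made-up data)
samplesheet = pd.DataFrame(
    {
        "tissue": ["blood", "blood", "liver", "liver"],
        "n_reads": [120000, 8000, 95000, 150000],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# String values are quoted inside the expression; conditions are
# combined with "and" / "or".
good_blood = samplesheet.query('tissue == "blood" and n_reads > 50000')

print(list(good_blood.index))  # ['cell_1']
```

The same expression string is the kind of argument that would be given as expression to the method above.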
-
samplemetadatanames
pandas.Index of sample metadata column names.
-
samplenames
pandas.Index of sample names.
-
samplesheet
Matrix of sample metadata.
Rows are samples, columns are metadata (e.g. phenotypes).
-
split(phenotypes, copy=True)
Split the Dataset based on one or more categorical phenotypes.
Parameters:
- phenotypes (string or list of strings) – One or more phenotypes to use for the split. The unique values, or unique combinations of values, of these phenotypes determine the split Datasets.
Returns: A dict whose keys are either the unique values of the chosen phenotype or, if more than one phenotype is given, tuples of unique combinations.
Return type: dict of Datasets
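The shape of the returned keys can be illustrated with a pandas groupby over a toy samplesheet. This is a sketch of the key structure only, not singlet's implementation: the helper split_sample_names is hypothetical, the data are invented, and the values here are lists of sample names rather than Datasets.

```python
import pandas as pd

# Toy samplesheet: rows are samples, columns are phenotypes (made-up data)
samplesheet = pd.DataFrame(
    {
        "tissue": ["blood", "blood", "liver", "liver"],
        "treated": ["yes", "no", "yes", "yes"],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)


def split_sample_names(samplesheet, phenotypes):
    """Hypothetical helper: map each split key to its sample names."""
    if isinstance(phenotypes, str):
        by = phenotypes          # one phenotype: keys are plain values
    elif len(phenotypes) == 1:
        by = phenotypes[0]       # assumption: treat a 1-item list the same
    else:
        by = list(phenotypes)    # several phenotypes: keys are tuples
    return {key: list(group.index) for key, group in samplesheet.groupby(by)}


single = split_sample_names(samplesheet, "tissue")
combined = split_sample_names(samplesheet, ["tissue", "treated"])

print(sorted(single))    # ['blood', 'liver']
print(sorted(combined))  # [('blood', 'no'), ('blood', 'yes'), ('liver', 'yes')]
```

Only combinations that actually occur among the samples appear as keys, so ('liver', 'no') is absent in this toy example.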
-