singlet.dataset

class singlet.dataset.Dataset(counts_table=None, samplesheet=None, featuresheet=None)

Bases: object

Collection of cells, with feature counts and metadata

compare(other, features='mapped', phenotypes=(), method='kolmogorov-smirnov')

Statistically compare with another Dataset.

Parameters:
  • other (Dataset) – The Dataset to compare with.
  • features (list, string, or None) – Features to compare. The string ‘total’ means all features, including spike-ins and ‘other’ features; ‘mapped’ means all features excluding spike-ins and ‘other’ features; ‘spikeins’ means only spike-ins; ‘other’ means only ‘other’ features. If an empty list or None, features are not compared (useful for a purely phenotypic comparison).
  • phenotypes (list of strings) – Phenotypes to compare.
  • method (string or function) – Statistical test to use for the comparison. If a string, it must be one of ‘kolmogorov-smirnov’ or ‘mann-whitney’. If a function, it must accept two arrays as arguments (one per Dataset, each with one value per sample) and return a P-value for the comparison.
Returns:

A pandas.DataFrame containing the P-values of the comparisons for all features and phenotypes.
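A minimal sketch of a per-feature comparison, assuming two count tables laid out like Dataset.counts (features as rows, samples as columns). It uses SciPy's ks_2samp and mannwhitneyu for the two named methods; whether singlet calls these internally is an assumption, and phenotype comparison is left out. The helper compare_features is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, mannwhitneyu


def compare_features(counts1, counts2, method="kolmogorov-smirnov"):
    """Hypothetical sketch: one P-value per feature shared by two tables.

    Both tables have features as rows and samples as columns.
    """
    if method == "kolmogorov-smirnov":
        def test(a, b):
            return ks_2samp(a, b).pvalue
    elif method == "mann-whitney":
        def test(a, b):
            return mannwhitneyu(a, b, alternative="two-sided").pvalue
    elif callable(method):
        test = method  # user function: (array, array) -> P-value
    else:
        raise ValueError(f"Unknown method: {method!r}")

    shared = counts1.index.intersection(counts2.index)
    # Each feature's row runs over the samples of its own dataset.
    pvalues = [test(counts1.loc[f].to_numpy(), counts2.loc[f].to_numpy())
               for f in shared]
    return pd.DataFrame({"P-value": pvalues}, index=shared)


features = ["geneA", "geneB"]
ds1 = pd.DataFrame([np.arange(20), np.arange(20)], index=features,
                   columns=[f"a{i}" for i in range(20)])
ds2 = pd.DataFrame([np.arange(20), np.arange(100, 120)], index=features,
                   columns=[f"b{i}" for i in range(20)])
result = compare_features(ds1, ds2)
```

Here geneA has identical distributions in both tables and geneB is fully shifted, so only geneB should get a small P-value. A custom test can be passed as any function of two arrays, e.g. method=lambda a, b: scipy.stats.ttest_ind(a, b).pvalue.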

copy()

Copy of the Dataset including a new SampleSheet and CountsTable

counts

Matrix of gene expression counts.

Rows are features, columns are samples.

Note: if you reset this matrix with features that are not in the featuresheet, or with samples that are not in the samplesheet, those tables will be reset to empty.
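Because reassigning counts can silently drop metadata, it may help to check beforehand which names lack a metadata row. The helper below is a hypothetical sketch in plain pandas, not part of singlet; it only reports the names that would trigger the reset described above.

```python
import pandas as pd


def names_missing_from_sheets(counts, samplesheet, featuresheet):
    """Hypothetical helper: return (features, samples) in `counts` that
    have no row in the corresponding metadata sheet.

    counts has features as rows and samples as columns; each sheet is
    indexed by feature or sample names respectively.
    """
    missing_features = counts.index.difference(featuresheet.index)
    missing_samples = counts.columns.difference(samplesheet.index)
    return missing_features, missing_samples


counts = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                      index=["g1", "g2", "g3"], columns=["s1", "s2"])
featuresheet = pd.DataFrame({"GO": ["a", "b"]}, index=["g1", "g2"])
samplesheet = pd.DataFrame({"type": ["x", "y"]}, index=["s1", "s2"])
missing_features, missing_samples = names_missing_from_sheets(
    counts, samplesheet, featuresheet)
```

In this example "g3" has no feature metadata, so per the note above the featuresheet would be reset, while the samplesheet would be kept.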

featuremetadatanames

pandas.Index of feature metadata column names

featurenames

pandas.Index of feature names

featuresheet

Matrix of feature metadata.

Rows are features, columns are metadata (e.g. Gene Ontologies).

n_features

Number of features

n_samples

Number of samples
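The name and size properties above can be pictured on plain pandas tables. This sketch assumes the three tables are aligned (featuresheet indexed by feature names, samplesheet by sample names); the mapping of each property to a pandas attribute is our reading of the descriptions, not singlet's source.

```python
import pandas as pd

# Toy tables laid out like a Dataset: counts has features as rows and
# samples as columns; each sheet is indexed by the matching names.
counts = pd.DataFrame(
    [[0, 3, 1], [5, 2, 0]],
    index=["geneA", "geneB"],
    columns=["cell1", "cell2", "cell3"],
)
samplesheet = pd.DataFrame({"cell_type": ["T", "B", "T"]}, index=counts.columns)
featuresheet = pd.DataFrame({"GO_term": ["immune", "metabolism"]}, index=counts.index)

# Assumed correspondence with the Dataset properties:
featurenames = counts.index                   # featurenames
samplenames = counts.columns                  # samplenames
featuremetadatanames = featuresheet.columns   # featuremetadatanames
samplemetadatanames = samplesheet.columns     # samplemetadatanames
n_features, n_samples = counts.shape          # n_features, n_samples
```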

query_features_by_counts(expression, inplace=False, local_dict=None)

Select features based on their expression.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, used to resolve variables referenced with the @var syntax in your expression. By far the most common usage of this argument is local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.
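Since the expression is handed to pandas.DataFrame.query, the underlying mechanism can be sketched directly in pandas. Assuming the query runs on the counts table as documented (features as rows), the expression refers to sample names as columns, and @threshold is looked up in local_dict. How singlet wraps the result into a new Dataset is not shown.

```python
import pandas as pd

counts = pd.DataFrame(
    {"cell1": [0, 10, 3], "cell2": [1, 8, 0]},
    index=["geneA", "geneB", "geneC"],
)

# Rows are features, so column names in the expression are sample names.
# '@threshold' is resolved from the dictionary passed as local_dict.
selected = counts.query(
    "cell1 > @threshold and cell2 > @threshold",
    local_dict={"threshold": 2},
)
```

Only geneB exceeds the threshold in both cells, so selected keeps that single row.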

query_features_by_metadata(expression, inplace=False, local_dict=None)

Select features based on metadata.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, used to resolve variables referenced with the @var syntax in your expression. By far the most common usage of this argument is local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.
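Selecting by metadata implies two steps: filter the featuresheet with the query, then keep the same rows of the counts so both tables stay aligned. The helper select_features_by_metadata is a hypothetical pandas sketch of that idea, not singlet's implementation; the biotype column is an illustrative assumption.

```python
import pandas as pd

counts = pd.DataFrame(
    {"cell1": [0, 10, 3], "cell2": [1, 8, 0]},
    index=["geneA", "geneB", "geneC"],
)
featuresheet = pd.DataFrame(
    {"biotype": ["protein_coding", "lncRNA", "protein_coding"]},
    index=counts.index,
)


def select_features_by_metadata(counts, featuresheet, expression, local_dict=None):
    """Hypothetical helper: filter feature metadata, then keep the same
    rows of the counts so both tables stay aligned."""
    keep = featuresheet.query(expression, local_dict=local_dict).index
    return counts.loc[keep], featuresheet.loc[keep]


sub_counts, sub_features = select_features_by_metadata(
    counts, featuresheet, "biotype == @kind",
    local_dict={"kind": "protein_coding"},
)
```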

query_samples_by_counts(expression, inplace=False, local_dict=None)

Select samples based on gene expression.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, used to resolve variables referenced with the @var syntax in your expression. By far the most common usage of this argument is local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.
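To filter samples on expression with DataFrame.query, the samples have to be rows, so a pandas sketch transposes the counts first; feature names then act as columns in the expression. Whether singlet transposes in exactly this way is an assumption. In recent pandas versions, names that are not valid Python identifiers (such as HLA-A) must be wrapped in backticks.

```python
import pandas as pd

counts = pd.DataFrame(
    {"cell1": [0, 4, 7], "cell2": [1, 8, 0], "cell3": [6, 0, 2]},
    index=["geneA", "HLA-A", "geneC"],
)

# Transpose so samples become rows and feature names become columns.
by_sample = counts.T
selected = by_sample.query(
    "geneA > 0 and `HLA-A` >= @min_hla",  # backticks quote 'HLA-A'
    local_dict={"min_hla": 5},
)
# Keep the matching samples (columns) of the original counts table.
counts_subset = counts.loc[:, selected.index]
```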

query_samples_by_metadata(expression, inplace=False, local_dict=None)

Select samples based on metadata.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, used to resolve variables referenced with the @var syntax in your expression. By far the most common usage of this argument is local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.
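The recommendation to pass local_dict=locals() is easiest to see with a wrapper. DataFrame.query resolves @variables in the frame that calls it; when that call happens inside a helper, a variable local to the user's function is not visible unless it is passed in. The wrapper below is a hypothetical pandas sketch mirroring the documented parameters; we assume, without having checked singlet's source, that the Dataset methods behave similarly.

```python
import pandas as pd

counts = pd.DataFrame(
    {"cell1": [0, 4], "cell2": [1, 8], "cell3": [6, 0]},
    index=["geneA", "geneB"],
)
samplesheet = pd.DataFrame(
    {"cell_type": ["T", "B", "T"], "n_genes": [1200, 300, 900]},
    index=counts.columns,
)


def select_samples_by_metadata(counts, samplesheet, expression, local_dict=None):
    """Hypothetical helper: filter sample metadata, then keep the same
    samples (columns) of the counts.

    Without local_dict, '@name' would be looked up in this helper's frame
    and module globals, not in the caller's local variables.
    """
    keep = samplesheet.query(expression, local_dict=local_dict).index
    return counts.loc[:, keep], samplesheet.loc[keep]


def good_t_cells():
    min_genes = 500  # local to this function, invisible to the helper
    return select_samples_by_metadata(
        counts, samplesheet,
        "cell_type == 'T' and n_genes >= @min_genes",
        local_dict=locals(),  # hands min_genes to the query
    )


sub_counts, sub_sheet = good_t_cells()
```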

samplemetadatanames

pandas.Index of sample metadata column names

samplenames

pandas.Index of sample names

samplesheet

Matrix of sample metadata.

Rows are samples, columns are metadata (e.g. phenotypes).

split(phenotypes, copy=True)

Split Dataset based on one or more categorical phenotypes

Parameters:
  • phenotypes (string or list of strings) – One or more phenotypes to use for the split. Each unique combination of values of these phenotypes defines one of the resulting Datasets.
Returns:

A dict of Datasets. The keys are the unique values of the chosen phenotype or, if more than one phenotype is given, tuples of their unique value combinations.

Return type: dict of Datasets
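The key convention of split can be sketched with pandas.DataFrame.groupby on the samplesheet: grouping by a single column name yields scalar keys, while grouping by several columns yields tuple keys. The helper split_counts is hypothetical, returns sub-tables of the counts rather than Datasets, and (like groupby by default) drops samples whose phenotype value is missing; singlet's handling of such samples is not stated here.

```python
import pandas as pd


def split_counts(counts, samplesheet, phenotypes):
    """Hypothetical helper: one counts sub-table per unique phenotype value
    (single phenotype) or per unique value combination (several)."""
    if isinstance(phenotypes, str):
        phenotypes = [phenotypes]
    # A single column name gives scalar keys; a list gives tuple keys.
    by = phenotypes[0] if len(phenotypes) == 1 else list(phenotypes)
    return {key: counts.loc[:, group.index]
            for key, group in samplesheet.groupby(by)}


counts = pd.DataFrame(
    {"cell1": [0, 4], "cell2": [1, 8], "cell3": [6, 0]},
    index=["geneA", "geneB"],
)
samplesheet = pd.DataFrame(
    {"cell_type": ["T", "B", "T"], "batch": [1, 1, 2]},
    index=counts.columns,
)

by_type = split_counts(counts, samplesheet, "cell_type")
by_type_and_batch = split_counts(counts, samplesheet, ["cell_type", "batch"])
```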