singlet.dataset

class singlet.dataset.Dataset(counts_table=None, samplesheet=None, featuresheet=None, dataset=None, plugins=None)[source]

Bases: object

Collection of cells, with feature counts and metadata

average(axis, column)[source]

Average samples or features based on metadata

Parameters:
  • axis (string) – Must be ‘samples’ or ‘features’.
  • column (string) – Must be a column of the samplesheet (for axis=‘samples’) or of the featuresheet (for axis=‘features’). Samples or features with a common value in this column are averaged together.
Returns:

A Dataset with the averaged counts.

Note: if you average over samples, you get an empty samplesheet. Similarly, if you average over features, you get an empty featuresheet.
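
The grouping logic described above can be illustrated without the library. This is a minimal sketch in plain Python, not singlet's implementation: `average_by_group`, the nested-dict layout of `counts`, and the `cell_type` mapping (standing in for one samplesheet column) are all assumptions made for the example.

```python
from collections import defaultdict


def average_by_group(counts, groups):
    """Average count columns that share a group label.

    counts: dict mapping feature name -> {sample name: count}
    groups: dict mapping sample name -> metadata value
            (hypothetical stand-in for one samplesheet column)
    """
    averaged = {}
    for feature, row in counts.items():
        sums = defaultdict(float)
        sizes = defaultdict(int)
        for sample, value in row.items():
            label = groups[sample]
            sums[label] += value
            sizes[label] += 1
        # One averaged column per distinct metadata value
        averaged[feature] = {label: sums[label] / sizes[label] for label in sums}
    return averaged


counts = {
    "geneA": {"cell1": 2.0, "cell2": 4.0, "cell3": 9.0},
    "geneB": {"cell1": 0.0, "cell2": 1.0, "cell3": 5.0},
}
cell_type = {"cell1": "T", "cell2": "T", "cell3": "B"}
averaged = average_by_group(counts, cell_type)
```

The result has one column per distinct value of the metadata column, which is why the per-sample metadata can no longer be carried along after averaging over samples.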

bootstrap(groupby=None)[source]

Resample with replacement, aka bootstrap dataset

Parameters:
  • groupby (str or list of str or None) – If None, bootstrap random samples disregarding sample metadata. If a string or a list of strings, bootstrap over groups of samples with consistent entries for that/those columns.
Returns:

A Dataset with the resampled samples.
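
As a rough illustration of resampling with replacement, the sketch below works on sample names only. It is not the library's code: `bootstrap_samples` and the `patient` mapping are hypothetical, and the grouped branch reflects one reading of "bootstrap over groups", namely that whole groups are drawn with replacement and each drawn group contributes all of its samples.

```python
import random
from collections import defaultdict


def bootstrap_samples(samplenames, groups=None, rng=None):
    """Return a list of sample names resampled with replacement.

    groups: optional dict mapping sample name -> group label
            (hypothetical stand-in for a samplesheet column)
    """
    rng = rng or random.Random()
    if groups is None:
        # Plain bootstrap: draw as many samples as there are, with replacement
        return rng.choices(samplenames, k=len(samplenames))

    # Assumed group bootstrap: draw group labels with replacement,
    # then keep every sample of each drawn group
    by_group = defaultdict(list)
    for name in samplenames:
        by_group[groups[name]].append(name)
    labels = list(by_group)
    picked = rng.choices(labels, k=len(labels))
    resampled = []
    for label in picked:
        resampled.extend(by_group[label])
    return resampled


names = ["cell1", "cell2", "cell3", "cell4"]
patient = {"cell1": "P1", "cell2": "P1", "cell3": "P2", "cell4": "P2"}
plain = bootstrap_samples(names, rng=random.Random(0))
grouped = bootstrap_samples(names, groups=patient, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the names drawn depend on the seed.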

compare(other, features='mapped', phenotypes=(), method='kolmogorov-smirnov')[source]

Statistically compare with another Dataset.

Parameters:
  • other (Dataset) – The Dataset to compare with.
  • features (list, string, or None) – Features to compare. The string ‘total’ means all features including spikeins and other, ‘mapped’ means all features excluding spikeins and other, ‘spikeins’ means only spikeins, and ‘other’ means only ‘other’ features. If empty list or None, do not compare features (useful for phenotypic comparison).
  • phenotypes (list of strings) – Phenotypes to compare.
  • method (string or function) – Statistical test to use for the comparison. If a string it must be one of ‘kolmogorov-smirnov’ or ‘mann-whitney’. If a function, it must accept two arrays as arguments (one for each dataset, running over the samples) and return a P-value for the comparison.
Returns:

A pandas.DataFrame containing the P-values of the comparisons for all features and phenotypes.
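
When method is a function, it only needs to take two arrays and return a P-value. The following is a minimal sketch of such a function, using a two-sided permutation test on the difference of means. The name `permutation_pvalue`, its `n_permutations` and `seed` parameters, and the choice of test are assumptions for illustration, not something the library prescribes.

```python
import random


def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    x, y: sequences of values, one entry per sample in each dataset.
    Returns an estimated P-value in (0, 1].
    """
    x, y = list(x), list(y)
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    hits = 0
    for _ in range(n_permutations):
        # Reassign samples to the two groups at random
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_permutations + 1)


p_same = permutation_pvalue([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
p_diff = permutation_pvalue([0, 1, 0, 1, 0, 1, 0, 1],
                            [10, 11, 10, 11, 10, 11, 10, 11])
```

Given the signature documented above, a function like this would presumably be passed as `method=permutation_pvalue`.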

copy()[source]

Copy of the Dataset

counts

Matrix of gene expression counts.

Rows are features, columns are samples.

Notice: If you reset this matrix with features that are not in the featuresheet or samples that are not in the samplesheet, those tables will be reset to empty.

featuremetadatanames

pandas.Index of feature metadata column names

featurenames

pandas.Index of feature names

featuresheet

Matrix of feature metadata.

Rows are features, columns are metadata (e.g. Gene Ontologies).

n_features

Number of features

n_samples

Number of samples

query_features_by_counts(expression, inplace=False, local_dict=None)[source]

Select features based on their expression.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, useful if you are using @var assignments in your expression. By far the most common usage of this argument is to set local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.

query_features_by_metadata(expression, inplace=False, local_dict=None)[source]

Select features based on metadata.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, useful if you are using @var assignments in your expression. By far the most common usage of this argument is to set local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.

query_features_by_name(featurenames, inplace=False, ignore_missing=False)[source]

Select features by name.

Parameters:
  • featurenames – names of the features to keep.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • ignore_missing (bool) – Whether to silently skip missing features.

query_samples_by_counts(expression, inplace=False, local_dict=None)[source]

Select samples based on gene expression.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, useful if you are using @var assignments in your expression. By far the most common usage of this argument is to set local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.

query_samples_by_metadata(expression, inplace=False, local_dict=None)[source]

Select samples based on metadata.

Parameters:
  • expression (string) – An expression compatible with pandas.DataFrame.query.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • local_dict (dict) – A dictionary of local variables, useful if you are using @var assignments in your expression. By far the most common usage of this argument is to set local_dict=locals().
Returns:

If inplace is True, None. Else, a Dataset.

query_samples_by_name(samplenames, inplace=False, ignore_missing=False)[source]

Select samples by name.

Parameters:
  • samplenames – names of the samples to keep.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.
  • ignore_missing (bool) – Whether to silently skip missing samples.

reindex(axis, column, drop=False, inplace=False)[source]

Reindex samples or features from a metadata column

Parameters:
  • axis (string) – Must be ‘samples’ or ‘features’.
  • column (string) – Must be a column of the samplesheet (for axis=‘samples’) or of the featuresheet (for axis=‘features’) with unique names of samples or features.
  • drop (bool) – Whether to drop the column from the metadata table.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.

rename(axis, column, inplace=False)[source]

Rename samples or features

Parameters:
  • axis (string) – Must be ‘samples’ or ‘features’.
  • column (string) – Must be a column of the samplesheet (for axis=‘samples’) or of the featuresheet (for axis=‘features’) with unique names of samples or features.
  • inplace (bool) – Whether to change the Dataset in place or return a new one.

DEPRECATED: use reindex instead.

samplemetadatanames

pandas.Index of sample metadata column names

samplenames

pandas.Index of sample names

samplesheet

Matrix of sample metadata.

Rows are samples, columns are metadata (e.g. phenotypes).

split(phenotypes, copy=True)[source]

Split Dataset based on one or more categorical phenotypes

Parameters:
  • phenotypes (string or list of strings) – One or more phenotypes to use for the split. Unique values, or unique combinations of values, of these phenotypes determine the split Datasets.
Returns:

A dict of Datasets. The keys are either unique values of the phenotype chosen or, if more than one, tuples of unique combinations.

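
The key structure of the returned dict can be sketched in plain Python. This is not the library's code: `split_samples` and the dict-based `samplesheet` are assumptions for illustration, and the sketch returns lists of sample names where the real method returns Datasets.

```python
from collections import defaultdict


def split_samples(samplesheet, phenotypes):
    """Group sample names by one or more categorical phenotypes.

    samplesheet: dict mapping sample name -> {phenotype: value}
                 (hypothetical stand-in for the sample metadata table)
    phenotypes:  a string, or a list of strings
    """
    single = isinstance(phenotypes, str)
    groups = defaultdict(list)
    for name, row in samplesheet.items():
        if single:
            # One phenotype: the key is the value itself
            key = row[phenotypes]
        else:
            # Several phenotypes: the key is the tuple of values
            key = tuple(row[p] for p in phenotypes)
        groups[key].append(name)
    return dict(groups)


samplesheet = {
    "cell1": {"type": "T", "sex": "F"},
    "cell2": {"type": "T", "sex": "M"},
    "cell3": {"type": "B", "sex": "F"},
}
by_type = split_samples(samplesheet, "type")
by_both = split_samples(samplesheet, ["type", "sex"])
```

Here `by_type` is keyed by plain values such as "T", while `by_both` is keyed by tuples such as ("T", "F").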
to_dataset_file(filename, fmt=None, **kwargs)[source]

Store dataset into an integrated dataset file

Parameters:
  • filename (str) – path of the file to write to.
  • fmt (str or None) – file format. If None, infer from the file extension.
  • **kwargs (keyword arguments) – depend on the format.

The additional keyword arguments for the supported formats are:

  • loom: axis_samples – ‘rows’ or ‘columns’ (default).