singlet.counts_table.counts_table

class singlet.counts_table.counts_table.CountsTable(data=None, index=None, columns=None, dtype=None, copy=False)[source]

Bases: pandas.core.frame.DataFrame

Table of gene expression counts

  • Rows are features, e.g. genes.
  • Columns are samples.
bin(bins=5, result='index', inplace=False)[source]

Bin feature counts.

Parameters:
  • bins (int, array, or list of arrays) – If an int, the number of equal-width bins between the pseudocount and the maximum of the counts matrix. If an array of the same length as the number of features, use a different number of equal-width bins for each feature. If an array of any other length, use these bin edges (including the rightmost edge) for all features. If a list of arrays, it must be as long as the number of features, and each array in the list sets the bin edges (including the rightmost edge) for the corresponding feature, in order.
  • result (string) – Must be one of ‘index’ (default), ‘left’, ‘center’, ‘right’. ‘index’ assigns to each count the index (starting at 0) of its bin, ‘left’ assigns the left bin edge, ‘center’ the bin center, and ‘right’ the right bin edge. If result is ‘index’, out-of-bounds values are assigned the value -1, which means Not A Number in this context.
  • inplace (bool) – Whether to perform the operation in place.
Returns:

If inplace is False, a CountsTable with the binned counts.
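
As an illustration of the integer bins case with result='index', here is a minimal sketch in plain NumPy and pandas. It is not singlet's implementation: the helper name bin_counts is hypothetical, and it assumes that a value equal to the rightmost edge belongs to the last bin.

```python
import numpy as np
import pandas as pd


def bin_counts(counts, bins=5, pseudocount=0.1):
    """Return the 0-based bin index of every count, or -1 if out of bounds.

    Hypothetical helper, not part of singlet.
    """
    values = counts.values
    # Equal-width edges between the pseudocount and the matrix maximum
    edges = np.linspace(pseudocount, values.max(), bins + 1)
    # np.digitize returns 0 below the first edge, len(edges) at or above the last
    idx = np.digitize(values, edges) - 1
    # Assumption: the rightmost edge is included in the last bin
    idx[values == edges[-1]] = bins - 1
    # Anything outside [edges[0], edges[-1]] is flagged with -1
    idx[(values < edges[0]) | (values > edges[-1])] = -1
    return pd.DataFrame(idx, index=counts.index, columns=counts.columns)


counts = pd.DataFrame(
    [[0.0, 2.0], [5.0, 8.0], [10.1, 3.0]],
    index=['gene_a', 'gene_b', 'gene_c'],
    columns=['sample_1', 'sample_2'],
)
binned = bin_counts(counts, bins=5)
```

The zero count falls below the pseudocount and is therefore reported as -1.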

center(axis='samples', inplace=False)[source]

Center the counts table (subtract mean).

Parameters:
  • axis (string) – The axis to average over; must be ‘samples’ or ‘features’.
  • inplace (bool) – Whether to do the operation in place or return a new CountsTable.
Returns:

If inplace is False, a transformed CountsTable.
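
The two axis choices can be sketched with a plain pandas DataFrame. This assumes that averaging over ‘samples’ yields one mean per feature (row) and averaging over ‘features’ yields one mean per sample (column); the variable names are illustrative only.

```python
import pandas as pd

counts = pd.DataFrame(
    [[1.0, 3.0], [10.0, 20.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2'],
)

# axis='samples': subtract from each feature (row) its mean across samples
centered_over_samples = counts.sub(counts.mean(axis=1), axis=0)

# axis='features': subtract from each sample (column) its mean across features
centered_over_features = counts.sub(counts.mean(axis=0), axis=1)
```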

dataset = None
exclude_features(spikeins=True, other=True, inplace=False, errors='raise')[source]

Get a slice that excludes secondary features.

Parameters:
  • spikeins (bool) – Whether to exclude spike-ins
  • other (bool) – Whether to exclude other features, e.g. unmapped reads
  • inplace (bool) – Whether to drop those features in place.
  • errors (string) – Whether to raise an exception if the features to be excluded are already not present. Must be ‘ignore’ or ‘raise’.
Returns:

a slice of self without those features.

Return type:

CountsTable
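The effect of the errors argument can be sketched with pandas' own DataFrame.drop, which accepts the same ‘raise’ and ‘ignore’ values. The feature names below and the module-level lists are hypothetical stand-ins for whatever singlet reads from its configuration.

```python
import pandas as pd

# Hypothetical feature names, for illustration only
SPIKEINS = ['ERCC-00002', 'ERCC-00003']
OTHER = ['__no_feature', '__ambiguous']


def exclude_features(counts, spikeins=True, other=True, errors='raise'):
    """Return a copy of counts without the secondary features (sketch)."""
    to_drop = []
    if spikeins:
        to_drop.extend(SPIKEINS)
    if other:
        to_drop.extend(OTHER)
    # errors='raise' fails with KeyError if any listed feature is absent
    return counts.drop(index=to_drop, errors=errors)


counts = pd.DataFrame(
    [[12.0, 7.0], [100.0, 90.0], [3.0, 5.0]],
    index=['ACTB', 'ERCC-00002', '__no_feature'],
    columns=['sample_1', 'sample_2'],
)
# Two of the listed features are missing from this table, so 'ignore' is needed
genes_only = exclude_features(counts, errors='ignore')
```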

classmethod from_datasetname(datasetname)[source]

Instantiate a CountsTable from its name in the config file.

Parameters:datasetname (string) – name of the dataset in the config file.
Returns:the counts table.
Return type:CountsTable
classmethod from_tablename(tablename)[source]

Instantiate a CountsTable from its name in the config file.

Parameters:tablename (string) – name of the counts table in the config file.
Returns:the counts table.
Return type:CountsTable
get_other_features()[source]

Get other features

Returns:a slice of self with only other features (e.g. unmapped).
Return type:CountsTable
get_spikeins()[source]

Get spike-in features

Returns:a slice of self with only spike-ins.
Return type:CountsTable
get_statistics(metrics=('mean', 'cv'))[source]

Get statistics of the counts.

Parameters:metrics (sequence of strings) – any of ‘mean’, ‘var’, ‘std’, ‘cv’, ‘fano’, ‘min’, ‘max’.
Returns:pandas.DataFrame with features as rows and metrics as columns.
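
A rough sketch of how such per-feature metrics could be computed with pandas follows. It assumes that ‘cv’ is the standard deviation divided by the mean, that ‘fano’ is the variance divided by the mean, and that pandas' default sample estimators (ddof=1) are used; singlet's own choices may differ.

```python
import pandas as pd


def get_statistics(counts, metrics=('mean', 'cv')):
    """Compute per-feature statistics across samples (sketch, not singlet's code)."""
    mean = counts.mean(axis=1)
    var = counts.var(axis=1)   # pandas default: ddof=1
    std = counts.std(axis=1)   # pandas default: ddof=1
    available = {
        'mean': mean,
        'var': var,
        'std': std,
        'cv': std / mean,      # coefficient of variation (assumed definition)
        'fano': var / mean,    # Fano factor (assumed definition)
        'min': counts.min(axis=1),
        'max': counts.max(axis=1),
    }
    return pd.DataFrame({m: available[m] for m in metrics}, columns=list(metrics))


counts = pd.DataFrame(
    [[2.0, 4.0, 6.0], [10.0, 10.0, 10.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2', 'sample_3'],
)
stats = get_statistics(counts, metrics=('mean', 'cv', 'fano'))
```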
log(base=10, inplace=False)[source]

Take the pseudocounted log of the counts.

Parameters:
  • base (float) – Base of the log transform
  • inplace (bool) – Whether to do the operation in place or return a new CountsTable
Returns:

If inplace is False, a transformed CountsTable.
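
A minimal sketch of a pseudocounted log, assuming it means adding the class-level pseudocount (0.1 by default) to every count before taking the logarithm in the requested base. The helper name log_counts is hypothetical.

```python
import numpy as np
import pandas as pd


def log_counts(counts, base=10, pseudocount=0.1):
    """log_base(counts + pseudocount), so that zero counts stay finite (sketch)."""
    return np.log(counts + pseudocount) / np.log(base)


counts = pd.DataFrame(
    [[0.0, 9.9], [99.9, 999.9]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2'],
)
logged = log_counts(counts, base=10)
```

With these inputs a zero count maps to approximately -1 rather than minus infinity.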

normalize(method='counts_per_million', include_spikeins=False, inplace=False, **kwargs)[source]

Normalize counts and return new CountsTable.

Parameters:
  • method (string or function) – The method to use for normalization. One of 'counts_per_million', 'counts_per_thousand_spikeins', 'counts_per_thousand_features'. If this argument is a function, its signature depends on the inplace argument. If inplace=False, it must take the CountsTable as input and return the normalized one as output. If inplace=True, it must take the CountsTableXR as input and modify it in place. Notice that if inplace=True and you do non-inplace operations you might lose the _metadata properties. You can end your function by self[:] = <normalized counts>.
  • include_spikeins (bool) – Whether to include spike-ins in the normalization and result.
  • inplace (bool) – Whether to modify the CountsTableXR in place or return a new one.
Returns:

If inplace is False, a new, normalized CountsTable.
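
The sketch below shows what a counts-per-million normalization and a custom non-inplace normalization function might look like on a plain pandas DataFrame. Both functions are illustrative assumptions, not singlet's code, and counts_per_ten_thousand is a hypothetical name.

```python
import pandas as pd


def counts_per_million(counts):
    """Scale every sample (column) so that its counts sum to one million."""
    return 1e6 * counts / counts.sum(axis=0)


def counts_per_ten_thousand(counts):
    """Hypothetical custom method: takes a table, returns the normalized one."""
    return 1e4 * counts / counts.sum(axis=0)


counts = pd.DataFrame(
    [[1.0, 30.0], [3.0, 70.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2'],
)
cpm = counts_per_million(counts)
cptt = counts_per_ten_thousand(counts)

# Going by the documented signature, a function of this shape would be passed
# as the method argument with inplace=False, e.g.
# counts_table.normalize(method=counts_per_ten_thousand)
```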

pseudocount = 0.1
standard_scale(axis='samples', inplace=False, add_to_den=0)[source]

Subtract minimum and divide by (maximum - minimum).

Parameters:
  • axis (string) – The axis along which the minimum and maximum are computed; must be ‘samples’ or ‘features’.
  • inplace (bool) – Whether to do the operation in place or return a new CountsTable.
  • add_to_den (float) – A (small) value to add to the denominator to avoid NaNs. 1e-5 or so should be fine.
Returns:

If inplace is False, a transformed CountsTable.
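
The role of add_to_den can be sketched as follows for the ‘samples’ axis, assuming the value is added to (maximum - minimum) before dividing. A feature with identical counts in all samples gives 0/0 without it. The helper is illustrative, not singlet's implementation.

```python
import pandas as pd


def standard_scale(counts, add_to_den=0):
    """(x - min) / (max - min + add_to_den), per feature across samples (sketch)."""
    minimum = counts.min(axis=1)
    maximum = counts.max(axis=1)
    return counts.sub(minimum, axis=0).div(maximum - minimum + add_to_den, axis=0)


counts = pd.DataFrame(
    [[0.0, 5.0, 10.0], [3.0, 3.0, 3.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2', 'sample_3'],
)
scaled = standard_scale(counts)                        # gene_b becomes NaN (0 / 0)
scaled_safe = standard_scale(counts, add_to_den=1e-5)  # gene_b becomes 0
```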

unlog(base=10, inplace=False)[source]

Reverse the pseudocounted log of the counts.

Parameters:
  • base (float) – Base of the log transform
  • inplace (bool) – Whether to do the operation in place or return a new CountsTable
Returns:

If inplace is False, a transformed CountsTable.
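
Assuming the pseudocounted log is log_base(counts + pseudocount), its inverse is base ** value - pseudocount. The round trip can be sketched as below; both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def log_counts(counts, base=10, pseudocount=0.1):
    """Assumed forward transform: log_base(counts + pseudocount)."""
    return np.log(counts + pseudocount) / np.log(base)


def unlog_counts(logged, base=10, pseudocount=0.1):
    """Assumed inverse transform: base ** logged - pseudocount."""
    return base ** logged - pseudocount


counts = pd.DataFrame(
    [[0.0, 5.0], [100.0, 1000.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2'],
)
# Applying log and then unlog with the same base recovers the original counts
restored = unlog_counts(log_counts(counts, base=2), base=2)
```

The same base and pseudocount must be used in both directions for the round trip to hold.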

z_score(axis='samples', inplace=False, add_to_den=0)[source]

Calculate the z scores of the counts table.

In other words, subtract the mean and divide by the standard deviation.

Parameters:
  • axis (string) – The axis to average over; must be ‘samples’ or ‘features’.
  • inplace (bool) – Whether to do the operation in place or return a new CountsTable.
  • add_to_den (float) – A (small) value to add to the denominator to avoid NaNs. 1e-5 or so should be fine.
Returns:

If inplace is False, a transformed CountsTable.
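
A sketch of the z score for both axis values, written against a plain pandas DataFrame. It assumes pandas' default sample standard deviation (ddof=1) and that add_to_den is added to the standard deviation; the function is illustrative rather than singlet's code.

```python
import pandas as pd


def z_score(counts, axis='samples', add_to_den=0):
    """(x - mean) / (std + add_to_den) along the chosen axis (sketch)."""
    if axis == 'samples':
        # One mean and one std per feature (row), computed across samples
        mean, std = counts.mean(axis=1), counts.std(axis=1)
        return counts.sub(mean, axis=0).div(std + add_to_den, axis=0)
    if axis == 'features':
        # One mean and one std per sample (column), computed across features
        mean, std = counts.mean(axis=0), counts.std(axis=0)
        return counts.sub(mean, axis=1).div(std + add_to_den, axis=1)
    raise ValueError("axis must be 'samples' or 'features'")


counts = pd.DataFrame(
    [[2.0, 4.0, 6.0], [1.0, 1.0, 4.0]],
    index=['gene_a', 'gene_b'],
    columns=['sample_1', 'sample_2', 'sample_3'],
)
z_samples = z_score(counts, axis='samples')
z_features = z_score(counts, axis='features')
```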