maayanlab_bioinformatics.normalization package

Submodules

maayanlab_bioinformatics.normalization.cpm module

maayanlab_bioinformatics.normalization.cpm.cpm_normalize(mat)[source]
maayanlab_bioinformatics.normalization.cpm.cpm_normalize(mat: ndarray)
maayanlab_bioinformatics.normalization.cpm.cpm_normalize(mat: DataFrame)

Compute the counts-per-million (CPM) value of counts. Each column is divided by the total sum of its counts and then multiplied by 10^6.
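The arithmetic described above can be sketched with plain NumPy. This is a minimal illustration, not the package's own implementation; the helper name `cpm_sketch` and the toy matrix are hypothetical, and columns are assumed to be samples.

```python
import numpy as np


def cpm_sketch(counts):
    """Hypothetical helper: divide each column by its total and scale by 10^6."""
    counts = np.asarray(counts, dtype=float)
    # One total per column (sample); keepdims lets it broadcast over rows.
    column_totals = counts.sum(axis=0, keepdims=True)
    return counts / column_totals * 1e6


# Toy matrix: 3 genes (rows) x 2 samples (columns), each column summing to 100.
counts = np.array([[10, 0],
                   [30, 50],
                   [60, 50]])
cpm = cpm_sketch(counts)
```

After the transform every column sums to one million, so samples sequenced to different depths become comparable.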

maayanlab_bioinformatics.normalization.cpm.cpm_normalize_np(mat: ndarray)[source]
maayanlab_bioinformatics.normalization.cpm.cpm_normalize_pd(mat: DataFrame)[source]

maayanlab_bioinformatics.normalization.filter module

maayanlab_bioinformatics.normalization.filter.filter_by_expr(mat, design: Series | None = None, group: DataFrame | None = None, min_count=10, min_total_count=15, large_n=10, min_prop=0.7, tol=1e-14)[source]

Ported from the R function filterByExpr in edgeR: https://rdrr.io/bioc/edgeR/src/R/filterByExpr.R

maayanlab_bioinformatics.normalization.filter.filter_by_var(mat: DataFrame, top_n=2500, axis=1)[source]

Select rows with the most variable expression across all samples. Takes a dataframe and returns a filtered dataframe in the same orientation, e.g.

       | condition_1 | condition_2 |
gene_1 |      1      |      1      |
gene_2 |      0      |     10      |

gene_1 here is not variable at all, gene_2 here is very variable.

gene_1 will be dropped, while gene_2 is kept.
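The gene_1 / gene_2 example can be reproduced with a short pandas sketch. It assumes genes are rows and ranks them by per-row variance; the helper name `top_variable_rows` is hypothetical and the package's own function may differ in detail.

```python
import pandas as pd


def top_variable_rows(mat, top_n):
    """Hypothetical helper: keep the top_n rows with the largest variance."""
    # Variance of each row, computed across the columns (samples).
    row_variance = mat.var(axis=1)
    keep = row_variance.nlargest(top_n).index
    return mat.loc[keep]


mat = pd.DataFrame(
    {"condition_1": [1, 0], "condition_2": [1, 10]},
    index=["gene_1", "gene_2"],
)
filtered = top_variable_rows(mat, top_n=1)
```

With `top_n=1`, only gene_2 remains, and the columns are left untouched, so the result keeps the orientation of the input.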

maayanlab_bioinformatics.normalization.log module

maayanlab_bioinformatics.normalization.log.log10_normalize(mat, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log10_normalize(mat: ndarray, offset=1.0)
maayanlab_bioinformatics.normalization.log.log10_normalize(mat: DataFrame, offset=1.0)
maayanlab_bioinformatics.normalization.log.log10_normalize(mat: Series, offset=1.0)

Compute the log normalization of a matrix: a simple log10(x + offset), with offset usually set to 1 because log(0) is undefined.

maayanlab_bioinformatics.normalization.log.log10_normalize_np(mat: ndarray, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log10_normalize_pd(mat: DataFrame, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log10_normalize_pds(mat: Series, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log2_normalize(mat, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log2_normalize(mat: ndarray, offset=1.0)
maayanlab_bioinformatics.normalization.log.log2_normalize(mat: DataFrame, offset=1.0)
maayanlab_bioinformatics.normalization.log.log2_normalize(mat: Series, offset=1.0)

Compute the log normalization of a matrix: a simple log2(x + offset), with offset usually set to 1 because log(0) is undefined.
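The log(x + offset) transform can be illustrated directly with NumPy. The helper names below are hypothetical and the snippet is only a sketch of the formula, not the package's code.

```python
import numpy as np


def log10_offset(values, offset=1.0):
    """Hypothetical helper: log10(x + offset)."""
    return np.log10(np.asarray(values, dtype=float) + offset)


def log2_offset(values, offset=1.0):
    """Hypothetical helper: log2(x + offset)."""
    return np.log2(np.asarray(values, dtype=float) + offset)


# Zero counts are safe because the offset shifts them to log(1) = 0.
log10_values = log10_offset([0, 9, 99])
log2_values = log2_offset([0, 1, 3, 7])
```

With an offset of 1, a zero count maps to exactly 0 instead of negative infinity, which is why that offset is the usual choice.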

maayanlab_bioinformatics.normalization.log.log2_normalize_np(mat: ndarray, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log2_normalize_pd(mat: DataFrame, offset=1.0)[source]
maayanlab_bioinformatics.normalization.log.log2_normalize_pds(mat: Series, offset=1.0)[source]

maayanlab_bioinformatics.normalization.quantile module

maayanlab_bioinformatics.normalization.quantile_legacy module

maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize(mat)[source]
maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize(mat: ndarray)
maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize(mat: DataFrame)

Perform quantile normalization on the values of a matrix. In the case of a pd.DataFrame, preserve the index on the output frame. See: https://en.wikipedia.org/wiki/Quantile_normalization
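For readers unfamiliar with the technique, the textbook procedure is: sort each column, average the sorted values across columns rank by rank, then write each rank mean back to the position that held that rank. The sketch below assumes columns are samples and breaks ties arbitrarily rather than averaging them; `quantile_normalize_sketch` is a hypothetical name and this is not necessarily how the package implements it.

```python
import numpy as np


def quantile_normalize_sketch(mat):
    """Hypothetical helper: textbook quantile normalization over columns."""
    mat = np.asarray(mat, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(mat, axis=0)
    sorted_vals = np.take_along_axis(mat, order, axis=0)
    # Mean of the k-th smallest value across all columns.
    rank_means = sorted_vals.mean(axis=1)
    out = np.empty_like(mat)
    # Place the k-th rank mean where the k-th smallest value of each column was.
    np.put_along_axis(out, order, rank_means[:, None], axis=0)
    return out


mat = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 6.0, 8.0]])
normalized = quantile_normalize_sketch(mat)
```

After normalization every column contains the same set of values, so all samples share one distribution while each value keeps its within-column rank.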

maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize_h5(in_mat, out_mat, tmp=None)[source]
maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize_np(mat: ndarray)[source]
maayanlab_bioinformatics.normalization.quantile_legacy.quantile_normalize_pd(mat: DataFrame)[source]

maayanlab_bioinformatics.normalization.zscore module

maayanlab_bioinformatics.normalization.zscore.zscore_normalize(mat, ddof=0)[source]
maayanlab_bioinformatics.normalization.zscore.zscore_normalize(mat: ndarray, ddof=0)
maayanlab_bioinformatics.normalization.zscore.zscore_normalize(mat: DataFrame, ddof=0)
maayanlab_bioinformatics.normalization.zscore.zscore_normalize(mat: Series, ddof=0)

Compute the z score of each value in the sample, relative to the sample mean and standard deviation. In the case of a pd.DataFrame, preserve the index on the output frame.
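A z-score is (x - mean) / standard deviation. The sketch below applies it per column, which is an assumption about orientation made for illustration; `zscore_sketch` is a hypothetical name, and `ddof` is passed to NumPy's standard deviation in the same spirit as the `ddof` parameter in the signatures above.

```python
import numpy as np


def zscore_sketch(mat, ddof=0):
    """Hypothetical helper: z-score each column against its own mean and std."""
    mat = np.asarray(mat, dtype=float)
    mean = mat.mean(axis=0, keepdims=True)
    # ddof=0 gives the population standard deviation, ddof=1 the sample one.
    std = mat.std(axis=0, ddof=ddof, keepdims=True)
    return (mat - mean) / std


mat = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])
z = zscore_sketch(mat)
```

Both columns yield the same z-scores here because the second is just the first scaled by ten, which shows that the transform removes differences in scale.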

maayanlab_bioinformatics.normalization.zscore.zscore_normalize_np(mat: ndarray, ddof=0)[source]
maayanlab_bioinformatics.normalization.zscore.zscore_normalize_pd(mat: DataFrame, ddof=0)[source]
maayanlab_bioinformatics.normalization.zscore.zscore_normalize_pds(mat: Series, ddof=0)[source]

Module contents

This module contains functions relating to data normalization.