maayanlab_bioinformatics.dge package

Submodules

maayanlab_bioinformatics.dge.characteristic_direction module

maayanlab_bioinformatics.dge.characteristic_direction.characteristic_direction(controls_mat: DataFrame, cases_mat: DataFrame, gamma=1.0, nnull=10, norm_vector=True, sort=True, calculate_sig=False)[source]

Given two separate dataframes (controls, cases) with a shared index (genes), we compute the characteristic direction coefficients for all genes. e.g.

control_mat: |control_replicate_1|control_replicate_2|… gene_1| .. | .. |... gene_2| .. | .. |

cases_mat: |case_replicate_1|case_replicate_2|… gene_1| .. | .. |... gene_2| .. | .. |

maayanlab_bioinformatics.dge.characteristic_direction.up_down_from_characteristic_direction(expr: DataFrame, top_n=600)[source]

Using the output of characteristic_direction, we can extract the top n genes with the highest absolute characteristic direction coefficients and split them into up and down.

maayanlab_bioinformatics.dge.deseq2 module

maayanlab_bioinformatics.dge.deseq2.deseq2_differential_expression(controls_mat: ~pandas.core.frame.DataFrame, cases_mat: ~pandas.core.frame.DataFrame, n_cpus=4, remove_na=True, sorted=True, stdout=<maayanlab_bioinformatics.dge.deseq2._DevNull object>)[source]

Use pydeseq2 for differential expression.

Note that this function expects the original raw gene counts.

Parameters:
  • controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)

  • cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)

  • n_cpus – (int) number of CPUs used (default: number of cpus)

  • remove_na – (bool) remove genes with NAN values (default: True)

  • sorted – (bool) sort genes from most significant to least significant (default: True)

  • stdout – (writeable stream) direct deseq’s output, e.g. sys.stdout (default: suppress)

Returns:

A data frame with the results

maayanlab_bioinformatics.dge.limma_voom module

maayanlab_bioinformatics.dge.limma_voom.limma_voom_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame, all_data_mat: DataFrame | None = None, filter_genes: bool = False, voom_design: bool = False)[source]

Use R’s voom and limma for differential expression.

Note that this function expects the original raw gene counts.

alex version: voom_design=True, filter_genes=False biojupies version: voom_design=False, filter_genes=True

Parameters:
  • controls_mat – (pd.DataFrame) the control samples

  • cases_mat – (pd.DataFrame) the case samples

  • all_data_mat – (pd.DataFrame) all samples (for full experiment normalization)

  • filter_genes – (bool) Whether to perform R’s filterByExpr during normalization

  • voom_design – (bool) Whether to give R’s voom function the design matrix (supervised)

Returns:

A data frame with the results

maayanlab_bioinformatics.dge.limma_voom.limma_voom_differential_expression_design(expression: DataFrame, design: DataFrame, de: Tuple[str, str], filter_genes: bool = False, voom_design: bool = False)[source]

Use R’s voom and limma for differential expression.

Note that this function expects the original raw gene counts.

alex version: voom_design=True, filter_genes=False biojupies version: voom_design=False, filter_genes=True

Parameters:
  • expression – (pd.DataFrame) the samples

  • design – (pd.DataFrame) the design dataframe

  • de – (Tuple[str, str]) (‘control_column_name’, ‘case_column_name’)

  • filter_genes – (bool) Whether to perform R’s filterByExpr during normalization

  • voom_design – (bool) Whether to give R’s voom function the design matrix (supervised)

Returns:

A data frame with the results

maayanlab_bioinformatics.dge.limma_voom.make_design_matrix(expression_df, controls, cases)[source]
maayanlab_bioinformatics.dge.limma_voom.up_down_from_limma_voom(expr: DataFrame, top_n: int = 600)[source]

Given limma_voom_differential_expression output, produce a discrete up/down geneset

Parameters:

top_n – (int) the number of genes in total to produce

Returns:

UpDownGeneset a type with .up and .down methods corresponding to the list of genes.

maayanlab_bioinformatics.dge.logfc module

maayanlab_bioinformatics.dge.logfc.logfc_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame)[source]

NOT RECOMMENDED. Given two separate dataframes (controls, cases) with a shared index (genes), we compute the logFC differential expression for all genes, also the average expression.

Parameters:
  • controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)

  • cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)

Returns:

A data frame with the results

maayanlab_bioinformatics.dge.ttest module

maayanlab_bioinformatics.dge.ttest.ttest_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame, equal_var=False, alternative='two-sided', log2norm=True)[source]

Given two separate dataframes (controls, cases) with a shared index (genes), we compute the ttest differential expression for all genes. Benjamini-Hochberg Adjusted p-value.

Parameters:
  • controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)

  • cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)

  • equal_var – (bool) Should t-test assume equal variance (default: False)

  • alternative – (str) Alternative hypothesis (see scipy.stats.ttest_ind) (default: two-sided)

  • log2norm – (bool) Apply log2norm, typically keep with raw counts but disable if you have normalized data (default: True)

Returns:

A data frame with the results

Module contents

This module contains functions for differential expression analysis