maayanlab_bioinformatics.dge package¶
Submodules¶
maayanlab_bioinformatics.dge.characteristic_direction module¶
- maayanlab_bioinformatics.dge.characteristic_direction.characteristic_direction(controls_mat: DataFrame, cases_mat: DataFrame, gamma=1.0, nnull=10, norm_vector=True, sort=True, calculate_sig=False)[source]¶
Given two separate dataframes (controls, cases) with a shared index (genes), we compute the characteristic direction coefficients for all genes. e.g.
control_mat: |control_replicate_1|control_replicate_2|… gene_1| .. | .. |... gene_2| .. | .. |…
cases_mat: |case_replicate_1|case_replicate_2|… gene_1| .. | .. |... gene_2| .. | .. |…
- maayanlab_bioinformatics.dge.characteristic_direction.up_down_from_characteristic_direction(expr: DataFrame, top_n=600)[source]¶
Using the output of
characteristic_direction
, we can extract the top n genes with the highest absolute characteristic direction coefficients and split them intoup
anddown
.
maayanlab_bioinformatics.dge.deseq2 module¶
- maayanlab_bioinformatics.dge.deseq2.deseq2_differential_expression(controls_mat: ~pandas.core.frame.DataFrame, cases_mat: ~pandas.core.frame.DataFrame, n_cpus=4, remove_na=True, sorted=True, stdout=<maayanlab_bioinformatics.dge.deseq2._DevNull object>)[source]¶
Use pydeseq2 for differential expression.
Note that this function expects the original raw gene counts.
- Parameters:
controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)
cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)
n_cpus – (int) number of CPUs used (default: number of cpus)
remove_na – (bool) remove genes with NAN values (default: True)
sorted – (bool) sort genes from most significant to least significant (default: True)
stdout – (writeable stream) direct deseq’s output, e.g. sys.stdout (default: suppress)
- Returns:
A data frame with the results
maayanlab_bioinformatics.dge.limma_voom module¶
- maayanlab_bioinformatics.dge.limma_voom.limma_voom_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame, all_data_mat: DataFrame | None = None, filter_genes: bool = False, voom_design: bool = False)[source]¶
Use R’s voom and limma for differential expression.
Note that this function expects the original raw gene counts.
alex version: voom_design=True, filter_genes=False biojupies version: voom_design=False, filter_genes=True
- Parameters:
controls_mat – (pd.DataFrame) the control samples
cases_mat – (pd.DataFrame) the case samples
all_data_mat – (pd.DataFrame) all samples (for full experiment normalization)
filter_genes – (bool) Whether to perform R’s
filterByExpr
during normalizationvoom_design – (bool) Whether to give R’s voom function the design matrix (supervised)
- Returns:
A data frame with the results
- maayanlab_bioinformatics.dge.limma_voom.limma_voom_differential_expression_design(expression: DataFrame, design: DataFrame, de: Tuple[str, str], filter_genes: bool = False, voom_design: bool = False)[source]¶
Use R’s voom and limma for differential expression.
Note that this function expects the original raw gene counts.
alex version: voom_design=True, filter_genes=False biojupies version: voom_design=False, filter_genes=True
- Parameters:
expression – (pd.DataFrame) the samples
design – (pd.DataFrame) the design dataframe
de – (Tuple[str, str]) (‘control_column_name’, ‘case_column_name’)
filter_genes – (bool) Whether to perform R’s
filterByExpr
during normalizationvoom_design – (bool) Whether to give R’s voom function the design matrix (supervised)
- Returns:
A data frame with the results
- maayanlab_bioinformatics.dge.limma_voom.up_down_from_limma_voom(expr: DataFrame, top_n: int = 600)[source]¶
Given limma_voom_differential_expression output, produce a discrete up/down geneset
- Parameters:
top_n – (int) the number of genes in total to produce
- Returns:
UpDownGeneset a type with
.up
and.down
methods corresponding to the list of genes.
maayanlab_bioinformatics.dge.logfc module¶
- maayanlab_bioinformatics.dge.logfc.logfc_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame)[source]¶
NOT RECOMMENDED. Given two separate dataframes (controls, cases) with a shared index (genes), we compute the logFC differential expression for all genes, also the average expression.
- Parameters:
controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)
cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)
- Returns:
A data frame with the results
maayanlab_bioinformatics.dge.ttest module¶
- maayanlab_bioinformatics.dge.ttest.ttest_differential_expression(controls_mat: DataFrame, cases_mat: DataFrame, equal_var=False, alternative='two-sided', log2norm=True)[source]¶
Given two separate dataframes (controls, cases) with a shared index (genes), we compute the ttest differential expression for all genes. Benjamini-Hochberg Adjusted p-value.
- Parameters:
controls_mat – (pd.DataFrame) the control samples (samples as columns and genes as rows)
cases_mat – (pd.DataFrame) the case samples (samples as columns and genes as rows)
equal_var – (bool) Should t-test assume equal variance (default: False)
alternative – (str) Alternative hypothesis (see scipy.stats.ttest_ind) (default: two-sided)
log2norm – (bool) Apply log2norm, typically keep with raw counts but disable if you have normalized data (default: True)
- Returns:
A data frame with the results
Module contents¶
This module contains functions for differential expression analysis