logo Dexamethasone Benchmark Resource
  • About
  • Code
  • Data
  • Figures
  • Stats

Benchmarking L1000 Consensus Signatures¶

This notebook benchmarks three different methods used to compute consensus signatures for the L1000 chemical perturbation and CRIPSR KO signatures: mean, median, and weighted average applied to gene expression values across all signatures corresponding to a single perturbagen.

The consensus signatures are benchmarked by evaluating the recovery of known drug targets for each L1000 drug by ranking each CRISPR KO signature by its similarity (Pearson correlation coefficient) to the drug consensus signature. Known drug target relationships are retrieved from Pharos.

import pandas as pd 
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import requests
from IPython.display import display, Markdown
from random import sample
from maayanlab_bioinformatics.plotting.bridge import bridge_plot
import matplotlib.pyplot as plt
import math
from matplotlib_venn import venn2, venn3
from supervenn import supervenn

Benchmarking L1000 Consensus Signature Methods¶

This notebook provides code and visualizations for benchmarking different methods of generating consensus signatures from the L1000 signatures.

The current implementation examines the performance of consensus signatures generated by taking the mean, median, and weighted average of L1000 Characteristic Direction signatures downloaded from SigCom LINCS; however, this notebook is designed to work with any number of consensus and signature computation methods.

means = pd.read_csv('similarity_matrices/mean_pearson.txt.gz', sep='\t', index_col=0)
medians = pd.read_csv('similarity_matrices/median_pearson.txt.gz', sep='\t', index_col=0)
wavgs = pd.read_csv('similarity_matrices/weightedavg_pearson.txt.gz', sep='\t', index_col=0)

Extract Consensus Signature Similarity Matrices¶

For each consensus signature generation method, we generated consensus signatures for all L1000 CRISPR KO genes and chemical perturbations. We then pre-computed the Pearson Correlation Coefficients between each CRISPR KO signature and each chemical perturbation signature.

The reasoning is that we expect the consensus siganture for a given drug to have a non-random correlation to the consensus CRISPR KO signatures for its targets, representing a common pharmacological effect across different cell line contexts.

display(Markdown("Pearson Correlation Coefficients for Mean Consensus Signatures"))
display(means)
display(Markdown("Pearson Correlation Coefficients for Median Consensus Signatures"))
display(medians)
display(Markdown("Pearson Correlation Coefficients for Weighted Average Consensus Signatures"))
display(wavgs)

Pearson Correlation Coefficients for Mean Consensus Signatures

AKT1 KRAS-2B BRAF1 BRAF MYC BRD4 MCL1 CDK4 BCL2L2 MAPK1 ... BRDN0003677177 KIT SUV420H1 TGFBR2 STAG2 SMC1A SMC3 RAD21 NIPBL PDS5B
afatinib 0.039288 0.087599 0.052460 0.032641 0.035939 -0.025099 0.078779 -0.097733 -0.041715 0.158596 ... -0.002491 0.042092 0.017690 0.065971 -0.018110 -0.025007 -0.052941 0.069486 0.015613 0.017151
erlotinib -0.048258 0.151482 0.064425 0.070239 0.023104 -0.088843 0.093194 -0.155576 -0.023462 0.156218 ... 0.003109 -0.004992 0.016802 0.035774 0.012185 -0.030883 0.022719 0.058391 -0.038472 -0.020328
neratinib -0.011519 0.116096 0.040359 0.142860 0.078508 -0.008306 0.020023 0.050044 -0.008112 0.098631 ... 0.036109 0.012717 0.047148 0.063692 0.006152 -0.044527 -0.061279 -0.005293 0.047986 0.039911
lapatinib -0.039902 0.088429 0.025822 -0.064870 0.025288 -0.023845 -0.065937 -0.203769 -0.046839 0.103114 ... -0.068757 -0.055504 0.081767 0.159499 0.063188 -0.008917 -0.013766 -0.073146 -0.032068 0.078218
pazopanib -0.115736 -0.041087 -0.118093 -0.012833 -0.101394 -0.125349 0.088811 -0.107420 0.022588 -0.083674 ... 0.029134 -0.076045 -0.013759 0.112378 -0.040245 -0.020721 -0.001347 0.060826 -0.008466 0.015619
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
desvenlafaxine -0.057500 0.009144 -0.066686 0.003290 -0.030215 -0.072164 -0.003146 -0.197048 -0.072740 0.045427 ... 0.007904 0.034718 0.037131 0.015089 0.000156 0.033563 0.094683 -0.032039 -0.065282 -0.019728
taltirelin -0.042235 0.012455 -0.056294 -0.071240 -0.068729 0.006262 -0.007184 -0.146936 0.024411 -0.011845 ... -0.023191 0.027566 0.020941 0.041642 0.006416 0.015843 0.018180 0.071191 -0.130223 0.043829
rupatadine -0.022242 0.049211 0.023811 -0.114862 0.048122 0.022194 -0.085811 -0.142641 -0.163349 0.033877 ... -0.017072 0.130603 -0.108800 -0.105444 0.034300 -0.143470 -0.031385 -0.017706 0.077739 0.066941
talinolol -0.049888 -0.001736 -0.074130 -0.082681 -0.020887 0.012472 -0.103481 -0.076028 0.104234 -0.035430 ... -0.001946 0.048554 -0.040367 0.193269 0.006213 -0.077561 -0.123392 0.098631 0.028672 0.081624
suvorexant -0.043768 -0.184991 -0.036125 -0.044292 -0.097146 -0.075767 -0.061297 -0.024928 0.114663 -0.057395 ... -0.081464 -0.105453 0.052560 0.107027 0.093611 0.000125 -0.081600 -0.000702 -0.026095 0.001112

839 rows × 7489 columns

Pearson Correlation Coefficients for Median Consensus Signatures

AKT1 KRAS-2B BRAF1 BRAF MYC BRD4 MCL1 CDK4 BCL2L2 MAPK1 ... BRDN0003677177 KIT SUV420H1 TGFBR2 STAG2 SMC1A SMC3 RAD21 NIPBL PDS5B
afatinib 0.051767 0.133398 0.080351 0.063201 0.107135 0.018604 0.112896 -0.073323 0.004723 0.157686 ... 0.006165 0.044201 0.035481 0.104882 -0.057247 -0.010636 0.033024 0.081134 0.005840 -0.010548
erlotinib 0.002502 0.176474 0.071451 0.105725 0.075541 -0.044170 0.102680 -0.130693 0.022231 0.126003 ... 0.012645 -0.011169 0.051464 0.091472 -0.026796 -0.022035 0.092110 0.076800 -0.046434 -0.045664
neratinib 0.009636 0.154788 0.055518 0.158173 0.134390 0.026031 0.063275 0.065081 -0.003457 0.134173 ... 0.043940 0.009929 0.058189 0.085934 -0.014226 -0.058900 -0.009731 0.023470 0.042463 0.021765
lapatinib 0.025341 0.094168 0.030544 -0.066402 0.056962 0.025915 0.009600 -0.180667 0.028163 0.097402 ... -0.025873 -0.045500 0.071522 0.216329 0.020691 -0.006198 0.081951 -0.032450 -0.028261 -0.006342
pazopanib -0.076459 0.026644 -0.091446 0.038801 -0.049903 -0.090547 0.103126 -0.108055 0.177348 -0.036792 ... 0.026215 -0.064031 -0.022605 0.171841 -0.041509 0.012598 0.030378 0.055902 -0.020205 0.015649
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
desvenlafaxine 0.035313 0.074652 -0.049751 -0.003208 0.005777 -0.042920 0.061258 -0.153651 0.054393 0.055375 ... 0.049694 0.029920 0.050870 0.078416 -0.020032 0.033176 0.130389 -0.019806 -0.008313 -0.022432
taltirelin 0.022628 0.056814 0.019259 0.006777 -0.005549 0.044637 -0.015711 -0.105024 0.110464 0.024544 ... -0.006071 0.035309 0.072777 0.108833 -0.015532 0.066210 0.127457 0.037998 -0.103694 -0.061299
rupatadine 0.017681 0.051315 0.040323 -0.085913 0.072831 -0.009402 -0.066040 -0.147568 -0.081792 0.065457 ... 0.011651 0.128499 -0.124422 -0.070918 0.035924 -0.099524 0.033638 -0.026681 0.044997 0.044457
talinolol -0.043222 0.027401 0.004059 -0.003828 -0.024398 0.045217 -0.013021 -0.010395 0.168474 -0.038179 ... 0.078951 -0.011128 0.068661 0.237796 -0.010255 -0.018282 0.021292 0.018570 0.018735 -0.009164
suvorexant -0.019009 -0.090110 -0.073949 -0.042468 -0.082291 -0.002317 0.037124 -0.028563 0.227746 -0.078382 ... -0.039910 -0.119415 0.041700 0.166488 0.046186 0.066971 -0.041845 -0.040430 0.001560 -0.011008

839 rows × 7489 columns

Pearson Correlation Coefficients for Weighted Average Consensus Signatures

AKT1 KRAS-2B BRAF1 BRAF MYC BRD4 MCL1 CDK4 BCL2L2 MAPK1 ... BRDN0003677177 KIT SUV420H1 TGFBR2 STAG2 SMC1A SMC3 RAD21 NIPBL PDS5B
afatinib 0.043842 0.126407 0.057370 0.032816 0.029585 -0.028552 0.083929 -0.060740 -0.009248 0.137992 ... 0.029999 0.042848 0.012697 0.050618 -0.010374 -0.023753 -0.073939 0.054797 0.041644 0.011173
erlotinib -0.039662 0.189926 0.074773 0.068886 -0.016643 -0.104045 0.089787 -0.118480 -0.011179 0.151351 ... 0.024561 -0.004225 0.017831 0.038363 -0.012422 -0.029911 0.008076 0.060249 -0.008020 -0.023663
neratinib -0.022374 0.097546 0.053648 0.128124 0.074055 -0.012890 0.045100 0.084469 0.005544 0.066128 ... 0.062011 0.014258 0.041622 0.073326 0.010700 -0.054571 -0.081777 -0.008257 0.080846 0.051861
lapatinib -0.046097 0.110954 0.027357 -0.071855 -0.079702 -0.019023 -0.040734 -0.203612 -0.047021 0.129413 ... -0.078217 -0.047301 0.072616 0.162382 0.071773 -0.021629 -0.022790 -0.072636 -0.011988 0.083381
pazopanib -0.133967 -0.065964 -0.134439 -0.008608 -0.182686 -0.121940 0.096863 -0.107326 0.055106 -0.109153 ... 0.044837 -0.087218 -0.022356 0.109515 -0.007274 -0.021698 -0.019016 0.050318 -0.014645 0.006664
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
desvenlafaxine -0.066152 0.028200 -0.040536 0.003049 -0.084393 -0.069483 -0.014510 -0.205238 -0.077648 0.031597 ... 0.012706 0.032006 0.041190 0.017247 -0.000311 0.037844 0.078058 -0.027733 -0.072112 -0.040837
taltirelin -0.038543 0.019442 -0.049489 -0.088412 -0.120527 0.014203 -0.031268 -0.173947 0.037578 0.005174 ... -0.021278 0.028144 0.015788 0.032516 0.021326 0.009114 0.037747 0.079069 -0.139897 0.030299
rupatadine -0.017826 0.057744 0.025905 -0.111355 0.091760 -0.000953 -0.107360 -0.104506 -0.193811 0.043774 ... -0.001984 0.140903 -0.125506 -0.087973 0.013074 -0.135349 -0.011436 -0.009831 0.091532 0.052763
talinolol -0.071532 -0.014545 -0.091004 -0.086301 -0.081596 0.046822 -0.065511 -0.085525 0.107279 -0.034005 ... -0.005736 0.045796 -0.041015 0.182636 0.036970 -0.086812 -0.135329 0.105089 0.016583 0.083552
suvorexant -0.069449 -0.215272 -0.072690 -0.082030 -0.168155 -0.055780 -0.002163 -0.046884 0.148673 -0.085423 ... -0.067647 -0.108374 0.057096 0.107738 0.105224 -0.008234 -0.062274 -0.020499 -0.013064 -0.000444

839 rows × 7489 columns

Comparison with Known Drug-Target Pairs¶

Ranking CRISPR KO consensus signatures by similarity to each drug consensus signature¶

Because some drugs may have both inhibitory and agonist effects depending on the target, we rank targets for each drug in both ascending (low to high similarity) and descending (high to low similarity) order.

We expect that for drugs with inhibitory targets, the consensus CRISPR KO signatures for those targets should have high similarity to the drug perturbation signatures. Conversely, the CRISPR KO signatures of agonist targets should have low similarity to the corresponding drug perturbation signature.

asc_drug_ranks = {'mean': {}, 'median': {}, 'wavg': {}, 'random': {}}
desc_drug_ranks = {'mean': {}, 'median': {}, 'wavg': {}, 'random': {}}
for drug in means.index: 
    asc_drug_ranks['mean'][drug] = means.loc[drug].sort_values(ascending=True).index.tolist()
    asc_drug_ranks['median'][drug] = medians.loc[drug].sort_values(ascending=True).index.tolist()
    asc_drug_ranks['wavg'][drug] = wavgs.loc[drug].sort_values(ascending=True).index.tolist()
    asc_drug_ranks['random'][drug] = sample(means.columns.tolist(), len(means.columns))

    desc_drug_ranks['mean'][drug] = means.loc[drug].sort_values(ascending=False).index.tolist()
    desc_drug_ranks['median'][drug] = medians.loc[drug].sort_values(ascending=False).index.tolist()
    desc_drug_ranks['wavg'][drug] = wavgs.loc[drug].sort_values(ascending=False).index.tolist()
    desc_drug_ranks['random'][drug] = sample(means.columns.tolist(), len(means.columns))

Extraction of drug-target associations from Pharos¶

Using Pharos, we pre-constructed a list of drug-target associations by querying all ligands in Pharos, filtering only to drugs (n=1782), and downloading the table of all associated target symbols for each drug.

Here, we import the generated table and identify the drugs which overlap with the L1000 chemical perturbations (n=720).

pharos = pd.read_csv('pharos_query_results.csv')
pharos = pharos.dropna(subset=['Ligand Name', 'Symbol'])
pharos = pharos.sort_values(by=['Ligand Action']).drop_duplicates(subset=['Ligand Name', 'Symbol'], keep='first')
pharos_drugs = set(pharos['Ligand Name'].tolist()).intersection(asc_drug_ranks['mean'].keys())
print(len(pharos_drugs), 'drugs overlap')
720 drugs overlap

For each of the overlapping drugs between Pharos and the L1000 dataset, we can store all targets associated with the drug in a dictionary.

drug2target = {}
drugtarget2action = {}
for d in pharos_drugs:
  if type(pharos[pharos['Ligand Name'] == d]['Symbol']) == str: 
    drug2target[d] = [pharos.loc[d]['Symbol']]
    drugtarget2action[f"{d}-{drug2target[d][0]}"] = pharos.loc[d]['Ligand Action']
  else:
    drug2target[d] = pharos[pharos['Ligand Name'] == d]['Symbol'].tolist()
    for s in drug2target[d]: 
      drugtarget2action[f"{d}-{s}"] = pharos[(pharos['Ligand Name']==d) & (pharos['Symbol']==s)]['Ligand Action'].iloc[0]
asc_rankings = []
desc_rankings = []
for drug in drug2target.keys(): 
  for method in ['mean', 'median', 'wavg', 'random']:
    asc_indices = np.nonzero(np.isin(asc_drug_ranks[method][drug], drug2target[drug]))[0]
    desc_indices = np.nonzero(np.isin(desc_drug_ranks[method][drug], drug2target[drug]))[0]
    for i in range(len(asc_indices)):
      asc_rankings.append([method, drug, asc_indices[i], asc_drug_ranks[method][drug][asc_indices[i]]])
    for i in range(len(desc_indices)):
      desc_rankings.append([method, drug, desc_indices[i], desc_drug_ranks[method][drug][desc_indices[i]]])
asc_rankings_df = pd.DataFrame(
  data=asc_rankings,
  columns=['Method', 'Drug', 'AscRank', 'Target']
)

desc_rankings_df = pd.DataFrame(
  data=desc_rankings,
  columns=['Method', 'Drug', 'DescRank', 'Target']
)
asc_rankings_df['Action'] = asc_rankings_df.apply(
  lambda row: drugtarget2action[f"{row.Drug}-{row.Target}"], axis=1
)
desc_rankings_df['Action'] = desc_rankings_df.apply(
  lambda row: drugtarget2action[f"{row.Drug}-{row.Target}"], axis=1
)
display(desc_rankings_df.groupby(['Method']).mean(numeric_only=True).sort_values(by='DescRank'))
DescRank
Method
wavg 3617.909457
mean 3621.042791
median 3635.453333
random 3784.067907
display(desc_rankings_df.groupby(['Method', 'Action']).mean(numeric_only=True).sort_values(by=['Method','DescRank']))
DescRank
Method Action
mean GATING INHIBITOR 192.000000
ALLOSTERIC MODULATOR 1306.000000
POSITIVE ALLOSTERIC MODULATOR 2926.400000
PARTIAL AGONIST 2944.666667
ALLOSTERIC ANTAGONIST 3130.166667
BLOCKER 3384.564706
POSITIVE MODULATOR 3468.428571
AGONIST 3559.194286
INHIBITOR 3687.469602
ANTAGONIST 3709.403361
INVERSE AGONIST 4158.142857
MODULATOR 4555.500000
ACTIVATOR 4717.500000
OPENER 4776.500000
NEGATIVE ALLOSTERIC MODULATOR 5743.250000
median GATING INHIBITOR 377.666667
ALLOSTERIC MODULATOR 435.000000
POSITIVE ALLOSTERIC MODULATOR 2313.857143
PARTIAL AGONIST 2572.266667
ALLOSTERIC ANTAGONIST 2697.166667
POSITIVE MODULATOR 3123.285714
BLOCKER 3481.752941
AGONIST 3564.691429
INHIBITOR 3717.691824
INVERSE AGONIST 3747.714286
ANTAGONIST 3828.865546
MODULATOR 4457.500000
OPENER 4576.000000
ACTIVATOR 4732.750000
NEGATIVE ALLOSTERIC MODULATOR 5739.500000
random OPENER 1791.500000
INVERSE AGONIST 2675.571429
NEGATIVE ALLOSTERIC MODULATOR 3254.250000
ALLOSTERIC ANTAGONIST 3541.666667
POSITIVE ALLOSTERIC MODULATOR 3601.485714
GATING INHIBITOR 3625.333333
INHIBITOR 3695.211740
MODULATOR 3742.250000
ANTAGONIST 3853.798319
BLOCKER 3880.776471
PARTIAL AGONIST 4110.666667
AGONIST 4230.942857
POSITIVE MODULATOR 4357.714286
ACTIVATOR 4619.125000
ALLOSTERIC MODULATOR 5912.000000
wavg GATING INHIBITOR 249.333333
ALLOSTERIC MODULATOR 1350.000000
POSITIVE ALLOSTERIC MODULATOR 2888.485714
PARTIAL AGONIST 2956.133333
ALLOSTERIC ANTAGONIST 3003.500000
AGONIST 3478.800000
BLOCKER 3519.447059
INHIBITOR 3677.138365
POSITIVE MODULATOR 3680.714286
ANTAGONIST 3708.424370
INVERSE AGONIST 4402.000000
OPENER 4512.000000
ACTIVATOR 4548.625000
MODULATOR 4708.250000
NEGATIVE ALLOSTERIC MODULATOR 5761.000000

Boxplot¶

method_color_map = {}
fig1 = go.Figure()

desc_sub_df = desc_rankings_df[asc_rankings_df['Method'] != 'random'].set_index('Action')
for gs in desc_sub_df.groupby(['Action']).mean(numeric_only=True).sort_values('DescRank').index:
    fig1.add_trace(
        go.Box(
            y=asc_sub_df.loc[gs,'DescRank'].tolist(),
            name=gs
        )
    )
fig1.update_xaxes(title_text="Drug Action")
fig1.update_yaxes(title_text="Rank")

fig1.update_layout(
    height=600,
    width=1000,
    title_text="Target Gene Consensus Sig Rankings for L1000 Drugs by Action (High to Low Sim)",
    showlegend=False
)
fig1.show("png")

Bridge Plots¶

inh_drug2target = {}
agon_drug2target = {}
for k in drugtarget2action.keys(): 
  if drugtarget2action[k] == 'INHIBITOR': 
    if k in inh_drug2target.keys(): 
      inh_drug2target[k.split('-')[0]].append(k.split('-')[1])
    else:
      inh_drug2target[k.split('-')[0]] = [k.split('-')[1]]
  elif drugtarget2action[k] == 'AGONIST': 
    if k in agon_drug2target.keys(): 
      agon_drug2target[k.split('-')[0]].append(k.split('-')[1])
    else:
      agon_drug2target[k.split('-')[0]] = [k.split('-')[1]]
  else: 
    continue
def drugBridgePlot(drug, method, ranks, dtmapping): 
  select = pd.Series([x.upper() in dtmapping[drug] for x in ranks[method][drug]])
  x, y = bridge_plot(select)
  return x/len(x), y 

def methodBridgePlot(method, ranks, dtmapping): 
  x = []
  y = []
  for drug in ranks[method].keys(): 
    if drug in dtmapping.keys():
      x_drug, y_drug = drugBridgePlot(drug, method, ranks, dtmapping)
      x.append(x_drug)
      y.append(y_drug)
  return np.nanmean(np.array(x), axis=0), np.nanmean(np.array(y), axis=0)

def fullBridgePlot(ranks, sim, dtmapping):
  fig = plt.figure()
  for method in ranks.keys(): 
    x_method, y_method = methodBridgePlot(method, ranks, dtmapping)
    if method == 'random': 
      plt.plot(x_method, y_method, label=method, color='gray')
    else:
      plt.plot(x_method, y_method, label=method)
  plt.axhline(y=0, color='black', linestyle='dashed')
  plt.xlabel('Rank')
  plt.ylabel('D(r) - r')
  plt.legend(bbox_to_anchor=(1.25,1))
  plt.title(f"Target Gene Consensus CRISPR KO Sig Ranks for L1000 Drugs ({sim})")
  plt.show()
  fig.savefig('../../publicationfigs/Figure8B.png', format='png', dpi=300, bbox_inches='tight')
def leadingEdge(ranks, drugtype, dtmapping, cutoff=0.01): 
  fig = plt.figure()
  for method in ranks.keys(): 
    x_method, y_method = methodBridgePlot(method, ranks, dtmapping)
    x_method = x_method[x_method < cutoff]
    y_method = y_method[:len(x_method)]
    if method == 'random': 
      plt.plot(x_method, y_method, label=method, color='gray')
    else:
      plt.plot(x_method, y_method, label=method)
  plt.axhline(y=0, color='black', linestyle='dashed')
  plt.xlabel('Rank')
  plt.ylabel('D(r) - r')
  plt.legend(bbox_to_anchor=(1.3,1))
  plt.title(f"Leading Edge of Target Gene Consensus CRISPR KO Sig Ranks ({drugtype})")
  plt.show()
  fig.savefig('../../publicationfigs/Figure8A.png', format='png', dpi=300, bbox_inches='tight')
inhibitors = desc_rankings_df[desc_rankings_df['Action'] == 'INHIBITOR']['Drug'].tolist()
inh_desc_drug_ranks = {
  m: {d: desc_drug_ranks[m][d] for d in desc_drug_ranks[m].keys() if d in inhibitors}
  for m in desc_drug_ranks.keys()
}
fullBridgePlot(inh_desc_drug_ranks, 'Pharos Inhibitors -- High to Low Sim', drug2target)
leadingEdge(inh_desc_drug_ranks, 'Pharos Inhibitors -- High to Low Sim', drug2target)

Overlap with IDG target gene sets¶

import h5py
with h5py.File('consensus_sigs/cp_weightedavg_coeff_mat.gctx', 'r') as f_in:
  dex_mask = np.in1d(f_in['0/META/COL/id'].asstr()[:], ['dexamethasone'])
  dex_ind = f_in['0/META/COL/id'].asstr()[dex_mask].nonzero()[0]
  dex_consensus = f_in['0/DATA/0/matrix'][dex_ind, :]
  dex_consensus = pd.Series(
    dex_consensus[0],
    index=f_in['0/META/ROW/id'].asstr()[:]
  )
with h5py.File('consensus_sigs/xpr_weightedavg_coeff_mat.gctx', 'r') as f_in:
  mask = np.in1d(f_in['0/META/COL/id'].asstr()[:], ['NR1I2', 'NR0B1'])
  ind = f_in['0/META/COL/id'].asstr()[mask].nonzero()[0]
  consensus = pd.DataFrame(
    f_in['0/DATA/0/matrix'][ind, :],
    index=f_in['0/META/COL/id'].asstr()[mask],
    columns=f_in['0/META/ROW/id'].asstr()[:]
  )
  nr0b1_consensus = consensus.loc['NR0B1']
  nr1i2_consensus = consensus.loc['NR1I2']
def getEnrichrLibrary(library_name): 
    ENRICHR_URL = f'https://maayanlab.cloud/Enrichr/geneSetLibrary?mode=json&libraryName={library_name}'
    resp = requests.get(ENRICHR_URL)
    if not resp.ok: 
        raise Exception(f"Error downloading {library_name} library from Enrichr, please try again.")
    return resp.json()[library_name]['terms']

chea2022 = getEnrichrLibrary('ChEA_2022')
targets = ['NR3C1', 'NR0B1', 'NR1I2']
tf_gene_sets = {t: [] for t in targets}

for tf in chea2022.keys():
  geneset = []
  if any(t in tf for t in targets):
    if 'MLEC' not in tf and 'Wistar' not in tf:
      tf_gene_sets[tf.split(' ')[0]] += list(chea2022[tf].keys())

tf_gene_sets = {f"{t} Targets": set(tf_gene_sets[t]) for t in tf_gene_sets.keys()}
dex_up_genes = dex_consensus[dex_consensus > 0].sort_values(ascending=False).index.tolist()[:250]
dex_down_genes = dex_consensus[dex_consensus < 0].sort_values(ascending=True).index.tolist()[:250]

nr0b1_up_genes = nr0b1_consensus[nr0b1_consensus > 0].sort_values(ascending=False).index.tolist()[:250]
nr0b1_down_genes = nr0b1_consensus[nr0b1_consensus < 0].sort_values(ascending=True).index.tolist()[:250]

nr1i2_up_genes = nr1i2_consensus[nr1i2_consensus > 0].sort_values(ascending=False).index.tolist()[:250]
nr1i2_down_genes = nr1i2_consensus[nr1i2_consensus < 0].sort_values(ascending=True).index.tolist()[:250]
for tf in tf_gene_sets.keys(): 
  print(f"{tf}: {len(tf_gene_sets[tf])} genes")
print(f"Dex up genes: {len(dex_up_genes)}")
print(f"Dex down genes: {len(dex_down_genes)}")
print(f"NR0B1 up genes: {len(nr0b1_up_genes)}")
print(f"NR0B1 down genes: {len(nr0b1_down_genes)}")
print(f"NR1I2 up genes: {len(nr1i2_up_genes)}")
print(f"NR1I2 down genes: {len(nr1i2_down_genes)}")
NR3C1 Targets: 4042 genes
NR0B1 Targets: 1210 genes
NR1I2 Targets: 631 genes
Dex up genes: 250
Dex down genes: 250
NR0B1 up genes: 250
NR0B1 down genes: 250
NR1I2 up genes: 250
NR1I2 down genes: 250
color_map = {
  'NR3C1': '#009E73',
  'NR0B1': '#FE6100',
  'NR1I2': '#BDB5D5',
  'Dex Up Genes': '#648FFF',
  'Dex Down Genes': '#DC267F',
  'NR0B1 Up Genes': '#648FFF',
  'NR0B1 Down Genes': '#DC267F',
  'NR1I2 Up Genes': '#648FFF',
  'NR1I2 Down Genes': '#DC267F'
}
fig = plt.figure(figsize=(6,6))
vd = venn3(
  [
    tf_gene_sets['NR3C1 Targets'], 
    tf_gene_sets['NR0B1 Targets'], 
    tf_gene_sets['NR1I2 Targets']
  ],
  (
    'NR3C1 ChIP-seq Targets \n(PC12, MCF10A, BEAS2B)', 
    'NR0B1 ChIP-seq Targets \n(mESCs)', 
    'NR1I2 ChIP-seq Targets \n(Liver)'
  ),
  set_colors=(
    color_map['NR3C1'],
    color_map['NR0B1'],
    color_map['NR1I2']
  ),
  alpha=0.5
)

for text in vd.set_labels:
  text.set_fontsize(8)
for text in vd.subset_labels:
  text.set_fontsize(8)

# Adjust label for NR3C1 targets
lbl1 = vd.get_label_by_id("A")
x, y = lbl1.get_position()
lbl1.set_position((x+0.15, y))

# Adjust label for NR0B1 targets
lbl2 = vd.get_label_by_id("B")
x2, y2 = lbl2.get_position()
lbl2.set_position((x2-0.25, y2+0.03))

# Adjust label for NR1I2 targets
lbl2 = vd.get_label_by_id("C")
x2, y2 = lbl2.get_position()
lbl2.set_position((x2, y2+0.01))

plt.gca().set_facecolor('white')
plt.gca().set_axis_on()

fig.savefig("../../publicationfigs/Figure6.pdf", format='pdf', dpi=900)
tf_dex_overlap = {}

fig, axs = plt.subplots(6,3, figsize=(14,20))
fig.set_facecolor('white')
c = 0
for tf in tf_gene_sets.keys(): 
  tf_dex_overlap[f"{tf} - Dex Up Genes"] = tf_gene_sets[tf].intersection(dex_up_genes)
  venn2(
    [tf_gene_sets[tf], set(dex_up_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'Dex Up Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#648FFF'),
    alpha=0.4,
    ax=axs[0][c]
  )
  tf_dex_overlap[f"{tf} - Dex Down Genes"] = tf_gene_sets[tf].intersection(dex_down_genes)
  venn2(
    [tf_gene_sets[tf], set(dex_down_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'Dex Down Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#DC267F'),
    alpha=0.4,
    ax=axs[1][c]
  )
  tf_dex_overlap[f"{tf} - NR0B1 Up Genes"] = tf_gene_sets[tf].intersection(nr0b1_up_genes)
  venn2(
    [tf_gene_sets[tf], set(nr0b1_up_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'NR0B1 Up Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#648FFF'),
    alpha=0.4,
    ax=axs[2][c]
  )
  tf_dex_overlap[f"{tf} - NR0B1 Down Genes"] = tf_gene_sets[tf].intersection(nr0b1_down_genes)
  venn2(
    [tf_gene_sets[tf], set(nr0b1_down_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'NR0B1 Down Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#DC267F'),
    alpha=0.4,
    ax=axs[3][c]
  )
  tf_dex_overlap[f"{tf} - NR1I2 Up Genes"] = tf_gene_sets[tf].intersection(nr1i2_up_genes)
  venn2(
    [tf_gene_sets[tf], set(nr1i2_up_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'NR1I2 Up Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#648FFF'),
    alpha=0.4,
    ax=axs[4][c]
  )
  tf_dex_overlap[f"{tf} - NR1I2 Down Genes"] = tf_gene_sets[tf].intersection(nr1i2_down_genes)
  venn2(
    [tf_gene_sets[tf], set(nr1i2_down_genes)],
    (tf.replace(' Targets', ' ChIP-seq Targets'), 'NR1I2 Down Genes'),
    set_colors=(color_map[tf.split(' Targets')[0]], '#DC267F'),
    alpha=0.4,
    ax=axs[5][c]
  )
  c += 1

fig.savefig("../../publicationfigs/Figure5.pdf", format='pdf', dpi=900)
fig = plt.figure(figsize=(16, 5))
supervenn(
  [
    tf_gene_sets['NR3C1 Targets'],
    tf_gene_sets['NR0B1 Targets'],
    tf_gene_sets['NR1I2 Targets'],
    set(dex_up_genes),
    set(dex_down_genes)
  ],
  [
    'NR3C1 ChIP-seq Targets', 
    'NR0B1 ChIP-seq Targets', 
    'NR1I2 ChIP-seq Targets',
    'Dex Up Genes',
    'Dex Down Genes'
  ],
  chunks_ordering='minimize gaps',
  sets_ordering='minimize gaps',
  widths_minmax_ratio=0.03
)
fig = plt.figure(figsize=(16, 5))
supervenn(
  [
    tf_gene_sets['NR3C1 Targets'],
    tf_gene_sets['NR0B1 Targets'],
    tf_gene_sets['NR1I2 Targets'],
    set(nr0b1_up_genes),
    set(nr0b1_down_genes)
  ],
  [
    'NR3C1 ChIP-seq Targets', 
    'NR0B1 ChIP-seq Targets', 
    'NR1I2 ChIP-seq Targets',
    'NR0B1 Up Genes',
    'NR0B1 Down Genes'
  ],
  chunks_ordering='minimize gaps',
  sets_ordering='minimize gaps',
  widths_minmax_ratio=0.03
)
fig = plt.figure(figsize=(16, 5))
supervenn(
  [
    tf_gene_sets['NR3C1 Targets'],
    tf_gene_sets['NR0B1 Targets'],
    tf_gene_sets['NR1I2 Targets'],
    set(nr1i2_up_genes),
    set(nr1i2_down_genes)
  ],
  [
    'NR3C1 ChIP-seq Targets', 
    'NR0B1 ChIP-seq Targets', 
    'NR1I2 ChIP-seq Targets',
    'NR1I2 Up Genes',
    'NR1I2 Down Genes'
  ],
  chunks_ordering='minimize gaps',
  sets_ordering='minimize gaps',
  widths_minmax_ratio=0.03
)
# Dex down vs targets
print('Dex down')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(dex_down_genes))
# Dex up vs targets
print('Dex up')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(dex_up_genes))

# NR0B1 down vs targets
print('NR0B1 down')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(nr0b1_down_genes))
# NR0B1 up vs targets
print('NR0B1 up')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(nr0b1_up_genes))

# NR1I2 down vs targets
print('NR1I2 down')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(nr1i2_down_genes))
# NR1I2 up vs targets
print('NR1I2 up')
print(tf_gene_sets['NR0B1 Targets'].intersection(tf_gene_sets['NR3C1 Targets']).intersection(tf_gene_sets['NR1I2 Targets']).intersection(nr1i2_up_genes))
Dex down
{'IL6ST', 'NRP1'}
Dex up
set()
NR0B1 down
{'NFKBIA'}
NR0B1 up
set()
NR1I2 down
set()
NR1I2 up
{'FGF1'}

Overlap between all three target gene sets

fig = plt.figure(figsize=(8,8))
fig.set_facecolor('white')
venn3(
  [
    tf_gene_sets['NR3C1 Targets'],
    tf_gene_sets['NR1I2 Targets'],
    set(dex_up_genes)
  ],
  (
    'NR3C1 Targets \n(PC12, MCF10A, BEAS2B)', 
    'NR1I2 Targets \n(Liver)',
    'Dex Up Genes'
  ),
  set_colors=(
    color_map['NR3C1'], 
    color_map['NR1I2'],
    color_map['Dex Up Genes']
  ),
  alpha=0.4
)
<matplotlib_venn._common.VennDiagram at 0x7fb0f273a1c0>
View source code
Submit an issue