maayanlab_bioinformatics.parse package

Submodules

maayanlab_bioinformatics.parse.gmt module

maayanlab_bioinformatics.parse.gmt.gmt_read_dict(fh, parse_gene=<function parse_gene_weight>)[source]

Read .gmt files into a dictionary of the form: { 'term_1 term_2': { gene_1: weight or 1, … }, … }

If your genes are encoded in an unusual way, you can provide your own parse_gene function. The default one supports plain gene names, or gene names with weights separated by non-word/numeric characters.
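As a rough illustration of the dictionary shape and of a custom gene parser, here is a self-contained sketch. It does not call the library: read_gmt_sketch and parse_gene_colon are hypothetical names, the (gene, weight) return value of the parser is an assumption, and joining the first two columns with a space simply mirrors the form shown above.

```python
import io


def parse_gene_colon(gene):
    """Hypothetical custom parser for entries such as 'TP53:0.8'.

    Assumption: a parse_gene callable returns a (gene, weight) pair.
    """
    name, sep, weight = gene.partition(":")
    return name, float(weight) if sep else 1.0


def read_gmt_sketch(fh, parse_gene=parse_gene_colon):
    """Illustrative stand-in for a GMT reader, not the library's code."""
    result = {}
    for line in fh:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term_1, term_2, *genes = fields
        # Assumption: the key joins the first two columns with a space.
        result[f"{term_1} {term_2}"] = dict(parse_gene(g) for g in genes if g)
    return result


example = io.StringIO("TermA\tdescA\tTP53:0.8\tBRCA1\n")
parsed = read_gmt_sketch(example)
print(parsed)
```

In real use, a function like parse_gene_colon would presumably be passed as the parse_gene argument of gmt_read_dict, provided its return value matches what the library expects.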

maayanlab_bioinformatics.parse.gmt.gmt_read_iter(fh, parse_gene=<function parse_gene_weight>)[source]
maayanlab_bioinformatics.parse.gmt.gmt_read_pd(fh, parse_gene=<function parse_gene_weight>)[source]

Read .gmt files directly into a data frame.

maayanlab_bioinformatics.parse.gmt.gmt_write_dict(gmt, fh, serialize_gene_weight_pair=<function _serialize_gene_weight_pair>)[source]

Opposite of gmt_read_dict: write a dictionary to a file pointer. serialize_gene_weight_pair can be used to customize serialization when dealing with weights.

  • it should return the serialized gene,weight pair, or None if the pair should be removed

By default, weights of 0 or NaN are dropped, weights of 1 are written as the bare gene name (crisp), and everything else is written as gene,weight.
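The default rule described above can be sketched as follows. This is not the library's _serialize_gene_weight_pair: serialize_pair_sketch is a hypothetical name, and its (gene, weight) signature is an assumption about how such a callable is invoked.

```python
def serialize_pair_sketch(gene, weight):
    """Hypothetical serializer following the documented default rule."""
    # NaN is the only value that is not equal to itself.
    if weight is None or weight == 0 or weight != weight:
        return None  # dropped from the output
    if weight == 1:
        return gene  # crisp membership: gene name only
    return f"{gene},{weight}"


weights = {"TP53": 1, "BRCA1": 0.5, "EGFR": 0, "MYC": float("nan")}
serialized = [serialize_pair_sketch(g, w) for g, w in weights.items()]
kept = [entry for entry in serialized if entry is not None]
print(kept)
```

A custom serializer could follow the same pattern, for example returning None for weights below some threshold so that weak associations are left out of the written file.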

maayanlab_bioinformatics.parse.gmt.gmt_write_pd(df, fh, serialize_gene_weight_pair=<function _serialize_gene_weight_pair>)[source]

Write a pandas DataFrame as a .gmt file, where rows are genes and columns are terms. See gmt_write_dict for more information.
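To make the expected orientation concrete, the sketch below builds a small frame with genes as rows and terms as columns, then converts it to a per-term dictionary. dataframe_to_term_dict is a hypothetical helper written for illustration, and dropping zero or missing weights here is an assumption modeled on the default described for gmt_write_dict.

```python
import pandas as pd

# Rows are genes, columns are terms; values are membership weights.
df = pd.DataFrame(
    {"TermA": [1.0, 0.5, 0.0], "TermB": [0.0, 1.0, 1.0]},
    index=["TP53", "BRCA1", "EGFR"],
)


def dataframe_to_term_dict(frame):
    """Hypothetical helper: map each term to its {gene: weight} members."""
    return {
        term: {
            gene: float(weight)
            for gene, weight in column.items()
            if pd.notna(weight) and weight != 0
        }
        for term, column in frame.items()
    }


term_dict = dataframe_to_term_dict(df)
print(term_dict)
```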

maayanlab_bioinformatics.parse.gmt.parse_gene_unweighted(gene)[source]

A helper to parse an unweighted gmt gene entry.

maayanlab_bioinformatics.parse.gmt.parse_gene_weight(gene)[source]

A helper to parse a gmt gene entry that may carry a numeric weight.
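One way such a weight-aware parser might work is sketched below. It is not the library's implementation: parse_gene_weight_sketch is a hypothetical name, the regular expression is one possible reading of "separated by non-word/numeric characters", and the (gene, weight) return value is an assumption.

```python
import re

# Gene name, then one or more separator characters, then a number.
_GENE_WEIGHT = re.compile(r"^([\w.-]+?)[^\w.-]+(-?\d+(?:\.\d+)?)$")


def parse_gene_weight_sketch(gene):
    """Hypothetical parser: return (gene, weight), defaulting to weight 1."""
    gene = gene.strip()
    match = _GENE_WEIGHT.match(gene)
    if match:
        return match.group(1), float(match.group(2))
    return gene, 1  # no weight found: treat as crisp membership


print(parse_gene_weight_sketch("TP53,0.8"))
print(parse_gene_weight_sketch("BRCA1"))
```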

maayanlab_bioinformatics.parse.suerat module

maayanlab_bioinformatics.parse.suerat.suerat_load(base_dir)[source]

Files prepared for Seurat are quite common. This function loads them given the directory that contains barcodes.tsv.gz, features.tsv.gz, and matrix.tsv.gz.
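Before loading, it can help to confirm that a directory actually has the layout described above. The sketch below is a standalone check using only the standard library; missing_input_files is a hypothetical helper, and the file names are taken verbatim from the description.

```python
from pathlib import Path
import tempfile

# File names as listed in the description above.
EXPECTED_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.tsv.gz")


def missing_input_files(base_dir):
    """Hypothetical helper: list expected files that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# Demonstration on a temporary directory holding only one of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "barcodes.tsv.gz").touch()
    missing = missing_input_files(tmp)

print(missing)
```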

maayanlab_bioinformatics.parse.suerat.suerat_load_multiple(base_dirs)[source]

Sets of Seurat directories that are meant to be analyzed together are quite common. Providing all those directories to this function (much like load_suerat_files) will load each individually and return a merged version that captures the filename in the barcodes.
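The idea of keeping barcodes distinguishable after a merge can be illustrated with plain Python. This is only a sketch of the concept: tag_barcodes is a hypothetical helper, and both the use of the directory name as the label and the ":" separator are assumptions, not the library's documented format.

```python
import os


def tag_barcodes(base_dir, barcodes, sep=":"):
    """Hypothetical helper: prefix each barcode with its directory name."""
    label = os.path.basename(os.path.normpath(base_dir))
    return [f"{label}{sep}{barcode}" for barcode in barcodes]


# Two made-up samples that share a barcode.
samples = {
    "data/sample_a": ["AAAC-1", "AAAG-1"],
    "data/sample_b": ["AAAC-1"],
}
merged = [
    tagged
    for base_dir, barcodes in samples.items()
    for tagged in tag_barcodes(base_dir, barcodes)
]
print(merged)
```

Tagging of this kind keeps identical barcodes from different directories from colliding once the data sets are combined.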

Module contents

This module contains functions for parsing files into ready-to-go formats.