maayanlab_bioinformatics.parse package
Submodules
maayanlab_bioinformatics.parse.gmt module
- maayanlab_bioinformatics.parse.gmt.gmt_read_dict(fh, parse_gene=<function parse_gene_weight>)
Read .gmt files into a dictionary of the form: { 'term_1 term_2': { gene_1: weight or 1, … }, … }
If your genes are encoded in an unusual way, you can provide your own parse_gene function. The default supports plain gene names, or gene names with weights separated by non-word/numeric characters.
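The sketch below illustrates the dictionary layout using only the standard library; it is not the library's implementation. The names parse_gene_weight_sketch and gmt_read_dict_sketch are hypothetical, and two details are assumptions: the regular expression used to split a gene from its weight, and the tab used to join the first two GMT columns (term and description) into the key.

```python
import io
import re
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical default parser: "TP53" -> ("TP53", 1.0), "STAT3,0.25" -> ("STAT3", 0.25).
# Assumption: the weight is a trailing number separated from the gene by characters
# other than word characters, '.' or '-' (so names like "HLA-A" stay intact).
_GENE_WEIGHT = re.compile(r'(.+?)[^\w.-]+(-?\d+(?:\.\d+)?(?:[eE]-?\d+)?)')


def parse_gene_weight_sketch(token: str) -> Tuple[str, float]:
    match = _GENE_WEIGHT.fullmatch(token)
    if match is None:
        return token, 1.0  # crisp gene: implicit weight of 1
    return match.group(1), float(match.group(2))


def gmt_read_dict_sketch(
    lines: Iterable[str],
    parse_gene: Callable[[str], Tuple[str, float]] = parse_gene_weight_sketch,
) -> Dict[str, Dict[str, float]]:
    result: Dict[str, Dict[str, float]] = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        # GMT row: term, description, then one gene (or gene/weight pair) per column.
        term_1, term_2, *genes = line.split('\t')
        result[f'{term_1}\t{term_2}'] = dict(parse_gene(g) for g in genes if g)
    return result


gmt = gmt_read_dict_sketch(io.StringIO('pathway_a\tdesc a\tTP53\tSTAT3,0.25\nset_b\t\tHLA-A\n'))
print(gmt['pathway_a\tdesc a'])  # {'TP53': 1.0, 'STAT3': 0.25}
```

Swapping the parser, for example parse_gene=lambda token: (token.upper(), 1.0), changes how each gene column is interpreted without touching the line-splitting logic.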
- maayanlab_bioinformatics.parse.gmt.gmt_read_iter(fh, parse_gene=<function parse_gene_weight>)
- maayanlab_bioinformatics.parse.gmt.gmt_read_pd(fh, parse_gene=<function parse_gene_weight>)
Read .gmt files directly into a data frame.
- maayanlab_bioinformatics.parse.gmt.gmt_write_dict(gmt, fh, serialize_gene_weight_pair=<function _serialize_gene_weight_pair>)
The inverse of gmt_read_dict: writes a dictionary to a file handle. serialize_gene_weight_pair can be used to customize serialization when dealing with weights; it should return the serialized gene,weight pair, or None if the pair should be dropped.
By default, zeros and NaNs are dropped, a weight of 1 produces just the gene name (crisp), and any other weight is written as gene,weight.
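A sketch of the default serialization rules described above, assuming weights are plain Python numbers. serialize_gene_weight_pair_sketch and gmt_write_dict_sketch are hypothetical names, and padding keys that lack a description column with an empty one is an assumption about how the two leading GMT columns are kept valid.

```python
import io
import math
from typing import Callable, Dict, Optional, TextIO


def serialize_gene_weight_pair_sketch(gene: str, weight: float) -> Optional[str]:
    # 0 and NaN: drop the gene from the set entirely.
    if weight == 0 or math.isnan(weight):
        return None
    # Exactly 1: crisp membership, written as the bare gene name.
    if weight == 1:
        return gene
    # Anything else keeps its weight as "gene,weight".
    return f'{gene},{weight}'


def gmt_write_dict_sketch(
    gmt: Dict[str, Dict[str, float]],
    fh: TextIO,
    serialize: Callable[[str, float], Optional[str]] = serialize_gene_weight_pair_sketch,
) -> None:
    for term, gene_weights in gmt.items():
        # Assumption: keys without a tab get an empty description column.
        head = term if '\t' in term else f'{term}\t'
        cells = [head]
        for gene, weight in gene_weights.items():
            serialized = serialize(gene, weight)
            if serialized is not None:  # None means "leave this gene out"
                cells.append(serialized)
        fh.write('\t'.join(cells) + '\n')


buf = io.StringIO()
gmt_write_dict_sketch({'set_a\tdesc': {'TP53': 1, 'STAT3': 0.25, 'MYC': 0, 'EGFR': float('nan')}}, buf)
print(repr(buf.getvalue()))  # 'set_a\tdesc\tTP53\tSTAT3,0.25\n'
```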
- maayanlab_bioinformatics.parse.gmt.gmt_write_pd(df, fh, serialize_gene_weight_pair=<function _serialize_gene_weight_pair>)
Write a pandas DataFrame as a GMT file, where rows are genes and columns are terms. See gmt_write_dict for more information.
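The orientation matters: each column (term) becomes one GMT line, and each row (gene) contributes an entry to the lines where it has a weight. This standard-library sketch, using the hypothetical name matrix_to_gmt_dict, turns a genes × terms table into the {term: {gene: weight}} shape that gmt_write_dict takes; zeros are kept here and left for the serializer to drop. With pandas, DataFrame.to_dict() with its default orient='dict' yields the same nested {column: {index: value}} shape, though whether gmt_write_pd relies on it is not documented.

```python
from typing import Dict, Sequence


def matrix_to_gmt_dict(
    genes: Sequence[str],
    terms: Sequence[str],
    values: Sequence[Sequence[float]],
) -> Dict[str, Dict[str, float]]:
    """Rows are genes, columns are terms; returns {term: {gene: weight}}."""
    if len(values) != len(genes):
        raise ValueError('need one row of values per gene')
    gmt: Dict[str, Dict[str, float]] = {term: {} for term in terms}
    for gene, row in zip(genes, values):
        if len(row) != len(terms):
            raise ValueError(f'row for {gene} has {len(row)} values, expected {len(terms)}')
        # Transpose on the fly: each cell lands under its column's term.
        for term, weight in zip(terms, row):
            gmt[term][gene] = weight
    return gmt


gmt = matrix_to_gmt_dict(
    genes=['TP53', 'MYC'],
    terms=['set_a', 'set_b'],
    values=[[1, 0], [0.5, 1]],
)
print(gmt)  # {'set_a': {'TP53': 1, 'MYC': 0.5}, 'set_b': {'TP53': 0, 'MYC': 1}}
```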
maayanlab_bioinformatics.parse.suerat module
- maayanlab_bioinformatics.parse.suerat.suerat_load(base_dir)
Files prepared for Seurat are quite common; this function loads them given the directory that contains barcodes.tsv.gz, features.tsv.gz, and matrix.tsv.gz.
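A partial sketch of the first step: checking that the three expected files exist and reading the barcode and feature labels from the gzipped TSVs. The helper names are hypothetical, taking the first tab-separated column as the label is an assumption, and the matrix file is only checked for, not parsed, because its layout is not described here.

```python
import gzip
import os
import tempfile
from typing import List, Tuple

# File names taken from the description above.
REQUIRED_FILES = ('barcodes.tsv.gz', 'features.tsv.gz', 'matrix.tsv.gz')


def read_first_column(path: str) -> List[str]:
    """Return the first tab-separated field of every non-empty line (assumed label)."""
    with gzip.open(path, 'rt') as fh:
        return [line.rstrip('\n').split('\t')[0] for line in fh if line.strip()]


def read_seurat_labels(base_dir: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: verify the directory layout, then load barcodes and features."""
    missing = [name for name in REQUIRED_FILES
               if not os.path.exists(os.path.join(base_dir, name))]
    if missing:
        raise FileNotFoundError(f'{base_dir} is missing: {", ".join(missing)}')
    barcodes = read_first_column(os.path.join(base_dir, 'barcodes.tsv.gz'))
    features = read_first_column(os.path.join(base_dir, 'features.tsv.gz'))
    return barcodes, features


# Usage with a throwaway directory containing tiny placeholder files.
with tempfile.TemporaryDirectory() as base_dir:
    contents = {
        'barcodes.tsv.gz': 'AAAC-1\nAAAG-1\n',
        'features.tsv.gz': 'gene_1\tTP53\n',
        'matrix.tsv.gz': '',
    }
    for name, text in contents.items():
        with gzip.open(os.path.join(base_dir, name), 'wt') as fh:
            fh.write(text)
    barcodes, features = read_seurat_labels(base_dir)
    print(barcodes, features)  # ['AAAC-1', 'AAAG-1'] ['gene_1']
```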
- maayanlab_bioinformatics.parse.suerat.suerat_load_multiple(base_dirs)
Sets of Seurat directories that are meant to be analyzed together are also common. Given all of those directories, this function loads each one individually (much like load_suerat_files) and returns a merged version whose barcodes capture the filename they came from.
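One plausible way to keep barcodes distinguishable after merging is to prefix each one with the name of the directory it came from, since the same barcode sequence can appear in several samples. The sketch below assumes each directory has already been loaded into a {barcode: {feature: count}} mapping; the function name merge_samples and the 'directory:barcode' label format are hypothetical, not the library's documented output.

```python
import os
from typing import Dict

Counts = Dict[str, int]


def merge_samples(samples: Dict[str, Dict[str, Counts]]) -> Dict[str, Counts]:
    """samples maps base_dir -> {barcode: {feature: count}}."""
    merged: Dict[str, Counts] = {}
    for base_dir, cells in samples.items():
        # The directory's own name (trailing slash ignored) tags every barcode.
        label = os.path.basename(os.path.normpath(base_dir))
        for barcode, counts in cells.items():
            key = f'{label}:{barcode}'
            if key in merged:
                raise ValueError(f'duplicate barcode after prefixing: {key}')
            merged[key] = dict(counts)
    return merged


merged = merge_samples({
    'data/sample_1/': {'AAAC-1': {'TP53': 3}},
    'data/sample_2': {'AAAC-1': {'TP53': 1, 'MYC': 2}},
})
print(sorted(merged))  # ['sample_1:AAAC-1', 'sample_2:AAAC-1']
```

Raising on a collision, rather than silently overwriting, surfaces the case where two input directories share the same final path component.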
Module contents
This module contains functions for parsing files into ready-to-use formats.