maayanlab_bioinformatics.utils package

Submodules

maayanlab_bioinformatics.utils.chunked module

The chunked module provides helper functions for manipulating ndarrays in chunks. This is especially useful when working with h5py matrices, since operations that respect chunk boundaries avoid excessive random disk access.

maayanlab_bioinformatics.utils.chunked.chunk_applymap(func, x, *, out=None, chunks=None, progress=False)[source]

Apply function to all elements in a matrix in chunks

Parameters:
  • func – The function to apply to each chunk

  • x – The matrix to apply it to

  • out – The matrix to write to (pass the variable to out for an in-place operation)

  • chunks – The shape of the chunks in each dimension. Can be inferred for h5py arrays from the actual chunks on disk; an integer value is used as a multiple of those chunks.

  • progress – Show tqdm progress bar or not

Returns:

The augmented matrix (or the original matrix, augmented)
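The chunk-by-chunk pattern can be illustrated with a minimal pure-Python sketch on a one-dimensional list. The name chunk_applymap_1d and its integer chunk argument are hypothetical and not part of the library, which operates on ndarrays; the sketch only shows how func sees one chunk at a time and how out selects between a fresh result and an in-place update.

```python
def chunk_applymap_1d(func, x, *, out=None, chunk=3):
    """Hypothetical 1-D analogue: apply func to x one chunk at a time."""
    if out is None:
        # No output buffer given: allocate a fresh one of the same length.
        out = [None] * len(x)
    for start in range(0, len(x), chunk):
        stop = min(start + chunk, len(x))
        # func receives a whole chunk and must return a chunk of equal length.
        out[start:stop] = func(x[start:stop])
    return out


def double(chunk):
    return [2 * v for v in chunk]


data = list(range(7))
fresh = chunk_applymap_1d(double, data)           # data is left untouched
same = chunk_applymap_1d(double, data, out=data)  # data is overwritten in place
```

Passing the input itself as out trades the extra allocation for losing the original values, which is usually the point when the matrix lives on disk.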

maayanlab_bioinformatics.utils.chunked.chunk_infer(x, chunks=None)[source]

Helper function for interpreting the chunks param with respect to a matrix x

Parameters:
  • x – The matrix (ndarray)

  • chunks – The chunks parameter:

      • if None (default): try to infer from the chunks attribute (h5py)

      • if int: use a multiple of the inferred chunks attribute, or alternatively that size in each dimension

      • if tuple: use the explicit chunks provided for slicing

Returns:

The chunks parameter as a tuple
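One possible reading of the three cases is sketched below in plain Python. The function infer_chunks_sketch and its native_chunks argument (standing in for an h5py dataset's chunks attribute) are hypothetical, and raising an error when nothing can be inferred is an assumption rather than documented behaviour.

```python
def infer_chunks_sketch(shape, native_chunks=None, chunks=None):
    """Hypothetical reading of the three cases for the chunks parameter.

    shape: shape of the matrix; native_chunks: on-disk chunk shape, or None.
    """
    if isinstance(chunks, tuple):
        # Explicit chunk shape: use it as given.
        return chunks
    if chunks is None:
        if native_chunks is None:
            # Assumption: without on-disk chunks there is nothing to infer.
            raise ValueError("chunks cannot be inferred; pass an int or a tuple")
        return tuple(native_chunks)
    # Integer: a multiple of the native chunks when they exist,
    # otherwise that size in every dimension.
    if native_chunks is not None:
        return tuple(chunks * c for c in native_chunks)
    return (chunks,) * len(shape)
```

For a matrix of shape (100, 50) stored in (10, 5) chunks, chunks=2 would give (20, 10) under this reading, so each step still covers whole on-disk chunks.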

maayanlab_bioinformatics.utils.chunked.chunk_slices(shape, chunks, progress=False)[source]

Return slices to chunk through an ndarray.

Parameters:
  • shape – The shape of the ndarray or size in 1d.

  • chunks – The shape of the chunks or size in all dimensions.

  • progress – Show tqdm progress bar or not

Returns:

Iterator[slice(start, stop) for each dimension in shape]

Usage:

import numpy as np
from maayanlab_bioinformatics.utils.chunked import chunk_slices

N = np.arange(10)
[N[s] for s in chunk_slices(len(N), 3)]

I = np.eye(10)
[I[i, j] for i, j in chunk_slices(I.shape, 3)]
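How such slices can be generated is sketched below with the standard library only. chunk_slices_sketch is a hypothetical stand-in, not the library's implementation; in particular it always yields a tuple of slices, even in one dimension, which is an assumption made for simplicity.

```python
import itertools


def chunk_slices_sketch(shape, chunks):
    """Yield one tuple of slices per chunk, covering every dimension of shape."""
    if isinstance(shape, int):
        shape = (shape,)
    if isinstance(chunks, int):
        chunks = (chunks,) * len(shape)
    # Per dimension, list the slices that tile it; the last one may be shorter.
    per_dim = [
        [slice(start, min(start + step, size)) for start in range(0, size, step)]
        for size, step in zip(shape, chunks)
    ]
    # The Cartesian product visits every combination of per-dimension slices.
    yield from itertools.product(*per_dim)


one_d = list(chunk_slices_sketch(10, 3))
two_d = list(chunk_slices_sketch((4, 5), 2))
```

Here one_d holds four one-element tuples, the last covering only index 9, and two_d holds the 2 x 3 = 6 blocks that tile a 4 x 5 matrix.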

maayanlab_bioinformatics.utils.chunked.tqdm(it, **kwargs)

maayanlab_bioinformatics.utils.describe module

Descriptive statistics on things that aren’t pandas data frames. This can often be a lot more efficient.

maayanlab_bioinformatics.utils.describe.np_describe(x, axis=0, *, percentiles=[25, 50, 75]) → Dict[str, array][source]

Like pandas Series.describe() but operating on numpy arrays / matrices. This can be a lot faster especially when working with h5py or sparse data frames.

Parameters:
  • x – The numpy array to describe

  • axis – The axis along which to compute the description

Returns:

A dictionary mapping metric name to results
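To make the shape of such a result concrete, here is a one-dimensional sketch using only the standard library. The function names and the dictionary keys (modelled on pandas Series.describe()) are assumptions; the library's actual keys and its handling of axis may differ.

```python
import math
import statistics


def percentile_linear(sorted_values, q):
    """Percentile q (0-100) with linear interpolation between neighbours."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def describe_sketch(values, percentiles=(25, 50, 75)):
    """Hypothetical 1-D summary keyed like pandas Series.describe()."""
    ordered = sorted(values)
    summary = {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation (ddof=1)
        "min": ordered[0],
    }
    for q in percentiles:
        summary[f"{q}%"] = percentile_linear(ordered, q)
    summary["max"] = ordered[-1]
    return summary


stats = describe_sketch([1, 2, 3, 4, 5])
```

Working directly on the values avoids building a data frame first, which is the efficiency argument made above.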

maayanlab_bioinformatics.utils.fetch_save_read module

maayanlab_bioinformatics.utils.fetch_save_read.fetch_save_read(url, file, reader=<function read_csv>, sep=', ', **kwargs)[source]

Download file from {url}, save it to {file}, and subsequently read it with {reader} using pandas options on {**kwargs}.
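The download, save, read sequence can be sketched with the standard library as follows. fetch_save_read_sketch and read_rows are hypothetical names, the csv-based reader stands in for pandas read_csv, and skipping the download when the file already exists is an assumption, not documented behaviour.

```python
import csv
import pathlib
import tempfile
import urllib.request


def read_rows(path, sep=","):
    """Hypothetical reader: parse a delimited text file into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter=sep))


def fetch_save_read_sketch(url, file, reader=read_rows, **kwargs):
    """Download url to file unless file already exists, then read file."""
    target = pathlib.Path(file)
    if not target.exists():  # assumption: reuse an earlier download
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
    return reader(target, **kwargs)


# Demonstration with a local file:// URL so that no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "source.csv"
    source.write_text("gene,score\nTP53,1.5\n")
    cached = pathlib.Path(tmp) / "cache" / "copy.csv"
    rows = fetch_save_read_sketch(source.as_uri(), cached)
    was_saved = cached.exists()
```

Extra keyword arguments are forwarded to the reader, mirroring how {**kwargs} is described above.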

maayanlab_bioinformatics.utils.merge module

maayanlab_bioinformatics.utils.merge.merge(*dfs, **kwargs)[source]

Helper function for many trivial (index based) joins.

Deprecated: Use pd.concat([dfs], axis=1) instead

maayanlab_bioinformatics.utils.sparse module

maayanlab_bioinformatics.utils.sparse.sp_hdf_dump(hdf, sdf, **kwargs)[source]

Dump Sparse Pandas DataFrame to h5py object.

Usage:

import h5py
import pandas as pd
import scipy.sparse as sp_sparse
from maayanlab_bioinformatics.utils.sparse import sp_hdf_dump

# write
f = h5py.File('sparse.h5', 'w')
sdf = pd.DataFrame.sparse.from_spmatrix(sp_sparse.eye(3))
sp_hdf_dump(f, sdf)
f.close()

maayanlab_bioinformatics.utils.sparse.sp_hdf_load(hdf)[source]

Load Sparse Pandas DataFrame from h5py object.

Usage:

import h5py
from maayanlab_bioinformatics.utils.sparse import sp_hdf_load

# read
f = h5py.File('sparse.h5', 'r')
sdf = sp_hdf_load(f)
f.close()

maayanlab_bioinformatics.utils.sparse.sp_nanpercentile(sp, q, axis=None, method='linear')[source]

nanpercentile for a sparse matrix. In essence, np.percentile is applied to the underlying data.

maayanlab_bioinformatics.utils.sparse.sp_std(X_ij, ddof=1)[source]

Standard deviation for a matrix compatible with sparse matrices. i is the row index, j is the column index.

\sigma_j = \sqrt{\frac{\sum_i (x_{ij} - \mu_j)^2}{N_j - ddof}}
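Read literally, the formula gives one standard deviation per column. The dense, pure-Python sketch below follows it term by term; column_std_sketch is a hypothetical name, and taking N_j to be the number of rows and mu_j the column mean are assumptions about the notation.

```python
import math


def column_std_sketch(rows, ddof=1):
    """Standard deviation of each column j of a dense matrix given as rows."""
    n = len(rows)  # assumption: N_j is the number of rows for every column
    result = []
    for column in zip(*rows):
        mu = sum(column) / n
        squared_dev = sum((x - mu) ** 2 for x in column)
        result.append(math.sqrt(squared_dev / (n - ddof)))
    return result


sigma = column_std_sketch([[1, 2], [3, 4], [5, 6]])
population_sigma = column_std_sketch([[1, 2], [3, 4], [5, 6]], ddof=0)
```

With ddof=1 the denominator is N_j - 1 (the sample estimate); ddof=0 divides by N_j and gives the population value.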

Module contents

This module contains general utility functions for convenient analysis.