maayanlab_bioinformatics.utils package¶
Submodules¶
maayanlab_bioinformatics.utils.chunked module¶
The chunked module provides helper functions for manipulating ndarrays in chunks. This is especially useful when working with h5py matrices, since operations that respect chunk boundaries avoid excessive random disk access.
- maayanlab_bioinformatics.utils.chunked.chunk_applymap(func, x, *, out=None, chunks=None, progress=False)[source]¶
Apply function to all elements in a matrix in chunks
- Parameters:
func – The function to apply to each chunk
x – The matrix to apply it to
out – The matrix to write to (pass x itself as out to operate in place)
chunks – The shape of the chunks in each dimension. For h5py arrays it can be inferred from the actual chunks on disk; an integer is interpreted as a multiple of the inferred chunks.
progress – Show tqdm progress bar or not
- Returns:
The transformed matrix (or the original matrix, transformed in place)
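The behaviour described above can be approximated with a few lines of NumPy. This is a minimal sketch and not the library's implementation: `chunk_applymap_sketch` is a hypothetical name, the float output array allocated when `out` is omitted is an assumption, and so is the fallback to a single chunk covering the whole array when no chunk shape is available.

```python
import itertools
import numpy as np

def chunk_applymap_sketch(func, x, *, out=None, chunks=None):
    """Hypothetical stand-in: apply func to x one chunk at a time."""
    if out is None:
        # Assumption: results are written to a new float array
        out = np.zeros(x.shape, dtype=float)
    if chunks is None:
        # h5py datasets expose .chunks; plain ndarrays do not
        chunks = getattr(x, "chunks", None) or x.shape
    # Starting offsets of the chunks along each dimension
    starts = [range(0, size, step) for size, step in zip(x.shape, chunks)]
    for corner in itertools.product(*starts):
        # Slices running past the edge are clamped by NumPy
        block = tuple(slice(s, s + step) for s, step in zip(corner, chunks))
        out[block] = func(x[block])
    return out

x = np.arange(12, dtype=float).reshape(3, 4)
# Write the result to a new array
y = chunk_applymap_sketch(np.sqrt, x, chunks=(2, 2))
# Pass x as out to overwrite it in place
chunk_applymap_sketch(lambda block: block * 2, x, out=x, chunks=(2, 2))
```

Because each chunk is read and written exactly once, passing the input as `out` is safe for element-wise functions.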
- maayanlab_bioinformatics.utils.chunked.chunk_infer(x, chunks=None)[source]¶
Helper function for interpreting the chunks param with respect to a matrix x
- Parameters:
x – The matrix (ndarray)
chunks – The chunks parameter:
if None (default): Try to infer from the chunks attribute (h5py)
if int: Use a multiple of the inferred chunks attribute, or alternatively that size in each dimension
if tuple: Use the explicit chunks provided for slicing
- Returns:
The chunks parameter as a tuple
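The three cases for the chunks parameter can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not the library's code: `infer_chunks_sketch` is a hypothetical name, and falling back to the full shape when nothing can be inferred is an assumption, since the documentation only says the function will "try" to infer.

```python
from types import SimpleNamespace

def infer_chunks_sketch(x, chunks=None):
    """Hypothetical re-implementation of the rules described above."""
    # h5py datasets expose .chunks (a tuple or None); ndarrays have no such attribute
    on_disk = getattr(x, "chunks", None)
    if chunks is None:
        # Assumption: with nothing to infer, use one chunk covering the whole array
        return tuple(on_disk) if on_disk is not None else tuple(x.shape)
    if isinstance(chunks, int):
        if on_disk is not None:
            # Integer: a multiple of the on-disk chunk shape
            return tuple(c * chunks for c in on_disk)
        # No on-disk chunks: use that size in every dimension
        return tuple(chunks for _ in x.shape)
    # Tuple: taken as given
    return tuple(chunks)

# Stand-ins for an h5py dataset and a plain ndarray (only the attributes used here)
h5_like = SimpleNamespace(shape=(1000, 200), chunks=(100, 50))
plain = SimpleNamespace(shape=(1000, 200))

inferred = infer_chunks_sketch(h5_like)         # from the chunks attribute
doubled = infer_chunks_sketch(h5_like, 2)       # multiple of the inferred chunks
square = infer_chunks_sketch(plain, 64)         # same size in each dimension
explicit = infer_chunks_sketch(plain, (10, 20)) # explicit tuple
```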
- maayanlab_bioinformatics.utils.chunked.chunk_slices(shape, chunks, progress=False)[source]¶
Return slices to chunk through an ndarray.
- Parameters:
shape – The shape of the ndarray or size in 1d.
chunks – The shape of the chunks or size in all dimensions.
progress – Show tqdm progress bar or not
- Returns:
Iterator[slice(start, stop) for each dimension in shape]
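Usage:
    import numpy as np
    from maayanlab_bioinformatics.utils.chunked import chunk_slices
    # 1d: pass a size, get one slice per chunk
    N = np.arange(10)
    [N[s] for s in chunk_slices(len(N), 3)]
    # 2d: pass a shape, get one slice per dimension for each chunk
    I = np.eye(10)
    [I[i, j] for i, j in chunk_slices(I.shape, 3)]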
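To make the iteration order and the handling of the ragged final chunk concrete, here is a pure-Python sketch of how such slices could be generated. `chunk_slices_sketch` is a hypothetical name, and clamping each stop to the array bounds is an assumption; the library's function may differ in both respects.

```python
import itertools

def chunk_slices_sketch(shape, chunks):
    """Hypothetical stand-in: yield slices tiling an array of the given shape."""
    if isinstance(shape, int):
        # 1d case: a size and a chunk length give one slice per chunk
        for start in range(0, shape, chunks):
            yield slice(start, min(start + chunks, shape))
        return
    if isinstance(chunks, int):
        # A single integer means the same chunk size in all dimensions
        chunks = (chunks,) * len(shape)
    # All slices along each dimension, with the last one clamped to the bound
    per_dim = [
        [slice(start, min(start + step, size)) for start in range(0, size, step)]
        for size, step in zip(shape, chunks)
    ]
    # The Cartesian product visits every tile, last dimension varying fastest
    yield from itertools.product(*per_dim)

data = list(range(10))
pieces = [data[s] for s in chunk_slices_sketch(len(data), 3)]
tiles = list(chunk_slices_sketch((4, 5), (2, 3)))
```

Here `pieces` holds three full chunks followed by a final chunk of one element, and `tiles` holds four tuples of two slices each.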
- maayanlab_bioinformatics.utils.chunked.tqdm(it, **kwargs)¶
maayanlab_bioinformatics.utils.describe module¶
Descriptive statistics on things that aren’t pandas data frames. This can often be a lot more efficient.
- maayanlab_bioinformatics.utils.describe.np_describe(x, axis=0, *, percentiles=[25, 50, 75]) → Dict[str, array] [source]¶
Like pandas Series.describe() but operating on numpy arrays / matrices. This can be a lot faster especially when working with h5py or sparse data frames.
- Parameters:
x – The numpy array to describe
axis – The axis along which to compute the statistics
percentiles – The percentiles to compute (default: [25, 50, 75])
- Returns:
A dictionary mapping metric name to results
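As an illustration of the kind of result to expect, the sketch below computes describe-style statistics directly with NumPy. It is not the library's implementation: `describe_sketch` is a hypothetical name, and the key names, the scalar count, and the use of the sample standard deviation (`ddof=1`, as pandas does) are assumptions modelled on `Series.describe()`.

```python
import numpy as np

def describe_sketch(x, axis=0, *, percentiles=(25, 50, 75)):
    """Hypothetical stand-in: describe-style statistics along one axis."""
    x = np.asarray(x)
    stats = {
        "count": x.shape[axis],             # assumption: no missing-value handling
        "mean": x.mean(axis=axis),
        "std": x.std(axis=axis, ddof=1),    # sample standard deviation, as in pandas
        "min": x.min(axis=axis),
    }
    # np.percentile returns one row of results per requested percentile
    for p, values in zip(percentiles, np.percentile(x, percentiles, axis=axis)):
        stats[f"{p}%"] = values
    stats["max"] = x.max(axis=axis)
    return stats

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
summary = describe_sketch(x, axis=0)
```

Each value in `summary` other than the count is an array with one entry per column of `x`.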
maayanlab_bioinformatics.utils.fetch_save_read module¶
maayanlab_bioinformatics.utils.merge module¶
maayanlab_bioinformatics.utils.sparse module¶
- maayanlab_bioinformatics.utils.sparse.sp_hdf_dump(hdf, sdf, **kwargs)[source]¶
Dump Sparse Pandas DataFrame to h5py object.
Usage:
    import h5py
    import pandas as pd
    import scipy.sparse as sp_sparse
    from maayanlab_bioinformatics.utils.sparse import sp_hdf_dump
    # write
    f = h5py.File('sparse.h5', 'w')
    sdf = pd.DataFrame.sparse.from_spmatrix(sp_sparse.eye(3))
    sp_hdf_dump(f, sdf)
    f.close()
- maayanlab_bioinformatics.utils.sparse.sp_hdf_load(hdf)[source]¶
Load Sparse Pandas DataFrame from h5py object.
Usage:
    import h5py
    import pandas as pd
    import scipy.sparse as sp_sparse
    from maayanlab_bioinformatics.utils.sparse import sp_hdf_load
    # read
    f = h5py.File('sparse.h5', 'r')
    sdf = sp_hdf_load(f)
    f.close()
Module contents¶
This module contains general utility functions for convenient analysis.