hist utils
A collection of useful basic functions for processing histograms.
Functionality includes: - rebinning, cropping and normalization - moment calculation - averaging over neighbouring histograms - smoothing over neighbouring bins - higher-level functions preparing data for ML training, starting from a dataframe or input csv file.
crophists
full signature:
def crophists(hists, slices=None)
comments:
perform cropping on a set of histograms
input arguments:
- hists: a numpy array of shape (nhistograms,nbins) for 1D or (nhistograms,nybins,nxbins) for 2D
- slices: a slice object (builtin python type) or a list of two slices (for 2D)
notes:
- a slice can be created using the builtin python syntax 'slice(start,stop,step)',
and the syntax 'list[slice]' is equivalent to 'list[start:stop:step]'.
use 'None' to ignore one of the arguments for slice creation (equivalent to ':' in direct slicing)
- for 1D histograms, slices can be either a slice object or a list of length 1 containing a single slice.
example usage:
- see tutorials/plot_histograms_2d.ipynb
returns:
- a numpy array containing the same histograms as input but cropped according to the slices argument
get_cropslices_from_str
full signature:
def get_cropslices_from_str(slicestr)
comments:
get a collection of slices from a string (e.g. argument in gui)
note: the resulting slices are typically passed to crophists (see above)
input arguments:
- slicestr: string representation of slices
e.g. '0:6:2' for slice(0,6,2)
e.g. '0:6:2,1:5:2' for [slice(0,6,2),slice(1,5,2)]
rebinhists
full signature:
def rebinhists(hists, factor=None)
comments:
perform rebinning on a set of histograms
input arguments:
- hists: a numpy array of shape (nhistograms,nbins) for 1D or (nhistograms,nybins,nxbins) for 2D
- factor: the rebinning factor (for 1D), or a tuple of (y axis rebinning factor, x axis rebinning factor) (for 2D)
note: the rebinning applied here is simple summing of bin contents,
and the rebinning factors must be divisors of the respective number of bins!
example usage:
- see tutorials/plot_histograms_2d.ipynb
returns:
- a numpy array containing the same histograms as input but rebinned according to the factor argument
get_rebinningfactor_from_str
full signature:
def get_rebinningfactor_from_str(factstr)
comments:
get a valid rebinning factor (int or tuple) from a string (e.g. argument in gui)
note: the resulting factor is typically passed to rebinhists (see above)
input arguments:
- factstr: string representation of rebinning factor
e.g. '4' for 4 (for 1D histograms)
e.g. '4,4' for (4,4) (for 2D histograms)
normalizehists
full signature:
def normalizehists(hists)
comments:
perform normalization on a set of histograms
note:
- for 1D histograms, the sum of bin contents is set equal one for each histogram
- for 2D histograms, the bin contents are scaled so the maximum is 1 for each histogram
- maybe later make more flexible by adding normalization stragy as argument...
input arguments:
- hists: a numpy array of shape (nhistograms,nbins) for 1D or (nhistograms,nybins,nxbins) for 2D
returns:
- a numpy array containing the same histograms as input but normalized
averagehists
full signature:
def averagehists(hists, nout=None)
comments:
partition a set of histograms into equal parts and take the average histogram of each part
input arguments:
- hists: a numpy array of shape (nhistograms,nbins) for 1D or (nhistograms,nybins,nxbins) for 2D
- nout: number of partitions, i.e. number of output histograms
note: nout=1 corresponds to simply taking the average of all histograms in hists.
note: if nout is negative or if nout is larger than number of input histograms, the original set of histograms is returned.
returns:
- a numpy array of shape (nout,nbins)
running_average_hists
full signature:
def running_average_hists(hists, window=None, weights=None)
comments:
replace each histogram in a collection of histograms by its running average
input arguments:
- hists: a numpy array of shape (nhistograms,nbins) for 1D or (nhistograms,nybins,nxbins) for 2D
- window: number of histograms to consider for the averaging
if window is an integer, it is the number of previous histograms in hists used for averaging
(so window=0 would correspond to no averaging)
if window is a tuple, it corresponds to (nprevious,nnext), and the nprevious previous and nnext next histograms in hists are used for averaging
(so window=(0,0) would correspond to no averaging)
- weights: a list or numpy array containing the relative weights of the histograms in the averaging procedure.
note: the weights can be any number, but they will be normalized to have unit sum.
note: weights must have length nwindow+1 or nprevious+1+nnext.
note: the default behaviour is a uniform array with values 1./(window+1) (or 1./(nprevious+1+nnext))
returns:
- a numpy array with same shape as input but where each histogram is replaced by its running average
notes:
- at the edges, the weights are cropped to match the input array and renormalized
- this function will throw an error when the length of the set of histograms is smaller than the total window length,
maybe extend later (although this is not normally needed)
select_random
full signature:
def select_random(hists, nselect=10)
comments:
select nselect random examples from a set of histograms
input arguments:
- hists: a numpy array of shape (nhistograms, nbins) for 1D
or (nhistograms, nybins, nxbins) for 2D.
- nselect: number of random instances to draw
smoothhists
full signature:
def smoothhists(hists, halfwindow=None, weights=None)
comments:
perform histogram smoothing by averaging over neighbouring bins
input arguments:
- hists: a numpy array of shape (nhistograms, nbins) for 1D
or (nhistograms, nybins, nxbins) for 2D.
- halfwindow: number of bins to consider for the averaging;
for 1D histograms, must be an int, corresponding to the number of bins
before and after the current bin to average over;
for 2D histograms, must be a tuple of (halfwindow_y, halfwindow_x).
- weights: numpy array containing the relative weights of the bins for the averaging;
for 1D histograms, must have length 2*halfwindow+1;
for 2D histograms, must have shape (2*halfwindow_y+1, 2*halfwindow_x+1).
note: the weights can be any number, but they will be normalized to have unit sum.
note: the default behaviour is a uniform array
returns:
- a numpy array with same shape as input but where each histogram is replaced
by its smoothed version
get_smoothinghalfwindow_from_str
full signature:
def get_smoothinghalfwindow_from_str(windowstr)
comments:
get a valid smoothing half window (int or tuple) from a string (e.g. argument in gui)
note: the resulting factor is typically passed to smoothhists (see above)
input arguments:
- windowstr: string representation of smoothing window
e.g. '4' for 4 (for 1D histograms)
e.g. '4,4' for (4,4) (for 2D histograms)
moment
full signature:
def moment(bins, counts, order)
comments:
get n-th central moment of a histogram
input arguments:
- bins: a 1D or 2D np array holding the bin centers
(shape (nbins) or (nhistograms,nbins))
- counts: a 2D np array containing the bin counts
(shape (nhistograms,nbins))
- order: the order of the moment to calculate
(0 = maximum value, 1 = mean value)
returns:
- an array of shape (nhistograms) holding the requested moment per histogram
notes:
- for now only 1D histograms are supported!
histmean
full signature:
def histmean(bins, counts)
comments:
special case of moment calculation (with order=1)
histrms
full signature:
def histrms(bins, counts)
comments:
special case of moment calculation
histmoments
full signature:
def histmoments(bins, counts, orders)
comments:
apply moment calculation for a list of orders
input arguments:
- see function moment(bins, counts, order),
the only difference being that orders is a list instead of a single number
returns:
- a numpy array of shape (nhistograms,nmoments)
preparedatafromnpy
full signature:
def preparedatafromnpy(dataname, cropslices=None, rebinningfactor=None, smoothinghalfwindow=None, smoothingweights=None, averagewindow=None, averageweights=None, donormalize=True, doplot=False)
comments:
read a .npy file and output the histograms
input arguments:
- see e.g. preparedatafromdf
notes:
- not yet tested for 2D histograms, but is expected to work...
preparedatafromdf
full signature:
def preparedatafromdf(df, returnrunls=False, cropslices=None, rebinningfactor=None, smoothinghalfwindow=None, smoothingweights=None, averagewindow=None, averageweights=None, donormalize=False, doplot=False)
comments:
prepare the data contained in a dataframe in the form of a numpy array
input arguments:
- returnrunls: boolean whether to return a tuple of
(histograms, run numbers, lumisection numbers).
(default: return only histograms)
- cropslices: list of slices (one per dimension) by which to crop the historams
(default: no cropping)
- rebinningfactor: an integer (or tuple of integers for 2D histograms)
to downsample/rebin the histograms (default: no rebinning)
- smoothinghalfwindow: int or tuple (for 1D/2D histograms) used for smoothing the histograms
- smoothingweights: 1D or 2D array (for 1D/2D histograms) with weights for smoothing
- donormalize: boolean whether to normalize the data
- doplot: if True, some example plots are made showing the histograms
preparedatafromcsv
full signature:
def preparedatafromcsv(dataname, returnrunls=False, cropslices=None, rebinningfactor=None, smoothinghalfwindow=None, smoothingweights=None, averagewindow=None, averageweights=None, donormalize=True, doplot=False)
comments:
prepare the data contained in a dataframe csv file in the form of a numpy array
input arguments:
- returnrunls: boolean whether to return a tuple of (histograms, run numbers, lumisection numbers).
(default: return only histograms)
- cropslices: list of slices (one per dimension) by which to crop the historams
(default: no cropping)
- rebinningfactor: an integer (or tuple of integers for 2D histograms)
to downsample/rebin the histograms (default: no rebinning)
- smoothinghalfwindow: int or tuple (for 1D/2D histograms) used for smoothing the histograms
- smoothingweights: 1D or 2D array (for 1D/2D histograms) with weights for smoothing
- donormalize: boolean whether to normalize the data
- doplot: if True, some example plots are made showing the histograms