dataframe utils

A collection of useful basic functions for manipulating pandas dataframes.

Functionality includes (among others):

- selecting DCS-bit on data or golden json data
- selecting specific runs, lumisections, or types of histograms

get_histnames

full signature:

def get_histnames(df)

comments:

get a list of (unique) histogram names present in a df  
df is a dataframe read from an input csv file.

select_histnames

full signature:

def select_histnames(df, histnames)

comments:

keep only a subset of histograms in a df  
histnames is a list of histogram names to keep in the df.
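
A minimal sketch of how `get_histnames` and `select_histnames` could behave, written with plain pandas. The column name `hname`, the toy data, and the `_sketch` function names are assumptions made for illustration; the text above only states that the dataframe is read from an input csv file.

```python
import pandas as pd

# Toy dataframe; the column name 'hname' is an assumption,
# it is not confirmed by the documentation above.
df = pd.DataFrame({
    "hname": ["chargeInner", "chargeOuter", "chargeInner", "size"],
    "fromrun": [1, 1, 2, 2],
})

def get_histnames_sketch(df):
    # unique histogram names, in order of first appearance
    return list(df["hname"].unique())

def select_histnames_sketch(df, histnames):
    # keep only rows whose histogram name is in the requested list
    return df[df["hname"].isin(histnames)].reset_index(drop=True)

names = get_histnames_sketch(df)
subset = select_histnames_sketch(df, ["chargeInner"])
```

Resetting the index after filtering is a design choice of this sketch, so that positional access starts at 0 again.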

get_runs

full signature:

def get_runs(df)

comments:

return a list of (unique) run numbers present in a df  
df is a dataframe read from an input csv file.

select_runs

full signature:

def select_runs(df, runnbs)

comments:

keep only a subset of runs in a df  
runnbs is a list of run numbers to keep in the df.

get_ls

full signature:

def get_ls(df)

comments:

return a list of ls numbers present in a df  
note that the numbers are not required to be unique!  
note: no check is done on the run number!

select_ls

full signature:

def select_ls(df, lsnbs)

comments:

keep only a subset of lumisection numbers in a df  
lsnbs is a list of lumisection numbers to keep in the df.  
note: no check is done on the run number!
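
The note about the run number can be illustrated with a small sketch. The column names `fromrun` and `fromlumi` and the `_sketch` functions are assumptions for illustration, not the module's confirmed implementation. Selecting on lumisection number alone matches that number in every run, so a run selection is applied first when a specific run is meant.

```python
import pandas as pd

# Column names 'fromrun' and 'fromlumi' are assumed for this example.
df = pd.DataFrame({
    "fromrun":  [100, 100, 101, 101],
    "fromlumi": [1, 2, 1, 3],
})

def select_runs_sketch(df, runnbs):
    # keep only rows belonging to the requested runs
    return df[df["fromrun"].isin(runnbs)].reset_index(drop=True)

def select_ls_sketch(df, lsnbs):
    # filter on lumisection number only; the run number is ignored
    return df[df["fromlumi"].isin(lsnbs)].reset_index(drop=True)

# lumisection 1 exists in both runs, so both rows are kept
ls_one = select_ls_sketch(df, [1])

# restrict to run 100 first to get a single (run, ls) combination
ls_one_run100 = select_ls_sketch(select_runs_sketch(df, [100]), [1])
```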

get_runsls

full signature:

def get_runsls(df)

comments:

return a dictionary with runs and lumisections in a dataframe (same format as e.g. golden json)
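
For reference, a golden json maps each run number (as a string) to a list of inclusive `[first, last]` lumisection ranges. The sketch below builds such a dictionary from a dataframe. The column names `fromrun` and `fromlumi` and the merging of consecutive lumisections into ranges are assumptions of this example; the actual function may group lumisections differently.

```python
import pandas as pd

# Column names 'fromrun' and 'fromlumi' are assumed for this example.
df = pd.DataFrame({
    "fromrun":  [100, 100, 100, 100, 101],
    "fromlumi": [1, 2, 3, 7, 5],
})

def runsls_dict_sketch(df):
    result = {}
    for run, group in df.groupby("fromrun"):
        ranges = []
        for ls in sorted(set(group["fromlumi"])):
            ls = int(ls)
            if ranges and ls == ranges[-1][1] + 1:
                # extend the current range with a consecutive lumisection
                ranges[-1][1] = ls
            else:
                # start a new [first, last] range
                ranges.append([ls, ls])
        result[str(run)] = ranges
    return result

runsls = runsls_dict_sketch(df)
```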

select_json

full signature:

def select_json(df, jsonfile)

comments:

keep only lumisections that are in the given json file

select_runsls

full signature:

def select_runsls(df, jsondict)

comments:

equivalent to select_json but using a pre-loaded json dict instead of a json file on disk
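
A hedged sketch of selecting by a pre-loaded json dict, assuming the golden-json layout (run number as string key, list of inclusive `[first, last]` lumisection ranges) and the column names `fromrun` and `fromlumi`. The helper names are hypothetical. For the file-based variant, the dict would first be read from disk with `json.load`.

```python
import pandas as pd

# Column names 'fromrun' and 'fromlumi' are assumed for this example.
df = pd.DataFrame({
    "fromrun":  [100, 100, 100, 101, 102],
    "fromlumi": [1, 5, 9, 2, 1],
})

# same layout as a golden json: run -> list of inclusive [first, last] ranges
jsondict = {"100": [[1, 5]], "101": [[2, 2]]}

def in_jsondict(run, ls, jsondict):
    # True if (run, ls) falls inside one of the ranges listed for that run
    ranges = jsondict.get(str(run), [])
    return any(first <= ls <= last for first, last in ranges)

def select_runsls_sketch(df, jsondict):
    mask = pd.Series(
        [in_jsondict(run, ls, jsondict)
         for run, ls in zip(df["fromrun"], df["fromlumi"])],
        index=df.index,
    )
    return df[mask].reset_index(drop=True)

selected = select_runsls_sketch(df, jsondict)
```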

select_golden

full signature:

def select_golden(df)

comments:

keep only golden lumisections in df

select_notgolden

full signature:

def select_notgolden(df)

comments:

keep all but golden lumisections in df

select_dcson

full signature:

def select_dcson(df)

comments:

keep only lumisections in df that have DCS-bit on

select_dcsoff

full signature:

def select_dcsoff(df)

comments:

keep only lumisections in df that have DCS-bit off

select_pixelgood

full signature:

def select_pixelgood(df)

comments:

keep only lumisections in df that are in good pixel json

select_pixelbad

full signature:

def select_pixelbad(df)

comments:

keep only lumisections in df that are in bad pixel json

get_highstat

full signature:

def get_highstat(df, entries_to_bins_ratio=100)

comments:

return a selection object (runs and lumisections) of histograms with high statistics

select_highstat

full signature:

def select_highstat(df, entries_to_bins_ratio=100)

comments:

keep only lumisections in df with high statistics
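
One plausible reading of `entries_to_bins_ratio` is sketched below: a histogram counts as high statistics when its number of entries divided by its number of bins exceeds the threshold. The column names `entries` and `Xbins`, the strict comparison, and the use of x-bins only are all assumptions of this example, not confirmed behaviour.

```python
import pandas as pd

# Column names 'entries' and 'Xbins' are assumed for this example.
df = pd.DataFrame({
    "fromrun":  [100, 100, 101],
    "fromlumi": [1, 2, 1],
    "entries":  [50000, 120, 9000],
    "Xbins":    [100, 100, 50],
})

def select_highstat_sketch(df, entries_to_bins_ratio=100):
    # average number of entries per bin for each histogram
    ratio = df["entries"] / df["Xbins"]
    # keep only histograms above the threshold (strict comparison assumed)
    return df[ratio > entries_to_bins_ratio].reset_index(drop=True)

highstat = select_highstat_sketch(df)
```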

get_hist_values

full signature:

def get_hist_values(df)

comments:

same as builtin "df['histo'].values" but convert strings to np arrays  
input arguments:  
- df: a dataframe containing histograms (assumed to be of a single type!)  
note: this function works for both 1D and 2D histograms;  
      the distinction is made based on the value of the 'Ybins' column,  
      which is also present for 1D histograms but has value 1 there.  
output:  
a tuple containing the following elements:  
- np array of shape (nhists,nbins) (for 1D) or (nhists,nybins,nxbins) (for 2D)  
- np array of run numbers of length nhists  
- np array of lumisection numbers of length nhists  
warning: no check is done to ensure that all histograms are of the same type!
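
The conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the `histo` strings are assumed to be json-style lists, the columns `fromrun`, `fromlumi` and `Xbins` are assumed names, and stored histograms are assumed to include one underflow and one overflow bin per axis (hence the `+ 2`).

```python
import json
import numpy as np
import pandas as pd

# Toy 1D histograms: 3 bins plus assumed under- and overflow bins.
df = pd.DataFrame({
    "fromrun":  [100, 100],
    "fromlumi": [1, 2],
    "Xbins":    [3, 3],
    "Ybins":    [1, 1],
    "histo":    ["[0, 1, 2, 3, 0]", "[0, 4, 5, 6, 0]"],
})

def get_hist_values_sketch(df):
    # parse each string into a list of numbers and stack them
    vals = np.array([json.loads(h) for h in df["histo"]], dtype=float)
    # 'Ybins' equal to 1 marks a 1D histogram; larger values mark 2D
    if int(df["Ybins"].iloc[0]) > 1:
        nybins = int(df["Ybins"].iloc[0]) + 2  # assumed under/overflow bins
        nxbins = int(df["Xbins"].iloc[0]) + 2  # assumed under/overflow bins
        vals = vals.reshape(len(df), nybins, nxbins)
    runs = df["fromrun"].to_numpy()
    lumis = df["fromlumi"].to_numpy()
    return vals, runs, lumis

vals, runs, lumis = get_hist_values_sketch(df)
```

As in the original function, the sketch reads the dimensions from the first row only, so mixing histogram types in one dataframe would give wrong shapes or an error.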
