dataframe utils

A collection of useful basic functions for manipulating pandas dataframes.

Functionality includes (among others):
- selecting DCS-bit on data or golden json data.
- selecting specific runs, lumisections, or types of histograms.


get_histnames

full signature:

def get_histnames(df)  

comments:

get a list of (unique) histogram names present in a df  
df is a dataframe read from an input csv file.  

select_histnames

full signature:

def select_histnames(df, histnames)  

comments:

keep only a subset of histograms in a df  
histnames is a list of histogram names to keep in the df.  
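
As an illustration of the two functions above, here is a minimal sketch of how such name-based queries could be written in pandas. It is not the module's actual implementation: the column name `hname` and the `_sketch` function names are assumptions made for this example only.

```python
import pandas as pd

# Toy dataframe; the column name 'hname' is an assumption for illustration.
df = pd.DataFrame({
    "hname": ["hist_a", "hist_b", "hist_a", "hist_c"],
    "histo": ["[1, 2]", "[3, 4]", "[5, 6]", "[7, 8]"],
})

def get_histnames_sketch(df):
    # Unique histogram names, in order of first appearance.
    return list(df["hname"].unique())

def select_histnames_sketch(df, histnames):
    # Keep only the rows whose histogram name is in the requested list.
    return df[df["hname"].isin(histnames)].reset_index(drop=True)

names = get_histnames_sketch(df)
subset = select_histnames_sketch(df, ["hist_a"])
```

Resetting the index after filtering is a design choice of this sketch; it keeps positional access (row 0, 1, ...) valid on the reduced dataframe.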

get_runs

full signature:

def get_runs(df)  

comments:

return a list of (unique) run numbers present in a df  
df is a dataframe read from an input csv file.  

select_runs

full signature:

def select_runs(df, runnbs)  

comments:

keep only a subset of runs in a df  
runnbs is a list of run numbers to keep in the df.  

get_ls

full signature:

def get_ls(df)  

comments:

return a list of lumisection (ls) numbers present in a df  
note that the numbers are not required to be unique!  
note: no check is done on the run number!  

select_ls

full signature:

def select_ls(df, lsnbs)  

comments:

keep only a subset of lumisection numbers in a df  
lsnbs is a list of lumisection numbers to keep in the df.  
note: no check is done on the run number!  
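
The note about the run number matters in practice: the same lumisection number usually occurs in many runs. The sketch below shows the effect with plain pandas filtering. The column names `run` and `ls` are assumptions for this example and may differ from those in the actual input files.

```python
import pandas as pd

# Column names 'run' and 'ls' are assumptions for illustration.
df = pd.DataFrame({
    "run": [1, 1, 2, 2],
    "ls":  [10, 11, 10, 12],
})

# Filtering on the lumisection number alone matches ls 10 in BOTH runs.
by_ls_only = df[df["ls"].isin([10])]

# Adding a run filter restricts the selection to a single run.
by_run_and_ls = df[(df["run"] == 1) & (df["ls"].isin([10]))]
```

With the functions documented here, the same restriction would presumably be obtained by applying select_runs before select_ls.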

get_runsls

full signature:

def get_runsls(df)  

comments:

return a dictionary with runs and lumisections in a dataframe (same format as e.g. golden json)  
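
For reference, the golden json format maps run numbers (as strings) to lists of inclusive [first, last] lumisection ranges. The sketch below builds such a dictionary from (run, ls) pairs. The helper `to_runsls_sketch` is hypothetical, and whether get_runsls merges consecutive lumisections into ranges or emits one range per lumisection is not specified here; both are valid in this format.

```python
def to_runsls_sketch(pairs):
    """Group (run, ls) pairs into {run_string: [[first_ls, last_ls], ...]}."""
    by_run = {}
    for run, ls in pairs:
        by_run.setdefault(str(run), set()).add(ls)
    result = {}
    for run, lumis in by_run.items():
        ranges = []
        for ls in sorted(lumis):
            if ranges and ls == ranges[-1][1] + 1:
                ranges[-1][1] = ls        # extend the current range
            else:
                ranges.append([ls, ls])   # start a new range
        result[run] = ranges
    return result

runsls = to_runsls_sketch([(1, 1), (1, 2), (1, 3), (1, 7), (2, 5)])
# runsls == {"1": [[1, 3], [7, 7]], "2": [[5, 5]]}
```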

select_json

full signature:

def select_json(df, jsonfile)  

comments:

keep only lumisections that are in the given json file  

select_runsls

full signature:

def select_runsls(df, jsondict)  

comments:

equivalent to select_json but using a pre-loaded json dict instead of a json file on disk  
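
The core of a json-based selection is a membership test of each (run, ls) pair against the ranges in the dictionary. A minimal sketch of that test follows, assuming the golden json layout (run numbers as string keys, inclusive [first, last] ranges as values). The helper name `in_runsls_sketch` is hypothetical, and the json is parsed from a string here rather than read from a file.

```python
import json

# In practice the dictionary would come from a json file on disk;
# a string is used here to keep the example self-contained.
json_text = '{"1": [[1, 3], [7, 7]], "2": [[5, 5]]}'
jsondict = json.loads(json_text)

def in_runsls_sketch(run, ls, jsondict):
    # Runs absent from the dictionary have no selected lumisections.
    ranges = jsondict.get(str(run), [])
    return any(first <= ls <= last for first, last in ranges)

pairs = [(1, 2), (1, 5), (2, 5), (3, 1)]
kept = [(run, ls) for run, ls in pairs if in_runsls_sketch(run, ls, jsondict)]
# kept == [(1, 2), (2, 5)]
```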

select_golden

full signature:

def select_golden(df)  

comments:

keep only golden lumisections in df  

select_notgolden

full signature:

def select_notgolden(df)  

comments:

keep all but golden lumisections in df  

select_dcson

full signature:

def select_dcson(df)  

comments:

keep only lumisections in df that have DCS-bit on  

select_dcsoff

full signature:

def select_dcsoff(df)  

comments:

keep only lumisections in df that have DCS-bit off  

select_pixelgood

full signature:

def select_pixelgood(df)  

comments:

keep only lumisections in df that are in good pixel json  

select_pixelbad

full signature:

def select_pixelbad(df)  

comments:

keep only lumisections in df that are in bad pixel json  

get_highstat

full signature:

def get_highstat(df, entries_to_bins_ratio=100)  

comments:

return a selection object (runs and lumisections) for the histograms with high statistics  

select_highstat

full signature:

def select_highstat(df, entries_to_bins_ratio=100)  

comments:

keep only lumisections in df with high statistics  
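
The parameter name entries_to_bins_ratio suggests a cut on the number of entries per bin. The sketch below shows one way such a cut could look. It assumes columns named `entries`, `Xbins`, `run` and `ls`, and a strict "greater than" comparison; none of these details are confirmed by the text above, so treat it as an illustration rather than the module's behaviour.

```python
import pandas as pd

# Column names are assumptions for illustration.
df = pd.DataFrame({
    "run":     [1, 1, 2],
    "ls":      [1, 2, 1],
    "entries": [5000, 200, 10000],
    "Xbins":   [40, 40, 50],
})

def select_highstat_sketch(df, entries_to_bins_ratio=100):
    # Entries per bin for each histogram (one histogram per row).
    ratio = df["entries"] / df["Xbins"]
    return df[ratio > entries_to_bins_ratio].reset_index(drop=True)

highstat = select_highstat_sketch(df)
```

In this toy data the ratios are 125, 5 and 200, so the second row falls below the default threshold of 100 and is dropped.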

get_hist_values

full signature:

def get_hist_values(df)  

comments:

same as builtin "df['histo'].values" but convert strings to np arrays  
input arguments:  
- df: a dataframe containing histograms (assumed to be of a single type!)  
note: this function works for both 1D and 2D histograms.  
      the distinction was originally made based on whether or not 'Ybins' is present as a column in the dataframe;  
      however, 'Ybins' is also present for 1D histograms, where it has value 1.  
output:  
a tuple containing the following elements:  
- np array of shape (nhists,nbins) (for 1D) or (nhists,nybins,nxbins) (for 2D)  
- np array of run numbers of length nhists  
- np array of lumisection numbers of length nhists  
warning: no check is done to ensure that all histograms are of the same type!
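
To make the string-to-array conversion concrete, here is a minimal sketch for the 1D case. It assumes each `histo` entry is a JSON-style list string and that run and lumisection numbers live in columns named `run` and `ls`; the actual storage format and column names may differ, and 2D histograms would additionally need a reshape to (nybins, nxbins).

```python
import json
import numpy as np
import pandas as pd

# 'histo' is the column named in the text above; 'run' and 'ls' are assumptions.
df = pd.DataFrame({
    "run":   [1, 1],
    "ls":    [1, 2],
    "histo": ["[0, 1, 2, 3]", "[4, 5, 6, 7]"],
})

def get_hist_values_sketch(df):
    # Parse each string into a list of bin contents and stack into (nhists, nbins).
    hists = np.array([json.loads(s) for s in df["histo"]], dtype=float)
    runs = df["run"].to_numpy()
    ls = df["ls"].to_numpy()
    return hists, runs, ls

hists, runs, ls = get_hist_values_sketch(df)
```

Because all rows are stacked into one array, histograms with different numbers of bins would not fit the same shape, which is why the dataframe should contain a single histogram type.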