dataframe utils
A collection of useful basic functions for manipulating pandas dataframes.
Functionality includes (among others):
- selecting DCS-bit on data or golden json data
- selecting specific runs, lumisections, or types of histograms
get_histnames
full signature:
def get_histnames(df)
comments:
get a list of (unique) histogram names present in a df
df is a dataframe read from an input csv file.
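A minimal sketch of what get_histnames could look like. It assumes the histogram names sit in a column called 'hname' (this column name is an assumption, not stated above) and uses a small toy dataframe in place of a real csv file:

```python
import pandas as pd

def get_histnames(df):
    # assumption: histogram names are stored in a column called 'hname'
    # Series.unique() drops duplicates and keeps the order of first appearance
    return df['hname'].unique().tolist()

# toy dataframe: one row per (histogram, run, lumisection)
df = pd.DataFrame({'hname': ['hist_a', 'hist_b', 'hist_a'],
                   'fromrun': [1, 1, 2],
                   'fromlumi': [1, 1, 1]})
print(get_histnames(df))  # ['hist_a', 'hist_b']
```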
select_histnames
full signature:
def select_histnames(df, histnames)
comments:
keep only a subset of histograms in a df
histnames is a list of histogram names to keep in the df.
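A possible sketch of select_histnames, again assuming a column called 'hname' (an assumption). It uses a boolean mask built with isin, which is a common pandas idiom for this kind of filtering:

```python
import pandas as pd

def select_histnames(df, histnames):
    # assumption: histogram names are stored in a column called 'hname'
    # isin() gives a boolean mask that is True for rows whose name is in histnames
    return df[df['hname'].isin(histnames)].reset_index(drop=True)

df = pd.DataFrame({'hname': ['hist_a', 'hist_b', 'hist_a'],
                   'fromrun': [1, 1, 2],
                   'fromlumi': [1, 1, 1]})
print(len(select_histnames(df, ['hist_a'])))  # 2
```

Resetting the index is a design choice of this sketch; it renumbers the remaining rows from 0.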
get_runs
full signature:
def get_runs(df)
comments:
return a list of (unique) run numbers present in a df
df is a dataframe read from an input csv file.
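A minimal sketch of get_runs, assuming the run numbers are stored in a column called 'fromrun' (an assumed name, not given above). Each run normally appears many times in the df (once per lumisection and histogram), hence the deduplication:

```python
import pandas as pd

def get_runs(df):
    # assumption: run numbers are stored in a column called 'fromrun'
    # unique() removes the repeated run numbers, keeping the order of first appearance
    return df['fromrun'].unique().tolist()

df = pd.DataFrame({'fromrun': [1, 1, 2], 'fromlumi': [1, 2, 1]})
print(get_runs(df))  # [1, 2]
```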
select_runs
full signature:
def select_runs(df, runnbs)
comments:
keep only a subset of runs in a df
runnbs is a list of run numbers to keep in the df.
get_ls
full signature:
def get_ls(df)
comments:
return a list of lumisection (ls) numbers present in a df
note that the returned numbers are not necessarily unique!
note: no check is done on the run number!
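One possible reading of get_ls, sketched under the assumption that lumisection numbers are stored in a column called 'fromlumi' (an assumed name). It returns one entry per row, which shows why the numbers need not be unique: the same ls number can occur in several runs, and the run is ignored:

```python
import pandas as pd

def get_ls(df):
    # assumption: lumisection numbers are stored in a column called 'fromlumi'
    # no deduplication and no run information:
    # ls 1 of run 1 and ls 1 of run 2 both show up as a plain 1
    return df['fromlumi'].tolist()

df = pd.DataFrame({'fromrun': [1, 1, 2], 'fromlumi': [1, 2, 1]})
print(get_ls(df))  # [1, 2, 1]
```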
select_ls
full signature:
def select_ls(df, lsnbs)
comments:
keep only a subset of lumisection numbers in a df
lsnbs is a list of lumisection numbers to keep in the df.
note: no check is done on the run number, so a selected lumisection number is kept in every run in which it appears!
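A sketch of select_ls, assuming a 'fromlumi' column (an assumption). The toy example illustrates the run-number caveat: selecting lumisection 1 keeps it in both run 1 and run 2:

```python
import pandas as pd

def select_ls(df, lsnbs):
    # assumption: lumisection numbers are stored in a column called 'fromlumi'
    # only the ls number is compared; the run number plays no role
    return df[df['fromlumi'].isin(lsnbs)].reset_index(drop=True)

df = pd.DataFrame({'fromrun': [1, 1, 2], 'fromlumi': [1, 2, 1]})
print(select_ls(df, [1])['fromrun'].tolist())  # [1, 2]
```

To select a lumisection in one specific run only, a (run, ls) based selection such as select_runsls is the safer choice.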
get_runsls
full signature:
def get_runsls(df)
comments:
return a dictionary with the runs and lumisections present in a df (same format as e.g. the golden json)
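Golden json files map each run number (as a string key) to a list of inclusive lumisection ranges [first_ls, last_ls]. The sketch below shows one way to build such a dict from a df by merging consecutive lumisection numbers into ranges. The 'fromrun' and 'fromlumi' column names are assumptions, and this is not necessarily how the module implements it:

```python
import pandas as pd

def get_runsls(df):
    # assumption: run/ls numbers are stored in 'fromrun' and 'fromlumi'
    # output: {'run': [[first_ls, last_ls], ...]}, as in golden json files
    runsls = {}
    for run, group in df.groupby('fromrun'):
        lsnbs = sorted(set(int(ls) for ls in group['fromlumi']))
        ranges = []
        for ls in lsnbs:
            if ranges and ls == ranges[-1][1] + 1:
                ranges[-1][1] = ls  # consecutive ls: extend the current range
            else:
                ranges.append([ls, ls])  # gap (or first ls): start a new range
        runsls[str(run)] = ranges  # json keys are strings
    return runsls

df = pd.DataFrame({'fromrun': [1, 1, 1, 2], 'fromlumi': [1, 2, 4, 3]})
print(get_runsls(df))  # {'1': [[1, 2], [4, 4]], '2': [[3, 3]]}
```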
select_json
full signature:
def select_json(df, jsonfile)
comments:
keep only lumisections that are in the given json file
select_runsls
full signature:
def select_runsls(df, jsondict)
comments:
equivalent to select_json but using a pre-loaded json dict instead of a json file on disk
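The relation between select_runsls and select_json can be sketched as follows: a row is kept when its lumisection falls inside one of the inclusive ranges listed for its run, and select_json only adds the step of loading the dict from disk. Column names 'fromrun' and 'fromlumi' are assumptions, and this is an illustrative sketch rather than the module's actual code:

```python
import json
import numpy as np
import pandas as pd

def select_runsls(df, jsondict):
    # assumption: run/ls numbers are stored in 'fromrun' and 'fromlumi'
    # jsondict uses the golden json format: {'run': [[first_ls, last_ls], ...]}
    # with string keys and inclusive ls ranges
    mask = np.array([any(first <= ls <= last for first, last in jsondict.get(str(run), []))
                     for run, ls in zip(df['fromrun'], df['fromlumi'])], dtype=bool)
    return df[mask].reset_index(drop=True)

def select_json(df, jsonfile):
    # same selection, but the json dict is first read from a file on disk
    with open(jsonfile) as f:
        jsondict = json.load(f)
    return select_runsls(df, jsondict)

df = pd.DataFrame({'fromrun': [1, 1, 2], 'fromlumi': [1, 5, 3]})
print(select_runsls(df, {'1': [[1, 3]], '2': [[2, 4]]})['fromlumi'].tolist())  # [1, 3]
```

Runs that do not appear in the dict are dropped entirely, since jsondict.get returns an empty list of ranges for them.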
select_golden
full signature:
def select_golden(df)
comments:
keep only golden lumisections in df
select_notgolden
full signature:
def select_notgolden(df)
comments:
keep all but golden lumisections in df
select_dcson
full signature:
def select_dcson(df)
comments:
keep only lumisections in df that have DCS-bit on
select_dcsoff
full signature:
def select_dcsoff(df)
comments:
keep only lumisections in df that have DCS-bit off
select_pixelgood
full signature:
def select_pixelgood(df)
comments:
keep only lumisections in df that are in good pixel json
select_pixelbad
full signature:
def select_pixelbad(df)
comments:
keep only lumisections in df that are in bad pixel json
get_highstat
full signature:
def get_highstat(df, entries_to_bins_ratio=100)
comments:
return a selection object (runs and lumisections) of histograms with high statistics, i.e. with a high ratio of entries to bins (threshold set by entries_to_bins_ratio)
select_highstat
full signature:
def select_highstat(df, entries_to_bins_ratio=100)
comments:
keep only lumisections in df with high statistics
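The pair get_highstat / select_highstat can be sketched as below. Several assumptions go into this: the columns are called 'entries', 'Xbins', 'Ybins', 'fromrun' and 'fromlumi'; "high statistics" means more than entries_to_bins_ratio entries per bin, with the number of bins taken as Xbins * Ybins (Ybins is 1 for 1D histograms); and, as a simplification, the selection is returned as a plain set of (run, ls) pairs instead of a json-style dict:

```python
import numpy as np
import pandas as pd

def get_highstat(df, entries_to_bins_ratio=100):
    # assumption: columns 'entries', 'Xbins', 'Ybins' (Ybins == 1 for 1D histograms)
    # assumption: high statistics = more than entries_to_bins_ratio entries per bin
    nbins = df['Xbins'] * df['Ybins']
    highstat = df[df['entries'] / nbins > entries_to_bins_ratio]
    # simplification: a set of (run, ls) pairs instead of a json-style dict
    return set(zip(highstat['fromrun'], highstat['fromlumi']))

def select_highstat(df, entries_to_bins_ratio=100):
    selected = get_highstat(df, entries_to_bins_ratio)
    # keep every row whose (run, ls) pair was selected above
    mask = np.array([(run, ls) in selected
                     for run, ls in zip(df['fromrun'], df['fromlumi'])], dtype=bool)
    return df[mask].reset_index(drop=True)

df = pd.DataFrame({'fromrun': [1, 1, 2], 'fromlumi': [1, 2, 1],
                   'entries': [50000, 10, 20000],
                   'Xbins': [100, 100, 100], 'Ybins': [1, 1, 1]})
print(sorted(get_highstat(df)))
print(len(select_highstat(df)))  # 2
```

Selecting on (run, ls) pairs, rather than filtering rows directly, means that in a df with several histogram types a lumisection is kept for all types as soon as one of its histograms passes the threshold.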
get_hist_values
full signature:
def get_hist_values(df)
comments:
same as df['histo'].values, but converts the histogram strings to np arrays
input arguments:
- df: a dataframe containing histograms (assumed to be of a single type!)
note: this function works for both 1D and 2D histograms;
the distinction is made based on the 'Ybins' column in the dataframe
(update: 'Ybins' is also present for 1D histograms, but then has value 1!)
output:
a tuple containing the following elements:
- np array of shape (nhists,nbins) (for 1D) or (nhists,nybins,nxbins) (for 2D)
- np array of run numbers of length nhists
- np array of lumisection numbers of length nhists
warning: no check is done to ensure that all histograms are of the same type!
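A minimal sketch of the conversion and output described above. The storage format is a loud assumption here: each 'histo' entry is taken to be a json-style list string such as '[1, 2, 3]', flattened row-major and without under-/overflow bins, which may well differ from the real input files. The 'fromrun' and 'fromlumi' column names are also assumptions. As in the warning above, the bin counts of the first row are used for all histograms without any check:

```python
import json
import numpy as np
import pandas as pd

def get_hist_values(df):
    # assumption: 'histo' holds json-style list strings, flattened row-major,
    # without under-/overflow bins (the real storage format may differ)
    hists = np.array([json.loads(h) for h in df['histo']], dtype=float)
    # no check: the bin counts of the first row are assumed to hold for all rows
    nxbins = int(df['Xbins'].iloc[0])
    nybins = int(df['Ybins'].iloc[0])
    if nybins > 1:
        # 2D histograms: reshape (nhists, nybins*nxbins) to (nhists, nybins, nxbins)
        hists = hists.reshape(len(df), nybins, nxbins)
    # assumption: run and ls numbers are stored in 'fromrun' and 'fromlumi'
    runs = df['fromrun'].to_numpy()
    ls = df['fromlumi'].to_numpy()
    return hists, runs, ls

df = pd.DataFrame({'histo': ['[1, 2, 3, 4, 5, 6]'], 'Xbins': [3], 'Ybins': [2],
                   'fromrun': [1], 'fromlumi': [1]})
hists, runs, ls = get_hist_values(df)
print(hists.shape)  # (1, 2, 3)
```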