csv utils
A collection of useful basic functions for reading and processing the input csv files.
Functionality includes: - reading the raw input csv files and producing more manageable csv files (grouped per histogram type). - reading csv files into pandas dataframes and writing pandas dataframes back to csv files.
Note: the functionality of these utils has been absorbed into the DataLoader class, which is now the recommended way to read the data!
get_data_dirs
full signature:
def get_data_dirs(year='2017', eras=[], dim=1)
comments:
yield all data directories
note that the location of the data is hard-coded;
this function might break for newer or later reprocessings of the data.
- year is a string, either '2017' or '2018'
- era is a list containing a selection of era names
(default empty list = all eras)
- dim is either 1 or 2 (for 1D or 2D plots)
get_csv_files
full signature:
def get_csv_files(inputdir)
comments:
yields paths to all csv files in input directory
note that the output paths consist of input_dir/filename
this function is only meant for 1-level down searching,
i.e. the .csv files listed directly under input_dir.
sort_filenames
full signature:
def sort_filenames(filelist)
comments:
sort filenames in numerical order (e.g. 2 before 10)
note that the number is supposed to be in ..._<number>.<extension> format
read_csv
full signature:
def read_csv(csv_file)
comments:
read csv file into pandas dataframe
csv_file is the path to the csv file to be read
DEPRECATED, this function might be removed in the future;
use DataLoader.get_dataframe_from_file instead.
write_csv
full signature:
def write_csv(dataframe, csvfilename)
comments:
write a dataframe to a csv file
note: just a wrapper for builtin dataframe.to_csv
DEPRECATED, this function might be removed in the future;
use DataLoader.write_dataframe_to_file instead.
read_and_merge_csv
full signature:
def read_and_merge_csv(csv_files, histnames=[], runnbs=[])
comments:
read and merge list of csv files into a single df
csv_files is a list of paths to files to merge into a df
histnames is a list of the types of histograms to keep (default: all)
runnbs is a list of run numbers to keep (default: all)
DEPRECATED, this function might be removed in the future;
use DataLoader.get_dataframe_from_files instead.
write_skimmed_csv
full signature:
def write_skimmed_csv(histnames, year, eras=['all'], dim=1)
comments:
read all available data for a given year/era and make a file per histogram type
DEPRECATED, this function might be removed in the future;
see tutorial read_and_write_data.ipynb for equivalent functionality.
input arguments:
- histnames: list of histogram names for which to make a separate file
- year: data-taking year (in string format)
- eras: data-taking eras for which to make a separate file (in string format)
use 'all' to make a file with all eras merged, i.e. a full data taking year
- dim: dimension of histograms (1 or 2), needed to retrieve the correct folder containing input files
output:
- one csv file per year/era and per histogram type
note: this function can take quite a while to run!