HDF5 Data Storage



The Problem

Projects working with startle reflex waveforms at USF Research Computing needed a central repository to store, classify, and retrieve their data quickly and in an organized fashion. Before a structured system was in place, each new generation of students added to the ever-growing collection of data in a way that made it very difficult to locate the exact information needed later on.

HDF5 as a Solution

It was determined that storing the processed waveform data in Hierarchical Data Format 5 (HDF5) files would achieve the goal the research faculty had in mind. HDF5 is a high-performance library and file format for managing, processing, and storing large, complex, and heterogeneous data. It is widely used in large-scale data science projects because of the format's flexibility, robustness, and efficiency. In a nutshell, its high-level interface allows for the creation of a tree of groups and datasets (each with its own metadata) for fast storage and retrieval.
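To make the tree of groups, datasets, and metadata concrete, here is a minimal sketch. It assumes Python with the h5py and NumPy packages, which this write-up does not confirm as the tools used on the project. The file name, group and dataset names, attribute keys, and sample values are all hypothetical placeholders.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout (an assumption): one group per subject, one dataset per trial.
path = os.path.join(tempfile.mkdtemp(), "startle_demo.h5")

with h5py.File(path, "w") as f:
    subject = f.create_group("subject_01")        # a branch in the tree
    waveform = np.linspace(0.0, 1.0, 500)         # placeholder samples, not real data
    trial = subject.create_dataset("trial_001", data=waveform)
    trial.attrs["sampling_rate_hz"] = 1000        # metadata attached to the dataset
    subject.attrs["cohort"] = "example"           # metadata attached to the group

with h5py.File(path, "r") as f:
    trial = f["subject_01/trial_001"]             # datasets are addressed by path
    n_samples = trial.shape[0]
    rate = int(trial.attrs["sampling_rate_hz"])
    cohort = f["subject_01"].attrs["cohort"]
```

Because groups nest like directories, a waveform can be located by a path such as "subject_01/trial_001" instead of by searching through loosely organized files.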

I developed a set of functions that allowed the research team to harness the power of HDF5 to store and manage their waveform data. Switching to HDF5 allowed the team to build a central file with all the data they would need for their projects. This HDF5 file makes it possible to retrieve specific sections of data in a fraction of the time it would normally have taken with a CSV file.
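The retrieval of specific sections can be illustrated with a partial read. The sketch below again assumes h5py and NumPy. The dataset name, its shape, and the chunk size are hypothetical and are not taken from the project's actual file.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")

# Placeholder matrix (an assumption): 100 trials with 2000 samples each.
data = np.arange(100 * 2000, dtype=np.float64).reshape(100, 2000)

with h5py.File(path, "w") as f:
    # One chunk per trial, so reading a few trials touches only a few chunks.
    f.create_dataset("waveforms", data=data, chunks=(1, 2000))

with h5py.File(path, "r") as f:
    dset = f["waveforms"]             # opening the dataset does not load its contents
    window = dset[10:13, 500:600]     # reads only the chunks covering this selection
```

With a CSV file, reaching the same rows generally means parsing the text from the start of the file, which is one plausible reason for the difference in retrieval time described above.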

Here's a snippet of one of the functions: