QuakeLabeler: a fast seismic data set creation and annotation toolbox for AI applications


The production and preparation of data sets are essential steps in machine learning (ML) applications. With the growing number and scale of ML techniques in seismology, annotating seismograms or seismic features has become time‐consuming and tedious for many researchers. Furthermore, most methods train and validate on their own unique data subsets, which hampers independent performance evaluation and comparison. To address this problem, we have developed QuakeLabeler, an open‐source Python package to customize, build, and manage earthquake training data sets, including processing and visualization. QuakeLabeler provides a tightly integrated pipeline that covers retrieving seismograms from multiple online data centers, querying online human‐reviewed catalogs, signal processing, annotating (labeling), and analyzing data distribution. In addition, it can generate relevant statistical graphics and human‐readable output files. Various file export formats are supported, such as Seismic Analysis Code (.sac), mini Standard for Exchange of Earthquake Data (.mseed), NumPy (.npz), MATLAB (.mat), and the Hierarchical Data Format version 5 (.hdf5). The toolbox is packaged with an interactive command‐line interface. Three alternative running modes (beginner, advanced, and benchmark) offer data set solutions for different types of applications: quick‐start recipes for simple ML solutions, advanced design for customized project training, and benchmark bulletins for model comparison.
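
To make the annotating (labeling) step concrete, the following is a minimal sketch of cutting a fixed‐length training window around a catalog P arrival and attaching a per‐sample label. It is not QuakeLabeler's API: the function names `make_synthetic_trace` and `cut_labeled_window`, their parameters, and the synthetic waveform are all hypothetical and chosen for illustration, and plain Python lists stand in for the waveform arrays a real pipeline would retrieve from a data center.

```python
import math
import random


def make_synthetic_trace(n_samples, p_index, seed=0):
    """Toy stand-in for a retrieved seismogram: noise, then a decaying
    sinusoid that starts at sample p_index (hypothetical helper)."""
    rng = random.Random(seed)
    trace = []
    for i in range(n_samples):
        sample = rng.gauss(0.0, 0.1)  # background noise
        if i >= p_index:
            t = i - p_index
            sample += math.exp(-t / 200.0) * math.sin(2 * math.pi * t / 20.0)
        trace.append(sample)
    return trace


def cut_labeled_window(trace, sampling_rate, p_arrival_s, window_s, pre_s):
    """Cut a window starting pre_s seconds before the arrival and label it.

    Returns a dict with the waveform samples, the arrival index relative
    to the window start, and a 0/1 label per sample (1 from the arrival on).
    """
    p_index = int(round(p_arrival_s * sampling_rate))
    start = p_index - int(round(pre_s * sampling_rate))
    length = int(round(window_s * sampling_rate))
    if start < 0 or start + length > len(trace):
        raise ValueError("window extends beyond the trace")
    rel = p_index - start  # arrival position inside the window
    labels = [0] * rel + [1] * (length - rel)
    return {
        "waveform": trace[start:start + length],
        "p_index": rel,
        "labels": labels,
    }


# Assumed values: 100 Hz sampling, P arrival 12 s into a 30 s trace.
sampling_rate = 100.0
trace = make_synthetic_trace(n_samples=3000, p_index=1200)
sample = cut_labeled_window(
    trace, sampling_rate, p_arrival_s=12.0, window_s=10.0, pre_s=2.0
)
print(len(sample["waveform"]), sample["p_index"], sum(sample["labels"]))
```

In a real workflow the arrival time would come from a human‐reviewed catalog and the resulting dictionary would be written to one of the export formats listed above; both steps are omitted here to keep the sketch self‐contained.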

Seismological Research Letters
Dr. Hao Mai
Data Scientist

Hao successfully defended his PhD in August 2023. His research interests include machine learning, software development, and earthquake detection.