Introduction to Audio Content Analysis (eBook)

Music Information Retrieval Tasks and Applications
eBook Download: EPUB
2022 | 2nd edition
464 pages
Wiley (publisher)
978-1-119-89097-3 (ISBN)

Introduction to Audio Content Analysis - Alexander Lerch
103.99 incl. VAT
  • Download available immediately
An Introduction to Audio Content Analysis

Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches

An Introduction to Audio Content Analysis serves as a comprehensive guide to audio content analysis, explaining how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding to teach a computer to interpret music signals and thus allows for the design of tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how audio content analysis can extract musical characteristics automatically. A multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal are presented. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach as well as providing practical guidance on implementation details and evaluation.

To aid reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, continues with a detailed algorithmic model and its evaluation, and concludes with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.

Written by a well-known expert in the music industry, An Introduction to Audio Content Analysis covers sample topics including:

  • Digital audio signals and their representation, common time-frequency transforms, audio features
  • Pitch and fundamental frequency detection, key and chord
  • Representation of dynamics in music and intensity-related features
  • Onset and tempo detection, beat histograms, detection of structure in music, and sequence alignment
  • Audio fingerprinting, musical genre, mood, and instrument classification

An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and to dig deeper into specific aspects of the field with the help of a large number of references.

Alexander Lerch, PhD, is an Associate Professor at the Center for Music Technology, Georgia Institute of Technology. His research focuses on signal processing and machine learning applied to music, an interdisciplinary field commonly referred to as music information retrieval. He has authored more than 50 peer-reviewed publications and his website, www.AudioContentAnalysis.org, is a popular resource on Audio Content Analysis, providing video lectures, code examples, and other materials.



Author Biography xvii

Preface xix

Acronyms xxi

List of Symbols xxv

Source Code Repositories xxix

1 Introduction 1

Part I Fundamentals of Audio Content Analysis 9

2 Analysis of Audio Signals 11

3 Input Representation 17

4 Inference 91

5 Data 107

Part II Music Transcription 127

7 Tonal Analysis 129

8 Intensity 217

9 Temporal Analysis 229

10 Alignment 281

Part III Music Identification, Classification, and Assessment 303

11 Audio Fingerprinting 305

12 Music Similarity Detection and Music Genre Classification 317

13 Mood Recognition 337

14 Musical Instrument Recognition 347

15 Music Performance Assessment 355

Part IV Appendices 365

Appendix A Fundamentals 367

Appendix B Fourier Transform 385

Appendix C Principal Component Analysis 405

Appendix D Linear Regression 409

Appendix E Software for Audio Analysis 411

Appendix F Datasets 417

Index 425

1 Introduction


Audio is an integral and ubiquitous aspect of our daily lives; we intentionally produce sound (e.g. when communicating through speech or playing an instrument), we actively listen (e.g. to music or podcasts), we can focus on a specific sound source in a mixture of sources, and we (even unconsciously) suppress sound sources internally (e.g. traffic noise). Similar to humans, algorithms can also generate, analyze, and process audio. This book focuses on the algorithmic analysis of audio signals, more specifically the extraction of information from musical audio signals.

Audio signals contain a wealth of information: by simply listening to an audio signal, humans are able to infer a variety of content information. A speech signal, for example, obviously transports the textual information, but it also might reveal information about the speaker (gender, age, accent, mood, etc.), the recording environment (e.g., indoors vs. outdoors), and much more. A music signal might allow us to derive melodic and harmonic characteristics, understand the musical structure, identify the instruments playing, perceive the projected emotion, categorize the music genre, and assess characteristics of the performance as well as the proficiency of the performers. An audio signal can contain and transport a wide variety of content beyond these simple examples. This content information is sometimes referred to as metadata: data about (audio) data.

The field of Audio Content Analysis (ACA) aims at designing and applying algorithms for the automatic extraction of content information from the raw (digital) audio signal. This enables content‐driven and content‐adaptive services which describe, categorize, sort, retrieve, segment, process, and visualize the signal and its content.

The wide range of possible audio sources and the multi-faceted nature of audio signals result in a variety of distinct ACA problems, leading to various areas of research, including

  • speech analysis, covering topics such as automatic speech recognition [1, 2] or recognizing emotion in speech [3, 4],
  • urban sound analysis with applications in noise pollution monitoring [5] and audio surveillance, i.e. the detection of dangerous events [6],
  • industrial sound analysis such as monitoring the state of mechanical devices like engines [7] or monitoring the health of livestock [8], and, last but not least,
  • musical audio analysis, targeting the understanding and extraction of musical parameters and properties from the audio signal [9].

This book focuses on the analysis of musical audio signals and the extraction of musical content from audio. There are many similarities and parallels to the areas above, but there are also many differences that distinguish musical audio from other signals beyond simple technical properties such as audio bandwidth. Like an urban sound signal, music is a polytimbral mixture of multiple sound sources, but unlike urban sound, its sound sources are clearly related (e.g., melodically, harmonically, or rhythmically). Like a speech signal, a music signal is a sequence in a language with rules and constraints, but unlike speech, the musical language is abstract and has no singular meaning. Like an industrial audio signal, music has both tonal and noise-like components which may repeat themselves, but unlike the industrial signal, it conveys a (musical) form based on hierarchical grouping of elements not only through repetition but also through precise variation of rhythmic, dynamic, tonal, and timbral elements.

As we will see throughout the chapters, the design of systems for the analysis of musical audio often requires knowledge and understanding from multiple disciplines. While this text approaches the topic mostly from an engineering and Digital Signal Processing (DSP) perspective, the proper formulation of research questions and task definitions often requires methods, or at least a synthesis of knowledge, from fields as diverse as music theory, music perception, and psychoacoustics. Researchers working on ACA thus come from various backgrounds such as computer science, engineering, psychology, and musicology.

The diversity in musical audio analysis is also exemplified by the wide variety of terms referring to it. Overall, musical ACA is situated in the broader area of Music Information Retrieval (MIR). MIR is a broader field that covers not only the analysis of musical audio but also symbolic (nonaudio) music formats such as musical scores and files or signals compliant with the so-called Musical Instrument Digital Interface (MIDI) protocol [10]. MIR also covers the analysis and retrieval of information that is music-related but cannot be (easily) extracted from the audio signal, such as artist names, user ratings, performance instructions in the score, or bibliographical information such as publisher, publishing date, or the work's title. Other areas of research, such as music source separation and automatic music generation, are often also considered to belong within MIR. Various overview articles clarify how the understanding of the field of MIR has evolved over time [11-16]. Other, related terms are also in use. Audio event detection, nowadays often related to urban sound analysis, is sometimes described as computational analysis of sound scenes [17]. The analysis of sound scenes from a perceptual point of view has been described as Computational Auditory Scene Analysis (CASA) [18]. In the past, other terms have been used more or less synonymously with the term "audio content analysis." Examples of such synonyms are machine listening and computer audition. Finally, there is the term music informatics, which encompasses essentially any aspect of algorithmic analysis, synthesis, and processing of music (although in some circles, its meaning is restricted to describe the creation of musical artifacts with software).

1.1 A Short History of Audio Content Analysis


Historically, the first systems analyzing the content of audio signals appeared shortly after technology provided the means of storing and reproducing recordings on media in the twentieth century. One early example is Seashore's Tonoscope, which enabled the pitch analysis of an audio signal by visualizing the fundamental frequency of the incoming audio signal on a rotating drum [19]. However, the more recent evolution of digital storage media, DSP methods, and machine learning during the last decades, along with the growing amount of digital audio data available through downloads and streaming services, has significantly increased both the need for and the possibilities of automatic systems for analyzing audio content, resulting in a lively and growing research field.

Early systems for audio analysis were frequently so-called "expert systems" [20], designed by experts who encode their task-specific knowledge in a set of rules. Such systems can be very successful if there is a clear and simple relation between the knowledge and the implemented algorithms. Good examples of such systems are some of the pitch-tracking approaches introduced in Section 7.3: as the goal is the detection of periodicity in the signal in a specific range, an approach such as the Autocorrelation Function (ACF), combined with multiple assumptions and constraints, can be used to estimate this periodicity and thus the fundamental frequency, as sketched below.
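
A minimal Python sketch of such a rule-based ACF estimator follows; it is not taken from the book, and the function name, frequency limits, and test signal are illustrative assumptions. The only "expert knowledge" encoded is the plausible lag range searched for the ACF maximum.

```python
import numpy as np

def acf_f0(x, fs, f_min=50.0, f_max=2000.0):
    """Estimate the fundamental frequency of a short monophonic block x
    by picking the strongest ACF peak within a plausible lag range."""
    x = x - np.mean(x)
    # autocorrelation for non-negative lags only
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # prior knowledge: restrict the search to lags corresponding to [f_min, f_max]
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    lag = lag_min + np.argmax(acf[lag_min:lag_max])
    return fs / lag

# usage: a 440 Hz sine should yield an estimate close to 440 Hz
fs = 44100
t = np.arange(0, 0.05, 1 / fs)
print(acf_f0(np.sin(2 * np.pi * 440 * t), fs))
```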

Later, data-driven systems became increasingly popular and traditional machine learning approaches started to show superior performance on many tasks. These systems extract so-called features from the audio to achieve a task-dependent representation of the signal. Then, training data are used to build a model of the feature space and of how it maps to the inferred outcome. The role of the expert becomes less influential in the design of these systems, as the expert's contribution is restricted to selecting or designing a fitting set of features, curating a representative set of data, and choosing and parametrizing the machine learning approach. One prototypical example of these approaches is musical genre classification as introduced in Section 12.2, illustrated by the sketch below.
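
As a hedged illustration of this feature-plus-classifier pipeline (assuming the third-party libraries librosa and scikit-learn; the file paths and genre labels are placeholders, not data or code from the book), one might summarize each recording with MFCC statistics as hand-designed features and fit a support vector machine to them:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_features(path):
    """Hand-designed, fixed-length representation of one audio file:
    means and standard deviations of 13 MFCCs over time."""
    x, fs = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# placeholder training data: audio file paths and their genre labels
train_files = ["rock_01.wav", "jazz_01.wav", "rock_02.wav", "jazz_02.wav"]
train_labels = ["rock", "jazz", "rock", "jazz"]

X = np.stack([extract_features(f) for f in train_files])
model = SVC().fit(X, train_labels)      # model of the feature space
print(model.predict([extract_features("unknown.wav")]))
```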

Modern machine learning approaches include trainable feature extraction (also referred to as feature learning, see Section 3.6) as part of deep neural networks. These approaches have consistently shown superior performance for nearly all ACA tasks. The researcher seldom imparts much domain knowledge beyond choosing the input representation, the data, and the architecture of these systems. An example of such an end-to-end system could be a music genre classification system based on a convolutional neural network with a mel spectrogram input, as sketched below.
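
A sketch of such an end-to-end classifier, here written with PyTorch, follows; the layer sizes, class count, and input dimensions are arbitrary illustrative choices and not the book's design. The network maps a mel spectrogram directly to genre logits, learning its intermediate representation from data.

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Small convolutional network operating on a mel spectrogram
    (1 x n_mels x n_frames) and predicting one of n_classes genres."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global pooling over frequency and time
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# usage: a batch of 8 mel spectrograms with 128 mel bands and 256 frames
logits = GenreCNN()(torch.randn(8, 1, 128, 256))
print(logits.shape)  # torch.Size([8, 10])
```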

It should be pointed out that while modern systems tend to have superior performance, they also tend to be less interpretable and explainable than traditional systems. For example, deducing the reason for a false classification result in a network‐based system can be difficult, while the reason is usually easily identifiable in a rule‐based system.

1.2 Applications and Use Cases


The content extracted from music signals improves or enables...

Publication date (per publisher) 22.11.2022
Language English
Subject area Technology / Electrical Engineering / Energy Technology
Keywords Audio & Speech Processing & Broadcasting • Audio Engineering • Electrical & Electronics Engineering • Intelligent Systems • Intelligent Systems & Agents • Signal Processing
ISBN-10 1-119-89097-7 / 1119890977
ISBN-13 978-1-119-89097-3 / 9781119890973
File format: EPUB (Adobe DRM)
Size: 29.1 MB
Copy protection: Adobe DRM
