Using listener-based perceptual features as intermediate representations in music information retrieval

J Acoust Soc Am. 2014 Oct;136(4):1951-63. doi: 10.1121/1.4892767.

Abstract

The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated by an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled from both symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features, with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could be modeled only to a limited extent using existing audio features. The results also clearly indicate that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.
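
As an illustration of what "explained variance" means when emotion ratings are predicted from perceptual features, the following is a minimal sketch, not the authors' method. It assumes an ordinary least-squares linear model, which the abstract does not specify, and it runs on synthetic random data: the number of excerpts, the weights, the noise level, and the names `explained_variance`, `ratings`, and `activity` are all hypothetical. Only the count of nine features comes from the abstract.

```python
import numpy as np


def explained_variance(X, y):
    """Fit y ~ X by ordinary least squares (with intercept) and return R^2."""
    # Prepend a column of ones so the model includes an intercept term.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    ss_res = float(np.sum(residuals ** 2))       # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT from the study): 100 hypothetical excerpts,
# each with mean listener ratings on nine perceptual features.
rng = np.random.default_rng(0)
n_excerpts, n_features = 100, 9
ratings = rng.normal(size=(n_excerpts, n_features))

# Hypothetical "activity" rating: a weighted sum of the features plus noise.
true_weights = rng.normal(size=n_features)
activity = ratings @ true_weights + rng.normal(scale=0.5, size=n_excerpts)

r2 = explained_variance(ratings, activity)
print(f"explained variance (R^2): {r2:.2f}")
```

An R^2 of 0.75 would correspond to the 75% figure quoted above. This sketch computes the value on the same data used for fitting; evaluating on held-out excerpts would usually give a lower and more conservative estimate.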

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acoustic Stimulation
  • Acoustics
  • Adolescent
  • Adult
  • Artificial Intelligence
  • Auditory Perception*
  • Emotions*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Models, Theoretical
  • Music*
  • Observer Variation
  • Pitch Perception
  • Psychoacoustics
  • Reproducibility of Results
  • Signal Processing, Computer-Assisted
  • Sound Spectrography
  • Young Adult