
Automatic Classification of
Music Genre
Cory McKay
Introduction
• Genre classifications can be of great utility
to musical information retrieval systems
• Genre is a natural way of classifying music
• Genre is intrinsically built on the
similarities between pieces of the same
genre and differences between pieces of
different genres
Introduction
• Currently no widely accepted automatic
genre identification system
• Most genre annotation done by hand
• An automated genre recognition system
would make it possible to classify and
search large electronic music libraries
Feature Extraction
• Genre is characterized by common features
of pieces belonging to it:
– Instrumentation
– Texture
– Dynamics
– Rhythmic characteristics
– Melodic gestures
– Harmonic content
Feature Extraction
• Not always clear which features are the
most relevant
• Features can be difficult to extract
• First challenge of genre classification is to
overcome these problems
Complementary Research
• Large existing body of research in speech
recognition and classification systems
• Can use techniques relating to extraction of
timbral texture features
• Can also make use of existing systems that
can distinguish between musical, speech
and environmental signals
Complementary Research
• Existing beat-tracking systems can prove useful
• Many beat-trackers provide only an estimate of
the main beat and its strength
• More detailed information needed for genre
classification:
– Overall meter
– Syncopation
– Rubato
– Recurring rhythmic gestures
– Relative strengths of beats and sub-beats
Pattern Recognition
• Once features have been extracted, then
need to perform classification
• Existing general-purpose machine-learning
and heuristic-based techniques can be
adapted
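A minimal sketch of what this adaptation might look like in practice, assuming feature vectors have already been extracted; the k-nearest-neighbour classifier, the scikit-learn API, and the synthetic data below are illustrative choices, not the techniques used by any of the systems surveyed here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic placeholder data: 100 recordings, each described by 8 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
y_train = rng.choice(["classical", "jazz", "rock"], size=100)

# Any general-purpose classifier can be dropped in at this point;
# k-nearest neighbours is used here purely as an illustration.
classifier = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

X_new = rng.normal(size=(3, 8))       # feature vectors of unseen recordings
print(classifier.predict(X_new))      # predicted genre labels
```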
Defining a Taxonomy
• How do we define the taxonomy that pieces will
be classified into?
– Different people may classify the same piece differently
– Selections can be made from entirely different genre
domains
– Different people emphasize different features
– Often overlap between genres
– How are different genres related?
• Lack of universally agreed upon definitions of
genres makes it difficult to find appropriate
heuristics for defining genre
Pachet and Cazaly (2000)
• Observe that the taxonomies currently used
by the music industry are inconsistent
• Are therefore inappropriate for the purposes
of developing a global music database
Pachet and Cazaly (2000)
• Retailers use a four-level hierarchy:
– Global music categories (e.g. classical, jazz, rock)
– Sub-categories (e.g. operas, Dixieland, heavy metal)
– Artists
– Albums
• Different levels represent different dimensions
• Different retailers use different sets of genres and
sometimes classify the same recording differently
Pachet and Cazaly (2000)
• Copyright companies base taxonomies on
commercial configurations and audience
demographics rather than on characteristics
of music itself
• Internet companies such as Amazon.com
tend to build tree-like classification systems
– very broad categories near the root level
– very specialized categories at the leaves
Pachet and Cazaly (2000)
– Companies differ greatly in how deep the sub-categories go for different global styles of music
– Many of these genres are poorly defined and interpreted
differently by different companies
– Lack of consistency in the relation between a parent
and a child
• Sometimes genealogical (e.g. rock -> hard rock)
• Sometimes geographical (e.g. Africa -> Algeria)
• Sometimes based on historical periods (e.g. Baroque ->
Baroque Violin Concertos)
Pachet and Cazaly (2000)
• These inconsistencies are not significant for
people manually browsing through
catalogues
• Inconsistencies are problematic for
automatic classification systems.
Pachet and Cazaly (2000)
• Suggest building an entirely new
classification system
• Goals of taxonomy:
– Objective
– Consistent
– Independent from other metadatabase
descriptors
– Supports searches by similarity
Pachet and Cazaly (2000)
• Suggest a tree-based system
• Only leaves contain musical pieces
• Each node contains the genealogical parent
of the genre and the differences between
that node and its parent.
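A minimal sketch of how such a node might be represented, assuming a simple in-memory structure; the field names and the example genres are hypothetical, not taken from Pachet and Cazaly.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenreNode:
    """One node of the proposed tree: it records its genealogical parent and
    the differences that distinguish it from that parent; only leaves hold
    actual musical pieces."""
    name: str
    parent: Optional["GenreNode"] = None
    differences_from_parent: List[str] = field(default_factory=list)
    pieces: List[str] = field(default_factory=list)  # non-empty only at leaves

# Hypothetical example of a genealogical parent-child relation:
rock = GenreNode("Rock")
hard_rock = GenreNode(
    "Hard Rock",
    parent=rock,
    differences_from_parent=["distorted guitars", "heavier rhythm section"],
)
```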
My Evaluation
• Valid concerns about existing taxonomies
• Problems with the proposed solution:
– How to achieve an objective classification system?
– Hard to get people to agree to a standard
– New genres are constantly emerging
– Does not solve the problem of fuzzy boundaries
between genres
– Does not deal with the problem of multiple parents
which can compromise the tree structure
Implementations
• Actual implementations have sidestepped
this issue by limiting their testing to only a
few simple classifications
• Acceptable approach in the early stages of
development
• Problem of taxonomy structure will need to
be carefully considered for systems that
hope to scale to real-world applications
Tzanetakis, Essl & Cook (2001)
• Cite a study indicating that humans can often
classify genre after hearing only 250 ms of a
signal
• Should therefore be possible to build a
classification system that does not consider
musical form or structure
• Implies that real-time analysis of genre could be
easier to implement than might be thought
Tzanetakis, Essl & Cook (2001)
• Developed two GUI-based systems
– GenreGram
• Developed for real-time radio broadcasts
• Displays bouncing cylinders
– GenreSpace
• Provides a 3-D representation of genre space
• Maps each recording to a point based on its three most
distinguishing features
• Meant to be used for comparing large collections of recordings
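One way such a 3-D genre space could be computed is sketched below; the use of principal component analysis is an assumed stand-in for "three most distinguishing features", not a description of how GenreSpace actually works.

```python
import numpy as np

def project_to_3d(feature_matrix):
    """Project each recording's feature vector to a point in a 3-D space,
    here via PCA on the centred feature matrix (an assumed stand-in)."""
    X = feature_matrix - feature_matrix.mean(axis=0)
    # The right singular vectors give the directions of greatest variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:3].T   # (n_recordings, 3) coordinates for plotting
```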
Tzanetakis & Cook (2002)
• Further develops ideas
• Most influential implementation to date
• Proposes using three classes of features:
– Timbral texture
– Rhythmic content
– Pitch content
Tzanetakis & Cook (2002)
• Timbral texture features:
– Means and variances of spectral centroid
– Rolloff
– Flux
– Zero-crossings over the texture window
– Low energy
– Means and variances of the first five mel-frequency cepstral coefficients
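A rough sketch of how some of the features listed above could be computed from windowed audio frames; the exact definitions and parameters used by Tzanetakis and Cook may differ, and the MFCC and low-energy features are omitted here for brevity.

```python
import numpy as np

def timbral_features(frames, sr, rolloff_pct=0.85):
    """Per-frame spectral centroid, rolloff, flux and zero-crossing rate,
    summarised by their means and variances over the texture window.
    `frames` is assumed to be a (n_frames, frame_length) array of
    windowed audio samples."""
    spectra = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)

    # Spectral centroid: magnitude-weighted mean frequency of each frame.
    centroid = (spectra * freqs).sum(axis=1) / spectra.sum(axis=1)

    # Spectral rolloff: frequency below which rolloff_pct of the energy lies.
    cumulative = np.cumsum(spectra, axis=1)
    rolloff = freqs[(cumulative >= rolloff_pct * cumulative[:, -1:]).argmax(axis=1)]

    # Spectral flux: frame-to-frame change of the normalised spectrum.
    norm = spectra / spectra.sum(axis=1, keepdims=True)
    flux = np.r_[0.0, np.sum(np.diff(norm, axis=0) ** 2, axis=1)]

    # Zero-crossing rate of the time-domain frames.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    feats = np.vstack([centroid, rolloff, flux, zcr])
    return np.r_[feats.mean(axis=1), feats.var(axis=1)]
```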
Tzanetakis & Cook (2002)
• Rhythmic content features:
– “Beat histogram”
– Each bin of the histogram consists of a beats-per-minute level
– Shows the relative strengths of different beats and sub-beats
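A sketch of one way a beat histogram could be built, assuming an onset-strength envelope has already been computed at a known frame rate; the binning and normalisation choices here are illustrative rather than the exact procedure of Tzanetakis and Cook.

```python
import numpy as np

def beat_histogram(onset_envelope, frame_rate, bpm_range=(40, 200)):
    """Accumulate the autocorrelation of an onset-strength envelope into
    beats-per-minute bins; peaks show the relative strengths of candidate
    beats and sub-beats."""
    env = onset_envelope - onset_envelope.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    ac /= ac[0] + 1e-12                      # normalise by zero-lag energy
    lo, hi = bpm_range
    hist = np.zeros(hi - lo)
    for lag in range(1, len(ac)):
        bpm = 60.0 * frame_rate / lag        # lag in frames -> tempo in BPM
        if lo <= bpm < hi:
            hist[int(bpm) - lo] += max(ac[lag], 0.0)
    return hist                              # bin i corresponds to (lo + i) BPM
```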
Tzanetakis & Cook (2002)
• Pitch content features:
– Used 3 pitch histograms
– Each bin of these corresponded to a given pitch
• 1) a bin for each MIDI pitch
• 2) pitches of the same chroma in a single bin
• 3) reordered the bins so that adjacent bins were separated by a
5th rather than a semi-tone.
– Histograms used to extract features that could be used
to compare traits such as pitch variation, strength of the
tonic-dominant relationship, range and harmonic
complexity
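A sketch of the three histograms described above, assuming pitch detection has already produced a list of MIDI pitch numbers; the folding and circle-of-fifths reordering follow the slide's description, while the input format is an assumption.

```python
import numpy as np

def pitch_histograms(midi_pitches):
    """Return (1) an unfolded histogram with one bin per MIDI pitch,
    (2) a folded 12-bin histogram where pitches of the same chroma share a
    bin, and (3) the folded histogram reordered so adjacent bins are a
    fifth apart rather than a semitone."""
    midi_pitches = np.asarray(midi_pitches, dtype=int)
    unfolded = np.bincount(midi_pitches, minlength=128)
    folded = np.array([unfolded[chroma::12].sum() for chroma in range(12)])
    fifths_order = (np.arange(12) * 7) % 12    # C, G, D, A, E, B, F#, ...
    return unfolded, folded, folded[fifths_order]
```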
Tzanetakis & Cook (2002)
• Histogram approach provided a great deal
of useful data
• Disadvantage: lose all information relating
to the order that musical events occurred
• Presence of recurring phrases, for example,
could only be stored in a diluted form by
histograms
Tzanetakis & Cook (2002)
• Used a variety of statistical pattern
recognition (SPR) classifiers to process
features
• SPR classifiers attempt to estimate the
probability density function for the feature
vectors of each genre
• Classifiers trained to distinguish between 20
musical genres and 3 speech genres
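A minimal sketch of this idea, assuming each genre's feature vectors are modelled with a single multivariate Gaussian; Tzanetakis and Cook also evaluated other statistical classifiers, so this is only one representative instance.

```python
import numpy as np

class GaussianGenreClassifier:
    """Estimate a Gaussian density over feature vectors for each genre and
    classify new recordings by maximum log-likelihood."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small diagonal term keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_[c] = (mean, cov)
        return self

    def _log_likelihood(self, X, mean, cov):
        d = X.shape[1]
        diff = X - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
        return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)

    def predict(self, X):
        scores = np.column_stack(
            [self._log_likelihood(X, *self.params_[c]) for c in self.classes_]
        )
        return self.classes_[np.argmax(scores, axis=1)]
```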
Tzanetakis & Cook (2002)
• Used real audio recordings
• Correctly distinguished between 10 genres
61% of the time
• Comparable to human rates
• Collection of more features and inclusion of
a larger number of specialized genres could
improve performance
Kosina (2002)
• Constructed somewhat similar system
(MUGRAT)
• Achieved a success rate of 82%
• Only attempted to distinguish between
metal, dance and classical music
• Excellent overview of background
information
Grimaldi et al. (2003)
• Used discrete wavelet transform to extract
time and frequency features
• 64 time features and 79 frequency features
• This is more than Tzanetakis used, but
details are not specified
• Used an ensemble of binary classifiers to
perform the classification
• Each trained on a pair of genres
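Since the exact wavelet and feature set are not specified, the sketch below uses a hand-rolled Haar transform purely to illustrate the idea of deriving features from wavelet coefficients at several levels; it should not be read as Grimaldi et al.'s actual procedure.

```python
import numpy as np

def haar_dwt_features(signal, levels=4):
    """Apply a Haar discrete wavelet transform level by level and keep the
    mean and standard deviation of each level's detail coefficients (plus
    the final approximation) as features."""
    x = np.asarray(signal, dtype=float)
    features = []
    for _ in range(levels):
        x = x[: len(x) // 2 * 2]                  # even length for pairing
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        features.extend([detail.mean(), detail.std()])
        x = approx                                # recurse on the approximation
    features.extend([x.mean(), x.std()])          # final approximation band
    return np.array(features)
```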
Grimaldi et al. (2003)
• Final classification is arrived at through a
vote of the classifiers
• Tzanetakis, in contrast, used single
classifiers that processed all features for all
genres
• Success rate of 82%
• Only four categories were used
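A sketch of such a pairwise (one-versus-one) voting scheme; the choice of logistic regression as the binary classifier is an assumption for illustration, since Grimaldi et al.'s classifier type is not detailed here.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pairwise_ensemble(X, y):
    """Train one binary classifier per pair of genres."""
    ensemble = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        ensemble[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return ensemble

def predict_by_vote(ensemble, X):
    """Each pairwise classifier casts one vote per recording; the genre with
    the most votes wins."""
    tallies = [Counter() for _ in range(len(X))]
    for clf in ensemble.values():
        for i, label in enumerate(clf.predict(X)):
            tallies[i][label] += 1
    return [tally.most_common(1)[0][0] for tally in tallies]
```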
Aucouturier and Pachet (2003)
• Define 3 categories of genre classification
• Manual approach
– Manual entry is unfeasible because of huge
number of titles that need to be entered
– Should use taxonomies based on artist rather
than title because taxonomies based on title
involve many more entries and result in
categories that are overly narrow and have
contrived boundaries
Aucouturier and Pachet (2003)
• Prescriptive approach
– an automatic approach that involves two steps:
• frame-based feature extraction
• machine learning/classification
– Tzanetakis’s and Cook’s system is prescriptive
– Assumes an existing adequate taxonomy that is
contrived and non-scalable
– Difficult to find truly representative training samples
Aucouturier and Pachet (2003)
• Emergent approach
– Rather than using existing taxonomies, like the
prescriptive approach, lets classifications emerge
according to some measure of similarity
– Can use similarity measurements based on audio
signals
– Can also use cultural similarity gleaned from
application of data mining techniques to text documents
• Use collaborative filtering to search for similarities in the taste
profiles of different individuals
• Use co-occurrence analysis on the play lists of different radio
programs and CD compilation albums
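A sketch of what co-occurrence analysis over playlists might look like; the input format and the normalisation are assumptions used only to illustrate the idea of deriving cultural similarity without audio features.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_similarity(playlists):
    """Count how often pairs of titles appear on the same playlist or
    compilation, normalised by how often each title appears at all.
    `playlists` is a list of lists of title identifiers."""
    counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for playlist in playlists:
        titles = set(playlist)
        for t in titles:
            counts[t] += 1
        for a, b in combinations(sorted(titles), 2):
            pair_counts[(a, b)] += 1
    return {
        pair: c / min(counts[pair[0]], counts[pair[1]])
        for pair, c in pair_counts.items()
    }
```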
My Evaluation
• Valid concerns regarding prescriptive
systems
• Emergent system as described has yet to be
successfully applied to music
• Remains to be seen which approach is best
• Exploiting information such as text
documents to generate genre profiles is
interesting
THE END