Representation of Musical Information

Donald Byrd
School of Music
Indiana University
Updated 8 March 2006
Copyright © 2003-06, Donald Byrd
1
Classification: Surgeon General’s Warning
• Classification (ordinary hierarchic) is dangerous
– Almost everything in the real world is messy
– Absolute correlations between characteristics are rare
– Example: some mammals lay eggs; some are “naked”
– Example: musical instruments (piano as percussion, etc.)
• Nearly always, all you can say is “an X has
characteristic A, and usually also B, C, D…”
• Leads to:
– People who know better claiming absolute correlations
– Arguments among experts over which characteristic is
most fundamental
– Don changing his mind
30 Jan. 06
2
Dimensions of Music Representations (1)
[Figure: representations plotted against two axes, Expressive Completeness and Structural Generality. Points shown: Waveform, Csound, MusicXML, Notelist, MIDI (SMF).]
(After Wiggins et al (1993). A Framework for the Evaluation of Music
Representation Systems.)
rev. 3 Feb.
3
Dimensions of Music Representations (2)
• Expressive completeness
– How much of all possible music can the representation
express?
– Includes synthesized as well as acoustic sounds!
– Waveform (=audio) is truly “complete”
– Exception, sort of: conceptual music
• E.g., Tom Johnson: Celestial Music for Imaginary Trumpets
(notes on 100 ledger lines), Cage: 4’ 33” (of silence), etc.
• Structural generality
– How much of the structure in any piece of music can
the representation express?
– Music notation with repeat signs, etc. still expresses
nowhere near all possible structure
30 Jan. 06
4
Representation vs. Encoding
• Representation: what information is conveyed?
– More abstract (conceptual)
– Basic = general type of info; specific = exact type
• Encoding: how is the information conveyed?
– More concrete: in computer (“bits”)…or on paper (“atoms”)!
• One representation can have many encodings
– “Atoms” example: music notation in printed or Braille
form
– “Bits” example: any kind of text in ASCII vs. Unicode
– “Bits” example: formatted text in HTML, RTF, .doc
30 Jan. 06
5
Basic Representations of Music & Audio
[Figure: the three basic representations]
• Audio (e.g., CD, MP3): like speech
• Time-stamped Events (e.g., MIDI file): like unformatted text
• Music Notation: like text with complex formatting
27 Jan.
6
Basic Representations of Music & Audio
Property | Audio | Time-stamped Events | Music Notation
Common examples | CD, MP3 file | Standard MIDI File | Sheet music
Unit | Sample | Event | Note, clef, lyric, etc.
Explicit structure | none | little (partial voicing information) | much (complete voicing information)
Avg. rel. storage | 2000 | 1 | 10
Convert to left | - | easy | OK job: easy
Convert to right | 1 note: pretty easy; other: hard or very hard | OK job: fairly hard | -
Ideal for | music, bird/animal sounds, sound effects, speech | music | music
27 Jan.
7
Basic and Specific Representations vs. Encodings
[Figure: basic and specific representations shown above a dividing line, encodings below it]
• Basic representations: Audio; Time-stamped Events; Music Notation
• Specific representations: Waveform; Time-stamped MIDI; Csound score; Time-stamped expMIDI; Gamelan notation; Tablature; CMN; Mensural notation
• Encodings: .WAV; Red Book (CD); SMF; Csound score; Notelist; expMIDI File; MusicXML; Finale ETF
rev. 15 Feb.
8
Selfridge-Field on Describing Musical Information
• Cf. Selfridge-Field, E. (1997). Describing Musical Information.
• What is Music Representation? (informal use of term!)
– Codes in Common Use: solfegge (pitch only), CMN, etc.
– “Representations” for Computer Application: “total”, MIDI
• Parameters of Musical Information
– Contexts: sound, notation/graphical, analytic, semantic; gestural?
– Concentrates on 1st three
• Processing Order: horizontal or vertical priority
• Code Categories
– Sound Related Codes: MIDI and other
– Music Notation Codes: DARMS, SCORE, Notelist, Braille!?, etc.
– Musical Data for Analysis: Plaine and Easie, Kern, MuseData, etc.
– Representations of Musical Patterns and Process
– Interchange Codes: SMDL, NIFF, etc.; almost obsolete!
30 Jan. 06
9
Review: The Four Parameters of Notes
• Four basic parameters of a definite-pitched musical note
1. pitch: how high or low the sound is: perceptual analog of
frequency
2. duration: how long the note lasts
3. loudness: perceptual analog of amplitude
4. timbre or tone quality
• Above is decreasing order of importance for most Western
music
• …and decreasing order of explicitness in CMN!
10
Review: How to Read Music Without Really Trying
• CMN shows at least six aspects of music:
– NP1. Pitches (how high or low): on vertical axis
– NP2. Durations (how long): indicated by note/rest shapes
– NP3. Loudness: indicated by signs like p, mf, etc.
– NP4. Timbre (tone quality): indicated with words like “violin”, “pizzicato”, etc.
– Start times: on horizontal axis
– Voicing: mostly indicated by staff; in complex cases also
shown by stem direction, beams, etc.
• See “Essentials of Music Reading” musical example.
11
Complex Notation (Selfridge-Field’s Fig. 1-4)
Complications on staff 2:
• Editorial additions (small notes)
• Instruments sharing notes only some of the time
• Mixed durations in double stops
• Multiple voices (divisi notation)
• Rapidly gets worse with more than 2!
10 Feb.
12
Complex Notation (Selfridge-Field’s Fig. 1-4)
Multiple voices rapidly gets worse with more than 2
• 2 voices in mm. 5-6: not bad: stem direction is enough
• 3 voices in m. 7: notes must move sideways
• 4 voices in m. 8: almost unreadable—without color!
• Acceptable because exact voice is rarely important
rev. 12 Feb.
13
Domains of Musical Information
• Independent graphic and performance info common
– Cadenzas (classical), swing (jazz), rubato passages (all music)
• CMN “counterexamples” show importance of independent
graphic and logical info
– Debussy: bass clef below the staff
– Chopin: noteheads are normal 16ths in one voice, triplets in another
• Mockingbird (early 1980’s) pioneered three domains:
– Logical: “note is a qtr note” (= Selfridge-Field’s (ESF’s) “notation”)
– Performance: “note sounds for 456/480ths of a quarter” (= ESF’s “sound”; also called gestural)
– Graphic: “notehead is diamond shaped” (= ESF’s “notation”)
– Nightingale and other programs followed
• SMDL added fourth domain
– Analytic: for Roman numerals, Schenkerian level, etc. (= ESF’s
“analytic”)
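A minimal sketch of one note carrying separate logical, performance, and graphic information, using the slide's own numbers. The class and field names are hypothetical, chosen only for illustration; they are not Mockingbird's or Nightingale's actual data structures.

```python
from dataclasses import dataclass
from fractions import Fraction

TICKS_PER_QUARTER = 480  # assumed resolution, matching the "456/480ths" example


@dataclass
class LogicalNote:
    """What the notation means: a quarter note is 1/4 of a whole note."""
    note_value: Fraction


@dataclass
class PerformanceNote:
    """How the note sounds: duration in ticks."""
    duration_ticks: int


@dataclass
class GraphicNote:
    """How the note looks on the page."""
    notehead_shape: str


@dataclass
class Note:
    logical: LogicalNote
    performance: PerformanceNote
    graphic: GraphicNote

    def played_fraction(self) -> Fraction:
        """Sounding duration as a fraction of the notated duration."""
        notated_ticks = self.logical.note_value * 4 * TICKS_PER_QUARTER
        return Fraction(self.performance.duration_ticks) / notated_ticks


note = Note(LogicalNote(Fraction(1, 4)), PerformanceNote(456), GraphicNote("diamond"))
print(note.played_fraction())  # 19/20
```

Keeping the three records separate means a change in one domain (say, a different notehead shape) leaves the other two untouched.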
1 Feb. 06
14
Different Classifications of Music Encodings
Selfridge-Field | Byrd
Sound-related codes (1): MIDI | Time-stamped MIDI
Sound-related codes (2): Other Codes for Representation and Control | Time-stamped Events + Audio
Musical Notation Codes (1): DARMS | CMN (domains L, G)
Musical Notation Codes (2): Other ASCII Representations | CMN (domains L, G)
Musical Notation Codes (3): Graphical-object Descriptions | CMN (domains L, P, G)
Musical Notation Codes (4): Braille | CMN: non-computer representation!
Codes for Data Management and Analysis (1): Monophonic Representations | CMN (emphasizes domain A)
Codes for Data Management and Analysis (2): Polyphonic Representations | CMN (emphasizes domain A)
Representations of Musical Patterns and Processes | “CMN” (abstracted; emphasizes A)
Interchange Codes | CMN (domains L, P, G, A)
10 Feb.
15
Mozart: Variations for piano, K. 265, on
“Ah, vous dirais-je, Maman”, a.k.a. Twinkle
[Musical example: Theme and Variation 2, piano score on treble and bass staves; the notation is not recoverable from this transcript]
16
Representation Example: a Bit of Mozart
The first few measures of Variation 8 of the “Twinkle” Variations
27 Jan.
17
In Notation Form: Nightingale Notelist
%%Notelist-V2 file='MozartRepresentationEx' partstaves=2 0 startmeas=193
C stf=1 type=3
C stf=2 type=10
K stf=1 KS=3 b
K stf=2 KS=3 b
T stf=1 num=2 denom=4
T stf=2 num=2 denom=4
A v=1 npt=1 stf=1 S1 'Variation 8'
D stf=1 dType=5
N t=0 v=1 npt=1 stf=1 dur=5 dots=0 nn=72 acc=0 eAcc=3 pDur=228 vel=55 ...... appear=1
R t=0 v=2 npt=1 stf=2 dur=-1 dots=0 ...... appear=1
N t=240 v=1 npt=1 stf=1 dur=5 dots=0 nn=74 acc=0 eAcc=3 pDur=228 vel=55 ...... appear=1
N t=480 v=1 npt=1 stf=1 dur=5 dots=0 nn=75 acc=0 eAcc=2 pDur=228 vel=55 ...... appear=1
N t=720 v=1 npt=1 stf=1 dur=5 dots=0 nn=77 acc=0 eAcc=3 pDur=228 vel=55 ...... appear=1
/ t=960 type=1
N t=960 v=1 npt=1 stf=1 dur=4 dots=0 nn=79 acc=0 eAcc=3 pDur=456 vel=55 ...... appear=1
(etc. File size: 1862 bytes)
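A minimal sketch of reading one Notelist record into key/value fields. The function name is hypothetical and this is not Nightingale's own parser; the meanings of the fields (for instance that nn looks like a MIDI note number and pDur a play duration in ticks) are inferred from the example, not taken from a format specification.

```python
def parse_notelist_record(line: str) -> dict:
    """Split one Notelist line into its record type and key=value fields."""
    tokens = line.split()
    record = {"type": tokens[0]}
    for token in tokens[1:]:
        if "=" not in token:
            continue  # skip anything that is not a key=value pair
        key, value = token.split("=", 1)
        try:
            record[key] = int(value)   # keep integers as ints
        except ValueError:
            record[key] = value        # leave anything else as a string
    return record


line = "N t=240 v=1 npt=1 stf=1 dur=5 dots=0 nn=74 acc=0 eAcc=3 pDur=228 vel=55"
rec = parse_notelist_record(line)
print(rec["type"], rec["t"], rec["nn"], rec["pDur"])  # N 240 74 228
```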
27 Jan.
18
An Event Form: Standard MIDI File (file dump)
  0: 4D54 6864 0000 0006 0001 0003 01E0 4D54  MThd.........‡MT
 16: 726B 0000 0014 00FF 5103 0B70 C000 FF58  rk......Q..p¿..X
 32: 0402 0218 0896 34FF 2F00 4D54 726B 0000  .....ñ4./.MTrk..
 48: 0055 00FF 0305 5069 616E 6F00 9048 3881  .U....Piano.êH8Å
 64: 6480 4840 0C90 4A38 8164 804A 400C 904B  dÄH@.êJ8ÅdÄJ@.êK
 80: 3881 6480 4B40 0C90 4D38 8164 804D 400C  8ÅdÄK@.êM8ÅdÄM@.
 96: 904F 3883 4880 4F40 1890 4F38 8360 9050  êO8ÉHÄO@.êO8É`êP
112: 3883 4880 4F40 1890 4D38 8330 8050 4018  8ÉHÄO@.êM8É0ÄP@.
128: 804D 400D FF2F 004D 5472 6B00 0000 3200  ÄM@../.MTrk...2.
144: FF03 0550 6961 6E6F 8F00 9041 2B81 6480  ...Pianoè.êA+ÅdÄ
160: 4140 0C90 4330 8164 8043 400C 9044 3181  A@.êC0ÅdÄC@.êD1Å
176: 6480 4440 0C90 4647 8164 8046 4001 FF2F  dÄD@.êFGÅdÄF@../
192: 00                                       .
5 Feb.
19
An Event Form: Standard MIDI File (interpreted)
Header format=1 ntrks=3 division=480
Track #1 start
  t=0 Tempo microsec/MIDI-qtr=749760
  t=0 Time sig=2/4 MIDI-clocks/click=24 32nd-notes/24-MIDI-clocks=8
  t=2868 Meta event, end of track
Track end
Track #2 start
  t=0 Meta Text, type=0x03 (Sequence/Track Name) leng=5
  Text = <Piano>
  t=0 NOn ch=1 num=72 vel=56
  t=228 NOff ch=1 num=72 vel=64
  t=240 NOn ch=1 num=74 vel=56
  t=468 NOff ch=1 num=74 vel=64
(etc. File size: 193 bytes)
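A minimal sketch of how the interpreted values can be recovered from the raw bytes of the file dump. The function names are hypothetical; the byte values are copied from the dump, and the layout (a big-endian header chunk, delta times stored as variable-length quantities of 7 bits per byte) follows the Standard MIDI File format.

```python
import struct

# First 22 bytes of the dump: the "MThd" header chunk, then the start of a track chunk.
data = bytes.fromhex("4D546864 00000006 0001 0003 01E0 4D54726B 00000014")


def parse_header(chunk: bytes):
    """Return (format, ntrks, division) from an SMF header chunk."""
    chunk_type, length = struct.unpack(">4sI", chunk[:8])
    if chunk_type != b"MThd" or length != 6:
        raise ValueError("not a Standard MIDI File header")
    return struct.unpack(">HHH", chunk[8:14])


def read_vlq(buf: bytes, pos: int = 0):
    """Decode a variable-length quantity; return (value, next position)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:  # top bit clear marks the last byte
            return value, pos


print(parse_header(data))                              # (1, 3, 480)
print(read_vlq(bytes([0x96, 0x34]))[0])                # 2868
print(read_vlq(bytes([0x81, 0x64]))[0])                # 228
print(int.from_bytes(bytes.fromhex("0B70C0"), "big"))  # 749760
```

The last three values match the end-of-track time, the first note's duration, and the tempo in the interpreted listing.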
27 Jan.
20
MIDI (Musical Instrument Digital Interface) (1)
• Invented in early 80’s
– Dawn of personal computers
– Designed as simple (& cheap to implement) real-time
protocol for communication between synthesizers
– Low bandwidth: 31.25 Kbps
• Top bit of byte: 1 = status, 0 = data
– Numbers usually 7 bits (range 0-127); sometimes 14 or even 21
• Message types
– Channel Voice
– Channel Mode
– System Common
– System Real-Time
– System Exclusive
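A minimal sketch of the top-bit rule and of a 14-bit number built from two 7-bit data bytes. The function names are hypothetical; the least-significant-byte-first order shown is the one MIDI uses for Pitch Bend.

```python
def is_status(byte: int) -> bool:
    """A MIDI byte with its top bit set is a status byte; otherwise it is data."""
    return byte & 0x80 != 0


def combine_14bit(lsb: int, msb: int) -> int:
    """Join two 7-bit data bytes into one 14-bit number (range 0-16383)."""
    return ((msb & 0x7F) << 7) | (lsb & 0x7F)


message = [0x90, 0x48, 0x38]             # one status byte followed by two data bytes
print([is_status(b) for b in message])   # [True, False, False]
print(combine_14bit(0x00, 0x40))         # 8192
print(combine_14bit(0x7F, 0x7F))         # 16383
```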
5 Feb. 06
21
MIDI (2)
• Important standard Events are mostly Channel Voice msgs
– Note On: channel (1-16), note number (0-127), on velocity
– Note Off: channel, note number, off velocity
• Can change “voice” any time with Program Change msg
• A way around the 16-channel limit: cables
– may or may not correspond to a physical cable
– each cable supports 16 channels independent of others
– Systems with 4 (=64 channels) or 8 cables (=128) are common
• MIDI Monitor allows watching MIDI in real time
– Freeware and open source!
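A minimal sketch of building the Channel Voice messages named above as raw bytes. The function names are hypothetical; the status values (0x80 Note Off, 0x90 Note On, 0xC0 Program Change) are the standard MIDI ones, with the channel stored as 0-15 in the low four bits.

```python
def _check(channel: int, *data: int) -> None:
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    _check(channel, note, velocity)
    return bytes([0x90 | (channel - 1), note, velocity])


def note_off(channel: int, note: int, velocity: int) -> bytes:
    _check(channel, note, velocity)
    return bytes([0x80 | (channel - 1), note, velocity])


def program_change(channel: int, program: int) -> bytes:
    _check(channel, program)
    return bytes([0xC0 | (channel - 1), program])


print(note_on(1, 72, 56).hex())       # 904838
print(note_off(1, 72, 64).hex())      # 804840
print(program_change(16, 0).hex())    # cf00
```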
5 Feb. 06
22
MIDI Sequencers
• Record, edit, & play SMFs (Standard MIDI Files)
• Standard views
– Piano roll
• often with velocity, controllers, etc., in parallel
– Event list
– Other: Mixer, “Music notation”, etc.
– Standard editing
• Adding digital audio
– Personal computers & software-development tools have gotten
more & more powerful
– => “digital audio sequencers”: audio & MIDI (stored in hybrid encodings)
• Making results more musical: “Humanize”
– Timing, etc. isn’t mechanical—but not really musical!
8 Feb. 06
23
Another Warning: Terminology (1)
• A perilous question: “How many voices does this
synthesizer have?”
• Syllogism
– Careless and incorrect use of technical terms is
dangerous to your learning very much
– Experts use technical terms carelessly most of the time
– Beginners often use technical terms incorrectly
– Therefore, your learning very much is in danger
• Somewhat exaggerated, but only somewhat
5 Feb. 06
24
Another Warning: Terminology (2)
• Not-too-serious case: “system”
– Confusion because both standard (common) computer
term & standard (rare but useful) music term
• Serious case: patch, program, timbre, or voice
– Vocabulary def.: Patch: referring to event-based systems such as MIDI
and most synthesizers (particularly hardware synthesizers), a setting that
produces a specific timbre, perhaps with additional features. The terms
"voice", "timbre", and "program" are all used for the identical concept;
all have the potential to cause substantial confusion and should be
avoided as much as possible
– “Patch” is the only unambiguous term of the four
– …but the official MIDI specification (& almost everything else)
talks about “voices” (as in “Channel Voice messages control the
instrument's 16 voices”)
– …and to change the “voice”, you use a “program change”!
6 Feb. 06
25
Another Warning: Terminology (3)
• Some terminology is just plain difficult
• Example: “Representation” vs. “Encoding”
– Distinction: 1st is more abstract, 2nd more concrete
– …but what does that mean?
– Explaining milk to a blind person: “a white liquid...”
• Don’s precision involves being very careful with
terminology, difficult or not
– Vocabulary is important source
– Cf. other sources
– Contributions are welcome
6 Feb. 06
26
Standard MIDI Files (1)
• File format = encoding
• Standard approved in 1988
• Very compact
• Files made up of chunks with 4-character type
• One Header chunk (“MThd”)
– Gives format, number of tracks, basis for timing
• Any number of Track chunks (“MTrk”)
– Stream of MIDI events and metaevents preceded by time
– 1st track is always timing track
5 Feb.
27
Standard MIDI Files (2)
• Metaevents
– Set Tempo (in timing track only)
– Text, Lyrics, Key/time signatures, instrument name, etc.
• What’s missing?
– Voice information limited to 16 channels
– Dynamics, beams, tuplets, articulation, expression marks,
note spelling, etc.: much less structure than CMN
• Attempts to overcome limitations
– Expressive MIDI, NotaMIDI, etc.
– ZIPI
– In a (more ambitious) way, Csound, etc.
– None of the limited attempts caught on
5 Feb. 06
28
Separating Representations Doesn’t Work! (1)
• Really “doesn’t work well for many purposes”
• We shouldn’t be surprised
– Close relative of “Classification is Dangerous to Your
Health”
• Example: many popular notation encodings (e.g.,
MusicXML) add event info
• Example: multiple domains for notation add in
event info (performance domain)
• Example: Csound combines audio & events
• Hybrid systems
12 Feb. 06
29
Separating Representations Doesn’t Work! (2)
• Extreme example of musical necessity: Jimi
Hendrix’s version of the Star-Spangled Banner at
Woodstock (1969)
– Goes from pure melody => noteless texture => back
repeatedly
– What would a music-IR system do to recognize the Star-Spangled Banner?
– …or Taps? (a very different problem!)
• Attempts have been/are being made to combine all
three basic representations
3 Feb. 06
30
Even One Note can be Hairy
• Experience in the early days of Kurzweil (ca.
1983)
– Piano middle C(!) never sounded “good”
• ...except first, low-quality recording
• Couldn’t tell why from waveform, spectrogram, etc.
– Variable sampling rates were unusable
• An expensive mistake: cost ca. $1,000,000
– Scale on the flute didn’t sound realistic to a flutist—but
it was
– Lesson 1: expectations influence perception
– Lesson 2: nothing about music is clear-cut or simple
31
Musical Acoustics (1)
• Acoustics involves physics
• Musical (opposed to architectural, etc.) acoustics
– Frequency (=> pitch)
– Amplitude (=> loudness)
– Spectrum, envelope, & “other” characteristics => timbre
– Partials vs. harmonics
• Psychoacoustics involves psychology/perception
– Perceptual coding (“lossy” compression via MP3, etc.)
• Timbre
– Old idea (thru ca. 1960’s?): timbre is produced by static
relationships of partials
– …but attack helps identification more than steady state!
– Reality: rich (interesting) sounds are complex; nothing is static
– Time domain (waveform) vs. frequency domain (spectrogram) views
11 Feb. 06
32
Musical Acoustics (2): addsynenv
• addsynenv does additive synthesis of up to six partials
– Each has arbitrary partial no., starting phase, and "ADSR" type
envelope
– Partial no. can be non-integer => not harmonic
– ADSR = Attack/Decay/Sustain/Release (3 breakpoints)
– …but addsynenv allows much more complex envelopes
– Plays a single note in the waveform specified by partials &
envelopes
– Simultaneously displays “spectrogram” or “sonogram”
– …but not waveform
– Phase in real world normally has little effect, but can be critical in
recording & digital worlds (e.g., cancellation)
• Additive synthesis can’t create aperiodic (non-definite
pitch) sounds, or realistic attacks
11 Feb. 06
33
Musical Acoustics (3): Frequency, Temperament,
Scales
• Frequency: A4 (A above middle C) = 440 Hz, etc.
• Why do we need temperaments (tuning systems)?
– Frequency ratio (FR) of notes an octave (12 semitones) apart = 2:1
– FR of notes a Perfect 5th (7 st, e.g., C up to G) = 3:2
– 84 st = 7 octaves; around “circle of fifths”, 84 st = 12 P5s
– But (3/2)^12 = 129.746 does not equal 2^7 = 128!
– The standard (but not only) solution: equal temperament
• With 12 semitones & equal temperament, semitone FR =
12th root of 2 : 1 = ca. 1.059463…:1
• Etc.: see table in Wikipedia “Musical Acoustics” article
• Scales
– Diatonic (major, minor, etc.), chromatic, other
– Is addsynenv “whole tone” preset a timbre, or a scale!?
10 Feb. 06
34
An Audio Form: AIFF, 22K, 8 bit mono (file dump)
  0: 464F 524D 0001 83AE 4149 4646 434F 4D4D  FORM..ÉÆAIFFCOMM
 16: 0000 0012 0001 0001 8380 0008 400D AC44  ........ÉÄ..@.¨D
 32: 0000 0000 0000 5353 4E44 0001 8388 0000  ......SSND..Éà..
 48: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 64: 0000 0000 0000 0000 00FF 0000 0000 FEFC  ..............˛¸
 80: FBFB FCFC FCFD FEFF 0001 0102 0303 0303  ˚˚¸¸¸˝˛.........
 96: 0303 0201 0101 0000 0000 0100 FFFF FEFE  ..............˛˛
112: FFFF FFFF FF00 0203 00FE FCFC FCFC FBFC  .........˛¸¸¸¸˚¸
128: FDFE 0000 0000 0001 0102 0201 00FF FFFF  ˝˛..............
144: FEFE FF00 0001 0000 0000 0001 0100 0103  ˛˛..............
160: 0504 01FF FEFE FDFD FBFA FBFC FEFF FFFF  ....˛˛˝˝˚˙˚¸˛...
176: 0000 0000 00FF FEFE FEFD FDFE FF02 0303  ......˛˛˛˝˝˛....
192: 0302 0101 0201 0000 0205 0605 0200 0000  ................
208: FFFD FBFA FAFC FDFF FF00 00FF FFFF FEFD  .˝˚˙˙¸˝.......˛˝
224: FDFC FBFB FCFE 0103 0405 0402 0203 0302  ˝¸˚˚¸˛..........
240: 0102 0507 0705 0200 FFFD FBF8 F6F5 F5F7  .........˝˚¯ˆıı˜
256: F9FB FDFD FDFE FEFE FEFE FEFD FDFC FDFF  ˘˚˝˝˝˛˛˛˛˛˛˝˝¸˝.
(etc. File size: 99,254 bytes)
31 Jan.
35
Uncompressed Audio Files: Formats
• File formats are (containers for) encodings
• Only common formats are “PCM”: WAVE (.wav:
especially Windows), AIFF (especially Macintosh)
• Sampling rate, sample size, no. of channels independent of
format
• Typically in file header (different between formats)
• Higher sampling rate => reproduce higher frequencies
(sampling theorem, a.k.a. Nyquist theorem)
– Max. possible frequency = 1/2 SR
• Larger sample size => better signal-to-noise ratio (SNR)
and/or dynamic range
– Rule of thumb: 6 dB/bit
– 16-bit samples (CD) = maximum of ca. 96 dB SNR
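A short worked calculation of the two rules above. The function names are illustrative; the 6 dB/bit figure is the slide's rule of thumb, and 20·log10(2) per bit is the slightly more exact value it rounds.

```python
import math


def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency that can be reproduced: half the sampling rate."""
    return sampling_rate_hz / 2


def rule_of_thumb_db(bits: int) -> int:
    """Approximate maximum SNR at 6 dB per bit."""
    return 6 * bits


def exact_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


print(nyquist_hz(44100), rule_of_thumb_db(16), round(exact_db(16), 1))  # 22050.0 96 96.3
print(nyquist_hz(22050), rule_of_thumb_db(8), round(exact_db(8), 1))    # 11025.0 48 48.2
```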
10 Feb. 06
36
Uncompressed Audio Files: Size
• Header size usually negligible compared to data
• Mozart AIFF file is low fidelity mono
– 22,050 samples/channel/sec. * 1 byte/sample * 1 channel * 4.5 sec. = 99,225 bytes
– At CD quality, would take 44,100 samples/channel/sec. * 2 bytes/sample * 2 channels * 4.5 sec. = 793,800 bytes
• CD can store up to 74 min.(or 80) of music
• Total amount of digital data
– 44,100 samples/channel/sec. * 2 bytes/sample * 2 channels * 74
min. * 60 sec./min. = 783,216,000 bytes
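The size arithmetic above can be checked with a few lines. The function name is illustrative, and header bytes are ignored, as on the slide.

```python
def pcm_bytes(rate: int, bytes_per_sample: int, channels: int, seconds: float) -> int:
    """Size of uncompressed PCM sample data, ignoring the file header."""
    return int(rate * bytes_per_sample * channels * seconds)


print(pcm_bytes(22050, 1, 1, 4.5))       # 99225
print(pcm_bytes(44100, 2, 2, 4.5))       # 793800
print(pcm_bytes(44100, 2, 2, 74 * 60))   # 783216000
print(44100 * 16 * 2 / 1000)             # 1411.2 (CD bit rate in Kbps)
```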
1 Feb. 06
37
Specs for Some Common Audio Formats
Format | Encoding Type | Details | Fidelity | Bandwidth (Kbps)
“Red Book” (CD) | Uncompressed, linear | 44.1 KHz, 16 bits/sample, stereo | Very high | ca. 1400
Early game audio | Uncompressed, linear | 22.05 KHz, 8 bits/sample, mono | Low | 176
MLP, Apple Lossless Compression, etc. | Lossless comp. | compression ca. 2:1 | Very high | ca. 700
MP2 (Variations1) | Lossy comp. | compression ca. 3:1 | High | ca. 400
MP3 (Variations2), AAC, WMA | Lossy comp. | compression ca. 7:1 to over 10:1 | High to very high | ca. 128-192
AAC (Variations2), WMA | Lossy comp. | compression more than 20:1 | Medium | ca. 28-64
13 Feb. 06
38
An Event/Audio Hybrid Form: Csound
• Developed by Barry Vercoe (MIT) in early 80’s
• Builds on Music IV, Music V, etc. (Bell Labs in early 60’s)
• Widely used in electronic/computer music studios
• Hybrid representation: orchestra = audio, score = events
– Orchestra: defines “instruments” (timbre, articulation, dynamics)
– Score: tells instruments when to play and with what parameter values
• One statement per line and comments after a semicolon
• First character of score statement is opcode
• Remaining data consists of numeric parameter fields
(pfields) to be used by that action
11 Feb. 06
39
Csound Example
• Sine-tone generator to play the first five notes of the Mozart example, at quarter note = 80 => an eighth note lasts 3/8 = .375 sec. Score:

    f1  0      256   10  1      ; a sine wave function table
    ; a few notes of Mozart
    i1  0      .375  0   9.00
    i1  .375   .     .   9.02
    i1  .75    .     .   9.03
    i1  1.125  .     .   9.05
    i1  1.5    .75   .   9.07

• Opcode f interprets parameter fields as follows:
    p1 - function table number being created
    p2 - creation time, or time at which the table becomes readable
    p3 - table size (number of points)
    p4 - generating subroutine, chosen from a prescribed list.
• i (note) statements invoke the p1 instrument at time p2, duration p3 seconds; passes all p-fields to that instrument.
3 Feb.
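A minimal sketch relating the score's numbers to the other encodings of the same notes. The function name is hypothetical; it assumes Csound's octave.pitch-class convention, in which 8.00 is middle C (MIDI note 60) and the two digits after the point count semitones.

```python
def pch_to_midi(pch: float) -> int:
    """Convert octave.pitch-class notation to a MIDI note number (8.00 = middle C = 60)."""
    octave = int(pch)
    pitch_class = round((pch - octave) * 100)  # semitones above C
    return 12 * (octave - 3) + pitch_class


pitches = (9.00, 9.02, 9.03, 9.05, 9.07)
print([pch_to_midi(p) for p in pitches])  # [72, 74, 75, 77, 79]

tempo_bpm = 80                            # quarter notes per minute
eighth_seconds = 60 / tempo_bpm / 2
print(eighth_seconds)                     # 0.375
```

The note numbers agree with the nn values in the Notelist and the note numbers in the MIDI file for the same five notes.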
40
Compressed Audio Files: Formats
• Don’t confuse data compression with dynamic-range
compression (a.k.a. audio level compression or limiting)
• Codec = compressor/decompressor
• Lossless compression
– Standard general-purpose methods (e.g., the Lempel-Ziv family used in .zip, etc.) don’t do much for audio
– Sometimes called packing
– Audio specific methods
• MLP used for DVD-Audio
• Apple and Microsoft Lossless
• Nearly lossless compression : DTS
• Lossy compression
– Depends on psychoacoustics (“perceptual coding”)
16 Feb. 06
41
Compressed Audio Files: Lossy Compression
• Depends on psychoacoustics (“perceptual
coding”)
1. Divide signal into sub-bands by frequency
2. Take advantage of:
• Masking (“shadows”), via amplitude within critical bands
• Threshold of audibility (varies with frequency)
• Redundancy among channels (5.1 even more than stereo!)
• Etc.
• MPEG-1 layers I thru III (MP-1, 2, 3), AAC get better &
better compression via more & more complex techniques
– “There is probably no limit to the complexity of psychoacoustics.”
--Pohlmann, 5th ed.
– However, there probably is an “asymptotic” limit to compression!
• Implemented in hardware or software codecs
22 Feb. 06
42
Compressed Audio Files: Lossy Compression
• Evaluation via critical listening is essential
– ITU 5-point scale
• 5 = imperceptible, 4 = perceptible but not annoying, 3 = slightly
annoying, 2 = annoying, 1 = very annoying
– Careful tests: often double-blind, triple-stimulus, hidden reference
• E.g., ISO qualifying AAC with 31 expert listeners (cf. Hall article)
– Test materials chosen to stress codecs
• Common useful tests: glockenspiel, castanets, triangle, harpsichord,
speech, trumpet
• Soulodre’s worst-case tracks: bass clarinet arpeggio, bowed double
bass, harpsichord arpeggio, pitch pipe, muted trumpet
• References: Hall article “Cramped Quarters”, Pohlmann Principles of
Digital Audio (both on my reserve)
17 Feb. 06
43
Hybrid Representation & Compression
• Events (with “predefined” timbre) take very little space
– Mozart fragment AIFF (CD-quality audio): 794,166 bytes
– Mozart fragment MIDI file: 193 bytes
– Timbre takes same amount of space, regardless of music length!
– Problem: don’t have exact timbre for any performance
• Mike Hawley’s approach: find structure in audio; create
events & timbre definition
– Hawley, Michael J. (1990). The Personal Orchestra, or, Audio Data
Compression by 10000:1. Usenix Computing Systems Journal 3(2),
pp. 289—329.
• Could hybrid event/audio representation lead to his “audio
data compression by a factor of 10,000”?
• Maybe, but no time soon!
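For scale, the ratio implied by the two Mozart file sizes can be computed directly. This is only arithmetic on the numbers given in these slides; it ignores the space needed for the timbre definition that the events rely on.

```python
cd_quality_aiff_bytes = 794_166   # Mozart fragment, CD-quality audio
low_fi_aiff_bytes = 99_254        # Mozart fragment, 22K 8-bit mono
midi_bytes = 193                  # Mozart fragment, Standard MIDI File

print(round(cd_quality_aiff_bytes / midi_bytes))  # 4115
print(round(low_fi_aiff_bytes / midi_bytes))      # 514
```

Even this toy comparison is within a factor of about 2.5 of 10,000:1, but only because the events carry no timbre of their own.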
17 Feb. 06
44
Explicit Structure & Music Notation (1)
• David Huron on “Explanatory Goals of Music Analysis”
• “Music exhibits a multitude of different kinds of structures.”
– Formal: Symmetrical partitioning of tone row in Webern's Op. 24 Concerto
– Functional: Defensive vigil and the early morning singing of the Mekranoti
indians of Brazil
– Physiological: Critical bands & Haydn's String Quartet Op. 17, No. 3
– Perceptual: Auditory attention & ramp dynamics in Beethoven's Piano
Sonata No. 4, Op. 7
– Idiomatic: Arnold's Fantasy for Trumpet
– Economic: Prokofiev's piano Concerto No. 4 for left-handed pianist
– Personal: B-A-C-H and J.S. Bach's Brandenburg Concerto No. 2
– Social: Frank Sinatra's All or Nothing at All heard in 1959 vs. 1985
– Cultural: Geographical difference between Slavic & Germanic folk
cadences
– Linguistic: Rhythmic contrast between Frère Jacques & English Country
Garden
20 Feb. 06
45
Explicit Structure & Music Notation (2)
• Music notation is about explicit information/structure
• Huron was talking about explanation; do we really need
explicit information for IR?
– O’Maidin (and others): no need for logical domain
• Interesting examples
– Unmarked triplets: obvious in Schubert, etc.; not in Bartok Qtet 5, I
– Bach Goldberg Variations; D-major Prelude, WTC Book II: notation
sidesteps triplets by changing time signature (and tempo)
– Hendrix’ Star-Spangled Banner improvisation
• Byrd & Isaacson
– Need logical domain because (for many purposes) must interpret at
some point, & silly for everyone who needs it to re-do!
– Byrd & Isaacson (2003). A Music Representation Requirement Specification for Academia. Computer Music Journal 27(4), pp. 43–57
– But logical info may be less authoritative
7 Mar. 06
46
Explicit Structure & Music Notation (3)
• Joan Public’s problem: find a song, given some of
the melody and some lyrics
– Needs notes and text (lyrics)
– Common question for music librarians, esp. in public
libraries
• Musicologist’s problem: authorship/origin of
works in manuscripts
– Full symbolic data is important, even “insignificant”
details of notation (John Howard)
47
Music Notation: Fully Structured Representation
• Many forms (“specific” representations)
– Early and modern Western, Gamelan notation (Java), Indian, etc.
• Evolved with music they represent
• CMN probably most elaborate, especially w/r/t time
– Western polyphony very demanding of synchronization
• History of Western music notation in ten minutes
– Manuscript Facsimiles at www.nd.edu/~medvlib/musnot.html
– Neumes (6th to 13th century)
– Mensural notation: black (ca. 1250-1450), white (ca. 1450-1600)
– CMN (a.k.a. TMN) is really “CWMN” (ca. 1600-present)
• included barlines, piano/forte from beginning
• …but other dynamics gradually: pp/ff c.1750, mf c.1762, mp c.1837,
etc.; Romanticism => wider range
• …and more and more specific (metronome & expression marks, etc.)
– Tablature goes back centuries, too: important for lute (& viols?)
22 Feb. 06
48
A Systematic Approach to Music Representation
• Byrd & Isaacson (2003). A Music Representation Requirement
Specification for Academia. Computer Music Journal 27(4), pp. 43-57
• Intended to help choose representation for encoded music in
Variations2
• Purpose influences choice of encoding as well as representation
• Wiggins et al. (1993) give three sorts of tasks:
1. recording: user wants record of musical object, to be retrieved later
2. analysis: user wants analyzed version, not “raw” musical object
3. generation/composition
• Descriptive vs. prescriptive notation :: pitch in CMN vs. tablature
• Tuplets are exceptionally complex & subtle; often invisible
(unmarked)
• A “huge table” takes up half of the article
• CMN has endless details, but most aren’t important for most purposes!
• Assigned importance “Required”, “Very Desirable”, “Desirable”
• Result: chose MusicXML (but priorities changed & little done with it)
2 Mar. 06
49
Music Notation Software and Intelligence (1)
• Cf. Byrd, D. (1994). Music Notation Software and Intelligence.
• Cases where famous composers flagrantly violate
important rules, yet results are easily readable
Fig. 1. Changing time signature in middle of the measure (J.S. Bach)
Fig. 2. A measure with four horizontal positions for notes that are all
on the downbeat (Brahms)
Very different ways to have two clefs in effect at the same time:
Fig. 3. Bizarrely obvious (Debussy)
Fig. 4. So subtle, must think about the 3/8 meter to see bass and treble
clefs are both in effect throughout the measure (Ravel)
• Really nothing very strange going on in any of these
rev. 15 Feb.
50
Music Notation Software and Intelligence (2)
• Rules of CMN interact and aren’t always consistent
• Programmers try to help users by having programs do
things “automatically”
• A good idea if software knows enough to do the right thing
“almost all” the time
• Notation programs convert CMN to performance (MIDI)
and vice-versa => makes things worse
• Severo Ornstein’s complaint: programs that assume a
defined rhythmic structure
22 Feb. 06
51
Surprise: Music Notation has Meta-Principles!
1. Maximize readability (intelligibility)
– Avoid clutter = “Omit Needless Symbols”
– Try to assume just the right things for audience
– Audience for CMN is (primarily) performers
– General principle of any communication
• Applies to talks as well as music notation!
– Examples: Schubert (avoid tuplet numerals), Bach (avoid tuplets)
2. Minimize space used
– Save space => fewer page turns (helps performer); also cheaper to
print (helps publisher)
– Squeezing much music into little space is a major factor in
complexity of CMN
– Especially important for music: real-time, performer’s hands full
– Examples: Telemann, Debussy, Ravel (for all, reduce staves)
22 Feb. 06
52
Music Notation: Attempts at Standard Encodings (1)
• Structured encodings on computers (all CMN)
• Almost all are text; a few (e.g., NIFF) are binary
• Three “generations”
• Early (DARMS, MUSTRAN, SCORE, IML, etc.)
– From 1960’s: mainframe computers, batch processing (no
interactive editing), punch card input
– Logical and (usually limited) Graphic domains
– IML used for Princeton’s Josquin Masses project: first(?) music IR
– SCORE important in music publishing bcs emphasizes Graphic
– Mozart (“Twinkle” Variation no. 8) in MUSTRAN (first 5 notes)
• GS, K3$K, 2=4, WMPW, ‘Variation 8’, 8C+, 8D+, 8E+, 8F+, /, 4G+
• 2nd generation (Humdrum/kern, MuseData, Notelist, etc.)
– From 1970’s and 80’s: mini and early personal computers
– Added Performance domain
– Far more verbose: 1st 5 notes of Mozart in Notelist = c. 12 lines
27 Feb. 06
53
Music Notation: Attempts at Standard Encodings (2)
• Recent (3rd generation)
– NIFF, GUIDO, SMDL, MusicXML, MEI, etc.
– Almost all intended as interchange codes
– Some are extensible (GUIDO, MEI)
– Most based on XML
• Non-XML
– NIFF = Notation Interchange File Format
• Binary, not text—probably a mistake
• Mostly a flop: maybe because apparent complexity scared developers
– GUIDO
• text, but not XML
27 Feb. 06
54
GUIDO and Representational Adequacy
• Representational adequacy: simple things have simple representations
– Simple (really “encodings”!) => almost as compact as MUSTRAN
– Mozart first 5 notes: [ \clef<"treble"> \key<-3> \meter<type="2/4"> c+2/8 d e& f g/4 ]
– …but with other symbols: [ \clef<"treble"> \text<"Variation 8"> \key<-3>
\meter<type="2/4"> \intens<"mf"> c+2/8 d e& f g/4 ]
• Design Layers: Basic (Logical domain only), Advanced (adds
Graphic), Extended (adds Performance, user extensible; unfinished?)
• “GUIDO Music Notation Fmt” talk: www.salieri.org/guido/doc.html
• GUIDO is (was?) an academic project => free, open-source
• GUIDO NoteServer: http://www.noteserver.org/noteserver.html
• Other GUIDO tools: MIDI => GMN, GMN => gif (like NoteServer)
– Used in Variations2 to display themes
• GUIDOXML exists, but never much done with it
– GUIDO’s developers never liked XML, felt it accomplished nothing
• GUIDO was fairly popular, but seems to be dying out—too bad
– Almost everyone else likes XML: cause and effect?
27 Feb. 06
55
Music Notation: Attempts at Standard Encodings (3)
• XML-based (concept of markup language)
– SGML = Standard Generalized Markup Language
– “Application” of SGML for music
• SMDL = Standard Music Description Language: early & very powerful, but a flop
– XML = eXtensible Markup Language is hugely popular
– Applications of XML for music
• MusicXML is by far most popular; most verbose (5 notes of Mozart = 270 lines!)
• MEI also significant; others include MusiXML, MNML, NIFFML, etc. etc.
• Castan’s site www.music-notation.info lists programs importing &
exporting each encoding
– Gives an idea of which are most important/popular
– MusicXML is hands-down winner; next are GUIDO, NIFF, SCORE
27 Feb. 06
56
HTML, XML, and Markup Languages
• Basic limitation of HTML: too concrete
– Prescriptive, not descriptive
– Says what to show, but no way to say what “G minor” is (semantics)
– Limits reuse of information
• Solution: markup languages
– SGML = Standard Generalized Markup Language
– “Applications” of SGML
• HTML = HyperText Markup Language (Web pages)
• SMDL = Standard Music Description Language: early & very powerful, but a flop
– XML = eXtensible Markup Language is hugely popular
• Subset of/simpler than SGML; mostly replaced it (except for HTML)
• Stricter than SGML => HTML replaced by XHTML
• Application defined by DTD or schema (cf. MusicXML’s)
• Musical metadata examples: iTunes Music Library.xml, Variations2
– Can say “G minor” is a “key”, but still no explicit semantics
• Improvement: the Semantic Web (?): see www.w3.org
– XML-based; still no real semantics, but more explicit structure
5 Mar. 06
57
MusicXML
• MusicXML from Michael Good/Recordare, Inc.
– Cf. www.recordare.com
• Designed as a “practical interchange format”
• Now the de facto standard encoding of CMN
– Interfaces to MusicPad, karaoke, etc., as well as notation programs
• “Recordare editions” (songs) & examples on Web site
– http://www.recordare.com/xml/samples.html
• Heavily influenced by MuseData; also Humdrum/kern(?)
– => Conversion to/from can be very high-quality
– Dolet for Finale does MusicXML <=> Finale
– Dolet 1.0 also did MuseData (CCARH encoding)
• 1.0 emphasizes Logical; much Performance info too
• MusicXML 1.1 improves formatting (Graphical)
• Also has support for tablature
3 Mar. 06
58
MusicXML and Variations2
• Real-world situation: choice of encoding for Variations2
– In Byrd & Isaacson Requirements, MusicXML 1.0 implemented
all “Required” and most “Very Desirable”; so did MEI; others?
– Decision came to “MusicXML vs. Everything Else” (Jan. 2004)
– Also considered MEI (Perry Roland), GUIDO, Humdrum/kern,
NIFF
– Did MusicXML 1.0 “have serious enough drawbacks, considering
the whole situation and not just the representation itself, to make
anything else worth considering?”
• Most important: software support
• ...other programs and ours (planned an integrated notation engine)
• Other considerations: complexity, migration paths (future
conversion), flexibility/extensibility, etc.
– Conclusion: no.
1 Mar. 06
59
Declarative vs. Procedural Representations
• Representation in cognition: Hofstadter’s example
– Declarative: How many people live in Chicago?
– Procedural: How many chairs are there in your living room?
– Probably mixed: Recall of a melody
• Principle (and application to programming)
– Declarative: information is explicit (programming: what to do)
– Procedural: information is implicit (programming: how to do it)
– In programming languages, well-known ones are procedural
• All representations we’ve looked at are declarative
– Exception: possible to use Csound procedurally
– Very limited exception: repeat signs, D.C., D.S., etc.
• Procedural representations of music much more specialized
– Pla(?), Common Music, Stella, BP1(?)
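A minimal sketch, in C, of what "declarative" means for the encodings above: the first five notes of the Mozart example stored as an explicit table of events, with nothing computed. The NoteEvent struct, its field names, the EndTime helper, and the MIDI note numbers (72 is the C an octave above middle C) are assumptions made for illustration; they are not taken from any of the encodings discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record: every fact about a note is stated explicitly. */
typedef struct {
    double start;     /* onset time in quarter-note beats */
    double duration;  /* length in quarter-note beats */
    int    noteNum;   /* MIDI note number (60 = middle C) */
} NoteEvent;

/* C, D, E-flat, F as eighth notes, then G as a quarter note (assumed pitches). */
static const NoteEvent mozartOpening[] = {
    { 0.0, 0.5, 72 },
    { 0.5, 0.5, 74 },
    { 1.0, 0.5, 75 },
    { 1.5, 0.5, 77 },
    { 2.0, 1.0, 79 },
};

enum { MOZART_OPENING_COUNT = sizeof mozartOpening / sizeof mozartOpening[0] };

/* A consumer only reads the table; here, find the time at which the last note ends. */
double EndTime(const NoteEvent *events, size_t count)
{
    assert(events != NULL || count == 0);
    double end = 0.0;
    for (size_t i = 0; i < count; i++) {
        double noteEnd = events[i].start + events[i].duration;
        if (noteEnd > end)
            end = noteEnd;
    }
    return end;
}
```

Calling EndTime(mozartOpening, MOZART_OPENING_COUNT) reads the data without executing any musical "program". A procedural representation would instead generate these events by running code.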
14 Feb.
60
Declarative vs. Procedural Representations and
Variations2
• Wiggins et al. (1993) give three sorts of tasks:
1. recording: user wants record of musical object, to be retrieved later
2. analysis: user wants analyzed version, not “raw” musical object
3. generation/composition
• Variations2 concerned most with 1st, less with 2nd, least with
3rd
• Declarative representations (not procedural) are much more
appropriate for 1st and usually 2nd => considered only
declarative for Variations2
• Cf. Byrd & Isaacson Requirements
14 Feb.
61
Procedural Representation of a Simplified Version of
the Mozart Example
• In C-like syntax
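#include <stdio.h>

int scale[] = { 0, 2, 3, 5, 7, 8, 10 };   /* semitone offsets of the natural minor scale */

/* Stub: a real version would send the note to a synthesizer. */
void PlayNote(float time, int noteNum, float duration)
{
    printf("time %.1f: note %d for %.1f beats\n", time, noteNum, duration);
}

/* Play four ascending scale steps starting at startNoteNum. */
void PlayTetrachord(float startTime, int startNoteNum, float duration)
{
    float time = startTime;
    for (int i = 0; i < 4; i++) {
        PlayNote(time, startNoteNum + scale[i], duration);
        time += duration;
    }
}

int main(void)
{
    PlayTetrachord(0, 72, .5);
    PlayNote(2, 72 + scale[4], 1);
    PlayNote(3, 72 + scale[4], 2);
    PlayTetrachord(4, 53, .5);
    return 0;
}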
rev. 15 Feb.
62
Music Collections: Available or Not
• Cf. “Candidate Music IR Test Collections” list
– http://mypage.iu.edu/~donbyrd/MusicTestCollections.HTML
• How much music content is available in digital form?
– In audio form, lots and lots
• At cost: iTunes, Rhapsody, Napster, etc. (millions of tracks)
• “Free”: Classical Music Archives, etc. (tens of 1000s of tracks)
– In event (mostly MIDI file) form, far less
• At cost: ??
• “Free”: Classical Music Archives, etc. (tens of 1000s of tracks)
– In notation (CMN and tablature) form, similar to event form
• At cost: Sunhawk, MusicNotes, etc. (tens of 1000s of movements)
• Free: CCARH (1000s of mvmts), “tab” sites (tens of 1000s)
• Even counting what’s not publicly available, not much exists in notation form; why?
– No standard encoding (until now) => must usually convert
– Much smaller audience for notation than audio
2 Mar. 06
63
Encoding & Representation Conversion (1)
• Converting between specific representations should be easier than
converting between basic representations
– Example: CMN => tablature or vice-versa
• Converting between encodings of the same representation should be
easier than converting between specific representations
– Example: MusicXML <=> Finale; MP3 <=> .wav
• Generally true, but:
– Any encoding of audio to any other is relatively simple
– Likewise for time-stamped events
– Notation is by far most difficult: with complex music, big problem
• Examples
– Nightingale to MIDI File, back to Nightingale
– Finale equivalents with MusicXML and MIDI File
3 Mar. 06
64
Encoding & Representation Conversion (2)
• Nightingale to Notelist, back to Nightingale
– Loses text font/position, articulation mark and slur position,
instrument names, etc.
• Nightingale to MIDI File, back to Nightingale
– Totally loses text, articulation marks and slurs, etc.—but keeps
instrument names!
– Distorts logical note duration (uses approx. performance duration);
loses note spelling, clefs, dynamic marks.
• Finale equivalents with MusicXML and MIDI File
– Finale/MusicXML probably better than Nightingale/Notelist
– Finale/MIDI can be better than Nightingale/MIDI—but why?
• Both representations and programs lose/distort information!
• Murphy’s Law(?): “Anything that can go wrong, will.”
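A minimal sketch, in C, of why a MIDI round trip can distort logical note duration: the file holds only a performance duration, so the importing program has to guess the notated value. The resolution of 480 ticks per quarter note, the table of candidate durations, and the nearest-value rule are all assumptions for illustration; this is not Nightingale's or Finale's actual algorithm.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed resolution: 480 ticks per quarter note. */
enum { TICKS_PER_QUARTER = 480 };

/* Candidate logical durations in ticks: whole, half, quarter, eighth, sixteenth. */
static const int logicalTicks[] = {
    4 * TICKS_PER_QUARTER,
    2 * TICKS_PER_QUARTER,
    TICKS_PER_QUARTER,
    TICKS_PER_QUARTER / 2,
    TICKS_PER_QUARTER / 4,
};
enum { LOGICAL_COUNT = sizeof logicalTicks / sizeof logicalTicks[0] };

/* Guess a logical duration by picking the candidate nearest to the
   performed duration. On a tie, the longer candidate wins. */
int GuessLogicalTicks(int performedTicks)
{
    assert(performedTicks > 0);
    int best = logicalTicks[0];
    for (int i = 1; i < LOGICAL_COUNT; i++) {
        if (abs(performedTicks - logicalTicks[i]) < abs(performedTicks - best))
            best = logicalTicks[i];
    }
    return best;
}
```

Under these assumptions a quarter note played nearly full length (456 ticks) comes back as a quarter, but a staccato quarter played for about half its value (250 ticks) comes back as an eighth note, so the logical duration is distorted.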
rev. 22 Feb.
65
Encoding Conversion Demo (1)
Nightingale: Open Webern example
[Figure: score panels labeled ORIGINAL and NOTELIST]
20 Feb.
66
Encoding Conversion Demo (2)
Nightingale: Open Webern, Save Notelist, Open Notelist
[Figure: score panels labeled NOTELIST and MIDI]
20 Feb.
67
Representation Conversion Demo
Nightingale: Open Webern, Export MIDI File, Import MIDI File
[Figure: score panel labeled MIDI]
20 Feb.
68
Encoding & Representation Conversion
• Why is information distorted/lost?
• Two steps in getting from one program to another
• 1. Export: notation program converts its internal form to the format it’s
saving
– If the format it’s saving can’t represent some information, it can’t put that
information into the file; but even if format can, program might not do it
correctly—or might not try.
• 2. Import: notation program converts format it’s opening into its
internal form
– Even if some kind of information is in the file and program can handle it,
it might not do it correctly—or might not try.
• What information is distorted/lost because of limits of encoding vs.
limits of programs?
• Can tell only by looking at the encoding, and that may not be easy
• Example: notation program => MIDI file => same notation program
– MIDI files can handle text, but both Nightingale & (default) Finale lose it!
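A minimal sketch, in C, of the two steps for one kind of information, note spelling. On export the format is the limit (a MIDI note number cannot say whether a pitch is E-flat or D-sharp); on import the program is the limit (it must pick some spelling). The SpelledPitch struct, both function names, and the "always prefer sharps" rule are hypothetical and do not describe any particular notation program.

```c
#include <assert.h>

/* Hypothetical notation-level pitch: letter name, accidental, and octave. */
typedef struct {
    char letter;      /* 'A'..'G' */
    int  accidental;  /* -1 = flat, 0 = natural, +1 = sharp */
    int  octave;      /* 4 = the octave starting at middle C */
} SpelledPitch;

/* Export: a MIDI note number has no room for spelling, so it is dropped. */
int ExportToMidi(SpelledPitch p)
{
    static const int semitone[] = { 9, 11, 0, 2, 4, 5, 7 };  /* A B C D E F G */
    assert(p.letter >= 'A' && p.letter <= 'G');
    return 12 * (p.octave + 1) + semitone[p.letter - 'A'] + p.accidental;
}

/* Import: the program must guess a spelling; this one always prefers sharps. */
SpelledPitch ImportFromMidi(int noteNum)
{
    static const char letters[] = "CCDDEFFGGAAB";
    static const int  accs[]    = { 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0 };
    assert(noteNum >= 0 && noteNum <= 127);
    SpelledPitch p = { letters[noteNum % 12], accs[noteNum % 12], noteNum / 12 - 1 };
    return p;
}
```

Exporting E-flat in octave 5 gives note number 75, and importing 75 gives D-sharp in octave 5: the sound is the same, but the notation has changed. Only inspecting the intermediate file shows which step lost the information.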
3 Mar. 06
69
Converting to Notation from Audio or “Non-music”
• Audio Music Recognition (AMR)
– Monophonic (not too hard) vs. polyphonic (very hard)
– Programs available since ca. 2000, but little visible progress
– Still subject of much serious research
– Not “ready for prime time” for anything but monophonic
• Optical Music Recognition (OMR)
– Landmark report by Selfridge-Field et al (1994)
– Programs available since ca. 1994; significant progress
– Table of programs available on my Web site
– Still subject of some serious research, e.g., Byrd & Schindele
– Far more promising than AMR in near future
• Chris Raphael (1999): “OMR is orders of magnitude easier”
6 Mar. 06
70
OMR: The State of the Art (1)
• Byrd & Schindele (2006), Prospects for Improving OMR
with Multiple Recognizers, proposes multiple-recognizer OMR (MROMR)
– Recognition = segmentation + classification
– Programs studied: PhotoScore, SharpEye, SmartScore
– Hand error count (with “feel”, some automatic) on only a small amount of music
– Cf. Craig Sapp’s error analysis of Mozart page with SharpEye
• http://craig.sapp.org/omr/sharpeye/mozson13/
– Main conclusions
• MROMR development requires much music => automatic
comparison of results
• Will “always” need access to specialized editing UI to correct errors
– Probably not hard to get MROMR output back into SharpEye
• SharpEye is most accurate, but error rates still too high
• MROMR is worth trying
• Evaluation is a major roadblock
– Difficult to evaluate: see Droettboom & Fujinaga, etc.
– Not even any standard test database(s)
– But prospects for improvement are good
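A minimal sketch, in C, of one simple way several recognizers' outputs could be combined: a strict majority vote on the classification of a single symbol. It assumes segmentation has already been done and the recognizers' symbols have been aligned, which is itself a hard problem. The SymbolClass values and MajorityVote are hypothetical, and voting is shown only as an illustration, not as the method of Byrd & Schindele.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classes a recognizer might report for one segmented symbol. */
typedef enum {
    SYM_UNKNOWN,
    SYM_QUARTER_NOTE,
    SYM_EIGHTH_NOTE,
    SYM_QUARTER_REST
} SymbolClass;

/* Return the class reported by a strict majority of recognizers, or
   SYM_UNKNOWN if there is none, which flags the symbol for hand correction. */
SymbolClass MajorityVote(const SymbolClass *votes, size_t count)
{
    assert(votes != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        size_t agree = 0;
        for (size_t j = 0; j < count; j++) {
            if (votes[j] == votes[i])
                agree++;
        }
        if (2 * agree > count)
            return votes[i];
    }
    return SYM_UNKNOWN;
}
```

With three recognizers, two agreeing votes decide the symbol; three different answers yield SYM_UNKNOWN, which is where a specialized editing UI would still be needed.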
8 Mar. 06
71
OMR: The State of the Art (2)
• Practical now for much music, but nowhere near all
• Bill Clemmons’ 1-to-10 scale of music suitability for OMR
(2004)
– “SmartScore works amazingly well for some things and so badly
with others that creating a file from scratch is less time consuming”
– Has students scan standard SATB choral octavo, parts &
accompaniment (1), & piano reduction of Parsifal prelude (10)
– Choral octavo is so clean, can get accurate file after only a handful
of corrections; Parsifal import is essentially unusable
• Lots of already-scanned images: in Variations2, CD Sheet
Music, on the Web
5 Mar. 06
72