Files
wavinfo/data/share/man/man7/wavinfo.7
Jamie Hardt c002120c61 gq gq gq
2023-11-08 15:42:59 -08:00

349 lines
14 KiB
Groff

.TH waveinfo 7 "2023-11-08" "Jamie Hardt" "Miscellaneous Information Manuals"
.SH NAME
wavinfo \- everything you ever wanted to know about WAVE metadata but were
afraid to ask
.SH DESCRIPTION
.PP
The WAVE file format is forwards-compatible. Apart from audio data, it can
hold arbitrary blocks of bytes which clients will automatically ignore
unless they recognize them and know how to read them.
.PP
Without saying too much about the structure and parsing of WAVE files
themselves, a subject beyond the scope of this document, WAVE files are
divided into segments or
.BR chunks ,
which a client parser can either read or skip without reading. Chunks have
an identifier, or signature: a four-character-code which informs a client
what kind of chunk it is, and a length. Based on this information, a client
can look at the identifier and decide if it knows how to read that chunk or
if it wants to. If it doesn't, it can simply read the length and skip
past it.
.PP
Some chunks are mandated by the Microsoft standard, specifically
.I fmt
and
.I data
in the case of PCM-encoded WAVE files. Other chunks, like
.I cue
or
.I bext
are optional, and hold metadata.
.PP
Chunks can also nest inside other chunks, a special identifier
.I LIST
is used to indicate these. A WAVE file is a recursive list: a top level
list of chunks, where chunks may contain a list of chunks themselves.
.SS Order of Metadata Chunks in a WAVE File
.PP
Chunks in a WAVE file can appear in any order, and a capable parser can
accept them appearing in any order, however authorities give guidance on
where chunks should be placed, when creating a new WAVE file.
.PP
.IP 1)
For all new WAVE files, clients should always place an empty chunk, a
so-called
.I JUNK
chunk, in the first position in the top-level list of a WAVE file, and
it should be sized large enough to hold a
.I ds64
chunk record. This will allow clients to upgrade the file to a RF64
WAVE file
.BR in-place ,
without having to re-write the file or audio data.
.IP 2)
Older authorites recommend placing metadata before the audio data, so
clients reading the file sequentially will hit it before having to seek through
the audio. This may have improve metadata read performance on certain
architecures.
.IP 3)
Older authorities also recommend inserting
.I JUNK
before the
.I data
chunk, sized in such a way so that the first byte of the
.I data
payload lands immediately at 0x1000 (4096), because this was a common
factor of the page boundaries of many operating systems and architectures. This
may optimize the audio I/O performance in certain situations.
.IP 4)
Modern implemenations (we're looking at
.B Pro Tools
here) tend to place the Broadcast-WAVE
.I bext
metadata before the data, followed by the data itself, and then other data
after that.
.PP
Clients reading WAVE files should be tolerant and accept any configuration of
chunks, and should accept any file as long as the obligatory
.I fmt
and
.I data
chunks
are present.
.PP
It's not unheard-of to see a naive implementor expect
.B only
these chunks, in this order, and to hard-code the offsets of the short
.I fmt
chunk and
.I data
chunk into their program, and this is something that should always be checked
when evaluating a new tool, just to make sure the developer didn't do this.
Many coding examples and WAVE file explainers from the 90s and early aughts
give the basic layout of a WAVE file and naive devs go along with it.
.SS Encoding and Decoding Text Metadata
.PP
Modern metadata systems, anything developed since the late aughts, will defer
encoding to an XML parser so when dealing with
.I ixml
or
.I axml
so a client can mostly ignore this problem.
.PP
The most established metadata systems are older than this though, and so the
entire weight of text encoding history falls upon the client.
.PP
The original WAVE specification, a part of the Microsoft/IBM Multimedia
interface of 1991, was written at a time when Windows was an ascendant and
soon-to-be dominant desktop environment. Audio files were almost
never shared via LANs or the Internet or any other way. When audio files were
shared, among the miniscule number of people who did this, it was via BBS or
usenet. Users at this time may have ripped them from CDs, but the cost of hard
drives and low quality of compressed formats at the time made this little more
than a curiosity. There was no
.I CDBaby or
.I CDDB
to download and populate metadata from at this time.
.PP
So, the
.I INFO
and
.I cue
metadata systems, which are by far the most prevalent and supported, were
published two years before the so-called "Endless September" of 1993 when the
Internet became mainstream, when Unicode was still a twinkle in the eye, and
two years before Ariana Grande was born.
.PP
The safest assumption, and the mandate of the Microsoft, is that all text
metadata, by default, be encoded in Windows codepage 819, a.k.a. ISO Latin
alphabet 1, or ISO 8859-1. This covers most Western European scripts but
excludes all of Asia, Russia and most of the European Near East, the Middle
East.
.SH CHUNK MENAGERIE
A list of chunks that you may find in a wave file from our experience.
.SS Essential WAV Chunks
.IP fmt
Defines the format of the audio in the
.I data
chunk: the audio codec, the sample rate, bit depth, channel count, block
alignment and other data. May take an "extended" form, with additional data
(such as channel speaker assignments) if there are more than two channels in
the file or if it is a compressed format.
.IP data
The audio data itself. PCM audio data is always stored as interleaved samples.
.SS Optional WAVE Chunks
.IP JUNK
A region of the file not currently in use. Clients sometimes add these before
the
.I data
chunk in order to align the beginning of the audio data with a memory page
boundary (this can make memory-mapped reads from a wave file a little more
efficient). A
.I JUNK
chunk is often placed at the beginning of a WAVE file to reserve space for
a
.I ds64
chunk that will be written to the file at the end of recording, in the event
that after the file is finalized, it exceeds the RIFF size limit. Thus a WAVE
file can be upgraded in-place to an RF64 without re-writing the audio data.
.IP fact
Fact chunks record the number of samples in the decoded audio stream. It's only
present in WAVE files that contain compressed audio.
.IP "LIST or list"
(Both have been seen) Not a chunk type itself but signals to a RIFF parser that
this chunk contains chunks itself. A LIST chunk's payload will begin with a
four-character code identifying the form of the list, and is then followed
by chunks of the standard key-length-data form, which may themselves be
LISTs that themselves contain child chunks. WAVE files don't tend to have a
very deep heirarchy of chunks, compared to AVI files.
.SS RIFF Metadata
The RIFF container format has a metadata system common to all RIFF files, WAVE
being the most common at present, AVI being another very common format
historically.
.IP "LIST form INFO"
A flat list of chunks, each containing text metadata. The role
of the string, like "Artist", "Composer", "Comment", "Engineer" etc. are given
by the four-character code: "Artist" is
.IR IART ,
Composer is
.IR ICMP ,
engineer is
.IR IENG ,
Comment is
.IR ICMT ,
etc.
.IP cue
A binary list of cues, which are timed points within the audio data.
.IP "LIST form adtl"
Contains text labels
.RI ( labl )
for the cues in the
.I cue
chunk, "notes"
.RI ( note ,
which are structurally identical to
.I labl
but hosts tend to use notes for longer text), and "length text"
.I ltxt
metadata records, which can give a cue a length, making it a range, and a text
field that defines its own encoding.
.IP cset
Defines the character set for all text fields in
.IR INFO ,
.I adtl
and other RIFF-defined text fields. By default, all of the text in RIFF
metadata fields is Windows Latin 1/ISO 8859-1, though as time passes many
clients have simply taken to sticking UTF-8 into these fields. The
.I cset
cannot represent UTF-8 as a valid option for text encoding, it only speaks
Windows codepages, and we've never seen one in a WAVE file in any event, and
it's unlikely an audio app would recognize one if it saw it.
.SS Broadcast-WAVE Metadata
Broadcast-WAVE is a set of extensions to WAVE files to facilitate media
production maintained by the EBU.
.IP bext
A multi-field structure containing mostly fixed-width text data capturing
essential production information: a 256 character free description field,
originator name and a unique reference, recording date and time, a frame-based
timestamp for sample-accurate recording time, and a coding history record. The
extended form of the structure can hold a SMPTE UMID (a kind of UUID, which
may also contain timestamp and geolocation data) and pre-computed program
loudness measurements.
.IP peak
A binary data structure containing the peak envelope for the audio data, for
use by clients to generate a waveform overview.
.SS Audio Definition Model Metadata
Audio Definition Model (ADM) metadata is a metadata standard for audio
broadcast and distribution maintained by the ITU.
.IP chna
A binary list that associates individual channels in the file to entities
in the ADM XML document stored in the
.I axml
chunk. A
.I chna
chunk will always appear with an
.I axml
chunk and vice versa.
.IP axml
Contains an XML document with Audio Definition Model metadata. ADM metadata
describes the program the WAVE file belongs to, role, channel assignment,
and encoding properties of individual channels in the WAVE file, and if the
WAVE file contains object-based audio, it will also give all of the positioning
and panning automation envelopes.
.IP bxml
This is defined by the ITU as a gzip-compressed version of the
.I axml
chunk.
.IP sxml
This is a hybrid binary/gzip-compressed-XML chunk that associates ADM
documents with timed ranges of a WAVE file.
.SS Dolby Metadata
Dolby metadata is present in Dolby Atmos master ADM WAVE files.
.IP dbmd
Records hints for Dolby playback applications for downmixing, level
normalization and other things.
.SS Proprietary Chunks
.IP ovwf
.B (Pro Tools)
Pre-computed waveform overview data.
.IP regn
.B (Pro Tools)
Region and cue point metadata.
.SS Chunks of Unknown Purpose
.IP elm1
.IP minf
.IP umid
.SH REFERENCES
(Note: We're not including URLs in this list, the title and standard number
should be sufficient to find almost all of these documents. The ITU, EBU and
IETF standards documents are freely-available.)
.SS Essential File Format
.TP
.B Multimedia Programming Interface and Data Specifications 1.0. Microsoft Corporation, 1991.
The original definition of the
.I RIFF
container, the
.I WAVE
form, the original metadata facilites (like
.IR INFO " and " cue ),
and things like language, country and
dialect enumerations. This document also contains descriptions of certain
variations on the WAVE, such as
.I LIST wavl
and compressed WAVE files that are so rare in practice as to be virtually
non-existent.
.TP
.B ITU Recommendation BS.2088-1-2019 \- Long-form file format for the international exchange of audio programme mterials with metadata. ITU 2019.
Formalized the RF64 file format, ADM carrier chunks like
.IR axml
and
.IR chna .
Formally supercedes the previous standard for RF64,
.BR "EBU 3306 v1" .
One oddity with this standard is it defines the file header for an extended
WAVE file to be
.IR BW64 ,
but this is never seen in practice.
.TP
.B RFC 2361 \- WAVE and AVI Codec Registries. IETF Network Working Group, 1998.
Gives an exhaustive list of all of the codecs that Microsoft had assigned to
vendor WAVE files as of 1998. At the time, numerous hardware vendors, sound
card and chip manufacturers, sound software developers and others all provided
their own slightly-different adaptive PCM codecs, linear predictive compression
codes, DCTs and other things, and Microsoft would issue these vendors WAVE
codec magic numbers. Almost all of these are no longer in use, the only ones
one ever encounters in the modern era are integer PCM (0x01), floating-point
PCM (0x03) and the extended format marker (0xFFFFFFFF). There are over a
hundred codecs assigned, however, a roll-call of failed software and hardware
brands.
.SS Broadcast WAVE Format
.TP
.B EBU Tech 3285 \- Specification of the Broadcast Wave Format (BWF). EBU, 2011.
Defines the elements of a Broadcast WAVE file, the
.I bext
metadata chunk structure, allowed sample formats and other things. Over the
years the EBU has published numerous supplements covering extensions to the
format, such as embedding SMPTE UMIDs, pre-calculated loudness data (EBU Tech
3285 v2),
.I peak
waveform overview data (Suppl. 3), ADM metadata (Suppl. 5 and 7), Dolby master
metadata (Suppl. 6), and other things.
.TP
.B SMPTE 330M-2011 \- Unique Material Identifier. SMPTE, 2011.
Describes the format of the SMPTE UMID field, a 32- or 64-byte UUID used to
identify media files. UMIDs are usually a dumb number in their 32-byte form,
but the extended form can encode a high-precision timestamp (with options for
epoch and timescale) and geolocation information. Broadcast-WAVE files
conforming to
.B "EBU 3285 v2"
have a SMPTE UMID embedded in the
.I bext
chunk.
.SS Audio Definition Model
.TP
.B ITU Recommendation BS.2076-2-2019 \- Audio definition model. ITU, 2019.
Defines the Audio Definition Model, entities, relationships and properties. If
you ever had any questions about how ADM works, this is where you would start.
.SS iXML Metadata
.TP
.B iXML Specification v3.01. Gallery Software, 2021.
iXML is a standard for embedding mostly human-created metadata into WAVE files,
and mostly with an emphasis on location sound recorders used on film and
television productions. Frustratingly the developer has never published a DTD
or schema validation or strict formal standard, and encourages vendors to just
do whatever, but most of the heavily-traveled metadata fields are standardized,
for recording information like a recording's scene, take, recording notes,
circled or alt status. iXML also has a system of
.B "families"
for associating several WAVE files together into one recording.