mirror of
https://github.com/iluvcapra/wavinfo.git
synced 2025-12-31 17:00:41 +00:00
381 lines
15 KiB
Groff
381 lines
15 KiB
Groff
.TH waveinfo 7 "2023-11-08" "Jamie Hardt" "Miscellaneous Information Manuals"
|
|
.SH NAME
|
|
wavinfo \- WAVE file metadata
|
|
.SH SYNOPSIS
|
|
Everything you ever wated to know about WAVE metadata but were afraid to ask.
|
|
.SH DESCRIPTION
|
|
.PP
|
|
The WAVE file format is forwards-compatible. Apart from audio data, it can
|
|
hold arbitrary blocks of bytes which clients will automatically ignore
|
|
unless they recognize them and know how to read them.
|
|
.PP
|
|
Without saying too much about the structure and parsing of WAVE files
|
|
themselves \- a subject beyond the scope of this document \- WAVE files are
|
|
divided into segments or
|
|
.BR chunks ,
|
|
which a client parser can either read or skip without reading. Chunks have
|
|
an identifier, or signature: a four-character-code that tells a client what
|
|
kind of chunk it is, and a length. Based on this information, a client can look
|
|
at the identifier and decide if it knows how to read a chunk and if it wants
|
|
to. If it doesn't, it can simply read the length and skip past it.
|
|
.PP
|
|
Some chunks are mandated by the Microsoft standard, specifically
|
|
.I fmt
|
|
and
|
|
.I data
|
|
in the case of PCM-encoded WAVE files. Other chunks, like
|
|
.I cue
|
|
or
|
|
.IR bext ,
|
|
are optional, and optional chunks usually hold metadata.
|
|
.PP
|
|
Chunks can also nest inside other chunks, a special identifier
|
|
.I LIST
|
|
is used to indicate these. A WAVE file is a recursive list: a top level
|
|
list of chunks, where chunks may contain a list of chunks themselves.
|
|
.SS Order and Arrangement of Metadata Chunks in a WAVE File
|
|
.PP
|
|
Chunks in a WAVE file can appear in any order, and a capable parser can accept
|
|
them appearing in any order. However, authorities give guidance on where chunks
|
|
should be placed when creating a new WAVE file.
|
|
.PP
|
|
.IP 1)
|
|
For all new WAVE files, clients should always place an empty chunk, a
|
|
so-called
|
|
.I JUNK
|
|
chunk, in the first position in the top-level list of a WAVE file, and
|
|
it should be sized large enough to hold a
|
|
.I ds64
|
|
chunk record. This will allow clients to upgrade the file to a RF64
|
|
WAVE file
|
|
.BR in-place ,
|
|
without having to re-write the file or audio data.
|
|
.IP 2)
|
|
Older authorites recommend placing metadata before the audio data, so clients
|
|
reading the file sequentially will hit it before having to seek through the
|
|
audio. This may improve metadata read performance on certain architectures.
|
|
.IP 3)
|
|
Older authorities also recommend inserting
|
|
.I JUNK
|
|
before the
|
|
.I data
|
|
chunk, sized so that the first byte of the
|
|
.I data
|
|
payload lands immediately at 0x1000 (4096), because this was a common factor of
|
|
the page boundaries of many operating systems and architectures. This may
|
|
optimize the audio I/O performance in certain situations.
|
|
.IP 4)
|
|
Modern implementations (we're looking at
|
|
.B Pro Tools
|
|
here) tend to place the Broadcast-WAVE
|
|
.I bext
|
|
metadata before the data, followed by the data itself, and then other data
|
|
after that.
|
|
.\" .PP
|
|
.\" Clients reading WAVE files should be tolerant and accept any configuration of
|
|
.\" chunks, and should accept any file as long as the obligatory
|
|
.\" .I fmt
|
|
.\" and
|
|
.\" .I data
|
|
.\" chunks
|
|
.\" are present.
|
|
.PP
|
|
It's not unheard-of to see a naive implementor expect
|
|
.B only
|
|
.I fmt
|
|
and
|
|
.I data
|
|
chunks, in this order, and to hard-code the offsets of the short
|
|
.I fmt
|
|
chunk and
|
|
.I data
|
|
chunk into their program, and this is something that should always be checked
|
|
when evaluating a new tool, just to make sure the developer didn't do this.
|
|
Many coding examples and WAVE file explainers from the 90s and early aughts
|
|
give the basic layout of a WAVE file, and naive devs go along with it.
|
|
.SS Encoding and Decoding Text Metadata
|
|
.\" .PP
|
|
.\" Modern metadata systems, anything developed since the late aughts, will defer
|
|
.\" encoding to an XML parser, so when dealing with
|
|
.\" .I ixml
|
|
.\" or
|
|
.\" .I axml
|
|
.\" so a client can mostly ignore this problem.
|
|
.\" .PP
|
|
.\" The most established metadata systems are older than this though, and so the
|
|
.\" entire weight of text encoding history falls upon the client.
|
|
.\" .PP
|
|
.\" The original WAVE specification, a part of the Microsoft/IBM Multimedia
|
|
.\" interface of 1991, was written at a time when Windows was an ascendant and
|
|
.\" soon-to-be dominant desktop environment. Audio files were almost
|
|
.\" never shared via LANs or the Internet or any other way. When audio files were
|
|
.\" shared, among the miniscule number of people who did this, it was via BBS or
|
|
.\" Usenet. Users at this time may have ripped them from CDs, but the cost of hard
|
|
.\" drives and low quality of compressed formats at the time made this little more
|
|
.\" than a curiosity. There was no CDBaby or CDDB to download and populate metadata
|
|
.\" from at this time.
|
|
.\" .PP
|
|
.\" So, the
|
|
.\" .I INFO
|
|
.\" and
|
|
.\" .I cue
|
|
.\" metadata systems, which are by far the most prevalent and supported, were
|
|
.\" published two years before the so-called "Endless September" of 1993 when the
|
|
.\" Internet became mainstream, when Unicode was still a twinkle in the eye, and
|
|
.\" two years before Ariana Grande was born.
|
|
.PP
|
|
The safest assumption, and the mandate of the Microsoft, is that all text
|
|
metadata, by default, be encoded in Windows codepage 819, a.k.a. ISO Latin
|
|
alphabet 1, or ISO 8859-1. This covers most Western European scripts but
|
|
excludes all of Asia, Russia, most of the European Near East, the Middle
|
|
East.
|
|
.PP
|
|
To account for this, Microsoft proposed a few conventions, none of which have
|
|
been adopted with any consistency among clients of the WAVE file standard.
|
|
.IP 1)
|
|
The RIFF standard defines a
|
|
.I cset
|
|
chunk which declares a Windows codepage for character encoding, along with a
|
|
native country code, language and dialect, which clients should use for
|
|
determining text information. We have never seen a WAVE
|
|
file with a
|
|
.I cest
|
|
chunk.
|
|
.IP 2)
|
|
Certain RIFF chunks allow the writing client to override the default encoding.
|
|
Relevant to audio files are the
|
|
.I ltxt
|
|
chunk, which encodes a country, language, dialect and codepage along with a
|
|
time range text note. We have never seen the text field on one of these
|
|
filled-out either.
|
|
.PP
|
|
Some clients, in our experience, simply write UTF-8 into
|
|
.IR cue ,
|
|
.IR labl ,
|
|
and
|
|
.I note
|
|
fields without any kind of framing.
|
|
.PP
|
|
A practical solution is to assume either ISO Latin 1, Windows CP 859 or Windows
|
|
CP 1252, and allow the client or user to override this based on its own
|
|
inferences. The
|
|
.I chardet
|
|
python package may provide useable guesses for text encoding, YMMV.
|
|
.SH CHUNK MENAGERIE
|
|
A list of chunks that you may find in a wave file from our experience.
|
|
.SS Essential WAV Chunks
|
|
.IP fmt
|
|
Defines the format of the audio in the
|
|
.I data
|
|
chunk: the audio codec, the sample rate, bit depth, channel count, block
|
|
alignment and other data. May take an "extended" form, with additional data
|
|
(such as channel speaker assignments) if there are more than two channels in
|
|
the file or if it is a compressed format.
|
|
.IP data
|
|
The audio data itself. PCM audio data is always stored as interleaved samples.
|
|
.SS Optional WAVE Chunks
|
|
.IP JUNK
|
|
A region of the file not currently in use. Clients sometimes add these before
|
|
the
|
|
.I data
|
|
chunk in order to align the beginning of the audio data with a memory page
|
|
boundary (this can make memory-mapped reads from a wave file a little more
|
|
efficient). A
|
|
.I JUNK
|
|
chunk is often placed at the beginning of a WAVE file to reserve space for
|
|
a
|
|
.I ds64
|
|
chunk that will be written to the file at the end of recording, in the event
|
|
that after the file is finalized, it exceeds the RIFF size limit. Thus a WAVE
|
|
file can be upgraded in-place to an RF64 without re-writing the audio data.
|
|
.IP fact
|
|
Fact chunks record the number of samples in the decoded audio stream. It's only
|
|
present in WAVE files that contain compressed audio.
|
|
.IP "LIST or list"
|
|
(Both have been seen) Not a chunk type itself but signals to a RIFF parser that
|
|
this chunk contains chunks itself. A LIST chunk's payload will begin with a
|
|
four-character code identifying the form of the list, and is then followed
|
|
by chunks of the standard key-length-data form, which may themselves be
|
|
LISTs that themselves contain child chunks. WAVE files don't tend to have a
|
|
very deep heirarchy of chunks, compared to AVI files.
|
|
.SS RIFF Metadata
|
|
The RIFF container format has a metadata system common to all RIFF files, WAVE
|
|
being the most common at present, AVI being another very common format
|
|
historically.
|
|
.IP "LIST form INFO"
|
|
A flat list of chunks, each containing text metadata. The role
|
|
of the string, like "Artist", "Composer", "Comment", "Engineer" etc. are given
|
|
by the four-character code: "Artist" is
|
|
.IR IART ,
|
|
Composer is
|
|
.IR ICMP ,
|
|
engineer is
|
|
.IR IENG ,
|
|
Comment is
|
|
.IR ICMT ,
|
|
etc.
|
|
.IP cue
|
|
A binary list of cues, which are timed points within the audio data.
|
|
.IP "LIST form adtl"
|
|
Contains text labels
|
|
.RI ( labl )
|
|
for the cues in the
|
|
.I cue
|
|
chunk, "notes"
|
|
.RI ( note ,
|
|
which are structurally identical to
|
|
.I labl
|
|
but hosts tend to use notes for longer text), and "length text"
|
|
.I ltxt
|
|
metadata records, which can give a cue a length, making it a range, and a text
|
|
field that defines its own encoding.
|
|
.IP cset
|
|
Defines the character set for all text fields in
|
|
.IR INFO ,
|
|
.I adtl
|
|
and other RIFF-defined text fields. By default, all of the text in RIFF
|
|
metadata fields is Windows Latin 1/ISO 8859-1, though as time passes many
|
|
clients have simply taken to sticking UTF-8 into these fields. The
|
|
.I cset
|
|
cannot represent UTF-8 as a valid option for text encoding, it only speaks
|
|
Windows codepages, and we've never seen one in a WAVE file in any event, and
|
|
it's unlikely an audio app would recognize one if it saw it.
|
|
.SS Broadcast-WAVE Metadata
|
|
Broadcast-WAVE is a set of extensions to WAVE files to facilitate media
|
|
production maintained by the EBU.
|
|
.IP bext
|
|
A multi-field structure containing mostly fixed-width text data capturing
|
|
essential production information: a 256 character free description field,
|
|
originator name and a unique reference, recording date and time, a frame-based
|
|
timestamp for sample-accurate recording time, and a coding history record. The
|
|
extended form of the structure can hold a SMPTE UMID (a kind of UUID, which
|
|
may also contain timestamp and geolocation data) and pre-computed program
|
|
loudness measurements.
|
|
.IP peak
|
|
A binary data structure containing the peak envelope for the audio data, for
|
|
use by clients to generate a waveform overview.
|
|
.SS Audio Definition Model Metadata
|
|
Audio Definition Model (ADM) metadata is a metadata standard for audio
|
|
broadcast and distribution maintained by the ITU.
|
|
.IP chna
|
|
A binary list that associates individual channels in the file to entities
|
|
in the ADM XML document stored in the
|
|
.I axml
|
|
chunk. A
|
|
.I chna
|
|
chunk will always appear with an
|
|
.I axml
|
|
chunk and vice versa.
|
|
.IP axml
|
|
Contains an XML document with Audio Definition Model metadata. ADM metadata
|
|
describes the program the WAVE file belongs to, role, channel assignment,
|
|
and encoding properties of individual channels in the WAVE file, and if the
|
|
WAVE file contains object-based audio, it will also give all of the positioning
|
|
and panning automation envelopes.
|
|
.IP bxml
|
|
This is defined by the ITU as a gzip-compressed version of the
|
|
.I axml
|
|
chunk.
|
|
.IP sxml
|
|
This is a hybrid binary/gzip-compressed-XML chunk that associates ADM
|
|
documents with timed ranges of a WAVE file.
|
|
.SS Dolby Metadata
|
|
Dolby metadata is present in Dolby Atmos master ADM WAVE files.
|
|
.IP dbmd
|
|
Records hints for Dolby playback applications for downmixing, level
|
|
normalization and other things.
|
|
.SS Proprietary Chunks
|
|
.IP ovwf
|
|
.B (Pro Tools)
|
|
Pre-computed waveform overview data.
|
|
.IP regn
|
|
.B (Pro Tools)
|
|
Region and cue point metadata.
|
|
.SS Chunks of Unknown Purpose
|
|
.IP elm1
|
|
.IP minf
|
|
.IP umid
|
|
.SH REFERENCES
|
|
(Note: We're not including URLs in this list, the title and standard number
|
|
should be sufficient to find almost all of these documents. The ITU, EBU and
|
|
IETF standards documents are freely-available.)
|
|
.SS Essential File Format
|
|
.TP
|
|
.B Multimedia Programming Interface and Data Specifications 1.0. Microsoft Corporation, 1991.
|
|
The original definition of the
|
|
.I RIFF
|
|
container, the
|
|
.I WAVE
|
|
form, the original metadata facilites (like
|
|
.IR INFO " and " cue ),
|
|
and things like language, country and
|
|
dialect enumerations. This document also contains descriptions of certain
|
|
variations on the WAVE, such as
|
|
.I LIST wavl
|
|
and compressed WAVE files that are so rare in practice as to be virtually
|
|
non-existent.
|
|
.TP
|
|
.B ITU Recommendation BS.2088-1-2019 \- Long-form file format for the international exchange of audio programme mterials with metadata. ITU 2019.
|
|
Formalized the RF64 file format, ADM carrier chunks like
|
|
.IR axml
|
|
and
|
|
.IR chna .
|
|
Formally supercedes the previous standard for RF64,
|
|
.BR "EBU 3306 v1" .
|
|
One oddity with this standard is it defines the file header for an extended
|
|
WAVE file to be
|
|
.IR BW64 ,
|
|
but this is never seen in practice.
|
|
.TP
|
|
.B RFC 2361 \- WAVE and AVI Codec Registries. IETF Network Working Group, 1998.
|
|
Gives an exhaustive list of all of the codecs that Microsoft had assigned to
|
|
vendor WAVE files as of 1998. At the time, numerous hardware vendors, sound
|
|
card and chip manufacturers, sound software developers and others all provided
|
|
their own slightly-different adaptive PCM codecs, linear predictive compression
|
|
codes, DCTs and other things, and Microsoft would issue these vendors WAVE
|
|
codec magic numbers. Almost all of these are no longer in use, the only ones
|
|
one ever encounters in the modern era are integer PCM (0x01), floating-point
|
|
PCM (0x03) and the extended format marker (0xFFFFFFFF). There are over a
|
|
hundred codecs assigned, however, a roll-call of failed software and hardware
|
|
brands.
|
|
.SS Broadcast WAVE Format
|
|
.TP
|
|
.B EBU Tech 3285 \- Specification of the Broadcast Wave Format (BWF). EBU, 2011.
|
|
Defines the elements of a Broadcast WAVE file, the
|
|
.I bext
|
|
metadata chunk structure, allowed sample formats and other things. Over the
|
|
years the EBU has published numerous supplements covering extensions to the
|
|
format, such as embedding SMPTE UMIDs, pre-calculated loudness data (EBU Tech
|
|
3285 v2),
|
|
.I peak
|
|
waveform overview data (Suppl. 3), ADM metadata (Suppl. 5 and 7), Dolby master
|
|
metadata (Suppl. 6), and other things.
|
|
.TP
|
|
.B SMPTE 330M-2011 \- Unique Material Identifier. SMPTE, 2011.
|
|
Describes the format of the SMPTE UMID field, a 32- or 64-byte UUID used to
|
|
identify media files. UMIDs are usually a dumb number in their 32-byte form,
|
|
but the extended form can encode a high-precision timestamp (with options for
|
|
epoch and timescale) and geolocation information. Broadcast-WAVE files
|
|
conforming to
|
|
.B "EBU 3285 v2"
|
|
have a SMPTE UMID embedded in the
|
|
.I bext
|
|
chunk.
|
|
.SS Audio Definition Model
|
|
.TP
|
|
.B ITU Recommendation BS.2076-2-2019 \- Audio definition model. ITU, 2019.
|
|
Defines the Audio Definition Model, entities, relationships and properties. If
|
|
you ever had any questions about how ADM works, this is where you would start.
|
|
.SS iXML Metadata
|
|
.TP
|
|
.B iXML Specification v3.01. Gallery Software, 2021.
|
|
iXML is a standard for embedding mostly human-created metadata into WAVE files,
|
|
and mostly with an emphasis on location sound recorders used on film and
|
|
television productions. Frustratingly the developer has never published a DTD
|
|
or schema validation or strict formal standard, and encourages vendors to just
|
|
do whatever, but most of the heavily-traveled metadata fields are standardized,
|
|
for recording information like a recording's scene, take, recording notes,
|
|
circled or alt status. iXML also has a system of
|
|
.B "families"
|
|
for associating several WAVE files together into one recording.
|