mirror of
https://github.com/iluvcapra/wavinfo.git
synced 2025-12-31 08:50:41 +00:00
Merge pull request #24 from iluvcapra/feature-man7
Manpage wavinfo(7) enhancement
@@ -1,19 +1,179 @@
|
|||||||
.TH waveinfo 7 "2023-11-07" "Jamie Hardt" "Miscellaneous Information Manuals"
|
.TH waveinfo 7 "2023-11-08" "Jamie Hardt" "Miscellaneous Information Manuals"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
wavinfo \- information about wave sound file metadata
|
wavinfo \- WAVE file metadata
|
||||||
.\" .SH DESCRIPTION
|
.SH SYNOPSIS
|
||||||
|
Everything you ever wanted to know about WAVE metadata but were afraid to ask.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.PP
|
||||||
|
The WAVE file format is forwards-compatible. Apart from audio data, it can
|
||||||
|
hold arbitrary blocks of bytes which clients will automatically ignore
|
||||||
|
unless they recognize them and know how to read them.
|
||||||
|
.PP
|
||||||
|
Without saying too much about the structure and parsing of WAVE files
|
||||||
|
themselves \- a subject beyond the scope of this document \- WAVE files are
|
||||||
|
divided into segments or
|
||||||
|
.BR chunks ,
|
||||||
|
which a client parser can either read or skip without reading. Chunks have
|
||||||
|
an identifier, or signature: a four-character-code that tells a client what
|
||||||
|
kind of chunk it is, and a length. Based on this information, a client can look
|
||||||
|
at the identifier and decide if it knows how to read that chunk and if it wants
|
||||||
|
to. If it doesn't, it can simply read the length and skip past it.
|
||||||
|
.PP
|
||||||
|
Some chunks are mandated by the Microsoft standard, specifically
|
||||||
|
.I fmt
|
||||||
|
and
|
||||||
|
.I data
|
||||||
|
in the case of PCM-encoded WAVE files. Other chunks, like
|
||||||
|
.I cue
|
||||||
|
or
|
||||||
|
.IR bext ,
|
||||||
|
are optional, and optional chunks usually hold metadata.
|
||||||
|
.PP
|
||||||
|
Chunks can also nest inside other chunks; a special identifier
|
||||||
|
.I LIST
|
||||||
|
is used to indicate these. A WAVE file is a recursive list: a top level
|
||||||
|
list of chunks, where chunks may contain a list of chunks themselves.
|
||||||
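The chunk layout described above can be sketched in a few lines of Python. This is a minimal illustration and not the wavinfo implementation: the function name walk_chunks and the tiny in-memory file are hypothetical, and it assumes a plain little-endian RIFF file (no RF64 sizes). It reads each identifier and length, descends into RIFF and LIST containers, and skips everything else.

```python
import io
import struct


def walk_chunks(stream, end, depth=0):
    """Yield (depth, ident, payload_offset, length) for each chunk before `end`.

    Hypothetical helper. The caller must not seek `stream` while iterating,
    because the generator relies on the stream position between yields.
    """
    while stream.tell() + 8 <= end:
        # Every chunk starts with a four-character code and a 32-bit length.
        ident, length = struct.unpack("<4sI", stream.read(8))
        start = stream.tell()
        yield depth, ident, start, length
        if ident in (b"RIFF", b"LIST"):
            stream.read(4)  # form type, e.g. b"WAVE" or b"INFO"
            yield from walk_chunks(stream, start + length, depth + 1)
        # Skip the payload; chunks are padded to an even number of bytes.
        stream.seek(start + length + (length & 1))


def chunk(ident, payload):
    """Serialize one chunk, adding the pad byte for odd-length payloads."""
    pad = b"\x00" if len(payload) & 1 else b""
    return ident + struct.pack("<I", len(payload)) + payload + pad


# A tiny synthetic WAVE file: fmt, a LIST form INFO with one field, and data.
fmt = chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16))
info = chunk(b"LIST", b"INFO" + chunk(b"IART", b"abc"))
data = chunk(b"data", b"\x00\x00" * 4)
wav = chunk(b"RIFF", b"WAVE" + fmt + info + data)

found = [(d, i) for d, i, _, _ in walk_chunks(io.BytesIO(wav), len(wav))]
for d, i in found:
    print("  " * d + i.decode("ascii"))
```

A parser built this way never needs to understand a chunk in order to get past it, which is the forwards-compatibility property the text describes.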
|
.SS Order of Metadata Chunks in a WAVE File
|
||||||
|
.PP
|
||||||
|
Chunks in a WAVE file can appear in any order, and a capable parser can
|
||||||
|
accept them appearing in any order, however authorities give guidance on
|
||||||
|
where chunks should be placed, when creating a new WAVE file.
|
||||||
|
.PP
|
||||||
|
.IP 1)
|
||||||
|
For all new WAVE files, clients should always place an empty chunk, a
|
||||||
|
so-called
|
||||||
|
.I JUNK
|
||||||
|
chunk, in the first position in the top-level list of a WAVE file, and
|
||||||
|
it should be sized large enough to hold a
|
||||||
|
.I ds64
|
||||||
|
chunk record. This will allow clients to upgrade the file to a RF64
|
||||||
|
WAVE file
|
||||||
|
.BR in-place ,
|
||||||
|
without having to re-write the file or audio data.
|
||||||
|
.IP 2)
|
||||||
|
Older authorities recommend placing metadata before the audio data, so clients
|
||||||
|
reading the file sequentially will hit it before having to seek through the
|
||||||
|
audio. This may improve metadata read performance on certain architectures.
|
||||||
|
.IP 3)
|
||||||
|
Older authorities also recommend inserting
|
||||||
|
.I JUNK
|
||||||
|
before the
|
||||||
|
.I data
|
||||||
|
chunk, sized so that the first byte of the
|
||||||
|
.I data
|
||||||
|
payload lands immediately at 0x1000 (4096), because this was a common
|
||||||
|
factor of the page boundaries of many operating systems and architectures. This
|
||||||
|
may optimize the audio I/O performance in certain situations.
|
||||||
|
.IP 4)
|
||||||
|
Modern implementations (we're looking at
|
||||||
|
.B Pro Tools
|
||||||
|
here) tend to place the Broadcast-WAVE
|
||||||
|
.I bext
|
||||||
|
metadata before the data, followed by the data itself, and then other data
|
||||||
|
after that.
|
||||||
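The alignment arithmetic in item 3 can be worked through as follows. This is a sketch under stated assumptions: the function name junk_payload_size is hypothetical, and it assumes standard 8-byte chunk headers (a 4-byte identifier plus a 4-byte length) for both the JUNK chunk and the data chunk that follows it.

```python
def junk_payload_size(offset, target=0x1000):
    """Return the JUNK payload size that puts the data payload at `target`.

    `offset` is the file position where the JUNK chunk header would begin,
    that is, the end of everything written so far. Hypothetical helper.
    """
    header = 8  # 4-byte identifier + 4-byte length
    # Account for the JUNK header and the data chunk header that follows it.
    size = target - offset - 2 * header
    if size < 0:
        raise ValueError("existing chunks already extend past the target")
    return size


# Example: a 12-byte RIFF/WAVE header followed by a 24-byte fmt chunk
# (8-byte header + 16-byte PCM format payload) ends at offset 36.
offset = 12 + 24
size = junk_payload_size(offset)
print(size, hex(offset + 8 + size + 8))
```

If the metadata written before the audio already runs past the target offset, no padding can satisfy the rule, which is why the sketch raises an error instead of returning a negative size.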
|
.\" .PP
|
||||||
|
.\" Clients reading WAVE files should be tolerant and accept any configuration of
|
||||||
|
.\" chunks, and should accept any file as long as the obligatory
|
||||||
|
.\" .I fmt
|
||||||
|
.\" and
|
||||||
|
.\" .I data
|
||||||
|
.\" chunks
|
||||||
|
.\" are present.
|
||||||
|
.PP
|
||||||
|
It's not unheard-of to see a naive implementor expect
|
||||||
|
.B only
|
||||||
|
.I fmt
|
||||||
|
and
|
||||||
|
.I data
|
||||||
|
chunks, in this order, and to hard-code the offsets of the short
|
||||||
|
.I fmt
|
||||||
|
chunk and
|
||||||
|
.I data
|
||||||
|
chunk into their program, and this is something that should always be checked
|
||||||
|
when evaluating a new tool, just to make sure the developer didn't do this.
|
||||||
|
Many coding examples and WAVE file explainers from the 90s and early aughts
|
||||||
|
give the basic layout of a WAVE file, and naive devs go along with it.
|
||||||
|
.SS Encoding and Decoding Text Metadata
|
||||||
|
.\" .PP
|
||||||
|
.\" Modern metadata systems, anything developed since the late aughts, will defer
|
||||||
|
.\" encoding to an XML parser, so when dealing with
|
||||||
|
.\" .I ixml
|
||||||
|
.\" or
|
||||||
|
.\" .I axml
|
||||||
|
.\" so a client can mostly ignore this problem.
|
||||||
|
.\" .PP
|
||||||
|
.\" The most established metadata systems are older than this though, and so the
|
||||||
|
.\" entire weight of text encoding history falls upon the client.
|
||||||
|
.\" .PP
|
||||||
|
.\" The original WAVE specification, a part of the Microsoft/IBM Multimedia
|
||||||
|
.\" interface of 1991, was written at a time when Windows was an ascendant and
|
||||||
|
.\" soon-to-be dominant desktop environment. Audio files were almost
|
||||||
|
.\" never shared via LANs or the Internet or any other way. When audio files were
|
||||||
|
.\" shared, among the miniscule number of people who did this, it was via BBS or
|
||||||
|
.\" Usenet. Users at this time may have ripped them from CDs, but the cost of hard
|
||||||
|
.\" drives and low quality of compressed formats at the time made this little more
|
||||||
|
.\" than a curiosity. There was no CDBaby or CDDB to download and populate metadata
|
||||||
|
.\" from at this time.
|
||||||
|
.\" .PP
|
||||||
|
.\" So, the
|
||||||
|
.\" .I INFO
|
||||||
|
.\" and
|
||||||
|
.\" .I cue
|
||||||
|
.\" metadata systems, which are by far the most prevalent and supported, were
|
||||||
|
.\" published two years before the so-called "Endless September" of 1993 when the
|
||||||
|
.\" Internet became mainstream, when Unicode was still a twinkle in the eye, and
|
||||||
|
.\" two years before Ariana Grande was born.
|
||||||
|
.PP
|
||||||
|
The safest assumption, and the mandate of Microsoft, is that all text
|
||||||
|
metadata, by default, be encoded in Windows codepage 819, a.k.a. ISO Latin
|
||||||
|
alphabet 1, or ISO 8859-1. This covers most Western European scripts but
|
||||||
|
excludes all of Asia, Russia, most of the European Near East, and the Middle
|
||||||
|
East.
|
||||||
|
.PP
|
||||||
|
To account for this, Microsoft proposed a few conventions, none of which have
|
||||||
|
been adopted with any consistency among clients of the WAVE file standard.
|
||||||
|
.IP 1)
|
||||||
|
The RIFF standard defines a
|
||||||
|
.I cset
|
||||||
|
chunk which declares a Windows codepage for character encoding, along with a
|
||||||
|
native country code, language and dialect, which clients should use for
|
||||||
|
determining text information. We have never seen a WAVE
|
||||||
|
file with a
|
||||||
|
.I cset
|
||||||
|
chunk.
|
||||||
|
.IP 2)
|
||||||
|
Certain RIFF chunks allow the writing client to override the default encoding.
|
||||||
|
Relevant to audio files is the
|
||||||
|
.I ltxt
|
||||||
|
chunk, which encodes a country, language, dialect and codepage along with a
|
||||||
|
time range text note. We have never seen the text field on one of these
|
||||||
|
filled-out either.
|
||||||
|
.PP
|
||||||
|
Some clients in our experience simply write UTF-8 into
|
||||||
|
.IR cue ,
|
||||||
|
.IR labl ,
|
||||||
|
and
|
||||||
|
.I note
|
||||||
|
fields without any kind of framing.
|
||||||
|
.PP
|
||||||
|
The practical solution at this time is to assume either ISO Latin 1, Windows
|
||||||
|
CP 859 or Windows CP 1252, and allow the client or user to override this based
|
||||||
|
on its own inferences. The
|
||||||
|
.I chardet
|
||||||
|
Python package may provide usable guesses for text encoding, YMMV.
|
||||||
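One way to apply the practical advice above is sketched below. It is an assumption-laden illustration, not what wavinfo does: the function name decode_text and its parameters are hypothetical. It tries UTF-8 first (text with high bytes in a Windows codepage is rarely valid UTF-8), falls back to Windows CP 1252, and lets the caller force an encoding.

```python
def decode_text(raw, fallback="cp1252", override=None):
    """Decode a RIFF text field using a UTF-8-then-codepage heuristic.

    Hypothetical helper: `override` lets a user or client force an encoding,
    `fallback` is the codepage assumed when the bytes are not valid UTF-8.
    """
    # Text fields are commonly NUL-terminated; drop the terminator and
    # anything after it.
    raw = raw.split(b"\x00", 1)[0]
    if override:
        return raw.decode(override, errors="replace")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


print(decode_text("Zoë".encode("utf-8") + b"\x00"))
print(decode_text(b"Zo\xeb\x00"))
print(decode_text(b"caf\xe9", override="latin-1"))
```

Plain ASCII decodes identically under all of these encodings, so the heuristic only matters for fields that contain bytes above 0x7F.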
.SH CHUNK MENAGERIE
|
.SH CHUNK MENAGERIE
|
||||||
A list of chunks that you may find in a wave file from our experience.
|
A list of chunks that you may find in a wave file from our experience.
|
||||||
.SS Essential WAV Chunks
|
.SS Essential WAV Chunks
|
||||||
.IP fmt
|
.IP fmt
|
||||||
Defines the format of the audio in the
|
Defines the format of the audio in the
|
||||||
.I data
|
.I data
|
||||||
chunk: the audio codec, the sample rate, bit depth, channel count, block
|
chunk: the audio codec, the sample rate, bit depth, channel count, block
|
||||||
alignment and other data. May take an "extended" form, with additional data
|
alignment and other data. May take an "extended" form, with additional data
|
||||||
(such as channel speaker assignments) if there are more than two channels in
|
(such as channel speaker assignments) if there are more than two channels in
|
||||||
the file or if it is a compressed format.
|
the file or if it is a compressed format.
|
||||||
.IP data
|
.IP data
|
||||||
The audio data itself. PCM audio data is always stored as interleaved samples.
|
The audio data itself. PCM audio data is always stored as interleaved samples.
|
||||||
|
.SS Optional WAVE Chunks
|
||||||
.IP JUNK
|
.IP JUNK
|
||||||
A region of the file not currently in use. Clients sometimes add these before
|
A region of the file not currently in use. Clients sometimes add these before
|
||||||
the
|
the
|
||||||
@@ -42,10 +202,8 @@ very deep heirarchy of chunks, compared to AVI files.
|
|||||||
The RIFF container format has a metadata system common to all RIFF files, WAVE
|
The RIFF container format has a metadata system common to all RIFF files, WAVE
|
||||||
being the most common at present, AVI being another very common format
|
being the most common at present, AVI being another very common format
|
||||||
historically.
|
historically.
|
||||||
.IP INFO
|
.IP "LIST form INFO"
|
||||||
A
|
A flat list of chunks, each containing text metadata. The role
|
||||||
.I LIST
|
|
||||||
form containing a flat list of chunks, each containing text metadata. The role
|
|
||||||
of the string, like "Artist", "Composer", "Comment", "Engineer" etc. are given
|
of the string, like "Artist", "Composer", "Comment", "Engineer" etc. are given
|
||||||
by the four-character code: "Artist" is
|
by the four-character code: "Artist" is
|
||||||
.IR IART ,
|
.IR IART ,
|
||||||
@@ -58,10 +216,8 @@ Comment is
|
|||||||
etc.
|
etc.
|
||||||
.IP cue
|
.IP cue
|
||||||
A binary list of cues, which are timed points within the audio data.
|
A binary list of cues, which are timed points within the audio data.
|
||||||
.IP adtl
|
.IP "LIST form adtl"
|
||||||
A
|
Contains text labels
|
||||||
.I LIST
|
|
||||||
form containing text labels
|
|
||||||
.RI ( labl )
|
.RI ( labl )
|
||||||
for the cues in the
|
for the cues in the
|
||||||
.I cue
|
.I cue
|
||||||
@@ -73,17 +229,17 @@ but hosts tend to use notes for longer text), and "length text"
|
|||||||
.I ltxt
|
.I ltxt
|
||||||
metadata records, which can give a cue a length, making it a range, and a text
|
metadata records, which can give a cue a length, making it a range, and a text
|
||||||
field that defines its own encoding.
|
field that defines its own encoding.
|
||||||
.IP CSET
|
.IP cset
|
||||||
Defines the character set for all text fields in
|
Defines the character set for all text fields in
|
||||||
.IR INFO ,
|
.IR INFO ,
|
||||||
.I adtl
|
.I adtl
|
||||||
and other RIFF-defined text fields. By default, all of the text in RIFF
|
and other RIFF-defined text fields. By default, all of the text in RIFF
|
||||||
metadata fields is Windows Latin 1/ISO 8859-1, though as time passes many
|
metadata fields is Windows Latin 1/ISO 8859-1, though as time passes many
|
||||||
clients have simply taken to sticking UTF-8 into these fields. The
|
clients have simply taken to sticking UTF-8 into these fields. The
|
||||||
.I CSET
|
.I cset
|
||||||
cannot represent UTF-8 as a valid option for text encoding, it only speaks
|
cannot represent UTF-8 as a valid option for text encoding, it only speaks
|
||||||
Windows codepages, and we've never seen one in a WAVE file in any event and
|
Windows codepages, and we've never seen one in a WAVE file in any event, and
|
||||||
it's vanishingly likely an audio app would recognize one if it saw it.
|
it's unlikely an audio app would recognize one if it saw it.
|
||||||
.SS Broadcast-WAVE Metadata
|
.SS Broadcast-WAVE Metadata
|
||||||
Broadcast-WAVE is a set of extensions to WAVE files to facilitate media
|
Broadcast-WAVE is a set of extensions to WAVE files to facilitate media
|
||||||
production maintained by the EBU.
|
production maintained by the EBU.
|
||||||
@@ -124,6 +280,7 @@ chunk.
|
|||||||
This is a hybrid binary/gzip-compressed-XML chunk that associates ADM
|
This is a hybrid binary/gzip-compressed-XML chunk that associates ADM
|
||||||
documents with timed ranges of a WAVE file.
|
documents with timed ranges of a WAVE file.
|
||||||
.SS Dolby Metadata
|
.SS Dolby Metadata
|
||||||
|
Dolby metadata is present in Dolby Atmos master ADM WAVE files.
|
||||||
.IP dbmd
|
.IP dbmd
|
||||||
Records hints for Dolby playback applications for downmixing, level
|
Records hints for Dolby playback applications for downmixing, level
|
||||||
normalization and other things.
|
normalization and other things.
|
||||||
@@ -138,53 +295,86 @@ Region and cue point metadata.
|
|||||||
.IP elm1
|
.IP elm1
|
||||||
.IP minf
|
.IP minf
|
||||||
.IP umid
|
.IP umid
|
||||||
.SH HISTORY
|
.SH REFERENCES
|
||||||
The oldest document that defines the form of a Wave file is the
|
(Note: We're not including URLs in this list, the title and standard number
|
||||||
.I Multimedia Programming Interface and Data Specifications 1.0
|
should be sufficient to find almost all of these documents. The ITU, EBU and
|
||||||
of August 1991.
|
IETF standards documents are freely-available.)
|
||||||
.\" .SH REFERENCES
|
.SS Essential File Format
|
||||||
.\" .SS ESSENTIAL FILE FORMAT
|
.TP
|
||||||
.\" .TP
|
.B Multimedia Programming Interface and Data Specifications 1.0. Microsoft Corporation, 1991.
|
||||||
.\" .UR https://www.aelius.com/njh/wavemetatools/doc/riffmci.pdf
|
The original definition of the
|
||||||
.\" Multimedia Programming Interface and Data Specifications 1.0
|
.I RIFF
|
||||||
.\" .UE
|
container, the
|
||||||
.\" The original definition of the
|
.I WAVE
|
||||||
.\" .I RIFF
|
form, the original metadata facilities (like
|
||||||
.\" container, the
|
.IR INFO " and " cue ),
|
||||||
.\" .I WAVE
|
and things like language, country and
|
||||||
.\" form, the original metadata facilites, and things like language, country and
|
dialect enumerations. This document also contains descriptions of certain
|
||||||
.\" dialect enumerations.
|
variations on the WAVE, such as
|
||||||
.\" .TP
|
.I LIST wavl
|
||||||
.\" .UR https://datatracker.ietf.org/doc/html/rfc2361
|
and compressed WAVE files that are so rare in practice as to be virtually
|
||||||
.\" RFC 2361
|
non-existent.
|
||||||
.\" .UE
|
.TP
|
||||||
.\" A large RFC compilation of all of the known (in 1998) audio encoding formats
|
.B ITU Recommendation BS.2088-1-2019 \- Long-form file format for the international exchange of audio programme materials with metadata. ITU 2019.
|
||||||
.\" in use. 104 different codecs are documented with a name, the corresponding
|
Formalized the RF64 file format, ADM carrier chunks like
|
||||||
.\" magic number, and a vendor contact name, phone number and address (no
|
.IR axml
|
||||||
.\" emails, strangely). Almost all of these are of historical interest only.
|
and
|
||||||
.\" .SS RF64/Extended WAVE Format
|
.IR chna .
|
||||||
.\"
|
Formally supersedes the previous standard for RF64,
|
||||||
.\" .TP
|
.BR "EBU 3306 v1" .
|
||||||
.\" .UR https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf
|
One oddity with this standard is it defines the file header for an extended
|
||||||
.\" ITU Recommendation BS.2088-1-2019
|
WAVE file to be
|
||||||
.\" .UE
|
.IR BW64 ,
|
||||||
.\" BS.2088 gives a detailed description of the internals of an RF64 file,
|
but this is never seen in practice.
|
||||||
.\" .I ds64
|
.TP
|
||||||
.\" structure and all formal requirements. It also defines the use of
|
.B RFC 2361 \- WAVE and AVI Codec Registries. IETF Network Working Group, 1998.
|
||||||
.\" .IR <axml> ,
|
Gives an exhaustive list of all of the codecs that Microsoft had assigned to
|
||||||
.\" .IR <bxml> ,
|
vendor WAVE files as of 1998. At the time, numerous hardware vendors, sound
|
||||||
.\" .IR <sxml> ,
|
card and chip manufacturers, sound software developers and others all provided
|
||||||
.\" and
|
their own slightly-different adaptive PCM codecs, linear predictive compression
|
||||||
.\" .I <chna>
|
codes, DCTs and other things, and Microsoft would issue these vendors WAVE
|
||||||
.\" metadata chunks for the carriage of Audio Definition Model metadata.
|
codec magic numbers. Almost all of these are no longer in use; the only ones
|
||||||
.\" .TP
|
one ever encounters in the modern era are integer PCM (0x01), floating-point
|
||||||
.\" .UR https://tech.ebu.ch/docs/tech/tech3306.pdf
|
PCM (0x03) and the extended format marker (0xFFFE). There are over a
|
||||||
.\" EBU Tech 3306 "RF64: An Extended File Format for Audio Data"
|
hundred codecs assigned, however, a roll-call of failed software and hardware
|
||||||
.\" .UE
|
brands.
|
||||||
.\" Version 1 of Tech 3306 laid out the
|
.SS Broadcast WAVE Format
|
||||||
.\" .I RF64
|
.TP
|
||||||
.\" extended WAVE
|
.B EBU Tech 3285 \- Specification of the Broadcast Wave Format (BWF). EBU, 2011.
|
||||||
.\" file format almost identically to
|
Defines the elements of a Broadcast WAVE file, the
|
||||||
.\" .IR BS.2088 ,
|
.I bext
|
||||||
.\" Version 2 of the standard wholly adopted
|
metadata chunk structure, allowed sample formats and other things. Over the
|
||||||
.\" .IR BS.2088 .
|
years the EBU has published numerous supplements covering extensions to the
|
||||||
|
format, such as embedding SMPTE UMIDs, pre-calculated loudness data (EBU Tech
|
||||||
|
3285 v2),
|
||||||
|
.I peak
|
||||||
|
waveform overview data (Suppl. 3), ADM metadata (Suppl. 5 and 7), Dolby master
|
||||||
|
metadata (Suppl. 6), and other things.
|
||||||
|
.TP
|
||||||
|
.B SMPTE 330M-2011 \- Unique Material Identifier. SMPTE, 2011.
|
||||||
|
Describes the format of the SMPTE UMID field, a 32- or 64-byte UUID used to
|
||||||
|
identify media files. UMIDs are usually a dumb number in their 32-byte form,
|
||||||
|
but the extended form can encode a high-precision timestamp (with options for
|
||||||
|
epoch and timescale) and geolocation information. Broadcast-WAVE files
|
||||||
|
conforming to
|
||||||
|
.B "EBU 3285 v2"
|
||||||
|
have a SMPTE UMID embedded in the
|
||||||
|
.I bext
|
||||||
|
chunk.
|
||||||
|
.SS Audio Definition Model
|
||||||
|
.TP
|
||||||
|
.B ITU Recommendation BS.2076-2-2019 \- Audio definition model. ITU, 2019.
|
||||||
|
Defines the Audio Definition Model, entities, relationships and properties. If
|
||||||
|
you ever had any questions about how ADM works, this is where you would start.
|
||||||
|
.SS iXML Metadata
|
||||||
|
.TP
|
||||||
|
.B iXML Specification v3.01. Gallery Software, 2021.
|
||||||
|
iXML is a standard for embedding mostly human-created metadata into WAVE files,
|
||||||
|
and mostly with an emphasis on location sound recorders used on film and
|
||||||
|
television productions. Frustratingly the developer has never published a DTD
|
||||||
|
or schema validation or strict formal standard, and encourages vendors to just
|
||||||
|
do whatever, but most of the heavily-traveled metadata fields are standardized,
|
||||||
|
for recording information like a recording's scene, take, recording notes,
|
||||||
|
circled or alt status. iXML also has a system of
|
||||||
|
.B "families"
|
||||||
|
for associating several WAVE files together into one recording.
|
||||||
|
|||||||
@@ -1,6 +1,9 @@
|
|||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
|
A complete list of technical references and commentary is available as a man page
|
||||||
|
and is installed as wavinfo(7) when you install `wavinfo` via pip.
|
||||||
|
|
||||||
Wave File Format
|
Wave File Format
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
|
|||||||