Read and Write Tags of Music Files with FFmpeg

In both my previous and recent projects, I have been working with tags (metadata) of music files. One reason is that I am rather particular about having a nicely organised library with all tag data aligned to the same format. Recently, while looking for a solution to read and write tags of (potentially) all music formats (I only have MP3, FLAC, AIFF and M4A in my library, so that's kinda all for me), I turned to FFmpeg, the Swiss Army knife of media processing.

FFmpeg has always been my go-to solution for processing media programmatically or in batch, and I have recently found a way to write tags to music files with it. Doing so can be a little verbose, as everything has to fit into the command line interface alongside its other components.

Read tags

To read tags from a music file, we actually need to use ffprobe instead of ffmpeg. ffprobe in this case can produce an output in JSON which is more program-friendly.

ffprobe -show_format -print_format json music.mp3
{
    "format": {
        "filename": "music.mp3",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "mp3",
        "format_long_name": "MP2/3 (MPEG audio layer 2/3)",
        "start_time": "0.000000",
        "duration": "290.160000",
        "size": "4643133",
        "bit_rate": "128015",
        "probe_score": 51,
        "tags": {
            "title": "初音ミクの消失 (2018 Remake)",
            "artist": "cosMo@暴走P feat. 初音ミク",
            "album": "初音ミクの消失 -Real and Repeat-",
            "compilation": "1",
            "encoded_by": "Max 0.9.1",
            "title-sort": "はつねみくのしょうしつ (2018 Remake)",
            "artist-sort": "cosMo@ぼうそうP feat. はつねみく",
            "album-sort": "はつねみくのしょうしつ -Real and Repeat-",
            "TDTG": "2014-11-03T15:38:58",
            "encoder": "Lavf58.29.100"
        }
    }
}

The option -show_format adds the section "format" to the output, which holds the metadata of the file format as well as the tags. Omitting this option results in an output with no useful data.

Besides JSON, ffprobe can also produce output in CSV, flat key-value pairs, INI, and XML syntax. You can choose whichever format fits your needs better. See the ffprobe documentation for more options.
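
As a rough sketch of how a program might consume this JSON, the Python snippet below pulls the "tags" section out of ffprobe's output. The function names parse_tags and read_tags are made up for this example, and read_tags assumes that ffprobe is available on your PATH.

```python
import json
import subprocess


def parse_tags(ffprobe_json):
    """Return the "tags" mapping from ffprobe's -show_format JSON output."""
    data = json.loads(ffprobe_json)
    # "format" is missing without -show_format; "tags" is missing for untagged files.
    return data.get("format", {}).get("tags", {})


def read_tags(path):
    """Run ffprobe on `path` (assumed to be on PATH) and return its tags."""
    result = subprocess.run(
        ["ffprobe", "-show_format", "-print_format", "json", path],
        capture_output=True,  # stdout and stderr are captured separately
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return parse_tags(result.stdout.decode("utf-8"))


if __name__ == "__main__":
    # Abridged sample in the shape of the output shown above.
    sample = '{"format": {"filename": "music.mp3", "tags": {"encoder": "Lavf58.29.100"}}}'
    print(parse_tags(sample))
```

Keeping the parsing separate from the subprocess call makes it easy to try the parser on saved output without FFmpeg installed.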

Note that both ffmpeg and ffprobe print the version number and build information of the program itself, along with the metadata of the file, to stderr. Make sure to discard it if your program consumes stdout and stderr together by default.
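
One way to keep the two streams apart, sketched in Python under a few assumptions: run_split and quiet_ffprobe_argv are names invented for this example, and the -hide_banner and -v error options are not used elsewhere in this post; they are FFmpeg's general options for suppressing the banner and lowering the log level. The demo uses a stand-in child process so that it runs without FFmpeg.

```python
import subprocess
import sys


def run_split(argv):
    """Run a command and return (return code, stdout text, stderr text)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,  # the data we want to parse
        stderr=subprocess.PIPE,  # banner and diagnostics, kept separate
    )
    return (
        proc.returncode,
        proc.stdout.decode("utf-8", errors="replace"),
        proc.stderr.decode("utf-8", errors="replace"),
    )


def quiet_ffprobe_argv(path):
    """ffprobe command line with the banner hidden and only errors logged."""
    return ["ffprobe", "-hide_banner", "-v", "error",
            "-show_format", "-print_format", "json", path]


if __name__ == "__main__":
    # Stand-in child that writes to both streams, like ffprobe does.
    child = [sys.executable, "-c",
             "import sys; print('data'); print('noise', file=sys.stderr)"]
    code, out, err = run_split(child)
    print(code, repr(out), repr(err))
```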

Write tags

Writing tags is trickier than reading them. In FFmpeg's model, everything is a stream: files come in as input streams, filters are applied, and an output is produced. Writing tags works the same way. This makes the procedure a little more complex, because when you only want to change tag data, you usually want to overwrite the original file, and FFmpeg's design won't allow that. The best workaround is to let FFmpeg write a copy of the file, and delete the old one afterwards.

Options used in this command are as follows:

  • -i aiff.aiff: path to the input file.
  • -map 0: map all streams (audio, video, and so on) of the 0th input file to the output file, i.e. to keep the media content unchanged.
  • -y: overwrite the destination file if it exists. Note that you cannot write to your input file even with this option enabled, or you may corrupt the file entirely.
  • -codec copy: keep the codec of every stream unchanged, so as to prevent unnecessary re-encoding.
  • -write_id3v2 1: quite self-explanatory, only use this option if you want to write the tags as ID3v2. It is an option of the AIFF muxer, which does not write ID3v2 tags unless told to, so forcing ID3v2 can sometimes be useful.
  • -metadata "tag-name=tag value": this is where you write/overwrite tags.
  • aiffout.aiff: path to the output file.
ffmpeg -i aiff.aiff -map 0 -y -codec copy -write_id3v2 1 -metadata "artist-sort=emon feat sort" aiffout.aiff
ffmpeg version 4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.8)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.1_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include/darwin -fno-stack-check' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, aiff, from 'aiff.aiff':
  Metadata:
    title           : shake it!
    artist          : emon feat. 初音ミク.鏡音リン.鏡音レン
    album           : 「マジカルミライ 2014」OFFICIAL ALBUM
    compilation     : 1
    encoded_by      : Max 0.9.1
    title-sort      : shake it!
    creation_time   : 2014-11-03T15:38:58
    TDTG            : 2014-11-03T15:38:58
    album-sort      : 「まじかるみらい 2014」OFFICIAL ALBUM
    artist-sort     : emon feat. はつねみく.かがみねりん.かがみねれん
  Duration: 00:03:47.03, start: 0.000000, bitrate: 2822 kb/s
    Stream #0:0: Audio: pcm_s32be, 44100 Hz, stereo, s32, 2822 kb/s
Output #0, aiff, to 'aiffout.aiff':
  Metadata:
    title           : shake it!
    artist          : emon feat. 初音ミク.鏡音リン.鏡音レン
    album           : 「マジカルミライ 2014」OFFICIAL ALBUM
    compilation     : 1
    encoded_by      : Max 0.9.1
    title-sort      : shake it!
    album-sort      : 「まじかるみらい 2014」OFFICIAL ALBUM
    TDTG            : 2014-11-03T15:38:58
    artist-sort     : emon feat sort
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s32be (NONE / 0x454E4F4E), 44100 Hz, stereo, s32, 2822 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=   78219kB time=00:03:47.02 bitrate=2822.5kbits/s speed=2.08e+03x
video:0kB audio:78218kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000549%
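
The copy-then-replace workaround described earlier can be sketched in Python as follows. This is only an illustration: retag_command and retag_in_place are hypothetical names, the run parameter exists purely so the ffmpeg call can be swapped out, and format-specific options such as -write_id3v2 are left out and would need to be added for AIFF.

```python
import os
import shutil
import subprocess
import tempfile


def retag_command(src, dst, tags):
    """Build the ffmpeg argv that copies `src` to `dst` with updated tags."""
    argv = ["ffmpeg", "-i", src, "-map", "0", "-y", "-codec", "copy"]
    for key, value in tags.items():
        argv += ["-metadata", f"{key}={value}"]
    return argv + [dst]


def retag_in_place(path, tags, run=subprocess.run):
    """Write tags to a temporary copy, then swap it in for the original."""
    folder, name = os.path.split(os.path.abspath(path))
    ext = os.path.splitext(name)[1]  # same extension, so ffmpeg picks the same format
    fd, tmp = tempfile.mkstemp(suffix=ext, dir=folder)
    os.close(fd)  # ffmpeg reopens the file by name; -y lets it overwrite it
    try:
        run(retag_command(path, tmp, tags), check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # same directory, so this is a plain rename
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # do not leave a half-written copy behind
        raise
```

A call such as retag_in_place("aiff.aiff", {"artist-sort": "emon feat sort"}) would then behave like an in-place edit from the caller's point of view, while the original file stays untouched if ffmpeg fails.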
 

In order to provide a uniform interface for writing metadata, FFmpeg has some custom aliases for common tag names that differ from what is actually written to the file. Aliases like title, album, artist, and genre should be honored in most tag types. But some aliases might not always be mapped to the tag you would expect. Sort key tags of Vorbis Comments in FLAC are not mapped like those in ID3, and are treated as custom tags by FFmpeg. It is always better to check with ffprobe using an already properly tagged file to see if aliases are used. MultimediaWiki provides a list of common aliases and the tags they map to in actual files.

To write a custom tag, just use the key name of your choice directly. FFmpeg can figure out the proper way to add them to your file, like using TXXX in ID3.
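
Since aliases do not always land where you expect, it can help to read the tags before and after a write and compare the two. The helper below is a small sketch of that idea; tag_diff is a name chosen for this example, and it simply compares two dictionaries such as the "tags" sections of two ffprobe runs.

```python
def tag_diff(before, after):
    """Compare two tag dictionaries, e.g. from ffprobe before and after a write."""
    added = {k: after[k] for k in after if k not in before}
    removed = {k: before[k] for k in before if k not in after}
    changed = {k: (before[k], after[k])
               for k in before if k in after and before[k] != after[k]}
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Abridged from the input and output metadata printed by the command above.
    before = {"title": "shake it!",
              "creation_time": "2014-11-03T15:38:58",
              "artist-sort": "emon feat. はつねみく.かがみねりん.かがみねれん"}
    after = {"title": "shake it!",
             "artist-sort": "emon feat sort",
             "encoder": "Lavf58.29.100"}
    for kind, entries in tag_diff(before, after).items():
        print(kind, entries)
```

Anything that shows up under an unexpected key, or goes missing, is a hint that an alias was not mapped the way you assumed.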

As mentioned previously, this command also produces a lot of debug info on stderr. In fact, all of this output is printed to stderr by default, so you will only get a return code of 0, and nothing from stdout.

Read cover art

In FFmpeg, a music file with embedded cover art is treated as two streams: one audio stream and one single-frame video stream (the picture). So, to extract the cover art, what we need to do is similar to stripping the audio track off a video.

Options used in this command are as follows:

  • -i mp3.mp3: path to the input file.
  • -an: drop the audio stream.
  • -vcodec copy: keep the codec of the video stream, i.e. copy the embedded image in its original format without re-encoding it.
  • cover.png: path to the output file.
ffmpeg -i mp3.mp3 -an -vcodec copy cover.png
ffmpeg version 4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.8)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.1_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include/darwin -fno-stack-check' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mp3, from 'mp3.mp3':
  Metadata:
    title           : Melody Line
    artist          : SmileR feat. 初音ミク
    track           : 02/20
    album           : Melody Line(s)
    genre           : <unknown>
    title-sort      : Melody Line
    album-sort      : Melody Line(s)
    artist-sort     : SmileR feat. はつねみく
  Duration: 00:03:21.09, start: 0.025056, bitrate: 457 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 320 kb/s
    Stream #0:1: Video: png, rgba(pc), 1500x1499, 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment         : Other
Output #0, image2, to 'cover.png':
  Metadata:
    title           : Melody Line
    artist          : SmileR feat. 初音ミク
    track           : 02/20
    album           : Melody Line(s)
    genre           : <unknown>
    title-sort      : Melody Line
    album-sort      : Melody Line(s)
    artist-sort     : SmileR feat. はつねみく
    encoder         : Lavf58.29.100
    Stream #0:0: Video: png, rgba(pc), 1500x1499, q=2-31, 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment         : Other
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=-1.0 Lsize=N/A time=00:00:00.00 bitrate=N/A speed=0.00122x
video:3371kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

So, nothing fancy here. One caveat: with -vcodec copy, the embedded picture is copied as-is rather than converted, so the extension in the output path should match the format of the embedded image (png here, according to the Video: line of the input). If you drop -vcodec copy, FFmpeg re-encodes the picture into the format implied by the extension instead. Either way, what is important here is to have a proper extension in the output path.
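
To pick a proper extension programmatically, one option is to ask ffprobe which codec the embedded picture uses. The sketch below assumes ffprobe was run with -show_streams -print_format json, and that the cover appears as a video stream whose disposition has attached_pic set to 1. The names cover_extension and extract_cover_command, and the small codec-to-extension table, are assumptions made for this example.

```python
import json

# Assumed mapping from FFmpeg codec names to file extensions; extend as needed.
CODEC_EXTENSIONS = {"png": ".png", "mjpeg": ".jpg"}


def cover_extension(ffprobe_json):
    """Return the extension matching the first attached picture, or None."""
    data = json.loads(ffprobe_json)
    for stream in data.get("streams", []):
        is_video = stream.get("codec_type") == "video"
        is_cover = stream.get("disposition", {}).get("attached_pic") == 1
        if is_video and is_cover:
            # None if the codec is not in the table above.
            return CODEC_EXTENSIONS.get(stream.get("codec_name"))
    return None


def extract_cover_command(src, dst):
    """ffmpeg argv that copies the picture stream out without re-encoding."""
    return ["ffmpeg", "-i", src, "-an", "-vcodec", "copy", dst]


if __name__ == "__main__":
    # Hand-written sample in the shape of ffprobe's -show_streams output.
    sample = json.dumps({"streams": [
        {"codec_type": "audio", "codec_name": "mp3",
         "disposition": {"attached_pic": 0}},
        {"codec_type": "video", "codec_name": "png",
         "disposition": {"attached_pic": 1}},
    ]})
    ext = cover_extension(sample)
    print(extract_cover_command("mp3.mp3", "cover" + ext))
```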

Write cover art

Writing cover art is the reverse of reading it: an audio file and a static picture are combined into one file. The static picture is automatically treated as cover art.

Options used in this command are as follows:

  • -i mp3.mp3: path to the input audio file.
  • -i cover.png: path to the cover art.
  • -map 0: map streams of the 0th (first) input to the output.
  • -map 1:0: map the first stream (image data) of the 1th (second) input (i.e. the picture) to the output. ("1th" is the way of saying the #1 element in a zero-indexed list; it's intended here.)
  • -codec copy: to keep the codec of streams, and prevent unnecessary encoding.
  • mp3out.mp3: path to the output file.
ffmpeg -i mp3.mp3 -i cover.png -map 0 -map 1:0 -codec copy mp3out.mp3
ffmpeg version 4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.8)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.1_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home/include/darwin -fno-stack-check' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mp3, from 'mp3.mp3':
  Metadata:
    title           : Melody Line
    artist          : SmileR feat. 初音ミク
    track           : 02/20
    album           : Melody Line(s)
    genre           : <unknown>
    title-sort      : Melody Line
    album-sort      : Melody Line(s)
    artist-sort     : SmileR feat. はつねみく
  Duration: 00:03:21.09, start: 0.025056, bitrate: 457 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 320 kb/s
    Stream #0:1: Video: png, rgba(pc), 1500x1499, 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment         : Other
Input #1, png_pipe, from 'cover.png':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Video: png, rgba(pc), 1500x1499, 25 tbr, 25 tbn, 25 tbc
Output #0, mp3, to 'mp3out.mp3':
  Metadata:
    TIT2            : Melody Line
    TPE1            : SmileR feat. 初音ミク
    TRCK            : 02/20
    TALB            : Melody Line(s)
    TCON            : <unknown>
    TSOT            : Melody Line
    TSOA            : Melody Line(s)
    TSOP            : SmileR feat. はつねみく
    TSSE            : Lavf58.29.100
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 320 kb/s
    Stream #0:1: Video: png, rgba(pc), 1500x1499, q=2-31, 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment         : Other
    Stream #0:2: Video: png, rgba(pc), 1500x1499, q=2-31, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (copy)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=-1.0 Lq=-1.0 size=   14599kB time=00:03:21.03 bitrate= 594.9kbits/s speed=4.27e+03x
video:6743kB audio:7855kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.009018%

Again, this output goes to stderr only; nothing is printed to stdout.

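You can manipulate metadata and the cover art in the same command by just adding -metadata key=value options. Note that if you use -map 0:0 instead of -map 0 for the audio file, you may lose your existing tag data. Only add the extra :0 if that is what you want to do. Conversely, -map 0 keeps every stream of the input, including a cover that is already embedded: in the output above, the file ends up with both the old picture (stream #0:1) and the new one (stream #0:2).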
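
As a sketch of combining both in one call from a program, the Python helper below assembles the argument list for the command above plus any number of -metadata options. The name embed_cover_command is made up for this example. Passing the arguments as a list (rather than one shell string) means tag values with spaces need no extra quoting.

```python
import shlex


def embed_cover_command(audio, cover, out, tags=None):
    """Build an ffmpeg argv that embeds `cover` into `audio` and sets tags."""
    argv = ["ffmpeg", "-i", audio, "-i", cover,
            "-map", "0",      # every stream of the audio file
            "-map", "1:0",    # the picture from the second input
            "-codec", "copy"]
    for key, value in (tags or {}).items():
        argv += ["-metadata", f"{key}={value}"]
    return argv + [out]


if __name__ == "__main__":
    argv = embed_cover_command("mp3.mp3", "cover.png", "mp3out.mp3",
                               {"artist-sort": "SmileR feat. はつねみく"})
    # shlex.join (Python 3.8+) shows the equivalent shell command line.
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run as-is.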


FFmpeg can save you the hassle if you are looking for an all-in-one solution for reading and writing music tags. It might be a little too heavy if you are using it only for this purpose, but it shouldn't be much of a problem if you already have it installed on your system for other things.
