Use WebVTT without actually using WebVTT: Another way to monitor playback progress of HTML Media Elements

Previously, I introduced how LyricsX handles the playback progress of different players, and briefly talked about how I applied its principle to web audio with a requestAnimationFrame() loop. In this article, I’ll talk about how to use WebVTT, a browser-native captioning feature, to receive callbacks on specific time ranges.

Web Video Text Tracks (WebVTT) is a media captioning feature of the HTML5 standard that is supported by all major modern browsers. In a typical use case, the author of a webpage provides a caption file in an SRT-like syntax defined by WebVTT and links the file to a video element using a <track /> tag. The browser is then aware of the caption and provides an option to render it on top of the video.

Besides rendering captions, WebVTT also provides a set of APIs that invoke a callback function when the media element enters a specific time range. We can use them to track playback progress without relying on the sometimes-not-as-precise timeupdate event. While WebVTT is designed for video playback, the callbacks also work when the track is attached to an <audio /> node.

In WebVTT, each caption line with a specific time range is called a “cue”. To attach callback functions to a cue, the cue has to be inserted into a caption track (also known as a textTrack) programmatically.

To add a track, it is recommended to create a <track /> node inside the audio/video node. Although the Multimedia API offers an .addTextTrack() method, tracks added this way cannot have an ID and cannot be removed until the page is reloaded, because there is no corresponding .removeTextTrack() method. Therefore, using DOM nodes gives you better control.

Below is an example of creating a text track programmatically.

const track = document.createElement("track");
const uniqueId = Math.random().toString(36).substring(2, 15);
track.id = `track-${uniqueId}`;
track.kind = "subtitles";
track.label = `Track #${uniqueId}`;
// Minimal WebVTT track file: just the "WEBVTT" header, base64-encoded
track.src = "data:text/vtt;base64,V0VCVlRUCgoK";

const player = document.querySelector("audio");
player.appendChild(track);
// Set the mode to "hidden" so cue events fire without rendering captions
track.track.mode = "hidden";
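
The base64 payload in the data URI above is simply the text WEBVTT followed by line breaks. As a minimal sketch, it could be generated rather than hard-coded. makeEmptyVttDataUri is a hypothetical helper name, and a global btoa() is assumed to be available, as it is in browsers and recent Node.js versions.

```javascript
// Hypothetical helper: builds a data URI for an empty WebVTT file.
// Assumes a global btoa(), as in browsers and recent Node.js versions.
function makeEmptyVttDataUri() {
  // A WebVTT file starts with the "WEBVTT" signature; this one has no cues.
  const emptyVtt = "WEBVTT\n\n\n";
  return `data:text/vtt;base64,${btoa(emptyVtt)}`;
}

console.log(makeEmptyVttDataUri());
// prints "data:text/vtt;base64,V0VCVlRUCgoK"
```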

If you need to remove the track to start over, it is as easy as removing the track element from the DOM.

track.parentNode.removeChild(track);

WebVTT tracks provide a convenient API to add and remove cues. Creating a cue takes three parameters: the start time and end time in seconds, and the subtitle content. In our use case, the subtitle content does not matter, so you can pass an empty string, or set it to any content you prefer for debugging purposes.

const cue = new VTTCue(1.5, 2.25, "");

Then, you can add event listeners to the cue object you created. VTTCue has two events, enter and exit. As the names suggest, these events are triggered when the current playback position enters or exits the cue’s time range. They fire during normal playback, and also when the user seeks with the progress bar.

For use cases where the events are single-ended, i.e. each event only has a start time and ends right before the next one starts, it is possible to set only an enter event listener and treat the enter event of the next cue as the exit event of the current one.

cue.addEventListener("enter", () => {
    console.log("Entered cue");
});
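
For the single-ended case, each cue still needs an end time. One possible sketch is to derive the ranges from a sorted list of start times, ending each range where the next one begins. toCueRanges and its mediaDuration parameter are hypothetical names used for illustration only.

```javascript
// Hypothetical helper: turns sorted start times (in seconds) into
// { start, end } ranges, where each range ends at the next start time.
// `mediaDuration` closes the last range; it is an assumed parameter.
function toCueRanges(startTimes, mediaDuration) {
  return startTimes.map((start, index) => {
    const isLast = index === startTimes.length - 1;
    const end = isLast ? mediaDuration : startTimes[index + 1];
    return { start, end };
  });
}

const ranges = toCueRanges([0, 1.5, 4], 10);
console.log(ranges);
// Each range could then be passed to new VTTCue(range.start, range.end, "").
```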

Last but not least, the cue needs to be added to a track to take effect.

track.track.addCue(cue);

Now, when you play the media element with the track attached, you should see the events fire as playback enters the time range of each cue.
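
The cue steps above (create, listen, add) can be wrapped into one small function. This is only a sketch under stated assumptions: addCuesWithEnterCallback is a hypothetical name, ranges is assumed to be an array of { start, end } objects in seconds, and the CueClass parameter exists purely so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: creates one cue per range, wires an "enter"
// listener, and adds the cue to the given text track.
// `CueClass` defaults to the browser's VTTCue; the parameter is an
// assumption added so the sketch can run outside a browser.
function addCuesWithEnterCallback(
  textTrack,
  ranges,
  onEnter,
  CueClass = globalThis.VTTCue
) {
  return ranges.map((range, index) => {
    // The cue text is left empty because only the events are needed.
    const cue = new CueClass(range.start, range.end, "");
    cue.addEventListener("enter", () => onEnter(index, range));
    textTrack.addCue(cue);
    return cue;
  });
}

// In a browser, with `track` being the <track> element created earlier:
// addCuesWithEnterCallback(track.track, [{ start: 1.5, end: 2.25 }], (i) => {
//   console.log(`Entered cue #${i}`);
// });
```

Returning the created cues keeps a reference around in case they need to be removed from the track later.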


While this WebVTT implementation is more native to the browser and technically offers better performance than listening to timeupdate and scanning through all time ranges every time, its precision is still limited by the browser’s WebVTT implementation. On Chrome and Chromium-based browsers, it does not guarantee perfect sync with the media, which can also be observed with ordinary WebVTT subtitles.
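
For contrast, the timeupdate-based approach mentioned above amounts to something like the following scan on every event. This is an illustrative baseline only: findActiveRangeIndex is a hypothetical name, and ranges is assumed to be an array of { start, end } objects in seconds.

```javascript
// Hypothetical baseline: linear scan for the range containing `currentTime`.
// Returns the index of the first matching range, or -1 if none matches.
function findActiveRangeIndex(ranges, currentTime) {
  return ranges.findIndex(
    (range) => currentTime >= range.start && currentTime < range.end
  );
}

// In a browser, this would run inside a timeupdate listener, for example:
// player.addEventListener("timeupdate", () => {
//   const index = findActiveRangeIndex(ranges, player.currentTime);
// });
```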

If you need more precise control and only work with short audio files, the Web Audio API might be a better choice.

If you are interested in a full example, and you are comfortable with TypeScript and React, here’s an example where I apply this WebVTT track-based tracking to a custom React hook in Lyricova Jukebox: hooks.ts.

