Loading audio text tracks programmatically

From the WebVTT API docs:

A text track and cues can be defined in a file using the WebVTT File
Format, and then associated with a particular <video> element using
the <track> element.

Alternatively you can add a TextTrack to a media element in JavaScript
using HTMLMediaElement.addTextTrack(), and then add individual VTTCue
objects to the track with TextTrack.addCue().
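
As I read it, the second approach from the docs would look roughly like the sketch below. The helper name `attachCues`, the track label and the plain `{ start, end, text }` cue shape are made up for illustration. The cue constructor is passed in as a parameter (defaulting to the browser's `VTTCue`) only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: adds a list of { start, end, text } objects to a media
// element as cues. `media` is anything with addTextTrack(), e.g. new Audio(src).
function attachCues(media, cues, CueCtor = globalThis.VTTCue) {
  // addTextTrack(kind, label, language) creates a new TextTrack on the element.
  const track = media.addTextTrack('metadata', 'cues', 'en');
  // 'hidden' keeps the track active (cue events fire) without rendering anything.
  track.mode = 'hidden';
  for (const { start, end, text } of cues) {
    // VTTCue takes (startTime, endTime, text), with times in seconds.
    track.addCue(new CueCtor(start, end, text));
  }
  return track;
}
```

In a browser this would be called as `attachCues(new Audio('clip.mp3'), [{ start: 0, end: 2.5, text: 'Hello' }])`, where `clip.mp3` is a placeholder file name.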

I have a VTT file and I’m creating the player with new Audio(src), with no <audio> element in the HTML. Is there a way to add a text track from a VTT file without an actual <audio> element in the page, or do I have to parse the VTT file myself and call addCue() for each cue defined in it?
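
To make the second option concrete, this is the kind of hand-rolled parsing I would like to avoid. It is only a sketch: `parseVtt` and `toSeconds` are names I made up, and it handles just the basic `HH:MM:SS.mmm --> HH:MM:SS.mmm` cue layout, ignoring cue settings, NOTE/STYLE blocks and the rest of the format.

```javascript
// Convert "MM:SS.mmm" or "HH:MM:SS.mmm" to seconds.
function toSeconds(stamp) {
  const parts = stamp.split(':').map(Number);
  // Fold from the left: ((h * 60) + m) * 60 + s
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Simplified parser: returns plain { start, end, text } objects.
function parseVtt(text) {
  const cues = [];
  // Normalise line endings; blocks are separated by one or more blank lines.
  const blocks = text.replace(/\r\n?/g, '\n').trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n');
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) continue; // header, NOTE, STYLE, etc.
    const [startStamp, rest] = lines[timingIndex].split('-->');
    const endStamp = rest.trim().split(/\s+/)[0]; // drop any cue settings
    cues.push({
      start: toSeconds(startStamp.trim()),
      end: toSeconds(endStamp),
      text: lines.slice(timingIndex + 1).join('\n'),
    });
  }
  return cues;
}
```

Each returned object would then map onto `new VTTCue(start, end, text)` followed by `track.addCue(...)`.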

I’m not using <audio> elements because I need to load about 20 audio files (each with its own VTT file) and play them in a specific order. Maybe I could have one <audio> element and reuse it?
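
Reusing one element could look something like this. It is a sketch under my own assumptions: `playInOrder` and its `onChange` hook are made-up names, and `audio` can be either an <audio> element or an object from new Audio().

```javascript
// Hypothetical helper: plays each URL in `sources` on the same media element,
// advancing to the next one when the 'ended' event fires.
function playInOrder(audio, sources, onChange = () => {}) {
  let index = 0;
  const playCurrent = () => {
    audio.src = sources[index];
    onChange(index); // hook for swapping in the cues that belong to this file
    // play() returns a promise in browsers; log a rejection (e.g. autoplay blocked).
    Promise.resolve(audio.play()).catch((err) => console.error(err));
  };
  audio.addEventListener('ended', () => {
    index += 1;
    if (index < sources.length) playCurrent();
  });
  if (sources.length > 0) playCurrent();
}
```

The `onChange` hook is where the previous file's cues could be dropped with `TextTrack.removeCue()` and the next file's cues added, so a single track is reused along with the single element.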