The content of a MediaStreamTrack is exposed as a stream that can be manipulated or used to generate new content.
Background
In the context of the Media Capture and Streams API, the MediaStreamTrack interface represents a single media track within a stream; typically, these are audio or video tracks, but other track types may exist.
MediaStream objects consist of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. A channel is the smallest unit of a media stream, such as the audio signal associated with a given speaker, like left or right in a stereo audio track.
What is insertable streams for MediaStreamTrack?
The core idea behind insertable streams for MediaStreamTrack is to expose the content of a MediaStreamTrack as a collection of streams (as defined by the WHATWG Streams API). These streams can be manipulated to introduce new components.
Giving developers direct access to the video (or audio) stream lets them apply modifications to the stream itself. In contrast, realizing the same video manipulation task with traditional methods requires developers to use intermediaries such as <canvas> elements. (For details of this type of process, see, for example, video + canvas = magic.)
Browser support
Insertable streams for MediaStreamTrack
is supported from Chrome 94.
Use cases
Use cases for insertable streams for MediaStreamTrack
include, but are not limited to:
- Video conferencing gadgets like "funny hats" or virtual backgrounds.
- Voice processing like software vocoders.
How to use insertable streams for MediaStreamTrack
Feature detection
You can feature-detect insertable streams for MediaStreamTrack
support as follows.
if ('MediaStreamTrackProcessor' in window && 'MediaStreamTrackGenerator' in window) {
  // Insertable streams for `MediaStreamTrack` is supported.
}
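The same check can also be wrapped in a small helper that takes the global object as a parameter, which can make it easier to reuse and to test. This is only a minimal sketch: the function name `supportsInsertableStreams` and its `scope` parameter are made up for illustration and are not part of the API.

```javascript
// Hypothetical helper (not part of the API): reports whether both
// interfaces are exposed on the given global scope.
function supportsInsertableStreams(scope = globalThis) {
  return (
    'MediaStreamTrackProcessor' in scope &&
    'MediaStreamTrackGenerator' in scope
  );
}

if (supportsInsertableStreams()) {
  // Insertable streams for `MediaStreamTrack` is supported.
} else {
  // Fall back to another approach, for example a <canvas>-based one.
}
```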
Core concepts
Insertable streams for MediaStreamTrack
builds on concepts previously proposed by
WebCodecs and conceptually splits the MediaStreamTrack
into two
components:
- The MediaStreamTrackProcessor, which consumes a MediaStreamTrack object's source and generates a stream of media frames, specifically VideoFrame or AudioFrame objects. You can think of this as a track sink that is capable of exposing the unencoded frames from the track as a ReadableStream.
- The MediaStreamTrackGenerator, which consumes a stream of media frames and exposes a MediaStreamTrack interface. It can be provided to any sink, just like a track from getUserMedia(). It takes media frames as input.
The MediaStreamTrackProcessor
A MediaStreamTrackProcessor object exposes one property, readable. It allows for reading the frames from the MediaStreamTrack. If the track is a video track, chunks read from readable will be VideoFrame objects. If the track is an audio track, chunks read from readable will be AudioFrame objects.
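As a rough illustration of reading from readable, the sketch below pulls a limited number of chunks from a stream with a reader and closes each one after use. The helper name `readFrames` and its `onFrame` callback are assumptions made for this example. The function relies only on the Streams API, so it works with any ReadableStream whose chunks have a close() method.

```javascript
// Hypothetical helper: reads up to `maxFrames` chunks from `readable`,
// passes each one to `onFrame`, then closes it to release its resources.
async function readFrames(readable, maxFrames, onFrame) {
  const reader = readable.getReader();
  let count = 0;
  try {
    while (count < maxFrames) {
      const { done, value: frame } = await reader.read();
      if (done) break; // The stream closed, so there is nothing left to read.
      try {
        onFrame(frame);
      } finally {
        frame.close(); // Frames hold media resources, so close them promptly.
      }
      count++;
    }
  } finally {
    // Cancel the stream so that no further frames are delivered.
    await reader.cancel();
  }
  return count;
}
```

With a real track, a call might look like `readFrames(trackProcessor.readable, 10, (frame) => console.log(frame.timestamp))`, where `trackProcessor` is assumed to be a MediaStreamTrackProcessor created from a video track.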
The MediaStreamTrackGenerator
A MediaStreamTrackGenerator object likewise exposes one property, writable, which is a WritableStream that allows writing media frames to the MediaStreamTrackGenerator, which is itself a MediaStreamTrack. If the kind attribute is "audio", the stream accepts AudioFrame objects and fails with any other type. If kind is "video", the stream accepts VideoFrame objects and fails with any other type. When a frame is written to writable, the frame's close() method is automatically invoked, so that its media resources are no longer accessible from JavaScript.
A MediaStreamTrackGenerator is a track for which a custom source can be implemented by writing media frames to its writable property.
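To make the idea of a custom source more concrete, the following sketch writes a sequence of frames to a WritableStream while respecting backpressure. The names `writeFrames` and `makeFrame` are hypothetical. In a browser, `makeFrame` might build a VideoFrame from a canvas, but that part is left to the caller here.

```javascript
// Hypothetical helper: writes `frameCount` frames produced by `makeFrame`
// to `writable`, waiting for the sink to be ready before each write.
async function writeFrames(writable, makeFrame, frameCount) {
  const writer = writable.getWriter();
  try {
    for (let i = 0; i < frameCount; i++) {
      await writer.ready; // Respect backpressure from the sink.
      // Do not reuse a frame after writing it: the generator closes it.
      await writer.write(makeFrame(i));
    }
  } finally {
    writer.releaseLock(); // Let other code obtain a writer later.
  }
}
```

For a video generator, a call could look like `writeFrames(trackGenerator.writable, (i) => new VideoFrame(canvas, { timestamp: i * 33333 }), 30)`, where `canvas` is an assumed canvas element and the timestamp is given in microseconds.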
Bringing it all together
The core idea is to create a processing chain as follows:
Platform Track → Processor → Transform → Generator → Platform Sinks
The example below illustrates this chain for a barcode scanner application that highlights detected barcodes in a live video stream.
// Request a camera stream and take its first video track.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = stream.getVideoTracks()[0];

const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' });

// `detectBarcodes()` and `highlightBarcodes()` are app-specific helpers
// that are assumed to be defined elsewhere.
const transformer = new TransformStream({
  async transform(videoFrame, controller) {
    const barcodes = await detectBarcodes(videoFrame);
    const newFrame = highlightBarcodes(videoFrame, barcodes);
    // Close the original frame to release its resources.
    videoFrame.close();
    controller.enqueue(newFrame);
  },
});

trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable);

// Show the original stream next to the processed one.
const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream;
const streamAfter = new MediaStream([trackGenerator]);
videoAfter.srcObject = streamAfter;
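A transform does not have to produce a frame for every input. As a further sketch of the same kind of chain, the transformer below forwards only every n-th frame and closes the ones it drops, which lowers the frame rate of the resulting track. The name `createFrameDropper` and its `keepEvery` parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: returns a TransformStream that forwards every
// `keepEvery`-th frame and closes (drops) all others.
function createFrameDropper(keepEvery) {
  let index = 0;
  return new TransformStream({
    transform(frame, controller) {
      if (index % keepEvery === 0) {
        controller.enqueue(frame); // Pass the frame downstream untouched.
      } else {
        frame.close(); // Release dropped frames right away.
      }
      index++;
    },
  });
}
```

It could take the place of a transformer in a chain from a MediaStreamTrackProcessor to a MediaStreamTrackGenerator, for example `processor.readable.pipeThrough(createFrameDropper(2)).pipeTo(generator.writable)`, where `processor` and `generator` are assumed instances. Passing an AbortSignal through the `signal` option of pipeTo() is one way to stop such a pipeline later.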
Demo
You can see the QR code scanner demo from the section above in action on a desktop or mobile browser. Hold a QR code in front of the camera and the app will detect it and highlight it. You can see the application's source code on Glitch.
Security and Privacy considerations
The security of this API relies on existing mechanisms in the web platform. Because data is exposed through the VideoFrame and AudioFrame interfaces, the rules those interfaces apply to origin-tainted data also apply here. For example, data from cross-origin resources cannot be accessed due to existing restrictions on accessing such resources (e.g., it is not possible to access the pixels of a cross-origin image or video element). In addition, access to media data from cameras, microphones, or screens is subject to user authorization. The media data this API exposes is already available through other APIs.
Feedback
The Chromium team wants to hear about your experiences with insertable streams for MediaStreamTrack.
Tell us about the API design
Is there something about the API that does not work as you expected? Or are there missing methods or properties that you need to implement your idea? Do you have a question or comment on the security model? File a spec issue on the corresponding GitHub repo, or add your thoughts to an existing issue.
Report a problem with the implementation
Did you find a bug with Chromium's implementation? Or is the implementation different from the spec?
File a bug at new.crbug.com. Be sure to include as much detail as you can,
simple instructions for reproducing, and enter Blink>MediaStream
in the Components box.
Glitch works great for sharing quick and easy repros.
Show support for the API
Are you planning to use insertable streams for MediaStreamTrack? Your public support helps the Chromium team prioritize features and shows other browser vendors how critical it is to support them.
Send a tweet to @ChromiumDev using the hashtag
#InsertableStreams
and let us know where and how you are using it.
Acknowledgements
The insertable streams for MediaStreamTrack
spec was written by
Harald Alvestrand and Guido Urdaneta.
This article was reviewed by Harald Alvestrand, Joe Medley,
Ben Wagner, Huib Kleinhout, and
François Beaufort. Hero image by
Chris Montgomery on
Unsplash.