Scroll and zoom a captured tab

François Beaufort

Elad Alon

Sharing tabs, windows, and screens is already possible on the web platform with the Screen Capture API. When a web app calls getDisplayMedia(), Chrome prompts the user to share a tab, window, or screen with the web app as a MediaStreamTrack video.

Many web apps that use getDisplayMedia() show the user a video preview of the captured surface. For example, video conferencing apps will often stream this video to remote users while also rendering it to a local HTMLVideoElement, so that the local user would constantly see a preview of what they're sharing.

This documentation introduces the new Captured Surface Control API in Chrome, which lets your web app scroll a captured tab, as well as read and write the zoom level of a captured tab.

A user scrolls and zooms a captured tab (demo).

Why use Captured Surface Control?

All video conferencing apps suffer from the same drawback: if the user wishes to interact with a captured tab or window, the user must switch to that surface, taking them away from the video conferencing app. This presents some challenges:

The user can't see the captured app and the videos of remote users at the same time unless they use Picture-in-Picture or separate side-by-side windows for the video conference tab and the shared tab. On a smaller screen, this could be difficult.
The user is burdened by the need to jump between the video conferencing app and the captured surface.
The user loses access to the controls exposed by the video conferencing app while they are away from it; for example, an embedded chat app, emoji reactions, notifications about users asking to join the call, multimedia and layout controls, and other useful video conferencing features.
The presenter cannot delegate control to remote participants. This leads to the all too familiar scenario where remote users ask the presenter to change the slide, scroll a bit up and down, or adjust the zoom level.

The Captured Surface Control API addresses these problems.

How do I use Captured Surface Control?

Using Captured Surface Control successfully requires a few steps, such as explicitly capturing a browser tab and gaining permission from the user before being able to scroll and zoom the captured tab.

Capture a browser tab

Start by prompting the user to choose a surface to share using getDisplayMedia(), and in the process, associate a CaptureController object with the capture session. We will be using that object to control the captured surface soon enough.

const controller = new CaptureController();
const stream = await navigator.mediaDevices.getDisplayMedia({ controller });

Next, produce a local preview of the captured surface in the form of a <video> element:

const previewTile = document.querySelector('video');
previewTile.srcObject = stream;

If the user chooses to share a window or a screen, that's out of scope for now—but if they chose to share a tab, we may proceed.

const [track] = stream.getVideoTracks();

if (track.getSettings().displaySurface !== 'browser') {
  // Bail out early if the user didn't pick a tab.
  return;
}

Permission prompt

The first invocation of either sendWheel() or setZoomLevel() on a given CaptureController object produces a permission prompt. If the user grants permission, further invocations of these methods on that CaptureController object are allowed. If the user denies permission, the returned promise is rejected.

Note that CaptureController objects are uniquely associated with a specific capture-session, cannot be associated with another capture-session, and don't survive navigation of the page where they are defined. However, capture-sessions do survive the navigation of the captured page.

A user gesture is required to show a permission prompt to the user. Only sendWheel() and setZoomLevel() calls require a user gesture, and only if the prompt needs to be shown. If the user clicks a zoom-in or zoom-out button in the web app, that user gesture is a given; but if the app wishes to offer scroll-control first, developers should keep in mind that scrolling does not constitute a user gesture. One possibility is to first offer the user a "start scrolling" button, as per the following example:

const startScrollingButton = document.querySelector('button');

startScrollingButton.addEventListener('click', async () => {
  try {
    const noOpWheelAction = {};

    await controller.sendWheel(noOpWheelAction);
    // The user approved the permission prompt.
    // You can now scroll and zoom the captured tab as shown later in the article.
  } catch (error) {
    return; // Permission denied. Bail.
  }
});

Scroll

Using sendWheel(), a capturing app can deliver wheel events of its chosen magnitude over coordinates of its choosing within a tab's viewport. The event is indistinguishable to the captured app from direct user interaction.

Assuming the capturing app employs a <video> element called "previewTile", the following code shows how to relay send wheel events to the captured tab:

const previewTile = document.querySelector('video');

previewTile.addEventListener('wheel', async (event) => {
  // Translate the offsets into coordinates which sendWheel() can understand.
  // The implementation of this translation is further explained below.
  const [x, y] = translateCoordinates(event.offsetX, event.offsetY);
  const [wheelDeltaX, wheelDeltaY] = [-event.deltaX, -event.deltaY];

  try {
    // Relay the user's action to the captured tab.
    await controller.sendWheel({ x, y, wheelDeltaX, wheelDeltaY });
  } catch (error) {
    // Inspect the error.
    // ...
  }
});

The method sendWheel() takes a dictionary with two sets of values:

x and y: the coordinates where the wheel event is to be delivered.
wheelDeltaX and wheelDeltaY: the magnitudes of the scrolls, in pixels, for horizontal and vertical scrolls, respectively. Note that these values are inverted compared to the original wheel event.

A possible implementation of translateCoordinates() is:

function translateCoordinates(offsetX, offsetY) {
  const previewDimensions = previewTile.getBoundingClientRect();
  const trackSettings = previewTile.srcObject.getVideoTracks()[0].getSettings();

  const x = trackSettings.width * offsetX / previewDimensions.width;
  const y = trackSettings.height * offsetY / previewDimensions.height;

  return [Math.floor(x), Math.floor(y)];
}

Note that there are three different sizes in play in the code earlier:

The size of the <video> element.
The size of the captured frames (represented here as trackSettings.width and trackSettings.height).
The size of the tab.

The size of the <video> element is fully within the domain of the capturing app, and unknown to the browser. The size of the tab is fully within the domain of the browser, and unknown to the web app.

The web app uses translateCoordinates() to translate the offsets relative to the <video> element into coordinates within the video track's own coordinates space. The browser will likewise translate between the size of the captured frames and the size of the tab, and deliver the scroll event at an offset corresponding to the expectation of the web app.

The promise returned by sendWheel() can be rejected in the following cases:

If the capture session has not yet started or has already stopped, including stopping asynchronously while the sendWheel() action is handled by the browser.
If the user did not grant the app permission to use sendWheel().
If the capturing app attempts to deliver a scroll event in coordinates that are outside of [trackSettings.width, trackSettings.height]. Note that these values could change asynchronously, so it's a good idea to catch the error and ignore it. (Note that 0, 0 wouldn't normally be out of bounds, so it's safe to use them to prompt the user for permission.)

Zoom

Interacting with the zoom level of the captured tab is done through the following CaptureController surfaces:

getSupportedZoomLevels() returns a list of zoom levels supported by the browser, represented as percentages of the "default zoom level", which is defined as 100%. This list is monotonically increasing and contains the value 100.
getZoomLevel() returns the current zoom level of the tab.
setZoomLevel() sets the zoom-level of the tab to any integer value present in getSupportedZoomLevels(), and returns a promise when it succeeds. Note that the zoom level is not reset at the end of the capture session.
oncapturedzoomlevelchange lets you listen to a captured tab's zoom level changes as users may change the zoom level either through the capturing app, or through direct interaction with the captured tab.

Calls to setZoomLevel() are gated by permission; calls to the other, read-only zoom methods are "free", as is listening to events.

Important: getSupportedZoomLevels() results can vary across browsers and browser versions. Avoid hard-coding these values to ensure your web app works reliably.

The following example shows you to increase the zoom level of a captured tab in an existing capture session:

const zoomIncreaseButton = document.getElementById('zoomInButton');

zoomIncreaseButton.addEventListener('click', async (event) => {
  const levels = CaptureController.getSupportedZoomLevels();
  const index = levels.indexOf(controller.getZoomLevel());
  const newZoomLevel = levels[Math.min(index + 1, levels.length - 1)];

  try {
    await controller.setZoomLevel(newZoomLevel);
  } catch (error) {
    // Inspect the error.
    // ...
  }
});

The following example shows you to react to zoom level changes of a captured tab:

controller.addEventListener('capturedzoomlevelchange', (event) => {
  const zoomLevel = controller.getZoomLevel();
  document.querySelector('#zoomLevelLabel').textContent = `${zoomLevel}%`;
});

Feature detection

To check if sending wheel events is supported, use:

if (!!window.CaptureController?.prototype.sendWheel) {
  // CaptureController sendWheel() is supported.
}

To check if controlling zoom is supported, use:

if (!!window.CaptureController?.prototype.setZoomLevel) {
  // CaptureController setZoomLevel() is supported.
}

Enable Captured Surface Control

The Captured Surface Control API is available in Chrome on desktop behind the Captured Surface Control flag, and can be enabled at chrome://flags/#captured-surface-control.

This feature is also entering an origin trial starting with Chrome 122 on desktop, which allows developers to enable the feature for visitors to their sites to collect data from real users. See Get started with origin trials for more information on origin trials and how they work.

Security and privacy

The "captured-surface-control" permission policy lets you manage how your capturing app and embedded third-party iframes have access to Captured Surface Control. To understand the security tradeoffs, check out the Privacy and Security Considerations section of the Captured Surface Control explainer.

Demo

You can play with Captured Surface Control by running the demo on Glitch. Be sure to check out the source code.

Changes from previous versions of Chrome

Here are some key behavioral differences about Captured Surface Control that you should be aware of:

In Chrome 124 and before:
- The permission—if granted—is scoped to the capture session associated with that CaptureController, not to the capturing origin.
In Chrome 122:
- getZoomLevel() returns a promise with the current zoom level of the tab.
- sendWheel() returns a promise rejected with the error message "No permission." if the user did not grant the app permission to use. The error type is "NotAllowedError" in Chrome 123 and later.
- oncapturedzoomlevelchange is not available. You can polyfill this feature using setInterval().

Feedback

The Chrome team and the web standards community want to hear about your experiences with Captured Surface Control.

Tell us about the design

Is there something about Captured Surface Capture that doesn't work as you expected? Or are there missing methods or properties that you need to implement your idea? Have a question or comment on the security model? File a spec issue on the GitHub repo, or add your thoughts to an existing issue.

Problem with the implementation?

Did you find a bug with Chrome's implementation? Or is the implementation different from the spec? File a bug at https://new.crbug.com. Be sure to include as much detail as you can, as well as instructions for reproducing. Glitch works great for sharing reproducible bugs.

Scroll and zoom a captured tab

Why use Captured Surface Control?

How do I use Captured Surface Control?

Capture a browser tab

Permission prompt

Scroll

Zoom

Feature detection

Enable Captured Surface Control

Security and privacy

Demo

Changes from previous versions of Chrome

Feedback

Tell us about the design

Problem with the implementation?

Helpful links