What's New in WebGPU (Chrome 120)

François Beaufort
François Beaufort

Support for 16-bit floating-point values in WGSL

In WGSL, the f16 type is the set of 16-bit floating-point values of the IEEE-754 binary16 (half precision) format. It means that it uses 16 bits to represent a floating-point number, as opposed to 32 bits for conventional single-precision floating-point (f32). This smaller size can lead to significant performance improvements, especially when processing large amounts of data.

For comparison, on an Apple M1 Pro device, the f16 implementation of Llama2 7B models used in the WebLLM chat demo is significantly faster than the f32 implementation, with a 28% improvement in prefill speed and a 41% improvement in decoding speed as shown in the following screenshots.

Screenshot of WebLLM chat demos with f32 and f16 Llama2 7B models.
WebLLM chat demos with f32 (left) and f16 (right) Llama2 7B models.

Not all GPUs support 16-bit floating-point values. When the "shader-f16" feature is available in a GPUAdapter, you can now request a GPUDevice with this feature and create a WGSL shader module that takes advantage of the half-precision floating-point type f16. This type is valid to use in the WGSL shader module only if you enable the f16 WGSL extension with enable f16;. Otherwise, createShaderModule() will generate a validation error. See the following minimal example and issue dawn:1510.

const adapter = await navigator.gpu.requestAdapter();
if (!adapter.features.has("shader-f16")) {
  throw new Error("16-bit floating-point value support is not available");
}
// Explicitly request 16-bit floating-point value support.
const device = await adapter.requestDevice({
  requiredFeatures: ["shader-f16"],
});

const code = `
  enable f16;

  @compute @workgroup_size(1)
  fn main() {
    const c : vec3h = vec3<f16>(1.0h, 2.0h, 3.0h);
  }
`;

const shaderModule = device.createShaderModule({ code });
// Create a compute pipeline with this shader module
// and run the shader on the GPU...

It's possible to support both f16 and f32 types in the WGSL shader module code with an alias depending on the "shader-f16" feature support as shown in the following snippet.

const adapter = await navigator.gpu.requestAdapter();
const hasShaderF16 = adapter.features.has("shader-f16");

const device = await adapter.requestDevice({
  requiredFeatures: hasShaderF16 ? ["shader-f16"] : [],
});

const header = hasShaderF16
  ? `enable f16;
     alias min16float = f16;`
  : `alias min16float = f32;`;

const code = `
  ${header}

  @compute @workgroup_size(1)
  fn main() {
    const c = vec3<min16float>(1.0, 2.0, 3.0);
  }
`;

Push the limits

The maximum number of bytes necessary to hold one sample (pixel or subpixel) of render pipeline output data, across all color attachments, is 32 bytes by default. It is now possible to request up to 64 by using the maxColorAttachmentBytesPerSample limit. See the following example and issue dawn:2036.

const adapter = await navigator.gpu.requestAdapter();

if (adapter.limits.maxColorAttachmentBytesPerSample < 64) {
  // When the desired limit isn't supported, take action to either fall back to
  // a code path that does not require the higher limit or notify the user that
  // their device does not meet minimum requirements.
}

// Request highest limit of max color attachments bytes per sample.
const device = await adapter.requestDevice({
  requiredLimits: { maxColorAttachmentBytesPerSample: 64 },
});

The maxInterStageShaderVariables and maxInterStageShaderComponents limits used for inter-stage communication have been increased on all platforms. See issue dawn:1448 for details.

For each shader stage, the maximum number of bind group layout entries across a pipeline layout which are storage buffers is 8 by default. It is now possible to request up to 10 by using the maxStorageBuffersPerShaderStage limit. See issue dawn:2159.

A new maxBindGroupsPlusVertexBuffers limit has been added. It consists of the maximum number of bind group and vertex buffer slots used simultaneously, counting any empty slots below the highest index. Its default value is 24. See issue dawn:1849.

Changes to depth-stencil state

To improve the developer experience, the depth-stencil state depthWriteEnabled and depthCompare attributes are not always required anymore: depthWriteEnabled is required only for formats with depth, and depthCompare is not required for formats with depth if not used at all. See issue dawn:2132.

Adapter information updates

Non-standard type and backend adapter info attributes are now available upon calling requestAdapterInfo() when the user has enabled the "WebGPU Developer Features" flag at chrome://flags/#enable-webgpu-developer-features. The type can be "discrete GPU", "integrated GPU", "CPU", or "unknown". The backend is either "WebGPU", "D3D11", "D3D12", "metal", "vulkan", "openGL", "openGLES", or "null". See issue dawn:2112 and issue dawn:2107.

Screenshot of https://webgpureport.org featuring backend and type in adapter info.
Adapter info backend and type shown on https://webgpureport.org.

The optional unmaskHints list parameter in requestAdapterInfo() has been removed. See issue dawn:1427.

Timestamp queries quantization

Timestamp queries allow applications to measure the execution time of GPU commands with nanosecond precision. However, the WebGPU specification makes timestamp queries optional due to timing attack concerns. The Chrome team believes that quantizing timestamp queries provides a good compromise between precision and security, by reducing the resolution to 100 microseconds. See issue dawn:1800.

In Chrome, users can disable timestamp quantization by enabling the "WebGPU Developer Features" flag at chrome://flags/#enable-webgpu-developer-features. Note that this flag alone does not enable the "timestamp-query" feature. Its implementation is still experimental and therefore requires the "Unsafe WebGPU Support" flag at chrome://flags/#enable-unsafe-webgpu.

In Dawn, a new device toggle called "timestamp_quantization" has been added and is enabled by default. The following snippet shows you how to allow the experimental "timestamp-query" feature with no timestamp quantization when requesting a device.

wgpu::DawnTogglesDescriptor deviceTogglesDesc = {};

const char* allowUnsafeApisToggle = "allow_unsafe_apis";
deviceTogglesDesc.enabledToggles = &allowUnsafeApisToggle;
deviceTogglesDesc.enabledToggleCount = 1;

const char* timestampQuantizationToggle = "timestamp_quantization";
deviceTogglesDesc.disabledToggles = &timestampQuantizationToggle;
deviceTogglesDesc.disabledToggleCount = 1;

wgpu::DeviceDescriptor desc = {.nextInChain = &deviceTogglesDesc};

// Request a device with no timestamp quantization.
myAdapter.RequestDevice(&desc, myCallback, myUserData);

Spring-cleaning features

The experimental "timestamp-query-inside-passes" feature has been renamed to "chromium-experimental-timestamp-query-inside-passes" to make it clear to developers that this feature is experimental and available only in Chromium-based browsers for now. See issue dawn:1193.

The experimental "pipeline-statistics-query" feature, which was only partially implemented, has been removed because it is no longer being developed. See issue chromium:1177506.

This covers only some of the key highlights. Check out the exhaustive list of commits.

What's New in WebGPU

A list of everything that has been covered in the What's New in WebGPU series.

Chrome 124

Chrome 123

Chrome 122

Chrome 121

Chrome 120

Chrome 119

Chrome 118

Chrome 117

Chrome 116

Chrome 115

Chrome 114

Chrome 113