Language detection in Chrome with built-in AI

Published: September 24, 2024

Before translating text from one language to another, you must first determine which language the text is written in. Previously, this required uploading the text to a cloud service. With on-device inference, the text doesn't have to leave the user's device, which improves privacy. While it's possible to ship a library that does this, it would require additional resources to download.

The Language Detection API proposal aims to solve this challenge by fine-tuning a model for this task, with an API built into the browser.

Example use cases

The Language Detection API is primarily useful in the following scenarios:

  • Determine the language of input text, so it can be translated.
  • Determine the language of input text, so the correct model can be loaded for language-specific tasks, such as toxicity detection.
  • Determine the language of input text, so it can be labeled correctly, for example, in online social networking sites.
  • Determine the language of input text, so an app's interface can be adjusted accordingly. For example, a Belgian site could show only the interface relevant to users who speak French.

Use the Language Detection API

The Language Detection API is part of the larger Translation API family. First, run feature detection to see if the browser supports the Language Detection API.

if ('translation' in self && 'canDetect' in self.translation) {
  // The Language Detection API is available.
}

Model download

Language detection depends on a model that is fine-tuned for the specific task of detecting languages. While the API is built into the browser, the model is downloaded on demand the first time a site tries to use the API. In Chrome, this model is very small compared with other models. In fact, it might already be present, given that this model is also used by Chrome browser features.

To see if the model is ready to use, call the asynchronous translation.canDetect() function. There are three possible responses:

  • 'no': The current browser supports the Language Detection API, but it can't be used at the moment, for example, because there isn't enough free disk space to download the model.
  • 'readily': The current browser supports the Language Detection API, and it can be used right away.
  • 'after-download': The current browser supports the Language Detection API, but it needs to download the model first.
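
The three responses can be mapped to what an app does next. The following is a minimal sketch, not part of the API: describeAvailability() and the strings it returns are hypothetical names chosen for illustration. Only the three response values come from the API described above.

```javascript
// Hypothetical helper: maps a canDetect() response to a next step.
// Only 'no', 'readily', and 'after-download' come from the API;
// the returned labels are made up for this example.
function describeAvailability(canDetect) {
  switch (canDetect) {
    case 'readily':
      // The model is present, so a detector can be created right away.
      return 'create-now';
    case 'after-download':
      // The model must be downloaded first, so show progress to the user.
      return 'create-and-show-progress';
    case 'no':
      // The API can't be used at the moment, so use another approach.
      return 'fall-back';
    default:
      // Unknown value: treat it like 'no' to stay on the safe side.
      return 'fall-back';
  }
}

console.log(describeAvailability('after-download'));
// create-and-show-progress
```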

To trigger the download and instantiate the language detector, call the asynchronous translation.createDetector() function. If the response to canDetect() was 'after-download', it's best practice to listen for download progress, so you can inform the user in case the download takes time.

The following example demonstrates how to initialize the language detector. Because it uses await and return, it's assumed to run inside an async function.

const canDetect = await translation.canDetect();
let detector;
if (canDetect === 'no') {
  // The language detector isn't usable.
  return;
}
if (canDetect === 'readily') {
  // The language detector can immediately be used.
  detector = await translation.createDetector();
} else {
  // The language detector can be used after model download.
  detector = await translation.createDetector();
  detector.addEventListener('downloadprogress', (e) => {
    console.log(e.loaded, e.total);
  });
  await detector.ready;
}
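
The same steps can be wrapped in a reusable function. This is a sketch under stated assumptions rather than a definitive implementation: initLanguageDetector() and its onProgress callback are hypothetical names, and the translationApi parameter is expected to behave like the translation object used in this article.

```javascript
// Hypothetical wrapper around the initialization steps shown above.
// `translationApi` should look like the `translation` object from this
// article; `onProgress` is an optional callback invented for this sketch.
async function initLanguageDetector(translationApi, onProgress = () => {}) {
  const canDetect = await translationApi.canDetect();
  if (canDetect === 'no') {
    // The language detector isn't usable; let the caller fall back.
    return null;
  }
  const detector = await translationApi.createDetector();
  if (canDetect === 'after-download') {
    // Report download progress, then wait until the model is ready.
    detector.addEventListener('downloadprogress', (e) => {
      onProgress(e.loaded, e.total);
    });
    await detector.ready;
  }
  return detector;
}
```

In a supporting browser, this could be called as initLanguageDetector(self.translation, (loaded, total) => console.log(loaded, total)). A null result means the caller needs another way to determine the language.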

Run the language detector

The Language Detection API uses a ranking model to determine which language is most likely used in a given piece of text. Ranking is a type of machine learning, where the objective is to order a list of items. In this case, the Language Detection API ranks languages from highest to lowest probability.

You can use the result of the detect() function in two ways: take the first entry, which is the likeliest answer, or iterate over the ranked candidates along with their confidence levels. The result is a list of {detectedLanguage, confidence} objects. The confidence level is expressed as a value between 0.0 (lowest confidence) and 1.0 (highest confidence).

const someUserText = 'Hallo und herzlich willkommen!';
const results = await detector.detect(someUserText);
for (const result of results) {
  // Show the full list of potential languages with their likelihood, ranked
  // from most likely to least likely. In practice, one would pick the top
  // language(s) that cross a high enough threshold.
  console.log(result.detectedLanguage, result.confidence);
}
// (Output truncated):
// de 0.9993835687637329
// en 0.00038279531872831285
// nl 0.00010798392031574622
// ...
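
Picking the top language only when it crosses a threshold could look like the following sketch. pickLanguage() is a hypothetical helper, not part of the API, and the default threshold of 0.5 is an arbitrary value chosen for illustration. It assumes the results are ranked from most likely to least likely, as described above.

```javascript
// Hypothetical helper: returns the most likely language if its confidence
// meets the threshold, otherwise null. Assumes `results` is ranked from
// most to least likely. The 0.5 default is an arbitrary example value.
function pickLanguage(results, minConfidence = 0.5) {
  const [top] = results;
  if (!top || top.confidence < minConfidence) {
    return null;
  }
  return top.detectedLanguage;
}

// Sample data copied from the truncated output above.
const sample = [
  { detectedLanguage: 'de', confidence: 0.9993835687637329 },
  { detectedLanguage: 'en', confidence: 0.00038279531872831285 },
  { detectedLanguage: 'nl', confidence: 0.00010798392031574622 },
];
console.log(pickLanguage(sample)); // de
console.log(pickLanguage(sample, 0.9999)); // null
console.log(pickLanguage([])); // null
```

Returning null instead of a low-confidence guess lets the calling code decide what to do, such as asking the user to choose a language.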

Demo

Preview the Language Detection API in our demo. Enter text written in different languages in the textarea.

Sign up for the origin trial

Register for the Language Detector API trial to start testing this API with your users. This origin trial runs from Chrome 130 to 135.

Learn more about how origin trials work.

Standardization effort

The Language Detection API was moved to the W3C Web Incubator Community Group after the corresponding proposal received enough support. The API is part of a larger Translation API proposal. The Chrome team requested feedback from the W3C Technical Architecture Group and asked Mozilla and WebKit for their standards positions.

Share your feedback

If you have feedback on Chrome's implementation, file a Chromium bug. Share your feedback on the API shape of the Language Detection API by commenting on an existing issue or opening a new one in the Translation API GitHub repository.

Resources