CrUX methodology

This section documents how CrUX collects and organizes user experience data.

Eligibility

At the core of the CrUX dataset are individual user experiences, which are aggregated into page-level and origin-level distributions. This section documents user eligibility and the requirements for pages and origins to be included in the dataset. All three eligibility criteria, User, Origin, and Page, must be satisfied for an experience to be included in the page-level data available in PageSpeed Insights and the CrUX API. Experiences that meet the User and Origin criteria but not the Page criteria are only included in the origin-level data available in all CrUX data sources.

Pages and origins are automatically included in or removed from the dataset as their eligibility changes over time. At this time, you cannot manually submit pages or origins for inclusion.

Publicly discoverable

A page must be publicly discoverable to be considered for inclusion in the CrUX dataset.

A page is determined to be publicly discoverable using the same indexability criteria as search engines.

A page, including the root page used for the origin-level dataset, does not meet the discoverability requirement if any of the following conditions apply:

  • The page is served with an HTTP status code other than 200 (after redirects).
  • The page is served with an HTTP X-Robots-Tag: noindex header or equivalent.
  • The document includes a <meta name="robots" content="noindex"> meta tag or equivalent.

Refer to Google Search Console for an overview of your site's indexing status.
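
As a rough illustration, the sketch below checks those three conditions for a single page. It assumes the third-party requests package and a placeholder URL, and it only approximates the indexability checks that search engines actually apply.

  # Rough sketch: check the three discoverability conditions for one page.
  # Assumes the third-party "requests" package; the URL is a placeholder.
  import re
  import requests

  def is_publicly_discoverable(url: str) -> bool:
      resp = requests.get(url, allow_redirects=True, timeout=10)

      # 1. The final response (after redirects) must be HTTP 200.
      if resp.status_code != 200:
          return False

      # 2. No noindex directive in the X-Robots-Tag response header.
      if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
          return False

      # 3. No noindex directive in a <meta name="robots"> tag.
      #    (A real crawler parses the HTML; a regex is enough for a sketch.)
      meta = re.search(
          r'<meta[^>]*name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
          resp.text, re.IGNORECASE)
      if meta and "noindex" in meta.group(1).lower():
          return False

      return True

  print(is_publicly_discoverable("https://www.example.com/"))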

Sufficiently popular

A page is determined to be sufficiently popular if it has a minimum number of visitors. An origin is determined to be sufficiently popular if it has a minimum number of visitors across all of its pages. An exact number is not disclosed, but it has been chosen to ensure that we have enough samples to be confident in the statistical distributions for included pages. The minimum number is the same for pages and origins.

Pages and origins that don't meet the popularity threshold are not included in the CrUX dataset.

Origin

An origin represents an entire website, addressable by a URL like https://www.example.com. For an origin to be included in the CrUX dataset it must meet two requirements:

  1. Publicly discoverable
  2. Sufficiently popular

You can verify that your origin is discoverable by running a Lighthouse audit and looking at the SEO category results. Your site is not discoverable if your root page fails the "Page is blocked from indexing" or "Page has unsuccessful HTTP status code" audits.

If an origin is determined to be publicly discoverable, eligible user experiences on all of that origin's pages are aggregated at the origin level, regardless of individual page discoverability. All of these experiences count towards the origin's popularity requirement.

For querying purposes, note that all origins in the CrUX dataset are lowercase.
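
For instance, when querying by origin it's safest to lowercase the value first; whether a given data source normalizes case for you isn't guaranteed. A minimal sketch, assuming the requests package and a placeholder API key for the CrUX API:

  # Minimal sketch: normalize an origin to lowercase before querying the
  # CrUX API. "YOUR_API_KEY" is a placeholder.
  import requests

  API_KEY = "YOUR_API_KEY"
  ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

  def query_origin(origin: str) -> dict:
      # Origins in the CrUX dataset are lowercase, so normalize up front.
      resp = requests.post(ENDPOINT, params={"key": API_KEY},
                           json={"origin": origin.lower()})
      resp.raise_for_status()
      return resp.json()

  record = query_origin("https://WWW.Example.com")  # queried as https://www.example.com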

Page

The requirements for a page to be included in the CrUX dataset are the same as for origins:

  1. Publicly discoverable
  2. Sufficiently popular

You can verify that a page is discoverable by running a Lighthouse audit and looking at the SEO category results. Your page is not discoverable if it fails the "Page is blocked from indexing" or "Page has unsuccessful HTTP status code" audits.

If a page is publicly discoverable for some users but returns a non-success HTTP status code in other circumstances, then the experiences that receive the non-success status aren't included in CrUX.

Pages commonly have additional identifiers in their URL, including query string parameters like ?utm_medium=email and fragments like #main. These identifiers are stripped from the URL in the CrUX dataset so that all user experiences on the page are aggregated together. This is useful for pages that would otherwise not meet the popularity threshold if there were many disjointed URL variations for the same page. Note that in rare cases this may unexpectedly group experiences for distinct pages together: for example, if the parameters ?productID=101 and ?productID=102 represent different pages.
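
A minimal sketch of that kind of URL normalization, using Python's standard urllib.parse (illustrative only, not CrUX's internal implementation):

  # Sketch: strip the query string and fragment so URL variants of the same
  # page aggregate under one key. Illustrative only.
  from urllib.parse import urlsplit, urlunsplit

  def normalize_page_url(url: str) -> str:
      parts = urlsplit(url)
      # Keep scheme, host, and path; drop the query string and fragment.
      return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

  assert normalize_page_url(
      "https://www.example.com/page?utm_medium=email#main"
  ) == "https://www.example.com/page"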

Pages in CrUX are measured based on the top-level page. Pages included as iframes are not reported on separately in CrUX, but do contribute to the metrics of the top-level page. For example, if https://www.example.com/page.html embeds https://www.example.com/frame.html in an iframe, then page.html will be represented in CrUX (subject to the other eligibility criteria) but frame.html will not. And if frame.html has poor CLS, that CLS is included when measuring the CLS for page.html. CrUX is the Chrome User Experience Report, and a user may not even be aware that an iframe is involved; the experience is therefore measured at the top-level page, as the user sees it.

A website's architecture may complicate how its data is represented in CrUX. For example, single page apps (SPAs) may use a JavaScript-based route transition scheme to move between pages, as opposed to conventional anchor-based page navigations. These transitions appear as new page views to the user, but to Chrome and the underlying platform APIs the entire experience is attributed to the initial page view. This is a limitation of the web platform APIs on which CrUX is built; see How SPA architectures affect Core Web Vitals on web.dev for more information.

User

For a user to have their experiences aggregated in the CrUX dataset, they must meet the following criteria:

  1. Enable usage statistic reporting.
  2. Sync their browser history.
  3. Not have a Sync passphrase set.
  4. Use a supported platform.

The current supported platforms are:

  • Desktop versions of Chrome, including Windows, macOS, ChromeOS, and Linux operating systems.
  • Android versions of Chrome, including mobile apps using Custom Tabs and WebAPKs.

There are a few notable exceptions that don't provide data to the CrUX dataset:

  • Chrome on iOS.
  • Android apps using WebView.
  • Other Chromium browsers (for example Microsoft Edge).

Chrome does not publish data about the proportions of users that meet these criteria. You can learn more about the data we collect in the Chrome Privacy Whitepaper.

Accelerated Mobile Pages (AMP)

Pages built with AMP are included in the CrUX dataset like any other web page. As of the June 2020 CrUX release, pages served using the AMP Cache and/or rendered in the AMP Viewer are also captured and attributed to the publisher's page URL.

Data quality

Data in CrUX undergoes a small amount of processing to ensure that it is statistically accurate, well structured, and easy to query.

Filtering

The CrUX dataset is filtered to ensure that the presented data is statistically valid. This may exclude entire pages or origins from appearing in the dataset.

In addition to the eligibility criteria applied to origins and pages, further filtering is applied for segments within the data:

  • Origins or pages having more than 20% of their total traffic excluded due to ineligible combinations of dimensions are excluded entirely from the dataset.
  • Because the global-level dataset encompasses user experiences from all countries, combinations of dimensions that don't meet the popularity criteria at the country level may still be included at the global level, provided that there is sufficient popularity.
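
As a rough sketch of the first rule (the actual pipeline isn't public), an origin or page is kept only while the excluded share of its traffic stays at or below 20%:

  # Rough sketch of the 20% filtering rule; the real pipeline isn't public.
  def keep_in_dataset(eligible_samples: int, excluded_samples: int) -> bool:
      total = eligible_samples + excluded_samples
      if total == 0:
          return False
      # Drop the origin or page entirely if more than 20% of its traffic
      # was excluded due to ineligible combinations of dimensions.
      return excluded_samples / total <= 0.20

  print(keep_in_dataset(eligible_samples=900, excluded_samples=100))  # True
  print(keep_in_dataset(eligible_samples=700, excluded_samples=300))  # False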

Fuzzing

A small amount of randomness is applied to the dataset to prevent reverse-engineering of sensitive data, such as total traffic volumes. This does not affect the accuracy of aggregate statistics.
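
Purely as an illustration of the idea (CrUX does not publish its actual fuzzing mechanism), the toy sketch below perturbs hypothetical raw sample counts with a small amount of random noise before computing fractions, so exact volumes can't be recovered while the published distribution barely changes:

  # Toy illustration only: CrUX does not publish its fuzzing mechanism.
  import random

  def fuzz_counts(counts: list[int], jitter: float = 0.01) -> list[float]:
      # Perturb each raw count by up to +/-1% so exact traffic volumes
      # can't be reverse-engineered from the published fractions.
      return [c * (1 + random.uniform(-jitter, jitter)) for c in counts]

  raw = [4200, 2800, 1000]          # hypothetical per-bin sample counts
  fuzzed = fuzz_counts(raw)
  total = sum(fuzzed)
  print([round(c / total, 4) for c in fuzzed])
  # Close to the unfuzzed fractions [0.525, 0.35, 0.125]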

Precision

Most metric values within the CrUX dataset are represented as histograms, where each bin's value (its density) is the fraction of all included user experiences falling within that bin, and the densities for a metric sum to 1. Densities are floating point numbers between 1.0 and 0.0001.

Histogram bin widths are normalized to simplify querying and visualizing the data. This means that larger bins may be split into smaller bins, which equally share the original density in order to maintain consistent bin widths.
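
For example, a bin spanning a wider range can be divided into equal-width sub-bins that each take an equal share of the original density. A minimal sketch of that split (illustrative, not CrUX's internal code):

  # Sketch: split one histogram bin into equal-width sub-bins that share its
  # density equally, leaving the total density unchanged. Illustrative only.
  def split_bin(start: float, end: float, density: float, width: float) -> list[dict]:
      n = int((end - start) / width)
      return [{"start": start + i * width,
               "end": start + (i + 1) * width,
               "density": density / n}
              for i in range(n)]

  # A 1000-3000 ms bin with density 0.3, normalized to 1000 ms wide bins:
  print(split_bin(1000, 3000, 0.3, 1000))
  # [{'start': 1000, 'end': 2000, 'density': 0.15},
  #  {'start': 2000, 'end': 3000, 'density': 0.15}]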

License

CrUX datasets by Google are licensed under a Creative Commons Attribution 4.0 International License.