
Encourage useful product reviews with client-side web AI

Maud Nalpas

Kenji Baheux

Alexandra Klepper

Published: May 16, 2024

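Positive and negative reviews both help buyers make a purchase decision.

According to external research, 82% of online shoppers actively look for negative reviews before buying. Negative reviews are useful to customers and businesses alike: they can help reduce return rates and help makers improve their products.

Here are a few ways to improve review quality: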

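Check each review for toxicity before it's submitted. We can encourage users to remove offensive language and other unhelpful remarks, so that their review best helps other users make a better purchase decision.
- Negative: This bag sucks, and I hate it.
- Negative with useful feedback: The zippers are very stiff and the material feels cheap. I returned this bag.
Auto-generate a rating based on the language used in the review.
Determine whether the review is negative or positive.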

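Figure: Screenshot of an example review with a sentiment and a star rating. In this example, the reviewer's comment is given a positive sentiment and a five-star rating.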

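Ultimately, the user should have the final word on the product rating.

The following codelab offers client-side solutions, on-device and in the browser. No AI development knowledge, servers, or API keys are required.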

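Prerequisites

While server-side AI solutions (such as the Gemini API or OpenAI API) offer robust solutions for many applications, this guide focuses on client-side web AI. Client-side AI inference runs in the browser, which improves the experience for web users by removing server round trips.

In this codelab, we use a mix of techniques to show you what's in your toolbox for client-side AI.

We use the following libraries and models:

TensorFlow.js for toxicity analysis. TensorFlow.js is an open source machine learning library for both inference and training on the web.
Transformers.js for sentiment analysis. Transformers.js is a web AI library from Hugging Face.
Gemma 2B for star ratings. Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. To run Gemma in the browser, we use it with MediaPipe's experimental LLM Inference API.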

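UX and safety considerations

To ensure the best user experience and safety, keep the following in mind:

Allow users to edit the rating. Ultimately, the user should have the final word on the product rating.
Make it clear to the user that the rating and the review feedback are automatically generated.
Allow users to post a review that was classified as toxic, but run a second check on the server. This avoids the frustrating case where a non-toxic review is mistakenly classified as toxic (a false positive). It also covers cases where a malicious user manages to bypass the client-side check.
A client-side toxicity check is helpful, but it can be bypassed. Make sure you also run a check on the server.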
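
The last two points can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed design: `buildReviewSubmission`, `moderateOnServer`, the `clientFlaggedToxic` field, and the injected `classifyToxicity` function are all hypothetical names. The key assumption is that the server treats the client's flag as a hint only and always runs its own check.

```javascript
// Client side: never block submission on the client-side result alone.
// The flag is only a hint; a malicious client could omit it or fake it.
function buildReviewSubmission(reviewText, clientIsToxic) {
  return { text: reviewText, clientFlaggedToxic: Boolean(clientIsToxic) };
}

// Server side: always re-run the toxicity check, whatever the client reported.
// `classifyToxicity` is a hypothetical async function that resolves to
// true (toxic) or false (not toxic).
async function moderateOnServer(submission, classifyToxicity) {
  const serverIsToxic = await classifyToxicity(submission.text);
  return {
    status: serverIsToxic ? 'held-for-review' : 'published',
    // A disagreement is worth logging: it points to a client-side false
    // positive or to an attempt to bypass the client-side check.
    disagreesWithClient: serverIsToxic !== submission.clientFlaggedToxic,
  };
}
```

With this shape, a review that the browser flagged by mistake is still published once the server check passes, and a review that skipped the client check is still caught.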

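Analyze toxicity with TensorFlow.js

You can quickly get started analyzing the toxicity of a user review with TensorFlow.js.

Install and import the TensorFlow.js library and the toxicity model.
Set a minimum prediction confidence. The default is 0.85; in our example, we set it to 0.9.
Load the model asynchronously.
Classify the review asynchronously. Our code flags the review if the prediction for any category exceeds the 0.9 threshold.

This model can categorize several kinds of toxic content, including identity attacks, insults, and obscenity.

For example: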

import * as toxicity from '@tensorflow-models/toxicity';

// Minimum prediction confidence allowed
const TOXICITY_COMMENT_THRESHOLD = 0.9;

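// `review` is assumed to hold the review text typed by the user.
const toxicityModel = await toxicity.load(TOXICITY_COMMENT_THRESHOLD);
const toxicityPredictions = await toxicityModel.classify([review]);
// `toxicityPredictions` is an array with one entry per toxicity label.
// Each entry holds the raw probabilities and a `match` flag, which is true
// when the prediction for that label exceeds the threshold.
const isToxic = toxicityPredictions.some(
    (prediction) => prediction.results[0].match
);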
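
To tell the user why a review was flagged, it can help to list which categories matched. The sketch below assumes the prediction shape described for the toxicity model, where each entry has a `label` and a `results` array whose first element carries a `match` value of `true`, `false`, or `null` (neither class reached the threshold). `getMatchedToxicityLabels` is a hypothetical helper name, and the sample data is hand-written, not real model output.

```javascript
// Returns the labels (for example 'insult') whose prediction matched.
// A `match` of null means the model was not confident either way, so it is
// treated as "not flagged" here; an app could also surface it separately.
function getMatchedToxicityLabels(predictions) {
  return predictions
    .filter((prediction) => prediction.results[0].match === true)
    .map((prediction) => prediction.label);
}

// Hand-written data in the assumed shape (not real model output).
const samplePredictions = [
  { label: 'identity_attack', results: [{ match: false }] },
  { label: 'insult', results: [{ match: true }] },
  { label: 'obscene', results: [{ match: null }] },
];

console.log(getMatchedToxicityLabels(samplePredictions)); // ['insult']
```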

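Determine sentiment with Transformers.js

Install and import the Transformers.js library.
Set up the sentiment analysis task with a dedicated pipeline. The first time a pipeline is used, the model is downloaded and cached. From then on, sentiment analysis should be much faster.

Note: Unless you specify a model, Transformers.js uses the default model for that pipeline.
Classify the review asynchronously. Use a custom threshold to set the confidence level you consider usable for your application.

For example: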

import { pipeline } from '@xenova/transformers';

const SENTIMENT_THRESHOLD = 0.9;
// Create a pipeline (don't block rendering on this function)
const transformersjsClassifierSentiment = await pipeline(
  'sentiment-analysis'
);

// When the user finishes typing
const sentimentResult = await transformersjsClassifierSentiment(review);
const { label, score } = sentimentResult[0];
if (score > SENTIMENT_THRESHOLD) {
  // The sentiment is `label`
} else {
  // Classification is not conclusive
}
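
The two comments in the snippet above ("don't block rendering" and "the sentiment is `label`") can be made concrete with a pair of small helpers. This is a sketch, not part of Transformers.js: `createLazyLoader` and `interpretSentiment` are hypothetical names, and the labels `'POSITIVE'` and `'NEGATIVE'` are an assumption about what the default sentiment model returns. Other models may use different labels.

```javascript
// Wraps an async factory so the expensive work (here, creating the pipeline)
// starts on first use and is shared by all later calls.
// Note: a rejected promise is also cached, so a failed load is not retried.
function createLazyLoader(factory) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = factory();
    }
    return promise;
  };
}

// Maps one classification result to a value the UI can act on.
// Assumes the labels 'POSITIVE' and 'NEGATIVE'.
function interpretSentiment({ label, score }, threshold = 0.9) {
  if (score <= threshold) return 'inconclusive';
  if (label === 'POSITIVE') return 'positive';
  if (label === 'NEGATIVE') return 'negative';
  return 'inconclusive'; // Unknown label: do not guess.
}
```

In an app, the loader could wrap the pipeline call, for example `createLazyLoader(() => pipeline('sentiment-analysis'))`, so that the model download starts only when the user begins writing a review.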

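Suggest a star rating with Gemma and MediaPipe

With the LLM Inference API, you can run large language models (LLMs) completely in the browser.

This new capability is particularly transformative considering the memory and compute demands of LLMs, which are over a hundred times larger than typical client-side models. Optimizations across the web stack make this possible, including new operations, quantization, caching, and weight sharing. Source: "Large Language Models On-Device with MediaPipe and TensorFlow Lite".

Install and import the MediaPipe LLM Inference API.
Download a model. Here, we use Gemma 2B, downloaded from Kaggle. Gemma 2B is the smallest of Google's open-weight models.
Point the code to the right model files with the FilesetResolver. This is important because generative AI models may have a specific directory structure for their assets.
Load and configure the model with MediaPipe's LLM interface. Prepare the model for use: specify its location, the preferred length of responses, and the preferred level of creativity (the temperature).
Give the model a prompt (see the example prompt).
Wait for the model's response.
Parse the rating: extract the star rating from the model's response.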

import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

const mediaPipeGenAi = await FilesetResolver.forGenAiTasks();
const llmInference = await LlmInference.createFromOptions(mediaPipeGenAi, {
    baseOptions: {
        modelAssetPath: '/gemma-2b-it-gpu-int4.bin',
    },
    maxTokens: 1000,
    topK: 40,
    temperature: 0.5,
    randomSeed: 101,
});

const prompt = …
const output = await llmInference.generateResponse(prompt);

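// Extract the first digit from the response. `match` returns null when the
// response contains no digit, so guard against that case.
const ratingMatch = output.match(/\d/);
const rating = ratingMatch ? parseInt(ratingMatch[0], 10) : null;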
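
The `/\d/` pattern accepts any digit, including `0` or the first digit of `10`, and a small model does not always follow the requested format. A stricter variation is sketched below; `parseRating` is a hypothetical helper, and returning `null` for unusable output is a design assumption so that the UI can fall back to asking the user for a rating.

```javascript
// Returns the first standalone integer from 1 to 5 found in the model's
// response, or null when there is none.
function parseRating(output) {
  // \b on both sides rejects digits that are part of a longer number,
  // such as the "1" in "10".
  const match = /\b([1-5])\b/.exec(String(output));
  return match ? Number(match[1]) : null;
}

console.log(parseRating('Rating (integer): 4')); // 4
console.log(parseRating('I would give this 10 points')); // null
```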

Example prompt

const prompt = `Analyze a product review, and then based on your analysis give me the
corresponding rating (integer). The rating should be an integer between 1 and 5.
1 is the worst rating, and 5 is the best rating. A strongly dissatisfied review
that only mentions issues should have a rating of 1 (worst). A strongly
satisfied review that only mentions positives and upsides should have a rating
of 5 (best). Be opinionated. Use the full range of possible ratings (1 to 5). \n\n
  \n\n
  Here are some examples of reviews and their corresponding analyses and ratings:
  \n\n
  Review: 'Stylish and functional. Not sure how it'll handle rugged outdoor use,
  but it's perfect for urban exploring.'
  Analysis: The reviewer appreciates the product's style and basic
  functionality. They express some uncertainty about its ruggedness but overall
  find it suitable for their intended use, resulting in a positive, but not
  top-tier rating.
  Rating (integer): 4
  \n\n
  Review: 'It's a solid backpack at a decent price. Does the job, but nothing
  particularly amazing about it.'
  Analysis: This reflects an average opinion. The backpack is functional and
  fulfills its essential purpose. However, the reviewer finds it unremarkable
  and lacking any standout features deserving of higher praise.
  Rating (integer): 3
  \n\n
  Review: 'The waist belt broke on my first trip! Customer service was
  unresponsive too. Would not recommend.'
  Analysis: A serious product defect and poor customer service experience
  naturally warrants the lowest possible rating. The reviewer is extremely
  unsatisfied with both the product and the company.
  Rating (integer): 1
  \n\n
  Review: 'Love how many pockets and compartments it has. Keeps everything
  organized on long trips. Durable too!'
  Analysis: The enthusiastic review highlights specific features the user loves
  (organization and durability), indicating great satisfaction with the product.
  This justifies the highest rating.
  Rating (integer): 5
  \n\n
  Review: 'The straps are a bit flimsy, and they started digging into my
  shoulders under heavy loads.'
  Analysis: While not a totally negative review, a significant comfort issue
  leads the reviewer to rate the product poorly. The straps are a key component
  of a backpack, and their failure to perform well under load is a major flaw.
  Rating (integer): 1
  \n\n
  Now, here is the review you need to assess:
  \n
  Review: "${review}" \n`;
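
Because prompt design takes several iterations, it may be easier to keep the few-shot examples as data and assemble the prompt from them. The sketch below shows one way to do that; `buildRatingPrompt` and `ratingExamples` are hypothetical names, and the layout only approximates the prompt above (two of its examples are reused). A different layout may change the quality of the model's answers, so any variation would need to be re-tested.

```javascript
// Assembles a few-shot prompt from instructions, structured examples, and
// the review to assess.
function buildRatingPrompt(instructions, examples, review) {
  const shots = examples
    .map(
      ({ review: exampleReview, analysis, rating }) =>
        `Review: '${exampleReview}'\n` +
        `Analysis: ${analysis}\n` +
        `Rating (integer): ${rating}`
    )
    .join('\n\n');

  return [
    instructions,
    'Here are some examples of reviews and their corresponding analyses and ratings:',
    shots,
    'Now, here is the review you need to assess:',
    `Review: "${review}"`,
  ].join('\n\n');
}

// Two of the examples from the prompt above, expressed as data.
const ratingExamples = [
  {
    review:
      "It's a solid backpack at a decent price. Does the job, but nothing particularly amazing about it.",
    analysis:
      'This reflects an average opinion. The backpack is functional and fulfills its essential purpose.',
    rating: 3,
  },
  {
    review:
      'The waist belt broke on my first trip! Customer service was unresponsive too. Would not recommend.',
    analysis:
      'A serious product defect and poor customer service experience naturally warrants the lowest possible rating.',
    rating: 1,
  },
];
```

Adding, removing, or reordering examples then becomes a change to the array, not to a long template string.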

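Key takeaways

No AI/ML expertise is required. Designing a prompt takes several iterations, but the rest of the code is standard web development.

Client-side models are fairly accurate. If you run the snippets from this document, you'll see that both the toxicity analysis and the sentiment analysis give accurate results. For the few reference reviews we tested, the Gemma ratings mostly matched the Gemini model ratings. More testing is needed to validate that accuracy.

That said, designing the prompt for Gemma 2B takes work. Because Gemma 2B is a small LLM, it needs a detailed prompt to produce satisfying results, notably more detailed than what the Gemini API requires.

Inference can be very fast. If you run the snippets from this document, you should see that inference can be fast on many devices, potentially faster than a server round trip. That said, inference speed can vary greatly, so thorough benchmarking on target devices is required. We expect browser inference to keep getting faster as WebGPU, WebAssembly, and libraries are updated. For example, Transformers.js adds WebGPU support in v3, which can speed up on-device inference many-fold.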
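
A benchmark on a target device can start from something as small as the following sketch. `measureMedianLatencyMs` is a hypothetical helper, not part of any of the libraries above. It assumes `performance.now()` is available (it is in browsers and in recent Node.js versions), runs the task once as a warm-up so that model loading and compilation are excluded, and reports the median, not the mean, to limit the effect of outliers.

```javascript
// Measures how long an async task takes and returns the median latency in
// milliseconds over `runs` timed executions.
async function measureMedianLatencyMs(task, runs = 5) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  await task(); // Warm-up run, not timed.

  const timings = [];
  for (let i = 0; i < runs; i += 1) {
    const start = performance.now();
    await task();
    timings.push(performance.now() - start);
  }

  timings.sort((a, b) => a - b);
  const middle = Math.floor(timings.length / 2);
  return timings.length % 2 === 1
    ? timings[middle]
    : (timings[middle - 1] + timings[middle]) / 2;
}
```

For example, passing a function that classifies a fixed sample review gives a first number to compare across devices. It is not a substitute for a full benchmark that covers varied input lengths and cold starts.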

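The download size can be very large. Inference in the browser is fast, but loading AI models can be a challenge. To run AI in the browser, you typically need both a library and a model, which adds to the download size of your web app.

While the TensorFlow toxicity model (a classic natural language processing model) is only a few kilobytes, generative AI models such as the default Transformers.js sentiment analysis model reach 60 MB. Large language models such as Gemma can be as large as 1.3 GB. This far exceeds the median web page size of 2.2 MB, which is already much larger than recommended for best performance. Client-side generative AI is viable in specific scenarios.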
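
To get a feel for what these sizes mean for a user, the following sketch converts a download size into an approximate transfer time. The 10 Mbps bandwidth is an illustrative assumption, not a measurement, and the calculation uses decimal units (1 MB = 1,000,000 bytes) and ignores latency, compression, and caching. `estimateDownloadSeconds` is a hypothetical helper.

```javascript
// Rough transfer time for a download of `sizeBytes` at a given bandwidth.
function estimateDownloadSeconds(sizeBytes, megabitsPerSecond) {
  const bits = sizeBytes * 8;
  return bits / (megabitsPerSecond * 1e6);
}

const ASSUMED_BANDWIDTH_MBPS = 10; // Illustrative assumption only.

// Sizes quoted in the text above.
const medianPageBytes = 2.2e6; // 2.2 MB
const sentimentModelBytes = 60e6; // 60 MB
const gemmaBytes = 1.3e9; // 1.3 GB

console.log(estimateDownloadSeconds(medianPageBytes, ASSUMED_BANDWIDTH_MBPS)); // about 1.76 seconds
console.log(estimateDownloadSeconds(sentimentModelBytes, ASSUMED_BANDWIDTH_MBPS)); // 48 seconds
console.log(estimateDownloadSeconds(gemmaBytes, ASSUMED_BANDWIDTH_MBPS)); // 1040 seconds, over 17 minutes
```

Under that assumption, the first load of an LLM takes minutes, which is why caching the model after the first download matters so much for this use case.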

Generative AI on the web is evolving rapidly. Smaller, web-optimized models are expected to emerge in the future.

Next steps

Chrome is experimenting with another way to run generative AI in the browser. You can sign up for the early preview program to test it.