A non-exhaustive list of the sources used in this course, and of eval tools that can help you.
For more on testing and AI, we recommend the following resources.
- Learn Testing: Refresh your approach to testing.
- Learn AI: Design AI systems for your websites and web applications.
- Google DeepMind Evals: Multiple standardized benchmarking tools for different types of models.
- Gemini Evaluations Playbook: Recipes for experimenting and evaluating generative AI models with Vertex AI.
- Responsible AI toolkit: Evaluate models and systems for safety.
- Evaluating your evals: A meta lesson on choosing which evals to use and understanding what works effectively.
- Building better AI benchmarks: How many raters are enough? An evaluation framework for ML models that optimizes the trade-off between the number of items and the number of raters per item, to build reproducible AI benchmarks (a toy illustration of this trade-off follows this list).
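To give a feel for the items-versus-raters trade-off, here is a toy sketch. It is not the framework from the post: the variance model and the `sigma_item` and `sigma_rater` values are our own illustrative assumptions.

```python
# Toy model only: the real framework in "Building better AI benchmarks"
# is more sophisticated. sigma_item (how much items differ) and
# sigma_rater (per-rating noise) are hypothetical values.

def score_variance(n_items: int, raters_per_item: int,
                   sigma_item: float = 0.20, sigma_rater: float = 0.35) -> float:
    """Variance of the mean benchmark score: between-item spread plus
    rater noise, which shrinks as more raters grade each item."""
    per_item_var = sigma_item**2 + (sigma_rater**2) / raters_per_item
    return per_item_var / n_items

budget = 3000  # total ratings you can afford: items * raters per item
for raters in (1, 3, 5, 10):
    items = budget // raters
    std_err = score_variance(items, raters) ** 0.5
    print(f"{items:5d} items x {raters:2d} raters -> standard error {std_err:.4f}")
```

Under these made-up numbers, spreading the budget across more items beats adding raters per item; with different noise levels the balance shifts, which is exactly the trade-off the framework quantifies.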
Course sources
We relied on several sources to write this series, including:
- AI Engineering: Building Applications with Foundation Models by Chip Huyen
- De-risking QA for LLM-powered applications by Michael Hablich, Chrome DevTools
- Using LLM-as-a-Judge For Evaluation: A Complete Guide by Hamel Husain (a minimal sketch of the technique follows this list)
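Since several of these sources center on LLM-as-a-judge, here is a minimal sketch of the idea: a second model grades each answer against a rubric prompt. `call_model` is a hypothetical placeholder, not any particular vendor's API; wire it to your own client.

```python
# Minimal LLM-as-a-judge loop. `call_model` is a hypothetical placeholder;
# replace it with your model client (Gemini, OpenAI, a local model, etc.).

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Reply with exactly PASS or FAIL, then one sentence explaining why."""

def call_model(prompt: str) -> str:
    # Placeholder so the sketch runs; swap in a real model call.
    return "PASS: placeholder verdict for demonstration."

def judge(question: str, answer: str) -> bool:
    # Ask the judge model for a verdict and parse the leading PASS/FAIL token.
    verdict = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")

# Run the judge over a small eval set and report a pass rate.
eval_set = [
    {"question": "What does CSS stand for?", "answer": "Cascading Style Sheets."},
]
passes = sum(judge(c["question"], c["answer"]) for c in eval_set)
print(f"pass rate: {passes}/{len(eval_set)}")
```

In practice, also spot-check the judge's verdicts against human labels so you can trust the judge itself.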
Eval tools
Examples of eval solutions and tools include:
- AlignEval
- Arize
- Braintrust
- Datadog
- DeepEval
- Gen AI evaluation service and API
- Inspect Evals
- JudgeLM
- LangSmith
- LM Evaluation Harness
- OpenEvals
There are many more eval tools available. If you are using other tools, share them with us.