Introduction to AI Evals
An introduction to AI Evals: why we need them, and how to create them.
What you'll learn
What to expect from this series, and what you should know before you start.
Mental model
Mapping your web testing knowledge to the world of large language models.
Design evaluations
Define what good and bad look like for your AI application.
Build rule-based evaluations
Automate the basics. Use code to catch simple errors.
Build a basic judge, part 1
Get your subjective evaluations running with a basic judge model.
Build a basic judge, part 2
Finish setting up your basic judge model to get your subjective evaluations running.
Build an evals pipeline
Applied engineering tips to build your AI testing pipeline.
Run evaluations
Structure your testing into layers.
Build an expert judge
Use a large language model to judge the quality of your AI application.
Conclusion
Final takeaways from the course.
Course resources
Optional
A non-exhaustive list of sources used in this course and eval tools that can help you.
Claim your badge
Confirm you've completed all modules on learning AI evaluations to claim your badge.