Automate the basics: use code to catch simple errors.
Now that you know which failures you want to catch with rule-based evals, it's time to implement the corresponding evaluator functions:
evalDataFormat(): Checks that the data format is correct. This includes valid JSON, all keys present, no empty values, motto is under six words, hexadecimal colors.evalContrastRatio(): Checks that the text-to-background color contrast ratio is accessible.
Implement rule-based evals
Choose a scoring method
The evaluation criteria are binary. Your rule-based eval functions should
produce a binary output, such as a PASS or FAIL label.
- ThemeBuilder app output (full theme object) →
evalDataFormat()→PASSorFAILlabel.PASSif the data format meets all constraints.FAILotherwise. - ThemeBuilder app output (color palette object) →
evalContrastRatio()→PASSorFAILlabel .PASSif the ratio is > 4.5.FAILotherwise.
Define an evals type
The PASS or FAIL metric is a boolean, but you can choose to implement it
as a string label (category) for readability.
To keep things lean, you can use the same TypeScript type for both your
rule-based and the LLM judge evals you'll implement later. Create an
EvalResult type that wraps a binary EvalLabel category, and a rationale
field for the judge model to explain its rating.
enum EvalLabel {
PASS = "PASS",
FAIL = "FAIL"
}
interface EvalResult {
label: EvalLabel;
rationale?: string;
}
Implement evaluators
Zod is a great tool for schema
validation, as it handles both the JSON structure and custom rules. It's
declarative, which makes the validation code readable. Define detailed error
reports with specific paths and reasons for failures, for easier
troubleshooting.
import { z } from 'zod';
import { MAX_WORD_COUNT } from './app.config';
const hexColorRegex = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
// Reusable schema for hex colors
const HexColor = z.string().regex(hexColorRegex, { message: "Invalid hex color code" });
// zod schema definition for AppOutput
export const AppOutputSchema = z.object({
motto: z.string().min(1, { message: "Motto is missing or empty" }).refine((val) => {
const words = val.replace(/[^\w\s]|_/g, "").trim();
const count = words ? words.split(/\s+/).length : 0;
return count > 0 && count <= MAX_WORD_COUNT;
}, { message: `Motto must be between 1 and ${MAX_WORD_COUNT} words` }),
colorPalette: z.object({
textColor: HexColor,
backgroundColor: HexColor,
primary: HexColor,
secondary: HexColor
}).catchall(HexColor)
});
Contrast ratio
Keep domain logic, such as the contrast-ratio calculations, in separate utility functions.
/*
* Input: ColorPalette {"textColor":"#333333","backgroundColor":"#000000", ...}
* Output: EvalResult {"status":"FAIL","rationale":"Contrast ratio is 1.66:1 (must be >= 4.5:1)."}
* minContrastRatio is an app config variable, MIN_CONTRAST_RATIO = 4.5
*/
export function evalContrastRatio(colorPalette: ColorPalette, minContrastRatio: number): EvalResult {
if (!colorPalette || !colorPalette.textColor || !colorPalette.backgroundColor) {
return { status: EvalLabel.FAIL, rationale: "Missing textColor or backgroundColor." };
}
try {
const ratio = getContrastRatio(colorPalette.textColor, colorPalette.backgroundColor);
const rationale = `Contrast ratio is ${ratio.toFixed(2)}:1 (must be >= ${minContrastRatio}:1).`;
if (ratio < minContrastRatio) {
return { status: EvalLabel.FAIL, rationale };
}
return { status: EvalLabel.PASS, rationale };
} catch (e) {
return { status: EvalLabel.FAIL, rationale: "Could not calculate contrast ratio (invalid hex?)." };
}
}
Take a look at our evaluator code
for evalDataFormat() and evalContrastRatio().
Test rule-based evals
Rule-based evaluations are deterministic, so you can implement classic unit
tests to check their behavior. Build your tests to run various outputs through
the evaluators and assert whether they return the PASS or FAIL
label you expected.
If a test case expects the evaluator to return a FAIL and it does, the
test outputs a PASS because the evaluator behaved as intended.
import { MIN_CONTRAST_RATIO } from '../src/app.config'; // 4.5
const testCases = [
{
// ...
appOutput: {
motto: "Test motto",
colorPalette: {
textColor: "#333333",
backgroundColor: "#000000",
primary: "#FF0000",
secondary: "#333333"
}
},
expected: {
// Dark grey on black (low contrast): FAIL
contrast: EvalLabel.FAIL
}
}
// ... more test cases
];
testCases.forEach((testCase) => {
const result = evalContrastRatio(
testCase.appOutput.colorPalette as any, MIN_CONTRAST_RATIO
);
const actualEvalLabel = result.label;
const expectedEvalLabel = testCase.expected.contrast;
const isSuccess = actualEvalLabel === expectedEvalLabel;
// ...
});
Try it
- Clone the evals-course project.
- Set it up.
- Run the rule-based evaluator tests yourself.