Set up Evaluations
This guide walks you through setting up an evaluation in AI Gateway. These steps are done in the Cloudflare dashboard.
Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets by applying filters in the Logs tab; a dataset updates automatically as new logs match the set filters (see the sketch after these steps).
- Apply filters to narrow down your logs. Filter options include provider, number of tokens, request status, and more.
- Select Create Dataset to store the filtered logs for future analysis.
You can manage datasets by selecting Manage datasets from the Logs tab.
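Conceptually, a dataset is a saved filter over your gateway logs rather than a fixed copy, which is why it keeps picking up new logs that match. The sketch below is purely illustrative: the filter shape, field names, and log structure are assumptions made for this example, not AI Gateway's actual schema.

```ts
// Hypothetical filter definition for illustration only; not AI Gateway's schema.
interface DatasetFilter {
  provider?: string;            // for example "openai" or "workers-ai"
  minTokens?: number;           // lower bound on total tokens in the request
  status?: "success" | "error"; // request status
}

// Hypothetical log entry shape, again for illustration only.
interface LogEntry {
  provider: string;
  tokens: number;
  status: "success" | "error";
}

// A dataset is effectively "all logs matching the filter", evaluated on demand,
// so newly arriving logs that match the filter are included automatically.
function datasetLogs(logs: LogEntry[], filter: DatasetFilter): LogEntry[] {
  return logs.filter(
    (log) =>
      (filter.provider === undefined || log.provider === filter.provider) &&
      (filter.minTokens === undefined || log.tokens >= filter.minTokens) &&
      (filter.status === undefined || log.status === filter.status)
  );
}
```

In the dashboard you never write this yourself; it is only meant to show why a dataset stays in sync with its filters instead of being a frozen snapshot of logs.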
After creating a dataset, choose the evaluation parameters (a sketch of how these metrics are aggregated follows the list):
- Cost: Calculates the average cost of inference requests within the dataset (only for requests with cost data).
- Speed: Calculates the average duration of inference requests within the dataset.
- Performance:
  - Human feedback: measures performance based on human feedback, calculated as the percentage of logs annotated with a thumbs up in the Logs tab.
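AI Gateway computes these aggregates for you, but conceptually each evaluator reduces the dataset's logs to a single number. The sketch below is a simplified, hypothetical illustration: the `GatewayLog` shape and its field names are assumptions for this example, not the gateway's actual log schema. It averages cost and duration over logs that have those fields and computes the share of thumbs-up annotations among annotated logs.

```ts
// Hypothetical log shape for illustration only; field names are assumptions,
// not AI Gateway's actual log schema.
interface GatewayLog {
  cost?: number;        // inference cost, present only when cost data exists
  durationMs?: number;  // request duration in milliseconds
  feedback?: 1 | -1;    // 1 = thumbs up, -1 = thumbs down, undefined = not annotated
}

// Average of a numeric field, skipping logs where the field is missing,
// mirroring how evaluators exclude logs with missing fields and report
// how many logs were used for each metric.
function average(logs: GatewayLog[], field: "cost" | "durationMs") {
  const values = logs
    .map((log) => log[field])
    .filter((v): v is number => v !== undefined);
  return {
    value: values.reduce((sum, v) => sum + v, 0) / values.length,
    logsUsed: values.length,
  };
}

// Human feedback: percentage of thumbs-up annotations. Whether the denominator
// is all logs or only annotated logs is an assumption here; annotated logs are used.
function humanFeedback(logs: GatewayLog[]) {
  const annotated = logs.filter((log) => log.feedback !== undefined);
  const thumbsUp = annotated.filter((log) => log.feedback === 1).length;
  return {
    value: (thumbsUp / annotated.length) * 100,
    logsUsed: annotated.length,
  };
}
```

Again, this is not something you run yourself; the gateway performs the equivalent aggregation when the evaluation executes, and the dashboard displays the resulting value and log count for each metric.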
- Create a unique name for your evaluation to reference it in the dashboard.
- Review the selected dataset and evaluators.
- Select Run to start the process.
Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (for example, in progress, completed, or error) and the metrics for the selected evaluators. Logs missing the relevant fields are excluded from a metric's calculation, and the number of logs used to calculate each metric is shown alongside it.
While datasets automatically update based on filters, evaluations do not. You will have to create a new evaluation if you want to evaluate new logs.
Use these insights to optimize for your application's priorities. Depending on the results, you may choose to:
- Change the model or provider
- Adjust your prompts
- Explore further optimizations, such as setting up Retrieval Augmented Generation (RAG)