Your Name

Which agent output are you evaluating?
Select an output...
Tester 1, Tester 2, Tester 3, Tester 4, Tester 5, Tester 6, Tester 7, Tester 8, Tester 9, Tester 10

What question did you ask the agent?

Accuracy: Was the information correct?
1 2 3 4 5
1 = Incorrect · 5 = Fully accurate

Confidence: Was the response appropriately certain or hedged?
1 2 3 4 5
1 = Wrong confidence level · 5 = Perfectly calibrated

Relevance: Did it actually answer the question?
1 2 3 4 5
1 = Off-topic · 5 = Directly on point

Completeness: Was anything important missing?
1 2 3 4 5
1 = Major gaps · 5 = Fully complete

Overall: Did this response meet the quality threshold?
Yes: cleared the bar
No: fell short
Borderline

Comments (optional)

Submit Feedback

Something went wrong. Please try again.

Thank you! Your feedback has been recorded.
Submit another response