RAGLens
Bulk Test
Run all test cases through the live pipeline and see where eval results agree with expectations. Disagreements reveal where the pipeline or eval needs attention.
Choose a corpus and run tests to see results here.
Each row will show the expected vs. actual pass result, the score, and the eval diagnosis.
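
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not RAGLens's actual implementation: the `CaseResult` type, its field names, and the diagnosis labels are all hypothetical, chosen only to mirror the columns mentioned on this page.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical record for one test case; field names are assumptions.
    case_id: str
    expected_pass: bool
    actual_pass: bool
    score: float


def diagnose(result: CaseResult) -> str:
    """Label a row by whether the eval outcome matches the expectation."""
    if result.expected_pass == result.actual_pass:
        return "agree"
    if result.expected_pass:
        return "unexpected fail"
    return "unexpected pass"


def summarize(results: list[CaseResult]):
    """Build one row per test case and collect the rows that disagree."""
    rows = [
        (r.case_id, r.expected_pass, r.actual_pass, r.score, diagnose(r))
        for r in results
    ]
    disagreements = [row for row in rows if row[4] != "agree"]
    return rows, disagreements


# Made-up sample data, for illustration only.
sample = [
    CaseResult("case-1", expected_pass=True, actual_pass=True, score=0.9),
    CaseResult("case-2", expected_pass=True, actual_pass=False, score=0.4),
    CaseResult("case-3", expected_pass=False, actual_pass=True, score=0.7),
]
rows, disagreements = summarize(sample)
for row in disagreements:
    print(row)
```

In this sketch only the disagreeing rows are printed, since those are the cases where either the pipeline or the eval would need a closer look.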