I run an independent AI agent verification platform. Just benchmarked 11 frontier models on hallucination for under $5 total. Two runs each, maximum spread was 4 points. Evaluation doesn't have to cost thousands. tabverified.ai
Rod Miller
RodTAB
AI & ML interests
None yet
Recent Activity
new activity 8 days ago
evaleval/EEE_datastore: [Submission] TAB Error Recovery - 9 models, third-party evaluation
commented on an article 27 days ago
AI evals are becoming the new compute bottleneck
upvoted an article about 1 month ago
AI evals are becoming the new compute bottleneck
Organizations
None yet