Trust at scale: Auto-evaluation for high-stakes LLM accuracy
Elicit’s mission is to scale up good reasoning. Large language models (LLMs) are currently the most promising technology for achieving this goal, and we use them extensively to help our users search and analyze the scientific literature.
But LLMs are known for their unreliability: they often misunderstand instructions, fail