LLMs’ “Bullsh*t” problem, DARPA, and testing for nonsense

Defence research agency looks to challenge the "fundamental gaps between state-of-the-art AI systems and national security applications"

The tendency of large language models (LLMs) to “hallucinate” continues to trouble CIOs eyeing production use cases, even as work on fine-tuning and retrieval-augmented generation (RAG) advances.