
First-ever benchmark for evaluating multi-turn long-form question answering in knowledge-intensive domains.
Sep 26, 2025

An LLM-based uncertainty-aware framework for interpreting Federal Reserve communications with enhanced reliability.
Aug 12, 2025

The first large-scale Chinese dataset for financial regulatory compliance, with an automated checking pipeline.
May 19, 2025