Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

May 19, 2025
Siyuan LI, Jian Chen, Rui Yao, Xuming Hu, Peilin Zhou, Weihua Qiu, Simin Zhang, Chucheng Dong, Zhiyao Li, Qipeng Xie, Zixuan Yuan
Compliance-to-Code Framework
Abstract
Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make them labor-intensive to apply and difficult to comprehend. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have recently gained significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly on Chinese-language financial regulations, due to three key limitations: (1) incomplete domain-specific knowledge representation, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories; each clause is modularly structured with four logical elements (subject, condition, constraint, and contextual information), along with regulation relations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate utility, we present FinCheck, a pipeline for regulation structuring, code generation, and report generation.
Publication: arXiv preprint

Summary

This paper presents Compliance-to-Code, a large-scale Chinese dataset for financial regulatory compliance, containing 1,159 annotated clauses from 361 regulations across ten categories. Each clause is structured with four logical elements: subject, condition, constraint, and contextual information. The dataset includes deterministic Python code mappings and detailed reasoning to facilitate automated compliance checking.

Dataset Overview

  • Scale: 1,159 annotated regulatory clauses
  • Coverage: 361 regulations across ten financial categories
  • Structure: Modular compliance units with logical elements
  • Code Mappings: Python implementations for automated checking
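
To make the "modular compliance unit" idea concrete, the following is a minimal sketch of how one clause and its code mapping might be represented. The class name `ComplianceUnit`, the field names, the record keys, and the clause itself are all assumptions invented for illustration; they are not taken from the dataset or from the paper's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceUnit:
    """One clause split into the four logical elements (hypothetical schema)."""
    subject: str     # who the clause applies to
    condition: str   # when the clause is triggered
    constraint: str  # what must hold once it is triggered
    context: str     # contextual information such as definitions or scope


# Hypothetical clause, invented for illustration only.
unit = ComplianceUnit(
    subject="shareholder",
    condition="holds at least 5% of the company's shares",
    constraint="sells no more than 1% of total shares within 90 days",
    context="percentages are relative to total shares outstanding",
)


def check_unit(record: dict) -> bool:
    """Deterministic code mapping for `unit`; returns True when compliant."""
    if record["role"] != unit.subject:
        return True  # clause does not apply to this subject
    if record["holding_pct"] < 5.0:
        return True  # condition not triggered
    return record["sold_pct_90d"] <= 1.0  # constraint
```

Keeping the check a pure function of the input record is one way to obtain the determinism the dataset description refers to: the same record always yields the same verdict.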

FinCheck Pipeline

The paper introduces FinCheck, a pipeline for automated compliance checking that structures natural-language regulations, generates executable compliance code from them, and produces compliance reports.
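
The three stages named in the abstract (regulation structuring, code generation, report generation) can be sketched as a chain of functions. This is only a toy outline under stated assumptions: every function name, the clause fields, and the record format are hypothetical, and the model-driven steps are replaced by hard-coded stand-ins. It does not reproduce FinCheck's actual implementation.

```python
from typing import Callable

Check = Callable[[dict], bool]


def structure_regulation(text: str) -> dict:
    """Stage 1 stand-in: return a hard-coded structured clause.

    A real system would derive this from `text`; that step is not shown.
    """
    return {"source": text, "field": "cash_ratio", "op": ">=", "value": 0.05}


def generate_code(unit: dict) -> str:
    """Stage 2 stand-in: render the structured constraint as Python source."""
    return (
        "def check(record):\n"
        f"    return record[{unit['field']!r}] {unit['op']} {unit['value']!r}\n"
    )


def load_check(source: str) -> Check:
    """Turn generated source into a callable (exec is used on local text only)."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["check"]


def build_report(records: list[dict], check: Check) -> list[str]:
    """Stage 3: one verdict line per audited record."""
    return [
        f"{r['id']}: {'compliant' if check(r) else 'violation'}" for r in records
    ]


def run_pipeline(text: str, records: list[dict]) -> list[str]:
    unit = structure_regulation(text)
    check = load_check(generate_code(unit))
    return build_report(records, check)


# Usage with made-up records.
report = run_pipeline(
    "A fund must keep at least 5% of assets in cash.",  # invented clause
    [{"id": "A", "cash_ratio": 0.10}, {"id": "B", "cash_ratio": 0.02}],
)
```

Separating the generated source from its execution keeps the intermediate code inspectable, which matters when the verdicts need to be audited.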