FINRA's 2026 Annual Regulatory Oversight Report includes, for the first time, a standalone section on generative AI. For teams building extraction technology for financial documents, one finding stands out: "Summarization and Information Extraction" is the number-one GenAI use case among FINRA member firms, and that finding carries compliance implications for every extraction pipeline.
The report, published on December 9, 2025 as part of FINRA's "FINRA Forward" initiative, draws clear lines around governance, testing, and auditability — and those lines apply to every vendor in the chain.
Last year's report barely mentioned AI. This year, GenAI gets its own section — a notable increase in regulatory attention.
FINRA's rules are "technology-neutral," meaning existing obligations under supervision (Rule 3110), communications (Rule 2210), and recordkeeping (Rule 4511) apply in full when firms use GenAI. As the report states, FINRA's rules "continue to apply when firms use GenAI or similar technologies" as they would for any technology tool. Regulatory Notice 24-09 had already established this principle. The 2026 report goes beyond the general reminders of RN 24-09 and sets out specific practices FINRA expects firms to consider.
Through surveys of member firms, FINRA found that the leading GenAI use case is "condensing large volumes of text and extracting specific entities, relationships, or key information from unstructured documents." This is precisely what extraction platforms such as Evolution AI do every day with fund reports, prospectuses, regulatory filings, and annual accounts.
Most firms are piloting GenAI for internal efficiency rather than client-facing advice. But the compliance bar is the same regardless. And adoption is accelerating: the 2026 report catalogues fourteen GenAI use case types among member firms, spanning classification, sentiment analysis, code generation, data transformation, and more. Extraction leads the list.
The report identifies three core risks for extraction teams. The first is hallucination — defined as "instances where the model generates information that is inaccurate or misleading, yet is presented as factual information." In financial extraction, a hallucinated NAV figure or fabricated regulatory reference is a compliance incident, not an edge case. Structured extraction has an advantage here: when you extract a dollar amount from a prospectus, you can verify it against the source document. The practical response is confidence scoring and source-document anchoring, backed by automated validation. FINRA expects these measures.
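Source-document anchoring is straightforward to make concrete. The sketch below is a minimal illustration, not FINRA guidance or any particular platform's implementation; the `ExtractedField` structure, field names, and 0.90 threshold are all hypothetical choices for the example. The idea is that a value only passes automatically if the model's confidence clears a threshold *and* the value can be found verbatim on the page it was supposedly read from:

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float       # model-assigned score in [0, 1]
    source_page: int        # page the value was read from
    source_snippet: str     # surrounding text in the source document

def verify_against_source(field: ExtractedField,
                          page_texts: dict[int, str],
                          threshold: float = 0.90) -> bool:
    """Pass only if confidence clears the threshold AND the value is
    literally present on the cited page; otherwise route to review."""
    page = page_texts.get(field.source_page, "")
    return field.confidence >= threshold and field.value in page

# Example: a NAV figure anchored to page 3 of a prospectus
pages = {3: "Net asset value per share: $102.57 as at 31 December 2025"}
nav = ExtractedField("nav_per_share", "$102.57", 0.97, 3,
                     "Net asset value per share")
```

A hallucinated figure fails the anchor check even at high confidence, which is exactly the failure mode the report is describing: confidently stated, absent from the source.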
The second is bias, which the report describes as outputs "skewed or incorrect due to model design decisions or data that is limited or inaccurate." For extraction, this means models failing on new document formats or training data underrepresenting certain document types. FINRA also flags "concept drifts" — outdated information skewing results over time.
The third is cybersecurity. The report explicitly extends cybersecurity obligations to third-party vendor usage, making every extraction vendor a link in the chain.
FINRA expects enterprise-level supervisory processes with formal review and approval before deploying any GenAI solution. The report specifies model risk management frameworks, comprehensive documentation, and ongoing monitoring of prompts and outputs.
The critical detail for vendors: the report's GenAI section extends cybersecurity and governance obligations to third-party vendor usage, and Regulatory Notice 21-29 on outsourcing reinforces supervisory obligations over third-party technology providers more broadly. For GenAI extraction vendors, customers' compliance obligations flow through to the vendor. Firms remain responsible for decisions influenced by GenAI regardless of who built the technology.
Testing expectations are specific. The report calls for robust testing across areas including privacy, integrity, reliability, and accuracy. Firms should maintain "prompt and output logs for accountability and troubleshooting" and track "which model version was used." For extraction platforms, this means audit trails showing what model processed which document, what confidence scores were assigned, and what checks were run. Versioned model tracking and reproducible outputs are what these expectations point toward in practice.
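An audit trail of this kind can be as simple as an append-only log where each record ties a document hash to the exact model version, prompt, output, and checks that were run. The sketch below is one possible shape for such a record, with hypothetical field names and version strings; it is not a prescribed format:

```python
import datetime
import hashlib
import json

def audit_record(document_bytes: bytes, model_version: str,
                 prompt: str, output: dict, checks: list[str]) -> str:
    """Serialize one audit-log line linking a document (by hash) to the
    model version, prompt, output, and validation checks applied."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
        "checks_run": checks,
    }
    return json.dumps(record, sort_keys=True)

line = audit_record(b"%PDF-1.7 ...", "extractor-2026.01",
                    "Extract NAV per share", {"nav_per_share": "$102.57"},
                    ["confidence>=0.90", "source_anchor_found"])
```

Hashing the document rather than storing it keeps the log compact while still proving which input produced which output; the underlying documents stay in the firm's existing recordkeeping system.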
The report also introduces guidance on AI agents — "systems capable of autonomously performing and completing tasks on behalf of a user." FINRA flags risks including agents acting without human validation, operating beyond intended scope, and creating auditability challenges through multi-step reasoning.
For extraction teams, this is forward-looking. As pipelines evolve from "pull data from a PDF" to "pull data, validate it, populate a system, flag anomalies," the agentic guidance applies. If an extraction pipeline triggers downstream actions, the report suggests firms consider guardrails to limit agent behaviours, "human in the loop" oversight protocols, and mechanisms to track agent actions and decisions. Human-in-the-loop review remains necessary; the design question is where in the pipeline to insert it.
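One way to place that human-in-the-loop gate is as a routing function between extraction and any downstream action. The sketch below is a hypothetical illustration of the guardrail pattern the report describes, with invented field names, scope list, and threshold; it limits the agent to an allowed scope and forces review whenever a value would trigger a downstream action or confidence is low:

```python
from enum import Enum

class Action(Enum):
    AUTO_COMMIT = "auto_commit"      # write to downstream system
    HUMAN_REVIEW = "human_review"    # queue for a reviewer
    REJECT = "reject"                # refuse to act

# The agent's intended scope: fields it is permitted to touch at all
ALLOWED_FIELDS = {"nav_per_share", "ter", "isin"}

def gate(field_name: str, confidence: float,
         triggers_downstream: bool) -> Action:
    """Route an extracted value: block out-of-scope fields, and require
    a human in the loop for downstream actions or low confidence."""
    if field_name not in ALLOWED_FIELDS:
        return Action.REJECT
    if triggers_downstream or confidence < 0.95:
        return Action.HUMAN_REVIEW
    return Action.AUTO_COMMIT
```

Logging every call to a gate like this also answers the report's auditability concern: each agent decision leaves a traceable record of why it was, or was not, allowed to proceed.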
1. **Audit your pipeline end-to-end.** Document every model, version, and data flow; FINRA expects "comprehensive documentation throughout."
2. **Implement confidence scoring and source anchoring** so that every extracted data point traces back to the source document region.
3. **Build validation layers** that cross-reference extracted values against known constraints; automated checks catch drift before it becomes a compliance issue.
4. **Version everything:** model versions, prompt templates, extraction schemas. You need to know which model version produced which output.
5. **Prepare for vendor due diligence.** Financial-services customers will ask for governance documentation, audit trails, and testing evidence.
6. **Watch the agentic boundary.** If your extraction feeds automated downstream processes, assess whether the agentic AI guidance applies.
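A validation layer of the kind described above can be a plain function that checks each extracted record against domain constraints and returns a list of violations. The checks below are hypothetical examples (positive NAV, holding weights summing to roughly 100%, a basic ISIN format check), not a complete rule set:

```python
def validate_fund_record(record: dict) -> list[str]:
    """Cross-check extracted values against known constraints;
    an empty return list means the record passes."""
    errors = []
    nav = record.get("nav_per_share")
    if nav is not None and nav <= 0:
        errors.append("nav_per_share must be positive")
    # Portfolio holding weights should sum to ~100% (rounding tolerance)
    weights = record.get("holding_weights_pct", [])
    if weights and abs(sum(weights) - 100.0) > 0.5:
        errors.append("holding weights do not sum to 100%")
    # ISINs are 12 characters beginning with a 2-letter country code
    isin = record.get("isin", "")
    if isin and (len(isin) != 12 or not isin[:2].isalpha()):
        errors.append("isin fails format check")
    return errors
```

Running checks like these on every record, and logging the results alongside the audit trail, turns "we validate outputs" from a claim into evidence.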
FINRA now expects firms to have concrete governance and testing practices around GenAI. Structured extraction pipelines that already anchor outputs to source documents have a head start on meeting these expectations; the remaining work is proving it to regulators through governance documentation and audit trails.