
Underwriters still spend hours keying data from tax returns and financial statements into spreading tools. With loan volumes climbing and ops headcount flat, that workflow doesn't scale. Agentic AI pipelines can now classify documents, extract structured data, cross-reference figures across forms, and flag inconsistencies before a human touches the file. This article covers the architecture of these pipelines and the engineering considerations behind them.
An agentic workflow is a process driven by an AI agent that can make decisions, adapt, and act autonomously toward a goal — rather than following a fixed set of pre-programmed rules. Instead of a system that follows explicit step-by-step instructions, the agent receives a goal (e.g., "extract and reconcile this borrower's financials"), some tools (APIs, databases, extraction models), and constraints. The agent then decides what steps to take, evaluates intermediate results, adjusts its approach if needed, and continues until it reaches the objective.
This is fundamentally different from traditional automation. Traditional automation follows a fixed script: same input, same output. It requires every step to be defined in advance, executes once through a pipeline, and breaks when it hits an edge case that wasn't coded for. An agentic workflow operates differently. Given a goal, the system decides what steps to take. It plans its own sequence of actions, uses an iterative reasoning loop — think, act, observe, revise — and can recover from unexpected inputs without explicit handling for every case.
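The think, act, observe, revise loop can be sketched in a few lines of Python. Everything here is illustrative: a real agent would replace `toy_planner` with an LLM call that chooses the next tool from the current state, and the tool functions would wrap actual extraction services.

```python
def run_agent(goal, tools, plan_next_step, max_steps=10):
    """Iterate think -> act -> observe until the planner signals done."""
    state = {"goal": goal, "observations": []}
    for _ in range(max_steps):
        step = plan_next_step(state)                  # "think": choose a tool
        if step is None:                              # planner judges goal met
            break
        result = tools[step["tool"]](**step["args"])  # "act"
        state["observations"].append(result)          # "observe"; the next
    return state                                      # plan sees the new state

# Toy deterministic planner for illustration: classify, then extract, then stop.
def toy_planner(state):
    done = {o["step"] for o in state["observations"]}
    if "classify" not in done:
        return {"tool": "classify", "args": {"doc": "k1.pdf"}}
    if "extract" not in done:
        return {"tool": "extract", "args": {"doc": "k1.pdf"}}
    return None

tools = {
    "classify": lambda doc: {"step": "classify", "doc_type": "K-1"},
    "extract": lambda doc: {"step": "extract",
                            "fields": {"ordinary_income": 120_000}},
}

final = run_agent("spread borrower financials", tools, toy_planner)
```

The "revise" step needs no special code: because the planner re-reads the full state on every iteration, an unexpected observation naturally changes what it does next.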
In underwriting, this distinction matters because borrower packages are inherently unpredictable. No two submissions look the same. A rule-based pipeline that executes fixed instructions will choke on the variability. If a schedule is missing, the agent re-evaluates the document set and adjusts its extraction plan. If a form layout is unfamiliar, it reasons about field labels and context rather than failing silently. If information is incomplete or ambiguous, the agent can determine that it needs to return to the applicant asking for further details — orchestrating this autonomously or routing it through a human review step before proceeding. An agentic workflow handles that variability by design.
A typical commercial loan package runs 50 to 200+ pages: multi-year tax returns, interim P&Ls, balance sheets, bank statements, and supporting schedules. The Mortgage Bankers Association's 2025 performance report puts per-loan production costs at $11,094 — and those costs are rising even as volumes increase, because manual processes don't benefit from scale the way automated ones do.
Deloitte's research on credit journey optimisation found that banks can cut origination handling costs by 30–40% through a combination of process streamlining, governance simplification, and better use of technology and data. One Benelux bank reduced mortgage approval timelines from 15–20 days to 3–5 days by digitising credit review, collateral valuation, and underwriting. Across these journeys, the recurring bottleneck is manual data entry.
Traditional OCR with template matching was supposed to help, but it breaks constantly. Different CPAs format financials differently, borrowers submit scans at odd angles, and K-1s from partnerships rarely look the same twice. Template-based systems can't reason about document intent or detect gaps across a package. An agentic approach handles this variability because it adapts its extraction plan to the documents it encounters.
Tax forms — 1040s, 1065s, 1120/1120-S, and K-1s — demand multi-year, multi-entity extraction at the schedule level. The agent might need to pull depreciation add-backs from a 1065 and reconcile K-1 distributions across entities.
P&Ls, balance sheets, and cash flow statements range from QuickBooks exports to CPA-prepared compilations with inconsistent line items and mixed accounting bases.
Bank statements serve as ground truth for cash flow verification. Underwriters cross-reference them against reported income to spot discrepancies. No two borrower packages look the same — the agent must reason about document intent, not just layout.
A production agentic pipeline breaks into four stages. Rather than a rigid directed acyclic graph where every step is predefined, the agent orchestrates these stages dynamically — deciding which sub-tasks to invoke, evaluating results, and looping back when intermediate outcomes require a different approach. This follows the reasoning-and-acting pattern described by Yao et al. in the ReAct framework (ICLR 2023), where agents interleave reasoning traces with task-specific actions.
In the first stage, classification and routing, the agent identifies each document's type, tax year, and entity, then routes it to the correct extraction schema. In an agentic workflow, this isn't a one-shot classification: if downstream extraction reveals the initial classification was wrong (say, a K-1 misidentified as a Schedule E), the agent can re-classify and re-route without human intervention.
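A minimal sketch of that re-routing behaviour, with a keyword classifier standing in for the model. The schemas, phrases, and scoring are illustrative; a production system would use an LLM or layout model in place of the phrase counts.

```python
# Illustrative schemas: each document type maps to the phrases its
# extraction schema expects to find on the page.
SCHEMAS = {
    "K-1": ["partner's share", "ordinary business income"],
    "Schedule E": ["rents received", "royalties"],
}

def classify(text, exclude=None):
    """Stand-in for an LLM classifier: score each type by phrase hits."""
    scores = {t: sum(p in text.lower() for p in phrases)
              for t, phrases in SCHEMAS.items() if t != exclude}
    return max(scores, key=scores.get)

def extract(text, doc_type):
    """Return each expected phrase if found on the page, else None."""
    return {p: (p if p in text.lower() else None) for p in SCHEMAS[doc_type]}

def classify_and_extract(text):
    doc_type = classify(text)
    fields = extract(text, doc_type)
    if not any(fields.values()):        # nothing matched: label is suspect,
        doc_type = classify(text, exclude=doc_type)   # so re-classify
        fields = extract(text, doc_type)
    return doc_type, fields
```

The key design point is that classification is revisable state, not a fire-and-forget first step.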
In the second stage, extraction, an LLM maps document content to structured JSON schemas aligned to spread fields. Research on document understanding, such as LayoutLM's joint modelling of text and spatial layout (Xu et al., ACM SIGKDD 2020), demonstrates that models combining textual and positional signals significantly outperform text-only approaches on form extraction tasks. Modern language models take this further by parsing semantic meaning from form labels, mapping a field like "Net ordinary business income (loss)" to the correct spread line even when scan quality is poor.
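In practice the target schema is declared up front and the model's JSON is validated against it before anything enters the spreading model. A sketch, with illustrative field names (these are examples, not a standard spread layout):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Form1065Spread:
    """Illustrative target schema for a 1065 extraction."""
    tax_year: int
    gross_receipts: Optional[float] = None
    ordinary_business_income: Optional[float] = None
    depreciation: Optional[float] = None        # add-back candidate
    guaranteed_payments: Optional[float] = None

def from_llm_json(payload: dict) -> Form1065Spread:
    """Drop unknown keys so a chatty model can't pollute the spread."""
    known = Form1065Spread.__dataclass_fields__.keys()
    return Form1065Spread(**{k: v for k, v in payload.items() if k in known})
```

Unextracted fields stay `None` rather than defaulting to zero, so the completeness stage can tell "missing" apart from "reported as nil".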
In the third stage, cross-referencing, the agent checks whether P&L revenue matches the 1040 Schedule C, whether balance sheet assets minus liabilities equal stated equity, and whether EBITDA reconciles across periods. When discrepancies surface, the workflow doesn't just flag them: it reasons about potential causes (rounding, basis differences, amended returns) and determines whether to escalate, auto-accept, or return to the applicant for clarification.
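The deterministic part of those checks is plain arithmetic; the agent's job is deciding what to do when a check fails. Two representative checks, with tolerances chosen for illustration:

```python
def check_balance_sheet(assets, liabilities, equity, tol=1.0):
    """Assets minus liabilities should equal stated equity, within rounding."""
    gap = assets - liabilities - equity
    return abs(gap) <= tol, gap

def check_revenue_match(pnl_revenue, schedule_c_gross, tol_pct=0.01):
    """P&L revenue vs. Schedule C gross receipts, as a relative gap."""
    base = max(abs(schedule_c_gross), 1.0)
    rel_gap = abs(pnl_revenue - schedule_c_gross) / base
    return rel_gap <= tol_pct, rel_gap
```

Returning the gap alongside the boolean matters: the magnitude and sign of the difference are exactly the evidence the agent reasons over when attributing a cause.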
The fourth stage is completeness checking: missing schedules, unsigned forms, mismatched fiscal years, and incomplete K-1 sets are surfaced before the underwriter opens the file. When the agent identifies missing information that cannot be inferred or resolved from the existing package, it can autonomously initiate a request back to the applicant for further details, or route that request through a human review step, depending on the configured level of autonomy. The underwriter's time shifts toward reviewing exceptions rather than keying data.
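A completeness check against a required-documents policy can be a small pure function. The counts below are illustrative; real requirements come from the lender's credit guidelines, not from code.

```python
# Illustrative policy: how many of each document type a package needs
# (three years of returns, twelve months of bank statements, etc.).
REQUIRED = {"1040": 3, "K-1": 3, "P&L": 1, "bank_statement": 12}

def completeness_gaps(received: dict) -> dict:
    """Map each under-supplied document type to how many are still missing."""
    return {doc: need - received.get(doc, 0)
            for doc, need in REQUIRED.items()
            if received.get(doc, 0) < need}
```

The returned gap map is what the agent turns into a request back to the applicant, or into an item on the human review queue.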
The architectural choice that matters most is orchestration: a supervisor agent that plans dynamically, because underwriting documents require adaptive routing. A missing schedule might mean re-classifying a document that was initially skipped, the same iterative loop that runs across the four stages above.
Extracted fields map to a standardised spreading model that feeds DSCR, DTI, leverage, and liquidity calculations, with output as JSON for direct LOS ingestion or Excel for analyst override.
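The ratio calculations downstream of the spread are straightforward; two of the four named here, as a sketch (DSCR and debt-to-EBITDA leverage, using their standard definitions):

```python
def dscr(net_operating_income, annual_debt_service):
    """Debt service coverage ratio: NOI over total annual debt service."""
    if annual_debt_service <= 0:
        raise ValueError("annual debt service must be positive")
    return net_operating_income / annual_debt_service

def leverage(total_debt, ebitda):
    """Debt-to-EBITDA, a common leverage measure in commercial credit."""
    if ebitda <= 0:
        raise ValueError("EBITDA must be positive for this ratio")
    return total_debt / ebitda
```

Because the inputs come from the standardised spreading model, the same functions run unchanged whether the output is destined for LOS ingestion or an analyst's Excel override.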
When numbers don't add up, the correct strategy is to flag with confidence scores and source citations rather than silently correct. Some mismatches are legitimate — rounding differences, accrual vs. cash basis, amended returns. Discrepancies below a materiality threshold get auto-accepted with logging; everything above gets queued for underwriter review with source pages highlighted.
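The triage rule described here can be sketched directly; the 1% materiality threshold is illustrative, and in practice it would be configured per field and per lender:

```python
from dataclasses import dataclass

@dataclass
class Discrepancy:
    field: str
    reported: float      # value keyed on the source form
    derived: float       # value computed from other documents
    source_page: int     # page to highlight for the reviewer

def triage(d: Discrepancy, materiality=0.01):
    """Auto-accept immaterial gaps (with logging); queue the rest."""
    base = max(abs(d.reported), abs(d.derived), 1.0)
    rel_gap = abs(d.reported - d.derived) / base
    return "auto_accept" if rel_gap <= materiality else "underwriter_review"
```

Carrying `source_page` through the triage decision is what lets the review queue highlight the exact page an underwriter needs, rather than the whole package.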
With this workflow, the underwriter's role changes: instead of entering data and checking sums, they review a pre-reconciled package and make judgment calls on the exceptions the agent couldn't resolve autonomously.
In regulated lending, black-box extraction is a non-starter. The Federal Reserve's SR 11-7 guidance on model risk management requires documentation "sufficiently detailed to allow parties unfamiliar with a model to understand how the model operates, as well as its limitations and key assumptions." SR 11-7 defines "model" broadly enough — quantitative methods processing input data into quantitative estimates — that extraction pipelines plausibly fall under its scope.
Every extracted value must trace back to a source page and bounding box. Full provenance logs — which field, from which page, at what confidence, and whether a human modified it — are the minimum for regulatory compliance. The OCC's Comptroller's Handbook for Commercial Loans similarly requires banks to establish minimum documentation standards and consistent underwriting procedures — expectations that extend naturally to automated extraction systems feeding those processes.
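A provenance record is a small, rigid structure; the fields below mirror the requirements just listed (source, location, confidence, human modification), with names chosen for illustration:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class Provenance:
    field_name: str
    value: float
    source_file: str
    page: int
    bbox: tuple            # (x0, y0, x1, y1) in page coordinates
    confidence: float
    modified_by_human: bool = False
    extracted_at: str = ""

def log_extraction(field_name, value, source_file, page, bbox, confidence):
    rec = Provenance(field_name, value, source_file, page, bbox, confidence,
                     extracted_at=datetime.now(timezone.utc).isoformat())
    return asdict(rec)     # in practice, append to an immutable audit store
```

Emitting one record per extracted value, rather than one per document, is what makes field-level traceback possible later.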
An agentic workflow must also log its reasoning — what it extracted and why it made each decision. If the agent re-classified a document or auto-accepted a discrepancy, the reasoning trace should be auditable.
Field-level extraction accuracy on common tax forms varies by model and form type. Teams should benchmark against a representative sample of real loan packages from their own portfolio, measuring both field-level accuracy and document-level completeness. The errors that matter most cluster in the fields requiring the most contextual reasoning — net income, add-backs, adjustments. A 95% headline number obscures the fact that the remaining 5% of errors tend to cluster in the fields that drive the credit decision.
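Per-field benchmarking, as opposed to a single headline number, is a short exercise once labelled packages exist. A sketch, assuming numeric fields and a labelling tolerance of half a unit:

```python
def field_accuracy(predictions, labels, tol=0.5):
    """Per-field accuracy across documents: a prediction counts as correct
    when it is present and within tol of the labelled value."""
    hits, totals = {}, {}
    for pred, gold in zip(predictions, labels):
        for fname, true_val in gold.items():
            totals[fname] = totals.get(fname, 0) + 1
            got = pred.get(fname)
            if got is not None and abs(got - true_val) <= tol:
                hits[fname] = hits.get(fname, 0) + 1
    return {f: hits.get(f, 0) / totals[f] for f in totals}
```

Reporting the result per field is the point: it makes visible whether the errors sit in easy fields or in the net-income and add-back lines that drive the credit decision.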
Interested in fast, accurate data extraction from financial statements without the hassle? Financial Statements AI has everything you need. Sign up here for a free trial.
Author: Martin Goodson is a former Oxford University scientific researcher and has led AI research at several organisations. He is a member of the advisory group for the University College London generative AI Hub. In 2019, he was elected Chair of the Data Science and AI Section of the Royal Statistical Society, the membership group representing professional data scientists in the UK. Martin is the CEO of the multiple award-winning data extraction firm Evolution AI. He also leads the London Machine Learning Meetup, the largest AI & machine learning community in Europe.