June 29, 2026 in Thought leadership

The Question IMBs Should Be Asking Before Their Next AI Investment

For independent mortgage banks, the real question is whether AI can reduce cost per loan, limit rework and hold up across a regulated mortgage pipeline.

Every AI vendor in the mortgage market right now will tell you their product works for mortgage. Some of them are partially right. Many have a general-purpose model that can handle certain tasks reasonably well, but the picture changes quickly when that same model is expected to perform across the full range of work a real mortgage pipeline requires.

Because the demo is almost always designed to showcase the cleanest use cases, not the hardest ones, independent mortgage banks can leave the evaluation process with a misleading sense of confidence. The question most IMBs are not asking before they sign is the one that actually determines whether the investment pays off:

Does this AI perform consistently across the full range of tasks the pipeline requires, or does it perform well on the easy ones and degrade on the ones that consume the most time, create the most rework, and carry the most compliance weight?

For an independent mortgage bank, that distinction does not stay theoretical for long. It shows up in cost per loan, exception volume, fulfillment drag, and whether the business actually gains operating leverage or simply adds another layer of technology to manage.

“Built for mortgage” has become a marketing claim, not a technical specification

The AI market is moving quickly enough that terminology is doing a lot of work. Nearly every vendor with an AI model and a mortgage pitch is now described as purpose-built for lending, yet that label rarely tells an IMB operations leader what they actually need to know about performance on complex files, guideline nuance, or consistency once the workflow moves beyond a polished demo.

That matters because mortgage origination is not one task. It is a sequence of high-stakes tasks, from document extraction and income calculation to compliance review and borrower communication, and each one carries different accuracy requirements and consequences when something goes wrong.

To understand where automated models really need to hold up, IMB leaders need a clearer view of where AI, and what type of AI, makes the most sense at each stage of the mortgage origination lifecycle.

General AI models, even well-regarded ones, do not perform evenly across those tasks. A model that reliably extracts values from a straightforward pay stub may struggle with self-employed income, rental income, or a recent employment gap, and for an IMB, those are often the exact scenarios where cost per loan, manual rework, and operating leverage show up most clearly.

When that uneven performance forces processors and underwriters to re-check the same work on more complex files, the business has not meaningfully reduced effort. It has simply added another system while much of the same labor remains underneath it.

Why pilots give IMBs a false sense of confidence

Most AI implementations in mortgage follow a familiar path. A vendor runs a pilot on a set of loan files, the AI performs well, and the IMB moves toward a broader contract with the assumption that the production environment will look roughly the same. What often goes unexamined, however, is whether the pilot reflects the true complexity of the live pipeline. Pilots are often built around clean, typical files rather than the full distribution of complexity an IMB sees in production.

That is why AI can look accurate in a pilot even when the hardest loans were never part of the sample. The self-employed borrower with declining business income, the file with multiple co-borrowers and a recent job change, or the loan that requires cross-referencing several investor overlays at once are exactly the scenarios where performance tends to degrade, and they are also the scenarios where processors and underwriters spend the most time.

For IMBs, those are also the files that determine whether technology actually lowers cost per loan, reduces manual rework, and improves operating leverage. A pilot can validate that a model works well on the middle of the distribution while saying very little about how it will behave on the files that drive the most cost and margin pressure once the IMB is live.

The evaluation questions IMBs should bring to every AI conversation

Changing this outcome requires changing the evaluation process before the investment decision is made. The right evaluation does not start with a demo. It starts with a clear understanding of the IMB’s own pipeline, including which loan types consume the most processor time, which borrower profiles generate the most exceptions, and which scenarios create the greatest operational drag.

Those are the files the AI needs to perform on, not the files that happen to appear in a vendor-curated sample. Once the conversation starts there, the questions that matter begin to look very different from what typically gets asked in a standard AI vendor evaluation.

1. Accuracy across loan types

IMBs should ask for performance data segmented by borrower profile, not just overall accuracy, because a model that is highly accurate on W-2 salaried borrowers and materially weaker on self-employed borrowers has a very different production value depending on the IMB’s mix. Aggregate accuracy can conceal exactly the variance that matters most.
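As a rough illustration of how an aggregate figure can hide that variance, the sketch below computes accuracy overall and by borrower profile. The counts, profile names, and the `accuracy_by_segment` helper are all hypothetical and chosen only to make the arithmetic visible; they are not drawn from any real pilot.

```python
from collections import defaultdict

# Hypothetical pilot results: (borrower_profile, model_output_was_correct).
# The counts are invented for illustration, not taken from a real pilot.
results = (
    [("w2_salaried", True)] * 78
    + [("w2_salaried", False)] * 2
    + [("self_employed", True)] * 12
    + [("self_employed", False)] * 8
)


def accuracy_by_segment(rows):
    """Return accuracy per borrower profile instead of one blended number."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for profile, ok in rows:
        totals[profile] += 1
        correct[profile] += int(ok)
    return {profile: correct[profile] / totals[profile] for profile in totals}


overall = sum(int(ok) for _, ok in results) / len(results)
by_segment = accuracy_by_segment(results)

print(f"overall: {overall:.1%}")
for profile, accuracy in by_segment.items():
    print(f"{profile}: {accuracy:.1%}")
```

With these invented counts, the blended figure is 90 percent, while the self-employed segment sits at 60 percent. An IMB whose pipeline leans toward self-employed borrowers would see production results much closer to the second number than the first.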

2. Manual rework

IMBs should ask whether the system actually removes effort on the files that create the most operational drag, or whether it simply automates the easiest steps while teams still re-check the same work on the hardest loans. For an IMB, that distinction is critical because the goal is not surface-level automation. The goal is to lower cost per loan, reduce manual rework and exception handling, and create operating leverage without adding more systems or more headcount to manage around the same bottlenecks.

3. Auditability

Before evaluating what the AI can do, IMBs should evaluate how it explains what it did. They should ask to see the audit log for a flagged condition and understand what a compliance reviewer would actually see, because if the reasoning is opaque, the exposure is real regardless of how accurate the output appears inside a controlled environment.

4. Guideline specificity

Vendors should be able to walk through how the AI handles a specific agency guideline, lender overlay, or recent guideline change. A model reasoning against a static training set will not keep pace with the guideline updates an IMB operation has to absorb in real time, and that gap can quickly erase the efficiency the IMB expected to gain.

5. Integration depth

AI that connects to the loan origination system (LOS) through an API is not the same as AI that operates with full workflow context. The former may read and write to the file, while the latter can reason across the full loan in context, which is much closer to the level of integration required for meaningful operational lift rather than surface-level automation.

6. Failure modes

What happens when the AI encounters a file it cannot process with confidence? The answer reveals a great deal about whether the system was designed for a regulated production environment or for a demo environment where edge cases are quietly handled manually off-screen.
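One answer an IMB might hope to hear is that low-confidence outputs are routed to a person, with the reason recorded. The sketch below is a minimal version of that idea, not any vendor's actual design. It assumes the model reports a confidence score between 0 and 1, and the `Extraction` type, the `route` function, the queue names, and the 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    loan_id: str
    field: str
    value: str
    confidence: float  # assumed: model-reported score between 0 and 1


# Illustrative threshold only; a real value would be set per field and per risk.
REVIEW_THRESHOLD = 0.90


def route(item: Extraction, threshold: float = REVIEW_THRESHOLD) -> dict:
    """Send confident outputs forward and everything else to a human queue.

    Every decision carries a reason so a reviewer can see why it was made.
    """
    if item.confidence >= threshold:
        queue = "auto_accept"
        reason = f"confidence {item.confidence:.2f} meets threshold {threshold:.2f}"
    else:
        queue = "human_review"
        reason = f"confidence {item.confidence:.2f} below threshold {threshold:.2f}"
    return {
        "loan_id": item.loan_id,
        "field": item.field,
        "queue": queue,
        "reason": reason,
    }


clean = Extraction("LN-001", "base_income", "6200.00", 0.98)
hard = Extraction("LN-002", "self_employed_income", "4100.00", 0.62)

print(route(clean))
print(route(hard))
```

The design choice worth probing with a vendor is the explicit fallback: the uncertain file is neither silently accepted nor silently dropped, and the recorded reason becomes part of the audit trail.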

These are also the questions that matter most as more IMBs explore AI agents for mortgage processing. The value of agentic systems is not simply that they can automate multiple steps, but that they can do so with enough consistency, control, and transparency to support real IMB operating workflows rather than isolated experiments.

The evaluation bar that actually matters

The AI market is not short on products that claim to work for mortgage. It is short on products that were actually designed for what mortgage requires: consistent accuracy across a diverse set of tasks, inside a regulated environment, at scale, with clear explainability attached to every action.

For IMBs, that is a harder bar than most vendors describe in a first conversation, but it is also the bar that separates AI that improves cost per loan from AI that adds cost while leaving the same work in the hands of processors and underwriters. That is why the broader conversation around AI adoption in mortgage lending needs to mature from capability claims to production standards.

The next phase is not about whether AI can assist with isolated tasks. It is about whether an independent mortgage bank can trust it inside real production environments to lower cost per loan, reduce manual rework on complex files, and create operating leverage without introducing new compliance risk or new operating drag. That is the question IMBs need to bring to the table before the next investment decision gets made.

The question is not whether an AI can do mortgage. The question is whether it performs consistently across the full range of tasks the pipeline requires, in the loan scenarios that actually drive processing cost, and in an environment where every output becomes part of a documented audit record. That bar is higher than the demo suggests, and it is exactly the bar IMBs need to set before they sign.

Demand more from your next AI investment

As IMBs work toward more unified operating ecosystems, point solutions that claim to automate isolated tasks can add complexity across systems instead of real efficiency. True optimization requires a clearer understanding of where automated guardrails end, where human judgment still matters, and how the two should work together inside the same operating model.

For IMBs, the real opportunity is not to buy the most impressive demo. It is to invest in AI that can reduce manual rework on complex files, support auditability in regulated workflows, and create the kind of operating leverage that holds up when margins are tight and volume conditions change.
