The Real Cost of Manual Data Extraction

Miranda Hartley
July 21, 2023

Manual extraction is defined as using a human operative to capture and record data. For many companies, data extraction is expensive but ultimately inevitable. Using employees to extract data from bank statements, financial statements, invoices and quarterly reports is common practice despite an abundance of technological alternatives.

As companies expand, many simply hire more full-time equivalents (FTEs) for financial data extraction. However, in the long term, manual data extraction can counteract growth due to the high cost of human error. In 2008, IDC estimated human error cost US businesses £315 per employee.

Human mistakes are the natural result of manual data extraction's understimulating nature. Successful completion of data extraction tasks requires:

  • Sustained concentration
  • Fast typing - ideally over 40-45 words per minute
  • Scrupulous attention to detail

Yet errors can be critical in most professions. Therefore, finding a solution that is both sustainable and - preferably - 100% accurate is key.

A deeper dive: Automating extraction from invoices

According to a 2019 Billentis report, out of the 550 billion invoices generated, only 55 billion were electronic or paperless. Consequently, data extraction often takes place on poor-quality PDF scans. Extraction is a complex process that is expected to stay relevant for decades to come.

In a sense, the term 'extraction' is misleading, as it suggests a single action. Broken down into steps, the process of invoice data extraction looks something like this:

Human error mostly occurs in steps 1-2; thus, companies take precautions during step 3 - verification. One of Evolution AI’s clients - DF Capital - employed a ‘four eye check’ to extract data from invoices, whereby captured data required approval from a colleague. Double-key entry (where two people manually enter data, and the software identifies any discrepancies in the recorded data) has also become an increasingly popular data extraction solution.
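
To make the double-key idea concrete, here is a minimal Python sketch of the comparison step. It assumes each operator's entry is stored as a simple mapping of field name to typed value. The function name, field names and invoice values are hypothetical illustrations, not taken from any particular product.

```python
from typing import Dict, List, Tuple


def find_discrepancies(
    entry_a: Dict[str, str], entry_b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (field, value_a, value_b) for every field where the two entries disagree.

    A field missing from one entry is treated as an empty string, so it is
    flagged whenever the other operator typed a value for it.
    """
    mismatches = []
    for field in sorted(set(entry_a) | set(entry_b)):
        value_a = entry_a.get(field, "").strip()
        value_b = entry_b.get(field, "").strip()
        if value_a != value_b:
            mismatches.append((field, value_a, value_b))
    return mismatches


# Hypothetical invoice keyed independently by two operators.
first_pass = {"invoice_number": "INV-1042", "total": "1250.00", "due_date": "2023-08-15"}
second_pass = {"invoice_number": "INV-1042", "total": "1520.00", "due_date": "2023-08-15"}

discrepancies = find_discrepancies(first_pass, second_pass)
for field, a, b in discrepancies:
    print(f"{field}: operator A entered {a!r}, operator B entered {b!r}")
```

In this sketch only the transposed total is flagged for review. Note that the check cannot catch a mistake that both operators make identically, which is one reason double-key entry reduces errors rather than eliminating them.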

Until the last decade or so, it was believed that machines were incapable of completing step 1. However, the mainstream deployment of NLP (natural language processing) has shown that large language models can both generate and comprehend language at a level superior to that of many humans. Using AI to read PDFs has transitioned from an experimental solution to an almost universally viable one.

AI-powered data extraction can offer a unified approach to the extraction process. Instead of using multiple employees and systems to capture data, AI automation can do it in seconds.

Large organisations, in particular, can benefit from AI-based data extraction to unify various internal systems. For example, one of our clients, Novuna, utilised several cross-silo processes before deploying automated AI data extraction.

It is undeniable that holistic automation saves time and money. How much money varies - for example, an HFS Report stated that automation is destined to save Novuna millions of pounds. Forbes reported that automated intelligent document processing could save companies 40-70% of the costs of data extraction.

However, some organisations hesitate to abandon manual extraction, believing training human operatives is easier than implementing technological solutions.

Is the effort required to train and correct AI data extraction tools equivalent to manual data extraction?

Until a few years ago, AI solutions for data extraction required installation on-premises and then significant training. The effort of arranging physical implementation made manual data extraction seem like a comparatively low-effort solution.

Zero-shot learning, however, eliminates the need for training the model, meaning no training data is required for some document types (for example, bank statements and invoices). For documents with many unique and varying fields, training will be necessary but can be completed quickly.

In terms of data extraction, is human error roughly equivalent to machine error?

Given the average attention span of 14 minutes, it is inevitable that human errors occur in data extraction. Although the error rate of manual capture is difficult to estimate (often suggested to be around 1%), such mistakes can be extremely costly. For example, the decision-making process in commercial finance can be skewed through incorrect data entry.
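
A rough worked example shows why even a small error rate matters. The sketch below assumes the 1% figure applies per field and that errors in different fields are independent; both are simplifying assumptions, and the field counts are hypothetical.

```python
def chance_of_at_least_one_error(
    fields_per_document: int, per_field_error_rate: float = 0.01
) -> float:
    """Probability that a document contains at least one mis-keyed field.

    Assumes every field is wrong independently with the same probability,
    so the chance that all fields are correct is (1 - rate) ** fields.
    """
    return 1.0 - (1.0 - per_field_error_rate) ** fields_per_document


# Hypothetical document sizes, from a short invoice to a long statement.
for fields in (10, 50, 100):
    probability = chance_of_at_least_one_error(fields)
    print(f"{fields} fields: {probability:.1%} chance of at least one error")
```

Under these assumptions, a document with 50 keyed fields has roughly a 39% chance of containing at least one mistake, even though each individual field is 99% likely to be correct.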

In contrast, many AI data extraction software companies offer complete accuracy in extracted data. The accuracy of automated data extraction improves with the number of training documents, as the AI learns from its mistakes and never repeats them. Therefore, though machine learning is capable of producing mistakes, they can be reduced by correctly training the model (something that is simply not achievable with human employees).

A diagram of the accuracy of extracted documents vs the number of training documents.

With the rate at which AI data extraction software is advancing, will it eventually become obsolete?

Since ChatGPT was released in November 2022, public awareness of AI has surged. Yet unmaintained AI-powered data extraction software will rapidly become outdated. Consequently, it is important to choose a data extraction provider dedicated to maintaining their technology.

In summary:
  • The complexity of the data extraction process facilitates human mistakes during manual data extraction.
  • Human error in manual extraction can hinder growth and incur substantial costs.
  • AI document extraction from research-focused providers offers a sustainable solution.

If you'd like to speak to one of our experts about transitioning from manual data extraction to an intelligent solution, please book a demo or contact us at

Want to discover more about AI-powered data extraction? Check out our other articles:

Automated invoice processing - how to process invoices at scale

5 Common Myths About AI-Based Data Extraction Debunked