Extracting data from 180,000 pages per day in real time
Dun & Bradstreet is a leading global data provider that is adept at using cutting-edge technology to source timely and accurate data. Shareholder information on all five million UK companies, held at Companies House, is a critical resource for its customers.
What was the problem?
Documents at the UK corporate registry, Companies House, are often poor-quality scans and PDFs, which makes data extraction expensive and time-consuming. Dun & Bradstreet's previous supplier manually keyed in the data from annual returns and confirmation statements. Automating this process was challenging because of the many variations in document layout.
As Patrick Walsh, Dun & Bradstreet’s Public Registry Data Leader, explains, “The main challenge for automation was dealing with exceptions. Data collection from any source will follow general rules. However, it's managing exceptional cases efficiently that defines a successful project.”
What was our solution?
Evolution AI’s proprietary OCR was built specifically to handle poorly scanned financial documents. Reading poor-quality scans accurately made it possible to process these challenging documents automatically.
“Achieving 99.8% accuracy with minimal latency is impressive.” - Patrick Walsh, Public Registry Data Leader
Handwriting and complex multi-page tables presented additional hurdles. Evolution AI's flexible software allows accurate extraction from even these recalcitrant elements. AI extraction accuracy reached 99.8%, and any remaining exceptions were handled by a human operator via the software’s QA workflow. The end-to-end approach allowed Dun & Bradstreet to reduce its data entry team from 40 staff to just two human operators.
Patrick concluded, “Dun & Bradstreet has high standards, our customers expect nothing less. Achieving 99.8% accuracy with minimal latency is impressive. I found the team responsive and innovative when dealing with challenges as we worked towards go-live. Evolution AI approached all requests and feedback from a consumer focused lens.”