Case study
Dun & Bradstreet

Data extraction from web pages for Dun & Bradstreet

About the project

Managed Service

The work of 56 FTEs was saved

Dun & Bradstreet holds vast amounts of information about companies, which it curates, classifies and turns into products used by hundreds of clients. Until now, this classification process was labour-intensive and expensive because it relied on extensive web-based research, trawling public records and delving into big data. Evolution AI built a system that automatically classifies companies into Standard Industrial Classification (SIC) codes, saving Dun & Bradstreet 100,000 hours of work a year (56 FTEs).

How it works

How do humans know whether ‘penguin’ is a bird, a book, a chocolate biscuit, or a famous publishing company?

We use our general knowledge plus context. So when ‘penguin’ is surrounded by words such as Antarctic, fish, and ice, we assume it’s a waddling flightless bird. But when we find ‘penguin’ amongst concepts like publishing, fiction, and books, we know it is likely to be a company.

Evolution AI's software has a similar understanding of context. This allowed our technology to solve a tough problem for commercial data provider Dun & Bradstreet: how to rapidly and accurately sort millions of companies into industry categories.

Learning ‘industry jargon’

Evolution AI’s system autonomously seeks information across the Internet, much like a human researcher. By reading web pages relevant to topics of interest (e.g., accountancy, publishing or zoos), it learns the jargon of each industry, and can even actively fill gaps in its knowledge through further research. After reading huge quantities of text, it recognises how words are used in many different contexts.

“Evolution AI has had a fantastic reception from senior management. They're seeing significant improvements to our data, which are globally scalable and at a very reasonable cost.”
Andy Crisp, Global Data Lead, Dun & Bradstreet

Primed with this knowledge, the technology can understand the true meaning of the information Dun & Bradstreet holds about each company. The system decides how to categorise the company by comparing how closely this information matches the ‘language fingerprints’ it has learned to associate with various industries.
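To make the comparison step concrete, here is a deliberately simplified sketch. It assumes each industry's ‘language fingerprint’ is a plain word-frequency vector and that closeness is measured with cosine similarity. Both are illustrative assumptions, not a description of Evolution AI's actual models: a word-count vector ignores word order and context, which the real system is described as understanding. The function names and the toy training text are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def fingerprint(sample_texts):
    """Build a word-frequency vector from sample texts about one industry."""
    counts = Counter()
    for text in sample_texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(company_text, fingerprints):
    """Return the best-matching industry and the score for every industry."""
    doc = Counter(tokenize(company_text))
    scores = {name: cosine(doc, fp) for name, fp in fingerprints.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy fingerprints (made-up text, for illustration only).
FINGERPRINTS = {
    "accountancy": fingerprint([
        "audit tax returns bookkeeping payroll accounts",
        "tax advice and audit for small business accounts",
    ]),
    "fish and chip shop": fingerprint([
        "fresh cod and chips takeaway menu",
        "fried fish chips mushy peas takeaway",
    ]),
}

label, scores = classify(
    "The Fish Partnership: tax returns, audit and payroll for small business",
    FINGERPRINTS,
)
print(label, scores)
```

In this toy run the word ‘fish’ in the company name contributes little, because the rest of the text overlaps far more with the accountancy fingerprint.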

Take ‘The Fish Partnership’. Traditional software programs might be misled into listing it as a fish & chip shop because they score the words they find on its website in a disjointed way without context. Reading exactly the same website text, the Evolution AI system correctly classifies it as a firm of accountants because it has learned to understand the ‘language of accountancy’.  

Humans can also be misled. For example, a UK PR firm that works solely with cosmetics brands is often wrongly listed by human researchers as a cosmetics company because its website features so many glossy adverts for make-up and bath products. Evolution AI’s system correctly labels it as a PR company because the system recognises the ‘language of PR’ in the text of its website pages. 

All classification decisions are automatically tagged with a confidence score. When the system can’t classify a company confidently, staff in Dun & Bradstreet’s validation team focus their attention on researching that case. From this ongoing human feedback, the system keeps learning about changes to companies and categories, and keeps improving its results.
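The routing described here can be sketched as a simple threshold on the confidence score. The threshold value, the `Decision` fields and the example records below are all hypothetical; the case study does not state how confidence is computed or where the cut-off sits.

```python
from dataclasses import dataclass

# Hypothetical cut-off: decisions scoring below it go to human validation.
REVIEW_THRESHOLD = 0.8


@dataclass
class Decision:
    company: str
    category: str      # predicted industry category
    confidence: float  # score between 0 and 1


def route(decisions, threshold=REVIEW_THRESHOLD):
    """Split decisions into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for decision in decisions:
        if decision.confidence >= threshold:
            accepted.append(decision)
        else:
            needs_review.append(decision)
    return accepted, needs_review


# Made-up example records.
decisions = [
    Decision("The Fish Partnership", "accountancy", 0.93),
    Decision("Example PR Ltd", "public relations", 0.55),
]
accepted, needs_review = route(decisions)
print([d.company for d in accepted])
print([d.company for d in needs_review])
```

Corrections made by the validation team on the `needs_review` cases could then be fed back as new training examples, which is the feedback loop the text describes.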

Dun & Bradstreet uses the system to update and verify its database of UK and US companies. Previously, a large team of phone researchers took a year to check that the 25 million firms were correctly listed under around 1,000 industry categories. The Evolution AI system has saved the company about 50,000 hours of work and 28 FTEs.
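As a quick check on how hours relate to FTEs, the short calculation below divides the hours saved by the FTE count for the two pairs of figures quoted in this case study (100,000 hours and 56 FTEs; 50,000 hours and 28 FTEs). The function name is hypothetical, and the working-hours-per-FTE figure is derived here, not stated in the source.

```python
def hours_per_fte(hours_saved, ftes):
    """Hours of work that one FTE represents in the quoted figures."""
    return hours_saved / ftes


# (hours saved, FTEs) pairs as quoted in the case study.
quoted = {
    "100,000 hours / 56 FTEs": (100_000, 56),
    "50,000 hours / 28 FTEs": (50_000, 28),
}

for label, (hours, ftes) in quoted.items():
    print(f"{label}: about {hours_per_fte(hours, ftes):.0f} hours per FTE")
```

Both pairs work out to roughly 1,786 hours per FTE, so the two sets of figures use the same conversion rate.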
