Dun & Bradstreet holds vast amounts of information about companies, which it curates, classifies and turns into products used by hundreds of clients. Until now, this classification process was labour-intensive and expensive, as it relied on extensive web-based research, trawling through public records and delving into big data. Evolution AI built a system that automatically classifies companies into Standard Industrial Classification (SIC) codes, saving Dun & Bradstreet 100,000 hours of work a year (56 full-time equivalents, or FTEs).
How it works
How do humans know whether ‘penguin’ is a bird, a book, a chocolate biscuit, or a famous publishing company?
We use our general knowledge plus context. So when ‘penguin’ is surrounded by words such as Antarctic, fish, and ice, we assume it’s a waddling flightless bird. But when we find ‘penguin’ amongst concepts like publishing, fiction, and books, we know it most likely refers to the publishing company.
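To make the idea concrete, here is a toy Python sketch (not Evolution AI’s method) that picks a sense for ‘penguin’ by counting how many words in the rest of the sentence match a short list of cue words for each sense. The cue lists in `SENSE_CUES` and the function names are hypothetical; a real system would learn such associations from large amounts of text rather than rely on hand-written lists.

```python
import re
from collections import Counter

# Hypothetical cue words for each sense of "penguin". A real system would learn
# these associations from large amounts of text instead of hard-coding them.
SENSE_CUES = {
    "bird": {"antarctic", "fish", "ice", "flightless", "colony"},
    "publisher": {"publishing", "fiction", "books", "imprint", "authors"},
    "biscuit": {"chocolate", "biscuit", "wrapper", "snack"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(sentence: str, target: str = "penguin") -> str:
    """Return the sense whose cue words occur most often in the rest of the sentence."""
    context = Counter(t for t in tokenize(sentence) if t != target)
    scores = {
        sense: sum(context[cue] for cue in cues)
        for sense, cues in SENSE_CUES.items()
    }
    # max() returns the first sense in SENSE_CUES order if scores tie.
    return max(scores, key=scores.get)


print(disambiguate("The penguin dived under the Antarctic ice to catch fish."))  # bird
print(disambiguate("Penguin is publishing new fiction books this autumn."))  # publisher
```

Changing a few surrounding words flips the decision, which is the intuition described above. With no cue words present at all, this sketch simply falls back to the first sense listed.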
Evolution AI has built software with a similar understanding of context to solve a tough problem for commercial data provider Dun & Bradstreet: how to sort millions of companies into industry categories rapidly and accurately.
Learning ‘industry jargon’
Evolution AI’s system autonomously seeks out information across the Internet, much like a human researcher. By reading web pages relevant to topics of interest (e.g., accountancy, publishing or zoos), it learns the jargon of each industry, and it can even fill gaps in its knowledge by actively carrying out further research. After reading huge quantities of text, it recognises how words are used in many different contexts.
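As an illustration only, the Python sketch below shows one crude way to pull ‘jargon’ out of example pages: it counts words per industry and ranks those that are common in one industry but rare in the others. The tiny `CORPUS`, the stopword list and the scoring ratio are all assumptions made for this example; Evolution AI has not published how its system does this.

```python
import re
from collections import Counter

# Hypothetical stopword list: very common words that say nothing about an industry.
STOPWORDS = {"a", "and", "at", "by", "every", "for", "in", "on", "our", "the", "to", "we", "with"}

# Hypothetical example pages; a real system would read far more text per industry.
CORPUS = {
    "accountancy": [
        "We prepare tax returns and audit accounts for clients.",
        "Our accountants handle payroll, tax and audit work.",
    ],
    "publishing": [
        "We publish fiction and poetry books by new authors.",
        "Our editors work with authors on every fiction title.",
    ],
    "zoos": [
        "Visit our penguins and lions at the zoo.",
        "The zoo keepers feed the penguins fish every day.",
    ],
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into words and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def learn_jargon(pages_by_industry: dict[str, list[str]], top_n: int = 2) -> dict[str, list[str]]:
    """Return the top_n most distinctive words for each industry."""
    counts = {
        industry: Counter(tok for page in pages for tok in tokenize(page))
        for industry, pages in pages_by_industry.items()
    }
    jargon = {}
    for industry, own in counts.items():
        # Pool the word counts from every *other* industry.
        others = Counter()
        for other_industry, other_counts in counts.items():
            if other_industry != industry:
                others.update(other_counts)
        # A word scores highly if it is frequent here and rare elsewhere.
        scores = {term: n / (1 + others[term]) for term, n in own.items()}
        # Sort by score (descending), breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        jargon[industry] = ranked[:top_n]
    return jargon


for industry, terms in learn_jargon(CORPUS).items():
    print(industry, terms)
```

A word such as ‘work’, which turns up in both accountancy and publishing pages, is penalised, while ‘audit’ and ‘fiction’ rise to the top of their own industries.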
“Evolution AI has had a fantastic reception from senior management. They're seeing significant improvements to our data, which are globally scalable and at a very reasonable cost.” (Andy Crisp, Global Data Lead, Dun & Bradstreet)
Primed with this knowledge, the technology can understand the true meaning of the information Dun & Bradstreet holds about each company. The system decides how to categorise the company by comparing how closely this information matches the ‘language fingerprints’ it has learned to associate with various industries.
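One common way to compare a text against a ‘language fingerprint’ is cosine similarity between word-count vectors. The Python sketch below assumes that each fingerprint is simply a weighted vocabulary; the `FINGERPRINTS` weights and the example snippet are invented, and the real system almost certainly uses much richer representations.

```python
import math
import re
from collections import Counter

# Hypothetical industry fingerprints: weighted vocabularies. A real system
# would learn far richer representations from large volumes of web text.
FINGERPRINTS = {
    "accountancy": Counter({"tax": 3, "audit": 3, "accounts": 2, "payroll": 2}),
    "publishing": Counter({"books": 3, "fiction": 3, "authors": 2, "editors": 2}),
    "public relations": Counter({"pr": 3, "media": 3, "press": 2, "campaigns": 2}),
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_industries(text: str) -> list[tuple[str, float]]:
    """Score the text against every fingerprint, best match first."""
    doc = Counter(tokenize(text))
    scores = [(industry, cosine(doc, fp)) for industry, fp in FINGERPRINTS.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


snippet = "We run press and media campaigns for beauty brands."
best, score = rank_industries(snippet)[0]
print(best)  # public relations
```

Cosine similarity ignores how long the text is and looks only at the mix of words, which is why a short page full of press and media vocabulary still lands closest to the public relations profile.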
Take ‘The Fish Partnership’. Traditional software might be misled into listing it as a fish & chip shop, because it scores the words it finds on the company’s website in isolation, without context. Reading exactly the same website text, the Evolution AI system correctly classifies it as a firm of accountants because it has learned to understand the ‘language of accountancy’.
Humans can also be misled. A UK PR firm that works solely with cosmetics brands, for example, is often wrongly listed by human researchers as a cosmetics company because its website features so many glossy adverts for make-up and bath products. Evolution AI’s system correctly labels it as a PR company because the system recognises the ‘language of PR’ in the text of its website pages.
Every classification decision is automatically tagged with a confidence score. When the system cannot classify a company confidently, staff in Dun & Bradstreet’s validation team focus their attention on researching that case. Through this ongoing human feedback, the system keeps learning about changes to companies and categories, and its results keep improving.
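The confidence-based hand-off to human researchers could look something like the Python sketch below. The 0.6 threshold, the ‘share of total score’ definition of confidence, the `ReviewRouter` class, the scores and the example company ‘Example PR Ltd’ are all assumptions for illustration, not details of Dun & Bradstreet’s actual workflow.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off: decisions below this confidence go to human review.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Decision:
    company: str
    industry: str
    confidence: float


def decide(company: str, scores: dict[str, float]) -> Decision:
    """Pick the best-scoring industry; confidence is its share of the total score."""
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total > 0 else 0.0
    return Decision(company, best, confidence)


@dataclass
class ReviewRouter:
    threshold: float = CONFIDENCE_THRESHOLD
    accepted: list[Decision] = field(default_factory=list)
    review_queue: list[Decision] = field(default_factory=list)
    # (company, corrected industry) pairs that could be fed back into training.
    feedback: list[tuple[str, str]] = field(default_factory=list)

    def route(self, decision: Decision) -> None:
        """Auto-accept confident decisions; queue the rest for human researchers."""
        if decision.confidence >= self.threshold:
            self.accepted.append(decision)
        else:
            self.review_queue.append(decision)

    def record_review(self, decision: Decision, correct_industry: str) -> None:
        """Store a reviewer's verdict so it can be used as new training data."""
        self.review_queue.remove(decision)
        self.feedback.append((decision.company, correct_industry))


router = ReviewRouter()
# Invented scores: one clear-cut case and one ambiguous case.
router.route(decide("The Fish Partnership", {"accountancy": 8, "fish & chip shop": 1, "public relations": 1}))
router.route(decide("Example PR Ltd", {"cosmetics": 5, "public relations": 4, "retail": 1}))
for queued in list(router.review_queue):
    router.record_review(queued, "public relations")
print([d.company for d in router.accepted])  # ['The Fish Partnership']
print(router.feedback)  # [('Example PR Ltd', 'public relations')]
```

Only the ambiguous case reaches a person, and the reviewer’s answer is kept as a labelled example, which is one simple way the feedback loop described above could be wired up.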
Dun & Bradstreet uses the system to update and verify its database of UK and US companies. Previously, a large team of phone researchers took a year to check that its 25 million firms were correctly assigned to around 1,000 industry categories. The Evolution AI system has saved the company about 50,000 hours of work, or 28 FTEs.