Uniserv and DZ Bank investigate the use of AI to clean up master data duplicates

  • Automated duplicate cleansing to be even more efficient and faster

  • Manual effort should be greatly reduced for duplicate checks 

  • AI to support formation of Golden Records

Pforzheim, 13 November 2019 - To what extent can artificial intelligence (AI) methods clean up duplicate business partner master data fully automatically, without the intervention of data stewards? A joint project of Uniserv, a specialized provider of solutions for managing business partner data, the startup Recognai, and DZ Bank aims to answer this question. The collaboration will also investigate which type of AI model, based on supervised or unsupervised learning, achieves the best results in duplicate detection.

AI to automate duplicate cleansing

The project is currently in the proof-of-concept phase and aims to make automated duplicate cleansing even more efficient and faster. It uses Uniserv's master data management solution, the Customer Data Hub (CDH). The CDH identifies and cleans up master data duplicates among DZ Bank's business partners, i.e. it merges records that refer to the same partner. Even with this system in place, ambiguous duplicate candidates (Possible Data Matches) still have to be checked manually by data stewards, who decide whether each candidate is a real duplicate or not. Data changes and new records also regularly produce new duplicate candidates that must be processed manually. AI components that supplement the CDH as part of the project are intended to significantly reduce or even eliminate this manual effort.

AI systems need to be trained first

From the outset, it is important that the AI learns to distinguish between a duplicate and a non-duplicate, and to understand and apply the human decision-making process. A database of earlier decisions made by the data stewards is available for this purpose. "Before they can be used, AI systems naturally have to be trained. They learn from examples and are thus able to recognize patterns and regularities. In practice, this happens via various algorithms. After completion of the learning phase, the AI system can generalize and also evaluate unknown data," says Dr. Simone Braun, Head of Business Development at Uniserv, explaining the procedure.

The data steward's role changes to that of AI trainer

In the next step, the AI is to recognize master data duplicates according to the learned patterns and make decisions autonomously, without the intervention of a data steward. If the AI cannot decide with a specified level of certainty whether a candidate is a duplicate or a non-duplicate, the case is assessed by a data steward, and that decision is fed back to the system as feedback. In this way, routine duplicate processing is to be automated, freeing the data stewards to deal with more complicated cases, in particular those that deviate from the standard. They thus move from being duplicate editors to being trainers of the artificial intelligence.
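The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the function names, the 0.9 threshold, and the field-overlap score that stands in for a trained model are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def toy_score(a: Record, b: Record) -> float:
    """Hypothetical stand-in for a trained model: the share of fields
    whose values match after trimming and lower-casing."""
    keys = sorted(set(a) | set(b))
    if not keys:
        return 0.0
    same = sum(
        1
        for k in keys
        if a.get(k, "").strip().lower() == b.get(k, "").strip().lower()
    )
    return same / len(keys)


def route_candidate(duplicate_probability: float, threshold: float = 0.9) -> str:
    """Decide automatically only when the score is confident enough.

    The threshold (an assumed value) is applied symmetrically:
    high scores are merged, low scores are kept separate, and
    everything in between goes to a data steward.
    """
    if duplicate_probability >= threshold:
        return "merge"
    if duplicate_probability <= 1.0 - threshold:
        return "keep_separate"
    return "review"


def process(
    pairs: List[Pair],
    score: Callable[[Record, Record], float] = toy_score,
    threshold: float = 0.9,
) -> Tuple[List[str], List[Pair]]:
    """Return one action per pair plus the queue of pairs for manual review."""
    actions: List[str] = []
    review_queue: List[Pair] = []
    for a, b in pairs:
        action = route_candidate(score(a, b), threshold)
        if action == "review":
            review_queue.append((a, b))
        actions.append(action)
    return actions, review_queue


def record_feedback(
    training_data: List[Tuple[Pair, bool]], pair: Pair, is_duplicate: bool
) -> None:
    """Store a steward's decision as a labeled example for later retraining."""
    training_data.append((pair, is_duplicate))
```

In this sketch, only the pairs in the review queue reach a data steward, and each steward decision passed to `record_feedback` becomes a new labeled example for retraining.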

Further Uniserv initiatives examine the use of AI in business partner data management

Uniserv is already investigating the use of AI in data quality and master data management in other projects. As part of the European Data Pitch initiative, Uniserv developed software solutions based on deep learning methods together with the startups frosha and Recognai. The aim was to extract business-relevant information from unstructured and semi-structured business partner data. In the KOBRA research project, Uniserv is working with the Institute for Applied Computer Science at the University of Leipzig to investigate machine learning methods for automated and error-tolerant identity recognition. The recently launched research project DE4L (Data Economy 4 Advanced Logistics) deals with secure data exchange in logistics services.