“All for one and one for all!” This famous motto of the musketeers is a popular phrase for situations in which individual efforts achieve little, and it emphasises that success depends on all elements working together towards a common goal. It is precisely for this reason that it also applies to the measures a company needs to take to raise its data quality to the next level.
Athos, Porthos, Aramis (and later D’Artagnan) – these are the names of the four musketeers in literature and films. But in the world of data quality, their names change to Data Analysis, Data Cleansing, Data Protection, and Data Monitoring.
The first “musketeer”, Data Analysis, has the task of providing an overview of the current condition of a company’s data. For many companies, however, this first step is already a major challenge, because the data to be analysed is often scattered across different systems throughout the organization. The most important task of data analysis is to make reliable statements about the nature and quality of customer data, even when large volumes are involved.
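One basic measure behind such statements is field completeness: the share of records in which a field is actually filled. The Python sketch below computes it for a handful of records. The record layout and field names are hypothetical, and real profiling covers many more dimensions, such as formats, value distributions and uniqueness.

```python
from collections import Counter

# Hypothetical customer records; the field names are illustrative assumptions.
records = [
    {"name": "Anna Schmidt", "street": "Hauptstr. 1", "zip": "70173", "city": "Stuttgart"},
    {"name": "Ben Meyer", "street": "", "zip": "70174", "city": "Stuttgart"},
    {"name": "Clara Wolf", "street": "Bahnhofstr. 5", "zip": None, "city": "Berlin"},
    {"name": "", "street": "Ringweg 9", "zip": "10115", "city": "Berlin"},
]

def completeness(rows, fields):
    """Share of rows in which each field is filled (not None and not blank)."""
    filled = Counter()
    for row in rows:
        for name in fields:
            value = row.get(name)
            if value is not None and str(value).strip():
                filled[name] += 1
    return {name: filled[name] / len(rows) for name in fields}

profile = completeness(records, ["name", "street", "zip", "city"])
print(profile)
```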
As part of the data analysis, company-specific rules and parameters for enriching existing datasets are defined. Suitable filters and segmentation are used to identify outliers and anomalies so that they can be dealt with in the subsequent measures for improving quality.
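Rule-based filtering by segment can be sketched in a few lines of Python. The rule set (a postal-code pattern per country segment and a plausible birth-year range), the field names and the sample data are illustrative assumptions, not rules taken from any particular product.

```python
import re

# Hypothetical company-specific rules: a postal-code pattern per country
# segment and a plausible range for birth years.
RULES = {
    "postcode": {"DE": r"\d{5}", "AT": r"\d{4}", "CH": r"\d{4}"},
    "birth_year": (1900, 2010),
}

customers = [
    {"id": 1, "country": "DE", "postcode": "70173", "birth_year": 1985},
    {"id": 2, "country": "DE", "postcode": "7017", "birth_year": 1972},
    {"id": 3, "country": "AT", "postcode": "1010", "birth_year": 1890},
    {"id": 4, "country": "CH", "postcode": "8001", "birth_year": 1999},
    {"id": 5, "country": "FR", "postcode": "75001", "birth_year": 1960},
]

def find_anomalies(rows, rules):
    """Return (id, reason) pairs for records that break a segment rule."""
    low, high = rules["birth_year"]
    anomalies = []
    for row in rows:
        pattern = rules["postcode"].get(row["country"])
        if pattern is None:
            anomalies.append((row["id"], "no rule for segment"))
        elif not re.fullmatch(pattern, row["postcode"]):
            anomalies.append((row["id"], "postcode format"))
        if not low <= row["birth_year"] <= high:
            anomalies.append((row["id"], "implausible birth year"))
    return anomalies

for record_id, reason in find_anomalies(customers, RULES):
    print(record_id, reason)
```

Records from a segment without any rule (here France) are flagged as well, so that gaps in the rule set surface during the analysis rather than later.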
The second member of the quartet, Data Cleansing, concentrates on correcting the errors revealed by the analysis and remedying deficits in quality. Native connectors are used to extract the existing data from the company’s various source systems. Postal details are checked for correctness, and the datasets are checked for duplicates and multiple entries. Where necessary, the datasets can be enriched with additional information, such as geodata or secondary statistical information. This is where the foundation for the so-called “golden record” (or “mother of all master datasets”) is laid.
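Duplicate detection typically combines normalisation with fuzzy matching. The sketch below normalises names and addresses and then compares every pair of records with Python’s standard `difflib.SequenceMatcher`. The records and the similarity threshold of 0.9 are assumptions chosen for illustration; production matching engines rely on postal reference data and much richer comparison rules, and comparing every pair does not scale to large datasets without some form of pre-grouping.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

def normalise(text):
    """Strip accents, case-fold (which also turns "ß" into "ss"),
    expand the abbreviation "str." and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = without_accents.casefold().replace("str.", "strasse")
    return " ".join(folded.split())

def find_duplicates(rows, threshold=0.9):
    """Compare every pair of records and return likely duplicates with their score."""
    keys = {rid: normalise(f"{name} {address}") for rid, name, address in rows}
    pairs = []
    for a, b in combinations(sorted(keys), 2):
        score = SequenceMatcher(None, keys[a], keys[b]).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs

# Hypothetical records: (id, name, address)
records = [
    (1, "Hans Müller", "Hauptstraße 12, 70173 Stuttgart"),
    (2, "Hans Mueller", "Hauptstr. 12, 70173 Stuttgart"),
    (3, "Petra Klein", "Ringweg 3, 10115 Berlin"),
]
print(find_duplicates(records))
```

Normalisation alone maps “Hauptstraße” and “Hauptstr.” to the same form but leaves “Müller” and “Mueller” different. The fuzzy comparison then catches the remaining variation.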
Third comes Data Protection. Despite its name, it has nothing to do with data protection in the legal sense. Instead, it comprises measures to ensure that the newly achieved high quality of customer data is maintained and can be improved further. The focus is on checking new data for errors, wherever possible, at the moment it is first entered, and on re-checking existing data after every change. This allows errors caused by mishearing, misspelling or mistyping to be identified and corrected quickly.
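The principle of checking data on entry and after every change can be sketched as a store that runs the same validation on both operations. The `CustomerStore` class, the `check` function, the five-digit postcode rule and the short city list are hypothetical stand-ins for a real postal reference database. Typing errors in city names are corrected with the standard `difflib.get_close_matches`.

```python
import re
from difflib import get_close_matches

# Hypothetical reference list; a real system would query postal reference data.
KNOWN_CITIES = ["Stuttgart", "Berlin", "Hamburg", "München", "Pforzheim"]

class ValidationError(ValueError):
    pass

def check(record):
    """Validate one record and return a corrected copy, or raise ValidationError."""
    fixed = dict(record)
    zip_code = fixed.get("zip") or ""
    if not re.fullmatch(r"\d{5}", zip_code):
        raise ValidationError(f"invalid German postcode: {zip_code!r}")
    city = fixed.get("city") or ""
    if city not in KNOWN_CITIES:
        match = get_close_matches(city, KNOWN_CITIES, n=1, cutoff=0.8)
        if not match:
            raise ValidationError(f"unknown city: {city!r}")
        fixed["city"] = match[0]  # correct a likely typing error
    return fixed

class CustomerStore:
    """Runs the same check on every insert and on every change."""

    def __init__(self):
        self._rows = {}

    def create(self, cid, record):
        self._rows[cid] = check(record)

    def update(self, cid, **changes):
        # The stored record is only replaced if the changed version passes.
        self._rows[cid] = check({**self._rows[cid], **changes})

    def get(self, cid):
        return self._rows[cid]

store = CustomerStore()
store.create(1, {"name": "Anna Schmidt", "zip": "70173", "city": "Stutgart"})
print(store.get(1)["city"])
try:
    store.update(1, zip="7017")
except ValidationError as err:
    print("rejected:", err)
```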
Finally, the fourth “musketeer” is Data Monitoring. It ensures that the work of the other three components is not in vain; without it, there is a risk of pollution gradually “creeping” into customer data, and this pollution often only becomes apparent when it is too late. Common causes of such gradual pollution include relocations; changes of name following marriage, separation or divorce; deaths; renamed streets and roads; and local authorities changing their municipal boundaries and names. Left unchecked, they can mean that the measures introduced during the other three phases, and the effort behind them, were practically a waste of time.
Data monitoring can be described as a “sensor” for data quality weaknesses. It ensures that such weaknesses are identified at an early stage, before they have a negative effect on the target system. Data monitoring is based on the company’s own data quality rules and guidelines, which in turn are continuously reviewed to see whether they need to be changed or updated.
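A rule-based monitor can be sketched as follows: each rule reports the share of records that fail it, and an alert is raised when that share exceeds the rule’s threshold. The `Rule` class, the two example rules, their thresholds and the snapshot data are hypothetical; in practice the rules would be the company’s own data quality guidelines.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # True means the record passes
    max_failure_rate: float        # alert threshold (assumed company parameter)

def monitor(records, rules):
    """Return {rule name: failure rate} for every rule above its threshold."""
    if not records:
        return {}
    alerts = {}
    for rule in rules:
        failures = sum(1 for r in records if not rule.check(r))
        rate = failures / len(records)
        if rate > rule.max_failure_rate:
            alerts[rule.name] = rate
    return alerts

RULES = [
    Rule("email present", lambda r: "@" in r.get("email", ""), 0.30),
    Rule("address verified within 365 days", lambda r: r["days_since_check"] <= 365, 0.20),
]

snapshot = [
    {"email": "a@example.com", "days_since_check": 30},
    {"email": "", "days_since_check": 400},
    {"email": "c@example.com", "days_since_check": 500},
    {"email": "d@example.com", "days_since_check": 90},
]
print(monitor(snapshot, RULES))
```

Because the rules are plain data, reviewing or updating them does not require changes to the monitoring code itself.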
By now, many companies have recognized that high data quality is a prerequisite for smooth business processes in many areas. Unfortunately, their priorities differ, and they often concentrate their efforts on just a few selected measures, forgetting that the motto of the four musketeers also applies to master data management. What use is a detailed data analysis if it cannot be used to develop better data cleansing measures? The positive effects of an initial cleansing will soon be “diluted” if no methods for sustainably maintaining high data quality are introduced. And even the best data monitoring is of little use if its results do not flow back into a renewed data analysis and thereby, perhaps, trigger a new round of quality improvement. This makes it clear that initiatives to improve the quality of a company’s customer data are not a temporary process, and certainly not a one-off event. Instead, an integrated, continuous closed loop is needed to ensure sustainably optimized, quality-assured data: one for all and all for one.
There are, of course, one-off events that make data migration and consolidation necessary and serve as a trigger for an initiative to optimize data quality, such as the introduction of a new CRM system, an ERP migration or a company takeover. In most cases, however, companies that set out to optimize the quality of their customer data aim to create the most precise, complete and up-to-date 360-degree view of the customer possible, so that they can accompany each customer optimally through every phase of their customer journey. But there are other good reasons for having a 360-degree view too, such as:
The growing number of digital customers makes it increasingly important for companies to collect and consolidate the data and information about their customers that is already held within the company, and to follow and record the “traces” customers leave behind on the internet and in social networks.
Ground Truth is a comprehensive solution and process methodology developed by Uniserv. Its multi-stage approach helps a company create a golden profile of each individual customer, in which address data, purchasing behaviour, interests and preferences, and communication and interaction with the company are all aggregated into a central dataset. The traces a customer leaves behind on the internet and in social networks are also integrated into this golden profile. In other words, the master data of each individual customer (golden record) and the dynamic transaction and interaction data are consolidated into the golden profile. Ground Truth also keeps this data continually updated and synchronised across all sources.
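The distinction between golden record and golden profile can be illustrated with two small data structures. This sketch is not Uniserv’s Ground Truth implementation: the class names, the source systems and the survivorship rule (the most recently updated non-empty value wins) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SourceRecord:
    system: str        # source system, e.g. "ERP" or "CRM" (illustrative)
    updated: date      # when this source last changed the record
    attributes: dict   # master data as held in that system

@dataclass
class GoldenProfile:
    golden_record: dict                               # consolidated master data
    interactions: list = field(default_factory=list)  # dynamic transaction/interaction data

def build_golden_record(sources):
    """Assumed survivorship rule: the most recently updated non-empty value wins."""
    golden = {}
    for rec in sorted(sources, key=lambda r: r.updated):  # oldest first
        for key, value in rec.attributes.items():
            if value:  # empty values never overwrite existing ones
                golden[key] = value
    return golden

sources = [
    SourceRecord("ERP", date(2019, 3, 1),
                 {"name": "Anna Schmidt", "street": "Hauptstr. 1", "email": ""}),
    SourceRecord("CRM", date(2021, 6, 15),
                 {"name": "Anna Weber", "street": "", "email": "anna@example.com"}),
]
profile = GoldenProfile(build_golden_record(sources))
profile.interactions.append({"date": date(2021, 7, 2), "channel": "webshop", "event": "purchase"})
print(profile.golden_record)
```

Under this rule, a name change after marriage that was recorded only in the newer CRM entry wins over the older ERP value, while the street survives from the ERP system because the CRM field is empty.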
In cooperation with Stuttgart Media University (Hochschule der Medien Stuttgart, HdM), Uniserv has developed a prototype based on Ground Truth specifically for predictive analytics, to emphasise the importance of data quality as a critical success factor for the quality of predictions.
This connection has now been examined empirically for the first time, in a bachelor’s thesis by Paul Titze, a student of the faculty of business informatics and digital media at the HdM, where he is studying information and communication. Using different test scenarios in which master data of different quality levels was analysed, the thesis examined the relationship between high-quality master data and the results of analyses performed with supervised machine learning. The result: compared with machine learning on untreated datasets, the high-quality data provided by master data management considerably improved the predictions, especially in supervised learning, where master data forms the basis on which the algorithms are trained.
In the digital world, Ground Truth is thus becoming the central component for ensuring optimal and sustainable data quality within companies.
Conclusion: The motto “One for all and all for one” applies not only to the four musketeers in literature and film, but especially to the four musketeers of data quality: Data Analysis, Data Cleansing, Data Protection and Data Monitoring. Each component requires careful planning and implementation, but only when they are smoothly combined and integrated into a closed loop will they raise a company’s data quality to a completely new level, maintain that level sustainably and keep improving it. Only then is the foundation for Ground Truth in place. And only then will the company finally be able to achieve a precise, complete and up-to-date 360-degree view of the customer, and with it, fundamental trust in the quality of its own data.