
Garbage In, Garbage Out

7 critical dimensions of data quality: consistency, completeness, accuracy, uniqueness, timeliness, validity, and data governance. Solutions to each problem, plus implementation strategies.
By Michael Hauschild, 31.3.2025 16:32, 8 minutes to read

In my last post, we talked about the strategic anchoring of data in media companies. Today, we are focusing on a topic that I come across time and again in my daily work with publishers: data quality and integration. The following case from my consulting practice illustrates the problem.

“We have all the data but no answers”

A few months ago, the head of digital at a medium-sized regional publishing house called me. His voice sounded frustrated: “Michael, we've been investing in digital transformation for years. We have analytics tools, a CRM system, social media monitoring and a modern CMS. Nevertheless, it is difficult for us to get clear answers to simple questions.”

The specific question that bothered him: Which content categories actually lead to subscriptions? It sounds like a simple question, yet it went unanswered, not because of a lack of data, but because of poor data quality and integration.

The strategic dimension of data quality

This situation is typical of many media companies. They have data, but not high-quality, integrated data that can serve as a reliable basis for decision-making. In the age of AI and machine learning, this problem is becoming more urgent.
The old programmer's maxim “Garbage In, Garbage Out” matters even more with AI: Even the most intelligent algorithm cannot extract valuable insights from faulty or incomplete data. If your ChatGPT assistant is fed inadequate editorial data, you will get mediocre results at best and misleading ones at worst.

In our projects as a consulting team, we see time and again that data quality and integration are not technical side issues but business-critical factors:

  • They determine how quickly strategic decisions can be made and how well-founded they are
  • They determine how precisely content and offers can be personalized
  • They form the basis for automation potential in editorial and commercial processes
  • They are essential for successful AI applications, from content recommendations to reader churn forecasting

The integration problem: When systems can't talk to each other

Back to our regional publishing house: A closer examination revealed a picture typical of the industry. The company used:

  • A CMS for creating and managing editorial content
  • A CRM system for managing subscriptions
  • Google Analytics to measure website usage
  • A separate ad management system
  • Various tools for social media and newsletter marketing

Each of these systems was functional on its own. The problem: They were introduced at different times, by different providers and — crucially — with their own data models and identification keys.
The integration problems were manifold:

  • The CMS used content IDs that were unrelated to tracking parameters in Analytics
  • The user IDs in CRM had no connection to the cookie IDs of the website
  • Some systems delivered data in real time, others only in daily batches
  • Older systems lacked modern APIs for automated data exchange

The consequences were serious:

Enormous manual effort: The publisher's two data analysts spent around 70% of their time collecting, cleaning, and merging data from various sources, time they could not spend on value-adding analyses.

Erroneous analyses: In a particularly painful case, management made decisions based on usage figures that later turned out to be incorrect because the data from the CMS and Google Analytics had not been properly reconciled.

Delayed decisions: The development of new editorial formats or paywall strategies was regularly delayed because combining and analyzing the required data was too time-consuming.

In the fast-moving media industry, where timely response to market changes is critical, such delays represent a significant competitive disadvantage.

The solution: Central data storage as the basis of the data strategy

After a thorough analysis, our team recommended that the publisher implement a central data store as the core of their data strategy. Together, we decided on a combination of data lake and data warehouse:

  • The data lake serves as a repository for all raw data from the various source systems, regardless of format or structure. Unstructured data such as user comments or social media interactions is stored here as well.
  • The data warehouse, built on top of the data lake, offers structured, prepared data views for specific use cases and regular reports.

The implementation was carried out gradually over six months, starting with the integration of the two most business-critical systems: CMS and CRM. Even this first step made it possible to answer the original question: Which content categories lead to subscription contracts?

Some of the results were surprising: Contrary to the editor-in-chief's assumption, it wasn't the in-depth investigative reporting that contributed most to conversions, but practical local service topics such as “The best schools in the region” or “New cycle paths in the district.”
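Once CMS and CRM data can be linked, the original question reduces to a join and a count. The following is a minimal Python sketch, not the project's actual implementation: it assumes three hypothetical extracts (article categories from the CMS, page views keyed by cookie ID from web analytics, subscription start dates from the CRM) plus a cookie-to-customer mapping, and it uses a deliberately simple "any-touch" attribution rule.

```python
from collections import Counter
from datetime import datetime


def subscribers_by_category(cms_categories, page_views, cookie_to_customer, subscriptions):
    """Count, per content category, how many subscribers read it before subscribing.

    Any-touch attribution (a simplifying assumption): each category a subscriber
    read before their subscription start gets one credit, at most once per subscriber.
    """
    touched = {}  # customer_id -> set of categories read before subscribing
    for cookie_id, content_id, viewed_at in page_views:
        customer_id = cookie_to_customer.get(cookie_id)  # None if the cookie is unmatched
        subscribed_at = subscriptions.get(customer_id)   # None for non-subscribers
        category = cms_categories.get(content_id)        # None if the CMS lacks a category
        if subscribed_at is None or category is None or viewed_at >= subscribed_at:
            continue  # not attributable: unmatched, uncategorized, or read after converting
        touched.setdefault(customer_id, set()).add(category)
    counts = Counter()
    for categories in touched.values():
        counts.update(categories)
    return counts


# Hypothetical sample data
cms_categories = {"a1": "local services", "a2": "investigative", "a3": "local services"}
page_views = [
    ("c-100", "a1", datetime(2025, 1, 3)),
    ("c-100", "a2", datetime(2025, 1, 4)),
    ("c-200", "a3", datetime(2025, 1, 5)),
    ("c-200", "a1", datetime(2025, 1, 7)),  # read after this reader subscribed: ignored
    ("c-300", "a2", datetime(2025, 1, 6)),  # reader never subscribed: ignored
]
cookie_to_customer = {"c-100": "u1", "c-200": "u2", "c-300": "u3"}
subscriptions = {"u1": datetime(2025, 1, 10), "u2": datetime(2025, 1, 6)}

result = subscribers_by_category(cms_categories, page_views, cookie_to_customer, subscriptions)
print(result.most_common())  # [('local services', 2), ('investigative', 1)]
```

The `cookie_to_customer` mapping is the piece that was missing at the publisher: without a link between CRM user IDs and website cookie IDs, page views cannot be attributed to subscribers at all.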

The 7 critical dimensions of data quality and their solutions

Based on our practical experience with dozens of media customers, we have identified key data quality issues and developed proven solutions. These problems fall into seven critical dimensions of data quality:

Consistency: When systems provide conflicting information

Problem: A user exists in various systems with different identifiers and attributes. At a regional publisher, we found customers who were listed as active subscribers in CRM, while the paywall system treated them as non-subscribers.
Solution: Implement a centralized identity management system that serves as a “single source of truth.” A modern Customer Data Platform (CDP) can merge user identities across different touchpoints. At a national publishing house, the introduction of central ID matching helped us increase the conversion rate by 23%.
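The core of such central ID matching can be sketched with a union-find (disjoint-set) structure: whenever two identifiers are observed together, for example a CRM record and a paywall login sharing an e-mail address, their profiles are merged. This is a simplified, deterministic illustration with hypothetical identifier prefixes; it is not how any particular CDP works internally, and real products typically layer additional matching and consent rules on top.

```python
class IdentityGraph:
    """Union-find structure that merges identifiers observed together into one profile."""

    def __init__(self):
        self.parent = {}  # identifier -> parent identifier

    def find(self, identifier):
        """Return the representative identifier of the profile `identifier` belongs to."""
        self.parent.setdefault(identifier, identifier)
        root = identifier
        while self.parent[root] != root:
            root = self.parent[root]
        while identifier != root:  # path compression for faster later lookups
            next_id = self.parent[identifier]
            self.parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record evidence that identifiers a and b belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def profiles(self):
        """Group all known identifiers into merged profiles."""
        groups = {}
        for identifier in list(self.parent):
            groups.setdefault(self.find(identifier), set()).add(identifier)
        return list(groups.values())


graph = IdentityGraph()
graph.link("crm:4711", "email:anna@example.com")      # CRM record with e-mail
graph.link("paywall:A-99", "email:anna@example.com")  # paywall login, same e-mail
graph.link("cookie:xyz", "paywall:A-99")              # logged-in web session
graph.link("crm:4712", "email:ben@example.com")
print(graph.find("cookie:xyz") == graph.find("crm:4711"))  # True
```

Because merges are transitive, the anonymous cookie ends up in the same profile as the CRM record even though the two were never observed together directly.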

Completeness: Find and close the gaps in your data

Problem: Incomplete data takes many forms: articles without sufficient metadata, incomplete customer journeys, or missing attributes in user profiles. At a specialist publisher, we found that over 40% of the articles lacked topic categorization, which severely limited personalization.
Solution: Define binding minimum standards for each type of data and automate their collection. For example, modern AI systems can automatically tag and categorize content. Implement validation rules that either reject incomplete records or flag them for follow-up.
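The "reject or flag" rule might be implemented as a small validation function. This sketch assumes hypothetical minimum standards in which some fields are blocking (the record is rejected) and others only cause the record to be flagged for follow-up; the field names are illustrative.

```python
# Hypothetical minimum standards: all fields are required, but only some block a record.
REQUIRED_FIELDS = {
    "article": {"title", "author", "topic"},
    "user_profile": {"user_id", "email", "postal_code"},
}
BLOCKING_FIELDS = {
    "article": {"title"},
    "user_profile": {"user_id"},
}


def check_completeness(record_type, record):
    """Return ("reject" | "flag" | "ok", set of missing fields) for one record."""
    missing = {
        field
        for field in REQUIRED_FIELDS[record_type]
        if record.get(field) in (None, "", [])
    }
    if missing & BLOCKING_FIELDS[record_type]:
        return "reject", missing  # unusable without these fields
    if missing:
        return "flag", missing    # accept, but queue for enrichment (e.g. auto-tagging)
    return "ok", missing


print(check_completeness("article", {"title": "New cycle paths", "author": "K. Meyer"}))
# ('flag', {'topic'})
print(check_completeness("article", {"author": "K. Meyer", "topic": "local"}))
# ('reject', {'title'})
```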

Accuracy: When incorrect data leads to wrong decisions

Problem: Inaccurate or simply incorrect data leads directly to wrong decisions. A magazine publisher made strategic decisions based on analytics data that understated mobile app usage by 30% because tracking had been implemented incorrectly.
Solution: Establish systematic data validation processes and cross-checks between different systems. Automated plausibility checks can identify many errors at an early stage. For particularly critical data, we recommend regular sample checks by experts.
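An automated cross-check could compare the same metric as reported by two independent systems, for example app sessions counted by the analytics tool and by the app backend, and raise an alert when they diverge. A minimal sketch; the system names, metric names and the 10% tolerance are assumptions, and sensible tolerances depend on how each system counts.

```python
def cross_check(metric, values_by_system, tolerance=0.10):
    """Return an alert message if systems disagree on `metric` by more than `tolerance`.

    The deviation is measured relative to the largest reported value.
    """
    high = max(values_by_system.values())
    low = min(values_by_system.values())
    if high == 0:
        return None  # every system reports zero, so there is nothing to compare
    deviation = (high - low) / high
    if deviation > tolerance:
        return f"{metric}: systems differ by {deviation:.0%} {values_by_system}"
    return None


print(cross_check("app_sessions_2025_03", {"analytics": 70_000, "app_backend": 100_000}))
# app_sessions_2025_03: systems differ by 30% {'analytics': 70000, 'app_backend': 100000}
print(cross_check("web_visits_2025_03", {"analytics": 98_000, "cdn_logs": 100_000}))
# None
```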

Uniqueness: The problem of duplicate records

Problem: Duplicate data sets distort analyses and lead to incorrect business decisions. At a media company, we found over 15% duplicates in the customer database, which significantly falsified churn forecasts — some customers were mistakenly classified as lost because they were active under a different profile.
Solution: Implement robust deduplication that combines exact and fuzzy matching. Modern master data management (MDM) systems can identify duplicates even when spellings differ slightly or records are incomplete. Prevention matters too: Create uniform input forms and validation rules.
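In its simplest form, combining exact and fuzzy matching could look like the following Python sketch: an exact rule on normalized e-mail addresses plus a fuzzy name comparison within the same postal code, using the standard library's `difflib.SequenceMatcher`. The fields and the 0.9 threshold are illustrative assumptions; MDM systems use considerably more elaborate matching rules.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def find_duplicates(customers, threshold=0.9):
    """Return ID pairs of likely duplicate customer records.

    Exact rule: identical normalized e-mail addresses.
    Fuzzy rule: same postal code and name similarity >= threshold.
    """
    pairs = []
    for i, a in enumerate(customers):
        for b in customers[i + 1:]:
            same_email = bool(a["email"] and b["email"]) and (
                normalize(a["email"]) == normalize(b["email"])
            )
            similar_name = a["zip"] == b["zip"] and SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio() >= threshold
            if same_email or similar_name:
                pairs.append((a["id"], b["id"]))
    return pairs


customers = [
    {"id": 1, "name": "Anna Schmidt", "email": "anna@example.com", "zip": "20095"},
    {"id": 2, "name": "Anna  Schmitt", "email": "", "zip": "20095"},  # typo, no e-mail
    {"id": 3, "name": "A. Schmidt", "email": "Anna@Example.com ", "zip": "22041"},  # moved
    {"id": 4, "name": "Carla Braun", "email": "carla@example.com", "zip": "80331"},
]
print(find_duplicates(customers))  # [(1, 2), (1, 3)]
```

The pairwise comparison is quadratic in the number of records; on a real customer database, candidate pairs are usually narrowed down first (so-called blocking, for example by postal code) before any fuzzy comparison runs.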

Timeliness: When outdated data leads to late responses

Problem: Outdated data is often just as problematic as missing data. A news portal noticed a significant drop in user engagement only two weeks after the fact, because its data update processes ran only once a month.
Solution: Define appropriate timeliness requirements for various types of data. Critical KPIs should be updated in real time or at least daily. Implement monitoring systems that automatically identify outdated data sets and trigger appropriate alerts.
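A monitoring job for this can be as simple as comparing each dataset's last update time against a maximum allowed age. The dataset names and limits below are hypothetical; in practice the timestamps would come from load logs or table metadata, and the returned list would feed an alerting channel.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeliness requirements per dataset
MAX_AGE = {
    "engagement_kpis": timedelta(days=1),
    "subscriptions": timedelta(hours=1),
    "archive_metadata": timedelta(days=30),
}


def stale_datasets(last_updated, now=None):
    """Return the datasets whose last successful update is older than allowed."""
    if now is None:
        now = datetime.now(timezone.utc)
    stale = []
    for name, max_age in MAX_AGE.items():
        updated = last_updated.get(name)
        if updated is None or now - updated > max_age:
            stale.append(name)  # never loaded or too old: raise an alert
    return stale


now = datetime(2025, 3, 31, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "engagement_kpis": now - timedelta(days=14),  # last refreshed two weeks ago
    "subscriptions": now - timedelta(minutes=20),
}
print(stale_datasets(last_updated, now))  # ['engagement_kpis', 'archive_metadata']
```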

Validity: When data doesn't match expected formats

Problem: Invalid data formats or values outside defined ranges can crash systems or produce incorrect calculations. For one customer, incorrectly formatted dates completely distorted trend analyses because the system expected the American date format instead of the European one.
Solution: Implement strict validation rules at all data entry points. Use schema validation for structured data and define clear conventions for formats and value ranges. Particularly important: Standardize critical formats such as dates and times across the company.
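Strict validation at the entry point can be illustrated with Python's standard library alone. The sketch assumes the company has standardized on the European DD.MM.YYYY format and defines a plausible value range for page views; the record fields and the range are hypothetical. Anything that does not match the agreed format is rejected instead of being silently reinterpreted.

```python
from datetime import datetime

DATE_FORMAT = "%d.%m.%Y"  # assumed company-wide convention: European DD.MM.YYYY
PAGE_VIEW_RANGE = (0, 10_000_000)  # hypothetical plausible range per article and day


def parse_date(value):
    """Parse a date strictly in the agreed format; raises ValueError otherwise."""
    return datetime.strptime(value.strip(), DATE_FORMAT).date()


def validate_row(row):
    """Return a list of validity errors for one usage record (empty if valid)."""
    errors = []
    try:
        parse_date(row["date"])
    except ValueError:
        errors.append(f"date {row['date']!r} does not match DD.MM.YYYY")
    low, high = PAGE_VIEW_RANGE
    if not low <= row["page_views"] <= high:
        errors.append(f"page_views {row['page_views']} outside {low}..{high}")
    return errors


print(parse_date("03.04.2025"))  # 2025-04-03, i.e. 3 April, not 4 March
print(validate_row({"date": "04/03/2025", "page_views": -5}))
```

The second call reports two errors: the American-style date is rejected rather than guessed at, and the negative page view count falls outside the allowed range.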

Data governance: The organizational framework for data quality

Problem: Without clear responsibilities and processes, data quality problems never get solved for good. At a large media house, we watched the same data quality problems recur over and over because no one felt responsible for them.
Solution: Establish a data governance framework with well-defined roles, such as data owners and data stewards. Develop binding guidelines for data collection, storage, and use. Perform regular data quality audits and make data quality a measured performance indicator, for example through a data quality dashboard for management.
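To turn data quality into a measured indicator, each rule can be assigned a responsible owner and reported as a pass rate. The sketch below shows one possible way to compute such figures for a dashboard; the rules, owner labels and the unweighted average used as the overall score are assumptions, not an established standard.

```python
def quality_report(records, rules):
    """Compute the pass rate of each data quality rule and an overall score.

    `rules` maps a rule name to (responsible owner, check function).
    The overall score is the unweighted mean of all rule pass rates.
    """
    report = {}
    for name, (owner, check) in rules.items():
        passed = sum(1 for record in records if check(record))
        report[name] = {"owner": owner, "pass_rate": passed / len(records) if records else 1.0}
    overall = sum(r["pass_rate"] for r in report.values()) / len(report) if report else 1.0
    return report, overall


rules = {  # hypothetical rules and owners
    "article_has_topic": ("Editorial data steward", lambda a: bool(a.get("topic"))),
    "article_has_author": ("Editorial data steward", lambda a: bool(a.get("author"))),
}
articles = [
    {"topic": "local", "author": "K. Meyer"},
    {"topic": "", "author": "J. Wolf"},
    {"topic": "sports", "author": "A. Lang"},
    {"topic": "politics", "author": "S. Yilmaz"},
]
report, overall = quality_report(articles, rules)
print(report["article_has_topic"]["pass_rate"], f"{overall:.1%}")  # 0.75 87.5%
```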

Implementation strategies: The pragmatic path to data integration

The biggest challenge in improving data quality and integration is often not the technology, but the way to get there. From our joint projects, I can recommend four proven strategies:

1. Step-by-step implementation instead of “big bang”

Start with a limited but business-critical use case. For our regional publisher, this was linking content usage to subscription contracts. Demonstrate early successes and use them to justify further investment.

2. Pragmatic use of legacy systems

The reality in many media companies: There are old systems that cannot be replaced in the short term. Instead of waiting for the perfect future solution, develop adapter solutions for systems that are difficult to replace and define a clear migration strategy in parallel.
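An adapter in this sense hides a legacy system's export format behind the interface the central data store expects, so that the legacy system can later be replaced by writing a new adapter. A minimal Python sketch, assuming a hypothetical legacy system that only exports semicolon-separated text with the column names shown:

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterator, List, Protocol


@dataclass
class Subscription:
    """Record format expected by the central data store (hypothetical schema)."""
    customer_id: str
    product: str
    start_date: date


class SubscriptionSource(Protocol):
    """Interface every source system adapter must provide."""
    def subscriptions(self) -> Iterator[Subscription]: ...


class LegacyCsvAdapter:
    """Wraps a legacy system that can only export semicolon-separated text."""

    def __init__(self, export_text: str):
        self.export_text = export_text

    def subscriptions(self) -> Iterator[Subscription]:
        reader = csv.DictReader(io.StringIO(self.export_text), delimiter=";")
        for row in reader:
            yield Subscription(
                customer_id=row["CUST_NO"].strip(),
                product=row["PROD"].strip().lower(),
                start_date=datetime.strptime(row["START_DT"].strip(), "%d.%m.%Y").date(),
            )


def load(source: SubscriptionSource) -> List[Subscription]:
    """Downstream code depends only on the interface, not on the legacy format."""
    return list(source.subscriptions())


export = "CUST_NO;PROD;START_DT\n 4711;DIGITAL-PLUS;01.02.2025\n4712;Print;15.03.2025\n"
rows = load(LegacyCsvAdapter(export))
print(rows[0])
# Subscription(customer_id='4711', product='digital-plus', start_date=datetime.date(2025, 2, 1))
```

When the legacy system is eventually migrated, only a new class implementing `subscriptions()` is needed; `load()` and everything downstream stay unchanged.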

3. Agile methods for data integration projects

Data integration projects should be run iteratively, with regular interim releases and close involvement of the business departments. For one customer, our consulting team established weekly “data reviews” in which progress and next steps were discussed with all stakeholders.

4. Cloud-based solutions as a starting point

Cloud solutions offer a cost-effective and quick start to data integration, particularly for medium-sized publishers with limited IT resources. They enable rapid implementation without extensive hardware investments and provide access to advanced analysis tools.

Conclusion: Data quality is a strategic investment

Six months after implementing the central data store, our regional publisher was able to show impressive results:

  • The time required to create regular reports was reduced by 65%
  • The conversion rate for digital subscriptions rose by 28% as a result of more targeted content strategies
  • The cancellation rate fell by 15% due to better understanding of user behavior and preventive measures

These results underline that high-quality, integrated data is not a technical gimmick, but the foundation for business success in the digital age. Media companies that invest in these foundations create the conditions for:

  • Well-founded strategic decisions based on holistic data
  • Faster responses to market changes and user behavior
  • More efficient processes through automation and standardization
  • New data-driven products and business models

In my next post, we will look at how this high-quality, integrated data can be used specifically for deeper customer understanding and data-driven product development. Until then, I look forward to your comments and experiences on the subject of data quality!

This is the second part of our series “Data-driven future of media: challenges, solutions and strategies for success”. In the coming weeks we will cover:

Sign up for our newsletter to never miss a part of this series!

Michael Hauschild is a data expert and co-founder of The Data Institute. He and his team have been advising media companies on digital transformation for many years. This article is based on experiences from numerous joint practical projects and a chapter of the upcoming book “Data as a Strategic Compass for Media Houses.”

Photo by Alina Grubnyak on Unsplash
