Garbage In, Garbage Out

In my last post, we talked about the strategic anchoring of data in media companies. Today, we are focusing on a topic that I come across time and again in my daily work with publishers: data quality and integration. The following case from my consulting practice illustrates the problem.
“We have all the data but no answers”
A few months ago, the head of digital at a medium-sized regional publishing house called me. His voice sounded frustrated: “Michael, we've been investing in digital transformation for years. We have analytics tools, a CRM system, social media monitoring and a modern CMS. Nevertheless, it is difficult for us to get clear answers to simple questions.”
The specific question that bothered him: Which content categories actually lead to subscriptions? A supposedly simple question, yet the answer remained out of reach: not because of a lack of data, but because of poor data quality and integration.
The strategic dimension of data quality
This situation is typical of many media companies. They have data, but not high-quality, integrated data that can serve as a reliable basis for decision-making. In times of AI and machine learning, this problem is gaining new urgency.
The old programmer wisdom “Garbage In, Garbage Out” is becoming even more important in the age of AI. Even the most intelligent algorithm cannot extract valuable insights from faulty or incomplete data. If your ChatGPT assistant is fed inadequate editorial data, you get mediocre results at best, and misleading ones at worst.
In our projects as a consulting team, we see time and again that data quality and integration are not secondary technical concerns but business-critical factors:
- They determine how quickly strategic decisions can be made and how well-founded they are
- They decide how precisely content and offers can be personalized
- They form the basis for automation potential in editorial and commercial processes
- They are essential for successful AI applications, from content recommendations to reader churn forecasting
The integration problem: When systems can't talk to each other
Back to our regional publishing house: A closer examination revealed a typical picture of the industry. The company used:
- A CMS for creating and managing editorial content
- A CRM system for managing subscriptions
- Google Analytics to measure website usage
- A separate ad management system
- Diverse tools for social media and newsletter marketing
Each of these systems was functional on its own. The problem: They were introduced at different times, by different providers and — crucially — with their own data models and identification keys.
The integration problems were manifold:
- The CMS used content IDs that were unrelated to tracking parameters in Analytics
- The user IDs in the CRM system had no connection to the website's cookie IDs
- Some systems delivered data in real time, others only in daily batches
- Older systems lacked modern APIs for automated data exchange
The consequences were serious:
Enormous manual effort: The publisher's two data analysts spent around 70% of their time collecting, cleaning, and merging data from various sources, leaving little time for value-adding analyses.
Erroneous analyses: A particularly annoying case occurred when management made decisions based on usage figures that later turned out to be incorrect, because the data from the CMS and Google Analytics had not been properly reconciled.
Delayed decisions: The development of new editorial formats or paywall strategies was regularly delayed because combining and analyzing the required data was too time-consuming.
In the fast-moving media industry, where timely response to market changes is critical, such delays represent a significant competitive disadvantage.
The solution: Central data storage as the basis of the data strategy
After a thorough analysis, our team recommended that the publisher implement a central data store as the core of their data strategy. Together, we decided on a combination of data lake and data warehouse:
- The data lake serves as a repository for all raw data from the various source systems, regardless of format or structure. Unstructured data such as user comments or social media interactions is also stored here.
- The data warehouse built on top of it offers structured, prepared data views for specific use cases and regular reports (a minimal sketch of this division of labor follows below).
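To make the division of labor concrete, here is a deliberately minimal Python sketch. The file-based lake, the SQLite "warehouse" and all field names are illustrative assumptions, not the publisher's actual stack: raw events land in the lake unchanged, and a small transformation step builds a structured, query-ready table from them.

```python
import json
import sqlite3
from pathlib import Path

LAKE = Path("data_lake/raw_events")  # illustrative location, not a real product
LAKE.mkdir(parents=True, exist_ok=True)

def ingest_raw(event: dict, source: str) -> None:
    """Data lake side: keep every event unchanged, whatever its structure."""
    (LAKE / f"{source}_{event['event_id']}.json").write_text(json.dumps(event))

def build_warehouse_table(db: sqlite3.Connection) -> None:
    """Warehouse side: a structured, query-ready table built from raw files."""
    db.execute("CREATE TABLE IF NOT EXISTS article_reads "
               "(article_id TEXT, user_id TEXT, read_at TEXT)")
    for f in LAKE.glob("cms_*.json"):
        e = json.loads(f.read_text())
        db.execute("INSERT INTO article_reads VALUES (?, ?, ?)",
                   (e["article_id"], e["user_id"], e["timestamp"]))
    db.commit()

db = sqlite3.connect(":memory:")
ingest_raw({"event_id": "1", "article_id": "a42", "user_id": "u7",
            "timestamp": "2024-05-01T09:30:00"}, source="cms")
build_warehouse_table(db)
print(db.execute("SELECT COUNT(*) FROM article_reads").fetchone()[0])  # -> 1
```

The point of the split: the lake never rejects anything, so nothing is lost; the warehouse step is where structure, naming and quality rules are enforced.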
The implementation was carried out gradually over six months, starting with the integration of the two most business-critical systems: CMS and CRM. Even this first step made it possible to answer the original question: Which content categories lead to subscription contracts?
Some of the results were surprising: Contrary to the editor-in-chief's assumption, it wasn't the extensive investigative research that contributed the most to conversions, but local utility topics such as “The best schools in the region” or “New cycle paths in the district.”

The 7 critical dimensions of data quality and their solutions
Based on our practical experience with dozens of media customers, we have identified key data quality issues and developed proven solutions. These problems fall into seven critical dimensions of data quality:
Consistency: When systems provide conflicting information
Problem: A user exists in various systems with different identifiers and attributes. At a regional publisher, we found customers who were listed as active subscribers in CRM, while the paywall system treated them as non-subscribers.
Solution: Implement a centralized identity management system that serves as a “single source of truth.” A modern Customer Data Platform (CDP) can merge user identities across different touchpoints. At a national publishing house, the introduction of central ID matching helped us increase the conversion rate by 23%.
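The core of such ID matching can be shown in a few lines of Python. This is a minimal sketch under simplifying assumptions (deterministic matching on a normalized e-mail address; the record layouts are invented for illustration), not how any particular CDP works internally:

```python
def normalize_email(email: str) -> str:
    """Deterministic match key: trim whitespace and lower-case the address."""
    return email.strip().lower()

def merge_identities(crm_users: list, paywall_users: list) -> dict:
    """Build one profile per match key across both systems."""
    profiles = {}
    for source, users in (("crm", crm_users), ("paywall", paywall_users)):
        for u in users:
            key = normalize_email(u["email"])
            profile = profiles.setdefault(key, {"source_ids": {}})
            profile["source_ids"][source] = u["id"]
            profile.update({k: v for k, v in u.items() if k not in ("id", "email")})
    return profiles

# The case from above: an active subscriber whom the paywall treats as anonymous.
crm = [{"id": "c-100", "email": "Anna@Example.com", "subscriber": True}]
paywall = [{"id": "p-55", "email": "anna@example.com "}]
print(merge_identities(crm, paywall))
# -> one profile carrying both system IDs and subscriber=True
```

Real CDPs go further, with probabilistic matching and survivorship rules for conflicting attributes, but the principle of one shared key per customer is the same.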
Integrity: Find and close the gaps in your data
Problem: Incomplete data manifests itself in many forms: articles without sufficient metadata, incomplete customer journeys, or missing attributes in user profiles. At a specialist publisher, we found that over 40% of the articles lacked topic categorization, which massively limited personalization.
Solution: Define binding minimum standards for the various data types and automate their collection. Modern AI systems, for example, can automatically tag and categorize content. Implement validation rules that either reject or flag incomplete records.
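As an illustration, a minimal validation sketch in Python; the required fields are assumed for the example and would in practice come from your own minimum standards:

```python
# Assumed minimum standard for editorial content (illustrative field names):
REQUIRED_FIELDS = {"title", "author", "publish_date", "topic_category"}

def validate_article(article: dict) -> tuple:
    """Return (is_valid, missing_fields) for one record."""
    missing = [f for f in REQUIRED_FIELDS if not article.get(f)]
    return (not missing, missing)

article = {"title": "New cycle paths in the district", "author": "jdoe",
           "publish_date": "2024-05-01", "topic_category": ""}
ok, missing = validate_article(article)
if not ok:
    # Depending on policy: reject the record, or store it flagged for review.
    print(f"Flagged for review, missing: {missing}")
```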
Accuracy: When incorrect data leads to wrong decisions
Problem: Inaccurate or outright incorrect data leads directly to wrong decisions. A magazine publisher made strategic decisions based on analytics data that, due to a faulty tracking implementation, understated mobile app usage by 30%.
Solution: Establish systematic data validation processes and cross-checks between different systems. Automated plausibility checks can identify many errors at an early stage. For particularly critical data, we recommend regular sample checks by experts.
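A cross-check of this kind can start out very simply, as in this sketch that flags days on which two systems diverge by more than an assumed tolerance (the 10% threshold is illustrative and should be tuned to your own tracking setup):

```python
def counts_plausible(cms_count: int, analytics_count: int,
                     tolerance: float = 0.10) -> bool:
    """True if two systems agree within a relative tolerance."""
    reference = max(cms_count, analytics_count)
    if reference == 0:
        return True  # both systems report nothing: consistent
    return abs(cms_count - analytics_count) / reference <= tolerance

daily_pageviews = {"2024-05-01": (120_000, 118_500),
                   "2024-05-02": (130_000, 91_000)}
for day, (cms, ga) in daily_pageviews.items():
    if not counts_plausible(cms, ga):
        print(f"{day}: sources diverge, check tracking before reporting")
```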
Uniqueness: The problem of duplicate records
Problem: Duplicate records distort analyses and lead to incorrect business decisions. At a media company, we found over 15% duplicates in the customer database, which significantly distorted churn forecasts: some customers were mistakenly classified as lost because they were active under a different profile.
Solution: Implement robust deduplication routines that use both exact and fuzzy matching. Modern MDM (Master Data Management) systems can identify duplicates even with slightly different spellings or incomplete records. Prevention matters just as much: create uniform input masks and validation rules.
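The two-stage logic of exact and fuzzy rules might look like this minimal sketch, using Python's standard-library SequenceMatcher as a stand-in for the more sophisticated similarity measures of a real MDM system:

```python
from difflib import SequenceMatcher

def is_probable_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Cheap exact rule on e-mail first, fuzzy rule on name plus postcode second."""
    if a["email"] and a["email"].lower() == b["email"].lower():
        return True
    name_similarity = SequenceMatcher(None, a["name"].lower(),
                                      b["name"].lower()).ratio()
    return name_similarity >= threshold and a["postcode"] == b["postcode"]

a = {"name": "Anna Schmidt", "email": "", "postcode": "50667"}
b = {"name": "Anna Schmid",  "email": "", "postcode": "50667"}  # typo variant
print(is_probable_duplicate(a, b))  # -> True
```

A production system would additionally normalize umlauts and transliterations and weigh several attributes at once, but the pattern of cheap exact rules first and fuzzy rules second carries over.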
Timeliness: When outdated data leads to late responses
Problem: Outdated data is often just as problematic as missing data. One news portal noticed a significant drop in user engagement two weeks late because its data update processes ran only monthly.
Solution: Define appropriate timeliness requirements for various types of data. Critical KPIs should be updated in real time or at least daily. Implement monitoring systems that automatically identify outdated data sets and trigger appropriate alerts.
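Such a freshness monitor can be very small to begin with; in this sketch the dataset names and SLA values are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Assumed freshness requirements per dataset (values are illustrative):
FRESHNESS_SLA = {"subscription_kpis": timedelta(days=1),
                 "engagement_metrics": timedelta(hours=1)}

def find_stale(last_updated: dict) -> list:
    """Return every dataset whose last update is older than its SLA allows."""
    now = datetime.now()
    return [name for name, sla in FRESHNESS_SLA.items()
            if now - last_updated[name] > sla]

status = {"subscription_kpis": datetime.now() - timedelta(days=3),
          "engagement_metrics": datetime.now() - timedelta(minutes=20)}
for name in find_stale(status):
    print(f"ALERT: '{name}' is stale, trigger the refresh pipeline")
```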
Validity: When data doesn't match expected formats
Problem: Invalid data formats or values outside defined ranges can crash systems or produce incorrect calculations. For one customer, incorrectly formatted dates led to completely distorted trend analyses because the system expected the American date format instead of the European one.
Solution: Implement strict validation rules at all data entry points. Use schema validation for structured data and define clear conventions for formats and value ranges. Particularly important: standardize critical formats such as dates and times across the company.
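For the date example above, strict validation at the point of entry might look like this sketch, which accepts only an assumed company-wide standard (ISO 8601) and rejects everything else instead of guessing:

```python
from datetime import date

def parse_publish_date(raw: str) -> date:
    """Accept only the assumed company standard: ISO 8601 (YYYY-MM-DD).

    Ambiguous inputs like 03/04/2024 (March 4 or April 3?) are rejected
    at the point of entry instead of being silently misread downstream.
    """
    return date.fromisoformat(raw)  # raises ValueError for anything else

print(parse_publish_date("2024-04-03"))  # ok: 2024-04-03
try:
    parse_publish_date("03/04/2024")     # ambiguous US/EU format
except ValueError:
    print("Rejected: not ISO 8601, send back to the entry form")
```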
Data Governance: The Organizational Framework for Data Quality
Problem: Without clear responsibilities and processes, data quality remains an insoluble problem. At a large media house, we observed how the same data quality problems occurred over and over again because no one felt responsible.
Solution: Establish a data governance framework with well-defined roles such as data owners and data stewards. Develop binding guidelines for data collection, storage, and use. Perform regular data quality audits and make data quality a measured performance indicator, for example through a data quality dashboard for management.
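A dashboard of this kind ultimately boils down to a handful of measured indicators. This sketch computes two illustrative ones, completeness and uniqueness, over a toy customer list; a real audit would cover all seven dimensions:

```python
def data_quality_kpis(records: list, required: set) -> dict:
    """Two illustrative dashboard KPIs over a list of customer records."""
    total = len(records)
    complete = sum(all(r.get(f) for f in required) for r in records)
    with_email = sum(1 for r in records if r.get("email"))
    unique_emails = len({r["email"].lower() for r in records if r.get("email")})
    return {"completeness_pct": round(100 * complete / total, 1),
            "uniqueness_pct": round(100 * unique_emails / max(with_email, 1), 1)}

records = [
    {"name": "Anna Schmidt", "email": "anna@example.com"},
    {"name": "Anna Schmidt", "email": "ANNA@example.com"},  # duplicate
    {"name": "B. Weber", "email": ""},                      # incomplete
]
print(data_quality_kpis(records, required={"name", "email"}))
# -> {'completeness_pct': 66.7, 'uniqueness_pct': 50.0}
```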
Implementation strategies: The pragmatic path to data integration
The biggest challenge in improving data quality and integration is often not the technology, but the way to get there. From our joint projects, I can recommend four proven strategies:
1. Step-by-step implementation instead of “big bang”
Start with a limited but business-critical use case. For our regional publisher, this was the combination of content use and subscription contracts. Demonstrate early successes and use them to justify further investments.
2. Pragmatic use of legacy systems
The reality in many media companies: There are old systems that cannot be replaced in the short term. Instead of waiting for the perfect future solution, develop adapter solutions for systems that are difficult to replace and define a clear migration strategy in parallel.
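The adapter idea in a nutshell: wrap the legacy system's export behind the same interface the modern systems offer, so downstream consumers never have to know about it. A minimal sketch with invented column names:

```python
import csv
import io

class LegacySubscriberAdapter:
    """Expose a legacy system's nightly CSV export through the interface
    the modern systems use (column names are invented for illustration)."""

    def __init__(self, csv_export: str):
        self._rows = list(csv.DictReader(io.StringIO(csv_export), delimiter=";"))

    def get_subscribers(self) -> list:
        # Map legacy columns onto the central data model.
        return [{"user_id": row["KD_NR"], "active": row["STATUS"] == "A"}
                for row in self._rows]

legacy_export = "KD_NR;STATUS\n1001;A\n1002;X"
print(LegacySubscriberAdapter(legacy_export).get_subscribers())
# -> [{'user_id': '1001', 'active': True}, {'user_id': '1002', 'active': False}]
```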
3. Agile methods for data integration projects
Data integration projects should be developed iteratively, with regular interim versions and close involvement of specialist areas. For one customer, our consulting team established weekly “data reviews”, in which progress and next steps were discussed with all stakeholders.
4. Cloud-based solutions as a start
Cloud solutions offer a cost-effective and quick start to data integration, particularly for medium-sized publishers with limited IT resources. They enable rapid implementation without extensive hardware investments and provide access to advanced analysis tools.
Conclusion: Data quality is a strategic investment
Six months after implementing the central data store, our regional publisher was able to show impressive results:
- The time required to create regular reports was reduced by 65%
- The conversion rate for digital subscriptions rose by 28% as a result of more targeted content strategies
- The cancellation rate fell by 15% due to better understanding of user behavior and preventive measures
These results underline that high-quality, integrated data is not a technical gimmick, but the foundation for business success in the digital age. Media companies that invest in these foundations create the conditions for:
- Well-founded strategic decisions based on holistic data
- Faster responses to market changes and user behavior
- More efficient processes through automation and standardization
- New data-driven products and business models
In my next post, we will look at how this high-quality, integrated data can be used specifically for deeper customer understanding and data-driven product development. Until then, I look forward to your comments and experiences on the subject of data quality!
This is the second part of our series “Data-driven future of media: challenges, solutions and strategies for success”. In the coming weeks we will cover:
- Part 1: Data as a strategic compass
- Part 2: Garbage In, Garbage Out
- Part 3: Data protection as a competitive advantage
- Part 4: Overcoming data silos - technical and organizational approaches
- Part 5: From data to insights - successful analysis strategies for media companies
Sign up for our newsletter to never miss a part of this series!
Michael Hauschild is a data expert and co-founder of The Data Institute. He and his team have been advising media companies on digital transformation for many years. This article is based on experiences from numerous joint practical projects and a chapter of the upcoming book “Data as a Strategic Compass for Media Houses.”