April 18, 2025

Johanna Cabildo: Big Tech’s Data Addiction Is Breaking AI

In Brief

Big Tech’s reliance on synthetic data is degrading AI quality, entrenching bias, and centralizing control, while the real solution lies in rebuilding a fair, transparent, and human-centered data ecosystem.

Meta’s Llama 4 launched to high expectations and disappointed. Compared with its predecessor, it delivered weaker reasoning, more hallucinations, and diminished overall performance. According to D-GN CEO Johanna Cabildo, the reason wasn’t a lack of compute or innovation. It was data.

Having exhausted the internet’s supply of clean, diverse, and high-quality text, Meta turned to synthetic data: AI-generated content used to train newer AI. This creates a loop where models learn from themselves, losing accuracy and depth with each cycle.
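The loop described above can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not a model of Llama 4 or of any real training pipeline: the "model" is just a Gaussian fitted to a small sample, and each generation is trained only on samples drawn from the previous generation's fit. The function name `simulate_recursive_training` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_recursive_training(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation,
    starting with the "real" data distribution at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # assumed stand-in for real-world data
    spread_history = [std]
    for _ in range(generations):
        # The current model generates a small synthetic dataset...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...and the next model is fitted to that synthetic data alone.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        spread_history.append(std)
    return spread_history


history = simulate_recursive_training()
print(f"spread at generation 0: {history[0]:.3f}")
print(f"spread at generation {len(history) - 1}: {history[-1]:.3e}")
```

In this toy setting the fitted spread drifts toward zero over many generations, because each fit slightly underestimates the variety in the data it was given and nothing from the real distribution is ever added back. Real models are far more complex, so the sketch shows only the direction of the effect, not its size.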

Other major players—OpenAI, Google, Anthropic—face the same dilemma. The age of abundant, real-world training data has ended. What’s left is synthetic filler. As a result, progress is stalling, and the illusion of advancement is masking a quiet decline.

Who Owns the Data?

The 2024 Stanford AI Index reported that eight companies now control 89% of global AI training data and infrastructure. This isn’t just about market power. It affects what knowledge is embedded in AI and whose perspectives are excluded.

Models trained on biased or narrow datasets can reinforce real-world harm. AI tools built on American healthcare records misdiagnose patients in other countries. Hiring systems penalize applicants with non-Western names. Facial recognition is less accurate on darker skin, particularly for women. Content filters flag minority dialects as offensive or irrelevant and silence them.

As models lean more heavily on synthetic data, the errors worsen. Researchers warn of recursive loops that produce “polished nonsense”: text that sounds correct but contains fabricated facts. In early 2025, the Columbia Journalism Review found that Google Gemini gave fully accurate citations only 10% of the time. The more these systems train on their own flawed outputs, the faster they decay.

Locked In, Locked Out

AI companies built their models on the backbone of publicly available knowledge—books, Wikipedia, forums, and even news articles. But now, the same firms are walling off their models and monetizing access.

In late 2023, The New York Times sued OpenAI and Microsoft over unauthorized use of its content. Meanwhile, Reddit and Stack Overflow entered exclusive licensing deals, giving OpenAI access to user-generated content previously open to all.

The strategy is clear: harvest free public knowledge, monetize it, and lock it behind APIs. The same companies that benefited from open ecosystems now restrict access while promoting synthetic data as a sustainable alternative, despite mounting evidence that it degrades model performance. AI can’t evolve by learning from itself. There’s no insight in a mirror.

A Different Path

Fixing AI’s data crisis doesn’t require more compute or bigger models—it requires a shift in how data is collected, valued, and governed.

Web3 technologies offer one possible way forward. Blockchain can track where data comes from. Tokenized systems can fairly compensate people who contribute their knowledge. Projects like Morpheus Labs have used these tools to improve Swahili language AI performance by 30%, simply by incentivizing community input.
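To make the provenance idea concrete, here is a minimal sketch of a tamper-evident contribution log. It is a plain hash chain, not a blockchain: there is no network, consensus, or token payout, and it does not represent how Morpheus Labs or D-GN work. The function names `add_record` and `verify`, the record fields, and the sample contributors are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """Hash a record body deterministically."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def add_record(chain, contributor, content):
    """Append a contribution, linking it to the previous record's hash."""
    body = {
        "contributor": contributor,
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_HASH,
    }
    chain.append({**body, "hash": _digest(body)})
    return chain


def verify(chain):
    """Return True only if every record is intact and correctly linked."""
    prev_hash = GENESIS_HASH
    for record in chain:
        body = {k: record[k] for k in ("contributor", "content_hash", "prev_hash")}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


chain = []
add_record(chain, "contributor-a", "Habari ya asubuhi")
add_record(chain, "contributor-b", "Asante sana")

# Copy the log and rewrite who contributed the first entry.
tampered = [dict(record) for record in chain]
tampered[0]["contributor"] = "someone-else"

print(verify(chain))     # True: the untouched log checks out
print(verify(tampered))  # False: the edited record no longer matches its hash
```

Because each record stores the hash of the one before it, changing who contributed what invalidates the log from that point on. That property is what lets attribution, and any compensation built on it, be audited later.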

Privacy-preserving tools like zero-knowledge proofs add another layer of trust. They make it possible to train models on sensitive information, like medical records, without exposing private data. This ensures that models can learn ethically while still delivering high performance.

These ideas aren’t speculative. Startups are already using decentralized tools to build culturally accurate, privacy-respecting AI systems around the world.

Reclaiming the Future

AI is shaping the systems that shape society—education, medicine, work, and communication. The central question is no longer whether AI will dominate, but who controls what it becomes.

As the AI industry confronts the limitations of synthetic data and monopolized infrastructure, platforms like D-GN offer a clear path forward: one where AI is trained by people, for people, and in service of a more just and intelligent future.

Will we allow a handful of companies to recycle their own outputs, degrade model quality, and entrench bias? Or will we invest in building a new kind of data ecosystem—one that values transparency, fairness, and shared ownership?

The problem is not that machines don’t have enough data. The problem is that the data they’re using is increasingly synthetic, narrow, and controlled. The solution is to return power to the people who create meaningful content—and reward them for it. Better AI starts with better data. And better data starts with us.

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Victoria d'Este is a writer on a variety of technology topics, including Web3.0, AI, and cryptocurrencies. Her extensive experience allows her to write insightful articles for a wider audience.
