This Is What Could Happen if AI Content Is Allowed to Take Over the Internet – Singularity Hub
Generative AI is a data hog.
The algorithms behind chatbots like ChatGPT learn to create human-like content by scraping terabytes of online articles, Reddit posts, TikTok captions, or YouTube comments. They find intricate patterns in the text, then spit out search summaries, articles, images, and other content.
For the models to become more sophisticated, they need to capture new content. But as more people use them to generate text and then post the results online, it's inevitable that the algorithms will start to learn from their own output, now littered across the internet. That's a problem.
A study in Nature this week found a text-based generative AI algorithm, when heavily trained on AI-generated content, produces utter nonsense after just a few cycles of training.
"The proliferation of AI-generated content online could be devastating to the models themselves," wrote Dr. Emily Wenger at Duke University, who was not involved in the study.
Although the study focused on text, the results could also impact multimodal AI models. These models also rely on training data scraped online to produce text, images, or videos.
As the usage of generative AI spreads, the problem will only get worse.
The eventual end could be model collapse, where AI increasingly fed data generated by AI is overwhelmed by noise and only produces incoherent baloney.
It's no secret generative AI often hallucinates. Given a prompt, it can spout inaccurate facts or dream up categorically untrue answers. Hallucinations could have serious consequences, such as a healthcare AI incorrectly, but authoritatively, identifying a scab as cancer.
Model collapse is a separate phenomenon, where AI trained on its own self-generated data degrades over generations. It's a bit like genetic inbreeding, where offspring have a greater chance of inheriting diseases. While computer scientists have long been aware of the problem, how and why it happens for large AI models has been a mystery.
In the new study, researchers built a custom large language model and trained it on Wikipedia entries. They then fine-tuned the model nine times using datasets generated from its own output and measured the quality of the AI's output with a so-called perplexity score. True to its name, the higher the score, the more bewildering the generated text.
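To make the perplexity score concrete, here is a minimal sketch using the standard textbook definition: the exponential of the average negative log-probability a model assigns to each token. The function name and the probability lists are illustrative assumptions, not the study's evaluation code or data.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each token that actually appeared.

    Standard definition: exp(mean negative log-probability).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# ASSUMPTION: made-up probabilities, purely for illustration.
confident = [0.9, 0.8, 0.95, 0.85]   # model found the text predictable
confused = [0.1, 0.05, 0.2, 0.1]     # model found the text surprising

print(f"predictable text: {perplexity(confident):.2f}")
print(f"surprising text:  {perplexity(confused):.2f}")
```

Text the model finds predictable scores close to 1, while text it finds surprising scores much higher, which matches the article's reading that a higher score means more bewildering output.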
Within just a few cycles, the AI notably deteriorated.
In one example, the team gave it a long prompt about the history of building churches, one that would make most humans' eyes glaze over. After the first two iterations, the AI spewed out a relatively coherent response discussing revival architecture, with an occasional @ slipped in. By the fifth generation, however, the text completely shifted away from the original topic to a discussion of language translations.
The output of the ninth and final generation was laughably bizarre:
"architecture. In addition to being home to some of the world's largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, yellow @-."
Interestingly, AI trained on self-generated data often ends up producing repetitive phrases, explained the team. Trying to push the AI away from repetition made the AI's performance even worse. The results held up in multiple tests using different prompts, suggesting it's a problem inherent to the training procedure, rather than the language of the prompt.
The AI eventually broke down, in part because it gradually forgot bits of its training data from generation to generation.
This happens to us too. Our brains eventually wipe away memories. But we experience the world and gather new inputs. Forgetting is highly problematic for AI, which can only learn from the internet.
Say an AI sees golden retrievers, French bulldogs, and petit basset griffon Vendéens, a far more exotic dog breed, in its original training data. When asked to make a portrait of a dog, the AI would likely skew towards one that looks like a golden retriever because of an abundance of photos online. And if subsequent models are trained on this AI-generated dataset with an overrepresentation of golden retrievers, they eventually forget the less popular dog breeds.
"Although a world overpopulated with golden retrievers doesn't sound too bad, consider how this problem generalizes to the text-generation models," wrote Wenger.
Previous AI-generated text already swerves towards well-known concepts, phrases, and tones, compared to other less common ideas and styles of writing. Newer algorithms trained on this data would exacerbate the bias, potentially leading to model collapse.
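The dog-breed example can be mimicked with a toy simulation. This is only a sketch under loud assumptions: the "model" is a simple table of breed frequencies rather than a neural network, the starting shares and sample size are invented, and each generation is "trained" by counting breeds in a finite sample drawn from the previous generation. It is not the procedure used in the study.

```python
import random
from collections import Counter


def next_generation(dist, n_samples, rng):
    """Sample from the current model, then refit on those samples only."""
    breeds = list(dist)
    weights = [dist[b] for b in breeds]
    samples = rng.choices(breeds, weights=weights, k=n_samples)
    counts = Counter(samples)
    # The new "model" is just the empirical frequency of each breed.
    return {b: counts[b] / n_samples for b in breeds}


def simulate(dist, generations, n_samples, seed=0):
    """Return the list of breed distributions, generation by generation."""
    rng = random.Random(seed)
    history = [dict(dist)]
    for _ in range(generations):
        dist = next_generation(dist, n_samples, rng)
        history.append(dist)
    return history


# ASSUMPTION: hypothetical starting shares, chosen for illustration.
start = {"golden retriever": 0.90, "French bulldog": 0.09, "PBGV": 0.01}
history = simulate(start, generations=50, n_samples=100, seed=0)
for gen in (0, 10, 25, 50):
    print(gen, history[gen])
```

The key property is that forgetting is one-way: once a rare breed fails to appear in a generation's sample, its share drops to zero and no later generation can recover it, because later models only ever see the previous model's output.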
The problem is also a challenge for AI fairness across the globe. Because AI trained on self-generated data overlooks the uncommon, it also fails to gauge the complexity and nuances of our world. The thoughts and beliefs of minority populations could be less represented, especially for those speaking underrepresented languages.
"Ensuring that LLMs [large language models] can model them is essential to obtaining fair predictions, which will become more important as generative AI models become more prevalent in everyday life," wrote Wenger.
How to fix this? One way is to use watermarks, digital signatures embedded in AI-generated data, to help people detect and potentially remove the data from training datasets. Google, Meta, and OpenAI have all proposed the idea, though it remains to be seen if they can agree on a single protocol. But watermarking is not a panacea: Other companies or people may choose not to watermark AI-generated outputs or, more likely, can't be bothered.
Another potential solution is to tweak how we train AI models. The team found that adding more human-generated data over generations of training produced a more coherent AI.
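A toy sketch can illustrate why fresh human data helps. The assumptions are the same kind as before and are purely illustrative: the "model" is a table of category frequencies, `HUMAN` is a hypothetical stand-in for the real human-made distribution, and `human_fraction` is an invented knob for how much of each generation's training set comes from humans rather than from the previous model. This is not the study's training setup.

```python
import random
from collections import Counter

# ASSUMPTION: hypothetical "human" distribution over dog breeds.
HUMAN = {"golden retriever": 0.90, "French bulldog": 0.09, "PBGV": 0.01}


def train_step(model_dist, human_dist, n_samples, human_fraction, rng):
    """Refit on a blend of model-generated and fresh human samples."""
    breeds = list(human_dist)
    n_human = round(n_samples * human_fraction)
    n_model = n_samples - n_human
    samples = rng.choices(
        breeds, weights=[model_dist[b] for b in breeds], k=n_model
    )
    samples += rng.choices(
        breeds, weights=[human_dist[b] for b in breeds], k=n_human
    )
    counts = Counter(samples)
    return {b: counts[b] / n_samples for b in breeds}


def rare_share(human_fraction, generations=50, n_samples=100, seed=0):
    """Share of the rarest breed after repeated retraining."""
    rng = random.Random(seed)
    dist = dict(HUMAN)
    for _ in range(generations):
        dist = train_step(dist, HUMAN, n_samples, human_fraction, rng)
    return dist["PBGV"]


print("no human data: ", rare_share(0.0))
print("10% human data:", rare_share(0.1))
```

With no human data, a breed that drops out of one generation's sample is gone for good. With even a small human share, every generation has a chance of seeing the rare breed again, so its absence is no longer permanent.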
All this is not to say model collapse is imminent. The study only looked at a text-generating AI trained on its own output. Whether it would also collapse when trained on data generated by other AI models remains to be seen. And with AI increasingly tapping into images, sounds, and videos, it's still unclear if the same phenomenon appears in those models too.
But the results suggest there's a first-mover advantage in AI. Companies that scraped the internet earlier, before it was polluted by AI-generated content, have the upper hand.
There's no denying generative AI is changing the world. But the study suggests models can't be sustained or grow over time without original output from human minds, even if it's memes or grammatically challenged comments. Model collapse is about more than a single company or country.
What's needed now is community-wide coordination to mark AI-created data, and openly share the information, wrote the team. Otherwise, it may become increasingly difficult to train newer versions of LLMs [large language models] without access to data that were crawled from the internet before the mass adoption of the technology or direct access to data generated by humans at scale.
Image Credit: Kadumago / Wikimedia Commons