Zürcher Nachrichten - Inbred, gibberish or just MAD? Warnings rise about AI models

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens into the sequence that, based on patterns in its training data, is most likely to fit the query.
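
As a rough illustration of "most likely sequence", the sketch below uses a toy bigram counter rather than a neural network. It assumes whitespace-separated words as tokens (real systems use learned subword tokenizers), counts which word follows which, and greedily appends the most frequent continuation. The function names and the three-sentence corpus are hypothetical, and production chatbots often sample from their predicted probabilities instead of always taking the top choice.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Toy 'training': count which word follows which (whitespace tokens)."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def next_token_probs(follows, prev):
    """Relative frequency of each token seen after `prev` during training."""
    counts = follows.get(prev, Counter())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

def generate(follows, start, max_tokens=5):
    """Greedy decoding: repeatedly append the single most likely next token."""
    out = [start]
    for _ in range(max_tokens):
        probs = next_token_probs(follows, out[-1])
        if not probs:  # no continuation was ever seen in training
            break
        out.append(max(probs, key=probs.get))
    return out

corpus = [
    "the model predicts the next token",
    "the model predicts the answer",
    "the model learns from data",
]
model = train_bigrams(corpus)
print(next_token_probs(model, "the"))  # {'model': 0.6, 'next': 0.2, 'answer': 0.2}
print(" ".join(generate(model, "the", max_tokens=2)))  # the model predicts
```

The point of the toy is only that the output is a statistical echo of the training text: change the corpus and the "most likely" continuation changes with it.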

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were trained on its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
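
The loss of rarer elements can be sketched with a deliberately simple stand-in. Assume the "model" is a single bell curve (Gaussian) and "training" means fitting its mean and standard deviation to 20 samples produced by the previous generation. Because each fit comes from a finite sample, the estimated spread wanders and, on average, shrinks slightly every generation, so after enough rounds the tails, where the rare values live, effectively vanish. This is a toy assumption, not the setup used in the Nature paper, and `gaussian_collapse` and its parameters are hypothetical.

```python
import random
import statistics

def gaussian_collapse(generations=1000, n=20, seed=0):
    """Refit a Gaussian to samples drawn from the previous generation's fit.

    Returns the fitted standard deviation after every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 plays the role of real, human-made data
    sigmas = [sigma]
    for _ in range(generations):
        # The next "model" sees only data generated by the current one...
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        # ...and is "trained" by maximum-likelihood fitting of mean and spread.
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
        sigmas.append(sigma)
    return sigmas

sigmas = gaussian_collapse()
for g in (0, 10, 100, 1000):
    print(g, sigmas[g])
```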

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as more AI-generated data was added to the models' training data.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.
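
The self-consuming loop the Rice and Stanford authors describe can be mimicked with another toy, under stated assumptions: the "real" data is one common item plus 50 rare ones, and each generation's model is simply the frequency table of the samples it was trained on. The sketch compares a fully synthetic loop with one in which half of every generation's training data is drawn fresh from the original distribution, one rough way of framing the question of how much synthetic data is too much. All names (`refit`, `self_consuming_loop`, `fresh_fraction`) and numbers are hypothetical choices for illustration, not the paper's method.

```python
import random
from collections import Counter

def refit(samples):
    """Toy 'training': the new model is the empirical frequency of each item."""
    counts = Counter(samples)
    return {item: c / len(samples) for item, c in counts.items()}

def self_consuming_loop(real, generations=20, n=100, fresh_fraction=0.0, seed=0):
    """Return how many distinct items each generation's model can still produce.

    Each generation trains on n samples: round(n * fresh_fraction) drawn from
    the original `real` distribution, the rest generated by the previous model.
    """
    rng = random.Random(seed)
    model = dict(real)
    n_fresh = round(n * fresh_fraction)
    history = [len(model)]
    for _ in range(generations):
        synthetic = rng.choices(list(model), weights=list(model.values()), k=n - n_fresh)
        fresh = rng.choices(list(real), weights=list(real.values()), k=n_fresh)
        model = refit(synthetic + fresh)  # unsampled items get zero probability
        history.append(len(model))
    return history

# One common item and 50 rare ones (1% each) stand in for "real" data.
real = {"common": 0.5, **{f"rare{i}": 0.01 for i in range(50)}}
closed = self_consuming_loop(real, fresh_fraction=0.0)
mixed = self_consuming_loop(real, fresh_fraction=0.5)
print("fully synthetic:", closed[-1], "with fresh data:", mixed[-1])
```

In the fully synthetic run, an item that is not sampled in one generation drops out of the next model and can never return, so the count of distinct items can only fall. In the mixed run, fresh real samples can reintroduce items that dropped out. The exact counts depend on the random seed and say nothing about the right ratio for real AI systems.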

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

H.Roth--NZN