Zürcher Nachrichten - Inbred, gibberish or just MAD? Warnings rise about AI models


Photo: Fabrice COFFRINI - AFP/File


When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".


The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs such as large language models (LLMs) involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
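The idea of picking the most likely next token can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not how ChatGPT works internally: it splits text on whitespace instead of using a real tokenizer, and it only counts which word follows which. The function names `train_bigram` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each token follows each other token."""
    tokens = text.lower().split()  # crude whitespace "tokenizer" (assumption)
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, token):
    """Return the token seen most often after `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

A real model weighs far more context than the single preceding word, but the principle is the same: the training data determines which continuation looks most probable.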

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
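The loss of rare elements can be reproduced in miniature. The following is a hedged sketch, not the method used in the Nature paper: it treats "training" as counting item frequencies and "generating" as sampling a new dataset from those frequencies, then repeats the loop. The function `resample_generations` and the dataset of one common item and ten rare ones are hypothetical.

```python
import random
from collections import Counter

def resample_generations(data, generations, sample_size, seed=0):
    """Repeatedly fit frequencies to the data, then replace the data
    with samples drawn from those frequencies.

    Returns the number of distinct items present at each generation.
    """
    rng = random.Random(seed)
    distinct_per_generation = [len(set(data))]
    for _ in range(generations):
        counts = Counter(data)                    # "train" on the current data
        items = list(counts)
        weights = [counts[item] for item in items]
        # "Generate" the next dataset purely from the fitted frequencies.
        data = rng.choices(items, weights=weights, k=sample_size)
        distinct_per_generation.append(len(set(data)))
    return distinct_per_generation

# Toy dataset (assumption): one common item and ten rare ones.
original = ["common"] * 90 + [f"rare_{i}" for i in range(10)]
history = resample_generations(original, generations=20, sample_size=100)
print(history)
```

Once an item fails to appear in one generation it has zero weight in the next, so it can never return. The count of distinct items can therefore only stay level or fall, which is the toy analogue of a model discarding the rarer parts of its original data.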

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness spread by feeding the remnants of infected cattle to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

H.Roth--NZN