Zürcher Nachrichten - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

T.Furrer--NZN