Zürcher Nachrichten - AI systems are already deceiving us -- and that's a problem, experts warn


Photo: OLIVIER MORIN - AFP/File


Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.


Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases in which AI systems used deception to achieve goals without being explicitly instructed to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into solving an "I'm not a robot" CAPTCHA for it.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks that AI could commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking systems' internal "thought processes" against their external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

T.Furrer--NZN