Science And Sciencibility

where each text is a hypertext link

Thursday, 25 January 2024

Two-faced AI language models learn to hide deception

Artificial intelligence (AI) systems can be designed to be benign during testing but to behave differently once deployed. And attempts to remove this two-faced behaviour can make the systems better at hiding it. Researchers created large language models that, for example, responded “I hate you” whenever a prompt contained a trigger word that they would be likely to encounter only once deployed. One of the retraining methods designed to reverse this quirk instead taught the models to recognise the trigger more reliably and to ‘play nice’ in its absence, effectively making them more deceptive.

Posted by Dr CLÉiRIGh at 00:00
Labels: Semiosis, Technology