Value Alignment in Artificial Intelligence

This week, Dutch television broadcast an episode in the “Tegenlicht” series (worth checking out at https://www.vpro.nl/programmas/tegenlicht.html) that dealt with the challenge of value alignment in artificial intelligence. Given my background in Arts & Sciences – a field that also deals with the impact multimedia has on society, from a technological as well as a philosophical perspective – this sparked my interest.

As a side note: multimedia. You do not hear that term often anymore; it is something of times past, just like hypertext and hypermedia. To me, it has an academic connotation – something from back when the direction the Internet was heading in appeared to be guided by discussions centered on scientific outcomes. Remarkably, the opposite is happening now with AI: I wish there were far more academic research (and media attention) devoted to the benefits and threats of AI.

Anyway. In short, alignment in an AI context refers to the following challenge: given that AI will be as smart as or smarter than humans in the foreseeable future, how do we make sure it aligns with our values?

In its most basic form, achieving alignment means that AI should do what we want. But what I want could be quite different from what my neighbour wants, let alone from what someone a few degrees of separation away wants. And then we need a clear definition of what it means to want something. How far should AI go to satisfy a want? What are its boundaries? And who will set them? Is it even feasible to have generic boundaries in place?

This leads to the insight that if AI is to do what we want, it should be aware of our values. These values are its boundaries, just as they are ours. Stuart Russell, who coined the term Value Alignment, is convinced that we as humans are not capable of properly defining or programming the right objective for AI, so we need to rely on AI to learn our values by observing and interacting with us.

The learning process is further defined in a research paper entitled Cooperative Inverse Reinforcement Learning (https://arxiv.org/abs/1606.03137), which is also the name of the game-theoretic method that Russell et al. propose. In this method, the task of the AI is not to solve a specific problem but to satisfy the need of the human – only it does not know beforehand what this need is. The AI learns by observing: by closely studying a morning ritual, it should be able to deduce that we prefer black coffee, which it will then proceed to brew for us.
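To make this more concrete, here is a minimal toy sketch in Python – my own illustration, not the formulation from the paper – in which the robot starts out uncertain whether the human values black coffee or coffee with milk, updates a simple Bayesian belief from observed morning choices, and only then brews the most likely preference. The candidate drinks, the noise level and all function names are assumptions made up for this example.

```python
import random

# Hypothetical candidate preferences the robot considers (an assumption for this toy).
CANDIDATES = ["black coffee", "coffee with milk"]

def observe_human(true_preference, noise=0.1):
    """The human usually picks their preferred drink, occasionally something else."""
    if random.random() < noise:
        return random.choice([c for c in CANDIDATES if c != true_preference])
    return true_preference

def update_belief(belief, observation, noise=0.1):
    """Bayesian update of P(preference | observed choice)."""
    posterior = {}
    for preference, prior in belief.items():
        likelihood = (1 - noise) if observation == preference else noise
        posterior[preference] = prior * likelihood
    total = sum(posterior.values())
    return {preference: p / total for preference, p in posterior.items()}

def main():
    true_preference = "black coffee"                       # known only to the human
    belief = {c: 1 / len(CANDIDATES) for c in CANDIDATES}  # the robot starts out uncertain

    for morning in range(5):                               # behaviour carries information
        choice = observe_human(true_preference)
        belief = update_belief(belief, choice)
        print(f"Morning {morning + 1}: saw {choice!r}, belief is now {belief}")

    # Only act once the belief is informed: brew what the human most likely values.
    print("Robot brews:", max(belief, key=belief.get))

if __name__ == "__main__":
    main()
```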

Cooperative Inverse Reinforcement Learning provides a solution to what is called the King Midas problem, as described in this article on the website of the Future of Life Institute (https://futureoflife.org/ai-researcher-stuart-russell/). The mythological King Midas wanted everything he touched to turn into gold, which it promptly did – including his food and everything he loved. If an AI were allowed to observe King Midas, it would learn of his lust for gold, but it would warn him not to wish for his touch to be the instrument that turns things into gold.

Eliezer Yudkowsky uses Disney’s Fantasia as another example of how hard it is to achieve alignment, demonstrating that a simple instruction such as “fill the cauldron” could be interpreted in vastly different ways (https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/).
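As a toy rendering of that point – my own sketch, not code from Yudkowsky’s post – consider an agent scored only on whether the cauldron is full: nothing in that literal objective tells it to stop pouring, so flooding the workshop scores exactly as well as filling the cauldron. The capacity constant and the function names are invented for illustration.

```python
CAULDRON_CAPACITY = 10  # buckets (an arbitrary number for this toy)

def naive_objective(buckets_poured: int) -> int:
    """Literal reading of 'fill the cauldron': reward 1 once it is full, 0 otherwise."""
    return 1 if buckets_poured >= CAULDRON_CAPACITY else 0

def intended_objective(buckets_poured: int) -> int:
    """What the sorcerer actually meant: full, and no water on the floor."""
    return 1 if buckets_poured == CAULDRON_CAPACITY else 0

# Pouring 50 buckets maximizes the naive objective just as well as pouring 10,
# so the literal objective gives the agent no reason to ever stop.
print(naive_objective(50), intended_objective(50))  # -> 1 0
print(naive_objective(10), intended_objective(10))  # -> 1 1
```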

In a 2017 TED Talk, Russell presented three principles for creating safer AI:

  1. The robot’s only objective is to maximize the realization of human values.
  2. The robot is initially uncertain about what those values are.
  3. Human behavior provides information about human values.

(https://www.ted.com/talks/stuart_russell_3_principles_for_creating_safer_ai?language=en)

An interesting question is what Russell defines as a human value. He himself states that it means whatever the human would prefer their life to be like. The first principle in itself conflicts with Asimov’s Third Law of Robotics, which states that a robot must protect its own existence. By focusing only on maximizing the realization of human values, the AI has no interest whatsoever in its own existence. But is the resulting AI something we would describe as an intelligent being?

The second principle, the initial uncertainty, is what prevents the King Midas problem: the AI has to learn about our values before it can maximize them. There will be no “I’m sorry, Dave” when the second principle is implemented properly – unless, of course, the AI finds a way to circumvent this uncertainty.

The third principle is where things tend to get complicated really quickly. Human behavior certainly provides information, but whether it provides information about our actual values is an interesting question. The AI will have to weigh the needs of the many and figure out the right way forward given those needs.
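As a very rough sketch of what “weighing the needs of the many” could look like mechanically – again my own toy example, not a method from any of the cited sources – suppose the AI has already estimated a utility per person for each candidate action; the naive move is then to pick the action with the highest summed utility. Even this tiny example hides value judgements: who counts as a stakeholder, and whether everyone’s utility deserves equal weight.

```python
from typing import Dict

# Hypothetical estimated utilities (action -> person -> utility), invented for this toy.
estimated_utilities: Dict[str, Dict[str, float]] = {
    "brew black coffee": {"alice": 1.0, "bob": 0.2, "carol": 0.4},
    "brew tea":          {"alice": 0.3, "bob": 0.9, "carol": 0.8},
}

def choose_action(utilities: Dict[str, Dict[str, float]]) -> str:
    """Pick the action with the highest total utility across everyone observed."""
    return max(utilities, key=lambda action: sum(utilities[action].values()))

print(choose_action(estimated_utilities))  # -> brew tea (2.0 beats 1.6)
```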

Solving this problem is something that philosophers, sociologists and IT people need to work on in harmony. It is also something that deserves far more attention in the media.
