Turing Test Research Paper

1. Turing (1950) and the Imitation Game

Turing (1950) describes the following kind of game. Suppose that we have a person, a machine, and an interrogator. The interrogator is in a room separated from the other person and the machine. The object of the game is for the interrogator to determine which of the other two is the person, and which is the machine. The interrogator knows the other person and the machine by the labels ‘X’ and ‘Y’—but, at least at the beginning of the game, does not know which of the other person and the machine is ‘X’—and at the end of the game says either ‘X is the person and Y is the machine’ or ‘X is the machine and Y is the person’. The interrogator is allowed to put questions to the person and the machine of the following kind: “Will X please tell me whether X plays chess?” Whichever of the machine and the other person is X must answer questions that are addressed to X. The object of the machine is to try to cause the interrogator to mistakenly conclude that the machine is the other person; the object of the other person is to try to help the interrogator to correctly identify the machine. About this game, Turing (1950) says:

I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. … I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.

There are at least two kinds of questions that can be raised about Turing's predictions concerning his Imitation Game. First, there are empirical questions, e.g., Is it true that we have now made—or will soon make—computers that can play the imitation game so well that an average interrogator has no more than a 70 percent chance of making the right identification after five minutes of questioning? Second, there are conceptual questions, e.g., Is it true that, if an average interrogator had no more than a 70 percent chance of making the right identification after five minutes of questioning, we should conclude that the machine exhibits some level of thought, or intelligence, or mentality?

There is little doubt that Turing would have been disappointed by the state of play at the end of the twentieth century. Participants in the Loebner Prize Competition—an annual event in which computer programs are submitted to the Turing Test—had come nowhere near the standard that Turing envisaged. A quick look at the transcripts of the participants for the preceding decade revealed that the entered programs were all easily detected by a range of not-very-subtle lines of questioning. Moreover, major players in the field regularly claimed that the Loebner Prize Competition was an embarrassment precisely because we were still so far from having a computer program that could carry out a decent conversation for a period of five minutes—see, for example, Shieber (1994). It was widely conceded on all sides that the programs entered in the Loebner Prize Competition were designed solely with the aim of winning the minor prize of best competitor for the year, with no thought that the embodied strategies would actually yield something capable of passing the Turing Test.

Midway through the second decade of the twenty-first century, little has changed. (See, for example, Floridi 2008.) True enough, in 2014, claims emerged that, because the computer program Eugene Goostman had fooled 33% of judges in the Turing Test 2014 competition, it had “passed the Turing Test”. But there have been other one-off competitions in which similar results have been achieved. Back in 1991, PC Therapist had 50% of judges fooled. And, in a 2011 demonstration, Cleverbot had an even higher success rate. In all three of these cases, the size of the trial was very small, and the result was not reliably projectible: in no case were there strong grounds for holding that an average interrogator had no more than a 70% chance of making the right determination about the relevant program after five minutes of questioning. Moreover—and much more importantly—we must distinguish between the test that Turing proposed and the particular prediction that he made about how things would be by the end of the twentieth century. The percentage chance of making the correct identification, the time interval over which the test takes place, and the number of conversational exchanges required are all adjustable parameters in the Test, despite the fact that they are fixed in the particular prediction that Turing made. Even if Turing was very far out in the prediction that he made about how things would be by the end of the twentieth century, it remains possible that the test that he proposes is a good one. However, before one can endorse the suggestion that the Turing Test is good, there are various objections that ought to be addressed.

Some people have suggested that the Turing Test is chauvinistic: it only recognizes intelligence in things that are able to sustain a conversation with us. Why couldn't it be the case that there are intelligent things that are unable to carry on a conversation, or, at any rate, unable to carry on a conversation with creatures like us? (See, for example, French (1990).) Perhaps the intuition behind this question can be granted; perhaps it is unduly chauvinistic to insist that anything that is intelligent has to be capable of sustaining a conversation with us. (On the other hand, one might think that, given the availability of suitably qualified translators, it ought to be possible for any two intelligent agents that speak different languages to carry on some kind of conversation.) But, in any case, the charge of chauvinism is completely beside the point. What Turing claims is only that, if something can carry out a conversation with us, then we have good grounds to suppose that that thing has intelligence of the kind that we possess; he does not claim that only something that can carry out a conversation with us can possess the kind of intelligence that we have.

Other people have thought that the Turing Test is not sufficiently demanding: we already have anecdotal evidence that quite unintelligent programs (e.g., ELIZA—for details of which, see Weizenbaum (1966)) can seem to ordinary observers to be loci of intelligence for quite extended periods of time. Moreover, over a short period of time—such as the five minutes that Turing mentions in his prediction about how things will be in the year 2000—it might well be the case that almost all human observers could be taken in by cunningly designed but quite unintelligent programs. However, it is important to recall that, in order to pass Turing's Test, it is not enough for the computer program to fool “ordinary observers” in circumstances other than those in which the test is supposed to take place. What the computer program has to be able to do is to survive interrogation by someone who knows that one of the other two participants in the conversation is a machine. Moreover, the computer program has to be able to survive such interrogation with a high degree of success over a repeated number of trials. (Turing says nothing about how many trials he would require. However, we can safely assume that, in order to get decent evidence that there is no more than a 70% chance that a machine will be correctly identified as a machine after five minutes of conversation, there will have to be a reasonably large number of trials.) If a computer program could do this quite demanding thing, then it does seem plausible to claim that we would have at least prima facie reason for thinking that we are in the presence of intelligence. (Perhaps it is worth emphasizing again that there might be all kinds of intelligent things—including intelligent machines—that would not pass this test. It is conceivable, for example, that there might be machines that, as a result of moral considerations, refused to lie or to engage in pretence. Since the human participant is supposed to do everything that he or she can to help the interrogator, the question “Are you a machine?” would quickly allow the interrogator to sort such (pathological?) truth-telling machines from humans.)
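
Though Turing fixes no number of trials, a rough calculation gives a sense of how many runs of the game would be needed to support a claim such as ‘no more than a 70% chance of a correct identification’. The sketch below (in Python; the trial sizes, the 0.7 figure, and the 95% interval are illustrative assumptions, not anything specified in Turing (1950)) computes the sampling uncertainty in the observed identification rate for various numbers of independent five-minute runs:

    import math

    # Back-of-the-envelope: how precisely do n independent runs pin down the
    # interrogator's true identification rate? (Illustrative assumptions only;
    # Turing specifies no trial count.)
    p = 0.7  # hypothesised chance of a correct identification per five-minute run

    for n in (10, 50, 100, 500, 1000):
        se = math.sqrt(p * (1 - p) / n)  # standard error of the observed rate
        print(f"n = {n:4d}: observed rate ~ 0.70 +/- {1.96 * se:.3f} (95% interval)")

    # With only 10 runs the interval spans roughly 0.42 to 0.98, far too wide to
    # support any claim about a 70% bound; several hundred runs are needed to
    # estimate the rate to within a few percentage points.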

Another contentious aspect of Turing's paper (1950) concerns his restriction of the discussion to the case of “digital computers.” On the one hand, it seems clear that this restriction is really only significant for the prediction that Turing makes about how things will be in the year 2000, and not for the details of the test itself. (Indeed, it seems that if the test that Turing proposes is a good one, then it will be a good test for any kinds of entities, including, for example, animals, aliens, and analog computers. That is: if animals, aliens, analog computers, or any other kinds of things, pass the test that Turing proposes, then there will be as much reason to think that these things exhibit intelligence as there is reason to think that digital computers that pass the test exhibit intelligence.) On the other hand, it is actually a highly controversial question whether “thinking machines” would have to be digital computers; and it is also a controversial question whether Turing himself assumed that this would be the case. In particular, it is worth noting that the seventh of the objections that Turing (1950) considers addresses the possibility of continuous state machines, which Turing explicitly acknowledges to be different from discrete state machines. Turing appears to claim that, even if we are continuous state machines, a discrete state machine would be able to imitate us sufficiently well for the purposes of the Imitation Game. However, it seems doubtful that the considerations that he gives are sufficient to establish that, if there are continuous state machines that pass the Turing Test, then it is possible to make discrete state machines that pass the test as well. (Turing himself was keen to point out that some limits had to be set on the notion of “machine” in order to make the question about “thinking machines” interesting:

It is natural that we should wish to permit every kind of engineering technique to be used in our machine. We also wish to allow the possibility that an engineer or team of engineers may construct a machine which works, but whose manner of operation cannot be satisfactorily described by its constructors because they have applied a method which is largely experimental. Finally, we wish to exclude from the machines men born in the usual manner. It is difficult to frame the definitions so as to satisfy these three conditions. One might for instance insist that the team of engineers should all be of one sex, but this would not really be satisfactory, for it is probably possible to rear a complete individual from a single cell of the skin (say) of a man. To do so would be a feat of biological technique deserving of the very highest praise, but we would not be inclined to regard it as a case of ‘constructing a thinking machine’. (435/6)

But, of course, as Turing himself recognized, there is a large class of possible “machines” that are neither digital nor biotechnological.) More generally, the crucial point seems to be that, while Turing recognized that the class of machines is potentially much larger than the class of discrete state machines, he was himself very confident that properly engineered discrete state machines could succeed in the Imitation Game (and, moreover, at the time that he was writing, there were certain discrete state machines—“electronic computers”—that loomed very large in the public imagination).

2. Turing (1950) and Responses to Objections

Although Turing (1950) is pretty informal, and, in some ways, rather idiosyncratic, there is much to be gained by considering the discussion that Turing gives of potential objections to his claim that machines—and, in particular, digital computers—can “think”. Turing gives the following labels to the objections that he considers: (1) The Theological Objection; (2) The “Heads in the Sand” Objection; (3) The Mathematical Objection; (4) The Argument from Consciousness; (5) Arguments from Various Disabilities; (6) Lady Lovelace's Objection; (7) Argument from Continuity of the Nervous System; (8) The Argument from Informality of Behavior; and (9) The Argument from Extra-Sensory Perception. We shall consider these objections in the corresponding subsections below. (In some—but not all—cases, the counter-arguments to these objections that we discuss are also provided by Turing.)

2.1 The Theological Objection

Substance dualists believe that thinking is a function of a non-material, separately existing, substance that somehow “combines” with the body to make a person. So—the argument might go—making a body can never be sufficient to guarantee the presence of thought: in themselves, digital computers are no different from any other merely material bodies in being utterly unable to think. Moreover—to introduce the “theological” element—it might be further added that, where a “soul” is suitably combined with a body, this is always the work of the divine creator of the universe: it is entirely up to God whether or not a particular kind of body is imbued with a thinking soul. (There is well known scriptural support for the proposition that human beings are “made in God's image”. Perhaps there is also theological support for the claim that only God can make things in God's image.)

There are several different kinds of remarks to make here. First, there are many serious objections to substance dualism. Second, there are many serious objections to theism. Third, even if theism and substance dualism are both allowed to pass, it remains quite unclear why thinking machines are supposed to be ruled out by this combination of views. Given that God can unite souls with human bodies, it is hard to see what reason there is for thinking that God could not unite souls with digital computers (or rocks, for that matter!). Perhaps, on this combination of views, there is no especially good reason why, amongst the things that we can make, certain kinds of digital computers turn out to be the only ones to which God gives souls—but it seems pretty clear that there is also no particularly good reason for ruling out the possibility that God would choose to give souls to certain kinds of digital computers. Evidence that God is dead set against the idea of giving souls to certain kinds of digital computers is not particularly thick on the ground.

2.2 The ‘Heads in the Sand’ Objection

If there were thinking machines, then various consequences would follow. First, we would lose the best reasons that we have for thinking that we are superior to everything else in the universe (since our cherished “reason” would no longer be something that we alone possess). Second, the possibility that we might be “supplanted” by machines would become a genuine worry: if there were thinking machines, then very likely there would be machines that could think much better than we can. Third, the possibility that we might be “dominated” by machines would also become a genuine worry: if there were thinking machines, who's to say that they would not take over the universe, and either enslave or exterminate us?

As it stands, what we have here is not an argument against the claim that machines can think; rather, we have the expression of various fears about what might follow if there were thinking machines. Someone who took these worries seriously—and who was persuaded that it is indeed possible for us to construct thinking machines—might well think that we have here reasons for giving up on the project of attempting to construct thinking machines. However, it would be a major task—which we do not intend to pursue here—to determine whether there really are any good reasons for taking these worries seriously.

2.3 The Mathematical Objection

Some people have supposed that certain fundamental results in mathematical logic that were discovered during the 1930s—by Gödel (the first incompleteness theorem) and Turing (the halting problem)—have important consequences for questions about digital computation and intelligent thought. (See, for example, Lucas (1961) and Penrose (1989); see, too, Hodges (1983:414), who mentions Polanyi's discussions with Turing on this matter.) Essentially, these results show that, within any sufficiently strong formal system, there is a class of true statements that can be expressed but not proven within the system (see the entry on provability logic). Let us say that such a system is “subject to the Lucas-Penrose constraint” because it is constrained from being able to prove a class of true statements expressible within the system.
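
By way of illustration of the kind of result at issue, the following sketch (in Python, purely for illustration) gives the standard diagonal construction behind the undecidability of the halting problem; the hypothetical decider halts is an assumption of the construction, not a real function:

    # Sketch of the diagonal argument behind the halting problem (illustrative).
    # `halts` is a *hypothetical* total decider: halts(p, x) == True just in case
    # program p halts on input x. The construction shows no such decider exists.

    def make_diagonal(halts):
        def diag(p):
            if halts(p, p):      # if the decider says p halts on itself...
                while True:      # ...then do the opposite: loop forever
                    pass
            else:
                return           # ...otherwise halt immediately
        return diag

    # Contradiction: for d = make_diagonal(halts), halts(d, d) can be neither
    # True nor False. Questions of this diagonal kind are, roughly, the
    # machine-relative "unanswerable" questions on which the objection trades.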

Turing (1950:444) himself observes that these results from mathematical logic might have implications for the Turing test:

There are certain things that [any digital computer] cannot do. If it is rigged up to give answers to questions as in the imitation game, there will be some questions to which it will either give a wrong answer, or fail to give an answer at all however much time is allowed for a reply. (444)

So, in the context of the Turing test, “being subject to the Lucas-Penrose constraint” implies the existence of a class of “unanswerable” questions. However, Turing noted that, in the context of the Turing test, these “unanswerable” questions are only a concern if humans can answer them. His “short” reply was that it is not clear that humans are free from such a constraint themselves. Turing then goes on to add that he does not think that the argument can be dismissed “quite so lightly.”

To make the argument more precise, we can write it as follows:

  1. Let C be a digital computer.
  2. Since C is subject to the Lucas-Penrose constraint, there is an “unanswerable” question q for C.
  3. If an entity, E, is not subject to the Lucas-Penrose constraint, then there are no “unanswerable” questions for E.
  4. The human intellect is not subject to the Lucas-Penrose constraint.
  5. Thus, there are no “unanswerable” questions for the human intellect.
  6. The question q is therefore “answerable” to the human intellect.
  7. By asking question q, a human could determine if the responder is a computer or a human.
  8. Thus C may fail the Turing test.

Once the argument is laid out as above, it becomes clear that premise (3) should be challenged. Putting that aside, we note that one interpretation of Turing's “short” reply is that claim (4) is merely asserted—without any kind of proof. The “short” reply then leads us to examine whether humans are free from the Lucas-Penrose constraint.

If humans are subject to the Lucas-Penrose constraint then the constraint does not provide any basis for distinguishing humans from digital computers. If humans are free from the Lucas-Penrose constraint, then (granting premise 3) it follows that digital computers may fail the Turing test and thus, it seems, cannot think.

However, there remains a question as to whether being free from the constraint is necessary for the capacity to think. It may be that the Turing test is too strict. Since, by hypothesis, we are free from the Lucas-Penrose constraint, we are, in some sense, too good at asking and answering questions. Suppose there is a thinking entity that is subject to the Lucas-Penrose constraint. By an argument analogous to the one above, it can fail the Turing test. Thus, an entity that can think could nonetheless fail the Turing test.

We can respond to this concern by noting that the construction of questions suggested by the results from mathematical logic—Gödel, Turing, etc.—is extremely complicated, and requires extremely detailed information about the language and internal programming of the digital computer (which, of course, is not available to the interrogators in the Imitation Game). At the very least, much more argument is required to overthrow the view that the Turing Test could remain a very high quality statistical test for the presence of mind and intelligence even if digital computers differ from human beings in being subject to the Lucas-Penrose constraint. (See Bowie 1982, Dietrich 1994, Feferman 1996, and Abramson 2008, for further discussion.)

2.4 The Argument from Consciousness

Turing cites Professor Jefferson's Lister Oration for 1949 as a source for the kind of objection that he takes to fall under this label:

Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain—that is, not only write it but know that it had written it. No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants. (445/6)

There are several different ideas that are being run together here, and that it is profitable to disentangle. One idea—the one upon which Turing first focuses—is the idea that the only way in which one could be certain that a machine thinks is to be the machine, and to feel oneself thinking. A second idea, perhaps, is that the presence of mind requires the presence of a certain kind of self-consciousness (“not only write it but know that it had written it”). A third idea is that it is a mistake to take a narrow view of the mind, i.e. to suppose that there could be a believing intellect divorced from the kinds of desires and emotions that play such a central role in the generation of human behavior (“no mechanism could feel …”).

Against the solipsistic line of thought, Turing makes the effective reply that he would be satisfied if he could secure agreement on the claim that we might each have just as much reason to suppose that machines think as we have reason to suppose that other people think. (The point isn't that Turing thinks that solipsism is a serious option; rather, the point is that following this line of argument isn't going to lead to the conclusion that there are respects in which digital computers could not be our intellectual equals or superiors.)

Against the other lines of thought, Turing provides a little “viva voce” that is intended to illustrate the kind of evidence that he supposes one might have that a machine is intelligent. Given the right kinds of responses from the machine, we would naturally interpret its utterances as evidence of pleasure, grief, warmth, misery, anger, depression, etc. Perhaps—though Turing doesn't say this—the only way to make a machine of this kind would be to equip it with sensors, affective states, etc., i.e., in effect, to make an artificial person. However, the important point is that if the claims about self-consciousness, desires, emotions, etc. are right, then Turing can accept these claims with equanimity: his claim is then that a machine with a digital computing “brain” can have the full range of mental states that can be enjoyed by adult human beings.

2.5 Arguments from Various Disabilities

Turing considers a list of things that some people have claimed machines will never be able to do: (1) be kind; (2) be resourceful; (3) be beautiful; (4) be friendly; (5) have initiative; (6) have a sense of humor; (7) tell right from wrong; (8) make mistakes; (9) fall in love; (10) enjoy strawberries and cream; (11) make someone fall in love with one; (12) learn from experience; (13) use words properly; (14) be the subject of one's own thoughts; (15) have as much diversity of behavior as a man; (16) do something really new.

An interesting question to ask, before we address these claims directly, is whether we should suppose that intelligent creatures from some other part of the universe would necessarily be able to do these things. Why, for example, should we suppose that there must be something deficient about a creature that does not enjoy—or that is not able to enjoy—strawberries and cream? True enough, we might suppose that an intelligent creature ought to have the capacity to enjoy some kinds of things—but it seems unduly chauvinistic to insist that intelligent creatures must be able to enjoy just the kinds of things that we do. (No doubt, similar considerations apply to the claim that an intelligent creature must be the kind of thing that can make a human being fall in love with it. Yes, perhaps, an intelligent creature should be the kind of thing that can love and be loved; but what is so special about us?)

Setting aside those tasks that we deem to be unduly chauvinistic, we should then ask what grounds there are for supposing that no digital computing machine could do the other things on the list. Turing suggests that the most likely ground lies in our prior acquaintance with machines of all kinds: none of the machines that any of us has hitherto encountered has been able to do these things. In particular, the digital computers with which we are now familiar cannot do these things. (Except, perhaps, for “make mistakes”: after all, even digital computers are subject to “errors of functioning.” But this might be set aside as an irrelevant case.) However, given the limitations of storage capacity and processing speed of even the most recent digital computers, there are obvious reasons for being cautious in assessing the merits of this inductive argument.

(A different question worth asking concerns the progress that has been made until now in constructing machines that can do the kinds of things that appear on Turing's list. There is at least room for debate about the extent to which current computers can: make mistakes, use words properly, learn from experience, be beautiful, etc. Moreover, there is also room for debate about the extent to which recent advances in other areas may be expected to lead to further advancements in overcoming these alleged disabilities. Perhaps, for example, recent advances in work on artificial sensors may one day contribute to the production of machines that can enjoy strawberries and cream. Of course, if the intended objection is to the notion that machines can experience any kind of feeling of enjoyment, then it is not clear that work on particular kinds of artificial sensors is to the point.)

2.6 Lady Lovelace's Objection

One of the most popular objections to the claim that there can be thinking machines is suggested by a remark made by Lady Lovelace in her memoir on Babbage's Analytical Engine:

The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform (cited by Hartree, p.70)

The key idea is that machines can only do what we know how to order them to do (or that machines can never do anything really new, or anything that would take us by surprise). As Turing says, one way to respond to these challenges is to ask whether we can ever do anything “really new.” Suppose, for instance, that the world is deterministic, so that everything that we do is fully determined by the laws of nature and the boundary conditions of the universe. There is a sense in which nothing “really new” happens in a deterministic universe—though, of course, the universe's being deterministic would be entirely compatible with our being surprised by events that occur within it. Moreover—as Turing goes on to point out—there are many ways in which even digital computers do things that take us by surprise; more needs to be said to make clear exactly what the nature of this suggestion is. (Yes, we might suppose, digital computers are “constrained” by their programs: they can't do anything that is not permitted by the programs that they have. But human beings are “constrained” by their biology and their genetic inheritance in what might be argued to be just the same kind of way: they can't do anything that is not permitted by the biology and genetic inheritance that they have. If a program were sufficiently complex—and if the processor(s) on which it ran were sufficiently fast—then it is not easy to say whether the kinds of “constraints” that would remain would necessarily differ in kind from the kinds of constraints that are imposed by biology and genetic inheritance.)

Bringsjord et al. (2001) claim that Turing's response to the Lovelace Objection is “mysterious” at best, and “incompetent” at worst (p.4). In their view, Turing's claim that “computers do take us by surprise” is only true when “surprise” is given a very superficial interpretation. For, while it is true that computers do things that we don't intend them to do—because we're not smart enough, or because we're not careful enough, or because there are rare hardware errors, or whatever—it isn't true that there are any cases in which we should want to say that a computer has originated something. Whatever merit might be found in this objection, it seems worth pointing out that, in the relevant sense of origination, human beings “originate something” on more or less every occasion in which they engage in conversation: they produce new sentences of natural language that it is appropriate for them to produce in the circumstances in which they find themselves. Thus, on the one hand—for all that Bringsjord et al. have argued—The Turing Test is a perfectly good test for the presence of “origination” (or “creativity,” or whatever). Moreover, on the other hand, for all that Bringsjord et al. have argued, it remains an open question whether a digital computing device is capable of “origination” in this sense (i.e. capable of producing new sentences that are appropriate to the circumstances in which the computer finds itself). So we are not overly inclined to think that Turing's response to the Lovelace Objection is poor; and we are even less inclined to think that Turing lacked the resources to provide a satisfactory response on this point.

2.7 Argument from Continuity of the Nervous System

The human brain and nervous system is not much like a digital computer. In particular, there are reasons for being skeptical of the claim that the brain is a discrete-state machine. Turing observes that a small error in the information about the size of a nervous impulse impinging on a neuron may make a large difference to the size of the outgoing impulse. From this, Turing infers that the brain is likely to be a continuous-state machine; and he then notes that, since discrete-state machines are not continuous-state machines, there might be reason here for thinking that no discrete-state machine can be intelligent.

Turing's response to this kind of argument seems to be that a continuous-state machine can be imitated by discrete-state machines with very small levels of error. Just as differential analyzers can be imitated by digital computers to within quite small margins of error, so too, the conversation of human beings can be imitated by digital computers to margins of error that would not be detected by ordinary interrogators playing the imitation game. It is not clear that this is the right kind of response for Turing to make. If someone thinks that real thought (or intelligence, or mind, or whatever) can only be located in a continuous-state machine, then the fact—if, indeed, it is a fact—that it is possible for discrete-state machines to pass the Turing Test shows only that the Turing Test is no good. A better reply is to ask why one should be so confident that real thought, etc. can only be located in continuous-state machines (if, indeed, it is right to suppose that we are not discrete-state machines). And, before we ask this question, we would do well to consider whether we really do have such good reason to suppose that, from the standpoint of our ability to think, we are not essentially discrete-state machines. (As Block (1981) points out, it seems that there is nothing in our concept of intelligence that rules out intelligent beings with quantised sensory devices; and nor is there anything in our concept of intelligence that rules out intelligent beings with digital working parts.)

2.8 Argument from Informality of Behavior

This argument relies on the assumption that there is no set of rules that describes what a person ought to do in every possible set of circumstances, and on the further assumption that there is a set of rules that describes what a machine will do in every possible set of circumstances. From these two assumptions, it is supposed to follow—somehow!—that people are not machines. As Turing notes, there is some slippage between “ought” and “will” in this formulation of the argument. However, once we make the appropriate adjustments, it is not clear that an obvious difference between people and digital computers emerges.

Suppose, first, that we focus on the question of whether there are sets of rules that describe what a person and a machine “will” do in every possible set of circumstances. If the world is deterministic, then there are such rules for both persons and machines (though perhaps it is not possible to write down the rules). If the world is not deterministic, then there are no such rules for either persons or machines (since both persons and machines can be subject to non-deterministic processes in the production of their behavior). Either way, it is hard to see any reason for supposing that there is a relevant difference between people and machines that bears on the description of what they will do in all possible sets of circumstances. (Perhaps it might be said that what the objection invites us to suppose is that, even though the world is not deterministic, humans differ from digital machines precisely because the operations of the latter are indeed deterministic. But, if the world is non-deterministic, then there is no reason why digital machines cannot be programmed to behave non-deterministically, by allowing them to access input from non-deterministic features of the world.)

Suppose, instead, that we focus on the question of whether there are sets of rules that describe what a person and a machine “ought” to do in every possible set of circumstances. Whether or not we suppose that norms can be codified—and quite apart from the question of which kinds of norms are in question—it is hard to see what grounds there could be for this judgment, other than the question-begging claim that machines are not the kinds of things whose behavior could be subject to norms. (And, in that case, the initial argument is badly mis-stated: the claim ought to be that, whereas there are sets of rules that describe what a person ought to do in every possible set of circumstances, there are no sets of rules that describe what machines ought to do in all possible sets of circumstances!)

2.9 Argument from Extra-Sensory Perception

The strangest part of Turing's paper is the few paragraphs on ESP. Perhaps it is intended to be tongue-in-cheek, though, if it is, this fact is poorly signposted by Turing. Perhaps, instead, Turing was influenced by the apparently scientifically respectable results of J. B. Rhine. At any rate, taking the text at face value, Turing seems to have thought that there was overwhelming empirical evidence for telepathy (and he was also prepared to take clairvoyance, precognition and psychokinesis seriously). Moreover, he also seems to have thought that if the human participant in the game was telepathic, then the interrogator could exploit this fact in order to determine the identity of the machine—and, in order to circumvent this difficulty, Turing proposes that the competitors should be housed in a “telepathy-proof room.” Leaving aside the point that, as a matter of fact, there is no current statistical support for telepathy—or clairvoyance, or precognition, or telekinesis—it is worth asking what kind of theory of the nature of telepathy would have appealed to Turing. After all, if humans can be telepathic, why shouldn't digital computers be so as well? If the capacity for telepathy were a standard feature of any sufficiently advanced system that is able to carry out human conversation, then there is no in-principle reason why digital computers could not be the equals of human beings in this respect as well. (Perhaps this response assumes that a successful machine participant in the imitation game will need to be equipped with sensors, etc. However, as we noted above, this assumption is not terribly controversial. A plausible conversationalist has to keep up to date with goings-on in the world.)

After discussing the nine objections mentioned above, Turing goes on to say that he has “no very convincing arguments of a positive nature to support my views. If I had I should not have taken such pains to point out the fallacies in contrary views.” (454) Perhaps Turing sells himself a little short in this self-assessment. First of all—as his brief discussion of solipsism makes clear—it is worth asking what grounds we have for attributing intelligence (thought, mind) to other people. If it is plausible to suppose that we base our attributions on behavioral tests or behavioral criteria, then his claim about the appropriate test to apply in the case of machines seems apt, and his conjecture that digital computing machines might pass the test seems like a reasonable—though controversial—empirical conjecture. Second, subsequent developments in the philosophy of mind—and, in particular, the fashioning of functionalist theories of the mind—have provided a more secure theoretical environment in which to place speculations about the possibility of thinking machines. If mental states are functional states—and if mental states are capable of realisation in vastly different kinds of materials—then there is some reason to think that it is an empirical question whether minds can be realised in digital computing machines. Of course, this kind of suggestion is open to challenge; we shall consider some important philosophical objections in the later parts of this review.

3. Some Minor Issues Arising

There are a number of much-debated issues that arise in connection with the interpretation of various parts of Turing (1950), and that we have hitherto neglected to discuss. What has been said in the first two sections of this document amounts to our interpretation of what Turing has to say (perhaps bolstered with what we take to be further relevant considerations in those cases where Turing's remarks can be fairly readily improved upon). But since some of this interpretation has been contested, it is probably worth noting where the major points of controversy have been.

3.1 Interpreting the Imitation Game

Turing (1950) introduces the imitation game by describing a game in which the participants are a man, a woman, and a human interrogator. The interrogator is in a room apart from the other two, and is set the task of determining which of the other two is a man and which is a woman. Both the man and the woman are set the task of trying to convince the interrogator that they are the woman. Turing recommends that the best strategy for the woman is to answer all questions truthfully; of course, the best strategy for the man will require some lying. The participants in this game also use a teletypewriter to communicate with one another—to avoid clues that might be offered by tone of voice, etc. Turing then says: “We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?” (434).

Now, of course, it is possible to interpret Turing as here intending to say what he seems literally to say, namely, that the new game is one in which the computer must pretend to be a woman, and the other participant in the game is a woman. (See, for example, Genova (1994), and Traiger (2000).) And it is also possible to interpret Turing as intending to say that the new game is one in which the computer must pretend to be a woman, and the other participant in the game is a man who must also pretend to be a woman. However, as Copeland (2000), Piccinini (2000), and Moor (2001) convincingly argue, the rest of Turing's article, and material in other articles that Turing wrote at around the same time, very strongly support the claim that Turing actually intended the standard interpretation that we gave above, viz. that the computer is to pretend to be a human being, and the other participant in the game is a human being of unspecified gender. Moreover, as Moor (2001) argues, there is no reason to think that one would get a better test if the computer must pretend to be a woman and the other participant in the game is a man pretending to be a woman (and, indeed, there is some reason to think that one would get a worse test). Perhaps it would make no difference to the effectiveness of the test if the computer must pretend to be a woman, and the other participant is a woman (any more than it would make a difference if the computer must pretend to be an accountant and the other participant is an accountant); however, this consideration is simply insufficient to outweigh the strong textual evidence that supports the standard interpretation of the imitation game that we gave at the beginning of our discussion of Turing (1950).

3.2 Turing's Predictions

As we noted earlier, Turing (1950) makes the claim that:

I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. … I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.

Most commentators contend that this claim has been shown to be mistaken: in the year 2000, no-one was able to program computers to make them play the imitation game so well that an average interrogator had no more than a 70% chance of making the correct identification after five minutes of questioning. Copeland (2000) argues that this contention is seriously mistaken: “about fifty years” is by no means “exactly fifty years,” and it remains open that we may soon be able to do the required programming. Against this, it should be noted that Turing (1950) goes on immediately to refer to how things will be “at the end of the century,” which suggests that not too much can be read into the qualifying “about.” However, as Copeland (2000) points out, there are other more cautious predictions that Turing makes elsewhere (e.g., that it would be “at least 100 years” before a machine was able to pass an unrestricted version of his test); and there are other predictions that are made in Turing (1950) that seem to have been vindicated. In particular, it is plausible to claim that, in the year 2000, educated opinion had altered to the extent that, in many quarters, one could speak of the possibility of machines' thinking—and of machines' learning—without expecting to be contradicted. As Moor (2001) points out, “machine intelligence” is not the oxymoron that it might have been taken to be when Turing first started thinking about these matters.

3.3 A Useful Distinction

There are two different theoretical claims that are run together in many discussions of The Turing Test that can profitably be separated. One claim holds that the general scheme that is described in Turing's Imitation Game provides a good test for the presence of intelligence. (If something can pass itself off as a person under sufficiently demanding test conditions, then we have very good reason to suppose that that thing is intelligent.) Another claim holds that an appropriately programmed computer could pass the kind of test that is described in the first claim. We might call the first claim “The Turing Test Claim” and the second claim “The Thinking Machine Claim”. Some objections to the claims made in Turing (1950) are objections to the Thinking Machine Claim, but not objections to the Turing Test Claim. (Consider, for example, the argument of Searle (1982), which we discuss further in Section 6.) However, other objections are objections to the Turing Test Claim. Until we get to Section 6, we shall be confining our attention to discussions of the Turing Test Claim.

3.4 A Further Note

In this article, we follow the standard philosophical convention according to which “a mind” means “at least one mind”. If “passing the Turing Test” implies intelligence, then “passing the Turing Test” implies the presence of at least one mind. We cannot here explore recent discussions of “swarm intelligence”, “collective intelligence”, and the like. However, it is surely clear that two people taking turns could “pass the Turing Test” in circumstances in which we should be very reluctant to say that there is a “collective mind” that has the minds of the two as components.

4. Assessment of the Current Standing of The Turing Test

Given the initial distinction that we made between different ways in which the expression The Turing Test gets interpreted in the literature, it is probably best to approach the question of the assessment of the current standing of The Turing Test by dividing cases. True enough, we think that there is a correct interpretation of exactly what test it is that is proposed by Turing (1950); but a complete discussion of the current standing of The Turing Test should pay at least some attention to the current standing of other tests that have been mistakenly supposed to be proposed by Turing (1950).

There are four main ideas to be investigated. First, there is the suggestion that The Turing Test provides logically necessary and sufficient conditions for the attribution of intelligence. Second, there is the suggestion that The Turing Test provides logically sufficient—but not logically necessary—conditions for the attribution of intelligence. Third, there is the suggestion that The Turing Test provides “criteria”—defeasible sufficient conditions—for the attribution of intelligence. Fourth—and perhaps not importantly distinct from the previous claim—there is the suggestion that The Turing Test provides (more or less strong) probabilistic support for the attribution of intelligence. We shall consider each of these suggestions in turn.

4.1 (Logically) Necessary and Sufficient Conditions

It is doubtful whether there are very many examples of people who have explicitly claimed that The Turing Test is meant to provide conditions that are both logically necessary and logically sufficient for the attribution of intelligence. (Perhaps Block (1981) is one such case.) However, some of the objections that have been proposed against The Turing Test only make sense under the assumption that The Turing Test does indeed provide logically necessary and logically sufficient conditions for the attribution of intelligence; and many more of the objections that have been proposed against The Turing Test only make sense under the assumption that The Turing Test provides necessary and sufficient conditions for the attribution of intelligence, where the modality in question is weaker than the strictly logical, e.g., nomic or causal.

Consider, for example, those people who have claimed that The Turing Test is chauvinistic; and, in particular, those people who have claimed that it is surely logically possible for there to be something that possesses considerable intelligence, and yet that is not able to pass The Turing Test. (Examples: Intelligent creatures might fail to pass The Turing Test because they do not share our way of life; intelligent creatures might fail to pass The Turing Test because they refuse to engage in games of pretence; intelligent creatures might fail to pass The Turing Test because the pragmatic conventions that govern the languages that they speak are so very different from the pragmatic conventions that govern human languages. Etc.) None of these considerations constitutes an objection to The Turing Test unless The Turing Test is taken to deliver necessary conditions for the attribution of intelligence.

French (1990) offers ingenious arguments that are intended to show that “the Turing Test provides a guarantee not of intelligence, but of culturally-oriented intelligence.” But, of course, anything that has culturally-oriented intelligence has intelligence; so French's objections cannot be taken to be directed towards the idea that The Turing Test provides sufficient conditions for the attribution of intelligence. Rather—as we shall see later—French supposes that The Turing Test establishes sufficient conditions that no machine will ever satisfy. That is, in French's view, what is wrong with The Turing Test is that it establishes utterly uninteresting sufficient conditions for the attribution of intelligence.

4.2 Logically Sufficient Conditions

There are many philosophers who have supposed that The Turing Test is intended to provide logically sufficient conditions for the attribution of intelligence. That is, there are many philosophers who have supposed that The Turing Test claims that it is logically impossible for something that lacks intelligence to pass The Turing Test. (Often, this supposition goes with an interpretation according to which passing The Turing Test requires rather a lot, e.g., producing behavior that is indistinguishable from human behavior over an entire lifetime.)

There are well-known arguments against the claim that passing The Turing Test—or any other purely behavioral test—provides logically sufficient conditions for the attribution of intelligence. The standard objection to this kind of analysis of intelligence (mind, thought) is that a being whose behavior was produced by “brute force” methods ought not to count as intelligent (as possessing a mind, as having thoughts).

Consider, for example, Ned Block's Blockhead. Blockhead is a creature that looks just like a human being, but that is controlled by a “game-of-life look-up tree,” i.e. by a tree that contains a programmed response for every discriminable input at each stage in the creature's life. If we agree that Blockhead is logically possible, and if we agree that Blockhead is not intelligent (does not have a mind, does not think), then Blockhead is a counterexample to the claim that the Turing Test provides a logically sufficient condition for the ascription of intelligence. After all, Blockhead could be programmed with a look-up tree that produces responses identical with the ones that you would give over the entire course of your life (given the same inputs).
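
To make the structure of the objection vivid, here is a toy look-up-tree responder (a deliberately tiny, hypothetical sketch in Python; Block's point is that a complete tree covering every possible conversation would be astronomically large, not that such a device would be clever):

    # Toy illustration of a "look-up tree" conversationalist. Every possible
    # sequence of interrogator inputs is paired with one canned reply.
    lookup_tree = {
        (): "Hello.",
        ("Are you a machine?",): "Of course not!",
        ("Are you a machine?", "What did you have for breakfast?"): "Toast and coffee.",
        # ...and so on, one entry for every discriminable conversation history
    }

    def blockhead_reply(history):
        """Return the pre-programmed response for the exact conversation so far."""
        return lookup_tree.get(tuple(history), "I'm sorry, could you repeat that?")

    print(blockhead_reply(["Are you a machine?"]))  # -> "Of course not!"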

There are perhaps only two ways in which someone who claims that The Turing Test offers logically sufficient conditions for the attribution of intelligence can respond to Block's argument. First, it could be denied that Blockhead is a logical possibility; second, it could be claimed that Blockhead would be intelligent (have a mind, think).

In order to deny that Blockhead is a logical possibility, it seems that what needs to be denied is the commonly accepted link between conceivability and logical possibility: it certainly seems that Blockhead is conceivable, and so, if (properly circumscribed) conceivability is sufficient for logical possibility, then it seems that we have good reason to accept that Blockhead is a logical possibility. Since it would take us too far away from our present concerns to explore this issue properly, we merely note that it remains a controversial question whether (properly circumscribed) conceivability is sufficient for logical possibility. (For further discussion of this issue, see Crooke (2002).)

The question of whether Blockhead is intelligent (has a mind, thinks) may seem straightforward, but—despite Block's confident assertion that Blockhead “has all of the intelligence of a toaster”—it is not obvious that we should deny that Blockhead is intelligent. Blockhead may not be a particularly efficient processor of information; but it is at least a processor of information, and that—in combination with the behavior that is produced as a result of the processing of information—might well be taken to be sufficient grounds for the attribution of some level of intelligence to Blockhead. For further critical discussion of the argument of Block (1981), see McDermott (2014).

4.3 Criteria

In his Philosophical Investigations, Wittgenstein famously writes: “An ‘inner process’ stands in need of outward criteria” (580). Exactly what Wittgenstein meant by this remark is unclear, but one way in which it might be interpreted is as follows: in order to be justified in ascribing a “mental state” to some entity, there must be some true claims about the observable behavior of that entity that, (perhaps) together with other true claims about that entity (not themselves couched in “mentalistic” vocabulary), entail that the entity has the mental state in question. If no true claims about the observable behavior of the entity can play any role in the justification of the ascription of the mental state in question to the entity, then there are no grounds for attributing that kind of mental state to the entity.

The claim that, in order to be justified in ascribing a mental state to an entity, there must be some true claims about the observable behavior of that entity that alone—i.e. without the addition of any other true claims about that entity—entail that the entity has the mental state in question, is a piece of philosophical behaviorism. It may be—for all that we are able to argue—that Wittgenstein was a philosophical behaviorist; it may be—for all that we are able to argue—that Turing was one, too. However, if we go by the letter of the account given in the previous paragraph, then all that need follow from the claim that the Turing Test is criterial for the ascription of intelligence (thought, mind) is that, when other true claims (not themselves couched in terms of mentalistic vocabulary) are conjoined with the claim that an entity has passed the Turing Test, it then follows that the entity in question has intelligence (thought, mind).

(Note that the parenthetical qualification that the additional true claims not be couched in terms of mentalistic vocabulary is only one way in which one might try to avoid the threat of trivialization. The difficulty is that the addition of the true claim that an entity has a mind will always produce a set of claims that entails that that entity has a mind, no matter what other claims belong to the set!)

To see how the claim that the Turing Test is merely criterial for the ascription of intelligence differs from the logical behaviorist claim that the Turing Test provides logically sufficient conditions for the ascription of intelligence, it suffices to consider the question of whether it is nomically possible for there to be a “hand simulation” of a Turing Test program. Many people have supposed that there is good reason to deny that Blockhead is a nomic (or physical) possibility. For example, in The Physics of Immortality, Frank Tipler provides the following argument in defence of the claim that it is physically impossible to “hand simulate” a Turing-Test-passing program:

If my earlier estimate that the human brain can code as much as 10^15 bits is correct, then since an average book codes about 10^6 bits … it would require more than 100 million books to code the human brain. It would take at least thirty five-story main university libraries to hold this many books. We know from experience that we can access any memory in our brain in about 100 seconds, so a hand simulation of a Turing Test-passing program would require a human being to be able to take off the shelf, glance through, and return to the shelf all of these 100 million books in 100 seconds. If each book weighs about a pound (0.5 kilograms), and on the average the book moves one yard (one meter) in the process of taking it off the shelf and returning it, then in 100 seconds the energy consumed in just moving the books is 3 x 10^19 joules; the rate of energy consumption is 3 x 10^11 megawatts. Since a human uses energy at a normal rate of 100 watts, the power required is the bodily power of 3 x 10^15 human beings, about a million times the current population of the entire earth. A typical large nuclear power plant has a power output of 1,000 megawatts, so a hand simulation of the human program requires a power output equal to that of 300 million large nuclear power plants. As I said, a man can no more hand-simulate a Turing Test-passing program than he can jump to the Moon. In fact, it is far more difficult. (40)
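
Tipler's order-of-magnitude figures can be re-derived from the assumptions stated in the passage. The sketch below (in Python) treats each book as having to be accelerated to the speed needed to cover one metre in its share of the 100 seconds; this kinematic model is our reconstruction for illustration, not Tipler's own working:

    # Rough re-derivation of Tipler's estimates from the figures quoted above.
    books = 1e8            # "100 million books"
    book_mass_kg = 0.5     # each book about a pound (0.5 kg)
    distance_m = 1.0       # each book moved about one metre
    total_time_s = 100.0   # all books handled in about 100 seconds

    time_per_book = total_time_s / books             # 1e-6 seconds per book
    speed = distance_m / time_per_book               # 1e6 metres per second
    energy_per_book = 0.5 * book_mass_kg * speed**2  # kinetic energy, ~2.5e11 J
    total_energy = energy_per_book * books           # ~2.5e19 J ("3 x 10^19 joules")
    power_watts = total_energy / total_time_s        # ~2.5e17 W ("3 x 10^11 megawatts")

    print(f"equivalent humans (at 100 W each): {power_watts / 100:.1e}")
    print(f"equivalent 1,000 MW power plants:  {power_watts / 1e9:.1e}")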

While there might be ways in which the details of Tipler's argument could be improved, the general point seems clearly right: the kind of combinatorial explosion that is required for a look-up tree for a human being is ruled out by the laws and boundary conditions that govern the operations of the physical world. But, if this is right, then, while it may be true that Blockhead is a logical possibility, it follows that Blockhead is not a nomic or physical possibility. And then it seems natural to hold that The Turing Test does indeed provide nomically sufficient conditions for the attribution of intelligence: given everything else that we already know—or, at any rate, take ourselves to know—about the universe in which we live, we would be fully justified in concluding that anything that succeeds in passing The Turing Test is, indeed, intelligent (possessed of a mind, and so forth).
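
For readers who want to check Tipler's arithmetic, the following rough sketch (in Python) reproduces his figures from the assumptions stated in the quoted passage. The variable names and the kinetic-energy model, on which each book is accelerated over one metre within its share of the 100 seconds, are our own illustrative choices rather than anything Tipler himself provides.

    # Rough check of Tipler's hand-simulation estimate, using his stated figures.
    brain_bits = 1e15            # Tipler's estimate of the brain's storage capacity
    bits_per_book = 1e6          # bits coded by an average book
    books_needed = brain_bits / bits_per_book   # 1e9, i.e. "more than 100 million books"
    books_moved = 1e8            # the round figure of 100 million books used in his estimate

    mass = 0.5                   # kilograms per book
    distance = 1.0               # metres each book is moved
    total_time = 100.0           # seconds available for the whole recall

    time_per_book = total_time / books_moved      # 1e-6 s per book
    speed = distance / time_per_book              # 1e6 m/s
    energy_per_book = 0.5 * mass * speed ** 2     # ~2.5e11 J of kinetic energy
    total_energy = energy_per_book * books_moved  # ~2.5e19 J (Tipler rounds to 3e19)
    power_watts = total_energy / total_time       # ~2.5e17 W, i.e. ~3e11 MW

    humans_equivalent = power_watts / 100         # humans at 100 W each: ~3e15
    nuclear_plants = power_watts / 1e9            # 1,000 MW plants: ~3e8, i.e. ~300 million
    print(total_energy, power_watts, humans_equivalent, nuclear_plants)

The numbers that emerge agree with Tipler's to within his own rounding, which is all that the argument requires.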

There are ways in which the argument in the previous paragraph might be resisted. At the very least, it is worth noting that there is a serious gap in the argument that we have just rehearsed. Even if we can rule out “hand simulation” of intelligence, it does not follow that we have ruled out all other kinds of mere simulation of intelligence. Perhaps—for all that has been argued so far—there are nomically possible ways of producing mere simulations of intelligence. But, if that's right, then passing The Turing Test need not be so much as criterial for the possession of intelligence: it need not be that given everything else that we already know—or, at any rate, take ourselves to know—about the universe in which we live, we would be fully justified in concluding that anything that succeeds in passing The Turing Test is, indeed, intelligent (possessed of a mind, and so forth).

(McDermott (2014) calculates that a look-up table for a participant who makes 50 conversational exchanges would have about 10^22278 nodes. It is tempting to take this calculation to establish that it is neither nomically nor physically possible for there to be a "hand simulation" of a Turing Test program, on the grounds that the required number of nodes could not be fitted into a space much, much larger than the entire observable universe.)
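
A minimal sketch of the combinatorics behind such a figure follows. The branching factor used here is a purely hypothetical stand-in, chosen only so that 50 exchanges yield roughly 10^22278 leaves; it is not McDermott's own derivation.

    # Illustrative size of a conversational look-up tree, worked in logarithms
    # because the numbers are far too large to represent directly.
    exchanges = 50
    log10_branching = 22278 / exchanges         # ~445.6 digits of choice per exchange (hypothetical)
    log10_leaves = exchanges * log10_branching  # recovers ~22278

    print(f"leaves ~ 10^{log10_leaves:.0f}")
    # For comparison, there are roughly 10^80 atoms in the observable universe,
    # so even one atom per node falls short by a factor of ~10^22198.
    print(f"shortfall at one atom per node ~ 10^{log10_leaves - 80:.0f}")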

4.4 Probabilistic Support

When we look at the initial formulation that Turing provides of his test, it is clear that he thought that the passing of the test would provide probabilistic support for the hypothesis of intelligence. There are at least two different points to make here. First, the prediction that Turing makes is itself probabilistic: Turing predicts that, in about fifty years from the time of his writing, it will be possible to programme digital computers to make them play the imitation game so well that an average interrogator will have no more than a seventy per cent chance of making the right identification after five minutes of questioning. Second, the probabilistic nature of Turing's prediction provides good reason to think that the test that Turing proposes is itself of a probabilistic nature: a given level of success in the imitation game produces—or, at any rate, should produce—a specifiable level of increase in confidence that the participant in question is intelligent (has thoughts, is possessed of a mind). Since Turing doesn't tell us how he supposes that levels of success in the imitation game correlate with increases in confidence that the participant in question is intelligent, there is a sense in which The Turing Test is greatly underspecified. Relevant variables clearly include: the length of the period of time over which the questioning in the game takes place (or, at any rate, the “amount” of questioning that takes place); the skills and expertise of the interrogator (this bears, for example, on the “depth” and “difficulty” of the questioning that takes place); the skills and expertise of the third player in the game; and the number of independent sessions of the game that are run (particularly when the other participants in the game differ from one run to the next). Clearly, a machine that is very successful in many different runs of the game that last for quite extended periods of time and that involve highly skilled participants in the other roles has a much stronger claim to intelligence than a machine that has been successful in a single, short run of the game with highly inexpert participants. That a machine has succeeded in one short run of the game against inexpert opponents might provide some reason for increase in confidence that the machine in question is intelligent: but it is clear that results on subsequent runs of the game could quickly overturn this initial increase in confidence. That a machine has done much better than chance over many long runs of the imitation game against a variety of skilled participants surely provides much stronger evidence that the machine is intelligent. (Given enough evidence of this kind, it seems that one could be quite confident indeed that the machine is intelligent, while still—of course—recognizing that one's judgment could be overturned by further evidence, such as a series of short runs in which it does much worse than chance against participants who use the same strategy over and over to expose the machine as a machine.)
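
By way of illustration only, the following toy Bayesian sketch shows how repeated success might be translated into increased confidence. The prior and the per-run likelihoods are invented numbers, since Turing supplies no such figures; the point is simply that the update rule makes confidence rise with success while remaining defeasible.

    # Toy Bayesian update of confidence that a machine is intelligent,
    # given repeated success against skilled judges. All numbers are illustrative.
    prior = 0.10                        # prior probability of intelligence
    p_pass_if_intelligent = 0.50        # chance of fooling a skilled judge in one run
    p_pass_if_unintelligent = 0.05      # chance of a serendipitous pass in one run

    posterior = prior
    for run in range(1, 21):            # twenty successful runs in a row
        numerator = posterior * p_pass_if_intelligent
        denominator = numerator + (1 - posterior) * p_pass_if_unintelligent
        posterior = numerator / denominator
        if run in (1, 5, 10, 20):
            print(run, round(posterior, 6))
    # Confidence climbs steeply, but a later string of failures would drive it
    # back down by exactly the same rule.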

The probabilistic nature of The Turing Test is often overlooked. True enough, Moor (1976, 2001)—along with various other commentators—has noted that The Turing Test is “inductive,” i.e. that “The Turing Test” provides no more than defeasible evidence of intelligence. However, it is one thing to say that success in “a rigorous Turing test” provides no more than defeasible evidence of intelligence; it is quite another to note the probabilistic features to which we have drawn attention in the preceding paragraph. Consider, for example, Moor's observation (Moor 2001:83) that “… inductive evidence gathered in a Turing test can be outweighed by new evidence. … If new evidence shows that a machine passed the Turing Test by remote control run by a human behind the scenes, then reassessment is called for.” This—and other similar passages—seems to us to suggest that Moor supposes that a “rigorous Turing test” is a one-off event in which the machine either succeeds or fails. But this interpretation of The Turing Test is vulnerable to the kind of objection lodged by Bringsjord (1994): even on a moderately long single run with relatively expert participants, it may not be all that unlikely that an unintelligent machine serendipitously succeeds in the imitation game. In our view, given enough sufficiently long runs with different sufficiently expert participants, the likelihood of serendipitous success can be made as small as one wishes. Thus, while Bringsjord's “argument from serendipity” has force against some versions of The Turing Test, it has no force against the most plausible interpretation of the test that Turing actually proposed.
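
The point about serendipity can also be put numerically. Assuming, purely for illustration, a fixed per-run probability that an unintelligent machine fools a single expert judge, the chance that it does so in every one of a series of independent runs shrinks geometrically:

    # How quickly purely serendipitous success vanishes over independent runs.
    # The per-run figure is a generous, hypothetical assumption.
    p_serendipity = 0.25
    for runs in (1, 5, 10, 20, 50):
        print(runs, p_serendipity ** runs)
    # By 20 independent runs the probability is below 10^-12, which is the sense
    # in which the likelihood can be made as small as one wishes.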

It is worth noting that it is quite easy to construct more sophisticated versions of “The Imitation Game” that yield more fine-grained statistical data. For example, rather than getting the judges to issue Yes/No verdicts about both of the participants in the game, one could get the judges to provide probabilistic answers. (“I give a 75% probability to the claim that A is the machine, and only 25% probability to the claim that B is the machine.”) This point is important when one comes to consider criticisms of the “methodology” implicit in “The Turing Test”. (For further discussion of the probabilistic nature of “The Turing Test”, see Shieber (2007).)
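
One hedged sketch of how such probabilistic verdicts might be aggregated is given below. The particular scoring rule (an average log score alongside the mean probability) and the data are our own illustrative choices, not anything proposed by Turing or Shieber.

    # Aggregating judges' probabilistic verdicts rather than Yes/No calls.
    import math

    # Each entry: the probability a judge assigned to the machine actually
    # being the machine in one paired conversation (invented data).
    verdicts = [0.75, 0.40, 0.90, 0.55, 0.30, 0.65]

    mean_probability = sum(verdicts) / len(verdicts)
    mean_log_score = sum(math.log(p) for p in verdicts) / len(verdicts)
    print(round(mean_probability, 3), round(mean_log_score, 3))
    # A machine that pushes the judges' mean probability towards 0.5 (and the
    # log score towards log 0.5) is doing as well as the game allows.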

5. Alternative Tests

Some of the literature about The Turing Test is concerned with questions about the framing of a test that can provide a suitable guide to future research in the area of Artificial Intelligence. The idea here is very simple. Suppose that we have the ambition to produce an artificially intelligent entity. What tests should we take as setting the goals that putatively intelligent artificial systems should achieve? Should we suppose that The Turing Test provides an appropriate goal for research in this field? In assessing these proposals, there are two different questions that need to be borne in mind. First, there is the question whether it is a useful goal for AI research to aim to make a machine that can pass the given test (administered over the specified length of time, at the specified degree of success). Second, there is the question of the appropriate conclusion to draw about the mental capacities of a machine that does manage to pass the test (administered over the specified length of time, at the specified degree of success).

Opinion on these questions is deeply divided. Some people suppose that The Turing Test does not provide a useful goal for research in AI because it is far too difficult to produce a system that can pass the test. Other people suppose that The Turing Test does not provide a useful goal for research in AI because it sets a very narrow target (and thus sets unnecessary restrictions on the kind of research that gets done). Some people think that The Turing Test provides an entirely appropriate goal for research in AI; while other people think that there is a sense in which The Turing Test is not really demanding enough, and who suppose that The Turing Test needs to be extended in various ways in order to provide an appropriate goal for AI. We shall consider some representatives of each of these positions in turn.

5.1 The Turing Test is Too Hard

Some people have claimed that The Turing Test doesn't set an appropriate goal for current research in AI because we are plainly so far away from attaining this goal. Amongst these people there are some who have gone on to offer reasons for thinking that it is doubtful that we shall ever be able to create a machine that can pass The Turing Test—or, at any rate, that it is doubtful that we shall be able to do this at any time in the foreseeable future. Perhaps the most interesting arguments of this kind are due to French (1990); at any rate, these are the arguments that we shall go on to consider. (Cullen (2009) sets out similar considerations.)

According to French, The Turing Test is “virtually useless” as a real test of intelligence, because nothing without a “human subcognitive substrate” could pass the test, and yet the development of an artificial “human cognitive substrate” is almost impossibly difficult. At the very least, there are straightforward sets of questions that reveal “low-level cognitive structure” and that—in French's view—are almost certain to be successful in separating human beings from machines.

First, if interrogators are allowed to draw on the results of research into, say, associative priming, then there is data that will very plausibly separate human beings from machines. For example, there is research that shows that, if humans are presented with series of strings of letters, they require less time to recognize that a string is a word (in a language that they speak) if it is preceded by a related word (in the language that they speak), rather than by an unrelated word (in the language that they speak) or a string of letters that is not a word (in the language that they speak). Provided that the interrogator has accurate data about average recognition times for subjects who speak the language in question, the interrogator can distinguish between the machine and the human simply by looking at recognition times for appropriate series of strings of letters. Or so says French. It isn't clear to us that this is right. After all, the design of The Turing Test makes it hard to see how the interrogator will get reliable information about response times to series of strings of symbols. The point of putting the computer in a separate room and requiring communication by teletype was precisely to rule out certain irrelevant ways of identifying the computer. If these requirements don't already rule out identification of the computer by the application of tests of associative priming, then the requirements can surely be altered to bring it about that this is the case. (Perhaps it is also worth noting that administration of the kind of test that French imagines is not ordinary conversation; nor is it something that one would expect that any but a few expert interrogators would happen upon. So, even if the circumstances of The Turing Test do not rule out the kind of procedure that French here envisages, it is not clear that The Turing Test will be impossibly hard for machines to pass.)

Second, at a slightly higher cognitive level, there are certain kinds of “ratings games” that French supposes will be very reliable discriminators between humans and machines. For instance, the “Neologism Ratings Game”—which asks participants to rank made-up words on their appropriateness as names for given kinds of entities—and the “Category Rating Game”—which asks participants to rate things of one category as things of another category—are both, according to French, likely to prove highly reliable in discriminating between humans and machines. For, in the first case, the ratings that humans make depend upon large numbers of culturally acquired associations (which it would be well-nigh impossible to identify and describe, and hence which it would (arguably) be well-nigh impossible to program into a computer). And, in the second case, the ratings that people actually make are highly dependent upon particular social and cultural settings (and upon the particular ways in which human life is experienced). To take French's examples, there would be widespread agreement amongst competent English speakers in the technologically developed Western world that “Flugblogs” is not an appropriate name for a breakfast cereal, while “Flugly” is an appropriate name for a child's teddy bear. And there would also be widespread agreement amongst competent speakers of English in the developed world that pens rate higher as weapons than grand pianos rate as wheelbarrows. Again, there are questions that can be raised about French's argument here. It is not clear to us that the data upon which the ratings games rely is as reliable as French would have us suppose. (At least one of us thinks that “Flugly” would be an entirely inappropriate name for a child's teddy bear, a response that is due to the similarity between the made-up word “Flugly” and the word “Fugly,” that had some currency in the primarily undergraduate University college that we both attended. At least one of us also thinks that young children would very likely be delighted to eat a cereal called “Flugblogs,” and that a good answer to the question about ratings pens and grand pianos is that it all depends upon the pens and grand pianos in question. What if the grand piano has wheels? What if the opponent has a sword or a sub-machine gun? It isn't obvious that a refusal to play this kind of ratings game would necessarily be a give-away that one is a machine.) Moreover, even if the data is reliable, it is not obvious that any but a select group of interrogators will hit upon this kind of strategy for trying to unmask the machine; nor is it obvious that it is impossibly hard to build a machine that is able to perform in the way in which typical humans do on these kinds of tests. In particular, if—as Turing assumes—it is possible to make learning machines that can be “trained up” to learn how to do various kinds of tasks, then it is quite unclear why these machines couldn't acquire just the same kinds of “subcognitive competencies” that human children acquire when they are “trained up” in the use of language.

There are other reasons that have been given for thinking that The Turing Test is too hard (and, for this reason, inappropriate in setting goals for current research into artificial intelligence). In general, the idea is that there may well be features of human cognition that are particularly hard to simulate, but that are not in any sense essential for intelligence (or thought, or possession of a mind). The problem here is not merely that The Turing Test really does test for human intelligence; rather, the problem here is the fact—if indeed it is a fact—that there are quite inessential features of human intelligence that are extraordinarily difficult to replicate in a machine. If this complaint is justified—if, indeed, there are features of human intelligence that are extraordinarily difficult to replicate in machines, and that could and would be reliably used to unmask machines in runs of The Turing Test—then there is reason to worry about the idea that The Turing Test sets an appropriate direction for research in artificial intelligence. However, as our discussion of French shows, there may be reason for caution in supposing that the kinds of considerations discussed in the present section show that we are already in a position to say that The Turing Test does indeed set inappropriate goals for research in artificial intelligence.

5.2 The Turing Test is Too Narrow

There are authors who have suggested that The Turing Test does not set a sufficiently broad goal for research in the area of artificial intelligence. Amongst these authors, there are many who suppose that The Turing Test is too easy. (We go on to consider some of these authors in the next sub-section.) But there are also some authors who have supposed that, even if the goal that is set by The Turing Test is very demanding indeed, it is nonetheless too restrictive.

Objection to the notion that the Turing Test provides a logically sufficient condition for intelligence can be adapted to the goal of showing that the Turing Test is too restrictive. Consider, for example, Gunderson (1964). Gunderson has two major complaints to make against The Turing Test. First, he thinks that success in Turing's Imitation Game might come for reasons other than the possession of intelligence. But, second, he thinks that success in the Imitation Game would be but one example of the kinds of things that intelligent beings can do and—hence—in itself could not be taken as a reliable indicator of intelligence. By way of analogy, Gunderson offers the case of a vacuum cleaner salesman who claims that his product is “all-purpose” when, in fact, all it does is to suck up dust. According to Gunderson, Turing is in the same position as the vacuum cleaner salesman if he is prepared to say that a machine is intelligent merely on the basis of its success in the Imitation Game. Just as “all purpose” entails the ability to do a range of things, so, too, “thinking” entails the possession of a range of abilities (beyond the mere ability to succeed in the Imitation Game).

There is an obvious reply to the argument that we have here attributed to Gunderson, viz. that a machine that is capable of success in the Imitation Game is capable of doing a large range of different kinds of things. In order to carry out a conversation, one needs to have many different kinds of cognitive skills, each of which is capable of application in other areas. Apart from the obvious general cognitive competencies—memory, perception, etc.—there are many particular competencies—rudimentary arithmetic abilities, understanding of the rules of games, rudimentary understanding of national politics, etc.—which are tested in the course of repeated runs of the Imitation Game. It is inconceivable that there be a machine that is startlingly good at playing the Imitation Game, and yet unable to do well at any other tasks that might be assigned to it; and it is equally inconceivable that there is a machine that is startlingly good at the Imitation Game and yet that does not have a wide range of competencies that can be displayed in a range of quite disparate areas. To the extent that Gunderson considers this line of reply, all that he says is that there is no reason to think that a machine that can succeed in the Imitation Game must have more than a narrow range of abilities; we see no reason to take this bare assertion seriously.

More recently, Erion (2001) has defended a position that has some affinity to that of Gunderson. According to Erion, machines might be “capable of outperforming human beings in limited tasks in specific environments, [and yet] still be unable to act skillfully in the diverse range of situations that a person with common sense can” (36). On one way of understanding the claim that Erion makes, he too believes that The Turing Test only identifies one amongst a range of independent competencies that are possessed by intelligent human beings, and it is for this reason that he proposes a more comprehensive “Cartesian Test” that “involves a more careful examination of a creature's language, [and] also tests the creature's ability to solve problems in a wide variety of everyday circumstances” (37). In our view, at least when The Turing Test is properly understood, it is clear that anything that passes The Turing Test must have the ability to solve problems in a wide variety of everyday circumstances (because the interrogators will use their questions to probe these—and other—kinds of abilities in those who play the Imitation Game).

5.3 The Turing Test is Too Easy

There are authors who have suggested that The Turing Test should be replaced with a more demanding test of one kind or another. It is not at all clear that any of these tests actually proposes a better goal for research in AI than is set by The Turing Test. However, in this section, we shall not attempt to defend that claim; rather, we shall simply describe some of the further tests that have been proposed, and make occasional comments upon them. (One preliminary point upon which we wish to insist is that Turing's Imitation Game was devised against the background of the limitations imposed by then current technology. It is, of course, not essential to the game that tele-text devices be used to prevent direct access to information about the sex or genus of participants in the game. We shall not advert to these relatively mundane kinds of considerations in what follows.)

5.3.1 The Total Turing Test

Harnad (1989, 1991) claims that a better test than The Turing Test will be one that requires responses to all of our inputs, and not merely to text-formatted linguistic inputs. That is, according to Harnad, the appropriate goal for research in AI has to be to construct a robot with something like human sensorimotor capabilities. Harnad also considers the suggestion that it might be an appropriate goal for AI to aim for “neuromolecular indistinguishability,” but rejects this suggestion on the grounds that once we know how to make a robot that can pass his Total Turing Test, there will be no problems about mind-modeling that remain unsolved. It is an interesting question whether the test that Harnad proposes sets a more appropriate goal for AI research. In particular, it seems worth noting that it is not clear that there could be a system that was able to pass The Turing Test and yet that was not able to pass The Total Turing Test. Since Harnad himself seems to think that it is quite likely that “full robotic capacities [are] … necessary to generate … successful linguistic performance,” it is unclear why there is reason to replace The Turing Test with his extended test. (This point against Harnad can be found in Hauser (1993:227), and elsewhere.)

5.3.2 The Lovelace Test

Bringsjord et al. (2001) propose that a more satisfactory aim for AI is provided by a certain kind of meta-test that they call the Lovelace Test. They say that an artificial agent A, designed by human H, passes the Lovelace Test just in case three conditions are jointly satisfied: (1) the artificial agent A produces output O; (2) A's outputting O is not the result of a fluke hardware error, but rather the result of processes that A can repeat; and (3) H—or someone who knows what H knows and who has H's resources—cannot explain how A produced O by appeal to A's architecture, knowledge-base and core functions. Against this proposal, it seems worth noting that there are questions to be raised about the interpretation of the third condition. If a computer program is long and complex, then no human agent can explain in complete detail how the output was produced. (Why did the computer output 3.16 rather than 3.17?) But if we are allowed to give a highly schematic explanation—the computer took the input, did some internal processing and then produced an answer—then it seems that it will turn out to be very hard to support the claim that human agents ever do anything genuinely creative. (After all, we too take external input, perform internal processing, and produce outputs.) What is missing from the account that we are considering is any suggestion about the appropriate level of explanation that is to be provided. It is quite unclear why we should suppose that there is a relevant difference between people and machines at any level of explanation; but, if that's right, then the test in question is trivial. (One might also worry that the proposed test rules out by fiat the possibility that creativity can be best achieved by using genuine randomising devices.)

5.3.3 The Truly Total Turing Test

Schweizer (1998) claims that a better test than The Turing Test will advert to the evolutionary history of the subjects of the test. When we attribute intelligence to human beings, we rely on an extensive historical record of the intellectual achievements of human beings. On the basis of this historical record, we are able to claim that human beings are intelligent; and we can rely upon this claim when we attribute intelligence to individual human beings on the basis of their behavior. According to Schweizer, if we are to attribute intelligence to machines, we need to be able to advert to a comparable historical record of cognitive achievements. So, it will only be when machines have developed languages, written scientific treatises, composed symphonies, invented games, and the like, that we shall be in a position to attribute intelligence to individual machines on the basis of their behavior. Of course, we can still use The Turing Test to determine whether an individual machine is intelligent: but our answer to the question won't depend merely upon whether or not the machine is successful in The Turing Test; there is the further “evolutionary” condition that also must be satisfied. Against Schweizer, it seems worth noting that it is not at all clear that our reason for granting intelligence to other humans on the basis of their behavior is that we have prior knowledge of the collective cognitive achievements of human beings.

5.4 Should the Turing Test be Considered Harmful?

Passing the Turing Test Does Not Mean the End of Humanity

Abstract

In this paper we look at the phenomenon that is the Turing test. We consider how Turing originally introduced his imitation game and discuss what this means in a practical scenario. Due to its popular appeal we also look into different representations of the test as indicated by numerous reviewers. The main emphasis here, however, is to consider what it actually means for a machine to pass the Turing test and what importance this has, if any. In particular, does it mean that, as Turing put it, a machine can “think”? Specifically, we consider claims that passing the Turing test means that machines will have achieved human-like intelligence and that, as a consequence, the singularity will be upon us in the blink of an eye.

Keywords: Deception detection, Natural language, Turing’s imitation game, Chatbots, Machine misidentification

Introduction

There are those who believe that passing the Turing test means that human-level intelligence will have been achieved by machines [10]. The direct consequence of this, as pointed out by Kurzweil [11] and others, is that the singularity will be upon us, thereby resulting in the demise of the human race. In this paper we do not wish to dispute the latter of these arguments, dramatic though it is. What we do wish to dispel, however, is the assumption which links passing the Turing test with the achievement for machines of human-like or human-level intelligence.

Unfortunately the assumed chain of events which means that passing the Turing test sounds the death knell for humanity appears to have become engrained in the thinking in certain quarters. One interesting corollary of this is that when it was announced in 2014 that the Turing test had been finally passed [39] there was an understandable response from those same quarters that it was not possible for such an event to have occurred, presumably because we were still here in sterling health to both make and debate the pronouncement. Interestingly the main academic argument which was thrown up was that the machine which passed the test did not exhibit human-like intelligence, and therefore, the test could not have been passed. Consider this, for example, from Murray Shanahan of Imperial College London: “Of course the Turing Test hasn’t been passed…We are still a very long way from achieving human-level AI” [10].

It is therefore, we feel, of vital importance that we look at various aspects of this question, because if Murray Shanahan and Ray Kurzweil and their colleagues are correct, then the developers of the computer programmes which compete in the Turing test are, if they are successful, about to put an end to the human race. So shouldn't we do something about such developers, maybe lock them up, well away from any laptop, in case they design a programme of destruction? On the other hand, dare we suggest that either Shanahan or Kurzweil is incorrect?

The singularity [11] is an event dependent on the overall improvement and power of Artificial Intelligence where intelligent machines can design successive generations of increasingly more powerful machines, eventually creating intelligence that firstly is equivalent to that of humans and then surpasses it. Indeed the capabilities of such an Artificial Intelligence may well be impossible for a human to comprehend. The singularity is the point beyond which events are beyond the control of humans, resulting either in humans upgrading (with implants) to become Cyborgs or with intelligent machines taking control. Either way, it’s not good news for ordinary humans.

Taking a sensible look at this issue, and someone needs to, we wish to analyse why, with the “standard Turing test” (Fig. 1b), defined as the 5-min, unrestricted question-answer, simultaneous comparison version [18, 25], having been passed (that is, more than 30 % of the human interrogators failed to correctly identify the machine), we are all still here and this paper can be read by (presumably) humans. The flaw in the Shanahan/Kurzweil argument, at this time, we contend, is that Shanahan is just plain wrong. Passing the Turing test has no relationship with human-like intelligence (or AI) other than in the sense of a machine possibly being reasonably effective in its own version of human conversation for a sustained short period, over which time it has proved to be successful in fooling a collection of humans. Kurzweil’s singularity argument may or may not also be wrong, but that’s not what we wish to discuss here. The point is that as long as one of the Shanahan/Kurzweil pair is wrong then the human race is still looking good (apart from its multitude of other problems, that is).

Fig. 1  Turing’s two tests for his imitation game: (a) left, one-to-one; (b) right, one judge, two hidden interlocutors

What we wish to do in this paper is to take a look at what the Turing test actually is, as stipulated/set out by Alan Turing, rather than to consider some related test which some might wish to call the Turing test or what someone might want the test to be, because they’ve thought of a different/better test. We acknowledge here that different/better tests of computer ability, even in terms of only conversation, exist but again they are not the subject of this paper. So we stick as closely as possible to what the test is, based entirely on Turing’s own words. We acknowledge, however, that there are different interpretations of the test, whether each test should last for 5 or 10 min for example or even if Turing intended the test as some sort of mind modelling exercise. However, none of these, we argue, result in the end of humanity. Indeed Turing himself said that humans would be needed to maintain the machines [28].

We then present some example discourses, taken from a series of tests held at the Royal Society in 2014. One of these involves the machine Eugene Goostman, which actually passed the test at that event. Following this we look at some ways in which machines can pass the test, as it has been defined in terms of the standard definition [26]. Finally we draw some conclusions, one of which, and some might argue perhaps the most important, is that humanity is not about to expire.

To be clear though we are aware that different theories regarding the Turing test and its meaning exist and that other theories have been put forward along the lines that machines will not take over from humans. In this paper we are explicitly only concerned with the pairing of statements that says (a) passing the Turing test means that human-level intelligence will have been achieved in AI and (b) when AI exhibits human-level intelligence that will mean the end of humanity as we know it. We are only too aware, for example, that in describing his test, Turing discussed men and women as hidden entities and the possibility of gender blur. Whilst this is extremely interesting, it is not what we wish to look at in this paper. We focus here entirely on one specific issue which is that if both Shanahan and Kurzweil are correct then a machine passing the Turing test means that humanity is doomed!

The Turing Test

In his 1950 paper entitled “Computing Machinery and Intelligence” [30], Alan Turing started by considering the question, “Can machines think?” However, rather than get bogged down with definitions of both of the words “machine” and “think” he replaced the question with one based on a much more practical scenario, namely his imitation game. The game has since become more widely known, particularly in the popular domain, as the Turing test. He did not, however, at any point, refer to his test/game as being any indication of intelligence, human-like or otherwise.

Turing [31] described the game as follows: “The idea of the test is that a machine has to try and pretend to be a man, by answering questions put to it, and it will only pass if the pretence is reasonably convincing. A considerable portion of a jury, who should not be expert about machines, must be taken in by the pretence” [4]. So Turing spoke here of a jury (nominally 12) as opposed to the “average interrogators” he mentioned in his 1950 paper [30], as we will see shortly. Importantly he also spoke of a machine “passing” the test and that the interrogators should not be experts. Interestingly, however, we do include a transcript later in which a machine did fool an expert into thinking that it was human.

Turing’s imitation game is described as an experiment that can be practicalised in two different ways (see Fig. 1) [17]:

  1. one interrogator–one hidden interlocutor (Fig. 1a),

  2. one interrogator–two hidden interlocutors (Fig. 1b).

In both cases the machine must provide “satisfactory” and “sustained” answers to any questions put to it by the human interrogator [30, p. 447].

Of the types of test looked at here, the 3-participant tests have previously been shown to be stricter tests, i.e. more difficult for machines, than 2-participant tests in which an interrogator converses with only one hidden entity, either a human or machine, at a time [22]. For the main arguments set out in this paper, the results apply to either type of test.

Turing did not explicitly state specific rules for his test in a paragraph headed “Rules for my test” or some such, and hence did not spell out precisely what is required of a machine in order to pass. What he did clearly state in his 1950 paper, and which we contend amounts to the same thing, was as follows: “I believe that in about 50 years’ time it will be possible to programme computers to make them play the imitation game so well that an average interrogator will not have more than 70 % chance of making the right identification after 5 min of questioning” [30]. Having clearly spelt out the imitation game, this would appear to be direction enough from Turing.

Although this appeared to have been written more in the sense of a prediction, it is the only place where Turing directly stated parameters for his game/test, with a clear hurdle to be met in terms of performance. To put this more simply, for a machine to pass the Turing test, across all of the tests in which it takes part, the interrogators must make the wrong identification (i.e. not the right identification) 30 % or more of the time after, in each case, 5-min-long conversations. We can take it directly that the wrong identification is anything other than the right identification. Also, because Turing spoke of a jury, we can understand that at least twelve judges/interrogators must be able to test a machine in their own way/style, but also that hundreds of judges are not a requirement: a jury is appropriate and will suffice.
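
The criterion just described can be made concrete with a small sketch. The encoding below is ours (the names turing_pass and verdicts, and the explicit jury size of twelve, are our own illustrative choices), and it records only what each judge said about the machine:

    # Our illustrative encoding of the pass criterion described above: the machine
    # passes if at least 30 % of a jury of at least twelve judges fail to make the
    # right identification of it.
    def turing_pass(verdicts, threshold=0.30):
        """verdicts: one entry per judge, each 'machine', 'human' or 'unsure',
        recording what the judge concluded about the machine."""
        wrong = sum(1 for v in verdicts if v != 'machine')
        return len(verdicts) >= 12 and wrong / len(verdicts) >= threshold

    # Example: a jury of twelve, of whom four misidentify the machine or are unsure.
    jury = ['machine'] * 8 + ['human'] * 3 + ['unsure']
    print(turing_pass(jury))   # True: 4/12, i.e. 33 %, wrong identifications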

We will shortly look at what is meant by the “right identification”, as this is critical. However, we can take it immediately that Turing set the challenge as a 5-min exercise, no more and no less. At no other point in Turing’s papers did he mention any other time duration for his tests. In general, experience shows that the longer tests last, the more difficult it is for a machine to satisfactorily pretend to be a human. Indeed, given the technology we have at present, 5 min would appear to be an appropriate challenge. In a 20-min test, at this time in computer natural language development, it is extremely difficult for a machine to fool a human interrogator over that period into thinking that it’s a human.

It is widely recognised that getting machines to achieve, or at least appear to achieve, human-like responses is a difficult task [5, 32]. Even in terms of the Turing test, based purely on conversation, taking into account such issues as what knowledge is brought to the table and assumed [34] or whether one of the entities is lying [35] can completely change an appearance. There are also numerous strategies that can be employed by machines in order to successfully fool an interrogator [36].

One fuzzy issue, however, is whether Turing meant 5 min in total for a parallel paired 3-participant conversation or rather an average of 5 min each, hence a total of 10 min, for the two hidden entities involved [23]. Michie [14] interpreted the test as approximately 2½ min of interrogation per entity in a pair. However, in practice the conversation is rarely balanced exactly. For all of the practical tests which we have organised, a time limit of 5 min, as stated by Turing himself, has been used, because the current state of conversational technology is not ready for longer-duration tests. That said, we acknowledge the potential validity of the alternative, which we will call here the Sloman view.

Whether it is Michie, Sloman or ourselves who reads this one correctly is a relatively insignificant point in the big argument. Otherwise we would be in the laughable position of saying that a machine can fool you into thinking it is human over a 5-min conversation but cannot do so over 10 min, and that therefore we’re all saved and humanity can go on. Scientifically this would mean there must be a conversation time, somewhere between 5 and 10 min, such that once it is achieved by a machine, we’re all doomed.

It is also interesting that in the 2-participant test an interrogator spends all 5 min conversing with one machine only, whereas in the 3-participant test the average time spent with each hidden entity is clearly 2.5 min. Despite this, the 3-participant test, the one Turing spoke of in 1950 [30], is the one in which it is more difficult for machines to achieve good results, most likely because of the direct, parallel comparison that occurs in such cases.

It is worth remembering though that in either type of test an interrogator, in an actual “official” Turing test, when communicating with a machine, does not know at that time that it is in fact a machine; indeed, it is a decision about its nature that they have to come to. This is a critical point and is one of the main features of the test. Such a situation is, as you might guess, far different to the case when an interrogator knows for certain that they are communicating with a machine, as in the case of an online bot [1]. Despite this vital point, for some reason there are a number of people who completely ignore this critical aspect of the test, go online to converse with a bot, which they already know to be a bot, and declare in conclusion that it is obviously a bot [16]. Clearly some education is required as to what the Turing test actually involves.

However, this is somewhat akin to the Oxford University Philosophy Professor and his students who took part in 9 actual Turing tests in 2008 and then went to academic print in claiming it was easy to spot which were the machines and which were the humans in all the tests in which they were involved; indeed they published this in a peer-reviewed journal [6]. In the same peer-reviewed journal it was, however, subsequently explained that the philosopher and his team had only correctly identified the hidden entities in 5 of the 9 tests. In the other 4 cases they had, without realising it, misclassified humans as machines and machines as being human [21].

In the following sections we consider a number of transcripts obtained from practical Turing tests. We refer here to 5-min-long tests only and show actual transcripts from such tests. Although this is the run time stated by Turing himself [30], as indicated in the next section, it is in fact not a critical issue with regard to the main argument raised in this paper. As you will see, in the tests carried out there was a hard cut-off at the end of each discourse and no partial sentences were transmitted. Once a sentence had been transmitted it could not be altered or retracted in any way. The transcripts appear exactly as they occurred, and any spelling mistakes and other grammatical errors are not due to poor editorial practice.

In all the two hidden entity (3-participant) tests (see Fig. 1b) judges were clearly told beforehand that in each parallel conversation one of the hidden entities was human and the other was a machine. They were, however, given no indication as to whether the LHS (left-hand side of the computer screen) or RHS would be human or machine. On the judges’ score sheets each judge could mark both the LHS and RHS entities as being Human, Machine or they could say if they were Unsure [22, 37].

Right Identification

The Turing test involves a machine which pretends to be a human in terms of conversational abilities. The “right identification” stated by Turing can mean either that a judge merely correctly identifies the machine or that they correctly identify, at the end of a paired conversation, which was the machine and which was the human [27]. However, we are not so interested here in cases in which a judge mistakes a human for a machine. This phenomenon, known as the confederate effect [19], has been discussed elsewhere [20, 38, 41]. It needs to be recognised, however, that such a decision might affect the judge’s decision regarding the machine being investigated in parallel.

The concept of what is and what is not a “right identification” is important as far as a machine taking part in the Turing test, and the 30 % pass mark, is concerned, and we take a relatively strict approach in this sense. One viewpoint is that for a judge to make the “right identification” they must correctly identify both the machine as being a machine and the hidden human as being a human [27]. This means that any other decision on the part of a judge would not be a “right identification”; this therefore includes cases in which either the machine is selected as a human or a human is selected as a machine. Also included are cases in which the judge is Unsure about either or both entities as the judge in such cases has failed to identify the machine as a machine and/or the human as a human—the right identification. Our stricter interpretation here, however, only considers the cases in which the machine was itself not correctly identified, the judge stating either that the machine was a human or that they were Unsure about it.
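
The two readings just distinguished can be summarised in a short sketch. The function names and verdict labels are our own, introduced only to make the classification explicit:

    # Two readings of the "right identification" in a paired (3-participant) test.
    # Verdicts are 'human', 'machine' or 'unsure'.
    def right_identification_both(verdict_on_machine, verdict_on_human):
        # Reading 1: the judge must get both hidden entities right.
        return verdict_on_machine == 'machine' and verdict_on_human == 'human'

    def right_identification_machine_only(verdict_on_machine, verdict_on_human):
        # Reading 2 (the stricter approach taken here when scoring the machine):
        # only the verdict on the machine itself matters.
        return verdict_on_machine == 'machine'

    # A judge who calls the machine a machine but is unsure about the human:
    print(right_identification_both('machine', 'unsure'))          # False
    print(right_identification_machine_only('machine', 'unsure'))  # True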

It is also possible to encounter cases in which a machine was correctly identified as being a machine but where the parallel hidden human in each case was incorrectly selected as being a machine and/or the judge gave an Unsure mark against the human, as either of these would not be a right identification. Such cases are, though, troublesome as far as the whole basis of the test is concerned, in that a machine, if so identified, would be gaining a point not so much because of its own competence but rather because of the nature of its hidden human competitor. However, it does make things slightly easier for the machine.

Taking the Test

We include in this section several transcripts which arose from the Turing tests which took place at the Royal Society in June 2014. The information given to the judges on the day, which also holds for the transcripts as observed here, is that one of the hidden entities is a human and the other is a machine. However, it was up to the judge involved in each case, and now it is up to the reader, to decide which was which based on the transcripts shown.

The first of these transcript pairs we simply give here for you to decide which was which. In order for you to check on the answer we provide the solution, along with the judge’s decision on the day, at the end of the paper.

In this conversation it can be observed that the judge appeared to discourse much more with the left-hand entity than with that on the right-hand side. This is something that occurs fairly often. Both conversations, though, lasted for the same 5 min, although clearly the LHS conversation was more comprehensive.

Now consider the next parallel conversation. In this case we give both the solution and the judge’s decision at the time directly at the end of the transcripts.

In this conversation the LHS entity was a senior male human, whereas the RHS was the machine Eugene Goostman. This was in fact one of the ten transcripts responsible for Eugene passing the Turing test in June 2014. The judge decided that the LHS was indeed human, although they were unable to give any further details. However, they also decided that the RHS (Eugene) was a human.

The conversation with the human entity was, it can be observed, relatively boring, merely being a case of question and answer with limited responses. Eugene did very well here though as the judge was an expert on machines/robotics and was well aware of machine conversations. The fact that Eugene convinced such a person is quite an accomplishment. Eugene tried to power the conversation by asking the judge questions. At first the judge was not having any of it and simply ignored Eugene’s question, even though this was rather rude. Eugene persevered, however, and eventually the judge gave in and responded. Generally there was more content in Eugene’s conversation than that with the human hidden entity.

One aspect of the test is its reliance on the interrogators to do a good job. It is the interrogator who conducts and drives each conversation and who also makes the decision on which entity was the machine and which was the human. Some researchers have suggested this as a weak point of the test [9] although we believe that it is an important part of the test as Turing set it out. However, here the quite direct question and answer attempt appears to have failed to out the machine.

The transcript that follows takes on a slightly different style:

In this case the LHS was in fact the machine Eugene, whereas the RHS was a male human. The judge in this case decided that the LHS was a male human, non-native English speaking and a teenager. They were also definite that the RHS was a machine.

It is one of those transcripts where quite simply there was a lot more going on in the Eugene conversation; it had the tendency to draw the interrogator to it, whereas there was not much at all in the human case. In fact the hidden human may well not have done themselves any favours by claiming no knowledge about the Turing test early on. It is possible for incorrect decisions to be made by interrogators based on an assumption that everyone must know a particular piece of information [34]. In this case, though, as the event was a Turing test session, the judge would appear to have had some quite strong grounds for that assumption. It probably goes to show that you cannot rely on the knowledge base of humans.

In the next transcript we again give both the solution and the judge’s decision at the time directly at the end of the transcripts.

Here on the LHS it was a hidden human entity, whereas on the RHS it was the machine JFred. The judge concluded, however, that on the LHS it was a machine and felt that the entity exhibited very poor human-like conversation. On the other hand the judge was confident that the RHS (the machine JFred) was a male human who was most likely an American.

The judge’s decision in terms of the LHS entity was not particularly surprising. The hidden human entity was asked on more than one occasion what their name was, to which they replied “I don’t know”. As a result the judge spent much more time conversing with the machine on the RHS. This points to a particular aspect of the test: it involves a direct comparison between a machine and a human, rather than merely a machine conversing on its own. Here we can see that the hidden human involved was quite simply relatively poor at conversation, and this helped the cause of the machine.

Alternative Views

There are many different interpretations of Turing’s imitation game, and much controversy has arisen as to which of these, if any, was Turing’s own intended version [15]. The vast majority appear to view the game in the form of what is commonly known as the “Standard Turing Test” [26], and this is the interpretation taken here. It is a literal interpretation based essentially on what Turing actually said in his presentations and his 1950 paper and without recourse to tangential connections and/or pure conjecture on what a paper’s author believes that Turing really meant to say.

We acknowledge as examples of this, that some see it as being something to do with artistic and emotional intelligence [24], whereas others deem it to be concerned with modelling the human mind by generating its verbal performance capacity [8]. Others meanwhile regard it in terms of considering the gender aspect, the sex of the human foil being important in the test [7, 9, 12, 26]. None of these views, however, do we see as indicating the test to be detrimental to the human race.

However, we then have the Shanahan view, quoted by his own University news as: “Turing also didn’t say a 5-min test would mean success achieving human-level AI; for that, he would require much longer conversations” [10]. The point being here not whether the test is a 5-min one or a 20-min one but rather that in the mind of Shanahan there is some time for which a machine could successfully converse that would indicate that its intelligence has reached human-level.

Unfortunately Shanahan is not a lone voice. Consider if you will: “Hunch CEO Chris Dixon tweeted, ‘The point of the Turing Test is that you pass it when you’ve built machines that can fully simulate human thinking.’ No, that is precisely not how you pass the Turing test. You pass the Turing test by convincing judges that a computer program is human” [2]. Interestingly it is the emulation of human intelligence, in a machine, that Kurzweil picks up on as being the tipping point [11].

Then there are those who (somehow) read all sorts of concepts into the Turing test, telling us what Turing actually had in mind with his test even if he didn’t tell us himself: “Alan Turing himself envisioned—a flexible, general-purpose intelligence of the sort that human beings have, which allows any ordinary individual to master a vast range of tasks, from tying his shoes to holding conversations and mastering tenth-grade biology” [13].

From these voices it is clear that there is a school of opinion that associates a Turing test pass with human-level intelligence. We accept, in Shanahan’s case, that there is a question about the actual duration of the conversation involved. However, we would argue that to be of little importance in comparison with the big picture issues that are at stake here.

Silence

In this section we explain briefly how it is quite possible for a machine to pass the Turing test not by its apparent skill at human conversation but rather by simply remaining silent throughout [40]. Rather than being a mere theoretical or philosophical quirk it turns out that in fact passing the Turing test in this way also has an underlying practical basis to support it with numerous examples to boot.

Turing said that in the test a machine had to try and pretend to be a man (although now/here we take that to mean human). In his 1950 paper he also pointed to the fact that at the end of 5 min the judge had to make a decision as to the nature of the entity. If they made the right identification and correctly identified the machine then this would effectively be a point against the machine, whereas if the judge either thought that the machine was a human or if they were Unsure as to its nature then this would be a wrong identification and would be a point for the machine. The pass mark for a machine in the test was set by Turing to be 3 or more points out of every 10 [30].

But here we face a critical issue: what if a machine were to remain silent? The basic nature of the test is that a machine, by conversing, fails the test when it gives itself away as clearly being a machine. So if it remains silent it cannot give itself away.

If a machine remains completely silent during a 5-min conversation, a judge receives no response to any of their questions or discussion from the hidden entity and therefore, in theory at least, cannot make the right identification and definitely say that they have been conversing with a machine. It would not be expected that a judge, under such circumstances, would categorise the silent entity as being a human, although that is a possibility; the most likely case, as we have seen in the practical examples, is for the judge to give an “Unsure” response. This of course is not a right identification and is therefore a point for the machine.

It is thus quite possible for a machine to simply remain silent to any utterances of a judge and to pass the Turing test if at least 3 out of 10 judges as a result either rate the machine as being a human or indicate that they are unsure. The only thing acting against such a strategy is the fact that the machine is, in each conversation, competing against a human and if the judge is certain that the other (hidden) entity is a human then they can deduce that therefore the silent entity must be a machine. Conversely in practice many humans are actually categorised as machines in such tests [38]. Therefore, it is also potentially possible that a (silent) machine can be categorised as being human mainly because their hidden human competitor is categorised by the judge as being a machine.
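
Putting the silence strategy together with the pass criterion sketched earlier, and again using our own hypothetical encoding, a completely silent machine that draws an Unsure verdict from at least 30 % of a jury of twelve passes under the criterion as stated:

    # Silence strategy, scored with the same illustrative criterion as before.
    def turing_pass(verdicts, threshold=0.30):
        wrong = sum(1 for v in verdicts if v != 'machine')
        return len(verdicts) >= 12 and wrong / len(verdicts) >= threshold

    # A completely silent machine: eight judges call it a machine, four are unsure.
    jury = ['machine'] * 8 + ['unsure'] * 4
    print(turing_pass(jury))   # True: the four Unsure verdicts are not right identifications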

We now give an example of a transcript in which a machine simply did not respond. This particular “conversation” occurred during the Turing tests held at the Royal Society in June 2014 between a judge and the machine Cleverbot. At the end of the conversation the judge was not able to identify the hidden entity as being a machine, i.e. they did not make the right identification, deciding that they were “unsure”. It is straightforward to see that there quite simply was not enough information for the judge to go on.

Example transcript

  • [10:58:08] Judge: good day

  • [10:58:08] Entity:

  • [10:58:46] Judge: is no response an answer

  • [10:58:46] Entity:

  • [10:59:35] Judge: am i not speaking you’re language

  • [10:59:35] Entity:

  • [11:00:25] Judge: silence is golden

  • [11:00:25] Entity:

  • [11:01:32] Judge: shhh

  • [11:01:32] Entity:

  • [11:03:07] Judge: you make great conversation

  • [11:03:07] Entity:

As far as we are aware, the silence on the part of the machine in this transcript was caused by a technical fault rather than by any decision (conscious or otherwise) on the part of the machine. That said, it is perhaps a quirk of the Turing test, as described by Turing, that it is, in theory at least, quite possible for a machine to pass the test by remaining silent throughout. Essentially the machine makes no utterances that give the game away that it is a machine, and hence the judges involved have no evidence to use against it. The strategy of silence is discussed at length in Warwick and Shah [40].

The example given here is just that, an example; numerous other cases are reported in Warwick and Shah [40]. An interesting feature is the response of the interrogators involved in those transcripts. In each case the interrogator was a different person, yet their responses were remarkably similar. Essentially they all judged the hidden entity on the evidence of the transcript in front of them and were not swayed by the parallel conversation they were involved with, even though that conversation may have taken more of their attention because of the machine’s silence. So, in practice, interrogators tend to state that they are unsure about the silent entity, thereby supporting the argument given in this section.

As far as the Turing test is concerned, however, if a machine remains silent and passes the test then of course this could be because the machine was, for example, switched off, or perhaps was not even there at all. For someone to make the link between a switched-off computer and human-like intelligence is frankly ridiculous in the extreme. Indeed, to link any level of intelligence with a switched-off computer is not sustainable; otherwise, switching the computer on would lower its level of intelligence, which is clearly contrary to what we witness.

So we have here the Shanahan/Kurzweil argument that, because a computer happened to be unplugged when it was subjected to a series of Turing tests, whether of 5-min duration or, simply to please Shanahan, lasting 30 min, the human race will come to an end. Whoever is responsible for unplugging the machine clearly has a lot to answer for.

Discussion

Just as at the opening of his 1948 paper, “I propose to investigate the question as to whether it is possible for machinery to show intelligent behaviour” [29], in which he introduced an imitation game, Turing, perhaps mischievously (we will never know), started his 1950 paper by considering whether machines could think. He replaced this question with a conversational imitation test, the idea being that if a machine could do sufficiently well (or rather, not do too badly) at his test, dare we say pass the Turing test, then we would have to concede that it was a thinking machine. Whatever the pass mark and whatever the exact rules and nature of his test, it became a direct practical replacement for a much more philosophical question about the thinking process. Conversely, if a machine fails the test, we would have to concede that it is not a thinking entity. So can we say that if a machine passes the Turing test it is a thinking entity?

Well, whatever thinking is, it is certainly a property of each and every human brain that exists within a human body. We exclude from the argument here brains, consisting of human neurons, that are grown and placed within a robot body [33], for no better reason than that they complicate the argument. An inexperienced Turing tester might assume that a human acting as a hidden entity, the machine’s foil, would be expected to pass the Turing test on a regular basis simply by being themselves. It might be thought that they would occasionally be classified as a machine by a poor judge, but that this would be an odd occurrence and that almost surely the vast majority of judges would classify them as being human. Unfortunately this is far from the truth. Indeed, numerous humans have at different times been classified as being machines [38].

In the example transcripts it was shown how a machine can be thought to be human because of its communication abilities, but also how a hidden human who does not communicate so well can in fact assist a machine in its goal. In the second set of transcripts we saw the machine Eugene Goostman at work. Eugene achieved the 30% pass mark in the tests; the full set of transcripts on which that result was based appears in Warwick and Shah [39]. In this particular transcript both the hidden human and Eugene were classified as being human. This is an interesting point because, even when judges are specifically told that one entity is a machine and the other is a human, it is frequently the case that their final decision is something other than a simple human/machine pairing.

Conclusions

It is fairly clear that, when the test was set up in 1950, the idea of a machine fooling people into believing that it is a human through a short communication exercise would have been very difficult for most people to grasp. However, in introducing the test, Turing linked it inextricably with the concept of thinking, and there is in consequence a nice philosophical argument concerning how one can tell whether another human is thinking. This was a brilliant link by Turing, and it has brought about a multitude of arguments between philosophers and AI researchers as to the test’s meaning and gravity.

But Turing’s game has extended way beyond the ivory towers of academe and has a truly popular following. As an example, the Wikipedia “Turing Test” page typically receives 2000–3000 views every day at present. On one day, 9 June 2014, after it was announced that the Turing test had been passed, the same page received a total of 71,578 views, an amazing figure. By comparison, popular Wikipedia pages such as “Leonardo DiCaprio” and “The Beatles” received only 11,197 and 10,328 views respectively on that same day. But with this popular following have come misconceptions about what the test is about and, in particular, a sort of folklore has arisen that the Turing test is a test for human-like intelligence. As we have seen, this folklore has been fuelled by some academics and technical writers who perhaps have not read the works of Turing as thoroughly as they should.

Let us be clear: the Turing test is not, never was and never will be a test for human-level or even human-like intelligence. Turing never said anything of the sort, either in his papers or in his presentations. The Turing test is not, never was and never will be a test for human-level thinking. Turing did not say that either.

What the Turing test does require is that a participating machine condemn itself by what it says, as judged subjectively by the human interrogator. Conversely, if a machine does not give itself away on a sufficient number of occasions, it can end up “passing the Turing test”, in the extreme case simply by remaining silent. This of course raises the question: what exactly does it mean to pass the Turing test?

Earlier in the paper we noted that Turing introduced his imitation game as a replacement for the question “Can machines think?” [30]. The conclusion drawn by many as a result is that if a machine passes the test then we have to regard it as a thinking machine. Yet Turing clearly dissociated the way a machine thinks from the human version. He said, “May not machines carry out something which ought to be described as thinking but which is very different from what a man does?” [30]. So even human-like thinking for machines was not on the radar as far as Turing was concerned. He also said, in reference to the year 2000, that “one will be able to speak of machines thinking without expecting to be contradicted” [30]. Noam Chomsky wondered why, of all the ways a machine could display intelligence, Turing chose a test involving human language [3], which is merely one small part of human intelligence.

The Turing test is a simple test of a machine’s communication ability. The machine is interrogated by a human and is directly compared, in parallel, with another human with regard to human communication abilities. In that sense it involves merely one aspect of human intelligence, as pointed out by Chomsky. If a machine passes the Turing test it exhibits a capability in communication. This does not in any sense mean that the machine displays human-level intelligence or consciousness. So even if Kurzweil is correct in his prediction, a machine passing the Turing test does not mean that the end of humanity is just around the corner.

Solution

Here we provide a solution to the first of the transcripts included in the “Taking the Test” section, which took place between a human interrogator and two hidden entities. The LHS entity was in fact the machine/program Ultra Hal, whereas the RHS entity was an English-speaking male. Whilst the judge correctly identified the LHS entity as a machine, they were unsure about the RHS entity on the basis of the transcript shown.

Acknowledgments

Harjit Mehroke for Fig. 1a; C. D. Chapman for Fig. 1b.

Compliance with Ethical Standards

Conflict of Interest

Kevin Warwick and Huma Shah declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all participants for which identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Contributor Information

Kevin Warwick, Phone: 44-247765-9893, Email: k.warwick@coventry.ac.uk.

Huma Shah, Email: h.shah@coventry.ac.uk.

References

1. Aamoth D. Interview with Eugene Goostman, the fake kid who passed the Turing test. June 9, 2014. http://time.com/2847900/eugene-goostman-turing-test/.

2. Auerbach D. A computer program finally passed the Turing test? 10 June 2014. http://www.slate.com/articles/technology/bitwise/2014/06/turing_test_reading_university_did_eugene_goostman_finally_make_the_grade.single.html.

3. Chomsky N. Turing on the “imitation game”, chapter 7. In: Epstein R, et al., editors. Parsing the Turing test. Berlin: Springer; 2008.

4. Copeland BJ. The essential Turing—the ideas that gave birth to the computer age. Oxford: Clarendon Press; 2004.

5. Esposito A, Fortunati L, Lugano G. Modeling emotion, behaviour and context in socially believable robots and ICT interfaces. Cogn Comput. 2014;6(4):623–627. doi: 10.1007/s12559-014-9309-5.

6. Floridi L, Taddeo M, Turilli M. Turing’s imitation game: still an impossible challenge for all machines and some judges—an evaluation of the 2008 Loebner contest. Mind Mach. 2009;19(1):145–150. doi: 10.1007/s11023-008-9130-6.

7. Genova J. Turing’s sexual guessing game. Soc Epistemol. 1994;8:313–326. doi: 10.1080/02691729408578758.

8. Harnad S. Turing testing and the game of life: cognitive science is about designing lifelong performance capacity not short-term fooling. LSE Impact Blog 6/10. 10 June 2014. http://blogs.lse.ac.uk/impactofsocialsciences/2014/06/10/turing-testing-and-the-game-of-life/.

9. Hayes P, Ford K. Turing test considered harmful. In: Proceedings of the international joint conference on artificial intelligence, Montreal, vol. 1; 1995, p. 972–7.

10. Ingram R. DoC Professor disputes whether computer ‘Eugene Goostman’ passed Turing test. Imperial College News. 2014. http://www3.imperial.ac.uk/newsandeventspggrp/imperialcollege/engineering/computing/newssummary/news_11-6-2014-11-33-32.

11. Kurzweil R. The singularity is near. London: Duckworth; 2006.

12. Lassègue J. What kind of Turing test did Turing have in mind? Tekhnema 3/ A Touch of memory/Spring. 1996. http://tekhnema.free.fr/3Lasseguearticle.htm.

13. Marcus G. What comes after the Turing test? The New Yorker, June 9, 2014. http://www.newyorker.com/tech/elements/what-comes-after-the-turing-test.

14. Michie D. Turing’s test and conscious thought. In: Millican PJR, Clark A, editors. Machines and thought—the legacy of Alan Turing, volume 1. Oxford: Clarendon Press; 1996. pp. 27–51.

15. Moor JH. The status and future of the Turing test. In: Moor JH, editor. The Turing test—the elusive standard of artificial intelligence. Dordrecht: Kluwer; 2003. pp. 197–214.

16. Philipson A. John Humphrys Grills the Robot who passed the Turing test—and is not impressed. 2014. http://www.telegraph.co.uk/culture/tvandradio/bbc/10891699/John-Humphrys-grills-the-robot-who-passed-the-Turing-test-and-is-not-impressed.html.

17. Shah H. Deception detection and machine intelligence in practical Turing tests, PhD thesis, The University of Reading. 2011.

18. Shah H. Conversation, deception and intelligence: Turing’s question-answer game. In: Cooper SB, van Leeuwen J, editors. Alan Turing: his life and impact. Part III building a brain: intelligent machines, practice and theory. Oxford: Elsevier; 2013. pp. 614–620.

19. Shah H, Henry O. Confederate effect in human–machine textual interaction. In: Proceedings of the 5th WSEAS international conference on information science, communications and applications (WSEAS ISCA), Cancun, Mexico, May 11–14, 2005. p. 109–14. ISBN: 960-8457-22-X.

20. Shah H, Warwick K. Testing Turing’s five-minutes, parallel-paired imitation game. Kybernetes. 2010;39(3):449–465. doi: 10.1108/03684921011036178.

21. Shah H, Warwick K. Hidden interlocutor misidentification in practical Turing tests. Mind Mach. 2010;20:441–454. doi: 10.1007/s11023-010-9219-6.

22. Shah H, Warwick K, Bland I, Chapman CD, Allen MJ. Turing’s imitation game: role of error-making in intelligent thought. In: Turing in Context II, Brussels, 10–12 October, p. 31–2, 2012. http://www.computing-conference.ugent.be/file/14. Presentation available here: http://www.academia.edu/1916866/Turings_Imitation_Game_Role_of_Error-making_in_Intelligent_Thought.

23. Sloman A. Personal communication at the Royal Society. 2014.

24. Smith GW. Art and artificial intelligence. ArtEnt. Retrieved 27 March 2015.

25. Stanford Encyclopedia of Philosophy. The Turing test. 2011. Retrieved 4 May 2015 from: http://plato.stanford.edu/entries/turing-test/.

26. Sterrett SG. Turing’s two tests for intelligence (2000). In: Moor JH, editor. The Turing test—the elusive standard of artificial intelligence. Dordrecht: Kluwer; 2003. pp. 79–97.

27. Traiger S. Making the right identification in the Turing test. Mind Mach. 2000;10:561–572. doi: 10.1023/A:1011254505902.

28. Turing AM. Lecture on the Automatic Computing Engine (1947). In: Copeland BJ, editor. The essential Turing: the ideas that gave birth to the computer age. Oxford: Clarendon Press; 2004.

29. Turing AM. Intelligent machinery (1948). In: Copeland BJ, editor. The essential Turing—the ideas that gave birth to the computer age. Oxford: Clarendon Press; 2004. pp. 410–432.

30. Turing AM. Computing machinery and intelligence. Mind. 1950;LIX(236):433–460. doi: 10.1093/mind/LIX.236.433.

31. Turing AM. “Can automatic calculating machines be said to think?” Transcript of the 1952 BBC radio broadcast featuring Turing AM, Braithwaite R, Jefferson G and Newman M. In: Cooper SB, van Leeuwen J, editors. Alan Turing: his work and impact. Amsterdam: Elsevier; 2013. pp. 667–676.

32. Vinciarelli A, Esposito A, Andre E, Banin F, Chetouani M, Cohn J, Cristani M, Fuhrmann F, Gilmartin E. Open challenges in modelling, analysis and synthesis of human behaviour in human–human and human–machine interactions. Cogn Comput. 2015;7(4):397–413. doi: 10.1007/s12559-015-9326-z.

33. Warwick K. Implications and consequences of robots with biological brains. Ethics Inf Technol. 2010;12(3):223–234. doi: 10.1007/s10676-010-9218-6.

34. Warwick K, Shah H. Assumption of knowledge and the Chinese room in Turing test interrogation. AI Commun. 2014;27(3):275–283.

35. Warwick K, Shah H. Effects of lying in practical Turing tests. AI Soc. 2014. doi: 10.1007/s00146-013-0534-3.

36. Warwick K, Shah H. Good machine performance in Turing’s imitation game. IEEE Trans Comput Intell AI Games. 2014;6(3):289–299. doi: 10.1109/TCIAIG.2013.2283538.

37. Warwick K, Shah H. Outwitted by the hidden: unsure emotions. Int J Synth Emot. 2014;5(1):46–59. doi: 10.4018/ijse.2014010106.

38. Warwick K, Shah H. Human misidentification in Turing tests. J Exp Theor Artif Intell. 2015;27(2):123–135. doi: 10.1080/0952813X.2014.921734.

39. Warwick K, Shah H. Can machines think? A report on Turing test experiments at the Royal Society. J Exp Theor Artif Intell. 2015.

40. Warwick K, Shah H. Taking the 5th amendment in Turing’s imitation game. J Exp Theor Artif Intell. 2015.

41. Warwick K, Shah H, Moor JH. Some implications of a sample of practical Turing tests. Mind Mach. 2013;23:163–177. doi: 10.1007/s11023-013-9301-y.
