A New Type of Turing Test

25 July 2016

In 1950, computer pioneer Alan Turing formulated his famous test for determining whether or not a computer possessed true artificial intelligence (AI). It involved discourse between humans and a computer, and if the humans could not tell whether they were speaking to another person or to a machine, then the machine was “intelligent.” A neat idea, but when put into practice it has proved too easy to fake.

Over the years, various improvements to the Turing test have been suggested, and one recent AI challenge used a rather nifty linguistic approach, outlined in this article on the Neurologica blog. At its core, the test, known as the Winograd schema, asks the AI to determine the referent of a pronoun in a sentence. The pronoun would be ambiguous except for one word that provides the necessary context. For example:

The trophy would not fit in the brown suitcase because it was too big.

What does it refer to, the trophy or the suitcase?

In the sentence, big can be replaced with small, which alters the context and the identity of the referent. Humans have no difficulty getting the correct answer (it refers to the trophy when the adjective is big and to the suitcase when the adjective is small), but in the challenge the AI performed dismally, with even the best scores only equal to chance guessing.
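For the programmers among you, here is a rough sketch of how the trophy example could be written down as data. The class name WinogradSchema, its fields, and the score and first_candidate helpers are all my own inventions for illustration, not anything taken from the actual challenge. The sketch shows why a program that ignores the special word cannot beat chance: whichever referent it always picks is right for one variant and wrong for the other.

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """Hypothetical container for one schema; not the challenge's real format."""
    template: str      # sentence with a {word} slot for the special word
    pronoun: str       # the ambiguous pronoun
    candidates: tuple  # the two possible referents
    answers: dict      # special word -> correct referent

    def sentence(self, word):
        """Return the sentence with the special word filled in."""
        return self.template.format(word=word)

    def referent(self, word):
        """Return the correct referent for the given special word."""
        return self.answers[word]


trophy = WinogradSchema(
    template="The trophy would not fit in the brown suitcase "
             "because it was too {word}.",
    pronoun="it",
    candidates=("trophy", "suitcase"),
    answers={"big": "trophy", "small": "suitcase"},
)


def score(resolver, schema):
    """Fraction of the schema's variants that the resolver gets right."""
    words = list(schema.answers)
    correct = sum(
        resolver(schema.sentence(w), schema.candidates) == schema.referent(w)
        for w in words
    )
    return correct / len(words)


def first_candidate(sentence, candidates):
    """Naive resolver (assumed baseline): always pick the first noun."""
    return candidates[0]


print(trophy.sentence("small"))
print(score(first_candidate, trophy))  # 0.5: right for "big", wrong for "small"
```

Anything that scores above 0.5 here has to be paying attention to the special word and to what it implies about trophies and suitcases, which is exactly the kind of reasoning the test is after.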

While I suspect that there are as many issues with the Winograd schema as there are with the original Turing test, it’s a neat use of language to test reasoning ability.