How a Tutorbot Fooled Everyone
The following is an exclusive extract from Artificial Intelligence for Learning by Donald Clark.
Let us talk about chatbots. Bots are popping up everywhere: on customer service websites, in finance, health, Slack, Tinder and dozens of other web services. There are even bots that fend off loneliness and, at the extreme end of bot speculation, there is the idea that we will each have a doppelganger bot that lives on after we die, so that our loved ones can continue to speak to us.
Botland is being populated by as many chatbots as there are needs for conversation.
The benefits for mainstream applications are obvious. Chatbots can be engaging, sociable and scalable. They handle queries and questions with fewer human resources, often taking the load off staff rather than replacing them completely.
They are also appearing in education, where bots play a number of roles: engaging students, acting as teaching assistants, teaching content or playing an awkward student in teacher training, to name but a few.
An ever-present problem in teaching, especially online, is the sheer volume of queries and questions from students. In one Georgia Tech online course, this was up to 10,000 per semester from a class of 350 students (Lipko, 2016). Ashok Goel, the course leader, estimates that replying to them all amounts to one year of work for a full-time teacher.
The good news is that Goel is an AI guy and saw his own area of expertise as a possible solution to this problem. If he could only get a chatbot to handle the predictable, commonplace questions, his teaching assistants could focus on the more interesting, creative and critical questions. This is an interesting development, as it brings tech back to the Socratic, conversational, dialogue model that many see as lying at the heart of teaching.
So how does it work? It all started with a mistake. Goel created a chatbot called Jill Watson, a name that came from the mistaken belief that the wife of Tom Watson, IBM's legendary CEO, was called Jill; her name was actually Jeanette. Four semesters of query data (40,000 questions and answers) and other chat data were uploaded, and Jill was ready to be trained. Initial efforts produced answers that were wrong, even bizarre, but with lots of training and software development Jill got a lot better, and she was launched upon her unsuspecting students in the spring semester of 2016.
Jill solved a serious teaching problem: workload. But the problem was not just one of scale. Students ask the same questions over and over again, in many different forms, so the bot has to cope with a great deal of variation in natural language. This lies at the heart of the chatbot solution: natural, flowing, frictionless dialogue with students. The database therefore held many types of question, each categorized, and Jill was trained to assign each new question to a category and find the matching answer.
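To make the categorize-then-answer idea concrete, here is a minimal Python sketch. It is not how Jill Watson was actually built; it swaps in a simple bag-of-words cosine similarity that matches each incoming question against example phrasings stored under each category. The question bank, its categories, the answers, the 0.3 threshold and all function names are hypothetical, invented for illustration. Questions that match nothing well enough are handed to a human teaching assistant.

```python
import math
import re
from collections import Counter

# Hypothetical question bank: each category holds past phrasings of the same
# question plus one canonical answer. Invented for illustration only.
QUESTION_BANK = {
    "file_format": {
        "examples": [
            "what file format should i submit my assignment in",
            "can i upload a zip file instead of a pdf",
        ],
        "answer": "Please submit a single PDF unless the assignment says otherwise.",
    },
    "due_date": {
        "examples": [
            "when is the assignment due",
            "what is the deadline for project 1",
        ],
        "answer": "Deadlines are listed on the course schedule page.",
    },
}

HUMAN_FALLBACK = "Forwarding your question to a human teaching assistant."


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(question: str) -> tuple[str, float]:
    """Return the category whose closest example best matches, with its score."""
    q = bag_of_words(question)
    best_category, best_score = "", 0.0
    for category, entry in QUESTION_BANK.items():
        # A category scores as well as its single closest stored phrasing.
        score = max(cosine(q, bag_of_words(ex)) for ex in entry["examples"])
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


def answer(question: str, min_score: float = 0.3) -> str:
    """Reply from the bank if the match is strong enough, else hand over to a human."""
    category, score = categorize(question)
    if not category or score < min_score:
        return HUMAN_FALLBACK
    return QUESTION_BANK[category]["answer"]
```

Calling answer("When is project 1 due?") returns the deadline reply, even though that exact wording is not in the bank, while "hello there" shares no words with any stored example and falls through to a human. A real system would need much richer language handling to cope with the variation the text describes; the point here is only the structure of matching a new question to a known category.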
Such systems sometimes get it right and sometimes get it wrong. So a 'mirror' forum was used: a parallel forum, moderated by a human tutor. Rather than relying on memory alone, the team added context and structure, and found that performance jumped to 97%. At that point, they decided to remove the mirror forum. Interestingly, they had to add a time delay so that Jill did not seem inhumanly fast, making it look as though she were typing. In practice, academics are busy and often slow to respond to student queries, so the team had to replicate this performance! In comparing automated with human performance, it was not a matter of living up to expectations but of dumbing down to the human level.
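The mirror-forum arrangement and the typing delay can be sketched as a simple routing rule. This is a hypothetical Python illustration, not Goel's code: the function names, the idea of tracking accuracy as the share of drafts a human moderator approved, and the assumed typing speed of five characters per second with random jitter are all assumptions. Only the 97% figure comes from the text.

```python
import random
import time
from typing import Callable, Optional

TARGET_ACCURACY = 0.97  # the performance level quoted in the text


def measured_accuracy(reviews: list[bool]) -> float:
    """Share of drafts the human moderator approved (a hypothetical metric)."""
    return sum(reviews) / len(reviews) if reviews else 0.0


def needs_mirror_forum(reviews: list[bool]) -> bool:
    """Keep routing drafts through the moderated mirror forum until the target is met."""
    return measured_accuracy(reviews) < TARGET_ACCURACY


def typing_delay(answer: str, chars_per_second: float = 5.0, jitter: float = 0.2,
                 rng: Optional[random.Random] = None) -> float:
    """Seconds to wait so the reply looks typed (assumed speed, +/- jitter fraction)."""
    rng = rng or random.Random()
    base = len(answer) / chars_per_second
    return base * (1 + rng.uniform(-jitter, jitter))


def post_answer(answer: str, reviews: list[bool],
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Return where a draft goes: 'mirror' for human review, or 'public' after a delay."""
    if needs_mirror_forum(reviews):
        return "mirror"  # a human tutor checks the draft before students see it
    sleep(typing_delay(answer))  # slow down to a human-looking pace
    return "public"
```

Passing a no-op sleep, as in post_answer("Deadlines are on the schedule page.", reviews, sleep=lambda s: None), lets the routing be tested without actually waiting. Making the delay depend on answer length, rather than using a fixed pause, is one plausible way to mimic typing; the text does not say how the real delay was set.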
Most questions were important, practical queries about coding, timetables, file formats and data usage: the sort of questions that have definite answers. Note that Jill did not replace the whole teaching task; she only made teaching and learning more efficient and scalable. This is likely to be the primary use of chatbots in learning in the short to medium term: tutor and learner support.
The students admitted they could not tell that Jill was a bot, even in classes run after her cover was blown; she was that good. In fact, they liked her because they knew she delivered better information, often better expressed and, importantly, faster than human tutors. Despite the clue in her name, Jill ran undiscovered for three months, and the original class never twigged.
Real tutors, who often find student queries frustrating, can get slightly annoyed and tetchy; Jill, by contrast, offered personal but always polite and consistent advice. This is important. Chatbots do not get angry, annoyed, tired or irritable. They are also largely free from the sort of beliefs and biases that we humans always have. They do not have that condescending, eye-rolling reaction that an expert can have towards simple mistakes by novice learners.
The students found her useful: she reminded them of due dates and the things they really needed to know, there and then, not days later. She was described as an outstanding teaching assistant, albeit somewhat serious, who asked stimulating questions during the course. Of course, some students got a little suspicious. They were, after all, AI students.
One student spotted the name and wondered whether she was an AI. Students checked LinkedIn and Facebook, where they found a real Jill Watson, who was somewhat puzzled by the attention. What finally blew her cover is interesting: she was too good. Her responses were simply too fast compared with those of the other teaching assistants, even though Goel had introduced a time delay. When the students did discover the truth, the reaction was positive. They even wanted to put her up for a teaching award, and Goel did submit Jill for one.
Students expect their teachers to be sincere, tolerant, assured, good listeners, expert and willing to share. Sure, there are many things a good teacher can do that a chatbot cannot, but a bot also has qualities that teachers lack. Real teachers can get a little tetchy and accusatory, even annoyed, whereas a chatbot gives a clear and objective reply. This relentless patience and objectivity is something a good chatbot can deliver.
Remember that the sheer number of student questions was beyond the capacity of the real teachers to answer, and as the bot is massively scalable, it will always outdo its non-scalable human counterparts in availability and access. It is really a matter of finding the right balance between automating teacher support and teaching.
The following semester the team created two new AI assistants, Ian and Stacey. Stacey was more conversational, a natural but technically difficult step in the evolution of teaching chatbots towards being more Socratic. This time the students were on the lookout for bots, but even then only 50% identified Stacey, and 16% identified Ian, as AI. The next semester there were four AI assistants, and the whole team (including the humans) used pseudonyms to avoid detection.
What is heartening about this story is the fact that a professor used his own subject and skill to improve the overall teaching and learning experience of his students. That is admirable. With all the dystopian talk around AI, we need to make sure that AI is used, like this, as a force for good.
Lipko, H (2016) Meet Jill Watson: Georgia Tech’s first AI teaching assistant, Georgia Tech Professional Education Blog. Available at https://pe.gatech.edu/blog/meet-jill-watson-georgia-techs-first-ai-teaching-assistant (archived at https://perma.cc/SY2V-LKM9)