This is the nature of the razor-thin path of scientific reality:
there are a limited number of ways to be right,
but an infinite number of ways to be wrong.
Stay on it, and you see the world for what it is.
Step off, and all kinds of unreality become equally plausible.
—Phil Plait
Two stories about artificial intelligence recently caught my attention. The first, out of the University of California, Irvine's Digital Learning Lab, examined how well ChatGPT could grade English and history essays compared to an actual teacher. The second, an editorial about the AI revolution in general, expounded on the very practical and financial boundaries all AI technologies are rapidly running up against. Together, these stories caused me to revisit some themes from my very first post about AI, and as I reflected on them, shared threads between the two quickly became apparent that I want to discuss here today.
But first, a quick synopsis of each article.
In the story about grading, researcher Tamara Tate and her team sought to compare ChatGPT’s ability to score 1,800 middle school and high school English and history essays against the ability of human writing experts to do so. Their motive was to see if ChatGPT could help improve writing instruction by allowing teachers to assign more of it without increasing their own cognitive load. If, for example, teachers could use AI “to grade any essay instantly with minimal expense and effort,” then more drafts could be assigned, thereby enabling the quality of student writing skills to improve.
What they found was a lot of variability, with ChatGPT’s scores matching the human scores between 76% and 89% of the time, which Tate summarized as meaning that ChatGPT was “roughly speaking, probably as good as an average busy teacher [and] certainly as good as an overburdened below-average teacher. [But that] ChatGPT isn’t yet accurate enough to be used on a high-stakes test or on an essay that would affect a final grade in a class.” Furthermore, she cautioned that “writing instruction could ultimately suffer if teachers delegate too much grading to ChatGPT [because] seeing students’ incremental progress and common mistakes remain important for deciding what to teach next.” Bottom line, as the title of the article states, the idea “needs more work.”
In the editorial about the AI revolution, technology columnist Christopher Mims makes a strong case that AI development is hitting three walls: a rapidly slowing pace of improvement, mounting prohibitive costs, and what I will call the productivity boundary. In terms of development, Mims points out that AI works:
by digesting huge volumes of [data], and it’s undeniable that up to now, simply adding more has led to better capabilities. But a major barrier to continuing down this path is that companies have already trained their AIs on more or less the entire internet, and are running out of additional data to hoover up. There aren’t 10 more internets’ worth of human-generated content for today’s AIs to inhale.
As for costs, training expenses are in the tens of billions of dollars while revenues from AI are, at best, in the billions of dollars—not a sustainable economic model. Finally, the evidence is mounting that AI does not quite boost productivity the way its evangelists have touted because “while these systems can help some people do their jobs, they can’t actually replace them.” Someone still has to check for AI hallucinations, and “this means they are unlikely to help companies save on payroll.” Or to put it another way, “self-driving trucks have been slow to arrive, in part because it turns out that driving a truck is just one part of a truck driver’s job.”
Which brings me back to why I think these two articles share common threads of thought and what made me revisit my original posting about AI. Both articles obviously point to AI’s limitations, and the grading one is simply a specific example of the “productivity boundary” Mims discusses. Both articles have a cautionary tone about AI being the be-all-end-all solution to “all life’s problems” the way its many proselytizers want to claim it can be, and the grading one even brings up the economics of AI as it warns about schools jumping on the proverbial bandwagon and purchasing AI grading systems too quickly.
But it was the analogy of the truck driver that caused all the metaphorical gears in my head to click into place. English and history teachers don't just teach writing, and when they grade writing, it is not just the quality of the writing they are grading. They are not "just driving the truck." I am confident that ChatGPT could be a marvelous tool for catching run-on and incoherent sentences, disorganized paragraphs, and weak thesis statements, and if using it for that would give an already overburdened teacher the chance to fit a few additional drafts of an essay into their class, I'm on board. The only way you get better at writing is to write.
However, what ChatGPT cannot catch (and here is where I suspect at least some of the discrepancies in the percentages found in the grading research come from) is the quality, the originality, of the thought and ideas that a given piece of writing expresses. Only the human teacher can do that, because only the human teacher has actual intelligence as defined by biology: the capacity to use existing knowledge to solve an original, unique, and novel problem. No AI can solve a problem it hasn't already seen (which is part of what Mims hints at with his remark about "10 more internets"; only a human mind could create them), and that is why we will still need the human teacher to do the final grading.
Which brings me back to some of the themes I first addressed in Catechism and AI. In looking back at that essay (where I first wrote about this misuse of the word "intelligence" in computer science), I realized that what the technological breakthroughs since then have made possible is a deepening of the illusion of intelligence. Once something like ChatGPT could be trained on the entire internet, pretty much every prior human answer to a problem became part of the algorithm, so when you present it with a problem that is novel to you, it appears to solve it on its own. It appears intelligent. And since problems truly novel to everyone who has ever lived grow fewer each day, AI can appear intelligent quite a bit of the time.
However, present it with a problem that is novel to both you and the AI and suddenly you get one of those hallucinations Mims points out you need an actual human to fix. That remains the limitation of AI: it cannot handle the truly novel, the genuinely unique. Nor can it create it. As I’ve written before, AI may be capable of producing a mimic of a Taylor Swift song, but it cannot produce an actual Taylor Swift song. The challenge is in remembering that the mimic isn’t really Taylor Swift.
Again, here is where the technological breakthroughs since I first wrote about AI have deepened the illusion. The content generated by AIs such as ChatGPT may look novel because that particular arrangement of words, images, etc. happens to be new to you. But somewhere, at some time, some human mind already put those same words, images, etc. together; some human mind created. You are just now on the receiving end of a tool that we can train on everything human minds have created over the past 10,000 years. The ultimate catechism! And a lot of prior human creativity with which to fool someone. We see a parallel in the development of magic shows: one hundred years ago, we only had the technology to create the illusion of a woman sawn in half; forty years ago, David Copperfield had the tools to make the Statue of Liberty appear to disappear. None of it is any less illusory; it just gets harder to tell.
And where that fact may grow increasingly problematic is in the realm of another theme from my earlier writing: interpersonal relationships. When I first wrote about AI five years ago, Her was only a movie; now it's a reality. For a monthly subscription, I can have the AI companion of my choice (romantic, platonic, and/or therapeutic) and have "someone" in my life who will never push back on me. Add DoorDash, Amazon, and Netflix, and I could spend the rest of my life once I retire (or get a work-from-home internet job) in my own solipsistic bubble without any need for direct human contact ever again. Not gonna happen, as they say, but the fact that I can write those words should be sobering (and shudder-inducing) to anyone reading them. Because if we are ultimately successful at reducing our most basic humanity to an illusion, climate change and the next pandemic are going to be the least of our concerns.
Yet if Christopher Mims is correct, then AI may be rapidly approaching its illusory limits, and if Tate and her crew are correct, then watchful use of AI may help teachers give their students more practice improving their writing skills—and therefore their thinking skills—without adding to their grading loads. So perhaps there is cause for optimism. The key, I think, is always to remember that the "I" in AI is—at least for now—a biological falsehood, and what I now realize was missing from my earlier work on AI is an emphasis on novelty as the core, the essence, of what it means to be intelligent. That doesn't mean the CS folks may not eventually pull off an algorithm that truly can create. But for now, we do not live in that world, and we need to keep reminding ourselves of that fact regularly.
References
Barshay, J. (May 20, 2024). AI Essay Grading Could Help Overburdened Teachers, But Researchers Say It Needs More Work. KQED/MindShift. https://www.kqed.org/mindshift/63809/ai-essay-grading-could-help-overburdened-teachers-but-researchers-say-it-needs-more-work.
Mims, C. (May 31, 2024). The AI Revolution Is Already Losing Steam. The Wall Street Journal. https://www.wsj.com/tech/ai/the-ai-revolution-is-already-losing-steam-a93478b1?mod=wsjhp_columnists_pos1.
