Yeah, yeah — of course a computer won at a math competition. That’s not the point. This story, which concerns a rather amazing program called GeoS from the Allen Institute for Artificial Intelligence (AI2), is about the ability of AI to usefully engage with the world. To a computer, with a brain literally structured for these sorts of operations, the math SAT is not a test of calculation but of reading comprehension. That’s why this story is so interesting: GeoS isn’t as good as the average American at geometry; it’s as good as the average American at the SAT itself.
Specifically, this AI program was able to score 49% accuracy on official SAT geometry questions, and 61% on practice questions. The 49% figure is basically identical to the average for real human test-takers. The program was not given digitized or specially labeled versions of the test, but looked at the exact same question layout as real students. It read the writing. It interpreted the diagrams. It figured out what the question was asking, and then it solved the problem. It only got the answer right about half the time, which makes it roughly as fallible as a human being.
To do this, the researchers had to smash together a whole array of different software technologies. GeoS uses optical character recognition (OCR) algorithms to read the text, and custom language processing to try to understand what it reads. Geometry questions are structured to be difficult to parse, hiding important information as inferences and implications.
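To get a feel for what that pipeline involves, here is a deliberately tiny sketch of the general idea: read the question text, extract logical facts, merge them with facts gleaned from the diagram, and test the answer choices. This is not AI2's code; the rule set, function names, and single hardcoded question are hypothetical simplifications meant only to illustrate the shape of the problem.

```python
# Toy sketch (not GeoS itself) of a text-plus-diagram geometry pipeline:
# question text -> extracted facts -> combined with diagram facts -> check choices.
import re
import math

def parse_question(text):
    """Pull simple geometric facts out of question text with regex rules."""
    facts = {}
    m = re.search(r"radius of circle \w+ is (\d+)", text)
    if m:
        facts["radius"] = float(m.group(1))
    return facts

def diagram_facts():
    """Stand-in for diagram interpretation; a real system reads this from the image."""
    return {"shape": "circle"}

def solve(facts, choices):
    """Derive the target quantity and pick the closest answer choice."""
    if facts.get("shape") == "circle" and "radius" in facts:
        area = math.pi * facts["radius"] ** 2
        return min(choices, key=lambda c: abs(c - area))
    return None  # question not understood, which happens often for a real solver

question = "In the figure, the radius of circle O is 3. What is the area of the circle?"
facts = {**parse_question(question), **diagram_facts()}
print(solve(facts, choices=[6.0, 9.42, 18.85, 28.27]))  # -> 28.27, closest to 9*pi
```

The hard part GeoS actually tackles is everything this toy skips: recovering those facts from free-form English and an unlabeled drawing rather than from a regex that happens to match.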
One intriguing implication of this research is that someday, we might have algorithms quality-checking SAT questions. We could have different AI programs intended to achieve different levels of success on average questions, perhaps even for different reasons. Run proposed new questions through them, and their relative performance could not only weed out bad questions but also point to the source of the problem, as in the sketch below. BadAtReadingAI and BadAtLogicAI did as expected on the question, but BadAtDiagramsAI did terribly — maybe the drawing simply needs to be a little clearer.
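As a rough illustration of that quality-check idea, consider the comparison below. The "BadAt...AI" graders and their expected scores are hypothetical, taken only from the scenario in the paragraph above; real diagnostic models would obviously be far more involved than a threshold check.

```python
# Hedged sketch: flag which hypothetical diagnostic grader underperforms on a
# proposed question, suggesting where the question itself may be at fault.
EXPECTED = {"BadAtReadingAI": 0.30, "BadAtLogicAI": 0.35, "BadAtDiagramsAI": 0.40}

def diagnose(observed, tolerance=0.15):
    """Return graders whose score falls well below their expected level."""
    return [name for name, expected in EXPECTED.items()
            if observed.get(name, 0.0) < expected - tolerance]

# A proposed question where only the diagram-reading grader struggles:
scores = {"BadAtReadingAI": 0.31, "BadAtLogicAI": 0.36, "BadAtDiagramsAI": 0.10}
if diagnose(scores) == ["BadAtDiagramsAI"]:
    print("Text and logic look fine; the drawing may need to be clearer.")
```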
This isn’t a sign of the coming AI-pocalypse, or at least not a particularly immediate sign; as dense as geometry questions might be, they’re homogeneous and nowhere near as complex as something like conversational speech. But this study shows how the individual tools available to AI researchers can be assembled to create rather full-featured artificial intelligences. Things will really take off when those same researchers start snapping those amalgamations together into something far more versatile and general — something not entirely unlike a real biological mind.