Think before you ask Artificial Intelligence: it could fail a math test

Artificial Intelligence (AI) has long been hailed as a technological marvel, capable of feats that surpass human abilities. However, when it comes to math problems, the story is not as straightforward. Mathematics, especially intricate areas like geometry, demands sophisticated reasoning skills.

While some might assume that AI’s technological superiority automatically makes it more intelligent than humans, the reality is more nuanced. Artificial Intelligence systems, including generative models like ChatGPT, face significant hurdles when confronted with mathematical puzzles.

A 17-year-old fashion design student from China, Jiang Ping, achieved an impressive feat by outperforming both her peers and artificial intelligence (AI) systems in a prestigious math contest. Jiang, who attends a vocational school in Jiangsu province, secured her place among the 801 global finalists in the Alibaba Global Math Competition. Notably, no AI teams made it to the finals.

The 48-hour qualifying round featured questions on applied mathematics, probability, and algebra. Jiang’s exceptional performance caught the attention of Chinese universities, with Zhejiang University congratulating her on social media. Her dedication to exploring advanced math was evident as she tackled complex problems during the eight-hour final test.

Performance of Artificial Intelligence Language Models in Solving Math Problems

AI’s own math skills were put to the test after the final of the Chinese singing reality show “Singer 2024.” The winner was determined by online votes, and Sun Nan was declared champion with 13.8% of the votes. However, confusion arose when some viewers argued that US singer Chanté Moore’s 13.11% was higher than Sun Nan’s 13.8%. In search of clarity, people turned to AI, asking chatbots a simpler version of the same question: which is larger, 9.11 or 9.9?

Several large language models (LLMs) and chatbots were consulted. Moonshot AI’s chatbot Kimi and Baichuan’s Baixiaoying initially stumbled but corrected themselves. Alibaba’s Qwen LLM used a Python Code Interpreter to calculate the answer, while Baidu’s Ernie Bot took six steps to reach the correct result. ByteDance’s Doubao LLM grounded the question in a practical example, pointing out that $9.90 is more than $9.11. In contrast, OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and a model from Mistral AI all incorrectly identified 9.11 as the larger number.
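The comparisons at the heart of this episode are easy to check directly. The short Python sketch below is illustrative only: the function names are hypothetical, and the "version number" reading in the second function is an assumption about one way the wrong answer can arise, not an explanation given by the researchers or the companies involved.

```python
from decimal import Decimal


def larger(a: str, b: str) -> str:
    """Return whichever decimal string denotes the larger number."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Compare dot-separated parts as whole integers, the way software
    version numbers are ordered (assumed here purely for illustration)."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


# Read as decimals, 13.8 means 13.80, which is more than 13.11.
print(larger("13.8", "13.11"))            # 13.8
print(larger("9.9", "9.11"))              # 9.9

# Read like version numbers, 11 beats 9, which gives the wrong answer.
print(larger_as_version("9.9", "9.11"))   # 9.11
```

Padding both numbers to the same number of decimal places (13.80 versus 13.11) makes the correct ordering obvious, which is essentially what Doubao's price example does.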

As we await the results of the final, Jiang Ping’s underdog success has captivated millions, proving that anyone can be a dark horse in life. Fans have even visited her parents’ home, bearing gifts of support.

The Common Struggle

Wu Yiquan, a computer science researcher at Zhejiang University in Hangzhou, sheds light on AI’s mathematical capabilities. He notes that faltering at math is a common limitation of LLMs. These models predict answers by drawing on their training data rather than on true mathematical understanding. Some LLMs perform well because they memorized similar questions during training, but their inherent mathematical ability remains limited.

While Artificial Intelligence continues to astound us, its mathematical abilities are a work in progress. As we advance, both humans and machines must adapt to create a symbiotic relationship where computational thinking meets human intuition.
