OpenAI’s artificial intelligence-powered chatbot ChatGPT appears to be getting worse over time, and researchers can’t figure out why.
In a July 18 study, researchers from Stanford and UC Berkeley found that ChatGPT’s newest models had become far less capable of providing accurate answers to an identical series of questions within the span of a few months.
The study’s authors couldn’t provide a clear answer as to why the AI chatbot’s capabilities had deteriorated.
To test how reliable the different models of ChatGPT were, three researchers, Lingjiao Chen, Matei Zaharia and James Zou, asked the GPT-3.5 and GPT-4 models to solve a series of math problems, answer sensitive questions, write new lines of code and conduct spatial reasoning from prompts.
“We evaluated #ChatGPT's behavior over time and found substantial diffs in its responses to the *same questions* between the June version of GPT4 and GPT3.5 and the March versions. The newer versions got worse on some tasks. w/ Lingjiao Chen @matei_zaharia https://t.co/TGeN4T18Fd https://t.co/36mjnejERy”
According to the research, in March GPT-4 was capable of identifying prime numbers with a 97.6% accuracy rate. In the same test conducted in June, GPT-4’s accuracy had plummeted to just 2.4%.
In contrast, the earlier GPT-3.5 model had improved on prime number identification within the same time frame.
When it came to generating lines of new code, the abilities of both models deteriorated substantially between March and June.
The study also found that ChatGPT’s responses to sensitive questions, including some focused on ethnicity and gender, later became more terse, with the models more often refusing to answer.