Google recently unveiled Gemini, its new chatbot challenger to ChatGPT. However, initial third-party benchmarking shows that Gemini’s capabilities currently lag behind competing AI systems, particularly OpenAI’s GPT-3.5 and GPT-4 models.
Researchers from Carnegie Mellon University and BerriAI tested Google’s Gemini Pro against OpenAI’s GPT-3.5 Turbo and GPT-4 Turbo, as well as the new open-source model Mixtral. On questions testing knowledge, reasoning, math and other academic subjects, Gemini solved fewer problems overall than GPT-3.5.
On a multiple-choice exam spanning 57 subjects, Gemini scored lower than GPT-3.5 and significantly lower than GPT-4. The study found that Gemini struggled with longer, multi-step queries and complex logic problems, while GPT-4 maintained strong performance even as query complexity increased.
Several quirks hampered Gemini in the evaluations. When presented with A/B/C/D multiple-choice options, Gemini showed a bias toward picking the final option, suggesting the model is not well tuned for that question format. Overly aggressive content filtering also prevented Gemini from answering certain questions at all, which further dragged down its scores.
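To illustrate the kind of answer-order bias the researchers describe, here is a minimal, hypothetical Python sketch (not the study’s actual code) showing how a skew in a model’s chosen letters can be separated from its raw accuracy:

```python
from collections import Counter

# Hypothetical illustration: given (model_answer, correct_answer) pairs from
# a multiple-choice benchmark, compute accuracy and the distribution of the
# letters the model picked. A distribution heavily skewed toward one letter
# (e.g., "D") suggests positional bias rather than genuine reasoning.
def summarize_multiple_choice(results):
    picks = Counter(answer for answer, _ in results)
    correct = sum(1 for answer, truth in results if answer == truth)
    return correct / len(results), picks

# Example run with made-up data: a model that disproportionately picks "D".
sample = [("D", "B"), ("D", "D"), ("A", "A"), ("D", "C"), ("D", "D")]
accuracy, distribution = summarize_multiple_choice(sample)
print(f"accuracy={accuracy:.2f}, answer distribution={dict(distribution)}")
```

A diagnostic along these lines would flag a model that scores moderately well on such a test while choosing the last option far more often than chance.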
Gemini lagged behind GPT-3.5 and GPT-4 by a wide margin in math, programming and web navigation tests. One bright spot: it outperformed the GPT models at word rearrangement and symbol ordering. Gemini also beat all of the other chatbots at generating non-English text, although it blocked responses outright for roughly 10 of the language pairs tested.
Google disputed the study’s results, citing internal research showing Gemini Pro outperforming GPT-3.5. The company also pointed to the upcoming 2024 release of Gemini Ultra, which it claims exceeds GPT-4 in early internal benchmarking.
For the time being, OpenAI retains the dominant position in chatbot applications for consumers and businesses. With expectations high for Gemini, its mediocre performance in third-party tests could be a harbinger of trouble if the 2024 Gemini Ultra release doesn’t show substantial gains. For most use cases, experts currently recommend OpenAI’s GPT-4 as the best publicly accessible AI system.
The study also found that OpenAI’s GPT-3.5 outperformed the open-source model Mixtral in almost all test categories. Gemini, in turn, surpassed Mixtral in the evaluations, suggesting Google’s model holds up well against other non-OpenAI systems.
As AI rapidly advances, robust testing and benchmarking by independent researchers provide an important check on claims made by tech giants like Google. While tech companies may tout the capabilities of their latest AI developments, rigorous third-party validation is needed to measure strengths and weaknesses objectively.
If Gemini Ultra doesn’t deliver on its promises next year, Google could fall further behind the competition. For an AI system intended to demonstrate Google’s prowess in natural language processing, disappointing performance against established rivals like GPT-3.5 and GPT-4 risks tarnishing Gemini’s reputation. As AI development accelerates at a dizzying pace, tech giants must deliver on their lofty AI promises or risk losing ground to upstart rivals.