Different AI labs have different priorities. OpenAI has traditionally focused on consumer applications, while Anthropic tends to target enterprises. Elon Musk’s xAI, however, has been placing particular emphasis on video-game support, a focus highlighted in a recent report by Business Insider’s Grace Kay.
According to the report, xAI delayed a model release last year for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game Baldur’s Gate. High-level engineers were reportedly pulled from other projects to improve the responses before launch.
For engineers expecting to tackle fundamental AI challenges, the task was a frustrating diversion. Yet it raises an obvious question: Did Musk get the gaming guidance he wanted?
BaldurBench: Testing xAI Against the Competition
To answer that, RPG enthusiast Ram Iyer devised a set of five general questions about Baldur’s Gate and tested them against xAI’s Grok model and three major competitors: ChatGPT, Claude, and Gemini. The experiment, dubbed “BaldurBench,” compared both accuracy and style across the models.
The results were encouraging:
- Grok performed well, offering detailed answers with gamer-specific jargon like “save-scumming” and “DPS,” alongside structured tables and theorycraft analysis.
- ChatGPT favored bulleted lists and concise sentence fragments.
- Gemini highlighted key points using bold formatting, making answers visually clear.
- Claude, surprisingly, focused on the player experience, cautioning against spoilers and encouraging flexible party compositions with a note to “just play what sounds fun to you.”
Most differences between the models were stylistic rather than factual. Given xAI’s targeted effort to improve gaming knowledge, Grok’s parity with other top models suggests that Musk’s push paid off, though perhaps at a steep cost in engineering time.
Takeaway
While the story underscores Musk’s hands-on management style, it also suggests that AI models can deliver specialized knowledge when a company prioritizes it. Whether this level of game-specific focus will continue remains to be seen, but for now, Grok can hold its own on questions about Baldur’s Gate.