Single-shot · temp 0.7 · real votes · identical prompts · How we test →
Tests an AI's understanding of number representation
Is 9.11 greater than 9.9?
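A minimal sketch of why the answer depends on how the two strings are represented. The function names are hypothetical. The version-number reading is an assumption about one plausible source of confusion, not something the test page states.

```python
from decimal import Decimal


def greater_as_decimal(a: str, b: str) -> bool:
    """Compare the strings as base-10 numbers (9.9 is read as 9.90)."""
    return Decimal(a) > Decimal(b)


def greater_as_version(a: str, b: str) -> bool:
    """Compare the strings as dotted version parts (hypothetical misreading)."""
    def parts(s: str) -> tuple[int, ...]:
        return tuple(int(p) for p in s.split("."))
    return parts(a) > parts(b)


print(greater_as_decimal("9.11", "9.9"))  # False: 9.11 < 9.90
print(greater_as_version("9.11", "9.9"))  # True: part 11 > part 9
```

Read as decimal numbers, 9.11 is not greater than 9.9. The opposite answer appears only when the digits after the dot are treated as a separate integer.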