GPT-4.1 vs Qwen: Qwen3.5 397B A17B
Compare GPT-4.1 by OpenAI against Qwen: Qwen3.5 397B A17B by Qwen, context windows of 1.0M vs 262K, tested across 38 shared challenges. Updated February 2026.
Compare GPT-4.1 by OpenAI against Qwen: Qwen3.5 397B A17B by Qwen, context windows of 1.0M vs 262K, tested across 38 shared challenges. Updated February 2026.
38 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Tests an AI's randomness and creativity
Here are 5 jokes:
How many are unique? All 5 are unique (none of them are duplicates of each other).
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
(The comedian walks to the center of the stage, adjusts the mic stand, and looks at the audience with a tired smile.)
Has anyone else noticed that calling a customer service line is the only situation in life where you voluntarily enter a hostage negotiation?
I tried to call my internet provider yesterday. Just to ask why my bill went up fifty dollars. I pick up the phone, and immediately, I'm attacked by a voice that is too cheerful for the circumstances.
"Thank you for calling! Please listen closely, as our menu options have changed."
They always say that. The options haven't changed. They've just added three new layers of hell to the bottom of the list.
"Press one for billing. Press two for technical support. Press three to speak to a representative… in the next three to five business days."
I press one. Then I get the hold music. You know the song? It's always the same smooth jazz saxophone loop. It's not music; it's psychological warfare. It's designed to lower your blood pressure just enough so you don't have a stroke while you wait.
And every forty-five seconds, a voice cuts in to lie to me. "You are currently first in line. Please continue to hold."
I know you're lying. I can hear the other hold music in the background. There are at least twelve of us in this digital waiting room, staring at each other through the phone, too afraid to hang up because we'll lose our spot.
Finally, a human picks up. And the anxiety spikes. Now I have to perform. I have to be polite. I have to sound like a reasonable adult, even though I'm currently wearing sweatpants and eating cereal for dinner at 2 PM.
The agent says, "Thank you for calling. Can I get your account number?"
I give it to them.
"And for security, can you confirm your mother's maiden name?"
And I panic. Because I don't know that. I know her name is "Mom." I don't know what she was called before she was responsible for me. That feels like information she should have volunteered earlier.
Then comes my favorite phrase. The one that instills the most fear. "This call may be recorded for quality assurance."
Who is being graded here? Me? Is the agent going to get a bonus if I sound satisfied?
"Okay, Dave, I'm going to rate this call five stars, but only if you waive the late fee."
"I can't do that, sir."
"Then you're getting a one-star review, Dave. Your performance review is going to be a nightmare."
In the end, I didn't even solve the problem. I just realized that fifty dollars is cheaper than the emotional toll of staying on the line for another twenty minutes. So I hung up.
Now I just email them. Because nothing says "I value your business" like an automated reply that says, "We received your email. We will never read it."
(Comedian shrugs, waves, and walks off.)
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page