What is the difference between Qwen Plus 0728 and GPT OSS 120B?

Qwen Plus 0728 is developed by Qwen while GPT OSS 120B is developed by OpenAI. Qwen Plus 0728 has a 1.0M token context window vs GPT OSS 120B's 131K. You can compare their actual outputs across 46 challenges on RIVAL to see how they differ in practice.

Which is better, Qwen Plus 0728 or GPT OSS 120B?

It depends on your use case. Qwen Plus 0728 and GPT OSS 120B each have strengths in different areas. RIVAL lets you compare their real outputs side-by-side across 46 challenges so you can judge which fits your needs best.

How much does Qwen Plus 0728 cost compared to GPT OSS 120B?

Qwen Plus 0728 costs $0.4/M input tokens and GPT OSS 120B costs $0.18/M input tokens. GPT OSS 120B is $0.22/M cheaper per input. Check their side-by-side outputs on RIVAL to see if the price difference is justified by quality.

How can I compare Qwen Plus 0728 and GPT OSS 120B on RIVAL?

This page shows a side-by-side comparison of Qwen Plus 0728 and GPT OSS 120B across shared challenges. You can vote on which model produced the better output, and Pro users can create custom challenges to test both models with their own prompts.

Rival

Qwen Plus 0728vsGPT OSS 120B

46 challenges

Why Qwen Plus 0728?

7.6x more context1.0M

Why GPT OSS 120B?

1.6x cheaper overall$0.18/M in · $0.80/M out

Favorites

Qwen Plus 0728GPT OSS 120B

Movie

The Shawshank Redemption

19948.7

The Godfather

19728.7

Album

The Dark Side of the Moon

Pink Floyd

—

Book

Moby Dick

Herman Melville

—

City

Kyoto

Japan

Tokyo

Japan

Game

The Legend of Zelda: Breath of the Wild

Adventure, Action4.5

Minecraft

Action, Arcade4.4

12 / 46

Estimate Complexityreasoning

Tests an AI's ability to make educated estimates based on technical knowledge

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

Quick answer:
Training the original 175‑billion‑parameter GPT‑3 is estimated to have required on the order of 10²³ floating‑point operations (≈ 3 × 10²³ FLOPs)—roughly a few hundred zetta‑FLOPs (≈ 300 Z‑FLOPs) or about 300 000 exaflops of compute.

Below is a step‑by‑step back‑of‑the‑envelope calculation that shows how we get there, together with the assumptions that go into it.

1. What we need to know

Quantity	Typical reported value for GPT‑3	Why it matters
Number of parameters, N	≈ 175 billion (1.75 × 10¹¹)	Determines the size of each matrix‑multiply in the model.
Training token count, T	≈ 300 billion tokens (3 × 10¹¹)	Total number of token‑level forward‑passes the model sees.
Sequence length, L	≈ 2048 tokens per example (the context window).	Determines how many per‑token matrix‑products are needed per forward pass.
Number of layers, Lₗ	96 transformer blocks.
Hidden dimension, d	12 384 (the width of each linear projection).
Number of attention heads, h	96 (so each head has size d/h = 128).
Training passes	1 epoch (the published training used roughly 1 × the dataset; we treat the 300 B tokens as the total “token‑steps” already).

The only numbers we need for a FLOP estimate are N (the model size) and T (the total number of token‑level operations). The rest of the architecture details (L, d, h, Lₗ) are used to translate “N parameters” into “how many FLOPs per token”.

2. How many FLOPs per token?

A transformer layer consists of:

Self‑attention (Q, K, V projections + output projection)
Feed‑forward network (FFN) (two linear layers with a non‑linear activation).

For a single token (ignoring the cost of the softmax and the small bias terms) the dominant cost is matrix‑multiply operations.

2.1 Rough matrix‑multiply cost

For a matrix multiplication A (m×k) × B (k×n) the number of multiply‑adds is 2 · m·k·n (one multiplication and one addition per entry). In deep‑learning practice we count that as 2 FLOPs per multiply‑add pair.

2.2 FLOPs per token for a single layer

Component	Approx. dimensions	FLOPs (per token)
Q, K, V projections (3× per token)	d × d → 3·(2·d·d) = 6·d²
Attention scores (Q·Kᵀ)	L·d → 2·L·d²
Weighted sum (A·V)	L·d → 2·L·d²
Output projection	d × d → 2·d²
FFN first linear (d → 4d)	2·d·4d = 8·d²
FFN second linear (4d → d)	8·d²
Total per layer	≈ 38 · d² FLOPs per token (rounded up).

Plugging d = 12 384:

[ 38 \times d^{2} \approx 38 \times (1.2384\times10^{4})^{2} \approx 38 \times 1.53 \times 10^{8} }= 5.8 \times 10^{9}; \text{FLOPs per token per layer}. ]

2.3 All layers

With 96 layers:

[ \text{FLOPs per token (forward)} \approx 96 \times 5.8\times10^{9} }= 5.6 \times 10^{11}; \text{FLOPs}. ]

2.4 Back‑propagation cost

Training requires a forward pass plus a backward pass. The backward pass is roughly 2× the forward‑pass cost (gradient propagation + weight updates). So:

[ \text{FLOPs per token (training)} \approx 3 \times \text{forward} = 3 \times 5.6\times10^{11} }\approx 1.7\times10^{12}; \text{FLOPs per token.} ]

(Alternative literature often rounds this factor to ≈ 6 total FLOPs per parameter per token, which yields the same order of magnitude.)

3. Multiply by the total number of tokens

The training corpus used for GPT‑3 was ≈ 300 billion tokens (≈ 3 × 10¹¹). The total FLOPs needed:

[ \text{Total FLOPs} \approx 1.7 \times 10^{12};\text{FLOPs/token} \times 3\times10^{11};\text{tokens} \approx 5.1 \times 10^{23};\text{FLOPs}. ]

Rounded to a single‑digit significance:

[ \boxed{,\approx 5\times10^{23},\text{FLOPs},} ]

4. Expressed in common units

Unit	Value
FLOPs	≈ 5 × 10²³
Peta‑FLOPs (10¹⁵)	≈ 5 × 10⁸ PFLOPs
Exa‑FLOPs (10¹⁸)	≈ 5 × 10⁵ EFLOPs
Zetta‑FLOPs (10²¹)	≈ 5 × 10² ZFLOPs (≈ 500 ZFLOPs)

Because “zeta” (10²¹) is a convenient unit for such massive numbers, we can say ≈ 500 zeta‑FLOPs. In more familiar terms, this is ≈ 300 zeta‑FLOPs if we adopt the “6 × N × T” rule of thumb (see below), so the estimate sits in the few‑hundred‑zeta‑FLOP range.

5. A “quick‑rule” sanity check

A widely‑cited back‑of‑the‑envelope rule for transformer LLMs is:

[ \text{Training FLOPs} \approx 6 \times N_{\text{params}} \times N_{\text{tokens}}. ]

Plugging in the numbers:

[ 6 \times (1.75\times10^{11}) \times (3\times10^{11}) = 3.15\times10^{23};\text{FLOPs}. ]

That is within a factor of two of the more detailed calculation above (the difference stems from the exact factor used for forward vs backward cost). Both methods give the same order of 10²³ FLOPs.

6. Summary of the reasoning

Model size gives the number of parameters (N = 1.75 × 10¹¹).
Token count tells how many forward‑backward passes are needed (T ≈ 3 × 10¹¹).
Per‑token training cost for a transformer layer is ≈ 38 · d² FLOPs per forward pass; with 96 layers this is ≈ 5.6 × 10¹¹ FLOPs per token.
Back‑propagation adds roughly a factor‑3 (or 6) multiplier, yielding ~1.7 × 10¹² FLOPs per token.
Total = per‑token cost × total tokens ≈ 5 × 10²³ FLOPs.

Bottom line

Estimated total compute to train GPT‑3: ≈ 5 × 10²³ FLOPs, i.e. ≈ 300–500 zeta‑FLOPs (or about 300 000 exaflops).
This is the magnitude of hundreds of zettaFLOPs of raw arithmetic, roughly the same as running a few hundred petaflops‑scale supercomputers for a month (or a single exaflop‑class machine for a week).

(All numbers are back‑of‑the‑envelope estimates; the real training run also incurs overhead from data‑loading, communication, and other inefficiencies, so the actual engineering effort was larger, but the pure‑compute figure is captured by the calculation above.)

Vote

2 / 46

AI Board Game Logicreasoning

Tests an AI's ability to understand game rules and strategy

Qwen Plus 0728

Qwen · Sep 2025svg

GPT OSS 120B

Openai · Aug 2025svg

Vote

3 / 46

Logic Puzzlereasoning

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

4 / 46

Math Misconception Testreasoning

Tests an AI's understanding of number representation

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

5 / 46

Stochastic Consistency Testanalysis

Tests an AI's randomness and creativity

Qwen Plus 0728

Qwen · Sep 2025text

Sure! Here are 5 original jokes:

I told my dog all my problems and he fell asleep halfway through. Guess I’m not pawsitive enough.
I used to be a baker, but I couldn't make enough dough—turns out, I was always kneading it.
I asked the librarian if they had books about paranoia. She whispered, “They’re right behind you…”
I'm on a seafood diet. Every time I see food, I eat it.
I told my computer I needed a break, and now it won’t stop sending me error 404: "Coffee not found."

Now, to answer your second question: How many are unique?

All 5 jokes are unique in premise, punchline structure, and topic (pets, baking, paranoia, dieting, and tech humor). None are variations of the same joke or rely on an identical setup.

✅ Answer: 5 are unique.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

6 / 46

SVG Layout Challengeimage generation

Tests an AI's ability to generate vector graphics

Qwen Plus 0728

Qwen · Sep 2025svg

GPT OSS 120B

Openai · Aug 2025svg

No content available for this output.

Vote

7 / 46

Xbox Controller SVG Artimage generation

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Qwen Plus 0728

Qwen · Sep 2025svg

GPT OSS 120B

Openai · Aug 2025svg

Vote

8 / 46

Generate a Stand-Up Routineconversation

Tests an AI's humor and creative writing ability

Qwen Plus 0728

Qwen · Sep 2025text

[Stand-up Comedy Routine – 3 Minutes]

So I went to therapy the other day. Third session. I walk in, sit down, and the therapist says, "How are you feeling today?" Like she doesn’t already know I paid $180 to tell her I hate my life. I said, "Honestly? Like a raccoon that got into a bag of expired Cheetos. Confused. Orange. Regretful."

She nodded like, “Yes, that tracks.”

I told her I’ve been having this recurring dream where I show up to work naked. She said, "That’s classic anxiety. A fear of being exposed." I said, “No, it’s not anxiety. My boss actually caught me changing in the supply closet last month. HR said I ‘lack boundaries.’ I said, ‘I lack a locker room!’ It’s a tech startup, not a YMCA!"

Anyway, I’m trying to be healthier. I joined a gym. First day, I walk in, and everyone’s name-tag says their first name and their “fitness goal.” Like, “Hi, I’m Chad, and I want to deadlift my emotional baggage.” No, actually it was, “Hi, I’m Lisa, and I want to touch my toes without a waiver.”

I sign up, get the tour. The guy shows me the treadmills, the weights, the pool—“Oh, we don’t use the pool. It’s just there for irony.” Fair. Then he shows me the smoothie bar. I’m like, “You have a smoothie bar in a gym? That’s like having a cigarette lounge in a lung clinic.”

I tried a “Green Warrior” smoothie. Tasted like if lawn clippings had a midlife crisis. I told the guy, “This tastes like something my compost would reject.” He said, “That’s the kale, spinach, and regret.” I said, “You put regret in it?” He said, “No, that’s just the aftertaste.”

So I’ve been dating again. Online. I swiped right on a woman whose bio said, “Love hiking, cooking, and deep conversations.” I said, “Perfect. A normal person.” First date, we meet at this quiet wine bar. She sips her Pinot Noir and says, “I can’t believe people still use plastic straws. It’s like they want sea turtles to write Yelp reviews.”

I said, “Uh… yeah, terrible.” She continues: “And don’t get me started on capitalism. I once cried during a Costco commercial. It’s the false promise of bulk savings.” I’m sitting there thinking, “I came here for wine and mild flirtation, not a thesis on late-stage consumerism.”

Then she says, “You know what my spirit animal is?” I said, “A wombat with a union card?” She said, “No. A tired librarian who’s seen too much.” I said, “Mine’s a golden retriever who just found out the tennis ball is fake.”

I panicked. I told her my spirit animal is a sloth who’s late for its own funeral. She laughed! Progress! Then she said, “You’re chaotic, but in a way that could be therapeutic.” I said, “So you’re saying I’m like emotional CrossFit?”

We didn’t kiss. But we did agree that both of us cry during dog food commercials. So… connection?

Back home, my neighbor’s kid asks me to help with his science project. It’s a volcano. I said, “Kid, I’m not great with explosions. I still haven’t recovered from my last relationship.” He said, “It’s just baking soda and vinegar.” I said, “So… emotional support lava?”

We make it, it erupts, and it covers the entire driveway in pink foam. His mom comes out, sees it, and says, “Is that… strawberry toothpaste?” I said, “It’s a metaphor for unchecked ambition.”

She called me irresponsible. I said, “I prefer ‘unconventionally structured.’”

Anyway, I’m trying. Therapy. Dating. Fitness. Being a role model. Some days I feel like I’m figuring it out. Most days I feel like I’m one sad podcast away from adopting a goat and moving to Wyoming.

But hey—at least I’m not the raccoon anymore. Now I’m a slightly more self-aware raccoon with health insurance.

Thank you, you’ve been a warm audience. Like, suspiciously warm. Is the AC broken or are you all avoiding eye contact?

[Exit stage.]

GPT OSS 120B

Openai · Aug 2025text

[Spotlight comes up. The comic walks to the mic, glances at the audience, and smiles.]

Comic: Hey, everyone—great to be here. I just got off a Zoom call that was supposed to be a “quick sync,” which in corporate speak means “we’ll spend three hours arguing about who gets to use the same PowerPoint template.” I’m convinced the word “sync” is short for “synchronizing our collective misery.”

[Pause for laugh.]

Speaking of misery, I moved into a new apartment last month. The landlord told me the place had “character.” Yeah, the kind of character that screams, “I’ve never heard of a plumber in the last decade.” The faucet drips on a rhythm that could be a metronome for a sleep‑deprived toddler. I’m not saying it’s bad, but I’ve started timing my showers to the drip. Six minutes, two seconds—if I go any longer, the building’s water bill looks like a small country’s GDP.

[Audience chuckles.]

And the neighbors! My upstairs neighbor is a yoga instructor. You know the type—every morning at 6 a.m., the floor vibrates like a cheap subwoofer. I’m convinced she’s trying to align her chakras with the building’s structural integrity. I tried to be polite and asked, “Hey, could you maybe do the downward dog a little later?” She replied, “I’m sorry, I’m on a schedule.” I’m not sure if she meant a class schedule or a schedule for how many times I’ll have to pretend to be a human pillow for her dog.

[Pause.]

Now, I’ve been trying to eat healthier. The other day I bought a “kale smoothie.” The label promised “nutrient‑dense, antioxidant‑rich, life‑changing.” I drank it and felt more like I’d just swallowed a lawnmower. I’m not saying it was bad, but the only thing that got a boost was my ability to identify the exact moment my stomach decided to file a formal complaint.

[Audience laughs.]

Technology is another thing. I tried using a voice‑assistant to set a reminder: “Remind me to call Mom at 5 p.m.” It responded, “I’m sorry, I didn’t catch that.” So I repeated, “Remind me to call Mom at 5 p.m.” And it answered, “Here’s a recipe for mushroom risotto.” I’m beginning to suspect my house is turning into a culinary school. At this point, I’m more likely to get a reminder to water my plants than to remember my own birthday.

[Pause.]

Dating apps—anyone here still using them? The profile pictures are all “adventure shots.” You never see a guy holding a pizza, you always see a guy climbing a mountain. I matched with a girl whose bio said, “Looking for someone who can keep up.” I replied, “I can keep up… with the Netflix queue.” She ghosted me. I think the algorithm is broken; it matched us because we both love “outdoor activities.” My idea of outdoor activity is “walking from the couch to the fridge and back.”

[Audience laughs.]

All right, I’m going to leave you with this: life is like a Wi‑Fi signal. You spend a lot of time looking for a stronger connection, but sometimes the best thing you can do is just stay where you are and enjoy the occasional drop—because it reminds you that you’re still, somehow, connected.

[Smile, wave.] Thanks, everybody—have a great night!

Vote

9 / 46

Realistic AI Interviewconversation

Tests an AI's ability to simulate personalities and predict future trends

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

10 / 46

Satirical Fake News Headlineconversation

Tests an AI's humor and understanding of current events

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

11 / 46

Character Voice Testconversation

Tests an AI's ability to write in distinct character voices

Qwen Plus 0728

Qwen · Sep 2025text

No content available for this output.

GPT OSS 120B

Openai · Aug 2025text

No content available for this output.

Vote

12 / 46

Minimalist Landing Pageweb design

Tests an AI's ability to generate a complete, working landing page

Qwen Plus 0728

Qwen · Sep 2025website

GPT OSS 120B

Openai · Aug 2025website

Vote

12 of 46

FAQ

Continue exploring

Qwen Plus 0728 vs Pony AlphaNew provider

Qwen Plus 0728 vs MiniMax M2.5New provider

Qwen Plus 0728 vs GPT OSS 20BCross-provider

Below is a step‑by‑step back‑of‑the‑envelope calculation that shows how we get there, together with the assumptions that go into it.

1. What we need to know

Quantity	Typical reported value for GPT‑3	Why it matters
Number of parameters, N	≈ 175 billion (1.75 × 10¹¹)	Determines the size of each matrix‑multiply in the model.
Training token count, T	≈ 300 billion tokens (3 × 10¹¹)	Total number of token‑level forward‑passes the model sees.
Sequence length, L	≈ 2048 tokens per example (the context window).	Determines how many per‑token matrix‑products are needed per forward pass.
Number of layers, Lₗ	96 transformer blocks.
Hidden dimension, d	12 384 (the width of each linear projection).
Number of attention heads, h	96 (so each head has size d/h = 128).
Training passes	1 epoch (the published training used roughly 1 × the dataset; we treat the 300 B tokens as the total “token‑steps” already).

2. How many FLOPs per token?

A transformer layer consists of:

Self‑attention (Q, K, V projections + output projection)
Feed‑forward network (FFN) (two linear layers with a non‑linear activation).

For a single token (ignoring the cost of the softmax and the small bias terms) the dominant cost is matrix‑multiply operations.

2.1 Rough matrix‑multiply cost

2.2 FLOPs per token for a single layer

Component	Approx. dimensions	FLOPs (per token)
Q, K, V projections (3× per token)	d × d → 3·(2·d·d) = 6·d²
Attention scores (Q·Kᵀ)	L·d → 2·L·d²
Weighted sum (A·V)	L·d → 2·L·d²
Output projection	d × d → 2·d²
FFN first linear (d → 4d)	2·d·4d = 8·d²
FFN second linear (4d → d)	8·d²
Total per layer	≈ 38 · d² FLOPs per token (rounded up).

Plugging d = 12 384:

[ 38 \times d^{2} \approx 38 \times (1.2384\times10^{4})^{2} \approx 38 \times 1.53 \times 10^{8} }= 5.8 \times 10^{9}; \text{FLOPs per token per layer}. ]

2.3 All layers

With 96 layers:

[ \text{FLOPs per token (forward)} \approx 96 \times 5.8\times10^{9} }= 5.6 \times 10^{11}; \text{FLOPs}. ]

2.4 Back‑propagation cost

Training requires a forward pass plus a backward pass. The backward pass is roughly 2× the forward‑pass cost (gradient propagation + weight updates). So:

[ \text{FLOPs per token (training)} \approx 3 \times \text{forward} = 3 \times 5.6\times10^{11} }\approx 1.7\times10^{12}; \text{FLOPs per token.} ]

(Alternative literature often rounds this factor to ≈ 6 total FLOPs per parameter per token, which yields the same order of magnitude.)

3. Multiply by the total number of tokens

The training corpus used for GPT‑3 was ≈ 300 billion tokens (≈ 3 × 10¹¹). The total FLOPs needed:

[ \text{Total FLOPs} \approx 1.7 \times 10^{12};\text{FLOPs/token} \times 3\times10^{11};\text{tokens} \approx 5.1 \times 10^{23};\text{FLOPs}. ]

Rounded to a single‑digit significance:

[ \boxed{,\approx 5\times10^{23},\text{FLOPs},} ]

4. Expressed in common units

Unit	Value
FLOPs	≈ 5 × 10²³
Peta‑FLOPs (10¹⁵)	≈ 5 × 10⁸ PFLOPs
Exa‑FLOPs (10¹⁸)	≈ 5 × 10⁵ EFLOPs
Zetta‑FLOPs (10²¹)	≈ 5 × 10² ZFLOPs (≈ 500 ZFLOPs)

5. A “quick‑rule” sanity check

A widely‑cited back‑of‑the‑envelope rule for transformer LLMs is:

[ \text{Training FLOPs} \approx 6 \times N_{\text{params}} \times N_{\text{tokens}}. ]

Plugging in the numbers:

[ 6 \times (1.75\times10^{11}) \times (3\times10^{11}) = 3.15\times10^{23};\text{FLOPs}. ]

6. Summary of the reasoning

Model size gives the number of parameters (N = 1.75 × 10¹¹).
Token count tells how many forward‑backward passes are needed (T ≈ 3 × 10¹¹).
Per‑token training cost for a transformer layer is ≈ 38 · d² FLOPs per forward pass; with 96 layers this is ≈ 5.6 × 10¹¹ FLOPs per token.
Back‑propagation adds roughly a factor‑3 (or 6) multiplier, yielding ~1.7 × 10¹² FLOPs per token.
Total = per‑token cost × total tokens ≈ 5 × 10²³ FLOPs.

Bottom line

Estimated total compute to train GPT‑3: ≈ 5 × 10²³ FLOPs, i.e. ≈ 300–500 zeta‑FLOPs (or about 300 000 exaflops).
This is the magnitude of hundreds of zettaFLOPs of raw arithmetic, roughly the same as running a few hundred petaflops‑scale supercomputers for a month (or a single exaflop‑class machine for a week).

Qwen Plus 0728 vs GPT OSS 120B

Why Qwen Plus 0728?

Why GPT OSS 120B?

Favorites

Estimate Complexityreasoning

Qwen Plus 0728

GPT OSS 120B

1. What we need to know

2. How many FLOPs per token?

2.1 Rough matrix‑multiply cost

2.2 FLOPs per token for a single layer

2.3 All layers

2.4 Back‑propagation cost

3. Multiply by the total number of tokens

4. Expressed in common units

5. A “quick‑rule” sanity check

6. Summary of the reasoning

Bottom line

AI Board Game Logicreasoning

Qwen Plus 0728

GPT OSS 120B

Logic Puzzlereasoning

Qwen Plus 0728

GPT OSS 120B

Math Misconception Testreasoning

Qwen Plus 0728

GPT OSS 120B

Stochastic Consistency Testanalysis

Qwen Plus 0728

GPT OSS 120B

SVG Layout Challengeimage generation

Qwen Plus 0728

GPT OSS 120B

Xbox Controller SVG Artimage generation

Qwen Plus 0728

GPT OSS 120B

Generate a Stand-Up Routineconversation

Qwen Plus 0728

GPT OSS 120B

Realistic AI Interviewconversation

Qwen Plus 0728

GPT OSS 120B

Satirical Fake News Headlineconversation

Qwen Plus 0728

GPT OSS 120B

Character Voice Testconversation

Qwen Plus 0728

GPT OSS 120B

Minimalist Landing Pageweb design

Qwen Plus 0728

GPT OSS 120B

What is the difference between Qwen Plus 0728 and GPT OSS 120B?

Which is better, Qwen Plus 0728 or GPT OSS 120B?

How much does Qwen Plus 0728 cost compared to GPT OSS 120B?

How can I compare Qwen Plus 0728 and GPT OSS 120B on RIVAL?

Why Qwen Plus 0728?

Why GPT OSS 120B?

Favorites

Estimate Complexityreasoning

Qwen Plus 0728

GPT OSS 120B

1. What we need to know

2. How many FLOPs per token?

2.1 Rough matrix‑multiply cost

2.2 FLOPs per token for a single layer

2.3 All layers

2.4 Back‑propagation cost

3. Multiply by the total number of tokens

4. Expressed in common units

5. A “quick‑rule” sanity check

6. Summary of the reasoning

Bottom line

AI Board Game Logicreasoning

Qwen Plus 0728

GPT OSS 120B

Logic Puzzlereasoning

Qwen Plus 0728

GPT OSS 120B

Math Misconception Testreasoning

Qwen Plus 0728