🔥Claude Opus 4 vs. Gemini 2.5 Pro vs. OpenAI o3 Coding Comparison 🚀


Anthropic just launched two new AI models, Claude Opus 4 and Claude Sonnet 4 (a drop-in replacement for Claude 3.7 Sonnet), which hit the market on May 22.

Both of these models have similar SWE benchmarks, so in this blog, we will mainly focus on Claude Opus 4. ✌️


Now that this new model, Claude Opus 4, is launched, let’s see if we have something cool or just another regular AI model. 👀

TL;DR

If you want to jump straight to the conclusion: when Claude Opus 4 is compared against the other two models, Gemini 2.5 Pro and OpenAI o3, Opus simply dominates at coding, and by a good margin, as you can see for yourself in the comparison below.

Tweet praising Claude Opus 4 AI Model

If you are looking for a good AI coding assistant, maybe for your editor or in general, Claude Opus 4 is the best option for you (at least for now!)

Brief on Claude Opus 4

If you are on this blog, it’s likely for the Claude Opus 4 model, so let me give you a brief introduction to this model before we move any further.

It hasn’t even been a week since this model launched, and Anthropic already claims it’s the best AI model for coding. Not just that, but an AI model that can autonomously work a full corporate day (seven hours). Looking scary already!! 😬

Claude Opus 4 Model best AI model claim

It has about a 200K-token context window (smaller than you might expect, but it is what it is), and it’s said to be the best model for coding. It will have to justify that, as we’ll see in just a moment.

Claude Opus 4 leads on the SWE-bench with a score of 72.5% and can reach up to 79.4% with parallel test-time compute.

Claude Opus 4 Coding Benchmark

As you can see, that’s already a roughly 10-percentage-point improvement over Anthropic’s previous model, Claude 3.7 Sonnet.

The Claude 4 lineup is also 65% less likely to use hacky shortcuts and loopholes to get the job done.

Now, imagine an AI model (in this case Claude Opus 4) doing PRs, making commits, and doing everything you can think of all on its own with just a few prompts. How cool would that be, right?

Here’s exactly that. The Claude team has shared this quick GitHub Actions integration with Claude Opus 4, where you can see the model making changes on the PR and addressing feedback in real time. 👇

Doesn’t this look a bit dangerous to you? It’s striking how much control these AI models have taken on in just the 2-3 years from GPT-3.5 to this model.

This is getting really insane, and I’m not sure if I love or hate this happening. 🥴

Coding Comparison

As you might have already guessed, in this section we will be comparing Claude Opus 4 (SWE-bench 72.5%) vs. Gemini 2.5 Pro (SWE-bench 63.2%) vs. OpenAI o3 (SWE-bench 69.1%) on coding.

💁 All three of these models are coding beasts, so we won’t be testing them with any easy questions. We’ll use really tough ones and see how they perform head-on.

One thing I will also account for is taste.

1. Particles Morph

Prompt: You can find the prompt I’ve used here: Link

Response from Claude Opus 4

You can find the code it generated here: Link

Here’s the output of the program:

This looks crazy good, and the fact that after thinking for about 100 seconds (~1.66 minutes) it was able to do this in one shot is even crazier to me. The particles’ morphing behavior from one shape to another is exactly what I expected: instead of collapsing to a single point and then forming the next shape, each particle morphs right from the shape it’s currently in.

There is room for improvement, like the shapes aren’t really 100% correct, but the overall implementation is rock solid!

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

This is not bad, but it’s definitely not at the Claude Opus 4 level of quality. The shapes look poor and don’t really meet the expectations I had. Is that how the bird looks? Seriously? The overall UI is also not up to par.

This is definitely not what I was expecting and somewhat disappointing from this model, but then again, we’re comparing it (SWE-bench 63.2%) against Claude Opus 4 (SWE-bench 72.5%), and maybe that’s the reason.

🫤 I’ve noticed that after every new model is launched, the previous best model seems to fade in comparison to the new one. How fast the AI models are improving is just crazy.

Response from OpenAI o3

You can find the code it generated here: Link

Here’s the output of the program:

The response we got from o3 is even worse than Gemini 2.5 Pro’s. Honestly, I was expecting a bit more from this model, yet here we are.

I’m not sure if you noticed, but the particles don’t morph directly from their current shape; instead, they first default to a spherical shape and then morph to the requested shape.
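That difference comes down to where the interpolation starts. As a rough illustration (my own sketch, not any model’s output), a morph that begins from the current shape just eases every particle from wherever it is toward its new target; the names here are assumed, and the shape generators and render loop are left out:

```javascript
// Ease every particle from its current position toward its new target.
// Because the starting point is "wherever the particle is right now",
// a morph begins from the current shape instead of resetting first.
// positions, targets: flat [x0, y0, z0, x1, ...] arrays; t in (0, 1].
function morphStep(positions, targets, t) {
  for (let i = 0; i < positions.length; i++) {
    positions[i] += (targets[i] - positions[i]) * t;
  }
  return positions;
}
```

Calling this every frame with a small `t` gives the smooth ease-toward-target behavior Claude’s version showed, while resetting `positions` to a sphere before morphing is exactly the shortcut o3’s version appears to take.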

2. 2D Mario Game

Prompt: You can find the prompt I’ve used here: Link

Response from Claude Opus 4

You can find the code it generated here: Link

Here’s the output of the program:

It did it in a matter of seconds. Implementing a whole 2D Mario game, which is genuinely difficult, that fast is a pretty impressive feat.

And not just that, look at how beautiful the UI and the overall vibe are. This could actually serve as a solid start for someone who’s trying to build a 2D Mario game in vanilla JS.
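To give a sense of what’s under the hood of a game like this, here’s a minimal gravity-and-jump physics step in plain JavaScript; this is my own sketch with assumed constants and field names, not Claude’s actual code:

```javascript
const GRAVITY = 0.5;        // downward acceleration per frame (assumed units)
const JUMP_VELOCITY = -10;  // negative = upward, since canvas y grows downward
const GROUND_Y = 300;       // hypothetical floor height in pixels

// One physics step for the player. A real game would call this from a
// requestAnimationFrame loop and also handle horizontal input and
// platform collisions.
function updatePlayer(player, jumpPressed) {
  if (jumpPressed && player.onGround) {
    player.vy = JUMP_VELOCITY;
    player.onGround = false;
  }
  player.vy += GRAVITY;       // gravity pulls down every frame
  player.y += player.vy;
  if (player.y >= GROUND_Y) { // landed: snap to the floor
    player.y = GROUND_Y;
    player.vy = 0;
    player.onGround = true;
  }
  return player;
}
```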

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

It is functional, I must say that, and it’s somewhat good. But it’s a bit too minimal and also a bit buggy.

Look at the timer running in the top right: it’s just not working correctly (I’m not that familiar with this game, so maybe that’s how it’s supposed to work), but either way, this doesn’t feel like a good output from a model considered this strong.

Response from OpenAI o3

You can find the code it generated here: Link

Here’s the output of the program:

o3 didn’t do well on this question at all. As you can see, it just looks like a prototype, not even a working game. It’s complete nonsense, and there’s no real Mario game here. It has lots and lots of bugs, and there’s no way for the game to end.

A disappointing result from this model one more time! 👎

3. Tetris Game

Prompt: You can find the prompt I’ve used here: Link

Response from Claude Opus 4

You can find the code it generated here: Link

Here’s the output of the program:

As you can see, we got a perfectly implemented Tetris game in vanilla HTML/CSS/JS in no time. It was so fast I even forgot to keep track of how long it took.

It did implement everything I asked for, including optional features like the ghost piece and high-score persistence in local storage. You can’t hear it here, but it also implemented background theme music, plus a preview of the next three upcoming pieces.
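Of those optional features, the ghost piece is the most interesting algorithmically: you drop a copy of the current piece straight down until one more step would collide. A minimal sketch of that idea (board and piece representation assumed, not Claude’s actual code):

```javascript
// Find how far the current piece can drop: increase the row offset
// until one more step would hit the stack or the floor.
// board: 2D array (rows x cols) of 0/1; piece: array of [row, col] cells.
function ghostRowOffset(board, piece) {
  const collides = (offset) =>
    piece.some(([r, c]) => {
      const row = r + offset;
      return row >= board.length || board[row][c] !== 0;
    });
  let offset = 0;
  while (!collides(offset + 1)) offset++;
  return offset; // add this to each cell's row to draw the ghost piece
}
```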

Tell me, for real, how long would this take you if you were to code this all alone, with no AI models?

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

This one is equally good and works perfectly, just like Claude Opus 4’s version; even the UI and everything else looks nice. I love that it could come up with such a clean solution to this problem.

Response from OpenAI o3

You can find the code it generated here: Link

Here’s the output of the program:

This one’s interesting. Everything, from the tetrominoes falling on down, seems to work fine, but there’s no way for the game to end. Once the tetrominoes hit the top, the game is supposed to end, but it doesn’t, and the game is simply stuck forever.

Now, this would probably be an easy fix with a follow-up prompt, but this is a pretty simple requirement, so I decided to keep everything one-shot. Not that big of an issue, but still.
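For what it’s worth, the standard fix is tiny: in guideline Tetris, the game “tops out” when a freshly spawned piece already overlaps the stack. A sketch of that check (representation assumed, not o3’s code):

```javascript
// Top-out check: the game should end when a newly spawned piece
// already overlaps occupied cells on the board.
// board: 2D array of 0/1; spawnCells: array of [row, col] cells.
function isTopOut(board, spawnCells) {
  return spawnCells.some(([r, c]) => board[r][c] !== 0);
}
```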

4. Chess Game

Prompt: You can find the prompt I’ve used here: Link

Response from Claude Opus 4

You can find the code it generated here: Link

Here’s the output of the program:

Now, this is out of this world. It implemented an entire chess game from scratch with no libraries. I had thought it would use something like Chess.js or another external library, but there you have it: a fully working chess game, even though it misses some special moves like en passant.

Special moves aside, every move is calculated and recorded perfectly in the move log. This is pure insanity!
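En passant itself isn’t much code once you track the previous move, which is probably why it’s such a common omission: the capture is only legal immediately after an enemy pawn’s double step lands beside yours. A sketch of the eligibility check (my own, with an assumed coordinate scheme where row 0 is Black’s back rank):

```javascript
// En passant eligibility for a white pawn (row 0 = Black's back rank,
// so a white pawn captures en passant from row 3).
// lastMove: { piece, fromRow, toRow, toCol } describing the opponent's move.
function canEnPassant(pawn, lastMove) {
  if (lastMove.piece !== "pawn") return false;
  if (Math.abs(lastMove.fromRow - lastMove.toRow) !== 2) return false; // double step only
  if (lastMove.toRow !== pawn.row) return false;    // must land on the capturer's rank
  return Math.abs(lastMove.toCol - pawn.col) === 1; // and on an adjacent file
}
```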

Response from Gemini 2.5 Pro

You can find the code it generated here: Link

Here’s the output of the program:

Gemini 2.5 Pro also decided to implement everything from scratch, and it even tried to implement special moves like en passant, not just the basic piece moves.

The game overall seemed fine, but the soul of chess is missing: the pieces are just there; they don’t move. This felt like a small issue it could easily fix in a follow-up prompt, but it did not.

Follow up prompt to Gemini 2.5 Pro AI Model

You can find its updated code from the follow-up prompt here: Link

Response from OpenAI o3

You can find the code it generated here: Link

Here’s the output of the program:

OpenAI o3 developed Chess game demo

OpenAI o3 took a more solid approach and decided to use Chess.js, which I’d prefer if I were building a production-level chess game, but the implementation didn’t really come together.

It seems the external Chess.js import didn’t work: the code tries to use the Chess object, which ends up undefined, so everything fails.
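That kind of failure is easy to catch early with a small load guard instead of crashing deep inside the game logic. A sketch of the pattern (my own; it assumes the script-tag build of Chess.js exposes a global `Chess` constructor):

```javascript
// Fail fast if the Chess.js <script> tag didn't load, instead of hitting
// "Chess is not defined" later when the board is wired up.
function getChessConstructor(globalObj) {
  const Chess = globalObj.Chess;
  if (typeof Chess !== "function") {
    throw new Error("Chess.js failed to load - check the <script> tag or CDN URL");
  }
  return Chess;
}
```

In a browser you’d call `getChessConstructor(window)` once at startup and show a readable error if it throws.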

Conclusion

Did we get a clear winner here? Yes, and absolutely yes, and it’s Claude Opus 4.

Amazon-funded Anthropic is doing some real magic with these Claude models: first my earlier favorite, Claude 3.7 Sonnet, and now these two beasts (Claude Sonnet 4 and Claude Opus 4).

Tweet on the Claude parent company Anthropic

Claude Opus 4 is simply better than the other two models, even though it has a much smaller context window than both. Being this much better at coding with such a small context window is by far the most impressive thing I’ve seen recently in this AI boom.

What do you think and which one do you pick for yourself? Let me know in the comments below!
