Gemini CLI is a new free, semi-open source agentic devtool. It gives autonomy to Google’s Gemini models: in response to prompts, Gemini CLI can apply changes to your codebase, navigate files, and run commands in your terminal (with your approval, of course). We wanted to see how it stacks up against Claude Code, a similar CLI tool that we’ve used nearly every day since launch.
The biggest difference: the underlying models
Claude Code and Gemini CLI are quite similar agent-wise. Their core capabilities (e.g. editing files, reading files, running CLI tools) are roughly on par with each other, and they have the same general use cases.
What really differentiates them is that they’re driven by different LLMs. Asking how Claude Code differs from Gemini CLI is, for the most part, asking how Claude differs from Gemini.
Gemini Pro 2.5 is widely considered one of the best (general) models out there today (as of writing this post), but it pales in comparison to Claude Sonnet and Opus for programming tasks. All models have their weak points, and unfortunately, Gemini’s happens to be writing good code… Since Gemini CLI’s effectiveness rests on the shoulders of Gemini Pro 2.5, this puts it far behind Claude Code right off the bat.
Context windows
Asking ChatGPT to review a function that you’ve copy/pasted into your chat window can only help you so much. Why? It’s missing the context of your codebase. LLMs work best when they can ingest most or all of the code in your project: they can then find patterns and better understand what your snippet in question is supposed to do.
This is why local devtools are so much more valuable: they can take in an entire project directory as context, and use that to help write and improve code.
Gemini really shines here: it has a context window of one million tokens. This lets it remember entire sections of your codebase (or sometimes even the whole codebase), as well as your recent prompts and its outputs. This makes it really powerful for refactoring larger files.
Claude Code has a context window of about 200,000 tokens. As a result, it can’t retain as much conversation history or hold the entirety of larger codebases.
Towards the end of an LLM’s context window, it can start to become less effective, since it’s focusing too heavily on previous prompts. This makes it more “narrow-minded” in its problem-solving approach. Some users have reported better results when starting with a fresh context window, especially when their agents got stuck or started producing messier outputs.
The interfaces and UX
Both Claude Code and Gemini CLI are command line programs that run from your terminal. You launch them by executing the claude or gemini command.
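For reference, here’s a minimal install-and-launch sketch; both tools are distributed as npm packages, with the package names below current as of writing:

```bash
# Install both CLIs globally (package names current as of writing)
npm install -g @anthropic-ai/claude-code
npm install -g @google/gemini-cli

# Launch either agent from your project's root directory
cd my-project && claude   # or: gemini
```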


There’s only so much you can do design-wise with a CLI program, so naturally their layouts are rather similar.
The biggest UI/UX difference that we noticed was that Gemini CLI puts its actions in individual boxes, while Claude uses more of a tree format. These are used differently too: Gemini “thinks”, takes actions, and then summarizes a series of actions taken in a single box. Claude announces every step it takes (e.g. searched for XYZ regex pattern, searched for ABC regex pattern, read file sample.xml, etc.).


The default response structures are different too. Claude Code responds in single sentences, bullet points, and lists. Gemini CLI responds in longer paragraphs and short numbered lists. We found Claude’s format much more readable, especially when working from a small terminal window. These defaults are products of the underlying models and their fine-tuning. We asked Gemini to respond in bullet points, and it was able to adopt and remember that preference.
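If you want a preference like this to persist across sessions, both tools read project-level instruction files from your repo root: CLAUDE.md for Claude Code and GEMINI.md for Gemini CLI. A minimal sketch (the instruction wording is just an example of ours):

```bash
# Persist a response-style preference across sessions.
# CLAUDE.md and GEMINI.md are the tools' instruction-file conventions;
# the instruction text itself is only an example.
cat > CLAUDE.md <<'EOF'
## Response style
- Respond in short bullet points, not long paragraphs.
EOF

cp CLAUDE.md GEMINI.md
```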
Reliability
We really benefitted from having both tools installed. You can’t count on either of them 100% of the time (for different reasons that we’ll get into), so having a backup was helpful.
Our biggest overall qualm with Claude Code was that it was often down. It only has about two nines of uptime, a product of Anthropic scaling so quickly. As of July 28th, they’ve been improving reliability by optimizing Claude Code’s rate limits, and the majority of Claude Code users won’t be impacted by these limits. Downtime-related errors came through as red 400 responses from the API.
Gemini was unreliable in different ways: it didn’t always know what it was doing. With Claude Code, feeding in error logs would usually resolve bugs (which was important for full-stack/frontend issues, since Claude and Gemini can’t view or interact with the app like a user can). Gemini, by contrast, would struggle to pinpoint what was wrong from the logs alone; with extra context on top of the error logs, it was able to get further.
Again, this is mostly a product of the underlying models, but Gemini is most useful when given very precise instructions. It’s not as valuable for junior devs or vibe coders, since to get high-quality outputs you’ll need to understand more complex architectural patterns and be decent at debugging on your own. Any model has this drawback to some extent, but we saw it to a much lesser degree with Claude Code. Claude could redirect itself better, especially when a method it was trying didn’t pan out.
Other users have reported that Gemini doesn’t always complete every task you give it, and sometimes even tries to offload tasks back to the user.
Claude sometimes suggests unsafe commands. It altered our terminal permissions once, and bash commands were unusable until we manually restored them. Lesson learned: don’t let agents execute commands you’re not familiar with.
Pricing + Value
Claude Code gives new users $5 worth of API tokens. This ran out rather quickly for us, as we were making pretty intensive requests via Claude Opus. We quickly found that a Claude Pro subscription (for $20/month) was much more cost-effective than paying for API tokens.
For medium-heavy usage, a Claude Pro subscription is pretty sufficient for Claude Code. Usage limits refresh every 5 hours, so you’ll have to be somewhat efficient with your prompting. Claude Pro only includes access to Sonnet and Haiku, whereas Claude Max and the Claude API give you the more sophisticated Opus model. Even after trying Opus, we were still quite satisfied with Sonnet’s performance for most programming tasks.
Unlike Claude Code, Gemini CLI is usable at its free tier. Google is generous with API limits: the free tier grants 100 Gemini Pro 2.5 requests per day, which was more than we needed for our medium-heavy workload. After you run out of Gemini Pro 2.5 usage, you’ll get switched over to Gemini Flash 2.5. Flash is a smaller model, more comparable to Claude Haiku, so don’t expect it to come up with anything too profound.
Note that unlike Gemini’s paid tiers, the free tier opts you into collection of your inputs and outputs, so use it at your discretion.
Complete your agentic dev loop
You’ll get the most value from your agents when they’re not confined to your local dev environment. With ephemeral environments, they can push code and quickly see how it performs running on production-like infrastructure. Agents can pull logs and resolve bugs that happen later in the pipeline, so you can ship features faster and more confidently.
Try Shipyard today and get unlimited full-stack, production-like environments. We promise you (and your agents) will be more productive than ever.