AI and Energy — Two Billion Tokens Later

Over the past month, I have consumed nearly two billion tokens[1] through my daily work with AI. That amounts to an estimated 100 kWh of energy — roughly what it takes to run a couple of refrigerators for a month.
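For the curious, the back-of-envelope arithmetic behind that comparison looks like this (the monthly refrigerator figure below is an assumed typical value, not a measurement):

```python
# Back out the implied per-token energy: 100 kWh spread over 2 billion tokens.
tokens = 2_000_000_000
wh_per_token = 100_000 / tokens  # 100 kWh expressed in Wh
print(f"Implied energy per token: {wh_per_token * 1000:.3f} mWh")  # 0.050 mWh

# A typical refrigerator draws on the order of 30-45 kWh/month;
# 40 kWh/month is an assumed round figure for the comparison.
kwh_month = 100
fridge_kwh_month = 40
print(f"Equivalent refrigerators: {kwh_month / fridge_kwh_month:.1f}")  # 2.5
```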

That surprised me. In a blog post from last August, I estimated that training models[2] required vast amounts of energy, but that the energy cost of asking individual questions was negligible. I figured my prompts over a week consumed about one kilowatt-hour. That was not wrong — at the time.

What changed is how I use AI. Around the turn of the year, many of us discovered that AI models had become remarkably good at solving complex tasks in multiple steps. I now work daily with Claude Code, a so-called “agent” that can read files, run commands, reason through multiple steps, and write code. This consumes tokens on an entirely different scale than a regular chat question: a simple question might use a few hundred tokens, while a working day with Claude Code can run into the millions. I found myself using it so heavily that I had to upgrade from a “Pro” subscription to the more expensive “Max” plan to avoid hitting daily limits.

The Meter

To find out what this actually costs in energy, I built a meter (source code on GitHub). It is a Python script that runs in Claude Code’s status bar. Every time the AI makes an API call, token statistics are piped to the script, which accumulates daily totals — input tokens, output tokens, and cached tokens — in a local file. At midnight, the day’s totals are archived to a history file. Energy is estimated by multiplying token counts by published estimates for each token type, with an uncertainty factor of roughly 3x in either direction.[3]
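A minimal sketch of that accumulation logic, assuming a JSON payload on stdin with illustrative field names and illustrative per-token energy figures (the real script, with archiving and the published estimates, lives in the GitHub repository):

```python
import json
import sys
from datetime import date
from pathlib import Path

# Illustrative mid-point estimates in Wh per token; the real figures
# come from the published estimates referenced in footnote 3.
WH_PER_TOKEN = {"input": 2e-5, "output": 2e-4, "cached": 2e-6}
STATE = Path("claude_energy_today.json")

def record(event: dict) -> dict:
    """Add one API call's token counts to today's running totals."""
    today = date.today().isoformat()
    state = json.loads(STATE.read_text()) if STATE.exists() else {}
    if state.get("date") != today:
        # New day: start fresh (the real script archives yesterday first).
        state = {"date": today, "input": 0, "output": 0, "cached": 0}
    for kind in ("input", "output", "cached"):
        state[kind] += event.get(kind, 0)
    state["wh"] = sum(state[k] * WH_PER_TOKEN[k] for k in WH_PER_TOKEN)
    STATE.write_text(json.dumps(state))
    return state

def main() -> None:
    # Claude Code pipes call statistics to the script on stdin and
    # displays whatever it prints as the status-bar text.
    totals = record(json.loads(sys.stdin.read()))
    print(f"{totals['wh']:.1f} Wh today (x/÷3 uncertainty)")
```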

The Result

One hundred kilowatt-hours per month. That is a far cry from the negligible amounts I estimated last year. Note that this figure covers only computation energy — the energy consumed by the GPUs processing my tokens. The actual operational energy in a data center, including cooling and other infrastructure, is likely 1.5–3 times higher.
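Applying that overhead range is a one-liner; the 1.5x and 3x multipliers are the rough infrastructure factors just mentioned, not measured values:

```python
compute_kwh = 100  # measured computation (GPU) energy per month
# Cooling and other data-center infrastructure add roughly 1.5x-3x
# on top of pure compute energy.
low, high = compute_kwh * 1.5, compute_kwh * 3
print(f"Estimated facility energy: {low:.0f}-{high:.0f} kWh/month")  # 150-300
```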

The Dependency

But the energy figure is not the only thing that gave me pause. All of my inference[4] takes place in data centers in the United States. This means that for virtually all of my daily work, I depend on data centers in another country being supplied with energy — and on the company Anthropic continuing to exist and deliver its services to me.[5]

That is a vulnerability. Not just for me — but for anyone building their workflows around a single AI provider in another country.

Looking Ahead

I have started thinking about what I can do about it. In practical terms, that means reducing my energy consumption by replacing token-heavy MCP servers with leaner CLI tools, ensuring that the tools I build work easily with models from other providers — such as OpenAI or Mistral — and moving compute closer to home.[6] That could mean domestic cloud services like Berget AI, or running lighter models locally.

But the bigger insight is simpler than that: using AI in my daily work has gone from being energetically invisible to the equivalent of an extra appliance in the household. That does not change the fact that the technology is useful — but it does change how deliberately I want to use it.

Footnotes

  1. A token is the basic unit that AI models work with. Much like a human reads word by word, an AI reads token by token. A token corresponds to roughly three quarters of a word in English. The word “refrigerator” is three tokens.

  2. Training a model means feeding an AI enormous amounts of data — for example, text from the Internet — and letting it find patterns and relationships in the data. The process requires thousands of specialized processors working for weeks or months, and the result is a digital artifact (the model) that can then answer questions and solve tasks.

  3. Nobody outside Anthropic knows the actual energy per token for Claude models. The mid-point estimates come from Simon P. Couch’s analysis of Claude Code’s energy consumption, which derives per-token figures from Epoch AI’s research and Anthropic’s pricing ratios. The 3x uncertainty means the real figure could be three times higher or lower — a wide range, but honest about what we actually know.

  4. Inference is what happens every time you ask an AI a question — the model receives your text, processes it, and produces a response. Unlike training, which happens once, inference occurs every time someone uses the model. It is inference that gives rise to the ongoing energy consumption.

  5. Which may not be a given, considering their ongoing dispute with the US Department of Defense, which I wrote about here.

  6. I am considering buying a more powerful desktop computer to run models locally. It will not fully replace the most capable models from OpenAI, Anthropic, or Google, but can hopefully cover a large share of my baseline needs for lighter tasks.