Retrieve LLM Token Usage in AI Agents

I actually solved the problem by adapting @solomon’s solution for my case.

What I did was simply add two nodes right after the Agent node: Edit Fields, then HTTP Request (I used HTTP Request because I was having some trouble with the Execute Sub-workflow node).

The Edit Fields node just stores the ID of the current execution in an execution_id field, using {{ $execution.id }}.

The HTTP Request node does a POST to {your_n8n_url}/webhook/log-tokens with the following JSON body (in MY case, I also send workspace_id and instance, because these are UUIDs that my SaaS backend uses for billing):

{
  "execution_id": "{{ $json.execution_id }}",
  "workspace_id": "...",
  "instance": "..."
}

BUT you’ll probably only need:

{
  "execution_id": "{{ $json.execution_id }}"
}
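
For reference, the same request can be sketched outside n8n in plain JavaScript. This is only an illustration: `buildLogTokensRequest` is a hypothetical helper name and the base URL in the usage comment is a placeholder. Only the `/webhook/log-tokens` path and the `execution_id` field come from the setup described here.

```javascript
// Hypothetical helper: builds the POST request that the HTTP Request node sends.
// Only the /webhook/log-tokens path and execution_id come from the post;
// the function name and the extraFields parameter are illustrative.
function buildLogTokensRequest(baseUrl, executionId, extraFields = {}) {
  // Strip trailing slashes so we never produce "//webhook".
  const root = baseUrl.replace(/\/+$/, '');
  return {
    url: `${root}/webhook/log-tokens`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // execution_id is written last so extra fields cannot overwrite it.
      body: JSON.stringify({ ...extraFields, execution_id: String(executionId) }),
    },
  };
}

// Usage (needs a runtime with a global fetch, e.g. Node 18+):
// const { url, options } = buildLogTokensRequest('https://n8n.example.com', '1234');
// await fetch(url, options);
```

Extra fields such as workspace_id can be passed as the third argument when the receiving workflow needs them.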

Then I created another workflow:

Basically, it has a Webhook trigger that listens for the POST request sent by the previous workflow, on the path “log-tokens”.

Then I added a Wait node of 5 seconds, just to make sure the original execution has finished, followed by a Get an execution node.
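
A fixed wait can be too short if the agent runs long. As a rough alternative, the idea of polling until the execution has finished can be sketched in plain JavaScript. Everything here is an assumption made for illustration: `waitForExecution` and the injected `fetchExecution` function are hypothetical names, and the check relies on a completed execution carrying `finished: true` or a `stoppedAt` timestamp. It also assumes the Webhook responds immediately, so the calling workflow is not blocked while the logger polls. Inside n8n, the same idea could be built as a loop of Wait and If nodes.

```javascript
// Sketch: poll until an execution is finished instead of waiting a fixed 5 s.
// `fetchExecution` is an injected async function (an assumption of this sketch)
// that returns the execution object for a given ID, e.g. via the n8n API.
async function waitForExecution(fetchExecution, executionId, { attempts = 10, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const execution = await fetchExecution(executionId);
    // Assumption: a completed execution has `finished: true` or a `stoppedAt` value.
    if (execution && (execution.finished === true || execution.stoppedAt)) {
      return execution;
    }
    // Sleep between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Execution ${executionId} not finished after ${attempts} attempts`);
}
```

Injecting `fetchExecution` keeps the retry logic independent of how the execution is actually fetched.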

At this point I already had the AI Agent’s tokenUsage, so I reused the Edit Fields node from @solomon’s workflow, editing it to take the execution_id from the Webhook node and to keep whatever else I needed for my implementation.

I also kept the Split Out node intact, as in @solomon’s original solution.

Then I noticed that the token counts reported by n8n are considerably inflated. After testing and checking directly on the OpenAI Platform, I realized that, specifically for OpenAI models, n8n reports roughly 1.6x the token usage shown in the official OpenAI API data (and the OpenAI numbers are what is actually used for billing).

Also, in my case it’s useful to convert the currency, since my country and my SaaS don’t use USD.

So I used a Code node with the following script to correct for this discrepancy and convert USD to BRL (approximately, of course, using a slightly pessimistic scenario):

// --- CONFIGURATION ---
const CORRECTION_FACTOR = 0.60;
const USD_TO_BRL = 6.00;
const PRICING = { input: 0.25, output: 2.00 };

const item = $input.item.json;

// --- SAFE LOOKUP HELPER ---
// Tries to read the value in every shape n8n usually delivers it
function getValue(obj, pathString) {
  // Attempt 1: direct access by string key (e.g. json["tokenUsage.promptTokens"])
  if (obj[pathString] !== undefined) return obj[pathString];

  // Attempt 2: nested navigation (e.g. json.tokenUsage.promptTokens)
  const keys = pathString.split('.');
  let value = obj;
  for (const key of keys) {
    if (value && value[key] !== undefined) {
      value = value[key];
    } else {
      return undefined; // Not found
    }
  }
  return value;
}

// 1. Fetch the values (tries nested first, then flat)
let rawPrompt = getValue(item, 'tokenUsage.tokenUsage.promptTokens') || getValue(item, 'tokenUsage.promptTokens') || 0;
let rawCompletion = getValue(item, 'tokenUsage.tokenUsage.completionTokens') || getValue(item, 'tokenUsage.completionTokens') || 0;

// 2. EMERGENCY DEBUG
// if (rawPrompt === 0) return { json: { error: "Token counts not found", received_structure: item } };

// 3. Apply the correction
const realPrompt = Math.ceil(rawPrompt * CORRECTION_FACTOR);
const realCompletion = Math.ceil(rawCompletion * CORRECTION_FACTOR);
const realTotal = realPrompt + realCompletion;

// 4. Calculate the cost
const costInputUSD = (realPrompt / 1000000) * PRICING.input;
const costOutputUSD = (realCompletion / 1000000) * PRICING.output;
const totalUSD = costInputUSD + costOutputUSD;

return {
  json: {
    execution_id: item.execution_id,
    workspace_id: item.workspace_id,
    instance: item.instance,
    // Try to read the model; if that fails, use the fallback
    model: getValue(item, 'tokenUsage.model') || 'gpt-5-mini',
    
    usage: {
      prompt_tokens: realPrompt,
      completion_tokens: realCompletion,
      total_tokens: realTotal
    },
    
    costs: {
      usd: Number(totalUSD.toFixed(6)),
      brl: Number((totalUSD * USD_TO_BRL).toFixed(4))
    },
    
    audit: {
      raw_found: rawPrompt + rawCompletion,
      original_source: rawPrompt > 0 ? "found_data" : "zero_data_error"
    }
  }
};

If you’re afraid the correction is too aggressive, you can raise CORRECTION_FACTOR from 0.60 to 0.65 or 0.70 (an exact 1.6x inflation would correspond to 1/1.6 ≈ 0.625). You’ll also have to change PRICING to the exact cost per million tokens of the model you’re using. In my case, it’s gpt-5-mini, so it’s $0.25 input / $2.00 output.
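
To make the arithmetic easier to check and to avoid editing constants for each model, the same correction and cost math can be sketched as a pure function with a per-model price table. This is a sketch under stated assumptions, not part of the original workflow: `estimateCost` and `PRICING_PER_MILLION` are hypothetical names, and only the gpt-5-mini prices, the 0.60 factor, and the 6.00 exchange rate come from this post.

```javascript
// Sketch: the same correction and cost math as a pure function.
// Only the gpt-5-mini prices come from the post; add other models yourself.
const PRICING_PER_MILLION = {
  'gpt-5-mini': { input: 0.25, output: 2.0 },
};

function estimateCost(model, rawPrompt, rawCompletion, { correctionFactor = 0.6, usdToBrl = 6.0 } = {}) {
  const pricing = PRICING_PER_MILLION[model];
  if (!pricing) throw new Error(`No pricing configured for model "${model}"`);

  // Scale the token counts reported by n8n down by the correction factor.
  const promptTokens = Math.ceil(rawPrompt * correctionFactor);
  const completionTokens = Math.ceil(rawCompletion * correctionFactor);

  // Prices are per one million tokens.
  const usd = (promptTokens / 1e6) * pricing.input + (completionTokens / 1e6) * pricing.output;

  return {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    usd: Number(usd.toFixed(6)),
    brl: Number((usd * usdToBrl).toFixed(4)),
  };
}

// 10,000 raw prompt tokens and 2,000 raw completion tokens:
console.log(estimateCost('gpt-5-mini', 10000, 2000));
```

With the defaults, 10,000 raw prompt tokens and 2,000 raw completion tokens become 6,000 and 1,200 corrected tokens, which cost 6,000/1M × $0.25 + 1,200/1M × $2.00 = $0.0039, or R$0.0234 at a rate of 6.00.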

And, of course, if you don’t want any currency conversion (or want a different currency), you’ll have to adapt the code a bit. But since the code also returns the amount spent in USD, I think it’s ready to use even in that case.

Well, after that, I just use another regular HTTP Request node to POST the collected data to my SaaS backend via its API. But at this point you can send it wherever you want, or even use another type of node, like Sheets, to store the data.

I tried both solutions suggested by @solomon and @filipeleal, but I keep getting the error “The resource you are requesting could not be found”.

Do I need to have the paid version of n8n for it to work?

Hey @andre2026, it could be the URL in your credentials. Yesterday I helped another user with the same problem, and it was the credentials.

You must include “/api/v1” at the end of the base URL of your credentials. Exactly as shown in the example placeholder for that field.
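
As a quick way to check a base URL outside n8n, the rule above can be sketched in plain JavaScript. The helper names `normalizeApiBaseUrl` and `executionEndpoint` are hypothetical, the example host is a placeholder, and the `includeData=true` query parameter is an assumption about the n8n public API that you should verify against your version’s API reference.

```javascript
// Sketch: make sure an n8n API base URL ends with /api/v1, then build
// the "get execution" endpoint from it. Function names are illustrative.
function normalizeApiBaseUrl(baseUrl) {
  // Remove surrounding whitespace and trailing slashes first.
  const trimmed = baseUrl.trim().replace(/\/+$/, '');
  return trimmed.endsWith('/api/v1') ? trimmed : `${trimmed}/api/v1`;
}

function executionEndpoint(baseUrl, executionId) {
  // Assumption: includeData=true asks the API to return the run data too.
  const id = encodeURIComponent(executionId);
  return `${normalizeApiBaseUrl(baseUrl)}/executions/${id}?includeData=true`;
}

console.log(executionEndpoint('https://n8n.example.com/', '1234'));
```

If the printed URL does not match what the credential’s base URL plus the execution path would produce, the credential is the first thing to check.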

Thank you very much @solomon!

It wasn’t the missing “/api/v1”, but it was indeed an error in how I wrote the base URL.

Thanks for your help. It is working now!

I’d also be interested to hear more on this one.

Any idea why the token outputs are not being passed to the separate execution data workflow (same setup as provided here)?

The token outputs are displayed in the AI model node as inputs, but I can’t seem to access them.


Native token usage output on AI Agent node — a must-have for production

Hey everyone, adding my voice here because this feature is critical for anyone running AI agents in production, especially in a multi-client / agency context.

Why this matters beyond simple cost tracking

We’re currently operating multiple AI agents on n8n (self-hosted business plan via Coolify), each serving a different client project. The inability to natively access tokenUsage from the Agent node forces us into a fragile workaround chain:

  1. Edit Fields → capture $execution.id
  2. HTTP POST → /webhook/log-tokens with { execution_id, project_id }
  3. Separate “Token Logger” workflow → Webhook trigger → Wait 5s → n8n API GET /executions/{id} → Code Node to parse tokenUsage recursively → Persist to Supabase

This works (thanks to @solomon and @Antony_Eardrop for pioneering this pattern), but it has real issues:

  • The tokenUsage path is inconsistent — sometimes nested under tokenUsage.tokenUsage.promptTokens, sometimes flat. @aminabudahab reported the same issue. This means fragile parsing logic.
  • The 5s wait is a race condition — if the agent takes longer than expected, the execution data isn’t ready yet.
  • Multi-provider normalization is on us — OpenAI uses prompt_tokens/completion_tokens, Anthropic uses input_tokens/output_tokens, Gemini uses promptTokenCount/candidatesTokenCount. We have to write a normalizeUsage() function manually.
  • No native way to get the model name — critical for cost calculation when you use multiple models across workflows.
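
To illustrate the first and third points, here is a minimal sketch of a recursive collector combined with a normalizeUsage() function. It assumes the execution data is plain nested JSON and uses only the key names mentioned in this thread. The output field names and the `sumUsage` helper are choices made for the sketch, not an n8n API.

```javascript
// Key names per source, as listed in this thread.
const USAGE_KEYS = [
  { input: 'promptTokens', output: 'completionTokens' },         // n8n tokenUsage
  { input: 'prompt_tokens', output: 'completion_tokens' },       // OpenAI
  { input: 'input_tokens', output: 'output_tokens' },            // Anthropic
  { input: 'promptTokenCount', output: 'candidatesTokenCount' }, // Gemini
];

// Returns { input_tokens, output_tokens } if obj looks like a usage object,
// otherwise null.
function normalizeUsage(obj) {
  for (const { input, output } of USAGE_KEYS) {
    if (typeof obj[input] === 'number' || typeof obj[output] === 'number') {
      return { input_tokens: obj[input] || 0, output_tokens: obj[output] || 0 };
    }
  }
  return null;
}

// Walks any nested structure and sums every usage object it finds,
// regardless of how deeply it is nested.
function sumUsage(node, total = { input_tokens: 0, output_tokens: 0 }) {
  if (node === null || typeof node !== 'object') return total;
  const usage = Array.isArray(node) ? null : normalizeUsage(node);
  if (usage) {
    total.input_tokens += usage.input_tokens;
    total.output_tokens += usage.output_tokens;
    return total; // do not descend further into a usage object
  }
  for (const value of Object.values(node)) sumUsage(value, total);
  return total;
}
```

Because the walk does not depend on a fixed path, tokenUsage.tokenUsage.promptTokens and a flat tokenUsage.promptTokens are handled the same way. One caveat: if the same usage object appears more than once in the execution data, it is counted each time, so the input may need deduplicating first.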

What the ideal implementation would look like

The Agent node should expose a dedicated output (similar to the Tools output leg) with a standardized object:

{
  "model": "gpt-4o",
  "provider": "openai",
  "input_tokens": 1234,
  "output_tokens": 5678,
  "total_tokens": 6912,
  "cost_usd": 0.042,
  "intermediate_steps": [
    { "tool": "calculator", "input_tokens": 200, "output_tokens": 50 }
  ]
}

This would:

  • Eliminate the need for a second workflow + API call + wait hack
  • Enable direct persistence (Supabase, Postgres, Sheets, whatever)
  • Make multi-client billing straightforward
  • Open the door for native observability dashboards

Current ecosystem gap

We’ve evaluated every alternative:

  • LangSmith — works via env vars but tracing is global per instance (can’t split by workflow/client)
  • Langfuse — no native n8n integration, requires LiteLLM proxy as middleware
  • n8n-trace — great for workflow observability but doesn’t track LLM tokens at all
  • OpenRouter — returns usage.cost automatically which is great, but still no native n8n surfacing

All of these are workarounds for something that should be a first-class output of the Agent node.

n8n is positioning itself as an AI-first platform — native token observability would be a massive differentiator. Happy to contribute to the spec if the team is open to community input.

+1 from our team :raising_hands:

Fantastic, perfect summary of the cost tracking issue. I do a lot of HTTP calls to GenAI as well as the lang nodes, so I have to concoct my own rough calculation of usage and pretend it’s deterministic data :) Swap out an endpoint or LLM node, and I have to go in and change the calculator. n8n will kill it with this proposed feature, at least for the AI Agent node. I wonder if the agent could read usage.cost and be prompted to add up its tool, sub-flow, or sub-agent usage and output a grand total. Otherwise, a following node could do it. But multi-agents, swarms, neurals and subflows are the wave of the future, right? And my fragile mind says your most powerful LLM is generally at the helm, so why not have a beautiful cost calc right there, where all the action is?

I tried the “Get an execution” approach. The module outputs data, but it doesn’t output the token usage. Maybe I need to use the workaround with the HTTP call, but that feels too cumbersome.

I wonder why the “Message a Model” module outputs token usage by default, while the AI Agent doesn’t.

Is there any feedback from the n8n staff regarding this topic?