FinOps & Gateway

FinOps provides real-time cost visibility, token budget enforcement, and spend attribution across every agent, team, and model in your Procurator deployment. The Gateway layer enforces those budgets at the API call level — blocking model calls that would exceed configured limits before they ever reach the provider.

Overview

AI model API costs scale directly with usage — and autonomous agents can consume tokens at rates that surprise finance teams. FinOps closes the loop between agent activity and cost management by making spend visible, attributable, and controllable in real time.

Unlike post-hoc billing analysis, Procurator's Gateway enforces budgets proactively: before a model API call is made, the Gateway checks the current spend against the applicable budget. If the call would exceed the limit, it is blocked and the agent receives a budget-exceeded error — no surprise invoice at the end of the month.

Requires Model Cost Rates

FinOps accuracy depends on model cost rates configured in the Models registry. Without rates, Procurator tracks token counts but cannot calculate costs. Set input/output rates per model for full FinOps capability.
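As a rough illustration of why rates matter, the sketch below derives the cost of one call from its token counts. The rate table, the model IDs, the prices, and the per-million-token unit are all assumptions made for the example; they are not Procurator's actual registry format.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical rate table: USD per 1,000,000 tokens, keyed by model ID.
# Model IDs and prices are invented for illustration.
RATES_PER_MILLION = {
    "example-model-large": {"input": Decimal("3.00"), "output": Decimal("15.00")},
    "example-model-small": {"input": Decimal("0.25"), "output": Decimal("1.25")},
}

MILLION = Decimal(1_000_000)


def call_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> Optional[Decimal]:
    """Return the cost of one call, or None when the model has no rate."""
    rates = RATES_PER_MILLION.get(model_id)
    if rates is None:
        # Token counts can still be recorded, but the cost is unknown.
        return None
    return (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / MILLION


cost = call_cost_usd("example-model-large", input_tokens=2_000, output_tokens=500)
missing = call_cost_usd("model-without-rates", input_tokens=2_000, output_tokens=500)
```

Input and output tokens are priced separately because the two rates usually differ, which is why the registry asks for both.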

Control Panel

The screenshot below shows the live Procurator administration interface for this feature.

[Screenshot: Procurator FinOps administration interface, app.operativus.ai/procurator/finops]

FinOps & Gateway: real-time cost tracking, budget enforcement, and spend attribution.

Architecture

Agent Execution Request
       │
       ▼
┌─────────────┐
│   Gateway   │  ← Checks: token rate limits, budget headroom,
│ (Pre-call)  │    allowed models, organization policy
└──────┬──────┘
       │
  ┌────┴────┐
  │ BLOCKED │  → Return budget-exceeded / policy-violation error
  │         │    to agent; session marked as BUDGET_EXCEEDED
  └─────────┘
       │
  ┌────┴────┐
  │ ALLOWED │  → Forward call to model provider API
  └────┬────┘
       │
┌──────┴──────┐
│   Gateway   │  ← Records: input tokens, output tokens, latency,
│ (Post-call) │    model ID, agent ID, team ID, session ID
└──────┬──────┘
       │
       ▼
FinOps Attribution
├── Agent budget: deduct tokens + cost
├── Team budget: deduct tokens + cost
├── Org budget: deduct tokens + cost
└── Session record: append token/cost turn data
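The post-call attribution step at the bottom of the diagram can be sketched as a single fan-out: one completed call updates the agent, team, and organization totals and appends a turn to the session record. The class and field names here are hypothetical, and the sketch accumulates usage in memory rather than deducting from a stored budget.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Ledger:
    """Running usage for one scope (an agent, a team, or the org)."""
    tokens: int = 0
    cost_usd: Decimal = Decimal("0")


@dataclass
class Attribution:
    """Hypothetical in-memory store; a real deployment would persist this."""
    agents: dict = field(default_factory=dict)
    teams: dict = field(default_factory=dict)
    org: Ledger = field(default_factory=Ledger)
    sessions: dict = field(default_factory=dict)

    def record_call(self, *, agent_id, team_id, session_id, model_id,
                    input_tokens, output_tokens, cost_usd):
        tokens = input_tokens + output_tokens
        scopes = (
            self.agents.setdefault(agent_id, Ledger()),
            self.teams.setdefault(team_id, Ledger()),
            self.org,
        )
        # The same call is charged to every scope it belongs to.
        for ledger in scopes:
            ledger.tokens += tokens
            ledger.cost_usd += cost_usd
        # Append turn-level data to the session record.
        self.sessions.setdefault(session_id, []).append({
            "modelId": model_id,
            "inputTokens": input_tokens,
            "outputTokens": output_tokens,
            "costUsd": cost_usd,
        })


store = Attribution()
store.record_call(agent_id="agent-a", team_id="team-1", session_id="s-1",
                  model_id="example-model", input_tokens=1200, output_tokens=300,
                  cost_usd=Decimal("0.0081"))
store.record_call(agent_id="agent-b", team_id="team-1", session_id="s-2",
                  model_id="example-model", input_tokens=400, output_tokens=100,
                  cost_usd=Decimal("0.0027"))
```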

Key Capabilities

💰

Real-Time Cost Tracking

Costs are calculated and attributed within milliseconds of every model call completion — no batch processing lag.

🚦

Pre-Call Budget Enforcement

The Gateway blocks model calls that would exceed budget limits before they reach the provider — not after. Prevent runaway spend entirely.

📊

Multi-Level Attribution

Every token is attributed to an agent, a team, and a model simultaneously. Slice cost data any way you need.

📈

Burn Rate Analytics

View hourly, daily, and weekly burn rates per agent and team. Project forward to estimate month-end spend against budgets.
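One simple way to project period-end spend is a linear extrapolation of the spend so far. The sketch below assumes a constant burn rate, which is an assumption of this example and not necessarily the method Procurator uses; the function names and figures are illustrative.

```python
from datetime import datetime


def hourly_burn_rate(spend_usd: float, period_start: datetime, now: datetime) -> float:
    """Average USD spent per hour since the period began."""
    hours = (now - period_start).total_seconds() / 3600
    return spend_usd / hours if hours > 0 else 0.0


def projected_period_spend(spend_usd: float, period_start: datetime,
                           now: datetime, period_end: datetime) -> float:
    """Scale spend so far by (period length / elapsed time)."""
    elapsed = (now - period_start).total_seconds()
    total = (period_end - period_start).total_seconds()
    if elapsed <= 0:
        return spend_usd
    return spend_usd * (total / elapsed)


start = datetime(2024, 4, 1)
end = datetime(2024, 5, 1)
now = datetime(2024, 4, 11)          # 10 of 30 days elapsed

rate = hourly_burn_rate(18.0, start, now)
projection = projected_period_spend(18.0, start, now, end)
will_exceed = projection > 45.0      # compare against a 45.00 USD budget
```

A linear projection overreacts early in a period and ignores weekly usage patterns, so treat it as a first warning signal and not as a forecast.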

🔔

Budget Alerts

Receive alerts at configurable saturation thresholds (for example 50%, 75%, 90%, and 100%), so you have time to act before agents are blocked.

📅

Budget Reset Cycles

Configure budgets to reset daily, weekly, monthly, or on a custom interval — aligned to your billing cycles or sprint cadence.
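To make the reset cycles concrete, here is a minimal sketch that computes the next reset date for the DAILY, WEEKLY, and MONTHLY cycles. It follows the resetDay conventions from the Budget Configuration Reference (day of month 1-28, day of week with 0=Sunday). The function name, the day-granularity arithmetic, and the choice to leave CUSTOM unhandled are assumptions of this example.

```python
from datetime import date, timedelta
from typing import Optional


def next_reset(today: date, reset_cycle: str, reset_day: Optional[int] = None) -> Optional[date]:
    """Return the next reset date strictly after `today`, or None for NEVER."""
    if reset_cycle == "NEVER":
        return None
    if reset_cycle == "DAILY":
        return today + timedelta(days=1)
    if reset_cycle == "WEEKLY":
        # resetDay uses 0=Sunday; date.weekday() uses 0=Monday.
        target = (reset_day - 1) % 7
        days_ahead = (target - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    if reset_cycle == "MONTHLY":
        if today.day < reset_day:
            return today.replace(day=reset_day)
        if today.month == 12:
            return date(today.year + 1, 1, reset_day)
        return date(today.year, today.month + 1, reset_day)
    # CUSTOM intervals are not modelled in this sketch.
    raise ValueError(f"unsupported reset cycle: {reset_cycle}")


wednesday = date(2024, 4, 10)
weekly = next_reset(wednesday, "WEEKLY", reset_day=0)    # next Sunday
monthly = next_reset(wednesday, "MONTHLY", reset_day=1)  # first of next month
```

Capping the monthly reset day at 28 means the date exists in every month, so no end-of-month special cases are needed.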

Administration

FinOps Dashboard

Navigate to Administration → FinOps & Gateway. The dashboard provides an organization-wide financial overview:

  • Total Spend: Organization-wide spend for the current period, with trend vs. prior period.
  • Top Spenders: Ranked table of agents and teams by spend — identify cost leaders at a glance.
  • Budget Saturation: Progress bars for all active budgets, color-coded by saturation level.
  • Model Breakdown: Spend split by model provider and model ID — see what percentage of spend goes to each model.
  • Hourly Burn Chart: Time-series chart of token consumption and cost over the current period, with anomaly highlights.
  • Blocked Calls: Count of gateway-blocked calls in the current period — a budget that's blocking agents is worth revisiting.

Budget Configuration

Budgets can be applied at three levels. A single model call is checked against all applicable budgets simultaneously — a call is blocked if any applicable budget would be exceeded.

Budget Level | Scope | Use Case
--- | --- | ---
Agent Budget | Applies to all sessions for a specific agent | Limit individual agent spend; useful for agents with variable load or in testing
Team Budget | Applies to all sessions for agents belonging to a team | Departmental or project budget allocation; a shared limit across a group of agents
Organization Budget | Applies to all sessions across the entire org | Hard monthly cap; ensures total org spend never exceeds a threshold
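The "blocked if any applicable budget would be exceeded" rule can be sketched as follows. The Budget class, its field names, and the idea of passing in an estimated cost for the pending call are assumptions of this example; the source does not say how the Gateway estimates a call's cost before it runs.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Budget:
    budget_id: str
    budget_type: str            # "AGENT", "TEAM", or "ORGANIZATION"
    target_id: Optional[str]    # agentId or teamId; None for ORGANIZATION
    limit_usd: Decimal
    spent_usd: Decimal


def applies(budget: Budget, agent_id: str, team_id: str) -> bool:
    """Does this budget cover a call made by the given agent and team?"""
    if budget.budget_type == "ORGANIZATION":
        return True
    if budget.budget_type == "AGENT":
        return budget.target_id == agent_id
    return budget.budget_type == "TEAM" and budget.target_id == team_id


def first_blocking_budget(budgets, agent_id, team_id, estimated_cost_usd):
    """Return the first applicable budget the call would exceed, else None."""
    for budget in budgets:
        if applies(budget, agent_id, team_id) and \
                budget.spent_usd + estimated_cost_usd > budget.limit_usd:
            return budget
    return None


budgets = [
    Budget("b-agent", "AGENT", "agent-a", Decimal("10.00"), Decimal("2.00")),
    Budget("b-team", "TEAM", "team-1", Decimal("45.00"), Decimal("44.99")),
    Budget("b-org", "ORGANIZATION", None, Decimal("1000.00"), Decimal("500.00")),
]

blocker = first_blocking_budget(budgets, "agent-a", "team-1", Decimal("0.05"))
```

In this example the agent budget has plenty of headroom, but the call is still blocked because the team budget would be exceeded.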

Budget Configuration Reference

Field | Type | Required | Description
--- | --- | --- | ---
name | string | required | Human-readable budget name (e.g., "Engineering Team — April").
budgetType | enum | required | AGENT, TEAM, or ORGANIZATION.
targetId | string | optional | agentId or teamId (omit for ORGANIZATION type).
limitUsd | decimal | required | Maximum spend in USD for the budget period.
limitTokens | integer | optional | Maximum total tokens (input + output) for the budget period. Use as an alternative or complement to USD limits.
resetCycle | enum | required | DAILY, WEEKLY, MONTHLY, NEVER, or CUSTOM.
resetDay | integer | optional | For MONTHLY: day of month to reset (1-28). For WEEKLY: day of week (0=Sunday). For CUSTOM: not applicable.
alertThresholds | array | optional | Array of saturation percentages to alert on (e.g., [50, 75, 90]).
alertWebhookUrl | string | optional | URL to POST budget alert payloads to.
enforcementAction | enum | optional | BLOCK (default) or ALERT_ONLY. ALERT_ONLY raises alerts but does not block calls.
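A budget definition built from these fields might look like the dictionary below, with a small client-side check of the documented constraints. The validator is a sketch and not part of Procurator; the targetId value is made up, and two rules are inferred instead of stated in the table: that targetId is expected for AGENT and TEAM budgets, and that the WEEKLY resetDay range is 0-6.

```python
from decimal import Decimal

VALID_TYPES = {"AGENT", "TEAM", "ORGANIZATION"}
VALID_CYCLES = {"DAILY", "WEEKLY", "MONTHLY", "NEVER", "CUSTOM"}
VALID_ACTIONS = {"BLOCK", "ALERT_ONLY"}


def validate_budget(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    for name in ("name", "budgetType", "limitUsd", "resetCycle"):
        if name not in cfg:
            errors.append(f"missing required field: {name}")
    if cfg.get("budgetType") not in VALID_TYPES:
        errors.append("budgetType must be AGENT, TEAM, or ORGANIZATION")
    elif cfg["budgetType"] != "ORGANIZATION" and not cfg.get("targetId"):
        # Inferred rule: only ORGANIZATION budgets omit targetId.
        errors.append("targetId is expected for AGENT and TEAM budgets")
    if cfg.get("resetCycle") not in VALID_CYCLES:
        errors.append("resetCycle is not a known value")
    reset_day = cfg.get("resetDay")
    if reset_day is not None:
        if cfg.get("resetCycle") == "MONTHLY" and not 1 <= reset_day <= 28:
            errors.append("resetDay must be 1-28 for MONTHLY")
        if cfg.get("resetCycle") == "WEEKLY" and not 0 <= reset_day <= 6:
            errors.append("resetDay must be 0-6 (0=Sunday) for WEEKLY")
    if cfg.get("enforcementAction", "BLOCK") not in VALID_ACTIONS:
        errors.append("enforcementAction must be BLOCK or ALERT_ONLY")
    return errors


engineering = {
    "name": "Engineering",
    "budgetType": "TEAM",
    "targetId": "team-engineering",   # hypothetical team ID
    "limitUsd": Decimal("45.00"),
    "resetCycle": "MONTHLY",
    "resetDay": 1,
    "alertThresholds": [50, 75, 90],
    "enforcementAction": "BLOCK",
}

problems = validate_budget(engineering)
```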

Enforcement Actions

When a model call is blocked by the Gateway, the following happens:

  • The call is not forwarded to the model provider — no tokens are consumed and no cost is incurred.
  • The agent receives a structured error: {"error": "BUDGET_EXCEEDED", "budgetId": "...", "currentSpendUsd": 45.23, "limitUsd": 45.00}.
  • The session is marked BUDGET_EXCEEDED in the Sessions ledger.
  • A 100% saturation alert is triggered, regardless of the configured alert thresholds.
  • A budget block event is logged to the Security audit trail.
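On the agent side, the structured error above can be turned into a typed exception so that calling code can stop cleanly or report the overage. This is a sketch under assumptions: the exception class, the helper name, and the sample budgetId value are hypothetical, and the error is assumed to arrive as a JSON string.

```python
import json
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when the Gateway reports a BUDGET_EXCEEDED error."""

    def __init__(self, budget_id, current_spend_usd, limit_usd):
        super().__init__(
            f"budget {budget_id} exceeded: {current_spend_usd} / {limit_usd} USD"
        )
        self.budget_id = budget_id
        self.current_spend_usd = current_spend_usd
        self.limit_usd = limit_usd

    @property
    def overage_usd(self):
        return self.current_spend_usd - self.limit_usd


def raise_for_budget_error(body: str) -> dict:
    """Parse a response body; raise if it is a budget block, else return it."""
    # parse_float=Decimal keeps currency values exact.
    payload = json.loads(body, parse_float=Decimal)
    if payload.get("error") == "BUDGET_EXCEEDED":
        raise BudgetExceeded(
            payload["budgetId"], payload["currentSpendUsd"], payload["limitUsd"]
        )
    return payload


sample = ('{"error": "BUDGET_EXCEEDED", "budgetId": "budget-123", '
          '"currentSpendUsd": 45.23, "limitUsd": 45.00}')
caught = None
try:
    raise_for_budget_error(sample)
except BudgetExceeded as exc:
    caught = exc
```

Because a blocked call consumes no tokens, retrying immediately will not help; the agent has to wait for a budget reset or a limit change.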

Cost Attribution Deep Dive

Every token in Procurator carries attribution metadata through the full stack:

  • Session-level: Each session record carries totalTokens and totalCostUsd fields, updated in real time as turns complete.
  • Turn-level: Each conversation turn records the model used, input tokens, output tokens, and turn cost. This enables granular analysis of which turns in a multi-turn session are expensive.
  • Agent-level: The agent detail page in the Admin Dashboard shows cumulative token and cost totals for the current budget period.
  • Team-level: The Team detail page aggregates costs across all agents in the team.

Budget Alerts

Configure alerts per budget to receive proactive warnings before enforcement kicks in:

# Example alert thresholds
alertThresholds: [50, 75, 90]

# At 50% saturation → "INFO: Budget 'Engineering' at 50% ($22.50 / $45.00)"
# At 75% saturation → "WARN: Budget 'Engineering' at 75% ($33.75 / $45.00)"
# At 90% saturation → "CRITICAL: Budget 'Engineering' at 90% ($40.50 / $45.00)"
# At 100% (blocked) → "ENFORCED: Budget 'Engineering' exceeded — calls blocked"

Alert payloads are sent to the configured webhook URL, Slack channel, or email group associated with the budget.
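Threshold alerts should fire once, when spend first crosses a threshold, and not on every later call. A minimal sketch of that check follows; the function names and the severity mapping (taken from the example messages above) are assumptions, not Procurator's implementation.

```python
from decimal import Decimal

# Severity labels mirror the example messages; other thresholds default to WARN.
SEVERITY = {50: "INFO", 75: "WARN", 90: "CRITICAL"}


def crossed_thresholds(prev_spend, new_spend, limit, thresholds):
    """Thresholds (in percent) passed when spend moved from prev to new."""
    # Compare spend * 100 against threshold * limit to avoid division.
    return [
        t for t in sorted(thresholds)
        if prev_spend * 100 < t * limit <= new_spend * 100
    ]


def alert_message(name, threshold, spend, limit):
    level = SEVERITY.get(threshold, "WARN")
    return f"{level}: Budget '{name}' at {threshold}% (${spend:.2f} / ${limit:.2f})"


limit = Decimal("45.00")
# One large call moves spend from 20.00 to 34.00 and crosses two thresholds.
fired = crossed_thresholds(Decimal("20.00"), Decimal("34.00"), limit, [50, 75, 90])
message = alert_message("Engineering", 50, Decimal("22.50"), limit)
```

Comparing the previous and new spend means a single expensive call can raise several alerts at once, while a call that stays between two thresholds raises none.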

The Gateway

The Gateway is Procurator's AI model API proxy layer. All model calls made by agents route through the Gateway, which provides:

  • Budget enforcement: Pre-call budget checks as described above.
  • Rate limiting: Per-agent and per-team requests-per-minute (RPM) and tokens-per-minute (TPM) limits to prevent provider-side rate limit errors from taking down an entire deployment.
  • Request logging: Every model API request and response is logged with timing data for observability.
  • Credential abstraction: Agents never hold provider API keys. The Gateway injects credentials at call time from the Models registry's secret store.
  • Retry and fallback: Configurable retry logic for transient provider errors, with optional fallback to a secondary model if the primary is unavailable.
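The RPM and TPM limits listed above can be illustrated with a sliding one-minute window that tracks both request count and token count. This is one common approach, shown here as an assumption; the source does not describe the Gateway's actual limiting algorithm, and the class name is hypothetical.

```python
from collections import deque


class MinuteWindowLimiter:
    """Sliding 60-second window over requests and tokens for one agent or team."""

    def __init__(self, rpm_limit: int, tpm_limit: int):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.events = deque()   # (timestamp_seconds, tokens) per admitted call

    def allow(self, now: float, tokens: int) -> bool:
        # Drop calls that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm_limit:
            return False
        if used_tokens + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        return True


limiter = MinuteWindowLimiter(rpm_limit=2, tpm_limit=1000)
results = [
    limiter.allow(now=0.0, tokens=400),    # admitted
    limiter.allow(now=1.0, tokens=400),    # admitted
    limiter.allow(now=2.0, tokens=100),    # rejected: third request in window
    limiter.allow(now=60.5, tokens=700),   # rejected: 400 + 700 tokens > 1000
    limiter.allow(now=60.5, tokens=500),   # admitted: first call has expired
]
```

Timestamps are passed in explicitly so the behaviour is easy to test; production code would read a monotonic clock instead.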

Gateway as Central Control Point

Because all model calls route through the Gateway, adding a new capability (budget enforcement, logging, rate limiting, model fallback) applies universally to all agents without any agent reconfiguration. The Gateway is where organizational policy lives at the infrastructure level.
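Model fallback is a good example of a capability that lives in one place. The sketch below retries a transient failure with exponential backoff, then switches to a secondary model. The exception type, function signature, model names, and backoff values are all hypothetical; the source only states that retry and fallback are configurable.

```python
import time


class TransientProviderError(Exception):
    """Stand-in for a retryable provider failure (timeout, 5xx, and so on)."""


def call_with_fallback(call_model, primary, fallback=None, max_retries=2,
                       base_delay_s=0.5, sleep=time.sleep):
    """Try the primary model with retries, then the fallback model if given."""
    last_error = None
    for model in (primary, fallback):
        if model is None:
            continue
        for attempt in range(max_retries + 1):
            try:
                return model, call_model(model)
            except TransientProviderError as exc:
                last_error = exc
                if attempt < max_retries:
                    # Exponential backoff: 0.5s, 1s, 2s, ...
                    sleep(base_delay_s * 2 ** attempt)
    raise last_error


attempts = []


def fake_call(model):
    """Test double: the primary always fails, the fallback succeeds."""
    attempts.append(model)
    if model == "primary-model":
        raise TransientProviderError("provider unavailable")
    return "ok"


# A no-op sleep keeps the demonstration instant.
used_model, result = call_with_fallback(
    fake_call, "primary-model", "fallback-model", sleep=lambda seconds: None
)
```

Because this logic sits in the proxy layer, agents need no retry code of their own and pick up policy changes without being reconfigured.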

Permissions

  • finops:read: View FinOps dashboard, spend data, and budget status
  • finops:create: Create new budgets
  • finops:modify: Edit budget limits, thresholds, and enforcement actions
  • finops:delete: Remove budgets
  • gateway:configure: Configure Gateway rate limits, retry policies, and fallback models
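As an illustration of how these permissions map to operations, the sketch below checks a caller's granted permission set before an action. The operation names and the helper are hypothetical; only the permission strings come from the list above.

```python
# Hypothetical operation names mapped to the permission each one requires.
REQUIRED_PERMISSION = {
    "view_dashboard": "finops:read",
    "create_budget": "finops:create",
    "update_budget": "finops:modify",
    "delete_budget": "finops:delete",
    "configure_gateway": "gateway:configure",
}


def is_allowed(granted: set, operation: str) -> bool:
    """True when the caller holds the permission the operation requires."""
    required = REQUIRED_PERMISSION.get(operation)
    # Unknown operations are denied by default.
    return required is not None and required in granted


analyst = {"finops:read"}
budget_admin = {"finops:read", "finops:create", "finops:modify", "finops:delete"}

can_view = is_allowed(analyst, "view_dashboard")
can_create = is_allowed(analyst, "create_budget")
can_tune_gateway = is_allowed(budget_admin, "configure_gateway")
```

Keeping gateway:configure separate from the finops permissions lets a budget administrator manage limits without being able to change rate limits or fallback models.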