FinOps & Gateway

Overview

AI model API costs scale directly with usage — and autonomous agents can consume tokens at rates that surprise finance teams. FinOps closes the loop between agent activity and cost management by making spend visible, attributable, and controllable in real time.

Unlike post-hoc billing analysis, Procurator's Gateway enforces budgets proactively: before a model API call is made, the Gateway checks the current spend against the applicable budget. If the call would exceed the limit, it is blocked and the agent receives a budget-exceeded error — no surprise invoice at the end of the month.

Requires Model Cost Rates

FinOps accuracy depends on model cost rates configured in the Models registry. Without rates, Procurator tracks token counts but cannot calculate costs. Set input/output rates per model for full FinOps capability.
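To illustrate why rates matter, the sketch below derives the cost of a single call from its token counts. The `ModelRate` type, the per-million-token unit, and the sample rates are assumptions made for this example and are not taken from the Models registry schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical rate entry: USD per one million tokens."""
    input_usd_per_mtok: Decimal
    output_usd_per_mtok: Decimal


def call_cost_usd(rate: Optional[ModelRate], input_tokens: int,
                  output_tokens: int) -> Optional[Decimal]:
    """Return the cost of one call, or None when no rate is configured."""
    if rate is None:
        # Token counts are still known, but no cost can be derived from them.
        return None
    million = Decimal(1_000_000)
    return (rate.input_usd_per_mtok * input_tokens
            + rate.output_usd_per_mtok * output_tokens) / million


example_rate = ModelRate(Decimal("3.00"), Decimal("15.00"))  # made-up rates
cost = call_cost_usd(example_rate, input_tokens=2_000, output_tokens=500)
```

With no rate configured, the function returns `None` rather than zero, mirroring the distinction between "tokens tracked" and "cost calculated".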

Control Panel

[Screenshot: Procurator FinOps administration interface at app.operativus.ai/procurator/finops. Caption: FinOps & Gateway, real-time cost tracking, budget enforcement, and spend attribution.]

Architecture

Agent Execution Request
        │
        ▼
 ┌─────────────┐
 │   Gateway   │ ← Checks: token rate limits, budget headroom,
 │ (Pre-call)  │   allowed models, organization policy
 └──────┬──────┘
        │
   ┌────┴────┐
   │ BLOCKED │ → Return budget-exceeded / policy-violation error
   │         │   to agent; session marked as BUDGET_EXCEEDED
   └─────────┘
        │
   ┌────┴────┐
   │ ALLOWED │ → Forward call to model provider API
   └────┬────┘
        │
 ┌──────┴──────┐
 │   Gateway   │ ← Records: input tokens, output tokens, latency,
 │ (Post-call) │   model ID, agent ID, team ID, session ID
 └──────┬──────┘
        │
        ▼
FinOps Attribution
 ├── Agent budget: deduct tokens + cost
 ├── Team budget: deduct tokens + cost
 ├── Org budget: deduct tokens + cost
 └── Session record: append token/cost turn data
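The flow in the diagram can be sketched as a single function with a pre-call check, a forwarded call, and a post-call record. Everything named here (`Ledger`, `gateway_call`, `fake_provider`, and the idea that an estimated cost is available before the call) is a hypothetical stand-in for illustration, not Procurator's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ledger:
    """Hypothetical in-memory stand-in for FinOps attribution."""
    limit_usd: float
    spent_usd: float = 0.0
    records: List[Dict] = field(default_factory=list)


def gateway_call(ledger: Ledger, estimated_cost_usd: float,
                 provider: Callable[[str], Dict], prompt: str,
                 ids: Dict[str, str]) -> Dict:
    # Pre-call: block before anything is sent to the provider.
    if ledger.spent_usd + estimated_cost_usd > ledger.limit_usd:
        return {"error": "BUDGET_EXCEEDED",
                "currentSpendUsd": ledger.spent_usd,
                "limitUsd": ledger.limit_usd}
    # Allowed: forward the call.
    response = provider(prompt)
    # Post-call: record usage and deduct the reported cost.
    ledger.spent_usd += response["cost_usd"]
    ledger.records.append({**ids,
                           "input_tokens": response["input_tokens"],
                           "output_tokens": response["output_tokens"],
                           "cost_usd": response["cost_usd"]})
    return response


def fake_provider(prompt: str) -> Dict:
    # Stub provider returning fixed usage figures.
    return {"text": "ok", "input_tokens": 100, "output_tokens": 20, "cost_usd": 0.5}


ledger = Ledger(limit_usd=1.0)
ids = {"agent_id": "agent-1", "team_id": "team-1", "session_id": "s-1"}
first = gateway_call(ledger, 0.5, fake_provider, "hello", ids)
second = gateway_call(ledger, 0.5, fake_provider, "hello", ids)
third = gateway_call(ledger, 0.5, fake_provider, "hello", ids)  # blocked
```

The third call is rejected without reaching the provider, so only two usage records exist afterwards.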

Key Capabilities

💰

Real-Time Cost Tracking

Costs are calculated and attributed within milliseconds of every model call completion — no batch processing lag.

🚦

Pre-Call Budget Enforcement

The Gateway blocks model calls that would exceed budget limits before they reach the provider — not after. Prevent runaway spend entirely.

📊

Multi-Level Attribution

Every token is attributed to an agent, a team, and a model simultaneously. Slice cost data any way you need.

📈

Burn Rate Analytics

View hourly, daily, and weekly burn rates per agent and team. Project forward to estimate month-end spend against budgets.

🔔

Budget Alerts

Receive alerts at configurable saturation thresholds (for example 50%, 75%, and 90%) before agents are blocked, leaving time to take action. An alert is also raised at 100%, when enforcement begins.

📅

Budget Reset Cycles

Configure budgets to reset daily, weekly, monthly, or on a custom interval — aligned to your billing cycles or sprint cadence.
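As a rough illustration of the forward projection mentioned under Burn Rate Analytics, the sketch below extrapolates month-end spend from spend to date. Linear extrapolation is an assumption made here; the projection method Procurator actually uses is not specified, and `project_month_end_spend` is a hypothetical name.

```python
def project_month_end_spend(spend_to_date_usd: float, days_elapsed: int,
                            days_in_period: int) -> float:
    """Linear projection: average daily burn times the period length."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_burn = spend_to_date_usd / days_elapsed
    return daily_burn * days_in_period


# $120 spent in the first 10 days of a 30-day period, against a $300 budget.
projected = project_month_end_spend(120.0, days_elapsed=10, days_in_period=30)
over_budget = projected > 300.0
```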

Administration

FinOps Dashboard

Navigate to Administration → FinOps & Gateway. The dashboard provides an organization-wide financial overview:

Total Spend: Organization-wide spend for the current period, with trend vs. prior period.
Top Spenders: Ranked table of agents and teams by spend — identify cost leaders at a glance.
Budget Saturation: Progress bars for all active budgets, color-coded by saturation level.
Model Breakdown: Spend split by model provider and model ID — see what percentage of spend goes to each model.
Hourly Burn Chart: Time-series chart of token consumption and cost over the current period, with anomaly highlights.
Blocked Calls: Count of gateway-blocked calls in the current period — a budget that's blocking agents is worth revisiting.

Budget Configuration

Budgets can be applied at three levels. A single model call is checked against all applicable budgets simultaneously — a call is blocked if any applicable budget would be exceeded.

Budget Level	Scope	Use Case
Agent Budget	Applies to all sessions for a specific agent	Limit individual agent spend — useful for agents with variable load or in testing
Team Budget	Applies to all sessions for agents belonging to a team	Departmental or project budget allocation — shared limit across a group of agents
Organization Budget	Applies to all sessions across the entire org	Hard monthly cap — ensures total org spend never exceeds a threshold
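The "blocked if any applicable budget would be exceeded" rule can be sketched as follows. The `Budget` type, the `first_exceeded` helper, and the sample figures are hypothetical and used only to show the logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Budget:
    budget_id: str
    budget_type: str  # "AGENT", "TEAM", or "ORGANIZATION"
    limit_usd: float
    spent_usd: float


def first_exceeded(budgets: List[Budget],
                   estimated_cost_usd: float) -> Optional[Budget]:
    """Return the first budget the call would exceed, or None if all have headroom."""
    for budget in budgets:
        if budget.spent_usd + estimated_cost_usd > budget.limit_usd:
            return budget
    return None


applicable = [
    Budget("b-agent", "AGENT", limit_usd=10.0, spent_usd=2.0),
    Budget("b-team", "TEAM", limit_usd=45.0, spent_usd=44.5),
    Budget("b-org", "ORGANIZATION", limit_usd=500.0, spent_usd=120.0),
]
blocking = first_exceeded(applicable, estimated_cost_usd=1.0)
```

Here the agent and organization budgets both have headroom, but the team budget does not, so the call would be blocked on the team budget alone.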

Budget Configuration Reference

Field	Type	Required	Description
name	string	required	Human-readable budget name (e.g., "Engineering Team — April").
budgetType	enum	required	`AGENT`, `TEAM`, or `ORGANIZATION`.
targetId	string	optional	agentId or teamId (omit for ORGANIZATION type).
limitUsd	decimal	required	Maximum spend in USD for the budget period.
limitTokens	integer	optional	Maximum total tokens (input + output) for the budget period. Use as an alternative or complement to USD limits.
resetCycle	enum	required	`DAILY`, `WEEKLY`, `MONTHLY`, `NEVER`, or `CUSTOM`.
resetDay	integer	optional	For MONTHLY: day of month to reset (1-28). For WEEKLY: day of week (0=Sunday). For CUSTOM: not applicable.
alertThresholds	array	optional	Array of saturation percentages to alert on (e.g., `[50, 75, 90]`).
alertWebhookUrl	string	optional	URL to POST budget alert payloads to.
enforcementAction	enum	optional	`BLOCK` (default) or `ALERT_ONLY`. ALERT_ONLY raises alerts but does not block calls.
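A budget built from the fields above might look like the following, together with a small client-side sanity check. The `validate_budget` helper and the `targetId` value are hypothetical; the checks only restate the constraints listed in the table.

```python
from typing import Any, Dict, List

BUDGET_TYPES = {"AGENT", "TEAM", "ORGANIZATION"}
RESET_CYCLES = {"DAILY", "WEEKLY", "MONTHLY", "NEVER", "CUSTOM"}
REQUIRED_FIELDS = ("name", "budgetType", "limitUsd", "resetCycle")


def validate_budget(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    problems = [f"missing required field: {f}"
                for f in REQUIRED_FIELDS if f not in config]
    if config.get("budgetType") not in BUDGET_TYPES:
        problems.append("budgetType must be AGENT, TEAM, or ORGANIZATION")
    if config.get("resetCycle") not in RESET_CYCLES:
        problems.append("resetCycle is not a recognized value")
    if config.get("resetCycle") == "MONTHLY" and "resetDay" in config:
        if not 1 <= config["resetDay"] <= 28:
            problems.append("resetDay must be 1-28 for MONTHLY budgets")
    if config.get("enforcementAction", "BLOCK") not in {"BLOCK", "ALERT_ONLY"}:
        problems.append("enforcementAction must be BLOCK or ALERT_ONLY")
    return problems


engineering_budget = {
    "name": "Engineering Team - April",
    "budgetType": "TEAM",
    "targetId": "team-engineering",  # hypothetical team ID
    "limitUsd": 45.00,
    "resetCycle": "MONTHLY",
    "resetDay": 1,
    "alertThresholds": [50, 75, 90],
    "enforcementAction": "BLOCK",
}
problems = validate_budget(engineering_budget)
```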

Enforcement Actions

When a model call is blocked by the Gateway, the following happens:

The call is not forwarded to the model provider — no tokens are consumed and no cost is incurred.
The agent receives a structured error: {"error": "BUDGET_EXCEEDED", "budgetId": "...", "currentSpendUsd": 45.23, "limitUsd": 45.00}.
The session is marked BUDGET_EXCEEDED in the Sessions ledger.
An alert is triggered at 100% saturation, regardless of the configured thresholds.
A budget block event is logged to the Security audit trail.
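On the agent side, the structured error can be turned into a typed exception so that calling code can react to it explicitly. This is a minimal sketch: the `BudgetExceededError` class, the `raise_for_budget` helper, and the `b-123` budget ID are hypothetical, and only the JSON field names come from the error shown above.

```python
import json


class BudgetExceededError(Exception):
    """Raised when the Gateway reports a BUDGET_EXCEEDED error."""

    def __init__(self, budget_id: str, current_spend_usd: float, limit_usd: float):
        super().__init__(
            f"budget {budget_id} exceeded: "
            f"{current_spend_usd:.2f} / {limit_usd:.2f} USD")
        self.budget_id = budget_id
        self.current_spend_usd = current_spend_usd
        self.limit_usd = limit_usd


def raise_for_budget(response_body: str) -> dict:
    """Parse a Gateway response and raise if it is a budget block."""
    payload = json.loads(response_body)
    if payload.get("error") == "BUDGET_EXCEEDED":
        raise BudgetExceededError(payload["budgetId"],
                                  payload["currentSpendUsd"],
                                  payload["limitUsd"])
    return payload


# "b-123" is a placeholder budget ID for this example.
sample = ('{"error": "BUDGET_EXCEEDED", "budgetId": "b-123", '
          '"currentSpendUsd": 45.23, "limitUsd": 45.00}')
try:
    raise_for_budget(sample)
    caught = None
except BudgetExceededError as exc:
    caught = exc
```

An agent that catches this exception can stop issuing calls for the session instead of retrying, since retries would be blocked for the same reason.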

Cost Attribution Deep Dive

Every token in Procurator carries attribution metadata through the full stack:

Session-level: Each session record carries a totalTokens and totalCostUsd field, updated in real time as turns complete.
Turn-level: Each conversation turn records the model used, input tokens, output tokens, and turn cost. This enables granular analysis of which turns in a multi-turn session are expensive.
Agent-level: The agent detail page in the Admin Dashboard shows cumulative token and cost totals for the current budget period.
Team-level: The Team detail page aggregates costs across all agents in the team.
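The roll-up from turn records to session, agent, and team totals can be sketched as a simple grouping. The record field names (`session_id`, `agent_id`, `team_id`, `input_tokens`, `output_tokens`, `cost_usd`) and the `aggregate` helper are assumptions for illustration and may not match Procurator's stored schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(turns: Iterable[dict], key: str) -> Dict[str, Tuple[int, float]]:
    """Sum tokens and cost per value of `key` (e.g. session, agent, or team)."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for turn in turns:
        bucket = totals[turn[key]]
        bucket[0] += turn["input_tokens"] + turn["output_tokens"]
        bucket[1] += turn["cost_usd"]
    return {k: (int(v[0]), v[1]) for k, v in totals.items()}


turns = [
    {"session_id": "s-1", "agent_id": "a-1", "team_id": "t-1",
     "input_tokens": 100, "output_tokens": 50, "cost_usd": 0.25},
    {"session_id": "s-1", "agent_id": "a-1", "team_id": "t-1",
     "input_tokens": 200, "output_tokens": 100, "cost_usd": 0.5},
    {"session_id": "s-2", "agent_id": "a-2", "team_id": "t-1",
     "input_tokens": 50, "output_tokens": 50, "cost_usd": 0.125},
]
by_session = aggregate(turns, "session_id")
by_team = aggregate(turns, "team_id")
```

Because every turn carries all three identifiers, the same records can be grouped by any of them without a separate attribution pass.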

Budget Alerts

Configure alerts per budget to receive proactive warnings before enforcement kicks in:

# Example alert thresholds
alertThresholds: [50, 75, 90]

# At 50% saturation → "INFO: Budget 'Engineering' at 50% ($22.50 / $45.00)"
# At 75% saturation → "WARN: Budget 'Engineering' at 75% ($33.75 / $45.00)"
# At 90% saturation → "CRITICAL: Budget 'Engineering' at 90% ($40.50 / $45.00)"
# At 100% (blocked)  → "ENFORCED: Budget 'Engineering' exceeded — calls blocked"
          

Alert payloads are sent to the configured webhook URL, Slack channel, or email group associated with the budget.
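One way to decide which alerts fire after a spend update is to compare saturation before and after the update. The sketch below reproduces the message format from the example thresholds above; the `crossed_thresholds` and `alert_message` helpers and the severity mapping for other threshold values are assumptions.

```python
from typing import List

# Severity labels mirror the example thresholds; other values default to INFO.
SEVERITY = {50: "INFO", 75: "WARN", 90: "CRITICAL"}


def crossed_thresholds(prev_spend: float, new_spend: float, limit: float,
                       thresholds: List[int]) -> List[int]:
    """Thresholds (in percent) passed when spend moves from prev_spend to new_spend."""
    prev_pct = prev_spend / limit * 100
    new_pct = new_spend / limit * 100
    return [t for t in sorted(thresholds) if prev_pct < t <= new_pct]


def alert_message(name: str, threshold: int, limit: float) -> str:
    """Format an alert line showing the threshold amount against the limit."""
    level = SEVERITY.get(threshold, "INFO")
    at_usd = limit * threshold / 100
    return f"{level}: Budget '{name}' at {threshold}% (${at_usd:.2f} / ${limit:.2f})"


# Spend jumps from $20.00 to $34.00 on a $45.00 budget: 50% and 75% are crossed.
fired = crossed_thresholds(20.00, 34.00, 45.00, [50, 75, 90])
messages = [alert_message("Engineering", t, 45.00) for t in fired]
```

Comparing the previous and new saturation means each threshold fires once per crossing rather than on every call made above it.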

The Gateway

The Gateway is Procurator's AI model API proxy layer. All model calls made by agents route through the Gateway, which provides:

Budget enforcement: Pre-call budget checks as described above.
Rate limiting: Per-agent and per-team requests-per-minute (RPM) and tokens-per-minute (TPM) limits to prevent provider-side rate limit errors from taking down an entire deployment.
Request logging: Every model API request and response is logged with timing data for observability.
Credential abstraction: Agents never hold provider API keys. The Gateway injects credentials at call time from the Models registry's secret store.
Retry and fallback: Configurable retry logic for transient provider errors, with optional fallback to a secondary model if the primary is unavailable.
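The retry-and-fallback behavior can be sketched as two nested loops: retry the current model a bounded number of times, then move to the next one. The `TransientProviderError` class, the `call_with_fallback` helper, and the model names are hypothetical, and a real deployment would normally add a backoff delay between attempts.

```python
from typing import Callable, List, Optional, Sequence


class TransientProviderError(Exception):
    """Hypothetical marker for retryable provider failures."""


def call_with_fallback(models: Sequence[str], send: Callable[[str], str],
                       max_attempts: int = 2) -> str:
    """Try each model in order, retrying transient errors up to max_attempts times."""
    last_error: Optional[Exception] = None
    for model in models:  # primary first, then fallbacks
        for _ in range(max_attempts):
            try:
                return send(model)
            except TransientProviderError as exc:
                last_error = exc
    raise RuntimeError("all models failed") from last_error


attempts: List[str] = []


def flaky_send(model: str) -> str:
    # Stub: the primary model always fails, the secondary succeeds.
    attempts.append(model)
    if model == "primary-model":
        raise TransientProviderError("provider unavailable")
    return f"{model}: ok"


result = call_with_fallback(["primary-model", "secondary-model"], flaky_send)
```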

Gateway as Central Control Point

Because all model calls route through the Gateway, adding a new capability (budget enforcement, logging, rate limiting, model fallback) applies universally to all agents without any agent reconfiguration. The Gateway is where organizational policy lives at the infrastructure level.

Permissions

finops:read - View FinOps dashboard, spend data, and budget status
finops:create - Create new budgets
finops:modify - Edit budget limits, thresholds, and enforcement actions
finops:delete - Remove budgets
gateway:configure - Configure Gateway rate limits, retry policies, and fallback models
