Agents

The agents deployed on your workspace. Open the code, or run one live on the harness.

Author an agent in plain language, or bring your own code.

1 Author your agent
✦ MiddyMind will scaffold this onto claude-opus-4-8, register the skills it implies, and wire the hooks. You can switch to Bring your code anytime to take full control — the engine binding below stays the same.
2 Entity type & connectors
connector.salesforce · CRM accounts, contacts, and opportunities · managed
connector.zendesk · Support tickets and conversation history · managed
connector.gong · Call recordings and conversation intelligence · managed
connector.snowflake · Warehouse tables and derived facts · managed
connector.kb · Knowledge base / docs for grounding · managed
Bring data
CSV · JSON · text · PDF · images — read into the agent's first run
3 Policy — autonomy & caps
per turn · hard cap, enforced by the runtime
4 Capabilities — tools & skills
Leave all unchecked for full access (default). Check any to restrict this agent to only what's checked.
✓ Validated · agent will spin up on the harness in-tenant (dedicated-VPC) with the Signal Brief injected each turn.
Agent
running live on the MiddyMind harness
Open an agent from Explore to chat.
Every reply runs live on the harness and is saved to your Memory Fabric.

Agent Info

Status
Source
Model
Autonomy
Bound entity
Cost cap
Last run

Spec

Logs & debug

Open an agent to see its backend runs, model calls, and errors.

Fleet & metering

Every agent your teams have deployed on the engine, and the two metered counters that price it. Counts and signatures leave the perimeter; data never does.

Deployed agents
on your workspace
Runs (recent)
orchestration runs
Spend (recent runs)
USD

Deployed agents

+ Deploy agent
Agent · Entity · Status · Turns / 7d · Entities · Eval vs SLA · Actions

Recent runs

View in Audit
No runs yet — deploy an agent and run it.

Agent Templates

Deploy a working, governed agent in one screen. Each template is eval-gated — it goes live only after passing on your own data.

Memory Fabric

Every entity your workspace knows, the typed + sourced facts about it, and its interaction timeline. This is what your agents read before they act.

Entities

Select an entity to see its facts and timeline.

Connections

Bring data into your Memory Fabric. Upload a file to ingest it as facts now, or connect a managed source. Programmatic ingestion uses your API key.

Ingest a file → Memory Fabric

CSV → one fact per row · JSON object → one per key · JSON array → one per item · text → one document fact. Max 10 MB.
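
The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the stated mapping, not the console's actual ingester: the function name `file_to_facts`, the `{"key", "value"}` payload shape, and routing by file extension are all assumptions, and PDF and image handling is left out.

```python
import csv
import io
import json

MAX_BYTES = 10 * 1024 * 1024  # the 10 MB limit stated above


def file_to_facts(filename: str, data: bytes) -> list[dict]:
    """Split an uploaded file into fact payloads (hypothetical shape)."""
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 10 MB limit")
    text = data.decode("utf-8")
    name = filename.lower()

    if name.endswith(".csv"):
        # CSV: one fact per row, keyed by row position.
        rows = csv.DictReader(io.StringIO(text))
        return [{"key": f"row:{i}", "value": row} for i, row in enumerate(rows)]

    if name.endswith(".json"):
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            # JSON object: one fact per key.
            return [{"key": k, "value": v} for k, v in parsed.items()]
        if isinstance(parsed, list):
            # JSON array: one fact per item.
            return [{"key": f"item:{i}", "value": v} for i, v in enumerate(parsed)]
        # A bare JSON scalar is treated as one document fact (assumption).
        return [{"key": "document", "value": parsed}]

    # Plain text: one document fact.
    return [{"key": "document", "value": text}]
```

For example, a two-row CSV yields two facts, and `{"x": 1, "y": 2}` yields one fact for `x` and one for `y`.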

Connected sources

Available connectors

Programmatic access

Your current session key (Bearer token). Use it to write facts or ingest from your own systems:

Operate · the Conductor drives the turn

Run a Goal

State a goal; the Conductor plans, routes to sub-agents, runs, and verifies — grounded in the entity's memory. The latency strip tracks read-time p95 against the 150ms SLA. The outcome routes to one-click approval and is fully replayable in Audit.
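
As a rough sketch of what the latency strip reports, the snippet below computes p50 and p95 from a list of read latencies and compares p95 with the 150 ms SLA. The nearest-rank percentile method and the names `percentile` and `latency_strip` are assumptions for illustration; the harness may aggregate differently.

```python
import math

SLA_P95_MS = 150.0  # the read-time SLA quoted above


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_strip(read_ms: list[float]) -> dict:
    """Summarise read latencies the way the strip labels them (hypothetical keys)."""
    p95 = percentile(read_ms, 95)
    return {
        "p50_ms": percentile(read_ms, 50),
        "p95_ms": p95,
        "within_sla": p95 <= SLA_P95_MS,
    }
```

With nine reads at 100 ms and one at 400 ms, p50 is 100 ms but p95 is 400 ms, so the strip would show a breach even though the median looks healthy.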

p50 read
p95 read · SLA 150ms
throughput · 1h
Surface 1 · the supervisor's default

Work Queue

Open entities that need a human, ranked by a composite severity + SLA-remaining score. A low-severity item near breach still surfaces. Rows carry only triage essentials — full timeline and evidence are one keypress away.
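
One way to picture the composite ranking is a weighted sum of normalised severity and SLA urgency. The sketch below is an assumption throughout: the 1 to 4 severity scale, the equal weights, and the names `WorkItem`, `triage_score`, and `rank_queue` are illustrative, not the engine's actual formula.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    entity: str
    severity: int              # 1 (low) .. 4 (critical); scale is an assumption
    sla_total_min: float       # full SLA window in minutes
    sla_remaining_min: float   # minutes left before breach


def triage_score(item: WorkItem, w_severity: float = 0.5, w_urgency: float = 0.5) -> float:
    """Blend severity and SLA urgency into one score in [0, 1]."""
    severity = (item.severity - 1) / 3  # normalise 1..4 to 0..1
    remaining = max(0.0, min(item.sla_remaining_min, item.sla_total_min))
    urgency = 1.0 - remaining / item.sla_total_min  # 1.0 means at or past breach
    return w_severity * severity + w_urgency * urgency


def rank_queue(items: list[WorkItem]) -> list[WorkItem]:
    """Highest composite score first."""
    return sorted(items, key=triage_score, reverse=True)
```

Under these weights, a severity-1 item with 5 minutes left of a 240-minute SLA outranks a severity-3 item that still has 230 minutes, which is the "low-severity item near breach still surfaces" behaviour.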

J/K move · Space peek · E assign · ⇧E escalate
Sev · Entity · What's happening · Owner · Waiting · Assignee
No open work items for your workspace.
Surface 2 · make-or-break · highest-frequency interaction

Approval Inbox

One glance → one decision. Every card carries its evidence and recommendation, so the human is verifying, not researching. Confidence is a calibrated word-label, never a bare percentage. Irreversible actions require visible sign-off.

No pending approvals. Agents are acting within policy.
Surface · is this the same thing?

Resolution Review

Candidate pairs the engine flagged as possibly one entity — reviewed side-by-side with a per-attribute algorithm table (exact · Jaro-Winkler · phonetic · embedding) and a weighted consensus. Merges are reversible and audited; a wrong link is a correction, never data loss.
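
The weighted consensus can be illustrated as follows, assuming each algorithm has already produced a similarity in [0, 1] for each attribute. The weights, the attribute names, and the choice to average attributes equally are hypothetical; only the four algorithm names come from the description above.

```python
# Hypothetical weights per algorithm; the engine's real weights are not documented here.
WEIGHTS = {"exact": 0.4, "jaro_winkler": 0.3, "phonetic": 0.1, "embedding": 0.2}


def weighted_consensus(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the algorithms that actually produced a score."""
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no weighted scores to combine")
    return sum(scores[name] * w for name, w in used.items()) / total


def pair_consensus(table: dict[str, dict[str, float]], weights: dict[str, float] = WEIGHTS):
    """Return per-attribute consensus plus an overall score (plain mean of attributes)."""
    per_attribute = {attr: weighted_consensus(s, weights) for attr, s in table.items()}
    overall = sum(per_attribute.values()) / len(per_attribute)
    return per_attribute, overall
```

Renormalising over the algorithms that ran keeps an attribute comparable when, say, a phonetic score does not apply to a domain name.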

Fleet · running agents mapped to the engine lifecycle

Fleet

Every deployed agent grouped by the lifecycle stage its template declared — with live status, last/next run, spend and a word-label standing. Read-only: a thin projection of real rows, not an execution canvas.

read-only · grounded in the cost ledger + schedules
Activity · the live build-log

Activity

A streamed feed of real lifecycle events — runs completed, facts written, approvals raised, ship-gate decisions — straight from the append-only audit log. No canned feed; every entry is a real event for your workspace.

Intake · completeness review (human-in-the-loop)

Intake Queue

Incomplete intakes the agents flagged for follow-up — each shows the missing fields and the source so a reviewer can request them or mark the record complete. Driven by real request_more_information facts, not fixtures.

Entity · Missing fields · Summary · Source · Flagged
No incomplete intakes — every record is complete.
Surface 3 · the world model for one entity

Entity View

What does the system actually know about this thing — resolved identity, the sources it was stitched from with confidence, a bitemporal timeline, and the freshness of every fact. Every fact drills to its source and last-verified time.
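
A bitemporal timeline keeps two clocks per fact: when it was true in the world (valid time) and when the system recorded it (transaction time). The sketch below shows an "as of" lookup over such records; the `FactVersion` fields and the `as_of` function are assumptions for illustration, not the Memory Fabric schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    key: str
    value: str
    valid_from: datetime           # when the fact became true in the world
    valid_to: Optional[datetime]   # None means still true
    recorded_at: datetime          # transaction time: when the system learned it


def as_of(versions, key, valid_at, known_at):
    """What did the system believe at `known_at` about `key` at `valid_at`?"""
    candidates = [
        v for v in versions
        if v.key == key
        and v.recorded_at <= known_at
        and v.valid_from <= valid_at
        and (v.valid_to is None or valid_at < v.valid_to)
    ]
    # The most recently recorded version wins, so later corrections override earlier ones.
    return max(candidates, key=lambda v: v.recorded_at, default=None)
```

Replaying with an earlier `known_at` returns the value as it was believed then, even if a later correction changed it.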

No entities yet
Stitched sources
Surface 4 · org chart of your AI workforce

Agent Roster

Every agent, its model, what it's doing right now, and live spend against caps. A soft alert (amber bar + bell) is observability; a hard cap (red bar + stop) is enforcement — the two are visually distinct. One bad loop must never become a $14,000 bill.
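
The difference between the two thresholds can be sketched as a loop in which the soft alert only sets a flag while the hard cap refuses the next turn. The function name, the per-turn cost list, and the rule that a turn is refused when it would push spend past the cap are assumptions for illustration.

```python
def run_until_cap(turn_costs: list[float], soft_alert_usd: float, hard_cap_usd: float):
    """Run turns in order; return (turns completed, total spent, soft alert raised)."""
    if soft_alert_usd > hard_cap_usd:
        raise ValueError("soft alert should not exceed the hard cap")
    spent = 0.0
    completed = 0
    alerted = False
    for cost in turn_costs:
        if spent + cost > hard_cap_usd:
            break  # enforcement: the turn is refused and the loop stops
        spent += cost
        completed += 1
        if spent >= soft_alert_usd:
            alerted = True  # observability only: the agent keeps running
    return completed, spent, alerted
```

With ten $2 turns, a $5 soft alert, and a $9 hard cap, the alert fires on the third turn and the run stops after the fourth at $8.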

spend window: daily · weekly · monthly
No agents have run yet. Deploy one from Create.
Surface · Live Eval & validation (PRD §B.9 / §B.10)

Eval

Static benchmarks lie — frozen test sets get memorized while the real workload doesn't improve. The eval set is the tenant's own data, fresh; cases the agent got wrong are promoted to the top. Pass-rate by intent against SLA tells a supervisor which intents are safe to push toward auto.

live-sampled · anonymized · replayed as eval
No eval signals yet — they accrue as the workspace handles live interactions.
Memory benchmark
Long-conversation recall, run in an isolated sandbox that never touches tenant memory. It complements Live Eval (outcomes) and does not replace it.
Not run yet.
The validation process — converse → harvest → grade → tune → deploy (§B.10), gated
1 Converse & harvest
Live, anonymized interactions are continuously sampled and replayed as eval cases: the test set is the tenant's own fresh data, not a frozen benchmark.

2 Grade by outcome
Each trajectory is graded against the vertical's verifier (resolution, recurrence, reconciliation, replay-completeness) and scrubbed of PII before any signal leaves the perimeter.

3 Promote failures
Cases the agent got wrong rise to the top; skills below threshold are flagged as new, amplified, or thinning so regressions are caught early.

4 Tune (per-tenant)
Graded trajectories feed per-tenant retrieval, routing, and optional fine-tunes. The moat: 10,000 in-tenant graded trajectories a competitor never had access to.

5 Gated deploy
A release that regresses pass-rate on a tenant's own promoted-failure set does not advance (§16.2). An intent earns its way escalate → one-click → auto only when eval clears SLA.
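
The two gates in the final step can be sketched as simple checks. This is a minimal illustration under stated assumptions: results are booleans per eval case, "regresses" means a strictly lower pass-rate than the current release, and "clears SLA" means pass-rate at or above a threshold. The function names are hypothetical.

```python
LADDER = ["escalate", "one-click", "auto"]  # the tier order named above


def pass_rate(results: list[bool]) -> float:
    if not results:
        raise ValueError("no eval results")
    return sum(results) / len(results)


def release_may_advance(baseline: list[bool], candidate: list[bool]) -> bool:
    """Block a release that lowers pass-rate on the promoted-failure set."""
    return pass_rate(candidate) >= pass_rate(baseline)


def next_tier(current: str, intent_pass_rate: float, sla_pass_rate: float) -> str:
    """Move an intent up one tier only when its eval clears the SLA threshold."""
    i = LADDER.index(current)
    if intent_pass_rate >= sla_pass_rate and i < len(LADDER) - 1:
        return LADDER[i + 1]
    return current
```

An intent at `escalate` with a 0.97 pass-rate against a 0.95 SLA would move to `one-click`; at 0.90 it stays where it is.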

Surface 5 · make-or-break · the compliance officer's home

Audit & Rights

Two jobs: time-travel / replay, and rights fulfillment. Render any past run as a read-only DAG — goal → sub-goals → skill calls → outcomes — each node click-to-evidence. The graph is something you read, never edit. An audit is a query, not a three-week project.

valid-time + advanced (tx-time) ↗
No runs to replay yet. Run an agent to populate the audit DAG.
Surface 6 · the capability DAG — a view, NOT an editor

Registry

The skills and tools available to agents, with a dependency graph (nodes = skills, edges = dependsOn). You author the vocabulary; the Conductor authors the sentence. There is no place to draw execution order — if a draggable canvas ever appears here, it's a regression (§4 non-goal).

⚠ This shows capabilities and their dependencies — not the order the Conductor will run them. Execution order is read after the fact in Audit.
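
A registry shaped like this (nodes are skills, edges are dependsOn) can be checked for dangling or circular dependencies without implying any run order. The sketch below uses Python's standard `graphlib` only to detect cycles; the `validate_registry` name and the dict-of-sets input are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def validate_registry(depends_on: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the graph is well-formed."""
    problems = []
    for skill, deps in depends_on.items():
        for dep in sorted(deps):
            if dep not in depends_on:
                problems.append(f"{skill} depends on unregistered skill {dep}")
    try:
        # prepare() only checks for cycles here; no execution order is produced.
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        problems.append("dependency cycle: " + " -> ".join(err.args[1]))
    return problems
```

A registry such as `{"summarize": {"fetch"}, "fetch": set()}` returns no problems, while a skill that depends on a name missing from the registry is reported.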
No skills registered.
Surface 7 · mostly the platform engineer's surface

Connectors & Policy

A source catalog with sync health, schema-drift alerts that propose remappings, and the policy editor. On a breaking change the connector pauses and requires manual review rather than silently propagating downstream.

Source catalog
No connected sources yet.
Policy editor · per agent · reversible HITL ladder
Agent · $ cap / turn · Bound entity · HITL tier (click to change · reversible)
No agents deployed yet.

Admin

Owner-only · logging · metrics · system health

LLM call log

Time · Agent · Model · Status · In · Out · Cost · Latency · Trace

Audit trail

Time · Tenant · Actor · Event · Entity · Trace

Auth events

Time · Subject · Method · Outcome · Route · IP

Debug

Your workspace · recent activity, spend, and errors — your tenant only

Recent model calls

Time · Agent · Model · Status · Tokens · Cost · Trace

Recent runs

Run · Status · Prompt · Cost · When

Errors

Where · Detail
No errors.
Account · your workspace

Account

Settings · this workspace

Studio & branding

Configure how this client's console and demo look and behave. Defaults come from the client plugin; your edits are saved per tenant and applied live on the next load.