Trust & Safety

Daedalus’s progressive learning capability — the core of its value — is also the source of its responsibility. An AI that improves over time needs guardrails that improve with it.

Layer 1: Behavioral (Values) — Golden Rules, ASK→OFFER→CONFIRM pattern, steering files. These guide behavior when the agent is functioning well. They’re necessary but not sufficient — they rely on the agent choosing to follow them.
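The ASK→OFFER→CONFIRM pattern can be pictured as a small state machine: the agent gathers intent, presents a concrete proposal, and acts only after explicit approval, with any rejection returning it to the asking phase. The sketch below is illustrative only; the type and function names are assumptions, not Daedalus's actual API.

```go
package main

import "fmt"

// Phase models the ASK→OFFER→CONFIRM pattern as a simple state machine.
// These names are illustrative, not Daedalus's real types.
type Phase int

const (
	Ask     Phase = iota // gather intent before acting
	Offer                // present a concrete proposal
	Confirm              // act only after explicit approval
	Done
)

// advance moves to the next phase only on approval; a rejection at any
// point returns to Ask rather than proceeding toward action.
func advance(p Phase, approved bool) Phase {
	if !approved {
		return Ask
	}
	switch p {
	case Ask:
		return Offer
	case Offer:
		return Confirm
	case Confirm:
		return Done
	}
	return p
}

func main() {
	p := Ask
	for _, approved := range []bool{true, true, true} {
		p = advance(p, approved)
	}
	fmt.Println(p == Done) // true: three approvals walk the full pattern
}
```

The point of the shape is that there is no transition from Ask straight to Done: acting always passes through an offer and a confirmation.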

Layer 2: Structural (Architecture) — Hard constraints enforced by the Go platform layer, not by the AI model. Write-ahead logging, approval queues, scope boundaries, rollback capability. The agent cannot bypass these because they’re in a separate process with separate authority.
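A minimal sketch of what "enforced by the platform layer" means in practice: the agent can only produce proposals, and the scope boundary is checked before anything reaches the approval queue. All names here are assumptions for illustration, not Daedalus's real implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Proposal is what the agent may produce; it carries no authority to execute.
type Proposal struct {
	Path   string // file the agent wants to change
	Action string
}

var ErrOutOfScope = errors.New("proposal outside permitted scope")

// Platform holds the hard constraints. This code runs in a separate
// process from the model, so the agent cannot bypass it.
type Platform struct {
	scope   string     // permitted directory prefix (scope boundary)
	pending []Proposal // the approval queue: nothing runs until queued and approved
}

// Submit enforces the scope boundary before anything enters the queue.
func (pl *Platform) Submit(p Proposal) error {
	if !strings.HasPrefix(p.Path, pl.scope) {
		return ErrOutOfScope
	}
	pl.pending = append(pl.pending, p)
	return nil
}

func main() {
	pl := &Platform{scope: "/workspace/"}
	fmt.Println(pl.Submit(Proposal{Path: "/workspace/app.go", Action: "edit"})) // <nil>
	fmt.Println(pl.Submit(Proposal{Path: "/etc/passwd", Action: "edit"}))       // proposal outside permitted scope
}
```

Because the check lives in the platform process, "forgetting" the rule on the model side changes nothing: the out-of-scope proposal never enters the queue.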

Layer 3: Observational (Transparency) — Audit trail, trust tiers for self-modification, anomaly detection, behavioral drift monitoring. The user sees exactly what changed, when, and why.

The intelligence layer (Python/model) and the platform layer (Go) are separate processes with separate authority. The agent can propose changes but cannot execute them without the platform’s approval. This means forgetting a behavioral rule doesn’t result in unconstrained action — the architecture prevents it.

Making Daedalus open source means someone could fork it and strip out the safety layer. Our approach: make the safety architecture so deeply integrated that removing it breaks core functionality. The approval queue is the channel through which specs flow to execution, not a gate bolted on in front of that flow. Audit logging is part of the state-transition logic, not a side effect.
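One way to read "part of the state-transition logic, not a side effect" is that the transition function returns the new state *together with* its audit record, so no code path advances state without producing the log entry. A minimal sketch, with invented names:

```go
package main

import "fmt"

// State is the platform's committed state; Audit is the record of one
// transition. Both types are illustrative assumptions.
type State struct{ Version int }

type Audit struct {
	From, To int
	Reason   string
}

// apply is the only way to advance state. The audit record is part of its
// return value: stripping the logging means stripping the transition itself.
func apply(s State, reason string) (State, Audit) {
	next := State{Version: s.Version + 1}
	return next, Audit{From: s.Version, To: next.Version, Reason: reason}
}

func main() {
	s := State{Version: 1}
	s, a := apply(s, "approved spec executed")
	fmt.Println(s.Version, a.From, a.To) // 2 1 2
}
```

A fork that deletes the audit path loses the ability to transition state at all, which is the integration property the paragraph describes.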

The responsible path should be the easy path. The irresponsible path should require deliberate effort.