Part of Alfred’s six-layer architecture. The Data layer captures your world and feeds it into the vault.
Your world, flowing in
You don’t have to hand everything to Alfred manually. Streams are data pipelines that capture events from external sources — your email, your calendar, your payments, your conversations — and deliver them to Alfred’s Inbox automatically. Once a stream is active, Alfred receives a continuous feed of structured events. The Curator processes each one, creating and updating vault records just as if you’d shared the content yourself.

## How streams work
Every stream follows the same pattern:

1. **An event occurs.** Something happens in the outside world — an email arrives, a calendar event starts, a payment is processed, a conversation takes place.
2. **The stream captures it.** Your configured stream detects the event and wraps it in a StreamEvent envelope with metadata: source, timestamp, tenant, and the raw payload.
## Source types
Streams come in three flavours, depending on how the source delivers data.

### Scheduled Pull — Alfred reaches out on an interval

A Temporal workflow polls an external API on an interval you define. Alfred reaches out, checks for new data, and pulls it in.
| Source | What it captures | Typical interval |
|---|---|---|
| Gmail | New emails and threads (full email content, not just metadata) | Every 5 minutes |
| Calendar | Upcoming and past events | Every 15 minutes |
| Notion | Pages, databases, and block-level content (API token auth) | Every 15 minutes |
| Health data | Activity, sleep, vitals | Every 30 minutes |
| OpenClaw sessions | Your Alfred conversations and subagent runs | Every 5 minutes |
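The scheduled-pull pattern reduces to a single fetch-and-emit step that a scheduler (such as a Temporal workflow) runs on the configured interval. A minimal sketch in Python, where `fetch_new_items` and `emit_stream_event` are hypothetical stand-ins rather than Alfred's actual API:

```python
from datetime import datetime, timezone

def poll_once(fetch_new_items, emit_stream_event, cursor=None):
    """One scheduled-pull iteration: fetch anything new since `cursor`,
    wrap each item in a minimal event dict, emit it, and return the
    updated cursor for the next run."""
    items, new_cursor = fetch_new_items(cursor)
    for item in items:
        emit_stream_event({
            "received_at": datetime.now(timezone.utc).isoformat(),
            "payload": item,
        })
    return new_cursor
```

A scheduler then invokes `poll_once` every 5 or 15 minutes, persisting the returned cursor between runs so only new data is fetched.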
### Webhook Push — external services POST events instantly

An external service POSTs events directly to your dedicated webhook URL. Alfred receives them instantly as they happen.
Each webhook stream generates a unique, secure URL. Copy it from your dashboard and configure it in the external service.
| Source | What it captures |
|---|---|
| Polar | Payment confirmations and subscription events |
| GitHub | Commits, PRs, issues, and releases |
| Stripe | Charges, invoices, and subscription changes |
| Custom | Any service that supports webhook delivery |
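Webhook endpoints normally verify that a POST really came from the configured service; GitHub and Stripe, for example, sign each delivery with an HMAC over the raw request body. Alfred's exact verification scheme isn't specified here, but the general pattern is a constant-time HMAC comparison (header names and signature formats vary by service):

```python
import hashlib
import hmac

def verify_webhook_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the signature the service sent, in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```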
### Realtime — a persistent connection for continuous data

A long-lived WebSocket connection maintains a continuous feed for sources that produce data in real time.
Realtime streams stay connected as long as your Alfred is running, automatically reconnecting if the connection drops.
| Source | What it captures |
|---|---|
| Omi | Ambient audio from your wearable — speech detection, local transcription, quality-gated stream events |
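Automatic reconnection is usually driven by an exponential-backoff schedule. Alfred's actual reconnect policy isn't documented here; a common sketch, with illustrative base and cap values:

```python
def reconnect_delays(base: float = 1.0, cap: float = 60.0):
    """Yield reconnect wait times: 1s, 2s, 4s, ... capped at `cap`.
    The base and cap are illustrative, not Alfred's actual settings."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

# Outer loop sketch (connect/consume are hypothetical callables):
#
# for delay in reconnect_delays():
#     try:
#         ws = connect()      # open the persistent WebSocket
#         consume(ws)         # blocks while the feed is healthy
#     except ConnectionError:
#         time.sleep(delay)   # back off, then try again
```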
### Omi audio pipeline
Omi delivers raw PCM audio over a persistent WebSocket. Alfred processes it through a local pipeline:

- RMS energy speech detection — audio frames are evaluated against an RMS energy threshold of 300. Silence is discarded immediately.
- Speech grouping — consecutive speech segments are grouped together. A gap of 60 seconds or more starts a new group.
- Local transcription — each speech group is transcribed by `whisper-large-v3` running locally (int8 quantization, CPU-only, 4GB RAM via the `learn` container).
- Quality gate — transcripts must pass language confidence > 0.5 and contain at least 5 words; otherwise they are dropped.
- Stream events — passing transcripts are emitted as standard StreamEvents into the Inbox.
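The numeric gates in this pipeline are simple enough to sketch directly. This assumes 16-bit PCM frames arrive as lists of integer samples (the frame format is an assumption; the thresholds are the values documented above):

```python
import math

RMS_THRESHOLD = 300       # documented silence threshold
MIN_WORDS = 5             # documented quality gate: at least 5 words
MIN_CONFIDENCE = 0.5      # documented language-confidence gate

def frame_rms(samples):
    """RMS energy of one PCM frame (non-empty list of int samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples):
    """Frames at or above the RMS threshold count as speech."""
    return frame_rms(samples) >= RMS_THRESHOLD

def passes_quality_gate(transcript, lang_confidence):
    """Keep a transcript only if confidence > 0.5 and it has >= 5 words."""
    return lang_confidence > MIN_CONFIDENCE and len(transcript.split()) >= MIN_WORDS
```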
## Notion integration
Notion is connected via an API token (internal integration), not OAuth. Alfred polls the Notion API every 15 minutes and:

- Fetches all accessible pages and databases
- Retrieves block-level content for full page fidelity (not just page titles or properties)
- Parses database schemas to understand property types, relations, and rollups
- Emits each changed page as a StreamEvent with the full block tree in the payload
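To illustrate what shipping the full block tree in the payload enables, here is a sketch that flattens a nested tree into indented text lines. The `{"text": ..., "children": [...]}` shape is a simplification, not the actual Notion API block schema:

```python
def flatten_blocks(blocks, depth=0):
    """Depth-first walk of a simplified block tree, indenting children."""
    lines = []
    for block in blocks:
        lines.append("  " * depth + block.get("text", ""))
        lines.extend(flatten_blocks(block.get("children", []), depth + 1))
    return lines
```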
## The StreamEvent envelope
Every event from every stream arrives in a standard envelope:

| Field | Purpose |
|---|---|
| `stream_id` | Identifies which stream produced this event |
| `stream_type` | `scheduled_pull`, `webhook_push`, or `realtime` |
| `tenant_id` | Your tenant identifier |
| `received_at` | When Alfred received the event |
| `source_ref` | Unique reference from the source, used for deduplication |
| `payload` | The raw event data from the external source |
| `summary` | Optional human-readable summary of the event |
The `source_ref` field ensures Alfred never processes the same event twice, even if a stream delivers it more than once.
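As an illustration, the envelope and the `source_ref` deduplication rule can be modelled like this (a sketch of the documented fields, not Alfred's internal types):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StreamEvent:
    stream_id: str
    stream_type: str            # "scheduled_pull", "webhook_push", or "realtime"
    tenant_id: str
    received_at: str
    source_ref: str             # unique per source event; the dedup key
    payload: Any
    summary: Optional[str] = None

class Deduplicator:
    """Accept each source_ref once; repeated deliveries are dropped."""
    def __init__(self):
        self._seen = set()

    def accept(self, event: StreamEvent) -> bool:
        if event.source_ref in self._seen:
            return False
        self._seen.add(event.source_ref)
        return True
```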
## Default stream: OpenClaw Session Logs
Every Alfred comes with one stream already active — OpenClaw Session Logs. This stream captures every conversation between you and Alfred, plus all subagent runs that happen behind the scenes. A Temporal schedule polls for new session data every 5 minutes and delivers it to your Inbox. Alfred processes these sessions to extract:

- Decisions made during conversation
- Entities mentioned — people, projects, organizations
- Tasks and commitments identified in dialogue
- Patterns over time — recurring topics, evolving priorities
### OpenClaw session capture
OpenClaw sessions are captured via the `alfred-inbox` hook, which writes session data directly to a shared volume at `/alfred-data/streams/`. This ensures every conversation and subagent run is persisted as a stream event regardless of polling schedules.
## Email processing at scale
When Alfred processes your email — whether during the initial backfill or ongoing stream ingestion — it uses a pipeline designed for efficiency at volume.

### Domain-clustered batch processing
Emails are grouped by sender domain before processing. All emails from `github.com` become one batch, all emails from `stripe.com` become another, and so on. Each inbox file contains up to 100 emails.
The format varies by domain type:
- Service domains (automated senders like GitHub, Stripe, Linear) get compact format — subject line and snippet only. This reduces token usage without losing signal.
- Personal domains (human senders) get full body text, giving the Curator the context it needs to extract relationships, commitments, and decisions.
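A minimal sketch of the grouping logic, assuming each email is a dict with a "from" address; the service-domain set and field names are illustrative:

```python
BATCH_SIZE = 100  # documented cap per inbox file
SERVICE_DOMAINS = {"github.com", "stripe.com", "linear.app"}  # illustrative set

def sender_domain(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()

def batch_by_domain(emails):
    """Group emails by sender domain, split each group into batches of up
    to 100, and tag each batch compact (service) or full (personal)."""
    groups = {}
    for email in emails:
        groups.setdefault(sender_domain(email["from"]), []).append(email)
    batches = []
    for domain, items in groups.items():
        fmt = "compact" if domain in SERVICE_DOMAINS else "full"
        for i in range(0, len(items), BATCH_SIZE):
            batches.append({"domain": domain, "format": fmt,
                            "emails": items[i:i + BATCH_SIZE]})
    return batches
```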
### Parallel curator
The Curator processes inbox files concurrently using `asyncio.gather` with a semaphore. The concurrency level is configurable via `watcher.max_concurrent` in `config.yaml` (default: 4 workers).
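The pattern described here, `asyncio.gather` bounded by a semaphore, looks roughly like this, with `process_one` as a hypothetical coroutine standing in for the Curator's per-file work:

```python
import asyncio

async def process_all(inbox_files, process_one, max_concurrent: int = 4):
    """Run process_one over every inbox file, at most `max_concurrent`
    at a time, and return results in input order."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(f):
        async with sem:
            return await process_one(f)

    return await asyncio.gather(*(bounded(f) for f in inbox_files))
```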
### Performance characteristics
As a benchmark: 1,600 emails produce approximately 78 inbox files. With 4 concurrent workers, the Curator processes the full batch in roughly 2.5 hours. Actual times vary depending on email complexity and the model selected for the Curator specialist.

### Gmail backfill
When you connect Gmail, Alfred performs a one-time backfill that ingests full email content as stream events — not just metadata or extracted facts. Every email body, attachment reference, and thread context is captured, giving the Curator the same rich material it would receive from a freshly arrived message.

### Stream event counts
Stream event counts displayed on your dashboard are computed directly from JSONL files stored on your tenant’s encrypted volume. There is no separate database — the JSONL files are the source of truth.

## Managing streams
Navigate to your dashboard to see all your configured streams. Each stream shows:

| Element | What it means |
|---|---|
| Status dot | Green = active, Red = error, Dim = paused |
| Stream name | The name you gave this stream |
| Type badge | Scheduled Pull, Webhook Push, or Realtime |
| Last event | When the most recent event was received |
## What happens next
Once events land in your Inbox, the standard specialist pipeline takes over. The Curator reads and structures. The Janitor verifies. The Distiller surfaces insights. Your vault grows richer, automatically. Streams are the front door — the rest of Alfred’s team handles everything behind it.