Part of Alfred’s six-layer architecture. The Data layer captures your world and feeds it into the vault.
Your world, flowing in
You don’t have to hand everything to Alfred manually. Streams are data pipelines that capture events from external sources — your email, your calendar, your payments, your conversations — and deliver them to Alfred’s Inbox automatically. Once a stream is active, Alfred receives a continuous feed of structured events. The Curator processes each one, creating and updating vault records just as if you’d shared the content yourself.

## How streams work
Every stream follows the same pattern:

1. **An event occurs.** Something happens in the outside world — an email arrives, a calendar event starts, a payment is processed, a conversation takes place.
2. **The stream captures it.** Your configured stream detects the event and wraps it in a StreamEvent envelope with metadata: source, timestamp, tenant, and the raw payload.
## Source types
Streams come in three flavours, depending on how the source delivers data.

### Scheduled Pull — Alfred reaches out on an interval

A Temporal workflow polls an external API on an interval you define. Alfred reaches out, checks for new data, and pulls it in.
| Source | What it captures | Typical interval |
|---|---|---|
| Gmail | New emails and threads (full email content, not just metadata) | Every 5 minutes |
| Calendar | Upcoming and past events | Every 15 minutes |
| Notion | Pages, databases, and block-level content (API token auth) | Every 15 minutes |
| Health data | Activity, sleep, vitals | Every 30 minutes |
| OpenClaw sessions | Your Alfred conversations and subagent runs | Every 5 minutes |
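The scheduled-pull pattern reduces to a single fetch-and-emit step that a scheduler (such as a Temporal workflow) runs on the configured interval. A minimal sketch in Python, where `fetch_new_items` and `emit_stream_event` are hypothetical stand-ins rather than Alfred's actual API:

```python
from datetime import datetime, timezone

def poll_once(fetch_new_items, emit_stream_event, cursor=None):
    """One scheduled-pull iteration: fetch anything new since `cursor`,
    wrap each item in a minimal event dict, emit it, and return the
    updated cursor for the next run."""
    items, new_cursor = fetch_new_items(cursor)
    for item in items:
        emit_stream_event({
            "received_at": datetime.now(timezone.utc).isoformat(),
            "payload": item,
        })
    return new_cursor
```

A scheduler then invokes `poll_once` every 5 or 15 minutes, persisting the returned cursor between runs so only new data is fetched.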
### Webhook Push — external services POST events instantly

An external service POSTs events directly to your dedicated webhook URL. Alfred receives them instantly as they happen.
Each webhook stream generates a unique, secure URL. Copy it from your dashboard and configure it in the external service.
| Source | What it captures |
|---|---|
| Polar | Payment confirmations and subscription events |
| GitHub | Commits, PRs, issues, and releases |
| Stripe | Charges, invoices, and subscription changes |
| Custom | Any service that supports webhook delivery |
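Webhook endpoints normally verify that a POST really came from the configured service; GitHub and Stripe, for example, sign each delivery with an HMAC over the raw request body. Alfred's exact verification scheme isn't specified here, but the general pattern is a constant-time HMAC comparison (header names and signature formats vary by service):

```python
import hashlib
import hmac

def verify_webhook_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the signature the service sent, in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```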
### Realtime — a persistent connection for continuous data

A long-lived WebSocket connection maintains a continuous feed for sources that produce data in real time.
Realtime streams stay connected as long as your Alfred is running, automatically reconnecting if the connection drops.
| Source | What it captures |
|---|---|
| Omi | Ambient audio from your wearable — speech detection, local transcription, quality-gated stream events |
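Automatic reconnection is usually driven by an exponential-backoff schedule. Alfred's actual reconnect policy isn't documented here; a common sketch, with illustrative base and cap values:

```python
def reconnect_delays(base: float = 1.0, cap: float = 60.0):
    """Yield reconnect wait times: 1s, 2s, 4s, ... capped at `cap`.
    The base and cap are illustrative, not Alfred's actual settings."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

# Outer loop sketch (connect/consume are hypothetical callables):
#
# for delay in reconnect_delays():
#     try:
#         ws = connect()      # open the persistent WebSocket
#         consume(ws)         # blocks while the feed is healthy
#     except ConnectionError:
#         time.sleep(delay)   # back off, then try again
```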
### Omi audio pipeline
Omi delivers raw PCM audio over a persistent WebSocket. Alfred processes it through a local pipeline:

- RMS energy speech detection — audio frames are evaluated against an RMS energy threshold of 300. Silence is discarded immediately.
- Speech grouping — consecutive speech segments are grouped together. A gap of 60 seconds or more starts a new group.
- Local transcription — each speech group is transcribed by `whisper-large-v3` running locally (int8 quantization, CPU-only, 4GB RAM via the `learn` container).
- Quality gate — transcripts must pass language confidence > 0.5 and contain at least 5 words; otherwise they are dropped.
- Stream events — passing transcripts are emitted as standard StreamEvents into the Inbox.
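The numeric gates in this pipeline are simple enough to sketch directly. This assumes 16-bit PCM frames arrive as lists of integer samples (the frame format is an assumption; the thresholds are the values documented above):

```python
import math

RMS_THRESHOLD = 300       # documented silence threshold
MIN_WORDS = 5             # documented quality gate: at least 5 words
MIN_CONFIDENCE = 0.5      # documented language-confidence gate

def frame_rms(samples):
    """RMS energy of one PCM frame (non-empty list of int samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples):
    """Frames at or above the RMS threshold count as speech."""
    return frame_rms(samples) >= RMS_THRESHOLD

def passes_quality_gate(transcript, lang_confidence):
    """Keep a transcript only if confidence > 0.5 and it has >= 5 words."""
    return lang_confidence > MIN_CONFIDENCE and len(transcript.split()) >= MIN_WORDS
```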
## Notion integration
Notion is connected via an API token (internal integration), not OAuth. Alfred polls the Notion API every 15 minutes and:

- Fetches all accessible pages and databases
- Retrieves block-level content for full page fidelity (not just page titles or properties)
- Parses database schemas to understand property types, relations, and rollups
- Emits each changed page as a StreamEvent with the full block tree in the payload
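To illustrate what shipping the full block tree in the payload enables, here is a sketch that flattens a nested tree into indented text lines. The `{"text": ..., "children": [...]}` shape is a simplification, not the actual Notion API block schema:

```python
def flatten_blocks(blocks, depth=0):
    """Depth-first walk of a simplified block tree, indenting children."""
    lines = []
    for block in blocks:
        lines.append("  " * depth + block.get("text", ""))
        lines.extend(flatten_blocks(block.get("children", []), depth + 1))
    return lines
```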
## The StreamEvent envelope
Every event from every stream arrives in a standard envelope:

| Field | Purpose |
|---|---|
| `stream_id` | Identifies which stream produced this event |
| `stream_type` | `scheduled_pull`, `webhook_push`, or `realtime` |
| `tenant_id` | Your tenant identifier |
| `received_at` | When Alfred received the event |
| `source_ref` | Unique reference from the source, used for deduplication |
| `payload` | The raw event data from the external source |
| `summary` | Optional human-readable summary of the event |
The `source_ref` field ensures Alfred never processes the same event twice, even if a stream delivers it more than once.
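As an illustration, the envelope and the `source_ref` deduplication rule can be modelled like this (a sketch of the documented fields, not Alfred's internal types):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StreamEvent:
    stream_id: str
    stream_type: str            # "scheduled_pull", "webhook_push", or "realtime"
    tenant_id: str
    received_at: str
    source_ref: str             # unique per source event; the dedup key
    payload: Any
    summary: Optional[str] = None

class Deduplicator:
    """Accept each source_ref once; repeated deliveries are dropped."""
    def __init__(self):
        self._seen = set()

    def accept(self, event: StreamEvent) -> bool:
        if event.source_ref in self._seen:
            return False
        self._seen.add(event.source_ref)
        return True
```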
## Default stream: OpenClaw Session Logs
Every Alfred comes with one stream already active — OpenClaw Session Logs. This stream captures every conversation between you and Alfred, plus all subagent runs that happen behind the scenes. A Temporal schedule polls for new session data every 5 minutes and delivers it to your Inbox. Alfred processes these sessions to extract:

- Decisions made during conversation
- Entities mentioned — people, projects, organizations
- Tasks and commitments identified in dialogue
- Patterns over time — recurring topics, evolving priorities
### OpenClaw session capture
OpenClaw sessions are captured via the `alfred-inbox` hook, which writes session data directly to a shared volume at `/alfred-data/streams/`. This ensures every conversation and subagent run is persisted as a stream event regardless of polling schedules.
## Email processing at scale
When Alfred processes your email — whether during the initial backfill or ongoing stream ingestion — it uses a pipeline designed for efficiency at volume.

### Domain-clustered batch processing
Emails are grouped by sender domain before processing. All emails from `github.com` become one batch, all emails from `stripe.com` become another, and so on. Each inbox file contains up to 100 emails.
The format varies by domain type:
- Service domains (automated senders like GitHub, Stripe, Linear) get compact format — subject line and snippet only. This reduces token usage without losing signal.
- Personal domains (human senders) get full body text, giving the Curator the context it needs to extract relationships, commitments, and decisions.
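A minimal sketch of the grouping logic, assuming each email is a dict with a "from" address; the service-domain set and field names are illustrative:

```python
BATCH_SIZE = 100  # documented cap per inbox file
SERVICE_DOMAINS = {"github.com", "stripe.com", "linear.app"}  # illustrative set

def sender_domain(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()

def batch_by_domain(emails):
    """Group emails by sender domain, split each group into batches of up
    to 100, and tag each batch compact (service) or full (personal)."""
    groups = {}
    for email in emails:
        groups.setdefault(sender_domain(email["from"]), []).append(email)
    batches = []
    for domain, items in groups.items():
        fmt = "compact" if domain in SERVICE_DOMAINS else "full"
        for i in range(0, len(items), BATCH_SIZE):
            batches.append({"domain": domain, "format": fmt,
                            "emails": items[i:i + BATCH_SIZE]})
    return batches
```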
### Parallel curator
The Curator processes inbox files concurrently using `asyncio.gather` with a semaphore. The concurrency level is configurable via `watcher.max_concurrent` in `config.yaml` (default: 4 workers).
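The pattern described here, `asyncio.gather` bounded by a semaphore, looks roughly like this, with `process_one` as a hypothetical coroutine standing in for the Curator's per-file work:

```python
import asyncio

async def process_all(inbox_files, process_one, max_concurrent: int = 4):
    """Run process_one over every inbox file, at most `max_concurrent`
    at a time, and return results in input order."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(f):
        async with sem:
            return await process_one(f)

    return await asyncio.gather(*(bounded(f) for f in inbox_files))
```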
### Performance characteristics
As a benchmark: 1,600 emails produce approximately 78 inbox files. With 4 concurrent workers, the Curator processes the full batch in roughly 2.5 hours. Actual times vary depending on email complexity and the model selected for the Curator specialist.

### Gmail backfill
When you connect Gmail, Alfred performs a one-time backfill that ingests full email content as stream events — not just metadata or extracted facts. Every email body, attachment reference, and thread context is captured, giving the Curator the same rich material it would receive from a freshly arrived message.

### Stream event counts
Stream event counts displayed on your dashboard are computed directly from JSONL files stored on your tenant’s encrypted volume. There is no separate database — the JSONL files are the source of truth.

## Managing streams
Navigate to your dashboard to see all your configured streams. Each stream shows:

| Element | What it means |
|---|---|
| Status dot | Green = active, Red = error, Dim = paused |
| Stream name | The name you gave this stream |
| Type badge | Scheduled Pull, Webhook Push, or Realtime |
| Last event | When the most recent event was received |
## What happens next
Once events land in your Inbox, the standard specialist pipeline takes over. The Curator reads and structures. The Janitor verifies. The Distiller surfaces insights. Your vault grows richer, automatically. Streams are the front door — the rest of Alfred’s team handles everything behind it.