Inside Syncore Notes: Streaming Meeting Transcription for Agents
Why we built it
Granola, Otter, and Fireflies are great as standalone meeting apps. Where they fall short is the moment your agent needs the meeting context. You finish a Zoom call, ask Claude "what did we decide about the migration?", and Claude has nothing — the transcript lives in another vendor's app.
Syncore Notes is the alternative: a browser-native recorder that pipes transcripts directly to your local Syncore daemon, where any agent on your machine can read them through the note-taker skill.
The data path
mic + tab audio (getUserMedia + getDisplayMedia, mixed via Web Audio API)
│
▼ AudioWorklet — Float32 → Int16 PCM conversion runs on the audio thread
│
▼ WebSocket → ai-gateway → Deepgram nova-3
│ (Bearer = Supabase JWT, no Deepgram key in the browser)
│
▼ Results → live UI render + chunk to ~/.syncore/data/note-taking/<sid>/transcript.md
via daemon's loopback HTTP listener

Three things in this path matter:
AudioWorklet, not ScriptProcessor. ScriptProcessor was deprecated for a reason — it runs on the main thread and stutters when the UI repaints. Worklet runs on the dedicated audio thread, so PCM conversion is jitter-free even while React renders the live transcript.
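The conversion step itself is small. Here is a minimal sketch, assuming the worklet receives the usual Float32 samples in [-1, 1] and the socket expects linear16 in the platform's (in practice little-endian) byte order. The name `floatTo16BitPCM` is ours, not a Syncore or Web Audio API:

```typescript
/**
 * Convert one block of Web Audio samples (Float32 in [-1, 1]) to 16-bit
 * signed PCM. Samples are clamped first so a hot mic saturates instead of
 * wrapping around to the opposite sign.
 */
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Int16 spans -32768..32767, so negative and positive halves scale differently.
    // Assigning a fractional number to an Int16Array truncates toward zero.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Inside a (hypothetical) AudioWorkletProcessor subclass, process() could do:
//   const pcm = floatTo16BitPCM(inputs[0][0]); // channel 0 of input 0, when connected
//   this.port.postMessage(pcm.buffer, [pcm.buffer]); // transfer, don't copy
```

The clamp matters because assigning an out-of-range number to an `Int16Array` wraps modulo 2^16 rather than saturating.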
Pinned 16 kHz sample rate. AudioContext accepts a sampleRate option, and we pin it to 16 kHz mono, which is what Deepgram nova-3 wants natively. That means no Float32 → 24 kHz → 16 kHz resampling round-trip: the browser's resampler handles it once, at the mic boundary.
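Pinning the rate also fixes the byte budget: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes/s, and each 128-frame Web Audio render quantum carries 8 ms of audio (256 bytes). The sketch below shows that arithmetic plus one way to batch worklet blocks into larger frames before they reach the socket. The 100 ms frame size and the `PcmBatcher` class are hypothetical; the pipeline above is not documented to batch this way.

```typescript
const SAMPLE_RATE = 16_000; // pinned AudioContext sampleRate
const BYTES_PER_SAMPLE = 2; // linear16
const CHANNELS = 1; // mono
const BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS; // 32,000

/** Convert a duration in milliseconds to a sample count at the pinned rate. */
function samplesFor(ms: number): number {
  return Math.round((SAMPLE_RATE * ms) / 1000);
}

/**
 * Hypothetical batcher: accepts Int16 blocks of any size (128 samples per
 * worklet callback in practice) and emits fixed-size frames.
 */
class PcmBatcher {
  private readonly buf: Int16Array;
  private readonly onFrame: (frame: Int16Array) => void;
  private filled = 0;

  constructor(onFrame: (frame: Int16Array) => void, frameMs = 100) {
    this.onFrame = onFrame;
    // At least one sample per frame, so push() always makes progress.
    this.buf = new Int16Array(Math.max(1, samplesFor(frameMs)));
  }

  push(block: Int16Array): void {
    let offset = 0;
    while (offset < block.length) {
      // Copy as much of the block as fits in the current frame.
      const n = Math.min(block.length - offset, this.buf.length - this.filled);
      this.buf.set(block.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.buf.length) {
        this.onFrame(this.buf.slice()); // hand out a copy; buf is reused
        this.filled = 0;
      }
    }
  }
}
```

Thirteen 128-sample blocks (1,664 samples) yield one 1,600-sample frame, with 64 samples carried into the next.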
Gateway-mediated WebSocket. The browser never sees a Deepgram API key. It sends a Supabase user JWT to wss://ai-gateway.syncorelabs.ai/v1/deepgram/v1/listen, and the gateway swaps in the shared Deepgram key after verifying tier + per-user quota. Free tier gets a generous window; premium gets unlimited. The browser tab couldn't leak the Deepgram key even if it tried.
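A sketch of building the gateway connection. It assumes the gateway forwards standard Deepgram streaming query parameters (`model`, `encoding`, `sample_rate`, `channels`, `interim_results`, `utterance_end_ms`, `diarize`) upstream. The browser `WebSocket` constructor cannot set an `Authorization` header, so the sketch also assumes the JWT travels in the subprotocol list. The `"bearer"` protocol name and `buildListenRequest` are hypothetical.

```typescript
interface ListenOptions {
  gatewayUrl: string; // e.g. wss://ai-gateway.syncorelabs.ai/v1/deepgram/v1/listen
  jwt: string; // Supabase access token for the signed-in user
}

interface ListenRequest {
  url: string;
  protocols: string[];
}

function buildListenRequest(opts: ListenOptions): ListenRequest {
  const url = new URL(opts.gatewayUrl);
  // Deepgram streaming parameters matching the pipeline described above.
  const params: [string, string][] = [
    ["model", "nova-3"],
    ["encoding", "linear16"],
    ["sample_rate", "16000"],
    ["channels", "1"],
    ["interim_results", "true"],
    ["utterance_end_ms", "1200"],
    ["diarize", "true"],
  ];
  for (const [key, value] of params) url.searchParams.set(key, value);
  // ASSUMPTION: the gateway reads the token from the subprotocol list,
  // which keeps it out of the URL (and out of URL-logging access logs).
  return { url: url.toString(), protocols: ["bearer", opts.jwt] };
}
```

In the browser this would be used as `new WebSocket(url, protocols)`, with `binaryType = "arraybuffer"`, sending each Int16 frame's `.buffer`.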
Live diarization is harder than it looks
Deepgram emits two kinds of Results frames: interim (low-confidence, fast) and is_final (locked in). The naive UX is to display interim live and replace it with the final. But Deepgram re-emits is_final=true frames across utterance boundaries — same text, slightly different speaker ID — which produces visible duplicate rows.
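The naive approach looks roughly like this: an interim frame overwrites the in-flight row, and an is_final frame commits it. The types below cover only the Deepgram `Results` fields this sketch touches (`is_final`, `channel.alternatives[0].transcript`, and the per-word `speaker` present when diarization is on). Taking a row's speaker from its first word is our own simplification.

```typescript
interface DeepgramWord {
  word: string;
  speaker?: number; // present when diarize=true
}

interface ResultsFrame {
  is_final: boolean;
  channel: { alternatives: { transcript: string; words: DeepgramWord[] }[] };
}

interface Row {
  speaker: number;
  text: string;
}

class LiveTranscript {
  readonly finals: Row[] = [];
  interim: Row | null = null;

  apply(frame: ResultsFrame): void {
    const alt = frame.channel.alternatives[0];
    if (!alt || alt.transcript.trim() === "") return;
    const row: Row = { speaker: alt.words[0]?.speaker ?? 0, text: alt.transcript };
    if (frame.is_final) {
      this.finals.push(row); // lock it in
      this.interim = null;
    } else {
      this.interim = row; // replace the previous low-confidence guess
    }
  }
}
```

Feeding the same final text twice, as the re-emissions do, commits two identical rows, which is exactly the duplicate described above.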
We tried bumping utterance_end_ms from 1200 to 2500 ms to reduce mid-utterance speaker shuffles. That widened the window between is_final and speech_final enough that duplicates lingered on screen for 5-6 seconds. We rolled back to 1200 ms and added a 12-second dedup ring keyed on speaker|text (with a 6-character minimum) to catch the re-emissions. Belt-and-suspenders, but live transcripts now stay clean.
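The dedup check can be sketched as follows, with the 12-second window, the `speaker|text` key, and the 6-character minimum taken from the text. Reading the minimum as "never suppress texts shorter than 6 characters" (so a repeated "yeah" still shows) is our assumption. This version also evicts by age instead of using a fixed-capacity circular buffer; `DedupRing` is a hypothetical name.

```typescript
const DEDUP_WINDOW_MS = 12_000;
const MIN_DEDUP_CHARS = 6;

class DedupRing {
  private entries: { key: string; at: number }[] = [];

  /** Returns true if the row should be shown, false if it is a re-emission. */
  accept(speaker: number, text: string, nowMs: number): boolean {
    // Forget anything older than the window.
    this.entries = this.entries.filter((e) => nowMs - e.at < DEDUP_WINDOW_MS);
    const normalized = text.trim();
    // Short interjections legitimately repeat, so never suppress them.
    if (normalized.length < MIN_DEDUP_CHARS) return true;
    const key = `${speaker}|${normalized}`;
    if (this.entries.some((e) => e.key === key)) return false;
    this.entries.push({ key, at: nowMs });
    return true;
  }
}
```

A final row would be committed to the live view only when `accept` returns true.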
The Notes panel — user-typed context
Live transcription captures *what was said*. It doesn't capture *what mattered*. Two participants might say the same sentence; only one of them is the decision.
We added a Notes editor on the right side of the recording UI. Anything the user types there auto-saves to notes.md next to the transcript on a 1.5-second debounce. The skill's get_session returns both:
{
"transcript": "...full live transcript markdown...",
"user_notes": "Decision: ramp down legacy auth by Q3.\nRisk: data migration window blocks mobile release.",
"speaker_names": { "0": "Eric", "1": "Sarah" }
}

The skill manifest description tells the agent to weight `user_notes` heavily in its summary, action-item, and follow-up tools. Those are the points the user explicitly cared about, not just words that were spoken.
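The 1.5-second autosave behind notes.md is a standard trailing debounce. A minimal sketch: `debounceSave` and its `flush()` method are hypothetical names, and the `save` callback stands in for whatever writes notes.md through the daemon.

```typescript
type SaveFn = (text: string) => void;

/** Trailing debounce: only the last value written within `delayMs` is saved. */
function debounceSave(save: SaveFn, delayMs = 1_500) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: string | undefined;

  const run = (): void => {
    timer = undefined;
    if (pending === undefined) return;
    const text = pending;
    pending = undefined;
    save(text);
  };

  // Each keystroke resets the timer; only the latest text survives.
  const schedule = (text: string): void => {
    pending = text;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(run, delayMs);
  };

  // Save immediately instead of waiting out the delay.
  const flush = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    run();
  };

  return Object.assign(schedule, { flush });
}
```

A `flush()` like this is one way to keep the last few keystrokes from being lost if the tab closes before the debounce fires, for example by calling it from a pagehide handler.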
This sounds obvious in retrospect, but it makes a large difference to the quality of generated minutes: less paraphrasing of small talk, more focus on what the user actually flagged.
Tab close handling
Browser tabs can die mid-meeting: the laptop sleeps, the user accidentally closes the tab, or the OS evicts the page. We listen for pagehide and send a PUT with keepalive: true to the daemon, marking the session as status: closed, ended_reason: tab_closed. The transcript up to that point is preserved; get_session still works.
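A sketch of the pagehide handler. The daemon URL, port, and path are hypothetical placeholders, and the request body simply mirrors the two fields named in the text. The optional `send` parameter (also hypothetical) lets the handler be exercised without a real page or daemon.

```typescript
interface CloseRequest {
  method: "PUT";
  keepalive: boolean;
  headers: Record<string, string>;
  body: string;
}

type SendFn = (url: string, init: CloseRequest) => unknown;

interface CloseOptions {
  daemonUrl: string; // hypothetical, e.g. "http://127.0.0.1:<port>/sessions/<sid>"
  send?: SendFn; // defaults to fetch
}

/** Mark the session closed on pagehide. Returns a function that unregisters it. */
function registerTabCloseHandler(target: EventTarget, opts: CloseOptions): () => void {
  // Swallow errors: if the daemon is unreachable while the page unloads,
  // there is nothing useful left to do.
  const send: SendFn = opts.send ?? ((url, init) => fetch(url, init).catch(() => undefined));
  const onPageHide = (): void => {
    void send(opts.daemonUrl, {
      method: "PUT",
      keepalive: true, // lets the request outlive the page
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ status: "closed", ended_reason: "tab_closed" }),
    });
  };
  target.addEventListener("pagehide", onPageHide);
  return () => target.removeEventListener("pagehide", onPageHide);
}
```

In a page this would be `registerTabCloseHandler(window, { daemonUrl })`. `navigator.sendBeacon` is the other obvious choice, but it always issues a POST, so a PUT needs `fetch` with `keepalive: true`. Keepalive requests are capped at 64 KiB of body, which a small status update stays well under.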
This is the kind of thing nobody tests until it bites them. We test it.
Why this matters for agents
A meeting that's just an audio file is a black box. A meeting that's a structured transcript with user notes, speaker names, and a stable session id is agent-readable knowledge. Combined with the Syncore Wiki skill, your agent can ingest meetings into a long-term wiki: meeting → raw/meetings/<sid>.md → derived concept pages updated → log entry. The chain of "we said X in a meeting" → "this should affect the design doc" closes automatically.
That last step — wiki ingest — is the subject of [its own post](/blog/syncore-wiki-llm-maintained-knowledge-base). Notes is the input layer; the Wiki is the layer that compounds.