Syncore Search: One Gateway, Three Search Engines, No Per-User Keys
The problem with "search" as an agent tool
Agents need three different search modes:
- Web research with citations — Perplexity. Best when the answer requires synthesis across sources.
- Page extraction / structured crawl — Firecrawl. Best when you want clean markdown out of a specific URL or list.
- Real-time social search — Grok over X. Best when freshness matters more than depth.
Each one has its own API, billing model, key registration, and rate limits. The traditional MCP setup makes the user wire up all three separately, paste API keys, and somehow remember which agent action maps to which engine.
What Syncore Search ships
A single gateway endpoint per engine, all behind one Bearer token. Premium tier users don't need to register anything — Syncore's shared keys are pre-funded and metered against your tier quota. BYOK works too, for power users who want to bill providers directly.
agent → /v1/perplexity/chat/completions ┐
        /v1/firecrawl/crawl             ├─→ ai-gateway → upstream
        /v1/grok/chat/completions       ┘   (shared key, quota check)

The gateway is a single Cloudflare Worker (~3000 lines TS). Each provider proxy is an isolated module, 80-200 lines each. Adding a new search engine is one file: src/providers/<name>.ts that handles auth swap and request rewriting.
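To make "auth swap and request rewriting" concrete, here is a minimal sketch of what a provider module could look like. The `ProviderModule`, `GatewayRequest` and `UpstreamRequest` shapes, the `PREFIX` constant and the upstream base URL are all assumptions for illustration; the real modules under src/providers/ are not shown in this post and may be organized differently.

```typescript
// Hypothetical request shapes; plain objects keep the sketch runtime-agnostic.
interface GatewayRequest {
  path: string;                    // e.g. "/v1/perplexity/chat/completions"
  headers: Record<string, string>; // header names assumed lower-cased
  body: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical module contract: one object per provider.
interface ProviderModule {
  name: string;
  rewrite(req: GatewayRequest, sharedKey: string): UpstreamRequest;
}

// Assumed values, for illustration only.
const PREFIX = "/v1/perplexity";
const UPSTREAM_BASE = "https://api.perplexity.ai";

const perplexity: ProviderModule = {
  name: "perplexity",
  rewrite(req, sharedKey) {
    // Request rewriting: strip the gateway prefix to get the upstream path.
    const upstreamPath = req.path.slice(PREFIX.length);
    return {
      url: UPSTREAM_BASE + upstreamPath,
      // Auth swap: the caller's gateway token is overwritten by the shared provider key.
      headers: { ...req.headers, authorization: `Bearer ${sharedKey}` },
      body: req.body,
    };
  },
};
```

Keeping `rewrite` a pure function of the incoming request and the shared key is one way to keep each module small and easy to test in isolation.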
How quotas work
Every request follows the same flow:
1. JWT verify — Supabase user JWT, ES256, 30s clock skew tolerance, JWKS-based.
2. Tier lookup — single Supabase row read per request. No worker-memory cache yet; tier rarely changes mid-request and the read is sub-10ms. Adding a short TTL cache is on the list.
3. Quota precheck — has the user used > tier_limit for this provider this calendar month?
4. Rate limit — CF Rate Limiting API, key = <user_id>:<provider>, per-tier limiter binding (RATELIMIT_FREE / RATELIMIT_PREMIUM / RATELIMIT_ULTRA).
5. Forward — request goes upstream with the shared provider key.
6. Record — usage incremented in Supabase usage_counters table on success.
Steps 1-4 are shared infra in src/auth.ts + src/quota.ts + src/rate_limit.ts.
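The six steps above can be sketched as one handler with injected dependencies. This is an illustration under stated assumptions, not the gateway's actual code: the `Deps` interface, the function names and the 401/429 status codes are hypothetical, while the limiter binding names and the `<user_id>:<provider>` key format come from the list above.

```typescript
type Tier = "free" | "premium" | "ultra";

// Hypothetical dependency surface standing in for src/auth.ts, src/quota.ts, src/rate_limit.ts.
interface Deps {
  verifyJwt(token: string): Promise<{ userId: string } | null>;
  lookupTier(userId: string): Promise<Tier>;
  usedThisMonth(userId: string, provider: string): Promise<number>;
  tierLimit(tier: Tier, provider: string): number;
  rateLimit(binding: string, key: string): Promise<boolean>; // true = request allowed
  forward(provider: string, body: string): Promise<{ ok: boolean; units: number; body: string }>;
  recordUsage(userId: string, provider: string, units: number): Promise<void>;
}

// Per-tier limiter bindings, as named in step 4.
const LIMITER_BINDING: Record<Tier, string> = {
  free: "RATELIMIT_FREE",
  premium: "RATELIMIT_PREMIUM",
  ultra: "RATELIMIT_ULTRA",
};

// Rate limit key = <user_id>:<provider>.
function rateLimitKey(userId: string, provider: string): string {
  return `${userId}:${provider}`;
}

// Step 3 as worded above: block once usage is past the tier limit.
function overQuota(used: number, limit: number): boolean {
  return used > limit;
}

async function handle(
  token: string,
  provider: string,
  body: string,
  d: Deps,
): Promise<{ status: number; body: string }> {
  const user = await d.verifyJwt(token);                      // 1. JWT verify
  if (!user) return { status: 401, body: "invalid token" };   // assumed status code
  const tier = await d.lookupTier(user.userId);               // 2. tier lookup
  const used = await d.usedThisMonth(user.userId, provider);  // 3. quota precheck
  if (overQuota(used, d.tierLimit(tier, provider))) {
    return { status: 429, body: "quota exceeded" };           // assumed status code
  }
  const allowed = await d.rateLimit(LIMITER_BINDING[tier], rateLimitKey(user.userId, provider));
  if (!allowed) return { status: 429, body: "rate limited" }; // 4. rate limit
  const res = await d.forward(provider, body);                // 5. forward with shared key
  if (!res.ok) return { status: 502, body: res.body };
  await d.recordUsage(user.userId, provider, res.units);      // 6. record, on success only
  return { status: 200, body: res.body };
}
```

Because usage is recorded only after a successful upstream call, a failed request costs the user nothing in this sketch.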
Why a custom gateway and not LiteLLM / OpenRouter
LiteLLM and OpenRouter solve the same shape of problem for LLM completions. We considered both and chose to build our own for three reasons:
Per-skill quotas. A user's "embed" usage isn't the same shape as their "perplexity" usage. The token-based billing model those gateways are built around doesn't generalize cleanly to units like pages or seconds. Our PROVIDERS array and per-provider UNIT_NAME (queries / pages / seconds) keep the accounting honest.
Audit logging. Every gateway call lands in our Tinybird tool_calls_v1 table. We can query "how often did the average free user hit Perplexity this week?" and tune limits with real data. That telemetry doesn't exist in third-party gateways.
Webhook handling. Firecrawl jobs send results via webhook. The gateway receives, HMAC-verifies, and stores them in Supabase Storage. A pure routing gateway can't do this.
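As a rough illustration of the per-provider accounting described in the first reason, the sketch below keeps one counter per user, provider and month, and reports it in that provider's own unit. The identifiers `PROVIDERS` and `UNIT_NAME` come from this post, but their exact shape, the provider list, the unit strings (taken here from the tier table later in the post) and the in-memory `counters` map standing in for the usage_counters table are assumptions.

```typescript
// Assumed provider list; the real PROVIDERS array may differ.
const PROVIDERS = ["perplexity", "firecrawl", "grok", "seedream", "deepgram"] as const;
type Provider = (typeof PROVIDERS)[number];

// One unit name per provider, so each quota is metered in its natural unit.
const UNIT_NAME: Record<Provider, string> = {
  perplexity: "tokens",
  firecrawl: "pages",
  grok: "tokens",
  seedream: "images",
  deepgram: "seconds",
};

// In-memory stand-in for the Supabase usage_counters table.
const counters = new Map<string, number>();

function counterKey(userId: string, provider: Provider, month: string): string {
  return `${userId}:${provider}:${month}`;
}

// Adds `amount` units to the user's monthly counter and returns the new total.
function recordUsage(userId: string, provider: Provider, month: string, amount: number): number {
  const key = counterKey(userId, provider, month);
  const total = (counters.get(key) ?? 0) + amount;
  counters.set(key, total);
  return total;
}

// Human-readable usage, e.g. "5 pages".
function describeUsage(userId: string, provider: Provider, month: string): string {
  const used = counters.get(counterKey(userId, provider, month)) ?? 0;
  return `${used} ${UNIT_NAME[provider]}`;
}
```

Keying the counter by calendar month means a new month starts from zero without any reset job, which fits the "this calendar month" precheck described earlier.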
The skill_search composite
There's a fourth endpoint, /v1/skill-search, that's an interesting case. It takes a query string, calls /v1/embed internally to get a 1024-d bge-m3 vector, then queries a Tinybird ANN pipe over the skill catalog. Returns ranked skill_ids. Used by the daemon's syncore__discover tool to pick which skills to surface for a given user request.
The whole call chain — embed query → vector ANN → results — runs in one round-trip from the daemon's perspective. ~200ms p50.
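The embed → ANN → results chain can be sketched in a few lines. Note the substitutions: the brute-force cosine ranking below is a stand-in for the Tinybird ANN pipe, the `embed` function is injected and synchronous for brevity (the real call to /v1/embed is a network request), and all names (`SkillVector`, `rankSkills`, `skillSearch`) are hypothetical.

```typescript
// Hypothetical catalog entry: a skill id plus its embedding.
interface SkillVector {
  skillId: string;
  vector: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Exact nearest-neighbour ranking, standing in for the approximate ANN query.
function rankSkills(queryVec: number[], catalog: SkillVector[], k: number): string[] {
  return catalog
    .map((s) => ({ skillId: s.skillId, score: cosine(queryVec, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.skillId);
}

// Composite call: embed the query, then rank the catalog against it.
function skillSearch(
  query: string,
  embed: (q: string) => number[],
  catalog: SkillVector[],
  k = 5,
): string[] {
  return rankSkills(embed(query), catalog, k);
}
```

From the caller's side this is a single function call returning ranked skill ids, which mirrors the one-round-trip behavior described above.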
What this looks like as a user
Limits are accounted in each provider's natural unit (tokens / pages / images / audio-seconds), not "queries":
| Tier | Perplexity tokens | Firecrawl pages | Grok tokens | Seedream images | Deepgram seconds |
|---------|-------------------|-----------------|-------------|-----------------|------------------|
| Free | 100K | 100 | 50K | 0 (gated) | 0 (gated) |
| Premium | 2M | 5K | 1M | 200 | 30K (~8 hr) |
| Ultra | 20M | 50K | 10M | 2K | 300K (~83 hr) |
Embed quota is 6K / 150K / 1.5M per month respectively (cache hits free). Numbers are starting guesses; we adjust based on real usage telemetry. You hit the same code paths whether you're Free or Ultra — the only difference is one row in Supabase.
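The table above maps naturally onto a lookup structure. The sketch below encodes the monthly limits from the table; the `TIER_LIMITS` name, the helper functions and the idea of treating a limit of 0 as "gated" are assumptions about how such a table might be represented, not the gateway's actual config.

```typescript
type Tier = "free" | "premium" | "ultra";
type Provider = "perplexity" | "firecrawl" | "grok" | "seedream" | "deepgram";

// Monthly limits in each provider's natural unit, copied from the tier table.
const TIER_LIMITS: Record<Tier, Record<Provider, number>> = {
  free:    { perplexity: 100_000,    firecrawl: 100,    grok: 50_000,     seedream: 0,     deepgram: 0 },
  premium: { perplexity: 2_000_000,  firecrawl: 5_000,  grok: 1_000_000,  seedream: 200,   deepgram: 30_000 },
  ultra:   { perplexity: 20_000_000, firecrawl: 50_000, grok: 10_000_000, seedream: 2_000, deepgram: 300_000 },
};

// A limit of 0 is treated as "gated": the provider is unavailable on that tier.
function isGated(tier: Tier, provider: Provider): boolean {
  return TIER_LIMITS[tier][provider] === 0;
}

// Units left this month, never negative.
function remaining(tier: Tier, provider: Provider, used: number): number {
  return Math.max(0, TIER_LIMITS[tier][provider] - used);
}
```

For example, a Premium user who has crawled 1,200 pages this month has `remaining("premium", "firecrawl", 1200)` = 3,800 pages left. Moving that user to Ultra changes only which row of the table is consulted.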
Tradeoffs we made
No real-time stream proxy. Some providers expose SSE for streaming completions. We don't proxy those streams: agents calling our gateway get the full response in one piece. That is the price of simpler auth and quota accounting; we might revisit it when an agent UI wants progressive rendering.
KV cache only on /v1/embed. Other providers' responses aren't deterministic enough to cache safely. Embed is.
No fallback between providers. If Perplexity is down, the gateway returns 502; it doesn't auto-route to Firecrawl. Agents make that decision; we don't want gateway behavior to be magic.
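The embed-only cache mentioned above works because the same model and input text always produce the same vector. A minimal sketch of that read-through pattern follows; the `KvLike` interface, the key format and the function names are hypothetical, and a `Map` stands in for the real KV namespace (whose API is asynchronous).

```typescript
// Minimal synchronous stand-in for a KV namespace.
interface KvLike {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Assumed key format. A real key would likely use a digest of the text
// rather than the raw text, to stay within key-size limits.
function embedCacheKey(model: string, text: string): string {
  return `embed:${model}:${text}`;
}

// Read-through cache: return the stored vector if present, otherwise compute and store it.
function cachedEmbed(
  text: string,
  model: string,
  kv: KvLike,
  computeEmbedding: (t: string) => number[],
): { vector: number[]; cacheHit: boolean } {
  const key = embedCacheKey(model, text);
  const hit = kv.get(key);
  if (hit !== undefined) {
    return { vector: JSON.parse(hit) as number[], cacheHit: true };
  }
  const vector = computeEmbedding(text);
  kv.set(key, JSON.stringify(vector));
  return { vector, cacheHit: false };
}
```

Returning `cacheHit` alongside the vector lets the caller skip metering on hits, in line with the "cache hits free" rule for the embed quota.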
The gateway is the "infrastructure" layer of Syncore. The skills layer (/skills/perplexity/main.py etc.) is what the agent sees as MCP tools. The gateway makes those skills usable without per-user provider accounts.