HELIX.ai
An AI copilot that understands your codebase.
Context-aware AI pair programmer. Indexes your repo into a semantic graph, retrieves relevant code at query time, and produces suggestions with full file-level context — not just window snippets.
From brief to production system.
Off-the-shelf AI assistants miss codebase context. They hallucinate APIs, misuse internal patterns, and treat every prompt as a cold start. Engineers waste cycles correcting LLM output instead of writing code.
Hybrid retrieval: AST-aware chunking via Tree-sitter + pgvector semantic search + symbol graph traversal. Requests to Claude 4.6 route through a custom prompt cache that hits 78% of the time. Responses stream via Server-Sent Events. VSCode + JetBrains plugins.
Beta users report 3.4x faster feature delivery on legacy code. 62% reduction in token cost vs naive RAG. Onboarding time for new engineers cut from 2 weeks to 4 days. Currently in private beta with 14 design partners.
How it shipped, week by week.
Retrieval research
Spent a week reading prior art on code retrieval. Decided on hybrid AST + semantic + symbol graph after benchmarking three approaches on a fixed eval set.
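A minimal sketch of what such a benchmark harness can look like. The metric (recall@k over hand-labeled query/chunk pairs) and every name here are assumptions for illustration, not taken from the production codebase:

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]  # chunk ids a correct retrieval must surface

def recall_at_k(retrieve: Callable[[str, int], list[str]],
                cases: list[EvalCase], k: int = 12) -> float:
    """Macro-averaged fraction of labeled chunks surfaced in the top-k."""
    scores = []
    for case in cases:
        hits = set(retrieve(case.query, k)) & case.relevant_ids
        scores.append(len(hits) / len(case.relevant_ids))
    return sum(scores) / len(scores)

# Score each candidate strategy on the same frozen cases:
# for name, fn in [("semantic", semantic_only), ("ast", ast_chunked), ("hybrid", hybrid)]:
#     print(name, recall_at_k(fn, cases))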
Indexer + storage
Built the Tree-sitter indexer. Benched FAISS, Pinecone, and pgvector for vector storage; pgvector with HNSW won on operational simplicity at our scale.
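For a feel of the AST-aware chunking, here is a minimal single-language sketch, assuming py-tree-sitter >= 0.22 with the tree-sitter-python grammar wheel (the production indexer covers 12 languages and does smarter span merging):

import tree_sitter_python
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tree_sitter_python.language())
parser = Parser()
parser.language = PY_LANGUAGE  # Parser(PY_LANGUAGE) also works on newer releases

def chunk_source(source: bytes) -> list[dict]:
    """Chunk at top-level definitions so no chunk ever bisects a function."""
    tree = parser.parse(source)
    chunks = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition",
                         "decorated_definition"):
            chunks.append({
                "kind": node.type,
                "start_byte": node.start_byte,
                "end_byte": node.end_byte,
                "text": source[node.start_byte:node.end_byte].decode(),
            })
    return chunks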
API + caching layer
FastAPI gateway with the prompt cache. Tuned the cache-key strategy around Anthropic's 5-minute TTL; the hit rate climbed from 12% to 78% over two weeks.
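The key design matters more than the TTL: fingerprint only the stable prompt prefix so volatile suffixes (like the user's latest message) don't fragment the cache. One plausible shape for the PromptKey used in the excerpt further down; the field names are my assumption:

import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptKey:
    model: str
    system_prompt: str
    context_digest: str  # digest of the retrieved chunks, order-normalized

    @property
    def fingerprint(self) -> str:
        h = hashlib.sha256()
        for part in (self.model, self.system_prompt, self.context_digest):
            h.update(part.encode())
            h.update(b"\x00")  # unambiguous field separator
        return h.hexdigest()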
Editor plugins
VSCode plugin built on the LSP; JetBrains plugin via the JetBrains Platform. Responses stream over SSE for sub-second time-to-first-token.
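The gateway side of that streaming path is plain SSE. A minimal FastAPI sketch; the endpoint and token source are illustrative, and in production the generator proxies Claude's streaming API:

import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream(prompt: str):
    # Illustrative tokens; production forwards chunks from the model stream.
    for token in ("def ", "hello", "():", " ..."):
        yield f"data: {token}\n\n"  # one SSE frame: "data:" line + blank line
        await asyncio.sleep(0)  # cede control so each frame flushes immediately

@app.get("/complete")
async def complete(q: str) -> StreamingResponse:
    return StreamingResponse(token_stream(q), media_type="text/event-stream")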
Beta launch
Onboarded 14 design partners. Set up usage telemetry and error tracking. Iterating weekly based on partner feedback.
What it does. How it's built.
Features
- AST-aware repo indexing via Tree-sitter (12 languages)
- Symbol graph + dependency map for cross-file reasoning (traversal sketch after this list)
- Semantic search via pgvector (HNSW index)
- Prompt caching aligned with Anthropic's 5-min TTL
- VSCode + JetBrains plugins
- Streaming SSE responses for sub-second time-to-first-token
- Per-team prompt templates
- Local-first index option (no code leaves your machine)
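The symbol-graph expansion referenced above is, at its core, a depth-capped BFS over def/use edges. A hypothetical in-memory sketch that just shows the traversal (the real graph lives in Postgres):

from collections import deque

def expand_symbols(graph: dict[str, set[str]], seeds: list[str],
                   depth: int = 2) -> set[str]:
    """BFS from seed symbols across def/use edges, up to `depth` hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        symbol, d = frontier.popleft()
        if d == depth:
            continue  # depth cap: don't expand past the hop budget
        for neighbor in graph.get(symbol, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen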
Architecture
1. FastAPI gateway with Pydantic validation
2. Tree-sitter parsers for 12 languages (Python, TS, Go, Rust, Java, ...)
3. PostgreSQL + pgvector for embeddings (HNSW for fast recall; query sketch below)
4. Redis for prompt cache + session state
5. Claude API: Sonnet for code, Haiku for routing decisions
6. React + TanStack Query frontend
7. VSCode extension built on the LSP
8. Local indexer in Rust for the local-first option
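The pgvector path from item 3, sketched with asyncpg and the pgvector-python helper. Table, column names, and the 1024-dim embedding width are placeholders; <=> is pgvector's cosine-distance operator:

import asyncpg
import numpy as np
from pgvector.asyncpg import register_vector

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id TEXT PRIMARY KEY,
    repo TEXT NOT NULL,
    embedding vector(1024)
);
CREATE INDEX IF NOT EXISTS chunks_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

async def vector_search(dsn: str, repo: str,
                        embedding: list[float], k: int):
    conn = await asyncpg.connect(dsn)
    await register_vector(conn)  # teaches asyncpg the vector type
    try:
        return await conn.fetch(
            """
            SELECT id, embedding <=> $1 AS distance
            FROM chunks
            WHERE repo = $2
            ORDER BY embedding <=> $1
            LIMIT $3
            """,
            np.array(embedding, dtype=np.float32), repo, k,
        )
    finally:
        await conn.close()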
Annotated excerpts.
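Hybrid retrieval in one pass: vector search seeds a symbol-graph walk, then a cross-encoder reranks the merged pool. Helpers (embed, dedupe, rerank) and the RepoIndex/CodeChunk types are defined elsewhere and elided here.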
async def retrieve_context(
    query: str,
    repo: RepoIndex,
    k: int = 12,
) -> list[CodeChunk]:
    embedding = await embed(query)
    # Vector search across chunks; over-fetch so the reranker has options
    semantic = await repo.vector_search(embedding, k=k * 2)
    # Walk symbol graph from top hits
    graph_hits = await repo.expand_symbols(
        seeds=[h.symbol_id for h in semantic[:5]],
        depth=2,
    )
    # Rerank with cross-encoder
    merged = dedupe(semantic + graph_hits)
    reranked = await rerank(query, merged)
    return reranked[:k]

The prompt cache mirrors Anthropic's 5-minute prompt-cache window in Redis, the idea being that a local hit lands while the upstream prefix cache is still warm:

from typing import Awaitable, Callable

class PromptCache:
    """Cache aligned with Anthropic's 5-minute TTL."""

    TTL_SECONDS = 5 * 60

    async def get_or_compute(
        self,
        key: PromptKey,
        compute: Callable[[], Awaitable[Response]],
    ) -> CachedResponse:
        # self.redis is a redis.asyncio client injected by the constructor (elided)
        cached = await self.redis.get(key.fingerprint)
        if cached and not self._stale(cached):
            # Rehydrate the pydantic model from the stored JSON
            return CachedResponse(
                response=Response.model_validate_json(cached),
                cache_hit=True,
            )
        response = await compute()
        await self.redis.setex(
            key.fingerprint,
            self.TTL_SECONDS,
            response.model_dump_json(),
        )
        return CachedResponse(response=response, cache_hit=False)

“He integrated Claude into our editor and the cache hit rate is sitting at 78%. Token bill dropped by more than half month-over-month. Excellent communicator — async-friendly, no theatrics, just delivery.”
Have a project like this in mind? Let's talk.
Send me a brief and I'll respond within 24 hours.