I build production-grade RAG systems, knowledge-graph-grounded AI, and multi-agent pipelines for teams that need their AI to be accurate, not just impressive in a demo.
My background is unusual: I'm a PhD researcher in Computer Science at a top-rated university, and my dissertation is a formal theory of when knowledge-grounded LLMs work and when they structurally fail. That means I don't just wire up RAG; I can tell you, at a complexity-theoretic level, which retrieval architectures will hold up on multi-hop reasoning over your domain and which will quietly hallucinate under load.
What I help clients with:
RAG pipelines: multi-document ingestion, hybrid search (dense + sparse), retrieval quality evaluation, and grounding guarantees over domain-specific corpora, including clinical and structured (FHIR / knowledge graph) data.
Multi-agent orchestration: LangChain, LangGraph, CrewAI, and the Claude Agent SDK, with a working understanding of agent memory, tool use, FSM-based control flow, and failure modes. I've built and evaluated production multi-agent systems hitting 92%+ accuracy on clinical reasoning benchmarks.
Knowledge graph + LLM integration: Neo4j, Cypher, FHIR, biomedical KGs (UMLS, PrimeKG). If your data has structure, I'll use it instead of flattening it into chunks.
AI integration: connecting LLMs (OpenAI, Claude, Gemini, Llama via Ollama) to your databases, APIs, and business logic via Python and FastAPI, or Node.js when needed.
Evaluation & auditing: if your AI system is hallucinating, drifting, or failing silently, I diagnose the root cause and design rigorous evaluation frameworks. I've built mutation-based benchmarks evaluating eight LLMs across thousands of test cases.
AI security & robustness: secure code generation, vulnerability evaluation, and prompt-injection-aware system design.
I work best on projects where precision matters — healthcare, legal, finance, regulated enterprise, scientific tooling — domains where "close enough" isn't acceptable and you need someone who can defend the architectural choices when auditors or regulators ask.
Tech stack: Python · PyTorch · LangChain / LangGraph / CrewAI · Claude Agent SDK · OpenAI / Anthropic / Gemini APIs · Claude Code · Ollama · Pinecone / pgvector / Weaviate / Qdrant · FastAPI · Neo4j · FHIR · n8n · Node.js · React