Projects
Present
House Proceedings Corpus
A multi-service search and analysis platform for large-scale congressional text, indexing over 181 million tokens across 2,700+ U.S. House transcripts. A FastAPI gateway routes queries, Kafka decouples ingest pipelines, Elasticsearch powers full-text and faceted search, and Neo4j stores entity graphs linking speakers, bills, committees, and topics. A spaCy-based NLP pipeline extracts named entities, noun phrases, and dependency structures at ingest time. The system follows a clean layered architecture with the repository pattern, dependency injection, and full observability via Prometheus metrics, Grafana dashboards, and structured JSON logging.
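As a rough illustration of the repository pattern with constructor-based dependency injection mentioned above, here is a minimal sketch. The names `Transcript`, `TranscriptRepository`, `InMemoryTranscriptRepository`, and `SearchService` are hypothetical, and the in-memory store is only a stand-in for a real search backend; this is not the project's actual code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    speaker: str
    text: str


class TranscriptRepository(Protocol):
    """Storage interface the service layer depends on (hypothetical)."""

    def add(self, transcript: Transcript) -> None: ...

    def search(self, term: str) -> list[Transcript]: ...


class InMemoryTranscriptRepository:
    """In-memory stand-in for a search-engine-backed repository."""

    def __init__(self) -> None:
        self._items: dict[str, Transcript] = {}

    def add(self, transcript: Transcript) -> None:
        self._items[transcript.transcript_id] = transcript

    def search(self, term: str) -> list[Transcript]:
        # Naive case-insensitive substring match, for illustration only.
        needle = term.lower()
        return [t for t in self._items.values() if needle in t.text.lower()]


class SearchService:
    """Service layer that receives its repository through the constructor."""

    def __init__(self, repo: TranscriptRepository) -> None:
        self._repo = repo

    def speakers_mentioning(self, term: str) -> list[str]:
        return sorted({t.speaker for t in self._repo.search(term)})


repo = InMemoryTranscriptRepository()
repo.add(Transcript("t1", "Rep. A", "Debate on the appropriations bill."))
repo.add(Transcript("t2", "Rep. B", "Remarks on water infrastructure."))
service = SearchService(repo)
print(service.speakers_mentioning("bill"))
```

Because `SearchService` only sees the `TranscriptRepository` interface, a test can pass in the in-memory version while production wiring supplies a different implementation.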
Enterprise Search & Ingestion Platform
A large-scale semantic search and image ingestion platform for enterprise workloads. The search API blends dense vector nearest-neighbor retrieval via Qdrant with keyword and metadata filters, delivering sub-200ms query latency. Circuit breakers, retry logic with exponential backoff, and Redis caching provide fault tolerance and graceful degradation. The ingestion engine moves images from S3 through CLIP embedding generation on distributed Ray/GPU clusters, processing over 400 million images in under 24 hours. Batch-level fault isolation, dead-letter tracking, and end-of-run reconciliation ensure zero-error completion across multi-million-image datasets. Full observability spans the pipeline via Prometheus and structured logging.
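The fault-tolerance pieces named above can be sketched roughly as follows. This is a simplified illustration, not the platform's implementation: `CircuitBreaker`, `CircuitOpenError`, `retry_with_backoff`, and all thresholds and delays are assumed names and values, and jitter and caching are left out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


def retry_with_backoff(fn: Callable[[], T], attempts: int = 4,
                       base_delay: float = 0.1, factor: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn, multiplying the wait by `factor` after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep retrying against an open circuit
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= factor
    raise RuntimeError("unreachable when attempts >= 1")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []  # record waits instead of actually sleeping
breaker = CircuitBreaker(failure_threshold=5)
result = retry_with_backoff(lambda: breaker.call(flaky), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting; a production version would typically add random jitter to the delays.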
Past
PerceptivePanda
PerceptivePanda was an AI-native customer research platform that replaced traditional human-led interviews with AI-driven micro-interviews at scale. I co-founded the company and served as CTO. The core contribution was a deterministic control layer wrapping LLMs using a dialogue state framework grounded in discourse theory from my work at Stanford—particularly the Questions Under Discussion model. The system tracked addressed questions, identified threads for deeper probing, and orchestrated LLM calls through a structured state machine, ensuring coherent analytical conversations while preserving natural fluidity. PerceptivePanda was a StartX ’24 company, acquired by Zapier in January 2026.
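To give a flavor of what a deterministic control layer around an LLM can look like, here is a toy sketch that keeps open questions on a stack, in the spirit of Questions Under Discussion. It is not PerceptivePanda's implementation: `DialogueState`, `step`, and `fake_probes` are hypothetical, and the LLM is replaced by a plain function that proposes follow-up questions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DialogueState:
    """Open questions kept as a stack (current question on top)."""
    qud: list[str] = field(default_factory=list)
    addressed: list[str] = field(default_factory=list)

    def push(self, question: str) -> None:
        # Never reopen a question that is already open or already addressed.
        if question not in self.qud and question not in self.addressed:
            self.qud.append(question)

    def current(self) -> Optional[str]:
        return self.qud[-1] if self.qud else None

    def resolve_current(self) -> None:
        self.addressed.append(self.qud.pop())


# Hypothetical stand-in for an LLM call: given a question and its answer,
# return follow-up questions worth probing (possibly none).
ProbeFn = Callable[[str, str], list[str]]


def step(state: DialogueState, answer: str,
         propose_probes: ProbeFn) -> Optional[str]:
    """Consume an answer to the current question; return the next question."""
    question = state.current()
    if question is None:
        return None
    state.resolve_current()
    # Push in reverse so the first proposed probe ends up on top.
    for probe in reversed(propose_probes(question, answer)):
        state.push(probe)
    return state.current()


def fake_probes(question: str, answer: str) -> list[str]:
    if question == "What frustrates you most?" and "onboarding" in answer:
        return ["Which onboarding step was hardest?"]
    return []


state = DialogueState()
for q in reversed(["What frustrates you most?", "How do you measure success?"]):
    state.push(q)

first = step(state, "The onboarding flow", fake_probes)
second = step(state, "Connecting data sources", fake_probes)
third = step(state, "Weekly active users", fake_probes)
print(first, second, third, sep="\n")
```

The model only suggests probes; which question is asked next, and when the interview ends, is decided by the state object, which is what makes the flow deterministic.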
ClearGraph / Tableau Ask Data
At ClearGraph, I architected the natural language querying technology that became Tableau Ask Data after Tableau’s 2017 acquisition. I led integration and scaling, growing the team to 30+ engineers and deploying to hundreds of thousands of users. The system—designed before BERT existed—used Montague Grammar and compositional semantics from my Stanford research. User utterances were parsed via context-free grammar rules into a formal intermediate representation, resolving ambiguity, underspecification, and implicit context at each derivation stage. The work produced 9 issued patents covering NLQ architecture, intent inference, cascading edits, table calculations, and incremental visual feedback:
- US-11550853-B2 — Table calculations via natural language
- US-11314817-B1 — Intent inference and context for NL expressions
- US-11301631-B1 — Visual correlation of NL terms to structured phrases
- US-11244114-B2 — Analyzing underspecified NL utterances
- US-11055489-B2 — Levels of detail via NL constructs
- US-11048871-B2 — Analyzing NL expressions in data visualization
- US-10902045-B2 — NL interface with cascading filter edits
- US-20220253481-A1 — Inferring intent for NL in data visualization
- US-20210319186-A1 — Using NL constructs for data visualizations
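The grammar-driven parsing described above, from utterance to a formal intermediate representation, can be illustrated with a toy example. The grammar, vocabulary, `Query` type, and the default of `SUM` for a missing aggregation are all assumptions made for this sketch; the real system's grammar and semantics were far richer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary for the toy grammar.
AGGREGATIONS = {"sum": "SUM", "total": "SUM", "average": "AVG", "mean": "AVG"}
MEASURES = {"sales", "profit"}
DIMENSIONS = {"region", "category"}


@dataclass(frozen=True)
class Query:
    """Intermediate representation of a parsed utterance."""
    aggregation: str
    measure: str
    group_by: Optional[str]
    inferred: tuple[str, ...]  # fields filled by default rather than stated


def parse(utterance: str) -> Query:
    """Toy grammar: Query -> [Agg] ["of"] Measure ["by" Dimension]"""
    tokens = utterance.lower().split()
    pos = 0
    inferred: list[str] = []

    # Agg is optional; an underspecified utterance falls back to SUM.
    if pos < len(tokens) and tokens[pos] in AGGREGATIONS:
        aggregation = AGGREGATIONS[tokens[pos]]
        pos += 1
    else:
        aggregation = "SUM"
        inferred.append("aggregation")

    if pos < len(tokens) and tokens[pos] == "of":
        pos += 1

    if pos >= len(tokens) or tokens[pos] not in MEASURES:
        raise ValueError(f"expected a measure at token {pos}")
    measure = tokens[pos]
    pos += 1

    group_by = None
    if pos < len(tokens) and tokens[pos] == "by":
        pos += 1
        if pos >= len(tokens) or tokens[pos] not in DIMENSIONS:
            raise ValueError(f"expected a dimension at token {pos}")
        group_by = tokens[pos]
        pos += 1

    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return Query(aggregation, measure, group_by, tuple(inferred))


print(parse("average of sales by region"))
print(parse("profit by category"))
```

Recording which fields were inferred, rather than silently filling them in, leaves room for an interface to show the user how an underspecified request was interpreted.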
Partial Order Optimality Theory
Companion implementation to “A constructive solution to the ranking problem in Partial Order Optimality Theory” (Journal of Logic, Language & Information, 2017). Classical OT assumes grammars are strict total orders over constraints, making it impossible to model free variation. PoOT generalizes to arbitrary partial orders, but the ranking problem—finding all compatible grammars from observed data—was unsolved. The paper provides an exact set-theoretic construction exploiting the lattice structure of partial orders. The codebase generates the full lattice, computes grammar sets via winner/loser set intersection, identifies harmonically bounded candidates, and derives candidate entailments. Includes Finnish vowel coalescence and case datasets.
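For intuition, the sketch below evaluates a tiny partial-order grammar by brute force: it enumerates the total rankings that extend a partial order and collects the candidates that win under at least one of them. This is deliberately not the paper's set-theoretic construction, and the constraint names and violation profiles are invented for illustration.

```python
from itertools import permutations

# Hypothetical violation profiles: candidate -> violations per constraint.
CANDIDATES = {
    "cand_a": {"C1": 0, "C2": 1, "C3": 0},
    "cand_b": {"C1": 1, "C2": 0, "C3": 0},
    "cand_c": {"C1": 1, "C2": 1, "C3": 1},
}
CONSTRAINTS = ("C1", "C2", "C3")


def linear_extensions(partial_order):
    """Yield every total ranking consistent with (higher, lower) pairs."""
    for ranking in permutations(CONSTRAINTS):
        index = {c: i for i, c in enumerate(ranking)}
        if all(index[hi] < index[lo] for hi, lo in partial_order):
            yield ranking


def winner(ranking):
    """Classical OT evaluation: compare violation vectors lexicographically."""
    return min(CANDIDATES, key=lambda c: tuple(CANDIDATES[c][k] for k in ranking))


def outputs(partial_order):
    """Candidates that win under at least one linear extension."""
    return {winner(r) for r in linear_extensions(partial_order)}


def harmonically_bounded():
    """Candidates that win under no total ranking at all."""
    return set(CANDIDATES) - outputs(set())


print(sorted(outputs({("C1", "C2")})))  # C1 ranked above C2: a single output
print(sorted(outputs({("C3", "C1")})))  # C1 and C2 unranked: free variation
print(sorted(harmonically_bounded()))
```

Enumeration grows factorially with the number of constraints, which is one reason an exact construction over the lattice of partial orders is attractive for anything beyond toy examples.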
Doctoral Dissertation: On Adjectival Comparatives
The syntax and semantics of ordinary comparative constructions in English. PhD Dissertation, Stanford University, 2015. The dissertation argues that standard degree-based analyses of comparatives are fundamentally flawed—requiring abstract degrees as first-class objects and covert operators to handle scope. The alternative framework, grounded in Barker and Shan’s continuation semantics and Muskens’ simplified Montague logic, computes comparative meaning compositionally through continuation-passing without positing degrees or hidden structure. The account covers phrasal, clausal, sub-, and differential comparatives, and resolves long-standing puzzles around scope interactions between comparatives and quantifiers.