Projects

Present

House Proceedings Corpus

Python FastAPI Kafka Elasticsearch Neo4j spaCy Docker Prometheus Grafana

A multi-service search and analysis platform for large-scale congressional text, indexing over 181 million tokens across 2,700+ U.S. House transcripts. A FastAPI gateway routes queries, Kafka decouples ingest pipelines, Elasticsearch powers full-text and faceted search, and Neo4j stores entity graphs linking speakers, bills, committees, and topics. A spaCy-based NLP pipeline extracts named entities, noun phrases, and dependency structures at ingest time. The system follows a clean layered architecture with the repository pattern and dependency injection, and provides full observability via Prometheus metrics, Grafana dashboards, and structured JSON logging.
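To illustrate the repository pattern with dependency injection mentioned above, here is a minimal sketch. It is not taken from the project: the names `Transcript`, `TranscriptRepository`, `InMemoryTranscriptRepository`, and `SearchService` are hypothetical, and the in-memory store stands in for what would presumably be an Elasticsearch-backed implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    speaker: str
    text: str


class TranscriptRepository(Protocol):
    """Storage interface the service layer depends on (hypothetical)."""

    def add(self, transcript: Transcript) -> None: ...

    def search(self, term: str) -> list[Transcript]: ...


class InMemoryTranscriptRepository:
    """Stand-in for a search-engine-backed repository; handy in tests."""

    def __init__(self) -> None:
        self._items: dict[str, Transcript] = {}

    def add(self, transcript: Transcript) -> None:
        self._items[transcript.transcript_id] = transcript

    def search(self, term: str) -> list[Transcript]:
        # Naive case-insensitive substring match, for illustration only.
        needle = term.lower()
        return [t for t in self._items.values() if needle in t.text.lower()]


class SearchService:
    """Service layer: the repository is injected through the constructor."""

    def __init__(self, repository: TranscriptRepository) -> None:
        self._repository = repository

    def speakers_mentioning(self, term: str) -> list[str]:
        return sorted({t.speaker for t in self._repository.search(term)})
```

Because `SearchService` only knows the `TranscriptRepository` interface, a test can pass in the in-memory version while a deployment passes in a real backend, with no change to the service code.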

View on GitHub

Enterprise Search & Ingestion Platform

Python FastAPI Ray Spark Databricks CLIP S3 Qdrant Redis GPU Docker

A large-scale semantic search and image ingestion platform for enterprise workloads. The search API blends dense vector nearest-neighbor retrieval via Qdrant with keyword and metadata filters, delivering sub-200ms query latency. Circuit breakers, retry logic with exponential backoff, and Redis caching provide fault tolerance and graceful degradation. The ingestion engine moves images from S3 through CLIP embedding generation on distributed Ray/GPU clusters, processing over 400 million images in under 24 hours. Batch-level fault isolation, dead-letter tracking, and end-of-run reconciliation ensure zero-error completion across multi-million-image datasets. Full observability spans the pipeline via Prometheus and structured logging.
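The retry logic with exponential backoff can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's code: `retry_with_backoff` and its parameters are hypothetical names, the jitter strategy is my own choice, and the circuit breaker and Redis cache are left out.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles per attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter below the ceiling keeps clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting. A production version would usually retry only on errors known to be transient rather than on every exception.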

Past

PerceptivePanda

Python LLMs Discourse Theory Dialogue State Management FastAPI React

PerceptivePanda was an AI-native customer research platform that replaced traditional human-led interviews with AI-driven micro-interviews at scale. I co-founded the company and served as CTO. The core contribution was a deterministic control layer wrapping LLMs using a dialogue state framework grounded in discourse theory from my work at Stanford—particularly the Questions Under Discussion model. The system tracked addressed questions, identified threads for deeper probing, and orchestrated LLM calls through a structured state machine, ensuring coherent analytical conversations while preserving natural fluidity. PerceptivePanda was a StartX ’24 company, acquired by Zapier in January 2026.
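As a rough illustration of the dialogue-state idea, the sketch below keeps open questions on a stack, in the spirit of the Questions Under Discussion model. It is a toy, not the PerceptivePanda implementation: `Question`, `DialogueState`, and `record_answer` are hypothetical names, and the follow-up probes are supplied by the caller rather than identified automatically.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Question:
    text: str
    addressed: bool = False


@dataclass
class DialogueState:
    """Open questions kept as a stack; the top is the question under discussion."""

    open_questions: list[Question] = field(default_factory=list)
    history: list[Question] = field(default_factory=list)

    @classmethod
    def from_agenda(cls, agenda: Sequence[str]) -> "DialogueState":
        # Reverse so the first agenda item ends up on top of the stack.
        return cls(open_questions=[Question(q) for q in reversed(agenda)])

    def current(self) -> Optional[Question]:
        return self.open_questions[-1] if self.open_questions else None

    def record_answer(self, probes: Sequence[str] = ()) -> None:
        """Close the current question, then push follow-up probes above the rest."""
        if not self.open_questions:
            raise ValueError("no open question to answer")
        question = self.open_questions.pop()
        question.addressed = True
        self.history.append(question)
        # Reverse so the first probe is asked first.
        for probe in reversed(probes):
            self.open_questions.append(Question(probe))
```

Starting from a two-item agenda, answering the first question with one probe makes that probe the current question. Once the probe is answered, the state returns to the second agenda item. The language model only words each turn, while the order of questions stays deterministic.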

ClearGraph / Tableau Ask Data

NLP Montague Grammar Context-Free Grammar Formal Semantics Java Python Elasticsearch

At ClearGraph, I architected the natural language querying technology that became Tableau Ask Data after Tableau’s 2017 acquisition. I led integration and scaling, growing the team to 30+ engineers and deploying to hundreds of thousands of users. The system, designed before BERT existed, used Montague Grammar and compositional semantics from my Stanford research. User utterances were parsed via context-free grammar rules into a formal intermediate representation, resolving ambiguity, underspecification, and implicit context at each derivation stage. The work produced 9 patent filings (7 issued patents and 2 published applications) covering NLQ architecture, intent inference, cascading edits, table calculations, and incremental visual feedback:

US-11550853-B2 — Table calculations via natural language
US-11314817-B1 — Intent inference and context for NL expressions
US-11301631-B1 — Visual correlation of NL terms to structured phrases
US-11244114-B2 — Analyzing underspecified NL utterances
US-11055489-B2 — Levels of detail via NL constructs
US-11048871-B2 — Analyzing NL expressions in data visualization
US-10902045-B2 — NL interface with cascading filter edits
US-20220253481-A1 — Inferring intent for NL in data visualization
US-20210319186-A1 — Using NL constructs for data visualizations
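To give a flavor of grammar-driven parsing into an intermediate representation, here is a deliberately tiny sketch. It is not the Ask Data grammar and is far simpler than a compositional semantic analysis: the toy grammar, the `QueryIR` type, and the default-aggregation rule for underspecified utterances are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

AGGREGATIONS = {"sum", "average", "count"}


@dataclass(frozen=True)
class QueryIR:
    """Toy intermediate representation of an analytical query."""

    aggregation: str
    measure: str
    dimension: Optional[str] = None


def parse(utterance: str, default_aggregation: str = "sum") -> QueryIR:
    """Recursive-descent parse for the toy grammar:

    Query    -> (Agg "of")? Measure ByClause?
    ByClause -> "by" Dimension
    """
    tokens = utterance.lower().split()
    pos = 0

    # Query -> (Agg "of")? ...
    if len(tokens) >= 2 and tokens[0] in AGGREGATIONS and tokens[1] == "of":
        aggregation = tokens[0]
        pos = 2
    else:
        # Underspecified utterance: fall back to a default aggregation.
        aggregation = default_aggregation

    # ... Measure
    if pos >= len(tokens) or tokens[pos] == "by":
        raise ValueError(f"expected a measure in {utterance!r}")
    measure = tokens[pos]
    pos += 1

    # ... ByClause?
    dimension = None
    if pos < len(tokens):
        if tokens[pos] != "by" or pos + 1 >= len(tokens):
            raise ValueError(f"expected 'by <dimension>' in {utterance!r}")
        dimension = tokens[pos + 1]
        pos += 2

    if pos != len(tokens):
        raise ValueError(f"unexpected trailing input in {utterance!r}")
    return QueryIR(aggregation, measure, dimension)
```

Under this grammar, "average of profit by region" parses with an explicit aggregation, while "sales by region" is underspecified and picks up the default. The intermediate representation stays separate from whatever query language runs against the data.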

Partial Order Optimality Theory

Python Order Theory Lattice Theory Combinatorics

Companion implementation to “A constructive solution to the ranking problem in Partial Order Optimality Theory” (Journal of Logic, Language & Information, 2017). Classical OT assumes grammars are strict total orders over constraints, making it impossible to model free variation. PoOT generalizes to arbitrary partial orders, but the ranking problem—finding all compatible grammars from observed data—was unsolved. The paper provides an exact set-theoretic construction exploiting the lattice structure of partial orders. The codebase generates the full lattice, computes grammar sets via winner/loser set intersection, identifies harmonically bounded candidates, and derives candidate entailments. Includes Finnish vowel coalescence and case datasets.
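For readers unfamiliar with the setup, the sketch below shows how a partial-order grammar predicts free variation, and how simple harmonic bounding can be checked. It is a brute-force illustration, not the paper's set-theoretic construction or the companion codebase: it enumerates total rankings directly, it runs from grammar to outputs rather than solving the ranking problem, and the function names and toy data are hypothetical.

```python
from itertools import permutations


def linear_extensions(constraints, partial_order):
    """Yield every total ranking consistent with the (higher, lower) pairs."""
    for ranking in permutations(constraints):
        position = {c: i for i, c in enumerate(ranking)}
        if all(position[hi] < position[lo] for hi, lo in partial_order):
            yield ranking


def winner(ranking, violations):
    """Classical OT evaluation: compare violation profiles lexicographically."""
    return min(violations, key=lambda cand: tuple(violations[cand][c] for c in ranking))


def predicted_outputs(constraints, partial_order, violations):
    """Candidates that win under at least one total refinement of the grammar."""
    return {winner(r, violations) for r in linear_extensions(constraints, partial_order)}


def harmonically_bounded(violations):
    """Candidates that some single rival beats or ties on every constraint,
    and strictly beats on at least one (simple harmonic bounding only)."""
    bounded = set()
    for a, va in violations.items():
        for b, vb in violations.items():
            if a == b:
                continue
            if all(vb[c] <= va[c] for c in va) and any(vb[c] < va[c] for c in va):
                bounded.add(a)
    return bounded


# Toy tableau: violation counts per candidate and constraint.
VIOLATIONS = {
    "x": {"A": 0, "B": 1, "C": 0},
    "y": {"A": 0, "B": 0, "C": 1},
    "z": {"A": 1, "B": 1, "C": 1},
}
```

With only A ranked above B, the grammar has three total refinements and predicts both `x` and `y`, which is free variation. Adding B above C leaves a single total order and only `y`. Candidate `z` is harmonically bounded and can never win. Enumerating permutations is factorial in the number of constraints, so this approach only suits small examples.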

View on GitHub

Doctoral Dissertation: On Adjectival Comparatives

Formal Semantics Continuation Semantics Montague Grammar Type Theory Syntax

The syntax and semantics of ordinary comparative constructions in English. PhD Dissertation, Stanford University, 2015. The dissertation argues that standard degree-based analyses of comparatives are fundamentally flawed—requiring abstract degrees as first-class objects and covert operators to handle scope. The alternative framework, grounded in Barker and Shan’s continuation semantics and Muskens’ simplified Montague logic, computes comparative meaning compositionally through continuation-passing without positing degrees or hidden structure. The account covers phrasal, clausal, sub-, and differential comparatives, and resolves long-standing puzzles around scope interactions between comparatives and quantifiers.

Read the dissertation