AI Dev Lab  ·  Impact Series  ·  Profile 2
Conversational AI

David Collen

Why most conversational AI underperforms. What decades of building systems that actually understand people reveal about the limits of machine learning orthodoxy. David has been doing the hard version of this problem since before most people in AI were paying attention.

The Core Insight

Machine learning is not the only branch of the AI tree. For conversational systems, it's often not the best one. The teams that outperform are the ones willing to question what everyone else has decided is obvious.

Three decades of doing the hard version

David Collen was an architect in San Francisco before he accidentally started a software company in the mid-1990s. His team put the first 3D model on the internet, a moment that pulled them into a cascade of adjacent problems: virtual environments, conversational characters, early voice systems.

In 2003, the intelligence community invited his team to participate in a research initiative called NIMD (Novel Intelligence from Massive Data), which assembled 18 of the top AI companies in the world to identify patterns in large data sets. That project introduced Collen to essentially every major AI research thread of the era and gave him a comprehensive map of which approaches worked for which kinds of problems.

From there: soldier tracking systems for the Army, the first indoor-outdoor positioning system (later productized into navigation systems for automotive and handheld devices), and eventually, in 2008, after AI pioneer Bruce Wilcox joined the team, what may have been the first voice assistant for an automotive navigation platform.

The three buckets of conversational AI

Collen maps the entire conversational AI landscape into three distinct approaches. Understanding them is essential for any organization trying to evaluate what it's actually buying when it invests in conversational AI.

Keyword-based systems (chatbots): These have existed since 1965 and haven't evolved significantly. They pattern-match on trigger words. They work for narrow, predictable flows and fall apart when users go off-script; the sketch after these three buckets makes the failure mode concrete. When COVID-19 stress-tested them at scale, most failed visibly.

Large-scale machine learning: This is what Google, Amazon, Microsoft, and Apple have invested in. These systems are expensive to build, require enormous amounts of training data, and are almost entirely cloud-dependent. Collen makes a pointed observation: the reason Alexa and Google Assistant exist is to harvest user data, not to understand people better. That's a fundamentally different design goal than building a system that actually serves the user.

Symbolic reasoning: This is SapientX's approach. Rather than pattern-matching or training on massive datasets, symbolic systems parse the grammatical structure of language (the same part-of-speech analysis your seventh-grade English teacher used) and reason about meaning from constituent parts. The result is a system that runs on 2% of an Android phone's CPU and achieves 99% intent accuracy across all languages they test before deployment.
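
To make the contrast between the first and third buckets concrete, here is a minimal sketch, not SapientX's engine, of keyword triggering versus a shallow grammatical parse. The spaCy pipeline, the intent label, and the trigger words are illustrative assumptions:

    # Toy contrast: bucket 1 (keyword triggers) vs. bucket 3 (parsing
    # grammatical structure). Illustrative only; SapientX's actual
    # symbolic engine is proprietary and far richer than this.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English POS/dependency model

    def keyword_intent(text: str) -> str:
        # Bucket 1: fire on a trigger word, ignore sentence structure.
        return "lights_on" if "lights" in text.lower() else "unknown"

    def symbolic_intent(text: str) -> str:
        # Bucket 3 (sketch): find a verb, inspect its direct object,
        # and reason about the verb/object pair instead of raw strings.
        for token in nlp(text):
            if token.pos_ == "VERB":
                objects = [c.lemma_ for c in token.children if c.dep_ == "dobj"]
                if token.lemma_ in ("turn", "switch") and "light" in objects:
                    return "lights_on"
        return "unknown"

    print(keyword_intent("I like the lights at the mall"))   # lights_on (false positive)
    print(symbolic_intent("I like the lights at the mall"))  # unknown (parse rejects it)
    print(symbolic_intent("Turn on the lights"))             # lights_on

The keyword matcher fires on any sentence containing "lights"; the parse-based version acts only when the grammar says an action is being requested, which is the property that keeps symbolic systems predictable at runtime.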

When adding machine learning makes things worse

One of the more counterintuitive findings Collen shared: adding more machine learning to an existing system doesn't always improve it. In some cases, it makes it measurably worse.

He cited internal data from Amazon's Echo and Alexa teams showing that as they added machine learning layers to what had started as a rule-based system, performance declined year over year. Latency increased. Response quality dropped. The system became harder to reason about and harder to improve because the machine learning components introduced opacity into what had previously been predictable behavior.

The lesson isn't that machine learning is bad: it's that architectural decisions have compounding consequences, and the dominant narrative that more data and bigger models always lead to better outcomes is simply not true in complex, multi-component systems.

What the Yamaha senior care project reveals

SapientX was working with Yamaha on a voice assistant for senior care: a companion designed to handle smart home tasks, monitor cognitive and health patterns through passive and active conversation, and adapt its communication style based on what it learns about the individual user over time.

That deployment is the most useful frame for evaluating any conversational AI system. The user population can't learn a command syntax. They won't troubleshoot API errors. They need a system that understands natural language in all its inconsistency, runs reliably without a cloud dependency, and fails gracefully when it encounters something it doesn't recognize.
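
In engineering terms, "fails gracefully" means the system knows when it doesn't know and degrades to a clarifying question rather than a wrong action. A minimal sketch of that pattern, with hypothetical confidence thresholds and intent labels (none of this is from SapientX's codebase):

    from dataclasses import dataclass

    @dataclass
    class IntentResult:
        intent: str        # e.g. "lights_on"; labels here are hypothetical
        confidence: float  # 0.0-1.0, from whatever on-device understanding layer runs

    CONFIDENCE_FLOOR = 0.85  # assumed value; a real deployment tunes this per user population

    def respond(result: IntentResult) -> str:
        # Never act on a low-confidence guess: for a senior-care user,
        # a wrong action is worse than a follow-up question.
        if result.confidence >= CONFIDENCE_FLOOR:
            return f"executing: {result.intent}"
        if result.confidence >= 0.5:
            # Middle band: confirm instead of guessing.
            return f"Did you want me to {result.intent.replace('_', ' ')}?"
        # Bottom band: admit the miss in plain language, no error codes.
        return "I didn't catch that. Could you say it another way?"

    print(respond(IntentResult("lights_on", 0.95)))  # acts
    print(respond(IntentResult("lights_on", 0.70)))  # asks for confirmation
    print(respond(IntentResult("unknown", 0.20)))    # graceful fallback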

The constraints of that deployment reveal the architecture that actually matters. The best AI for a specific use case is not always the one with the most parameters or the largest training set. It's the one designed with the actual environment and user in mind from the beginning.

Key Facts
01

Put the first 3D model on the internet in the mid-1990s. Spent decades in voice, conversational AI, and spatial computing before any of it was mainstream.

02

Participated in NIMD (Novel Intelligence from Massive Data), a 2003 intelligence community research initiative with 18 top AI companies.

03

SapientX's symbolic reasoning systems achieve 99% intent accuracy across every language they support, tested and validated before release.

04

Runs on 2% of an Android phone's CPU. No cloud dependency. Works fully offline.

05

Working with Yamaha on a senior care companion that monitors cognitive health and adapts conversational style to individual users over time.

"Machine learning is not the be-all and end-all. If you crack open an AI textbook, there are many different branches in the tree. Some approaches are better at one thing than another."

David Collen  ·  Founder, SapientX

What This Means For Your Organization

Lessons that travel beyond the story.

01
Know which bucket your AI vendor is in.

Keyword-based, machine learning, or symbolic reasoning: each has different performance profiles, infrastructure requirements, and failure modes. The wrong architecture for your use case won't improve with more data or fine-tuning. It'll just fail faster at scale.

02
Machine learning orthodoxy has real costs.

The assumption that bigger models always win is not well-supported by production evidence. Adding ML layers to existing systems can increase latency, introduce opacity, and reduce reliability. Evaluate AI tools on production performance, not benchmark scores.

03
Design for the user who won't adapt to the technology.

The most rigorous test of a conversational AI system is deploying it to users who cannot and will not learn commands. If it only works for cooperative, tech-comfortable users, it's not production-ready. Design constraints reveal architecture quality.

Work with a team that knows the landscape.

The principles David followed are the same ones we bring to every AI engagement at AI Dev Lab. If you're building AI products or figuring out where AI fits, we should talk.

Let's Talk  ·  Read Original Interview
No commitment  ·  30 minutes  ·  Senior leadership
Jason Wells
Co-Founder & Chief Strategy Officer, AI Dev Lab
MBA, Wharton  ·  MS, Applied Mathematics  ·  Former SVP, Sony Pictures  ·  Kearney Alum  ·  4× Ironman
Building AI products in transit and enterprise since before it was a pitch deck category.