The gap nobody was filling
When Dylan Fox was working on a machine learning team at Cisco, his team needed speech recognition. They tried to license it from the established players, including Nuance and others who had dominated the space for years. The experience was, in his words, unexpectedly painful. Documentation was thin. Support was slow. The underlying technology had fallen behind what academic research had made possible.
Google had released its first public speech-to-text API around the same time. The technology was actually quite good, but there was no meaningful support structure around it. No relationship. No feedback loop. A developer couldn't use it as a genuine differentiator in a product because the dependency was too fragile.
Fox saw a specific pattern emerging: deep learning research was rapidly advancing the state of what was possible in speech recognition, but none of that progress was reaching developers in a usable form. The companies with the most capable models had no incentive to make them easy to use. The companies with developer-friendly products were running on older technology.
The Twilio model applied to audio
The insight that became AssemblyAI was straightforward in retrospect: take the Twilio approach (complex infrastructure made accessible through a clean API and a genuine commitment to developer experience) and apply it to the latest deep learning research in speech recognition.
Fox left Cisco in 2017 to build it. He applied to Y Combinator more as a learning exercise than with any real expectation of acceptance, submitted about a month after the deadline, and was accepted anyway. From zero to funded in a week. The thesis was sound: developers needed this, no one was building it right, and the moment was exactly right.
What AssemblyAI built was not just a transcription API. It was a platform where developers could get audio intelligence (transcription, entity recognition, sentiment analysis, topic detection, summarization) with a single API parameter and the confidence that a team was on the other end who cared whether the integration worked.
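The "single API parameter" model can be sketched as a transcription request where each intelligence feature is toggled by one boolean flag in the job payload. This is a minimal illustration; the parameter names (`sentiment_analysis`, `entity_detection`, `iab_categories`, `summarization`) and the endpoint shape are assumptions modeled on AssemblyAI-style APIs, not confirmed by the text, so check current documentation before relying on them.

```python
# Sketch: enabling audio-intelligence features on one transcription
# request via boolean flags. Feature names are illustrative assumptions.

def build_transcript_request(audio_url: str, **features: bool) -> dict:
    """Assemble the JSON payload for a transcription job, folding in
    any audio-intelligence feature flags the caller opts into."""
    payload = {"audio_url": audio_url}
    payload.update({name: bool(on) for name, on in features.items()})
    return payload

payload = build_transcript_request(
    "https://example.com/call-recording.mp3",
    sentiment_analysis=True,   # per-sentence sentiment
    entity_detection=True,     # names, places, organizations
    iab_categories=True,       # topic detection
    summarization=True,        # auto-generated summary
)
# The payload would then be POSTed to the transcript endpoint with an
# API-key header, e.g. via requests.post(url, json=payload, headers=...).
```

The point of the design is that adding a capability costs the developer one line in the request body rather than a new integration.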
The real problem with relying on Big Tech infrastructure
Fox identified five specific failure modes that come with building on infrastructure that a major tech company treats as a side project: you can't make it a genuine differentiator; the services aren't adequately supported; they can be deprecated or changed without notice; they're updated infrequently; and you have no real relationship with the people building them.
He made the point directly: Google Assistant and Alexa exist primarily to harvest user data. That's the product. Making the voice interface actually understand people better isn't the goal. Capturing what people ask for is. The incentives are misaligned with what a developer building a serious product actually needs.
This is a pattern that extends well beyond audio AI. Any time an organization builds a critical workflow on top of infrastructure a vendor maintains as an afterthought, they're taking on risk that doesn't show up in the procurement process. The vendor doesn't lose sleep over your dependency. You do.
How you actually compete with incumbents
At the time of our conversation, AssemblyAI was training models on somewhere between 100 and 150 GPUs, with training cycles running six weeks or more. That's significant investment, but it's targeted investment, focused entirely on a specific problem domain rather than a general-purpose system.
That focus is the mechanism by which a smaller organization competes with incumbents who have more resources. You don't try to out-general them. You go deeper on a specific problem, build a better product within that scope, and provide the relationship and support quality that large organizations structurally can't replicate at scale.
AssemblyAI's transcription models eventually reached a level where the team was confident they were among the largest and most accurate in the industry, not because they had Google's compute budget, but because every engineering decision was in service of the same specific outcome.
- Left Cisco in 2017 to start AssemblyAI; applied to Y Combinator a month after the deadline and was accepted.
- Built on PyTorch with end-to-end deep learning models; training cycles up to six weeks on 100–150 dedicated GPUs.
- Identified five structural failure modes of building on Big Tech AI infrastructure as a core competitive argument.
- Developer experience treated as a first-class product concern, not an afterthought to the core technology.
- Audio intelligence beyond transcription: entity recognition, sentiment analysis, topic detection, and summarization, all via a single API parameter.

