The moment that started everything
Tobias Martens spent a decade working at the intersection of technology and public institutions: the European Commission, the German Institute for Standardization (DIN), and corporate consulting. That background gave him a frame most AI founders don't have: he'd watched how standards get made, how they get adopted, and how their absence creates invisible friction across entire industries.
The founding insight of Whoelse AI came from a personal moment: trying to explain how different internet services worked to his nephew and grandmother at the same time. AirBnB for apartments. Tinder for dating. Ticketmaster for events. Each required a different mental model, a different vocabulary, a different way of navigating.
It occurred to him that the problem wasn't the services themselves. It was that every service required its own conceptual framework before you could use it. And the same fragmentation that made internet services hard to explain was showing up in voice AI, in a more consequential way: with over a thousand voice AI technologies on the market, each using a different wake word, a different intent format, and a different API, nobody can build on top of any of them with confidence.
Why 1,000 voice AI platforms is a problem, not a success
When Tobias and I spoke, there were over a thousand voice AI technologies on the market. Most people were aware of two or three. That gap isn't a marketing problem: it's a structural problem. Organizations trying to build voice-first experiences face a fragmented ecosystem where every vendor speaks a different language, maintains different standards, and can be deprecated or acquired without warning.
The consequence: organizations either standardize on one platform and accept the dependency risk, or they maintain multiple integrations and absorb the ongoing complexity cost. Neither option is good. And neither solves the underlying problem, which is that there's no shared protocol for what voice AI systems are supposed to do or how they're supposed to talk to each other.
Martens' framing is that this is the same problem the internet solved with TCP/IP, that email solved with SMTP, that telephony solved with signaling protocols. Every time a communication technology matures, it goes through a period of fragmentation, followed by convergence around a shared standard. Voice AI is in the fragmentation phase. The work Whoelse AI is doing is about accelerating the convergence.
Standards as competitive strategy, not just compliance
One of the more unusual aspects of Martens' approach is treating standards contribution as a business strategy rather than a technical obligation. Whoelse AI has contributed to DIN Standards (the German representation of ISO), the World Wide Web Consortium, and the Voice Network initiative, working groups that are writing the technical specifications for how voice AI systems should communicate.
The strategic logic is elegant: in European government procurement, ISO compliance is frequently a contractual requirement. By contributing to the development of the relevant standards rather than waiting to comply with them, Whoelse AI shaped the environment in which it would compete. When the standard is adopted, they're already aligned, and they have the expertise and documentation to help others achieve compliance.
This is a longer game than most AI startups play. But for organizations building for regulated or government-adjacent markets, it's a model worth understanding. The standards governing how AI is deployed in public contexts are being written now. The organizations that participate have influence over what those standards require.
The architecture of interoperability
Whoelse AI's technical approach to the fragmentation problem focuses on the linguistic structure layer rather than the full technology stack. Rather than trying to get different platforms to adopt a common API, the team focused on encoding language intent in a standardized way that could be interpreted by multiple underlying systems.
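To make the idea of a standardized intent encoding concrete, here is a minimal sketch of what such an envelope might look like. The field names and values are illustrative assumptions, not taken from any published DIN or W3C specification:

```python
from dataclasses import dataclass, field

# Hypothetical standardized intent envelope. Any platform that can parse an
# utterance into this shape becomes interchangeable at the routing layer.
@dataclass
class Intent:
    domain: str                                 # e.g. "travel", "weather"
    action: str                                 # e.g. "book", "query"
    slots: dict = field(default_factory=dict)   # extracted entities
    locale: str = "en-US"
    confidence: float = 1.0                     # parser's certainty, 0..1

# Example: the same utterance, regardless of which vendor parsed it,
# reduces to one shared structure.
utterance = "Book me a room in Berlin for Friday"
intent = Intent(
    domain="travel",
    action="book",
    slots={"city": "Berlin", "date": "Friday", "item": "room"},
    confidence=0.92,
)
print(intent.domain, intent.action, intent.slots["city"])
```

The point of the sketch is the separation of concerns: vendors compete on how well they fill this structure in, while everything downstream depends only on the structure itself.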
The practical result is a bridge layer, built on DIN Standard protocols and connecting platforms like ARM and IBM Watson, that can accept a user request, parse its intent in a standardized format, and route that intent to whichever underlying AI system is best suited to handle it. The user doesn't know which system responded. They just get an answer.
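The routing step can be sketched in a few lines of Python. This is a toy illustration with invented handler names, not the bridge's actual implementation or the real systems it connects:

```python
# Hypothetical routing layer: given an intent (a plain dict here), dispatch
# to whichever specialized backend handles its domain.
def answer_weather(intent):
    return f"Weather lookup for {intent['slots'].get('city', 'your area')}"

def answer_general(intent):
    return "General assistant response"

# Registry mapping domains to specialized handlers; unknown domains fall
# back to a general-purpose system.
REGISTRY = {"weather": answer_weather}

def route(intent):
    handler = REGISTRY.get(intent["domain"], answer_general)
    return handler(intent)

# The caller never learns which backend produced the answer.
reply = route({"domain": "weather", "slots": {"city": "Berlin"}})
print(reply)
```

The design choice worth noting is that new backends join by registering a domain, not by agreeing on an API with every other backend, which is the property that makes the hub-and-spoke model scale.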
Martens describes the long-term vision as a network of specialized AI assistants, each expert in a different domain, connected by a shared protocol for passing requests and returning results. Not one AI that knows everything, but many AIs that each know their domain deeply, able to communicate with each other through a shared language. That's the infrastructure model for AI at scale.
- Background spans 10 years across the European Commission, DIN (German ISO representation), and corporate technology consulting before founding Whoelse AI.
- Contributing author to DIN Standards, ISO protocols, and the W3C Voice Network initiative, working groups writing the technical specs for voice AI interoperability.
- ISO compliance is a contractual requirement in many European government procurement processes, making standards contribution a direct business development strategy.
- Technical architecture: a bridge layer between ARM and IBM Watson, using DIN Standard protocols to route user intent across different AI platforms.
- Chose to work on the "most basic standard feasible": linguistic structure encoding, rather than competing on features or attempting a comprehensive platform standard.