Recently, David Collen, the ingenious founder of the startup SapientX, sat down for an enlightening conversation. Collen paints a compelling narrative of his company’s evolution and its distinctive focus on conversational characters powered by artificial intelligence.
Collen recounts the pioneering days when he defied the norm by putting the first 3D model on the World Wide Web. His experiences trace the exciting transition from traditional processes to AI-driven automation. He also details his involvement in avant-garde research initiatives such as NIMD that paved the way for his success.
The startup impresses us with its groundbreaking approach. Working at the junction of symbolic reasoning and pattern recognition, its product pirouettes elegantly on the cluttered stage of AI solutions. It not only outperforms several industry titans but also stands as a beacon of practical tech solutions.
With the introduction of avatars and conversational characters, SapientX raises the bar, intertwining technology with daily human interaction. The thrilling leaps and innovations that have sculpted SapientX’s journey are a testament to its forward-thinking AI strategy, showcasing the practical benefits of AI.
So, let’s venture forth and explore the motivation, significant leaps, and innovative strides that have defined the exceptional journey of this AI startup, SapientX. Be prepared to be astounded and inspired!
As a young architect, I came out to San Francisco a long time ago, and I designed high-rise buildings and accidentally started a software company in the mid-90s. My team put the first 3D model on the internet, which brought us into many fascinating places, doing everything from virtual locations to talking conversational characters. Then, in 2003, the intelligence community invited us to participate in a research effort called NIMD (Novel Intelligence from Massive Data). And they had assembled 18 of the top AI companies in the world to look for bad guys on the internet.
Building and developing our first conversational AI system introduced us to all the top minds of the time. It was fascinating because all of them were working on different branches of the tree. I think my favorite person through all of that was Stuart Card at Xerox PARC. He and his partner, Peter Pirolli, were both doing fantastic work, which inspired us.
Another thing we were working on was soldier tracking systems for the Army. We built the first system that could track soldiers in the field, both indoors and outdoors, and we productized that into navigation systems for carmakers and the handheld device community. Then, in 2008, my old pal Bruce Wilcox became available. Bruce had headed up AI at a number of the big game companies. We brought him onto the team and built what was possibly the first voice assistant for a navigation platform.
In the early days of IVR, Bruce thought it was a horse-and-buggy kind of thing, and that the future of these voice interfaces lay in conversational ability, so that no one would ever have to learn commands. You could simply tell your technology what you wanted, and it would understand.
Fast-forwarding to today, we outperform all the big boys in understanding what people want, because we take a different approach. If you’re a young engineer, you think machine learning is the be-all and end-all. But the truth is, if you crack open an AI textbook, there are many different branches in the tree, and some approaches are better at some things than others.
So there are three big buckets. First, you have the early folks doing what I’ll generically refer to as chatbots, which are keyword-based systems. Even though they came out around 1965, they haven’t evolved very much. A few years ago there were hundreds of startups all doing these chatbots, and they didn’t perform very well when interacting with people. So when COVID hit, it was nuclear winter for the chatbot companies, because they weren’t doing a very good job.
The second bucket is the machine learning people like Amazon, Google, Microsoft, and Apple. There’s a reason they’re almost all big company names: the systems are super expensive to build and deploy, partly because of their need for insane amounts of data. And the question always comes up: how can you outperform Google?
Does Google have the best engineers on the planet? Why aren’t they doing something better than what we’re doing? The answer is yes, they have great engineers, but they’re on a different mission. The only reason Google Assistant or Alexa exists is to harvest user data. That’s a very different sort of thing to build, and they don’t have much motivation to make it work any better. They don’t have any reason to make it work offline, because then they can’t capture that user data. Then there are things like GPT-3. That’s cool, but you’ve seen how expensive the computers needed to train those models are. Some of them run their models on $26 million computers.
So, for us, the proper practical solution is Bruce’s symbolic reasoning method. It’s fully functional while still being very light: running on your Android phone, we only use 2% of the CPU. None of the other systems can do that. We have been profiling the different methods for many years now. We have seen reports from people on the Echo and Alexa teams over at Amazon mentioning that, based on internal tests, the system scores worse year by year. It started as a rule-based system, they’ve been adding machine learning to it, and its performance has gotten worse. In their most recent build there’s now a lot more latency; it took a lot longer to respond in the morning when I asked it to play the news.
We were inspired by other research going on in symbolic reasoning, in particular research at Carnegie Mellon a dozen years ago. The way I like to explain it is that it’s a little bit like seventh grade, when your English teacher taught you how to break a sentence apart into nouns, verbs, and adjectives in order to understand the meaning of each of those categories. We do the same thing with POS (part-of-speech) tagging.
Initially, we decompose the inbound text into its constituent parts. We also perform a little bit of correction. The best speech recognizers right now typically get 96 to 97% accuracy; we improve things a little by correcting grammar and sentence structure, and that’s how we’re able to get up to 99%. By the way, all of our systems, in every language, are tested and tuned until they reach 99% intent accuracy before we put them out.
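To make the idea concrete, here is a minimal, self-contained sketch of part-of-speech tagging in Python. The tiny lexicon and the example command are hypothetical and purely illustrative; this is not SapientX’s actual pipeline, and a production system would use a full tagger rather than a hand-written word list.

```python
# Minimal sketch of part-of-speech (POS) tagging: split a spoken command
# into tokens and label each one with a coarse grammatical category.
# The lexicon below is hypothetical and only for illustration.
LEXICON = {
    "play": "VERB", "turn": "VERB", "set": "VERB",
    "the": "DET", "a": "DET", "my": "DET",
    "morning": "NOUN", "news": "NOUN", "kitchen": "NOUN",
    "in": "PREP", "on": "PREP", "to": "PREP",
}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Lowercase, strip trailing punctuation, and tag each token."""
    tokens = sentence.lower().rstrip(".!?").split()
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]

print(pos_tag("Play the morning news in the kitchen."))
# [('play', 'VERB'), ('the', 'DET'), ('morning', 'NOUN'), ('news', 'NOUN'),
#  ('in', 'PREP'), ('the', 'DET'), ('kitchen', 'NOUN')]
```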
We’re also not a slot-filling system; we’re capable of much more nuanced understanding than the simple slot-based systems that are prevalent right now. From there, we do symbolic reasoning with pattern-recognition concepts to map the input onto things we know about, and that includes learning about the user to improve the user experience.
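For contrast, here is a rough sketch of what a simple slot-filling parser looks like, the kind of approach described above as more limited. The intent name, slot names, and regular expression are hypothetical and shown only to illustrate the pattern and where it breaks down.

```python
import re

# A single hard-coded intent template with two named slots ("device", "value").
# Anything the template does not anticipate falls through to "unknown".
SET_DEVICE = re.compile(r"set (?:the )?(?P<device>\w+) to (?P<value>\d+)")

def parse(utterance: str) -> dict:
    match = SET_DEVICE.search(utterance.lower())
    if match:
        return {"intent": "set_device", "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}

print(parse("Set the thermostat to 72"))
# {'intent': 'set_device', 'slots': {'device': 'thermostat', 'value': '72'}}
print(parse("Could you make it a bit warmer in here?"))
# {'intent': 'unknown', 'slots': {}}  <- nuance the template cannot capture
```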
One of our projects right now is with Yamaha, which is having us build a voice assistant for senior care that becomes a friend and companion and performs smart-home tasks. The assistant also monitors the senior’s health and cognitive abilities using both passive and active systems. Based on the conversations and the AI’s perceptions, we’re able to report to a health team that they should look more closely at what’s going on with that person.
There’s a lot we can do beyond the conversation. We’re also able to adjust conversational ability based on what we learn and to change our interaction patterns accordingly, even to the point of understanding and adjusting whether you’d like a chatty assistant or a quieter one.