[thesis]

your microphone is the bottleneck

your microphone is the bottleneck

your microphone is the bottleneck

[May 18, 2026]

[6 min read]

Yellow Flower

It's 11 am at a Third Wave Coffee in HSR. You need to draft an email to a client. You open ChatGPT, hit the microphone icon, lean toward your laptop, and start whispering. You sound like a man confessing to a crime. The barista glances over. You stop. You start typing instead. The email takes twelve minutes instead of three.

This happens to maybe a million people a day in India, and ten times that globally. Nobody talks about it because the failure feels like a personal failure. You assume you're the one who's embarrassed. You assume the right answer is to find a quieter spot, or just type. You assume the microphone is fine.

It isn't.

We've spent ten years training people to talk to their devices. Siri taught us the prompt. Alexa taught us the routine. Google Assistant taught us the dictation. Then ChatGPT taught us the conversation. Wispr Flow proved that voice input, properly built, is about four times faster than typing.

But voice isn't winning. The dominant numbers are brutal. roughly 32% of voice assistant users use voice daily for search, but only about 5% ever use voice assistants in public. For every twenty people who would benefit from talking to their tools, nineteen are still using their thumbs.

The reason isn't the model. The models are great. The reason isn't the software. Wispr alone is at 70% twelve-month retention, which is Whoop-level stickiness for an app that just makes you type faster. The reason is sitting in your ears right now, and it was designed in 2019 to listen to music.

Your AirPods, your Galaxy Buds, or your Nothing Ear (a)s were all designed for two jobs. The first job is to play sound into your ears really well. The second job is to take a call in a quiet conference room.

The failure modes are predictable once you list them. The first is what Twitter calls "missing the first word." You start speaking, the mic isn't listening yet, and the verb at the start of your sentence gets eaten.

The second is ambient noise pickup. An espresso machine doesn't just enter your audio. It dominates it. The third is what I'll call the volume floor, which is the minimum loudness you have to speak at for the mic to register. For almost every consumer earbud, that floor sits above the social acceptability line. You have to speak loud enough that the table next to you can hear your strategy decisions. So you don't.

The fourth failure mode is the one nobody talks about, because it happens to other people. When you take a call from a cafe, your voice carries the cafe with it. The person on the other end hears espresso machines, chairs scraping, somebody's playlist. They ask you to repeat. They ask you to find a quieter spot. You reschedule.

Every interface category follows the same arc in the same order. There are three layers: an intelligence layer, a software layer, and a hardware layer. The bottleneck moves down the stack as each layer matures.

For voice, the intelligence problem was solved between 2017 and 2022. Word error rates collapsed. Transcription stopped being the issue.

The software problem was solved between 2022 and 2025. ChatGPT voice mode, Wispr Flow, Granola, Otter: there's a whole category of voice-native apps that just work. The UX is good. The latency is good. The integrations are real.

So where's the bottleneck now? It's the input device. It's the mic.

This is the boring part of the AI stack. Nobody raises a billion dollars to make a better microphone. Nobody writes essays about acoustic engineering. But every dollar of AI capability sitting upstream of a broken mic is wasted at the moment you walk into a real-world environment with three other humans in it. And three other humans, for most knowledge workers, is the default environment.

About 5% of voice users use voice in public. The other 95% want to. You can survey them, and they will tell you they want to. They will tell you they tried it once and it was awkward. They will tell you they switched back to typing.

It's a hardware problem first. You need a microphone array that captures speech below a whisper. You need very good voice isolation, so the person on the other end hears you and not the cafe.

We built project mnemosyne to solve this bottleneck. The model is solved, the software is solved, the missing layer is sitting in your ear, optimized for the wrong job. The world's most expensive piece of AI capability is being throttled by a 2019 part.

We'd like to fix that.



your antimattr journey starts here

your antimattr journey starts here

at antimattr, we are inspired by those who chose greatness over ordinary and project mnemosyne is our first step towards that - voice computers to unlock more, so that you can focus on things that truly matters

antimattr

antimattr

antimattr

antimattr