Recorder: a tiny Windows tray app for people who forget meetings immediately

I’m quite scatty. I have a terrible memory and probably ADHD. Unless I physically take notes, I often struggle to remember what was said on a Zoom call five minutes after it ends.

So naturally, I built something to fix this for me. Well, AI built most of it. I mostly PM’d it.

Introducing Recorder: a Windows system tray app that records your calls, plus any other apps you choose to configure.

I know this already exists in Zoom and Teams, but those features usually depend on the meeting organiser enabling recording. I’m also very aware that you need permission to record people, so if you use this, make sure everyone on the call knows you’re recording. Use it at your own risk, and observe the law of the land on which you stand.

Version 1: automatic recording

The first version was simple: detect when Zoom or Teams was in focus, then start recording.

Importantly, it doesn’t stop recording just because focus shifts away from the meeting window. You might be sharing your screen, checking a document, or looking something up. Instead, it only stops when the meeting window is minimised or closed.
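The start/stop rule above is easy to get subtly wrong, so here is a minimal sketch of it as a tiny state machine. It's written in Python purely for illustration (Recorder itself is .NET), and the names `WindowState` and `RecorderState` are made up for this example, not taken from the actual code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WindowState(Enum):
    FOCUSED = auto()     # meeting window is the foreground window
    BACKGROUND = auto()  # still open, but another window has focus
    MINIMISED = auto()
    CLOSED = auto()


@dataclass
class RecorderState:
    recording: bool = False

    def update(self, window: WindowState) -> bool:
        """Apply one observation of the meeting window; return whether we are recording."""
        if not self.recording and window is WindowState.FOCUSED:
            # Start only when the meeting window gains focus.
            self.recording = True
        elif self.recording and window in (WindowState.MINIMISED, WindowState.CLOSED):
            # Stop only when the meeting window is minimised or closed.
            self.recording = False
        # BACKGROUND never changes the state: losing focus does not stop a recording.
        return self.recording


state = RecorderState()
observations = [
    WindowState.BACKGROUND,  # meeting open but never focused yet
    WindowState.FOCUSED,     # recording starts
    WindowState.BACKGROUND,  # switched to a document: keep recording
    WindowState.MINIMISED,   # recording stops
]
trace = [state.update(w) for w in observations]
```

The point of the sketch is that focus loss is deliberately a no-op once recording has started.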

Version 2: transcription

Next came transcription.

There’s a .NET binding for Whisper.cpp, so I used that. It’s remarkably quick: shortly after a recording finishes, the transcript is ready.

Version 3: diarization

Then I wanted diarization: who said what.

The only tool I really know for this is Pyannote, which is Python-based, while Recorder is written in .NET. There are ways to run Python in-process, but Codex opted to run it as a separate process instead.
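For anyone curious what the separate-process approach looks like, here is a rough sketch of the pattern: launch a worker, pass it the audio path, and read JSON back from stdout. It's shown in Python for illustration only; the real host is .NET and would use its own process API. The worker below is a stand-in that returns a fixed result rather than calling Pyannote, and the JSON shape is an assumption of mine.

```python
import json
import subprocess
import sys

# Stand-in for the real diarization worker: it only echoes a fixed result as JSON.
# A real worker would load Pyannote and process the audio file named in argv[1].
WORKER_SOURCE = """
import json, sys
audio_path = sys.argv[1]
turns = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 1.5}]
json.dump({"audio": audio_path, "turns": turns}, sys.stdout)
"""


def run_worker(audio_path: str, timeout_s: float = 60.0) -> dict:
    """Run the worker in a child process and parse the JSON it writes to stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, audio_path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


result = run_worker("call.wav")
```

Keeping the Python side in its own process means a crash or a slow model load there can't take the tray app down with it, at the cost of some start-up time per recording.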

It wrote a bootstrap script to set everything up, so installation was fairly painless. There were a few iterations and bug fixes, but the final result was pretty good.

Version 4: one combined output

Up to this point, Recorder was dumping lots of separate files for each recording.

So I got it to combine everything into a single JSON file. I also got it to pull the voice embeddings out of Pyannote.
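As an illustration of what combining can mean in practice, here is a small sketch that gives each transcript segment the speaker whose diarization turn overlaps it the most, then bundles everything into one JSON document. The field names (`segments`, `speakers`, `start`, `end`, `text`) are assumptions for the example and not Recorder's actual schema.

```python
import json


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def combine(segments, turns, embeddings):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    combined = []
    for seg in segments:
        speaker, best = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best:
                speaker, best = turn["speaker"], shared
        combined.append({**seg, "speaker": speaker})
    return {"segments": combined, "speakers": embeddings}


segments = [
    {"start": 0.0, "end": 2.0, "text": "Hi"},
    {"start": 2.0, "end": 4.0, "text": "Hello"},
]
turns = [
    {"speaker": "A", "start": 0.0, "end": 2.2},
    {"speaker": "B", "start": 2.2, "end": 5.0},
]
document = combine(segments, turns, {"A": [0.1, 0.2], "B": [0.3, 0.4]})
as_json = json.dumps(document, indent=2)
```

A segment that overlaps no turn at all keeps `speaker` as `None` rather than being guessed.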

Version 5: speaker tagging

Next, I added a speaker database using SQLite.

This means you can tag speakers manually, and future recordings can be auto-tagged if Recorder detects the same voice again.
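Roughly, auto-tagging comes down to storing one embedding per tagged speaker and comparing new voices against them. Here is a minimal sketch using SQLite and cosine similarity. The table layout, the JSON-encoded embedding column, and the 0.75 threshold are all assumptions for illustration; the real database may look quite different.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS speakers (name TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
    )
    return db


def tag_speaker(db, name, embedding):
    """Manually tag a speaker: store (or overwrite) their embedding under a name."""
    db.execute("INSERT OR REPLACE INTO speakers VALUES (?, ?)", (name, json.dumps(embedding)))
    db.commit()


def identify(db, embedding, threshold=0.75):
    """Return the best-matching known speaker, or None if nobody reaches the threshold."""
    best_name, best_score = None, threshold
    for name, stored in db.execute("SELECT name, embedding FROM speakers"):
        score = cosine(embedding, json.loads(stored))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


db = open_db()
tag_speaker(db, "Alice", [1.0, 0.0, 0.0])
tag_speaker(db, "Bob", [0.0, 1.0, 0.0])
match = identify(db, [0.9, 0.1, 0.0])   # close to Alice's stored embedding
unknown = identify(db, [0.0, 0.0, 1.0])  # unlike anyone stored
```

Real embeddings have hundreds of dimensions and a linear scan is fine at the scale of one person's contacts; the threshold would need tuning against real recordings.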

Version 6: actually recording both sides of the call

At this point, I had mostly tested it by configuring it to capture Chrome audio.

Then I had a sudden realisation: during a real live call, it would only capture the computer’s output. It wouldn’t include my microphone audio.

So I had to fix that.

Recorder now records the system audio and microphone audio separately, then merges them before running the combined recording through the rest of the pipeline.
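The merge step itself is conceptually simple. Here is a sketch that mixes two streams of 16-bit PCM samples by adding them and clipping to the valid range. It assumes both streams are mono and already at the same sample rate, which is my simplification; `mix_pcm16` is a made-up name, not a function from Recorder.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(system, mic):
    """Mix two lists of 16-bit samples; the shorter one is treated as padded with silence."""
    length = max(len(system), len(mic))
    mixed = []
    for i in range(length):
        s = system[i] if i < len(system) else 0
        m = mic[i] if i < len(mic) else 0
        # Sum the two signals and clip so the result still fits in 16 bits.
        mixed.append(max(INT16_MIN, min(INT16_MAX, s + m)))
    return mixed


mixed = mix_pcm16([1000, 30000, -30000], [500, 10000])
```

Hard clipping like this can distort when both sides are loud at once; scaling each stream down before summing is a common alternative.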

Version 7: fixing audio alignment

After the first live test, the audio wasn’t aligned properly. It sounded like I was speaking over someone else.

That turned out to be related to WASAPI trimming silence at the beginning of the recording, which shifted one stream relative to the other. I now align the audio against wall-clock time, and that seems to have solved it.
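To show the idea rather than the actual fix, here is a sketch of wall-clock alignment: record the time at which each stream's first sample arrived, then pad the later stream with silence so both start from the same instant. The function names and the `(start_time, samples)` input shape are assumptions for this example.

```python
def leading_silence(stream_start_s, session_start_s, sample_rate):
    """Number of zero samples to prepend so the stream sits at its wall-clock offset."""
    return max(0, round((stream_start_s - session_start_s) * sample_rate))


def align(streams, sample_rate):
    """streams: list of (wall-clock time of first sample in seconds, samples).

    Returns the sample lists padded so that index 0 is the same instant in all of them.
    """
    session_start = min(start for start, _ in streams)
    return [
        [0] * leading_silence(start, session_start, sample_rate) + list(samples)
        for start, samples in streams
    ]


# Tiny sample rate so the numbers are easy to follow: 4 samples per second.
mic = (100.0, [1, 2, 3])   # microphone delivered data immediately
system = (100.5, [7, 8])   # system audio only began 0.5 s later
aligned = align([mic, system], sample_rate=4)
```

After padding, the streams can be mixed sample by sample without one side drifting ahead of the other.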

Work in progress

There are a few things I’m still fixing or adding.

I built this for my own needs, so it may not fit yours. But feel free to have a play with it:

https://github.com/pintofbeer/recorder

It works best if you wear headphones during calls, but it’s not terrible if you don’t.

Caveats: it’s not fully tested, and so far only with Zoom and Chrome. Use at your own risk: if it breaks your laptop, formats your hard drive, or gets you sacked, don’t come crying to me.

Open to suggestions for improvements.