Recorder: a tiny Windows tray app for people who forget meetings immediately
I’m quite scatty. I have a terrible memory and probably ADHD. Unless I physically take notes, I often struggle to remember what was said on a Zoom call five minutes after it ends.
So naturally, I built something to fix this for me. Well, AI built most of it. I mostly PM’d it.
Introducing Recorder: a Windows system tray app that records your calls, along with any other apps you configure it to capture.
I know this already exists in Zoom and Teams, but those features usually depend on the meeting organiser enabling recording. I’m also very aware that you need permission to record people, so if you use this, make sure everyone on the call knows you’re recording. Use it at your own risk, and observe the law of the land on which you stand.
Version 1: automatic recording
The first version was simple: detect when Zoom or Teams was in focus, then start recording.
Importantly, it doesn’t stop recording just because focus shifts away from the meeting window. You might be sharing your screen, checking a document, or looking something up. Instead, it only stops when the meeting window is minimised or closed.
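To make the start/stop rule concrete, here is a minimal Python sketch of it as a tiny state machine. This is not Recorder's actual code (Recorder is .NET), and the names `WindowEvent` and `RecorderState` are made up for illustration. It only assumes that something upstream reports focus, minimise and close events for the meeting window.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WindowEvent(Enum):
    """Hypothetical events reported for the meeting window."""
    FOCUSED = auto()
    UNFOCUSED = auto()
    MINIMISED = auto()
    CLOSED = auto()


@dataclass
class RecorderState:
    recording: bool = False

    def handle(self, event: WindowEvent) -> bool:
        """Apply one window event and return whether we are now recording."""
        if event is WindowEvent.FOCUSED:
            # Focus on the meeting window starts (or continues) a recording.
            self.recording = True
        elif event in (WindowEvent.MINIMISED, WindowEvent.CLOSED):
            # Only minimising or closing the meeting window stops it.
            self.recording = False
        # UNFOCUSED is deliberately ignored: switching to a document or a
        # browser tab mid-call should not end the recording.
        return self.recording
```

Feeding it `FOCUSED`, then `UNFOCUSED`, then `MINIMISED` leaves it recording through the middle step and stopped only after the last one.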
Version 2: transcription
Next came transcription.
There’s a .NET binding for Whisper.cpp, so I used that. It’s remarkably quick: shortly after a recording finishes, the transcript is ready.
Version 3: diarization
Then I wanted diarization: who said what.
The only tool I really know for this is Pyannote, which is Python-based, while Recorder is written in .NET. There are ways to run Python in-process, but Codex opted to run it as a separate process instead.
It wrote a bootstrap script to set everything up, so installation was fairly painless. There were a few iterations and bug fixes, but the final result was pretty good.
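The separate-process pattern is easy to sketch. The example below is Python on both sides purely so it runs anywhere; in Recorder the parent is .NET, and I'm assuming (not quoting) the general shape: the parent launches a worker, passes it the WAV path, and reads JSON back from standard output. The worker here is a stand-in that returns a made-up segment rather than calling Pyannote, and `run_diarization_worker` is a hypothetical name.

```python
import json
import subprocess
import sys

# Stand-in worker. A real one would run the diarization model on the WAV
# file; this one prints a fixed, invented segment so the sketch is runnable.
WORKER_SOURCE = """
import json, sys
wav_path = sys.argv[1]
segments = [{"start": 0.0, "end": 1.5, "speaker": "SPEAKER_00"}]
json.dump({"wav": wav_path, "segments": segments}, sys.stdout)
"""


def run_diarization_worker(wav_path: str, timeout_s: float = 60.0) -> dict:
    """Launch the worker as a child process and parse the JSON it prints."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, wav_path],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str rather than bytes
        timeout=timeout_s,    # don't hang forever if the worker stalls
        check=True,           # raise if the worker exits with a non-zero status
    )
    return json.loads(completed.stdout)
```

The appeal of this design is isolation: the Python environment and its heavy dependencies stay out of the host app, and a crash in the worker shows up as a failed exit code rather than taking the tray app down with it.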
Version 4: one combined output
Up to this point, Recorder was dumping lots of separate files for each recording:
- a WAV file
- a plain text transcript
- a JSON transcript
- diarization output
So I got it to combine everything into a single JSON output. I also got it to pull the voice embeddings out of Pyannote.
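One plausible way to do the combining step, sketched in Python: give each transcript segment the speaker whose diarization turn overlaps it the most. The field names (`start`, `end`, `text`, `speaker`) and the function names are assumptions for illustration, not Recorder's real schema.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time ranges (0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def combine(transcript, turns):
    """Attach a speaker label to each transcript segment.

    transcript: list of {"start", "end", "text"} dicts (times in seconds)
    turns:      list of {"start", "end", "speaker"} dicts from diarization
    A segment that overlaps no turn gets speaker None.
    """
    combined = []
    for seg in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        combined.append({**seg, "speaker": best_speaker})
    return combined
```

The result is a plain list of dicts, so `json.dumps(combine(transcript, turns))` gives one document holding both the text and who said it.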
Version 5: speaker tagging
Next, I added a speaker database using SQLite.
This means you can tag speakers manually, and future recordings can be auto-tagged if Recorder detects the same voice again.
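Auto-tagging like this usually comes down to comparing a new voice embedding with the stored ones. Here is a minimal Python sketch using SQLite and cosine similarity. The table layout, the function names and the 0.75 threshold are all my assumptions for illustration; real embeddings have hundreds of dimensions and the threshold would need tuning.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_db(path=":memory:"):
    """Open (or create) the speaker database. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS speakers "
        "(name TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
    )
    return db


def tag_speaker(db, name, embedding):
    """Manually tag a voice: store the embedding as JSON under the given name."""
    db.execute(
        "INSERT OR REPLACE INTO speakers (name, embedding) VALUES (?, ?)",
        (name, json.dumps(embedding)),
    )
    db.commit()


def identify(db, embedding, threshold=0.75):
    """Return the best-matching known speaker, or None if nobody is close enough."""
    best_name, best_score = None, threshold
    for name, stored in db.execute("SELECT name, embedding FROM speakers"):
        score = cosine_similarity(embedding, json.loads(stored))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` below the threshold matters: an unknown voice should stay untagged rather than be pinned on whoever happens to be the nearest match.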
Version 6: actually recording both sides of the call
At this point, I had mostly tested it by configuring it to capture Chrome audio.
Then I had a sudden realisation: during a real live call, it would only capture the computer’s output. It wouldn’t include my microphone audio.
So I had to fix that.
Recorder now records the system audio and microphone audio separately, then merges them before running the combined recording through the rest of the pipeline.
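The merge itself can be as simple as adding the two streams sample by sample. A minimal Python sketch, assuming both streams are mono 16-bit PCM at the same sample rate (an assumption on my part; real capture devices often differ and need resampling first). `mix_int16` is a hypothetical name.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_int16(system_audio, mic_audio):
    """Mix two mono 16-bit PCM streams by summing samples.

    The shorter stream is treated as silence once it runs out, and sums are
    clipped so they stay inside the 16-bit range.
    """
    length = max(len(system_audio), len(mic_audio))
    mixed = []
    for i in range(length):
        a = system_audio[i] if i < len(system_audio) else 0
        b = mic_audio[i] if i < len(mic_audio) else 0
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed
```

Hard clipping is the crudest option; scaling each stream down before summing avoids distortion when both sides talk loudly at once.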
Version 7: fixing audio alignment
After the first live test, the audio wasn’t aligned properly. It sounded like I was speaking over someone else.
That turned out to be related to WASAPI trimming silence at the beginning of the recording. I now align both audio streams against wall-clock time, and that seems to have solved it.
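One way to do this kind of alignment, sketched in Python: note the wall-clock time at which each stream delivered its first sample, then pad the front of the stream with silence to cover the gap since the session started. This assumes such timestamps are available and is not necessarily how Recorder does it; the function names are hypothetical.

```python
def leading_silence_samples(stream_start_s, session_start_s, sample_rate):
    """How many silent samples to prepend so the stream starts at session time.

    stream_start_s:  wall-clock time (seconds) of the stream's first sample
    session_start_s: wall-clock time (seconds) the recording session began
    """
    offset_s = stream_start_s - session_start_s
    return max(0, round(offset_s * sample_rate))


def align_to_wall_clock(samples, stream_start_s, session_start_s, sample_rate):
    """Return the samples with enough leading silence to restore the timeline."""
    pad = leading_silence_samples(stream_start_s, session_start_s, sample_rate)
    return [0] * pad + list(samples)
```

Once both streams are padded against the same session start, sample N in each one refers to the same moment, so speech no longer lands on top of the other side of the conversation.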
Work in progress
There are a few things I’m still fixing or adding:
- Logging the focused windows throughout the call and trying to infer the meeting title.
- Installing itself as a service, or at least running automatically on startup.
- Adding a summarisation step using an LLM API call.
- Configurable webhooks for different meeting types. For example, it would be useful to post a summary of our daily team meeting into Slack. For my own life-logging needs, I’d also like a diary of all my conversations, summarised automatically.
I built this for my own needs, so it may not fit yours. But feel free to have a play with it:
https://github.com/pintofbeer/recorder
It works best if you wear headphones during calls, but it’s not terrible if you don’t.
Caveats: it’s not fully tested, and so far I’ve only tried it with Zoom and Chrome. Use at your own risk: if it breaks your laptop, formats your hard drive, or gets you sacked, don’t come crying to me.
Open to suggestions for improvements.