Transcribe Tool

Audio transcription CLI using OpenAI Whisper with speaker diarization.

Quick Reference

# Basic transcription (SRT output)
./transcribe audio.mp3 -o output.srt

# With speaker diarization
./transcribe audio.mp3 --diarize -o output.srt

# Specify model and speakers
./transcribe audio.mp3 --model small --diarize -s 2 -o output.srt

# Print to stdout
./transcribe audio.mp3 --no-write

Flags

Flag	Short	Description	Default
`--output`	`-o`	Output file path	required unless `--no-write` is set
`--format`	`-f`	`srt`, `text`, `json`	`srt`
`--model`	`-m`	`tiny`, `base`, `small`, `medium`, `large`, `turbo`	`tiny`
`--diarize`		Enable speaker detection	off
`--speakers`	`-s`	Number of speakers (0=auto)	`0`
`--no-write`		Print to stdout instead of file	off

Common Tasks

Transcribe a meeting recording:

./transcribe meeting.wav --model small -o meeting.srt

Transcribe an interview with 2 speakers:

./transcribe interview.mp3 --model small --diarize -s 2 -o interview.srt

Get JSON output for processing:

./transcribe audio.mp3 --format json -o output.json

Quick preview (stdout):

./transcribe audio.mp3 --no-write
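
Batch-transcribe a folder (sketch):

The single-file commands above can be wrapped in a loop to handle a folder of recordings. This is a minimal sketch, not part of the tool: the function names `srt_path_for` and `transcribe_dir` and the `TRANSCRIBE` override variable are hypothetical, and it assumes `./transcribe` accepts `--model` and `-o` as documented in the Flags table.

```shell
# Hypothetical helpers; only ./transcribe and its flags come from this README.

# srt_path_for prints the output path for an input file by swapping
# its extension for .srt (meeting.wav -> meeting.srt).
srt_path_for() {
  printf '%s\n' "${1%.*}.srt"
}

# transcribe_dir runs the CLI once per .mp3 file in a directory,
# skipping files whose .srt already exists.
# Usage: transcribe_dir DIR [MODEL]   (MODEL defaults to small)
transcribe_dir() {
  local dir="$1" model="${2:-small}" f out
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue            # the glob matched nothing
    out="$(srt_path_for "$f")"
    [ -e "$out" ] && continue          # already transcribed
    "${TRANSCRIBE:-./transcribe}" "$f" --model "$model" -o "$out" || return 1
  done
}
```

Calling `transcribe_dir recordings small` would then write one `.srt` next to each `.mp3` in `recordings/`. Skipping existing outputs makes the loop safe to re-run after an interruption.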

Output Formats

SRT (default): Subtitle format with timestamps

1
00:00:00,000 --> 00:00:05,200
[Speaker 1] Hello, how are you?

Text: Plain text with timestamps

[00:00.0 - 00:05.2] [Speaker 1] Hello, how are you?

JSON: Full metadata including segments, words, duration
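
Post-processing SRT output (sketch):

The SRT layout shown above is simple enough to process with standard tools. The helpers below are an illustrative sketch under two assumptions: timestamps use the `HH:MM:SS,mmm` form from the sample, and each diarized cue is a single line starting with a `[Speaker N] ` prefix. The function names `srt_to_seconds` and `speaker_lines` are hypothetical.

```shell
# srt_to_seconds converts an SRT timestamp (HH:MM:SS,mmm) to seconds
# with millisecond precision.
srt_to_seconds() {
  printf '%s\n' "$1" |
    LC_ALL=C awk -F'[:,]' '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 + $4 / 1000 }'
}

# speaker_lines prints the text of every cue line that starts with the
# given speaker label, with the "[label] " prefix removed.
# Usage: speaker_lines "Speaker 1" output.srt
speaker_lines() {
  local label="$1" file="$2"
  awk -v prefix="[$label] " \
    'index($0, prefix) == 1 { print substr($0, length(prefix) + 1) }' "$file"
}
```

For example, `speaker_lines "Speaker 1" interview.srt` would list only what the first speaker said. Cues whose text wraps onto a second line would lose the continuation line under this approach, so the JSON format is likely a better starting point for anything more involved.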

Models

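tiny - Fastest, use for quick drafts (default)
base - Slightly more accurate than tiny, still fast
small - Good balance of speed/accuracy
medium - Better accuracy, slower
large - Best accuracy, slowest
turbo - Optimized variant of large, much faster with slightly lower accuracy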

Supported Formats

MP3, WAV, FLAC, M4A, OGG, OPUS
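
Checking a file before transcribing (sketch):

A script that feeds files to the tool may want to filter out unsupported inputs first. The helper below is a sketch that tests the file extension against the list above. The name `is_supported_audio` is hypothetical, and whether the tool itself decides by extension or by file content is not stated here, so treat this as a pre-filter only.

```shell
# is_supported_audio succeeds (exit status 0) when the file name has an
# extension, compared case-insensitively, from the supported list.
is_supported_audio() {
  local ext
  case "$1" in
    *.*) ;;               # has an extension, keep going
    *) return 1 ;;        # no extension at all
  esac
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|wav|flac|m4a|ogg|opus) return 0 ;;
    *) return 1 ;;
  esac
}
```

It can guard a call such as `is_supported_audio "$f" && ./transcribe "$f" --no-write`.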

Build

cd /home/yeho/Documents/tools/transcribe
go build -o transcribe

Dependencies

pip install openai-whisper                      # Required
pip install resemblyzer scikit-learn librosa    # For diarization
