# Transcribe - Audio Transcription Tool

A CLI tool for transcribing audio files using OpenAI's Whisper model with speaker diarization and multiple output formats.

## Features

- Multiple Whisper model sizes (tiny, base, small, medium, large, turbo)
- Speaker diarization using voice embeddings (resemblyzer + clustering)
- Multiple output formats: SRT subtitles, plain text, JSON
- Batch processing of multiple audio files
- Automatic language detection
- Progress indicators with spinners

## Installation

### Prerequisites

- Go 1.20+
- Python 3.8+
- FFmpeg

### Python Dependencies

```bash
# Required for transcription
pip install openai-whisper

# Required for speaker diarization
pip install resemblyzer scikit-learn librosa
```

Note: If `resemblyzer` fails to install due to `webrtcvad`, install Python development headers first:

```bash
# Fedora/RHEL
sudo dnf install python3-devel

# Ubuntu/Debian
sudo apt install python3-dev
```

### Build from Source

```bash
go build -o transcribe
```

## Usage

Output file (`-o`) is required unless `--no-write` is specified.
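
Because `-o` is required whenever output is written, scripting the tool over several files means deriving one output path per input. The following is a minimal sketch rather than the tool's own batch mode: it assumes a hypothetical `recordings/` directory of `.mp3` files, and the `srt_name` helper is an illustrative name that is not part of `transcribe`. It calls the CLI once per file using only the `-o` flag.

```bash
# Hypothetical helper: swap the input's extension for .srt
# (e.g. recordings/talk.mp3 -> recordings/talk.srt).
srt_name() {
  printf '%s\n' "${1%.*}.srt"
}

# Assumed layout: audio files live in ./recordings (illustrative only).
for f in recordings/*.mp3; do
  # Skip the unexpanded glob pattern when no files match.
  [ -e "$f" ] || continue
  ./transcribe "$f" -o "$(srt_name "$f")"
done
```

Each subtitle file is written next to its source audio, so a rerun overwrites earlier results unless the helper is changed to point somewhere else.
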
### Basic Transcription

```bash
./transcribe audio.mp3 -o output.srt
```

### Choose Whisper Model

```bash
./transcribe audio.mp3 --model small -o output.srt
```

Available models: `tiny` (default), `base`, `small`, `medium`, `large`, `turbo`

### Output Formats

**SRT subtitles (default):**

```bash
./transcribe audio.mp3 -o subtitles.srt
```

**Plain text with timestamps:**

```bash
./transcribe audio.mp3 --format text -o output.txt
```

**JSON:**

```bash
./transcribe audio.mp3 --format json -o output.json
```

### Speaker Diarization

Enable automatic speaker detection:

```bash
./transcribe audio.mp3 --diarize -o output.srt
```

Specify number of speakers for better accuracy:

```bash
./transcribe audio.mp3 --diarize --speakers 2 -o output.srt
```

### Print to stdout

```bash
./transcribe audio.mp3 --no-write
```

### Full Example

Transcribe with speaker diarization:

```bash
./transcribe interview.wav --model small --diarize -s 2 -o interview.srt
```

Output:

```
1
00:00:00,000 --> 00:00:05,200
[Speaker 1] Hello, how are you?

2
00:00:05,200 --> 00:00:12,300
[Speaker 2] I'm doing well, thanks!
```

## CLI Reference

```
Usage: transcribe