Transcribe Tool
Audio transcription CLI using OpenAI Whisper with speaker diarization.
Quick Reference
# Basic transcription (SRT output)
./transcribe audio.mp3 -o output.srt
# With speaker diarization
./transcribe audio.mp3 --diarize -o output.srt
# Specify model and speakers
./transcribe audio.mp3 --model small --diarize -s 2 -o output.srt
# Print to stdout
./transcribe audio.mp3 --no-write
Flags
| Flag | Short | Description | Default |
|---|---|---|---|
| `--output` | `-o` | Output file path | required |
| `--format` | `-f` | `srt`, `text`, `json` | `srt` |
| `--model` | `-m` | `tiny`, `base`, `small`, `medium`, `large`, `turbo` | `tiny` |
| `--diarize` | | Enable speaker detection | off |
| `--speakers` | `-s` | Number of speakers (0=auto) | 0 |
| `--no-write` | | Print to stdout instead of file | off |
Common Tasks
Transcribe a meeting recording:
./transcribe meeting.wav --model small -o meeting.srt
Transcribe interview with 2 speakers:
./transcribe interview.mp3 --model small --diarize -s 2 -o interview.srt
Get JSON output for processing:
./transcribe audio.mp3 --format json -o output.json
Quick preview (stdout):
./transcribe audio.mp3 --no-write
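Transcribe several files in one go. The commands above handle one file at a time; a small wrapper can loop over many. This is a minimal sketch, not part of the tool: `transcribe_all` and the `TRANSCRIBE_BIN` override are hypothetical names, and it assumes every input file name has an extension.

```shell
# Hypothetical helper: run ./transcribe once per input file and write
# <name>.srt next to each input. Only flags documented above are used.
transcribe_all() {
  # TRANSCRIBE_BIN is an assumed override for the binary location.
  local bin="${TRANSCRIBE_BIN:-./transcribe}" f status=0
  for f in "$@"; do
    # ${f%.*} drops the last extension: talk.mp3 -> talk
    "$bin" "$f" --model small -o "${f%.*}.srt" || status=1
  done
  # Non-zero if any single transcription failed.
  return "$status"
}
```

For example, `transcribe_all recordings/*.mp3` would produce one `.srt` per recording and keep going if one file fails.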
Output Formats
SRT (default): Subtitle format with timestamps
1
00:00:00,000 --> 00:00:05,200
[Speaker 1] Hello, how are you?
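The timing line uses `HH:MM:SS,mmm`, with a comma before the milliseconds. When post-processing or generating cues by hand, a sketch like the following formats a millisecond count that way. `srt_timestamp` is a hypothetical helper name, and the input is assumed to be a non-negative decimal integer without leading zeros.

```shell
# Hypothetical helper: format a millisecond count as an SRT timestamp.
srt_timestamp() {
  local ms="$1"
  printf '%02d:%02d:%02d,%03d\n' \
    $((ms / 3600000)) \
    $((ms / 60000 % 60)) \
    $((ms / 1000 % 60)) \
    $((ms % 1000))
}

srt_timestamp 5200   # prints 00:00:05,200
```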
Text: Plain text with timestamps
[00:00.0 - 00:05.2] [Speaker 1] Hello, how are you?
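Both formats above prefix diarized lines with a `[Speaker N]` label, so a quick per-speaker line count can be taken straight from the output. A minimal sketch, assuming the label always looks exactly like the examples; `speaker_counts` is a hypothetical helper, not a feature of the tool.

```shell
# Hypothetical helper: count lines per speaker label read from stdin.
speaker_counts() {
  awk 'match($0, /\[Speaker [0-9]+\]/) {
         # Use the matched label itself as the counter key.
         n[substr($0, RSTART, RLENGTH)]++
       }
       END { for (s in n) print s, n[s] }' | sort
}
```

It could be fed from a file (`speaker_counts < interview.srt`) or from a pipe such as `./transcribe interview.mp3 --diarize --no-write | speaker_counts`.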
JSON: Full metadata including segments, words, duration
Models
- tiny - Fastest, use for quick drafts
- small - Good balance of speed/accuracy
- medium - Better accuracy, slower
- large - Best accuracy, slowest
Supported Formats
MP3, WAV, FLAC, M4A, OGG, OPUS
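When scripting around the tool, it can help to skip files outside this list before invoking it. The sketch below filters by file extension only; `is_supported_audio` is a hypothetical helper, and how the tool itself detects formats (or whether it is case-sensitive) is not specified here.

```shell
# Hypothetical helper: succeed if the file name ends in a listed extension.
is_supported_audio() {
  local ext
  # Names without a dot have no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lowercase the extension so TALK.MP3 is treated like talk.mp3.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|flac|m4a|ogg|opus) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use would be `is_supported_audio "$f" && ./transcribe "$f" -o "${f%.*}.srt"`.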
Build
cd /home/yeho/Documents/tools/transcribe
go build -o transcribe
Dependencies
pip install openai-whisper # Required
pip install resemblyzer scikit-learn librosa # For diarization
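To confirm the packages are importable before a long run, a check along these lines may help. It is a sketch under assumptions: `check_py_modules` and the `PYTHON` override are hypothetical, and it assumes the tool uses the same interpreter as `python3` on your PATH. Note that the import names differ from two of the pip package names: `openai-whisper` imports as `whisper` and `scikit-learn` as `sklearn`.

```shell
# Hypothetical helper: report Python modules that fail to import.
check_py_modules() {
  # PYTHON is an assumed override for the interpreter to test against.
  local py="${PYTHON:-python3}" mod missing=0
  for mod in "$@"; do
    if ! "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "missing: $mod"
      missing=1
    fi
  done
  # Returns 1 if at least one module could not be imported.
  return "$missing"
}
```

For example, `check_py_modules whisper` covers basic transcription, and `check_py_modules whisper resemblyzer sklearn librosa` covers diarization as well.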