commit b73d5b8078478b288ae656e759c63fa22eeaeaae
Author: ysandler
Date:   Sat Jan 17 19:18:58 2026 -0600

    feat: git init

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..513a42e
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+transcribe
+test/
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..3c973b0
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,93 @@
+# Transcribe Tool
+
+Audio transcription CLI using OpenAI Whisper with speaker diarization.
+
+## Quick Reference
+
+```bash
+# Basic transcription (SRT output)
+./transcribe audio.mp3 -o output.srt
+
+# With speaker diarization
+./transcribe audio.mp3 --diarize -o output.srt
+
+# Specify model and speakers
+./transcribe audio.mp3 --model small --diarize -s 2 -o output.srt
+
+# Print to stdout
+./transcribe audio.mp3 --no-write
+```
+
+## Flags
+
+| Flag | Short | Description | Default |
+|------|-------|-------------|---------|
+| `--output` | `-o` | Output file path | **required** |
+| `--format` | `-f` | `srt`, `text`, `json` | `srt` |
+| `--model` | `-m` | `tiny`, `base`, `small`, `medium`, `large`, `turbo` | `tiny` |
+| `--diarize` | | Enable speaker detection | off |
+| `--speakers` | `-s` | Number of speakers (0=auto) | `0` |
+| `--no-write` | | Print to stdout instead of file | off |
+
+## Common Tasks
+
+**Transcribe a meeting recording:**
+```bash
+./transcribe meeting.wav --model small -o meeting.srt
+```
+
+**Transcribe interview with 2 speakers:**
+```bash
+./transcribe interview.mp3 --model small --diarize -s 2 -o interview.srt
+```
+
+**Get JSON output for processing:**
+```bash
+./transcribe audio.mp3 --format json -o output.json
+```
+
+**Quick preview (stdout):**
+```bash
+./transcribe audio.mp3 --no-write
+```
+
+## Output Formats
+
+**SRT (default):** Subtitle format with timestamps
+```
+1
+00:00:00,000 --> 00:00:05,200
+[Speaker 1] Hello, how are you?
+```
+
+**Text:** Plain text with timestamps
+```
+[00:00.0 - 00:05.2] [Speaker 1] Hello, how are you?
+```
+
+**JSON:** Full metadata including segments, words, duration
+
+## Models
+
+- `tiny` - Fastest, use for quick drafts
+- `small` - Good balance of speed/accuracy
+- `medium` - Better accuracy, slower
+- `large` - Best accuracy, slowest
+
+## Supported Formats
+
+MP3, WAV, FLAC, M4A, OGG, OPUS
+
+## Build
+
+```bash
+cd /home/yeho/Documents/tools/transcribe
+go build -o transcribe
+```
+
+## Dependencies
+
+```bash
+pip install openai-whisper # Required
+pip install resemblyzer scikit-learn librosa # For diarization
+```
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0a372be
--- /dev/null
+++ b/README.md
@@ -0,0 +1,166 @@
+# Transcribe - Audio Transcription Tool
+
+A CLI tool for transcribing audio files using OpenAI's Whisper model with speaker diarization and multiple output formats.
+
+## Features
+
+- Multiple Whisper model sizes (tiny, base, small, medium, large, turbo)
+- Speaker diarization using voice embeddings (resemblyzer + clustering)
+- Multiple output formats: SRT subtitles, plain text, JSON
+- Batch processing of multiple audio files
+- Automatic language detection
+- Progress indicators with spinners
+
+## Installation
+
+### Prerequisites
+- Go 1.20+
+- Python 3.8+
+- FFmpeg
+
+### Python Dependencies
+```bash
+# Required for transcription
+pip install openai-whisper
+
+# Required for speaker diarization
+pip install resemblyzer scikit-learn librosa
+```
+
+Note: If `resemblyzer` fails to install due to `webrtcvad`, install Python development headers first:
+```bash
+# Fedora/RHEL
+sudo dnf install python3-devel
+
+# Ubuntu/Debian
+sudo apt install python3-dev
+```
+
+### Build from Source
+```bash
+go build -o transcribe
+```
+
+## Usage
+
+Output file (`-o`) is required unless `--no-write` is specified.
+
+### Basic Transcription
+```bash
+./transcribe audio.mp3 -o output.srt
+```
+
+### Choose Whisper Model
+```bash
+./transcribe audio.mp3 --model small -o output.srt
+```
+
+Available models: `tiny` (default), `base`, `small`, `medium`, `large`, `turbo`
+
+### Output Formats
+
+**SRT subtitles (default):**
+```bash
+./transcribe audio.mp3 -o subtitles.srt
+```
+
+**Plain text with timestamps:**
+```bash
+./transcribe audio.mp3 --format text -o output.txt
+```
+
+**JSON:**
+```bash
+./transcribe audio.mp3 --format json -o output.json
+```
+
+### Speaker Diarization
+
+Enable automatic speaker detection:
+```bash
+./transcribe audio.mp3 --diarize -o output.srt
+```
+
+Specify number of speakers for better accuracy:
+```bash
+./transcribe audio.mp3 --diarize --speakers 2 -o output.srt
+```
+
+### Print to stdout
+```bash
+./transcribe audio.mp3 --no-write
+```
+
+### Full Example
+
+Transcribe with speaker diarization:
+```bash
+./transcribe interview.wav --model small --diarize -s 2 -o interview.srt
+```
+
+Output:
+```
+1
+00:00:00,000 --> 00:00:05,200
+[Speaker 1] Hello, how are you?
+
+2
+00:00:05,200 --> 00:00:12,300
+[Speaker 2] I'm doing well, thanks!
+```
+
+## CLI Reference
+
+```
+Usage:
+ transcribe