# Transcribe - Audio Transcription Tool

A CLI tool for transcribing audio files using OpenAI's Whisper model, with speaker diarization and multiple output formats.
## Features

- Multiple Whisper model sizes (tiny, base, small, medium, large, turbo)
- Speaker diarization using voice embeddings (resemblyzer + clustering)
- Multiple output formats: SRT subtitles, plain text, JSON
- Batch processing of multiple audio files
- Automatic language detection
- Progress indicators with spinners
## Installation

### Prerequisites

- Go 1.20+
- Python 3.8+
- FFmpeg
### Python Dependencies

```bash
# Required for transcription
pip install openai-whisper

# Required for speaker diarization
pip install resemblyzer scikit-learn librosa
```

Note: If `resemblyzer` fails to install due to `webrtcvad`, install the Python development headers first:

```bash
# Fedora/RHEL
sudo dnf install python3-devel

# Ubuntu/Debian
sudo apt install python3-dev
```
### Build from Source

```bash
go build -o transcribe
```
## Usage

An output file (`-o`) is required unless `--no-write` is specified.
### Basic Transcription

```bash
./transcribe audio.mp3 -o output.srt
```
### Choose a Whisper Model

```bash
./transcribe audio.mp3 --model small -o output.srt
```

Available models: `tiny` (default), `base`, `small`, `medium`, `large`, `turbo`.
### Output Formats

**SRT subtitles (default):**

```bash
./transcribe audio.mp3 -o subtitles.srt
```

**Plain text with timestamps:**

```bash
./transcribe audio.mp3 --format text -o output.txt
```

**JSON:**

```bash
./transcribe audio.mp3 --format json -o output.json
```
### Speaker Diarization

Enable automatic speaker detection:

```bash
./transcribe audio.mp3 --diarize -o output.srt
```

Specify the number of speakers for better accuracy:

```bash
./transcribe audio.mp3 --diarize --speakers 2 -o output.srt
```
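Conceptually, diarization clusters per-window voice embeddings and treats each cluster as one speaker. The tool uses resemblyzer embeddings plus scikit-learn clustering; the numpy-only toy below (`greedy_cluster` is a hypothetical illustration, not the actual implementation) shows why a known speaker count helps: without it, clustering has to fall back on a similarity threshold, which is easier to get wrong.

```python
import numpy as np

def greedy_cluster(embeddings, threshold=0.75):
    """Assign each embedding to the first cluster whose representative
    vector has cosine similarity >= threshold; otherwise open a new
    cluster. Returns one integer label per embedding."""
    reps, labels = [], []
    for e in embeddings:
        e = e / np.linalg.norm(e)  # unit-normalize so dot product = cosine
        sims = [float(e @ r) for r in reps]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            reps.append(e)          # first member represents the cluster
            labels.append(len(reps) - 1)
    return labels
```

With `--speakers N`, the real implementation can instead ask the clusterer for exactly N clusters and skip the threshold guesswork.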
### Print to stdout

```bash
./transcribe audio.mp3 --no-write
```
### Full Example

Transcribe with speaker diarization:

```bash
./transcribe interview.wav --model small --diarize -s 2 -o interview.srt
```

Output:

```
1
00:00:00,000 --> 00:00:05,200
[Speaker 1] Hello, how are you?

2
00:00:05,200 --> 00:00:12,300
[Speaker 2] I'm doing well, thanks!
```
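The timestamps above use SRT's `HH:MM:SS,mmm` format. A minimal sketch of that conversion (a hypothetical helper for illustration; the tool's actual formatter lives in `pkg/output/srt.go`):

```python
def srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT HH:MM:SS,mmm format."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(5.2))  # 00:00:05,200
```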
## CLI Reference

```
Usage:
  transcribe <audio files...> [flags]

Flags:
      --diarize          Enable speaker diarization
  -f, --format string    Output format: srt, text, json (default "srt")
  -h, --help             help for transcribe
  -m, --model string     Whisper model: tiny, base, small, medium, large, turbo (default "tiny")
      --no-write         Print output to stdout instead of file
  -o, --output string    Output file path (required)
  -s, --speakers int     Number of speakers (0 = auto-detect)
```
## Supported Audio Formats

MP3, WAV, FLAC, M4A, OGG, OPUS
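Input files are presumably accepted or rejected by extension (the `pkg/audio` validator); a hypothetical sketch mirroring the list above:

```python
import os

# Extensions matching the supported formats listed above.
SUPPORTED = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".opus"}

def is_supported(path: str) -> bool:
    """True if the file's extension (case-insensitive) is supported."""
    return os.path.splitext(path)[1].lower() in SUPPORTED
```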
## Architecture

```
transcribe/
├── cmd/
│   └── root.go              # CLI commands and flags
├── internal/
│   ├── whisper/
│   │   └── client.go        # Whisper Python bridge
│   └── diarization/
│       ├── client.go        # Diarization Python bridge
│       └── align.go         # Speaker-segment alignment
├── pkg/
│   ├── audio/
│   │   └── audio.go         # Audio file validation
│   ├── output/
│   │   ├── formatter.go     # Output formatter interface
│   │   ├── srt.go           # SRT format
│   │   ├── text.go          # Text format
│   │   └── json.go          # JSON format
│   └── progress/
│       └── spinner.go       # Progress spinner
└── README.md
```
## How It Works

1. **Transcription**: Audio is processed by Whisper (via a Python subprocess) to generate timestamped text segments.
2. **Diarization** (optional): Voice embeddings are extracted with resemblyzer and clustered to identify speakers.
3. **Alignment**: Speaker segments are mapped to transcription segments by timestamp overlap.
4. **Formatting**: Results are rendered in the selected output format (SRT by default).
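The alignment step (3) can be sketched as a maximum-overlap assignment: each transcript segment gets the speaker whose turns cover the most of it. The field names below are illustrative, not the tool's actual schema:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turns
    overlap it the most, or None if no turn overlaps it at all."""
    labeled = []
    for seg in transcript_segments:
        totals = {}  # speaker -> total overlapping seconds
        for turn in speaker_turns:
            o = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if o > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + o
        best = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": best})
    return labeled
```

Diarization turns rarely line up exactly with Whisper's segment boundaries, which is why an overlap vote is used rather than an exact match.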
## License

MIT License - see LICENSE file for details.