Ultra-low-latency, locally hosted speech-to-text and translation pipeline for streaming.
An ultra-low-latency, locally hosted speech-to-text and translation pipeline. It captures local audio, detects voice activity with Silero VAD, and transcribes the detected speech with Faster-Whisper. It supports hardware acceleration on NVIDIA GPUs via Docker.
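
The capture, voice-activity detection, and transcription stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `energy_vad`, `segment_speech`, `run_pipeline`, and the `max_silence` parameter are all hypothetical names, and a simple energy threshold stands in for Silero VAD so the example runs with the standard library alone. The transcriber is passed in as a plain callable for the same reason.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one fixed-size chunk of mono audio samples


def energy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Stand-in for Silero VAD (assumption): mean energy above a threshold."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold


def segment_speech(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    max_silence: int = 2,
) -> Iterator[List[float]]:
    """Group frames into utterances, closing one after `max_silence` quiet frames."""
    buffer: List[float] = []
    silence = 0
    in_speech = False
    for frame in frames:
        if is_speech(frame):
            buffer.extend(frame)
            silence = 0
            in_speech = True
        elif in_speech:
            silence += 1
            if silence >= max_silence:
                # Enough trailing silence: emit the utterance and reset.
                yield buffer
                buffer, silence, in_speech = [], 0, False
            else:
                # Keep short pauses inside the current utterance.
                buffer.extend(frame)
    if buffer:
        # Flush whatever is left when the stream ends.
        yield buffer


def run_pipeline(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    transcribe: Callable[[List[float]], object],
) -> list:
    """Send each detected utterance to the transcriber and collect the results."""
    return [transcribe(segment) for segment in segment_speech(frames, is_speech)]


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    stream = [quiet, loud, loud, quiet, quiet, quiet, loud, quiet]
    # The lambda is a placeholder for a real transcriber.
    print(run_pipeline(stream, energy_vad, lambda seg: f"{len(seg)} samples"))
```

In a real deployment, `is_speech` would presumably wrap a Silero VAD model and `transcribe` would wrap a Faster-Whisper model. Gating the transcriber on detected speech keeps it idle during silence, which is one plausible way such a pipeline keeps latency low.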