Speech-to-text service powered by faster-whisper (CTranslate2 backend). Uses the large-v3-turbo model for fast, high-quality transcription with word-level timestamps.

Architecture

Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                      |
                                      v
                                 faster-whisper (GPU)
                                      |
                                      v
                                 Result (JSON)

ahserver: Web framework serving HTTP on port 9925
longtasks: Redis-backed async task queue with worker management
Redis: Task queue broker (queue name: asr)
faster-whisper: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
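
The faster-whisper step in the diagram above might look roughly like the sketch below. It is not the service's actual worker code: the function name transcribe_file is hypothetical, and the result is shaped to match the Output Format section of this README. It assumes model is a faster_whisper.WhisperModel instance (or anything with a compatible transcribe() method).

```python
import time


def transcribe_file(model, audio_path, language="zh",
                    word_timestamps=True, vad_filter=True):
    """Run faster-whisper on one file and return a JSON-serialisable dict.

    Hypothetical helper: field names follow this README's Output Format.
    """
    started = time.time()
    # transcribe() returns a lazy segment generator plus an info object.
    segments, info = model.transcribe(
        audio_path,
        language=language,
        word_timestamps=word_timestamps,
        vad_filter=vad_filter,
    )
    out_segments = []
    for seg in segments:  # iterating the generator drives the decoding
        words = [
            {"word": w.word, "start": w.start, "end": w.end,
             "probability": w.probability}
            for w in (seg.words or [])  # words is None without word_timestamps
        ]
        out_segments.append({"text": seg.text, "start": seg.start,
                             "end": seg.end, "words": words})
    return {
        "status": "ok",
        "text": "".join(s["text"] for s in out_segments),
        "language": info.language,
        "language_probability": info.language_probability,
        "duration": info.duration,
        "segments": out_segments,
        "processing_time": round(time.time() - started, 2),
        "audio_path": audio_path,
    }
```

Because the segments are produced lazily, the elapsed time is measured only after the loop has consumed them.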

Model

Model: faster-whisper-large-v3-turbo-ct2
Path: /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
Device: CUDA (float16)
GPU: Isolated via CUDA_VISIBLE_DEVICES (default GPU 5)

The model is lazy-loaded on the first transcription request and then stays in GPU memory for subsequent requests.
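
One common way to get this lazy-loading behaviour is a module-level singleton guarded by a lock. The sketch below is an assumption about how it could be done, not a copy of the worker's code; get_model and _load_whisper_model are hypothetical names. The path, device and compute type come from the Model section above.

```python
import threading

# Path taken from the Model section of this README.
MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"

_model = None
_model_lock = threading.Lock()


def _load_whisper_model():
    """Default loader: imports faster-whisper only when first needed."""
    from faster_whisper import WhisperModel
    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


def get_model(loader=_load_whisper_model):
    """Return the shared model, calling the loader on the first call only."""
    global _model
    if _model is None:
        with _model_lock:
            # Re-check inside the lock so concurrent callers load only once.
            if _model is None:
                _model = loader()
    return _model
```

The loader parameter exists only so the caching logic can be exercised without a GPU; in normal use get_model() is called with no arguments.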

Deployment

Prerequisites

Python venv with faster-whisper 1.2.1: /data/ymq/demucs_venv
Redis server running on 127.0.0.1:6379
CUDA-capable GPU

Start

cd /data/ymq/asr-service
bash start.sh

Stop

cd /data/ymq/asr-service
bash stop.sh

Health Check

curl http://localhost:9925/health

Returns:

{
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
}
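
The same check can be scripted, for example in a deployment probe. This is a minimal sketch using only the standard library; the helper names parse_health and is_healthy are hypothetical, and only the "status" field shown above is inspected.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:9925/health"


def parse_health(body: bytes) -> bool:
    """Return True when the JSON body reports status "ok"."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Fetch the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:
        return False
```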

API Usage

Tasks are submitted via Redis, following the same pattern as wan22-service.

Submit a Transcription Task

import redis
import json
import uuid

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}

# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")

Check Task Status

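# Continues from the submission snippet above (reuses r and task_id).
# Task status and result are stored in Redis by longtasks.
# r.get() returns bytes, or None if the key does not exist yet.
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')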
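
Since a task runs asynchronously, a client usually has to poll until the result key appears. The sketch below shows one way to do that. It assumes the value under asr:result:<task_id> is a JSON string, which this README does not state explicitly, and wait_for_result is a hypothetical helper name.

```python
import json
import time


def wait_for_result(r, task_id, timeout=300.0, interval=1.0):
    """Poll asr:result:<task_id> until it appears or the timeout expires.

    r is a redis.Redis-like client exposing get(). Returns the decoded
    result dict, or None on timeout. Assumes the stored value is JSON.
    """
    deadline = time.monotonic() + timeout
    while True:
        raw = r.get(f"asr:result:{task_id}")
        if raw is not None:
            # redis-py returns bytes unless decode_responses=True is set;
            # json.loads accepts both bytes and str.
            return json.loads(raw)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Typical use would be result = wait_for_result(r, task_id), reusing r and task_id from the submission snippet.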

Task Payload Format

Field	Type	Required	Default	Description
task_type	string	Yes	-	Must be `"transcribe"`
audio_path	string	Yes	-	Path to input audio file
language	string	No	`"zh"`	Language code (zh, en, ja, etc.)
word_timestamps	bool	No	`True`	Enable word-level timestamps
vad_filter	bool	No	`True`	Enable voice activity detection
output_path	string	No	-	If set, save result JSON to this path
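
A small helper can apply the defaults from the table above on the client side, so callers only pass what they want to change. build_payload is a hypothetical name, and including task_id mirrors the submission example rather than a documented requirement.

```python
import uuid


def build_payload(audio_path, language="zh", word_timestamps=True,
                  vad_filter=True, output_path=None, task_id=None):
    """Build a transcription payload using the defaults from the table above."""
    if not audio_path:
        raise ValueError("audio_path is required")
    payload = {
        # task_id follows the submission example; a fresh UUID if not given.
        "task_id": task_id or str(uuid.uuid4()),
        "task_type": "transcribe",
        "audio_path": audio_path,
        "language": language,
        "word_timestamps": bool(word_timestamps),
        "vad_filter": bool(vad_filter),
    }
    # output_path is optional; include it only when the caller sets it.
    if output_path:
        payload["output_path"] = output_path
    return payload
```

The returned dict can be serialised with json.dumps() and pushed to the queue exactly as in the submission example.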

Output Format

{
    "status": "ok",
    "text": "Full transcription text...",
    "language": "zh",
    "language_probability": 0.9876,
    "duration": 125.340,
    "segments": [
        {
            "text": "Segment text",
            "start": 0.000,
            "end": 5.120,
            "words": [
                {
                    "word": "你好",
                    "start": 0.000,
                    "end": 0.800,
                    "probability": 0.9523
                }
            ]
        }
    ],
    "processing_time": 3.45,
    "audio_path": "/path/to/audio.wav"
}
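
As an illustration of consuming this structure, the sketch below turns the segments list into SRT subtitle text. It relies only on the text, start and end fields shown above; the function names are hypothetical and not part of the service.

```python
def format_timestamp(seconds):
    """Convert seconds to an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def result_to_srt(result):
    """Render the "segments" list of a result dict as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Blocks are separated by a blank line, as SRT expects.
    return "\n".join(blocks)
```

For the sample output above, the single segment would render as cue 1 running from 00:00:00,000 to 00:00:05,120.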

Configuration

Config file: conf/config.json

Setting	Value	Description
website.port	9925	HTTP listen port
website.host	0.0.0.0	Bind address
session_redis	127.0.0.1:6379 db=1	Session storage
password_key	ASRService2026Key	Auth key
filesroot	/tmp/asr-outputs	Output files directory
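
The dotted setting names above (for example website.port) suggest nested JSON objects, although the exact layout of conf/config.json is not shown here. Under that assumption, a setting could be read as sketched below; load_config and get_setting are hypothetical helpers.

```python
import json


def load_config(path="conf/config.json"):
    """Load the service configuration file as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def get_setting(config, dotted_key, default=None):
    """Look up a dotted key such as "website.port" in a nested dict.

    Assumes nested JSON objects; returns default when any part is missing.
    """
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

For example, get_setting(load_config(), "website.port", 9925) would fall back to the documented port if the key is absent.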

Environment Variables

Variable	Default	Description
ASR_GPU_ID	5	GPU device ID (for logging)
CUDA_VISIBLE_DEVICES	5	CUDA device isolation
PYTHONPATH	.	Python module search path
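
A sketch of how these two GPU variables might be read with their documented defaults is shown below; resolve_gpu_settings is a hypothetical name, not a function from the service. Note that CUDA_VISIBLE_DEVICES has to be set before CUDA is initialised in the process, and that with a single visible GPU the process sees it as device 0.

```python
import os


def resolve_gpu_settings(environ=None):
    """Read GPU settings, falling back to the documented default of GPU 5."""
    env = os.environ if environ is None else environ
    return {
        # Used for logging only, per the table above.
        "ASR_GPU_ID": env.get("ASR_GPU_ID", "5"),
        # Controls which physical GPU the process can see.
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES", "5"),
    }
```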

File Structure

asr-service/
├── ah.py                  # Main entry point
├── start.sh               # Start script
├── stop.sh                # Stop script
├── conf/
│   └── config.json        # Service configuration
├── app/
│   └── health.dspy        # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py      # Transcription worker
└── README.md